pMHC movement based on peptide anchoring

Introduction

In this notebook, apo-holo comparisons for MHC alleles with different peptide anchoring patterns are plotted next to each other. The anchoring patterns were determined by taking the peptide motifs of each MHC allele from the MHC Motif Atlas and treating the strong motif positions (those graded high or dominant) as the anchors for the peptide.

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from python_pdb.formats.residue import THREE_TO_ONE_CODE
[ ]:
mhc_anchor_position_df = pd.read_csv('../data/external/mhc_motif_atlas.csv')
mhc_anchor_position_df = mhc_anchor_position_df.query("grade in ('high', 'dominant')")
mhc_anchor_position_df = mhc_anchor_position_df.sort_values('allele_slug')
mhc_anchor_position_df = mhc_anchor_position_df[['allele_slug', 'position', 'amino_acid', 'peptide_length']]
mhc_anchor_position_df = mhc_anchor_position_df.reset_index(drop=True)

mhc_anchor_position_df
     allele_slug  position amino_acid  peptide_length
0    hla_a_01_01         2          T               9
1    hla_a_01_01         3          D               9
2    hla_a_01_01         9          Y               9
3    hla_a_02_01         2          L               9
4    hla_a_02_01         9          L               9
..           ...       ...        ...             ...
319  hla_g_01_03         1          K               9
320  hla_g_01_04         1          R               9
321  hla_g_01_04         3          P               9
322  hla_g_01_04         1          K               9
323  hla_g_01_04         9          L               9

324 rows × 4 columns

Load apo-holo comparisons

[3]:
apo_holo_comparison = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I-comparisons/pmhc_per_res_apo_holo.csv')

peptide_apo_holo_comparison = apo_holo_comparison.query("chain_type == 'antigen_chain'").copy()
peptide_apo_holo_comparison
[3]:
complex_id structure_x_name structure_y_name chain_type residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change com_distance
181 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain MET 1 NaN 0.700531 0.237162 1.570636 0.364628
182 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain VAL 2 NaN 0.114569 0.089359 0.009407 0.043732
183 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain TRP 3 NaN 0.400840 0.233532 -0.019115 0.363989
184 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain GLY 4 NaN 0.495618 0.262271 NaN 0.448252
185 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain PRO 5 NaN 0.734430 0.486409 -0.842595 0.536198
... ... ... ... ... ... ... ... ... ... ... ...
209372 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain ARG 5 NaN 1.064663 0.157924 0.364432 0.705033
209373 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain THR 6 NaN 0.421897 0.345439 0.097701 0.344074
209374 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain PHE 7 NaN 1.225982 0.317819 -0.347835 0.883236
209375 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 8 NaN 1.323615 0.310356 0.612672 0.270049
209376 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 9 NaN 1.599917 0.192969 0.885323 0.175807

9857 rows × 11 columns

[4]:
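# Peptide length is taken as the number of residues recorded for each apo-holo comparison.
peptide_apo_holo_comparison['peptide_length'] = (
    peptide_apo_holo_comparison
    .groupby(['complex_id', 'structure_x_name', 'structure_y_name'])['residue_seq_id']
    .transform('size')
)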
[5]:
peptide_apo_holo_comparison['amino_acid'] = peptide_apo_holo_comparison['residue_name'].map(THREE_TO_ONE_CODE)

Load summary data

[6]:
summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')
# regex=False treats '.pdb' as a literal suffix rather than a regular expression.
summary_df['group_name'] = summary_df['file_name'].str.replace('.pdb', '', regex=False)

summary_df
[6]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug group_name
0 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01 1ao7_D-E-C-A-B_tcr_pmhc
1 1b0g_C-A-B_pmhc.pdb 1b0g pmhc apo NaN NaN C A B NaN ALWGFFPVL hla_a_02_01 1b0g_C-A-B_pmhc
2 1b0g_F-D-E_pmhc.pdb 1b0g pmhc apo NaN NaN F D E NaN ALWGFFPVL hla_a_02_01 1b0g_F-D-E_pmhc
3 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01 1bd2_D-E-C-A-B_tcr_pmhc
4 1bii_P-A-B_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd 1bii_P-A-B_pmhc
... ... ... ... ... ... ... ... ... ... ... ... ... ...
386 7rtd_C-A-B_pmhc.pdb 7rtd pmhc apo NaN NaN C A B NaN YLQPRTFLL hla_a_02_01 7rtd_C-A-B_pmhc
387 7rtr_D-E-C-A-B_tcr_pmhc.pdb 7rtr tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
388 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02 8gvb_A-B-P-H-L_tcr_pmhc
389 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02 8gvg_A-B-P-H-L_tcr_pmhc
390 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02 8gvi_A-B-P-H-L_tcr_pmhc

391 rows × 13 columns

Annotate apo-holo data with allele information

[7]:
peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(summary_df, how='left', left_on='complex_id', right_on='group_name')
peptide_apo_holo_comparison
[7]:
complex_id structure_x_name structure_y_name chain_type residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change ... state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug group_name
0 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain MET 1 NaN 0.700531 0.237162 1.570636 ... holo D E C A B NSAFQY-TYSSGN-AMRGDSSYKLI-SGHDY-FNNNVP-ASSLWEK... MVWGPDPLYV hla_a_02_01 5c0a_D-E-C-A-B_tcr_pmhc
1 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain VAL 2 NaN 0.114569 0.089359 0.009407 ... holo D E C A B NSAFQY-TYSSGN-AMRGDSSYKLI-SGHDY-FNNNVP-ASSLWEK... MVWGPDPLYV hla_a_02_01 5c0a_D-E-C-A-B_tcr_pmhc
2 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain TRP 3 NaN 0.400840 0.233532 -0.019115 ... holo D E C A B NSAFQY-TYSSGN-AMRGDSSYKLI-SGHDY-FNNNVP-ASSLWEK... MVWGPDPLYV hla_a_02_01 5c0a_D-E-C-A-B_tcr_pmhc
3 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain GLY 4 NaN 0.495618 0.262271 NaN ... holo D E C A B NSAFQY-TYSSGN-AMRGDSSYKLI-SGHDY-FNNNVP-ASSLWEK... MVWGPDPLYV hla_a_02_01 5c0a_D-E-C-A-B_tcr_pmhc
4 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb antigen_chain PRO 5 NaN 0.734430 0.486409 -0.842595 ... holo D E C A B NSAFQY-TYSSGN-AMRGDSSYKLI-SGHDY-FNNNVP-ASSLWEK... MVWGPDPLYV hla_a_02_01 5c0a_D-E-C-A-B_tcr_pmhc
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
9852 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain ARG 5 NaN 1.064663 0.157924 0.364432 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
9853 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain THR 6 NaN 0.421897 0.345439 0.097701 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
9854 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain PHE 7 NaN 1.225982 0.317819 -0.347835 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
9855 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 8 NaN 1.323615 0.310356 0.612672 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
9856 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 9 NaN 1.599917 0.192969 0.885323 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc

9857 rows × 26 columns
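
Once the summary data is merged in, the `peptide_length` derived from the row count can be cross-checked against the length of `peptide_sequence`. The sketch below does this on a small toy frame (the frame, its values, and the `sequence_length` and `mismatched` names are made up for illustration); a mismatch would suggest that some peptide residues are missing from the per-residue comparison, in which case the later join on `peptide_length` would not find the motif for that peptide.

```python
import pandas as pd

# Toy stand-in for the merged per-residue table; all values are invented.
toy = pd.DataFrame({
    'complex_id': ['c1'] * 3 + ['c2'] * 2,
    'structure_x_name': ['a.pdb'] * 3 + ['b.pdb'] * 2,
    'structure_y_name': ['h.pdb'] * 3 + ['i.pdb'] * 2,
    'residue_seq_id': [1, 2, 3, 1, 2],
    # The second peptide has four residues but only two rows in the table.
    'peptide_sequence': ['ALW'] * 3 + ['GILG'] * 2,
})

keys = ['complex_id', 'structure_x_name', 'structure_y_name']

# Same idea as the notebook: one row per residue, so the group size is the length.
toy['peptide_length'] = toy.groupby(keys)['residue_seq_id'].transform('size')
toy['sequence_length'] = toy['peptide_sequence'].str.len()

# Comparisons where the two notions of length disagree.
mismatched = toy.loc[toy['peptide_length'] != toy['sequence_length'], keys].drop_duplicates()
print(mismatched)
```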

Combine anchoring information with apo-holo comparisons

[8]:
mhc_anchor_position_df['anchor'] = True
[9]:
peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(
    mhc_anchor_position_df,
    how='left',
    left_on=['mhc_slug', 'residue_seq_id', 'amino_acid', 'peptide_length'],
    right_on=['allele_slug', 'position', 'amino_acid', 'peptide_length'],
)
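
To illustrate what this left merge does, here is a minimal sketch on toy data (the two small frames and the `is_anchor` column are hypothetical, not part of the notebook). A residue is only flagged as an anchor when the allele, the position, the amino acid, and the peptide length all match a motif row; every other residue ends up with a missing `anchor` value.

```python
import pandas as pd

# Toy per-residue rows for one peptide; values are invented.
residues = pd.DataFrame({
    'mhc_slug': ['hla_a_02_01'] * 3,
    'residue_seq_id': [1, 2, 9],
    'amino_acid': ['Y', 'L', 'V'],
    'peptide_length': [9, 9, 9],
})

# Toy motif rows in the same shape as mhc_anchor_position_df.
anchors = pd.DataFrame({
    'allele_slug': ['hla_a_02_01', 'hla_a_02_01'],
    'position': [2, 9],
    'amino_acid': ['L', 'L'],
    'peptide_length': [9, 9],
    'anchor': [True, True],
})

merged = residues.merge(
    anchors,
    how='left',
    left_on=['mhc_slug', 'residue_seq_id', 'amino_acid', 'peptide_length'],
    right_on=['allele_slug', 'position', 'amino_acid', 'peptide_length'],
)

# Unmatched residues have NaN in 'anchor'; turn that into an explicit boolean.
merged['is_anchor'] = merged['anchor'].notna()
print(merged[['residue_seq_id', 'amino_acid', 'is_anchor']])
```

In this toy case position 9 is not flagged, because the residue there (V) differs from the amino acid listed in the motif row (L), even though the position itself is a motif position.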
[10]:
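def collate_anchors(group: pd.DataFrame) -> list[int]:
    # After the left merge, 'anchor' is True for motif matches and NaN otherwise.
    is_anchor = group['anchor'].eq(True)
    return sorted(group.loc[is_anchor, 'residue_seq_id'].unique().tolist())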

anchoring_strategies = peptide_apo_holo_comparison.groupby(['structure_x_name', 'structure_y_name']).apply(collate_anchors)
anchoring_strategies.name = 'anchoring_strategy'
anchoring_strategies = anchoring_strategies.reset_index()

peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(anchoring_strategies, how='left', on=['structure_x_name', 'structure_y_name'])
[11]:
def markup_anchor(anchors: list[int]) -> str:
    anchors = [f'p{anchor}' for anchor in anchors]
    return '-'.join(anchors)

peptide_apo_holo_comparison['anchoring_strategy_str'] = peptide_apo_holo_comparison['anchoring_strategy'].map(markup_anchor)
[12]:
def find_dominant_anchor(group: pd.DataFrame) -> str:
    # The longest strategy string is used as a proxy for the strategy with the most anchor positions.
    anchor_types = group['anchoring_strategy_str'].unique()
    lengths = np.array([len(anchor) for anchor in anchor_types])
    index = np.argmax(lengths)

    return anchor_types[index]

dominant_anchors = peptide_apo_holo_comparison.groupby('mhc_slug').apply(find_dominant_anchor)
dominant_anchors.name = 'dominant_anchor'
dominant_anchors = dominant_anchors.reset_index()

peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(dominant_anchors, how='left', on='mhc_slug')
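
The selection above relies on string length to pick the strategy with the most anchors. As a hedged alternative, the sketch below counts anchor positions directly, which keeps the intent explicit when two-digit positions such as p10 appear. The helper names `count_anchors` and `find_dominant_anchor_by_count` and the toy data are assumptions for illustration, not part of the notebook.

```python
import pandas as pd


def count_anchors(strategy: str) -> int:
    """Number of positions in a strategy string such as 'p2-p5-p9' ('' gives 0)."""
    return len(strategy.split('-')) if strategy else 0


def find_dominant_anchor_by_count(strategies: pd.Series) -> str:
    """Return the strategy with the most anchor positions (first one wins on ties)."""
    return max(strategies.unique(), key=count_anchors)


# Toy strategies per allele; values are invented.
toy = pd.DataFrame({
    'mhc_slug': ['hla_b_08_01'] * 3 + ['hla_a_02_01'] * 2,
    'anchoring_strategy_str': ['p2-p9', 'p2-p5-p9', '', 'p2', 'p2-p9'],
})

dominant = toy.groupby('mhc_slug')['anchoring_strategy_str'].agg(find_dominant_anchor_by_count)
print(dominant)
```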

The correction below is applied on the assumption that peptides marked as having only a ‘p2’ or only a ‘p9’ anchor are most likely still anchored at both ends, by a residue that is not necessarily part of the high or dominant motif.

[13]:
peptide_apo_holo_comparison['dominant_anchor'] = peptide_apo_holo_comparison['dominant_anchor'].map(
    lambda strategy: 'p2-p9' if strategy in ('p2', 'p9') else strategy
)
[14]:
peptide_apo_holo_comparison_with_anchor = peptide_apo_holo_comparison.query("anchoring_strategy_str != ''")

Visualising the results

[15]:
g = sns.catplot(peptide_apo_holo_comparison_with_anchor,
                col='dominant_anchor',
                x='residue_seq_id', y='rmsd',
                sharex=False,
                color='salmon',
                kind='bar')

def annotate(data, **kws):
    groups = data.groupby(['mhc_slug', 'peptide_sequence'])
    ax = plt.gca()
    ax.text(0.45, 0.9, f'n={len(groups)}', fontsize=16, transform=ax.transAxes)

g.map_dataframe(annotate)
[15]:
[Figure: bar plots of mean rmsd against residue_seq_id, one panel per dominant_anchor, each annotated with n, the number of unique allele-peptide pairs.]
[16]:
max_rmsd = peptide_apo_holo_comparison_with_anchor['rmsd'].max()

for mhc_slug, group in peptide_apo_holo_comparison_with_anchor.groupby('mhc_slug'):
    print(mhc_slug)
    print(group['dominant_anchor'].unique()[0])
    plot = sns.barplot(group, x='residue_seq_id', y='rmsd', color='salmon')
    plot.set_ylim((0, np.ceil(max_rmsd)))
    plt.show()
hla_a_02_01
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_a_02_01]
hla_a_24_02
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_a_24_02]
hla_b_07_02
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_07_02]
hla_b_08_01
p2-p5-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_08_01]
hla_b_35_01
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_35_01]
hla_b_37_01
p2-p5-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_37_01]
hla_b_42_01
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_42_01]
hla_b_44_05
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_44_05]
hla_b_53_01
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_53_01]
hla_b_81_01
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_b_81_01]
hla_e_01_03
p2-p9
[Figure: bar plot of rmsd against residue_seq_id for hla_e_01_03]
[17]:
sns.barplot(peptide_apo_holo_comparison_with_anchor,
            hue='dominant_anchor',
            x='residue_seq_id', y='rmsd')
[17]:
[Figure: bar plot of rmsd against residue_seq_id, coloured by dominant_anchor.]
[ ]:
representative_alleles = (peptide_apo_holo_comparison_with_anchor.groupby('dominant_anchor')[['dominant_anchor', 'mhc_slug']]
                                                                 .sample(1, random_state=1)
                                                                 .reset_index(drop=True))
representative_alleles
  dominant_anchor     mhc_slug
0        p2-p5-p9  hla_b_08_01
1           p2-p9  hla_e_01_03
[19]:
sns.barplot(peptide_apo_holo_comparison_with_anchor.query("mhc_slug.isin(@representative_alleles['mhc_slug'])"),
            hue='mhc_slug',
            x='residue_seq_id', y='rmsd')
[19]:
[Figure: bar plot of rmsd against residue_seq_id for the representative alleles, coloured by mhc_slug.]

Conclusion

The visualisations indicate that the profile of peptide conformational change between apo and holo structures depends on the MHC allele and on how that allele anchors the peptide.
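
A table of mean RMSD per peptide position and anchoring group could complement the bar plots and make this comparison easier to quantify. The sketch below shows one possible way to build such a table with `pivot_table`; the toy values are invented purely to show the shape of the output and do not reflect the notebook's results.

```python
import pandas as pd

# Toy per-residue RMSD values; invented for illustration only.
toy = pd.DataFrame({
    'dominant_anchor': ['p2-p9'] * 4 + ['p2-p5-p9'] * 4,
    'residue_seq_id': [2, 5, 9, 5, 2, 5, 9, 5],
    'rmsd': [0.2, 1.0, 0.4, 0.8, 0.3, 0.5, 0.3, 0.7],
})

# Rows are peptide positions, columns are anchoring groups, cells are mean RMSD.
summary = toy.pivot_table(
    index='residue_seq_id',
    columns='dominant_anchor',
    values='rmsd',
    aggfunc='mean',
)
print(summary)
```

Applied to `peptide_apo_holo_comparison_with_anchor`, the same call would give one row per peptide position, so positions that are anchors in one group but not in another can be compared directly.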