pMHC movement based on peptide anchoring

Introduction

In this notebook, apo-holo comparisons between different anchoring patterns of MHC alleles plotted next to each other. The anchoring patterns were determined by finding the peptide motifs of each MHC allele from the MHCMotifAtlas and using these strong motifs as anchors for the peptide.

[1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from python_pdb.formats.residue import THREE_TO_ONE_CODE
[2]:
mhc_anchor_position_df = pd.read_csv('../data/external/mhc_motif_atlas.csv')
mhc_anchor_position_df = mhc_anchor_position_df.query("grade in ('high', 'dominant')")
mhc_anchor_position_df = mhc_anchor_position_df.sort_values('allele_slug')
mhc_anchor_position_df = mhc_anchor_position_df[['allele_slug', 'position', 'amino_acid', 'peptide_length']]
mhc_anchor_position_df = mhc_anchor_position_df.reset_index(drop=True)

mhc_anchor_position_df
[2]:
allele_slug position amino_acid peptide_length
0 hla_a_01_01 2 T 9
1 hla_a_01_01 3 D 9
2 hla_a_01_01 9 Y 9
3 hla_a_02_01 2 L 9
4 hla_a_02_01 9 L 9
... ... ... ... ...
319 hla_g_01_03 1 K 9
320 hla_g_01_04 1 R 9
321 hla_g_01_04 3 P 9
322 hla_g_01_04 1 K 9
323 hla_g_01_04 9 L 9

324 rows × 4 columns

Load apo-holo comparisons

[3]:
apo_holo_comparison = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I-comparisons/pmhc_per_res_apo_holo.csv')

peptide_apo_holo_comparison = apo_holo_comparison.query("chain_type == 'antigen_chain'").copy()
peptide_apo_holo_comparison
[3]:
complex_id structure_x_name structure_y_name chain_type residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change com_distance
181 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 1 NaN 0.354933 0.330887 0.200330 0.316020
182 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 2 NaN 0.262440 0.073709 0.030793 0.085332
183 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 3 NaN 0.256739 0.248310 -0.007898 0.215858
184 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain GLY 4 NaN 0.568505 0.622876 NaN 0.318856
185 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain GLY 7 NaN 0.346240 0.192317 NaN 0.324260
... ... ... ... ... ... ... ... ... ... ... ...
126149 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain ARG 5 NaN 1.064663 0.157924 0.364432 0.705033
126150 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain THR 6 NaN 0.421897 0.345439 0.097701 0.344074
126151 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain PHE 7 NaN 1.225982 0.317819 -0.347835 0.883236
126152 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 8 NaN 1.323615 0.310356 0.612672 0.270049
126153 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 9 NaN 1.599917 0.192969 0.885323 0.175807

5838 rows × 11 columns

[4]:
peptide_apo_holo_comparison['peptide_length'] = \
    peptide_apo_holo_comparison.groupby(['complex_id',
                                         'structure_x_name',
                                         'structure_y_name']).transform('size')
[5]:
peptide_apo_holo_comparison['amino_acid'] = peptide_apo_holo_comparison['residue_name'].map(THREE_TO_ONE_CODE)

Load summary data

[6]:
summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')
summary_df['group_name'] = summary_df['file_name'].str.replace('.pdb', '')

summary_df
/var/scratch/bmcmaste/1915160/ipykernel_3863244/1956598565.py:2: FutureWarning: The default value of regex will change from True to False in a future version.
  summary_df['group_name'] = summary_df['file_name'].str.replace('.pdb', '')
[6]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug group_name
0 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01 1ao7_D-E-C-A-B_tcr_pmhc
1 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01 1bd2_D-E-C-A-B_tcr_pmhc
2 1bii_A-B-P_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd 1bii_A-B-P_pmhc
3 1ddh_A-B-P_pmhc.pdb 1ddh pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd 1ddh_A-B-P_pmhc
4 1duz_A-B-C_pmhc.pdb 1duz pmhc apo NaN NaN C A B NaN LLFGYPVYV hla_a_02_01 1duz_A-B-C_pmhc
... ... ... ... ... ... ... ... ... ... ... ... ... ...
353 8gon_D-E-C-A-B_tcr_pmhc.pdb 8gon tcr_pmhc holo D E C A B TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN 8gon_D-E-C-A-B_tcr_pmhc
354 8gop_A-B_tcr.pdb 8gop tcr apo A B NaN NaN NaN TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN 8gop_A-B_tcr
355 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02 8gvb_A-B-P-H-L_tcr_pmhc
356 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02 8gvg_A-B-P-H-L_tcr_pmhc
357 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02 8gvi_A-B-P-H-L_tcr_pmhc

358 rows × 13 columns

Annotate apo-holo data with allele information

[7]:
peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(summary_df, how='left', left_on='complex_id', right_on='group_name')
peptide_apo_holo_comparison
[7]:
complex_id structure_x_name structure_y_name chain_type residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change ... state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug group_name
0 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 1 NaN 0.354933 0.330887 0.200330 ... holo D E C A B TRDTAYY-QPWWGEQN-AMSVPSGDGSYQFT-MNHEY-SVGEGT-A... VVVGADGVGK hla_a_11_01 7ow6_D-E-C-A-B_tcr_pmhc
1 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 2 NaN 0.262440 0.073709 0.030793 ... holo D E C A B TRDTAYY-QPWWGEQN-AMSVPSGDGSYQFT-MNHEY-SVGEGT-A... VVVGADGVGK hla_a_11_01 7ow6_D-E-C-A-B_tcr_pmhc
2 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain VAL 3 NaN 0.256739 0.248310 -0.007898 ... holo D E C A B TRDTAYY-QPWWGEQN-AMSVPSGDGSYQFT-MNHEY-SVGEGT-A... VVVGADGVGK hla_a_11_01 7ow6_D-E-C-A-B_tcr_pmhc
3 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain GLY 4 NaN 0.568505 0.622876 NaN ... holo D E C A B TRDTAYY-QPWWGEQN-AMSVPSGDGSYQFT-MNHEY-SVGEGT-A... VVVGADGVGK hla_a_11_01 7ow6_D-E-C-A-B_tcr_pmhc
4 7ow6_D-E-C-A-B_tcr_pmhc 7ow4_A-B-C_pmhc.pdb 7ow4_D-E-F_pmhc.pdb antigen_chain GLY 7 NaN 0.346240 0.192317 NaN ... holo D E C A B TRDTAYY-QPWWGEQN-AMSVPSGDGSYQFT-MNHEY-SVGEGT-A... VVVGADGVGK hla_a_11_01 7ow6_D-E-C-A-B_tcr_pmhc
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
5833 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain ARG 5 NaN 1.064663 0.157924 0.364432 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
5834 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain THR 6 NaN 0.421897 0.345439 0.097701 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
5835 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain PHE 7 NaN 1.225982 0.317819 -0.347835 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
5836 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 8 NaN 1.323615 0.310356 0.612672 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc
5837 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_A-B-C_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb antigen_chain LEU 9 NaN 1.599917 0.192969 0.885323 ... holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc

5838 rows × 26 columns

Combine anchoring information with apo-holo comparisons

[8]:
mhc_anchor_position_df['anchor'] = True
[9]:
peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(
    mhc_anchor_position_df,
    how='left',
    left_on=['mhc_slug', 'residue_seq_id', 'amino_acid', 'peptide_length'],
    right_on=['allele_slug', 'position', 'amino_acid', 'peptide_length'],
)
[10]:
def collate_anchors(group: pd.DataFrame) -> list[int]:
    return sorted(group[group['anchor'] == True]['residue_seq_id'].unique().tolist())

anchoring_strategies = peptide_apo_holo_comparison.groupby(['structure_x_name', 'structure_y_name']).apply(collate_anchors)
anchoring_strategies.name = 'anchoring_strategy'
anchoring_strategies = anchoring_strategies.reset_index()

peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(anchoring_strategies, how='left', on=['structure_x_name', 'structure_y_name'])
[11]:
def markup_anchor(anchors: list[int]) -> str:
    anchors = [f'p{anchor}' for anchor in anchors]
    return '-'.join(anchors)

peptide_apo_holo_comparison['anchoring_strategy_str'] = peptide_apo_holo_comparison['anchoring_strategy'].map(markup_anchor)
[12]:
def find_dominant_anchor(group: pd.DataFrame) -> str:
    anchor_types = group['anchoring_strategy_str'].unique()
    lengths = np.array([len(anchor) for anchor in anchor_types])
    index = np.argmax(lengths)

    return anchor_types[index]

dominant_anchors = peptide_apo_holo_comparison.groupby('mhc_slug').apply(find_dominant_anchor)
dominant_anchors.name = 'dominant_anchor'
dominant_anchors = dominant_anchors.reset_index()

peptide_apo_holo_comparison = peptide_apo_holo_comparison.merge(dominant_anchors, how='left', on='mhc_slug')

The below correction is applied as it is assumed that those peptide marked as solely ‘p2’ or ‘p9’ anchors are most likely still anchored by something at either end that is not necessarily the high or dominant motif.

[13]:
peptide_apo_holo_comparison['dominant_anchor'] = peptide_apo_holo_comparison['dominant_anchor'].map(
    lambda strategy: 'p2-p9' if strategy in ('p2', 'p9') else strategy
)
[14]:
peptide_apo_holo_comparison_with_anchor = peptide_apo_holo_comparison.query("anchoring_strategy_str != ''")

Visualising the results

[15]:
g = sns.catplot(peptide_apo_holo_comparison_with_anchor,
                col='dominant_anchor',
                x='residue_seq_id', y='rmsd',
                sharex=False,
                color='salmon',
                kind='bar')

def annotate(data, **kws):
    groups = data.groupby(['mhc_slug', 'peptide_sequence'])
    ax = plt.gca()
    ax.text(0.45, 0.9, f'n={len(groups)}', fontsize=16, transform=ax.transAxes)

g.map_dataframe(annotate)
../_images/source_pMHC_movement_based_on_peptide_anchoring_22_0.png
[16]:
max_rmsd = peptide_apo_holo_comparison_with_anchor['rmsd'].max()

for mhc_slug, group in peptide_apo_holo_comparison_with_anchor.groupby('mhc_slug'):
    print(mhc_slug)
    print(group['dominant_anchor'].unique()[0])
    plot = sns.barplot(group, x='residue_seq_id', y='rmsd', color='salmon')
    plot.set_ylim((0, np.ceil(max_rmsd)))
    plt.show()
hla_a_02_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_1.png
hla_a_02_06
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_3.png
hla_a_03_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_5.png
hla_a_24_02
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_7.png
hla_b_08_01
p2-p5-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_9.png
hla_b_35_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_11.png
hla_b_37_01
p2-p5-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_13.png
hla_b_42_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_15.png
hla_b_44_05
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_17.png
hla_b_53_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_19.png
hla_b_81_01
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_21.png
hla_c_08_02
p2-p3
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_23.png
hla_e_01_03
p2-p9
../_images/source_pMHC_movement_based_on_peptide_anchoring_23_25.png
[17]:
sns.barplot(peptide_apo_holo_comparison_with_anchor,
            hue='dominant_anchor',
            x='residue_seq_id', y='rmsd')
[17]:
<AxesSubplot: xlabel='residue_seq_id', ylabel='rmsd'>
../_images/source_pMHC_movement_based_on_peptide_anchoring_24_1.png
[18]:
representative_alleles = (peptide_apo_holo_comparison_with_anchor.groupby('dominant_anchor')[['dominant_anchor', 'mhc_slug']]
                                                                 .sample(1, random_state=1)
                                                                 .reset_index(drop=True))
representative_alleles
[18]:
dominant_anchor mhc_slug
0 p2-p3 hla_c_08_02
1 p2-p5-p9 hla_b_08_01
2 p2-p9 hla_a_02_01
[19]:
sns.barplot(peptide_apo_holo_comparison_with_anchor.query("mhc_slug.isin(@representative_alleles['mhc_slug'])"),
            hue='mhc_slug',
            x='residue_seq_id', y='rmsd')
[19]:
<AxesSubplot: xlabel='residue_seq_id', ylabel='rmsd'>
../_images/source_pMHC_movement_based_on_peptide_anchoring_26_1.png

Conclusion

From the visualisations it is clear that the profile of peptide conformational changes depends on the mhc allele and how the peptides are anchored by the allele.