Per residue changes of TCR CDRs between apo and holo structures

Introduction

In this notebook, we look if there are certain residue positions that move more than others in the TCR CDR loops. We use these visualisations to further characterise the movement of CDR loops as either rigid-body, where the entire loop moves as one unit, or plastic, where the loop can mold to its target.

[1]:
import os

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
[2]:
from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('svg')
[3]:
DATA_DIR = '../data/processed/apo-holo-tcr-pmhc-class-I-comparisons'

Loading Data

[4]:
results = pd.read_csv(os.path.join(DATA_DIR, 'tcr_per_res_apo_holo_loop_align.csv'))
results
[4]:
complex_id structure_x_name structure_y_name chain_type cdr residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change com_distance
0 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ASP 27 NaN 4.922807 2.215234 -1.001709 3.836500
1 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ARG 28 NaN 7.683418 2.322292 -1.010462 6.119157
2 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLY 29 NaN 0.657793 0.718576 NaN 0.452200
3 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 SER 36 NaN 1.224430 0.404912 -2.505061 0.866544
4 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLN 37 NaN 1.133408 0.467132 0.667185 0.798590
... ... ... ... ... ... ... ... ... ... ... ... ...
12051 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ASP 109 NaN 0.507077 0.180564 0.198956 0.208659
12052 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ILE 114 NaN 2.164965 0.158682 3.679442 0.829175
12053 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLU 115 NaN 1.577728 0.195887 3.091039 0.995351
12054 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLN 116 NaN 0.204783 0.197683 -0.014126 0.166472
12055 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 TYR 117 NaN 0.180118 0.164981 -0.024244 0.170852

12056 rows × 12 columns

[5]:
summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')

ids = summary_df['file_name'].str.replace('.pdb', '')
ids.name = 'id'
summary_df = summary_df.set_index(ids)

summary_df
/var/scratch/bmcmaste/1760251/ipykernel_1034960/322943147.py:3: FutureWarning: The default value of regex will change from True to False in a future version.
  ids = summary_df['file_name'].str.replace('.pdb', '')
[5]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug
id
1ao7_D-E-C-A-B_tcr_pmhc 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01
1bd2_D-E-C-A-B_tcr_pmhc 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01
1bii_A-B-P_pmhc 1bii_A-B-P_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd
1ddh_A-B-P_pmhc 1ddh_A-B-P_pmhc.pdb 1ddh pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd
1duz_A-B-C_pmhc 1duz_A-B-C_pmhc.pdb 1duz pmhc apo NaN NaN C A B NaN LLFGYPVYV hla_a_02_01
... ... ... ... ... ... ... ... ... ... ... ... ...
8gon_D-E-C-A-B_tcr_pmhc 8gon_D-E-C-A-B_tcr_pmhc.pdb 8gon tcr_pmhc holo D E C A B TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN
8gop_A-B_tcr 8gop_A-B_tcr.pdb 8gop tcr apo A B NaN NaN NaN TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN
8gvb_A-B-P-H-L_tcr_pmhc 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02
8gvg_A-B-P-H-L_tcr_pmhc 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02
8gvi_A-B-P-H-L_tcr_pmhc 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02

358 rows × 12 columns

[6]:
results = results.merge(
    summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_index=True,
)
[7]:
results = results.merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
    suffixes=('_x', '_y')
)
[8]:
results['comparison'] = results['state_x'] + '-' + results['state_y']
results['comparison'] = results['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
[9]:
results['structure_comparison'] = results.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results = results.drop_duplicates(['structure_comparison',
                                   'chain_type',
                                   'cdr',
                                   'residue_name',
                                   'residue_seq_id',
                                   'residue_insert_code'])
[10]:
results = results.groupby(['cdr_sequences_collated',
                           'comparison',
                           'chain_type',
                           'cdr',
                           'residue_name',
                           'residue_seq_id',
                           'residue_insert_code'], dropna=False)[['ca_distance', 'rmsd', 'chi_angle_change', 'com_distance']].mean().reset_index()
[11]:
results['resi'] = results['residue_seq_id'].apply(str) + results['residue_insert_code'].fillna('')
[12]:
results = results.query("comparison == 'apo-holo'")
[13]:
results
[13]:
cdr_sequences_collated comparison chain_type cdr residue_name residue_seq_id residue_insert_code ca_distance rmsd chi_angle_change com_distance resi
0 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo alpha_chain 1 ALA 27 NaN 0.508006 0.739782 NaN 0.528628 27
1 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo alpha_chain 1 GLY 29 NaN 0.881840 0.898233 NaN 0.816910 29
2 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo alpha_chain 1 PRO 37 NaN 0.572616 0.749906 -0.030074 0.541702 37
3 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo alpha_chain 1 SER 38 NaN 0.502938 0.630893 -0.117107 0.568768 38
4 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo alpha_chain 1 THR 28 NaN 0.499294 1.930195 3.759095 0.572771 28
... ... ... ... ... ... ... ... ... ... ... ... ...
1629 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo beta_chain 3 LEU 108 NaN 1.022477 1.809773 -1.412711 1.135954 108
1630 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo beta_chain 3 SER 106 NaN 1.676174 1.573190 -0.192425 1.510255 106
1631 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo beta_chain 3 SER 107 NaN 1.091206 1.232135 2.393467 0.895901 107
1632 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo beta_chain 3 THR 115 NaN 1.236042 1.347615 -0.280349 1.218896 115
1633 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo beta_chain 3 TYR 117 NaN 1.102913 1.209585 -0.137036 0.922437 117

1132 rows × 12 columns

Visualising Results

Ca movement

[14]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='ca_distance',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_18_0.svg

Residue RMSD difference

[15]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='rmsd',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_20_0.svg

Centre of Mass Changes

[16]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='com_distance',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_22_0.svg

Comparing Residue Identity to movment

[17]:
aa_types = results['residue_name'].map({
    'ARG': 'Charged',
    'HIS': 'Charged',
    'LYS': 'Charged',
    'ASP': 'Charged',
    'GLU': 'Charged',
    'SER': 'Polar',
    'THR': 'Polar',
    'ASN': 'Polar',
    'GLN': 'Polar',
    'CYS': 'Special',
    'GLY': 'Special',
    'PRO': 'Special',
    'ALA': 'Hydrophobic',
    'VAL': 'Hydrophobic',
    'ILE': 'Hydrophobic',
    'LEU': 'Hydrophobic',
    'MET': 'Hydrophobic',
    'PHE': 'Hydrophobic',
    'TYR': 'Hydrophobic',
    'TRP': 'Hydrophobic',
})

colour = aa_types.map({
    'Charged': 'lightblue',
    'Polar': 'orange',
    'Special': 'yellow',
    'Hydrophobic': 'pink',
})
[18]:
aa_means = results.groupby('residue_name')['rmsd'].mean().to_dict()
results['aa_mean'] = results['residue_name'].map(aa_means)
[19]:
sns.barplot(results.sort_values('aa_mean', ascending=False), y='rmsd', x='residue_name', palette=colour)
plt.xticks(rotation=-45)
print()

../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_26_1.svg
[20]:
num_side_chain_heavy_atoms = results['residue_name'].map({
    'ARG': 7,
    'HIS': 7,
    'LYS': 5,
    'ASP': 4,
    'GLU': 5,
    'SER': 2,
    'THR': 3,
    'ASN': 4,
    'GLN': 5,
    'CYS': 2,
    'GLY': 0,
    'PRO': 3,
    'ALA': 1,
    'VAL': 3,
    'ILE': 4,
    'LEU': 4,
    'MET': 4,
    'PHE': 7,
    'TYR': 8,
    'TRP': 10,
})
[21]:
sns.scatterplot(x=num_side_chain_heavy_atoms, y=results['rmsd'], hue=aa_types)
plt.xlabel('number of heavy atoms in side chain')
plt.show()
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_28_0.svg

Measuring differences in \(\chi\)-angles

Do the side chains change conformation as well?

[22]:
results['chi_angle_change_mag'] = results['chi_angle_change'].apply(np.abs)
results['chi_angle_change_deg_mag'] = results['chi_angle_change_mag'].apply(np.degrees)
[23]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='chi_angle_change',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1560: RuntimeWarning: All-NaN slice encountered
  r, k = function_base._ureduce(a,
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
[23]:
<seaborn.axisgrid.FacetGrid at 0x7fb9bd787c40>
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_31_2.svg
[24]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='chi_angle_change_mag',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1560: RuntimeWarning: All-NaN slice encountered
  r, k = function_base._ureduce(a,
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
[24]:
<seaborn.axisgrid.FacetGrid at 0x7fb9bd018bb0>
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_32_2.svg
[25]:
sns.catplot(results.sort_values('resi'),
            x='resi', y='chi_angle_change_deg_mag',
            row='chain_type', col='cdr',
            color='salmon',
            sharex=False,
            kind='bar')
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/numpy/lib/nanfunctions.py:1560: RuntimeWarning: All-NaN slice encountered
  r, k = function_base._ureduce(a,
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
/home/b/bmcmaste/.local/lib/python3.10/site-packages/seaborn/algorithms.py:98: RuntimeWarning: Mean of empty slice
  boot_dist.append(f(*sample, **func_kwargs))
../_images/source_Per_residue_changes_of_TCR_CDR_between_apo_and_holo_structures_33_1.svg

Conclusion

From these results, it seems that the CDR-3 loops undergo plastic deformation but the CDR-1 and CDR-2 loops act as rigid bodies when contacting MHC molecules. This was determined by the flat profiles of the CDR-1 and -2 loops and the peaked, normal-like, distributions of CDR-3 loops (both CDR-A3 and CDR-B3).