Length dependency of conformational change

Introduction

In this notebook, we set out to determine if there was a correlation between conformational change and the length of CDR loops or peptides. We looked at both the correlation of bulk movements when loops where aligned on the framework regions and also the deformation effects when loops are aligned to one another. All peptide conformations are measure from alignment on the antigen binding groove floor.

[1]:
import os

import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import scipy
import seaborn as sns

Loading Meta data

[2]:
DATA_DIR = '../data/processed/apo-holo-tcr-pmhc-class-I-comparisons'
[3]:
apo_holo_summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')

apo_holo_summary_df['id'] = apo_holo_summary_df['file_name'].str.replace('.pdb$', '', regex=True)
[4]:
cdr_types = ['CDR-A1', 'CDR-A2', 'CDR-A3','CDR-B1', 'CDR-B2', 'CDR-B3']
apo_holo_summary_df[cdr_types] = apo_holo_summary_df['cdr_sequences_collated'].str.split('-').apply(pd.Series)
[5]:
apo_holo_summary_df
[5]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug id CDR-A1 CDR-A2 CDR-A3 CDR-B1 CDR-B2 CDR-B3
0 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01 1ao7_D-E-C-A-B_tcr_pmhc DRGSQS IYSNGD AVTTDSWGKLQ MNHEY SVGAGI ASRPGLAGGRPEQY
1 1b0g_C-A-B_pmhc.pdb 1b0g pmhc apo NaN NaN C A B NaN ALWGFFPVL hla_a_02_01 1b0g_C-A-B_pmhc NaN NaN NaN NaN NaN NaN
2 1b0g_F-D-E_pmhc.pdb 1b0g pmhc apo NaN NaN F D E NaN ALWGFFPVL hla_a_02_01 1b0g_F-D-E_pmhc NaN NaN NaN NaN NaN NaN
3 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01 1bd2_D-E-C-A-B_tcr_pmhc NSMFDY ISSIKDK AAMEGAQKLV MNHEY SVGAGI ASSYPGGGFYEQY
4 1bii_P-A-B_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd 1bii_P-A-B_pmhc NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
386 7rtd_C-A-B_pmhc.pdb 7rtd pmhc apo NaN NaN C A B NaN YLQPRTFLL hla_a_02_01 7rtd_C-A-B_pmhc NaN NaN NaN NaN NaN NaN
387 7rtr_D-E-C-A-B_tcr_pmhc.pdb 7rtr tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS IYSNGD AVNRDDKII SEHNR FQNEAQ ASSPDIEQY
388 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02 8gvb_A-B-P-H-L_tcr_pmhc YGATPY YFSGDTLV AVGFTGGGNKLT SEHNR FQNEAQ ASSDRDRVPETQY
389 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02 8gvg_A-B-P-H-L_tcr_pmhc YGATPY YFSGDTLV AVGFTGGGNKLT SEHNR FQNEAQ ASSDRDRVPETQY
390 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02 8gvi_A-B-P-H-L_tcr_pmhc YGATPY YFSGDTLV AVVFTGGGNKLT SEHNR FQNEAQ ASSLRDRVPETQY

391 rows × 19 columns

TCR Analysis

Load Data

[6]:
results_tcr_fw_align = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_results.csv'))
results_tcr_fw_align['alignment'] = 'framework'
[7]:
results_tcr_loop_align = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_results.csv'))
results_tcr_loop_align['alignment'] = 'loop'
[8]:
results_tcr = pd.concat([results_tcr_fw_align, results_tcr_loop_align])
results_tcr
[8]:
complex_id structure_x_name structure_y_name chain_type cdr rmsd alignment
0 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 1.932806 framework
1 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 2 1.308598 framework
2 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 3 1.244062 framework
3 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb beta_chain 1 0.809066 framework
4 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb beta_chain 2 0.688597 framework
... ... ... ... ... ... ... ...
809 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb alpha_chain 2 0.206414 loop
810 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb alpha_chain 3 0.459090 loop
811 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 1 0.255668 loop
812 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 2 0.175123 loop
813 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 0.223801 loop

1628 rows × 7 columns

Merge with metadata

[9]:
apo_holo_summary_cdrs_df = apo_holo_summary_df.melt(
    id_vars=[col for col in apo_holo_summary_df.columns if col not in cdr_types],
    var_name='cdr_type',
    value_name='cdr_sequence',
    value_vars=cdr_types,
)
[10]:
apo_holo_summary_cdrs_df[['chain_type', 'cdr']] = apo_holo_summary_cdrs_df['cdr_type'].map(
    lambda cdr_type: ('alpha_chain' if cdr_type[-2] == 'A' else 'beta_chain', int(cdr_type[-1]))
).apply(pd.Series)
[11]:
apo_holo_summary_cdrs_df['cdr_length'] = apo_holo_summary_cdrs_df['cdr_sequence'].str.len()
[12]:
apo_holo_summary_cdrs_df
[12]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug id cdr_type cdr_sequence chain_type cdr cdr_length
0 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01 1ao7_D-E-C-A-B_tcr_pmhc CDR-A1 DRGSQS alpha_chain 1 6.0
1 1b0g_C-A-B_pmhc.pdb 1b0g pmhc apo NaN NaN C A B NaN ALWGFFPVL hla_a_02_01 1b0g_C-A-B_pmhc CDR-A1 NaN alpha_chain 1 NaN
2 1b0g_F-D-E_pmhc.pdb 1b0g pmhc apo NaN NaN F D E NaN ALWGFFPVL hla_a_02_01 1b0g_F-D-E_pmhc CDR-A1 NaN alpha_chain 1 NaN
3 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01 1bd2_D-E-C-A-B_tcr_pmhc CDR-A1 NSMFDY alpha_chain 1 6.0
4 1bii_P-A-B_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd 1bii_P-A-B_pmhc CDR-A1 NaN alpha_chain 1 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2341 7rtd_C-A-B_pmhc.pdb 7rtd pmhc apo NaN NaN C A B NaN YLQPRTFLL hla_a_02_01 7rtd_C-A-B_pmhc CDR-B3 NaN beta_chain 3 NaN
2342 7rtr_D-E-C-A-B_tcr_pmhc.pdb 7rtr tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 7rtr_D-E-C-A-B_tcr_pmhc CDR-B3 ASSPDIEQY beta_chain 3 9.0
2343 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02 8gvb_A-B-P-H-L_tcr_pmhc CDR-B3 ASSDRDRVPETQY beta_chain 3 13.0
2344 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02 8gvg_A-B-P-H-L_tcr_pmhc CDR-B3 ASSDRDRVPETQY beta_chain 3 13.0
2345 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02 8gvi_A-B-P-H-L_tcr_pmhc CDR-B3 ASSLRDRVPETQY beta_chain 3 13.0

2346 rows × 18 columns

[13]:
results_tcr = results_tcr.merge(
    apo_holo_summary_cdrs_df[['file_name', 'pdb_id', 'structure_type', 'state']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    apo_holo_summary_cdrs_df[['file_name', 'pdb_id', 'structure_type', 'state']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
).merge(
    apo_holo_summary_cdrs_df[['id',
                              'cdr_sequences_collated',
                              'peptide_sequence',
                              'mhc_slug',
                              'cdr_sequence',
                              'cdr_length',
                              'cdr_type',
                              'chain_type',
                              'cdr']],
    how='left',
    left_on=['complex_id', 'chain_type', 'cdr'],
    right_on=['id', 'chain_type', 'cdr'],
)

Normalise data

[14]:
results_tcr['comparison'] = results_tcr['state_x'] + '-' + results_tcr['state_y']
results_tcr['comparison'] = results_tcr['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
results_tcr = results_tcr.query("comparison == 'apo-holo'").reset_index(drop=True)
[15]:
results_tcr['structure_comparison'] = results_tcr.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_tcr = results_tcr.drop_duplicates(['structure_comparison', 'chain_type', 'cdr', 'alignment'])
[16]:
results_tcr = results_tcr.groupby(['cdr_sequences_collated',
                                   'alignment',
                                   'comparison',
                                   'cdr_length',
                                   'cdr_sequence',
                                   'cdr_type',
                                   'chain_type',
                                   'cdr'])['rmsd'].mean().reset_index()

Visualise Results

Alignment on Framework region

[17]:
data = results_tcr.query("alignment == 'framework'")

r, p_val = scipy.stats.pearsonr(data['cdr_length'], data['rmsd'])

sns.lmplot(data.sort_values('cdr_type'), x='cdr_length', y='rmsd', scatter=False)
sns.scatterplot(data.sort_values('cdr_type'), x='cdr_length', y='rmsd', hue='cdr_type')

plt.text(4, 8, f'$r^2$ = {r**2: .2f}, p-value = {p_val: .2e}')

plt.show()
../_images/source_Length_dependency_of_conformational_change_25_0.png

Alignment on loops

[18]:
data = results_tcr.query("alignment == 'loop'")

r, p_val = scipy.stats.pearsonr(data['cdr_length'], data['rmsd'])

sns.lmplot(data.sort_values('cdr_type'), x='cdr_length', y='rmsd', scatter=False)
sns.scatterplot(data.sort_values('cdr_type'), x='cdr_length', y='rmsd', hue='cdr_type')

plt.text(7, 2.5, f'$r^2$ = {r**2: .2f}, p-value = {p_val: .2e}')

plt.show()
../_images/source_Length_dependency_of_conformational_change_27_0.png

TCR apex calculations

[19]:
results_per_res_tcr = pd.read_csv(os.path.join(DATA_DIR, 'tcr_per_res_apo_holo_loop_align.csv'))
results_per_res_tcr
[19]:
complex_id structure_x_name structure_y_name chain_type cdr residue_name residue_seq_id residue_insert_code rmsd ca_distance chi_angle_change com_distance
0 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ASP 27 NaN 4.922807 2.215234 -1.001709 3.836500
1 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ARG 28 NaN 7.683418 2.322292 -1.010462 6.119157
2 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLY 29 NaN 0.657793 0.718576 NaN 0.452200
3 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 SER 36 NaN 1.224430 0.404912 -2.505061 0.866544
4 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLN 37 NaN 1.133408 0.467132 0.667185 0.798590
... ... ... ... ... ... ... ... ... ... ... ... ...
6228 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ASP 109 NaN 0.507077 0.180564 0.198956 0.208659
6229 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ILE 114 NaN 2.164965 0.158682 3.679442 0.829175
6230 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLU 115 NaN 1.577728 0.195887 3.091039 0.995351
6231 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLN 116 NaN 0.204783 0.197683 -0.014126 0.166472
6232 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 TYR 117 NaN 0.180118 0.164981 -0.024244 0.170852

6233 rows × 12 columns

[20]:
results_per_res_tcr = results_per_res_tcr.merge(
    apo_holo_summary_cdrs_df[['file_name', 'pdb_id', 'structure_type', 'state']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    apo_holo_summary_cdrs_df[['file_name', 'pdb_id', 'structure_type', 'state']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
).merge(
    apo_holo_summary_cdrs_df[['id',
                              'cdr_sequences_collated',
                              'peptide_sequence',
                              'mhc_slug',
                              'cdr_sequence',
                              'cdr_length',
                              'cdr_type',
                              'chain_type',
                              'cdr']],
    how='left',
    left_on=['complex_id', 'chain_type', 'cdr'],
    right_on=['id', 'chain_type', 'cdr'],
)
[21]:
results_per_res_tcr['comparison'] = results_per_res_tcr['state_x'] + '-' + results_per_res_tcr['state_y']
results_per_res_tcr['comparison'] = results_per_res_tcr['comparison'].map(
    lambda entry: 'apo-holo' if entry == 'holo-apo' else entry
)
results_per_res_tcr = results_per_res_tcr.query("comparison == 'apo-holo'").reset_index(drop=True)
[22]:
results_per_res_tcr['structure_comparison'] = results_per_res_tcr.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_per_res_tcr = results_per_res_tcr.drop_duplicates(['structure_comparison', 'chain_type', 'cdr',
                                                           'residue_name', 'residue_seq_id', 'residue_insert_code'])
[23]:
results_per_res_tcr
[23]:
complex_id structure_x_name structure_y_name chain_type cdr residue_name residue_seq_id residue_insert_code rmsd ca_distance ... state_y id cdr_sequences_collated peptide_sequence mhc_slug cdr_sequence cdr_length cdr_type comparison structure_comparison
0 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ASP 27 NaN 4.922807 2.215234 ... apo 3qdg_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNFGGGKLI-MRHNA-SNTAGT-ASSLSFGTEAF ELAGIGILTV hla_a_02_01 DRGSQS 6.0 CDR-A1 apo-holo 3qdg_D-E-C-A-B_tcr_pmhc.pdb-3qeu_A-B_tcr.pdb
36 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 ARG 28 NaN 7.683418 2.322292 ... apo 3qdg_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNFGGGKLI-MRHNA-SNTAGT-ASSLSFGTEAF ELAGIGILTV hla_a_02_01 DRGSQS 6.0 CDR-A1 apo-holo 3qdg_D-E-C-A-B_tcr_pmhc.pdb-3qeu_A-B_tcr.pdb
72 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLY 29 NaN 0.657793 0.718576 ... apo 3qdg_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNFGGGKLI-MRHNA-SNTAGT-ASSLSFGTEAF ELAGIGILTV hla_a_02_01 DRGSQS 6.0 CDR-A1 apo-holo 3qdg_D-E-C-A-B_tcr_pmhc.pdb-3qeu_A-B_tcr.pdb
108 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 SER 36 NaN 1.224430 0.404912 ... apo 3qdg_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNFGGGKLI-MRHNA-SNTAGT-ASSLSFGTEAF ELAGIGILTV hla_a_02_01 DRGSQS 6.0 CDR-A1 apo-holo 3qdg_D-E-C-A-B_tcr_pmhc.pdb-3qeu_A-B_tcr.pdb
144 3qdg_D-E-C-A-B_tcr_pmhc 3qdg_D-E-C-A-B_tcr_pmhc.pdb 3qeu_A-B_tcr.pdb alpha_chain 1 GLN 37 NaN 1.133408 0.467132 ... apo 3qdg_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNFGGGKLI-MRHNA-SNTAGT-ASSLSFGTEAF ELAGIGILTV hla_a_02_01 DRGSQS 6.0 CDR-A1 apo-holo 3qdg_D-E-C-A-B_tcr_pmhc.pdb-3qeu_A-B_tcr.pdb
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
158220 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ASP 109 NaN 0.507077 0.180564 ... holo 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 ASSPDIEQY 9.0 CDR-B3 apo-holo 7n1d_A-B_tcr.pdb-7rtr_D-E-C-A-B_tcr_pmhc.pdb
158256 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 ILE 114 NaN 2.164965 0.158682 ... holo 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 ASSPDIEQY 9.0 CDR-B3 apo-holo 7n1d_A-B_tcr.pdb-7rtr_D-E-C-A-B_tcr_pmhc.pdb
158292 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLU 115 NaN 1.577728 0.195887 ... holo 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 ASSPDIEQY 9.0 CDR-B3 apo-holo 7n1d_A-B_tcr.pdb-7rtr_D-E-C-A-B_tcr_pmhc.pdb
158328 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 GLN 116 NaN 0.204783 0.197683 ... holo 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 ASSPDIEQY 9.0 CDR-B3 apo-holo 7n1d_A-B_tcr.pdb-7rtr_D-E-C-A-B_tcr_pmhc.pdb
158364 7rtr_D-E-C-A-B_tcr_pmhc 7n1d_A-B_tcr.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb beta_chain 3 TYR 117 NaN 0.180118 0.164981 ... holo 7rtr_D-E-C-A-B_tcr_pmhc DRGSQS-IYSNGD-AVNRDDKII-SEHNR-FQNEAQ-ASSPDIEQY YLQPRTFLL hla_a_02_01 ASSPDIEQY 9.0 CDR-B3 apo-holo 7n1d_A-B_tcr.pdb-7rtr_D-E-C-A-B_tcr_pmhc.pdb

4400 rows × 29 columns

[24]:
def compute_apex_movement(loop: pd.DataFrame) -> float:
    loop_len = len(loop)

    if loop_len % 2 == 0:
        return np.mean([loop.iloc[loop_len // 2]['ca_distance'], loop.iloc[(loop_len // 2) - 1]['ca_distance']])

    return loop.iloc[loop_len // 2]['ca_distance']
[25]:
results_tcr_apex = results_per_res_tcr.groupby(['structure_x_name',
                                                'structure_y_name',
                                                'cdr_type',
                                                'cdr',
                                                'chain_type',
                                                'cdr_length']).apply(compute_apex_movement)
results_tcr_apex.name = 'apex_ca_distance'
results_tcr_apex = results_tcr_apex.reset_index()
[26]:
results_tcr_apex
[26]:
structure_x_name structure_y_name cdr_type cdr chain_type cdr_length apex_ca_distance
0 1ao7_D-E-C-A-B_tcr_pmhc.pdb 3qh3_A-B_tcr.pdb CDR-A1 1 alpha_chain 6.0 0.153049
1 1ao7_D-E-C-A-B_tcr_pmhc.pdb 3qh3_A-B_tcr.pdb CDR-A2 2 alpha_chain 6.0 0.447122
2 1ao7_D-E-C-A-B_tcr_pmhc.pdb 3qh3_A-B_tcr.pdb CDR-A3 3 alpha_chain 11.0 2.461146
3 1ao7_D-E-C-A-B_tcr_pmhc.pdb 3qh3_A-B_tcr.pdb CDR-B1 1 beta_chain 5.0 0.319232
4 1ao7_D-E-C-A-B_tcr_pmhc.pdb 3qh3_A-B_tcr.pdb CDR-B2 2 beta_chain 6.0 0.167444
... ... ... ... ... ... ... ...
569 7r7z_A-B_tcr.pdb 7r80_A-B-E-C-D_tcr_pmhc.pdb CDR-A2 2 alpha_chain 7.0 0.173771
570 7r7z_A-B_tcr.pdb 7r80_A-B-E-C-D_tcr_pmhc.pdb CDR-A3 3 alpha_chain 11.0 0.508615
571 7r7z_A-B_tcr.pdb 7r80_A-B-E-C-D_tcr_pmhc.pdb CDR-B1 1 beta_chain 5.0 0.108211
572 7r7z_A-B_tcr.pdb 7r80_A-B-E-C-D_tcr_pmhc.pdb CDR-B2 2 beta_chain 6.0 0.219891
573 7r7z_A-B_tcr.pdb 7r80_A-B-E-C-D_tcr_pmhc.pdb CDR-B3 3 beta_chain 13.0 0.789117

574 rows × 7 columns

[27]:
r, p_val = scipy.stats.pearsonr(results_tcr_apex['cdr_length'], results_tcr_apex['apex_ca_distance'])

sns.lmplot(results_tcr_apex.sort_values('cdr_type'), x='cdr_length', y='apex_ca_distance', scatter=False)
sns.scatterplot(results_tcr_apex.sort_values('cdr_type'), x='cdr_length', y='apex_ca_distance', hue='cdr_type')

plt.text(6.75, 4, f'$r^2$ = {r**2: .2f}, p-value = {p_val: .2e}')

plt.show()
../_images/source_Length_dependency_of_conformational_change_37_0.png

Peptide Analysis

Load data

[28]:
results_peptide = pd.read_csv(os.path.join(DATA_DIR, 'pmhc_tcr_contact_apo_holo.csv'))

results_peptide = results_peptide.query("chain_type == 'antigen_chain'")
results_peptide = results_peptide.drop(columns=['chain_type', 'tcr_contact'])

results_peptide
[28]:
complex_id structure_x_name structure_y_name rmsd
2 5c0a_D-E-C-A-B_tcr_pmhc 5c0a_D-E-C-A-B_tcr_pmhc.pdb 5n1y_C-A-B_pmhc.pdb 0.448858
5 5wlg_D-E-C-A-B_tcr_pmhc 5wlg_D-E-C-A-B_tcr_pmhc.pdb 5wli_C-A-B_pmhc.pdb 0.498148
8 5wlg_D-E-C-A-B_tcr_pmhc 5wlg_D-E-C-A-B_tcr_pmhc.pdb 5wli_F-D-E_pmhc.pdb 0.519507
11 5wlg_D-E-C-A-B_tcr_pmhc 5wlg_D-E-C-A-B_tcr_pmhc.pdb 5wli_I-G-H_pmhc.pdb 0.459025
14 5wlg_D-E-C-A-B_tcr_pmhc 5wlg_D-E-C-A-B_tcr_pmhc.pdb 5wli_L-J-K_pmhc.pdb 0.494705
... ... ... ... ...
3301 7rtr_D-E-C-A-B_tcr_pmhc 7n6d_O-M-N_pmhc.pdb 7rtd_C-A-B_pmhc.pdb 0.486296
3304 7rtr_D-E-C-A-B_tcr_pmhc 7n6d_O-M-N_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb 0.448905
3307 7rtr_D-E-C-A-B_tcr_pmhc 7p3d_C-A-B_pmhc.pdb 7rtd_C-A-B_pmhc.pdb 0.773692
3310 7rtr_D-E-C-A-B_tcr_pmhc 7p3d_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb 0.765295
3313 7rtr_D-E-C-A-B_tcr_pmhc 7rtd_C-A-B_pmhc.pdb 7rtr_D-E-C-A-B_tcr_pmhc.pdb 0.455078

1096 rows × 4 columns

Merge with metadata

[29]:
apo_holo_summary_df['peptide_length'] = apo_holo_summary_df['peptide_sequence'].str.len()
[30]:
results_peptide = results_peptide.merge(
    apo_holo_summary_df[['file_name', 'pdb_id', 'state']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    apo_holo_summary_df[['file_name', 'pdb_id', 'state']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
).merge(
    apo_holo_summary_df[['id', 'peptide_sequence', 'peptide_length', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_on='id',
)

Normalise data

[31]:
results_peptide['comparison'] = results_peptide['state_x'] + '-' + results_peptide['state_y']
results_peptide['comparison'] = results_peptide['comparison'].map(
    lambda entry: 'apo-holo' if entry == 'holo-apo' else entry,
)
results_peptide = results_peptide.query("comparison == 'apo-holo'").reset_index(drop=True)
[32]:
results_peptide['structure_comparison'] = results_peptide.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_peptide = results_peptide.drop_duplicates('structure_comparison')
[33]:
results_peptide = results_peptide.groupby(['peptide_sequence',
                                           'comparison',
                                           'peptide_length',
                                           'mhc_slug'])['rmsd'].mean().reset_index()
[34]:
results_peptide
[34]:
peptide_sequence comparison peptide_length mhc_slug rmsd
0 AAGIGILTV apo-holo 9.0 hla_a_02_01 1.025622
1 ALGIGILTV apo-holo 9.0 hla_a_02_01 1.274516
2 ALWGFFPVL apo-holo 9.0 hla_a_02_01 0.351748
3 ALWGPDPAAA apo-holo 10.0 hla_a_02_01 0.703094
4 APRGPHGGAASGL apo-holo 13.0 hla_b_07_02 3.523593
... ... ... ... ... ...
75 VVVGAGGVGK apo-holo 10.0 hla_a_11_01 1.483028
76 YGFRNVVHI apo-holo 9.0 h2_db 0.275006
77 YLGGPDFPTI apo-holo 10.0 hla_a_02_01 0.868938
78 YLQPRTFLL apo-holo 9.0 hla_a_02_01 0.584577
79 YQFGPDFPIA apo-holo 10.0 hla_a_02_01 0.408370

80 rows × 5 columns

Visualise results

[35]:
r, p_val = scipy.stats.pearsonr(results_peptide['peptide_length'], results_peptide['rmsd'])

sns.lmplot(results_peptide.sort_values('mhc_slug'), x='peptide_length', y='rmsd', scatter=False)
ax = sns.scatterplot(results_peptide.sort_values('mhc_slug'), x='peptide_length', y='rmsd', hue='mhc_slug')
sns.move_legend(ax, "upper left", bbox_to_anchor=(1, 1))

plt.text(8.5, 4, f'$r^2$ = {r**2: .2f}, p-value = {p_val: .2e}')

plt.show()
../_images/source_Length_dependency_of_conformational_change_50_0.png

Conclusion

Both TCR CDR loops and peptides show a correlation between length and amount of conformational change. For the TCRs, there is a correlation in both paradigms, framework alignment and loop alignment, but the correlation is much stronger looking at the loop alignments. The increased correlation for loop alignments make sense since the conformational changes from the framework regions can be driven by other parts of the protein, but when the loops are aligned together, the only differences can be driven by changes in the loops themselves, implying the loop length has more of an effect.