Comparison of apo and holo CDR loops

Introduction

The following notebook aims to quantify the movement of CDR domains in TCR variable regions between unbound (apo) and bound (holo) conformations. The holo conforomations in this case are TCRs bound to class I pMHC complexes.

[1]:
import os
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
[2]:
DATA_DIR = '../data/processed/apo-holo-tcr-pmhc-class-I-comparisons'
[3]:
NOISE_LEVEL = 0.5 # Å
[4]:
summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')

ids = summary_df['file_name'].str.replace('.pdb', '', regex=True)
ids.name = 'id'
summary_df = summary_df.set_index(ids)

summary_df
[4]:
file_name pdb_id structure_type state alpha_chain beta_chain antigen_chain mhc_chain1 mhc_chain2 cdr_sequences_collated peptide_sequence mhc_slug
id
1ao7_D-E-C-A-B_tcr_pmhc 1ao7_D-E-C-A-B_tcr_pmhc.pdb 1ao7 tcr_pmhc holo D E C A B DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... LLFGYPVYV hla_a_02_01
1bd2_D-E-C-A-B_tcr_pmhc 1bd2_D-E-C-A-B_tcr_pmhc.pdb 1bd2 tcr_pmhc holo D E C A B NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... LLFGYPVYV hla_a_02_01
1bii_A-B-P_pmhc 1bii_A-B-P_pmhc.pdb 1bii pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd
1ddh_A-B-P_pmhc 1ddh_A-B-P_pmhc.pdb 1ddh pmhc apo NaN NaN P A B NaN RGPGRAFVTI h2_dd
1duz_A-B-C_pmhc 1duz_A-B-C_pmhc.pdb 1duz pmhc apo NaN NaN C A B NaN LLFGYPVYV hla_a_02_01
... ... ... ... ... ... ... ... ... ... ... ... ...
8gon_D-E-C-A-B_tcr_pmhc 8gon_D-E-C-A-B_tcr_pmhc.pdb 8gon tcr_pmhc holo D E C A B TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN
8gop_A-B_tcr 8gop_A-B_tcr.pdb 8gop tcr apo A B NaN NaN NaN TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... NaN NaN
8gvb_A-B-P-H-L_tcr_pmhc 8gvb_A-B-P-H-L_tcr_pmhc.pdb 8gvb tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RYPLTFGW hla_a_24_02
8gvg_A-B-P-H-L_tcr_pmhc 8gvg_A-B-P-H-L_tcr_pmhc.pdb 8gvg tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... RFPLTFGW hla_a_24_02
8gvi_A-B-P-H-L_tcr_pmhc 8gvi_A-B-P-H-L_tcr_pmhc.pdb 8gvi tcr_pmhc holo A B P H L YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... RYPLTFGW hla_a_24_02

358 rows × 12 columns

Calculating RMSD between conformations

RMSD is used to compute the difference between all states of the same TCR. Comparions between the different apo forms and the differenceholo forms are also done so that these can be used as a control for the differences in apo to holo comparisons. This process is done two different ways. The first, termed framework alignment, computes the comparison between TCRs after aligning on the framework region of the proteins. This method captures the “global” conformational changes that these loops undergo including any combined rotations and deformations. The second method, termed loop alignment, does the same calculations as the first method, but after aligning the two loops being compared on their backbones. This method captures the deformation aspects of loops regardless of any hinging from the framework region.

[5]:
def categorize_movement(rmsd: float) -> str:
    if rmsd < NOISE_LEVEL:
        return f'Little Movement (<{NOISE_LEVEL} Å)'

    if NOISE_LEVEL <= rmsd < 1.0:
        return f'Some Movement ({NOISE_LEVEL} to 1.0 Å)'

    if 1.0 <= rmsd < 2.0:
        return 'Movement (1.0 to 2.0 Å)'

    if 2.0 <= rmsd < 4.0:
        return 'Large Movement (2.0 to 4.0 Å)'

    if 4.0 <= rmsd:
        return 'Significant Movement (>4.0 Å)'


movement_order = pd.CategoricalDtype(categories=[f'Little Movement (<{NOISE_LEVEL} Å)',
                                                 f'Some Movement ({NOISE_LEVEL} to 1.0 Å)',
                                                 'Movement (1.0 to 2.0 Å)',
                                                 'Large Movement (2.0 to 4.0 Å)',
                                                 'Significant Movement (>4.0 Å)'], ordered=True)

Framework Alignment

[6]:
results_framework = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_results.csv'))
[7]:
results_framework = results_framework.merge(
    summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_index=True,
)
[8]:
results_framework_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_holo.csv'))
[9]:
results_framework_holo_holo['cdr_sequences_collated'] = None

results_framework_holo_holo['mhc_slug'] = None
results_framework_holo_holo['peptide_sequence'] = None
[10]:
cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_framework_holo_holo['complex_id'].str.contains(cdr_pattern)

holo_cdr_sequences_collated = results_framework_holo_holo[cdr_complex_ids]['complex_id']
results_framework_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated
[11]:
results_framework_holo_holo = results_framework_holo_holo.query('cdr_sequences_collated.notnull()')
[12]:
results_framework = pd.concat([results_framework, results_framework_holo_holo])
[13]:
results_framework = results_framework.merge(
    summary_df[['file_name','pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
    suffixes=('_x', '_y')
)
[14]:
results_framework['comparison'] = results_framework['state_x'] + '-' + results_framework['state_y']
results_framework['comparison'] = results_framework['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
[15]:
results_framework['structure_comparison'] = results_framework.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_framework = results_framework.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])
[16]:
results_framework = results_framework.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()
[17]:
results_framework['movement'] = results_framework['rmsd'].map(categorize_movement).astype(movement_order)
[18]:
results_framework
[18]:
cdr_sequences_collated comparison cdr chain_type rmsd movement
0 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 1 alpha_chain 1.193604 Movement (1.0 to 2.0 Å)
1 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 1 beta_chain 0.473654 Little Movement (<0.5 Å)
2 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 2 alpha_chain 0.905868 Some Movement (0.5 to 1.0 Å)
3 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 2 beta_chain 0.865963 Some Movement (0.5 to 1.0 Å)
4 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 3 alpha_chain 2.635113 Large Movement (2.0 to 4.0 Å)
... ... ... ... ... ... ...
318 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 1 beta_chain 0.646786 Some Movement (0.5 to 1.0 Å)
319 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 2 alpha_chain 2.650761 Large Movement (2.0 to 4.0 Å)
320 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 2 beta_chain 0.786869 Some Movement (0.5 to 1.0 Å)
321 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 3 alpha_chain 1.956881 Movement (1.0 to 2.0 Å)
322 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 3 beta_chain 3.343748 Large Movement (2.0 to 4.0 Å)

323 rows × 6 columns

Loop Alignment

[19]:
results_loop = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_results.csv'))
[20]:
results_loop= results_loop.merge(
    summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_index=True,
)
[21]:
results_loop_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_holo.csv'))
[22]:
results_loop_holo_holo['cdr_sequences_collated'] = None

results_loop_holo_holo['mhc_slug'] = None
results_loop_holo_holo['peptide_sequence'] = None
[23]:
cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_loop_holo_holo['complex_id'].str.contains(cdr_pattern)

holo_cdr_sequences_collated = results_loop_holo_holo[cdr_complex_ids]['complex_id']
results_loop_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated
[24]:
results_loop_holo_holo = results_loop_holo_holo.query('cdr_sequences_collated.notnull()')
[25]:
results_loop = pd.concat([results_loop, results_loop_holo_holo])
[26]:
results_loop = results_loop.merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
    suffixes=('_x', '_y')
)
[27]:
results_loop['comparison'] = results_loop['state_x'] + '-' + results_loop['state_y']
results_loop['comparison'] = results_loop['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
[28]:
results_loop['structure_comparison'] = results_loop.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_loop = results_loop.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])
[29]:
results_loop = results_loop.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()
[30]:
results_loop['movement'] = results_loop['rmsd'].map(categorize_movement).astype(movement_order)
[31]:
results_loop
[31]:
cdr_sequences_collated comparison cdr chain_type rmsd movement
0 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 1 alpha_chain 0.801538 Some Movement (0.5 to 1.0 Å)
1 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 1 beta_chain 0.239049 Little Movement (<0.5 Å)
2 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 2 alpha_chain 0.725440 Some Movement (0.5 to 1.0 Å)
3 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 2 beta_chain 0.774079 Some Movement (0.5 to 1.0 Å)
4 ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... apo-holo 3 alpha_chain 1.210009 Movement (1.0 to 2.0 Å)
... ... ... ... ... ... ...
318 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 1 beta_chain 0.113593 Little Movement (<0.5 Å)
319 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 2 alpha_chain 1.082823 Movement (1.0 to 2.0 Å)
320 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 2 beta_chain 0.201656 Little Movement (<0.5 Å)
321 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 3 alpha_chain 1.475462 Movement (1.0 to 2.0 Å)
322 YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... apo-holo 3 beta_chain 2.503307 Large Movement (2.0 to 4.0 Å)

323 rows × 6 columns

Visualizing and analysing the results

Measuring the distance between apo and holo confomations

Since RMSD was calculated for all pairings of the same TCR, the apo-holo (or holo-apo) comparisons were selected.

Framework Alignment

[32]:
results_framework['chain_type_formatted'] = results_framework['chain_type'].str.replace('_', ' ').str.title()
[33]:
sns.boxplot(data=results_framework.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')

x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))

plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)

plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')

plt.show()

results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()
../_images/source_Comparison_of_apo_and_holo_CDR_loops_40_0.png
[33]:
count mean std min 25% 50% 75% max
chain_type cdr
alpha_chain 1 25.0 1.625591 1.126121 0.172478 0.849734 1.308559 2.056843 5.547374
2 24.0 1.360652 0.725418 0.496448 0.894783 1.121992 1.509622 3.170339
3 25.0 2.205034 1.666278 0.157864 1.124128 1.678234 2.635113 7.883364
beta_chain 1 25.0 0.818192 0.428709 0.211640 0.504221 0.770227 0.983855 2.106660
2 25.0 0.879987 0.718325 0.311457 0.526078 0.705481 0.947803 3.928769
3 25.0 1.375520 0.977938 0.142801 0.705002 1.001740 1.823828 3.820573
[34]:
sns.displot(results_framework.query("comparison == 'apo-holo'"),
            x='rmsd', col='cdr', row='chain_type_formatted')
[34]:
<seaborn.axisgrid.FacetGrid at 0x7f9d0735eb30>
../_images/source_Comparison_of_apo_and_holo_CDR_loops_41_1.png
[35]:
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=33.47222908277405, pvalue=3.0322000362602383e-06)
[36]:
combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
          for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
          if cdr_x == cdr_y]

significance_level = 0.05 / len(combos)

framework_stats = {
    'chain_type_x': [],
    'cdr_x': [],
    'chain_type_y': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    framework_stats['chain_type_x'].append(chain_x)
    framework_stats['cdr_x'].append(cdr_x)
    framework_stats['chain_type_y'].append(chain_y)
    framework_stats['cdr_y'].append(cdr_y)

    framework_stats['statistic'].append(stat)
    framework_stats['p_val'].append(p_val)

framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] = framework_stats['p_val'] < significance_level

framework_stats
[36]:
chain_type_x cdr_x chain_type_y cdr_y statistic p_val significant
0 alpha_chain 1 beta_chain 1 3.346992 0.000817 True
1 alpha_chain 2 beta_chain 2 3.380000 0.000725 True
2 alpha_chain 3 beta_chain 3 2.202223 0.027650 False
[37]:
combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
                                       for cdr, df in results_framework.query("comparison == 'apo-holo'").groupby('cdr')], 2))

significance_level = 0.05 / len(combos)

framework_stats = {
    'cdr_x': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)


    framework_stats['cdr_x'].append(cdr_x)
    framework_stats['cdr_y'].append(cdr_y)

    framework_stats['statistic'].append(stat)
    framework_stats['p_val'].append(p_val)

framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] =  framework_stats['p_val'] < significance_level

framework_stats
[37]:
cdr_x cdr_y statistic p_val significant
0 1 2 0.335930 0.736924 False
1 1 3 -2.343899 0.019083 False
2 2 3 -2.820412 0.004796 True

The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR.

[38]:
movement_counts = results_framework.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]

ax = movement_counts.plot.pie(title='CDR Movement from Framework Alignment',
                              autopct='%1.2f%%',
                              legend=True,
                              ylabel='',
                              labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')

plt.show()
../_images/source_Comparison_of_apo_and_holo_CDR_loops_46_0.png

Loop Alignment

[39]:
results_loop['chain_type_formatted'] = results_loop['chain_type'].str.replace('_', ' ').str.title()
[40]:
sns.boxplot(data=results_loop.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')

x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))

plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)

plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')

plt.show()

results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()
../_images/source_Comparison_of_apo_and_holo_CDR_loops_49_0.png
[40]:
count mean std min 25% 50% 75% max
chain_type cdr
alpha_chain 1 25.0 0.539703 0.370003 0.037604 0.265344 0.398626 0.865431 1.317612
2 24.0 0.644121 0.446783 0.215395 0.294647 0.439994 0.979735 1.684088
3 25.0 1.121232 0.650441 0.077000 0.584030 1.083251 1.423754 2.495455
beta_chain 1 25.0 0.279281 0.184982 0.110227 0.149009 0.214492 0.363066 0.744923
2 25.0 0.339790 0.232802 0.056751 0.191979 0.248698 0.398983 0.988705
3 25.0 0.886316 0.691293 0.102327 0.341379 0.693077 1.220448 2.503307
[41]:
sns.displot(results_loop.query("comparison == 'apo-holo'"),
            x='rmsd', col='cdr', row='chain_type_formatted')
[41]:
<seaborn.axisgrid.FacetGrid at 0x7f9d03eedc30>
../_images/source_Comparison_of_apo_and_holo_CDR_loops_50_1.png
[42]:
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=48.69824429530195, pvalue=2.5578145006489472e-09)
[43]:
combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
          for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
          if cdr_x == cdr_y]

significance_level = 0.05 / len(combos)

loop_stats = {
    'chain_x': [],
    'cdr_x': [],
    'chain_y': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    loop_stats['chain_x'].append(chain_x)
    loop_stats['cdr_x'].append(cdr_x)
    loop_stats['chain_y'].append(chain_y)
    loop_stats['cdr_y'].append(cdr_y)

    loop_stats['statistic'].append(stat)
    loop_stats['p_val'].append(p_val)

loop_stats = pd.DataFrame(loop_stats)

loop_stats['significant'] = loop_stats['p_val'] < significance_level

loop_stats
[43]:
chain_x cdr_x chain_y cdr_y statistic p_val significant
0 alpha_chain 1 beta_chain 1 3.172366 0.001512 True
1 alpha_chain 2 beta_chain 2 3.120000 0.001809 True
2 alpha_chain 3 beta_chain 3 1.600735 0.109436 False
[44]:
combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
                                      for cdr, df in results_loop.query("comparison == 'apo-holo'").groupby('cdr')], 2))

significance_level = 0.05 / len(combos)

loop_stats = {
    'cdr_x': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    loop_stats['cdr_x'].append(cdr_x)
    loop_stats['cdr_y'].append(cdr_y)

    loop_stats['statistic'].append(stat)
    loop_stats['p_val'].append(p_val)

loop_stats = pd.DataFrame(loop_stats)

loop_stats['significant'] = loop_stats['p_val'] < significance_level

loop_stats
[44]:
cdr_x cdr_y statistic p_val significant
0 1 2 -1.252739 2.103007e-01 False
1 1 3 -5.053170 4.345369e-07 True
2 2 3 -4.325099 1.524634e-05 True

The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR. In this alignment procedure, the CDR-1 and CDR-2 loops median drop below the noise level line, indicating that these show different effects to the CDR3 loops.

[45]:
movement_counts = results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]

ax = movement_counts.plot.pie(title='CDR Movement from Loop Alignment',
                              autopct='%1.2f%%',
                              legend=True,
                              ylabel='',
                              labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')

plt.show()
../_images/source_Comparison_of_apo_and_holo_CDR_loops_55_0.png
[46]:
results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
[46]:
Little Movement (<0.5 Å)         83
Some Movement (0.5 to 1.0 Å)     37
Movement (1.0 to 2.0 Å)          23
Large Movement (2.0 to 4.0 Å)     6
Significant Movement (>4.0 Å)     0
Name: movement, dtype: int64

Comparison of apo-holo, apo-apo and holo-holo conformational changes

The following analysis aims to ascertain whether there is notable movement in the CDR domains between the apo and holo conformations, using apo-apo and holo-holo differences as controls.

Framework Alignment

[47]:
sns.catplot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
            x='comparison',
            y='rmsd',
            col='cdr',
            row='chain_type',
            kind='box')
../_images/source_Comparison_of_apo_and_holo_CDR_loops_59_0.png
[48]:
sns.displot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
            hue='comparison',
            x='rmsd',
            col='cdr',
            row='chain_type',
            kind='kde')
[48]:
<seaborn.axisgrid.FacetGrid at 0x7f9d02e1fa00>
../_images/source_Comparison_of_apo_and_holo_CDR_loops_60_1.png
[49]:
results_framework.groupby(['comparison', 'chain_type', 'cdr']).size()
[49]:
comparison  chain_type   cdr
apo-apo     alpha_chain  1      11
                         2      11
                         3      11
            beta_chain   1      11
                         2      11
                         3      11
apo-holo    alpha_chain  1      25
                         2      24
                         3      25
            beta_chain   1      25
                         2      25
                         3      25
holo-holo   alpha_chain  1      18
                         2      18
                         3      18
            beta_chain   1      18
                         2      18
                         3      18
dtype: int64
[50]:
treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=112.0659833804159, pvalue=4.924957542598049e-16)
[51]:
combos = []
for pairing in list(itertools.combinations(treatments, 2)):
    chain_type_x = pairing[0][0][1]
    cdr_x = pairing[0][0][2]
    chain_type_y = pairing[1][0][1]
    cdr_y = pairing[1][0][2]

    if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
        combos.append(pairing)

significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

momvement_statistics_fw = pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

momvement_statistics_fw['p_val'] = momvement_statistics_fw['p_val'].map(lambda num: f'{num:.2e}')

momvement_statistics_fw
0.002777777777777778
[51]:
comparison_x chain_type_x cdr_x comparison_y chain_type_y cdr_y statistic p_val significant
0 apo-apo alpha_chain 1 apo-holo alpha_chain 1 -2.558466 1.05e-02 False
1 apo-apo alpha_chain 1 holo-holo alpha_chain 1 0.404520 6.86e-01 False
2 apo-apo alpha_chain 2 apo-holo alpha_chain 2 -2.274141 2.30e-02 False
3 apo-apo alpha_chain 2 holo-holo alpha_chain 2 0.404520 6.86e-01 False
4 apo-apo alpha_chain 3 apo-holo alpha_chain 3 -2.215048 2.68e-02 False
5 apo-apo alpha_chain 3 holo-holo alpha_chain 3 2.112493 3.46e-02 False
6 apo-apo beta_chain 1 apo-holo beta_chain 1 -1.356502 1.75e-01 False
7 apo-apo beta_chain 1 holo-holo beta_chain 1 1.303453 1.92e-01 False
8 apo-apo beta_chain 2 apo-holo beta_chain 2 -0.188880 8.50e-01 False
9 apo-apo beta_chain 2 holo-holo beta_chain 2 2.382173 1.72e-02 False
10 apo-apo beta_chain 3 apo-holo beta_chain 3 -1.390844 1.64e-01 False
11 apo-apo beta_chain 3 holo-holo beta_chain 3 2.112493 3.46e-02 False
12 apo-holo alpha_chain 1 holo-holo alpha_chain 1 3.495798 4.73e-04 True
13 apo-holo alpha_chain 2 holo-holo alpha_chain 2 3.507467 4.52e-04 True
14 apo-holo alpha_chain 3 holo-holo alpha_chain 3 4.529767 5.90e-06 True
15 apo-holo beta_chain 1 holo-holo beta_chain 1 3.372707 7.44e-04 True
16 apo-holo beta_chain 2 holo-holo beta_chain 2 3.692745 2.22e-04 True
17 apo-holo beta_chain 3 holo-holo beta_chain 3 3.988164 6.66e-05 True
[52]:
combos = list(itertools.combinations(treatments, 2))

significance_level = 0.05 / len(combos)

statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})
[52]:
comparison_x chain_type_x cdr_x comparison_y chain_type_y cdr_y statistic p_val significant
0 apo-apo alpha_chain 1 apo-apo alpha_chain 2 -0.229828 0.818226 False
1 apo-apo alpha_chain 1 apo-apo alpha_chain 3 -1.214803 0.224441 False
2 apo-apo alpha_chain 1 apo-apo beta_chain 1 0.164163 0.869603 False
3 apo-apo alpha_chain 1 apo-apo beta_chain 2 -0.361158 0.717982 False
4 apo-apo alpha_chain 1 apo-apo beta_chain 3 -0.623818 0.532747 False
... ... ... ... ... ... ... ... ... ...
148 holo-holo alpha_chain 3 holo-holo beta_chain 2 0.601133 0.547751 False
149 holo-holo alpha_chain 3 holo-holo beta_chain 3 0.664411 0.506428 False
150 holo-holo beta_chain 1 holo-holo beta_chain 2 0.506218 0.612704 False
151 holo-holo beta_chain 1 holo-holo beta_chain 3 0.474579 0.635087 False
152 holo-holo beta_chain 2 holo-holo beta_chain 3 -0.158193 0.874305 False

153 rows × 9 columns

The analysis of the plots and statistical tests shows that there is a statistically significant (p-value: 0.05) difference between the target and controls (as seen by the results of the Kruskal-Wallis test) but it also shows that there is a significant difference between the apo-holo comparisons and the holo-holo baseline.

Loop Alignment

[53]:
sns.catplot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
            x='comparison',
            y='rmsd',
            col='cdr',
            row='chain_type',
            kind='box')
../_images/source_Comparison_of_apo_and_holo_CDR_loops_67_0.png
[54]:
sns.displot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
            x='rmsd',
            hue='comparison',
            col='cdr',
            row='chain_type',
            kind='kde')
[54]:
<seaborn.axisgrid.FacetGrid at 0x7f9d0366b1f0>
../_images/source_Comparison_of_apo_and_holo_CDR_loops_68_1.png
[55]:
results_loop.groupby(['comparison', 'chain_type', 'cdr']).size()
[55]:
comparison  chain_type   cdr
apo-apo     alpha_chain  1      11
                         2      11
                         3      11
            beta_chain   1      11
                         2      11
                         3      11
apo-holo    alpha_chain  1      25
                         2      24
                         3      25
            beta_chain   1      25
                         2      25
                         3      25
holo-holo   alpha_chain  1      18
                         2      18
                         3      18
            beta_chain   1      18
                         2      18
                         3      18
dtype: int64
[56]:
treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=106.74425009179026, pvalue=4.928764973630228e-15)
[57]:
combos = []
for pairing in list(itertools.combinations(treatments, 2)):
    chain_type_x = pairing[0][0][1]
    cdr_x = pairing[0][0][2]
    chain_type_y = pairing[1][0][1]
    cdr_y = pairing[1][0][2]

    if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
        combos.append(pairing)

significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

momvement_statistics_loop = pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

momvement_statistics_loop['p_val'] = momvement_statistics_loop['p_val'].map(lambda num: f'{num:.2e}')

momvement_statistics_loop
0.002777777777777778
[57]:
comparison_x chain_type_x cdr_x comparison_y chain_type_y cdr_y statistic p_val significant
0 apo-apo alpha_chain 1 apo-holo alpha_chain 1 -2.592808 9.52e-03 False
1 apo-apo alpha_chain 1 holo-holo alpha_chain 1 -1.663026 9.63e-02 False
2 apo-apo alpha_chain 2 apo-holo alpha_chain 2 -3.979747 6.90e-05 True
3 apo-apo alpha_chain 2 holo-holo alpha_chain 2 -2.517013 1.18e-02 False
4 apo-apo alpha_chain 3 apo-holo alpha_chain 3 -1.974655 4.83e-02 False
5 apo-apo alpha_chain 3 holo-holo alpha_chain 3 1.303453 1.92e-01 False
6 apo-apo beta_chain 1 apo-holo beta_chain 1 -1.665579 9.58e-02 False
7 apo-apo beta_chain 1 holo-holo beta_chain 1 -0.764093 4.45e-01 False
8 apo-apo beta_chain 2 apo-holo beta_chain 2 -1.150451 2.50e-01 False
9 apo-apo beta_chain 2 holo-holo beta_chain 2 0.898933 3.69e-01 False
10 apo-apo beta_chain 3 apo-holo beta_chain 3 -1.837288 6.62e-02 False
11 apo-apo beta_chain 3 holo-holo beta_chain 3 0.539360 5.90e-01 False
12 apo-holo alpha_chain 1 holo-holo alpha_chain 1 1.821754 6.85e-02 False
13 apo-holo alpha_chain 2 holo-holo alpha_chain 2 2.262062 2.37e-02 False
14 apo-holo alpha_chain 3 holo-holo alpha_chain 3 4.283584 1.84e-05 True
15 apo-holo beta_chain 1 holo-holo beta_chain 1 0.960114 3.37e-01 False
16 apo-holo beta_chain 2 holo-holo beta_chain 2 2.708013 6.77e-03 False
17 apo-holo beta_chain 3 holo-holo beta_chain 3 3.594272 3.25e-04 True
[58]:
combos = list(itertools.combinations(treatments, 2))

significance_level = 0.05 / len(combos)

statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})
[58]:
comparison_x chain_type_x cdr_x comparison_y chain_type_y cdr_y statistic p_val significant
0 apo-apo alpha_chain 1 apo-apo alpha_chain 2 0.689483 0.490520 False
1 apo-apo alpha_chain 1 apo-apo alpha_chain 3 -2.134113 0.032834 False
2 apo-apo alpha_chain 1 apo-apo beta_chain 1 0.558153 0.576740 False
3 apo-apo alpha_chain 1 apo-apo beta_chain 2 -0.820813 0.411753 False
4 apo-apo alpha_chain 1 apo-apo beta_chain 3 -1.280468 0.200381 False
... ... ... ... ... ... ... ... ... ...
148 holo-holo alpha_chain 3 holo-holo beta_chain 2 2.056509 0.039733 False
149 holo-holo alpha_chain 3 holo-holo beta_chain 3 0.727688 0.466805 False
150 holo-holo beta_chain 1 holo-holo beta_chain 2 0.284747 0.775838 False
151 holo-holo beta_chain 1 holo-holo beta_chain 3 -1.328821 0.183907 False
152 holo-holo beta_chain 2 holo-holo beta_chain 3 -1.676846 0.093573 False

153 rows × 9 columns

Similarily here, there is a significant difference between the comparison types but in this case the post-hoc testing shows a difference between the comparisons of apo-holo and holo-holo structures for CDR3 loops only.

Conclusion

Overall, this analysis has showed that all six CDR loops undergo bulk conformational changes between apo and holo states, but that only the CDR3 loops (both \(\alpha\)- and \(\beta\)-chain) deform when the loops are aligned to each other.