Comparison of apo and holo CDR loops
Introduction
The following notebook aims to quantify the movement of CDR domains in TCR variable regions between unbound (apo) and bound (holo) conformations. The holo conforomations in this case are TCRs bound to class I pMHC complexes.
[1]:
import os
import itertools
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
[2]:
DATA_DIR = '../data/processed/apo-holo-tcr-pmhc-class-I-comparisons'
[3]:
NOISE_LEVEL = 0.5 # Å
[4]:
summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')
ids = summary_df['file_name'].str.replace('.pdb', '', regex=True)
ids.name = 'id'
summary_df = summary_df.set_index(ids)
summary_df
[4]:
file_name | pdb_id | structure_type | state | alpha_chain | beta_chain | antigen_chain | mhc_chain1 | mhc_chain2 | cdr_sequences_collated | peptide_sequence | mhc_slug | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
id | ||||||||||||
1ao7_D-E-C-A-B_tcr_pmhc | 1ao7_D-E-C-A-B_tcr_pmhc.pdb | 1ao7 | tcr_pmhc | holo | D | E | C | A | B | DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA... | LLFGYPVYV | hla_a_02_01 |
1bd2_D-E-C-A-B_tcr_pmhc | 1bd2_D-E-C-A-B_tcr_pmhc.pdb | 1bd2 | tcr_pmhc | holo | D | E | C | A | B | NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG... | LLFGYPVYV | hla_a_02_01 |
1bii_A-B-P_pmhc | 1bii_A-B-P_pmhc.pdb | 1bii | pmhc | apo | NaN | NaN | P | A | B | NaN | RGPGRAFVTI | h2_dd |
1ddh_A-B-P_pmhc | 1ddh_A-B-P_pmhc.pdb | 1ddh | pmhc | apo | NaN | NaN | P | A | B | NaN | RGPGRAFVTI | h2_dd |
1duz_A-B-C_pmhc | 1duz_A-B-C_pmhc.pdb | 1duz | pmhc | apo | NaN | NaN | C | A | B | NaN | LLFGYPVYV | hla_a_02_01 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
8gon_D-E-C-A-B_tcr_pmhc | 8gon_D-E-C-A-B_tcr_pmhc.pdb | 8gon | tcr_pmhc | holo | D | E | C | A | B | TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... | NaN | NaN |
8gop_A-B_tcr | 8gop_A-B_tcr.pdb | 8gop | tcr | apo | A | B | NaN | NaN | NaN | TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR... | NaN | NaN |
8gvb_A-B-P-H-L_tcr_pmhc | 8gvb_A-B-P-H-L_tcr_pmhc.pdb | 8gvb | tcr_pmhc | holo | A | B | P | H | L | YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... | RYPLTFGW | hla_a_24_02 |
8gvg_A-B-P-H-L_tcr_pmhc | 8gvg_A-B-P-H-L_tcr_pmhc.pdb | 8gvg | tcr_pmhc | holo | A | B | P | H | L | YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD... | RFPLTFGW | hla_a_24_02 |
8gvi_A-B-P-H-L_tcr_pmhc | 8gvi_A-B-P-H-L_tcr_pmhc.pdb | 8gvi | tcr_pmhc | holo | A | B | P | H | L | YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL... | RYPLTFGW | hla_a_24_02 |
358 rows × 12 columns
Calculating RMSD between conformations
RMSD is used to compute the difference between all states of the same TCR. Comparions between the different apo forms and the differenceholo forms are also done so that these can be used as a control for the differences in apo to holo comparisons. This process is done two different ways. The first, termed framework alignment, computes the comparison between TCRs after aligning on the framework region of the proteins. This method captures the “global” conformational changes that these loops undergo including any combined rotations and deformations. The second method, termed loop alignment, does the same calculations as the first method, but after aligning the two loops being compared on their backbones. This method captures the deformation aspects of loops regardless of any hinging from the framework region.
[5]:
def categorize_movement(rmsd: float) -> str:
if rmsd < NOISE_LEVEL:
return f'Little Movement (<{NOISE_LEVEL} Å)'
if NOISE_LEVEL <= rmsd < 1.0:
return f'Some Movement ({NOISE_LEVEL} to 1.0 Å)'
if 1.0 <= rmsd < 2.0:
return 'Movement (1.0 to 2.0 Å)'
if 2.0 <= rmsd < 4.0:
return 'Large Movement (2.0 to 4.0 Å)'
if 4.0 <= rmsd:
return 'Significant Movement (>4.0 Å)'
movement_order = pd.CategoricalDtype(categories=[f'Little Movement (<{NOISE_LEVEL} Å)',
f'Some Movement ({NOISE_LEVEL} to 1.0 Å)',
'Movement (1.0 to 2.0 Å)',
'Large Movement (2.0 to 4.0 Å)',
'Significant Movement (>4.0 Å)'], ordered=True)
Framework Alignment
[6]:
results_framework = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_results.csv'))
[7]:
results_framework = results_framework.merge(
summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
how='left',
left_on='complex_id',
right_index=True,
)
[8]:
results_framework_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_holo.csv'))
[9]:
results_framework_holo_holo['cdr_sequences_collated'] = None
results_framework_holo_holo['mhc_slug'] = None
results_framework_holo_holo['peptide_sequence'] = None
[10]:
cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_framework_holo_holo['complex_id'].str.contains(cdr_pattern)
holo_cdr_sequences_collated = results_framework_holo_holo[cdr_complex_ids]['complex_id']
results_framework_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated
[11]:
results_framework_holo_holo = results_framework_holo_holo.query('cdr_sequences_collated.notnull()')
[12]:
results_framework = pd.concat([results_framework, results_framework_holo_holo])
[13]:
results_framework = results_framework.merge(
summary_df[['file_name','pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
how='left',
left_on='structure_x_name',
right_on='file_name',
).merge(
summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
how='left',
left_on='structure_y_name',
right_on='file_name',
suffixes=('_x', '_y')
)
[14]:
results_framework['comparison'] = results_framework['state_x'] + '-' + results_framework['state_y']
results_framework['comparison'] = results_framework['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
[15]:
results_framework['structure_comparison'] = results_framework.apply(
lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
axis='columns',
)
results_framework = results_framework.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])
[16]:
results_framework = results_framework.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()
[17]:
results_framework['movement'] = results_framework['rmsd'].map(categorize_movement).astype(movement_order)
[18]:
results_framework
[18]:
cdr_sequences_collated | comparison | cdr | chain_type | rmsd | movement | |
---|---|---|---|---|---|---|
0 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 1 | alpha_chain | 1.193604 | Movement (1.0 to 2.0 Å) |
1 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 1 | beta_chain | 0.473654 | Little Movement (<0.5 Å) |
2 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 2 | alpha_chain | 0.905868 | Some Movement (0.5 to 1.0 Å) |
3 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 2 | beta_chain | 0.865963 | Some Movement (0.5 to 1.0 Å) |
4 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 3 | alpha_chain | 2.635113 | Large Movement (2.0 to 4.0 Å) |
... | ... | ... | ... | ... | ... | ... |
318 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 1 | beta_chain | 0.646786 | Some Movement (0.5 to 1.0 Å) |
319 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 2 | alpha_chain | 2.650761 | Large Movement (2.0 to 4.0 Å) |
320 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 2 | beta_chain | 0.786869 | Some Movement (0.5 to 1.0 Å) |
321 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 3 | alpha_chain | 1.956881 | Movement (1.0 to 2.0 Å) |
322 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 3 | beta_chain | 3.343748 | Large Movement (2.0 to 4.0 Å) |
323 rows × 6 columns
Loop Alignment
[19]:
results_loop = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_results.csv'))
[20]:
results_loop= results_loop.merge(
summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
how='left',
left_on='complex_id',
right_index=True,
)
[21]:
results_loop_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_holo.csv'))
[22]:
results_loop_holo_holo['cdr_sequences_collated'] = None
results_loop_holo_holo['mhc_slug'] = None
results_loop_holo_holo['peptide_sequence'] = None
[23]:
cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_loop_holo_holo['complex_id'].str.contains(cdr_pattern)
holo_cdr_sequences_collated = results_loop_holo_holo[cdr_complex_ids]['complex_id']
results_loop_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated
[24]:
results_loop_holo_holo = results_loop_holo_holo.query('cdr_sequences_collated.notnull()')
[25]:
results_loop = pd.concat([results_loop, results_loop_holo_holo])
[26]:
results_loop = results_loop.merge(
summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
how='left',
left_on='structure_x_name',
right_on='file_name',
).merge(
summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
how='left',
left_on='structure_y_name',
right_on='file_name',
suffixes=('_x', '_y')
)
[27]:
results_loop['comparison'] = results_loop['state_x'] + '-' + results_loop['state_y']
results_loop['comparison'] = results_loop['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)
[28]:
results_loop['structure_comparison'] = results_loop.apply(
lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
axis='columns',
)
results_loop = results_loop.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])
[29]:
results_loop = results_loop.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()
[30]:
results_loop['movement'] = results_loop['rmsd'].map(categorize_movement).astype(movement_order)
[31]:
results_loop
[31]:
cdr_sequences_collated | comparison | cdr | chain_type | rmsd | movement | |
---|---|---|---|---|---|---|
0 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 1 | alpha_chain | 0.801538 | Some Movement (0.5 to 1.0 Å) |
1 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 1 | beta_chain | 0.239049 | Little Movement (<0.5 Å) |
2 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 2 | alpha_chain | 0.725440 | Some Movement (0.5 to 1.0 Å) |
3 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 2 | beta_chain | 0.774079 | Some Movement (0.5 to 1.0 Å) |
4 | ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR... | apo-holo | 3 | alpha_chain | 1.210009 | Movement (1.0 to 2.0 Å) |
... | ... | ... | ... | ... | ... | ... |
318 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 1 | beta_chain | 0.113593 | Little Movement (<0.5 Å) |
319 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 2 | alpha_chain | 1.082823 | Movement (1.0 to 2.0 Å) |
320 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 2 | beta_chain | 0.201656 | Little Movement (<0.5 Å) |
321 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 3 | alpha_chain | 1.475462 | Movement (1.0 to 2.0 Å) |
322 | YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA... | apo-holo | 3 | beta_chain | 2.503307 | Large Movement (2.0 to 4.0 Å) |
323 rows × 6 columns
Visualizing and analysing the results
Measuring the distance between apo and holo confomations
Since RMSD was calculated for all pairings of the same TCR, the apo-holo (or holo-apo) comparisons were selected.
Framework Alignment
[32]:
results_framework['chain_type_formatted'] = results_framework['chain_type'].str.replace('_', ' ').str.title()
[33]:
sns.boxplot(data=results_framework.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')
x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))
plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)
plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')
plt.show()
results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()
[33]:
count | mean | std | min | 25% | 50% | 75% | max | ||
---|---|---|---|---|---|---|---|---|---|
chain_type | cdr | ||||||||
alpha_chain | 1 | 25.0 | 1.625591 | 1.126121 | 0.172478 | 0.849734 | 1.308559 | 2.056843 | 5.547374 |
2 | 24.0 | 1.360652 | 0.725418 | 0.496448 | 0.894783 | 1.121992 | 1.509622 | 3.170339 | |
3 | 25.0 | 2.205034 | 1.666278 | 0.157864 | 1.124128 | 1.678234 | 2.635113 | 7.883364 | |
beta_chain | 1 | 25.0 | 0.818192 | 0.428709 | 0.211640 | 0.504221 | 0.770227 | 0.983855 | 2.106660 |
2 | 25.0 | 0.879987 | 0.718325 | 0.311457 | 0.526078 | 0.705481 | 0.947803 | 3.928769 | |
3 | 25.0 | 1.375520 | 0.977938 | 0.142801 | 0.705002 | 1.001740 | 1.823828 | 3.820573 |
[34]:
sns.displot(results_framework.query("comparison == 'apo-holo'"),
x='rmsd', col='cdr', row='chain_type_formatted')
[34]:
<seaborn.axisgrid.FacetGrid at 0x7f9d0735eb30>
[35]:
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=33.47222908277405, pvalue=3.0322000362602383e-06)
[36]:
combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
if cdr_x == cdr_y]
significance_level = 0.05 / len(combos)
framework_stats = {
'chain_type_x': [],
'cdr_x': [],
'chain_type_y': [],
'cdr_y': [],
'statistic': [],
'p_val': [],
}
for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
framework_stats['chain_type_x'].append(chain_x)
framework_stats['cdr_x'].append(cdr_x)
framework_stats['chain_type_y'].append(chain_y)
framework_stats['cdr_y'].append(cdr_y)
framework_stats['statistic'].append(stat)
framework_stats['p_val'].append(p_val)
framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] = framework_stats['p_val'] < significance_level
framework_stats
[36]:
chain_type_x | cdr_x | chain_type_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|
0 | alpha_chain | 1 | beta_chain | 1 | 3.346992 | 0.000817 | True |
1 | alpha_chain | 2 | beta_chain | 2 | 3.380000 | 0.000725 | True |
2 | alpha_chain | 3 | beta_chain | 3 | 2.202223 | 0.027650 | False |
[37]:
combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
for cdr, df in results_framework.query("comparison == 'apo-holo'").groupby('cdr')], 2))
significance_level = 0.05 / len(combos)
framework_stats = {
'cdr_x': [],
'cdr_y': [],
'statistic': [],
'p_val': [],
}
for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
framework_stats['cdr_x'].append(cdr_x)
framework_stats['cdr_y'].append(cdr_y)
framework_stats['statistic'].append(stat)
framework_stats['p_val'].append(p_val)
framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] = framework_stats['p_val'] < significance_level
framework_stats
[37]:
cdr_x | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|
0 | 1 | 2 | 0.335930 | 0.736924 | False |
1 | 1 | 3 | -2.343899 | 0.019083 | False |
2 | 2 | 3 | -2.820412 | 0.004796 | True |
The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR.
[38]:
movement_counts = results_framework.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]
ax = movement_counts.plot.pie(title='CDR Movement from Framework Alignment',
autopct='%1.2f%%',
legend=True,
ylabel='',
labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
Loop Alignment
[39]:
results_loop['chain_type_formatted'] = results_loop['chain_type'].str.replace('_', ' ').str.title()
[40]:
sns.boxplot(data=results_loop.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')
x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))
plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)
plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')
plt.show()
results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()
[40]:
count | mean | std | min | 25% | 50% | 75% | max | ||
---|---|---|---|---|---|---|---|---|---|
chain_type | cdr | ||||||||
alpha_chain | 1 | 25.0 | 0.539703 | 0.370003 | 0.037604 | 0.265344 | 0.398626 | 0.865431 | 1.317612 |
2 | 24.0 | 0.644121 | 0.446783 | 0.215395 | 0.294647 | 0.439994 | 0.979735 | 1.684088 | |
3 | 25.0 | 1.121232 | 0.650441 | 0.077000 | 0.584030 | 1.083251 | 1.423754 | 2.495455 | |
beta_chain | 1 | 25.0 | 0.279281 | 0.184982 | 0.110227 | 0.149009 | 0.214492 | 0.363066 | 0.744923 |
2 | 25.0 | 0.339790 | 0.232802 | 0.056751 | 0.191979 | 0.248698 | 0.398983 | 0.988705 | |
3 | 25.0 | 0.886316 | 0.691293 | 0.102327 | 0.341379 | 0.693077 | 1.220448 | 2.503307 |
[41]:
sns.displot(results_loop.query("comparison == 'apo-holo'"),
x='rmsd', col='cdr', row='chain_type_formatted')
[41]:
<seaborn.axisgrid.FacetGrid at 0x7f9d03eedc30>
[42]:
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=48.69824429530195, pvalue=2.5578145006489472e-09)
[43]:
combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
if cdr_x == cdr_y]
significance_level = 0.05 / len(combos)
loop_stats = {
'chain_x': [],
'cdr_x': [],
'chain_y': [],
'cdr_y': [],
'statistic': [],
'p_val': [],
}
for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
loop_stats['chain_x'].append(chain_x)
loop_stats['cdr_x'].append(cdr_x)
loop_stats['chain_y'].append(chain_y)
loop_stats['cdr_y'].append(cdr_y)
loop_stats['statistic'].append(stat)
loop_stats['p_val'].append(p_val)
loop_stats = pd.DataFrame(loop_stats)
loop_stats['significant'] = loop_stats['p_val'] < significance_level
loop_stats
[43]:
chain_x | cdr_x | chain_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|
0 | alpha_chain | 1 | beta_chain | 1 | 3.172366 | 0.001512 | True |
1 | alpha_chain | 2 | beta_chain | 2 | 3.120000 | 0.001809 | True |
2 | alpha_chain | 3 | beta_chain | 3 | 1.600735 | 0.109436 | False |
[44]:
combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
for cdr, df in results_loop.query("comparison == 'apo-holo'").groupby('cdr')], 2))
significance_level = 0.05 / len(combos)
loop_stats = {
'cdr_x': [],
'cdr_y': [],
'statistic': [],
'p_val': [],
}
for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
loop_stats['cdr_x'].append(cdr_x)
loop_stats['cdr_y'].append(cdr_y)
loop_stats['statistic'].append(stat)
loop_stats['p_val'].append(p_val)
loop_stats = pd.DataFrame(loop_stats)
loop_stats['significant'] = loop_stats['p_val'] < significance_level
loop_stats
[44]:
cdr_x | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|
0 | 1 | 2 | -1.252739 | 2.103007e-01 | False |
1 | 1 | 3 | -5.053170 | 4.345369e-07 | True |
2 | 2 | 3 | -4.325099 | 1.524634e-05 | True |
The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR. In this alignment procedure, the CDR-1 and CDR-2 loops median drop below the noise level line, indicating that these show different effects to the CDR3 loops.
[45]:
movement_counts = results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]
ax = movement_counts.plot.pie(title='CDR Movement from Loop Alignment',
autopct='%1.2f%%',
legend=True,
ylabel='',
labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')
plt.show()
[46]:
results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
[46]:
Little Movement (<0.5 Å) 83
Some Movement (0.5 to 1.0 Å) 37
Movement (1.0 to 2.0 Å) 23
Large Movement (2.0 to 4.0 Å) 6
Significant Movement (>4.0 Å) 0
Name: movement, dtype: int64
Comparison of apo-holo, apo-apo and holo-holo conformational changes
The following analysis aims to ascertain whether there is notable movement in the CDR domains between the apo and holo conformations, using apo-apo and holo-holo differences as controls.
Framework Alignment
[47]:
sns.catplot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
x='comparison',
y='rmsd',
col='cdr',
row='chain_type',
kind='box')
[48]:
sns.displot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
hue='comparison',
x='rmsd',
col='cdr',
row='chain_type',
kind='kde')
[48]:
<seaborn.axisgrid.FacetGrid at 0x7f9d02e1fa00>
[49]:
results_framework.groupby(['comparison', 'chain_type', 'cdr']).size()
[49]:
comparison chain_type cdr
apo-apo alpha_chain 1 11
2 11
3 11
beta_chain 1 11
2 11
3 11
apo-holo alpha_chain 1 25
2 24
3 25
beta_chain 1 25
2 25
3 25
holo-holo alpha_chain 1 18
2 18
3 18
beta_chain 1 18
2 18
3 18
dtype: int64
[50]:
treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=112.0659833804159, pvalue=4.924957542598049e-16)
[51]:
combos = []
for pairing in list(itertools.combinations(treatments, 2)):
chain_type_x = pairing[0][0][1]
cdr_x = pairing[0][0][2]
chain_type_y = pairing[1][0][1]
cdr_y = pairing[1][0][2]
if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
combos.append(pairing)
significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []
for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
statistics.append(stat)
p_vals.append(p_val)
momvement_statistics_fw = pd.DataFrame({
'comparison_x': [name for ((name, _, _), _), _ in combos],
'chain_type_x': [name for ((_, name, _), _), _ in combos],
'cdr_x': [name for ((_, _, name), _), _ in combos],
'comparison_y': [name for _, ((name, _, _), _) in combos],
'chain_type_y': [name for _, ((_, name, _), _) in combos],
'cdr_y': [name for _, ((_, _, name), _) in combos],
'statistic': statistics,
'p_val': p_vals,
'significant': [p_val < significance_level for p_val in p_vals],
})
momvement_statistics_fw['p_val'] = momvement_statistics_fw['p_val'].map(lambda num: f'{num:.2e}')
momvement_statistics_fw
0.002777777777777778
[51]:
comparison_x | chain_type_x | cdr_x | comparison_y | chain_type_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|---|---|
0 | apo-apo | alpha_chain | 1 | apo-holo | alpha_chain | 1 | -2.558466 | 1.05e-02 | False |
1 | apo-apo | alpha_chain | 1 | holo-holo | alpha_chain | 1 | 0.404520 | 6.86e-01 | False |
2 | apo-apo | alpha_chain | 2 | apo-holo | alpha_chain | 2 | -2.274141 | 2.30e-02 | False |
3 | apo-apo | alpha_chain | 2 | holo-holo | alpha_chain | 2 | 0.404520 | 6.86e-01 | False |
4 | apo-apo | alpha_chain | 3 | apo-holo | alpha_chain | 3 | -2.215048 | 2.68e-02 | False |
5 | apo-apo | alpha_chain | 3 | holo-holo | alpha_chain | 3 | 2.112493 | 3.46e-02 | False |
6 | apo-apo | beta_chain | 1 | apo-holo | beta_chain | 1 | -1.356502 | 1.75e-01 | False |
7 | apo-apo | beta_chain | 1 | holo-holo | beta_chain | 1 | 1.303453 | 1.92e-01 | False |
8 | apo-apo | beta_chain | 2 | apo-holo | beta_chain | 2 | -0.188880 | 8.50e-01 | False |
9 | apo-apo | beta_chain | 2 | holo-holo | beta_chain | 2 | 2.382173 | 1.72e-02 | False |
10 | apo-apo | beta_chain | 3 | apo-holo | beta_chain | 3 | -1.390844 | 1.64e-01 | False |
11 | apo-apo | beta_chain | 3 | holo-holo | beta_chain | 3 | 2.112493 | 3.46e-02 | False |
12 | apo-holo | alpha_chain | 1 | holo-holo | alpha_chain | 1 | 3.495798 | 4.73e-04 | True |
13 | apo-holo | alpha_chain | 2 | holo-holo | alpha_chain | 2 | 3.507467 | 4.52e-04 | True |
14 | apo-holo | alpha_chain | 3 | holo-holo | alpha_chain | 3 | 4.529767 | 5.90e-06 | True |
15 | apo-holo | beta_chain | 1 | holo-holo | beta_chain | 1 | 3.372707 | 7.44e-04 | True |
16 | apo-holo | beta_chain | 2 | holo-holo | beta_chain | 2 | 3.692745 | 2.22e-04 | True |
17 | apo-holo | beta_chain | 3 | holo-holo | beta_chain | 3 | 3.988164 | 6.66e-05 | True |
[52]:
combos = list(itertools.combinations(treatments, 2))
significance_level = 0.05 / len(combos)
statistics = []
p_vals = []
for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
statistics.append(stat)
p_vals.append(p_val)
pd.DataFrame({
'comparison_x': [name for ((name, _, _), _), _ in combos],
'chain_type_x': [name for ((_, name, _), _), _ in combos],
'cdr_x': [name for ((_, _, name), _), _ in combos],
'comparison_y': [name for _, ((name, _, _), _) in combos],
'chain_type_y': [name for _, ((_, name, _), _) in combos],
'cdr_y': [name for _, ((_, _, name), _) in combos],
'statistic': statistics,
'p_val': p_vals,
'significant': [p_val < significance_level for p_val in p_vals],
})
[52]:
comparison_x | chain_type_x | cdr_x | comparison_y | chain_type_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|---|---|
0 | apo-apo | alpha_chain | 1 | apo-apo | alpha_chain | 2 | -0.229828 | 0.818226 | False |
1 | apo-apo | alpha_chain | 1 | apo-apo | alpha_chain | 3 | -1.214803 | 0.224441 | False |
2 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 1 | 0.164163 | 0.869603 | False |
3 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 2 | -0.361158 | 0.717982 | False |
4 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 3 | -0.623818 | 0.532747 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
148 | holo-holo | alpha_chain | 3 | holo-holo | beta_chain | 2 | 0.601133 | 0.547751 | False |
149 | holo-holo | alpha_chain | 3 | holo-holo | beta_chain | 3 | 0.664411 | 0.506428 | False |
150 | holo-holo | beta_chain | 1 | holo-holo | beta_chain | 2 | 0.506218 | 0.612704 | False |
151 | holo-holo | beta_chain | 1 | holo-holo | beta_chain | 3 | 0.474579 | 0.635087 | False |
152 | holo-holo | beta_chain | 2 | holo-holo | beta_chain | 3 | -0.158193 | 0.874305 | False |
153 rows × 9 columns
The analysis of the plots and statistical tests shows that there is a statistically significant (p-value: 0.05) difference between the target and controls (as seen by the results of the Kruskal-Wallis test) but it also shows that there is a significant difference between the apo-holo comparisons and the holo-holo baseline.
Loop Alignment
[53]:
sns.catplot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
x='comparison',
y='rmsd',
col='cdr',
row='chain_type',
kind='box')
[54]:
sns.displot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
x='rmsd',
hue='comparison',
col='cdr',
row='chain_type',
kind='kde')
[54]:
<seaborn.axisgrid.FacetGrid at 0x7f9d0366b1f0>
[55]:
results_loop.groupby(['comparison', 'chain_type', 'cdr']).size()
[55]:
comparison chain_type cdr
apo-apo alpha_chain 1 11
2 11
3 11
beta_chain 1 11
2 11
3 11
apo-holo alpha_chain 1 25
2 24
3 25
beta_chain 1 25
2 25
3 25
holo-holo alpha_chain 1 18
2 18
3 18
beta_chain 1 18
2 18
3 18
dtype: int64
[56]:
treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))
KruskalResult(statistic=106.74425009179026, pvalue=4.928764973630228e-15)
[57]:
combos = []
for pairing in list(itertools.combinations(treatments, 2)):
chain_type_x = pairing[0][0][1]
cdr_x = pairing[0][0][2]
chain_type_y = pairing[1][0][1]
cdr_y = pairing[1][0][2]
if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
combos.append(pairing)
significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []
for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
statistics.append(stat)
p_vals.append(p_val)
momvement_statistics_loop = pd.DataFrame({
'comparison_x': [name for ((name, _, _), _), _ in combos],
'chain_type_x': [name for ((_, name, _), _), _ in combos],
'cdr_x': [name for ((_, _, name), _), _ in combos],
'comparison_y': [name for _, ((name, _, _), _) in combos],
'chain_type_y': [name for _, ((_, name, _), _) in combos],
'cdr_y': [name for _, ((_, _, name), _) in combos],
'statistic': statistics,
'p_val': p_vals,
'significant': [p_val < significance_level for p_val in p_vals],
})
momvement_statistics_loop['p_val'] = momvement_statistics_loop['p_val'].map(lambda num: f'{num:.2e}')
momvement_statistics_loop
0.002777777777777778
[57]:
comparison_x | chain_type_x | cdr_x | comparison_y | chain_type_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|---|---|
0 | apo-apo | alpha_chain | 1 | apo-holo | alpha_chain | 1 | -2.592808 | 9.52e-03 | False |
1 | apo-apo | alpha_chain | 1 | holo-holo | alpha_chain | 1 | -1.663026 | 9.63e-02 | False |
2 | apo-apo | alpha_chain | 2 | apo-holo | alpha_chain | 2 | -3.979747 | 6.90e-05 | True |
3 | apo-apo | alpha_chain | 2 | holo-holo | alpha_chain | 2 | -2.517013 | 1.18e-02 | False |
4 | apo-apo | alpha_chain | 3 | apo-holo | alpha_chain | 3 | -1.974655 | 4.83e-02 | False |
5 | apo-apo | alpha_chain | 3 | holo-holo | alpha_chain | 3 | 1.303453 | 1.92e-01 | False |
6 | apo-apo | beta_chain | 1 | apo-holo | beta_chain | 1 | -1.665579 | 9.58e-02 | False |
7 | apo-apo | beta_chain | 1 | holo-holo | beta_chain | 1 | -0.764093 | 4.45e-01 | False |
8 | apo-apo | beta_chain | 2 | apo-holo | beta_chain | 2 | -1.150451 | 2.50e-01 | False |
9 | apo-apo | beta_chain | 2 | holo-holo | beta_chain | 2 | 0.898933 | 3.69e-01 | False |
10 | apo-apo | beta_chain | 3 | apo-holo | beta_chain | 3 | -1.837288 | 6.62e-02 | False |
11 | apo-apo | beta_chain | 3 | holo-holo | beta_chain | 3 | 0.539360 | 5.90e-01 | False |
12 | apo-holo | alpha_chain | 1 | holo-holo | alpha_chain | 1 | 1.821754 | 6.85e-02 | False |
13 | apo-holo | alpha_chain | 2 | holo-holo | alpha_chain | 2 | 2.262062 | 2.37e-02 | False |
14 | apo-holo | alpha_chain | 3 | holo-holo | alpha_chain | 3 | 4.283584 | 1.84e-05 | True |
15 | apo-holo | beta_chain | 1 | holo-holo | beta_chain | 1 | 0.960114 | 3.37e-01 | False |
16 | apo-holo | beta_chain | 2 | holo-holo | beta_chain | 2 | 2.708013 | 6.77e-03 | False |
17 | apo-holo | beta_chain | 3 | holo-holo | beta_chain | 3 | 3.594272 | 3.25e-04 | True |
[58]:
combos = list(itertools.combinations(treatments, 2))
significance_level = 0.05 / len(combos)
statistics = []
p_vals = []
for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
stat, p_val = scipy.stats.ranksums(sample_x, sample_y)
statistics.append(stat)
p_vals.append(p_val)
pd.DataFrame({
'comparison_x': [name for ((name, _, _), _), _ in combos],
'chain_type_x': [name for ((_, name, _), _), _ in combos],
'cdr_x': [name for ((_, _, name), _), _ in combos],
'comparison_y': [name for _, ((name, _, _), _) in combos],
'chain_type_y': [name for _, ((_, name, _), _) in combos],
'cdr_y': [name for _, ((_, _, name), _) in combos],
'statistic': statistics,
'p_val': p_vals,
'significant': [p_val < significance_level for p_val in p_vals],
})
[58]:
comparison_x | chain_type_x | cdr_x | comparison_y | chain_type_y | cdr_y | statistic | p_val | significant | |
---|---|---|---|---|---|---|---|---|---|
0 | apo-apo | alpha_chain | 1 | apo-apo | alpha_chain | 2 | 0.689483 | 0.490520 | False |
1 | apo-apo | alpha_chain | 1 | apo-apo | alpha_chain | 3 | -2.134113 | 0.032834 | False |
2 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 1 | 0.558153 | 0.576740 | False |
3 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 2 | -0.820813 | 0.411753 | False |
4 | apo-apo | alpha_chain | 1 | apo-apo | beta_chain | 3 | -1.280468 | 0.200381 | False |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
148 | holo-holo | alpha_chain | 3 | holo-holo | beta_chain | 2 | 2.056509 | 0.039733 | False |
149 | holo-holo | alpha_chain | 3 | holo-holo | beta_chain | 3 | 0.727688 | 0.466805 | False |
150 | holo-holo | beta_chain | 1 | holo-holo | beta_chain | 2 | 0.284747 | 0.775838 | False |
151 | holo-holo | beta_chain | 1 | holo-holo | beta_chain | 3 | -1.328821 | 0.183907 | False |
152 | holo-holo | beta_chain | 2 | holo-holo | beta_chain | 3 | -1.676846 | 0.093573 | False |
153 rows × 9 columns
Similarily here, there is a significant difference between the comparison types but in this case the post-hoc testing shows a difference between the comparisons of apo-holo and holo-holo structures for CDR3 loops only.
Conclusion
Overall, this analysis has showed that all six CDR loops undergo bulk conformational changes between apo and holo states, but that only the CDR3 loops (both \(\alpha\)- and \(\beta\)-chain) deform when the loops are aligned to each other.