Comparison of apo and holo CDR loops

Introduction

The following notebook aims to quantify the movement of CDR domains in TCR variable regions between unbound (apo) and bound (holo) conformations. The holo conforomations in this case are TCRs bound to class I pMHC complexes.

[1]:

import os
import itertools

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import seaborn as sns

[2]:

DATA_DIR = '../data/processed/apo-holo-tcr-pmhc-class-I-comparisons'

[3]:

NOISE_LEVEL = 0.5 # Å

[4]:

summary_df = pd.read_csv('../data/processed/apo-holo-tcr-pmhc-class-I/apo_holo_summary.csv')

ids = summary_df['file_name'].str.replace('.pdb', '', regex=True)
ids.name = 'id'
summary_df = summary_df.set_index(ids)

summary_df

[4]:

	file_name	pdb_id	structure_type	state	alpha_chain	beta_chain	antigen_chain	mhc_chain1	mhc_chain2	cdr_sequences_collated	peptide_sequence	mhc_slug
id
1ao7_D-E-C-A-B_tcr_pmhc	1ao7_D-E-C-A-B_tcr_pmhc.pdb	1ao7	tcr_pmhc	holo	D	E	C	A	B	DRGSQS-IYSNGD-AVTTDSWGKLQ-MNHEY-SVGAGI-ASRPGLA...	LLFGYPVYV	hla_a_02_01
1bd2_D-E-C-A-B_tcr_pmhc	1bd2_D-E-C-A-B_tcr_pmhc.pdb	1bd2	tcr_pmhc	holo	D	E	C	A	B	NSMFDY-ISSIKDK-AAMEGAQKLV-MNHEY-SVGAGI-ASSYPGG...	LLFGYPVYV	hla_a_02_01
1bii_A-B-P_pmhc	1bii_A-B-P_pmhc.pdb	1bii	pmhc	apo	NaN	NaN	P	A	B	NaN	RGPGRAFVTI	h2_dd
1ddh_A-B-P_pmhc	1ddh_A-B-P_pmhc.pdb	1ddh	pmhc	apo	NaN	NaN	P	A	B	NaN	RGPGRAFVTI	h2_dd
1duz_A-B-C_pmhc	1duz_A-B-C_pmhc.pdb	1duz	pmhc	apo	NaN	NaN	C	A	B	NaN	LLFGYPVYV	hla_a_02_01
...	...	...	...	...	...	...	...	...	...	...	...	...
8gon_D-E-C-A-B_tcr_pmhc	8gon_D-E-C-A-B_tcr_pmhc.pdb	8gon	tcr_pmhc	holo	D	E	C	A	B	TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR...	NaN	NaN
8gop_A-B_tcr	8gop_A-B_tcr.pdb	8gop	tcr	apo	A	B	NaN	NaN	NaN	TSESDYY-QEAYKQQN-ASSGNTPLV-SGHNS-FNNNVP-ASTWGR...	NaN	NaN
8gvb_A-B-P-H-L_tcr_pmhc	8gvb_A-B-P-H-L_tcr_pmhc.pdb	8gvb	tcr_pmhc	holo	A	B	P	H	L	YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD...	RYPLTFGW	hla_a_24_02
8gvg_A-B-P-H-L_tcr_pmhc	8gvg_A-B-P-H-L_tcr_pmhc.pdb	8gvg	tcr_pmhc	holo	A	B	P	H	L	YGATPY-YFSGDTLV-AVGFTGGGNKLT-SEHNR-FQNEAQ-ASSD...	RFPLTFGW	hla_a_24_02
8gvi_A-B-P-H-L_tcr_pmhc	8gvi_A-B-P-H-L_tcr_pmhc.pdb	8gvi	tcr_pmhc	holo	A	B	P	H	L	YGATPY-YFSGDTLV-AVVFTGGGNKLT-SEHNR-FQNEAQ-ASSL...	RYPLTFGW	hla_a_24_02

358 rows × 12 columns

Calculating RMSD between conformations

RMSD is used to compute the difference between all states of the same TCR. Comparions between the different apo forms and the differenceholo forms are also done so that these can be used as a control for the differences in apo to holo comparisons. This process is done two different ways. The first, termed framework alignment, computes the comparison between TCRs after aligning on the framework region of the proteins. This method captures the “global” conformational changes that these loops undergo including any combined rotations and deformations. The second method, termed loop alignment, does the same calculations as the first method, but after aligning the two loops being compared on their backbones. This method captures the deformation aspects of loops regardless of any hinging from the framework region.

[5]:

def categorize_movement(rmsd: float) -> str:
    if rmsd < NOISE_LEVEL:
        return f'Little Movement (<{NOISE_LEVEL} Å)'

    if NOISE_LEVEL <= rmsd < 1.0:
        return f'Some Movement ({NOISE_LEVEL} to 1.0 Å)'

    if 1.0 <= rmsd < 2.0:
        return 'Movement (1.0 to 2.0 Å)'

    if 2.0 <= rmsd < 4.0:
        return 'Large Movement (2.0 to 4.0 Å)'

    if 4.0 <= rmsd:
        return 'Significant Movement (>4.0 Å)'


movement_order = pd.CategoricalDtype(categories=[f'Little Movement (<{NOISE_LEVEL} Å)',
                                                 f'Some Movement ({NOISE_LEVEL} to 1.0 Å)',
                                                 'Movement (1.0 to 2.0 Å)',
                                                 'Large Movement (2.0 to 4.0 Å)',
                                                 'Significant Movement (>4.0 Å)'], ordered=True)

Framework Alignment

[6]:

results_framework = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_results.csv'))

[7]:

results_framework = results_framework.merge(
    summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_index=True,
)

[8]:

results_framework_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_fw_align_holo.csv'))

[9]:

results_framework_holo_holo['cdr_sequences_collated'] = None

results_framework_holo_holo['mhc_slug'] = None
results_framework_holo_holo['peptide_sequence'] = None

[10]:

cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_framework_holo_holo['complex_id'].str.contains(cdr_pattern)

holo_cdr_sequences_collated = results_framework_holo_holo[cdr_complex_ids]['complex_id']
results_framework_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated

[11]:

results_framework_holo_holo = results_framework_holo_holo.query('cdr_sequences_collated.notnull()')

[12]:

results_framework = pd.concat([results_framework, results_framework_holo_holo])

[13]:

results_framework = results_framework.merge(
    summary_df[['file_name','pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
    suffixes=('_x', '_y')
)

[14]:

results_framework['comparison'] = results_framework['state_x'] + '-' + results_framework['state_y']
results_framework['comparison'] = results_framework['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)

[15]:

results_framework['structure_comparison'] = results_framework.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_framework = results_framework.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])

[16]:

results_framework = results_framework.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()

[17]:

results_framework['movement'] = results_framework['rmsd'].map(categorize_movement).astype(movement_order)

[18]:

results_framework

[18]:

	cdr_sequences_collated	comparison	cdr	chain_type	rmsd	movement
0	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	1	alpha_chain	1.193604	Movement (1.0 to 2.0 Å)
1	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	1	beta_chain	0.473654	Little Movement (<0.5 Å)
2	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	2	alpha_chain	0.905868	Some Movement (0.5 to 1.0 Å)
3	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	2	beta_chain	0.865963	Some Movement (0.5 to 1.0 Å)
4	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	3	alpha_chain	2.635113	Large Movement (2.0 to 4.0 Å)
...	...	...	...	...	...	...
318	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	1	beta_chain	0.646786	Some Movement (0.5 to 1.0 Å)
319	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	2	alpha_chain	2.650761	Large Movement (2.0 to 4.0 Å)
320	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	2	beta_chain	0.786869	Some Movement (0.5 to 1.0 Å)
321	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	3	alpha_chain	1.956881	Movement (1.0 to 2.0 Å)
322	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	3	beta_chain	3.343748	Large Movement (2.0 to 4.0 Å)

323 rows × 6 columns

Loop Alignment

[19]:

results_loop = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_results.csv'))

[20]:

results_loop= results_loop.merge(
    summary_df[['cdr_sequences_collated', 'peptide_sequence', 'mhc_slug']],
    how='left',
    left_on='complex_id',
    right_index=True,
)

[21]:

results_loop_holo_holo = pd.read_csv(os.path.join(DATA_DIR, 'rmsd_cdr_loop_align_holo.csv'))

[22]:

results_loop_holo_holo['cdr_sequences_collated'] = None

results_loop_holo_holo['mhc_slug'] = None
results_loop_holo_holo['peptide_sequence'] = None

[23]:

cdr_pattern = r'[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+-[A-Z]+'
cdr_complex_ids = results_loop_holo_holo['complex_id'].str.contains(cdr_pattern)

holo_cdr_sequences_collated = results_loop_holo_holo[cdr_complex_ids]['complex_id']
results_loop_holo_holo.loc[cdr_complex_ids, 'cdr_sequences_collated'] = holo_cdr_sequences_collated

[24]:

results_loop_holo_holo = results_loop_holo_holo.query('cdr_sequences_collated.notnull()')

[25]:

results_loop = pd.concat([results_loop, results_loop_holo_holo])

[26]:

results_loop = results_loop.merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_x_name',
    right_on='file_name',
).merge(
    summary_df[['file_name', 'pdb_id', 'structure_type', 'state', 'alpha_chain', 'beta_chain', 'antigen_chain', 'mhc_chain1', 'mhc_chain2']],
    how='left',
    left_on='structure_y_name',
    right_on='file_name',
    suffixes=('_x', '_y')
)

[27]:

results_loop['comparison'] = results_loop['state_x'] + '-' + results_loop['state_y']
results_loop['comparison'] = results_loop['comparison'].map(lambda entry: 'apo-holo' if entry == 'holo-apo' else entry)

[28]:

results_loop['structure_comparison'] = results_loop.apply(
    lambda row: '-'.join(sorted([row.structure_x_name, row.structure_y_name])),
    axis='columns',
)
results_loop = results_loop.drop_duplicates(['structure_comparison', 'chain_type', 'cdr'])

[29]:

results_loop = results_loop.groupby(['cdr_sequences_collated', 'comparison', 'cdr', 'chain_type'])['rmsd'].mean().reset_index()

[30]:

results_loop['movement'] = results_loop['rmsd'].map(categorize_movement).astype(movement_order)

[31]:

results_loop

[31]:

	cdr_sequences_collated	comparison	cdr	chain_type	rmsd	movement
0	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	1	alpha_chain	0.801538	Some Movement (0.5 to 1.0 Å)
1	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	1	beta_chain	0.239049	Little Movement (<0.5 Å)
2	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	2	alpha_chain	0.725440	Some Movement (0.5 to 1.0 Å)
3	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	2	beta_chain	0.774079	Some Movement (0.5 to 1.0 Å)
4	ATGYPS-ATKADDK-ALSDPVNDMR-SGHAT-FQNNGV-ASSLRGR...	apo-holo	3	alpha_chain	1.210009	Movement (1.0 to 2.0 Å)
...	...	...	...	...	...	...
318	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	1	beta_chain	0.113593	Little Movement (<0.5 Å)
319	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	2	alpha_chain	1.082823	Movement (1.0 to 2.0 Å)
320	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	2	beta_chain	0.201656	Little Movement (<0.5 Å)
321	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	3	alpha_chain	1.475462	Movement (1.0 to 2.0 Å)
322	YSGSPE-HISR-ALSGFNNAGNMLT-SGHAT-FQNNGV-ASSLGGA...	apo-holo	3	beta_chain	2.503307	Large Movement (2.0 to 4.0 Å)

323 rows × 6 columns

Visualizing and analysing the results

Measuring the distance between apo and holo confomations

Since RMSD was calculated for all pairings of the same TCR, the apo-holo (or holo-apo) comparisons were selected.

Framework Alignment

[32]:

results_framework['chain_type_formatted'] = results_framework['chain_type'].str.replace('_', ' ').str.title()

[33]:

sns.boxplot(data=results_framework.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')

x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))

plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)

plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')

plt.show()

results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()

../_images/source_Comparison_of_apo_and_holo_CDR_loops_40_0.png

[33]:

		count	mean	std	min	25%	50%	75%	max
chain_type	cdr
alpha_chain	1	25.0	1.625591	1.126121	0.172478	0.849734	1.308559	2.056843	5.547374
	2	24.0	1.360652	0.725418	0.496448	0.894783	1.121992	1.509622	3.170339
	3	25.0	2.205034	1.666278	0.157864	1.124128	1.678234	2.635113	7.883364
beta_chain	1	25.0	0.818192	0.428709	0.211640	0.504221	0.770227	0.983855	2.106660
	2	25.0	0.879987	0.718325	0.311457	0.526078	0.705481	0.947803	3.928769
	3	25.0	1.375520	0.977938	0.142801	0.705002	1.001740	1.823828	3.820573

[34]:

sns.displot(results_framework.query("comparison == 'apo-holo'"),
            x='rmsd', col='cdr', row='chain_type_formatted')

[34]:

<seaborn.axisgrid.FacetGrid at 0x7f9d0735eb30>

../_images/source_Comparison_of_apo_and_holo_CDR_loops_41_1.png

[35]:

treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))

KruskalResult(statistic=33.47222908277405, pvalue=3.0322000362602383e-06)

[36]:

combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
          for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
          if cdr_x == cdr_y]

significance_level = 0.05 / len(combos)

framework_stats = {
    'chain_type_x': [],
    'cdr_x': [],
    'chain_type_y': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    framework_stats['chain_type_x'].append(chain_x)
    framework_stats['cdr_x'].append(cdr_x)
    framework_stats['chain_type_y'].append(chain_y)
    framework_stats['cdr_y'].append(cdr_y)

    framework_stats['statistic'].append(stat)
    framework_stats['p_val'].append(p_val)

framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] = framework_stats['p_val'] < significance_level

framework_stats

[36]:

	chain_type_x	cdr_x	chain_type_y	cdr_y	statistic	p_val	significant
0	alpha_chain	1	beta_chain	1	3.346992	0.000817	True
1	alpha_chain	2	beta_chain	2	3.380000	0.000725	True
2	alpha_chain	3	beta_chain	3	2.202223	0.027650	False

[37]:

combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
                                       for cdr, df in results_framework.query("comparison == 'apo-holo'").groupby('cdr')], 2))

significance_level = 0.05 / len(combos)

framework_stats = {
    'cdr_x': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)


    framework_stats['cdr_x'].append(cdr_x)
    framework_stats['cdr_y'].append(cdr_y)

    framework_stats['statistic'].append(stat)
    framework_stats['p_val'].append(p_val)

framework_stats = pd.DataFrame(framework_stats)
framework_stats['significant'] =  framework_stats['p_val'] < significance_level

framework_stats

[37]:

	cdr_x	cdr_y	statistic	p_val	significant
0	1	2	0.335930	0.736924	False
1	1	3	-2.343899	0.019083	False
2	2	3	-2.820412	0.004796	True

The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR.

[38]:

movement_counts = results_framework.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]

ax = movement_counts.plot.pie(title='CDR Movement from Framework Alignment',
                              autopct='%1.2f%%',
                              legend=True,
                              ylabel='',
                              labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')

plt.show()

../_images/source_Comparison_of_apo_and_holo_CDR_loops_46_0.png

Loop Alignment

[39]:

results_loop['chain_type_formatted'] = results_loop['chain_type'].str.replace('_', ' ').str.title()

[40]:

sns.boxplot(data=results_loop.query("comparison == 'apo-holo'"), y='rmsd', x='cdr', hue='chain_type_formatted')

x = np.linspace(-0.75, 2.75)
y = np.repeat(NOISE_LEVEL, len(x))

plt.plot(x, y, '--r', label=f'Noise Level ({NOISE_LEVEL: .2f} Å)')
plt.xlim(-0.75, 2.75)

plt.legend(title='Chain Type')
plt.xlabel('CDR')
plt.ylabel('RMSD (Å)')

plt.show()

results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])['rmsd'].describe()

../_images/source_Comparison_of_apo_and_holo_CDR_loops_49_0.png

[40]:

		count	mean	std	min	25%	50%	75%	max
chain_type	cdr
alpha_chain	1	25.0	0.539703	0.370003	0.037604	0.265344	0.398626	0.865431	1.317612
	2	24.0	0.644121	0.446783	0.215395	0.294647	0.439994	0.979735	1.684088
	3	25.0	1.121232	0.650441	0.077000	0.584030	1.083251	1.423754	2.495455
beta_chain	1	25.0	0.279281	0.184982	0.110227	0.149009	0.214492	0.363066	0.744923
	2	25.0	0.339790	0.232802	0.056751	0.191979	0.248698	0.398983	0.988705
	3	25.0	0.886316	0.691293	0.102327	0.341379	0.693077	1.220448	2.503307

[41]:

sns.displot(results_loop.query("comparison == 'apo-holo'"),
            x='rmsd', col='cdr', row='chain_type_formatted')

[41]:

<seaborn.axisgrid.FacetGrid at 0x7f9d03eedc30>

../_images/source_Comparison_of_apo_and_holo_CDR_loops_50_1.png

[42]:

treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.query("comparison == 'apo-holo'").groupby(['chain_type', 'cdr'])]
print(scipy.stats.kruskal(*[values for _, values in treatments]))

KruskalResult(statistic=48.69824429530195, pvalue=2.5578145006489472e-09)

[43]:

combos = [(((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y))
          for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in itertools.combinations(treatments, 2)
          if cdr_x == cdr_y]

significance_level = 0.05 / len(combos)

loop_stats = {
    'chain_x': [],
    'cdr_x': [],
    'chain_y': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for ((chain_x, cdr_x), sample_x), ((chain_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    loop_stats['chain_x'].append(chain_x)
    loop_stats['cdr_x'].append(cdr_x)
    loop_stats['chain_y'].append(chain_y)
    loop_stats['cdr_y'].append(cdr_y)

    loop_stats['statistic'].append(stat)
    loop_stats['p_val'].append(p_val)

loop_stats = pd.DataFrame(loop_stats)

loop_stats['significant'] = loop_stats['p_val'] < significance_level

loop_stats

[43]:

	chain_x	cdr_x	chain_y	cdr_y	statistic	p_val	significant
0	alpha_chain	1	beta_chain	1	3.172366	0.001512	True
1	alpha_chain	2	beta_chain	2	3.120000	0.001809	True
2	alpha_chain	3	beta_chain	3	1.600735	0.109436	False

[44]:

combos = list(itertools.combinations([(cdr, df['rmsd'].to_numpy())
                                      for cdr, df in results_loop.query("comparison == 'apo-holo'").groupby('cdr')], 2))

significance_level = 0.05 / len(combos)

loop_stats = {
    'cdr_x': [],
    'cdr_y': [],
    'statistic': [],
    'p_val': [],
}

for (cdr_x, sample_x), (cdr_y, sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    loop_stats['cdr_x'].append(cdr_x)
    loop_stats['cdr_y'].append(cdr_y)

    loop_stats['statistic'].append(stat)
    loop_stats['p_val'].append(p_val)

loop_stats = pd.DataFrame(loop_stats)

loop_stats['significant'] = loop_stats['p_val'] < significance_level

loop_stats

[44]:

	cdr_x	cdr_y	statistic	p_val	significant
0	1	2	-1.252739	2.103007e-01	False
1	1	3	-5.053170	4.345369e-07	True
2	2	3	-4.325099	1.524634e-05	True

The figure above shows there appears to be an increase in the movement of CDR3 compared to CDR1 and CDR2 and that the alpha chain has more movement than the beta chain of the TCR. In this alignment procedure, the CDR-1 and CDR-2 loops median drop below the noise level line, indicating that these show different effects to the CDR3 loops.

[45]:

movement_counts = results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()
movement_counts = movement_counts[movement_counts > 0]

ax = movement_counts.plot.pie(title='CDR Movement from Loop Alignment',
                              autopct='%1.2f%%',
                              legend=True,
                              ylabel='',
                              labeldistance=None)
ax.legend(bbox_to_anchor=(1, 1), loc='upper left')

plt.show()

../_images/source_Comparison_of_apo_and_holo_CDR_loops_55_0.png

[46]:

results_loop.query("comparison == 'apo-holo'")['movement'].dropna().value_counts()

[46]:

Little Movement (<0.5 Å)         83
Some Movement (0.5 to 1.0 Å)     37
Movement (1.0 to 2.0 Å)          23
Large Movement (2.0 to 4.0 Å)     6
Significant Movement (>4.0 Å)     0
Name: movement, dtype: int64

Comparison of apo-holo, apo-apo and holo-holo conformational changes

The following analysis aims to ascertain whether there is notable movement in the CDR domains between the apo and holo conformations, using apo-apo and holo-holo differences as controls.

Framework Alignment

[47]:

sns.catplot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
            x='comparison',
            y='rmsd',
            col='cdr',
            row='chain_type',
            kind='box')

../_images/source_Comparison_of_apo_and_holo_CDR_loops_59_0.png

[48]:

sns.displot(results_framework.sort_values(['comparison', 'chain_type', 'cdr']),
            hue='comparison',
            x='rmsd',
            col='cdr',
            row='chain_type',
            kind='kde')

[48]:

<seaborn.axisgrid.FacetGrid at 0x7f9d02e1fa00>

../_images/source_Comparison_of_apo_and_holo_CDR_loops_60_1.png

[49]:

results_framework.groupby(['comparison', 'chain_type', 'cdr']).size()

[49]:

comparison  chain_type   cdr
apo-apo     alpha_chain  1      11
                         2      11
                         3      11
            beta_chain   1      11
                         2      11
                         3      11
apo-holo    alpha_chain  1      25
                         2      24
                         3      25
            beta_chain   1      25
                         2      25
                         3      25
holo-holo   alpha_chain  1      18
                         2      18
                         3      18
            beta_chain   1      18
                         2      18
                         3      18
dtype: int64

[50]:

treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_framework.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))

KruskalResult(statistic=112.0659833804159, pvalue=4.924957542598049e-16)

[51]:

combos = []
for pairing in list(itertools.combinations(treatments, 2)):
    chain_type_x = pairing[0][0][1]
    cdr_x = pairing[0][0][2]
    chain_type_y = pairing[1][0][1]
    cdr_y = pairing[1][0][2]

    if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
        combos.append(pairing)

significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

momvement_statistics_fw = pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

momvement_statistics_fw['p_val'] = momvement_statistics_fw['p_val'].map(lambda num: f'{num:.2e}')

momvement_statistics_fw

0.002777777777777778

[51]:

	comparison_x	chain_type_x	cdr_x	comparison_y	chain_type_y	cdr_y	statistic	p_val	significant
0	apo-apo	alpha_chain	1	apo-holo	alpha_chain	1	-2.558466	1.05e-02	False
1	apo-apo	alpha_chain	1	holo-holo	alpha_chain	1	0.404520	6.86e-01	False
2	apo-apo	alpha_chain	2	apo-holo	alpha_chain	2	-2.274141	2.30e-02	False
3	apo-apo	alpha_chain	2	holo-holo	alpha_chain	2	0.404520	6.86e-01	False
4	apo-apo	alpha_chain	3	apo-holo	alpha_chain	3	-2.215048	2.68e-02	False
5	apo-apo	alpha_chain	3	holo-holo	alpha_chain	3	2.112493	3.46e-02	False
6	apo-apo	beta_chain	1	apo-holo	beta_chain	1	-1.356502	1.75e-01	False
7	apo-apo	beta_chain	1	holo-holo	beta_chain	1	1.303453	1.92e-01	False
8	apo-apo	beta_chain	2	apo-holo	beta_chain	2	-0.188880	8.50e-01	False
9	apo-apo	beta_chain	2	holo-holo	beta_chain	2	2.382173	1.72e-02	False
10	apo-apo	beta_chain	3	apo-holo	beta_chain	3	-1.390844	1.64e-01	False
11	apo-apo	beta_chain	3	holo-holo	beta_chain	3	2.112493	3.46e-02	False
12	apo-holo	alpha_chain	1	holo-holo	alpha_chain	1	3.495798	4.73e-04	True
13	apo-holo	alpha_chain	2	holo-holo	alpha_chain	2	3.507467	4.52e-04	True
14	apo-holo	alpha_chain	3	holo-holo	alpha_chain	3	4.529767	5.90e-06	True
15	apo-holo	beta_chain	1	holo-holo	beta_chain	1	3.372707	7.44e-04	True
16	apo-holo	beta_chain	2	holo-holo	beta_chain	2	3.692745	2.22e-04	True
17	apo-holo	beta_chain	3	holo-holo	beta_chain	3	3.988164	6.66e-05	True

[52]:

combos = list(itertools.combinations(treatments, 2))

significance_level = 0.05 / len(combos)

statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

[52]:

	comparison_x	chain_type_x	cdr_x	comparison_y	chain_type_y	cdr_y	statistic	p_val	significant
0	apo-apo	alpha_chain	1	apo-apo	alpha_chain	2	-0.229828	0.818226	False
1	apo-apo	alpha_chain	1	apo-apo	alpha_chain	3	-1.214803	0.224441	False
2	apo-apo	alpha_chain	1	apo-apo	beta_chain	1	0.164163	0.869603	False
3	apo-apo	alpha_chain	1	apo-apo	beta_chain	2	-0.361158	0.717982	False
4	apo-apo	alpha_chain	1	apo-apo	beta_chain	3	-0.623818	0.532747	False
...	...	...	...	...	...	...	...	...	...
148	holo-holo	alpha_chain	3	holo-holo	beta_chain	2	0.601133	0.547751	False
149	holo-holo	alpha_chain	3	holo-holo	beta_chain	3	0.664411	0.506428	False
150	holo-holo	beta_chain	1	holo-holo	beta_chain	2	0.506218	0.612704	False
151	holo-holo	beta_chain	1	holo-holo	beta_chain	3	0.474579	0.635087	False
152	holo-holo	beta_chain	2	holo-holo	beta_chain	3	-0.158193	0.874305	False

153 rows × 9 columns

The analysis of the plots and statistical tests shows that there is a statistically significant (p-value: 0.05) difference between the target and controls (as seen by the results of the Kruskal-Wallis test) but it also shows that there is a significant difference between the apo-holo comparisons and the holo-holo baseline.

Loop Alignment

[53]:

sns.catplot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
            x='comparison',
            y='rmsd',
            col='cdr',
            row='chain_type',
            kind='box')

../_images/source_Comparison_of_apo_and_holo_CDR_loops_67_0.png

[54]:

sns.displot(results_loop.sort_values(['comparison', 'chain_type', 'cdr']),
            x='rmsd',
            hue='comparison',
            col='cdr',
            row='chain_type',
            kind='kde')

[54]:

<seaborn.axisgrid.FacetGrid at 0x7f9d0366b1f0>

../_images/source_Comparison_of_apo_and_holo_CDR_loops_68_1.png

[55]:

results_loop.groupby(['comparison', 'chain_type', 'cdr']).size()

[55]:

comparison  chain_type   cdr
apo-apo     alpha_chain  1      11
                         2      11
                         3      11
            beta_chain   1      11
                         2      11
                         3      11
apo-holo    alpha_chain  1      25
                         2      24
                         3      25
            beta_chain   1      25
                         2      25
                         3      25
holo-holo   alpha_chain  1      18
                         2      18
                         3      18
            beta_chain   1      18
                         2      18
                         3      18
dtype: int64

[56]:

treatment_options = ['comparison', 'chain_type', 'cdr']
treatments = [(group, df['rmsd'].to_numpy()) for group, df in results_loop.groupby(treatment_options)]
treatments
print(scipy.stats.kruskal(*[values for _, values in treatments]))

KruskalResult(statistic=106.74425009179026, pvalue=4.928764973630228e-15)

[57]:

combos = []
for pairing in list(itertools.combinations(treatments, 2)):
    chain_type_x = pairing[0][0][1]
    cdr_x = pairing[0][0][2]
    chain_type_y = pairing[1][0][1]
    cdr_y = pairing[1][0][2]

    if (chain_type_x, cdr_x) == (chain_type_y, cdr_y):
        combos.append(pairing)

significance_level = 0.05 / len(combos)
print(significance_level)
statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

momvement_statistics_loop = pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

momvement_statistics_loop['p_val'] = momvement_statistics_loop['p_val'].map(lambda num: f'{num:.2e}')

momvement_statistics_loop

0.002777777777777778

[57]:

	comparison_x	chain_type_x	cdr_x	comparison_y	chain_type_y	cdr_y	statistic	p_val	significant
0	apo-apo	alpha_chain	1	apo-holo	alpha_chain	1	-2.592808	9.52e-03	False
1	apo-apo	alpha_chain	1	holo-holo	alpha_chain	1	-1.663026	9.63e-02	False
2	apo-apo	alpha_chain	2	apo-holo	alpha_chain	2	-3.979747	6.90e-05	True
3	apo-apo	alpha_chain	2	holo-holo	alpha_chain	2	-2.517013	1.18e-02	False
4	apo-apo	alpha_chain	3	apo-holo	alpha_chain	3	-1.974655	4.83e-02	False
5	apo-apo	alpha_chain	3	holo-holo	alpha_chain	3	1.303453	1.92e-01	False
6	apo-apo	beta_chain	1	apo-holo	beta_chain	1	-1.665579	9.58e-02	False
7	apo-apo	beta_chain	1	holo-holo	beta_chain	1	-0.764093	4.45e-01	False
8	apo-apo	beta_chain	2	apo-holo	beta_chain	2	-1.150451	2.50e-01	False
9	apo-apo	beta_chain	2	holo-holo	beta_chain	2	0.898933	3.69e-01	False
10	apo-apo	beta_chain	3	apo-holo	beta_chain	3	-1.837288	6.62e-02	False
11	apo-apo	beta_chain	3	holo-holo	beta_chain	3	0.539360	5.90e-01	False
12	apo-holo	alpha_chain	1	holo-holo	alpha_chain	1	1.821754	6.85e-02	False
13	apo-holo	alpha_chain	2	holo-holo	alpha_chain	2	2.262062	2.37e-02	False
14	apo-holo	alpha_chain	3	holo-holo	alpha_chain	3	4.283584	1.84e-05	True
15	apo-holo	beta_chain	1	holo-holo	beta_chain	1	0.960114	3.37e-01	False
16	apo-holo	beta_chain	2	holo-holo	beta_chain	2	2.708013	6.77e-03	False
17	apo-holo	beta_chain	3	holo-holo	beta_chain	3	3.594272	3.25e-04	True

[58]:

combos = list(itertools.combinations(treatments, 2))

significance_level = 0.05 / len(combos)

statistics = []
p_vals = []

for ((comparison_x, chain_type_x, cdr_x), sample_x), ((comparison_y, chain_type_y, cdr_y), sample_y) in combos:
    stat, p_val = scipy.stats.ranksums(sample_x, sample_y)

    statistics.append(stat)
    p_vals.append(p_val)

pd.DataFrame({
    'comparison_x': [name for ((name, _, _), _), _ in combos],
    'chain_type_x': [name for ((_, name, _), _), _ in combos],
    'cdr_x': [name for ((_, _, name), _), _ in combos],
    'comparison_y': [name for _, ((name, _, _), _) in combos],
    'chain_type_y': [name for _, ((_, name, _), _) in combos],
    'cdr_y': [name for _, ((_, _, name), _) in combos],
    'statistic': statistics,
    'p_val': p_vals,
    'significant': [p_val < significance_level for p_val in p_vals],
})

[58]:

	comparison_x	chain_type_x	cdr_x	comparison_y	chain_type_y	cdr_y	statistic	p_val	significant
0	apo-apo	alpha_chain	1	apo-apo	alpha_chain	2	0.689483	0.490520	False
1	apo-apo	alpha_chain	1	apo-apo	alpha_chain	3	-2.134113	0.032834	False
2	apo-apo	alpha_chain	1	apo-apo	beta_chain	1	0.558153	0.576740	False
3	apo-apo	alpha_chain	1	apo-apo	beta_chain	2	-0.820813	0.411753	False
4	apo-apo	alpha_chain	1	apo-apo	beta_chain	3	-1.280468	0.200381	False
...	...	...	...	...	...	...	...	...	...
148	holo-holo	alpha_chain	3	holo-holo	beta_chain	2	2.056509	0.039733	False
149	holo-holo	alpha_chain	3	holo-holo	beta_chain	3	0.727688	0.466805	False
150	holo-holo	beta_chain	1	holo-holo	beta_chain	2	0.284747	0.775838	False
151	holo-holo	beta_chain	1	holo-holo	beta_chain	3	-1.328821	0.183907	False
152	holo-holo	beta_chain	2	holo-holo	beta_chain	3	-1.676846	0.093573	False

153 rows × 9 columns

Similarily here, there is a significant difference between the comparison types but in this case the post-hoc testing shows a difference between the comparisons of apo-holo and holo-holo structures for CDR3 loops only.

Conclusion

Overall, this analysis has showed that all six CDR loops undergo bulk conformational changes between apo and holo states, but that only the CDR3 loops (both \(\alpha\)- and \(\beta\)-chain) deform when the loops are aligned to each other.