Volume 69, Issue 4 pp. 269-279

In Silico Prediction of SARS Protease Inhibitors by Virtual High Throughput Screening

Dariusz Plewczynski

Corresponding Author

Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego 5a Street, 02-106 Warsaw, Poland

* Dariusz Plewczynski, [email protected]

Marcin Hoffmann

BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland

Marcin Von Grotthuss

BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland

Krzysztof Ginalski

Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Pawinskiego 5a Street, 02-106 Warsaw, Poland

Leszek Rychlewski

BioInfoBank Institute, Limanowskiego 24A/16, 60-744 Poznan, Poland

First published: 24 April 2007

Abstract

A structure-based in silico virtual drug discovery procedure was assessed with the severe acute respiratory syndrome coronavirus main protease serving as a case study. First, potential compounds were extracted from protein–ligand complexes selected from the Protein Data Bank on the basis of structural similarity to the target protein. Next, the set of compounds was ranked by docking score using the Electronic High-Throughput Screening (eHiTS) flexible docking procedure to select the most promising molecules. The best-performing compounds were then used as queries for a similarity search over the 1 million entries in the Ligand.Info Meta-Database. Selected molecules with a close structural relationship to 2-methyl-2,4-pentanediol may provide candidate lead compounds for the development of novel allosteric severe acute respiratory syndrome protease inhibitors.

Severe acute respiratory syndrome coronavirus (SARS-CoV) was recently identified as a potent human pathogen capable of rapid global spread. The disease it causes is a life-threatening form of pneumonia characterized by high fever, nonproductive cough, chills, myalgia, lymphopenia, and progressive infiltrates on chest radiography (1). After its emergence a few years ago, the epidemic spread rapidly, facilitated by international air travel, from its origin in Guangdong Province, China, to many other countries. The World Health Organization (WHO) has reported over 8000 SARS cases and nearly 800 deaths resulting from infection with SARS-CoV (2). Since about 2003, various SARS-CoV protein targets for drug discovery have been identified, including the SARS protease, polymerase, and helicase (3).

This study describes an in silico method that captures key features of potential inhibitor molecules to provide specificity and address opportunities for chemical biology and drug design (4). Our approach is based on experimental information contained in publicly available databases and therefore provides a foundation for experimental validation. Genomic research provides an ever-increasing number of potential drug targets, and structural biology allows intensive use of available, experimentally verified structural data in computational projects. In this study, we exploited structural homologs of the SARS-CoV protease co-crystallized with small molecules to explore opportunities for the design of potential inhibitors of this therapeutic target enzyme.

The crystal structures for all members of the Structural Classification of Proteins (SCOP; 5–7) protein family of viral cysteine proteases of trypsin-fold were extracted from the Protein Data Bank (PDB) (8,9). This family is part of the trypsin-like serine protease superfamily, which has a closed barrel-type structure and consists of two domains of the same Greek-key duplication. The SCOP family of viral cysteine proteases encompasses three groups of structures. The first group is the 3C cysteine protease (picornain 3C), exemplified by three proteins: human rhinovirus type 2 (1CQQ), human hepatitis A virus (1QA7, 1HAV), and poliovirus type I (1L1N). The second group is the 2A cysteine proteinase, exemplified solely by human rhinovirus 2 (2HRV). The third group is the coronavirus main proteinase (3Cl-pro, putative coronavirus nsp2), exemplified by four proteins: transmissible gastroenteritis virus (1LVO); transmissible gastroenteritis virus (1P9U); human coronavirus (1P9S); and SARS coronavirus (1Q2W, 1UJ1, 1UK2, 1UK3, 1UK4). In addition, a sequence similarity search was performed against all proteins in the PDB in order to find homologous protein structures not included in the recent version of the SCOP database. The ligands co-crystallized with these cysteine proteases are chemically diverse and include peptides, small molecules, and inorganic salts. Table 1 summarizes the known ligands co-crystallized with the SCOP family of viral cysteine proteases of trypsin-fold and deposited in the PDB, together with the resolution of each structure and whether each ligand binds inside or outside the common active site. Of these ligands, two peptides and one small molecule (chloroacetone) have been reported in the crystal structure of the SARS coronavirus protease (1UK4).
Our approach considers such molecules to be relevant candidate lead compounds (or lead fragments) for further chemical modification (e.g. peptide bond replacement by non-hydrolyzable bioisosteres). These initial lead compounds do not represent a comprehensive list, as they are limited to relevant structures deposited in the PDB. Furthermore, some molecules that have been reported in the context of SARS drug discovery (10–17) were not included in this investigation.
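
A ligand inventory of the kind summarized in Table 1 can be approximated by scanning the HETATM records of each PDB entry. The following is a minimal sketch, not the procedure used in this study: the excerpt is invented (truncated to the identifier columns, with made-up serial and residue numbers), and the helper name `list_ligands` is hypothetical. Only the column positions follow the fixed-width PDB format.

```python
from collections import defaultdict

# Invented excerpt; real entries carry coordinates in the later columns.
PDB_EXCERPT = """\
ATOM      1  N   SER A   1
HETATM 2001  C1  MPD A 401
HETATM 2002  O2  MPD A 401
HETATM 2010  S   SO4 B 402
HETATM 2020  O   HOH A 501
"""


def list_ligands(pdb_text, skip=("HOH",)):
    """Map each hetero residue name to its sorted (chain, residue number) sites."""
    found = defaultdict(set)
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue  # protein atoms are stored in ATOM records
        res_name = line[17:20].strip()  # columns 18-20: residue name
        if res_name in skip:
            continue  # water is skipped by default (an assumption of this sketch)
        chain = line[21]                # column 22: chain identifier
        res_seq = int(line[22:26])      # columns 23-26: residue sequence number
        found[res_name].add((chain, res_seq))
    return {name: sorted(sites) for name, sites in found.items()}


ligands = list_ligands(PDB_EXCERPT)
print(ligands)
```

Deciding whether a ligand lies inside or outside the common active site would additionally require the atomic coordinates and a structural alignment, which this sketch does not attempt.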

Table 1. The list of structures from the SCOP family of viral cysteine proteases of trypsin-fold with their inhibitors or substructures of small chemical molecules or peptides co-crystallized with those proteins
| PDB code | PDB name | Chains | Active site | Resolution (Å) | Ligand short name |
| 1CQQ | Human rhinovirus type 2 | A | + | 1.85 | AG7 |
| 1QA7 | Human hepatitis A virus | A, B, C, D | + | 1.9 | ACE |
|  |  |  | + |  | VAL |
|  |  |  | + |  | NFA |
|  |  |  |  |  | DMS |
|  |  |  |  |  | GOL |
| 1HAV | Human hepatitis A virus | A, B | + | 2.0 | OCS |
|  |  |  |  |  | CL |
| 1L1N | Poliovirus type I | A, B |  | 2.1 |  |
| 2HRV | Human rhinovirus 2 | A, B |  | 1.95 | ZN |
| 1LVO | Transmissible gastroenteritis virus | A, B, C, D, E, F |  | 1.96 | DOX |
|  |  |  | + |  | MPD |
|  |  |  |  |  | SO4 |
| 1P9U | Transmissible gastroenteritis virus | A, B, C, D, E, F | + | 2.37 | MPD |
|  |  |  |  |  | SO4 |
|  |  |  | + |  | CH2 |
|  |  |  | + |  | Short peptide |
| 1P9S | Human coronavirus | A, B | + | 2.54 | DOX |
|  |  |  | +/− |  | MSE |
| 1Q2W | SARS coronavirus | A, B |  | 1.86 | MPD |
| 1UJ1 | SARS coronavirus | A, B |  | 1.9 |  |
| 1UK3 | SARS coronavirus | A, B |  | 2.4 |  |
| 1UK2 | SARS coronavirus | A, B |  | 2.2 |  |
| 1UK4 | SARS coronavirus | A, B | + | 2.5 | ATO |
|  |  |  | GH+, K− |  | Short peptides |
SCOP, Structural Classification of Proteins; PDB, Protein Data Bank; SARS, severe acute respiratory syndrome.

The Swiss-PdbViewer software (18) was used to align the analyzed structures in three-dimensional (3D) space. First, we divided all available protein chains into single domains. The domains were then structurally aligned in order to analyze the binding mode of each ligand in its active site. From the aligned structures we extracted the 1UK4 protein active site together with all ligands located within it. In our drug design strategy, we used 1UK4 as the template for a flexible docking experiment in order to adjust the conformation of the ligands to the new structural context. The target structure with structurally aligned ligands (co-crystallized with structurally homologous proteins) was then analyzed further in order to address inconsistencies in the PDB entries for certain ligands. In some instances, ligands co-crystallized in protein structures deposited in the PDB lacked defined atoms or functional groups, or had a deformed 3D representation. In the case of the SARS-CoV mPro enzyme (1UK4), the ligand chloroacetone was represented only by its chlorine atom, so the oxygen and carbon atoms had to be added to obtain a complete molecule. The same structure (1UK4) contains a pentapeptide inhibitor whose C-terminus (a carboxylic acid) was deformed to such a degree that the carbon atom had sp3 instead of sp2 hybridization. Moreover, the distance between the two oxygen atoms in the terminal carboxylate moiety was only 0.740 Å and 1.066 Å for chains G and H, respectively. The original study (19) indicates that a typical carboxylic moiety should be expected at the C-terminus of the inhibitory pentapeptide; these ligands were therefore remodeled manually to reconstitute the desired molecules before further analysis.
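
To see why O···O separations of 0.740 Å and 1.066 Å indicate a deformed carboxylate, they can be compared with the separation expected from ordinary geometry. The sketch below is illustrative only: the C–O bond length (1.25 Å), the O–C–O angle (125°), and the 0.3 Å tolerance are approximate textbook values assumed here, not figures from this study, and the function names are hypothetical.

```python
import math


def expected_oo_distance(c_o_bond=1.25, o_c_o_angle_deg=125.0):
    """O...O separation for two equal C-O bonds spanning the given O-C-O angle."""
    half_angle = math.radians(o_c_o_angle_deg) / 2.0
    return 2.0 * c_o_bond * math.sin(half_angle)


def is_deformed(observed_oo, tolerance=0.3):
    """Flag a carboxylate whose O...O distance departs from the expected value."""
    return abs(observed_oo - expected_oo_distance()) > tolerance


# O...O distances reported in the text for the 1UK4 peptide, chains G and H (in Å).
reported = {"G": 0.740, "H": 1.066}

print(f"expected O...O distance: {expected_oo_distance():.2f} Å")
for chain, d in reported.items():
    print(f"chain {chain}: {d:.3f} Å, deformed = {is_deformed(d)}")
```

Under these assumptions the expected separation is roughly three times the value deposited for chain G, which is consistent with the decision to rebuild the C-terminus by hand.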

We explored the use of structural information contained in the PDB for an in silico virtual drug discovery campaign, using the main protease of SARS-CoV as a case study. A similar approach using HIV protease as a case study has been reported (20,21). Two in silico methods were employed to evaluate the structural information gathered from the PDB. The first was Electronic High-Throughput Screening (eHiTS), an exhaustive flexible docking method that systematically covers a significant part of the conformational and positional search space to produce highly accurate docking poses at a speed practical for virtual high-throughput screening (22). The second was Ligand.Info, a system designed for fast, sensitive virtual high-throughput screening of small-molecule databases (23). The Ligand.Info search algorithm is based on two-dimensional structure similarity. The system enables searching for similar compounds in the large Ligand.Info Meta-Database (24), which contains various publicly available sets of small molecules, including: (i) Harvard's ChemBank, which encompasses bioactive compounds and FDA-approved drugs (25); (ii) ChemPDB – ligands marked as hetero atoms in PDB files (26,27); (iii) KEGG Ligand – molecules found in the KEGG pathways (28); and (iv) the Open National Cancer Institute database (29). The total size of the Meta-Database exceeded 1 million entries.
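
A search based on two-dimensional structure similarity can be illustrated with a Tanimoto coefficient computed over sets of structural features. This is a sketch under stated assumptions: the text does not specify which fingerprints or similarity measure Ligand.Info uses, so the Tanimoto coefficient, the feature labels, the compound names, and the 0.60 cutoff used as a default are all illustrative choices rather than the system's actual implementation.

```python
def tanimoto(features_a, features_b):
    """Shared features divided by all distinct features of the two molecules."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


def similar_entries(query, database, threshold=0.60):
    """Return (name, score) pairs at or above the threshold, best first."""
    scored = ((name, tanimoto(query, feats)) for name, feats in database.items())
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical feature sets standing in for 2D fingerprints.
query = {"OH", "C-C", "C-O", "CH3"}
database = {
    "analog_a": {"OH", "C-C", "C-O", "CH3", "C=O"},
    "analog_b": {"OH", "C-C", "C-O"},
    "unrelated": {"c:c", "C-F", "C-C"},
}

hits = similar_entries(query, database)
print(hits)
```

A set-based score of this kind is cheap to evaluate, which is one reason similarity searches scale to databases with around a million entries.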

Using this method, plausible inhibitors were generated based only on the set of ligands from crystallized complexes of the protein target and of other proteins from its structurally homologous family. The docking was performed on small molecules and short peptides extracted from protein–ligand complexes of the viral cysteine proteases of trypsin-fold. These small molecules and peptides were first modified to correct the chemical attributes of the ligand structures, and the resulting set of inhibitors was then evaluated by the eHiTS flexible docking algorithm. The top-ranked ligands are summarized in Table 2 (see Figure 1 for chemical structures). The best-scoring compounds were the original AG7 ligand (eHiTS docking score of −5.615) and two analogs of AG7 (eHiTS docking scores of −5.103 and −4.803, respectively). The fourth- and fifth-best-scoring compounds were the original peptides of the SARS target (PDB code 1UK4, chains H and G) after correction of their initial 3D structures (eHiTS docking scores of −4.795 and −4.703, respectively). In the crystal structure 1UK4, the same short peptide (chains G and H) is seen interacting with the enzyme in two different conformations, and eHiTS gave different results when each conformation was used as the starting point (cf. peptide type 1 chain H and peptide type 1 chain G in Table 2). The same effect was observed for a very simple and flexible molecule, 2-methyl-2,4-pentanediol (MPD), which exists in crystal structures in various conformations that yielded different results when used as starting points for eHiTS calculations.

Thus, eHiTS results depend on the choice of the initial ligand conformation. Possible reasons include: (i) some parts of the molecule recognized by eHiTS as rigid blocks may not be rigid and should not be treated as rigid; (ii) some flexible parts may not be flexible enough; and (iii) the search may not be adequately comprehensive. We suspect that the dependence stems from a combination of (i) and (iii), because the rings within the molecules were virtually identical across the poses produced by the program. Unfortunately, this suggestion could not be confirmed, as we could not find any information on which parts of a molecule eHiTS treats as rigid blocks.

Each molecule shown in Table 1 was further used as a query for screening the Ligand.Info Meta-Database. Unfortunately, almost none of the designed potential lead compounds had a close analog in any Meta-Database subset, so their experimental confirmation will require chemical synthesis followed by biological testing. The allosteric inhibitor MPD was an exception: 21 similar analogs (with MTC ≥ 0.60) were found using MPD as a query.
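
The dependence on the starting conformation can be quantified directly from the MPD rows of Table 2, where one molecule was docked from eleven starting geometries. The short sketch below only summarizes those published scores; the variable names are hypothetical, and it assumes, as the ranking in the text implies, that a more negative eHiTS score is better.

```python
from statistics import mean

# eHiTS docking scores for MPD from Table 2, keyed by starting-conformation label.
MPD_SCORES = {
    "4005": -2.316, "4006": -2.202, "4001": -2.131, "1004": -2.029,
    "1003": -2.005, "modified": -1.892, "1002": -1.819, "1001": -1.709,
    "4002": -1.709, "4004": -1.677, "4003": -1.645,
}

best = min(MPD_SCORES, key=MPD_SCORES.get)    # most negative score
worst = max(MPD_SCORES, key=MPD_SCORES.get)   # least negative score
spread = MPD_SCORES[worst] - MPD_SCORES[best]

print(f"best start:  {best} ({MPD_SCORES[best]:.3f})")
print(f"worst start: {worst} ({MPD_SCORES[worst]:.3f})")
print(f"spread: {spread:.3f}, mean: {mean(MPD_SCORES.values()):.3f}")
```

The spread between the best and worst MPD starting conformations is larger than the gap separating several adjacent, chemically distinct entries in Table 2, so rankings of closely scored compounds should be read with that variability in mind.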

Table 2. Scores for selected ligands and peptides docked in the active site of coronavirus main proteinase from SARS coronavirus (1UK4)
| Ligand short name | Ligand SMILES | Docking score |
| AG7 | CCOC(=O)CC[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@@H](CC(=O)[C@@H](NC(=O)C2=NOC(C)=C2)C(C)C)Cc3ccc(F)cc3 | −5.615 |
| AG7 version 1 | CCOC(=O)CC[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@@H](CC(=O)[C@@H](NC=O)C(C)C)Cc2ccc(F)cc2 | −5.103 |
| AG7 modified version 1 | CCOC(=O)CC[C@H](C[C@@H]1CCNC1=O)NC(=O)[C@@H](C[C@H](O)[C@@H](NC=O)C(C)C)Cc2ccc(F)cc2 | −4.803 |
| Peptide type 1 chain H (1UK4 protein) modified | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(O)=O | −4.795 |
| Peptide type 1 chain G (1UK4 protein) modified | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(O)=O | −4.703 |
| Peptide type 1 chain G (1UK4 protein). C-end with dioxirane ring present in the PDB | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C1OO1 | −4.608 |
| VNSTLQ modified version 2 | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)C(C)C)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C(O)=O | −4.344 |
| AG7 modified version 3 | CC[C@@H](CCC(O)=O)NC(=O)[C@H](C)C[C@H](O)[C@@H](NC(C)=O)C(C)C | −4.239 |
| Peptide type 1 chain H (1UK4 protein). C-end with dioxirane ring present in the PDB | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@@H](N)CC(N)=O)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C1OO1 | −3.979 |
| VNSTLQ modified version 1 | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)C(C)C)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C=O | −3.965 |
| AG7 connected with NFA version 2 modified | CC[C@@H](CCCC(=O)N[C@@H](Cc1ccccc1)C(N)=O)NC(=O)[C@H](C)CC(=O)[C@@H](NC(C)=O)C(C)C | −3.921 |
| AG7 version 3 | CC[C@@H](CCC(O)=O)NC(=O)[C@H](C)CC(=O)[C@@H](NC(C)=O)C(C)C | −3.896 |
| VNSTLQ modified | CC(C)C[C@H](NC(=O)[C@@H](NC(=O)[C@H](CO)NC(=O)[C@H](CC(N)=O)NC(=O)[C@@H](N)C(C)C)[C@@H](C)O)C(=O)N[C@@H](CCC(N)=O)C=O | −3.771 |
| AG7 modified version 2 | CCOC(=O)CC[C@H](CCC(N)=O)NC(=O)[C@H](C)CC(=O)[C@@H](NC(C)=O)C(C)C | −3.769 |
| AG7 version 2 | CCOC(=O)CC[C@H](CCC(N)=O)NC(=O)[C@H](C)C[C@H](O)[C@@H](NC(C)=O)C(C)C | −3.735 |
| VAL connected with NFA modified | CC(C)[C@H](N)C(=O)N[C@@H](Cc1ccccc1)C(N)=O | −3.697 |
| VAL connected with NFA | CC(C)[C@H](N)C(=O)N[C@@H](Cc1ccccc1)C(N)=O | −3.690 |
| AG7 connected with NFA version 1 modified | CC[C@@H](CC\C=C\N[C@@H](Cc1ccccc1)C(N)=O)NC(=O)[C@H](C)CC(=O)[C@@H](NC(C)=O)C(C)C | −3.569 |
| NFA modified | NC(=O)[C@H](Cc1ccccc1)NC=O | −3.390 |
| VAL connected with NFA ver B | CC(C)[C@H](N)C(=O)N[C@@H](Cc1ccccc1)C(N)=O | −3.313 |
| NFA modified version 1 | NC(=O)[C@H](Cc1ccccc1)NC=O | −3.045 |
| AG7 connected with NFA version 1 | CC[C@@H](CC\C=C\N[C@@H](Cc1ccccc1)C(N)=O)NC(=O)[C@H](C)C[C@H](O)[C@@H](NC(C)=O)C(C)C | −2.484 |
| MPD ver 4005 | C[C@@H](O)CC(C)(C)O | −2.316 |
| MPD ver 4006 | C[C@@H](O)CC(C)(C)O | −2.202 |
| MPD ver 4001 | C[C@@H](O)CC(C)(C)O | −2.131 |
| MPD ver 1004 | C[C@@H](O)CC(C)(C)O | −2.029 |
| MPD ver 1003 | C[C@@H](O)CC(C)(C)O | −2.005 |
| MPD modified | C[C@@H](O)CC(C)(C)O | −1.892 |
| MPD ver 1002 | C[C@@H](O)CC(C)(C)O | −1.819 |
| MPD ver 1001 | C[C@@H](O)CC(C)(C)O | −1.709 |
| MPD ver 4002 | C[C@@H](O)CC(C)(C)O | −1.709 |
| MPD ver 4004 | C[C@@H](O)CC(C)(C)O | −1.677 |
| MPD ver 4003 | C[C@@H](O)CC(C)(C)O | −1.645 |
| Chloroacetone | CC(=O)CCl | −1.306 |
Figure 1. Two-dimensional chemical structures of the compounds presented in Table 2: AG7 (A.1), modified AG7 (A.2), NSTLQ (A.3), VNSTLQ (A.4), modified AG7 (A.5), NSTLQ modified (A.6), VNSTLQ modified (A.7), modified AG7 connected with NFA modified (A.8), other version of modified AG7 (B.9), different modification of AG7 (B.10), other modified AG7 (B.11), modified NFA connected with VAL (B.12), modified AG7 connected with NFA (B.13), modified AG7 connected with modified NFA (B.14), modified NFA (B.15), 2-methyl-2,4-pentanediol (MPD) compound (B.16), and chloroacetone (B.17).

In conclusion, a series of lead compounds as potential SARS protease inhibitors has been preliminarily identified using a structure-based in silico virtual drug discovery approach. It should be stressed that no MPD analogs have been reported to date in SARS protease inhibitor drug discovery (10–17,30–34). Importantly, MPD is a chemical additive used for crystallization of biological macromolecules, and it has been observed in crystal structures with various proteins, not only those of the SCOP family of viral cysteine proteases of trypsin-fold (35–43). For the SARS protease, MPD may provide a candidate lead compound (or fragment) for drug discovery. Several of the MPD analogs identified in this study are being tested experimentally (SEPSDA Sino-European Commission Project) for their potential SARS protease inhibitory properties.

Acknowledgments

This work was supported by EC BioSapiens (LHSG-CT-2003-503265) and EC SEPSDA (SP22-CT-2004-003831) 6FP projects, EMBO Installation Grant to KG as well as the Polish Ministry of Education and Science (PBZ-MNiI-2/1/2005 and 2P05A00130). MvG would like to thank the Foundation for Polish Science for the fellowship.
