Volume 2023, Issue 1 1787485

Research Article

Open Access

Targeting Essential Hypothetical Proteins of Pseudomonas aeruginosa PAO1 for Mining of Novel Therapeutics: An In Silico Approach

Atikur Rahman

orcid.org/0000-0002-2022-711X

Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh just.edu.bd

Md. Takim Sarker

orcid.org/0000-0002-6259-2908

Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh just.edu.bd

Md Ashiqul Islam

Department of Chemistry and Biochemistry, University of Windsor, Canada uwindsor.ca

Mohammad Uzzal Hossain

Bioinformatics Division, National Institute of Biotechnology, Ganakbari, Ashulia, Savar, Dhaka 1349, Bangladesh nib.gov.bd

Mahmudul Hasan

Department of Pharmaceuticals and Industrial Biotechnology, Sylhet Agricultural University, Sylhet 3100, Bangladesh sau.ac.bd

Corresponding Author

Tasmina Ferdous Susmi

orcid.org/0000-0002-2628-2371

Department of Genetic Engineering and Biotechnology, Faculty of Biological Science and Technology, Jashore University of Science and Technology, Jashore 7408, Bangladesh just.edu.bd

First published: 11 April 2023

https://doi.org/10.1155/2023/1787485

Citations: 8

Academic Editor: Luis Morales-Quintana

Abstract

As a ubiquitous opportunistic bacterium, Pseudomonas aeruginosa PAO1 is responsible for acute and chronic infections in immunocompromised individuals. The bacterium is currently on WHO’s red list of pathogens for which new antibiotics are urgently required. Finding essential genes and essential hypothetical proteins (EHPs) can be crucial in identifying novel druggable targets and therapeutics. This study aims to characterize these EHPs and to analyze their subcellular and physicochemical properties, PPI network, nonhomology against humans, virulence factors, and potential as novel drug targets, and finally the structure of the identified targets, employing around 42 robust bioinformatics tools/databases, the output of which was evaluated using ROC analysis. The study identified 18 EHPs from 336 essential genes, with domain and functional annotation revealing that 50% of these proteins belong to the enzyme category. The majority are cytoplasmic and cytoplasmic membrane proteins, and the half that are stable were subjected to PPI network analysis. The network contains 261 nodes and 269 edges for the 9 proteins of interest, with 11 hubs containing at least three nodes each. Finally, the pipeline builder predicted 7 proteins as novel drug targets, 5 proteins nonhomologous to the human proteome, human antitargets, and human gut flora, and 3 virulent proteins. Among these, NP_249450 and NP_251676 were homology modeled, and Ramachandran plot analysis revealed that more than 94% of the residues were in the preferred region. By analyzing functional attributes and virulence characteristics, the findings of this study may facilitate the development of innovative antibacterial drug targets and drugs against Pseudomonas aeruginosa PAO1.

1. Introduction

Pseudomonas aeruginosa, often termed an opportunistic pathogen, is a rod-shaped, motile, Gram-negative, nonfermenting bacterium found ubiquitously in soil and water and as colonies on living plants and animals, including humans [1, 2]. Isolates collected from diverse environments account for 272 reported species of the genus Pseudomonas, among which Pseudomonas aeruginosa PAO1 is one of the most commonly used laboratory strains and has been used to generate publicly accessible genomic resources [2, 3]. Pseudomonas aeruginosa PAO1, derived from a chronic lesion isolate dating from the 1950s, was the first strain of its species to have its genome completely sequenced. The genome is 6.3 Mbp long and includes 5570 ORFs, roughly 89.4% coding regions, and 0.4% stable RNAs. It was the largest bacterial genome available when it was sequenced in 2000. However, despite belonging to the same species, isolates of P. aeruginosa PAO1 stored in different laboratories worldwide show genomic and phenotypic differences [2, 4].

A broad spectrum of hosts, including nematodes, insects, plants, and mammals, is susceptible to infection by P. aeruginosa [2]. It is harmless as part of the normal gut microflora but causes dangerous infections in critically ill ICU patients [5]. This pattern of pathogenesis makes it an opportunistic pathogen [2]. It is regarded as one of the top three causative agents of opportunistic infections in the community each year and is associated with 10-15% of hospital-acquired infections [6]. In 2015, a report from the European Antimicrobial Resistance Surveillance Network (EARS-Net) on European regions revealed that around 13.7% of P. aeruginosa strains had acquired resistance to at least three antimicrobial groups, whereas about 5.5% of the strains were resistant to five antimicrobial groups. Every year in the USA alone, roughly 440 deaths and 51,000 infections are caused by P. aeruginosa, of which over 13% result from multidrug-resistant Pseudomonas strains. As a consequence, P. aeruginosa was declared one of the greatest threats to public health among the 12 bacterial families on the list of antibiotic-resistant priority pathogens published by WHO in 2017 [7]. It is also involved in other nosocomial infections such as bloodstream, gastrointestinal, and urinary tract infections [5]. This bacterium has a devastating impact on patients with the lung disease cystic fibrosis (CF). Apart from CF, it is equally deadly for individuals with compromised immune systems, such as those with AIDS, cancer, burn lesions, or eye injuries. The situation can worsen even under robust antibiotic treatment, since P. aeruginosa possesses a wide spectrum of resistance to antibiotics, including aminoglycosides, β-lactams, and fluoroquinolones. The resulting disease stress can lead to organ failure and eventually death [2, 5].

Pseudomonas aeruginosa adopts survival strategies that help it resist environmental stressors and evade host immune responses [8]. These include biofilm formation, enzyme promiscuity, horizontal gene transfer, and quorum sensing [5]. It is one of the best-studied organisms for investigating bacterial biofilm formation [9]. Three polysaccharides, alginate, Pel, and Psl, were found to be important for bacterial attachment and biofilm formation in P. aeruginosa PAO1 [8]. Over 500 regulatory genes have been recorded from investigation of the P. aeruginosa PAO1 genome [10]. Much remains to be discovered about the intracellular signaling pathways and other regulatory mechanisms, which involve many proteins that are still uncharacterized. Thus, domain analysis and functional annotation of essential hypothetical proteins (EHPs) can pave the way to identifying new potential targets, facilitating drug repositioning and development. Since these EHPs are needed for cellular, biological, and metabolic processes, their deletion or mutation can be fatal to the organism. These prospective drug targets may be crucial in the development of antimicrobial drugs [11].

In this study, an in silico approach was adopted to characterize proteins of unknown function using different algorithm-based tools and software. In addition, a network-based analysis was performed to find interactions with highly connected hub proteins that may jointly control major molecular activities. The pipeline builder was employed to identify proteins nonhomologous to the human proteome, human antitargets, and the proteome of the gut microbiota, as well as to predict virulence factors and novel drug targets. Finally, using reliable software, the structural conformation of our proteins of interest with potential druggability was predicted and assessed. Thus, our analysis mainly involves the identification of essential hypothetical proteins in P. aeruginosa PAO1 and can further lead to the discovery of novel therapeutic target proteins.

2. Materials and Methods

2.1. Sequence Retrieval and Analysis

The full proteome of P. aeruginosa PAO1 (strain ATCC 15692) was retrieved from the NCBI genome database. The complete bacterial genome contains 6.3 million base pairs and 5564 proteins [4]. The Database of Essential Genes (DEG) was then used to identify the essential hypothetical proteins (EHPs) in this complete proteome list by employing a series of unique keywords [12]. To begin, we searched for entries similar to “hypothetical protein” and found 2181 proteins among these 5564 proteins. Following that, we searched for exact matches of “hypothetical protein” and of “conserved hypothetical protein” and found 1540 and 625 proteins, respectively. According to the DEG database, this bacterial proteome contains 336 essential proteins (EPs). Essential proteins are those that are indispensable and sufficient for a living cell to survive under ideal conditions. By manual curation, we then identified 29 essential hypothetical proteins whose genomes were entirely conserved among the 336 EPs. These 29 EHPs were further validated against the NCBI and UniProt databases using their review status (reviewed or unreviewed), annotation score (1–5), structural and functional availability, and other factors. Eventually, we excluded 11 proteins and retained 18 essential hypothetical proteins, whose FASTA sequences were used for all further analyses in this study. The complete framework of our investigation is presented in Figure 1, and all the databases/software used in this study are listed in Table 1.
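The keyword screen and the intersection with the essential-protein list can be illustrated with a minimal sketch. It assumes the proteome is available as FASTA text with NCBI-style headers and that the DEG essential proteins have been exported as a list of accessions; the function names, accessions, and toy records below are hypothetical and are not taken from the study's data.

```python
def parse_fasta_headers(text):
    """Return {accession: description} for every header line in FASTA text."""
    records = {}
    for line in text.splitlines():
        if line.startswith(">"):
            accession, _, description = line[1:].partition(" ")
            records[accession] = description.strip()
    return records


def select_ehps(records, essential_accessions):
    """Keep proteins annotated as hypothetical that are also listed as essential."""
    hypothetical = {
        acc for acc, desc in records.items()
        if "hypothetical protein" in desc.lower()
    }
    return sorted(hypothetical & set(essential_accessions))


# Toy input: accessions, descriptions, and the essential list are invented.
demo_fasta = """>NP_000001.1 hypothetical protein [Pseudomonas aeruginosa PAO1]
MKT
>NP_000002.1 DNA gyrase subunit A [Pseudomonas aeruginosa PAO1]
MSD
>NP_000003.1 conserved hypothetical protein [Pseudomonas aeruginosa PAO1]
MAA
"""
demo_essential = ["NP_000002.1", "NP_000003.1"]
demo_ehps = select_ehps(parse_fasta_headers(demo_fasta), demo_essential)
print(demo_ehps)
```

The substring match deliberately catches both “hypothetical protein” and “conserved hypothetical protein”; the manual curation and database validation steps described above are not reproduced here.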

Figure 1. Schematic representation of the whole methodology used in our investigation. There are four segments: segment I: functional annotation and properties characterization; segment II: protein-protein interaction network; segment III: nonhomology analysis, virulence factor prediction, and druggability identification; and segment IV: structure prediction and structure validation.

Table 1. Bioinformatics resources used in the study.

Serial No.	Server/database	Version	Purpose	Link	References
Functional annotation
1	DEG	15.2	Finding essential HPs	http://tubic.tju.edu.cn/deg/	[12]
2	GO FEAT	1.0	For functional annotation	http://computationalbiology.ufpa.br/gofeat/	[13]
3	CDART		Protein homology search domain architecture	https://www.ncbi.nlm.nih.gov/Structure/lexington/lexington.cgi	[14]
4	SMART		Identification and annotation of protein domains	http://smart.embl-heidelberg.de/	[15]
5	SUPERFAMILY	1.75	For functional annotation	https://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/	[16]
6	Pfam	34.0	Determine protein families	http://pfam.xfam.org/	[17]
7	SVMProt		Protein functional family prediction	http://bidd.group/cgi-bin/svmprot/svmprot.cgi	[18]
8	CATH	4.3	Protein domains into superfamily	http://www.cathdb.info/	[19]
9	InterPro	84.0	Classification of protein families	https://www.ebi.ac.uk/interpro/	[20]
10	HHPred		Sequence similarity searching, prediction of sequence features, and sequence classification.	https://toolkit.tuebingen.mpg.de/tools/hhpred	[21]
11	PANNZER		Functional annotation of uncharacterized proteins	http://ekhidna2.biocenter.helsinki.fi/sanspanz/	[22]
12	PFP		Automated protein function Gene Ontology prediction	https://kiharalab.org/web/pfp.php	[23]
13	ESG		Protein function prediction	https://kiharalab.org/web/esg.php	[24]
Subcellular localization
14	PSORTb	3.0.2	Subcellular localization	https://www.psort.org/psortb/	[25]
15	CELLO	v.2.5	Subcellular localization	http://cello.life.nctu.edu.tw/	[26, 27]
16	TMHMM	v. 2.0	Prediction of transmembrane helices in proteins	http://www.cbs.dtu.dk/services/TMHMM-2.0/	[28]
17	Phobius		Prediction of transmembrane helices in proteins	https://phobius.sbc.su.se/index.html	[29]
18	HMMTOP	2.0	Prediction of transmembrane helices in proteins	http://www.enzim.hu/hmmtop/index.php	[30]
19	CCTOP	1.00	Prediction of transmembrane helices in proteins	http://cctop.enzim.ttk.mta.hu/	[31]
20	PROTTER	1.0	Predicts the presence and location of signal peptide cleavage sites in amino acid sequences and prediction of transmembrane helices in proteins	https://wlab.ethz.ch/protter/start/	[32]
21	SignalP 4.1	4.1	Predicts the presence and location of signal peptide cleavage sites in amino acid sequences	http://www.cbs.dtu.dk/services/SignalP-4.1/	[33]
22	PrediSi		Prediction of signal peptides	http://www.predisi.de/	[34]
Physicochemical properties
23	ProtParam		Computation of various physical and chemical parameters for a given protein	https://web.expasy.org/protparam/	[36]
Protein-protein interaction
24	NetworkAnalyst	v3.0	PPI construction and visualization	https://www.networkanalyst.ca/NetworkAnalyst/uploads/ListUploadView.xhtml	[39]
Nonhomology analysis
25	PBIT		Pipeline building for nonhomology analysis	http://www.pbit.bicnirrh.res.in/	[41]
Virulence factor analysis
26	VICMpred		Functional classification of proteins of bacteria into virulence factors	https://webs.iiitd.edu.in/raghava/vicmpred/index.html	[43]
27	VirulentPred		VirulentPred is a bacterial virulent protein prediction method	http://bioinfo.icgeb.res.in/virulent/	[44]
28	MP3		Predict pathogenic proteins in both genomic and metagenomic datasets	http://metagenomics.iiserb.ac.in/mp3/tutorial.php	[45]
Druggability analysis
29	DrugBank	5.0	Identification of information on drugs and drug targets	https://go.drugbank.com/	[47]
Secondary structure analysis
30	SOPMA		Secondary structure prediction	https://npsa-prabi.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_sopma.html	[48]
31	PSIPRED	4.0	Secondary structure prediction	http://bioinf.cs.ucl.ac.uk/psipred/	[49]
3D structure analysis
32	SWISS-MODEL		Protein 3D structure determination	https://swissmodel.expasy.org/	[50]
33	Robetta		Protein 3D structure determination	https://robetta.bakerlab.org/
34	GalaxyRefine		Refinement of protein structure	http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE	[52]
35	PyMOL software	2.0	Structure visualization	https://pymol.org/2/
Validation check
36	ERRAT		3D structure validation	https://saves.mbi.ucla.edu/	[53]
37	VERIFY 3D		3D structure validation	https://saves.mbi.ucla.edu/	[54, 55]
38	PROVE		3D structure validation	https://saves.mbi.ucla.edu/	[56]
39	WHATCHECK		3D structure validation	https://saves.mbi.ucla.edu/	[57]
40	PROCHECK		3D structure validation	https://saves.mbi.ucla.edu/	[58]
41	Ramachandran plot		3D structure validation	http://services.mbi.ucla.edu/SAVES/Ramachandran/	[58]
42	ROC analysis		This web page calculates a receiver operating characteristic (ROC) curve from data	http://www.rad.jhmi.edu/jeng/javarad/roc/JROCFITi.html

2.2. Segment I: Functional Annotation and Properties Characterization

2.2.1. Functional Annotation and Domain Analysis of EHPs

The functional annotation of the 18 Pseudomonas aeruginosa EHPs was carried out using numerous publicly accessible databases and tools. To learn more about the molecular functions and biological processes of the EHPs, we considered protein superfamily, family, conserved domain, and Gene Ontology (GO) analyses. A preliminary assessment was performed with the online server GO FEAT, which characterizes function by homology searching across multiple databases such as NCBI, UniProt, and EMBL, to see whether any of the HPs had been assigned a family and/or protein domain [13]. After this preliminary evaluation, conserved domains and protein functions based on domain architecture were determined using CDART [14] from the conserved domain database (CDD) and SMART [15], respectively. For functional analysis, SUPERFAMILY 1.75 [16], Pfam 34.0 [17], SVMProt [18], CATH 4.3 [19], InterPro 84.0 [20], and HHPred [21] were used to identify the protein superfamily, functional family, domain, and essential sites based on similarity. The PANNZER [22], PFP [23], and ESG [24] tools were used for high-throughput functional annotation of the EHPs, providing Gene Ontology information with z-scores as well as brief descriptions of the annotated protein’s functionality. These GO terms facilitate understanding of a gene product’s molecular functions, biological roles, and cellular component, which refers to the location of the gene product. We used default parameters for all databases.

2.2.2. Subcellular Localization and Transmembrane Helices Analysis

The subcellular localization of a protein can reveal much about its function. In our study, we employed several tools to annotate the subcellular localization of the selected 9 EHPs: PSORTb [25], CELLO [26, 27], TMHMM [28], Phobius [29], HMMTOP [30], CCTOP [31], PROTTER [32], SignalP 4.1 [33], and PrediSi [34]. PSORTb and CELLO assign proteins to 5 major cellular locations: cytoplasmic, inner membrane, periplasmic, outer membrane, and extracellular. TMHMM, Phobius, HMMTOP, CCTOP, and PROTTER were employed to predict transmembrane helices. Knowing the location of transmembrane helices helps in modeling possible 3D structures [35]. It is also necessary to identify signal peptides, which form the N-terminal part of a protein. They target proteins to the secretory pathway, and their detection is therefore used as a means of predicting protein localization [33]. Signal peptides were identified using SignalP 4.1, PROTTER, and PrediSi.

2.2.3. Analysis of Physicochemical Properties

Expasy’s ProtParam server [36] was used to analyze the physicochemical properties of the 18 selected essential hypothetical proteins (EHPs), including molecular weight, theoretical isoelectric point (pI), molecular formula, the total numbers of positively and negatively charged residues, instability index, aliphatic index, and grand average of hydropathicity (GRAVY).
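Some of these properties follow directly from amino acid composition and can be sketched in a few lines. The example below computes GRAVY with the Kyte-Doolittle hydropathy scale, the aliphatic index with Ikai's weighting, and the charged-residue counts (Asp + Glu and Arg + Lys). It is an illustrative re-implementation, not the ProtParam server code; the function names are hypothetical, and the instability index is omitted because it needs a dipeptide weight table.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq):
    """Grand average of hydropathicity: mean hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq):
    """Aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    n = len(seq)

    def mole_percent(aa):
        return 100.0 * seq.count(aa) / n

    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))


def charged_residue_counts(seq):
    """Counts of negatively (Asp + Glu) and positively (Arg + Lys) charged residues."""
    return {
        "negative": seq.count("D") + seq.count("E"),
        "positive": seq.count("R") + seq.count("K"),
    }


demo_seq = "AVIL"  # toy peptide, not one of the studied EHPs
print(gravy(demo_seq), aliphatic_index(demo_seq), charged_residue_counts("DEKRK"))
```

A positive GRAVY indicates a hydrophobic sequence overall, and a higher aliphatic index is usually read as a sign of greater thermostability.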

2.3. Segment II: Protein-Protein Interaction Network of 9 EHPs

2.3.1. Protein-Protein Interaction Network Analysis

The function of a protein is often modulated by its surrounding protein network [37]. It is therefore important to map this network to gain insight into the functional associations of a particular protein [38]. In this study, we used NetworkAnalyst v3.0 for network building [39]. We submitted a list of the 9 EHPs with their UniProt IDs (Q9HXM8, Q9HWT5, Q9HVM2, Q9HVF5, Q9I5H0, Q9HZL8, Q9HYC8, Q9HXV5, and Q9HUH3), since all of these proteins were found to be stable in the physicochemical analysis. The generic PPI option under protein-protein interactions (PPI) was selected for further processing. The P. aeruginosa PAO1 interactome database, which combines robust computational predictions with experimentally validated data, was adopted for network building. Next, the resulting network was explored further in Cytoscape, a standalone software package that supports topological analyses such as finding the shortest possible path, computing the node degree distribution, and clustering hub genes of the network [40].
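Node degree, one of the topological parameters mentioned above, is simple to compute from an edge list. The sketch below counts distinct neighbors per node and reports nodes at or above a degree threshold as hubs. The default of three is an assumption made here for illustration, and the edge list is invented; neither NetworkAnalyst nor Cytoscape is called.

```python
from collections import defaultdict


def node_degrees(edges):
    """Return {node: number of distinct neighbors} for an undirected edge list."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def find_hubs(edges, min_degree=3):
    """Nodes whose degree is at least min_degree (an assumed hub criterion)."""
    degrees = node_degrees(edges)
    return sorted(node for node, deg in degrees.items() if deg >= min_degree)


# Toy network with placeholder node names.
demo_edges = [("P1", "P2"), ("P1", "P3"), ("P1", "P4"), ("P2", "P3"), ("P5", "P4")]
demo_degrees = node_degrees(demo_edges)
demo_hubs = find_hubs(demo_edges)
print(demo_degrees, demo_hubs)
```

Using a set of neighbors per node means duplicate edges in the input do not inflate the degree.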

2.4. Segment III: Nonhomology Analysis, Virulence Factor Prediction, and Druggability Identification

2.4.1. Nonhomology Analysis against Human Proteome and Human Antitargets

Several features must be considered when identifying a drug target for any human disease. To analyze nonhomology, we used the pipeline builder for identification of target (PBIT) server for nonhomology analysis against the human proteome, human antitargets, and human gut flora proteomes [41]. Using the pipeline builder, we first identified proteins that share high sequence similarity with the human proteome. The sequence similarity of the 9 input sequences was computed using the BLAST algorithm, with E-value > 0.005 and %sequence identity < 50 set as the criteria. Of the 9 input sequences, 8 were nonhomologous and were selected for further investigation. The homologous protein was filtered out to avoid undesirable toxic effects arising from this similarity. The 8 selected nonhomologous proteins were then screened in the pipeline against human antitargets, which are proteins whose unintended interaction with a drug causes harmful effects [41]. To screen for sequences with significant similarity to known human antitargets, the PBIT server uses the BLAST algorithm against a set of human antitarget proteins compiled from the literature [41]. Again, E-value > 0.005 and %sequence identity < 50 were set, and all nonhomologous sequences were selected.
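The filtering criterion can be made concrete with a small sketch over parsed BLAST hits. It takes one literal reading of the stated thresholds: a query is kept as nonhomologous only if every one of its hits has an E-value above the cutoff and a sequence identity below 50%, and a query with no hits is kept. The hit tuples, names, and function are hypothetical, and PBIT's internal logic may differ.

```python
def nonhomologous_queries(queries, hits, evalue_cutoff=0.005, identity_cutoff=50.0):
    """Return the queries that have no hit violating the nonhomology criteria.

    hits: iterable of (query_id, subject_id, percent_identity, evalue).
    """
    flagged = set()
    for query, _subject, identity, evalue in hits:
        passes = evalue > evalue_cutoff and identity < identity_cutoff
        if not passes:
            flagged.add(query)  # at least one significant or highly similar hit
    return [q for q in queries if q not in flagged]


# Toy data with placeholder identifiers.
demo_queries = ["Q1", "Q2", "Q3"]
demo_hits = [
    ("Q1", "HUMAN_A", 62.0, 1e-30),  # strong human hit, so Q1 is dropped
    ("Q2", "HUMAN_B", 28.0, 0.5),    # weak hit only, so Q2 is kept
]
demo_kept = nonhomologous_queries(demo_queries, demo_hits)
print(demo_kept)
```

The same function can be reused for the antitarget and gut flora screens by passing hits against those databases and, for the latter, the 0.001 cutoff.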

2.4.2. Nonhomology Analysis against Human Gut Flora Proteomes

PBIT also screens against human gut flora proteomes, which makes it easier to identify sequences highly similar to those of the human gut microbiota. The gut microbiota is known to play an important role in human health, including immune, metabolic, and neurobehavioral traits [42]. It is therefore necessary to design drugs whose targets are nonhomologous to the protein sequences of the gut microbiome, so that such drugs do not kill or impair essential microbes found in the human gut. For this purpose, the PBIT server was again used to identify proteins nonhomologous to gut microbiota proteomes. In this third step of the pipeline builder, the selected proteins were screened with E-value > 0.001 and %sequence identity < 50. The resulting nonhomologous proteins were selected for the next investigation.

2.4.3. Analysis of Virulence Factor

Understanding the mechanism of pathogenesis through the analysis of virulence factors can be key to discovering promising new therapeutic targets [38]. Therefore, we used VICMpred [43], VirulentPred [44], and MP3 [45] to identify the virulence properties of the 9 EHPs. We retained the results predicted in common by 2 of the 3 tools. All results were obtained using the default options provided by the servers.
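The two-out-of-three rule amounts to a simple vote over the tool outputs. The sketch below assumes each tool's result has already been reduced to a true/false virulence call per protein and accepts a protein when at least two calls are positive; the protein names and calls are invented for illustration.

```python
def consensus_virulent(predictions, min_votes=2):
    """Return proteins called virulent by at least min_votes tools.

    predictions: {protein_id: {tool_name: bool}}.
    """
    return sorted(
        protein
        for protein, calls in predictions.items()
        if sum(calls.values()) >= min_votes  # True counts as 1, False as 0
    )


# Toy predictions with placeholder protein names.
demo_predictions = {
    "protA": {"VICMpred": True, "VirulentPred": True, "MP3": False},
    "protB": {"VICMpred": False, "VirulentPred": True, "MP3": False},
    "protC": {"VICMpred": True, "VirulentPred": True, "MP3": True},
}
demo_virulent = consensus_virulent(demo_predictions)
print(demo_virulent)
```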

2.4.4. Druggability Analysis and New Target Identification

Identifying a new drug target can open a new window for the discovery and development of drugs against infectious or serious diseases [46]. Druggability analysis examines whether a protein is capable of binding a drug or drug-like molecule with sufficient affinity, and can thereby suggest a new target for a drug. Here we used DrugBank, a comprehensive online database that contains information on drugs and drug targets [47]. The target identification section was used for this purpose, with amino acid sequences in FASTA format as the search query. The expectation value was set to 0.00001, and all other BLAST parameters and filters were left at their defaults.

2.5. Segment IV: Structure Prediction and Structure Validation

2.5.1. Secondary Structure Analysis

A protein’s secondary structure is shaped mainly by interactions between neighboring regions of the polypeptide chain. When the secondary structure elements fold together, the 3D structure of the protein is formed. The servers SOPMA [48] and PSIPRED [49] predict the secondary structure of a protein. They were used here with the protein sequence in FASTA format as the query and all other parameters left at their defaults.

2.5.2. Essential Hypothetical Proteins 3D Structure Modeling

The protein 3D structure was determined by two methods: template-based homology modeling and trRosetta. The three-dimensional structure of the targeted protein was generated using the SWISS-MODEL server, which performs a template search and then aligns the target sequence with the template structure to create the homology model [50]. To construct a model with an accuracy comparable to low-resolution X-ray crystallography, we only considered templates with ≥30% sequence identity. Then the Robetta server (https://robetta.bakerlab.org/) was employed to predict the 3D model using the trRosetta algorithm, a deep learning method based on direct energy minimization that is the most accurate structure-building process provided by this server [51]. Finally, the built structure was optimized using GalaxyRefine, and the best refined model was chosen on the basis of the lowest MolProbity score and highest GDT-HA value [52]. PyMOL 2.0 was then used to visualize all of the refined structure files, which are in .pdb format.

2.5.3. Protein Structure Validation Assessment

The reliability of a predicted 3D protein structure can be assessed using various quality assessment tools. In this study, we used SAVES version 6.0 (https://saves.mbi.ucla.edu/), a metaserver that runs six programs at once to check and validate protein structures during and after model refinement. This server validates the stereochemical consistency of a protein structure by analyzing residue-by-residue geometry and overall structure geometry. Furthermore, it compares the results with good structures to see whether an atomic model (3D) is compatible with its amino acid sequence (1D) by assigning a structural class based on its location and environment (alpha, beta, loop, polar, nonpolar, etc.). We ran ERRAT [53], VERIFY 3D [54, 55], PROVE [56], WHATCHECK [57], PROCHECK [58], and Ramachandran plot [58] from SAVES v6.0 to determine the consistency of the constructed model.

2.6. Molecular Dynamics Simulation

We used a user-friendly online interface to assess our predicted protein structures, which had been obtained by template-based homology modeling and ab initio modeling. Specifically, we used WebGro (https://simlab.uams.edu/), which is built on the GROMACS simulation package, to run a protein-in-water dynamics simulation for over 50 ns. The simulation workflow consists of preprocessing, energy minimization, equilibration, molecular dynamics, trajectory analysis, and result generation. Trajectory analysis was performed to measure root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent-accessible surface area (SASA), and hydrogen bonds.
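Two of the trajectory metrics have short closed-form definitions that a sketch can make explicit. The functions below compute RMSD between a frame and a reference, and the radius of gyration of a frame, from lists of Cartesian coordinates. They are simplified on purpose: all atoms have equal mass and no least-squares superposition is done before the RMSD, whereas trajectory tools normally do both. The coordinates are invented.

```python
import math


def rmsd(coords, reference):
    """Root mean square deviation between two equally sized coordinate lists."""
    if len(coords) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    squared = sum(
        (a - b) ** 2
        for point, ref_point in zip(coords, reference)
        for a, b in zip(point, ref_point)
    )
    return math.sqrt(squared / len(coords))


def radius_of_gyration(coords):
    """Radius of gyration about the geometric center, assuming equal masses."""
    n = len(coords)
    center = [sum(point[i] for point in coords) / n for i in range(3)]
    squared = sum((point[i] - center[i]) ** 2 for point in coords for i in range(3))
    return math.sqrt(squared / n)


# Toy two-atom system: the frame is the reference shifted by 1 unit along z.
demo_reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
demo_frame = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(demo_frame, demo_reference), radius_of_gyration(demo_reference))
```

A stable RMSD plateau and a steady Rg over the trajectory are commonly taken as signs that the modeled structure stays compact and does not unfold.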

2.7. Determination of Binding Site

The binding site, or active site, is an important region of any protein/enzyme, as it is where molecules attach to initiate a particular reaction. Since our aim is to find novel therapeutic targets, it is necessary to locate the active sites of our target proteins. To do so, we used the PrankWeb web server (https://prankweb.cz/) to determine all possible binding sites of our selected proteins.

2.8. Performance Assessment of the Study

In our study, we applied receiver operating characteristic (ROC) analysis to validate the accuracy of the bioinformatics tools used for the functional annotation of EHPs from P. aeruginosa [59]. We collected 100 arbitrary protein functions of P. aeruginosa, along with their gene names, and processed them with the same pipeline used in our study (Supplementary Files 1 and 2). Two integer values were assigned to classify each prediction: “1” for a true positive and “0” for a true negative. The confidence rating was denoted by “2,” “3,” “4,” and “5,” with a higher number denoting greater confidence. The input file consists of two columns: the first contains the binary value, 1 (true positive) or 0 (true negative), and the second contains the confidence rating, ranging from 2 to 5. For the present study, six levels were considered for determining the diagnostic efficacy. ROC analysis was performed for 12 individual functional annotation tools. The data were submitted in format 1 to an online ROC curve-generating web server [60]. The output includes accuracy, sensitivity, specificity, and the ROC area (Supplementary Files 1 and 2). The accuracy of our adopted pipeline is 97.42%, which indicates that the bioinformatics tools used in our study give highly reliable results.
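The two-column input described above is enough to compute empirical ROC statistics directly. The sketch below takes (label, rating) pairs, estimates the area under the ROC curve as the fraction of positive-negative pairs ranked correctly (ties count as half), and reports sensitivity, specificity, and accuracy at a chosen rating threshold. It is an empirical approximation written for illustration, so the web server used in the study may fit the curve differently; the records are invented.

```python
def empirical_auc(records):
    """Area under the ROC curve from (label, rating) pairs via pairwise ranking."""
    positives = [rating for label, rating in records if label == 1]
    negatives = [rating for label, rating in records if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative record")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def confusion_rates(records, threshold):
    """Sensitivity, specificity, and accuracy when rating >= threshold is called positive."""
    tp = sum(1 for label, rating in records if label == 1 and rating >= threshold)
    fn = sum(1 for label, rating in records if label == 1 and rating < threshold)
    tn = sum(1 for label, rating in records if label == 0 and rating < threshold)
    fp = sum(1 for label, rating in records if label == 0 and rating >= threshold)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(records),
    }


# Toy records: (1 = true positive, 0 = true negative; confidence rating 2 to 5).
demo_records = [(1, 5), (1, 4), (1, 3), (0, 2), (0, 3), (0, 2)]
demo_auc = empirical_auc(demo_records)
demo_rates = confusion_rates(demo_records, threshold=4)
print(demo_auc, demo_rates)
```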

3. Results

3.1. Functional Annotation and Domain Analysis of EHPs

The functional annotation of the 18 EHPs was examined using 12 platforms that predict protein superfamily, family, conserved domains, and Gene Ontology (GO) terms. A function was assigned with high confidence only when three or more programs predicted the same function. On this basis, the proteins fall into 9 functional categories. The first category, enzymes (deaminases, dehydrogenases, helicases, transferases, DNases, oxidoreductases, kinases, etc.), contains nine of the 18 EHPs (NP_252456.1, NP_252782.1, NP_253095.1, NP_253326.1, NP_250846.1, NP_252375.1, NP_253678.1, NP_253679.1, and NP_253685.1). Two proteins are transporters (NP_253252.1 and NP_251676.1). The remaining seven proteins each fall into one of the other seven categories: bacterial outer-membrane protein, folate-binding protein, peptidase inhibitor protein, electron transporter protein, chromosome partition protein, ribosome maturation protein, and pathogenesis-related protein. Table 2 lists the superfamily, functional family, molecular functions, and biological processes of the 18 EHPs, together with their GO IDs and database IDs. Among these proteins, NP_249450.1 is a member of the folate-binding superfamily, with the aminomethyl transferase folate-binding domain as its functional family. Aminomethyl transferase and transaminase activity are the two molecular functions of this protein. NP_251676.1 was predicted to belong to the functional family that represents the periplasmic core domain found in a variety of ABC transporters. Its molecular functions include ATP binding, ATPase-coupled xenobiotic transmembrane transporter activity, efflux transmembrane transporter activity, and ATPase activity. According to the GO annotation, there were 65 GO terms in total for molecular function and biological process. These GO IDs can be used to retrieve the Gene Ontology analysis of the 18 EHPs.
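
The "three or more programs" rule described above can be illustrated with a minimal sketch. The tool names and predicted labels below are hypothetical placeholders, not the study's raw outputs, and the function name `consensus_function` is our own.

```python
from collections import Counter

def consensus_function(predictions, min_support=3):
    """Return the most frequent annotation if at least `min_support` tools agree, else None."""
    # Tools that returned no prediction (None) are ignored.
    counts = Counter(label for label in predictions.values() if label is not None)
    if not counts:
        return None
    label, support = counts.most_common(1)[0]
    return label if support >= min_support else None

# Hypothetical outputs of five annotation tools for one protein (illustrative only).
example = {
    "tool_a": "folate-binding",
    "tool_b": "folate-binding",
    "tool_c": "folate-binding",
    "tool_d": "transferase",
    "tool_e": None,
}
print(consensus_function(example))
```

With fewer than three agreeing tools the function returns `None`, which corresponds to leaving the protein without a high-confidence annotation.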

Table 2. Functional annotations of 18 essential hypothetical proteins.

Serial No.	RefSeq	Superfamily	Family	Gene Ontology: biological process	Gene Ontology: molecular function	GO ID/database integration
1	NP_252456.1	Cytidine deaminase-like	Deoxycytidylate deaminase-like	(1) tRNA wobble adenosine to inosine editing	(1) Hydrolase activity (2) Zinc ion binding (3) Catalytic activity (4) tRNA-specific adenosine 34-deaminase activity	(GO:0002100) (GO:0016787) (GO:0008270) (GO:0003824) (GO:0052717) Uniprot (W1MGT3) Interpro (W1MGT3) Interpro (IPR016192) Interpro (IPR002125) Interpro (IPR016193) Interpro (IPR028883) Pfam (PF14437) NCBI (532131853) EMBL (ATNK01000135)
2	NP_252782.1	Hotdog thioesterase/thiol ester dehydratase-isomerase	Thioesterase	(1) Histidine biosynthetic process	(1) Histidinol dehydrogenase activity (2) Zinc ion binding (3) NAD binding	(GO: 0000105) (GO: 0004399) (GO: 0008270) (GO: 0051287) Uniprot (A0A448BY09) Interpro (A0A448BY09) Interpro (IPR029069) Interpro (IPR006683) Pfam (PF03061) EMBL (LR134300)
3	NP_253095.1	Uncharacterized protein	Dna[CI] antecedent, DciA	(1) Protein dephosphorylation	(1) Zinc ion binding (2) Protein tyrosine/serine/threonine phosphatase activity	(GO: 0006470) (GO: 0008270) (GO: 0008138) Uniprot (Q9HW03) Interpro (Q9HW03) KEGG (pae: PA4405) KEGG GM (pae: PA4405) Interpro (IPR007922) Pfam (PF05258) NCBI(489212117) EMBL (AE004091)
4	NP_253252.1	MATE_like	Lipid II flippase MurJ, polysaccharide biosynthesis C-terminal domain	(1) Cell wall organization (2) Peptidoglycan biosynthetic process (3) Regulation of cell shape	(1) Lipid-linked peptidoglycan transporter activity	(GO: 0071555) (GO: 0009252) (GO: 0008360) (GO: 0015648) Uniprot (W1MQM4) Interpro (W1MQM4) Interpro (IPR004268) Pfam (PF03023) NCBI (532135099) EMBL (ATNK01000069)
5	NP_253326.1	Glycerol-3-phosphate (1)-acyltransferase	Glycerol-3-phosphate (1)-acyltransferase	(1) D-Galacturonate catabolic process (2) D-Glucuronate catabolic process	(1) Transferase activity, transferring acyl groups	Uniprot (Q9HVF5) Interpro (Q9HVF5) KEGG (pae: PA4636) KEGG GM (pae: PA4636) Interpro (IPR002123) Pfam (PF01553) NCBI (489205664) EMBL (AE004091)
6	NP_253368.1	TonB-dependent receptor family	Energy transducer TonB	(1) Viral process	(1) GTP binding	(GO: 0016032) (GO: 0005525) Uniprot (Q9HVB6) Interpro (Q9HVB6) KEGG (pae: PA4679) KEGG GM (pae: PA4679) NCBI (489212281) EMBL (AE004091)
7	NP_249450.1	Folate-binding	Aminomethyl transferase folate-binding domain	(1) Iron-sulfur cluster assembly (2) Glycine decarboxylation via glycine cleavage system	(1) Aminomethyl transferase activity (2) Transaminase activity	(GO: 0019464) (GO: 0004047) (GO: 0008483) Superfamily (GO: 0016226) Uniprot (Q9I5H0) Interpro (Q9I5H0) KEGG (pae: PA0759) KEGG GM(pae:PA0759) Interpro (IPR029043) Interpro (IPR017703) NCBI (489205124) EMBL (AE004091)
8	NP_250659.1	Inhibitor_I78	Peptidase inhibitor I78 family	(1) Cell adhesion (2) Homophilic cell adhesion via plasma membrane adhesion molecules	(1) Calcium ion binding (2) Serine-type endopeptidase inhibitor activity	(GO: 0007155) (GO: 0007156) (GO: 0005509) (GO: 0004867) SMART Uniprot (Q9I2D5) Interpro (Q9I2D5) KEGG (pae: PA1969) KEGG GM (pae:PA1969) Interpro (IPR021719) Pfam (PF11720) NCBI (489210309) EMBL (AE004091)
9	NP_250846.1	DNase I-like	Endonuclease/exonuclease/phosphatase	N/A	(1) Endonuclease activity (2) Exonuclease activity	SMART (GO: 0004519) (GO: 0004527) Uniprot (A0A6N0KLP9) Interpro (A0A6N0KLP9) Interpro (IPR036691) Interpro (IPR005135) Pfam (PF03372) EMBL (CP054572)
10	NP_251676.1	LolE	MacB-like periplasmic core domain, lipoprotein-releasing ABC transporter permease	(1) Lipoprotein localization to outer membrane (2) Lipoprotein transport (3) Protein localization to outer membrane	(1) ATP binding (2) ATPase-coupled xenobiotic transmembrane transporter activity (3) Efflux transmembrane transporter activity (4) ATPase activity	SMART (GO: 0044874) (GO: 0042953) (GO: 0089705) (GO: 0005524) (GO: 0008559) (GO: 0015562) (GO: 0016887) Uniprot (Q9HZL8) Interpro (Q9HZL8) KEGG (pae:PA2986) KEGG GM (pae:PA2986) Interpro (IPR003838) Interpro (IPR011925) Interpro (IPR025857) Pfam (PF02687) Pfam (PF12704) NCBI (489210993) EMBL (AE004091)
11	NP_252171.1	Fe-S cluster assembly (FSCA) domain-like	Iron-sulfur cluster assembly protein	(1) Iron-sulfur cluster assembly	(1) ATPase activity (2) ATP binding (3) Iron-sulfur cluster binding (4) Metal ion binding	SMART (GO: 0016226) (GO: 0016887) (GO: 0005524) (GO: 0051536) (GO: 0046872) Uniprot (A0A3S4MTX6) Interpro (A0A3S4MTX6) Interpro (IPR034904) Interpro (IPR002744) Interpro (IPR019591) Interpro (IPR000808) Interpro (IPR027417) Interpro (IPR033756) Pfam (PF01883) Pfam (PF10609) EMBL (LR134300)
12	NP_252375.1	Carbam_trans_N (carbamoyltransferase N-terminus)	tRNA N6-adenosine threonyl carbamoyltransferase	(1) tRNA threonyl carbamoyl adenosine modification	(1) Metalloendopeptidase activity (2) Iron ion binding (3) N(6)-L-Threonyl carbamoyl adenine synthase activity	SMART (GO: 0002949) (GO: 0004222) (GO: 0005506) (GO: 0061711) Uniprot (Q9HXV5) Interpro (Q9HXV5) KEGG (pae:PA3685) KEGG GM (pae:PA3685) Interpro (IPR043129) Interpro (IPR000905) Interpro (IPR022496) Pfam (PF00814) NCBI (887492937) EMBL (AE004091)
13	NP_253374.1	MukE (MukE is part of the MukBEF condensin complex)	Bacterial condensin subunit MukE	(1) Cell cycle (2) Cell division (3) DNA replication (4) Chromosome segregation (5) Chromosome condensation	(1) GTP binding (2) GTPase activity (3) Translation elongation factor activity (4) ATP binding	(GO: 0007049) (GO:0051301) (GO:0006260) (GO:0007059) (GO:0030261) (GO:0005525) (GO:0003924) (GO:0003746) (GO:0005524) Uniprot (A0A448BSU5) Interpro (A0A448BSU5) Interpro (IPR042038) EMBL (LR134300)
14	NP_253434.1	(1) RimP N-terminal domain (2) RimP C-terminal SH3 domain (also known as yhbC)	RimP N-terminal domain, RimP C-terminal SH3 domain	(1) Ribosomal small subunit biogenesis	N/A	SMART (GO:0042274) Uniprot (A0A3S4MTG9) Interpro (A0A3S4MTG9) Interpro (IPR003728) Interpro (IPR028998) Interpro (IPR036847) Interpro (IPR028989) Interpro (IPR035956) Pfam (PF02576) Pfam (PF17384) EMBL (LR134300)
15	NP_253455.1	Bet v1-like	Polyketide cyclase/dehydrase and lipid transport	(1) Ubiquinone biosynthetic process (2) Cellular respiration	(1) Ubiquinone binding	(GO: 0006744) (GO: 0045333) (GO: 0048039) Superfamily 1.75 SMART IPR005031 Uniprot (A0A448BT54) Interpro (A0A448BT54) Interpro (IPR005031) Interpro (IPR023393) Pfam (PF03364) EMBL (LR134300)
16	NP_253678.1	FAD/NAD(P)-binding domain	FAD-dependent oxidoreductase	(1) Oxidation-reduction process	(1) Oxidoreductase activity	SMART (GO: 0055114) (GO: 0016491)
17	NP_253679.1	NAD(P)-linked oxidoreductase/ Aldo/keto reductase (AKR) superfamily	Aldo/keto reductase family	(1) Daunorubicin metabolic process (2) Doxorubicin metabolic process	(1) Oxidoreductase activity (2) D-Threo-aldose 1-dehydrogenase activity	SMART (GO:0044597) (GO:0044598) (GO:0047834) Uniprot (A0A3S4Q0Y1) Interpro (A0A3S4Q0Y1) Interpro(IPR023210) Interpro (IPR036812) Pfam (PF00248) EMBL (LR134300)
18	NP_253685.1	Protein kinase-like (PK-like)	Phosphotransferase enzyme family	(1) Protein phosphorylation	(1) ATP binding (2) Protein serine/threonine kinase activity	Pfam (GO:0006468) (GO:0005524) (GO:0004674) Uniprot (A0A3S5E573) Interpro (A0A3S5E573) Interpro (IPR011009) EMBL (LR134300)

3.2. Subcellular Localizations of EHPs

PSORTb and CELLO were used to predict the cellular localization of the 18 EHPs. According to PSORTb, 6 of the 18 proteins are cytoplasmic, 8 are located in the cytoplasmic membrane, and the localization of the remaining 4 is unknown. CELLO predicted 14 cytoplasmic proteins, 2 inner membrane proteins, and 2 periplasmic proteins. These localizations are summarized in Figure 2 and supplementary table 1. The presence of transmembrane helices, which can support protein function through transmembrane transport, was also assessed; the number of transmembrane helices per protein is given in supplementary table 1. Signal peptides were predicted with three tools: SignalP 4.1, PROTTER, and PrediSi. Of the 18 proteins, 14 (NP_252456.1, NP_252782.1, NP_253095.1, NP_253326.1, NP_253368.1, NP_249450.1, NP_250846.1, NP_252171.1, NP_252375.1, NP_253374.1, NP_253455.1, NP_253678.1, NP_253679.1, and NP_253685.1) contain no signal peptide according to any tool, one (NP_250659.1) contains a signal peptide according to all three tools, and the remaining three (NP_253252.1, NP_251676.1, and NP_253434.1) were predicted to contain a signal peptide by at least one of the tools but not unanimously (supplementary table 1).
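
The three-way grouping of signal-peptide results (no tool, every tool, or only some tools) can be sketched as follows. The per-tool calls shown are hypothetical; the actual per-tool results are in supplementary table 1, and the function name is our own.

```python
def signal_peptide_call(votes):
    """Summarise per-tool signal-peptide calls (True/False) as 'none', 'partial', or 'unanimous'."""
    positives = sum(1 for has_peptide in votes.values() if has_peptide)
    if positives == 0:
        return "none"
    if positives == len(votes):
        return "unanimous"
    return "partial"

# Hypothetical per-tool calls for one protein (illustrative only).
votes = {"SignalP 4.1": False, "PROTTER": True, "PrediSi": False}
print(signal_peptide_call(votes))
```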

3.3. Physicochemical Properties Analysis

The physicochemical properties of the 18 EHPs are shown in Table 3. The molecular weights ranged from 13335.11 to 56122.54 Dalton (Da). The highest molecular weight, 56122.54 Da, was observed for NP_253252.1, a probable lipid II flippase MurJ [61]. The theoretical pI (isoelectric point) is the pH at which a protein carries no net charge and therefore does not migrate in a direct-current electric field. This parameter is useful because proteins are compact and stable at their isoelectric pH [62]. The theoretical pI ranged from 4.52 to 10.71. Together, molecular weight and theoretical pI help locate proteins in two-dimensional gel electrophoresis (2-DE) and hence support experimental examination of these hypothetical proteins [63]. The aliphatic index is an indicator of the thermostability of a protein [64]: a higher aliphatic index indicates stability over a wider temperature range [65]. The aliphatic index of our proteins ranged from 83.13 to 133.96; NP_253252.1 had the highest value and is predicted to be the most thermostable, and NP_252456.1 had the lowest. The instability index estimates whether a protein is stable or unstable in a test tube [66]. We used a cutoff of 40: a value below 40 indicates a stable protein, and a value above 40 indicates an unstable one. Nine of the 18 proteins (NP_252456.1, NP_252782.1, NP_253252.1, NP_253326.1, NP_249450.1, NP_251676.1, NP_252171.1, NP_252375.1, and NP_253679.1) were found to be stable, with instability index values of 25.65, 37.46, 36.08, 38.61, 31.65, 38.17, 32.05, 29.03, and 32.97, respectively. The grand average of hydropathy (GRAVY) reflects the extent of protein-water interaction and is calculated by dividing the sum of the hydropathy values of all amino acids by the total number of residues in the sequence [65, 67]. The GRAVY values lay between -0.427 and 0.857. The lower the GRAVY value, the more strongly a protein interacts with water [65]. NP_253095.1 had the lowest GRAVY value (-0.427) and is therefore predicted to interact most strongly with water.
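
As a rough illustration of two of these indices, the sketch below computes GRAVY with the Kyte-Doolittle hydropathy scale and the aliphatic index with Ikai's formula (mole percent of Ala + 2.9 × Val + 3.9 × (Ile + Leu)). It is an independent sketch rather than the tool used in the study, and the short test sequence is a made-up peptide, not one of the EHPs.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def gravy(sequence):
    """Sum of residue hydropathy values divided by the number of residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)

def aliphatic_index(sequence):
    """Ikai's aliphatic index from the mole percent of Ala, Val, Ile, and Leu."""
    def mole_percent(aa):
        return 100.0 * sequence.count(aa) / len(sequence)
    return (mole_percent("A")
            + 2.9 * mole_percent("V")
            + 3.9 * (mole_percent("I") + mole_percent("L")))

toy_peptide = "MKLVAIDE"  # hypothetical sequence, for illustration only
print(round(gravy(toy_peptide), 4), round(aliphatic_index(toy_peptide), 2))
```

A negative GRAVY value indicates a more hydrophilic sequence, which is the sense in which the lowest value in Table 3 marks the protein that interacts most with water.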

Table 3. Physicochemical properties of 18 essential hypothetical proteins.

Serial No.	RefSeq	Molecular weight (Dalton)	Theoretical pI	Formula	Total number of negatively charged residues (Asp + Glu)	Total number of positively charged residues (Arg + Lys)	Instability index (II)	Aliphatic index	Grand average of hydropathicity (GRAVY)
1	NP_252456.1	19937.91	9.12	C₈₆₉H₁₄₀₉N₂₆₅O₂₅₅S₉	22	26	25.65 (stable)	83.13	-0.257
2	NP_252782.1	14871.24	7.93	C₆₅₈H₁₀₇₈N₁₈₈O₁₉₃S₅	15	16	37.46 (stable)	100.29	0.162
3	NP_253095.1	15057.34	10.71	C₆₅₇H₁₀₈₀N₂₁₀O₁₈₈S₄	12	21	57.65 (unstable)	93.28	-0.427
4	NP_253252.1	56122.54	10.03	C₂₆₄₃H₄₂₀₁N₆₅₁O₆₅₁S₁₉	21	40	36.08 (stable)	133.96	0.857
5	NP_253326.1	43779.82	6.85	C₁₉₅₅H₃₀₅₈N₅₅₄O₅₆₉S₁₁	51	50	38.61 (stable)	87.02	-0.376
6	NP_253368.1	24873.64	5.30	C₁₁₁₁H₁₇₇₉N₃₁₇O₃₁₉S₆	28	25	73.37 (unstable)	91.89	-0.119
7	NP_249450.1	33667.59	5.37	C₁₄₉₂H₂₄₁₅N₄₂₅O₄₄₆S₇	39	32	31.65 (stable)	108.22	0.057
8	NP_250659.1	13335.11	8.98	C₅₆₇H₉₃₆N₁₇₈O₁₈₁S₆	12	15	53.34 (unstable)	83.54	-0.085
9	NP_250846.1	27693.92	9.90	C₁₂₄₅H₁₉₆₂N₃₈₀O₃₃₀S₅	23	30	52.29 (unstable)	100.69	-0.229
10	NP_251676.1	47387.94	9.69	C₂₁₃₉H₃₄₈₄N₅₈₂O₅₈₇S₂₀	34	44	38.17 (stable)	114.85	0.365
11	NP_252171.1	38888.77	5.26	C₁₇₁₁H₂₇₈₀N₄₈₂O₅₁₇S₁₆	40	31	32.05 (stable)	102.34	0.090
12	NP_252375.1	24180.71	5.02	C₁₀₈₁H₁₇₀₇N₃₀₃O₃₁₃S₇	27	19	29.03 (stable)	102.48	0.166
13	NP_253374.1	26354.58	4.52	C₁₁₆₆H₁₈₁₁N₃₁₅O₃₆₆S₈	43	19	53.53 (unstable)	89.96	-0.366
14	NP_253434.1	17171.46	4.59	C₇₆₃H₁₂₀₈N₂₀₆O₂₃₆S₄	27	15	57.54 (unstable)	105.07	-0.197
15	NP_253455.1	16000.46	6.72	C₇₂₀H₁₁₃₀N₁₉₀O₂₀₈S₇	14	14	43.00 (unstable)	88.12	-0.037
16	NP_253678.1	42109.34	7.73	C₁₈₆₆H₃₀₁₁N₅₅₁O₅₄₁S₉	47	48	48.61 (unstable)	98.72	-0.130
17	NP_253679.1	29030.07	6.00	C₁₂₈₁H₂₀₆₇N₃₇₃O₃₈₆S₅	36	31	32.97 (stable)	101.22	-0.101
18	NP_253685.1	24985.76	9.60	C₁₁₁₂H₁₇₉₁N₃₃₇O₃₁₁S₄	27	34	45.71 (unstable)	104.77	-0.365
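
The stability classification can be reproduced from the instability index column of Table 3. The sketch below applies the cutoff of 40 stated in the text; the variable names are our own.

```python
# Instability index values copied from Table 3.
INSTABILITY_INDEX = {
    "NP_252456.1": 25.65, "NP_252782.1": 37.46, "NP_253095.1": 57.65,
    "NP_253252.1": 36.08, "NP_253326.1": 38.61, "NP_253368.1": 73.37,
    "NP_249450.1": 31.65, "NP_250659.1": 53.34, "NP_250846.1": 52.29,
    "NP_251676.1": 38.17, "NP_252171.1": 32.05, "NP_252375.1": 29.03,
    "NP_253374.1": 53.53, "NP_253434.1": 57.54, "NP_253455.1": 43.00,
    "NP_253678.1": 48.61, "NP_253679.1": 32.97, "NP_253685.1": 45.71,
}
CUTOFF = 40.0  # values below the cutoff are classified as stable

stable = [protein for protein, value in INSTABILITY_INDEX.items() if value < CUTOFF]
unstable = [protein for protein, value in INSTABILITY_INDEX.items() if value >= CUTOFF]
print(len(stable), "stable;", len(unstable), "unstable")
```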

3.4. Protein-Protein Interaction Network Analysis

The PPI network represents the connections among the 9 stable EHPs and their functionally related proteins from P. aeruginosa PAO1. The network has 261 nodes and 269 edges for the 9 proteins of interest and contains 11 subnetworks (hubs) with a minimum of 3 nodes each. The nodes with only 3 connections (degree) are considered islands (ostA and PA1847) (Table 4) [68]. The node degree ranges from 3 to 45 and the betweenness centrality from 4750 to 17881.76. The interactions among the hub proteins are shown in Figure 3. The size and color gradient of a node encode its degree, that is, the number of other nodes with which it interacts. Nodes with lower degree values are colored green, namely, PA4992 (24), PA3481 (23), PA4093 (20), and PA4636 (18), and the color gradually turns deep purple as the degree increases. Larger nodes likewise denote higher degree values, such as PA2986 (45), PA0759 (41), PA4562 (38), PA3685 (32), and PA3767 (28) (Figure 3, Table 4). Nodes in cyan blue have 2 or more interactions with their corresponding subnetworks. Betweenness centrality is a topological measure based on the number of shortest paths that pass through a node. Nodes with higher degree and betweenness centrality values represent proteins that are vital for signal trafficking in the cellular system [68]. The functions of all proteins in the network were collected from NCBI using their associated Entrez IDs and are listed in supplementary table 2.
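
To make the two topological measures concrete, the sketch below computes node degree and unnormalised betweenness centrality (Brandes' algorithm) on a small toy graph. The graph and its node labels are hypothetical and unrelated to the actual network; the software used in the study may normalise or scale the values differently.

```python
from collections import deque

def degree(graph):
    """Number of neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Unnormalised betweenness centrality of an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):        # accumulate dependencies back to source
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}

# Hypothetical toy network: hub H with neighbours a, b, c; d hangs off c.
toy = {"H": ["a", "b", "c"], "a": ["H"], "b": ["H"], "c": ["H", "d"], "d": ["c"]}
print(degree(toy))
print(betweenness(toy))
```

In the toy graph the hub has both the highest degree and the highest betweenness, while node `c` has a low degree but nonzero betweenness because it is the only route to `d`.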

Table 4. List of proteins with their reference sequence, Uniprot ID, protein name, node degree value, and betweenness centrality.

Serial No.	RefSeq	UniProt ID	Protein name	Degree	Betweenness centrality
1	NP_251676.1	Q9HZL8	PA2986	45	9681.83
2	NP_249450.1	Q9I5H0	PA0759	41	17881.76
3	NP_253252.1	Q9HVM2	PA4562	38	9315.74
4	NP_252375.1	Q9HXV5	PA3685	32	8742.58
5	NP_252456.1	Q9HXM8	PA3767	28	11489.25
6	NP_253679.1	Q9HUH3	PA4992	24	10103.17
7	NP_252171.1	Q9HYC8	PA3481	23	5329.08
8	NP_252782.1	Q9HWT5	PA4093	20	4750.0
9	NP_253326.1	Q9HVF5	PA4636	18	6466.58
10	NP_249286.1	Q9I5U2	ostA	3	10848.52
11	NP_250538.1	Q9I2P8	PA1847	3	5698.26

3.5. Nonhomology Analysis against Human Proteome, Human Antitargets, and Human Gut Flora Proteomes

To qualify as a novel drug target, a protein must be nonhomologous to the human proteome, human antitargets, and human gut flora proteomes. Using the pipeline builder of the Pipeline Builder for Identification of Target (PBIT) server, the 9 protein sequences were first screened for high similarity to the human proteome. One of the 9 EHP sequences was homologous to the human proteome and was removed. The 8 remaining nonhomologous proteins were then screened against human antitargets. This screen indicated that 7 proteins are nonhomologous and one is homologous to a human antitarget; the homologous protein was omitted from the study. The 7 selected proteins were then screened against human gut flora proteomes. This time, 2 proteins were removed because of high sequence similarity to the proteomes of beneficial microbes of the human gut. Finally, the sequences of the 5 nonhomologous EHPs were carried forward to the virulence analysis. The details of the nonhomology analysis are given in Table 5.

Table 5. Nonhomology to the human proteome, human antitargets, and gut flora proteomes, together with virulence and druggability, for the 9 EHPs.

Serial No.	Protein	Nonhomology analysis against human proteome	Nonhomology analysis against human antitargets	Nonhomology analysis against gut microbiota proteomes	Virulence analysis	Druggability analysis
1	NP_252456.1	Nonhomologous	Nonhomologous	Nonhomologous	Nonvirulent	Old target
2	NP_252782.1	Nonhomologous	Nonhomologous	Nonhomologous	Nonvirulent	Novel target
3	NP_253252.1	Nonhomologous	Nonhomologous	Homologous	Nonvirulent	Novel target
4	NP_253326.1	Nonhomologous	Nonhomologous	Nonhomologous	Nonvirulent	Novel target
5	NP_249450.1	Nonhomologous	Nonhomologous	Nonhomologous	Virulent	Novel target
6	NP_251676.1	Nonhomologous	Nonhomologous	Nonhomologous	Virulent	Novel target
7	NP_252171.1	Homologous	Nonhomologous	Nonhomologous	Nonvirulent	Novel target
8	NP_252375.1	Nonhomologous	Nonhomologous	Homologous	Nonvirulent	Novel target
9	NP_253679.1	Nonhomologous	Nonhomologous	Nonhomologous	Nonvirulent	Old target

3.6. Virulence Factor

The virulent EHPs from P. aeruginosa PAO1 are listed in Table 5. VICMpred, a support vector machine- (SVM-) based web server with a reported accuracy of 70.75%, predicted all 9 EHPs as nonvirulent [43]. VirulentPred is based on a bilayer cascade SVM and has a reported prediction accuracy of 81.8% under fivefold cross-validation [44]. Using a similarity-based search through PSI-BLAST, VirulentPred predicted 3 proteins of the P. aeruginosa PAO1 strain as virulent: NP_249450.1 (e-106), NP_251676.1 (e-171), and NP_253679.1 (7e-77). Another web server, MP3, which uses an integrated SVM-HMM approach, likewise predicted NP_251676.1 as a pathogenic protein.

3.7. A Possible New Drug Target Identification

Along with the two virulent EHPs, the other 7 EHP sequences were submitted to the DrugBank server to identify potentially new drug targets. The server showed that NP_252456.1 matches one known drug target, for the drug imidazole (E-value: 5.62144e-18; bit score: 75.485; query length: 182; alignment length: 77), and that NP_253679.1 matches two known drug targets, both for the drug nicotinamide adenine dinucleotide phosphate (E-value: 3.79487e-15; bit score: 72.4034; query length: 270; alignment length: 213, and E-value: 1.77261e-14; bit score: 70.8626; query length: 270; alignment length: 217). The remaining 7 proteins (NP_252782.1, NP_253252.1, NP_253326.1, NP_249450.1, NP_251676.1, NP_252171.1, and NP_252375.1) were considered new drug targets according to the DrugBank database. In particular, our two targeted proteins, NP_249450.1 and NP_251676.1, returned zero matches to known drug targets, which indicates that they are potential new drug targets. The overall results are shown in Table 5.
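
The final selection can be reproduced from Table 5 by keeping only the proteins that are nonhomologous in all three screens, virulent, and novel targets. The sketch below copies the table as printed; the function and variable names are our own.

```python
# Rows copied from Table 5:
# (protein, human proteome, human antitargets, gut microbiota, virulence, druggability)
TABLE5 = [
    ("NP_252456.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Nonvirulent", "Old target"),
    ("NP_252782.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Nonvirulent", "Novel target"),
    ("NP_253252.1", "Nonhomologous", "Nonhomologous", "Homologous", "Nonvirulent", "Novel target"),
    ("NP_253326.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Nonvirulent", "Novel target"),
    ("NP_249450.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Virulent", "Novel target"),
    ("NP_251676.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Virulent", "Novel target"),
    ("NP_252171.1", "Homologous", "Nonhomologous", "Nonhomologous", "Nonvirulent", "Novel target"),
    ("NP_252375.1", "Nonhomologous", "Nonhomologous", "Homologous", "Nonvirulent", "Novel target"),
    ("NP_253679.1", "Nonhomologous", "Nonhomologous", "Nonhomologous", "Nonvirulent", "Old target"),
]

def is_candidate(row):
    """True if the protein passes all three nonhomology screens and is virulent and novel."""
    _, human, antitarget, gut, virulence, druggability = row
    return (human == antitarget == gut == "Nonhomologous"
            and virulence == "Virulent"
            and druggability == "Novel target")

candidates = [row[0] for row in TABLE5 if is_candidate(row)]
print(candidates)
```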

3.8. Analyzing Secondary Structure

Based on the findings of segment III, we selected the two proteins that met all of its criteria for further investigation. Because they are hypothetical proteins, little information about them is available. Therefore, to support them as new drug targets, we explored their secondary structure using the SOPMA and PSIPRED web tools. According to the SOPMA server, the secondary structure of NP_249450.1 had alpha helix (Hh): 126 (40.13%); extended strand (Ee): 57 (18.15%); beta turn (Tt): 20 (6.37%); and random coil (Cc): 111 (35.35%), with the parameters set as window width: 17; similarity threshold: 8; and number of states: 4. The protein NP_251676.1 had alpha helix (Hh): 206 (47.58%); extended strand (Ee): 83 (19.17%); beta turn (Tt): 23 (5.31%); and random coil (Cc): 121 (27.94%) with the same parameters. The SOPMA results for both proteins are given in supplementary table 3. The PSIPRED web server provided a sequence plot and a cartoon plot. In these plots, goldenrod (yellow) denotes strand, pink denotes helix, grey denotes coil, and dark blue denotes the confidence of the prediction. According to these plots, NP_249450.1 showed more coil in its secondary structure, whereas NP_251676.1 showed more helix. Figure 4(a) shows the secondary structure of NP_249450.1 from PSIPRED, and Figure 4(b) shows that from SOPMA. Likewise, Figure 5(a) shows the secondary structure of NP_251676.1 from PSIPRED, and Figure 5(b) shows that from SOPMA.
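
The SOPMA percentages follow directly from the residue counts. As a quick consistency check, the sketch below recomputes them; the counts are taken from the text, and the helper name is our own.

```python
def composition_percent(counts):
    """Convert per-state residue counts into percentages of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {state: round(100.0 * n / total, 2) for state, n in counts.items()}

# Residue counts per secondary-structure state as reported by SOPMA in the text.
SOPMA_COUNTS = {
    "NP_249450.1": {"alpha helix": 126, "extended strand": 57, "beta turn": 20, "random coil": 111},
    "NP_251676.1": {"alpha helix": 206, "extended strand": 83, "beta turn": 23, "random coil": 121},
}

for protein, counts in SOPMA_COUNTS.items():
    print(protein, sum(counts.values()), composition_percent(counts))
```

The counts sum to 314 and 433 residues, which matches the sequence lengths quoted for the two proteins in the structure modeling section.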

3.9. Essential Hypothetical Proteins 3D Structure Modeling

Only proteins that passed all of the above-mentioned pipeline analyses were assigned a three-dimensional structural conformation. Two proteins, NP_249450.1 and NP_251676.1, passed the full pipeline and thus have the potential to be used as new drug targets. Their 3D structures were therefore determined using two methods: template-based homology modeling with SWISS-MODEL and ab initio modeling using the trRosetta algorithm on the Robetta server. For template-based homology modeling, we searched SWISS-MODEL for templates for these two proteins. 1vly.1 and 6f3z.2 were the best templates for NP_249450.1 and NP_251676.1, respectively. The templates were chosen based on several parameters, including the Global Model Quality Estimation (GMQE), Qualitative Model Energy ANalysis (QMEAN), Z-score, sequence identity, sequence similarity, sequence coverage, and oligo-state of the chosen templates. The template 1vly.1 is a 1.30 Å resolution X-ray crystal structure of a putative aminomethyltransferase (YgfZ) from E. coli. This template shares 30.23% sequence identity with the 314 aa long NP_249450.1 protein and covers residues 4-307. The template 6f3z.2, on the other hand, is a complex of E. coli LolA and the periplasmic domain of LolC, also determined by X-ray crystallography, at a resolution of 2.00 Å. Its sequence identity is 30.73%, covering residues 67-290 of the 433 amino acids of the NP_251676.1 protein. Both templates have a monomeric oligo-state. Finally, using the 1vly.1 and 6f3z.2 templates, the structures of the NP_249450.1 and NP_251676.1 EHPs were built, as shown in Figures 6(a) and 6(c). The Robetta server reported that its models were built using trRefineRosetta modeling (ab initio modeling using the trRosetta algorithm). The NP_249450.1 structure had a confidence score of 0.79 (Figure 6(b)), while the NP_251676.1 structure had a confidence score of 0.81 (Figure 6(d)). The structures from SWISS-MODEL were then refined with Galaxy Refiner, and model 2 for NP_249450.1 and model 5 for NP_251676.1 were downloaded after final refinement. For the NP_249450.1 and NP_251676.1 proteins, the lowest MolProbity scores were 1.738 (model 2) and 1.729 (model 5), respectively, compared with initial scores of 2.280 and 2.299. The structures from Robetta were also refined with Galaxy Refiner.
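
The residue ranges above imply rather different template coverage for the two proteins. A minimal sketch of that arithmetic, assuming inclusive residue numbering (the helper name is our own):

```python
def template_coverage(start, end, query_length):
    """Return (aligned residues, percent of the query covered) for an inclusive residue range."""
    aligned = end - start + 1
    return aligned, 100.0 * aligned / query_length

# Ranges and sequence lengths as stated in the text.
for protein, (start, end, length) in {
    "NP_249450.1": (4, 307, 314),
    "NP_251676.1": (67, 290, 433),
}.items():
    aligned, percent = template_coverage(start, end, length)
    print(f"{protein}: {aligned} residues, {percent:.1f}% of the sequence")
```

Under this assumption the homology model of NP_249450.1 covers nearly the whole sequence, whereas that of NP_251676.1 covers roughly half of it, which is worth keeping in mind when comparing the homology and ab initio models.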

3.10. Protein Structure Validation Assessment

The predicted protein structures were validated with the SAVES v6.0 server, which runs six programs simultaneously to evaluate the quality of the built model. The ERRAT value served as the overall quality factor of each model. The overall quality factor for the NP_249450.1 protein structure from SWISS-MODEL and Robetta was 87.6325% and 92.459%, respectively. It was 93.3649% and 98.063% for NP_251676.1 from these two servers, respectively. In supplementary figure 1, bar plots depict the overall quality factor from ERRAT. VERIFY3D conducts an analysis in which a structure passes if at least 80% of the amino acids have a 3D/1D profile score of ≥0.2. Three of the four structures passed this criterion (two from SWISS-MODEL and one from Robetta), while the NP_251676.1 structure from Robetta failed. WHATCHECK reports 46 different checks as numbered color boxes, with green, yellow, and maroon representing OK, warning, and error, respectively. The overall summary report is OK for all four structures. Table 6 gives a comprehensive report on the consistency of the four structures obtained from the SAVES v6.0 server. In contrast, the structures from SWISS-MODEL failed the PROVE criteria, while the structures from Robetta were placed in the warning category because of atomic B-factors and protein atoms with absolute Z-scores > 3. Ramachandran plot analysis from the PROCHECK program also demonstrated that at least 94% of residues were in the most favored region for all four structures from both SWISS-MODEL and Robetta. The value was 97.3% for the NP_251676.1 protein from Robetta, with 0.0% of residues in the disallowed region. The Ramachandran plot analysis indicates that the generated structures have a high degree of validity and reliability, as depicted in Figure 7.
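
The VERIFY3D pass criterion is a simple threshold and can be checked against the percentages reported in Table 6. The sketch below uses those reported values; the names are our own.

```python
# Percent of residues with averaged 3D-1D score >= 0.2, as reported in Table 6.
VERIFY3D_PERCENT = {
    ("NP_249450.1", "SWISS-MODEL"): 97.70,
    ("NP_249450.1", "Robetta"): 94.90,
    ("NP_251676.1", "SWISS-MODEL"): 82.59,
    ("NP_251676.1", "Robetta"): 66.97,
}

def verify3d_passes(percent, threshold=80.0):
    """A structure passes when at least `threshold` percent of residues score >= 0.2."""
    return percent >= threshold

results = {model: verify3d_passes(percent) for model, percent in VERIFY3D_PERCENT.items()}
for (protein, server), passed in results.items():
    print(protein, server, "pass" if passed else "fail")
```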

Table 6. Three-dimensional structure validation of the two predicted hypothetical proteins from the SAVES v6.0 server.

Protein name	ERRAT	VERIFY3D	PROVE	WHATCHECK	PROCHECK	Ramachandran plot (% residues in the most favored region)
NP_249450.1 (SWISS-MODEL)	Overall quality factor: 87.6325	97.70% of the residues have averaged 3D-1D score ≥ 0.2 (pass)	Buried outlier protein atoms, total from 1 model: 6.1% (fail)	Checks 1-46	Out of 8 evaluations: 3 errors, 2 warnings, 3 passes	94.6%
NP_249450.1 (Robetta)	Overall quality factor: 92.459	94.90% of the residues have averaged 3D-1D score ≥ 0.2 (pass)	Buried outlier protein atoms, total from 1 model: 4.1% (warning)	Checks 1-46	Out of 8 evaluations: 3 errors, 2 warnings, 3 passes	94.0%
NP_251676.1 (SWISS-MODEL)	Overall quality factor: 93.3649	82.59% of the residues have averaged 3D-1D score ≥ 0.2 (pass)	Buried outlier protein atoms, total from 1 model: 5.4% (fail)	Checks 1-46	Out of 8 evaluations: 2 errors, 4 warnings, 2 passes	95.3%
NP_251676.1 (Robetta)	Overall quality factor: 98.063	66.97% of the residues have averaged 3D-1D score ≥ 0.2 (fail)	Buried outlier protein atoms, total from 1 model: 4.2% (warning)	Checks 1-46	Out of 8 evaluations: 2 errors, 1 warning, 5 passes	97.3%

3.11. Molecular Dynamics Simulation

For both the NP_249450.1 and NP_251676.1 structures predicted by SWISS-MODEL, the RMSD rises until about 5 ns and then remains stable up to 50 ns without major fluctuations. Therefore, these structures have a stable profile. In contrast, the structures predicted by the ab initio method do not follow a constant trend: their RMSD rises and remains inconsistent throughout the 50 ns trajectory. The RMSF measures the average deviation of each amino acid residue over the time course and thus reflects individual residue flexibility. The structures have low residue flexibility, with a maximum RMSF value < 0.25 nm. The SASA was also analyzed from the simulation trajectories: the structures predicted by the ab initio method initially show a higher SASA trend, whereas the structures predicted by homology modeling have lower SASA values, which suggests higher stability. Moreover, the Rg value, a determinant of protein mobility and rigidity, shows less fluctuation for the structures predicted by homology modeling than for those of the ab initio model; the homology models follow a constant trend over the 50 ns trajectory. Furthermore, we determined the number of hydrogen bonds, an important determinant of structural stability. The structures predicted by homology modeling maintain stable hydrogen bonding over the 50 ns trajectory without significant fluctuations. Overall, the NP_249450.1 and NP_251676.1 structures predicted by SWISS-MODEL have a more consistent and stable profile over the 50 ns simulation trajectory than the ab initio models. The molecular dynamics simulation results are given in supplementary figures 2 and 3.
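
For readers less familiar with these metrics, the sketch below shows how RMSD and RMSF are defined on plain coordinate lists. It is a simplified illustration under stated assumptions: frames are assumed to be already superimposed on the reference (no fitting step is performed), the two-atom trajectory is made up, and the simulation package used in the study computes these quantities with its own tools.

```python
import math

def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) coordinates."""
    squared = sum((a - b) ** 2
                  for atom, ref_atom in zip(frame, reference)
                  for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / len(reference))

def rmsf(trajectory):
    """Per-atom root-mean-square fluctuation about the atom's mean position over all frames."""
    n_frames = len(trajectory)
    fluctuations = []
    for i in range(len(trajectory[0])):
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        mean_sq = sum(sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
                      for frame in trajectory) / n_frames
        fluctuations.append(math.sqrt(mean_sq))
    return fluctuations

# Hypothetical two-atom, two-frame trajectory with coordinates in nm (illustrative only).
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0)],
]
print(rmsd(trajectory[1], trajectory[0]))
print(rmsf(trajectory))
```

RMSD summarises how far a whole frame has moved from the reference, whereas RMSF reports how much each atom (or residue) moves around its own average position, which is why the text uses the former for overall stability and the latter for residue flexibility.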

3.12. Active Site Identification

For the prediction of active sites, we selected the SWISS-MODEL structures of both proteins on the basis of the simulation results. PrankWeb reports the pockets of a protein with a pocket score and a probability score. According to PrankWeb, the SWISS-MODEL structure of NP_249450.1 contained 8 pockets: binding pocket 1 (pocket score: 4.74, probability score: 0.220, AA count: 8); pocket 2 (pocket score: 3.15, probability score: 0.110, AA count: 11); pocket 3 (pocket score: 2.58, probability score: 0.075, AA count: 8); pocket 4 (pocket score: 2.38, probability score: 0.063, AA count: 8); pocket 5 (pocket score: 2.16, probability score: 0.050, AA count: 9); pocket 6 (pocket score: 1.42, probability score: 0.018, AA count: 10); pocket 7 (pocket score: 1.00, probability score: 0.007, AA count: 7); and pocket 8 (pocket score: 0.98, probability score: 0.006, AA count: 7). On the other hand, the SWISS-MODEL structure of NP_251676.1 possessed 5 binding pockets: binding pocket 1 (pocket score: 4.21, probability score: 0.184, AA count: 13); pocket 2 (pocket score: 3.16, probability score: 0.110, AA count: 11); pocket 3 (pocket score: 1.53, probability score: 0.022, AA count: 7); pocket 4 (pocket score: 1.39, probability score: 0.017, AA count: 7); and pocket 5 (pocket score: 1.21, probability score: 0.012, AA count: 9). Both proteins with their binding residues are shown in Figure 8.

4. Discussion

Pseudomonas aeruginosa PAO1 is a ubiquitous pathogenic bacterium that can cause acute and chronic infections in humans and that contaminates environmental water and food, contributing to daily food spoilage. Its increasing resistance against a broad range of antimicrobials is a rising concern. Its biofilm-forming ability and the evolution of antibiotic tolerance make Pseudomonas isolates highly resistant against imipenem (95.3%), trimethoprim-sulfamethoxazole (69.8%), aztreonam (60.5%), chloramphenicol (45.3%), and meropenem (27.9%) [69]. Factors such as chromosomal mutations and the acquisition of resistance genes through horizontal gene transfer contribute to its broad-spectrum drug resistance [70]. Thus, new drug targets are needed whenever multidrug resistance is observed. In silico approaches have the great advantage of identifying new drug targets within a very short time. Consequently, to combat the ever-increasing danger of antibiotic resistance, identifying novel drug targets is a dire necessity. A protein should have certain properties before it is considered a new target: nonhomology to the human proteome, human antitargets, and the human gut microbiota, together with virulence capability and druggability. For this reason, we scrutinized these properties in our essential hypothetical proteins, since analyzing EHPs from multidrug-resistant bacteria can lead to the identification of new potential therapeutic solutions.

We searched for essential hypothetical proteins (EHPs) among the 336 essential proteins of this bacterial strain to meet this need. Essential genes/proteins are those that are vital for a pathogen’s survival; analyzing their functions and metabolic pathways can therefore yield crucial information that may be central to life. In this research, we identified 18 EHPs for the first time that may provide valuable information about the pathogenesis, molecular mechanisms, and functions of this bacterium. Functional annotation is a prerequisite for understanding the metabolic pathways of the pathogen and the products it synthesizes to survive in adverse conditions. Moreover, analysis of domains, the basic, distinctive, and stable units of protein structure that are strongly conserved during evolution, is crucial for further investigation [71]. In addition, the function of a protein is directly or indirectly related to its subcellular localization [72]. The physicochemical properties of a protein provide a chemical assessment that reveals its chemical nature and physical hazards and helps to understand or predict molecular attributes. The combined analysis of the physicochemical properties helps to characterize the proteins annotated as hypothetical proteins in the genome of an opportunistic pathogen like P. aeruginosa PAO1 (Table 3) [73].

In this study, the PPI network provided meaningful insights into protein function that are congruent with our predictions. We looked for relationships between the predicted functions of the EHPs and their connectivity with proteins involved in functionally important activities. The protein PA2986 (NP_251676.1), related to the MacB-like periplasmic core domain, is connected with 45 proteins, of which 8 are hypothetical proteins (HPs). A notable number of the proteins grouped with PA2986 are involved in protein translocation, such as the translocation proteins TolQ, TolR, and TolB. Tol proteins act in Gram-negative bacteria by stabilizing the outer membrane [74]. Moreover, the ABC transporter ATP-binding protein (PA0073) is related to PA2986, as it uses the TolC exit duct to shift substrates from the periplasm to the extracellular space [75]. This finding supports the idea that PA2986 is a member of the MacB-like periplasmic core domain family. Another protein important for bacterial survival, the lysine tRNA ligase lysS, was found to interact with PA2986; mutant forms of lysS in some Gram-negative bacteria confer resistance to OP0595, a diazabicyclooctane β-lactamase inhibitor (an antibiotic) [76]. We also found penicillin-binding protein 1A (ponA) in this network. Alteration of ponA has a significant role in chromosomally mediated resistance to penicillin in N. gonorrhoeae [77]. Similarly, other notable proteins, such as the outer-membrane lipoprotein carrier protein lolA and the transporter ExbB, interacted with PA2986 (Figure 3).

PA0759 (NP_249450.1) has the second-largest degree value, with 41 connected nodes (Table 4), of which 17 are HPs. Its betweenness centrality of 17881.76, the highest in the network, points to its significance in cell signaling pathways, as is the case in directed or regulated networks. Betweenness centrality is considered a more robust indicator of essentiality than degree [78]. This hub includes proteins with prime roles in translational regulation and cellular metabolism, such as glycine cleavage system protein T2 [79], translation elongation factor (tsf) [80], and ribosomal large subunit pseudouridine synthase C (rluC) [81]. Moreover, the RecO protein in this network is a replication-repair protein of the RecF recombination repair pathway that, in complex with the RecA protein, facilitates both DNA strand annealing and DNA recombination in the highly radiation-tolerant bacterium Deinococcus radiodurans [82]. This property may also contribute to better survival of P. aeruginosa PAO1 in extreme conditions.
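To make the difference between degree and betweenness centrality concrete, the sketch below computes both for a small undirected graph, using Brandes' algorithm for the unnormalized betweenness. The edge list and node names are hypothetical and are not taken from the PPI network of this study, and the code is not a reproduction of the network tools used here.

```python
from collections import deque

# Hypothetical toy network, for illustration only.
TOY_EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    adjacency = {}
    for left, right in edges:
        adjacency.setdefault(left, set()).add(right)
        adjacency.setdefault(right, set()).add(left)
    return adjacency


def degree_of(adjacency):
    """Degree is the number of direct neighbours of each node."""
    return {node: len(neighbours) for node, neighbours in adjacency.items()}


def betweenness(adjacency):
    """Unnormalized betweenness centrality for an unweighted, undirected graph."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        order = []                                  # nodes in order of discovery
        predecessors = {node: [] for node in adjacency}
        path_count = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        path_count[source] = 1
        distance[source] = 0
        queue = deque([source])
        while queue:                                # breadth-first search
            current = queue.popleft()
            order.append(current)
            for neighbour in adjacency[current]:
                if distance[neighbour] < 0:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
                if distance[neighbour] == distance[current] + 1:
                    path_count[neighbour] += path_count[current]
                    predecessors[neighbour].append(current)
        dependency = {node: 0.0 for node in adjacency}
        while order:                                # accumulate from the farthest node back
            node = order.pop()
            for parent in predecessors[node]:
                share = path_count[parent] / path_count[node]
                dependency[parent] += share * (1.0 + dependency[node])
            if node != source:
                centrality[node] += dependency[node]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: value / 2.0 for node, value in centrality.items()}


if __name__ == "__main__":
    graph = build_adjacency(TOY_EDGES)
    print(degree_of(graph))
    print(betweenness(graph))
```

In this toy graph, node "c" has a lower degree than "hub" but still lies on every shortest path to "d", which is the kind of bridging role that betweenness captures and degree does not.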

The PA4562 (NP_253252.1) protein is a probable member of the lipid II flippase MurJ family, which flips lipid II from the inner to the outer leaflet of the cytoplasmic membrane and thereby enables peptidoglycan synthesis in almost every bacterial species. Peptidoglycan is the primary protective layer that shields the cell against environmental hazards and is involved in cell wall organization [83]. Proteins of morphological importance in bacteria, such as flagellar basal body rod protein (FlgC) [84], rod shape-determining protein (rodA) [85], and type 4 fimbrial biogenesis outer-membrane protein (PilQ) [86], are connected with PA4562, which strengthens our prediction of this protein’s function.

The proteins PA3685 (NP_252375.1) and PA3767 (NP_252456.1) are connected with 12 and 6 HPs, respectively. Both proteins are largely associated with enzymes of different molecular functions. tRNA N6-adenosine threonylcarbamoyltransferase (gcp) catalyzes a universal modification at position 37 of tRNAs that gives the anticodon loop greater binding efficiency to ribosomes in vitro in E. coli [87]. The protein UDP-2,3-diacylglucosamine hydrolase (PA1792) is hypothesized to catalyze lipid A biogenesis in E. coli [88]. Lipid A is a saccharolipid that modulates the anchorage of lipopolysaccharide (LPS) in the outer leaflet of the outer membrane of Gram-negative bacteria, and LPS is an essential component for shielding the bacteria from antibiotics and sustaining their viability [89]. Other proteins with enzymatic properties include thiamine monophosphate kinase (thiL), ATP-dependent DNA helicase DinG (PA1045), riboflavin-specific deaminase/reductase (ribD), amidotransferase (PA1742), and acetyltransferase (PA2631) (Figure 3).

The majority of the proteins connected with PA4992 (NP_253679.1), a member of the aldo/keto reductase family, are uncharacterized. The PA4167 protein plays a contributing role in supplying carbon and energy for a large number of bacteria [90]. The hub proteins PA4093 (NP_252782.1) and PA4992 (NP_253679.1) are interconnected through the intermediate protein PA4098, a probable short-chain dehydrogenase [91].

MgtE, a magnesium transporter connected with PA3481 (NP_252171.1), can act as an inhibitor of the type III secretion system (T3SS). The T3SS is a formidable apparatus with which P. aeruginosa injects toxins into host cells, which can ultimately cause host cell death. MgtE interrupts the transcriptional regulation of the T3SS by provoking rsmYZ gene transcription and hence inhibits T3SS protein expression [92]. Interestingly, another protein with an analogous effect, DNA polymerase II (polB), is associated with the same hub protein PA3481. polB is a crucial candidate for repressing the translation of the master T3SS regulator ExsA, which plays a major role in maintaining the regulatory cascade of the T3SS. Affecting ExsA expression can thus prevent T3SS toxin secretion. Furthermore, Chakravarty et al. found that T3SS transcription is attenuated when polB is overexpressed. Therefore, polB may be a promising target for therapeutic interventions [93]. In addition, pslA [94] and pslD [95], proteins responsible for exopolysaccharide biosynthesis and biofilm formation, are both members of the psl operon. The presence of such virulence-associated proteins in this hub suggests that PA3481 is a crucial protein involved in multiple virulence pathways in P. aeruginosa. The network of PA4636 (NP_253326.1) includes virulence-associated proteins such as lptA and algQ, which are required for the biogenesis of the outer-membrane lipid bilayer in P. aeruginosa [96] and facilitate the development of chronic infection in cystic fibrosis [97].

Some notable mutual interactions were also observed between hub proteins such as PA2986 and PA4562. The proteins they share include mraY, a crucial element of bacterial cell wall synthesis and a potential target for antibiotic development [98]; opr86, an outer-membrane protein previously found in all Gram-negative bacteria and suggested in earlier studies as a drug target with significant therapeutic potential against P. aeruginosa [99]; rpoH, a 32 kDa heat shock protein in E. coli that can also serve as a complementary sigma factor when the environmental temperature rises as well as during starvation [100]; PA5568, which possesses the inner membrane translocation subunit YidC that helps proteins pass into the inner membrane without the help of the Sec translocase complex [101]; and ComL, a lipoprotein that facilitates DNA transformation in N. gonorrhoeae [102]. Lastly, the organic solvent tolerance protein OstA interacts simultaneously with the three hub proteins that have the most node connections. OstA is of high molecular significance, as it is found in almost all Gram-negative bacteria and is involved in bacterial envelope biogenesis. A study by Chiu et al. found that OstA deficiency in Helicobacter pylori causes sensitivity to organic solvents, impaired membrane permeability, and vulnerability to antibiotics [103]. The functions of the proteins in this network are consistent with the functions we predicted for the HPs. Knowing the functions of proteins in a protein-protein interaction network can facilitate the characterization of proteins with unknown functions [104].
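Shared interaction partners of two hubs can be found with a simple set intersection. The sketch below uses only partial neighbour lists, transcribed from the protein names mentioned in this Discussion rather than from the full network, and it assumes that each protein listed as a mutual interactor is connected to both PA2986 and PA4562; the helper names are illustrative.

```python
# Partial neighbour lists transcribed from the Discussion (NOT the complete network).
# Assumption: mraY, opr86, rpoH, PA5568 and ComL are linked to both hubs.
NEIGHBOURS = {
    "PA2986": {
        "TolQ", "TolR", "TolB", "PA0073", "lysS", "ponA", "lolA", "ExbB",
        "mraY", "opr86", "rpoH", "PA5568", "ComL",
    },
    "PA4562": {
        "FlgC", "rodA", "PilQ",
        "mraY", "opr86", "rpoH", "PA5568", "ComL",
    },
}


def shared_neighbours(network, hub_a, hub_b):
    """Return the proteins connected to both hubs."""
    return network[hub_a] & network[hub_b]


def jaccard_overlap(network, hub_a, hub_b):
    """Fraction of the combined neighbourhood that the two hubs have in common."""
    union = network[hub_a] | network[hub_b]
    if not union:
        return 0.0
    return len(network[hub_a] & network[hub_b]) / len(union)


if __name__ == "__main__":
    print(sorted(shared_neighbours(NEIGHBOURS, "PA2986", "PA4562")))
    print(round(jaccard_overlap(NEIGHBOURS, "PA2986", "PA4562"), 3))
```

Because the lists are partial, the overlap value is only meaningful as a demonstration of the calculation, not as a property of the real network.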

Herein, we analyzed these properties for our targeted essential hypothetical proteins, and the PBIT server directly categorized all the nonhomology features (Table 5). We selected NP_249450.1 and NP_251676.1 because two of our tools predicted them to be virulent with strong confidence scores. Targeting these virulence factors can limit the pathogenicity of P. aeruginosa. Moreover, antivirulence drugs exert weaker selection for resistance on a pathogen than antibiotics do [11]. Therefore, understanding virulence factors and their role in pathogenesis can lead to new potential therapeutic solutions. In addition, druggability analysis indicated that NP_249450.1 and NP_251676.1 can serve as new potential drug targets.

Structure prediction and quality assessment of the predicted structure are equally important for evaluating the molecular and biological functions of a protein in cells, for in-depth analysis, and for drug target identification [105]. Before structure prediction, information on the alpha helix (Hh), extended strand (Ee), beta turn (Tt), and random coil (Cc) content helps to establish the secondary structure. For this reason, the secondary structure was annotated to complete the structural information on our two selected proteins (Figures 4 and 5 and Supplementary Table S3). We then predicted the 3D structures of the two EHPs (NP_249450.1 and NP_251676.1) and assessed the quality of these structures to decipher their unique conformations. The 3D structures and Ramachandran plots are depicted in Figures 6 and 7, respectively. The quality assessment parameters of the predicted structures are listed in Table 6, and the overall quality factor is shown in Supplementary Figure 1. According to the Ramachandran plot analysis, our predicted structures are accurate and reliable, since the accepted cutoff is more than 90% of residues in favoured regions and our structures exceed it, with more than 94%.
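A Ramachandran plot is built from the backbone dihedral angles phi and psi of each residue. As a minimal sketch of the underlying geometry, the code below computes a signed dihedral angle from four points and checks a favoured-region percentage against the 90% cutoff mentioned above. The coordinates are hypothetical, the function names are illustrative, and the classification of residues into favoured regions (done by the validation servers) is not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four points.

    For phi the points would be C(i-1), N(i), CA(i), C(i);
    for psi they would be N(i), CA(i), C(i), N(i+1).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)          # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)          # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def meets_cutoff(percent_favoured, cutoff=90.0):
    """True when the favoured-region percentage exceeds the cutoff."""
    return percent_favoured > cutoff


if __name__ == "__main__":
    # Hypothetical coordinates giving a +90 degree twist.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
    print(meets_cutoff(94.0))
```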

Moreover, the molecular dynamics simulations showed that the NP_249450.1 and NP_251676.1 structures predicted by SWISS-MODEL have a more consistent and stable profile over the 50 ns trajectory than the Robetta models (Supplementary Figures 2 and 3). To establish the proteins as therapeutic targets, active sites need to be predicted in order to find where a ligand can bind and initiate the respective reactions. On the basis of the simulation results, the SWISS-MODEL structures were selected for active-site prediction, in which NP_249450.1 and NP_251676.1 showed 8 and 5 probable binding pockets, respectively (Figure 8). This structural and functional information will open a new window for identifying potential drug candidates that can halt the rise of resistance in this pathogenic bacterium.
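Stability over a trajectory is commonly judged from the RMSD of each frame relative to a reference structure. The sketch below shows that calculation under simplifying assumptions: frames are plain lists of (x, y, z) tuples, they are taken to be already superimposed on the reference (no fitting step is performed), and the toy trajectory is hypothetical rather than taken from the 50 ns simulations reported here.

```python
import math


def rmsd(frame, reference):
    """Root-mean-square deviation between two equally sized coordinate sets."""
    if len(frame) != len(reference):
        raise ValueError("frame and reference must contain the same number of atoms")
    squared = 0.0
    for atom, ref_atom in zip(frame, reference):
        squared += sum((a - b) ** 2 for a, b in zip(atom, ref_atom))
    return math.sqrt(squared / len(reference))


def rmsd_profile(trajectory, reference=None):
    """RMSD of every frame against the reference (first frame by default)."""
    ref = trajectory[0] if reference is None else reference
    return [rmsd(frame, ref) for frame in trajectory]


def mean_and_spread(values):
    """Mean and population standard deviation of an RMSD profile."""
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical two-atom trajectory drifting along the z axis.
    toy_trajectory = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
        [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
    ]
    profile = rmsd_profile(toy_trajectory)
    print(profile, mean_and_spread(profile))
```

A lower spread of the profile after equilibration would be one simple indication of the more consistent behaviour described above; a real analysis would also superimpose each frame on the reference before computing the deviation.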

5. Conclusion

Functional characterization of pathogenic microorganisms is of great importance to biology and medical science. Essential proteins and essential hypothetical proteins are versatile macromolecules that can be crucial in devising new treatment strategies against these pathogenic bacteria. For the functional characterization of EHPs, we used an in silico approach that combined different bioinformatics databases and tools, and ROC analysis indicated that these tools are highly reliable for the functional characterization of P. aeruginosa PAO1. We attributed functions to 18 EHPs and analyzed the subcellular localization and physicochemical properties of these proteins. Afterwards, a PPI network analysis was carried out on 9 stable EHPs and their functionally related proteins from this bacterium. Further, host nonhomology analysis predicted 5 pathogen-specific proteins, three of which have virulence properties and could be used as novel therapeutic targets. Finally, the structural conformations of two EHPs (NP_249450.1 and NP_251676.1) were predicted, and evaluation of the predicted models indicated that they are highly accurate. Our findings will pave the way for the development of new antibacterial drugs and treatment strategies that focus on these novel drug targets.

Disclosure

A preprint has previously been published [106].

Conflicts of Interest

The authors declare that there is no conflict of interest.

Open Research

Data Availability

The data used to support the findings of this study are included within the article.

Supporting Information

Filename	Description
bmri1787485-sup-0001-f1.pdf (PDF document, 638 KB)	Supplementary 1: Overall quality factor (ERRAT value) from the SAVES v6.0 server. (a) The quality factor is 87.6325% for the NP_249450.1 protein structure from SWISS-MODEL, (b) the quality factor is 93.3649% for the NP_251676.1 protein structure from SWISS-MODEL, (c) the quality factor is 92.459% for the NP_249450.1 protein structure from Robetta, and (d) the quality factor is 98.063% for the NP_251676.1 protein structure from Robetta.
bmri1787485-sup-0002-f2.pdf (PDF document, 1.9 MB)	Supplementary 2: Analysis of molecular dynamics simulation for the structures obtained from SWISS-MODEL: (i) NP_249450.1 and (ii) NP_251676.1. (a) RMSD values; (b) RMSF values; (c) radius of gyration (Rg) profile; (d) hydrogen bonds; and (e) SASA descriptors.
bmri1787485-sup-0003-f3.pdf (PDF document, 734.3 KB)	Supplementary 3: Analysis of molecular dynamics simulation for the structures obtained from Robetta: (i) NP_249450.1 and (ii) NP_251676.1. (a) RMSD values; (b) RMSF values; (c) radius of gyration (Rg) profile; (d) hydrogen bonds; and (e) SASA descriptors.
bmri1787485-sup-0004-f4.pdf (PDF document, 323.5 KB)	Supplementary 4: Table S1: subcellular localization and transmembrane topology. Table S2: list of functions of the proteins found in the PPI network. Table S3: properties of secondary structure from the SOPMA database.
bmri1787485-sup-0005-f5.zip (application/x-compressed, 351.7 KB)	Supplementary 5: ROC analysis.


References

1 Klockgether J. and Tümmler B., Recent advances in understanding Pseudomonas aeruginosa as a pathogen, F1000Research. (2017) 6, https://doi.org/10.12688/f1000research.10506.1, 2-s2.0-85027457327.
2 Diggle S. P. and Whiteley M., Microbe profile: Pseudomonas aeruginosa: opportunistic pathogen and lab rat, Microbiology. (2020) 166, no. 1, 30–33, https://doi.org/10.1099/mic.0.000860, 31597590.
3 Nikolaidis M., Mossialos D., Oliver S. G., and Amoutzias G. D., Comparative analysis of the core proteomes among the Pseudomonas major evolutionary groups reveals species-specific adaptations for Pseudomonas aeruginosa and Pseudomonas chlororaphis, Diversity. (2020) 12, no. 8, https://doi.org/10.3390/d12080289.
4 Stover C. K., Pham X. Q., Erwin A. L., Mizoguchi S. D., Warrener P., Hickey M. J., Brinkman F. S. L., Hufnagle W. O., Kowalik D. J., Lagrou M., Garber R. L., Goltry L., Tolentino E., Westbrock-Wadman S., Yuan Y., Brody L. L., Coulter S. N., Folger K. R., Kas A., Larbig K., Lim R., Smith K., Spencer D., Wong G. K. S., Wu Z., Paulsen I. T., Reizer J., Saier M. H., Hancock R. E. W., Lory S., and Olson M. V., Complete genome sequence of Pseudomonas aeruginosa PAO1, an opportunistic pathogen, Nature. (2000) 406, no. 6799, 959–964, https://doi.org/10.1038/35023079, 2-s2.0-0034739007.
5 Pachori P., Gothalwal R., and Gandhi P., Emergence of antibiotic resistance Pseudomonas aeruginosa in intensive care unit; a critical review, Genes & Diseases. (2019) 6, no. 2, 109–119, https://doi.org/10.1016/j.gendis.2019.04.001, 2-s2.0-85065719814.
6 Dubern J.-F., Cigana C., De Simone M., Lazenby J., Juhas M., Schwager S., Bianconi I., Döring G., Eberl L., Williams P., Bragonzi A., and Cámara M., Integrated whole-genome screening for Pseudomonas aeruginosa virulence genes using multiple disease models reveals that pathogenicity is host specific, Environmental Microbiology. (2015) 17, no. 11, 4379–4393, https://doi.org/10.1111/1462-2920.12863, 2-s2.0-84949314449.
7 Azam M. W. and Khan A. U., Updates on the pathogenicity status of Pseudomonas aeruginosa, Drug Discovery Today. (2019) 24, no. 1, 350–359, https://doi.org/10.1016/j.drudis.2018.07.003, 2-s2.0-85050878866.
8 Periasamy S., Nair H. A. S., Lee K. W. K., Ong J., Goh J. Q. J., Kjelleberg S., and Rice S. A., Pseudomonas aeruginosa PAO1 exopolysaccharides are important for mixed species biofilm community development and stress tolerance, Frontiers in Microbiology. (2015) 6, https://doi.org/10.3389/fmicb.2015.00851, 2-s2.0-84940857668.
9 Schleheck D., Barraud N., Klebensberger J., Webb J. S., McDougald D., Rice S. A., and Kjelleberg S., Pseudomonas aeruginosa PAO1 preferentially grows as aggregates in liquid batch cultures and disperses upon starvation, PLoS One. (2009) 4, no. 5, article e5513, https://doi.org/10.1371/journal.pone.0005513, 2-s2.0-65849315814, 19436737.
10 Klockgether J., Cramer N., Wiehlmann L., Davenport C. F., and Tümmler B., Pseudomonas aeruginosa genomic structure and diversity, Frontiers in Microbiology. (2011) 2, https://doi.org/10.3389/fmicb.2011.00150, 2-s2.0-84859103889.
11 Prava J., Pranavathiyani G., and Pan A., Functional assignment for essential hypothetical proteins of Staphylococcus aureus N315, International Journal of Biological Macromolecules. (2018) 108, 765–774, https://doi.org/10.1016/j.ijbiomac.2017.10.169, 2-s2.0-85032747089.
12 Luo H., Lin Y., Gao F., Zhang C. T., and Zhang R., DEG 10, an update of the database of essential genes that includes both protein-coding genes and noncoding genomic elements, Nucleic Acids Research. (2014) 42, no. D1, D574–D580, https://doi.org/10.1093/nar/gkt1131, 2-s2.0-84891786814, 24243843.
13 Araujo F. A., Barh D., Silva A., Guimarães L., and Ramos R. T. J., GO FEAT: a rapid web-based functional annotation tool for genomic and transcriptomic data, Scientific Reports. (2018) 8, no. 1, article 1794, https://doi.org/10.1038/s41598-018-20211-9, 2-s2.0-85041242400.
14 Geer L. Y., Domrachev M., Lipman D. J., and Bryant S. H., CDART: protein homology by domain architecture, Genome Research. (2002) 12, no. 10, 1619–1623, https://doi.org/10.1101/gr.278202, 2-s2.0-0036796467, 12368255.
15 Letunic I., Khedkar S., and Bork P., SMART: recent updates, new developments and status in 2020, Nucleic Acids Research. (2021) 49, no. D1, D458–D460, https://doi.org/10.1093/nar/gkaa937, 33104802.
16 Gough J., Karplus K., Hughey R., and Chothia C., Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure, Journal of Molecular Biology. (2001) 313, no. 4, 903–919, https://doi.org/10.1006/jmbi.2001.5080, 2-s2.0-0035798406.
17 Mistry J., Chuguransky S., Williams L., Qureshi M., Salazar G. A., Sonnhammer E. L. L., Tosatto S. C. E., Paladin L., Raj S., Richardson L. J., Finn R. D., and Bateman A., Pfam: the protein families database in 2021, Nucleic Acids Research. (2021) 49, no. D1, D412–D419, https://doi.org/10.1093/nar/gkaa913, 33125078.
18 Li Y. H., Xu J. Y., Tao L., Li X. F., Li S., Zeng X., Chen S. Y., Zhang P., Qin C., Zhang C., Chen Z., Zhu F., and Chen Y. Z., SVM-Prot 2016: a web-server for machine learning prediction of protein functional families from sequence irrespective of similarity, PLoS One. (2016) 11, no. 8, article e0155290, https://doi.org/10.1371/journal.pone.0155290, 2-s2.0-84984783441, 27525735.
19 Sillitoe I., Bordin N., Dawson N., Waman V. P., Ashford P., Scholes H. M., Pang C. S. M., Woodridge L., Rauer C., Sen N., Abbasian M., le Cornu S., Lam S. D., Berka K., Varekova I. H., Svobodova R., Lees J., and Orengo C. A., CATH: increased structural coverage of functional space, Nucleic Acids Research. (2021) 49, no. D1, D266–D273, https://doi.org/10.1093/nar/gkaa1079, 33237325.
20 Blum M., Chang H.-Y., Chuguransky S., Grego T., Kandasaamy S., Mitchell A., Nuka G., Paysan-Lafosse T., Qureshi M., Raj S., Richardson L., Salazar G. A., Williams L., Bork P., Bridge A., Gough J., Haft D. H., Letunic I., Marchler-Bauer A., Mi H., Natale D. A., Necci M., Orengo C. A., Pandurangan A. P., Rivoire C., Sigrist C. J. A., Sillitoe I., Thanki N., Thomas P. D., Tosatto S. C. E., Wu C. H., Bateman A., and Finn R. D., The InterPro protein families and domains database: 20 years on, Nucleic Acids Research. (2021) 49, no. D1, D344–D354, https://doi.org/10.1093/nar/gkaa977, 33156333.
21 Gabler F., Nam S. Z., Till S., Mirdita M., Steinegger M., Söding J., Lupas A. N., and Alva V., Protein sequence analysis using the MPI bioinformatics toolkit, Current Protocols in Bioinformatics. (2020) 72, no. 1, article e108, https://doi.org/10.1002/cpbi.108, 33315308.
22 Koskinen P., Törönen P., Nokso-Koivisto J., and Holm L., PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment, Bioinformatics. (2015) 31, no. 10, 1544–1552, https://doi.org/10.1093/bioinformatics/btu851, 2-s2.0-84929622388, 25653249.
23 Hawkins T., Chitale M., Luban S., and Kihara D., PFP: automated prediction of gene ontology functional annotations with confidence scores using protein sequence data, Proteins: Structure, Function, and Bioinformatics. (2009) 74, no. 3, 566–582, https://doi.org/10.1002/prot.22172, 2-s2.0-59849089151.
24 Chitale M., Hawkins T., Park C., and Kihara D., ESG: extended similarity group method for automated protein function prediction, Bioinformatics. (2009) 25, no. 14, 1739–1745, https://doi.org/10.1093/bioinformatics/btp309, 2-s2.0-67649868148, 19435743.
25 Yu N. Y., Wagner J. R., Laird M. R., Melli G., Rey S., Lo R., Dao P., Sahinalp S. C., Ester M., Foster L. J., and Brinkman F. S. L., PSORTb 3.0: improved protein subcellular localization prediction with refined localization subcategories and predictive capabilities for all prokaryotes, Bioinformatics. (2010) 26, no. 13, 1608–1615, https://doi.org/10.1093/bioinformatics/btq249, 2-s2.0-77954199597, 20472543.
26 Yu C. S., Lin C. J., and Hwang J. K., Predicting subcellular localization of proteins for Gram-negative bacteria by support vector machines based on n-peptide compositions, Protein Science. (2004) 13, no. 5, 1402–1406, https://doi.org/10.1110/ps.03479604, 2-s2.0-1942505330, 15096640.
27 Yu C. S., Chen Y. C., Lu C. H., and Hwang J. K., Prediction of protein subcellular localization, Proteins: Structure, Function, and Bioinformatics. (2006) 64, no. 3, 643–651, https://doi.org/10.1002/prot.21018, 2-s2.0-33746218840.
28 Krogh A., Larsson B., von Heijne G., and Sonnhammer E. L. L., Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes, Journal of Molecular Biology. (2001) 305, no. 3, 567–580, https://doi.org/10.1006/jmbi.2000.4315, 2-s2.0-0035910270.
29 Käll L., Krogh A., and Sonnhammer E. L. L., A combined transmembrane topology and signal peptide prediction method, Journal of Molecular Biology. (2004) 338, no. 5, 1027–1036, https://doi.org/10.1016/j.jmb.2004.03.016, 2-s2.0-2142657817, 15111065.
30 Tusnády G. E. and Simon I., The HMMTOP transmembrane topology prediction server, Bioinformatics. (2001) 17, no. 9, 849–850, https://doi.org/10.1093/bioinformatics/17.9.849, 2-s2.0-0034786532.
31 Dobson L., Reményi I., and Tusnády G. E., CCTOP: a consensus constrained topology prediction web server, Nucleic Acids Research. (2015) 43, no. W1, W408–W412, https://doi.org/10.1093/nar/gkv451, 2-s2.0-84979853667, 25943549.
32 Omasits U., Ahrens C. H., Müller S., and Wollscheid B., Protter: interactive protein feature visualization and integration with experimental proteomic data, Bioinformatics. (2014) 30, no. 6, 884–886, https://doi.org/10.1093/bioinformatics/btt607, 2-s2.0-84897873840, 24162465.
33 Nielsen H., Predicting secretory proteins with SignalP, Protein function prediction, 2017, Springer, 59–73, https://doi.org/10.1007/978-1-4939-7015-5_6.
34 Hiller K., Grote A., Scheer M., Munch R., and Jahn D., PrediSi: prediction of signal peptides and their cleavage positions, Nucleic Acids Research. (2004) 32, no. Web Server, W375–W379, https://doi.org/10.1093/nar/gkh378, 2-s2.0-3242878999, 15215414.
35 Ganapathiraju M., Balakrishnan N., Reddy R., and Klein-Seetharaman J., Transmembrane helix prediction using amino acid property features and latent semantic analysis, BMC Bioinformatics. (2008) 9, no. Supplement 1, https://doi.org/10.1186/1471-2105-9-S1-S4, 2-s2.0-84872004750, 18315857.
36 Gasteiger E., Gattiker A., Hoogland C., Ivanyi I., Appel R. D., and Bairoch A., ExPASy: the proteomics server for in-depth protein knowledge and analysis, Nucleic Acids Research. (2003) 31, no. 13, 3784–3788, https://doi.org/10.1093/nar/gkg563, 2-s2.0-0043122944, 12824418.
37 Shahbaaz M., ImtaiyazHassan M., and Ahmad F., Functional annotation of conserved hypothetical proteins from Haemophilus influenzae Rd KW20, PLoS One. (2013) 8, no. 12, article e84263, https://doi.org/10.1371/journal.pone.0084263, 2-s2.0-84894214238, 24391926.
38 Naqvi A. A. T., Shahbaaz M., Ahmad F., and Hassan M. I., Identification of functional candidates amongst hypothetical proteins of Treponema pallidum ssp. pallidum, PLoS One. (2015) 10, no. 4, article e0124177, https://doi.org/10.1371/journal.pone.0124177, 2-s2.0-84929995550.
39 Zhou G., Soufan O., Ewald J., Hancock R. E. W., Basu N., and Xia J., NetworkAnalyst 3.0: a visual analytics platform for comprehensive gene expression profiling and meta-analysis, Nucleic Acids Research. (2019) 47, no. W1, W234–W241, https://doi.org/10.1093/nar/gkz240, 2-s2.0-85068482670, 30931480.
40 Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B., and Ideker T., Cytoscape: a software environment for integrated models of biomolecular interaction networks, Genome Research. (2003) 13, no. 11, 2498–2504, https://doi.org/10.1101/gr.1239303, 2-s2.0-0242490780, 14597658.
41 Shende G., Haldankar H., Barai R. S., Bharmal M. H., Shetty V., and Idicula-Thomas S., PBIT: pipeline builder for identification of drug targets for infectious diseases, Bioinformatics. (2017) 33, no. 6, 929–931, https://doi.org/10.1093/bioinformatics/btw760, 2-s2.0-85018281619, 28039165.
42 Valdes A. M., Walter J., Segal E., and Spector T. D., Role of the gut microbiota in nutrition and health, BMJ. (2018) 361, article k2179, https://doi.org/10.1136/bmj.k2179, 2-s2.0-85048672675.
43 Saha S. and Raghava G. P. S., VICMpred: an SVM-based method for the prediction of functional proteins of Gram-negative bacteria using amino acid patterns and composition, Genomics, Proteomics & Bioinformatics. (2006) 4, no. 1, 42–47, https://doi.org/10.1016/S1672-0229(06)60015-6, 2-s2.0-33645715647, 16689701.
44 Garg A. and Gupta D., VirulentPred: a SVM based prediction method for virulent proteins in bacterial pathogens, BMC Bioinformatics. (2008) 9, no. 1, https://doi.org/10.1186/1471-2105-9-62, 2-s2.0-39849091628.
45 Gupta A., Kapil R., Dhakan D. B., and Sharma V. K., MP3: a software tool for the prediction of pathogenic proteins in genomic and metagenomic data, PLoS One. (2014) 9, no. 4, article e93907, https://doi.org/10.1371/journal.pone.0093907, 2-s2.0-84899667632, 24736651.
46 Emmerich C. H., Gamboa L. M., Hofmann M. C. J., Bonin-Andresen M., Arbach O., Schendel P., Gerlach B., Hempel K., Bespalov A., Dirnagl U., and Parnham M. J., Improving target assessment in biomedical research: the GOT-IT recommendations, Nature Reviews Drug Discovery. (2020) 20, no. 1, 64–81, https://doi.org/10.1038/s41573-020-0087-3.
47 Wishart D. S., Feunang Y. D., Guo A. C., Lo E. J., Marcu A., Grant J. R., Sajed T., Johnson D., Li C., Sayeeda Z., Assempour N., Iynkkaran I., Liu Y., Maciejewski A., Gale N., Wilson A., Chin L., Cummings R., le D., Pon A., Knox C., and Wilson M., DrugBank 5.0: a major update to the DrugBank database for 2018, Nucleic Acids Research. (2018) 46, no. D1, D1074–D1082, https://doi.org/10.1093/nar/gkx1037, 2-s2.0-85040924244, 29126136.
10.1093/nar/gkx1037
CAS PubMed Web of Science® Google Scholar
48 Geourjon C. and Deléage G., SOPMA: significant improvements in protein secondary structure prediction by consensus prediction from multiple alignments, Bioinformatics. (1995) 11, no. 6, 681–684, https://doi.org/10.1093/bioinformatics/11.6.681, 2-s2.0-0029595442, 8808585.
10.1093/bioinformatics/11.6.681
CAS PubMed Web of Science® Google Scholar
49 Buchan D. W. A. and Jones D. T., The PSIPRED protein analysis workbench: 20 years on, Nucleic Acids Research. (2019) 47, no. W1, W402–W407, https://doi.org/10.1093/nar/gkz297, 2-s2.0-85068978904, 31251384.
10.1093/nar/gkz297
CAS PubMed Web of Science® Google Scholar
50 Waterhouse A., Bertoni M., Bienert S., Studer G., Tauriello G., Gumienny R., Heer F. T., de Beer T. A. P., Rempfer C., Bordoli L., Lepore R., and Schwede T., SWISS-MODEL: homology modelling of protein structures and complexes, Nucleic Acids Research. (2018) 46, no. W1, W296–W303, https://doi.org/10.1093/nar/gky427, 2-s2.0-85050235281, 29788355.
10.1093/nar/gky427
CAS PubMed Web of Science® Google Scholar
51 Yang J., Anishchenko I., Park H., Peng Z., Ovchinnikov S., and Baker D., Improved protein structure prediction using predicted interresidue orientations, Proceedings of the National Academy of Sciences of the United States of America. (2020) 117, no. 3, 1496–1503, https://doi.org/10.1073/pnas.1914677117, 31896580.
10.1073/pnas.1914677117
CAS PubMed Web of Science® Google Scholar
52 HeeShin W., Prediction of protein structure and interaction by GALAXY protein modeling programs, Biodesign. (2014) 2, 1–11.
Google Scholar
53 Colovos C. and Yeates T. O., Verification of protein structures: patterns of nonbonded atomic interactions, Protein Science. (1993) 2, no. 9, 1511–1519, https://doi.org/10.1002/pro.5560020916, 2-s2.0-0027180507, 8401235.
10.1002/pro.5560020916
CAS PubMed Web of Science® Google Scholar
54 Bowie J. U., Lüthy R., and Eisenberg D., A method to identify protein sequences that fold into a known three-dimensional structure, Science. (1991) 253, no. 5016, 164–170, https://doi.org/10.1126/science.1853201, 2-s2.0-0025830469.
10.1126/science.1853201
CAS PubMed Web of Science® Google Scholar
55 Lüthy R., Bowie J. U., and Eisenberg D., Assessment of protein models with three-dimensional profiles, Nature. (1992) 356, no. 6364, 83–85, https://doi.org/10.1038/356083a0, 2-s2.0-0026610767.
10.1038/356083a0
CAS PubMed Web of Science® Google Scholar
56 Pontius J., Richelle J., and Wodak S. J., Deviations from standard atomic volumes as a quality measure for protein crystal structures, Journal of Molecular Biology. (1996) 264, no. 1, 121–136, https://doi.org/10.1006/jmbi.1996.0628, 2-s2.0-0030598343, 8950272.
10.1006/jmbi.1996.0628
CAS PubMed Web of Science® Google Scholar
57 Hooft R. W., Vriend G., Sander C., and Abola E. E., Errors in protein structures, Nature. (1996) 381, no. 6580, 272–272, https://doi.org/10.1038/381272a0, 2-s2.0-0030047142.
10.1038/381272a0
CAS PubMed Web of Science® Google Scholar
58 Laskowski R. A., MacArthur M. W., Moss D. S., and Thornton J. M., PROCHECK: a program to check the stereochemical quality of protein structures, Journal of Applied Crystallography. (1993) 26, no. 2, 283–291, https://doi.org/10.1107/S0021889892009944.
10.1107/S0021889892009944
CAS PubMed Web of Science® Google Scholar
59 Bradley A. P., The use of the area under the ROC curve in the evaluation of machine learning algorithms, Pattern Recognition. (1997) 30, no. 7, 1145–1159, https://doi.org/10.1016/S0031-3203(96)00142-2, 2-s2.0-0031191630.
10.1016/S0031-3203(96)00142-2
Web of Science® Google Scholar
60 Eng J., ROC Analysis: Web-Based Calculator for ROC Curves, 2017, Johns Hopkins University, Baltimore, May 2017, http://www.jrocfit.org.
Google Scholar
61 Kuk A. C. Y., Mashalidis E. H., and Lee S.-Y., Crystal structure of the MOP flippase MurJ in an inward-facing conformation, Nature Structural & Molecular Biology. (2017) 24, no. 2, 171–176, https://doi.org/10.1038/nsmb.3346, 2-s2.0-85007060787, 28024149.
10.1038/nsmb.3346
CAS PubMed Web of Science® Google Scholar
62 Islam M. S., Shahik S. M., Sohel M., Patwary N. I. A., and Hasan M. A., In silico structural and functional annotation of hypothetical proteins of Vibrio cholerae O139, Genomics & informatics. (2015) 13, no. 2, 53–59, https://doi.org/10.5808/GI.2015.13.2.53, 26175663.
10.5808/GI.2015.13.2.53
PubMed Google Scholar
63 Malhotra H. and Kaur H., A bioinformatics approach for functional and structural analysis of hypothetical proteins of Clostridium difficile, Imperial Journal of Interdisciplinary Research. (2016) 2, 1601–1609.
Google Scholar
64 Ikai A., Thermostability and aliphatic index of globular proteins, The Journal of Biochemistry. (1980) 88, no. 6, 1895–1898, https://doi.org/10.1093/oxfordjournals.jbchem.a133168, 7462208.
10.1093/oxfordjournals.jbchem.a133168
CAS PubMed Web of Science® Google Scholar
65 da Costa W. L. O., de Aragão Araújo C. L., Dias L. M., de Sousa Pereira L. C., Alves J. T. C., Araújo F. A., Folador E. L., Henriques I., Silva A., and Folador A. R. C., Functional annotation of hypothetical proteins from the Exiguobacterium antarcticum strain B7 reveals proteins involved in adaptation to extreme environments, including high arsenic resistance, PLoS One. (2018) 13, no. 6, article e0198965, https://doi.org/10.1371/journal.pone.0198965, 2-s2.0-85049006411, 29940001.
10.1371/journal.pone.0198965
PubMed Google Scholar
66 Guruprasad K., Reddy B. V. B., and Pandit M. W., Correlation between stability of a protein and its dipeptide composition: a novel approach for predicting in vivo stability of a protein from its primary sequence, Protein Engineering, Design and Selection. (1990) 4, no. 2, 155–161, https://doi.org/10.1093/protein/4.2.155, 2-s2.0-0025612425.
10.1093/protein/4.2.155
CAS Web of Science® Google Scholar
67 Kyte J. and Doolittle R. F., A simple method for displaying the hydropathic character of a protein, Journal of Molecular Biology. (1982) 157, no. 1, 105–132, https://doi.org/10.1016/0022-2836(82)90515-0, 2-s2.0-0020475449, 7108955.
10.1016/0022-2836(82)90515-0
CAS PubMed Web of Science® Google Scholar
68 Xia J., Benner M. J., and Hancock R. E. W., NetworkAnalyst-integrative approaches for protein–protein interaction network analysis and visual exploration, Nucleic Acids Research. (2014) 42, no. W1, W167–W174, https://doi.org/10.1093/nar/gku443, 2-s2.0-84904801714, 24861621.
10.1093/nar/gku443
CAS PubMed Web of Science® Google Scholar
69 Meng L., Liu H., Lan T., Dong L., Hu H., Zhao S., Zhang Y., Zheng N., and Wang J., Antibiotic resistance patterns of Pseudomonas spp. isolated from raw milk revealed by whole genome sequencing, Frontiers in Microbiology. (2020) 11, https://doi.org/10.3389/fmicb.2020.01005, 32655503.
10.3389/fmicb.2020.01005
PubMed Google Scholar
70 Poole K., Pseudomonas aeruginosa: resistance to the max, Frontiers in Microbiology. (2011) 2, https://doi.org/10.3389/fmicb.2011.00065, 2-s2.0-84857660045.
10.3389/fmicb.2011.00065
PubMed Web of Science® Google Scholar
71 Rahman A., Susmi T. F., Yasmin F., Karim M. E., and Hossain M. U., Functional annotation of an ecologically important protein from Chloroflexus aurantiacus involved in polyhydroxyalkanoates (PHA) biosynthetic pathway, SN Applied Sciences. (2020) 2, no. 11, article 1810, https://doi.org/10.1007/s42452-020-03598-x.
10.1007/s42452-020-03598-x
Google Scholar
72 Itzhak D. N., Tyanova S., Cox J., and Borner G. H. H., Global, quantitative and dynamic mapping of protein subcellular localization, eLife. (2016) 5, article e16950, https://doi.org/10.7554/eLife.16950, 2-s2.0-84979673478, 27278775.
10.7554/eLife.16950
PubMed Google Scholar
73 Samsonov G. V., Handbook of the Physicochemical Properties of the Elements, 2012, Springer Science & Business Media.
Google Scholar
74 Lazzaroni J.-C., Dubuisson J.-F., and Vianney A., The Tol proteins of Escherichia coli and their involvement in the translocation of group A colicins, Biochimie. (2002) 84, no. 5-6, 391–397, https://doi.org/10.1016/s0300-9084(02)01419-0, 2-s2.0-0036589166.
10.1016/S0300-9084(02)01419-0
CAS PubMed Web of Science® Google Scholar
75 Crow A., Greene N. P., Kaplan E., and Koronakis V., Structure and mechanotransmission mechanism of the MacB ABC transporter superfamily, Proceedings of the National Academy of Sciences of the United States of America. (2017) 114, no. 47, 12572–12577, https://doi.org/10.1073/pnas.1712153114, 2-s2.0-85034566033, 29109272.
10.1073/pnas.1712153114
CAS PubMed Web of Science® Google Scholar
76 Doumith M., Mushtaq S., Livermore D., and Woodford N., New insights into the regulatory pathways associated with the activation of the stringent response in bacterial resistance to the PBP2-targeted antibiotics, mecillinam and OP0595/RG6080, Journal of Antimicrobial Chemotherapy. (2016) 71, no. 10, 2810–2814, https://doi.org/10.1093/jac/dkw230, 2-s2.0-84994762906, 27330062.
10.1093/jac/dkw230
CAS PubMed Google Scholar
77 Ropp P. A., Hu M., Olesky M., and Nicholas R. A., Mutations in ponA, the gene encoding penicillin-binding protein 1, and a novel locus, penC, are required for high-level chromosomally mediated penicillin resistance in Neisseria gonorrhoeae, Antimicrobial Agents and Chemotherapy. (2002) 46, no. 3, 769–777, https://doi.org/10.1128/AAC.46.3.769-777.2002, 2-s2.0-0036171501, 11850260.
10.1128/AAC.46.3.769-777.2002
CAS PubMed Web of Science® Google Scholar
78 Yu H., Kim P. M., Sprecher E., Trifonov V., and Gerstein M., The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics, PLoS Computational Biology. (2007) 3, no. 4, article e59, https://doi.org/10.1371/journal.pcbi.0030059, 2-s2.0-34247622363, 17447836.
10.1371/journal.pcbi.0030059
PubMed Web of Science® Google Scholar
79 Müller M. and Papadopoulou B., Stage-specific expression of the glycine cleavage complex subunits in Leishmania infantum, Molecular and Biochemical Parasitology. (2010) 170, no. 1, 17–27, https://doi.org/10.1016/j.molbiopara.2009.11.009, 2-s2.0-74449084727.
10.1016/j.molbiopara.2009.11.009
CAS PubMed Web of Science® Google Scholar
80 An G., Bendiak D. S., Mamelak L. A., and Friesen J. D., Organization and nucleotide sequence of a new ribosomal operon in Escherichia coli containing the genes for ribosomal protein S2 and elongation factor Ts, Nucleic Acids Research. (1981) 9, no. 16, 4163–4172, https://doi.org/10.1093/nar/9.16.4163, 2-s2.0-0019827917.
10.1093/nar/9.16.4163
CAS PubMed Web of Science® Google Scholar
81 Conrad J., Sun D., Englund N., and Ofengand J., The rluC gene of Escherichia coli codes for a pseudouridine synthase that is solely responsible for synthesis of pseudouridine at positions 955, 2504, and 2580 in 23 S ribosomal RNA, Journal of Biological Chemistry. (1998) 273, no. 29, 18562–18566, https://doi.org/10.1074/jbc.273.29.18562, 2-s2.0-0032540887.
10.1074/jbc.273.29.18562
CAS PubMed Web of Science® Google Scholar
82 Makharashvili N., Koroleva O., Bera S., Grandgenett D. P., and Korolev S., A novel structure of DNA repair protein RecO from Deinococcus radiodurans, Structure. (2004) 12, no. 10, 1881–1889, https://doi.org/10.1016/j.str.2004.08.006, 2-s2.0-4644266683.
10.1016/j.str.2004.08.006
CAS PubMed Web of Science® Google Scholar
83 Zheng S., Sham L.-T., Rubino F. A., Brock K. P., Robins W. P., Mekalanos J. J., Marks D. S., Bernhardt T. G., and Kruse A. C., Structure and mutagenic analysis of the lipid II flippase MurJ from Escherichia coli, Proceedings of the National Academy of Sciences of the United States of America. (2018) 115, no. 26, 6709–6714, https://doi.org/10.1073/pnas.1802192115, 2-s2.0-85049045767, 29891673.
10.1073/pnas.1802192115
CAS PubMed Web of Science® Google Scholar
84 Zuberi A. R., Ying C., Bischoff D. S., and Ordal G. W., Gene-protein relationships in the flagellar hook-basal body complex of Bacillus subtilis: sequences of the flgB, flgC, flgG, fliE and fliF genes, Gene. (1991) 101, no. 1, 23–31, https://doi.org/10.1016/0378-1119(91)90220-6, 2-s2.0-0025801342.
10.1016/0378-1119(91)90220-6
CAS PubMed Web of Science® Google Scholar
85 Matsuzawa H., Asoh S., Kunai K., Muraiso K., Takasuga A., and Ohta T., Nucleotide sequence of the rodA gene, responsible for the rod shape of Escherichia coli: rodA and the pbpA gene, encoding penicillin-binding protein 2, constitute the rodA operon, Journal of Bacteriology. (1989) 171, no. 1, 558–560, https://doi.org/10.1128/jb.171.1.558-560.1989, 2-s2.0-0024572662, 2644207.
10.1128/jb.171.1.558-560.1989
CAS PubMed Web of Science® Google Scholar
86 Martin P. R., Hobbs M., Free P. D., Jeske Y., and Mattick J. S., Characterization of pilQ, a new gene required for the biogenesis of type 4 fimbriae in Pseudomonas aeruginosa, Molecular Microbiology. (1993) 9, no. 4, 857–868, https://doi.org/10.1111/j.1365-2958.1993.tb01744.x, 2-s2.0-0027235766.
10.1111/j.1365-2958.1993.tb01744.x
CAS PubMed Web of Science® Google Scholar
87 El Yacoubi B., Lyons B., Cruz Y., Reddy R., Nordin B., Agnelli F., Williamson J. R., Schimmel P., Swairjo M. A., and de Crecy-Lagard V., The universal YrdC/Sua5 family is required for the formation of threonylcarbamoyladenosine in tRNA, Nucleic Acids Research. (2009) 37, no. 9, 2894–2909, https://doi.org/10.1093/nar/gkp152, 2-s2.0-66249095330, 19287007.
10.1093/nar/gkp152
CAS PubMed Web of Science® Google Scholar
88 Babinski K. J., Genetic and Biochemical Characterization of the Specific UDP-2, 3-Diacylglucosamine Hydrolase of Lipid a Biosynthesis, 2004, https://doi.org/10.1074/jbc.m204067200, 2-s2.0-0037135537.
10.1074/jbc.m204067200
Google Scholar
89 Metzger L. E., Lee J. K., Stroud R. M., and Raetz C. R. H., Discovery, characterization, and structural determination of a novel UDP-2,3-diacylglucosamine hydrolase, The FASEB Journal. (2010) 24, no. Supplement 1, https://doi.org/10.1096/fasebj.24.1_supplement.509.1.
10.1096/fasebj.24.1_supplement.509.1
PubMed Google Scholar
90 Yum D.-Y., Lee B.-Y., and Pan J.-G., Identification of the yqhE and yafB Genes Encoding Two 2,5-Diketo- d -Gluconate Reductases in Escherichia coli, Applied and Environmental Microbiology. (1999) 65, no. 8, 3341–3346, https://doi.org/10.1128/aem.65.8.3341-3346.1999.
10.1128/AEM.65.8.3341-3346.1999
CAS PubMed Web of Science® Google Scholar
91 Moynie L., Schnell R., McMahon S. A., Sandalova T., Boulkerou W. A., Schmidberger J. W., Alphey M., Cukier C., Duthie F., Kopec J., and Liu H., The AEROPATH project targeting Pseudomonas aeruginosa: crystallographic studies for assessment of potential targets in early-stage drug discovery, Acta Crystallographica Section F: Structural Biology and Crystallization Communications. (2013) 69, no. 1, 25–34.
10.1107/S1744309112044739
CAS PubMed Web of Science® Google Scholar
92 Chakravarty S., Melton C. N., Bailin A., Yahr T. L., and Anderson G. G., Pseudomonas aeruginosa magnesium transporter MgtE inhibits type III secretion system gene expression by stimulating rsmYZ transcription, Journal of Bacteriology. (2017) 199, no. 23, https://doi.org/10.1128/JB.00268-17, 2-s2.0-85032994220, 28847924.
10.1128/JB.00268-17
PubMed Google Scholar
93 Chakravarty S., Ramos-Hegazy L., Gasparovic A., and Anderson G. G., DNA alternate polymerase PolB mediates inhibition of type III secretion in Pseudomonas aeruginosa, Microbes and Infection. (2021) 23, no. 2-3, article 104777, https://doi.org/10.1016/j.micinf.2020.11.004.
10.1016/j.micinf.2020.11.004
PubMed Google Scholar
94 Overhage J., Schemionek M., Webb J. S., and Rehm B. H. A., Expression of the psl operon in Pseudomonas aeruginosa PAO1 biofilms: PslA performs an essential function in biofilm formation, Applied and Environmental Microbiology. (2005) 71, no. 8, 4407–4413, https://doi.org/10.1128/AEM.71.8.4407-4413.2005, 2-s2.0-23744495448, 16085831.
10.1128/AEM.71.8.4407-4413.2005
CAS PubMed Web of Science® Google Scholar
95 Campisano A., Schroeder C., Schemionek M., Overhage J., and Rehm B. H. A., PslD is a secreted protein required for biofilm formation by Pseudomonas aeruginosa, Applied and Environmental Microbiology. (2006) 72, no. 4, 3066–3068, https://doi.org/10.1128/AEM.72.4.3066-3068.2006, 2-s2.0-33646118246, 16598021.
10.1128/AEM.72.4.3066-3068.2006
CAS PubMed Web of Science® Google Scholar
96 Sperandeo P., Cescutti R., Villa R., di Benedetto C., Candia D., Dehò G., and Polissi A., Characterization of lptA and lptB, two essential genes implicated in lipopolysaccharide transport to the outer membrane of Escherichia coli, Journal of Bacteriology. (2007) 189, no. 1, 244–253, https://doi.org/10.1128/JB.01126-06, 2-s2.0-33845950113, 17056748.
10.1128/JB.01126-06
CAS PubMed Web of Science® Google Scholar
97 Konyecsni W. M. and Deretic V., DNA sequence and expression analysis of algP and algQ, components of the multigene system transcriptionally regulating mucoidy in Pseudomonas aeruginosa: algP contains multiple direct repeats, Journal of Bacteriology. (1990) 172, no. 5, 2511–2520, https://doi.org/10.1128/jb.172.5.2511-2520.1990.
10.1128/jb.172.5.2511-2520.1990
CAS PubMed Web of Science® Google Scholar
98 Chung B. C., Zhao J., Gillespie R. A., Kwon D. Y., Guan Z., Hong J., Zhou P., and Lee S. Y., Crystal structure of MraY, an essential membrane enzyme for bacterial cell wall synthesis, Science. (2013) 341, no. 6149, 1012–1016, https://doi.org/10.1126/science.1236501, 2-s2.0-84883060074, 23990562.
10.1126/science.1236501
CAS PubMed Web of Science® Google Scholar
99 Tashiro Y., Nomura N., Nakao R., Senpuku H., Kariyama R., Kumon H., Kosono S., Watanabe H., Nakajima T., and Uchiyama H., Opr86 is essential for viability and is a potential candidate for a protective antigen against biofilm formation by Pseudomonas aeruginosa, Journal of Bacteriology. (2008) 190, no. 11, 3969–3978, https://doi.org/10.1128/JB.02004-07, 2-s2.0-44349193490, 18390657.
10.1128/JB.02004-07
CAS PubMed Web of Science® Google Scholar
100 Jenkins D. E., Auger E. A., and Matin A., Role of RpoH, a heat shock regulator protein, in Escherichia coli carbon starvation protein synthesis and survival, Journal of Bacteriology. (1991) 173, no. 6, 1992–1996, https://doi.org/10.1128/jb.173.6.1992-1996.1991, 2-s2.0-0026035207, 2002001.
10.1128/jb.173.6.1992-1996.1991
CAS PubMed Web of Science® Google Scholar
101 Samuelson J. C., Chen M., Jiang F., Möller I., Wiedmann M., Kuhn A., Phillips G. J., and Dalbey R. E., YidC mediates membrane protein insertion in bacteria, Nature. (2000) 406, no. 6796, 637–641, https://doi.org/10.1038/35020586, 2-s2.0-0034632819.
10.1038/35020586
CAS PubMed Web of Science® Google Scholar
102 Fussenegger M., Facius D., Meier J., and Meyer T. F., A novel peptidoglycan-linked lipoprotein (ComL) that functions in natural transformation competence of Neisseria gonorrhoeae, Molecular Microbiology. (1996) 19, no. 5, 1095–1105, https://doi.org/10.1046/j.1365-2958.1996.457984.x, 2-s2.0-0029867376, 8830266.
10.1046/j.1365-2958.1996.457984.x
CAS PubMed Web of Science® Google Scholar
103 Chiu H. C., Lin T. L., and Wang J. T., Identification and characterization of an organic solvent tolerance gene in Helicobacter pylori, Helicobacter. (2007) 12, no. 1, 74–81, https://doi.org/10.1111/j.1523-5378.2007.00473.x, 2-s2.0-33846184987, 17241305.
10.1111/j.1523-5378.2007.00473.x
CAS PubMed Web of Science® Google Scholar
104 Rao V. S., Srinivas K., Sujini G. N., and Kumar G. N. S., Protein-protein interaction detection: methods and analysis, International Journal of Proteomics. (2014) 2014, 12, 147648, https://doi.org/10.1155/2014/147648, 24693427.
10.1155/2014/147648
CAS PubMed Google Scholar
105 Imam N., Alam A., Ali R., Siddiqui M. F., Ali S., Malik M. Z., and Ishrat R., In silico characterization of hypothetical proteins from Orientia tsutsugamushi str. Karp uncovers virulence genes, Heliyon. (2019) 5, no. 10, article e02734, https://doi.org/10.1016/j.heliyon.2019.e02734, 2-s2.0-85074196927.
10.1016/j.heliyon.2019.e02734
PubMed Google Scholar
106 Rahman A., Sarker M. T., Islam M. A., Hossain M. U., Hasan M., and Susmi T. F., Targeting essential hypothetical proteins of Pseudomonas aeruginosa PAO1 for mining of novel therapeutics: an in silico approach, Research Square. (2022) https://doi.org/10.21203/rs.3.rs-1650735/v1.
10.21203/rs.3.rs-1650735/v1
Google Scholar
