RESEARCH ARTICLE

Open Access

mibPOPdb: An online database for microbial biodegradation of persistent organic pollutants

Tanyaradzwa R. Ngara

orcid.org/0000-0002-8323-6291

Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China


Peiji Zeng

Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China

Corresponding Author

Houjin Zhang

Department of Biotechnology, College of Life Science and Technology, MOE KEY Laboratory of Molecular Biophysics, Huazhong University of Science and Technology, Wuhan, China

Correspondence Houjin Zhang, College of Science and Technology, Huazhong University of Science and Technology, MOE Key Laboratory of Molecular Biophysics, Wuhan 430074, China.

Email: [email protected]

First published: 17 August 2022

https://doi.org/10.1002/imt2.45

Tanyaradzwa R. Ngara and Peiji Zeng contributed equally to this study.

Abstract

Microbial biodegradation of persistent organic pollutants (POPs) is an attractive, ecofriendly, and cost-efficient clean-up technique for reclaiming POP-contaminated environments. Over the last few decades, the number of publications documenting POP-degrading microbes, enzymes, and experimental data sets has continuously increased, creating a need for a dedicated web resource that catalogs consolidated information on POP-degrading microbes and provides tools for integrative analysis of POP degradation data sets. To address this gap, we developed the Microbial Biodegradation of Persistent Organic Pollutants Database (mibPOPdb) by collecting microbial POP degradation information from the public domain and manually curating the published scientific literature. mibPOPdb currently contains 9215 microbial strain entries, 184 gene (sub)families, 100 enzymes, 48 biodegradation pathways, and 593 intermediate compounds identified in POP-biodegradation processes, together with information on 32 toxic compounds listed under the Stockholm Convention environmental treaty. Besides standard database functionality (data searching, browsing, and retrieval of entries), we provide a suite of bioinformatics services for comparative analysis of users' own data sets against mibPOPdb entries. Additionally, we built a Graph Neural Network-based prediction model for classifying the biodegradability of chemicals, which achieved balanced accuracies of 0.93 and 0.90 on the test and external validation sets, respectively. mibPOPdb is a free data-sharing platform designed to promote research on the microbial biodegradation of POPs and fills a long-standing gap in environmental protection research. Database URL: http://mibpop.genome-mining.cn/

Graphical Abstract

This study presents the Microbial Biodegradation of Persistent Organic Pollutants Database (mibPOPdb), a web-accessible, literature-based resource on the microbial biodegradation of persistent organic pollutants (POPs). We also developed a robust chemical biodegradability prediction model based on Graph Neural Networks. By providing high-quality curated information on POP-degrading microbial communities, mibPOPdb is intended as a platform for fostering studies on the microbial biodegradation of POP compounds and on how these microbes could help solve the problem of POP accumulation. In addition, the in silico model can be used to evaluate the persistence of organic chemicals, a critical task in ecological risk assessment.

Highlights

This database is the first manually curated data resource on microbial biodegradation of persistent organic pollutants (POPs).
Information on 593 intermediate compounds associated with POP-biodegradation processes was extracted from the literature.
A Graph Neural Network-based model was developed to predict the biodegradability of chemical entities, which would provide a valuable tool for the risk assessment of POPs.

INTRODUCTION

Persistent organic pollutants (POPs) are highly toxic and recalcitrant organic compounds that bioaccumulate through the food web and persist in the environment for extended periods [1]. These pollutants can be mobilized through soil, water, the atmosphere, and migratory species, which has made them globally widespread [2]. Chronic exposure to POPs has been implicated in detrimental effects on the biosphere and on human health [3]. The Stockholm Convention is a global treaty regulating POPs that entered into force in 2004 to protect human health and the environment [4]. Countries that are signatories to the Stockholm Convention have banned or severely restricted the production and use of POPs over the past two decades [5].

Despite the phasing out of most POPs-based products, there is a growing body of evidence that global climate change results in legacy POP revolatilization and remobilization from surface reservoirs (e.g., permafrost, soil, water, and ice), which act as secondary sources of POPs release into the biosphere [2, 6, 7]. In addition, changes in land use and glyphosate-induced soil erosion have reportedly resulted in the re-emergence of legacy POPs [8-11]. The resurrection of legacy POPs can potentially induce a second toxic event, which undermines global efforts to minimize human and environmental exposure to these harmful compounds [12]. As such, calls for eliminating POPs have intensified in recent years.

Employing microbial communities in the biodegradation of POPs is a relatively sustainable and ecofriendly approach to reclaiming POP-polluted environments [13-15]. Rapid advances in high-throughput multiomics techniques, molecular biology, bioinformatics, and relatively low-cost next-generation sequencing technologies have enhanced our understanding of microbial-mediated bioremediation [16]. These advances have opened up avenues for using both culture-dependent and -independent approaches in the characterization of POP-degrading microbial communities [17]. The selection of novel microbial species and catabolic genes for the bioremediation of POPs is an important research priority [18].

There are large public data sources, such as GenBank [19], KEGG [20], and UniProtKB [21], which contain huge amounts of nucleotide and protein sequence data generated from scientific studies. However, the sheer volume of data in these databases and their insufficient annotation make it arduous to retrieve microbial biodegradation data sets from tens of millions of sequences. Hence, researchers have started developing tailored databases for specific topics that enable easy and quick data searching and retrieval and provide tools for other researchers to analyze their own data. In this regard, the development of specialized repositories for data sets on the biodegradation of environmental pollutants (i.e., curated and peer-reviewed organisms, genes, degradation reactions, pathway maps, and publications) has greatly facilitated systems biology studies of bioremediation [22, 23], biodegradation research [24, 25], and modeling experiments [26-28] aimed at developing novel environmental clean-up solutions. Several databases, accessible as web resources, have been dedicated to the microbially mediated bioremediation of xenobiotic compounds over the last three decades. The EAWAG Biocatalysis/Biodegradation Database (EAWAG-BBD) is an authoritative and comprehensive data repository containing biodegradation information on almost 1400 xenobiotic compounds, over 200 pathway maps, 1500 reactions, nearly 1000 enzymes, 543 microorganism entries, and 249 biotransformation rules derived from degradation reaction information extracted from scientific publications [29]. Another web resource was MetaRouter, an integrated platform that contained data on the biochemical aspects of xenobiotic biodegradation and provided tools for querying biodegradative pathways to predict a compound's biodegradability [30]. The OxDBase web resource provides information on over 240 biodegradative oxygenases extracted from the scientific literature and databases and is helpful for studies of aromatic hydrocarbon biodegradation [31]. The Bionemo database was a comprehensive web resource, manually curated from the scientific literature, that contained sequence information for over 320 biodegradation reactions, over 130 biodegradation pathways, and more than 1107 proteins, as well as transcription regulation information on over 200 transcription units, 100 transcription factors, and 100 promoters [32].

However, several of the abovementioned tailored web resources are no longer maintained or accessible. Furthermore, research data on the microbially mediated biodegradation of POPs are scattered unsystematically across the scientific literature and public repositories, making it challenging and time-consuming for researchers to retrieve data sets relevant to their own biodegradation studies without also collecting unconnected, unrelated information. So far, no single dedicated web resource organizes information on the microbial biodegradation of POPs and provides tools for researchers to analyze their own data. Hence, a web resource of systematically reviewed microbial POP degradation information is desirable: it would allow more efficient access to POP-biodegradation data sets and support data analysis and data mining that are impractical while the experimental data remain dispersed across the literature.

The development of new chemicals plays a pivotal role in technological and scientific advancements and also presents serious health and environmental concerns. Chemical substances are screened to determine their persistence by assessing their ready biodegradability [33]. The bulk of chemical persistence/biodegradation evaluation studies currently uses animal-based assays to generate evidence-based risk profiles and develop effective risk management strategies to protect humans and the environment [34]. However, these tests are time-consuming, expensive, and problematic from an ethical perspective [35]. Regulators advocate for the use of alternative approaches, such as in silico models, which can reliably predict the ready biodegradability of chemical substances at a reduced monetary cost and time, with the potential to reduce the number of tests on animals [33].

In recent years, several quantitative structure-activity relationship (QSAR) classification models have been developed for predicting the ready biodegradability of chemical compounds [36-40]. QSAR methods correlate a compound's chemical structure (described by molecular descriptors such as functional groups and electronic, steric, thermodynamic, and geometric properties) with a target biological property of interest [41]. However, QSAR models have inherent drawbacks that limit their application. The reliability of a QSAR model and the precision of its ready biodegradability predictions depend on the feature selection applied during modeling [42]. Moreover, when QSAR models are applied to chemicals outside the applicability domain for which they were developed, additional conservatism must be incorporated and error propagation within the biodegradability classification model increases [43]. Better-suited biodegradability classification models are therefore needed, and the emergence of deep learning methods offers a route to developing them [44].

Herein, we report the development of a new Microbial Biodegradation of Persistent Organic Pollutants Database (mibPOPdb), a centralized web server of manually curated, evidence-based data sets on microbially mediated POP biodegradation retrieved from the scientific literature (Figure 1). The database includes the physicochemical properties of the POP compounds and of the intermediates of their breakdown reactions, experimentally verified POP-degrading microbes and biodegradation genes, experimental biodegradation data, and sample collection information. To our knowledge, mibPOPdb is the first web resource that systematically provides information on the microbially mediated biodegradation of POPs through a web interface that supports browsing, querying, visualizing, and downloading the POP-degradation information it contains. To overcome the limitations associated with QSAR classification models, this study also presents a tool built with Graph Neural Networks (GNNs) for predicting the biodegradability class of chemicals. The GNN-based model achieved reliable predictions in the biodegradability classification tasks and could potentially replace QSAR models in classifying chemicals for regulatory hazard and risk assessments. mibPOPdb is an open-access platform intended to assist POP-biodegradation researchers and the broader scientific community working to understand the microbial biodegradation of POP compounds, and to open new avenues for future research in POP bioremediation.

Figure 1

General overview of the mibPOPdb construction, content, and web interface. mibPOPdb data were collected from published literature, biological databases, and curated scientific databases, and manually curated into three main categories. The user-friendly online interface supports data querying, browsing, uploading new data sets, and downloading the various information deposited in mibPOPdb. BBD, Biocatalysis/Biodegradation Database; CAS, Chemical Abstracts Service; DB, database; mibPOPdb, Microbial Biodegradation of Persistent Organic Pollutants Database; POP, persistent organic pollutant.

RESULTS

Compounds regulated by the Stockholm Convention

The chemicals regulated under the Stockholm Convention are aldrin, chlordane, dicofol, 1,1,1-trichloro-2,2-bis(4-chlorophenyl)ethane (DDT), dieldrin, endrin, hexachlorobenzene, heptachlor, mirex, toxaphene, perfluorooctanoic acid (PFOA), its salts and PFOA-related compounds, polychlorinated biphenyls (PCBs), polychlorinated dibenzo-p-dioxins (PCDDs), polychlorinated dibenzofurans (PCDFs), α-hexachlorocyclohexane, β-hexachlorocyclohexane, chlordecone, hexabromobiphenyl, hexabromocyclododecane, hexabromodiphenyl ether, heptabromodiphenyl ether, lindane, pentachlorobenzene, pentachlorophenol and its salts and esters, perfluorooctane sulfonic acid, its salts and perfluorooctane sulfonyl fluoride, polychlorinated naphthalenes (PCNs), technical endosulfan and its related isomers, tetrabromodiphenyl ether, pentabromodiphenyl ether, commercial decabromodiphenyl ether (c-decaBDE), short-chain chlorinated paraffins, and hexachlorobutadiene.

Literature search results

Study screening and selection procedures are illustrated in Supporting Information Figure S1. The initial literature search yielded 7159 references. A total of 2486 duplicate records were detected and removed. The remaining 4673 references were eligible to be screened on the basis of the title or abstract. From these, 1986 references were considered to be ineligible and were removed. The full texts for the remaining 2687 references were manually screened, and 1623 references were excluded as they did not meet our eligibility criteria. The remaining 1064 studies were included for data extraction.
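The screening counts above form a simple subtractive chain, so each stage can be checked against the previous one. The short Python sketch below is illustrative only (the variable and function names are ours, not the authors') and recomputes every stage from the reported numbers:

```python
# Record counts reported for each stage of the literature screening (Figure S1).
identified = 7159
duplicates_removed = 2486
excluded_title_abstract = 1986
excluded_full_text = 1623


def screening_flow(identified, duplicates, excluded_ta, excluded_ft):
    """Return the number of records remaining after each screening stage."""
    after_dedup = identified - duplicates              # screened on title/abstract
    full_text_screened = after_dedup - excluded_ta     # screened on full text
    included = full_text_screened - excluded_ft        # used for data extraction
    return after_dedup, full_text_screened, included


after_dedup, full_text_screened, included = screening_flow(
    identified, duplicates_removed, excluded_title_abstract, excluded_full_text
)
print(after_dedup, full_text_screened, included)  # 4673 2687 1064
```

All three recomputed counts match the figures stated in the text.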

Data content and statistics

The current version of mibPOPdb contains 9215 microbial strain entries annotated from 1064 peer-reviewed articles, 184 gene (sub)families, 100 enzymes, 48 biodegradation pathways, 593 intermediate compounds identified in the biodegradation of POPs, and information on 32 toxic compounds currently targeted by the Stockholm Convention on Persistent Organic Pollutants. Some of the chemicals listed under the Stockholm Convention, such as PCBs, polybrominated diphenyl ethers (PBDEs), PCNs, PCDDs, and PCDFs, are not single compounds but occur as complex mixtures of congeners. Although there are hundreds of PCB, PBDE, PCN, PCDD, and PCDF congeners, only a small group of them exhibit high toxic potential [5]. The chlorination/bromination pattern of these compounds determines their level of toxicity [45], and congeners with a coplanar structure exhibit the greatest toxicity based on combined health effects considerations [46]. For each group of congeners, the most studied and most toxic congener was therefore selected as a model compound to study the biodegradation of that group. For example, 2,3,7,8-tetrachlorodibenzo-p-dioxin, the most studied and most toxic PCDD, was selected as the model compound for the biodegradation of PCDDs.

More than 90% of the microbial data set consists of bacterial entries, followed in order of abundance by fungal, algal, and archaeal entries (Figure 2A). The POP-biodegradation functional gene data set in mibPOPdb comprises 5736 microbial strain entries, of which 5706 bacterial strains and 30 fungal strains account for 99.48% and 0.52%, respectively (Figure 2B). mibPOPdb contains 184 POP-related biodegradation gene (sub)families covering 22 of the 32 POP compounds currently listed under the Stockholm Convention (see Supplementary File: mibPOPdb data.xlsx); for the remaining 10 POP compounds, no genes from any organism have been linked to their degradation (Supporting Information Table S1). Strain-level information in mibPOPdb was assigned by manually annotating species descriptions in the primary literature based on small subunit ribosomal RNA (16S/18S rRNA) and genome sequence information. mibPOPdb has information on 3479 microbial strain entries collected from the scientific literature, of which 2680 bacterial strain entries account for 77.03%, and fungi, algae, and archaea account for 17.79%, 2.93%, and 2.24%, respectively (Figure 2C).
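The percentages above follow directly from the stated counts. As a hedged illustration, the sketch below recomputes them. Note that the two strain data sets (5736 + 3479) sum to 9215, the total strain count reported earlier; treating the total as this sum is our assumption, not a statement from the authors:

```python
# Counts stated in the text (Figure 2B and 2C).
gene_dataset = {"bacteria": 5706, "fungi": 30}
strain_total = 3479
strain_bacteria = 2680


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole` as a percentage, rounded to two decimals."""
    return round(100 * part / whole, 2)


gene_total = sum(gene_dataset.values())
combined = gene_total + strain_total  # assumption: total entries = both data sets

print(gene_total, combined)                           # 5736 9215
print(percent(gene_dataset["bacteria"], gene_total))  # 99.48
print(percent(gene_dataset["fungi"], gene_total))     # 0.52
print(percent(strain_bacteria, strain_total))         # 77.03
# Overall bacterial share, consistent with "more than 90%" (Figure 2A).
print(percent(gene_dataset["bacteria"] + strain_bacteria, combined))  # 91.0
```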

Prediction model training setting and performance analysis

The statistical parameters used to evaluate the classification performance of the five models were sensitivity (Sn), specificity (Sp), balanced accuracy (BA), and error rate (ER). Because the models are two-class classifiers (readily biodegradable [RB] vs. not readily biodegradable [NRB]), the specificity of one class equals the sensitivity of the other. All of these parameters are taken into account when evaluating a model's predictive ability because together they describe its behavior, for example, how well it avoids classifying RB molecules as NRB (which requires high Sn, i.e., high NRB specificity) or NRB molecules as RB (which requires high Sp, i.e., high NRB sensitivity) [47].
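The four metrics can be computed from a two-class confusion matrix. The sketch below follows the convention in the notes to Tables 1 and 2 (Sn measured on RB molecules, Sp on NRB molecules, ER = 1 - BA). The class and function names are hypothetical and the counts are invented for illustration; they are not taken from the study. Treating NRB as the positive class instead simply swaps Sn and Sp, which is why NRB specificity equals RB sensitivity.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    rb_as_rb: int    # readily biodegradable molecules predicted RB
    rb_as_nrb: int   # RB molecules misclassified as NRB
    nrb_as_nrb: int  # not readily biodegradable molecules predicted NRB
    nrb_as_rb: int   # NRB molecules misclassified as RB


def classification_metrics(c: Confusion) -> dict:
    """Sn, Sp, BA, and ER for a two-class RB/NRB classifier."""
    sn = c.rb_as_rb / (c.rb_as_rb + c.rb_as_nrb)      # sensitivity: recall on RB
    sp = c.nrb_as_nrb / (c.nrb_as_nrb + c.nrb_as_rb)  # specificity: recall on NRB
    ba = (sn + sp) / 2                                 # balanced accuracy
    return {"Sn": sn, "Sp": sp, "BA": ba, "ER": 1 - ba}


# Hypothetical counts, for illustration only.
m = classification_metrics(Confusion(rb_as_rb=90, rb_as_nrb=10, nrb_as_nrb=192, nrb_as_rb=8))
print({k: round(v, 2) for k, v in m.items()})
# {'Sn': 0.9, 'Sp': 0.96, 'BA': 0.93, 'ER': 0.07}
```

Balanced accuracy is used rather than plain accuracy because it weights both classes equally, which matters when RB and NRB molecules are not equally represented.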

Table 1 summarizes the classification performance of the five GNN models in fivefold cross-validation on the final test set and on the external validation set, reporting Sn, Sp, BA, and ER for each model. All five models displayed high average performance, with BA values of 0.89–0.93 on the final test set and 0.88–0.90 on the external validation set. Model 4 exhibited the highest average performance on both sets, with BA values of 0.93 and 0.90, respectively, and the lowest ER among the five models (0.07 and 0.10, respectively). All models showed the same trend: specificity (Sp) was higher than sensitivity (Sn), implying that they classify NRB molecules more reliably than RB molecules. The Sn, Sp, BA, and ER values obtained on the test and external validation sets were comparable across the five models, demonstrating the reliability and robustness of the proposed classification models. Of the five models, model 4 achieved the best sensitivity on both sets and the best specificity on the test set (model 5 had a marginally higher external specificity, 0.96 vs. 0.95), and it was therefore used to predict the biodegradability of chemical molecules.

Table 1. Classification performance for fivefold cross-validation on the final test set and the external validation set

Model	Final test data set				External validation data set
	BA	Sn	Sp	ER	BA	Sn	Sp	ER
Model 1	0.91	0.87	0.95	0.09	0.88	0.82	0.94	0.12
Model 2	0.91	0.87	0.95	0.09	0.88	0.81	0.94	0.12
Model 3	0.89	0.85	0.93	0.11	0.88	0.82	0.93	0.12
Model 4	0.93	0.90	0.96	0.07	0.90	0.85	0.95	0.10
Model 5	0.91	0.87	0.95	0.09	0.89	0.81	0.96	0.11

Note: For each model, sensitivity (Sn, fraction of readily biodegradable [RB] molecules correctly predicted), specificity (Sp, fraction of not readily biodegradable [NRB] molecules correctly predicted), balanced accuracy (BA, the average of Sn and Sp), and error rate (ER = 1 - BA) are provided.
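The BA and ER columns of Table 1 can be recomputed from the Sn and Sp columns. The sketch below (helper names are hypothetical) ranks the models by test-set BA, using external BA as a tie-breaker. Because the tabulated Sn and Sp are already rounded, a recomputed BA can differ from the reported value by 0.01 (e.g., model 5, external: (0.81 + 0.96)/2 = 0.885). The sketch also identifies which model has the highest external-validation specificity.

```python
# Values from Table 1: (test Sn, test Sp, external Sn, external Sp).
TABLE1 = {
    "Model 1": (0.87, 0.95, 0.82, 0.94),
    "Model 2": (0.87, 0.95, 0.81, 0.94),
    "Model 3": (0.85, 0.93, 0.82, 0.93),
    "Model 4": (0.90, 0.96, 0.85, 0.95),
    "Model 5": (0.87, 0.95, 0.81, 0.96),
}


def balanced_accuracy(sn: float, sp: float) -> float:
    return (sn + sp) / 2


def error_rate(sn: float, sp: float) -> float:
    return 1 - balanced_accuracy(sn, sp)


def rank_by_ba(table: dict) -> list:
    """Sort model names by test BA, breaking ties on external BA (best first)."""
    return sorted(
        table,
        key=lambda name: (balanced_accuracy(*table[name][:2]),
                          balanced_accuracy(*table[name][2:])),
        reverse=True,
    )


ranking = rank_by_ba(TABLE1)
best = ranking[0]
best_ext_sp = max(TABLE1, key=lambda name: TABLE1[name][3])
print(best, round(balanced_accuracy(*TABLE1[best][:2]), 2),
      round(error_rate(*TABLE1[best][:2]), 2))  # Model 4 0.93 0.07
print(best_ext_sp)  # Model 5
```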

The GNN model presented in this study was compared with the QSAR models reported by Mansouri et al. [40]. The results for the final test and external validation data sets in Table 2 show that the GNN-based model offers a moderate improvement in classification performance over these published QSAR models. The GNN model had a slightly higher average performance, with BA values of 0.93 and 0.90 for the test and external validation sets, respectively, compared with 0.91 and 0.87 for the best QSAR model (Consensus 2). Among the QSAR models, the partial least squares discriminant analysis (PLSDA) model had the lowest balanced accuracy on the test set (0.85), and the support vector machines (SVM) model had the lowest on the external validation set (0.82). The GNN-based model also had the lowest ERs on both the test and external validation data sets (0.07 and 0.10, respectively). This indicates that the GNN model can accurately predict the biodegradability of molecules. Like the QSAR models, the GNN model shows higher specificity than sensitivity on both the final test and external validation sets, meaning that it classifies NRB molecules more accurately than RB molecules.

Table 2. Comparison of classification performance between GNN model and previously published QSAR models estimated on the same biodegradability experimental data set

Model	Final test data set				External validation data set
	BA	Sn	Sp	ER	BA	Sn	Sp	ER
kNN	0.86	0.81	0.90	0.14	0.83	0.75	0.91	0.17
SVM	0.87	0.82	0.91	0.13	0.82	0.74	0.91	0.18
Consensus 1	0.87	0.82	0.92	0.13	0.83	0.76	0.91	0.17
Consensus 2	0.91	0.88	0.94	0.09	0.87	0.81	0.94	0.13
PLSDA	0.85	0.83	0.87	0.15	0.83	0.80	0.86	0.17
GNN	0.93	0.90	0.96	0.07	0.90	0.85	0.95	0.10

Note: For each model, sensitivity (Sn, fraction of readily biodegradable [RB] molecules correctly predicted), specificity (Sp, fraction of not readily biodegradable [NRB] molecules correctly predicted), balanced accuracy (BA, the average of Sn and Sp), and error rate (ER = 1 - BA) are provided.
Abbreviations: GNN, Graph Neural Network; kNN, k-nearest neighbor; PLSDA, partial least squares discriminant analysis; QSAR, quantitative structure-activity relationship; SVM, support vector machines.
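To quantify how "moderate" the improvement is, the BA margins of the GNN over each QSAR model in Table 2 can be computed directly. The sketch below is illustrative (function and variable names are ours) and uses only the tabulated BA values:

```python
# Balanced accuracy (test, external) for the QSAR models in Table 2.
BA_QSAR = {
    "kNN": (0.86, 0.83),
    "SVM": (0.87, 0.82),
    "Consensus 1": (0.87, 0.83),
    "Consensus 2": (0.91, 0.87),
    "PLSDA": (0.85, 0.83),
}
BA_GNN = (0.93, 0.90)


def margins(reference: tuple, others: dict) -> dict:
    """BA gain of `reference` over each competitor, rounded to two decimals."""
    return {
        name: (round(reference[0] - test, 2), round(reference[1] - ext, 2))
        for name, (test, ext) in others.items()
    }


gains = margins(BA_GNN, BA_QSAR)
# The strongest competitor is the one the GNN beats by the smallest test-set margin.
strongest_qsar = min(gains, key=lambda name: gains[name][0])
print(strongest_qsar, gains[strongest_qsar])  # Consensus 2 (0.02, 0.03)
```

Against the strongest QSAR model the gain is a few hundredths of BA, while against the weakest it is up to 0.08 on the test set.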

Data access and usage

Web interface and data browsing

The mibPOPdb database is freely accessible through a website (http://mibpop.genome-mining.cn) and offers researchers access to information on the microbially mediated biodegradation of POPs. Its user-friendly interface provides tools for browsing, querying, and exploring detailed information on microbial degradation of POPs and for downloading all data, as well as a series of online bioinformatics services and a chemical biodegradability prediction tool (see Figure 1).

The homepage interface is simple (Figure 3A). From the home page, users can quickly retrieve microbial biodegradation data sets for a specific POP compound by choosing the compound from the dropdown menu and then selecting the type of microorganism for which data sets should be shown (Figure 3B).

The query returns the organisms capable of degrading that POP compound (Figure 3C). The “Tools” page contains the chemical biodegradability prediction tool. The “About Us” and “Help” pages explain how to navigate the database. The “Search sequence” page provides a suite of online bioinformatics services for running comparative sequence analyses within the framework of the database.

The “Browse” page has six subpages: compounds, organisms, biodegradation genes, intermediate profiles, degradation pathway maps, and enzymes. Each subpage provides the following information:

“Compounds” page: basic annotations such as POP compound name, listing year, Chemical Abstracts Service (CAS) number, European Community number, and DSSTox substance ID. Clicking the “POP details” button opens a report card page with detailed information on that compound: its listing information, general description, structural analogs, and publication information, so that users can backtrack to the literature reporting on that compound. Links are also provided to external resources that display information on the compound, such as ChemSpider, DSSTox, PubChem, and the European Chemicals Agency (ECHA) (Supporting Information Figure S2).

“Organisms” page: type of organism, strain ID, nucleotide sequence accession ID, compound degraded, and country where the microbial sample was collected. Clicking the strain detail button opens a report card page containing the strain's general information, the location where the original environmental samples were collected, the POP compound metabolized, bioremediation information, and references to the scientific literature (Supporting Information Figure S3).

“Biodegradation genes” page: type of organism, strain ID, encoding gene, protein sequence accession ID, and compound degraded. The detailed information page for each biodegradation gene provides the compound metabolized, the geographical location where the sample was collected, a link to the scientific literature, and degradation gene information such as the encoding gene, enzyme name, UniProt ID, and sequence accession IDs (Supporting Information Figure S4).

“Intermediate profiles” page: POP degraded, intermediate compound, and CAS number of the compound. Clicking the intermediate detail link opens a report card containing the POP degraded, the POP degradation pathway, the intermediate compound identified, and its physicochemical properties, with PubChem, KEGG, and ChemSpider IDs provided as external links (Supporting Information Figure S5).

“Pathway maps” page: a dropdown menu of POP degradation pathways, constructed and drawn from literature reports of experimentally verified microbial POP degradation. The report card for each biodegradation pathway displays the compound degraded, a general description of the POP compound, a graphical display of its biodegradation routes, and literature citations so that users can trace the original research reporting these routes (Supporting Information Figure S6).

“Enzymes” page: a general description of the enzyme's function, enzyme class, enzyme classification number, enzyme name and synonyms, associated degradation pathway(s), reactions catalyzed by the enzyme, encoding gene and gene clusters, external links to BRENDA, KEGG, ExPASy, and the Enzyme Database, microorganism information, links to GenBank, protein ID, and UniProtKB, and literature citations (Supporting Information Figure S7).

Moreover, when browsing entries on the compounds, organisms, biodegradation genes, and intermediate profile pages, users can use interactive free-text filter tabs to display only the entries that meet their own criteria, which helps them focus on specific information and analyze data efficiently. In addition, an interactive navigation bar at the bottom of each detailed information page lets users jump between sections of the page with a single click.

Data query

Using the search bar in the upper left corner of each webpage, users can search the database for POP compounds, intermediate compounds, POP degradation genes, or POP-degrading microbes of interest. Free-text and predictive searching are supported, making information retrieval in mibPOPdb simpler and faster. Users can query mibPOPdb through four paths: “Search by the compound name,” “Search by CAS ID,” “Search by protein and nucleotide sequence accession number,” and “Search by compound degraded.” As the user types the first few characters of a search term, suggestions based on data in mibPOPdb are displayed in the format “value|data field|data table.” Here, value is a mibPOPdb entry that shares the user's prefix, data field is the field in which that value is stored, and data table is the domain of the database on which the search can focus. For example, in “Heptachlor|compound name|compound,” Heptachlor is a compound name, and users can search for its information in the compound domain (Figure 4A,B). In addition, the “Search sequence” page of mibPOPdb provides a series of bioinformatics utilities for sequence analysis, including BLAST, Clustal Omega, and Phylotree modules (Supporting Information Figure S8). The BLAST sequence similarity search can be used to find POP-degrading microbes or their homologs, and the Phylotree.js module supports studies of the evolutionary relationships between the user's query sequence and the sequences stored locally in mibPOPdb.
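The prefix-based suggestion format can be sketched as follows. This is a hypothetical illustration of the “value|data field|data table” behavior described above, not mibPOPdb's actual implementation; the toy index, its field and table names, and the case-insensitive matching are all assumptions.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    value: str   # the stored data value
    field: str   # the data field holding it
    table: str   # the data table (search domain)


# Toy index for illustration; real mibPOPdb tables and fields may differ.
INDEX = [
    Entry("Heptachlor", "compound name", "compound"),
    Entry("Hexachlorobenzene", "compound name", "compound"),
    Entry("Heptachlor", "compound degraded", "organism"),
    Entry("Lindane", "compound name", "compound"),
]


def suggest(prefix: str, index=INDEX, limit: int = 10) -> list[str]:
    """Return 'value|data field|data table' strings whose value starts with prefix."""
    p = prefix.strip().lower()
    if not p:
        return []  # no suggestions until the user types something
    hits = [e for e in index if e.value.lower().startswith(p)]
    return [f"{e.value}|{e.field}|{e.table}" for e in hits[:limit]]


print(suggest("hept"))
# ['Heptachlor|compound name|compound', 'Heptachlor|compound degraded|organism']
```

Keeping the field and table in each suggestion lets the same value (here, Heptachlor) point the user to different search domains.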

mibPOPdb also provides a tool for predicting the ready biodegradability of chemical compounds. Users can input the SMILES string of any single chemical compound, including compounds not covered by the Stockholm Convention list, to determine its ready biodegradability. The prediction tool applies only to organic compounds. It also cannot be used for complex mixtures or substances with undefined structures, such as the pentabrominated diphenyl ether technical mixture (DE-71), which has no SMILES string assigned to it. Because SMILES strings cannot be obtained for mixtures, users should submit the SMILES of a specific chemical compound. Example biodegradability predictions for amoxicillin, musk xylene, heptachlor, and phenol are illustrated in Figure 5.
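The input restrictions above (a single organic compound with a defined structure) could be screened before a SMILES string reaches the model. The sketch below is a hypothetical, standard-library-only heuristic and not the mibPOPdb implementation: it rejects SMILES containing a `.` (the SMILES separator for disconnected components, so salts are rejected too) and SMILES without any carbon atom. It does not check SMILES syntax; a production service would more likely parse the string with RDKit.

```python
from __future__ import annotations

import re

# Atom tokens: bracket atoms, the two-letter organic-subset symbols Cl and Br,
# then single-letter organic-subset atoms (aliphatic upper case, aromatic lower case).
ATOM_TOKEN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")
# Element symbol inside a bracket atom, after an optional isotope number.
BRACKET_ELEMENT = re.compile(r"\[\d*([A-Z][a-z]?|[a-z]{1,2})")


def element_of(token: str) -> str:
    """Map an atom token to a capitalized element symbol ('c' -> 'C', '[Cl-]' -> 'Cl')."""
    if token.startswith("["):
        match = BRACKET_ELEMENT.match(token)
        return match.group(1).capitalize() if match else ""
    return token.capitalize()


def screen_smiles(smiles: str) -> tuple[bool, str]:
    """Heuristic pre-check: one connected component containing at least one carbon atom."""
    s = smiles.strip()
    if not s:
        return False, "empty input"
    if "." in s:
        return False, "several disconnected components (mixture or salt)"
    elements = {element_of(tok) for tok in ATOM_TOKEN.findall(s)}
    if "C" not in elements:
        return False, "no carbon atom (outside the organic domain)"
    return True, "ok"


print(screen_smiles("Oc1ccccc1"))  # phenol
```

Tokenizing atoms, rather than searching for the letter "C", keeps chlorine (`Cl`) from being mistaken for carbon.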

The output shows an image of the compound's structure and the predicted biodegradation probability as a percentage. As mentioned earlier, following the Japanese Ministry of International Trade and Industry (MITI-I) biodegradability screening test [48], a compound is described as RB if its predicted ready biodegradability value is equal to or greater than 60%; if the value is less than 60%, the compound is classified as NRB. Antibiotics such as amoxicillin are poorly biodegradable or nonbiodegradable, and in this study amoxicillin had a predicted nonready biodegradability value of 98.7% (Figure 5A). Musk xylene, a well-known nonreadily biodegradable compound [49], was predicted to be 100% nonready biodegradable (Figure 5B). Heptachlor, a POP compound, is NRB [50], and its predicted nonready biodegradability value was 99.7% (Figure 5C). Phenol is a benchmark chemical in screening tests and is used as a readily biodegradable reference standard [49]; its predicted ready biodegradability value was 75.2% (Figure 5D).

Data visualization

Interactive data visualizations help users probe and understand the POP-degradation data in mibPOPdb. The interactive pie chart maps display statistics on microbial POP-biodegradation studies according to where the original environmental samples were collected. The map also shows users the continents and areas where microbial POP-degradation data sets are still missing. Clicking on any data point on the map gives immediate access to the POP-biodegradation data sets associated with that geographical location. Hence, the data visualization page offers an ideal starting point for exploring the data in mibPOPdb.

Additional functionalities of the database

Submission of data: mibPOPdb incorporates an interactive data submission feature that encourages contributors to complement the web resource by submitting newly published data sets on the microbial degradation of POPs, following the submission guidelines. During the validation stage, the curator may contact the researchers who submitted novel data sets to review any potentially missing data. Validated submissions are then uploaded and integrated into the database.

Downloading data: All data sets used in constructing the mibPOPdb can be freely accessed on the download page. Users are strongly encouraged to cite the original works when redistributing or reproducing the data sets.

DISCUSSION

Although specialized web resources dedicated to the microbial biodegradation of xenobiotic compounds have been established over the last 30 years, several of them have gone offline or are no longer maintained. In addition, systematic collections of validated POP-degrading microbes are rare. Successful microbial biodegradation of POPs hinges on characterizing microbial communities from diverse environments to investigate their ecology and biodiversity and to determine their POP bioremediation capabilities [51]. Thus, establishing an integrated data repository that contains genomic, proteomic, and biodegradation experimental data for POP-degrading microbes is an important development.

We built mibPOPdb, a manually curated and open-access data resource that provides information on experimentally validated POP-degrading microbial communities. The mibPOPdb database contains information on POP compounds listed under the Stockholm Convention, POP-degradation associated functional genes and strains, intermediate compounds of the degradation processes, and results of POP-biodegradation experimental studies.

Bacteria have been the focus of bioremediation studies; however, archaea and eukaryotes have recently been shown to play an important role in the biodegradation of xenobiotic compounds [52, 53]. mibPOPdb, an integrated web resource displaying information on a wide variety of POP-degrading algal, archaeal, bacterial, and fungal strains, can therefore facilitate the development of novel POP bioremediation approaches that draw on the abilities of underutilized microbial domains. The role of archaea in the degradation of POPs currently remains unclear. Studies have revealed that methanogens play a key role in the anaerobic biodegradation of chlorinated pollutants through cometabolic processes [54]. Reductive dechlorination of POPs by fermentative bacteria under methanogenic conditions is accompanied by the production of acetate or hydrogen as waste products [55]. Methanogens can lower the concentrations of acetate and hydrogen, which makes the anaerobic degradation processes more thermodynamically favorable and drives them forward [54]. In addition, organohalide-respiring bacteria cannot synthesize corrinoids, enzyme cofactors that are essential for the function of reductive dehalogenase systems [56]. In these anaerobic environments, archaea may provide the key corrinoids for these dechlorinators [57].

By incorporating manually curated information on POP-degrading microorganisms beyond the typical bacterial candidates, mibPOPdb can lay the foundation for further studies on the discovery of novel POP-degrading enzymes and the development of new POP bioremediation strategies. In addition, mibPOPdb provides experimentally validated data sets on the intermediary metabolites and end-products produced during the biodegradation of POP compounds. This information is vital for elucidating POP degradation pathways [58].

The sharing, availability, and reuse of POP-biodegradation experimental data can help accelerate POP-biodegradation research [59]. Although experimentally validated POP-biodegradation results exist, they are usually scattered across the scientific literature and are hard to mine, which limits their visibility and availability in POP degradation research [17]. mibPOPdb aims to fill this gap by integrating the manually curated experimental details into a single data repository, making these data sets more accessible. The development of novel technologies for efficiently removing POPs and the intermediary metabolites of their breakdown from the environment depends on consolidating such data sets.

Computational approaches are becoming increasingly important for predicting and evaluating the ready biodegradability of chemical substances [36]. Several QSAR classification models are used to predict the ready biodegradability of chemicals; however, their complex implementation limits their practical use [60]. GNNs have been used successfully in various fields that process graph-structured data, for example, in predicting molecular activity and properties, synthesis, and interactions [61]. In molecular graph theory, a molecular structure can be interpreted as a chemical graph in which the atoms and bonds are mapped to sets of nodes and edges, respectively [62]. This representation is a useful input for graph-based studies because it allows molecular structures to be processed mathematically [61]. In a GNN, features are extracted automatically from the raw inputs, whereas QSAR classification models are subject to some degree of bias because the selected handcrafted features or predefined descriptors might leave out important structural information [63].

The GNN model developed in this study exploited the atom features of the molecular graphs to predict the ready biodegradability of a chemical compound and achieved a higher overall classification performance than the QSAR models reported in the published literature. In addition, the GNN model displayed stable classification performance because it does not rely on predefined molecular fingerprints, unlike traditional machine learning models, which require complex feature selection and are difficult to interpret [63]. Understanding the structure of a chemical is important in chemistry [64]. RDKit is used to render the structures of the query compounds as images, giving scientists a visual representation of each compound's structural formula. In cheminformatics, chemical images have also been combined with deep learning algorithms to predict chemical toxicity without employing any chemical descriptors [65].

CONCLUSIONS

This study presents mibPOPdb, a manually curated data repository focusing on information on the microbial degradation of POPs extracted from the scientific literature. One limitation of this study is that only English-language reports were included. The mibPOPdb portal also incorporates a GNN-based prediction module that can be used to assess the ready biodegradability of chemical substances. mibPOPdb is an ideal central point for scientists seeking specialized information on the microbial degradation of POPs, and it can foster research into POP-degrading microbial communities and the development of efficient POP bioremediation strategies.

METHODS

General approach to database construction

The mibPOPdb database was constructed as follows. Briefly, specific literature search strategies were developed, using broad search terms derived from the main concepts of this work, to collect validated data on the microbial biodegradation of POPs. Publicly accessible scientific literature databases were then searched to retrieve all relevant articles reporting experimentally proven microbial POP degradation. Eligible studies were identified by manually screening the titles and abstracts of this article pool. Studies that were clearly unrelated to the microbial degradation of the POP compounds listed under the Stockholm Convention, or of the intermediate compounds formed in their degradation pathways, were removed at this stage. The full texts of the remaining studies were then obtained and manually screened using the inclusion and exclusion criteria (Supporting Information Table S2) to ensure the high quality of our data and to restrict the data set to primary literature. This was followed by manual curation of experimentally supported events. Lastly, the chemical biodegradability module and the website services were implemented. A graphical summary of the mibPOPdb database content and construction is outlined in Figure 1.

Data collection and processing

To find validated data on the microbial degradation of POPs, the Web of Science, Google Scholar, ScienceDirect, and PubMed literature databases were systematically searched. The literature searches combined Medical Subject Headings (MeSH) terms with a list of keywords, such as microbial degradation, biometabolism, biomineralization, biotransformation, decomposition, microbial bioremediation, catabolism, the specific name of every compound in the Stockholm Convention list, and the specific names of the intermediate metabolites identified in the breakdown pathways. The search strategy employed for the PubMed database is shown in Supporting Information Table S3 (see also Supplementary File: mibPOPdb data.xlsx for the list of POP compounds listed under the Stockholm Convention and the intermediate compounds identified during their biodegradation). The search retrieved over 7000 indexed citations, which were imported into EndNote (version X8.1). All records were manually screened to ensure that each had a title; where the Title field was empty, the title was added manually after checking the reference. Duplicates in the EndNote library were removed automatically in a series of steps, each changing the fields compared by EndNote; the field preferences set for this deduplication process are shown in Supporting Information Table S4. Next, the references were visually scanned, and duplicates that had not been detected automatically were removed manually by comparing article titles. The articles were then vetted for inclusion in this study by manually screening their titles and abstracts. At this stage, studies that were clearly unrelated to the microbial degradation of POPs listed under the Stockholm Convention and of the intermediate compounds of their degradation, as well as articles not published in English, were excluded.
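The final, title-based duplicate check was done by hand; a sketch of how such a comparison could be automated is shown below. It assumes records are plain dictionaries with a `title` key (a hypothetical layout, not EndNote's export format) and treats two titles as duplicates when they match after lower-casing, removing accents, and collapsing punctuation and whitespace.

```python
from __future__ import annotations

import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lower-case, drop accents, and collapse punctuation/whitespace runs to single spaces."""
    decomposed = unicodedata.normalize("NFKD", title)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", no_accents.lower()).strip()


def deduplicate_by_title(records: list[dict]) -> list[dict]:
    """Keep the first record for each normalized title.

    Records without a usable title are always kept so they can be reviewed by hand.
    """
    seen: set[str] = set()
    kept: list[dict] = []
    for record in records:
        key = normalize_title(record.get("title", ""))
        if key and key in seen:
            continue
        if key:
            seen.add(key)
        kept.append(record)
    return kept


records = [
    {"id": 1, "title": "Aerobic Bacterial Transformation and Biodegradation of Dioxins: A Review"},
    {"id": 2, "title": "aerobic bacterial transformation and biodegradation of dioxins - a review."},
    {"id": 3, "title": "Microbial Dehalogenation of Organohalides in Marine and Estuarine Environments"},
    {"id": 4, "title": ""},
]
print([r["id"] for r in deduplicate_by_title(records)])  # [1, 3, 4]
```

Exact matching on normalized titles is deliberately conservative; near-duplicates with reworded titles would still need a manual look.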

Next, the full texts of the studies were retrieved and manually screened using the inclusion/exclusion criteria. At this stage, the following were excluded: systematic review articles, editorials, conference abstracts, and letters from which primary data could not be extracted; publications that reported the degradation of POP compounds but did not identify the microbial species responsible; and studies that reported the improvement or enhancement of POP-biodegradation efficiency using microbial species already identified in other degradation studies of the same POP compound. The full texts of the primary literature were manually reviewed to ensure the high quality of our data. Two researchers (Tanyaradzwa R. Ngara and Peiji Zeng) manually extracted the data sets on microbial degradation of POPs from the eligible scientific literature and resolved disagreements by consensus through discussion and by consulting a third reviewer (Houjin Zhang). Three types of data were extracted from the eligible papers: (i) functional genes encoding enzymes involved in the biodegradation of POPs, (ii) the POP compound degraded and the intermediate compounds of the biodegradation reaction, and (iii) microbial species experimentally validated to biodegrade POPs.

POP-degrading organism resources

Two independent investigators (Tanyaradzwa R. Ngara and Peiji Zeng) extracted all relevant data from each included study reporting experimentally proven microbial POP degradation. The target data fields used to capture the data extracted from the included studies are shown in Supporting Information Tables S5 and S6. All microbial entries in mibPOPdb are categorized into two groups, organisms and biodegradation genes, depending on whether the POP-degrading microbe was identified by a phylogenetic marker or by a functional gene marker. To build a more authoritative knowledge base for the microbial degradation of POPs, the manually curated POP-degrading microbial information taken from the literature is, where appropriate, linked to external resources such as AlgaeBase [66], BRENDA [67], EAWAG-BBD [29], GenBank [19], KEGG [20], and UniProtKB [21].

Collection of POP compound information and intermediate compounds formed in the breakdown pathway

Data on the toxic chemicals listed under the Stockholm Convention and on the intermediate compounds formed during the metabolism of these xenobiotic compounds were retrieved from the scientific literature. For each POP compound, data such as the compound name, Stockholm annex code, year of the listing decision, physicochemical properties, structure, and structural analogs (i.e., chemicals with the highest structural similarity to the toxic parent compound listed under the Stockholm Convention) were collected. Scientific literature descriptions were retrieved from PubMed. The information was categorized into four broad sections: POP listing information, Compound description, Structural analogs, and Publications. For each POP compound record in our data set, its PubChem, KEGG, ChEMBL, DSSTox Substance, CAS number, and ECHA identifiers are linked to the corresponding compound records in those external databases. The data fields for POP compounds are shown in Supporting Information Table S7. In addition, the two investigators extracted data on the intermediate compounds identified in the scientific literature during the metabolism of the xenobiotic compounds covered by the Stockholm Convention. For each eligible intermediate compound entry in mibPOPdb, the following basic characteristics were extracted: intermediate name, POP degraded, POP degradation pathway, SMILES string, CAS number, PubChem ID, KEGG ID, and ChemSpider ID. The data fields for the intermediate compound entries are shown in Supporting Information Table S8.
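The text does not state how structural similarity was measured when selecting analogs. One common choice is the Tanimoto coefficient between binary fingerprints, sketched below under that assumption. Fingerprints are represented as sets of on-bit indices; the toy bit sets and compound names are invented for illustration, and in practice the fingerprints would come from a cheminformatics toolkit.

```python
from __future__ import annotations


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| of two fingerprints given as sets of on-bits."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def top_analogs(query: set[int], library: dict[str, set[int]], k: int = 3) -> list[tuple[str, float]]:
    """Rank library compounds by similarity to the query (ties broken by name)."""
    scored = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


# Toy fingerprints; compound names are placeholders.
query = {1, 2, 3, 4}
library = {"analog_A": {1, 2, 3, 4}, "analog_B": {1, 2, 3, 5}, "unrelated_C": {7, 8}}
print(top_analogs(query, library, k=2))  # [('analog_A', 1.0), ('analog_B', 0.6)]
```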

Biodegradability experimental data set used to evaluate the GNN model's performance

The data set used to train the model and validate its performance was published by Mansouri et al. [40] and comprises 1725 chemicals, each belonging to one of two categories, RB or NRB. It is divided into training, testing, and external validation subsets (Supporting Information Figure S9). SMILES strings were used to represent the molecular structures of the compounds and to encode the molecular graphs easily [68]. PubChem [69] and ChemSpider [70] were used to verify the accuracy of the SMILES strings, and the chemicals were checked for duplicates by SMILES matching. The data set contains 547 RB and 1178 NRB compounds. The chemicals are classified as RB or NRB based on the MITI test [48, 71]. The MITI-I screening test evaluates the biodegradability of chemical substances by measuring the biochemical oxygen demand (BOD) in an aerobic aqueous medium over a 4-week test period. Chemical substances whose BOD reaches 60% or more of their theoretical oxygen demand are described as RB, and those below 60% are considered NRB [72].
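The 60% pass level refers to the blank-corrected BOD expressed as a percentage of the compound's theoretical oxygen demand (ThOD). The sketch below computes ThOD for a compound containing only C, H, and O, assuming complete oxidation to CO2 and H2O (compounds with N, S, or halogens need extra terms that are omitted here), and then applies the 60% threshold. The BOD readings in the usage lines are hypothetical, not measured values.

```python
from __future__ import annotations

# Standard atomic weights (g/mol).
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}
O2_MOLAR_MASS = 2 * ATOMIC_WEIGHT["O"]


def thod_cho(c: int, h: int, o: int) -> float:
    """Theoretical oxygen demand (mg O2 per mg substance) of CcHhOo, assuming
    complete oxidation: CcHhOo + (c + h/4 - o/2) O2 -> c CO2 + h/2 H2O."""
    molar_mass = c * ATOMIC_WEIGHT["C"] + h * ATOMIC_WEIGHT["H"] + o * ATOMIC_WEIGHT["O"]
    mol_o2 = c + h / 4 - o / 2
    return O2_MOLAR_MASS * mol_o2 / molar_mass


def percent_biodegradation(bod_test: float, bod_blank: float, thod: float) -> float:
    """Blank-corrected BOD (mg O2 per mg substance) as a percentage of ThOD."""
    return 100.0 * (bod_test - bod_blank) / thod


def miti_class(percent: float, pass_level: float = 60.0) -> str:
    """RB if biodegradation reaches the 60% pass level, otherwise NRB."""
    return "RB" if percent >= pass_level else "NRB"


phenol_thod = thod_cho(6, 6, 1)                        # phenol, C6H6O
pct = percent_biodegradation(1.70, 0.05, phenol_thod)  # hypothetical BOD readings
print(round(phenol_thod, 2), round(pct, 1), miti_class(pct))  # 2.38 69.3 RB
```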

Development of a GNN model for ready biodegradability

The general concept of a GNN is based on a recursive message-passing neural network scheme [73]. In graph representation learning, the molecule's spatial structure information, node features, and edge features are taken as inputs, and the GNN learns a representation vector for each node of the graph. Each node aggregates feature information from its neighbors to iteratively update its representation [74]. After the last iteration, a representation vector for the entire graph is obtained by pooling the representation vectors of all the nodes in the graph [75, 76]. The model development pipeline is outlined in Figure 6 and involves three parts: (1) data preprocessing, (2) modeling the representation of the chemical graph with GNNs, and (3) updating the GNN parameters with an optimizer based on the value of the loss function.

Preprocessing of data

To predict the biodegradability of a chemical molecule, the GNN needs the molecular graph structure and the feature vector of each atom (see Supporting Information Table S9). First, we used the RDKit software package [77] to generate a molecular graph and an adjacency matrix to represent it.

The SMILES strings were converted into molecular graphs. A molecular graph, denoted as $G = (V, E)$, describes the connectivity of a set of vertices $V$, representing the nodes (atoms), and a set of edges $E$, representing the connections (bonds) between the nodes in $V$. For the adjacency matrix $A \in \{0, 1\}^{n \times n}$, $n$ is equal to the number of atoms, and the component $a_{ij}$ indicates a connection from node $j$ to node $i$. In this study, we define $A$ by

a_{ij}=\begin{cases}1, & \text{if there is an edge from node } j \text{ to node } i,\\ 0, & \text{otherwise.}\end{cases}

(1)

Finally, we used the canonical atom featurizer to generate the atom feature vector of the molecular graph [78].
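A minimal sketch of Equation (1): building the adjacency matrix from a list of bonds given as pairs of atom indices. Because chemical bonds are undirected, each bond sets both $a_{ij}$ and $a_{ji}$. RDKit, which the authors used, offers `Chem.GetAdjacencyMatrix` for this, although the exact call used in the pipeline is not stated; the pure-Python version below only makes the definition explicit.

```python
from __future__ import annotations


def adjacency_matrix(n_atoms: int, bonds: list[tuple[int, int]]) -> list[list[int]]:
    """Return A (n x n) with a[i][j] = 1 if there is an edge from node j to node i.

    Bonds are undirected, so each bond (i, j) is stored in both directions
    and A is symmetric.
    """
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        if not (0 <= i < n_atoms and 0 <= j < n_atoms):
            raise ValueError(f"bond ({i}, {j}) refers to an atom index outside 0..{n_atoms - 1}")
        a[i][j] = 1
        a[j][i] = 1
    return a


# Heavy-atom graph of ethanol (SMILES "CCO"): atoms C0, C1, O2; bonds C0-C1 and C1-O2.
A = adjacency_matrix(3, [(0, 1), (1, 2)])
print(A)  # [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
```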

Architecture of GNN

The idea of message passing is straightforward: at each iteration, each node's feature vector is updated by aggregating information from its local neighborhood. During message passing, each node representation is updated through three functions: (1) a message function, which generates the message information of a node; (2) an aggregation function, which, for a node $v\in V$, aggregates the message information from the nodes $u\in {\mathscr{N}}(v)$ in its local neighborhood; and (3) an update function, which combines the aggregated information with the self-feature vector of node $v$ to update node $v$.

Together, these three functions allow each node to learn the spatial structure information of the graph, after which a readout function is used to obtain a graph representation, as shown in Supporting Information Figure S10.

We adapted the Graph Isomorphism Network [79], a classic GNN model, to design a model for predicting biodegradability. The model is expressed as follows:

\begin{aligned}
{m}_{u}^{t} &= \sigma\left(W{h}_{u}^{t-1}\right), \quad \forall u\in {\mathscr{N}}(v),\\
{h}_{{\mathscr{N}}(v)}^{t} &= \sum_{u\in {\mathscr{N}}(v)}{m}_{u}^{t},\\
{h}_{v}^{t} &= \text{GRU}\left(\text{CONCAT}\left((1+\epsilon)\,{h}_{v}^{t-1},\ {h}_{{\mathscr{N}}(v)}^{t}\right)\right),\\
{z}_{{\mathscr{G}}}^{t} &= \text{WeightSumAndMax}\left({H}^{t}\right),
\end{aligned}

(2)

where the superscripts $t$ and $t-1$ index the GNN layer, ${h}_{u}$ is the feature vector of node $u$, ${\mathscr{N}}(v)$ is the local neighborhood of node $v$, ${z}_{{\mathscr{G}}}$ is the representation of the graph, $H$ is the feature matrix consisting of the feature vectors of all the nodes, $W$ is a learnable weight matrix, $\sigma$ is a nonlinear activation function, and $\epsilon$ is either a learnable parameter or a fixed scalar set as a hyperparameter. The message function is a single-layer perceptron (the linear map $W$ followed by $\sigma$) that produces the message information of each node. The aggregation function is the summation of the message information over the local neighborhood, and the update function feeds the concatenation of $(1+\epsilon)\,{h}_{v}^{t-1}$ with ${h}_{{\mathscr{N}}(v)}^{t}$ into gated recurrent units (GRUs) [80], whose output is assigned to node $v$ as its new feature vector ${h}_{v}^{t}$.

Instead of being generated only at the last GNN layer, a graph representation is generated after each iteration. After all node features have been updated, the weight sum and max function generates a graph representation vector by concatenating a weighted sum of the node feature vectors in $H$ with their maximum along each feature dimension. Finally, all of the graph representations ${z}_{{\mathscr{G}}}^{t}$, $t = 1, \ldots, T$, where $T$ is the number of GNN layers, are fed into a long short-term memory (LSTM) network [81] to generate the molecular fingerprint ${z}_{{\mathscr{G}}}$, which can be expressed as follows:

{z}_{{\mathscr{G}}}=\text{LSTM}\left({z}_{{\mathscr{G}}}^{1},\ldots ,{z}_{{\mathscr{G}}}^{T}\right).

(3)
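To make Equation (2) concrete, the following pure-Python sketch runs one message-passing iteration (message, sum aggregation, and the concatenated input to the update) followed by a weighted-sum-and-max readout on a toy graph. Several parts are assumptions rather than the authors' code: $\sigma$ is taken to be ReLU, the readout weights are a sigmoid of a linear score per node (modelled on DGL-LifeSci's `WeightedSumAndMax`), and the GRU update is replaced by a simple stand-in function so the example stays short. The LSTM of Equation (3) is not implemented.

```python
import math
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Matrix-vector product W x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    """Assumed choice for the activation sigma."""
    return [max(0.0, xi) for xi in x]


def message_passing_step(h: List[Vector], adj: List[List[int]], w: Matrix, eps: float,
                         update: Callable[[Vector], Vector]) -> List[Vector]:
    """One iteration of Eq. (2): messages, sum aggregation, then the update function."""
    messages = [relu(matvec(w, hu)) for hu in h]                 # m_u^t = sigma(W h_u^{t-1})
    dim = len(messages[0])
    new_h = []
    for v, hv in enumerate(h):
        neighbours = [u for u in range(len(h)) if adj[v][u]]    # a_vu = 1: edge from u to v
        h_nv = [sum(messages[u][k] for u in neighbours) for k in range(dim)]
        update_input = [(1 + eps) * x for x in hv] + h_nv        # CONCAT((1+eps) h_v, h_N(v))
        new_h.append(update(update_input))
    return new_h


def weighted_sum_and_max(h: List[Vector], gate: Vector, bias: float) -> Vector:
    """Readout: sigmoid-gated weighted sum of node vectors, concatenated with their element-wise max."""
    weights = [1.0 / (1.0 + math.exp(-(sum(g * x for g, x in zip(gate, hv)) + bias))) for hv in h]
    dim = len(h[0])
    weighted_sum = [sum(wt * hv[k] for wt, hv in zip(weights, h)) for k in range(dim)]
    elementwise_max = [max(hv[k] for hv in h) for k in range(dim)]
    return weighted_sum + elementwise_max


def toy_update(x: Vector) -> Vector:
    """Stand-in for the GRU (NOT a GRU): adds the self part to the aggregated part."""
    half = len(x) // 2
    return [a + b for a, b in zip(x[:half], x[half:])]


# Toy 3-atom chain (C-C-O) with made-up 2-dimensional atom features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
h0 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights keep the arithmetic readable
h1 = message_passing_step(h0, adj, W, eps=0.0, update=toy_update)
z1 = weighted_sum_and_max(h1, gate=[0.0, 0.0], bias=0.0)
print(h1)  # [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
print(z1)  # [2.5, 1.0, 2.0, 1.0]
```

Repeating the step $T$ times and collecting each $z_{\mathscr{G}}^{t}$ would give the sequence that Equation (3) passes to the LSTM.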

Modifying parameters of GNNs based on the value of the loss function using the Adam optimizer

After defining the model, we adjusted its parameters so that its output moved closer to the true target. The loss function was used to calculate the gap (loss) between the predicted output and the target, and an optimizer adjusted the model's parameters accordingly. In this design, the focal loss [82] was selected as the loss function because it down-weights well-classified examples and thereby focuses training on misclassified examples, which helps to address class imbalance. Finally, we used Adam [83] to update the model's parameters based on the loss value.
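For reference, a binary focal loss following the standard definition $\mathrm{FL}(p_t) = -\alpha_t (1-p_t)^{\gamma}\log p_t$ can be sketched as below. The text does not report the focusing parameter $\gamma$ or a class weight $\alpha$, so the defaults here ($\gamma = 2$, no $\alpha$ weighting) are assumptions. With $\gamma = 0$ and no $\alpha$ the loss reduces to ordinary binary cross-entropy, which makes the effect of the $(1-p_t)^{\gamma}$ factor easy to see.

```python
import math
from typing import Optional, Sequence


def binary_focal_loss(probs: Sequence[float], labels: Sequence[int], gamma: float = 2.0,
                      alpha: Optional[float] = None, eps: float = 1e-7) -> float:
    """Mean binary focal loss -alpha_t * (1 - p_t)**gamma * log(p_t).

    probs are predicted probabilities of the positive class (label 1).
    """
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be non-empty and of equal length")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p    # probability assigned to the true class
        weight = (1.0 - p_t) ** gamma     # down-weights easy, well-classified examples
        if alpha is not None:
            weight *= alpha if y == 1 else 1.0 - alpha
        total += -weight * math.log(p_t)
    return total / len(probs)


probs, labels = [0.9, 0.2, 0.3], [1, 0, 1]
print(binary_focal_loss(probs, labels))             # focal loss, gamma = 2
print(binary_focal_loss(probs, labels, gamma=0.0))  # equals binary cross-entropy
```

For a confidently correct example ($p_t = 0.9$) the loss is scaled by 0.01, whereas a badly misclassified one ($p_t = 0.1$) keeps 81% of its cross-entropy, which is how the loss shifts attention toward hard examples.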

Model validation

During model training, the focal loss function was used to calculate the error, and the model's parameters were optimized with the Adam optimizer. For model optimization, we used fivefold cross-validation on the training set to train five models and selected the one that displayed the best classification performance on the testing and external validation sets. The classification performance of the models was evaluated based on specificity (Sp) and sensitivity (Sn), that is, the ability to correctly predict RB and NRB molecules, respectively. These metrics are calculated using the following equations:

\mathrm{Sp}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}},\,\mathrm{Sn}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}.

(4)

TN, TP, FN, and FP denote the numbers of true negatives, true positives, false negatives, and false positives, respectively. In addition, the balanced accuracy (BA) was calculated as the average of sensitivity and specificity, and the error rate (ER) was determined as its complement, $\mathrm{ER} = 1 - \mathrm{BA}$. These indices are useful for evaluating the classification performance of a binary classifier, particularly when the classes are imbalanced, that is, when the classes contain unequal numbers of molecules [40]. Specificity, sensitivity, BA, and ER are expressed as ratios, not percentages.
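The four indices can be computed from a confusion matrix as sketched below, with NRB treated as the positive class to match the definitions of Sp and Sn given above. The counts in the first usage line are hypothetical. The second usage line shows why BA matters on an imbalanced set: a trivial model that labels all 1725 chemicals of the data set as NRB would reach an accuracy of 1178/1725 (about 0.68) but a BA of only 0.5.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Sp, Sn, balanced accuracy (BA), and error rate (ER = 1 - BA), as ratios."""
    sp = tn / (tn + fp)  # specificity
    sn = tp / (tp + fn)  # sensitivity
    ba = (sp + sn) / 2
    return {"Sp": sp, "Sn": sn, "BA": ba, "ER": 1 - ba}


# Hypothetical confusion matrix (NRB = positive class).
m = classification_metrics(tp=80, tn=90, fp=10, fn=20)
print({k: round(v, 3) for k, v in m.items()})  # {'Sp': 0.9, 'Sn': 0.8, 'BA': 0.85, 'ER': 0.15}

# Trivial model that predicts NRB for all 1725 chemicals (1178 NRB, 547 RB).
trivial = classification_metrics(tp=1178, tn=0, fp=547, fn=0)
accuracy = 1178 / 1725
print(round(accuracy, 2), trivial["BA"])  # 0.68 0.5
```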

Website architecture and implementation

The website architecture was developed using Django (version 3.1.3), a web back-end framework for Python (version 3.7). The back-end system was developed using Python (version 3.6). All data are stored and managed in MariaDB (version 5.5.56). The web interface of mibPOPdb was implemented using HTML5, CSS3, JavaScript, and the front-end framework Bootstrap (version 4.0). Nucleotide and protein sequences were processed with Biopython, and the homolog identification functions were implemented by integrating the BLAST (version 2.10.0) package. The Clustal Omega (version 1.2.2) and PHYLIP (version 3.695) packages were integrated to implement the multiple sequence alignment and phylogenetic tree generation functions. The interactive map and timeline were implemented with the Highcharts JavaScript charting library (https://www.highcharts.com/products/highcharts/, 2021) and Time Graphics (https://time.graphics/, 2021), respectively.

AUTHOR CONTRIBUTIONS

Tanyaradzwa R. Ngara: Conceptualization, methodology, software, data curation, writing—original draft, writing—review and editing, and visualization. Peiji Zeng: Methodology, validation, software, investigation, and data curation. Houjin Zhang: Conceptualization, supervision, writing—review and editing, and funding acquisition. All authors have approved the final version of the manuscript.

ACKNOWLEDGMENTS

Financial support was provided by the National Key Research and Development Program of China (Grant No. 2019YFA0905500) and the National Natural Science Foundation of China (Grant No. 31670793). We are extremely grateful to Dr. Lu Shin Wong of the Department of Chemistry at The University of Manchester, England, for taking the time to proofread our manuscript and for the valuable comments, corrections, and suggestions that ensued. His extensive and insightful comments helped us improve the quality of the manuscript.

CONFLICT OF INTEREST

The authors declare no conflict of interest.

Open Research

DATA AVAILABILITY STATEMENT

The mibPOPdb database is freely available at http://mibpop.genome-mining.cn/, and the data sets and standalone code for predicting chemical biodegradability are available at (https://github.com/monsterZeng/MIBPOP/).

Supporting Information

REFERENCES

1. Guo, Wenjing, Bohu Pan, Sugunadevi Sakkiah, Gokhan Yavas, Weigong Ge, Wen Zou, Weida Tong, and Huixiao Hong. 2019. “Persistent Organic Pollutants in Food: Contamination Sources, Health Effects and Detection Methods.” International Journal of Environmental Research and Public Health 16: 4361. https://doi.org/10.3390/ijerph16224361
2. Hung, Hayley, Crispin Halsall, Hollie Ball, Terry Bidleman, Jordi Dachs, Amila De Silva, Mark Hermanson, et al. 2022. “Climate Change Influence on the Levels and Trends of Persistent Organic Pollutants (POPs) and Chemicals of Emerging Arctic Concern (CEACs) in the Arctic Physical Environment—A Review.” Environmental Science: Processes & Impacts. https://doi.org/10.1039/D1EM00485A
3. Letcher, Robert J., Jan O. Bustnes, Rune Dietz, Bjørn M. Jenssen, Even H. Jørgensen, Christian Sonne, Jonathan Verreault, Mathilakath M. Vijayan, and Geir W. Gabrielsen. 2010. “Exposure and Effects Assessment of Persistent Organohalogen Contaminants in Arctic Wildlife and Fish.” Science of the Total Environment 408: 2995–3043. https://doi.org/10.1016/j.scitotenv.2009.10.038
4. Guida, Yago, Raquel Capella, and Roland Weber. 2020. “Chlorinated Paraffins in the Technosphere: A Review of Available Information and Data Gaps Demonstrating the Need to Support the Stockholm Convention Implementation.” Emerging Contaminants 6: 143–54. https://doi.org/10.1016/j.emcon.2020.03.003
5. Christensen, Krista, Laura M. Carlson, and Geniece M. Lehmann. 2021. “The Role of Epidemiology Studies in Human Health Risk Assessment of Polychlorinated Biphenyls.” Environmental Research 194: 110662. https://doi.org/10.1016/j.envres.2020.110662
6. Ma, Jianmin, Hayley Hung, and Robie W. Macdonald. 2016. “The Influence of Global Climate Change on the Environmental Fate of Persistent Organic Pollutants: A Review with Emphasis on the Northern Hemisphere and the Arctic as a Receptor.” Global and Planetary Change 146: 89–108. https://doi.org/10.1016/j.gloplacha.2016.09.011
7. Ma, Jianmin, Hayley Hung, Chongguo Tian, and Roland Kallenborn. 2011. “Revolatilization of Persistent Organic Pollutants in the Arctic Induced by Climate Change.” Nature Climate Change 1: 255–60. https://doi.org/10.1038/nclimate1167
8. Gardes, Thomas, Florence Portet-Koltalo, Maxime Debret, and Yoann Copard. 2021. “Historical and Post-Ban Releases of Organochlorine Pesticides Recorded in Sediment Deposits in an Agricultural Watershed, France.” Environmental Pollution 288: 117769. https://doi.org/10.1016/j.envpol.2021.117769
9. Sabatier, Pierre, Jérôme Poulenard, Bernard Fanget, Jean-Louis Reyss, Anne-Lise Develle, Bruno Wilhelm, Estelle Ployon, et al. 2014. “Long-Term Relationships Among Pesticide Applications, Mobility, and Soil Erosion in a Vineyard Watershed.” Proceedings of the National Academy of Sciences of the United States of America 111: 15647–52. https://doi.org/10.1073/pnas.1411512111
10. Sabatier, Pierre, Charles Mottes, Nathalie Cottin, Olivier Evrard, Irina Comte, Christine Piot, Bastien Gay, et al. 2021. “Evidence of Chlordecone Resurrection by Glyphosate in French West Indies.” Environmental Science & Technology 55: 2296–306. https://doi.org/10.1021/acs.est.0c05207
11. Santschi, Peter H., Bobby J. Presley, Terry L. Wade, Bernardo Garcia-Romero, and Mahalingam M. Baskaran. 2001. “Historical Contamination of PAHs, PCBs, DDTs, and Heavy Metals in Mississippi River Delta, Galveston Bay and Tampa Bay Sediment Cores.” Marine Environmental Research 52: 51–79. https://doi.org/10.1016/s0141-1136(00)00260-9
12. Chevallier, Marion L., Oriane Della-Negra, Sebastien Chaussonnerie, Agnes Barbance, Delphine Muselet, Florian Lagarde, Ekaterina Darii, et al. 2019. “Natural Chlordecone Degradation Revealed by Numerous Transformation Products Characterized in Key French West Indies Environmental Compartments.” Environmental Science & Technology 53: 6133–43. https://doi.org/10.1021/acs.est.8b06305
13. Boudh, Siddharth, Jay Shankar Singh, and Preeti Chaturvedi. 2019. “Microbial Resources Mediated Bioremediation of Persistent Organic Pollutants.” In New and Future Developments in Microbial Biotechnology and Bioengineering, edited by Jay Shankar Singh, 283–94. Amsterdam: Elsevier. https://doi.org/10.1016/B978-0-12-818258-1.00019-4
14. Saibu, Salametu, Sunday A. Adebusoye, and Ganiyu O. Oyetibo. 2020. “Aerobic Bacterial Transformation and Biodegradation of Dioxins: A Review.” Bioresources and Bioprocessing 7. https://doi.org/10.1186/s40643-020-0294-0
15. Zanaroli, Giulio, Andrea Negroni, Max M. Häggblom, and Fabio Fava. 2015. “Microbial Dehalogenation of Organohalides in Marine and Estuarine Environments.” Current Opinion in Biotechnology 33: 287–95. https://doi.org/10.1016/j.copbio.2015.03.013
16. Chakraborty, Jaya, and Surajit Das. 2016. “Molecular Perspectives and Recent Advances in Microbial Remediation of Persistent Organic Pollutants.” Environmental Science and Pollution Research 23: 16883–903. https://doi.org/10.1007/s11356-016-6887-7
17. Borchert, Erik, Katrin Hammerschmidt, Ute Hentschel, and Peter Deines. 2021. “Enhancing Microbial Pollutant Degradation by Integrating Eco-Evolutionary Principles with Environmental Biotechnology.” Trends in Microbiology 29: 908–18. https://doi.org/10.1016/j.tim.2021.03.002
18. Bhatt, Pankaj, Saurabh Gangola, Geeta Bhandari, Wenping Zhang, Damini Maithani, Sandhya Mishra, and Shaohua Chen. 2021. “New Insights into the Degradation of Synthetic Pollutants in Contaminated Environments.” Chemosphere 268: 128827. https://doi.org/10.1016/j.chemosphere.2020.128827
19. Sayers, Eric W., Jeffrey Beck, Evan E. Bolton, Devon Bourexis, James R. Brister, Kathi Canese, Donald C. Comeau, et al. 2021. “Database Resources of the National Center for Biotechnology Information.” Nucleic Acids Research 49: 10–7. https://doi.org/10.1093/nar/gkaa892
20. Kanehisa, Minoru, Yoko Sato, Masayuki Kawashima, Miho Furumichi, and Mao Tanabe. 2016. “KEGG as a Reference Resource for Gene and Protein Annotation.” Nucleic Acids Research 44: 457–62. https://doi.org/10.1093/nar/gkv1070
21. Magrane, Michele, and UniProt Consortium. 2011. “UniProt Knowledgebase: A Hub of Integrated Protein Data.” Database: The Journal of Biological Databases and Curation 2011: bar009. https://doi.org/10.1093/database/bar009
22. Knutson, Christopher J., Nicholas C. Pflug, Wyanna Yeung, Matthew Grobstein, Eric V. Patterson, David M. Cwiertny, and James B. Gloer. 2021. “Computational Approaches for the Prediction of Environmental Transformation Products: Chlorination of Steroidal Enones.” Environmental Science & Technology 55: 14658–66. https://doi.org/10.1021/acs.est.1c04659
23. Zheng, Ziye, Hans Peter H. Arp, Gregory Peters, and Patrik L. Andersson. 2021. “Combining In Silico Tools with Multicriteria Analysis for Alternatives Assessment of Hazardous Chemicals: Accounting for the Transformation Products of decaBDE and its Alternatives.” Environmental Science & Technology 55: 1088–98. https://doi.org/10.1021/acs.est.0c02593
24. Lee, Yunho, Yunhee Lee, and Che Ok Jeon. 2019. “Biodegradation of Naphthalene, BTEX, and Aliphatic Hydrocarbons by Paraburkholderia aromaticivorans BN5 Isolated from Petroleum-Contaminated Soil.” Scientific Reports 9: 860. https://doi.org/10.1038/s41598-018-36165-x
25. Wongbunmak, Akanit, Sansanee Khiawjan, Manop Suphantharika, and Thunyarat Pongtharangkul. 2020. “BTEX Biodegradation by Bacillus amyloliquefaciens subsp. plantarum W1 and its Proposed BTEX Biodegradation Pathways.” Scientific Reports 10: 17408. https://doi.org/10.1038/s41598-020-74570-3
26. Adewale, Peter, Alice Lang, Fang Huang, Daochen Zhu, Jianzhong Sun, Michael Ngadi, and Trent Chunzhong Yang. 2021. “A Novel Bacillus ligniniphilus Catechol 2,3-Dioxygenase Shows Unique Substrate Preference and Metal Requirement.” Scientific Reports 11: 23982. https://doi.org/10.1038/s41598-021-03144-8
27. Muthabathula, Prajna, and Sujatha Biruduganti. 2022. “Analysis of Biodegradation of the Synthetic Pyrethroid Cypermethrin by Beauveria bassiana.” Current Microbiology 79: 46. https://doi.org/10.1007/s00284-021-02744-x
28. Sivakumar, Subramaniam, Palanivel Anitha, Balsubramanian Ramesh, and Gopal Suresh. 2017. “Analysis of EAWAG-BBD Pathway Prediction System for the Identification of Malathion Degrading Microbes.” Bioinformation 13: 73–7. https://doi.org/10.6026/97320630013073
29. Gao, Junfeng, Lynda B. M. Ellis, and Lawrence P. Wackett. 2010. “The University of Minnesota Biocatalysis/Biodegradation Database: Improving Public Access.” Nucleic Acids Research 38: 488–91. https://doi.org/10.1093/nar/gkp771
30. Pazos, Florencio, David Guijas, Alfonso Valencia, and Victor De Lorenzo. 2005. “MetaRouter: Bioinformatics for Bioremediation.” Nucleic Acids Research 33: 588–92. https://doi.org/10.1093/nar/gki068
31. Arora, Pankaj K., Manish Kumar, Archana Chauhan, Gajendra P. S. Raghava, and Rakesh K. Jain. 2009. “OxDBase: A Database of Oxygenases Involved in Biodegradation.” BMC Research Notes 2: 67. https://doi.org/10.1186/1756-0500-2-67
32. Carbajosa, Guillermo, Almudena Trigo, Alfonso Valencia, and Ildefonso Cases. 2009. “Bionemo: Molecular Information on Biodegradation Metabolism.” Nucleic Acids Research 37: 598–602. https://doi.org/10.1093/nar/gkn864
33. Rocha, Werickson F. C., and David A. Sheen. 2016. “Classification of Biodegradable Materials Using QSAR Modelling with Uncertainty Estimation.” SAR and QSAR in Environmental Research 27: 799–811. https://doi.org/10.1080/1062936X.2016.1238010
34. Krewski, Daniel, Margit Westphal, Melvin E. Andersen, Gregory M. Paoli, Weihsueh A. Chiu, Mustafa Al-Zoughool, Maxine C. Croteau, Lyle D. Burgoon, and Ila Cote. 2014. “A Framework for the Next Generation of Risk Science.” Environmental Health Perspectives 122: 796–805. https://doi.org/10.1289/ehp.1307260
35. Rowan, Andrew N. 2015. “Ending the Use of Animals in Toxicity Testing and Risk Evaluation.” Cambridge Quarterly of Healthcare Ethics 24: 448–58. https://doi.org/10.1017/S0963180115000109
36. Cheng, Feixiong, Yutaka Ikenaga, Yadi Zhou, Yue Yu, Weihua Li, Jie Shen, Zheng Du, et al. 2012. “In Silico Assessment of Chemical Biodegradability.” Journal of Chemical Information and Modeling 52: 655–69. https://doi.org/10.1021/ci200622d
37. Fernández, Alberto, Robert Rallo, and Francesc Giralt. 2015. “Prioritization of in silico Models and Molecular Descriptors for the Assessment of Ready Biodegradability.” Environmental Research 142: 161–8. https://doi.org/10.1016/j.envres.2015.06.031
38. Lombardo, Anna, Fabiola Pizzo, Emilio Benfenati, Alberto Manganaro, Thomas Ferrari, and Giuseppina Gini. 2014. “A New in silico Classification Model for Ready Biodegradability, Based on Molecular Fragments.” Chemosphere 108: 10–6. https://doi.org/10.1016/j.chemosphere.2014.02.073
39. Lunghini, Filippo, Gilles Marcou, Philippe Azam, Marie-Hélène Enrici, Erik Van Miert, and Alexandre Varnek. 2020. “Publicly Available QSPR Models for Environmental Media Persistence.” SAR and QSAR in Environmental Research 31: 493–510. https://doi.org/10.1080/1062936x.2020.1776387
40. Mansouri, Kamel, Tine Ringsted, Davide Ballabio, Roberto Todeschini, and Viviana Consonni. 2013. “Quantitative Structure–Activity Relationship Models for Ready Biodegradability of Chemicals.” Journal of Chemical Information and Modeling 53: 867–78. https://doi.org/10.1021/ci4000213
41. Ponzoni, Ignacio, Víctor Sebastián-Pérez, María J. Martínez, Carlos Roca, Carlos De la Cruz Pérez, Fiorella Cravero, Gustavo E. Vazquez, et al. 2019. “QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer's Disease.” Scientific Reports 9: 9102. https://doi.org/10.1038/s41598-019-45522-3
42. Martínez, María Jimena, Marina Razuc, and Ignacio Ponzoni. 2019. “MoDeSuS: A Machine Learning Tool for Selection of Molecular Descriptors in QSAR Studies Applied to Molecular Informatics.” BioMed Research International 2019: 2905203. https://doi.org/10.1155/2019/2905203
43. Aniceto, Natália, Alex A. Freitas, Andreas Bender, and Taravat Ghafourian. 2016. “A Novel Applicability Domain Technique for Mapping Predictive Reliability Across the Chemical Space of a QSAR: Reliability-Density Neighbourhood.” Journal of Cheminformatics 8: 69. https://doi.org/10.1186/s13321-016-0182-y
44. Acharya, Kishor, David Werner, Jan Dolfing, Maciej Barycki, Paola Meynet, Wojciech Mrozik, Oladapo Komolafe, Tomasz Puzyn, and Russell J. Davenport. 2019. “A Quantitative Structure-Biodegradation Relationship (QSBR) Approach to Predict Biodegradation Rates of Aromatic Chemicals.” Water Research 157: 181–90. https://doi.org/10.1016/j.watres.2019.03.086
45. Liu, Jing, Ya Tan, Erqun Song, and Yang Song. 2020. “A Critical Review of Polychlorinated Biphenyls Metabolism, Metabolites, and Their Correlation with Oxidative Stress.” Chemical Research in Toxicology 33: 2022–42. https://doi.org/10.1021/acs.chemrestox.0c00078
46. Van Aken, Benoît, and Renu Bhalla. 2011. “Microbial Degradation of Polychlorinated Biphenyls.” In Comprehensive Biotechnology. 3rd ed., edited by Murray Moo-Young, 71–86. Oxford: Pergamon. https://doi.org/10.1016/B978-0-444-64046-8.00347-5
47. Ballabio, Davide, Fabrizio Biganzoli, Roberto Todeschini, and Viviana Consonni. 2017. “Qualitative Consensus of QSAR Ready Biodegradability Predictions.” Toxicological & Environmental Chemistry 99: 1193–216. https://doi.org/10.1080/02772248.2016.1260133
48. Junker, Thomas, Anja Coors, and Gerrit Schüürmann. 2016. “Development and Application of Screening Tools for Biodegradation in Water-Sediment Systems and Soil.” Science of the Total Environment 544: 1020–30. https://doi.org/10.1016/j.scitotenv.2015.11.146
49. Özel Duygan, Birge D., Sylvain Rey, Sabine Leocata, Lucie Baroux, Markus Seyfried, and Jan R. van der Meer. 2021. “Assessing Biodegradability of Chemical Compounds from Microbial Community Growth Using Flow Cytometry.” mSystems 6: e01143-20. https://doi.org/10.1128/mSystems.01143-20
50. Purnomo, Adi Setyo, Surya Rosa Putra, Kuniyoshi Shimizu, and Ryuichiro Kondo. 2014. “Biodegradation of Heptachlor and Heptachlor Epoxide-Contaminated Soils By White-Rot Fungal Inocula.” Environmental Science and Pollution Research 21: 11305–12. https://doi.org/10.1007/s11356-014-3026-1
51. Maphosa, Farai, Shakti H. Lieten, Inez Dinkla, Alfons J. Stams, Hauke Smidt, and Donna E. Fennell. 2012. “Ecogenomics of Microbial Communities in Bioremediation of Chlorinated Contaminated Sites.” Frontiers in Microbiology 3: 351. https://doi.org/10.3389/fmicb.2012.00351
52. Czaplicki, Lauren M., and Claudia K. Gunsch. 2016. “Reflection on Molecular Approaches Influencing State-of-the-Art Bioremediation Design: Culturing to Microbial Community Fingerprinting to Omics.” Journal of Environmental Engineering, ASCE 142: 1–13. https://doi.org/10.1061/(ASCE)EE.1943-7870.0001141
53. Karigar, Chandrakant S., and Shwetha S. Rao. 2011. “Role of Microbial Enzymes in the Bioremediation of Pollutants: A Review.” Enzyme Research 2011: 1–11. https://doi.org/10.4061/2011/805187
54. Men, Yujie, Helene Feil, Nathan C. Verberkmoes, Manesh B. Shah, David R. Johnson, Patrick K. Lee, Kimberlee A. West, et al. 2012. “Sustainable Syntrophic Growth of Dehalococcoides ethenogenes Strain 195 with Desulfovibrio vulgaris Hildenborough and Methanobacterium congolense: Global Transcriptomic and Proteomic Analyses.” The ISME Journal 6: 410–21. https://doi.org/10.1038/ismej.2011.111
55. Matturro, Bruna, Carla Ubaldi, Paola Grenni, Anna Barra Caracciolo, and Simona Rossetti. 2016. “Polychlorinated Biphenyl (PCB) Anaerobic Degradation in Marine Sediments: Microcosm Study and Role of Autochthonous Microbial Communities.” Environmental Science and Pollution Research International 23: 613–23. https://doi.org/10.1007/s11356-015-4960-2
56. Yi, Shan, Erica C. Seth, Yu-Jie Men, Sally P. Stabler, Robert H. Allen, Lisa Alvarez-Cohen, and Michiko E. Taga. 2012. “Versatility in Corrinoid Salvaging and Remodeling Pathways Supports Corrinoid-Dependent Metabolism in Dehalococcoides mccartyi.” Applied and Environmental Microbiology 78: 7745–52. https://doi.org/10.1128/AEM.02150-12
57. Praveckova, Martina, Maria V. Brennerova, Christof Holliger, Felippe De Alencastro, and Pierre Rossi. 2016. “Indirect Evidence Link PCB Dehalogenation with Geobacteraceae in Anaerobic Sediment-Free Microcosms.” Frontiers in Microbiology 7: 933. https://doi.org/10.3389/fmicb.2016.00933
58. Tam, Jason Y. C., Tim Lorsbach, Sebastian Schmidt, and Jörg S. Wicker. 2021. “Holistic Evaluation of Biodegradation Pathway Prediction: Assessing Multi-Step Reactions and Intermediate Products.” Journal of Cheminformatics 13: 63. https://doi.org/10.1186/s13321-021-00543-x
59. Takagi, Kazuhiro. 2020. “Study on the Biodegradation of Persistent Organic Pollutants (POPs).” Journal of Pesticide Sciences 45: 119–23. https://doi.org/10.1584/jpestics.J19-06
60. Duvenaud, David K., Dougal Maclaurin, Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and Ryan P. Adams. 2015. “Convolutional Networks on Graphs for Learning Molecular Fingerprints.” Advances in Neural Information Processing Systems 28: 2224–32. https://doi.org/10.48550/arXiv.1509.09292
61. Gaudelet, Thomas, Ben Day, Arian R. Jamasb, Jyothish Soman, Cristian Regep, Gertrude Liu, Jeremy B. R. Hayter, et al. 2021. “Utilizing Graph Machine Learning within Drug Discovery and Development.” Briefings in Bioinformatics 22: bbab159. https://doi.org/10.1093/bib/bbab159
62. Jiang, Dejun, Zhenxing Wu, Chang-Yu Hsieh, Guangyong Chen, Ben Liao, Zhe Wang, Chao Shen, Dongsheng Cao, Jian Wu, and Tingjun Hou. 2021. “Could Graph Neural Networks Learn Better Molecular Representation for Drug Discovery? A Comparison Study of Descriptor-Based and Graph-Based Models.” Journal of Cheminformatics 13: 12. https://doi.org/10.1186/s13321-020-00479-8
63. Sun, Mengying, Sendong Zhao, Coryandar Gilvary, Olivier Elemento, Jiayu Zhou, and Fei Wang. 2019. “Graph Convolutional Networks for Computational Drug Development and Discovery.” Briefings in Bioinformatics 21: 919–35. https://doi.org/10.1093/bib/bbz042
64. Fernandez, Michael, Fuqiang Ban, Godwin Woo, Michael Hsing, Takeshi Yamazaki, Eric LeBlanc, Paul S. Rennie, William J. Welch, and Artem Cherkasov. 2018. “Toxic Colors: The Use of Deep Learning for Predicting Toxicity of Compounds Merely from Their Graphic Images.” Journal of Chemical Information and Modeling 58: 1533–43. https://doi.org/10.1021/acs.jcim.8b00338
65. Meyer, Jesse G., Shengchao Liu, Ian J. Miller, Joshua J. Coon, and Anthony Gitter. 2019. “Learning Drug Functions from Chemical Structures with Convolutional Neural Networks and Random Forests.” Journal of Chemical Information and Modeling 59: 4438–49. https://doi.org/10.1021/acs.jcim.9b00236
66. Guiry, Michael, Gwendoline Guiry, Liam Morrison, Fabio Rindi, Salvador Valenzuela, Arthur Mathieson, Bruce Parker, et al. 2014. “AlgaeBase: An Online Resource for Algae.” Cryptogamie Algologie 35: 105–15. https://doi.org/10.7872/crya.v35.iss2.2014.105
67. Jeske, Lisa, Sandra Placzek, Ida Schomburg, Antje Chang, and Dietmar Schomburg. 2019. “BRENDA in 2019: A European ELIXIR Core Data Resource.” Nucleic Acids Research 47: 542–9. https://doi.org/10.1093/nar/gky1048
68. Weininger, David. 1988. “SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules.” Journal of Chemical Information and Computer Sciences 28: 31–6. https://doi.org/10.1021/ci00057a005
69. Kim, Sunghwan, Paul A. Thiessen, Evan E. Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, et al. 2016. “PubChem Substance and Compound Databases.” Nucleic Acids Research 44: 1202–13. https://doi.org/10.1093/nar/gkv951
70. Pence, Harry E., and Antony Williams. 2010. “ChemSpider: An Online Chemical Information Resource.” Journal of Chemical Education 87: 1123–4. https://doi.org/10.1021/ed100697w
71. Tunkel, Jay, Philip H. Howard, Robert S. Boethling, William Stiteler, and Helene Loonen. 2000. “Predicting Ready Biodegradability in the Japanese Ministry of International Trade and Industry Test.” Environmental Toxicology and Chemistry 19: 2478–85. https://doi.org/10.1002/etc.5620191013
72. Brown, David M., Delina Lyon, David M. V. Saunders, Christopher B. Hughes, James R. Wheeler, Hua Shen, and Graham Whale. 2020. “Biodegradability Assessment of Complex, Hydrophobic Substances: Insights from Gas-to-Liquid (GTL) Fuel and Solvent Testing.” Science of the Total Environment 727: 138528. https://doi.org/10.1016/j.scitotenv.2020.138528
73. Gilmer, Justin, Samuel Schoenholz, Patrick Riley, Oriol Vinyals, and George Dahl. 2017. “Neural Message Passing for Quantum Chemistry.” In Proceedings of the 34th International Conference on Machine Learning, edited by Doina Precup and Yee Whye Teh, 70: 1263–72. Sydney: Proceedings of Machine Learning Research. https://doi.org/10.48550/arXiv.1704.01212
74. Xu, Keyulu, Chengtao Li, Yonglong Tian, Tomohiro Sonobe, Ken-ichi Kawarabayashi, and Stefanie Jegelka. 2018. “Representation Learning on Graphs with Jumping Knowledge Networks.” In 35th International Conference on Machine Learning (ICML), edited by Jennifer Dy and Andreas Krause, 80, 2640–3498. Stockholm: Proceedings of Machine Learning Research. https://doi.org/10.48550/arXiv.1806.03536
75. Fang, Xiaomin, Lihang Liu, Jieqiong Lei, Donglong He, Shanzhuo Zhang, Jingbo Zhou, Fan Wang, Hua Wu, and Haifeng Wang. 2022. “Geometry-Enhanced Molecular Representation Learning for Property Prediction.” Nature Machine Intelligence 4: 127–34. https://doi.org/10.1038/s42256-021-00438-4
76. Ying, Rex, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, and Jure Leskovec. 2018. “Hierarchical Graph Representation Learning with Differentiable Pooling.” In 32nd Conference on Neural Information Processing Systems (NIPS), edited by Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, and Nicolo Cesa-Bianchi, 31, 4805–15. New York: Curran Associates Inc. https://doi.org/10.48550/arXiv.1806.08804
77. Landrum, Greg. 2006. “RDKit: Open-Source Cheminformatics from Machine Learning to Chemical Registration.” In ACS Fall National Meeting and Exposition. San Diego: Journal of the American Chemical Society. https://doi.org/10.5281/zenodo.591637
78. Li, Mufei, Jinjing Zhou, Jiajing Hu, Wenxuan Fan, Yangkang Zhang, Yaxin Gu, and George Karypis. 2021. “DGL-LifeSci: An Open-Source Toolkit for Deep Learning on Graphs in Life Science.” ACS Omega 6: 27233–8. https://doi.org/10.1021/acsomega.1c04017
79. Xu, Keyulu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. “How Powerful Are Graph Neural Networks?” In International Conference on Learning Representations. New Orleans, Louisiana: Code for Science & Society. https://doi.org/10.48550/arXiv.1810.00826
80. Cho, Kyunghyun, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. “Learning Phrase Representations Using RNN Encoder–Decoder for Statistical Machine Translation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1179
81. Hochreiter, Sepp, and Jürgen Schmidhuber. 1997. “Long Short-Term Memory.” Neural Computation 9: 1735–80. https://doi.org/10.1162/neco.1997.9.8.1735
82. Lin, Tsung-Yi, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2020. “Focal Loss for Dense Object Detection.” IEEE Transactions on Pattern Analysis and Machine Intelligence 42: 318–27. https://doi.org/10.1109/tpami.2018.2858826
83. Kingma, Diederik, and Jimmy Ba. 2014. “Adam: A Method for Stochastic Optimization.” In 3rd International Conference on Learning Representations, edited by Yoshua Bengio and Yann LeCun. San Diego: Ithaca. https://doi.org/10.48550/arXiv.1412.6980


Volume 1, Issue 4, December 2022, e45

Filename	Description
imt245-sup-0001-mibPOPdb_data.xlsx (80.2 KB)	Supporting information.
imt245-sup-0002-Supplementary_file_mibPOPdb.docx (2.7 MB)	Supporting information.
