Thioesterase enzyme families: Functions, structures, and mechanisms
Benjamin T. Caswell and Caio C. de Carvalho contributed equally to this study.
Funding information: U.S. National Science Foundation, Grant/Award Number: 2001385
Abstract
Thioesterases are enzymes that hydrolyze thioester bonds in numerous biochemical pathways, for example in fatty acid synthesis. This work reports the known functions, structures, and mechanisms of the updated thioesterase enzyme families, which are classified into 35 families based on sequence similarity. Each thioesterase family is based on at least one experimentally characterized enzyme, and most families include enzymes that have been crystallized and whose tertiary structures have been resolved. Classifying thioesterases into families makes it possible to predict the tertiary structures and infer the catalytic residues and mechanisms of all sequences in a family. This is particularly useful because the majority of known protein sequences have no experimental characterization. Phylogenetic analysis of experimentally characterized thioesterases with structures in the two main structural folds reveals convergent and divergent evolution. Catalytic residues are predicted based on tertiary structure superimposition.
1 INTRODUCTION
Thioesterases (TEs) hydrolyze thioester bonds and catalyze reactions in many different pathways, such as fatty acid synthesis, polyketide synthesis, and non-ribosomal peptide synthesis. TEs are used in the biological production of tailored fatty acids and other medically relevant compounds, such as macrolide antibiotics.1-4 TEs catalyze the hydrolysis of a wide variety of thioesters. For example, acyl-coenzyme A (CoA) hydrolysis occurs in the biological production of 3-hydroxybutyrate,5 in fatty acid β-oxidation,6, 7 in vitamin K biosynthesis,8 and in 4-chlorobenzoate dehalogenation,9 among other pathways. TEs are also medically important. For example, protein palmitoylation plays a role in malaria pathogenesis,10 and acyl-CoA thioesterases (ACOTs) are involved in fatty acid metabolism that affects obesity, diabetes, and nonalcoholic fatty liver disease in humans.11
Classifying enzymes into families by primary structure (amino acid sequence) makes it possible to predict the tertiary structure of all enzymes in a family and to identify catalytic residues and mechanisms. This is particularly useful because known protein sequences vastly outnumber the enzymes whose function has been experimentally characterized or whose structure has been experimentally determined. In 2010, TE enzymes were classified into 23 families,12 which were placed in the publicly available Thioester-active enzYmes (ThYme) database.13
Enzyme family classification allows one to infer the structure and function of an uncharacterized sequence in an organism of interest from a single enzyme of known function and structure in the same family. For example, structural knowledge of bacterial enzymes in TE family 14 (TE14) led to an understanding of substrate-protein interactions in algal TEs,14 as well as to structure prediction and analysis of plant sequences in the same family.15 Furthermore, structural predictions and analysis of plant sequences in TE14, combined with site-directed mutagenesis, led to the identification of the catalytic residues of the Cuphea viscosissima acyl-ACP TE, which is relevant for the biological production of tailored fatty acids.16 More recently, knowledge of enzyme sequences and their substrate specificities was used to predict function from structure for acyl-ACP TEs.17
Since we first classified the TEs into families, the number of known protein sequences has increased by about three orders of magnitude, and more TEs have been experimentally characterized. New TE substrate specificities have been determined. For example, (a) in TE4, a preference toward short-chain fatty acids was observed in ACOT18; (b) RpaL, a TesB-like TE4 enzyme from Rhodopseudomonas palustris, was found to be active on aromatic as well as long- and short-chain aliphatic molecules bound to CoA19; (c) in TE6, YciA enzymes from Methylobacterium extorquens were shown to hydrolyze ethylmalonyl-CoA for dicarboxylic acid production20; and (d) aryl-CoA substrate specificity was observed for enzymes in TE13.21
More TEs have been identified since we first classified TEs into families, some of which belong to existing families. For example, (a) guanosine diphosphate regulation TEs from Neisseria meningitidis belong to TE622; (b) the acyl-lipid thioesterase from Arabidopsis thaliana belongs to TE923; (c) methylketone synthases,24 which were originally characterized from tomato before the ThYme database was created, have also been found in Solanum melongena and Glycine max and belong to TE925, 26; (d) Shewanella oneidensis YbgC, which was found to primarily hydrolyze short-chain acyl-CoA thioesters, also belongs to TE927; (e) BorB, required for borrelidin biosynthesis, is a member of TE1828; and (f) the Isochrysis galbana thioesterase/carboxylesterase (IgTeCe) is in TE21.29
Structural knowledge of how enzymes perform thioester hydrolysis has also increased; an insightful recent review describes TE structures and clearly connects catalytic residues with enzyme topology.30 Since we first classified TEs into families, new TE structures have been resolved. For example, (a) in TE4, the TesB enzyme from Yersinia pestis was crystallized,31 as were (b) the TesB enzymes from mycobacteria32; (c) in TE6, the structure of the human ACOT12 enzyme was obtained33; (d) in TE11, the tertiary structure of the TE involved in azinomycin biosynthesis was determined34; and (e) in TE12, the Synechocystis 1,4-dihydroxy-2-naphthoyl-CoA TE was crystallized.35
Given the increase in known sequences, structures, and experimental characterizations, we updated the TE families. In this work, we report 35 TE families, describe their functions and mechanisms, analyze their structures, predict catalytic residues, and present a phylogenetic analysis of TE enzymes with the main structural folds. The updated TE families are available in the updated ThYme database (http://thyme.engr.unr.edu).
2 RESULTS AND DISCUSSION
Based on sequence similarity, following the approach described in Section 4, we identified 35 TE families almost completely unrelated by primary structure. In the following sections, we discuss their functions (Section 2.1), tertiary structures and catalytic residues (Section 2.2), and phylogeny (Section 2.3). All the TE families are based on experimentally characterized enzymes, and most include tertiary structures from crystallization.
2.1 TE families and their functions
Enzymes in families TE1 to TE13, TE24 to TE26, TE28, and TE31 to TE35 hydrolyze substrates with various functionalities bound by a thioester to CoA. Those in TE14 to TE19 and TE30 hydrolyze the thioester bonds between acyl groups and an acyl carrier protein (ACP). The enzymes in TE20, TE21, TE27, and TE29 cleave the bonds between acyl groups and other proteins. Members of TE22 and TE23 break the bonds between acyl groups and glutathione or its derivatives. In CoA and ACP, the thioester-carrying moiety is a pantetheine residue; glutathione itself carries the sulfur moiety; and in non-ACP proteins, the sulfur-carrying moiety is mainly a cysteine residue.
For most TE families, thioester hydrolysis is the main function of their enzymes; however, it is not the main activity of TE33–TE35. All the reported TE families have at least one member whose TE function has been confirmed experimentally, but some families have members that also catalyze reactions other than thioester hydrolysis.
Some TE families include enzymes that are the TE domains of larger, multimodular proteins such as fatty acid synthases (FASs), polyketide synthases (PKSs), or non-ribosomal peptide synthases (NRPs). FASs, PKSs, and NRPs are large enzymes with multiple domains each having different functions. Only the TE domains were used to identify TE family members.
The functions of enzymes in families TE1–TE23 are described in detail in our previous work,12 and those of families TE24–TE35 are described here. Table 1 includes common names and genes, their overall function, known substrate specificities, and references for all TE families.
Family | Genes and/or enzyme names | General function | Known substrate specificities | References |
---|---|---|---|---|
TE1 | Ach1 | Acyl-CoA hydrolase | Acetyl-CoA | 36, 37 |
TE2 | Acot1–Acot6; BAAT thioesterase | Acyl-CoA hydrolase | Palmitoyl-CoA; Bile-acid-CoA | 38, 39 |
TE3 | tesA; estA; Acyl-CoA thioesterase I; Protease I; Lysophospholipase L1 | Acyl-CoA hydrolase | Medium- to long-chain acyl-CoA | 40, 41 |
TE4 | tesB; Acyl-CoA thioesterase II; Acot8 | Acyl-CoA hydrolase | Short-chain acyl-CoA; Short- to long-chain acyl-CoA; Palmitoyl-CoA; Choloyl-CoA | 18, 42, 43 |
TE5 | tesC (ybaW); Acyl-CoA thioesterase III | Acyl-CoA hydrolase | Long-chain acyl-CoA; 3,5-Tetradecadienoyl-CoA | 44 |
TE6 | Acot7 (BACH); Acot11 (BFIT, Them1); Acot12 (CACH); YciA | Acyl-CoA hydrolase | Short- to long-chain acyl-CoA; Ethylmalonyl-CoA | 20, 33, 45-49 |
TE7 | Acot9; Acot10 | Acyl-CoA hydrolase | Short- to long-chain acyl-CoA | 50, 51 |
TE8 | Acot13 (Them2) | Acyl-CoA hydrolase | Short- to long-chain acyl-CoA | 52 |
TE9 | YbgC; ALT; MKS | Acyl-CoA hydrolase | Short-chain acyl-CoA; Short- to long-chain acyl-CoA; 4-Hydroxybenzoyl-CoA | 23, 27, 53-55 |
TE10 | 4HBT-I | Acyl-CoA hydrolase | 4-Hydroxybenzoyl-CoA | 56 |
TE11 | 4HBT-II; EntH (YbdB); menI; DHNAT1; 1,4-Dihydroxy-2-naphthoyl-CoA hydrolase; AziG | Acyl-CoA hydrolase | 4-Hydroxybenzoyl-CoA | 34, 57 |
TE12 | 1,4-Dihydroxy-2-naphthoyl-CoA hydrolase | Acyl-CoA hydrolase | 1,4-Dihydroxy-2-naphthoyl-CoA | 58 |
TE13 | paaI; paaD | Acyl-CoA hydrolase | Short- and medium-chain acyl-CoA; Hydroxyphenylacetyl-CoA; Aryl-CoA | 21, 59 |
TE14 | FatA; FatB | Acyl-ACP hydrolase | Short- to long-chain acyl-ACP | 60, 61 |
TE15 | Thioesterase CalE7 | Acyl-ACP hydrolase | — | 62 |
TE16 | Thioesterase I; Type I thioesterase; TE domain of FAS; TE domain of PKS or NRP | Acyl-ACP hydrolase | Long-chain acyl-ACP; Polyketides; Non-ribosomal peptides | 63-65 |
TE17 | TE domain of PKS | Acyl-ACP hydrolase | Polyketides | 64 |
TE18 | Thioesterase II; Type II thioesterase (TE II); tesA; rifR; OLAH; SAST | Acyl-ACP hydrolase | Medium-chain acyl-ACP; Polyketides; Non-ribosomal peptides | 66-70 |
TE19 | luxD | Acyl-ACP hydrolase | Myristoyl-ACP | 71 |
TE20 | ppt1; ppt2; Palmitoyl-protein thioesterase | Acyl-protein hydrolase | Palmitoyl-protein | 72-74 |
TE21 | apt1; apt2; Acyl-protein thioesterase; Phospholipase; Carboxylesterase | Acyl-protein hydrolase | Thioacylated proteins; Palmitoyl-protein | 75, 76 |
TE22 | S-formylglutathione hydrolase; Esterase A; Esterase D | Glutathione hydrolase | S-formylglutathione | 77 |
TE23 | Hydroxyacylglutathione hydrolase; Glyoxalase II | Glutathione hydrolase | d-Lactoylglutathione | 78, 79 |
TE24 | Fcot-like thioesterase; Type III thioesterase; CmiS1 | Acyl-CoA hydrolase | Palmitoyl-CoA; Stearoyl-CoA; Lauroyl-CoA; Hexanoyl-CoA | 80-82 |
TE25 | Fluoroacetyl-CoA thioesterase | Acyl-CoA hydrolase | Fluoroacetyl-CoA | 83, 84 |
TE26 | EAT1; ybfF | Acyl-CoA hydrolase | Acetyl-CoA; Palmitoyl-CoA; Malonyl-CoA | 85, 86 |
TE27 | ABHD10; Palmitoyl-protein thioesterase | Acyl-protein hydrolase | S-palmitoyl-protein | 87 |
TE28 | mpaH; Type I acyl-CoA thioesterase | Acyl-CoA hydrolase | Malonyl-CoA | 88, 89 |
TE29 | ABHD17A; ABHD17B; ABHD17C | Acyl-protein hydrolase | S-hexadecanoyl-l-cysteinyl | 90 |
TE30 | citA; lovG; mlcF; mpL1; afoC; mokD | Acyl-ACP hydrolase | Malonyl-ACP; Acetoacetyl-ACP | 91 |
TE31 | Them4; Them5 | Acyl-CoA hydrolase | Long-chain acyl-CoA | 92 |
TE32 | ACAA2; 3-Ketoacyl-CoA thiolase | Acyl-CoA hydrolase | 2-Aminobenzoylacetyl-CoA | 93, 94 |
TE33 | ATF1; Alcohol-O-acetyltransferase | Alcohol acetyltransferase | Acyl-CoA | 95, 96 |
TE34 | CLYBL; Citramalyl-CoA lyase; citE; RipC | Citramalyl-CoA lyase | Malyl-CoA | 97-100 |
TE35 | PLA2G6; Calcium-independent phospholipase A2 | Calcium-independent phospholipase | Long-chain fatty acyl-CoAs | 101, 102 |
- Abbreviations: ALT, acyl-lipid thioesterase; CoA, coenzyme A; MKS, methylketone synthase.
Enzymes in family TE24, assigned to EC 3.1.2.2, hydrolyze fatty acyl-CoA molecules with varying chain lengths (C4–C18) but usually prefer long-chain fatty acyl groups.80 TE24 members from Mycobacterium tuberculosis are involved in the synthesis of mycolic acids, which form a protective layer around this pathogen.81
Members of TE25, which include EC 3.1.2.29 among others, break down fluoroacetyl-CoA. This is suggested to be a key metabolic step in the resistance of Streptomyces cattleya to fluoroacetate, a well-known toxic substance produced by plants as a biodefense.83, 84
Family TE26 includes structurally characterized ybfF enzymes that hydrolyze palmitoyl-CoA and malonyl-CoA.85 TE26 also includes alcohol acetyltransferases, which could produce industrially relevant esters. An enzyme from the yeast Wickerhamomyces anomalus showed alcohol acetyltransferase (AATase) activity with ethanol and acetyl-CoA and released free CoA at high acetyl-CoA concentrations. Thioester hydrolysis is not the main function of the AATases in TE26. Nevertheless, the release of free CoA in the absence of ethanol was also reported, which confirms TE activity through acetyl-CoA hydrolysis.86
Enzymes in TE27 (EC 3.1.2.22), described as mammalian mitochondrial palmitoyl-protein TEs, include the α/β-hydrolase domain-containing protein 10 (ABHD10) enzymes. ABHD10 enzymes are related to S-palmitoylation, a reversible lipid posttranslational modification.87
Enzymes in TE28 include mpaH, which converts mycophenolyl-CoA into mycophenolic acid, a natural antibiotic produced in the peroxisome of Penicillium brevicompactum. These enzymes have a C-terminal cyclase/TE domain that catalyzes the cyclization and release of the polyketide.88, 89
Family TE29 (EC 3.1.2.22) includes acyl-protein thioesterases (APTs). APT enzymes are known to remove palmitate from cytosolic S-palmitoylated cysteine residues (S-hexadecanoyl-l-cysteinyl) in the Golgi complex of Homo sapiens.90
Enzymes in TE30 (EC 3.1.2.-) are involved in the biosynthesis of citrinin, a mycotoxin, in Penicillium and Monascus species. Multidomain PKSs participate in citrinin biosynthesis. Type I and type VII PKS enzymes have a TE domain (CitA) that hydrolyzes the ACP-tethered thioester bond, releasing free ACP and an aldehyde.91
Family TE31 (EC 3.1.2.2) contains TEs that break down long-chain acyl-CoA molecules. The released acyl chains are used to reacylate precursors of cardiolipin, a mitochondrial phospholipid found in H. sapiens and other mammals.92
Among enzymes in TE32 (EC 3.1.2.32), those from Pseudomonas aeruginosa hydrolyze 2-aminobenzoylacetyl-CoA to form 2-aminobenzoylacetate and CoA. This reaction is part of the cell density-dependent signaling system that controls the expression of virulence genes.93, 94
TE33 (EC 2.3.1.84 and EC 3.1.2.20) includes AATase enzymes, also known as alcohol-O-acyltransferases. In Saccharomyces cerevisiae, these enzymes hydrolyze thioesters, although TE activity is not their main function. They promote the esterification of isoamyl alcohol by acetyl-CoA. TE33 members prefer long- and straight-chain alcohol substrates over short- and branched-chain ones. They transfer the acyl group from an acyl-CoA donor to an acceptor alcohol, releasing acyl esters that can be used as flavoring agents in the food and beverage industry. Acetate ester products include ethyl acetate, isoamyl acetate, isobutyl acetate, butyl acetate, hexyl acetate, heptyl acetate, and octyl acetate.95, 96
Family TE34 includes citramalyl-CoA lyase (EC 2.3.3.9 or EC 3.1.2.30), a human mitochondrial enzyme involved in vitamin B12 metabolism. It is encoded by the polymorphic human gene CLYBL and converts malyl-CoA into malate and free CoA.97 TE34 also contains malyl-CoA lyases, which are structurally similar to CitE enzymes.103 One malyl-CoA lyase was described as a multifunctional enzyme that plays a role in autotrophic CO₂ fixation by Chloroflexus aurantiacus. These enzymes catalyze steps that generate (S)-malyl-CoA and β-methylmalyl-CoA in the 3-hydroxypropionate pathway.
Family TE35 (EC 3.1.1.4 and EC 3.1.2.2) includes enzymes encoded by the human PLA2G6 gene. These enzymes are also known as group VIA calcium-independent phospholipase A2 (iPLA2β). They hydrolyze sn-2 acyl chains, producing free fatty acids and lysophospholipids. Although it is not their main function, they can also hydrolyze the thioester bonds of saturated long-chain fatty acyl-CoAs.101, 102
Some enzymes with TE function were not classified into a family. These include human mitochondrial 3-ketoacyl-CoA thiolases, which act on short-, medium-, or long-chain substrates to release free CoA, with the highest rate observed for butyryl-CoA.104 The main function of thiolases is the condensation of acyl groups, not thioester hydrolysis. Ubiquitin carboxyl-terminal hydrolases105 were not classified into TE families because their main function is peptidase activity. They can be found in the MEROPS database.106
2.2 TE families and their structures, catalytic residues, and mechanisms
The tertiary structures in each TE family were superimposed to confirm structural similarity. In each family that underwent this analysis, the members are highly similar in tertiary structure: their cores are nearly identical and their overall resemblance is high. This structural similarity is shown by RMSDave values of ≤1.40 Å in all families and by Pave values of ≥77% in all families except TE16 (64.5%) (see Section 4 for definitions). Table 2 reports the structural fold of the enzymes in each family, as well as the RMSDave and Pave values for families with at least two distinct known tertiary structures. Table 3 lists the catalytic residues of the structures in each TE family, together with the corresponding literature. We predicted catalytic residues by tertiary structure superimposition: a residue was predicted to be catalytic if it spatially corresponds to a known catalytic residue in a superimposed structure. These predictions are also reported in Table 3. Figures 1 and 2 show how catalytic residues were predicted from structure superimposition and spatial correspondence for TEs with the HotDog fold (TE25) and the α/β-hydrolase fold (TE20), respectively. Enzymes in TE23 and TE32 have known tertiary structures, but their catalytic residues have not been proposed, so no predictions based on structural superimposition were made. The following families have no known tertiary structures: TE7, TE28, TE29, TE30, and TE33. Predicting catalytic residues was not necessary for TE13, TE14, TE17, TE18, TE19, TE24, TE26, and TE31 because the catalytic residues of every structure in these families have been reported in the literature (see Table 3). Within each of these families, the catalytic residues are well conserved between structures. The exceptions are TE19 and TE26, each of which has only a single known structure.
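The spatial-correspondence criterion described above can be illustrated with a minimal Python sketch. It is not the procedure used in this work. It assumes that the query and reference structures have already been superimposed, represents each residue only by a C-alpha coordinate, and uses an arbitrary 2.0 Å cutoff. The `Residue` class, the `predict_catalytic` function, the cutoff, and the coordinates are all hypothetical. The residue names echo the TE25 entries in Table 3 (3KUV and 2CWZ), but the coordinates are made up for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Residue:
    """Hypothetical minimal residue record: name, number, C-alpha position (Å)."""
    name: str
    number: int
    ca: tuple[float, float, float]


def predict_catalytic(known: list[Residue], query: list[Residue],
                      cutoff: float = 2.0) -> dict[str, str | None]:
    """Map each known catalytic residue to the closest query residue.

    Assumes both structures are already superimposed. A query residue is
    accepted only if its C-alpha lies within `cutoff` Å (an assumed value);
    otherwise the prediction for that position is None.
    """
    predictions: dict[str, str | None] = {}
    for ref in known:
        best, best_d = None, cutoff
        for res in query:
            d = math.dist(ref.ca, res.ca)  # Euclidean distance in Å
            if d <= best_d:  # keep the nearest residue seen so far
                best, best_d = res, d
        predictions[f"{ref.name}{ref.number}"] = (
            f"{best.name}{best.number}" if best is not None else None
        )
    return predictions


# Toy, made-up coordinates for three reference catalytic residues ...
known = [Residue("Thr", 42, (0.0, 0.0, 0.0)),
         Residue("Glu", 50, (5.0, 0.0, 0.0)),
         Residue("His", 76, (0.0, 6.0, 0.0))]
# ... and for residues of a superimposed query structure.
query = [Residue("Thr", 36, (0.3, 0.2, 0.0)),
         Residue("Leu", 37, (3.8, 0.0, 0.0)),
         Residue("Glu", 44, (5.1, -0.4, 0.2)),
         Residue("His", 70, (0.5, 6.2, 0.1))]

print(predict_catalytic(known, query))
# → {'Thr42': 'Thr36', 'Glu50': 'Glu44', 'His76': 'His70'}
```

In this toy case, Leu37 also falls within the cutoff of Glu50, but the closer Glu44 is chosen. The sketch matches residues on position alone, so it can return a residue that is not chemically similar to the reference, as the footnote for 3RD7 in Table 3 notes. Any prediction of this kind would therefore still need to be inspected.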
Family | Fold | RMSDave, Å | Pave, % | Structures in the PDB |
---|---|---|---|---|
TE1 | NagB | 0.92 | 95.7 | 2G39, 2NVV, 4EU3, 4EU4, 4EU5, 4EU6, 4EU7, 4EU8, 4EU9, 4EUA, 4EUB, 4EUC, 4EUD, 5DDK, 5DW4, 5DW5, 5DW6, 5E5H |
TE2 | α/β-Hydrolase | 0.86 | 94.6 | 3HLK, 3K2I |
TE3 | SGNH | 0.92 | 87.4 | 1IVN, 1J00, 1JRL, 1U8U, 1V2G, 3HP4, 4JGG, 5TIC, 5TID, 5TIE, 5TIF, 6IQ9, 6IQA, 6IQB, 6LFB, 6LFC, 7C23, 7C29, 7C2A, 7C82, 7C84 |
TE4 | HotDog | 1.09 | 81.6 | 1C8U, 1TBU, 3RD7, 3U0A, 4QFW, 4R4U, 4R9Z |
TE5 | HotDog | — | — | 1NJK |
TE6 | HotDog | 1.09 | 86.9 | 1YLI, 2EIS, 2G6S, 2Q2B, 2QQ2, 2V1O, 3B7K, 3BJK, 3D6L, 4IEN, 4MOB, 4MOC, 4ZV3, 5DM5, 5SZU, 5SZV, 5SZY, 5SZZ, 5T02, 5V3A, 4NCP, 5EGJ, 5EGK, 5EGL, 5HWF, 5HZ4, 6VFY, 7CZ3 |
TE7 | Putative HotDog | — | — | — |
TE8 | HotDog | 0.56 | 97.7 | 2CY9, 2F0X, 2H4U, 3F5O, 4ORD |
TE9 | HotDog | 0.48 | 96.7 | 1S5U, 2PZH, 5KL9, 5T06, 5T07 |
TE10 | HotDog | 1.01 | 94.2 | 1BVQ, 1LO7, 1LO8, 1LO9, 1Z54, 5WH9 |
TE11 | HotDog | 0.90 | 98.4 | 1Q4S, 1Q4T, 1Q4U, 1SBK, 1SC0, 1VH5, 1VH9, 1VI8, 2B6E, 3LZ7, 3R32, 3R34, 3R35, 3R36, 3R37, 3R3A, 3R3B, 3R3C, 3R3D, 3R3F, 3S4K, 3TEA, 4K02, 4K49, 4K4A, 4K4B, 4K4C, 4K4D, 4M20, 4QD7, 4QD8, 4QD9, 4QDA, 4QDB, 4YBV, 5EP5, 5HMB, 5HMC |
TE12 | HotDog | 0.92 | 88.3 | 2HX5, 4K00 |
TE13 | HotDog | 0.49 | 98.8 | 1J1Y, 1PSU, 1WLU, 1WLV, 1WM6, 1WN3, 2DSL, 2FS2 |
TE14 | HotDog | 1.36 | 81.3 | 2ESS, 2OWN, 4GAK, 5X04 |
TE15 | HotDog | 0.85 | 96.2 | 2W3X, 2XEM, 2XFL, 4I4J, 5VPJ |
TE16 | α/β-Hydrolase | 1.40 | 64.5 | 1JMK, 1XKT, 2CB9, 2CBG, 2K2Q, 2PX6, 3ILS, 3TJM, 4Z49, 4ZXH, 4ZXI, 5V3W, 5V3X, 5V3Y, 5V3Z, 5V40, 5V41, 5V42, 6OJC, 6OJD |
TE17 | α/β-Hydrolase | 1.23 | 79.2 | 1KEZ, 1MN6, 1MNA, 1MNQ, 1MO2, 2H7X, 2H7Y, 2HFJ, 2HFK, 3LCR, 5D3K, 5D3Z, 6MLK |
TE18 | α/β-Hydrolase | 1.16 | 77.0 | 3FLA, 3FLB, 3QMV, 3QMW, 4XJV, 5UGZ, 6BA8, 6BA9, 6FVJ, 6FW5, 6VAP |
TE19 | α/β-Hydrolase | — | — | 1THT |
TE20 | α/β-Hydrolase | 0.69 | 90.6 | 1EH5, 1EI9, 1EXW, 1PJA, 3GRO |
TE21 | α/β-Hydrolase | 1.03 | 85.6 | 1AUO, 1AUR, 1FJ2, 3CN7, 3CN9, 3U0V, 4F21, 4FHZ, 4FTW, 5KRE, 5SYM, 5SYN, 6AVV, 6AVW, 6AVX, 6AVY, 6BJE, 6QGN, 6QGO, 6QGQ, 6QGS |
TE22 | α/β-Hydrolase | 0.90 | 95.6 | 1PV1, 3C6B, 3E4D, 3FCX, 3I6Y, 3LS2, 3S8Y, 4B6G, 4FLM, 4FOL, 6JZL |
TE23 | Lactamase | 1.24 | 82.6 | 1QH3, 1QH5, 1XM8, 2Q42, 2QED, 3TP9, 4YSB, 6RZ0, 6S0I |
TE24 | HotDog | 0.85 | 93.1 | 2PFC, 3B18, 5WSX, 5WSY |
TE25 | HotDog | 0.71 | 99.5 | 2CWZ, 3KUV, 3KUW, 3KV7, 3KV8, 3KVI, 3KVU, 3KVZ, 3KW1, 3KX7, 3KX8, 3P2Q, 3P2R, 3P2S, 3P3F, 3P3I |
TE26 | α/β-Hydrolase | —ᵃ | — | 3BF7, 3BF8 |
TE27 | α/β-Hydrolase | 1.06 | 85.2 | 3LLC, 6NY9 |
TE28 | Putative α/β-Hydrolase | — | — | — |
TE29 | Putative α/β-Hydrolase | — | — | — |
TE30 | Putative α/β-Hydrolase | — | — | — |
TE31 | HotDog | 0.54 | 98.5 | 4AE7, 4AE8, 4GAH |
TE32 | Lactamase | 0.31 | 1.00 | 2Q0I, 2Q0J, 2VW8, 3DH8, 5HIO, 5HIP, 5HIQ, 5HIS |
TE33 | — | — | — | — |
TE34 | Beta-hairpin (C-terminal); TIM barrel (N-terminal) | 1.15 | 87.4 | 1SGJ, 1U5H, 1U5V, 1Z6K, 3QLL, 4L9Y, 4L9Z, 5UGR, 5VXC, 5VXO, 5VXS, 6AQ4 |
TE35 | — | — | — | 6AUN |
- Abbreviations: PDB, Protein Data Bank; RMSD, root-mean-square deviation.
- ᵃ RMSDave and Pave for TE26 were not calculated because the two PDB entries are of the same protein structure.
Family | Catalytic residues | Corresponding structure | Producing organism | Reference |
---|---|---|---|---|
TE1 | Val270, Glu294, Asn347, Gly388 | 4EU3, 4EU4, 4EU5, 4EU6, 4EU7, 4EU8, 4EU9, 4EUA, 4EUB, 4EUC, 4EUD | Acetobacter aceti | 107 |
| | Val270, Glu294, Asn347, Gly388 | 5DDK, 5DW4, 5DW5, 5DW6, 5E5H | A. aceti | 108 |
| | Val259, Glu284, Asn337, Gly378 | 2NVV | Porphyromonas gingivalis | Predicted in this work |
| | Ile264, Glu288, Asn341, Gly382 | 2G39 | Pseudomonas aeruginosa | Predicted in this work |
TE2 | Ser294, His422, Asp388 | 3HLK | Homo sapiens | 109 |
| | Ser232, His360, Asp326 | 3K2I | H. sapiens | Predicted in this work |
TE3 | Ser10, Asp154, His157 | 1IVN, 1JRL, 1J00, 1U8U, 1V2G | Escherichia coli | 110 |
| | Ser11, Asp158, His161 | 3HP4 | Pseudoalteromonas sp. | 111 |
| | Ser9, Asp156, His159 | 4JGG | P. aeruginosa | 112 |
| | Ser10, Asp154, His157 | 5TIC, 5TID, 5TIE, 5TIF | E. coli | 113 |
| | Ser10, Asp154, His157 | 6LFB, 6LFC | E. coli | Predicted in this work |
| | Ser29, Asp178, His181 | 7C23, 7C29, 7C2A, 7C82, 7C84 | Croceicoccus marinus | 114 |
| | Ser13, Asp162, His165 | 6IQ9, 6IQA, 6IQB | Altericroceibacterium indicum | Predicted in this work |
TE4 | Asp204, Thr228, Gln278 | 1C8U | E. coli | 115 |
| | Asp194, Ser216, Gln266 | 3U0A | Mycobacterium marinum M | Predicted in this work |
| | Asp204, Thr228, Gln278 | 4QFW, 4R4U | Yersinia pestis | 31 |
| | — | 1TBU | Saccharomyces cerevisiae | — |
| | Ala202, Leu225, Gln275ᵃ | 3RD7 | Mycobacterium avium 104 | Predicted in this work |
| | Ala197, Gln216, Gln266ᵃ | 4R9Z | M. avium subsp. paratuberculosis K-10 | Predicted in this work |
TE5 | — | 1NJK | E. coli | — |
TE6 | Asp213 | 2Q2B | Mus musculus | 116 |
| | Asn24 | 2V1O | M. musculus | 116 |
| | Asp44 | 1YLI, 3BJK | Haemophilus influenzae Rd KW20 | 117 |
| | Asp34 | 3D6L | Campylobacter jejuni | 46 |
| | Asp36, Asn195 | 3B7K, 4MOB, 4MOC | H. sapiens | Predicted in this work |
| | Asp245 | 2QQ2 | H. sapiens | Predicted in this work |
| | Asp46 | 5DM5 | Yersinia pestis | Predicted in this work |
| | Asp31 | 2EIS | Thermus thermophilus | Predicted in this work |
| | Asn70, Asp259 | 4ZV3, 6VFY | M. musculus | Predicted in this work |
| | Asn24, Asp39 | 4IEN, 5SZU, 5SZV, 5SZY, 5SZZ, 5T02, 5V3A | Neisseria meningitidis | 22 |
| | Asn28, Asp43, Thr60 | 4NCP, 5EGJ, 5EGK, 5EGL, 5HWF, 5HZ4 | Staphylococcus aureus subsp. aureus Mu50 | 118 |
| | Asn23, Asp38 | 7CZ3 | Bacillus cereus ATCC 14579 | 119 |
TE7 | — | — | — | — |
TE8 | Asn50, His56, Gly57, Asp65 | 2F0X, 3F5O, 2H4U | H. sapiens | 120, 121 |
| | Asn50, His56, Gly57, Asp65 | 2CY9 | M. musculus | Predicted in this work |
| | Asn51, His57, Gly58, Asp66 | 4ORD | Danio rerio | Predicted in this work |
| | Asp65, Ser83, His134 | Simulationᵇ | H. sapiens | 122 |
TE9 | Tyr7, Asp11, His18 | 2PZH | Helicobacter pylori | 53 |
| | Tyr14, Asp18, His25 | 1S5U, 5KL9, 5T06, 5T07 | E. coli | Predicted in this work |
TE10 | Asp17 | 1BVQ, 1LO7, 1LO8, 1LO9 | Pseudomonas sp. | 123 |
| | Asp16 | 5WH9 | Alkalihalobacillus halodurans C-125 | Predicted in this work |
TE11 | Gly65, Glu73 | 1Q4S, 1Q4T, 1Q4U | Arthrobacter sp. | 124 |
| | Gly55, Glu63 | 1VH9, 1VH5, 1VI8, 1SBK | E. coli | Predicted in this work |
| | Gly55, Glu63 | 2B6E, 1SC0, 3LZ7 | Haemophilus influenzae | Predicted in this work |
| | Gly39, Glu47 | 4M20, 4YBV, 5EP5 | Staphylococcus aureus subsp. aureus Mu50 | Predicted in this work |
| | Gly65, Ala73 | 3R32, 3R34, 3R35, 3R36, 3R37, 3R3A, 3R3B, 3R3C, 3R3D, 3R3F, 3TEA | Arthrobacter sp. | Predicted in this work |
| | Gly52, Glu60 | 3S4K | Mycobacterium tuberculosis | Predicted in this work |
| | Gly55, Glu63 | 4K49, 4K4A, 4K4B, 4K4C, 4K4D | E. coli K-12 | 125 |
| | Gly56, Glu64 | 4QD7, 4QD8, 4QD9, 4QDA, 4QDB | P. aeruginosa | Predicted in this work |
| | Gly49, Glu57 | 5HMB, 5HMC | Streptomyces sahachiroi | Predicted in this work |
| | Gly49, Glu57 | 4K02 | Arabidopsis thaliana | 35 |
TE12 | Asp16 | 2HX5 | Prochlorococcus marinus | Predicted in this work |
| | Asp16 | 4K00 | Synechocystis sp. PCC 6803 substr. Kazusa | 35 |
TE13 | Gly40, Asp48 | 1WLU, 1J1Y, 1WM6, 1WLV, 1WN3, 2DSLᶜ | Thermus thermophilus | 126 |
| | Gly53, Asp61 | 2FS2, 1PSU | E. coli | 127 |
TE14 | Asp281, Asn283, His285, Glu319 | 2ESS | Bacteroides thetaiotaomicron VPI-5482 | 128 |
| | Asp281, Asn283, His285, Glu319 | 2OWN | Lactiplantibacillus plantarum | 128 |
| | Asp281, Asn283, His285, Glu319 | 4GAK | Spirosoma linguale DSM 74 | 128 |
| | Asp281, Asn283, His285, Glu319 | 5X04 | Umbellularia californica | 128 |
TE15 | Asn19, Tyr29, Arg37 | 2W3X | Micromonospora echinospora | 62 |
| | Asn23, Tyr33, Arg41 | 2XEM, 2XFL | Micromonospora chersina | Predicted in this work |
| | Asn21, Tyr31, Arg39 | 4I4J | Streptomyces globisporus | Predicted in this work |
| | Asn17, Tyr27, Arg35 | 5VPJ | Actinomadura verrucosospora | Predicted in this work |
TE16 | Ser2308, Asp2338, His2481 | 1XKT, 2PX6, 3TJM, 4Z49 | H. sapiens | 129 |
| | Ser80, Asp107, His207 | 1JMK | Bacillus subtilis | 130 |
| | Ser84, Asp111, His201 | 2CB9, 2CBG | B. subtilis | 131 |
| | Ser1937, Asp1964, His2088 | 3ILS | Aspergillus parasiticus | 132 |
| | Cys1135, Asp1162, His1295 | 4ZXH, 4ZXI | Acinetobacter baumannii AB307-0294 | Predicted in this work |
| | Ser1533, Asp1560, His1699 | 5V3W, 5V3X, 5V3Y, 5V3Z, 5V40, 5V41, 5V42 | M. tuberculosis | 133 |
| | Ser1790, Asp1806, His1901 | 6OJC, 6OJD | Nocardia uniformis subsp. tsuyamanensis | 134 |
TE17 | Ser142, Asp169, His259 | 1KEZ, 1MO2, 5D3K, 5D3Z, 6MLK | Saccharopolyspora erythraea | 135 |
| | Ser148, Asp176, His268 | 1MN6, 1MNA, 1MNQ, 2H7X, 2H7Y, 2HFJ, 2HFK | Streptomyces venezuelae | 136 |
| | Ser132, Asp159, His255 | 3LCR | Streptomyces sp. CK4412 | 137 |
TE18 | Ser86, Asp189, His216 | 2K2Q, 2RON | Brevibacillus parabrevis, B. subtilis | 138 |
| | Ser94, Asp200, His228 | 3FLA, 3FLB | Amycolatopsis mediterranei | 67 |
| | Ser107, Asp213, His241 | 3QMV, 3QMW | Streptomyces coelicolor | 139 |
| | Ser101, Asp212, His237 | 4XJV | H. sapiens | 140 |
| | Ser78, Asp186, His215 | 5UGZ | E. coli | 141 |
| | Ser89, Asp197, His225 | 6BA8, 6BA9 | E. coli | 142 |
| | Ser104, Asp208, His236 | 6FVJ, 6FW5 | M. tuberculosis | 66 |
| | Ser98, Asp204, His232 | 6VAP | Streptomyces sp. WAC02707 | 28 |
TE19 | Ser114, Asp211, His241 | 1THT | Vibrio harveyi | 143 |
TE20 | Ser115, Asp233, His289 | 1EH5, 1EI9, 1EXW | Bos taurus | 144 |
| | Ser111, Asp228, His283 | 1PJA, 3GRO | H. sapiens | Predicted in this work |
TE21 | Ser114, Asp168, His199 | 1AUO, 1AUR | Pseudomonas fluorescens | 145 |
| | Ser114, Asp169, His203 | 1FJ2 | H. sapiens | 146 |
| | Ser113, Asp166, His197 | 3CN7, 3CN9 | P. aeruginosa | 147 |
| | Ser124, Asp179, Glu212 | 3U0V | H. sapiens | Predicted in this work |
| | Ser116, Asp170, His202 | 4F21 | Francisella tularensis subsp. tularensis SCHU S4 | 148 |
| | Ser165, Asp216, His248 | 4FHZ, 4FTW | Cereibacter sphaeroides | Predicted in this work |
| | Ser119, Asp174, His209 | 5SYM | H. sapiens | 149 |
| | Ser122, Asp176, His210 | 5SYN | H. sapiens | |
| | Ser106, Asp160, His192 | 6AVV, 6AVW, 6AVX | A. thaliana | Predicted in this work |
| | Ser126, Asp197, His230 | 6AVY | Zea mays | Predicted in this work |
| | Ser122, Asp176, His210 | 6BJE | H. sapiens | 150 |
| | Ser119, Asp174, His208 | 6QGN, 6QGO, 6QGQ, 6QGS | H. sapiens | Predicted in this work |
TE22 | Ser161, Asp241, His276 | 1PV1, 3C6B | S. cerevisiae | 151 |
| | Ser147, Asp223, His256 | 3E4D | Agrobacterium fabrum str. C58 | 152 |
| | Ser153, Asp230, His264 | 3FCX | H. sapiens | 153 |
| | Ser148, Asp224, His257 | 3I6Y, 3S8Y | Oleispira antarctica | 154 |
| | Ser147, Asp225, His258 | 3LS2 | Pseudoalteromonas translucida TAC125 | 155 |
| | Ser145, Asp221, His254 | 4B6G | N. meningitidis MC58 | 156 |
| | Ser161, Asp241, His276 | 4FLM, 4FOL | S. cerevisiae | Predicted in this work |
| | Ser148, Asp224, His257 | 6JZL | Shewanella frigidimarina | 157 |
TE23 | — | —ᶜ | — | — |
TE24 | Asn83, Tyr87, Tyr33, Met118 (subunit A); Tyr66, Thr70, His72, Asn74 (subunit B) | 2PFC, 3B18 | M. tuberculosis | 80 |
| | Tyr53, Ile54, His59, Asn61, Ser62 (subunit A); Tyr20, Asn70, Met73, Tyr74, Ile107 (subunit B) | 5WSX, 5WSY | Streptomyces avermitilis MA-4680 = NBRC 14893 | 82 |
TE25 | Thr42, Glu50, His76 | 3KUV, 3KUW, 3KV7, 3KV8, 3KVI, 3KVU, 3KVZ, 3KW1, 3KX7, 3KX8 | Streptomyces cattleya | 158 |
| | Thr36, Glu44, His70 | 2CWZ | T. thermophilus HB8 | Predicted in this work |
| | Thr42, Glu50, His76 | 3P2Q, 3P2R, 3P2S, 3P3F, 3P3I | S. cattleya | 84 |
TE26 | Ser89, Asp113, Ser206, His234 | 3BF7, 3BF8 | E. coli | 85 |
TE27 | Ser100, Asp197, His227 | 6NY9 | M. musculus | 87 |
| | Ser113, Asp216, His246 | 3LLC | Agrobacterium vitis S4 | Predicted in this work |
TE28 | — | — | — | — |
TE29 | — | — | — | — |
TE30 | — | — | — | — |
TE31 | Thr308, Ser473 | 4AE7, 4AE8, 4GAH | H. sapiens | 92 |
TE32 | — | —ᶜ | — | — |
TE33 | — | — | — | — |
TE34 | Asp320 | 5VXS, 5VXC, 5VXO | H. sapiens | 97 |
| | — | 1SGJ | Deinococcus radiodurans | — |
| | — | 1U5H, 1U5V, 1Z6K | M. tuberculosis | — |
| | Glu49 | 6AQ4 | M. tuberculosis | 99 |
| | — | 3QLL | Yersinia pestis | — |
| | Asp299 | 4L9Y, 4L9Z | C. sphaeroides 2.4.1 | Predicted in this work |
| | Asp304 | 5UGR | Methylorubrum extorquens AM1 | Predicted in this work |
TE35 | Ser465, Asp598 | 6AUN | Cricetulus griseus | 159 |
- ᵃ The catalytic residues for 3RD7 were predicted purely from their close spatial correspondence with the catalytic residues of 1C8U and 4QFW. These residues are not chemically similar to those of 1C8U and 4QFW.
- ᵇ Predicted from mixed quantum mechanics/molecular mechanics simulations based on the 3F5O crystal structure.
- ᶜ Although structures are known, catalytic residues have not been determined, so none are predicted.


2.2.1 HotDog catalytic residues and mechanisms
Families with the HotDog fold160, 161 (TE4–TE15, TE24, TE25, and TE31) have highly similar tertiary structures, as indicated by their consistently low RMSDave and high Pave values.
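As a rough illustration of what a low RMSD value measures, the following Python sketch computes the root-mean-square deviation between two lists of paired, already-superimposed atom positions. It then averages that value over all pairs of structures in a family. This is only a sketch under stated assumptions. The exact definitions of RMSDave and Pave used in this work are given in Section 4 and may differ, for instance in how residues are paired and which atoms are compared. Superposition and residue pairing are assumed to have been done beforehand. The names `rmsd` and `rmsd_ave` and the toy coordinates are hypothetical.

```python
from __future__ import annotations

import itertools
import math
from typing import Tuple

Point = Tuple[float, float, float]  # an (x, y, z) coordinate in Å


def rmsd(a: list[Point], b: list[Point]) -> float:
    """RMSD (Å) between two equal-length lists of paired, superimposed atoms."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of paired atoms")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(a, b))
    return math.sqrt(squared / len(a))


def rmsd_ave(structures: dict[str, list[Point]]) -> float:
    """Mean RMSD over all unordered pairs of structures.

    This averaging is an assumed definition for illustration only. It needs
    at least two structures, so a family with one structure gets no value.
    """
    pairs = list(itertools.combinations(structures.values(), 2))
    if not pairs:
        raise ValueError("need at least two structures")
    return sum(rmsd(a, b) for a, b in pairs) / len(pairs)


# Toy coordinates: in "B", the third atom is displaced by 2 Å; "C" equals "A".
family = {
    "A": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    "B": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0), (3.0, 0.0, 0.0)],
    "C": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
}
print(round(rmsd(family["A"], family["B"]), 3))  # → 1.0
print(round(rmsd_ave(family), 3))                # → 0.667
```

A single displaced atom among four gives an RMSD of 1.0 Å here. This shows how a local difference is diluted across the whole structure, which is one reason the cores of family members can be nearly identical even when the RMSD is not zero.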
HotDog-fold enzymes lack defined non-solvated binding pockets and conserved catalytic residues,45 thus a variety of catalytic residues and mechanisms exist.
In TE4, Mycobacterium marinum TesB2 (3U0A) catalytic residues were predicted to be Asp194-Ser216-Gln266, based on comparison to an Escherichia coli TE II enzyme (1C8U) in which Asp204-Gln278-Thr228 orient a water molecule for nucleophilic attack on the substrate.115 This is consistent with the catalytic residues found in Y. pestis TesB (4QFW, 4R4U), a structure with an octameric quaternary structure that is unique among HotDog families.31 A S. cerevisiae TE I structure (1TBU) contains only residues from the N-terminal domain, which does not include the residues that could be compared to the catalytic triad. Catalytic residues for the remaining family members were predicted (see Table 3). Of note among these predictions are Mycobacterium avium MAV2540 (3RD7) and MAP1729c (4R9Z); these inactive TesB enzymes carry a mutation in which the highly conserved Asp residue is replaced by an Ala residue. Within TesB TEs, this mutation appears to be unique to Mycobacterium species.32
In TE6, Mus musculus Acot7 N-terminal domain (2V1O) and C-terminal domain (2Q2B) catalytic residues are reported as Asn24 and Asp213, respectively.116 The structures for human Acot12 (3B7K, 4MOB, 4MOC) and M. musculus Acot7 (4ZV3, 6VFY) contain both N- and C-terminal domains. Our alignment placed both 2V1O and 2Q2B over the C-terminal domain of these structures, confirming the catalytic residues in the C-terminal domain. Using this molecular symmetry, the N-terminal catalytic residues were predicted as well. This agrees with the literature, which indicates that these structures form a functioning active site when joined as a dimer.33 A study on N. meningitidis TE 12 (5SZU) supported these findings, pointing to a covalent disulfide-bond dimer linkage that is required for enzymatic activity.22 The Asn-Asp catalytic motif is highly consistent in this family, as recently supported by findings on a Bacillus cereus TE (7CZ3).119 Unique within the family is a S. aureus TE (4NCP) that also relies on a Thr residue for catalysis.118 Also in TE6, YciA structures have aspartic acid catalytic residues in the same structural position as those in Campylobacter jejuni Cj0915 (3D6L) and Haemophilus influenzae Rd KW20 HI0827 (1YLI, 3BJK).46, 117
Although TE7 has no known crystal structures, sequence comparison with other ACOT enzymes suggests that Asp120 and Asn305 are catalytic residues in the mouse ACOT9 enzyme.50
It was proposed for TE8 enzymes, based on the crystal structure of a human Them2 enzyme, that Gly57 and Asn50 bind and polarize the thioester carbonyl group while Asp65 and Ser85 orient and activate the water nucleophile.120, 121 It was later proposed, based on mixed quantum mechanics/molecular mechanics simulations of the same human enzyme, that a His-Ser pair acts as the proton donor (general acid) in a concerted mechanism in which the Asp residue activates the water molecule.122 Based on superimposition with the crystal structure of human Them2, the structures for M. musculus Acot13 (2CY9) and Danio rerio Acot13 (4ORD) are predicted to have the same Asn50, His56, Gly57, Asp65 catalytic structure.120 The positions of these catalytic residues appear to be very highly conserved in this family: they are identical in 2CY9 and 2F0X and are shifted by only one position in 4ORD (e.g., Asp65 to Asp66).
In TE9, an E. coli enzyme (1S5U) is predicted to have catalytic residues Tyr14-Asp18-His25, based on their strong spatial correspondence with the catalytic residues (Tyr7-Asp11-His18) of a Helicobacter pylori enzyme (2PZH) in the superimposed structures.53
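Many predictions in this section rest on the same idea: once two structures are superimposed, a residue in the target that occupies nearly the same position as a known catalytic residue is taken as its counterpart. The sketch below illustrates that idea under stated assumptions: each residue is reduced to one representative point (for example, its Cα or side-chain centroid), the structures are already superimposed, and the 2.0 Å cutoff and all coordinates are hypothetical placeholders, not values from 2PZH, 1S5U, or this work.

```python
import math
from typing import Dict, Optional, Tuple

Coord = Tuple[float, float, float]


def nearest_residue(ref_xyz: Coord, candidates: Dict[str, Coord], cutoff: float) -> Optional[str]:
    """Return the candidate residue closest to ref_xyz, or None if none lies within cutoff (in Å)."""
    best_label: Optional[str] = None
    best_dist = cutoff
    for label, xyz in candidates.items():
        d = math.dist(ref_xyz, xyz)
        if d <= best_dist:
            best_label, best_dist = label, d
    return best_label


def map_catalytic_residues(
    reference: Dict[str, Coord], target: Dict[str, Coord], cutoff: float = 2.0
) -> Dict[str, Optional[str]]:
    """Map each known catalytic residue onto the spatially closest residue of a superimposed target."""
    return {ref: nearest_residue(xyz, target, cutoff) for ref, xyz in reference.items()}


# Hypothetical, already-superimposed coordinates (illustrative only).
reference = {"Tyr7": (0.0, 0.0, 0.0), "Asp11": (5.0, 0.0, 0.0), "His18": (0.0, 5.0, 0.0)}
target = {
    "Tyr14": (0.4, 0.1, 0.0),
    "Asp18": (5.2, -0.3, 0.1),
    "His25": (0.1, 4.8, 0.3),
    "Gly30": (10.0, 10.0, 10.0),
}
print(map_catalytic_residues(reference, target))
# {'Tyr7': 'Tyr14', 'Asp11': 'Asp18', 'His18': 'His25'}
```

A spatial match alone does not guarantee chemical equivalence (compare the table footnote on 3RD7), so a prediction of this kind is still worth checking against residue identity.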
It was proposed for TE10 4-hydroxybenzoyl-CoA TEs (1LO7, 1LO8, 1LO9) that a helix dipole moment makes the thioester carbonyl group more susceptible to nucleophilic attack by Asp17.123 We predict that Asp16 in an Alkalihalobacillus halodurans enzyme (5WH9) is catalytic, based on its correspondence to the Asp17 residue of a Pseudomonas TE (1BVQ).56
TE11 TEs in Arthrobacter (1Q4S, 1Q4T, 1Q4U), E. coli K-12 (4K49), and A. thaliana At2g48320 (4K02) all have nearly identically positioned glycine and glutamic acid catalytic residues.35, 124 The crystal structures of other members of this family spatially align well, and are predicted to have the same Gly-Glu catalytic residues (Table 3). Members of TE11 may also act as chain elongation and cyclization domains in certain synthetic pathways.34
TE12 enzymes from Synechocystis (4K00) and Prochlorococcus (2HX5) bacteria have been crystallized. In 4K00, Asp16 was proposed to act as a nucleophile, although it may instead act as a base that activates a water molecule to attack the thioester. The thioester oxygen atom could be stabilized by the amide hydrogen of Phe23. Also, Pro57, which is positioned above the substrate moiety, may contribute to substrate specificity.35
From the structures 1WLU, 1J1Y, 1WM6, 1WLV, and 1WN3, a study proposed that the TE13 Thermus thermophilus PaaI TE hydrolyzes substrates with an Asp48-activated water nucleophile.126 Based on comparison of an E. coli PaaI structure (2FS2) with the Arthrobacter TE11 structures, together with site-directed mutagenesis, a mechanism similar to that of TE11 was proposed: Gly53 prepares the thioester for nucleophilic attack by Asp61.127
TE14, which has many bacterial sequences that are less characterized than their plant counterparts, has a surprising breadth of substrate specificity.60 In TE14, a site-directed mutagenesis study on a FatB enzyme from A. thaliana pointed to a Cys264, His229, and Asn227 papain-like catalytic triad.162 Another site-directed mutagenesis study on a FatB enzyme, from Umbellularia californica, proposed a catalytic network of Asp281, Asn283, His285, and Glu319.128 More recently, structural predictions and site-directed mutagenesis identified the catalytic residues of the C. viscosissima acyl-ACP TE.16
In TE15, a mechanism based on the CalE7 enzyme (2W3X), which has no acidic residues in the catalytic region, was proposed: Asn19 and Arg37 hold the substrate while a water molecule or hydroxide anion acts as a nucleophile, and Tyr29 assists in decarboxylation.62 Asn, Arg, and Tyr residues in a Micromonospora chersina tebC (2XEM, 2XFL), as well as in Streptomyces globisporus (4I4J) and Actinomadura verrucosospora (5VPJ) TEs, are predicted to be catalytic based on spatial correspondence with the superimposed M. echinospora structure (2W3X).
The crystal structure for TE24 is represented by Protein Data Bank (PDB) 2PFC and 3B18. The quaternary structure is formed by three dimers and has a long and narrow substrate-binding site. The catalytic site is formed by Asn83, Tyr87, Tyr33, and Met118 for subunit A and Tyr66, Thr70, His72, and Asn74 for subunit B.80 Notably, the active site lacks acidic residues common to HotDog TEs, which is also observed in a TE24 Streptomyces enzyme.82
In TE25, a T. thermophilus TE (2CWZ) is predicted to have Thr36, Glu44, and His70 as catalytic residues (see Figure 1), based on spatial superimposition with the catalytic residues of Streptomyces cattleya FlK (3KUV).158 The specificity of FlK for fluorine-containing compounds could arise from substrate binding in a hydrophobic pocket formed by a helical lid structure (side chains of Val46 and Val54), as well as by Val23, Leu26, Phe33, and Phe36 in S. cattleya FlK.84
Family TE31 has Them4 and Them5 isoforms, which have been crystallized and are represented by the 4AE8 and 4AE7 structures, respectively, each forming a homodimer unit. Their structures consist of a long central α-helix surrounded by a six-stranded, curved, antiparallel β-sheet. Both isoforms have two active sites per homodimer at the end of each HotDog helix: His152, Gly153, Gly154/His158, Gly159, Gly160 (active site one), and Asp161, Thr177/Asp167, Thr183 (active site two).92
2.2.2 α/β hydrolase catalytic residues and mechanisms
The α/β-hydrolase fold,163 found in TE2, TE16 to TE22, and TE26 to TE28, shows higher variation in RMSDave and Pave values than the HotDog fold. Most α/β-hydrolase fold proteins, not only TEs, are cataloged in the ESTHER database.164 Two families, TE29 and TE30, are likely to have α/β-hydrolase-like folds based on sequence similarity; however, no structures are available to confirm this. α/β hydrolases have conserved catalytic residues: a nucleophile–histidine–acid triad.163 Serine, cysteine, or aspartate can act as the nucleophile. There is large variation in fold architecture and binding sites among α/β hydrolases.165 In their catalytic mechanism, the acid stabilizes the histidine, which acts as a base by accepting a proton from the nucleophile; the activated nucleophile then attacks the substrate to form a covalent intermediate, which is in turn attacked by water. In PKSs or NRPSs that make cyclic products, for example, in erythromycin biosynthesis,166 a hydroxyl group from the substrate chain is used instead of a water molecule. Different cyclization mechanisms lead to a wide variety of PKS or NRP products.167
The structure of TE2 is represented by 3HLK, from human ACOT2, and 3K2I, from human ACOT4. These structures are somewhat unusual for this fold: in the primary structure of these enzymes the Asp residue precedes the His residue, whereas in all other α/β hydrolase TEs the His residue precedes the Asp residue.109 The catalytic residues of 3K2I (Table 3) are predicted based on alignment with 3HLK.
In TE16, most structures show a consistent Ser-Asp-His catalytic triad: seen in the human FAS TE domain,129, 168-170 the TE domain in Bacillus NRPSs surfactin and fengycin synthetases,130, 131 the TE domain of the Aspergillus aflatoxin PKS,132 the TE domain of Mycobacterium PKSs involved in making mycolic acids,133 and in the TE domain of NocB enzyme in Nocardia.134 However, based on structural superimposition with TE16 structures with identified catalytic residues, we predict that the TE domain of an Acinetobacter baumannii NRPS enzyme (4ZXH, 4ZXI)171 has a Cys-Asp-His catalytic triad (Table 3).
TE17 comprises the TE domains of macrocycle-forming PKSs, such as 6-deoxyerythronolide B synthase from S. erythraea,135, 136, 172, 173 picromycin synthase from S. venezuelae,136, 174, 175 and tautomycin synthase.137 They all show a consistent Ser-Asp-His catalytic triad.
Members of TE18 with crystal structures are type II TEs, a class of enzymes responsible for a variety of functions, primarily the maintenance of biosynthetic pathways through release of undesired intermediates from carrier protein domains.28, 66, 67, 139-142, 176 A lid-flip conformational change is present in these enzymes, and the Ser-Asp-His catalytic triad is conserved. This can be seen in the type II TEs associated with surfactin synthase from Bacillus subtilis,177 the rifamycin biosynthetic cluster from A. mediterranei,67 the borrelidin biosynthetic cluster from Streptomyces,28 and the prodiginine biosynthetic pathway in Streptomyces coelicolor,139 as well as in the ClbQ and YbtT enzymes in E. coli.141, 142 This also holds true for a human TE II and for a TesA from M. tuberculosis.66, 140
In family TE19, a single structure is known, that of a Vibrio harveyi TE, which also has the Ser-Asp-His catalytic triad.143
Families TE20, TE21, and TE22 all share the characteristic Ser-Asp-His catalytic triad. Comparison of tertiary structures within each family leads us to predict that this Ser-Asp-His catalytic triad is consistent for all structures (see Table 3 and Figure 2).
TE21 includes mainly eukaryotic acyl-protein hydrolases, as well as enzymes with other functions. The carboxylesterase from P. fluorescens has very little activity on triacylglycerides with fatty acids longer than four carbons, likely because loops constrain the active-site cleft.145 A closely related human enzyme, hAPT1, originally thought to be a lysophospholipase, has been shown to have stronger TE activity than lysophospholipase activity.146 Another APT, from Francisella tularensis, has a substrate specificity profile similar to both of the aforementioned enzymes, though, unlike the P. fluorescens enzyme, it lacks a lid domain.148 This was confirmed by another study that examined the mechanism of isoform-selective inhibitors on human APT1.149 The carboxylesterase from P. aeruginosa was shown to have no activity on triacylglycerols and a preference for eight-carbon acyl substrates. The human lysophospholipase A2 is a cytosolic serine hydrolase partially responsible for lysophospholipid metabolism.150 All of these structures follow the Ser-Asp-His catalytic motif.
Members of TE22 are involved in glutathione-dependent formaldehyde detoxification, and many of the crystal structures in this family are of S-formylglutathione hydrolase (SFGH) enzymes. These have been studied in a variety of species: S. cerevisiae,151 Agrobacterium fabrum str. C58,152 P. translucida TAC125,155 Shewanella frigidimarina,157 and N. meningitidis MC58.156 Other functions are present in this family as well: (a) a human esterase has been studied because it is relevant to retinoblastoma,153 and (b) an oil-degrading bacterium, O. antarctica, expresses an enzyme with carboxylesterase and TE activity.154 TE22 enzymes have the characteristic Ser-Asp-His catalytic triad. Based on this, the catalytic structure of a S. cerevisiae SFGH (4FLM) is predicted as Ser161-Asp241-His276 (Table 3).
A study on the only crystal structures found for TE26, those of ybfF from E. coli (3BF7, 3BF8), suggests that this family is unique among the α/β hydrolase TEs: rather than the typical Ser-Asp-His catalytic triad, it seems to have a Ser89-Asp113-Ser206-His234 catalytic tetrad. The α/β hydrolase domain of these structures aligns well with other canonical α/β hydrolases. However, the Asp113 residue, which would normally lie above or parallel to the His234 imidazole ring, is located in the lower section of the His imidazole ring. The expected position of the Asp113 residue is instead occupied by Ser206, which is well conserved in the ybfF enzymes.85
The structure of TE27 enzymes is described by a M. musculus ABHD10, which shows a Ser-His-Asp catalytic triad. The location of the catalytic serine residue suggests a hydrophobic interaction between the lipid substrate and the interior surface of the protein. A “cap domain” above the catalytic triad forms a binding pocket and affects substrate accessibility.87 We predict that Ser113-Asp216-His246 is the catalytic triad in an A. vitis enzyme based on comparison to the M. musculus ABHD10 enzyme.87
Families TE28 and TE29 have no crystal structures. TE28 shows sequence similarity to a putative α/β hydrolase fold enzyme, and its structure and mechanism remain unknown despite a close relationship with FASs.88 TE29 may also have an α/β hydrolase fold, as predicted for the gene ABHD17C.89
The structure of a CitA enzyme in TE30, predicted by homology from co-expression with the PKS gene, suggests Ser122-His235-Asp207 as the catalytic triad.91
2.2.3 Catalytic residues and mechanisms in other folds
TEs are also found in the NagB (TE1) and SGNH (TE3) folds.111-114, 110 In TE1, which also includes acyl-CoA transferases, we predict that the catalytic residues of a putative acetyl-CoA hydrolase from Porphyromonas gingivalis (2NVV) and a CoA transferase from P. aeruginosa (2G39) are Val259-Glu284-Asn337-Gly378 and Ile264-Glu288-Asn341-Gly382, respectively, based on those known from A. aceti AarCH6 structures (4EU3, 5DDK).107, 108
In TE3, comparison to available structures, E. coli tesA (e.g., 1IVN, 1JRL)110 and Pseudoalteromonas estA (3HP4),111 reveals that the likely catalytic residues for an E. coli TE (6LFB, 6LFC) and the A. indicum AlinE4 esterase (6IQ9, 6IQA, 6IQB) are Ser10-Asp154-His157 and Ser13-Asp162-His165, respectively. TesA enzymes were found to have a Ser-His-Asp catalytic triad similar to those in α/β hydrolases.110 The crystal structure of TesA from E. coli was found to be particularly compact and rigid, which likely pushes its substrate specificity toward shorter chain lengths.112 It has also proved to be a useful candidate for engineering TEs to produce free fatty acids of specific lengths.113 Other SGNH-fold TEs, CrmE10 and AlinE4, were similarly amenable to engineering for increased enzymatic activity.114
Two families have the β-lactamase fold: TE23 and TE32. The structures in TE23 are significantly less well conserved than those in TE32. TE23 hydroxyacylglutathione hydrolases, which include glyoxalase II enzymes, have a metallo-β-lactamase fold, and their mechanisms are very different from those of the rest of the TEs, which do not have catalytic metal ions. Crystal structures of human glyoxalase II (1QH3, 1QH5) reveal two zinc ions with octahedral coordination, interacting with His and Asp residues. Based on this, a study proposed that a hydroxide ion bonded to both ions attacks the carbonyl carbon atom of the glutathione thioester substrate, forming a tetrahedral intermediate, followed by breakage of the C–S bond.178 In mitochondrial glyoxalase II from A. thaliana (1XM8, 2Q42), the zinc ions were also coordinated by His and Asp residues but were in trigonal bipyramidal and tetrahedral geometries.179 Another glyoxalase II enzyme, from Salmonella typhimurium (2QED), was proposed to have an uncommon metal preference: a diiron, dimanganese, or hybrid Fe/Mn center.180 A unique member of the family, a persulfide dioxygenase from Myxococcus xanthus (4YSB), has a single ion in the active site with a two-His and one-carboxylate triad coordination pattern.181
Enzymes in TE32 have monomeric metallo-β-lactamase fold structures, with an Fe(II)Fe(III) center in the active site and an αβ/αβ sandwich core. All the resolved structures in this family are PqsE enzymes from P. aeruginosa, a human pathogen of particular interest due to its tendency for antibiotic resistance.182 The active center of the enzyme is covered by a lid formed by two α-helices in the C-terminal region, affecting substrate access.94 It has also been demonstrated that PqsE has a role in alkylquinolone biosynthesis.183
Although TE33 includes no crystal structures, a mechanism has been proposed, which shows an active site His acting as a base, with the substrate hydroxyl forming a hydrogen bond with a histidine residue.184-186 A nucleophilic attack from a deprotonated hydroxyl at the carbonyl of an acyl-CoA thioester was described, as was the involvement of an Asp residue in the stabilization of the structure within the active site.96, 184, 185, 187
Crystal structure 5VXS represents a member of TE34 and reveals a homotrimer with a substrate-binding cavity located between the N-terminal domain of one subunit and the C-terminal domain of the adjacent subunit. The N-terminal domain forms a β8α8 TIM-barrel fold, and the C-terminal domain is characterized by a lid domain consisting of two helices connected by a β-hairpin loop. The β-hairpin loop presents a highly conserved Asp320 that removes a proton from the substrate during catalysis.97, 103, 188, 189 In TE34, the catalytic residues for the M. tuberculosis (6AQ4), Cereibacter sphaeroides (4L9Y, 4L9Z), and M. extorquens (5UGR) enzymes are predicted to be Asp261, Asp299, and Asp304, respectively, based on comparison to the human CLYBL structure (5VXS).97 The catalytic residues for the remaining family members could not be confidently predicted by structural comparison. Two of these are CitE proteins from M. tuberculosis: one study (1U5H) predicts that the catalytic site is in a hydrophobic cavity formed by the C-terminal tips of the TIM β-barrel,190 while another study (6AQ4) shows that the active site contains an Mg2+ ion coordinated by the ligand, Glu112, Asp138, and two water molecules.99 Closely related to 1U5H is Y. pestis RipC (3QLL), whose active site is similarly predicted. However, it has also been suggested that the active site of 3QLL may be formed through an intermonomer interaction.100
The structure 6AUN in TE35 is characterized by an ankyrin domain (the ankyrin repeat is a 33-residue helix-turn-helix structure followed by a hairpin-like loop) and a catalytic domain. Regarding the catalytic mechanism, a Ser-Asp dyad is responsible for lipid hydrolysis.159, 191
2.3 TE phylogeny
TE families show convergent evolution because enzymes from different families, with different folds, have the same activity (thioester hydrolysis) despite acting on a wide variety of substrates. Divergent evolution is evidenced by the many substrates on which enzymes within a single family are active, even though they have similar primary and tertiary structures and mechanisms. A phylogenetic analysis of TEs exhibiting the two main folds, α/β hydrolase and HotDog, was performed.
All the amino acid sequences with experimentally confirmed TE activity which are members of TE families with a HotDog fold (TE4–TE15, TE24, TE25, and TE31) were aligned and a phylogenetic tree was constructed, shown in Figure 3.

The HotDog fold cladogram confirms the previously reported TE clans,12 since families within a clan are grouped in the same clade. TE clans were previously identified by structural superimposition, not by phylogeny. Figure 3 suggests that TE15 is part of Clan TE-A, which also includes TE5, TE9, TE10, and TE12 and is similar to the 4HBT-like SCOP family. TE8, TE11, and TE13 were grouped into clan TE-B in previous work12 and form part of the same clade in Figure 3. The proximity of sequences from TE25 and TE31 to this clade suggests that they also belong to TE-B. TE14 members share a common ancestor with clan TE-B sequences; however, structural differences and catalytic mechanisms do not support the inclusion of TE14 in TE-B.
All TE4 members share a common ancestor and present high sequence similarity, forming a single clade in Figure 3. Members of TE6 and TE7 share a common ancestor, but the lack of crystal structures in TE7 does not allow a new clan to be inferred from structural superimposition. Enzymes in TE6 and TE7 appear in the same clade, suggesting that they are orthologous sequences arising from a speciation event at the branching point. TE24, despite belonging to the HotDog fold, seems to have diverged from the common ancestor before the other clades and is represented as an outgroup.
All the amino acid sequences with experimentally confirmed TE activity that are members of TE families with an α/β hydrolase fold (TE2, TE16, TE17, TE18, TE19, TE20, TE21, TE22, TE26, TE27, TE28, TE29, and TE30) were aligned, and a phylogenetic tree was constructed, shown in Figure 4.

The TE families were previously grouped into two clans: TE-C (TE16, TE17, and TE18) and TE-D (TE20 and TE21).12 Unlike the clans of HotDog TEs, the families within each of these clans are not grouped in the same clade, despite their structural and functional similarity, suggesting a convergent evolution event.
Members of TE2 are phylogenetically close to TE16, TE17, and TE18, but there are not enough structural criteria for TE2 to form part of clan TE-C. TE19 is close to TE20, and TE30 is close to TE21. The sequences in TE22, TE26, TE27, and TE28 share a common ancestor, forming a well-defined clade that is closer to TE21, TE29, and TE30 than to any TE-C family member. The α/β hydrolase fold appears to facilitate nonrestricted acyl-ACP hydrolase or acyl-CoA hydrolase activity, increasing the variety of substrate options for this group.
2.4 Updated ThYme database
All the sequences and structures in the TE families described here appear in the ThYme database,13 which is in the process of being completely updated and has a new home at the University of Nevada, Reno (http://thyme.engr.unr.edu). Families, their member sequences, taxonomical data, accession codes, and protein names can be viewed using the ThYme database online interface. The database has links to UniProt,192 GenBank,193 and PDB194 databases. Although the content of families will be updated automatically, human judgment will still be necessary for adding, merging, or deleting families.
In the new ThYme website, each enzyme class (e.g., TEs) will have an interactive interface where users can view the content of a single family or of multiple families. Each unique sequence is displayed as a row containing the family, the organism, protein names, protein identifiers, protein evidence information, crystal structures, gene names, and gene and pathway identifiers. Each entry will display, at minimum, the family and a protein identifier; all other fields will be populated if suitable data are known. The interface provides multiple search fields, such as name, identifier, or sequence in FASTA format. Results can be narrowed to show only entries with evidence at the protein level or with known crystal structures.
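The row layout described above can be pictured as a record with two required fields and several optional ones, plus filters for the two narrowing options. The sketch below is a hypothetical model for illustration only: the class name, field names, the "protein level" evidence string, the identifiers HYPO_0001 and HYPO_0002, and the `search` function are assumptions, not the ThYme database schema or interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FamilyEntry:
    """Hypothetical row: family and protein identifier are required; everything else is optional."""
    family: str
    protein_id: str
    organism: Optional[str] = None
    protein_names: List[str] = field(default_factory=list)
    evidence: Optional[str] = None  # assumed label, e.g. "protein level"
    pdb_ids: List[str] = field(default_factory=list)
    gene_names: List[str] = field(default_factory=list)


def search(
    entries: List[FamilyEntry],
    text: Optional[str] = None,
    protein_level_only: bool = False,
    with_structure_only: bool = False,
) -> List[FamilyEntry]:
    """Filter rows by a case-insensitive name/identifier substring and the two narrowing flags."""
    results = []
    for e in entries:
        if text is not None:
            needle = text.lower()
            haystack = [e.protein_id] + e.protein_names + e.gene_names
            if not any(needle in h.lower() for h in haystack):
                continue
        if protein_level_only and e.evidence != "protein level":
            continue
        if with_structure_only and not e.pdb_ids:
            continue
        results.append(e)
    return results


rows = [
    FamilyEntry("TE26", "HYPO_0001", gene_names=["ybfF"], evidence="protein level", pdb_ids=["3BF7", "3BF8"]),
    FamilyEntry("TE26", "HYPO_0002"),  # only the two required fields are known
]
print([e.protein_id for e in search(rows, with_structure_only=True)])  # ['HYPO_0001']
```

Keeping optional fields as empty lists or `None`, rather than requiring them, mirrors the rule that only the family and a protein identifier are guaranteed for every row.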
3 CONCLUSIONS
TE families have been updated through analysis of the primary structures of all known TE sequences. New families have been proposed, and all sequences and structures are classified into new, or previously identified, families. This system of classification provides a standardized nomenclature and a means to predict the tertiary structure, function, and mechanism of a TE sequence that has not been experimentally characterized. These assertions are supported by family members displaying a high degree of primary and tertiary structural similarity, highly conserved active sites and catalytic residues, and consistent mechanisms. Examination of families that share a fold reveals some similarity in primary and tertiary structures, catalytic residues and active sites, and mechanisms. Convergent and divergent evolution is suggested from phylogenetic analyses of TEs whose structures have the two main structural folds.
4 MATERIALS AND METHODS
For a sequence to be considered a member of a family it must have a strong sequence similarity (~30%), a nearly identical tertiary structure to other structures in the family, and catalytic residues in the same locations as the other members of that family.
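The sequence criterion can be made concrete with a percent-identity calculation over a pairwise alignment. In this minimal sketch, identity is counted only over columns where neither sequence has a gap, and 30% is used as the threshold. The text speaks of similarity rather than identity and does not say how gaps are treated, so both choices are assumptions; the structural and catalytic-residue criteria are not covered here.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def meets_sequence_criterion(aligned_a: str, aligned_b: str, threshold: float = 30.0) -> bool:
    """First membership criterion only; structure and catalytic-residue checks are separate steps."""
    return percent_identity(aligned_a, aligned_b) >= threshold


# Toy alignment: the gapped column is skipped, leaving 5 identities out of 6 columns.
print(round(percent_identity("ACDE-FG", "ACDQHFG"), 1))  # 83.3
```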
The protocol by which the new TE families were identified is as follows: (a) enzyme sequences experimentally confirmed to have TE activity were gathered, and those present in a previously existing family (TE1–TE23) were discarded; (b) each of the remaining TE sequences was independently processed with the Basic Local Alignment Search Tool (BLAST),195 and its results were compared with the other sequences' results to identify the representative sequences that would originate new families; (c) the catalytic domains of the representative sequences were processed with BLAST to populate potential new families; (d) the number of shared sequences was counted for all pairs of potential new families, and highly similar families (>15% sequences in common) were merged; (e) intra-family congruity and inter-family uniqueness were confirmed by tertiary structure superimposition, comparison of catalytic residue position and identity, multiple sequence alignments (MSAs), and a final examination of shared sequences between all possible pairs of families; and (f) sequences common to multiple families were assigned to the family with the highest sequence similarity.
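Step (d) can be sketched as a pairwise overlap count followed by merging. In this minimal sketch, the 15% threshold comes from the text, while measuring overlap relative to the smaller family and merging transitively (if A merges with B and B with C, all three end up together) are assumptions; the family names and member IDs are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def overlap_fraction(a: Set[str], b: Set[str]) -> float:
    """Shared sequences as a fraction of the smaller family (assumed denominator)."""
    smaller = min(len(a), len(b))
    return len(a & b) / smaller if smaller else 0.0


def merge_families(families: Dict[str, Set[str]], threshold: float = 0.15) -> List[Tuple[List[str], Set[str]]]:
    """Merge every pair of candidate families that share more than `threshold` of their sequences."""
    parent = {name: name for name in families}

    def find(x: str) -> str:
        # Follow parent links to the group's root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(families, 2):
        if overlap_fraction(families[a], families[b]) > threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in families:
        groups.setdefault(find(name), []).append(name)
    merged = [(sorted(names), set().union(*(families[n] for n in names))) for names in groups.values()]
    return sorted(merged, key=lambda group: group[0])


candidates = {
    "new1": {f"s{i}" for i in range(1, 11)},  # s1..s10
    "new2": {"s9", "s10", "s11", "s12"},      # shares 2 of 4 with new1
    "new3": {"s20", "s21", "s22"},            # shares nothing
}
for names, members in merge_families(candidates):
    print(names, len(members))
# ['new1', 'new2'] 12
# ['new3'] 3
```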
4.1 Sequence selection and BLAST searches
Enzyme sequences experimentally confirmed to have TE activity were extracted from the Swiss-Prot section of UniProt, which contains only reviewed sequences and has a higher level of annotation. Possible TEs were identified by an EC number from EC 3.1.2.1 to EC 3.1.2.32 or EC 3.1.2.–, or by having “TE” in the description, in addition to having “evidence at protein level.” Less stringently verified sequences, such as those with “evidence at transcript level” or “inferred from homology,” as well as fragments or theoretical proteins, were disregarded. The primary sequences meeting these criteria and not already in TE1–TE23 were collected, resulting in ~200 new query sequence candidates.
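The selection rules above can be expressed as a small filter over annotated records. This is a sketch under stated assumptions: records are plain dictionaries with hypothetical keys (`accession`, `ec_numbers`, `description`, `evidence`, `fragment`), the accessions are made up, and the description keyword defaults to "thioesterase" on the assumption that the text's “TE” stands for the full word, as it does elsewhere in the paper. It does not parse real Swiss-Prot files.

```python
import re
from typing import Dict, Iterable, List, Set

# Matches EC 3.1.2.N (N a number) or the partially classified EC 3.1.2.-
_EC_3_1_2 = re.compile(r"^3\.1\.2\.(\d+|-)$")


def ec_qualifies(ec: str) -> bool:
    """True for EC 3.1.2.1 through 3.1.2.32, or for EC 3.1.2.-."""
    match = _EC_3_1_2.match(ec.strip())
    if match is None:
        return False
    last = match.group(1)
    return last == "-" or 1 <= int(last) <= 32


def is_candidate(entry: Dict, keyword: str = "thioesterase") -> bool:
    """Apply the EC/description, evidence-level, and fragment rules to one hypothetical record."""
    annotated = any(ec_qualifies(ec) for ec in entry.get("ec_numbers", [])) or (
        keyword.lower() in entry.get("description", "").lower()
    )
    protein_level = entry.get("evidence") == "protein level"
    return annotated and protein_level and not entry.get("fragment", False)


def select_queries(entries: Iterable[Dict], known_accessions: Set[str]) -> List[str]:
    """Keep candidates that are not already members of an existing family (TE1-TE23)."""
    return [e["accession"] for e in entries if is_candidate(e) and e["accession"] not in known_accessions]


entries = [
    {"accession": "HYPO1", "ec_numbers": ["3.1.2.14"], "evidence": "protein level"},
    {"accession": "HYPO2", "ec_numbers": ["3.1.2.-"], "evidence": "transcript level"},
    {"accession": "HYPO3", "description": "Acyl-CoA thioesterase", "evidence": "protein level"},
    {"accession": "HYPO4", "ec_numbers": ["3.1.21.4"], "evidence": "protein level"},
    {"accession": "HYPO5", "ec_numbers": ["3.1.2.2"], "evidence": "protein level", "fragment": True},
]
print(select_queries(entries, known_accessions={"HYPO3"}))  # ['HYPO1']
```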
Each of these sequences was subjected to a BLAST search against the National Center for Biotechnology Information's (NCBI) GenBank nr peptide sequence database using the protein–protein algorithm (blastp).196 These BLAST searches were run with a local instance of blast-2.9.0-2 and the nr database, both downloaded from NCBI, on a Unix system. Previously, an E-value cutoff of 1 × 10−3 was used;12 however, because the nr database has grown by ~3 orders of magnitude, an E-value cutoff of 1 × 10−7 was used to capture as many sequences with the required similarity as possible while minimizing the number of redundant sequences. The Max Target Sequences parameter was set to its highest value to capture all sequences within an E-value of 1 × 10−7. Other parameters were left at their default settings.
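The link between database growth and the E-value cutoff follows from the Karlin-Altschul statistics behind BLAST E-values, E = K·m·n·exp(−λS), where m is the query length, n the database length, S the raw score, and K and λ constants of the scoring system. For a fixed score and query, E is proportional to n, so a ~10^3-fold larger database raises every E-value ~10^3-fold. The sketch below works through that arithmetic as a rough check, not as the procedure used in this work; the K and λ values, the query length, and the database sizes are placeholders, and BLAST's effective-length corrections are ignored. By this reasoning, a 10^−3 cutoff on the older database corresponds to ~10^−6 on the larger one, so the 10^−7 cutoff is about one order of magnitude stricter than a pure rescaling.

```python
import math


def karlin_altschul_evalue(raw_score: float, query_len: int, db_len: float, k: float, lam: float) -> float:
    """Expected number of chance hits scoring at least raw_score: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def rescaled_cutoff(old_cutoff: float, growth_factor: float) -> float:
    """Cutoff that keeps the same raw-score threshold after the database grows by growth_factor."""
    return old_cutoff / growth_factor


# Placeholder constants; the ratio below does not depend on K, lambda, S, or m.
K, LAM = 0.041, 0.267
e_old = karlin_altschul_evalue(60.0, 300, 1e9, K, LAM)
e_new = karlin_altschul_evalue(60.0, 300, 1e12, K, LAM)
print(round(e_new / e_old))  # 1000
print(rescaled_cutoff(1e-3, 1e3))  # approximately 1e-06
```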
BLAST results were compared against each other to check for common sequences and to identify the set of representative sequences that yields the lowest number of BLAST results with no overlapping (common) sequences. The query sequences of these unique, nonredundant BLAST results became the representative sequences that originate new families from all confirmed TE sequences. The literature referenced in UniProt was checked to confirm experimental TE activity. The catalytic domain of each new representative sequence, identified in Pfam-A,197 was used to populate the prospective new families with BLAST as described above.
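One way to read the representative-selection step is as a graph problem: queries are linked when their BLAST hit sets share a sequence, each connected group would otherwise produce redundant families, and one query is kept per group. The sketch below follows that reading, which is an interpretation rather than the authors' stated algorithm; choosing the query with the largest hit set (ties broken alphabetically) is likewise an assumption, and the query and hit names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def pick_representatives(hits: Dict[str, Set[str]]) -> List[str]:
    """Group queries whose BLAST hit sets overlap and keep one representative per group."""
    queries = sorted(hits)
    linked: Dict[str, Set[str]] = {q: set() for q in queries}
    for i, a in enumerate(queries):
        for b in queries[i + 1:]:
            if hits[a] & hits[b]:  # at least one common hit sequence
                linked[a].add(b)
                linked[b].add(a)

    seen: Set[str] = set()
    representatives: List[str] = []
    for start in queries:
        if start in seen:
            continue
        # Breadth-first search collects every query connected to `start` through shared hits.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            q = queue.popleft()
            group.append(q)
            for nxt in linked[q]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Assumed rule: largest hit set first, then alphabetical order.
        representatives.append(min(group, key=lambda q: (-len(hits[q]), q)))
    return representatives


hits = {"Q1": {"a", "b", "c"}, "Q2": {"c", "d"}, "Q3": {"x", "y"}, "Q4": {"y"}}
print(pick_representatives(hits))  # ['Q1', 'Q3']
```

Queries joined only transitively can have hits that the chosen representative does not retrieve; the subsequent BLAST search with each representative's catalytic domain, step (c) of the protocol, is what actually populates the family.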
4.2 Comparison of tertiary structures
All known tertiary structures in each family were obtained from the Research Collaboratory for Structural Bioinformatics PDB.194 Enzyme tertiary structures were reviewed to exclude fragments, putative proteins, and non-TE domains of multidomain proteins from all structural comparisons.
All monomer structures were extracted, and for each family a reference structure was selected, which served as the pivot around which other monomers were superimposed. The shortest monomer in each family was selected as the pivot to ensure consistent alignment of the core structure and allow for uniform structural similarity calculations. All monomers within each family were superimposed using MultiProt198 with OnlyRefMol set to 1, Scoring set to 2, and all other parameters left at default.
A root mean square distance (RMSD) of the superimposed tertiary structures was calculated for each family with more than one structure to quantify structural similarity. For RMSD calculations, the distances between corresponding alpha carbon atoms (Cα) of two superimposed structures (pivot and subject) were calculated. A cutoff distance, calculated as the average distance between sequential Cαs in the pivot structure, was used to determine corresponding Cαs between the pivot and subject structures; pairs farther apart than the cutoff were not considered to be corresponding and were not used in the RMSD calculation. The percentage (P) of Cαs used to calculate the RMSD indicates the significance of the RMSD value. For a given family, the pivot structure was superimposed onto all other structures, resulting in n – 1 calculations, where n is the number of monomers being compared within that family. For families where n > 2, the average RMSD and P values (RMSDave and Pave, respectively) were calculated.
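The RMSD and P calculation described above can be sketched as follows. Several assumptions apply: the structures are already superimposed (MultiProt performs that step here), each monomer is a list of Cα coordinates, a pivot Cα is paired with the nearest subject Cα when that distance does not exceed the cutoff, and P is expressed relative to the number of pivot Cαs. The text does not state the pairing rule or the denominator of P, so these are illustrative choices, and the coordinates are toy values.

```python
import math
from statistics import mean
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def sequential_cutoff(pivot: Sequence[Point]) -> float:
    """Average distance between consecutive C-alpha atoms of the pivot structure."""
    return mean(math.dist(a, b) for a, b in zip(pivot, pivot[1:]))


def rmsd_and_p(pivot: Sequence[Point], subject: Sequence[Point]) -> Tuple[float, float]:
    """RMSD over corresponding C-alpha pairs and the percentage P of pivot C-alphas used."""
    cutoff = sequential_cutoff(pivot)
    squared = []
    for ca in pivot:
        d = min(math.dist(ca, other) for other in subject)  # nearest subject C-alpha
        if d <= cutoff:  # pairs beyond the cutoff are not considered corresponding
            squared.append(d * d)
    if not squared:
        return math.nan, 0.0
    return math.sqrt(sum(squared) / len(squared)), 100.0 * len(squared) / len(pivot)


def family_averages(monomers: List[Sequence[Point]]) -> Tuple[float, float]:
    """RMSDave and Pave over the n - 1 pivot-versus-subject comparisons; the shortest monomer is the pivot."""
    pivot_index = min(range(len(monomers)), key=lambda i: len(monomers[i]))
    pivot = monomers[pivot_index]
    results = [rmsd_and_p(pivot, m) for i, m in enumerate(monomers) if i != pivot_index]
    return mean(r for r, _ in results), mean(p for _, p in results)


pivot = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
subject_a = [(0.0, 0.5, 0.0), (3.8, 0.5, 0.0), (20.0, 0.0, 0.0)]  # third C-alpha has no partner
subject_b = [(0.0, 0.0, 0.3), (3.8, 0.0, 0.3), (7.6, 0.0, 0.3), (11.4, 0.0, 0.3)]
print(rmsd_and_p(pivot, subject_a))
print(family_averages([pivot, subject_a, subject_b]))
```

In this toy family the pivot's third Cα lies about 3.83 Å from its nearest subject atom, just beyond the ~3.8 Å cutoff, so RMSD is computed over two of three pairs and P drops to about 67%.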
4.3 Multiple sequence alignments and phylogenetic trees
Phylogenetic analysis began with a multiple sequence alignment of the amino acid sequences of the TE catalytic domains, built with MUSCLE v3.8.31,199 using default parameters. An unrooted dendrogram was built using MEGA X200 with maximum likelihood as the statistical method. The reliability of the tree was estimated by the bootstrap method with 1,000 replicates. The tree was visualized and edited using FigTree v1.4.4.
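The bootstrap step can be illustrated in a few lines. Each replicate alignment is built by sampling alignment columns with replacement, a tree is inferred from each replicate, and a clade's support is the percentage of replicate trees that contain it. In the sketch below, tree inference itself (done with MEGA X in this work) is omitted, replicate trees are represented simply as sets of clades, and the function names, sequences, and replicate trees are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def bootstrap_replicate(msa: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement, keeping the alignment length unchanged."""
    lengths = {len(seq) for seq in msa.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    (length,) = lengths
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in msa.items()}


def clade_support(replicate_clades: List[Set[FrozenSet[str]]], clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (each given as its set of clades) that contain the clade."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)


rng = random.Random(42)
msa = {"seqA": "MKT-LLV", "seqB": "MKSALLV", "seqC": "MRT-ILV"}
replicates = [bootstrap_replicate(msa, rng) for _ in range(1000)]  # 1,000 replicates, as in the text

# Hypothetical clades recovered from four replicate trees.
trees = [
    {frozenset({"seqA", "seqC"})},
    {frozenset({"seqA", "seqC"})},
    {frozenset({"seqA", "seqB"})},
    {frozenset({"seqA", "seqC"})},
]
print(clade_support(trees, frozenset({"seqA", "seqC"})))  # 75.0
```

Resampling whole columns, rather than individual residues, keeps each site's residues aligned across all sequences, which is what makes the replicate a valid pseudo-alignment.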
ACKNOWLEDGMENT
This work was supported by the U.S. National Science Foundation (Award 2001385).
AUTHOR CONTRIBUTIONS
Benjamin T. Caswell: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); software (equal); validation (equal); visualization (equal); writing – original draft (equal); writing – review and editing (equal). Caio C. de Carvalho: Conceptualization (equal); data curation (equal); formal analysis (equal); investigation (equal); methodology (equal); validation (equal); visualization (equal); writing – original draft (equal). Hung Nguyen: Data curation (equal); investigation (equal); methodology (equal); software (equal); writing – original draft (equal). Monikrishna Roy: Data curation (equal); investigation (equal); methodology (equal); software (equal); writing – original draft (equal). Tin Nguyen: Data curation (equal); funding acquisition (equal); investigation (equal); methodology (equal); project administration (equal); software (equal); supervision (equal); writing – original draft (equal). David C. Cantu: Conceptualization (equal); data curation (equal); formal analysis (equal); funding acquisition (equal); investigation (equal); methodology (equal); project administration (equal); supervision (equal); validation (equal); visualization (equal); writing – original draft (equal); writing – review and editing (equal).