QM/MM modeling of compound I active species in cytochrome P450, cytochrome c peroxidase, and ascorbate peroxidase
Abstract
QM/MM calculations provide a means for predicting the electronic structure of the metal center in metalloproteins. Two heme peroxidases, Cytochrome c Peroxidase (CcP) and Ascorbate Peroxidase (APX), have a structurally very similar active site, yet have active intermediates with very different electronic structures. We review our recent QM/MM calculations on these systems, and present new computational data. Our results are in good agreement with experiment, and suggest that the difference in electronic structure is due to a large number of small differences in structure from one protein to another. We also discuss recent QM/MM calculations on the active species of cytochrome P450, in which a similar sensitivity of the electronic structure to the environment is found. However, this does not appear to explain different catalytic profiles of the different drug-metabolizing isoforms of this class of enzyme. © 2006 Wiley Periodicals, Inc. J Comput Chem 27: 1352–1362, 2006
Introduction
Quantum mechanical methods provide a useful approach for predicting the electronic structure and reactivity of reactive intermediates that are difficult to observe experimentally. This is particularly important for transition metal compounds, in which metal–ligand binding is extremely complex, leading to structures, electronic structures, stabilities, and reactivities that are much more varied and less predictable than for main-group compounds. Density functional theory (DFT) has proved to be a remarkably powerful method for predicting and rationalizing the properties of organometallic and coordination compounds.1 Efficient algorithms and cheap, powerful, computers mean that it is now relatively straightforward to characterize compounds of up to 100 to 200 atoms, which includes many small- to medium-sized compounds. Where the full complex cannot be modeled due to computational restrictions, it is often the case that model compounds can be used, in which substituents are removed without altering the key electronic properties of the metal ligands. Solvent effects can often be represented by using a continuum model, which only leads to modest computational overheads compared to gas-phase calculations.
The situation is more difficult for metalloproteins. Even the smallest of these compounds have more than a thousand atoms, and thereby dwarf the vast majority of coordination compounds. Also, they tend to have a polar, hydrophilic exterior that is not well described in the gas phase, so that a continuum or explicit description of the surrounding water and counterions is required. Unlike the case for many coordination compounds, it is not necessarily easy to construct small models that include all important physical effects. This is because a natural choice for such a model, for example, one containing the first ligand sphere around the metal site of interest, neglects the effects of the many polar groups (peptide bonds and some side chains) in the second and higher coordination shells, which can affect the electronic structure of a metal site. For example, the redox potential of the Fe(II)/Fe(III) couple in heme groups of characterized proteins spans an enormous range, from −550 to +362 mV,2 representing a difference of about 21 kcal/mol in the free energy of oxidation of the ferrous center. Part of this is due to changes in the nature of the fifth (and, where present, sixth) ligand to iron, as well as changes in the structure of the heme group, but even within the group of mono-histidine heme b complexes, the range is still substantial, from −306 to +150 mV.2, 3 As well as these electrostatic effects, it is well known that relatively small changes in tertiary structure can change binding affinities and catalytic efficiency of active sites. Where qualitative changes in the structure of the active site follow from the conformational switch (e.g., a ligand moves away from a metal center), such allosteric effects4 can be treated using cluster models, but this is not always the case. This shows that for many problems in bioinorganic chemistry, a large number of atoms need to be taken into account as they can all have an effect on the properties of the metal site.
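The conversion between a range of redox potentials and a free-energy difference can be checked with a few lines of arithmetic. The sketch below assumes a one-electron process and the standard Faraday constant expressed in kcal/(mol V); the function and variable names are illustrative only. With the endpoints quoted above it gives roughly 21 kcal/mol for the full range and roughly 10.5 kcal/mol for the mono-histidine heme b subset.

```python
# Faraday constant in kcal/(mol*V): converts a potential difference in volts
# into a free-energy difference per mole of electrons transferred.
FARADAY_KCAL_PER_MOL_V = 23.0605


def potential_span_to_kcal(e_low_mv, e_high_mv, n_electrons=1):
    """Free-energy span (kcal/mol) covered by a redox-potential range in mV."""
    delta_volts = (e_high_mv - e_low_mv) / 1000.0
    return n_electrons * FARADAY_KCAL_PER_MOL_V * delta_volts


# Endpoints taken from the text; a one-electron Fe(II)/Fe(III) couple is assumed.
all_heme_proteins = potential_span_to_kcal(-550, 362)
mono_his_heme_b = potential_span_to_kcal(-306, 150)
print(f"{all_heme_proteins:.1f} kcal/mol, {mono_his_heme_b:.1f} kcal/mol")
```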
Despite this, many groups are successfully using DFT on small- to medium-sized cluster models of metalloenzyme active sites to characterize structure, electronic structure, energetics, and reactivity for a wide range of systems.5, 6 The bulk of the protein is either not included, or treated as a continuum with some effective dielectric constant (usually 4 or thereabouts). The undeniable and impressive success of such computations reflects the fact that first ligand shell effects are dominant even in metalloproteins. Furthermore, where important residues beyond the first coordination sphere are identified, the model can be grown to include them. In principle, efficient linear-scaling algorithms can be used to treat the whole protein and its environment at the DFT level. In practice, however, it is difficult to obtain useful results beyond the limit of 100–200 atoms.
To be able to describe all relevant atoms, one very promising method involves splitting the system into two or more parts, and using different levels of theory to describe each subsystem and the coupling between them. Such hybrid methods can preserve the atomistic and polar character of all important atoms, yet avoid the prohibitive computational expense of including all atoms in the DFT treatment. The most commonly used hybrid method involves a quantum mechanical (QM) treatment (e.g., DFT) of the active site and a molecular mechanical (MM) description of the bulk of the protein and the solvent. This method was first developed by Warshel and Levitt,7 and a variety of different implementations have been proposed;8 a recent review discusses applications to metalloenzymes.9 In nearly all cases, the QM region is polarized by the MM region through the inclusion of a point charge representation of the MM atoms in the QM one-electron Hamiltonian. QM and MM atoms also interact through van der Waals nonbonded terms. Often the QM and MM regions are connected through a covalent bond and a special treatment of the boundary region is required. In one of the most common methods, the MM atom in such bonds is replaced in the QM region by a capping “link” atom, typically hydrogen. Other methods use pseudoatom link atoms or a frozen orbital representation of the linking covalent bond.
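For orientation, the coupling just described can be written in a generic electrostatic-embedding form. This is a textbook-style expression in atomic units, given here as a sketch rather than as the working equations of any particular implementation. The indices are: i for QM electrons, α for QM nuclei with charges Z_α, and A for MM atoms with point charges q_A. The Lennard-Jones form of the van der Waals term, with parameters ε and σ, is an assumption.

```latex
\hat{H}_{\mathrm{QM/MM}} =
  -\sum_{i}\sum_{A \in \mathrm{MM}}
     \frac{q_A}{\lvert \mathbf{r}_i - \mathbf{R}_A \rvert}
  +\sum_{\alpha \in \mathrm{QM}}\sum_{A \in \mathrm{MM}}
     \frac{Z_\alpha\, q_A}{\lvert \mathbf{R}_\alpha - \mathbf{R}_A \rvert}
  +\sum_{\alpha \in \mathrm{QM}}\sum_{A \in \mathrm{MM}}
     4\varepsilon_{\alpha A}
     \left[
       \left(\frac{\sigma_{\alpha A}}{R_{\alpha A}}\right)^{12}
       -\left(\frac{\sigma_{\alpha A}}{R_{\alpha A}}\right)^{6}
     \right]
```

The first term is the one-electron operator through which the MM point charges polarize the QM wavefunction; the second and third terms depend only on nuclear positions and contribute to the energy and gradients.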
As in computational studies of inorganic and organometallic systems, one of the main applications of QM/MM studies has been the understanding of structure and reactivity. Both of these properties are very sensitive to the quality of the potential energy surface, and as a consequence, much of the testing carried out for QM/MM methods has focussed on comparing energies predicted with this method with those obtained from accurate QM calculations, MM calculations, or experiment. However, like QM methods, QM/MM calculations can also be used to predict many other properties, and these have perhaps received less attention. A number of studies10, 11 have examined QM/MM methods for calculating NMR properties for large systems including biomolecules, generally showing that the chemical shift is well reproduced, provided the QM–MM boundary is not too close to the nucleus being investigated. More recently, a QM/MM approach to calculate EPR and Mössbauer parameters has been developed, and applied to the Compound I intermediates of bacterial cytochrome P450cam12 and of horseradish peroxidase.13 In the latter case, experimental data are available, and agreement with experiment is reasonable provided that a fairly large QM region is used.
We are interested in the electronic structure and reactivity of human isoforms of cytochrome P450 and of other heme enzymes, and in using QM/MM methods to address these topics. We are also interested in testing the accuracy of such methods for describing the electronic structure of complicated metalloprotein systems. As discussed below, the Compound I intermediates of the heme enzymes cytochrome c peroxidase and ascorbate peroxidase represent a particularly challenging case for QM/MM methods, as the chemical structure of the active site of these enzymes is essentially identical, yet the electronic structure of the intermediate is strikingly different. We have therefore examined to what extent this difference can be reproduced using QM/MM methods. Our results on these two systems have been published in part previously.14 Here we present some further results, together with additional tests of the effect of different computational parameters on the calculated electronic structure. We then use the previous and new results to assess the accuracy of QM/MM calculations for describing the electronic structure of such metalloprotein active sites, and briefly discuss our published results on the electronic structure of Compound I in mammalian cytochrome P450 isoforms.15 Finally, we discuss the insight that these calculations give into the effects of the protein on the calculated electronic structure.
Computational Details
Some of the calculations on APX and CcP14 and all those on cytochrome P450 isoforms15 have been published previously, and full details of system setup and of methods can be found in the corresponding articles.
We report here some new calculations on APX and CcP, using very similar methods and setup to those used in the previous work.14 The calculations have been carried out using our own implementation of the QM/MM approach, in which the QM region is polarized by the MM region by including a point-charge representation of the latter in the one-electron Hamiltonian of the former. Where covalent bonds bridge the QM/MM boundary, a link atom method is used to cap unsaturated valencies in the QM region. The QM region (which is defined below) is described with the B3LYP density functional and the 6-31G basis on all atoms (except for iron, where the LACVP ECP and corresponding double zeta basis are used) using the Jaguar package.16 Calculations were carried out using a restricted open-shell ansatz, on the quartet state. The MM region is described using the Tinker package17 and the CHARMM22 forcefield.18 Input generation and output analysis for the two programs, computation of the required coupling terms, and geometry optimization for the QM region are carried out using our own program, QoMMMa.19 The QM/MM models are built by adding solvent to the crystal structures of the resting states of APX (pdb code 1OAF20) and CcP (1CCA21), then carrying out molecular dynamics equilibration of all atoms within a radius of 25 Å around the iron atom using the CHARMM package22 and forcefield.18 Some calculations are also carried out using as a starting point the crystal structure of CcP Compound I (pdb code 1EBE23).
Heme Peroxidases
Heme peroxidases form a large enzyme family present in most life forms, from yeast to humans.24 They are involved in a range of biological processes, such as removal of hydrogen peroxide, lignin degradation, hormone synthesis, and antibacterial and antifungal defense. The heme group is coordinated to the protein through a histidine side chain, and in the resting state, the iron is in its ferric oxidation state. The catalytic cycle is shown in Scheme 1, where the thick horizontal lines represent the heme group. Addition of hydrogen peroxide followed by proton transfer from one oxygen to the other leads to loss of water and formation of an active intermediate, called Compound I, in which the iron is formally in the +IV oxidation state, and the heme ring (or another protein group) is oxidized. Compound I is a triradicaloid state, with three unpaired electrons. Two of these are located on the FeO ferryl center, in two near-degenerate π* orbitals. The third electron is located either on the heme ring or on whichever other protein group is oxidized. The Compound I intermediate can abstract one electron from the substrate, leading to a new Fe(IV) intermediate, Compound II. The latter is protonated twice and accepts a second electron, either from the same substrate or from a second substrate molecule (the order of these steps is not always clear), to return to the resting state.

Two heme peroxidases have recently attracted particular interest. Plant ascorbate peroxidase (APX), whose structure has recently been determined with high resolution, catalyzes the oxidation of ascorbate (vitamin C) and other small aromatic substrates.25 It is structurally related to another heme peroxidase, cytochrome c peroxidase (CcP), with a sequence identity of more than 30% between the two enzymes.26 CcP has been studied extensively for many years, and it was the first peroxidase to have its crystal structure solved.27 It has some characteristics that are not typical of heme peroxidases: first, its redox partner substrate is not a small molecule, but a protein, cytochrome c.28 Next, the third unpaired electron in the Compound I intermediate is not located on the heme group as in APX29 and other heme peroxidases such as horseradish peroxidase,13 but on the side chain of the tryptophan 191 residue located close to the ligating histidine group.30 In most peroxidases, the corresponding residue is instead phenylalanine, whose phenyl side chain is less readily oxidized. It was therefore thought that heme peroxidases with a Trp group in this position would also lead to Compound I species with this electronic structure. This is what makes the case of APX so interesting: it too has a proximal Trp residue, with a very similar position to that of CcP, yet its Compound I intermediate has a “normal,” heme-based, electronic structure.29 This is an example of the type of effect discussed above: the nearest neighbors of the heme group and the Trp side chain are very similar in the two proteins, yet the electronic structure of the intermediate is different.
In our first set of calculations,14 we used a medium-sized QM region, including the heme group without side chains, the imidazole ring of the proximal His ligand, the indole side chain of the proximal Trp group, and an acetate group corresponding to the carboxylate side chain of the Asp residue that forms hydrogen bonds to the NH groups of both of these heterocyclic systems. This QM region is shown in Scheme 2. We note that this set of atoms can conceivably lead to both types of Compound I electronic structure, with the unpaired electron on the heme ring or the Trp side chain (or, as discussed below, delocalized over both groups). Indeed, gas-phase calculations on precisely this model system have been reported,31 and lead to an electronic structure in which the unpaired electron is delocalized over both groups. In the QM/MM calculations, no attempt is made to enforce SCF convergence towards either state.

When starting the QM/MM geometry optimizations from protein conformations derived from the crystal structure of the corresponding ferric resting states of APX and CcP (setup “A” in ref.14), similar geometries (discussed below) and electronic structures were found for the Compound I species in the two proteins. As expected, two of the three unpaired electrons are situated on the ferryl moiety, in antibonding π* orbitals. The third unpaired electron is delocalized over the heme group and the proximal Trp indole side chain. There is slightly more unpaired electron density on Trp in CcP (ρTrp = 0.83, ρheme = 0.20; ρFeO = 1.95; ρHis = 0.01) than in APX, where there is correspondingly more on the heme group (ρTrp = 0.59, ρheme = 0.44; ρFeO = 1.95; ρHis = 0.02). However, these calculated electronic structures do not agree with the experimental observations: the third unpaired electron is delocalized over the Trp and heme groups in both enzymes, instead of being completely localized on one in CcP and the other in APX.
This is partly due to the fact that DFT methods tend to artificially favor states in which a single electron delocalizes over an extended region. This error, due to electron self-interaction, explains for example the well-known low barriers calculated with DFT for the H + H2 reaction,32, 33 or the anomalous delocalized states obtained at long range for dissociation of radical–cation systems.34, 35 This last example is highly relevant to that discussed here: the heme iron atom and the carbon atoms of the five-membered ring of the indole group are separated by 6 to 8 Å, and are not covalently bonded, which is a situation where the delocalization error is substantial, with the delocalized state predicted to be as much as 40–50 kcal/mol more stable than the correct localized state in some model cases.35 The neutral and cationic forms of the indole and heme groups have rather similar geometries, and this too favors delocalization.35 It has been suggested that the difference in ionization potentials of two completely separated groups A and B needs to be significantly larger than the unphysical delocalization energy of either fragment before localized solutions A·+ --- B or A --- B·+ are obtained with B3LYP, rather than delocalized Ap+ --- Bq+ (p + q = 1) states.35 It is difficult to assess exactly what the delocalization energy would be in the QM system of Scheme 2, but it should be noted at this stage that the observation of a delocalized state for both proteins may be due purely to computational error.
Setting aside this issue, another possible explanation for the incorrect electronic structures obtained is that the wrong protein conformations, corresponding to the ferric resting state of the two enzymes, have been used. No crystal structure is available for the APX Compound I, but for CcP, two structures have been reported,23, 36 and one of these is available in the Protein Data Bank (entry 1EBE, ref.23). These structures do not involve major changes with respect to the resting state, but do involve some motion of side chains. Because structure 1EBE is at fairly low resolution, and no structure is available for APX, we have not used the CcP Compound I structure in our initial calculations. Instead, starting from the hydrated ferric crystal structure, we have subjected all atoms in a sphere of 25 Å radius around the heme iron atom to molecular dynamics equilibration at 310 K for 100 ps before QM/MM minimization (setup “B” in ref.14). For this equilibration, a Compound I model was used, in which point charges on the central region were chosen to reflect the experimentally characterized electronic structure of Compound I in APX and CcP. In other words, the third unpaired electron was placed on the Trp group in CcP, and on the heme ring in APX. This may somewhat bias the subsequent QM/MM calculations in favor of the observed electronic structures, because the protein environment is expected to relax so as to stabilize the charge distribution used in the MM model. However, as already noted, no explicit bias was included when setting up the QM/MM calculations.
The MD equilibrations on APX and CcP do not lead to any major changes in conformation. However, some minor changes which may be significant do occur. For example, the side chain of Asn 195 in CcP rotates such that the NH2 group donates a hydrogen bond to the backbone carbonyl group of Gly 178, which is fairly close to the proximal Trp side chain. A similar change in orientation of Asn 195 was noted in the recent crystal structure of the Compound I species.36 Another change, which is not observed in the crystal structure, is a reorientation of one of the propionate side chains of the heme group in CcP. This change will be discussed below.
The QM/MM calculations using the modified setup are in better agreement with experiment concerning the electronic structures. For CcP, the third unpaired electron is almost entirely located on the Trp side chain (ρTrp = 0.97, ρheme = 0.07; ρFeO = 1.95; ρHis = 0.01), whereas in APX, it is very largely located on the heme group (ρTrp = 0.21, ρheme = 0.80; ρFeO = 1.96; ρHis = 0.03).14 The distribution of the unpaired electron within each group is also consistent with experiment: in APX, it is in an a2u-like π orbital on the porphyrin group, whereas in CcP, it is in a π orbital on the indole group, largely situated within the five-membered nitrogen-containing ring. Using this setup, we tested the effect of a number of computational settings, such as the nature of the DFT functional used and the size of the basis set and found them to make only very small changes to the results.
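A simple bookkeeping check on the group spin populations quoted for the two setups can make the comparison easier to follow. The sketch below uses only the values given in the text; the dictionary layout and the `trp_fraction` helper are illustrative choices, not part of the published analysis. It confirms that each set sums to approximately three unpaired electrons, and reports what share of the third unpaired electron (Trp plus heme) resides on Trp.

```python
# Group spin populations quoted in the text, keyed by (setup, enzyme).
POPULATIONS = {
    ("A", "CcP"): {"Trp": 0.83, "heme": 0.20, "FeO": 1.95, "His": 0.01},
    ("A", "APX"): {"Trp": 0.59, "heme": 0.44, "FeO": 1.95, "His": 0.02},
    ("B", "CcP"): {"Trp": 0.97, "heme": 0.07, "FeO": 1.95, "His": 0.01},
    ("B", "APX"): {"Trp": 0.21, "heme": 0.80, "FeO": 1.96, "His": 0.03},
}


def trp_fraction(pops):
    """Share of the Trp + heme spin population that sits on Trp."""
    return pops["Trp"] / (pops["Trp"] + pops["heme"])


for (setup, enzyme), pops in POPULATIONS.items():
    total = sum(pops.values())
    print(f"setup {setup} {enzyme}: total = {total:.2f}, "
          f"Trp share = {trp_fraction(pops):.2f}")
```

With setup A the Trp share is intermediate in both enzymes, whereas with setup B it is above 0.9 in CcP and close to 0.2 in APX, which is the qualitative change described above.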
In the present work, we have also carried out QM/MM calculations using a larger QM region (but the same system setup) in which all of the heme side chains are included (this QM region is shown in Scheme 3). This QM region has a total charge of −1, as opposed to +1 for the smaller model, due to the inclusion of the two negatively charged carboxylate groups. These calculations lead to similar results as with the smaller QM region, except that slightly more oxidation occurs in both enzymes on the heme group (ρTrp = 0.94, ρheme = 0.11 for CcP, and ρTrp = 0.07, ρheme = 0.94 for APX). The alkyl and vinyl groups are slightly electron donating, and this may explain why the heme group is slightly more oxidized, although this is in any case a small effect. We note that the calculated electronic structure involves no unpaired electron spin density on the heme propionate side-chain carboxylate oxygens in APX or in CcP. In the case of the Compound I species of bacterial cytochrome P450, QM/MM calculations in another group led to significant spin density corresponding to the third unpaired electron on one of the carboxylate groups.37 It was suggested that this plays a role in tuning the reactivity of the active species. Other groups, however, have not found any spin density on these oxygen atoms.38 In APX, the substrate, ascorbate, binds to the protein through hydrogen bonds to the heme side-chain propionate groups.20 This suggests that electron transfer may involve electrons moving through the propionate groups, and it would have been of particular interest had some unpaired electron density been found on these groups. It is possible that calculations in the presence of bound substrate would indeed lead to spin localization on the propionate sidechains, but we have not yet carried out such calculations.

Taking into account the fact that the B3LYP method presumably continues to erroneously exaggerate the degree of delocalization, both sets of calculations, with the QM regions of Schemes 2 and 3, effectively correspond to complete localization, on Trp in CcP and on the heme group in APX. This is in very good agreement with experiment. Considering the tendency of DFT to delocalize charge, the preference must also be a fairly large one. As there is very little structural relaxation in either the heme or Trp groups upon oxidation, studies of model systems35 suggest that the difference of ionization potential of the heme and indole groups must in each case be of close to or more than 10 kcal/mol for such near-localized solutions to be found.
In our published work,14 we confirmed this by calculating oxidation potentials for the two groups using the same starting geometry as used for the QM/MM optimization, but a different setup. First, MM point charges for the surface groups of the two enzymes were modified so as to obtain an overall charge of −1 for the system, as opposed to −6 and −8, respectively, for APX and CcP using standard MM point charges. This reflects the long-range effects of solvation and counterions, which are only partly accounted for in a medium-sized QM/MM model. The change in overall charge changes the electrostatic potential in the region occupied by the QM region, and hence affects the calculated ionization potentials. It does not, however, introduce significant changes in the electric field across the QM region, so that QM/MM calculations using the modified charges and the full QM region of Scheme 2 lead to almost exactly the same calculated electronic structure as using the normal charges. The QM/MM calculations of the ionization potentials use smaller QM regions: for ionization of the Trp side chain, only the indole ring was treated at the QM level, using MM charges on the heme group suitable for a reduced, Compound II-like, species. For oxidation of the heme group, only the porphyrin ring (without substituents) and the proximal imidazole ring of the ligating His were treated at the QM level, with MM parameters for neutral Trp used.
With these settings, the ionization potentials are calculated to be 85.2 and 64.6 kcal/mol, respectively, for the heme and Trp groups in CcP, and 73.9 and 79.4 kcal/mol in APX.14 As can be seen, in each case, the ionization potential calculated for the group that is present in oxidized form in the Compound I species is lower than for the other group. The difference is substantial: 20.6 kcal/mol in CcP and 5.5 kcal/mol in APX. This also corresponds to a relative change in redox potentials of 26.1 kcal/mol, or about 1.1 eV, between the two proteins. This confirms that the environment of the protein has a large effect on the oxidation potentials of both groups: the heme group is easier to oxidize in APX than in CcP, and the Trp group is easier to oxidize in CcP than in APX. Clearly, the protein environment has been tuned to stabilize the “correct” electronic structure of the respective Compound I species.
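The differences quoted in this paragraph follow directly from the four calculated ionization potentials. The short sketch below reproduces that arithmetic; the variable names and the sign convention (heme minus Trp, so that a positive gap means Trp is the easier group to oxidize) are illustrative assumptions.

```python
KCAL_PER_MOL_PER_EV = 23.0605  # 1 eV expressed in kcal/mol

# Calculated ionization potentials (kcal/mol) quoted in the text.
IONIZATION_POTENTIAL = {
    "CcP": {"heme": 85.2, "Trp": 64.6},
    "APX": {"heme": 73.9, "Trp": 79.4},
}

# Positive gap: Trp is easier to oxidize; negative gap: heme is easier.
gap = {enzyme: ip["heme"] - ip["Trp"]
       for enzyme, ip in IONIZATION_POTENTIAL.items()}

# Change in the relative oxidation preference between the two proteins.
relative_shift_kcal = gap["CcP"] - gap["APX"]
relative_shift_ev = relative_shift_kcal / KCAL_PER_MOL_PER_EV

for enzyme, value in gap.items():
    easier = "Trp" if value > 0 else "heme"
    print(f"{enzyme}: IP(heme) - IP(Trp) = {value:+.1f} kcal/mol ({easier} favoured)")
print(f"relative shift: {relative_shift_kcal:.1f} kcal/mol = {relative_shift_ev:.2f} eV")
```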
Perhaps the most interesting question, however, is how the proteins achieve this remarkable change. As already mentioned, the structure in the immediate environment of the heme group is very similar in the two proteins. However, it has been noted25 that a cation-binding site is present in APX roughly 8 Å from the proximal Trp group, which is absent in CcP. When an analogous cation binding site was engineered into CcP by site-directed mutagenesis, the enzyme did not form a stable Trp 191 radical in the presence of potassium ions.39, 40 Based on this evidence, it was proposed that the metal binding site in APX prevents the formation of a Trp cation radical by electrostatic destabilization. However, an ascorbate peroxidase structure from soybean without a bound cation in its substrate-free form was crystallized recently.26 This enzyme still forms a porphyrin cation radical,41 which indicates that the absence of a cation is not sufficient to allow the formation of a stable Trp radical. Three methionine side chains and their backbone carbonyl groups in the vicinity of the Trp group in CcP have also been suggested42, 43 to stabilize the charged indole ring in CcP Compound I. Mutants of CcP in which these groups were modified to the corresponding groups in APX exhibited a lesser stability of the Trp radical cation.43 Very recently, a mutant of APX in which three methionine groups have been introduced into these positions exhibits reduced stability of the heme positive charge, and properties more similar to CcP Compound I.44
We have investigated14 the effect of these groups by “turning off” the corresponding point charges in the QM/MM calculations, either separately or several at a time. For APX Compound I, zeroing the charge on the metal ion and the carboxylate to which it is bound leads to a very small change in spin density on Trp, from 0.21 unpaired electrons to 0.24. Zeroing the charges on the side chain of Met 230, the closest of the three Met residues in CcP, reduces the spin density on Trp from 0.97 to 0.94. Additionally turning off the charges on the side chain of Met 231 decreases the density further, to 0.93. This is, however, a small change, unlikely to account alone for the difference between the two enzymes. When the charges on the backbone carbonyl group of Met 230 as well as on the side chain are neutralized, a bigger effect is obtained (ρTrp = 0.83), but this is not a physical change, as the mutant will still have a peptide bond. In short, these calculations suggest that the electrostatic effects of these groups are quite small, and cannot explain on their own the full difference in electronic structure. Note that our calculations with modified charges were not preceded by molecular dynamics equilibration. While crystallographic evidence shows that the CcP mutants in which Met groups have been changed have a rather similar structure to the wild-type enzyme,43 it is likely that changing the groups leads to small changes in tertiary structure that modify the electrostatic potential, and that it is these changes in conformation that account for the different properties of the mutants, rather than the direct difference in polar character of the corresponding side chains.
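The idea of “turning off” point charges can be illustrated at the level of the classical electrostatic potential alone. The sketch below is not the QM/MM procedure used here (which re-converges the QM wavefunction in the modified field); it only shows how the potential at a probe site changes when the charges of selected residues are skipped. All residue labels, charges, and coordinates are made-up toy values, and `potential_at` is a hypothetical helper.

```python
import math

BOHR_PER_ANGSTROM = 1.0 / 0.529177210903


def potential_at(site, mm_atoms, switched_off=()):
    """Potential (atomic units) at `site` from MM point charges.

    site: (x, y, z) in angstrom.
    mm_atoms: iterable of (residue_label, charge_in_e, (x, y, z) in angstrom).
    switched_off: residue labels whose charges are treated as zero.
    """
    total = 0.0
    for residue, charge, position in mm_atoms:
        if residue in switched_off:
            continue  # emulate zeroing this residue's point charges
        distance_bohr = math.dist(site, position) * BOHR_PER_ANGSTROM
        total += charge / distance_bohr
    return total


# Toy environment: labels, charges, and positions are invented for illustration.
toy_mm = [
    ("res_A", -0.10, (4.0, 0.0, 0.0)),
    ("res_A", +0.10, (5.5, 0.0, 0.0)),
    ("res_B", -0.50, (0.0, 6.0, 0.0)),
]
site = (0.0, 0.0, 0.0)

full = potential_at(site, toy_mm)
without_a = potential_at(site, toy_mm, switched_off={"res_A"})
print(f"full: {full:.4f} a.u., res_A off: {without_a:.4f} a.u., "
      f"contribution of res_A: {full - without_a:.4f} a.u.")
```

Because a neutral dipolar group contributes charges of both signs, its net effect on the potential at a distant site tends to be small, which is consistent in spirit with the small changes in spin density reported above.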
One obvious point to analyze is whether the electrostatic potential created by the protein environment at different positions within the QM region is different. We have computed this potential here, using the modified MM point charges in which the charged groups at the surface of the protein have been neutralized to account for solvation and counterions. As already noted, these point charges lead to the same electronic structure as the full set of charges, suggesting that the electric field (change in electrostatic potential) in the QM region must be fairly similar in the two cases. We first consider the potential at the Trp indole ring of both enzymes. Because this group is mostly oxidized within the five-membered N-containing ring, we have calculated the average electrostatic potential created by the MM charges at the positions of the five nuclei of this ring. The five values are in each case narrowly spread around the average value (in atomic units) of −0.035 in APX and of −0.075 in CcP. This difference would lead to a predicted difference in ionization potential of ca. 25 kcal/mol using simple electrostatics and neglecting polarization effects. This value is of the same order of magnitude as obtained in the explicit QM/MM ionization potential calculations. The spread within the set of electrostatic potentials at the nuclei of the heme ring is larger. Nevertheless, there is still a clear differential effect of the protein environment. At the site of the iron atom, the potential is −0.046 in APX vs. −0.002 in CcP, and the average over the values at the positions of the iron, the four heme nitrogens, and the four meso carbons (the latter eight atoms are those with the largest contribution to the a2u-type orbital, which is oxidized in APX) is −0.050 in the APX environment, vs. −0.013 in CcP. This corresponds to a rough change of ionization potential for the heme group of 23 kcal/mol.
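The “simple electrostatics” estimate in this paragraph amounts to multiplying a difference in electrostatic potential by a unit positive charge and converting from atomic units to kcal/mol. The sketch below does this for the rounded average potentials quoted in the text, assuming the standard conversion factor of about 627.5 kcal/mol per hartree; the helper name is illustrative. With these rounded inputs it gives approximately 25 kcal/mol for the Trp ring and approximately 23 kcal/mol for the heme group.

```python
HARTREE_TO_KCAL_PER_MOL = 627.5095


def ip_shift_kcal(phi_reference, phi_other):
    """Estimated change in ionization potential (kcal/mol) on moving a group
    from the `phi_reference` environment to the `phi_other` environment.

    Removing an electron leaves a +1 charge, whose energy in a potential phi
    is +phi (atomic units), so a more negative potential lowers the IP.
    Polarization is neglected.
    """
    return (phi_other - phi_reference) * HARTREE_TO_KCAL_PER_MOL


# Average MM potentials (atomic units) quoted in the text.
trp_shift = ip_shift_kcal(phi_reference=-0.075, phi_other=-0.035)   # CcP -> APX
heme_shift = ip_shift_kcal(phi_reference=-0.050, phi_other=-0.013)  # APX -> CcP

print(f"Trp IP is higher in APX than in CcP by about {trp_shift:.1f} kcal/mol")
print(f"heme IP is higher in CcP than in APX by about {heme_shift:.1f} kcal/mol")
```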
The magnitude of these effects should not perhaps be taken too literally, due to the fact that these quantities are calculated for a single geometry, and do not take into account protein dynamics. Also, the treatment of long-range electrostatics is certainly somewhat unsatisfactory. Nevertheless, the effect calculated here is so large that it clearly shows that these two enzymes have evolved to create electrostatic potentials in their active site which lower the oxidation potential of the heme group in APX, and of the Trp side chain in CcP.
Although these calculations confirm that electrostatic effects are key in defining the electronic structure of the Compound I species, they do not directly address the broader question of which structural features are responsible for the difference. This question has been the object of extensive theoretical and experimental work, on CcP and APX26, 42-44 as well as on other proteins. Theoretical work has mainly used electrostatic potential evaluation, using either static protein structures3 or averaging over many conformations.45, 46 This work has led to the recognition of several important factors affecting the redox potentials of oxidizable sites in proteins. These include the destabilizing effect of creating charged groups in a low-dielectric hydrophobic environment,3 variability in the screening of charged groups,3 and the fact that the bulk of most changes in potential is a cumulative effect of the structure of all groups in the protein, as well as the solvent, and so is a property of the whole protein rather than of a single group.47 For CcP and APX, all these effects were found to be important in defining the relative redox potential of the proximal Trp side chain.42 For heme proteins, computations suggest that shielding of the propionate side chains plays an important role in defining redox potentials.3 Experimentally, as already discussed, various mutants of APX and CcP have been prepared and their active species characterized. Extensive work has also been done on the role of overall structure, metal ligands, close-lying charged groups, solvent exposure, and other factors in determining the redox potentials of heme groups.2, 48 Recently, it has been shown that the modified reactivity of a cytochrome P450 mutant, in which a nonpolar phenylalanine residue is replaced by another group, is due to changes in the geometry of the heme side chains, especially the vinyl and propionate groups.49
In this respect, analysis of the final QM/MM structures shows one interesting aspect relating to the shielding of the propionate side chains (Fig. 1). In CcP (shown on top), the negative charge on the propionate side chains is not counterbalanced by positive groups: although the carboxylates interact with the positively charged His181, the latter also forms a salt bridge with the negatively charged Asp37. In APX, however, the positively charged Arg172 interacts with the propionates. This difference was noted by Poulos et al.49 in their recent article, and a mutant was prepared in which Arg172 of APX was changed to a neutral asparagine. The Compound I species in this mutant does not display fully heme-centered character, suggesting that the positive charge plays a role in stabilizing Compound I. Finally, it should be noted that under physiological conditions, the Compound I species in APX will probably only be formed in the presence of the bound substrate, ascorbate. As noted above, the crystal structure of the ascorbate complex with the APX resting state20 shows that ascorbate binds by forming hydrogen bonds to the propionate oxygen atoms. This will affect the electrostatic environment of the heme and Trp groups. However, the spectroscopic data relevant to our work refer to APX Compound I in the absence of ascorbate.

Figure 1. Optimized QM/MM geometry of the heme group in CcP (top) and APX (bottom), showing the heme group with all side chains, the proximal histidine and tryptophan side chains, and the charged groups in the vicinity of the propionate carboxylate groups.
We note that the propionate carboxylate groups, as well as being close to the heme ring, are also very close to the Trp side chain. For example, in CcP, the distances from the carboxylate carbon atoms of the two propionates to the iron atom are 7.6 and 7.8 Å. The carbon atom in the closer of the two carboxylates is only 5.1 Å from the closest carbon of the Trp group, and the distance to the closest atom in the oxidized five-membered ring of the indole is only 7.3 Å. As can be seen in Figure 1, the conformation of one of the propionate groups in CcP is different from that of the others, due to rotation around the Cheme–CH2 bond, which places the carboxylate “under” the heme group. In the ferric resting state, the carboxylate is above the plane; this change occurs during MD equilibration, as noted above. The crystal structure36 of CcP Compound I does not show this rotation, but inspection of the structure suggests that both conformations should probably be accessible and may be present. In APX, the distance between the propionate carboxylate groups and the Trp side chain is slightly larger, with the smallest C–C distance being 6.1 Å. Clearly, if shielding of the propionate side chains can affect the redox potentials of heme groups, then inspection of the structures suggests that it could also affect the redox potential of the Trp group in APX and CcP. The greater proximity to this group and lesser shielding of the carboxylates in CcP might partly explain the preferred oxidation of the indole side chain.
In summary, our QM/MM calculations correctly reproduce the electronic structure of the Compound I intermediates in CcP and APX. As the QM region is the same in both calculations, this must be mainly due to differences in the electric field across the QM region created by the protein bulk. Analysis of the structures suggests that screening of the propionate may play a role in explaining this. However, given that we consider only one static snapshot, and do not include a proper representation of the long-range electrostatics, this conclusion is only provisional.
The discussion above focuses on the electronic structure of the Compound I species. We have not mentioned the details of the structures derived from QM/MM optimization. Some of these details are discussed in our previous article;14 here we simply note some aspects that could potentially relate to the question of electronic structure. First, all our calculations lead to a structure in which the proximal Trp side chain is not deprotonated, but forms a strong hydrogen bond to one of the oxygen atoms of the proximal Asp carboxylate. The other Asp oxygen, in turn, forms a strong hydrogen bond to the iron-ligating imidazole side chain of histidine. In CcP, in which the indole group is cationic, this network of hydrogen bonds is slightly stronger, but the protons remain on the same atoms. This finding agrees with previous QM-only calculations at the DFT level.31
We have also investigated the protonation state of the ferryl oxygen atom in Compound I of CcP. In this enzyme, because the third unpaired electron is situated on a site fairly remote from the heme group, the Compound I species effectively behaves like a reduced Compound II species. There has been recent X-ray crystallographic evidence for Compound II species of related heme enzymes, including horseradish peroxidase50 and myoglobin,51 suggesting that the ferryl oxygen atom is protonated. For myoglobin Compound II, a combined X-ray crystallographic and computational study also supported a protonated structure.52 X-ray absorption spectroscopy of chloroperoxidase provides very strong evidence that the ferryl oxygen in Compound II is basic and therefore protonated.53 This enzyme, however, differs from the others discussed here in that the iron atom is bound to a cysteinate side chain, rather than a histidine, and may therefore have different acid-base properties. In contrast to these suggestions, three extended X-ray absorption fine structure (EXAFS) measurements indicate a short Fe–O bond for Compound II in myoglobin54 and HRP.53, 55
Using the QM model of Schemes 2 or 3, protonation of Compound I in CcP is only possible if we add an extra proton, and this proton cannot be removed by any of the neighboring groups. Accordingly, we have carried out14 additional calculations using a modified system set-up (“C” in ref.14) and the enlarged QM region shown in Scheme 4. As well as the heme group and the Trp, Asp, and His side chains, this includes the side chain of His 52 in the distal pocket and a water molecule, which is found to hydrogen bond to the ferryl oxygen atom during molecular dynamics equilibration. Unlike in the previous calculations, in which His 52 was modeled in a neutral form, the imidazole side chain in these calculations was in its protonated form. Various QM/MM geometry optimizations were carried out using different starting geometries, including ones in which the proton on the histidine is transferred to the ferryl oxygen. Geometry optimization leads in each case back to the form with an unprotonated ferryl oxygen. However, optimization with a constraint such that the proton lies on the ferryl oxygen does not lead to a significantly higher energy, suggesting that the pKa values of this group and of the distal histidine are probably fairly close to each other. As for a histidine group, then, the protonation state of the ferryl oxygen may vary depending on the conditions, and this may help to explain the conflicting experimental data mentioned above.
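The connection between a small energy difference and similar pKa values can be made concrete with the standard relation ΔpKa = ΔG/(RT ln 10), which corresponds to roughly 1.36 kcal/mol per pKa unit at room temperature. The sketch below assumes that the energy difference from a constrained optimization can be treated as a free-energy difference, which the calculations themselves do not establish. The function name and the input values are illustrative only and are not results from our work.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def delta_pka(delta_g_kcal, temperature=298.15):
    """Convert a proton-transfer free energy (kcal/mol) into a pKa difference."""
    return delta_g_kcal / (R_KCAL * temperature * math.log(10))

# Illustrative inputs only; these are NOT values from the calculations above.
for dg in (1.0, 2.0, 5.0):
    print(f"dG = {dg:.1f} kcal/mol -> delta pKa = {delta_pka(dg):.2f}")
```

On this scale, an energy difference of a few kcal/mol already corresponds to only a few pKa units, which is why a small computed difference is compatible with a protonation state that depends on conditions.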

In the present context, perhaps the most noteworthy aspect of these calculations is that they too lead to an electronic structure in which the third unpaired electron of the Compound I is located in large part (ρTrp = 0.90) on the Trp side chain. This shows that the residues in the distal pocket do not play an important role in defining the electronic structure of this species.
The above calculations were carried out using a starting structure derived from the resting state of CcP, after hydration and MD equilibration. In the corresponding QM/MM optimized structure of the protonated form of Compound I, the N–H proton on a tryptophan side chain (Trp 51, see Fig. 2) is in a position to provide a favorable interaction with the ferryl oxygen (rH–O = 2.39 Å). However, the side chain of another protic group in the distal pocket, Arg 48, adopts a conformation that does not enable it to form a hydrogen bond to the hydroxo group. The distance between the proton of the NH group on the Arg 48 side chain and the ferryl oxygen is as large as 4.68 Å. In the high-resolution X-ray structure of CcP Compound I (CcP-3),36 this residue is much closer to the iron, with a distance between the N on the Arg end-group and the ferryl oxygen of 2.76 Å, compared with 4.62 Å in our calculated geometry. This recent structure is currently not available in the Protein Data Bank, but there is a lower-resolution structure (1EBE23) for CcP Compound I, in which Arg 48 is also fairly close to the ferryl group. Because the position of these protic groups may play a role in stabilizing the protonated form of Compound I (indeed, a change in the position of Arg 48 between the resting state and Compound I was observed36), it is important to probe alternative geometries.

Figure 2. Optimized QM/MM geometry of the heme group and surroundings in CcP. The starting structure was derived from a constrained MD simulation in which Arg 48 was placed at a fixed distance from the ferryl oxygen.
In this work, we have therefore carried out new calculations on CcP Compound I, starting from structure 1EBE. The system was set up as in setup “C” of our previous work.14 However, during molecular dynamics equilibration Arg 48 moved away from the ferryl group. This equilibration was therefore repeated with a constraint such that the Arg 48–ferryl O distance was fixed at 2.76 Å, as in the recent crystal structure. As in the previous calculations, a water molecule became inserted between the ferryl oxygen and the distal histidine during equilibration. After this setup, QM/MM geometry optimization was carried out using the QM region shown in Scheme 4, leading to the optimized structure shown in Figure 2. The geometry obtained includes hydrogen bonds from both Arg 48 and Trp 51 to the ferryl oxygen (rH(Arg)–O = 1.86 Å and rH(Trp)–O = 1.98 Å), but no proton transfer to the ferryl oxygen occurs, and the Fe–O bond remains short (1.67 Å). This shows that the geometry used in our previous calculations,14 despite having Arg 48 situated further from the ferryl oxygen than in the crystal structure of ref.36, is not biased against proton transfer to the ferryl oxygen. In turn, this again suggests that protonation of the ferryl oxygen is at the very least not highly favorable, although it may be possible under certain conditions.
The electronic structure obtained in this calculation is rather similar to that obtained in the other calculations on CcP Compound I. Two unpaired electrons are located on the ferryl group, and one mainly on the proximal Trp side chain (ρTrp = 0.81, ρheme = 0.25; ρFeO = 1.93; ρHis = 0.01). Unlike in the previous calculations, both propionate side chains have the same orientation, with the carboxylate groups positioned “above” the heme ring (compare top part of Fig. 1 with Fig. 2). This shows that the orientation of these propionate side chains is not a decisive factor in establishing the electronic structure of CcP Compound I. The ionic environment of these groups may, however, be important, as discussed previously—the closest carboxylate group is still very close to the proximal Trp side chain.
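As a simple consistency check, the fragment spin populations quoted above can be summed; for a species with three unpaired electrons the total should be close to 3. The sketch below uses only the four values reported in this paragraph. The function and variable names are illustrative, and the fragment partitioning is taken as given.

```python
# Fragment spin populations quoted in the text for the new CcP Compound I calculation
spin_populations = {"Trp": 0.81, "heme": 0.25, "FeO": 1.93, "His": 0.01}

def total_spin(populations):
    """Sum the fragment spin populations (number of unpaired electrons)."""
    return sum(populations.values())

def share_outside_ferryl(populations, fragment):
    """Fraction of the spin density outside the FeO unit that sits on `fragment`."""
    outside = total_spin(populations) - populations["FeO"]
    return populations[fragment] / outside

print(f"total spin population: {total_spin(spin_populations):.2f}")
print(f"Trp share outside FeO: {share_outside_ferryl(spin_populations, 'Trp'):.2f}")
```

The four values sum to 3.00, consistent with two unpaired electrons on the ferryl group and a third located mainly on the Trp side chain.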
Cytochrome P450 Enzymes
Cytochrome P450 (Cyp) enzymes56 play a major role in the oxidative metabolism of exogenous hydrophobic chemicals, for example, drug compounds, and thereby contribute significantly to important pharmaceutical properties of drugs such as their bioavailability. As in the peroxidases, the key intermediate in the oxidative catalytic cycle is a high-valent Compound I species. The heme group in Cyp enzymes is bonded to the protein through a cysteinate side chain rather than the histidine group found in peroxidases, and this accounts in part for the different chemistry. Extensive computational work has been carried out on the electronic structure of the active species and its reactivity, using both gas-phase models6, 57, 58 and a QM/MM treatment of the bacterial isoform, P450cam.6, 12, 37, 38 This work has led to an improved understanding of the electronic structure of the Compound I intermediate, of the reaction mechanisms involved in its formation and its reactions with substrates, and of the reactivity of different substrates.
One of the most interesting aspects of the chemistry of the Cyp enzymes is their selectivity patterns. Different isoforms can oxidize the same substrates in different positions, and different substrates can undergo different reactions with a given isoform. For example, the drug molecule diclofenac is oxidized in different positions by the human P450 isoforms 2C9 and 3A4.59 There can, of course, be many reasons for this difference in selectivity, including aspects such as substrate binding in the active site and the shape of the active site in the vicinity of the heme group.
One other possibility is of particular significance given the topic of this study. As for the Compound I species in peroxidases, the electronic structure of the P450 Compound I species is somewhat variable depending on the environment. This again concerns the position in which the third unpaired electron is situated. Calculations on model compounds in the gas phase lead to this electron being situated almost equally on the proximal cysteinate sulphur atom and the a2u orbital of the heme group. Only upon adding the hydrogen bonding groups close to this sulphur atom, or by carrying out QM/MM calculations, does the third electron localize mainly on the heme ring. This change in electronic structure depending on the environment has led to Compound I being called a “chameleon” species.60 QM/MM calculations also show that the electronic structure is responsive to the details of the environment.38 Finally, calculations on gas-phase models, using an external electric field to mimic the possible effects of the protein environment, have shown that the reactivity of Compound I can be changed by the environment.61 Depending on the orientation of the electric field, the electronic structure of Compound I varies somewhat, and the relative height of the calculated energy barriers for C–H bond activation and epoxidation of propene is changed.
Based on all these observations, it is possible that the different selectivity profiles of the different P450 isoforms are due to a different electronic structure of the Compound I intermediate. The work described in the previous section makes this possibility highly plausible: although the different isoforms have a similar structure in the immediate vicinity of the heme group, this is true for CcP and APX also, yet their Compound I species are very different. To explore whether several typical P450 isoforms have a different electronic structure, we have carried out QM/MM calculations using a very similar approach to that used for CcP and APX on two human isoforms (2C9 and 3A4), one further mammalian form (2B4), and the bacterial P450cam, which had already been studied by other groups.37, 38
As for the calculations on CcP and APX, our calculations15 first examined the effect of a number of computational parameters on the calculated electronic structure, focusing initially on P450 2C9. We examined the effect of varying the size of the QM region, the size of the basis set, the density functional, and the treatment of the nonbonded van der Waals terms at the QM/MM boundary. These parameters changed the electronic structure only minimally. Using a large QM region similar to that shown in Scheme 3 did not lead to any unpaired electron density on the heme side-chain propionate groups, unlike in one previous QM/MM study,37 but in agreement with another one.38 We also found that the presence of substrate does not affect the calculated electronic structure. One factor that does change the calculated electronic structure is the way in which the calculations are set up prior to QM/MM optimization. For example, use of the hydrated crystal structure, or of a structure obtained after MD equilibration, led to notable differences in the unpaired electron density on the heme group and cysteinate sulphur. The spin density on sulphur can vary from ca. 25 to as much as 50%. Using different snapshots from the MD equilibration also led to changes in the spin density on sulphur. This confirms that Compound I in cytochrome P450 is indeed a chemical chameleon, which can change electronic structure depending on the environment.61
However, when we then computed the electronic structure for the other P450 isoforms 3A4, 2B4, and P450cam,15 we found that the differences from one enzyme to another were by no means as large as the differences from one calculation to another for a given isoform. The thermal variation in the structure of the enzymes leads to at least as much variability in the electronic structure as does the chemical difference from one isoform to another. The only exception is that P450cam appears to have a slightly lower degree of delocalization of the third unpaired electron onto the sulphur atom, due to slightly stronger hydrogen bonds being formed to this atom.
It is, of course, possible that the different electric field effects created by the different enzymes will cause a greater change in electronic structure, and hence, in reactivity at the transition states for oxidation of substrates. As a consequence, it is not possible to completely rule out that such effects are responsible for the different selectivity of the different isoforms. However, the results summarized here suggest that unlike in CcP and APX, there is no large change in electronic structure in Compound I from one P450 isoform to another.
Conclusions
QM/MM calculations can now be applied routinely to calculating the properties of large metalloproteins, using relatively high levels of QM theory, typically DFT. As for molecular species, this level of theory is expected to predict accurately the electronic structure of the metal centre in these proteins. The QM/MM calculations also enable one to explore effects that are specific to biomolecules, and in particular, the polarizing effects caused by the electric field created by the protein in the active site.
In this article, we review our recent calculations on two different types of heme protein, the heme peroxidases APX and CcP, and the cytochrome P450 enzymes, and also present some new results on the APX and CcP cases. This first set of enzymes provides a very striking example of how the protein environment can have a significant influence on the electronic structure of a reactive intermediate. The QM/MM calculations reproduce the experimental electronic structures of Compound I in CcP and APX, despite the fact that the QM region in these two sets of calculations is identical. This conclusion appears to be fairly robust, as many different calculations lead to essentially the same result. This is a remarkable observation: because of a shortcoming in current DFT methods, which favor delocalized solutions over localized ones, one might expect that both enzymes would lead to a similar, delocalized wavefunction in which the third unpaired electron in the Compound I species is located partly on the heme group and partly on the Trp side chain. That this is not the case shows that the energy preference in each enzyme for the observed wavefunction must be significant, probably at least 5 or 10 kcal/mol.
The calculations do not provide a clear answer as to why the two enzymes share a common intermediate but with such different properties. This is clearly mainly an electrostatic effect, as is shown by calculating localized ionization potentials, and by computing the electrostatic potential created by the protein in different regions of the active site. The effect cannot easily be attributed to any single group, however. As well as aspects that have been noted before, such as the metal ion binding site in APX and the Met side chains in CcP, we suggest that the different hydrogen bonding environment around the heme propionate groups in the two enzymes may be one contributing factor.
Our calculations include only a small number of separate geometries, and sampling a broader range of conformations would be important. Also, we use relatively limited QM/MM models (∼7000 atoms), which cannot describe the long-range electrostatic effects of solvation and counterions, and it would be desirable to extend the models to test the stability of the conclusions. Finally, it would be desirable to carry out QM/MM calculations using either improved DFT methods that do not suffer from an artefactual preference for delocalized unpaired electrons, or correlated ab initio methods.
Despite these uncertainties, our calculations show that electrostatic effects can be important in determining the electronic structure of reactive intermediates in metalloenzymes. This “chameleonic” character of Compound I in particular is also found in cytochrome P450, where protein fluctuations lead to changes in the calculated electronic structure. However, no systematic difference is observed from one P450 isoform to another, suggesting that this effect does not play a major role in determining selectivity in these enzymes.
Acknowledgements
The authors thank J. W. Ponder for the Tinker molecular modelling package.