Volume 99, Issue 12 pp. 955-968
Invited Review

A historical account of Hoogsteen base-pairs in duplex DNA

Evgenia N. Nikolova

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Integrative Structural & Computational Biology Department, The Scripps Research Institute, 10550 North Torrey Pines Road, La Jolla, CA, 92037

Huiqing Zhou

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Federico L. Gottardo

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Heidi S. Alvey

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Isaac J. Kimsey

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Hashim M. Al-Hashimi

Corresponding Author

Department of Chemistry & Biophysics, The University of Michigan, 930 North University Avenue, Ann Arbor, MI, 48109-1055

Correspondence to: Hashim M. Al-Hashimi; e-mail: [email protected]
First published: 02 July 2013
Citations: 93

This article was originally published online as an accepted preprint. The “Published Online” date corresponds to the preprint version. You can request a copy of the preprint by emailing the Biopolymers editorial office at [email protected]

ABSTRACT

In 1957, a unique pattern of hydrogen bonding between N3 and O4 on uracil and N7 and N6 on adenine was proposed to explain how poly(rU) strands can associate with poly(rA)-poly(rU) duplexes to form triplexes. Two years later, Karst Hoogsteen visualized such a noncanonical A–T base-pair through X-ray analysis of co-crystals containing 9-methyladenine and 1-methylthymine. Subsequent X-ray analyses of guanine and cytosine derivatives yielded the expected Watson–Crick base-pairing, but those of adenine and thymine (or uridine) did not yield Watson–Crick base-pairs, instead favoring “Hoogsteen” base-pairing. More than two decades ensued without experimental “proof” for A–T Watson–Crick base-pairs, while Hoogsteen base-pairs continued to surface in AT-rich sequences, closing base-pairs of apical loops, in structures of DNA bound to antibiotics and proteins, damaged and chemically modified DNA, and in polymerases that replicate DNA via Hoogsteen pairing. Recently, NMR studies have shown that base-pairs in duplex DNA exist as a dynamic equilibrium between Watson–Crick and Hoogsteen forms. There is now little doubt that Hoogsteen base-pairs exist in significant abundance in genomic DNA, where they can expand the structural and functional versatility of duplex DNA beyond that which can be achieved based only on Watson–Crick base-pairing. Here, we provide a historical account of the discovery and characterization of Hoogsteen base-pairs, hoping that this will inform future studies exploring the occurrence and functional importance of these alternative base-pairs. © 2013 Wiley Periodicals, Inc. Biopolymers 99: 955–968, 2013.

INTRODUCTION

In 1953, sixty years ago, Watson and Crick proposed their iconic double helix structure for deoxyribonucleic acid (DNA) based on very little experimental data.1 Although the structure is best known for its double helical appearance, its most important feature was, and remains to this date, the specific pairing of purine with pyrimidine nucleobases (guanine with cytosine and adenine with thymine) through complementary hydrogen bonds (Figure 1).11 This endowed the structure with the ability to self-duplicate, making DNA, and not proteins as was widely believed at the time, the likely carrier of genetic information.2 Despite the absence of any experimental data in support of the specific pairing proposed by Watson and Crick, and despite the fact that there are alternative modes for pairing purines with pyrimidines, the pairing proposed by Watson and Crick utilized bases in their most probable tautomeric forms and, most importantly, resulted in similar overall shapes for all four base-pair combinations, so that any sequence could be accommodated within the same double helix framework.

Figure 1. Chemical structures of A–T and G–C Watson–Crick (WC) and Hoogsteen (HG) base-pairs.

Although the discovery of the double helix set in motion one of the greatest scientific revolutions, the structure itself was met with a good deal of skepticism. The available X-ray fiber diffraction data obtained on noncrystalline DNA fibers, particularly B-form DNA, did not provide adequate resolution to determine atomic positions. This is because molecules in the fiber are generally not rotationally oriented relative to one another in a regular manner. Indeed, this was the main reason Rosalind Franklin pursued the more complicated diffraction pattern presented by the dry “A-form” version of DNA,3, 4 where the molecules are not in random rotational orientations, allowing for a more objective 3D crystallographic analysis and where one could, in Franklin's own words, “let the data speak for itself.”

While the story of the double helix is well known to scientists and nonscientists alike, it is not commonly known that definitive proof for the DNA double helix structure did not come until 1980 – more than a quarter century after Watson and Crick initially proposed their model – when Drew, Dickerson and coworkers solved the single crystal structure of a DNA dodecamer using heavy atom X-ray crystallography.5, 6 In the ensuing period, experimental evidence began to accumulate for an alternative base-pair, referred to now as the “Hoogsteen” base-pair (Figure 1),7, 8 which, together with other alternative structures of DNA such as left-handed Z-DNA,9 raised doubts about the B-form structure proposed by Watson and Crick. Today, there is little doubt that Hoogsteen (HG) base-pairs do indeed represent an alternative pairing scheme that can expand the structural and functional versatility of duplex DNA beyond that which can be achieved based only on Watson–Crick (WC) base-pairing. The purpose of this review is to provide a historical account of the discovery and characterization of HG base-pairs, hoping that this will inform future studies exploring the occurrence and functional importance of these alternative base-pairs.

PURINE-PYRIMIDINE CO-CRYSTALS

Soon after Watson and Crick proposed their double helix structure, experimentalists rushed to gather data to test its various aspects. Improvements in X-ray cameras and analytical methods for refining models to fit X-ray fiber diffraction data in the following years provided additional evidence in support of the general features of the DNA double helix model. However, fine details of the structure, including the specific base-pairs proposed by Watson and Crick, could not be assessed. Throughout the late 1950s and into the 1970s, much effort was directed toward solving X-ray structures of isolated purine-pyrimidine dimers. The idea was that the monomers might associate to form intermolecular complexes that reflect the pairing that occurs in the double helix. The high-resolution diffraction data afforded by single crystals allowed for an objective characterization of hydrogen bonding interactions between the bases.

The first such study was reported in 1959, when Karst Hoogsteen – an associate of Robert Corey at Caltech – used single crystal X-ray analysis to determine the structures of co-crystals containing 9-methyladenine and 1-methylthymine, where methyl groups were used to block hydrogen bonding to nitrogen atoms otherwise bonded to sugar carbons in DNA.7 Rather than observing a Watson–Crick base-pair, Hoogsteen observed a markedly different pairing scheme (Figure 1), in which the adenine base was flipped upside down. In DNA, such a flip is accomplished by a 180-degree rotation of the adenine base around the glycosidic bond (N9–C1′), changing the base from an anti to a syn conformation. As in WC base-pairs, the thymine base formed two hydrogen bonds with the adenine base, one of which (thymine O4 and adenine N6) is identical to that proposed by Watson and Crick. However, the second hydrogen bond is not between thymine N3 and adenine N1, but, rather, between thymine N3 and N7 of the flipped adenine base (Figure 1). This very same hydrogen bonding scheme was proposed 2 years earlier by Rich and his colleagues to explain how poly(rU) strands might associate with poly(rA)-poly(rU) duplexes to form triplexes.10 Hoogsteen recognized that relative to the scheme proposed by Watson and Crick, this hydrogen bonding scheme required translation of the complementary bases into closer proximity, which requires constriction of the DNA helix diameter by ∼ 2.5 Å.7, 8
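The difference between the two A–T pairing schemes described above can be written down as a small data structure. The following is only an illustrative sketch: the names `WC_AT`, `HG_AT`, and `compare_pairings` are hypothetical, and each hydrogen bond is recorded simply as a (thymine atom, adenine atom) pair taken from the description in the text.

```python
# Hydrogen bonds in A-T base-pairs, recorded as (thymine atom, adenine atom).
# The atom pairs follow the description in the text; the names are illustrative.
WC_AT = frozenset({("O4", "N6"), ("N3", "N1")})  # Watson-Crick scheme
HG_AT = frozenset({("O4", "N6"), ("N3", "N7")})  # Hoogsteen scheme (adenine flipped)


def compare_pairings(a, b):
    """Return (shared, only_in_a, only_in_b) hydrogen bonds as sorted lists."""
    return sorted(a & b), sorted(a - b), sorted(b - a)


shared, wc_only, hg_only = compare_pairings(WC_AT, HG_AT)
print("shared by both schemes:", shared)
print("Watson-Crick only:", wc_only)
print("Hoogsteen only:", hg_only)
```

The comparison makes the point of the paragraph explicit: one hydrogen bond is common to both schemes, and the other switches its adenine acceptor from N1 to N7.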

In 1963, 1 year after Watson and Crick were awarded the Nobel Prize in Physiology or Medicine, single crystal structures were reported for G–C base-pairs using crystals containing 9-ethylguanine and 1-methylcytosine or 1-methyl-5-bromocytosine.11 WC base-pairing was observed in the two cases, even though the two intermolecular complexes crystallized in different space groups and experienced different packing arrangements. As predicted earlier by Pauling and his colleagues,12 G–C base-pairs were stabilized by three and not two hydrogen bonds as proposed by Watson and Crick.1, 2 Subsequent X-ray diffraction analyses of co-crystals involving guanine and cytosine derivatives consistently revealed the expected WC base-pairs.13 However, in sharp contrast, attempts to generate co-crystals of adenine and thymine (or uridine) derivatives failed to yield WC base-pairs and, in most cases, favored “Hoogsteen” base-pairs.13-15

In 1968, Guschlbauer and colleagues proposed the formation of G–C+ HG base-pairs in poly(dG)-poly(dC) at pH 3–4 based on optical rotatory dispersion spectra suggesting guanine adopted a syn conformation.16 As in A–T base-pairs, the transition from WC to HG entails flipping of the guanine base from an anti to a syn conformation and the preservation of hydrogen bonding between cytosine N4 and guanine O6. However, unlike A–T base-pairs, formation of a second hydrogen bond with N7 of the flipped guanine requires protonation of cytosine N3. The transition results in a net loss of one hydrogen bond and the build-up of a positive charge on the cytosine. Because of this, protonated G–C+ HG base-pairs were generally thought to be less energetically favored than A–T HG base-pairs and more stably formed at lower pH.17 G–C+ HG base-pairs were subsequently used to explain how poly(rC) associates with poly(dC)-poly(dG) duplexes to form triplexes under acidic conditions,18 and NMR studies provided chemical shift evidence for protonated G–C+ HG base-pairs at cytosine N3 in a poly(dC)-poly(dC) complex with dGMP at low pH.19 The first crystallographic observation of a G–C+ HG base-pair came many years later for a DNA duplex bound to the bisintercalating antibiotic triostin A.20
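The pH dependence noted above follows from the requirement that cytosine N3 be protonated. A minimal sketch using the Henderson–Hasselbalch relation is given below. The function name is hypothetical, and the default pKa of 4.2 is an assumed, approximate value for N3 of free cytidine; it does not come from the text, and the effective pKa within a duplex or HG base-pair may differ.

```python
def fraction_protonated(pH, pKa=4.2):
    """Fraction of a titratable site that is protonated at a given pH.

    Henderson-Hasselbalch: f = 1 / (1 + 10**(pH - pKa)).
    The default pKa (4.2) is an ASSUMED approximate value for cytosine N3
    in the free nucleoside, used only for illustration.
    """
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


for pH in (3.0, 4.0, 7.0):
    print(f"pH {pH:.1f}: fraction protonated = {fraction_protonated(pH):.4f}")
```

Under this assumption the protonated fraction is large at pH 3–4 and very small near neutral pH, consistent with G–C+ HG base-pairs being favored under acidic conditions.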

HG base-pairs presented an alternative to WC base-pairs, especially for A–T rich sequences, and this contributed to skepticism about the details of the DNA double helix structure. Among the skeptics was Linus Pauling, who had proposed an incorrect triple helix model for the DNA structure the same year that Watson and Crick proposed their double helix model.21 Indeed, Pauling felt that awarding Watson and Crick a Nobel Prize for their discovery was premature “because of existing uncertainty about the detailed structure of nucleic acid” (personal correspondence to the Nobel Committee for Chemistry and Physics). The only real experimental indication that both G–C and A–T/U could form WC base-pairs in double and triple helices came from subsequent solution NMR studies of tRNA and polynucleotide complexes in the 1970s.19,22-27 These studies, which were performed under physiological solution conditions in the absence of potentially perturbing crystal packing forces, showed distinct chemical shift signatures that were consistent with theoretical predictions for WC rather than HG hydrogen bonding in A–T/U base-pairs.

HG BASE-PAIRS IN NAKED DUPLEXES

In 1973, Rich and colleagues reported the single crystal X-ray structure of the AU and GC dinucleoside phosphates and the results were heralded as “the double helix at atomic resolution.”28, 29 These structures verified key aspects of the double helix proposed by Watson and Crick. Both structures revealed a right-handed double helix with two strands running anti-parallel to each other. Importantly, both structures featured WC type base-pairing. This was the first time a WC rather than HG base-pair was observed involving adenine. The structures seemed to put the controversy regarding HG versus WC A–T base-pairs to rest, since it was the WC form that was favored when the bases were constrained in a double helix. As noted by Alex Rich, James Watson phoned him after receiving a preprint of the AU manuscript and said “he had his first good night sleep in 20 years!”30 In retrospect, the observation of A-U WC base-pairs in A-form RNA created a false sense of comfort as recent studies have shown that, in contrast to DNA, HG base-pairs are not likely to form in A-form RNA, as suggested by the lack of HG base-pairs in A-form RNA duplexes in over 1000 high resolution crystal structures surveyed in the PDB.

Six years later, in 1979, following advances in phosphotriester methods for chemical synthesis of large quantities of homogeneous oligonucleotides,31, 32 Rich and colleagues reported the first single crystal X-ray structure of DNA for the d(CG)3 sequence.9 Prior studies had shown that repeating polymers of inosine-cytosine33 and guanine-cytosine34 resulted in a “reverse” circular dichroism (CD) spectrum and this was interpreted as evidence for a left-handed helix. The structure of d(CG)3 revealed a stunning left-handed double helix with an unusual zig-zag shape and was called “Z-DNA.”9 To convert B-DNA into Z-DNA, both bases in the base-pair have to be flipped upside down. As in HG base-pairs, the guanine base is flipped into a syn conformation; however, in Z-DNA, the concomitant flipping of the cytosine base and sugar allows the two flipped bases to regroup into WC base-pairs, with the flipping of the cytosine sugar giving rise to the unusual zig-zag backbone. Thus, although the first single crystal structure of DNA provided evidence for G–C WC base-pairs, it fueled skepticism about the overall structure of the double helix.

In the early 1980s, models were put forward for a Z-DNA structure comprised exclusively of HG base-pairs,35 particularly for A–T rich sequences that frequently exhibited unusual diffraction patterns when dried (referred to as D- or E-type X-ray diffraction patterns).35-37 This form of DNA required helical structures with 7–7.5 base-pairs per turn, which cannot be stereochemically achieved by right-handed B-form DNA. Spectroscopic studies of poly(rA)-poly(rU) sequences that bear substituents at the adenine C2 position, which sterically block WC base-pairing, also suggested formation of duplexes with parallel or anti-parallel chain polarity, in which strands are held together by HG or reverse HG base-pairing, respectively.38-40
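To put the figure of 7–7.5 base-pairs per turn in perspective, it can be converted into an average helical twist per base-pair step. The sketch below is simple arithmetic; the function name is hypothetical, and the comparison value of about 10.5 base-pairs per turn for B-form DNA is a commonly quoted figure that is assumed here rather than taken from the text.

```python
def twist_per_base_pair(bp_per_turn):
    """Average helical twist in degrees per base-pair step."""
    if bp_per_turn <= 0:
        raise ValueError("bp_per_turn must be positive")
    return 360.0 / bp_per_turn


# 7 and 7.5 bp/turn come from the text; 10.5 bp/turn is an ASSUMED,
# commonly quoted reference value for B-form DNA.
for label, n in (("D/E-type, 7 bp/turn", 7.0),
                 ("D/E-type, 7.5 bp/turn", 7.5),
                 ("B-form (assumed), 10.5 bp/turn", 10.5)):
    print(f"{label}: {twist_per_base_pair(n):.1f} degrees per step")
```

A helix with 7–7.5 base-pairs per turn therefore requires roughly 48–51 degrees of twist per step, considerably more than the value assumed here for B-form DNA.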

During the same period, in 1980, Drew and Dickerson reported the first single crystal X-ray structure of a DNA polymer longer than one helical turn.5 They used heavy atom replacement approaches to solve the structure of a synthetic DNA dodecamer sequence d(CGCGAATTCGCG). The structure adopted a right-handed duplex containing WC G–C and A–T base-pairs—precisely as proposed by Watson and Crick. It is this structure that is considered to be crystallographic proof that DNA can indeed adopt the structure proposed by Watson and Crick.

The ability to prepare large quantities of highly pure DNA samples in a facile manner, in parallel with developments in 13C/15N isotopic enrichment and solution state NMR spectroscopy of nucleic acids resulted in the high-resolution X-ray and NMR structure determination of diverse DNA sequences in the 1980s and 1990s showing WC B-form DNA duplexes. However, spectroscopic evidence for HG base-pairs continued to mount in the 1990s and 2000s in the context of A–T rich sequences,41 in poly(dG-dC)-poly(dG-dC) sequences at low pH as possible intermediates along the B-to-Z DNA transition,42 as well as in noncanonical DNA regions as closing base-pairs of apical loops.43, 44 But it was not until 2002, when Subirana and colleagues reported the first crystal structure of an AT-repeat not capped by GC base-pairs, that the first single crystal X-ray structure of a naked DNA duplex containing exclusively HG base-pairs was resolved.45 The structure of d(AT)3 revealed an anti-parallel right-handed double helix made up exclusively of HG base-pairs, with an overall structure similar to that of B-form DNA (Figure 2).45, 46 Key differences included a change in the position of the helical axis relative to the base-pairs, reduction in helical radius and C1′–C1′ distance by ∼2.5–3.0 Å, altered hydrogen bonding donor/acceptor pattern in the major and minor grooves, a narrower and less electronegative minor groove, which favors hydrophobic interactions, and distinct helix stacking and hydration patterns relative to B-DNA. Together, these features provide a distinct physicochemical presentation of the genetic code for potential sequence-specific recognition by the cellular machinery. Similar HG structures were subsequently reported for related sequences d(ATATATCT)47 and d(CGATATATATAT).48

Figure 2. Comparison between an ideal Watson–Crick (WC) and a Hoogsteen (HG) double helix from the crystal structure of d(AT)3 (PDB ID: 1RSB). The B-DNA helix was built using w3DNA.49
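Two of the structural signatures discussed above, a syn purine and a C1′–C1′ distance shortened by about 2.5–3.0 Å, can be combined into a rough screening rule. The sketch below is hypothetical and is not the procedure used in any of the cited studies. The syn/anti ranges follow the conventional definition of the glycosidic torsion χ (syn near 0°, anti near 180°), and the WC reference distance of 10.5 Å and the 1.5 Å cutoff are assumed values chosen only for illustration.

```python
def glycosidic_conformation(chi_deg):
    """Classify the glycosidic torsion chi as 'syn' or 'anti'.

    ASSUMED convention: syn if chi lies within 0 +/- 90 degrees,
    anti otherwise (180 +/- 90 degrees).
    """
    chi = (chi_deg + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    return "syn" if -90.0 <= chi <= 90.0 else "anti"


def classify_pair(purine_chi_deg, c1_c1_distance,
                  wc_distance=10.5, min_shrink=1.5):
    """Label a purine-pyrimidine pair 'HG-like', 'WC-like', or 'ambiguous'.

    wc_distance (angstroms) and min_shrink are ASSUMED illustrative values;
    the text states only that the C1'-C1' distance is shorter by about
    2.5-3.0 angstroms in the HG helix.
    """
    syn = glycosidic_conformation(purine_chi_deg) == "syn"
    constricted = c1_c1_distance <= wc_distance - min_shrink
    if syn and constricted:
        return "HG-like"
    if not syn and not constricted:
        return "WC-like"
    return "ambiguous"


print(classify_pair(60.0, 8.0))     # syn purine, short distance
print(classify_pair(-110.0, 10.5))  # anti purine, WC-like distance
print(classify_pair(60.0, 10.6))    # syn purine but no constriction
```

Requiring both criteria matters because a syn purine alone is not diagnostic: as described earlier for Z-DNA, a syn guanine can still take part in a WC base-pair, which this rule reports as ambiguous.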

It is important to note that, in all cases, solution state NMR studies of the above DNA sequences under the same conditions used to grow crystals argued against formation of a HG helix, and in favor of a prototypical WC B-form double helix (Zhou et al. unpublished data).46 This suggests that crystal packing plays an important role in stabilizing the HG double helix.

DNA-ANTIBIOTIC COMPLEXES

In 1984, Rich and colleagues reported the single crystal X-ray structure of a DNA double helix with sequence d(CGTACG) bound to triostin A,50 a cyclic octadepsipeptide anti-tumor antibiotic containing two quinoxaline rings that binds DNA and inhibits replication and transcription in vivo (Figure 3).51, 52 This was the first structure of a peptide antibiotic in complex with an oligonucleotide. The structure showed that the two quinoxaline rings bis-intercalate in the minor groove of the DNA double helix and surround the WC G–C base-pairs, disrupting stacking interactions to the central A–T base-pairs (Figure 3). Remarkably, although the two central A–T base-pairs are not covered by the two triostin A molecules, they form HG rather than WC base-pairs. This marked the first crystallographic observation of the co-existence of WC and HG base-pairs within the same duplex. No direct contacts are observed between the antibiotic and the exposed Watson–Crick face of the A–T bases. Rather, the helical constriction at the HG base-pairs appears to stabilize the complex by allowing close packing of the oligonucleotide around the end of the triostin A. Thus, several favorable van der Waals contacts would be lost if the deoxyribose rings were further apart as in WC base-pairs. Similar structures were subsequently reported for DNA bound to the related echinomycin antibiotic,53 and for triostin A bound to d(GCGTACGC),20 which featured two central A–T HG base-pairs, and two terminal G–C+ HG base-pairs, marking the first crystallographic observation of protonated G–C+ HG base-pairs within a duplex (Figure 3).

Figure 3. Hoogsteen (HG) base-pairs in a DNA-antibiotic complex. (A) 3D (left) and secondary (right) structures of a DNA octamer in complex with the bis-intercalating antibiotic triostin A (orange), highlighting internal A–T HG base-pairs (red) and terminal G–C+ HG base-pairs (blue) (PDB ID: 1VS2). Top view of the (B) terminal G–C+ HG base-pair and (C) helical A–T HG base-pairs depicting characteristic hydrogen bonds (dashed lines) and stacking between the two base-pairs and the quinoxaline moiety of triostin A.

Soon after, chemical footprinting studies performed in solution showed that sites that form HG base-pairs in X-ray structures of DNA-echinomycin complexes are hyperreactive to diethyl pyrocarbonate (DEPC),54 which preferentially reacts with exposed N7 atoms of syn purines in noncanonical Z-DNA55 and cruciform loops.56 However, these results were challenged by footprinting studies employing DEPC and other reagents that target thymines, that showed little change in thymine chemical reactivity when replacing adenine with 7-deazaadenine, which has a diminished ability to form HG base-pairs.57-60 Moreover, oligonucleotides containing 7-deazaadenine and 7-deazaguanine bound echinomycin with affinity comparable to that of their unmodified counterparts, suggesting that HG base-pairs are not essential for binding.60 These studies argued that hyperreactivity does not arise from formation of HG base-pairs but, rather, from unwinding and extension of the DNA helix upon drug binding.

Subsequent NMR studies by Feigon, Patel, and their co-workers confirmed formation of HG base-pairs in DNA-antibiotic complexes,61, 62 although their occurrence was shown to be highly dependent on sequence, temperature, and pH.63-65 A–T base-pairs generally form WC, and if they ever form HG, they do so transiently at physiological temperatures. Even the terminal HG A–T base-pairs were only favored in DNA-antibiotic complexes having purine 5′ and pyrimidine 3′ to CG (i.e., ACGT, GCGC) and only at low pH for G–C+ HG base-pairs.61, 63, 66, 67

Despite many studies, to date, it remains unclear whether quinoxaline antibiotics stabilize HG base-pairs in DNA in vivo and whether this is related in any way to their biological activity.

DNA-PROTEIN COMPLEXES

In the late 1990s, X-ray structures emerged showing that certain proteins bind and in some cases specifically recognize HG base-pairs embedded in B-form DNA (Figure 4). These studies raised the possibility that proteins exploit the unique structural and chemical features of HG base-pairs in sequence-specific DNA recognition, and therefore, provided evidence for a functional role for HG base-pairs in vivo.

Figure 4. Hoogsteen (HG) base-pairs in DNA-drug/protein complexes. 3D structure (left) and DNA sequence (right) of DNA complexes with the dcm very-short-patch repair (Vsr) DNA endonuclease (PDB ID: 1ODG), p53 tumor suppressor protein (PDB ID: 3IGL), TATA-box binding protein (TBP) (PDB ID: 1QN3), integration host factor (IHF) (PDB ID: 1IHF), MATα2 homeodomain (PDB ID: 1K61), and TnpA transposase (PDB ID: 2A6O), highlighting the A–T (red) and G–C+ (blue) HG base-pairs (5mC, 5-methylcytosine; 5IU, 5-iodouridine; dashed lines indicate unpaired residues).

The first crystallographic observation of HG base-pairs in a protein-DNA complex was reported by Rice et al. in 1996,68 who visualized a single A–T HG base-pair immediately adjacent to a nicked site in the X-ray structure of a highly bent (>160°) 35 base-pair (bp) DNA bound to the integration host factor (IHF) protein (Figure 4). Interestingly, a hydrogen bond was observed between the backbone amide group of an arginine residue and N3 of the syn A, suggesting specific recognition of the Watson–Crick face in the HG base-pair. However, the nick is involved in crystal packing with a neighboring molecule in the complex and HG formation helps move the phosphate backbone away from a neighboring molecule. In addition, the protein makes specific contacts with N3 of an anti-A in a symmetric site in the DNA lacking the nick, suggesting interactions that are specific for WC rather than HG base-pairing. Moreover, NMR studies of IHF binding to a shorter recognition sequence containing the first nicked site argue against the presence of an A–T HG base-pair in solution.69

Subsequent X-ray structures of TATA elements bound to the TATA box-binding protein (TBP) revealed a G–C+ HG base-pair in the mutant TATAAAC box in a region of DNA unwinding and intercalation.70 No direct contacts were observed between the syn guanine base and the protein. However, the HG base-pair appears to contribute to binding by preventing steric clashes between the protein leucine 72 and the guanine exocyclic NH2, while still preserving favorable van der Waals contacts with two neighboring phenylalanine residues. A second G–C HG base-pair was observed but attributed to crystal packing forces. Interestingly, the ∼150-fold weaker binding affinity observed for TBP to this mutant TATA box,71 which could be correlated to the selection of a transient HG over a WC base-pair at that site,72 has been implicated in the transcriptional regulation of the human osteocalcin gene.73 This observation suggests a biological role for the formation of a G–C+ HG base-pair at the mutant promoter site.

Both IHF and TBP induce large distortions in the DNA, which could facilitate formation of HG base-pairs. In contrast, Wolberger and coworkers observed a single A–T HG base-pair within an otherwise undistorted B-form WC duplex in the X-ray structure of MATα2 homeodomain nonspecifically bound to DNA.74 Van der Waals contacts were observed between an arginine side chain and the syn adenine base as well as the sugar-phosphate backbone of the adenine and the neighboring thymine. Once again, the HG base-pair appears to avoid unfavorable steric clashes that would otherwise arise with a WC base-pair. The HG base-pair is accommodated within the duplex DNA without inducing major distortions, even for the directly neighboring base-pairs. The ease with which HG base-pairs could seamlessly fit within B-DNA raised the possibility that HG base-pairs may have been incorrectly assigned to be WC base-pairs due to misinterpretation of ambiguous electron density at medium to low resolution.74

More recently, HG base-pairs have been observed in the complex of the dcm very-short-patch repair (Vsr) DNA endonuclease, which participates in the nucleotide excision repair of G–T mismatches arising from deamination of 5-methylcytosines, with a specific hemi-deaminated/hemi-methylated DNA recognition sequence.75 Remarkably, the A–T HG base-pair, which is sandwiched between the mismatched and hemi-methylated sites, is also found in the equivalent unbound DNA site within the same crystal but not in a slightly different unmethylated sequence,76 implying that its presence could be an inherent property of the specific DNA sequence and not due to protein-induced distortions in DNA structure.

Two neighboring A–T HG base-pairs were subsequently observed in structures of a palindromic CATG/CATG sequence bound to the DNA binding domain of p53.77 Although no direct contacts are observed with the syn adenines, the formation of the HG base-pairs results in a narrowed minor groove in the region flanking the CATG site, leading to enhanced negative electrostatic potential that is further stabilized by insertion of the positively charged arginine side chains. Remarkably, these HG base-pairs adopt WC geometry in X-ray structures with a longer spacer length77 or a different intervening sequence78, 79 between DNA half-sites, which is accompanied by a different organization between p53 dimers, an altered DNA helix conformation, and different DNA-tetramer binding affinities.80 These studies suggest that WC and HG base-pairs likely exist in equilibrium with each other and that their selection in DNA-p53 complexes is largely dictated by the nature of the DNA binding sequence.

DAMAGED DNA

By the 1960s, it had become clear that DNA could be damaged by exogenous and endogenous factors, and that this in turn may be linked to disease states such as cancer.81 During the 1970s and 1980s, enzymes that recognize and repair damaged DNA began to be uncovered, resulting in great interest in characterizing the structure of damaged DNA.82 These studies showed that HG base-pairing provides an important mechanism for stacking and hydrogen bonding, in cases where the Watson–Crick face of the purine bases is damaged, preventing favorable WC base-pairing.

The first evidence for HG-type base-pairs in damaged DNA was reported in the late 1980s in solution NMR studies by Patel and co-workers showing that guanine adducts on the Watson–Crick edge or the C8 positions strongly favor a syn base orientation.83-86 Subsequent NMR studies showed HG-type pairing in various purine lesions, including WC face alkylation adducts (e.g., 1,N2-propanoguanine87, 88 and 1,N2-ethenoguanine89), the bulky guanine C8 mutagenic adduct aminofluorene-C8-guanine,86, 90 and the common mutagenic lesion N1-methyladenine (Figure 5).91 The direct observation of HG base-pairing (rather than extrahelical states) in a wide variety of lesions in naked DNA in the 1990s and 2000s established HG base-pairs as an energetically close alternative to WC base-pairs.

Figure 5. The N1-methyladenine (N1-Me-dA) damage favors Hoogsteen base-pairing with T. (A) The methyl damage (green) on A N1 causes a steric clash in the WC base-pair, which favors formation of the alternative HG base-pair that relieves the steric clash. (B) Crystal structure of an HG base-pair between N1-Me-dA and T where the methyl modification is shown in green (PDB ID: 3H8O).

There is considerable speculation, and some experimental evidence, that HG-type pairs play important roles in DNA damage and mismatch repair. For example, it is likely that the enzyme AlkB, which repairs the mutagenic lesion N1-methyladenine, initially recognizes the HG base-pair between N1-methyladenine and thymine (Figure 5)91, 92 before flipping out the damaged purine for oxidative demethylation. The flipping of one purine base to a syn conformation is also often observed in purine-purine mismatches, where the syn–anti base-pair configuration affords a shorter helical radius that can be more readily accommodated within B-DNA as compared to the anti–anti configuration. There is X-ray structural evidence that the DNA mismatch repair enzyme MutS specifically recognizes HG type purine-purine and purine-pyrimidine mismatches, even though they may not be the dominant conformation in unbound DNA, by making specific hydrophobic and hydrogen bonding minor groove contacts with the syn adenine/guanine base in A–C, A–A, and G–G mismatches.93 The recognition of the increased population of syn–anti rather than anti–anti configuration in certain mismatched base-pairs may help the enzyme discriminate against undamaged anti–anti Watson–Crick base-pairs. Thus, HG base-pairs not only provide a mechanism for maintaining the overall structural integrity of damaged or incorrectly replicated DNA, but can also play an important role in DNA repair mechanisms.

It is worth noting that HG base-pairs have also been observed in DNA containing non-natural modifications in the sugar-phosphate backbone, including the addition of an ethylene bridge between C3′ and C5′ in “bicyclo-DNA,” which fixes the gamma backbone torsion angle to a noncanonical orientation,94 a single-residue substitution of sugar O4′ with a methylene group,95 or in dinucleotide d(TA) analogs containing a nonionic diisopropylsilyl-modified backbone at very low temperatures.96

DNA REPLICATION

Watson–Crick base-pairing was the most important aspect of the DNA double helix structure because, as succinctly stated in the very last sentence of Watson and Crick's 1953 Nature paper, it “immediately suggests a possible copying mechanism for the genetic material.”1 Four years later, Kornberg discovered the enzyme that catalyzed template-directed DNA replication97 and ensuing biochemical and structural studies established that high fidelity DNA polymerases replicate DNA by Watson–Crick pairing of the incoming dNTP with the template strand. In particular, multiple studies have demonstrated that the active site of replicative DNA polymerases is highly selective toward insertion of the correct dNTP and that catalytic efficiency is severely diminished when Watson–Crick geometry is not present.98-102 This strict stereochemical requirement for Watson–Crick pairing, together with efficient 3′–5′ proofreading exonuclease activity, prevents misincorporation of incorrect or damaged nucleotides during DNA synthesis, which is essential for genome stability.

During the 1990s, studies revealed that certain families of DNA polymerases (the X and Y families)103-105 contributed to damage-induced mutagenesis. Such specialized polymerases function in the replication and repair of damaged DNA, which could present severe replication blocks for common replicative polymerases, and thus play an important role in the maintenance of genome stability. These enzymes are also characterized by much lower replication fidelity than regular polymerases as their active sites are more tolerant toward noncanonical geometries between template and incoming nucleotide and they often lack the 3′–5′ exonuclease domain (in the case of Y family).104, 106 It was later shown that some members of the Y family of DNA polymerases efficiently bypass DNA damage by replicating the template DNA via HG rather than WC base-pairing. HG-based replication was first visualized in X-ray structures of an archaeal DNA Polη homolog, Dpo4, by Yang and coworkers nearly a decade ago.107 The structure showed that Dpo4 replicates UV cross-linked thymine dimers by forming a HG base-pair between the 5′ thymine and an incoming ddATP, thus avoiding backbone distortion and allowing discrimination against guanine and pyrimidines.107

Aggarwal and coworkers108 subsequently showed using X-ray crystallography and biochemical experiments that another member of this family, human DNA Polι, employs HG base-pairing as a general mechanism to replicate both damaged and undamaged DNA. A striking X-ray structure of Polι showed a template adenine in the active site of the enzyme adopting a syn conformation and forming a HG base-pair with an incoming dTTP (Figure 6).108 Unlike replicative polymerases or other members of the Y family, Polι featured a narrower active site, which strongly favors formation of HG type base-pairs that are characterized by shorter C1′–C1′ distances as compared to WC base-pairs. The ability to insert the correct nucleotide across an adenine base also provided a rationale for prior biochemical studies showing a much higher efficiency of correct base incorporation across a templating adenine than across a templating thymine, which in fact favors G misincorporation because of its high propensity for forming an anti-G–T wobble base-pair.109 This raised HG base-pairs to a prominent position reserved previously only for WC base-pairs; they provided a basis for copying DNA.

DNA replication by human lesion bypass DNA polymerase ι (hPolι) proceeds via Hoogsteen (HG) base-pairing. Shown are complexes between hPolι and an HG base-pair between (A) a template A and an incoming dTTP (PDB ID: 1T3N), (B) a template 1,N6-ethenoadenine (εA) and an incoming dTTP (PDB ID: 2DPJ), (C) a template G and an incoming dCTP (PDB ID: 2ALZ), and (D) a template N2-ethylguanine (N2-Et-G) and an incoming dCTP (PDB ID: 3EPG). The damaged sites in εA and N2-Et-G are highlighted in green.
The conformational exchange between a ground state Watson–Crick (WC) and a transient Hoogsteen (HG) base-pair can be monitored using NMR relaxation dispersion methods. (A) Two-state exchange between the WC and HG forms of A–T and G–C+ base-pairs showing the relative populations and exchange rate constants obtained from R1ρ relaxation dispersion experiments at pH 6.8 and the estimated pKa for a HG G–C+ base-pair.17, 72 (B) Corresponding thermodynamic profiles obtained from a temperature dependence of R1ρ relaxation dispersion at pH 5.4 (G, free energy; H, enthalpy; TS, entropy).72

The proposal that hPolι replicates DNA via HG base-pairing was quickly met with skepticism. In an accompanying News and Views article, Wang110 pointed out that, based on the weak electron density for the active site A–T base-pair, it is difficult to resolve a WC from a HG geometry. He also questioned the ability of such a polymerase to form protonated G–C+ base-pairs at physiological pH, given the low intrinsic pKa of cytosine N3 (∼4.2–4.4).111 Aggarwal et al.112 later put the matter to rest by (i) solving X-ray structures of Polι, unambiguously showing a protonated G–C+ HG base-pair at pH 6.5, reinforcing their hypothesis that Polι has evolved to favor HG base-pairing by constraining the backbone C1′–C1′ distance between template and incoming nucleotide in its narrow active site and (ii) showing selective inhibition of DNA synthesis by Polι but not other polymerases when using 7-deazaadenine or 7-deazaguanine, which are incapable of forming HG base-pairs, as the templating residue.113 Several other structures capturing DNA synthesis by Polι followed, ultimately demonstrating that major purine alkylation and oxidation lesions, including 1,N6-ethenoadenine,114 N2-ethylguanine,115 O6-methylguanine116 and 8-oxoguanine,117 adopted a syn conformation and, where possible, formed HG type base-pairs with incoming complementary pyrimidine and purine nucleotides (Figure 6) (reviewed in Makarova et al.118). These observations, in conjunction with biological studies showing that Polι was important for cell survival in the presence of alkylating agents119, 120 and oxidative stress,121 provide the most compelling evidence to date for a biological function for HG base-pairs in duplex DNA.
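The pKa objection can be made concrete with the Henderson–Hasselbalch relation. The sketch below is illustrative only: it treats cytosine N3 as an isolated titratable site with the intrinsic pKa range quoted above and evaluates it at the pH 6.5 of the crystallographic study. The function name is hypothetical, and the calculation deliberately ignores any pKa shift caused by HG pairing or by the polymerase active site.

```python
def fraction_protonated(ph: float, pka: float) -> float:
    """Fraction of a monoprotic site that is protonated (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


# Intrinsic cytosine N3 pKa range quoted in the text; pH 6.5 is the condition
# reported for the protonated G-C+ HG structure. Both are taken at face value.
for pka in (4.2, 4.4):
    frac = fraction_protonated(6.5, pka)
    print(f"pKa {pka}: {100 * frac:.2f}% protonated at pH 6.5")
```

Under these assumptions, well under 1% of free cytosine would be protonated at pH 6.5, which is the essence of the objection. Observing a protonated G–C+ HG pair in the active site would therefore imply that the local environment raises the effective pKa of cytosine N3.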

TRANSIENT HOOGSTEEN BASE-PAIRS

The earliest fiber X-ray diffraction studies of DNA highlighted its polymorphic nature and the ability of the double helix to adopt different forms depending on environmental conditions and sequence contexts. Subsequent studies showed that DNA does indeed come in many different forms and that even B-DNA is not rigid, but rather, can undergo large deformations and thermal fluctuations in a sequence-dependent, biologically important manner.122 This flexibility is not confined to the weakly constrained sugar and phosphodiester backbone, but also includes the Watson–Crick base-pairs themselves.123 Chemical probing and hydrogen exchange studies spanning the 1970s–1990s established that WC base-pairs break apart and open at millisecond timescales and that the open state exists in at most ∼0.002% abundance for A–T or ∼0.00008% for G–C base-pairs.124-128 There are now several X-ray structures that capture these open states of the base-pairs when bound to proteins that establish their functional significance.

Two years ago, NMR studies from our laboratory showed that both A–T and G–C Watson–Crick base-pairs can transiently undergo excursions toward HG base-pairs in duplex DNA.71, 129 The transient HG base-pairs were characterized with the use of recently developed NMR R1ρ relaxation dispersion spectroscopic methods that make it possible to observe and structurally characterize fleeting states of macromolecules.130-132 The transient HG base-pairs had populations of ∼0.1–1%, making them nearly three orders of magnitude more abundant than the open state, with the G–C+ HG base-pairs being less abundant than their A–T counterparts at physiological pH by at least a factor of 20 due to an additional required protonation event at cytosine N3.17 The transient HG base-pairs have lifetimes on the order of hundreds of microseconds to milliseconds (∼0.3–1.5 ms), which are significantly longer than the lifetimes of base-pair open states found to be in the nanosecond range.128 The free energy and enthalpy of the WC-to-HG transition were found to closely match those of base-pair opening,128 suggesting that the transition may be limited by a base-pair breaking event that could be coupled to the purine anti–syn isomerization, inside or outside of the double helix. It is remarkable that the HG base-pairs are energetically less favorable than WC counterparts by a mere ∼3 kcal/mol in the case of A–T base-pairs, roughly the equivalent of one strong hydrogen bond. These energetic differences are small compared to forces that exist in cells due to protein interactions, torsional stress due to binding and supercoiling, or those applied due to crystal packing forces or that arise from changing pH. Studies suggest that the transient HG base-pairs occur universally across all DNA sequence contexts, in a noncooperative manner, and with small, albeit significant, sequence-specific differences in population and lifetimes (unpublished results).
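The link between the quoted populations, lifetimes, and the ∼3 kcal/mol free energy difference follows from a simple two-state equilibrium. The sketch below is a rough back-of-the-envelope check, not the analysis used in the cited NMR studies: it assumes a two-state WC-to-HG exchange and a temperature of 25 °C (not stated in the text), and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # molar gas constant in kcal/(mol*K)


def delta_g_from_population(p_minor: float, temperature_k: float = 298.15) -> float:
    """Free energy (kcal/mol) of the minor state relative to the major state."""
    if not 0.0 < p_minor < 1.0:
        raise ValueError("population must lie strictly between 0 and 1")
    k_eq = p_minor / (1.0 - p_minor)  # [HG] / [WC] for a two-state system
    return -R_KCAL * temperature_k * math.log(k_eq)


def rate_constants(p_minor: float, lifetime_minor_s: float) -> tuple[float, float]:
    """Forward (WC to HG) and reverse (HG to WC) rate constants in 1/s."""
    k_reverse = 1.0 / lifetime_minor_s                # minor state decays at 1/lifetime
    k_forward = k_reverse * p_minor / (1.0 - p_minor)  # detailed balance
    return k_forward, k_reverse


# Populations spanning the quoted 0.1-1% range; temperature is an assumption.
for p in (0.001, 0.005, 0.01):
    print(f"p(HG) = {100 * p:.1f}%  ->  dG = {delta_g_from_population(p):.1f} kcal/mol")
```

With these assumptions, a population of about 0.5% corresponds to a free energy difference of roughly 3 kcal/mol, in line with the value quoted for A–T base-pairs. Calling `rate_constants(0.005, 1e-3)` for a 1 ms HG lifetime gives a reverse rate of 1000 s⁻¹ and a forward rate of about 5 s⁻¹, which illustrates why the HG state is both rare and comparatively long-lived.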

The picture that emerges is one in which every base-pair in DNA exists as an equilibrium of WC and HG base-pairs, with external parameters operating on the DNA resolving one or the other base-pair type. This helps explain the long-standing and controversial observations of WC versus HG: small changes in conditions can favor one form over the other. It is striking that the difference in the abundance of transient G–C+ and A–T HG base-pairs mirrors the differences in efficiency observed in Polι replication of A/T versus G/C. The HG base-pairs transiently expose the Watson–Crick faces of purines, and may potentially help explain the much greater abundance of N1 methylation in adenine versus guanine. Most importantly, the observation of transient HG base-pairs in duplex DNA, with comparable energetics to WC, raises the possibility that HG base-pairs exist in much greater abundance in vivo, particularly in A–T rich regions of the genome. When combined with the current difficulties in resolving WC from HG based on X-ray diffraction data, it may well be the case that there are more HG base-pairs in X-ray structures currently deposited in the PDB that have gone undetected, particularly for A–T base-pairs. We hope that this review provides the impetus to be more critical of the interpretation of X-ray diffraction data to rule out the possibility of HG base-pairs.

FUTURE OUTLOOK

Thus, 60 years later, there are fundamental questions that remain to be answered regarding the structure of the DNA double helix.
  1. What fraction of the existing DNA structures in the Protein Data Bank contain base-pairs that, due to biases and poor electron density, have been misinterpreted to be WC rather than HG? It should be relatively straightforward to re-interpret the electron density for DNA structures and to examine, base-pair by base-pair, to what extent the data favor WC versus HG. We predict that there will be many structures that will be revised to include uncertain base-pair geometry, or even HG base-pairs, particularly for A–T base-pairs.
  2. Are HG base-pairs that are observed in X-ray structures also observed under solution conditions? Here, there is a need to develop new methods to allow the characterization of HG versus WC under solution conditions. Developments in NMR that can allow studies of large protein-DNA complexes will undoubtedly be important—but other, high-throughput, approaches are also needed to streamline such applications.
  3. If HG base-pairs do occur, to what extent are they functionally important? Are the HG base-pairs “passively” present because they merely provide a more stable form under a particular condition or do they “actively” participate in biological function? Here, there is a danger of rushing to interpret results with 7-deazapurines, which diminish the ability to form HG base-pairs. It is conceivable that the 7-deazapurines can still form distorted HG base-pairs that preserve function, or affect an aspect of function in vivo that is not explored in such in vitro studies.
  4. To what extent do HG or any other type of base-pair occur in vivo? Today, there are no methods for characterizing, at the atomic level, the nature of the base-pairs that hold together DNA duplexes in vivo. In eukaryotes, DNA is wrapped around nucleosomes and packaged with proteins to form chromatin fibers that make up chromosomes. The DNA is subjected to extreme packing and supercoiling, which present forces that in all likelihood exceed the energetic differences between WC and HG observed in duplex DNA. Given the growing evidence that HG base-pairs tend to be favored under tight packing conditions, and in regions of stress, we can predict that the genome may in fact be enriched with HG base-pairs in vivo relative to relaxed duplex DNA. One can imagine the existence of “HG islands” in A–T rich regions that have been shown to be important for DNA minor groove recognition by a variety of protein factors and anticancer drugs regulating replication/transcription, DNA bending, supercoiling-induced DNA destabilization, nucleosome positioning, and chromosomal translocation.133-140 A challenge for the future will be the development of methods for visualizing the high resolution structure of DNA in vivo.
