Artificial Intelligence-Enhanced Analysis of Genomic DNA Visualized with Nanoparticle-Tagged Peptides under Electron Microscopy
Abstract
DNA visualization has advanced across multiple microscopy platforms, but progress in identifying novel staining agents for electron microscopy (EM) has been limited, even though EM offers a broad magnification range and high-resolution detail for observing DNA molecules. Herein, a non-toxic, universal, and simple method is proposed that uses gold nanoparticle-tagged peptides to stain all types of naturally occurring DNA molecules, enabling their visualization under EM. This method extends current DNA visualization capabilities, allowing for sequence-specific, genomic-scale, and multi-conformational visualization. Importantly, an artificial intelligence (AI)-enabled pipeline is presented that identifies DNA molecules imaged under EM, classifies them by size, shape, or conformation, and extracts their significant dimensional features, which, to the best of the authors' knowledge, has not been reported previously. Owing to its image segmentation capability, this pipeline strongly improved the accuracy of obtaining crucial information such as the number and mean length of DNA molecules in a given EM image for linear DNA (salmon sperm DNA) and the circumferential length and diameter for circular DNA (M13 phage DNA). Furthermore, it remained robust to several variations in the raw EM images arising from handling during the DNA staining stage.
1 Introduction
DNA visualization is essential for comprehending and analyzing the genome of an organism, as well as the emergent biochemical phenomena involving DNA sequences.[1] DNA visualization techniques have evolved both vertically and horizontally in terms of the sheer possibilities for staining a given natural DNA molecule, whether long or short, linear, circular, or coiled. In terms of horizontal evolution, DNA visualization has been made possible across different microscopy platforms, such as fluorescence microscopy (FM),[2-5] electron microscopy (EM),[6-9] and atomic force microscopy (AFM),[10, 11] with each platform having its own pace of vertical evolution, which includes the development of successive generations of staining agents over time.
In FM-based DNA visualization, staining agents such as 4′,6-diamidino-2-phenylindole (DAPI),[2] the Hoechst family of dyes,[12, 13] the thiazole family of dyes, including thiazole orange dimer (TOTO) and oxazole yellow homodimer (YOYO-1),[14-16] and DNA-binding peptides/proteins (DBPs) conjugated with fluorophores[4, 17] have been universally used. This field is currently progressing toward the development of next-generation photoswitchable dyes that can be used in super-resolution imaging methods[18-20] such as stochastic optical reconstruction microscopy (STORM).[21-23] Recently, chromatin expansion microscopy (ChromExM) was reported for in situ visualization of chromatin fibers inside the cell nucleus using fluorescent labeling molecules anchored to swellable hydrogels; it achieved a lateral resolution of ≈15 nm using confocal microscopy and ≈3 nm using super-resolution microscopy.[24] In addition to these super-resolution methods, MINFLUX (MINimal photon FLUXes) fluorescence nanoscopy has recently been gaining traction; it combines the merits of coordinate-stochastic single-molecule imaging methods such as STORM with those of coordinate-targeted methods such as stimulated emission depletion (STED) microscopy to further enhance the resolution to a few nanometers.[25] Recently developed high-resolution FM techniques offer multiplexed imaging of DNA molecules[26, 27] using multiple fluorophores, ease of handling,[22] in situ visualization,[24] and steadily improving resolution. Nevertheless, they remain limited for single-molecule DNA visualization in the following ways: they require considerable time and high storage capacity for data acquisition, and they rely on extensive postprocessing and analysis to yield essential structural or biomolecular information.[28] Furthermore, ChromExM still requires bright, photostable fluorophores along with chemical fixation to achieve its reported resolution, which leaves the user with very few fluorophore options.[24]
In this regard, EM-based DNA visualization offers several essential advantages over FM, such as a wide range of magnification spanning six orders of magnitude (from millimeter to nanometer scale), as well as providing high-resolution structural and conformational details.[29-32] Nanometer-scale resolution imaging is crucial in DNA studies, as it enables the observation of minute differences in DNA conformations,[33] such as left-handed helices in Z-DNA[34-36] or the more common right-handed helices in B-DNA,[7, 9] as well as the recently reported formation of sub-nanometer scale nacre-mimicking structures.[37] Moreover, studying the dynamics of DNA-involving reactions becomes feasible at the nanometer-scale resolution.[38] Furthermore, given the abundance of information on DNA molecules, high-resolution imaging adds an extra dimension to optical DNA mapping techniques developed using FM.[39-42] On the other hand, DNA visualization using AFM is scarcely considered owing to its limited range of magnifications available for genomic DNA analysis, lack of resolution and reproducibility in solution-based DNA staining, and requirement of extensive modification of the substrate for efficient DNA visualization.[43]
Until now, heavy metal-based staining agents such as uranyl acetate,[44] osmium tetroxide,[45] and lead salts[46] have been conventionally used for EM-based biomolecule visualization, as they increase the typically low electron scattering capabilities of biomolecules. However, each of these staining agents binds preferentially to a particular category of biomolecules: osmium tetroxide to lipids (enabling visualization of cell membranes), uranyl acetate to proteins, and lead salts to RNA molecules,[47] which makes their interaction with DNA nonspecific. The evolution of these staining agents has been limited within this platform owing to the practical restrictions on uranium use in biochemical laboratories and the requirement of government authorization for its use.[48, 49]
Recently, DNA metallization has been developed as a possible alternative to heavy metal staining, wherein metal nanoparticles are deposited onto the DNA backbone instead of heavy metals to act as the electron scattering source.[50, 51] However, it has received less attention as a DNA staining technique because current research focuses on developing novel metal nanostructures using DNA as the template and studying their material aspects.[52-56] Furthermore, like heavy metal staining, DNA metallization does not provide sequence-specific visualization of DNA molecules. For this purpose, DBPs are ideal candidates owing to their sequence-specific binding to DNA molecules.[57-59] Over the years, several A–T/G–C sequence-specific DBPs have been identified and characterized for their DNA binding mechanism,[60-62] which could be employed in a next-generation DNA staining method for EM. In this regard, a recent study from our group demonstrated the possibility of using a DBP-synthetic polymer combination as an electro-stain to visualize DNA molecules derived from biological samples under a scanning electron microscope (SEM). Because the ≈2 nm width of DNA molecules makes them difficult to visualize under SEM, this DBP-synthetic polymer combination was used to increase their width to ≈15 nm, making them suitable for high-resolution visualization under SEM.[63]
Parallel to the evolution of staining agents for EM, the advent of artificial intelligence (AI) has facilitated extensive statistical analyses of nanomaterials.[64, 65] AI has also been utilized to analyze molecules imaged across different microscopy platforms, either for sensing purposes[66, 67] or to extract key features for downstream studies.[68, 69] Such analysis remains essential to microscopy because it improves our understanding of biomolecular and biochemical phenomena. However, to the best of our knowledge, no such AI-enabled analyses of EM-based DNA molecular images have been reported yet; existing literature has focused mostly on FM,[28, 70, 71] AFM,[72] volume electron microscopy, cryo-electron tomography[73] and single-molecule localization microscopy-based images.[28, 74] In addition, the use of AI for the classification of DNA molecules based on structural or conformational differences has remained limited over the years, in contrast to its widespread use in classifications based on DNA sequence information.[75-77] This highlights the need to develop more image-based classification models, particularly those based on EM images.
In this study, we explored an EM-based staining strategy that employs metal nanoparticles attached to DBPs, coupled with AI-enabled image analysis for DNA classification and feature extraction. Similar to DNA metallization, this strategy ensures that the metal nanoparticles act as electron scatterers, enabling EM visualization of the underlying DNA molecules. Moreover, the use of DBPs enables sequence-specific visualization because metal nanoparticles selectively bind to the thiol group present in these DBP molecules rather than the DNA molecule itself. As the choice of DBP is arbitrary and depends on the visualization requirements, both sequence-specific and non-sequence-specific DNA visualizations are achievable. Furthermore, the AI-enabled pipeline facilitated the classification of DNA molecules from their corresponding EM images based on their size and shape, followed by downstream analysis to extract various dimensional features which, to the best of our knowledge, have not been previously reported. Herein, this analysis has been demonstrated using both linear and circular-shaped candidate DNA molecules. Thus, this holistic package—which includes EM-based visualization using nanoparticle-tagged peptides and AI-enabled DNA molecular analysis—could serve as a cornerstone tool for molecular biologists, with potential utility in DNA sequence mapping using different combinations of nanoparticles and DBPs.
2 Results and Discussion
2.1 Visualization of λ-DNA Using Our Staining Agents—DBP and Engineered Gold Nanoparticles: A Proof-of-Concept Demonstration
Figure 1A illustrates our proposed methodology for the staining and subsequent visualization of DNA using gold nanoparticles (AuNPs) and DBPs. Herein, we have shown a two-step reaction: in the primary step, the DNA (in this case, λ-DNA) is reacted with the thiol-tagged DBP (in this case, High Mobility Group-DNA Binding Domain (2,3) [HMG-DBD (2,3)]) and incubated to facilitate binding of the DBP molecules to their sequence-specific sites on the DNA. In the second step, this DNA:DBP mixture is stained with labeled AuNPs (in this case, polyvinylpyrrolidone (PVP)-capped AuNPs [PVP-AuNPs]) and incubated again to facilitate covalent binding of the PVP-AuNPs to the thiol groups of cysteine present in the DBP.

The aforementioned steps shaped our staining method, wherein the engineering of labeled AuNPs and selection of the DBP remained crucial. Control experiments were performed to verify that a specific concentration ratio of the DNA:DBP:PVP-AuNPs combination was required for appropriate DNA visualization under EM; cases lacking either one of the components or having varying concentrations weakened or disabled the visualization of expected long and linear assemblies of single DNA molecules (Supplementary Section S4, Figure S8). This was corroborated with reproducible visualization of long λ-DNA strands under EM after multiple trials (Supplementary Section S2, Figure S5).
Regarding the engineered labeled AuNPs, three major factors were considered and optimized in our staining method: particle size, capping agent, and concentration (Supplementary Section S1). First, obtaining the appropriate concentration of AuNPs remained crucial as it dictated the nanoparticle density on the DNA backbone through DBP interactions; the higher the concentration of AuNPs, the greater the number of particles attached to the DNA backbone, giving rise to thicker DNA linear assemblies, and vice versa (Figure S4). This observation was consistent for PVP-AuNPs of all synthesized sizes (25 nm, 13 nm, and 9–10 nm).
Second, the choice of the capping agent on the AuNPs was significant because it determined the interaction of the nanoparticles with the DBP molecules. We used capping agents with varying charges, including citrate (negatively charged), PVP (neutral), thiol-functionalized polyethylene glycol (PEG-SH; neutral), amine-functionalized polyethylene glycol (PEG-NH2; neutral), and cetyltrimethylammonium chloride (CTAC; positively charged). In this study, we found that AuNPs with neutral capping agents such as PVP, PEG-SH, and PEG-NH2 enabled visualization, though with varying efficiencies (Figure S2, Supporting Information). Among these three capping agents, PVP showed ideal results with consistent long-term, single-molecule visualization of the λ-DNA molecules, whereas the SH/NH2-functionalized PEG resulted in the visualization of single molecule yet very short strands of λ-DNA.
We tried to determine the possible reason for this varying efficiency, especially between PVP and PEG-SH, using a fluorescence competition assay (Figure S3A, Supporting Information). The assay results indicated that the PEG-SH-capped AuNPs had a relatively lower barrier to DBP binding than PVP-capped AuNPs, implying that more fluorescently tagged DBP molecules bound to their surface, as evidenced by their higher fluorescence intensity. This could have resulted in a more than optimal number of DBP molecules bound to the PEG-SH-capped AuNPs, resulting in a shrunken DNA strand compared to the full-length visualization, as shown in Figure S2 (Supporting Information). Conversely, the barrier to DBP binding was optimal in PVP-capped AuNPs, thereby maintaining the optimal stoichiometric ratio for full-length visualization of the λ-DNA molecule. On the contrary, charged capping agents such as citrate and CTAC did not enable visualization due to their interference with the DBP:DNA interaction, followed by displacement of the DBP molecules from the DNA backbone.
Finally, the size of the AuNPs was important because it affected the arrangement of nanoparticles on the DNA backbone. We considered differently sized AuNPs (9–10 nm, 13 nm, and 25 nm) synthesized by Turkevich's method, followed by direct ligand exchange to replace citrate with PVP as the capping agent, based on the capping agent optimization studies described above (Figure S2). Single-molecule visualization of λ-DNA was possible with all three nanoparticle sizes, albeit with varying visualization properties. The 9–10 nm PVP-AuNPs enabled full-length visualization of the λ-DNA (16.49 µm in contour length, corresponding to 48 502 bp), as seen in the TEM images in Figure 2B. In contrast, the larger 13 and 25 nm PVP-AuNPs allowed visualization of only much shorter DNA strands (Figure S1, Supporting Information).

We attribute this size-dependent variation to the steric hindrance that larger PVP-AuNPs likely experience when interacting with the DBP molecules, whereas the smaller 9–10 nm PVP-AuNPs appear to fit spatially without hindering the DNA:DBP interaction during the binding step. Furthermore, the number of DNA bases covered per nanoparticle increases with particle size (assuming that the same number of DBP molecules is attached to the DNA), resulting in much shorter observed DNA contour lengths with increasing nanoparticle size. Hence, there is a negative correlation between the size of the PVP-AuNPs and the efficient visualization of longer DNA molecules using our staining method. This indicates that a specific size range of PVP-AuNPs, preferably below 13 nm, should be used in our staining method to avoid the steric hindrance and base coverage issues that may arise with larger PVP-AuNPs.
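The base coverage argument can be made concrete with a short calculation. The following is a minimal sketch, assuming a B-DNA axial rise of 0.34 nm per base pair (a value not stated in the text) and assuming that a particle simply spans its own diameter along the DNA axis; the function name `bases_covered` is illustrative only.

```python
# Assumed B-DNA axial rise per base pair; not stated in the text.
RISE_NM_PER_BP = 0.34


def bases_covered(diameter_nm, rise_nm=RISE_NM_PER_BP):
    """Base pairs spanned by one particle of the given diameter lying on the DNA axis."""
    return diameter_nm / rise_nm


# Particle sizes considered in this study (nm).
for d in (9, 10, 13, 25):
    print(f"{d:>2} nm particle spans about {bases_covered(d):.1f} bp")
```

Under these assumptions, the span per particle grows linearly with diameter, so a 25 nm particle masks nearly three times as many base pairs as a 9 nm particle.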
Figure 1B shows the broad resolution scale for visualization available using EM. SEM images provide broader resolution details, such as the length of each strand and number of DNA molecules present in a given image. TEM images provide details on the particle arrangement pattern related to sequence specificity, thickness of the DNA strand corresponding to the resolution of visualization in terms of nanoparticle size, or the number of layers of particles arranged onto the DNA backbone as a result of variation in the nanoparticle concentration and the conformation in which the DNA is present (linear, circular, or coiled).
2.2 Sequence-Specific Visualization of A–T Patterned λ-DNA Fragments
Figure 2A demonstrates the capability of our staining method to visualize DNA molecules in a sequence-specific manner. For this purpose, we used different A–T patterned DNA fragments of 4–5 kbp length (A–T rich, A–T even, and A–T poor), with λ-DNA as the template and HMG-DBD (2,3) as the DBP candidate. As shown in the TEM images, the particle arrangement pattern differed significantly between the A–T-rich and A–T-poor fragments: an almost continuous, linear strand was visible for the former, whereas a strand with diffusely arranged particles was evident for the latter. Furthermore, SEM images of each fragment clearly exhibited a particle arrangement pattern throughout its entire length: a continuous strand for the A–T-rich fragment, a strand with a considerable number of breaks for the A–T-even fragment, and a strand with fewer, diffuse particles present only at specific spots for the A–T-poor fragment.
This sequence-specific visualization of A–T patterned DNA fragments was possible owing to the “relaxed specificity” of the three DNA-binding domains (DBDs) of the HMG protein (two of which are used in our DBP candidate) for a wide variety of A–T rich sequences that are 4–8 bp in length.[62] We used a truncated form of the HMG protein comprising its second (DBD2) and third (DBD3) DBDs because two DBDs are sufficient to facilitate DNA binding with nanomolar affinity, and NMR studies have shown that only one or two of the three DBDs are involved in binding to the DNA at any given time.[62]
The “relaxed specificity” of HMG-DBD (2,3) is a consequence of the binding of the N-terminal and C-terminal arginine residues of the core R-G-R motif (Arg 9, 11 of DBD (2) and Arg 35, 37 of DBD (3), respectively; Table 1) to the A–T tract sequences present in the minor groove of these DNA fragments.[62] The conformation of these arginine residues is additionally stabilized by the intervening glycine residue, whose CαH protons are packed deeply within the adenine bases and whose backbone amide is hydrogen-bonded to the oxygen atom of thymine bases in the A–T tract sequences. Such close proximity between the bases and the peptide backbone excludes the possibility of other amino acid residues occupying those positions, thereby providing a snug fit at the minor groove and enabling this “relaxed specificity” for A–T tract sequences.[62] Because our staining method requires DBP molecules to bind to specific sites on the DNA sequence so that the AuNPs can bind to them and enable visualization of the underlying DNA molecule, this binding mechanism of HMG-DBD (2,3) to the A–T patterned DNA fragments resulted in the observed A–T specific visualization under EM.
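To illustrate what distinguishes A–T-rich from A–T-poor fragments at the sequence level, the following minimal sketch computes the A–T fraction of a sequence and locates uninterrupted A/T runs of at least 4 bp, the lower bound of the 4–8 bp tract length mentioned above. It is an illustrative assumption, not the procedure used to design the fragments in this study; the function names and the toy sequence are hypothetical.

```python
import re


def at_fraction(seq):
    """Fraction of bases in the sequence that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def at_tracts(seq, min_len=4):
    """Return (start, length) for each uninterrupted run of A/T bases >= min_len."""
    pattern = re.compile(r"[AT]{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


# Toy sequence for illustration only (not taken from λ-DNA).
toy = "GCGC" + "AATTAA" + "GCGC" + "AT" + "GC" + "TTTTTTTTT" + "GC"
print(at_fraction(toy))
print(at_tracts(toy))
```

A run longer than 8 bp could in principle accommodate more than one binding site; this sketch reports only the runs themselves and makes no claim about occupancy.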
Name of the DBP | Sequence | Length of the DBP [aa] | Molecular weight [g mol−1] |
---|---|---|---|
AT hook | CTPKRPRGRPKK | 12 | 1422.84 |
HMG-DBD (2,3) | CVPTPKRPRGRPKGSKNKGAAKTRKTTTTPGRKPRGRPKKLE | 42 | 4621.69 |
HMG-DBD (2) | CKRPRGRPKGSKNKGAA | 17 | 1810.02 |
H-NS | MGSSHHHHHHSSGLVPRGSHMMSEALKILNNIRTLRAQARECTLETLEEMLEKLEVVVNERREEESAAAAEVEERTRKLQQYREMLIADGIDPNELLNSLAAVKSGTKAKRAQRPAKYSYVDENGETKTWTGQGRTPAVIKKAMDEQGKSLDDFLIKQGSGC | 162 | 1.5 µM |
H-NS (DBD) | CKTWTGQGRTPAVIKK | 16 | 1772.97 |
Lsr2 (CTD) | CAIDREQSAAIREWARRNGHNVSTRGRIPADVIDAYHAAT | 39 | 4446.22 |
Lsr2 (DBD) | CVSTRGRIPADVID | 14 | 1500.77 |
DBP | CKWKWKKA | 8 | 1076.60 |
NB-DBP | CEWEWEEA | 8 | 1080.39 |
As a subsequent step, we demonstrated the universality of our staining method by using different DBP candidates for staining and visualization of λ-DNA under EM. Various recombinant DNA-binding peptides and proteins were tested (Figure 2B). These candidates were selected based on their ability to bind to A–T rich sequences, similar to HMG-DBD (2,3). The TEM images in Figure 2B show that each DBP candidate enabled full-length visualization of the λ-DNA molecules, with slightly varying patterning efficiency and arrangement. This variation could be attributed to differences in their A–T tract binding specificity and binding mechanism, even though all these candidates share the common feature of binding to the minor groove of the DNA molecule.[60, 61, 78, 79]
Some candidate DBPs either did not enable visualization of the underlying λ-DNA molecules or enabled less efficient visualization at a higher reaction ratio than the aforementioned candidates (Section S5, Figure S9, Supporting Information), owing to mechanistic differences in their binding to DNA bases.[60, 62] Moreover, as with the λ-DNA molecules visualized with HMG-DBD (2,3), we noted a slight reduction in the observed contour length of the DNA molecules from their expected length of 16.49 µm, which could be largely attributed to the tendency of these DBP molecules to “pull together” the DNA bases at the minor groove. This “pulling together” results from the snug fit between the narrow concave surface of the DBP molecule, created by the R-G-R motif present in all of the DBP candidates, and the bases in the minor groove regions.[62] In addition, since a single DBP molecule, to which AuNPs are expected to bind, covers 4–8 bases of DNA, a difference in length is expected per DBP molecule attached to the DNA backbone.
2.3 Visualization of Genomic Scale and Multi-Conformational DNA Molecules Using Nanoparticle-Tagged Peptides
Figure 3 shows the visualization of genomic-scale and multi-conformational DNA molecules, with various candidates tested in each category. Commercially obtained salmon sperm DNA was used as the first genomic DNA candidate owing to its sheared nature and shorter size. As seen in the low-magnification SEM image in Figure 3A, many short single-molecule DNA strands with contour lengths between 1 and 2 µm were visualized. These results were reproducible across successive trials (n = 7). Additional SEM and TEM images in the panel of Figure 3A show the length, particle arrangement pattern, and thickness of the visualized salmon sperm DNA.

Further, we tested another genomic DNA candidate with a length comparable to that of λ-DNA, namely T7 bacteriophage DNA. As shown in the low-magnification SEM image in Figure 3B, long and linear DNA strands were visualized. The supporting SEM images in this figure panel exhibit the full-length visualization of T7 bacteriophage DNA, around 13.58 µm in contour length, corresponding to its 39 936 bp, while the high-magnification TEM image shows the particle arrangement pattern, which in turn reflects the sequence information of this DNA candidate.
Turning to another capability of our staining technique, we demonstrated multi-conformational DNA visualization using the M13 phage DNA and pEGFP-N3 plasmid candidates, as shown in Figure 3C,D. As M13 phage DNA is a circular genomic DNA capable of undergoing coiling, coiled DNA strands were expected upon visualization. Accordingly, upon staining M13 DNA, we observed different circular conformations, as shown in the SEM and TEM images in Figure 3C.
For the pEGFP-N3 plasmid, a circular DNA structure was expected; upon staining, we visualized various spherical structures with varying particle densities, as shown in Figure 3D. This varying particle density (TEM images in Figure 3D) corresponded to coiled pEGFP-N3 plasmid molecules, which gave rise to a bridge-like particle arrangement across the center in addition to the circular boundary of the observed spherical structure. Length measurements were made for both the M13 phage and pEGFP-N3 plasmid DNA candidates to further confirm their visualization using our staining method.
Finally, as an important milestone in genomic-scale DNA visualization, we considered human genomic DNA isolated from HeLa cells. The DNA was isolated following a standard protocol, and the isolated HeLa genomic DNA (gDNA) was stained with the same DBP:PVP-AuNPs concentration ratio used for the other DNA candidates (refer to Methods in Supporting Information). Figure 4A shows the successful visualization of a 124.1 µm long HeLa gDNA strand (corresponding to a size of ≈365 kbp) with varying particle arrangement patterns across its entire length. Similarly, we visualized another long strand of HeLa gDNA with a contour length of 113.56 µm (corresponding to a size of ≈334 kbp), this time at lower magnification, thereby demonstrating the reproducibility of visualizing long gDNA candidates using our staining method. Furthermore, Figure 4C–E shows the arrangement pattern in detail, taken from various regions across the same strand, under both TEM and SEM. Together with the preceding sections, these results support the use of our proposed staining method for visualizing a wide range of genomic and non-genomic DNA.
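The contour lengths and sizes quoted in this section are mutually consistent with a simple length-per-base-pair conversion. The following is a minimal sketch of that conversion, assuming the canonical B-DNA rise of 0.34 nm per base pair (an assumption, as the text does not state the factor used); the function names are illustrative.

```python
# Assumed B-DNA axial rise per base pair; not stated in the text.
RISE_NM_PER_BP = 0.34


def contour_length_um(n_bp, rise_nm=RISE_NM_PER_BP):
    """Expected contour length (µm) of a fully extended DNA of n_bp base pairs."""
    return n_bp * rise_nm / 1000.0


def size_kbp(length_um, rise_nm=RISE_NM_PER_BP):
    """DNA size (kbp) corresponding to a measured contour length (µm)."""
    return length_um * 1000.0 / rise_nm / 1000.0


print(f"lambda-DNA, 48502 bp: {contour_length_um(48502):.2f} um")
print(f"T7 DNA, 39936 bp: {contour_length_um(39936):.2f} um")
print(f"HeLa gDNA, 124.1 um: about {size_kbp(124.1):.0f} kbp")
print(f"HeLa gDNA, 113.56 um: about {size_kbp(113.56):.0f} kbp")
```

Because DBP binding is reported to shorten the observed contour length slightly, a size estimated from a measured length in this way is best read as a lower bound.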

2.4 AI-Enabled Classification and Analysis of Imaged DNA Molecules
First, differently shaped DNA molecules (linear, coiled, and circular) were identified from the arrangement of stained nanoparticles in the EM images using U-Net,[80] a fully convolutional network-based machine learning model for image segmentation (Figure 5A). This model architecture was essential to our DNA classification from EM images owing to its fast image segmentation capability: it uses a contracting path to capture the context of the images and an expanding path to enable precise localization, and it can be trained end-to-end from only a few training images.[80] The latter merit was crucial in our study because our image set comprised only 28 EM images across three different DNA candidates: 5 images of λ-DNA, 12 images of M13 phage DNA, and 11 images of plasmid DNA. In the training step, mask images were obtained by manually labeling all strands in the image set. After training and optimization, EM images containing each of the differently shaped DNA molecules were processed, and the identified molecules were represented in red, as seen in Figure 5A.

Second, after identifying and classifying DNA based on its shape, we extracted its dimensional features from the EM images obtained using our staining technique. The dimensional features considered were the contour lengths of the linear DNA molecules and the contour diameters and circumferences of the circular DNA molecules. Although measuring the length of a linear strand individually using several image processing tools is easier, measuring the length of a larger number of strands in multiple images manually involves significant errors and bias and requires strenuous labor.
In this regard, we developed an algorithm that can initially identify the linear arrangement of nanoparticles on DNA strands against the background of the EM image, followed by analysis and estimation of the contour length of each strand. The upper row images in Figure 5B show the steps and flow of image processing performed to obtain the contour length distribution data of the stained DNA from their corresponding EM images. Salmon sperm DNA was used as a model for linear-strand image analysis because of its wide distribution range.
In this algorithm, the linear arrangement of nanoparticles was initially skeletonized using the U-Net model. Processes such as dilation and blurring were included to correct truncated parts in the skeletonized strands after the initial processing of the low-magnification EM images of salmon sperm DNA, as shown in Figure S10(A–C) and Section S6 of the Supporting Information. Furthermore, from these processed images, we observed the presence of small particles as a background, which were removed by increasing the size of the threshold filters, and skeletonization of the strands was performed again (Supplementary Section S6, Figure S10D, Supporting Information). Finally, the contour length of each salmon sperm DNA strand was determined. From this analysis, we identified 3885 strands in total from 27 EM images, for which the overall contour length distribution was estimated along with features such as the average and maximum contour length of DNA observed from this image set, as shown in the upper row of Figure 5B. Furthermore, we have provided contour length distribution plots for each of the 27 salmon sperm DNA EM images in the Supporting Information (available in this GitHub Repository).
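The final measurement step can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes a binary skeleton image is already available (for example from a segmentation mask thinned with a routine such as `skimage.morphology.skeletonize`), groups skeleton pixels into 8-connected strands, discards components below a hypothetical `min_pixels` threshold as background particles, and estimates each contour length by summing pixel-to-pixel links (1 for orthogonal steps, √2 for diagonal steps). The `nm_per_pixel` scale is likewise an assumed parameter.

```python
import math
from collections import deque


def strand_lengths(skeleton, nm_per_pixel=1.0, min_pixels=3):
    """Estimate the contour length of every 8-connected strand in a binary skeleton."""
    pixels = {(r, c) for r, row in enumerate(skeleton)
              for c, value in enumerate(row) if value}
    seen, lengths = set(), []
    for start in sorted(pixels):
        if start in seen:
            continue
        # Breadth-first search collects one connected strand.
        strand, queue = [], deque([start])
        seen.add(start)
        while queue:
            r, c = queue.popleft()
            strand.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    nb = (r + dr, c + dc)
                    if nb in pixels and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        if len(strand) < min_pixels:
            continue  # treat tiny components as background particles
        members = set(strand)
        length = 0.0
        for r, c in strand:
            # Count each link once by looking only right and downward.
            if (r, c + 1) in members:
                length += 1.0
            if (r + 1, c) in members:
                length += 1.0
            for dc in (1, -1):
                # A diagonal link counts only when no orthogonal path already joins the pair.
                if ((r + 1, c + dc) in members and (r, c + dc) not in members
                        and (r + 1, c) not in members):
                    length += math.sqrt(2)
        lengths.append(length * nm_per_pixel)
    return lengths


# Toy skeleton: a 5-pixel horizontal strand, a 4-pixel diagonal strand, one noise pixel.
image = [[0] * 8 for _ in range(8)]
for col in range(5):
    image[0][col] = 1
for i in range(4):
    image[3 + i][i] = 1
image[7][7] = 1
print(strand_lengths(image, nm_per_pixel=2.0))
```

Link counting slightly overestimates the length of strands that run at shallow angles to the pixel grid, so a production pipeline might prefer a corrected estimator or a spline fit through the skeleton.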
In the case of circular and coiled DNA, measuring its length manually is complicated compared to that of linear strands. Therefore, we developed an algorithm to measure the circumference of circular DNA by tracking the arrangement of nanoparticles on these DNA strands. The lower-row images in Figure 5B briefly represent how this algorithm was adopted to extract the features of circular DNA strands. We used the M13 phage DNA as a candidate because it has both circular and coiled structures.
For this analysis, 20 high-magnification images of nanoparticles arranged in a circular shape obtained by M13 phage DNA staining were used. They were distinctly identified from their background in raw EM images through skeletonization using the U-Net model, similar to the first step of salmon sperm DNA image processing. From this processed image, the center point of the nanoparticles was determined, and the position of each nanoparticle in the image was plotted as a function of the angle (0–2π radians) and distance from the center point. The sigma-clipping method was then applied to remove nanoparticles with a highly deviated distance from the average, followed by fitting the graph with a Fourier series to convert the arrangement of the discrete nanoparticles into a continuous line, as shown in Figure S11F (Section S6, Supporting Information). Finally, the circumferential contour length of the circular DNA was estimated by measuring the length of the continuous line. This was repeated for all the 20 high-magnification images considered for the M13 phage DNA to obtain a histogram of their circumferential contour lengths, as shown in Figure S12 (Supporting Information). A mean circumferential contour length of 3188.76 ± 1396.35 nm (n = 20 strands) was obtained for the considered coiled and circular M13 phage DNA.
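The steps described above (center point, polar coordinates, sigma clipping, Fourier series fit, and arc length) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: nanoparticle positions are taken as already extracted `(x, y)` coordinates, the center is taken as their mean, and the number of harmonics, the clipping threshold, and the function names are hypothetical choices.

```python
import numpy as np


def fourier_design(theta, n_harmonics):
    """Design matrix [1, cos(k*theta), sin(k*theta), ...] of a truncated Fourier series."""
    columns = [np.ones_like(theta)]
    for k in range(1, n_harmonics + 1):
        columns.append(np.cos(k * theta))
        columns.append(np.sin(k * theta))
    return np.column_stack(columns)


def circumference(xy, n_harmonics=3, n_sigma=3.0, max_iter=5, n_samples=2000):
    """Estimate the circumferential contour length traced by particle positions."""
    xy = np.asarray(xy, dtype=float)
    center = xy.mean(axis=0)  # assumed definition of the center point
    dx, dy = xy[:, 0] - center[0], xy[:, 1] - center[1]
    theta = np.mod(np.arctan2(dy, dx), 2 * np.pi)  # angle in [0, 2*pi)
    radius = np.hypot(dx, dy)                      # distance from the center

    # Iterative sigma clipping on the radial distance.
    keep = np.ones(radius.size, dtype=bool)
    for _ in range(max_iter):
        mean, std = radius[keep].mean(), radius[keep].std()
        new_keep = np.abs(radius - mean) <= n_sigma * std
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep

    # Least-squares fit of r(theta) with a truncated Fourier series.
    coeffs, *_ = np.linalg.lstsq(
        fourier_design(theta[keep], n_harmonics), radius[keep], rcond=None)

    # Sample the continuous closed curve densely and sum the segment lengths.
    t = np.linspace(0.0, 2 * np.pi, n_samples + 1)
    r = fourier_design(t, n_harmonics) @ coeffs
    x, y = r * np.cos(t), r * np.sin(t)
    return float(np.hypot(np.diff(x), np.diff(y)).sum())


# Toy data: 60 particles on a circle of radius 100 plus two stray particles.
angles = np.linspace(0, 2 * np.pi, 60, endpoint=False)
points = np.column_stack([100 * np.cos(angles), 100 * np.sin(angles)])
points = np.vstack([points, [[300.0, 0.0], [0.0, 300.0]]])
print(circumference(points))
```

A single-valued radial function r(θ) can only describe star-shaped outlines, so strongly coiled molecules that fold back on themselves would need a different parameterization.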
The detailed processes of this dimensional analysis and the estimation of salmon sperm and M13 phage DNA are discussed in the Methods section and Section S6 of the Supporting Information.
3 Discussion
Thus, we have demonstrated the ability of our simple DNA staining method using gold nanoparticle-tagged peptides to visualize DNA molecules of various lengths, ranging from a few hundred nanometers (salmon sperm DNA) to a few tens of micrometers (λ-DNA, T7 DNA) and more than a hundred micrometers (HeLa cell-derived genomic DNA), and of various conformations (M13 phage and plasmid DNA). In addition, high-resolution visualization of these DNA molecules was achieved through the use of 9–10 nm PVP-AuNPs, which corresponds to a DNA coverage length of ≈26–29 bp per nanoparticle. Such resolution under EM, with good contrast of the stained DNA molecules, is one of the key demonstrations of our study. Downstream of the EM visualization, an AI-enabled image analysis of the stained DNA molecules was employed to extract important dimensional features from different types of DNA, both linear and circular and of varying sizes. This opens up a new avenue for EM-based DNA visualization and analysis, in which DNA molecules can be visualized under EM with ease using nontoxic chemicals, and dimensional features of the stained DNA molecules that are not apparent to the naked eye can be identified through AI-powered image analysis. Compared with heavily biased, inaccurate, and time-consuming manual analysis, such an analysis of imaged DNA strands of various properties (size, shape, and conformation) strongly improves the accuracy of extracting dimensional features useful for downstream studies and remains robust to several variations in the raw EM images arising from handling during the DNA staining stage.
However, an important consideration for our method is that it requires isolated DNA molecules without attached proteins for staining. Hence, if genomic DNA is to be stained using this method, DNA isolation is essential. In addition, only a specific size range of PVP-AuNPs (less than 13 nm) can be used for staining, owing to steric hindrance and base coverage issues with larger PVP-AuNPs. Using larger PVP-AuNPs could therefore lead to erroneous estimation of the dimensional features of the stained DNA molecules by our downstream AI model.
Finally, although we demonstrated sequence-specific visualization of A–T-patterned DNA fragments, our staining method cannot yet achieve the same with much larger genomic DNA molecules. Addressing this will be our immediate goal in a subsequent study, so that our staining method can be considered extensively and universally for DNA sequencing and barcoding through image analysis of the visualized DNA strands, leading to EM-based mapping. Such mapping has the potential to replace currently used optical detection methods for DNA mapping, which are crucially limited by the resolution of FM.
4 Experimental Section
Chemicals Used
Gold (III) chloride trihydrate, trisodium citrate dihydrate, polyvinylpyrrolidone (molecular weight [M.W.]: 10 000), polyethylene glycol methyl ether thiol (M.W.: 6000), methoxy polyethylene glycol amine (M.W.: 5000), hexadecyltrimethylammonium bromide, sodium borohydride, deoxyribonucleic acid (low molecular weight; obtained from salmon sperm), and L-ascorbic acid were purchased from Sigma Aldrich (USA). All cysteine-tagged DNA-binding peptides were purchased from Peptron (Daejeon, South Korea). Distilled water was obtained using an Arium Pro Ultrapure Water System (Sartorius, Germany) installed at our laboratory.
Hexadecyltrimethylammonium chloride was purchased from TCI (Japan); λ-DNA from Bioneer (South Korea) and Thermo Fisher Scientific (USA); 1X TE buffer from Biosesang (South Korea). Molecular Weight Cut-Off (MWCO) filters: 100K, 300K, 1000K, and Minisart Syringe Filters (pore size: 0.45 µm) were purchased from Sartorius (Germany); 50X TAE buffer, 6X gel loading buffer, agarose, and Accupower PCR Premix (50 µL) from Bioneer; Quick Load 1 kb Extend DNA ladder (50 µg mL−1), M13mp18 RF I DNA, and T4 GT7 DNA from New England BioLabs (USA); 6X DNA Loading Dye and ethidium bromide from Thermo Fisher Scientific; and pEGFP-N3 plasmid (#6080-1) from Addgene (MA, USA).
DMEM (high glucose, pyruvate), fetal bovine serum (qualified), antibiotic–antimycotic (100×), and trypsin-EDTA (0.25% – phenol red) were purchased from Gibco (USA). Ultrapure Low Melting Point agarose was obtained from Invitrogen (USA); proteinase K from Enzynomics (South Korea); phage T7 DNA from Bioron (Germany); and QIAquick PCR purification kit from Qiagen (Germany). The TEM grids (Carbon Type-B, 300 mesh, Copper and Carbon Type-B, Triple Slot, Copper) used in this study were purchased from Ted Pella (USA).
Equipment Used
A UV–Visible Spectroscopy System (Agilent 8453; Agilent Technologies, USA) was used to characterize the optical properties of the synthesized AuNPs. A ZetaSizer Nano ZS90 (Malvern Panalytical, UK) was used to measure the surface zeta potential of the labeled AuNPs. An ultra-high-resolution transmission electron microscope with an accelerating voltage of 300 kV (JEM-3010; JEOL, Japan) and a Schottky Field Emission Scanning Electron Microscope with an accelerating voltage of 15 kV (JSM-IT800; JEOL) were used to visualize different DNA molecules stained using our developed method. An Eppendorf 5430 R Centrifuge (Eppendorf, Germany) was used to process the nanoparticles and perform molecular weight cut-off filtration during staining. An EzDrop1000 Nanodrop (Blue-Ray Biotech, Taiwan) was used to measure the stock concentrations of all the DNA candidates used in this study. A Varioskan LUX multimode microplate reader (Thermo Fisher Scientific) was used to measure fluorescence during the competition assay. An Agaro-Power electrophoresis system (Bioneer) with an electrophoretic tank, loading adaptor, power supply, gel caster, and tray was used for agarose gel electrophoresis to characterize the filtration performance of the MWCO filters used in this study.
Gold Nanoparticle Engineering: Size, Capping Agent, and Concentration
Size Variation
The AuNPs used in this study were synthesized using the conventional Turkevich method[81-83] with some modifications to the trisodium citrate dihydrate concentration to alter the particle size. This method was chosen for two important reasons: a) it uses trisodium citrate as the reducing and capping agent, which has a weaker binding affinity to gold than other capping agents, making direct ligand exchange possible; and b) it enables gold nanoparticle synthesis in high yield and with low polydispersity.[82, 83]
Briefly, the synthesis protocol was as follows: First, the reaction glassware was washed with aqua regia and subsequently with distilled water before use. Then, an auric chloride solution (1 mM; 100 mL) was added to a washed three-necked flask placed in a reflux system. The solution was heated at approximately 120 °C and stirred at approximately 400 rpm. Upon observing rapid condensation inside the flask, 10 mL of the corresponding concentration of trisodium citrate dihydrate solution was added, and the mixture was left undisturbed for approximately 5 min until a drastic color change to ruby red was observed. The temperature was then reduced by 10–20 °C to allow the nanoparticles to grow for about 20 min. Subsequently, the reflux system was switched off, and the reaction mixture was transferred to another stirrer for about 1 h to cool. Here, it must be noted that 19.4 mM, 38.8 mM, and 77.6 mM of trisodium citrate dihydrate were used for the synthesis of 25 nm, 13 nm, and 9–10 nm AuNPs, respectively.
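The relationship between the citrate concentrations listed above and the reaction stoichiometry can be made explicit with a short calculation. The following is a minimal sketch that uses only the volumes and concentrations stated in the protocol. It assumes additive volumes and ignores evaporation during reflux and citrate consumed in the reduction, so the values are nominal. The function name `citrate_summary` is hypothetical.

```python
# Nominal citrate:gold molar ratio and citrate concentration in the
# reaction mixture, from the volumes and concentrations in the protocol.
# Assumptions: volumes are additive; evaporation and citrate consumption
# during the reduction are ignored.

AU_CONC_MM = 1.0        # auric chloride concentration (mM)
AU_VOL_ML = 100.0       # auric chloride volume (mL)
CITRATE_VOL_ML = 10.0   # trisodium citrate solution volume (mL)

# Citrate stock concentration (mM) used for each target particle size.
CITRATE_BY_SIZE = {"25 nm": 19.4, "13 nm": 38.8, "9-10 nm": 77.6}


def citrate_summary(citrate_mm):
    """Return (citrate:Au molar ratio, nominal citrate concentration in mM)."""
    citrate_umol = citrate_mm * CITRATE_VOL_ML  # mM * mL = micromoles
    gold_umol = AU_CONC_MM * AU_VOL_ML
    ratio = citrate_umol / gold_umol
    mixed_conc_mm = citrate_umol / (AU_VOL_ML + CITRATE_VOL_ML)
    return ratio, mixed_conc_mm


for size, conc in CITRATE_BY_SIZE.items():
    ratio, mixed = citrate_summary(conc)
    print(f"{size}: citrate:Au = {ratio:.2f}, nominal citrate = {mixed:.3f} mM")
```

Under these assumptions, each doubling of the citrate stock concentration doubles the citrate:gold ratio, consistent with the trend of smaller particles at higher citrate concentrations reported here.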
Further, for capping the synthesized citrate-capped AuNPs with PVP, equivolume solutions of the synthesized AuNPs (all sizes: 25 nm, 13 nm, and 9–10 nm) and 10 µM of PVP (M.W.: 10 000) were mixed together and incubated overnight. Subsequently, to remove excess PVP, the incubated nanoparticles were washed twice by centrifugation.
Briefly, the washing step was as follows: 800 µL of the incubated nanoparticles were added to microcentrifuge tubes and centrifuged at 16 700 rcf for 15 min (for 13 nm particles) / 20 817 rcf for 15 min (for 9–10 nm particles) / 20 817 rcf for 8 min (for 25 nm particles). Further, 780 µL of the supernatant was removed from all tubes and 380 µL of distilled water was added to each of them, followed by re-centrifugation using the above-mentioned conditions. This step was repeated twice to complete the washing process. Hereafter, these processed solutions are termed as stock PVP-capped AuNPs (PVP-AuNPs) solutions.
Subsequently, the stock PVP-AuNPs solutions (all sizes: 25 nm, 13 nm, and 9–10 nm) were characterized for size, optical, and surface properties using TEM, UV–Visible spectroscopy, and zeta potential measurements, respectively.
Briefly, for TEM sampling, the stock PVP-AuNPs solutions (all sizes: 25 nm, 13 nm, and 9–10 nm) were diluted 4× using distilled water (250 µL of stock NPs added to 750 µL of distilled water) and centrifuged at 18 000 rcf for 10 min to collect the particles. Further, 950 µL of the supernatant was removed, and 1 µL of the collected particles was added to the TEM grid. The sample was dried in vacuum for 15 min at 40 °C. For all samples except HeLa genomic DNA, the 300 mesh TEM grid was used. In contrast, for better and more continuous visualization of the longer HeLa genomic DNA without breaks, the triple-slot TEM grid was used.
For UV–visible spectroscopic measurements, the stock PVP-AuNPs solutions (all sizes: 25 nm, 13 nm, and 9–10 nm) were diluted by half, and their absorption spectra were measured. Here, 1.77 mM, 3.527 mM, and 5 mM trisodium citrate dihydrate solutions were used as blanks for the 25 nm, 13 nm, and 9–10 nm particles, respectively. For zeta potential measurements, the stock PVP-AuNPs solutions (all sizes: 25 nm, 13 nm, and 9–10 nm) were diluted 10×, and their surface zeta potential values were measured.
Capping Agent Variation
The CTAC-capped AuNPs were prepared using a seed-mediated growth method.[84, 85] This protocol was divided into two parts: preparation of seed particles, followed by primary particle synthesis. As the 9–10 nm particles showed much better results for DNA visualization than the 13 and 25 nm particles, the CTAC-capped AuNPs were synthesized using the 9–10 nm particles.
Briefly, the synthesis protocol was as follows: First, the reaction glassware was washed with aqua regia and subsequently with distilled water before use. For seed particle preparation, 10 mL of 100 mM CTAB was placed in a washed beaker, followed by the addition of 250 µL of 10 mM auric chloride solution under stirring at 500 rpm. Then, 600 µL of 10 mM freshly prepared NaBH4 (using ice-cold distilled water) was added to the above solution, and the reaction was allowed to proceed for 2 min under continuous stirring. The prepared seed solution was incubated at room temperature (≈28 °C) for 2 h.
For primary particle synthesis, 20 mL of 200 mM CTAC was added to a washed beaker, followed by 15 mL of 100 mM ascorbic acid under stirring at 500 rpm. Then, 500 µL of the prepared seed solution and 20 mL of 0.5 mM auric chloride were added to the above solution. The solution was stirred for 15 min until the color of the solution changed to dark purple, confirming the growth of nanoparticles. Finally, the prepared primary nanoparticles were centrifuged at 20 817 rcf for 1 h, followed by washing with 20 mM CTAC solution (for 1 mL of primary nanoparticles centrifuged, 950 µL of supernatant was removed, and the same volume of 20 mM CTAC solution was added). Hereafter, this processed solution is termed as the stock CTAC-capped AuNPs (CTAC-AuNPs) solution.
PEG-thiol- and PEG-amine-capped AuNPs were obtained using a direct ligand exchange method, similar to the method by which PVP-capped AuNPs were obtained from citrate-capped AuNPs. The 9–10 nm stock citrate-capped AuNPs were used for ligand exchange with PEG-thiol owing to their better performance in DNA visualization. Briefly, equivolume solutions of citrate-capped AuNPs and 10 µM of PEG-thiol (M.W.: 6000)/PEG-amine (M.W.: 5000) were mixed together and incubated overnight. Subsequently, to remove excess PEG-thiol/amine, the incubated nanoparticles were washed twice by centrifugation.
Briefly, the washing step was as follows: 800 µL of the incubated nanoparticles were added to microcentrifuge tubes and centrifuged at 19 000 rcf for 15 min. Then, 780 µL of the supernatant was removed from all tubes, and 380 µL of distilled water was added, followed by another round of centrifugation using the same conditions. This step was repeated twice to complete the washing process. Hereafter, this processed solution is termed as stock PEG-thiol/amine-capped AuNPs (PEG-SH AuNPs/PEG-NH2 AuNPs) solution.
The direct ligand exchange method was initially considered for CTAC capping of the citrate-capped AuNPs; however, the AuNPs aggregated immediately upon addition of the CTAC solution, owing to the charge neutralization of negatively charged citrates by positively charged CTAC, thereby leaving no driving force to stabilize the AuNPs. Therefore, as an alternative, seed-based synthesis was considered.
Subsequently, the CTAC-AuNPs, PEG-SH AuNPs, and PEG-NH2 AuNPs stock solutions were characterized for size, optical, and surface properties using TEM, UV–Visible Spectroscopy and zeta potential measurements, respectively, under the same conditions as that for PVP-AuNPs, except that distilled water was used as a blank solution for UV–visible measurements.
Concentration Variation
Stock PVP-AuNPs, at an estimated concentration of 30 nM, were initially used for DNA visualization. However, the entire λ-DNA backbone was not visible under TEM owing to the low particle density along the backbone. Hence, to optimize the nanoparticle density on the DNA backbone for proper visualization under EM, the stock PVP-AuNPs were concentrated to 5× (150 nM), 10× (300 nM), and 20× (600 nM) their original concentration by centrifuging the particles at 20 817 rcf for 15 min and removing the appropriate amount of supernatant.
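The "appropriate amount of supernatant" follows directly from the target concentration factor. The following is a minimal sketch, assuming that all particles are recovered in the pellet and that the starting volume is known. The 1000 µL starting volume is a hypothetical value chosen for illustration (it is not stated in the protocol), and the function name `supernatant_to_remove` is likewise an assumption.

```python
STOCK_CONC_NM = 30.0  # estimated stock PVP-AuNP concentration (nM)


def supernatant_to_remove(start_volume_ul, fold):
    """Volume of supernatant (µL) to discard after centrifugation so that the
    remaining volume is start_volume_ul / fold.

    Assumes every particle is pelleted and none is lost with the supernatant.
    """
    if fold < 1:
        raise ValueError("fold must be >= 1 for a concentration step")
    return start_volume_ul - start_volume_ul / fold


start_ul = 1000.0  # hypothetical starting volume, for illustration only
for fold in (5, 10, 20):
    removed = supernatant_to_remove(start_ul, fold)
    print(f"{fold}x -> {STOCK_CONC_NM * fold:.0f} nM: remove {removed:.0f} µL, "
          f"keep {start_ul - removed:.0f} µL")
```

In practice, some particles remain in the supernatant, so the achieved concentration may be lower than the nominal value.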
The aforementioned procedure was also followed to concentrate the 9–10 nm stock PEG-SH AuNPs, and the 13 nm and 25 nm stock PVP-AuNPs.
Isolation of Various Genomic DNA and Preparation of Salmon Sperm DNA
HeLa cells were cultured in DMEM supplemented with 10% FBS and 1× antibiotic–antimycotic solution (final concentrations). When the HeLa cells covered about 80% of the cell culture dish, the medium was carefully aspirated using a Pasteur pipette. Next, 5 mL of 1X PBS buffer was dispensed onto the side of the dish and aspirated. Then, trypsin-EDTA buffer (2 mL) was dispensed into the dish, which was shaken manually to cover the bottom and incubated for 2 min. Subsequently, 8 mL of the medium was added, and the total solution (10 mL) was transferred to a 15 mL conical tube and centrifuged at 500 rcf for 10 min. The HeLa cells were resuspended in 1 mL of 1X PBS buffer and transferred to an Eppendorf tube. After washing twice with 1X PBS buffer, the HeLa cells were finally resuspended in 500 µL of 1X PBS buffer.
The DNA plug was made by mixing the above cell solution with 2% Low Gelling Temperature (LGT) agarose solution prepared in 1X TE to obtain a final LGT agarose concentration of 0.7% and incubating the mixture for 30 min at 4 °C. The DNA plug was then washed with 1X TE buffer for 30 min, and this wash was repeated four times. Proteinase K was added at a final concentration of 2 mg mL−1, and the plug was incubated for 4 h at 42 °C. The same amount of proteinase K was then added again to the DNA plug, which was incubated at 42 °C overnight. The plug was then washed with 1X TE buffer for 30 min, and this wash was repeated four times. Finally, 380 µL of 1X TE buffer was added, and the DNA plug was melted by incubating at 65 °C for 20 min.
A stock solution of salmon sperm DNA was prepared by dissolving sperm DNA powder in distilled water to prepare a concentration of 5 mg mL−1. The prepared solution was heated at 65 °C for 15–20 min for melting. Then, the solution was vortexed gently until it turned transparent, followed by filtering with a syringe filter (pore size: 0.45 µm). This filtered solution was used as a stock solution for the visualization experiments.
Staining Protocol of Different DNA Candidates with Nanoparticles
The protocol for DNA staining with nanoparticles used in this study was developed in-house after optimization trials for nanoparticle size, capping agent, sample incubation time, and concentrations of involved reactants. Herein, we report and briefly explain an optimized staining protocol.
First, the concentration of the stock solution of DNA candidates (λ-DNA and salmon sperm DNA) was measured using Nanodrop. Based on the stock concentration, the solution was diluted to a reaction concentration of 15 pM. For all other DNA candidates, different dilutions were used to obtain their working reaction concentrations because their stock concentrations could not be measured accurately. Second, the stock concentration of Cys-tagged HMG-DBD (2,3) DBP was measured to be approximately 216 µM. The stock solution was diluted to a reaction concentration of 60 nM to maintain an optimal DNA:DBP ratio of 1:4000 for proper visualization. All DNA and DBP dilutions were performed using 1X TE buffer.
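The dilution and ratio arithmetic in this step can be checked with a few lines of code. The following is a minimal sketch using the concentrations stated above. The 360 µL final volume in the usage example is a hypothetical value, and the function names are assumptions made for illustration.

```python
# Concentrations are expressed in picomolar (pM) so that the arithmetic
# uses exact integers.
DNA_REACTION_PM = 15           # 15 pM DNA
DBP_STOCK_PM = 216_000_000     # ~216 µM Cys-HMG-DBD (2,3) stock
DBP_REACTION_PM = 60_000       # 60 nM Cys-HMG-DBD (2,3)


def dilution_factor(stock, target):
    """Fold dilution needed to go from stock to target (same units)."""
    if target <= 0 or stock < target:
        raise ValueError("target must be positive and not exceed the stock")
    return stock / target


def stock_volume_ul(stock, target, final_volume_ul):
    """Stock volume (µL) to bring to final_volume_ul with buffer (C1V1 = C2V2)."""
    return final_volume_ul / dilution_factor(stock, target)


print("DBP dilution factor:", dilution_factor(DBP_STOCK_PM, DBP_REACTION_PM))
print("DBP per DNA molecule:", DBP_REACTION_PM / DNA_REACTION_PM)
# Hypothetical final volume of 360 µL, for illustration only.
print("Stock DBP needed (µL):", stock_volume_ul(DBP_STOCK_PM, DBP_REACTION_PM, 360.0))
```

Because equal volumes of the DNA and DBP solutions are combined in the next step, both concentrations are halved on mixing and the DNA:DBP molar ratio is unchanged.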
Then, 10 µL of DNA solution and 10 µL of 60 nM Cys-HMG-DBD (2,3) solution were combined, mixed by gentle tapping, and incubated for 20 min. Subsequently, 10 µL of AuNPs (for all capping agents, particle sizes, and particle concentrations used in this study) was added to the above solution, mixed by gentle tapping, and incubated for 45 min. The reaction mixture was then filtered using MWCO filters (300K) to remove unreacted AuNPs and DBP. Briefly, in this step, 30 µL of the reacted DNA:DBP:AuNPs sample was mixed with 300 µL of 1X TE buffer and added to the 300K MWCO filter, which was then centrifuged at 3000 rcf for 10 min. Then, 2 µL of the filtered solution was mixed with 2 µL of 1X TE buffer for easier TEM sample preparation. Finally, 2 µL of this mixed solution was added to the TEM grid, followed by drying in vacuum for 30 min.
Characterization of MWCO Filters
Three MWCO filters with different molecular weight cut-off values were tested in this study: 100K, 300K and 1000K. These filters were characterized for their filtering efficiency using two methods (Figures S6 and S7, Supporting Information).
In the first method, 300 nM (10× concentrated) PVP-AuNPs were prepared, and their absorption spectra were obtained using UV–visible spectroscopy (Figure S6A, Supporting Information). Then, 30 µL of this sample was mixed with 1X TE buffer and filtered using each of the three filters (100K, 300K, and 1000K) by centrifugation at 3000 rcf for 10 min. Next, 5 µL of the filtered sample was mixed with 75 µL of distilled water to prepare an 80 µL solution, whose absorption spectrum was measured.
In the other method, gel electrophoresis (Figure S6B,C, Supporting Information) was performed to check for any shearing of λ-DNA upon filtering using all three cut-off filters (100K, 300K, and 1000K). Herein, 1.3% agarose gel was prepared using 1X TAE buffer and set up for electrophoresis by standard procedure.[51] Simultaneously, 10 µL of λ-DNA (of different dilutions) was mixed with 300 µL of 1X TE buffer and filtered using the three MWCO filters by centrifugation at 3000 rcf for 10 min. Then, 4 µL of gel loading dye was mixed with 20 µL of filtered/supernatant samples obtained from each filter and added to the electrophoresis wells. The gel electrophoresis was run for 30–45 min, and the bands were subsequently observed under UV light. A 1 kb DNA ladder with an exclusive band for 48.5 kbp λ-DNA was used in this study.
Fluorescence Measurement to Evaluate Competition between Thiol Groups of PEG-SH and Cys-HMG-DBD (2,3)
Initially, to match the concentrations of PVP-capped and PEG-SH-capped AuNPs, UV–visible spectra of PVP-capped solutions with different dilutions (no dilution, 10%, 25%, and 34% diluted) and PEG-SH solution (no dilution) were taken, and absorption intensity at 523 nm was considered. In this study, we observed that 10%-diluted PVP matched the absorption intensity of undiluted PEG-SH. Then, 250 µL of 10%-diluted PVP-AuNPs/undiluted PEG-SH AuNPs was mixed with 250 µL of 120 nM mNG-Cys-HMG-DBD (2,3)–DBP tagged with fluorescent protein and incubated for 45 min. The above samples were then centrifuged at 20 817 rcf for 15 min to remove unreacted fluorescent protein-tagged DBP molecules. The washing process was repeated thrice, wherein 400 µL of supernatant was removed and 400 µL of 1X TE buffer was added during each step. Samples from each washing step were used for fluorescence measurements of these sets of nanoparticles. Finally, because mNG is a tagged fluorescent protein with excitation and emission wavelengths of 490 and 517 nm, respectively, these values were used for fluorescence measurement; samples were added to a microplate and measured using a microplate reader.
H-NS Protein Preparation
The H-NS plasmid was constructed by combining two plasmids, H-NS-mCherry[39] and tTALE-mScarlet.[86] Specifically, the tTALE-mScarlet plasmid in pET15b was digested with restriction enzymes (NdeI and BamHI) to remove tTALE-mScarlet, and the amplified H-NS gene was inserted and ligated using the AccuRapid Cloning Kit (Bioneer).
The primers used for amplifying the H-NS gene are described below:
H-NS Forward primer: 5′-GCG GCC TGG TGC CGC GCG GCA GCC ATA TGA TGA GCG AAG CAC TTA AAA TTC TG-3′
H-NS Reverse primer: 5′-GGG CTT TGT TAG CAG CCG GAT CCT TGC TTG ATC AGG AAA TCG TCG-3′
Using a standard cloning procedure, the constructed H-NS plasmid was transformed into the Escherichia coli BL21 (DE3) strain.[86]
Protein Expression and Purification
A single colony of the transformed cells was inoculated into fresh Luria-Bertani (LB) medium containing ampicillin and incubated for 1 h. When the cells were fully saturated, a subculture was grown to an optical density of approximately 0.6 at 37 °C with the corresponding antibiotics. Then, 1 mM isopropyl β-d-1-thiogalactopyranoside (IPTG) was used to induce H-NS/SATB1-mScarlet expression overnight in a shaker at 20 °C and 200 rpm. Cells for protein purification were harvested by centrifugation at 10 000 rcf for 10 min (subsequent centrifugation steps were performed under similar conditions), and the residual medium was washed away with cell lysis buffer (50 mM Na2HPO4, 300 mM NaCl, and 10 mM imidazole; pH 8.0). The cells were lysed by ultrasonication for 15 min, and cell debris was removed by centrifugation at 10 000 rcf for 10 min at 4 °C.
His-tagged H-NS/SATB1-mScarlet proteins were purified by affinity chromatography using a Ni-NTA agarose resin. The protein–resin mixture was kept on a shaking platform at 4 °C for 2 h. The lysate containing proteins bound to the Ni-NTA agarose resin was loaded onto the column for gravity chromatography and rinsed multiple times using a protein washing buffer (50 mM Na2HPO4, 300 mM NaCl, and 20 mM imidazole; pH 8.0).
Finally, the proteins were eluted using a protein elution buffer (50 mM Na2HPO4, 300 mM NaCl, and 250 mM imidazole; pH 8.0). All proteins were diluted (10 µg mL−1) using 50% w/w glycerol/1X TE buffer (Tris 10 mM and EDTA 1 mM; pH 8.0).
Synthesis of A–T Patterned DNA Fragments and Their Sequence Information
A–T rich (3778 bp), A–T even (3978 bp) and A–T poor (5454 bp) DNA fragments were prepared using λ-DNA as the DNA template for PCR. PCR was performed in accordance with the standard protocol using the AccuPower PCR Premix (Bioneer). DNA fragments were obtained by PCR purification using a PCR Purification Kit (Qiagen, Venlo, Netherlands). The primers and amplified DNA sequences used in this study are described below.
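The primer and amplicon listings in this subsection can be cross-checked programmatically. The following is a minimal sketch, not part of the original workflow: it tests whether an amplicon starts with the forward primer and ends with the reverse complement of the reverse primer, and it computes the overall A+T fraction. The function names are assumptions, and the short `amplicon_ends` string only joins the first and last bases of the A–T rich amplicon as a stand-in for the full sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def clean(seq):
    """Remove whitespace (e.g., primer triplet spacing) and upper-case."""
    return "".join(seq.split()).upper()


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def at_fraction(seq):
    """Fraction of bases that are A or T."""
    seq = clean(seq)
    return (seq.count("A") + seq.count("T")) / len(seq)


def flanked_by_primers(amplicon, forward, reverse):
    """True if the amplicon begins with the forward primer and ends with the
    reverse complement of the reverse primer."""
    amplicon = clean(amplicon)
    return (amplicon.startswith(clean(forward))
            and amplicon.endswith(reverse_complement(reverse)))


# A-T rich primers as listed in this subsection.
forward = "TTT GCT ACC ACC ATG ACT AAC G"
reverse = "GCA GGA AGA CAA ACA CAG AGC"
# Stand-in: first and last bases of the A-T rich amplicon, joined
# (NOT the full 3778 bp sequence).
amplicon_ends = "TTTGCTACCACCATGACTAACGCGCTTGCGG" + "GATGAAGAGCTCTGTGTTTGTCTTCCTGC"

print(flanked_by_primers(amplicon_ends, forward, reverse))
print(round(at_fraction(amplicon_ends), 3))
```

Applying `at_fraction` to the full sequences, or to a sliding window along them, would give a quantitative view of the A–T patterning that the fragment names refer to.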
A–T rich Forward primer: 5′-TTT GCT ACC ACC ATG ACT AAC G-3′
A–T rich Reverse primer: 5′-GCA GGA AGA CAA ACA CAG AGC-3′
A–T Rich DNA Sequence
5′-TTTGCTACCACCATGACTAACGCGCTTGCGGGTAAACAACCGAAGAATGCGACACTGACGGCGCTGGCAGGGCTTTCCACGGCGAAAAATAAATTACCGTATTTTGCGGAAAATGATGCCGCCAGCCTGACTGAACTGACTCAGGTTGGCAGGGATATTCTGGCAAAAAATTCCGTTGCAGATGTTCTTGAATACCTTGGGGCCGGTGAGAATTCGGCCTTTCCGGCAGGTGCGCCGATCCCGTGGCCATCAGATATCGTTCCGTCTGGCTACGTCCTGATGCAGGGGCAGGCGTTTGACAAATCAGCCTACCCAAAACTTGCTGTCGCGTATCCATCGGGTGTGCTTCCTGATATGCGAGGCTGGACAATCAAGGGGAAACCCGCCAGCGGTCGTGCTGTATTGTCTCAGGAACAGGATGGAATTAAGTCGCACACCCACAGTGCCAGTGCATCCGGTACGGATTTGGGGACGAAAACCACATCGTCGTTTGATTACGGGACGAAAACAACAGGCAGTTTCGATTACGGCACCAAATCGACGAATAACACGGGGGCTCATGCTCACAGTCTGAGCGGTTCAACAGGGGCCGCGGGTGCTCATGCCCACACAAGTGGTTTAAGGATGAACAGTTCTGGCTGGAGTCAGTATGGAACAGCAACCATTACAGGAAGTTTATCCACAGTTAAAGGAACCAGCACACAGGGTATTGCTTATTTATCGAAAACGGACAGTCAGGGCAGCCACAGTCACTCATTGTCCGGTACAGCCGTGAGTGCCGGTGCACATGCGCATACAGTTGGTATTGGTGCGCACCAGCATCCGGTTGTTATCGGTGCTCATGCCCATTCTTTCAGTATTGGTTCACACGGACACACCATCACCGTTAACGCTGCGGGTAACGCGGAAAACACCGTCAAAAACATTGCATTTAACTATATTGTGAGGCTTGCATAATGGCATTCAGAATGAGTGAACAACCACGGACCATAAAAATTTATAATCTGCTGGCCGGAACTAATGAATTTATTGGTGAAGGTGACGCATATATTCCGCCTCATACCGGTCTGCCTGCAAACAGTACCGATATTGCACCGCCAGATATTCCGGCTGGCTTTGTGGCTGTTTTCAACAGTGATGAGGCATCGTGGCATCTCGTTGAAGACCATCGGGGTAAAACCGTCTATGACGTGGCTTCCGGCGACGCGTTATTTATTTCTGAACTCGGTCCGTTACCGGAAAATTTTACCTGGTTATCGCCGGGAGGGGAATATCAGAAGTGGAACGGCACAGCCTGGGTGAAGGATACGGAAGCAGAAAAACTGTTCCGGATCCGGGAGGCGGAAGAAACAAAAAAAAGCCTGATGCAGGTAGCCAGTGAGCATATTGCGCCGCTTCAGGATGCTGCAGATCTGGAAATTGCAACGAAGGAAGAAACCTCGTTGCTGGAAGCCTGGAAGAAGTATCGGGTGTTGCTGAACCGTGTTGATACATCAACTGCACCTGATATTGAGTGGCCTGCTGTCCCTGTTATGGAGTAATCGTTTTGTGATATGCCGCAGAAACGTTGTATGAAATAACGTTCTGCGGTTAGTTAGTATATTGTAAAGCTGAGTATTGGTTTATTTGGCGATTATTATCTTCAGGAGAATAATGGAAGTTCTATGACTCAATTGTTCATAGTGTTTACATCACCGCCAATTGCTTTTAAGACTGAACGCATGAAATATGGTTTTTCGTCATGTTTTGAGTCTGCTGTTGATATTTCTAAAGTCGGTTTTTTTTCTTCGTTTTCTCTAACTATTTTCCATGAAATACATTTTTGATTATTATTTGAATCAATTCCAATTACCTGAAGTCTTTCATCTATAATTGGCATTGTATGTATTGGTTTATTGGAGTAGATGCTTGCTTTTCTGAGCCATAGCTCTGATATCCAAATGAAGCCATAGGCATTTGTTATTTTGGCTCTGTCAGCTGCATAACGCCAAAAAATA
TATTTATCTGCTTGATCTTCAAATGTTGTATTGATTAAATCAATTGGATGGAATTGTTTATCATAAAAAATTAATGTTTGAATGTGATAACCGTCCTTTAAAAAAGTCGTTTCTGCAAGCTTGGCTGTATAGTCAACTAACTCTTCTGTCGAAGTGATATTTTTAGGCTTATCTACCAGTTTTAGACGCTCTTTAATATCTTCAGGAATTATTTTATTGTCATATTGTATCATGCTAAATGACAATTTGCTTATGGAGTAATCTTTTAATTTTAAATAAGTTATTCTCCTGGCTTCATCAAATAAAGAGTCGAATGATGTTGGCGAAATCACATCGTCACCCATTGGATTGTTTATTTGTATGCCAAGAGAGTTACAGCAGTTATACATTCTGCCATAGATTATAGCTAAGGCATGTAATAATTCGTAATCTTTTAGCGTATTAGCGACCCATCGTCTTTCTGATTTAATAATAGATGATTCAGTTAAATATGAAGGTAATTTCTTTTGTGCAAGTCTGACTAACTTTTTTATACCAATGTTTAACATACTTTCATTTGTAATAAACTCAATGTCATTTTCTTCAATGTAAGATGAAATAAGAGTAGCCTTTGCCTCGCTATACATTTCTAAATCGCCTTGTTTTTCTATCGTATTGCGAGAATTTTTAGCCCAAGCCATTAATGGATCATTTTTCCATTTTTCAATAACATTATTGTTATACCAAATGTCATATCCTATAATCTGGTTTTTGTTTTTTTGAATAATAAATGTTACTGTTCTTGCGGTTTGGAGGAATTGATTCAAATTCAAGCGAAATAATTCAGGGTCAAAATATGTATCAATGCAGCATTTGAGCAAGTGCGATAAATCTTTAAGTCTTCTTTCCCATGGTTTTTTAGTCATAAAACTCTCCATTTTGATAGGTTGCATGCTAGATGCTGATATATTTTAGAGGTGATAAAATTAACTGCTTAACTGTCAATGTAATACAAGTTGTTTGATCTTTGCAATGATTCTTATCAGAAACCATATAGTAAATTAGTTACACAGGAAATTTTTAATATTATTATTATCATTCATTATGTATTAAAATTAGAGTTGTGGCTTGGCTCTGCTAACACGTTGCTCATAGGAGATATGGTAGAGCCGCAGACACGTCGTATGCAGGAACGTGCTGCGGCTGGCTGGTGAACTTCCGATAGTGCGGGTGTTGAATGATTTCCAGTTGCTACCGATTTTACATATTTTTTGCATGAGAGAATTTGTACCACCTCCCACCGACCATCTATGACTGTACGCCACTGTCCCTAGGACTGCTATGTGCCGGAGCGGACATTACAAACGTCCTTCTCGGTGCATGCCACTGTTGCCAATGACCTGCCTAGGAATTGGTTAGCAAGTTACTACCGGATTTTGTAAAAACAGCCCTCCTCATATAAAAAGTATTCGTTCACTTCCGATAAGCGTCGTAATTTTCTATCTTTCATCATATTCTAGATCCCTCTGAAAAAATCTTCCGAGTTTGCTAGGCACTGATACATAACTCTTTTCCAATAATTGGGGAAGTCATTCAAATCTATAATAGGTTTCAGATTTGCTTCAATAAATTCTGACTGTAGCTGCTGAAACGTTGCGGTTGAACTATATTTCCTTATAACTTTTACGAAAGAGTTTCTTTGAGTAATCACTTCACTCAAGTGCTTCCCTGCCTCCAAACGATACCTGTTAGCAATATTTAATAGCTTGAAATGATGAAGAGCTCTGTGTTTGTCTTCCTGC-3′
A–T even Forward primer: 5′-CGC TTT GTA ACG GAG TAG ACG-3′
A–T even Reverse primer: 5′-GTT AAC CGC CCT ATT CTC TCG-3′
A–T Even DNA Sequence
5′-CGCTTTGTAACGGAGTAGACGAAAGTGATTGCGCCTACCCGGATATTATCGTGAGGATGCGTCATCGCCATTGCTCCCCAAATACAAAACCAATTTCAGCCAGTGCCTCGTCCATTTTTTCGATGAACTCCGGCACGATCTCGTCAAAACTCGCCATGTACTTTTCATCCCGCTCAATCACGACATAATGCAGGCCTTCACGCTTCATACGCGGGTCATAGTTGGCAAAGTACCAGGCATTTTTTCGCGTCACCCACATGCTGTACTGCACCTGGGCCATGTAAGCTGACTTTATGGCCTCGAAACCACCGAGCCGGAACTTCATGAAATCCCGGGAGGTAAACGGGCATTTCAGTTCAAGGCCGTTGCCGTCACTGCATAAACCATCGGGAGAGCAGGCGGTACGCATACTTTCGTCGCGATAGATGATCGGGGATTCAGTAACATTCACGCCGGAAGTGAATTCAAACAGGGTTCTGGCGTCGTTCTCGTACTGTTTTCCCCAGGCCAGTGCTTTAGCGTTAACTTCCGGAGCCACACCGGTGCAAACCTCAGCAAGCAGGGTGTGGAAGTAGGACATTTTCATGTCAGGCCACTTCTTTCCGGAGCGGGGTTTTGCTATCACGTTGTGAACTTCTGAAGCGGTGATGACGCCGAGCCGTAATTTGTGCCACGCATCATCCCCCTGTTCGACAGCTCTCACATCGATCCCGGTACGCTGCAGGATAATGTCCGGTGTCATGCTGCCACCTTCTGCTCTGCGGCTTTCTGTTTCAGGAATCCAAGAGCTTTTACTGCTTCGGCCTGTGTCAGTTCTGACGATGCACGAATGTCGCGGCGAAATATCTGGGAACAGAGCGGCAATAAGTCGTCATCCCATGTTTTATCCAGGGCGATCAGCAGAGTGTTAATCTCCTGCATGGTTTCATCGTTAACCGGAGTGATGTCGCGTTCCGGCTGACGTTCTGCAGTGTATGCAGTATTTTCGACAATGCGCTCGGCTTCATCCTTGTCATAGATACCAGCAAATCCGAAGGCCAGACGGGCACACTGAATCATGGCTTTATGACGTAACATCCGTTTGGGATGCGACTGCCACGGCCCCGTGATTTCTCTGCCTTCGCGAGTTTTGAATGGTTCGCGGCGGCATTCATCCATCCATTCGGTAACGCAGATCGGATGATTACGGTCCTTGCGGTAAATCCGGCATGTACAGGATTCATTGTCCTGCTCAAAGTCCATGCCATCAAACTGCTGGTTTTCATTGATGATGCGGGACCAGCCATCAACGCCCACCACCGGAACGATGCCATTCTGCTTATCAGGAAAGGCGTAAATTTCTTTCGTCCACGGATTAAGGCCGTACTGGTTGGCAACGATCAGTAATGCGATGAACTGCGCATCGCTGGCATCACCTTTAAATGCCGTCTGGCGAAGAGTGGTGATCAGTTCCTGTGGGTCGACAGAATCCATGCCGACACGTTCAGCCAGCTTCCCAGCCAGCGTTGCGAGTGCAGTACTCATTCGTTTTATACCTCTGAATCAATATCAACCTGGTGGTGAGCAATGGTTTCAACCATGTACCGGATGTGTTCTGCCATGCGCTCCTGAAACTCAACATCGTCATCAAACGCACGGGTAATGGATTTTTTGCTGGCCCCGTGGCGTTGCAAATGATCGATGCATAGCGATTCAAACAGGTGCTGGGGCAGGCCTTTTTCCATGTCGTCTGCCAGTTCTGCCTCTTTCTCTTCACGGGCGAGCTGCTGGTAGTGACGCGCCCAGCTCTGAGCCTCAAGACGATCCTGAATGTAATAAGCGTTCATGGCTGAACTCCTGAAATAGCTGTGAAAATATCGCCCGCGAAATGCCGGGCTGATTAGGAAAACAGGAAAGGGGGTTAGTGAATGCTTTTGCTTGATCTCAGTTTCAGTATTAATATCCATTTTTTATAAGCGTCGACGGCTTCACGAAACATCTTTTCATCGCCAATAAAA
GTGGCGATAGTGAATTTAGTCTGGATAGCCATAAGTGTTTGATCCATTCTTTGGGACTCCTGGCTGATTAAGTATGTCGATAAGGCGTTTCCATCCGTCACGTAATTTACGGGTGATTCGTTCAAGTAAAGATTCGGAAGGGCAGCCAGCAACAGGCCACCCTGCAATGGCATATTGCATGGTGTGCTCCTTATTTATACATAACGAAAAACGCCTCGAGTGAAGCGTTATTGGTATGCGGTAAAACCGCACTCAGGCGGCCTTGATAGTCATATCATCTGAATCAAATATTCCTGATGTATCGATATCGGTAATTCTTATTCCTTCGCTACCATCCATTGGAGGCCATCCTTCCTGACCATTTCCATCATTCCAGTCGAACTCACACACAACACCATATGCATTTAAGTCGCTTGAAATTGCTATAAGCAGAGCATGTTGCGCCAGCATGATTAATACAGCATTTAATACAGAGCCGTGTTTATTGAGTCGGTATTCAGAGTCTGACCAGAAATTATTAATCTGGTGAAGTTTTTCCTCTGTCATTACGTCATGGTCGATTTCAATTTCTATTGATGCTTTCCAGTCGTAATCAATGATGTATTTTTTGATGTTTGACATCTGTTCATATCCTCACAGATAAAAAATCGCCCTCACACTGGAGGGCAAAGAAGATTTCCAATAATCAGAACAAGTCGGCTCCTGTTTAGTTACGAGCGACATTGCTCCGTGTATTCACTCGTTGGAATGAATACACAGTGCAGTGTTTATTCTGTTATTTATGCCAAAAATAAAGGCCACTATCAGGCAGCTTTGTTGTTCTGTTTACCAAGTTCTCTGGCAATCATTGCCGTCGTTCGTATTGCCCATTTATCGACATATTTCCCATCTTCCATTACAGGAAACATTTCTTCAGGCTTAACCATGCATTCCGATTGCAGCTTGCATCCATTGCATCGCTTGAATTGTCCACACCATTGATTTTTATCAATAGTCGTAGTCATACGGATAGTCCTGGTATTGTTCCATCACATCCTGAGGATGCTCTTCGAACTCTTCAAATTCTTCTTCCATATATCACCTTAAATAGTGGATTGCGGTAGTAAAGATTGTGCCTGTCTTTTAACCACATCAGGCTCGGTGGTTCTCGTGTACCCCTACAGCGAGAAATCGGATAAACTATTACAACCCCTACAGTTTGATGAGTATAGAAATGGATCCACTCGTTATTCTCGGACGAGTGTTCAGTAATGAACCTCTGGAGAGAACCATGTATATGATCGTTATCTGGGTTGGACTTCTGCTTTTAAGCCCAGATAACTGGCCTGAATATGTTAATGAGAGAATCGGTATTCCTCATGTGTGGCATGTTTTCGTCTTTGCTCTTGCATTTTCGCTAGCAATTAATGTGCATCGATTATCAGCTATTGCCAGCGCCAGATATAAGCGATTTAAGCTAAGAAAACGCATTAAGATGCAAAACGATAAAGTGCGATCAGTAATTCAAAACCTTACAGAAGAGCAATCTATGGTTTTGTGCGCAGCCCTTAATGAAGGCAGGAAGTATGTGGTTACATCAAAACAATTCCCATACATTAGTGAGTTGATTGAGCTTGGTGTGTTGAACAAAACTTTTTCCCGATGGAATGGAAAGCATATATTATTCCCTATTGAGGATATTTACTGGACTGAATTAGTTGCCAGCTATGATCCATATAATATTGAGATAAAGCCAAGGCCAATATCTAAGTAACTAGATAAGAGGAATCGATTTTCCCTTAATTTTCTGGCGTCCACTGCATGTTATGCCGCGTTCGCCAGGCTTGCTGTACCATGTGCGCTGATTCTTGCGCTCAATACGTTGCAGGTTGCTTTCAATCTGTTTGTGGTATTCAGCCAGCACTGTAAGGTCTATCGGATTTAGTGCGCTTTCTACTCGTGATTTCGGTTTGCGATTCAGCGAGAGAATAGGGCGGTTAAC-3′
A–T poor Forward primer: 5′-GAA GCC GTT AAA CAG ATT GAG C-3′
A–T poor Reverse primer: 5′-GTG CTG ATC TCC TGA GAA ACC-3′
A–T Poor DNA Sequence
5′-GAAGCCGTTAAACAGATTGAGCAGGAAGTGCTTACCACCTGGCCCACGGAGGCAATTTCTCATGCTGAAAACGTGGTGTACCGGCTGTCTGGTATGTATGAGTTTGTGGTGAATAATGCCCCTGAACAGACAGAGGACGCCGGGCCCGCAGAGCCTGTTTCTGCGGGAAAGTGTTCGACGGTGAGCTGAGTTTTGCCCTGAAACTGGCGCGTGAGATGGGGCGACCCGACTGGCGTGCCATGCTTGCCGGGATGTCATCCACGGAGTATGCCGACTGGCACCGCTTTTACAGTACCCATTATTTTCATGATGTTCTGCTGGATATGCACTTTTCCGGGCTGACGTACACCGTGCTCAGCCTGTTTTTCAGCGATCCGGATATGCATCCGCTGGATTTCAGTCTGCTGAACCGGCGCGAGGCTGACGAAGAGCCTGAAGATGATGTGCTGATGCAGAAAGCGGCAGGGCTTGCCGGAGGTGTCCGCTTTGGCCCGGACGGGAATGAAGTTATCCCCGCTTCCCCGGATGTGGCGGACATGACGGAGGATGACGTAATGCTGATGACAGTATCAGAAGGGATCGCAGGAGGAGTCCGGTATGGCTGAACCGGTAGGCGATCTGGTCGTTGATTTGAGTCTGGATGCGGCCAGATTTGACGAGCAGATGGCCAGAGTCAGGCGTCATTTTTCTGGTACGGAAAGTGATGCGAAAAAAACAGCGGCAGTCGTTGAACAGTCGCTGAGCCGACAGGCGCTGGCTGCACAGAAAGCGGGGATTTCCGTCGGGCAGTATAAAGCCGCCATGCGTATGCTGCCTGCACAGTTCACCGACGTGGCCACGCAGCTTGCAGGCGGGCAAAGTCCGTGGCTGATCCTGCTGCAACAGGGGGGGCAGGTGAAGGACTCCTTCGGCGGGATGATCCCCATGTTCAGGGGGCTTGCCGGTGCGATCACCCTGCCGATGGTGGGGGCCACCTCGCTGGCGGTGGCGACCGGTGCGCTGGCGTATGCCTGGTATCAGGGCAACTCAACCCTGTCCGATTTCAACAAAACGCTGGTCCTTTCCGGCAATCAGGCGGGACTGACGGCAGATCGTATGCTGGTCCTGTCCAGAGCCGGGCAGGCGGCAGGGCTGACGTTTAACCAGACCAGCGAGTCACTCAGCGCACTGGTTAAGGCGGGGGTAAGCGGTGAGGCTCAGATTGCGTCCATCAGCCAGAGTGTGGCGCGTTTCTCCTCTGCATCCGGCGTGGAGGTGGACAAGGTCGCTGAAGCCTTCGGGAAGCTGACCACAGACCCGACGTCGGGGCTGACGGCGATGGCTCGCCAGTTCCATAACGTGTCGGCGGAGCAGATTGCGTATGTTGCTCAGTTGCAGCGTTCCGGCGATGAAGCCGGGGCATTGCAGGCGGCGAACGAGGCCGCAACGAAAGGGTTTGATGACCAGACCCGCCGCCTGAAAGAGAACATGGGCACGCTGGAGACCTGGGCAGACAGGACTGCGCGGGCATTCAAATCCATGTGGGATGCGGTGCTGGATATTGGTCGTCCTGATACCGCGCAGGAGATGCTGATTAAGGCAGAGGCTGCGTATAAGAAAGCAGACGACATCTGGAATCTGCGCAAGGATGATTATTTTGTTAACGATGAAGCGCGGGCGCGTTACTGGGATGATCGTGAAAAGGCCCGTCTTGCGCTTGAAGCCGCCCGAAAGAAGGCTGAGCAGCAGACTCAACAGGACAAAAATGCGCAGCAGCAGAGCGATACCGAAGCGTCACGGCTGAAATATACCGAAGAGGCGCAGAAGGCTTACGAACGGCTGCAGACGCCGCTGGAGAAATATACCGCCCGTCAGGAAGAACTGAACAAGGCACTGAAAGACGGGAAAATCCTGCAGGCGGATTACAACACGCTGATGGCGGCGGCGAAAAAGGATTATGAAGCGACGCTGAAAAAGCCGAAACAGTCCAGCGTGAAGGTGTCTGCGGGCGATCGTCA
GGAAGACAGTGCTCATGCTGCCCTGCTGACGCTTCAGGCAGAACTCCGGACGCTGGAGAAGCATGCCGGAGCAAATGAGAAAATCAGCCAGCAGCGCCGGGATTTGTGGAAGGCGGAGAGTCAGTTCGCGGTACTGGAGGAGGCGGCGCAACGTCGCCAGCTGTCTGCACAGGAGAAATCCCTGCTGGCGCATAAAGATGAGACGCTGGAGTACAAACGCCAGCTGGCTGCACTTGGCGACAAGGTTACGTATCAGGAGCGCCTGAACGCGCTGGCGCAGCAGGCGGATAAATTCGCACAGCAGCAACGGGCAAAACGGGCCGCCATTGATGCGAAAAGCCGGGGGCTGACTGACCGGCAGGCAGAACGGGAAGCCACGGAACAGCGCCTGAAGGAACAGTATGGCGATAATCCGCTGGCGCTGAATAACGTCATGTCAGAGCAGAAAAAGACCTGGGCGGCTGAAGACCAGCTTCGCGGGAACTGGATGGCAGGCCTGAAGTCCGGCTGGAGTGAGTGGGAAGAGAGCGCCACGGACAGTATGTCGCAGGTAAAAAGTGCAGCCACGCAGACCTTTGATGGTATTGCACAGAATATGGCGGCGATGCTGACCGGCAGTGAGCAGAACTGGCGCAGCTTCACCCGTTCCGTGCTGTCCATGATGACAGAAATTCTGCTTAAGCAGGCAATGGTGGGGATTGTCGGGAGTATCGGCAGCGCCATTGGCGGGGCTGTTGGTGGCGGCGCATCCGCGTCAGGCGGTACAGCCATTCAGGCCGCTGCGGCGAAATTCCATTTTGCAACCGGAGGATTTACGGGAACCGGCGGCAAATATGAGCCAGCGGGGATTGTTCACCGTGGTGAGTTTGTCTTCACGAAGGAGGCAACCAGCCGGATTGGCGTGGGGAATCTTTACCGGCTGATGCGCGGCTATGCCACCGGCGGTTATGTCGGTACACCGGGCAGCATGGCAGACAGCCGGTCGCAGGCGTCCGGGACGTTTGAGCAGAATAACCATGTGGTGATTAACAACGACGGCACGAACGGGCAGATAGGTCCGGCTGCTCTGAAGGCGGTGTATGACATGGCCCGCAAGGGTGCCCGTGATGAAATTCAGACACAGATGCGTGATGGTGGCCTGTTCTCCGGAGGTGGACGATGAAGACCTTCCGCTGGAAAGTGAAACCCGGTATGGATGTGGCTTCGGTCCCTTCTGTAAGAAAGGTGCGCTTTGGTGATGGCTATTCTCAGCGAGCGCCTGCCGGGCTGAATGCCAACCTGAAAACGTACAGCGTGACGCTTTCTGTCCCCCGTGAGGAGGCCACGGTACTGGAGTCGTTTCTGGAAGAGCACGGGGGCTGGAAATCCTTTCTGTGGACGCCGCCTTATGAGTGGCGGCAGATAAAGGTGACCTGCGCAAAATGGTCGTCGCGGGTCAGTATGCTGCGTGTTGAGTTCAGCGCAGAGTTTGAACAGGTGGTGAACTGATGCAGGATATCCGGCAGGAAACACTGAATGAATGCACCCGTGCGGAGCAGTCGGCCAGCGTGGTGCTCTGGGAAATCGACCTGACAGAGGTCGGTGGAGAACGTTATTTTTTCTGTAATGAGCAGAACGAAAAAGGTGAGCCGGTCACCTGGCAGGGGCGACAGTATCAGCCGTATCCCATTCAGGGGAGCGGTTTTGAACTGAATGGCAAAGGCACCAGTACGCGCCCCACGCTGACGGTTTCTAACCTGTACGGTATGGTCACCGGGATGGCGGAAGATATGCAGAGTCTGGTCGGCGGAACGGTGGTCCGGCGTAAGGTTTACGCCCGTTTTCTGGATGCGGTGAACTTCGTCAACGGAAACAGTTACGCCGATCCGGAGCAGGAGGTGATCAGCCGCTGGCGCATTGAGCAGTGCAGCGAACTGAGCGCGGTGAGTGCCTCCTTTGTACTGTCCACGCCGACGGAAACGGATGGCGCTGTTTTTCCGGGACGTATCATGCTGGCCAAC
ACCTGCACCTGGACCTATCGCGGTGACGAGTGCGGTTATAGCGGTCCGGCTGTCGCGGATGAATATGACCAGCCAACGTCCGATATCACGAAGGATAAATGCAGCAAATGCCTGAGCGGTTGTAAGTTCCGCAATAACGTCGGCAACTTTGGCGGCTTCCTTTCCATTAACAAACTTTCGCAGTAAATCCCATGACACAGACAGAATCAGCGATTCTGGCGCACGCCCGGCGATGTGCGCCAGCGGAGTCGTGCGGCTTCGTGGTAAGCACGCCGGAGGGGGAAAGATATTTCCCCTGCGTGAATATCTCCGGTGAGCCGGAGGCTATTTCCGTATGTCGCCGGAAGACTGGCTGCAGGCAGAAATGCAGGGTGAGATTGTGGCGCTGGTCCACAGCCACCCCGGTGGTCTGCCCTGGCTGAGTGAGGCCGACCGGCGGCTGCAGGTGCAGAGTGATTTGCCGTGGTGGCTGGTCTGCCGGGGGACGATTCATAAGTTCCGCTGTGTGCCGCATCTCACCGGGCGGCGCTTTGAGCACGGTGTGACGGACTGTTACACACTGTTCCGGGATGCTTATCATCTGGCGGGGATTGAGATGCCGGACTTTCATCGTGAGGATGACTGGTGGCGTAACGGCCAGAATCTCTATCTGGATAATCTGGAGGCGACGGGGCTGTATCAGGTGCCGTTGTCAGCGGCACAGCCGGGCGATGTGCTGCTGTGCTGTTTTGGTTCATCAGTGCCGAATCACGCCGCAATTTACTGCGGCGACGGCGAGCTGCTGCACCATATTCCTGAACAACTGAGCAAACGAGAGAGGTACACCGACAAATGGCAGCGACGCACACACTCCCTCTGGCGTCACCGGGCATGGCGCGCATCTGCCTTTACGGGGATTTACAACGATTTGGTCGCCGCATCGACCTTCGTGTGAAAACGGGGGCTGAAGCCATCCGGGCACTGGCCACACAGCTCCCGGCGTTTCGTCAGAAACTGAGCGACGGCTGGTATCAGGTACGGATTGCCGGGCGGGACGTCAGCACGTCCGGGTTAACGGCGCAGTTACATGAGACTCTGCCTGATGGCGCTGTAATTCATATTGTTCCCAGAGTCGCCGGGGCCAAGTCAGGTGGCGTATTCCAGATTGTCCTGGGGGCTGCCGCCATTGCCGGATCATTCTTTACCGCCGGAGCCACCCTTGCAGCATGGGGGGCAGCCATTGGGGCCGGTGGTATGACCGGCATCCTGTTTTCTCTCGGTGCCAGTATGGTGCTCGGTGGTGTGGCGCAGATGCTGGCACCGAAAGCCAGAACTCCCCGTATACAGACAACGGATAACGGTAAGCAGAACACCTATTTCTCCTCACTGGATAACATGGTTGCCCAGGGCAATGTTCTGCCTGTTCTGTACGGGGAAATGCGCGTGGGGTCACGCGTGGTTTCTCAGGAGATCAGCAC-3′
Classifying DNA from EM Images Using Machine Learning
Data for the three types of DNA were obtained from 28 SEM images (.jpg format), comprising 11 images of plasmid DNA, 12 images of M13 phage DNA, and five images of λ-DNA. The data were then labeled using LabelMe (version 4.5.12),[87] which saves the labels in JavaScript Object Notation (JSON) format. To organize the dataset, all images and labels were converted into Visual Object Classes (VOC) format.
For class segmentation of DNA, we used the U-Net architecture based on generative and application models in TensorFlow.[88] The model input shape was fixed at 480 × 640 pixels with 3 channels (width × height × channels). To guard against overfitting, each DNA dataset was randomly split into 90% of total cases for model training and 10% of cases for validation. Model training was then executed for 40 epochs in all cases, with a batch size of four for λ-DNA and a batch size of eight for M13 phage and plasmid DNA. For each DNA candidate, model optimization was performed using the loss values of each training and validation step, as shown in Figure S13 (Supporting Information).
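As an illustration of the random 90%/10% split described above, a minimal sketch in Python follows. It is not the authors' code: the function name `split_dataset`, the fixed seed, the file names, and the rule of keeping at least one validation case are all assumptions made for this example.

```python
import random


def split_dataset(items, val_fraction=0.1, seed=0):
    """Randomly split items into (training, validation) lists.

    Assumption: at least one item is always held out for validation,
    which matters for small sets such as the five lambda-DNA images.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for the 12 M13 phage DNA images.
m13_images = [f"m13_{i:02d}.jpg" for i in range(12)]
train, val = split_dataset(m13_images)
print(len(train), len(val))
```

With datasets this small, the rounding rule decides how many images end up in validation, so it is worth stating explicitly when reporting results.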
Calculation and Distribution of Salmon Sperm DNA Length
The raw data for salmon sperm DNA comprised 27 TEM images. Annotation steps (generation of labels and organization of the dataset into VOC format) were performed in the same manner as in the classification step described in the previous section.
Subsequently, the U-Net model was used for image denoising.[88] To avoid overfitting and enhance denoising performance, data augmentation techniques like adjusting brightness, horizontal/vertical flipping, rotating, and shifting were used. Model training was executed for 40 epochs, and the trained model was used to obtain denoised segmentation images of the input data.
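The flipping and brightness augmentations mentioned above can be sketched as follows. This is an illustrative example only, under several assumptions: images are plain nested lists of 8-bit grayscale values, the function names and the brightness range of ±30 are hypothetical, and rotation and shifting are omitted for brevity. For segmentation, geometric transforms must be applied identically to the image and its label mask, whereas brightness changes apply to the image only.

```python
import random


def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img):
    """Reverse the order of the rows."""
    return [row[:] for row in img[::-1]]


def adjust_brightness(img, delta):
    """Add delta to every pixel, clipped to the 8-bit range 0..255."""
    return [[min(255, max(0, px + delta)) for px in row] for row in img]


def augment_pair(image, mask, rng):
    """Return one randomly augmented (image, mask) pair."""
    if rng.random() < 0.5:
        image, mask = flip_horizontal(image), flip_horizontal(mask)
    if rng.random() < 0.5:
        image, mask = flip_vertical(image), flip_vertical(mask)
    # Brightness range is an assumption; the mask is left untouched.
    image = adjust_brightness(image, rng.randint(-30, 30))
    return image, mask


image = [[10, 200], [120, 250]]
mask = [[0, 1], [1, 1]]
aug_image, aug_mask = augment_pair(image, mask, random.Random(0))
```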
Morphological transformations such as Gaussian blurring and skeletonization were conducted to preprocess the output segmented images. The region proposed by Scikit-Image[89] was used to divide all the individual DNA strands in each image. Segment sorting and path measurements provided by PlantCV morphology[90] were used to calculate the length of each DNA strand. Finally, the length distribution of salmon sperm DNA in each image was obtained by plotting the strand lengths determined by the segment length in the previous step. The mean strand length for each image was calculated by dividing the sum of the segment lengths by the total number of segments identified in that image. The overall length distribution of salmon sperm DNA was obtained by plotting all calculated lengths from 27 TEM images using the Origin 2019 tool (OriginLab Corporation, USA).
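The per-image mean strand length described above can be illustrated with a short sketch. It assumes that each skeletonized strand is already available as an ordered list of (row, column) pixel coordinates and approximates its length as the sum of Euclidean steps between neighboring skeleton pixels. This is a simplification for illustration, not the PlantCV path measurement itself; the function names and the `nm_per_pixel` scale factor are hypothetical.

```python
import math


def path_length(pixels):
    """Length of an ordered skeleton path, in pixel units.

    Orthogonal steps contribute 1 and diagonal steps contribute sqrt(2).
    """
    return sum(math.dist(p, q) for p, q in zip(pixels, pixels[1:]))


def mean_strand_length(strands, nm_per_pixel=1.0):
    """Return (mean length, individual lengths) for one image.

    The mean is the sum of the strand lengths divided by the number
    of strands identified in the image.
    """
    lengths = [path_length(s) * nm_per_pixel for s in strands]
    return sum(lengths) / len(lengths), lengths


# Two toy strands: a straight three-pixel run and a single diagonal step.
strands = [[(0, 0), (0, 1), (0, 2)], [(0, 0), (1, 1)]]
mean_len, lengths = mean_strand_length(strands)
print(mean_len, lengths)
```

Converting to physical units requires the pixel size of each micrograph, which depends on the magnification and is not fixed by this sketch.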
Calculation of M13 Phage DNA Circumferential Length
The M13 phage DNA image dataset consisted of 20 EM grayscale images (12 TEM and eight SEM images). The annotation steps were performed in a manner similar to the salmon sperm DNA image analysis, as mentioned in the previous section.
U-Net model training was then performed to remove unnecessary parts of the processed image and subsequently identify the M13 phage DNA perimeter. Data augmentation was also conducted for these M13 phage DNA images using the same protocol that was followed for salmon sperm DNA, as mentioned in the previous section. Finally, model training was executed for 60 epochs with a batch size of two. By testing this model, we obtained improved M13 images for circumferential length calculations.
To calculate the circumferential length of M13 phage DNA, the watershed technique of Scikit-Image[89] was used to identify each nanoparticle object separately and obtain their coordinate points. Based on this coordinate information, a Fourier series model was fitted using Symfit-Model[91] and the Fit function.[92] The circumferential length of each M13 phage DNA was calculated by summing the Euclidean distances between consecutive data points sampled at infinitesimal intervals along the fitted curve.
Our AI-based image analysis model for DNA visualization under EM can be accessed in the GitHub repository at this link.
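The idea of fitting a Fourier series to the nanoparticle coordinates and summing short Euclidean steps along the fitted curve can be sketched in plain Python. This example is a stand-in rather than the authors' Symfit-based implementation: it computes the Fourier coefficients by direct discrete projection, which coincides with the least-squares fit only under the assumption that the points are evenly spaced in the curve parameter. It also assumes the points can be ordered by their angle around the centroid, which holds for roughly round, non-self-crossing molecules. The function names, the number of harmonics, and the sampling density are hypothetical choices.

```python
import math


def fit_closed_fourier(points, harmonics=3):
    """Fit truncated Fourier series x(t), y(t) to points on a closed curve."""
    n = len(points)
    if n <= 2 * harmonics:
        raise ValueError("need more than 2 * harmonics points")
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    # Assumption: ordering by angle about the centroid traces the ring.
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    ts = [2 * math.pi * k / n for k in range(n)]

    def coeffs(vals):
        a0 = sum(vals) / n
        terms = []
        for h in range(1, harmonics + 1):
            a = 2 / n * sum(v * math.cos(h * t) for v, t in zip(vals, ts))
            b = 2 / n * sum(v * math.sin(h * t) for v, t in zip(vals, ts))
            terms.append((a, b))
        return a0, terms

    return coeffs([p[0] for p in ordered]), coeffs([p[1] for p in ordered])


def evaluate(series, t):
    """Evaluate one fitted coordinate series at parameter t."""
    a0, terms = series
    return a0 + sum(a * math.cos(h * t) + b * math.sin(h * t)
                    for h, (a, b) in enumerate(terms, start=1))


def circumference(points, harmonics=3, samples=2000):
    """Sum Euclidean distances between closely spaced points on the fit."""
    fx, fy = fit_closed_fourier(points, harmonics)
    curve = [(evaluate(fx, 2 * math.pi * k / samples),
              evaluate(fy, 2 * math.pi * k / samples))
             for k in range(samples)]
    return sum(math.dist(curve[k], curve[(k + 1) % samples])
               for k in range(samples))


# Synthetic check: 40 points on a circle of radius 5 centered at (10, 20).
ring = [(10 + 5 * math.cos(2 * math.pi * k / 40),
         20 + 5 * math.sin(2 * math.pi * k / 40)) for k in range(40)]
print(circumference(ring), 2 * math.pi * 5)
```

For the synthetic circle the summed length agrees closely with 2πr, because the chord approximation error shrinks quadratically as the number of samples grows. Real nanoparticle coordinates are unevenly spaced, where a true least-squares fit such as the one described above is the more appropriate tool.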
Acknowledgements
This work was supported by Samsung Research Funding Center (SRFC-MA2101-07). This research was also partially supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (B.L.; NRF-2022R1A6A3A01086178).
Conflict of Interest
The authors declare no conflict of interest.
Author Contributions
P.R.S., J.C., and Y.K. contributed equally to this work. This paper was conceptualized by P.R.S., J.H.L., and K.J. P.R.S., J.C., J.H.L., and Y.C. prepared the methodology. J.H.L. and B.L. designed the experimental studies and AI-enabled image analysis workflow, respectively. C.N. isolated genomic DNA from HeLa cell lines for staining and synthesized A–T patterned fragments. S.H. constructed protein plasmids and expressed Cys-tagged DNA binding proteins needed for staining experiments. P.R.S. and J.C. performed all the experimental studies and investigation. Y.K., Y.S., and B.L. developed source codes for image analysis and investigated the AI-enabled image analysis. P.R.S. performed visualization of the stained DNA molecules under TEM and SEM. K.J., J.H.L., K.L., and S.S. were involved in funding acquisition. K.J. and J.H.L. managed project administration. S.S.L., B.L., K.L., S.S., J.H.L., and K.J. supervised the work. P.R.S., J.C., and Y.S. wrote the original draft. J.H.L., K.J., K.L., S.S., B.L., Y.K., P.R.S., J.C., and S.S.L. reviewed and edited the manuscript.
Open Research
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.