Volume 40, Issue 5 pp. 630-636

A proposal for standardization in forensic bovine DNA typing: allele nomenclature of 16 cattle-specific short tandem repeat loci

L. H. P. Van De Goor, H. Panneman and W. A. Van Haeringen

Dr Van Haeringen Laboratorium BV, Agro Business Park 100, NL 6708 PW Wageningen, The Netherlands
First published: 04 September 2009
L. H. P. van de Goor, Dr Van Haeringen Laboratorium BV, Agro Business Park 100, NL 6708 PW Wageningen, The Netherlands

Summary

In this study, we propose an allele nomenclature for 16 polymorphic short tandem repeat (STR) loci (BM1824, BM2113, ETH10, ETH225, INRA023, SPS115, TGLA122, TGLA126, TGLA227, ETH3, TGLA53, BM1818, CSRM60, CSSM66, HAUT27 and ILSTS006) used for bovine genotyping (Bos taurus). The nomenclature is based on sequence data from the polymorphic region(s) of the STR loci, as recommended by the DNA Commission of the International Society of Forensic Genetics for human DNA typing. To cover both common and rare alleles, a selection of animals homozygous for alleles at these STR loci was analysed and sequenced. The alleles of the STR loci consisted of either simple or compound dinucleotide repeat patterns. Only a limited number of alleles with the same fragment size showed different repeat structures. The allele designation described here is based on the number of repeats, including all variable regions within the amplified fragment. This set of 16 STR markers should be adopted for use in all bovine applications, including forensic analysis.

Introduction

The ability to amplify a multiplex of species-specific, polymorphic short tandem repeat (STR) loci by PCR has led to the development of a robust set of 16 bovine STR markers (BM1818, BM1824, BM2113, CSRM60, CSSM66, ETH3, ETH10, ETH225, HAUT27, ILSTS006, INRA023, SPS115, TGLA53, TGLA122, TGLA126 and TGLA227). Although a large population study in cattle (Bos taurus) has recently been presented for this set of 16 bovine-specific STR markers (Van de Goor L.H.P., Koskinen M.T. & Van Haeringen W.A., unpublished data), no details have yet been published on allele structure or DNA sequence variation within the alleles. This set of 16 markers, which includes the loci recommended by the International Society for Animal Genetics (ISAG) for bovine parentage testing, is included in commonly used commercial kits (Applied Biosystems and Finnzymes Diagnostics) and is routinely employed by bovine genotyping providers. To share data more effectively and to support use in legal casework, a universal nomenclature system should be developed on the principles of the repeat-based human nomenclature advocated in the recommendations of the International Society of Forensic Genetics (ISFG) (Gill et al. 1994, 1997; Bär et al. 1997; Schneider et al. 1998). The nomenclature currently used in cattle for parentage verification is based solely on the measured length of PCR amplicons; it evolved informally from discussions between laboratories at ISAG conferences and is regulated internationally within ISAG. However, it has major limitations: data generated on different electrophoretic platforms cannot easily be shared, and designations for new STR loci or alleles are not automatically available.
Allele designations based on the number of repeats have been advocated for human DNA typing, but they have yet to be universally accepted for animal genetic typing. Although a repeat-based allele nomenclature, applied operationally by comparison with an allelic reference ladder, has been published for dogs (Canis familiaris) (Eichmann et al. 2004; Hellmann et al. 2006) and partially for cats (Felis catus) (Lipinski et al. 2007), no effort has been made to adapt the same nomenclature to other widely used and important animal species such as cattle, horses and pigs. To develop a repeat-based nomenclature for cattle, a selection of alleles at the 16 polymorphic dinucleotide STR loci was sequenced from a dataset covering 22 cattle breeds (N = 9738) (Van de Goor et al. unpublished data). In this study, we propose a repeat-based allele nomenclature derived from these sequenced alleles.

Materials and methods

Selection of loci

The 16 loci (BM1818, BM1824, BM2113, CSRM60, CSSM66, ETH3, ETH10, ETH225, HAUT27, ILSTS006, INRA023, SPS115, TGLA53, TGLA122, TGLA126 and TGLA227) are used globally for routine bovine genotyping for various purposes such as forensic studies, parentage verification and kinship analysis. The markers are commercially available in two panels known as Bovine Genotypes Panels 1.1 and 2.1 (Finnzymes Diagnostics).

Samples and DNA extraction

The samples used in this study were collected between January 2005 and February 2007. Over 9000 DNA profiles covering 22 breeds were available; 75% were obtained from hair root samples, 23% from blood samples and 2% from semen straws and other sample types. Six breeds originated from the Netherlands (Brandrood Cattle, Dutch Friesian, Groningen Whiteheaded, Dutch Belted, Verbeterd Roodbont and Maas Rijn IJssel), whereas all other breeds are common throughout Europe and North America. For each of the 16 STR loci, at least three of the most frequent alleles in the 22-breed dataset were selected, and for each selected allele one animal homozygous for that allele was sequenced. For four loci (ETH3, ETH225, SPS115 and TGLA122), additional samples or alleles were sequenced as well.

For ETH3, the population data reveal gaps between the three alleles with the smallest observed amplicon lengths, so the smallest allele was sequenced to confirm its repeat motif. Because the ISAG designation of one ETH225 allele has been discussed extensively (it is called either allele 158 or 160), this allele was sequenced in several animals. Only one intermediate allele (at SPS115) was observed in the dataset, and it was also sequenced. Because TGLA122 spans a very wide size range, the alleles with the largest amplicon lengths were also sequenced in several animals.

Genomic DNA was isolated from blood samples or hair roots using routine procedures. For blood samples, 10 μl of blood was washed three times in 150 μl of Tris–HCl-based buffer, and the cell pellet was lysed with proteinase K (0.5 U in 10 mM Tris, 0.5% Tween, for 45 min at 56 °C, followed by heat inactivation). For hair roots, approximately eight follicles were placed in a PCR tube and lysed with proteinase K (6 U in 10 mM Tris, 0.5% Tween, overnight at 56 °C, followed by heat inactivation).

PCR and allele sequencing

The 16 STR loci were amplified in a 20-μl volume containing 2 μl of DNA (20–150 ng per reaction), unlabelled primers and the following reagents: 1× GeneAmp PCR Buffer including 1.5 mM MgCl2 (Applied Biosystems), 10 pmol of each primer and Taq Gold (Applied Biosystems). PCR was performed in a 9700 Thermocycler (Applied Biosystems) with an initial 15 min at 95 °C followed by 32 cycles of 30 s at 95 °C, 30 s at 60 °C and 60 s at 72 °C. For most loci, the primer sequences shown in Table 1 were used. To enable sequencing of repeats that lay too close to the original primers, new primers were designed for loci whose primers were located within 30 bp of the repeat sequence; for some markers, the original primer sequence partially overlapped the repeat. After an aliquot of the PCR product had been checked on an agarose gel, the remaining product was purified with ExoSAP-IT (USB Europe GmbH) according to the manufacturer's instructions. Each purified PCR product was sequenced directly on both the forward and the reverse strand using the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems) according to the manufacturer's instructions. The sequencing reactions were purified with XTerminator (Applied Biosystems) and run on an ABI 3100 sequencer, and the trace files were analysed with Sequencing Analysis 3.7 software (Applied Biosystems) according to the manufacturer's instructions.

Table 1. Locus name, chromosomal location, repeat structure and repeat sequence, original reference, primer sequences and the true size ranges of the amplicons.
Locus | Chromosomal location | Repeat structure | Repeat sequence | Original reference | Primer sequences (forward; reverse) | Amplicon length (bp)
BM1818 | D23S21 | Simple | (TG)n | Bishop et al. (1994) | F: AGCTGGGAATATAACCAAAGG; R: AGTGCTTTCAAGGTCCATGC | 253–277
BM1824 | D1S34 | Simple | (GT)n | Barendse et al. (1994) | F: GAGCAAGGTGTTTTTCCAATC; R: CATTCTCCAACTGCTTCCTTG | 176–188
BM2113 | D2S26 | Simple | (CA)n | Sunden et al. (1993) | F: GCTGCCTTCTACCAAATACCC; R: CTTCCTGAGAGAAGCAACACC | 124–146
CSRM60 | D10S5 | Simple | (AC)n | Baylor College of Medicine Human Genome Sequencing Center (2006) | F: AAGATGTGATCCAAGAGAGAGGCA; R: AGGACCAGATCGTGAAAGGCATAG | 91–117
CSSM66 | D14S31 | Simple | (AC)n | Barendse et al. (1994) | F: AATTTAATGCACTGAGGAGCTTGG; R: ACACAAATCCTTTCTGCCAGCTGA | 177–203
ETH3 | D19S2 | Compound | (GT)nAC(GT)6 | Solinas-Toldo et al. (1993) | F¹: GAACCTGCCTCTCCTGCATTGG; R¹: ACTCTGCCTGTGGCCAAGTAGG | 100–128
ETH10 | D5S3 | Simple | (AC)n |  | F: GTTCAGGACTGGCCCTGCTAACA; R: CCTCCAGCCCACTTTCTCTTCTC | 206–222
ETH225 | D9S2 | Compound | (TG)4CG(TG)(CA)n | Steffen et al. (1993) | F: GATCACCTTGCCACTATTTCCT; R: ACATGACAGCCAGCTGCTACT | 139–157
HAUT27 | D26S21 | Simple | (AC)n | Thieven et al. (1997) | F: TTTTATGTTCATTTTTTGACTGG; R: AACTGCTGAAATCTCCATCTTA | 137–155
ILSTS006 | D7S8 | Simple | (GT)n | Brezinsky et al. (1993) | F: TGTCTGTATTTCTGCTGTGG; R: ACACGGAAGCGATCTAAACG | 279–297
INRA023 | D3S10 | Simple | (AC)n | Vaiman et al. (1994) | F: GAGTAGAGCTACAAGATAAACTTC; R: TAACTACAGGGTGTTAGATGAACTC | 201–225
SPS115 | D15 | Compound | (CA)nTA(CA)6 | Baylor College of Medicine Human Genome Sequencing Center (2006) | F: AAAGTGACACAACAGCTTCACCAG; R: AACCGAGTGTCCTAGTTTGGCTGTG | 247–261
TGLA53 | D16S3 | Compound | (TG)6CG(TG)4(TA)n | Georges & Massey (1992) | F: GCTTTCAGAAATAGTTTGCATTCA; R: ATCTTCACATGATATTACAGCAGA | 151–187
TGLA122 | D21S6 | Compound | (AC)n(AT)n |  | F¹: AATCACATGGCAAATAAGTACATAC; R: CCCTCCTCCAGGTAAATCAGC | 136–182
TGLA126 | D20S1 | Simple | (TG)n |  | F¹: CTAATTTAGAATGAGAGAGGCTTCT; R: TTGGTCCTCTATTCTCTGAATATTCC | 111–127
TGLA227 | D18S1 | Simple | (TG)n |  | F¹: GGAATTCCAAATCTGTTAATTTGCT; R¹: ACAGACAGAAACTCAATGAAAGCA | 76–104
¹ Primer sequence was redesigned for sequencing; the modified primer sequences are not shown.

Nomenclature

For each STR locus, the sequence was compared with the original GenBank sequence. The proposed nomenclature for the 16 bovine STR loci is based on the number of repeat units and the recommendations for human STR nomenclature (Gill et al. 1994, 1997; Bär et al. 1997; Schneider et al. 1998). Table 2 shows an overview of the principles of the human-based nomenclature approach, which has been used for the repeat-based nomenclature applied in this paper. A conversion from the amplicon length-based ISAG nomenclature to the newly presented repeat-based nomenclature is proposed.

Table 2. Overview of the principles of the human-based nomenclature.
1. DNA sequences are read in the 5′–3′ direction.
2. For STRs within protein coding genes, the coding strand should be used. For STRs without any connection to protein coding genes, the sequence originally described in the literature or the first public database entry should be used.
3. The repeat sequence motif must be defined so that the first 5′-nucleotides that can define a repeat motif are used.
4. STR repeats are subdivided into three categories based on their repeat structure:
 Simple repeats: the repeat sequence contains only one motif (e.g. CAn: CA12 = 12).
 Compound repeats: the repeat sequence varies (e.g. CAnCTn: CA12CT2 = 14).
 Complex repeats: the repeat sequence contains several different types of repeat motifs, e.g. tetrameric combined with trimeric and/or dimeric motifs (e.g. CAGGnCATnCTn: CAGG2CAT12CT2 = 16).
5. For intermediate alleles, the designation consists of the number of complete repeats, followed by a decimal point and the number of additional base pairs (e.g. 27.1).
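To make principles 4 and 5 concrete, the following Python sketch counts repeat units in the bracket notation used for the bovine loci, e.g. (CA)14TA(CA)6, and writes the Table 2 examples in the same bracketed form. It assumes, consistent with the designations reported for SPS115 and ETH225 in the Results, that a short interruption such as TA or CG counts as one dinucleotide unit and that a bracketed motif without a number occurs once. The function names are illustrative only and do not belong to any published tool.

```python
import re

# A token is a bracketed motif with an optional count, e.g. "(CA)14" or "(TG)",
# or a bare interruption between bracketed runs, e.g. "TA" or "CG".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d*)|([ACGT]+)")


def count_repeat_units(structure: str, unit_len: int = 2) -> int:
    """Total number of repeat units in a structure such as "(CA)14TA(CA)6".

    Bracketed motifs contribute their count (1 if none is written). Bare
    interruptions are assumed to be whole units of unit_len bases, which holds
    for the dinucleotide loci described here.
    """
    total = 0
    pos = 0
    for match in TOKEN.finditer(structure):
        if match.start() != pos:
            break  # unparseable characters; reported below
        motif, count, bare = match.groups()
        if motif is not None:
            total += int(count) if count else 1
        else:
            if len(bare) % unit_len:
                raise ValueError(f"interruption {bare!r} is not a whole number of units")
            total += len(bare) // unit_len
        pos = match.end()
    if pos != len(structure):
        raise ValueError(f"cannot parse {structure!r} at position {pos}")
    return total


def designate(complete_units: int, extra_bp: int = 0) -> str:
    """Allele name: complete repeats, then '.' and any additional base pairs."""
    return f"{complete_units}.{extra_bp}" if extra_bp else str(complete_units)


for structure in ("(CA)12", "(CA)12(CT)2", "(CAGG)2(CAT)12(CT)2",
                  "(TG)4CG(TG)(CA)17", "(CA)14TA(CA)6"):
    print(structure, count_repeat_units(structure))
print(designate(count_repeat_units("(CA)20TA(CA)6"), 1))
```

The three Table 2 examples give 12, 14 and 16; the ETH225 and SPS115 structures give 23 and 21, and the intermediate SPS115 allele is named 27.1.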

Results

Previously, all STR loci included in this study had been described as dinucleotide repeats (Table 1 shows locus name, chromosomal location, repeat structure and sequence, original reference, primer sequences and amplicon size range). All STR loci showed a simple or compound repeat structure. The proposed nomenclature for the 16 bovine STR loci is based on the number of repeat units and is adapted from the ISFG recommendations for the nomenclature of human STRs.

The investigated STR loci are presented in the following section (Figs 1–3 and Fig. S1). Each locus is described by (i) its UniSTS identification number, (ii) its GenBank accession number, (iii) its general sequence structure, including the flanking regions, and (iv) its average allele frequency distribution in the dataset (Van de Goor et al. unpublished data). Sequenced alleles are marked with an asterisk (*); the designations of alleles that were not sequenced were extrapolated from their mobilities in the raw data. Although such deduced designations are only operationally correct, additional sequencing of these alleles may not be required for routine analyses, which rely on control samples. If this is confirmed, the STR is described according to previously published studies.

Figure 1 BM2113. Six alleles were sequenced (14, 15, 18, 19, 20 and 21). The sequence corresponds to GenBank accession M97162. The alleles of locus BM2113 showed the simple dinucleotide repeat structure (CA)n. All observed alleles in the sample population fell into 12 allele classes; no intermediate alleles were found. Allele 19 was the most frequent, with a frequency of 0.18. Accession numbers: UniSTS 250697; GenBank M97162. Sequence: GCTGCCTTCTACCAAATACCCCCTGCTCCGGCCCCCACCTCAAC(CA)nGAGTGAGCTCATAGTCTTGAGTTAAAAAAGTGACAGGTGTTGCTTCTCTCAGGAAG. *, sequenced alleles.
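Because the BM2113 caption gives the complete amplicon apart from the repeat, the expected true amplicon length for any repeat number follows directly from the flank lengths. The sketch below assumes that the printed flanks are exact and that alleles differ only in the number of CA units; the helper names are hypothetical.

```python
# BM2113 flanks as printed in the caption: the 5' flank begins with the forward
# primer and the 3' flank ends with the reverse complement of the reverse primer.
FLANK_5 = "GCTGCCTTCTACCAAATACCCCCTGCTCCGGCCCCCACCTCAAC"
FLANK_3 = "GAGTGAGCTCATAGTCTTGAGTTAAAAAAGTGACAGGTGTTGCTTCTCTCAGGAAG"
MOTIF = "CA"
REVERSE_PRIMER = "CTTCCTGAGAGAAGCAACACC"  # from Table 1

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


def amplicon(n_repeats: int) -> str:
    """Reconstructed BM2113 amplicon carrying (CA)n."""
    return FLANK_5 + MOTIF * n_repeats + FLANK_3


def repeats_from_length(length_bp: int) -> int:
    """Invert the length formula; reject lengths that are not flanks plus whole CA units."""
    repeat_bp = length_bp - len(FLANK_5) - len(FLANK_3)
    if repeat_bp < 0 or repeat_bp % len(MOTIF):
        raise ValueError(f"{length_bp} bp is not a whole number of {MOTIF} units")
    return repeat_bp // len(MOTIF)


# Sanity check: the printed 3' flank ends in the reverse primer's binding site.
assert FLANK_3.endswith(reverse_complement(REVERSE_PRIMER))

for n in (12, 19, 23):
    print(n, len(amplicon(n)))
```

Under these assumptions, alleles 12 and 23 measure 124 and 146 bp, the BM2113 amplicon size range listed in Table 1; the corresponding ISAG designations in Table 3 (121 and 143) are 3 bp shorter.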

Figure 2 ETH225. Four alleles were sequenced (19, 23, 24 and 28). The sequence corresponds to GenBank accession Z14043. The alleles of locus ETH225 showed a compound dinucleotide repeat structure, (TG)4CG(TG)(CA)n. The allele with structure (TG)4CG(TG)(CA)17 was designated 23 (4 + 1 + 1 + 17 repeat units). We also observed a single nucleotide polymorphism (SNP) adjacent to the 3′ end of the repeat structure; this C/T polymorphism has no effect on the nomenclature of the locus. Alleles 19, 23 and 24 carried a T at the SNP position, whereas allele 28, sequenced in six animals, carried a C in every case. All observed alleles in the sample population fell into eight allele classes; no intermediate alleles were found. Allele 24 was the most frequent, with a frequency of 0.35. Accession numbers: UniSTS 250852; GenBank Z14043. Sequence: GATCACCTTGCCACTATTTCCTCCAACATA(TG)4CG(TG)(CA)n[C/T]GATAGCCACTCCTTTCTCTAATGCCACAGAATTACACAGTCAACTCTCTAGTAGCAGCTGGCTGTCATGT. *, sequenced alleles.
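The statement that the C/T SNP does not affect the nomenclature can be checked on reconstructed sequences. The sketch below is an illustration, not the authors' analysis pipeline. It builds hypothetical ETH225 amplicons from the printed flanks, locates the (TG)nCG(TG)n(CA)n block with a regular expression, and counts every dinucleotide unit, including the CG interruption. The function names are assumptions made for this example.

```python
import re

# ETH225 flanks as printed in the caption; the SNP base sits between the repeat and FLANK_3.
FLANK_5 = "GATCACCTTGCCACTATTTCCTCCAACATA"
FLANK_3 = "GATAGCCACTCCTTTCTCTAATGCCACAGAATTACACAGTCAACTCTCTAGTAGCAGCTGGCTGTCATGT"

# (TG)n CG (TG)n (CA)n, with each run captured so its units can be counted.
REPEAT_BLOCK = re.compile(r"((?:TG)+)CG((?:TG)+)((?:CA)+)")


def eth225_amplicon(n_ca: int, snp: str) -> str:
    """Hypothetical ETH225 amplicon (TG)4CG(TG)(CA)n followed by the C/T SNP."""
    if snp not in ("C", "T"):
        raise ValueError("SNP must be C or T")
    return FLANK_5 + "TG" * 4 + "CG" + "TG" + "CA" * n_ca + snp + FLANK_3


def eth225_designation(seq: str) -> int:
    """Count all dinucleotide units in the repeat block; CG counts as one unit."""
    match = REPEAT_BLOCK.search(seq)
    if match is None:
        raise ValueError("ETH225 repeat block not found")
    tg_first, tg_second, ca = (len(run) // 2 for run in match.groups())
    return tg_first + 1 + tg_second + ca


for snp in ("T", "C"):
    print(snp, eth225_designation(eth225_amplicon(17, snp)))
```

Both SNP variants of a (CA)17 allele are called 23, because neither a following C nor a following T extends the CA run. The reconstructed lengths of alleles 19 and 28 (139 and 157 bp) match the ETH225 size range given in Table 1.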

Figure 3 SPS115. Eight alleles were sequenced (21, 23, 24, 25, 26, 27, 27.1 and 28). All sequenced alleles are consistent with the repeat sequence of GenBank NW_001503418. The alleles have the compound structure (CA)nTA(CA)6. Following the ISFG guidelines, all units including the TA interruption are counted; for example, allele (CA)14TA(CA)6 was designated 21 (14 + 1 + 6). All observed alleles except one fell into eight allele classes; the exception was intermediate between alleles 27 and 28. This intermediate allele, with repeat structure (CA)20TA(CA)6 and designated 27.1, contained one additional A in a stretch of 10 A nucleotides located 32 nucleotides downstream (3′) of the repeat. Allele 21 was the most frequent, with a frequency of 0.58. Accession numbers: UniSTS 279634; GenBank NW_001503418. Sequence: AAAGTGACACAACAGCTTCACCAGAGCATCTCCAATATCT(CA)nTA(CA)6TCTCATTCCTCTAGTGTCTTTTGCCTTTAAAGAAAAAAAAACTAAGCAGATCAACATGGGATCTCCTTTTTGTAGATTTATAGAAAGGGTTCCTTTGTTGCGCACTCACTTGTAAGAAAATGAGACAAAAACGTGAAACCCACAGCCAAACTAGGACACTCGGTT. *, sequenced alleles.
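Principle 5 of Table 2 derives an intermediate designation from fragment length, so a base gained outside the repeat still yields a ".1" allele. The sketch below rebuilds hypothetical SPS115 amplicons from the printed flanks, whose poly-A stretch contains nine A, and applies a length-based call. The split of the 3′ flank and the helper names are illustrative assumptions.

```python
# SPS115 flanks as printed in the caption. The 3' flank is split around its
# poly-A stretch (9 A in the printed sequence) so the 10-A variant can be modelled.
FLANK_5 = "AAAGTGACACAACAGCTTCACCAGAGCATCTCCAATATCT"
SPACER = "TCTCATTCCTCTAGTGTCTTTTGCCTTTAAAG"  # 32 nt between repeat and poly-A stretch
TAIL = ("CTAAGCAGATCAACATGGGATCTCCTTTTTGTAGATTTATAGAAAGGGTTCCTTTGTTGCGCACTCAC"
        "TTGTAAGAAAATGAGACAAAAACGTGAAACCCACAGCCAAACTAGGACACTCGGTT")
REF_POLY_A = 9

# Non-repeat length of a regular allele (flanks plus the reference poly-A stretch).
FLANK_LEN = len(FLANK_5) + len(SPACER) + REF_POLY_A + len(TAIL)


def sps115_amplicon(n_ca: int, poly_a: int = REF_POLY_A) -> str:
    """Hypothetical (CA)nTA(CA)6 allele with a poly-A stretch of the given length."""
    return FLANK_5 + "CA" * n_ca + "TA" + "CA" * 6 + SPACER + "A" * poly_a + TAIL


def call_from_length(length_bp: int) -> str:
    """Length-based call: complete dinucleotide units, then '.' and leftover base pairs."""
    units, extra = divmod(length_bp - FLANK_LEN, 2)
    return f"{units}.{extra}" if extra else str(units)


for label, seq in [("(CA)14TA(CA)6", sps115_amplicon(14)),
                   ("(CA)20TA(CA)6", sps115_amplicon(20)),
                   ("(CA)20TA(CA)6 + extra A", sps115_amplicon(20, poly_a=10)),
                   ("(CA)21TA(CA)6", sps115_amplicon(21))]:
    print(label, len(seq), call_from_length(len(seq)))
```

Under these assumptions the four constructs measure 247, 259, 260 and 261 bp and are called 21, 27, 27.1 and 28. The first and last lengths coincide with the SPS115 size range in Table 1.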

Discussion

Dimeric STRs are used for bovine genetic identity testing. Because dimeric STRs are more susceptible than trimeric or tetrameric STRs to slippage during PCR amplification, which generates stutter bands, the human forensic community chose STRs with longer repeat motifs, predominantly tetrameric STRs. In contrast, dimeric STRs were historically selected for parentage verification and identity testing in cattle because tetranucleotide loci were not initially available. There is less need to switch to tetrameric STRs in cattle because of (i) the extensive global databases of cattle genotypes based on dimeric STRs; (ii) the optimization of amplification conditions to reduce stutter at these loci; (iii) efforts to implement single nucleotide polymorphisms (SNPs) in cattle genotyping; and (iv) the use of Phusion DNA polymerase in the bovine STR panels, whose proofreading activity reduces the non-template adenylation (+A) of PCR products observed with other DNA polymerases. The current bovine STR loci are therefore likely to remain in use for identity testing for the foreseeable future.

The preferred method for naming STR alleles in the human forensic DNA community is based on the number of repeats (Gill et al. 1994, 1997; Bär et al. 1997; Schneider et al. 1998) and is applied operationally by comparison with an allelic ladder. The motif structure of the dinucleotide STR loci used in bovine genetic identity testing had not previously been described. Such a description is necessary both for a consistent nomenclature and for generating a community-wide allelic ladder. Here, the repeat structure of the bovine STRs has been described by sequence analysis of a sample of the alleles at each locus.

Alleles at each STR locus were sequenced directly from the purified PCR products of homozygous animals; no alleles from heterozygous profiles were used. Of the 16 investigated loci, 11 showed a simple repeat structure, whereas compound repeat structures were present at five loci: ETH3, ETH225, SPS115, TGLA53 and TGLA122 (Figs 2, 3 and S1). None of the 16 STRs showed complex repeat sequences. There were a few inconsistencies with the GenBank sequences, but these lay predominantly in the flanking regions and did not affect the repeat structure. The inconsistencies are primarily attributable to the sequencing methods used for the original submissions, approximately 15 years ago. Allele designation was straightforward in almost all cases, as only one intermediate allele (SPS115 allele 27.1, a single-nucleotide variant) was observed in our dataset of 9738 individuals. Given this low number of intermediate alleles in a large dataset covering 22 breeds, the number of intermediate alleles not yet identified should also be low.

Table 3 provides a conversion from the amplicon length-based allele nomenclature for cattle that is recommended internationally within ISAG. One ETH225 allele has been discussed extensively within ISAG, being called either allele 158 or 160 depending on the analytical conditions used. Our sequencing results show that this allele has the repeat-based designation 28.

Table 3. Conversion of ISAG nomenclature to repeat-number nomenclature.
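Repeat number (rows) and corresponding ISAG allele designation at each locus (columns); –, no allele listed.
Repeat BM1818   BM1824   BM2113   CSRM60   CSSM66   ETH3     ETH10    ETH225   HAUT27   ILSTS006 INRA023  SPS115   TGLA53   TGLA122  TGLA126  TGLA227
11     –        –        –        –        –        –        –        –        –        –        –        –        –        –        –        75
12     –        178      121      –        –        –        –        –        –        –        –        –        –        –        –        77
13     256      180      123      –        –        –        –        –        –        –        198      –        –        –        –        79
14     258      182      125      88       –        –        209      –        140      –        200      –        –        137      109      81
15     260      184      127      90       179      103      –        –        142      –        202      –        –        139      111      83
16     262      –        129      92       181      –        213      –        144      284      204      –        –        141      113      85
17     264      188      131      94       183      –        215      –        146      286      206      –        –        143      115      87
18     266      190      133      96       185      109      217      –        148      288      208      –        –        145      117      89
19     268      –        135      98       187      –        219      140      150      290      210      –        154      147      119      91
20     270      –        137      100      189      –        221      142      152      292      212      –        156      149      121      93
21     272      –        139      102      191      115      223      144      154      294      214      248      158      151      123      95
22     –        –        141      104      193      117      225      146      156      296      216      250      160      153      125      97
23     –        –        143      106      195      119      –        148      158      298      218      252      162      155      –        99
24     –        –        –        108      197      121      –        150      –        300      220      254      164      157      –        101
25     280      –        –        –        199      123      –        152      –        302      222      256      166      159      –        103
26     –        –        –        –        201      125      –        –        –        –        –        258      168      161      –        –
27     –        –        –        114      –        127      –        –        –        –        –        260      170      163      –        –
27.1   –        –        –        –        –        –        –        –        –        –        –        261      –        –        –        –
28     –        –        –        –        205      129      –        *        –        –        –        262      172      165      –        –
29     –        –        –        –        –        131      –        –        –        –        –        –        174      167      –        –
30     –        –        –        –        –        –        –        –        –        –        –        –        176      169      –        –
31     –        –        –        –        –        –        –        –        –        –        –        –        178      171      –        –
32     –        –        –        –        –        –        –        –        –        –        –        –        180      173      –        –
33     –        –        –        –        –        –        –        –        –        –        –        –        182      175      –        –
34     –        –        –        –        –        –        –        –        –        –        –        –        184      177      –        –
35     –        –        –        –        –        –        –        –        –        –        –        –        186      179      –        –
36     –        –        –        –        –        –        –        –        –        –        –        –        188      181      –        –
37     –        –        –        –        –        –        –        –        –        –        –        –        190      183      –        –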
ISAG, International Society for Animal Genetics.
* No agreement exists within ISAG on the designation of this allele; it is called 158 or 160 depending on the electrophoretic conditions used.
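Within each locus, the ISAG designations in Table 3 increase by 2 per repeat unit, so each column can be summarized by a constant offset (ISAG designation = 2 × repeat number + offset). The offsets below were read off the table rows and are not official ISAG constants. The ETH225 special case follows the table footnote, under the assumption that an ETH225 allele reported as 158 or 160 is the disputed allele. A minimal sketch of the resulting conversion:

```python
# Offsets read off Table 3: ISAG designation = 2 * repeat number + offset.
# Derived from the table rows here; not an official ISAG constant.
ISAG_OFFSET = {
    "BM1818": 230, "BM1824": 154, "BM2113": 97, "CSRM60": 60,
    "CSSM66": 149, "ETH3": 73, "ETH10": 181, "ETH225": 102,
    "HAUT27": 112, "ILSTS006": 252, "INRA023": 172, "SPS115": 206,
    "TGLA53": 116, "TGLA122": 109, "TGLA126": 81, "TGLA227": 53,
}


def isag_to_repeat(locus: str, isag: int) -> str:
    """Convert an ISAG (length-based) designation to the repeat-based name."""
    if locus == "ETH225" and isag in (158, 160):
        # Disputed allele (footnote to Table 3): sequencing places it at 28
        # whichever ISAG name a laboratory uses.
        return "28"
    units, extra = divmod(isag - ISAG_OFFSET[locus], 2)
    # An odd residual marks an intermediate allele, e.g. SPS115 261 -> 27.1.
    return f"{units}.{extra}" if extra else str(units)


for locus, isag in [("BM2113", 135), ("SPS115", 261), ("TGLA227", 75), ("ETH225", 160)]:
    print(locus, isag, isag_to_repeat(locus, isag))
```

For example, SPS115 allele 261 converts to 27.1 and TGLA227 allele 75 to 11, matching the corresponding table rows. Listing one offset per locus keeps the conversion as compact as the table itself, while the disputed ETH225 allele is handled explicitly rather than by arithmetic.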

In the human forensic community, the ISFG recommendations (Gill et al. 1994, 1997; Bär et al. 1997; Schneider et al. 1998) include the use of standardized allelic ladders to designate the alleles in samples. Unfortunately, no allelic ladders for the 16 bovine STRs are currently commercially available, so laboratories performing STR typing have to base their allele calls solely on an internal size standard and one control sample. However, the allele sequences reported here should be useful for future efforts to develop an allelic ladder for all 16 loci.
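Without an allelic ladder, a repeat-based call has to be anchored on the control sample. The sketch below illustrates that idea and is not a validated procedure. It assumes that any run-specific mobility shift affects the unknown and the control fragment equally, that the locus is a dinucleotide repeat, and that calls within a hypothetical ±0.3 bp window of a whole base-pair difference are acceptable. The class, function, window and example sizes are all illustrative.

```python
from dataclasses import dataclass


@dataclass
class ControlAllele:
    """A control-sample allele of known repeat designation, sized in the same run."""
    measured_bp: float
    repeat_units: int


def call_allele(measured_bp: float, control: ControlAllele,
                unit_bp: int = 2, tolerance_bp: float = 0.3) -> str:
    """Call an unknown allele relative to a control allele of the same locus.

    Assumes any mobility shift in this run affects both fragments equally, so the
    size difference maps directly to repeat units. tolerance_bp is a hypothetical
    acceptance window around whole base-pair differences.
    """
    diff_bp = measured_bp - control.measured_bp
    whole_bp = round(diff_bp)
    if abs(diff_bp - whole_bp) > tolerance_bp:
        raise ValueError(f"{measured_bp} bp is outside the acceptance window")
    # divmod floors, so a fragment 3 bp shorter than the control is -2 units + 1 bp.
    units, extra = divmod(whole_bp, unit_bp)
    designation = control.repeat_units + units
    return f"{designation}.{extra}" if extra else str(designation)


control = ControlAllele(measured_bp=141.2, repeat_units=19)  # hypothetical sizing
for size in (145.3, 141.1, 138.1):
    print(size, call_allele(size, control))
```

With the hypothetical control above, fragments of 145.3, 141.1 and 138.1 bp are called 21, 19 and 17.1. A fragment halfway between two whole base-pair positions is rejected rather than forced into a bin, which is where a true allelic ladder would be more robust.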
