A diagnostic genetic test for the physical mapping of germline rearrangements in the susceptibility breast cancer genes BRCA1 and BRCA2†
Communicated by David E. Goldgar
Abstract
The BRCA1 and BRCA2 genes are involved in breast and ovarian cancer susceptibility. About 2 to 4% of breast cancer patients with a positive family history who are negative for point mutations can be expected to carry large rearrangements in one of these two genes. We developed a novel diagnostic genetic test for the physical mapping of large rearrangements, based on molecular combing (MC), a FISH-based technique for direct visualization of single DNA molecules at high resolution. We designed specific Genomic Morse Codes (GMCs), covering the exons, the noncoding regions, and large genomic portions flanking both genes. We validated our approach by testing 10 index cases with a positive family history of breast cancer and 50 negative controls. Large rearrangements, corresponding to deletions and duplications with sizes ranging from 3 to 40 kb, were detected and characterized on both genes, including four novel mutations. The nature of all the identified mutations was confirmed by high-resolution array comparative genomic hybridization (aCGH), and the breakpoints were characterized by sequencing. The developed GMCs made it possible to localize several tandem repeat duplications on both genes. We propose the developed genetic test as a valuable tool to screen for large rearrangements in BRCA1 and BRCA2, to be combined in clinical settings with an assay capable of detecting small mutations. Hum Mutat 33:998–1009, 2012. © 2012 Wiley Periodicals, Inc.
Introduction
Breast cancer is the most common malignancy in women, affecting approximately 10% of the female population. Incidence rates have increased markedly over the past 50 years, and it is estimated that about 1.4 million women will be diagnosed with breast cancer annually worldwide and about 460,000 will die from the disease [Jemal et al., 2011]. Germline mutations in the hereditary breast and ovarian cancer susceptibility genes BRCA1 (MIM# 113705) and BRCA2 (MIM# 600185) are highly penetrant and account for 5–10% of all breast or ovarian cancer cases [King et al., 2003; Nathanson et al., 2001]. A mutation in one of these two genes confers a 10–20 times increased relative risk of developing breast cancer, translating into a 70–80% risk of developing breast cancer by age 70 [King et al., 2003]. Screening is important for genetic counseling of individuals with a positive family history and for early diagnosis or prevention in mutation carriers. The most common mutations are small, consisting of point mutations, nonsense/frameshifts (small insertions or deletions), missense mutations in conserved domains, or splice-site mutations resulting in aberrant transcript processing [Szabo et al., 2000]. Mutations also include large rearrangements, such as deletions and duplications of large genomic regions, that escape detection by traditional polymerase chain reaction (PCR)-based mutation screening combined with DNA sequencing [Mazoyer, 2005; Sluiter and van Rensburg, 2011]. It is estimated that between 10 and 15% of hereditary breast and ovarian cancer cases are attributable to large rearrangements, mainly in BRCA1 but also in BRCA2. As a consequence, the screening of large rearrangements in both genes has become mandatory, and should always be performed in combination with the screening of point mutations [Puget et al., 1999b; Walsh et al., 2006].
Techniques adapted to detect large rearrangements for routine prescreening and predictive purposes are quantitative multiplex PCR of short fluorescent fragments (QMPSF) [Hofmann et al., 2002], real-time PCR [Barrois et al., 2004], fluorescent DNA microarray assays [Frolov et al., 2002], multiplex ligation-dependent probe amplification (MLPA) [Casilli et al., 2002; Hofmann et al., 2002], qPCR-HRM [Rouleau et al., 2009], and EMMA (enhanced mismatch mutation analysis). However, these routine techniques provide limited information to characterize the mutations. Techniques capable of detecting and characterizing large rearrangements in diagnostic settings include high-resolution oligonucleotide array comparative genomic hybridization (aCGH) [Rouleau et al., 2007; Staaf et al., 2008], followed by PCR and sequencing for the exact characterization of the breakpoints. Of note, massively parallel sequencing combined with genomic capture has recently been proposed for simultaneous detection of small mutations and large rearrangements of 21 genes involved in breast and ovarian cancer [Walsh et al., 2010]. However, the limitation of NGS (next-generation sequencing) for the detection of large rearrangements has recently been highlighted, owing to the high proportion of hard-to-sequence repetitive sequences in both BRCA1 and BRCA2 [De Leeneer et al., 2011]. Therefore, there is a clear need for alternative technologies, combined with the existing ones, capable of efficiently detecting the full spectrum of large rearrangements, including the often problematic tandem repeat duplications.
Molecular combing (MC) is a powerful FISH-based technique for direct visualization of single DNA molecules that are stretched and attached, uniformly and irreversibly, to specially treated glass surfaces [Herrick and Bensimon, 2009; Schurra and Bensimon, 2009]. This technology considerably improves the structural and functional analysis of DNA across the genome and is capable of visualizing multiple genomic regions at high resolution (in the kb range) in a single analysis. MC is particularly suited to the detection of structural variations such as copy number variations (CNVs), translocations, inversions, and loss of heterozygosity (LOH) [Caburet et al., 2005], thus extending the spectrum of mutations potentially detectable in breast cancer genes. Of note, MC has recently been employed in clinical settings to detect and measure the contraction of the D4Z4 repeat array, associated with facioscapulohumeral dystrophy (FSHD), one of the most common hereditary neuromuscular disorders. The MC-based test enabled the accurate diagnosis of 32 FSHD patients and is becoming the reference routine diagnostic method, replacing the techniques employed so far [Nguyen et al., 2011].
MC has already been employed to detect large rearrangements in BRCA1 [Gad et al., 2001, 2002a, 2003] and BRCA2 [Gad et al., 2002b], using a first-generation low resolution “color bar coding” screening approach. The originally employed DNA probes (cosmids, PACs and long-range PCR products) also encompassed repetitive sequences particularly abundant at the two loci [Gad et al., 2001, 2002b]. This resulted in the superposition of individual colored signals after probe detection and in strong background noise, undermining the quality of the images and preventing robust measurement of the derived signals.
Here, we describe a substantial technical improvement of the original approach, based notably on the design of second-generation high-resolution BRCA1 and BRCA2 Genomic Morse Codes (GMCs). A GMC is a series of “dots or dashes” (corresponding to DNA probes with specific sizes and colors) and “gaps” (corresponding to uncolored gaps with specific sizes located between the DNA probes), designed to physically map a particular genomic region and define it with a specific “signature” [Lebofsky et al., 2006]. For the BRCA1 and BRCA2 GMCs, the majority of the repetitive sequences were eliminated from the DNA probes, thus reducing background noise and permitting robust measurement of the color signal lengths within the GMC. Both GMCs were statistically validated on samples from 50 controls and then tested on 10 patients (index cases) with a positive family history of breast cancer. Large rearrangements were detected on both genes, and the nature of all the identified mutations was confirmed by high-resolution aCGH. Four new large rearrangements in BRCA1 and BRCA2 were characterized, demonstrating the robustness of our approach, even for the detection and characterization of hard-to-detect mutations, such as tandem repeat duplications or mutations located in genomic regions rich in repetitive elements (e.g., the 5′ region of BRCA1). The developed MC-based platform permits simultaneous detection of large rearrangements in BRCA1 and BRCA2, and will be part of a novel diagnostic genetic test for breast and ovarian cancer.
Materials and Methods
Preliminary Patient Screening
The developed GMCs were validated on samples from 50 negative controls with no deleterious mutations detected in BRCA1 or BRCA2. The genetic test was validated on 10 samples from patients (index cases) with a positive family history of breast cancer and known to bear large rearrangements affecting either BRCA1 or BRCA2. Total human genomic DNA was obtained from PBMCs (peripheral blood mononuclear cells) or EBV-immortalized lymphoblastoid cell lines. Preliminary screening for large rearrangements was performed with the QMPSF assay (quantitative multiplex PCR of short fluorescent fragments) under the conditions described by Casilli et al. and Tournier et al. [Casilli et al., 2002; Tournier et al., 2004] or by means of MLPA (multiplex ligation-dependent probe amplification) using the SALSA MLPA kits P002 (MRC-Holland, Amsterdam, The Netherlands) for BRCA1 and P045 (MRC-Holland) for BRCA2. All 60 screened individuals gave their written consent for BRCA1 and BRCA2 analysis.
Molecular Combing Procedures
A detailed description of operation procedures is included as Supplementary Material and Methods. Briefly, EBV-immortalized lymphoblastoid cell lines or PBMCs were embedded in agarose plugs. DNA was purified by proteinase K and sarkosyl treatment overnight. Agarose was melted and digested by an overnight beta-agarase treatment. Purified DNA was diluted in MES buffer and combed on coverslips. Probes were initially subcloned by PCR in plasmids, which were used as templates for probe labeling by random priming. Hybridization was performed overnight with biotin-, digoxigenin-, or Alexa Fluor 488-labeled probes, detected using fluorophore-coupled antibody or streptavidin layers. The entire coverslip was scanned by an automated fluorescence microscope. Image analysis and signal measurement were performed using software developed in-house, and statistical analysis was performed as described in the Supp. Materials and Methods. As we are developing a series of MC-based assays for CLIA laboratories, we have developed adequate QA/QC procedures (Supp. Fig. S1).
Dedicated Zoom-In CGH Array
A detailed description of operation procedures is included as Supp. Material and Methods. To confirm the large rearrangements detected by MC, a zoom-in CGH array was used as previously described [Rouleau et al., 2007]. For the interpretation of the oligonucleotide signals, the following thresholds were chosen: a region was considered deleted if the log2 ratio was <−0.4 and duplicated if it was >0.4. The analytical approach used for zoom-in CGH arrays has been described elsewhere [Rouleau et al., 2007].
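The thresholding rule above can be illustrated with a minimal Python sketch. The function name `call_oligo` and the example log2 ratios are hypothetical and are not taken from the study; only the ±0.4 cut-offs come from the text, and the full analytical approach of Rouleau et al. is not reproduced here.

```python
def call_oligo(log2_ratio, threshold=0.4):
    """Classify one oligonucleotide signal from its log2 ratio.

    Uses the cut-offs stated in the text: < -0.4 is called deleted,
    > +0.4 is called duplicated, anything in between is left as normal.
    """
    if log2_ratio < -threshold:
        return "deleted"
    if log2_ratio > threshold:
        return "duplicated"
    return "normal"


# Hypothetical log2 ratios for five consecutive oligonucleotides.
ratios = [0.05, -0.9, -1.1, 0.02, 0.6]
calls = [call_oligo(r) for r in ratios]
print(calls)
```

Values lying exactly on a threshold (−0.4 or 0.4) are treated as normal in this sketch, because the text states strict inequalities.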
Breakpoints Mapping
The breakpoints were characterized by classical PCR amplification followed by sequencing. The PCR primers were selected in the genomic region surrounding the breakpoints and designed using the Oligo 6 software (Molecular Biology Insights). We systematically chose two sets of primers (available on request) to nest the PCR. The PCR products were analyzed on agarose gel and then purified and sequenced in both directions using each PCR primer with the BigDye Terminator Cycle Sequencing Reaction kit (Applied Biosystems) and an ABI Prism 3030 automated sequencer (Applied Biosystems, Foster City, CA).
Nomenclature
The present version of BRCA1 and BRCA2 is based on GenBank reference sequences NM_007294.2 (mRNA: U14680) and NM_000059.3 (mRNA: U43746), respectively. For the BRCA1 gene, all rearrangements were described in the same orientation as the BRCA1 gene, in the telomeric to centromeric sense. Thus, the 5′ breakpoint had a smaller genomic position than the 3′ breakpoint. The nomenclature used was the genomic nomenclature based on the HGVS recommendations and build36/hg18 for chromosome 17 (NC_000017.9). The first nucleotide of the ATG translation start site is +1. All mutations have been submitted to the UMD-BRCA1 and BRCA2 databases, which are fully and freely accessible [Caputo et al., 2012].
Results
Design of High-Resolution BRCA1 and BRCA2 GMCs
We have designed high-resolution GMCs covering the exons, the noncoding regions and large genomic portions flanking both genes. Importantly, all repetitive sequences were removed from the DNA probes. We identified 38 genomic regions in the BRCA1 locus and 32 regions in the BRCA2 locus that were devoid of repetitive sequences, and that were used to design and clone DNA hybridization probes compatible with the visualization process associated with MC (Supp. Figs. S2 and S3). The name, size, and color of the DNA hybridization probes, and the exons covered by the probes, are listed in Supp. Tables S1 (BRCA1) and S2 (BRCA2). Adjacent DNA probes of the same color form a signal. Thus, a GMC is composed of a series of colored signals distributed along a specific portion of the genomic DNA. Colors were chosen to create unique nonrepetitive sequences of signals, which differed between BRCA1 and BRCA2. To facilitate GMC recognition and measurement, signals located on the genes were grouped together in specific patterns called “motifs.” An electronic reconstruction of the designed BRCA1 and BRCA2 GMCs is shown in Figure 1. The BRCA1 GMC covers a region of 200 kb, including the upstream genes NBR1, NBR2, LOC100133166, and TMEM106A, as well as the pseudogene ΨBRCA1. The complete BRCA1 GMC is composed of 18 signals (S1B1–S18B1), and the 8 BRCA1-specific signals are grouped together in 7 motifs (g1b1–g7b1) (Fig. 1A and B). The BRCA2 GMC covers a genomic region of 172 kb composed of 14 signals (S1B2–S14B2), and the 7 BRCA2-specific signals are grouped together in 5 motifs (g1b2–g5b2) (Fig. 1C and D). Deletions or insertions, if present, are detected in the genomic regions covered by the motifs.
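The rule that adjacent probes of the same color form a signal can be sketched in a few lines of Python. The probe names, colors, and sizes below are hypothetical placeholders (the real values are in Supp. Tables S1 and S2), and the helper `probes_to_signals` is an illustration of the grouping rule only, not the in-house software.

```python
from itertools import groupby

# Hypothetical probes in genomic order: (name, color, size in kb).
probes = [
    ("P1", "red", 3.0),
    ("P2", "red", 2.5),
    ("P3", "blue", 4.0),
    ("P4", "green", 1.5),
    ("P5", "green", 2.0),
    ("P6", "green", 1.0),
]


def probes_to_signals(ordered_probes):
    """Merge runs of adjacent same-color probes into signals.

    The reported length is the sum of probe sizes only; the uncolored
    gaps between probes are ignored in this simplified sketch.
    """
    signals = []
    for color, run in groupby(ordered_probes, key=lambda p: p[1]):
        run = list(run)
        signals.append({
            "color": color,
            "probes": [name for name, _, _ in run],
            "probe_kb": sum(size for _, _, size in run),
        })
    return signals


signals = probes_to_signals(probes)
for s in signals:
    print(s["color"], s["probes"], s["probe_kb"])
```

With these placeholder probes the six probes collapse into three signals (red, blue, green), mirroring how a signal in Figure 1 can contain from one to several probe bars.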

Figure 1. In silico-generated GMCs for high-resolution physical mapping of the BRCA1 and BRCA2 genomic regions. A: The complete BRCA1 GMC covers a genomic region of 200 kb and is composed of 18 signals (S1B1–S18B1) of a distinct color (green, red, or blue). Each signal is composed of one (e.g., S2B1) to three small horizontal bars (e.g., S15B1), each bar corresponding to a single DNA probe. The region encoding the BRCA1 gene (81.2 kb) is composed of 7 “motifs” (g1b1–g7b1). Each motif is composed of one to three small horizontal bars and a black “gap” (no signal). B: Zoom-in on the BRCA1 gene-specific signals and relative positions of the exons. C: The complete BRCA2 GMC covers a genomic region of 172 kb and is composed of 14 signals (S1B2–S14B2) of a distinct color (green, red, or blue). Each signal is composed of one (e.g., S14B2) to five small horizontal bars (e.g., S1B2). The region encoding the BRCA2 gene (84.2 kb) is composed of five motifs (g1b2–g5b2). Each motif is composed of two to four small horizontal bars and a black gap. D: Zoom-in on the BRCA2 gene-specific signals and relative positions of the exons. Deletions or insertions, if present, will appear in the region covered by the motifs.
Validation of BRCA1 and BRCA2 GMC Signals in Negative Controls
The newly designed GMCs were first validated on genomic DNA isolated from 50 negative controls. Typical visualized signals and measured motif lengths for one negative control are reported in Supp. Figure S4 and Supp. Table S3. Importantly, we never observed a “sequencing gap” in the BRCA1 locus [Staaf et al., 2008] (LOC100133166 was always located upstream and directly attached to NBR1), confirming the validity of the BRCA1 sequence proposed in the GRCh37/hg19 genome assembly and underlining the utility of MC as a tool for physical mapping [Herrick and Bensimon, 2009; Schurra and Bensimon, 2009]. For BRCA1, we obtained delta values (difference between the measured mean µ and the calculated motif length) in the range of −0.2 to +0.6 kb, whereas BRCA2 delta values were in the range of −0.1 to +0.2 kb, underlining the precision of the developed measurement approach.
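As a rough illustration of the delta value, the following Python sketch subtracts the calculated motif length from the measured mean and applies the 2 kb cut-off used for large rearrangements elsewhere in this article. The function name `motif_delta` and the returned labels are assumptions made for this example; the actual statistical procedure is described in the Supp. Materials and Methods and is not reproduced here.

```python
def motif_delta(measured_mean_kb, calculated_kb, threshold_kb=2.0):
    """Return (delta, call) for one motif.

    delta = measured mean length - calculated length (both in kb).
    A delta of at least +2 kb suggests a gain, at most -2 kb a loss.
    """
    delta = measured_mean_kb - calculated_kb
    if delta >= threshold_kb:
        call = "duplication"
    elif delta <= -threshold_kb:
        call = "deletion"
    else:
        call = "no large rearrangement"
    # Round only for display; the comparison above uses the raw value.
    return round(delta, 1), call


# Motif g4b2 (calculated 21.1 kb) with the mixed-allele mean reported
# in the Figure 2A legend (28.3 kb).
print(motif_delta(28.3, 21.1))

# A hypothetical control-like measurement, close to the calculated length.
print(motif_delta(21.3, 21.1))
```

The first call reproduces the delta of 7.2 kb quoted for case BR09, while the second stays within the narrow delta range observed in negative controls.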
Within the 50 negative control samples, 1 false-positive sample was found (Supp. Table S3, control no. 27). This sample is considered negative, since MLPA and aCGH analyses were also performed on it without finding any mutation. The low number and low quality of the images obtained after MC analysis of this sample suggested that the measurement of motif g2b2 on BRCA2 could not be performed reliably (Supp. Fig. S5A). Such an event is infrequent and can be checked by repeating the MC assay. Indeed, a second MC analysis performed on control sample no. 27 confirmed the absence of large rearrangements (Supp. Fig. S5B). Thus, based on the analysis of 50 control samples, test specificity was 98% for rearrangements larger than 2 kb (Supp. Table S4). A larger set of negative samples may be necessary to consolidate this estimate. Calculations were performed according to standard mathematical formulas employed for diagnostic tests [Altman and Bland, 1994].
Characterization of Four Novel BRCA1 and BRCA2 Large Rearrangements in Familial Breast Cancer Patients
MC was then applied to 10 samples from patients with a severe family history of breast cancer and known to bear large rearrangements in either BRCA1 or BRCA2. Importantly, the MC analysis was a blind test: for each patient, the identity of the mutation was unknown to the operator and was revealed only after the test had been completed on all samples. The BRCA1 and BRCA2 GMCs were measured in all 10 samples, and the size of all measured motifs, including the associated statistical analysis, is reported in Supp. Table S5 (see Supp. Materials and Methods for details). Ten different large rearrangements were identified, of which four were novel and six had already been described in the literature (see Table 1). Importantly, as mutations were detected and characterized in all analyzed samples, no false-negative samples were found. Thus, within this limited set of samples, sensitivity was 100% for rearrangements larger than 2 kb (Supp. Table S4). A larger set of positive samples may be necessary to consolidate this estimate. MC analysis of the four novel large rearrangements is reported in Figure 2. As the identified rearrangements have never been described before in the literature, they were also characterized by zoom-in high-resolution CGH array, and all four breakpoints were determined by sequencing.
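The specificity and sensitivity figures follow from the standard definitions for diagnostic tests. A minimal Python sketch, using the counts reported in the text (49 true negatives and 1 false positive among 50 controls; 10 true positives and no false negatives among 10 index cases), is given below; the function names are chosen for this example.

```python
def specificity(true_neg, false_pos):
    """Proportion of truly negative samples that test negative."""
    return true_neg / (true_neg + false_pos)


def sensitivity(true_pos, false_neg):
    """Proportion of truly positive samples that test positive."""
    return true_pos / (true_pos + false_neg)


# Counts taken from the validation described in the text.
print(f"specificity = {specificity(49, 1):.0%}")   # 98%
print(f"sensitivity = {sensitivity(10, 0):.0%}")   # 100%
```

With only 50 controls and 10 positive cases, these point estimates carry wide uncertainty, which is why the text calls for larger sample sets.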

Figure 2. Four novel BRCA1 and BRCA2 large rearrangements detected in breast cancer patients. A: Dup ex 17–20 on BRCA2 (case BR09), visible as a tandem repeat duplication of the red signal S6B2. To confirm the presence of the mutation, the motif g4b2 (21.1 kb) was first measured on a mixed population of 40 images, comprising wt and mt alleles, and the following values were obtained: µ (BRCA2wt + BRCA2mt) = 28.3 ± 7.3 kb and delta = 7.2 kb (≥ 2 kb). To measure the mutation size, the images were then divided into two groups, 21 were BRCA2wt and 19 were BRCA2mt: µ (BRCA2wt) = 21.7 ± 2.3 kb, µ (BRCA2mt) = 35.5 ± 2.1 kb, mutation size = µ (BRCA2mt) − µ (BRCA2wt) = 13.8 ± 2.0 kb. B: Dup ex 5–7 on BRCA1 (case BR08), visible as a tandem repeat duplication of the blue signal S8B1. The motif g5b1 (19.7 kb) was first measured on 29 images, yielding the following values: µ (BRCA1wt + BRCA1mt) = 23 ± 6.6 kb, delta = 3.3 kb (≥ 2 kb). The images were then divided into two groups, 16 images were BRCA1wt and 13 images were BRCA1mt: µ (BRCA1wt) = 17.6 ± 1.9 kb, µ (BRCA1mt) = 29.6 ± 3.0 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = 12 ± 2.6 kb. C: Del ex 3 on BRCA1 (case BR07), visible as a deletion of the blue signal S9B1 and the downstream genomic region (motif g6b1). The motif g6b1 (9.3 kb) was first measured on a mixed population of 25 images, yielding the following values: µ (BRCA1wt + BRCA1mt) = 3.6 ± 4.6 kb, delta = −5.7 kb (≤ −2 kb). The images were divided into two groups, 10 images were BRCA1wt and 15 images were BRCA1mt: µ (BRCA1wt) = 9.1 ± 2.1 kb, µ (BRCA1mt) = 0 ± 0 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = −9.1 ± 1.3 kb. D: Del ex 24 on BRCA1 (case BR10), visible as a deletion of the genomic region between the two red signals S2B1 and S3B1, including a portion of S2B1 (motif g1b1). The deleted region encodes exon 24. The motif g1b1 (8.5 kb) was first measured on a mixed population of 35 images, yielding the following values: µ (BRCA1wt + BRCA1mt) = 5.8 ± 3.4 kb, delta = −2.7 kb (≤ −2 kb). The images were then divided into two groups, 20 images were BRCA1wt and 15 images were BRCA1mt: µ (BRCA1wt) = 9.0 ± 0.9 kb, µ (BRCA1mt) = 3.2 ± 0.3 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = −5.8 ± 0.6 kb. The GMC signals obtained after microscopic visualization are shown in the top panels. The zoom-in on the BRCA1 or BRCA2 gene-specific signals and the relative positions of the mutated exons are shown in the middle panels. The aCGH profiles including the mutation size interval and the identified breakpoints are shown in the lower panels. mt = mutated allele, wt = wild-type allele.
Sample | Gene | MLPA status | Molecular combing | Zoom-in aCGH | Breakpoints (bp) | Mechanism | Mutation name (HGVS) | Reference |
---|---|---|---|---|---|---|---|---|
Novel mutations | | | | | | | | |
BR07 | BRCA1 | Del ex 3 | 9.1 ± 1.3 kb/Del ex 3 | 8.3–12.2 kb/Del ex 3 | 38,526,903–38,515,491 (11,413 bp) | AluY–AluY | c.80+2657_135-3415del | Novel |
BR10 | BRCA1 | Del ex 24 | 5.8 ± 0.6 kb/Del ex 24 | 5.5–5.9 kb/Del ex 24 | 38,447,102–38,452,877 (5,776 bp) | AluJo–Non-Alu | c.5467+309_1383*+2736delinsAC | Novel |
BR08 | BRCA1 | Dup ex 5–7 | 12 ± 2.6 kb/Dup ex 5–7 | 8.4–9.4 kb/Dup ex 5–7 | 38,506,535–38,514,662 (8,128 bp) | AluJb–AluSx | c.135-2586_442-1112dup | Novel |
BR09 | BRCA2 | Dup ex 17–20 | 13.8 ± 2.0 kb/Dup ex 17–20 | 8.3–12 kb/Dup ex 17–20 | 31,831,434–31,846,221 (14,788 bp) | Non-Alu–MLT1C-int | c.7805+1368_8633-d2586dup | Novel |
Known mutations | | | | | | | | |
BR01 | BRCA1 | Dup ex 13 | 6.1 ± 1.6 kb/Dup ex 13 | 5–7.9 kb/Dup ex 13* | 38,483,825–38,489,905 (6,080 bp) | AluSx–AluSx | c.4186-1785_4358-1667dup | Puget et al. (1999) |
BR02 | BRCA1 | Del ex 2 | 40.8 ± 3.5 kb/Del ex 2 | 40.4–58.1 kb/Del ex 2* | 38,525,493–38,562,427 (36,934 bp) | Pseudogene-gene | c.-232-u31400_80+4067del | Puget et al. (2002) |
BR03 | BRCA1 | Del ex 2 | 39.0 ± 2.6 kb/Del ex 2 | 40.4–58.1 kb/Del ex 2* | 38,525,493–38,562,427 (36,934 bp) | Pseudogene-gene | c.-232-u31400_80+4067del | Puget et al. (2002) |
BR04 | BRCA1 | Dup ex 18–20 | 6.7 ± 1.2 kb/Dup ex 18–20 | 7.4–10.9 kb/Dup ex 18–20* | 38,461,760–38,470,417 (8,658 bp) | AluY–AluJb | c.5075-923_5277+835dup | Gad et al. (2002) |
BR05 | BRCA1 | Del ex 15 | 4.1 ± 1.2 kb/Del ex 15 | 1.9–5.0 kb/Del ex 15* | 38,478,177–38,481,175 (2,998 bp) | AluSx/g–AluSp | c.4484+858_4676-1395del | Puget et al. (1999b) |
BR06 | BRCA1 | Del ex 8–13 | 20.0 ± 2.8 kb/Del ex 8–13 | 20–24 kb/Del ex 8–13* | 38,483,557–38,507,323 (23,767 bp) | AluSx–AluSp | c.442-1900_4358-1400del | Puget et al. (1999b) |
- *These patients were previously characterized by high-resolution aCGH, and the reported values were originally described by Rouleau et al. (2007).
- Breakpoint positions relate to reference genome hg18 (Build 36, NC_000017.9).
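For the four novel rearrangements, the size given in parentheses in the Breakpoints column matches the number of positions spanned when both breakpoint coordinates are counted. The short Python sketch below checks this arithmetic; the inclusive-coordinate convention and the function name `span_bp` are assumptions of this example, and only the four novel rows are checked.

```python
def span_bp(pos_a, pos_b):
    """Number of base pairs between two coordinates, counting both ends."""
    return abs(pos_b - pos_a) + 1


# Breakpoint coordinates (hg18) of the four novel rearrangements,
# copied from the table above.
novel = {
    "BR07": (38_526_903, 38_515_491),
    "BR10": (38_447_102, 38_452_877),
    "BR08": (38_506_535, 38_514_662),
    "BR09": (31_831_434, 31_846_221),
}

sizes = {case: span_bp(a, b) for case, (a, b) in novel.items()}
for case, size in sizes.items():
    print(f"{case}: {size:,} bp")
```

Taking the absolute difference makes the helper independent of the order in which the two breakpoints are listed, which differs between rows of the table.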
Duplication from exon 17 to exon 20 of BRCA2
By visual inspection, the mutation appeared as a tandem duplication of the S6B2 red signal. After measurement, the mutation was estimated to have a size of 13.8 ± 2.0 kb related to the DNA probe BRCA2-15 and a portion of the uncovered gap between signals S6B2 and S7B2, encoding exons 17 to 20 (Fig. 2A, top panel). This rearrangement was recently reported by Gaudet et al. [Gaudet et al., 2010] but was not fully characterized. The aCGH array gave a size between 8.3 and 12 kb and the mapping of the breakpoints for this rearrangement indicated that the duplicated region was 14,788 bp long (Fig. 2A, bottom panel).
Duplication from exon 5 to exon 7 of BRCA1
By visual inspection, the mutation appeared as a tandem duplication of the S8B1 blue signal. After measurement, the mutation was estimated to have a size of 12 ± 2.6 kb, restricted to a portion of the BRCA1 gene that encodes exons 5 to 7 (Fig. 2B, top panel). The aCGH array gave a size between 8.4 and 9.4 kb and the mapping of the breakpoints indicated that the duplicated region was 8,128 bp long (Fig. 2B, bottom panel). Of note, the characterization of these two duplications indicates that the developed GMCs make it possible to unambiguously localize tandem repeats, providing information on the genomic location, the nature, and the size of the mutation in a single experiment.
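According to the Figure 2 legend, the MC size estimate is the difference between the mean motif length measured on mutated alleles and that measured on wild-type alleles. The Python sketch below illustrates that subtraction on hypothetical motif lengths; how images are assigned to the wt or mt group and how the ± uncertainty is derived are described in the Supp. Materials and Methods and are not modeled here.

```python
from statistics import mean


def mutation_size_kb(wt_lengths_kb, mt_lengths_kb):
    """Mean mutated-allele length minus mean wild-type length (kb).

    Positive values indicate a gain (e.g., duplication),
    negative values a loss (deletion).
    """
    return mean(mt_lengths_kb) - mean(wt_lengths_kb)


# Hypothetical motif lengths (kb), already classified per allele.
wt = [17.0, 18.0, 17.5, 18.5]
mt = [29.0, 30.0, 29.5, 30.5]

size = mutation_size_kb(wt, mt)
print(f"{size:.1f} kb")  # 12.0 kb
```

Swapping the two groups yields the same magnitude with a negative sign, which is how deletions are reported in the figure legends.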
Deletion of exon 3 in BRCA1
By visual inspection, the mutation appeared as a deletion of the S9B1 blue signal. After measurement, the mutation was estimated to have a size of 9.1 ± 1.3 kb, on a portion of the BRCA1 gene that encodes exon 3 (Fig. 2C, top panel). This variant had already been detected, but not characterized, by Rouleau et al. [Rouleau et al., 2007]. The aCGH array gave a size between 8.3 and 12.2 kb. Finally, the mapping of the breakpoints for this rearrangement indicated that the deleted region was 11,413 bp long (Fig. 2C, bottom panel). Such a large deletion in BRCA1 exon 3 has never been previously reported.
Deletion of exon 24 of BRCA1
By visual inspection, the mutation appeared as a deletion of the genomic region located between the S2B1 and S3B1 red signals, including a portion of the DNA probe BRCA1-3 on S2B1 (Fig. 2D, top panel). After measurement, the mutation was estimated to have a size of 5.8 ± 0.6 kb, including exon 24. The aCGH array gave a size between 5.5 and 5.9 kb. The mapping of the breakpoints for this rearrangement indicated that the deleted region was 5,776 bp long (Fig. 2D, bottom panel); a deletion of this size has not been reported so far.
Detection of Known BRCA1 Large Rearrangements in Breast Cancer Patients
All six known large rearrangements identified here have recently been characterized by aCGH and breakpoint sequencing [Rouleau et al., 2007]. The complete characterization by MC of three selected known BRCA1 large rearrangements is reported in Figure 3 and described below.

Figure 3. Known BRCA1 large rearrangements detected in breast cancer patients. A: Del ex 2 (case BR02), visible as a deletion of the green signal S10B1, as well as a large genomic portion of the 5′ region upstream of BRCA1, including S11B1 and S12B1. To confirm the presence of the deletion in the BRCA1 gene, the g7b1 (17.7 kb) motif was first measured on a mixed population of 20 images, yielding the following values: µ (BRCA1wt + BRCA1mt) = 12.3 ± 2.9 kb, delta = −5.4 kb (deletion is confirmed since delta ≤ −2 kb). To measure the mutation size within the BRCA1 gene, 11 images were then classified as BRCA1wt and 9 images as BRCA1mt, yielding the following values: µ (BRCA1wt) = 18.1 ± 0.7 kb, µ (BRCA1mt) = 8.1 ± 1.6 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = −10 ± 1.5 kb. To include the deleted genomic region upstream of BRCA1 and determine the whole mutation size, we had to measure the genomic region between the signals S8B1 and S14B1 (89.9 kb). The S8B1-S12B1 region was first measured on 19 images, yielding the following values: µ (BRCA1wt + BRCA1mt) = 62.3 ± 18.4 kb, delta = −27.6 kb. Eleven images were then classified as BRCA1wt and 8 images as BRCA1mt, yielding the following values: µ (BRCA1wt) = 92.2 ± 3.2 kb, µ (BRCA1mt) = 51.4 ± 2.2 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = −40.8 ± 3.5 kb. B: Del ex 8–13 (case BR06), visible as a deletion of the blue signal S7B1, including a large genomic portion between signals S7B1 and S8B1. The g4b1 (16.5 kb) and the g5b1 (19.7 kb) motifs were first measured on a mixed population of 23 images, yielding the following values. For g4b1: µ (BRCA1wt + BRCA1mt) = 17.5 ± 4.0 kb, delta = −2.2 kb (delta ≤ −2 kb); 13 images were then classified as BRCA1wt and 10 images as BRCA1mt: µ (BRCA1wt) = 20.8 ± 1.6 kb, µ (BRCA1mt) = 13.3 ± 1.1 kb, µ (BRCA1mt) − µ (BRCA1wt) = −7.5 ± 1.6 kb. For g5b1: µ (BRCA1wt + BRCA1mt) = 12.8 ± 5.5 kb, delta = −3.7 kb (delta ≤ −2 kb); 13 images were then classified as BRCA1wt and 10 images as BRCA1mt: µ (BRCA1wt) = 18.3 ± 1.3 kb, µ (BRCA1mt) = 5.8 ± 0.5 kb, µ (BRCA1mt) − µ (BRCA1wt) = −12.5 ± 1.0 kb. Total mutation size = mutation size g4b1 + mutation size g5b1 = −20 ± 2.8 kb. C: Dup ex 13 (case BR01), visible as a tandem repeat duplication of the blue signal S7B1. The g4b1 motif (16.5 kb) was first measured on a mixed population of 40 images, comprising wild-type and mutated alleles, and the following values were obtained: µ (BRCA1wt + BRCA1mt signals) = 19 ± 3.5 kb, delta = 2.5 kb (duplication is confirmed since delta ≥ 2 kb). The images were then divided into two groups: 21 images were classified as BRCA1wt, and 19 images were classified as BRCA1mt. The size was then calculated as the difference between the motif mean sizes of the two alleles: µ (BRCA1wt) = 16.1 ± 1.6 kb, µ (BRCA1mt) = 22.2 ± 2.0 kb, mutation size = µ (BRCA1mt) − µ (BRCA1wt) = 6.1 ± 1.6 kb. The bottom panel shows the MLPA fragment display (left) and the normalized MLPA results (right), arrows indicating exons interpreted as duplicated. mt, mutated allele; wt, wild-type allele.
Deletion of the upstream 5′ region to exon 2
By visual inspection, the mutation appeared as a deletion of the S10B1 green signal, as well as a large genomic portion of the 5′ region upstream of BRCA1, including S11B1 and S12B1 (Fig. 3A). After measurement, the mutation was estimated to have a size of 40.8 ± 3.5 kb, encompassing the portion of the BRCA1 gene that encodes exon 2, the entire NBR2 gene (signal S11B1), the genomic region between NBR2 and the pseudogene ΨBRCA1 (signal S12B1), and a portion of ΨBRCA1 (signal S13B1). Importantly, the reported size of the exon 2 deletion is highly variable, estimated to be in the range of 13.8–36.9 kb [Mazoyer, 2005]. Six different exon 1–2 deletions have been reported and there are 16 reports in the literature in various populations [Sluiter and van Rensburg, 2011]. The rearrangement reported here has already been described with an identical size (36,934 bp). The hotspot for recombination is explained by the presence of ΨBRCA1 [Puget et al., 2002]. MC proved capable of characterizing events even in this highly homologous region.
Deletion from exon 8 to exon 13
By visual inspection, the mutation appeared as a visible deletion of the S7B1 blue signal, including a large genomic portion between S7B1 and S8B1 signals (Fig. 3B). After measurement, the mutation was estimated to have a size of 20 ± 2.8 kb in a portion of the BRCA1 gene that encodes from exon 8 to exon 13. The size reported in the literature and here is 23,767 bp [Puget et al., 1999b], and this is a recurrent mutation in the French population [Mazoyer, 2005; Rouleau et al., 2007].
Duplication of exon 13
By visual inspection, this mutation appeared as a partial tandem duplication of the blue signal S7B1 (Fig. 3C, top panel). After measurement, the mutation was estimated to have a size of 6.1 ± 1.6 kb, restricted to a portion of the DNA probe BRCA1-8 that encodes exon 13. The estimated mutation size is fully in line with the 6,081 bp reported in the literature [Puget et al., 1999a], and according to the Breast Cancer Information Core database, this mutation is one of the 10 most frequent mutations in BRCA1 [Szabo et al., 2000]. The characterized patient was also analyzed by MLPA, and the duplication of exon 13 was confirmed. In addition, a duplication of exons 1A + 1B was found by MLPA in the same individual (Fig. 3C, bottom panel), but this mutation could not be detected by MC analysis (a duplication of exon 1, if present, would yield two distinct S10B1 signals), QMPSF, or dedicated CGH array (data not shown). Therefore, we consider the exon 1A + 1B mutation detected by MLPA to be a false-positive signal.
Discussion
We describe a novel diagnostic genetic test for the detection of large rearrangements and other structural variations in the BRCA1 and BRCA2 genes. Large rearrangements represent 10–15% of deleterious germline mutations in the BRCA1 gene and 1–7% in the BRCA2 gene [Mazoyer, 2005; Sluiter and van Rensburg, 2011]. We designed specific high-resolution GMCs [Lebofsky et al., 2006] and tested them on a series of 60 biological samples. The robustness of the associated measurement strategy was statistically validated on 50 control samples, and 10 different large rearrangements (nine of BRCA1 and one of BRCA2), initially detected with other techniques, were fully characterized by MC in samples from patients with a severe family history of breast cancer. The robustness of the newly designed GMCs, devoid of repetitive sequences, is endorsed by the fact that the nature of the identified mutations was confirmed by high-resolution zoom-in aCGH (11k), with a precision in the 1–2 kb range.
Four of the 10 characterized large rearrangements have not been described previously: an 11.4 kb deletion of exon 3 (case BR07), a 5.8 kb deletion of exon 24 (case BR10), and an 8.1 kb duplication of exons 5–7 (case BR08) of BRCA1, and a 14.8 kb duplication of exons 17–20 of BRCA2 (case BR09).
Two deletions involving BRCA1 exon 3 have been described in the literature, but their sizes (1,049 and 1,039 bp) [Payne et al., 2000; Walsh et al., 2006] differ markedly from that reported in our study (11,413 bp). The rearrangement we describe is the largest reported to involve this exon. The mechanism is associated with Alu–Alu recombination involving the 5′ region of BRCA1. Five (6%) of 81 reported BRCA1 deletions involve exon 24, which is localized in a known deletion hotspot in the 3′ region of BRCA1. Two deletions involving only exon 24 and the 3′UTR have been reported in the literature, one of 4,427 bp [Armaou et al., 2007] and one of 1,506 bp [Engert et al., 2008]. The deletion we describe here measures 5,776 bp and involves another locus. As in the 4,427 bp deletion, there is an insertion of a few base pairs (5 and 2 bp, respectively) and an Alu-non-Alu mechanism: the 5′ breakpoint lies in an AluGo sequence and the 3′ breakpoint within a region without any Alu sequences. The impact of such an event is the deletion of the stop codon, the polyA tail region, and the 3′UTR of the BRCA1 gene. No mRNA transcript production has been detected so far. The deletion in exon 24 is a recurrent event in the Greek population [Pertesi et al., 2011]. The deletion characterized in our study remains to be explored in the French population. Two 5 kb deletions including BRCA1 exons 5–7 have been reported in the literature [de Juan et al., 2009; Hansen et al., 2009; Preisler-Adams et al., 2006], but this is the first observation of a duplication of exons 5–7. The exact size of this event is 8,127 bp. A common region of 26 nucleotides was detected at the breakpoints, with one lying in an AluSx and the other in an AluJb. These events show the sensitivity of this particular genomic region, since it can contain different mutation types (deletions and duplications) with different sizes.
Very few rearrangements at a specific locus include both deletions and mirror duplications. For instance, BRCA1 exon 13 has been shown to exhibit a deletion of 3,835 bp [Petrij-Bosch et al., 1997] and three duplications of 6,081 bp (recurrent among people of English ancestry), 5,275 bp [Walsh et al., 2010], and 8,463 bp [Yap et al., 2006]. The distance between the breakpoints was between 144 bp and 3 kb. Another example is the rearrangement in exons 18 and 19, with deletions of 4,826 bp [Montagna et al., 2003], 1,940 bp [Foretova et al., 2006], and 7,245 bp [Engert et al., 2008] and a duplication of 5,923 bp [Walsh et al., 2006]. For these events, the 5′ breakpoint was within 65–1,601 bp and the 3′ breakpoint within 1–3 kb of the duplication. Finally, several deletions have been reported in BRCA1 exon 20. A 7,029 bp deletion [Foretova et al., 2006] had a very similar breakpoint in the 3′-region (29 bp) to that of the 8,706 bp duplication [Agata et al., 2006]. Strangely, no duplications involving at least exons 1 and 2 have been reported so far in the 5′-region, despite the wide variety of rearrangements, suggesting that such duplications might not be pathogenic. The composition of Alu sequences in those sensitive regions should be studied to understand the DNA-strand organization.
Sluiter and van Rensburg reported 17 large rearrangements in the BRCA2 gene [Sluiter and van Rensburg, 2011]. A 10.8 kb deletion of exons 17–18 [Agata et al., 2006] and a 9.7 kb duplication of exons 19–20 [Walsh et al., 2006] have also been reported. The 14.8 kb BRCA2 mutation reported here is the second largest duplication, after the 16.2 kb duplication (exon 4 and part of exon 11) reported by Lim et al. [Lim et al., 2007]. Another duplication, involving exons 19 and 20, has been reported in this region [Caux-Moncoutier et al., 2011]. This confirms the presence of a sensitive area between exon 17 and exon 20, with three different duplications. Only about 10% of large rearrangements in BRCA1 and BRCA2 are amplification events (duplication or triplication) [Sluiter and van Rensburg, 2011], and few of these events are associated with Alu–Alu recombination.
Duplications are the most difficult large rearrangements to detect and characterize. In contrast to other techniques, such as aCGH and MLPA, the capacity of MC to visualize hybridized DNA probes at high resolution permits precise mapping and characterization of tandem repeat duplications, as shown here in cases BR01 (BRCA1 Dup Ex 13), BR09 (BRCA2 Dup Ex 17–20), BR08 (BRCA1 Dup Ex 5–7), and BR04 (BRCA1 Dup Ex 18–20). aCGH can be used to determine the presence and size of duplications, but not their exact location and orientation on the genome (see cases BR08 and BR09). Only a combination of aCGH and sequencing would allow the identified duplications to be defined as tandem repeats. In contrast, MC allows tandem repeats to be localized unambiguously, providing information on the genomic location, the nature, and the size of the mutation in a single experiment and with a single technique.
In PCR-based techniques such as MLPA, duplications are considered to be present when the ratio between the number of duplicated exons in the sample carrying a mutation and the number of exons in the control sample is at least 1.5, reflecting the presence of three copies of a specific exon in the mutated sample and two copies in the wild-type sample. The ratio of 1.5 is difficult to demonstrate unambiguously by MLPA, which often gives false-positive signals, as observed in case BR01 (BRCA1 Dup Ex 13). The limits of MLPA have been underlined in several recent studies [Cavalieri et al., 2008; Staaf et al., 2008]. MLPA is limited to coding sequences and can also give false-negative scores, due to the restricted coverage of the probes [Cavalieri et al., 2008]. Staaf et al. recently suggested that MLPA should be regarded as a predictive screening tool that needs to be complemented by other means of mutation characterization, such as aCGH [Staaf et al., 2008]. Another multiplex PCR assay similar to MLPA is QMPSF [Charbonnier et al., 2000]. MLPA has the advantage over QMPSF that it allows the analysis of up to 40 loci in a single multiplex reaction and, because of the required ligation step, is very specific, allowing copy number analysis of regions with high sequence homology. Primer design, on the other hand, appears less critical for the QMPSF method, and this approach may be more cost effective and suitable for rapid validation experiments of loci with unique sequences. The performance of the two methods has not been compared, and therefore it remains speculative which technology is best suited for targeted high-throughput analysis of large rearrangements. We propose MC as a complementary technology to MLPA, QMPSF, or aCGH, as it unambiguously identifies and visualizes duplications.
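The dosage-ratio criterion described above for MLPA can be illustrated with a minimal Python sketch. It assumes a diploid control (two copies per exon) and takes the 1.5 threshold from the text; the function names are hypothetical and are not part of any MLPA analysis software.

```python
def expected_ratio(copies_in_sample: int, copies_in_control: int = 2) -> float:
    """Expected dosage ratio of an exon, assuming a diploid control sample."""
    return copies_in_sample / copies_in_control


def is_duplication(measured_ratio: float, threshold: float = 1.5) -> bool:
    """Apply the 'ratio of at least 1.5' criterion quoted in the text."""
    return measured_ratio >= threshold


# Three copies in the mutated sample versus two in the wild-type control.
ratio = expected_ratio(3)
print(ratio, is_duplication(ratio))
```

Because a true heterozygous duplication is expected to give a ratio of exactly 1.5, any downward measurement noise places the observed value below the threshold, which may help explain why this ratio is hard to demonstrate unambiguously in practice.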
Another advantage of MC is its capacity to cover noncoding regions, including the 5′-region of the BRCA1 gene and the genomic region upstream of BRCA1 that comprises the NBR2 gene, the ΨBRCA1 pseudogene, and the NBR1 gene. Recent studies show that it is very difficult to design exploitable PCR or aCGH probes in this rearrangement-prone genomic region [Rouleau et al., 2007; Staaf et al., 2008], because of the presence of duplicated regions and the high density of Alu repeats, particularly in BRCA1, more than 40% of which consists of Alu sequences. Genomic rearrangements typically arise from unequal homologous recombination between segmental duplications or Alu sequences [Hastings et al., 2009]. MC permits precise physical mapping within this difficult region, as shown here for three different large rearrangements. In cases BR07 (BRCA1 Del Ex 3), BR02, and BR03 (BRCA1 Del Ex 2), we were able to determine the mutation sizes precisely within the hard-to-sequence BRCA1 5′-region and to confirm the results by aCGH and breakpoint mapping. In cases BR02 and BR03 (BRCA1 Del Ex 2), we measured mutation sizes of 40.8 ± 3.5 kb and 39.0 ± 2.6 kb, respectively. Given the statistical error of the MC measurements, these two mutations cannot be stated to differ, and aCGH analysis of the two samples delivered identical results (40.4–58.1 kb). The detected mutations are presumed to be identical, since individuals BR02 and BR03 belong to the same biological family. This mutation was originally described by Puget et al., who determined a mutation size of 37 kb by breakpoint mapping [Puget et al., 2002]. The larger mutation size range estimated via aCGH is probably caused by the low density of exploitable oligonucleotide sequences in this genomic region and the reduced sensitivity of some oligonucleotides due to sequence homology [Rouleau et al., 2007].
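The comparison of the two MC measurements for cases BR02 and BR03 can be sketched as follows. This is only an illustration: it assumes that the reported ± values can be treated as independent standard deviations (which this section does not state) and uses a hypothetical one-sigma compatibility criterion.

```python
import math


def compare_sizes(size_a: float, err_a: float, size_b: float, err_b: float):
    """Return (difference, combined uncertainty, compatible?) in kb.

    Assumption: err_a and err_b are independent standard deviations,
    so they combine in quadrature.
    """
    diff = abs(size_a - size_b)
    combined = math.hypot(err_a, err_b)
    return diff, combined, diff <= combined


# MC estimates for BR02 and BR03 taken from the text (kb).
diff, combined, compatible = compare_sizes(40.8, 3.5, 39.0, 2.6)
print(f"difference = {diff:.1f} kb, combined error = {combined:.1f} kb, "
      f"compatible = {compatible}")
```

Under these assumptions the 1.8 kb difference between the two estimates is well below the combined uncertainty of about 4.4 kb, consistent with the statement that the two mutations cannot be distinguished by MC.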
We were able to demonstrate the absence of the 100-kb “sequencing gap” in the genomic region upstream of the BRCA1 gene, thus confirming the BRCA1 genomic structure proposed in the GRCh37 genome assembly. The zoom-in aCGH arrays designed by Staaf et al. were based on the NCBI build 35 genome assembly (release date May 2004), which still included the “sequencing gap.” It follows that the 100-kb genomic gap that could not be covered by the arrays is indeed non-existent. Thus, the size of previously identified mutations that include this “sequencing gap” must be reduced by 100 kb. For example, the 300 kb BRCA1 large rearrangement (case L1985) must be reduced to 200 kb [Staaf et al., 2008]. MC can therefore be used for physical mapping of hard-to-sequence genomic regions that contain large numbers of repetitive elements. Here we demonstrate that the high concentration of Alu sequences in BRCA1 does not represent an obstacle for MC.
From a practical point of view, the global turnaround time for the complete analysis of 10–20 patients via MC is 2 weeks (corresponding to 250–500 patients per year), which is close to the turnaround time of aCGH analysis and compatible with the needs of clinical diagnostic testing. By comparison, MLPA can be used to process up to 100 samples in parallel (e.g., 96-well plates) in just 2–3 days, making this technology more suitable for routine high-throughput predictive testing [Schouten, 2002]. Automation of MC is being further improved through the development of the technological tools needed to significantly increase the number of samples that can be analyzed in parallel. Another limitation of MC is the large amount of DNA required, in the range of 500–1,000 ng (typically corresponding to 5 × 10⁵ to 10⁶ cells). This is far more than required for MLPA (20–50 ng), but similar to what is required for aCGH analysis [Rouleau et al., 2007; Staaf et al., 2008]. The smallest large rearrangement described in this work has a size of 3 kb (case BR05), and the limit of resolution of MC is in the range of 1–2 kb [Herrick, 2009; Lebofsky, 2003]. As a comparison, the resolution of aCGH is in the range of 500 bp [Rouleau et al.]. Among the BRCA1 large rearrangements reported so far in the literature, only 10% (9/81) have a size smaller than 2 kb and only 4% (3/81) have a size smaller than 500 bp [Sluiter and van Rensburg, 2011]. Thus, the vast majority (90%) of large rearrangements identified so far are larger than our resolution limit. In our experience, large rearrangements in the range of 500 bp (or smaller) should also be characterized with PCR-based technologies, since these mutations are close to the detection limit of aCGH. Generally speaking, if a potential mutation is detected with our genetic test, the whole MC assay is repeated. Measurement data, in combination with the known positions of the probes, can help to physically map the large rearrangement.
For instance, primers can be placed on intact signals not involved in the mutation, which should yield a PCR product smaller than 10 kb (suitable for classical or long-range PCR). When the estimated PCR product size is larger than 10 kb, a zoom-in CGH array can be employed to refine the mapping.
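The choice between PCR and zoom-in aCGH described above can be sketched in a few lines of Python. The sketch covers only the deletion case; it assumes primers on intact sequence on either side of the deleted segment, and the function names and example coordinates are hypothetical. Only the 10 kb limit comes from the text.

```python
PCR_LIMIT_KB = 10.0  # product-size limit quoted in the text


def deletion_product_kb(forward_kb: float, reverse_kb: float,
                        deleted_kb: float) -> float:
    """Expected PCR product on the deleted allele, for primers that
    flank the deletion (positions in kb along the locus)."""
    if reverse_kb <= forward_kb:
        raise ValueError("reverse primer must lie downstream of forward primer")
    product = (reverse_kb - forward_kb) - deleted_kb
    if product <= 0:
        raise ValueError("primers must lie outside the deleted segment")
    return product


def choose_method(product_kb: float) -> str:
    """PCR below the limit, zoom-in aCGH otherwise (a product of exactly
    10 kb is not covered by the text and is sent to aCGH here)."""
    return "PCR" if product_kb < PCR_LIMIT_KB else "zoom-in aCGH"


# Hypothetical example: primers 26 kb apart around a 20 kb deletion.
product = deletion_product_kb(0.0, 26.0, 20.0)
print(product, choose_method(product))
```

In practice the MC size estimate carries an error of a few kilobases, so the expected product size should be treated as approximate when selecting primer positions.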
We propose the genetic test based on MC as a valuable tool for the detection and characterization of large rearrangements in BRCA1 and BRCA2, to be combined in clinical diagnostic settings with an assay that allows the detection of small mutations (e.g., sequencing). This is particularly valuable for the 80% of patients with breast and ovarian cancer predisposition in whom no deleterious mutation is detected with MLPA/sequencing techniques. We estimate the price of the consumables related to our genetic test (DNA extraction kit, DNA probe set kit, and coverslips) to be approximately $1,000 (€800). Other technologies currently employed in diagnostic testing of BRCA1 and BRCA2 are NGS and aCGH. As a comparison, the cost of the consumables related to NGS has been estimated to be in the order of $1,500 [Walsh et al., 2010]. Of note, the limitation of NGS in clinical diagnostic settings was recently highlighted, because of the high proportion of hard-to-sequence repetitive sequences in both BRCA1 and BRCA2 [De Leeneer et al., 2011]. Large rearrangements could be detected by NGS only with a very high coverage of 1200x [Walsh et al., 2010], whereas a lower coverage of 120x did not allow the detection of large rearrangements in either gene [De Leeneer et al., 2011]. The high coverage needed could influence the overall cost of detecting large rearrangements via NGS. The cost of CGH arrays has been described as varying between $250 and $800, depending on the array format, but no information has been provided on the global cost of consumables related to CGH analysis [Staaf et al., 2008]. Of note, the standard commercial test (DNA sequencing of both genes and screening for five large deletions and duplications in BRCA1) proposed by Myriad Genetics costs $3,340, and comprehensive testing for gene rearrangements is offered as a separate test at an additional cost of $650.
We see the main application of the developed MC-based assay as a diagnostic genetic test. Once the overall throughput has been improved, we envisage extending the application of the developed assay to use as a companion diagnostic test, for instance in the screening of BRCA-mutated cells in the context of the development of PARP-1 inhibitors. Thus, the genetic test may be applied not only to clinical blood samples, but also to circulating cells and heterogeneous cell populations, such as tumor tissues.
Conflict of Interest
All the authors declare that they have no Conflicts of Interest linked to the submitted manuscript.
Acknowledgements
This study is dedicated to Daniel Nerson. The authors would like to thank Sylvie Mazoyer for critical reading of the manuscript and Clemence Thiberville for input on the analysis of QA/QC checkpoints. The authors would also like to thank Jennifer Abscheidt and Solenne Guillon for their help with the analysis of the negative controls.