Analysis of TP53 Mutation Status in Human Cancer Cell Lines: A Reassessment
Contract grant sponsors: Cancerföreningen i Stockholm and Cancerfonden; The University of Texas Southwestern Medical Center and The University of Texas MD Anderson Cancer Center Lung SPORE (grant P50 CA070907).
For the TP53 Special Issue
ABSTRACT
Tumor-derived cell lines play an important role in the investigation of tumor biology and genetics. Across a wide array of studies, they have been tools of choice for the discovery of important genes involved in cancer and for the analysis of the cellular pathways that are impaired by diverse oncogenic events. They are also invaluable for screening novel anticancer drugs. The TP53 protein is a major component of multiple pathways that regulate cellular response to various types of stress. Therefore, TP53 status affects the phenotype of tumor cell lines profoundly and must be carefully ascertained for any experimental project. In the present review, we use the 2014 release of the UMD TP53 database to show that TP53 status is still controversial for numerous cell lines, including some widely used lines from the NCI-60 panel. Our analysis clearly confirms that, despite numerous warnings, the misidentification of cell lines is still present as a silent and neglected issue, and that extreme care must be taken when determining the status of p53, because errors may lead to disastrous experimental interpretations. A novel compendium gathering the TP53 status of 2,500 cell lines has been made available (http://p53.fr). A stand-alone application can be used to browse the database and extract pertinent information on cell lines and associated TP53 mutations. It will be updated regularly to minimize any scientific issues associated with the use of misidentified cell lines (http://p53.fr).
Introduction
The first tumor cell line was developed in the middle of the 20th century [Gey et al., 1952; Dulbecco and Vogt, 1954]. Thereafter, as the effectiveness of culture media improved, the number of lines continued to grow and today the research community has several thousand human cancer cell lines from various types of neoplasia at its disposal [Neve et al., 2006; Gazdar et al., 2010]. These cell lines have been of tremendous assistance in improving our knowledge of cell transformation, as illustrated by genotype and phenotype studies.
First, analyses of the genomes of these cell lines were essential in the discovery of the various cancer genes including oncogenes and tumor-suppressor genes. As the genetic alterations of these cell lines accurately recapitulated original tumors, they were an invaluable source of material at a time when PCR and omics studies of human tissue were still in the realm of science fiction.
Second, the analysis of the consequences of genetic alterations on the intricate networks that sustain cancer cell survival led to the identification of key pathways targeted by the various genetic and epigenetic modifications, and to the definition of the fundamental hallmarks of cancer. With the development of novel methodologies that combine full genome sequencing and global RNA and protein expression, the intimate networks characterizing particular cell lines have already become available for the most popular ones and will become available for all of them in the near future [Pleasance et al., 2010; Abaan et al., 2013].
Cell lines are also used in thousands of research laboratories as biological test tubes for a large variety of experiments. They are essential for the screening of drugs (the NCI-60 panel), producing various macromolecules, and modeling human tumors [Shoemaker, 2006]. High-throughput studies examining the relationship between tumor cell line genomics and sensitivity to anticancer agents have been released and are invaluable sources of information [Barretina et al., 2012; Moghaddas Gholami et al., 2013].
The problem of cell line cross-contamination and misidentification has been known for quite some time, as illustrated by the first, but not the last, warnings expressed by Nelson-Rees et al. (1981) more than thirty years ago. Today, this issue continues to be ignored. Indeed, recent studies have suggested that the “silent and neglected danger” of cross-contamination or misidentification may affect 10%–20% of cell lines [Drexler et al., 2000; MacLeod et al., 2002]. Cell line misidentification results in an inability to reproduce research results and in the retraction of published papers, both of which are a waste of research resources. In a previous study, using TP53 status as the sole criterion for analysis, we found discrepancies for 23% (88/384) of cell lines for which the p53 status was established independently in two laboratories [Berglind et al., 2008].
Cell line misidentification is the first source of these discrepancies. It has multiple origins such as mislabeling of culture flasks, working with cell lines that have close or similar names, or obtaining secondhand cell lines from other laboratories. The second source of discrepancies is cell line cross-contamination, which results in a composite phenotype of the two cell lines (Table 1). Cross-contamination has multiple origins too, for example, working simultaneously with multiple culture flasks or using cell lines contaminated during the establishment process. Handling only one cell line at a time in the cabinet would be an easy way to avoid cross-contamination, but unfortunately this is rarely done in practice.
Cell line in publication 1 | Cell line in publication 2 | Potential origin of the controversies | Incidence | Resolution |
---|---|---|---|---|
Mutation X | Mutation Y unrelated to mutation X | Cell line misidentification | Frequent | |
Mutation X | Wild type^a | Cell line misidentification | Frequent | |
Mutation X | Wild type^a | Loss of one allele if the cell line expresses both wild-type and mutant alleles | Possible but very unlikely. Has never been experimentally proven | SNP analysis |
Mutation X | Wild type^a | If the status of cell line 2 was analyzed via RNA/cDNA sequencing, NMD could lead to a false-negative result | Possible but occasional | Genomic analysis |
Mutation X and mutation Y | Only one of the two mutations, X or Y | Loss of one mutant allele | Possible but very rare. Has never been experimentally proven | SNP analysis |
Mutation X and mutation Y | Only one of the two mutations, X or Y | Cell line cross-contamination between cell line 1 and an unidentified cell line that expresses the other mutation | Frequent | |
Mutation is a single-nucleotide variant | Mutation is a deletion | | Very frequent | Check the boundary of the deletion if it corresponds to an intron–exon junction |
Mutation is a single-nucleotide variant | Mutation is a deletion | Cell line cross-contamination | Possible | |
- a It has been assumed that the same regions of the gene have been covered by the studies.
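The decision logic of Table 1 can be written down as a small lookup. The Python sketch below is illustrative only: the function name `possible_origins` and the representation of a TP53 status as a set of mutation names (an empty set meaning wild type) are assumptions made for this example, not part of the original analysis. It does not cover the single-nucleotide-variant versus deletion row, which requires inspecting the deletion boundaries.

```python
def possible_origins(status_1, status_2):
    """Return candidate explanations (after Table 1) for two reported TP53 statuses.

    Each status is a set of mutation names; an empty set means wild type.
    This representation is an assumption made for illustration.
    """
    a, b = set(status_1), set(status_2)
    if a == b:
        return []  # concordant reports, nothing to explain
    # Order the two reports so that `a` is the one listing more mutations
    if len(b) > len(a):
        a, b = b, a
    if not b:
        # Mutation(s) in one report, wild type in the other
        return [
            "cell line misidentification",
            "loss of one allele (very unlikely; check by SNP analysis)",
            "NMD masking the mutation in an RNA-based assay (check genomic DNA)",
        ]
    if b < a:
        # Mutations X and Y in one report, only one of them in the other
        return [
            "loss of one mutant allele (very rare; check by SNP analysis)",
            "cross-contamination with a line carrying the other mutation",
        ]
    # Unrelated mutations; partly overlapping sets are not listed in
    # Table 1 and also fall through to this branch
    return ["cell line misidentification"]
```

For example, `possible_origins({"p.V274F", "p.P223L"}, {"p.V274F"})` returns the two explanations listed for a line reported with two mutations in one study and one in another.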
Another potential source of cell line genotype discrepancies is related to the methodology used for the analysis. Using either DNA or cDNA sequencing can lead to serious differences with mutation miscalls using RNA-based assays [Kropveld et al., 1999]. An initial problem is associated with splice mutations. Using RNA sequencing, splice mutations may not be correctly detected as only the consequence of the mutation is identified, usually a deletion that starts close to the intron/exon boundary. This leads to the misidentification of a missense mutation located at a splice site, since it is quoted as a deletion. This problem can be circumvented by performing genomic DNA sequencing. Any deletion that starts or ends at a splice site and encompasses a part of an exon or includes an intronic sequence in an analysis using an RNA-based assay should be considered as suspicious and warrant DNA-based screening. Nonsense-mediated mRNA decay (NMD) is a second potential problem associated with RNA-based assays. Frameshift and nonsense mutations have been known to induce significant RNA instability via the NMD pathway. This instability could impair the detection of mutations and lead to a false wild-type genotype.
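The rule that a deletion starting or ending at a splice site should be treated as suspicious can be sketched as a simple check. In the Python example below, exon junctions are inferred from intronic HGVS names such as `c.673-2A>G`, where the number before the offset is the first base of an exon. The function names and the deletion coordinates in the usage note are hypothetical; this is a minimal sketch, not the procedure used by any of the cited studies.

```python
import re

def junctions_from_intronic(variants):
    """Collect the first coding base of exons implied by intronic HGVS names.

    'c.673-2A>G' lies 2 bases upstream of coding position 673, so 673 is
    the first base of an exon. 'c.N+k' names give the last base N of an
    exon, so the next exon starts at N + 1. Other names are ignored.
    """
    first_bases = set()
    for name in variants:
        m = re.match(r"c\.(\d+)([+-])\d+", name)
        if not m:
            continue
        pos = int(m.group(1))
        first_bases.add(pos if m.group(2) == "-" else pos + 1)
    return first_bases

def deletion_is_suspicious(start, end, first_bases):
    """True if a cDNA-level deletion [start, end] begins or ends at an exon junction."""
    return start in first_bases or (end + 1) in first_bases
```

With the four splice variants listed in Table 2 (`c.1101-2A>C`, `c.673-2A>G`, `c.376-1G>A`, `c.560-2A>G`), a hypothetical RNA-level deletion beginning at coding position 673 would be flagged for DNA-based confirmation, whereas one beginning at position 680 would not.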
Using RNA-based assays has led to mutation discrepancies for the TP53 gene (MIM #191170) in numerous tumors and cell lines. Cell lines such as OVCAR-8 or HOP62 were previously described with an exon deletion, but more recent studies using genomic DNA sequencing have confirmed that these two cell lines have single-nucleotide mutations at splice sites. Using both RNA and DNA sequencing, Hauser et al. (2002) have revised the TP53 status of nine of the 14 cell lines from head and neck squamous cell carcinoma.
This problem is not minor as partial gene deletion and splicing mutations have different effects. For a complex gene such as TP53 that expresses multiple RNA species synthesizing in turn at least 12 isoforms, this difference is important for phenotype interpretation in cell lines. This issue in genotyping cell lines using RNA sequencing must not be considered as specific to TP53; indeed, it applies to any gene.
In the present paper, we will present an updated version of the p53 Mutations in Cell Lines Compendium, comprising 2,500 tumor cell lines from various cancer types. We will also offer recommendations to avoid the use of misidentified or cross-contaminated cell lines.
The NCI-60 Cell Line
The “NCI-60” cell panel was developed by the National Cancer Institute primarily for in vitro anticancer drug screening and has since been used to screen more than 100,000 compounds [Shoemaker, 2006]. In addition, these cell lines have been used in a vast number of studies as tools for exploring cellular transformation or as models for tumorigenesis. They have been extensively analyzed at the DNA, RNA, and protein levels using conventional methods or, more recently, large high-throughput omic studies [Abaan et al., 2013; Moghaddas Gholami et al., 2013]. Careful analyses of genetic markers have shown that several cell lines from the NCI-60 panel were misidentified by the original depositor (Table 2). One of the most dramatic cases is to be found with the NCI/ADR-RES cell line [Liscovitch and Ravid, 2007]. This line was previously known as MCF-7/Adr and thought to be a multidrug-resistant, P-glycoprotein-expressing cell line derived from the popular breast cancer cells MCF-7. Studies of MCF-7/Adr and the presumed parental cell line MCF-7 resulted in several hundred publications and both lines were included in the NCI-60 panel. However, in 1998, it was shown that MCF-7/Adr was not a derivative of MCF-7 and the line was consequently renamed NCI/ADR-RES [Scudiero et al., 1998]. Then, in 2007, it was demonstrated that this cell line was identical to the ovarian cancer cell line OVCAR-8 [Liscovitch and Ravid, 2007]. Unfortunately, this story is not unique.
Over the last decade, with the development of short tandem repeat (STR) profiling, the origin or identity of many cell lines, whether or not they belong to the NCI-60 panel, has had to be updated (Table 2) [American Type Culture Collection Standards Development Organization Workgroup ASN-0002, 2010]. The popularity of these cell lines has led to a high rate of second-hand, inter-laboratory exchange, which in turn increases the incidence of misidentification and cross-contamination.
Sample_ID | ATCC | Cancer | cDNA_variant^a | Protein_variant^b | COSMIC database^c | Comments |
---|---|---|---|---|---|---|
BT-549 | HTB-122 | Breast carcinoma | c.747G>C | p.R249S | Yes | |
Hs-578-T | HTB-126 | Breast carcinoma | c.469G>T | p.V157F | Yes | |
MCF-7 | HTB-22 | Breast carcinoma | Wild type | Wild type | Yes | |
MDA-MB-231 | HTB-26 | Breast carcinoma | c.839G>A | p.R280K | Yes | |
T47D | HTB-133 | Breast carcinoma | c.580C>T | p.L194F | Yes | |
COLO-205^d | CCL-222 | Colorectal carcinoma | c.308del26ins2 | p.Y103fs*37 | Yes | Controversial status; see text |
COLO-205^d | CCL-222 | Colorectal carcinoma | c.308A>T | p.Y103F | Yes | Controversial status; see text |
HCC-2998 | | Colorectal carcinoma | c.637C>T | p.R213* | Yes | |
HCT-116 | CCL-247 | Colorectal carcinoma | Wild type | Wild type | Yes | |
HCT-15^d | CCL-225 | Colorectal carcinoma | c.1101-2A>C | p.0? | Yes | Controversial status; see text |
HCT-15^d | CCL-225 | Colorectal carcinoma | c.722C>T | p.S241F | No | Controversial status; see text |
HT-29 | HTB-38 | Colorectal carcinoma | c.818G>A | p.R273H | Yes | |
KM12^e | | Colorectal carcinoma | c.536A>G | p.H179R | Yes | Controversial status; only the missense mutation is described in several publications |
KM12^e | | Colorectal carcinoma | c.216del | p.V73fs*50 | Yes | Controversial status; only the missense mutation is described in several publications |
KM12^e | | Colorectal carcinoma | c.210del | p.V73fs*50 | Yes | Controversial status; only the missense mutation is described in several publications |
SW480^d | CCL-228 | Colorectal carcinoma | c.925C>T | p.P309S | No | SW480 and SW620 derived from the same patient and have similar TP53 mutations |
SW480^d | CCL-228 | Colorectal carcinoma | c.818G>A | p.R273H | No | SW480 and SW620 derived from the same patient and have similar TP53 mutations |
SW620^d | CCL-227 | Colorectal carcinoma | c.925C>T | p.P309S | Yes | SW480 and SW620 derived from the same patient and have similar TP53 mutations |
SW620^d | CCL-227 | Colorectal carcinoma | c.818G>A | p.R273H | Yes | SW480 and SW620 derived from the same patient and have similar TP53 mutations |
SF-268 | | Glioblastoma | c.818G>A | p.R273H | Yes | |
SF-295 | | Glioblastoma | c.743G>A | p.R248Q | Yes | |
SF-539 | | Glioblastoma | c.1024del | p.R342fs*3 | Yes | |
SNB-19 | CRL-2219 | Glioblastoma | c.818G>A | p.R273H | No | This cell line is identical to U251 and has been discontinued |
SNB-75 | | Glioblastoma | c.772G>A | p.E258K | Yes | |
U251 | | Glioblastoma | c.818G>A | p.R273H | Yes | |
CCRF-CEM^d | CCL-119 | Leukemia/lymphoma | c.743G>A | p.R248Q | Yes | |
CCRF-CEM^d | CCL-119 | Leukemia/lymphoma | c.524G>A | p.R175H | Yes | |
HL-60 | CCL-240 | Leukemia/lymphoma | c.(?_28)_(*1027_?)del | p.0 | No | |
K-562 | CCL-243 | Leukemia/lymphoma | c.406_407ins1 | p.Q136fs*13 | Yes | |
MOLT-4 | CRL-1582 | Leukemia/lymphoma | c.916C>T | p.R306* | Yes | Controversial status; see text |
RPMI-8226 | CCL-155 | Leukemia/lymphoma | c.853G>A | p.E285K | Yes | |
SR | CRL-2262 | Leukemia/lymphoma | Wild type | Wild type | Yes | |
A-549 | CCL-185 | Lung (NSCLC) | Wild type | Wild type | Yes | |
EKVX | | Lung (NSCLC) | c.del609_610insTG | p.E204* | Yes | |
HOP-62 | | Lung (NSCLC) | c.673-2A>G | p.0? | Yes | |
HOP-92 | | Lung (NSCLC) | c.524G>T | p.R175L | Yes | |
NCI-H226 | CRL-5826 | Lung (NSCLC) | c.473G>T | p.R158L | Yes | Reported as wild type in the COSMIC database |
NCI-H23 | CRL-5800 | Lung (NSCLC) | c.738G>C | p.M246I | Yes | |
NCI-H322M | | Lung (NSCLC) | c.743G>T | p.R248L | Yes | |
NCI-H460 | HTB-177 | Lung (NSCLC) | Wild type | Wild type | Yes | |
NCI-H522 | CRL-5810 | Lung (NSCLC) | c.572delC | p.P191fs*56 | Yes | |
LOXIMVI | | Melanoma | Wild type | Wild type | Yes | |
MALME-3M | HTB-64 | Melanoma | Wild type | Wild type | Yes | |
M14^f | | Melanoma?/Breast? | c.797G>A | p.G266E | Yes | M14 and MDA-MB-435 are similar; the origin of these cell lines (melanoma or breast) is highly controversial |
MDA-MB-435^f | HTB-129 | Melanoma?/Breast? | c.797G>A | p.G266E | Yes | M14 and MDA-MB-435 are similar; the origin of these cell lines (melanoma or breast) is highly controversial |
SK-MEL-2 | HTB-68 | Melanoma | c.733G>A | p.G245S | Yes | |
SK-MEL-28 | HTB-72 | Melanoma | c.del434_435insTG | p.L145R | Yes | |
SK-MEL-5 | | Melanoma | Wild type | Wild type | Yes | |
UACC-257 | | Melanoma | Wild type | Wild type | Yes | |
UACC-62 | | Melanoma | Wild type | Wild type | Yes | |
IGROV-1^d | | Ovarian carcinoma | c.267_268ins1 | p.P90fs*59 | Yes | Controversial status; described as wild type in several publications |
IGROV-1^d | | Ovarian carcinoma | c.377A>G | p.Y126C | Yes | Controversial status; described as wild type in several publications |
OVCAR-3 | HTB-161 | Ovarian carcinoma | c.743G>A | p.R248Q | Yes | |
OVCAR-4 | | Ovarian carcinoma | c.388C>G | p.L130V | Yes | |
OVCAR-5 | | Ovarian carcinoma | Wild type | Wild type | Yes | Controversial status |
NCI-ADR-RES | | Ovarian carcinoma | c.376-1G>A | p.0? | Yes | Originally named MCF-7/AdrR cells, later re-designated NCI/ADR-RES, this cell line is identical to OVCAR-8 |
OVCAR-8 | | Ovarian carcinoma | c.376-1G>A | p.0? | Yes | |
SK-OV-3 | HTB-77 | Ovarian carcinoma | c.267del | p.P90fs*33 | Yes | |
DU-145^d | HTB-81 | Prostate carcinoma | c.820G>T | p.V274F | Yes | Controversial status; see text |
DU-145^d | HTB-81 | Prostate carcinoma | c.668C>T | p.P223L | No | Controversial status; see text |
PC-3 | CRL-1435 | Prostate carcinoma | c.414del | p.K139fs*31 | Yes | |
786-0^d | CRL-1932 | Renal cell carcinoma | c.832C>G | p.P278A | Yes | |
786-0^d | CRL-1932 | Renal cell carcinoma | c.560-2A>G | p.0? | Yes | |
A498 | HTB-44 | Renal cell carcinoma | Wild type | Wild type | Yes | |
ACHN | CRL-1611 | Renal cell carcinoma | Wild type | Wild type | Yes | |
CAKI-1 | HTB-46 | Renal cell carcinoma | Wild type | Wild type | Yes | |
RXF393 | | Renal cell carcinoma | c.524G>A | p.R175H | Yes | |
SN12C | | Renal cell carcinoma | c.1006G>T | p.E336* | Yes | |
TK10 | | Renal cell carcinoma | c.791T>G | p.L264R | Yes | |
UO-31 | | Renal cell carcinoma | Wild type | Wild type | Yes | |
- a Nomenclature using NM_000546.5.
- b Nomenclature using NP_000537.3.
- c Indicates whether or not this cell line is included in the last issue of the COSMIC database (v67) (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/).
- d Cell line with two TP53 mutations.
- e Cell line with three TP53 mutations.
- f Derivative cell lines from MDA-MB-435 (MDA-MB-435S or MDA-N) have a similar TP53 status and a controversial origin.
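Because Table 2 lists one row per mutation, a cell line's genotype has to be assembled from several rows before lines can be compared. The Python sketch below does this for a handful of rows copied from Table 2 and reports cell lines that carry exactly the same set of variants. The function names and the choice of rows are assumptions made for illustration. A shared genotype is only a hint: hotspot mutations such as c.818G>A recur in unrelated lines, so authentication still requires STR profiling.

```python
from collections import defaultdict

# (cell line, cDNA variant) pairs copied from a few rows of Table 2
ROWS = [
    ("NCI-ADR-RES", "c.376-1G>A"),
    ("OVCAR-8", "c.376-1G>A"),
    ("SW480", "c.925C>T"), ("SW480", "c.818G>A"),
    ("SW620", "c.925C>T"), ("SW620", "c.818G>A"),
    ("HT-29", "c.818G>A"),
    ("DU-145", "c.820G>T"), ("DU-145", "c.668C>T"),
    ("MCF-7", "Wild type"),
]

def genotypes(rows):
    """Map each cell line to the frozenset of its reported variants."""
    by_line = defaultdict(set)
    for line, variant in rows:
        variants = by_line[line]  # registers wild-type lines with an empty set
        if variant != "Wild type":
            variants.add(variant)
    return {line: frozenset(v) for line, v in by_line.items()}

def shared_genotypes(rows):
    """Group cell lines that carry exactly the same non-empty set of variants."""
    groups = defaultdict(list)
    for line, geno in genotypes(rows).items():
        if geno:
            groups[geno].append(line)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

On these rows, `shared_genotypes(ROWS)` groups NCI-ADR-RES with OVCAR-8 and SW480 with SW620, while HT-29 is not grouped with the SW lines because it shares only one of their two mutations.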
In 1997, the TP53 status of the entire NCI-60 cell line panel was published [O'Connor et al., 1997]. Very quickly, it became apparent that the study contained numerous inaccuracies. The authors had used an RNA-based assay, so several splice mutations had been mislabeled as TP53 gene deletions; furthermore, they had evidently misidentified several cell lines.
Ikediobi et al. (2006) from the Wellcome Trust Sanger Institute published a novel study of the entire NCI-60 panel. Their work included the genomic sequencing of 24 cancer genes, including TP53 [Ikediobi et al., 2006]. Using the UMD_TP53 database and unpublished data from various laboratories, we also released a list of the genotypes of the TP53 gene in this panel [Berglind et al., 2008]. Both studies showed that the TP53 status of 19 cell lines in the NCI-60 panel had previously been miscalled, due to either misidentification or the use of an RNA-based assay for screening [Berglind et al., 2008]. Despite this demonstration of obvious errors in the original work, the 1997 paper has been cited more than 640 times and is still regularly cited, so that erroneous data continue to be used to infer the genotype of these cell lines, including at the CellMiner portal from the NCI (http://discover.nci.nih.gov/cellminer/home.do).
A revised version of TP53 gene status in the entire NCI-60 panel is shown in Table 2. It has been updated (since our last release in 2008) using data from new publications and the latest version of the COSMIC database, V67 (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/). Although consensus has been established for the majority of the cell lines, a few controversies still remain.
The most complicated case is the HCT-15 colorectal carcinoma cell line. This cell line and DLD1 came from the same patient, and it has recently been suggested that two other cell lines, HCT-8 and HRT-18, share a similar genetic background. The exact history of the four cell lines is currently unclear. Ikediobi et al. (2006) reported two alterations: one splice mutation in the acceptor site of intron 10 (c.1101-2A>C) and a missense mutation (c.722C>T, p.S241F) in exon 7 of the second allele. In several publications, only the missense mutation has been detected, whereas only the splice mutation is described in the COSMIC database [Rodrigues et al., 1990; Cottu et al., 1996]. This may be due to a lack of screening of exons 9–11 of the TP53 gene in a few of the studies, but if we exclude this technical problem, another possible explanation for this discrepancy could be the loss of one of the TP53 alleles. Such an event, although never described, would result in the description of a single homozygous mutation. The missense mutation has also been described as a single event in the DLD1 cell line, but as above, most of the studies concerned did not extend their analysis to exon 10. For HCT-8 and HRT-18, observations are too scarce to draw any definitive conclusions.
A similar situation can be observed for the prostate cancer cell line DU-145 (ATCC HTB-81). Several independent publications described two TP53 mutations in different alleles (c.820G>T, p.V274F and c.668C>T, p.P223L), whereas others, including Ikediobi et al. (2006), described only one of the two mutations [Isaacs et al., 1991; Bajgelman and Strauss, 2006; Forbes et al., 2011]. The latest version of the COSMIC database, V67 (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/), holds only a single mutation (c.820G>T, p.V274F).
The TP53 status in the OVCAR-5 cell line is also controversial. The COSMIC database reports it as wild type, but several independent investigators have described the cell line as TP53 null with a 3-bp insertion localized close to the splice donor sequence of intron 6. Western blot analyses performed in these studies confirmed the absence of TP53 expression, whether the cell line had been induced or not [Debernardis et al., 1997; Mabuchi et al., 2007].
The status of the MOLT-4 cell line is one of the best examples of cell line misidentification. Indeed, at least five different TP53 statuses have been published, including an unidentified splice mutation [Chow et al., 1993], p.R248Q [Rodrigues et al., 1990], p.L111V [Murai et al., 2005], p.R306* [Ikediobi et al., 2006], or no mutations [Smardova et al., 2005; Tichy et al., 2008]. The misidentification is supported by several publications that showed either accumulation of a full-length mutated TP53 protein or the expression of a DNA damage-inducible wild-type TP53 in the cell line. Table 2 includes the TP53 status from the V67 COSMIC database (p.R306*), but we advise those who may use this cell line to check its status before undertaking any projects. The ATCC describes a MOLT-4 cell line associated with a p.R248Q TP53 mutation.
The status of the colorectal carcinoma cell line COLO-205 is also ambiguous, owing to a mix of cell line misidentification, nomenclature inaccuracy, imprecise mutation description, and inadequate definition of genotype and phenotype. COLO-205 was previously reported with a missense mutation (p.G266E), but Ikediobi et al. (2006) later described complex genetic events (c.308_333>TA, p.Y103fsX37) for it, which are also reported at the CellMiner portal from the NCI (http://discover.nci.nih.gov/cellminerdata/rawdata/mutation.txt). In the V67 COSMIC database (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/), the cell line is associated with the same event described by Ikediobi et al. (2006) but with a correct nomenclature (c.308_333>TA, p.Y103_R110delYQGSYGFR), and a second event, c.308A>T, p.Y103F. Whether these two events are on the same or different alleles is not known. Browsing the literature, we found multiple publications stating that COLO-205 has a wild-type TP53, but due to a lack of references we could not trace any genetic evidence for this status. Several studies support this status by showing an accumulation of the TP53 protein after DNA damage. This confusing situation may be due to several overlapping problems, but we cannot formally exclude the possibility that the mutations do not fully impair TP53 function in this cell line.
Cell lines from the NCI-60 panel are among the most commonly used in cancer biology and are therefore highly prone to misidentification and contamination. Purchasing these cell lines from an authorized dealer such as the ATCC and checking their genomic status regularly will be vital for preventing the dissemination of controversial materials.
Does TP53 Status in Cell Lines Reflect That of the Original Tumors? Individual and Global Analysis
There are several ways to consider TP53 status in cell lines. First, it can be considered individually, to confirm that a mutation found in a cell line was present in the original tumor. This question has been investigated in different types of cancer and across multiple studies, most finding perfect concordance between the original samples and the derivative cell lines. In a few instances, it was shown that the TP53 mutation was initially localized in a minor clone that expanded during the procedure to establish the cell lines, an observation suggesting that a lack of TP53 function gives a strong selective advantage for in vitro culture. In a thorough analysis, Drexler et al. (2000) analyzed the TP53 status of matched primary cells and cell lines of 62 hematopoietic tumors. Concordance was confirmed in 85% of the pairs and, in several additional cases, genetic analyses employing sensitive methodologies detected TP53 mutations in minor clones of the primary tumors. This observation mimics several clinical situations such as those observed in patients with either chronic lymphocytic or acute myelogenous leukemia. These cancers have TP53 alterations only infrequently at presentation but they display TP53 mutations during transformation of the disease into a more acute phase. Similar observations have been made for lung and brain tumors [Tada et al., 1996; Wistuba et al., 1999]. To our knowledge, de novo TP53 mutations arising during cell culture establishment have never been reported.
A more encompassing point of view can also be used to analyze TP53 alteration in cell lines. It is now widely accepted that a spectrum of mutations reflects specific mutagenesis processes induced by either external exposure to mutagens or internal mechanisms. The analysis of the spectrum of mutations in TP53 was vital in the studies that established this tenet, although they are now superseded by studies on the entire genome landscape obtained from cancer-genome sequencing. A comparison of the various mutational events in cell lines and tumors is shown in Figure 1. As previously observed, the pattern of mutations differs between various types of cancer, but there is a striking similarity when tumors and cell lines from the same origin are compared, whether the analysis is restricted to TP53 or extended to the whole genome [Bignell et al., 2010; Soussi, 2011; Lawrence et al., 2013]. In colorectal and brain cancer, there is a predominance of GC>AT transitions, whereas in lung cancer, GC>TA transversion is more frequent due to tobacco smoking. These observations also argue against accidental TP53 mutation during culturing, a process that would create a more random spectrum. The frequency of TP53 mutation in cell lines is apparently higher than in tumors, an observation that may reflect the selective advantage imparted by a lack of TP53 function for their establishment. However, this observation may also be explained by a possible underestimation of the frequency of TP53 mutations in human cancer, since most TP53 analyses were performed using Sanger sequencing. The latter provides a global TP53 status for tumors in which at least 20% of tumor cells carry the mutation. We now know that this corresponds to a snapshot taken at the time of diagnosis of a very heterogeneous tissue. Small clones carrying mutant TP53 would not be identified by this methodology but, as previously suggested, they may be at the origin of the cell lines, leading thus to an underestimation of the frequency of TP53 mutations.
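The spectrum classes used above (GC>AT, GC>TA, and so on) can be derived mechanically from cDNA variant names. The following Python sketch assigns a single-nucleotide substitution to one of the six strand-symmetric classes. It is a simplified illustration, not the procedure behind Figure 1: the function name is hypothetical, sequence context (for example CpG sites) is ignored, and insertions, deletions, and wild-type entries are skipped.

```python
import re
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
# Matches simple substitutions such as c.818G>A or c.673-2A>G
SUBSTITUTION = re.compile(r"c\.\d+(?:[+-]\d+)?([ACGT])>([ACGT])$")

def spectrum_class(cdna_variant):
    """Return the base-pair class (e.g. 'GC>AT') of a substitution, or None."""
    m = SUBSTITUTION.match(cdna_variant)
    if m is None:
        return None  # indel, wild type, or a name this sketch does not parse
    ref, alt = m.groups()
    # Read pyrimidine references on the opposite strand so that
    # C>T and G>A fall into the same class
    if ref in "CT":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    ref_pair = "GC" if ref == "G" else "AT"
    return f"{ref_pair}>{alt}{COMPLEMENT[alt]}"

# A few variants taken from Table 2
variants = ["c.818G>A", "c.580C>T", "c.524G>T", "c.536A>G", "c.414del"]
counts = Counter(c for c in map(spectrum_class, variants) if c is not None)
```

Here `c.818G>A` and `c.580C>T` both count as GC>AT transitions, `c.524G>T` as a GC>TA transversion, and `c.414del` is left out of the tally.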

Using Cell Lines for TP53 Studies: Recommendations
Recommendations for avoiding the cross-contamination of cell lines or the use of misidentified cell lines are beyond the scope of this review but available from numerous publications and Websites (see also Box 1).
Box 1: General recommendations
Acquisition and manipulation are two steps that must be controlled to avoid potential problems.
- Always purchase cell lines from an official distributor.
- Cell lines obtained second hand must be quarantined, checked for authentication and cross-contamination by short-tandem repeat (STR) profiling, and assessed for spurious mycoplasma infection before any experiments are carried out.
- Accurate lists of misidentified cell lines are now available via the Website of the International Cell Line Authentication Committee (ICLAC) (http://standards.atcc.org/kwspub/home/the_international_cell_line_authentication_committee-iclac_/).
- The information gathered from a simple PubMed search will often be sufficient to avoid numerous problems.
- Upon arrival of a cell line, establishing a frozen seed stock will ensure that the original cell line remains available to researchers, improving reproducibility and preventing overpassaging.
- Only one cell line should be passaged at a time. Following this simple recommendation will prevent most cross-contamination problems. It should be noted that STR profiling will not identify cross-contamination between stable transfectants issued from a single cell line. Collections of stable transfectants are commonly used in TP53 research, but since they express different TP53 mutants, they can only be differentiated via the sequencing of the exogenous transgene. Handling and passaging several transfectants simultaneously should be avoided at all costs.
- Recording passage number is mandatory and starting from a fresh stock on a regular basis is recommended.
- TP53 responds to many endogenous and exogenous stress events, including cell handling activities such as transfection, confluence or medium changes. Cell passaging should be performed on a regular basis using constant procedures. Cell lines that harbor deficiencies in DNA repair genes, for example, HCT-116, should be regularly checked to avoid the selection of novel genetic variants.
Useful Websites:
http://en.wikipedia.org/wiki/List_of_contaminated_cell_lines
https://www.dsmz.de/fileadmin/Bereiche/HumanandAnimalCellLines/Cross_Contaminations_v7_1.pdf
Here, we will focus on the use of cell lines in the light of TP53 status, as the latter has profound consequences on cell phenotype due to the pleiotropic functions of the protein in multiple pathways. Many of these recommendations are nothing more than simple common sense but are nonetheless sufficient to prevent most problems. Of note, several journal editors now require an assessment of proper handling of cell lines upon submission.
For all cell lines and before undertaking any scientific project, we recommend performing an in silico check of the genetic background of the cell line using the Cell Lines Project hosted at the COSMIC Website (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/), the TP53 Mutation cell lines compendium available on the TP53 Website (http://p53.fr), or by browsing the literature, particularly the databases of contaminated and misidentified cell lines available on the Web. The ICLAC maintains a database of cross-contaminated and misidentified cell lines (http://standards.atcc.org/kwspub/home/the_international_cell_line_authentication_committee-iclac_/).
TP53 Null Tumor Cell Lines
The “p53-null” statement is used in multiple ways in the literature. The most common meaning is a cell line with an absence of TP53 expression assessed either via RNA or protein analysis. Unfortunately, many events can result in an absence of TP53 expression, including small insertions and deletions or nonsense and splice mutations because NMD eliminates the aberrant transcripts. HeLa cells that express wild-type TP53 and have no mutations may nonetheless end up being identified as null when E6 protein expressed from an endogenous papillomavirus degrades TP53, thus leaving no protein to be detected by Western blot.
To avoid problems due to leftover TP53 sequences, only cell lines with a deletion of the endogenous TP53 gene should be described as “TP53 null.” Several popular cell lines, such as Saos-2, HL-60, or H1299, are widely known for having a biallelic deletion of the TP53 gene.
Tumor Cell Lines with Frameshift or Splice Mutations
These cell lines still contain intact regulatory elements, such as a TP53 response element in intron 4, that can be acted upon by exogenous TP53. The discovery that the TP53 gene expresses multiple isoforms via alternative splicing or the use of different start codons suggests that some mutations may target only a few isoforms, leaving the expression of the remaining ones intact [Bourdon et al., 2005; Soussi et al., 2014].
In most cases, RNA expression in these lines was assessed by Northern blot, which is not sufficiently specific to identify a residual expression of truncated transcripts. Protein expression was usually assessed with monoclonal antibodies, which recognize only the full-length TP53 and therefore miss shorter isoforms. Several polyclonal antibodies are also biased toward specific isoforms.
To avoid problems associated with spurious expression of shorter RNA and/or proteins, we recommend analyzing these cell lines very carefully. Methodologies for detecting and quantifying isoforms at mRNA and protein levels are currently available [Khoury et al., 2013; Marcel et al., 2013].
Cell Lines Expressing a Single-Mutant TP53
The majority of these cell lines do not contain a wild-type TP53 allele and thus the detection of one is highly suggestive of cross-contamination. This should be carefully checked before undertaking any long-term project.
Loss of the wild-type allele can occur either via a true loss of heterozygosity (LOH), with a partial or total deletion of the short arm of chromosome 17 (17p), or via copy-neutral LOH (cnLOH), where homologous recombination replaces the chromosomal segment carrying the wild-type allele with the mutant sequence. cnLOH has been highly underestimated in human tumors as it is not detected efficiently via restriction fragment length polymorphism (RFLP) or comparative genomic hybridization (CGH); its detection requires the use of high-resolution SNP mapping or fluorescence in situ hybridization (FISH). Several cell lines have been shown to harbor two TP53 alleles bearing the same mutation, a situation highly suggestive of cnLOH [Saeki et al., 2011]. Having one or two copies of a TP53 mutant allele should not influence the outcome of experimental procedures, except for those that involve specific gene manipulation, where two copies of the gene will have to be managed.
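The distinction between deletion LOH and cnLOH can be sketched as a simple decision rule on SNP-array results. The example below is only an illustration: the Segment17p record, its field names, and the 5% heterozygosity threshold are assumptions made for this sketch and do not come from any specific array pipeline.

```python
from dataclasses import dataclass


@dataclass
class Segment17p:
    """Hypothetical summary of SNP-array calls over the TP53 region."""
    total_copy_number: int   # integer copy number estimated for the segment
    genotyped_snps: int      # number of SNPs genotyped in the segment
    heterozygous_snps: int   # number of those SNPs called heterozygous


def classify_loh(segment, max_het_fraction=0.05):
    """Classify the segment; max_het_fraction is an assumed noise tolerance."""
    if segment.total_copy_number == 0:
        return "homozygous deletion"
    if segment.genotyped_snps == 0:
        raise ValueError("no SNP genotyped in the segment")
    het_fraction = segment.heterozygous_snps / segment.genotyped_snps
    if het_fraction > max_het_fraction:
        return "heterozygosity retained"
    if segment.total_copy_number == 1:
        return "LOH by deletion"
    if segment.total_copy_number == 2:
        return "copy-neutral LOH"
    return "LOH with copy number gain"


# Two copies but almost no heterozygous SNPs: the pattern expected for cnLOH.
print(classify_loh(Segment17p(total_copy_number=2, genotyped_snps=400,
                              heterozygous_snps=3)))
```

Copy number alone would report the second case as normal, which is why methods that do not read allelic information tend to miss cnLOH.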
TP53 mutations are highly heterogeneous. Hot-spot TP53 mutants are indeed fully impaired functionally, but other mutants may be only partially defective resulting in a mixed phenotype.
To avoid the risk of misinterpretation, it is vital to assess remaining TP53 activity in relation to the project to be carried out.
Cell Lines Expressing Multiple TP53 Mutants
About 10% of cell lines express two (or even more) different TP53 mutants. There are several cases where cross-contamination cannot be excluded, and we therefore recommend STR profiling for cell lines that are not included in the various databases or appear to have controversial genotypes in the literature.
For many cell lines, it is currently unknown whether the different mutations are on a single allele or distributed on two (or more) copies of the TP53 gene. Assessing this information may be useful depending on the project, but would require a cloning step to independently sequence the individual alleles.
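As an illustration of how an STR comparison is usually scored, the sketch below computes a percent match between a laboratory stock and a reference profile with a Tanabe-style formula (twice the number of shared alleles divided by the total number of alleles in both profiles). The allele calls are invented for the example and are not real cell line profiles, and the 80% cut-off is an assumption that should be checked against the standard used by the relevant repository.

```python
def tanabe_score(query, reference):
    """Percent match between two STR profiles (Tanabe-style formula).

    Each profile maps a locus name to the alleles called at that locus.
    Only loci typed in both profiles are compared.
    """
    shared = total_query = total_reference = 0
    for locus in query.keys() & reference.keys():
        q_alleles = set(query[locus])
        r_alleles = set(reference[locus])
        shared += len(q_alleles & r_alleles)
        total_query += len(q_alleles)
        total_reference += len(r_alleles)
    if total_query + total_reference == 0:
        raise ValueError("the two profiles have no locus in common")
    return 100.0 * 2 * shared / (total_query + total_reference)


# Invented allele calls, for illustration only.
lab_stock = {"TH01": ("6", "9.3"), "D5S818": ("11", "12"),
             "TPOX": ("8",), "vWA": ("16", "18")}
reference_profile = dict(lab_stock)
unrelated_profile = {"TH01": ("7",), "D5S818": ("11", "13"),
                     "TPOX": ("8", "11"), "vWA": ("14", "17")}

MATCH_THRESHOLD = 80.0  # assumed cut-off for calling two profiles related

for name, profile in [("reference", reference_profile),
                      ("unrelated", unrelated_profile)]:
    score = tanabe_score(lab_stock, profile)
    verdict = "related" if score >= MATCH_THRESHOLD else "not related"
    print(f"{name}: {score:.1f}% ({verdict})")
```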
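Once individual clones have been sequenced, deciding whether two mutations sit on the same allele is a matter of counting which clones carry both. The sketch below is a minimal illustration under simplifying assumptions: each clone is represented as a set of variant names, the function name and return labels are hypothetical, and the two variant strings are placeholders rather than the genotype of any particular cell line.

```python
def phase_two_mutations(clones, mut_a, mut_b):
    """Infer whether two mutations are in cis or in trans from cloned sequences.

    clones: iterable of sets, one per sequenced clone, each holding the
            variants observed in that clone.
    """
    both = only_a = only_b = 0
    for variants in clones:
        has_a, has_b = mut_a in variants, mut_b in variants
        if has_a and has_b:
            both += 1
        elif has_a:
            only_a += 1
        elif has_b:
            only_b += 1
    if both and not only_a and not only_b:
        return "cis"         # every mutant clone carries both mutations
    if only_a and only_b and not both:
        return "trans"       # the mutations never occur in the same clone
    return "unresolved"      # mixed pattern or no informative clone


# Placeholder variant names, used only to exercise the function.
A, B = "c.524G>A", "c.818G>A"
print(phase_two_mutations([{A}, {B}, {A}, {B}], A, B))
print(phase_two_mutations([{A, B}, set(), {A, B}], A, B))
```

A mixed pattern is reported as unresolved rather than forced into either class, since it may reflect more than two gene copies, PCR-mediated recombination, or a contaminated culture.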
Tumor Cell Lines Expressing Wild-Type TP53
Mutations are not the only way to inactivate TP53 function. Amplification of the MDM2 and MDMX genes, which encode negative regulators of TP53, is frequent in several types of cancer such as sarcoma, melanoma, or breast cancer [Wade et al., 2013]. The two widely used cell lines SJSA-1 (CRL-2098, osteosarcoma) and MCF-7 (HTB-22, breast carcinoma) express wild-type TP53 but also display MDM2 and MDMX amplification, respectively. In both lines, TP53 activity is impaired but can be restored by silencing the amplified gene. Whether all functions of TP53 are disabled in these cell lines is not known.
Other conditions such as viral infection (HeLa cell line) or genetic alterations to the various members of the TP53 network can also impair TP53 function. Using “wild type” to define both the status of the TP53 gene at the genomic level and the phenotype of the cell line is highly confusing. It does not reflect biological reality and should thus be avoided.
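One way to keep the two notions apart when recording cell line data is to store the gene status and any non-mutational inactivating mechanism in separate fields. The sketch below is a hypothetical record layout, not the schema of the UMD database or of the compendium; the three entries only restate what is described in the text above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class GeneStatus(Enum):
    WILD_TYPE = "wild-type sequence"
    MUTANT = "mutant"
    DELETED = "biallelic deletion"


@dataclass(frozen=True)
class TP53Status:
    """Hypothetical record keeping genotype and pathway status separate."""
    cell_line: str
    gene_status: GeneStatus
    # Non-mutational mechanism known to impair TP53, if any.
    inactivating_mechanism: Optional[str] = None

    def pathway_presumed_intact(self) -> bool:
        # A wild-type sequence is necessary but not sufficient.
        return (self.gene_status is GeneStatus.WILD_TYPE
                and self.inactivating_mechanism is None)

    def describe(self) -> str:
        text = f"{self.cell_line}: TP53 gene {self.gene_status.value}"
        if self.inactivating_mechanism:
            text += f"; function impaired by {self.inactivating_mechanism}"
        return text


records = [
    TP53Status("HeLa", GeneStatus.WILD_TYPE, "papillomavirus E6 expression"),
    TP53Status("SJSA-1", GeneStatus.WILD_TYPE, "MDM2 amplification"),
    TP53Status("Saos-2", GeneStatus.DELETED),
]
for record in records:
    print(record.describe(), "| pathway presumed intact:",
          record.pathway_presumed_intact())
```

Even when pathway_presumed_intact() returns True, the result is only a presumption from recorded data and does not replace a functional check in laboratory conditions.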
We also recommend checking TP53 functions in laboratory conditions. For several cell lines, assuming a given activity for TP53 based on the literature should be avoided. The numerous cases of cell line misidentification have introduced erroneous data, which, if used, would result in a study based on false starting information. As discussed above, the MOLT-4 cell line, with a highly controversial status, has been shown to express a full-length wild-type TP53 in several studies.
Due to the tremendous number of TP53 functions, we will not make specific recommendations. We can state however that a good place to start would be to assess whether endogenous TP53 activates after various stress events and induces the transcription of specific TP53 targets.
Special Notes for Cell Lines CAPAN-2, NCI-H2347, and NCI-H82
The cell lines CAPAN-2 (c.375G>T), NCI-H2347 (c.375G>A), and NCI-H82 (c.375G>T) carry a mutation in the third base of codon 125 that does not lead to a change in the amino acid (threonine). These variations are considered nonpathogenic in most databases. Codon 125 is localized at the end of exon 4. It has been experimentally demonstrated that substitutions at this position impair TP53 splicing dramatically and abolish TP53 expression in CAPAN-2 cells as well as in primary tumors [Suwa et al., 1994]. The mutant status should be assigned to cell lines harboring this mutation.
The status of CAPAN-2 cells is highly controversial in the literature with either a wild-type status or a missense mutation at codon 273. This situation may be attributable to cell line misidentification or possibly to partial screenings that did not include exon 4. The final status of CAPAN-2 is based on the status initially defined by Suwa et al. (1994) and confirmed by the exome sequencing project performed at the Wellcome Trust Sanger Institute.
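The arithmetic behind this classification can be checked in a few lines. The sketch below maps coding position 375 to its codon and confirms that both reported substitutions are synonymous. The reference codon ACG is inferred here from the description above (a threonine codon whose third base is G) and should be treated as an assumption rather than as a value read from the reference sequence; the function names are illustrative.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_position(cdna_position):
    """Map a 1-based coding position to (codon number, position in codon)."""
    return (cdna_position - 1) // 3 + 1, (cdna_position - 1) % 3 + 1


def substitute(codon, position_in_codon, alt_base):
    """Return the codon with one base replaced (position is 1-based)."""
    i = position_in_codon - 1
    return codon[:i] + alt_base + codon[i + 1:]


def is_synonymous(codon, position_in_codon, alt_base):
    mutated = substitute(codon, position_in_codon, alt_base)
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


REFERENCE_CODON_125 = "ACG"  # assumption inferred from the text (Thr, third base G)

codon_number, offset = codon_position(375)
print(codon_number, offset)
for alt in ("T", "A"):  # c.375G>T and c.375G>A
    print(f"c.375G>{alt}:", substitute(REFERENCE_CODON_125, offset, alt),
          "synonymous" if is_synonymous(REFERENCE_CODON_125, offset, alt)
          else "non-synonymous")
```

A check of this kind looks only at the protein sequence, which is exactly why these variants are labeled nonpathogenic in most databases: it cannot see that position 375 lies at the end of exon 4, where a substitution can disrupt splicing.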
The p53 Mutations in Cell Lines Compendium
The p53 Mutations in Cell Lines Compendium is a novel and original application that provides an ID card for each cell line included in the UMD TP53 database. The current version includes 2,000 and 500 cell lines with mutant TP53 and wild-type TP53, respectively. The ID cards include cell line data such as cancer type and ATCC number when available, as well as some specific information on potential misidentifications, changes of identity, or shared origins with other cell lines (Fig. 2). A first set of information focuses on the TP53 mutation and provides data on the various properties of the protein mutant, for example, its frequency in the database, the consequences of the mutation on TP53 activities as predicted by the most common algorithms, or its residual activities assessed using a yeast functional assay. A second set of information depicts the consequences of this mutation on the various transcripts and isoforms expressed by the TP53 gene using the official HGVS description (http://www.hgvs.org/) and the TP53 coordinates and nomenclature recommended by the Locus Reference Genomic (http://www.lrg-sequence.org/).

The p53 Mutations in Cell Lines Compendium can easily be browsed using multiple criteria, with cell line name, cancer type, or mutation type serving as entry point data (Fig. 2; Supp. Fig. S1). The p53 Mutations in Cell Lines Compendium can be downloaded for both Mac and Windows from the TP53 Website (http://p53.fr). Full documentation is provided with the software.
Conclusion
In 2008, we published a first report describing the status of TP53 mutation in cell lines [Berglind et al., 2008]. We were highly concerned by the finding that many cell lines had controversial TP53 status, including popular cell lines such as those from the NCI-60 panels.
In the present analysis, using only TP53 gene mutations as an identifier to distinguish the various cell lines, we have established that a high number of cell lines have been misidentified. The rationale for using TP53 mutations in this analysis was twofold: first, TP53 status is one of the most important parameters determining the phenotype of a cell line in multiple experiments, including drug assays, and is a piece of information that is essential before making any particular interpretation; second, as TP53 mutations are very frequent in cell lines and have been published for the majority of them, they are one of the most convenient parameters for performing this type of analysis. This choice is also supported by the observation that the high diversity of TP53 mutations makes most cell lines quite unique, something that is not achievable with other genes such as HRAS.
In a few cases, the controversial status detected in several cell lines could be due to differences in sequencing strategy, such as RNA- versus DNA-based assays, or in gene coverage (exons 5–8 vs. full gene sequencing). Nevertheless, we believe that the majority of the controversies result from cell line misidentification caused by faulty handling. The present analysis, which uses a single gene as an identifier to compare the various cell lines, has some limitations and has likely uncovered only the tip of the iceberg. In particular, this type of analysis cannot address the problem of the origin of a cell line, as exemplified by the status of the M14 cell line [Chambers, 2009; Christgen and Lehmann, 2007; Hollestelle and Schutte, 2009; Rae et al., 2007]. The TP53 status of this cell line was consistent across studies, but its origin (breast vs. melanoma) can only be addressed by other types of analysis such as expression profiling.
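The principle of using TP53 status as an identifier can be illustrated with a short sketch that groups published reports by cell line and flags lines for which more than one status has been reported. This is not the procedure used to build the UMD database; the function names and the name-normalization rule are assumptions, and the example reports simply restate the CAPAN-2 and Saos-2 statuses mentioned earlier in this review.

```python
import re
from collections import defaultdict


def normalize_name(cell_line):
    """Collapse spelling variants such as 'Capan-2' and 'CAPAN2' (assumed rule)."""
    return re.sub(r"[^A-Z0-9]", "", cell_line.upper())


def find_discordant(reports):
    """Return cell lines reported with more than one distinct TP53 status.

    reports: iterable of (cell line name, reported TP53 status) pairs.
    """
    statuses = defaultdict(set)
    for cell_line, status in reports:
        statuses[normalize_name(cell_line)].add(status)
    return {name: sorted(found) for name, found in statuses.items()
            if len(found) > 1}


reports = [
    ("CAPAN-2", "wild type"),
    ("Capan2", "missense, codon 273"),
    ("CAPAN-2", "c.375G>T"),
    ("Saos-2", "deleted"),
    ("SAOS2", "deleted"),
]
print(find_discordant(reports))
```

Aggressive name normalization can merge genuinely different lines, so any flagged entry is a prompt for manual review rather than proof of misidentification.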
In 2014, the situation has not really progressed despite efforts from the scientific community to raise awareness and encourage the provision of accurate data. A few journals have begun to tackle this problem and require cell line authentication before publication.
The literature remains plagued with reports of studies that used cell lines with controversial status. Furthermore, several recent databases, including the CellMiner database developed at the NCI for drug discovery using the NCI-60 panel, display lists of TP53 mutations with numerous mistakes (http://discover.nci.nih.gov/cellminer/home.do).
The latest release of the COSMIC Cell Lines Project database (V67, January 2014) includes full exome sequencing of 1,015 cell lines and is freely available via a user-friendly interface (http://cancer.sanger.ac.uk/cancergenome/projects/cell_lines/). The third version of the p53 Mutations in Cell Lines Compendium is now available on the TP53 Website (http://p53.fr). The wealth of curated information it offers should prevent most mistakes, but only if researchers stop burying their heads in the sand and agree to face the problem of cell line misidentification.