Can replication save noisy microarray data?
Abstract
Microarray experiments are multi-step processes. At each step — the growth of cultures, extraction of mRNA, reverse transcription, labelling, hybridization, scanning, and image analysis — variation and error cannot be completely avoided. Estimating the amount of such noise and variation is essential, not only to test for differential expression but also to suggest at which level replication is most effective.
Replication and averaging are the key to the estimation as well as the reduction of variability. Here I discuss the use of ANOVA mixed models and of analysis of variance components as a rigorous way to calculate the number of replicates necessary to detect a given target fold-change in expression levels. Procedures are available in the package YASMA (http://www.cryst.bbk.ac.uk/wernisch/yasma.html) for the statistical data analysis system R (http://www.R-project.org). Copyright © 2002 John Wiley & Sons, Ltd.
The power of averaging
In the following, I assume that the results of a replicated microarray experiment are log2 ratios of fluorescent intensities in the two channels of a two-dye experiment comparing an experimental condition with a control condition. Any necessary background correction and normalization has been applied and each gene is associated with a series of log2 ratio values from replicated experiments. The statistical assumption is that log2 ratio values are normally distributed around their true mean. Genes with no differential expression have a true mean of 0. Any deviation of a mean log2 ratio from this base line indicates over- or underexpression.
Averaging is the key to the estimation of the true mean. Figure 1 demonstrates the fact — well-known from elementary statistics — that the average of a growing number of replicates approaches the true mean with decreasing variability. It shows 50 simulations from a normally distributed random variable with mean µ = 0.5 and variance σ2 = 1. The sample means of increasingly larger samples are indicated by open circles connected by a line. Notice that even though the variability is large (if these were log2 ratios of a real gene its fold-change would range from eight-fold under- to eight-fold overexpression), the true mean can be reliably estimated, provided enough replicates are considered. The reason for this improved accuracy with replication is that the variance of the sample mean is inversely proportional to the number n of replicates,
.

Sample means of increasingly larger samples of a normally distributed random variable with mean 0.5 and variance 1
How many arrays do I need?
The exact number of replicates necessary to detect overexpression of a gene reliably depends on several parameters. First, the underlying variance σ2 must be obtained (in the next section I discuss variance estimates from microarray data). Next, a significance level α needs to be specified for the probability of a type I error, i.e. the probability that a neutral gene with zero mean is falsely called as overexpressed. A gene with sample mean m will be called, if 1 − Φ (m/σm) ≤ α (where Φ is the standard normal cumulative distribution function). Finally, a minimum log2 ratio value t (target change) for overexpression, as well as a level β for the type II error of not calling an overexpressed gene, must be provided.



In the above example, σ = 1. If the target t is a log2 ratio value of 0.5 (1.41 fold-change), which should be detectable at a significance level of 0.05, then n ≥ 10.82, i.e. 11 replicates are needed if we accept that half of the overexpressed genes might go unnoticed (β = 0.5, although this is a very conservative estimate). If we want to detect at least 75% of overexpressed genes (β = 0.25), the number of necessary replicates rises to 22.
Hierarchical replication
Microarray experiments are done in stages, and at each stage replication can be used for averaging and reduction of variation. Examples are the growing of several cultures of the same mutant, multiple mRNA extractions, multiple spots on one array, or even multiple image analyses. Considering that replications on each level come at different costs, estimating their relative contributions to overall variation can help designing cost-efficient experiments.
To be more specific, let us assume that replication has been conducted in the following way. nC cultures are grown separately, each one hybridized on nA arrays, and each array scanned and analysed as image nI times. This results in a total of nCnAnI array data sets replicated in a hierarchical fashion (hybridizations nested in cultures, scannings nested in hybridizations). One difficulty with the analysis of such experiments is that replicates are no longer independent of each other, e.g. replicates from the same culture will show some correlation in their noise, due to the common underlying variation in the culture.
Assuming linearity of effects and a normal distribution of noise, an analysis of variance components allows us to calculate the variance contributions at different levels in such hierarchical designs (see Oehlert 1) for an accessible introduction to this topic). Analysis of variance components is similar to a standard ANOVA analysis but recognizes that some effects are random. For example, if the same culture was used in all future microarray experiments, the culture effect would be a fixed factor in an ANOVA analysis. The likelier scenario is that new cultures are grown every time, and the culture effect enters the analysis as a random factor. Usually, we are interested in noise inherent in the process of growing new cultures or hybridizing to new arrays. Consequently, such factors are best analysed as random factors.






If such variance components have been derived from preliminary studies, then this equation combined with equation 1 allows the calculation of the number of cultures, of arrays per culture, and of image analyses per array, necessary to achieve a desired resolution in differential gene expression. Simply increasing the number nC of cultures is the best way to decrease variability. If different costs are involved in producing cultures and arrays or image analyses, then some balance between numbers of cultures, arrays and image analyses might be better. The best combination of such numbers can be obtained by optimizing under the additional constraints on costs.
Acknowledgements
I acknowledge BµG@S (the Bacterial Microarray Group at St George's) and The Wellcome Trust for funding their work.