ClinGen advancing genomic data-sharing standards as a GA4GH driver project
For the ClinGen/ClinVar Special Issue
Abstract
The Clinical Genome Resource (ClinGen)’s effort to build a knowledge base that supports the understanding of genes and variants for use in precision medicine and research depends on robust, broadly applicable, and adaptable technical standards for sharing data and information. To advance this goal, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH) to support the development of open, freely available technical standards and regulatory frameworks for the secure and responsible sharing of genomic and health-related data. As one of the 15 inaugural GA4GH “Driver Projects,” ClinGen is providing input on the key standards needs of the global genomics community. It has committed to participating in GA4GH Work Streams to support the development of: (1) a standard model for computer-readable variant representation; (2) a data model for linking variant data to annotations; (3) a specification to enable sharing of genomic variant knowledge and associated clinical interpretations; and (4) a set of best practices for the use of phenotype and disease ontologies. ClinGen's participation as a GA4GH Driver Project will provide a robust environment to test-drive emerging genomic knowledge-sharing standards and prove their utility to the community, while accelerating the construction of the ClinGen evidence base.
1 INTRODUCTION
The Clinical Genome Resource (ClinGen) is building a central knowledge base for understanding the clinical relevance of genes and variants for use in precision medicine and research (Rehm et al., 2015). This includes curating genes for gene–disease validity, dosage sensitivity, and actionability, as well as curating variants for pathogenicity. Curated variants are shared in the National Center for Biotechnology Information's ClinVar data archive, and curated genes are available on the ClinGen website. These resources depend on robust technical standards for sharing data and information. Such standards must be broadly applicable across use cases and adaptable to diverse countries and systems, in both clinical and nonclinical settings. ClinGen is working to (1) standardize the clinical annotation and interpretation of genomic variants, (2) enable clinicians, researchers, and patients to share evidence, including genomic and phenotypic data, and (3) provide unrestricted access to its knowledge base, both for direct use and for integration into electronic health records and other resources. As part of this effort, ClinGen has joined with the Global Alliance for Genomics and Health (GA4GH; www.ga4gh.org). GA4GH is an international, nonprofit alliance that is catalyzing the creation of technical standards and regulatory frameworks to enable responsible, voluntary, and secure sharing of genomic and health-related data across institutional and national boundaries.
Formed in 2013 to help realize the potential of research and medicine to advance human health (Page et al., 2016), GA4GH has a membership of more than 500 leading organizations, as well as individual contributors. These members work in healthcare, research, patient advocacy, life science, and information technology across more than 70 countries. In October 2017, GA4GH launched a new phase (“GA4GH Connect”; https://www.ga4gh.org/docs/GA4GH-Connect-A-5-year-Strategic-Plan.pdf) that draws on the expertise of real-world clinical and research projects to establish the community's priorities and needs. These “Driver Projects” provide real-world contexts for international genomic data sharing by: (1) establishing priorities for tool development, (2) contributing to the creation of technical standards, policies, and other deliverables, and (3) implementing GA4GH standards in real-world use to provide feedback and demonstrate the value of genomic data sharing to the broader community.
ClinGen and GA4GH have previously worked together to develop guidelines for sharing pediatric genomic data (Friedman et al., 2018; Rahimzadeh et al., 2018) and variant-level information with ClinVar (Azzariti et al., 2018). They have also collaborated to develop consent resources for clinical genomic data sharing (Riggs et al., 2018). ClinGen is also a key participating group in the BRCA Challenge, one of four early demonstration projects that helped launch GA4GH and demonstrate its value. The BRCA Challenge launched the BRCA Exchange (https://brcaexchange.org/), which brings together all publicly accessible variant resources on BRCA1 and BRCA2, including ClinVar as the primary source of interpreted BRCA1 and BRCA2 variants. Building on this history of collaboration, ClinGen was invited to serve as one of the 15 inaugural GA4GH Driver Projects, alongside other leading research and clinical initiatives in North America, Europe, and Australia. In this role, ClinGen is contributing to the development of standards for discovering, accessing, storing, and analyzing genomic and health-related data. These standards will be used by projects across the globe, including national precision medicine initiatives such as the US-based All of Us Research Program (Collins and Varmus, 2015) and Genomics England (Marx, 2015), both of which also participate as GA4GH Driver Projects. ClinGen will also play a leadership role in representing genomic knowledge for the accurate interpretation of genomic data. Within GA4GH Work Streams, ClinGen has committed to the following four goals:
- Create a standard model for computer-readable variant representation. Genomic variants are described using many naming conventions. This makes it difficult to define a variant unambiguously and to ensure that associated knowledge is applied accurately. ClinGen is leveraging prior work by its Data Modeling Work Group, including its experience developing the ClinGen Allele Registry [reg.clinicalgenome.org]. It is also contributing examples and providing domain expertise to inform GA4GH efforts to develop a data model for the unambiguous representation of variants. This work began within GA4GH as the Variant Modeling Consortium (VMC; https://github.com/ga4gh/vmc) and now continues in the variant representation subgroup of the Genomic Knowledge Standards (GKS) Work Stream. GKS includes representatives from other organizations, such as HL7, developer of Fast Healthcare Interoperability Resources (FHIR; https://hl7.org/fhir/), and the Human Genome Variation Society (HGVS). Their involvement helps ensure that GKS standards meet the needs of the clinical and genomics communities and remain compatible with the widely used HGVS nomenclature for describing sequence variants. Notably, VMC and the ClinGen Allele Registry, as transmission formats, can readily be adapted to each other. The ClinGen Allele Registry is working with GA4GH to define a pilot project implementing the GKS Variant Representation specification. Version 0.1 of VMC has been released and already proposes a language and nomenclature for describing variation.
- Develop a data model for linking variant data to annotations. Standardized variant annotation and interpretation are central to ClinGen's mission and are areas in which the consortium has considerable expertise. ClinGen and the Monarch Initiative (Mungall et al., 2016) are working with the GA4GH GKS Work Stream to develop a common data model, with a standard format, for linking variant evidence to clinical interpretations. This includes support for applying current professional interpretation standards (e.g., ACMG/AMP [Richards et al., 2015]) in a computable manner that can be validated. It also includes support for documenting the associated disease and mode of inheritance.
- Develop a network for sharing knowledge about genomic variants and associated clinical interpretations. Sharing curated genomic knowledge with databases such as ClinVar is a high priority for the genomics community. Building on work in the GKS and Clinical & Phenotypic Data Capture (CPDC) Work Streams, the Discovery Work Stream will develop standards for sharing variant classifications and supporting evidence. The effort will standardize technical descriptions of a variant and its attributes (e.g., clinical significance) to streamline the electronic submission of clinically relevant information to genomic knowledge bases such as ClinVar. The resulting data models will draw on the GKS and CPDC standards for variant annotation and phenotyping. They will be implemented within ClinGen's knowledge bases and disseminated to the broader community for widespread use. Facilitating knowledge exchange between disparate sources will enable integrative, comprehensive applications that help inform clinicians about the consequences and impacts of genomic variants.
- Establish recommended phenotype and disease ontologies and best practices for their use in genomic medicine. Interpreting genes and variants, and their possible role in a patient's disease, requires associating them with diseases and phenotypic features. The GA4GH CPDC Work Stream is developing standards, best practices, and benchmarks for using ontologies and clinical terminologies to capture clinical phenotype information and gene–disease associations in genomic medicine. This work also aims to ensure that data captured clinically can be used in genomic research. CPDC will also develop standards and best practices for exchanging clinical phenotype information between clinical information systems and with research, using the emerging HL7 FHIR and Phenopackets (https://phenopackets.org/) standards. ClinGen is implementing these standardized disease and phenotype ontologies in its gene–disease curation efforts. It is also incorporating and testing the resulting phenotyping standards in its data capture approaches, including its GenomeConnect patient registry (Kirkpatrick et al., 2015).
In summary, ClinGen is a critical Driver Project for GA4GH. It provides a robust environment to test-drive genomic knowledge-sharing standards and to prove their utility for sharing evidence and knowledge within the community and for applying that knowledge to clinical care and research. In turn, by working with GA4GH to disseminate and implement these collaboratively built standards through global engagement, ClinGen can build its evidence base more quickly and consistently.
ACKNOWLEDGMENTS
Core funders of the Global Alliance for Genomics and Health include the Broad Institute of MIT and Harvard; CanSHARE (Génome Québec, Genome Canada, the Government of Canada, Ministère de l'Économie, Innovation et Exportation du Québec, and the Canadian Institutes of Health Research [fund #141210]); Genome Canada; Ontario Institute for Cancer Research (funded by the Ontario Ministry of Economic Development, Job Creation, and Trade); the U.S. National Institutes of Health (Big Data to Knowledge (BD2K), National Cancer Institute, National Heart, Lung, and Blood Institute, and National Human Genome Research Institute); Wellcome (WT201535/Z/16/Z); and Wellcome Sanger Institute. Additional funding is obtained through annual sponsorships from GA4GH Organizational Members.
Additional contributions to this work included support by the following grants and organizations: NIH/National Human Genome Research Institute: U41HG006834 (ClinGen-Rehm), U41HG009649 (ClinGen-Bustamante), U41HG009650 (ClinGen-Berg), HG008900 (Broad Center for Mendelian Genomics); NIH Office of the Director: 5R24OD011883 (Monarch Initiative); European Molecular Biology Laboratory.
The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding organizations.