Checklist for gene/disease-specific variation database curators to enable ethical data management
Abstract
Databases with variant and phenotype information are essential for advancing research and improving the health and welfare of individuals. These resources require data to be collected, curated, and shared among relevant specialties to maximize impact. The increasing generation of data which must be shared both nationally and globally for maximal effect presents important ethical and privacy concerns. Database curators need to ensure that their work conforms to acceptable ethical standards. A Working Group of the Human Variome Project had the task of updating and streamlining ethical guidelines for locus-specific/gene variant database curators. In this article, we present practical and achievable steps which should assist database curators in carrying out their responsibilities within acceptable ethical norms.
1 INTRODUCTION
We are now in an era where the genetic basis of clinical conditions is sought for targeted clinical care and research, especially where the gene(s) involved have been identified. Analysis of several genes at once or the entire human genome has generated huge amounts of genetic data which need to be carefully assessed to determine their involvement in a patient's phenotype. There are now several publicly available, web-based, locus-specific variation databases (LSDBs), or gene variant databases (GVDBs; e.g. https://databases.lovd.nl/shared/genes). These databases can only be helpful as guides for both clinical and research purposes if they hold useful information. In addition, sharing data assists with patient care, as well as advancing research. However, collecting and sharing data require database curators to operate within national and international standards to ensure the confidentiality of patients and family members (Takashima et al., 2018) is not breached.
Guidelines were published in 2010 (Povey et al., 2010) to help curators of web-based LSDBs or GVDBs to make information within their databases accessible where these can be used for clinical and research purposes, while safeguarding the confidentiality and privacy of individuals. With increasing interest in the curation of variant and phenotype information, several curators raised concerns at the 2014 Biennial Meeting of the Human Variome Project (HVP, http://www.humanvariomeproject.org/), about the application of existing guidelines. Curators indicated that they found some of the existing guidelines difficult to follow in practice, for example, “Limit links to other LSDBs” when some diseases have two genes involved and linking variants found in both genes enables the interpretation of results. In response to this, an HVP working group was established to develop “A checklist of actions and processes related to the ethical management of data in a genetic variation database that curators of gene/disease-specific databases should consider when establishing and curating their database.” Although this checklist was primarily intended for curators of web-based LSDBs or GVDBs, extended consultation showed that its content is relevant to others in related areas, including national/population variation databases and biobanks. Therefore, we encourage the adoption of this Checklist by other database curators with publicly accessible data, where it is judged to be applicable. As new developments emerge, it is envisaged that this Checklist will be adapted to accommodate any relevant changes.
The purpose of the Checklist is to provide “practical steps” to enable LSDB and GVDB curators to collect and share data in a manner that ensures and promotes acceptable ethical standards. Not only should the work of curators meet international requirements, but individual database curators should be aware of, and act within, relevant local and national regulatory frameworks that govern their operations when sharing data both nationally and internationally. Requirements may vary in different countries (Ludvigsson et al., 2015; Stoeklé et al., 2019) or cultures (Al Aqeel, 2007), and this might require aspects of the Checklist to be modified for implementation. On the other hand, the Checklist could present an opportunity to improve requirements where this had not been updated.
2 SURVEY OF CURATORS AND OTHER MATERIAL
As a first step, a survey of curators was conducted to gain a better understanding of current practices and to determine the extent to which the existing guidelines (Povey et al., 2010) had been implemented. The checklist below considers the results of this survey, as well as the views of experts in database curation from within HVP, members of HVP's various councils and committees, and inputs from the broader HVP membership. In addition, some of the “practical” guidelines in Povey et al. (2010) have been retained and information previously published in other articles has also been included (Celli, Dalgleish, Vihinen, Taschner, & den Dunnen, 2012; Mascalzoni et al., 2015; Vihinen, den Dunnen, Dalgleish, & Cotton, 2012).
- (1)
"Ethical and Privacy Principles in relation to Responsible Sharing of Genomic and Health-Related Data" produced by the International Society for Gastrointestinal Hereditary Tumours (InSiGHT; Appendix 1).
- (2)
A broader “Framework for Responsible Sharing of Genomic and Health-related Data” (Version 10, Sept 2014) by the Global Alliance for Genomics and Health (GA4GH; Knoppers, 2014), also available at https://www.ga4gh.org/ga4ghtoolkit/regulatoryandethics/framework-for-responsible-sharing-genomic-and-health-related-data/.
- (3)
The “WMA Declaration of Taipei on Ethical Considerations regarding Health Databases and Biobanks” (World Medical Association, 2016), which covers additional ethical principles for the collection, storage and use of identifiable data, will be relevant for curators with access to confidential/sensitive data.
- (4)
The EU General Data Protection Regulation 2016/679 GDPR (2016) which came into effect on 25 May 2018.
3 TERMS AND DEFINITIONS
-
Coded data: Coded data refers to data that have undergone pseudonymization as defined in Article 4(5) of the EU General Data Protection Regulation 2016/679 GDPR (2016) which came into effect on 25 May 2018.
-
Pseudonymization: This means “the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person.” The GDPR document is available at http://ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf. A coded ID permits re-identification by the submitter.
-
Database curator: Database curator, as used here, refers to a person or persons who is/are “…responsible for assuring the quality, integrity, and access arrangements of data and metadata in a manner that is consistent with applicable law, institutional policy, and individual permissions” (Global Alliance for Genomics and Health, 2016). The activities of a database curator include data extraction, integration, presentation, publication, management, monitoring, and reducing redundant information, thereby resulting in up-to-date information that is as accurate as possible. A curator also oversees who has access to the information and under what circumstances. In HVP terminology, the database curator is responsible for ensuring that the database policy is kept current and useful. Some organizations use the term “data steward” (e.g., Global Alliance for Genomics and Health, 2016) to equate with database curator, but this term is not widely used.
-
Personal data or identifiable data: The checklist adopts the definition in Article 4(1) of the EU General Data Protection Regulation 2016/679 GDPR (2016) that defines “personal data” as “any information relating to an identified or identifiable natural person (“data subject”); an identifiable natural person is one who can be identified, directly, or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier, or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural, or social identity of that natural person”; and in alignment with the GA4GH Data Sharing Lexicon (Global Alliance for Genomics and Health, 2016), personal/identifiable data refers to “data that alone or in combination with other data may reasonably be expected to identify an individual.”
-
Personal data breach: This is defined in Article 4(12) of the EU General Data Protection Regulation 2016/679 GDPR (2016) as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to, personal data transmitted, stored, or otherwise processed.”
4 DETAILS OF THE CHECKLIST
4.1 Define the purpose of the database
This should include the scope and type of information in the database. Further information can be found in Povey et al. (2010).
4.2 Define the database policy
In accordance with the agreed purpose of a curator's database, a database policy governing data collection, display, access, variant interpretation and classification, and corrections should be defined. The policy should also include clauses on data correction and withdrawal/erasure. Examples of database policies are in Vihinen et al. (2012; also Appendix 2) and Knoppers (2014; GA4GH Proposed Policy Template). The database policy should also include information suitable for patients with limited technical knowledge who may wish to submit their data.
4.3 Attribution
-
In recognition of the submission of unpublished data, submitter names should be listed along with their data (unless not permitted by database regulations). This form of microattribution (Patrinos et al., 2012) has been demonstrated by Giardine et al. (2011).
-
Where local database regulations do not permit listing submitter names, consider offering coauthorship to submitters where their data are used in publications authored by curators, and the submitters have made significant scientific contributions. What constitutes “significant contribution” can be found in the document published by the ICMJE (International Committee of Medical Journal Editors, 2017).
-
If the options in the first two points are not possible, acknowledge submitters in publications.
4.4 Establish an Oversight Committee (OC)
-
The purpose of the Oversight Committee is:
- a.
To act as an independent forum for the consideration of practical ethical issues arising in the day-to-day work of the database.
- b.
To consider any other matters relating to sharing of unpublished data submitted to the database, in line with local regulations/requirements and recommendations in the field.
-
Guidance on the composition of Oversight Committee
- a.
Members of the OC should be independent of the curation and funding of the database they oversee, but should be knowledgeable about the condition and represent the different groups involved, for example, clinicians, researchers, database curators, and lay persons from patient groups.
- b.
At least one member of the OC should have ethics training.
4.5 Data collection
This includes both published and unpublished data.
4.5.1 Published data
Privacy concerns discussed in this article do not apply to data obtained from publications.
4.5.2 Consented unpublished data
-
Submitters should be informed of their responsibility to ensure that valid consent has been obtained and that only coded patient IDs are submitted. Coded IDs allow submitters to respond to queries from the curator or to update an entry with new information about a specific case.
-
Note that completely anonymizing patient IDs makes it virtually impossible for either the submitter or the curator to add valuable information that subsequently becomes available.
-
Ensure coded IDs are used for submissions that are not linked to any publicly available source, for example, data from diagnostic labs (health service labs and commercial sources), clinics/clinicians and sometimes from patients.
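The role of coded IDs can be illustrated with a minimal sketch. It assumes a submitter who keeps a private linking table and sends only the coded ID to the curator; the class `CodingKey`, its methods, the `LAB1` prefix, and the example variant are hypothetical and are not part of the Checklist.

```python
import secrets


class CodingKey:
    """Submitter-side table linking local patient identifiers to coded IDs.

    Hypothetical illustration: this table is the "additional information"
    that is kept separately from the data sent to the database.
    """

    def __init__(self, prefix="SUB"):
        self._prefix = prefix
        self._by_patient = {}  # local identifier -> coded ID
        self._by_code = {}     # coded ID -> local identifier

    def code_for(self, patient_identifier):
        """Return the existing coded ID for a patient, or create a new random one."""
        if patient_identifier in self._by_patient:
            return self._by_patient[patient_identifier]
        while True:
            # Random token, so the coded ID carries no information about the patient.
            coded = f"{self._prefix}-{secrets.token_hex(4)}"
            if coded not in self._by_code:
                break
        self._by_patient[patient_identifier] = coded
        self._by_code[coded] = patient_identifier
        return coded

    def reidentify(self, coded_id):
        """Map a coded ID back to the patient; only the key holder can do this."""
        return self._by_code.get(coded_id)


key = CodingKey(prefix="LAB1")
coded_id = key.code_for("hospital-record-0042")
# Only the coded ID and the variant (a made-up placeholder) leave the submitter.
submission = {"coded_id": coded_id, "variant": "c.123A>G"}
```

Because the same patient always maps to the same coded ID, the submitter can later answer a curator's query or update the case, which a fully anonymized ID would not allow.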
4.6 Curation of unpublished data with consent
4.6.1 Unpublished data
-
If the data are from a query, it is important to inform the enquirer that the variant will be included in the database.
-
Assign a coded ID to each entry, if there is none already.
-
Keep sensitive personal data nonpublic. This refers to information that is of a private nature that could be used in a discriminatory manner.
-
In linking entries to details of the submitter, curators should abide by relevant applicable regulations.
-
Ensure that variant nomenclature adheres to HGVS standards (http://www.HGVS.org/varnomen). The original variant description should be kept in a separate column. If a variant is ambiguous or does not match the reference sequence, consult with the submitter. Incorrect variant descriptions should not be included in the database. If the problem with a variant description cannot be resolved, exclude the case from the database.
-
Publicly viewable, unpublished data: Before submitted data can be made publicly available, the following should be done:
- a.
Information that will be publicly viewable should be summarized to ensure clarity on family relationships.
- b.
Ensure that personal details in the submitted data are curated such that individuals cannot be identified.
- c.
Phenotype information is important for clinical diagnosis. Where phenotype information is available, and efforts have been made to protect the identity of the individual, details on the phenotype should be displayed.
-
Nonpublic data
- a.
This section of the database is reserved for sensitive information that curators will need to refer to.
- b.
Requests to share nonpublic data should be forwarded to the data submitter (see the following section on “Permitting the use of nonpublic data for scientific/clinical purposes”).
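The separation between publicly viewable and nonpublic information described above could be sketched as follows. This is only an illustration under assumed names: the `Entry` fields, the `PUBLIC_FIELDS` whitelist, and all example values are hypothetical, and which fields may be displayed remains a matter for each database's policy.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    coded_id: str
    variant_hgvs: str           # description following HGVS nomenclature
    original_description: str   # as submitted, kept in a separate column
    phenotype: str = ""
    nonpublic: dict = field(default_factory=dict)  # sensitive details, never displayed


# Hypothetical whitelist of fields that may appear on the public page.
PUBLIC_FIELDS = ("coded_id", "variant_hgvs", "original_description", "phenotype")


def public_view(entry):
    """Return only whitelisted fields, so any other field stays hidden by default."""
    return {name: getattr(entry, name) for name in PUBLIC_FIELDS}


entry = Entry(
    coded_id="LAB1-0001",
    variant_hgvs="c.123A>G",
    original_description="123A>G",
    phenotype="placeholder phenotype",
    nonpublic={"submitter_note": "placeholder sensitive detail"},
)
view = public_view(entry)
```

A whitelist is used rather than a list of fields to hide, so that a field added later is nonpublic until someone decides otherwise.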
4.7 Permitting the use of nonpublic data for scientific/clinical purposes
4.7.1 Request from clinician or diagnostic lab
Curators may receive requests to share nonpublic information from bona fide clinicians/diagnostic labs who need the information as a guide for patient care or a diagnostic report. An example may be a new variant, with its associated clinical data, segregation information and pathogenicity, that the submitter has asked to be kept nonpublic until after their impending publication (see the following section on “Request to keep submitted data nonpublic”). Such requests should be forwarded to the submitter.
4.7.2 Request from researcher
This should also be forwarded to the submitter.
4.8 Request to keep submitted data nonpublic
-
DNA diagnosis is improved by sharing data on genes, variants and phenotypes, and publicly sharing data offers optimal care to patients and their families.
-
Publishing the variant in the database does not result in the rejection of a subsequent manuscript that mentions the data.
-
Searches in publicly available variant databases may return a message indicating a nonpublic record with a variant at that position is in the database, with the suggestion to contact the curator to receive more details.
-
The following options may be adopted for requests to keep submitted data out of public view:
- a.
Enter data but make the entire entry “nonpublic.” Note the point above on “Searches in publicly available variant databases…”; or
- b.
Enter data but make the variant public and associated information “nonpublic.” This option should be discussed with the submitter.
-
Any request for information about the variant should be forwarded to the submitter.
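A search that signals the existence of a nonpublic record without revealing it might look like the sketch below. It covers only the option where the entire entry is nonpublic; the record layout, the `status` values, and the notice wording are assumptions made for illustration.

```python
def search(records, position):
    """Return public matches, plus a notice if nonpublic records exist at the position."""
    public_hits = []
    hidden = 0
    for record in records:
        if record["position"] != position:
            continue
        if record["status"] == "public":
            public_hits.append(record)
        else:
            hidden += 1  # counted, but never returned to the searcher
    notice = None
    if hidden:
        notice = ("A nonpublic record with a variant at this position is in the "
                  "database; contact the curator for more details.")
    return {"hits": public_hits, "notice": notice}


# Made-up example records.
records = [
    {"coded_id": "A-1", "position": 123, "variant": "c.123A>G", "status": "public"},
    {"coded_id": "A-2", "position": 123, "variant": "c.123A>T", "status": "nonpublic"},
    {"coded_id": "A-3", "position": 456, "variant": "c.456del", "status": "public"},
]
result = search(records, 123)
```

The curator who receives the resulting enquiry would then forward it to the submitter, as stated above.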
4.9 Request for submitter's details
Some LSDBs do not link submitter details to unpublished variant data, especially where most of these are rare variants. Any request for submitter details should be forwarded to the submitter allowing them to respond directly to the requester.
4.10 Giving your opinion
-
If you, as a curator, have a team (both clinical and scientific) that is qualified and knowledgeable about the disease, the opinion of the team on the potential consequence of a variant may be given, especially when the curator has assigned “concluded pathogenicity” to variants listed in their database.
-
If a curator does not have a team and does not have in-depth knowledge about the disease, it is best to refrain from giving any opinion.
4.11 Sharing variant information with genome browsers
This increases visibility for LSDBs or GVDBs and should be encouraged.
4.12 Managing personal data breach
-
Before data curation, curators should identify who to notify in the event of a personal data breach and obtain further procedures on reporting.
-
On discovering an incident involving personal data breach, a report of the incident should be filed with the appropriate contact.
-
Establishments and/or infrastructure providers will have procedures that should be followed in the event of an incident involving personal data breach.
4.13 Data erasure
Complete erasure of publicly available data, presented in web-based databases, is impossible as the data may be distributed to cyberspace. However, where database curators have knowingly shared data, for example, with genome browsers, and submitters have requested data erasure, those recipients should be informed of the request for erasure. The request for data erasure may be due to the withdrawal of consent. Further details on the specific grounds for request of data erasure can be found in the EU General Data Protection Regulation 2016/679 GDPR (2016).
-
The curator is provided with details that permit the identification of the data to be erased, for example, the coded ID and a specific factor such as the genetic variant, to ensure the correct data are erased.
-
The data are identified and deleted from the database.
-
Where the data in question have been shared, for example, with genome browsers, the recipient is contacted and informed of the request for erasure. However, it may not be possible for the curator to ensure that all the instances in third party systems are erased.
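The three erasure steps above can be sketched as a single function. The entry layout, the `shared_with` record of past data sharing, and the function name `erase` are hypothetical; the sketch only lists the recipients to inform and does not attempt to erase anything in third-party systems.

```python
def erase(entries, shared_with, coded_id, variant):
    """Delete the matching entries and return the recipients to inform.

    entries: list of dicts with "coded_id" and "variant" keys (assumed schema).
    shared_with: dict mapping a coded ID to the recipients the data were sent to.
    """
    # Step 1: identify the data using the coded ID and a specific factor (the variant).
    matches = [e for e in entries
               if e["coded_id"] == coded_id and e["variant"] == variant]
    if not matches:
        raise LookupError("no entry matches both the coded ID and the variant")
    # Step 2: delete the identified data from the database.
    remaining = [e for e in entries if e not in matches]
    # Step 3: recipients can only be informed; erasure on their side is not guaranteed.
    to_notify = list(shared_with.get(coded_id, []))
    return remaining, to_notify


# Made-up example data.
entries = [
    {"coded_id": "A-1", "variant": "c.123A>G"},
    {"coded_id": "A-1", "variant": "c.456del"},
    {"coded_id": "A-2", "variant": "c.123A>G"},
]
shared_with = {"A-1": ["genome browser X"]}
remaining, to_notify = erase(entries, shared_with, "A-1", "c.123A>G")
```

Requiring both the coded ID and the variant to match guards against deleting another entry that happens to share only one of the two.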
A brief description of the detailed checklist is presented in Box 1.
BOX 1. The Checklist for LSDB/GVDB curators, in brief
- 1.
Define the purpose of the database.
- 2.
Define the database policy.
- 3.
Offer attribution to submitters.
- 4.
Establish an ethics oversight committee.
- 5.
Collect data that has valid consent and coded ID.
- 6.
Curate unpublished data to protect patient privacy.
- 7.
Requests for nonpublic data should be forwarded to the data submitter.
- 8.
Requests to keep submitted data nonpublic should be honored.
- 9.
Requests for submitter's details should be forwarded to the submitter.
- 10.
You could consider giving your opinion if you have a team qualified and knowledgeable about the disease.
- 11.
Information on variants can be shared with genome browsers.
- 12.
Personal data breaches should be dealt with immediately.
- 13.
Data erasure requests should be honored.
In conclusion, database curators should maintain an ethical approach to their work and encourage their collaborators (i.e., data submitters) to adhere to the same standards. Ethical concerns in data sharing will not go away and database curators should acquaint themselves with requirements that enable ethical data presentation and sharing. A version of this Checklist is also available on the HVP website at http://www.humanvariomeproject.org/images/documents/HVP_-_Ethical_Data_Management-_Checklist_for_Gene_Disease_Specific_Database_Curators__-_FINAL.pdf.
ACKNOWLEDGMENTS
We thank Professor Michael Parker for his participation in the working group, his suggestions and contributions; Helen Robinson and Amy McAllister of the HVP International Coordinating Offices (ICO), and Timothy Smith (formerly of the HVP ICO) for organizing various stages of consultations, collating feedback and edits to the document. Thanks also to all who made helpful suggestions for improvements to the checklist: Members of the HVP Gene/Disease-Specific Database Advisory Council, the HVP International Confederation of Countries Advisory Council, the HVP International Scientific Advisory Council, the Pan-European Research Infrastructure Consortium (ERIC) for Biobanking and BioMolecular Resources Research Infrastructure (BBMRI-ERIC), and Professor Bartha Knoppers, Director of the Centre of Genomics and Policy, McGill University. This paper is dedicated to the memory of Professor Sue Povey who advocated sharing variant data for the benefit of patients and their families.
FUNDING
R. E. is supported by The Tuberous Sclerosis Alliance (TS Alliance) and the Tuberous Sclerosis Association (TSA) for her LSDB work; M. V. is supported by the Swedish Research Council (Vetenskapsrådet) and the Swedish Cancer Society.
CONFLICT OF INTERESTS
The authors have no conflict of interests to declare.
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
APPENDIX 1: InSiGHT - ETHICAL AND PRIVACY PRINCIPLES IN RELATION TO RESPONSIBLE SHARING OF GENOMIC AND HEALTH-RELATED DATA (2018)
An example from InSiGHT – how a framework approach can be helpful. A flexible approach assists database curators and their collaborators to address local and international changes in regulations, as well as changing community expectations. While policies in this area benefit from regular discussion and review for continuous improvement, this should not be an onerous task; a framework approach helps to keep the focus on improving outcomes. https://www.insight-group.org/content/uploads/2018/06/EthicsFramework.pdf.
APPENDIX 2: AN EXAMPLE OF DATABASE POLICY FROM ORAI1base (VARIATION REGISTRY FOR SEVERE COMBINED IMMUNODEFICIENCY) AT http://structure.bmc.lu.se/idbase/ORAI1base/?content=_db_policy/IDbases
DATABASE POLICY
The ImmunoDeficiency Variation Databases (IDbases) and other variation databases maintained at the Protein Structure and Bioinformatics Group (PSB), Lund University, are maintained and provided as a public service for academic community.
- 1.
The PSB has a uniform policy of free and unrestricted access for academic community to all of the data records their databases contain. Scientists worldwide can access these records to plan experiments or publish any analysis or critique. Appropriate credit is given by citing the database. Instructions for citing are provided in each individual database.
- 2.
The databases are intellectual property of the PSB. Details are available for Copyright and Liability.
- 3.
Corrections of errors and update of the records by authors are welcome and erroneous records may be removed from the next database release.
- 4.
Submitters are advised that the information displayed on the Web sites maintained by the PSB is fully disclosed to the public. It is the responsibility of the submitters to ascertain that they have the right to submit the data. This applies also the appropriate consent from the patient and/or family.
- 5.
Beyond limited editorial control and some internal integrity checks, the quality and accuracy of the record are the responsibility of the submitting author, not of the database. The databases will work with submitters and users of the database to achieve the best quality resource possible.
- 6.
Data in the PSB mutation databases may be shared with central repositories according to published Human Genome Variation Society guidelines.
- 7.
The information provided on this site is designed to support, not replace, the relationship that exists between a patient/site visitor and his/her existing physician.
- 8.
We keep the confidentiality of the data relating to individual patients and visitors to the website, including their identity. No data are collected that would allow identification of the patients for whom information is stored and distributed in the database. We do not share any information about database visitors with third parties. As database curators and owners we undertake to honor or exceed the legal requirements of medical/health information privacy that apply in Sweden.
- 9.
The database does not host any advertisements.