The GCAT|Panel is the first complete genetic map of the Iberian population that helps identify possible genetic causes of common diseases

Tuesday, 22 February 2022

The panel is the result of a collaboration between the GCAT|Genomes for Life Cohort of the IGTP and the Barcelona Supercomputing Center. Massive genome sequencing of a sample of healthy members of the population has made it possible to provide a genetic tool to study complex variants in the genome that can potentially cause common diseases. The panel will allow researchers using low-cost sequencing techniques to discover and interpret the genetic changes behind common diseases more easily.

Researchers from the Barcelona Supercomputing Center (BSC) and the Germans Trias Research Institute (IGTP) have produced the first haplotype map for the Iberian population: a tool that will produce a better understanding of the genetic changes behind many common diseases. The work, published in Nucleic Acids Research, represents the first deep characterization of genetic variation created using data for people in Spain, but the group has demonstrated that it is also effective for analyzing genome studies of populations around the world. The joint first authors are Jordi Valls-Margarit and Daniel Matías-Sánchez of the Life Sciences Department at the BSC and Iván Galván-Femenía of the GCAT Laboratory at the IGTP and currently at the Institute for Research in Biomedicine (IRB Barcelona).

The group studied the genomes of 5,200 participants living in Catalonia, together with the completely sequenced genomes of 785 individuals obtained using the more powerful high-coverage whole-genome sequencing (30X). They also had access to detailed clinical health notes and environmental information for all the subjects. "The result of this work, the GCAT|Panel, is the first study to use sequencing to describe the complex regions of the genome in a populational cohort in Spain. It is now available for researchers to use in multiple genetic studies to look for the genetic mechanisms behind diseases," explains Rafa de Cid, Scientific Director of the GCAT|Genomes for Life Project and senior co-author of the study.

"We have applied the extensive battery of algorithms available, along with statistical regression analysis models, to generate a highly sensitive solution in the form of a haplotype panel that contains more than 35 million variants, including more than 100,000 structural variants," adds David Torrents, Leader of the Computational Genomics Group at the BSC and ICREA Research Professor, the other senior co-author.

Genetic Variants

Although many studies suggest that genetics plays a large part in many common diseases, most of the mechanisms are still unknown. When the human genome was sequenced in 2000, scientists finally had the genetic code, but not the map with which to interpret it. This study is a big step towards having that map. Most research on genetic variants has focused on single nucleotide changes, in which one basic unit of the genetic chain is changed. However, many of the changes that cause people to develop diseases are more complex, involving, for example, insertions or deletions of material, or sections being switched around or exchanged. Their role in diseases is much less well known. The team working on this study now has a map of nearly three times as many variants as have been identified up to now. The precise knowledge previously available on the GCAT|Panel participants is essential for connecting these variants with actual disease. This work is ongoing.

Improved performance of the GCAT|Panel Haplotype map

The researchers checked their map against the currently available tools for identifying possible disease-causing changes and found that it performed as well as or better than most of them, and that it correctly made a suggestive association between the variants found and known conditions in the GCAT cohort individuals. It also detected the presence of a rare element, called the AluYa5 element, in some individuals. This element is associated with mononeuritis of lower limb, a rare neuromuscular disease. When the researchers checked back, the individuals the study had identified were indeed shown to be carriers.

"The combination of the simple genetic risks that we have previously identified in our GCAT cohort and this new data on complex variants will contribute to our understanding of more complex diseases," confirmed de Cid.

"These results really provide an additional argument for using low-cost and easily reproducible genomics techniques together with panels like the GCAT|Panel, which are derived from more complex whole genome sequencing, to find the impact of more complex variants and advance our understanding of the molecular basis of common diseases," concludes Torrents.

GCAT|Genomes for Life Cohort - the power of volunteers collaborating with research

The GCAT is a publicly funded strategic research project at the IGTP. For the last ten years the project has studied the genetic causes behind all types of common diseases, thanks to 20,000 healthy volunteers across Catalonia who donated a blood sample and provided detailed information about lifestyle, diet, medication, etc. The cohort is followed through their digital health records and through continuing follow-up questionnaires and blood samples. The initial samples and data were collected through a collaboration with the Barcelona Blood and Tissue Bank (BST). The identity of the participants is not revealed and their data is encrypted, although they can opt to receive some information for themselves privately. The data from the whole project is available to scientists for studies such as this one, and must be accessed through a strict data protection protocol. The GCAT|Genomes for Life Cohort participates in international open data projects producing information about the causes of common diseases. For example, it has participated in major genomic studies that have found risk factors for severe Covid-19.

The Barcelona Supercomputing Center - understanding big data is key to medical research

The powerful computing resources and the complex design of strategies to analyze genomic data, both provided by the BSC, have made the generation of this map possible. This work increases the resolution of the genetic map in both the number of variants found and the types of variants catalogued. The MareNostrum computer in the BSC and its computational environment were key to this research, which needed a total of 766,663 hours (CPU/h: 3,418,524) of computation to complete the map.

Original Paper

Jordi Valls-Margarit, Iván Galván-Femenía, Daniel Matías-Sánchez, Natalia Blay, Montserrat Puiggròs, Anna Carreras, Cecilia Salvoro, Beatriz Cortés, Ramon Amela, Xavier Farre, Jon Lerga-Jaso, Marta Puig, Jose Francisco Sánchez-Herrero, Victor Moreno, Manuel Perucho, Lauro Sumoy, Lluís Armengol, Olivier Delaneau, Mario Cáceres, Rafael de Cid, David Torrents, GCAT|Panel, a comprehensive structural variant haplotype map of the Iberian population from high-coverage whole-genome sequencing, Nucleic Acids Research, 2022, gkac076, https://doi.org/10.1093/nar/gkac076

Funding

This work was funded by the Government of Spain, the Government of Catalonia and the European Regional Development Fund (FEDER, EU).