HYPERGENES

European Network for Genetic-Epidemiological Studies

Project progress

CP-CSA-NoE 1st Periodic Report Summary

Period covered: 01/01/2008 - 30/12/2008

The project is focused on the definition of a comprehensive genetic epidemiological model of complex traits like Essential Hypertension (EH) and intermediate phenotypes of hypertension dependent/associated Target Organ Damages (TOD).

Discovering the genetic component of common complex diseases is extremely challenging, since most of them are multifactorial and the genetic component is likely to be described by the interactions of several genes involved in the disease pathway, each predisposing imperceptibly to the disease. Indeed, it soon became clear that even the diseases traditionally called simple, or "Mendelian", are not at all simple to disentangle fully (as demonstrated, for example, by the gene-gene and gene-environment interactions that are always possible in cystic fibrosis).

To identify the common genetic variants relevant to the pathogenesis of EH and TOD, HYPERGENES will perform a genome-wide association (GWA) study of 4,000 subjects recruited from historical, well-characterized European cohorts. There is now relatively broad agreement on the allelic architecture of multifactorial traits. GWA studies can identify common variants contributing to the inherited component of common diseases. Almost all such variants have modest effects, and their impact and predictive power appear limited. The most striking observation is the marked disparity between the extent of familial aggregation observed for the phenotype and that attributable to the genetic variants. One possible way to find alleles that are difficult to detect, yet important because they generate a high sibling relative risk, is through novel resequencing technologies allied to large-scale association testing. In the first instance, such efforts will be targeted at genes already implicated in disease susceptibility (Nat Reviews Genet 2008).

Genotyping is being performed with the highly informative Illumina Human 1M BeadChip.

Well-established multivariate techniques and innovative genomic analyses based on machine learning will be used for the GWA investigations. Using a machine learning approach, we aim to develop a disease model of EH that integrates the available information on EH and TOD with relevant validated pathways and genetic/environmental information, in order to mimic the clinician’s pattern of recognition of EH/TOD and their causes in the individual patient.

The statistical design uses two samples run in parallel, each with 1,000 cases and 1,000 controls, followed by a replication/joint analysis. This design is more powerful than replication alone and, compared with a single-step (one large sample) design, also allows formal testing of the potential heterogeneity of findings.
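As a minimal sketch of what a joint analysis with a heterogeneity test can look like, the example below pools the per-sample effect estimates for a single SNP with fixed-effect inverse-variance weighting and computes Cochran's Q. This is an illustration under stated assumptions, not the project's documented procedure: the report does not name the statistics used, and the function names and input numbers are hypothetical.

```python
import math


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


def joint_analysis(sample_a, sample_b):
    """Pool two (log_odds_ratio, standard_error) estimates and test heterogeneity.

    Assumption: fixed-effect inverse-variance weighting and Cochran's Q,
    chosen here for illustration only.
    """
    estimates = [sample_a, sample_b]
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1 / total)
    z = pooled / pooled_se
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    # With two samples, Q has 1 degree of freedom under homogeneity.
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    return {
        "log_or": pooled,
        "se": pooled_se,
        "p_association": chi2_sf_1df(z * z),  # two-sided test of pooled effect
        "q": q,
        "p_heterogeneity": chi2_sf_1df(q),
    }


# Hypothetical per-sample estimates for one SNP (illustrative numbers only).
result = joint_analysis((0.25, 0.07), (0.15, 0.07))
```

A small heterogeneity p-value would suggest that the two samples disagree on the effect size, which is the kind of check a single large sample cannot provide.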

The results of the GWA will be the basis for building a customized and inexpensive genetic diagnostic chip that can be validated in the project's existing cohorts.

HYPERGENES is in the unique position to propose a ground-breaking project, improving the methodology of genetic epidemiology of chronic complex diseases that have a high prevalence among EU populations.

Designing a comprehensive genetic epidemiological model of complex traits will also help us to translate genetic findings into improved diagnostic accuracy and new strategies for early detection, prevention and eventually personalized treatment of a complex trait.

The project's Technical and Scientific objectives are the following:

  • To identify the common genetic variants relevant for EH and TOD.
  • To design and implement appropriate computational tools.
  • To develop a comprehensive Biomedical Information Infrastructure (BII).
  • To create a “Web-Based Portal” giving access to the BII in order to allow dissemination of knowledge.
  • To develop new methods, protocols and standards for genomic association analysis, gene annotation and molecular pathways.
  • To develop a set of Decision Support System tools combining genetic, clinical and environmental information.
  • To develop a simple, inexpensive genetic diagnostic chip that can be validated in our existing well-characterized cohorts.
  • To strengthen the existing clinician-basic scientist collaborative network on the genetic mechanisms of EH.
  • To generate educational tools to support professional training on all aspects of the project, favouring mobility of PhD students and post-docs.
  • To disseminate HYPERGENES achievements through scientific meetings, teaching in tutorial sessions, publication in high-impact scientific journals etc.
  • To exploit the results in a translational scenario.

To fulfil the scientific and technological objectives defined above, and to guarantee the required multidisciplinary approach, the project Consortium involves, in addition to Centres of Excellence, key actors in several different areas. Among these:

  • Appropriate industry, represented by Companies working in the health area, both Large Enterprises (IBM ISRAEL - SCIENCE AND TECHNOLOGY LTD, and STMICROELECTRONICS SRL) and SMEs (SOFTECO SISMAT, IMS and REFORM), since the expected project results are of interest and potential benefit to them.
  • 11 Clinical Units (UNIVERSITA’ DEGLI STUDI DI MILANO, Milano-Italy; KATHOLIEKE UNIVERSITEIT LEUVEN, Leuven-Belgium; THE JAGIELLONIAN UNIVERSITY MEDICAL COLLEGE, Krakow-Poland; INSTITUTE OF INTERNAL MEDICINE, SIBERIAN BRANCH OF THE RUSSIAN ACADEMY OF MEDICAL SCIENCES, Novosibirsk-Russian Federation; INSERM-INSTITUT NATIONAL DE LA SANTE’ ET DE LA RECHERCHE MEDICALE, Paris-France; THE UNIVERSITY OF WARWICK, Coventry-UK; UNIVERSITA’ DEGLI STUDI DI SASSARI, Sassari-Italy; SHANGHAI INSTITUTE OF HYPERTENSION, Shanghai-China; CHARLES UNIVERSITY IN PRAGUE, Prague-Czech Republic; UNIVERSITA’ DEGLI STUDI DI PADOVA, Padova-Italy; MEDICAL UNIVERSITY OF GDANSK, Gdansk-Poland) as the project's content providers. All these clinicians have been involved in research on the genetic mechanisms of EH and related TOD for many years.
  • 3 Genetic Centres (UNIVERSITA’ DEGLI STUDI DI MILANO, Milano-Italy; UNIVERSITE’ DE LAUSANNE, Lausanne-Switzerland; PHARNEXT SAS, Paris-France).
  • 2 Epidemiological Centres (IMPERIAL COLLEGE OF SCIENCE, TECHNOLOGY AND MEDICINE, London-UK; THE UNIVERSITY OF WARWICK, Coventry-UK).
  • Project management and technical coordination are led by an SME with experience in EC project management (I.M.S - ISTITUTO DI MANAGEMENT SANITARIO srl, Milan-Italy).

The HYPERGENES project is structured in three steps:

  • STEP 1: DISCOVERY (from month 1 to month 27)
  • STEP 2: VALIDATION (from month 18 to month 39)
  • STEP 3: DISSEMINATION AND RESULT EXPLOITATION (from month 1 to month 42)

During the first year of the project, the activities performed were part of the Discovery phase and focused mainly on setting up the genetic, clinical, epidemiological and technical infrastructure that will support the storage, integration and analysis of the data produced in these three domains.

The specific achievements in the main work packages into which the Discovery phase is broken down are summarised below:

Workpackage 1 “Setting up the stage”:

  • Standardised criteria for non-genetic determinants in existing cohorts: the environmental factors potentially affecting blood pressure were analysed, and a list of the factors identified so far was presented and shared among the Clinical Partners.
  • Standardised criteria for clinical parameters and quantitative TOD: a document takes into consideration the clinical and biochemical parameters potentially related to hypertension and the indicators of target organ damage. Attention has been paid to describing the methods and proxies used to evaluate TOD.
  • Protocol with criteria for sample selection: a fundamental step for the selection of cases and controls from the databases of the 11 Clinical Units, including the rationale of the experimental design and considerations on the power of the study.
  • Analysis of the clinical-environmental databases of the existing cohorts.
  • Data model: the data model produced is a fundamental working tool for designing a BII fitted to its context and for laying the basis for a correct data flow inside the BII itself. Issues of heterogeneity among the data of different cohorts have been addressed within these analytic and modelling activities. The Core Ontology and the Mapping among the cohort variables have been produced as well.

Workpackage 2 “Data integration infrastructure-Biomedical Information Infrastructure (BII) design and development”:

In the first months of the project, the main efforts were devoted to defining proper procedures to transform the collected data formats and to integrate them into the BII. The clinical and environmental data structure was then designed by constraining three standards: (1) Clinical Document Architecture holds the majority of this type of data and serves as a single instance for each subject; (2) Genetic Variation holds the genetic data resulting from testing subjects with the lab-on-chip being developed in this project; and (3) Pedigree holds family history data when available in a structured way. The HL7 v3 Reference Information Model (RIM) is used to derive a specific data model for the data warehouse, a core component of the BII. The following issues have been addressed in the implementation of the project data warehouse.

Data entry process:

  • Harmonization
  • Capturing richness of data
  • Data Persistency
  • Semantic Interoperability

Data Access:

  • The Promotion Layer
  • Tools for Data Access
  • Terminology services

Workpackage 3 “Genome Wide Genotyping”:

A strong effort was devoted to preliminary tests and quality checks between the two laboratories in charge of the genetic analysis. After these checks, the two laboratories focused on genotyping the Discovery Sample DNA (2,000 cases and 2,000 controls), which includes Caucasian samples from all over Europe.

Workpackage 5 “Development of machine learning techniques for genomic analysis”:

Two approaches have been taken. The first aims at deriving generalisations as to which SNPs are relevant for diseases, based on a higher-level analysis in the meta-feature space. The second, based on large-scale information-theoretic analysis, is currently being applied. Preliminary results appear interesting.
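The report does not say which information-theoretic quantity is used. As one common possibility, the sketch below scores a single SNP by the mutual information (in bits) between genotype and case/control status. The function name and the toy data are assumptions made purely for illustration.

```python
import math
from collections import Counter


def mutual_information(genotypes, labels):
    """Mutual information in bits between genotype codes and case/control labels.

    Assumption: mutual information is used here only as a representative
    information-theoretic score; the project's actual measure is not stated.
    """
    n = len(genotypes)
    joint = Counter(zip(genotypes, labels))
    genotype_counts = Counter(genotypes)
    label_counts = Counter(labels)
    mi = 0.0
    for (g, s), count in joint.items():
        # p(g, s) / (p(g) * p(s)) simplifies to count * n / (count_g * count_s).
        ratio = count * n / (genotype_counts[g] * label_counts[s])
        mi += (count / n) * math.log2(ratio)
    return mi


# Toy data: genotypes coded as minor-allele counts, labels 1 = case, 0 = control.
informative = mutual_information([0, 0, 2, 2], [0, 0, 1, 1])
uninformative = mutual_information([0, 2, 0, 2], [0, 0, 1, 1])
```

In the toy data the first SNP separates cases from controls perfectly and the second carries no information about status, so a ranking by this score would place the first SNP above the second.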

Workpackage 11 “Project Dissemination”:

The project partners took part in several meetings, congresses and interviews concerning the HYPERGENES project, its characteristics and its objectives. The mobility of PhD students and young researchers has been promoted within the project research areas.

An appropriate project corporate image was selected during the first months of HYPERGENES; the project website was implemented, including a section for the general public and a section reserved for HYPERGENES Consortium Members.

CP-CSA-NoE 2nd Periodic Report Summary

Period covered: 01/01/2009 - 31/12/2009

The project is focused on the definition of a comprehensive genetic epidemiological model of complex traits like Essential Hypertension (EH) and intermediate phenotypes of hypertension dependent/associated Target Organ Damages (TOD), as well as other endophenotypes such as the pharmacogenomic pattern of two drugs widely used in EH, namely hydrochlorothiazide and losartan.

The discovery of the genetic component in common complex diseases is extremely challenging since most of them are multifactorial and since the genetic component is likely to be described by the interactions of several genes involved in the disease pathway, each predisposing imperceptibly to the disease. HYPERGENES adopts the Genome Wide Association (GWA) approach to identify common variants contributing to the inherited component of common diseases.

The HYPERGENES project is structured in three steps:

  • STEP 1: Discovery
  • STEP 2: Validation
  • STEP 3: Dissemination & Results Exploitation

After its first two years, the HYPERGENES project has reached the end of the Discovery Phase and has just entered the Validation Phase.

The Discovery Phase focused on building the methodological and technical framework to support the Genome Wide Association analysis, which was performed on 4,000 Caucasian subjects recruited from historical, well-characterized European cohorts. The cohorts are characterized by different ethnicities (North-Western Europe, Eastern Europe, Southern Europe and Sardinia), and the subjects meet the project definitions of Cases and Controls.

The HYPERGENES Consortium decided to maintain a very clear separation between cases and controls, selecting cases among well-defined hypertensives and controls among normotensives with little chance of developing hypertension later in life.

A great effort has been spent on methodological issues related to the setting up and interpretation of data. The need to integrate the observations obtained from different studies posed significant challenges, which were addressed through an integrated epidemiological and bioinformatics approach.

All these efforts have led to the development of the Biomedical Information Infrastructure (BII), the platform developed to support the entry, persistency and retrieval of data and knowledge relevant to EH, including clinical, environmental and genotypic data.

Genotyping was performed on high-throughput Illumina technologies, thanks to the coordinated efforts of the laboratories of the Universities of Milan and Lausanne. The analyses carried out in the project followed different methodologies for genetic analysis, including classical and machine learning techniques, to produce an enriched list of SNPs (single nucleotide polymorphisms) found to be associated with EH, TOD or other endophenotypes relevant to hypertension.

The case-control association study conducted on EH led to about three hundred significant associations, which only partially overlapped with the results of previous studies.
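For readers unfamiliar with case-control testing, a single-SNP test can be sketched as a chi-square test on a 2x2 table of allele counts. The report states neither the test nor the significance threshold actually used, so the allelic test, the Bonferroni correction, the figure of one million tests (loosely based on the 1M BeadChip) and the allele counts below are all assumptions for illustration.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square (1 df) and p-value for a 2x2 table of allele counts."""
    n = case_alt + case_ref + ctrl_alt + ctrl_ref
    cross = case_alt * ctrl_ref - case_ref * ctrl_alt
    denominator = (
        (case_alt + case_ref)
        * (ctrl_alt + ctrl_ref)
        * (case_alt + ctrl_alt)
        * (case_ref + ctrl_ref)
    )
    chi2 = n * cross ** 2 / denominator
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Hypothetical allele counts for 1,000 cases and 1,000 controls (2,000 alleles each).
chi2, p_value = allelic_chi_square(600, 1400, 500, 1500)
threshold = bonferroni_threshold(0.05, 1_000_000)
is_significant = p_value < threshold
```

With these made-up counts the SNP is nominally associated but does not pass the genome-wide threshold, which illustrates why large samples and independent replication matter for variants of modest effect.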

TOD and endophenotypes were analysed as quantitative traits, and preliminary results include hundreds of significant associations for each phenotype considered. As for EH, only some of the identified genes had already been described in previous studies.
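A quantitative-trait analysis of one SNP is often set up as a linear regression of the trait on the number of minor alleles carried (0, 1 or 2). The sketch below assumes this additive model with no covariates and uses a normal approximation for the p-value; the report does not describe the model actually used, and the function name and data are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def additive_association(dosages, trait):
    """Least-squares slope of trait on allele dosage, with its standard error.

    Assumptions: additive genetic model, no covariates, at least three
    subjects, and a large-sample normal approximation for the p-value.
    """
    n = len(dosages)
    mean_x, mean_y = fmean(dosages), fmean(trait)
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    residual_ss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(residual_ss / (n - 2) / sxx)
    z = beta / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, se, p_value


# Toy data: the trait rises by about one unit per copy of the minor allele.
beta, se, p_value = additive_association(
    [0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
)
```

In practice such a model would normally include covariates such as age and sex, which this sketch omits for brevity.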

To re-test specific genetic variations found to be associated with hypertension, TOD or other endophenotypes within the HYPERGENES project (as well as variations from previous studies or from a priori knowledge proposed by the project participants), we then designed an Illumina Custom Infinium chip holding 15,000 SNPs. The chip will be used to validate the results on a new independent sample.

Within the next year of the project we plan to verify the findings obtained in the Discovery Sample on this independent Confirmation Sample. This will be followed by the definition of a disease model and the design of a Lab-on-Chip containing only the most informative SNPs derived from the previous activities.