Progress

CP-CSA-NoE 1st Periodic Report Summary
Period covered: 01/01/2008 - 30/12/2008
The project is focused on the definition of a comprehensive genetic epidemiological model of complex traits such as Essential Hypertension (EH) and of intermediate phenotypes such as hypertension-dependent or hypertension-associated Target Organ Damage (TOD).
The discovery of the genetic component in common complex diseases is extremely challenging, since most of them are multifactorial and the genetic component is likely to be described by the interactions of several genes involved in the disease pathway, each predisposing imperceptibly to the disease. Indeed, it soon became clear that even the traditionally so-called simple, "Mendelian" diseases are far from simple to disentangle fully (as shown, for example, by the gene-gene and gene-environment interactions that can always be present in cystic fibrosis).
To identify the common genetic variants relevant to the pathogenesis of EH and TOD, HYPERGENES will perform a Genome Wide Association (GWA) study of 4,000 subjects recruited from historical, well-characterized European cohorts. There is now relatively broad agreement on the allelic architecture of multifactorial traits. GWA studies can identify common variants contributing to the inherited component of common diseases, but almost all such variants have modest effects, and their impact and predictive power appear limited. The most striking observation is the marked disparity between the extent of familial aggregation observed for the phenotype and the portion attributable to the identified genetic variants. One anticipated way to find alleles that are difficult to detect, yet important because they generate a high sibling relative risk, is to use novel resequencing technologies combined with large-scale association testing. Initially, such efforts will target genes already implicated in disease susceptibility (Nat Reviews Genet 2008).
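The gap between familial aggregation and what the identified variants explain is often expressed through the sibling relative risk, λs = Ks / K, where Ks is the disease risk in siblings of an affected person and K is the population prevalence. The minimal sketch below computes λs and, assuming a multiplicative multilocus model (under which per-locus λs values multiply, so their logarithms add), the share of log(λs) accounted for by a set of loci. The function names and all numbers are hypothetical illustrations, not HYPERGENES data.

```python
import math
from typing import Iterable


def sibling_relative_risk(sibling_risk: float, population_prevalence: float) -> float:
    """lambda_s = K_s / K: risk of disease in siblings of an affected person
    divided by the disease prevalence in the general population."""
    if not 0.0 < population_prevalence <= 1.0:
        raise ValueError("population prevalence must be in (0, 1]")
    return sibling_risk / population_prevalence


def share_of_log_lambda_explained(total_lambda_s: float,
                                  locus_lambdas: Iterable[float]) -> float:
    """Assuming a multiplicative multilocus model, per-locus lambda_s values
    multiply, so their logarithms add. Returns the share of log(total lambda_s)
    accounted for by the listed loci."""
    if total_lambda_s <= 1.0:
        raise ValueError("total lambda_s must exceed 1 to have aggregation to explain")
    explained = sum(math.log(lam) for lam in locus_lambdas)
    return explained / math.log(total_lambda_s)


# Hypothetical numbers, for illustration only.
lam_total = sibling_relative_risk(sibling_risk=0.6, population_prevalence=0.3)
share = share_of_log_lambda_explained(lam_total, [1.005] * 10)
print(f"lambda_s = {lam_total:.2f}, share explained by 10 loci = {share:.1%}")
```

With ten hypothetical loci of modest effect, only a small fraction of a twofold sibling relative risk is accounted for, which is the kind of disparity the paragraph describes.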
Genotyping is being performed with the highly informative Illumina Human 1M BeadChip.
Well-established multivariate techniques and innovative genomic analyses based on machine learning will be used for the GWA investigations. Using a machine learning approach, we aim to develop a disease model of EH that integrates the available information on EH and TOD with relevant validated pathways and genetic/environmental information, in order to mimic the way the clinician recognizes EH/TOD and their causes in the individual patient.
The statistical design uses two samples analysed in parallel, each with 1,000 cases and 1,000 controls, followed by a replication/joint analysis. This design is more powerful than replication alone and, unlike a single-step (one large sample) design, also allows formal testing of the potential heterogeneity of findings.
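The report does not state which statistics the replication/joint analysis uses. One standard choice, assumed here, is a fixed-effect inverse-variance combination of the per-sample effect estimates (for example log odds ratios) together with Cochran's Q, which tests whether the two samples disagree more than expected by chance. The sketch below shows this for a single SNP; the function name and the input values are hypothetical.

```python
import math


def two_sample_joint_analysis(beta1, se1, beta2, se2):
    """Fixed-effect (inverse-variance) combination of one SNP's effect
    estimates (e.g. log odds ratios) from two independent samples.

    Returns the joint estimate, its standard error, a two-sided p-value,
    Cochran's Q heterogeneity statistic and its p-value (1 degree of
    freedom, since there are two samples)."""
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2                # inverse-variance weights
    beta = (w1 * beta1 + w2 * beta2) / (w1 + w2)          # weighted mean effect
    se = math.sqrt(1.0 / (w1 + w2))
    p_joint = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided normal p
    q = w1 * (beta1 - beta) ** 2 + w2 * (beta2 - beta) ** 2
    p_het = math.erfc(math.sqrt(q / 2.0))                 # chi-square(1) upper tail
    return beta, se, p_joint, q, p_het


# Hypothetical log odds ratios and standard errors for one SNP.
beta, se, p, q, p_het = two_sample_joint_analysis(0.20, 0.10, 0.15, 0.09)
print(f"joint log-OR {beta:.3f} (SE {se:.3f}), p = {p:.2e}; Q = {q:.2f}, p_het = {p_het:.2f}")
```

The joint standard error is smaller than either per-sample one, which is where the gain in power over simple replication comes from, while Q provides the formal heterogeneity test.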
The GWA results will be the basis for building a customized, inexpensive genetic diagnostic chip that can be validated in the project's existing cohorts.
HYPERGENES is in a unique position to propose a ground-breaking project that improves the methodology of genetic epidemiology for chronic complex diseases with a high prevalence in EU populations.
Designing a comprehensive genetic epidemiological model of complex traits will also help us translate genetic findings into improved diagnostic accuracy and new strategies for early detection, prevention and, eventually, personalized treatment of complex traits.
The project's Technical and Scientific objectives are the following:
To fulfil the scientific and technological objectives defined above and to guarantee the required multidisciplinary approach, the project Consortium involves, in addition to Centres of Excellence, key actors in several different areas. Among these:
The HYPERGENES project is structured in three steps:
During the first year of the project, the activities performed belonged to the Discovery Phase and focused mainly on setting up the genetic, clinical, epidemiological and technical infrastructure that will support the storage, integration and analysis of the data produced in these three domains.
In particular, the following list summarizes the specific achievements of the main work packages into which the Discovery Phase is divided:
Workpackage 1 “Setting up the stage”:
Workpackage 2 “Data integration infrastructure-Biomedical Information Infrastructure (BII) design and development”:
In the first months of the project, the main efforts were devoted to defining proper procedures to transform the collected data formats and integrate them into the BII. The clinical and environmental data structure was then designed by constraining three standards: (1) Clinical Document Architecture, which holds the majority of these types of data and serves as a single instance for each subject; (2) Genetic Variation, which holds the genetic data resulting from testing subjects with the lab-on-chip being developed in this project; and (3) Pedigree, which holds family history data when these are available in a structured form. The HL7 v3 Reference Information Model (RIM) is used to derive a specific data model for the data warehouse, a core component of the BII. The following issues have been addressed in the implementation of the project data warehouse.
Data entry process:
Data Access:
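The relationship between the three constrained standards in the BII data structure can be illustrated with a deliberately simplified sketch: one clinical document per subject, optional genotype calls and optional structured pedigree entries, flattened into generic rows for a warehouse table. All class and field names below are hypothetical stand-ins chosen for illustration; they are not the HL7 schemas and not the project's actual data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical, simplified stand-ins for the three constrained standards;
# these are NOT the HL7 schemas, only an illustration of how they relate.


@dataclass
class ClinicalDocument:
    """CDA-like container: one instance per subject, holding most clinical
    and environmental observations as code -> value pairs."""
    subject_id: str
    observations: Dict[str, object] = field(default_factory=dict)


@dataclass
class GenotypeCall:
    """Genetic Variation-like entry: one SNP call for the subject."""
    snp_id: str
    genotype: str


@dataclass
class PedigreeRelative:
    """Pedigree-like entry, present only when family history was
    collected in structured form."""
    relationship: str
    hypertensive: Optional[bool] = None


@dataclass
class SubjectRecord:
    clinical: ClinicalDocument
    genotypes: List[GenotypeCall] = field(default_factory=list)
    pedigree: List[PedigreeRelative] = field(default_factory=list)

    def warehouse_rows(self) -> List[Tuple[str, str, str, object]]:
        """Flatten the three sources into generic (subject, source, key, value)
        rows, loosely mimicking an observation table in a data warehouse."""
        sid = self.clinical.subject_id
        rows = [(sid, "clinical", k, v) for k, v in self.clinical.observations.items()]
        rows += [(sid, "genotype", g.snp_id, g.genotype) for g in self.genotypes]
        rows += [(sid, "pedigree", r.relationship, r.hypertensive) for r in self.pedigree]
        return rows


record = SubjectRecord(
    ClinicalDocument("S001", {"sbp_mmHg": 152, "smoker": False}),
    genotypes=[GenotypeCall("snp_1", "AG")],
    pedigree=[PedigreeRelative("mother", hypertensive=True)],
)
for row in record.warehouse_rows():
    print(row)
```

Keeping the clinical document as the single per-subject anchor means genotype and pedigree data can be missing without breaking the record, which matches the "when available" wording for pedigree data.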
Workpackage 3 “Genome Wide Genotyping”:
Considerable effort was devoted to preliminary tests and quality checks between the two laboratories in charge of the genetic analysis. After these checks, the two laboratories focused on genotyping the Discovery Sample DNA (2,000 cases and 2,000 controls), which includes Caucasian samples from all over Europe.
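The report does not list the specific cross-laboratory checks. One common check, assumed here, is to genotype a subset of the same DNA samples in both laboratories and compute the concordance of their calls. The sketch below does this for calls keyed by (sample, SNP); the function name, keys and genotypes are hypothetical.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str]  # (sample_id, snp_id)


def genotype_concordance(lab_a: Dict[Key, Optional[str]],
                         lab_b: Dict[Key, Optional[str]]) -> Tuple[float, int]:
    """Share of identical genotype calls between two laboratories.

    Only (sample, SNP) pairs called by both labs are compared; a missing
    or None call is skipped. Genotypes are compared as unordered allele
    pairs, so "AG" and "GA" count as concordant."""
    compared = concordant = 0
    for key, call_a in lab_a.items():
        call_b = lab_b.get(key)
        if call_a is None or call_b is None:
            continue
        compared += 1
        if sorted(call_a) == sorted(call_b):
            concordant += 1
    rate = concordant / compared if compared else float("nan")
    return rate, compared


# Hypothetical duplicate calls from the two laboratories.
lab_a = {("s1", "snp_1"): "AG", ("s1", "snp_2"): "CC", ("s2", "snp_1"): "GG", ("s2", "snp_2"): None}
lab_b = {("s1", "snp_1"): "GA", ("s1", "snp_2"): "CT", ("s2", "snp_1"): "GG"}
rate, n = genotype_concordance(lab_a, lab_b)
print(f"concordance {rate:.3f} over {n} shared calls")
```

Reporting the number of shared calls alongside the rate matters, because a high concordance over very few comparable calls says little about lab agreement.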
Workpackage 5 “Development of machine learning techniques for genomic analysis”:
Two approaches have been taken. The first aims at deriving generalizations as to which SNPs are relevant for disease, based on a higher-level analysis in the meta-feature space. The second, based on large-scale information-theoretic analysis, is currently being applied. Preliminary results appear interesting.
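The report does not specify which information-theoretic measure is used. A natural example, assumed here, is the empirical mutual information between each SNP's genotype and the case/control status, used to rank SNPs. The sketch below is a minimal version with hypothetical function names and toy data; a large-scale analysis would also need to handle missing calls and the small-sample bias of plug-in estimates.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def mutual_information(genotypes: Sequence[int], phenotypes: Sequence[int]) -> float:
    """Plug-in estimate of mutual information (in bits) between a SNP's
    genotype (e.g. 0/1/2 copies of the minor allele) and a discrete phenotype."""
    n = len(genotypes)
    if n == 0 or n != len(phenotypes):
        raise ValueError("genotypes and phenotypes must be non-empty and aligned")
    joint = Counter(zip(genotypes, phenotypes))
    g_marg = Counter(genotypes)
    y_marg = Counter(phenotypes)
    mi = 0.0
    for (g, y), count in joint.items():
        p_gy = count / n
        mi += p_gy * math.log2(p_gy / ((g_marg[g] / n) * (y_marg[y] / n)))
    return mi


def rank_snps_by_mi(genotype_matrix: Dict[str, List[int]],
                    phenotypes: List[int]) -> List[Tuple[str, float]]:
    """Score every SNP (snp_id -> genotypes aligned with phenotypes) and
    return (snp_id, MI) pairs sorted from most to least informative."""
    scores = {snp: mutual_information(g, phenotypes) for snp, g in genotype_matrix.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: snp_B tracks case/control status perfectly, snp_A not at all.
status = [0, 0, 1, 1]
print(rank_snps_by_mi({"snp_A": [0, 1, 0, 1], "snp_B": [0, 0, 2, 2]}, status))
```

Unlike a single-SNP trend test, mutual information makes no assumption about the shape of the genotype-phenotype relationship, which is one reason it is attractive for exploratory screening.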
Workpackage 11 “Project Dissemination”:
The project partners participated in several meetings, congresses and interviews concerning the HYPERGENES project, its characteristics and its objectives. The mobility of PhD students and young researchers within the project research areas has also been promoted.
An appropriate corporate image for the project was selected during the first months of HYPERGENES, and the project website was implemented, including a section for the general public and a section reserved for HYPERGENES Consortium members.
CP-CSA-NoE 2nd Periodic Report Summary
Period covered: 01/01/2009 - 31/12/2009
The project is focused on the definition of a comprehensive genetic epidemiological model of complex traits such as Essential Hypertension (EH), of intermediate phenotypes such as hypertension-dependent or hypertension-associated Target Organ Damage (TOD), and of other endophenotypes such as the pharmacogenomic response to two drugs widely used in EH, namely hydrochlorothiazide and losartan.
The discovery of the genetic component in common complex diseases is extremely challenging since most of them are multifactorial and since the genetic component is likely to be described by the interactions of several genes involved in the disease pathway, each predisposing imperceptibly to the disease. HYPERGENES adopts the Genome Wide Association (GWA) approach to identify common variants contributing to the inherited component of common diseases.
The HYPERGENES project is structured in three steps:
After its first two years, the HYPERGENES project has reached the end of the Discovery Phase and has just entered the Validation Phase.
The Discovery Phase focused on building the methodological and technical framework to support the Genome Wide Association analysis, which was performed on 4,000 Caucasian subjects recruited from historical, well-characterized European cohorts of different ethnic backgrounds (North-Western Europe, Eastern Europe, Southern Europe and Sardinia) who met the project definitions of cases and controls.
The HYPERGENES Consortium decided to maintain a very clear separation between cases and controls, selecting cases among well-defined hypertensives and controls among normotensives with little chance of developing hypertension later in life.
Considerable effort was spent on methodological issues related to setting up and interpreting the data. The need to integrate observations obtained from different studies posed significant challenges, which were addressed through an integrated epidemiological and bioinformatics approach.
All these efforts led to the development of the Biomedical Information Infrastructure (BII), the platform that supports the entry, persistence and retrieval of data and knowledge relevant to EH, including clinical, environmental and genotypic data.
Genotyping was performed with high-throughput Illumina technology, thanks to the coordinated efforts of the laboratories of the Universities of Milan and Lausanne. The analyses followed different methodologies for genetic analysis, including classical and machine learning techniques, to produce an enriched list of SNPs (single nucleotide polymorphisms) associated with EH, TOD or other endophenotypes relevant to hypertension.
The case-control association study conducted on EH led to about three hundred significant associations, which only partially overlapped with the results of previous studies.
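The report does not say which test or significance threshold produced these associations. As a reference point, the sketch below shows one classical per-SNP case-control test, the allelic 2x2 chi-square test (without continuity correction) with the allelic odds ratio, assumed here purely for illustration. Function name and genotype counts are hypothetical.

```python
import math
from typing import Tuple

Counts = Tuple[int, int, int]  # genotype counts (n_AA, n_Aa, n_aa)


def allelic_chi_square(cases: Counts, controls: Counts) -> Tuple[float, float, float]:
    """Allelic 2x2 chi-square test for one SNP (no continuity correction).

    Each individual contributes two alleles. Returns (chi2, p_value,
    allelic_odds_ratio), the odds ratio being for allele A in cases
    versus controls."""
    def allele_counts(n_hom_a: int, n_het: int, n_hom_b: int) -> Tuple[int, int]:
        return 2 * n_hom_a + n_het, n_het + 2 * n_hom_b

    a, b = allele_counts(*cases)     # cases: A alleles, a alleles
    c, d = allele_counts(*controls)  # controls: A alleles, a alleles
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("an allele or group has zero counts; test undefined")
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # chi-square(1) upper tail
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return chi2, p_value, odds_ratio


# Hypothetical genotype counts for 1,000 cases and 1,000 controls.
chi2, p, odds = allelic_chi_square((400, 450, 150), (250, 500, 250))
print(f"chi2 = {chi2:.1f}, p = {p:.1e}, allelic OR = {odds:.2f}")
```

With about one million SNPs tested, a per-SNP p-value has to be judged against a multiple-testing-adjusted threshold rather than 0.05.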
TOD and endophenotypes were analysed as quantitative traits, and preliminary results include hundreds of significant associations for each phenotype considered. As for EH, only some of the identified genes had already been described in previous studies.
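For quantitative traits, a common per-SNP model, assumed here since the report does not describe it, is a linear regression of the trait on allele dosage (0, 1 or 2 copies of the coded allele), i.e. an additive genetic model. The sketch below fits it by ordinary least squares without covariates (a real analysis would typically adjust for factors such as age and sex) and uses a large-sample normal approximation for the p-value. Function name and data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def additive_qt_association(dosages: Sequence[float],
                            trait: Sequence[float]) -> Tuple[float, float, float, float]:
    """Simple linear regression of a quantitative trait on allele dosage
    (additive model, no covariates). Returns (beta, se, t, p), where p is a
    two-sided normal approximation, reasonable only for large samples."""
    n = len(dosages)
    if n != len(trait) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("monomorphic SNP: dosage has no variance")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, trait))
    beta = sxy / sxx                   # trait change per copy of the allele
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, trait))
    se = math.sqrt(rss / (n - 2) / sxx)
    t = beta / se
    p = math.erfc(abs(t) / math.sqrt(2.0))
    return beta, se, t, p


# Hypothetical dosages and trait values for six individuals.
beta, se, t, p = additive_qt_association([0, 0, 1, 1, 2, 2], [100, 102, 105, 107, 110, 112])
print(f"beta = {beta:.2f} per allele (SE {se:.3f}), t = {t:.2f}")
```

The slope is read directly as the average trait change per copy of the coded allele, which makes effects comparable across TOD measures expressed in their own units.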
To re-test specific genetic variants found to be associated with hypertension, TOD or other endophenotypes within the HYPERGENES project (as well as variants reported in previous studies or proposed from a priori knowledge by the HYPERGENES participants), we then designed a custom Illumina Infinium chip holding 15,000 SNPs. The chip will be used to validate the results in a new, independent sample.
In the next year of the project, we plan to verify the findings obtained in the Discovery Sample on this independent Confirmation Sample. This will be followed by the definition of a disease model and the design of a lab-on-chip containing only the most informative SNPs derived from the previous activities.