PsyGeNET database information
PsyGeNET contains information on eight psychiatric disorder classes, namely:
|Long Name||Short Name||Acronym|
|Alcohol use disorders||Alcohol UD||AUD|
|Bipolar disorders and related disorders||Bipolar disorder||BD|
|Schizophrenia spectrum and other psychotic disorders||Schizophrenia||SCHZ|
|Cocaine use disorders||Cocaine UD||CUD|
|Substance induced depressive disorder||SI-Depression||SI-DEP|
|Cannabis use disorders||Cannabis UD||CanUD|
|Substance induced psychosis||SI-Psychosis||SI-PSY|
Each psychiatric disorder class has been defined using concepts from the UMLS Metathesaurus. Each gene-disease association is then linked to a specific UMLS disease concept.
PsyGeNET data is organized into Psycur15 (from the first release of PsyGeNET) and Psycur16 (current 2.0 release). Note that all the information contained in the database has been curated by experts.
Psycur15: Genes associated with alcohol use disorders, bipolar disorders and related disorders, depressive disorders and cocaine use disorders. It contains 1537 associations between 579 genes and 32 psychiatric disease concepts. The information was extracted by text mining from MEDLINE abstracts published between 1980 and 2013, followed by expert curation. The curation process is described in Gutiérrez-Sacristán et al., Bioinformatics 2015.
Psycur16: Genes associated with 8 psychiatric disease classes (see PsyGeNET diseases). The information has been extracted from MEDLINE (1980 to 2015) using BeFree and curated by domain experts (more details in PsyGeNET curation process).
The current version of PsyGeNET (v2.0) contains 3,771 associations between 1,549 genes and 117 diseases (UMLS CUIs describing: alcohol use disorders, bipolar disorders and related disorders, depressive disorders, schizophrenia spectrum and other psychotic disorders, cocaine use disorders, substance-induced depressive disorder, cannabis use disorders, substance-induced psychosis).
Genes associated to each psychiatric disease class
Genes shared between pairs of psychiatric disease classes.
Barplot showing number of genes per disease class (blue) and number of genes unique to each disease
Gene-disease associations according to each disease class.
(*) All the previous graphics have been generated with the psygenet2r package.
During curation of gene-disease associations (GDAs), we found publications that supported the association between a gene and a disease, while other works found just the opposite (that the gene is not associated with the disease). The latter is generally referred to as a negative finding in the literature. In PsyGeNET we consider it important to keep track of both “positive” and “negative” findings and to let users make their own judgements based on the available evidence. Thus, for each GDA and each supporting publication, we include the Association type. According to the evidence, there are two types: “Association” and “No Association” (i.e. the “negative findings”). This information is available in the “All associations evidences” tab.
In addition to indicating the association type, we reflect the variety in the evidence for a gene-disease association in the “Evidence index” (EI). This index works like a traffic light: it is green when all the evidence reviewed by the experts supports an association between the gene and the disease (Association, EI = 1); yellow when the evidence for the GDA is contradictory (some publications support the association while others do not, 0 < EI < 1); and red when all the evidence reviewed by the experts reports no association between the gene and the disease (No Association, EI = 0). Note that the experts validated a maximum of 5 publications for each GDA, selected as the most recent ones.
This information is available in numeric format in the “Summary of All Associations” tab. Given a set of genes of interest, the psygenet2r package can display the evidence index as a heatmap, with genes on the X axis and disorders on the Y axis, and each cell colored red, yellow or green according to the EI value (more details in R package: psygenet2r).
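The traffic-light behaviour of the EI can be illustrated with a minimal Python sketch. It assumes the EI is the fraction of reviewed publications labelled “Association”; the text above only fixes the end points (1 and 0) and the intermediate range, so this exact formula is an assumption. The function names `evidence_index` and `traffic_light` are hypothetical and are not part of psygenet2r.

```python
from typing import Sequence


def evidence_index(labels: Sequence[str]) -> float:
    """Return the share of publications labelled "Association".

    Assumption: EI = supporting publications / all reviewed publications.
    """
    if not labels:
        raise ValueError("at least one reviewed publication is required")
    supporting = sum(1 for label in labels if label == "Association")
    return supporting / len(labels)


def traffic_light(ei: float) -> str:
    """Map an EI value to the colour described in the text."""
    if ei == 1:
        return "green"   # all reviewed evidence supports the association
    if ei == 0:
        return "red"     # all reviewed evidence reports no association
    return "yellow"      # contradictory evidence


# Invented example data, for illustration only.
examples = {
    "all supporting": ["Association"] * 3,
    "contradictory": ["Association", "No Association"],
    "all negative": ["No Association"] * 2,
}
for name, labels in examples.items():
    ei = evidence_index(labels)
    print(f"{name}: EI = {ei:.2f} ({traffic_light(ei)})")
```

The labels passed in are the per-publication Association types, so a GDA backed by a single supporting publication would be green under this sketch.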
Each disorder class in PsyGeNET, such as Schizophrenia or Depression, is defined as a set of diseases identified by UMLS CUIs. Some of these diseases (UMLS CUIs) are associated with many genes in the same disease class, while other diseases are associated with only a few. The Disease Load, also referred to as the Disease share, is a measure of this property of the diseases. It is the number of genes associated with a disease divided by the total number of genes associated with its disease class. For example, the Schizophrenia class is defined by 24 UMLS concepts. One of these concepts, Schizophrenia (umls:C0036341), has the largest Disease share in its class (0.95) because it is annotated to 861 of the 903 genes in the class. On the other hand, Catatonic schizophrenia (umls:C0036344) has a much smaller share, since it is associated with only 4 genes.
The formula of the Disease share is as follows:
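Written out from the definition above, the Disease share can be expressed as follows. The symbols are chosen here for illustration and are not the database's own notation: G_d is the set of genes associated with disease concept d, and G_C is the set of genes associated with any concept in disease class C.

```latex
\[
  \text{Disease share}(d, C) = \frac{|G_d|}{|G_C|}, \qquad d \in C
\]
% Worked example using the Schizophrenia class figures quoted in the text:
\[
  \text{Schizophrenia (C0036341)}: \ \frac{861}{903} \approx 0.95
  \qquad
  \text{Catatonic schizophrenia (C0036344)}: \ \frac{4}{903} \approx 0.004
\]
```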
The update of the PsyGeNET database involved: i) recruiting a team of experts to curate the information extracted by text mining; ii) extracting gene-disease associations (GDAs) from the literature using the text mining system BeFree (Bravo et al., 2015); iii) developing a curation workflow; iv) developing a web-based annotation tool to facilitate the curation task; and v) defining detailed guidelines to assist the curation task.
We put in place a curation workflow including a pilot phase and two curation and analysis phases (see Figure 1). During the pilot phase, the curators received their initial training, including how to use the curation tool. After this process, both the curation tool and the annotation guidelines were improved and the first curation phase (Curation Phase I) was launched to evaluate 2,507 GDAs identified by text mining and supported by 4,065 publications (from 1980 to 2015). The results of the curation were analyzed to estimate the inter-annotator agreement at the abstract level. The validations for which no agreement was reached in Curation Phase I were then reviewed by a third expert during Curation Phase II (results not reported here). Four experts participated in this phase. Only the validations on which at least 2 experts agreed have been included in the database. For more detailed information on the process, see this publication: Gutiérrez-Sacristán et al. Text mining and expert curation to develop a database on psychiatric diseases and their genes. Proceedings of the 7th International Symposium on Semantic Mining in Biomedicine. Potsdam, Germany, August 4-5, 2016.
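The inclusion rule (agreement of at least 2 experts) can be sketched in Python as below. This is an illustration under stated assumptions, not the curation tool's actual logic: the function name `agreed_label` is hypothetical, and treating a tie between labels as "no agreement" is an assumption, since the text does not say how ties are resolved.

```python
from collections import Counter
from typing import Optional, Sequence


def agreed_label(validations: Sequence[str], min_agreement: int = 2) -> Optional[str]:
    """Return the label that at least `min_agreement` experts chose, else None.

    Assumption: if two labels are tied for the top count, no agreement
    is reported and the validation would not enter the database.
    """
    counts = Counter(validations).most_common()
    if not counts:
        return None
    top_label, top_count = counts[0]
    tied = len(counts) > 1 and counts[1][1] == top_count
    if top_count < min_agreement or tied:
        return None
    return top_label


# Phase I: two curators agree, so the validation is kept.
print(agreed_label(["Association", "Association"]))
# Phase I disagreement: no label yet, a third expert is needed.
print(agreed_label(["Association", "No Association"]))
# Phase II: the third expert's validation settles the case.
print(agreed_label(["Association", "No Association", "No Association"]))
```

Keeping the threshold as a parameter makes it easy to see how a stricter rule would change which validations are retained.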
A team of 22 experts from different domains (such as psychiatry, neuroscience, medicine, psychology and biology) was recruited from the Spanish Network of Addiction and other collaborators of the coordination team (Research Group on Integrative Biomedical Informatics (GRIB)) to participate in the curation process.
|Marta Portero Tresserra||Universitat Pompeu Fabra|
|Olga Valverde||Universitat Pompeu Fabra Grup de Recerca en Neurobiologia del Comportament (GReNeC). Departament de Ciències Experimentals i de la Salut|
|Antonio Armario||Universitat Autònoma de Barcelona|
|Mª Carmen Blanco Gandía||Universitat de Valencia Department of Psychobiology, Facultad de Psicología|
|Adriana Farré||Hospital del Mar, Medical Research Institute (IMIM) Institut de Neuropsiquiatria i Addiccions|
|Lierni Fernández-Ibarrondo||Hospital del Mar, Medical Research Institute (IMIM)|
|Francina Fonseca||Hospital del Mar, Medical Research Institute (IMIM) Institut de Neuropsiquiatria i Addiccions; Departament de Psiquiatria, Universitat Autònoma de Barcelona|
|Jesús Giraldo||Universitat Autònoma de Barcelona, Institut de Neurociències and Unitat de Bioestadística Network Biomedical Research Center on Mental Health (CIBERSAM)|
|Angela Leis Machín||Universitat Pompeu Fabra (UPF)- Department of Experimental and Health Sciences Research Programme on Biomedical Informatics (GRIB)- Hospital del Mar Medical Research Institute (IMIM)|
|Anna Mané Santacana||Hospital del Mar, Medical Research Institute (IMIM) Department of Neuroscience and Psychiatry; Centro de Investigación en Red de Salud Mental (CIBERSAM)|
|Miguel A. Mayer||Universitat Pompeu Fabra (UPF)- Department of Experimental and Health Sciences Research Programme on Biomedical Informatics (GRIB)- Hospital del Mar Medical Research Institute (IMIM)|
|Sandra Montagud Romero||Universitat de Valencia Department of Psychobiology, Facultad de Psicología|
|Roser Nadal||Universitat Autònoma de Barcelona Institut de Neurociències and Psychobiology Unit|
|Jordi Ortiz||Universitat Autònoma de Barcelona Neuroscience Institute and Department of Biochemistry and Molecular Biology|
|Francisco Javier Pavon-Moron||Hospital Regional Universitario de Málaga-Universidad de Málaga Unidad Gestión Clínica de Salud Mental. Instituto de Investigación Biomédica de Málaga (IBIMA)|
|Ezequiel Jesús Pérez Sánchez||Parc de Salut Mar, Barcelona Institut de Neuropsiquiatria i Addiccions (INAD)|
|Marta Rodriguez-Arias||Universitat de Valencia Department of Psychobiology, Facultad de Psicología|
|Antonia Mª Serrano Criado||Hospital Regional Universitario de Málaga-Universidad de Málaga Unidad Gestión Clínica de Salud Mental. Instituto de Investigación Biomédica de Málaga (IBIMA)|
|Marta Torrens||Hospital del Mar, Medical Research Institute (IMIM) Institut de Neuropsiquiatria i Addiccions; Departament de Psiquiatria, Universitat Autònoma de Barcelona|
|Vincent Warnault||Universitat Pompeu Fabra Department of Experimental and Health Sciences|
|Alba Gutierrez-Sacristan||Universitat Pompeu Fabra (UPF)- Department of Experimental and Health Sciences Research Programme on Biomedical Informatics (GRIB)- Hospital del Mar Medical Research Institute (IMIM)|
|Laura I. Furlong|