Assessment of disease named entity recognition on a corpus of annotated sentences
Authors: Jiménez Ruiz, Ernesto; Jimeno Yepes, Antonio José; Lee, Vivian; Gaudan, Sylvain; Berlanga Llavori, Rafael; Rebholz-Schuhmann, Dietrich
Metadata
Title: Assessment of disease named entity recognition on a corpus of annotated sentences
Publication date: 2008
Publisher: BioMed Central
ISSN: 1471-2105
Document type: info:eu-repo/semantics/article
Publisher's version: http://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-S3-S3
Version: info:eu-repo/semantics/publishedVersion
Abstract
Background: In recent years, the recognition of semantic types from the biomedical scientific literature has been
focused on named entities like protein and gene names (PGNs) and gene ontology terms (GO terms). Other semantic
types like diseases have not received the same level of attention. Different solutions have been proposed to identify
disease named entities in the scientific literature. While matching the terminology with language patterns suffers from
low recall (e.g., Whatizit), other solutions make use of morpho-syntactic features to better cover the full scope of
terminological variability (e.g., MetaMap). Currently, MetaMap, which is provided by the National Library of Medicine
(NLM), is the state-of-the-art solution for the annotation of concepts from UMLS (Unified Medical Language System) in
the literature. Nonetheless, its performance has not yet been assessed on an annotated corpus. In addition, little effort
has been invested so far to generate an annotated dataset that links disease entities in text to disease entries in a database,
thesaurus or ontology and that could serve as a gold standard to benchmark text mining solutions.
Results: As part of our research work, we have taken a corpus that has been delivered in the past for the identification
of associations of genes to diseases based on the UMLS Metathesaurus and we have reprocessed and re-annotated the
corpus. We have gathered annotations for disease entities from two curators, analyzed their disagreement (kappa
statistic of 0.51), and composed a single annotated corpus for public use. Thereafter, three solutions for disease named
entity recognition including MetaMap have been applied to the corpus to automatically annotate it with UMLS
Metathesaurus concepts. The resulting annotations have been benchmarked to compare their performance.
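
As an illustration of the agreement figure reported above, the following is a minimal sketch of how a kappa statistic between two curators can be computed. The abstract does not state which kappa variant was used, so Cohen's kappa is an assumption here, and the function name `cohen_kappa` and the toy labels are hypothetical and not taken from the corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: the items are aligned one-to-one (same order, same length).
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Invented toy labels, one per candidate mention (not real corpus data).
curator_1 = ["disease", "disease", "none", "none", "disease", "none"]
curator_2 = ["disease", "none", "none", "none", "disease", "disease"]
print(round(cohen_kappa(curator_1, curator_2), 2))
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is usually lower than the simple percentage of matching labels.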
Conclusions: The annotated corpus is publicly available at ftp://ftp.ebi.ac.uk/pub/software/textmining/corpora/diseases
and can serve as a benchmark for other systems. In addition, we found that dictionary look-up already provides
competitive results, indicating that the use of disease terminology is highly standardized throughout the terminologies and
the literature. MetaMap generates precise results at the expense of insufficient recall, while our statistical method obtains
better recall at a lower precision rate. Even better results in terms of precision are achieved by combining at least two
of the three methods, but this approach again lowers recall. Altogether, our analysis gives a better understanding
of the complexity of disease annotations in the literature. MetaMap and the dictionary-based approach are available
through the Whatizit web service infrastructure (Rebholz-Schuhmann D, Arregui M, Gaudan S, Kirsch H, Jimeno A: Text
processing through Web services: Calling Whatizit. Bioinformatics 2008, 24:296-298).
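
The precision and recall trade-off described in the conclusions can be sketched as follows. This is a hedged illustration, not the authors' evaluation code: the annotation format (pairs of sentence id and concept id), the helper names `precision_recall` and `agree_at_least`, and all data are assumptions invented for the example.

```python
from collections import Counter


def precision_recall(predicted, gold):
    """Precision and recall of a predicted annotation set against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def agree_at_least(annotation_sets, k=2):
    """Keep only annotations proposed by at least k of the given systems."""
    counts = Counter(a for annotations in annotation_sets for a in set(annotations))
    return {a for a, c in counts.items() if c >= k}


# Invented toy annotations: (sentence id, hypothetical concept id).
gold = {(1, "C1"), (1, "C2"), (2, "C3"), (3, "C4")}
system_a = {(1, "C1"), (2, "C3"), (2, "C7")}
system_b = {(1, "C1"), (2, "C3")}
system_c = {(1, "C1"), (1, "C2"), (2, "C9"), (3, "C4"), (3, "C8")}

combined = agree_at_least([system_a, system_b, system_c], k=2)
for name, output in [("a", system_a), ("b", system_b), ("c", system_c), ("combined", combined)]:
    p, r = precision_recall(output, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

Requiring agreement between systems filters out annotations that only one system proposes, which tends to remove false positives but also discards correct annotations found by a single system.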
Published in: BMC Bioinformatics; 9, Suppl 3:3
Access rights: © Jimeno et al.; licensee BioMed Central Ltd. 2008
info:eu-repo/semantics/openAccess
Appears in collections:
- LSI_Articles [362]