Knowledge based word-concept model estimation and refinement for biomedical text mining

Jimeno Yepes, Antonio José; Berlanga Llavori, Rafael

dc.contributor.author	Jimeno Yepes, Antonio José
dc.contributor.author	Berlanga Llavori, Rafael
dc.date.accessioned	2016-02-22T10:05:58Z
dc.date.available	2016-02-22T10:05:58Z
dc.date.issued	2014-12
dc.identifier.citation	YEPES, Antonio Jimeno; BERLANGA, Rafael. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of biomedical informatics, 2015, 53: 300-307.	ca_CA
dc.identifier.uri	http://hdl.handle.net/10234/150910
dc.description.abstract	Text mining of scientific literature has been essential for setting up large public biomedical databases, which are widely used by the research community. In the biomedical domain, the existence of a large number of terminological resources and knowledge bases (KBs) has enabled a myriad of machine learning methods for different text mining tasks. Unfortunately, KBs have been devised for human interpretation rather than for text mining, so the performance of KB-based methods is usually lower than that of supervised machine learning methods. The disadvantage of supervised methods, though, is that they require labeled training data and are therefore not useful for large-scale biomedical text mining systems. KB-based methods do not have this limitation. In this paper, we describe a novel method to generate word-concept probabilities from a KB, which can serve as a basis for several text mining tasks. This method takes into account not only the underlying patterns within the descriptions contained in the KB but also those in texts available from large unlabeled corpora such as MEDLINE. The parameters of the model have been estimated without training data. Patterns from MEDLINE have been built using MetaMap for entity recognition and related using co-occurrences. The word-concept probabilities were evaluated on the task of word sense disambiguation (WSD). The results showed that our method obtained higher accuracy than other state-of-the-art approaches when evaluated on the MSH WSD data set. We also evaluated our method on the task of document ranking using MEDLINE citations. These results also showed an increase in performance over existing baseline retrieval approaches.	ca_CA
dc.description.sponsorShip	The work was supported by the CICYT Project TIN2011–24147 from the Spanish Ministry of Economy and Competitiveness (MINECO).	ca_CA
dc.format.extent	37 p.	ca_CA
dc.format.mimetype	application/pdf	ca_CA
dc.language.iso	eng	ca_CA
dc.publisher	Copyright © 2014 Elsevier Inc.	ca_CA
dc.relation.isPartOf	Journal of biomedical informatics, 2015, 53: 300-307	ca_CA
dc.rights	Copyright © 2014 Elsevier Inc.	ca_CA
dc.rights.uri	http://rightsstatements.org/vocab/InC/1.0/	*
dc.subject	word-concept probability	ca_CA
dc.subject	text mining	ca_CA
dc.subject	word sense disambiguation	ca_CA
dc.subject	information retrieval	ca_CA
dc.subject	biomedical literature	ca_CA
dc.title	Knowledge based word-concept model estimation and refinement for biomedical text mining	ca_CA
dc.type	info:eu-repo/semantics/article	ca_CA
dc.identifier.doi	10.1016/j.jbi.2014.11.015
dc.rights.accessRights	info:eu-repo/semantics/openAccess	ca_CA
dc.relation.publisherVersion	http://www.sciencedirect.com/science/article/pii/S1532046414002676	ca_CA
dc.edition	Preprint, author's version	ca_CA
dc.type.version	info:eu-repo/semantics/submittedVersion

Files in this item

Name: jbi_jimeno_berlanga_preprint.pdf
Size: 598.0Kb
Format: PDF
Description: Preprint version


This item appears in the following collection(s)

LSI_Articles [362]
Journal articles written by faculty of the Departament de Llenguatges i Sistemes Informàtics (Department of Computer Languages and Systems)
