University of Sussex

Rule-based Protein Term Identification with Help from Automatic Species Tagging

Speaker

Xinglong Wang

Affiliation

Edinburgh

Abstract

In biomedical articles, the same protein mention can refer to many different protein entities. For example, an arbitrary occurrence of the term p53 might denote any one of thousands of proteins across a number of species. A human annotator can resolve this ambiguity relatively easily by looking at the context and, if necessary, searching an appropriate protein database. A text mining system, however, does not understand human language and so cannot readily identify the protein that the term refers to. In this paper, we present a Term Identification system that automatically assigns unique identifiers, as found in a protein database, to ambiguous protein mentions in text. Unlike other solutions reported in the literature, which work on gene/protein mentions in a specific model organism, our system tackles protein mentions across many species by integrating a machine-learning-based species tagger. We have compared the performance of our automatic system with that of human annotators, with very promising results.
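
The idea of using a species tag to choose among candidate identifiers can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the lexicon entries and identifiers are placeholders rather than real database accessions, the function names are hypothetical, and the cue-word counter merely stands in for the machine-learning-based species tagger, whose actual design is not described here.

```python
from typing import Optional

# Hypothetical toy lexicon: (normalised term, species) -> identifier.
# The identifiers are placeholders, not real protein database accessions.
LEXICON = {
    ("p53", "human"): "ID_HUMAN_P53",
    ("p53", "mouse"): "ID_MOUSE_P53",
}

# Hypothetical cue words. A simple stand-in for a learned species tagger.
SPECIES_CUES = {
    "human": ("human", "patient", "homo sapiens"),
    "mouse": ("mouse", "mice", "murine"),
}


def tag_species(context: str) -> Optional[str]:
    """Return the species whose cue words occur most often, or None."""
    text = context.lower()
    counts = {
        species: sum(text.count(cue) for cue in cues)
        for species, cues in SPECIES_CUES.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def identify(term: str, context: str) -> Optional[str]:
    """Map an ambiguous mention to an identifier using the tagged species."""
    species = tag_species(context)
    if species is None:
        return None  # no species evidence, so the mention stays unresolved
    return LEXICON.get((term.lower(), species))


if __name__ == "__main__":
    print(identify("p53", "Expression of p53 in murine fibroblasts"))
    print(identify("p53", "p53 is a tumour suppressor"))
```

In this sketch the same mention resolves to a different identifier depending on the species found in its context, and it is left unresolved when the context gives no species evidence.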

