Instinctive Mining Of Protein Names From Biomedical Text
Author(s)
B. V. Subba Rao, Dr. K. V. Sambasiva Rao
Published Date
September 11, 2024
Volume / Issue
Vol. 5 / Issue 3
Abstract
Automated information extraction from biomedical literature is important because a vast amount of such literature has been published. Recognition of biomedical named entities is the first step in information extraction. With the increasing amount of biomedical text, there is a need for automatic extraction of information to support biomedical researchers. Because biomedical information databases are incomplete, extraction using dictionaries alone is not straightforward, and several approaches using contextual rules and machine learning have previously been proposed. Our work is inspired by these approaches but is novel in that it is fully automatic and does not rely on expert-tagged corpora. The main ideas are 1) unigram tagging of corpora using known protein names to produce training examples for the protein name extraction classifier, and 2) tight positive and negative examples, with protein-related words as negative examples and protein names/synonyms as positive examples. We present preliminary results on Medline abstracts about gastrin; further work will test the approach on BioCreative benchmark data sets.
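The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word lists, the label names (POS, NEG, O), the tokenizer, and the context window are all assumptions made for illustration, since the abstract does not specify them. The sketch tags each token of a sentence by dictionary lookup and then collects the tagged tokens, with their neighbouring words, as training examples for a downstream classifier.

```python
import re

# Hypothetical word lists; the article does not publish its dictionaries.
PROTEIN_NAMES = {"gastrin", "cck2r", "somatostatin"}        # positive examples
PROTEIN_RELATED = {"receptor", "kinase", "hormone", "peptide"}  # negative examples

# Simple tokenizer (an assumption): alphanumeric runs, optionally hyphenated.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")


def tag_unigrams(text):
    """Label each token POS (known protein name), NEG (protein-related word) or O."""
    tagged = []
    for token in TOKEN_RE.findall(text):
        key = token.lower()
        if key in PROTEIN_NAMES:
            label = "POS"
        elif key in PROTEIN_RELATED:
            label = "NEG"
        else:
            label = "O"
        tagged.append((token, label))
    return tagged


def training_examples(tagged, window=1):
    """Return (token, left context, right context, label) for POS and NEG tokens."""
    examples = []
    for i, (token, label) in enumerate(tagged):
        if label == "O":
            continue  # untagged tokens serve only as context
        left = [t for t, _ in tagged[max(0, i - window):i]]
        right = [t for t, _ in tagged[i + 1:i + 1 + window]]
        examples.append((token, left, right, label))
    return examples


if __name__ == "__main__":
    sentence = "Gastrin binds the CCK2R receptor."
    for example in training_examples(tag_unigrams(sentence)):
        print(example)
```

Using protein-related words rather than arbitrary words as negatives keeps the two classes close together, which is one plausible reading of "tight" examples in the abstract; the context features a real classifier would use are left open here.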