Database management and knowledge acquisition by data and text mining

Research project

   Databases can only provide the user with information that has been entered and indexed in a computer file. Unfortunately, data deposition is obligatory only for sequences and three-dimensional atomic coordinates. A large amount of other experimental data (e.g. mutation data, ligand binding data, expression data) is still scattered and buried in the literature, and is thus difficult to access. These data have to be retrieved and extracted manually, which costs a lot of time, people and thus money.

 

   Consequently, the next step in data handling is the design of techniques that automatically extract biologically relevant information from the literature. We are currently implementing a method to retrieve and extract point mutations from Medline abstracts and full-text papers. The extracted data are then validated with plausibility filters. For each mutation, the filters retrieve the corresponding molecule name, organism type and sequence.
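
   The extract-then-validate idea can be illustrated with a minimal sketch. It is not the method used in this project: the regular expressions, the function names extract_point_mutations and is_plausible, and the toy sequence are all assumptions made for illustration. The sketch also assumes the protein sequence is already known, whereas the filters described above retrieve the molecule name, organism and sequence themselves.

```python
import re

# Three-letter to one-letter amino acid codes (standard 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA1 = "ACDEFGHIKLMNPQRSTVWY"
AA3 = "|".join(THREE_TO_ONE)

# Hypothetical patterns: wild-type residue, position, mutant residue,
# written either as "D113N" or as "Asp113Asn".
ONE_LETTER = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")
THREE_LETTER = re.compile(rf"\b({AA3})(\d+)({AA3})\b")


def extract_point_mutations(text):
    """Return candidate mutations as (wild_type, position, mutant) tuples."""
    found = set()
    for wt, pos, mut in ONE_LETTER.findall(text):
        found.add((wt, int(pos), mut))
    for wt, pos, mut in THREE_LETTER.findall(text):
        found.add((THREE_TO_ONE[wt], int(pos), THREE_TO_ONE[mut]))
    # Sort by position so the output order is deterministic.
    return sorted(found, key=lambda m: (m[1], m[0], m[2]))


def is_plausible(mutation, sequence):
    """Assumed plausibility filter: the wild-type residue must occur at the
    stated 1-based position of the sequence, and the residue must change."""
    wt, pos, mut = mutation
    if wt == mut:
        return False
    if not 1 <= pos <= len(sequence):
        return False
    return sequence[pos - 1] == wt


if __name__ == "__main__":
    abstract = ("The D3N and Ala5Thr mutants abolished binding, "
                "unlike H2A in cell line T47D.")
    toy_sequence = "MHDKAG"  # made-up sequence, for illustration only
    for m in extract_point_mutations(abstract):
        print(m, "plausible" if is_plausible(m, toy_sequence) else "rejected")
```

   In this toy example the cell line name T47D matches the mutation pattern but is rejected by the sequence check, which is the kind of false positive a plausibility filter is meant to catch.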

 

   We applied the method to G protein-coupled receptors and nuclear receptors. The first results show that more than 75% of the extracted and validated point mutations are correct (1). Preliminary results for nuclear hormone receptors are available via the mutation data section of the NucleaRDB (2).


 




 

webmaster@cmpharm.ucsf.edu
Last modified by Erik Ellestad on 26 June 2002
Copyright 2002 The University of California, San Francisco, CA
Dept. of Cellular and Molecular Pharmacology