Imagine a future in which databases are populated with accurate, valid, exhaustive, rapidly updated data where users find what they want all the time; where drug discovery costs and development time are slashed and animal experimentation is reduced through early identification of unpromising paths; where new insights are gained through integration and exploitation of experimental results, databases, and scientific knowledge; where product development archives and patents yield new directions for R&D; and where searching yields facts rather than documents to read.
This is the potential of text mining.
The JISC, BBSRC and EPSRC have announced funding of £1m to establish a National Centre for Text Mining. The remit of the Centre, the first publicly funded text mining centre in the world, is to contribute to the associated national and international research agenda, to establish a service for the wider academic community, and to make connections with industry.
Text mining attempts to discover new, previously unknown information by applying techniques from natural language processing, data mining, and information retrieval. It has three aims:
> To identify and gather relevant textual sources
> To analyse these to extract facts involving key entities and their properties
> To combine the extracted facts to form new facts or to gain valuable insights.
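The three steps above can be illustrated with a deliberately small sketch. Everything in it is an assumption made for illustration: the corpus consists of invented sentences (not real findings), the function names `gather`, `extract`, and `combine` are hypothetical, and a single regular expression stands in for the natural language processing a real system would need.

```python
import re
from itertools import product

# Toy corpus: invented sentences for illustration only, not real findings.
DOCUMENTS = [
    "Compound A inhibits protein B.",
    "Protein B activates pathway C.",
    "The weather was pleasant in Manchester.",
    "Compound D inhibits protein E.",
]

# Hypothetical pattern: "<two-word subject> <verb> <two-word object>."
# A real system would use proper linguistic analysis instead of a regex.
FACT_PATTERN = re.compile(
    r"^(?P<subj>\w+ \w+) (?P<verb>inhibits|activates) (?P<obj>\w+ \w+)\.$",
    re.IGNORECASE,
)


def gather(documents, keywords):
    """Step 1: keep only documents that mention at least one keyword."""
    return [d for d in documents if any(k in d.lower() for k in keywords)]


def extract(documents):
    """Step 2: pull (subject, verb, object) facts out of each document."""
    facts = []
    for doc in documents:
        m = FACT_PATTERN.match(doc)
        if m:
            facts.append((m["subj"].lower(), m["verb"].lower(), m["obj"].lower()))
    return facts


def combine(facts):
    """Step 3: chain facts A->B and B->C into a candidate indirect link A->C."""
    known = {(s, o) for s, _, o in facts}
    candidates = set()
    for (s1, _, o1), (s2, _, o2) in product(facts, repeat=2):
        # Link two facts when the object of one is the subject of the other,
        # skipping loops and links that are already stated directly.
        if o1 == s2 and s1 != o2 and (s1, o2) not in known:
            candidates.add((s1, o1, o2))
    return sorted(candidates)


relevant = gather(DOCUMENTS, keywords=("protein", "compound"))
facts = extract(relevant)
links = combine(facts)
print(links)
```

Here the candidate link from "compound a" to "pathway c" is not stated in any single sentence; it appears only once the two extracted facts are combined, which is the kind of "new fact" the third step refers to.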
Text mining finds applications in many diverse areas of wide interest such as drug discovery and predictive toxicology, protein interaction, competitive intelligence, protection of the citizen, identification of new product possibilities, detection of links between lifestyle and states of health, and many more.
Led by UMIST, the National Centre for Text Mining will be run by an internationally leading consortium. The consortium has four UK partner institutions: UMIST, the Victoria University of Manchester, the University of Liverpool, and the University of Salford. These core partners are joined by international partners: the University of California Berkeley, the University of Geneva, the San Diego Supercomputing Centre, and the University of Tokyo, with the European Bioinformatics Institute having a presence on the Technical Directorate. The Centre is expected to take part in the related emerging networks of excellence.
The Centre will initially focus on biological and biomedical science. This area of science has the largest user community and the fastest growing literature, and it is the area where most applications research in text mining is being undertaken. At the same time, the tools developed by the Centre will be of interest and relevant to the needs of the wider academic community. A major challenge for the Centre will be to handle very large volumes of text, and the intermediate data produced while processing them, efficiently and robustly.