I spent yesterday at a one-day text and data mining (TDM) symposium in Cambridge. The event brought together researchers, librarians and publishers to discuss the challenges, triumphs and future of TDM in the UK and the wider research landscape. Presenters were using TDM in contexts ranging from systematic literature reviews in medicine to digital humanities and palaeography, and from chemistry to libraries. The full programme can be found here.
I want to focus on my two main takeaways from the symposium.
Firstly, the use of TDM most immediately relevant to me is its application in systematic reviews. Researchers are already using TDM at every stage of the systematic review process, particularly during searching and screening. Presenters such as Alison O’Mara-Eves and John McNaught were enthusiastic about the vast savings in time and money that could be made if TDM became a standard part of every systematic review. To test the effectiveness of TDM in systematic reviews rigorously, both presenters (along with Makoto Miwa and Sophie Ananiadou) carried out a systematic review of their own. Among other things, they found that while TDM saved an enormous amount of time and money, the trade-off was the loss of around 5 per cent of the references that ‘traditional’ systematic review searching methods would have picked up. In other words, TDM software is only as good as the terms that are put into it and the human being ‘training’ it to find relevant data.
Many TDM tools allow for some level of human involvement.
The second theme of most immediate relevance to me was that librarians need to get involved in TDM in whatever way we can, while retaining a degree of critical scepticism about its value. (Sure, it might save time, but can we really afford to lose the crucial 5 per cent of papers that traditional searching methods would have found?) As Georgina Cronin pointed out, librarians may not have the expert technical skills (such as writing TDM code) needed to carry out every aspect of TDM, but we can facilitate its use. There is a role for librarians to play at every point in the TDM workflow.
Librarians should also remember that we have particular skills and expertise that can support TDM in less obvious ways. We are good at finding information, communicating that information to others, and bringing people together, so why not use those skills to support the text and data miners working in our institutions?
I’m still chewing over everything I learnt at the symposium, but I suspect it will spark a few changes at my library. I’ve been having ongoing conversations (and debates) with colleagues about the applicability and validity of using TDM in systematic reviews in particular, and we have yet to reach a conclusion. I am also figuring out how to fit the University’s new TDM LibGuide into my own library’s existing research support resources and training, so expect to see it on our website and in other training materials soon! While I’m still forming my own views on TDM and its role in my profession, I am certain that librarians need to be in the room when discussions about TDM take place, and that we need to keep abreast of new resources and developments in the field. If we aren’t, we risk a consensus on this issue forming without us, and surely it’s better to be part of the conversation?