Monday, 20 February 2017

Thing 19: Text and data mining

I've seen Georgina's post on this topic.  Its most important advice is likely to be this:

"If you wish to use TDM in your work, we highly recommend that you ensure you are doing so legally and that you contact likeminded folk such as the team at ContentMine to ask for advice."

Will do.  My own post, like those of other participants, has to be short for want of experience.  I have not had occasion to use data mining in my own work, but I now know that any research query that sounds as though data mining would help towards the answer is a matter for ContentMine.

Meanwhile, I suppose I get a frisson of what data mining is like when I dabble in the Google Books Ngram Viewer.  This enables the user to search vast numbers of books for the occurrence of phrases and to chart how often they appear year by year.  By it I have satisfied my idle curiosity as to the frequency of use of the locution "And Oh!" (it seems to have peaked in 1842 and then slowly declined), and the relative frequency of the phrases "railway station" and "train station" (the latter overtook the former in 1994, and peaked in 2000; they now seem to be rapidly converging again).  But I am not an expert user of this site, and I increased my knowledge of it by around 150% in the past hour, revisiting it for this post.
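
For anyone curious about the kind of counting that might sit behind such a chart, here is a minimal sketch in Python.  It is only an illustration and makes no claim to be how the Ngram Viewer itself works: the tiny corpus is invented, and the names TOY_CORPUS, tokenize, phrase_counts_by_year and relative_frequency are hypothetical helpers made up for the example.

```python
from collections import Counter

# Hypothetical toy corpus: (year, text) pairs invented purely for illustration.
TOY_CORPUS = [
    (1990, "We met at the railway station and walked from the railway station home."),
    (1990, "The train station was closed."),
    (2000, "Meet me at the train station. The train station cafe is open."),
    (2000, "The old railway station is now a museum."),
]


def tokenize(text):
    """Lowercase the text and split it into words, treating punctuation as spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return cleaned.split()


def phrase_counts_by_year(corpus, phrase):
    """Count how many times `phrase` occurs in the texts of each year."""
    target = tokenize(phrase)
    n = len(target)
    counts = Counter()
    for year, text in corpus:
        words = tokenize(text)
        # Slide a window of n words across the text and compare with the phrase.
        for i in range(len(words) - n + 1):
            if words[i:i + n] == target:
                counts[year] += 1
    return dict(counts)


def relative_frequency(corpus, phrase):
    """Divide each year's count by the number of same-length word windows that year."""
    n = len(tokenize(phrase))
    totals = Counter()
    for year, text in corpus:
        totals[year] += max(len(tokenize(text)) - n + 1, 0)
    counts = phrase_counts_by_year(corpus, phrase)
    return {year: counts.get(year, 0) / totals[year] for year in totals if totals[year]}


if __name__ == "__main__":
    for phrase in ("railway station", "train station"):
        print(phrase, phrase_counts_by_year(TOY_CORPUS, phrase))
```

Dividing by the number of word windows in each year is what makes two years comparable when one of them has far more text than the other; the raw counts alone would mostly reflect how many books happened to be printed.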
