Monday, 20 February 2017

Thing 19: Text and data mining

I've seen Georgina's post on this topic.  Its most important advice is likely to be this:

"If you wish to use TDM in your work, we highly recommend that you ensure you are doing so legally and that you contact likeminded folk such as the team at ContentMine to ask for advice."

Will do.  My own post, like those of other participants, has to be short for want of experience.  I have not had occasion to use data-mining in my own work, but I now know that any research query that sounds as though data mining would help towards the answer is a matter for ContentMine.

Meanwhile, I suppose I get a frisson of what data mining is like when I dabble in Google Books' Ngram Viewer.  This enables the user to search vast numbers of books for the occurrence of phrases.  By it I have satisfied my idle curiosity as to the frequency of use of the locution "And Oh!" (it seems to have peaked in 1842 and then slowly declined), and the relative frequency of the phrases "railway station" and "train station" (the latter overtook the former in 1994, and peaked in 2000; they now seem to be rapidly converging again).  But I am not an expert user of this site, and I increased my knowledge of it by around 150% in the past hour, revisiting it for this post.

Monday, 13 February 2017

Thing 18: Research data management (RDM)

This is where I hit the '23 research things' hard stuff.  Research data management is an area where I clearly have more to learn than in things like Twitter.  I draw comfort from other participants' blogs, in which similar confessions are made.

I'd better start by copying out those four types of data that Georgina names in the post under that link above:

- Observational – so data captured in real-time that is usually irreplaceable and can include anything from survey data to images of someone’s brain. 

- Experimental – this can be data from lab work which can be reproducible such as gene sequences. 

- Simulation – this can be data generated from test models where the models themselves are sometimes more important than the results, such as climate or economic models. 

- Derived or compiled data – this is data that is reproducible and can include 3D models, text and data mining, and compiled databases.

A waterfall of consciousness notes that data management begins with "personal diaries, work e-mails, holiday snapshots and, even, home videos".  Emboldened by that, let me tell how I have managed data at such a level.

Personal diaries.  Mine go back, in an unbroken series, to 1 January 1969.  I keep them all together, in a reasonably consistent compromise between size and chronological order, and I usually have no difficulty finding a particular year.  I say "usually" because my latest search of them failed.  The unfound years are presumably buried on my desk somewhere, and maybe it's time for a spring-clean.

Work emails.  For these I have many folders.  When I answer an email, I incorporate the incoming email into my reply, save that thread, and delete the incoming email.  This works, though not as well as it used to, by reason of the sheer volume of email to deal with.

Word documents and spreadsheets.  In the days when my Word documents were mostly letters, naming them was an easy matter of applying the date, written yymmdd, plus a sequential number for letters written on that date, and the file type extension ("17021201.doc").  An adaptation of that applies to things that fit into a regular sequence ("170212minutes.doc").  But I'm a bit indecisive with other files, leaving the form of the date and the relative position of date and content liable to variation ("170212members.xls", "Events Feb 2017.doc").  I need to get a grip.
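
Getting a grip might look something like the sketch below, which applies the yymmdd convention in Python.  The function names, and the decision to put the date first and lower-case the label for every file, are my own assumptions for illustration rather than a settled scheme.

```python
from datetime import date


def letter_name(written: date, sequence: int, ext: str = "doc") -> str:
    """Name a letter as yymmdd plus a two-digit sequence number."""
    return f"{written:%y%m%d}{sequence:02d}.{ext}"


def dated_name(written: date, label: str, ext: str = "doc") -> str:
    """Name any other file as yymmdd plus a short label."""
    # Assumption: labels are lower-cased and stripped of spaces for consistency.
    cleaned = "".join(label.lower().split())
    return f"{written:%y%m%d}{cleaned}.{ext}"


if __name__ == "__main__":
    day = date(2017, 2, 12)
    print(letter_name(day, 1))                # 17021201.doc
    print(dated_name(day, "minutes"))         # 170212minutes.doc
    print(dated_name(day, "members", "xls"))  # 170212members.xls
```

Putting the date first in every name means that an alphabetical listing of a folder is also a chronological one, at least within a single century.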

Money statements.  Domestic money statements I've got more or less under control.  They're still on paper, for the most part, and I shred them after set intervals (three months for those relating to food and cash withdrawals, two years for those relating to utilities, six years for all others).  Bank statements and similar I keep in separate files for each account; bills &c from other organisations I keep in an A/Z sequence by organisation, and within each organisation in reverse chronological order.  All this is a consequence of reading Taming the paper tiger at home by Barbara Hemphill, which I'll have read ca 2002.
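
If these statements ever do go electronic, the shredding intervals could be written down as a small lookup.  What follows is only a sketch: the category names and the function name are hypothetical, and the six-year default stands in for "all others".

```python
import calendar
from datetime import date

# Retention periods in months, as described above (category names are assumed).
RETENTION_MONTHS = {"food": 3, "cash": 3, "utilities": 24}
DEFAULT_MONTHS = 72  # six years for everything else


def shred_after(statement_date: date, category: str) -> date:
    """Return the date on which a statement reaches the end of its retention period."""
    months = RETENTION_MONTHS.get(category, DEFAULT_MONTHS)
    # Count months from year 0 so that adding an interval can roll over year ends.
    total = statement_date.year * 12 + (statement_date.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    # Clamp the day so that, say, 30 November plus three months lands in February.
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(statement_date.day, last_day))
```

For example, `shred_after(date(2017, 2, 13), "utilities")` gives 13 February 2019.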

Poetic output.  This I keep track of using a card-index system I devised in 1994.  The card fronts show poem title, number of lines, and year of composition; the backs show where I've submitted the poem and when, and the outcome of the submission.  While a poem's being considered by an editor or competition adjudicator, I flag the card with a yellow sticker. If an unpublished poem is between submissions, I flag the card with a blue sticker.  If I'm lucky enough to have the poem published, or placed in a competition, I mark the card front with a diagonal red line and the place of this success.  And all this information is necessary.  Poetry competitions often have rules about number of lines, and about the ineligibility of poems that have been already published.

So all the above attempts at data management are creaking, and the paper-based ones will have to be replaced with electronic equivalents sooner or later.  The poetry card index might be worth replacing with a database -- something I've had some training in, but never actually made.  An alternative might be to mark the information among the properties of the file, but I can see two disadvantages to that: the risk of information loss between file versions, and the amount of digging that would need to be done in order to get at the information.
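
As a first stab at the database idea, here is a minimal sketch using Python's built-in sqlite3 module.  The table and column names are my own invention, modelled on the fronts and backs of the cards; the sticker colours are worked out from the data rather than stored.

```python
import sqlite3

SCHEMA = """
CREATE TABLE poem (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    line_count INTEGER NOT NULL,
    year_composed INTEGER NOT NULL,
    published_in TEXT              -- place of success; NULL if unpublished
);
CREATE TABLE submission (
    id INTEGER PRIMARY KEY,
    poem_id INTEGER NOT NULL REFERENCES poem(id),
    venue TEXT NOT NULL,
    submitted_on TEXT NOT NULL,    -- ISO date, e.g. 2017-02-13
    outcome TEXT                   -- NULL while still under consideration
);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create the two tables in a new database (in memory by default)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def card_flag(conn: sqlite3.Connection, poem_id: int) -> str:
    """Return the colour the paper card would carry: red, yellow or blue."""
    published = conn.execute(
        "SELECT published_in FROM poem WHERE id = ?", (poem_id,)
    ).fetchone()[0]
    if published is not None:
        return "red"  # published or placed: the diagonal red line
    pending = conn.execute(
        "SELECT COUNT(*) FROM submission WHERE poem_id = ? AND outcome IS NULL",
        (poem_id,),
    ).fetchone()[0]
    return "yellow" if pending else "blue"


def eligible_titles(conn: sqlite3.Connection, max_lines: int) -> list:
    """List unpublished poems short enough for a competition's line limit."""
    rows = conn.execute(
        "SELECT title FROM poem WHERE published_in IS NULL AND line_count <= ? "
        "ORDER BY title",
        (max_lines,),
    ).fetchall()
    return [row[0] for row in rows]
```

Calling `open_index("poems.db")` with a file name of one's choosing would keep the index on disk.  The digging I feared with file properties becomes a single query here, such as `eligible_titles(conn, 40)` for a competition with a 40-line limit.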

Further research needed.  What a thing for Love Your Data Week!

Monday, 6 February 2017

Thing 17: Survey tools (ii)

An update on what I posted on 8 January.

I said then I'd created a survey, using Google Forms, on the subject of superhero powers.  It has drawn a single response, which is obviously not an adequate statistical sample.  100% of respondents named telekinesis as the superpower they hadn't got, and television as the source of their hearing about it; they were disappointed not to have this power, had aspired to it, and, for the story behind their aspiration, gave the following statement:

"Moving equipment would be a gesture away! Computers would be instantly fixed because I willed it to happen."

Let us press on to Georgina's recommended application, Qualtrics.  This is now live for me, and I've had a go at creating a survey using it. It's a spoof, I admit, and neither an illuminating survey nor particularly funny, but it had me exploring large areas of Qualtrics that I shall be able to evaluate properly as I use them, or not, in creating surveys in earnest for the Haddon. They will be no worse for this dash of prior Qualtrics experience.

Sunday, 8 January 2017

Thing 17: Survey tools (i)

This post is an interim progress report.  I have, as Georgina bids us, applied for a Qualtrics account; I have made a survey using Google Forms while waiting for Qualtrics to arrive, and tweeted links to said survey.  Later, I will do a 'Thing 17: Survey tools (ii)' post, reporting on how I find Qualtrics and what answers, if any, my Google Forms survey has drawn.

The survey takes its cue from the 2017 Libraries at Cambridge conference. Registrants at the conference were invited to indicate what their superhero powers were, and a few did.  But the conference included much -- a panel discussion and a well-received keynote address -- on the theme of failure.  My survey asks respondents about superpowers they don't have.

Reading the posts of other '23 research things' participants has been most instructive, particularly Luther's notes about the limitations of surveying as a technique and the usefulness of other methods. He is quite right to note the ease with which spurious survey returns can be created; conversely, a toxic situation can be inflamed by insinuations that some survey returns are bogus, even if the insinuations have no basis in fact.  Luther's reference to 'grounded theory' took me on to unfamiliar territory, and I look forward to exploring this further.  Trying to relate it to my own experience, I suppose it was something like grounded theorizing when I examined the free-text responses to a Haddon Library user survey, and when I asked library users informally why they thought a particular teaching session had had zero take-up.  In both cases I was trying to see if any patterns emerged.

Of these things, more when Qualtrics is in.  And any superpower survey responses.

Sunday, 1 January 2017

Thing 16: Crowdsourcing and citizen science


Georgina's post for this Thing has led me to re-familiarise myself with Kickstarter and Patreon, on both of which I have responded to appeals in recent years.  So far as I can tell, the main difference between the two sites is reflected in their names: Kickstarter enables contributions to get projects started, Patreon enables long-term support.  Composer Kathryn Rose has released some good music, and music in progress, via Patreon, and has given interesting accounts on her blog and Twitter stream of how the site impinges on the creative process and her business model.

I can see I'd do well to follow more on both sites.

"Do you have an idea for a project that could be crowdfunded?" asks the post.  Sorry, no.  I am intrigued to read the caveats from A Waterfall of Consciousness and Library Spiel about the risk that crowdfunded projects may repel friends and regular funders, especially noting that in Library Spiel's case this caveat is evidently based on experience.  Here's an idea that I did have at one time; it has now run out of steam, and crowdfunding is not sought.  I had better admit that my own kin never thought much of the project.

Citizen science

"What do you think about the democratisation of research and science through citizen science projects?"

It's probably a good idea.  Here's an article my science-journalist wife wrote about developments in this area some four years ago.  Note that the article doesn't present citizen science as an irreversible triumph: in 2013, chemistry was less keen on the idea.  Despite the successful application of citizen-science practice in some chemical research, the question was "Should chemistry join the gang?" and reservations were quoted from some chemists.  I don't know if it's significant that Zooniverse's current project list includes no reference to chemistry.

Related to both chemistry and citizen science is another movement drawn to my attention by Clare: that of the expert patient.  A flagship for this movement is Patientslikeme.  New research mentioned on this site on the day I write, 1 January 2017, includes developments in clinical trials, new ways of indicating levels of pain, and improvements in patients' self-management and self-efficacy.

The question uses the word 'democratisation'.  Democratisation is the benefit that citizen science confers.  Research abc writes that "Hopefully more people appreciating this process will increase public confidence in scientific statements."  

That's better than an unquestioning acceptance of what experts say.  And better than a generic distrust of experts.

I suppose citizen science stands up better against sociopaths and demagogues than those other states.  But I bet a determined troublemaker could spoil even citizen science.

Saturday, 31 December 2016

Thing 15: Collaboration tools


Of the three tools described in the post, Evernote is the only one with which I was unfamiliar.  I have taken it for a spin, listing the roads one would use for a car journey from a Yorkshire place to a Berkshire place while adopting a selective approach to motorways. But I, like Librarian at Heart, know the value of "the very low-tech but aesthetically pleasing option of an actual paper and pen notebook".  I can whip out a notebook and pen quicker than I can do the login(s) necessary for reaching the Evernote app.  The notebook, being a stage removed from the online world, is a little bit more secure.  The "elephants graveyard of notes I can’t understand anymore" recognised by Research abc would, in a notebook, be likely to have at least the virtue of chronological order.

I hope my view does not sound too much like the claim, in a 1990s spoof, that undesirable results from a mythical Microsoft product were "a feature not a bug".

I will continue trying with Evernote and see if I get to like it any better.


I told of my enthusiasm for Doodle in the 2010 round of 23things.  I continue to use it today.  Doodle is not to be blamed for the user error of forgetting that anything requiring a Doodle poll probably needs more than a Doodle poll: a poll to set up a meeting will not necessarily ensure the meeting takes place, and is not a major action on the issue behind the meeting.

Google Drive

I use Google Drive extensively.  It is very helpful for the planning committee of an event, allowing details of venue &c to be circulated rapidly and acquire modifications and comments.  I'm not sure of the best answer to the data-protection questions that Google Drive can present.  Is there a place where private individuals can store things like a Christmas card list or address book online?

Tuesday, 27 December 2016

Thing 14: Sourcing and using good images

Write a blogpost about reusing images and what you have done in the past 

I found myself nodding in agreement with Researchabc and Thelibrarianerrant, both of whom owned to a measure of restraint in the use of images.  I excuse my own lack of pictures, when necessary, by reference to television jokes about the Lord Privy Seal.

I have used images in the Haddon Library's PowerPoints for presentations at induction time and in the Alumni Festival.  I'm not posting these, as they would make no sense without the accompanying spoken text, but I can say that I've paid due respect to copyright, and enjoyed searching Flickr's Creative Commons area for images to use.

Find a really good picture that is shareable and embed it in your blogpost with appropriate credit

New Bridge by Cycling Man  CC BY-NC-ND
The picture shows Christchurch Bridge in Reading.  I haven't yet photographed this bridge myself, or cycled over it, but I have kinsfolk in Reading, and probably will do those things.

Write about how you found using the tools to find images and crediting the image itself

My exploration of the sites recommended in the blog post was unsystematic, with searches at different times for images of fire, demolition, rivers, cathedrals, and bears.  The picture I eventually chose for this blog was from none of those searches: I returned to the familiar Flickr Creative Commons area and looked for that specific bridge, a kind of substitute for pedalling a bike over it.

I hope the licence is correctly made.  It's not my first use of a CC licence, but as this post is an exercise, I looked at Creative Commons' own site for i's to dot and t's to cross, and believe I have done so.

Exploring the blog post's other recommendations was fun. I expect to use Pixabay, Unsplash, Morguefile, and Photopin again, next time I'm looking for pictures in earnest, and I may recommend them to friends making posters and church magazines.  I'm afraid I still can't see how to search Travel Coffee Book and New Old Stock, and am less likely to revisit them therefore.  I failed also with HaikuDeck; but PowerPoint, Open Office Impress, SlideShare and Creative Commons will between them probably do for me what HaikuDeck would have done.