Monday 13 February 2017

Thing 18: Research data management (RDM)

This is where I hit the hard stuff in '23 research things'.  Research data management is an area where I clearly have more to learn than in things like Twitter.  I draw comfort from other participants' blogs, in which similar confessions are made.

I'd better start by copying out those four types of data that Georgina names in the post under that link above:

- Observational – so data captured in real-time that is usually irreplaceable and can include anything from survey data to images of someone’s brain. 

- Experimental – this can be data from lab work which can be reproducible such as gene sequences. 

- Simulation – this can be data generated from test models where the models themselves are sometimes more important than the results, such as climate or economic models. 

- Derived or compiled data – this is data that is reproducible and can include 3D models, text and data mining, and compiled databases.

A waterfall of consciousness notes that data management begins with "personal diaries, work e-mails, holiday snapshots and, even, home videos".  Emboldened by that, let me tell how I have managed data at such a level.

Personal diaries.  Mine go back, in an unbroken series, to 1 January 1969.  I keep them all together, in a reasonably consistent compromise between size and chronological order, and I usually have no difficulty finding a particular year.  I say "usually" because my latest search of them failed.  The unfound years are presumably buried on my desk somewhere, and maybe it's time for a spring-clean.

Work emails.  For these I have many folders.  When I answer an email, I incorporate the incoming email into my reply, save that thread, and delete the incoming email.  This works, though not as well as it used to, by reason of the sheer volume of email to deal with.

Word documents and spreadsheets.  In the days when my Word documents were mostly letters, naming them was an easy matter of applying the date, written yymmdd, plus a sequential number for letters written on that date, and the file type extension ("17021201.doc").  An adaptation of that applies to things that fit into a regular sequence ("170212minutes.doc").  But I'm a bit indecisive with other files, leaving the form of the date and the relative position of date and content liable to variation ("170212members.xls", "Events Feb 2017.doc").  I need to get a grip.
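
As a minimal sketch of what "getting a grip" might look like, the yymmdd convention described above could be generated by a small Python helper rather than typed by hand, so that the date always comes first and always has the same form. The function name `dated_name` and its parameters are hypothetical, chosen only for illustration.

```python
from datetime import date
from typing import Optional


def dated_name(day: date, label: str = "", seq: Optional[int] = None,
               ext: str = "doc") -> str:
    """Build a file name that always starts with the date written yymmdd.

    `label` is an optional description ("minutes", "members").
    `seq` is an optional two-digit sequence number, for several
    letters written on the same day.
    """
    stem = day.strftime("%y%m%d")
    if seq is not None:
        stem += f"{seq:02d}"
    return f"{stem}{label}.{ext}"


if __name__ == "__main__":
    day = date(2017, 2, 12)
    print(dated_name(day, seq=1))                       # 17021201.doc
    print(dated_name(day, label="minutes"))             # 170212minutes.doc
    print(dated_name(day, label="members", ext="xls"))  # 170212members.xls
```

Because the date leads and is zero-padded, sorting such names alphabetically also sorts them chronologically, which a name like "Events Feb 2017.doc" does not.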

Money statements.  Domestic money statements I've got more or less under control.  They're still on paper, for the most part, and I shred them after set intervals (three months for those relating to food and cash withdrawals, two years for those relating to utilities, six years for all others).  Bank statements and similar I keep in separate files for each account; bills &c from other organisations I keep in an A/Z sequence by organisation, and within each organisation in reverse chronological order.  All this is a consequence of reading Taming the paper tiger at home by Barbara Hemphill, which I'll have read ca 2002.
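
The shredding intervals above amount to a small retention schedule, and an electronic equivalent could work out the shred date for each statement. The following Python sketch assumes category names ("food", "cash", "utilities") that are only illustrative labels, and assumes each interval is counted in calendar months from the statement date.

```python
import calendar
from datetime import date

# Retention periods in months, taken from the intervals described above.
# The category names themselves are illustrative assumptions.
RETENTION_MONTHS = {"food": 3, "cash": 3, "utilities": 24}
DEFAULT_MONTHS = 72  # six years for all other statements


def add_months(day: date, months: int) -> date:
    """Return the date `months` calendar months after `day`.

    If the target month is shorter (e.g. 30 November plus three months),
    the result is clamped to the last day of that month.
    """
    total = day.year * 12 + (day.month - 1) + months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def shred_date(statement_date: date, category: str) -> date:
    """Earliest date on which a statement may be shredded."""
    months = RETENTION_MONTHS.get(category, DEFAULT_MONTHS)
    return add_months(statement_date, months)


if __name__ == "__main__":
    issued = date(2017, 2, 13)
    for category in ("food", "utilities", "insurance"):
        print(category, shred_date(issued, category))
```

Any category not listed in the table, such as "insurance" in the usage example, falls back to the six-year default.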

Poetic output.  This I keep track of using a card-index system I devised in 1994.  The card fronts show poem title, number of lines, and year of composition; the backs show where I've submitted the poem and when, and the outcome of the submission.  While a poem's being considered by an editor or competition adjudicator, I flag the card with a yellow sticker. If an unpublished poem is between submissions, I flag the card with a blue sticker.  If I'm lucky enough to have the poem published, or placed in a competition, I mark the card front with a diagonal red line and the place of this success.  And all this information is necessary.  Poetry competitions often have rules about number of lines, and about the ineligibility of poems that have been already published.

So all the above attempts at data management are creaking, and the paper-based ones will have to be replaced with electronic equivalents sooner or later.  The poetry card index might be worth replacing with a database -- I've had some training in databases, but have never actually built one.  An alternative might be to record the information among the properties of each poem's file, but I can see two disadvantages to that: the risk of information loss between file versions, and the amount of digging that would need to be done in order to get at the information.
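
For what it's worth, a database version of the poetry card index need not be large. The sketch below uses Python's built-in `sqlite3` module: one table holds what is on the card fronts (title, number of lines, year of composition) and another holds what is on the backs (where and when submitted, and the outcome). The table names, column names and outcome words ('published', 'placed') are assumptions made for illustration, as is the choice to treat a placed poem as ineligible alongside a published one. The coloured stickers are not stored at all, since they can be worked out from the submissions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE poem (
    id            INTEGER PRIMARY KEY,
    title         TEXT    NOT NULL,
    line_count    INTEGER NOT NULL,
    year_composed INTEGER NOT NULL
);
CREATE TABLE submission (
    id           INTEGER PRIMARY KEY,
    poem_id      INTEGER NOT NULL REFERENCES poem(id),
    venue        TEXT    NOT NULL,
    submitted_on TEXT    NOT NULL,  -- ISO date, e.g. 2017-02-13
    outcome      TEXT               -- NULL while still being considered
);
"""


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create a new card index (held in memory unless a path is given)."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def eligible_poems(conn: sqlite3.Connection, max_lines: int) -> list:
    """Titles of poems short enough for a competition and free to send.

    A poem is excluded if any submission is still pending (the yellow
    sticker) or has ended in publication or a placing (the red line).
    """
    rows = conn.execute(
        """
        SELECT p.title
        FROM poem AS p
        WHERE p.line_count <= ?
          AND NOT EXISTS (
              SELECT 1 FROM submission AS s
              WHERE s.poem_id = p.id
                AND (s.outcome IS NULL
                     OR s.outcome IN ('published', 'placed'))
          )
        ORDER BY p.title
        """,
        (max_lines,),
    )
    return [row[0] for row in rows]
```

A query such as `eligible_poems(conn, 40)` would then answer the question the cards answer now, namely which poems fit a 40-line limit and are neither published nor out with an editor, without leafing through the whole box.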

Further research needed.  What a thing for Love Your Data Week!
