PIPaL: JISC BCE at Loughborough


Linked Data Explored


You may have heard recently about data.gov.uk, the UK Government’s new clearinghouse for public sector data. The underlying concept is that people will find all kinds of useful things to do with data that was collected for a specific purpose and would otherwise be mouldering away. Examples include mouseprice.com, which correlates Land Registry statistics for a postcode with Google Street View and Bing Maps; PlanningAlerts.com, which searches Local Authority data for planning applications near you; and FillThatHole, which lets you report potholes and automatically works out who is responsible for the section of road in question. This is part of a larger initiative across the world to free government data, which The Guardian has been cataloguing in its World Government Data site.

The graph below, by Christian Bizer, illustrates some of the links and relationships between well-known sets of data – note that it was put together before the release of data.gov.uk. You can read more about this work at the WWW2009 Linked Data on the Web workshop site.

Christian Bizer's Linked Data visualisation

In the context of our work on Relationship Management for JISC, it’s particularly interesting that the graph mixes data collected by very different bodies for different purposes, and it prompts the question of how to link from one set of data to another. When a piece of data is tagged with a particular identifier, that identifier needs to come from a common shared vocabulary, or at least there needs to be an agreed set of mappings from the vocabulary used in one database to that used in another. Examples that have some relevance in our context are HESA JACS codes and Companies House Company Numbers. When such identifiers are used in combination with structured data in a format such as XML, many doors are opened. For example, JACS course codes and XCRI course advertising in combination potentially allow a prospective student to “shop around” to see which institutions offer a particular course. Further linked data sets could make this an extremely powerful tool by linking directly into the institutions’ course materials on their Virtual Learning Environments, iTunes U, YouTube, etc.
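To make the idea of an agreed mapping between vocabularies a little more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the institutions, the course titles, the local subject codes and the JACS-style codes are placeholders that have not been checked against the real JACS list, and the record layout is not the XCRI format.

```python
from collections import defaultdict

# Hypothetical mapping from one institution's local subject codes to a
# shared vocabulary. The JACS-style codes are illustrative placeholders.
LOCAL_TO_SHARED = {"COMP": "G400", "MECH": "H300"}

# Hypothetical course listings. Institution A already tags courses with
# shared codes; Institution B uses its own local codes.
INSTITUTION_A = [
    {"title": "Computer Science BSc", "subject": "G400"},
    {"title": "Mechanical Engineering BEng", "subject": "H300"},
]
INSTITUTION_B = [
    {"title": "Computing BSc", "subject": "COMP"},
    {"title": "History BA", "subject": "HIST"},  # no agreed mapping yet
]


def courses_by_shared_code(sources):
    """Group courses from several institutions under shared subject codes.

    Each source is a (name, records, mapping) tuple. A mapping of None
    means the records already use the shared vocabulary. Records whose
    local code has no agreed mapping are skipped, since they cannot be
    linked reliably.
    """
    index = defaultdict(list)
    for name, records, mapping in sources:
        for record in records:
            code = record["subject"] if mapping is None else mapping.get(record["subject"])
            if code is None:
                continue
            index[code].append((name, record["title"]))
    return dict(index)


INDEX = courses_by_shared_code([
    ("Institution A", INSTITUTION_A, None),
    ("Institution B", INSTITUTION_B, LOCAL_TO_SHARED),
])

# "Shop around" for everything filed under one shared subject code.
for institution, title in INDEX["G400"]:
    print(f"{institution}: {title}")
```

The History course is left out of the index because nothing maps its local code to the shared vocabulary, which is exactly the gap an agreed set of mappings is meant to close.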

At Loughborough we have been looking into the feasibility of taking the Linked Data approach for tracking some of our research and business partnerships, and found quite a few pitfalls. Higher Education institutions tend not to go bust and only infrequently merge or change their name. In the commercial world such changes are a matter of routine, and are further complicated by the likes of holding companies and wholly owned subsidiaries. Even settling on a name can be a fraught process. Is Loughborough’s Systems Engineering partner BAE Systems also referred to as BAe and British Aerospace? You bet! Additionally, in some key areas we will need to collect further information to link existing datasets. For example, our publications database is not presently aware of co-authors’ institutional affiliations. Similarly, whilst research proposals have to be processed on a standard University form before they can be submitted to the funding councils, information about project partners is not presently collected in a structured way on these forms. At present we adopt a human-assisted approach to populating our Institutional Repository, and in our interviews for the PIPaL project a commonly expressed view was that something similar would be desirable to ensure consistency when exploring the possibilities opened up by a Linked Data approach.
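The naming problem can be sketched in a few lines of Python. This is only an illustration of the general idea, not a description of any system we run: the identifier "PARTNER-001" is a made-up placeholder (in practice something like a Companies House Company Number might play this role), and the alias list is maintained by hand.

```python
# Hypothetical alias table: one canonical identifier per organisation,
# with the name variants people actually type. "PARTNER-001" is a
# placeholder identifier, not a real company number.
ALIASES = {
    "PARTNER-001": ["BAE Systems", "BAe Systems", "BAe", "British Aerospace"],
}


def normalise(name):
    """Fold case and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.casefold().split())


def build_lookup(aliases):
    """Invert the alias table into a normalised-name -> identifier lookup."""
    lookup = {}
    for identifier, names in aliases.items():
        for name in names:
            lookup[normalise(name)] = identifier
    return lookup


def resolve(name, lookup):
    """Return the canonical identifier, or None if a person needs to decide."""
    return lookup.get(normalise(name))


LOOKUP = build_lookup(ALIASES)

for typed in ["British Aerospace", "  bae   systems ", "Unknown Widgets Ltd"]:
    print(repr(typed), "->", resolve(typed, LOOKUP))
```

Names that do not match any known alias come back as None rather than being guessed at, which leaves room for the kind of human-assisted checking described above.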

Written by Martin Hamilton

January 24, 2010 at 11:45 pm

