Last Sunday Freebase reached a milestone: freely available data now covers over 10 million topics. It reached this milestone by reusing data from a host of other websites and initiatives. The following quote highlights this:

In October, we rounded out our TV domain by synchronizing with the excellent user-curated TV fan site TVRage.com.  Combined with earlier data loads from thetvdb.com, we now have comprehensive coverage of nearly every TV show and episode created in the United States.  It includes cast and credits, as well as links to key TV websites like tvguide.com and Hulu — nearly a million topics in all!

But the load that took us over the 10 million mark was the final load of editions from Open Library. Comprising 650,000 authors, almost 2 million books and 2.1 million book editions, this load pushed new boundaries in our data acquisition, curation, reconciliation and QA processes.

The Semantic Web Company has opened a demo zone that

compiles a suite of the best software tools, services and information sources for every aspect of the Semantic Web. Finding, creating, linking and publishing information – the flexibility and richness of the Semantic Web is only a few mouse clicks away.

Too often we start with tools and then wonder what we can do with them. A large part of the Library 2.0 movement works like that. There are sites, like 23dingen, that seem to promote this attitude: learn what the internet has to offer and then use it, professionally if possible. The workflow seems to be as follows: become aware of what the internet has to offer, get used to it, and apply it. Do all that, then adopt the attitude of an evangelist, and you are a modern 2.0 librarian.

This is tinkering.

The idea that our work should be demand-driven leads to tinkering too. We define services in collaboration with researchers and librarians and then build those services. Serving our customers is, in the end, of course our main goal, but librarians should not jump from "demand" to "demand". What gets lost is reflection on what might connect the services developed this way; what also gets lost is a critical attitude toward the foundations of a library. Customers tend to take many things for granted, for example that libraries shuffle documents, whether online or offline, and that libraries offer search tools that answer questions by presenting a list of documents. A rethinking of the foundations of libraries, and of the resources they work with, will rarely be triggered by obeying customer demands.

Tinkering starts from existing "infrastructures" and takes them for granted. Thinking, not tinkering, might be instrumental in changing that infrastructure so that future services can be delivered with a minimum of effort. It is not that no one thinks about such an infrastructure. The OAIS reference model, SOA, 5S and similar undertakings show that infrastructural issues are addressed in the literature. The Linked Data initiative, too, has come up with clear advice on how to represent metadata and how metadata can be re-used, and registries of identifiers are seen as essential in this context. The issues around Open Data and rights of (re-)use have also received considerable attention, and they are an integral part of a solid infrastructure. That work could lead to the definition of an overall architecture and to the development of a flexible infrastructure on top of which new services can be built.

It would be advisable, I think, to step back from the demand-oriented strategy and start working on the specification of a good and flexible infrastructure, using one of the existing methodologies (a few were mentioned above). I think this is an essential step: it would make future developments less costly and increase the likelihood that they become stable and sustainable services. And we surely should not waste our time on 23 things. There is nothing inherently wrong with that work, but it distracts us from the core issue: building an adequate infrastructure for the digital library.

We need to think more and tinker less.

© 2011 -=( In Between )=-