Henk Ellermann

Head of the Digital Library Facilities department of the University Library, University of Groningen.

I never promised a regular update on my blog, but the last post dates from December 2009. For over a year I’ve written nothing here. There were a variety of reasons; one of them was that I became bored with all the new developments that really weren’t new. Really, really bored. People ran from one web 2.0 tool to another and nothing has changed, really. The semantic web has mainly remained a laboratory exercise. Social media became a fad and again it has not led to change, not really.

But my boredom has left me. Not that anything has happened, really, but because change has become obligatory. A sense of urgency replaced boredom.

Libraries as we know them now are on the verge of extinction. There is no reason, really, to have a library on campus. We may need a building that we can call a study hall, but that’s it. Information should be available online and is already online to a large extent. In fact, there is no real reason to have zillions of library websites for zillions minus 2 universities. Information can be accessed from any place, so there is no reason to have many library places, not online, not offline. But it is not just access to information that can be provided. One can also provide guidance: guidance in internalizing the material presented (learning and teaching) and guidance in using the information, by researchers for instance. And we can, in principle, point at other relevant sources of information. So teaching, research, support for teaching, and support for research can all be offered via the internet.

There are a few barriers to overcome. One is that not all information is freely available. Without some form of open access we can’t use the full potential of the internet. There is nothing to guide you through if you can’t access the basic materials. Two is that there is as yet no clear vision on how to combine tools for information presentation with communication tools. It is not enough to use Twitter, e-mail and what have you; the communication should be firmly rooted in the content. Meaning that if I am studying, say, Nietzsche’s views on time, I need to be able to communicate with those who are undertaking similar studies or, preferably, with those who have already done this. We need a map that helps us navigate through content, experts, and people with similar interests. Three is that universities, and not just their libraries, have no vision on how to restructure their teaching and research so that it becomes accessible to all.

If we present information, we should label it such that anyone with an interest in it can find it. We should also make it possible to establish contact between people with similar interests and perhaps provide the means to start a conversation. We should not remain within the walls of a university, but include many other institutions or information resources, like public libraries, museums, private research institutions, and more.

We need a new system for dealing with information, a system that comprises social and technical elements. In my department we are now trying to work out such a system. We are using the expertise available in a small company called communitysense. We have to see where it leads us. We’ll start small, we’ll use anything we can, even web 2.0, but we will not call ourselves a library anymore.

While converting the metadata in our repositories to “RDF”, we (my student Florian Kunneman and I) wanted to express a few simple relations between authors. Initially we thought we could use foaf:knows for people who co-authored a document. But that’s plain silly, of course. We can use foaf:knows, but would need to define sub-properties of it. We would need something like Eric Vitiello’s extension to FOAF to express family relations (you know: friendOf, parentOf, siblingOf, etc).
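
To make the sub-property idea concrete, here is a minimal sketch in plain Python, with triples as tuples rather than a real RDF library. The property ex:hasCoAuthor and the ex: names are hypothetical; only foaf:knows and rdfs:subPropertyOf are existing terms. It shows how the generic foaf:knows statement could be inferred from a more specific relation.

```python
# Triples are plain (subject, predicate, object) tuples.
# "ex:hasCoAuthor" and the other "ex:" names are hypothetical.
SUBPROPERTY = "rdfs:subPropertyOf"

def infer_superproperties(triples):
    """Apply the RDFS sub-property rule until no new triples appear."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        # (child property, parent property) pairs declared in the data
        parents = {(s, o) for s, p, o in inferred if p == SUBPROPERTY}
        for s, p, o in list(inferred):
            for child, parent in parents:
                if p == child and (s, parent, o) not in inferred:
                    inferred.add((s, parent, o))
                    changed = True
    return inferred

triples = {
    ("ex:hasCoAuthor", SUBPROPERTY, "foaf:knows"),
    ("ex:author1", "ex:hasCoAuthor", "ex:author2"),
}
result = infer_superproperties(triples)
```

With this, a consumer that only understands foaf:knows still sees that the two authors know each other, while the more precise co-author relation is kept for those who want it.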

We would like to extract relations of the following sort:

Author1 cites Author2
Author1 mentions Author2
Author1 hasCoAuthor Author2
etc.

A problem might be the fact that such relations have to be derived from publications. Take the first example. It would in fact be a derivation from other relations:

Author1 isAuthorOf Document1
Document1 hasCitation Document2
Author2 isAuthorOf Document2
It could then follow that:
Author1 cites Author2
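
The derivation above could be sketched as follows. This is only an illustration in plain Python under the assumption that the three relations are available as simple triples; the function name derive_cites is made up for this example.

```python
def derive_cites(triples):
    """Derive (author, 'cites', author) from authorship and citation triples."""
    authors_of = {}  # document -> set of its authors
    citations = []   # (citing document, cited document) pairs
    for s, p, o in triples:
        if p == "isAuthorOf":
            authors_of.setdefault(o, set()).add(s)
        elif p == "hasCitation":
            citations.append((s, o))
    derived = set()
    for citing, cited in citations:
        # Every author of the citing document cites every author of the cited one.
        for a1 in authors_of.get(citing, ()):
            for a2 in authors_of.get(cited, ()):
                derived.add((a1, "cites", a2))
    return derived

triples = [
    ("Author1", "isAuthorOf", "Document1"),
    ("Document1", "hasCitation", "Document2"),
    ("Author2", "isAuthorOf", "Document2"),
]
cites = derive_cites(triples)
```

Note that this naive rule also produces self-citations (Author1 cites Author1) when an author cites their own work; whether that is wanted is a modelling decision.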

I wonder how I can find work relevant to this question. Perhaps we first need to define relations that can exist between persons and documents, and from that deduce relations between persons (the same goes for documents of course). What are the relations between authors that could interest us? It may sound silly, perhaps it is, but I don’t know how to proceed (well, we could make our own AOAA -Author Of An Author- vocab, but that again sounds silly).

I played around with Google Wave a bit. I didn’t get it at first, but after a while I saw its charms, well, potential charms. We finally seem to have a non-clumsy tool to collaborate and communicate, the latter perhaps more than the former.

It may help to streamline thoughts on how to proceed if we want our (digital) libraries to become part of the semantic web. I therefore started a wave and posted the following message.

——

Working from the assumption that the library should become part of the semantic web, what should we do? Is the following a reasonable set of things to do?

1) Identify what we have, and give it a URI (meaning that authors, documents (objects), keywords, and concepts should get a URI)

2) deposit the URIs in a registry, with a minimal set of triples (relate URI of documents to URI of authors, etc…): what is the minimal set?

1 and 2 are data level requirements, the following are service level

3) use OAI-ORE to define compound objects

4) define owl:sameAs relations between URIs when needed (what is the workflow here?)

5) Which vocabularies/ontologies can/should be used for which services? Do we need a service catalogue first?

what else?

——

Now, if you are interested in active participation, let me know :)
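
To give an idea of what steps 1, 2 and 4 of the message could look like in practice, here is a rough sketch in plain Python. The example.org namespace, the local identifiers and the helper names are all hypothetical; dcterms:creator is used only as one possible candidate for the minimal set of triples.

```python
BASE = "http://example.org/"  # hypothetical namespace, not a real registry

def mint_uri(kind, local_id):
    """Step 1: give every author, document or concept its own URI."""
    return f"{BASE}{kind}/{local_id}"

def canonical(uri, same_as):
    """Step 4: follow owl:sameAs links to one representative URI."""
    seen = set()
    while uri in same_as and uri not in seen:  # 'seen' guards against cycles
        seen.add(uri)
        uri = same_as[uri]
    return uri

# Step 2: a registry holding a minimal set of triples.
doc = mint_uri("document", "1")
author_a = mint_uri("author", "1")
author_b = mint_uri("author", "2")  # assumed to be a second record for the same person

registry = [
    (doc, "dcterms:creator", author_a),
    (doc, "dcterms:creator", author_b),
]
same_as = {author_b: author_a}  # author_b owl:sameAs author_a

# Rewrite objects to their representative URI, so duplicates collapse.
merged = {(s, p, canonical(o, same_as)) for s, p, o in registry}
```

The open question from the message remains: who decides that two URIs denote the same thing, and at what point in the workflow that decision gets recorded.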

Last Sunday Freebase reached a milestone: it now offers freely available data for over 10 million topics. It got there by reusing data from a load of other websites and initiatives. The following quote highlights this:

In October, we rounded out our TV domain by synchronizing with the excellent user-curated TV fan site TVRage.com.  Combined with earlier data loads from thetvdb.com, we now have comprehensive coverage of nearly every TV show and episode created in the United States.  It includes cast and credits, as well as links to key TV websites like tvguide.com and Hulu — nearly a million topics in all!

But the load that took us over the 10 million mark was the final load of editions from Open Library. Comprising 650,000 authors, almost 2 million books and 2.1 million book editions, this load pushed new boundaries in our data acquisition, curation, reconciliation and QA processes.

The Semantic Web Company has opened a demo zone that

compiles a suite of the best software tools, services and information sources for every aspect of the Semantic Web. Finding, creating, linking and publishing information – the flexibility and richness of the Semantic Web is only a few mouse clicks away.
