Semweb Gang talks about Glue

December 18th, 2008

Interesting conversation this month. (This is the stuff I listen to on my commute.) I was particularly intrigued by the 10 or so minutes spent discussing the need for a method of embedding identifiers and the location of a web service into HTML. Send the identifier to the service and get back the metadata. This is exactly the use case unAPI was designed for.
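For anyone who hasn't seen unAPI in action, here's a rough sketch of the client side in Python. It finds the <link rel="unapi-server"> in the page head, collects the identifiers marked up as <abbr class="unapi-id" title="...">, and builds the ?id=...&format=... request URLs. It's a sketch, not a full client: it assumes the server URL carries no query string of its own, and the example page, server URL and DOI in the test are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class UnapiDiscovery(HTMLParser):
    """Collect the unAPI server URL and the identifiers embedded in a page."""

    def __init__(self):
        super().__init__()
        self.server = None
        self.ids = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # <link rel="unapi-server" type="application/xml" title="unAPI" href="...">
        if tag == "link" and "unapi-server" in (a.get("rel") or "").split():
            self.server = a.get("href")
        # <abbr class="unapi-id" title="IDENTIFIER">...</abbr>
        elif tag == "abbr" and "unapi-id" in (a.get("class") or "").split():
            if a.get("title"):
                self.ids.append(a["title"])


def unapi_urls(html, fmt=None):
    """Return one request URL per embedded identifier (empty if no server)."""
    parser = UnapiDiscovery()
    parser.feed(html)
    if parser.server is None:
        return []
    urls = []
    for ident in parser.ids:
        params = {"id": ident}
        if fmt:
            params["format"] = fmt
        # Assumes the server URL has no existing query string.
        urls.append(parser.server + "?" + urlencode(params))
    return urls
```

Leaving the format off gives you the id-only request, which an unAPI server answers with the list of formats available for that record.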

I was all set to get to work and give them a brrring! on the cluephone, a.k.a., comment on the post. But before I got around to it, Ed Summers pointed out on IRC that you can achieve the same thing using just <link> elements and/or HTTP Link: headers. In other words, why separate the identifiers from the service URI at all?
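Ed's alternative skips the separate service entirely: the page (or the HTTP response) points straight at each representation, e.g. Link: <http://example.org/item/1.rdf>; rel="alternate"; type="application/rdf+xml", and the client picks the URL for the type it wants. Below is a deliberately simplified Python parser for that header. Using rel="alternate" plus a MIME type is my assumption about the convention, not something from the podcast, and the parser doesn't handle quoted values that contain commas or semicolons, so a real client should use a proper Link header parser.

```python
import re

# One link-value: <URI> followed by zero or more ;param=value pairs.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
# A single ;name=value or ;name="value" parameter (simplified).
PARAM = re.compile(r';\s*([^=;,\s]+)\s*=\s*"?([^";,]*)"?')


def parse_link_header(value):
    """Return a list of (uri, {param: value}) tuples from a Link header."""
    links = []
    for m in LINK_VALUE.finditer(value):
        params = {k.lower(): v for k, v in PARAM.findall(m.group(2))}
        links.append((m.group(1), params))
    return links


def find_alternate(links, mime_type):
    """Return the first rel="alternate" URI with the given type, else None."""
    for uri, params in links:
        if "alternate" in params.get("rel", "").split() and params.get("type") == mime_type:
            return uri
    return None
```

The same lookup works for <link rel="alternate" type="..." href="..."> elements in the page head; only the parsing step changes.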

I like how simple unAPI is to implement. Since your metadata service’s base URL doesn’t change, you don’t have to worry about coordinating attributes of elements that need to appear in both your <head> and your page <body>. This is a non-issue for lots of folks, but I bet it’s not so simple if you’re using WordPress or Drupal as your CMS.
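To show what I mean by simple, here's a Python sketch of roughly all the server logic unAPI asks for. The bare base URL lists every format; id alone gets a 300 Multiple Choices with the formats available for that record; id plus format returns the record, with 404 for an unknown id and 406 for an unavailable format. The RECORDS and MIME_TYPES tables are hypothetical placeholders for whatever your catalogue actually holds, and wiring unapi() into a web framework is left out.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical catalogue: identifier -> {format name: serialized record}.
RECORDS = {
    "doi:10.1000/182": {"ris": "TY  - JOUR\nTI  - An article\nER  - \n"},
}
# Hypothetical format registry: format name -> MIME type.
MIME_TYPES = {"ris": "application/x-research-info-systems", "mods": "application/xml"}


def formats_xml(names, ident=None):
    """Build an unAPI <formats> document listing the given format names."""
    id_attr = " id=" + quoteattr(ident) if ident is not None else ""
    rows = ["<formats" + id_attr + ">"]
    for name in names:
        rows.append("  <format name=%s type=%s/>" % (quoteattr(name), quoteattr(MIME_TYPES[name])))
    rows.append("</formats>")
    return "\n".join(rows)


def unapi(params):
    """Map unAPI query parameters to (status, content_type, body)."""
    ident, fmt = params.get("id"), params.get("format")
    if ident is None:
        # Bare base URL: list every format the server knows about.
        return 200, "application/xml", formats_xml(sorted(MIME_TYPES))
    record = RECORDS.get(ident)
    if record is None:
        return 404, "text/plain", "unknown identifier\n"
    if fmt is None:
        # id without format: 300 Multiple Choices plus this record's formats.
        return 300, "application/xml", formats_xml(sorted(record), ident)
    if fmt not in record:
        return 406, "text/plain", "format not available for this identifier\n"
    return 200, MIME_TYPES[fmt], record[fmt]
```

Because the base URL never changes, the single <link rel="unapi-server"> in the head covers every identifier on every page.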

As for the Glue extension thingie, I’ll try it out before passing judgement. But it did strike me as funny that they’re not using RDF for anything. Also, and maybe I’m imagining it, but in the 10-minute wrap-up at the end of the podcast I think Tom Heath takes some veiled jabs at the Glue guys for being SemWeb poseurs.

Url2Cite

December 11th, 2008

Last Tuesday I got the chance to sit around for a day with a bunch of talented library & academic technology folks as part of the first-ever gathering of NEcode4lib, the New England “chapter” of the code4lib community. We met at the Boston Public Library and took turns giving short presentations on a variety of topics. I didn’t take any notes, but thankfully others did.

The thing I presented on is only about halfway (if that) between an interesting hack and something actually useful. It’s an attempt to create an article metadata scraping service using the CiteULike Plugins. I got the idea a while back from reading this blog post. The basic idea is you take the CiteULike plugins, which are a set of HTML scrapers written in a variety of languages, wrap them in a web service that accepts a URL and a format, and then provide a bookmarklet. A “user” viewing an article at a publisher’s site can then click the bookmarklet and get the article metadata in a variety of formats.
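Here's a stripped-down Python sketch of that architecture as a WSGI app plus a bookmarklet builder. The scraper and the RIS/JSON formatters are hypothetical stand-ins: I'm not reproducing the real CiteULike plugin interface, just a hostname-to-function lookup in its place. The service URL passed to bookmarklet() is a placeholder too.

```python
import json
from urllib.parse import parse_qs, urlparse


# Hypothetical stand-in for a CiteULike plugin: takes an article URL and
# returns scraped metadata. A real service would run the actual plugin here.
def scrape_example_journal(url):
    return {"title": "An article", "journal": "Example Journal", "url": url}


# Hypothetical registry: publisher hostname -> scraper function.
SCRAPERS = {"journal.example.org": scrape_example_journal}


def to_ris(meta):
    """Render the scraped fields as a minimal RIS record."""
    return ("TY  - JOUR\nTI  - {title}\nJO  - {journal}\n"
            "UR  - {url}\nER  - \n").format(**meta)


# Output format name -> (Content-Type, renderer).
FORMATTERS = {
    "json": ("application/json", json.dumps),
    "ris": ("application/x-research-info-systems", to_ris),
}


def app(environ, start_response):
    """WSGI app: GET ?url=<article URL>&format=<name> returns the metadata."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    url = query.get("url", [""])[0]
    fmt = query.get("format", ["ris"])[0]  # default format when none is given
    if fmt not in FORMATTERS:
        start_response("400 Bad Request", [("Content-Type", "text/plain")])
        return [b"unknown format\n"]
    scraper = SCRAPERS.get(urlparse(url).hostname)
    if scraper is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no scraper for that URL\n"]
    ctype, render = FORMATTERS[fmt]
    body = render(scraper(url)).encode("utf-8")
    start_response("200 OK", [("Content-Type", ctype + "; charset=utf-8")])
    return [body]


def bookmarklet(service_url, fmt="ris"):
    """Build a javascript: URL that sends the current page to the service."""
    return ("javascript:location.href='" + service_url + "?format=" + fmt +
            "&url='+encodeURIComponent(location.href);void(0);")
```

For a quick local test, wsgiref.simple_server.make_server("", 8000, app).serve_forever() will serve it. Note the bookmarklet bakes in a single format, which is the limitation the 300 response idea below is meant to fix.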

Links:

  • Slides from the talk are here
  • As of now the service is available here
  • The bookmarklet is here

Some directions I’m interested in taking this:

  • Create a proper python library wrapper for the plugins
  • Implement the unAPI 300 response, providing links to the resource in the available formats. Otherwise the bookmarklet is restricted to a default format, or you’d need a separate bookmarklet for each format
  • Add COinS as an output format, i.e. TinyOpenUrl
  • Try the same concept but using SpiderMonkey & the Zotero Translators
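For the COinS item, the output is just an HTML span with class Z3988 whose title attribute carries a URL-encoded OpenURL ContextObject, with the ampersands escaped for HTML. A minimal Python sketch, assuming the scraped metadata comes back as a dict with the hypothetical keys used below; it only covers journal articles.

```python
from html import escape
from urllib.parse import urlencode

# Hypothetical metadata keys -> OpenURL KEV journal fields.
FIELDS = [("title", "rft.atitle"), ("journal", "rft.jtitle"),
          ("volume", "rft.volume"), ("issue", "rft.issue"),
          ("spage", "rft.spage"), ("date", "rft.date"), ("issn", "rft.issn")]


def coins_span(meta):
    """Return a COinS <span> for a journal article described by `meta`."""
    kev = [("ctx_ver", "Z39.88-2004"),
           ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
           ("rft.genre", "article")]
    if meta.get("doi"):
        kev.append(("rft_id", "info:doi/" + meta["doi"]))
    for key, field in FIELDS:
        if meta.get(key):
            kev.append((field, meta[key]))
    for author in meta.get("authors", []):
        kev.append(("rft.au", author))  # one rft.au per author
    # URL-encode the key/value pairs, then HTML-escape for the attribute.
    return '<span class="Z3988" title="%s"></span>' % escape(urlencode(kev), quote=True)
```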

Update: January 15, 2009 @ 11:08

I’m disabling the demo service linked above until I have an opportunity to improve it and make it actually useful.