Last Tuesday I got the chance to sit around for a day with a bunch of talented library & academic technology folks as part of the first-ever gathering of NEcode4lib, the New England “chapter” of the code4lib community. We met at the Boston Public Library and took turns giving short presentations on a variety of topics. I didn’t take any notes, but thankfully others did.

The thing I presented on is only about halfway (if that) between an interesting hack and something actually useful. It’s an attempt to create an article metadata scraping service using the CiteULike plugins. I got the idea a while back from reading this blog post. The basic idea is that you take the CiteULike plugins, a set of HTML scrapers written in a variety of languages, wrap them in a web service that accepts a URL and a format, and then provide a bookmarklet. A “user” viewing an article at a publisher’s site can then click the bookmarklet and get the article metadata in a variety of formats.
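To make the moving parts concrete, here’s a rough Python sketch of the request flow: look up a scraper by the article’s hostname, run it, and render the result in the requested format. Everything in it is hypothetical: the endpoint `http://example.org/scrape`, the `SCRAPERS` registry, and `scrape_example_journal` (which stands in for a real CiteULike plugin and returns canned data instead of parsing HTML). It isn’t how the demo service was actually built.

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical service endpoint; not the real demo URL.
SERVICE = "http://example.org/scrape"


def scrape_example_journal(url):
    """Hypothetical stand-in for a CiteULike plugin.

    A real plugin would fetch and parse the publisher's HTML; this just
    returns canned metadata for illustration.
    """
    return {"title": "An Example Article", "journal": "Journal of Examples",
            "year": "2008", "url": url}


# Hypothetical registry mapping publisher hostnames to scraper callables.
SCRAPERS = {"www.example-journal.org": scrape_example_journal}


def to_bibtex(meta):
    """Render metadata as a minimal BibTeX @article entry."""
    fields = ",\n".join(f"  {k} = {{{v}}}" for k, v in meta.items())
    return "@article{item,\n" + fields + "\n}"


def to_ris(meta):
    """Render metadata as a minimal RIS record."""
    return "\n".join([
        "TY  - JOUR",
        f"TI  - {meta['title']}",
        f"JO  - {meta['journal']}",
        f"PY  - {meta['year']}",
        f"UR  - {meta['url']}",
        "ER  - ",
    ])


FORMATTERS = {"bibtex": to_bibtex, "ris": to_ris}


def handle(query_string):
    """Handle a request like ?url=...&format=ris and return (status, body)."""
    params = parse_qs(query_string)
    url = params.get("url", [""])[0]
    fmt = params.get("format", ["bibtex"])[0]
    scraper = SCRAPERS.get(urlparse(url).hostname)
    if scraper is None:
        return 404, "no scraper for this site"
    if fmt not in FORMATTERS:
        return 406, "unsupported format"
    return 200, FORMATTERS[fmt](scraper(url))


def bookmarklet(fmt="bibtex"):
    """Build a bookmarklet that sends the current page's URL to the service."""
    return ("javascript:location.href='" + SERVICE + "?format=" + fmt
            + "&url='+encodeURIComponent(location.href)")
```

Calling `handle("url=http%3A%2F%2Fwww.example-journal.org%2Fa%2F1&format=ris")` returns a 200 and an RIS record. A URL from an unknown site gets a 404. Note that the bookmarklet hard-codes one format, which is exactly the limitation the unAPI idea below is meant to address.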


  • Slides from the talk are here
  • As of now, the service is available here
  • The bookmarklet is here

Some directions I’m interested in taking this:

  • Create a proper Python library wrapper for the plugins
  • Implement the unAPI 300 (Multiple Choices) response, which lists links to the resource in each available format. Without it, the bookmarklet is restricted to a default format, or you’d need a separate bookmarklet for each format
  • Add COinS as an output format, along the lines of TinyOpenUrl
  • Try the same concept but using SpiderMonkey & the Zotero Translators
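The two output-format items above could look roughly like this in Python. The first function builds the unAPI format list: when a request names an item `id` but no format, unAPI answers with HTTP 300 and a `<formats>` document; with no `id`, it returns the server-wide list with a 200. The second renders metadata as a COinS span, an OpenURL 1.0 ContextObject in KEV form stuffed into the `title` of a `span` with class `Z3988`. The format names and MIME types in `FORMATS` are my assumptions, not anything the service actually offered.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urlencode

# Assumed format names and MIME types; adjust to whatever the service offers.
FORMATS = [
    ("bibtex", "text/x-bibtex"),
    ("ris", "application/x-research-info-systems"),
    ("coins", "text/html"),
]


def unapi_formats(item_id=None):
    """Return (status, xml) for an unAPI request that specifies no format.

    With an id, unAPI responds 300 Multiple Choices; without one, 200 OK.
    """
    root = ET.Element("formats")
    if item_id is not None:
        root.set("id", item_id)
    for name, mime in FORMATS:
        ET.SubElement(root, "format", name=name, type=mime)
    status = 300 if item_id is not None else 200
    return status, ET.tostring(root, encoding="unicode")


def coins_span(meta):
    """Render article metadata as a COinS span (OpenURL KEV, journal format)."""
    kev = urlencode([
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.atitle", meta["title"]),
        ("rft.jtitle", meta["journal"]),
        ("rft.date", meta["year"]),
    ])
    # html.escape turns the KEV's '&' separators into '&amp;' for the attribute.
    return f'<span class="Z3988" title="{escape(kev)}"></span>'
```

With the 300 response in place, a single bookmarklet could send the user to the format list and let them pick, rather than baking one format into the bookmarklet itself.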

Update: January 15, 2009 @ 11:08

I’m disabling the demo service linked to above for the time being, until I have an opportunity to improve it and make it actually useful and functional.

Written on December 11, 2008