Embedding citation metadata in the ADS HTML
March 1st, 2010
Here’s what I know: you can embed a set of <meta/> tags containing citation metadata in your HTML to help Google Scholar index your content. We’ve been doing this at ADS for quite a while. I’m not certain whether the impetus came directly from Google or, more likely, from a CrossTech blog post by Tony Hammond that describes the technique.
For example, if you run curl -s http://adsabs.harvard.edu/abs/1977NuPhB.126..298A | grep meta, you should see:
<meta name="citation_language" content="en" />
<meta name="citation_doi" content="10.1016/0550-3213(77)90384-4" />
<meta name="citation_abstract_html_url" content="http://adsabs.harvard.edu/abs/1977NuPhB.126..298A" />
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_authors" content="Altarelli, G.; Parisi, G." />
<meta name="citation_issn" content="0550-3213" />
<meta name="citation_date" content="08/1977" />
<meta name="citation_journal_title" content="Nuclear Physics B" />
<meta name="citation_volume" content="126" />
<meta name="citation_firstpage" content="298" />
<meta name="citation_lastpage" content="318" />
...
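The curl | grep pipeline above is a quick way to eyeball the tags. To get the same information as structured data, you could use something like the following minimal sketch, built on Python's standard html.parser module. The class and function names are hypothetical and not part of any ADS or Google tooling. Fetching the page (for example with urllib.request) is left out, so the sketch works on an HTML string.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        # Map each citation_* name to a list of values, because some fields
        # (e.g. citation_keywords) may legitimately appear more than once.
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here: the default
        # handle_startendtag() forwards to handle_starttag().
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name") or ""
        content = attr_map.get("content")
        if name.startswith("citation_") and content is not None:
            self.fields.setdefault(name, []).append(content)


def extract_citation_meta(html_text):
    """Return a dict mapping citation_* field names to lists of values."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# Usage on a trimmed-down copy of the ADS abstract page head.
sample = """<html><head>
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_firstpage" content="298" />
<meta name="description" content="not a citation field" />
</head></html>"""
print(extract_citation_meta(sample))
```

Non-citation tags such as description are ignored, and every value is kept in a list so repeated fields are not silently overwritten.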
Since the first implementation, we’ve had some back-and-forth with Abhishek Jain at Google Scholar to make sure we’re using the full set of fields that Google Scholar looks for.*
Dan Chudnov, David Bucknum & Ed Summers at the LoC recently expressed interest in embedding these tags as well. In the absence of an official reference from the Google Scholar folks, I figured it would be a good idea to post the full list of fields here:
- citation_language
- citation_doi
- citation_abstract_html_url
- citation_title
- citation_authors
- citation_issn
- citation_date
- citation_journal_title
- citation_volume
- citation_firstpage
- citation_lastpage
- citation_publisher
- citation_issue
- citation_pdf_url
- citation_pmid
- citation_keywords (multiple instances OK)
- citation_conference
- citation_dissertation_name
- citation_dissertation_institution
- citation_patent_number
- citation_patent_country
- citation_technical_report_number
- citation_technical_report_institution
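If you are generating these tags yourself, the main things to get right are HTML-escaping the values (titles can contain quotes and ampersands) and emitting one tag per value for fields like citation_keywords. Below is a minimal Python sketch under those assumptions; the citation_meta_tags function is hypothetical, and it does not check names against the list above. The keyword values in the example record are made up for illustration.

```python
from html import escape


def citation_meta_tags(record):
    """Render a dict of citation_* fields as <meta/> tags, one per line.

    A list or tuple value yields one tag per item, for fields that may
    appear multiple times (e.g. citation_keywords).
    """
    lines = []
    for name, value in record.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            # escape(..., quote=True) handles &, <, > and both quote styles.
            lines.append(
                '<meta name="%s" content="%s" />'
                % (escape(name, quote=True), escape(str(item), quote=True))
            )
    return "\n".join(lines)


record = {
    "citation_title": "Asymptotic freedom in parton language",
    "citation_authors": "Altarelli, G.; Parisi, G.",
    "citation_doi": "10.1016/0550-3213(77)90384-4",
    # Hypothetical keyword values, for illustration only.
    "citation_keywords": ["QCD", "partons"],
}
print(citation_meta_tags(record))
```

Because Python dicts preserve insertion order, the tags come out in the order the fields were added to the record.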
I had to cull this list from a visual scan of a long, forwarded e-mail thread. So, as I tried to insinuate above, it sure would be great if Google Scholar published an official reference to this schema somewhere.
* All instances of the term “we” should really be read as “my boss, Alberto”.



March 1st, 2010 at 3:44 pm
Any chance of a RDFa version?
March 1st, 2010 at 4:47 pm
Huh, I had no idea this was possible. This isn’t documented online by Google anywhere? You just have to get it from someone who knows, like you? Crazy. Well, thanks for sharing.
March 2nd, 2010 at 6:07 am
Jay, thanks so much for dropping this into your blog. The citation_* prefix kinda makes me wonder if there are other ones they look for…
March 2nd, 2010 at 8:36 am
@Chris: would love to see one. Of course, for that to happen Google would need to formalize the vocabulary, give it a namespace, etc. (like I’m suggesting).
@Jonathan: your memory must be short, because the comments in that CrossTech post I linked to are j-rock all the way down.