Embedding citation metadata in the ADS HTML

Here’s what I know: you can embed a set of meta tags containing citation metadata in your HTML to help Google Scholar index your content. We’ve been doing it at ADS for quite a while. I’m not certain whether the impetus came directly from Google or, more likely, we got the idea from a CrossTech blog post by Tony Hammond that describes the technique.

For example, if you execute

    curl -s http://adsabs.harvard.edu/abs/1977NuPhB.126..298A | grep meta

you should see:

...
<meta name="citation_language" content="en" />
<meta name="citation_doi" content="10.1016/0550-3213(77)90384-4" />
<meta name="citation_abstract_html_url" content="http://adsabs.harvard.edu/abs/1977NuPhB.126..298A" />
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_authors" content="Altarelli, G.; Parisi, G." />
<meta name="citation_issn" content="0550-3213" />
<meta name="citation_date" content="08/1977" />
<meta name="citation_journal_title" content="Nuclear Physics B" />
<meta name="citation_volume" content="126" />
<meta name="citation_firstpage" content="298" />
<meta name="citation_lastpage" content="318" />
...
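
If you would rather check a page for these tags programmatically than eyeball grep output, something like the following might do. It is only a minimal sketch using Python's standard library: the class name CitationMetaParser and the SAMPLE_HTML string are made up for illustration (the robots tag in particular is invented, just to show that non-citation tags get skipped), and it parses an inline string rather than fetching the live page.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> pairs from a page."""

    def __init__(self):
        super().__init__()
        # List of (name, content) tuples; a list, not a dict,
        # because some fields may legitimately appear more than once.
        self.fields = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <meta ... /> also end up here.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = attrs.get("name") or ""
        if name.startswith("citation_"):
            self.fields.append((name, attrs.get("content") or ""))


# Hypothetical stand-in for a fetched abstract page.
SAMPLE_HTML = """
<head>
<meta name="robots" content="index" />
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_authors" content="Altarelli, G.; Parisi, G." />
<meta name="citation_date" content="08/1977" />
</head>
"""

parser = CitationMetaParser()
parser.feed(SAMPLE_HTML)
for name, content in parser.fields:
    print(f"{name}: {content}")
```

Keeping the results as a list of pairs rather than a dictionary is deliberate, since a dictionary would silently drop repeated fields.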

Since the first implementation we’ve had some back-and-forth with Abhishek Jain at Google Scholar to ensure we’re making use of the full set of fields that Google Scholar looks for.*

Dan Chudnov, David Bucknum & Ed Summers at the LoC recently expressed interest in also embedding these tags. In the absence of an official reference from the Google Scholar folks, I figured it would be a good thing to post the fields we know about here:

  • citation_language
  • citation_doi
  • citation_abstract_html_url
  • citation_title
  • citation_authors
  • citation_issn
  • citation_date
  • citation_journal_title
  • citation_volume
  • citation_firstpage
  • citation_lastpage
  • citation_publisher
  • citation_issue
  • citation_pdf_url
  • citation_pmid
  • citation_keywords (multiple instances OK)
  • citation_conference
  • citation_dissertation_name
  • citation_dissertation_institution
  • citation_patent_number
  • citation_patent_country
  • citation_technical_report_number
  • citation_technical_report_institution
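
To make the field list a bit more concrete, here is a rough sketch of how a page template might turn a record into these tags. This is not how ADS actually generates them: the function name render_citation_meta, the record dictionary, and the two keywords are all hypothetical. List values are emitted as repeated tags, which is what "multiple instances OK" suggests for citation_keywords.

```python
from html import escape


def render_citation_meta(record):
    """Return one <meta> line per field; list values yield repeated tags."""
    lines = []
    for field, value in record.items():
        # Normalize scalars to a one-element list so both cases share a loop.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            # Escape quotes and ampersands so the attribute stays valid HTML.
            content = escape(str(v), quote=True)
            lines.append(f'<meta name="citation_{field}" content="{content}" />')
    return "\n".join(lines)


# Hypothetical record; the keywords are invented to show repeated tags.
record = {
    "title": "Asymptotic freedom in parton language",
    "authors": "Altarelli, G.; Parisi, G.",
    "doi": "10.1016/0550-3213(77)90384-4",
    "keywords": ["QCD", "partons"],
}
print(render_citation_meta(record))
```

Escaping the content matters more than it looks, since titles with quotes or ampersands would otherwise break the attribute.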

I had to cull this list via a visual scan of a long, forwarded e-mail thread, so it may well be incomplete. As I hinted above, it sure would be great if Google Scholar would publish an official reference to this schema somewhere.

* All instances of the term “we” should really be read as “my boss, Alberto”.
Written on March 1, 2010