Embedding citation metadata in the ADS HTML
Here’s what I know: you can embed a set of meta tags containing citation metadata in your HTML to help Google Scholar index your content. We’ve been doing this at ADS for quite a while. I’m not certain whether the impetus came directly from Google or, more likely, we got the idea from a CrossTech blog post by Tony Hammond that describes the technique.
For example, if you execute
curl -s http://adsabs.harvard.edu/abs/1977NuPhB.126..298A | grep meta
you should see:
<meta name="citation_language" content="en" />
<meta name="citation_doi" content="10.1016/0550-3213(77)90384-4" />
<meta name="citation_abstract_html_url" content="http://adsabs.harvard.edu/abs/1977NuPhB.126..298A" />
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_authors" content="Altarelli, G.; Parisi, G." />
<meta name="citation_issn" content="0550-3213" />
<meta name="citation_date" content="08/1977" />
<meta name="citation_journal_title" content="Nuclear Physics B" />
<meta name="citation_volume" content="126" />
<meta name="citation_firstpage" content="298" />
<meta name="citation_lastpage" content="318" />
...
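If you would rather check a page programmatically than eyeball the grep output, something like the following rough sketch could work. It uses only Python's standard html.parser module and runs against a small inline sample copied from the output above, so no network access is needed. The class and function names (CitationMetaParser, extract_citation_meta) are my own for illustration and are not part of anything ADS or Google Scholar provides.

```python
from html.parser import HTMLParser


class CitationMetaParser(HTMLParser):
    """Collect <meta name="citation_*" content="..."> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        # Map each field name to a list of values, since a tag may repeat.
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> are also routed here by default.
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        if name.startswith("citation_"):
            self.fields.setdefault(name, []).append(attr.get("content") or "")


def extract_citation_meta(html_text):
    """Return a dict of citation_* field names to lists of content values."""
    parser = CitationMetaParser()
    parser.feed(html_text)
    parser.close()
    return parser.fields


# A few of the tags from the example above, plus one unrelated meta tag.
sample_html = """
<html><head>
<meta name="citation_title" content="Asymptotic freedom in parton language" />
<meta name="citation_authors" content="Altarelli, G.; Parisi, G." />
<meta name="citation_date" content="08/1977" />
<meta name="description" content="not a citation tag" />
</head></html>
"""

fields = extract_citation_meta(sample_html)
for name, values in fields.items():
    print(name, values)
```

Values are kept in lists so that a field appearing more than once on a page is not silently overwritten.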
Since we first implemented this, we’ve had some back-and-forth with Abhishek Jain at Google Scholar to ensure we’re making use of the full set of fields that Google Scholar looks for.*
Dan Chudnov, David Bucknum & Ed Summers at the LoC recently expressed interest in also embedding these tags. In the absence of an official reference from the Google Scholar folks, I figured it would be a good thing to post the list of fields here:
- citation_language
- citation_doi
- citation_abstract_html_url
- citation_title
- citation_authors
- citation_issn
- citation_date
- citation_journal_title
- citation_volume
- citation_firstpage
- citation_lastpage
- citation_publisher
- citation_issue
- citation_pdf_url
- citation_pmid
- citation_keywords (multiple instances OK)
- citation_conference
- citation_dissertation_name
- citation_dissertation_institution
- citation_patent_number
- citation_patent_country
- citation_technical_report_number
- citation_technical_report_institution
I had to cull this list via a visual scan of a long, forwarded e-mail thread. So, as I hinted above, it sure would be great if Google Scholar would publish an official reference for this schema somewhere.
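For anyone wanting to emit these tags from their own records, here is a minimal sketch of one way to do it in Python. It assumes the record is a plain dict keyed by the field names listed above, with a list as the value wherever multiple instances are wanted (as with citation_keywords). The function name render_citation_meta and the two keyword values are hypothetical, made up purely for illustration; the other values come from the example earlier in this post.

```python
from html import escape


def render_citation_meta(record):
    """Render a dict of citation fields as one <meta> tag per value."""
    lines = []
    for name, value in record.items():
        # A list or tuple yields one tag per item; a scalar yields one tag.
        values = value if isinstance(value, (list, tuple)) else [value]
        for item in values:
            lines.append(
                '<meta name="%s" content="%s" />'
                % (escape(name, quote=True), escape(str(item), quote=True))
            )
    return "\n".join(lines)


record = {
    "citation_title": "Asymptotic freedom in parton language",
    "citation_authors": "Altarelli, G.; Parisi, G.",
    "citation_doi": "10.1016/0550-3213(77)90384-4",
    # Hypothetical keywords, only here to show repeated tags.
    "citation_keywords": ["hypothetical keyword one", "hypothetical keyword two"],
}

rendered = render_citation_meta(record)
print(rendered)
```

Escaping the content with html.escape matters in practice, since titles and abstracts can contain quotes, ampersands, or angle brackets that would otherwise break the attribute.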
* All instances of the term “we” should really be read as “my boss, Alberto”.