Presentation slides (pptx)
The Open Citations Corpus is a database of approximately 6.3 million biomedical literature citations, harvested from the reference lists of all open access articles in PubMed Central. These citations refer to approximately 3.4 million papers, which represent ~20% of all PubMed-listed papers published between 1950 and 2010, including all the most highly cited papers in every biomedical field. The Open Citations Corpus website allows you to browse these bibliographic records and citations, to select an individual article, and to visualize its citation network in a variety of displays. Details of each selected reference, together with the data and diagrams for its citation network, may be downloaded in a variety of formats, and the entire Open Citations Corpus can be downloaded in several formats including RDF and BibJSON. CiTO, FaBiO and other SPAR (Semantic Publishing and Referencing) Ontologies have been used to encode this information in RDF, after parsing the National Library of Medicine DTD XML obtained from PubMed Central and after undertaking considerable work to remove the errors that exist in approximately 1% of the literature references. Further information about the Open Citations Corpus, the data processing, and the JISC Open Citations Project that supported this work is given on the Open Citations Blog.
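To make the RDF encoding more concrete, the following is a minimal sketch of how a single citing article and its references might be written as N-Triples using the CiTO property `cito:cites` and the FaBiO class `fabio:JournalArticle`. It is an illustration only and is not the project's actual conversion code: the function name `citation_triples` and the `example.org` article URIs are assumptions made for this example, and the real corpus may use different identifiers and additional properties.

```python
# Namespace URIs for the SPAR ontologies and for rdf:type.
CITO = "http://purl.org/spar/cito/"
FABIO = "http://purl.org/spar/fabio/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def citation_triples(citing_uri, cited_uris):
    """Return N-Triples lines stating that one article cites others.

    Hypothetical helper for illustration; not part of the corpus tooling.
    """
    # Type the citing work as a journal article (an assumption; the
    # corpus may also contain other kinds of work).
    lines = [f"<{citing_uri}> <{RDF_TYPE}> <{FABIO}JournalArticle> ."]
    # One cito:cites statement per entry in the reference list.
    for cited in cited_uris:
        lines.append(f"<{citing_uri}> <{CITO}cites> <{cited}> .")
    return lines


# Hypothetical article identifiers, for illustration only.
triples = citation_triples(
    "http://example.org/article/A",
    ["http://example.org/article/B", "http://example.org/article/C"],
)
print("\n".join(triples))
```

Each reference in an article's reference list becomes one `cito:cites` statement, so the citation network of an article is simply the set of such statements that have it as subject or object.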