Data Citation Corpus

Make Data Count

About

Launched: 2024
Record Updated: Dec 09, 2025
Discovery system
Open scholarly dataset
The Data Citation Corpus provides a large open resource of citations to data, aggregating data-article links identified through diverse methodologies, including metadata workflows, curation, and machine learning. Made available as an open CC0 community resource, the Corpus provides insights into the use and reach of data at a scale not possible before.

Mission

Make Data Count is a community initiative that builds the tools and practices the community needs to meaningfully assess how data are used and to recognize data as primary outputs.

Key Achievements

The first release of the Data Citation Corpus was shared in January 2024. It incorporated data citations registered via metadata in DataCite Event Data, together with data-article links contributed by the Chan Zuckerberg Initiative, which were identified by text mining five million articles for mentions of data in the article text.
The July 2025 release of the Data Citation Corpus incorporates additional sources (Aligning Science Across Parkinson’s (ASAP) and Europe PMC) and aggregates 10 million data citations.
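As an illustration of what aggregating data-article links from several sources can involve, the following is a minimal sketch, not the Corpus's actual pipeline. The record fields (`dataset`, `publication`, `source`), the source labels, and the identifiers are all hypothetical and chosen only for the example. It assumes that two links are the same citation when they share a dataset identifier (a DOI or an accession number) and a publication DOI.

```python
from collections import defaultdict


def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip a common resolver or scheme prefix."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def aggregate_links(records):
    """Merge data-article links, recording every source that reported each pair."""
    merged = defaultdict(set)
    for rec in records:
        # Dataset identifiers may be accession numbers, so only trim whitespace.
        key = (rec["dataset"].strip(), normalize_doi(rec["publication"]))
        merged[key].add(rec["source"])
    return [
        {"dataset": dataset, "publication": publication, "sources": sorted(sources)}
        for (dataset, publication), sources in sorted(merged.items())
    ]


# Hypothetical input: the same link reported by two sources, plus one
# accession-number link reported by a third.
records = [
    {"dataset": "10.1234/example.dataset",
     "publication": "https://doi.org/10.1234/Example.Article",
     "source": "datacite"},
    {"dataset": "10.1234/example.dataset",
     "publication": "10.1234/example.article",
     "source": "czi"},
    {"dataset": "ABC12345",
     "publication": "doi:10.1234/example.article",
     "source": "europe-pmc"},
]

for link in aggregate_links(records):
    print(link)
```

Keeping the set of contributing sources on each merged link, rather than discarding duplicates, preserves the provenance of citations that were found by more than one method.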

Technical Attributes

Maintenance Status

Actively Maintained

Open Code Repository

Implemented

Technical Documentation

Implemented

Code License

Implemented

Open Data Statement

Implemented

Technical Attribute Statements

Programming Languages

  • Python

Technology Readiness Level

  • Technology validated in relevant environment

Code Licenses Used

  • MIT License

Content Licensing

  • Creative Commons licenses

Standards

Metadata

  • JSON

Persistent Identifier

  • Research Organization Registry
  • Other: DOI, accession numbers

Metrics

  • Make Data Count

Integrations

  • DataCite

Community Engagement

Community Engagement

Implemented

Community Statements

Community Engagement Activities

  • Blogs
  • Conference participation
  • Interest, working, user, or advisory groups
  • Mailing lists and discussion forums (including Slack)
  • Social media
  • Webinars and training

Engagement with Values Frameworks

  • Principles for Open Scholarly Infrastructure (POSI)

SCOSS Participation

Yes

More About Community Engagement

User Contribution Pathways:

Groups can contribute citations to the Corpus

Policies & Governance

Governance Summary

Make Data Count is managed by the Make Data Count Director, a Sustainability Committee, and an Advisory Group. DataCite serves as the fiscal home of Make Data Count.

Policies

Privacy Policy

Implemented

Governance Structure & Processes

Implemented

Policy Statements

Board Structure

  • None

Community Governance

  • Ad hoc

Additional Information

Organizational History

Make Data Count started in 2015 through a collaboration between the California Digital Library, the Public Library of Science, and the Data Observation Network for Earth that led to a survey of preferences for metrics of the impact of data. The results of this research informed Make Data Count's next projects, carried out in partnership with DataCite and COUNTER: workflows to capture data citations via repositories and journals, and the standardization of data downloads and views. Over the years, Make Data Count has worked with repositories to develop and implement recommended practices, and with bibliometricians to better understand data-usage practices among researchers. Make Data Count has partnered with DataCite to develop and deploy infrastructure to capture, store, and share data-usage measures. The Data Citation Corpus aims to substantially expand the data-usage information available to the community by providing data-citation insights at an unprecedented scale.

Organizational Structure

Business or Ownership Model

Non-profit organization

Volunteers

11-20

Current Affiliations

DataCite leads the development of the Data Citation Corpus as part of Make Data Count activities.

Funding

Primary Funding Source

  • Contributions