Highlights from the US Data Citation Workshop: Developing Policy and Practices

Posted on: Sep 1, 2016

Posted by: Chantel Ridsdale, RDC Intern

The US Data Citation Workshop, Developing Policy and Practices, took place on Tuesday, July 12th, 2016. It was presented by the U.S. National Committee (USNC), the Committee on Data for Science and Technology (CODATA), and the CODATA/ICSTI Task Group on Data Citation.

The conference was geared towards data professionals with interests in increasing awareness and encouraging policy and practices concerning data citations. The day’s speakers included publishers, editors, data managers, federal agencies, and authors, culminating in some great discussions from varying perspectives. The live stream of the conference can be found here.

This post will cover the key issues discussed at the conference.

Why does this matter?

Impact. For the researcher and the wider community to judge whether a dataset is benefiting society, there must be citations that show how the data are being reused. First, however, we must be able to answer a few questions:

  • Which data are being reused?
  • Where are data being used?
  • And why are they used?

If we cannot answer these questions, current practices and policies are not measurable, which impedes impact. A representative from the Dryad repository described this as a huge challenge, and argued that it will be 2030, at the earliest, before these practices become commonplace.

Data has become a hot topic because it’s changing the way we look at everything! Every institution, organization, and country has its own data management mandates, goals, and policies. This results in a huge challenge in compliance, standardization and the implementation of best practices.

The data citation community

Stakeholders in data citation are numerous and varied, making the issues much more complex than anyone might have thought. CODATA is an interdisciplinary scientific committee of the International Council for Science, which works to improve data for all fields of science and technology with respect to:

  • quality,
  • reliability,
  • management, and
  • accessibility.

CODATA promotes data policy, data science, and data capacity building.

Planning for data management

Data Management (DM) was discussed throughout the day, with concerns raised about the requirement of DM Plans by funding agencies. Of particular concern was the lack of accountability and monitoring for DM practices. Suggestions were made that funding agencies requiring DM Plans should publish them. This would provide the research community with clear expectations of research outputs, and funding agencies would have the ability to follow up should these expectations not be met.

The leading publication Nature is currently updating a mandatory “Data Availability Statement”, which requires its authors to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. Nature’s approach is one of the exceptions to the rule, as most publications are just “encouraging” researchers to make their data available.

Who’s who?

There are so many “unique” identifiers today (for researchers, documents, institutions, and datasets) that they have become an issue for accessibility and citation. What is the best way to add a unique identifier? ORCID was mentioned by several speakers and audience members, and seems to be the approach most agreed upon: identify the author/researcher of the data, and then link that identifier to the data and research outputs published by that person.

There was, however, a great deal of discussion about data IDs, as opposed to author/researcher IDs. For the majority of participants, Digital Object Identifiers (DOIs) seemed like the only realistic approach. Versioning, however, was discussed as a challenge here. An interesting point was raised that data may be more appropriately thought of as a stream than as stand-alone objects.
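As a rough illustration of linking a researcher ID to dataset IDs, the Python sketch below validates an ORCID iD using its published check-digit scheme (ISO 7064 MOD 11-2) and then keeps a simple mapping from that iD to dataset DOIs. The mapping structure, the function names, and the DOI shown are assumptions made for illustration; they were not presented at the workshop. The iD used is the one ORCID documents for its fictitious test researcher.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the 16-character iD has a matching check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


# Hypothetical link table: researcher iD -> DOIs of datasets they published.
# The DOI below is a placeholder, not a real dataset.
datasets_by_researcher: dict[str, list[str]] = {}

researcher = "0000-0002-1825-0097"  # ORCID's documented fictitious example iD
if is_valid_orcid(researcher):
    datasets_by_researcher.setdefault(researcher, []).append("10.1234/example.v1")
```

Checking the iD before storing the link catches simple transcription errors, which matters once the same identifier is reused across repositories and publishers.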

Best practices in data sharing, reuse, and citation

There has been progress in open data and open access since the U.S. Office of Science and Technology Policy (OSTP) memo of February 2013, which set out guidelines for federal agencies, and the Canadian Tri-Agency Statement of Principles on Digital Data Management of June 2016. Both have stimulated more DM conversations in the community. However, social and cultural change remains a key challenge for the data and research community, because long-standing practices and behaviours are deeply ingrained.

General data citation best practices discussed throughout the day include:

  • Provide the users with a citation generation tool and/or roadmap;
  • Secure buy-in from funding agencies to streamline the push;
  • Time-stamp stored data to help with versioning;
  • Include an online community directory to help connect author with user;
  • Provide a recommended citation for the dataset;
  • Establish best practices, and secure buy-in from all key stakeholders; and
  • Provide the users with a data citation index.
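Several of the practices above (a citation generation tool, a recommended citation, and time stamps for versioning) can be combined in a small helper. The following Python sketch is only an illustration: the `DatasetRecord` fields, the citation layout, and the example values (including the placeholder DOI) are assumptions, not a format endorsed by the workshop or by any particular repository.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical metadata a repository might hold for one dataset version."""
    creators: tuple[str, ...]
    title: str
    publisher: str
    year: int
    version: str
    doi: str
    deposited: date  # time stamp that distinguishes one version from another


def recommended_citation(record: DatasetRecord) -> str:
    """Build a recommended citation string from the stored metadata."""
    authors = "; ".join(record.creators)
    return (
        f"{authors} ({record.year}). {record.title} "
        f"(Version {record.version}, deposited {record.deposited.isoformat()}) "
        f"[Data set]. {record.publisher}. https://doi.org/{record.doi}"
    )


# Placeholder values for illustration only.
example = DatasetRecord(
    creators=("Doe, J.", "Roe, R."),
    title="Example Ocean Temperature Readings",
    publisher="Example Repository",
    year=2016,
    version="1.2",
    doi="10.1234/example",
    deposited=date(2016, 7, 12),
)
print(recommended_citation(example))
```

Showing the generated string next to each dataset gives users a ready-made citation, and including the version and deposit date makes it clear which snapshot of the data was used.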

Rewards and incentives

Rewards and incentives were deemed to be the surest way to receive buy-in from the research community and drive behavioural change. However, this effort is going to be a marathon, not a sprint. The discussion began around the under-representation of non-academic researchers, and primarily focused on the challenges surrounding policy, such as:

  • different disciplinary definitions of success and productivity;
  • gender inequalities;
  • ability vs. resource availability;
  • different supports for mentoring and collaboration; and
  • the roles and reproducibility of those who contribute data.

The issue of institutional culture change was discussed at length, and resulted in a call for another term for non-tenure-track professors, whose numbers have been increasing. The need to change the requirements for tenure to reflect openness, sharing, and collaboration outside of traditional silos was also discussed.

Looking at the citation challenge from within silos is not going to result in any solutions. If we want to see change, and ultimately succeed in improving data citation, we must come together as a community.

Get involved

If you would like to become involved, the facilitators encouraged participants to join them at International Data Week in Denver, CO, September 11-17, 2016. If you would like to become informally involved, there are some interesting conversations about these issues taking place on Twitter, and anyone can join in using @dataparasite and #Iamaresearchparasite.