Original RDC Glossary

A major difficulty in any newly emerging discipline is the lack of a precise and definitive taxonomy of terms. Different communities use the same terms in different ways, which can make effective communication problematic. RDC is primarily, but not exclusively, focused on data that are digital.

There are occasions when analogue forms of data are also important to research. The following working definitions are those employed by RDC and are intended to be used as a practical tool. These definitions may not necessarily achieve widespread consensus among the wide-ranging communities that use and produce research data. They are offered here as a mechanism to avoid potential ambiguities in the body of RDC documents rather than as a definitive gloss. This should be considered to be a living document that will be updated and amended as required. [ Adapted from Digital Preservation Coalition ]

Terms and Definitions

TERM DEFINITION REFERENCES
Access Access is assumed to mean continued, ongoing usability of a digital resource, retaining all qualities of authenticity, accuracy and functionality deemed to be essential for the purposes the digital material was created and/or acquired for. Digital preservation coalition
Accession number Accession numbers used by the National Center for Biotechnology Information (NCBI) are unique and citable. MIT data management and publishing
Algorithm A computable set of steps to achieve a desired result. NIST Dictionary of Algorithms and Data Structures
Analogue materials Non-digital materials that have a physical presence (e.g., written and printed material).
Anonymization The act of permanently and completely removing personal identifiers from data, such as converting personally identifiable information into aggregated data. Anonymized data is data that can no longer be associated with an individual in any manner. Once this data is stripped of personally identifying elements, those elements can never be re-associated with the data or the underlying individual. Anonymized data are suitable when no contact is needed with the participant or where the data do not need to be linked to any other data sources. Internet 2/Educause; Open Data 101 (GoC)
Archives A place or collection containing static records, documents, or other materials for long-term preservation. ACTI-DM Working Group/Educause
Archiving A curation activity that ensures that data is properly selected, stored, and can be accessed and that its [sic] logical and physical integrity is maintained over time, including security and authenticity. JISC/TC3+
Authentication The process of confirming the identity of a principal. Since computer identification cannot be absolute (e.g., passwords can be stolen), authentication relies on a related concept of level of trust, in which an institution relies on good identity management practice (so that the institution believes they have correctly identified an individual) and secures mechanisms for sharing identity. This is sometimes referred to as AuthN (authentication), in contrast to AuthZ (authorization). Internet 2/Educause
Authentication A mechanism which attempts to establish the authenticity of digital materials at a particular point in time. For example, digital signatures. Digital preservation coalition
Authorization The process of deciding if a subject (person, program, device, group, role, etc.) is allowed to have access to or take an action against a resource. Authorization relies on a trusted identity (authentication) and the ability to test the privileges held by the subject against the policies or rules governing that resource to determine if an action is permitted for a subject. Internet 2/Educause
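As an illustration only, the authorization test described above (privileges held by a subject checked against the rules governing a resource) might be sketched in Python as follows. The Subject class, the POLICY table, the resource name and the privilege names are all hypothetical, and the subject is assumed to have been authenticated already.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Subject:
    """An already-authenticated subject and the privileges it holds."""
    name: str
    privileges: frozenset = field(default_factory=frozenset)


# Hypothetical policy: the privilege required for each (resource, action) pair.
POLICY = {
    ("survey_2013", "read"): "dataset:read",
    ("survey_2013", "delete"): "dataset:admin",
}


def is_authorized(subject: Subject, resource: str, action: str) -> bool:
    """Test the subject's privileges against the policy for the resource."""
    required = POLICY.get((resource, action))
    # Deny by default when no rule covers the requested action.
    return required is not None and required in subject.privileges


analyst = Subject("analyst", frozenset({"dataset:read"}))
can_read = is_authorized(analyst, "survey_2013", "read")
can_delete = is_authorized(analyst, "survey_2013", "delete")
```

In this sketch an action with no matching rule is refused, which is one common design choice rather than a requirement of the definition.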
Checksums Values computed from the contents of a file and recomputed later to test if the file has changed over time.
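A minimal sketch of a checksum test, assuming SHA-256 as the digest (the glossary does not prescribe an algorithm) and using made-up file contents held in memory rather than a real file:

```python
import hashlib


def sha256_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Checksum recorded when the (hypothetical) file content was deposited.
original = b"site,temperature\nA,12.5\n"
recorded = sha256_checksum(original)

# Later: recompute and compare. A changed byte yields a different digest.
unchanged = sha256_checksum(b"site,temperature\nA,12.5\n") == recorded
altered = sha256_checksum(b"site,temperature\nA,12.6\n") == recorded
```

Here unchanged is true and altered is false, which is the signal a repository would use to detect that a stored file no longer matches its deposited form.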
Cloud Computing A large-scale distributed computing paradigm that is driven by economies of scale, in which a pool of abstracted, virtualized, dynamically-scalable, managed computing power, storage, platforms and services are delivered on demand to external customers over the Internet. Key elements: • it is a specialized distributed computing paradigm; • it is massively scalable; • it can be encapsulated as an abstract entity that delivers different levels of services to customers outside the Cloud; • it is driven by economies of scale; and, • the services can be dynamically configured (via virtualization or other approaches) and delivered on demand. GRDI 2020/TC3+
Conservation See, “Preservation.”
Controlled vocabulary A list of standardized terminology, words, or phrases, used for indexing or content analysis and information retrieval, usually in a defined information domain. DAMA Dictionary of Data Management; TBS Standard On Metadata
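As a sketch of how a controlled vocabulary can be applied during indexing, the following Python fragment accepts only terms that appear in a standardized list. The vocabulary contents and the sample terms are hypothetical, and the normalization rule (lower case, single spaces) is an assumption made for the example.

```python
# Hypothetical controlled vocabulary for an environmental information domain.
VOCABULARY = {"air quality", "climate", "water quality", "weather"}


def index_terms(raw_terms):
    """Split free-text terms into those in the vocabulary and those not."""
    accepted, rejected = [], []
    for term in raw_terms:
        # Normalize case and whitespace before looking the term up.
        normalized = " ".join(term.lower().split())
        if normalized in VOCABULARY:
            accepted.append(normalized)
        else:
            rejected.append(normalized)
    return accepted, rejected


accepted, rejected = index_terms(["Water  Quality", "climate", "H2O purity"])
```

Rejected terms would normally be mapped to a preferred term by a cataloguer rather than discarded.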
Curation The activity of managing and promoting the use of data from its point of creation to ensure it [sic] is fit for contemporary purpose and available for discovery and reuse. For dynamic datasets this may mean continuous enrichment or updating to keep it [sic] fit for purpose. Higher levels of curation will also involve links with annotation and with other published materials. JISC e-Science Curation Report/TC3+
Cyber-infrastructure Those layers that sit between base technology (a computer science concern) and discipline-specific science. The focus is on value-added systems and services that can be widely shared across scientific domains, both supporting and enabling large increases in multi- and interdisciplinary science while reducing duplication of effort and resources—e.g., including hardware, software, personnel, services and organizations. The Atkins Report/TC3+
Data Facts, measurements, recordings, records, or observations about the world collected by scientists and others, with a minimum of contextual interpretation. Data may be any format or medium taking the form of writings, notes, numbers, symbols, text, images, films, video, sound recordings, pictorial reproductions, drawings, designs or other graphical representations, procedural manuals, forms, diagrams, work flow charts, equipment descriptions, data files, data processing algorithms, or statistical records. Landry et al. (1970); Carol Tenopir (2007); Michael Buckland (2007)
See Zins (2007) for an analysis of 130 definitions of data, information and knowledge provided by an expert panel of 45 leading scholars in information science, and the development of 5 models for defining data, information, and knowledge.
Data Access Protocol A system that allows outsiders to be granted access to databases without overloading either system Open Data 101 (GoC)
Data accountability domain A specification of a grouping/category of EC datasets for the purpose of defining the scope of accountability. Data accountability domains are specified using a set of scope variables (e.g. Information Category, Geographic Scope, Program Scope) and taxonomies (e.g. water quality, air quality, climate and weather). Environment Canada data stewardship handbook (draft).
Data analysis [A] process of inspecting, cleaning, transforming, and modeling data with the goal of highlighting useful information suggesting conclusions, and supporting decision making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains. Wikipedia/Educause
Data centre A facility providing IT services, such as servers, massive storage, and network connectivity. See Digital Infrastructure for related concepts. RDC
Data citation Data citation offers proper recognition to authors as well as permanent identification through the use of global, persistent identifiers in place of URLs, which can change frequently. Use of universal numerical fingerprints (UNFs) guarantees to the scholarly community that future researchers will be able to verify that data retrieved is identical to that used in a publication decades earlier, even if it has changed storage media, operating systems, hardware, and statistical program format. thedata.org/Educause
Data cleaning Data cleaning is a continuous process that requires corrective actions throughout the data lifecycle. Data cleaning is the process of detecting and correcting corrupt or inaccurate records from a dataset. Data cleaning involves identifying, replacing, modifying, or deleting incomplete, incorrect, inaccurate, inconsistent, and irrelevant data.
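A minimal sketch of one cleaning pass, assuming records are held as Python dictionaries with hypothetical fields site and temp_c. It modifies inconsistent values (site codes), deletes records with incorrect or incomplete values, and removes duplicates; real cleaning rules depend on the dataset.

```python
def clean(records: list[dict]) -> list[dict]:
    """Normalize site codes, drop unusable records, and remove duplicates."""
    cleaned, seen = [], set()
    for rec in records:
        # Modify: make site codes consistent (trimmed, upper case).
        site = (rec.get("site") or "").strip().upper()
        try:
            temp = float(rec.get("temp_c"))
        except (TypeError, ValueError):
            continue  # incorrect or missing measurement: delete the record
        if not site:
            continue  # incomplete record: no site recorded
        key = (site, temp)
        if key in seen:
            continue  # duplicate of a record already kept
        seen.add(key)
        cleaned.append({"site": site, "temp_c": temp})
    return cleaned


raw_records = [
    {"site": " a1 ", "temp_c": "12.5"},
    {"site": "A1", "temp_c": 12.5},    # duplicate once normalized
    {"site": "B2", "temp_c": "n/a"},   # not a number
    {"site": "", "temp_c": "9.0"},     # missing site
    {"site": "c3", "temp_c": "7"},
]
cleaned = clean(raw_records)
```

Of the five sample records, two survive: A1 at 12.5 and C3 at 7.0.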
Data cleansing See, “data cleaning.”
Data completeness Data completeness is the degree to which all required measures are known. Values may be designated as “missing” in order not to have empty cells, or missing values may be replaced with default or interpolated values. In the case of default or interpolated values, these must be flagged as such to distinguish them from actual measurements or observations. Missing, default, or interpolated values do not imply that the dataset has been made complete.
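The flagging requirement above can be sketched as follows. The function name fill_missing, the flag labels and the choice of linear interpolation are assumptions for illustration; the point is only that every filled value carries a flag distinguishing it from an actual measurement.

```python
from typing import Optional


def fill_missing(values: list[Optional[float]]) -> list[tuple[float, str]]:
    """Linearly interpolate interior gaps, flagging every filled value."""
    result = []
    for i, v in enumerate(values):
        if v is not None:
            result.append((v, "measured"))
            continue
        # Nearest measured neighbours on each side (None at a series edge).
        left = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
        right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
        if left is None or right is None:
            raise ValueError("cannot interpolate a gap at the edge of the series")
        frac = (i - left) / (right - left)
        filled = values[left] + frac * (values[right] - values[left])
        result.append((filled, "interpolated"))
    return result


readings = [10.0, None, 14.0]
flagged = fill_missing(readings)
# Completeness counts only actual measurements, not filled values.
completeness = sum(1 for _, flag in flagged if flag == "measured") / len(flagged)
```

The completeness figure is computed from the flags, so filling the gap does not make the series appear complete.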
Data compliance Data compliance consists of the ongoing processes to ensure adherence of data to both enterprise business rules (government department, university, industry, or agency), and to legal, regulatory and accreditation requirements. Data compliance includes five areas: controls, audit, legal compliance, regulatory compliance, and accreditation compliance.
Data custodian An IT individual or organization responsible for the IT infrastructure providing and protecting data in conformance with the policies and practices prescribed by data governance. Sometimes referred to as a technical data steward. DAMA Dictionary of Data Management
Data fusion See “Data integration.”
Data governance The exercise of authority, control and shared decision making (planning, monitoring and enforcement) over the management of data assets. DAMA Dictionary of Data Management
Data integration Combining diverse datasets from disparate sources into one unified dataset or database. Data needs to be accessed and extracted, moved, validated and cleansed, standardized, transformed and loaded. Note that in scientific and geospatial applications, “data fusion” and “data integration” are synonymous. However, in business applications, “data fusion” is a data reduction technique. DAMA Dictionary of Data Management; Other
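A small sketch of the standardize, transform and load steps, assuming two hypothetical sources that record the same measurement with different field names and units. The source layouts and the target schema (station, temp_c) are invented for the example.

```python
# Two hypothetical sources with different field names and units.
source_a = [{"station": "a1", "temp_c": "12.5"}]
source_b = [{"Station": "B2", "temp_f": 50.0}]


def from_a(rec: dict) -> dict:
    """Standardize a source A record to the target schema."""
    return {"station": rec["station"].upper(), "temp_c": float(rec["temp_c"])}


def from_b(rec: dict) -> dict:
    """Standardize a source B record, converting Fahrenheit to Celsius."""
    celsius = (float(rec["temp_f"]) - 32) * 5 / 9
    return {"station": rec["Station"].upper(), "temp_c": round(celsius, 2)}


# Load both standardized sources into one unified dataset.
unified = sorted(
    [from_a(r) for r in source_a] + [from_b(r) for r in source_b],
    key=lambda r: r["station"],
)
```

Validation and cleansing are omitted here for brevity; in practice they would run before the records are combined.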
Data Management The activities of data policies, data planning, data element standardization, information management control, data synchronization, data sharing, and database development, including practices and projects that acquire, control, protect, deliver and enhance the value of data and information. Mapping the Data Landscape 2011 Summit; TBS Information Management Glossary (BC Government Information Resource Management); DAMA Dictionary of Data Management
Data Management Plan DMP – A formal statement describing how research data will be managed and documented throughout a research project and the terms regarding the subsequent deposit of the data with a data repository for long-term management and preservation. RDC
Data migration The process of transferring data between storage types, formats, information technologies, or computer systems. RDC adapted from Wikipedia/Educause
Data mining The process of analyzing multivariate datasets using pattern recognition or other knowledge discovery techniques to identify potentially unknown and potentially meaningful data content, relationships, classification, or trends. UNESCO Open Access Policy Guidelines; TBS Information Management Glossary (BC Information Resource Management); DAMA Dictionary of Data Management.
Data policy A set of high-level principles that establish a guiding framework for data management. A data policy can be used to address strategic aspects such as data access, relevant legal matters, data stewardship issues and custodial duties, data acquisition and other issues. Mapping the Data Landscape 2011 Summit
Data production Includes all activities involved in the planning, collecting, processing, analysis and maintenance of data in the original research project. Among these activities are selecting a study design, constructing instruments for data collection, conducting data collection/creation, performing data editing/verification/validation, analyzing data, backing up data versions and preparing and tagging metadata. Stewardship of Research Data in Canada: A Gap Analysis
Data repository An archival service providing the long-term care for digital objects with research value. The standard for such repositories is the Open Archival Information System reference model (ISO 14721:2003). See Repository and Trusted Digital Repository for related concepts. Mapping the Data Landscape 2011 Summit
Data scrubbing See, “data cleaning.”
Data stewardship An organizational plan of the roles and responsibilities of those overseeing the management of data across all stages of the data lifecycle, including its preservation. A large research project may involve several data stewards as the data moves from stage to stage across the lifecycle.
Data structure An organization of information, usually in memory, for better algorithm efficiency, such as queue, stack, linked list, heap, dictionary, and tree, or conceptual unity, such as the name and address of a person. It may include redundant information, such as length of the list or number of nodes in a sub-tree. NIST Dictionary of Algorithms and Data Structures
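To illustrate the remark about redundant information, here is a minimal singly linked list in Python that stores its own length. The class and method names are chosen for this sketch only; the stored length could always be recomputed by walking the list, so it is redundant, but keeping it makes the length available without a traversal.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    value: Any
    next: Optional["Node"] = None


class LinkedList:
    """Singly linked list that caches its length (redundant information)."""

    def __init__(self) -> None:
        self.head: Optional[Node] = None
        self.length = 0  # redundant: could be recomputed by walking the list

    def push_front(self, value: Any) -> None:
        self.head = Node(value, self.head)
        self.length += 1

    def pop_front(self) -> Any:
        if self.head is None:
            raise IndexError("pop from empty list")
        node = self.head
        self.head = node.next
        self.length -= 1
        return node.value

    def walk_length(self) -> int:
        """Count nodes by traversal, for comparison with the cached length."""
        count, node = 0, self.head
        while node is not None:
            count, node = count + 1, node.next
        return count


items = LinkedList()
for value in (1, 2, 3):
    items.push_front(value)
```

Used with push_front and pop_front alone, the list behaves as a stack: the last value pushed is the first returned.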
Data traceability Data traceability follows the lifecycle of data to track all access and changes to the data. It helps demonstrate transparency, compliance and adherence to regulations. Data traceability, along with data compliance, can be considered part of a data audit process. Data traceability is fundamental to reproducible research.
Data warehouse An integrated, centralized decision support database and the related software programs used to collect, cleanse, transform and store data from a variety of operational sources to support business intelligence. A data warehouse may also include dependent data marts. DAMA Dictionary of Data Management
Data, Administrative Information collected primarily for administrative (not research) purposes. This type of data is collected by government departments and other organisations for the purposes of registration, transaction and record keeping, usually during the delivery of a service. These data are also recognized as having research value. UK Administrative Data Research Network
Data, Analogue Data in the form of analogue materials. [See, also, “Analogue materials”].
Data, Big “Big data” refers to datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze. This definition assumes that as technology advances over time, the size of datasets that qualify as big data will increase. Also the definition can vary by sector, depending on what kind of software tools are commonly available and what sizes of datasets are common in a particular industry. With those caveats, big data in many sectors today will range from a few dozen terabytes to multiple petabytes (thousands of terabytes). McKinsey Global Institute – Big data: the next frontier for innovation, competition and productivity as quoted by the TC3+ in their October 2013 consultation document: Capitalizing on Big Data: Towards a Policy Framework for Advancing Digital Scholarship in Canada.
Data, Digital Data in the form of digital materials. [See, also, “Digital materials”].
Data, Dirty See, “Dataset, Dirty.”
Data, High quality High-quality data are complete, timely, accurate, consistent, relevant, reliable, traceable, cleaned, validated, and well documented.
Data, Open Data that are accessible, usable, assessable, and intelligible. Open data can be freely used, re-used and redistributed by anyone – subject only, at most, to the requirement to attribute and share-alike. Science as an Open Enterprise (SOE) as quoted by TC3+; Open Data 101 (GoC); UNESCO Open Access Policy Guidelines
Data, Personal Data which relate to a living individual who can be identified (a) from those data, or (b) from those data and other information which is in the possession of, or is likely to come into the possession of, the data controller, and includes any expression of opinion about the individual and any indication of the intentions of the data controller or any other person in respect of the individual.
Data, Repurposed Involves creating new data by combining data appropriately from a variety of existing files, generating new data products that did not previously exist. Among the activities in this stage are developing and supporting search tools that utilize standardized metadata, harmonizing the coding of data for specific variables, engineering new methods of combining data and generating and harvesting new data collections. Stewardship of Research Data in Canada: A Gap Analysis   http://rds-sdr.cisti-icist.nrc-cnrc.gc.ca/eng/reports/2008_gap_analysis.html
Data, Research Data that are used as primary sources to support technical or scientific enquiry, research, scholarship, or artistic activity, and that are used as evidence in the research process and/or are commonly accepted in the research community as necessary to validate research findings and results. All other digital and non-digital content have the potential of becoming research data. Research data may be experimental data, observational data, operational data, third party data, public sector data, monitoring data, processed data, or repurposed data. Australian National Data Service; US NIH Grants Policy Statement (p. 171); Preserving Research Data in Canada: The Long Tale of Data (Blog)
Data, Research Administrative Information produced around the administration of research projects, including profiles and curriculum vitae of researchers, the scope and impact of research projects, funding, citations, and research outcomes.
Data, Science & technology Qualitative or quantitative attributes of a variable or set of variables. Data refers to representations of physical, biological or chemical facts, typically the results of measurements/observations. It also includes related socio-economic and cultural representations. Data are normally in a structured, tabular, numeric, character, geo-referenced, and/or computer-readable format. Environment Canada data stewardship handbook (draft).
Data, Semantic Data that are tagged with particular metadata that can be used to derive relationships between data. SOE/ TC3+
Database A collection of data that is organised and allows its contents to be easily accessed, managed and updated. The type of database used depends on the requirements of the study. A common type is the relational database, where data are related to each other in a systematic manner so that they can be reorganised and accessed in a number of different ways. A database may house one or many datasets.
Database administration The function of managing the physical aspects of data resources, including database design and integrity, backup and recovery, performance and tuning. DAMA Dictionary of Data Management
Dataset Any organized collection of data in a computational format, defined by a theme or category that reflects what is being measured/observed/monitored. The presentation of the data in the application is enabled through metadata. Mapping the Data Landscape 2011 Summit; TBS Standard on Geospatial Data (ISO 19115:2003); Environment Canada data stewardship handbook (draft).
Dataset series A collection of datasets sharing the same product specification TBS Standard on Geospatial Data (ISO 19115:2003)
Dataset, Dirty A dirty dataset contains inaccurate, incomplete or erroneous data such as spelling or punctuation errors, incorrect data or incorrect data type associated with a field, incomplete or outdated data, duplicate data, inconsistent data, incorrectly ordered data, etc. Using incorrect or inconsistent data can lead to spurious associations, false conclusions and misdirected investments.
Descriptive metadata Enables identification, location, and retrieval of information resources by users, often including the use of controlled vocabularies for classification and indexing and links to related resources. DCC/TC3+
Digital archiving This term is used very differently within sectors. The library and archiving communities often use it interchangeably with digital preservation. Computing professionals tend to use digital archiving to mean the process of backup and ongoing maintenance as opposed to strategies for long-term digital preservation. It is this latter richer definition, as defined under digital preservation which has been used throughout this handbook. [See, also, “Archiving”] Digital preservation coalition http://www.dpconline.org/advice/preservationhandbook/introduction/definitions-and-concepts
Digital Infrastructure DI – In Canada the preferred term has become Digital Infrastructure to refer to what is known as Cyber-Infrastructure or e-Research Infrastructure RDC
Digital materials A broad term encompassing: (a) digital surrogates created as a result of converting analogue materials to digital form (digitisation); (b) “born digital” for which there has never been and is never intended to be an analogue equivalent; and, (c) digital records. [See, also, “Born digital,” “Digital objects,” “Digital records,” “Digital data,” and “Electronic records”] Digital preservation coalition
Digital object Digital objects are editable, interactive, accessible and modifiable by means of digital objects other than the one governing their behaviour, and are distributed over information infrastructures. A Theory of Digital Objects
Digital Object Identifier DOI – A name (not a location) for an entity on digital networks. It provides a system for persistent and actionable identification and interoperable exchange of managed information on digital networks. MIT data management and publishing
Digital preservation The series of managed activities necessary to ensure continued access to digital materials for as long as necessary. Digital preservation is defined very broadly and refers to all of the actions required to maintain access to digital materials beyond the limits of media failure or technological change. Those materials may be records created during the day-to-day business of an organisation; “born-digital” materials created for a specific purpose (e.g. teaching resources); or the products of digitisation projects. This definition specifically excludes the potential use of digital technology to preserve the original artefacts through digitisation. [See, also, “Digitisation” and “Preservation”]. Digital preservation coalition
Digital records Records created digitally in the day-to-day business of the organisation and assigned formal status by the organisation. They may include for example, word processing documents, emails, databases, or intranet web pages. Digital preservation coalition
Digital Scholarship Incorporates: • building a digital collection of information for further study and analysis; • creating appropriate tools for collection-building; • creating appropriate tools for the analysis and study of collections; • using digital collections and analytical tools to generate new intellectual products; and, • creating authoring tools for these new intellectual products, either in traditional forms or in digital form. Our Cultural Commonwealth
Digital, Born Digital materials which are not intended to have an analogue equivalent, either as the originating source or as a result of conversion to analogue form. This term is used to differentiate them from 1) digital materials which have been created as a result of converting analogue originals; and 2) digital materials, which may have originated from a digital source but have been printed to paper, e.g. some electronic records. Digital preservation coalition
Digitisation The process of creating digital files by scanning or otherwise converting analogue materials. The resulting digital copy, or digital surrogate, would then be classed as digital material and then subject to the same broad challenges involved in preserving access to it, as “born digital” materials. Digital preservation coalition
Electronic Records See, “Digital records.” Digital preservation coalition
Encoding schema Machine-processable specifications which define the structure and syntax of metadata specifications in a formal schema language TBS Standard On Metadata (Dublin Core Metadata Initiative)
e-Research e-Research – … computationally intensive, large-scale, networked and collaborative forms of research and scholarship across all disciplines, including all of the natural and physical sciences, related applied and technological disciplines, biomedicine, social science and the digital humanities. Association of Research Libraries
e-Research Infrastructure Comprises the ICT assets, facilities and services that support research within institutions and across national innovation systems, and that enable researchers to undertake excellent research and deliver innovation outcomes. Rhys Francis/TC3+
e-Science e-Science – that is, science supported to a significant degree by digital information-processing and/or computational technologies, or wholly based on these. Note that such a definition is functional, not some intrinsic property of the science. Data-based science, that is science which is based wholly or in part on exploiting existing information, is included within this definition. E-Science includes a very broad class of activities, as nearly all information gathering is computer based, or uses information technologies for measuring, recording, reporting, analysing. E-Science often involves intensive use of such technologies: advanced in technique, collaborative or on a large scale (over various possible measures: volumes of information, computational intensity, extent of distribution, variety of information types handled). E-Science can be conducted equally by individuals and small units – in other words, e-Science is equally relevant to small science, and indeed e-Science brings big science within the grasp of the less well-equipped – all you need is a computer. Towards a European e-Infrastructure for e-Science Digital Repositories
eXtensible Markup Language XML – Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere. www.w3.org/XML
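As an illustration of XML used for data exchange, the following sketch parses a small record with Python's standard xml.etree.ElementTree module. The element names (dataset, title, creator, keyword) and their values are invented for the example and do not follow any particular metadata standard.

```python
import xml.etree.ElementTree as ET

# A hypothetical dataset description expressed as XML text.
doc = """<dataset id="ds-001">
  <title>Lake temperature survey</title>
  <creator>Example Lab</creator>
  <keyword>water quality</keyword>
  <keyword>climate</keyword>
</dataset>"""

root = ET.fromstring(doc)

dataset_id = root.get("id")           # attribute on the root element
title = root.findtext("title")        # text of the first <title> child
keywords = [k.text for k in root.findall("keyword")]
```

Because the structure is explicit in the markup, a receiving program can extract each field without knowing anything about the system that produced the record.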
Field A data table column name
Format, Data file Data file formats refers to the variety of organizational structures in which research data are stored in digital format. Preferred formats are those designated by a data repository for which the digital content is maintained. If a data file is not in a preferred format, a data curator will often convert the file into a preferred format, thus ensuring that the digital content remains readable and usable. Usually, preferred formats are the de facto standard employed by a particular community. RDC adapted from Policy-making for Research Data in Repositories: A Guide 2009/TC3+
Format, Human-readable Data and code that are commented so that humans can understand what they represent, their design, and purpose. Wilson G, Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, Haddock SHD, Huff K, Mitchell IM, Plumbley MD, Waugh B, White EP, Wilson P (2012). Best practices for scientific computing, arXiv, 29 November, 1-6.
Format, Machine-readable Data and code that can be easily extracted by computer programs so that the computer is able to use and understand it. PDF documents, for example, are not machine readable. Computers can display the text nicely, but have great difficulty understanding the context that surrounds the text. Open Data 101 (GoC); Other
Granularity The size in which data fields are sub-divided. [Lengthy definition describes coarse, fine and even finer granularity.] DAMA Dictionary of Data Management; Wikipedia
Grid Any distributed infrastructure that is federated to combine resources from multiple organizations managed by different administrative domains. The Grid aims to coordinate the sharing of resources in a dynamic and multi-institutional setting to provide additional functionality beyond its constituent parts: brokering, workflow coordination, integration of computing and storage. In order for this to happen, interoperability and standards need to be defined at various levels: for resource access, for coordination and business logic, for data storage and management, for network access and so forth. European Commission, Advancing Technologies and Federating Communities
Information The aggregation of data to make coherent observations about the world, meaningful data, or data arranged or interpreted in a way to provide meaning. Carol Tenopir (2007); William Hersh (2007). See, Zins (2007)
Information, Confidential Any information obtained by a person on the understanding that they will not disclose it to others, or obtained in circumstances where it is expected that they will not disclose it. For example, the law assumes that whenever people give personal information to health professionals caring for them, it is confidential as long as it remains personally identifiable. See also ‘Personal data’.
Interoperability The structuring of data in such a way that diverse datasets can be integrated. The Open Group TOGAF Documentation
Interoperability The capability to communicate, execute programs, or transfer data among various functional units in a useful and meaningful manner that requires the user to have little or no knowledge of the unique characteristics of those units. Foundational, syntactic, and semantic interoperability are the three necessary aspects of interoperability. TBS Standard On Metadata (Dublin Core Metadata Initiative); DAMA Dictionary of Data Management; ISO/IEC 2382-01, Information Technology Vocabulary, Fundamental Terms
Interoperability, Foundational Foundational interoperability allows data exchange from one information technology system to be received by another and does not require the ability for the receiving information technology system to interpret the data. HIMSS (Healthcare information management and systems society)
Interoperability, Semantic The ability of computer systems to transmit data with unambiguous, shared meaning. Semantic interoperability is a requirement to enable machine computable logic, inferencing, knowledge discovery, and data federation between information systems. Semantic interoperability is achieved when the information transferred has, in its communicated form, all of the meaning required for the receiving system to interpret it correctly, even when the algorithms used by the receiving system are unknown to the sending system. Syntactic interoperability is a pre-requisite to semantic interoperability. Wikipedia
Interoperability, Syntactic Syntactic interoperability defines the structure or format of data exchange and is achieved through tools such as XML or SQL Standards. Wikipedia; HIMSS (Healthcare information management and systems society)
IUPAC international chemical identifier InChI — The IUPAC International Chemical Identifier is a non-proprietary identifier for chemical substances that can be used in printed and electronic data sources thus enabling easier linking of diverse data compilations. MIT data management and publishing
Knowledge The rules and organizing principles gleaned from aggregated data. The internalized or understood information that can be used to make decisions. William Hersh (2007); Carol Tenopir (2007). See, Zins (2007)
Meta data Literally, “data about data”; data that defines and describes the characteristics of other data, used to improve both business and technical understanding of data and data-related processes. Business meta data includes the names and business definitions of subject areas, entities and attributes, attribute data types and other attribute properties, range descriptions, valid domain values and their definitions. Technical meta data includes physical database table and column names, column properties, and the properties of other database objects, including how data is stored. Process meta data is data that defines and describes the characteristics of other system elements (processes, business rules, programs, jobs, tools, etc.). Data stewardship meta data is data about data stewards, stewardship processes and responsibility assignments. DAMA Dictionary of Data Management. For alternative definitions, see also: Data Curation Centre (DCC)/TC3+; TBS Standard on Geospatial Data (Government On-line Metadata Standard); TBS Standard for Electronic Documents and Records Management Solutions; IOC Oceanographic Data Exchange Policy; UNESCO Open Access Policy Guidelines; Environment Canada data stewardship handbook (draft).
Metadata profile, ISO 19115 A metadata profile that specifies the elements and syntax to be used when implementing the international geospatial standard (ISO 19115: 2003) in North America. Environment Canada data stewardship handbook (draft).
Metadata record A collection of data defined by a theme or category, which reflects what is being measured, observed, or monitored at the various sites. The Metadata Record is an information resource of business value. Environment Canada data stewardship handbook (draft).
Metadata, Administrative Used to manage administrative aspects of the digital objects such as intellectual property rights and acquisition. Administrative metadata also documents information concerning the creation, alteration, and version control of the metadata itself. This is sometimes known as meta-metadata. DCC/TC3+
Metadata, Technical Describes the technical processes used to produce, or required to use, a digital object. DCC/TC3+
Metadata, Use Manages user access, user tracking, and multi-versioning information DCC/TC3+
Meta-metadata See, “Metadata, Administrative.”
Method, Scientific Ask the research question, review the relevant scientific literature, collect the data, analyze and interpret the data, communicate the results.
Middleware Computer software that provides services to software applications beyond those available from the operating system. It can be described as “software glue”. Middleware makes it easier for software developers to perform communication and input/output, so they can focus on the specific purpose of their application. RDC Infrastructure Committee; Wikipedia
Migration A means of overcoming technological obsolescence by transferring digital resources from one hardware/software generation to the next. The purpose of migration is to preserve the intellectual content of digital objects and to retain the ability for clients to retrieve, display, and otherwise use them in the face of constantly changing technology. Migration differs from the refreshing of storage media in that it is not always possible to make an exact digital copy or replicate original features and appearance and still maintain the compatibility of the resource with the new generation of technology. Digital preservation coalition
Persistent Identifier PID – A persistent identifier is a long-lasting reference to a digital object that gives information about that object regardless of what happens to it. Developed to address “link rot,” a persistent identifier can be resolved to provide an appropriate representation of an object whether that object changes its online location or goes offline. Australian National Data Service
Persistent Uniform Resource Locator PURL – This is a URL. However, instead of pointing directly to the location of an Internet resource, a PURL points to an intermediate resolution service. The PURL resolution service associates the PURL with the actual URL and returns that URL to the client. MIT data management and publishing
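The indirection described in the PURL entry can be sketched in Python as a simple lookup table. This is a minimal illustration under stated assumptions, not how any real PURL service is implemented: the table, the paths, the `example.org` URLs, and the function names `resolve` and `relocate` are all hypothetical.

```python
# Hypothetical lookup table held by a PURL resolution service:
# persistent path -> current location of the resource.
PURL_TABLE = {
    "/example/dataset-1": "https://data.example.org/archive/2014/dataset-1.csv",
}

def resolve(purl_path):
    """Return the current URL registered for a PURL path, or None if unknown."""
    return PURL_TABLE.get(purl_path)

def relocate(purl_path, new_url):
    """Record that the resource has moved; the PURL itself does not change."""
    PURL_TABLE[purl_path] = new_url

before = resolve("/example/dataset-1")
relocate("/example/dataset-1", "https://repo.example.org/datasets/1")
after = resolve("/example/dataset-1")
print(before)
print(after)
```

Clients keep citing the same PURL path throughout; only the entry held by the resolution service is updated when the resource moves.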
Preservation An activity within archiving in which specific items of data are maintained over time so that they can still be accessed and understood through changes in technology. JISC/TC3+; TBS Information Management Glossary (National Archives of Canada Preservation Policy)
Preservation metadata Documents actions that have been undertaken to preserve a digital resource such as migrations and checksum calculations. DCC/TC3+
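As one possible illustration of the checksum calculations mentioned in the preservation metadata entry, the Python sketch below computes a SHA-256 digest and records the outcome as a small event record. The choice of SHA-256, the field names, and the function `fixity_event` are assumptions made for this example rather than a prescribed schema.

```python
import hashlib
from datetime import datetime, timezone

def sha256_of(data: bytes) -> str:
    """Return the SHA-256 checksum of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def fixity_event(object_id, data, expected=None):
    """Build a hypothetical preservation metadata record for one checksum check."""
    digest = sha256_of(data)
    if expected is None:
        outcome = "recorded"   # first calculation, nothing to compare against
    elif expected == digest:
        outcome = "pass"       # content is unchanged
    else:
        outcome = "fail"       # content differs from what was stored
    return {
        "object": object_id,
        "event": "checksum calculation",
        "algorithm": "SHA-256",
        "checksum": digest,
        "outcome": outcome,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

original = b"abc"
first = fixity_event("obj-1", original)
later = fixity_event("obj-1", original, expected=first["checksum"])
damaged = fixity_event("obj-1", b"abd", expected=first["checksum"])
print(first["outcome"], later["outcome"], damaged["outcome"])
```

Storing the first checksum and repeating the check after each copy or migration gives a simple record of whether the content survived unchanged.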
Preservation, long-term Long-term preservation – Continued access to digital materials, or at least to the information contained in them, indefinitely. Digital preservation coalition
Preservation, medium-term Medium-term preservation – Continued access to digital materials beyond changes in technology for a defined period of time but not indefinitely. Digital preservation coalition
Preservation, short-term Short-term preservation – Access to digital materials either for a defined period of time while use is predicted but which does not extend beyond the foreseeable future and/or until it becomes inaccessible because of changes in technology. Digital preservation coalition
Principal Investigator P.I. – The Principal Investigator has a research leadership role and is the point of contact for a project or partnership that applies the scientific method, historical method, or other research methodology for the advancement of knowledge resulting in independent, objective, high quality, traceable, and reproducible results. The P.I. has primary responsibility for the intellectual direction and integrity of the research or research-related activity, including data production, findings and results, and ensures ethical conduct in all aspects of the research process including but not limited to the treatment of human and animal subjects, conflicts of interest, data acquisition, sharing and ownership, publication practices, responsible authorship, and collaborative research and reporting.

While various tasks may be delegated to team members, some of whom may have greater expertise in specific areas, the P.I. is familiar with the various technical and scientific aspects of a project and how they fit together, is able to identify and remediate gaps, and ensure communication within the team and with users of the research data and results.

The project may be very small involving only a few people (or even only one person – the P.I.), or extremely large involving many groups and multiple P.I.’s and/or co-P.I.’s. Depending on the type of organization (e.g., university, industry, institute, laboratory, government program, etc.) the role of the P.I., how that role fits into the organizational structure, and how it relates to roles within and outside of the organization can vary.

The P.I. needs to be defined because some data standards specifically refer to the P.I. (e.g., NARSTO).
Quality assurance QA – The process or set of processes used to measure and assure the quality of a product
Quality control QC – The process of ensuring that products and services meet consumer expectations.
Quality control, analytical All those processes and procedures designed to ensure that the results of laboratory analysis are consistent, comparable, accurate and within specified limits of precision.
Quality control, project The Principal Investigator and the project team work together to inspect the accomplished work to ensure its alignment with the project scope, data fitness for use, and data end-user needs
Reformatting Copying information content from one storage medium to a different storage medium (media reformatting) or converting from one file format to a different file format (file re-formatting). Digital preservation coalition
Refreshing Copying information content from one storage medium to another of the same type. Digital preservation coalition
Related scientific activities RSA – Complement and extend R&D by contributing to the generation, dissemination, and application of scientific and technological knowledge. (See, also, “Research and Development”) www.science.gc.ca “Annual Science and Technology Data Publication”
Repository Repositories preserve, manage, and provide access to many types of digital materials in a variety of formats. Materials in online repositories are curated to enable search, discovery, and reuse. There must be sufficient control for the digital material to be authentic, reliable, accessible and usable on a continuing basis. ACTI-DM Working Group/Educause; TBS Standard for Electronic Documents and Records Management Solutions
Research Research is the input data, the code, and the full software environment that produced the research results. Buckheit and Donoho 1995; Donoho 2010; Gandrud 2013.
Research and development R&D – Creative work undertaken on a systematic basis to increase the stock of knowledge, including knowledge of humankind, culture and society, and the use of this stock of knowledge to devise new applications. (See, also, “Related scientific activities”) www.science.gc.ca “Annual Science and Technology Data Publication”
Research data management RDM – Data Management refers to the storage, access and preservation of data produced from a given investigation. Data management practices cover the entire lifecycle of the data, from planning the investigation to conducting it, and from backing up data as it is created and used to long term preservation of data deliverables after the research investigation has concluded. Specific activities and issues that fall within the category of Data Management include: File naming (the proper way to name computer files); data quality control and quality assurance; data access; data documentation (including levels of uncertainty); metadata creation and controlled vocabularies; data storage; data archiving & preservation; data sharing and re-use; data integrity; data security; data privacy; data rights; notebook protocols (lab or field).
Research Data Management Infrastructure RDMI – The configuration of staff, services and tools assembled to support data management across the research lifecycle and more specifically to provide comprehensive coverage of the stages making up the data lifecycle. It can be organized locally and/or globally to support research data activities across the research lifecycle. Chuck Humphrey Blog/TC3+
Research Data, Digital Research data which is in digital form. It may have been originally created in digital form, or it may have been converted from paper, or other form to a digital representation.
Research results Research results are the journal articles, reports, books, slideshows, or websites that announce the project’s findings and try to convince us that the results are correct. Mesirov 2010
Re-use Use of content outside of its original intention Open Data 101 (GoC)
Revision control Revision control over time of data, computer code, software, and documents allows for the ability to revert to a previous revision, which is critical for data traceability, tracking edits, and correcting mistakes.
Revision Control System RCS – A software implementation of revision control that automates the storing, retrieval, logging, identification, and merging of revisions (e.g., Git, SVN).
Scientific Data Infrastructure What is required to enable researchers to create, store and share the data resulting from their experiments, and to find, access and process the data they need. European Commission, Advancing Technologies and Federating Communities/TC3+
Signals, Analogue Continuous electronic signals
Signals, Digital Non-continuous electronic signals
Source control See, “Revision control.”
Standard Operating Procedure SOP – Detailed, written instructions to achieve uniformity of the performance of a specific function International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use
The Open Archives Initiative Protocol for Metadata Harvesting OAI-PMH – A low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH is a set of six verbs or services that are invoked within HTTP. oai.org
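The six OAI-PMH verbs mentioned above are Identify, ListMetadataFormats, ListSets, ListIdentifiers, ListRecords, and GetRecord, each sent as a `verb` argument in an HTTP request. The Python sketch below only builds such request URLs and does not contact any server. The base URL and the helper name `build_request` are hypothetical; `oai_dc` is the Dublin Core metadata prefix defined by the protocol.

```python
from urllib.parse import urlencode

# The six verbs defined by OAI-PMH.
OAI_VERBS = (
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
)

def build_request(base_url, verb, **arguments):
    """Build the URL a Service Provider would request from a Data Provider."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"

# Hypothetical repository endpoint, used for illustration only.
BASE_URL = "https://repository.example.org/oai"

identify_url = build_request(BASE_URL, "Identify")
harvest_url = build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc")
print(identify_url)
print(harvest_url)
```

A harvester would fetch these URLs over HTTP and parse the XML responses; that step is left out here to keep the sketch self-contained.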
Total Quality Management TQM – A comprehensive and structured approach to organizational management that seeks to improve the quality of products and services through ongoing refinements in response to continuous feedback.
Trusted Digital Repository TDR – A repository whose mission is to provide its designated community with reliable, long-term access to managed digital resources. Research Libraries Group/Educause
Uniform Resource Identifier URI – A string of characters used to identify or name a resource on the Internet. Such identification enables interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols. MIT data management and publishing
Version control See, “Revision control.”
Versioning See, “Revision control.”