Details

DATA-M

DATA-M is a service provided by Prometheus to support the Configurable Data Curation System (CDCS) ecosystem. The CDCS is a modular web framework developed over the past several years at the National Institute of Standards and Technology (NIST), initially to manage and operate scientific data.


The CDCS has been developed around the FAIR data principles: the framework provides the tools to build platforms that follow these principles by design.


The application is actively maintained and has been forked many times by other organizations. The CDCS is open source, and several applications built on it are available on GitHub, such as the Materials Data Curation System (MDCS): https://github.com/usnistgov/MDCS.

Plugin System

The CDCS has been engineered with flexibility and extensibility in mind. For that reason, it uses a plugin system so that customizations and enhancements can be added easily and maintained over time.

The CDCS has more than 50 plugins already available from the PyPI package repository. Here are some of the main features of the framework:





Available Plugins

Here is a list of the five most used plugins:


Data Repository

Users have data in different formats, in multiple places, using different vocabularies. The goal of the CDCS is to provide an effective research data lifecycle that supports FAIR Data principles (Findable, Accessible, Interoperable, Reusable).

Data repositories let you create or reuse a data model for your domain and start collaborating on the curation of datasets (with rich and structured metadata). Whether you curate data daily or want to share finalized datasets with the community (with persistent identifiers for findability and referencing), data repositories let you build and customize a system that meets your needs. Authorized users can explore the curated data using full text search or custom search forms generated from the data model. All features are available through the web UI, and they can also be scripted through the REST API to automate curation and data retrieval.
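As an illustration of scripting against the REST API, the sketch below builds an authenticated keyword query using only the Python standard library. It is a minimal sketch under stated assumptions: the base URL, the endpoint path, the `{"query": ...}` payload shape and the use of HTTP Basic authentication are all hypothetical placeholders, not the documented CDCS API; check the API documentation of your own instance for the real paths and parameters.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: substitute your instance URL and the endpoint
# your CDCS deployment actually documents.
BASE_URL = "https://cdcs.example.org"
QUERY_PATH = "/rest/data/query/keyword/"  # hypothetical path


def build_keyword_query(base_url: str, path: str, keywords: str,
                        user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a keyword query.

    The JSON payload shape and Basic authentication are assumptions.
    """
    body = json.dumps({"query": keywords}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def run_query(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes the same script easy to test offline and to reuse in a scheduled curation job, e.g. `run_query(build_keyword_query(BASE_URL, QUERY_PATH, "alloy", "alice", "secret"))`.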


Data Registry

Data registries are specialized data repositories focused on the discovery of resources. Registries come with a predefined data model that lets users provide extensive metadata about the resources managed by their organization. Registries can share data with each other by joining a network of trusted registries, publishing their own records to that network and harvesting records from it.

What is a Resource?

Why was it built?


What are the main features of a registry?


Use a standardized metadata model to explore information about a dataset in a searchable way

Data Flow

The CDCS can be used in many domains thanks to its dynamic data models. These data models (XML Schemas) can be defined either by a single user or by a community of users willing to work in a standardized way. Once a model is set, (meta)data (XML) uploaded by authorized users is validated against that template and indexed in the database of your choice. In addition to these (meta)data documents, the CDCS can also store files of any type (PDF, images, text, ...). Users can then retrieve information through the querying endpoints (full text search, search by field) and download the datasets they need.
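The flow above (validate, store, index, query) can be sketched as a toy in-memory repository. This is an illustrative sketch, not the CDCS implementation: the Python standard library cannot validate against a full XML Schema, so the hypothetical `TEMPLATE_REQUIRED_PATHS` list stands in for the template and only checks that required elements are present. A real deployment would validate against the complete XSD, for example with `lxml.etree.XMLSchema`, and would index into an actual database.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical stand-in for an XML Schema template: element paths that every
# uploaded document must contain. Real validation uses the full XSD.
TEMPLATE_REQUIRED_PATHS = ["title", "material/composition", "measurement/temperature"]


class MiniRepository:
    """Toy model of the data flow: validate -> store -> index -> query."""

    def __init__(self, required_paths):
        self.required_paths = required_paths
        self.documents = {}                      # doc_id -> parsed XML root
        self.inverted_index = defaultdict(set)   # lowercase word -> doc ids

    def upload(self, doc_id: str, xml_text: str) -> None:
        root = ET.fromstring(xml_text)  # raises ET.ParseError on malformed XML
        missing = [p for p in self.required_paths if root.find(p) is None]
        if missing:
            raise ValueError(f"{doc_id} rejected, missing elements: {missing}")
        self.documents[doc_id] = root
        # Index every word of every element's text for full text search.
        for element in root.iter():
            for word in (element.text or "").lower().split():
                self.inverted_index[word].add(doc_id)

    def full_text_search(self, word: str) -> set:
        return set(self.inverted_index.get(word.lower(), set()))

    def search_by_field(self, path: str, value: str) -> set:
        hits = set()
        for doc_id, root in self.documents.items():
            node = root.find(path)
            if node is not None and (node.text or "").strip() == value:
                hits.add(doc_id)
        return hits
```

Rejected documents never reach the index, which mirrors the idea that only (meta)data conforming to the template becomes searchable.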

Data Harvesting

The CDCS framework supports several types of search across multiple instances. The OAI-PMH protocol implemented in registries lets organizations connect to other registry instances and harvest their information for fast and efficient searches.
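To make the harvesting step concrete, here is a minimal OAI-PMH client sketch. It issues `ListRecords` requests and follows `resumptionToken` values until the list is exhausted, which is how the OAI-PMH 2.0 protocol pages through large result sets. The base URL used in practice is deployment-specific (the one in the usage note is hypothetical), and the `fetch` parameter is injectable so the harvesting logic can be exercised without network access. This sketch only collects record identifiers; a real harvester would also store the metadata payloads.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def http_fetch(url: str) -> str:
    """Default fetcher: GET the URL and return the response body as text."""
    with urllib.request.urlopen(url, timeout=60) as response:
        return response.read().decode("utf-8")


def harvest_identifiers(base_url: str, metadata_prefix: str = "oai_dc",
                        fetch: Callable[[str], str] = http_fetch) -> Iterator[str]:
    """Yield record identifiers from ListRecords, following resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urllib.parse.urlencode(params)))
        error = root.find("oai:error", OAI_NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # an empty result is not a failure
            raise RuntimeError(f"OAI-PMH error {error.get('code')}: {error.text}")
        for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
            yield header.findtext("oai:identifier", namespaces=OAI_NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty token, marks the last page
        # resumptionToken is an exclusive argument: resend only verb + token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Usage against a hypothetical registry endpoint: `for identifier in harvest_identifiers("https://registry.example.org/oai"): print(identifier)`.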

Federated Search

A CDCS instance can be configured to grant access to some or all hosted data on the system to other CDCS instances. This allows users of an instance to broadcast queries to a set of systems and federate results in one place.
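The broadcast-and-merge idea can be sketched in a few lines: send the same query to every configured instance in parallel, tag each hit with the instance it came from, and keep going if one instance fails. This is a generic sketch, not the CDCS federated search implementation; the instance names are hypothetical, and each search function stands in for a call to a remote instance's API using whatever access that instance has granted.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# Hypothetical: a search function takes a query and returns result titles.
SearchFn = Callable[[str], List[str]]


def federated_search(query: str, instances: Dict[str, SearchFn]
                     ) -> Tuple[List[Tuple[str, str]], Dict[str, str]]:
    """Broadcast a query to all instances and merge the results.

    Returns (instance, hit) pairs in instance order, plus a map of
    instance name -> error description for instances that failed.
    """
    merged: List[Tuple[str, str]] = []
    errors: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(instances))) as pool:
        futures = {name: pool.submit(search, query) for name, search in instances.items()}
        for name, future in futures.items():
            try:
                merged.extend((name, hit) for hit in future.result())
            except Exception as exc:  # one failing instance must not sink the query
                errors[name] = repr(exc)
    return merged, errors
```

Reporting failures separately, instead of raising, lets the user still see results from the instances that answered; a production version would also bound how long it waits for slow instances.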