Types of data produced
EPSCoR RII4 will generate a broad array of data including: remote sensing, GIS, and other geospatial data products; experimental data including tabular data; maps and photos; model/simulation data; human subject survey data; analytic and visualization tools; and learning modules. Remote sensing, embedded sensor networks, laboratory instrumentation, and human subject survey instruments including interviews and web-based surveys will capture data. To the extent possible, we will also use data available from State/Federal agencies and prior EPSCoR projects, most of which are available from or will be incorporated into the NM Resource Geographic Information System (RGIS, http://rgis.unm.edu), NM’s geospatially enabled data clearinghouse developed and maintained by UNM's Earth Data Analysis Center (EDAC). New and related data collected by EPSCoR researchers will be used to analyze change (e.g., resource availability, human attitudes) over time. With the exception of remotely sensed and long-term instrumentation data, most data sets collected by EPSCoR will be small (< 10 MB). Currently, we anticipate that no more than 25 TB of data will be collected as part of the project, with the larger data volumes associated with related geospatial data products, videos, and learning modules.
Data and metadata standards
Most of the data collected (by volume) will be geospatial in nature (e.g., GIS and remotely sensed data) and will be in either vector or raster format. These data will be documented using tools that support the ISO 19115 metadata standard. Some of the data collected will be textual in nature and will be saved as text, MS Word, and PDF documents (e.g., surveys, experimental notes). Any tabular data collected will be captured in spreadsheets or data tables and saved in .csv files for long-term accessibility; we plan to use the DataUp tool, available through DataONE, which converts Excel spreadsheets into preservation-ready products and enables the creation of associated metadata. The metadata management system developed with support from the current EPSCoR project consists of an internal database schema that stores metadata-related information, with REST-based web services for delivery of metadata in a variety of standard formats: ISO 19115, FGDC CSDGM, Dublin Core, and WaterML. In addition to ISO 19115, core metadata formats targeted by the project will include Dublin Core for textual resources and Ecological Metadata Language (EML) for environmental data. Other more specific standards that apply to discipline-specific data will be employed as necessary to conform to community norms. Standards will be selected because they are community-based, widely used, and particularly relevant to the types of data and information that will be collected as part of EPSCoR RII4. We expect the standards landscape to evolve over the five-year life span of the project and will adapt accordingly.
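As an illustration of what a Dublin Core description of a textual or tabular resource might look like, the following is a minimal Python sketch, not the project's metadata management system. The Dublin Core element namespace is the standard one; the wrapper element name `metadata`, the function name `dublin_core_record`, and all field values are hypothetical placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# Standard namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(fields):
    """Return an XML string with one dc:<name> element per supplied value.

    `fields` maps a Dublin Core element name (e.g. "title") to either a
    single string or a list of strings for repeatable elements.
    """
    # Hypothetical wrapper element; a real service would use whatever
    # container its delivery format requires.
    root = ET.Element("metadata")
    for name, values in fields.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
            child.text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder values for illustration only; not an actual project record.
example = dublin_core_record({
    "title": "Example survey notes",
    "creator": "NM EPSCoR researchers",
    "format": "text/csv",
    "rights": "CC BY",
    "subject": ["water", "survey"],
})
print(example)
```

Keeping the record as a plain mapping of element names to values makes it straightforward to emit the same information in another format later, which is the general idea behind storing metadata once and delivering it in several standards.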
Policies for access and sharing
Data will be made available through several mechanisms. First, during the data and information gathering portion of the project, most data will be available to all project participants via a password-controlled website that houses the virtual lab notebook, copies of non-copyrighted materials, drafts of working white papers and publications, and other data and information generated during the course of the project. The two exceptions are survey data that contain names of human subjects and data and information related to inventions that are to be patented.
Climate and water data that are collected automatically by State and Federal agencies will be made available as soon after collection as the agency allows. Other data collected through this project may be embargoed for a period of up to one year to allow time for publication by students and researchers; any exceptions to this embargo period must be approved by the Project Director in writing. Data and information will be provided free of charge and will be easily discoverable and obtainable via the EPSCoR data portal. The research will be conducted in full compliance with both federal and University regulations, with all components of the applicable Federalwide Assurance (FWA) for the protection of human subjects in research, and with all other applicable requirements. UNM investigators participating in the survey and experimental protocols will have completed training in Human Subjects Protections prior to the beginning of the research.
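The one-year embargo limit described above can be expressed as a simple date check. The following Python sketch is illustrative only: the plan does not state when the embargo clock starts, so the sketch assumes it starts on the collection date, and the names `MAX_EMBARGO_YEARS`, `latest_release_date`, and `is_embargo_expired` are hypothetical. It does not model agency-collected data or exceptions approved by the Project Director.

```python
from datetime import date

# From the plan: data may be embargoed for up to one year.
MAX_EMBARGO_YEARS = 1


def latest_release_date(collected: date) -> date:
    """Latest date by which a data set must be released.

    Assumption: the embargo period is counted from the collection date.
    """
    target_year = collected.year + MAX_EMBARGO_YEARS
    try:
        return collected.replace(year=target_year)
    except ValueError:
        # Collected on 29 February and the target year is not a leap year:
        # fall back to 28 February (an assumption made for this sketch).
        return collected.replace(year=target_year, day=28)


def is_embargo_expired(collected: date, today: date) -> bool:
    """True once the maximum embargo period has fully elapsed."""
    return today >= latest_release_date(collected)


print(latest_release_date(date(2014, 3, 15)))
print(is_embargo_expired(date(2014, 3, 15), date(2014, 12, 1)))
```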
Policies for re-use, redistribution
We plan to adopt the Creative Commons CC BY licensing scheme. This license lets others distribute, remix, tweak, and build upon our work, even commercially, as long as they credit NM EPSCoR researchers for the original creation. This is the most accommodating of licenses offered and is recommended for maximum dissemination and use of licensed materials while also providing appropriate attribution to data creators.
Intended or foreseeable users of the data include other researchers, business and industry, governmental and nongovernmental organizations, and educators and students.
Plans for archiving
The data and information collected through EPSCoR activities will be maintained, curated, and archived in the NM Resource Geographic Information System (RGIS) that is associated with UNM’s Earth Data Analysis Center (EDAC), which is also a DataONE Member Node. EDAC's RGIS has well-defined procedures for backup of multiple copies and preservation, including replication at other DataONE Member Nodes. For long-term archival storage and preservation, we will add EPSCoR data products to LoboVault (UNM's institutional repository). Research data and metadata stored in RGIS will be available and accessible for a minimum of five years after the completion of the project. Key data such as GIS data layers and associated ISO 19115-compliant metadata will be preserved for the long term after transformation to generic, preservation-ready formats.
NM EPSCoR contributes its data to the DataONE network as a Member Node. Through participation in this network, whose mission is to enhance search and discovery of Earth and environmental data, NM EPSCoR can reach a wider audience and maintain high availability of its data.