New journal article details ARM data process, Data Discovery redesign; ARM hits 3 petabytes of archived data
A new article in the journal Earth Science Informatics tells the story of the Atmospheric Radiation Measurement (ARM) user facility’s constantly growing trove of data in the ARM Data Center. The article also details the recent redevelopment of ARM’s Data Discovery, where users can access and order ARM data online.
The authors are ARM Data Center staff Kavya Guntupally, Kyle Dumas, Giri Prakash, Ranjeet Devarakonda, Wade Darnell, Maggie Davis, and Richard Cederwall.
The ARM Data Center, based at Oak Ridge National Laboratory in Tennessee, receives, processes, and delivers data to the scientific community. Data have been collected with sophisticated instruments at ARM observatories and during field campaigns around the world in every climate zone.
In April 2021, ARM reached 3 petabytes of archived data. After 29 years of continuous operation, the ARM Data Center holds more than 11,000 data products.
All ARM data are free to registered users worldwide. In 2020, the ARM Data Center delivered almost 400 terabytes of data to users in 29 countries.
From Raw to Discoverable Data
Did you know? In 2021, the U.S. Department of Energy Office of Science designated ARM as one of its Public Reusable Research (PuRe) Data Resources.
The article in Earth Science Informatics, a Springer Nature journal, details the steps ARM takes to archive data and make them accessible to users.
Site data systems at all ARM sites securely transmit data from instruments in the field to the ARM Data Center, usually in real time. (Sometimes the data center receives data shipped or hand-delivered on hard drives.) As part of ARM’s data ingestion process, detailed metadata are created for operational and discovery uses.
ARM converts most data to a standard format, the Network Common Data Form (netCDF). Once cataloged and stored in the ARM Data Center, data are made accessible to users via the Data Discovery interface.
Both raw and processed data are archived. The complex data flow includes several quality checks.
Data are often processed, quality-analyzed, and used to create value-added products. These products typically use measurements from multiple instruments to compute geophysical variables that are not measured directly.
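As a rough illustration of how a value-added product can derive a variable that no instrument measures directly, the sketch below combines temperature readings from one instrument with relative humidity readings from another to estimate dew point using the Magnus approximation. This is not an actual ARM product or ARM code: the function names, the sample readings, and the choice of Magnus coefficients are assumptions made for the example.

```python
import math

# Magnus coefficients (one commonly used set, valid over water for typical
# surface temperatures). The choice of this set is an assumption of the sketch.
MAGNUS_A = 17.62
MAGNUS_B = 243.12  # degrees Celsius


def dew_point_c(temp_c: float, rel_humidity_pct: float) -> float:
    """Estimate dew point (deg C) from air temperature and relative humidity."""
    if not 0.0 < rel_humidity_pct <= 100.0:
        raise ValueError("relative humidity must be in (0, 100]")
    gamma = math.log(rel_humidity_pct / 100.0) + MAGNUS_A * temp_c / (MAGNUS_B + temp_c)
    return MAGNUS_B * gamma / (MAGNUS_A - gamma)


def derive_dew_point(temps: dict, humidities: dict) -> dict:
    """Compute dew point only at timestamps that both instruments report."""
    shared = sorted(set(temps) & set(humidities))
    return {t: dew_point_c(temps[t], humidities[t]) for t in shared}


# Hypothetical readings keyed by timestamp, one dict per instrument.
thermometer = {"00:00": 20.0, "00:01": 20.5, "00:02": 21.0}
hygrometer = {"00:00": 50.0, "00:02": 100.0}

derived = derive_dew_point(thermometer, hygrometer)
```

Here the derived series contains only the two timestamps reported by both instruments. At 100 percent relative humidity the estimated dew point equals the air temperature, which is a quick sanity check on the formula.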
Revamping Data Discovery
ARM’s Data Discovery interface provides powerful search functions that rely on metadata to find and deliver the precise data that users want.
To make ARM’s data even more accessible to the observational and modeling communities, the ARM Data Center recently redesigned the workflow for the Data Discovery system. During this process, the ARM Data Center incorporated user requirements and stakeholder recommendations. The new Data Discovery went live in the spring of 2020.
The new Earth Science Informatics article explains the technologies and practices used in the redesign process and the extensive search capabilities now provided by Data Discovery. Search capabilities include a keyword search, guided search for new users, and location search using a mapping tool.
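To give a sense of what metadata-driven search means in practice, here is a minimal sketch of a keyword search and a bounding-box location search over a tiny in-memory catalog. It does not reflect how Data Discovery is actually implemented: the record fields, the catalog entries, and the function names are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """A hypothetical metadata record describing one data product."""
    name: str
    description: str
    lat: float
    lon: float


# Made-up catalog entries for illustration only.
CATALOG = [
    Record("example_radar_a", "cloud radar reflectivity profiles", 36.6, -97.5),
    Record("example_met_b", "surface temperature and humidity", 71.3, -156.6),
    Record("example_lidar_c", "aerosol and cloud lidar backscatter", -12.4, 130.9),
]


def keyword_search(catalog, keyword):
    """Return records whose name or description contains the keyword."""
    needle = keyword.lower()
    return [
        r for r in catalog
        if needle in r.name.lower() or needle in r.description.lower()
    ]


def location_search(catalog, south, north, west, east):
    """Return records inside a latitude/longitude bounding box.

    This simple version does not handle boxes that cross the antimeridian.
    """
    return [
        r for r in catalog
        if south <= r.lat <= north and west <= r.lon <= east
    ]


cloud_hits = [r.name for r in keyword_search(CATALOG, "cloud")]
box_hits = [r.name for r in location_search(CATALOG, 30.0, 40.0, -100.0, -90.0)]
```

In this sketch, the keyword "cloud" matches the radar and lidar records, while the bounding box matches only the radar record. The point is that both searches read only the metadata, never the data files themselves.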
The article describes other new Data Discovery features, such as pop-up pages when a user clicks to view data details on the search results screen. These data details pages provide resources such as the data timeline and quality, data plotting, primary measurements, and citations in different style formats.
Readers of the article will also learn about ARM’s process for determining recommended data for core measurements, which helps meet user needs.
ARM Data Center staff report that the new Data Discovery has received largely positive feedback, indicating that it has significantly improved the user experience and led to more data downloads. The journal article lists some additional planned upgrades based on user feedback.
# # #
ARM is a DOE Office of Science user facility operated by nine DOE national laboratories, including Oak Ridge National Laboratory.