ARM Data Services Completes Successful Fiscal Year 2024

 
Published: 26 November 2024

Editor’s note: This is an update from ARM Chief Data and Computing Officer Giri Prakash.

Giri Prakash in front of the Cumulus high-performance computing cluster. Photo is from Oak Ridge National Laboratory.

I am pleased to share an overview of ARM Data Services’ significant accomplishments during fiscal year 2024 (FY2024). First and foremost, I extend my gratitude to the ARM staff and the broader scientific community for their invaluable support in advancing our development, engineering, and operational initiatives.

The ARM Data Center team demonstrated exceptional responsiveness by addressing over 1,100 tickets from external users and ARM staff, with most issues resolved within two to five days. This reflects our commitment to providing efficient and reliable support to scientists, principal investigators, and operational staff worldwide. The ARM data archive now hosts over 7 petabytes of data, offering a robust foundation of observations to support cutting-edge atmospheric science.

The Data Discovery interface, a key tool for accessing ARM’s extensive data holdings, now includes improved search accuracy and integration with external repositories, making it easier for researchers to find and access the data sets they need. ARM also enhanced metadata management by incorporating automation and machine learning, improving the accuracy and scope of metadata recommendations. These developments significantly enhance the discoverability and usability of the more than 8,300 datastreams available to the community.

To further support researchers, ARM Data Services introduced a new calibration system for tracking instrument records and performance, along with a redesigned Data Quality Problem Report tool that offers enhanced search functionality and integration with other data systems. We will announce when the calibration system is available to users. A modernized Field Campaign Dashboard now supports mobile and aerial campaigns with new features, including calendar views, mapping layers, and integration of external data from NASA and NOAA. These improvements have been instrumental in streamlining field campaign data access and operational workflows for both science and operations teams.

This screenshot from the ARM Field Campaign Dashboard features a map of instrument deployments and imagery from around the Bankhead National Forest atmospheric observatory in northern Alabama.

ARM Data Services is advancing the ARM Data Workbench, an innovative platform designed to integrate 32 years of ARM data, robust computing resources, and an open-source software stack. This effort focuses on removing barriers to data access, enabling seamless interaction with ARM data and external sources. The workbench provides a collaborative, dynamic environment for data analysis and machine learning, using tools such as JupyterHub. By prioritizing FAIR (Findable, Accessible, Interoperable, and Reusable) principles, the Data Workbench ensures that researchers can easily explore and analyze data in a flexible and efficient manner.

The first phase of the Data Workbench, launched in 2022–2023, allows users to discover and stage data on ARM’s computing resources while leveraging Python-based tools such as Jupyter Notebooks. Building on this foundation, the next phase will introduce features such as automatic data staging; an intuitive user dashboard; and Data Studio, a comprehensive data analysis platform powered by ARM’s open-source libraries, including the Atmospheric data Community Toolkit (ACT), Python ARM Radar Toolkit (Py-ART), and ARM Data Integrator (ADI). Seamlessly integrated with tools such as the Data Discovery interface and ARM’s data submission systems, the Data Workbench will create a unified and user-friendly experience, positioning ARM as a leader in collaborative, data-driven atmospheric research.

Our community high-performance computing (HPC) cluster, Cumulus, underwent significant upgrades during FY2024. With over 16,000 computing cores, Cumulus enables “data proximity computing” for fast, parallel processing in large-scale data analyses. Throughout the year, it supported more than 20 HPC projects, advancing computational research for the atmospheric science community. Researchers are encouraged to submit HPC proposals via the form available on the ARM website to leverage this powerful resource.

ARM Data Services also made substantial progress in reprocessing historical data. The ARM Data Center reprocessing team managed new tasks while efficiently addressing a backlog of older jobs. The majority of FY2024 reprocessing tasks are now complete, with all reprocessed data available to users through Data Discovery.

ARM Data Services’ achievements in FY2024 reflect a steadfast commitment to supporting the research community with reliable data resources, advanced tools, and responsive user services. We will continue to collaborate with stakeholders and researchers worldwide to ensure ARM remains a vital resource for advancing atmospheric science.

For support and feedback, users are encouraged to submit requests through the Ask Us option (also in the footer of each ARM.gov web page) or via the Feedback tab in Data Discovery.

Thank you for your continued partnership, and we look forward to supporting your research in FY2025.