
Breakout Summary Report

ARM/ASR User and PI Meeting

Session Title:

Advancing Machine Learning and Artificial Intelligence Applications with ARM Data and ASR Research

Session Date:

4 March 2025

Session Time:

2:00 PM - 4:00 PM

Number of Attendees:

60

Summary Authors:

Jingjing Tian, Maria Zawadowicz, Robert Jackson, Lishan Li, and Adam Theisen

Breakout Description

This session aims to bring together the ARM/ASR research community to explore how machine learning (ML) and artificial intelligence (AI) can enhance the utilization of data to advance atmospheric science research objectives in addition to advancing ARM’s products and processes. ML/AI methodologies are increasingly recognized as transformative tools for analyzing field observations, building data-driven models, creating surrogates for existing models, and gaining new insights into atmospheric processes.

Aligned with ARM/ASR objectives, this session focuses on using ML/AI to improve the representation of physical processes, extract insights from ARM data, and integrate observations with simulations. Additionally, this session will highlight applications of AI/ML in ARM that improve the quality of data and/or the user experience.

This session aims to: (1) highlight recent innovations in ML/AI techniques for observational analysis and process-scale modeling, with applications to ARM datasets and broader atmospheric science; (2) share lessons learned, advancements, and strategies for integrating ML/AI into observational and modeling workflows; and (3) strengthen connections within the ARM/ASR and atmospheric science communities to advance the development and application of ML/AI techniques.

Expected outcomes: (1) A summary of ongoing AI applications and their impact on ARM data analysis. (2) Community-driven recommendations to advance the adoption and integration of AI techniques within ARM. (3) A prioritized list of challenges and opportunities to guide future AI-related efforts.

Main Discussion

The session consisted of two parts: Part I, AI for ARM Operations, and Part II, AI-Driven Data Analysis and Scientific Applications.

Part I covered three key topics, each introduced with a presentation followed by a discussion. To encourage engagement, discussion questions were shared in advance, and after each presentation participants had the opportunity to ask questions and share their perspectives.

Topic 1: AI for Data Quality (presented by Mia Li, OU). Discussion question: How can AI streamline ARM operations and improve data quality?

Topic 2: ARM AI/ML Value-Added Products (VAPs) (presented by Maxwell Levin and Erol Cromwell, PNNL). Discussion questions: Do researchers prefer ML-based data products or those derived from traditional methods? What factors influence this choice? What data standards should ML VAPs follow?

Topic 3: An overview of AI/ML applications and ongoing efforts at the ARM Data Center (presented by Hannah R. Collier, ORNL).

In Part II, we invited three speakers, Bhupendra Raut (ANL), Carl Schmitt (University of Alaska Fairbanks), and Christine Chiu (Colorado State University), to share how they use ARM data in their research. Along with presenting their findings, they prepared slides to discuss lessons learned or opportunities for applying AI to ARM data.

After the presentations and discussions in Parts I and II, we left time to discuss a further question: How can the ARM/ASR community better share ML/AI tools, workflows, and best practices (e.g., through repositories, workshops, and documentation) to enhance collaboration?

Key Findings

AI/ML is shaping how ARM processes data, from improving data quality control to advancing scientific applications.

The ARM Data Quality (DQ) Office has developed the Iterative Error-Driven Ensemble Labeling (IEDEL) algorithm, a semi-supervised learning approach that improves data quality control by labeling only a small portion of the data to train the model. Initially applied to 1D time series data, this method is now being expanded to image-based datasets using computer vision techniques, allowing for better detection of data spikes and anomalies.
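The general idea of training from a small labeled subset can be sketched with generic self-training (pseudo-labeling). This is not the IEDEL algorithm, whose details are not given in this summary; it has no ensemble or error-driven step. The function names, the rolling-median deviation feature, the `margin` parameter, and the toy series are all assumptions made for illustration.

```python
from statistics import median


def deviation_scores(series, window=5):
    """Absolute deviation of each point from the median of its local window."""
    half = window // 2
    scores = []
    for i, value in enumerate(series):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        scores.append(abs(value - median(series[lo:hi])))
    return scores


def self_train_spike_labels(series, seed_labels, margin=0.25, max_rounds=10):
    """Generic self-training sketch (hypothetical, not IEDEL).

    seed_labels maps a few indices to 0 (good) or 1 (spike). Each round fits
    a nearest-centroid rule on the labeled scores and pseudo-labels only the
    unlabeled points whose decision is confident by at least `margin`.
    """
    labels = dict(seed_labels)
    if not {0, 1} <= set(labels.values()):
        raise ValueError("seed_labels must contain both classes")
    scores = deviation_scores(series)
    for _ in range(max_rounds):
        good = [scores[i] for i, y in labels.items() if y == 0]
        bad = [scores[i] for i, y in labels.items() if y == 1]
        c_good, c_bad = sum(good) / len(good), sum(bad) / len(bad)
        gap = abs(c_bad - c_good)
        added = 0
        for i, score in enumerate(scores):
            if i in labels:
                continue
            d_good, d_bad = abs(score - c_good), abs(score - c_bad)
            # Accept a pseudo-label only when one centroid is clearly closer.
            if abs(d_good - d_bad) >= margin * gap:
                labels[i] = 0 if d_good < d_bad else 1
                added += 1
        if added == 0:  # nothing confident left to label
            break
    return labels


# Toy series with two spikes; only three points are labeled by hand.
series = [10.0, 10.1, 9.9, 10.2, 25.0, 10.0, 9.8, 10.1, -5.0, 10.0, 10.2, 9.9]
labels = self_train_spike_labels(series, seed_labels={0: 0, 1: 0, 4: 1})
spike_indices = sorted(i for i, y in labels.items() if y == 1)
```

Here the second spike (index 8) is flagged although only the first was labeled by hand. A production approach would need richer features and a validation step before pseudo-labels are trusted.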

At the same time, ML-powered ARM Value-Added Products (VAPs) are being developed, including merged aerosol size distribution and cloud phase classification. While these models perform well in predicting data quality, challenges remain in generalizing across different campaigns, handling class imbalance, and dealing with missing inputs. To ensure reliability and usability, there is a growing need for standardized guidelines for ML-based VAPs. Participants in discussions expressed interest in using these products but emphasized the importance of clear technical documentation.
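One common way to handle the class imbalance mentioned above is inverse-frequency class weighting, so that rare classes contribute more to the training loss. The sketch below is illustrative only: the cloud-phase label names and counts are made up, and the summary does not state which technique the ARM VAPs actually use.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    This is the widely used "balanced" heuristic: a class that makes up
    exactly 1/n_classes of the data gets weight 1.0, rarer classes get more.
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily imbalanced cloud-phase labels (assumed for illustration).
labels = ["liquid"] * 6 + ["ice"] * 3 + ["mixed"] * 1
weights = inverse_frequency_weights(labels)
```

The resulting weights can be passed to a loss function or a classifier that accepts per-class weights. Weighting does not address the other challenges listed, such as missing inputs or generalization across campaigns.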

The ARM Data Center also uses language processing models and supervised ML to improve metadata classification, user-specific recommendations, and metadata generation. The Data Center team found that data cleaning significantly improves results.

Beyond operations, AI/ML is being applied to ARM datasets in scientific research. Convolutional neural network (CNN)-based classification of ARM camera images has shown promising results, though challenges such as out-of-focus images remain. This approach could also be extended to other ARM instruments, including the Multi-Angle Snowflake Camera (MASC), the Precipitation Imaging Package (PIP), and cloud probes.

Meanwhile, self-supervised learning (SSL) was presented as a method for learning patterns from unlabeled data, making it especially valuable for large ARM datasets. By reducing the need for time-consuming manual annotation, SSL improves AI model training efficiency and has shown promising results on Total Sky Imager (TSI) images.

ML is also being explored for emulating cloud microphysical processes and 3D radiative transfer. While ML has proven valuable in identifying key factors in these processes, the effectiveness of these models depends heavily on careful input selection and a clear understanding of uncertainties in both training data and predictions. Ensuring high-quality inputs remains crucial for making these ML-driven approaches reliable for broader scientific applications.

Issues

N/A

Needs

(1) Develop standardized ML workflows for ARM VAPs with clear guidelines and documentation to ensure consistency, usability, and transparency, potentially including the publication of ML models, training data, and evaluation metrics to enhance reproducibility and trust.

(2) Improve communication and resource-sharing to keep the community updated on AI/ML advancements in ARM/ASR research.

Decisions

N/A

Future Plans

N/A

Action Items

A survey may be needed to collect opinions on the following activities:

(1) Organize regular meetings to discuss ongoing projects, challenges, and advancements in AI/ML applications using ARM data.

(2) Create a platform for this community to share and discuss recent publications on AI/ML applications relevant to ARM and ASR science.

(3) Plan a hands-on event, such as a boot camp or hackathon, to provide practical experience applying AI/ML techniques to ARM data.


Atmospheric Radiation Measurement (ARM) | Reviewed March 2025