Breakout Summary Report

ARM/ASR User and PI Meeting

Session Title:

Data-driven parameterization development using ARM observations and models

Session Date:

4 March 2025

Session Time:

10:45 AM - 12:45 PM

Number of Attendees:

80

Summary Authors:

Kara Lamb and Nicole Riemer

Breakout Description

Data-driven parameterization development for convection, clouds, and aerosol processes in both process-level atmospheric models and larger-scale Earth System Models has become increasingly popular in recent years. These methods hold significant potential to reduce both structural and parametric uncertainty in physical models and to improve the consistency of the representation of these processes across spatial and temporal scales. In this session, we aim to bring together DOE ARM/ASR researchers who are working on atmospheric model parameterization development from a data-driven perspective, including machine learning, data assimilation, reduced-order methods, causal discovery, and combinations thereof. In particular, we are interested in discussing:

● Recent work and advances related to atmospheric model parameterization development from a data-driven perspective

● Using multi-scale modeling approaches and data-driven methods to develop unified parameterizations of processes and to improve the consistency of parameterizations across spatial and temporal scales

● Physics-informed machine learning to better integrate models and observations and to reduce bias in ML-based parameterizations

● Explainable AI methods and causal discovery for improved process-level understanding

● Ensemble forecasting and uncertainty quantification

● Hybrid physics/machine learning models and Earth System Model development with differentiable dynamical cores

While these methods show significant promise for improving models and our physical understanding of atmospheric processes, several challenges remain before they can be fully exploited to improve models and to better integrate models and observations. For example, integrating ML-based parameterizations into existing models has been challenging. Online performance, when the ML scheme is coupled to and run inside the host model, often does not match offline performance, when the scheme is evaluated in isolation on held-out data. We plan to have a focused discussion on how to overcome these challenges.
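The gap between offline and online performance can be illustrated with a toy example. The following is a minimal sketch under stated assumptions and is not a method presented in the session. A logistic map (`true_step`) stands in for a physical process, and a hypothetical emulator (`emulator_step`) reproduces it with a small systematic error. When the emulator is fed true inputs (offline), its error is tiny. When it is fed its own outputs, as happens once it is coupled into a model (online), the errors feed back and grow. This shows only one mechanism behind the gap.

```python
import numpy as np

# Hypothetical "true" process: a chaotic logistic map standing in for a
# physical parameterization (assumption, for illustration only).
def true_step(x):
    return 3.7 * x * (1.0 - x)

# Hypothetical ML emulator: the true step plus a small systematic error,
# mimicking a trained network with low one-step (offline) error.
def emulator_step(x, bias=1e-3):
    return true_step(x) + bias * x

def offline_error(xs):
    """Mean one-step error when the emulator is fed true inputs."""
    return float(np.mean(np.abs(emulator_step(xs) - true_step(xs))))

def online_error(x0, n_steps):
    """Mean error when the emulator is fed its own outputs (coupled rollout)."""
    x_true, x_emul = x0, x0
    errs = []
    for _ in range(n_steps):
        x_true = true_step(x_true)
        x_emul = emulator_step(x_emul)
        errs.append(abs(x_emul - x_true))
    return float(np.mean(errs))

xs = np.linspace(0.0, 1.0, 101)
print("offline:", offline_error(xs))
print("online: ", online_error(0.2, 200))
```

In this toy setting, the online error is orders of magnitude larger than the offline error. Offline skill alone is therefore a weak predictor of coupled behavior.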

Main Discussion

Our session consisted of two main parts, one focused on improving global-scale models and the other on improving process-level models. For each topic, one overview talk and three shorter lightning talks on recent work motivated the discussion. We then held small-group discussions in which we asked attendees to address the following questions regarding applications of machine learning and other data-driven methods to atmospheric model development:

● What are best practices? What are common challenges and barriers to advancement?

● Are there community tools and benchmark data sets that would be useful to develop?

● Are there future research directions the community can identify and recommend?

● Do data-driven methods actually bring something new to the field, and if so, what?

● How can these methods help to identify ARM observation gaps? How can we make better use of existing ARM observations?

Key Findings

Large-scale models:

● We discussed perturbed parameter ensembles (PPEs) and their growing importance in constraining parameters and identifying structural issues in global models. PPEs have become established over the past few years and were identified as a best practice for evaluating parameters in global models. Calibrated Physics Ensembles (CPEs) are ensembles that have been calibrated against available observations. They can further reduce the parameter space to the region that is physically consistent.

● We also discussed opportunities for data-driven methods to improve data assimilation in global models.

● Simulations like those from SCREAM (the Simple Cloud-Resolving E3SM Atmosphere Model) could help bridge bottom-up and top-down model development. In addition, LASSO (LES ARM Symbiotic Simulation and Observation) simulations could be leveraged more for parameterizing subgrid-scale processes using recent data-driven methods.
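The sketch below shows how a PPE can be narrowed toward a calibrated ensemble. It assumes a Latin hypercube design and history matching, which is one common calibration approach and not necessarily the one used for the CPEs discussed in the session. `latin_hypercube`, `toy_model`, and `implausibility` are hypothetical names. `toy_model` is an invented two-parameter stand-in for an expensive model run. The observation, its variance, and the conventional cutoff of 3 are illustrative assumptions.

```python
import numpy as np

def latin_hypercube(n_members, bounds, rng):
    """Latin hypercube design: one draw per equal-width stratum per parameter."""
    samples = np.empty((n_members, len(bounds)))
    for j, (lo, hi) in enumerate(bounds):
        # One uniform draw inside each of n_members strata, then shuffle strata
        strata = (np.arange(n_members) + rng.uniform(size=n_members)) / n_members
        rng.shuffle(strata)
        samples[:, j] = lo + strata * (hi - lo)
    return samples

# Hypothetical stand-in for a model run: two parameters -> one scalar
# diagnostic (e.g. a cloud fraction). Assumption only.
def toy_model(params):
    a, b = params[:, 0], params[:, 1]
    return 0.5 * a + 0.3 * np.sin(3.0 * b)

def implausibility(model_out, obs, obs_var, model_var):
    """History-matching implausibility: standardized model-observation distance."""
    return np.abs(model_out - obs) / np.sqrt(obs_var + model_var)

rng = np.random.default_rng(0)
bounds = [(0.0, 1.0), (0.0, 1.0)]
ppe = latin_hypercube(200, bounds, rng)   # the perturbed parameter ensemble
outputs = toy_model(ppe)

# Keep members not ruled out by an assumed observation and its uncertainty;
# a cutoff of 3 is a common convention in history matching.
obs, obs_var, model_var = 0.4, 0.02**2, 0.01**2
keep = implausibility(outputs, obs, obs_var, model_var) < 3.0
calibrated = ppe[keep]
print(f"{len(calibrated)} of {len(ppe)} members remain plausible")
```

The observation variance enters the denominator directly. Poorly quantified observational uncertainty therefore translates into a poorly defined calibrated parameter space.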

Process-level models:

● Where we do have high-fidelity models that we trust for specific processes, coarse-graining their output to the resolution of the target model has been identified as a best practice for generating training data.

● For process-level models, we discussed the challenges of creating benchmark data sets. One issue is that many of the problems that are tractable with data-driven methods are still being defined. We also discussed how difficult it is to identify problems that can currently be addressed algorithmically alone. Many of the existing challenges with process-level models require experts in atmospheric observations or models to identify suitable data sets or models to address them. Another issue is that the observations that can inform process-level models typically do not come as homogeneous “big data” sets. Instead, they come from many small, specialized experiments or field studies, and using them requires detailed domain understanding.
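Coarse-graining high-resolution output can be sketched as follows. This is a minimal example assuming a 2-D field on a regular grid whose dimensions divide evenly by the coarsening factor. `coarse_grain` block-averages the fine field. `subgrid_flux` computes the eddy flux <w'q'> = <wq> - <w><q>, which the coarse model cannot resolve and which is a common training target for data-driven parameterizations. The input fields are random stand-ins, not LES or ARM data, and both function names are hypothetical.

```python
import numpy as np

def coarse_grain(field, factor):
    """Block-average a 2-D high-resolution field by an integer factor."""
    ny, nx = field.shape
    if ny % factor or nx % factor:
        raise ValueError("grid dimensions must be divisible by factor")
    blocks = field.reshape(ny // factor, factor, nx // factor, factor)
    return blocks.mean(axis=(1, 3))

def subgrid_flux(w, q, factor):
    """Unresolved flux <w'q'> = <wq> - <w><q> on the coarse grid."""
    return coarse_grain(w * q, factor) - coarse_grain(w, factor) * coarse_grain(q, factor)

rng = np.random.default_rng(42)
# Hypothetical high-resolution fields (e.g. vertical velocity and a tracer
# correlated with it) on a 64x64 grid.
w_hi = rng.normal(size=(64, 64))
q_hi = 0.5 * w_hi + rng.normal(scale=0.1, size=(64, 64))

w_lo = coarse_grain(w_hi, 8)          # resolved input on an 8x8 grid
flux = subgrid_flux(w_hi, q_hi, 8)    # target a parameterization must supply
```

Pairs of coarse inputs such as `w_lo` and targets such as `flux` are what an offline-trained parameterization would learn from.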

Issues

Large-scale models:

● Lack of quantified uncertainty in observations

● Lack of consensus on what is actually important to predict well

● Constraining with observations is still very tricky. One issue is the difference in scale between measurements and model grid cells.

● We still lack cohesive methods to deal with structural uncertainty, and it’s challenging to separate structural and parametric uncertainty.

● Can scale-aware or scale-sensitive data-driven parameterizations be developed?

● How can we match high-fidelity emulators to what global models can do?

● How can we select observations that are appropriate for constraining the parameters of interest?

Process-level models:

● Data availability for process-level models was identified as one barrier. Even when data is available, older data sets can be difficult to contextualize, and observational data is often missing key features. Process-level modeling is also challenging from a data-driven perspective because remote-sensing data available at high spatial and temporal resolution is not well integrated into process-level model development pipelines.

Needs

AI-ready benchmark data sets. Such data sets must go beyond interesting scientific content: they need to be structured, documented, and accessible in ways that support model training, evaluation, and reproducibility.

Consistent model hierarchy. Hierarchical modeling could help reduce the compensating errors that arise when coupled parameterized processes are tuned globally, but tackling structural uncertainty requires a consistent model hierarchy.

Decisions

We discussed choosing one particular process and making it a targeted research focus area for the community. Parameterizing collision-coalescence was highlighted as one process rate where we believe data-driven methods could make substantial progress with focused effort. Focused research on this particular problem could lead to significant advances in the next few years.
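As one simple illustration of learning a process rate from data, the sketch below fits an autoconversion-like rate with least squares in log space. Autoconversion is one component of collision-coalescence in bulk microphysics schemes. The sketch assumes a power-law form A * qc^a * Nc^b, similar in shape to those used in bulk schemes. The "observations" are synthetic, the true coefficients are invented, and `fit_power_law` is a hypothetical helper. A real community effort would draw on field observations or bin/particle-based model output and would likely need a more flexible model than a power law.

```python
import numpy as np

# Synthetic data from an assumed power law with made-up coefficients,
# multiplied by log-normal noise to mimic measurement scatter.
rng = np.random.default_rng(1)
qc = rng.uniform(1e-4, 2e-3, size=500)    # cloud water mixing ratio (kg/kg)
nc = rng.uniform(20.0, 300.0, size=500)   # droplet number concentration (cm^-3)
A_true, a_true, b_true = 1000.0, 2.5, -1.8
rate = A_true * qc**a_true * nc**b_true * rng.lognormal(mean=0.0, sigma=0.1, size=500)

def fit_power_law(qc, nc, rate):
    """Least-squares fit of log(rate) = log(A) + a*log(qc) + b*log(nc)."""
    X = np.column_stack([np.ones_like(qc), np.log(qc), np.log(nc)])
    coef, *_ = np.linalg.lstsq(X, np.log(rate), rcond=None)
    return np.exp(coef[0]), coef[1], coef[2]

A_fit, a_fit, b_fit = fit_power_law(qc, nc, rate)
print(f"A={A_fit:.1f}, a={a_fit:.3f}, b={b_fit:.3f}")
```

Fitting in log space turns the power law into a linear regression. This keeps the example transparent, but it also bakes in the functional form, which is exactly the structural assumption a more data-driven approach would try to relax.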

Future Plans

One point we converged on was focusing on one particular process rate (collision-coalescence was suggested) and making it the target of community research. This effort could include:

● Identifying past data sets that could be homogenized into AI-ready benchmark data sets

● Developing a model hierarchy for parameterizing this process in a structurally consistent manner across scales

● Focusing modeling efforts and field observations on this process

Action Items

Are there opportunities to identify and homogenize data sets that could be useful for constraining specific processes?

Atmospheric Radiation Measurement (ARM) | Reviewed March 2025