New methods for extracting more detail from existing data sets

 

Submitter:

Isaacman-VanWertz, Gabriel — Virginia Polytechnic Institute and State University

Area of research:

Aerosol Properties

Journal Reference:

Kim S, L Yee, A Goldstein, and G Isaacman-VanWertz. 2025. "Systematic characterization of unknown compounds via dimensionality reduction of time series." Aerosol Science and Technology, 10.1080/02786826.2024.2445634.

Science

Detailed data on the composition of the atmosphere are often very complex, containing thousands of chemicals with unknown identities or properties. By developing new automated tools for analyzing certain types of data, this research will substantially improve researchers' ability to make sense of these data and extract new details about the composition of the atmosphere.

Impact

Many data sets collected at atmospheric field sites are only partly analyzed, with attention limited to a specific subset of interest, because fully analyzing all of the information has been too time- or labor-intensive. By providing a new tool for automated analysis of certain types of data sets, this advance will give researchers substantially more data to answer a broad range of questions about atmospheric processes.

Summary

One approach to understanding in detail what is in the air is gas chromatography, which separates the complex mixture of chemicals in the air into its component parts. Knowing which chemicals are present provides information about the sources of aerosols and air pollutants and the processes that transform them. Because these data are so complex, researchers often focus on specific target chemicals that provide known information, and much of the data is never examined due to time and labor limitations. This research uses advanced techniques, including machine learning, to automatically analyze and organize these data, making the process faster and more efficient. The method was tested on samples collected at the ARM site in Manacapuru, Brazil, during the GoAmazon2014/15 field campaign. Using automated methods with limited operator effort, time series of over 400 unique chemicals were generated, characterized by chemical properties, and grouped into categories. These data will improve understanding of atmospheric processes at this field site by expanding the amount of detail known, and the method will significantly improve data processing and the amount of detailed information available at future field sites.
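To give a rough sense of what grouping chemical time series into categories can look like, the sketch below groups series whose concentrations rise and fall together. It is only an illustration under stated assumptions, not the method of the cited paper: the use of Pearson correlation, the greedy grouping rule, the 0.9 threshold, the function names, and the compound names and values are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # a constant series carries no trend to compare
    return cov / (spread_x * spread_y)


def group_by_correlation(series, threshold=0.9):
    """Greedy grouping (an assumption made for illustration).

    Each series joins the first existing group whose first member it
    correlates with at or above `threshold`; otherwise it starts a new group.
    """
    groups = []
    for name, values in series.items():
        for group in groups:
            seed = group[0]
            if pearson(values, series[seed]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical time series (arbitrary units), not real GoAmazon2014/15 data.
series = {
    "compound_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "compound_b": [2.1, 4.0, 6.2, 7.9, 10.0],
    "compound_c": [5.0, 4.0, 3.0, 2.0, 1.0],
    "compound_d": [9.8, 8.1, 6.0, 4.1, 2.0],
}

groups = group_by_correlation(series)
print(groups)
```

In this toy example the two rising series end up in one group and the two falling series in another. A real workflow would work with hundreds of series and would likely use a more robust dimensionality-reduction or clustering technique.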