ARM Achieves New Data Milestone

Published: 20 March 2023

ARM hits 4 petabytes and keeps adding to its collection

A green arrow arching slightly to the right illustrates the growth of ARM data from the start of collection in 1992 to 1 terabyte in 1996, 15 terabytes in 2000, 200 terabytes in 2010, 2 petabytes in 2020, and 4 petabytes in 2023.
From its first bytes of data in 1992 to its first terabyte in 1996 to 4 petabytes in early 2023, ARM continues to reach new milestones in the amount of atmospheric data it has collected. The ARM Data Center expects to have almost 6 petabytes by the end of 2023. Graphic is courtesy of Giri Prakash, Oak Ridge National Laboratory.

To call 4 petabytes a lot of data sounds like an understatement. On February 17, 2023, the Atmospheric Radiation Measurement (ARM) user facility passed this milestone, having collected 4 petabytes’ worth of continuous data since 1992.

To put that gargantuan amount of data in perspective, some estimates say 4 petabytes is approximately equivalent to 2 trillion pages of standard printed text. An average DVD holds slightly more than 4 gigabytes of information. The 4 petabytes of data in ARM’s collection can be thought of as the amount of data that would fit on 1 million DVDs.
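
The DVD comparison is simple division, and a short sketch makes the assumptions visible. It assumes decimal units (1 petabyte = 10^15 bytes, 1 gigabyte = 10^9 bytes) and shows two disc sizes: the rounded 4 gigabytes that yields the 1 million figure, and the 4.7-gigabyte nominal capacity of a single-layer DVD, which is not stated in the article. The function and variable names are illustrative only.

```python
PB = 10**15  # bytes in a petabyte (decimal units, an assumption)
GB = 10**9   # bytes in a gigabyte (decimal units, an assumption)

def dvds_needed(total_bytes, dvd_bytes):
    """Number of whole discs needed to hold total_bytes (ceiling division)."""
    return -(-total_bytes // dvd_bytes)

archive = 4 * PB

# Rounded capacity used for the article's comparison
print(dvds_needed(archive, 4 * GB))          # 1000000

# Nominal single-layer DVD capacity (4.7 GB, an assumption not in the article)
print(dvds_needed(archive, 4_700_000_000))   # 851064

# Bytes per page implied by the "2 trillion pages" estimate
bytes_per_page = archive // (2 * 10**12)
print(bytes_per_page)                        # 2000
```

With the larger nominal capacity the count drops to roughly 850,000 discs, so "1 million DVDs" is best read as an order-of-magnitude comparison.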

The data are accumulating at speed.

In December 2016, over 24 years after ARM began collecting atmospheric data, the ARM Data Center passed 1 petabyte. The total doubled to 2 petabytes by March 2020. It took a little over a year to reach the 3-petabyte mark in April 2021. Almost two years later, the data total surpassed 4 petabytes.
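
To make the pace concrete, the intervals between those milestones can be worked out from the dates given above. This is a rough sketch at month-level resolution: it uses only the month and year reported for each petabyte mark, assumes 1 petabyte = 1,000 terabytes, and treats each petabyte as accumulating evenly over its interval, which is a simplification.

```python
# (petabytes, year, month) milestones as reported in the article
MILESTONES = [(1, 2016, 12), (2, 2020, 3), (3, 2021, 4), (4, 2023, 2)]

def months_between(earlier, later):
    """Whole calendar months from one milestone to the next."""
    _, y0, m0 = earlier
    _, y1, m1 = later
    return (y1 - y0) * 12 + (m1 - m0)

intervals = [months_between(a, b) for a, b in zip(MILESTONES, MILESTONES[1:])]

for (pb, _, _), months in zip(MILESTONES[1:], intervals):
    rate_tb = 1000 / months  # average terabytes per month over the interval
    print(f"petabyte {pb}: {months} months, about {rate_tb:.0f} TB per month")
```

The intervals come out to 39, 13, and 22 months. Every petabyte after the first arrived far faster than the 24-plus years the first one took, though the gaps between milestones are not uniform.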

Giri Prakash, who manages the ARM Data Center at Oak Ridge National Laboratory in Tennessee, attributes the recent exponential growth to increases in three main factors: the number of ARM field campaigns, ARM’s arsenal of high-resolution observational instruments, and output from the Large-Eddy Simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) activity.

The LASSO high-resolution modeling activity has produced simulations of shallow and deep convection over ARM sites. Later in 2023, ARM plans to release the full LASSO deep convection data set from the Cloud, Aerosol, and Complex Terrain Interactions (CACTI) field campaign in Argentina.

“We are currently processing over 1.5 petabytes that will be archived in the next few months,” says Prakash. “By the end of the year, we expect to have close to 6 petabytes of data.”

In fiscal year 2022 (October 2021 through September 2022), ARM delivered 238 terabytes—almost a quarter of a petabyte—of data to users worldwide. All ARM data are freely available online through ARM Data Discovery.

# # #


ARM is a DOE Office of Science user facility operated by nine DOE national laboratories.