Educating the ARM Community: An Introduction to JupyterHub and the Data Workbench

Published: 23 May 2023

Editor’s note: In January 2023, the White House Office of Science and Technology Policy launched the Year of Open Science to advance national open science policies across the federal government. During the year, ARM is publishing a series of stories on work to advance open and equitable research. Max Grover of Argonne National Laboratory and Monica Ihli of Oak Ridge National Laboratory provided the following post.

On April 26, ARM hosted the first webinar in a series to educate the ARM community on how to access and use its new computational resources.

The ARM Data Center has been launching the ARM Data Workbench, an interactive computing environment that can connect to state-of-the-art computing resources so users can work with ARM’s 30 years of climate research data.

Monica Ihli, a software developer at Oak Ridge National Laboratory, and Max Grover, a software developer at Argonne National Laboratory, presented the webinar, which covered Jupyter Notebooks, how to log on to the Data Workbench, and how researchers can use free open-source software installed on the workbench to run their science workflows.

The webinar was recorded and is available on ARM’s YouTube channel. The key points are also summarized in this blog post for those interested!

The Jupyter Notebook: A Community Standard

This sample Jupyter Notebook on the ARM Data Workbench, used as a demo during the JupyterHub webinar, showcases the Python ARM Radar Toolkit (Py-ART).

The webinar opened with an overview of what a Jupyter Notebook is.

Jupyter Notebooks have become the standard interactive computing format within the open science community. Much like the physical notebooks one might use in a classroom, these notebooks contain text notes and equations, but they also include an execution environment that lets users run code right next to their notes. This enables users to document what they are working on and rerun the exact code they used to obtain their scientific results!
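Under the hood, a notebook is saved as a JSON document (the `.ipynb` format) in which each cell is tagged as either prose (Markdown) or code, and code cells also store their outputs. The sketch below builds a heavily simplified, hypothetical notebook as a Python dictionary, loosely following the nbformat version 4 layout, and then separates the notes from the code. Real notebooks carry additional fields (kernel metadata, cell ids, and so on) that are omitted here, and the cell contents are illustrative only.

```python
import json

# A heavily simplified, hypothetical notebook, loosely modeled on the nbformat v4 layout.
# Real .ipynb files also include kernel metadata, cell ids, and other fields.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {},
         "source": ["# Radar exploration\n", "Notes on what this analysis does."]},
        {"cell_type": "code", "metadata": {}, "execution_count": 1,
         "source": ["print('hello, ARM data')"],
         "outputs": [{"output_type": "stream", "name": "stdout",
                      "text": ["hello, ARM data\n"]}]},
    ],
})


def split_cells(raw: str) -> tuple[list[str], list[str]]:
    """Parse notebook JSON and return (notes, code) as lists of joined cell sources."""
    nb = json.loads(raw)
    notes, code = [], []
    for cell in nb["cells"]:
        # Each cell's source is stored as a list of lines; join them back together.
        text = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            notes.append(text)
        elif cell["cell_type"] == "code":
            code.append(text)
    return notes, code


notes, code = split_cells(notebook_json)
print(len(notes), len(code))  # 1 1
print(code[0])                # print('hello, ARM data')
```

Because the notes, the code, and the recorded outputs all live in one file, sharing the notebook shares the full record of an analysis, which is what makes the format useful for reproducibility.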

Rather than having ARM’s software development team build its own custom interactive environment, the ARM Data Center uses free open-source software from the Jupyter ecosystem, including JupyterHub, a multi-user server where people can create and run Jupyter Notebooks. JupyterHub is part of the Data Workbench.

Monica walked attendees through using their ARM login to access JupyterHub. Once on JupyterHub, she opened a Jupyter Notebook and executed a few cells, showcasing how easy it is to start exploring scientific questions related to ARM data on the workbench.

Elevating Your Experience on the Data Workbench

All ARM users can access the Data Workbench, and those who apply for elevated resources gain additional privileges. The key benefits include:

  • persistent project space
  • scalable resources, including more computational power
  • integration with the Data Discovery interface, allowing users to order data to the workbench
  • access to the full archive of ARM data, which includes 30 years of observations

If you would like to apply for this elevated Data Workbench experience, you can do so using these instructions in ARM’s Knowledge Base.

Open-Source Software: Installed and Ready for Science!


ARM supports not only the computational environment and data used in this webinar, but also free open-source software that empowers our scientific community. Two key packages detailed in the webinar are the Python ARM Radar Toolkit (Py-ART), which is focused on analyzing weather radar observations; and the Atmospheric data Community Toolkit (ACT), which helps users work with meteorological time-series observations and provides support for a variety of ARM data sets.

While Max did not give a tutorial on how to use these packages, he did cover how to execute Jupyter Notebooks using these tools on the workbench. The integration of the software, data, and computational resources enables users to easily create and reproduce complex scientific workflows.

The default computational environment on the workbench includes Py-ART, ACT, and a suite of other useful packages. For a full list of the software installed on the system, be sure to look at this Knowledge Base article.
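Before running a workflow, it can be handy to confirm from inside a notebook that the packages it needs are importable in the current environment. The minimal sketch below uses only the Python standard library. It assumes the import names `pyart` (Py-ART) and `act` (ACT), and the `REQUIRED` list is a hypothetical example, not the workbench’s actual package manifest, which is documented in the Knowledge Base article.

```python
import importlib.util

# Hypothetical list of import names a workflow might depend on.
# "pyart" is Py-ART's import name; "act" is ACT's import name.
REQUIRED = ["pyart", "act", "xarray", "matplotlib"]


def check_packages(names: list[str]) -> dict[str, bool]:
    """Map each import name to whether it can be found in this environment."""
    # For top-level names, find_spec returns None (without importing) when absent.
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_packages(REQUIRED)
for name, found in status.items():
    print(f"{name:>10}: {'installed' if found else 'MISSING'}")
```

Using `find_spec` rather than a bare `import` keeps the check lightweight: it only locates each package without executing its code.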