Computing Resources

The ARM Data Center offers computing infrastructure to support ARM’s next-generation atmospheric model simulations and petascale data storage, and to provide capabilities for users to conduct big-data analytics and machine learning for atmospheric and climate science research.

The ARM Data Center provides a co-located data and computing platform that enables users to work with large volumes of ARM data without downloading them. ARM’s heterogeneous and flexible computing architecture provides resources for conventional physics-based model simulations, data analysis, and machine learning.

Users with an active ARM account can request access to ARM’s high-performance computing facility, which integrates a range of computing resources and storage systems.

JupyterHub and the ARM Data Workbench

JupyterHub is a popular tool for supporting scientific analysis through notebook-based computational environments. Hosted by the ARM Data Center, JupyterHub provides scalable computing infrastructure designed to meet the needs of a variety of users.

JupyterHub functions as part of the ARM Data Workbench, a new ecosystem for interacting with ARM data. Currently in development, the workbench will provide a set of tools for users to select data, retrieve measurement values, visualize data, perform data analysis, and even create their own data bundles.

Any user with an active ARM account can explore JupyterHub and ARM Data Workbench capabilities. To use parallel processing, save files beyond a single session, and stage data to JupyterHub from Data Discovery, users must request elevated access.

Cumulus Cluster

Cumulus is a midrange Dell system that consists of 16,384 processing cores with a 4-petabyte General Parallel File System (GPFS). It is used by ARM infrastructure staff for Large-Eddy Simulation (LES) ARM Symbiotic Simulation and Observation (LASSO) development and operation, radar data processing, large-scale reprocessing, value-added product generation, and data quality analysis.

The Cumulus cluster is also available to users with an active ARM account to conduct ARM-approved science projects that work with large volumes of ARM data, apply computationally intensive codes to ARM data sets, or analyze LASSO output data sets. Use of the Cumulus cluster for ARM user science projects requires submission of a short proposal to ARM.

Proposals for Use of the Cumulus Cluster

The Cumulus cluster is available to ARM users for their high-performance computing needs, and users can request access to it.

ARM users may request use of the Cumulus cluster by submitting a high-performance computing request (HPCR), a short proposal describing the activities that will make use of the cluster. All investigators who will need access to the computing resources for a given project must be listed on the request as a principal investigator (PI) or co-PI. All PIs and co-PIs must have active ARM user accounts before the HPCR is submitted; a proposal cannot be submitted until all of these accounts have been created.

Proposals to use the ARM high-performance computing resources should focus on science activities that meet one or more of the following criteria:

  • involve volumes of ARM data that would be prohibitively large to download to other computer systems
  • require parallel processing using computationally intensive code applied to ARM data sets
  • analyze voluminous LASSO outputs.

Unless explicitly requested through a special call for proposals, requests that primarily involve running dynamical model simulations and do not meet one or more of the above criteria will not be considered.

Proposals for the use of Cumulus will be reviewed monthly by the ARM Infrastructure Management Board (IMB). Larger efforts may require additional information and undergo a longer scientific peer review. Computational requests must:

  • clearly indicate the relevance of the proposed computational activities to the ARM mission
  • describe the ARM data sets to be analyzed
  • explain how the proposal meets the criteria above
  • explain why ARM computational cluster resources are the most appropriate computational resources for achieving the science goals.

To propose a high-performance computing project, PIs must submit the high-performance computing request (HPCR) form. Each request is included in the next monthly IMB review that occurs after the PI has responded to all requests for clarifying information. Review results and responses to PIs are communicated after the monthly review.

Once a proposal is approved, the ARM Data Center High-Performance Computing team will communicate with PIs to enable access to Cumulus and provide any technical support needed.

Projects that focus primarily on model simulations or data analytics applications, and that neither require large volumes of ARM data nor apply computationally intensive code to the analysis of ARM data, may be more suitable for the U.S. Department of Energy’s (DOE) National Energy Research Scientific Computing Center or other DOE leadership computing user facilities, such as the Oak Ridge National Laboratory Leadership Computing Facility and Argonne National Laboratory Leadership Computing Facility. Allocations for these computational facilities should be requested directly from the individual facilities.

Executing a Project

All project allocations last 1 year and can be extended through an extension request process. For projects that extend beyond 1 year, PIs must submit a status report annually for the duration of the project.

Expectations for High-Performance Computing Users

  1. Code of Conduct. PIs must review and agree to ARM’s Code of Conduct when requesting access to ARM’s high-performance computing resources.
  2. Project Description. PIs must provide a description of their proposed computational project and its science goals. The project description must clearly show use of a large volume of ARM observational data or LASSO output, or the application of computationally intensive codes to ARM data sets. The project description should also include the computational approach, the planned ARM and non-ARM input data volumes, and all output data volumes required.
  3. Status Reports. If the project extends beyond 1 year, PIs must provide annual status reports to ARM.
  4. Data Submission. High-performance computing projects that create new data products should plan to submit their data to the ARM Data Center as a PI data product. PIs should provide a detailed description of the product and data volume storage needs.
  5. Final Report. PIs must submit a final report for the project to ARM within 6 months of project completion.
  6. Acknowledgment of ARM Support. Investigators who receive ARM support for their work should use the following acknowledgment in associated publications:

This research was supported by the Atmospheric Radiation Measurement (ARM) user facility, a U.S. Department of Energy (DOE) Office of Science user facility managed by the Biological and Environmental Research program.
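
The data volumes requested in the project description (item 2 above) can be tallied with a short script before the proposal is written. The following is a minimal sketch only: the data set names, file counts, and file sizes are made-up placeholders rather than real ARM figures, and the three category labels are an assumption chosen to mirror the ARM input, non-ARM input, and output volumes named in item 2.

```python
from dataclasses import dataclass

TB = 10**12  # one decimal terabyte, in bytes

# Category labels are illustrative, mirroring the volumes named in item 2.
KINDS = ("arm_input", "non_arm_input", "output")


@dataclass
class Dataset:
    name: str            # hypothetical label for the data set
    n_files: int         # estimated number of files
    avg_file_bytes: int  # estimated average file size in bytes
    kind: str            # one of KINDS


def summarize(datasets):
    """Return the estimated total volume in TB for each category."""
    totals = {kind: 0 for kind in KINDS}
    for d in datasets:
        if d.kind not in totals:
            raise ValueError(f"unknown kind for {d.name}: {d.kind}")
        totals[d.kind] += d.n_files * d.avg_file_bytes
    return {kind: n_bytes / TB for kind, n_bytes in totals.items()}


# Placeholder estimates, not real ARM data volumes.
plan = [
    Dataset("example_radar_files", 20_000, 250 * 10**6, "arm_input"),
    Dataset("example_reanalysis", 1_000, 2 * 10**9, "non_arm_input"),
    Dataset("example_gridded_product", 5_000, 100 * 10**6, "output"),
]

summary = summarize(plan)
for kind, terabytes in summary.items():
    print(f"{kind}: {terabytes:.2f} TB")
```

With these placeholder numbers the script reports 5.00 TB of ARM input, 2.00 TB of non-ARM input, and 0.50 TB of output. Replacing the entries in `plan` with a project's own estimates gives the three totals to quote in the project description.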

Closing Out a Project

When closing out a project, the PI must submit a brief final report on the outcome to the ARM Field Campaign Administrator to complete the ARM documentation.