SLAC extends and centralizes IT infrastructure to prepare for future data challenges


Newswise – A computing facility at the Department of Energy’s SLAC National Accelerator Laboratory is doubling in size, preparing the lab for new science initiatives that promise to revolutionize our understanding of the world, from the atomic scale to the cosmic scale, but also require handling unprecedented data flows.

When SLAC’s superconducting X-ray laser comes online, for example, it will eventually produce data at a dizzying rate of one terabyte per second. And the world’s largest digital camera for astronomy, under construction at SLAC for the Vera C. Rubin Observatory, will eventually capture 20 terabytes of data every night.

“The new IT infrastructure will be up to these challenges and more,” said Amedeo Perazzo, who leads the Controls and Data Systems division within the lab’s Technology Innovation Branch. “We are embracing some of the latest and greatest technologies to create computing capabilities for all of SLAC for years to come.”

The Stanford University-led construction adds a second building to the existing Stanford Research Computing Facility (SRCF). SLAC will become one of the major tenants of SRCF-II, a modern data center designed to operate 24 hours a day, 7 days a week, without service interruption and with data integrity at its core. SRCF-II will double the current data center capacity, for a total of 6 megawatts of electrical power.

“Computing is a core skill for a science-driven organization like SLAC,” said Adeyemi Adesanya, who heads the scientific computing systems department within Perazzo’s division. “I am thrilled to see our vision of an integrated computing facility come to life. This is a necessity for analyzing large-scale data, and it will also pave the way for new initiatives.”

A hub for SLAC Big Data

Adesanya’s team is preparing to set up hardware for the SLAC Shared Science Data Facility (S3DF), which will fit into the SRCF-II. It will become a computing hub for all the data-intensive experiments performed in the lab.

Above all, it will benefit future users of LCLS-II, the upgraded Linac Coherent Light Source (LCLS) X-ray laser that will produce 8,000 times more pulses per second than the first-generation machine. Researchers hope to use LCLS-II to gain new insights into atomic processes that are fundamental to some of the most pressing challenges of our time, including the chemistry of clean energy technologies, molecular drug design, and the development of materials and quantum devices.

But with these new capabilities come tough computing challenges, said Jana Thayer, LCLS Data Systems division manager. “To get the best scientific results and get the most out of their time at LCLS-II, users will need rapid feedback – within minutes – on the quality of their data,” she said. “To do this, with an X-ray laser producing thousands of times more data every second than its predecessor, we need the petaflops of computing power that S3DF will provide.”

Another problem researchers will have to deal with is that LCLS-II will amass too much data to store it all. The new data facility will run an innovative data reduction pipeline that will discard unnecessary data before it is saved for analysis.
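The idea behind such a pipeline can be illustrated with a toy example. The sketch below is not the actual LCLS-II software; it is a minimal, made-up "veto" stage that drops detector frames whose total signal falls below a threshold, so that empty shots never reach storage. The frame data and the threshold value are invented for illustration.

```python
# Illustrative data-reduction sketch (not the real LCLS-II pipeline):
# discard detector frames with too little signal before they are saved.

def reduce_frames(frames, threshold=100):
    """Keep only frames whose total signal is worth storing."""
    kept = [f for f in frames if sum(f) >= threshold]
    discarded = len(frames) - len(kept)
    return kept, discarded

# Three mock "frames": an empty shot, a weak hit, and a strong hit.
frames = [
    [0, 1, 0, 2],         # empty shot, total signal 3 -> vetoed
    [30, 40, 20, 15],     # weak hit, total signal 105 -> kept
    [200, 180, 150, 90],  # strong hit, total signal 620 -> kept
]
kept, discarded = reduce_frames(frames)
print(len(kept), discarded)  # → 2 1
```

At a terabyte per second, even a simple veto like this can cut storage needs dramatically, since many X-ray shots miss the sample entirely.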

Another demanding computational technique that will benefit from the new infrastructure is cryogenic electron microscopy (cryo-EM) of biomolecules, such as proteins, RNA or virus particles. In this method, scientists take pictures of how an electron beam interacts with a sample containing the biomolecules. They sometimes have to analyze millions of images to reconstruct the three-dimensional molecular structure in near-atomic detail. The researchers also hope to visualize molecular components within cells, not just biochemically purified molecules, at high resolution in the future.

The complex image reconstruction process requires a lot of CPU and GPU power and involves elaborate machine learning algorithms. Doing these calculations at S3DF will bring new opportunities, said Wah Chiu, head of the Stanford-SLAC Cryo-EM Center.

“I really hope that S3DF will become an intellectual hub for computer science, where experts come together to write code that allows us to visualize increasingly complex biological systems,” Chiu said. “There is great potential for discovering new structural states of molecules and organelles in normal and diseased cells at SLAC.”

In fact, everyone in the lab will be able to use the computing resources available. Other potential “clients” include SLAC’s instrument for ultrafast electron diffraction (MeV-UED), the Stanford Synchrotron Radiation Lightsource (SSRL), the lab-wide machine learning initiative, and applications in accelerator science. In total, the S3DF will be able to support 80% of SLAC’s computing needs, while 20% of the most demanding scientific computations will be performed in offsite supercomputer facilities.

Several services under one roof

SRCF-II will host two other major data facilities.

One of them is the Rubin Observatory’s US Data Facility (USDF). In a few years, the observatory will begin taking images of the southern night sky from a mountaintop in Chile using its SLAC-built 3,200-megapixel camera. For the Legacy Survey of Space and Time (LSST), it will take two images every 37 seconds for 10 years. The resulting information could hold answers to some of the biggest questions about our universe, including what exactly is accelerating its expansion, but that information will be contained in a 60-petabyte catalog that researchers will have to sift through. The resulting image archive will be around 300 petabytes in size, dominating storage usage in SRCF-II. The USDF, along with two other centers in the UK and France, will handle the production of the huge catalog of data.
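A quick back-of-envelope check shows how the cadence quoted above adds up. The numbers of observing hours per night and bytes per pixel below are illustrative assumptions, not official survey parameters; the point is simply that two 3,200-megapixel images every 37 seconds lands in the same ballpark as the 20 terabytes per night cited earlier, once calibration data and metadata are added.

```python
# Back-of-envelope estimate of Rubin/LSST raw pixel data per night.
# Assumed (illustrative only): ~10 observing hours per night,
# 2 bytes per raw pixel.

PIXELS_PER_IMAGE = 3200e6        # 3,200-megapixel camera
BYTES_PER_PIXEL = 2              # assumed 16-bit raw pixels
NIGHT_SECONDS = 10 * 3600        # assumed ~10 hours of observing
IMAGES_PER_NIGHT = 2 * NIGHT_SECONDS / 37   # two images every 37 s

raw_tb_per_night = IMAGES_PER_NIGHT * PIXELS_PER_IMAGE * BYTES_PER_PIXEL / 1e12
print(round(raw_tb_per_night, 1))  # → 12.5 (terabytes of raw pixels)
```

Roughly 2,000 images and a dozen terabytes of raw pixels per night, before overheads, is consistent with the order of magnitude of the quoted figures.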

A third data center will serve SLAC’s first-generation X-ray laser user community. The existing IT infrastructure for LCLS data analysis will gradually transition to SRCF-II and become a much larger system there.

Although each data center has specific needs in terms of technical specifications, they all rely on a core of shared services: data must always be transferred, stored, analyzed and managed. Working closely with Stanford, the Rubin Observatory, LCLS and other partners, Perazzo’s and Adesanya’s teams are implementing all three systems.

For Adesanya, this unified approach – which includes a cost model that will help pay for future upgrades and growth – is a dream come true. “Historically, computing at SLAC was highly distributed and each facility would have its own specialized system,” he said. “The new, more centralized approach will help drive new lab-wide initiatives, such as machine learning, and by breaking down silos and converging into an integrated data facility, we’re building something that’s more capable than the sum of everything we had before.”

The construction of the SRCF-II is a Stanford project. Much of the S3DF infrastructure is funded by the Department of Energy’s Office of Science. LCLS and SSRL are Office of Science user facilities. The Rubin Observatory is a joint initiative of the National Science Foundation (NSF) and the Office of Science. Its main mission is to carry out the Legacy Survey of Space and Time, providing an unprecedented set of data for scientific research supported by both agencies. Rubin is jointly operated by NSF’s NOIRLab and SLAC. NOIRLab is operated for the NSF by the Association of Universities for Research in Astronomy, and SLAC is operated for the DOE by Stanford. The Stanford-SLAC Cryo-EM Center (S2C2) is supported by the National Institutes of Health (NIH) Common Fund’s Transformative High-Resolution Cryo-Electron Microscopy program.

SLAC is a dynamic, multi-program laboratory that explores how the universe works at the largest, smallest, and fastest scales and invents powerful tools used by scientists around the world. With research spanning particle physics, astrophysics and cosmology, materials, chemistry, biological and energy sciences, and scientific computing, we help solve real-world problems and advance the interests of the nation.

SLAC is operated by Stanford University for the U.S. Department of Energy’s Office of Science. The Office of Science is the largest supporter of basic physical science research in the United States and works to address some of the most pressing challenges of our time.
