Collaborative Research: OAC Core: Topology-Aware Data Compression for Scientific Analysis and Visualization

Award Number and Duration

NSF OAC 2313124 (University of Utah)
NSF OAC 2313122 (University of Kentucky)
NSF OAC 2313123 (Ohio State University)

September 1, 2023 to August 31, 2026

PI and Point of Contact

Bei Wang Phillips (PI, University of Utah)
Associate Professor
School of Computing and Scientific Computing and Imaging Institute
University of Utah
beiwang AT sci.utah.edu
Home page

Xin Liang (Lead PI, University of Kentucky)
Assistant Professor
Department of Computer Science
University of Kentucky
xliang AT uky.edu
Home page

Hanqi Guo (PI, Ohio State University)
Associate Professor
Department of Computer Science and Engineering
Ohio State University
guo.2154 AT osu.edu
Home page

Overview

Today's large-scale simulations are producing vast amounts of data that are revolutionizing scientific thinking and practices. As the disparity between data generation rates and available I/O bandwidths continues to grow, data storage and movement are becoming significant bottlenecks for extreme-scale scientific simulations in terms of in situ and post hoc analysis and visualization. Such a disparity necessitates data compression, where data produced by simulations are compressed in situ and decompressed in situ and post hoc for analysis and exploration. Meanwhile, topological data analysis plays an important role in extracting insights from scientific data regarding feature definition, extraction, and evaluation. However, most of today’s lossy compressors are topology-agnostic, i.e., they do not guarantee the preservation of topological features essential to scientific discoveries. This project aims to research and develop advanced lossy compression techniques and software that preserve topological features in data for in situ and post hoc analysis and visualization at extreme scales. The data of interest are scalar fields and vector fields that arise from scientific simulations, with driving applications in cosmology, climate, and fusion simulations.

This project has three research thrusts that focus on deriving topological constraints from scalar fields (I) and vector fields (II), and integrating these constraints to develop topology-aware error-controlled and neural compressors (III). Topological descriptors for scalar and vector fields play a dual role for data compression: they provide topological constraints for error-controlled compressors in the form of pointwise error bounds and for neural compressors in the form of topological loss functions. The team will work closely with domain scientists from climate, fusion, and cosmology research communities to significantly enhance the research cyberinfrastructure ecosystem for computational and data-enabled science and engineering.

This project tackles the data compression, analysis, and visualization needs in extreme-scale scientific simulations by developing a suite of topology-aware data compression algorithms. Such algorithms effectively reduce the size of data while preserving critical features defined by topological notions. We will demonstrate that topological features can be authentically preserved in decompressed data by defining and enforcing topology-aware constraints over advanced lossy compression algorithms. Such capabilities have not been studied systematically within today’s data compression paradigm, which is mostly topology-agnostic, and can lead to significant errors in analyzing and visualizing decompressed data using topological techniques. This project will impact specific fields (computational science, data analysis, data compression, and visualization) and the broader scientific community. The software deliverable of this project will significantly advance the research cyberinfrastructure for current and upcoming exascale systems. This project will foster novel discoveries in multiple scientific disciplines beyond cosmology, climate, and fusion by enabling efficient and effective compression on a wide range of platforms.

Broader Impacts

This project brings together application scientists, visualization experts, and compression researchers to advance research and education in advanced cyberinfrastructure. The PIs will integrate the research results into teaching and recruit talented students to participate in collaborative research initiatives with leading domain scientists. The team will broaden the participation of underrepresented groups and K–12 students through ongoing collaborations on university campuses. Workshops will be organized at visualization and high-performance computing conferences for broad dissemination. In particular, data challenges will be integrated within workshops to help onboard members from science and engineering communities to engage in joint developmental efforts.

Publications and Manuscripts

Year 1 (2023 - 2024)
PDF MSz: An Efficient Parallel Algorithm for Correcting Morse-Smale Segmentations in Error-Bounded Lossy Compressors.
Yuxiao Li, Xin Liang, Bei Wang, Yongfeng Qiu, Lin Yan, Hanqi Guo.
IEEE Visualization Conference (IEEE VIS), 2024.
IEEE Transactions on Visualization and Computer Graphics, to appear, 2024.
Supplement 1.
Supplement 2: Detailed figures from Figure 7 and Figure 8 in the main paper.
PDF Preserving Topological Feature with Sign-of-Determinant Predicates in Lossy Compression: A Case Study of Vector Field Critical Points.
Mingze Xia, Sheng Di, Franck Cappello, Pu Jiao, Kai Zhao, Jinyang Liu, Xuan Wu, Xin Liang, Hanqi Guo.
Proceedings of the 40th IEEE International Conference on Data Engineering (IEEE ICDE), 2024.
PDF CliZ: Optimizing Lossy Compression for Climate Datasets with Adaptive Fine-tuned Data Prediction.
Zizhe Jian, Sheng Di, Jinyang Liu, Kai Zhao, Xin Liang, Haiying Xu, Robert Underwood, Jiajun Huang, Shixun Wu, Zizhong Chen, Franck Cappello.
Proceedings of the 38th IEEE International Parallel and Distributed Processing Symposium (IEEE IPDPS), 2024.
PDF A General Framework for Augmenting Lossy Compressors with Topological Guarantees.
Nathan Gorski, Xin Liang, Hanqi Guo, Lin Yan, and Bei Wang.
Manuscript, 2024.

Presentations, Educational Development and Broader Impacts

Year 1 (2023 - 2024)
  1. Bei Wang, Invited Talk, Topology-Preserving Data Compression, Department of Mathematics, Technical University of Munich (TUM), Germany, July 19, 2024.

  2. Bei Wang, Invited Talk, Topology in Data Visualization and Topology-Preserving Data Compression, Mathematical Methods in Data Science, Math Lab Talk and Discussion Series, MPI for Mathematics in the Sciences, Leipzig, Germany, June 21, 2024.

Students

Nathaniel Gorski (U of Utah, CS PhD student)
Dhruv Meduri (U of Utah, CS PhD student)

Acknowledgement

This material is based upon work supported or partially supported by the National Science Foundation under Grant No. 2313124.

Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Web page last update: September 4, 2024.