SCI Publications
2024
X. Huang, H. Miao, A. Townsend, K. Champley, J. Tringe, V. Pascucci, P.T. Bremer.
Bimodal Visualization of Industrial X-Ray and Neutron Computed Tomography Data, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
DOI: 10.1109/TVCG.2024.3382607
Advanced manufacturing creates increasingly complex objects with material compositions that are often difficult to characterize by a single modality. Our collaborating domain scientists are going beyond traditional methods by employing both X-ray and neutron computed tomography to obtain complementary representations expected to better resolve material boundaries. However, the use of two modalities creates its own challenges for visualization, requiring either complex adjustments of bimodal transfer functions or the need for multiple views. Together with experts in nondestructive evaluation, we designed a novel interactive bimodal visualization approach to create a combined view of the co-registered X-ray and neutron acquisitions of industrial objects. Using an automatic topological segmentation of the bivariate histogram of X-ray and neutron values as a starting point, the system provides a simple yet effective interface to easily create, explore, and adjust a bimodal visualization. We propose a widget with simple brushing interactions that enables the user to quickly correct the segmented histogram results. Our semiautomated system enables domain experts to intuitively explore large bimodal datasets without the need for either advanced segmentation algorithms or knowledge of visualization techniques. We demonstrate our approach using synthetic examples, industrial phantom objects created to stress bimodal scanning techniques, and real-world objects, and we discuss expert feedback.
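As a concrete illustration of the starting point described above, the following minimal sketch (assuming NumPy and two co-registered volumes held as arrays; all names are illustrative, not the paper's implementation) computes the joint bivariate histogram of X-ray and neutron values on which the topological segmentation operates:

    import numpy as np

    def bivariate_histogram(xray, neutron, bins=256):
        # Bin the paired (X-ray, neutron) values of two co-registered
        # volumes into a joint 2D histogram; segmenting this histogram
        # is the starting point for the bimodal transfer function.
        h, xedges, yedges = np.histogram2d(xray.ravel(), neutron.ravel(), bins=bins)
        return np.log1p(h), xedges, yedges  # log scale keeps sparse modes visible

    # Illustrative use with synthetic, perfectly registered volumes.
    xray = np.random.rand(64, 64, 64)
    neutron = 0.5 * xray + 0.5 * np.random.rand(64, 64, 64)
    hist, _, _ = bivariate_histogram(xray, neutron)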
Z. Li, H. Miao, V. Pascucci, S. Liu.
Visualization Literacy of Multimodal Large Language Models: A Comparative Study, Subtitled arXiv:2407.10996, 2024.
The recent introduction of multimodal large language models (MLLMs) combines the inherent power of large language models (LLMs) with the ability to reason about multimodal context. The potential usage scenarios for MLLMs significantly outpace those of their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and to explain visualization content to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly from a visualization-centric perspective.
In this work, we aim to fill the gap by using the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance on two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT-4o, Claude 3 Opus, Gemini 1.5 Pro) with one another and against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans on certain tasks such as identifying correlations, clusters, and hierarchical structures.
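For readers who want a concrete picture of such an evaluation, here is a minimal scoring harness for VLAT-style multiple-choice items; ask_model is a hypothetical stand-in for whichever MLLM API is queried, and the item fields are illustrative, not the datasets' actual schema:

    def score_vlat(items, ask_model):
        # items: dicts with 'image', 'question', 'options', and the ground
        # truth 'answer'; ask_model maps (image, question, options) to the
        # chosen option. Accuracy is directly comparable to human baselines.
        correct = sum(ask_model(i["image"], i["question"], i["options"]) == i["answer"]
                      for i in items)
        return correct / len(items)

    # Stub "model" that always picks the first option, for illustration only.
    items = [{"image": None, "question": "Which month has the highest value?",
              "options": ["Jan", "Apr", "Jul", "Oct"], "answer": "Jul"}]
    print(score_vlat(items, lambda img, q, opts: opts[0]))  # 0.0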
M. Taufer, H. Martinez, J. Luettgau, L. Whitnah, G. Scorzelli, P. Newel, A. Panta, T. Bremer, D. Fils, C.R. Kirkpatrick, N. McCurdy, V. Pascucci.
Integrating FAIR Digital Objects (FDOs) into the National Science Data Fabric (NSDF) to Revolutionize Dataflows for Scientific Discovery, In Computing in Science & Engineering, IEEE, 2024.
In this perspective paper, we introduce a paradigm-shifting approach that combines the power of FAIR Digital Objects (FDO) with the National Science Data Fabric (NSDF), defining a new era of data accessibility, scientific discovery, and education. Integrating FDOs into the NSDF opens doors to overcoming substantial data access barriers and facilitating the extraction of machine-actionable metadata aligned with FAIR principles. Our augmented NSDF empowers the exchange of massive climate simulations and streamlines materials science workflows. This paper lays the foundation for an inclusive, web-centric, and network-first design, democratizing data access and fostering unprecedented opportunities for research and collaboration within the scientific community.
2023
D. Hoang, H. Bhatia, P. Lindstrom, V. Pascucci.
Progressive Tree-Based Compression of Large-Scale Particle Data, In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1–18. 2023.
DOI: 10.1109/TVCG.2023.3260628
Scientific simulations and observations using particles have been creating large datasets that require effective and efficient data reduction to store, transfer, and analyze. However, current approaches either compress only small data well while being inefficient for large data, or handle large data but with insufficient compression. Toward effective and scalable compression/decompression of particle positions, we introduce new kinds of particle hierarchies and corresponding traversal orders that quickly reduce reconstruction error while being fast and low in memory footprint. Our solution to compression of large-scale particle data is a flexible block-based hierarchy that supports progressive, random-access, and error-driven decoding, where error estimation heuristics can be supplied by the user. For low-level node encoding, we introduce new schemes that effectively compress both uniform and densely structured particle distributions.
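As a rough illustration of the coarse-to-fine idea (not the paper's block-based hierarchy or encoding scheme), the sketch below builds a k-d-style median-split hierarchy over particle positions and emits node summaries breadth-first, so a reader reconstructs a coarse point set first and refines it as more of the stream is consumed; all names and parameters are invented for the example:

    import numpy as np
    from collections import deque

    def progressive_stream(points, min_leaf=8):
        # Emit (centroid, count) node summaries in breadth-first order:
        # early nodes give a coarse proxy of the data, later nodes refine
        # it. Splits alternate axes at the median, k-d-tree style.
        queue = deque([(points, 0)])
        while queue:
            pts, depth = queue.popleft()
            yield pts.mean(axis=0), len(pts)
            if len(pts) > min_leaf:
                axis = depth % pts.shape[1]
                order = np.argsort(pts[:, axis])
                mid = len(pts) // 2
                queue.append((pts[order[:mid]], depth + 1))
                queue.append((pts[order[mid:]], depth + 1))

    pts = np.random.rand(1000, 3)
    coarse = [c for c, n in progressive_stream(pts)][:15]  # first refinements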
S. Leventhal, A. Gyulassy, M. Heimann, V. Pascucci.
Exploring Classification of Topological Priors with Machine Learning for Feature Extraction, In IEEE Transactions on Visualization and Computer Graphics, pp. 1–12. 2023.
In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects allows researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have been focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first, create geometric priors, and then apply machine learning to classify. This approach is empirically motivated since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this paper, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, with similar accuracy, improved execution time, and requiring marginal training data.
Z. Li, S. Liu, K. Bhavya, T. Bremer, V. Pascucci.
Instance-wise Linearization of Neural Network for Model Interpretation, Subtitled arXiv:2310.16295v1, 2023.
Neural networks have achieved remarkable success in many scientific fields. However, the interpretability of neural network models remains a major bottleneck to deploying the technique in daily life. The challenge lies in the non-linear behavior of neural networks, which raises a critical question: how does a model use its input features to make a decision? The classical approach to this challenge is feature attribution, which assigns an importance score to each input feature, revealing its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed inside the model. These attribution approaches therefore raise the concern of whether they highlight the correct features for a model's prediction.
For a neural network model, the non-linear behavior is often caused by its non-linear activation units. However, the computation of a single prediction is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation of a neural network prediction. The approach rewrites the layers of a convolutional neural network as linear matrix multiplications; aggregating the computation of all layers, the operations a complex convolutional neural network performs for one prediction can be described as a single linear expression F(x) = W⋅x + b. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to a prediction. Furthermore, we discuss the application of this technique to both supervised classification and unsupervised neural network learning of parametric t-SNE dimension reduction.
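The local-linearity observation is easy to verify on a toy network. The sketch below (a minimal NumPy example, not the paper's code) freezes the ReLU activation pattern of a two-layer network at one input x and collapses the forward pass into effective weights and bias with F(x) = W_eff·x + b_eff:

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((8, 4)), rng.standard_normal(8)
    W2, b2 = rng.standard_normal((3, 8)), rng.standard_normal(3)

    def forward(x):
        return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2   # two-layer ReLU net

    x = rng.standard_normal(4)
    D = np.diag((W1 @ x + b1 > 0).astype(float))   # activation pattern at x
    W_eff = W2 @ D @ W1                            # collapsed linear weights
    b_eff = W2 @ D @ b1 + b2
    assert np.allclose(forward(x), W_eff @ x + b_eff)
    # W_eff[k, i] states exactly how input feature i contributes to output k at x.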
S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, P.T. Bremer.
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making, Subtitled arXiv preprint arXiv:2312.04494, 2023.
With recent advances in multi-modal foundation models, the previously text-only large language models (LLM) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as virtual visualization assistants for domain experts who may lack the knowledge or expertise in fine-tuning visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choices of appropriate visualization parameters require the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, paving the way for developing expert-level visualization agents in the future.
J. Luettgau, G. Scorzelli, V. Pascucci, M. Taufer.
Development of Large-Scale Scientific Cyberinfrastructure and the Growing Opportunity to Democratize Access to Platforms and Data, In Distributed, Ambient and Pervasive Interactions, Springer Nature Switzerland, pp. 378–389. 2023.
ISBN: 978-3-031-34668-2
DOI: 10.1007/978-3-031-34668-2_25
As researchers across scientific domains rapidly adopt advanced scientific computing methodologies, access to advanced cyberinfrastructure (CI) becomes a critical requirement in scientific discovery. Lowering the entry barriers to CI is a crucial challenge in interdisciplinary sciences requiring frictionless software integration, data sharing from many distributed sites, and access to heterogeneous computing platforms. In this paper, we explore how the challenge is not merely a factor of availability and affordability of computing, network, and storage technologies but rather the result of insufficient interfaces with an increasingly heterogeneous mix of computing technologies and data sources. With more distributed computation and data, scientists, educators, and students must invest their time and effort in coordinating data access and movements, often penalizing their scientific research. Investments in the interfaces’ software stack are necessary to help scientists, educators, and students across domains take advantage of advanced computational methods. To this end, we propose developing a science data fabric as the standard scientific discovery interface that seamlessly manages data dependencies within scientific workflows and CI.
J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer.
Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric, In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023.
DOI: 10.1145/3588195.3595948
The National Science Data Fabric (NSDF) is our solution to the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites and NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, further improving the efficiency of data sharing.
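As a toy model of the coordination described above (illustrative only, not NSDF's actual logic or measurements), candidate entry points can be ranked by estimated transfer time computed from per-site latency and throughput:

    def best_entry_point(size_bytes, sites):
        # sites: {name: (latency_seconds, throughput_bytes_per_second)}
        # Estimated transfer time = latency + size / throughput; pick the
        # minimum. Real placement would also weigh replication and load.
        def est(site):
            lat, tp = sites[site]
            return lat + size_bytes / tp
        return min(sites, key=est)

    sites = {"site-a": (0.050, 1.0e9), "site-b": (0.005, 2.5e8)}
    print(best_entry_point(10 * 2**30, sites))  # a large file favors site-a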
N. Morrical, S. Zellmann, A. Sahistan, P. Shriwise, V. Pascucci.
Attribute-Aware RBFs: Interactive Visualization of Time Series Particle Volumes Using RT Core Range Queries, In IEEE Trans Vis Comput Graph, IEEE, 2023.
DOI: 10.1109/TVCG.2023.3327366
Supplemental material
Smoothed-particle hydrodynamics (SPH) is a mesh-free method used to simulate volumetric media in fluids, astrophysics, and solid mechanics. Visualizing these simulations is problematic because the datasets often contain millions, if not billions, of particles carrying physical attributes and moving over time. Radial basis functions (RBFs) are used to model particles, and overlapping particles are interpolated to reconstruct a high-quality volumetric field; however, this interpolation process is expensive and makes interactive visualization difficult. Existing RBF interpolation schemes do not account for color-mapped attributes and are instead constrained to visualizing just the density field. To address these challenges, we exploit ray tracing cores in modern GPU architectures to accelerate scalar field reconstruction. We use a novel RBF interpolation scheme to integrate per-particle colors and densities, and leverage GPU-parallel tree construction and refitting to quickly update the tree as the simulation animates over time or when the user manipulates particle radii. We also propose a Hilbert reordering scheme to cluster particles together at the leaves of the tree to reduce tree memory consumption. Finally, we reduce the noise of volumetric shadows by adopting a spatiotemporal blue noise sampling scheme. Our method can provide a more detailed and interactive view of these large, volumetric, time-series particle datasets than traditional methods, leading to new insights into these physics simulations.
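To make the attribute-aware idea concrete, here is a minimal NumPy sketch of reconstructing density and a density-weighted color at one sample point from nearby particles; the truncated Gaussian kernel and all names are illustrative assumptions, not the paper's exact RBF scheme or its RT-core acceleration:

    import numpy as np

    def reconstruct(sample, positions, radii, colors):
        # Sum truncated Gaussian RBF contributions from particles near the
        # sample point to get a density, then density-weight the particle
        # colors rather than color-mapping the density field alone.
        d = np.linalg.norm(positions - sample, axis=1)
        w = np.exp(-0.5 * (d / radii) ** 2) * (d < 3.0 * radii)
        density = w.sum()
        color = (w[:, None] * colors).sum(axis=0) / max(density, 1e-12)
        return density, color

    n = 10000
    density, color = reconstruct(np.array([0.5, 0.5, 0.5]),
                                 np.random.rand(n, 3),
                                 np.full(n, 0.02),
                                 np.random.rand(n, 3))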
S. Petruzza, B. Summa, A. Gooch, C.M. Laney, T. Goulden, J. Schreiner, S. Callahan, V. Pascucci.
Interactive Visualization and Portable Image Blending of Massive Aerial Image Mosaics, In IEEE International Conference on Big Data, IEEE, pp. 3365-3370. 2023.
Processing, managing and publishing the substantial volume of data collected through modern remote sensing technologies in a format that is easy for researchers - across broad skill levels and scientific domains - to view and use presents a formidable challenge. As a prime example, the massive scale of image mosaics produced by NEON’s Airborne Observation Platform (AOP), often several to hundreds of gigabytes in volume, demands efficient data management strategies. Additionally, these aerial mosaics frequently exhibit seams due to variations in lighting conditions during the data acquisition process. These seams undermine the integrity of subsequent scientific analyses, introducing distortions that hinder accurate interpretation of ecological patterns. Finally, one of NEON’s core objectives is to make these data broadly accessible to users, including those who are not yet versed in working with remote sensing data or who wish to view the datasets without needing to download and process them.

In response to these challenges, we have developed a comprehensive data management pipeline that enables interactive access for analysis and visualization of NEON’s aerial mosaic collection. This pipeline automates data ingestion, conversion, and publication in a streamable format, facilitating seamless user interaction through web viewers and programming APIs. Moreover, we have implemented a portable blending algorithm aimed at eliminating these problematic seams from large aerial mosaics. This algorithm, grounded in the Conjugate Gradient (CG) method, has been implemented both in CUDA and using the modern SYCL programming model for enhanced portability across diverse computing platforms.

Experimental results demonstrate scalable performance across both CPU and GPU architectures. This work not only addresses the challenges of large aerial data management and seam removal but also opens avenues for more accurate and comprehensive scientific investigations within the NEON ecosystem.
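Since the blending algorithm is grounded in the Conjugate Gradient method, a matrix-free CG solver is its computational core. The sketch below is a generic, minimal implementation (not the paper's CUDA/SYCL code), where apply_A stands in for the blending system's sparse Laplacian so the matrix never has to be materialized:

    import numpy as np

    def conjugate_gradient(apply_A, b, iters=200, tol=1e-8):
        # Matrix-free CG for symmetric positive-definite systems A x = b.
        x = np.zeros_like(b)
        r = b - apply_A(x)
        p = r.copy()
        rs = r @ r
        for _ in range(iters):
            Ap = apply_A(p)
            alpha = rs / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Tiny illustrative system; in the blending setting, apply_A would
    # apply the Laplacian stencil over mosaic pixels.
    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    x = conjugate_gradient(lambda v: A @ v, np.array([1.0, 2.0]))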
N. Zhou, G. Scorzelli, J. Luettgau, R.R. Kancharla, J. Kane, R. Wheeler, B. Croom, B. Newell, V. Pascucci, M. Taufer.
Orchestration of materials science workflows for heterogeneous resources at large scale, In The International Journal of High Performance Computing Applications, Sage, 2023.
In the era of big data, materials science workflows need to handle large-scale data distribution, storage, and computation. Any of these areas can become a performance bottleneck. We present a framework for analyzing internal material structures (e.g., cracks) to mitigate these bottlenecks. We demonstrate the effectiveness of our framework for a workflow performing synchrotron X-ray computed tomography reconstruction and segmentation of a silica-based structure. Our framework provides a cloud-based, cutting-edge solution to challenges such as growing intermediate and output data and heavy resource demands during image reconstruction and segmentation. Specifically, our framework efficiently manages data storage, scaling up compute resources on the cloud. The multi-layer software structure of our framework includes three layers. A top layer uses Jupyter notebooks and serves as the user interface. A middle layer uses Ansible for resource deployment and managing the execution environment. A low layer provides resource management and job scheduling on heterogeneous nodes (i.e., GPU and CPU). At the core of this layer, Kubernetes supports resource management, and Dask enables large-scale job scheduling for heterogeneous resources. The broader impact of our work is four-fold: through our framework, we hide the complexity of the cloud’s software stack from users who would otherwise be required to have expertise in cloud technologies; we manage job scheduling efficiently and in a scalable manner; we enable resource elasticity and workflow orchestration at a large scale; and we facilitate moving the study of nonporous structures, which has wide applications in engineering and scientific fields, to the cloud. While we demonstrate the capability of our framework for a specific materials science application, it can be adapted for other applications and domains because of its modular, multi-layer architecture.
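A minimal illustration of the scheduling layer, using Dask's public distributed API (LocalCluster stands in for the Kubernetes-backed deployment, and segment is a placeholder, not the framework's actual kernel):

    from dask.distributed import Client, LocalCluster

    def segment(slice_id):
        # Placeholder for the per-slice reconstruction/segmentation work.
        return slice_id

    if __name__ == "__main__":
        # Dask fans the slice-level work out across the cluster's workers.
        client = Client(LocalCluster(n_workers=4))
        futures = client.map(segment, range(1024))
        results = client.gather(futures)
        client.close()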
2022
T. M. Athawale, D. Maljovec, L. Yan, C. R. Johnson, V. Pascucci, B. Wang.
Uncertainty Visualization of 2D Morse Complex Ensembles Using Statistical Summary Maps, In IEEE Transactions on Visualization and Computer Graphics, Vol. 28, No. 4, pp. 1955-1966. April, 2022.
ISSN: 1077-2626
DOI: 10.1109/TVCG.2020.3022359
Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Data uncertainty inherent to scalar fields due to randomness in their acquisition and processing, however, limits our understanding of Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arises from scalar fields coupled with data uncertainty. We propose several statistical summary maps as new entities for quantifying structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce three types of statistical summary maps – the probabilistic map, the significance map, and the survival map – to characterize the uncertain behaviors of gradient flows. We demonstrate the utility of our proposed approach using wind, flow, and ocean eddy simulation datasets.
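One plausible reading of such a probabilistic map, sketched here purely for intuition (illustrative NumPy code, not the paper's definition): given an ensemble of per-pixel partition labels, report each pixel's highest label frequency across members.

    import numpy as np

    def probabilistic_map(label_maps):
        # label_maps: (n_members, H, W) integer partition labels, one map
        # per ensemble member. For each pixel, return the frequency of its
        # most common label -- 1.0 where all members agree, lower where
        # the partition boundary is positionally uncertain.
        stack = np.stack(label_maps)
        prob = np.zeros(stack.shape[1:])
        for lbl in np.unique(stack):
            prob = np.maximum(prob, (stack == lbl).mean(axis=0))
        return prob

    maps = [np.random.randint(0, 3, (32, 32)) for _ in range(20)]
    agreement = probabilistic_map(maps)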
PT. Bremer, G. Tourassi, W. Bethel, K. Gaither, V. Pascucci, W. Xu.
Report for the ASCR Workshop on Visualization for Scientific Discovery, Decision-Making, and Communication, DOE, 2022.
Z. Li, S. Liu, X. Yu, K. Bhavya, J. Cao, J. Diffenderfer, P.T. Bremer, V. Pascucci.
“Understanding Robustness Lottery”: A Comparative Visual Analysis of Neural Network Pruning Approaches, Subtitled arXiv preprint arXiv:2206.07918, 2022.
Deep learning approaches have provided state-of-the-art performance in many applications by relying on extremely large and heavily overparameterized neural networks. However, such networks have been shown to be very brittle, to not generalize well to new use cases, and are often difficult if not impossible to deploy on resource-limited platforms. Model pruning, i.e., reducing the size of the network, is a widely adopted strategy that can lead to more robust and generalizable networks -- usually orders of magnitude smaller with the same or even improved performance. While there exist many heuristics for model pruning, our understanding of the pruning process remains limited. Empirical studies show that some heuristics improve performance while others can make models more brittle or have other side effects. This work aims to shed light on how different pruning methods alter the network's internal feature representation and the corresponding impact on model performance. To provide a meaningful comparison and characterization of model feature space, we use three geometric metrics that are decomposed from the commonly adopted classification loss. With these metrics, we design a visualization system to highlight the impact of pruning on model prediction as well as the latent feature embedding. The proposed tool provides an environment for exploring and studying differences among pruning methods and between pruned and original models. By leveraging our visualization, ML researchers can not only identify samples that are fragile to model pruning and data corruption but also obtain insights and explanations on how some pruned …
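For context, the simplest heuristic such a comparison covers is global magnitude pruning, sketched here in NumPy (an illustrative baseline, not one of the paper's specific methods):

    import numpy as np

    def magnitude_prune(weights, sparsity=0.9):
        # Zero out the smallest-magnitude fraction of weights; surviving
        # weights keep their original values.
        threshold = np.quantile(np.abs(weights).ravel(), sparsity)
        return np.where(np.abs(weights) >= threshold, weights, 0.0)

    W = np.random.randn(256, 256)
    W_pruned = magnitude_prune(W, sparsity=0.9)
    print((W_pruned != 0).mean())  # ~0.1 of the weights survive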
Z. Li, H. Menon, K. Mohror, S. Liu, L. Guo, P.T. Bremer, V. Pascucci.
A Visual Comparison of Silent Error Propagation, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2022.
DOI: 10.1109/TVCG.2022.3230636
High-performance computing (HPC) systems play a critical role in facilitating scientific discoveries. Their scale and complexity (e.g., the number of computational units and software stack) continue to grow as new systems are expected to process increasingly more data and reduce computing time. However, with more processing elements, the probability that these systems will experience a random bit-flip error that corrupts a program's output also increases, which is often recognized as silent data corruption. Analyzing the resiliency of HPC applications in extreme-scale computing to silent data corruption is crucial but difficult. An HPC application often contains a large number of computation units that need to be tested, and the error propagation caused by silent data corruption is complex and difficult to interpret. To accommodate this challenge, we propose an interactive visualization system that helps HPC researchers understand the resiliency of HPC applications and compare their error propagation. Our system models an application's error propagation to study a program's resiliency by constructing and visualizing its fault tolerance boundary. Coordinating multiple interactive designs, our system enables domain experts to efficiently explore the complicated spatial and temporal correlation between error propagations. Finally, the system integrates a nonmonotonic error propagation analysis with an adjustable graph propagation visualization to help domain experts examine the details of error propagation and answer questions such as why an error is mitigated or amplified by program execution.
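The underlying fault model is a single silent bit-flip. A minimal injection experiment (illustrative NumPy sketch; program is a stand-in for an HPC kernel, not the paper's tooling) looks like this:

    import numpy as np

    def flip_bit(x, bit):
        # Flip one bit of a float64 scalar to emulate a silent bit-flip.
        as_int = np.float64(x).view(np.int64)
        return (as_int ^ np.int64(1 << bit)).view(np.float64)

    def program(v):
        return np.sum(v * v)  # stand-in for an HPC computation unit

    v = np.linspace(0.0, 1.0, 1000)
    clean = program(v)
    v[500] = flip_bit(v[500], 52)     # corrupt the lowest exponent bit
    print(abs(program(v) - clean))    # did the error propagate to the output?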
Y. Livnat, D. Maljovec, A. Gyulassy, B. Mouginot, V. Pascucci.
A Novel Tree Visualization to Guide Interactive Exploration of Multi-dimensional Topological Hierarchies, Subtitled arXiv preprint arXiv:2208.06952, 2022.
Understanding the response of an output variable to multi-dimensional inputs lies at the heart of many data exploration endeavours. Topology-based methods, in particular Morse theory and persistent homology, provide a useful framework for studying this relationship, as phenomena of interest often appear naturally as fundamental features. The Morse-Smale complex captures a wide range of features by partitioning the domain of a scalar function into piecewise monotonic regions, while persistent homology provides a means to study these features at different scales of simplification. Previous works demonstrated how to compute such a representation and its usefulness for gaining insight into multi-dimensional data. However, exploration of the multi-scale nature of the data was limited to selecting a single simplification threshold from a plot of region count. In this paper, we present a novel tree visualization that provides a concise overview of the entire hierarchy of topological features. The structure of the tree provides initial insights in terms of the distribution, size, and stability of all partitions. We use regression analysis to fit linear models in each partition, and develop local and relative measures to further assess the uniqueness and importance of each partition, especially with respect to parents/children in the feature hierarchy. The expressiveness of the tree visualization becomes apparent when we encode such measures using colors, and the layout allows an unprecedented level of control over feature selection during exploration. For instance, selecting features from multiple scales of the hierarchy enables a more nuanced exploration. Finally, we …
J. Luettgau, C.R. Kirkpatrick, G. Scorzelli, V. Pascucci, G. Tarcea, M. Taufer.
NSDF-Catalog: Lightweight Indexing Service for Democratizing Data Delivering, 2022.
Across domains, massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is often hard, if not impossible, especially for scientists who have not generated the data or who come from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we develop a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata collections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice to: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.
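A lightweight, file-level index entry might carry only location, size, and a fingerprint. The record below is an illustrative schema sketch, not NSDF-Catalog's actual format:

    import json, hashlib, pathlib

    def index_record(path):
        # Minimal, domain-agnostic catalog entry: just enough metadata to
        # locate, size, and fingerprint one object.
        p = pathlib.Path(path)
        return {
            "name": p.name,
            "uri": p.resolve().as_uri(),
            "bytes": p.stat().st_size,
            "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        }

    # Appending one JSON line per object builds a file-level index, e.g.:
    # with open("catalog.jsonl", "a") as f:
    #     f.write(json.dumps(index_record("data.h5")) + "\n")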
N. Morrical, A. Sahistan, U. Güdükbay, I. Wald, V. Pascucci.
Quick Clusters: A GPU-Parallel Partitioning for Efficient Path Tracing of Unstructured Volumetric Grids, 2022.
DOI: 10.13140/RG.2.2.34351.20648
We propose a simple, yet effective method for clustering finite elements in order to improve preprocessing times and rendering performance of unstructured volumetric grids. Rather than building bounding volume hierarchies (BVHs) over individual elements, we sort elements along a Hilbert curve and aggregate neighboring elements together, significantly improving BVH memory consumption. Then, to further reduce memory consumption, we cluster the mesh on the fly into sub-meshes with smaller indices using a series of efficient parallel mesh re-indexing operations. These clusters are then passed to a highly optimized ray tracing API for both point containment queries and ray-cluster intersection testing. Each cluster is assigned a maximum extinction value for adaptive sampling, which we rasterize into non-overlapping view-aligned bins allocated along the ray. These maximum extinction bins are then used to guide the placement of samples along the ray during visualization, significantly reducing the number of samples required and greatly improving overall visualization interactivity. Using our approach, we improve rendering performance over a competitive baseline on the NASA Mars Lander dataset by 6× (1 FPS up to 6 FPS, including volumetric shadows) while simultaneously reducing memory consumption by 3× (33 GB down to 11 GB) and avoiding any offline preprocessing steps, enabling high-quality interactive visualization on consumer graphics cards. By utilizing the full 48 GB of an RTX 8000, we improve the performance of Lander by 17× (1 FPS up to 17 FPS), enabling new possibilities for large data exploration.
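The core trick is sorting elements along a space-filling curve and chopping the sorted order into spatially coherent clusters. The sketch below uses a Morton (Z-order) code as a simpler stand-in for the paper's Hilbert order; all names and sizes are illustrative:

    import numpy as np

    def morton_code(q):
        # Interleave the bits of 10-bit quantized (x, y, z) coordinates.
        code = np.zeros(len(q), dtype=np.uint64)
        for bit in range(10):
            for axis in range(3):
                b = (q[:, axis] >> np.uint64(bit)) & np.uint64(1)
                code |= b << np.uint64(3 * bit + axis)
        return code

    def clusters_along_curve(centroids, cluster_size=64):
        # Sort element centroids (normalized to [0,1]^3) along the curve,
        # then cut the sorted order into fixed-size, spatially coherent
        # groups; each group becomes one aggregated BVH primitive.
        q = np.clip(centroids * 1023.0, 0, 1023).astype(np.uint64)
        order = np.argsort(morton_code(q))
        return [order[i:i + cluster_size]
                for i in range(0, len(order), cluster_size)]

    groups = clusters_along_curve(np.random.rand(100000, 3))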
P. Olaya, J. Luettgau, N. Zhou, J. Lofstead, G. Scorzelli, V. Pascucci, M. Taufer.
NSDF-FUSE: A Testbed for Studying Object Storage via FUSE File Systems, In Proceedings of the 31st International Symposium on High-Performance Parallel and Distributed Computing, Association for Computing Machinery, pp. 277–278. 2022.
ISBN: 9781450391993
DOI: 10.1145/3502181.3533709
This work presents NSDF-FUSE, a testbed for evaluating settings and performance of FUSE-based file systems on top of S3-compatible object storage; the testbed is part of a suite of services from the National Science Data Fabric (NSDF) project (an NSF-funded project that is delivering cyberinfrastructures for data scientists). We demonstrate how NSDF-FUSE can be deployed to evaluate eight different mapping packages that mount S3-compatible object storage to a file system, as well as six data patterns representing different I/O operations on two cloud platforms. NSDF-FUSE is open-source and can be easily extended to run with other software mapping packages and different cloud platforms.