Publications

Page 9 of 144

SCI Publications

2023

E. R. Hurd, M. Han, J. K. Mendes, J. R. Hadley, C. R. Johnson, E. V. R. DiBella, J. N. Oshinski, L. H. Timmins. “Comparison of Prospective and Retrospective Gated 4D Flow Cardiac MR Image Acquisitions in the Carotid Bifurcation,” In Cardiovascular Engineering and Technology, Vol. 14, No. 1, Springer Nature, pp. 1--12. Feb, 2023.
ISSN: 1869-408X
DOI: 10.1007/s13239-022-00630-6

ABSTRACT

Purpose: To evaluate the agreement of 4D flow cMRI-derived bulk flow features and fluid (blood) velocities in the carotid bifurcation using prospective and retrospective gating techniques.

Methods: Prospective and retrospective ECG-gated three-dimensional (3D) cine phase-contrast cardiac MRI with three-direction velocity encoding (i.e., 4D flow cMRI) data were acquired in ten carotid bifurcations from men (n = 3) and women (n = 2) that were cardiovascular disease-free. MRI sequence parameters were held constant across all scans except temporal resolution values differed. Velocity data were extracted from the fluid domain and evaluated across the entire volume or at defined anatomic planes (common, internal, external carotid arteries). Qualitative agreement between gating techniques was performed by visualizing flow streamlines and topographical images, and statistical comparisons between gating techniques were performed across the fluid volume and defined anatomic regions.

Results: Agreement in the kinematic data (e.g., bulk flow features and velocity data) was observed in the prospectively and retrospectively gated acquisitions. Voxel differences in time-averaged, peak systolic, and diastolic-averaged velocity magnitudes between gating techniques across all volunteers were 2.7%, 1.2%, and 6.4%, respectively. No significant differences in velocity magnitudes or components ([Formula: see text], [Formula: see text], [Formula: see text]) were observed. Importantly, retrospective acquisitions captured increased retrograde flow in the internal carotid artery (i.e., carotid sinus) compared to prospective acquisitions (10.4 ± 6.3% vs. 4.6 ± 5.3%; [Formula: see text] < 0.05).

Conclusion: Prospective and retrospective ECG-gated 4D flow cMRI acquisitions provide comparable evaluations of fluid velocities, including velocity vector components, in the carotid bifurcation. However, the increased temporal coverage of retrospective acquisitions depicts increased retrograde flow patterns (i.e., disturbed flow) not captured by the prospective gating technique.

K. Iyer, S. Elhabian. “Mesh2SSM: From Surface Meshes to Statistical Shape Models of Anatomy,” Subtitled “arXiv:2305.07805,” 2023.

ABSTRACT

Statistical shape modeling is the computational process of discovering significant shape parameters from segmented anatomies captured by medical images (such as MRI and CT scans), which can fully describe subject-specific anatomy in the context of a population. The presence of substantial non-linear variability in human anatomy often makes the traditional shape modeling process challenging. Deep learning techniques can learn complex non-linear representations of shapes and generate statistical shape models that are more faithful to the underlying population-level variability. However, existing deep learning models still have limitations and require established/optimized shape models for training. We propose Mesh2SSM, a new approach that leverages unsupervised, permutation-invariant representation learning to estimate how to deform a template point cloud to subject-specific meshes, forming a correspondence-based shape model. Mesh2SSM can also learn a population-specific template, reducing any bias due to template selection. The proposed method operates directly on meshes and is computationally efficient, making it an attractive alternative to traditional and deep learning-based SSM approaches.

C. R. Johnson, H. Shen. “AI for Scientific Visualization,” In Artificial Intelligence for Science, Edited by Alok Choudhary, Geoffrey Fox, and Tony Hey, World Scientific, pp. 535-552. 2023.
DOI: 10.1142/9789811265679_0029

S. Johnson, B. Zimmerman, H. Odéen, J. Shea, N. Winkler, R. Factor, S. Joshi, A. Payne. “A Non-Contrast Multi-Parametric MRI Biomarker for Assessment of MR-Guided Focused Ultrasound Thermal Therapies,” In IEEE Transactions on Biomedical Engineering, IEEE, pp. 1--12. 2023.
DOI: 10.1109/TBME.2023.3303445

ABSTRACT

Objective: We present the development of a non-contrast multi-parametric magnetic resonance (MPMR) imaging biomarker to assess treatment outcomes for magnetic resonance-guided focused ultrasound (MRgFUS) ablations of localized tumors. Images obtained immediately following MRgFUS ablation were inputs for voxel-wise supervised learning classifiers, trained using registered histology as a label for thermal necrosis.

Methods: VX2 tumors in New Zealand white rabbit quadriceps were thermally ablated using an MRgFUS system under 3 T MRI guidance. Animals were re-imaged three days post-ablation and euthanized. Histological necrosis labels were created by 3D registration between MR images and digitized H&E segmentations of thermal necrosis to enable voxel-wise classification of necrosis. Supervised MPMR classifier inputs included maximum temperature rise, cumulative thermal dose (CTD), post-FUS differences in T2-weighted images, and apparent diffusion coefficient (ADC) maps. A logistic regression, support vector machine, and random forest classifier were trained in a leave-one-out strategy in test data from four subjects.

Results: In the validation dataset, the MPMR classifiers achieved higher recall and Dice than a clinically adopted 240 cumulative equivalent minutes at 43 °C (CEM₄₃) threshold (0.43) in all subjects. The average Dice scores of overlap with the registered histological label for the logistic regression (0.63) and support vector machine (0.63) MPMR classifiers were within 6% of the acute contrast-enhanced non-perfused volume (0.67).

Conclusions: Voxel-wise registration of MPMR data to histological outcomes facilitated supervised learning of an accurate non-contrast MR biomarker for MRgFUS ablations in a rabbit VX2 tumor model.

R. Kamali, E. Kwan, M. Regouski, T.J. Bunch, D.J. Dosdall, E. Hsu, R. S. Macleod, I. Polejaeva, R. Ranjan. “Contribution of atrial myofiber architecture to atrial fibrillation,” In PLOS ONE, Vol. 18, No. 1, Public Library of Science, pp. 1--16. Jan, 2023.
DOI: 10.1371/journal.pone.0279974

ABSTRACT

Background

The role of fiber orientation on a global chamber level in sustaining atrial fibrillation (AF) is unknown. The goal of this study was to correlate the fiber direction derived from Diffusion Tensor Imaging (DTI) with AF inducibility.

Methods

Transgenic goats with cardiac-specific overexpression of constitutively active TGF-β1 (n = 14) underwent AF inducibility testing by rapid pacing in the left atrium. We chose a minimum of 10 minutes of sustained AF as a cut-off for AF inducibility. Explanted hearts underwent DTI to determine the fiber direction. Using tractography data, we clustered, visualized, and quantified the fiber helix angles in 8 different regions of the left atrial wall using two reference vectors defined based on anatomical landmarks.

Results

Sustained AF was induced in 7 out of 14 goats. The mean helix fiber angles in 7 out of 8 selected regions were statistically different (P-value < 0.05) in the AF inducible group. The average fractional anisotropy (FA) and the mean diffusivity (MD) were similar in the two groups with FA of 0.32±0.08 and MD of 8.54±1.72 mm²/s in the non-inducible group and FA of 0.31±0.05 (P-value = 0.90) and MD of 8.68±1.60 mm²/s (P-value = 0.88) in the inducible group.

Conclusions

DTI based fiber direction shows significant variability across subjects with a significant difference between animals that are AF inducible versus animals that are not inducible. Fiber direction might be contributing to the initiation and sustaining of AF, and its role needs to be investigated further.

M.S.T. Karanam, T. Kataria, S. Elhabian. “ADASSM: Adversarial Data Augmentation in Statistical Shape Models From Images,” Subtitled “arXiv:2307.03273v2,” 2023.

ABSTRACT

Statistical shape models (SSM) have been well-established as an excellent tool for identifying variations in the morphology of anatomy across the underlying population. Shape models use consistent shape representation across all the samples in a given cohort, which helps to compare shapes and identify the variations that can detect pathologies and help in formulating treatment plans. In medical imaging, computing these shape representations from CT/MRI scans requires time-intensive preprocessing operations, including but not limited to anatomy segmentation annotations, registration, and texture denoising. Deep learning models have demonstrated exceptional capabilities in learning shape representations directly from volumetric images, giving rise to highly effective and efficient Image-to-SSM networks. Nevertheless, these models are data-hungry, and due to the limited availability of medical data, deep learning models tend to overfit. Offline data augmentation techniques that use kernel density estimation (KDE) based methods for generating shape-augmented samples have successfully aided Image-to-SSM networks in achieving comparable accuracy to traditional SSM methods. However, these augmentation methods focus on shape augmentation, whereas deep learning models exhibit image-based texture bias, resulting in sub-optimal models. This paper introduces a novel strategy for on-the-fly data augmentation for the Image-to-SSM framework by leveraging data-dependent noise generation or texture augmentation. The proposed framework is trained as an adversary to the Image-to-SSM network, augmenting diverse and challenging noisy samples. Our approach achieves improved accuracy by encouraging the model to focus on the underlying geometry rather than relying solely on pixel values.

D.J. Kasik, M.C. Whitton, C.R. Johnson. “The Big 50: Celebrating 50 ACM SIGGRAPH Conferences,” In IEEE Computer Graphics and Applications, Vol. 43, IEEE, 2023.
ISSN: 0272-1716
DOI: 10.1109/mcg.2023.3266086

ABSTRACT

The ACM Special Interest Group on Computer Graphics and Interactive Techniques (SIGGRAPH) will hold its 50th Annual Conference on 6–10 August 2023. IEEE Computer Graphics and Applications (CG&A) is joining in the celebration with this special issue.

T. Kataria, S. Rajamani, A.B. Ayubi, M. Bronner, J. Jedrzkiewicz, B. Knudsen, S. Elhabian. “Automating Ground Truth Annotations For Gland Segmentation Through Immunohistochemistry,” 2023.

ABSTRACT

The microscopic evaluation of glands in the colon is of utmost importance in the diagnosis of inflammatory bowel disease (IBD) and cancer. When properly trained, deep learning pipelines can provide a systematic, reproducible, and quantitative assessment of disease-related changes in glandular tissue architecture. The training and testing of deep learning models require large amounts of manual annotations, which are difficult, time-consuming, and expensive to obtain. Here, we propose a method for the automated generation of ground truth in digital H&E slides using immunohistochemistry (IHC) labels. The image processing pipeline generates annotations of glands in H&E histopathology images from colon biopsies by transfer of gland masks from CK8/18, CDX2, or EpCAM IHC. The IHC gland outlines are transferred to co-registered H&E images for the training of deep learning models. We compare the performance of the deep learning models to manual annotations using an internal held-out set of biopsies as well as two public data sets. Our results show that EpCAM IHC provides gland outlines that closely match manual gland annotations (DICE = 0.89) and are robust to damage by inflammation. In addition, we propose a simple data sampling technique that allows models trained on data from several sources to be adapted to a new data source using just a few newly annotated samples. The best-performing models achieved average DICE scores of 0.902 and 0.89, respectively, on GLAS and CRAG colon cancer public datasets when trained with only 10% of annotated cases from either public cohort. Altogether, the performances of our models indicate that automated annotations using cell type-specific IHC markers can safely replace manual annotations. The automated IHC labels from single institution cohorts can be combined with small numbers of hand-annotated cases from multi-institutional cohorts to train models that generalize well to diverse data sources.

T. Kataria, B. Knudsen, S. Elhabian. “Unsupervised Domain Adaptation for Semantic Segmentation via Feature-space Density Matching,” Subtitled “arXiv:2305.05789,” 2023.

ABSTRACT

Semantic segmentation is a critical step in automated image interpretation and analysis where pixels are classified into one or more predefined semantically meaningful classes. Deep learning approaches for semantic segmentation rely on harnessing the power of annotated images to learn features indicative of these semantic classes. Nonetheless, they often fail to generalize when there is a significant domain (i.e., distributional) shift between the training (i.e., source) data and the dataset(s) encountered when deployed (i.e., target), necessitating manual annotations for the target data to achieve acceptable performance. This is especially important in medical imaging because different image modalities have significant intra- and inter-site variations due to protocol and vendor variability. Current techniques are sensitive to hyperparameter tuning and target dataset size. This paper presents an unsupervised domain adaptation approach for semantic segmentation that alleviates the need for annotating target data. Using kernel density estimation, we match the target data distribution to the source data in the feature space. We demonstrate that our results are comparable or superior on multiple-site prostate MRI and histopathology images, which mitigates the need for annotating target data.

T. Kataria, B. Knudsen, S. Elhabian. “To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology,” Subtitled “arXiv:2307.03275,” 2023.

ABSTRACT

Annotating medical imaging datasets is costly, so fine-tuning (or transfer learning) is the most effective method for digital pathology vision applications such as disease classification and semantic segmentation. However, due to texture bias in models trained on real-world images, transfer learning for histopathology applications might result in underperforming models, which necessitates using unlabeled histopathology data and self-supervised methods to discover domain-specific characteristics. Here, we tested the premise that histopathology-specific pretrained models provide better initializations for pathology vision tasks, i.e., gland and cell segmentation. In this study, we compare the performance of gland and cell segmentation tasks with domain-specific and non-domain-specific pretrained weights. Moreover, we investigate the data size at which domain-specific pretraining produces a statistically significant difference in performance. In addition, we investigated whether domain-specific initialization improves the effectiveness of out-of-domain testing on distinct datasets but the same task. The results indicate that performance gain using domain-specific pretraining depends on both the task and the size of the training dataset. In instances with limited dataset sizes, a significant improvement in gland segmentation performance was also observed, whereas models trained on cell segmentation datasets exhibit no improvement.

S. Leventhal, A. Gyulassy, M. Heimann, V. Pascucci. “Exploring Classification of Topological Priors with Machine Learning for Feature Extraction,” In IEEE Transactions on Visualization and Computer Graphics, pp. 1--12. 2023.

ABSTRACT

In many scientific endeavors, increasingly abstract representations of data allow for new interpretive methodologies and conceptualization of phenomena. For example, moving from raw imaged pixels to segmented and reconstructed objects allows researchers new insights and means to direct their studies toward relevant areas. Thus, the development of new and improved methods for segmentation remains an active area of research. With advances in machine learning and neural networks, scientists have been focused on employing deep neural networks such as U-Net to obtain pixel-level segmentations, namely, defining associations between pixels and corresponding/referent objects and gathering those objects afterward. Topological analysis, such as the use of the Morse-Smale complex to encode regions of uniform gradient flow behavior, offers an alternative approach: first, create geometric priors, and then apply machine learning to classify. This approach is empirically motivated since phenomena of interest often appear as subsets of topological priors in many applications. Using topological elements not only reduces the learning space but also introduces the ability to use learnable geometries and connectivity to aid the classification of the segmentation target. In this paper, we describe an approach to creating learnable topological elements, explore the application of ML techniques to classification tasks in a number of areas, and demonstrate this approach as a viable alternative to pixel-level classification, with similar accuracy, improved execution time, and requiring marginal training data.

J. Li, A. Pepe, C. Gsaxner, G. Luijten, Y. Jin, S. Elhabian, et al. “MedShapeNet - A Large-Scale Dataset of 3D Medical Shapes for Computer Vision,” Subtitled “arXiv:2308.16139v3,” 2023.

ABSTRACT

We present MedShapeNet, a large collection of anatomical shapes (e.g., bones, organs, vessels) and 3D surgical instrument models. Prior to the deep learning era, the broad application of statistical shape models (SSMs) in medical image analysis is evidence that shapes have been commonly used to describe medical data. Nowadays, however, state-of-the-art (SOTA) deep learning algorithms in medical imaging are predominantly voxel-based. In computer vision, on the contrary, shapes (including, voxel occupancy grids, meshes, point clouds and implicit surface models) are preferred data representations in 3D, as seen from the numerous shape-related publications in premier vision conferences, such as the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), as well as the increasing popularity of ShapeNet (about 51,300 models) and Princeton ModelNet (127,915 models) in computer vision research. MedShapeNet is created as an alternative to these commonly used shape benchmarks to facilitate the translation of data-driven vision algorithms to medical applications, and it extends the opportunities to adapt SOTA vision algorithms to solve critical medical problems. Besides, the majority of the medical shapes in MedShapeNet are modeled directly on the imaging data of real patients, and therefore it complements well existing shape benchmarks consisting of computer-aided design (CAD) models. MedShapeNet currently includes more than 100,000 medical shapes, and provides annotations in the form of paired data. It is therefore also a freely available repository of 3D models for extended reality (virtual reality - VR, augmented reality - AR, mixed reality - MR) and medical 3D printing. This white paper describes in detail the motivations behind MedShapeNet, the shape acquisition procedures, the use cases, as well as the usage of the online shape search portal: https://medshapenet.ikim.nrw/

S. Li, X. Yu, W. Xing, R.M. Kirby, A. Narayan, S. Zhe. “Multi-Resolution Active Learning of Fourier Neural Operators,” Subtitled “arXiv:2309.16971,” 2023.

ABSTRACT

Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.

Z. Li, S. Liu, K. Bhavya, T. Bremer, V. Pascucci. “Instance-wise Linearization of Neural Network for Model Interpretation,” Subtitled “arXiv:2310.16295v1,” 2023.

ABSTRACT

Neural networks have achieved remarkable successes in many scientific fields. However, the interpretability of neural network models is still a major bottleneck to deploying such techniques in our daily lives. The challenge stems from the non-linear behavior of the neural network, which raises the critical question of how a model uses input features to make a decision. The classical approach to address this challenge is feature attribution, which assigns an importance score to each input feature and reveals its importance to the current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detailing how the features are actually processed by a model internally. These attribution approaches often raise the concern of whether they highlight the correct features for a model prediction.

For a neural network model, the non-linear behavior is often caused by the non-linear activation units of the model. However, the computation behavior of a prediction from a neural network model is locally linear, because one prediction has only one activation pattern. Based on this observation, we propose an instance-wise linearization approach that reformulates the forward computation process of a neural network prediction. This approach reformulates different layers of convolutional neural networks into linear matrix multiplications. Aggregating all layers' computation, the complex operations of a convolutional neural network prediction can be described as a linear matrix multiplication F(x)=W⋅x+b. This equation not only provides a feature attribution map that highlights the importance of the input features but also tells exactly how each input feature contributes to a prediction. Furthermore, we discuss the application of this technique in both supervised classification and unsupervised neural network learning (parametric t-SNE dimension reduction).

H. Lin, M. Lisnic, D. Akbaba, M. Meyer, A. Lex. “Here’s what you need to know about my data: Exploring Expert Knowledge’s Role in Data Analysis,” 2023.

ABSTRACT

Data-driven decision making has become the gold standard in science, industry, and public policy. Yet data alone, as an imperfect and partial representation of reality, is often insufficient to make good analysis decisions. Knowledge about the context of a dataset, its strengths and weaknesses, and its applicability for certain tasks is essential. In this work, we present an interview study with analysts from a wide range of domains and with varied expertise and experience, inquiring about the role of contextual knowledge. We provide insights into how data is insufficient in analysts' workflows and how they incorporate other sources of knowledge into their analysis. We also suggest design opportunities to better and more robustly consider both knowledge and data in analysis processes.

M. Lisnic, A. Lex, M. Kogan. “"Yeah, this graph doesn't show that": Analysis of Online Engagement with Misleading Data Visualizations,” In OSF Preprints, 2023.

ABSTRACT

Attempting to make sense of a phenomenon or crisis, social media users often share data visualizations and interpretations that can be erroneous or misleading. Prior work has studied how data visualizations can mislead, but do misleading visualizations reach a broad social media audience? And if so, do users amplify or challenge misleading interpretations? To answer these questions, we conducted a mixed-methods analysis of the public’s engagement with data visualization posts about COVID-19 on Twitter. Compared to posts with accurate visual insights, our results show that posts with misleading visualizations garner more replies in which the audiences point out nuanced fallacies and caveats in data interpretations. Based on the results of our thematic analysis of engagement, we identify and discuss important opportunities and limitations to effectively leveraging crowdsourced assessments to address data-driven misinformation.

S. Liu, H. Miao, Z. Li, M. Olson, V. Pascucci, P.T. Bremer. “AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making,” Subtitled “arXiv preprint arXiv:2312.04494,” 2023.

ABSTRACT

With recent advances in multi-modal foundation models, the previously text-only large language models (LLM) have evolved to incorporate visual input, opening up unprecedented opportunities for various applications in visualization. Our work explores the utilization of the visual perception ability of multi-modal LLMs to develop Autonomous Visualization Agents (AVAs) that can interpret and accomplish user-defined visualization objectives through natural language. We propose the first framework for the design of AVAs and present several usage scenarios intended to demonstrate the general applicability of the proposed paradigm. The addition of visual perception allows AVAs to act as the virtual visualization assistant for domain experts who may lack the knowledge or expertise in fine-tuning visualization outputs. Our preliminary exploration and proof-of-concept agents suggest that this approach can be widely applicable whenever the choices of appropriate visualization parameters require the interpretation of previous visual output. Feedback from unstructured interviews with experts in AI research, medical visualization, and radiology has been incorporated, highlighting the practicality and potential of AVAs. Our study indicates that AVAs represent a general paradigm for designing intelligent visualization systems that can achieve high-level visualization goals, which pave the way for developing expert-level visualization agents in the future.

D. Long, W.W. Xing, A.S. Krishnapriyan, R.M. Kirby, S. Zhe, M.W. Mahoney. “Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels,” Subtitled “arXiv:2310.05387v1,” 2023.

ABSTRACT

Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior — an ideal Bayesian sparse distribution — for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.

J. Luettgau, G. Scorzelli, V. Pascucci, M. Taufer. “Development of Large-Scale Scientific Cyberinfrastructure and the Growing Opportunity to Democratize Access to Platforms and Data,” In Distributed, Ambient and Pervasive Interactions, Springer Nature Switzerland, pp. 378--389. 2023.
ISBN: 978-3-031-34668-2
DOI: 10.1007/978-3-031-34668-2_25

ABSTRACT

As researchers across scientific domains rapidly adopt advanced scientific computing methodologies, access to advanced cyberinfrastructure (CI) becomes a critical requirement in scientific discovery. Lowering the entry barriers to CI is a crucial challenge in interdisciplinary sciences requiring frictionless software integration, data sharing from many distributed sites, and access to heterogeneous computing platforms. In this paper, we explore how the challenge is not merely a factor of availability and affordability of computing, network, and storage technologies but rather the result of insufficient interfaces with an increasingly heterogeneous mix of computing technologies and data sources. With more distributed computation and data, scientists, educators, and students must invest their time and effort in coordinating data access and movements, often penalizing their scientific research. Investments in the interfaces’ software stack are necessary to help scientists, educators, and students across domains take advantage of advanced computational methods. To this end, we propose developing a science data fabric as the standard scientific discovery interface that seamlessly manages data dependencies within scientific workflows and CI.

J. Luettgau, H. Martinez, G. Tarcea, G. Scorzelli, V. Pascucci, M. Taufer. “Studying Latency and Throughput Constraints for Geo-Distributed Data in the National Science Data Fabric,” In Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing, ACM, pp. 325–326. 2023.
DOI: 10.1145/3588195.3595948

ABSTRACT

The National Science Data Fabric (NSDF) is our solution to the problem of addressing the data-sharing needs of the growing data science community. NSDF is designed to make sharing data across geographically distributed sites easier for users who lack technical expertise and infrastructure. By developing an easy-to-install software stack, we promote the FAIR data-sharing principles in NSDF while leveraging existing high-speed data transfer infrastructures such as Globus and XRootD. This work shows how we leverage latency and throughput information between geo-distributed NSDF sites with NSDF entry points to optimize the automatic coordination of data placement and transfer across the data fabric, which can further improve the efficiency of data sharing.

SCIENTIFIC COMPUTING AND IMAGING INSTITUTE at the University of Utah