SCI Publications
2024
M. Lowery, J. Turnage, Z. Morrow, J.D. Jakeman, A. Narayan.
Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning, Subtitled arXiv:2407.00809v1, 2024.
This paper introduces the Kernel Neural Operator (KNO), a novel operator learning technique that uses deep kernel-based integral operators in conjunction with quadrature for function-space approximation of operators (maps from functions to functions). KNOs use parameterized, closed-form, finitely-smooth, and compactly-supported kernels with trainable sparsity parameters within the integral operators to significantly reduce the number of parameters that must be learned relative to existing neural operators. Moreover, the use of quadrature for numerical integration endows the KNO with geometric flexibility that enables operator learning on irregular geometries. Numerical results demonstrate that on existing benchmarks the training and test accuracy of KNOs is higher than that of popular operator learning techniques while using at least an order of magnitude fewer trainable parameters. KNOs thus represent a new paradigm of low-memory, geometrically-flexible, deep operator learning, while retaining the implementation simplicity and transparency of traditional kernel methods from both scientific computing and machine learning.
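The core building block described above, a kernel integral operator evaluated by quadrature, can be sketched in a few lines. This is only an illustration, not the authors' implementation: the Wendland C2 kernel is one well-known kernel that is closed-form, finitely smooth, and compactly supported, and it is an assumption that it resembles the kernels used in the paper. The names `wendland_c2`, `kernel_integral`, and `eps` are hypothetical; `eps` stands in for a trainable parameter that controls the support radius (and hence sparsity).

```python
def wendland_c2(r, eps):
    """Wendland C2 kernel: closed-form, finitely smooth, zero once eps * r >= 1."""
    s = eps * r
    return (1.0 - s) ** 4 * (4.0 * s + 1.0) if s < 1.0 else 0.0


def kernel_integral(f_vals, nodes, weights, eps):
    """Approximate (K f)(x_i) = integral of k(|x_i - y|) f(y) dy with a quadrature rule."""
    out = []
    for x in nodes:
        total = 0.0
        for y, w, fy in zip(nodes, weights, f_vals):
            total += w * wendland_c2(abs(x - y), eps) * fy
        out.append(total)
    return out


# Trapezoidal rule on [0, 1]. Any (nodes, weights) pair could be used instead,
# which is what makes a quadrature-based operator independent of a regular grid.
n = 101
h = 1.0 / (n - 1)
nodes = [i * h for i in range(n)]
weights = [h] * n
weights[0] = weights[-1] = h / 2

f_vals = [1.0 for _ in nodes]          # input function f(y) = 1
g = kernel_integral(f_vals, nodes, weights, eps=5.0)
```

With `eps=5.0` the kernel vanishes beyond a distance of 0.2, so each output value depends only on nearby nodes; in a learned operator, a larger `eps` would give a sparser interaction pattern.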
W. Lyu, R. Sridharamurthy, J.M. Phillips, B. Wang.
Fast Comparative Analysis of Merge Trees Using Locality Sensitive Hashing, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields—such as persistence diagrams and merge trees—because they provide succinct and robust abstract representations. Several similarity measures for topological descriptors seem to be both asymptotically and practically efficient with polynomial time algorithms, but they do not scale well when handling large-scale, time-varying scientific data and ensembles. In this paper, we propose a new framework to facilitate the comparative analysis of merge trees, inspired by tools from locality sensitive hashing (LSH). LSH hashes similar objects into the same hash buckets with high probability. We propose two new similarity measures for merge trees that can be computed via LSH, using new extensions to Recursive MinHash and subpath signature, respectively. Our similarity measures are extremely efficient to compute and closely resemble the results of existing measures such as merge tree edit distance or geometric interleaving distance. Our experiments demonstrate the utility of our LSH framework in applications such as shape matching, clustering, key event detection, and ensemble summarization.
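For readers unfamiliar with the hashing idea mentioned in the abstract, the sketch below shows plain MinHash on sets, where the fraction of matching signature entries estimates Jaccard similarity. It is not the Recursive MinHash or subpath-signature extension proposed in the paper, and the helper names (`_hash`, `minhash_signature`, `estimated_jaccard`) are hypothetical.

```python
import hashlib


def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); each seed acts as a separate hash function."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """For each hash function, keep the minimum hash value over the set."""
    return [min(_hash(seed, x) for x in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates |A & B| / |A | B|."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


a = set(range(0, 100))
b = set(range(50, 150))
true_jaccard = len(a & b) / len(a | b)
estimate = estimated_jaccard(minhash_signature(a), minhash_signature(b))
```

In an LSH scheme, signature entries (or groups of them) serve as bucket keys, so sets with high similarity are likely to share at least one bucket.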
C. Mackenzie, S. Ruckel, A. Morris, S. Elhabian, E. Bieging.
Statistical Shape Modeling To Predict Left Atrial Appendage Thrombus, In Journal of Cardiovascular Computed Tomography, Elsevier, 2024.
DOI: https://doi.org/10.1016/j.jcct.2024.05.195
C. Mackenzie, A. Morris, S. Ruckel, S. Elhabian, E. Bieging.
Left Atrial Appendage Thrombus Prediction with Statistical Shape Modeling, In Circulation, Vol. 150, 2024.
DOI: https://doi.org/10.1161/circ.150.suppl_1.4144233
Methods: We collected 132 cardiac CTs from consecutive studies of patients over 14 months obtained for evaluation of LAA thrombus prior to cardioversion. Of these, 16 patients were excluded. The LA and LAA were manually segmented independently from the systolic phase of the remaining 116 patients. Shape analysis was then performed using ShapeWorks software (SCI, University of Utah) to compute shape parameters of the LAA in isolation as well as the LA and LAA in combination without controlling for scale or orientation. The shape parameters explaining the greatest shape variance were considered for the model until at least 80% of shape variance was included. A logistic regression model for prediction of LAA thrombus was created using these shape parameters with forward and backward stepwise model selection.
Results: Of the 116 studies analyzed, 6 patients had thrombus in the LAA. Average shapes of the patients with and without thrombus differed in overall size as well as prominence of the LAA. Four shape parameters accounted for 81.2% of the LAA shape variance while six shape parameters accounted for 80.5% of the combined LA and LAA variance. The first shape parameter was predictive of LAA thrombus using both shape of the LAA only (p = 0.0258, AUC = 0.762), and when LAA shape was combined with LA shape in a joint model (p = 0.00511, AUC = 0.877).
Conclusion: Statistical shape modeling of the LAA, with or without the LA, can be performed on CT image data, and demonstrates differences in shape of these structures between patients with and without LAA thrombus. Patients with LAA thrombus had a larger overall LAA size, LA size, and a more prominent LAA with distinctive morphology. Findings suggest that statistical shape modeling may offer a quantitative and reproducible approach for using LAA shape to assess stroke risk in patients with AF.
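The selection rule in the Methods paragraph (keep shape parameters in order of explained variance until at least 80% is covered) can be illustrated with a short sketch. The function name `modes_needed` and the variance values are hypothetical toy inputs, not data from the study.

```python
def modes_needed(variances, threshold=0.80):
    """Return the smallest k such that the first k modes explain >= threshold of total variance."""
    total = sum(variances)
    running = 0.0
    for k, v in enumerate(variances, start=1):
        running += v
        if running / total >= threshold:
            return k
    return len(variances)


# Hypothetical per-mode variances, sorted from largest to smallest.
toy_variances = [5.0, 2.0, 1.5, 1.0, 0.5]
k = modes_needed(toy_variances)
```

For these toy values the cumulative fractions are 0.50, 0.70, and 0.85, so three modes are kept.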
H. Manoochehri, B. Zhang, B.S. Knudsen, T. Tasdizen.
PathMoCo: A Novel Framework to Improve Feature Embedding in Self-supervised Contrastive Learning for Histopathological Images, Subtitled arXiv:2410.17514, 2024.
Self-supervised learning has become a cornerstone in various areas, particularly histopathological image analysis. Image augmentation plays a crucial role in self-supervised learning, as it generates variations in image samples. However, traditional image augmentation techniques often overlook the unique characteristics of histopathological images. In this paper, we propose a new histopathology-specific image augmentation method called stain reconstruction augmentation (SRA). We integrate our SRA with MoCo v3, a leading model in self-supervised contrastive learning, along with our additional contrastive loss terms, and call the new model SRA-MoCo v3. We demonstrate that our SRA-MoCo v3 always outperforms the standard MoCo v3 across various downstream tasks and achieves comparable or superior performance to other foundation models pre-trained on significantly larger histopathology datasets.
Q.C. Nguyen, T. Tasdizen, M. Alirezaei, H. Mane, X. Yue, J.S. Merchant, W. Yu, L. Drew, D. Li, T.T. Nguyen.
Neighborhood built environment, obesity, and diabetes: A Utah siblings study, In SSM - Population Health, Vol. 26, 2024.
Background
This study utilizes innovative computer vision methods alongside Google Street View images to characterize neighborhood built environments across Utah.
Methods
Convolutional Neural Networks were used to create indicators of street greenness, crosswalks, and building type on 1.4 million Google Street View images. The demographic and medical profiles of Utah residents came from the Utah Population Database (UPDB). We implemented hierarchical linear models with individuals nested within zip codes to estimate associations between neighborhood built environment features and individual-level obesity and diabetes, controlling for individual- and zip code-level characteristics (n = 1,899,175 adults living in Utah in 2015). Sibling random effects models were implemented to account for shared family attributes among siblings (n = 972,150) and twins (n = 14,122).
Results
Consistent with prior neighborhood research, the variance partition coefficients (VPC) of our unadjusted models nesting individuals within zip codes were relatively small (0.5%–5.3%), except for HbA1c (VPC = 23%), suggesting a small percentage of the outcome variance is at the zip code-level. However, proportional change in variance (PCV) attributable to zip codes after the inclusion of neighborhood built environment variables and covariates ranged between 11% and 67%, suggesting that these characteristics account for a substantial portion of the zip code-level effects. Non-single-family homes (indicator of mixed land use), sidewalks (indicator of walkability), and green streets (indicator of neighborhood aesthetics) were associated with reduced diabetes and obesity. Zip codes in the third tertile for non-single-family homes were associated with a 15% reduction (PR: 0.85; 95% CI: 0.79, 0.91) in obesity and a 20% reduction (PR: 0.80; 95% CI: 0.70, 0.91) in diabetes. This tertile was also associated with a BMI reduction of −0.68 kg/m² (95% CI: −0.95, −0.40).
Conclusion
We observe associations between neighborhood characteristics and chronic diseases, accounting for biological, social, and cultural factors shared among siblings in this large population-based study.
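For orientation, the two variance summaries quoted in the Results are commonly defined as follows in multilevel modeling. These are the textbook forms, stated here as an assumption; the paper may use a variant (for example, a latent-variable formulation for binary outcomes). Here $\sigma_u^2$ is the zip code-level variance, $\sigma_e^2$ the individual-level variance, and the subscripts 0 and 1 denote the unadjusted and adjusted models.

```latex
\[
\mathrm{VPC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2},
\qquad
\mathrm{PCV} = \frac{\sigma_{u,0}^2 - \sigma_{u,1}^2}{\sigma_{u,0}^2},
\qquad
\text{percent reduction} = (1 - \mathrm{PR}) \times 100\%.
\]
```

The last expression is how a prevalence ratio of 0.85 corresponds to the reported 15% reduction, and 0.80 to the 20% reduction.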
Q.C. Nguyen, M. Alirezaei, X. Yue, H. Mane, D. Li, L. Zhao, T.T. Nguyen, R. Patel, W. Yu, M. Hu, D. Quistberg, T. Tasdizen.
Leveraging computer vision for predicting collision risks: a cross-sectional analysis of 2019–2021 fatal collisions in the USA, In Injury Prevention, BMJ, 2024.
Objective The USA has higher rates of fatal motor vehicle collisions than most high-income countries. Previous studies examining the role of the built environment were generally limited to small geographic areas or single cities. This study aims to quantify associations between built environment characteristics and traffic collisions in the USA.
Methods Built environment characteristics were derived from Google Street View images and summarised at the census tract level. Fatal traffic collisions were obtained from the 2019–2021 Fatality Analysis Reporting System. Fatal and non-fatal traffic collisions in Washington DC were obtained from the District Department of Transportation. Adjusted Poisson regression models examined whether built environment characteristics are related to motor vehicle collisions in the USA, controlling for census tract sociodemographic characteristics.
Results Census tracts in the highest tertile of sidewalks, single-lane roads, streetlights and street greenness had 70%, 50%, 30% and 26% fewer fatal vehicle collisions compared with those in the lowest tertile. Street greenness and single-lane roads were associated with 37% and 38% fewer pedestrian-involved and cyclist-involved fatal collisions. Analyses with fatal and non-fatal collisions in Washington DC found streetlights and stop signs were associated with fewer pedestrian-involved and cyclist-involved vehicle collisions, while road construction had an adverse association.
Conclusion This study demonstrates the utility of using data algorithms that can automatically analyse street segments to create indicators of the built environment to enhance understanding of large-scale patterns and inform interventions to decrease road traffic injuries and fatalities.
R. Nihalaani, T. Kataria, J. Adams, S.Y. Elhabian.
Estimation and Analysis of Slice Propagation Uncertainty in 3D Anatomy Segmentation, Subtitled arXiv preprint arXiv:2403.12290, 2024.
Supervised methods for 3D anatomy segmentation demonstrate superior performance but are often limited by the availability of annotated data. This limitation has led to a growing interest in self-supervised approaches in tandem with the abundance of available unannotated data. Slice propagation has emerged as a self-supervised approach that leverages slice registration as a self-supervised task to achieve full anatomy segmentation with minimal supervision. This approach significantly reduces the need for domain expertise, time, and the cost associated with building fully annotated datasets required for training segmentation networks. However, this shift toward reduced supervision via deterministic networks raises concerns about the trustworthiness and reliability of predictions, especially when compared with more accurate supervised approaches. To address this concern, we propose the integration of calibrated uncertainty quantification (UQ) into slice propagation methods, providing insights into the model’s predictive reliability and confidence levels. Incorporating uncertainty measures enhances user confidence in self-supervised approaches, thereby improving their practical applicability. We conducted experiments on three datasets for 3D abdominal segmentation using five UQ methods. The results illustrate that incorporating UQ improves not only model trustworthiness, but also segmentation accuracy. Furthermore, our analysis reveals various failure modes of slice propagation methods that might not be immediately apparent to end-users. This study opens up new research avenues to improve the accuracy and trustworthiness of slice propagation methods.
T.A.J. Ouermi, J. Li, T. Athawale, C.R. Johnson.
Estimation and Visualization of Isosurface Uncertainty from Linear and High-Order Interpolation Methods, In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 51--61. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00012
Isosurface visualization is fundamental for exploring and analyzing 3D volumetric data. Marching cubes (MC) algorithms with linear interpolation are commonly used for isosurface extraction and visualization. Although linear interpolation is easy to implement, it has limitations when the underlying data is complex and high-order, which is the case for most real-world data. Linear interpolation can output vertices at the wrong location. Its inability to deal with sharp features and features smaller than grid cells can lead to an incorrect isosurface with holes and broken pieces. Despite these limitations, isosurface visualizations typically do not include insight into the spatial location and the magnitude of these errors. We utilize high-order interpolation methods with MC algorithms and interactive visualization to highlight these uncertainties. Our visualization tool helps identify the regions of high interpolation errors. It also allows users to query local areas for details and compare the differences between isosurfaces from different interpolation methods. In addition, we employ high-order methods to identify and reconstruct possible features that linear methods cannot detect. We showcase how our visualization tool helps explore and understand the extracted isosurface errors through synthetic and real-world data.
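The claim that linear interpolation can place vertices at the wrong location is easy to see on a single cell edge. The sketch below is a toy illustration, not the paper's method: it assumes the true field along the edge is the cubic f(t) = t^3, places the isosurface vertex by linear interpolation as marching cubes would, and compares it with a reference crossing found by bisection. All names are hypothetical.

```python
def linear_crossing(f0, f1, iso):
    """Marching-cubes style vertex placement on an edge parametrized by t in [0, 1]."""
    return (iso - f0) / (f1 - f0)


def bisect_crossing(f, iso, lo=0.0, hi=1.0, iters=60):
    """Reference crossing of f(t) = iso by bisection (assumes one sign change on [lo, hi])."""
    flo = f(lo) - iso
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid) - iso
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(t):
    """Assumed underlying field along the edge (toy choice)."""
    return t ** 3


iso = 0.2
t_lin = linear_crossing(f(0.0), f(1.0), iso)   # uses only the two endpoint samples
t_ref = bisect_crossing(f, iso)                # where the field actually equals iso
error = abs(t_lin - t_ref)
```

Here the linear estimate sits at t = 0.2 while the field actually reaches the isovalue much further along the edge, so the vertex is misplaced by more than a third of the edge length.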
T.A.J. Ouermi, J. Li, Z. Morrow, B. Waanders, C.R. Johnson.
Glyph-Based Uncertainty Visualization and Analysis of Time-Varying Vector Fields, In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 73--77. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00014
Uncertainty is inherent to most data, including vector field data, yet it is often omitted in visualizations and representations. Effective uncertainty visualization can enhance the understanding and interpretability of vector field data. For instance, in the context of severe weather events such as hurricanes and wildfires, effective uncertainty visualization can provide crucial insights about fire spread or hurricane behavior and aid in resource management and risk mitigation. Glyphs are commonly used for representing vector uncertainty but are often limited to 2D. In this work, we present a glyph-based technique for accurately representing 3D vector uncertainty and a comprehensive framework for visualization, exploration, and analysis using our new glyphs. We employ hurricane and wildfire examples to demonstrate the efficacy of our glyph design and visualization tool in conveying vector field uncertainty.
A. Panta, X. Huang, N. McCurdy, D. Ellsworth, A. Gooch, .
Web-based Visualization and Analytics of Petascale data: Equity as a Tide that Lifts All Boats, In Proceedings of the IEEE Visualization conference, IEEE, 2024.
Scientists generate petabytes of data daily to help uncover environmental trends or behaviors that are hard to predict. For example, understanding climate simulations based on the long-term average of temperature, precipitation, and other environmental variables is essential to predicting and establishing root causes of future undesirable scenarios and assessing possible mitigation strategies. While supercomputer centers provide a powerful infrastructure for generating petabytes of simulation output, accessing and analyzing these datasets interactively remains challenging on multiple fronts. This paper presents an approach to managing, visualizing, and analyzing petabytes of data within a browser on equipment ranging from the top NASA supercomputer to commodity hardware like a laptop. Our novel data fabric abstraction layer allows user-friendly querying of scientific information while hiding the complexities of dealing with file systems or cloud services. We also optimize network utilization while streaming from petascale repositories through state-of-the-art progressive compression algorithms. Based on this abstraction, we provide customizable dashboards that can be accessed from any device with any internet connection, enabling interactive visual analysis of vast amounts of data to a wide range of users - from top scientists with access to leadership-class computing environments to undergraduate students of disadvantaged backgrounds from minority-serving institutions. We focus on NASA’s use of petascale climate datasets as an example of particular societal impact and, therefore, a case where achieving equity in science participation is critical. We validate our approach by improving the ability of climate scientists to visually explore their data via two fully interactive dashboards. We further validate our approach by deploying the dashboards and simplified training materials in the classroom at a minority-serving institution. 
These dashboards, released in simplified form to the general public, contribute significantly to a broader push to democratize the access and use of climate data.
A. Panta, G. Scorzelli, A. Gooch, V. Pascucci, H. Lee.
Managing Large-scale Atmospheric and Oceanic Climate Data for Efficient Analysis and On-the-fly Interactive Visualization, 2024.
DOI: 10.22541/essoar.173238742.20533901/v1
Managing vast volumes of climate data, often reaching into terabytes and petabytes, presents significant challenges in terms of storage, accessibility, efficient analysis, and on-the-fly interactive visualization. Traditional data handling techniques are increasingly inadequate for the massive atmospheric and oceanic data generated by modern climate research. We tackled these challenges by reorganizing the native data layout to optimize access and processing, implementing advanced visualization algorithms like OpenVisus for real-time interactive exploration, and extracting comprehensive metadata for all available fields to improve data discoverability and usability. Our work utilized extensive datasets, including downscaled projections of various climate variables and high-resolution ocean simulations from NEX GDDP CMIP6 and NASA DYAMOND datasets. By transforming the data into progressive, streaming-capable formats and incorporating ARCO (Analysis Ready, Cloud Optimized) features before moving them to the cloud, we ensured that the data is highly accessible and efficient for analysis, while allowing direct access to data subsets in the cloud. The direct integration of the Python library called Xarray allows efficient and easy access to the data, leveraging the familiarity most climate scientists have with it. This approach, combined with the progressive streaming format, not only enhances the findability, shareability and reusability of the data but also facilitates sophisticated analyses and visualizations from commodity hardware like personal cell phones and computers without the need for large computational resources. By collaborating with climate scientists and domain experts from NASA Jet Propulsion Lab and NASA Ames Research Center, we published more than 2 petabytes of climate data via our interactive dashboards for climate scientists and the general public. 
Ultimately, our solution fosters quicker decision-making, greater collaboration, and innovation in the global climate science community by breaking down barriers imposed by hardware limitations and geographical constraints and allowing access to sophisticated visualization tools via publicly available dashboards.
M. Parashar.
Enabling Responsible Artificial Intelligence Research and Development Through the Democratization of Advanced Cyberinfrastructure, In Harvard Data Science Review, Special Issue 4: Democratizing Data, 2024.
Artificial intelligence (AI) is driving discovery, innovation, and economic growth, and has the potential to transform science and society. However, realizing the positive, transformative potential of AI requires that AI research and development (R&D) progress responsibly; that is, in a way that protects privacy, civil rights, and civil liberties, and promotes principles of fairness, accountability, transparency, and equity. This article explores the importance of democratizing AI R&D for achieving the goal of responsible AI and its potential impacts.
M. Parashar.
Everywhere & Nowhere: Envisioning a Computing Continuum for Science, Subtitled arXiv:2406.04480v1, 2024.
Emerging data-driven scientific workflows are seeking to leverage distributed data sources to understand end-to-end phenomena, drive experimentation, and facilitate important decision-making. Despite the exponential growth of available digital data sources at the edge, and the ubiquity of nontrivial computational power for processing this data, realizing such science workflows remains challenging. This paper explores a computing continuum that is everywhere and nowhere – one spanning resources at the edges, in the core and in between, and providing abstractions that can be harnessed to support science. It also introduces recent research in programming abstractions that can express what data should be processed and when and where it should be processed, and autonomic middleware services that automate the discovery of resources and the orchestration of computations across these resources.
S. Parsa, B. Wang.
Harmonic Chain Barcode and Stability, Subtitled arXiv:2409.06093, 2024.
The persistence barcode is a topological descriptor of data that plays a fundamental role in topological data analysis. Given a filtration of the space of data, a persistence barcode tracks the evolution of its homological features. In this paper, we introduce a new type of barcode, referred to as the canonical barcode of harmonic chains, or harmonic chain barcode for short, which tracks the evolution of harmonic chains. As our main result, we show that the harmonic chain barcode is stable and it captures both geometric and topological information of data. Moreover, given a filtration of a simplicial complex of size $n$ with $m$ time steps, we can compute its harmonic chain barcode in $O(m^2 n^{\omega} + m n^3)$ time, where $n^{\omega}$ is the matrix multiplication time. Consequently, a harmonic chain barcode can be utilized in applications in which a persistence barcode is applicable, such as feature vectorization and machine learning. Our work provides strong evidence in a growing list of literature that geometric (not just topological) information can be recovered from a persistence filtration.
M. Penwarden, H. Owhadi, R.M. Kirby.
Kolmogorov n-Widths for Multitask Physics-Informed Machine Learning (PIML) Methods: Towards Robust Metrics, Subtitled arXiv preprint arXiv:2402.11126, 2024.
Physics-informed machine learning (PIML) as a means of solving partial differential equations (PDE) has garnered much attention in the Computational Science and Engineering (CS&E) world. This topic encompasses a broad array of methods and models aimed at solving a single or a collection of PDE problems, called multitask learning. PIML is characterized by the incorporation of physical laws into the training process of machine learning models in lieu of large data when solving PDE problems. Despite the overall success of this collection of methods, it remains incredibly difficult to analyze, benchmark, and generally compare one approach to another. Using Kolmogorov n-widths as a measure of effectiveness of approximating functions, we judiciously apply this metric in the comparison of various multitask PIML architectures. We compute lower accuracy bounds and analyze the model's learned basis functions on various PDE problems. This is the first objective metric for comparing multitask PIML architectures and helps remove uncertainty in model validation from selective sampling and overfitting. We also identify avenues of improvement for model architectures, such as the choice of activation function, which can drastically affect model generalization to "worst-case" scenarios, which is not observed when reporting task-specific errors. We also incorporate this metric into the optimization process through regularization, which improves the models' generalizability over the multitask PDE problem.
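For reference, the standard definition of the Kolmogorov n-width of a set $A$ in a normed space $X$ is given below. How the paper adapts it to multitask PIML architectures is not stated in the abstract, so this is only the classical form.

```latex
\[
d_n(A; X) \;=\; \inf_{\substack{V_n \subset X \\ \dim V_n = n}} \;
\sup_{f \in A} \; \inf_{g \in V_n} \, \lVert f - g \rVert_X .
\]
```

In words: the worst-case error over $A$ of the best approximation from an $n$-dimensional subspace, minimized over all such subspaces. This is why the metric speaks to "worst-case" behavior rather than task-specific errors.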
D. Alex Quistberg, S.J. Mooney, T. Tasdizen, P. Arbelaez, Q.C. Nguyen.
Deep Learning-Methods to Amplify Epidemiological Data Collection and Analyses, In American Journal of Epidemiology, Oxford University Press, 2024.
Deep learning is a subfield of artificial intelligence and machine learning based mostly on neural networks and often combined with attention algorithms that has been used to detect and identify objects in text, audio, images, and video. Serghiou and Rough (Am J Epidemiol. 0000;000(00):0000-0000) present a primer for epidemiologists on deep learning models. These models provide substantial opportunities for epidemiologists to expand and amplify their research in both data collection and analyses by increasing the geographic reach of studies, including more research subjects, and working with large or high dimensional data. The tools for implementing deep learning methods are not quite yet as straightforward or ubiquitous for epidemiologists as traditional regression methods found in standard statistical software, but there are exciting opportunities for interdisciplinary collaboration with deep learning experts, just as epidemiologists have with statisticians, healthcare providers, urban planners, and other professionals. Despite the novelty of these methods, epidemiological principles of assessing bias, study design, interpretation and others still apply when implementing deep learning methods or assessing the findings of studies that have used them.
S. Saklani, C. Goel, S. Bansal, Z. Wang, S. Dutta, T. Athawale, D. Pugmire, C.R. Johnson.
Uncertainty-Informed Volume Visualization using Implicit Neural Representation, In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 62--72. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00013
The increasing adoption of Deep Neural Networks (DNNs) has led to their application in many challenging scientific visualization tasks. While advanced DNNs offer impressive generalization capabilities, understanding factors such as model prediction quality, robustness, and uncertainty is crucial. These insights can enable domain scientists to make informed decisions about their data. However, DNNs inherently lack the ability to estimate prediction uncertainty, necessitating new research to construct robust uncertainty-aware visualization techniques tailored for various visualization tasks. In this work, we propose uncertainty-aware implicit neural representations to model scalar field data sets effectively and comprehensively study the efficacy and benefits of estimated uncertainty information for volume visualization tasks. We evaluate the effectiveness of two principled deep uncertainty estimation techniques: (1) Deep Ensemble and (2) Monte Carlo Dropout (MC-Dropout). These techniques enable uncertainty-informed volume visualization in scalar field data sets. Our extensive exploration across multiple data sets demonstrates that uncertainty-aware models produce informative volume visualization results. Moreover, integrating prediction uncertainty enhances the trustworthiness of our DNN model, making it suitable for robustly analyzing and visualizing real-world scientific volumetric data sets.
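Both techniques named above reduce, at prediction time, to aggregating several predictions per sample point: the mean serves as the predicted value and the spread as an uncertainty estimate. The sketch below shows only that aggregation step on hypothetical numbers; it does not reproduce the paper's networks, and the function name `ensemble_summary` is an assumption.

```python
from statistics import mean, pstdev


def ensemble_summary(member_predictions):
    """member_predictions[m][i] is member m's predicted scalar value at sample point i."""
    per_point = list(zip(*member_predictions))
    means = [mean(p) for p in per_point]
    stds = [pstdev(p) for p in per_point]
    return means, stds


# Hypothetical predictions from three ensemble members at three sample points.
preds = [
    [0.10, 0.50, 0.90],
    [0.10, 0.55, 0.70],
    [0.10, 0.45, 1.10],
]
means, stds = ensemble_summary(preds)
```

For a Deep Ensemble the members are independently trained networks; for MC-Dropout they are repeated stochastic forward passes of one network with dropout left active. The per-point standard deviation can then be mapped to color or opacity in a volume rendering.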
S.A. Sakin, K.E. Isaacs.
A Literature-based Visualization Task Taxonomy for Gantt Charts, Subtitled arXiv:2408.04050, 2024.
Gantt charts are a widely used idiom for visualizing temporal discrete event sequence data where dependencies exist between events. They are popular in domains such as manufacturing and computing for their intuitive layout of such data. However, these domains frequently generate data at scales which tax both the visual representation and the ability to render it at interactive speeds. To aid visualization developers who use Gantt charts in these situations, we develop a task taxonomy of low-level visualization tasks supported by Gantt charts and connect them to the data queries needed to support them. Our taxonomy is derived through a literature survey of visualizations using Gantt charts over the past 30 years.
C. Scully-Allison, I. Lumsden, K. Williams, J. Bartels, M. Taufer, S. Brink, A. Bhatele, O. Pearce, K. Isaacs.
Design Concerns for Integrated Scripting and Interactive Visualization in Notebook Environments, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
DOI: 10.1109/TVCG.2024.3354561
Interactive visualization can support fluid exploration but is often limited to predetermined tasks. Scripting can support a vast range of queries but may be more cumbersome for free-form exploration. Embedding interactive visualization in scripting environments, such as computational notebooks, provides an opportunity to leverage the strengths of both direct manipulation and scripting. We investigate interactive visualization design methodology, choices, and strategies under this paradigm through a design study of calling context trees used in performance analysis, a field which exemplifies typical exploratory data analysis workflows with Big Data and hard-to-define problems. We first produce a formal task analysis assigning tasks to graphical or scripting contexts based on their specificity, frequency, and suitability. We then design a notebook-embedded interactive visualization and validate it with intended users. In a follow-up study, we present participants with multiple graphical and scripting interaction modes to elicit feedback about notebook-embedded visualization design, finding consensus in support of the interaction model. We report and reflect on observations regarding the process and design implications for combining visualization and scripting in notebooks.