SCI Publications

2021

M. Rasouli, R. M. Kirby, H. Sundar. “A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication,” In The International Conference on High Performance Computing in Asia-Pacific Region, pp. 110-119. 2021.

ABSTRACT

Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.

A. Rathore, N. Chalapathi, S. Palande, Bei Wang. “TopoAct: Visually Exploring the Shape of Activations in Deep Learning,” In Computer Graphics Forum, Vol. 40, No. 1, pp. 382-397. 2021.

ABSTRACT

Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis, and to provide a basis for network architecture diagnosis and data anomaly detection.

A. Rauff, L.H. Timmins, R.T. Whitaker, J.A. Weiss. “A Nonparametric Approach for Estimating Three-Dimensional Fiber Orientation Distribution Functions (ODFs) in Fibrous Materials,” In IEEE Transactions on Medical Imaging, 2021.
DOI: 10.1109/TMI.2021.3115716

ABSTRACT

Many biological tissues contain an underlying fibrous microstructure that is optimized to suit a physiological function. The fiber architecture dictates physical characteristics such as stiffness, diffusivity, and electrical conduction. Abnormal deviations of fiber architecture are often associated with disease. Thus, it is useful to characterize fiber network organization from image data in order to better understand pathological mechanisms. We devised a method to quantify distributions of fiber orientations based on the Fourier transform and the Qball algorithm from diffusion MRI. The Fourier transform was used to decompose images into directional components, while the Qball algorithm efficiently converted the directional data from the frequency domain to the orientation domain. The representation in the orientation domain does not require any particular functional representation, and thus the method is nonparametric. The algorithm was verified to demonstrate its reliability and used on datasets from microscopy to show its applicability. This method increases the ability to extract information of microstructural fiber organization from experimental data that will enhance our understanding of structure-function relationships and enable accurate representation of material anisotropy in biological tissues.

M. Razi, M. Kirby, A. Narayan. “Kernel optimization for Low-Rank Multi-Fidelity Algorithms,” In International Journal for Uncertainty Quantification, Begel House Inc., pp. 31-54. 2021.

ABSTRACT

One of the major challenges for low-rank multi-fidelity (MF) approaches is the assumption that low-fidelity (LF) and high-fidelity (HF) models admit “similar” low-rank kernel representations. Low-rank MF methods have traditionally attempted to exploit low-rank representations of linear kernels. However, such linear kernels may not be able to capture low-rank behavior, and they may admit LF and HF kernels that are not similar. Such a situation renders a naive approach to low-rank MF procedures ineffective. In this paper, we propose a novel approach for the selection of a near-optimal kernel function for use in low-rank MF methods. The proposed framework is a two-step strategy wherein: (1) hyperparameters of a library of kernel functions are optimized, and (2) a particular combination of the optimized kernels is selected, through either a convex mixture (Additive Kernel Approach) or through a data-driven …

P. Rosen, A. Seth, E. Mills, A. Ginsburg, J. Kamenetzky, J. Kern, C.R. Johnson, B. Wang. “Using Contour Trees in the Analysis and Visualization of Radio Astronomy Data Cubes,” In Topological Methods in Data Analysis and Visualization VI, Springer-Verlag, pp. 87-108. 2021.

ABSTRACT

The current generation of radio and millimeter telescopes, particularly the Atacama Large Millimeter Array (ALMA), offers enormous advances in observing capabilities. While these advances represent an unprecedented opportunity to facilitate scientific understanding, the increased complexity in the spatial and spectral structure of these ALMA data cubes leads to challenges in their interpretation. In this paper, we perform a feasibility study for applying topological data analysis and visualization techniques never before tested by the ALMA community. Using techniques based on contour trees, we seek to improve upon existing analysis and visualization workflows of ALMA data cubes, in terms of accuracy and speed in feature extraction. We review our development process in building effective analysis and visualization capabilities for the astrophysicists. We also summarize effective design practices by identifying domain-specific needs of simplicity, integrability, and reproducibility, in order to best target and service the large astrophysics community.

R. Roy, J. Raiman, N. Kant, I. Elkin, R. Kirby, M. Siu, S. Oberman, S. Godil, B. Catanzaro. “PrefixRL: Optimization of Parallel Prefix Circuits using Deep Reinforcement Learning,” In 2021 58th ACM/IEEE Design Automation Conference (DAC), IEEE, pp. 853-858. 2021.
DOI: 10.1109/DAC18074.2021.9586094

ABSTRACT

In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.

Damodar Sahasrabudhe. “Enhancing Asynchronous Many-Task Runtime Systems for Next-Generation Architectures and Exascale Supercomputers,” School of Computing, University of Utah, Salt Lake City, UT, USA, 2021.

ABSTRACT

Exascale supercomputers capable of computing 10¹⁸ double-precision floating point operations per second are expected to be operational around 2022/23. The complexity and diversity of the proposed exascale machines pose new challenges for the software applications, namely, 1) implementing efficient data management; 2) having programming systems to exploit locality and multimillion parallelism; 3) developing efficient algorithms to leverage new architectures; 4) ensuring resiliency; and 5) improving scientific productivity on diverse architectures. Due to data-driven scheduling and asynchronous execution, Asynchronous Many-Task (AMT) runtime systems show promise to handle these exascale challenges.

One such AMT, the Uintah Computational Framework, maintains two distinct layers for the application and underlying runtime infrastructure. This distinction allows Uintah users to concentrate on application and the Uintah infrastructure handles communication, data coherency, multithreading, and architecture-specific complexities.

This dissertation addresses some of the exascale challenges and also integrates the individual solutions under the single umbrella of Uintah. The resiliency approach handles node failure faster than the traditional checkpointing method and helps to address challenge (4). A potential solution for challenges (2) and (3) can be the new asynchronous scheduler designed for the Sunway Taihulight supercomputer that shows the benefits of asynchronous execution. The novel portable Single Instruction Multiple Data (SIMD) primitive provides a prospective approach to handle (2) and (5), which achieves near-ideal vectorization on Central Processing Units (CPUs) along with Graphics Processing Unit (GPU) portability provided by the CUDA back end. The newly developed threading model using MPI endpoints shows performance improvements over the MPI-everywhere version, which can be one of the solutions to tackle challenges (2) and (3). Finally, this work enhances the heterogeneous scheduler, contributes to the ongoing portability drive, and successfully runs a simulation using portable AMT tasks on thousands of CPUs and GPUs. These enhancements are important to answer challenges (2), (3), and (5). As a result, this research takes Uintah closer to exascale readiness. Using Uintah as an example, this work demonstrates how AMTs, third-party libraries, and applications can be enhanced to benefit from the next-generation architectures.

J. Salinet, R. Molero, F. S. Schlindwein, J. Karel, M. Rodrigo, J. L. Rojo-Álvarez, O. Berenfeld, A. M. Climent, B. Zenger, F. Vanheusden, J. G. S. Paredes, R. MacLeod, F. Atienza, M. S. Guillem, M. Cluitmans, P. Bonizzi. “Electrocardiographic imaging for atrial fibrillation: a perspective from computer models and animal experiments to clinical value,” In Frontiers in Physiology, Vol. 12, Frontiers Media, April, 2021.
DOI: 10.3389/fphys.2021.653013

ABSTRACT

… treatment guidance (for example, localization of AF triggers and sustaining mechanisms), and we discuss the technological requirements and validation. We address experimental and clinical results, limitations, and future challenges for fruitful application of ECGI for AF understanding and management. We pay attention to existing techniques and clinical application, to computer models and (animal or human) experiments, to challenges of methodological and clinical validation. The overall objective of the study is to provide a consensus on valuable directions that ECGI research may take to provide future improvements in AF characterization and treatment guidance.

J. Sandhu, T. Bidone, R. D. Rabbitt. “Prestin Generates Instantaneous Force in Outer Hair Cell Membranes,” In Biophysical Journal, Vol. 120, No. 3, 2021.

ABSTRACT

Hearing occurs from sound reaching the inner ear cochlea, where electromotile Outer Hair Cells (OHCs) amplify vibrations by elongating and contracting rapidly in response to auditory frequency changes in membrane potential. OHCs can generate force cycle-by-cycle at frequencies exceeding 50kHz, but precisely how this is achieved is unclear. Electromotility requires expression of the transmembrane protein, prestin, which facilitates the electromechanical conversion through action of the Coulomb force acting on the anion Cl- bound at the core of the protein. However, recent experimental data suggests the charge displacement is too slow to support sound amplification at auditory frequencies. As a consequence, prestin electromechanics remain unclear at the molecular level. We hypothesize that prestin instantaneously transmits stress to the membrane, which subsequently drives charge displacement, membrane deformation, and OHC shape changes. To test the hypothesis, we examined the conformational dynamics of prestin and its effects on the motion of lipids under: (1) isometric conditions and (2) constant force conditions in order to mimic different regimes of membrane loading. All-atom molecular dynamics simulations of the prestin dimer embedded in POPC membranes were run and the trajectories analyzed. We discovered that under isometric conditions, the presence of a chloride ion in the electric field increased residue fluctuations. This trend was not observed under constant force conditions, supporting the idea that isometric conditions cause instantaneous force to be generated in the membrane. The analysis allowed us to identify the molecular mechanisms by which prestin allows electromechanical amplification by OHCs in the cochlea.

S. Sane, T. Athawale, C.R. Johnson. “Visualization of Uncertain Multivariate Data via Feature Confidence Level-Sets,” In EuroVis 2021, 2021.

ABSTRACT

Recent advancements in multivariate data visualization have opened new research opportunities for the visualization community. In this paper, we propose an uncertain multivariate data visualization technique called feature confidence level-sets. Conceptually, feature level-sets refer to level-sets of multivariate data. Our proposed technique extends the existing idea of univariate confidence isosurfaces to multivariate feature level-sets. Feature confidence level-sets are computed by considering the trait for a specific feature, a confidence interval, and the distribution of data at each grid point in the domain. Using uncertain multivariate data sets, we demonstrate the utility of the technique to visualize regions with uncertainty in relation to the specific trait or feature, and the ability of the technique to provide secondary feature structure visualization based on uncertainty.

S. Sane, A. Yenpure, R. Bujack, M. Larsen, K. Moreland, C. Garth, C. R. Johnson, H. Childs. “Scalable In Situ Computation of Lagrangian Representations via Local Flow Maps,” In Eurographics Symposium on Parallel Graphics and Visualization, The Eurographics Association, 2021.
DOI: 10.2312/pgv.20211040

ABSTRACT

In situ computation of Lagrangian flow maps to enable post hoc time-varying vector field analysis has recently become an active area of research. However, the current literature is largely limited to theoretical settings and lacks a solution to address scalability of the technique in distributed memory. To improve scalability, we propose and evaluate the benefits and limitations of a simple, yet novel, performance optimization. Our proposed optimization is a communication-free model resulting in local Lagrangian flow maps, requiring no message passing or synchronization between processes, intrinsically improving scalability, and thereby reducing overall execution time and alleviating the encumbrance placed on simulation codes from communication overheads. To evaluate our approach, we computed Lagrangian flow maps for four time-varying simulation vector fields and investigated how execution time and reconstruction accuracy are impacted by the number of GPUs per compute node, the total number of compute nodes, particles per rank, and storage intervals. Our study consisted of experiments computing Lagrangian flow maps with up to 67M particle trajectories over 500 cycles and used as many as 2048 GPUs across 512 compute nodes. In all, our study contributes an evaluation of a communication-free model as well as a scalability study of computing distributed Lagrangian flow maps at scale using in situ infrastructure on a modern supercomputer.

S. Sane, C. R. Johnson, H. Childs. “Investigating In Situ Reduction via Lagrangian Representations for Cosmology and Seismology Applications,” In Computational Science – ICCS 2021, Springer International Publishing, pp. 436-450. 2021.
DOI: 10.1007/978-3-030-77961-0_36

ABSTRACT

Although many types of computational simulations produce time-varying vector fields, subsequent analysis is often limited to single time slices due to excessive costs. Fortunately, a new approach using a Lagrangian representation can enable time-varying vector field analysis while mitigating these costs. With this approach, a Lagrangian representation is calculated while the simulation code is running, and the result is explored after the simulation. Importantly, the effectiveness of this approach varies based on the nature of the vector field, requiring in-depth investigation for each application area. With this study, we evaluate the effectiveness for previously unexplored cosmology and seismology applications. We do this by considering encumbrance (on the simulation) and accuracy (of the reconstructed result). To inform encumbrance, we integrated in situ infrastructure with two simulation codes, and evaluated on representative HPC environments, performing Lagrangian in situ reduction using GPUs as well as CPUs. To inform accuracy, our study conducted a statistical analysis across a range of spatiotemporal configurations as well as a qualitative evaluation. In all, we demonstrate effectiveness for both cosmology and seismology—time-varying vector fields from these domains can be reduced to less than 1% of the total data via Lagrangian representations, while maintaining accurate reconstruction and requiring under 10% of total execution time in over 80% of our experiments.

A. Singh, M. Bauer, S. Joshi. “Physics Informed Convex Artificial Neural Networks (PICANNs) for Optimal Transport based Density Estimation,” Subtitled “arXiv,” 2021.

ABSTRACT

Optimal Mass Transport (OMT) is a well studied problem with a variety of applications in a diverse set of fields ranging from Physics to Computer Vision and in particular Statistics and Data Science. Since the original formulation of Monge in 1781, significant theoretical progress has been made on the existence, uniqueness and properties of the optimal transport maps. The actual numerical computation of the transport maps, particularly in high dimensions, remains a challenging problem. By Brenier's theorem, the continuous OMT problem can be reduced to that of solving a non-linear PDE of Monge-Ampere type whose solution is a convex function. In this paper, building on recent developments of input convex neural networks and physics informed neural networks for solving PDEs, we propose a Deep Learning approach to solve the continuous OMT problem.

To demonstrate the versatility of our framework we focus on the ubiquitous density estimation and generative modeling tasks in statistics and machine learning. Finally as an example we show how our framework can be incorporated with an autoencoder to estimate an effective probabilistic generative model.

W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. “Material point method: Overview and challenges ahead (with videos),” In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.
ISBN: 978-0-323-88519-5

ABSTRACT

The paper gives an overview of Material Point Method and shows its evolution over the last 25 years. The Material Point Method developments followed a logical order. The article aims at identifying this order and showing not only the current state of the art, but also explaining the drivers behind the developments and identifying what is currently still missing. The paper explores modern implementations of both explicit and implicit Material Point Method. It concentrates mainly on uses of the method in engineering, but also gives a short overview of Material Point Method application in computer graphics and animation. Furthermore, the article gives an overview of errors in the material point method algorithms and identifies gaps in knowledge, filling which would hopefully lead to a much more efficient and accurate Material Point Method. The paper also briefly discusses algorithms related to contact and boundaries, coupling the Material Point Method with other numerical methods and modeling of fractures. It also gives an overview of modeling of multi-phase continua with Material Point Method. The paper closes with numerical examples, aiming at showing the capabilities of Material Point Method in advanced simulations. Those include landslide modeling, multiphysics simulation of shaped charge explosion and simulations of granular material flow out of a silo undergoing changes from continuous to discontinuous and back to continuous behavior. The paper uniquely illustrates many of the developments not only with figures but also with videos, giving the whole extent of simulation instead of just a timestamped image.

W. T. Sołowski, M. Berzins, W. Coombs, J. Guilkey, M. Möller, Q. A. Tran, T. Adibaskoro, S. Seyedan, R. Tielen, K. Soga. “Material point method: Overview and challenges ahead (without videos),” In Advances in Applied Mechanics, 1, Vol. 14, Ch. 2, Elsevier, pp. 113-204. 2021.

P. Subedi, P.E. Davis, M. Parashar. “RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows,” In 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 146-156. 2021.

ABSTRACT

While in-situ workflow formulations have addressed some of the data-related challenges associated with extreme-scale scientific workflows, these workflows involve complex interactions and different modes of data exchange. In the context of increasing system complexity, such workflows present significant resource management challenges, requiring complex cost-performance tradeoffs. This paper presents RISE, an intelligent staging-based data management middleware, which builds on the DataSpaces framework and performs intelligent scheduling of data management operations to reduce I/O contention. In RISE, data are always written immediately to local buffers to reduce the effect of the transfer impact upon application performance. RISE identifies applications’ data access patterns and moves data towards data consumers only when the network is expected to be idle, reducing the impact of asynchronous …

E. Suchyta, S. Klasky, N. Podhorszki, M. Wolf, A. Adesoji, C.S. Chang, J. Choi, P. E. Davis, J. Dominski, S. Ethier, I. Foster, K. Germaschewski, B. Geveci, C. Harris, K. A. Huck, Q. Liu, J. Logan, K. Mehta, G. Merlo, S. V. Moore, T. Munson, M. Parashar, D. Pugmire, M. S. Shephard, C. W. Smith, P. Subedi, L. Wan, R. Wang, S. Zhang. “The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science,” In The International Journal of High Performance Computing Applications, SAGE Publications, pp. 10943420211019119. 2021.

ABSTRACT

We present the Exascale Framework for High Fidelity coupled Simulations (EFFIS), a workflow and code coupling framework developed as part of the Whole Device Modeling Application (WDMApp) in the Exascale Computing Project. EFFIS consists of a library, command line utilities, and a collection of run-time daemons. Together, these software products enable users to easily compose and execute workflows that include: strong or weak coupling, in situ (or offline) analysis/visualization/monitoring, command-and-control actions, remote dashboard integration, and more. We describe WDMApp physics coupling cases and computer science requirements that motivate the design of the EFFIS framework. Furthermore, we explain the essential enabling technology that EFFIS leverages: ADIOS for performant data movement, PerfStubs/TAU for performance monitoring, and an advanced COUPLER for transforming coupling data from its native format to the representation needed by another application. Finally, we demonstrate EFFIS using coupled multi-simulation WDMApp workflows and exemplify how the framework supports the project’s needs. We show that EFFIS and its associated services for data movement, visualization, and performance collection do not introduce appreciable overhead to the WDMApp workflow and that the resource-dominant application’s idle time while waiting for data is minimal.

T. Sun, D. Li, B. Wang. “Decentralized Federated Averaging,” Subtitled “arXiv preprint arXiv:2104.11375,” 2021.

ABSTRACT

Federated averaging (FedAvg) is a communication efficient algorithm for the distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients. FedAvg is mostly studied in centralized fashions, which requires massive communication between server and clients in each communication. Moreover, attacking the central server can break the whole system's privacy. In this paper, we study the decentralized FedAvg with momentum (DFedAvgM), which is implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. We prove convergence of the (quantized) DFedAvgM under trivial assumptions; the convergence rate can be improved when the loss function satisfies the PŁ property. Finally, we numerically verify the efficacy of DFedAvgM.

T. Sun, D. Li, B. Wang. “Stability and Generalization of the Decentralized Stochastic Gradient Descent,” Subtitled “arXiv preprint arXiv:2102.01302,” 2021.

ABSTRACT

The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has received a considerable amount of studies. Nevertheless, the community paid little attention to its decentralized variants. In this paper, we provide a novel formulation of the decentralized stochastic gradient descent. Leveraging this formulation together with (non) convex optimization theory, we establish the first stability and generalization guarantees for the decentralized stochastic gradient descent. Our theoretical results are built on top of a few common and mild assumptions and reveal that the decentralization deteriorates the stability of SGD for the first time. We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models.

W. Tao, R. Bhalodia, R. Whitaker. “A Gaussian Process Model for Unsupervised Analysis of High Dimensional Shape Data,” In Machine Learning in Medical Imaging, Springer International Publishing, pp. 356-365. 2021.
DOI: 10.1007/978-3-030-87589-3_37

ABSTRACT

Applications of medical image analysis are often faced with the challenge of modelling high-dimensional data with relatively few samples. In many settings, normal or healthy samples are prevalent while pathological samples are rarer, highly diverse, and/or difficult to model. In such cases, a robust model of the normal population in the high-dimensional space can be useful for characterizing pathologies. In this context, there is utility in hybrid models, such as probabilistic PCA, which learns a low-dimensional model, commensurate with the available data, and combines it with a generic, isotropic noise model for the remaining dimensions. However, the isotropic noise model ignores the inherent correlations that are evident in so many high-dimensional data sets associated with images and shapes in medicine. This paper describes a method for estimating a Gaussian model for collections of images or shapes that exhibit underlying correlations, e.g., in the form of smoothness. The proposed method incorporates a Gaussian-process noise model within a generative formulation. For optimization, we derive a novel expectation maximization (EM) algorithm. We demonstrate the efficacy of the method on synthetic examples and on anatomical shape data.
