A. Nouri, P.E. Davis, P. Subedi, M. Parashar. Scalable Graph Embedding Learning On A Single GPU, Subtitled arXiv preprint arXiv:2110.06991, 2021.
Graph embedding techniques have attracted growing interest because they convert graph data into a continuous, low-dimensional space. Effective graph analytics provide users a deeper understanding of what is behind the data and thus can benefit a variety of machine learning tasks. With the current scale of real-world applications, most graph analytic methods suffer from high computation and space costs. These methods and systems can process a network with thousands to a few million nodes. However, scaling to large-scale networks remains a challenge. The complexity of training a graph embedding system requires the use of existing accelerators such as GPUs. In this paper, we introduce a hybrid CPU-GPU framework that addresses the challenges of learning embeddings of large-scale graphs. The performance of our method is compared qualitatively and quantitatively with the existing embedding systems on common benchmarks. We also show that our system can scale training to datasets an order of magnitude larger than a single machine's total memory capacity. The effectiveness of the learned embedding is evaluated within multiple downstream applications. The experimental results indicate the effectiveness of the learned embedding in terms of performance and accuracy.
A. Nouri, P.E. Davis, P. Subedi, M. Parashar. Exploring the Role of Machine Learning in Scientific Workflows: Opportunities and Challenges, Subtitled arXiv preprint arXiv:2110.13999, 2021.
In this survey, we discuss the challenges of executing scientific workflows as well as existing Machine Learning (ML) techniques to alleviate those challenges. We provide the context and motivation for applying ML to each step of the execution of these workflows. Furthermore, we provide recommendations on how to extend ML techniques to unresolved challenges in the execution of scientific workflows. Moreover, we discuss the possibility of using ML techniques for in-situ operations. We explore the challenges of in-situ workflows and provide suggestions for improving the performance of their execution using ML techniques.
M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby. Multifidelity Modeling for Physics-Informed Neural Networks (PINNs), Subtitled arXiv preprint arXiv:2106.13361, 2021.
Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.
M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby. Physics-Informed Neural Networks (PINNs) for Parameterized PDEs: A Metalearning Approach, Subtitled arXiv preprint arXiv:2110.13361, 2021.
Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) – and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs for parameterized PDEs. Following the ML world, we introduce metalearning of PINNs for parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs. We provide theoretically motivated and empirically backed assumptions that make our metalearning approach possible. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.
R. Pulch, A. Narayan, T. Stykel. Sensitivity analysis of random linear differential–algebraic equations using system norms, In Journal of Computational and Applied Mathematics, North-Holland, pp. 113666. 2021.
We consider linear dynamical systems composed of differential–algebraic equations (DAEs), where a quantity of interest (QoI) is assigned as output. Physical parameters of a system are modelled as random variables to quantify uncertainty, and we investigate a variance-based sensitivity analysis of the random QoI. Based on expansions via generalised polynomial chaos, the stochastic Galerkin method yields a new deterministic system of DAEs of high dimension. We define sensitivity measures by system norms, i.e., the H∞-norm of the transfer function associated with the Galerkin system for different combinations of outputs. To ameliorate the enormous computational effort required to compute norms of high-dimensional systems, we apply balanced truncation, a particular method of model order reduction (MOR), to obtain a low-dimensional linear dynamical system that produces approximations of system norms …
E. Qian, J.M. Tabeart, C. Beattie, S. Gugercin, J. Jiang, P. Kramer, A. Narayan. Model Reduction of Linear Dynamical Systems via Balancing for Bayesian Inference, Subtitled arXiv preprint arXiv:2111.13246, 2021.
We consider the Bayesian approach to the linear Gaussian inference problem of inferring the initial condition of a linear dynamical system from noisy output measurements taken after the initial time. In practical applications, the large dimension of the dynamical system state poses a computational obstacle to computing the exact posterior distribution. Model reduction offers a variety of computational tools that seek to reduce this computational burden. In particular, balanced truncation is a system-theoretic approach to model reduction which obtains an efficient reduced-dimension dynamical system by projecting the system operators onto state directions which trade off the reachability and observability of state directions as expressed through the associated Gramians. We introduce Gramian definitions relevant to the inference setting and propose a balanced truncation approach based on these inference Gramians that yield a reduced dynamical system that can be used to cheaply approximate the posterior mean and covariance. Our definitions exploit natural connections between (i) the reachability Gramian and the prior covariance and (ii) the observability Gramian and the Fisher information. The resulting reduced model then inherits stability properties and error bounds from system theoretic considerations, and in some settings yields an optimal posterior covariance approximation. Numerical demonstrations on two benchmark problems in model reduction show that our method can yield near-optimal posterior covariance approximations with order-of-magnitude state dimension reduction.
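For orientation, the classical objects that balanced truncation works with can be written down as follows. This is a sketch of the standard system-theoretic definitions for a stable linear system, with the symbols A, B, C, P, Q chosen here for illustration; the inference-specific Gramians introduced in the paper are not reproduced.

```latex
% Standard (not inference-specific) definitions, assuming a stable system
%   \dot{x}(t) = A x(t) + B u(t), \qquad y(t) = C x(t).
% Reachability Gramian P and observability Gramian Q solve the Lyapunov equations
\begin{align}
  A P + P A^{\top} + B B^{\top} &= 0, \\
  A^{\top} Q + Q A + C^{\top} C &= 0 .
\end{align}
% The Hankel singular values rank state directions by joint
% reachability and observability:
\begin{equation}
  \sigma_i = \sqrt{\lambda_i(P Q)}, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge 0 .
\end{equation}
% Balanced truncation keeps the directions associated with the largest \sigma_i.
```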
Y. Qin, A. Narayan, K. Cheng, P. Wang. An efficient method of calculating composition-dependent inter-diffusion coefficients based on compressed sensing method, In Computational Materials Science, Vol. 188, Elsevier, pp. 110145. 2021.
Composition-dependent inter-diffusion coefficients are key parameters in many physical processes. Due to the under-determinedness of the governing diffusion equations, numerical methods either impose strict physical conditions on the samples or require a computationally onerous amount of data. To address such problems, we propose a novel inverse framework to recover the diffusion coefficients using a compressed sensing method, which in principle can be extended to alloy systems with an arbitrary number of species. Compared to conventional methods, the new approach does not impose any a priori assumptions on the functional relationship between diffusion coefficients and concentrations, nor any preference on the locations of the samples, as long as they are in the diffused zone. It also requires much less data compared to least-squares approaches. Through a few numerical examples of ternary and quaternary systems, we demonstrate the accuracy and robustness of the new method.
With the growing number and increasing availability of shared-use instruments and observatories, observational data is becoming an essential part of application workflows and contributor to scientific discoveries in a range of disciplines. However, the corresponding growth in the number of users accessing these facilities, coupled with the expansion in the scale and variety of the data, is making it challenging for these facilities to ensure their data can be accessed, integrated, and analyzed in a timely manner, and is resulting in significant demands on their cyberinfrastructure (CI). In this paper, we present the design of a push-based data delivery framework that leverages emerging in-network capabilities, along with data pre-fetching techniques based on a hybrid data management model. Specifically, we analyze data access traces for two large-scale observatories, Ocean Observatories Initiative (OOI) and Geodetic Facility for the Advancement of Geoscience (GAGE), to identify typical user access patterns and to develop a model that can be used for data pre-fetching. Furthermore, we evaluate our data pre-fetching model and the proposed framework using a simulation of the Virtual Data Collaboratory (VDC) platform that provides in-network data staging and processing capabilities. The results demonstrate the ability of the framework to significantly improve data delivery performance and reduce network traffic at the observatories’ facilities.
Large-scale multiuser scientific facilities, such as geographically distributed observatories, remote instruments, and experimental platforms, represent some of the largest national investments and can enable dramatic advances across many areas of science. Recent examples of such advances include the detection of gravitational waves and the imaging of a black hole’s event horizon. However, as the number of such facilities and their users grows, along with the complexity, diversity, and volumes of their data products, finding and accessing relevant data is becoming increasingly challenging, limiting the potential impact of facilities. These challenges are further amplified as scientists and application workflows increasingly try to integrate facilities’ data from diverse domains. In this paper, we leverage concepts underlying recommender systems, which are extremely effective in e-commerce, to address these data-discovery and data-access challenges for large-scale distributed scientific facilities. We first analyze data from facilities and identify and model user-query patterns in terms of facility location and spatial localities, domain-specific data models, and user associations. We then use this analysis to generate a knowledge graph and develop the collaborative knowledge-aware graph attention network (CKAT) recommendation model, which leverages graph neural networks (GNNs) to explicitly encode the collaborative signals through propagation and combine them with knowledge associations. Moreover, we integrate a knowledge-aware neural attention mechanism to enable the CKAT to pay more attention to key information while reducing irrelevant noise, thereby increasing the accuracy of the recommendations. We apply the proposed model on two real-world facility datasets and empirically demonstrate that the CKAT can effectively facilitate data discovery, significantly outperforming several compelling state-of-the-art baseline models.
Y. Qin, I. Rodero, M. Parashar. Toward Democratizing Access to Facilities Data: A Framework for Intelligent Data Discovery and Delivery, Subtitled arXiv:2112.06479, 2021.
Data collected by large-scale instruments, observatories, and sensor networks are key enablers of scientific discoveries in many disciplines. However, ensuring that these data can be accessed, integrated, and analyzed in a democratized and timely manner remains a challenge. In this article, we explore how state-of-the-art techniques for data discovery and access can be adapted to facility data and develop a conceptual framework for intelligent data access and discovery.
A.S. Rababah, L.R. Bear, Y.S. Dogrusoz, W. Good, J. Bergquist, J. Stoks, R. MacLeod, K. Rjoob, M. Jennings, J. Mclaughlin, D. D. Finlay. Reducing Line-of-block Artifacts in Cardiac Activation Maps Estimated Using ECG Imaging: A Comparison of Source Models and Estimation Methods, In Computers in Biology and Medicine, Vol. 136, pp. 104666. 2021.
Electrocardiographic imaging is an imaging modality that has been introduced recently to help in visualizing the electrical activity of the heart and consequently guide the ablation therapy for ventricular arrhythmias. One of the main challenges of this modality is that the electrocardiographic signals recorded at the torso surface are contaminated with noise from different sources. Low amplitude leads are more affected by noise due to their low peak-to-peak amplitude. In this paper, we have studied 6 datasets from two torso tank experiments (Bordeaux and Utah experiments) to investigate the impact of removing or interpolating these low amplitude leads on the inverse reconstruction of cardiac electrical activity. Body surface potential maps used were calculated by using the full set of recorded leads, removing 1, 6, 11, 16, or 21 low amplitude leads, or interpolating 1, 6, 11, 16, or 21 low amplitude leads using one of the three interpolation methods – Laplacian interpolation, hybrid interpolation, or the inverse-forward interpolation. The epicardial potential maps and activation time maps were computed from these body surface potential maps and compared with those recorded directly from the heart surface in the torso tank experiments. There was no significant change in the potential maps and activation time maps after the removal of up to 11 low amplitude leads. Laplacian interpolation and hybrid interpolation improved the inverse reconstruction in some datasets and worsened it in the rest. The inverse forward interpolation of low amplitude leads improved it in two out of 6 datasets and at least remained the same in the other datasets. It was noticed that after doing the inverse-forward interpolation, the selected lambda value was closer to the optimum lambda value that gives the inverse solution best correlated with the recorded one.
Detection and segmentation in microscopy images, In Computer Vision for Microscopy Image Analysis, Academic Press, pp. 43-71. 2021.
The plethora of heterogeneous data generated using modern microscopy imaging techniques eliminates the possibility of manual image analysis for biologists. Consequently, reliable and robust computerized techniques are critical to analyze microscopy data. Detection problems in microscopy images focus on accurately identifying the objects of interest in an image that can be used to investigate hypotheses about developmental or pathological processes and can be indicative of prognosis in patients. Detection is also considered to be the preliminary step for solving subsequent problems, such as segmentation and tracking for various biological applications. Segmentation of the desired structures and regions in microscopy images requires pixel-level labels to uniquely identify the individual structures and regions with contours for morphological and physiological analysis. Distributions of features extracted from the segmented regions can be used to compare normal versus disease or normal versus wild-type populations. Segmentation can be considered as a precursor for solving classification, reconstruction, and tracking problems in microscopy images. In this chapter, we discuss how the field of microscopic image analysis has progressed over the years, starting with traditional approaches and then followed by the study of learning algorithms. Because there is a lot of variability in microscopy data, it is essential to study learning algorithms that can adapt to these changes. We focus on deep learning approaches with convolutional neural networks (CNNs), as well as hierarchical methods for segmentation and detection in optical and electron microscopy images. Limitation of training data is one of the significant problems; hence, we explore solutions to learn better models with minimal user annotations.
M. Rasouli, R. M. Kirby, H. Sundar. A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication, In The International Conference on High Performance Computing in Asia-Pacific Region, pp. 110-119. 2021.
Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.
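The divide-and-conquer idea behind such a GEMM can be sketched in a few lines. The code below is a generic, single-process illustration and not the authors' algorithm: it recursively halves whichever dimension is largest and falls back to a simple kernel that skips zero entries. The function names and the `cutoff` parameter are hypothetical, and the sketch leaves out the compression, communication pattern, and sparse storage that the paper describes.

```python
def matmul_base(a, b):
    """Triple-loop product of two matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        for p in range(k):
            aip = a[i][p]
            if aip == 0:  # skip zero entries of A, a crude nod to sparsity
                continue
            for j in range(m):
                out[i][j] += aip * b[p][j]
    return out


def dc_matmul(a, b, cutoff=2):
    """Multiply a (n x k) by b (k x m) by recursively halving the largest dimension."""
    n, k, m = len(a), len(b), len(b[0])
    if max(n, k, m) <= cutoff:
        return matmul_base(a, b)
    if n >= k and n >= m:
        # Split the rows of A: the two partial results stack vertically.
        h = n // 2
        return dc_matmul(a[:h], b, cutoff) + dc_matmul(a[h:], b, cutoff)
    if m >= k:
        # Split the columns of B: the two partial results sit side by side.
        h = m // 2
        left = dc_matmul(a, [row[:h] for row in b], cutoff)
        right = dc_matmul(a, [row[h:] for row in b], cutoff)
        return [l + r for l, r in zip(left, right)]
    # Split the inner dimension: the two partial products are summed.
    h = k // 2
    first = dc_matmul([row[:h] for row in a], b[:h], cutoff)
    second = dc_matmul([row[h:] for row in a], b[h:], cutoff)
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(first, second)]
```

Splitting the largest dimension keeps the sub-blocks close to square, which is one common way to organise such recursions; whether the paper uses the same splitting rule is not stated in the abstract.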
A. Rathore, N. Chalapathi, S. Palande, Bei Wang. TopoAct: Visually Exploring the Shape of Activations in Deep Learning, In Computer Graphics Forum, Vol. 40, No. 1, pp. 382-397. 2021.
Deep neural networks such as GoogLeNet, ResNet, and BERT have achieved impressive performance in tasks such as image and text classification. To understand how such performance is achieved, we probe a trained deep neural network by studying neuron activations, i.e., combinations of neuron firings, at various layers of the network in response to a particular input. With a large number of inputs, we aim to obtain a global view of what neurons detect by studying their activations. In particular, we develop visualizations that show the shape of the activation space, the organizational principle behind neuron activations, and the relationships of these activations within a layer. Applying tools from topological data analysis, we present TopoAct, a visual exploration system to study topological summaries of activation vectors. We present exploration scenarios using TopoAct that provide valuable insights into learned representations of neural networks. We expect TopoAct to give a topological perspective that enriches the current toolbox of neural network analysis, and to provide a basis for network architecture diagnosis and data anomaly detection.
Many biological tissues contain an underlying fibrous microstructure that is optimized to suit a physiological function. The fiber architecture dictates physical characteristics such as stiffness, diffusivity, and electrical conduction. Abnormal deviations of fiber architecture are often associated with disease. Thus, it is useful to characterize fiber network organization from image data in order to better understand pathological mechanisms. We devised a method to quantify distributions of fiber orientations based on the Fourier transform and the Qball algorithm from diffusion MRI. The Fourier transform was used to decompose images into directional components, while the Qball algorithm efficiently converted the directional data from the frequency domain to the orientation domain. The representation in the orientation domain does not require any particular functional representation, and thus the method is nonparametric. The algorithm was verified to demonstrate its reliability and used on datasets from microscopy to show its applicability. This method increases the ability to extract information of microstructural fiber organization from experimental data that will enhance our understanding of structure-function relationships and enable accurate representation of material anisotropy in biological tissues.
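The Fourier step of such an analysis can be illustrated with a much simpler stand-in for the full method. The sketch below bins the power spectrum of a small image by angle to obtain an orientation histogram; it does not implement the Qball algorithm used in the paper. The function names and binning scheme are assumptions made for illustration, and the direct DFT is only practical for tiny images.

```python
import cmath
import math


def power_spectrum(img):
    """Power spectrum of a small 2D image via a direct (slow) DFT, mean removed."""
    rows, cols = len(img), len(img[0])
    mean = sum(map(sum, img)) / (rows * cols)
    spectrum = {}
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for y in range(rows):
                for x in range(cols):
                    phase = -2.0 * math.pi * (u * y / rows + v * x / cols)
                    acc += (img[y][x] - mean) * cmath.exp(1j * phase)
            spectrum[(u, v)] = abs(acc) ** 2
    return spectrum


def orientation_histogram(img, n_bins=6):
    """Fraction of spectral power per orientation bin.

    Bin k is centred at k * 180 / n_bins degrees, measured in image coordinates.
    """
    rows, cols = len(img), len(img[0])
    bin_width = 180.0 / n_bins
    hist = [0.0] * n_bins
    for (u, v), p in power_spectrum(img).items():
        if u == 0 and v == 0:
            continue  # the DC term carries no direction
        fy = (u if u <= rows // 2 else u - rows) / rows  # signed vertical frequency
        fx = (v if v <= cols // 2 else v - cols) / cols  # signed horizontal frequency
        freq_angle = math.degrees(math.atan2(fy, fx))
        # Structures run perpendicular to the direction in which intensity varies.
        theta = (freq_angle + 90.0) % 180.0
        hist[int(round(theta / bin_width)) % n_bins] += p
    total = sum(hist)
    return [h / total for h in hist] if total > 0 else hist
```

For an image of vertical stripes (intensity varying only along x), nearly all of the power lands in the bin centred at 90 degrees; horizontal stripes concentrate it in the bin centred at 0 degrees.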
M. Razi, M. Kirby, A. Narayan. Kernel optimization for Low-Rank Multi-Fidelity Algorithms, In International Journal for Uncertainty Quantification, Begel House Inc., pp. 31-54. 2021.
One of the major challenges for low-rank multi-fidelity (MF) approaches is the assumption that low-fidelity (LF) and high-fidelity (HF) models admit "similar" low-rank kernel representations. Low-rank MF methods have traditionally attempted to exploit low-rank representations of linear kernels. However, such linear kernels may not be able to capture low-rank behavior, and they may admit LF and HF kernels that are not similar. Such a situation renders a naive approach to low-rank MF procedures ineffective. In this paper, we propose a novel approach for the selection of a near-optimal kernel function for use in low-rank MF methods. The proposed framework is a two-step strategy wherein: (1) hyperparameters of a library of kernel functions are optimized, and (2) a particular combination of the optimized kernels is selected, through either a convex mixture (Additive Kernel Approach) or through a data-driven …
In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.
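To make the area and delay trade-off concrete, here is a minimal sketch of what a parallel prefix adder computes. It compares a serial (ripple) prefix network with a Kogge-Stone network, using operator count as a rough stand-in for area and network depth as a rough stand-in for delay. This is a textbook illustration, not the grid-based state-action representation or the synthesis flow described in the abstract, and all function names are hypothetical.

```python
def combine(hi, lo):
    """Prefix operator on (generate, propagate) pairs; hi spans the more significant bits."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return (g_hi | (p_hi & g_lo), p_hi & p_lo)


def ripple_prefix(gp):
    """Serial prefix network: n - 1 operators, depth n - 1."""
    out = [gp[0]]
    for i in range(1, len(gp)):
        out.append(combine(gp[i], out[i - 1]))
    return out, len(gp) - 1, len(gp) - 1  # (prefixes, operator count, depth)


def kogge_stone_prefix(gp):
    """Kogge-Stone network: more operators than the ripple network, logarithmic depth."""
    out, ops, depth, dist = list(gp), 0, 0, 1
    while dist < len(out):
        nxt = list(out)
        for i in range(dist, len(out)):
            nxt[i] = combine(out[i], out[i - dist])
            ops += 1
        out, depth, dist = nxt, depth + 1, dist * 2
    return out, ops, depth


def add(a, b, width, prefix_fn):
    """Add two unsigned integers modulo 2**width using the given prefix network."""
    abits = [(a >> i) & 1 for i in range(width)]
    bbits = [(b >> i) & 1 for i in range(width)]
    gp = [(x & y, x ^ y) for x, y in zip(abits, bbits)]  # per-bit generate, propagate
    prefixes, ops, depth = prefix_fn(gp)
    carries = [0] + [g for g, _ in prefixes[:-1]]  # carry into bit i, carry-in is 0
    total = 0
    for i in range(width):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, ops, depth
```

Both networks produce the same sums. For an 8-bit adder the ripple network uses 7 operators at depth 7, while Kogge-Stone uses 17 operators at depth 3, which is the kind of trade-off a design-space search has to navigate.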
Damodar Sahasrabudhe. Enhancing Asynchronous Many-Task Runtime Systems for Next-Generation Architectures and Exascale Supercomputers, School of Computing, University of Utah, Salt Lake City, UT, USA, 2021.
Exascale supercomputers capable of computing 10^18 double-precision floating point operations per second are expected to be operational around 2022/23. The complexity and diversity of the proposed exascale machines pose new challenges for the software applications, namely, 1) implementing efficient data management; 2) having programming systems to exploit locality and multimillion parallelism; 3) developing efficient algorithms to leverage new architectures; 4) ensuring resiliency; and 5) improving scientific productivity on diverse architectures. Due to data-driven scheduling and asynchronous execution, Asynchronous Many-Task (AMT) runtime systems show promise to handle these exascale challenges.
One such AMT, the Uintah Computational Framework, maintains two distinct layers for the application and underlying runtime infrastructure. This distinction allows Uintah users to concentrate on the application while the Uintah infrastructure handles communication, data coherency, multithreading, and architecture-specific complexities.
This dissertation addresses some of the exascale challenges and also integrates the individual solutions under the single umbrella of Uintah. The resiliency approach handles node failure faster than the traditional checkpointing method and helps to address challenge (4). A potential solution for challenges (2) and (3) can be the new asynchronous scheduler designed for the Sunway Taihulight supercomputer that shows the benefits of asynchronous execution. The novel portable Single Instruction Multiple Data (SIMD) primitive provides a prospective approach to handle (2) and (5), which achieves near-ideal vectorization on Central Processing Units (CPUs) along with Graphics Processing Unit (GPU) portability provided by the CUDA back end. The newly developed threading model using MPI endpoints shows performance improvements over the MPI-everywhere version, which can be one of the solutions to tackle challenges (2) and (3). Finally, this work enhances the heterogeneous scheduler, contributes to the ongoing portability drive, and successfully runs a simulation using portable AMT tasks on thousands of CPUs and GPUs. These enhancements are important to answer challenges (2), (3), and (5). As a result, this research takes Uintah closer to exascale readiness. Using Uintah as an example, this work demonstrates how AMTs, third-party libraries, and applications can be enhanced to benefit from the next-generation architectures.
Salinet et al. Electrocardiographic Imaging for Atrial Fibrillation treatment guidance (for example, localization of AF triggers and sustaining mechanisms), and we discuss the technological requirements and validation. We address experimental and clinical results, limitations, and future challenges for fruitful application of ECGI for AF understanding and management. We pay attention to existing techniques and clinical application, to computer models and (animal or human) experiments, to challenges of methodological and clinical validation. The overall objective of the study is to provide a consensus on valuable directions that ECGI research may take to provide future improvements in AF characterization and treatment guidance.
J. Sandhu, T. Bidone, R. D. Rabbitt. Prestin Generates Instantaneous Force in Outer Hair Cell Membranes, In Biophysical Journal, Vol. 120, No. 3, 2021.
Hearing occurs from sound reaching the inner ear cochlea, where electromotile Outer Hair Cells (OHCs) amplify vibrations by elongating and contracting rapidly in response to auditory frequency changes in membrane potential. OHCs can generate force cycle-by-cycle at frequencies exceeding 50 kHz, but precisely how this is achieved is unclear. Electromotility requires expression of the transmembrane protein, prestin, which facilitates the electromechanical conversion through action of the Coulomb force acting on the anion Cl- bound at the core of the protein. However, recent experimental data suggest the charge displacement is too slow to support sound amplification at auditory frequencies. As a consequence, prestin electromechanics remain unclear at the molecular level. We hypothesize that prestin instantaneously transmits stress to the membrane, which subsequently drives charge displacement, membrane deformation, and OHC shape changes. To test the hypothesis, we examined the conformational dynamics of prestin and its effects on the motion of lipids under: (1) isometric conditions and (2) constant force conditions in order to mimic different regimes of membrane loading. All-atom molecular dynamics simulations of the prestin dimer embedded in POPC membranes were run and the trajectories analyzed. We discovered that under isometric conditions, the presence of a chloride ion in the electric field increased residue fluctuations. This trend was not observed under constant force conditions, supporting the idea that isometric conditions cause instantaneous force to be generated in the membrane. The analysis allowed us to identify the molecular mechanisms by which prestin allows electromechanical amplification by OHCs in the cochlea.