Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization|
Subtitled arXiv:2205.03681, V. Keshavarzzadeh, R.M. Kirby, A. Narayan. 2022.
Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information about the parameters and the information from the observations via likelihood evaluations are incorporated into the inference process. In this paper, we adopt a similar viewpoint with a slightly different numerical procedure from standard inference approaches to provide insight about the localized behavior of unknown underlying parameters. We present a variational inference approach which mainly incorporates the observation data in a point-wise manner, i.e. we invert a limited number of observation data leveraging the gradient information of the forward map with respect to parameters, and find true individual samples of the latent parameters when the forward map is noise-free and one-to-one. For statistical calculations (as the ultimate goal in simulations), a large number of samples are generated from a trained neural network which serves as a transport map from the prior to posterior latent parameters. Our neural network machinery, developed as part of the inference framework and referred to as Neural Net Kernels (NNK), is based on hierarchical (deep) kernels which provide greater flexibility for training compared to standard neural networks. We showcase the effectiveness of our inference procedure in identifying bimodal and irregular distributions compared to a number of approaches including Markov Chain Monte Carlo sampling approaches and a Bayesian neural network approach.
Advancing Reproducibility in Parallel and Distributed Systems Research|
M. Parashar. In Computer, Vol. 55, No. 5, pp. 4--5. 2022.
This installment of Computer’s series highlighting the work published in IEEE Computer Society journals comes from IEEE Transactions on Parallel and Distributed Systems.
Porting Uintah to Heterogeneous Systems|
J.K. Holmen, D. Sahasrabudhe, M. Berzins. In Proceedings of the Platform for Advanced Scientific Computing Conference (PASC22), ACM, 2022.
The Uintah Computational Framework is being prepared to make portable use of forthcoming exascale systems, initially the DOE Aurora system through the Aurora Early Science Program. This paper describes the evolution of Uintah to be ready for such architectures. A key part of this preparation has been the adoption of the Kokkos performance portability layer in Uintah. The sheer size of the Uintah codebase has made it imperative to have a representative benchmark. The design of this benchmark and the use of Kokkos within it is discussed. This paper complements recent work with additional details and new scaling studies run 24x further than earlier studies. Results are shown for two benchmarks executing workloads representative of typical Uintah applications. These results demonstrate single-source portability across the DOE Summit and NSF Frontera systems with good strong-scaling characteristics. The challenge of extending this approach to anticipated exascale systems is also considered.
Integrating atomistic simulations and machine learning to design multi-principal element alloys with superior elastic modulus|
M. Grant, M. R. Kunz, K. Iyer, L. I. Held, T. Tasdizen, J. A. Aguiar, P. P. Dholabhai. In Journal of Materials Research, Springer International Publishing, pp. 1--16. 2022.
Multi-principal element, high entropy alloys (HEAs) are an emerging class of materials that have found applications across the board. Owing to the multitude of possible candidate alloys, exploration and compositional design of HEAs for targeted applications is challenging since it necessitates a rational approach to identify compositions exhibiting enriched performance. Here, we report an innovative framework that integrates molecular dynamics and machine learning to explore a large chemical-configurational space for evaluating elastic modulus of equiatomic and non-equiatomic HEAs along primary crystallographic directions. Vital thermodynamic properties and machine learning features have been incorporated to establish fundamental relationships correlating Young’s modulus with Gibbs free energy, valence electron concentration, and atomic size difference. In HEAs, as the number of elements increases …
Organizing Large Data Sets for Efficient Analyses on HPC Systems|
J. Gu, P. Davis, G. Eisenhauer, W. Godoy, A. Huebl, S. Klasky, M. Parashar, N. Podhorszki, F. Poeschel, J. Vay, L. Wan, R. Wang, K. Wu. In Journal of Physics: Conference Series, Vol. 2224, No. 1, IOP Publishing, pp. 012042. 2022.
Upcoming exascale applications could introduce significant data management challenges due to their large sizes, dynamic work distribution, and involvement of accelerators such as graphical processing units, GPUs. In this work, we explore the performance of reading and writing operations involving one such scientific application on two different supercomputers. Our tests showed that the Adaptable Input and Output System, ADIOS, was able to achieve speeds over 1TB/s, a significant fraction of the peak I/O performance on Summit. We also demonstrated the querying functionality in ADIOS could effectively support common selective data analysis operations, such as conditional histograms. In tests, this query mechanism was able to reduce the execution time by a factor of five. More importantly, ADIOS data management framework allows us to achieve these performance improvements with only a minimal amount …
Proximal Implicit ODE Solvers for Accelerating Learning Neural ODEs|
Subtitled arXiv preprint arXiv:2204.08621, J. Baker, H. Xia, Y. Wang, E. Cherkaev, A. Narayan, L. Chen, J. Xin, A. L. Bertozzi, S. J. Osher, B. Wang. 2022.
Learning neural ODEs often requires solving very stiff ODE systems, primarily using explicit adaptive step size ODE solvers. These solvers are computationally expensive, requiring the use of tiny step sizes for numerical stability and accuracy guarantees. This paper considers learning neural ODEs using implicit ODE solvers of different orders leveraging proximal operators. The proximal implicit solver consists of inner-outer iterations: the inner iterations approximate each implicit update step using a fast optimization algorithm, and the outer iterations solve the ODE system over time. The proximal implicit ODE solver guarantees superiority over explicit solvers in numerical stability and computational efficiency. We validate the advantages of proximal implicit solvers over existing popular neural ODE solvers on various challenging benchmark tasks, including learning continuous-depth graph neural networks and continuous normalizing flows.
ENO-Based High-Order Data-Bounded and Constrained Positivity-Preserving Interpolation|
Subtitled https://arxiv.org/abs/2204.06168, T.A.J. Ouermi, R.M. Kirby, M. Berzins. 2022.
A number of key scientific computing applications that are based upon tensor-product grid constructions, such as numerical weather prediction (NWP) and combustion simulations, require property-preserving interpolation. Essentially Non-Oscillatory (ENO) interpolation is a classic example of such interpolation schemes. In the aforementioned application areas, property preservation often manifests itself as a requirement for either data boundedness or positivity preservation. For example, in NWP, one may have to interpolate between the grid on which the dynamics is calculated to a grid on which the physics is calculated (and back). Interpolating density or other key physical quantities without accounting for property preservation may lead to negative values that are nonphysical and result in inaccurate representations and/or interpretations of the physical data. Property-preserving interpolation is straightforward when used in the context of low-order numerical simulation methods. High-order property-preserving interpolation is, however, nontrivial, especially in the case where the interpolation points are not equispaced. In this paper, we demonstrate that it is possible to construct high-order interpolation methods that ensure either data boundedness or constrained positivity preservation. A novel feature of the algorithm is that the positivity-preserving interpolant is constrained; that is, the amount by which it exceeds the data values may be strictly controlled. The algorithm we have developed comes with theoretical estimates that provide sufficient conditions for data boundedness and constrained positivity preservation. We demonstrate the application of our algorithm on a collection of 1D and 2D numerical examples, and show that in all cases property preservation is respected.
Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures|
Subtitled arXiv:2204.04273, J.D. Hogue, R.M. Kirby, A. Narayan. 2022.
Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network.
Portable, Scalable Approaches For Improving Asynchronous Many-Task Runtime Node Use|
John Holmen. School of Computing, University of Utah, 2022.
This research addresses node-level scalability, portability, and heterogeneous computing challenges facing asynchronous many-task (AMT) runtime systems. These challenges have arisen due to increasing socket/core/thread counts and diversity among supported architectures on current and emerging high-performance computing (HPC) systems. This places greater emphasis on thread scalability and simultaneous use of diverse architectures to maximize node use and is complicated by architecture-specific programming models.
To reduce the exposure of application developers to such challenges, AMT programming models have emerged to offer a runtime-based solution. These models overdecompose a problem into many fine-grained tasks to be scheduled and executed by an underlying runtime to improve node-level concurrency. However, task execution granularity challenges remain, and it is unclear where and how shared memory programming models should be used within an AMT model to improve node use. This research aims to ease these design decisions with consideration for performance portability layers (PPLs), which provide a single interface to multiple shared memory programming models.
The contribution of this research is the design of a task scheduling approach for portably improving node use when extending AMT runtime systems to many-core and heterogeneous HPC systems with shared memory programming models. The success of this approach is shown through the portable adoption of a performance portability layer, Kokkos, within Uintah, a representative AMT runtime system. The resulting task scheduler enables the scheduling and execution of portable, fine-grained tasks across processors and accelerators simultaneously with flexible control over task execution granularity. A collection of experiments on current many-core and heterogeneous HPC systems are used to validate this approach and inform design recommendations. Among resulting recommendations are approaches for easing the adoption of a heterogeneous MPI+PPL task scheduling approach in an asynchronous many-task runtime system and furthermore to ease indirect adoption of a performance portability layer in large legacy codebases.
Convex Optimization-Based Structure-Preserving Filter For Multidimensional Finite Element Simulations|
Subtitled arXiv preprint arXiv:2203.09748, V. Zala, A. Narayan, R.M. Kirby. 2022.
In simulation sciences, it is desirable to capture the real-world problem features as accurately as possible. Methods popular for scientific simulations such as the finite element method (FEM) and finite volume method (FVM) use piecewise polynomials to approximate various characteristics of a problem, such as the concentration profile and the temperature distribution across the domain. Polynomials are prone to creating artifacts such as Gibbs oscillations while capturing a complex profile. An efficient and accurate approach must be applied to deal with such inconsistencies in order to obtain accurate simulations. This often entails dealing with negative values for the concentration of chemicals, exceeding a percentage value over 100, and other such problems. We consider these inconsistencies in the context of partial differential equations (PDEs). We propose an innovative filter based on convex optimization to deal with the inconsistencies observed in polynomial-based simulations. In two or three spatial dimensions, additional complexities are involved in solving the problems related to structure preservation. We present the construction and application of a structure-preserving filter with a focus on multidimensional PDEs. Methods used such as the Barycentric interpolation for polynomial evaluation at arbitrary points in the domain and an optimized root-finder to identify points of interest improve the filter efficiency, usability, and robustness. Lastly, we present numerical experiments in 2D and 3D using discontinuous Galerkin formulation and demonstrate the filter's efficacy to preserve the desired structure. As a real-world application …
Reinventing High Performance Computing: Challenges and Opportunities|
Subtitled UUSCI-2022-001, D. Reed, D. Gannon, J. Dongarra. University of Utah, 2022.
The world of computing is in rapid transition, now dominated by a world of smartphones and cloud services, with profound implications for the future of advanced scientific computing. Simply put, high-performance computing (HPC) is at an important inflection point. For the last 60 years, the world's fastest supercomputers were almost exclusively produced in the United States on behalf of scientific research in the national laboratories. Change is now in the wind. While costs now stretch the limits of U.S. government funding for advanced computing, Japan and China are now leaders in the bespoke HPC systems funded by government mandates. Meanwhile, the global semiconductor shortage and political battles surrounding fabrication facilities affect everyone. However, another, perhaps even deeper, fundamental change has occurred. The major cloud vendors have invested in global networks of massive scale systems that dwarf today's HPC systems. Driven by the computing demands of AI, these cloud systems are increasingly built using custom semiconductors, reducing the financial leverage of traditional computing vendors. These cloud systems are now breaking barriers in game playing and computer vision, reshaping how we think about the nature of scientific computation. Building the next generation of leading edge HPC systems will require rethinking many fundamentals and historical approaches by embracing end-to-end co-design; custom hardware configurations and packaging; large-scale prototyping, as was common thirty years ago; and collaborative partnerships with the dominant computing ecosystem companies, smartphone, and cloud computing vendors.
Learning POD of Complex Dynamics Using Heavy-ball Neural ODEs|
Subtitled arXiv:2202.12373, J. Baker, E. Cherkaev, A. Narayan, B. Wang. 2022.
Proper orthogonal decomposition (POD) allows reduced-order modeling of complex dynamical systems at a substantial level, while maintaining a high degree of accuracy in modeling the underlying dynamical systems. Advances in machine learning algorithms enable learning POD-based dynamics from data and making accurate and fast predictions of dynamical systems. In this paper, we leverage the recently proposed heavy-ball neural ODEs (HBNODEs) [Xia et al. NeurIPS, 2021] for learning data-driven reduced-order models (ROMs) in the POD context, in particular, for learning dynamics of time-varying coefficients generated by the POD analysis on training snapshots generated from solving full order models. HBNODE enjoys several practical advantages for learning POD-based ROMs with theoretical guarantees, including 1) HBNODE can learn long-term dependencies effectively from sequential observations and 2) HBNODE is computationally efficient in both training and testing. We compare HBNODE with other popular ROMs on several complex dynamical systems, including the von Kármán Street flow, the Kurganov-Petrova-Popov equation, and the one-dimensional Euler equations for fluids modeling.
Computational Error Estimation for The Material Point Method|
M. Berzins. 2022.
A common feature of many methods in computational mechanics is that there is often a way of estimating the error in the computed solution. The situation for computational mechanics codes based upon the Material Point Method is very different in that there has been comparatively little work on computable error estimates for these methods. This work is concerned with introducing such an approach for the Material Point Method. Although it has been observed that spatial errors may dominate temporal ones at stable time steps, recent work has made more precise the sources and forms of the different MPM errors. There is then a need to estimate these errors computationally through computable estimates of the different errors in the material point method. Estimates of the different spatial errors in the Material Point Method are constructed based upon nodal derivatives of the different physical variables in MPM. These derivatives are then estimated using standard difference approximations calculated on the background mesh. The use of these estimates of the spatial error makes it possible to measure the growth of errors over time. A number of computational experiments are used to illustrate the performance of the computed error estimates. As the key feature of the approach is the calculation of derivatives on the regularly spaced background mesh, the extension to calculating derivatives and hence to error estimates for higher dimensional problems is clearly possible.
A Stieltjes algorithm for generating multivariate orthogonal polynomials|
Subtitled arXiv preprint arXiv:2202.04843, Z. Liu, A. Narayan. 2022.
Orthogonal polynomials of several variables have a vector-valued three-term recurrence relation, much like the corresponding one-dimensional relation. This relation requires only knowledge of certain recurrence matrices, and allows simple and stable evaluation of multivariate orthogonal polynomials. In the univariate case, various algorithms can evaluate the recurrence coefficients given the ability to compute polynomial moments, but such a procedure is absent in multiple dimensions. We present a new Multivariate Stieltjes (MS) algorithm that fills this gap in the multivariate case, allowing computation of recurrence matrices assuming moments are available. The algorithm is essentially explicit in two and three dimensions, but requires the numerical solution to a non-convex problem in more than three dimensions. Compared to direct Gram-Schmidt-type orthogonalization, we demonstrate on several examples in up to three dimensions that the MS algorithm is far more stable, and allows accurate computation of orthogonal bases in the multivariate setting, in contrast to direct orthogonalization approaches.
Energy conservation and accuracy of some MPM formulations|
M. Berzins. In Computational Particle Mechanics, 2022.
The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method such as time integration accuracy and energy conservation. The traditional MPM time integration methods are often based upon the symplectic Euler method or staggered central differences. This raises the question of how to best ensure energy conservation in explicit time integration for MPM. Two approaches are used here, one is to extend the Symplectic Euler method (Cromer Euler) to provide better energy conservation and the second is to use a potentially more accurate symplectic methods, namely the widely-used Stormer-Verlet Method. The Stormer-Verlet method is shown to have locally third order time accuracy of energy conservation in time, in contrast to the second order accuracy in energy conservation of the symplectic Euler methods that are used in many MPM calculations. It is shown that there is an extension to the Symplectic Euler stress-last method that provides better energy conservation that is comparable with the Stormer-Verlet method. This extension is referred to as TRGIMP and also has third order accuracy in energy conservation. When the interactions between space and time errors are studied it is seen that spatial errors may dominate in computed quantities such as displacement and velocity. This connection between the local errors in space and time is made explicit mathematically and explains the observed results that displacement and velocity errors are very similar for both methods. The observed and theoretically predicted third-order energy conservation accuracy and computational costs are demonstrated on a standard MPM test example.
AMM: Adaptive Multilinear Meshes|
Subtitled arXiv:2007.15219, H. Bhatia, D. Hoang, N. Morrical, V. Pascucci, P.T. Bremer, P. Lindstrom. 2021.
Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision datastreams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes
Translational computer science at the scientific computing and imaging institute|
C. R. Johnson. In Journal of Computational Science, Vol. 52, pp. 101217. 2021.
The Scientific Computing and Imaging (SCI) Institute at the University of Utah evolved from the SCI research group, started in 1994 by Professors Chris Johnson and Rob MacLeod. Over time, research centers funded by the National Institutes of Health, Department of Energy, and State of Utah significantly spurred growth, and SCI became a permanent interdisciplinary research institute in 2000. The SCI Institute is now home to more than 150 faculty, students, and staff. The history of the SCI Institute is underpinned by a culture of multidisciplinary, collaborative research, which led to its emergence as an internationally recognized leader in the development and use of visualization, scientific computing, and image analysis research to solve important problems in a broad range of domains in biomedicine, science, and engineering. A particular hallmark of SCI Institute research is the creation of open source software systems, including the SCIRun scientific problem-solving environment, Seg3D, ImageVis3D, Uintah, ViSUS, Nektar++, VisTrails, FluoRender, and FEBio. At this point, the SCI Institute has made more than 50 software packages broadly available to the scientific community under open-source licensing and supports them through web pages, documentation, and user groups. While the vast majority of academic research software is written and maintained by graduate students, the SCI Institute employs several professional software developers to help create, maintain, and document robust, tested, well-engineered open source software. The story of how and why we worked, and often struggled, to make professional software engineers an integral part of an academic research institute is crucial to the larger story of the SCI Institute’s success in translational computer science (TCS).
Adaptive Placement of Data Analysis Tasks For Staging Based In-Situ Processing|
Z. Wang, P. Subedi, M. Dorier, P.E. Davis, M. Parashar. In 2021 IEEE 28th International Conference on High Performance Computing, Data, and Analytics (HiPC), pp. 242-251. 2021.
In-situ processing addresses the gap between speeds of computing and I/O capabilities by processing data close to the data source, i.e., on the same system as the data source (e.g., a simulation). However, the effective implementation of in-situ processing workflows requires the optimization of several design parameters such as where on the system workflow data analysis/visualization (ana/vis) as placed and how execution as well as the interaction and data exchanges between ana/vis are coordinated. For example, in the case of hybrid in-situ processing, interacting ana/vis may be tightly or loosely coupled depending on their placement, and this can lead to very different performance and scalability. A key challenge is deciding the most appropriate ana/vis placement, which depends on dynamic applications, workflow, and system characteristics that might change at runtime. In this paper, we present a framework to support online adaptive data analysis placement during the execution of an in-situ workflow. Specifically, the paper presents a model and architecture, and explores several data analysis placement strategies. Evaluation results show that dynamically choosing appropriate data analysis placement strategies can balance the benefits and overhead of different data analysis placement patterns to reduce in-situ processing time.
MDSC: Modelling Distributed Stream Processing across the Edge-to-Cloud Continuum|
D. Balouek-Thomert, P. Silva, K. Fauvel, A. Costan, G. Antoniu, M. Parashar. In DML-ICC 2021 workshop (held in conjunction with UCC 2021), December, 2021.
The growth of the Internet of Things is resulting in an explosion of data volumes at the Edge of the Internet. To reduce costs incurred due to data movement and centralized cloud-based processing, it is becoming increasingly important to process and analyze such data closer to the data sources. Exploiting Edge computing capabilities for stream-based processing is however challenging. It requires addressing the complex characteristics and constraints imposed by all the resources along the data path, as well as the large set of heterogeneous data processing and management frameworks. Consequently, the community needs tools that can facilitate the modeling of this complexity and can integrate the various components involved. In this work, we introduce MDSC, a hierarchical approach for modeling distributed stream-based applications on Edge-to-Cloud continuum infrastructures. We demonstrate how MDSC can be applied to a concrete real-life ML-based application -early earthquake warning - to help answer questions such as: when is it worth decentralizing the classification load from the Cloud to the Edge and how?
GP-HMAT: Scalable, $O(n\log (n)) $ Gaussian Process Regression with Hierarchical Low-Rank Matrices|
Subtitled arXiv preprint arXiv:2201.00888, V. Keshavarzzadeh, S. Zhe, R.M. Kirby, A. Narayan. 2021.
A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of a GP regression is the covariance kernel, which characterizes the relationship between pairs in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which effectively decomposes the matrix structure, in a recursive manner, into significantly smaller matrices where a direct approach could be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points, which promotes the low-rank structure of off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method for matrix reduction on the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the inversion of the matrix, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT, HMAT adapted for GP likelihood and derivative computations, and the implementation of the last numerical example on a real dataset. We demonstrate superior scalability of the HMAT approach compared to built-in operator in MATLAB for large-scale linear solves Ax=y via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research.