2022

H. D. Tran, M. Fernando, K. Saurabh, B. Ganapathysubramanian, R. M. Kirby, H. Sundar.
**“A scalable adaptive-matrix SPMV for heterogeneous architectures,”** In *2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS)*, IEEE, pp. 13--24. 2022.

DOI: 10.1109/IPDPS53621.2022.00011

In most computational codes, the core computational kernel is the Sparse Matrix-Vector product (SpMV) that enables specialized linear algebra libraries like PETSc to be used, especially in the distributed memory setting. However, optimizing SpMvperformance and scalability at all levels of a modern heterogeneous architecture can be challenging as it is characterized by irregular memory access. This work presents a hybrid approach (HyMV) for evaluating SpMV for matrices arising from PDE discretization schemes such as the finite element method (FEM). The approach enables localized structured memory access that provides improved performance and scalability. Additionally, it simplifies the programmability and portability on different architectures. The developed HyMV approach enables efficient parallelization using MPI, SIMD, OpenMP, and CUDA with minimum programming effort. We present a detailed comparison of HyMV with the two traditional approaches in computational code, matrix-assembled and matrix-free approaches, for structured and unstructured meshes. Our results demonstrate that the HyMV approach achieves excellent scalability and outperforms both approaches, e.g., achieving average speedups of 11x for matrix setup, 1.7x for SpMV with structured meshes, 3.6x for SpMV with unstructured meshes, and 7.5x for GPU SpMV.

V. Zala, A. Narayan, R.M. Kirby.
**“Convex Optimization-Based Structure-Preserving Filter For Multidimensional Finite Element Simulations,”** Subtitled **“arXiv preprint arXiv:2203.09748,”** 2022.

In simulation sciences, it is desirable to capture the real-world problem features as accurately as possible. Methods popular for scientific simulations such as the finite element method (FEM) and finite volume method (FVM) use piecewise polynomials to approximate various characteristics of a problem, such as the concentration profile and the temperature distribution across the domain. Polynomials are prone to creating artifacts such as Gibbs oscillations while capturing a complex profile. An efficient and accurate approach must be applied to deal with such inconsistencies in order to obtain accurate simulations. This often entails dealing with negative values for the concentration of chemicals, exceeding a percentage value over 100, and other such problems. We consider these inconsistencies in the context of partial differential equations (PDEs). We propose an innovative filter based on convex optimization to deal with the inconsistencies observed in polynomial-based simulations. In two or three spatial dimensions, additional complexities are involved in solving the problems related to structure preservation. We present the construction and application of a structure-preserving filter with a focus on multidimensional PDEs. Methods used such as the Barycentric interpolation for polynomial evaluation at arbitrary points in the domain and an optimized root-finder to identify points of interest improve the filter efficiency, usability, and robustness. Lastly, we present numerical experiments in 2D and 3D using discontinuous Galerkin formulation and demonstrate the filter's efficacy to preserve the desired structure. As a real-world application …

2021

M. K. Ballard, R. Amici, V. Shankar, L. A. Ferguson, M. Braginsky, R. M. Kirby.
**“Towards an Extrinsic, CG-XFEM Approach Based on Hierarchical Enrichments for Modeling Progressive Fracture,”** Subtitled **“arXiv preprint arXiv:2104.14704,”** 2021.

We propose an extrinsic, continuous-Galerkin (CG), extended finite element method (XFEM) that generalizes the work of Hansbo and Hansbo to allow multiple Heaviside enrichments within a single element in a hierarchical manner. This approach enables complex, evolving XFEM surfaces in 3D that cannot be captured using existing CG-XFEM approaches. We describe an implementation of the method for 3D static elasticity with linearized strain for modeling open cracks as a salient step towards modeling progressive fracture. The implementation includes a description of the finite element model, hybrid implicit/explicit representation of enrichments, numerical integration method, and novel degree-of-freedom (DoF) enumeration algorithm. This algorithm supports an arbitrary number of enrichments within an element, while simultaneously maintaining a CG solution across elements. Additionally, our approach easily allows an implementation suitable for distributed computing systems. Enabled by the DoF enumeration algorithm, the proposed method lays the groundwork for a computational tool that efficiently models progressive fracture. To facilitate a discussion of the complex enrichment hierarchies, we develop enrichment diagrams to succinctly describe and visualize the relationships between the enrichments (and the fields they create) within an element. This also provides a unified language for discussing extrinsic XFEM methods in the literature. We compare several methods, relying on the enrichment diagrams to highlight their nuanced differences.

H. Bhatia, S. N. Petruzza, R. Anirudh, A. G. Gyulassy, R. M. Kirby, V. Pascucci, P. T. Bremer.
**“Data-Driven Estimation of Temporal-Sampling Errors in Unsteady Flows,”** 2021.

While computer simulations typically store data at the highest available spatial resolution, it is often infeasible to do so for the temporal dimension. Instead, the common practice is to store data at regular intervals, the frequency of which is strictly limited by the available storage and I/O bandwidth. However, this manner of temporal subsampling can cause significant errors in subsequent analysis steps. More importantly, since the intermediate data is lost, there is no direct way of measuring this error after the fact. One particularly important use case that is affected is the analysis of unsteady flows using pathlines, as it depends on an accurate interpolation across time. Although the potential problem with temporal undersampling is widely acknowledged, there currently does not exist a practical way to estimate the potential impact. This paper presents a simple-to-implement yet powerful technique to estimate the error in pathlines due to temporal subsampling. Given an unsteady flow, we compute pathlines at the given temporal resolution as well as subsamples thereof. We then compute the error induced due to various levels of subsampling and use it to estimate the error between the given resolution and the unknown ground truth. Using two turbulent flows, we demonstrate that our approach, for the first time, provides an accurate, a posteriori error estimate for pathline computations. This estimate will enable scientists to better understand the uncertainties involved in pathline-based analysis techniques and can lead to new uncertainty visualization approaches using the predicted errors.

M. Carlson, X. Zheng, H. Sundar, G. E. Karniadakis, R. M. Kirby.
**“An open-source parallel code for computing the spectral fractional Laplacian on 3D complex geometry domains,”** In *Computer Physics Communications*, Vol. 261, North-Holland, pp. 107695. 2021.

We present a spectral element algorithm and open-source code for computing the fractional Laplacian defined by the eigenfunction expansion on finite 2D/3D complex domains with both homogeneous and nonhomogeneous boundaries. We demonstrate the scalability of the spectral element algorithm on large clusters by constructing the fractional Laplacian based on computed eigenvalues and eigenfunctions using up to thousands of CPUs. To demonstrate the accuracy of this eigen-based approach for computing the factional Laplacian, we approximate the solutions of the fractional diffusion equation using the computed eigenvalues and eigenfunctions on a 2D quadrilateral, and on a 3D cubic and cylindrical domain, and compare the results with the contrived solutions to demonstrate fast convergence. Subsequently, we present simulation results for a fractional diffusion equation on a hand-shaped domain discretized with 3D hexahedra, as well as on a domain constructed from the Hanford site geometry corresponding to nonzero Dirichlet boundary conditions. Finally, we apply the algorithm to solve the surface quasi-geostrophic (SQG) equation on a 2D square with periodic boundaries. Simulation results demonstrate the accuracy, efficiency, and geometric flexibility of our algorithm and that our algorithm can capture the subtle dynamics of anomalous diffusion modeled by the fractional Laplacian on complex geometry domains. The included open-source code is the first of its kind.

J. Chilleri, Y. He, D. Bedrov, R. M. Kirby.
**“Optimal allocation of computational resources based on Gaussian process: Application to molecular dynamics simulations,”** In *Computational Materials Science*, Vol. 188, Elsevier, pp. 110178. 2021.

Simulation models have been utilized in a wide range of real-world applications for behavior predictions of complex physical systems or material designs of large structures. While extensive simulation is mathematically preferable, external limitations such as available resources are often necessary considerations. With a fixed computational resource (i.e., total simulation time), we propose a Gaussian process-based numerical optimization framework for optimal time allocation over simulations at different locations, so that a surrogate model with uncertainty estimation can be constructed to approximate the full simulation. The proposed framework is demonstrated first via two synthetic problems, and later using a real test case of a glass-forming system with divergent dynamic relaxations where a Gaussian process is constructed to estimate the diffusivity and its uncertainty with respect to the temperature.

V. Keshavarzzadeh, M. Alirezaei, T. Tasdizen, R. M. Kirby.
**“Image-Based Multiresolution Topology Optimization Using Deep Disjunctive Normal Shape Model,”** In *Computer-Aided Design*, Vol. 130, Elsevier, pp. 102947. 2021.

We present a machine learning framework for predicting the optimized structural topology design susing multiresolution data. Our approach primarily uses optimized designs from inexpensive coarse mesh finite element simulations for model training and generates high resolution images associated with simulation parameters that are not previously used. Our cost-efficient approach enables the designers to effectively search through possible candidate designs in situations where the design requirements rapidly change. The underlying neural network framework is based on a deep disjunctive normal shape model (DDNSM) which learns the mapping between the simulation parameters and segments of multi resolution images. Using this image-based analysis we provide a practical algorithm which enhances the predictability of the learning machine by determining a limited number of important parametric samples(i.e.samples of the simulation parameters)on which the high resolution training data is generated. We demonstrate our approach on benchmark compliance minimization problems including the 3D topology optimization where we show that the high-fidelity designs from the learning machine are close to optimal designs and can be used as effective initial guesses for the large-scale optimization problem.

V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“Multilevel Designed Quadrature for Partial Differential Equations with Random Inputs,”** In *SIAM Journal on Scientific Computing*, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A1412-A1440. 2021.

We introduce a numerical method, multilevel designed quadrature for computing the statistical solution of partial differential equations with random input data. Similar to multilevel Monte Carlo methods, our method relies on hierarchical spatial approximations in addition to a parametric/stochastic sampling strategy. A key ingredient in multilevel methods is the relationship between the spatial accuracy at each level and the number of stochastic samples required to achieve that accuracy. Our sampling is based on flexible quadrature points that are designed for a prescribed accuracy, which can yield less overall computational cost compared to alternative multilevel methods. We propose a constrained optimization problem that determines the number of samples to balance the approximation error with the computational budget. We further show that the optimization problem is convex and derive analytic formulas for the optimal number of points at each level. We validate the theoretical estimates and the performance of our multilevel method via numerical examples on a linear elasticity and a steady state heat diffusion problem.

V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“Robust topology optimization with low rank approximation using artificial neural networks,”** In *Computational Mechanics*, 2021.

DOI: 10.1007/s00466-021-02069-3

We present a low rank approximation approach for topology optimization of parametrized linear elastic structures. The parametrization is considered on loading and stiffness of the structure. The low rank approximation is achieved by identifying a parametric connection among coarse finite element models of the structure (associated with different design iterates) and is used to inform the high fidelity finite element analysis. We build an Artificial Neural Network (ANN) map between low resolution design iterates and their corresponding interpolative coefficients (obtained from low rank approximations) and use this surrogate to perform high resolution parametric topology optimization. We demonstrate our approach on robust topology optimization with compliance constraints/objective functions and develop error bounds for the the parametric compliance computations. We verify these parametric computations with more challenging quantities of interest such as the p-norm of von Mises stress. To conclude, we use our approach on a 3D robust topology optimization and show significant reduction in computational cost via quantitative measures.

V. Keshavarzzadeh, S. Zhe, R.M. Kirby, A. Narayan.
**“GP-HMAT: Scalable, $O(n\log (n)) $ Gaussian Process Regression with Hierarchical Low-Rank Matrices,”** Subtitled **“arXiv preprint arXiv:2201.00888,”** 2021.

A Gaussian process (GP) is a powerful and widely used regression technique. The main building block of a GP regression is the covariance kernel, which characterizes the relationship between pairs in the random field. The optimization to find the optimal kernel, however, requires several large-scale and often unstructured matrix inversions. We tackle this challenge by introducing a hierarchical matrix approach, named HMAT, which effectively decomposes the matrix structure, in a recursive manner, into significantly smaller matrices where a direct approach could be used for inversion. Our matrix partitioning uses a particular aggregation strategy for data points, which promotes the low-rank structure of off-diagonal blocks in the hierarchical kernel matrix. We employ a randomized linear algebra method for matrix reduction on the low-rank off-diagonal blocks without factorizing a large matrix. We provide analytical error and cost estimates for the inversion of the matrix, investigate them empirically with numerical computations, and demonstrate the application of our approach on three numerical examples involving GP regression for engineering problems and a large-scale real dataset. We provide the computer implementation of GP-HMAT, HMAT adapted for GP likelihood and derivative computations, and the implementation of the last numerical example on a real dataset. We demonstrate superior scalability of the HMAT approach compared to built-in operator in MATLAB for large-scale linear solves Ax=y via a repeatable and verifiable empirical study. An extension to hierarchical semiseparable (HSS) matrices is discussed as future research.

A.S. Krishnapriyan, A. Gholami, S. Zhe, R.M. Kirby, M.W. Mahoney.
**“Characterizing possible failure modes in physics-informed neural networks,”** Subtitled **“arXiv preprint arXiv:2109.01050,”** 2021.

Recent work in scientific machine learning has developed so-called physics-informed neural network (PINN) models. The typical approach is to incorporate physical domain knowledge as soft constraints on an empirical loss function and use existing machine learning methodologies to train the model. We demonstrate that, while existing PINN methodologies can learn good models for relatively trivial problems, they can easily fail to learn relevant physical phenomena even for simple PDEs. In particular, we analyze several distinct situations of widespread physical interest, including learning differential equations with convection, reaction, and diffusion operators. We provide evidence that the soft regularization in PINNs, which involves differential operators, can introduce a number of subtle problems, including making the problem ill-conditioned. Importantly, we show that these possible failure modes are not due to the lack of expressivity in the NN architecture, but that the PINN's setup makes the loss landscape very hard to optimize. We then describe two promising solutions to address these failure modes. The first approach is to use curriculum regularization, where the PINN's loss term starts from a simple PDE regularization, and becomes progressively more complex as the NN gets trained. The second approach is to pose the problem as a sequence-to-sequence learning task, rather than learning to predict the entire space-time at once. Extensive testing shows that we can achieve up to 1-2 orders of magnitude lower error with these methods as compared to regular PINN training.

E. Laughton, V. Zala, A. Narayan, R. M. Kirby, D. Moxey.
**“Fast Barycentric-Based Evaluation Over Spectral/hp Elements,”** Subtitled **“arXiv preprint arXiv:2103.03594,”** 2021.

As the use of spectral/*hp* element methods, and high-order finite element methods in general, continues to spread, community efforts to create efficient, optimized algorithms associated with fundamental high-order operations have grown. Core tasks such as solution expansion evaluation at quadrature points, stiffness and mass matrix generation, and matrix assembly have received tremendousattention. With the expansion of the types of problems to which high-order methods are applied, and correspondingly the growth in types of numerical tasks accomplished through high-order methods, the number and types of these core operations broaden. This work focuses on solution expansion evaluation at arbitrary points within an element. This operation is core to many postprocessing applications such as evaluation of streamlines and pathlines, as well as to field projection techniques such as mortaring. We expand barycentric interpolation techniques developed on an interval to 2D (triangles and quadrilaterals) and 3D (tetrahedra, prisms, pyramids, and hexahedra) spectral/*hp* element methods. We provide efficient algorithms for their implementations, and demonstrate their effectiveness using the spectral/*hp* element library Nektar++.

M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby.
**“Multifidelity Modeling for Physics-Informed Neural Networks (PINNs),”** Subtitled **“arXiv preprint arXiv:2106.13361,”** 2021.

Multifidelity simulation methodologies are often used in an attempt to judiciously combine low-fidelity and high-fidelity simulation results in an accuracy-increasing, cost-saving way. Candidates for this approach are simulation methodologies for which there are fidelity differences connected with significant computational cost differences. Physics-informed Neural Networks (PINNs) are candidates for these types of approaches due to the significant difference in training times required when different fidelities (expressed in terms of architecture width and depth as well as optimization criteria) are employed. In this paper, we propose a particular multifidelity approach applied to PINNs that exploits low-rank structure. We demonstrate that width, depth, and optimization criteria can be used as parameters related to model fidelity, and show numerical justification of cost differences in training due to fidelity parameter choices. We test our multifidelity scheme on various canonical forward PDE models that have been presented in the emerging PINNs literature.

M. Penwarden, S. Zhe, A. Narayan, R. M. Kirby.
**“Physics-Informed Neural Networks (PINNs) for Parameterized PDEs: A Metalearning Approach,”** Subtitled **“arXiv preprint arXiv:2110.13361,”** 2021.

Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) – and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs for parameterized PDEs. Following the ML world, we introduce metalearning of PINNs for parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs. We provide theoretically motivated and empirically backed assumptions that make our metalearning approach possible. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.

M. Rasouli, R. M. Kirby, H. Sundar.
**“A Compressed, Divide and Conquer Algorithm for Scalable Distributed Matrix-Matrix Multiplication,”** In *The International Conference on High Performance Computing in Asia-Pacific Region*, pp. 110-119. 2021.

Matrix-matrix multiplication (GEMM) is a widely used linear algebra primitive common in scientific computing and data sciences. While several highly-tuned libraries and implementations exist, these typically target either sparse or dense matrices. The performance of these tuned implementations on unsupported types can be poor, and this is critical in cases where the structure of the computations is associated with varying degrees of sparsity. One such example is Algebraic Multigrid (AMG), a popular solver and preconditioner for large sparse linear systems. In this work, we present a new divide and conquer sparse GEMM, that is also highly performant and scalable when the matrix becomes dense, as in the case of AMG matrix hierarchies. In addition, we implement a lossless data compression method to reduce the communication cost. We combine this with an efficient communication pattern during distributed-memory GEMM to provide 2.24 times (on average) better performance than the state-of-the-art library PETSc. Additionally, we show that the performance and scalability of our method surpass PETSc even more when the density of the matrix increases. We demonstrate the efficacy of our methods by comparing our GEMM with PETSc on a wide range of matrices.

N. Truong, C. Yuksel, C. Watcharopas, J. A. Levine, R. M. Kirby.
**“Particle Merging-and-Splitting,”** In *IEEE Transactions on Visualization and Computer Graphics*, IEEE, 2021.

Robustly handling collisions between individual particles in a large particle-based simulation has been a challenging problem. We introduce particle merging-and-splitting, a simple scheme for robustly handling collisions between particles that prevents inter-penetrations of separate objects without introducing numerical instabilities. This scheme merges colliding particles at the beginning of the time-step and then splits them at the end of the time-step. Thus, collisions last for the duration of a time-step, allowing neighboring particles of the colliding particles to influence each other. We show that our merging-and-splitting method is effective in robustly handling collisions and avoiding penetrations in particle-based simulations. We also show how our merging-and-splitting approach can be used for coupling different simulation systems using different and otherwise incompatible integrators. We present simulation tests …

W. W. Xing, A. A. Shah, P. Wang, S. Zhe, Q. Fu, R. M. Kirby.
**“Residual Gaussian process: A tractable nonparametric Bayesian emulator for multi-fidelity simulations,”** In *Applied Mathematical Modelling*, Vol. 97, Elsevier, pp. 36-56. 2021.

Challenges in multi-fidelity modelling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case.

W. W. Xing, R. M. Kirby, S. Zhe.
**“Deep coregionalization for the emulation of simulation-based spatial-temporal fields,”** In *Journal of Computational Physics*, Academic Press, pp. 109984. 2021.

Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulations, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-driven surrogates for high-dimensional outputs, most methods require a significant number of samples to cover the response surface in order to achieve a reasonable degree of accuracy. This demand makes the idea of surrogate models less attractive considering the high-computational cost to generate the data. To address this issue, we exploit the multifidelity nature of a PDE simulation and introduce deep coregionalization, a Bayesian nonparametric autoregressive framework for efficient emulation of spatial-temporal fields. To effectively extract the output correlations in the context of multifidelity data, we develop a novel dimension reduction technique, residual principal component analysis. Our model can simultaneously capture the rich output correlations and the fidelity correlations and make high-fidelity predictions with only a small number of expensive, high-fidelity simulation samples. We show the advantages of our model in three canonical PDE models and a fluid dynamics problem. The results show that the proposed method can not only approximate simulation results with significantly less cost (by bout 10%-25%) but also further improve model accuracy.

Y. Xu, V. Keshavarzzadeh, R. M. Kirby, A. Narayan.
**“A bandit-learning approach to multifidelity approximation,”** Subtitled **“arXiv preprint arXiv:2103.15342,”** 2021.

Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate a multifidelity approximation as a modified stochastic bandit, and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. These results are then extended to the case of vector-valued responses, where we demonstrate that the algorithm is efficient without the need to worry about estimating high-dimensional parameters. The main advantage of our approach is that we require neither hierarchical model structure nor\textit a priori knowledge of statistical information (eg, correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates of querying each model. Numerical experiments are provided at the end to support our theoretical findings.

V. Zala, R. M. Kirby, A. Narayan.
**“Structure-preserving Nonlinear Filtering for Continuous and Discontinuous Galerkin Spectral/hp Element Methods,”** Subtitled **“arXiv preprint arXiv:2106.08316,”** 2021.

Finite element simulations have been used to solve various partial differential equations (PDEs) that model physical, chemical, and biological phenomena. The resulting discretized solutions to PDEs often do not satisfy requisite physical properties, such as positivity or monotonicity. Such invalid solutions pose both modeling challenges, since the physical interpretation of simulation results is not possible, and computational challenges, since such properties may be required to advance the scheme. We, therefore, consider the problem of computing solutions that preserve these structural solution properties, which we enforce as additional constraints on the solution. We consider in particular the class of convex constraints, which includes positivity and monotonicity. By embedding such constraints as a postprocessing convex optimization procedure, we can compute solutions that satisfy general types of convex constraints. For certain types of constraints (including positivity and monotonicity), the optimization is a filter, i.e., a norm-decreasing operation. We provide a variety of tests on one-dimensional time-dependent PDEs that demonstrate the method's efficacy, and we empirically show that rates of convergence are unaffected by the inclusion of the constraints.