B. Charoenwong, R.M. Kirby, J. Reiter. Computer Science Abstractions To Help Reason About Decentralized Stablecoin Design, In IEEE Access, IEEE, 2023.
Computer science as a discipline is known for its penchant for using abstractions as a tool for reasoning. It is no surprise that computer science might have something valuable to lend to the world of decentralized stablecoin design, as it is in fact a "computing" problem. In this paper, we examine the possibility of a decentralized and capital-efficient stablecoin using smart contracts that algorithmically trade to maintain stability and study the potential new functionality that smart contracts enable. By exploiting traditional abstractions from computer science, we show that a capital-efficient algorithmic stablecoin cannot be provably stable. Additionally, we provide a formal exposition of the workings of Central Bank Digital Currencies, connecting this to the space of possible stablecoin designs. We then discuss several outstanding conjectures from both academics and practitioners and finally highlight the regulatory similarities between money-market funds and working stablecoins. Our work builds upon the current and growing interplay between the realms of engineering and financial services, and it also demonstrates how ways of thinking as a computer scientist can aid practitioners. We believe this research is vital for understanding and developing the future of financial technology.
H. Dai, M. Penwarden, R.M. Kirby, S. Joshi. Neural Operator Learning for Ultrasound Tomography Inversion, Subtitled arXiv:2304.03297v1, 2023.
Neural operator learning as a means of mapping between complex function spaces has garnered significant attention in the field of computational science and engineering (CS&E). In this paper, we apply neural operator learning to the time-of-flight ultrasound computed tomography (USCT) problem. We learn the mapping between time-of-flight (TOF) data and the heterogeneous sound speed field using a full-wave solver to generate the training data. This novel application of operator learning circumvents the need to solve the computationally intensive iterative inverse problem. The operator learns the non-linear mapping offline and predicts the heterogeneous sound field with a single forward pass through the model. This is the first time operator learning has been used for ultrasound tomography and is the first step in potential real-time predictions of soft tissue distribution for tumor identification in breast imaging.
S. Fang, S. Zhe, H.M. Lin, A.A. Azad, H. Fettke, E.M. Kwan, L. Horvath, B. Mak, T. Zheng, P. Du, S. Jia, R.M. Kirby, M. Kohli. Multi-Omic Integration of Blood-Based Tumor-Associated Genomic and Lipidomic Profiles Using Machine Learning Models in Metastatic Prostate Cancer, In Clinical Cancer Informatics, 2023.
To determine prognostic and predictive clinical outcomes in metastatic hormone-sensitive prostate cancer (mHSPC) and metastatic castrate-resistant prostate cancer (mCRPC) on the basis of a combination of plasma-derived genomic alterations and lipid features in a longitudinal cohort of patients with advanced prostate cancer.
A multifeature classifier was constructed to predict clinical outcomes using plasma-based genomic alterations detected in 120 genes and 772 lipidomic species as informative features in a cohort of 71 patients with mHSPC and 144 patients with mCRPC. Outcomes of interest were collected over 11 years of follow-up. In the mHSPC state, these included early failure of androgen-deprivation therapy (ADT) and exceptional response to ADT; in the mCRPC state, early death (poor prognosis) and long-term survival. The approach was to build binary classification models that identified discriminative candidates with optimal weights to predict outcomes. To achieve this, we built multi-omic feature-based classifiers using traditional machine learning (ML) methods, including logistic regression with sparse regularization, multi-kernel Gaussian process regression, and support vector machines.
The levels of specific ceramides (d18:1/14:0 and d18:1/17:0), and the presence of CHEK2 mutations, AR amplification, and RB1 deletion were identified as the most crucial factors associated with clinical outcomes. Using ML models, the optimal multi-omics feature combination resulted in AUC scores of 0.751 for predicting mHSPC survival and 0.638 for predicting ADT failure; and in the mCRPC state, 0.687 for prognostication and 0.727 for exceptional survival. The models were observed to be superior to those using a limited number of candidate features for developing multi-omic prognostic and predictive signatures.
Using a ML approach that incorporates multiple omic features improves the prediction accuracy for metastatic prostate cancer outcomes significantly. Validation of these models will be needed in independent data sets in future.
S. Li, X. Yu, W. Xing, R.M. Kirby, A. Narayan, S. Zhe. Multi-Resolution Active Learning of Fourier Neural Operators, Subtitled arXiv:2309.16971, 2023.
Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.
D. Long, W.W. Xing, A.S. Krishnapriyan, R.M. Kirby, S. Zhe, M.W. Mahoney. Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels, Subtitled arXiv:2310.05387v1, 2023.
Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior — an ideal Bayesian sparse distribution — for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.
T.A.J. Ouermi, R.M. Kirby, M. Berzins. HiPPIS: A High-Order Positivity-Preserving Mapping Software for Structured Meshes, In ACM Trans. Math. Softw., ACM, Nov, 2023.
Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative unphysical quantities. Currently, most polynomial-based methods for enforcing positivity are based on splines and polynomial rescaling. The spline-based approaches build interpolants that are positive over the intervals in which they are defined and may require solving a minimization problem and/or system of equations. The linear polynomial rescaling methods allow for high-degree polynomials but enforce positivity only at limited locations (e.g., quadrature nodes). This work introduces open-source software (HiPPIS) for high-order data-bounded interpolation (DBI) and positivity-preserving interpolation (PPI) that addresses the limitations of both the spline and polynomial rescaling methods. HiPPIS is suitable for approximating and mapping physical quantities such as mass, density, and concentration between meshes while preserving positivity. This work provides Fortran and Matlab implementations of the DBI and PPI methods, presents an analysis of the mapping error in the context of PDEs, and uses several 1D and 2D numerical examples to demonstrate the benefits and limitations of HiPPIS.
Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs on new tasks, for which parameterized PDEs provide a good testbed application as tasks can be easily defined in this context. Following the ML world, we introduce metalearning of PINNs with application to parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs as well as implementation considerations and algorithmic complexity. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.
M. Penwarden, A.D. Jagtap, S. Zhe, G.E. Karniadakis, R.M. Kirby. A unified scalable framework for causal sweeping strategies for Physics-Informed Neural Networks (PINNs) and their temporal decompositions, Subtitled arXiv:2302.14227v1, 2023.
Physics-informed neural networks (PINNs) as a means of solving partial differential equations (PDEs) have garnered much attention in the Computational Science and Engineering (CS&E) world. However, a recent topic of interest is exploring various training (i.e., optimization) challenges: in particular, arriving at poor local minima in the optimization landscape results in a PINN approximation giving an inferior, and sometimes trivial, solution when solving forward time-dependent PDEs with no data. This problem is also found in, and is in some sense more difficult with, domain decomposition strategies such as temporal decomposition using XPINNs. To address this problem, we first enable a general categorization for previous causality methods, from which we identify a gap (i.e., an opportunity) in the previous approaches. We then furnish examples and explanations for different training challenges, their cause, and how they relate to information propagation and temporal decomposition. We propose a solution to fill this gap by reframing these causality concepts into a generalized information propagation framework in which any prior method or combination of methods can be described. This framework is easily modifiable via user parameters in the open-source code accompanying this paper. Our unified framework moves toward reducing the number of PINN methods to consider and the reimplementation and retuning cost for thorough comparisons rather than increasing it. Using the idea of information propagation, we propose a new stacked-decomposition method that bridges the gap between time-marching PINNs and XPINNs. We also introduce significant computational speed-ups by using transfer learning concepts to initialize subnetworks in the domain and loss tolerance-based propagation for the subdomains.
Finally, we formulate a new time-sweeping collocation point algorithm inspired by the previous PINNs causality literature, which our framework can still describe, and provides a significant computational speed-up via reduced-cost collocation point segmentation. The proposed methods overcome training challenges in PINNs and XPINNs for time-dependent PDEs by respecting the causality in multiple forms and improving scalability by limiting the computation required per optimization iteration. Finally, we provide numerical results for these methods on baseline PDE problems for which unmodified PINNs and XPINNs struggle to train.
K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis. Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils, Subtitled arXiv:2302.00807v1, 2023.
Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results which display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work.
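The four-digit parametrization mentioned above can be illustrated with a minimal sketch. It uses the standard NACA four-digit camber and thickness formulas (open trailing-edge form) and is not taken from the paper's code; the function name `naca4_surface` is hypothetical.

```python
import math


def naca4_surface(code: str, x: float) -> tuple:
    """Return (camber, half_thickness) at chord fraction x in [0, 1].

    `code` is a four-digit string such as "2412": first digit is the
    maximum camber in percent of chord, second digit is the position of
    maximum camber in tenths of chord, last two digits are the maximum
    thickness in percent of chord.
    """
    m = int(code[0]) / 100.0   # maximum camber
    p = int(code[1]) / 10.0    # chordwise location of maximum camber
    t = int(code[2:]) / 100.0  # maximum thickness

    # Standard thickness distribution (open trailing-edge coefficients).
    yt = 5.0 * t * (
        0.2969 * math.sqrt(x)
        - 0.1260 * x
        - 0.3516 * x**2
        + 0.2843 * x**3
        - 0.1015 * x**4
    )

    # Mean camber line: two parabolic arcs joined at x = p.
    if m == 0.0 or p == 0.0:
        yc = 0.0  # symmetric airfoil
    elif x < p:
        yc = m / p**2 * (2.0 * p * x - x**2)
    else:
        yc = m / (1.0 - p) ** 2 * ((1.0 - 2.0 * p) + 2.0 * p * x - x**2)
    return yc, yt
```

Because the whole shape is determined by three numbers (camber, camber position, thickness), a shape optimizer only has to search a small, bounded design space, which is presumably why this family is a convenient test case.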
K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, N. Plewacki, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis. Deep neural operators as accurate surrogates for shape optimization, In Engineering Applications of Artificial Intelligence, Vol. 129, pp. 107615, 2023.
Deep neural operators, such as DeepONet, have changed the paradigm in high-dimensional nonlinear regression, paving the way for significant generalization and speed-up in computational engineering applications. Here, we investigate the use of DeepONet to infer flow fields around unseen airfoils with the aim of shape constrained optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results that display little to no degradation in prediction accuracy while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as the four-digit parameterization can easily define their shape. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have a low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work. Finally, we validate the ability of DeepONet to handle a complex 3D waverider geometry at hypersonic flight by inferring shear stress and heat flux distributions on its surface at unseen angles of attack. The main contribution of this paper is a modular integrated design framework that uses an over-parametrized neural operator as a surrogate model with good generalizability coupled seamlessly with multiple optimization solvers in a plug-and-play mode.
S. Fang, A. Narayan, R.M. Kirby, S. Zhe. Bayesian Continuous-Time Tucker Decomposition, In Proceedings of the 39th International Conference on Machine Learning, 2022.
Tensor decomposition is a dominant framework for multiway data analysis and prediction. Although practical data often contains timestamps for the observed entries, existing tensor decomposition approaches overlook or under-use this valuable time information. They either drop the timestamps or bin them into crude steps and hence ignore the temporal dynamics within each step or use simple parametric time coefficients. To overcome these limitations, we propose Bayesian Continuous-Time Tucker Decomposition (BCTT). We model the tensor-core of the classical Tucker decomposition as a time-varying function, and place a Gaussian process prior to flexibly estimate all kinds of temporal dynamics. In this way, our model maintains interpretability while being flexible enough to capture various complex temporal relationships between the tensor nodes. For efficient and high-quality posterior inference, we use the stochastic differential equation (SDE) representation of temporal GPs to build an equivalent state-space prior, which avoids huge kernel matrix computation and sparse/low-rank approximations. We then use Kalman filtering, RTS smoothing, and conditional moment matching to develop a scalable message-passing inference algorithm. We show the advantage of our method in simulation and several real-world applications.
J.D. Hogue, R.M. Kirby, A. Narayan. Dimensionality Reduction in Deep Learning via Kronecker Multi-layer Architectures, Subtitled arXiv:2204.04273, 2022.
Deep learning using neural networks is an effective technique for generating models of complex data. However, training such models can be expensive when networks have large model capacity resulting from a large number of layers and nodes. For training in such a computationally prohibitive regime, dimensionality reduction techniques ease the computational burden, and allow implementations of more robust networks. We propose a novel type of such dimensionality reduction via a new deep learning architecture based on fast matrix multiplication of a Kronecker product decomposition; in particular our network construction can be viewed as a Kronecker product-induced sparsification of an "extended" fully connected network. Analysis and practical examples show that this architecture allows a neural network to be trained and implemented with a significant reduction in computational time and resources, while achieving a similar error level compared to a traditional feedforward neural network.
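To illustrate why a Kronecker product structure can cut the cost of a layer, the sketch below shows the standard identity that lets a Kronecker-structured weight matrix act on a vector without ever being formed. This is only the underlying linear-algebra fact, not the paper's architecture; `kron_matvec` and the matrix sizes are hypothetical.

```python
import numpy as np


def kron_matvec(A, B, x):
    """Compute np.kron(A, B) @ x without forming the Kronecker product.

    With NumPy's row-major layout, reshaping x to (A.shape[1], B.shape[1])
    gives kron(A, B) @ x == (A @ X @ B.T).reshape(-1).
    """
    X = x.reshape(A.shape[1], B.shape[1])
    return (A @ X @ B.T).reshape(-1)


rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((5, 2))
x = rng.standard_normal(4 * 2)

dense = np.kron(A, B) @ x    # stores a 15 x 8 matrix
fast = kron_matvec(A, B, x)  # stores only the 3 x 4 and 5 x 2 factors
```

The dense layer here needs 15 x 8 = 120 weights, while the two factors need 12 + 10 = 22, and the gap widens quickly as the factor sizes grow.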
V. Keshavarzzadeh, R.M. Kirby, A. Narayan. Variational Inference for Nonlinear Inverse Problems via Neural Net Kernels: Comparison to Bayesian Neural Networks, Application to Topology Optimization, Subtitled arXiv:2205.03681, 2022.
Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information about the parameters and the information from the observations via likelihood evaluations are incorporated into the inference process. In this paper, we adopt a similar viewpoint with a slightly different numerical procedure from standard inference approaches to provide insight about the localized behavior of unknown underlying parameters. We present a variational inference approach which mainly incorporates the observation data in a point-wise manner, i.e. we invert a limited number of observation data leveraging the gradient information of the forward map with respect to parameters, and find true individual samples of the latent parameters when the forward map is noise-free and one-to-one. For statistical calculations (as the ultimate goal in simulations), a large number of samples are generated from a trained neural network which serves as a transport map from the prior to posterior latent parameters. Our neural network machinery, developed as part of the inference framework and referred to as Neural Net Kernels (NNK), is based on hierarchical (deep) kernels which provide greater flexibility for training compared to standard neural networks. We showcase the effectiveness of our inference procedure in identifying bimodal and irregular distributions compared to a number of approaches including Markov Chain Monte Carlo sampling approaches and a Bayesian neural network approach.
S. Li, R.M. Kirby, S. Zhe. Decomposing Temporal High-Order Interactions via Latent ODEs, In Proceedings of the 39th International Conference on Machine Learning, 2022.
High-order interactions between multiple objects are common in real-world applications. Although tensor decomposition is a popular framework for high-order interaction analysis and prediction, most methods cannot fully exploit the valuable timestamp information in data. The existing methods either discard the timestamps or convert them into discrete steps or use over-simplistic decomposition models. As a result, these methods might not be capable of capturing complex, fine-grained temporal dynamics or making accurate predictions for long-term interaction results. To overcome these limitations, we propose a novel Temporal High-order Interaction decompoSition model based on Ordinary Differential Equations (THIS-ODE). We model the time-varying interaction result with a latent ODE. To capture the complex temporal dynamics, we use a neural network (NN) to learn the time derivative of the ODE state. We use the representation of the interaction objects to model the initial value of the ODE and to constitute a part of the NN input to compute the state. In this way, the temporal relationships of the participant objects can be estimated and encoded into their representations. For tractable and scalable inference, we use forward sensitivity analysis to efficiently compute the gradient of ODE state, based on which we use integral transform to develop a stochastic mini-batch learning algorithm. We demonstrate the advantage of our approach in simulation and four real-world applications.
S. Li, Z Wang, R.M. Kirby, S. Zhe. Infinite-Fidelity Coregionalization for Physical Simulation, Subtitled arXiv:2207.00678, 2022.
Multi-fidelity modeling and learning are important in physical simulation-related applications. They can leverage both low-fidelity and high-fidelity examples for training so as to reduce the cost of data generation while still achieving good performance. While existing approaches only model finite, discrete fidelities, in practice, the fidelity choice is often continuous and infinite, which can correspond to a continuous mesh spacing or finite element length. In this paper, we propose Infinite Fidelity Coregionalization (IFC). Given the data, our method can extract and exploit rich information within continuous, infinite fidelities to bolster the prediction accuracy. Our model can interpolate and/or extrapolate the predictions to novel fidelities, which can be even higher than the fidelities of training data. Specifically, we introduce a low-dimensional latent output as a continuous function of the fidelity and input, and multiply it with a basis matrix to predict high-dimensional solution outputs. We model the latent output as a neural Ordinary Differential Equation (ODE) to capture the complex relationships within and integrate information throughout the continuous fidelities. We then use Gaussian processes or another ODE to estimate the fidelity-varying bases. For efficient inference, we reorganize the bases as a tensor, and use a tensor-Gaussian variational posterior to develop a scalable inference algorithm for massive outputs. We show the advantage of our method in several benchmark tasks in computational physics.
S. Li, J.M. Phillips, X. Yu, R.M. Kirby, S. Zhe. Batch Multi-Fidelity Active Learning with Budget Constraints, Subtitled arXiv:2210.12704v1, 2022.
Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries one pair of fidelity and input at a time, and hence risks bringing in strongly correlated examples that reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourage diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near (1 − 1/e)-approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.
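The general shape of a cost-weighted greedy selection under a budget can be sketched as follows. This is a toy illustration, not the paper's algorithm: a simple coverage count stands in for the mutual-information acquisition function, and the candidate names, costs, and the function `weighted_greedy` are all hypothetical.

```python
def weighted_greedy(candidates, budget):
    """Pick queries one at a time by marginal gain per unit cost.

    candidates: dict mapping a query name to (cost, set_of_covered_items).
    The coverage count is a stand-in utility; overlapping queries add
    little marginal gain, which mimics the penalty on correlated queries.
    """
    chosen, covered, spent = [], set(), 0.0
    remaining = dict(candidates)
    while remaining:
        best, best_ratio = None, 0.0
        for name, (cost, items) in remaining.items():
            if spent + cost > budget:
                continue  # this query no longer fits in the budget
            ratio = len(items - covered) / cost
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable adds any utility
        cost, items = remaining.pop(best)
        chosen.append(best)
        covered |= items
        spent += cost
    return chosen, spent


queries = {
    "low_fidelity_a": (1.0, {1, 2}),
    "low_fidelity_b": (1.0, {2, 3}),
    "high_fidelity_a": (4.0, {1, 2, 3, 4, 5}),
}
batch, cost_used = weighted_greedy(queries, budget=5.0)
```

In this toy run the two cheap queries are selected first because their gain per unit cost is higher, after which the expensive query no longer fits in the remaining budget.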
S. Li, M. Penwarden, R.M. Kirby, S. Zhe. Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks, Subtitled arXiv preprint arXiv:2210.12669, 2022.
Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, applying different PINNs to solve the equation in each subdomain and aligning the solution at the interface of the subdomains. Hence, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of the multi-domain PINNs is sensitive to the choice of the interface conditions for solution alignment. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine the optimal interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit models. The first one applies to the entire training procedure, and updates online a Gaussian process (GP) reward surrogate that, given the PDE parameters and interface conditions, predicts the solution error. The second one partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP surrogate for each phase to enable different condition selections at the two stages so as to further bolster the flexibility and performance. We have shown the advantage of METALIC on four benchmark PDE families.
T. Nguyen, R.G. Baraniuk, R.M. Kirby, S.J. Osher, B. Wang. Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization, Subtitled arXiv preprint arXiv:2208.00579, 2022.
Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques including sparse and linear attention and hashing tricks, efficient transformers have been proposed to reduce the quadratic complexity of transformers, but they significantly degrade the accuracy. In response, we first interpret the linear attention and residual connections in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the momentum transformer, which utilizes momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexities. Furthermore, we develop an adaptive strategy to compute the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy.
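The linear complexity referred to above comes from reordering the matrix products in kernelized attention. The sketch below shows generic non-causal linear attention with an `elu(x) + 1` feature map, which is a common choice in the linear-attention literature and an assumption here; it does not implement the momentum variant, and all function names are hypothetical.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so always positive.
    return np.where(x > 0, x + 1.0, np.exp(x))


def attention_quadratic(Q, K, V):
    """Kernelized attention that forms the full N x N similarity matrix."""
    S = feature_map(Q) @ feature_map(K).T
    return (S @ V) / S.sum(axis=1, keepdims=True)


def attention_linear(Q, K, V):
    """Same result, but only d x d_v and length-d summaries of K and V are built."""
    phi_q, phi_k = feature_map(Q), feature_map(K)
    kv = phi_k.T @ V                     # (d, d_v), cost linear in N
    norm = phi_q @ phi_k.sum(axis=0)     # (N,), row normalizers
    return (phi_q @ kv) / norm[:, None]


rng = np.random.default_rng(0)
N, d, d_v = 6, 4, 3
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d_v))
out_quadratic = attention_quadratic(Q, K, V)
out_linear = attention_linear(Q, K, V)
```

Both functions return the same values; the second simply never materializes the N x N matrix, so its time and memory grow linearly with the sequence length.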
T.A.J. Ouermi, R.M. Kirby, M. Berzins. ENO-Based High-Order Data-Bounded and Constrained Positivity-Preserving Interpolation, Subtitled https://arxiv.org/abs/2204.06168, In Numerical Algorithms, 2022.
A number of key scientific computing applications that are based upon tensor-product grid constructions, such as numerical weather prediction (NWP) and combustion simulations, require property-preserving interpolation. Essentially Non-Oscillatory (ENO) interpolation is a classic example of such interpolation schemes. In the aforementioned application areas, property preservation often manifests itself as a requirement for either data boundedness or positivity preservation. For example, in NWP, one may have to interpolate between the grid on which the dynamics is calculated to a grid on which the physics is calculated (and back). Interpolating density or other key physical quantities without accounting for property preservation may lead to negative values that are nonphysical and result in inaccurate representations and/or interpretations of the physical data. Property-preserving interpolation is straightforward when used in the context of low-order numerical simulation methods. High-order property-preserving interpolation is, however, nontrivial, especially in the case where the interpolation points are not equispaced. In this paper, we demonstrate that it is possible to construct high-order interpolation methods that ensure either data boundedness or constrained positivity preservation. A novel feature of the algorithm is that the positivity-preserving interpolant is constrained; that is, the amount by which it exceeds the data values may be strictly controlled. The algorithm we have developed comes with theoretical estimates that provide sufficient conditions for data boundedness and constrained positivity preservation. We demonstrate the application of our algorithm on a collection of 1D and 2D numerical examples, and show that in all cases property preservation is respected.
S. Subramanian, R.M. Kirby, M.W. Mahoney, A. Gholami. Adaptive Self-supervision Algorithms for Physics-informed Neural Networks, Subtitled arXiv:2207.04084, 2022.
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme which progressively allocates more collocation points (without increasing their number) to areas where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points in order to adjust the loss landscape) leads to better estimates for the prediction error. We present results for several problems, including a 2D Poisson and diffusion-advection system with different forcing functions. We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially in the regime of low collocation points. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with similar computational complexity as the baseline. Furthermore, we find that the adaptive methods consistently perform on-par or slightly better than vanilla PINN method, even for large collocation point regimes. The code for all the experiments has been open sourced.
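The idea of moving a fixed number of collocation points toward high-error regions can be sketched as weighted resampling from a candidate pool. This is a simplified illustration rather than the paper's scheme: the per-point `scores` stand in for whatever error proxy is used (the abstract mentions the gradient of the loss), and the uniform mixing fraction and the function `resample_collocation` are assumptions.

```python
import random


def resample_collocation(candidates, scores, n_points, uniform_fraction=0.2, seed=0):
    """Draw n_points collocation points, favoring candidates with high scores.

    A uniform component keeps some coverage of the whole domain so that
    low-score regions are not abandoned entirely (an assumed safeguard).
    """
    rng = random.Random(seed)
    n = len(candidates)
    total = sum(scores)
    if total <= 0.0:
        weights = [1.0] * n  # no error signal yet: fall back to uniform
    else:
        weights = [
            uniform_fraction / n + (1.0 - uniform_fraction) * s / total
            for s in scores
        ]
    return rng.choices(candidates, weights=weights, k=n_points)


# Toy 1D domain where the error proxy is large only near the right boundary.
pool = [i / 99 for i in range(100)]
proxy = [100.0 if x > 0.9 else 1.0 for x in pool]
points = resample_collocation(pool, proxy, n_points=1000)
```

The number of points stays fixed at `n_points`; only their distribution changes, so the per-iteration training cost is comparable to that of uniform sampling.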