
SCI Publications

2018


M. Hajij, B. Wang, C. Scheidegger, P. Rosen. “Visual Detection of Structural Changes in Time-Varying Graphs Using Persistent Homology,” In 2018 IEEE Pacific Visualization Symposium (PacificVis), IEEE, April, 2018.
DOI: 10.1109/pacificvis.2018.00024

ABSTRACT

Topological data analysis is an emerging area in exploratory data analysis and data mining. Its main tool, persistent homology, has become a popular technique to study the structure of complex, high-dimensional data. In this paper, we propose a novel method using persistent homology to quantify structural changes in time-varying graphs. Specifically, we transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time. We provide a visualization that assists in time-varying graph exploration and helps to identify patterns of behavior within the data. To validate our approach, we conduct several case studies on real-world datasets and show how our method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. We also examine whether a persistence-based similarity measure satisfies a set of well-established, desirable properties for graph metrics.
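The pipeline this abstract describes (graph instance, then metric space, then topological features) can be illustrated compactly. The sketch below is ours, not the authors' code: it uses shortest-path distances as the metric and recovers only the 0-dimensional persistence pairs, whose death times are exactly the minimum-spanning-tree edge weights of the metric space; the paper also uses higher-dimensional features and a persistence-based similarity measure.

```python
# Minimal sketch: one time-step of a time-varying graph -> metric space ->
# 0-dimensional persistence (single-linkage merge heights via Kruskal's MST).
from itertools import combinations

def shortest_path_metric(nodes, wedges):
    """All-pairs shortest paths (Floyd-Warshall) on a weighted undirected graph.
    wedges: dict {(u, v): weight}."""
    INF = float("inf")
    d = {(u, v): (0 if u == v else INF) for u in nodes for v in nodes}
    for (u, v), w in wedges.items():
        d[u, v] = d[v, u] = min(d[u, v], w)
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if d[i, k] + d[k, j] < d[i, j]:
                    d[i, j] = d[i, k] + d[k, j]
    return d

def zero_dim_persistence(nodes, dist):
    """Death times of 0-dim features of the Rips filtration of a finite
    metric space = weights of a minimum spanning tree (Kruskal)."""
    parent = {v: v for v in nodes}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    deaths = []
    for u, v in sorted(combinations(nodes, 2), key=lambda e: dist[e]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            deaths.append(dist[u, v])
    return sorted(deaths)

# Toy graph: a tight path a-b-c with a far-away node d.
nodes = ["a", "b", "c", "d"]
wedges = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "d"): 5.0}
dist = shortest_path_metric(nodes, wedges)
deaths = zero_dim_persistence(nodes, dist)  # components merge at 1.0, 1.0, 5.0
```

Comparing such death-time vectors (or full persistence diagrams) across time steps is what surfaces cyclic patterns and one-time events.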



M. Hajij, B. Wang, P. Rosen. “MOG: Mapper on Graphs for Relationship Preserving Clustering,” In CoRR, 2018.

ABSTRACT

The interconnected nature of graphs often results in clutter that is difficult to interpret. Existing techniques typically declutter either by clustering nodes with similar properties or by grouping edges with similar relationships. We propose using mapper, a powerful topological data analysis tool, to summarize the structure of a graph in a way that both clusters data with similar properties and preserves relationships. Typically, mapper operates on given data by utilizing a scalar function defined on every point in the data and a cover of the scalar function's codomain. The output of mapper is a graph that summarizes the shape of the space. In this paper, we outline how to use this mapper construction on an input graph, outline three filter functions that capture important structures of the input graph, and provide an interface for interactively modifying the cover. To validate our approach, we conduct several case studies on synthetic and real-world data sets and demonstrate how our method can give meaningful summaries for graphs of various complexities.



J. Hampton, HR. Fairbanks, A. Narayan, A. Doostan. “Practical error bounds for a non-intrusive bi-fidelity approach to parametric/stochastic model reduction,” In Journal of Computational Physics, Vol. 368, Elsevier BV, pp. 315--332. September, 2018.
DOI: 10.1016/j.jcp.2018.04.015

ABSTRACT

For practical model-based demands, such as design space exploration and uncertainty quantification (UQ), a high-fidelity model that produces accurate outputs often has high computational cost, while a low-fidelity model with less accurate outputs has low computational cost. It is often possible to construct a bi-fidelity model having accuracy comparable with the high-fidelity model and computational cost comparable with the low-fidelity model. This work presents the construction and analysis of a non-intrusive (i.e., sample-based) bi-fidelity model that relies on the low-rank structure of the map between model parameters/uncertain inputs and the solution of interest, if it exists. Specifically, we derive a novel, pragmatic estimate for the error committed by this bi-fidelity model. We show that this error bound can be used to determine if a given pair of low- and high-fidelity models will lead to an accurate bi-fidelity approximation. The cost of this error bound is relatively small and depends on the solution rank. The value of this error estimate is demonstrated using two example problems in the context of UQ, involving linear and non-linear partial differential equations.
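A toy instance of the non-intrusive bi-fidelity idea is easy to write down. The sketch below is our own (function names, the greedy column-selection rule, and the rank-2 test problem are illustrative assumptions, not the paper's algorithm or error bound): many cheap low-fidelity snapshots reveal low-rank structure, a few informative parameter values are selected, high-fidelity solves are done only there, and low-fidelity expansion coefficients are reused with the high-fidelity columns.

```python
# Bi-fidelity approximation sketch: pivoted column selection on low-fidelity
# snapshots + coefficient reuse with a handful of high-fidelity columns.
import numpy as np

def select_columns(U, k):
    """Greedy pivoted selection: repeatedly pick the column with the largest
    residual norm after projecting out the columns already chosen."""
    R = U.astype(float).copy()
    chosen = []
    for _ in range(k):
        j = int(np.argmax(np.linalg.norm(R, axis=0)))
        chosen.append(j)
        q = R[:, j] / np.linalg.norm(R[:, j])
        R -= np.outer(q, q @ R)
    return chosen

def bifidelity_approx(U_lo, U_hi_cols, chosen, j):
    """Expand the low-fidelity snapshot at parameter j in the chosen
    low-fidelity columns; reuse those coefficients on high-fidelity columns."""
    coef, *_ = np.linalg.lstsq(U_lo[:, chosen], U_lo[:, j], rcond=None)
    return U_hi_cols @ coef

# Toy rank-2 parametric models: both fidelities share coefficients a(p), b(p).
p = np.linspace(0.0, 1.0, 6)
a, b = 1.0 + p, p ** 2
U_lo = np.vstack([a, b])                                             # 2 x 6
U_hi = np.outer([1.0, 2.0, 3.0], a) + np.outer([0.0, 1.0, -1.0], b)  # 3 x 6
chosen = select_columns(U_lo, 2)        # only 2 high-fidelity runs needed
u_bf = bifidelity_approx(U_lo, U_hi[:, chosen], chosen, 4)
```

Because the toy map is exactly rank 2, the bi-fidelity reconstruction here is exact; the paper's contribution is an error bound for the realistic case where low-rank structure holds only approximately.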



Y. He, M. Razi, C. Forestiere, L. Dal Negro, R.M. Kirby. “Uncertainty quantification guided robust design for nanoparticles' morphology,” In Computer Methods in Applied Mechanics and Engineering, Elsevier BV, pp. 578--593. July, 2018.
DOI: 10.1016/j.cma.2018.03.027

ABSTRACT

The automatic inverse design of three-dimensional plasmonic nanoparticles enables scientists and engineers to explore a wide design space and to maximize a device's performance. However, due to the large uncertainty in the nanofabrication process, we may not be able to obtain a deterministic value of the objective, and the objective may vary dramatically with respect to a small variation in uncertain parameters. Therefore, we take into account the uncertainty in simulations and adopt a classical robust design model for a robust design. In addition, we propose an efficient numerical procedure for the robust design to reduce the computational cost of the process caused by the consideration of the uncertainty. Specifically, we use a global sensitivity analysis method to identify the important random variables and consider the non-important ones as deterministic, and consequently reduce the dimension of the stochastic space. In addition, we apply the generalized polynomial chaos expansion method for constructing computationally cheaper surrogate models to approximate and replace the full simulations. This efficient robust design procedure is performed by varying the particles' material among the most commonly used plasmonic materials such as gold, silver, and aluminum, to obtain different robust optimal shapes for the best enhancement of electric fields.
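Two ingredients from this abstract, the generalized polynomial chaos surrogate and the mean/variance robust objective, can be shown on a one-dimensional toy problem. This is a hedged sketch under our own assumptions (Legendre basis, a uniform random input on [-1, 1], a linear toy model), not the paper's electromagnetic solver or its sensitivity-analysis step.

```python
# Legendre polynomial-chaos surrogate by least squares, plus the
# mean-minus-k-sigma form of a robust design objective.
import numpy as np

def legendre_design(x, order):
    """Matrix of Legendre polynomials P_0..P_order evaluated at x in [-1, 1],
    built with the recurrence (n+1) P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    cols = [np.ones_like(x), x]
    for n in range(1, order):
        cols.append(((2 * n + 1) * x * cols[n] - n * cols[n - 1]) / (n + 1))
    return np.column_stack(cols[:order + 1])

def pce_fit(x, y, order):
    A = legendre_design(x, order)
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    return c

def pce_mean_var(c):
    """Mean/variance for xi ~ U(-1, 1), using E[P_n^2] = 1/(2n+1)."""
    norms = 1.0 / (2 * np.arange(len(c)) + 1)
    return c[0], float(np.sum(c[1:] ** 2 * norms[1:]))

# Toy model runs: f(xi) = 2 + 3 xi, fit with a degree-3 surrogate.
x = np.linspace(-1.0, 1.0, 9)
c = pce_fit(x, 2.0 + 3.0 * x, order=3)
mean, var = pce_mean_var(c)
robust = mean - 2.0 * np.sqrt(var)   # classical robust objective: mean - k*sigma
```

Once the surrogate replaces the full simulation, the robust objective can be optimized cheaply over design parameters, which is the cost reduction the abstract describes.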



A. Jallepalli, J. Docampo-Sánchez, J.K. Ryan, R. Haimes, R.M. Kirby. “On the treatment of field quantities and elemental continuity in fem solutions,” In IEEE Transactions on Visualization and Computer Graphics, Vol. 24, No. 1, IEEE, pp. 903--912. Jan, 2018.
DOI: 10.1109/tvcg.2017.2744058

ABSTRACT

As the finite element method (FEM) and the finite volume method (FVM), both traditional and high-order variants, continue their proliferation into various applied engineering disciplines, it is important that the visualization techniques and corresponding data analysis tools that act on the results produced by these methods faithfully represent the underlying data. To state this in another way: the interpretation of data generated by simulation needs to be consistent with the numerical schemes that underpin the specific solver technology. As the verifiable visualization literature has demonstrated: visual artifacts produced by the introduction of either explicit or implicit data transformations, such as data resampling, can sometimes distort or even obfuscate key scientific features in the data. In this paper, we focus on the handling of elemental continuity, which is often only C0 continuous or piecewise discontinuous, when visualizing primary or derived fields from FEM or FVM simulations. We demonstrate that traditional data handling and visualization of these fields introduce visual errors. In addition, we show how the use of the recently proposed line-SIAC filter provides a way of handling elemental continuity issues in an accuracy-conserving manner with the added benefit of casting the data in a smooth context even if the representation is element discontinuous.



A. Janson, C. Butson. “Targeting Neuronal Fiber Tracts for Deep Brain Stimulation Therapy Using Interactive, Patient-Specific Models,” In Journal of Visualized Experiments, No. 138, MyJove Corporation, Aug, 2018.
DOI: 10.3791/57292

ABSTRACT

Deep brain stimulation (DBS), which involves insertion of an electrode to deliver stimulation to a localized brain region, is an established therapy for movement disorders and is being applied to a growing number of disorders. Computational modeling has been successfully used to predict the clinical effects of DBS; however, there is a need for novel modeling techniques to keep pace with the growing complexity of DBS devices. These models also need to generate predictions quickly and accurately. The goal of this project is to develop an image processing pipeline to incorporate structural magnetic resonance imaging (MRI) and diffusion weighted imaging (DWI) into an interactive, patient specific model to simulate the effects of DBS. A virtual DBS lead can be placed inside of the patient model, along with active contacts and stimulation settings, where changes in lead position or orientation generate a new finite element mesh and solution of the bioelectric field problem in near real-time, a timespan of approximately 10 seconds. This system also enables the simulation of multiple leads in close proximity to allow for current steering by varying anodes and cathodes on different leads. The techniques presented in this paper reduce the burden of generating and using computational models while providing meaningful feedback about the effects of electrode position, electrode design, and stimulation configurations to researchers or clinicians who may not be modeling experts.



M. Javanmardi, T. Tasdizen. “Domain adaptation for biomedical image segmentation using adversarial training,” In 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018), IEEE, pp. 554-558. April, 2018.
DOI: 10.1109/isbi.2018.8363637

ABSTRACT

Many biomedical image analysis applications require segmentation. Convolutional neural networks (CNN) have become a promising approach to segment biomedical images; however, the accuracy of these methods is highly dependent on the training data. We focus on biomedical image segmentation in the context where there is variation between source and target datasets and ground truth for the target dataset is very limited or non-existent. We use an adversarial training approach to train CNNs to achieve good accuracy on the target domain. We use the DRIVE and STARE eye vasculature segmentation datasets and show that our approach can significantly improve results where we only use labels of one domain in training and test on the other domain. We also show improvements on membrane detection between the MICCAI 2016 CREMI challenge and ISBI 2013 EM segmentation challenge datasets.



A.L. Kapron, S.K. Aoki, J.A. Weiss, A.J. Krych, T.G. Maak. “Isolated focal cartilage and labral defects in patients with femoroacetabular impingement syndrome may represent new, unique injury patterns,” In Knee Surgery, Sports Traumatology, Arthroscopy, Springer Nature, Feb, 2018.
DOI: 10.1007/s00167-018-4861-2

ABSTRACT

Purpose

Develop a framework to quantify the size, location and severity of femoral and acetabular-sided cartilage and labral damage observed in patients undergoing hip arthroscopy, and generate a database of individual defect parameters to facilitate future research and treatment efforts.

Methods

The size, location, and severity of cartilage and labral damage were prospectively collected using a custom, standardized post-operative template for 100 consecutive patients with femoroacetabular impingement syndrome. Chondrolabral junction damage, isolated intrasubstance labral damage, isolated acetabular cartilage damage and femoral cartilage damage were quantified and recorded using a combination of Beck and ICRS criteria. Radiographic measurements including alpha angle, head–neck offset, lateral centre edge angle and acetabular index were calculated and compared to the aforementioned chondral data using a multivariable logistic regression model and adjusted odds ratios. Reliability among measurements was assessed using the kappa statistic, and intraclass coefficients were used to evaluate continuous variables.

Results

Damage to the acetabular cartilage originating at the chondrolabral junction was the most common finding in 97 hips (97%) and was usually accompanied by labral damage in 65 hips (65%). The width (p = 0.003) and clock-face length (p = 0.016) of the damaged region both increased with alpha angle on anteroposterior films. 10% of hips had femoral cartilage damage, while only 2 hips (2%) had isolated defects to either the acetabular cartilage or labrum. The adjusted odds of severe cartilage (p = 0.022) and labral damage (p = 0.046) increased with radiographic cam deformity but were not related to radiographic measures of acetabular coverage.

Conclusions

Damage at the chondrolabral junction was very common in this hip arthroscopy cohort, while isolated defects to the acetabular cartilage or labrum were rare. These data demonstrate that the severity of cam morphology, quantified through radiographic measurements, is a primary predictor of location and severity of chondral and labral damage and focal chondral defects may represent a unique subset of patients that deserve further study.



V. Keshavarzzadeh, R.M. Kirby, A. Narayan. “Numerical integration in multiple dimensions with designed quadrature,” In CoRR, 2018.

ABSTRACT

We present a systematic computational framework for generating positive quadrature rules in multiple dimensions on general geometries. A direct moment-matching formulation that enforces exact integration on polynomial subspaces yields nonlinear conditions and geometric constraints on nodes and weights. We use penalty methods to address the geometric constraints, and subsequently solve a quadratic minimization problem via the Gauss-Newton method. Our analysis provides guidance on requisite sizes of quadrature rules for a given polynomial subspace, and furnishes useful user-end stability bounds on error in the quadrature rule in the case when the polynomial moment conditions are violated by a small amount due to, e.g., finite precision limitations or stagnation of the optimization procedure. We present several numerical examples investigating optimal low-degree quadrature rules, Lebesgue constants, and 100-dimensional quadrature. Our capstone examples compare our quadrature approach to popular alternatives, such as sparse grids and quasi-Monte Carlo methods, for problems in linear elasticity and topology optimization.
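The moment-matching formulation can be demonstrated in one dimension. The sketch below is a deliberately simplified instance of the idea, not the paper's multi-dimensional method with geometric constraints and penalty terms: Gauss-Newton drives the monomial-moment residuals on [-1, 1] to zero, and with two nodes and degree 3 it recovers the 2-point Gauss-Legendre rule.

```python
# Gauss-Newton on quadrature moment residuals r_k = sum_i w_i x_i^k - m_k,
# where m_k is the integral of x^k over [-1, 1].
import numpy as np

def designed_quadrature_1d(n_nodes, degree, iters=60):
    x = np.linspace(-0.7, 0.7, n_nodes)            # deterministic initial guess
    w = np.full(n_nodes, 2.0 / n_nodes)
    k = np.arange(degree + 1)[:, None]
    m = np.array([2.0 / (j + 1) if j % 2 == 0 else 0.0
                  for j in range(degree + 1)])
    for _ in range(iters):
        V = x[None, :] ** k                        # V[k, i] = x_i^k
        r = V @ w - m                              # moment residuals
        Jx = k * w[None, :] * x[None, :] ** np.maximum(k - 1, 0)
        J = np.hstack([V, Jx])                     # derivatives wrt (w, x)
        step, *_ = np.linalg.lstsq(J, -r, rcond=None)
        w = w + step[:n_nodes]
        x = x + step[n_nodes:]
    return x, w

# Two nodes matching moments up to degree 3 -> 2-point Gauss-Legendre rule.
x, w = designed_quadrature_1d(2, 3)
```

The full method adds the geometric (domain-membership and positivity) constraints via penalties, which is what makes the approach work on general geometries in many dimensions.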



K. Knudson, B. Wang. “Discrete Stratified Morse Theory: A User's Guide,” In CoRR, 2018.

ABSTRACT

Inspired by the works of Forman on discrete Morse theory, which is a combinatorial adaptation to cell complexes of classical Morse theory on manifolds, we introduce a discrete analogue of the stratified Morse theory of Goresky and MacPherson (1988). We describe the basics of this theory and prove fundamental theorems relating the topology of a general simplicial complex with the critical simplices of a discrete stratified Morse function on the complex. We also provide an algorithm that constructs a discrete stratified Morse function out of an arbitrary function defined on a finite simplicial complex; this is different from simply constructing a discrete Morse function on such a complex. We borrow Forman's idea of a "user's guide," where we give simple examples to convey the utility of our theory.
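In the spirit of the "user's guide," here is a tiny executable example of the classical (non-stratified) notion the paper builds on: Forman's criticality test for a discrete Morse function. The complex, function values, and code are our own illustration; the paper's stratified extension relaxes exactly these conditions.

```python
# Forman criticality check on a simplicial complex, assuming f is a discrete
# Morse function: a simplex is critical when no facet has a larger-or-equal
# value and no cofacet has a smaller-or-equal value.
def critical_simplices(cx, f):
    """cx: list of simplices as frozensets of vertices; f: simplex -> value."""
    crit = []
    for s in cx:
        cofacets = [t for t in cx if len(t) == len(s) + 1 and s < t]
        facets = [t for t in cx if len(t) == len(s) - 1 and t < s]
        if all(f[t] > f[s] for t in cofacets) and all(f[t] < f[s] for t in facets):
            crit.append(s)
    return crit

# A circle as the boundary of a triangle: exactly one critical vertex (the
# minimum) and one critical edge survive, matching the circle's topology.
cx = [frozenset("a"), frozenset("b"), frozenset("c"),
      frozenset("ab"), frozenset("bc"), frozenset("ca")]
f = {frozenset("a"): 0, frozenset("b"): 2, frozenset("c"): 4,
     frozenset("ab"): 1, frozenset("bc"): 3, frozenset("ca"): 5}
crit = critical_simplices(cx, f)
```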



L. Kuhnel, T. Fletcher, S. Joshi, S. Sommer. “Latent Space Non-Linear Statistics,” In CoRR, 2018.

ABSTRACT

Given data, deep generative models, such as variational autoencoders (VAE) and generative adversarial networks (GAN), train a lower dimensional latent representation of the data space. The linear Euclidean geometry of data space pulls back to a nonlinear Riemannian geometry on the latent space. The latent space thus provides a low-dimensional nonlinear representation of data and classical linear statistical techniques are no longer applicable. In this paper we show how statistics of data in their latent space representation can be performed using techniques from the field of nonlinear manifold statistics. Nonlinear manifold statistics provide generalizations of Euclidean statistical notions including means, principal component analysis, and maximum likelihood fits of parametric probability distributions. We develop new techniques for maximum likelihood inference in latent space, and address the computational complexity of using geometric algorithms with high-dimensional data by training a separate neural network to approximate the Riemannian metric and cometric tensor capturing the shape of the learned data manifold.
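The pullback construction the abstract refers to is concrete: if the decoder is g, the Euclidean metric of data space pulls back to G(z) = J_g(z)^T J_g(z) on the latent space. The sketch below uses a toy hand-written "decoder" and finite differences (our own choices; the paper approximates the metric with a trained neural network instead).

```python
# Pullback Riemannian metric of a decoder g: G(z) = J_g(z)^T J_g(z),
# with the Jacobian estimated by forward finite differences.
import numpy as np

def jacobian_fd(g, z, eps=1e-6):
    g0 = g(z)
    J = np.zeros((g0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (g(z + dz) - g0) / eps
    return J

def pullback_metric(g, z):
    J = jacobian_fd(g, z)
    return J.T @ J

# Toy "decoder": embeds a 1-d latent line as a parabola in R^2.
g = lambda z: np.array([z[0], z[0] ** 2])
G = pullback_metric(g, np.array([1.0]))
# At z = 1 the Jacobian is (1, 2)^T, so G = 1^2 + 2^2 = 5.
```

Geodesics, means, and PCA on the latent space are then computed with respect to G rather than the (misleading) flat Euclidean latent metric.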



S. Kumar, A. Humphrey, W. Usher, S. Petruzza, B. Peterson, J. A. Schmidt, D. Harris, B. Isaac, J. Thornock, T. Harman, V. Pascucci, M. Berzins. “Scalable Data Management of the Uintah Simulation Framework for Next-Generation Engineering Problems with Radiation,” In Supercomputing Frontiers, Springer International Publishing, pp. 219--240. 2018.
ISBN: 978-3-319-69953-0
DOI: 10.1007/978-3-319-69953-0_13

ABSTRACT

The need to scale next-generation industrial engineering problems to the largest computational platforms presents unique challenges. This paper focuses on data management related problems faced by the Uintah simulation framework at a production scale of 260K processes. Uintah provides a highly scalable asynchronous many-task runtime system, which in this work is used for the modeling of a 1000 megawatt electric (MWe) ultra-supercritical (USC) coal boiler. At 260K processes, we faced both parallel I/O and visualization related challenges, e.g., the default file-per-process I/O approach of Uintah did not scale on Mira. In this paper we present a simple-to-implement, restructuring-based parallel I/O technique. We impose a restructuring step that alters the distribution of data among processes. The goal is to distribute the dataset such that each process holds a larger chunk of data, which is then written to a file independently. This approach finds a middle ground between two of the most common parallel I/O schemes--file per process I/O and shared file I/O--in terms of both the total number of generated files, and the extent of communication involved during the data aggregation phase. To address scalability issues when visualizing the simulation data, we developed a lightweight renderer using OSPRay, which allows scientists to visualize the data interactively at high quality and make production movies. Finally, this work presents a highly efficient and scalable radiation model based on the sweeping method, which significantly outperforms previous approaches in Uintah, like discrete ordinates. The integrated approach allowed the USC boiler problem to run on 260K CPU cores on Mira.
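The restructuring step can be pictured with a toy aggregation policy (the grouping rule and data layout here are our own illustration, not Uintah's actual implementation): ranks are grouped, each group's data is gathered onto one writer, and that writer emits a single larger chunk, giving far fewer files than file-per-process and far less contention than a single shared file.

```python
# Restructuring-based I/O sketch: aggregate per-rank chunks into one larger
# chunk per group, so one writer per group emits one file.
def restructure(per_process_data, group_size):
    """per_process_data: list indexed by rank; returns {writer_rank: chunk}."""
    files = {}
    for start in range(0, len(per_process_data), group_size):
        group = per_process_data[start:start + group_size]
        files[start] = [x for chunk in group for x in chunk]  # aggregation
    return files

# 8 ranks, 2 values each; groups of 4 -> 2 files instead of 8 (or 1 shared).
data = [[r * 10, r * 10 + 1] for r in range(8)]
files = restructure(data, 4)
```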



B. Kundu, A. A. Brock, D. J. Englot, C. R. Butson, J. D. Rolston. “Deep brain stimulation for the treatment of disorders of consciousness and cognition in traumatic brain injury patients: a review,” In Neurosurgical Focus, Vol. 45, No. 2, Journal of Neurosurgery Publishing Group (JNSPG), pp. E14. Aug, 2018.
DOI: 10.3171/2018.5.focus18168

ABSTRACT

Traumatic brain injury (TBI) is a looming epidemic, growing most rapidly in the elderly population. Some of the most devastating sequelae of TBI are related to depressed levels of consciousness (e.g., coma, minimally conscious state) or deficits in executive function. To date, pharmacological and rehabilitative therapies to treat these sequelae are limited. Deep brain stimulation (DBS) has been used to treat a number of pathologies, including Parkinson disease, essential tremor, and epilepsy. Animal and clinical research shows that targets addressing depressed levels of consciousness include components of the ascending reticular activating system and areas of the thalamus. Targets for improving executive function are more varied and include areas that modulate attention and memory, such as the frontal and prefrontal cortex, fornix, nucleus accumbens, internal capsule, thalamus, and some brainstem nuclei. The authors review the literature addressing the use of DBS to treat higher-order cognitive dysfunction and disorders of consciousness in TBI patients, while also offering suggestions on directions for future research.



S. Liu, P.T. Bremer, J.J. Thiagarajan, V. Srikumar, B. Wang, Y. Livnat, V. Pascucci. “Visual Exploration of Semantic Relationships in Neural Word Embeddings,” In IEEE Transactions on Visualization and Computer Graphics, Vol. 24, No. 1, IEEE, pp. 553--562. Jan, 2018.
DOI: 10.1109/tvcg.2017.2745141

ABSTRACT

Constructing distributed representations for words through neural language models and using the resulting vector spaces for analysis has become a crucial component of natural language processing (NLP). However, despite their widespread application, little is known about the structure and properties of these spaces. To gain insights into the relationship between words, the NLP community has begun to adapt high-dimensional visualization techniques. In particular, researchers commonly use t-distributed stochastic neighbor embeddings (t-SNE) and principal component analysis (PCA) to create two-dimensional embeddings for assessing the overall structure and exploring linear relationships (e.g., word analogies), respectively. Unfortunately, these techniques often produce mediocre or even misleading results and cannot address domain-specific visualization challenges that are crucial for understanding semantic relationships in word embeddings. Here, we introduce new embedding techniques for visualizing semantic and syntactic analogies, and the corresponding tests to determine whether the resulting views capture salient structures. Additionally, we introduce two novel views for a comprehensive study of analogy relationships. Finally, we augment t-SNE embeddings to convey uncertainty information in order to allow a reliable interpretation. Combined, the different views address a number of domain-specific tasks difficult to solve with existing tools.
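The analogy relationships such views visualize come from simple vector arithmetic. Here is a minimal sketch with tiny hand-made vectors (our own toy "embeddings," not trained ones): "a is to b as c is to ?" is answered by the vocabulary vector closest to b - a + c under cosine similarity.

```python
# Word-analogy query by cosine similarity over a toy embedding table.
import numpy as np

def analogy(emb, a, b, c):
    q = emb[b] - emb[a] + emb[c]
    best, best_sim = None, -np.inf
    for w, v in emb.items():
        if w in (a, b, c):                  # exclude the query words themselves
            continue
        sim = v @ q / (np.linalg.norm(v) * np.linalg.norm(q))
        if sim > best_sim:
            best, best_sim = w, sim
    return best

emb = {  # toy 2-d vectors: dim 0 ~ "royalty", dim 1 ~ "gender"
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
}
answer = analogy(emb, "man", "woman", "king")  # -> "queen"
```

The paper's point is that projecting such relationships faithfully into two dimensions is hard, which is why it proposes purpose-built views rather than generic t-SNE or PCA plots.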



F. Mesadi, E. Erdil, M. Cetin, T. Tasdizen. “Image segmentation using disjunctive normal Bayesian shape and appearance models,” In IEEE Transactions on Medical Imaging, Vol. 37, No. 1, IEEE, pp. 293--305. Jan, 2018.
DOI: 10.1109/tmi.2017.2756929

ABSTRACT

The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. For instance, most active shape and appearance models require landmark points and assume unimodal shape and appearance distributions, and the level set representation does not support construction of local priors. In this paper, we present novel appearance and shape models for image segmentation based on a differentiable implicit parametric shape representation called a disjunctive normal shape model (DNSM). The DNSM is formed by the disjunction of polytopes, which themselves are formed by the conjunctions of half-spaces. The DNSM's parametric nature allows the use of powerful local prior statistics, and its implicit nature removes the need to use landmarks and easily handles topological changes. In a Bayesian inference framework, we model arbitrary shape and appearance distributions using nonparametric density estimations, at any local scale. The proposed local shape prior results in accurate segmentation even when very few training shapes are available, because the method generates a rich set of shape variations by locally combining training samples. We demonstrate the performance of the framework by applying it to both 2-D and 3-D data sets with emphasis on biomedical image segmentation applications.
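The disjunctive normal form described here can be written down directly. The sketch below is our hedged reading with toy parameters (the sharpness constant, the box helper, and the 0.5 inside/outside threshold are our assumptions): each polytope is a conjunction (product of sigmoids of half-space functions) and the shape is their disjunction.

```python
# Disjunctive normal shape model sketch:
# f(x) = 1 - prod_k (1 - prod_j sigma(w_kj . x + b_kj)),  f > 0.5 <=> inside.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def dnsm(x, polytopes, sharpness=20.0):
    """polytopes: list of lists of (w, b) half-spaces; w . x + b > 0 = inside."""
    outside = 1.0
    for halfspaces in polytopes:
        inside_poly = 1.0                       # conjunction: product of sigmoids
        for w, b in halfspaces:
            inside_poly *= sigmoid(sharpness * (np.dot(w, x) + b))
        outside *= 1.0 - inside_poly            # disjunction via De Morgan
    return 1.0 - outside

def box(x0, x1, y0, y1):
    """Axis-aligned box as four half-spaces."""
    return [((1, 0), -x0), ((-1, 0), x1), ((0, 1), -y0), ((0, -1), y1)]

# Union of two unit squares: [0,1]^2 and [2,3]x[0,1] (a topology-changing union
# that a single polytope could not represent).
shape = [box(0, 1, 0, 1), box(2, 3, 0, 1)]
```

Because every factor is a sigmoid, f is differentiable in the half-space parameters, which is what makes the representation trainable without landmarks.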



Q.C. Nguyen, M. Sajjadi, M. McCullough, M. Pham, T.T. Nguyen, W. Yu, H. Meng, M. Wen, F. Li, K.R. Smith, K. Brunisholz, T. Tasdizen. “Neighbourhood looking glass: 360º automated characterisation of the built environment for neighbourhood effects research,” In Journal of Epidemiology and Community Health, BMJ, Jan, 2018.
DOI: 10.1136/jech-2017-209456

ABSTRACT

Background
Neighbourhood quality has been connected with an array of health issues, but neighbourhood research has been limited by the lack of methods to characterise large geographical areas. This study uses innovative computer vision methods and a new big data source of street view images to automatically characterise neighbourhood built environments.

Methods
A total of 430 000 images were obtained using Google's Street View Image API for Salt Lake City, Chicago and Charleston. Convolutional neural networks were used to create indicators of street greenness, crosswalks and building type. We implemented log Poisson regression models to estimate associations between built environment features and individual prevalence of obesity and diabetes in Salt Lake City, controlling for individual-level and zip code-level predisposing characteristics.

Results
Computer vision models had an accuracy of 86%–93% compared with manual annotations. Charleston had the highest percentage of green streets (79%), while Chicago had the highest percentage of crosswalks (23%) and commercial buildings/apartments (59%). Built environment characteristics were categorised into tertiles, with the highest tertile serving as the referent group. Individuals living in zip codes with the most green streets, crosswalks and commercial buildings/apartments had relative obesity prevalences that were 25%–28% lower and relative diabetes prevalences that were 12%–18% lower than individuals living in zip codes with the least abundance of these neighbourhood features.

Conclusion
Neighbourhood conditions may influence chronic disease outcomes. Google Street View images represent an underused data resource for the construction of built environment features.



C. Nobre, M. Streit, A. Lex. “Juniper: A Tree+ Table Approach to Multivariate Graph Visualization,” In CoRR, 2018.

ABSTRACT

Analyzing large, multivariate graphs is an important problem in many domains, yet such graphs are challenging to visualize. In this paper, we introduce a novel, scalable, tree+table multivariate graph visualization technique, which makes many tasks related to multivariate graph analysis easier to achieve. The core principle we follow is to selectively query for nodes or subgraphs of interest and visualize these subgraphs as a spanning tree of the graph. The tree is laid out in a linear layout, which enables us to juxtapose the nodes with a table visualization where diverse attributes can be shown. We also use this table as an adjacency matrix, so that the resulting technique is a hybrid node-link/adjacency matrix technique. We implement this concept in Juniper, and complement it with a set of interaction techniques that enable analysts to dynamically grow, re-structure, and aggregate the tree, as well as change the layout or show paths between nodes. We demonstrate the utility of our tool in usage scenarios for different multivariate networks: a bipartite network of scholars, papers, and citation metrics, and a multitype network of story characters, places, books, etc.
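The core layout idea, spanning tree plus linear order so every node gets a table row, can be sketched briefly. This is our own minimal reading of the technique (data and helper names are illustrative, not Juniper's code): a BFS spanning tree of the queried subgraph is linearized by DFS, and the resulting (node, depth) rows are exactly the rows against which attribute columns and the adjacency matrix can be juxtaposed.

```python
# Tree+table layout sketch: BFS spanning tree, then DFS linearization so each
# node occupies one row of a juxtaposed attribute table.
from collections import deque

def spanning_tree(adj, root):
    parent = {root: None}
    q = deque([root])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    children = {u: [] for u in parent}
    for v, p in parent.items():
        if p is not None:
            children[p].append(v)
    return children

def linear_layout(children, root):
    """DFS order with depths: one (node, depth) row per node."""
    rows, stack = [], [(root, 0)]
    while stack:
        u, d = stack.pop()
        rows.append((u, d))
        for v in reversed(children[u]):
            stack.append((v, d + 1))
    return rows

# Small cycle-containing graph; the spanning tree breaks the cycle a-b-d-c-a.
adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
rows = linear_layout(spanning_tree(adj, "a"), "a")
```

Non-tree edges (here c-d) are what the adjacency-matrix columns of the table recover, making the hybrid node-link/matrix view lossless.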



T.A.J. Ouermi, R.M. Kirby, M. Berzins. “Performance Optimization Strategies for WRF Physics Schemes Used in Weather Modeling,” In International Journal of Networking and Computing, Vol. 8, No. 2, IJNC, pp. 301--327. 2018.
DOI: 10.15803/ijnc.8.2_301

ABSTRACT

Performance optimization in the petascale era and beyond in the exascale era has and will require modifications of legacy codes to take advantage of new architectures with large core counts and SIMD units. The Numerical Weather Prediction (NWP) physics codes considered here are optimized using thread-local structures of arrays (SOA). High-level and low-level optimization strategies are applied to the WRF Single-Moment 6-Class Microphysics Scheme (WSM6) and Global Forecast System (GFS) physics codes used in the NEPTUNE forecast code. By building on previous work optimizing WSM6 on the Intel Knights Landing (KNL), it is shown how to further optimize WSM6 and GFS physics, and GFS radiation on Intel KNL, Haswell, and potentially on future micro-architectures with many cores and SIMD vector units. The optimization techniques used herein employ thread-local structures of arrays (SOA), an OpenMP directive, OMP SIMD, and minor code transformations to enable better utilization of SIMD units, increase parallelism, improve locality, and reduce memory traffic. The optimized versions of WSM6, GFS physics, and GFS radiation run 70×, 27×, and 23× faster (respectively) on KNL and 26×, 18×, and 30× faster (respectively) on Haswell than their respective original serial versions. Although this work targets WRF physics schemes, the findings are transferable to other performance optimization contexts and provide insight into the optimization of codes with complex physical models for present and near-future architectures with many cores and vector units.
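The structure-of-arrays idea itself is language-agnostic and can be pictured with NumPy as a stand-in for the Fortran restructuring (the field names and the moist-physics-like constants below are our own illustration): an array-of-structures interleaves fields in memory, while an SOA layout keeps each field contiguous, which is what lets a compiler (or NumPy) vectorize the inner loop over grid points.

```python
# AOS vs SOA layout sketch: the SOA form gives each field a contiguous array.
import numpy as np

n = 8
aos = np.zeros(n, dtype=[("t", "f8"), ("q", "f8")])   # array-of-structures
aos["t"] = np.linspace(250.0, 300.0, n)               # temperature-like field
aos["q"] = 1e-3                                       # moisture-like field

# SOA: one contiguous array per field (aos["t"] alone is a strided view).
t = np.ascontiguousarray(aos["t"])
q = np.ascontiguousarray(aos["q"])
out = t + (2.5e6 / 1004.0) * q        # toy vectorized per-gridpoint update
```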



B. Peterson, A. Humphrey, J. Holmen, T. Harman, M. Berzins, D. Sunderland, H.C. Edwards. “Demonstrating GPU Code Portability and Scalability for Radiative Heat Transfer Computations,” In Journal of Computational Science, Elsevier BV, June, 2018.
ISSN: 1877-7503
DOI: 10.1016/j.jocs.2018.06.005

ABSTRACT

High performance computing frameworks utilizing CPUs, Nvidia GPUs, and/or Intel Xeon Phis necessitate portable and scalable solutions for application developers. Nvidia GPUs in particular present numerous portability challenges with a different programming model, additional memory hierarchies, and partitioned execution units among streaming multiprocessors. This work presents modifications to the Uintah asynchronous many-task runtime and the Kokkos portability library that enable a single codebase for complex multiphysics applications to run across different architectures. Scalability and performance results are shown on multiple architectures for a globally coupled radiation heat transfer simulation, ranging from a single node to 16,384 Titan compute nodes.



B. Peterson, A. Humphrey, D. Sunderland, J. Sutherland, T. Saad, H. Dasari, M. Berzins. “Automatic Halo Management for the Uintah GPU-Heterogeneous Asynchronous Many-Task Runtime,” In International Journal of Parallel Programming, Dec, 2018.
ISSN: 1573-7640
DOI: 10.1007/s10766-018-0619-1

ABSTRACT

The Uintah computational framework is used for the parallel solution of partial differential equations on adaptive mesh refinement grids using modern supercomputers. Uintah is structured with an application layer and a separate runtime system. Uintah is based on a distributed directed acyclic graph (DAG) of computational tasks, with a task scheduler that efficiently schedules and executes these tasks on both CPU cores and on-node accelerators. The runtime system identifies task dependencies, creates a task graph prior to the execution of these tasks, automatically generates MPI message tags, and automatically performs halo transfers for simulation variables. Automating halo transfers in a heterogeneous environment poses significant challenges when tasks compute within a few milliseconds, as runtime overhead affects wall time execution, or when simulation variables require large halos spanning most or all of the computational domain, as task dependencies become expensive to process. These challenges are magnified at production scale when application developers require each compute node to perform thousands of different halo transfers among thousands of simulation variables. The principal contribution of this work is to (1) identify and address inefficiencies that arise when mapping tasks onto the GPU in the presence of automated halo transfers, (2) implement new schemes to reduce runtime system overhead, (3) minimize application developer involvement with the runtime, and (4) show overhead reduction results from these improvements.
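The dependency-inference step the abstract describes, tasks declare what they compute and what they require, and the runtime derives the DAG, can be sketched in a few lines. Task names, the variable layout, and the simple producer-to-consumer rule below are our own toy illustration, not Uintah's runtime.

```python
# Toy automatic dependency detection: producer -> consumer edges inferred from
# each task's declared computes/requires lists, then topological execution.
from collections import defaultdict, deque

def build_task_graph(tasks):
    """tasks: {name: {"computes": [...], "requires": [...]}}."""
    producer = {}
    for name, t in tasks.items():
        for var in t["computes"]:
            producer[var] = name
    edges, indeg = defaultdict(set), {name: 0 for name in tasks}
    for name, t in tasks.items():
        for var in t["requires"]:
            p = producer[var]
            if p != name and name not in edges[p]:
                edges[p].add(name)
                indeg[name] += 1
    return edges, indeg

def schedule(tasks):
    """Kahn's algorithm: execute tasks in a valid topological order."""
    edges, indeg = build_task_graph(tasks)
    q = deque(sorted(n for n, d in indeg.items() if d == 0))
    order = []
    while q:
        u = q.popleft()
        order.append(u)
        for v in sorted(edges[u]):
            indeg[v] -= 1
            if indeg[v] == 0:
                q.append(v)
    return order

tasks = {
    "init":     {"computes": ["rho"], "requires": []},
    "halo_rho": {"computes": ["rho_halo"], "requires": ["rho"]},
    "advect":   {"computes": ["rho_new"], "requires": ["rho_halo"]},
}
order = schedule(tasks)  # -> ['init', 'halo_rho', 'advect']
```

In the real runtime the "halo" task is generated automatically and carries MPI communication; the paper's contribution is keeping that machinery cheap when tasks finish in milliseconds or halos span the whole domain.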