L. Zhou, M. Rivinius, C. R. Johnson, D. Weiskopf. Photographic High-Dynamic-Range Scalar Visualization, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2020.
We propose a photographic method to show scalar values of high dynamic range (HDR) by color mapping for 2D visualization. We combine (1) tone-mapping operators that transform the data to the display range of the monitor while preserving perceptually important features based on a systematic evaluation and (2) simulated glares that highlight high-value regions. Simulated glares are effective for highlighting small areas (of a few pixels) that may not be visible with conventional visualizations; through a controlled perception study, we confirm that glare is preattentive. The usefulness of our overall photographic HDR visualization is validated through the feedback of expert users.
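As a rough illustration of what a tone-mapping operator does to HDR scalar data, the sketch below compares a plain linear rescaling with a global Reinhard-style curve. The abstract does not say which operators were evaluated, so the choice of operator, the key value 0.18, and the function names `reinhard_tone_map` and `linear_map` are all assumptions made for this example.

```python
import math

def reinhard_tone_map(values, key=0.18):
    """Map positive HDR scalars into [0, 1) with a global Reinhard-style curve.

    Assumption: this operator is only a familiar stand-in; it is not
    taken from the paper.
    """
    positive = [v for v in values if v > 0]
    # Log-average of the data, used to normalise the overall level.
    log_avg = math.exp(sum(math.log(v) for v in positive) / len(positive))
    mapped = []
    for v in values:
        scaled = key * v / log_avg
        mapped.append(scaled / (1.0 + scaled))  # compresses large values
    return mapped

def linear_map(values):
    """Baseline: divide by the maximum, as a naive color map would."""
    hi = max(values)
    return [v / hi for v in values]

# Hypothetical data spanning seven orders of magnitude.
data = [0.01, 0.1, 1.0, 10.0, 1000.0, 100000.0]
tone_mapped = reinhard_tone_map(data)
linear = linear_map(data)

for raw, lin, tm in zip(data, linear, tone_mapped):
    print(f"{raw:>10g}  linear={lin:.6f}  tone-mapped={tm:.6f}")
```

With the linear mapping, every value except the largest collapses toward zero, while the compressive curve keeps the lower decades distinguishable within the display range.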
Objective. During deep brain stimulation (DBS), it is well understood that extracellular cathodic stimulation can cause activation of passing axons. Activation can be predicted from the second derivative of the electric potential along an axon, which depends on axonal orientation with respect to the stimulation source. We hypothesize that fiber orientation influences activation thresholds and that fiber orientations can be selectively targeted with DBS waveforms. Approach. We used bioelectric field and multicompartment NEURON models to explore preferential activation based on fiber orientation during monopolar or bipolar stimulation. Preferential fiber orientation was extracted from the principal eigenvectors and eigenvalues of the Hessian matrix of the electric potential. We tested cathodic, anodic, and charge-balanced pulses to target neurons based on fiber orientation in general and clinical scenarios. Main results. Axons passing the DBS lead have positive second derivatives around a cathode, whereas orthogonal axons have positive second derivatives around an anode, as indicated by the Hessian. Multicompartment NEURON models confirm that passing fibers are activated by cathodic stimulation, and orthogonal fibers are activated by anodic stimulation. Additionally, orthogonal axons have lower thresholds compared to passing axons. In a clinical scenario, fiber pathways associated with therapeutic benefit can be targeted with anodic stimulation at 50% lower stimulation amplitudes. Significance. Fiber orientations can be selectively targeted with simple changes to the stimulus waveform. Anodic stimulation preferentially activates orthogonal fibers, approaching or leaving the electrode, at lower thresholds for similar therapeutic benefit in DBS with decreased power consumption.
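The orientation dependence described in this abstract can be illustrated with the simplest possible field: a point current source in a homogeneous medium, for which the Hessian of the potential has a closed form. This is a minimal sketch, not the authors' bioelectric field or NEURON models; the conductivity (0.2 S/m), current magnitude (1 mA), fiber location (1 mm from the source), and all function names are assumptions chosen for illustration.

```python
import math

def point_source_hessian(pos, current, sigma=0.2):
    """Hessian of V = current / (4*pi*sigma*r) at pos, source at the origin.

    Assumption: homogeneous isotropic medium with conductivity sigma (S/m).
    """
    x = list(pos)
    r = math.sqrt(sum(c * c for c in x))
    scale = current / (4.0 * math.pi * sigma * r ** 5)
    # d2(1/r)/dxi dxj = (3*xi*xj - r^2 * delta_ij) / r^5
    return [[scale * (3.0 * x[i] * x[j] - (r * r if i == j else 0.0))
             for j in range(3)] for i in range(3)]

def second_derivative_along(hessian, direction):
    """Second directional derivative d^T H d for a unit-normalised direction."""
    norm = math.sqrt(sum(d * d for d in direction))
    d = [c / norm for c in direction]
    return sum(d[i] * hessian[i][j] * d[j]
               for i in range(3) for j in range(3))

fiber_location = (1e-3, 0.0, 0.0)   # 1 mm from the source, in metres
passing = (0.0, 0.0, 1.0)           # runs past the source (tangential)
orthogonal = (1.0, 0.0, 0.0)        # points toward/away from the source

cathode = point_source_hessian(fiber_location, current=-1e-3)
anode = point_source_hessian(fiber_location, current=+1e-3)

for name, hess in (("cathode", cathode), ("anode", anode)):
    print(name,
          "passing:", second_derivative_along(hess, passing),
          "orthogonal:", second_derivative_along(hess, orthogonal))
```

In this idealised setting, the sign pattern agrees with the abstract: the passing direction has a positive second derivative near the cathode, and the orthogonal direction has a positive second derivative near the anode.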
C.J. Anderson, D.N. Anderson, S.M. Pulst, C.R. Butson, A.D. Dorval.
Neural Selectivity, Efficiency, and Dose Equivalence in Deep Brain Stimulation through Pulse Width Tuning and Segmented Electrodes, In bioRxiv, Cold Spring Harbor Laboratory, April, 2019.
Achieving deep brain stimulation (DBS) dose equivalence is challenging, especially with pulse width tuning and directional contacts. Further, the precise effects of pulse width tuning are unknown.
We created multicompartment neuron models for two axon diameters and used finite element modeling to determine extracellular influence from standard and segmented electrodes. We analyzed axon activation profiles and calculated volumes of tissue activated.
Long pulse widths focus the stimulation effect on small, nearby fibers, suppressing white matter tract activation (responsible for some DBS side effects) and improving battery utilization. Directional leads enable similar benefits to a greater degree. We derive equations for equivalent activation with pulse width tuning and segmented contacts.
We find agreement with classic studies and reinterpret recent articles concluding that short pulse widths focus the stimulation effect on small, nearby fibers, decrease side effects, and improve power consumption. Our field should reconsider shortened pulse widths.
Directional deep brain stimulation (DBS) leads have recently been approved and used in patients, and growing evidence suggests that directional contacts can increase the therapeutic window by redirecting stimulation to the target region while avoiding side-effect-inducing regions. We outline the design, fabrication, and testing of a novel directional DBS lead, the μDBS, which utilizes microscale contacts to increase the spatial resolution of stimulation steering and improve the selectivity in targeting small-diameter fibers. We outline the steps of fabrication of the μDBS, from an integrated circuit design to post-processing and validation testing. We tested the onboard digital circuitry for programming fidelity, characterized impedance for a variety of electrode sizes, and demonstrated functionality in a saline bath. In a computational experiment, we determined that reduced electrode sizes focus the stimulation effect on small, nearby fibers. Smaller electrode sizes allow for a relative decrease in small-diameter axon thresholds compared to thresholds of large-diameter fibers, demonstrating a focusing of the stimulation effect within small, and possibly therapeutic, fibers. This principle of selectivity could be useful in further widening the window of therapy. The μDBS offers a unique, multiresolution design in which any combination of microscale contacts can be used together to function as electrodes of various shapes and sizes. Multiscale electrodes could be useful in selective neural targeting for established neurological targets and in exploring novel treatment targets for new neurological indications.
C. C. Aquino, G. Duffley, D. M. Hedges, J. Vorwerk, P. A. House, H. B. Ferraz, J. D. Rolston, C. R. Butson, L. E. Schrock.
Interleaved deep brain stimulation for dyskinesia management in Parkinson's disease, In Movement Disorders, 2019.
In patients with Parkinson's disease, stimulation above the subthalamic nucleus (STN) may engage the pallidofugal fibers and directly suppress dyskinesia.
The objective of this study was to evaluate the effect of interleaving stimulation through a dorsal deep brain stimulation contact above the STN in a cohort of PD patients and to define the volume of tissue activated with antidyskinesia effects.
We analyzed the Core Assessment Program for Surgical Interventional Therapies dyskinesia scale, Unified Parkinson's Disease Rating Scale parts III and IV, and other endpoints in 20 patients with interleaving stimulation for management of dyskinesia. Individual models of volume of tissue activated and heat maps were used to identify stimulation sites with antidyskinesia effects.
The Core Assessment Program for Surgical Interventional Therapies dyskinesia score in the on medication phase improved 70.9 ± 20.6% from baseline with noninterleaved settings (P < 0.003). With interleaved settings, dyskinesia improved 82.0 ± 27.3% from baseline (P < 0.001) and 61.6 ± 39.3% from the noninterleaved phase (P = 0.006). The heat map showed a concentration of volume of tissue activated dorsally to the STN during the interleaved setting with an antidyskinesia effect.
Interleaved deep brain stimulation using the dorsal contacts can directly suppress dyskinesia, probably because of the involvement of the pallidofugal tract, allowing more conservative medication reduction. © 2019 International Parkinson and Movement Disorder Society
We present a framework for the analysis of uncertainty in isocontour extraction. The marching squares (MS) algorithm for isocontour reconstruction generates a linear topology that is consistent with hyperbolic curves of a piecewise bilinear interpolation. The saddle points of the bilinear interpolant cause topological ambiguity in isocontour extraction. The midpoint decider and the asymptotic decider are well-known mathematical techniques for resolving topological ambiguities. The latter technique investigates the data values at the cell saddle points for ambiguity resolution. The uncertainty in data, however, leads to uncertainty in underlying bilinear interpolation functions for the MS algorithm, and hence, their saddle points. In our work, we study the behavior of the asymptotic decider when data at grid vertices is uncertain. First, we derive closed-form distributions characterizing variations in the saddle point values for uncertain bilinear interpolants. The derivation assumes uniform and nonparametric noise models, and it exploits the concept of ratio distribution for analytic formulations. Next, the probabilistic asymptotic decider is devised for ambiguity resolution in uncertain data using distributions of the saddle point values derived in the first step. Finally, the confidence in probabilistic topological decisions is visualized using a colormapping technique. We demonstrate the higher accuracy and stability of the probabilistic asymptotic decider in uncertain data with regard to existing decision frameworks, such as deciders in the mean field and the probabilistic midpoint decider, through the isocontour visualization of synthetic and real datasets.
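The deciders named in this abstract can be sketched for a single marching squares cell. The saddle value of the bilinear interpolant follows directly from the four corner values, and the asymptotic decider compares it with the isovalue. The probabilistic part below uses Monte Carlo sampling under independent uniform noise as a stand-in for the closed-form distributions derived in the paper; the sampling approach, the corner values, the noise half-width, and all function names are assumptions for illustration.

```python
import random

def saddle_value(f00, f10, f01, f11):
    """Value of the bilinear interpolant at its saddle point (unit cell)."""
    denom = f00 + f11 - f10 - f01
    if denom == 0.0:
        raise ValueError("bilinear interpolant has no saddle point")
    return (f00 * f11 - f10 * f01) / denom

def asymptotic_decider(corners, isovalue):
    """True if the saddle lies above the isovalue."""
    return saddle_value(*corners) > isovalue

def midpoint_decider(corners, isovalue):
    """True if the average of the four corners lies above the isovalue."""
    return sum(corners) / 4.0 > isovalue

def probability_saddle_above(means, half_width, isovalue,
                             samples=10000, seed=0):
    """Monte Carlo estimate of P(saddle value > isovalue).

    Assumption: each corner is independently uniform on
    [mean - half_width, mean + half_width]; this replaces the paper's
    analytic ratio-distribution derivation with plain sampling.
    """
    rng = random.Random(seed)
    hits = 0
    valid = 0
    for _ in range(samples):
        f = [rng.uniform(m - half_width, m + half_width) for m in means]
        if f[0] + f[3] - f[1] - f[2] == 0.0:
            continue  # degenerate sample without a saddle
        valid += 1
        if saddle_value(*f) > isovalue:
            hits += 1
    return hits / valid

# Hypothetical ambiguous cell: (f00, f10, f01, f11), isovalue 0.
cell = (4.0, -1.0, -1.0, 0.2)
print("saddle value:", saddle_value(*cell))
print("asymptotic decider:", asymptotic_decider(cell, 0.0))
print("midpoint decider:", midpoint_decider(cell, 0.0))
print("P(saddle > 0):", probability_saddle_above(cell, 0.3, 0.0))
```

For this cell the two deterministic deciders disagree, and the sampled probability gives a rough sense of how confident the topological decision is once the corner data are uncertain.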
A statistical framework for quantification and visualisation of positional uncertainty in deep brain stimulation electrodes, In Computer Methods in Biomechanics and Biomedical Engineering: Imaging & Visualization, Vol. 7, No. 4, Taylor & Francis, pp. 438-449. 2019.
Deep brain stimulation (DBS) is an established therapy for treating patients with movement disorders such as Parkinson’s disease. Patient-specific computational modelling and visualisation have been shown to play a key role in surgical and therapeutic decisions for DBS. The computational models use brain imaging, such as magnetic resonance (MR) and computed tomography (CT), to determine the DBS electrode positions within the patient’s head. The finite resolution of brain imaging, however, introduces uncertainty in electrode positions. The DBS stimulation settings for optimal patient response are sensitive to the relative positioning of DBS electrodes to a specific neural substrate (white/grey matter). In our contribution, we study positional uncertainty in the DBS electrodes for imaging with finite resolution. In a three-step approach, we first derive a closed-form mathematical model characterising the geometry of the DBS electrodes. Second, we devise a statistical framework for quantifying the uncertainty in the positional attributes of the DBS electrodes, namely the direction of longitudinal axis and the contact-centre positions at subvoxel levels. The statistical framework leverages the analytical model derived in step one and a Bayesian probabilistic model for uncertainty quantification. Finally, the uncertainty in contact-centre positions is interactively visualised through volume rendering and isosurfacing techniques. We demonstrate the efficacy of our contribution through experiments on synthetic and real datasets. We show that the spatial variations in true electrode positions are significant for finite resolution imaging, and interactive visualisation can be instrumental in exploring probabilistic positional variations in the DBS lead.
P. R. Atkins, Y. Shin, P. Agrawal, S. Y. Elhabian, R. T. Whitaker, J. A. Weiss, S. K. Aoki, C. L. Peters, A. E. Anderson. Which Two-dimensional Radiographic Measurements of Cam Femoroacetabular Impingement Best Describe the Three-dimensional Shape of the Proximal Femur?, In Clinical Orthopaedics and Related Research, Vol. 477, No. 1, 2019.
Many two-dimensional (2-D) radiographic views are used to help diagnose cam femoroacetabular impingement (FAI), but there is little consensus as to which view or combination of views is most effective at visualizing the magnitude and extent of the cam lesion (ie, severity). Previous studies have used a single image from a sequence of CT or MR images to serve as a reference standard with which to evaluate the ability of 2-D radiographic views and associated measurements to describe the severity of the cam lesion. However, single images from CT or MRI data may fail to capture the apex of the cam lesion. Thus, it may be more appropriate to use measurements of three-dimensional (3-D) surface reconstructions from CT or MRI data to serve as an anatomic reference standard when evaluating radiographic views and associated measurements used in the diagnosis of cam FAI.
The purpose of this study was to use digitally reconstructed radiographs and 3-D statistical shape modeling to (1) determine the correlation between 2-D radiographic measurements of cam FAI and 3-D metrics of proximal femoral shape; and (2) identify the combination of radiographic measurements from plain film projections that were most effective at predicting the 3-D shape of the proximal femur.
This study leveraged previously acquired CT images of the femur from a convenience sample of 37 patients (34 males; mean age, 27 years, range, 16-47 years; mean body mass index [BMI], 24.6 kg/m², range, 19.0-30.2 kg/m²) diagnosed with cam FAI imaged between February 2005 and January 2016. Patients were diagnosed with cam FAI based on a culmination of clinical examinations, history of hip pain, and imaging findings. The control group consisted of 59 morphologically normal control participants (36 males; mean age, 29 years, range, 15-55 years; mean BMI, 24.4 kg/m², range, 16.3-38.6 kg/m²) imaged between April 2008 and September 2014. Of these controls, 30 were cadaveric femurs and 29 were living participants. All controls were screened for evidence of femoral deformities using radiographs. In addition, living control participants had no history of hip pain or previous surgery to the hip or lower limbs. CT images were acquired for each participant and the surface of the proximal femur was segmented and reconstructed. Surfaces were input to our statistical shape modeling pipeline, which objectively calculated 3-D shape scores that described the overall shape of the entire proximal femur and of the region of the femur where the cam lesion is typically located. Digital reconstructions for eight plain film views (AP, Meyer lateral, 45° Dunn, modified 45° Dunn, frog-leg lateral, Espié frog-leg, 90° Dunn, and cross-table lateral) were generated from CT data. For each view, measurements of the α angle and head-neck offset were obtained by two researchers (intraobserver correlation coefficients of 0.80-0.94 for the α angle and 0.42-0.80 for the head-neck offset measurements). The relationships between radiographic measurements from each view and the 3-D shape scores (for the entire proximal femur and for the region specific to the cam lesion) were assessed with linear correlation.
Additionally, partial least squares regression was used to determine which combination of views and measurements was the most effective at predicting 3-D shape scores.
Three-dimensional shape scores were most strongly correlated with α angle on the cross-table view when considering the entire proximal femur (r = -0.568; p < 0.001) and on the Meyer lateral view when considering the region of the cam lesion (r = -0.669; p < 0.001). Partial least squares regression demonstrated that measurements from the Meyer lateral and 90° Dunn radiographs produced the optimized regression model for predicting shape scores for the proximal femur (R = 0.405, root mean squared error of prediction [RMSEP] = 1.549) and the region of the cam lesion (R = 0.525, RMSEP = 1.150). Interestingly, views with larger differences in the α angle and head-neck offset between control and cam FAI groups did not have the strongest correlations with 3-D shape.
Considered together, radiographic measurements from the Meyer lateral and 90° Dunn views provided the most effective predictions of 3-D shape of the proximal femur and the region of the cam lesion as determined using shape modeling metrics.
Our results suggest that clinicians should consider using the Meyer lateral and 90° Dunn views to evaluate patients in whom cam FAI is suspected. However, the α angle and head-neck offset measurements from these and other plain film views could describe no more than half of the overall variation in the shape of the proximal femur and cam lesion. Thus, caution should be exercised when evaluating femoral head anatomy using the α angle and head-neck offset measurements from plain film radiographs. Given these findings, we believe there is merit in pursuing research that aims to develop the framework necessary to integrate statistical shape modeling into clinical evaluation, because this could aid in the diagnosis of cam FAI.
M. Berzins. Time Integration Errors and Energy Conservation Properties of the Stormer Verlet Method Applied to MPM, In Proceedings of VI International Conference on Particle-based Methods – Fundamentals and Applications, Barcelona, Edited by E. Oñate, M. Bischoff, D.R.J. Owen, P. Wriggers & T. Zohdi, PARTICLES 2019, October, 2019.
The success of the Material Point Method (MPM) in solving many challenging problems nevertheless raises some open questions regarding the fundamental properties of the method, such as energy conservation, which has been addressed by Bardenhagen and by Love and Sulsky. Similarly, while low-order symplectic time integration techniques are used with MPM, higher-order methods have not been used. For this reason, the Stormer Verlet method, a popular and widely used symplectic method, is applied to MPM. Both the time integration error and the energy conservation properties of this method applied to MPM are considered. The method is shown to have locally third-order accuracy of energy conservation in time. This is in contrast to the locally second-order accuracy in energy conservation of the methods that are used in many MPM calculations. This third-order accuracy is demonstrated both locally and globally on a standard MPM test example.
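The Stormer Verlet scheme itself is compact enough to sketch. The example below applies it, in its velocity form, to a unit-mass, unit-stiffness harmonic oscillator rather than to MPM, and compares its energy error with that of forward Euler. The test problem, step sizes, and function names are assumptions for illustration, and the sketch makes no claim about the convergence orders reported in the paper.

```python
def acceleration(x):
    """Unit-mass, unit-stiffness spring: a = -x (assumed test problem)."""
    return -x

def energy(x, v):
    return 0.5 * v * v + 0.5 * x * x

def stormer_verlet(x, v, dt, steps):
    """Velocity form of Stormer Verlet; returns max |E - E0| over the run."""
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(steps):
        v_half = v + 0.5 * dt * acceleration(x)   # half kick
        x = x + dt * v_half                       # drift
        v = v_half + 0.5 * dt * acceleration(x)   # half kick
        worst = max(worst, abs(energy(x, v) - e0))
    return worst

def forward_euler(x, v, dt, steps):
    """Non-symplectic baseline; returns max |E - E0| over the run."""
    e0 = energy(x, v)
    worst = 0.0
    for _ in range(steps):
        x, v = x + dt * v, v + dt * acceleration(x)
        worst = max(worst, abs(energy(x, v) - e0))
    return worst

verlet_err = stormer_verlet(1.0, 0.0, dt=0.1, steps=1000)
verlet_err_half = stormer_verlet(1.0, 0.0, dt=0.05, steps=2000)
euler_err = forward_euler(1.0, 0.0, dt=0.1, steps=1000)

print("Stormer Verlet, dt=0.10:", verlet_err)
print("Stormer Verlet, dt=0.05:", verlet_err_half)
print("Forward Euler,  dt=0.10:", euler_err)
```

On this problem the symplectic scheme keeps the energy error bounded and shrinks it as the step is refined, whereas the forward Euler energy grows steadily.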
R. Bhalodia, S. Y. Elhabian, L. Kavan, R. T. Whitaker. A Cooperative Autoencoder for Population-Based Regularization of CNN Image Registration, In Medical Image Computing and Computer Assisted Intervention – MICCAI 2019, Springer International Publishing, pp. 391-400. 2019.
Spatial transformations are enablers in a variety of medical image analysis applications that entail aligning images to a common coordinate system. Population analysis of such transformations is expected to capture the underlying image and shape variations, and hence these transformations are required to produce anatomically feasible correspondences. This is usually enforced through some smoothness-based generic metric or regularization of the deformation field. Alternatively, population-based regularization has been shown to produce anatomically accurate correspondences in cases where anatomically unaware (i.e., data independent) regularization fails. Recently, deep networks have been used to generate spatial transformations in an unsupervised manner, and, once trained, these networks are computationally faster and as accurate as conventional, optimization-based registration methods. However, the deformation fields produced by these networks require smoothness penalties, just as the conventional registration methods do, and ignore population-level statistics of the transformations. Here, we propose a novel neural network architecture that simultaneously learns and uses the population-level statistics of the spatial transformations to regularize the neural networks for unsupervised image registration. This regularization is in the form of a bottleneck autoencoder, which learns and adapts to the population of transformations required to align input images by encoding the transformations to a low dimensional manifold. The proposed architecture produces deformation fields that describe the population-level features and associated correspondences in an anatomically relevant manner and are statistically compact relative to the state-of-the-art approaches while maintaining computational efficiency. We demonstrate the efficacy of the proposed architecture on synthetic data sets, as well as 2D and 3D medical data.
Computational models are a popular tool for predicting the effects of deep brain stimulation (DBS) on neural tissue. One commonly used model, the volume of tissue activated (VTA), is computed using multiple methodologies. We quantified differences in the VTAs generated by five methodologies: the traditional axon model method, the electric field norm, and three activating function based approaches - the activating function at each grid point in the tangential direction (AF-Tan) or in the maximally activating direction (AF-3D), and the maximum activating function along the entire length of a tangential fiber (AF-Max).
Approach: We computed the VTA using each method across multiple stimulation settings. The resulting volumes were compared for similarity, and the methodologies were analyzed for their differences in behavior.
Main Results: Activation threshold values for both the electric field norm and the activating function vary with regard to electrode configuration, pulse width, and frequency. All methods produced highly similar volumes for monopolar stimulation. For bipolar electrode configurations, only the maximum activating function along the tangential axon method, AF-Max, produced similar volumes to those produced by the axon model method. Further analysis revealed that both of these methods are biased by their exclusive use of tangential fiber orientations. In contrast, the activating function in the maximally activating direction method, AF-3D, produces a VTA that is free of axon orientation and projection bias.
Significance: Simulating tangentially oriented axons, the standard approach of computing the VTA, is too computationally expensive for widespread implementation and yields results biased by the assumption of tangential fiber orientation. In this work, we show that a computationally efficient method based on the activating function, AF-Max, reliably reproduces the VTAs generated by direct axon modeling. Further, we propose another method, AF-3D, as a potentially superior model for representing generic neural tissue activation.
Topological techniques have proven to be a powerful tool in the analysis and visualization of large-scale scientific data. In particular, the Morse-Smale complex and its various components provide a rich framework for robust feature definition and computation. Consequently, there now exist a number of approaches to compute Morse-Smale complexes for large-scale data in parallel. However, existing techniques are based on discrete concepts which produce the correct topological structure but are known to introduce grid artifacts in the resulting geometry. Here, we present a new approach that combines parallel streamline computation with combinatorial methods to construct a high-quality discrete Morse-Smale complex. In addition to being invariant to the orientation of the underlying grid, this algorithm allows users to selectively build a subset of features using high-quality geometry. In particular, a user may specifically select which ascending/descending manifolds are reconstructed with improved accuracy, focusing computational effort where it matters for subsequent analysis. This approach computes Morse-Smale complexes for larger data than previously feasible with significant speedups. We demonstrate and validate our approach using several examples from a variety of different scientific domains, and evaluate the performance of our method.
M. Han, I. Wald, W. Usher, Q. Wu, F. Wang, V. Pascucci, C. D. Hansen, C. R. Johnson. Ray Tracing Generalized Tube Primitives: Method and Applications, In Computer Graphics Forum, Vol. 38, No. 3, John Wiley & Sons Ltd., 2019.
We present a general high-performance technique for ray tracing generalized tube primitives. Our technique efficiently supports tube primitives with fixed and varying radii, general acyclic graph structures with bifurcations, and correct transparency with interior surface removal. Such tube primitives are widely used in scientific visualization to represent diffusion tensor imaging tractographies, neuron morphologies, and scalar or vector fields of 3D flow. We implement our approach within the OSPRay ray tracing framework, and evaluate it on a range of interactive visualization use cases of fixed- and varying-radius streamlines, pathlines, complex neuron morphologies, and brain tractographies. Our proposed approach provides interactive, high-quality rendering, with low memory overhead.
Quantifying Impurity Effects on the Surface Morphology of α-U3O8, In Analytical Chemistry, 2019.
The morphological effect of impurities on α-U3O8 has been investigated. This study provides the first evidence that the presence of impurities can alter nuclear material morphology, and these changes can be quantified to aid in revealing processing history. Four elements: Ca, Mg, V, and Zr were implemented in the uranyl peroxide synthesis route and studied individually within the α-U3O8. Six total replicates were synthesized, and replicates 1–3 were filtered and washed with Millipore water (18.2 MΩ) to remove any residual nitrates. Replicates 4–6 were filtered but not washed to determine the amount of impurities removed during washing. Inductively coupled plasma mass spectrometry (ICP-MS) was employed at key points during the synthesis to quantify incorporation of the impurity. Each sample was characterized using powder X-ray diffraction (p-XRD), high-resolution scanning electron microscopy (HRSEM), and SEM with energy dispersive X-ray spectroscopy (SEM-EDS). p-XRD was utilized to evaluate any crystallographic changes due to the impurities; HRSEM imagery was analyzed with Morphological Analysis for MAterials (MAMA) software and machine learning classification for quantification of the morphology; and SEM-EDS was utilized to locate the impurity within the α-U3O8. All samples were found to be quantifiably distinguishable, further demonstrating the utility of quantitative morphology as a signature for the processing history of nuclear material.
In the present study, surface morphological differences of mixtures of triuranium octoxide (U3O8), synthesized from uranyl peroxide (UO4) and ammonium diuranate (ADU), were investigated. The purity of each sample was verified using powder X-ray diffractometry (p-XRD), and scanning electron microscopy (SEM) images were collected to identify unique morphological features. The U3O8 from ADU and UO4 was found to be unique. Qualitatively, both particles have similar features being primarily circular in shape. Using the morphological analysis of materials (MAMA) software, particle shape and size were quantified. UO4 was found to produce U3O8 particles three times the area of those produced from ADU. With the starting morphologies quantified, U3O8 samples from ADU and UO4 were physically mixed in known quantities. SEM images were collected of the mixed samples, and the MAMA software was used to quantify particle attributes. As U3O8 particles from ADU were unique from UO4, the composition of the mixtures could be quantified using SEM imaging coupled with particle analysis. This provides a novel means of quantifying processing histories of mixtures of uranium oxides. Machine learning was also used to help further quantify characteristics in the image database through direct classification and particle segmentation using deep learning techniques based on Convolutional Neural Networks (CNN). It demonstrates that these techniques can distinguish the mixtures with high accuracy as well as showing significant differences in morphology between the mixtures. Results from this study demonstrate the power of quantitative morphological analysis for determining the processing history of nuclear materials.
There currently exist two dominant strategies to reduce data sizes in analysis and visualization: reducing the precision of the data, e.g., through quantization, or reducing its resolution, e.g., by subsampling. Both have advantages and disadvantages and both face fundamental limits at which the reduced information ceases to be useful. The paper explores the additional gains that could be achieved by combining both strategies. In particular, we present a common framework that allows us to study the trade-off in reducing precision and/or resolution in a principled manner. We represent data reduction schemes as progressive streams of bits and study how various bit orderings such as by resolution, by precision, etc., impact the resulting approximation error across a variety of data sets as well as analysis tasks. Furthermore, we compute streams that are optimized for different tasks to serve as lower bounds on the achievable error. Scientific data management systems can use the results presented in this paper as guidance on how to store and stream data to make efficient use of the limited storage and bandwidth in practice.
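The trade-off between precision and resolution can be made concrete on a toy signal. The sketch below quantizes a sampled sine wave to a given number of bits, subsamples it with a given stride, and reports the reconstruction error for three combinations that cost the same number of stored bits. The signal, the uniform quantizer, the sample-and-hold reconstruction, and all function names are assumptions for illustration; the paper's bit-stream framework is more general.

```python
import math

def quantize(values, bits):
    """Uniform quantization of values to 2**bits levels over their range."""
    lo, hi = min(values), max(values)
    levels = 2 ** bits - 1
    return [round((v - lo) / (hi - lo) * levels) / levels * (hi - lo) + lo
            for v in values]

def subsample(values, stride):
    """Keep every stride-th sample and reconstruct by sample-and-hold."""
    return [values[(i // stride) * stride] for i in range(len(values))]

def rmse(reference, approx):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(reference, approx))
                     / len(reference))

def reduce_data(values, bits, stride):
    """Apply both reductions; return (stored bit count, RMSE)."""
    approx = subsample(quantize(values, bits), stride)
    stored_samples = math.ceil(len(values) / stride)
    return stored_samples * bits, rmse(values, approx)

# Hypothetical data: one period of a sine wave on 64 samples.
signal = [math.sin(2.0 * math.pi * i / 64) for i in range(64)]

# Three settings with the same storage budget of 256 bits.
for bits, stride in ((4, 1), (8, 2), (16, 4)):
    cost, err = reduce_data(signal, bits, stride)
    print(f"bits={bits:2d} stride={stride} cost={cost} rmse={err:.5f}")
```

Which combination gives the smallest error depends on the data and the error measure, which is the kind of question the common framework is meant to answer systematically.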
J. K. Holmen, B. Peterson, A. Humphrey, D. Sunderland, O. H. Diaz-Ibarra, J. N. Thornock, M. Berzins. Portably Improving Uintah's Readiness for Exascale Systems Through the Use of Kokkos, SCI Institute, 2019.
Uncertainty and diversity in future HPC systems, including those for exascale, make portable codebases desirable. To ease future ports, the Uintah Computational Framework has adopted the Kokkos C++ Performance Portability Library. This paper describes infrastructure advancements and performance improvements using partitioning functionality recently added to Kokkos within Uintah's MPI+Kokkos hybrid parallelism approach. Results are presented for two challenging calculations that have been refactored to support Kokkos::OpenMP and Kokkos::Cuda back-ends. These results demonstrate performance improvements up to (i) 2.66x when refactoring for portability, (ii) 81.59x when adding loop-level parallelism via Kokkos back-ends, and (iii) 2.63x when more efficiently using a node. Good strong-scaling characteristics to 442,368 threads across 1,728 Knights Landing processors are also shown. These improvements have been achieved with little added overhead (sub-millisecond, consuming up to 0.18% of per-timestep time). Kokkos adoption and refactoring lessons are also discussed.
J. K. Holmen, B. Peterson, M. Berzins. An Approach for Indirectly Adopting a Performance Portability Layer in Large Legacy Codes, In 2nd International Workshop on Performance, Portability, and Productivity in HPC (P3HPC), In conjunction with SC19, 2019.
Diversity among supported architectures in current and emerging high performance computing systems, including those for exascale, makes portable codebases desirable. Portability of a codebase can be improved using a performance portability layer to provide access to multiple underlying programming models through a single interface. Direct adoption of a performance portability layer, however, poses challenges for large pre-existing software frameworks that may need to preserve legacy code and/or adopt other programming models in the future. This paper describes an approach for indirect adoption that introduces a framework-specific portability layer between the application developer and the adopted performance portability layer to help improve legacy code support and long-term portability for future architectures and programming models. This intermediate layer uses loop-level, application-level, and build-level components to ease adoption of a performance portability layer in large legacy codebases. Results are shown for two challenging case studies using this approach to make portable use of OpenMP and CUDA via Kokkos in an asynchronous many-task runtime system, Uintah. These results show performance improvements up to 2.7x when refactoring for portability and 2.6x when more efficiently using a node. Good strong scaling to 442,368 threads across 1,728 Knights Landing processors is also shown using MPI+Kokkos at scale.
Alan Humphrey. Scalable Asynchronous Many-Task Runtime Solutions to Globally Coupled Problems, School of Computing, University of Utah, 2019.
Thermal radiation is an important physical process and a key mechanism in a class of challenging engineering and research problems. The principal exascale-candidate application motivating this research is a large eddy simulation (LES) aimed at predicting the performance of a commercial, 1200 MWe ultra-super critical (USC) coal boiler, with radiation as the dominant mode of heat transfer. Scalable modeling of radiation is currently one of the most challenging problems in large-scale simulations, due to the global, all-to-all physical and resulting computational connectivity. Fundamentally, radiation models impose global data dependencies, requiring each compute node in a distributed memory system to send data to, and receive data from, potentially every other node. This process can be prohibitively expensive on large distributed memory systems due to pervasive all-to-all message passing interface (MPI) communication. Correctness is also difficult to achieve when coordinating global communication of this kind. Asynchronous many-task (AMT) runtime systems are a possible leading alternative to mitigate programming challenges at the runtime system-level, sheltering the application developer from the complexities introduced by future architectures. However, large-scale parallel applications with complex global data dependencies, such as in radiation modeling, pose significant scalability challenges themselves, even for a highly tuned AMT runtime. The principal aims of this research are to demonstrate how the Uintah AMT runtime can be adapted, making it possible for complex multiphysics applications with radiation to scale on current petascale and emerging exascale architectures. 
For Uintah, which uses a directed acyclic graph to represent the computation and associated data dependencies, these aims are achieved through: 1) the use of an AMT runtime; 2) adapting and leveraging Uintah’s adaptive mesh refinement support to dramatically reduce computation, communication volume, and nodal memory footprint for radiation calculations; and 3) automating the all-to-all communication at the runtime level through a task graph dependency analysis phase designed to efficiently manage data dependencies inherent in globally coupled problems.