SCI Publications

2017


M. Feiszli, A. Narayan. “Numerical Computation of Weil-Petersson Geodesics in the Universal Teichmüller Space,” In SIAM Journal on Imaging Sciences, Vol. 10, No. 3, SIAM, pp. 1322--1345. Jan, 2017.
DOI: 10.1137/15M1043947

ABSTRACT

We propose an optimization algorithm for computing geodesics on the universal Teichmüller space T(1) in the Weil-Petersson (WP) metric. Another realization for T(1) is the space of planar shapes, modulo translation and scale, and thus our algorithm addresses a fundamental problem in computer vision: compute the distance between two given shapes. The identification of smooth shapes with elements on T(1) allows us to represent a shape as a diffeomorphism on S1. Then given two diffeomorphisms on S1 (i.e., two shapes we want to connect with a flow), we formulate a discretized WP energy and the resulting problem is a boundary-value minimization problem. We numerically solve this problem, providing several examples of geodesic flow on the space of shapes, and verifying mathematical properties of T(1). Our algorithm is more general than the application here in the sense that it can be used to compute geodesics on any other Riemannian manifold.
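For orientation, the boundary-value formulation can be summarized by a generic discretized geodesic energy (a hedged sketch with assumed notation, not the paper's exact WP energy): with the endpoint shapes \varphi_0 and \varphi_N fixed, the intermediate shapes are chosen to minimize

\min_{\varphi_1,\dots,\varphi_{N-1}} \; \sum_{k=0}^{N-1} \frac{\left\| \varphi_{k+1} - \varphi_k \right\|_{G(\varphi_k)}^{2}}{\Delta t},

where G(\cdot) denotes the Riemannian metric (here the WP metric) evaluated along the discrete path, and the minimizer approximates the geodesic between the two shapes.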



M. Foote, P. Sabouri, A. Sawant, S. Joshi. “Rank Constrained Diffeomorphic Density Motion Estimation for Respiratory Correlated Computed Tomography,” In Graphs in Biomedical Image Analysis, Computational Anatomy and Imaging Genetics, Springer International Publishing, pp. 177--185. 2017.
DOI: 10.1007/978-3-319-67675-3_16

ABSTRACT

Motion estimation of organs in a sequence of images is important in numerous medical imaging applications. The focus of this paper is the analysis of 4D Respiratory Correlated Computed Tomography (RCCT) Imaging. It is hypothesized that the quasi-periodic breathing induced motion of organs in the thorax can be represented by deformations spanning a very low dimension subspace of the full infinite dimensional space of diffeomorphic transformations. This paper presents a novel motion estimation algorithm that includes the constraint for low-rank motion between the different phases of the RCCT images. Low-rank deformation solutions are necessary for the efficient statistical analysis and improved treatment planning and delivery. Although the application focus of this paper is RCCT, the algorithm is quite general and applicable to various motion estimation problems in medical imaging.



K. Furmanova, S. Gratzl, H. Stitz, T. Zichner, M. Jaresova, M. Ennemoser, A. Lex, M. Streit. “Taggle: Scalable Visualization of Tabular Data through Aggregation,” In CoRR, 2017.

ABSTRACT

Visualization of tabular data---for both presentation and exploration purposes---is a well-researched area. Although effective visual presentations of complex tables are supported by various plotting libraries, creating such tables is a tedious process and requires scripting skills. In contrast, interactive table visualizations that are designed for exploration purposes either operate at the level of individual rows, where large parts of the table are accessible only via scrolling, or provide a high-level overview that often lacks context-preserving drill-down capabilities. In this work we present Taggle, a novel visualization technique for exploring and presenting large and complex tables that are composed of individual columns of categorical or numerical data and homogeneous matrices. The key contribution of Taggle is the hierarchical aggregation of data subsets, for which the user can also choose suitable visual representations. The aggregation strategy is complemented by the ability to sort hierarchically such that groups of items can be flexibly defined by combining categorical stratifications and by rich data selection and filtering capabilities. We demonstrate the usefulness of Taggle for interactive analysis and presentation of complex genomics data for the purpose of drug discovery.
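As a loose illustration of the aggregation idea (not Taggle's implementation; the column names and values below are invented), hierarchical sorting by categorical stratifications followed by per-group aggregation of numerical columns can be sketched with pandas:

import pandas as pd

# Invented example table: categorical columns plus numerical scores.
df = pd.DataFrame({
    "tumor_type": ["glioma", "glioma", "melanoma", "melanoma", "melanoma"],
    "mutation":   ["BRAF",   "none",   "BRAF",     "BRAF",     "none"],
    "expression": [2.1,      0.4,      3.3,        2.9,        1.0],
    "drug_score": [0.7,      0.2,      0.9,        0.8,        0.1],
})

# Sort hierarchically by the categorical columns, then aggregate the
# numerical columns per group; in Taggle, ungrouped rows could instead
# stay visible at the individual-item level.
grouped = (df.sort_values(["tumor_type", "mutation"])
             .groupby(["tumor_type", "mutation"])
             .agg({"expression": "mean", "drug_score": ["mean", "count"]}))
print(grouped)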



E. Ghafoori, E.G. Kholmovski, S. Thomas, J. Silvernagel, N. Angel, N. Hu, D.J. Dosdall, R.S. MacLeod, R. Ranjan. “Characterization of Gadolinium Contrast Enhancement of Radiofrequency Ablation Lesions in Predicting Edema and Chronic Lesion Size,” In Circulation: Arrhythmia and Electrophysiology, Vol. 10, No. 11, Ovid Technologies (Wolters Kluwer Health), pp. e005599. Oct, 2017.
DOI: 10.1161/circep.117.005599

ABSTRACT

Background: Magnetic resonance imaging (MRI) has been used to acutely visualize radiofrequency ablation lesions, but its accuracy in predicting chronic lesion size is unknown. The main goal of this study was to characterize different areas of enhancement in late gadolinium enhancement MRI done immediately after ablation to predict acute edema and chronic lesion size.

Methods and Results: In a canine model (n=10), ventricular radiofrequency lesions were created using a ThermoCool SmartTouch (Biosense Webster) catheter. All animals underwent MRI (late gadolinium enhancement and T2-weighted edema imaging) immediately after ablation and after 1, 2, 4, and 8 weeks. Edema, microvascular obstruction, and enhanced volumes were identified in MRI and normalized to chronic histological volume. Immediately after contrast administration, the microvascular obstruction region was 3.2±1.1 times larger than the chronic lesion volume in acute MRI. Even 60 minutes after contrast administration, edema was 8.7±3.31 times and the enhanced area 6.14±2.74 times the chronic lesion volume. An exponential fit to the microvascular obstruction volume was found to be the best predictor of chronic lesion volume at 26.14 minutes (95% prediction interval, 24.35–28.11 minutes) after contrast injection. The edema volume in late gadolinium enhancement correlated well with edema volume in T2-weighted MRI with an R2 of 0.99.

Conclusion: The microvascular obstruction region on acute late gadolinium enhancement images acquired 26.1 minutes after contrast administration can accurately predict the chronic lesion volume. We also show that T1-weighted MRI images acquired immediately after contrast injection accurately show edema resulting from radiofrequency ablation.
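As a rough illustration of the exponential-fit idea (all numbers below are synthetic placeholders, not study data), one can fit normalized microvascular-obstruction volume against time after contrast and read off when the fitted curve reaches 1.0 times the chronic lesion volume:

import numpy as np
from scipy.optimize import curve_fit

# Hypothetical normalized MVO volumes (multiples of the chronic lesion
# volume) at several times (minutes) after contrast administration.
t = np.array([5.0, 10, 15, 20, 30, 45, 60])
mvo = np.array([3.2, 2.5, 1.9, 1.5, 1.0, 0.6, 0.4])

def model(t, a, b):
    return a * np.exp(-b * t)              # simple exponential decay

(a, b), _ = curve_fit(model, t, mvo, p0=(3.0, 0.05))
t_match = np.log(a) / b                    # time at which the fit equals 1.0
print(f"fitted ratio reaches 1.0x at about {t_match:.1f} min post-contrast")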



S. Ghimire, J. Dhamala, J. Coll-Font, J. D. Tate, M. S. Guillem, D. H. Brooks, R. S. MacLeod, L. Wang. “Overcoming Barriers to Quantification and Comparison of Electrocardiographic Imaging Methods: A Community-Based Approach,” In Computing in Cardiology, Vol. 44, 2017.

ABSTRACT

There has been a recent upsurge in the development of electrocardiographic imaging (ECGI) methods, along with a significant increase in clinical application. To better assess the state of the art, enable reliable progress, and facilitate clinical adoption, it is important to be able to compare results in a comprehensive manner, scientifically and clinically. However, studies vary in modeling choices, computational methods, validation mechanisms and metrics, and clinical applications, making unified evaluation and comparison of ECGI a critical challenge.

This paper describes initial results of a project to address this challenge via a community-based approach organized by the Consortium for Electrocardiographic Imaging (CEI). We detail different aspects of this collective effort including a data sharing repository, a platform for comparison of different algorithms and modeling approaches on the same datasets, several active workgroups and progress made along these directions. We also summarize the results from groups participating in this collaboration and contributing solutions by applying their methods to the same dataset for comparison.



T. Gilray, S. Kumar. “Toward Parallel CFA with Datalog, MPI, and CUDA,” In Scheme and Functional Programming Workshop, 2017.

ABSTRACT

We present our recent experience working to design parallel functional control-flow analysis (CFA) using an encoding in Datalog and underlying relational algebra implemented for SIMD coprocessors and supercomputers. Control-flow analysis statically models the possible propagations of data and control through a target program, finitely obtaining a bound on reachable expressions and environments and on possible return and argument values. We used Soufflé, a parallel CPU-based Datalog implementation from Oracle Labs, and worked toward a new MPI-based distributed hash join implementation and an extension of the GPU-based relational algebra library RedFox.

In this paper, we provide introductions to functional flow analysis, Datalog, MPI, and CUDA, explaining the total process we are working on to bring these components together in an analysis pipeline toward the end of scaling functional program analyses by extracting their intrinsic parallelism in a principled manner.
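For readers unfamiliar with Datalog, the core evaluation pattern is a semi-naive fixpoint over relations. The Python sketch below shows that pattern for the classic reachability example (illustrative only; a CFA encoding replaces the edge relation with relations over expressions, environments, and stores, but the evaluation loop is the same):

def transitive_closure(edge):
    reach = set(edge)                  # reach(x, y) :- edge(x, y).
    delta = set(edge)
    while delta:                       # semi-naive: join only the new facts
        derived = {(x, z)              # reach(x, z) :- reach(x, y), edge(y, z).
                   for (x, y) in delta
                   for (u, z) in edge if u == y}
        delta = derived - reach
        reach |= delta
    return reach

print(sorted(transitive_closure({("a", "b"), ("b", "c"), ("c", "d")})))

Parallel Datalog engines distribute exactly this join-and-union loop, which is why relational-algebra backends on MPI or GPUs are attractive targets.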



W. W. Good, B. Erem, J. Coll-Font, D. H. Brooks, R. S. MacLeod. “Detecting Ischemic Stress to the Myocardium Using Laplacian Eigenmaps and Changes to Conduction Velocity,” In Computing in Cardiology, Vol. 44, IEEE, 2017.

ABSTRACT

The underlying pathophysiology of ischemia and its electrocardiographic consequences are poorly understood, resulting in unreliable diagnosis of this disease. This limited knowledge of underlying mechanisms suggests a data driven approach, which seeks to identify patterns in the ECG that can be linked statistically to underlying behavior and conditions of ischemic stress. The gold standard ECG metrics for evaluating ischemia monitor vertical deflections within the ST segment. However, ischemia influences all portions of the electrogram. Another metric that targets the QRS complex during ischemia is Conduction Velocity (CV). An even more inclusive, data driven approach is known as "Laplacian Eigenmaps" (LE), which can identify trajectories, or "manifolds," that respond to different spatiotemporal consequences of ischemic stress, and these changes to the trajectories on the manifold may serve as a clinically relevant biomarker. In this study, we compared the LE- and CV-based markers against two gold standards for detecting ischemic stress, both derived from the ST segment. We evaluated the response time and fidelity of each biomarker using Time to Threshold (TTT) and Contrast Ratio (CR) measures, over 51 episodes recorded as cardiac electrograms from a canine model of controlled ischemia. The results show that metrics designed to monitor regions beyond the ST segment can perform at least as well, if not better, than traditional ST segment based metrics.
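A minimal sketch of the LE step, assuming scikit-learn and synthetic stand-in electrogram data (this is not the authors' pipeline): embed the time samples into a low-dimensional manifold and track how the trajectory shifts between baseline and stress windows.

import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
egm = rng.normal(size=(2000, 64))           # stand-in: time samples x leads

embedding = SpectralEmbedding(n_components=3, n_neighbors=30)
trajectory = embedding.fit_transform(egm)   # one 3-D point per time sample

# One possible marker: drift of the trajectory centroid between an early
# (baseline) window and a late (stress) window of the recording.
drift = np.linalg.norm(trajectory[:500].mean(axis=0) -
                       trajectory[-500:].mean(axis=0))
print(f"manifold trajectory drift: {drift:.3f}")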



C. Gritton, J. Guilkey, J. Hooper, D. Bedrov, R. M. Kirby, M. Berzins. “Using the material point method to model chemical/mechanical coupling in the deformation of a silicon anode,” In Modelling and Simulation in Materials Science and Engineering, Vol. 25, No. 4, pp. 045005. 2017.

ABSTRACT

The lithiation and delithiation of a silicon battery anode is modeled using the material point method (MPM). The main challenge in modeling this process with MPM is to simulate stress dependent diffusion coupled with concentration dependent stress within a material that undergoes large deformations. MPM is chosen as the numerical method of choice because of its ability to handle large deformations. A method for modeling diffusion within MPM is described. A stress dependent model for diffusivity and three different constitutive models that fully couple the equations for stress with the equations for diffusion are considered. Verification tests for the accuracy of the numerical implementations of the models and validation tests with experimental results show the accuracy of the approach. The fully coupled stress-diffusion model implemented in MPM is then applied to modeling the lithiation and delithiation of silicon nanopillars.
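A minimal form of such two-way coupling (a hedged sketch for context, not the specific constitutive models compared in the paper) is

\frac{\partial c}{\partial t} = \nabla \cdot \big( D(\boldsymbol{\sigma}) \, \nabla c \big),
\qquad
\boldsymbol{\sigma} = \mathbb{C} : \big( \boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}_c(c) \big),

where the diffusivity D depends on the stress state and \boldsymbol{\varepsilon}_c(c) is a concentration-induced swelling strain, so that the lithium concentration and the mechanical stress evolve together on the deforming MPM particles.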



L. Guo, A. Narayan, T. Zhou, Y. Chen. “Stochastic Collocation Methods via L1 Minimization Using Randomized Quadratures,” In SIAM Journal on Scientific Computing, Vol. 39, No. 1, pp. A333--A359. Jan, 2017.
ISSN: 1064-8275
DOI: 10.1137/16M1059680

ABSTRACT

In this work, we discuss the problem of approximating a multivariate function via an ℓ1 minimization method, using a randomly chosen sub-grid of the corresponding tensor grid of Gaussian points. The independent variables of the function are assumed to be random variables, and thus the framework provides a non-intrusive way to construct the generalized polynomial chaos expansions, stemming from the motivating application of Uncertainty Quantification (UQ). We provide theoretical analysis on the validity of the approach. The framework covers both bounded measures, such as the uniform and Chebyshev measures, and unbounded measures, including the Gaussian measure. Several numerical examples are given to confirm the theoretical results.
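In outline, and with notation assumed rather than taken from the paper, the recovery step has the familiar ℓ1 form: given orthogonal polynomials \phi_j, randomly selected sub-grid points x_i, and function values f(x_i), the expansion coefficients c are sought via

\min_{c} \|c\|_1 \quad \text{subject to} \quad \sum_{i=1}^{M} \Big( \sum_{j=1}^{N} c_j \, \phi_j(x_i) - f(x_i) \Big)^{2} \le \varepsilon^2,

so that sparsity in the gPC basis is exploited while the randomized quadrature keeps the number of function evaluations M small.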



J. K. Holmen, A. Humphrey, D. Sutherland, M. Berzins. “Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks,” In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, PEARC17, No. 27, pp. 27:1--27:8. 2017.
ISBN: 978-1-4503-5272-7
DOI: 10.1145/3093338.3093388

ABSTRACT

The University of Utah's Carbon Capture Multidisciplinary Simulation Center (CCMSC) is using the Uintah Computational Framework to predict performance of a 1000 MWe ultra-supercritical clean coal boiler. The center aims to utilize the Intel Xeon Phi-based DOE systems, Theta and Aurora, through the Aurora Early Science Program by using the Kokkos C++ library to enable node-level performance portability. This paper describes infrastructure advancements and portability improvements made possible by our integration of Kokkos within Uintah. Scalability results are presented that compare serial and data parallel task execution models for a challenging radiative heat transfer calculation, central to the center's predictive boiler simulations. These results demonstrate good strong-scaling characteristics up to 256 Knights Landing (KNL) processors on the NSF Stampede system and show that the KNL-based calculation is competitive with prior GPU-based results for the same calculation.



J. Jakeman, A. Narayan, T. Zhou. “A Generalized Sampling and Preconditioning Scheme for Sparse Approximation of Polynomial Chaos Expansions,” In SIAM Journal on Scientific Computing, Vol. 39, No. 3, SIAM, pp. A1114--A1144. Jan, 2017.
ISSN: 1064-8275
DOI: 10.1137/16M1063885

ABSTRACT

In this paper we propose an algorithm for recovering sparse orthogonal polynomials using stochastic collocation. Our approach is motivated by the desire to use generalized polynomial chaos expansions (PCE) to quantify uncertainty in models subject to uncertain input parameters. The standard sampling approach for recovering sparse polynomials is to use Monte Carlo (MC) sampling of the density of orthogonality. However, MC methods result in poor function recovery when the polynomial degree is high. Here we propose a general algorithm that can be applied to any admissible weight function on a bounded domain and a wide class of exponential weight functions defined on unbounded domains. Our proposed algorithm samples with respect to the weighted equilibrium measure of the parametric domain, and subsequently solves a preconditioned ℓ1-minimization problem, where the weights of the diagonal preconditioning matrix are given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest. Numerical examples are also provided that demonstrate that our proposed Christoffel Sparse Approximation algorithm leads to comparable or improved accuracy even when compared with Legendre and Hermite specific algorithms.



J. Jiang, Y. Chen, A. Narayan. “Offline-Enhanced Reduced Basis Method Through Adaptive Construction of the Surrogate Training Set,” In Journal of Scientific Computing, Vol. 73, No. 2-3, Springer Nature, pp. 853--875. September, 2017.
DOI: 10.1007/s10915-017-0551-3

ABSTRACT

The reduced basis method (RBM) is a popular certified model reduction approach for solving parametrized partial differential equations. One critical stage of the offline portion of the algorithm is a greedy algorithm, requiring maximization of an error estimate over parameter space. In practice this maximization is usually performed by replacing the parameter domain continuum with a discrete "training" set. When the dimension of parameter space is large, it is necessary to significantly increase the size of this training set in order to effectively search parameter space. Large training sets diminish the attractiveness of RBM algorithms since this proportionally increases the cost of the offline phase. In this work we propose novel strategies for offline RBM algorithms that mitigate the computational difficulty of maximizing error estimates over a training set. The main idea is to identify a subset of the training set, a "surrogate training set" (STS), on which to perform greedy algorithms. The STS we construct is much smaller in size than the full training set, yet our examples suggest that it is accurate enough to induce the solution manifold of interest at the current offline RBM iteration. We propose two algorithms to construct the STS: our first algorithm, the successive maximization method, is inspired by inverse transform sampling for non-standard univariate probability distributions. The second constructs an STS by identifying pivots in the Cholesky decomposition of an approximate error correlation matrix. We demonstrate the algorithm through numerical experiments, showing that it is capable of accelerating offline RBM procedures without degrading accuracy, assuming that the solution manifold has rapidly decaying Kolmogorov width.
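As a sketch of the second construction (pivot selection by a greedy pivoted Cholesky factorization), the helper below is illustrative and assumes an approximate error correlation matrix G over the full training set is already available; the returned pivots index a candidate surrogate training set.

import numpy as np

def cholesky_pivot_subset(G, m):
    # Greedy (low-rank) pivoted Cholesky of a symmetric PSD matrix G;
    # the first m pivot indices are returned.
    n = G.shape[0]
    d = np.array(np.diag(G), dtype=float)   # diagonal of G - L L^T so far
    L = np.zeros((n, m))
    pivots = []
    for j in range(m):
        p = int(np.argmax(d))               # largest remaining diagonal entry
        pivots.append(p)
        L[:, j] = (G[:, p] - L[:, :j] @ L[p, :j]) / np.sqrt(d[p])
        d -= L[:, j] ** 2
        d[d < 0] = 0.0                      # guard against round-off
    return pivots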



M. Kern, A. Lex, N. Gehlenborg, C. R. Johnson. “Interactive Visual Exploration And Refinement Of Cluster Assignments,” In BMC Bioinformatics, Cold Spring Harbor Laboratory, April, 2017.
DOI: 10.1101/123844

ABSTRACT

Background:
With ever-increasing amounts of data produced in biology research, scientists are in need of efficient data analysis methods. Cluster analysis, combined with visualization of the results, is one such method that can be used to make sense of large data volumes. At the same time, cluster analysis is known to be imperfect and depends on the choice of algorithms, parameters, and distance measures. Most clustering algorithms don't properly account for ambiguity in the source data, as records are often assigned to discrete clusters, even if an assignment is unclear. While there are metrics and visualization techniques that allow analysts to compare clusterings or to judge cluster quality, there is no comprehensive method that allows analysts to evaluate, compare, and refine cluster assignments based on the source data, derived scores, and contextual data.

Results:
In this paper, we introduce a method that explicitly visualizes the quality of cluster assignments, allows comparisons of clustering results, and enables analysts to manually curate and refine cluster assignments. Our methods are applicable to matrix data clustered with partitional, hierarchical, and fuzzy clustering algorithms. Furthermore, we enable analysts to explore clustering results in the context of other data, for example, to observe whether a clustering of genomic data results in a meaningful differentiation in phenotypes.

Conclusions:
Our methods are integrated into Caleydo StratomeX, a popular, web-based, disease subtype analysis tool. We show in a usage scenario that our approach can reveal ambiguities in cluster assignments and produce improved clusterings that better differentiate genotypes and phenotypes.
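A small sketch of the underlying idea of scoring per-record assignment quality (illustrative, with synthetic data; this is not the Caleydo StratomeX implementation): low-scoring records are the ambiguous ones an analyst may want to inspect and reassign.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_samples

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))               # stand-in expression matrix

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
scores = silhouette_samples(X, labels)       # per-record assignment quality

ambiguous = np.where(scores < 0.1)[0]        # candidates for manual review
print(f"{ambiguous.size} of {X.shape[0]} records look ambiguous")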



S. Kumar, D. Hoang, S. Petruzza, J. Edwards, V. Pascucci. “Reducing Network Congestion and Synchronization Overhead During Aggregation of Hierarchical Data,” In 2017 IEEE 24th International Conference on High Performance Computing (HiPC), pp. 223-232. Dec, 2017.
DOI: 10.1109/HiPC.2017.00034

ABSTRACT

Hierarchical data representations have been shown to be effective tools for coping with large-scale scientific data. Writing hierarchical data on supercomputers, however, is challenging as it often involves all-to-one communication during aggregation of low-resolution data which tends to span the entire network domain, resulting in several bottlenecks. We introduce the concept of indexing templates, which succinctly describe data organization and can be used to alter movement of data in beneficial ways. We present two techniques, domain partitioning and localized aggregation, that leverage indexing templates to alleviate congestion and synchronization overheads during data aggregation. We report experimental results that show significant I/O speedup using our proposed schemes on two of today's fastest supercomputers, Mira and Shaheen II, using the Uintah and S3D simulation frameworks.
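The flavor of localized aggregation can be sketched with mpi4py (group size and buffers below are illustrative assumptions, not the Uintah/PIDX implementation): each rank gathers only within a small group, and the per-group aggregators then handle the final I/O, so no single step spans the entire network.

from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

group_size = 8                               # assumed aggregator fan-in
local = comm.Split(color=rank // group_size, key=rank)

block = np.random.rand(1024)                 # stand-in low-resolution data
gathered = None
if local.Get_rank() == 0:                    # one aggregator per group
    gathered = np.empty(local.Get_size() * block.size)

local.Gather(block, gathered, root=0)        # phase 1: group-local gather
# Phase 2 (not shown): only the aggregators write their buffers to disk.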



S. McKenna, A. Lex, M. Meyer. “Worksheets for Guiding Novices through the Visualization Design Process,” In CoRR, 2017.

ABSTRACT

For visualization pedagogy, an important but challenging notion to teach is design, from making to evaluating visualization encodings, user interactions, or data visualization systems. In our previous work, we introduced the design activity framework to codify the high-level activities of the visualization design process. This framework has helped structure experts' design processes to create visualization systems, but the framework's four activities lack a breakdown into steps with a concrete example to help novices utilize this framework in their own real-world design process. To provide students with such concrete guidelines, we created worksheets for each design activity: understand, ideate, make, and deploy. Each worksheet presents a high-level summary of the activity with actionable, guided steps for a novice designer to follow. We validated the use of this framework and the worksheets in a graduate-level visualization course taught at our university. For this evaluation, we surveyed the class and conducted 13 student interviews to garner qualitative, open-ended feedback and suggestions on the worksheets. We conclude this work with a discussion and highlight various areas for future work on improving visualization design pedagogy.



M. Mirzargar, A. Jallepalli, J.K. Ryan, R.M. Kirby. “Hexagonal Smoothness-Increasing Accuracy-Conserving Filtering,” In Journal of Scientific Computing, Vol. 73, No. 2-3, Springer Nature, pp. 1072--1093. Aug, 2017.
DOI: 10.1007/s10915-017-0517-5

ABSTRACT

Discontinuous Galerkin (DG) methods are a popular class of numerical techniques to solve partial differential equations due to their higher order of accuracy. However, the inter-element discontinuity of a DG solution hinders its utility in various applications, including visualization and feature extraction. This shortcoming can be alleviated by postprocessing of DG solutions to increase the inter-element smoothness. A class of postprocessing techniques proposed to increase the inter-element smoothness is SIAC filtering. In addition to increasing the inter-element continuity, SIAC filtering also raises the convergence rate from order k+1 to order 2k+1. Since the introduction of SIAC filtering for univariate hyperbolic equations by Cockburn et al. (Math Comput 72(242):577–606, 2003), many generalizations of SIAC filtering have been proposed. Recently, the idea of dimensionality reduction through rotation has been the focus of studies in which a univariate SIAC kernel has been used to postprocess a two-dimensional DG solution (Docampo-Sánchez et al. in Multi-dimensional filtering: reducing the dimension through rotation, 2016. arXiv preprint arXiv:1610.02317). However, the scope of theoretical development of multidimensional SIAC filters has never gone beyond the usage of tensor product multidimensional B-splines or the reduction of the filter dimension. In this paper, we define a new SIAC filter called hexagonal SIAC (HSIAC) that uses a nonseparable class of two-dimensional spline functions called hex splines. In addition to relaxing the separability assumption, the proposed HSIAC filter provides more symmetry than its tensor-product counterpart. We prove that the superconvergence property holds for a specific class of structured triangular meshes using HSIAC filtering and provide numerical results to demonstrate and validate our theoretical results.
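For context, the standard symmetric univariate SIAC postprocessor convolves the DG solution with a B-spline-based kernel (a sketch of the conventional form; HSIAC replaces these univariate or tensor-product B-splines with hex splines):

u^{\star}(x) = (K_h \ast u_h)(x),
\qquad
K_h(x) = \frac{1}{h} \sum_{\gamma=-k}^{k} c_\gamma \, \psi^{(k+1)}\!\left(\frac{x}{h} - \gamma\right),

where \psi^{(k+1)} is the B-spline of order k+1, h is the mesh spacing, and the coefficients c_\gamma are chosen so that the kernel reproduces polynomials up to degree 2k, which is what preserves accuracy while increasing smoothness.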



M. Mirzargar, R.T. Whitaker, R.M. Kirby. “Exploration of Heterogeneous Data Using Robust Similarity,” In CoRR, 2017.

ABSTRACT

Heterogeneous data pose serious challenges to data analysis tasks, including exploration and visualization. Current techniques often utilize dimensionality reduction, aggregation, or conversion to numerical values to analyze heterogeneous data. However, the effectiveness of such techniques in finding subtle structures, such as the presence of multiple modes or the detection of outliers, is hindered by the challenge of finding the proper subspaces or prior knowledge needed to reveal the structures. In this paper, we propose a generic similarity-based exploration technique that is applicable to a wide variety of datatypes and their combinations, including heterogeneous ensembles. The proposed concept of similarity has a close connection to statistical analysis and can be deployed for summarization, revealing fine structures such as the presence of multiple modes, and detecting anomalies or outliers. We then propose a visual encoding framework that enables the exploration of a heterogeneous dataset at different levels of detail and provides insightful information about both global and local structures. We demonstrate the utility of the proposed technique using various real datasets, including ensemble data.



A. Narayan, J. Jakeman, T. Zhou. “A Christoffel function weighted least squares algorithm for collocation approximations,” In Mathematics of Computation, Vol. 86, No. 306, pp. 1913--1947. 2017.
ISSN: 0025-5718, 1088-6842
DOI: 10.1090/mcom/3192

ABSTRACT

We propose, theoretically investigate, and numerically validate an algorithm for the Monte Carlo solution of least-squares polynomial approximation problems in a collocation framework. Our method is motivated by generalized Polynomial Chaos approximation in uncertainty quantification, where a polynomial approximation is formed from a combination of orthogonal polynomials. A standard Monte Carlo approach would draw samples according to the density of orthogonality. Our proposed algorithm samples with respect to the equilibrium measure of the parametric domain, and subsequently solves a weighted least-squares problem, with weights given by evaluations of the Christoffel function. We present theoretical analysis to motivate the algorithm, and numerical results that show our method is superior to standard Monte Carlo methods in many situations of interest.
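A one-dimensional sketch of the sampling-and-weighting recipe, under assumptions not taken from the paper (Legendre polynomials on [-1, 1], whose equilibrium measure is the arcsine/Chebyshev density, and an invented target function):

import numpy as np
from numpy.polynomial import legendre

def f(x):
    return np.exp(x) * np.sin(3 * x)             # invented target function

N, M = 10, 200                                   # max degree, sample count

# Draw samples from the equilibrium (arcsine) measure on [-1, 1].
rng = np.random.default_rng(0)
x = np.cos(np.pi * rng.random(M))

# Matrix of L2-orthonormal Legendre polynomials evaluated at the samples.
V = legendre.legvander(x, N) * np.sqrt((2 * np.arange(N + 1) + 1) / 2.0)

# Christoffel-function weights: w(x) ~ (N + 1) / sum_k p_k(x)^2.
w = (N + 1) / np.sum(V**2, axis=1)
sw = np.sqrt(w)

# Weighted least-squares solve for the expansion coefficients.
c, *_ = np.linalg.lstsq(sw[:, None] * V, sw * f(x), rcond=None)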



T.A.J. Ouermi, A. Knoll, R.M. Kirby, M. Berzins. “OpenMP 4 Fortran Modernization of WSM6 for KNL,” In Proceedings of the Practice and Experience in Advanced Research Computing 2017 on Sustainability, Success and Impact, PEARC17, No. 12, ACM, pp. 12:1--12:8. 2017.
ISBN: 978-1-4503-5272-7
DOI: 10.1145/3093338.3093387

ABSTRACT

Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well established and increasingly supported vehicle for portable parallelization. As architectures mature and compiler OpenMP implementations evolve, best practices for code modernization change as well. In this paper, we examine the impact of newer OpenMP features (in particular OMP SIMD) on the Intel Xeon Phi Knights Landing (KNL) architecture, applied in optimizing loops in the single moment 6-class microphysics module (WSM6) in the US Navy's NEPTUNE code. We find that with functioning OMP SIMD constructs, low thread invocation overhead on KNL, and reduced penalty for unaligned access compared to previous architectures, one can leverage OpenMP 4 to achieve reasonable scalability with relatively minor reorganization of a production physics code.