## Martin Berzins: Parallel Computing, GPUs
## Mike Kirby: Finite Element Methods, Uncertainty Quantification, GPUs
## Valerio Pascucci: Scientific Data Management
## Chris Johnson: Problem Solving Environments

## Ross Whitaker: GPUs
## Chuck Hansen: GPUs

Efficient Implementation of Smoothness-Increasing Accuracy-Conserving (SIAC) Filters for Discontinuous Galerkin Solutions
H. Mirzaee, J.K. Ryan, R.M. Kirby. In Journal of Scientific Computing (in press), 2011. DOI: 10.1007/s10915-011-9535-x
The discontinuous Galerkin (DG) methods provide a high-order extension of the finite volume method in much the same way as high-order or spectral

To CG or to HDG: A Comparative Study
R.M. Kirby, B. Cockburn, S.J. Sherwin. In Journal of Scientific Computing, Note: published online, 2011. DOI: 10.1007/s10915-011-9501-7
Hybridization through the border of the elements (hybrid unknowns) combined with a Schur complement procedure (often called We demonstrate that the HDG approach generates a global trace space system for the unknown that, although larger in rank than the traditional static condensation system in CG, has significantly smaller bandwidth at moderate polynomial orders. We show that if one ignores set-up costs, above approximately fourth-degree polynomial expansions on triangles and quadrilaterals the HDG method can be made to be as efficient as the CG approach, making it competitive for time-dependent problems even before taking into consideration other properties of DG schemes such as their superconvergence properties and their ability to handle

Formal Specification of MPI 2.0: Case Study in Specifying a Practical Concurrent Programming API
G. Li, R. Palmer, M. DeLisi, G. Gopalakrishnan, R.M. Kirby. In Science of Computer Programming, Vol. 76, pp. 65--81. 2011. DOI: 10.1016/j.scico.2010.03.007
We describe the first formal specification of a non-trivial subset of MPI, the dominant communication API in high performance computing. Engineering a formal specification for a non-trivial concurrency API requires the right combination of rigor, executability, and traceability, while also serving as a smooth elaboration of a pre-existing informal specification. It also requires the modularization of reusable specification components to keep the length of the specification in check. Long-lived APIs such as MPI are not usually 'textbook minimalistic' because they support a diverse array of applications, a diverse community of users, and have efficient implementations over decades of computing hardware. We choose the TLA+ notation to write our specifications, and describe how we organized the specification of around 200 of the 300 MPI 2.0 functions. We detail a handful of these functions in this paper, and assess our specification with respect to the aforementioned requirements. We close with a description of possible approaches that may help render the act of writing, understanding, and validating the specifications of concurrency APIs much more productive.

Direct Isosurface Visualization of Hex-Based High-Order Geometry and Attribute Representations
T. Martin, E. Cohen, R.M. Kirby. In IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. PP, No. 99, pp. 1--14. 2011. ISSN: 1077-2626. DOI: 10.1109/TVCG.2011.103
In this paper, we present a novel isosurface visualization technique that guarantees the accurate visualization of isosurfaces with complex attribute data defined on (un-)structured (curvi-)linear hexahedral grids. Isosurfaces of high-order hexahedral-based finite element solutions on both uniform grids (including MRI and CT scans) and more complex geometry represent a domain of interest that can be rendered using our algorithm. Additionally, our technique can be used to directly visualize solutions and attributes in isogeometric analysis, an area based on trivariate high-order NURBS (Non-Uniform Rational B-splines) geometry and attribute representations for the analysis. Furthermore, our technique can be used to visualize isosurfaces of algebraic functions. Our approach combines subdivision and numerical root-finding to form a robust and efficient isosurface visualization algorithm that does not miss surface features, while finding all intersections between a view frustum and desired isosurfaces. This allows the use of view-independent transparency in the rendering process. We demonstrate our technique through a straightforward CPU implementation on both complex-structured and complex-unstructured geometry with high-order simulation solutions, isosurfaces of medical data sets, and isosurfaces of algebraic functions.
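As a rough illustration of how subdivision and root-finding can be combined so that no intersection is missed, the sketch below works in one dimension only: it finds every zero of a polynomial along a single ray parameter t in [0, 1]. It is not the paper's algorithm. It assumes the function along the ray is given by Bernstein (Bézier) coefficients, and the names `split_bernstein` and `find_roots` are hypothetical. The pruning step relies on the convex-hull property of the Bernstein basis: if all coefficients on an interval share one strict sign, the polynomial has no zero there.

```python
def split_bernstein(coeffs):
    """Split Bernstein coefficients on an interval at its midpoint (de Casteljau)."""
    left, right = [coeffs[0]], [coeffs[-1]]
    row = list(coeffs)
    while len(row) > 1:
        row = [0.5 * (p + q) for p, q in zip(row, row[1:])]
        left.append(row[0])
        right.append(row[-1])
    return left, right[::-1]


def find_roots(coeffs, lo=0.0, hi=1.0, tol=1e-10):
    """Return the parameters in [lo, hi] where the polynomial is zero.

    Assumes the polynomial is not identically zero.
    """
    # Convex-hull property: the polynomial is bounded by its smallest and
    # largest coefficient, so a strictly one-signed interval holds no root.
    if min(coeffs) > 0.0 or max(coeffs) < 0.0:
        return []
    # Interval is small enough: report its midpoint as a root.
    if hi - lo < tol:
        return [0.5 * (lo + hi)]
    mid = 0.5 * (lo + hi)
    left, right = split_bernstein(coeffs)
    roots = find_roots(left, lo, mid, tol) + find_roots(right, mid, hi, tol)
    # A root on a split point is reported by both halves; merge near-duplicates.
    merged = []
    for r in roots:
        if not merged or r - merged[-1] > 10.0 * tol:
            merged.append(r)
    return merged
```

For example, `find_roots([0.1875, -0.3125, 0.1875])` handles the quadratic (t - 0.25)(t - 0.75) written in Bernstein form. A uniform sampling of the ray could step over a pair of closely spaced roots, whereas the sign test here only discards intervals that provably contain none.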

Finite element implementation of mechanochemical phenomena in neutral deformable porous media under finite deformation
G.A. Ateshian, M.B. Albro, S.A. Maas, J.A. Weiss. In Journal of Biomechanical Engineering, Vol. 133, No. 8, 2011. DOI: 10.1115/1.4004810
Biological soft tissues and cells may be subjected to mechanical as well as chemical (osmotic) loading under their natural physiological environment or various experimental conditions. The interaction of mechanical and chemical effects may be very significant under some of these conditions, yet the highly nonlinear nature of the set of governing equations describing these mechanisms poses a challenge for the modeling of such phenomena. This study formulated and implemented a finite element algorithm for analyzing mechanochemical events in neutral deformable porous media under finite deformation. The algorithm employed the framework of mixture theory to model the porous permeable solid matrix and interstitial fluid, where the fluid consists of a mixture of solvent and solute. A special emphasis was placed on solute-solid matrix interactions, such as solute exclusion from a fraction of the matrix pore space (solubility) and frictional momentum exchange that produces solute hindrance and pumping under certain dynamic loading conditions. The finite element formulation implemented full coupling of mechanical and chemical effects, providing a framework where material properties and response functions may depend on solid matrix strain as well as solute concentration. The implementation was validated using selected canonical problems for which analytical or alternative numerical solutions exist. This finite element code includes a number of unique features that enhance the modeling of mechanochemical phenomena in biological tissues. The code is available in the public domain, open source finite element program FEBio (http://mrl.sci.utah.edu/software).

Finding consistent strain distributions in the glenohumeral capsule between two subjects: Implications for development of physical examinations
N.J. Drury, B.J. Ellis, J.A. Weiss, P.J. McMahon, R.E. Debski. In Journal of Biomechanics, Vol. 44, No. 4, pp. 607--613. February, 2011. DOI: 10.1016/j.jbiomech.2010.11.018
The anterior-inferior glenohumeral capsule is the primary passive stabilizer to the glenohumeral joint during anterior dislocation. Physical examinations following dislocation are crucial for proper diagnosis of capsule pathology; however, they are not standardized for joint position, which may lead to misdiagnoses and poor outcomes. To suggest joint positions for physical examinations where the stability provided by the capsule may be consistent among patients, the objective of this study was to evaluate the distribution of maximum principal strain on the anterior-inferior capsule using two validated subject-specific finite element models of the glenohumeral joint at clinically relevant joint positions. The joint positions with 25 N anterior load applied at 60° of glenohumeral abduction and 10°, 20°, 30° and 40° of external rotation resulted in distributions of strain that were similar between shoulders (r

The capsule's contribution to total hip construct stability - a finite element analysis
J.M. Elkins, J.S. Stroud, M.J. Rudert, Y. Tochigi, D.R. Pedersen, B.J. Ellis, J.J. Callaghan, J.A. Weiss, T.D. Brown. In Journal of Orthopaedic Research, Vol. 29, No. 11, Note: William Harris, MD Award, pp. 1642--1648. November, 2011. DOI: 10.1002/jor.21435
Instability is a significant concern in total hip arthroplasty (THA), particularly when there is structural compromise of the capsule due to pre-existing pathology or due to necessities of surgical approach. An experimentally grounded fiber-direction-based finite element model of the hip capsule was developed, and was integrated with an established three-dimensional model of impingement/dislocation. Model validity was established by close similarity to results from a cadaveric experiment in a servohydraulic hip simulator. Parametric computational runs explored effects of graded levels of capsule thickness, of regional detachment from the capsule's femoral or acetabular insertions, of surgical incisions of capsule substance, and of capsule defect repairs. Depending strongly upon the specific site, localized capsule defects caused varying degrees of construct stability compromise, with several specific situations involving over 60% decrement in dislocation resistance. Construct stability was returned substantially toward intact-capsule levels following well-conceived repairs, although the suture sites involved were often at substantial risk of failure. These parametric model results underscore the importance of retaining or robustly repairing capsular structures in THA, in order to maximize overall construct stability. © 2011 Orthopaedic Research Society. Published by Wiley Periodicals, Inc. J Orthop Res 29:1642–1648, 2011

A fast iterative method for solving the Eikonal equation on triangulated surfaces
Z. Fu, W.-K. Jeong, Y. Pan, R.M. Kirby, R.T. Whitaker. In SIAM Journal on Scientific Computing, Vol. 33, No. 5, pp. 2468--2488. 2011. DOI: 10.1137/100788951 PubMed Central ID: PMC3360588
This paper presents an efficient, fine-grained parallel algorithm for solving the Eikonal equation on triangular meshes. The Eikonal equation, and the broader class of Hamilton–Jacobi equations to which it belongs, have a wide range of applications from geometric optics and seismology to biological modeling and analysis of geometry and images. The ability to solve such equations accurately and efficiently provides new capabilities for exploring and visualizing parameter spaces and for solving inverse problems that rely on such equations in the forward model. Efficient solvers on state-of-the-art, parallel architectures require new algorithms that are not, in many cases, optimal, but are better suited to synchronous updates of the solution. In previous work [W. K. Jeong and R. T. Whitaker, SIAM J. Sci. Comput., 30 (2008), pp. 2512–2534], the authors proposed the fast iterative method (FIM) to efficiently solve the Eikonal equation on regular grids. In this paper we extend the fast iterative method to solve Eikonal equations efficiently on triangulated domains on the CPU and on parallel architectures, including graphics processors. We propose a new local update scheme that provides solutions of first-order accuracy for both architectures. We also propose a novel triangle-based update scheme and its corresponding data structure for efficient irregular data mapping to parallel single-instruction multiple-data (SIMD) processors. We provide detailed descriptions of the implementations on a single CPU, a multicore CPU with shared memory, and SIMD architectures with comparative results against state-of-the-art Eikonal solvers.
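To make the active-list idea behind the fast iterative method more concrete, here is a minimal serial sketch on a regular 2D grid, i.e. the setting of the earlier grid-based FIM rather than the triangle-based scheme this paper proposes. It assumes a constant right-hand side f, a square n-by-n grid with spacing h, and a standard first-order Godunov upwind update; the function names `upwind_update` and `fim_eikonal` are hypothetical and not taken from the authors' code.

```python
import math

INF = float("inf")


def upwind_update(a, b, h, f):
    """First-order Godunov upwind solve of |grad u| = f at one grid node.

    a and b are the smallest neighbor values along the two grid axes.
    """
    if a == INF and b == INF:
        return INF
    if abs(a - b) >= h * f:
        # Only one axis contributes: one-sided update.
        return min(a, b) + h * f
    return 0.5 * (a + b + math.sqrt(2.0 * (h * f) ** 2 - (a - b) ** 2))


def fim_eikonal(n, sources, h=1.0, f=1.0, eps=1e-12):
    """Solve |grad u| = f on an n x n grid with u = 0 at the source nodes."""
    u = [[INF] * n for _ in range(n)]
    for i, j in sources:
        u[i][j] = 0.0

    def neighbors(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < n and 0 <= j + dj < n:
                yield i + di, j + dj

    def solve(i, j):
        a = min((u[p][j] for p in (i - 1, i + 1) if 0 <= p < n), default=INF)
        b = min((u[i][q] for q in (j - 1, j + 1) if 0 <= q < n), default=INF)
        return upwind_update(a, b, h, f)

    source_set = set(sources)
    # The active list starts as the non-source neighbors of the sources.
    active = {nb for s in sources for nb in neighbors(*s)} - source_set
    while active:
        for node in list(active):
            i, j = node
            old = u[i][j]
            new = min(old, solve(i, j))
            u[i][j] = new
            if old - new < eps:
                # Converged: retire the node and re-examine its neighbors.
                active.discard(node)
                for ni, nj in neighbors(i, j):
                    if (ni, nj) in active or (ni, nj) in source_set:
                        continue
                    candidate = solve(ni, nj)
                    if candidate < u[ni][nj]:
                        u[ni][nj] = candidate
                        active.add((ni, nj))
    return u
```

Calling `fim_eikonal(5, [(0, 0)])` returns a 5-by-5 table of approximate distances from the corner node. Unlike fast marching, no priority queue is kept: every node in the active list can be updated independently within a sweep, which is the property that makes the method a candidate for SIMD hardware. This serial version updates values in place during each sweep.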

Interpreting Performance Data Across Intuitive Domains
M. Schulz, J.A. Levine, P.-T. Bremer, T. Gamblin, V. Pascucci. In International Conference on Parallel Processing, Taipei, Taiwan, IEEE, pp. 206--215. 2011. DOI: 10.1109/ICPP.2011.60

A Toolkit for Forward/Inverse Problems in Electrocardiography within the SCIRun Problem Solving Environment
B.M. Burton, J.D. Tate, B. Erem, D.J. Swenson, D.F. Wang, D.H. Brooks, P.M. van Dam, R.S. MacLeod. In Proceedings of the 2011 IEEE Int. Conf. Engineering in Medicine and Biology Society (EMBC), pp. 267--270. 2011. DOI: 10.1109/IEMBS.2011.6090052 PubMed ID: 22254301 PubMed Central ID: PMC3337752
Computational modeling in electrocardiography often requires the examination of cardiac forward and inverse problems in order to non-invasively analyze physiological events that are otherwise inaccessible or unethical to explore. The study of these models can be performed in the open-source SCIRun problem solving environment developed at the Center for Integrative Biomedical Computing (CIBC). A new toolkit within SCIRun provides researchers with essential frameworks for constructing and manipulating electrocardiographic forward and inverse models in a highly efficient and interactive way. The toolkit contains sample networks, tutorials and documentation which direct users through SCIRun-specific approaches in the assembly and execution of these specific problems.

Morse Set Classification and Hierarchical Refinement using Conley Index
Guoning Chen, Qingqing Deng, Andrzej Szymczak, Robert S. Laramee, and Eugene Zhang. In IEEE Transactions on Visualization and Computer Graphics (TVCG), Vol. 18, No. 5, pp. 767--782. June, 2011. DOI: 10.1109/TVCG.2011.107 PubMed ID: 21690641
Morse decomposition provides a numerically stable topological representation of vector fields that is crucial for their rigorous interpretation. However, Morse decomposition is not unique, and its granularity directly impacts its computational cost. In this paper, we propose an automatic refinement scheme to construct the Morse Connection Graph (MCG) of a given vector field in a hierarchical fashion. Our framework allows a Morse set to be refined through a local update of the flow combinatorialization graph, as well as the connection regions between Morse sets. The computation is fast because the most expensive computation is concentrated on a small portion of the domain. Furthermore, the present work allows the generation of a topologically consistent hierarchy of MCGs, which cannot be obtained using a global method. The classification of the extracted Morse sets is a crucial step for the construction of the MCG, for which the Poincaré index is inadequate. We make use of an upper bound for the Conley index, provided by the Betti numbers of an index pair for a translation along the flow, to classify the Morse sets. This upper bound is sufficiently accurate for Morse set classification and provides supportive information for the automatic refinement process. An improved visualization technique for MCG is developed to incorporate the Conley indices. Finally, we apply the proposed techniques to a number of synthetic and real-world simulation data to demonstrate their utility.