Taggle: Scalable Visualization of Tabular Data through Aggregation|
K. Furmanova, S. Gratzl, H. Stitz, T. Zichner, M. Jaresova, M. Ennemoser, A. Lex, M. Streit. In CoRR, 2017.
Visualization of tabular data---for both presentation and exploration purposes---is a well-researched area. Although effective visual presentations of complex tables are supported by various plotting libraries, creating such tables is a tedious process and requires scripting skills. In contrast, interactive table visualizations that are designed for exploration purposes either operate at the level of individual rows, where large parts of the table are accessible only via scrolling, or provide a high-level overview that often lacks context-preserving drill-down capabilities. In this work we present Taggle, a novel visualization technique for exploring and presenting large and complex tables that are composed of individual columns of categorical or numerical data and homogeneous matrices. The key contribution of Taggle is the hierarchical aggregation of data subsets, for which the user can also choose suitable visual representations.The aggregation strategy is complemented by the ability to sort hierarchically such that groups of items can be flexibly defined by combining categorical stratifications and by rich data selection and filtering capabilities. We demonstrate the usefulness of Taggle for interactive analysis and presentation of complex genomics data for the purpose of drug discovery.
|Reducing network congestion and synchronization overhead during aggregation of hierarchical data,
S. Kumar, D. Hoang, S. Petruzza, J. Edwards, V. Pascucci. In 2017 IEEE 24th International Conference on High Performance Computing (HiPC), IEEE, Dec, 2017.
Hierarchical data representations have been shown to be effective tools for coping with large-scale scientific data. Writing hierarchical data on supercomputers, however, is challenging as it often involves all-to-one communication during aggregation of low-resolution data which tends to span the entire network domain, resulting in several bottlenecks. We introduce the concept of indexing templates, which succinctly describe data organization and can be used to alter movement of data in beneficial ways. We present two techniques, domain partitioning and localized aggregation, that leverage indexing templates to alleviate congestion and synchronization overheads during data aggregation. We report experimental results that show significant I/O speedup using our proposed schemes on two of today's fastest supercomputers, Mira and Shaheen II, using the Uintah and S3D simulation frameworks.
Vietoris-Rips and Cech Complexes of Metric Gluings|
M. Adamaszek, H. Adams, E. Gasparovic, M. Gommel, E. Purvine, R. Sazdanovic, B. Wang, Y. Wang, L. Ziegelmeier. In CoRR, 2017.
We study Vietoris-Rips and Cech complexes of metric wedge sums and metric gluings. We show that the Vietoris-Rips (resp. Cech) complex of a wedge sum, equipped with a natural metric, is homotopy equivalent to the wedge sum of the Vietoris-Rips (resp. Cech) complexes. We also provide generalizations for certain metric gluings, i.e. when two metric spaces are glued together along a common isometric subset. As our main example, we deduce the homotopy type of the Vietoris-Rips complex of two metric graphs glued together along a sufficiently short path. As a result, we can describe the persistent homology, in all homological dimensions, of the Vietoris-Rips complexes of a wide class of metric graphs.
Sheaf-Theoretic Stratification Learning|
A. Brown, B. Wang. In CoRR, 2017.
In this paper, we investigate a sheaf-theoretic interpretation of stratification learning. Motivated by the work of Alexandroff (1937) and McCord (1978), we aim to redirect efforts in the computational topology of triangulated compact polyhedra to the much more computable realm of sheaves on partially ordered sets. Our main result is the construction of stratification learning algorithms framed in terms of a sheaf on a partially ordered set with the Alexandroff topology. We prove that the resulting decomposition is the unique minimal stratification for which the strata are homogeneous and the given sheaf is constructible. In particular, when we choose to work with the local homology sheaf, our algorithm gives an alternative to the local homology transfer algorithm given in Bendich et al. (2012), and the cohomology stratification algorithm given in Nanda (2017). We envision that our sheaf-theoretic algorithm could give rise to a larger class of stratification beyond homology-based stratification. This approach also points toward future applications of sheaf theory in the study of topological data analysis by illustrating the utility of the language of sheaf theory in generalizing existing algorithms.
Interactive Visual Exploration And Refinement Of Cluster Assignments|
M. Kern, A. Lex, N. Gehlenborg, C. R. Johnson. In BMC Bioinformatics, Cold Spring Harbor Laboratory, April, 2017.
Massively Parallel Simulations of Spread of Infectious Diseases over Realistic Social Networks|
A. Bhatele, J. Yeom, N. Jain, C. J. Kuhlman, Y. Livnat, K. R. Bisset, L. V. Kale, M. V. Marathe. In 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), May, 2017.
Controlling the spread of infectious diseases in large populations is an important societal challenge. Mathematically, the problem is best captured as a certain class of reaction-diffusion processes (referred to as contagion processes) over appropriate synthesized interaction networks. Agent-based models have been successfully used in the recent past to study such contagion processes. We describe EpiSimdemics, a highly scalable, parallel code written in Charm++ that uses agent-based modeling to simulate disease spreads over large, realistic, co-evolving interaction networks. We present a new parallel implementation of EpiSimdemics that achieves unprecedented strong and weak scaling on different architectures — Blue Waters, Cori and Mira. EpiSimdemics achieves five times greater speedup than the second fastest parallel code in this field. This unprecedented scaling is an important step to support the long term vision of real-time epidemic science. Finally, we demonstrate the capabilities of EpiSimdemics by simulating the spread of influenza over a realistic synthetic social contact network spanning the continental United States (∼280 million nodes and 5.8 billion social contacts).
A Virtual Reality Visualization Tool for Neuron Tracing|
W. Usher, P. Klacansky, F. Federer, P. T. Bremer, A. Knoll, J. Yarch, A. Angelucci, V. Pascucci. In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2017.
Tracing neurons in large-scale microscopy data is crucial to establishing a wiring diagram of the brain, which is needed to understand how neural circuits in the brain process information and generate behavior. Automatic techniques often fail for large and complex datasets, and connectomics researchers may spend weeks or months manually tracing neurons using 2D image stacks. We present a design study of a new virtual reality (VR) system, developed in collaboration with trained neuroanatomists, to trace neurons in microscope scans of the visual cortex of primates. We hypothesize that using consumer-grade VR technology to interact with neurons directly in 3D will help neuroscientists better resolve complex cases and enable them to trace neurons faster and with less physical and mental strain. We discuss both the design process and technical challenges in developing an interactive system to navigate and manipulate terabyte-sized image volumes in VR. Using a number of different datasets, we demonstrate that, compared to widely used commercial software, consumer-grade VR presents a promising alternative for scientists.
Progressive CPU Volume Rendering with Sample Accumulation|
W. Usher, J. Amstutz, C. Brownlee, A. Knoll, I. Wald . In Eurographics Symposium on Parallel Graphics and Visualization, Edited by Alexandru Telea and Janine Bennett, The Eurographics Association, 2017.
We present a new method for progressive volume rendering by accumulating object-space samples over successively rendered frames. Existing methods for progressive refinement either use image space methods or average pixels over frames, which can blur features or integrate incorrectly with respect to depth. Our approach stores samples along each ray, accumulates new samples each frame into a buffer, and progressively interleaves and integrates these samples. Though this process requires additional memory, it ensures interactivity and is well suited for CPU architectures with large memory and cache. This approach also extends well to distributed rendering in cluster environments. We implement this technique in Intel's open source OSPRay CPU ray tracing framework and demonstrate that it is particularly useful for rendering volumetric data with costly sampling functions.
Pathways for Theoretical Advances in Visualization|
M. Chen, G. Grinstein, C. R. Johnson, J. Kennedy, M. Tory. In IEEE Computer Graphics and Applications, IEEE, pp. 103--112. July, 2017.
More than a decade ago, Chris Johnson proposed the "Theory of Visualization" as one of the top research problems in visualization. Since then, there have been several theory-focused events, including three workshops and three panels at IEEE Visualization (VIS) Conferences. Together, these events have produced a set of convincing arguments.
FluoRender: joint freehand segmentation and visualization for many-channel fluorescence data analysis|
Y. Wan, H. Otsuna, H. A. Holman, B. Bagley, M. Ito, A. K. Lewis, M. Colasanto, G. Kardon, K. Ito, C. Hansen. In BMC Bioinformatics, Vol. 18, No. 1, Springer Nature, May, 2017.
Uncertainty Footprint: Visualization of Nonuniform Behavior of Iterative Algorithms Applied to 4D Cell Tracking|
Y. Wan, C. Hansen. In Computer Graphics Forum, Wiley, 2017.
Research on microscopy data from developing biological samples usually requires tracking individual cells over time. When cells are three-dimensionally and densely packed in a time-dependent scan of volumes, tracking results can become unreliable and uncertain. Not only are cell segmentation results often inaccurate to start with, but it also lacks a simple method to evaluate the tracking outcome. Previous cell tracking methods have been validated against benchmark data from real scans or artificial data, whose ground truth results are established by manual work or simulation. However, the wide variety of real-world data makes an exhaustive validation impossible. Established cell tracking tools often fail on new data, whose issues are also difficult to diagnose with only manual examinations. Therefore, data-independent tracking evaluation methods are desired for an explosion of microscopy data with increasing scale and resolution. In this paper, we propose the uncertainty footprint, an uncertainty quantification and visualization technique that examines nonuniformity at local convergence for an iterative evaluation process on a spatial domain supported by partially overlapping bases. We demonstrate that the patterns revealed by the uncertainty footprint indicate data processing quality in two algorithms from a typical cell tracking workflow – cell identification and association. A detailed analysis of the patterns further allows us to diagnose issues and design methods for improvements. A 4D cell tracking workflow equipped with the uncertainty footprint is capable of self diagnosis and correction for a higher accuracy than previous methods whose evaluation is limited by manual examinations.
Driving Interactive Graph Exploration Using 0-Dimensional Persistent Homology Features|
A. Suh, M. Hajij, B. Wang, C. Scheidegger, P. Rosen. In CoRR, 2017.
Graphs are commonly used to encode relationships among entities, yet, their abstractness makes them incredibly difficult to analyze. Node-link diagrams are a popular method for drawing graphs. Classical techniques for the node-link diagrams include various layout methods that rely on derived information to position points, which often lack interactive exploration functionalities; and force-directed layouts, which ignore global structures of the graph. This paper addresses the graph drawing challenge by leveraging topological features of a graph as derived information for interactive graph drawing. We first discuss extracting topological features from a graph using persistent homology. We then introduce an interactive persistence barcodes to study the substructures of a force-directed graph layout; in particular, we add contracting and repulsing forces guided by the 0-dimensional persistent homology features. Finally, we demonstrate the utility of our approach across three datasets.
State of the Art in Transfer Functions for Direct Volume Rendering|
P. Ljung, J. Krüger, E. Gröller, M. Hadwiger, C. D. Hansen,, A. Ynnerman. In Computer Graphics Forum, Vol. 35, No. 3, Wiley-Blackwell, pp. 669--691. June, 2016.
A central topic in scientific visualization is the transfer function (TF) for volume rendering. The TF serves a fundamental role in translating scalar and multivariate data into color and opacity to express and reveal the relevant features present in the data studied. Beyond this core functionality, TFs also serve as a tool for encoding and utilizing domain knowledge and as an expression for visual design of material appearances. TFs also enable interactive volumetric exploration of complex data. The purpose of this state-of-the-art report (STAR) is to provide an overview of research into the various aspects of TFs, which lead to interpretation of the underlying data through the use of meaningful visual representations. The STAR classifies TF research into the following aspects: dimensionality, derived attributes, aggregated attributes, rendering aspects, automation, and user interfaces. The STAR concludes with some interesting research challenges that form the basis of an agenda for the development of next generation TF tools and methodologies.
VTK-m: Accelerating the Visualization Toolkit for Massively Threaded Architectures|
K. Moreland, C. Sewell, W. Usher, L. Lo, J. Meredith, D. Pugmire, J. Kress, H. Schroots, K. Ma, H. Childs, M. Larsen, C. Chen, R. Maynard, B. Geveci. In IEEE Computer Graphics and Applications, Vol. 36, No. 3, pp. 48--58. May, 2016.
Traditional scientific visualization software approaches do not fare well in massively threaded environments. To address the needs of the high-performance computing community, the VTK-m framework fills the gaps in functionality by bringing together the most recent research.
Resonant Laboratory and Candela: Spreading Your Visualization Ideas to the Masses|
A. Bigelow, R. Choudhury, J. Baumes. In Proceedings of Workshop on Visualization in Practice (VIP '16), Note: Best Paper Award , 2016.
Visualization practitioners are constantly developing new, innovative ways to visualize data, but much of the software that practitioners produce does not make it into production in professional systems. To solve this problem, we have developed and informally tested two open source systems. The first, Candela, is a framework and API for creating visualization components for the web that can wrap up new or existing visualizations as needed. Because Candela's API generalizes the inputs to a visualization, we have also developed a system called Resonant Laboratory that makes it possible for novice users to connect arbitrary datasets to Candela visualizations. Together, these systems enable novice users to explore and share their data with the growing library of state-of-the-art visualization techniques.
Pathfinder: Visual Analysis of Paths in Graphs|
C. Partl, S. Gratzl, M. Streit, A. Wassermann, H. Pfister, D. Schmalstieg, A. Lex. In Computer Graphics Forum (EuroVis '16), Vol. 35, No. 3, pp. 71-80. jun, 2016.
The analysis of paths in graphs is highly relevant in many domains. Typically, path-related tasks are performed in node-link layouts. Unfortunately, graph layouts often do not scale to the size of many real world networks. Also, many networks are multivariate, i.e., contain rich attribute sets associated with the nodes and edges. These attributes are often critical in judging paths, but directly visualizing attributes in a graph layout exacerbates the scalability problem. In this paper, we present visual analysis solutions dedicated to path-related tasks in large and highly multivariate graphs. We show that by focusing on paths, we can address the scalability problem of multivariate graph visualization, equipping analysts with a powerful tool to explore large graphs. We introduce Pathfinder, a technique that provides visual methods to query paths, while considering various constraints. The resulting set of paths is visualized in both a ranked list and as a node-link diagram. For the paths in the list, we display rich attribute data associated with nodes and edges, and the node-link diagram provides topological context. The paths can be ranked based on topological properties, such as path length or average node degree, and scores derived from attribute data. Pathfinder is designed to scale to graphs with tens of thousands of nodes and edges by employing strategies such as incremental query results. We demonstrate Pathfinder's fitness for use in scenarios with data from a coauthor network and biological pathways.
From Visual Exploration to Storytelling and Back Again|
Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Nicola Cosgrove, Marc Streit . In Computer Graphics Forum, Vol. 35, No. 3, pp. 491--500. jun, 2016.
The primary goal of visual data exploration tools is to enable the discovery of new insights. To justify and reproduce insights, the discovery process needs to be documented and communicated. A common approach to documenting and presenting findings is to capture visualizations as images or videos. Images, however, are insufficient for telling the story of a visual discovery, as they lack full provenance information and context. Videos are difficult to produce and edit, particularly due to the non-linear nature of the exploratory process. Most importantly, however, neither approach provides the opportunity to return to any point in the exploration in order to review the state of the visualization in detail or to conduct additional analyses. In this paper we present CLUE (Capture, Label, Understand, Explain), a model that tightly integrates data exploration and presentation of discoveries. Based on provenance data captured during the exploration process, users can extract key steps, add annotations, and author "Vistories", visual stories based on the history of the exploration. These Vistories can be shared for others to view, but also to retrace and extend the original analysis. We discuss how the CLUE approach can be integrated into visualization tools and provide a prototype implementation. Finally, we demonstrate the general applicability of the model in two usage scenarios: a Gapminder-inspired visualization to explore public health data and an example from molecular biology that illustrates how Vistories could be used in scientific journals.
Embedded Domain-Specific Language and Runtime System for Progressive Spatiotemporal Data Analysis and Visualization|
C. Christensen, S. Liu, G. Scorzelli, J. Lee, P.-T. Bremer, V. Pascucci. In Symposium on Large Data Analysis and Visualization, IEEE, 2016.
As our ability to generate large and complex datasets grows, accessing and processing these massive data collections is increasingly the primary bottleneck in scientific analysis. Challenges include retrieving, converting, resampling, and combining remote and often disparately located data ensembles with only limited support from existing tools. In particular, existing solutions rely predominantly on extensive data transfers or large-scale remote computing resources, both of which are inherently offline processes with long delays and substantial repercussions for any mistakes. Such workflows severely limit the flexible exploration and rapid evaluation of new hypotheses that are crucial to the scientific process and thereby impede scientific discovery. Here we present an embedded domain-specific language (EDSL) specifically designed for the interactive exploration of largescale, remote data. Our EDSL allows users to express a wide range of data analysis operations in a simple and abstract manner. The underlying runtime system transparently resolves issues such as remote data access and resampling while at the same time maintaining interactivity through progressive and interruptible computation. This system enables, for the first time, interactive remote exploration of massive datasets such as the 7km NASA GEOS-5 Nature Run simulation, which previously have been analyzed only offline or at reduced resolution.
Visualization for Understanding Uncertainty in Activation Volumes for Deep Brain Stimulation|
B. Hollister, G. Duffley, C. Butson,, C.R. Johnson. In Eurographics Conference on Visualization, Edited by K.L. Ma G. Santucci, and J. van Wijk, 2016.
We have created the Neurostimulation Uncertainty Viewer (nuView or nView) tool for exploring data arising from deep brain stimulation (DBS). Simulated volume of tissue activated (VTA), using clinical electrode placements, are recorded along withpatient outcomes in the Unified Parkinson's disease rating scale (UPDRS). The data is volumetric and sparse, with multi-value patient results for each activated voxel in the simulation. nView provides a collection of visual methods to explore the activated tissue to enhance understanding of electrode usage for improved therapy with DBS.
TOD-Tree: Task-Overlapped Direct send Tree Image Compositing for Hybrid MPI Parallelism and GPUs|
A. V. P. Grosset, M. Prasad, C. Christensen, A. Knoll, C. Hansen. In IEEE Transactions on Visualization and Computer Graphics, IEEE, pp. 1--1. 2016.
Modern supercomputers have thousands of nodes, each with CPUs and/or GPUs capable of several teraflops. However, the network connecting these nodes is relatively slow, on the order of gigabits per second. For time-critical workloads such as interactive visualization, the bottleneck is no longer computation but communication. In this paper, we present an image compositing algorithm that works on both CPU-only and GPU-accelerated supercomputers and focuses on communication avoidance and overlapping communication with computation at the expense of evenly balancing the workload. The algorithm has three stages: a parallel direct send stage, followed by a tree compositing stage and a gather stage. We compare our algorithm with radix-k and binary-swap from the IceT library in a hybrid OpenMP/MPI setting on the Stampede and Edison supercomputers, show strong scaling results and explain how we generally achieve better performance than these two algorithms. We developed a GPU-based image compositing algorithm where we use CUDA kernels for computation and GPU Direct RDMA for inter-node GPU communication. We tested the algorithm on the Piz Daint GPU-accelerated supercomputer and show that we achieve performance on par with CPUs. Lastly, we introduce a workflow in which both rendering and compositing are done on the GPU.