R. Bhalodia, S. Elhabian, L. Kavan, R. Whitaker. Leveraging Unsupervised Image Registration for Discovery of Landmark Shape Descriptor, In Medical Image Analysis, Elsevier, pp. 102157. 2021.
In current biological and medical research, statistical shape modeling (SSM) provides an essential framework for the characterization of anatomy/morphology. Such analysis is often driven by the identification of a relatively small number of geometrically consistent features found across the samples of a population. These features can subsequently provide information about the population shape variation. Dense correspondence models can provide ease of computation and yield an interpretable low-dimensional shape descriptor when followed by dimensionality reduction. However, automatic methods for obtaining such correspondences usually require image segmentation followed by significant preprocessing, which is taxing in terms of both computation and human resources. In many cases, the segmentation and subsequent processing require manual guidance and anatomy-specific domain expertise. This paper proposes a self-supervised deep learning approach for discovering landmarks from images that can directly be used as a shape descriptor for subsequent analysis. We use landmark-driven image registration as the primary task to force the neural network to discover landmarks that register the images well. We also propose a regularization term that allows for robust optimization of the neural network and ensures that the landmarks uniformly span the image domain. The proposed method circumvents segmentation and preprocessing and directly produces a usable shape descriptor using just 2D or 3D images. In addition, we also propose two variants of the training loss function that allow prior shape information to be integrated into the model. We apply this framework on several 2D and 3D datasets to obtain their shape descriptors. We analyze how effectively these shape descriptors capture shape information by performing different shape-driven applications, depending on the data, ranging from shape clustering to severity prediction to outcome diagnosis.
R. Bhalodia, S. Elhabian, J. Adams, W. Tao, L. Kavan, R. Whitaker. DeepSSM: A Blueprint for Image-to-Shape Deep Learning Models, Subtitled arXiv preprint arXiv:2110.07152, 2021.
Statistical shape modeling (SSM) characterizes anatomical variations in a population of shapes generated from medical images. SSM requires consistent shape representation across samples in a shape cohort. Establishing this representation entails a processing pipeline that includes anatomy segmentation, re-sampling, registration, and non-linear optimization. These shape representations are then used to extract low-dimensional shape descriptors that facilitate subsequent analyses in different applications. However, the current process of obtaining these shape descriptors from imaging data relies on human and computational resources, requiring domain expertise for segmenting anatomies of interest. Moreover, this same taxing pipeline needs to be repeated to infer shape descriptors for new image data using a pre-trained/existing shape model. Here, we propose DeepSSM, a deep learning-based framework for learning the functional mapping from images to low-dimensional shape descriptors and their associated shape representations, thereby inferring statistical representation of anatomy directly from 3D images. Once trained using an existing shape model, DeepSSM circumvents the heavy and manual pre-processing and segmentation and significantly improves the computational time, making it a viable solution for fully end-to-end SSM applications. In addition, we introduce a model-based data-augmentation strategy to address data scarcity. Finally, this paper presents and analyzes two different architectural variants of DeepSSM with different loss functions using three medical datasets and their downstream clinical application. Experiments showcase that DeepSSM performs comparably to or better than the state-of-the-art SSM both quantitatively and on application-driven downstream tasks. Therefore, DeepSSM aims to provide a comprehensive blueprint for deep learning-based image-to-shape models.
H. Bhatia, S. N. Petruzza, R. Anirudh, A. G. Gyulassy, R. M. Kirby, V. Pascucci, P. T. Bremer. Data-Driven Estimation of Temporal-Sampling Errors in Unsteady Flows, 2021.
While computer simulations typically store data at the highest available spatial resolution, it is often infeasible to do so for the temporal dimension. Instead, the common practice is to store data at regular intervals, the frequency of which is strictly limited by the available storage and I/O bandwidth. However, this manner of temporal subsampling can cause significant errors in subsequent analysis steps. More importantly, since the intermediate data is lost, there is no direct way of measuring this error after the fact. One particularly important use case that is affected is the analysis of unsteady flows using pathlines, as it depends on an accurate interpolation across time. Although the potential problem with temporal undersampling is widely acknowledged, there currently does not exist a practical way to estimate the potential impact. This paper presents a simple-to-implement yet powerful technique to estimate the error in pathlines due to temporal subsampling. Given an unsteady flow, we compute pathlines at the given temporal resolution as well as subsamples thereof. We then compute the error induced due to various levels of subsampling and use it to estimate the error between the given resolution and the unknown ground truth. Using two turbulent flows, we demonstrate that our approach, for the first time, provides an accurate, a posteriori error estimate for pathline computations. This estimate will enable scientists to better understand the uncertainties involved in pathline-based analysis techniques and can lead to new uncertainty visualization approaches using the predicted errors.
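The estimation idea in the abstract above can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and not the authors' implementation: the analytic flow `velocity`, linear interpolation in time between stored snapshots, and the quadratic error model used to extrapolate from the observed subsampling differences to the unknown error of the stored resolution.

```python
import numpy as np

def velocity(p, t):
    # Hypothetical analytic unsteady flow (assumption for illustration):
    # a rigid rotation whose angular rate oscillates in time.
    omega = 1.0 + 0.5 * np.sin(2.0 * np.pi * t)
    return omega * np.array([-p[1], p[0]])

def subsampled_velocity(stride, dt, n_intervals):
    # Emulate stored data: keep every `stride`-th snapshot and
    # interpolate linearly in time between the kept snapshots.
    step = stride * dt
    last = n_intervals // stride - 1  # index of the last kept interval
    def v(p, t):
        k = min(max(int(t / step), 0), last)
        w = (t - k * step) / step
        return (1.0 - w) * velocity(p, k * step) + w * velocity(p, (k + 1) * step)
    return v

def pathline_end(v, p0, t_end, n_steps):
    # Classical RK4 with a step far smaller than the snapshot spacing,
    # so integration error is negligible next to the sampling error.
    p, h = np.array(p0, dtype=float), t_end / n_steps
    for i in range(n_steps):
        t = i * h
        k1 = v(p, t)
        k2 = v(p + 0.5 * h * k1, t + 0.5 * h)
        k3 = v(p + 0.5 * h * k2, t + 0.5 * h)
        k4 = v(p + h * k3, t + h)
        p = p + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return p

def estimate_sampling_error(p0, t_end, n_intervals, strides=(2, 4, 8), n_steps=1536):
    dt = t_end / n_intervals
    ref = pathline_end(subsampled_velocity(1, dt, n_intervals), p0, t_end, n_steps)
    # Observed differences between coarser subsamples and the given resolution.
    d = np.array([
        np.linalg.norm(
            pathline_end(subsampled_velocity(s, dt, n_intervals), p0, t_end, n_steps) - ref
        )
        for s in strides
    ])
    # Assumed model: error ~ c * (s * dt)^2, hence d(s) ~ c * dt^2 * (s^2 - 1).
    basis = np.array(strides, dtype=float) ** 2 - 1.0
    # Least-squares fit of c * dt^2, i.e. the estimated error at stride 1.
    return float(d @ basis / (basis @ basis)), ref

p0, t_end, n_intervals = (1.0, 0.0), 1.5, 48
estimated, ref = estimate_sampling_error(p0, t_end, n_intervals)
# Ground truth is available only because the flow is analytic in this toy setup.
truth = pathline_end(velocity, p0, t_end, 1536)
actual = float(np.linalg.norm(ref - truth))
print(f"estimated error {estimated:.2e}, actual error {actual:.2e}")
```

In a real dataset the `truth` line would not be available; the point of the sketch is that `estimated` is computed only from the stored resolution and its subsamples.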
H. Bhatia, D. Hoang, N. Morrical, V. Pascucci, P.T. Bremer, P. Lindstrom. AMM: Adaptive Multilinear Meshes, Subtitled arXiv:2007.15219, 2021.
Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision datastreams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes.
S. R. Black, A. Janson, M. Mahan, J. Anderson, C. R. Butson. Identification of Deep Brain Stimulation Targets for Neuropathic Pain After Spinal Cord Injury Using Localized Increases in White Matter Fiber Cross‐Section, In Neuromodulation: Technology at the Neural Interface, John Wiley & Sons, Inc., 2021.
The spinal cord injury (SCI) patient population is overwhelmingly affected by neuropathic pain (NP), a secondary condition for which therapeutic options are limited and have a low degree of efficacy. The objective of this study was to identify novel deep brain stimulation (DBS) targets that may theoretically benefit those with NP in the SCI patient population. We hypothesize that localized changes in white matter identified in SCI subjects with NP compared to those without NP could be used to develop an evidence‐based approach to DBS target identification.
K. M. Campbell, H. Dai, Z. Su, M. Bauer, P. T. Fletcher, S. C. Joshi. Structural Connectome Atlas Construction in the Space of Riemannian Metrics, Subtitled arXiv, 2021.
The structural connectome is often represented by fiber bundles generated from various types of tractography. We propose a method of analyzing connectomes by representing them as a Riemannian metric, thereby viewing them as points in an infinite-dimensional manifold. After equipping this space with a natural metric structure, the Ebin metric, we apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. We demonstrate connectome registration and atlas formation using connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.
S. H. Campbell, T. Bidone. 3D Model of Cell Migration and Proliferation in a Tissue Scaffold, In Biophysical Journal, Vol. 120, No. 3, Elsevier, pp. 265a. 2021.
Tissue scaffolds restore tissue functionality without the limitations of transplants. However, successful tissue growth depends on the interplay between scaffold properties and cell activities. It has been previously reported that scaffold porosity and Young's modulus affect cell migration and tissue generation. However, exactly how the geometrical and mechanical properties of a scaffold interplay with cell processes remains poorly understood, even though this interplay is essential for successful tissue growth. We developed a 3D computational model that simulates cell migration and proliferation on a scaffold. The model generates an adjustable 3D porous scaffold environment with a defined pore size and Young's modulus. Cells are treated as explicit spherical particles comparable in size to bone-marrow cells and are initially seeded randomly throughout the scaffold. Cells can create adhesions, proliferate, and independently migrate across pores in a random walk. Cell adhesions during migration follow the molecular-clutch mechanism, where traction force from the cells against the scaffold stiffness reinforces adhesion lifetime up to a threshold. We used the model to test how variations in cell proliferation rate, scaffold Young's modulus, and porosity affect cell migration speed. At a low proliferation rate (1 × 10⁻⁷ s⁻¹), the spread of cell speeds is larger than at a high proliferation rate (1 × 10⁻⁶ s⁻¹). A biphasic relation between Young's modulus and cell speed is also observed, reflecting the molecular-clutch mechanism at the level of individual adhesions. These observations are consistent with previous reports regarding fibroblast migration on collagen-glycosaminoglycan scaffolds. Additionally, our model shows that similar cell and pore diameters induce a crowding effect that decreases cell speed. The results from our study provide important insights about biophysical mechanisms that govern cell motility on scaffolds with different properties for tissue engineering applications.
K.M. Campbell, H. Dai, Z. Su, M. Bauer, P.T. Fletcher, S.C. Joshi. Integrated Construction of Multimodal Atlases with Structural Connectomes in the Space of Riemannian Metrics, Subtitled arXiv preprint arXiv:2109.09808, 2021.
The structural network of the brain, or structural connectome, can be represented by fiber bundles generated by a variety of tractography methods. While such methods give qualitative insights into brain structure, there is controversy over whether they can provide quantitative information, especially at the population level. In order to enable population-level statistical analysis of the structural connectome, we propose representing a connectome as a Riemannian metric, which is a point on an infinite-dimensional manifold. We equip this manifold with the Ebin metric, a natural metric structure for this space, to get a Riemannian manifold along with its associated geometric properties. We then use this Riemannian framework to apply object-oriented statistical analysis to define an atlas as the Fréchet mean of a population of Riemannian metrics. This formulation ties into the existing framework for diffeomorphic construction of image atlases, allowing us to construct a multimodal atlas by simultaneously integrating complementary white matter structure details from DWMRI and cortical details from T1-weighted MRI. We illustrate our framework with 2D data examples of connectome registration and atlas formation. Finally, we build an example 3D multimodal atlas using T1 images and connectomes derived from diffusion tensors estimated from a subset of subjects from the Human Connectome Project.
M. Carlson, X. Zheng, H. Sundar, G. E. Karniadakis, R. M. Kirby. An open-source parallel code for computing the spectral fractional Laplacian on 3D complex geometry domains, In Computer Physics Communications, Vol. 261, North-Holland, pp. 107695. 2021.
We present a spectral element algorithm and open-source code for computing the fractional Laplacian defined by the eigenfunction expansion on finite 2D/3D complex domains with both homogeneous and nonhomogeneous boundaries. We demonstrate the scalability of the spectral element algorithm on large clusters by constructing the fractional Laplacian based on computed eigenvalues and eigenfunctions using up to thousands of CPUs. To demonstrate the accuracy of this eigen-based approach for computing the fractional Laplacian, we approximate the solutions of the fractional diffusion equation using the computed eigenvalues and eigenfunctions on a 2D quadrilateral, and on a 3D cubic and cylindrical domain, and compare the results with the contrived solutions to demonstrate fast convergence. Subsequently, we present simulation results for a fractional diffusion equation on a hand-shaped domain discretized with 3D hexahedra, as well as on a domain constructed from the Hanford site geometry corresponding to nonzero Dirichlet boundary conditions. Finally, we apply the algorithm to solve the surface quasi-geostrophic (SQG) equation on a 2D square with periodic boundaries. Simulation results demonstrate the accuracy, efficiency, and geometric flexibility of our algorithm and that our algorithm can capture the subtle dynamics of anomalous diffusion modeled by the fractional Laplacian on complex geometry domains. The included open-source code is the first of its kind.
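The eigenfunction-expansion definition mentioned in the abstract above, in which the fractional Laplacian acts as a sum over eigenpairs with each eigenvalue raised to the power s, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the authors' code: it swaps the spectral element discretization for a 1D second-order finite-difference Laplacian with homogeneous Dirichlet boundaries, and the function name `spectral_fractional_laplacian` is hypothetical.

```python
import numpy as np

def spectral_fractional_laplacian(u, h, s):
    # Discrete Dirichlet Laplacian (-d^2/dx^2) on interior nodes,
    # second-order finite differences (an assumption; the paper uses spectral elements).
    n = u.size
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    lam, phi = np.linalg.eigh(A)       # eigenvalues and orthonormal eigenvectors
    coeffs = phi.T @ u                 # expansion coefficients (u, phi_k)
    return phi @ (lam**s * coeffs)     # sum_k lam_k^s (u, phi_k) phi_k

n = 199
x = np.linspace(0.0, np.pi, n + 2)[1:-1]   # interior nodes of [0, pi]
h = np.pi / (n + 1)
u = np.sin(3.0 * x)                        # continuous eigenfunction with eigenvalue 9
s = 0.5
result = spectral_fractional_laplacian(u, h, s)
# For the continuous operator, (-Laplacian)^s sin(3x) = 9^s sin(3x) = 3 sin(3x) when s = 0.5.
print(float(np.max(np.abs(result - 3.0 * u))))
```

A dense eigendecomposition is only practical for small problems; it is used here because it makes the definition explicit.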
Cell migration is essential to physiological and pathological biology. Migration is driven by the motion of a leading edge, in which actin polymerization pushes against the edge and adhesions transmit traction to the substrate while membrane tension increases. How the actin and adhesions synergistically control edge protrusion remains elusive. We addressed this question by developing a computational model in which the Brownian ratchet mechanism governs actin filament polymerization against the membrane and the molecular clutch mechanism governs adhesion to the substrate (BR-MC model). Our model predicted that actin polymerization is the most significant driver of protrusion, as actin had a greater effect on protrusion than adhesion assembly. Increasing the lifetime of nascent adhesions also enhanced velocity, but decreased the protrusion's motional persistence, because filaments maintained against the cell edge ceased polymerizing as membrane tension increased. We confirmed the model predictions with measurement of adhesion lifetime and edge motion in migrating cells. Adhesions with longer lifetime were associated with faster protrusion velocity and shorter persistence. Experimentally increasing adhesion lifetime increased velocity but decreased persistence. We propose a mechanism for actin polymerization-driven, adhesion-dependent protrusion in which balanced nascent adhesion assembly and lifetime generates protrusions with the power and persistence to drive migration.
Y. Chen, C. McNabb, T. Bidone. Computational Model of E-cadherin Clustering under Cortical Tension, In Biophysical Journal, Vol. 120, No. 3, Elsevier, pp. 236a. 2021.
E-cadherins are adhesion proteins that play a critical role in the formation of cell-cell junctions for several physiological processes, including tissue development and homeostasis. The formation of E-cadherin clusters involves extracellular trans- and cis-associations between cadherin ectodomains and stabilization through intracellular coupling with the contractile actomyosin cortex. The dynamic remodeling of cell-cell junctions largely depends on cortical tension, but previous modeling frameworks did not incorporate this effect. In order to gain insights into the effects of cortical tension on the dynamic properties of E-cadherin clusters, here we developed a computational model based on Brownian dynamics. The model considers individual cadherins as explicit point particles undergoing cycles of lateral diffusion on two parallel surfaces that mimic the membrane of neighboring cells. E-cadherins transit between …
Y. Chen, L. Ji, A. Narayan, Z. Xu. L1-based reduced over collocation and hyper reduction for steady state and time-dependent nonlinear equations, In Journal of Scientific Computing, Vol. 87, No. 1, Springer US, pp. 1-21. 2021.
The task of repeatedly solving parametrized partial differential equations (pPDEs) in optimization, control, or interactive applications makes it imperative to design highly efficient and equally accurate surrogate models. The reduced basis method (RBM) presents itself as such an option. Accompanied by a mathematically rigorous error estimator, RBM carefully constructs a low-dimensional subspace of the parameter-induced high fidelity solution manifold on which an approximate solution is computed. It can improve efficiency by several orders of magnitude by leveraging an offline-online decomposition procedure. However, this decomposition, usually implemented with aid from the empirical interpolation method (EIM) for nonlinear and/or parametric-nonaffine PDEs, can be challenging to implement or can result in severely degraded online efficiency. In this paper, we augment and extend the EIM approach as a direct solver, as opposed to an assistant, for solving nonlinear pPDEs on the reduced level. The resulting method, called Reduced Over-Collocation method (ROC), is stable and capable of avoiding efficiency degradation exhibited in traditional applications of EIM. Two critical ingredients of the scheme are collocation at about twice as many locations as the dimension of the reduced approximation space, and an efficient L1-norm-based error indicator for the strategic selection of the parameter values whose snapshots span the reduced approximation space. Together, these two ingredients ensure that the proposed L1-ROC scheme is both offline- and online-efficient. A distinctive feature is that the efficiency degradation appearing in alternative RBM approaches that utilize EIM for nonlinear and nonaffine problems is circumvented, both in the offline and online stages. Numerical tests on different families of time-dependent and steady-state nonlinear problems demonstrate the high efficiency and accuracy of L1-ROC and its superior stability performance.
J. Chilleri, Y. He, D. Bedrov, R. M. Kirby. Optimal allocation of computational resources based on Gaussian process: Application to molecular dynamics simulations, In Computational Materials Science, Vol. 188, Elsevier, pp. 110178. 2021.
Simulation models have been utilized in a wide range of real-world applications for behavior predictions of complex physical systems or material designs of large structures. While extensive simulation is mathematically preferable, external limitations such as available resources are often necessary considerations. With a fixed computational resource (i.e., total simulation time), we propose a Gaussian process-based numerical optimization framework for optimal time allocation over simulations at different locations, so that a surrogate model with uncertainty estimation can be constructed to approximate the full simulation. The proposed framework is demonstrated first via two synthetic problems, and later using a real test case of a glass-forming system with divergent dynamic relaxations where a Gaussian process is constructed to estimate the diffusivity and its uncertainty with respect to the temperature.
D. Dai, Y. Epshteyn, A. Narayan. Hyperbolicity-Preserving and Well-Balanced Stochastic Galerkin Method for Two-Dimensional Shallow Water Equations, In SIAM Journal on Scientific Computing, Vol. 43, No. 2, Society for Industrial and Applied Mathematics, pp. A929-A952. 2021.
Stochastic Galerkin formulations of the two-dimensional shallow water systems parameterized with random variables may lose hyperbolicity, and hence change the nature of the original model. In this work, we present a hyperbolicity-preserving stochastic Galerkin formulation by carefully selecting the polynomial chaos approximations to the nonlinear terms of $(q^x)^2/h$, $(q^y)^2/h$, and $(q^x q^y)/h$ in the shallow water equations. We derive a sufficient condition to preserve the hyperbolicity of the stochastic Galerkin system which requires only a finite collection of positivity conditions on the stochastic water height at selected quadrature points in parameter space. Based on our theoretical results for the stochastic Galerkin formulation, we develop a corresponding well-balanced hyperbolicity-preserving central-upwind scheme. We demonstrate the accuracy and the robustness of the new scheme on several challenging numerical tests.
D. Dai, Y. Epshteyn, A. Narayan. Non-Dissipative and Structure-Preserving Emulators via Spherical Optimization, Subtitled arXiv:2108.12053, 2021.
Approximating a function with a finite series, e.g., involving polynomials or trigonometric functions, is a critical tool in computing and data analysis. The construction of such approximations via now-standard approaches like least squares or compressive sampling does not ensure that the approximation adheres to certain convex linear structural constraints, such as positivity or monotonicity. Existing approaches that ensure such structure are norm-dissipative and this can have a deleterious impact when applying these approaches, e.g., when numerically solving partial differential equations. We present a new framework that enforces via optimization such structure on approximations and is simultaneously norm-preserving. This results in a conceptually simple convex optimization problem on the sphere, but the feasible set for such problems can be very complex. We establish well-posedness of the optimization problem through results on spherical convexity and design several spherical-projection-based algorithms to numerically compute the solution. Finally, we demonstrate the effectiveness of this approach through several numerical examples.
E. Deelman, A. Mandal, A. P. Murillo, J. Nabrzyski, V. Pascucci, R. Ricci, I. Baldin, S. Sons, L. Christopherson, C. Vardeman, R. F. da Silva, J. Wyngaard, S. Petruzza, M. Rynge, K. Vahi, W. R. Whitcup, J. Drake, E. Scott. Blueprint: Cyberinfrastructure Center of Excellence, Subtitled arXiv, 2021.
In 2018, NSF funded an effort to pilot a Cyberinfrastructure Center of Excellence (CI CoE or Center) that would serve the cyberinfrastructure (CI) needs of the NSF Major Facilities (MFs) and large projects with advanced CI architectures. The goal of the CI CoE Pilot project (Pilot) effort was to develop a model and a blueprint for such a CoE by engaging with the MFs, understanding their CI needs, understanding the contributions the MFs are making to the CI community, and exploring opportunities for building a broader CI community. This document summarizes the results of community engagements conducted during the first two years of the project and describes the identified CI needs of the MFs. To better understand MFs' CI, the Pilot has developed and validated a model of the MF data lifecycle that follows the data generation and management within a facility and gained an understanding of how this model captures the fundamental stages that the facilities' data passes through from the scientific instruments to the principal investigators and their teams, to the broader collaborations and the public. The Pilot also aimed to understand what CI workforce development challenges the MFs face while designing, constructing, and operating their CI and what solutions they are exploring and adopting within their projects. Based on the needs of the MFs in the data lifecycle and workforce development areas, this document outlines a blueprint for a CI CoE that will learn about and share the CI solutions designed, developed, and/or adopted by the MFs, provide expertise to the largest NSF projects with advanced and complex CI architectures, and foster a …
Conversion of integrins from low to high affinity states, termed activation, is important in biological processes including immunity, hemostasis, angiogenesis and embryonic development. Integrin activation is regulated by large-scale conformational transitions from closed, low affinity states to open, high affinity states. While it has been suggested that substrate stiffness shifts the conformational equilibrium of integrin and governs its unbinding, here we address the role of integrin conformational activation in cellular mechanosensing. Comparison of WT vs. activating mutants of integrin αVβ3 shows that activating mutants shift cell spreading, FAK activation, traction stress and force on talin toward high stiffness values at lower stiffness. Although all activated integrin mutants showed equivalent binding affinity for soluble ligands, the β3 S243E mutant showed the strongest shift in mechanical responses. To understand this behavior, we used coarse-grained computational models derived from molecular level information. The models predicted that wild type integrin αVβ3 displaces under force, and that activating mutations shift the required force toward lower values, with S243E showing the strongest effect. Cellular stiffness sensing thus correlates with computed effects of force on integrin conformation. Together, these data identify a role for force-induced integrin conformational deformation in cellular mechanosensing.
A. Dubey, M. Berzins, C. Burstedde, M. L. Norman, D. Unat, M. Wahib. Structured Adaptive Mesh Refinement Adaptations to Retain Performance Portability With Increasing Heterogeneity, In Computing in Science & Engineering, Vol. 23, No. 5, pp. 62-66. 2021.
Adaptive mesh refinement (AMR) is an important method that enables many mesh-based applications to run at effectively higher resolution within limited computing resources by allowing high resolution only where really needed. This advantage comes at a cost, however: greater complexity in the mesh management machinery and challenges with load distribution. With the current trend of increasing heterogeneity in hardware architecture, AMR presents an orthogonal axis of complexity. The usual techniques, such as asynchronous communication and hierarchy management for parallelism and memory, that are necessary to obtain reasonable performance are very challenging to reason about with AMR. Different groups working with AMR are bringing different approaches to this challenge. Here, we examine the design choices of several AMR codes and also the degree to which demands placed on them by their users influence these choices.
M. D. Foote, P. E. Dennison, P. R. Sullivan, K. B. O'Neill, A. K. Thorpe, D. R. Thompson, D. H. Cusworth, R. Duren, S. Joshi. Impact of scene-specific enhancement spectra on matched filter greenhouse gas retrievals from imaging spectroscopy, In Remote Sensing of Environment, Vol. 264, Elsevier, pp. 112574. 2021.
Matched filter techniques have been widely used for retrieval of greenhouse gas enhancements from imaging spectroscopy datasets. While multiple algorithmic techniques and refinements have been proposed, the greenhouse gas target spectrum used for concentration enhancement estimation has remained largely unaltered since the introduction of quantitative matched filter retrievals. The magnitude of retrieved methane and carbon dioxide enhancements, and thereby integrated mass enhancements (IME) and estimated flux of point-source emitters, is heavily dependent on this target spectrum. Current standard use of molecular absorption coefficients to create unit enhancement target spectra does not account for absorption by background concentrations of greenhouse gases, solar and sensor geometry, or atmospheric water vapor absorption. We introduce geometric and atmospheric parameters into the generation of scene-specific unit enhancement spectra to provide target spectra that are compatible with all greenhouse gas retrieval matched filter techniques. Specifically, we use radiative transfer modeling to model four parameters that are expected to change between scenes: solar zenith angle, column water vapor, ground elevation, and sensor altitude. These parameter values are well defined, with low variation within a single scene. A benchmark dataset consisting of ten AVIRIS-NG airborne imaging spectrometer scenes was used to compare IME retrieved using a matched filter algorithm. For methane plumes, IME resulting from use of standard, generic enhancement spectra varied from −22 to +28.7% compared to scene-specific enhancement spectra. Due to differences in spectral shape between the generic and scene-specific enhancement spectra, differences in methane plume IME were linked to surface spectral characteristics in addition to geometric and atmospheric parameters. 
IME differences were much larger for carbon dioxide plumes, with generic enhancement spectra producing integrated mass enhancements −76.1 to −48.1% compared to scene-specific enhancement spectra. Fluxes calculated from these integrated enhancements would vary by the same percentages, assuming equivalent wind conditions. Methane and carbon dioxide IME were most sensitive to changes in solar zenith angle and ground elevation. We introduce an interpolation approach that can efficiently generate scene-specific unit enhancement spectra for given sets of parameters. Scene-specific target spectra can improve confidence in greenhouse gas retrievals and flux estimates across collections of scenes with diverse geometric and atmospheric conditions.
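To make the dependence on the target spectrum concrete, the following Python sketch implements the classical linear matched filter, which estimates a per-pixel enhancement from a background mean, a background covariance, and a target spectrum. It is an illustration under assumptions and not the retrieval code used in the paper: the function name `matched_filter`, the synthetic background, and the made-up `target` vector are all hypothetical, and no radiative transfer modeling is involved.

```python
import numpy as np

def matched_filter(pixels, background, target):
    # pixels: (n_pixels, n_bands) spectra to evaluate
    # background: (n_samples, n_bands) spectra used to estimate scene statistics
    # target: (n_bands,) unit enhancement spectrum (assumed to be given)
    mu = background.mean(axis=0)
    cov = np.cov(background, rowvar=False)
    w = np.linalg.solve(cov, target)             # cov^{-1} t
    return (pixels - mu) @ w / (target @ w)      # enhancement estimate per pixel

rng = np.random.default_rng(0)
n_bands = 8
background = 10.0 + rng.normal(size=(500, n_bands))   # synthetic scene, not real data
target = -np.linspace(0.1, 0.8, n_bands)              # hypothetical unit enhancement spectrum
plume_pixel = background.mean(axis=0) + 2.5 * target  # background mean plus 2.5 units of target

alpha = matched_filter(plume_pixel[None, :], background, target)
alpha_scaled = matched_filter(plume_pixel[None, :], background, 2.0 * target)
print(alpha[0], alpha_scaled[0])
```

Doubling the target spectrum halves the retrieved enhancement, which shows in miniature why the magnitude and shape of the unit enhancement spectrum carry through directly to the retrieved values.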
Predicting and capturing an analyst’s intent behind a selection in a data visualization is valuable in two scenarios: First, a successful prediction of a pattern an analyst intended to select can be used to auto-complete a partial selection which, in turn, can improve the correctness of the selection. Second, knowing the intent behind a selection can be used to improve recall and reproducibility. In this paper, we introduce methods to infer analyst’s intents behind selections in data visualizations, such as scatterplots. We describe intents based on patterns in the data, and identify algorithms that can capture these patterns. Upon an interactive selection, we compare the selected items with the results of a large set of computed patterns, and use various ranking approaches to identify the best pattern for an analyst’s selection. We store annotations and the metadata to reconstruct a selection, such as the type of algorithm and its parameterization, in a provenance graph. We present a prototype system that implements these methods for tabular data and scatterplots. Analysts can select a prediction to auto-complete partial selections and to seamlessly log their intents. We discuss implications of our approach for reproducibility and reuse of analysis workflows. We evaluate our approach in a crowd-sourced study, where we show that auto-completing selection improves accuracy, and that we can accurately capture pattern-based intent.