
SCI Publications


J. Sandhu, T. Bidone, R. D. Rabbitt. “Prestin Generates Instantaneous Force in Outer Hair Cell Membranes,” In Biophysical Journal, Vol. 120, No. 3, 2021.


Hearing occurs from sound reaching the inner ear cochlea, where electromotile Outer Hair Cells (OHCs) amplify vibrations by elongating and contracting rapidly in response to auditory frequency changes in membrane potential. OHCs can generate force cycle-by-cycle at frequencies exceeding 50 kHz, but precisely how this is achieved is unclear. Electromotility requires expression of the transmembrane protein, prestin, which facilitates the electromechanical conversion through action of the Coulomb force acting on the anion Cl⁻ bound at the core of the protein. However, recent experimental data suggest the charge displacement is too slow to support sound amplification at auditory frequencies. As a consequence, prestin electromechanics remain unclear at the molecular level. We hypothesize that prestin instantaneously transmits stress to the membrane, which subsequently drives charge displacement, membrane deformation, and OHC shape changes. To test the hypothesis, we examined the conformational dynamics of prestin and its effects on the motion of lipids under: (1) isometric conditions and (2) constant force conditions in order to mimic different regimes of membrane loading. All-atom molecular dynamics simulations of the prestin dimer embedded in POPC membranes were run and the trajectories analyzed. We discovered that under isometric conditions, the presence of a chloride ion in the electric field increased residue fluctuations. This trend was not observed under constant force conditions, supporting the idea that isometric conditions cause instantaneous force to be generated in the membrane. The analysis allowed us to identify the molecular mechanisms by which prestin allows electromechanical amplification by OHCs in the cochlea.

A. Singh, M. Bauer, S. Joshi. “Physics Informed Convex Artificial Neural Networks (PICANNs) for Optimal Transport based Density Estimation,” Subtitled “arXiv,” 2021.


Optimal Mass Transport (OMT) is a well-studied problem with a variety of applications in a diverse set of fields ranging from Physics to Computer Vision and in particular Statistics and Data Science. Since the original formulation of Monge in 1781, significant theoretical progress has been made on the existence, uniqueness and properties of the optimal transport maps. The actual numerical computation of the transport maps, particularly in high dimensions, remains a challenging problem. By Brenier's theorem, the continuous OMT problem can be reduced to that of solving a non-linear PDE of Monge-Ampere type whose solution is a convex function. In this paper, building on recent developments of input convex neural networks and physics informed neural networks for solving PDEs, we propose a Deep Learning approach to solve the continuous OMT problem.

To demonstrate the versatility of our framework, we focus on the ubiquitous density estimation and generative modeling tasks in statistics and machine learning. Finally, as an example, we show how our framework can be incorporated with an autoencoder to estimate an effective probabilistic generative model.
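
For reference, the reduction via Brenier's theorem mentioned in this abstract is commonly written as below for the quadratic transport cost. The symbols are notation assumed here for illustration and do not come from the abstract: \rho_0 and \rho_1 are the source and target densities, \varphi is the convex potential, and T is the transport map.

```latex
T(x) = \nabla \varphi(x), \qquad
\det\!\big(D^2 \varphi(x)\big) = \frac{\rho_0(x)}{\rho_1\big(\nabla \varphi(x)\big)}, \qquad
\varphi \ \text{convex}.
```

The second relation is the change-of-variables condition that T pushes \rho_0 forward to \rho_1; it is the Monge-Ampere type equation whose convex solution the abstract refers to.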

T. Sun, D. Li, B. Wang. “Decentralized Federated Averaging,” Subtitled “arXiv preprint arXiv:2104.11375,” 2021.


Federated averaging (FedAvg) is a communication efficient algorithm for the distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients. FedAvg is mostly studied in centralized fashions, which requires massive communication between server and clients in each communication. Moreover, attacking the central server can break the whole system's privacy. In this paper, we study the decentralized FedAvg with momentum (DFedAvgM), which is implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. We prove convergence of the (quantized) DFedAvgM under trivial assumptions; the convergence rate can be improved when the loss function satisfies the PŁ property. Finally, we numerically verify the efficacy of DFedAvgM.
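
The communication pattern described in this abstract can be illustrated with a minimal sketch, not the authors' implementation. It makes several assumptions that are not in the abstract: clients sit on a ring graph with uniform 1/3 mixing weights, each client has a toy quadratic loss 0.5 * ||x - target_i||^2, exact gradients replace stochastic ones so the run is deterministic, and the heavy-ball momentum is restarted at every communication round. All function and parameter names are hypothetical.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 clients."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W

def dfedavgm_sketch(targets, rounds=200, local_steps=5, lr=0.1, theta=0.5):
    """Local heavy-ball steps on each client, then averaging with ring neighbors.

    targets[i] is the minimizer of client i's toy loss (an assumption of this sketch).
    """
    n, d = targets.shape
    W = ring_mixing_matrix(n)
    x = np.zeros((n, d))                  # one parameter vector per client
    for _ in range(rounds):
        y_prev = x.copy()                 # momentum is restarted each round (assumption)
        y = x.copy()
        for _ in range(local_steps):
            grad = y - targets            # exact gradient of 0.5 * ||y - target||^2
            y_next = y - lr * grad + theta * (y - y_prev)
            y_prev, y = y, y_next
        x = W @ y                         # each client averages with its two neighbors
    return x

rng = np.random.default_rng(0)
targets = rng.normal(size=(6, 3))
x_final = dfedavgm_sketch(targets)
print("network average:", x_final.mean(axis=0))
print("global optimum: ", targets.mean(axis=0))
```

Because the mixing matrix is doubly stochastic, the average of the client parameters is unchanged by the communication step, so in this toy setting the network average converges to the minimizer of the summed loss even though no central server is involved. Individual clients may still differ slightly from that average.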

T. Sun, D. Li, B. Wang. “Stability and Generalization of the Decentralized Stochastic Gradient Descent,” Subtitled “arXiv preprint arXiv:2102.01302,” 2021.


The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has been studied extensively. Nevertheless, the community has paid little attention to its decentralized variants. In this paper, we provide a novel formulation of the decentralized stochastic gradient descent. Leveraging this formulation together with (non) convex optimization theory, we establish the first stability and generalization guarantees for the decentralized stochastic gradient descent. Our theoretical results are built on top of a few common and mild assumptions and reveal, for the first time, that decentralization deteriorates the stability of SGD. We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models.

M. Thorpe, B. Wang. “Robust Certification for Laplace Learning on Geometric Graphs,” Subtitled “arXiv preprint arXiv:2104.10837,” 2021.


Graph Laplacian (GL)-based semi-supervised learning is one of the most used approaches for classifying nodes in a graph. Understanding and certifying the adversarial robustness of machine learning (ML) algorithms has attracted large amounts of attention from different research communities due to its crucial importance in many security-critical applied domains. There is great interest in the theoretical certification of adversarial robustness for popular ML algorithms. In this paper, we provide the first adversarial robust certification for the GL classifier. More precisely, we quantitatively bound the difference in the classification accuracy of the GL classifier before and after an adversarial attack. Numerically, we validate our theoretical certification results and show that leveraging existing adversarial defenses for the k-nearest neighbor classifier can remarkably improve the robustness of the GL classifier.
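
As background for the GL classifier named in this abstract, the following is a minimal sketch of standard Laplace learning (harmonic label propagation) on a small graph. It is not the certification method of the paper. The function name, the path-graph example, and the use of a dense solver are all assumptions made for illustration.

```python
import numpy as np

def laplace_learning(W, labeled_idx, labels):
    """Extend labels to unlabeled nodes by solving L_uu u_u = -L_ul y.

    W is a symmetric nonnegative weight matrix; L = D - W is the graph Laplacian.
    """
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W
    labeled_idx = np.asarray(labeled_idx)
    labels = np.asarray(labels, dtype=float)
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    u = np.empty(n)
    u[labeled_idx] = labels               # labeled nodes keep their given values
    u[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ labels)
    return u

# Path graph with 5 nodes; the two endpoints carry labels 0 and 1.
W_path = np.zeros((5, 5))
for i in range(4):
    W_path[i, i + 1] = W_path[i + 1, i] = 1.0

scores = laplace_learning(W_path, labeled_idx=[0, 4], labels=[0.0, 1.0])
predicted = (scores > 0.5).astype(int)    # threshold the harmonic scores into two classes
print(scores)
```

On this path graph the harmonic solution interpolates linearly between the two labeled endpoints, which makes the example easy to check by hand.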

J. P. Torres, Z. Lin, M. Watkins, P. F. Salcedo, R. P. Baskin, S. Elhabian, H. Safavi-Hemami, D. Taylor, J. Tun, G. P. Concepcion, N. Saguil, A. A. Yanagihara, Y. Fang, J. R. McArthur, H. Tae, R. K. Finol-Urdaneta, B. D. Özpolat, B. M. Olivera, E. W. Schmidt. “Small-molecule mimicry hunting strategy in the imperial cone snail, Conus imperialis,” In Science Advances, Vol. 7, No. 11, American Association for the Advancement of Science, 2021.


Venomous animals hunt using bioactive peptides, but relatively little is known about venom small molecules and the resulting complex hunting behaviors. Here, we explored the specialized metabolites from the venom of the worm-hunting cone snail, Conus imperialis. Using the model polychaete worm Platynereis dumerilii, we demonstrate that C. imperialis venom contains small molecules that mimic natural polychaete mating pheromones, evoking the mating phenotype in worms. The specialized metabolites from different cone snails are species-specific and structurally diverse, suggesting that the cones may adopt many different prey-hunting strategies enabled by small molecules. Predators sometimes attract prey using the prey’s own pheromones, in a strategy known as aggressive mimicry. Instead, C. imperialis uses metabolically stable mimics of those pheromones, indicating that, in biological mimicry, even the molecules themselves may be disguised, providing a twist on fake news in chemical ecology.

V. Vedam-Mai, K. Deisseroth, J. Giordano, G. Lazaro-Munoz, W. Chiong, N. Suthana, J. Langevin, J. Gill, W. Goodman, N. R. Provenza, C. H. Halpern, R. S. Shivacharan, T. N. Cunningham, S. A. Sheth, N. Pouratian, K. W. Scangos, H. S. Mayberg, A. Horn, K. A. Johnson, C. R. Butson, R. Gilron, C. de Hemptinne, R. Wilt, M. Yaroshinsky, S. Little, P. Starr, G. Worrell, P. Shirvalkar, E. Chang, J. Volkmann, M. Muthuraman, S. Groppa, A. A. Kühn, L. Li, M. Johnson, K. J. Otto, R. Raike, S. Goetz, C. Wu, P. Silburn, B. Cheeran, Y. J. Pathak, M. Malekmohammadi, A. Gunduz, J. K. Wong, S. Cernera, A. W. Shukla, A. Ramirez-Zamora, W. Deeb, A. Patterson, K. D. Foote, M. S. Okun. “Proceedings of the Eighth Annual Deep Brain Stimulation Think Tank: Advances in Optogenetics, Ethical Issues Affecting DBS Research, Neuromodulatory Approaches for Depression, Adaptive Neurostimulation, and Emerging DBS Technologies,” In Frontiers in Human Neuroscience, Vol. 15, pp. 169. 2021.
ISSN: 1662-5161
DOI: 10.3389/fnhum.2021.644593


We estimate that 208,000 deep brain stimulation (DBS) devices have been implanted to address neurological and neuropsychiatric disorders worldwide. DBS Think Tank presenters pooled data and determined that DBS expanded in its scope and has been applied to multiple brain disorders in an effort to modulate neural circuitry. The DBS Think Tank was founded in 2012 providing a space where clinicians, engineers, researchers from industry and academia discuss current and emerging DBS technologies and logistical and ethical issues facing the field. The emphasis is on cutting edge research and collaboration aimed to advance the DBS field. The Eighth Annual DBS Think Tank was held virtually on September 1 and 2, 2020 (Zoom Video Communications) due to restrictions related to the COVID-19 pandemic. The meeting focused on advances in: (1) optogenetics as a tool for comprehending neurobiology of diseases and on optogenetically-inspired DBS, (2) cutting edge of emerging DBS technologies, (3) ethical issues affecting DBS research and access to care, (4) neuromodulatory approaches for depression, (5) advancing novel hardware, software and imaging methodologies, (6) use of neurophysiological signals in adaptive neurostimulation, and (7) use of more advanced technologies to improve DBS clinical outcomes. There were 178 attendees who participated in a DBS Think Tank survey, which revealed the expansion of DBS into several indications such as obesity, post-traumatic stress disorder, addiction and Alzheimer’s disease. This proceedings summarizes the advances discussed at the Eighth Annual DBS Think Tank.

B. Wang, D. Zou, Q. Gu, S. J. Osher. “Laplacian smoothing stochastic gradient markov chain monte carlo,” In SIAM Journal on Scientific Computing, Vol. 43, No. 1, SIAM, pp. A26-A53. 2021.


As an important Markov chain Monte Carlo (MCMC) method, the stochastic gradient Langevin dynamics (SGLD) algorithm has achieved great success in Bayesian learning and posterior sampling. However, SGLD typically suffers from a slow convergence rate due to its large variance caused by the stochastic gradient. In order to alleviate these drawbacks, we leverage the recently developed Laplacian smoothing technique and propose a Laplacian smoothing stochastic gradient Langevin dynamics (LS-SGLD) algorithm. We prove that for sampling from both log-concave and non-log-concave densities, LS-SGLD achieves strictly smaller discretization error in 2-Wasserstein distance, although its mixing rate can be slightly slower. Experiments on both synthetic and real datasets verify our theoretical results and demonstrate the superior performance of LS-SGLD on different machine learning tasks including posterior …
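
The Laplacian smoothing idea named in this abstract can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the smoothing operator is taken to be A = I - sigma * L with L the one-dimensional periodic discrete Laplacian, its inverse and inverse square root are applied through the FFT, and the update applies the inverse of A to the stochastic gradient and the inverse square root of A to the injected Gaussian noise. The function names, the inverse temperature `beta`, and the toy Gaussian target are hypothetical.

```python
import numpy as np

def smoothing_eigenvalues(n, sigma):
    """Eigenvalues of A = I - sigma * L for the 1D periodic discrete Laplacian L."""
    k = np.arange(n)
    return 1.0 + 2.0 * sigma * (1.0 - np.cos(2.0 * np.pi * k / n))

def apply_power(v, sigma, power):
    """Apply A**power to v; A is circulant, so the FFT diagonalizes it."""
    eig = smoothing_eigenvalues(v.size, sigma)
    return np.fft.ifft(np.fft.fft(v) * eig ** power).real

def ls_sgld_step(x, stoch_grad, step, sigma, rng, beta=1.0):
    """One smoothed Langevin step (assumed form, see the lead-in)."""
    noise = rng.standard_normal(x.size)
    drift = apply_power(stoch_grad(x), sigma, -1.0)     # smoothed gradient
    diffusion = apply_power(noise, sigma, -0.5)         # matching noise preconditioner
    return x - step * drift + np.sqrt(2.0 * step / beta) * diffusion

# Consistency check against an explicit dense matrix.
n, sigma = 8, 1.0
rng = np.random.default_rng(0)
I = np.eye(n)
A = (1.0 + 2.0 * sigma) * I - sigma * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
g = rng.standard_normal(n)
smoothed = apply_power(g, sigma, -1.0)
dense = np.linalg.solve(A, g)

# Toy usage: a few steps targeting a standard Gaussian, with a noisy gradient.
x = np.zeros(n)
noisy_grad = lambda z: z + 0.1 * rng.standard_normal(z.size)
for _ in range(10):
    x = ls_sgld_step(x, noisy_grad, step=0.01, sigma=sigma, rng=rng)
```

Setting `sigma = 0` makes every eigenvalue equal to one, so the step reduces to plain SGLD in this sketch.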

R. Zambre, D. Sahasrabudhe, H. Zhou, M. Berzins, A. Chandramowlishwaran, P. Balaji. “Logically Parallel Communication for Fast MPI+Threads Communication,” In Proceedings of the Transactions on Parallel and Distributed Computing, IEEE, April, 2021.


Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional “MPI everywhere” approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe a slower performance with MPI+threads primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI’s ordering constraints to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard’s semantics can be limiting. Researchers had initially proposed user-visible endpoints to combat this issue, but such a solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and provides them with many opportunities to express logical communication parallelism. In this paper, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing one and user-visible endpoints (upper bound). Logical communication parallelism can boost the overall performance of an application by over 2x.

L. Zhou, C. R. Johnson, D. Weiskopf. “Data-Driven Space-Filling Curves,” In IEEE Transactions on Visualization and Computer Graphics, Vol. 27, No. 2, IEEE, pp. 1591-1600. 2021.
DOI: 10.1109/TVCG.2020.3030473


We propose a data-driven space-filling curve method for 2D and 3D visualization. Our flexible curve traverses the data elements in the spatial domain in a way that the resulting linearization better preserves features in space compared to existing methods. We achieve such data coherency by calculating a Hamiltonian path that approximately minimizes an objective function that describes the similarity of data values and location coherency in a neighborhood. Our extended variant even supports multiscale data via quadtrees and octrees. Our method is useful in many areas of visualization, including multivariate or comparative visualization, ensemble visualization of 2D and 3D data on regular grids, or multiscale visual analysis of particle simulations. The effectiveness of our method is evaluated with numerical comparisons to existing techniques and through examples of ensemble and multivariate datasets.


H. Childs, S. D. Ahern, J. Ahrens, A. C. Bauer, J. Bennett, E. W. Bethel, P. Bremer, E. Brugger, J. Cottam, M. Dorier, S. Dutta, J. M. Favre, T. Fogal, S. Frey, C. Garth, B. Geveci, W. F. Godoy, C. D. Hansen, C. Harrison, B. Hentschel, J. Insley, C. R. Johnson, S. Klasky, A. Knoll, J. Kress, M. Larsen, J. Lofstead, K. Ma, P. Malakar, J. Meredith, K. Moreland, P. Navratil, P. O’Leary, M. Parashar, V. Pascucci, J. Patchett, T. Peterka, S. Petruzza, N. Podhorszki, D. Pugmire, M. Rasquin, S. Rizzi, D. H. Rogers, S. Sane, F. Sauer, R. Sisneros, H. Shen, W. Usher, R. Vickery, V. Vishwanath, I. Wald, R. Wang, G. H. Weber, B. Whitlock, M. Wolf, H. Yu, S. B. Ziegeler. “A Terminology for In Situ Visualization and Analysis Systems,” In International Journal of High Performance Computing Applications, Vol. 34, No. 6, pp. 676–691. 2020.
DOI: 10.1177/1094342020935991


The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over fifty experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.

L. Cinquini, S. Petruzza, Jason J. Boutte, S. Ames, G. Abdulla, V. Balaji, R. Ferraro, A. Radhakrishnan, L. Carriere, T. Maxwell, G. Scorzelli, V. Pascucci. “Distributed Resources for the Earth System Grid Advanced Management (DREAM), Final Report,” 2020.


The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation [1]) architecture which would be suitable for managing and accessing data and services resources in a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were rather primitive. At the beginning, the team had the general notion that a better ESGF architecture could be built by modularizing each component, and redefining its interaction with other components by defining and exposing a well-defined API. Although this was still the high level principle that guided the work, the DREAM project was able to accomplish its goals by leveraging new practices in IT that started just about 3 or 4 years ago: the advent of containerization technologies (specifically, Docker), the development of frameworks to manage containers at scale (Docker Swarm and Kubernetes), and their application to the commercial Cloud. Thanks to these new technologies, DREAM was able to improve the ESGF architecture (including its computing and visualization services) to a level of deployability and scalability beyond the original expectations.

A. P. Janson, D. N. Anderson, C. R. Butson. “Activation robustness with directional leads and multi-lead configurations in deep brain stimulation,” In Journal of Neural Engineering, Vol. 17, No. 2, IOP Publishing, pp. 026012. March, 2020.
DOI: 10.1088/1741-2552/ab7b1d


Objective: Clinical outcomes from deep brain stimulation (DBS) can be highly variable, and two critical factors underlying this variability are the location and type of stimulation. In this study we quantified how robustly DBS activates a target region when taking into account a range of different lead designs and realistic variations in placement. The objective of the study is to assess the likelihood of achieving target activation.

Approach: We performed finite element computational modeling and established a metric of performance robustness to evaluate the ability of directional and multi-lead configurations to activate target fiber pathways while taking into account location variability. A more robust lead configuration produces less variability in activation across all stimulation locations around the target.

Main results: Directional leads demonstrated higher overall performance robustness compared to axisymmetric leads, primarily 1-2 mm outside of the target. Multi-lead configurations demonstrated higher levels of robustness compared to any single lead due to distribution of electrodes in a broader region around the target.

Significance: Robustness measures can be used to evaluate the performance of existing DBS lead designs and aid in the development of novel lead designs to better accommodate known variability in lead location and orientation. This type of analysis may also be useful to understand how DBS clinical outcome variability is influenced by lead location among groups of patients.

C. R. Johnson, T. Kapur, W. Schroeder, T. Yoo. “Remembering Bill Lorensen: The Man, the Myth, and Marching Cubes,” In IEEE Computer Graphics and Applications, Vol. 40, No. 2, pp. 112-118. March, 2020.
DOI: 10.1109/MCG.2020.2971168

K. A. Johnson, G. Duffley, D. Nesterovich Anderson, J. L. Ostrem, M. Welter, J. C. Baldermann, J. Kuhn, D. Huys, V. Visser-Vandewalle, T. Foltynie, L. Zrinzo, M. Hariz, A. F. G. Leentjens, A. Y. Mogilner, M. H. Pourfar, L. Almeida, A. Gunduz, K. D. Foote, M. S. Okun, C. R. Butson. “Structural connectivity predicts clinical outcomes of deep brain stimulation for Tourette syndrome,” In Brain, July, 2020.
ISSN: 0006-8950
DOI: 10.1093/brain/awaa188


Deep brain stimulation may be an effective therapy for select cases of severe, treatment-refractory Tourette syndrome; however, patient responses are variable, and there are no reliable methods to predict clinical outcomes. The objectives of this retrospective study were to identify the stimulation-dependent structural networks associated with improvements in tics and comorbid obsessive-compulsive behaviour, compare the networks across surgical targets, and determine if connectivity could be used to predict clinical outcomes. Volumes of tissue activated for a large multisite cohort of patients (n = 66) implanted bilaterally in globus pallidus internus (n = 34) or centromedial thalamus (n = 32) were used to generate probabilistic tractography to form a normative structural connectome. The tractography maps were used to identify networks that were correlated with improvement in tics or comorbid obsessive-compulsive behaviour and to predict clinical outcomes across the cohort. The correlated networks were then used to generate ‘reverse’ tractography to parcellate the total volume of stimulation across all patients to identify local regions to target or avoid. The results showed that for globus pallidus internus, connectivity to limbic networks, associative networks, caudate, thalamus, and cerebellum was positively correlated with improvement in tics; the model predicted clinical improvement scores (P = 0.003) and was robust to cross-validation. Regions near the anteromedial pallidum exhibited higher connectivity to the positively correlated networks than posteroventral pallidum, and volume of tissue activated overlap with this map was significantly correlated with tic improvement (P < 0.017). For centromedial thalamus, connectivity to sensorimotor networks, parietal-temporal-occipital networks, putamen, and cerebellum was positively correlated with tic improvement; the model predicted clinical improvement scores (P = 0.012) and was robust to cross-validation. 
Regions in the anterior/lateral centromedial thalamus exhibited higher connectivity to the positively correlated networks, but volume of tissue activated overlap with this map did not predict improvement (P > 0.23). For obsessive-compulsive behaviour, both targets showed that connectivity to the prefrontal cortex, orbitofrontal cortex, and cingulate cortex was positively correlated with improvement; however, only the centromedial thalamus maps predicted clinical outcomes across the cohort (P = 0.034), but the model was not robust to cross-validation. Collectively, the results demonstrate that the structural connectivity of the site of stimulation is likely important for mediating symptom improvement, and the networks involved in tic improvement may differ across surgical targets. These networks provide important insight on potential mechanisms and could be used to guide lead placement and stimulation parameter selection, as well as refine targets for neuromodulation therapies for Tourette syndrome.

B. Kundu, T. S. Davis, B. Philip, E. H. Smith, A. Arain, A. Peters, B. Newman, C. R. Butson, J. D. Rolston. “A systematic exploration of parameters affecting evoked intracranial potentials in patients with epilepsy,” In Brain Stimulation, Vol. 13, No. 5, pp. 1232-1244. 2020.


Background: Brain activity is constrained by and evolves over a network of structural and functional connections. Corticocortical evoked potentials (CCEPs) have been used to measure this connectivity and to discern brain areas involved in both brain function and disease. However, how varying stimulation parameters influences the measured CCEP across brain areas has not been well characterized.

Objective: To better understand the factors that influence the amplitude of the CCEPs as well as evoked gamma-band power (70–150 Hz) resulting from single-pulse stimulation via cortical surface and depth electrodes.

Methods: CCEPs from 4370 stimulation-response channel pairs were recorded across a range of stimulation parameters and brain regions in 11 patients undergoing long-term monitoring for epilepsy. A generalized mixed-effects model was used to model cortical response amplitudes from 5 to 100 ms post-stimulation.

Results: Stimulation levels <5.5 mA generated variable CCEPs with low amplitude and reduced spatial spread. Stimulation at ≥5.5 mA yielded a reliable and maximal CCEP across stimulation-response pairs over all regions. These findings were similar when examining the evoked gamma-band power. The amplitude of both measures was inversely correlated with distance. CCEPs and evoked gamma power were largest when measured in the hippocampus compared with other areas. Larger CCEP size and evoked gamma power were measured within the seizure onset zone compared with outside this zone.

Conclusions: These results will help guide future stimulation protocols directed at quantifying network connectivity across cognitive and disease states.

C. Ly, C. Vachet, I. Schwerdt, E. Abbott, A. Brenkmann, L.W. McDonald, T. Tasdizen. “Determining uranium ore concentrates and their calcination products via image classification of multiple magnifications,” In Journal of Nuclear Materials, 2020.


Many tools, such as mass spectrometry, X-ray diffraction, X-ray fluorescence, ion chromatography, etc., are currently available to scientists investigating interdicted nuclear material. These tools provide an analysis of physical, chemical, or isotopic characteristics of the seized material to identify its origin. In this study, a novel technique that characterizes physical attributes is proposed to provide insight into the processing route of unknown uranium ore concentrates (UOCs) and their calcination products. In particular, this study focuses on the characteristics of the surface structure captured in scanning electron microscopy (SEM) images at different magnification levels. Twelve common commercial processing routes of UOCs and their calcination products are investigated. Multiple-input single-output (MISO) convolution neural networks (CNNs) are implemented to differentiate the processing routes. The proposed technique can determine the processing route of a given sample in under a second running on a graphics processing unit (GPU) with an accuracy of more than 95%. The accuracy and speed of this proposed technique enable nuclear scientists to provide the preliminary identification results of interdicted material in a short time period. Furthermore, this proposed technique uses a predetermined set of magnifications, which in turn eliminates the human bias in selecting the magnification during the image acquisition process.

T. A. J. Ouermi, R. M. Kirby, M. Berzins. “Numerical Testing of a New Positivity-Preserving Interpolation Algorithm,” Subtitled “arXiv,” 2020.


An important component of a number of computational modeling algorithms is an interpolation method that preserves the positivity of the function being interpolated. This report describes the numerical testing of a new positivity-preserving algorithm that is designed to be used when interpolating from a solution defined on one grid to a different spatial grid. The motivating application is a numerical weather prediction (NWP) code that uses spectral elements as the discretization choice for its dynamics core and Cartesian product meshes for the evaluation of its physics routines. This combination of spectral elements, which use nonuniformly spaced quadrature/collocation points, and uniformly-spaced Cartesian meshes, combined with the desire to maintain positivity when moving between these, necessitates our work. This new approach is evaluated against several typical algorithms in use on a range of test problems in one or more space dimensions. The results obtained show that the new method is competitive in terms of observed accuracy while at the same time preserving the underlying positivity of the functions being interpolated.

V. Pascucci, I. Altintas, J. Fortes, I. Foster, H. Gu, S. Hariri, D. Stanzione, M. Taufer, X. Zhao. “Report from the NSF Workshop on Smart Cyberinfrastructure 2020,” NSF, 2020.


Machine learning and other Artificial Intelligence technologies (all indicated in the following as AI) used within a modern, smart cyberinfrastructure have become critical new avenues for discovery and validation in data-driven science and engineering disciplines of all kinds. We can expect many landmark discoveries and new lines of productive research to be enabled through AI analysis of the rapidly growing treasure trove of scientific data. AI-based techniques have been applied in many fields of science and engineering, including remote sensing, cosmology, energy, cancer research, IT systems management, and machine design and control, but the lack of proper integration with the current NSF-supported cyberinfrastructure is limiting their potential. Recent events due to the COVID-19 pandemic have highlighted how cyberinfrastructure is a crucial enabler of modern research, with massive simulations and data management capabilities [8-10], but these events have also emphasized how the lack of proper integration with AI technology remains a major limiting factor for the advancement of science and engineering, especially when any kind of rapid response is needed.

S. P. Ponnapalli, M. W. Bradley, K. Devine, J. Bowen, S. E. Coppens, K. M. Leraas, B. A. Milash, F. Li, H. Luo, S. Qiu, K. Wu, H. Yang, C. T. Wittwer, C. A. Palmer, R. L. Jensen, J. M. Gastier-Foster, H. A. Hanson, J. S. Barnholtz-Sloan, O. Alter. “Retrospective clinical trial experimentally validates glioblastoma genome-wide pattern of DNA copy-number alterations predictor of survival,” In Applied Physics Letters (APL) Bioengineering, Vol. 4, No. 2, May, 2020.


Modeling of genomic profiles from the Cancer Genome Atlas (TCGA) by using recently developed mathematical frameworks has associated a genome-wide pattern of DNA copy-number alterations with a shorter, roughly one-year, median survival time in glioblastoma (GBM) patients. Here, to experimentally test this relationship, we whole-genome sequenced DNA from tumor samples of patients. We show that the patients represent the U.S. adult GBM population in terms of most normal and disease phenotypes. Intratumor heterogeneity affects ≈11% and profiling technology and reference human genome specifics affect <1% of the classifications of the tumors by the pattern, where experimental batch effects normally reduce the reproducibility, i.e., precision, of classifications based upon between one to a few hundred genomic loci by >30%. With a 2.25-year Kaplan–Meier median survival difference, a 3.5 univariate Cox hazard ratio, and a 0.78 concordance index, i.e., accuracy, the pattern predicts survival better than and independent of age at diagnosis, which has been the best indicator since 1950. The prognostic classification by the pattern may, therefore, help to manage GBM pseudoprogression. The diagnostic classification may help drugs progress to regulatory approval. The therapeutic predictions, of previously unrecognized targets that are correlated with survival, may lead to new drugs. Other methods missed this relationship in the roughly 3B-nucleotide genomes of the small, order of magnitude of 100, patient cohorts, e.g., from TCGA. Previous attempts to associate GBM genotypes with patient phenotypes were unsuccessful. This is a proof of principle that the frameworks are uniquely suitable for discovering clinically actionable genotype–phenotype relationships.