SCI Publications
2024
X. Huang, H. Miao, A. Townsend, K. Champley, J. Tringe, V. Pascucci, P.T. Bremer.
Bimodal Visualization of Industrial X-Ray and Neutron Computed Tomography Data, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
DOI: 10.1109/TVCG.2024.3382607
Advanced manufacturing creates increasingly complex objects with material compositions that are often difficult to characterize by a single modality. Our collaborating domain scientists are going beyond traditional methods by employing both X-ray and neutron computed tomography to obtain complementary representations expected to better resolve material boundaries. However, the use of two modalities creates its own challenges for visualization, requiring either complex adjustments of bimodal transfer functions or the need for multiple views. Together with experts in nondestructive evaluation, we designed a novel interactive bimodal visualization approach to create a combined view of the co-registered X-ray and neutron acquisitions of industrial objects. Using an automatic topological segmentation of the bivariate histogram of X-ray and neutron values as a starting point, the system provides a simple yet effective interface to easily create, explore, and adjust a bimodal visualization. We propose a widget with simple brushing interactions that enables the user to quickly correct the segmented histogram results. Our semiautomated system enables domain experts to intuitively explore large bimodal datasets without the need for either advanced segmentation algorithms or knowledge of visualization techniques. We demonstrate our approach using synthetic examples, industrial phantom objects created to stress bimodal scanning techniques, and real-world objects, and we discuss expert feedback.
K.E. Isaacs, H. Kaiser.
Halide Code Generation Framework in Phylanx, In Euro-Par 2022: Parallel Processing Workshops, Springer, 2024.
Separating algorithms from their computation schedule has become a de facto solution to tackle the challenges of developing high performance code on modern heterogeneous architectures. Common approaches include domain-specific languages (DSLs), which provide familiar APIs to domain experts; code generation frameworks that automate the generation of fast and portable code; and runtime systems that manage threads for concurrency and parallelism. In this paper, we present the Halide code generation framework for the Phylanx distributed array processing platform. This extension enables compile-time optimization of Phylanx primitives for target architectures. To accomplish this, we (1) implemented new Phylanx primitives using Halide, (2) partially exported Halide’s thread pool API to carry out parallelism on HPX (Phylanx’s runtime) threads, and (3) showcased HPX performance analysis tools made available to Halide applications. We evaluated the work in two steps. First, we compare the performance of Halide applications running on its native runtime with that of the new HPX backend to verify there is no cost associated with using HPX threads. Next, we compare the performance of a number of original implementations of Phylanx primitives against the new ones in Halide to verify the performance and portability benefits of Halide in the context of Phylanx.
K. Iyer, J. Adams, S.Y. Elhabian.
SCorP: Statistics-Informed Dense Correspondence Prediction Directly from Unsegmented Medical Images, arXiv preprint arXiv:2404.17967, 2024.
Statistical shape modeling (SSM) is a powerful computational framework for quantifying and analyzing the geometric variability of anatomical structures, facilitating advancements in medical research, diagnostics, and treatment planning. Traditional methods for shape modeling from imaging data demand significant manual and computational resources. Additionally, these methods necessitate repeating the entire modeling pipeline to derive shape descriptors (e.g., surface-based point correspondences) for new data. While deep learning approaches have shown promise in streamlining the construction of SSMs on new data, they still rely on traditional techniques to supervise the training of the deep networks. Moreover, the predominant linearity assumption of traditional approaches restricts their efficacy, a limitation also inherited by deep learning models trained using optimized/established correspondences. Consequently, representing complex anatomies becomes challenging. To address these limitations, we introduce SCorP, a novel framework capable of predicting surface-based correspondences directly from unsegmented images. By leveraging the shape prior learned directly from surface meshes in an unsupervised manner, the proposed model eliminates the need for an optimized shape model for training supervision. The strong shape prior acts as a teacher and regularizes the feature learning of the student network to guide it in learning image-based features that are predictive of surface correspondences. The proposed model streamlines the training and inference phases by removing the supervision for the correspondence prediction task while alleviating the linearity assumption. Experiments on the LGE MRI left atrium dataset and Abdomen CT-1K liver datasets demonstrate that the proposed technique enhances the accuracy and robustness of image-driven SSM, providing a compelling alternative to current fully supervised methods.
K. Iyer, S.Y. Elhabian.
Probabilistic 3D Correspondence Prediction from Sparse Unsegmented Images, arXiv preprint arXiv:2407.01931v1, 2024.
The study of physiology demonstrates that the form (shape) of anatomical structures dictates their functions, and analyzing the form of anatomies plays a crucial role in clinical research. Statistical shape modeling (SSM) is a widely used tool for quantitative analysis of forms of anatomies, aiding in characterizing and identifying differences within a population of subjects. Despite its utility, the conventional SSM construction pipeline is often complex and time-consuming. Additionally, reliance on linearity assumptions further limits the model from capturing clinically relevant variations. Recent advancements in deep learning solutions enable the direct inference of SSM from unsegmented medical images, streamlining the process and improving accessibility. However, the new methods of SSM from images do not adequately account for situations where the imaging data quality is poor or where only sparse information is available. Moreover, quantifying aleatoric uncertainty, which represents inherent data variability, is crucial in deploying deep learning for clinical tasks to ensure reliable model predictions and robust decision-making, especially in challenging imaging conditions. Therefore, we propose SPI-CorrNet, a unified model that predicts 3D correspondences from sparse imaging data. It leverages a teacher network to regularize feature learning and quantifies data-dependent aleatoric uncertainty by adapting the network to predict intrinsic input variances. Experiments on the LGE MRI left atrium dataset and Abdomen CT-1K liver datasets demonstrate that our technique enhances the accuracy and robustness of sparse image-driven SSM.
J. Johnson, L. McDonald, T. Tasdizen.
Improving uranium oxide pathway discernment and generalizability using contrastive self-supervised learning, In Computational Materials Science, Vol. 223, Elsevier, 2024.
In the field of Nuclear Forensics, there exists a plethora of different tools to aid investigators when performing analysis of unknown nuclear materials. Many of these tools offer visual representations of the uranium ore concentrate (UOC) materials that include complementary and contrasting information. In this paper, we present a novel technique drawing from state-of-the-art machine learning methods that allows information from scanning electron microscopy (SEM) images to be combined to create digital encodings of the material that can be used to determine the material’s processing route. Our technique can classify UOC processing routes with greater than 96% accuracy in a fraction of a second and can be adapted to unseen samples at similarly high accuracy. The technique’s high accuracy and speed allow forensic investigators to quickly get preliminary results, while generalization allows the model to be adapted to new materials or processing routes quickly without the need for complete retraining of the model.
L.G. Johnson, J.D. Mozingo, P.R. Atkins, S. Schwab, A. Morris, S.Y. Elhabian, D.R. Wilson, H. Kim, A.E. Anderson.
A framework for three-dimensional statistical shape modeling of the proximal femur in Legg–Calvé–Perthes disease, In International Journal of Computer Assisted Radiology and Surgery, Springer Nature Switzerland, 2024.
Purpose
The pathomorphology of Legg–Calvé–Perthes disease (LCPD) is a key contributor to poor long-term outcomes such as hip pain, femoroacetabular impingement, and early-onset osteoarthritis. Plain radiographs, commonly used for research and in the clinic, cannot accurately represent the full extent of LCPD deformity. The purpose of this study was to develop and evaluate a methodological framework for three-dimensional (3D) statistical shape modeling (SSM) of the proximal femur in LCPD.
Methods
We developed a framework consisting of three core steps: segmentation, surface mesh preparation, and particle-based correspondence. The framework aims to address challenges in modeling this rare condition, characterized by highly heterogeneous deformities across a wide age range and small sample sizes. We evaluated this framework by producing an SSM from clinical magnetic resonance images of 13 proximal femurs with LCPD deformity from 11 patients between the ages of six and 12 years.
Results
After removing differences in scale and pose, the dominant shape modes described morphological features characteristic of LCPD, including a broad and flat femoral head, high-riding greater trochanter, and reduced neck-shaft angle. The first four shape modes were chosen for the evaluation of the model’s performance, together describing 87.5% of the overall cohort variance. The SSM was generalizable to unfamiliar examples with an average point-to-point reconstruction error below 1 mm. We observed strong Spearman rank correlations (up to 0.79) between some shape modes, 3D measurements of femoral head asphericity, and clinical radiographic metrics.
Conclusion
In this study, we present a framework, based on SSM, for the objective description of LCPD deformity in three dimensions. Our methods can accurately describe overall shape variation using a small number of parameters, and are a step toward a widely accepted, objective 3D quantification of LCPD deformity.
O. Joshi, T. Skóra, A. Yarema, R.D. Rabbitt, T.C. Bidone.
Contributions of the individual domains of αIIbβ3 integrin to its extension: Insights from multiscale modeling, In Cytoskeleton, 2024.
The platelet integrin αIIbβ3 undergoes long-range conformational transitions between bent and extended conformations to regulate platelet aggregation during hemostasis and thrombosis. However, how exactly αIIbβ3 transitions between conformations remains largely elusive. Here, we studied how transitions across bent and extended-closed conformations of αIIbβ3 integrin are regulated by effective interactions between its functional domains. We first carried out μs-long equilibrium molecular dynamics (MD) simulations of full-length αIIbβ3 integrins in bent and intermediate conformations, the latter characterized by an extended headpiece and closed legs. Then, we built heterogeneous elastic network models, perturbed inter-domain interactions, and evaluated their relative contributions to the energy barriers between conformations. Results showed that integrin extension emerges from: (i) changes in interfaces between functional domains; (ii) allosteric coupling of the head and upper leg domains with flexible lower leg domains. Collectively, these results provide new insights into integrin conformational activation based on short- and long-range interactions between its functional domains and highlight the importance of the lower legs in the regulation of integrin allostery.
T. Kataria, B. Knudsen, S.Y. Elhabian.
StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining, arXiv preprint arXiv:2403.11340, 2024.
Hematoxylin and Eosin (H&E) staining is the most commonly used staining method for disease diagnosis and tumor recurrence tracking. Hematoxylin excels at highlighting nuclei, whereas eosin stains the cytoplasm. However, the H&E stain lacks details for differentiating the types of cells relevant to identifying the grade of the disease or response to specific treatment variations. Pathologists require special immunohistochemical (IHC) stains that highlight different cell types. These stains help in accurately identifying different regions of disease growth and their interactions with the cell’s microenvironment. The advent of deep learning models has made Image-to-Image (I2I) translation a key research area, reducing the need for expensive physical staining processes. Pix2Pix and CycleGAN are still the most commonly used methods for virtual staining applications. However, both suffer from hallucinations or staining irregularities when the H&E stain has less discriminative information about the underlying cells that IHC needs to highlight (e.g., CD3 lymphocytes). Diffusion models are currently the state-of-the-art models for image generation and conditional generation tasks. However, they require extensive and diverse datasets (millions of samples) to converge, which is less feasible for virtual staining applications. Inspired by the success of multitask deep learning models for limited dataset size, we propose StainDiffuser, a novel multitask dual diffusion architecture for virtual staining that converges under a limited training budget. StainDiffuser trains two diffusion processes simultaneously: (a) generation of cell-specific IHC stain from H&E and (b) H&E-based cell segmentation using coarse segmentation only during training. Our results show that StainDiffuser produces high-quality results for easier (CK8/18, epithelial marker) and difficult (CD3, lymphocytes) stains.
V. Koppelmans, M.F.L. Ruitenberg, S.Y. Schaefer, J.B. King, J.M. Jacobo, B.P. Silvester, A.F. Mejia, J. van der Geest, J.M. Hoffman, T. Tasdizen, K. Duff.
Classification of Mild Cognitive Impairment and Alzheimer's Disease Using Manual Motor Measures, In Neurodegener Dis, 2024.
DOI: 10.1159/000539800
PubMed ID: 38865972
Introduction: Manual motor problems have been reported in mild cognitive impairment (MCI) and Alzheimer's disease (AD), but the specific aspects that are affected, their neuropathology, and potential value for classification modeling are unknown. The current study examined whether multiple measures of motor strength, dexterity, and speed are affected in MCI and AD, are related to AD biomarkers, and are able to classify MCI or AD.
Methods: Fifty-three cognitively normal (CN), 33 amnestic MCI, and 28 AD subjects completed five manual motor measures: grip force, Trail Making Test A, spiral tracing, finger tapping, and a simulated feeding task. Analyses included: 1) group differences in manual performance; 2) associations between manual function and AD biomarkers (PET amyloid β, hippocampal volume, and APOE ε4 alleles); and 3) group classification accuracy of manual motor function using machine learning.
Results: Amnestic MCI and AD subjects exhibited slower psychomotor speed and AD subjects had weaker dominant hand grip strength than CN subjects. Performance on these measures was related to amyloid β deposition (both) and hippocampal volume (psychomotor speed only). Support vector classification well-discriminated control and AD subjects (area under the curve of 0.73 and 0.77 respectively), but poorly discriminated MCI from controls or AD.
Conclusion: Grip strength and spiral tracing appear preserved, while psychomotor speed is affected in amnestic MCI and AD. The association of motor performance with amyloid β deposition and atrophy could indicate that this is due to amyloid deposition in, and atrophy of, motor brain regions, which generally occurs later in the disease process. The promising discriminatory abilities of manual motor measures for AD emphasize their value alongside other cognitive and motor assessment outcomes in classification and prediction models, as well as potential enrichment of outcome variables in AD clinical trials.
E. Kwan, E. Ghafoori, W. Good, M. Regouski, B. Moon, J. Fish, E. Hsu, I. Polejaeva, R.S. Macleod, D. Dosdall, R. Ranjan.
Diffuse Functional and Structural Abnormalities in Fibrosis: Potential Structural Basis for Sustaining Atrial Fibrillation, In Circulation, Vol. 150, pp. A4136863--A4136863. 2024.
D. Lange, R. Judson-Torres, T.A. Zangle, A. Lex.
Aardvark: Composite Visualizations of Trees, Time-Series, and Images, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
How do cancer cells grow, divide, proliferate and die? How do drugs influence these processes? These are difficult questions that we can attempt to answer with a combination of time-series microscopy experiments, classification algorithms, and data visualization. However, collecting this type of data and applying algorithms to segment and track cells and construct lineages of proliferation is error-prone; and identifying the errors can be challenging since it often requires cross-checking multiple data types. Similarly, analyzing and communicating the results necessitates synthesizing different data types into a single narrative. State-of-the-art visualization methods for such data use independent line charts, tree diagrams, and images in separate views. However, this spatial separation requires the viewer of these charts to combine the relevant pieces of data in memory. To simplify this challenging task, we describe design principles for weaving cell images, time-series data, and tree data into a cohesive visualization. Our design principles are based on choosing a primary data type that drives the layout and integrates the other data types into that layout. We then introduce Aardvark, a system that uses these principles to implement novel visualization techniques. Based on Aardvark, we demonstrate the utility of each of these approaches for discovery, communication, and data debugging in a series of case studies.
J. Li, T.A.J. Ouermi, C.R. Johnson.
Visualizing Uncertainties in Ensemble Wildfire Forecast Simulations, In IEEE Workshop on Uncertainty Visualization: Applications, Techniques, Software, and Decision Frameworks, IEEE, pp. 84--88. 2024.
DOI: 10.1109/UncertaintyVisualization63963.2024.00016
Wildfires pose substantial risks to our health, environment, and economy. Studying wildfires is challenging due to their complex interaction with the atmosphere dynamics and the terrain. Researchers have employed ensemble simulations to study the relationship among variables and mitigate uncertainties in unpredictable initial conditions. However, many wildfire researchers are unaware of the advanced visualization available for conveying uncertainty. We designed and implemented an interactive visualization system for studying the uncertainties of fire spread patterns utilizing band-depth-based order statistics and contour boxplots. We also augment the visualization system with the summary of changes in the burned area and fuel content to help scientists identify interesting temporal events. In this paper, we demonstrate how our system can support wildfire experts in studying fire spread patterns, identifying outlier simulations, and navigating to interesting times based on a summary of events.
Z. Li, H. Miao, V. Pascucci, S. Liu.
Visualization Literacy of Multimodal Large Language Models: A Comparative Study, arXiv:2407.10996, 2024.
Recently introduced multimodal large language models (MLLMs) combine the inherent power of large language models (LLMs) with the renewed capabilities to reason about the multimodal context. The potential usage scenarios for MLLMs significantly outpace their text-only counterparts. Many recent works in visualization have demonstrated MLLMs' capability to understand and interpret visualization results and explain the content of the visualization to users in natural language. In the machine learning community, the general vision capabilities of MLLMs have been evaluated and tested through various visual understanding benchmarks. However, the ability of MLLMs to accomplish specific visualization tasks based on visual perception has not been properly explored and evaluated, particularly from a visualization-centric perspective.
In this work, we aim to fill the gap by utilizing the concept of visualization literacy to evaluate MLLMs. We assess MLLMs' performance over two popular visualization literacy evaluation datasets (VLAT and mini-VLAT). Under the framework of visualization literacy, we develop a general setup to compare different multimodal large language models (e.g., GPT4-o, Claude 3 Opus, Gemini 1.5 Pro) as well as against existing human baselines. Our study demonstrates MLLMs' competitive performance in visualization literacy, where they outperform humans in certain tasks such as identifying correlations, clusters, and hierarchical structures.
X. Li, R. Mohammed, T. Mangin, S. Saha, K. Kelly, R.T. Whitaker, T. Tasdizen.
Joint Audio-Visual Idling Vehicle Detection with Streamlined Input Dependencies, arXiv:2410.21170v1, 2024.
Idling vehicle detection (IVD) can be helpful in monitoring and reducing unnecessary idling and can be integrated into real-time systems to address the resulting pollution and harmful products. The previous approach, a non-end-to-end model, requires extra user clicks to specify a part of the input, making system deployment more error-prone or even not feasible. In contrast, we introduce an end-to-end joint audio-visual IVD task designed to detect vehicles visually under three states: moving, idling and engine off. Unlike feature co-occurrence tasks such as audio-visual vehicle tracking, our IVD task addresses complementary features, where labels cannot be determined by a single modality alone. To this end, we propose AVIVD-Net, a novel network that integrates audio and visual features through a bidirectional attention mechanism. AVIVD-Net streamlines the input process by learning a joint feature space, reducing the deployment complexity of previous methods. Additionally, we introduce the AVIVD dataset, which is seven times larger than previous datasets, offering significantly more annotated samples to study the IVD problem. Our model achieves performance comparable to prior approaches, making it suitable for automated deployment. Furthermore, by evaluating AVIVD-Net on the feature co-occurrence public dataset MAVD, we demonstrate its potential for extension to self-driving vehicle video-camera setups.
M. Lisnic, Z. Cutler, M. Kogan, A. Lex.
Visualization Guardrails: Designing Interventions Against Cherry-Picking in Interactive Data Explorers, Preprint, 2024.
The growing popularity of interactive time series exploration platforms has made visualizing data of public interest more accessible to general audiences. At the same time, the democratized access to professional-looking explorers with preloaded data enables the creation of convincing visualizations with carefully cherry-picked items. Prior research shows that people use data explorers to create and share charts that support their potentially biased or misleading views on public health or economic policy and that such charts have, for example, contributed to the spread of COVID-19 misinformation. Interventions against misinformation have focused on post hoc approaches such as fact-checking or removing misleading content, which are known to be challenging to execute. In this work, we explore whether we can use visualization design to impede cherry-picking—one of the most common methods employed by deceptive charts created on data exploration platforms. We describe a design space of guardrails—interventions against cherry-picking in time series explorers. Using our design space, we create a prototype data explorer with four types of guardrails and conduct two crowd-sourced experiments. In the first experiment, we challenge participants to create cherry-picked charts. We then use these charts in a second experiment to evaluate the guardrails’ impact on the perception of cherry-picking. We find evidence that guardrails—particularly superimposing relevant primary data—are successful at encouraging skepticism in a subset of experimental conditions but come with limitations. Based on our findings, we propose recommendations for developing effective guardrails for visualizations.
M. Lowery, J. Turnage, Z. Morrow, J.D. Jakeman, A. Narayan.
Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning, arXiv:2407.00809v1, 2024.
This paper introduces the Kernel Neural Operator (KNO), a novel operator learning technique that uses deep kernel-based integral operators in conjunction with quadrature for function-space approximation of operators (maps from functions to functions). KNOs use parameterized, closed-form, finitely-smooth, and compactly-supported kernels with trainable sparsity parameters within the integral operators to significantly reduce the number of parameters that must be learned relative to existing neural operators. Moreover, the use of quadrature for numerical integration endows the KNO with geometric flexibility that enables operator learning on irregular geometries. Numerical results demonstrate that on existing benchmarks the training and test accuracy of KNOs is higher than that of popular operator learning techniques while using at least an order of magnitude fewer trainable parameters. KNOs thus represent a new paradigm of low-memory, geometrically-flexible, deep operator learning, while retaining the implementation simplicity and transparency of traditional kernel methods from both scientific computing and machine learning.
W. Lyu, R. Sridharamurthy, J.M. Phillips, B. Wang.
Fast Comparative Analysis of Merge Trees Using Locality Sensitive Hashing, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2024.
Scalar field comparison is a fundamental task in scientific visualization. In topological data analysis, we compare topological descriptors of scalar fields—such as persistence diagrams and merge trees—because they provide succinct and robust abstract representations. Several similarity measures for topological descriptors seem to be both asymptotically and practically efficient with polynomial time algorithms, but they do not scale well when handling large-scale, time-varying scientific data and ensembles. In this paper, we propose a new framework to facilitate the comparative analysis of merge trees, inspired by tools from locality sensitive hashing (LSH). LSH hashes similar objects into the same hash buckets with high probability. We propose two new similarity measures for merge trees that can be computed via LSH, using new extensions to Recursive MinHash and subpath signature, respectively. Our similarity measures are extremely efficient to compute and closely resemble the results of existing measures such as merge tree edit distance or geometric interleaving distance. Our experiments demonstrate the utility of our LSH framework in applications such as shape matching, clustering, key event detection, and ensemble summarization.
C. Mackenzie, S. Ruckel, A. Morris, S. Elhabian, E. Bieging.
Statistical Shape Modeling To Predict Left Atrial Appendage Thrombus, In Journal of Cardiovascular Computed Tomography, Elsevier, 2024.
DOI: 10.1016/j.jcct.2024.05.195
C. Mackenzie, A. Morris, S. Ruckel, S. Elhabian, E. Bieging.
Left Atrial Appendage Thrombus Prediction with Statistical Shape Modeling, In Circulation, Vol. 150, 2024.
DOI: 10.1161/circ.150.suppl_1.4144233
Methods: We collected 132 cardiac CTs from consecutive studies of patients over 14 months obtained for evaluation of LAA thrombus prior to cardioversion. Of these, 16 patients were excluded. The LA and LAA were manually segmented independently from the systolic phase of the remaining 116 patients. Shape analysis was then performed using ShapeWorks software (SCI, University of Utah) to compute shape parameters of the LAA in isolation as well as the LA and LAA in combination without controlling for scale or orientation. The shape parameters explaining the greatest shape variance were considered for the model until at least 80% of shape variance was included. A logistic regression model for prediction of LAA thrombus was created using these shape parameters with forward and backward stepwise model selection.
Results: Of the 116 studies analyzed, 6 patients had thrombus in the LAA. Average shapes of the patients with and without thrombus differed in overall size as well as prominence of the LAA. Four shape parameters accounted for 81.2% of the LAA shape variance while six shape parameters accounted for 80.5% of the combined LA and LAA variance. The first shape parameter was predictive of LAA thrombus using both shape of the LAA only (p = 0.0258, AUC = 0.762), and when LAA shape was combined with LA shape in a joint model (p = 0.00511, AUC = 0.877).
Conclusion: Statistical shape modeling of the LAA, with or without the LA, can be performed on CT image data, and demonstrates differences in shape of these structures between patients with and without LAA thrombus. Patients with LAA thrombus had a larger overall LAA size, LA size, and a more prominent LAA with distinctive morphology. Findings suggest that statistical shape modeling may offer a quantitative and reproducible approach for using LAA shape to assess stroke risk in patients with AF.
H. Manoochehri, B. Zhang, B.S. Knudsen, T. Tasdizen.
PathMoCo: A Novel Framework to Improve Feature Embedding in Self-supervised Contrastive Learning for Histopathological Images, arXiv:2410.17514, 2024.
Self-supervised learning has become a cornerstone in various areas, particularly histopathological image analysis. Image augmentation plays a crucial role in self-supervised learning, as it generates variations in image samples. However, traditional image augmentation techniques often overlook the unique characteristics of histopathological images. In this paper, we propose a new histopathology-specific image augmentation method called stain reconstruction augmentation (SRA). We integrate our SRA with MoCo v3, a leading model in self-supervised contrastive learning, along with our additional contrastive loss terms, and call the new model SRA-MoCo v3. We demonstrate that our SRA-MoCo v3 always outperforms the standard MoCo v3 across various downstream tasks and achieves comparable or superior performance to other foundation models pre-trained on significantly larger histopathology datasets.