Multi-level multi-domain statistical shape model of the subtalar, talonavicular, and calcaneocuboid joints|
A.C. Peterson, R.J. Lisonbee, N. Krähenbühl, C.L. Saltzman, A. Barg, N. Khan, S. Elhabian, A.L. Lenz. In Frontiers in Bioengineering and Biotechnology, 2022.
Traditionally, two-dimensional conventional radiographs have been the primary tool to measure the complex morphology of the foot and ankle. However, the subtalar, talonavicular, and calcaneocuboid joints are challenging to assess due to their bone morphology and locations within the ankle. Weightbearing computed tomography is a novel high-resolution volumetric imaging mechanism that allows detailed generation of 3D bone reconstructions. This study aimed to develop a multi-domain statistical shape model to assess morphologic and alignment variation of the subtalar, talonavicular, and calcaneocuboid joints across an asymptomatic population and calculate 3D joint measurements in a consistent weightbearing position. Specific joint measurements included joint space distance, congruence, and coverage. Noteworthy anatomical variation predominantly included the talus and calcaneus, specifically an inverse relationship regarding talar dome heightening and calcaneal shortening. While there was minimal navicular and cuboid shape variation, there were alignment variations within these joints; the most notable is the rotational aspect about the anterior-posterior axis. This study also found that multi-domain modeling may be able to predict joint space distance measurements within a population. Additionally, variation across a population of these four bones may be driven far more by morphology than by alignment variation based on all three joint measurements. These data are beneficial in furthering our understanding of joint-level morphology and alignment variants to guide advancements in ankle joint pathological care and operative treatments.
High-Quality Progressive Alignment of Large 3D Microscopy Data|
A. Venkat, D. Hoang, A. Gyulassy, P.T. Bremer, F. Federer, V. Pascucci. In 2022 IEEE 12th Symposium on Large Data Analysis and Visualization (LDAV), pp. 1--10. 2022.
Large-scale three-dimensional (3D) microscopy acquisitions frequently create terabytes of image data at high resolution and magnification. Imaging large specimens at high magnifications requires acquiring 3D overlapping image stacks as tiles arranged on a two-dimensional (2D) grid that must subsequently be aligned and fused into a single 3D volume. Due to their sheer size, aligning many overlapping gigabyte-sized 3D tiles in parallel and at full resolution is memory intensive and often I/O bound. Current techniques trade accuracy for scalability, perform alignment on subsampled images, and require additional postprocess algorithms to refine the alignment quality, usually with high computational requirements. One common solution to the memory problem is to subdivide the overlap region into smaller chunks (sub-blocks) and align the sub-block pairs in parallel, choosing the pair with the most reliable alignment to determine the global transformation. Yet aligning all sub-block pairs at full resolution remains computationally expensive. The key to quickly developing a fast, high-quality, low-memory solution is to identify a single or a small set of sub-blocks that give good alignment at full resolution without touching all the overlapping data. In this paper, we present a new iterative approach that leverages coarse resolution alignments to progressively refine and align only the promising candidates at finer resolutions, thereby aligning only a small user-defined number of sub-blocks at full resolution to determine the lowest error transformation between pairwise overlapping tiles. Our progressive approach is 2.6x faster than the state of the art, requires less than 450MB of peak RAM (per parallel thread), and offers a higher quality alignment without the need for additional postprocessing refinement steps to correct for alignment errors.
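The coarse-to-fine candidate selection described above can be illustrated with a minimal numpy sketch: score every sub-block pair at a downsampled resolution, then refine only the most promising candidates at full resolution. This is a simplified 2D stand-in for the paper's 3D pipeline; the brute-force integer-shift search, the single 2x coarse level, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def ssd(a, b):
    """Per-pixel sum-of-squared-differences alignment error."""
    return float(np.mean((a - b) ** 2))

def best_shift(fixed, moving, search=4):
    """Brute-force integer-shift alignment of two same-size 2D blocks."""
    best = (0, 0, np.inf)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            e = ssd(fixed, np.roll(moving, (dy, dx), axis=(0, 1)))
            if e < best[2]:
                best = (dy, dx, e)
    return best

def progressive_align(fixed, moving, block=16, keep=1):
    """Score every sub-block pair at coarse (2x-downsampled) resolution,
    then refine only the `keep` most promising pairs at full resolution."""
    h, w = fixed.shape
    scored = []
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            f = fixed[y:y + block:2, x:x + block:2]   # coarse views
            m = moving[y:y + block:2, x:x + block:2]
            scored.append((best_shift(f, m, search=2)[2], y, x))
    scored.sort()                                     # most promising first
    best = (0, 0, np.inf)
    for _, y, x in scored[:keep]:                     # full-res refinement
        cand = best_shift(fixed[y:y + block, x:x + block],
                          moving[y:y + block, x:x + block])
        if cand[2] < best[2]:
            best = cand
    return best[:2]                                   # (dy, dx)
```

Only `keep` sub-blocks are ever touched at full resolution, which is the source of the memory and speed savings the paper reports.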
Training neural networks to identify built environment features for pedestrian safety|
A. Quistberg, C.I. Gonzalez, P. Arbeláez, O.L. Sarmiento, L. Baldovino-Chiquillo, Q. Nguyen, T. Tasdizen, L.A.G. Garcia, D. Hidalgo, S.J. Mooney, A.V.D. Roux, G. Lovasi. In Injury Prevention, Vol. 28, No. 2, BMJ, pp. A65. 2022.
Theory-guided physics-informed neural networks for boundary layer problems with singular perturbation|
A. Arzani, K.W. Cassel, R.M. D'Souza. In Journal of Computational Physics, 2022.
Physics-informed neural networks (PINNs) are a recent trend in scientific machine learning research and modeling of differential equations. Despite progress in PINN research, large gradients and highly nonlinear patterns remain challenging to model. Thin boundary layer problems are prominent examples of large gradients that commonly arise in transport problems. In this study, boundary-layer PINN (BL-PINN) is proposed to enable a solution to thin boundary layers by considering them as a singular perturbation problem. Inspired by the classical perturbation theory and asymptotic expansions, BL-PINN is designed to replicate the procedure in singular perturbation theory. Namely, different parallel PINN networks are defined to represent different orders of approximation to the boundary layer problem in the inner and outer regions. In different benchmark problems (forward and inverse), BL-PINN shows superior performance compared to the traditional PINN approach and is able to produce accurate results, whereas the classical PINN approach could not provide meaningful solutions. BL-PINN also demonstrates significantly better results compared to other extensions of PINN such as the extended PINN (XPINN) approach. The natural incorporation of the perturbation parameter in BL-PINN provides the opportunity to evaluate parametric solutions without the need for retraining. BL-PINN demonstrates an example of how classical mathematical theory could be used to guide the design of deep neural networks for solving challenging problems.
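The inner/outer decomposition that BL-PINN replicates with parallel networks can be illustrated on a textbook singular perturbation problem. The sketch below does not train a PINN; it computes the leading-order matched-asymptotics composite solution of eps*y'' + y' + y = 0 with y(0) = 0, y(1) = 1, which is the classical construction the paper's network design mirrors, and checks it against the exact solution.

```python
import numpy as np

def composite(x, eps):
    """Leading-order matched-asymptotics solution of
    eps*y'' + y' + y = 0 with y(0) = 0, y(1) = 1:
    the outer solution e^{1-x} plus a boundary-layer correction at x = 0."""
    return np.exp(1 - x) - np.exp(1 - x / eps)

def exact(x, eps):
    """Exact solution from the characteristic roots of eps*r^2 + r + 1 = 0."""
    r1 = (-1 + np.sqrt(1 - 4 * eps)) / (2 * eps)
    r2 = (-1 - np.sqrt(1 - 4 * eps)) / (2 * eps)
    # Impose y(0) = 0 and y(1) = 1 on y = a*e^{r1 x} + b*e^{r2 x}.
    a, b = np.linalg.solve(np.array([[1.0, 1.0], [np.exp(r1), np.exp(r2)]]),
                           np.array([0.0, 1.0]))
    return a * np.exp(r1 * x) + b * np.exp(r2 * x)

eps = 1e-2
x = np.linspace(0.0, 1.0, 101)
err = np.max(np.abs(composite(x, eps) - exact(x, eps)))  # uniformly O(eps)
```

In BL-PINN, separate networks play the roles of the outer term and the inner (stretched-coordinate) correction, with the perturbation parameter entering the inner network's input exactly as it enters the exponent above.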
Quantifying the Severity of Metopic Craniosynostosis Using Unsupervised Machine Learning|
E.E. Anstadt, W. Tao, E. Guo, L. Dvoracek, M.K. Bruce, P.J. Grosse, L. Wang, L. Kavan, R. Whitaker, J.A. Goldstein. In Plastic and Reconstructive Surgery, November, 2022.
Quantifying the severity of head shape deformity and establishing a threshold for operative intervention remains challenging in patients with Metopic Craniosynostosis (MCS). This study combines 3D skull shape analysis with an unsupervised machine-learning algorithm to generate a quantitative shape severity score (CMD) and provide an operative threshold score.
Head computed tomography (CT) scans from subjects with MCS and normal controls (age 5-15 months) were used for objective 3D shape analysis using ShapeWorks software and in a survey for craniofacial surgeons to rate head-shape deformity and report whether they would offer surgical correction based on head shape alone. An unsupervised machine-learning algorithm was developed to quantify the degree of shape abnormality of MCS skulls compared to controls.
124 CTs were used to develop the model; 50 (24% MCS, 76% controls) were rated by 36 craniofacial surgeons, with an average of 20.8 ratings per skull. The interrater reliability was high (ICC=0.988). The algorithm performed accurately and correlated closely with the surgeons' assigned severity ratings (Spearman’s Correlation coefficient r=0.817). The median CMD for affected skulls was 155.0 (IQR 136.4-194.6, maximum 231.3). Skulls with ratings ≥150.2 were highly likely to be offered surgery by the experts in this study.
This study describes a novel metric to quantify the head shape deformity associated with metopic craniosynostosis and contextualizes the results using clinical assessments of head shapes by craniofacial experts. This metric may be useful in supporting clinical decision making around operative intervention as well as in describing outcomes and comparing patient populations across centers.
A Pathologist-Informed Workflow for Classification of Prostate Glands in Histopathology|
A. Ferrero, B. Knudsen, D. Sirohi, R. Whitaker. In Medical Optical Imaging and Virtual Microscopy Image Analysis, Springer Nature Switzerland, pp. 53--62. 2022.
Pathologists diagnose and grade prostate cancer by examining tissue from needle biopsies on glass slides. The cancer's severity and risk of metastasis are determined by the Gleason grade, a score based on the organization and morphology of prostate cancer glands. For diagnostic work-up, pathologists first locate glands in the whole biopsy core, and---if they detect cancer---they assign a Gleason grade. This time-consuming process is subject to errors and significant inter-observer variability, despite strict diagnostic criteria. This paper proposes an automated workflow that follows pathologists' modus operandi, isolating and classifying multi-scale patches of individual glands in whole slide images (WSI) of biopsy tissues using distinct steps: (1) two fully convolutional networks segment epithelium versus stroma and gland boundaries, respectively; (2) a classifier network separates benign from cancer glands at high magnification; and (3) an additional classifier predicts the grade of each cancer gland at low magnification. Altogether, this process provides a gland-specific approach for prostate cancer grading that we compare against other machine-learning-based grading methods.
Few-Shot Segmentation of Microscopy Images Using Gaussian Process|
S. Saha, O. Choi, R. Whitaker. In Medical Optical Imaging and Virtual Microscopy Image Analysis, Springer Nature Switzerland, pp. 94--104. 2022.
Few-shot segmentation has received recent attention because of its promise to segment images containing novel classes based on a handful of annotated examples. Few-shot-based machine learning methods build generic and adaptable models that can quickly learn new tasks. This approach finds potential application in many scenarios that do not benefit from large repositories of labeled data, which strongly impacts the performance of the existing data-driven deep-learning algorithms. This paper presents a few-shot segmentation method for microscopy images that combines a neural-network architecture with a Gaussian-process (GP) regression. The GP regression is used in the latent space of an autoencoder-based segmentation model to learn the distribution of functions from the encoded image representations to the corresponding representation of the segmentation masks in the support set. This regression analysis serves as the prior for predicting the segmentation mask for the query image. The rich latent representation built by the GP using examples in the support set significantly impacts the performance of the segmentation model, demonstrated by extensive experimental evaluation.
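The core idea, a GP regression from image latents to mask latents fit on the support set, can be sketched in a few lines of numpy. The random latent codes below are hypothetical stand-ins for the paper's autoencoder outputs, and the kernel choice and dimensions are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Squared-exponential kernel between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale ** 2)

def gp_predict(Z_img, Z_mask, Z_query, noise=1e-6):
    """GP posterior mean and marginal variance for a vector-valued map
    from image latents to mask latents, conditioned on the support set."""
    K = rbf(Z_img, Z_img) + noise * np.eye(len(Z_img))
    Ks = rbf(Z_query, Z_img)
    mean = Ks @ np.linalg.solve(K, Z_mask)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, var

rng = np.random.default_rng(0)
# Hypothetical encoder outputs: 5 support examples with 8-D image latents
# and 4-D mask latents (random here purely for illustration).
Z_img = rng.normal(size=(5, 8))
Z_mask = np.tanh(Z_img @ rng.normal(size=(8, 4)))
mean, var = gp_predict(Z_img, Z_mask, rng.normal(size=(1, 8)))
```

In the paper's setting, a decoder would map the predicted mask latent back to a segmentation mask for the query image; the posterior variance gives a built-in uncertainty estimate.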
Spatiotemporal Cardiac Statistical Shape Modeling: A Data-Driven Approach|
Subtitled arXiv preprint arXiv:2209.02736, J. Adams, N. Khan, A. Morris, S. Elhabian. 2022.
Clinical investigations of anatomy’s structural changes over time could greatly benefit from population-level quantification of shape, or spatiotemporal statistical shape modeling (SSM). Such a tool enables characterizing patient organ cycles or disease progression in relation to a cohort of interest. Constructing shape models requires establishing a quantitative shape representation (e.g., corresponding landmarks). Particle-based shape modeling (PSM) is a data-driven SSM approach that captures population-level shape variations by optimizing landmark placement. However, it assumes cross-sectional study designs and hence has limited statistical power in representing shape changes over time. Existing methods for modeling spatiotemporal or longitudinal shape changes require predefined shape atlases and pre-built shape models that are typically constructed cross-sectionally. This paper proposes a data-driven approach inspired by the PSM method to learn population-level spatiotemporal shape changes directly from shape data. We introduce a novel SSM optimization scheme that produces landmarks that are in correspondence both across the population (inter-subject) and across time-series (intra-subject). We apply the proposed method to 4D cardiac data from atrial-fibrillation patients and demonstrate its efficacy in representing the dynamic change of the left atrium. Furthermore, we show that our method outperforms an image-based approach for spatiotemporal SSM with respect to a generative time-series model, the Linear Dynamical System (LDS). An LDS fit using a spatiotemporal shape model optimized via our approach provides better generalization and specificity, indicating it accurately captures the underlying time dependency.
Statistical Shape Modeling of Biventricular Anatomy with Shared Boundaries|
Subtitled arXiv:2209.02706v1, K. Iyer, A. Morris, B. Zenger, K. Karnath, B.A. Orkild, O. Korshak, S. Elhabian. 2022.
Statistical shape modeling (SSM) is a valuable and powerful tool to generate a detailed representation of complex anatomy that enables quantitative analysis and the comparison of shapes and their variations. SSM applies mathematics, statistics, and computing to parse the shape into a quantitative representation (such as correspondence points or landmarks) that will help answer various questions about the anatomical variations across the population. Complex anatomical structures have many diverse parts with varying interactions or intricate architecture. For example, the heart is a four-chambered anatomy with several shared boundaries between chambers. Coordinated and efficient contraction of the chambers of the heart is necessary to adequately perfuse end organs throughout the body. Subtle shape changes within these shared boundaries of the heart can indicate potential pathological changes that lead to uncoordinated contraction and poor end-organ perfusion. Early detection and robust quantification could provide insight into ideal treatment techniques and intervention timing. However, existing SSM approaches fall short of explicitly modeling the statistics of shared boundaries. In this paper, we present a general and flexible data-driven approach for building statistical shape models of multi-organ anatomies with shared boundaries that captures morphological and alignment changes of individual anatomies and their shared boundary surfaces throughout the population. We demonstrate the effectiveness of the proposed methods using a biventricular heart dataset by developing shape models that consistently parameterize the cardiac biventricular structure and the interventricular septum (shared boundary surface) across the population data.
Discrete-Time Observations of Brownian Motion on Lie Groups and Homogeneous Spaces: Sampling and Metric Estimation|
M.H. Jensen, S. Joshi, S. Sommer. In Algorithms, Vol. 15, No. 8, 2022.
We present schemes for simulating Brownian bridges on complete and connected Lie groups and homogeneous spaces. We use this to construct an estimation scheme for recovering an unknown left- or right-invariant Riemannian metric on the Lie group from samples. We subsequently show how pushing forward the distributions generated by Brownian motions on the group results in distributions on homogeneous spaces that exhibit a non-trivial covariance structure. The pushforward measure gives rise to new non-parametric families of distributions on commonly occurring spaces such as spheres and symmetric positive tensors. We extend the estimation scheme to fit these distributions to homogeneous space-valued data. We demonstrate both the simulation schemes and estimation procedures on Lie groups and homogeneous spaces, including SPD(3)=GL+(3)/SO(3) and S2=SO(3)/SO(2).
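The basic ingredient of such simulation schemes, a geodesic random walk driven by Gaussian increments in the Lie algebra, can be sketched for SO(3). This is a minimal illustration of Brownian motion on a Lie group, not the authors' bridge-sampling or metric-estimation scheme; the step count and time step are arbitrary.

```python
import numpy as np

def skew(v):
    """Identify R^3 with the Lie algebra so(3) of skew-symmetric matrices."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def exp_so3(v):
    """Rodrigues' formula: closed-form exponential map so(3) -> SO(3)."""
    theta = np.linalg.norm(v)
    K = skew(v)
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K))

def brownian_motion_so3(n_steps, dt, rng):
    """Simulate Brownian motion on SO(3) by composing exponentials of
    Gaussian increments drawn in the Lie algebra (a geodesic random walk)."""
    R = np.eye(3)
    for _ in range(n_steps):
        R = R @ exp_so3(rng.normal(scale=np.sqrt(dt), size=3))
    return R

R = brownian_motion_so3(n_steps=500, dt=1e-3, rng=np.random.default_rng(0))
```

Because each increment is an exact group element, the walk never leaves SO(3); pushing such samples through the quotient map SO(3) -> S2 = SO(3)/SO(2) yields the sphere-valued distributions discussed in the abstract.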
Relating Metopic Craniosynostosis Severity to Intracranial Pressure|
J.D. Blum, J. Beiriger, C. Kalmar, R.A. Avery, S. Lang, D.F. Villavisanis, L. Cheung, D.Y. Cho, W. Tao, R. Whitaker, S.P. Bartlett, J.A. Taylor, J.A. Goldstein, J.W. Swanson. In The Journal of Craniofacial Surgery, 2022.
Children with nonsyndromic single-suture metopic synostosis were prospectively enrolled and underwent optical coherence tomography to measure optic nerve head morphology. Preoperative head computed tomography scans were assessed for endocranial bifrontal angle as well as scaled metopic synostosis severity score (MSS) and cranial morphology deviation score determined by CranioRate, an automated severity classifier.
Forty-seven subjects were enrolled between 2014 and 2019, at an average age of 8.5 months at preoperative computed tomography and 11.8 months at index procedure. Fourteen patients (29.7%) had elevated optical coherence tomography parameters suggestive of elevated ICP at the time of surgery. Ten patients (21.3%) had been diagnosed with developmental delay, eight of whom demonstrated elevated ICP. There were no significant associations between measures of metopic severity and ICP. Metopic synostosis severity score and endocranial bifrontal angle were inversely correlated, as expected (r=−0.545, P<0.001). A negative correlation was noted between MSS and formally diagnosed developmental delay (r=−0.387, P=0.008). Likewise, negative correlations between age at procedure and both MSS and cranial morphology deviation score were observed (r=−0.573, P<0.001 and r=−0.312, P=0.025, respectively).
Increased metopic severity was not associated with elevated ICP at the time of surgery. Patients who underwent later surgical correction showed milder phenotypic dysmorphology with an increased incidence of developmental delay.
Localization supervision of chest x-ray classifiers using label-specific eye-tracking annotation|
Subtitled arXiv:2207.09771, R. Lanfredi, J.D. Schroeder, T. Tasdizen. 2022.
Convolutional neural networks (CNNs) have been successfully applied to chest x-ray (CXR) images. Moreover, annotated bounding boxes have been shown to improve the interpretability of a CNN in terms of localizing abnormalities. However, only a few relatively small CXR datasets containing bounding boxes are available, and collecting them is very costly. Opportunely, eye-tracking (ET) data can be collected in a non-intrusive way during the clinical workflow of a radiologist. We use ET data recorded from radiologists while dictating CXR reports to train CNNs. We extract snippets from the ET data by associating them with the dictation of keywords and use them to supervise the localization of abnormalities. We show that this method improves a model's interpretability without impacting its image-level classification.
Integrating atomistic simulations and machine learning to design multi-principal element alloys with superior elastic modulus|
M. Grant, M. R. Kunz, K. Iyer, L. I. Held, T. Tasdizen, J. A. Aguiar, P. P. Dholabhai. In Journal of Materials Research, Springer International Publishing, pp. 1--16. 2022.
Multi-principal element, high entropy alloys (HEAs) are an emerging class of materials that have found applications across the board. Owing to the multitude of possible candidate alloys, exploration and compositional design of HEAs for targeted applications is challenging since it necessitates a rational approach to identify compositions exhibiting enriched performance. Here, we report an innovative framework that integrates molecular dynamics and machine learning to explore a large chemical-configurational space for evaluating elastic modulus of equiatomic and non-equiatomic HEAs along primary crystallographic directions. Vital thermodynamic properties and machine learning features have been incorporated to establish fundamental relationships correlating Young’s modulus with Gibbs free energy, valence electron concentration, and atomic size difference. In HEAs, as the number of elements increases …
Characterization of uncertainties and model generalizability for convolutional neural network predictions of uranium ore concentrate morphology|
C. A. Nizinski, C. Ly, C. Vachet, A. Hagen, T. Tasdizen, L. W. McDonald. In Chemometrics and Intelligent Laboratory Systems, Vol. 225, Elsevier, pp. 104556. 2022.
As the capabilities of convolutional neural networks (CNNs) for image classification tasks have advanced, interest in applying deep learning techniques for determining the natural and anthropogenic origins of uranium ore concentrates (UOCs) and other unknown nuclear materials by their surface morphology characteristics has grown. But before CNNs can join the nuclear forensics toolbox alongside more traditional analytical techniques – such as scanning electron microscopy (SEM), X-ray diffractometry, mass spectrometry, radiation counting, and any number of spectroscopic methods – a deeper understanding of “black box” image classification will be required. This paper explores uncertainty quantification for convolutional neural networks and their ability to generalize to out-of-distribution (OOD) image data sets. For prediction uncertainty, Monte Carlo (MC) dropout and random image crops as variational inference techniques are implemented and characterized. Convolutional neural networks and classifiers using image features from unsupervised vector-quantized variational autoencoders (VQ-VAE) are trained using SEM images of pure, unaged, unmixed uranium ore concentrates considered “unperturbed.” OOD data sets are developed containing perturbations from the training data with respect to the chemical and physical properties of the UOCs or data collection parameters; predictions made on the perturbation sets identify where significant shortcomings exist in the current training data and techniques used to develop models for classifying uranium process history, and provide valuable insights into how datasets and classification models can be improved for better generalizability to out-of-distribution examples.
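The MC dropout idea is simple enough to sketch without a deep-learning framework: keep dropout active at inference time and treat the spread of predictions across stochastic forward passes as uncertainty. The tiny random-weight network below is a hypothetical stand-in for a trained classifier, and the five output classes are illustrative; only the MC dropout mechanics are the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a trained classifier head (random weights purely for
# illustration); 5 hypothetical process-history classes.
W1, b1 = 0.1 * rng.normal(size=(64, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.normal(size=(32, 5)), np.zeros(5)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def mc_dropout_predict(x, T=200, p_drop=0.5):
    """Keep dropout active at inference and run T stochastic forward
    passes; the spread of class probabilities across passes is the
    model's predictive uncertainty for this input."""
    probs = []
    for _ in range(T):
        h = np.maximum(x @ W1 + b1, 0.0)          # hidden layer, ReLU
        mask = rng.random(h.shape) >= p_drop      # fresh dropout mask
        h = h * mask / (1.0 - p_drop)             # inverted-dropout scaling
        probs.append(softmax(h @ W2 + b2))
    probs = np.asarray(probs)
    return probs.mean(axis=0), probs.std(axis=0)  # prediction, uncertainty

mean_p, std_p = mc_dropout_predict(rng.normal(size=64))
```

High per-class standard deviation on an OOD input, relative to in-distribution inputs, is the kind of signal the paper uses to flag predictions that should not be trusted.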
3D Photography to Quantify the Severity of Metopic Craniosynostosis|
M. K. Bruce, W. Tao, J. Beiriger, C. Christensen, M. J. Pfaff, R. Whitaker, J. A. Goldstein. In The Cleft Palate-Craniofacial Journal, SAGE Publications, 2022.
This single-center retrospective cohort study included patients who were evaluated at our tertiary care center for MCS from 2016 to 2020 and underwent both head CT and 3D photography within a 2-month period.
The analysis method builds on our previously established ML algorithm for evaluating MCS severity using skull shape from CT scans. In this study, we regress the model to analyze 3D photographs and correlate the severity scores from both imaging modalities.
14 patients met inclusion criteria, 64.3% male (n = 9). The mean age in years at 3D photography and CT imaging was 0.97 and 0.94, respectively. Ten patient images were obtained preoperatively, and 4 patients did not require surgery. The severity prediction of the ML algorithm correlates closely when comparing the 3D photographs to CT bone data (Spearman correlation coefficient [SCC] r = 0.75; Pearson correlation coefficient [PCC] r = 0.82).
The results of this study show that 3D photography is a valid alternative to CT for evaluation of head shape in MCS. Its use will provide an objective, quantifiable means of assessing outcomes in a rigorous manner while decreasing radiation exposure in this patient population.
Deep Learning the Shape of the Brain Connectome|
Subtitled arXiv preprint arXiv:2203.06122, H. Dai, M. Bauer, P.T. Fletcher, S.C. Joshi. 2022.
To statistically study the variability and differences between normal and abnormal brain connectomes, a mathematical model of the neural connections is required. In this paper, we represent the brain connectome as a Riemannian manifold, which allows us to model neural connections as geodesics. We show for the first time how one can leverage deep neural networks to estimate a Riemannian metric of the brain that can accommodate fiber crossings and is a natural modeling tool to infer the shape of the brain from DWMRI. Our method achieves excellent performance in geodesic-white-matter-pathway alignment and tackles the long-standing issue in previous methods: the inability to recover the crossing fibers with high fidelity.
Google Street View Images as Predictors of Patient Health Outcomes, 2017–2019|
Q. C. Nguyen, T. Belnap, P. Dwivedi, A. Hossein Nazem Deligani, A. Kumar, D. Li, R. Whitaker, J. Keralis, H. Mane, X. Yue, T. T. Nguyen, T. Tasdizen, K. D. Brunisholz. In Big Data and Cognitive Computing, Vol. 6, No. 1, Multidisciplinary Digital Publishing Institute, 2022.
Collecting neighborhood data can be both time- and resource-intensive, especially across broad geographies. In this study, we leveraged 1.4 million publicly available Google Street View (GSV) images from Utah to construct indicators of the neighborhood built environment and evaluate their associations with 2017–2019 health outcomes of approximately one-third of the population living in Utah. The use of electronic medical records allows for the assessment of associations between neighborhood characteristics and individual-level health outcomes while controlling for predisposing factors, which distinguishes this study from previous GSV studies that were ecological in nature. Among 938,085 adult patients, we found that individuals living in communities in the highest tertiles of green streets and non-single-family homes have 10–27% lower diabetes, uncontrolled diabetes, hypertension, and obesity, but higher substance use disorders—controlling for age, White race, Hispanic ethnicity, religion, marital status, health insurance, and area deprivation index. Conversely, the presence of visible utility wires overhead was associated with 5–10% more diabetes, uncontrolled diabetes, hypertension, obesity, and substance use disorders. Our study found that non-single-family and green streets were related to a lower prevalence of chronic conditions, while visible utility wires and single-lane roads were connected with a higher burden of chronic conditions. These contextual characteristics can better help healthcare organizations understand the drivers of their patients’ health by further considering patients’ residential environments, which present both …
Adversarially Robust Classification by Conditional Generative Model Inversion|
Subtitled arXiv preprint arXiv:2201.04733, M. Alirezaei, T. Tasdizen. 2022.
Most adversarial attack defense methods rely on obfuscating gradients. These methods are successful in defending against gradient-based attacks; however, they are easily circumvented by attacks that either do not use the gradient or approximate and use the corrected gradient. Defenses that do not obfuscate gradients such as adversarial training exist, but these approaches generally make assumptions about the attack such as its magnitude. We propose a classification model that does not obfuscate gradients and is robust by construction without assuming prior knowledge about the attack. Our method casts classification as an optimization problem where we "invert" a conditional generator trained on unperturbed, natural images to find the class that generates the closest sample to the query image. We hypothesize that a potential source of brittleness against adversarial attacks is the high-to-low-dimensional nature of feed-forward classifiers which allows an adversary to find small perturbations in the input space that lead to large changes in the output space. On the other hand, a generative model is typically a low-to-high-dimensional mapping. While the method is related to Defense-GAN, the use of a conditional generative model and inversion in our model instead of the feed-forward classifier is a critical difference. Unlike Defense-GAN, which was shown to generate obfuscated gradients that are easily circumvented, we show that our method does not obfuscate gradients. We demonstrate that our model is extremely robust against black-box attacks and has improved robustness against white-box attacks compared to naturally trained, feed-forward classifiers.
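Classification by generator inversion can be sketched with a deliberately simple stand-in: a linear "conditional generator" per class, for which the inversion is a closed-form least-squares problem rather than the gradient-based optimization a deep generator would require. All names, dimensions, and the linear-generator assumption are illustrative; only the classify-by-best-reconstruction logic mirrors the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, latent_dim, data_dim = 3, 2, 10

# Hypothetical linear stand-in for a conditional generator G(z, c):
# each class generates points on its own affine subspace mu_c + A_c z.
mus = rng.normal(size=(n_classes, data_dim))
As = rng.normal(size=(n_classes, data_dim, latent_dim))

def invert_and_classify(x):
    """For each class c, find the latent z minimizing ||G(z, c) - x||
    (closed form for a linear generator; gradient-based inversion for a
    deep one) and return the class with the smallest reconstruction error."""
    errs = []
    for c in range(n_classes):
        z, *_ = np.linalg.lstsq(As[c], x - mus[c], rcond=None)
        errs.append(np.linalg.norm(mus[c] + As[c] @ z - x))
    return int(np.argmin(errs))

# A query drawn from class 1's generator is recovered by inversion.
x = mus[1] + As[1] @ rng.normal(size=latent_dim)
label = invert_and_classify(x)
```

Because the decision depends on how well each class's low-to-high-dimensional generator can reconstruct the query, small adversarial perturbations in input space cannot directly flip the output the way they can for a feed-forward classifier.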
Translational computer science at the scientific computing and imaging institute|
C. R. Johnson. In Journal of Computational Science, Vol. 52, pp. 101217. 2021.
The Scientific Computing and Imaging (SCI) Institute at the University of Utah evolved from the SCI research group, started in 1994 by Professors Chris Johnson and Rob MacLeod. Over time, research centers funded by the National Institutes of Health, Department of Energy, and State of Utah significantly spurred growth, and SCI became a permanent interdisciplinary research institute in 2000. The SCI Institute is now home to more than 150 faculty, students, and staff. The history of the SCI Institute is underpinned by a culture of multidisciplinary, collaborative research, which led to its emergence as an internationally recognized leader in the development and use of visualization, scientific computing, and image analysis research to solve important problems in a broad range of domains in biomedicine, science, and engineering. A particular hallmark of SCI Institute research is the creation of open source software systems, including the SCIRun scientific problem-solving environment, Seg3D, ImageVis3D, Uintah, ViSUS, Nektar++, VisTrails, FluoRender, and FEBio. At this point, the SCI Institute has made more than 50 software packages broadly available to the scientific community under open-source licensing and supports them through web pages, documentation, and user groups. While the vast majority of academic research software is written and maintained by graduate students, the SCI Institute employs several professional software developers to help create, maintain, and document robust, tested, well-engineered open source software. The story of how and why we worked, and often struggled, to make professional software engineers an integral part of an academic research institute is crucial to the larger story of the SCI Institute’s success in translational computer science (TCS).
Comparing radiologists’ gaze and saliency maps generated by interpretability methods for chest x-rays|
Subtitled arXiv:2112.11716v1, R.B. Lanfredi, A. Arora, T. Drew, J.D. Schroeder, T. Tasdizen. 2021.
The interpretability of medical image analysis models is considered a key research field. We use a dataset of eye-tracking data from five radiologists to compare the outputs of interpretability methods against the heatmaps representing where radiologists looked. We conduct a class-independent analysis of the saliency maps generated by two methods selected from the literature: Grad-CAM and attention maps from an attention-gated model. For the comparison, we use shuffled metrics, which avoid biases from fixation locations. We achieve scores comparable to an interobserver baseline in one shuffled metric, highlighting the potential of saliency maps from Grad-CAM to mimic a radiologist’s attention over an image. We also divide the dataset into subsets to evaluate in which cases similarities are higher.