KEYWORDS: Scanners, Data modeling, Education and training, Machine learning, Principal component analysis, Diseases and disorders, Neuroimaging, Data acquisition, Image acquisition, Head
Purpose: Distributed learning is widely used to comply with data-sharing regulations and access diverse datasets for training machine learning (ML) models. The traveling model (TM) is a distributed learning approach that sequentially trains with data from one center at a time, which is especially advantageous when dealing with limited local datasets. However, a critical concern emerges when centers utilize different scanners for data acquisition, which could potentially lead models to exploit these differences as shortcuts. Although data harmonization can mitigate this issue, current methods typically rely on large or paired datasets, which can be impractical to obtain in distributed setups. Approach: We introduced HarmonyTM, a data harmonization method tailored for the TM. HarmonyTM effectively mitigates bias in the model's feature representation while retaining crucial disease-related information, all without requiring extensive datasets. Specifically, we employed adversarial training to "unlearn" bias from the features used in the model for classifying Parkinson's disease (PD). We evaluated HarmonyTM using multi-center three-dimensional (3D) neuroimaging datasets from 83 centers using 23 different scanners. Results: Our results show that HarmonyTM improved PD classification accuracy from 72% to 76% and reduced (unwanted) scanner classification accuracy from 53% to 30% in the TM setup. Conclusion: HarmonyTM is a method tailored for harmonizing 3D neuroimaging data within the TM approach, aiming to minimize shortcut learning in distributed setups. This prevents the disease classifier from leveraging scanner-specific details to classify patients with or without PD, a key aspect for deploying ML models for clinical applications.
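The adversarial "unlearning" idea above can be illustrated as a combined objective: the feature extractor is rewarded for disease classification and penalized whenever the scanner identity is predictable from its features. The following is a minimal NumPy sketch of that loss structure only; the function names, shapes, and the `lam` weighting are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cross_entropy(logits, labels):
    # Numerically stable softmax cross-entropy over rows of `logits`.
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -logp[np.arange(len(labels)), labels].mean()

def harmonization_loss(disease_logits, disease_y,
                       scanner_logits, scanner_y, lam=1.0):
    # Feature-extractor objective: classify the disease well while making the
    # scanner unpredictable from the features (adversarial "unlearning" term).
    return (cross_entropy(disease_logits, disease_y)
            - lam * cross_entropy(scanner_logits, scanner_y))
```

In a full adversarial setup, a scanner classifier would additionally be trained to minimize its own loss, while the feature extractor minimizes this combined objective.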
The difference between chronological age and predicted biological brain age, the so-called "brain age gap", is a promising biomarker for the assessment of overall brain health. It has also been suggested as a biomarker for early detection of neurological and cardiovascular conditions. The aim of this work is to identify group-level variability in the brain age gap between healthy subjects and patients with neurological and cardiovascular diseases. Therefore, a deep convolutional neural network was trained on UK Biobank T1-weighted MRI datasets of healthy subjects (n=6860) to predict brain age. After training, the model was used to determine the brain age gap for healthy hold-out test subjects (n=344) and subjects with neurological (n=2327) or cardiovascular (n=6467) diseases. Next, saliency maps were analyzed to identify brain regions used by the model to render decisions. Linear bias correction was implemented to correct for the bias of age predictions made by the model. After bias correction, the trained model achieved an average brain age gap of 0.05 years for the healthy test cohort, while the neurological disease test cohort had an average brain age gap of 0.7 years and the cardiovascular disease test cohort had an average brain age gap of 0.25 years. The average saliency maps appear similar for the three test groups, suggesting that the model mostly uses brain areas associated with general brain aging patterns. This work's results indicate potential in the brain age gap for differentiating neurological and cardiac patients from healthy aging patterns, supporting its use as a novel biomarker.
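The linear bias correction mentioned above is commonly implemented by fitting a linear model of predicted age against chronological age on a healthy cohort and inverting it before computing the gap. A minimal NumPy sketch of that common formulation (function names are hypothetical; the paper's exact variant may differ):

```python
import numpy as np

def fit_bias_correction(age, pred):
    # Fit pred ~ a * age + b on a healthy cohort (ordinary least squares).
    a, b = np.polyfit(age, pred, deg=1)
    return a, b

def corrected_gap(age, pred, a, b):
    # Invert the fitted linear bias, then compute the brain age gap.
    return (pred - b) / a - age
```

An alternative, equivalent in spirit, is to regress the raw gap itself on age and subtract the fitted trend.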
KEYWORDS: 3D modeling, 3D image processing, Brain, Data modeling, Neuroimaging, Medical imaging, Deep learning, Image processing, Artificial intelligence
Deep learning techniques for medical image analysis have reached performance comparable to that of medical experts, but the lack of reliable explainability leads to limited adoption in clinical routine. Explainable AI has emerged to address this issue, with causal generative techniques standing out by incorporating a causal perspective into deep learning models. However, their use cases have been limited to 2D images and tabular data. To overcome this, we propose a novel method to expand a causal generative framework to handle volumetric 3D images, which was validated by analyzing the effect of brain aging using 40,196 MRI datasets from the UK Biobank study. Our proposed technique paves the way for future 3D causal generative models in medical image analysis.
Medical imaging datasets, such as magnetic resonance images, are increasingly being used to investigate the genetic architecture of the brain. These images are commonly used as imaging-specific or imaging-derived phenotypes when conducting genotype-phenotype association studies. When using this type of phenotype, multivariate genome-wide association study (GWAS) designs are considered better suited than univariate methods due to their ability to account for the inherent correlations between the phenotypes related to brain structures as determined from medical images. The main objective of this work is to establish and evaluate a comprehensive pipeline for investigating genotype-phenotype associations of the human brain using canonical component analysis, a form of multivariate GWAS and machine learning. As a proof-of-principle, the proposed pipeline was used to investigate genotype-phenotype associations between cortical brain region volumes in subjects with attention-deficit hyperactivity disorder. Using the developed pipeline, several significant (p-value < 5E−04) single nucleotide polymorphisms were found that reside in or near genes, such as DSCAM and DPYSL2, that are known to be associated with neurological and mental disorders or substance addiction, a common comorbidity for subjects with attention-deficit hyperactivity disorder. These clinically meaningful results show that the proposed pipeline using canonical component analysis can be used to investigate the genetic architecture of the brain.
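The multivariate core of such a pipeline can be illustrated by computing canonical correlations between two variable blocks (e.g., genotype dosages and regional volumes): center each block, whiten it via an SVD, and read the canonical correlations off the singular values of the cross-product of the whitened bases. This is a minimal NumPy sketch of canonical correlation computation, not the authors' pipeline:

```python
import numpy as np

def first_canonical_correlation(X, Y):
    # X, Y: (n_subjects x p) and (n_subjects x q) variable blocks.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Whiten each block: the left singular vectors form an orthonormal basis.
    Ux = np.linalg.svd(X, full_matrices=False)[0]
    Uy = np.linalg.svd(Y, full_matrices=False)[0]
    # Canonical correlations are the singular values of Ux^T Uy.
    return float(np.linalg.svd(Ux.T @ Uy, compute_uv=False)[0])
```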
Parkinson’s disease (PD) is the second most common neurodegenerative disease, affecting 2-3% of the population over 65 years of age. Considerable research has investigated the benefit of using neuroimaging to improve PD diagnosis. However, it is challenging for medical experts to manually identify the subtle differences associated with PD in such complex data. It has been shown that machine learning models can achieve human-like accuracies for many computer-aided diagnosis applications. However, model performance usually depends on the amount and diversity of training data available, whereas most Parkinson’s disease classification models were trained on rather small datasets. Training data size and diversity can be increased by curating multi-site datasets. However, this may also increase biological and non-biological variances due to differences in participant cohorts, scanners, and data acquisition protocols. Thus, data harmonization is important to reduce those variances and enable the models to focus primarily on the patterns associated with PD. This work compares intensity harmonization techniques on 1796 MRI scans from twelve studies. Our results show that a histogram matching approach does not improve classification accuracy (78%) compared to the model trained on unharmonized data (baseline). However, it reduces the disparity between sensitivity and specificity from 81% and 73% to 77% and 79%, respectively. Moreover, combining histogram matching and least squares mean tissue intensity harmonization outperforms the baseline model (accuracy of 74% compared to 67%) for an independent test set. Finally, our analysis considering sex (male, female) and groups (PD, healthy) shows that models trained on harmonized data exhibited reduced performance disparities between groups, which may be interpreted as a form of bias mitigation.
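Histogram matching, one of the harmonization techniques compared above, maps source intensities onto a reference distribution by matching empirical CDFs (quantile mapping). A minimal NumPy sketch of the general technique, not the paper's specific implementation:

```python
import numpy as np

def histogram_match(source, reference):
    # Map each source intensity to the reference intensity with the same
    # empirical CDF value (quantile mapping).
    _, s_idx, s_counts = np.unique(source.ravel(),
                                   return_inverse=True, return_counts=True)
    r_vals, r_counts = np.unique(reference.ravel(), return_counts=True)
    s_cdf = np.cumsum(s_counts) / source.size
    r_cdf = np.cumsum(r_counts) / reference.size
    matched = np.interp(s_cdf, r_cdf, r_vals)
    return matched[s_idx].reshape(source.shape)
```

Libraries such as scikit-image provide an equivalent `match_histograms` utility for multi-channel images.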
Purpose: Explainability and fairness are two key factors for the effective and ethical clinical implementation of deep learning-based machine learning models in healthcare settings. However, there has been limited work on investigating how unfair performance manifests in explainable artificial intelligence (XAI) methods, and how XAI can be used to investigate potential reasons for unfairness. Thus, the aim of this work was to analyze the effects of previously established sociodemographic-related confounders on classifier performance and explainability methods. Approach: A convolutional neural network (CNN) was trained to predict biological sex from T1-weighted brain MRI datasets of 4547 9- to 10-year-old adolescents from the Adolescent Brain Cognitive Development study. Performance disparities of the trained CNN between White and Black subjects were analyzed, and saliency maps were generated for each subgroup at the intersection of sex and race. Results: The classification model demonstrated a significant difference in the percentage of correctly classified White male (90.3% ± 1.7%) and Black male (81.1% ± 4.5%) children. Conversely, slightly higher performance was found for Black female (89.3% ± 4.8%) compared with White female (86.5% ± 2.0%) children. Saliency maps showed subgroup-specific differences, corresponding to brain regions previously associated with pubertal development. In line with this finding, average pubertal development scores of subjects used in this study were significantly different between Black and White females (p < 0.001) and males (p < 0.001). Conclusions: We demonstrate that a CNN with significantly different sex classification performance between Black and White adolescents can identify different important brain regions when comparing subgroup saliency maps. Importance scores vary substantially between subgroups within brain structures associated with pubertal development, a race-associated confounder for predicting sex.
We illustrate that unfair models can produce different XAI results between subgroups and that these results may explain potential reasons for biased performance.
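A subgroup accuracy disparity like the one reported above can be tested with a standard two-proportion z-test. The sketch below uses only the Python standard library; the counts are illustrative, not the study's data:

```python
import math

def two_proportion_ztest(k1, n1, k2, n2):
    # Two-sided z-test for a difference in proportions, e.g., per-subgroup
    # classification accuracy (k correct out of n). Returns (z, p-value).
    p1, p2 = k1 / n1, k2 / n2
    p = (k1 + k2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))  # pooled standard error
    z = (p1 - p2) / se
    p_val = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_val
```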
KEYWORDS: Data modeling, Brain, Neuroimaging, Performance modeling, Machine learning, Data centers, Magnetic resonance imaging, Solid modeling, Medical research, Feature extraction
Limited access to medical datasets, due to regulations that protect patient data, is a major hindrance to the development of machine learning models for computer-aided diagnosis tools using medical images. Distributed learning is an alternative to training machine learning models on centrally collected data that avoids data-sharing issues. The main idea of distributed learning is to train models remotely at each medical center rather than collecting the data in a central database, thereby avoiding sharing data between centers and model developers. In this work, we propose a travelling model that performs distributed learning for biological brain age prediction using morphological measurements of different brain structures. We specifically investigate the impact of non-identically distributed data between collaborators on the performance of the travelling model. Our results, based on a large dataset of 2058 magnetic resonance imaging scans, demonstrate that transferring the model weights between the centers more frequently achieves results (mean age prediction error = 5.89 years) comparable to central learning implementations (mean age prediction error = 5.93 years), which were trained using the data from all sites hosted together at a central location. Moreover, we show that our model does not suffer from catastrophic forgetting and that data distribution is less important than the number of times the model travels between collaborators.
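The travelling-model idea above can be sketched as a loop in which only the model weights move between centers while each center's data stays local. The toy below trains a shared linear model with gradient descent; the data, model, and step counts are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def traveling_model(center_data, rounds=10, lr=0.01):
    # center_data: list of per-center (X, y) pairs; data never leaves a center.
    d = center_data[0][0].shape[1]
    w, b = np.zeros(d), 0.0
    for _ in range(rounds):
        for X, y in center_data:      # the model "travels" from center to center
            for _ in range(20):       # a few local gradient steps per visit
                err = X @ w + b - y
                w -= lr * X.T @ err / len(y)
                b -= lr * err.mean()
    return w, b
```

Increasing `rounds` corresponds to transferring the weights between centers more frequently, the factor the study found to matter most.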
Attention deficit/hyperactivity disorder (ADHD) is characterized by symptoms of inattention, hyperactivity, and impulsivity, and affects an estimated 10.2% of children and adolescents in the United States. However, correct diagnosis of the condition can be challenging, with failure rates up to 20%. Machine learning models making use of magnetic resonance imaging (MRI) have the potential to serve as a clinical decision support system to aid in the diagnosis of ADHD in youth and thereby improve diagnostic validity. The purpose of this study was to develop and evaluate an explainable deep learning model for automatic ADHD classification. 254 T1-weighted brain MRI datasets of youth aged 9-11 were obtained from the Adolescent Brain Cognitive Development (ABCD) Study, and the Child Behavior Checklist DSM-Oriented ADHD Scale was used to partition subjects into ADHD and non-ADHD groups. A fully convolutional neural network (CNN) adapted from a state-of-the-art adult brain age regression model was trained to distinguish between neurologically normal children and children with ADHD. Saliency voxel attribution maps were generated to identify brain regions relevant for the classification task. The proposed model achieved an accuracy of 71.1%, sensitivity of 68.4%, and specificity of 73.7%. Saliency maps highlighted the orbitofrontal cortex, entorhinal cortex, and amygdala as important regions for the classification, which is consistent with previous literature linking these regions to significant structural differences in youth with ADHD. To the best of our knowledge, this is the first study applying artificial intelligence explainability methods such as saliency maps to the classification of ADHD using a deep learning model. The proposed deep learning classification model has the potential to aid the clinical diagnosis of ADHD while providing interpretable results.
Deep learning in medical imaging typically requires sensitive and confidential patient data for model training. Recent research in computer vision has shown that it is possible to recover training data from trained models using model inversion techniques. In this paper, we investigate the degree to which encoder-decoder-like architectures (U-Nets, etc.) commonly used in medical imaging are vulnerable to simple model inversion attacks. Utilising a database consisting of 20 MRI datasets from acute ischemic stroke patients, we trained an autoencoder model for image reconstruction and a U-Net model for lesion segmentation. In the second step, model inversion decoders were developed and trained to reconstruct the original MRIs from the low-dimensional representations of the trained autoencoder and the U-Net model. The inversion decoders were trained using 24 independent MRI datasets of acute stroke patients not used for training of the original models. Skull-stripped datasets as well as the full original datasets including the skull and other non-brain tissues were used for model training and evaluation. The results show that the trained inversion decoder can be used to reconstruct training datasets after skull stripping given the latent space of the autoencoder trained for image reconstruction (mean correlation coefficient = 0.49), while it was not possible to fully reconstruct the original image used for training of the segmentation U-Net (mean correlation coefficient = 0.18). These results are further supported by the structural similarity index measure (SSIM) scores, which show a mean SSIM score of 0.51 ± 0.14 for the autoencoder trained for image reconstruction, while the average SSIM score for the U-Net trained for the lesion segmentation task was 0.28 ± 0.12. The same experiments were then conducted on the same images but without skull stripping.
In this case, the U-Net trained for segmentation shows significantly worse results, while the autoencoder trained for image reconstruction is not affected. Our results suggest that an autoencoder model trained for image compression can be inverted with high accuracy while this is much harder to achieve for a U-Net trained for lesion segmentation.
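The attack principle can be illustrated with a deliberately simplified linear toy: an attacker who can query a fixed encoder fits an inversion decoder on independent data, then applies it to latent codes of unseen images. All shapes, the linear encoder, and the least-squares decoder below are illustrative assumptions; the paper uses learned neural decoders on real MRI data.

```python
import numpy as np

rng = np.random.default_rng(42)

# A fixed "trained" linear encoder mapping 16-D inputs to an 8-D latent space
# (stand-in for an autoencoder bottleneck).
E = rng.normal(size=(16, 8))

# Attacker's independent dataset: encode it, then fit a linear inversion
# decoder D by least squares so that z @ D approximates x.
X_attacker = rng.normal(size=(200, 16))
Z = X_attacker @ E
D, *_ = np.linalg.lstsq(Z, X_attacker, rcond=None)

# Apply the inversion decoder to latents of unseen "victim" inputs and
# measure per-sample reconstruction quality via correlation.
X_victim = rng.normal(size=(20, 16))
X_rec = (X_victim @ E) @ D
corr = np.mean([np.corrcoef(a, b)[0, 1] for a, b in zip(X_victim, X_rec)])
```

Even this linear toy recovers a substantial fraction of each input, mirroring the qualitative finding that compression-oriented latents are easier to invert than task-specific ones.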
KEYWORDS: Motion models, Acquisition tracking and pointing, 3D modeling, Data modeling, Liver, Virtual reality, Lung, Image registration, Spirometry, Data acquisition
Virtual reality (VR) training simulators of liver needle insertion in the hepatic area of breathing virtual patients often require 4D image data acquisitions as a prerequisite. Here, first, a population-based 4D atlas of breathing virtual patients is built; second, the need for a dose-relevant or expensive 4D CT or MRI acquisition for a new patient is mitigated by warping the mean atlas motion. The breakthrough contribution of this work is the construction and reuse of population-based, learned 4D motion models.
KEYWORDS: Breast, Mammography, Nipple, Chest, 3D modeling, Image compression, Data modeling, Breast cancer, Computer aided diagnosis and therapy, Tissues
Mammography is a standard tool for breast cancer diagnosis. In current clinical practice, typically two mammograms of each breast are taken from different angles. A fundamental step when using ipsilateral mammograms for the diagnosis of breast cancer is the identification of corresponding locations/structures in both views, which is a very challenging task due to the projective nature of the images and the different compression parameters used for each view. In this contribution, four different approaches for the estimation of corresponding locations in ipsilateral mammograms are systematically compared using 46 mammogram pairs (50 point-to-point correspondences). The evaluation includes simple heuristic methods (annular bands and straight strips) as well as methods based on geometric and physically motivated breast compression models, which aim to simulate the mammogram acquisition process. The evaluation results show that, on average, no significant differences exist between the estimation accuracies obtained using the simple heuristic methods and the more involved compression models. However, the results of this study indicate the potential of a method that optimally combines the different approaches.
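The annular-band heuristic mentioned above rests on the observation that a structure approximately keeps its distance to the nipple across views. A minimal NumPy sketch under that assumption (function name, coordinate convention, and default band width are hypothetical, not the paper's parameters):

```python
import numpy as np

def in_annular_band(cand_mlo, point_cc, nipple_cc, nipple_mlo, band_width=10.0):
    # Heuristic: the match of a CC-view point lies in an annular band around
    # the MLO-view nipple whose radius equals the CC-view nipple distance.
    r_cc = np.linalg.norm(np.asarray(point_cc) - np.asarray(nipple_cc))
    r = np.linalg.norm(np.asarray(cand_mlo) - np.asarray(nipple_mlo))
    return abs(r - r_cc) <= band_width / 2
```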
Automatic segmentation of ischemic stroke lesions in magnetic resonance (MR) images is important in clinical practice and for neuroscientific trials. The key problem is to detect largely inhomogeneous regions of varying sizes, shapes and locations. We present a stroke lesion segmentation method based on local features extracted from multi-spectral MR data that are selected to model a human observer’s discrimination criteria. A support vector machine classifier is trained on expert-segmented examples and then used to classify formerly unseen images. Leave-one-out cross validation on eight datasets with lesions of varying appearances is performed, showing our method to compare favourably with other published approaches in terms of accuracy and robustness. Furthermore, we compare a number of feature selectors and closely examine each feature’s and MR sequence’s contribution.
Respiratory motion and its variability lead to location uncertainties in radiation therapy (RT) of thoracic and abdominal tumors. Current approaches for motion compensation in RT are usually driven by respiratory surrogate signals, e.g., spirometry. In this contribution, we present an approach for statistical analysis, modeling and subsequent simulation of surrogate signals on a cycle-by-cycle basis. The simulated signals represent typical patient-specific variations of, e.g., breathing amplitude and cycle period. For the underlying statistical analysis, all breathing cycles of an observed signal are consistently parameterized using approximating B-spline curves. Statistics on breathing cycles are then performed by using the parameters of the B-spline approximations. Assuming that these parameters follow a multivariate Gaussian distribution, realistic time-continuous surrogate signals of arbitrary length can be generated and used to simulate the internal motion of tumors and organs based on a patient-specific diffeomorphic correspondence model. As an example, we show how this approach can be employed in RT treatment planning to calculate tumor appearance probabilities and to statistically assess the impact of respiratory motion and its variability on planned dose distributions.
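The statistical simulation step above (parameterize observed cycles, fit a multivariate Gaussian on the parameters, sample new cycles) can be sketched as follows. For brevity the toy uses only two parameters per cycle (amplitude, period) and renders cycles as sinusoids rather than B-spline curves; all of this is an illustrative simplification of the paper's approach:

```python
import numpy as np

def simulate_surrogate(cycle_params, n_cycles, dt=0.1, rng=None):
    # cycle_params: (n_observed_cycles x 2) array of (amplitude, period)
    # extracted from an observed surrogate signal.
    rng = np.random.default_rng(rng)
    mu = cycle_params.mean(axis=0)
    cov = np.cov(cycle_params, rowvar=False)
    # Sample parameters for new, synthetic breathing cycles from the fitted
    # multivariate Gaussian.
    sampled = rng.multivariate_normal(mu, cov, size=n_cycles)
    # Render each sampled cycle (sinusoid stand-in for the B-spline curve)
    # and concatenate into a time-continuous surrogate signal.
    signal = np.concatenate([
        a * np.sin(2 * np.pi * np.arange(0.0, p, dt) / p)
        for a, p in sampled
    ])
    return sampled, signal
```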
KEYWORDS: Motion estimation, Principal component analysis, Radiotherapy, Tumors, Lung, Motion models, Motion measurement, 4D CT imaging, Chest, Simulation of CCA and DLA aggregates
Respiratory motion is a major source of error in radiation treatment of thoracic and abdominal tumors. State-of-the-art motion-adaptive radiation therapy techniques are usually guided by external breathing signals acting as surrogates for the internal motion of organs and tumors. Assuming a relationship between the surrogate measurements and the internal motion patterns, which are usually described by non-linear transformations, correspondence models can be defined and used for surrogate-based motion estimation. In this contribution, a diffeomorphic motion estimation framework based on standard multivariate linear regression is extended by subspace-based approaches like principal component analysis, partial least squares, and canonical correlation analysis. These methods aim at exploiting the hidden structure of the training data to improve the use of the information provided by high-dimensional surrogate and internal motion representations. A quantitative evaluation carried out on 4D CT data sets of 10 lung tumor patients shows that subspace-based approaches are able to significantly improve the mean estimation accuracy when compared to standard multivariate linear regression.
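One of the subspace-based approaches named above, principal component regression, can be sketched in a few lines: project the centered surrogate measurements onto their leading principal components, then regress the motion representation on those scores. A minimal NumPy sketch with illustrative shapes, not the paper's diffeomorphic framework:

```python
import numpy as np

def pcr_fit(S, M, n_components):
    # S: (n_samples x p) surrogate measurements, M: (n_samples x q) motion
    # representation. Principal component regression: PCA on S, then OLS.
    S0 = S - S.mean(axis=0)
    _, _, Vt = np.linalg.svd(S0, full_matrices=False)
    P = Vt[:n_components].T                      # leading PC basis of S
    W, *_ = np.linalg.lstsq(S0 @ P, M - M.mean(axis=0), rcond=None)
    return P, W, S.mean(axis=0), M.mean(axis=0)

def pcr_predict(S_new, P, W, s_mean, m_mean):
    # Map new surrogate measurements to estimated motion.
    return (S_new - s_mean) @ P @ W + m_mean
```

Partial least squares and canonical correlation analysis differ only in how the subspace basis `P` is chosen (jointly with `M` rather than from `S` alone).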
Automatic segmentation of the separate human lung lobes is a crucial task in computer-aided diagnostics and intervention planning, required, for example, for the determination of disease spreading or pulmonary parenchyma quantification.
In this work, a novel approach for lobe segmentation based on multi-region level sets is presented. In a first step, interlobular fissures are detected using a supervised enhancement filter. The fissures are then used to compute a cost image, which is incorporated into the level set approach. By this, the segmentation is drawn to the fissures at places where structure information is present in the image. In areas with incomplete fissures (e.g., due to insufficient image quality or anatomical conditions), the smoothing term of the level sets applies and a closed continuation of the fissures is provided.
The approach is tested on nine pulmonary CT scans. It is shown that incorporating the additional force term improves the segmentation significantly. On average, 83% of the left fissure is traced correctly; the right oblique and horizontal fissures are properly segmented to 76% and 48%, respectively.