"Name that manufacturer". Relating image acquisition bias with task complexity when training deep learning models: experiments on head CT
As interest in applying machine learning techniques to medical images continues to grow rapidly, models are starting to be developed and deployed for clinical applications. In the clinical AI model development lifecycle (described by Lu et al. [1]), a crucial phase for machine learning scientists and clinicians is the proper design and collection of the data cohort. The ability to recognize various forms of bias and distribution shift in the dataset is critical at this step. While it remains difficult to account for all potential sources of bias, techniques can be developed to identify specific types of bias in order to mitigate their impact. In this work we analyze how the distribution of scanner manufacturers in a dataset can contribute to the overall bias of deep learning models. We evaluate convolutional neural networks (CNNs) for both classification and segmentation tasks, specifically two state-of-the-art models: ResNet [2] for classification and U-Net [3] for segmentation. We demonstrate that CNNs can learn to distinguish the imaging scanner manufacturer and that this bias can substantially impact model performance for both classification and segmentation tasks. By creating an original synthetic dataset of brain data that mimics the presence of more or less subtle lesions, we also show that this bias is related to the difficulty of the task. Recognizing such bias is critical to developing robust, generalizable models for clinical applications on real-world data distributions.
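As an illustration of the kind of cohort check this motivates, the following minimal sketch tabulates labels per scanner manufacturer and reports the positive rate for each. It is not the authors' method. The record format (pairs of manufacturer and label), the function names, and the vendor names are all hypothetical and chosen only for the example.

```python
from collections import Counter, defaultdict


def manufacturer_label_table(records):
    """Count how often each label occurs for each scanner manufacturer.

    `records` is an iterable of (manufacturer, label) pairs. This input
    format is an assumption made for the sketch, not taken from the paper.
    """
    table = defaultdict(Counter)
    for manufacturer, label in records:
        table[manufacturer][label] += 1
    return {m: dict(counts) for m, counts in table.items()}


def positive_rate_by_manufacturer(records, positive_label=1):
    """Return the fraction of positive-labelled scans per manufacturer."""
    table = manufacturer_label_table(records)
    return {
        m: counts.get(positive_label, 0) / sum(counts.values())
        for m, counts in table.items()
    }


# Toy cohort with made-up vendor names: label 1 = finding present.
records = [
    ("VendorA", 1), ("VendorA", 1), ("VendorA", 0),
    ("VendorB", 0), ("VendorB", 0), ("VendorB", 1), ("VendorB", 0),
]

rates = positive_rate_by_manufacturer(records)
for manufacturer, rate in sorted(rates.items()):
    print(f"{manufacturer}: positive rate {rate:.2f}")
```

A large gap between manufacturers in such a table suggests that a model could use scanner-specific image characteristics as a shortcut for the label, so the cohort may need rebalancing or stratified evaluation.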