Medical images play a central role in patient diagnosis, therapy, surgical planning, medical reference, and medical training. With the advent of digital imaging modalities and the digitization of images from conventional devices, collections of medical images are increasingly held in digital form. As these collections grow, manual annotation of medical images becomes increasingly expensive. Consequently, automatic medical image annotation has become very important [2].
This paper addresses the medical image annotation task on the ImageCLEF 2007 dataset [1]. The objective of this task is to assign the Image Retrieval in Medical Applications (IRMA) code [3] to each image in a given set of previously unseen medical (radiological) images. A set of 12,076 classified training images is provided and may be used in any way to train a classifier. The results of the classification step can be used for multilingual image annotation as well as for correcting DICOM standard headers. According to the IRMA code [3], a total of 197 classes are defined. The IRMA coding system consists of four axes with three to four positions each; every position takes a value in {0,…,9, a,…,z}, where “0” denotes “unspecified” and marks the end of a path along an axis. This allows a short and unambiguous notation (IRMA: TTTT-DDD-AAA-BBB), where T, D, A, and B denote a coding or sub-coding digit of the respective axis. Figure 1 gives two examples of unambiguous image classification using the IRMA code. The image on the left is coded: 1123 (x-ray, projection radiography, analog, high energy) – 211 (sagittal, left lateral decubitus, inspiration) – 520 (chest, lung) – 3a0 (respiratory system, lung). The image on the right is coded: 1220 (x-ray, fluoroscopy, analog) – 127 (coronal, AP, supine) – 722 (abdomen, upper abdomen, middle) – 430 (gastrointestinal system, stomach).
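To make the notation concrete, the following minimal Python sketch splits a code written in the TTTT-DDD-AAA-BBB form into its four axes and recovers the specified part of each axis path. The names `parse_irma` and `specified_path` are hypothetical helpers introduced here for illustration, not part of any IRMA tooling. The sketch also assumes that “0” occurs only as trailing padding once a path has ended, which holds for the two examples in Figure 1 but is not guaranteed by the description above.

```python
import re
from typing import Dict

# Axis letters and lengths follow the TTTT-DDD-AAA-BBB notation:
# four positions on the T axis, three on each of D, A, and B.
AXES = ("T", "D", "A", "B")
IRMA_PATTERN = re.compile(
    r"([0-9a-z]{4})-([0-9a-z]{3})-([0-9a-z]{3})-([0-9a-z]{3})"
)


def parse_irma(code: str) -> Dict[str, str]:
    """Split a hyphen-separated IRMA code into its four axis strings."""
    match = IRMA_PATTERN.fullmatch(code.strip().lower())
    if match is None:
        raise ValueError(f"not a valid IRMA code: {code!r}")
    return dict(zip(AXES, match.groups()))


def specified_path(axis_code: str) -> str:
    """Return the specified prefix of one axis.

    Assumption: '0' (unspecified) appears only as trailing padding,
    so stripping trailing zeros yields the path that was actually coded.
    """
    return axis_code.rstrip("0")


if __name__ == "__main__":
    # Left-hand example from Figure 1.
    axes = parse_irma("1123-211-520-3a0")
    for name in AXES:
        print(name, axes[name], "->", specified_path(axes[name]))
```

Under these assumptions, the anatomy axis `520` reduces to the two-level path `52` (chest, lung), while the technical axis `1123` is specified at all four positions.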