2016 IEEE International Conference on Big Data (Big Data)

Comparison of Lossless Video and Image Compression Codecs for Medical Computed Tomography Datasets

Vy Bui1,3, Lin-Ching Chang1, Dunling Li2, Li-Yueh Hsu3, Marcus Y. Chen3

1Department of Electrical Engineering and Computer Science, The Catholic University of America, Washington, DC, USA
2BTS Software Solutions, Columbia, Maryland, USA
3National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, Maryland, USA

Abstract—Modern multidimensional medical imaging technology produces very large amounts of data, especially from the computed tomography modality. These volumetric datasets create new demands for big data storage and high-speed communication systems, which may be alleviated by image compression techniques. Current image compression schemes adopted in the DICOM standard do not exploit the inter-slice correlation within a three-dimensional (3D) dataset. Video compression has the potential to improve the compression ratio by reducing the redundancy in volumetric medical images. In this paper, we compare the performance of five lossless video codecs (H264, H265, Lagarith, MSU, MLC) and three still-image codecs (JPEG, JPEG2000, JPEG-LS) using 3D medical computed tomography datasets. Performance evaluation shows that video codecs improve the compression ratio compared to JPEG and JPEG2000 while being competitive with JPEG-LS.

Keywords— lossless compression; compression ratio; video codecs; medical imaging; computed tomography.

I. INTRODUCTION

Advances in medical imaging technology have led to a significant increase in the volume and size of image data. Large volumes of medical datasets generated by imaging modalities such as computed tomography (CT), magnetic resonance imaging, positron emission tomography, and single photon emission tomography have led to a growing demand for large and efficient storage systems. A report from EMC Corporation and the International Data Corporation anticipates healthcare data to grow at 48% per year, reaching 2.3 zettabytes by the year 2020 [1]. Medical image compression therefore plays an important role not only in reducing data storage cost but also in improving transmission time for large image studies such as high-resolution three-dimensional (3D) CT datasets.

Current compression standards for digital imaging and communications in medicine (DICOM) images use still-image codecs, which do not exploit inter-frame redundancies in the way video compression commonly does. Because corresponding pixel values in adjacent slices are typically very similar in volumetric medical images, video compression techniques that exploit such inter-frame redundancies may achieve a higher compression ratio than still-image codecs. The objective of this work is to explore the efficiency of lossless video compression for volumetric CT datasets using five state-of-the-art video codecs, namely H264/AVC, H265/HEVC, Lagarith, MSU, and MLC, and to compare them with the still-image codecs JPEG, JPEG2000, and JPEG-LS as specified in the DICOM standard [2].

II. METHODS

A. Data Description

Medical CT image datasets were obtained from 20 patients who underwent routine clinical examinations. All subjects gave written consent using a form approved by the institutional review board of the National Heart, Lung, and Blood Institute. Volumetric images were reconstructed at 2.0 mm slice thickness, 1.0 mm slice overlap, 0.5 mm pixel size, and 512×512 image matrix size. The number of frames in each dataset ranges from 591 to 751 slices covering the chest, abdomen, and pelvis of the patients. All studies were anonymized and converted from the original 16-bit DICOM 3D images to 8-bit DICOM images, and then to 8-bit mono-channel video sequences stored in AVI format.

B. Optimized 16-to-8 bit Conversion

To optimize the 16-to-8 bit data conversion and preserve the original signal and contrast levels, the stored pixel values were calibrated to Hounsfield Units (HU) using the modality look-up-table (LUT) transformation. In the DICOM header, the attributes of the LUT transformation are specified in tags (0028,1052) and (0028,1053), representing the intercept and slope for calculating the calibrated HU using the following equation:

$$\mathrm{HU} = \mathrm{slope} \times \mathrm{SV} + \mathrm{intercept} \qquad (1)$$

where SV is the stored pixel value.

Next, the window center/level and window width settings, specified in tags (0028,1050) and (0028,1051) respectively, are adjusted to improve image contrast for visualization of different anatomical structures such as lung, heart, vessel, bone, or other soft tissues. These two parameters specify a linear conversion from stored pixel values to a more restricted dynamic range for optimal image display. In this study, a soft-tissue setting with a window width of 380 and a window center/level of 40 is used to optimize the display of anatomical structures in the chest, abdomen, and pelvis regions. Under this optimized window/level display range, the 16-bit pixel values are rescaled into the 8-bit dynamic range (0-255) for image compression.
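The conversion described in Section II.B can be illustrated with a minimal Python sketch. It is not the authors' IDL implementation: the function names, the default slope of 1 and intercept of -1024, and the plain min/max linear window with nearest-integer rounding are assumptions made for illustration (the DICOM standard defines a slightly different windowing formula). Only the window center of 40 and width of 380 come from the text.

```python
def to_hounsfield(stored_value, slope, intercept):
    """Modality LUT of Eq. (1): HU = slope * SV + intercept."""
    return slope * stored_value + intercept


def window_to_8bit(hu, center=40.0, width=380.0):
    """Map an HU value to 0-255 using a linear display window.

    Assumption: values below/above the window are clipped to its edges,
    and the result is rounded to the nearest integer.
    """
    low = center - width / 2.0    # -150 HU for the soft-tissue window
    high = center + width / 2.0   # +230 HU for the soft-tissue window
    clipped = min(max(hu, low), high)
    return int(round((clipped - low) / (high - low) * 255.0))


def convert_slice(stored_pixels, slope=1.0, intercept=-1024.0):
    """Convert one slice (list of rows of stored values) to 8-bit values.

    The default slope/intercept are hypothetical; real values must be
    read from tags (0028,1053) and (0028,1052) of each image.
    """
    return [
        [window_to_8bit(to_hounsfield(v, slope, intercept)) for v in row]
        for row in stored_pixels
    ]


if __name__ == "__main__":
    # Three stored values: air, water-like tissue, dense bone.
    print(convert_slice([[0, 1024, 3000]]))
```

Because every HU value outside the window collapses to 0 or 255, the 8-bit data no longer carry the full 16-bit range; the codecs are lossless with respect to the converted 8-bit images.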

978-1-4673-9005-7/16/$31.00 ©2016 IEEE

C. Evaluation Metrics

Three evaluation metrics were computed to compare the performance of the different codecs: compression ratio, encoding cost, and decoding cost. The compression ratio (CR) is defined as the ratio of the total number of bytes in the original image to that in the compressed data. The encoding and decoding costs are defined as the total time required to compress or decompress the input data divided by the data size. This cost represents the encoding or decoding time (in milliseconds) required for each megabyte of data.

D. Software Implementation

We used the IDL/Merge DICOM toolkit (Exelis Visual Information Solutions, Boulder, Colorado) to import the pixel data from DICOM images and convert them into 8-bit AVI video files. The open-source video processing utility VirtualDub [3] was used to process the AVI files for Lagarith, MSU, and MLC compression. For the other codecs, we used the open-source multimedia application FFmpeg [4] for H264, H265, and JPEG-LS compression, and the IDL/Merge DICOM toolkit for JPEG2000 and JPEG compression. A batch processing script was implemented in C++ under Microsoft Visual Studio 2015 to process all video files in RGB color space. The processing time was measured on an Intel Core™ i5-3337U 1.80 GHz processor. Since most of the codecs support different parameter configurations, we used the parameter settings that maximized the compression ratio for each codec unless otherwise specified.

III. RESULTS

Table I summarizes the descriptive statistics of the compression ratio for all codecs. Figure 1 shows the compression ratios as a box-and-whisker plot. For the still-image codecs, the 16-to-8 bit conversion improves the CR by 9.80% and 9.48% for JPEG and JPEG2000 compression, respectively. Furthermore, a considerably larger improvement of 30.05% in CR was obtained with JPEG-LS compression. When comparing the video and still-image compression methods, both JPEG2000 and JPEG have a lower CR than all video codecs tested. However, JPEG-LS delivers a CR comparable to MSU and MLC, and slightly better than H264, H265, and Lagarith.

Figure 2 shows the computational cost of encoding and decoding. In general, all video codecs and still-image compression methods have a lower decoding cost than encoding cost, except for JPEG-LS, whose decoding cost is higher than its encoding cost. JPEG-LS has the lowest encoding cost, while JPEG and JPEG2000 have the lowest decoding cost. The video codecs encode faster than JPEG2000, except for MSU, whose encoding cost is slightly higher than that of the other video codecs and JPEG2000. Surprisingly, the popular JPEG compression has a much higher encoding cost than the other compression methods. Note that MSU and MLC have a relatively higher decoding cost than all still-image compression methods.

TABLE I. SUMMARY STATISTICS FOR COMPRESSION RATIO COMPARISON.

Codec                Mean ± standard deviation
JPEG (16-bit)        2.45 ± 0.01
JPEG2000 (16-bit)    3.48 ± 0.02
JPEG-LS (16-bit)     3.66 ± 0.02
JPEG (8-bit)         2.69 ± 0.09
JPEG2000 (8-bit)     3.81 ± 0.29
JPEG-LS (8-bit)      4.76 ± 0.69
H264                 4.43 ± 0.57
H265                 4.33 ± 0.51
Lagarith             4.26 ± 0.50
MSU                  4.73 ± 0.67
MLC                  4.75 ± 0.69
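The metrics of Section II.C and the relative CR improvements quoted in Section III can be reproduced with a short Python sketch. The function names are hypothetical, and two details are assumptions not stated in the paper: a megabyte is taken as 1024 × 1024 bytes, and the cost is normalized by the uncompressed data size. The CR values are the means from Table I.

```python
def compression_ratio(original_bytes, compressed_bytes):
    """CR = bytes in the original data / bytes in the compressed data."""
    return original_bytes / compressed_bytes


def cost_ms_per_mb(elapsed_seconds, data_bytes):
    """Encoding or decoding cost in milliseconds per megabyte.

    Assumptions: 1 MB = 1024 * 1024 bytes, and data_bytes is the
    uncompressed size of the input.
    """
    megabytes = data_bytes / (1024.0 * 1024.0)
    return elapsed_seconds * 1000.0 / megabytes


def relative_improvement(cr_before, cr_after):
    """Percentage gain in CR when going from cr_before to cr_after."""
    return (cr_after - cr_before) / cr_before * 100.0


# Mean CR from Table I: (16-bit, 8-bit) for each still-image codec.
TABLE_I_MEANS = {
    "JPEG": (2.45, 2.69),
    "JPEG2000": (3.48, 3.81),
    "JPEG-LS": (3.66, 4.76),
}

for codec, (cr16, cr8) in TABLE_I_MEANS.items():
    print(f"{codec}: {relative_improvement(cr16, cr8):.2f}%")
# JPEG: 9.80%
# JPEG2000: 9.48%
# JPEG-LS: 30.05%
```

The printed percentages match the 9.80%, 9.48%, and 30.05% reported in the text, which confirms that those figures are computed from the mean CR values rather than per-patient gains.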

Fig. 1. Comparison of compression ratio


Fig. 2. Comparison of encoding and decoding costs (millisecond/megabyte).

IV. DISCUSSION

We evaluated five lossless video compression methods on volumetric medical computed tomography image datasets and showed that their compression ratios are consistently higher than those of the JPEG and JPEG2000 still-image compression schemes. In contrast, the compression ratio of JPEG-LS is similar to that of the video codecs. We also compared the CR between the original 16-bit and the optimized 8-bit data. The results show that the CR of the optimized 8-bit data is higher than that of the original 16-bit images for all still-image codecs compared.

Lossless compression of 3D medical images has been compared in previous studies [5-10]. Various lossless still-image compression methods have been compared using different image modalities. Some studies applied video compression techniques to medical images [8-10], but limited their evaluation to H264 and H265 only. The compression ratio for CT images has been reported as around 1.60 to 4.01 in [5] and 1.74 to 3.77 in [7]. The CR of our 8-bit medical CT images using video codecs is around 4.26 to 4.75, which is higher than in these previous studies [5,7]. However, the CR of our 16-bit data using still-image codecs is around 2.45 to 3.66, which is within a similar range to these studies.

V. CONCLUSIONS

The results of this study are encouraging, as they show the potential of using video codecs to perform lossless compression of large volumetric medical images such as computed tomography datasets. This may promote the development of video-based remote diagnosis and telemedicine applications that use video codecs to improve the compression and transmission of these large multidimensional image datasets.

ACKNOWLEDGMENT

Part of this work was supported by the U.S. National Science Foundation under Grant CNS-1624485.

REFERENCES

[1] EMC Corporation and International Data Corporation. "The Digital Universe: Driving Data Growth in Healthcare." 2014.
[2] DICOM part 5. http://dicom.nema.org
[3] VirtualDub. http://www.virtualdub.org
[4] FFmpeg. http://trac.ffmpeg.org
[5] Clunie DA. "Lossless compression of grayscale medical images: effectiveness of traditional and state-of-the-art approaches." Proc. SPIE 3980, Medical Imaging 2000, pp.74-84.
[6] Philips W, Van Assche S, De Rycke D, Denecker K. "State-of-the-art techniques for lossless compression of 3D medical image sets." Computerized Medical Imaging and Graphics, 2001, 25, pp.173-185.
[7] Kivijärvi J, Ojala T, Kaukoranta T, Kuba A, Nyúl L, Nevalainen O. "A comparison of lossless compression methods for medical images." Computerized Medical Imaging and Graphics, 1998, 22, pp.323-339.
[8] Sanchez V, Abugharbieh R, Nasiopoulos P. "3D scalable lossless compression of medical images based on global and local symmetries." Proc. IEEE ICIP 2009, pp.2525-2528.
[9] Sanchez V, Bartrina-Rapesta J. "Lossless compression of medical images based on HEVC intra coding." Proc. IEEE ICASSP 2014, pp.6622-6626.
[10] Parikh S, Ruiz D, Kalva H, Fern G. "Content dependent intra mode selection for medical image compression using HEVC." Proc. IEEE ICCE 2016, pp.561-564.
