
image_processing Documentation
Release 1.6
Mel Mason
Jan 16, 2020

Contents:

1 Introduction
  1.1 Use cases
  1.2 Installation
  1.3 Quick start
2 Digital Preservation
  2.1 Embedded metadata
3 JPEG2000 Profile
  3.1 Kakadu compression parameters
  3.2 References
4 JP2 colour profiles for digital preservation
  4.1 Preservation guidelines
  4.2 JP2 conversion details
  4.3 Recommendations
  4.4 Testing
  4.5 Further Reading
5 API Reference
  5.1 DerivativeFilesGenerator
  5.2 Validation
  5.3 Conversion
  5.4 Exceptions
  5.5 Kakadu
6 Indices and tables
Python Module Index
Index

1 Introduction

Image-processing is a Python library that converts a source image (TIFF or JPEG) to a JP2 file with a focus on digital preservation and making sure the conversion is reversible. At the Bodleian we use it to generate the derivative image files we ingest into Digital Bodleian for both delivery and long-term preservation.

1.1 Use cases

• An all-in-one workflow to go from source file to derivatives including all validation checks. The defaults are tailored to Digital Bodleian preferences, but this is customisable.
• Individual functions to be called separately from a workflow manager like Goobi.
• Easy TIFF to JP2 conversion from Python: basic Python wrapper around Kakadu, along with some tested parameter recipes.

1.2 Installation

    pip install git+https://github.com/bodleian/image-processing.git

• Compatible with both Python 2.7 and 3.5+

1.2.1 Dependencies

• Exiftool
  – yum install perl-Image-ExifTool
  – apt install exiftool
• Kakadu
  – If you want to process compressed TIFFs, compile it with libtiff support. In the makefile apps/make/Makefile-<OS>, add -DKDU_INCLUDE_TIFF to CFLAGS and -ltiff to LIBS
• Pillow prerequisites before pip install
  – May need some image packages installed before pip installation (may not need lcms2 depending on which TIFF formats you'll be processing)
  – yum install lcms2 lcms2-devel libtiff libtiff-devel libjpeg libjpeg-devel
  – The virtual environment's python binary needs to match the Python.h used by GCC. If necessary, use export C_INCLUDE_PATH=/usr/local/include/python2.7/
• Jpylyzer prerequisites before pip install
  – Needs a relatively recent pip version to install - it fails on 1.4.
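Before running a conversion it can help to confirm that the external tools listed above are actually reachable. The following is only a minimal sketch, not part of the library: the helper name find_missing_dependencies is hypothetical, it assumes Python 3 (shutil.which is not available on 2.7), and it assumes the kdu_compress binary sits directly inside the Kakadu base directory, which may not match your build layout.

```python
import os
import shutil


def find_missing_dependencies(kakadu_base_path, executables=("exiftool",)):
    """Return a list of tools that could not be located.

    Hypothetical helper: `executables` are looked up on PATH, and
    kdu_compress is assumed to live directly under `kakadu_base_path`.
    """
    # Anything not found on PATH is reported by name.
    missing = [name for name in executables if shutil.which(name) is None]

    # Kakadu is located through a base path rather than PATH.
    kdu_compress = os.path.join(kakadu_base_path, "kdu_compress")
    if not os.path.isfile(kdu_compress):
        missing.append(kdu_compress)
    return missing


if __name__ == "__main__":
    problems = find_missing_dependencies("/opt/kakadu")
    if problems:
        print("Missing dependencies: " + ", ".join(problems))
    else:
        print("All external tools found.")
```

An empty list means both tools were found; otherwise each entry names the executable or path that could not be located.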
1.3 Quick start

To run a full conversion on a TIFF file, with validation, format checks, XMP extraction and creation of a thumbnail JPEG:

From the command line:

    convert_tiff_to_jp2 input.tif

In Python:

    from image_processing.derivative_files_generator import DerivativeFilesGenerator

    derivatives_gen = DerivativeFilesGenerator(kakadu_base_path="/opt/kakadu")
    derivatives_gen.generate_derivatives_from_tiff("input.tif", "output/folder")

To access the validation and conversion functions separately so they can be integrated into a workflow system like Goobi:

    from image_processing.derivative_files_generator import DerivativeFilesGenerator
    from image_processing import kakadu, validation

    derivatives_gen = DerivativeFilesGenerator(
        kakadu_base_path="/opt/kakadu",
        kakadu_compress_options=kakadu.DEFAULT_LOSSLESS_COMPRESS_OPTIONS)

    # each of these statements can be run separately, with different instances of
    # DerivativeFilesGenerator
    validation.check_image_suitable_for_jp2_conversion("input.tif")
    derivatives_gen.generate_jp2_from_tiff("input.tif", "output.jp2")
    derivatives_gen.validate_jp2_conversion("input.tif", "output.jp2", check_lossless=True)

To just use Kakadu directly through the wrapper:

    from image_processing import kakadu

    kdu = kakadu.Kakadu(kakadu_base_path="/opt/kakadu")
    kdu.kdu_compress("input.tif", "output.jp2",
                     kakadu_options=kakadu.DEFAULT_LOSSLESS_COMPRESS_OPTIONS)

2 Digital Preservation

This package has a strong emphasis on digital preservation, as we want to use lossless JP2s as our preservation master files. It was developed with input from our digital preservation team.
By default it checks:

• the JP2 is valid (using jpylyzer)
• the JP2 can be converted back into a TIFF, which
  – has the same pixels as the source TIFF (or the TIFF we converted from the source JPEG)
  – has the same colour profile and mode as the source image

It does not check:

• the technical metadata is correctly copied over to the JP2 (we extract this to a separate file)
• the JP2 displays as expected in viewers
• JPEG to TIFF conversion, if the source file was a JPEG (beyond checking the colour profiles match). This is a lossy conversion, so the pixels will not be identical

We have run tests on a wide sample of source images from our repository. We cannot share this test repository on GitHub due to copyright issues, but if you want to run your own tests these automatic lossless checks should simplify that. The full lossless checks can be disabled in production, but we would recommend keeping them enabled if digital preservation is a concern.

Note: our testing has been focused on the source images we ingest, not all possible formats. The check_image_suitable_for_jp2_conversion() function is run when generating derivatives, and should fail for image formats we have not tested.

See JP2 colour profiles for digital preservation for some more background information and recommendations.

2.1 Embedded metadata

We extract image metadata from the source file to a separate XML file for digital preservation, using Exiftool. Exiftool is a command line tool for reading, writing and editing embedded metadata, with very thorough support for image embedded metadata formats. This separate XML file is stored along with the JP2 in the archive. The metadata in the file is stored in the image metadata format XMP, mapped from whatever formats (EXIF, IPTC etc.) were used in the TIFF file.
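As a rough illustration of the sidecar extraction described here, Exiftool can write an XMP file from an image with its -tagsfromfile option. The sketch below only builds the command; the function names are hypothetical and the library's own Exiftool invocation may use different arguments.

```python
import subprocess


def build_xmp_sidecar_command(source_image, sidecar_path, exiftool="exiftool"):
    """Build an Exiftool command that copies metadata into an XMP sidecar.

    Hypothetical helper: `exiftool -tagsfromfile SRC DST.xmp` creates the
    XMP file if it does not exist yet, or updates it if it does.
    """
    return [exiftool, "-tagsfromfile", source_image, sidecar_path]


def extract_xmp_sidecar(source_image, sidecar_path):
    """Run the command; raises CalledProcessError if Exiftool reports failure."""
    subprocess.check_call(build_xmp_sidecar_command(source_image, sidecar_path))


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch works
    # even where Exiftool is not installed.
    print(" ".join(build_xmp_sidecar_command("input.tif", "input.xmp")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.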
Exiftool also offers a proprietary XML format which preserves the original format of the metadata fields, but we chose XMP as a widely recognised format for sidecar files, rather than going with a proprietary format that may change in future.

2.1.1 Copying over metadata

While the extracted metadata is what we rely on for preservation, we also want to have as much of the original metadata as possible in the JP2 image.

Maintaining embedded metadata while converting between image file formats is difficult. All of the image conversion software we've tried had problems with embedded metadata.

• ImageMagick sometimes produces badly formed metadata in the converted file
• Pillow by default doesn't copy over any metadata. Embedded metadata related functionality is limited
• Kakadu only copies over some metadata

Because of this, we don't rely on the image conversion library we use (Pillow) to copy over metadata. Instead, we use Exiftool to copy over metadata after the image is converted, both when converting from JPEG to TIFF and when converting from TIFF to JP2. When copying over to JP2 we map all embedded metadata formats to XMP, as JP2 doesn't have an official standard for storing EXIF.

3 JPEG2000 Profile

3.1 Kakadu compression parameters

Digital Bodleian uses a specific set of kdu_compress options (DEFAULT_LOSSLESS_COMPRESS_OPTIONS) for lossless JP2 conversion, and alters the rate and Creversible parameters for lossy JP2 conversion.
The terminal commands are as follows:

3.1.1 Lossless

    kdu_compress -i input.tif -o output.jp2 Clevels=6 Clayers=6 \
        "Cprecincts={256,256},{256,256},{128,128}" "Stiles={512,512}" \
        Corder=RPCL ORGgen_plt=yes ORGtparts=R "Cblk={64,64}" \
        Cuse_sop=yes Cuse_eph=yes -flush_period 1024 Creversible=yes -rate -

3.1.2 Lossy

    kdu_compress -i input.tif -o output.jp2 Clevels=6 Clayers=6 \
        "Cprecincts={256,256},{256,256},{128,128}" "Stiles={512,512}" \
        Corder=RPCL ORGgen_plt=yes ORGtparts=R "Cblk={64,64}" \
        Cuse_sop=yes Cuse_eph=yes -flush_period 1024 -rate 3

3.1.3 Parameter explanation

• Clevels=6
  Resolution levels. At least 3 are recommended to help compression. After that, the aim is to have the lowest resolution sub-image be roughly thumbnail-sized, so the optimal value is dependent on image size. [1]
• Clayers=6
  Quality layers. More layers can help with quicker decompression; you can decode only a subset of the layers when dealing with lower resolution images where quality decrease is not noticed. [1]
• -rate -
  Compression rates for each quality layer. An initial value of - is needed to ensure true losslessness, as otherwise some data may be discarded. Subsequent numbers can be added to specify bit-rates for each of the lower quality layers - if they are all left unspecified, as they are here, "an internal heuristic determines a lower bound and logarithmically spaces the layer rates over the range" [2]
• Creversible=yes
  Use reversible wavelet and component transforms (required for losslessness) [1]

[1] JPEG 2000 as a Preservation and Access Format for the Wellcome Trust Digital Library
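The two command lines above differ only in the Creversible and -rate arguments, which can be made explicit by building both from one option list. This is a sketch under stated assumptions, not the library's code: LOSSLESS_OPTIONS, lossy_options and build_kdu_compress_command are hypothetical names that mirror the terminal commands shown in this section, and the real DEFAULT_LOSSLESS_COMPRESS_OPTIONS constant may be structured differently.

```python
# Hypothetical option list mirroring the lossless terminal command above.
LOSSLESS_OPTIONS = [
    "Clevels=6",
    "Clayers=6",
    "Cprecincts={256,256},{256,256},{128,128}",
    "Stiles={512,512}",
    "Corder=RPCL",
    "ORGgen_plt=yes",
    "ORGtparts=R",
    "Cblk={64,64}",
    "Cuse_sop=yes",
    "Cuse_eph=yes",
    "-flush_period", "1024",
    "Creversible=yes",
    "-rate", "-",
]


def lossy_options(rate):
    """Derive the lossy options: drop Creversible=yes and set a numeric rate."""
    base = [opt for opt in LOSSLESS_OPTIONS if opt != "Creversible=yes"]
    rate_index = base.index("-rate")
    # Replace the trailing "-rate -" pair with "-rate <rate>".
    return base[:rate_index] + ["-rate", str(rate)]


def build_kdu_compress_command(kdu_compress_path, input_path, output_path, options):
    """Return the argument list for a kdu_compress call (not executed here)."""
    return [kdu_compress_path, "-i", input_path, "-o", output_path] + list(options)


if __name__ == "__main__":
    command = build_kdu_compress_command(
        "/opt/kakadu/kdu_compress", "input.tif", "output.jp2", lossy_options(3))
    print(" ".join(command))
```

When the argument list is passed to subprocess directly rather than through a shell, the braces in Cprecincts, Stiles and Cblk do not need the quoting used in the terminal commands.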