
TITLE Image Visualization and Content Identification Using Mathematica

/Introduction In this article, we investigate image visualization (IV) and image content identification (ICI) using Mathematica. It is not the only framework to provide such capability; popular web services, cloud APIs and programs are readily available. These include online services such as Google's Vision API and Image Recognition service and Microsoft's CaptionBot, as well as desktop software such as Picasa (an image organizer) and many others. Each is capable in its own right, and some have capabilities that others do not. Nevertheless, in some ways Mathematica is superior to these programs because it gives the user, through its rich functionality, the ability to explore data and imagery in ways the others do not. Those already using online APIs should have little difficulty adapting to the Wolfram Data Language (WDL). Of course, uploading imagery from an investigation to the Internet or cloud is ill advised.

Many of today's programming languages have image processing (IP) capabilities, built in or available from third-party libraries, which have helped advance the spread of computer vision. These languages and libraries typically require many lines of programming, and some require extensive use and manipulation of data structures. Using WDL, we will show how little code is actually required to perform complex IP. Our Mathematica code, concentrated in the program body, should be understandable to anyone with prior programming experience. As with our first Mathematica article, we expand on data processing using file I/O, generating potentially actionable information by transforming images into more meaningful information.

Throughout this article, three different datasets are used to provide useful examples and establish performance metrics. Important points for building WDL programs are introduced, enabling us to pick up where we left off in our first Mathematica article. WDL programming is straightforward but, like any language, the more complex the program the more difficult the programming. WDL reduces many of these hurdles by providing an intuitive way of building up program functionality. We hope to show, in little time, programs suitable for digital forensics (DF) that are substantially more complex than those in our first article.

Finally, to save space in our code we have reduced the various snippets by condensing the previously examined elements from Article 1 into familiar nuggets, denoted by "(* TITLE... *)". If a code heading is followed by an equals sign (=), this signifies that we are adding to or modifying that subsection of code; if "…" is used, then we are not making any changes to that subsection of code.

/About the datasets The first dataset is our primary set, based on a collection of photos taken by the authors using various camera models. They come from personal collections and so are free of copyright and attribution concerns. Consisting of a wide variety of places, things, times of day (e.g., daytime, blue skies, sunsets), pets, vehicles and stellar objects, it makes for a realistic set of images. It is a small collection of 121 images representing 961 MB of data. It includes large panoramas, the largest of which is 14,807 x 11,631 pixels. From this set, we hope to provide useful information concerning Mathematica's overall ICI accuracy.

The second set is used to more closely examine Mathematica's ability to identify and differentiate between persons in images, their backgrounds and what they are wearing, issues that tend to confuse ICI software. The images are medium sized and occupy several MB of disk space. To make the analysis more realistic we used images from the web consisting of various "persons". Anyone working in DF knows all too well the prevalence of images of "persons" in a typical investigation.

The final set, which we use for establishing meaningful performance metrics, consists of 8,620 image files consuming about 22.0 GB of disk space. This set represents the full collection from which the first dataset was drawn; we visualize and perform ICI on it as well.

Finally, performance and processing issues for all datasets will be discussed.

/Test system specs


To implement and test the functionality we describe, we need a capable system for experimentation to ensure that our code is both accurate and functional.

For our testing purposes, we use a customized workstation equipped with two Xeon 2630v3 processors running at 2.40 GHz and 128 GiB RAM atop a SuperMicro X10-DAL-i motherboard. Swap, enabled for this article, is a 128 GiB raw partition on a Kingston SV300S3 SSD. An NVidia GeForce 720 2 GiB video card provides system graphics. Five internal 1 TB hard drives are used for data storage, with no RAID configured. The system runs a heavily customized installation of Fedora 23 x64 using kernel 4.4.9-300 SMP. The Mathematica version used is 10.4.1.0.

The concepts shown in this article are applicable to Windows, Linux and Mac OS X.

/Concept I – Clearing memory of previously stored results Mathematica is indeed a strange creature. In Article 1 we recommended using ClearSystemCache[] and ClearAll["Global`*"] to clear both the Wolfram system cache and all values or information associated with existing symbols. It turns out that this is insufficient to truly clear the cache of past results or evaluations. To definitively clear out this memory we set $HistoryLength=0 (zero). This conclusively clears all previous results from memory, returning unused memory to the system pool.

We only learned of this memory clearing issue after rerunning our ICI program many times over, until it crashed Mathematica because of insufficient system memory. At that time, Wolfram technical support informed us of this feature, discussed in Wolfram’s Documentation Center.

Specifics for correctly implementing a memory clearing routine are shown in Code Snippet 1. From this point forward, we will use this routine in all WDL programs.

(* PROGRAM INITIALIZATION & MEMORY MANAGEMENT (Begin) = *)
(* Clear everything *)
ClearSystemCache[]
ClearAll["Global`*"]
$HistoryLength = 0

Code Snippet 1: Procedure to correctly clear memory of unused symbols, history and cache.

Some blogs and non-Wolfram support sites state that Remove["Global`*"] will clear a notebook's (NB) memory of all symbols, cache and history; this is incorrect. The only way to do this correctly is to use the approach shown in Code Snippet 1.

/Concept II – Keeping track of memory usage Key to running any program that consumes large amounts of memory is keeping track of it. Mathematica makes this simple. We create a variable, say startmemory, to store the value returned by MemoryInUse[], which we call shortly after a program begins. MemoryInUse[] reports the amount of memory in use by the Wolfram system at the moment it is called. To identify the memory in use by Mathematica's front end, use MemoryInUse[$FrontEnd].

Equally important is determining how much memory is in use at the end of the program, just before closing files and cleaning up. To determine this we again run MemoryInUse[] and store the value in a new variable, say finalmemory. Of course, we will also want to know the program's peak memory usage; this value is obtained from MaxMemoryUsed[].

We can use Share[] to reduce duplication among the in-memory data structures used by a program's functions, expressions and variables. Extensive testing reveals memory savings of several hundred KB to several MB with no noticeable overhead. To maximize memory savings, use it shortly after a program begins. If the need arises to share specific program elements, we can use Share[x], where x is some variable, expression or function.

To implement these memory management capabilities, we need look no further than Code Snippet 2.


(* PROGRAM INITIALIZATION & MEMORY MANAGEMENT (Begin) = *)
(* Clear everything, share memory and start memory tracking *)
…
Share[];
startmemory = MemoryInUse[];

(* PROGRAM TIME KEEPING (Begin)... *)

(* PROGRAM BODY... *)

(* PROGRAM TIME KEEPING (End)... *)

(* MEMORY MANAGEMENT (End) = *)
(* Print out program memory usage information *)
maxmem = MaxMemoryUsed[];
finalmemory = MemoryInUse[];
StringForm["Memory in use when program started: ``\n", startmemory]
StringForm["Max. memory consumed by program: ``\n", maxmem - startmemory]
StringForm["Memory in use when program ended: ``\n", finalmemory]

(* CLOSE UP PROGRAM & STREAMS... *)

Code Snippet 2: Memory management code.

/Concept III (Optional) – System report writing and running external commands As with any investigation, report writing is central to detailing the tools and systems used in an analysis. Mathematica provides "self-documenting" functionality that quickly helps us gather considerable information about a Wolfram deployment. Fortunately, the names of these functions are straightforward and self-explanatory. For the curious, Mathematica's documentation has in-depth details about these functions.

Ideally, information obtained from these functions should be stored in an external file. While highly specialised output-generating code can be implemented in a NB, it is much simpler to export this information directly to PDF or to a text file. We will use the latter approach, shown in Code Snippet 3, where we save information about the current deployment to the file systeminfo.txt.

It may be necessary when generating a Mathematica report to obtain specific operating system (OS) information that Mathematica's built-in functionality does not provide. To run a command or program and save its output from within Mathematica, we use variable = ReadList["!command -parameter(s)", String]. For example, to get the running Linux kernel version we run the command uname -a. Implementing this specific command in our code is shown in bold in Code Snippet 3.

(* PROGRAM INITIALIZATION & MEMORY MANAGEMENT (Begin) ... *)

(* PROGRAM TIME KEEPING (Begin)... *)

(* PROGRAM BODY = *)

(* Define files *)
sysinfofile = "systeminfo.txt"
...

(* Define input, output and error streams *)
...
sysinfostream = OpenWrite[sysinfofile];

...

(* WOLFRAM SYSTEM INFORMATION REPORTING (OPTIONAL) = *)
(* Write Wolfram Deployment Information to Report File *)
WriteString[sysinfostream, "Operating System: ", $OperatingSystem, "\n"]
uname = ReadList["!uname -a", String];
WriteString[sysinfostream, "Linux Kernel Information:", uname, "\n"]
WriteString[sysinfostream, "Base System Information: ", $System, "\n"]
WriteString[sysinfostream, "System Shell: ", $SystemShell, "\n"]
WriteString[sysinfostream, "Physical RAM (in bytes): ", $SystemMemory, "\n"]
WriteString[sysinfostream, "Char Encoding Type: ", $SystemCharacterEncoding, "\n"]
WriteString[sysinfostream, "Mathematica Version: ", $Version, "\n"]
WriteString[sysinfostream, "Hardware Type: ", $MachineType, "\n"]
WriteString[sysinfostream, "Default Num. Kernels: ", $ConfiguredKernels, "\n"]
WriteString[sysinfostream, "Home Directory: ", $HomeDirectory, "\n"]
WriteString[sysinfostream, "Installation Directory: ", $InstallationDirectory, "\n"]
WriteString[sysinfostream, "Machine Precision: ", $MachinePrecision, "\n"]
WriteString[sysinfostream, "Avail. CPU Cores (No HT):", $ProcessorCount, "\n"]
WriteString[sysinfostream, "Architecture Type: ", $ProcessorType, "\n"]
WriteString[sysinfostream, "DNS Information: ", $MachineDomains, "\n"]
WriteString[sysinfostream, "System IP Address(es): ", $MachineAddresses, "\n"]
WriteString[sysinfostream, "Network License Server: ", $LicenseServer, "\n"]
WriteString[sysinfostream, "License ID: ", $LicenseID, "\n"]
WriteString[sysinfostream, "Activation Key: ", $ActivationKey, "\n"]
WriteString[sysinfostream, "Packages: ", $Packages, "\n"]

(* PROGRAM TIME KEEPING (End)... *)

(* MEMORY MANAGEMENT (End)... *)

(* CLOSE UP PROGRAM & STREAMS = *)
...
Close[sysinfostream];

Code Snippet 3: Wolfram system information reporting code.

Example output from running Code Snippet 3 is shown in Output 1.

Operating System: Linux
Linux Kernel Information: {Linux localhost.localdomain 4.4.9-300.fc23.x86_64 #1 SMP Wed May 4 23:56:27 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux}
Base System Information: Linux x86 (64-bit)
System Shell: /bin/sh
Physical RAM (in bytes): 135077982208
Char Encoding Type: UTF-8
Mathematica Version: 10.4.1 for Linux x86 (64-bit) (April 11, 2016)
Hardware Type: PC
Default Num. Kernels: {<<8 local kernels>>}
Home Directory: /home/username
Installation Directory: /usr/local/Wolfram/Mathematica/10.4
Machine Precision: 15.9546
Avail. CPU Cores (No HT): 16
Architecture Type: x86-64
DNS Information: {}
System IP Address(es): {fd10::bc1:8dce:ae49:3bcf, 192.168.102.11, 192.168.10.10}
Network License Server: localhost.localdomain
License ID: L1122-3344
Activation Key: 3347-0104-LVUL6P
Packages: {Parallel`Protected`, Parallel`Developer`, Parallel`Evaluate`, Parallel`Combine`, Parallel`Queue`Priority`, Parallel`Queue`FIFO`, Parallel`Queue`Interface`, Parallel`Concurrency`, Parallel`VirtualShared`, Parallel`Status`, SubKernels`LocalKernels`, Parallel`Palette`, Parallel`Parallel`, SubKernels`Protected`, SubKernels`, Parallel`Kernels`, Parallel`, Parallel`Debug`Perfmon`, Parallel`Debug`, Parallel`Preferences`, ProcessLink`, QuantityUnits`, TypeSystem`, WolframAlphaClient`, Macros`, GeneralUtilities`, Developer`, JLink`, GetFEKernelInit`, CloudObjectLoader`, StreamingLoader`, IconizeLoader`, ResourceLocator`, PacletManager`, System`, Global`}

Output 1: System information obtained from text file "systeminfo.txt." (Note: HT = Hyper-Threading)

/Concept IV – Importing graphics in Mathematica Before we examine IV and ICI, we need to consider how graphics are brought into Mathematica. Rather than simply opening an image, we must import it. Through importation, the raster data of the image and all its metadata become available to the user. We will briefly look at metadata manipulation later on.

Once imported, a myriad of IP and other computations can be applied to an image. We can import one, two or many images with a single line of code. While many parameters can be specified when importing an image or other file format, in its most general sense, for immediate use, we use Import[]. Its functionality is straightforward; if uninterested in metadata or specialized capabilities, we simply specify Import["filename"]. Putting a semicolon (;) after the statement prevents the image from being displayed; this is sometimes useful, especially when processing multiple images.
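A minimal sketch illustrates these points; the file names below are hypothetical examples, not files from our datasets.

(* Import one image; the trailing semicolon suppresses its display *)
img = Import["IMG_0001.JPG"];
(* Import several images with one line of code *)
imgs = Import /@ {"IMG_0001.JPG", "IMG_0002.JPG"};
(* Evaluating the symbol on its own displays the image *)
img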

In Figure 1, we show what a large image looks like after importation. It measures 7,668 x 2,433 pixels and is about 74 MB in size.

Although versatile and highly capable, Mathematica is not designed to display multiple very large images and allow their real-time manipulation, as we would do in a program like Photoshop. It is important to remember that Mathematica is not Photoshop, GIMP or another IP program. Its power lies in the operations and manipulations the framework lets us perform against one or many images.

Once an image is imported, right-clicking once on the image will display an "image processing" toolbar. On the toolbar, clicking once on the orange i presents the user with information about the image, as shown in Figure 2. The various options and capabilities of the toolbar, which provide various IP shortcuts, can then be explored. Looking at Figure 2, we see just how much information about an image is presented to the user.

Figure 1: Importing a graphics image directly into Mathematica.


Figure 2: Checking out an image file’s information in Mathematica.

/Image visualization – The specifics Mathematica is not an image previewing application; when viewing or scrolling through many hundreds or thousands of images is required, a more suitable program should be used. Visualization in Mathematica should be used on its own to explore distinct datasets, or in combination with more complex programs.

By default, Mathematica automatically manages the sizing or "scaling" of all images it displays. This is often beneficial when working with several images at the same time because it relieves the user of having to juggle the position of the images on the screen and manage their size.

For example, if we just wanted to open one image, as shown in Figure 1, rescaling it in Mathematica is easy. We can adjust the magnification from the image toolbar, as seen in Figure 2, or we can right-click the image to bring up its Magnification feature and select the appropriate scaling, as shown in Figure 3.

Left to its defaults, even very large images are readily displayed in the current NB window. The trouble begins when trying to explore many images, especially when several of them are large; then the lack of interface responsiveness becomes obvious.


Figure 3: Setting an appropriate magnification level.

A simple solution is to use a for loop and resize the images as we cycle through them. We started with this approach but quickly learned that, because I/O is fed in through the loop, we could not adjust an image's scaling or magnification afterwards, losing significant control over visualization. Thus, another method was required, one that takes immediate advantage of Mathematica's default behaviour of placing images next to one another, left to right, line by line.
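For reference, a loop of the kind we abandoned might look like the following sketch (it assumes the list of filenames is stored in list, as in Code Snippet 4; our actual discarded code is not reproduced here). Each image is imported, resized and printed in turn, which illustrates the loss of post-hoc control over scaling and magnification described above.

(* Loop-based visualization: import, resize and print each image one at a time *)
For[i = 1, i <= Length[list], i++,
  Print[ImageResize[Import[list[[i]]], {200}]]
]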

By resizing images to something more reasonable in size, we are able to visualize many more images from Mathematica's interface, which makes working with the interface far more manageable. The visualization code shown in Code Snippet 4 is straightforward to understand, but there are several features from Mathematica's functional programming paradigm requiring explanation.

(* PROGRAM INITIALIZATION & MEMORY MANAGEMENT (Begin) ... *)

(* PROGRAM TIME KEEPING (Begin)... *)

(* PROGRAM BODY = *)
...
inputfile = "dataset1_list.txt";
length = Length@Import[inputfile, "Lines"];
list = Import[inputfile, "Lines"];
StringForm["Number of files to be processed from input file: ``\n", length]

images = Parallelize[Map[Import[#] &, list]];                      (* PART 1 *)
resizedImages = Parallelize[Map[ImageResize[#, {200}] &, images]]; (* PART 2 *)
Parallelize[Image[#, Magnification -> 1] & /@ resizedImages]       (* PART 3 *)

(* WOLFRAM SYSTEM INFORMATION REPORTING (OPTIONAL) ... *)

(* PROGRAM TIME KEEPING (End)... *)

(* MEMORY MANAGEMENT (End)... *)

(* CLOSE UP PROGRAM & STREAMS ... *)

Code Snippet 4: Image visualization, resizing and magnification code, suitable for parallelization (important variables and Parallelize[] have been colour highlighted).


Code Snippet 4 has been colour coded to make it easier to read. Just three lines of code perform the actual data processing; we have done away with looping entirely by loading all images into memory. In our opinion, if control over the on-screen size of images displayed in a NB is important, then this is the best way to do it, though other methods exist.

The program body begins by importing the contents of the text file dataset1_list.txt, which contains the filenames and locations of our primary image dataset. We built this list using the commands shown in Article 1. Upon importation, this list is stored in the variable list.

Examining the code tagged "(* PART 1 *)", we proceed to import the entire set of images whose names are stored in list. The images are imported in a single sweep and their graphics data is stored in the variable images; the more images, the more RAM is required. To do this, Import[] is called by Map[], a function that applies some action or function, f(x), to every element in list. In this case, f(x) = Import[]. The symbols # and & ensure that Import[] is run against every item in list. The output from this line of code is suppressed because of the semicolon. Finally, we wrap the expression in Parallelize[] to increase performance, which yields a significant speedup.
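The same Map and pure-function idiom can be seen on a trivial, non-image example, which may help readers new to WDL follow PARTs 1 to 3:

(* Map applies the pure function #^2 & to every element of the list *)
Map[#^2 &, {1, 2, 3}]    (* returns {1, 4, 9} *)
(* /@ is shorthand notation for Map *)
#^2 & /@ {1, 2, 3}       (* also returns {1, 4, 9} *)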

In "(* PART 2 *)", parallelized image resizing is carried out using ImageResize[]; again the symbols # and & ensure that the function g(x) = ImageResize[] is correctly applied to the imported graphics raster data currently stored in images. The size used for resizing is set to {200} and can be readily modified at any time. This value indicates that resized images are to be made 200 pixels wide; Mathematica then automatically scales the height of each image according to the original aspect ratio. So an image that is 1000 x 2000 rescaled by {200} becomes 200 x 400. All reprocessed images are then stored in the variable resizedImages.

Finally, in "(* PART 3 *)" we replace Mathematica's autoscaling or magnification with a specific value; here, we want a ratio of 1:1. This means that the image presented on screen is the same size as the resized image. Of course, this assumes that the NB's default magnification remains 100%. Using Image[] and its options, we display the graphics data stored in resizedImages at 1:1 using Magnification -> 1. Again, parallelization increases the code's performance. In-depth information concerning the symbols #, & and @ can be found in the documentation.

An example of the output produced from the first dataset, after setting the NB magnification to 50%, is shown in Figure 4. A reduced magnification enables more images to be displayed per line, which can be advantageous. We also visualized our dataset of "persons" without issue, as well as the third dataset, although for the latter various problems were encountered, which we discuss in the next section.

The code presented in Code Snippet 4, though in some ways slower, is more versatile than a loop-based implementation would have been.

/Image visualization – Performance Remember that not all Mathematica functions can be parallelized; we experimented with variations of this code to find those that could be, and confirmed this through testing. The extent to which parallelization increases performance depends on the number of images to be processed, their size, and the number of available CPU cores and Mathematica kernels.

For performance-related results, we ran the program three times per dataset and averaged the results. Individual runs did not vary by more than 15%.

For our primary dataset, the average runtime for the visualization program was 17 seconds with parallelization and 42 seconds without. Even when Parallelize[] was not used, Mathematica automatically parallelized some of the code. The average peak memory used by the program was around 3,004,000,000 bytes, while the average consumed memory was roughly 3,003,000,000 bytes. Saving the NB took only a few seconds.


Figure 4: Partial output for first dataset visualization after resizing, at 50% magnification, resizing={200}.

Our second dataset was visualized in almost no time at all; it took 1 and 2 seconds for parallelized and non-parallelized code, respectively. Memory usage was also negligible, and saving the NB took one to two seconds.

However, various issues were encountered when visualizing the third dataset. At first, the program crashed before any actual visualization was done. It was eventually determined, using the thread-based debugging code discussed in the Bonus Topic section, that three images were causing Mathematica's kernel to crash. The culprits were identified and removed from the input file, leaving us with 8,617 images to process. We then succeeded in resizing and displaying the entire set in the NB. However, for each of the three trials, following the visualization code the system ran out of memory (RAM and swap). At that point one of Mathematica's kernels crashed, ending all subsequent processing, including the time keeping and memory management code. Thus, we were never able to determine exactly how long the program ran or how much memory it consumed. From system monitoring tools left running on the system, we determined that processing took an average of about 2 hrs and that the program consumed all available memory resources. Saving the NB took about 10 min and saving it as PDF took about 25 min. Visualizing the resized dataset via scrolling or PgDn, after closing and restarting Mathematica and reloading the NB, was unreasonably slow. Thus, visualization had to be conducted on the PDF file. The NB and PDF files were approximately 2.3 and 3.6 GB, respectively. PDF visualization proceeded without issue using Acrobat Reader 9.5.5.

Finally, testing and retesting the program on our datasets confirmed that $HistoryLength=0 (zero) clears memory and cache. Full file logging and debugging (see the Bonus Topic section) was used only for the third dataset.


/Image content identification – background Identifying image contents is more commonplace today. Just go to Google Images (https://images.google.com), upload or drag and drop an image into the browser, and voila. Google makes ICI easy. Many other web sites exist too, some of which offer APIs, for a price. There are also various software frameworks and tools, some of which also offer APIs. With many to choose from, each has its benefits and shortcomings. Yet, at this time there does not appear to be a DF framework with this capability.

Some will ask, why Mathematica? Others, why not instead use a dedicated, locally hosted API solution? We prefer Mathematica because it lets us apply a wide array of operations against images in various ways, with capabilities beyond those of advanced IP programs and tools. Other mathematical frameworks provide capabilities similar to Mathematica's; which is better is open to debate.

In the interest of full disclosure, we have observed that when conducting ICI, Mathematica will sometimes communicate with Wolfram's servers. When it does, it is obtaining training data. We verified this with Wolfram; their response was that Mathematica occasionally updates its ICI training data, but that no data concerning actual images, thumbnails or other identifying data is transmitted to Wolfram at any time. In our tests, with a network packet analyser in place, we observed no image data being transmitted from our system to Wolfram. Nevertheless, such a situation is not acceptable for those working in highly secure, classified or air-gapped environments. In such circumstances, it is possible to use Wolfram Cloud Services installed on a local LAN, where training and other data can be securely stored and updates installed from outside media. Integration specifics are carried out in collaboration with a Wolfram account representative.

Mathematica's ICI uses neural networks based on the downloaded training data. In the event an expert witness is required to testify about how it works, Wolfram may be willing to provide details to the expert. The level of detail will depend on many factors outside the scope of this article. Fortunately, much research has already been done on image identification algorithms, leaving the expert with a good starting point. For more information about where to start, please email the article's author directly (email at end of article).

/Image content identification – the specifics Our ICI code is straightforward and based on previously discussed concepts. Before looking at the code, it is important to understand the true purpose of ICI – anyone can identify the contents of a few images. However, identifying contents from hundreds, thousands or more images quickly becomes an overwhelming task. Thus, automation aids us in this undertaking.

With so much information comes the need to store results. We recommend using a flat file consisting of a simple record format. This keeps things simple and permits text-based data processing using various utilities and tools. We propose the simple record format shown in Record Format below.

File Name: /home/username/Desktop/Article 2/DSC_0398.JPG
SHA256 Hash: fd5d4cfc6f63fbf1036aeb5b78cb911eaf1b4825910750ef049859773766d043
File Format: JPEG
Image Size: {3008, 2000}
Disk Size: 1470487
Identification:
======
General: domestic dog
Good Guess: Tibetan mastiff
Specific: Bouvier des Flandres
End of Record

Record Format: Proposed data record format with example data.

When conducting ICI we instruct Mathematica to identify the contents with varying levels of specificity. In our program, we chose to work with three levels of precision ranging from general, to good estimate, to very specific. Regardless of the specificity used, ICI is rarely exact. Many factors can influence results, and no program or API is foolproof. Consider how humans perceptually differentiate between multiple subjects in an image; how can we expect software to do it as reliably? What about backgrounds and objects at a distance? Any service, program or tool claiming a very high level of accuracy is to be treated with suspicion. Every image is different, and humans, subject to bias and personal preferences, are often the ones putting these datasets together.

So why try in the first place? Because doing it manually against tens or hundreds of thousands of images is a daunting task, even for a team. Consider also that some images are just too offensive to risk chronic exposure. We are not using Mathematica to sift out pornographic images, as high-quality software already exists for that, although we could if we trained it. Content classification training can be used to identify images corresponding to a set of criteria; we will examine this in an upcoming article.

Our ICI program, shown in Code Snippet 5, is longer than the code seen thus far, but it is straightforward and based on concepts we have already seen. The most important part of this code to examine in detail is ImageIdentify[], highlighted in red. To use it, we specify the variable containing the image; in our program, images are stored in tmp. We then indicate the level of required specificity using PerformanceGoal and SpecificityGoal. Together, these parameters determine the precision used. The former specifies the speed with which the determination is made; typical values are "Speed" and "Quality". The latter signifies the actual level of precision to be used; "Low" and "High" are common values, but a numerical value between 0 and 1 is also acceptable.
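Before reading the full program, a single-image sketch of these calls may help; the file name here is only an illustration and the commented results echo the example in Record Format above.

(* Identify one image at two levels of specificity *)
tmp = Import["DSC_0398.JPG"];
lowres = ImageIdentify[tmp, PerformanceGoal -> "Speed", SpecificityGoal -> "Low"];
highres = ImageIdentify[tmp, PerformanceGoal -> "Quality", SpecificityGoal -> "High"];
CommonName[lowres]    (* general category, e.g. "domestic dog" *)
CommonName[highres]   (* a more specific guess *)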

(* PROGRAM INITIALIZATION & MEMORY MANAGEMENT (Begin) ... *)

(* PROGRAM TIME KEEPING (Begin)... *)

(* PROGRAM BODY = *)
...
inputfile = "datasetlist.txt";
outputfile = "output.txt";
errorfile = "errors.txt";
sysinfofile = "systeminfo.txt";

length = Length@Import[inputfile, "Lines"];
StringForm["Number of files to be processed from input file: ``\n", length]

inputstream = OpenRead[inputfile];
outputstream = OpenWrite[outputfile];
errorstream = OpenWrite[errorfile];
sysinfostream = OpenWrite[sysinfofile];
$Messages = Append[$Messages, errorstream];

Timing[For[i = 0, i < length, i++;
   filename = Read[inputstream, String];
   tmp = Import[filename];
   WriteString[outputstream, "File Name:\t", filename, "\n"]
   WriteString[outputstream, "SHA256 Hash:\t", IntegerString[FileHash[filename, "SHA256"], 16], "\n",
    "File Format:\t", FileFormat[filename], "\n",
    "Image Size:\t", ImageDimensions[tmp], "\n",
    "Disk Size:\t", FileByteCount[filename], "\n",
    lowres = ImageIdentify[tmp, PerformanceGoal -> "Speed", SpecificityGoal -> "Low"];
    midres = ImageIdentify[tmp, PerformanceGoal -> "Speed", SpecificityGoal -> "High"];
    highres = ImageIdentify[tmp, PerformanceGoal -> "Quality", SpecificityGoal -> "High"];
    "Identification:\n======\n",
    "General:\t", CommonName[lowres], "\n",
    "Good Guess:\t", CommonName[midres], "\n",
    "Specific:\t", CommonName[highres], "\n",
    "End of Record\n\n"]]]

(* WOLFRAM SYSTEM INFORMATION REPORTING (OPTIONAL) ... *)

(* PROGRAM TIME KEEPING (End)... *)


(* MEMORY MANAGEMENT (End)... *)

(* CLOSE UP PROGRAM & STREAMS ... *)

Code Snippet 5: Multi-image ICI program.

Various techniques can be used to optimize Code Snippet 5, including calling ImageIdentify[] fewer times per image. Loop optimization and additional code parallelization may also help, though these are more advanced topics.

/Image identification results We recognize that ICI is very hard, and producing consistently accurate results is not possible given the wide array of images investigators will encounter. After running our ICI program against the first two datasets, we validated the results. Our methodology was simple: if the primary object/subject of an image is correctly identified, we give it a score of 1; if the contents are partially identified, we give it a ½ point; finally, we score 0 (zero) if the contents are altogether incorrectly identified.

Examples of scoring may be helpful. Consider the second resized image in Figure 4; it is an image of the Sun captured using a hydrogen-alpha filter, in black and white. It was identified as Uranus – ImageIdentify[] was sort of right, so it gets a ½ point. Identifying a pigeon as a turtledove also earns a ½ point. Incorrectly recognizing the breed of a dog is of little concern where generalized ICI is concerned; so long as the image is identified as a dog, it earns a full point. Recognizing a tree as a flower scores 0 (zero). Identifying a cityscape as "architecture" is also a ½ point.

Based on this logic, our primary dataset scored 70/121, or 57.9% correct. However, the devil is in the details. Of the 121 images in the set, 50 were correctly identified and 40 were sort of right, with partially identified contents. Only 31, or 25.6%, of the images were incorrectly identified. In our opinion, the results are comparable to other frameworks; we compared them against Google's Image Recognition service, as it is readily available and free to use. Recognition of specific types or groupings of images could have been improved had we trained Mathematica using a classifier. Overall, we were pleased with the results. Some of the images contained many objects and subjects, and identifying such contents is open to interpretation even by humans.

Overall, results from the second dataset were very good. The "persons" were all recognized as persons or as the clothes they wore, with the exception of model #10, who was oddly recognized as a bathtub. Improving the results for all images is possible – this will be examined in a follow-up article.

We did not validate the results for the third dataset, given its size. However, ICI completed without issue.

Average runtime results for the three datasets were 273 sec, 16 sec and 16,840 sec (or 4.68 hrs), respectively. Average peak memory usage was 2,306,530,696 bytes, 606,035,336 bytes and 4,230,674,640 bytes, respectively. These results are in stark contrast to those from image visualization.

Finally, testing and retesting of the programs confirms that setting $HistoryLength=0 (zero) clears memory and cache. Full file logging and debugging (see Bonus Topic section) was used only for the third dataset and added negligible overhead.

/Bonus Topic: Debugging The time will come when program debugging becomes necessary. Basic debugging in Mathematica is similar to other languages: by printing processing status information, we can determine what is working, what is failing and what is taking too long. Such information can be printed to the screen, but if file-based error logging is used then, to maintain error-logging consistency, it should be sent to errorstream.

Debugging information can go anywhere in a program, but it is typically placed where the bulk of processing occurs. For non-parallelized loops, this is straightforward. All we need to do is replace "$Messages=Append[$Messages, errorstream];" with "$Messages=errorstream;" and reset errorstream at the end of the program. Then, in our loop we add a WriteString[] statement. For example, "WriteString[errorstream, "Processing file: "<>filename<>"\n"];" will print the name and location of the file currently being processed to errorstream, maintaining debugging consistency.

Unless error messages are explicitly left on, they will by default stop displaying after a minimum threshold is reached. Switching messages on or off requires specifying the message type or group. The error messages we encountered while running the ICI program (Code Snippet 5) against our third dataset were:

Divide::infy
Clip::nord
Infinity::indet
ImageIdentify::imginv
CommonName::noent
General::stop
ImageDimensions::imginv

Adding the appropriate On[] statements for these messages and rerunning the program provided us with consistent debugging output, and so we were able to readily isolate the problematic data files and remove them from the input file.
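As a sketch, re-enabling a few of the messages listed above so that they continue to be reported might look like this:

(* Keep these messages switched on so every occurrence is logged *)
On[ImageIdentify::imginv];
On[ImageDimensions::imginv];
On[General::stop];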

When working with parallelized, non-looping code (i.e., Code Snippet 4), we insert debugging information using a different approach, because concurrent file-based logging is more difficult to implement. The easy way around this problem is to log output directly to the NB. To keep track of which data file is currently being processed by any given thread, we make use of Range[] and MapThread[], as shown in Code Snippet 6, where the concurrent output code is highlighted in red.

images = Parallelize[Map[Import[#] &, list]];
resizedImages = Parallelize[Map[ImageResize[#, {200}] &, images]];
indicator = Range[Length[resizedImages]]
Parallelize[Image[#, Magnification -> 1] & /@ resizedImages]
Parallelize[MapThread[(Print[#1]; Image[#2, Magnification -> 1]) &, {indicator, resizedImages}]]

Code Snippet 6 boils down to this: create a new variable, indicator, to store the number associated with the image being processed, based on its position in resizedImages. If one of the kernels crashes, the last number printed identifies the culprit. This is the most direct technique for identifying the culprit in parallelized, non-looping code, as it keeps track of I/O concurrency.

We may eventually look at debugging parallelized loop-based code.

/Conclusion In this article, we have examined important issues with respect to computer forensics and computer vision technology. Mathematica has shown itself to be quite capable in this field but, like any software, it is not without faults. Nevertheless, the resources required to conduct in-depth image analysis can be monumental; it is for this reason that we have proposed a software-based aid.

There remains little else for us to explore with respect to image visualization. It is straightforward for small to medium-sized datasets, and up to several thousand images can be visualized in a single pass, given sufficient memory resources.

While it is possible to use open source platforms and math frameworks to provide the functionality we have explored thus far, there is no one-stop shop. Many things need to be strung together to get it done just right, something that will take a great deal of time and patience for the uninitiated. That said, those looking for alternatives to Mathematica can get started with Python and SciPy, which provide similar functionality.

Much remains to be done for image content identification; we have only scratched the surface in this article. In follow-up articles, we'll look at other ways to improve its accuracy and extend its reach to additional forms of digital evidence, including video. While other frameworks and tools exist to perform what we have examined, none provides the same extent of combinatory functionality, which we'll be exploring soon.

/Acknowledgments We sincerely express our gratitude to Wolfram Premier Service technical support, specifically Danny Finn, Kyle Martin, Wang Zhang, Abrita Chakravarty and Lin Guo.

/AUTHOR BIO R. Carbone has been working for Defence R&D Canada since 2001. He has been working as an infosec analyst and researcher for the last seven years and has published numerous articles and case studies. He is also an open source software expert and was the co-author for a Government of Canada study that influenced federal government policy on the adoption of open source. Finally, he is a certified digital forensic investigator, incident handler, and malware reverse engineer. Defence R&D Canada [email protected] Head Shot – See image DRDC-Graphicbar-3inch-300dpi.jpg

/FEATURE IN BRIEF

/Subject Matter Image analysis; symbolic and mathematical programming (procedural and functional)

/Skill Level 4-5

/Requirements
- Very powerful workstation or desktop computer (i7 (6-core / 12 with Hyper-Threading) or Xeon (8-core / 16 with Hyper-Threading) or greater);
- Lots of memory (64 GiB or greater);
- Lots of disk space for storing the very large notebook files that Mathematica can generate;
- Fast disk(s) for reading and writing Mathematica's notebooks;
- Very stable 64-bit computer operating system, preferably one that supports 64+ GiB RAM;
- Hardware RAID storage if there will be lots of data to be stored and processed;
- Familiarity with Mathematica or another symbolic computer-assisted mathematics framework, such as, but not limited to, Maple, Matlab, GNU Octave, Scilab, R, etc.;
- Familiarity with batch processing;
- Knowledge of procedural or functional programming paradigms;
- A basic knowledge of "big data" and how it can be processed and analysed will be helpful.

/Additional Reading Mathematica's documentation is very extensive and up to date. Many books have been written on Mathematica programming and interactive program development. For a full listing of known books relating to Mathematica, covering a variety of levels, see http://www.wolfram.com/books/?source=nav.

As for digital forensics using Mathematica, no publicly available books or treatises are known on the subject.

/BOXOUTS

/Boxout 1 – Parallelizing the For loop and common gotcha Non-parallelized loops can be parallelized using ParallelDo[]. However, this requires modifying the loop structure. It may be of considerable help when replacing a for loop, but may require more thought prior to implementation.
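A minimal sketch of such a restructuring follows; it assumes the filenames are stored in list, as in the earlier snippets, and the resizing body is only an example. Note that variables assigned inside the parallel body (such as tmp here) live on the subkernels, so results must be gathered explicitly (for example with ParallelTable) if they are needed afterwards.

(* Replace a serial loop with ParallelDo; per-image work happens on the subkernels *)
ParallelDo[
  tmp = Import[list[[i]]];
  ImageResize[tmp, {200}],
  {i, Length[list]}
]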

/Boxout 2 – Thumbnail[]


Through experimentation, we have determined that large dataset visualization requires significant amounts of memory, including swap, and may take a lot of time. To save on processing and memory, we can substitute ImageResize[] with Thumbnail[]. This saves us from having to resize images and store the results in memory; instead, we immediately resize and display, saving a step. However, with this function we cannot readily modify the on-screen size.
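A one-line sketch of the substitution, again assuming list holds the filenames as in Code Snippet 4:

(* Display thumbnails directly instead of resizing and storing full-size copies *)
Parallelize[Thumbnail[Import[#]] & /@ list]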

/Boxout 3 – Image preprocessing to improve identification accuracy Image content identification accuracy can be improved by preprocessing images. Their backgrounds can often be removed, if only partially, and large images can be split into smaller chunks. This can be done using RemoveBackground[] and ImagePartition[], respectively. When running these functions with Parallelize[], some speed-up may be experienced.
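A brief sketch of both preprocessing steps applied to one imported image tmp; the 500-pixel tile size is an arbitrary example, not a recommended value.

(* Strip the background, then tile the image and identify each tile *)
foreground = RemoveBackground[tmp];
tiles = Flatten[ImagePartition[tmp, 500]];   (* flat list of 500 x 500-pixel tiles *)
ImageIdentify /@ tiles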

/Boxout 4 – Saving very large notebooks Saving (or opening) very large notebooks in Mathematica can crash it or take what feels like an eternity. Sometimes saving a notebook to PDF can also fail. There appears to be an underlying issue when saving very large notebook-based output. Our recommendation is to consider saving very large notebooks first as PDF and then in the native notebook format.

/SIDEBARS

/Sidebar 1 – Stopping runaway calculations To stop a runaway calculation, because it is taking too long or has caused the system to become unresponsive, press Alt+, (comma) within Mathematica; a popup will be displayed asking whether to continue or abort. This is a form of graceful shutdown that should prevent changes made to the running notebook from being lost. Killing Mathematica from the Task Manager, on the other hand, will lose these changes.

/Sidebar 2 – The front and back ends Mathematica consists of two main components: the front end and the back end (or kernel). The front end performs all I/O while the kernel(s) perform calculations. The front end runs on the user's computer, while the kernel(s) can run on the local system, on a remote system, or on both. This division of labour enables compute nodes to be added, or computations to be run entirely on more powerful remote systems.
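A minimal sketch of adding compute capacity from within a session; the kernel count of 4 is an arbitrary example.

(* Launch extra local kernels and list what is currently running *)
LaunchKernels[4];
Kernels[]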

/Sidebar 3 – Slow, interrupted or frozen scrolling or typing in a notebook Sometimes when typing or scrolling in a notebook, responsiveness may become very slow or appear frozen for many seconds. To fix this, delete all "*.m" files located in $UserBaseDirectory/.Mathematica/SystemFiles/FrontEnd/SystemResources/FunctionalFrequency. It may be necessary to restart the framework for this to take effect.