View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Calhoun, Institutional Archive of the Naval Postgraduate School

Calhoun: The NPS Institutional Archive

Faculty and Researcher Publications Faculty and Researcher Publications

2013

Garfinkel, Simson L.

American Scientist, Volume 101, September - October 2013 http://hdl.handle.net/10945/46063 A reprint from American Scientist the magazine of Sigma Xi, The Scientific Research Society

This reprint is provided for personal and noncommercial use. For any other use, please send a request to Permissions, American Scientist, P.O. Box 13975, Research Triangle Park, NC, 27709, U.S.A., or by electronic mail to [email protected]. ©Sigma Xi, The Scientific Research Society and other rightsholders

NOTE: The views expressed in this article are those of the author and do not reflect the official policy or position of the Naval Postgraduate School, the Department of the Navy, the Department of Defense or the U.S. Government

Feature Article

Digital Forensics

Modern crime often leaves an electronic trail. Finding and preserving that evidence requires careful methods as well as technical skill.

Simson L. Garfinkel

ince the 1980s, computers have counts by using an IBM AS/400 mini- data that were created weeks, months or had increasing roles in all aspects computer from the 1980s. The age of the even years before. Such contemporane- of human life—including an in- computer helped perpetuate his crime, ous records can reveal an individual’s volvement in criminal acts. This because few people on Wall Street have state of mind or intent at the time the Sdevelopment has led to the rise of digital experience with 25-year-old technology, crime was committed. forensics, the uncovering and examina- and it created an added complication But whereas pre-computer evidence, tion of evidence located on all things after Madoff was arrested, because in- such as handwritten letters and photo- electronic with digital storage, including vestigators had few tools with which to graphs, could be reproduced and given computers, cell phones, and networks. make sense of his data. to attorneys, judges, and juries, comput- Digital forensics researchers and practi- Today personal computers are so ubiq- erized evidence requires special han- tioners stand at the forefront of some of uitous that the collection and use of digi- dling and analysis. Electronic data are the most challenging problems in com- tal evidence has become a common part easily changed, damaged, or erased if puter science, including “big data” anal- ysis, natural language processing, data visualizations, and cybersecurity. Information on a computer system can be Compared with traditional forensic science, digital forensics poses signifi- changed without a trace, the scale of data cant challenges. Information on a com- puter system can be changed without to be analyzed is vast, and the variety of a trace, the scale of data that must be analyzed is vast, and the variety of data data types is enormous. types is enormous. Just as a traditional forensic investigator must be prepared to analyze any kind of smear or frag- of many criminal and civil investigations. handled improperly. Simply turning on ment, no matter the source, a digital Suspects in murder cases routinely have a consumer GPS may cause the device investigator must be able to make sense their laptops and cell phones examined to delete critical evidence. Additionally, of any data that might be found on any for corroborating evidence. Corporate computers frequently harbor hidden ev- device anywhere on the planet—a very litigation is also dominated by electronic idence that may be revealed only when difficult proposition. discovery of incriminating material. specialized tools are used—for example, From its inception, digital forensics The second class of digital forensics a digital camera may appear to have has served two different purposes, each cases are those in which the crime was 30 photos, but expert examination may with its own difficulties. First, in many inherently one involving computer sys- show another 300 deleted photos that cases computers contain evidence of tems, such as hacking. In these instanc- can be recovered. (When a device “eras- a crime that took place in the physical es, investigators are often hampered es” a file, it doesn’t clear the memory world. The computer was all but inci- by the technical sophistication of the space, but notes that the space is avail- dental—except that computerization has systems and the massive amount of evi- able; the file may not be really deleted made the evidence harder for investiga- dence to analyze. until a new one is written over it.) tors to analyze than paper records. For Digital forensics is powerful because Because they can look into the past example, financial scam artist Bernard computer systems are windows into and uncover hidden data, digital fo- Madoff kept track of his victims’ ac- the past. Many retain vast quantities rensics tools are increasingly employed of information—either intentionally, in beyond the courtroom. Security pro- the form of log files and archives, or in- fessionals routinely use such tools to Simson L. Garfinkel is an associate professor at the Naval Postgraduate School. He holds six advertently, as a result of software that analyze network intrusions—not to U.S. patents for computer-related research and does not cleanly erase memory and files. convict the attacker but to understand has written 14 books, including Database Na- As a result, investigators can frequently how the perpetrator gained access and tion: The Death of Privacy in the 21st Century recover old email messages, chat logs, to plug the hole. Data recovery firms (O’Reilley, 2000). Internet: http://simson.net Google search terms, and other kinds of rely on similar tools to resurrect files

370 American Scientist, Volume 101 © 2013 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected]. Craig Cunningham

High-tech crime fighting is now needed everywhere, as shown in this lab belonging to the tried to create a consistent but flexible West Virginia State Police Digital Forensics Unit. Police these days typically use powerful approach for performing investigations, computers and software to copy, analyze, and decrypt data from a suspect’s electronic devices. despite policy variations. Several such digital forensic models have been pro- from drives that have been inadver- could have viewed the laptop files with- posed, but most have common elements. tently reformatted or damaged. Forensic out modifying those timestamps, so the Before data can be analyzed, they tools can also detect the unintentional investigators really determined only that are collected from the field (the “scene disclosures of personal information. In the files had not been opened by con- of the crime”), stabilized, and pre- 2009 the Inspector General of the U.S. ventional means. served to create a lasting record. Un- Department of Defense issued a report These examples emphasize that derstanding the inner workings of stating that many hard drives were not the possibilites of digital forensics are how computers store data is key to properly wiped of data before leaving bounded not by technology but by what accurate extraction and retention. Al- government service. is cost-effective for a particular case. though computers are based entirely Digital evidence can even be exam- Convictions are frequently the measure on computations involving the bi- ined to show that something did not of success. In practice there is a consider- nary digits 0 and 1, more commonly happen. Here they are less powerful, able gap between what is theoretically known as bits, modern computers do for the well-known reason that the ab- possible and what is necessary; even most of their work on groups of eight sence of evidence is not the evidence though there may be an intellectual de- bits called bytes. A byte can represent of absence. In May 2006 a laptop and sire to analyze every last byte, there is the sequences 00000000, 00000001, external hard drive containing sensitive rarely a reason to do so. 00000010, through 11111111, which personal information of 26.5 million corresponds to the decimal numbers veterans and military personnel was Following Procedures 0 through 255 (there are two options, 0 stolen from an employee at the U.S. De- Digital forensics relies on a kit of tools and 1, with eight combinations, so 28 = partment of Veterans Affairs. After the and techniques that can be applied 256). One common use for bytes inside laptop was recovered in June 2006, fo- equally to suspects, victims, and by- the computer is to store written text, rensic investigators analyzed the media standers. A cell phone found on a dead where each letter is represented by a and determined that the sensitive files body without identification would al- specific binary code. UTF-8, a com- probably had not been viewed. most certainly be subjected to analysis, mon representation, uses the binary One way to make such a judgment is but so would a phone dropped during sequence 00100001 to represent the let- by examining the access and modifica- a house burglary. How the analysis is ter A, 00100010 for the letter B, and so tion times associated with each file on performed is therefore more a matter of on. (Computers often use hexadecimal the hard drive. But someone taking ad- legal issues than technological ones. As codes in memory as well; see figure on vantage of the same forensic techniques the field has grown, practitioners have page 373.) www.americanscientist.org © 2013 Sigma Xi, The Scientific Research Society. Reproduction 2013 September–October 371 with permission only. Contact [email protected]. When recorded on a hard drive or Major Convictions, and a Few memory card, these bytes are grouped in blocks called sectors that are typically Gaffes, with Digital Data 512 or 4,096 bytes in length. A sector is the smallest block of data that a drive can read or write. Each sector on the disk amous criminal cas- instant message of a has a unique identifying number, called es show the power photograph showing the sector’s logical block address. An email Fof digital forensics, Kozakiewicz to another message might require 10 or 20 sectors but a few also highlight man, who contacted the to store; a movie might require hundreds the need for careful han- FBI and provided the of thousands. A cell phone advertised as dling of data and devices. Yahoo! screen name of having “8 GB” of storage has 8 billion r0O %FDFNCFS   the person who had sent bytes or roughly 15 million sectors. 2000, John Diamond shot the message: “master- Depending on the arrangement of and killed Air Force Cap- forteenslavegirls.” FBI other files on the device, the sectors can tain Marty Theer. The vic- investigators contacted be stored as a single sequential stream tim’s wife, Michelle Theer Yahoo! to obtain the IP or fragmented into many different loca- (right), was implicated in address for the person tions. Other sectors contain informa- the crime, but there was who had used the screen tion that the computer uses to find the no eyewitness evidence. AP Images name, then contacted Ve- stored data; such bookkeeping mate- What prosecutors did have was 88,000 rizon to learn the name and physical rial is called file system metadata (literally emails and instant messages on her address of the subscriber to whom “data about data”). computer, including clear evidence of that IP address had been assigned, To preserve the data on a computer or a sexual relationship between Theer leading them to Tyree. phone, each of these sectors must be in- and Diamond, and messages docu- r0O +VMZ     UIF TUSBOHMFE dividually copied and stored on another menting the conspiracy to murder her body of Nancy Cooper was found. Her computer in a single file called a disk husband. Theer was found guilty on husband, Brad Cooper (below), was image or physical image. This file, which December 3, 2004, of murder and con- charged with the crime. This time the contains every byte from the target de- spiracy and sentenced to life in prison. role of digital data was more compli- vice, naturally includes every visible cated. Local police, who did not seek file. But the physical image also records expert assistance, accidentally erased invisible files, as well as portions of files the victim’s phone while attempting that have been deleted but not yet over- to access it. Brad Cooper maintains he written by the operating system. went to the grocery store on the morn- In cases involving networks instead ing of his wife’s death, and that she of individual machines, the actual data called him during that time, but pros- sent over the network connection are ecutors charged that he had the tech- preserved. Thus is nical expertise and access to the nec- equivalent to a wiretap—and, indeed,

Getty Images News essary equipment to fake such a call. law enforcement is increasingly putting Investigators searching Brad Cooper’s network forensics equipment to this use. computer also found zoomed-in satel- The random access memory (RAM) lite images of the area where his wife’s associated with computer systems is body was discovered, downloaded also subject to forensic investigation. one day before she was reported miss- RAM gets its name because the data r*O BGUFSFMVEJOHQPMJDFGPS ing. Defense attorneys countered that it stores can be accessed in any order. more than 30 years, Dennis Rader searches done on that and surround- This fast access makes RAM particu- (above), a serial killer in Kansas, re- ing days contained inaccurate time- larly useful as temporary storage and emerged, took another victim, and stamps. Brad Cooper was convicted of working space for a computer’s operat- then sent police a floppy disk with a murder, although appeals are ongoing. ing systems and programs. But RAM letter on it. On the disk forensic in- is difficult to work with, because its vestigators found a deleted Micro- contents change very quickly and are soft Word file. That file’s metadata lost when a computer is turned off. contained the name “Dennis” as the RAM must be captured with a dedi- last person to modify the deleted file cated program (a memory imager) and and a link to a Lutheran church where is stored in its own special kind of file, Rader was a deacon. (Ironically, Rader called a memory dump. Although data had sent a floppy disk because he had in RAM can be extracted from all kinds been previously told, by the police MCT via Getty Images of electronic systems—not just desk- themselves, that letters on floppy tops, laptops, and cell phones but also disks could not be traced.) network communications equipment r0O+BOVBSZ  4DPUU5ZSFF such as wireless routers—each of these kidnapped and imprisoned 13-year- systems uses different kinds of internal old Alicia Kozakiewicz. He sent an software structures, so programs de-

372 American Scientist, Volume 101 © 2013 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected]. signed to analyze one may not work on another. And even though forensics researchers have developed approaches for ensuring the forensic integrity of drive copies, currently there is no wide- ly accepted approach for mathemati- cally ensuring a RAM dump. Preserving the data is only the first step in the process. Next, an examin- er has to explore for information that might be relevant to the investigation. Most examinations are performed with tools that can extract user files from the disk image, search for files that contain a specific word or phrase in a variety of human languages, and even detect the presence of encrypted data. Relevant data are then extracted from the preserved system so they are easier to analyze.

Testable Results Just two decades ago, there were no digital forensics tools as we know them today. Instead, practitioners had to repurpose tools that had been de- veloped elsewhere. For example, disk backup software was used for collec- tion and preservation, and data recov- ery tools were used for media analysis. Although these approaches worked, they lacked control, repeatability, and known error rates. The situation began to change in 1993. Colors on a web page are indicated using hexadecimal representation, a base 16 system in That year, the U.S. Supreme Court held which the numerals for 10 to 15 represented by A to F so they are not confused with single- in the case of Daubert v. Merrell Phar- digit numbers). Such computer notation is also commonly of use in digital forensics. The maceutical that any scientific testimony number of digits in a hexadecimal code indicates the power to which the base 16 is raised, presented in court must be based on so for example, 3BF9 means (3 x 163) + (11 x 162) + (15 x 161) + (9 x 160) or 15,353. Hexadecimal a theory that is testable, that has been represents the numbers 0 to 255 as the codes 00 through FF (15 × 16 + 15 = 255). JPEG files gen- scrutinized and found favorable by the erated by digital cameras typically begin with the sequence FF D8 FF E0 and end with FF D9. scientific community, that has a known (Image courtesy of VisiBone; colors in this reproduction are approximated.) or potential error rate, and that is gener- ally accepted. Although the case didn’t a 32-bit hash function can produce comes from the way hash functions are directly kick off the demand for digi- 232 = 4,294,967,296 possible values. typically implemented as a two-step tal forensics tools, it gave practitioners Hash functions are designed so that process that first chops and then mixes grounds for arguing that validated tools changing a single character in the input the data, much the same way one might were needed not just for good science results in a completely different out- make hash in the kitchen.) and procedure but as a matter of law. Since then there has been a steady devel- opment of techniques for what has come Each system uses a different kind of to be called technical exploitation. Probably the single most transfor- internal memory structure, so programs mative technical innovation in the field has been the introduction of hash designed to analyze one may not functions, first as a means for ensuring the integrity of forensic data, and later work on another. as a way to recognize specific files. In computer science a hash function maps a sequence of characters (called a put. Although many different strings Hashing was invented by Hans Pe- string) to a binary number of a specific will have the same hash value—some- ter Luhn and first described in a 1953 size—that is, a fixed number of bits. thing called a hash collision—the more IBM technical memo; it’s been widely A 16-bit hash function can produce bits in the hash, the smaller the chance used for computerized text processing 216 = 65,536 different values, whereas of such an outcome. (The name hash since the 1960s. For example, because www.americanscientist.org © 2013 Sigma Xi, The Scientific Research Society. Reproduction 2013 September–October 373 with permission only. Contact [email protected]. puter security. Merkle’s idea was to use a hash function that produced A more than 100 bits of output and ad- recon- ditionally had the property of being structed text, one-way. That is, it was relatively easy without to compute the hash of a string, but it language was nearly impossible, given a hash, model to find a corresponding string. The es- sence of Merkle’s idea was to use a document’s 100-bit one-way hash as a stand-in for the document itself. In- stead of digitally certifying a 50-page B document, for example, the document best possible could be reduced to a 100-bit hash, reconstructed which could then be certified. Because text, with there are so many different possible language hash values (2100 is about 1030 combi- model nations), Merkle reasoned that an at- tacker could not take the digital signa- ture from one document and use it to certify a second document—because to do so would require that both docu- ments had the same hash value. C Merkle got his degree, and today best possible digital signatures applied to hashes are reconstructed the basis of many cybersecurity sys- text, with tems. They protect credit card numbers language model. sent over the Internet, certify the au- green letters thenticity and integrity of code run on indicate high- probability matches; iPhones, and validate keys used play red letters indicate low- digital music. probability matches. The idea of hashing has been applied to other areas as well—in particular, D forensics. One of the field’s first and continuing uses of hashing was to es- original text, from tablish chain of custody for forensic data. HTML version of Cory Doctorow novel Instead of hashing a document or a file, Little Brother, the hash function is applied to the en- compressed tire disk image. Many law enforcement using Info-Zip organizations will create two disk im- version 3.0 ages of a drive and then compute the hash of each image. If the values match, then the copies are assumed to each be a true copy of the data that were on the drive. Any investigator with a later copy of the data can calculate the hash Fragments of files that have been compressed can still be recovered, even when significant reas- and see if it matches the original report- sembly data are missing. One approach creates a model of the ways that a document might be ed value. Hashing is so important that decompressed based on the underlying mathematics, then chooses between the options, using a many digital forensics tools automati- human language model. (Text courtesy of Ralf Brown, Carnegie Mellon University.) cally perform this comparison. A second use for hashing is to identify every sentence in a document can be termine whether the text really does specific files. This approach takes advan- treated as a string, hashing makes appear twice, or the duplicate is a the tage of the property that it is extraordi- it possible to rapidly see if the same result of a hash collision. Using hashes narily unlikely for two files to have the paragraph ever repeats in a long docu- in this manner is quicker than work- same hash value, so they can label files ment: Just compute the hash value for ing directly with the paragraphs be- in much the same way a person can be each paragraph, put all of the hashes cause it is much faster for computers recognized by their fingerprints. into a list, sort the list, and see if any to compare numbers than sequences of Today forensic practitioners distribute number occurs two or more times. words—even when you account for the databases containing file hashes. These If there is no repeat, then no para- time to perform the hashing. data sets can be used to identify known graph is duplicated. If a number does In 1979, Ralph Merkle, then a Stan- goods, such as programs distributed as repeat, then it’s necessary to look at ford University doctoral student, in- part of operating systems, or known bads, the corresponding paragraphs to de- vented a way to use hashing for com- such as computer viruses, stolen docu-

374 American Scientist, Volume 101 © 2013 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected]. ments, or child pornography. Recently, characteristic sequences of bytes at the with lossy systems that exploit deficien- several groups, including my team at beginning and end of each file. Such se- cies in the human perceptual system. For the Naval Postgraduate School, have ap- quences are called file headers and footers. example, a few dozen pixels of slightly plied cryptographic hashing to blocks of The file carver scans the disk image for different colors might be replaced by a data smaller than files, taking advantage these headers and footers. When ones single rectangle of uniform hue. The re- of the fact that even relatively short 512- are found, the two sequences of bytes sulting savings can be immense. With- byte and 4,096-byte segments of files can and all of the data between them are out compression an hour of full-screen be highly identifying. saved in a new file. video might require 99 gigabytes but File and sector identification with Modern carvers can validate the data with compression the same video might hashing means that a hard drive contain- that they are carving (for example, to take up only 500 megabytes—roughly ing millions of files can be automatically make sure that the bytes between the 1/200th the original size. searched against a database contain- JPEG header and footer can be actually The primary challenge posed by com- ing the hashes of hundreds of millions displayed as a digital photograph) and pression is recovering data when the of files in a relatively short amount of can even reassemble files that are broken compressed file is corrupted or partially time, perhaps just a few missing. Just five years hours. The search can be ago such corruption fre- done without any human quently made it impossi- intervention. ble to recover anything of use, but lately there have Finding Lost Files been dramatic advances Many forensic investiga- in this area. tions start with the exam- In 2009 Husrev Sencar iner looking for files be- of TOBB University of longing to the computer’s Economics and Technol- previous user. ogy in Turkey and Nasir Allocated files are ones Memon of the Polytech- that can be viewed nic Institute of New York through the file system University developed an and whose contents under approach that can show a normal circumstances will fragment of a JPEG digi- not be inadvertently over- tal photograph even if the written by the operating beginning and end of the allocated system. The word Forensics tools allow investigators to directly access memory chips removed file are missing. In 2011 refers to the disk sectors from devices such as mobile phones, satellite navigation devices, car electron- Ralf Brown of Carnegie in which the file’s content ics, and USB flash drives. This technique can be used to recover data from Mellon University devel- is stored, which are dedi- devices that have been physically damaged or are password protected. (Image oped an approach for re- cated to this particular file courtesy of the Netherlands Forensic Institute.) covering data from frag- and cannot be assigned to ments of files compressed others. Many digital forensics tools al- into multiple pieces. Such fragment recov- with the common ZIP or DEFLATE low the examiner to see allocated files ery carving is computationally challeng- algorithms, even when critical informa- present in a disk image without having ing because the number of ways that tion needed for reassembly is missing. to use the computer’s native operating fragments can be realigned; the result is Brown’s approach creates a model of system, which maintains forensic integ- a combinatorial explosion as the size of the many different ways that a docu- rity of the evidence. the media increases. Missing fragments ment might be decompressed based on One of the major technical digital fo- further complicate the problem. the underlying mathematics of com- rensics innovations of the past 15 years Closely related to file carving is the pression, and then chooses between the has been approaches for recovering a problem of reconstructing compressed different possible documents based on file after it is deleted. These files are not data. Compression is a technique that is a second model of the human language simply in a computer’s “trash can” or widely used on computer systems to in which the document is written (see “recycle bin,” but have been removed squeeze data so that it takes up less figure on page 374). by emptying the trash. File names can be space. The technique exploits redun- Recovering files in temporary com- hidden, and the storage associated with dancy; for example, if asked to com- puter memory can also be illuminat- the files is deallocated. But a file’s con- press the character sequence “humble ing for digital evidence. The RAM of tents sometimes can remain on the hard humbleness,” a computer might replace a desktop, laptop, or cell phone is a drive, in memory, or on external media, the six characters of the second instance mosaic of 4,096-byte blocks that vari- even though the metadata that could of “humble” with a pointer to the first ously contain running program code, be used to locate it are lost. Recovering occurrence. English text typically com- remnants of programs that recently ran these kinds of data requires a technique presses to one-sixth its original size. and have closed, portions of the operat- called file carving, invented around 1999 Text must be compressed with lossless ing system, fragments of what was sent by independent security researcher Dan algorithms, programs that faithfully re- and received over the network, pieces Farmer, and now widely used. store the original text when the data are of windows displayed on the screen, The first file carvers took advantage decompressed. However, photographs the copy-and-paste buffer, and other of the fact that many file types contain and videos are typically compressed kinds of information. Memory changes www.americanscientist.org © 2013 Sigma Xi, The Scientific Research Society. Reproduction 2013 September–October 375 with permission only. Contact [email protected]. rapidly—typical memory is whether the imagery systems support several is real. Photographs billion changes per sec- were doctored long be- ond—so it is nearly im- fore the advent of Pho- possible to make a copy toshop. For example, in that is internally consis- the era of Soviet Russia, tent without halting the after he was purged by machine. An added com- Stalin, Bolshevik offi- plication is that the very cial Avel Enukidze was specific manner by which carefully removed from programs store informa- official photographs tion in memory is rarely through a series of skill- documented and changes ful manipulations of between one version of light and negatives in a program and another. a Kremlin darkroom. As a result, each version Computer animation may need to be painstak- now takes such manip- A Faraday cage shields a cell phone from any outside radio signals, minimizing ingly reverse-engineered ulation to a completely possible changes to the phone’s memory that might come from its connecting to by re- the cell phone network; synchronizing data with the Internet; or receiving SMS new level, with synthe- searchers. Thus, memory messages, phone calls, or even a remote erase command. The analyst would sized scenes that are vi- analysis is time consum- normally keep the box closed for protection and work on the phone by using the sually indistinguishable ing, very difficult, and internal camera. (Image courtesy of the Netherlands Forensic Institute.) from recorded reality. necessarily incomplete. Image processing ad- Despite these challenges, recent years Reverse engineering is another impor- vances have made it possible to find have seen the development of tech- tant part of digital forensics because some kinds of artifacts that indicate niques for acquiring and analyzing the software and hardware developers gen- tampering or wholesale synthesis. contents of a running computer system, erally do not provide the public with Light reflections, highlights, and shad- a process called memory parsing. Today details of how their systems work. As ows also can be closely examined to there are open-source and proprietary a result, considerable effort is needed reveal that different objects in a single tools that can report the system time to backtrack through systems code and “photograph” actually were assem- when a memory dump was captured, understand how data are stored. To- bled from images that were in slightly display a list of running processes, and day’s techniques to extract allocated different physical environments. even show the contents of the comput- files from disk images were largely de- In one dramatic demonstration in er’s clipboard and screen. Such tools veloped through this method. 2009, computer scientist Hany Farid are widely used for reverse-engineer- System analysis is the second leg of of showed that a ing malware, such as computer viruses forensic research. It’s similar to reverse single “group picture” had been cre- and worms, as well as understanding engineering, but the fundamental differ- ated by pasting in people from differ- an attacker’s actions in computer in- ence is that the information the analyst ent photos because the reflection of the trusion cases. Memory parsing can be seeks may be unknown to the develop- room lights on each person’s eyes were combined with file carving to recover ers themselves. Although this idea may inconsistent with were they were stand- digital photographs and video. seem strange, remember that comput- ing in the frame. ers are complicated systems: Just as pro- At the leading edge of digital foren- grammers frequently put bugs in their sics research are systems that attempt code without real- to assist an analyst’s reasoning—to find izing it, programs evidence automatically that is out of invariably have the ordinary, strange, or inconsistent. other behaviors Such details can indicate that there is that aren’t bugs a deeper, hidden story. Inconsisten- but are equally cies can also indicate that evidence unforeseen by the has been tampered with or entirely original creators. falsified. Ultimately, such automated Many system us- reasoning systems are likely the only ers and quite a way that today’s analysts will be able few developers as- to keep up with the vast quantities and sumed that it was increasing diversity of data in the com- not possible to restore deleted files until ing years. Progress in this area remains data recovery experts developed tools tricky, however. Some developments that proved otherwise. have been made on systems that can Digital artwork can be sufficiently lifelike that find timestamp inconsistencies (a file analytical techniques are needed to distinguish Image Integrity can’t be deleted before it is created), but real photographs from fake ones. The image Even when photos and video can be such rules are invariably complicated above was synthesized using the model shown recovered from a subject’s computer or by the messiness of the real world (for at right. (Image courtesy of Dan Roarty.) cell phone, another question to consider example, daylight savings time).

376 American Scientist, Volume 101 © 2013 Sigma Xi, The Scientific Research Society. Reproduction with permission only. Contact [email protected]. Forging Ahead Despite its technical sophistication Without developing fundamentally For all its power, digital forensics faces and reliance on the minutiae of digital new tools and capabilities, forensics ex- stark challenges that are likely to grow systems, the single biggest challenge perts will face increasing difficulty and in the coming years. Today’s comput- facing digital forensics practice today cost along with ever-expanding data ers have on average 1,000 times more has a decidedly human dimension: the size and system complexity. Thus to- storage but are only 100 times faster lack of qualified people to serve as re- day’s digital detectives are in an arms than the high-end workstations of the searchers and practitioners. Not mere- race not just with criminals, but also early 1990s, so there is less computing ly the result of the general tech short- with the developers of tomorrow’s power available to process each byte age, the very nature of digital forensics computer systems. of memory. makes staffing significantly harder The number of cases in which digital than in other disciplines. Because the Bibliography evidence is collected is rising far faster field’s mission is to understand any Brown, R. 2011. Reconstructing corrupt DE- than the number of forensic investiga- data that might be stored, we need FLATEd files. Digital Investigation 8:S125–S131. tors available to do the examinations. individuals who have knowledge of Farid, H. 2009. Digital doctoring: Can we trust And police now realize that digital evi- both current and past computer sys- photographs? In Deception: From Ancient Empires to Internet Dating, ed. B. Har- dence can be used to solve crimes—that tems, applications, and data formats. rington. Stanford, CA: is, as part of the investigation process— We need generalists in a technological Press, 95–107. whereas in the past it was mainly a tool society that increasingly rewards ex- Garfinkel, S. 2011. Every last byte. Journal of for assisting in convictions. perts and specialization. Digital Forensics, Security and Law 6(2):7–8. Cell phones may be equipped with One way to address this training Garfinkel, S., A. Nelson, D. White, and V. Rous- “self-destruct” applications that wipe problem is to look for opportunities sev. 2010. Using purpose-built functions and block hashes to enable small block and sub- their data if they receive a particular to break down forensic problems into file forensics. Digital Investigation 7:S13–S23. text, so it is now standard practice to modular pieces so that experts in Granetto, P. J. 2009. Sanitization and Disposal store phones in a shielded metal box, related fields can make meaningful of Excess Information Technology Equipment. called a Faraday cage, which blocks ra- contributions. I believe that another Technical Report D-2009-104 of the Inspector General of the United States Department of Defense. http://www.dodig.mil/Audit/ reports/fy09/09-104.pdf Many system users and developers King, D. 1997. The Commissar Vanishes: The Falsi- fication of Photographs and Art in Stalin’s Rus- assumed that it was not possible to restore sia. New York: Metropolitan Books. Pal, A., H. T. Sencar, and N. Memon. 2008. deleted files until data recovery experts Detecting file fragmentation point using sequential hypothesis testing. Digital Investi- gation 5: S2–S13. developed tools that proved otherwise. Pollitt, M. M. 2007. An ad hoc review of digi- tal forensic models. In Second International Workshop on Systematic Approaches to Digital dio waves. But many cell phones will approach is to show how the under- Forensic Engineering, 43–54. Los Alamitos, “forget” their stored memory if left off lying principles and current tools of CA: IEEE Computer Society. for too long, so the Faraday cages must digital forensics can be widely applied Reith, M., C. Carr, and G. Gunsch. 2002. An ex- be equipped with power strips and throughout our society. This relevancy amination of digital forensic models. Interna- tional Journal of Digital Evidence 1(3):10–22. cell phone chargers. Because many should increase research into tools Sencar, H., and N. Memon. 2009. Identification low-end cell phones have proprietary and hopefully expand the user base of and recovery of JPEG files with missing frag- plugs, police must seize chargers as the software. ments. Digital Investigation 6:S88–S98. well. However, some phones will Many of the tools of digital foren- Walls, R. J., B. N. Levine, M. Liberatore, and wipe their data if they can’t call home, sics can be used for privacy auditing. C. Shields. 2011. Effective digital forensics whereas others will encrypt their data Instead of finding personal informa- research is investigator-centric. Proceedings of the Sixth USENIX Workshop on Hot Topics in with algorithms too powerful for law tion that might be relevant to a case, Security, 1–7. https://www.usenix.org/con- enforcement to decipher. businesses and individuals can use ference/hotsec11/effective-digital-forensics- Further complicating the investi- the tools to look for the inappropriate research-investigator-centric gator’s job is the emergence of cloud presence of personal information left Walters, A., and N. Petroni. Feb. 2007. Volatools: computing and other technologies for behind because of bugs or oversight. Integrating volatile memory forensics into storing data on the Internet. As a result Likewise, individuals can use pro- the digital investigation process. In Black Hat DC 2007 Proceedings, 1–18. http://www. of the cloud, there is no way to ensure grams such as file carvers to recover blackhat.com/presentations/bh-dc-07/Wal- that a seized cell phone actually holds photographs that have been acciden- ters/Paper/bh-dc-07-Walters-WP.pdf the suspect’s data—the phone might tally deleted from digital cameras. simply be a tool for accessing a remote More generally, as our cars, street server. A law enforcement professional signs, communication systems, electri- who is authorized to search a device cal networks, buildings, and even so- For relevant Web links, consult this may not have legal authority to use in- cial interactions are increasingly com- issue of American Scientist Online: formation on that device to access re- puterized, digital forensics is likely to http://www.americanscientist.org/ motely stored data. Worse still, the data be one of the only ways of understand- issues/id.104/past.aspx might be deleted in the meantime by ing these systems when they misbe- one of the suspect’s collaborators. have—or when they are subverted. www.americanscientist.org © 2013 Sigma Xi, The Scientific Research Society. Reproduction 2013 September–October 377 with permission only. Contact [email protected].