EDRM Data Set Project


The Electronic Discovery Reference Model Data Set Project
Establishing guidelines. Setting Standards. Delivering Resources.

EDRM Data Set Project
http://edrm.net/activities/projects/data-set

The EDRM Data Set Project’s mission is to provide industry-standard reference data sets of electronically stored information (ESI) and software files that can be used to test various aspects of electronic discovery software and services. EDRM Reference ESI Data Sets enable organizations to more easily publish, replicate, and compare test results across various electronic discovery solutions. EDRM currently offers three Reference ESI Data Sets comprising 40GB of data, over 200 file formats, and 23 different languages.

In addition to the Reference ESI Data Sets, the EDRM Data Set Project is investigating two software hash projects, the EDRM Reference Software Data Set and the EDRM Probabilistic Hash Data Set, which aim to lower review costs by improving the culling of software files from ESI collections.

EDRM Reference ESI Data Sets

This project collects, evaluates, and publishes ESI data sets for use in testing electronic discovery software and services. Three data sets are currently offered, and more are under evaluation:

EDRM Data Set Enron PST files: 40GB of Enron e-mail messages and attachments in PST format, organized in 32 zipped files, each less than 700 MB in size, containing 168 .pst files.

EDRM File Formats Data Set: 381 files covering 200 file formats.

EDRM Internationalization Data Set: A snapshot of selected Ubuntu localization mailing list archives covering 23 languages in 724 MB of email.

Using the Reference ESI Data Sets, organizations can more easily establish the effectiveness of their electronic discovery software, services, and processes.
EDRM Software Reference Data Set

The EDRM Software Reference Data Set project’s mission is to augment the NIST Reference Data Set hashes used in electronic discovery with additional hashes of known software files, so that more software files can be culled before review. The NIST list covers a selection of software applications, and only as the software exists on installation media (e.g. DVDs and CDs). This project will provide hashes for software after it has been extracted from compressed media containers and installed on a system, as well as for software not currently handled by NIST, e.g. software that is downloaded from the Internet rather than received on DVD or CD media. This project will modernize and enhance the list of hashes available for culling software files to reduce electronic discovery costs.

This document is licensed under a Creative Commons Attribution 3.0 United States license. To provide attribution, please cite "EDRM (edrm.net)." If you have questions, contact us at [email protected]. © 2010 Socha Consulting LLC and Gelbmann & Associates.

EDRM Probabilistic Hash Data Set

To further assist in the culling of possibly non-ESI files, the Probabilistic Hash Data Set project seeks to collect as many anonymous hashes as possible of files encountered in real-world electronic discovery. After sufficient sampling, the frequency with which a hash appears can be used to estimate how likely a particular file is to be potentially relevant ESI or non-ESI software that should be culled. This project seeks to significantly improve the performance of automated culling of non-ESI files for electronic discovery, resulting in lower costs.
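The two hash projects described above can be illustrated with a minimal Python sketch. It is not EDRM's or NIST's implementation: the SHA-256 hash, the function names, the toy file contents, and the 0.9 frequency cut-off are all assumptions chosen for illustration. It counts how often each file hash appears across several collections, treats very frequent hashes as likely software, and culls matching files from one collection.

```python
import hashlib
from collections import Counter


def file_hash(data: bytes) -> str:
    """Hex digest of the file content (SHA-256 is an illustrative choice)."""
    return hashlib.sha256(data).hexdigest()


def hash_frequencies(collections: list[dict[str, bytes]]) -> dict[str, float]:
    """Fraction of collections in which each hash appears at least once."""
    counts = Counter()
    for files in collections:
        # Count each hash once per collection, however many copies it holds.
        counts.update({file_hash(data) for data in files.values()})
    return {h: n / len(collections) for h, n in counts.items()}


def cull(files: dict[str, bytes], known: set[str]) -> dict[str, bytes]:
    """Keep only the files whose hash is not in the known-software set."""
    return {name: data for name, data in files.items()
            if file_hash(data) not in known}


# Toy collections: the same (fake) executable content appears in all three,
# once under a different file name, which hashing ignores.
collections = [
    {"app.exe": b"fake-binary", "memo.txt": b"Q3 forecast"},
    {"app.exe": b"fake-binary", "letter.txt": b"Dear Bob"},
    {"tool.exe": b"fake-binary", "notes.txt": b"Call Alice"},
]

THRESHOLD = 0.9  # hypothetical cut-off, not a value published by EDRM
freqs = hash_frequencies(collections)
likely_software = {h for h, f in freqs.items() if f >= THRESHOLD}
remaining = cull(collections[0], likely_software)
print(sorted(remaining))
```

A fixed list of known hashes, such as one derived from reference hash sets, could be passed to `cull` in place of `likely_software`; the frequency step is only needed when the known-software set has to be inferred from sampling.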
To learn more about the EDRM Data Set Project and participate in our activities, please go to http://edrm.net/activities/projects/data-set or contact George Socha ([email protected]) or Tom Gelbmann ([email protected]).

Select File Types

Adobe Photoshop
Ami Draw
Corel Draw
Corel Presentations
dBASE
First Choice DB, SS, WP
Freelance
Harvard Graphics
Gem File
Gem Image
IBM DCA/RFT
IBM DisplayWrite
IBM Graphics Data Format
IBM Picture Interchange
IBM Writing Assistant
IGES Drawing
Kodak Photo CD
Lotus 1-2-3
Lotus Manuscript
Lotus PIC
Lotus Screen Snapshot
Mac PowerPoint
Mac Word
Mac WordPerfect
Mac Works
MacPaint
MacWrite
Micrografx Designer
Microsoft Access
Microsoft Excel
Microsoft PowerPoint
Microsoft Project
Microsoft PST
Microsoft Visio
Microsoft Win Metafile
Microsoft Word
Microsoft Works
MultiMate
Multipage
Multiplan
OfficeWriter
Paintbrush
Paint Shop Pro
Paradox
PDF
PerfectWorks for Windows
PFS: Plan
PostScript
Q&A Database
Q&A Write
Quattro Pro
Reflex
Smart Spreadsheet
SmartWare II
StarOffice Calc
StarOffice Impress
StarOffice Writer
SuperCalc
Symphony
Targa
Total Word
vCard
Volkswriter
VP Planner
Wang IWP
WordPerfect
WordStar
XyWrite

Languages

1. Arabic
2. Catalan
3. Chinese
4. Danish
5. Dutch
6. English
7. Finnish
8. French
9. German
10. Greek
11. Hebrew
12. Hungarian
13. Italian
14. Japanese
15. Korean
16. Norwegian
17. Polish
18. Portuguese
19. Romanian
20. Russian
21. Spanish
22. Swedish
23. Tamil
24. Turkish