CORE 5.21 Supported Data Formats Rev.: 2020-Feb-04
Contents

1 Supported Data Formats
1.1 Different Supported Formats in Updated Projects
1.2 Data Display
1.3 Archive Formats
1.4 Bloomberg Formats
1.5 Database Formats
1.6 Email Formats
1.7 Multimedia Formats
1.8 Presentation Formats
1.9 Raster Image Formats
1.10 Spreadsheet Formats
1.11 Text And Markup Formats
1.12 Vector Image Formats
1.13 Word Processing Formats
1.14 Other Formats
2 Terms of Use

1 Supported Data Formats

The CORE system supports indexing and retrieval, including conceptual search, for all data formats listed in this section.

Note: Support of certain formats depends on the use case and must be assessed and set up by Customer Support. Formats other than the ones listed here might be supported, but need testing for the specific use case and additional configuration.

Note: The MIME types are assigned for mapping purposes within CORE only. They are usually, but not necessarily, compatible with the official registry of media types maintained by IANA.

1.1 Different Supported Formats in Updated Projects

Projects created with versions prior to CORE 5.16/Axcelerate 5.10/Decisiv 8.0 use Oracle Outside In 8.5.1, which does not cover some recent data formats. To keep hash value computation consistent (required, for example, for duplicate detection), this Oracle Outside In version is preserved for both existing and new data sources in these projects.

Only data sources of projects created with CORE 5.16 and up use Oracle Outside In 8.5.3 and support newer formats, such as Microsoft Word, Excel, PowerPoint 2016, Microsoft Outlook 2011 for Mac (OLM and EML), Corel WordPerfect, Corel Quattro Pro, Corel Presentations, Corel Draw X7, AutoCAD 2015.

Decisiv only: By default, new and updated projects use the latest Oracle Outside In version that supports the new formats.
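The hash-consistency concern described above can be illustrated with a minimal sketch. It assumes that a duplicate-detection hash is computed over the text a parser extracts; the actual hash algorithm and inputs used by CORE are not described in this document, so SHA-256, the function name `content_hash`, and the sample texts are purely hypothetical.

```python
import hashlib


def content_hash(extracted_text: str) -> str:
    """Hash extracted document text (SHA-256 is an assumption, not CORE's documented method)."""
    return hashlib.sha256(extracted_text.encode("utf-8")).hexdigest()


# Hypothetical outputs of two parser versions for the same source file:
# the newer version is assumed to extract an extra footnote.
text_old_parser = "Quarterly report\nRevenue rose 4%."
text_new_parser = "Quarterly report\nRevenue rose 4%.\n[1] Unaudited figures."

# Same file, same parser version: the hashes match, so the copy is flagged as a duplicate.
same_version_match = content_hash(text_old_parser) == content_hash(text_old_parser)

# Same file, mixed parser versions: the hashes differ, so the duplicate is missed.
mixed_version_match = content_hash(text_old_parser) == content_hash(text_new_parser)

print(same_version_match)
print(mixed_version_match)
```

Under these assumptions, mixing parser versions within one project would make identical files look different, which is why the older version is kept for projects that started with it.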
Axcelerate 5 only: If you want to use Oracle Outside In 8.5.3 for projects that use 8.5.1 by default, even if this may result in inconsistent ingestion results, you can change the Oracle Outside In version for data sources.

How to change the Oracle Outside In version used

1. In CORE Administration, open the data source configuration.
2. Go to Parsers > Stellent parser > General settings.
3. Set OutsideIn version to 8.5.3.

The new version is used at data source start.

1.2 Data Display

All formats that can be processed can also be displayed in document views of the different user interfaces. Look at the tables preceded by this title: Data is processed and can be displayed.

1.3 Archive Formats

Data is processed and can be displayed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| 7-zip Compressed File | 7z | application/x-7z-compressed | |
| ARC | arc | application/arc | |
| ARJ Compressed Archive File | arj | application/x-arj, application/arj | |
| BZIP2 Compressed Archive Format | bzip2, bz2 | application/x-bzip2 | Note: Only the first part of a split archive is loaded. Additional parts generate errors. |
| Debian Software Package | deb | application/x-debian-binary | |
| DMG Apple Disk Copy Disk Image File | | multipart/dmg | Not available by default. Needs additional configuration. |
| GNU tar Compressed File Archive | tar | application/x-tar | |
| Gzip Compressed Archive | gzip | application/x-gzip | |
| ISO-9660 CD Disc Image | iso | multipart/iso | |
| LZH Compressed Archive File | z, lzh | application/x-compress | |
| Microsoft Cabinet File | cab | application/vnd.ms-cab-compressed | |
| Microsoft Compiled Help File | chm | application/vnd.ms-chm-file | Not available by default. Needs additional configuration. |
| Microsoft Windows Imaging File Format | | application/vnd.ms-imaging | |
| RAR Compressed Archive File | rar | application/rar | |
| RPM Package Manager | rpm | audio/x-pn-realaudio-plugin (sic!) | Not available by default. Needs additional configuration. |
| UNIX CPIO Archive | cpio | application/x-cpio | Not available by default. Needs additional configuration. |
| Uuencode | uue | application/uue | |
| XZ Utils Compressed Archive | xz | application/x-xz | |
| Z Compressed Archive File | z | application/z | |
| ZIP Compressed Archive File | zip | application/zip, application/x-winzip | |

Format is detected, but data cannot be processed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| ACE Archive | ace | application/x-ace-compressed | Needs additional configuration. Then the format can be detected correctly, but is not processed by default. |
| eXtensible ARchiver (xar) | xar | application/x-xar | MIME type detection only; filtered by default configuration. |
| Microsoft Windows Installer File | msi | application/ms-installer | The format can be detected correctly, but cannot be processed. There is no data extracted. |
| SIT | sit | application/x-stuffit | The format can be detected correctly, but cannot be processed. There is no data extracted. |
| ZOO | zoo | application/x-zoo | The format can be detected correctly, but cannot be processed. There is no data extracted. |

1.4 Bloomberg Formats

Data is processed and can be displayed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| Bloomberg TXT export archives | zip, tar, tgz | archive/bloomberg | |
| Bloomberg email in TXT and XML export archives | | message/rfc822 | Converted into EML file during data loading. |
| Bloomberg chats in TXT and XML export archives | | text/chatxml | Converted into XML file during data loading. |

1.5 Database Formats

Data is processed and can be displayed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| DBase III, IV, V | dbf | application/x-dbase | |
| First Choice DB through 3.0 | fol | database/x-firstchoice | |
| Microsoft Works DB for DOS 2.0 | wps | application/vnd.ms-works | |
| Microsoft Works DB for Macintosh 2.0 – 4.0 | wps | application/vnd.ms-works | |
| Paradox 2.0 – 4.0 | | application/paradox | |
| Paradox for Windows 1.0 | | application/paradox | |
| Q&A Database through 2.0 | dtf | database/x-qa | |
| R:Base 5000 | rb4 | database/rbase | |
| R:Base System V | rb4 | database/rbase | |
| Reflex 2.0 | rfx | database/reflex | |
| SmartWare II DB 1.02 | db | database/x-smartdata | |

Format is detected, but data cannot be processed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| Microsoft Access 1.0, 2.0, 95-2013 | mda | application/vnd.ms-access | MIME type detection only; filtered by default configuration. |
| Microsoft Access Report Snapshot 2000-2003 | mda | application/vnd.ms-access | The format can be detected correctly, but cannot be processed. There is no data extracted. |

1.6 Email Formats

Data is processed and can be displayed

| Format | Common extensions | MIME type | Comments |
|---|---|---|---|
| Common Store Native | csn | application/csn | IBM eDiscovery libraries are needed. |
| EML Text Email | eml | message/rfc822 | |
| Encoded mail messages MHT | mht, mhtml | text/mhtml | |
| Encoded mail messages TNEF | tnef | application/ms-tnef | |
| EML with Digital Signature | eml | message/rfc822 | |
| EML with S/MIME encryption | eml | | The appropriate certificates for decryption are required. |
| IBM Lotus Notes Domino XML Language DXL 8.5 | dxl, xml | application/x-dxlfile | |
| IBM Lotus Notes NSF | nsf | application/vnd.lotus-notes | File type detection works without IBM Lotus Notes; parsing requires IBM Lotus Notes to be installed. |
| MBOX Text Email Archive | mbox | application/mbox | |
| Microsoft Outlook Express EML | eml | message/rfc822 | |
| Microsoft Outlook Express DBX | dbx | application/dbx | |
| Microsoft Outlook MSG | msg | application/msoutlook, application/msoutlookarchive | |
| MSG with Digital Signature (S/MIME) | msg | | The appropriate certificates for decryption are required. |
| Microsoft Outlook for Mac 2011 | olm | application/olm | Only OLM export files are supported after enabling the OLM parser, which is disabled by default. |
| Microsoft Outlook for Mac 2011 | eml | message/rfc822 | |
| Microsoft Outlook OST | ost | application/msoutlookarchive | |
| Microsoft Outlook PST 97 - 2007 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2013 or 2016 is used for data loading. |
| Microsoft Outlook PST 2010 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2013 or 2016 is used for data loading. |
| Microsoft Outlook PST 2013 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2013 or 2016 is used for data loading. |
| Microsoft Outlook PST 2016 (not supported by Decisiv 8.2) | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2016 is used for data loading. |
| Microsoft Outlook PST 2001 Mac | pst | application/msoutlookarchive | |

Note: Support of the following email formats depends on the use case and must be assessed and set up by Professional Services.

- Apple Inc. OS X Tiger Mail EMLX
- Eudora Mailbox
- Microsoft Entourage items archive RGE (Text, metadata - not folder structure)
- Microsoft Outlook Express enhanced support
- MSF Index of mail messages stored in a variety of email programs, including Netscape Mail and Mozilla Thunderbird
- SNM Index of messages stored in a Netscape Messenger or Collabra mailbox

1.7 Multimedia Formats

Multimedia formats are loaded, but content is not indexed.
Depending on the user interface, you can access files in their original place, or download them.

1.8 Presentation Formats

Data is processed and can be displayed