Supported Data Sources CORE 5.17
Published: 2017-Mar-06

Contents
1 Supported Data Formats
  1.1 Different Supported Formats in Updated Projects
  1.2 Data Display
  1.3 Display in the BRAVA Third Party Plug-in
  1.4 Archive Formats
  1.5 Bloomberg Formats
  1.6 Database Formats
  1.7 Email Formats
  1.8 Multimedia Formats
  1.9 Presentation Formats
  1.10 Raster Image Formats
  1.11 Spreadsheet Formats
  1.12 Text And Markup Formats
  1.13 Vector Image Formats
  1.14 Word Processing Formats
  1.15 Other Formats
2 Contact Us
3 Terms of Use

© Recommind, Inc. 2017.

1 Supported Data Formats

Recommind CORE supports indexing and retrieval, including conceptual search, for all data formats listed in this section.

Note: Support of certain formats depends on the use case and must be assessed and set up by Recommind Professional Services. Please consult Recommind Support. Formats beyond those listed here might be supported, but they need testing for the specific use case and additional configuration.

Note: The MIME types are assigned for mapping purposes within Recommind CORE only. They are usually, but not necessarily, compatible with the official registry of media types maintained by IANA.

1.1 Different Supported Formats in Updated Projects

Projects created with versions prior to CORE 5.16/Axcelerate 5.10 use Oracle Outside In 8.5.1, which does not cover some recent data formats. To keep hash value computation consistent, which is required for duplicate detection, for example, this Oracle Outside In version is preserved for both existing and new data sources in these projects. Only data sources of projects created with CORE 5.16 and up use Oracle Outside In 8.5.3 and support newer formats, such as Microsoft Word, Excel, and PowerPoint 2016, Microsoft Outlook 2011 for Mac (OLM and EML), Corel WordPerfect, Corel Quattro Pro, Corel Presentations, Corel Draw X7, and AutoCAD 2015.
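The default rule described above can be illustrated with a minimal sketch. It only restates the version rule from this section; the function names and the use of dotted version strings are assumptions made for illustration and are not part of CORE.

```python
def parse_version(text: str) -> tuple:
    """Turn a dotted version string such as '5.16' into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def default_outside_in_version(project_created_with: str) -> str:
    """Return the Oracle Outside In version a project uses by default.

    Hypothetical helper: projects created with CORE 5.16 and up get 8.5.3,
    projects created with earlier versions keep 8.5.1.
    """
    if parse_version(project_created_with) >= parse_version("5.16"):
        return "8.5.3"
    return "8.5.1"


if __name__ == "__main__":
    for core_version in ("5.9", "5.15", "5.16", "5.17"):
        print(core_version, default_outside_in_version(core_version))
```

Comparing parsed tuples instead of raw strings avoids ranking "5.9" above "5.16", which a plain string comparison would do.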
If you want to use Oracle Outside In 8.5.3 for projects that use 8.5.1 by default, you can change the Oracle Outside In version for a data source. Be aware that this may lead to inconsistent ingestion results.

How to change the Oracle Outside In version used
1. In CORE Administration, open the data source configuration.
2. Go to Parsers > Stellent parser > General settings.
3. Set OutsideIn version to 8.5.3.

The new version takes effect the next time the data source is started.

1.2 Data Display

All formats that can be processed can also be displayed in the document views of the different user interfaces. These formats are listed in the tables titled "Data is processed and can be displayed".

1.3 Display in the BRAVA Third Party Plug-in

The plug-in is part of Axcelerate 5. BRAVA 7.5 can transform these native file formats into display format:
- files with the IANA MIME type text/plain (mostly with the extension TXT)
- files with the IANA MIME type application/acad (2D CAD files, mostly with the extension dwg)
- files with these extensions:
  - PDF
  - BMP
  - GIF
  - JPEG
  - PNG
  - TIFF
  - WMF

All other formats are converted internally and then displayed by the BRAVA third party plug-in. Microsoft Word, Excel and PowerPoint files are converted using their native applications. Chat XML documents are converted internally by the CORE system. The remaining formats are converted using Oracle Outside In.

1.4 Archive Formats

Data is processed and can be displayed

Format | Common extensions | MIME type | Comments
7-zip Compressed File | 7z | application/x-7z-compressed |
ARC | arc | application/arc |
ARJ Compressed Archive File | arj | application/x-arj, application/arj |
BZIP2 Compressed Archive Format | bzip2, bz2 | application/x-bzip2 | Note: Only the first part of a split archive is loaded. Additional parts generate errors.
Debian Software Package | deb | application/x-debian-binary |
DMG Apple Disk Copy Disk Image File | | multipart/dmg | Not available by default. Needs additional configuration.
GNU tar Compressed File Archive | tar | application/x-tar |
Gzip Compressed Archive | gzip | application/x-gzip |
ISO-9660 CD Disc Image | iso | multipart/iso |
LZH Compressed Archive File | z, lzh | application/x-compress |
Microsoft Cabinet File | cab | application/vnd.ms-cab-compressed |
Microsoft Compiled Help File | chm | application/vnd.ms-chm-file | Not available by default. Needs additional configuration.
Microsoft Windows Imaging File Format | | application/vnd.ms-imaging |
RAR Compressed Archive File | rar | application/rar |
RPM Package Manager | rpm | audio/x-pn-realaudio-plugin (sic!) | Not available by default. Needs additional configuration.
UNIX CPIO Archive | cpio | application/x-cpio | Not available by default. Needs additional configuration.
Uuencode | uue | application/uue |
XZ Utils Compressed Archive | xz | application/x-xz |
Z Compressed Archive File | z | application/z |
ZIP Compressed Archive File | zip | application/zip, application/x-winzip |

Format is detected, but data cannot be processed

Format | Common extensions | MIME type | Comments
ACE Archive | ace | application/x-ace-compressed | Needs additional configuration. Then the format can be detected correctly, but is not processed by default.
eXtensible ARchiver (xar) | xar | application/x-xar | MIME type detection only; filtered by default configuration.
Microsoft Windows Installer File | msi | application/ms-installer | The format can be detected correctly, but cannot be processed. No data is extracted.
SIT | sit | application/x-stuffit | The format can be detected correctly, but cannot be processed. No data is extracted.
ZOO | zoo | application/x-zoo | The format can be detected correctly, but cannot be processed. No data is extracted.
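As a reading aid for the two archive tables, the following sketch models a few of their rows as a lookup keyed by file extension. The names `Support`, `ARCHIVE_TABLE` and `lookup_archive` are hypothetical, only a subset of rows is included, and looking up by extension is a simplification chosen for the example; it is not a description of how CORE detects formats.

```python
from enum import Enum
from pathlib import PurePath


class Support(Enum):
    PROCESSED = "processed and can be displayed"
    NEEDS_CONFIG = "not available by default, needs additional configuration"
    DETECT_ONLY = "detected, but data cannot be processed"


# Illustrative subset of the archive tables: extension -> (MIME type, support level).
ARCHIVE_TABLE = {
    "7z": ("application/x-7z-compressed", Support.PROCESSED),
    "tar": ("application/x-tar", Support.PROCESSED),
    "rar": ("application/rar", Support.PROCESSED),
    "zip": ("application/zip", Support.PROCESSED),
    "cpio": ("application/x-cpio", Support.NEEDS_CONFIG),
    "sit": ("application/x-stuffit", Support.DETECT_ONLY),
    "zoo": ("application/x-zoo", Support.DETECT_ONLY),
}


def lookup_archive(filename: str):
    """Return (MIME type, support level) for a file name, or None if the
    extension is not in the illustrative subset above."""
    extension = PurePath(filename).suffix.lstrip(".").lower()
    return ARCHIVE_TABLE.get(extension)


if __name__ == "__main__":
    for name in ("evidence.ZIP", "legacy.zoo", "backup.cpio", "notes.txt"):
        print(name, lookup_archive(name))
```

Keeping the support level next to the MIME type makes it easy to separate formats that load out of the box from those that need configuration or are only detected.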
1.5 Bloomberg Formats

Data is processed and can be displayed

Format | Common extensions | MIME type | Comments
Bloomberg TXT export archives | zip, tar, tgz | archive/bloomberg |
Bloomberg email in TXT and XML export archives | | message/rfc822 | Converted into an EML file during data loading.
Bloomberg chats in TXT and XML export archives | | text/chatxml | Converted into an XML file during data loading.

1.6 Database Formats

Data is processed and can be displayed

Format | Common extensions | MIME type | Comments
DBase III, IV, V | dbf | application/x-dbase |
First Choice DB through 3.0 | fol | database/x-firstchoice |
Microsoft Works DB for DOS 2.0 | wps | application/vnd.ms-works |
Microsoft Works DB for Macintosh 2.0 – 4.0 | wps | application/vnd.ms-works |
Paradox 2.0 – 4.0 | | application/paradox |
Paradox for Windows 1.0 | | application/paradox |
Q&A Database through 2.0 | dtf | database/x-qa |
R:Base (R:Base 5000) | rb4 | database/rbase |
R:Base (R:Base System V) | rb4 | database/rbase |
Reflex 2.0 | rfx | database/reflex |
SmartWare II DB 1.02 | db | database/x-smartdata |

Format is detected, but data cannot be processed

Format | Common extensions | MIME type | Comments
Microsoft Access 1.0, 2.0, 95-2013 | mda | application/vnd.ms-access | MIME type detection only; filtered by default configuration.
Microsoft Access Report Snapshot 2000-2003 | mda | application/vnd.ms-access | The format can be detected correctly, but cannot be processed. No data is extracted.

1.7 Email Formats

Data is processed and can be displayed

Format | Common extensions | MIME type | Comments
Common Store Native | csn | application/csn | IBM eDiscovery libraries are needed.
EML Text Email | eml | message/rfc822 |
Encoded mail messages MHT | mht, mhtml | text/mhtml |
Encoded mail messages TNEF | tnef | application/ms-tnef |
EML with Digital Signature | eml | message/rfc822 |
EML with S/MIME encryption | eml | | The appropriate certificates for decryption are required.
IBM Lotus Notes Domino XML Language DXL 8.5 | dxl, xml | application/x-dxlfile |
IBM Lotus Notes NSF | nsf | application/vnd.lotus-notes | File type detection works without IBM Lotus Notes; parsing requires an IBM Lotus Notes installation.
MBOX Text Email Archive | mbox | application/mbox |
Microsoft Outlook Express EML | eml | message/rfc822 |
Microsoft Outlook Express DBX | dbx | application/dbx |
Microsoft Outlook MSG | msg | application/msoutlook, application/msoutlookarchive |
MSG with Digital Signature (S/MIME) | msg | | The appropriate certificates for decryption are required.
Microsoft Outlook for Mac 2011 | olm | application/olm | Only OLM export files are supported, and only after enabling the OLM parser, which is disabled by default.
Microsoft Outlook for Mac 2011 | eml | message/rfc822 |
Microsoft Outlook OST | ost | application/msoutlookarchive |
Microsoft Outlook PST 97 - 2007 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2007, 2010 or 2013 is used for data loading.
Microsoft Outlook PST 2010 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2010 or 2013 is used for data loading.
Microsoft Outlook PST 2013 | pst | application/msoutlookarchive | Only supported if Microsoft Outlook 2013 is used for data loading.
Microsoft Outlook PST 2001 Mac | pst | application/msoutlookarchive |

Note: Support of the following email formats depends on the use case and must be assessed and set up by Recommind Professional Services.
- Apple Inc. OS X Tiger Mail EMLX
- Eudora Mailbox
- Microsoft Entourage items archive RGE (text and metadata, but not the folder structure)
- Microsoft Outlook Express enhanced support
- MSF Index of mail messages stored in a variety of email programs, including Netscape Mail and Mozilla Thunderbird
- SNM Index of messages stored in a Netscape Messenger or Collabra mailbox

1.8 Multimedia Formats

By default, all multimedia formats are filtered and not loaded at all.