ABBYY FineReader Engine 11
The Most Comprehensive SDK for Recognition and Document Conversion

What is FineReader Engine 11?

FineReader® Engine 11 for Windows is the newest Software Development Kit (SDK) to integrate ABBYY’s multilingual recognition and conversion technologies into external applications. The toolkit facilitates tight integration of ABBYY's core OCR (machine-print), ICR (handprint), OMR (check mark), barcode recognition and PDF technologies. FineReader Engine 11 is the definitive solution for creating highly accurate, scalable, efficient recognition and conversion systems. This is information transformation at its best.

PRODUCT OVERVIEW

• High quality recognition technologies for OCR, ICR, OMR, 1D and 2D Barcodes
• Language support for up to 202 OCR and 136 ICR languages
• New recognition technology for Arabic, improved Chinese, Japanese and Korean, also in combination with European languages
• Adaptive Document Recognition Technology (ADRT) processes all pages of a document as a logical unit to ensure unified export results
• Many export formats supported, including pure text, XML, HTML, RTF, ODT, e-book, Microsoft Office, vCard and XPS*
• PDF & PDF/A document export for archiving, including highly compressed MRC

Extreme Flexibility, Precise Results and Cost-Effectiveness

Modular Platform
FineReader Engine combines a full range of functions with the highest quality recognition, effective processing speed, and convenient development tools in a single SDK.

New: Classification
Based on a combination of image and content-based classifiers, the technologies support a wide range of document types. This information enables workflow automation and reduces costs associated with manual pre-processing.

New: Business Card Reading
Enabling your applications to process business cards is now an easy task. ABBYY business card reading technology supports 27 recognition languages.

Scalable Enough for any Application
Engine 11 can be used to build applications of any scale and complexity – from a client workstation, to a server-based solution or a large multi-million page project. Built-in multi-core support and flexible network licencing ensure flexible deployment and scalability. Available in 32-bit and 64-bit versions.

Easy to Deploy
FineReader Engine offers easy access to core technologies and its COM API through development environments such as C/C++, Visual Basic and Visual Studio.NET. Optimised development profiles make it easy for developers to get started with new projects.

Cost-Effective
A modular architecture and pricing model offers a variety of features as “add-on” modules, allowing developers to choose only the functions they need, while providing the option to add new functions at a later date.

Flexible Enough for any Application
FineReader Engine can be used in:
• Archiving and document processing applications
• Control and verification systems
• Document conversion systems
• Fax processing applications
• Content creation and management applications
• Digital mailroom applications
• Document sorting applications
• Web publishing systems
• Intranet archiving applications
• Media clipping solutions
• Reading or voice-playback systems

Secure Investment and Flexibility
ABBYY’s breakthrough technologies are permanently being optimised and extended. Multi-platform support allows developers to expand their markets by choosing the appropriate OS support for their applications: Windows, Linux, Mac OS and more.

BENEFITS FOR DEVELOPERS

• Ability to enhance your applications with multi-language OCR and document conversion
• Full control over document processing settings and recognition results
• Document API to simplify processing
• Integrated scalability through built-in multi CPU core support
• Visual Components for fast and easy integration of user interface elements
• Qualified technical support

ABBYY FineReader Engine 11 Processing & Feature Overview

Document Recognition and Conversion Step-by-Step

Step 1: Document Input
FineReader Engine can acquire documents and images from different sources:
• Load images from disc or memory
• Scan images via TWAIN
• New: Asynchronous scanning and OCR processing
• Load images from digital cameras
• Open PDFs and automated, intelligent PDF processing
Engine 11 accurately converts all types of PDFs. The SDK can access internal PDF information like annotations, meta-data, font dictionaries, content streams and keep existing bookmarks.

Step 2: Image Pre-processing
Once document pages are loaded, FineReader Engine offers a variety of image processing options which prepare the document images to deliver the best OCR results:
• Image cleaning routines to remove noise and garbage
• Image optimisation from digital cameras, e.g. straighten curved text lines
• Auto-cropping. Auto-dual-page splitting.
• Different algorithms for skew correction up to 20 degrees
• Adaptive binarisation and texture filtering

Step 3: Document & Layout Analysis
After image pre-processing, the recognition areas have to be defined. Developers can choose 3 different modes for automatic document analysis (DA) based on artificial intelligence:
• Full text DA recognises all text on documents, including text embedded in pictures, charts and diagrams
• DA with layout retention automatically detects blocks, tables, barcodes, and pictures
• Invoice pre-processing DA focuses on numbers and tables
• Manual block creation is mostly used in Field Level/Zonal Recognition scenarios
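To illustrate what the "adaptive binarisation" option in Step 2 refers to in general terms, the following is a minimal sketch of local-mean thresholding in plain Python. It is a textbook technique chosen for illustration, not ABBYY's algorithm and not a FineReader Engine API call; the function name, the `radius` and `offset` parameters and the sample pixel values are all assumptions.

```python
def adaptive_binarise(gray, radius=1, offset=25):
    """Mark a pixel as ink (1) when it is darker than the mean of its
    neighbourhood minus `offset`; everything else is background (0).

    `gray` is a list of rows of 0-255 grey values. Generic illustration
    only; names and defaults are hypothetical.
    """
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Neighbourhood window, clipped at the image borders.
            y0, y1 = max(0, y - radius), min(height, y + radius + 1)
            x0, x1 = max(0, x - radius), min(width, x + radius + 1)
            window = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(window) / len(window)
            out[y][x] = 1 if gray[y][x] < local_mean - offset else 0
    return out


# Hypothetical 4x4 page: bright paper on top, a shadowed area below,
# with one dark "ink" pixel in each region.
page = [
    [200, 200, 200, 200],
    [200,  40, 200, 200],
    [100, 100,  20, 100],
    [100, 100, 100, 100],
]
mask = adaptive_binarise(page)
```

Because the threshold follows the local brightness, both ink pixels are found even though the shadowed background (100) is much darker than the bright paper (200). A single global threshold between 100 and 200 would instead mark the whole shadowed area as ink.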

NEW: AUTOMATIC DOCUMENT CLASSIFICATION

ABBYY FineReader Engine 11 provides new functionality for document classification. Based on a combination of image and content-based classifiers, the technologies support a wide range of document types. The API also allows the training of different document types and provides confidence levels after a classification run.

Classification Profiles
Classification can be executed in 2 modes:
· Maximum Speed – this mode is based on image pattern (black pixels location template) and quick OCR text analysis of title texts. It works up to 10 times faster than full-page OCR**.
· Maximum Accuracy – this mode is based on the full OCRed text. It analyses the full text of the document including the title as well as the key words that were detected during the training.

[Figure: various documents pass through classification from ABBYY and reach your application sorted as business cards, invoices and receipts.]

After the classification has been run your application "knows" what document type is being processed, e.g. a business card, a receipt, an invoice or a complaint. This information enables workflow automation and reduces the costs associated with manual pre-processing. Users can easily train new document types via a custom designed interface. The precompiled code sample is a perfect starting point. It's not necessary to create document templates separately.

NEW: BUSINESS CARD RECOGNITION

With the new business card reading capabilities of FineReader Engine 11, developers can now easily extend their applications and offer a solution for reading business cards.

Business card recognition technology is accessible via a new API in FineReader Engine 11. It offers special pre-processing features and access to the extracted data. Business card recognition supports 27 recognition languages. Multiple business cards scanned on one page can be automatically detected and separated before processing. The recognised data can be exported to the vCard format, a standard exchange format for managing contact information.
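For readers unfamiliar with the vCard format mentioned under business card recognition, here is a minimal sketch of what a vCard 3.0 record built from extracted contact fields could look like. It does not use the FineReader Engine API: the dictionary keys, the helper names and the sample contact are hypothetical, and only a handful of vCard properties are covered.

```python
def escape(value):
    """Escape characters that are special inside vCard 3.0 text values."""
    return (value.replace("\\", "\\\\")
                 .replace(";", "\\;")
                 .replace(",", "\\,")
                 .replace("\n", "\\n"))


def to_vcard(card):
    """Build a vCard 3.0 string from a dict of extracted fields.

    The dict keys are assumptions made for this sketch, not field
    names defined by any recognition SDK.
    """
    given, family = escape(card["given_name"]), escape(card["family_name"])
    lines = [
        "BEGIN:VCARD",
        "VERSION:3.0",
        # N is structured as Family;Given;Additional;Prefix;Suffix.
        "N:{};{};;;".format(family, given),
        # FN is the formatted display name.
        "FN:{} {}".format(given, family),
    ]
    if card.get("company"):
        lines.append("ORG:" + escape(card["company"]))
    if card.get("phone"):
        lines.append("TEL;TYPE=WORK,VOICE:" + card["phone"])
    if card.get("email"):
        lines.append("EMAIL;TYPE=INTERNET:" + card["email"])
    lines.append("END:VCARD")
    # vCard lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"


# Hypothetical data as it might come out of a recognised business card.
sample = {
    "given_name": "Jane",
    "family_name": "Doe",
    "company": "Example, Ltd.",
    "phone": "+49 89 0000 000",
    "email": "jane.doe@example.com",
}
vcard = to_vcard(sample)
```

The resulting text can be saved with a .vcf extension and imported into most address book applications. Note the escaped comma in the company name, which the format requires inside text values.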

Document Recognition and Conversion Step-by-Step – continued

Step 4: Recognition
Once the recognition areas have been set up, character and word recognition are executed. The SDK supports 202 OCR and 136 ICR languages and has a built-in omnifont OCR engine, so it is capable of recognising a wide variety of font types and objects:
• Standard fonts used in office environments, magazines, newspapers
• Documents printed with dot-matrix typewriters or receipt printers
• Special fonts like OCR-A, OCR-B, MICR (E13B) and CMC7
• Old fonts such as Fraktur and Schwabacher
• Hand-printed characters (ICR) in various field borders and frames
• Checkmarks
• 1D & 2D Barcodes

FineReader Engine gives developers full processing control:
• Recognition modes: normal, fast or balanced mode options for OCR, ICR, and barcodes
• Intelligent processing of PDFs. The SDK determines on a block-by-block basis when to apply full recognition or if the text layer can be used. Version 11 also allows the OCR to be turned off when the text layer can be trusted
• Core recognition parameters tuning allows certain algorithms for pre-processing, document analysis and recognition to be switched on/off
• Sophisticated definition of field content, by setting alphabets, dictionaries, regular expressions, types of segmentation, handwriting styles, etc.
• Voting API gives developers access to word-level and character-level hypotheses. This information can then be used in external voting systems
• Pattern training, e.g. for special characters or decorative fonts
• Own language definitions and dictionaries can be used to improve the recognition results

Step 5: Verification & User Interaction
Developers have full access to internal recognition results. Engine 11 provides basic information like the character coordinates, but also very advanced attributes, including:
• Font and formatting information
• Word and character recognition hypotheses
The information is available via API and XML Export, so that it can be used for automated correction. For simplified, user-driven correction and verification, visual components (ActiveX controls) are available. Layout analysis results and uncertainly recognised characters can be changed, as can the page order within a document.
Available components are:
• Scan Interface
• Document Viewer
• Image Viewer
• Text Editor
• Text Validator

Step 6: Export/Document Output
FineReader Engine 11 contains a new, improved font management API that allows extended access to the fonts (predefined font filters) used during document synthesis. The SDK offers multiple export options and formats:
• TXT, CSV – contain text in reading order, no formatting and layout information
• HTML, RTF, ODT, DOCX, XLS(X) and PPTX – allow direct usage and editing
• E-book Formats – EPUB (.epub) and FictionBook (.FB2)
• ABBYY XML – different levels of layout, paragraphs and formatting
• ALTO XML – library standard for OCR text and layout information of printed documents
• New: vCard Export of business card data
• New: XPS (XML Paper Specification)*
• PDF Export – further details below

NEW AND ENHANCED PDF CAPABILITIES

ABBYY’s PDF export can be controlled via API or simple-to-use PDF export profiles. Options available are:
• Image only PDFs
• Searchable PDFs in different versions: text only, text under/above the page image
• Tagged and linearised PDFs for improved and faster information access
• Secured, encrypted PDFs supporting open and permission passwords
• Automated, intelligent PDF processing, using internal PDF information
• New: Detection of an existing PDF text layer and the ability to skip OCR and leave the document as is
• MRC (Mixed Raster Content) compression for PDF and PDF/A. MRC compression achieves significantly better file compression without visible degradation. File size can be up to 10 times smaller compared to JPEG compression. Version 11 improvements allow higher background image compression.
• PDF/A Standards for long-term archiving:
  PDF/A-1a & 1b – tagged and with Unicode character maps
  New: PDF/A-2 – enables smaller files to be created using JPEG2000 compression, embedding of PDF/A files allowed
  New: PDF/A-3* – extension of the A-2 standard which allows the inclusion of other binary file formats such as XML or office.

* Planned for a maintenance release of FineReader Engine 11.
** Based on internal ABBYY testing.

ABBYY FineReader Engine 11 Specifications and Licencing

ABBYY Licencing Policy

ABBYY FineReader Engine is sold via a flexible, modular licencing policy that allows developers to select the best combination of tools and pricing options for their project. Licencing is offered as:

Developer Licences
Providing rights to develop and test applications based on FineReader Engine technology. The licence bundle includes three hardware licence dongles or one concurrent network licence. Each stand-alone licence allows up to 10,000 pages per month to be processed.

Runtime Licences
Grant the right to distribute applications with FineReader Engine functionality incorporated. Runtime Licences (RTL) differ by functionality, page volume, and network support (Network Runtime Licence). The Professional OCR Runtime Licence provides access to core recognition technologies. Additional RTLs for specialised functions include the Barcode Runtime Licence and the FineReader XIX Runtime Licence.

Add-on Modules for Runtime Licences
RTLs can be enhanced by adding one or more of the following functionalities offered as add-on modules: Classification, PDF export, Arabic OCR, CJK (Chinese, Japanese, Korean) OCR, Thai OCR, Hebrew OCR, Vietnamese OCR, ICR.

Software Maintenance, Certification Trainings and Professional Services
To ensure the success of your projects ABBYY offers additional support, training, and certification programs for all products. If you need to speed up your project, contact ABBYY for Professional Services. Software Maintenance guarantees that you always have access to the latest technologies.

More ABBYY Developer Products

FineReader Engines for Other Platforms
ABBYY also offers its recognition technology for other operating systems such as Linux and Mac OS. This cross-platform approach allows customers to follow market trends and to secure the investment that was made. ABBYY also offers customisation services for embedded OCR.

Cloud OCR SDK
ABBYY’s online OCR Service with RESTful API offers full-text/full-page OCR, field-level/zonal OCR/ICR, barcode and business card recognition. Developers can register for free. Pre-paid and subscription models are available for production. The service is powered by Microsoft® Windows® Azure.

Mobile OCR Engine
ABBYY’s "compact code OCR" is optimised to deliver a highly accurate conversion of image files into text using a small amount of memory and system resources. Platform independence ensures support for operating systems such as Android, Linux, MacOS, iOS, Symbian, Windows (PC, x86).

FlexiCapture Engine
ABBYY SDK for Data Capture scenarios allows document separation, classification, template matching for fixed forms as well as intelligent data extraction via FlexiLayouts from all kinds of document types. FlexiCapture Engine functionality can also be combined with FineReader Engine API.

SPECIFICATIONS

System Requirements
• PC with x86-compatible processor (1 GHz or higher).
• Operating Systems (32 & 64-bit): Microsoft® Windows 8, Windows 7, Windows Vista®, Windows® XP, Windows Server® 2012 (only 64-bit), Windows Server® 2008, Windows Server® 2003
  Cloud Platforms: Microsoft® Windows® Azure, Amazon EC2
• Memory:
  - for processing one-page documents – minimum 400 MB RAM, recommended 1 GB RAM
  - for processing multi-page documents – minimum 1 GB RAM, recommended 1.5 GB RAM
• Hard disk space: 800 MB for library installation and 100 MB for program operation plus additional 15 MB for every page of a multi-page document processed
• 100% TWAIN-compatible scanner, digital camera, or fax modem
• Video card and monitor (min. resolution 1024*768)

Multilingual OCR
202 languages (including Latin, Greek, Cyrillic alphabets, Arabic, Chinese, Japanese and Korean), of which 52 languages with dictionary support.

Business Card Recognition
27 languages, including 4 hieroglyphic languages

Text Types
Normal, Matrix, Typewriter, Receipt, OCR-A, OCR-B, CMC7, MICR, Fraktur/Gothic, mixed text type support processing with auto detection on a word-level.

ICR
On digits, digits combined with letters of one language, and digits combined with letters of several languages, even if fields contain both upper and lower case letters. Separates field content from borders and frames. 136 languages, 38 with morphology, custom-field dictionaries; 22 handwriting styles including English, American, German, French and Russian.

Barcodes
Includes processing of barcodes that are damaged or are printed at any angle and fast barcode extraction, more than 16 most popular 1D industrial types, 2D PDF 417, Aztec, Data Matrix, QR Code, MaxiCode, USPS 4CB*.

Check mark (OMR)
Simple, grouped, model check marks, marks with “corrections” made by hand.

Input Formats
BMP, PCX, DCX, JPEG, JPEG 2000, PNG, GIF, TIFF, DjVu, PDFs.

Output Formats
DOCX, ODT, XLS, XLSX, PPTX, CSV, TXT, XML, ALTO XML, EPUB, FB2, searchable PDFs, PDF/A-1, A-2, A-3*, compressed MRC PDF/As, XPS*, BMP, PCX, DCX, JPEG, JPEG 2000, PNG, TIFF, image snippet.
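The hard disk and memory figures in the system requirements can be turned into a quick sizing estimate. The sketch below is simple arithmetic on the numbers quoted in this brochure, assuming that "15Mb" means 15 megabytes per page and treating 1 GB as 1024 MB; the function names are illustrative and are not part of any ABBYY tool.

```python
LIBRARY_MB = 800      # library installation
OPERATION_MB = 100    # program operation
PER_PAGE_MB = 15      # per page of a multi-page document being processed


def disk_space_mb(pages_in_document):
    """Estimated disk space in MB while processing one multi-page document."""
    if pages_in_document < 0:
        raise ValueError("page count must not be negative")
    return LIBRARY_MB + OPERATION_MB + PER_PAGE_MB * pages_in_document


def ram_mb(multi_page):
    """Return (minimum, recommended) RAM in MB from the requirements list."""
    return (1024, 1536) if multi_page else (400, 1024)


# Example: a 200-page document needs 800 + 100 + 15 * 200 MB of disk.
estimate = disk_space_mb(200)
```

For a 200-page document this gives 3900 MB, so the per-page working space, not the installation itself, dominates the disk requirement for large documents.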

Development
FineReader Engine API supports the COM standard and can be easily used in Microsoft Visual Studio.NET (VB.NET, C#); Microsoft Visual Basic 5.0, 6.0; Microsoft Visual C++ 4.x and above; VB Script, and other scripting languages; Borland Delphi 2.0 and above; any other development environment that supports COM and ActiveX objects correctly.

* Planned for a maintenance release of FineReader Engine 11

ABBYY Europe GmbH
Elsenheimerstr. 49, 80687 Munich, Germany
Tel: +49 89 511 159 0
www.ABBYY.com

ABBYY UK, ABBYY Spain, ABBYY Benelux, ABBYY France, ABBYY Italy, ABBYY Scandinavia

Further information online: www.ABBYY.com

Windows® is a registered trademark of Microsoft Corporation in the United States and other countries. Adobe PDF Library is used for opening and processing PDF files: © 1984-2011 Adobe Systems Incorporated and its licensors. All rights reserved. Protected by U.S. Patents 5,929,866; 5,943,063; 6,289,364; 6,563,502; 6,639,593; 6,754,382; Patents Pending. Adobe, the Adobe logo, Acrobat, the Adobe PDF logo, Distiller and Reader are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. All other trademarks are the property of their respective owners. Opening DjVu image format: Portions of this computer program are copyright © 1996-2007 LizardTech, Inc. All rights reserved. DjVu is protected by U.S. Patent No. 6,058,214. Foreign Patents Pending. Working with JPEG image format: This software is based in part on the work of the Independent JPEG Group. Working with JPEG2000 image format: Portions of this software are copyright © 2011 University of New South Wales. All rights reserved. Unicode support: © 1991-2013 Unicode, Inc. All rights reserved. Intel® Performance Primitives: Copyright © 2002-2008 Intel Corporation. Font support: Portions of this software are copyright © 1996-2002, 2006 The FreeType Project (www.freetype.org). All rights reserved. Other: U.S. Patent Nos. 5,625,465, 5,768,416 and 6,094,505. WIBU, CodeMeter, SmartShelter, and SmartBind are registered trademarks of Wibu-Systems. This software includes ABBYY® FineReader® Engine 11 recognition technologies. © 2013, ABBYY Production LLC. All rights reserved. ABBYY, FINEREADER and ABBYY FineReader are either registered trademarks or trademarks of ABBYY Software Ltd.