Intelligent Document Processing - Extracting All the Information from Documents Instantly
www.springml.com © 2018

Introduction

For many companies, operations rely heavily on the processing and handling of all kinds of documents. From invoices and purchase orders to resumes and legal documents, a business can sink or swim based on how it handles its data and records. Whether these documents are physical or digitized, an organized and reliable system is required to ensure that all of them are properly processed.

Intelligent Document Processing Statistics

- 84% of small businesses rely on a manual process for document processing and management [1]
- It can cost an organization $20 to file a document, $120 to locate a lost document, and $220 to recreate a lost document [2]
- Each year, the amount of data a company produces increases by 65%
- About 3% of a company's revenue is spent on printing, filing, and maintaining documentation [3]

It is important to understand the difference between digitization and digitalization. Most businesses receive documents and files in electronic formats such as PDFs, Word documents, scanned images, or emails. These files are digital, but their contents are not read or extracted using machine learning or artificial intelligence. This is digitization. In contrast, SpringML offers an advanced solution for processing documents ranging from invoices and contracts to legal forms and healthcare documents. It turns data into an automated process that generates valuable insight. This is digitalization.

By automating document processing in the healthcare industry, for instance, professionals will have more time to focus on patients and providing care rather than dedicating unnecessary time to paperwork and filing. For industries like legal and finance, security and efficiency are more important than ever, and both directly affect productivity.
Current Methodologies and their Drawbacks

Manual Processing

More often than not, teams and staff are hired to transfer much of this data into an application in order to create structure and organization. When staff are employed to process documents, human error can occur because these tasks are labor-intensive and demanding. As a result, you might spend extra resources attempting to remedy the resulting inefficiencies and errors.

Think about the time it takes to process documents manually. An employee has to do the data entry individually, reviewing each form and entering its information into the system. Tasks of this nature do not scale to large amounts of data or forms. The time and cost alone would be greatly decreased with the use of intelligent document processing. Furthermore, it is incredibly difficult to gain insights into your data if everything is done manually.

OCR

There have been many attempts to develop technology that can successfully process and extract data from documents. A common method is Optical Character Recognition, or OCR, which can identify the specific coordinates of each piece of text. For instance, it may rely on a ruleset specifying that a field sits in the top-left corner of a page, starting at coordinates (15, 25), and follows the text "Bill To:". While this may work on a very small scale where the document structure is consistent, it creates a problem for larger companies that might be dealing with dozens or even hundreds of different layouts.

Not all invoices have the same layout, and not all resumes have a similar structure. Likewise, different types of documents can come in a variety of formats, structures, and layouts even when they have essentially the same goal. Invoices, for example, all carry the same types of information, but where and how that information appears on the page can vary between clients and vendors.
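The coordinate-based ruleset described above can be illustrated with a small sketch. It is not taken from any particular OCR product: the `Word` and `RegionRule` types, the `apply_rule` helper, the rectangle bounds, and the sample word positions are all hypothetical, chosen only to show why a rule tied to fixed coordinates works for one layout and silently fails for another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    """One piece of recognized text with the position of its top-left corner."""
    text: str
    x: int
    y: int


@dataclass
class RegionRule:
    """Hypothetical rule: the field value is whatever text lies in a fixed rectangle."""
    field: str
    x_min: int
    y_min: int
    x_max: int
    y_max: int


def apply_rule(rule: RegionRule, words: List[Word]) -> Optional[str]:
    """Return the text found inside the rule's rectangle, or None if it is empty."""
    hits = [
        w for w in words
        if rule.x_min <= w.x <= rule.x_max and rule.y_min <= w.y <= rule.y_max
    ]
    # Read the matching words top to bottom, then left to right.
    hits.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in hits) or None


# Assumed rule: the customer name sits just to the right of a "Bill To:" label
# printed at (15, 25) in the top-left corner of the page.
BILL_TO_RULE = RegionRule(field="bill_to", x_min=50, y_min=20, x_max=200, y_max=30)

# Layout A matches the assumption the rule was written for.
layout_a = [Word("Bill", 15, 25), Word("To:", 35, 25), Word("Acme", 60, 25), Word("Corp", 95, 25)]

# Layout B carries the same information, but a different vendor printed it elsewhere.
layout_b = [Word("Bill", 300, 140), Word("To:", 320, 140), Word("Acme", 345, 140), Word("Corp", 380, 140)]

result_a = apply_rule(BILL_TO_RULE, layout_a)
result_b = apply_rule(BILL_TO_RULE, layout_b)
print(result_a)
print(result_b)
```

In this sketch the rule recovers the customer name from the first layout and finds nothing in the second, even though both pages contain the same field. Supporting the second layout means writing and maintaining another rule, which is the scaling cost discussed here.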
Even basic forms and documents like a driver's license will have variations based on the state or country they come from. Hence, OCR is not as scalable as one would like it to be. Scaling further would require creating a ruleset for every layout and managing even the slightest changes to a form, which ultimately is not an efficient use of a business's time.

OCR works best with good-quality typed documents. Handwritten documents and certain fonts cannot be easily read by OCR software. OCR also has difficulty with spreadsheets and with documents that contain both images and text. The error rate depends on the quality and type of the document, including the font used. Errors that occur during OCR include misreading letters, skipping over letters that are unreadable, and mixing together text from adjacent columns or image captions.

Workarounds are needed to handle the difficulty OCR has in differentiating between similar-looking characters, such as the number zero and a capital "O". However, such workarounds only work with documents created specifically for OCR, which makes them poorly suited to other types of documents.

[Figure: sample store receipt and invoice]

Drawbacks

- No scalability
- Standardization of documents and fonts is needed
- Slow turnaround time
- High error rate

Solution

With so little structure and consistency across documents, it may be hard to believe that technology exists that can not only extract but also read and identify different elements on a page, no matter where they are. SpringML's modern Machine Learning algorithms make that possible. Our solution is dynamic, flexible, and relies on artificial intelligence rather than on coordinates. Our technology uses deep learning techniques such as Natural Language Processing and Image Classification to identify document types and extract relevant fields. We leverage several techniques that together form one cohesive solution:

- First, a document is identified using character and text recognition, which helps determine which fields and data to extract.
- Then, a combination of image object detection and Named Entity Recognition models is used to extract all relevant information. Some of these are Machine Learning APIs, while other techniques involve customization depending on the document type.
- Lastly, our solution integrates easily with your existing workflows and applications for a more seamless process. The results can also be fed into your analytics systems in order to gain real-time, actionable insights.

Benefits

- SpringML's solution is dynamic and works across a multitude of documents and forms without needing any additional configuration.
- Our solution integrates with your existing workflows and applications, ensuring a more seamless transition and making it easier for staff to adopt it into their daily routine.
- Our solution has an accuracy rate higher than 95%, ensuring document extraction that you can rely on and trust.
- Intelligent Document Extraction is scalable for businesses of any size. Whether you are processing dozens or hundreds of thousands of documents a day, our solution can handle the workload.

Before implementing SpringML's intelligent document processing solution, a large business could be processing invoices manually. The finance department or someone in a data entry job might be going through invoices by hand and transferring the data into the company's designated system. Not only is a process like this tedious and error-prone, it could also take hours each day. With an intelligent system in place, automation could free up an employee to focus on more important tasks and save hours of time and resources by reducing manual data entry labor by 99%. Not only would the process become simplified and automatic, but the data collection would also be more accurate and reliable.

Getting Started

We start with a phone call to review your documents and the fields that need to be extracted. We conduct a quick product demo focused on one document and five fields. Then, we discuss a full implementation scope and timeline.

For more information about SpringML's Intelligent Document Processing, please visit https://springml.com/intelligent-document-processor

[1] https://smallbiztrends.com/2017/04/manual-process.html
[2] https://www.corpmagazine.com/technology/82-percent-companies-still-spending-billions-paper/
[3] https://www.mydatascope.com/blog/en/2018/03/08/how-much-paper-waste-is-costing-your-business/