Orcus Documentation Release 0.16

Kohei Yoshida

Sep 24, 2021

CONTENTS

1 Overview
2 C++ API
3 Python API
4 CLI
5 Notes
6 Indices and tables
Index

Orcus is a library that provides a collection of standalone file processing filters and utilities. It was originally focused on providing filters for spreadsheet documents, but filters for other types of documents have been added to the mix.

CHAPTER ONE

OVERVIEW

1.1 Composition of the library

The primary goal of the orcus library is to provide a framework to import the contents of documents stored in various spreadsheet or spreadsheet-like formats. The library also provides several low-level parsers that can be used independently of the spreadsheet-related features if so desired. In addition, the library supports some hierarchical document formats, such as JSON and YAML, which were added later.

You can use this library through its C++ API, its Python API, or its CLI. However, the three do not expose all features of the library equally; the C++ API is more complete than the other two.

The library is physically split into four parts:

1. the parser part that provides the aforementioned low-level parsers,
2. the filter part that provides higher-level import filters for spreadsheet and hierarchical documents that internally use the low-level parsers,
3. the spreadsheet document model part that includes the document model suitable for storing spreadsheet document contents, and
4. the CLI for loading and converting spreadsheet and hierarchical documents.

If you only need the parser part of the library, you only need to link against the liborcus-parser library file. If you need to use the import filter part, link against both the liborcus-parser and the liborcus libraries.
Likewise, if you need to use the spreadsheet document model part, link against the aforementioned two plus the liborcus-spreadsheet-model library. Also note that the spreadsheet document model part has an additional dependency on the ixion library for handling formula re-calculations on document load.

1.2 Loading spreadsheet documents

The orcus library’s primary aim is to provide a framework to import the contents of documents stored in various spreadsheet or spreadsheet-like formats. It supports two primary use cases.

The first use case is where the client program does not have its own document model, but needs to import data from a spreadsheet-like document file and access its content without implementing its own document store from scratch. In this use case, you can simply use the document class to get it populated, and access its content through its API afterward.

The second use case, which is a bit more advanced, is where the client program already has its own internal document model, and needs to use orcus to populate that model. In this use case, you can implement your own set of classes that support the necessary interfaces, and pass them to the orcus import filter.

For each document type that orcus supports, there is a top-level import filter class that serves as an entry point for loading the content of a document. You don’t pass your document to this filter directly; instead, you wrap your document with what we call an import factory, then pass this factory instance to the loader. The import factory is required to implement the interfaces that the filter class uses to pass data to the document as the file is being parsed.

When using orcus’s own document model, you can simply use orcus’s own import factory implementation to wrap its document.
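The relationship between a filter, an import factory, and a document can be pictured with a small standalone sketch. Every name below (cell_sink, my_document, my_factory, toy_filter, demo_load) is hypothetical and uses only the standard library; none of them is an orcus interface. The sketch only illustrates the idea that the filter talks to an interface, and the factory forwards the data into whatever document it wraps.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical interface that a filter calls while parsing a file.
// Illustrative only; this is not one of the orcus interfaces.
struct cell_sink
{
    virtual ~cell_sink() = default;
    virtual void set_string(int row, int col, const std::string& value) = 0;
};

// A client-owned document model that knows nothing about the filter.
struct my_document
{
    std::map<std::pair<int, int>, std::string> cells;
};

// The "factory" wraps the document and implements the interface.
class my_factory : public cell_sink
{
    my_document& m_doc;

public:
    explicit my_factory(my_document& doc) : m_doc(doc) {}

    void set_string(int row, int col, const std::string& value) override
    {
        m_doc.cells[{row, col}] = value;
    }
};

// Stand-in for an import filter: it only sees the interface, never the
// document itself.
class toy_filter
{
    cell_sink* m_sink;

public:
    explicit toy_filter(cell_sink* sink) : m_sink(sink) {}

    // Pretend to parse a file by pushing three header cells.
    void read()
    {
        m_sink->set_string(0, 0, "Number");
        m_sink->set_string(0, 1, "String");
        m_sink->set_string(0, 2, "Formula");
    }
};

// Wire the three pieces together in the same order as the text describes:
// document, then factory, then filter.
my_document demo_load()
{
    my_document doc;
    my_factory factory(doc);
    toy_filter filter(&factory);
    filter.read();
    return doc;
}
```

Because toy_filter depends only on cell_sink, the same filter could populate a different document type by swapping in a different factory, which is the design choice the import factory mechanism relies on.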
When using your own document model, on the other hand, you’ll need to implement your own set of interface classes to wrap your document with.

The following sections describe how to load a spreadsheet document by using 1) orcus’s own spreadsheet document class, and 2) a user-defined custom document class.

1.2.1 Use orcus’s spreadsheet document class

If you want to use orcus’s document as your document store, you can use the import_factory class that orcus provides, which already implements all necessary interfaces. The example code shown below illustrates how to do this:

    #include <orcus/spreadsheet/document.hpp>
    #include <orcus/spreadsheet/factory.hpp>
    #include <orcus/orcus_ods.hpp>

    #include <ixion/model_context.hpp>

    #include <cassert>   // assert
    #include <cstdlib>   // EXIT_SUCCESS
    #include <iostream>

    using namespace orcus;

    int main()
    {
        // Instantiate a document, and wrap it with a factory.
        spreadsheet::document doc;
        spreadsheet::import_factory factory(doc);

        // Pass the factory to the document loader, and read the content from a file
        // to populate the document.
        orcus_ods loader(&factory);
        loader.read_file("/path/to/document.ods");

        // Now that the document is fully populated, access its content.
        const ixion::model_context& model = doc.get_model_context();

        // Read the header row and print its content.

        ixion::abs_address_t pos(0, 0, 0); // Set the cell position to A1.
        ixion::string_id_t str_id = model.get_string_identifier(pos);

        const std::string* s = model.get_string(str_id);
        assert(s);
        std::cout << "A1: " << *s << std::endl;

        pos.column = 1; // Move to B1
        str_id = model.get_string_identifier(pos);
        s = model.get_string(str_id);
        assert(s);
        std::cout << "B1: " << *s << std::endl;

        pos.column = 2; // Move to C1
        str_id = model.get_string_identifier(pos);
        s = model.get_string(str_id);
        assert(s);
        std::cout << "C1: " << *s << std::endl;

        return EXIT_SUCCESS;
    }

This example code loads a file saved in the Open Document Spreadsheet format.
It consists of the following content on its first sheet.

[Screenshot of the first sheet of the document.]

While it is not clear from this screenshot, cell C2 contains the formula CONCATENATE(A2, " ", B2) to concatenate the content of A2 and B2 with a space between them. Cells C3 through C7 contain similar formula expressions.

Let’s walk through this code step by step. First, we need to instantiate the document store. Here we are using the concrete document class available in orcus. We then immediately pass this document to the import_factory instance, also from orcus:

    // Instantiate a document, and wrap it with a factory.
    spreadsheet::document doc;
    spreadsheet::import_factory factory(doc);

The next step is to create the loader instance and pass the factory to it:

    // Pass the factory to the document loader, and read the content from a file
    // to populate the document.
    orcus_ods loader(&factory);

In this example we are using the orcus_ods filter class because the document we are loading is of the Open Document Spreadsheet type. The process is the same for other document types; the only difference is the name of the class.

Once the filter object is constructed, we simply load the file by calling its read_file() method and passing the path to the file as its argument:

    loader.read_file("/path/to/document.ods");

Once this call returns, the document has been fully populated. The rest of the code accesses the content of the first row of the first sheet of the document. First, you need to get a reference to the internal cell value store that we call the model context:

    const ixion::model_context& model = doc.get_model_context();

Since the content of cell A1 is a string, to get the value you first need to get the ID of the string:

    ixion::abs_address_t pos(0, 0, 0); // Set the cell position to A1.
    ixion::string_id_t str_id = model.get_string_identifier(pos);

Once you have the ID of the string, you can pass it to the model to get the actual string value and print it to the standard output:

    const std::string* s = model.get_string(str_id);
    assert(s);
    std::cout << "A1: " << *s << std::endl;

Here we assume that a string value exists for the given ID. If you pass a string ID value to the get_string() method and there is no string value associated with it, you’ll get a null pointer instead.

You need this two-step process to get a string value because all the string values stored in the cells are pooled at the document model level, and the cells themselves only store the ID values.

You may also have noticed that the types surrounding the ixion::model_context class are all in the ixion namespace. This is because orcus’s own document class uses the formula engine from the ixion library to calculate the results of the formula cells inside the document, and the formula engine requires all cell values to be stored in the ixion::model_context instance.

Note: The document class in orcus uses the formula engine from the ixion library to calculate the results of the formula cells stored in the document.

The rest of the code basically repeats the same process for cells B1 and C1:

    pos.column = 1; // Move to B1
    str_id = model.get_string_identifier(pos);
    s = model.get_string(str_id);
    assert(s);
    std::cout << "B1: " << *s << std::endl;

    pos.column = 2; // Move to C1
    str_id = model.get_string_identifier(pos);
    s = model.get_string(str_id);
    assert(s);
    std::cout << "C1: " << *s << std::endl;

You will see the following output when you compile and run this code:

    A1: Number
    B1: String
    C1: Formula

Accessing the numeric cell values is a bit simpler since the values are stored directly with the cells.
Using the document from the above example code, the following code:

    for (spreadsheet::row_t row = 1; row <= 6; ++row)
    {
        ixion::abs_address_t pos(0, row, 0);
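The two-step string lookup described earlier in this section (cell to string ID, then string ID to pooled string) can be pictured with a small standalone sketch. The string_pool class and the string_or helper below are hypothetical and use only the standard library; they are not part of orcus or ixion, and they make no claim about how ixion stores strings internally. They only illustrate why a lookup by ID can legitimately return a null pointer, and how a caller might handle that case without relying on assert.

```cpp
#include <cstddef>
#include <deque>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for a document-level string pool.
class string_pool
{
    // ID -> string. A deque keeps element addresses stable across push_back,
    // so pointers handed out by get_string() stay valid as the pool grows.
    std::deque<std::string> m_strings;
    // string -> ID, used to avoid storing the same value twice.
    std::unordered_map<std::string, std::size_t> m_ids;

public:
    // Store a string once and return its ID; repeated values share one entry.
    std::size_t intern(const std::string& s)
    {
        auto it = m_ids.find(s);
        if (it != m_ids.end())
            return it->second;

        std::size_t id = m_strings.size();
        m_strings.push_back(s);
        m_ids.emplace(s, id);
        return id;
    }

    // Return a pointer to the pooled string, or nullptr for an unknown ID.
    const std::string* get_string(std::size_t id) const
    {
        return id < m_strings.size() ? &m_strings[id] : nullptr;
    }

    std::size_t size() const { return m_strings.size(); }
};

// Look up a string by ID and fall back to a caller-supplied value instead of
// asserting when the ID is unknown.
std::string string_or(
    const string_pool& pool, std::size_t id, const std::string& fallback)
{
    const std::string* s = pool.get_string(id);
    return s ? *s : fallback;
}
```

In a pool of this kind, cells only need to hold a small integer ID, and a value that appears in many cells is stored once. The cost is the extra lookup step and the need to check the returned pointer.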