Extracting Body Text from Academic PDF Documents for Text Mining

Total Page:16

File Type:pdf, Size:1020Kb

Extracting Body Text from Academic PDF Documents for Text Mining Extracting Body Text from Academic PDF Documents for Text Mining Changfeng Yu, Cheng Zhang and Jie Wang Department of Computer Science, University of Massachusetts, Lowell, MA, U.S.A. Keywords: Body-text Extraction, HTML Replication of PDF, Line Sweeping, Backward Traversal. Abstract: Accurate extraction of body text from PDF-formatted academic documents is essential in text-mining appli- cations for deeper semantic understandings. The objective is to extract complete sentences in the body text into a txt file with the original sentence flow and paragraph boundaries. Existing tools for extracting text from PDF documents would often mix body and nonbody texts. We devise and implement a system called PDFBoT to detect multiple-column layouts using a line-sweeping technique, remove nonbody text using computed text features and syntactic tagging in backward traversal, and align the remaining text back to sentences and para- graphs. We show that PDFBoT is highly accurate with average F1 scores of, respectively, 0.99 on extracting sentences, 0.96 on extracting paragraphs, and 0.98 on removing text on tables, figures, and charts over a corpus of PDF documents randomly selected from arXiv.org across multiple academic disciplines. 1 INTRODUCTION bitary layouts is challenging, due to the utmost flex- ibility of PDF typesetting. Instead, we focus on BT It is desirable for text mining applications to extract extraction from single-column and multiple column complete sentences and correct boundaries of para- research papers, reports, and case studies. We do so graphs from the body text of a PDF document into by working with the location, font size, and font style a txt file without hard breaks inside each paragraph. of each character, and the locations and sizes of other Layered reading (http://dooyeed.com) and extractive objects. While a PDF file provides such information, summarization, for example, are such applications. we find it easier to work with HTML replications pro- Layered reading allows the reader to read the most duced by an exiting tool named pdf2htmlEX (Wang, important layer of sentences first based on sentence 2014), with almost the same look and feel of the orig- rankings, then the layer of next important sentences inal PDF document, providing necessary formatting interleaving with the previous layers of sentences in information via HTML tags, classes, and id’s in the the original order of the document, and continue in underlying DOM tree. this fashion until the entire document is read. We devise a system named PDFBoT (PDF to Body Text) that, using pdf2htmlEX as a black box, By “body text” (BT in short) it means the main incorporates certain text formatting features produced text of an article, excluding “nonbondy text” (NBT in by it to identify NBT texts. We use a line-sweeping short) such as headings, footings, sidings (i.e., text on method to detect multi-column layouts and the area side margins), tables, figures, charts, captions, titles, for printing the BT text. We also develop multiple authors, affiliations, and math expressions in the dis- tests to identify NBT text inside the BT-text area and play mode, among other things. use a backward traversal method to deploy these tests. Most existing tools for extracting text from PDF In addition, we use POS (Part-of-Speech) tagging to documents, including pdftotext (FooLabs, 2014) and help identify NBT text that are harder to distinguish. PDFBox (Apache, 2017), extract a mixture of both The rest of the paper is organized as follows: BT and NBT texts. Identifying BT text from such Section 2 is related work on text extractions from mixtures of texts is challenging, if not impossible. PDF. Section 3 describes HTML replications via Other tools extract texts according to rhetorical cate- pdf2htmlEX and Sections 4 presents the architecture gories such as LA-PDFText (Burns, 2013) and logical of PDFBoT and its features Section 5 is evaluation re- text blocks such as Icecite (Korzen, 2017), which only sults with F1 scores and running time, and Section 6 provide a suboptimal solution to our applications. is conclusions and final remarks. Extracting BT text from PDF documents of ar- 235 Yu, C., Zhang, C. and Wang, J. Extracting Body Text from Academic PDF Documents for Text Mining. DOI: 10.5220/0010131402350242 In Proceedings of the 12th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management (IC3K 2020) - Volume 1: KDIR, pages 235-242 ISBN: 978-989-758-474-9 Copyright c 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved KDIR 2020 - 12th International Conference on Knowledge Discovery and Information Retrieval 2 RELATED WORK et al., 2019; Wang et al., 2018; Phong et al., 2020). In summary, previous methods, while meeting Existing tools, such as pdftotext (FooLabs, 2014) and with certain success, still fall short of the desired ac- PDFBox (Apache, 2017) the two most widely-used curacy required by text-mining applications relying tools for extracting text from PDF, and a number of on clean extractions of complete sentences and cor- other tools such as pdftohtml (Kruk, 2013), pdftoxml rect boundaries of paragraphs in BT text. (Dejean and Giguet, 2016), pdf2xml (Tiedemann, 2016), ParsCit (Kan, 2016), PDFMiner (Shinyama, 2016), pdfXtk (Hassan, 2013), pdf-extract (Ward, 3 HTML REPLICATION OF PDF 2015), pdfx (Constantin et al, 2011), PDFExtract (Berg, 2011), and Grobid (Lopez, 2017), extract text HTML technologies have been used to replicate PDF from PDF extract BT text and NBT text together with- layouts to facilitate online publishing. A PDF docu- out a clear distinction. PDFBox can extract text in ment can be represented as a sequence of pages, with two-column layouts; some other tools extract text line each page being a DOM tree of objects with sufficient by line across columns. information for an HTML viewer to display the con- Using heuristics is a common approach. For ex- tent (Wang and Liu, 2013). The text extracted from ample, the Java PDF library was used to obtain a PDF by pdf2htmlEX (Wang, 2014) are translated into bounding box for each word, compute the distance HTML text elements that are placed into the same po- between neighboring words, connect them based on a sitions as they are displayed by PDF. set of rules to form a larger text block, place them into Let F denote a PDF document and f the HTML rhetorical categories, and connect these categories file produced by pdf2htmlEX on F. The DOM tree following the order of the underlying document (Ra- for f , denoted by Tf , is divided into four levels: doc- makrishnan et al., 2012). However, this method fails ument, page, text line, and text block (TBK in short). to align broken sentence and determine text on formu- (1) Document Structure. Tf starts with the following las, tables, or figures. Using an intermediate HTML tag as the root: hdiv id=“page-container”i, and each representation generated by pdftohtml (Yildiz et al., of its children is the root of a subtree for a page, listed 2005). Text blocks may also be created by grouping in sequence, with an id indicating its page number characters based on their relative positions (Shigarov and a class name indicating the width and height of et al., 2016), while extracting the tables in PDF. These a page. For example, a child node with hdiv id=“pf7” two methods are focused only on extracting tables. class=“pf w0 h0 data-page-no=“7”i is the root of the Other methods include rule-based and machine- subtree for Page 7, where w0 and h0 are the width and learning models. For example, text may be placed height of the page (specifying the printable area) with into predefined logical text blocks based on a set of the origin at the lower-left corner of the page. rules on the distance, positions, fonts of characters, (2) Page Structure. Each page starts with a page node, words, and text lines (Bast and Korzen, 2017). How- followed by object nodes with contents to be printed. ever, these rules also connect text on tables or fig- Each object occupies a rectangular area (a bounding ures as BT text. A Conditional Random Field (CRF) box) specified on a coordinate system of pixels. The model is trained (Luong et al., 2011; Romary and text of the document is divided into TBKs as leaf Lopez, 2015) to extract texts according to a prede- nodes. Each TBK is represented by a hdivi tag with fined rhetorical category, such as title, abstract, and corresponding attributes, and so the text in a TBK are other sections in the input document. However, this either all BT text or all NBT text. Each object is iden- model fails to determine paragraph boundaries or tified by coordinates (x;y) at the lower-left corner of align broken sentences, among other things. the bounding box relative to the coordinates of its par- CiteSeerX (Giles, 2006), a search engine, extracts ent node. In what follows, these coordinates are re- metadata from indexed articles in scientific docu- ferred to as the starting point of the underlying object. ments for searching purpose, but not focused on the In addition to the starting point, a non-textual object is accuracy of extracting body text. PDFfigures (Clark specified by a width and a height, and a TBK is speci- and Divvala, 2015) chunks the text table and figure fied with a height without a width, where the width is into blocks, then classifies these blocks into captions, implied by the enclosed text, font size and style, and body text, and part-of-figure text. Recent studies have word spacing. The parent of each object may either be shifted attentions to extracting certain types of text, the origin, a node for a figure or a table, or a node due including titles (Yang et al., 2019) (but not text on ta- to some (probably invisible) formatting code.
Recommended publications
  • Paratext in Bible Translations with Special Reference to Selected Bible Translations Into Beninese Languages
    DigitalResources SIL eBook 58 ® Paratext in Bible Translations with Special Reference to Selected Bible Translations into Beninese Languages Geerhard Kloppenburg Paratext in Bible Translations with Special Reference to Selected Bible Translations into Beninese Languages Geerhard Kloppenburg SIL International® 2013 SIL e-Books 58 2013 SIL International® ISSN: 1934-2470 Fair-Use Policy: Books published in the SIL e-Books (SILEB) series are intended for scholarly research and educational use. You may make copies of these publications for research or instructional purposes free of charge (within fair-use guidelines) and without further permission. Republication or commercial use of SILEB or the documents contained therein is expressly prohibited without the written consent of the copyright holder(s). Editor-in-Chief Mike Cahill Compositor Margaret González VRIJE UNIVERSITEIT AMSTERDAM PARATEXT IN BIBLE TRANSLATIONS WITH SPECIAL REFERENCE TO SELECTED BIBLE TRANSLATIONS INTO BENINESE LANGUAGES THESIS MASTER IN LINGUISTICS (BIBLE TRANSLATION) THESIS ADVISOR: DR. L.J. DE VRIES GEERHARD KLOPPENBURG 2006 TABLE OF CONTENTS 1. INTRODUCTION.................................................................................................................. 3 1.1 The phenomenon of paratext............................................................................................ 3 1.2 The purpose of this study ................................................................................................. 5 2. PARATEXT: DEFINITION AND DESCRIPTION.............................................................
    [Show full text]
  • Dissertation Body Text FINAL SUBMISSION VERSION
    UCLA UCLA Electronic Theses and Dissertations Title Openness to the Development of the Relationship: A Theory of Close Relationships Permalink https://escholarship.org/uc/item/2030n6jk Author Page, Emily Publication Date 2021 Peer reviewed|Thesis/dissertation eScholarship.org Powered by the California Digital Library University of California UNIVERSITY OF CALIFORNIA Los Angeles Openness to the Development of the Relationship: A Theory of Close Relationships A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Philosophy by Emily Page 2021 © Copyright by Emily Page 2021 ABSTRACT OF THE DISSERTATION Openness to the Development of the Relationship: A Theory of Close Relationships by Emily Page Doctor of Philosophy in Philosophy University of California, Los Angeles, 2021 Professor Alexander Jacob Julius, Chair In a word, this dissertation is about friendship. I begin by raising a problem related to one traditionally found within some of the philosophical literature regarding moral egalitarianism: that of partiality and friendship, except that I raise the issue of partiality within the context of one’s close relationships. From here I propose a solution to the problem based on understanding a close relationship as one in which friends possess the attitude of openness to the development of the relationship. The remainder of the dissertation is concerned with elaborating upon and explaining this conception of friendship and its consequences. In the course of doing this I propose a theory of the self and how we relate to one another, consider the importance of the psychophysical self, explore the notion of mutual recognition, reflect on relationship’s end, and, finally, explore the connections between friendship and play.
    [Show full text]
  • Thesis and Dissertation
    Thesis and Dissertation UWG General Guidelines for Formatting and Processing Go West. It changes everything. 2 TABLE OF CONTENTS Table of Contents Thesis and Dissertation Format and Processing Guidelines ...................................................... 3 General Policies and Regulations .................................................................................................. 5 Student Integrity ........................................................................................................................ 5 Submission Procedures ............................................................................................................ 5 Format Review ...................................................................................................................... 5 Typeface .................................................................................................................................... 6 Margins ...................................................................................................................................... 6 Spacing ...................................................................................................................................... 6 Pagination ................................................................................................................................. 6 Title Page .................................................................................................................................. 7 Signature Page ........................................................................................................................
    [Show full text]
  • Removing Boilerplate and Duplicate Content from Web Corpora
    Masaryk University Faculty}w¡¢£¤¥¦§¨ of Informatics !"#$%&'()+,-./012345<yA| Removing Boilerplate and Duplicate Content from Web Corpora Ph.D. thesis Jan Pomik´alek Brno, 2011 Acknowledgments I want to thank my supervisor Karel Pala for all his support and encouragement in the past years. I thank LudˇekMatyska for a useful pre-review and also for being so kind and cor- recting typos in the text while reading it. I thank the KrdWrd team for providing their Canola corpus for my research and namely to Egon Stemle for his help and technical support. I thank Christian Kohlsch¨utterfor making the L3S-GN1 dataset publicly available and for an interesting e-mail discussion. Special thanks to my NLPlab colleagues for creating a great research environment. In particular, I want to thank Pavel Rychl´yfor inspiring discussions and for an interesting joint research. Special thanks to Adam Kilgarriff and Diana McCarthy for reading various parts of my thesis and for their valuable comments. My very special thanks go to Lexical Computing Ltd and namely again to the director Adam Kilgarriff for fully supporting my research and making it applicable in practice. Last but not least I thank my family for all their love and support throughout my studies. Abstract In the recent years, the Web has become a popular source of textual data for linguistic research. The Web provides an extremely large volume of texts in many languages. However, a number of problems have to be resolved in order to create collections (text corpora) which are appropriate for application in natural language processing. In this work, two related problems are addressed: cleaning a boilerplate and removing duplicate and near-duplicate content from Web data.
    [Show full text]
  • Adobe Type 1 Font Format Adobe Systems Incorporated
    Type 1 Specifications 6/21/90 final front.legal.doc Adobe Type 1 Font Format Adobe Systems Incorporated Addison-Wesley Publishing Company, Inc. Reading, Massachusetts • Menlo Park, California • New York Don Mills, Ontario • Wokingham, England • Amsterdam Bonn • Sydney • Singapore • Tokyo • Madrid • San Juan Library of Congress Cataloging-in-Publication Data Adobe type 1 font format / Adobe Systems Incorporated. p. cm Includes index ISBN 0-201-57044-0 1. PostScript (Computer program language) 2. Adobe Type 1 font (Computer program) I. Adobe Systems. QA76.73.P67A36 1990 686.2’2544536—dc20 90-42516 Copyright © 1990 Adobe Systems Incorporated. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of Adobe Systems Incorporated and Addison-Wesley, Inc. Printed in the United States of America. Published simultaneously in Canada. The information in this book is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Adobe Systems Incorporated. Adobe Systems Incorporated assumes no responsibility or liability for any errors or inaccuracies that may appear in this book. The software described in this book is furnished under license and may only be used or copied in accordance with the terms of such license. Please remember that existing font software programs that you may desire to access as a result of information described in this book may be protected under copyright law. The unauthorized use or modification of any existing font software program could be a violation of the rights of the author.
    [Show full text]
  • Conforming to Heaven Organizational Principles of the Shuō Wén Jiě Zì
    Conforming to Heaven Organizational Principles of the Shuō wén jiě zì Rickard Gustavsson S1581066 [email protected] Supervisor: Dr. P. van Els MA Thesis Asian Studies: Chinese Studies Faculty of Humanities Leiden University Word count: 14,879 (excluding appendices) 14 July 2016 1 Table of contents 1. INTRODUCTION .......................................................................................................................... 3 1.1 RESEARCH TOPIC ........................................................................................................................ 3 1.2 LITERATURE REVIEW ................................................................................................................... 4 1.3 METHODOLOGY AND OBJECTIVES ............................................................................................... 10 2. THE 一 YĪ SECTION ................................................................................................................... 12 2.1 THE 一 YĪ RADICAL .................................................................................................................... 12 2.2 元 YUÁN ................................................................................................................................... 14 2.3 天 TIĀN ..................................................................................................................................... 15 2.4 丕 PĪ .......................................................................................................................................
    [Show full text]
  • Qimaging Digital Proof Guidelines NEW.Indd
    Welcome to Quad/Graphics! Client Reference Guide PLANT PROFILE Waseca, Minnesota ADDRESS A web offset plant, Waseca specializes in weekly, bi- Finishing 2300 Brown Avenue weekly and monthly special interest publications; • Waseca’s array of perfect binders includes 24-, 32-, Waseca, MN 56093 consumer magazines and business-to-business 46- and 52-pocket machines. (P) 320.654.2400 publications; and catalogs and retail inserts. (F) 507.835.0420 • The plant operates 8-pocket, 18-pocket, FAQS 20-pocket and 28-pocket saddle stitchers. Year Opened: 1957 Plant Size: 784,000 sq. ft. • Finishing equipment includes polywrappers, Employees: 750 roto trimmers with quarter folders, flat cutters, Product Mix: Publications, catalogs and retail inserts shrink wrappers, a bellyband/auto wrapper a collator and an offline mailer with selective inkjet PLANT TEAM capabilities. Plant Director: George Forge Plant Controller: Julie Mast Distribution Customer Service Manager: Michael Morrison • The plant offers full logistics and distribution Press Department Manager: Darin Coraggio services, including list services, logistics planning, Premedia: David Jorgensen package services, ground expediting, newsstand Finishing Department Manager: Mike Kelly distribution, product tracking, ocean freight, firm Distribution: Daniel Walock bundling, co-palletization and co-mailing. Postal Solutions: Brenda Mullins Human Resources: Ghulam Awan POINTS OF INTEREST • Quad/Graphics acquired the Waseca plant in SERVICES/CAPABILITIES 2014 as part of our acquisition of Brown Printing Printing Company. The plant was Brown’s original site and • The plant is equipped with single- and double- served as the company’s headquarters. web offset presses featuring closed loop control • The plant holds chain-of-custody paper control systems, including Goss Sunday 3000 certifications from FSC, FSI and PEFC.
    [Show full text]
  • Rocket Multivalue Integration Server Installation
    Rocket MultiValue Integration Server Installation and User Guide Version 1.3.0 March 2021 MVIS-130-IUG-01 Notices Edition Publication date: March 2021 Book number: MVIS-130-IUG-01 Product version: Version 1.3.0 Copyright © Rocket Software, Inc. or its affiliates 1996–2021. All Rights Reserved. Trademarks Rocket is a registered trademark of Rocket Software, Inc. For a list of Rocket registered trademarks go to: www.rocketsoftware.com/about/legal. All other products or services mentioned in this document may be covered by the trademarks, service marks, or product names of their respective owners. Examples This information might contain examples of data and reports. The examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. License agreement This software and the associated documentation are proprietary and confidential to Rocket Software, Inc. or its affiliates, are furnished under license, and may be used and copied only in accordance with the terms of such license. Note: This product may contain encryption technology. Many countries prohibit or restrict the use, import, or export of encryption technologies, and current use, import, and export regulations should be followed when exporting this product. 2 Corporate information Rocket Software, Inc. develops enterprise infrastructure products in four key areas: storage, networks, and compliance; database servers and tools; business information and analytics; and application development, integration, and modernization. Website: www.rocketsoftware.com Rocket Global Headquarters 77 4th Avenue, Suite 100 Waltham, MA 02451-1468 USA To contact Rocket Software by telephone for any reason, including obtaining pre-sales information and technical support, use one of the following telephone numbers.
    [Show full text]
  • A Corpus-Based Study of the Effects of Collocation, Phraseology
    A CORPUS-BASED STUDY OF THE EFFECTS OF COLLOCATION, PHRASEOLOGY, GRAMMATICAL PATTERNING, AND REGISTER ON SEMANTIC PROSODY by TIMOTHY PETER MAIN A thesis submitted to the University of Birmingham for the degree of DOCTOR OF PHILOSOPHY Department of English Language and Applied Linguistics College of Arts and Law The University of Birmingham December 2017 University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. Abstract This thesis investigates four factors that can significantly affect the positive-negative semantic prosody of two high-frequency verbs, CAUSE and HAPPEN. It begins by exploring the problematic notion that semantic prosody is collocational. Statistical measures of collocational significance, including the appropriate span within which evaluative collocates are observed, the nature of the collocates themselves, and their relationship to the node are all examined in detail. The study continues by examining several semi-preconstructed phrases associated with HAPPEN. First, corpus data show that such phrases often activate evaluative modes other than semantic prosody; then, the potentially problematic selection of the canonical form of a phrase is discussed; finally, it is shown that in some cases collocates ostensibly evincing semantic prosody occur in profiles because of their occurrence in frequent phrases, and as such do not constitute strong evidence of semantic prosody.
    [Show full text]
  • Preparing Your Printer-Ready Manuscript
    Preparing Your Printer-Ready Manuscript Revised: 30 June 2003 Introduction ....................................................................................................... 1 What Is a Printer-Ready Manuscript? 1 Why Read This Manual? 1 Preparing Your Printer-Ready Manuscript 1 Page and Document Setup............................................................................... 1 Page Size 1 Headers and Footers 2 Chapter Title Pages 2 Pagination 3 Font Selection..................................................................................................... 3 Style and Size 3 Spacing 4 Text Formatting................................................................................................. 5 Headings and Subheadings 5 Paragraphs 5 Line Spacing 5 Block Quotations/Extracts 6 Footnotes and Endnotes 6 Bibliographies and Indices 7 Orphans and Widows 7 Stacking 7 Typical SBL Book Specifications: 6 x 9 ........................................................... 8 Indices................................................................................................................. 8 Photographs and Illustrations ......................................................................... 8 Cover Design ..................................................................................................... 9 Preparing a Printer-Ready PDF File ............................................................. 10 Software 10 Settings 10 Creating a PDF File 11 Checking Your Printer-Ready File 11 Printing a Final Copy of Your Manuscript .................................................
    [Show full text]
  • Create a Report with Formatting, Headings, Page Numbers and Table of Contents
    Next page Create a report with formatting, headings, page numbers and table of contents MS Office Word 2010 Combine this model with instructions from your teacher and your report will be something you can be proud of. I have made a sample report based on this instructions. You can find it here. ICT-instructor LTU Christer Wahlberg MS Word 2010 Contents Next Contents page Outline Formatting Headings The links on the left leads you directly to Body a particular topic. Some topics consist of several pages. References Page and Section Breaks Pagination Table of Contents Multilevel List Table of Figures Final touch ICT-instructor LTU Christer Wahlberg MS Word 2010 Contents Next Outline page Start by creating the report's outline. It may look slightly different depending on in which department you are studying. An example: • The title page Once you know the layout and what to • Foreword enter under the headings, type your text • Summary into a word processor. This guide shows • Abstract MS Office Word 2010, but, with a little • Table of Contents(TOC) ingenuity you can transfer the instructions • Introduction to other programs. • Theory • Method • Evaluation/Outcomes • Discussion • References • Appendices ICT-instructor LTU Christer Wahlberg MS Word 2010 Contents Next Outline, continued page Fill your report with text and anything else that should be there. Don´t worry about any Write your text formatting at continuously under this stage. each heading. The only time you should use the ENTER key is where you want to create a new paragraph! And maybe to create This heading will be at the a little "air" on the top of next page, but we will pages once you solve that later by inserting a finish formatting.
    [Show full text]
  • Headings and Subheadings Manual
    Headings and Subheadings Manual This manual discusses: 1. The formatting requirements for headings and chapters/ sections. 2. It provides step-by-step instructions on how to set up heading styles (major headings and subheadings) in Microsoft Word. This will help with consistently formatting both headings and subheadings. Heading styles are particularly important because they allow Word to index the location of each individual heading and subheading. This will then enable Word to automatically generate a Table of Contents (discussed in the Table of Contents Manual), rather than having to manually create one. Thus, this will ultimately save you time and ensure that the headings and subheadings are consistently labeled on your Table of Contents. Briefly, it is important to note the different types of headings discussed in this manual. 1. Major headings: These include chapter headings as well as the headings for any other major section in your document (e.g., Abstract, Table of Contents, Acknowledgements, References, Appendices/Appendix, and Curriculum Vitae). 2. Subheadings: These include the different section headings within your chapters. You can have primary (first-level) subheadings, secondary (second-level) subheadings, etc. Sections: Section 1: Formatting Requirements for Headings and Chapters/ Sections (p. 2) Section 2: Formatting Major Headings (p. 3) Section 3: Formatting Subheadings (p. 6) Section 4: Clearly Mistakes Related to Style Formatting (p. 7) Updated by GC SST Summer 2021 Section 1: Formatting Requirements for Headings and Chapters/ Sections Formatting Requirements for Headings: • Heading sizes and styles must be consistent throughout the document. This means that all major headings (e.g., Table of Contents, chapter headings, References, etc.) must have the same style.
    [Show full text]