An Overview of Pandoc

TUGboat, Volume 35 (2014), No. 1, p. 44

Massimiliano Dominici

(Translation by the author from his original in ArsTeXnica #15, April 2013, “Una panoramica su Pandoc”, pp. 31–38.)

Abstract

This paper is a short overview of Pandoc, a utility for the conversion of Markdown-formatted texts to many output formats, including LaTeX and HTML.

1 Introduction

Pandoc is software, written in Haskell, whose aim is to facilitate conversion between some lightweight markup languages and the most widespread ‘final’ document formats.¹ On the program’s website [3], Pandoc is described as a sort of ‘swiss army knife’ for converting between different formats, and in fact it is able to read simple files written in LaTeX or HTML; but it is of lesser use when translating LaTeX documents with non-trivial constructs, such as commands defined by the user or by a dedicated package.

Pandoc shows its real utility, in my opinion, when several output formats must be obtained from a single source, as in the case of a document distributed online (HTML), in print (PDF via LaTeX) and for viewing on tablets or ebook readers (EPUB). In such cases one may find that writing the document in a rich format (e.g. LaTeX) and converting it later to other markup languages often poses significant problems, because of the different ‘philosophies’ that underlie each language. It is advisable, instead, to choose as a starting point a language that is ‘neutral’ by design. A good candidate for this role is a lightweight markup language, and in particular Markdown, of which Pandoc is an excellent interpreter.

In this article we will briefly discuss the concept of a ‘lightweight markup language’ with particular reference to Markdown (§2), and then we will review Pandoc in more detail (§3) before drawing our conclusions (§5).

2 Lightweight markup languages: Markdown

Before getting to the heart of the matter, it is advisable to say a few words about lightweight markup languages (LML) in general. They are designed with the explicit goal of minimizing the impact of the markup instructions within the document, with a particular emphasis on the readability of the text by a human being, even when the latter does not know the (few) conventions that the program follows in order to format the document.

These languages are mainly used in two fields: documentation of code (reStructuredText, AsciiDoc, etc.) and management of contents for the web (Markdown, Textile, etc.). In the case of code documentation, an LML is a good choice because the documentation is interspersed in the code itself: it should be easy to read for a developer perusing the code, but at the same time it should be convertible to presentation formats (traditionally PDF and HTML, although today many IDEs include some form of visualization for the internal documentation). In the case of web content, the emphasis is placed on ease of writing for the user. Many content management systems already provide plugins for one or more of these languages, and the same is true for static site generators,² which are usually built around one of them and often provide support for others. The various wiki dialects can be considered another instance of LML.

The actual ‘lightness’ of an LML depends greatly on its ultimate purpose. In general, an LML conceived for code documentation will be more complex and less readable than one conceived for web content management, which in turn will often not be capable of general semantic markup. A paradigmatic example of this second category is Markdown, which in its original version stays rigorously close to the minimalistic approach of the first LMLs. The following quotation from its author, John Gruber, explains his intentions in designing Markdown:

    Markdown is intended to be as easy-to-read and easy-to-write as is feasible. Readability, however, is emphasized above all else. A Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.³

The only output format targeted by the reference implementation of Markdown is HTML; indeed, Markdown also allows raw HTML code. Gruber has always adhered to these initial premises and has consistently refused to extend the language beyond the original specifications. This stance has caused a proliferation of variants, so that every single implementation constitutes an ‘enhanced’ version. Famous websites like GitHub, reddit and Stack Overflow all support their own Markdown flavour, and the same is true for conversion programs like MultiMarkdown or Pandoc itself, which also introduce new output formats. It is not necessary here to examine the details of the different flavours; the reader can get an idea of the basic formatting rules from Tables 1 and 2. Of course, the reference implementation has no LaTeX output, so I have provided the most logical translation. In the following sections we will see how Pandoc works in practice.

¹ Strictly speaking, LaTeX isn’t a ‘final’ document format in the same way PDF, ODF, DOC, EPUB, etc. are. But, from the point of view of a Pandoc user, LaTeX is a ‘final’ product (or, at least, an intermediate one).
² Static site generators are a category of programs that build a website in HTML starting from source files written in a different format. The HTML pages are produced beforehand, usually on a local computer, and then uploaded to the server. Websites built this way closely resemble old websites written directly in HTML but, unlike those, the build process can use templates, share metadata across pages, and create structure and content programmatically. Static site generators constitute an alternative to the more popular dynamic server applications.
³ [1], http://daringfireball.net/projects/markdown/syntax#philosophy. A significant contribution to the design of Markdown was made by Aaron Swartz.

Table 1: Markdown syntax: inline elements.

Element          Markdown                     LaTeX                             HTML
Links            [link](http://example.net)   \href{http://example.net}{link}   <a href="http://example.net">link</a>
Emphasis         _emphasis_                   \emph{emphasis}                   <em>emphasis</em>
                 *emphasis*                   \emph{emphasis}                   <em>emphasis</em>
Strong emphasis  __strong__                   \textbf{strong}                   <strong>strong</strong>
                 **strong**                   \textbf{strong}                   <strong>strong</strong>
Verbatim         `printf()`                   \verb|printf()|                   <code>printf()</code>
Images           ![Alt](/path/to/img.jpg)     \includegraphics{img}             <img src="/path/to/img.jpg" alt="Alt" />

Table 2: Markdown syntax: block elements.

Element       Markdown               LaTeX                HTML
Sections      # Title #              \section{Title}      <h1>Title</h1>
              ## Title ##            \subsection{Title}   <h2>Title</h2>
              ...                    ...                  ...
Quotation     > This paragraph       \begin{quote}        <blockquote><p>
              > will show            This paragraph       This paragraph
              > as quote.            will show            will show
                                     as quote.            as quote.
                                     \end{quote}          </p></blockquote>
Itemize       * First item           \begin{itemize}      <ul>
              * Second item          \item First item     <li>First item</li>
              * Third item           \item Second item    <li>Second item</li>
                                     \item Third item     <li>Third item</li>
                                     \end{itemize}        </ul>
Enumeration   1. First item          \begin{enumerate}    <ol>
              2. Second item         \item First item     <li>First item</li>
              3. Third item          \item Second item    <li>Second item</li>
                                     \item Third item     <li>Third item</li>
                                     \end{enumerate}      </ol>
Verbatim      Text paragraph.        Text paragraph.      <p>Text paragraph.</p>
                                     \begin{verbatim}     <pre><code>grep -i '\$' &lt;file
                  grep -i '\$' <file grep -i '\$' <file   </code></pre>
                                     \end{verbatim}

3 An overview of Pandoc

As mentioned in the introduction, Pandoc is primarily a Markdown interpreter with several output formats: HTML, LaTeX, ConTeXt, DocBook, ODF, OOXML, and other LMLs such as AsciiDoc, reStructuredText and Textile (a complete list can be found in [3]). Pandoc can also convert, with severe restrictions, a source file in LaTeX, HTML, DocBook, Textile or reStructuredText to one of the aforementioned output formats. Moreover, it extends the syntax of Markdown, introducing new elements and providing customization for the elements already available in the reference implementation.

3.1 Markdown syntax extensions

Markdown provides, by design, a very limited set of elements. Tables, footnotes, formulas, and bibliographic references have no specific markup in Markdown. The author’s intent is that all markup exceeding the limits of the language should be expressed in HTML. Pandoc maintains this approach (and, for LaTeX or ConTeXt output, allows the use of raw TeX code) but makes it unnecessary, since it introduces many extensions that give the user proper markup for each of the elements mentioned above. In the following paragraphs we’ll take a look at these extensions.

Metadata

Metadata for title, author and date can be included at the beginning of the file, in a text block where each field sits on its own line preceded by the character %, as in the following example.

    % Title
    % First Author; Second Author
    % 17/02/2013

[…]

    - name: First Author
      affiliation: First Affiliation
    - name: Second Author
      affiliation: Second Affiliation
    ---

and then, in the template,

    $for(author)$
    $if(author.name)$
    $author.name$
    $if(author.affiliation)$ ($author.affiliation$)
    $endif$
    $else$
    $author$
    $endif$
    $endfor$

to get a list of authors with (if present) affiliations. As we will see in section 3.1, a YAML block can also be used to build a bibliographic database.

Footnotes

Since the main purpose of Markdown and its derivatives is readability, the footnote mark and the footnote text should usually be kept apart. It is recommended to write the footnote text just below the paragraph containing the mark, but this is not strictly required: the footnotes could be collected at the beginning or at the end of the document, for instance.
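The footnote syntax itself is not shown in this excerpt. Assuming the usual Pandoc form, a reference such as [^label] in the text and a definition line [^label]: text that may appear anywhere in the file, the following Python sketch illustrates why the position of the definitions does not matter: they are pulled out of the body first, and only then are the references numbered in order of first appearance. The function name collect_footnotes is hypothetical, and multi-paragraph notes and the other details of Pandoc's own parser are ignored.

```python
import re
from typing import Dict, List, Tuple

# Assumed Pandoc-style syntax: [^label] in the text, "[^label]: note text" on its own line.
DEF_RE = re.compile(r"^\[\^([^\]\s]+)\]:[ \t]*(.*)$")
REF_RE = re.compile(r"\[\^([^\]\s]+)\]")


def collect_footnotes(text: str) -> Tuple[str, List[str]]:
    """Separate footnote definitions from the body, wherever they are written.

    Returns the body with each defined reference replaced by [n], plus the
    note texts ordered by first reference. Only single-line notes are handled.
    """
    definitions: Dict[str, str] = {}
    body_lines: List[str] = []
    for line in text.splitlines():
        match = DEF_RE.match(line)
        if match:
            definitions[match.group(1)] = match.group(2).strip()
        else:
            body_lines.append(line)

    order: List[str] = []

    def number(ref) -> str:
        label = ref.group(1)
        if label not in definitions:
            return ref.group(0)  # undefined reference: leave it untouched
        if label not in order:
            order.append(label)
        return f"[{order.index(label) + 1}]"

    body = REF_RE.sub(number, "\n".join(body_lines)).strip()
    return body, [definitions[label] for label in order]


doc = """[^ext]: Footnotes are one of them.

Markdown favours readability.[^why] Pandoc adds extensions.[^ext]

[^why]: Readability above all else.
"""
body, notes = collect_footnotes(doc)
print(body)   # Markdown favours readability.[1] Pandoc adds extensions.[2]
print(notes)  # ['Readability above all else.', 'Footnotes are one of them.']
```

Here the definition of [^ext] sits at the top of the file and [^why] at the bottom, yet the numbering follows the order in which the marks occur in the text.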
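The % title block described under Metadata is line-oriented, so reading it takes only a few lines of code. The Python sketch below assumes exactly the convention of the example: up to three leading lines starting with %, in the order title, authors, date, with several authors separated by semicolons. parse_title_block is a hypothetical name, not part of Pandoc, and any further refinements of Pandoc's title block syntax (for example multi-line fields) are not modelled.

```python
def parse_title_block(text: str) -> dict:
    """Read a %-style title block (title, authors, date) from the top of a document.

    Sketch only: assumes up to three leading lines starting with '%', in the
    order title, authors, date, with authors separated by ';'.
    """
    fields = []
    for line in text.splitlines():
        if not line.startswith("%"):
            break  # the title block ends at the first line without '%'
        fields.append(line[1:].strip())
        if len(fields) == 3:
            break
    meta = {}
    if len(fields) > 0 and fields[0]:
        meta["title"] = fields[0]
    if len(fields) > 1 and fields[1]:
        meta["author"] = [a.strip() for a in fields[1].split(";") if a.strip()]
    if len(fields) > 2 and fields[2]:
        meta["date"] = fields[2]
    return meta


doc = """% Title
% First Author; Second Author
% 17/02/2013

Body text starts here.
"""
print(parse_title_block(doc))
# {'title': 'Title', 'author': ['First Author', 'Second Author'], 'date': '17/02/2013'}
```

Note that the author line becomes a list, which is what makes a loop over authors in a template possible.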
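The branching in the $for(author)$ template loop can be mimicked in Python to make its logic explicit: an entry with a name field prints the name and, if present, the affiliation in parentheses; any other entry is printed as it is. This sketch assumes the YAML author list has already been parsed into Python dicts and strings; render_authors is a hypothetical helper, and Pandoc's real template engine also handles whitespace and other details not modelled here.

```python
from typing import Dict, List, Union

# One entry of the YAML "author" list: either a mapping or a plain string.
Author = Union[Dict[str, str], str]


def render_authors(authors: List[Author]) -> List[str]:
    """Mimic the $for(author)$ ... $endfor$ template loop, one line per author."""
    lines = []
    for author in authors:                                   # $for(author)$
        if isinstance(author, dict) and author.get("name"):  # $if(author.name)$
            line = author["name"]                            # $author.name$
            if author.get("affiliation"):                    # $if(author.affiliation)$
                line += f" ({author['affiliation']})"        # ($author.affiliation$)
            lines.append(line)
        else:                                                # $else$
            lines.append(str(author))                        # $author$
    return lines


authors = [
    {"name": "First Author", "affiliation": "First Affiliation"},
    {"name": "Second Author", "affiliation": "Second Affiliation"},
    "Third Author",  # a plain string takes the $else$ branch
]
print("\n".join(render_authors(authors)))
```

The plain-string entry shows why the template needs the $else$ branch: the same author field may hold either simple names or structured records.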
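The inline correspondences listed in Table 1 can be illustrated with a deliberately naive, regex-based converter from Markdown to LaTeX. This is only a sketch of the mapping, not how Pandoc parses Markdown; md_inline_to_latex is a hypothetical name, images are left out, and nested or escaped markup (for example underscores inside URLs, or a | inside a code span) is not handled.

```python
import re

# Naive, regex-based mapping of some inline elements of Table 1 to LaTeX.
# Not Pandoc's parser: no nesting, escaping or intraword rules are handled.
LINK_RE = re.compile(r"\[([^\]]*)\]\(([^)\s]+)\)")   # [text](url)
STRONG_RE = re.compile(r"(\*\*|__)(.+?)\1")          # **x** or __x__
EMPH_RE = re.compile(r"(\*|_)(.+?)\1")               # *x* or _x_
CODE_SPLIT_RE = re.compile(r"(`[^`]*`)")             # `code` spans


def md_inline_to_latex(text: str) -> str:
    """Convert links, (strong) emphasis and code spans to LaTeX commands."""
    out = []
    # re.split with a capturing group keeps the code spans at odd indices.
    for i, part in enumerate(CODE_SPLIT_RE.split(text)):
        if i % 2 == 1:
            # Code span: copy its content verbatim, never rewrite it.
            out.append(r"\verb|" + part[1:-1] + "|")
            continue
        part = LINK_RE.sub(r"\\href{\2}{\1}", part)
        # Strong emphasis first, so that ** is not read as two single *.
        part = STRONG_RE.sub(r"\\textbf{\2}", part)
        part = EMPH_RE.sub(r"\\emph{\2}", part)
        out.append(part)
    return "".join(out)


print(md_inline_to_latex("See [link](http://example.net), *a*, __b__ and `printf()`."))
```

Protecting code spans before any other substitution mirrors the idea behind the Verbatim row: whatever is inside backticks must reach the output unchanged.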
