Testing Indexes: Testidx.Sty \Testidx \Tstidxprintglossaries Nicola L

$Testing Indexes: Testidx.Sty \Testidx \Tstidxprintglossaries Nicola L$ TUGboat, Volume 38 (2017), No. 3 373 Testing indexes: testidx.sty \testidx \tstidxprintglossaries Nicola L. C. Talbot \end{document} Abstract 2 Intentional issues The testidx package [10] provides a simple method The dummy text is designed to introduce issues that of generating a test document with a multi-paged your style or build process may need to guard against. index for testing purposes. The dummy text and These allow you to test the document style, the way index produced is designed to replicate problems the indexing information is written to the external commonly encountered in real documents. file, and the way the indexing application processes The words and phrases indexed cover the basic that information. Latin set A(a), . , Z(z) and some extended Latin 2.1 Stylistic issues characters, such as Ø(ø), Æ(æ), Œ(œ), Å(å), Þ(þ) and Ł(ł), to test the indexing application’s ability to The style issues are those which need addressing sort according to various Latin alphabets (such as through the use of LATEX code within the document Swedish or Icelandic). Version 1.1 also includes some itself, or in the class or package that deals with the words starting with digraphs, Dd(dd), Dz(dz), Ff(ff), index style, or within a style file or module used by IJ(ij), Ll(ll), Ly(ly), Ng(ng), and a trigraph Dzs(dzs), the indexing application which controls the LATEX to test alphabets where these are considered separate code that’s written to the output file. The test doc- letters (such as Welsh, Dutch or Hungarian). ument should load the appropriate document class There are also some numbers and symbols in- and indexing package to match your real document. dexed that don’t have a natural word order. 2.1.1 Page breaking 1 Introduction There are enough entries for the index to span mul- There are a number of problems that can occur when tiple pages. If you have letter group headings in generating an index using LATEX. These may relate to your index style there’s a good chance that there will the index style (\printindex), the way the indexing be at least one instance of a page or column break information is written to an external file (\index) occurring between a heading and the first entry of or the way the indexing application (such as xindy that letter group. There’s also a chance that a break or makeindex) performs. A large document may will also occur between a main entry and the first of have a complicated and slow build process, which its sub-entries. can be frustrating when making minor adjustments This does, of course, depend on the font and to the index layout. The testidx package provides page dimensions. You may need to adjust the geome- a way to create a test document that can be used try to cause an unwanted break before experimenting to enhance the required style. Section 5 shows how with adjusting the style to prohibit it. the sample text can be extended to include tests for 2.1.2 Headers and footers other languages or scripts. The simplest test document is: Since the index spans multiple pages, it’s possible to test the headers and footers for the first page of the \documentclass{article} \usepackage{makeidx} index as well as subsequent even and odd pages. This \usepackage{testidx} is useful if the header or footer content needs to vary \makeindex and you need to check that this is done correctly. \begin{document} 2.1.3 Line breaking \testidx \printindex The index contains a mixture of single words, com- \end{document} pound words, phrases, names, places and titles. This Version 1.1 of testidx comes with the supplemen- means that some of the entries are quite wide, which tary package testidx-glossaries, which uses the inter- can cause line breaking problems in narrow columns. face provided by the glossaries package [9] instead of 2.1.4 Whatsits testing \index and \printindex. In this case, the simplest test document is: Some of the entries are indexed immediately before the term, for example \documentclass{article} \usepackage{testidx-glossaries} \index{page break}page break \tstidxmakegloss and some are indexed immediately after the term, \begin{document} for example Testing indexes: testidx.sty 374 TUGboat, Volume 38 (2017), No. 3 paragraph\index{paragraph} So if this is the way that you’re indexing words The whatsit introduced by \index can cause prob- in your real document, this is the mode you need lems. This is most noticeable in an example equation in the test document. where the indexing interferes with the limits of a sum- 2. Unmodified ASCII. (‘ASCII Accents’) mation. In practice, the \index would need to be This mode is triggered by running LATEX with moved to a more suitable location, but the example testidx’s nostripaccents option and without provided by the dummy text helps to highlight the the inputenc package. This mode emulates doing problem. \index{\'elite@\'elite} 2.2 Index recording issues Use this mode if in your real document you are simply doing, for example, \index{\'elite}. The way that indexing typically works is to write the 3. Active UTF-8. entry data (using \index) to an external file that’s This mode is triggered if the inputenc package then input and processed by the indexing application. with the utf8 option is loaded and testidx is This write operation can sometimes go wrong causing loaded afterwards with the nosanitize option. incorrect information to be written to the external This emulates doing file. (There’s no test for incorrect syntax within the \index{élite} argument of \index. It’s assumed you know how to correctly index entries. The tests here are for the Since the inputenc package makes the first octet underlying operation of \index.) of a UTF-8 character active, this causes the entry The glossaries package uses a similar method but to be expanded as it’s written to the index file, instead of using \index, the file write instruction so that it appears as: is internally performed by commands like \gls and \IeC {\'e}lite \glsadd. Use this mode if in your real document you are doing, for example, \index{élite} and you 2.2.1 Page breaking want to test how it’s written to the index file. The dummy text has some long paragraphs with (This mode is the default when using X LE ATEX indexing scattered about them. This increases the or LuaLATEX, but the characters aren’t active, chance of a page break occurring mid-paragraph so it’s much the same as the next mode.) (although again it depends on the font and page 4. Sanitized UTF-8. dimensions). TEX’s asynchronous output routine can The three modes listed above are for emu- cause page numbers to go awry, and this provides a lating different \index usage. This last mode useful way to check that the page number is written really belongs in the next section as it’s provided correctly to the external file. for testing the indexing application’s UTF-8 sup- 2.2.2 Extended Latin characters port, but is included in this list for completeness. This mode is triggered if inputenc is loaded The indexed entries include terms that contain non- with the utf8 option and testidx is loaded after- ASCII letters (either through accent commands like wards with the sanitize option. This emulates \’ or using UTF-8 characters). The UTF-8 encoding doing isn’t an issue for the modern X LAT X or LuaLAT X E E E \def\word{élite}\@onelevel@sanitize\word A engines, but it is a problem for the older LTEX \expandafter\index\expandafter{\word} formats. If your engine doesn’t natively support The sanitization isn’t applied to any remain- UTF-8 and you have characters outside the basic ing content of \index, such as the encap. For Latin set, then this is something that needs to be example, tested. The testidx package has four modes to test this, depending on your set-up. \index{ð|see{eth (ð)}} is implemented such that only the ð part be- 1. ASCII with LATEX commands stripped. (‘Bare ASCII’) fore the encap is sanitized so this would end up This mode is triggered through the use of written to the index file as LATEX with testidx’s stripaccents option and ð|see{eth (\IeC {\dh })} without the inputenc package [2]. This option (testidx doesn’t modify the \index command, is the default, so the first test example above but uses the \expandafter approach where the is in this mode. This mode emulates doing, for control sequence has a combination of sanitized example: and non-sanitized content.) \index{elite@\'elite} There’s no support for other encodings. Nicola L. C. Talbot TUGboat, Volume 38 (2017), No. 3 375 2.3 Indexing application issues One rule may ignore those characters (for example, An indexing application typically reads the external ‘vice-president’ < ‘viceroy’ < ‘vice versa’) whereas another rule may treat a space as coming before a file created by LATEX (the input file that contains data, discussed in the previous section) and produces hyphen (for example, ‘vice versa’ < ‘vice-president’ another file (the output file that contains typeset- < ‘viceroy’). A ting instructions) which can then be read by LTEX 2.3.4 Numbers (through commands like \printindex). The termi- nology here is a little confusing as the input file from The test entries include some numbers (2, 10, 16, 42, the indexing application’s point of view is an output 100).

Load more