Digraphs and trigraphs

In computer programming, digraphs and trigraphs are sequences of two and three characters, respectively, that appear in source code and, according to a programming language's specification, should be treated as if they were single characters.

Various reasons exist for using digraphs and trigraphs: keyboards may not have keys to cover the entire character set of the language, input of special characters may be difficult, text editors may reserve some characters for special use, and so on. Trigraphs might also be used for some EBCDIC code pages that lack characters such as { and }.

History

The basic character set of the C programming language is a subset of the ASCII character set that includes nine characters which lie outside the ISO 646 invariant character set. The ANSI C committee invented trigraphs as a way of entering source code using keyboards that support any version of the ISO 646 character set.

Implementations

Trigraphs are not commonly encountered outside compiler test suites.[1] Some compilers support an option to turn recognition of trigraphs off, or disable trigraphs by default and require an option to turn them on. Some can issue warnings when they encounter trigraphs in source files. Borland supplied a separate program, the trigraph preprocessor (TRIGRAPH.EXE), to be used only when trigraph processing is desired (the rationale was to maximise speed of compilation).

Language support

Different systems define different sets of digraphs and trigraphs, as described below.

ALGOL

Early versions of ALGOL predated the standardized ASCII and EBCDIC character sets, and were typically implemented using a manufacturer-specific six-bit character code. A number of ALGOL operations either lacked codepoints in the available character set or were not supported by peripherals, leading to a number of substitutions including := for ← (assignment) and >= for ≥ (greater than or equal).

Pascal

The Pascal programming language supports digraphs (., .), (* and *) for [, ], { and } respectively. Unlike all other cases mentioned here, (* and *) were and still are in wide use. However, many compilers treat them as a different type of commenting block rather than as actual digraphs, that is, a comment started with (* cannot be closed with } and vice versa.
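The stricter behaviour described above, where each opener only matches its own closer, can be sketched as a small scanner. This is an illustration only, written in C rather than Pascal; the function name pascal_comment_length is hypothetical and the sketch ignores nested comments and compiler-specific extensions.

```c
#include <stddef.h>

/* Given text that starts with a Pascal comment opener, return the number
   of characters in the comment including both delimiters. Returns 0 if
   the text does not start with an opener or the comment is never closed.
   Assumption: a brace comment only ends at a closing brace, and a
   paren-star comment only ends at star-paren. */
size_t pascal_comment_length(const char *s)
{
    size_t i;

    if (s[0] == '{') {
        for (i = 1; s[i] != '\0'; i++)
            if (s[i] == '}')
                return i + 1;
        return 0;
    }
    if (s[0] == '(' && s[1] == '*') {
        for (i = 2; s[i] != '\0'; i++)
            if (s[i] == '*' && s[i + 1] == ')')
                return i + 2;
        return 0;
    }
    return 0;
}
```

Under this rule a } inside a (* ... *) comment is ordinary comment text, and a comment opened with (* but followed only by } is reported as unterminated.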

J

The J programming language is a descendant of APL but uses the ASCII character set rather than APL symbols. Because the printable range of ASCII is smaller than APL's specialized set of symbols, the . (dot) and : (colon) characters are used to inflect ASCII symbols, effectively interpreting unigraphs, digraphs or rarely trigraphs as standalone "symbols".[2]

Unlike the use of digraphs and trigraphs in C and C++, there are no single-character equivalents to these in J.

C

The C preprocessor replaces all occurrences of the following nine trigraph sequences by their single-character equivalents before any other processing.[3][4]

Trigraph   Equivalent
??=        #
??/        \
??'        ^
??(        [
??)        ]
??!        |
??<        {
??>        }
??-        ~

A programmer may want to place two question marks together yet not have the compiler treat them as introducing a trigraph. The C grammar does not permit two consecutive ? tokens, so the only places in a C file where two question marks in a row may be used are in multi-character constants, string literals, and comments. This is particularly a problem for the classic Mac OS, where the constant '????' may be used as a file type or creator. To safely place two consecutive question marks within a string literal, the programmer can use string concatenation "...?""?..." or an escape sequence "...?\?...".

??? is not itself a trigraph sequence, but when followed by a character such as - it will be interpreted as ? + ??-, as in the example below which has 16 ?s before the /.

The ??/ trigraph can be used to introduce an escaped newline for line splicing; this must be taken into account for correct and efficient handling of trigraphs within the preprocessor. It can also cause surprises, particularly within comments. For example:

    // Will the next line be executed????????????????/
    a++;

which is a single logical comment line (used in C++ and C99), and

    /??/
    * A comment *??/
    /

which is a correctly formed block comment.
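The replacement step described in this section can be sketched as a single left-to-right pass over the source text. The following is a minimal illustration, not how any particular compiler implements it; the names trigraph_equivalent and replace_trigraphs are hypothetical, and the caller is assumed to supply an output buffer at least as large as the input.

```c
#include <stddef.h>

/* Map the third character of a candidate sequence (two question marks
   followed by c) to its single-character equivalent, or return 0 if the
   sequence is not one of the nine trigraphs. */
static char trigraph_equivalent(char c)
{
    switch (c) {
    case '=':  return '#';
    case '/':  return '\\';
    case '\'': return '^';
    case '(':  return '[';
    case ')':  return ']';
    case '!':  return '|';
    case '<':  return '{';
    case '>':  return '}';
    case '-':  return '~';
    default:   return 0;
    }
}

/* Copy src to dst, replacing every trigraph by its equivalent.
   Assumption: dst has room for strlen(src) + 1 bytes, which is always
   enough because a replacement never makes the text longer.
   Returns the length of the result, excluding the terminating NUL. */
size_t replace_trigraphs(const char *src, char *dst)
{
    size_t out = 0;

    while (*src != '\0') {
        char repl = 0;

        /* src[2] is only inspected once src[0] and src[1] are known to
           be question marks, so the read never passes the terminator. */
        if (src[0] == '?' && src[1] == '?')
            repl = trigraph_equivalent(src[2]);

        if (repl != 0) {
            dst[out++] = repl;
            src += 3;
        } else {
            dst[out++] = *src++;
        }
    }
    dst[out] = '\0';
    return out;
}
```

Because the scan advances one character at a time when no trigraph matches, a run of three question marks followed by - comes out as ? followed by ~, matching the behaviour described above. In the 16-question-mark comment, the final two question marks and the / become a backslash, which then splices the next line into the comment.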

In 1994, a normative amendment to the C standard, included in C99, supplied digraphs as more readable alternatives to five of the trigraphs. They are listed in the table below.

Digraph    Equivalent
<:         [
:>         ]
<%         {
%>         }
%:         #

Unlike trigraphs, digraphs are handled during tokenization, and any digraph must always represent a full token by itself, or compose the token %:%: replacing the preprocessor concatenation token ##. If a digraph sequence occurs inside another token, for example a quoted string or a character constant, it will not be replaced.

C++

C++ (through C++14, see below) behaves like C, including the C99 additions, but with additional tokens listed in the table.[5] As a note, %:%: is treated as a single token, rather than two occurrences of %:.
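To illustrate the tokenization rule for the five digraphs shared by C and C++, the sketch below spells every bracket, brace and the leading # with a digraph, and also places a digraph sequence inside a string literal. It assumes a compiler that accepts the 1994 amendment or C99; the function names sum_with_digraphs and digraph_text are hypothetical.

```c
%:include <stddef.h>

/* Sum the first n elements of values. The include directive, the
   subscript and the function body are all written with digraphs. */
int sum_with_digraphs(const int *values, size_t n)
<%
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += values<:i:>;
    return total;
%>

/* Inside a string literal the same two characters are not a token of
   their own, so they stay as ordinary text. */
const char *digraph_text(void)
<%
    return "<:";
%>
```

The first function behaves exactly as if it had been written with #, [ ], and { }, while the string returned by the second still contains the two characters < and :.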

Token      Equivalent
%:%:       ##
compl      ~
not        !
bitand     &
bitor      |
and        &&
or         ||
xor        ^
and_eq     &=
or_eq      |=
xor_eq     ^=
not_eq     !=

The C++ Standard makes this comment with regard to the term "digraph":[6]

    The term "digraph" (token consisting of two characters) is not perfectly descriptive, since one of the alternative preprocessing-tokens is %:%: and of course several primary tokens contain two characters. Nonetheless, those alternative tokens that aren't lexical keywords are colloquially known as "digraphs".

Trigraphs were proposed for deprecation in C++0x, which was released as C++11.[7] This was opposed by IBM, speaking on behalf of itself and other users of C++,[8] and as a result trigraphs were retained in C++0x. Trigraphs were then proposed again for removal (not only deprecation) in C++17.[9] This passed a committee vote, and trigraphs (but not the additional tokens) are removed from C++17 despite the opposition from IBM.[10] Existing code that uses trigraphs can be supported by translating from the source files (parsing trigraphs) to the basic source character set that does not include trigraphs.[9]

RPL

Hewlett-Packard calculators supporting the RPL language and input method provide support for a large number of trigraphs (also called TIO codes) to reliably transcribe non-seven-bit ASCII characters of the calculators' extended character set[11][12][13] on foreign platforms, and to ease keyboard input without using the CHARS application.[14][15][12][13] The first character of all TIO codes is a \, followed by two other ASCII characters vaguely resembling the glyph to be substituted.[14][15][12][13][16] All other characters can be entered using the special \nnn TIO code syntax with nnn being a three-digit decimal number (with leading zeros if necessary) of the corresponding code point (thereby formally representing a tetragraph).[14][12][13]

Application support

Vim

The Vim text editor supports digraphs for actual entry of text characters, following RFC 1345 (https://tools.ietf.org/html/rfc1345). The entry of digraphs is bound to Ctrl + K by default.[17] The list of all possible digraphs in Vim can be displayed by typing :dig.

GNU Screen

GNU Screen has a digraph command, bound to Ctrl + A Ctrl + V by default.[18]

Lotus

Lotus 1-2-3 for DOS uses Alt + F1 as a compose key to allow easier input of many special characters of the Lotus International Character Set (LICS)[19] and Lotus Multi-Byte Character Set (LMBCS).

See also

Compose key
Character entity reference
Escape sequence
C alternative tokens

References

1. Jones, Derek. "Sentence 117". The New C Standard: An Economic and Cultural Commentary.
2. Hui, Roger. "Vocabulary" (http://www.jsoftware.com/help/dictionary/vocabul.htm). jsoftware.com. Retrieved 2015-04-16.
3. British Standards Institute (2003). The C Standard - Incorporating TC1 - BS ISO/IEC 9899:1999. John Wiley & Sons. ISBN 0-470-84573-2.
4. "Rationale for International Standard - Programming Languages - C" (http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf) (PDF). 5.10. April 2003. Archived (https://web.archive.org/web/20160606072228/http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf) (PDF) from the original on 2016-06-06. Retrieved 2010-10-17.
5. Stroustrup, Bjarne (1994-03-29). Design and Evolution of C++ (1st ed.). Addison-Wesley Publishing Company. ISBN 0-201-54330-3.
6. http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2012/n3337.pdf
7. C++0X, CD 1, National Body Comments (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2837.pdf), SC22/WG21 N2837, 2009-01-30, comment UK 11.
8. Comment on Proposed Trigraph Deprecation (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2009/n2910.pdf), by Michael Wong, Hubert Tong, Robert Klarer, Ian McIntosh, Raymond Mak, Christopher Cambly, Alain LaBonté, N2910, 2009-06-19.
9. "Removing trigraphs??!" (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n3981.html), N3981, Richard Smith, 2014-05-06.
10. IBM comment on preparing for a Trigraph-adverse future in C++17 (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4210.pdf), IBM paper N4210, 2014-10-10. Authors: Michael Wong, Hubert Tong, Rajan Bhakta, Derek Inglis.
11. HP 82240B Infrared Printer (http://www.manualslib.com/manual/912948/Hp-82240b.html) (1 ed.). Corvallis, OR, USA: Hewlett Packard. August 1989. HP reorder number 82240-90014. Retrieved 2016-08-01.
12. HP 48G Series – User's Guide (UG) (http://www.hpcalc.org/details.php?id=3937) (8th ed.). Hewlett-Packard. December 1994 [1993]. pp. 2–5, 27–16. HP 00048-90126, (00048-90104). Archived (https://web.archive.org/web/20160806145719/http://www.hpcalc.org/details.php?id=3937) from the original on 2016-08-06. Retrieved 2015-09-06. [1] (http://www.hpcalc.org/hp48/docs/misc/hp48gug.zip)
13. HP 50g / 49g+ / 48gII graphing calculator advanced user's reference manual (AUR) (http://www.hpcalc.org/details.php?id=7141) (2 ed.). Hewlett-Packard. 2009-07-14 [2005]. pp. J-1, J-2. HP F2228-90010. Retrieved 2015-10-10. Searchable PDF (http://holyjoe.net/hp/HP_50g_AUR_v2_English_searchable.pdf)
14. "HP RPL TIO Table" (http://holyjoe.org/hp/tiotable.htm). holyjoe.org. Archived (https://web.archive.org/web/20160523164117/http://holyjoe.org/hp/tiotable.htm) from the original on 2016-05-23. Retrieved 2015-01-23.
15. Heinz, Sr., Michael (2005). "HP-ASCII and Trigraphs" (http://hpconnect.sourceforge.net/trigraphs.html). Archived (https://web.archive.org/web/20160802011132/http://hpconnect.sourceforge.net/trigraphs.html) from the original on 2016-08-02. Retrieved 2016-08-02.
16. Finseth, Craig A. (2012-02-25). "chars" (https://www.finseth.com/hpdata/chars.php). Archived (https://web.archive.org/web/20171221075534/https://www.finseth.com/hpdata/chars.php) from the original on 2017-12-21. Retrieved 2017-12-21.
17. "Vim documentation: *digraphs-default*" (http://vimdoc.sourceforge.net/htmldoc/digraph.html#digraphs-default). 2011-01-15.
18. "Digraph - Screen User's Manual" (https://www.gnu.org/software/screen/manual/html_node/Digraph.html).
19. "Appendix". HP 95LX User's Guide (http://www.retroisle.com/others/hp95lx/OriginalDocs/95LX_UsersGuide_F1000-90001_826pages_Jun91.pdf) (PDF) (2 ed.). Corvallis, OR, USA: Hewlett-Packard Company, Corvallis Division. June 1991 [March 1991]. F0001-90003. Archived (https://web.archive.org/web/20161128202642/http://www.retroisle.com/others/hp95lx/OriginalDocs/95LX_UsersGuide_F1000-90001_826pages_Jun91.pdf) (PDF) from the original on 2016-11-27. Retrieved 2016-11-27.

External links

RFC 1345 (https://tools.ietf.org/html/rfc1345)

Retrieved from "https://en.wikipedia.org/w/index.php?title=Digraphs_and_trigraphs&oldid=854819537"

This page was last edited on 14 August 2018, at 00:51 (UTC).
