KNOW HOW GNU

GNU m4 Creating HTML pages m4 is a language used for text processing. It simply copies its input to its output, while expanding built-in or user-defined macros. It can also capture output from standard commands. BY STIG BRAUTASET

e will cover the basics of m4 familiar to anyone this point you may ask “What’s the by separating out content and installing software directly from source. point then? I may as well be writing the Wlayout of HTML web pages. As mentioned earlier, m4 allows us to HTML code in the usual way, instead of The techniques shown are not by any easily define our own macros, and this is using this m4 mumbo-jumbo!” The means restricted to HTML, but are the feature we will focus on. We will astute reader would, of course, be 100% applicable elsewhere as well. learn how to hide layout specifics, long- correct in this observation. However, winded code or arcane syntax behind read on as the benefits will be explained Before we begin… our own simple macros. in the next couple of paragraphs. Very basic knowledge of HTML would be Consider a snip of HTML you a lot; beneficial, but should not be necessary to How m4 can you write a simple example is the code for creating read and comprehend the material HTML links. Chances are it will be like this: covered in this article. It is m4’s There are no (to the author’s knowledge) capabilities as a macro processor ready-made macros for writing HTML, U language the author wishes to communi- so we we will have to write them ourself. www.w3.org cate; HTML was chosen because it is something people should be at least Naming conventions vaguely familiar with. When splitting the content from the layout of web pages, the author prefers to call the common What is m4? macro for html.m4.The content of each page goes in a file with the ending “.mac”,e.g. index.mac and so on.When explaining this,however,we develop our macro and .mac files bit by m4 is a macro processor used by bit,so we need to refer to several different files.Thus it is convenient to name them first.m4, Sendmail and GNU , etc. first.mac and so on. Sendmail relies on m4 for creating its See the sidebar “Joining the content and layout”how to create the resulting HTML file from the (in)famous sendmail.cf configuration file macros and the content. while Autoconf uses m4 to create the

40 November 2002 www.linux-magazine.com GNU m4 KNOW HOW

Now, what if we instead define a simple Comments: documenting The only change is that we now have macro that will allow us to write “$1” and “$2” instead of “$*”. “$1” and our macros “$2” refer to the first and the second __elink(www.w3.org) If a macro is not self-explanatory,we would argument of our macro. The arguments like to put an explanatory comment along are separated by the comma character. side the macro definition. m4 naturally and let m4 do the tedious job of filling in So, Sherlock, what now if we want to allows us to do this; it provides the built-in the necessary bits? That’s than half create a macro that can take an dnl reads and discards all characters, the number of characters already. up to and including the first newline. argument with a comma in it? That, my Additionally, observe that we only had to good Watson, is simple. We just have to dnl Iamanexample comment. write the name once, so there is less put quote the argument. When you dnl Iamhighly unhelpful, chance of us spelling it wrong. quote something, everything between dnl but 100% correct. Another example is if we have a note the quote-characters will be treated as a Using dnl as part of a string does not exhibit on our page telling visitors the date of single argument, even if it consist in this behaviour. the last update. It is tedious work entirety of a string of, say, 90 commas. searching through all the files you have The default opening quote is a “`” changed, searching for and updating body. We will refer to the whole line as (back-tick) and the default closing quote date stamps. Instead we can simply the macro definition. The definition line is a “‘” (single quote). We can now define a macro named, say, __today that above in English: “Define a new macro invoke __link2 with a comma in the will into today’s date. The named __link. Everywhere where this second argument thus: search-and-replace business will then be macro occurs, substitute the macro name taken care of for us automatically. How for the body of the macro, but substitute __link2(www..org, U to do this will be shown later; we need to the macro’s arguments (whatever is 'www.GNU.org, the home of much U take care of the basics first. inside the parentheses following the software') macro name) for “$*” wherever “$*” Getting your hands dirty occurs in the macro body.” The second comma is now quoted, so As mentioned earlier, m4 allow us to We will store the macros we write the macro is indeed invoked with only define our own macros with ease. The ourselves, such as the above, in a file two arguments. Note that only one layer command to let us do this is cunningly called html.m4. Se sidebar “Comments: named define. Here’s how to define a documenting our macros” for details Listing 1: sample.html macro to let us use the link shorthand about how we can mix comments and above: macros in this file. It’s worth noting that macro names do $*) is just a convention, because we need to 8859-1"> sure that we do not pick a string the parentheses) is the macro name, and Otherwise we may get spurious layout sample html page The define command we used above Here’s how you create the resulting expects two arguments; the new macro index.html from the macro definitions in name, and what to replace the macro html.m4 and the content in index.mac:

Hello, World

name with. Considering again the $m4html.m4 index.mac > example with the __link macro above, index.html This invokes the m4 processor with two what should we do if we don’t want to arguments.The m4 command will take the use the URL as the visible, “clickable” macro definitions it finds,do the necessary link? We could simply create a new Listing 2: first.mac macro that takes several arguments, and substitutions and output the result on its __title({Hello World, HTML U standard output.However,we make use of invoke it thus (in context this , just version}) the shell’s redirection facilities to make the to show that it is possible): “Here is

Hello World

output go to a file instead of the screen (if __link2(www.w3.org, a link to w3.org).

Look at this link: U this makes no sense to you,just tag along It is an informative website.” Here’s how __link(www.w3.org)

and follow the directions,but you should to define such a beast: consider reading up on shell basics).Now

Last updated: __today

open first.html in a browser,and voil! We define(__link2, U have a web page! $2)

www.linux-magazine.com November 2002 41 KNOW HOW GNU m4

of quotes (the outermost) are stripped by and can even be called several times m4, so the apostrophe in the following from the same file. Listing 5: second.m4 invokation will not yield an error: The effect is immediate, but only for this changequote({,}) U invokation of m4. The author strongly dnl change quote character __link2(www.gnu.org, U advocates that you stick to one set of dnl two macros for link-creation. 'GNU, RMS's pet hobby-horse') quotes, as it quickly becomes rather define(__link, U hairy having to remember which quotes $*) The author usually changes the default go where. define(__link2, U quote characters into “{” and “}” for The quoting “characters”, by the way, $2) readability and ease of typing. The need not be single characters; you may dnl abstract away all the U command for changing the quote use “{([whoopee->” as your opening layout cruft at the beginning. characters, with an appropriate com- quote if you wish. Neither is there any define(__title, { ment attached (see the “Comments: need for the closing quote to correspond documenting our macros” sidebar), is: logically to the opening quote. It is just a convention, and makes the macros Type" content= change the quote characters "text/html; charset=U changequote({,}) dnl change U iso-8859-1"> changequote takes two arguments, the quote character respectively. It can be called at any time, link name specified specifically Listing 3: first.m4 $2) changequote({,}) U $1 dnl change quote character With a html.m4 containing the dnl two macros for link-creation. definitions shown above we can invoke define(__link, U our __link2 macro thus: $1 $*) define(__link2, U __link2(www.w3.org, {w3.org, U $2) a site well worth reading}) }) dnl the __title macro U dnl abstract away all the layout U ends here cruft at the beginning. Enough basics, let’s do some real work dnl use built-in ‘esyscmd’to call the define(__title, { With HTML, there’s always a lot of standard Linux ‘date’dnl utility and have its stuff that needs to be set up at the of output replaced with the ‘__today’dnl each page. If you have than, say, macro name.The date will be on the form U 2-3 pages that have a similar layout (but tags "Content-Type" content= esyscmd(date ‘+%a %d %b %Y’)) "text/html; charset=U etc.) then you will probably want to iso-8859-1"> create a macro for all this stuff. We will in listing 1, and see how we can get a We already know how to create the macro-skills. new addition is the macro __today listing 2 shows the content of the $1 file first.mac. This is the mixture of Listing 6: third.mac,25 HTML and macro calls that together define(__index) dnl allows U with our macro definitions enables us to conditional processing of the U }) dnl the __title macro U page ends here Listing 4: second.m4 __title2({Hello World, HTML U dnl use built-in 'esyscmd' to call the __title2({Hello World, U version}, { standard Linux 'date' HTML version}, {

Hello World

dnl utility and have its output

Hello World

__menu replaced with the '__today'

Look at this link: U

Look at this link: U dnl macro name. The date will be on __link(www.w3.org)

__link(www.w3.org)

the form "Sun 16 Jun 2002" define(__today, esyscmd(date '+%a %d

Last updated: __today

Last updated: __today

%b %Y')) }) })

42 November 2002 www.linux-magazine.com GNU m4 KNOW HOW

capability to call standard Linux tools, and tags either, and indeed __rlink(index.html, index))
and puts the output of the said you don’t have to. m4 allows macros to ifdef({__pics}, pictures, U command (“date” in this case) into the be nested, thus we can use a macro __rlink(pics.html, pictures)) text. Listing 3 shows the full content of within another macro. The result is

first.m4. This contains all the macro shown in listing 4. Witness that the }) definitions required by first.mac which is __title2 macro takes two arguments, the found in listing 2. first being the page title and the second The two ifdef lines are new to us. They being the full page body. Be careful first check whether a certain macro More abstraction when you go to these lengths of name is defined (observe that the first Looking at the code in listing 2, you may abstraction though, as it is easy to miss argument of ifdef has to be quoted). If not want to write the closing out the closing “})” at the end of the file the macro name is undefined, the third if you do extensive updates. argument will be input into the text, and Listing 7: third.m4 The change to listing 3 to facilitate this the second argument will be ignored. changequote({,}) U is shown in listing 5. The use of the __menu macro is shown dnl change quote character in listing 6. define(__link, U More advanced macros The __rlink macro is also new. Its $*) Up till now, we have only looked at fairly name stands for relative link in that it define(__link2, U simple search-and-replace macros. These does not prepend http:// to the link. It is $2) work fine, but consider if we have a shown in listing 7, which is the final define(__rlink, U collection of pages, with a common macro listing file. $2) menu. We could put the whole menu in dnl abstract away all the U a macro of the type we have used before, Summary layout cruft at the beginning. but then the pages would include a link We have seen how to use m4 to create define(__title2, { to itself as well as all others, and this is macros to help us maintain HTML pages. not very elegant. A solution, of course, is We went from very simple one-line to just -and- the menu in to the substitution macros, like __link and built-in conditionals. built-in ability to capture the output of that identifies that file. In index.mac we __today macro. Lastly we used m4’s conditional “ifdef” can then use these macro that expands into a different take special actions on this file. The $1 menu could then be something like this: [1] GNU m4:more information about the m4 define(__menu, { macro processor can be found at $2

MENU
http://www.gnu.org/software/m4/m4.html ifdef({__index}, index, U [2]HTML tidy:get your HTML cleaned up and validated }) Using tidy to clean up the http://www.w3.org/People/Raggett/tidy/ dnl use built-in ‘esyscmd’to call the mess standard Linux ‘date’dnl utility and have its output replaced with the ‘__today’dnl If you’ve opened any of the HTML files you’ve Stig Brautaset,born in macro name.The date will be on the form created from the macro and content files, Norway, is the “Sun 16 Jun 2002” you’ve probably found that there’s a lot of unnecessary white-space in them.This is OK, founder of the Linux define(__today, esyscmdU since excessive white-space is simply ignored Society at the University of (date '+%a %d %b %Y')) by web browsers. If you’re a pedantic zealot Westminster. He is define(__menu, { like the author,you’ll want your source to be HOR

T currently in his last

beautiful on its own as well. year of a BSc Artificial Intelligence ifdef({__index}, index, U Enter “tidy”,an HTML validating,correcting degree. His interests largely revolves __rlink(index.html, index))
and pretty-printing program. Simply invoke around

U THE AU ifdef({__pics}, pictures, tidy on your HTML files thus: tidy -im from e- spam filters to games. __rlink(pics.html, pictures)) first.html and the file will be audited and Regularly spending much time on IRC he can be found there under the nick

printed prettily.See the sidebar “Links and “Skuggan”or “Skugg”. }) resources”for where to get tidy.

www.linux-magazine.com November 2002 43