GENER8 — Text Preprocessor
Scripting Manual
Program release 8.1.6 dated 9 Dec 2021
Copyright © 2002–2022 by Stan Brown, BrownMath.com
Summary: GENER8 is perfect for generating documents with a lot of repetition, or repetition with variations, like a series of HTML files for a Web site. GENER8 lets you include files, define and use macros, perform arithmetic, and output different text conditionally. While its original inspiration was the C preprocessor, GENER8 was written from the ground up to work with any text at all, and it requires no programming knowledge.
GENER8 takes one or more input files and processes them to produce an output file. Along the way it can perform text substitution and computations and paste other files or the results of system commands into the output.
This is ideal for producing series of files that are partly boilerplate, like pages in a Web site. It’s also terrific when you have several versions of a document that are mostly the same but have a few differences. And even if you’re producing just one document, if it’s got lots of repetition it may be worth your while to use macros to cut down the amount of typing you do.
The GENER8 package consists of a script of 2000+ lines written in the AWK language. You don’t need to understand the AWK language to use it.
GENER8 requires the free program GAWK (GNU AWK). If GAWK isn’t already on your system, you can find a free copy at GNU.org. Windows users can get the latest version of the same port I use, part of the GnuWin32 project. Users of any operating system can download the source code and compile it for themselves.
Be sure to use GNU AWK (GAWK). GENER8 uses several features of GAWK that are not present in some other implementations. GENER8 has been tested with GAWK 3.0.4 and 3.1.0 in 32-bit and 64-bit Windows, but it should work on any operating system with GAWK 3.0.4 or later.
There’s no special installation process. Simply unzip the downloaded ZIP file in any convenient directory.
You may choose to move the actual program file somewhere else. It’s completely self-contained; you can even delete all the other files if you wish. You may find it convenient to set the AWKPATH variable to point to it.
There’s no special uninstall procedure; simply delete the GENER8 files. GENER8 doesn’t write any secret files or modify the Windows registry.
GENER8 is a command-line utility:
gawk -f gener8.awk {options} inputfiles... >outputfile
gawk is the GNU AWK program; it might be called something different on your system. Please make sure you are using GNU AWK, because other AWK variants may not have all the features that GENER8 relies on.
-f specifies the GENER8 program. If gener8.awk is not in your current directory, as probably it is not, you have two choices:
Give the path on the command line, for example -f ../mybin/gener8.awk.
Name the program’s directory in the AWKPATH environment variable.
After any options, specify one or more input files. GENER8 will process them in the order specified and will write the results to the specified output file. Both input and output files may be in other directories; just specify appropriate paths.
The input files contain text, directives, and macro calls.
The options are, well, optional. But each one that you specify must be in lower case, with -v before each option. All the options must precede the input file(s).
-v stderr=file
Normally any error messages are written to the standard error stream (which is usually your screen), but this option defines a file for error messages. If any messages are actually written, they will overwrite the file; otherwise the file is untouched.
-v maxerrors=n
Sometimes a single mistake (misspelling a macro name in a #define directive, for instance) can trigger a whole cascade of error messages. Therefore, by default GENER8 will display only the first five errors to the standard error output. The input file(s) are processed to the end, and if debugging is in effect then all error messages are still written to the output file.
You may want to set a different limit on the number of errors that are displayed. To do this, set the maxerrors option to the desired number. If you set maxerrors=0, GENER8 will display all errors, however many there may be.
-v debug=n
This option sets initial debugging mode to 0, 1, or 2; the default is 0 (no debug information).
Debugging mode can also be set or changed by the #debug directive in any input file. Please see that section for more information about debugging.
-v picky=n
This option sets initial macro pickiness to 0, 1, or 2; the default is 1 (treat undefined macros as defined but empty).
Macro pickiness can also be set or changed by the #picky directive in any input file. Please see that section for more information about macro pickiness.
-v home=path/file
-v target=path/file
These options are intended primarily for use when you’re using GENER8 to create HTML files for a Web site. You specify -v target as the path and filename of the output file on your disk, and -v home as the path and filename of the site’s home page on your disk. If you use these options, either specify absolute locations for the files or specify locations relative to the current file. Use forward slashes to separate directories, even in Windows.
GENER8 uses these options, if present, to set up the predefined macros HOME, RELHOME, TARGET, TARGETDIR, and TARGETNAME.
-v logo=no
This option suppresses the one-line program logo that is normally displayed on the standard error stream. You may find it useful in batch files, to streamline the output.
-v tocmin=minlevel and -v tocmax=maxlevel
These options are used to implement the two-pass processing of the TOCB macro, and you should not enter them on the command line.
Two optional environment variables specify search paths. You may wish to use them instead of specifying full paths on some file names.
The exact method for setting environment variables varies from one version of Windows to another. In very old Windows, or “classic” DOS, set the variable in your AUTOEXEC.BAT file.
AWKPATH: Locations of Programs
If you specify just -f gener8.awk on the command line, without a path, GAWK normally looks for the program in the current directory. You can use the AWKPATH environment variable to tell GAWK where to look for AWK programs. Example for Windows:
set AWKPATH=.;d:/my/code;c:/util
Example for UNIX (in bash):
export AWKPATH=.:~/mybin/progs
For UNIX, use forward slashes in directories and separate them with colons; for Windows use forward slashes (not backslashes) but separate directories with semicolons.
The period (.) at the start of the two examples above stands for the current directory.
GAWK examines AWKPATH only to look for AWK programs. Input files on the command line must have explicit paths if they are not in the current directory.
INCLUDE: Locations of Include Files
If your input files contain #include directives, you may want to keep the include files in one or more directories other than the current directory. And if you do keep some include files elsewhere, you probably want to specify the search path in one place, not specify paths for the included files one by one.
That one place is the INCLUDE environment variable. If it is defined, then whenever GENER8 finds an #include in an input file, and no path is given for the file to be included, and the file isn’t in the current directory, then GENER8 will look for the include file in the directories specified in your INCLUDE environment variable.
For Windows, specify paths with / or \ and separate them with semicolons. Example:
d:/my/code;c:/util
For other systems, specify paths with / and separate them with colons, like this:
~/mybin/progs
There is no need to specify the current directory in the search path. If you #include a plain file with no path, GENER8 will always look for it in the current directory first.
If you prefer to set the include path right within your input file, use the #includepath directive. Any #includepath directive will be used in preference to the environment variable, from the point in the file where the #includepath directive occurs.
Return Value (ERRORLEVEL)
GENER8 normally returns the value 0 if the program finished normally and 1 if GENER8 found errors like bad macro calls or missing include files. Any other value comes from the gawk program itself.
Windows command-line programmers can access the returned value with the IF ERRORLEVEL statement.
GENER8 is pre-programmed to do a lot of the work for you. There are many directives and predefined macros. They are presented later in alphabetical order, directives followed by macros, but first here’s a list by category:
ARITH
REGINC
REGPRE
GSUB
GDEL
ARITH
LOWER
UPPER
REGINC
REGPRE
IIF
IIFDEF
DEFINED
EXISTS
#if
#elif
#ifdef
#ifndef
#elifdef
#elifndef
#else
#endif
DATE
FILEDATE
DATE_MONTHS4
DATE_SYSFORMAT
#include file
#include-always file
#include!
#includepath
TOCB
TOCF
TOCMIN
TOCMAX
#tocif
#tocinsertli
EXISTS
FILEDATE
FILESIZE
ENV
SYSTEM
SYSTEMINLINE
#include!
FILENAME
HOME
INCLUDEFILE
RELHOME
TARGET
TARGETDIR
TARGETNAME
#commentregexp
#includepath
#macrosep
#macrosepregexp
DATE_MONTHS4
#picky
DATE_SYSFORMAT
#info
#debug
#define
#freeze
#undef
REGSET
EMPTY
What’s the difference between a directive and a predefined macro, since both are pre-programmed in GENER8? To some extent it’s a matter of historical accident, stemming from GENER8’s inspiration by the C-language preprocessor.
Syntactically, directives begin with a # character and stand alone on a line, but macro calls are set off with (# … #) and can appear on the same line as other text.
GENER8 honors a number of directives, as listed alphabetically below. (See also: Directives and Predefined Macros by Category.)
If the first non-blank character on the line is #, the line is recognized as a directive. If you need to start a regular text line with a # character, use the HTML character reference &#35; instead.
Every directive must be completed on one line. If you need additional lines, use a trailing \ character to continue on the next line.
You use directives to control or change what is written to the output file, but the directive itself is never written to the output file.
#commentregexp regularexpression
By default, GENER8 treats any line containing the string <!-- ignore --> as a comment (regardless of the number of blanks before and after ignore). You can change that with the #commentregexp directive. From that point onward, any line that matches that regular expression will be ignored by GENER8.
Examples: To add comment: as a second comment marker, use the | character in your regexp, like this:
#commentregexp <!-- *ignore *-->|comment:
To set the comment marker to comment: alone, use
#commentregexp comment:
To make the semicolon the only comment marker, and only at the start of a line, use
#commentregexp ^;
#debug
If you get output that seems wrong, you can turn on debugging to see in detail what is going on. This should help you correct your input so that you get the output you want.
Debugging information, such as macro substitution and lines ignored because of #if directives and #ifdef directives, gets written to the output file in sequence with the regular output.
The bare #debug directive turns debugging mode on. You can have finer control by putting a number after the directive:
#debug 1 turns on debugging as described above.
#debug 2 turns on debugging, and also tells GENER8 to stop immediately when it finds an error.
#debug 0 turns debugging off.
The initial value is #debug 0, unless you set a different value on the command line.
You can have multiple #debug directives in an input file, so that you can debug only a small section of input and not have to cope with voluminous debugging output.
When debugging is on (debugging level 1 or 2), GENER8 writes error messages to the standard error stream as usual but also writes a copy of the same error messages to the output file.
#define and #freeze and #undef
Please see the section on defining macros.
#if and Friends: Conditional Processing
The directives #if #ifdef #ifndef #elif #elifdef #elifndef #else #endif work together to let you determine whether to process blocks of lines or not, based on some conditions. For instance, you might have something like this in your input file:
#if coursenum == 200
Statistics
#elif coursenum == 201
Calculus
. . .
#endif
If you have defined the macro coursenum to equal 200, GENER8 will write Statistics to the output file and ignore everything else till the #endif; if coursenum is 201, GENER8 will write Calculus to the output file and ignore everything else till the #endif; and so forth.
Here is the complete pattern of a group of conditional directives:
An #if or #ifdef or #ifndef directive
Zero or more #elif or #elifdef or #elifndef directives, each followed by a block to be processed if that condition is true
An optional #else directive followed by a block to be processed if no previous condition in this group is true
An #endif directive
Out of any group of conditional directives, as soon as one condition is found to be true, its block is processed and everything else until the #endif is ignored. If there is more than one true condition in an #if-#elif series, only the first true one will be processed.
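The first-true-wins rule can be modeled in a few lines of Python. This is only an illustrative sketch, not GENER8’s own code (GENER8 is written in AWK); the function name select_block and the list-of-pairs representation are assumptions made for the example.

```python
def select_block(branches, else_block=None):
    """Return the block of the first branch whose condition is true.

    branches   -- list of (condition, block) pairs in source order,
                  standing for the #if line and any #elif lines
    else_block -- the block after #else, or None if there is no #else
    """
    for condition, block in branches:
        if condition:
            return block      # later true conditions are never reached
    return else_block         # nothing was true: fall back to #else


coursenum = 200
chosen = select_block(
    [(coursenum == 200, "Statistics"),
     (coursenum >= 200, "Also true, but ignored"),
     (coursenum == 201, "Calculus")],
    else_block="Unknown course",
)
print(chosen)
```

The second condition is also true for coursenum 200, but its block is never chosen because the first one already matched.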
Anything can be inside conditional blocks, including other directives, even more conditional directives. In other words, conditional blocks can be nested.
Conditional processing is most useful when you have a number of decisions in the input file that depend on a small number of conditions. A useful technique is to put the macro definitions in a separate file. In this example you’d have one file for the Statistics definitions, one for the Calculus definitions, and so on. Then when you are building an output file, on your command line you would list the appropriate file of macro definitions, followed by your main input file. For instance:
gawk -f gener8.awk stat.def syllabus >statsyllabus.htm
#if expression
#elif expression
These directives contain conditions to be tested. The form is fairly loose: any expression that ultimately evaluates to true (nonzero) or false (zero). Expressions are described under the ARITH macro below.
If there’s any error in the expression, GENER8 will print an error message to the standard error stream and ignore the directive, which may cause additional errors further down the line.
Any macro calls that appear after #if or #elif on the line will be evaluated in the same way as macros inside (#ARITH#). Any macro names that appear will be expanded but their contents won’t be evaluated; in that case, a pure number 0 is false and anything else, including text strings, is true.
Example:
#if 0
ignores everything until the next #elif, #else, or #endif.
Example:
#if coursenum
depends on whether a macro named coursenum is defined.
If there is no such macro, then coursenum is a pure text string. Text strings evaluate to true, not to false as you might expect.
If there is such a macro, its contents are pasted but not evaluated: numeric 0 is treated as false and anything else, including text, is treated as true.
To avoid this kind of confusion with macros that may or may not be defined, you may want to stick to #ifdef directives, or use expressions like 0+macroname and 1-macroname to force an undefined macro to be treated as 0.
If the macro was defined as 25>100, the #if will nevertheless be true because the macro is pasted as text, not evaluated. To avoid this kind of confusion with macros that contain expressions, either evaluate the macro on the #if line by enclosing it in (#…#), or define the macro with the #freeze directive and the ARITH macro.
Example:
#if (#coursenum#)
evaluates the contents of coursenum. A non-numeric result or a nonzero numeric result is treated as true; a result of numeric 0 is treated as false. Any of the following would return a result of true: abc, 1, 25<100. Any of the following return false: 50-50, 25>100.
Example:
#if coursenum == 200
tests whether the definition of macro coursenum is the three characters 200. If coursenum has exactly that definition, the following block (to the next #elif or #else or #endif) will be executed; if coursenum has some other definition, even 100+100, the following block will be ignored.
Example:
#if coursenum+0 == 200
converts the contents of coursenum to a number. Effectively, this tests whether the definition of macro coursenum begins with something that looks like the number 200, including 200xyz, 2e2nonsense, and so forth. If coursenum is 100+100, the string-to-number conversion stops at the first non-numeric character, the plus sign, so the test is for 100==200, which is false.
Example:
#if (#coursenum#) == 200
tests whether the macro coursenum is an expression that evaluates to 200, including 200, 100+100, 800xyz/4, and so forth.
If conditionals don’t seem to be going as you expect, try turning on debugging.
#ifdef macroname
#ifndef macroname
It can be handy to use a macro as a simple switch: you take one set of actions if the macro is defined and another (or no actions) if it is not defined. These directives let you test whether a macro is defined. It makes no difference whether the macro definition used #define or #freeze.
These tests are most appropriate for empty macros, though you can test any macro name with them.
The DEFINED macro makes the same test as #ifdef, but can be used in expressions.
#elifdef macroname
#elifndef macroname
These directives let you test whether a macro is defined if some preceding condition is not true, without nesting.
Example: Suppose you give some people your phone number and some your e-mail address, but nobody gets both. (Admittedly, this is a contrived example.) You could code your contact information like this:
#ifdef phone
You can phone me at (#phone#).
#elifdef email
You can e-mail me at (#email#).
#else
Well, there's always tin cans and a string.
#endif
#else
#endif
These directives have already been explained: the block following the optional #else is processed if no preceding condition was true, and the required #endif marks the end of the conditional group.
Anything on the line after #else or #endif is ignored (treated as a comment).
#include file
#include-always file
These directives are identical except in the case noted below.
When GENER8 reads either of these in an input file, it suspends processing the current file, opens the named file and processes it, then after reaching the end of the named file continues with the input file that was being processed.
You may use macro calls on the #include line to specify part or all of file. In fact, the entire #include line can be created inside a macro.
If file includes a path specification, GENER8 will look only in the specified location. If file does not specify a path, GENER8 will look first in the current directory and then in order in the directories (if any) specified in the latest #includepath directive, or if there has been no #includepath, the directories in the INCLUDE environment variable. If GENER8 can’t locate the file, it prints an error message to the standard error stream and aborts all processing.
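The search order just described can be sketched in Python. This is a rough model, not GENER8’s actual AWK code: the function find_include, its parameters, and the exists hook (there so the example can run without touching the disk) are all assumptions for illustration, and paths are assumed to use forward slashes.

```python
import os
import posixpath

def find_include(name, includepath=None, env_include=None,
                 sep=os.pathsep, exists=os.path.isfile):
    """Return the path where an include file is found, or None.

    includepath -- text of the latest #includepath directive, if any
    env_include -- value of the INCLUDE environment variable, if any
    sep         -- path-list separator (";" on Windows, ":" elsewhere)
    """
    # A name that carries its own path is looked up only there.
    if posixpath.dirname(name):
        return name if exists(name) else None

    # Otherwise the current directory always comes first ...
    dirs = ["."]
    # ... then the latest #includepath if there was one, else INCLUDE.
    search = includepath if includepath is not None else env_include
    if search:
        dirs += [d for d in search.split(sep) if d]

    for d in dirs:
        candidate = posixpath.join(d, name)
        if exists(candidate):
            return candidate
    return None   # GENER8 would report an error and stop at this point


# Pretend that these files exist on disk.
on_disk = {"c:/util/header.inc", "./local.inc"}
print(find_include("header.inc", includepath="d:/my/code;c:/util",
                   sep=";", exists=on_disk.__contains__))
```

In this example the file is not in the current directory or in d:/my/code, so the search ends at c:/util/header.inc.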
The value of the FILENAME macro does not change while processing an included file: it always refers to the current input file from the command line. The value of the INCLUDEFILE macro does change when an included file is opened or closed.
The included file may itself contain another #include directive, and so on down the line. The limit to these nested includes depends on how many open files your system allows.
#include-always versus #include
In almost every context, these two are the same, and you should use #include because it’s shorter.
The one exception comes when the directive occurs among a group of ignored lines between an #if directive and an #endif directive. In this case, the #include directive is ignored just like all other lines in the ignored group, but the #include-always directive will be processed. This lets you include an #elif, #else, or #endif directive in an include file and have your logic work exactly as if the lines of the included file had been inserted at that spot in the main file.
Why not just make the #include directive behave that way? Well, suppose something like this is in the included file:
#ifdef extrafile
some lines
#include (#extrafile#)
more lines
#endif
If extrafile isn’t defined, the #include line is ignored; this is how GENER8 has always worked.
But suppose the #include directive were processed, even though the #ifdef condition is false? Macro extrafile isn’t defined, so that’s one error. Either that will end your run, or having an #include line with no filename will end it, depending on the value of the #picky directive.
The cleanest solution is to continue the #include directive’s longtime behavior of being ignored or processed just like other lines, and introduce a new #include-always for the rare cases where your #if … #endif logic is split between the main file and an include file, or between two include files.
#includepath path;path;...
While you can specify a search path for include files with the INCLUDE variable, it’s awkward to specify an environment variable in a set of make commands. So it may be more convenient to specify the include path right in the input file.
Specify one path or multiple paths separated with semicolons (colons for UNIX systems). Any \ characters within the paths will be changed to /. Don’t specify an empty path or a single period to indicate current directory; GENER8 will always look in the current directory before searching the include path.
The #includepath directive overrides any path that may be specified in the environment variable. Therefore, if you want to include any previously defined paths, use the value of the INCLUDE variable, like this:
#includepath f:\somewhere\faraway;(#ENV INCLUDE#)
GENER8 sets its internal copy of the INCLUDE variable to the contents of the #includepath directive. Therefore a pattern like the above will work whether the previous path was set in a previous #includepath directive, or in the environment before calling GENER8.
#include! command
When GENER8 reads this directive in an input file, it executes the command, which might be a program name with arguments or a shell command. You may use macro calls to specify all or part of the command. In fact, the entire #include! line can be created inside a macro.
Spaces are allowed before or after the ! character.
GENER8 intercepts any output from the command and processes it just as though it had come from an input file; therefore any macro calls and directives in the command output are processed. This is the difference from the otherwise similar SYSTEM macro, which also executes a system command: the SYSTEM macro lets the system command write directly to the output file with no processing by GENER8.
If the command generates no output, GENER8 prints an error message to the standard error stream and aborts all processing.
#info text
Sometimes you need to display a piece of information while GENER8 is processing source files. For example, maybe you’re not sure how a macro is being expanded. But you don’t want to embed this kind of debugging information in your output file.
Use the #info directive. Any macros on the line will be expanded (and #info itself can be the result of a macro expansion), and then GENER8 will display the source file name, the line number, and the text on the console, without writing anything to the output file.
#macrosep characters
By default, macro arguments are separated from each other and from the macro name by a run of one or more spaces. That might be inconvenient in two contexts: if a lot of your macro arguments contain spaces, or if you reflow text and macro calls end up split across a line. The solution is to redefine the macro separator.
#macrosep takes one argument, the macro separator characters to be defined in addition to a run of spaces. For example,
#macrosep \+
would define the macro separator as one or more spaces, or a backslash or plus character possibly with one or more spaces on either side. The corresponding regular expression is / *| *[\\+] */. (GENER8 automatically escapes characters from the #macrosep directive that have special meaning in regular expressions.)
#macrosepregexp regularexpression
You can customize the macro argument separator completely by specifying a regular expression. For example,
#macrosepregexp [;:\-]
would separate macro arguments with a single semicolon, colon, or hyphen.
#macrosepregexp \.\.\.| *- *
would separate macro arguments by either a string of three dots, or a single hyphen possibly preceded or followed by spaces.
When specifying a regular expression, you must escape any characters that need escaping: GENER8 passes your regular expression unchanged to the split( ) function of AWK.
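To get a feel for how the two example separators above carve up a macro call, you can try them with Python’s re.split. Treat this as an approximation only: GENER8 uses AWK’s split( ), and AWK and Python can disagree about which alternative of a regular expression wins for other patterns. The helper name split_args and the sample call texts are made up for the example.

```python
import re

def split_args(call_text, separator_regexp):
    """Split the text of a macro call into its name and arguments."""
    return re.split(separator_regexp, call_text)

# A single semicolon, colon, or hyphen as the separator.
print(split_args("cell;left:top-wide", r"[;:\-]"))

# Three dots, or a hyphen with optional spaces around it.
print(split_args("range...1 - 10", r"\.\.\.| *- *"))
```

The first call yields the four pieces cell, left, top, wide; the second yields range, 1, 10.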
#picky
Most people like to know if they have used a macro without defining it, or entered an incomplete macro definition. In these cases, GENER8 normally displays a warning message in the standard error stream, then treats the macro as defined but empty. The #picky directive lets you alter this behavior:
#picky 0 suppresses these messages.
#picky 1 displays the messages but doesn’t count them as errors. Any undefined macro is expanded as a zero-length string.
#picky 2 displays the messages and does treat them as errors. Any undefined macro is expanded as the text MACRO ERROR, and any #if directive using an undefined macro generates an error message.
#picky prev returns to the previous level of pickiness. This lets you put a #picky directive in a file to apply to a short section of code, then return to whatever value may have been set on the command line or in a previous file without knowing what that level was.
The initial value is #picky 1, unless you specify a different value on the command line. You can have multiple #picky directives in an input file.
When the macro pickiness level is 2, any macro error counts against the maxerrors quota (if set) and will affect the return value passed to the operating system.
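The three numeric pickiness levels can be summarized as a small Python model. It is a sketch of the behavior described above, not GENER8’s code: the function name, the wording of the message, and the assumption that level 0 still expands to an empty string are mine.

```python
def expand_undefined(name, picky=1):
    """Model what happens when an undefined macro is called.

    Returns (expansion, message, counts_as_error).
    """
    # Level 0 keeps quiet; levels 1 and 2 show a message.
    # (The message text here is hypothetical.)
    message = None if picky == 0 else "undefined macro: " + name
    if picky == 2:
        # Level 2: visible marker in the output, and it counts as an error.
        return "MACRO ERROR", message, True
    # Levels 0 and 1: the macro is treated as defined but empty.
    return "", message, False


for level in (0, 1, 2):
    print(level, expand_undefined("qtr", level))
```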
#tocif expression
When a table of contents is being generated, you may not want particular headers to appear in it. For instance, if the table of contents itself has a header, you probably don’t want that header in the table. #tocif lets you control this.
If expression evaluates to 0, subsequent headers won’t be included in the table of contents. If expression evaluates to nonzero, headers will be included. (Undefined macros are treated as text, which evaluates to nonzero.)
#tocinsertli text
When a table of contents is being generated, you may occasionally need to insert some text, such as a class= or style= attribute, in one or more of the generated <li> tags.
Any macro calls on the line will be evaluated, and the resulting text will be stored. When the table of contents is generated, the text will be placed just before the > of the <li> tag. To stop the insertion, use a #tocinsertli directive with no text, or only spaces.
A macro is a bit of stored text for later processing. Pretty much any sequence of characters that is used several times (perhaps with variants) is a candidate for making into a macro. When you call a macro, GENER8 inserts the macro text at that point in the output file. You can define your own macros, and GENER8 has quite a number of macros predefined for you.
A macro can be simple unvarying text, or it can contain placeholders for arguments that are supplied when you call the macro. For example, if you are creating an HTML table and you want most cells to be centered horizontally and vertically, you can define a macro that contains the repetitive HTML coding with a placeholder for the cell contents.
To call a macro, whether it’s predefined or one you defined yourself, simply specify its name surrounded by (# and #). For instance, if you have defined the macro qtr to contain the text second quarter of 2002, you might write a sentence like this:
Profits rebounded in the (#qtr#) — hooray!
As you see from the example, a macro call need not be on a line by itself.
If you call a macro that was never defined, GENER8 displays a warning message and then treats the macro as defined but empty. You can change that behavior with the #picky directive.
Some macros take arguments, bits of additional text or numbers that the macro uses in some way. See below for the arguments required for predefined macros; see Defining Your Own Macro for how to create macros that take arguments.
When you use (“call”) a macro, separate any macro arguments from the macro name and each other by spaces. If you want to have a space inside a macro argument, code it as __ (two underscores); if you want to include a line break in a macro argument, code it as \n. (If you actually want a double underscore, code it as _\_. If you actually want _\_, you’re out of luck.)
You can change the argument separator from a space to almost anything you like. If you often want spaces inside your macro arguments, this may be a better solution than the __ hack. See the #macrosep directive or #macrosepregexp directive.
A macro with all its arguments need not be alone on a line, but everything from the opening (# to the closing #) must be on one line. If the macro call is too long to fit comfortably on an input line, use \ and continue it to one or more additional lines. The \ continuation may occur in the middle of an argument or between arguments.
If you’re using an editor that reflows text, a macro that you code on a line may be split when you reflow the text. To avoid this, pick a different argument separator so that there are no spaces within the macro. See the #macrosep directive.
GENER8 will check that you have supplied the proper number of arguments according to the macro definition. For example, suppose a macro is defined with %1 through %4 but no higher argument numbers:
If the definition contains %4 and neither %? nor %*, the call must have exactly four arguments.
If the definition contains %4 and also %?, the call must have four arguments but may have more.
If the definition contains %4 and also %*, the call must have more than four arguments, namely five arguments or more.
You can use a macro call in an argument to another macro call. In this case, the inner macro is expanded first, before the outer macro is analyzed. For more about this, see Nested Macros.
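The three argument-count rules above (exactly four, four or more, five or more) can be written as one small check. The Python below is a sketch of those rules only; GENER8’s real parsing is in AWK and surely differs in detail. The function name args_ok, the deliberately naive scan for %digits, and giving %* priority when a definition contains both %? and %* are assumptions for illustration.

```python
import re

def args_ok(definition, nargs):
    """Check an argument count against the placeholders in a macro body."""
    # Highest numbered placeholder, e.g. 4 for a body using %1 through %4.
    numbers = [int(n) for n in re.findall(r"%(\d+)", definition)]
    highest = max(numbers, default=0)
    if "%*" in definition:
        return nargs > highest       # at least one extra argument required
    if "%?" in definition:
        return nargs >= highest      # extra arguments allowed but optional
    return nargs == highest          # exact match required


body = "<td>%1</td><td>%2</td><td>%3</td><td>%4</td>"
print(args_ok(body, 4), args_ok(body, 5))
print(args_ok(body + "%?", 4), args_ok(body + "%*", 4))
```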
The following macros are automatically defined for you when GENER8 starts up. They are listed in alphabetical order here. (See also: Directives and Predefined Macros by Category.)
If you define a macro with the same name as a predefined macro, your definition will replace the original one. Probably you don’t want to do that. Since the names of predefined macros consist entirely of upper-case (capital) letters, I recommend that you pick names for your own macros that are not all capital letters.
ARITH expression
ARITH format expression
GENER8 performs arithmetic, string, and logical operations and pastes the numeric or string result in the output. The expression consists of numeric and string operands connected by the operators listed below. Spaces are optional between operators and operands. You can use parentheses to specify the order of operations.
GENER8 uses AWK’s logic in treating operands and expressions as strings or numerics. 1234 is always numeric, and 2e2 is always numeric (with a value of 200). But 1234z is a string.
Strings and numerics are automatically converted where appropriate:
+
-
*
/
^
are done in
floating point. The result will have an appropriate number of decimal
places, or if it’s a whole number it will have no decimal places and
no decimal point.
String operands are converted to numeric. Only the numeric characters before the first non-numeric get used; if there aren’t any, then the string is converted to 0. For example, the strings 200xyz and 2e2zzz and 200.0 would all become 200 if converted to numeric.
Unary + and ! are not arithmetic operators, and string arguments are not converted. +abc is abc, and !abc is 0, not 1 as you might expect.
The relationals can provide some surprises: if either operand is a string, both are treated as strings. 1234>98 is true because both are numeric, but 1234z>98 is false because they are compared as strings.
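The conversion and comparison rules above can be modelled in a few lines of Python. This is only a sketch of the documented behaviour, not GENER8’s own AWK code; the helper names to_number, looks_numeric, and greater_than are hypothetical.

```python
import re

# A number as described above: optional sign, digits with an optional
# fraction, and an optional exponent such as e2.
_NUMBER = r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?"
_PREFIX = re.compile(r"\s*(" + _NUMBER + r")")
_WHOLE = re.compile(r"\s*" + _NUMBER + r"\s*$")

def to_number(text):
    """Use only the numeric characters before the first non-numeric one."""
    match = _PREFIX.match(text)
    return float(match.group(1)) if match else 0.0

def looks_numeric(text):
    """True when the entire operand is a number, like 1234 or 2e2."""
    return _WHOLE.match(text) is not None

def greater_than(a, b):
    """Compare numerically only when both operands are numeric."""
    if looks_numeric(a) and looks_numeric(b):
        return to_number(a) > to_number(b)
    return a > b  # otherwise both operands are compared as strings

print(to_number("200xyz"), to_number("2e2zzz"), to_number("200.0"), to_number("abc"))
print(greater_than("1234", "98"), greater_than("1234z", "98"))
```

The last line shows the surprise described above: the first comparison is numeric and true, while the second is a string comparison, where 1 sorts before 9, and so is false.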
You can force conversions where you need to:
"" (an empty string).
You can specify a format string before the expression. Use standard AWK format (printf style) strings to specify the conversion from number to string. For example,
(#ARITH 7/4#)
displays as 1.75, but
(#ARITH %08.3f 7/4#)
displays as 0001.750. Perhaps you want a rounded answer; then you would use
(#ARITH %.0f 7/4#)
which displays as 2 (no decimal point). (%d and %i don’t round; they truncate results to an integer.)
The grammar of expressions follows, from highest priority (evaluated first) to lowest priority (evaluated last).
" characters. Use \" to include a " character within the string.
mac is defined as 2+2, then (#ARITH mac+5#) equals 7, not 9. If you want a macro expanded, not just treated as a text string, call the macro: (#ARITH (#mac#)+5#) does equal 9.
If you want to use a text string that happens to equal the name of a macro, put it in quotes. The value of mac is the string 2+2, but the value of "mac" is mac.
( )
^ and unary + - !
2^5 is 32, and -2^4 is −16. The ! operator is “not”: !0 is 1 and ! turns any nonzero value into 0.
Caution: ! applied to a text string returns 0, not 1.
+ - * / %
% is the modulus: 17%3 is 2. * / and % are done left to right before + and -.
2+4 5 is 65.
== != < <= > >=
134xxx >98 is false (0) because the operands are compared as strings.
~ !~
123zonk ~ 3z is true (1) but 123zonk ~ "^3z" is false (0) because 123zonk doesn’t begin with 3z.
Caution! The parser is easily fooled when the regular expression contains characters that are also GENER8 operators. For instance, ^abc looks like a defective exponentiation, and [019][0-9] looks like a subtraction. If you run into problems like this, enclose the regular expression in double quotes to force it to be parsed as a string. "^abc" and "[019][0-9]" are both parsed correctly.
&& ||
DATE format date time
DATE format date
DATE format
Convert the specified date and optional time (or the current date and time, if no date is specified) to any desired format.
If you specify a date in the macro, it must be in the same format as your system date. More precisely, it must be in the same format that you have told GENER8 is your system date format. See the DATE_SYSFORMAT macro, below.
While this might seem restrictive, the alternative was to specify both input and output formats in the macro, with different possibilities available for the two. Most people always write their dates in a given format, and a simple macro call lets you set that format if it’s different from what your system does.
In all system date formats, years can be two or four digits, and days and months can be one or two digits. Two-digit years 70 to 99 will have 1900 added; 00 to 69 will have 2000 added.
Although the supported date input formats all use the hyphen (-) as separator within the date, your input dates can equally well use a period (.) or slash (/) as separator.
The time, if specified, can be separated from the date by a T or by one or more spaces. The time can be in the form hh, hh:mm, or hh:mm:ss, with or without leading zeroes. If the time is on a 12-hour clock, add a space and AM or PM in upper or lower case.
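The input-date rules above (two-digit year expansion and the interchangeable separators) can be sketched in Python as follows. This is an illustration of the stated rules only, not the code GENER8 uses; expand_year and parse_date are hypothetical names, and time parsing is left out.

```python
import re

def expand_year(year):
    """Two-digit years 70 to 99 get 1900 added; 00 to 69 get 2000 added."""
    if 70 <= year <= 99:
        return year + 1900
    if 0 <= year <= 69:
        return year + 2000
    return year  # already a four-digit year

def parse_date(text, sysformat="y-m-d"):
    """Split on hyphen, period, or slash and order the fields per sysformat."""
    order = re.split(r"[-./]", sysformat)
    parts = [int(part) for part in re.split(r"[-./]", text)]
    fields = dict(zip(order, parts))
    return expand_year(fields["y"]), fields["m"], fields["d"]

print(parse_date("2016-04-25"))         # default y-m-d order
print(parse_date("11/22/12", "m-d-y"))  # slashes, two-digit year
print(parse_date("25.4.70", "d-m-y"))   # periods, year before the pivot
```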
You have complete freedom in the output format that you specify. Although you can use strftime’s completely general formatting codes, most likely you’ll find one of the following keywords gives you the format you want. All of these are for the date 2016-04-25 9:12:27 PM.
trad = Apr 25, 2016.
traditional = April 25, 2016 (the same as trad, but with the month spelled out).
custom1 = the same as trad, except that the year is suppressed if it’s the current year. Thus the date 2016-04-25 would display as Apr 25 during the year 2016, or Apr 25, 2016 (same as trad) in any other year.
mil = 25 Apr 2016.
milopt = the same as mil, except that the current year is suppressed. Thus, 25 Apr during 2016, or 25 Apr 2016 (same as mil) in any other year.
custom2 = Apr 25 (same as custom1) for the current year, or 25 Apr 2016 (same as mil) for any other year.
iso = 2016-04-25.
isofull = 2016-04-25T21:12:27.
timestamp = “raw” format, suitable for comparisons, in the format 1461643947. This is the number of seconds since the epoch, which for Windows systems was the start of 1 January 1970.
The author is open to creating new keywords.
The keyword format strings are not case sensitive.
The three-letter abbreviations Jun, Jul, and Sep are changed to June, July, and Sept. See the DATE_MONTHS4 macro if you want to stick with three-letter months.
If you want non-breaking spaces (&nbsp;) instead of regular spaces, include nbsp anywhere in one of the keyword formats; for example, isofullnbsp or isonbspfull. (If you want some other character in place of the spaces, use the GSUB macro on the result of the DATE macro.)
Examples:
(#DATE custom1#)
formats the current date, in a format such as Apr 25.
(#DATE isofull 11/22/12 2:26 pm#)
formats the indicated date and time as 2012-11-22T14:26:00, if your system date format is m-d-y.
strftime
In addition to the predefined formats above, you can use any format acceptable to the strftime( ) function in AWK. In addition, any text of yours that isn’t one of the listed format codes is passed through verbatim.
strftime has dozens of format strings; here’s what each one produces for the date and time 2016-04-25 9:12:27 PM:
Code | Example | Description | Range |
---|---|---|---|
%c | Monday April 25 21:12:27 2016 | Date and time | LOC |
%x | Monday April 25 2016 | Date | LOC |
%F | 2016-04-25 | Short YYYY-MM-DD date, same as %Y-%m-%d | |
%D | 04/25/16 | Short MM/DD/YY date, same as %m/%d/%y | |
Year | |||
%Y | 2016 | Year | |
%y | 16 | Year, last two digits | 00-99 |
%C | 20 | Year divided by 100 and truncated to integer; not the same as century | 00-99 |
%G | 2016 | Week-based year | |
%g | 16 | Week-based year, last two digits | 00-99 |
Month | |||
%m | 04 | Month number | 01-12 |
%B | April | Full month name | LOC |
%b | Apr | Abbreviated month name | LOC |
%h | Apr | Abbreviated month name, same as %b | LOC |
Week | |||
%V | 17 | ISO 8601 week number | 01-53 |
%U | 17 | Week number with the first Sunday as the first day of week one | 00-53 |
%W | 17 | Week number with the first Monday as the first day of week one | 00-53 |
Day | |||
%d | 25 | Day of the month, zero-padded | 01-31 |
%e | 25 | Day of the month, space-padded | 1-31 |
%j | 116 | Day of the year | 001-366 |
%u | 1 | ISO 8601 weekday as number with Monday as 1 | 1-7 |
%w | 1 | Weekday as number with Sunday as 0 | 0-6 |
%A | Monday | Full weekday name | LOC |
%a | Mon | Abbreviated weekday | LOC |
Time | |||
%T | 21:12:27 | ISO 8601 time, same as %H:%M:%S | |
%X | 21:12:27 | Time | LOC |
%r | 09:12:27 PM | 12-hour clock time | LOC |
%R | 21:12 | 24-hour HH:MM time, same as %H:%M | |
Time Components | |||
%H | 21 | Hour in 24-hour format | 00-23 |
%I | 09 | Hour in 12-hour format | 01-12 |
%M | 12 | Minute | 00-59 |
%S | 27 | Second | 00-61 |
%p | PM | AM or PM | |
Time Zone | |||
%z | -0800 | ISO 8601 offset from UTC in time zone (1 min=1, 1 hr=100); empty if unavailable | |
%Z | PDT | Time zone name or abbreviation; empty if unavailable | LOC |
Special Characters | |||
%% | % | Percent sign | |
%n | | New-line character ('\n') | |
%t | | Horizontal-tab character ('\t') | |
“LOC” means locale dependent in principle, but GENER8 doesn’t support locales at this time.
Please observe these rules:
strftime format strings are case sensitive.
Because the format in the DATE macro must be one argument, you must code any spaces in the format string as double underscores.
The three-letter abbreviations Jun, Jul, and Sep are changed to June, July, and Sept. See the DATE_MONTHS4 macro if you want to stick with three-letter months.
For example, the format string for the abbreviated weekday and month/day/year is %a %D, so you would enter it like this:
(#DATE %a__%D#)
(The double underscore is how you embed a space in a macro argument, the format string in this case.)
Since no date was given in the macro, GENER8 will format the system date. Sample output is Mon 04/25/16.
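Python’s datetime.strftime accepts most of the same format codes, so it offers a quick way to try a format string before putting it in a DATE call. The sketch below is an assumption-laden stand-in, not GENER8 itself: format_date is a hypothetical helper, the sample date is the one used in the table above, and only locale-independent codes are used.

```python
from datetime import datetime

# The sample date and time used in the table: 2016-04-25 9:12:27 PM.
SAMPLE = datetime(2016, 4, 25, 21, 12, 27)

def format_date(fmt, when=SAMPLE):
    """Turn each double underscore into a space, then apply the format."""
    return when.strftime(fmt.replace("__", " "))

print(format_date("%Y-%m-%d"))            # same layout as the iso keyword
print(format_date("%Y-%m-%dT%H:%M:%S"))   # same layout as the isofull keyword
print(format_date("%j"))                  # day of the year
print(format_date("%H:%M__on__%m/%d/%y")) # literal text passes through
```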
#define DATE_MONTHS4 0_or_1
By default, if you select an output date format with a three-letter month, GENER8 will change Jun, Jul, and Sep to the four-letter abbreviations June, July, and Sept.
If you want to stick with three-letter months, include this line in your input file:
#define DATE_MONTHS4 0
If you want this to apply to every file, you can easily edit the GENER8.AWK file. Simply change 1 to 0 in the program line
macroStore("DATE_MONTHS4", "1")
You can change back and forth within a single file, simply by redefining DATE_MONTHS4.
#define DATE_SYSFORMAT format
If you use the DATE macro or the FILEDATE macro, GENER8 has to know what date format you are using, or what date format your system is using. By default, GENER8 assumes that the system date format is y-m-d.
You can change that by defining the DATE_SYSFORMAT macro with one of these values:
m-d-y: The system date, and your dates in the DATE macro, are in the form month-day-year, month/day/year, or month.day.year.
d-m-y: The system date, and your dates in the DATE macro, are in the form day-month-year, day/month/year, or day.month.year.
y-m-d: The system date, and your dates in the DATE macro, are in the form year-month-day, year/month/day, or year.month.day. (This is the default setting for DATE_SYSFORMAT.)
You can use . or / in the format string in place of -, if you wish.
Caution! You’re storing text into a macro. Like any other macro, the text doesn’t actually get examined until it’s used. An invalid system format will be caught when you call a DATE macro or FILEDATE macro. A valid format that doesn’t match the actual date format in your system may or may not get caught; you might just get bogus dates.
These rules apply to all input dates and times, regardless of your system date format:
If your system uses a different date format, or you want to use a different format for entering dates in the DATE macro, redefine the system date format in your input file. For example:
#define DATE_SYSFORMAT m-d-y
says that input dates will be in the form month-day-year, month/day/year, or month.day.year. You can change the format multiple times within any file.
If you want to change the default for every file, you can easily edit the GENER8.AWK file. Simply change the date format in the
macroStore("DATE_SYSFORMAT", "y-m-d")
line to one of the other supported formats.
The author is open to adding other input date formats. Or you could do it yourself: just edit the isodate( ) function in the GENER8.AWK file.
DEFINED macroname
If the named macro is defined (including an empty definition), DEFINED is replaced with 1; otherwise it is replaced with 0.
This is handy for #if directives where you want to check whether something is defined or some other condition is true. While that can be done with nested #ifdef directives and #if directives, it’s easier to do it on one line using a call to the DEFINED macro.
EMPTY
This macro (which may be called with or without arguments) emits nothing. It is useful when you need an empty macro.
Unlike the other predefined macros, EMPTY cannot be redefined.
ENV variable
This macro is replaced with the value of the named environment variable. If the variable is not defined in the environment, the macro is replaced with nothing (and no message is displayed).
Caution! Not all environment variables in Windows are upper case. For instance, in Windows 7 I see ComSpec, not COMSPEC. Although the Windows command line doesn’t care about the case of environment-variable names, AWK does.
EXISTS file
This macro is replaced with 1 if the named file exists, and 0 if it does not. The file path can use forward or backward slashes.
In Windows operating systems, the macro finds only real files, not folders (directories). I don’t know what happens in other operating systems, but I’d be grateful for information.
FILEDATE file format
GENER8 queries the system for the last-modified date of the file, reformats the file date according to the format you specify, and pastes the result into the output.
The possible output formats are the same as for the DATE macro, including strftime format strings. If you don’t specify a format, trad will be used.
Example:
(#FILEDATE (#FILENAME#) iso#)
will find the date when the current input file was last modified and format it in ISO format, such as 2016-04-25.
System dependencies:
filedate( ) function in the GENER8.AWK file. If you have a confirmed working edit, please send it to me for inclusion in GENER8.
GENER8 uses the DATE_SYSFORMAT macro to interpret the file date that it gets from the system. If this doesn’t match the actual system date format, FILEDATE will return an error (if you’re lucky), or garbage.
FILENAME
translates to the name of the file currently being read from the command line, converted to lower case. If the file on the command line contains a path, it will be part of FILENAME.
The value of (#FILENAME#) does not change when GENER8 finds an #include directive and reads the included file. Even within an include file, (#FILENAME#) still expands to the name of the input file named on the command line. If you name multiple input files on the command line, (#FILENAME#) will change as GENER8 finishes processing one input file and starts on the next.
Compare to the INCLUDEFILE macro.
FILESIZE format filespec
GENER8 queries the system for the size of the file in bytes, then writes that size to your output file in a format that you specify.
There is no need for quotes around a filespec that contains spaces, though they seem to do no harm. If the system can’t find the file, or if it’s a directory, the size will be zero.
The format is one or two characters:
If the second character is a comma or is omitted, and the first character was K, M, or G, then the number displayed will be the nearest whole number of that unit. For example, 512–1535 bytes with format K would display as 1 KB.
Examples, using the size of another file:
(#FILESIZE B grepman.htm#) bytes = 293044 bytes
(#FILESIZE B, grepman.htm#) bytes = 293,044 bytes
(#FILESIZE K grepman.htm#) = 286 KB
(#FILESIZE K, grepman.htm#) = 286 KB
(#FILESIZE M grepman.htm#) = 0 MB
(#FILESIZE M3 grepman.htm#) = 0.279 MB
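The arithmetic behind those examples can be reproduced with a short Python sketch. It rests on assumptions inferred from the examples rather than taken from the GENER8 source: that 1 KB is 1024 bytes, that a digit as second character gives the number of decimal places, and that halves round up. The function name format_size is hypothetical, and the sketch returns only the number, without any unit suffix.

```python
import math

# Assumed unit sizes, inferred from 293044 bytes displaying as 286 KB.
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def format_size(nbytes, fmt):
    """Format a byte count per a one- or two-character format such as K or M3."""
    unit = fmt[0].upper()
    modifier = fmt[1:2]               # empty string when omitted
    value = nbytes / UNITS[unit]
    if modifier.isdigit():            # assumed: digit = decimal places
        return f"{value:.{int(modifier)}f}"
    whole = math.floor(value + 0.5)   # nearest whole number, halves round up
    return f"{whole:,}" if modifier == "," else str(whole)

for fmt in ("B", "B,", "K", "K,", "M", "M3"):
    print(fmt, format_size(293044, fmt))
```

Rounding with floor(value + 0.5) rather than Python’s built-in round is deliberate: round(0.5) gives 0, which would contradict the statement that 512 bytes displays as 1 KB.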
System dependency: I believe I have a correct UNIX command for getting file size in bytes. If you’re not getting correct answers, edit the UNIX line in the filesize( ) function in the GENER8.AWK file. If you have a confirmed correction, please send it to me for inclusion in GENER8 in the future.
GDEL old how text
This macro deletes the part of text that matches the regular expression old. If there is no match, text will be placed in the output unchanged.
how determines which occurrence(s) of old get deleted. If how is G or g, every occurrence will be deleted. If how is an unsigned number, only that occurrence will be deleted. If how is anything else, the first occurrence will be deleted.
old must not contain any spaces. As usual, you can use a double underscore (__) as a stand-in for a space. text may contain spaces.
GDEL is useful for manipulating file names. For instance, this gives the base name part of the current file name, without extension:
(#GDEL \.[^.]*$ 1 (#FILENAME#)#)
GSUB old new how text
This macro performs text substitution. If the regular expression old matches the text or part of it, it will be replaced with new. If there is no match, text will be placed in the output unchanged.
old is a regular expression. It may not contain spaces, but you can use a double underscore (__) as a stand-in for a space. old may contain parentheses to delimit subexpressions, and you can then address these in new by \1, \2, and so forth.
new may not contain spaces, but you can use a double underscore (__) as a stand-in for a space. new may contain special characters: \& to indicate the entire substring of text that was matched by old, and \1, \2, etc. to indicate the part of text matched by the first, second, etc. parenthesized subexpression in old.
new is required. There is no way to delete text using GSUB; use the GDEL macro instead.
how determines which occurrence(s) of old get replaced. If how is G or g, every occurrence will be replaced. If how is an unsigned number, only that occurrence will be replaced. If how is anything else, the first occurrence will be replaced.
GSUB is useful for all sorts of manipulations. For instance, this replaces all occurrences of two hyphens in the text from the macro with the em dash character:
(#GSUB -- — g (#somemacro#)#)
This puts a comma after the thousands part of a number, but doesn’t put a comma if the number has no thousands part:
(#GSUB ([1-9][0-9]+)([0-9][0-9][0-9])$ \1,\2 1 (#somemacro#)#)
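It would be cleaner with ([1-9][0-9]+)([0-9]{3}) for old, but for some reason that doesn’t work, even with backslashes to escape the brace characters. (One possible cause: GAWK releases before 4.0 recognize interval expressions such as {3} only when run with the --re-interval or --posix option.)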
GSUB offers an easy way to get the first N or last N characters of a text string, so that there’s no need for a LEFT or RIGHT macro:
#define first4 (#GSUB (....).* \1 1 %1#)
#define last4 (#GSUB .*(....) \1 1 %1#)
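To see how the how argument and the first4 and last4 patterns behave, here is a rough Python model of the substitution rules described above. It is a sketch only: gsub is a hypothetical function, Python’s re module is standing in for AWK regular expressions, and Python writes the whole match as \g<0> where GENER8 uses \&.

```python
import re

def gsub(old, new, how, text):
    """Replace matches of old in text: all (G or g), the Nth, or the first."""
    pattern = re.compile(old)
    if how in ("G", "g"):
        return pattern.sub(new, text)
    target = int(how) if how.isdigit() else 1  # anything else means first
    seen = 0

    def replace_nth(match):
        nonlocal seen
        seen += 1
        # Expand \1, \2 and so on only for the occurrence being replaced.
        return match.expand(new) if seen == target else match.group(0)

    return pattern.sub(replace_nth, text)

print(gsub("a", "X", "g", "banana"))            # every occurrence
print(gsub("a", "X", "2", "banana"))            # second occurrence only
print(gsub(r"(....).*", r"\1", "1", "abcdefgh"))  # like first4
print(gsub(r".*(....)", r"\1", "1", "abcdefgh"))  # like last4
print(gsub(r"([1-9][0-9]+)([0-9][0-9][0-9])$", r"\1,\2", "1", "293044"))
```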
HOME
If you specify the -v home option on the command line, macro HOME will be set to the given path and file, translated to lower case. If you don’t specify the option, HOME is undefined.
IIF condition iftrue iffalse
IIF condition iftrue
This macro is an inline version of the #if-#else-#endif sequence.
condition is evaluated, using the logic of (#ARITH condition#), to determine whether it’s true (nonzero) or false (zero). Non-numeric text is considered true, unless it matches a macro name. Macros in condition are expanded just as they are within ARITH.
If condition is true (nonzero or text), IIF returns iftrue; if condition is false (numeric zero), IIF returns iffalse.
iftrue and iffalse may themselves be macro calls, as long as they return values that don’t contain macro-argument separator characters.
iftrue and iffalse are not evaluated unless they are actually macro calls, so if either of them happens to match a macro name without (#...#) it will still be treated as ordinary text.
Sometimes you want to produce text if a certain condition is true and nothing if the condition is false. In that case, simply omit iffalse.
Examples:
#define zonk 45
(#IIF zonk yes no#) (#IIF zonk==45 yes no#) (#IIF zonk==1 yes zonk#)
will place
yes yes zonk
in the output. Notice that the third one is zonk, not 45, because iftrue and iffalse aren’t evaluated.
#define n 15
My sample size was (#n#) individual(#IIF n==1 . s.#)
will place this text in the output:
My sample size was 15 individuals.
You could get the same effect, slightly shorter, like this:
My sample size was (#n#) individual(#IIF n!=1 s#).
IIFDEF name ifdefined ifundefined
IIFDEF name ifdefined
This is a shortcut version of (#IIF (#DEFINED name#) ifdefined ifundefined#).
If name is a user-defined or predefined macro, IIFDEF returns ifdefined; otherwise, IIFDEF returns ifundefined (or empty text if ifundefined is omitted).
INCLUDEFILE
translates to the name of the file currently being read, whether from the command line or because of an #include directive, converted to lower case. If a path was specified, it will be part of INCLUDEFILE.
There’s only one difference between INCLUDEFILE and the FILENAME macro: FILENAME changes only when GENER8 begins processing the next file named on the command line, and INCLUDEFILE changes when a file named on the command line or in an #include directive is opened or closed.
LOWER text
converts text to lower case.
REGINC macroname
REGPRE macroname
Fairly often you need to number or letter things in a document. REGINC (“register increment”) and REGPRE (“register preincrement”) let you maintain any number of separate counters and update them automatically. (Counters are simply specialized macros.)
A counter can be a regular number, an upper-case letter, or a lower-case letter. Letter series end at Z or z; number series have no practical upper limit. You can have multiple counters going at once: counters are completely independent of each other.
Before using a counter, first set it to its initial value with #define or #freeze, like any other macro. (If you use a counter without giving it an initial value, what happens depends on the latest #picky directive. If picky=2, GENER8 displays an error message and pastes MACRO ERROR in the output. Otherwise, GENER8 supplies an initial value of 0.)
To use a counter, pass its name as argument to a REGINC or REGPRE macro. The two macros work identically, with one exception:
REGINC pastes the current value of the counter and then immediately increments it for future use. If you use the counter again without incrementing it, it will be one unit past the previously displayed value.
REGPRE increments the counter and then pastes the new value. If you use the counter again without incrementing it, it will be the same as the previously displayed value.
For example, if you want to identify sections of a document as A, B, C, and so on, assign the value A to a macro, using #define. Perhaps you might use the name secnum, like this:
#define secnum A
Now you use REGINC in your section heads, like this:
(#REGINC secnum#). Fruits
(#REGINC secnum#). Vegetables
(#REGINC secnum#). Grains and Cereals
(#REGINC secnum#). Dairy
The sections will be lettered A, B, C, and so on.
REGINC increments the counter immediately after using it. This lets you start the counter off at its intended initial value, but it prevents you from reusing the value of the counter. For example, if you had subsections A1, A2, A3 under section A, you would be unable to use (#secnum#) to re-display the A part of that, because secnum has already been updated to B.
For these situations, there is the REGPRE macro, which increments first and then displays. The quirk here is that the initial value must be one before the actual initial value you want to see. (@ comes before A, and ` comes before a. 0, of course, comes before 1.)
Here’s how that would play out, using REGPRE. I’ve also defined a macro for the whole subsection number, A1, A2, and so on:
#define secnum @
#define subsec (#secnum#)(#REGPRE subsecnum#)
(#REGPRE secnum#). Fruits
#define subsecnum 0
(#subsec#). Berries
(#subsec#). Melons
(#subsec#). Stone Fruits
(#REGPRE secnum#). Vegetables
#define subsecnum 0
(#REGPRE secnum#). Grains and Cereals
#define subsecnum 0
(#REGPRE secnum#). Dairy
#define subsecnum 0
There’s a way around the repeated #define subsecnum 0 lines; see the REGSET macro, below.
Occasionally you may want to increment a counter without displaying it. To do this, wrap the REGINC call inside a call of the EMPTY macro, like this:
(#EMPTY (#REGINC secnum#)#)
REGSET macroname value
In the example for REGPRE, you had to reset the counter for the subsection number every time you started a new section. That’s kind of tedious. Wouldn’t it be better to have a macro that shows the section number and also resets the subsection number?
REGSET to the rescue! REGSET is similar to #define or #freeze, but it doesn’t have to be on a line by itself. Here’s that example rewritten in shorter form by using REGSET:
#define secnum @
#define subsec (#secnum#)(#REGPRE subsecnum#)
(#REGPRE secnum#)(#REGSET subsecnum 0#). Fruits
(#subsec#). Berries
(#subsec#). Melons
(#subsec#). Stone Fruits
(#REGPRE secnum#)(#REGSET subsecnum 0#). Vegetables
(#REGPRE secnum#)(#REGSET subsecnum 0#). Grains and Cereals
(#REGPRE secnum#)(#REGSET subsecnum 0#). Dairy
But there’s still some repetition that can be squeezed out. Here sec is the section number as it appears on the header line, and secnum is the underlying counter. Similarly, subsec is the section+subsection number that appears on the header line, and subsecnum is the underlying counter for the subsection:
#define secnum @
#define sec (#REGPRE secnum#)(#REGSET subsecnum 0#)
#define subsec (#secnum#)(#REGPRE subsecnum#)
(#sec#). Fruits
(#subsec#). Berries
(#subsec#). Melons
(#subsec#). Stone Fruits
(#sec#). Vegetables
(#sec#). Grains and Cereals
(#sec#). Dairy
The sec macro increments the secnum counter and pastes the new value, then sets the subsecnum counter to 0. The subsec macro pastes the current value of secnum, then increments the subsecnum counter and pastes its new value. The result looks like this:
A. Fruits
A1. Berries
A2. Melons
A3. Stone Fruits
B. Vegetables
C. Grains and Cereals
D. Dairy
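If it helps to see the counter logic outside GENER8, the Python sketch below models REGINC, REGPRE, and REGSET with a plain dictionary and rebuilds the outline above. It illustrates the documented semantics only; every name in it is hypothetical, and behaviour past Z, or for a counter that was never given an initial value, is deliberately not modelled.

```python
counters = {}  # counter name -> current value, stored as text

def increment(value):
    """Numbers count upward; a single character moves to the next character."""
    if value.isdigit():
        return str(int(value) + 1)
    return chr(ord(value) + 1)  # so @ becomes A, and A becomes B

def regset(name, value):
    counters[name] = value
    return ""  # sets the counter and contributes no text here

def reginc(name):
    current = counters[name]
    counters[name] = increment(current)  # paste first, then increment
    return current

def regpre(name):
    counters[name] = increment(counters[name])  # increment first, then paste
    return counters[name]

def sec():
    return regpre("secnum") + regset("subsecnum", "0")

def subsec():
    return counters["secnum"] + regpre("subsecnum")

regset("secnum", "@")
outline = [
    sec() + ". Fruits",
    subsec() + ". Berries",
    subsec() + ". Melons",
    subsec() + ". Stone Fruits",
    sec() + ". Vegetables",
    sec() + ". Grains and Cereals",
    sec() + ". Dairy",
]
print("\n".join(outline))
```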
RELHOME
If you specify the -v home option and the -v target option on the command line, macro RELHOME will be set to the relative URL from the target to the home page, translated to lower case. If you specify only one of those options or neither, RELHOME is undefined.
Most Web pages contain a link to the site’s home page. You want to do them as relative URLs so that you can test all the links on the site before you upload it to your Web site. RELHOME makes that easy. For example:
<a href="(#RELHOME#)">Home</a>
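As a rough idea of what a relative URL from the target to the home page looks like, the Python sketch below computes one with the standard posixpath module. It is not how GENER8 derives RELHOME, and the exact text GENER8 produces may differ; rel_home and the sample paths are hypothetical.

```python
import posixpath

def rel_home(home, target):
    """Relative path from the target file's directory to the home page."""
    home = home.replace("\\", "/")
    target_dir = posixpath.dirname(target.replace("\\", "/"))
    relative = posixpath.relpath(home, target_dir or ".")
    return relative.lower()  # RELHOME is translated to lower case

print(rel_home("index.htm", "Articles/Math/Page.htm"))
print(rel_home("index.htm", "page.htm"))
```

A page two folders down needs to climb two levels to reach the home page, while a page in the same folder needs only the file name.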
SYSTEM command
GENER8 expands any inner macro calls and passes the command to your operating system. Anything the command writes to the standard output stream goes directly into the output file, with no intervention by GENER8. (Compare to the #include !command directive, where GENER8 reads and processes the output of the command, expanding macro calls and so forth.)
Example:
(#SYSTEM gawk -f someprog.awk (#FILENAME#)#)
will run the current input file through the someprog program and write the result to the current output file.
While it’s legal to call the SYSTEM macro on the same line as other text, the results can be confusing. If the SYSTEM macro occurs on a line with any other text, the command runs before the rest of the text is processed. After the command has run, GENER8 processes the rest of the line, less the SYSTEM macro call and its arguments. This hardly ever matters, but if it does you may want to use the SYSTEMINLINE macro instead, or make sure that the call of the SYSTEM macro is on a line by itself.
SYSTEMINLINE command
GENER8 also expands any inner macros and passes the command to your operating system. However, unlike the SYSTEM macro, with the SYSTEMINLINE macro GENER8 intercepts the output of the command. GENER8 replaces the macro call with the first (or only) line of output from the command; any further output from the command is discarded.
Example:
This file is \
(#SYSTEMINLINE echo %@filesize[(#FILENAME#)]#) bytes long.
In 4DOS, %@filesize[...] returns the file size in bytes; naturally the command would be different on other operating systems. Suppose the current file is 32487 bytes long. Then GENER8 will read the output 32487 from the command and substitute it for the macro call:
This file is 32487 bytes long.
(That was just a historical example. Beginning in GENER8 8.0, the FILESIZE macro gets file sizes for you.)
TARGET
TARGETDIR
TARGETNAME
These three macros are defined only if you specify the -v target option on the command line. If you do, TARGET is the full path and filename as defined in that option, translated to lower case; TARGETDIR is the path part of TARGET, including a trailing /; and TARGETNAME is the name (and extension) part of TARGET.
TOCB minlevel maxlevel awkexe awkprog
TOCB minlevel maxlevel awkexe awkprog ul_identifier
This macro implements tables of contents in HTML documents.
All the <hminlevel> through <hmaxlevel> tags from files listed on the gawk command line are gathered for the table of contents. Although the start and end tags need not be on the same line, the attribute id= must be on the same line as the start tag. (name= won’t work.)
The TOCB macro is replaced with nested <ul> lists, nested the necessary number of levels. Each header tag becomes an <li>, and the text becomes a link to the actual header within the document.
This document itself provides a sample of a generated table of contents.
Creating the table of contents requires two passes:
%TEMP%\gener8.toc. All directives are honored, and all macros are expanded. (TOCB itself is ignored, of course. The predefined macros TOCMIN and TOCMAX return nonzero values in this pass.) The temporary file will contain only <hn> or <Hn> tags. The #tocif directive and #tocinsertli directive are effective only during this pass.
<li><a href="#id">header text</a></li>
This pass also inserts <ul> and </ul> as appropriate for the hierarchy of the headers.
Through CSS, you can style the table of contents however you wish. A common technique is to place <div id="TOC"> before TOCB and </div> after TOCB; you then style #TOC ul, #TOC li ul, and so on in your CSS. Or instead, you can specify an id or a class attribute for the main <ul> through the optional fifth argument to TOCB; see below.
Caution: If you specify an id= or class= on the header tag preceding the table, you can’t use that in CSS to style the table of contents. The <ul> tag is not a child of the <hn> tag, as far as HTML and CSS are concerned. To style the entries in the table of contents, either use the optional fifth argument to TOCB below, or enclose the whole business in a <div> with a unique identifier, as suggested above.
TOCB takes four or five arguments. The first two are the minimum and maximum <hn> levels to generate entries in the table. Usually you want minlevel to be 2 and maxlevel to be 3 or 4, but all values 1 to 6 are accepted.
The third argument is the path and name of the AWK or GAWK executable, the same as on the command line. (GENER8 has no way to determine this.) If your PATH environment variable is set properly, you need not specify the whole path, and awk or gawk is sufficient.
The fourth argument is the path and name of gener8.awk itself, just as it would appear after -f on the command line, but without the -f.
The optional fifth argument will be added to the initial <ul> in the generated table of contents. This lets you give the main list an id or class attribute that will then become part of your CSS selectors, thus avoiding the <div> wrapper mentioned above.
Recommendation: Put the TOCB macro on a line by itself. If you have it on a line with other text, that text will be written to your final document after the generated table of contents.
Example:
(#TOCB 2 4 gawk gener8.awk#)
will generate a table of contents from all the <h2>, <h3>, and <h4> tags. It’s assumed that the PATH variable includes the directory where GAWK.EXE is located, and the AWKPATH environment variable includes the directory where gener8.awk is located.
Example:
(#TOCB 2 3 c:\utils\text\gawk c:/utils/gener8/gener8.awk id="TOC"#)
will generate a table of contents from all the <h2> and <h3> tags. The paths of the GAWK program and of gener8.awk are given explicitly; notice that the latter has forward slashes. The first tag in the generated table of contents will be <ul id="TOC">.
TOCF minlevel maxlevel awkexe awkprog sourcefile
TOCF minlevel maxlevel awkexe awkprog sourcefile ul_identifier
This macro is exactly the same as the TOCB macro, except that TOCF reads header tags from sourcefile, not the current file set as with TOCB. This can be useful when each chapter of a book has its own table of contents and you want to pull together an overall table of contents in a separate file.
TOCMIN
TOCMAX
These two macros let you determine whether GENER8 is in normal processing, or scanning for headers while expanding a TOCB macro or TOCF macro. This can be helpful for debugging complex logic.
During normal processing, both macros return 0. During the scan for headers, TOCMIN and TOCMAX return the minlevel and maxlevel arguments from the TOCB or TOCF that triggered the scan.
UPPER text
converts text to upper case.
#define macroname definition
#define tells GENER8 to store the definition under the given name as a string of text characters for later use. (To store the result of an expression, see the #freeze directive.)
The macro name may contain any character except a space. Macro names are case sensitive, meaning that abc and Abc are different macros.
If a macro with the same name already exists, even a predefined macro, GENER8 discards the old definition.
To avoid accidentally redefining a predefined macro, give your macro a name that is not all capital letters.
The macro definition must be on one line. If the definition is too long to fit comfortably on one line, use the \ character and then continue the definition on the next line. This does not generate a line break in the output.
The macro definition may contain any characters at all, but it must contain at least one non-blank character. There are a few special rules:
A long definition can be continued on the next line with \.
Spaces at the beginning or end of the macro definition are discarded. If you need spaces at the beginning or end of your macro definition, add a call to (#EMPTY#) before leading spaces or after trailing spaces.
To get a percent sign followed by *, ?, or a digit, code it as %% (two % signs). This prevents its being interpreted as a placeholder for a macro argument. You don’t need to double a percent sign when the next character is something other than *, ?, or a digit, but it does no harm if you do.
\n .
For an empty definition, use the EMPTY macro. As an alternative, you can use the #picky directive to tell GENER8 to accept empty macro definitions.
Example: You are preparing a document that will be modified each quarter and used again. The identification of the quarter occurs many times in the document, and you would like to be able to change that just once without searching through the document to find all occurrences. (Also you worry about typos and want to make sure that all mentions are identical.) Code the macro definition like this:
#define qtr third quarter of 2002
Then everywhere in the document you would write
(#qtr#)
instead of third quarter of 2002
. Come
October, you simply change the macro definition to reference the
fourth quarter.
You can define a macro with placeholders for text to be supplied later, when the macro is used. (Presumably it will be different text with different uses.)
For example, suppose you’re creating an HTML table and you want most cells to be centered horizontally and vertically. This means that you need to code them as
<td align=center valign=middle>contents</td>
You can define a macro that contains the HTML coding with a placeholder for the cell contents, like this:
#define cell <td align=center \
valign=middle>%*</td>
The %*
says “whatever text is supplied with the
macro call, insert it here.” You might call
the macro like this:
<tr>(#cell 45#)(#cell 88#)(#cell 133#)</tr>
Your macro definition doesn’t need to treat the text of the macro call as a big lump, but can deal with the individual arguments. Here’s an example:
#define myhref <small><a href="%1%2">%2</a></small>
This is handy when I want to create a link in my document but type the link address only once. For instance, I might call this macro in this fashion:
found at (#myhref http://www. gnu.org#).
Then I have made a proper link to http://www.gnu.org, but the
visible text of the link is just gnu.org
.
In fact, you can specify placeholders for multiple arguments in the definition of a macro:
%1
through %9
stand for the first
through ninth arguments in the macro call. It’s rare but legal
to use,
say, argument %3
without using %2
; this can
happen for instance if you change your macro definition after coding
a lot of calls to the macro.
A macro can have only nine numbered arguments. The sequence
%10
means the first argument followed by a zero
character.
%*
and %?
stand for whatever text is
left in the macro call after the numbered arguments have been
extracted. For instance, if your macro definition contains
%1
, %2
, and %*
, then
%*
stands for all the text in the macro call after the
second argument.
All macro arguments are separated when the macro is first read, and
the unused ones are put back together, separated by a single space,
for %*
and %?
. With the default macro
separator, that means that any runs of spaces are collapsed to a
single space. (If you want spaces not to collapse,
code them as double underscores.)
If you have used the
#macrosep
directive or
the #macrosepregexp
directive, then the argument
delimiters are all changed to single spaces in the %*
or
%?
text.
The distinction between %*
and %?
is
this: use %*
when there must be additional text
in the macro call after the numbered arguments; use %?
when there may be additional text in the macro call after
the numbered arguments. Never use %*
and %?
in the same macro definition.
If the macro definition contains %1
and
%3
but not %2
, the second argument in the
macro call is discarded, not used in %*
or
%?
.
%#
stands for the actual
number of arguments supplied when the macro is called.
%%
stands for a literal % sign. If you
want the text %3 in the output of a macro, you must code it as
%%3
so that it doesn’t look like a macro argument. If you
want %% in the output, code it as %%%%
.
A percent sign is only special before *, ?, %, or a digit. Before any other character you don’t have to code the percent sign specially, but you can if you want.
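The placeholder rules above can be modelled in a few lines of Python. This is only an illustrative sketch, not GENER8’s own AWK code: the function name expand_placeholders is made up, arguments are assumed to be separated by runs of spaces (the default separator), a missing numbered argument is assumed to expand to nothing, and %# is assumed to count every argument in the call.

```python
import re

# Matches %1-%9, %*, %?, %# and %% (a doubled percent sign).
PLACEHOLDER = re.compile(r"%([1-9*?#%])")

def expand_placeholders(definition, call_text):
    """Model of placeholder substitution (hypothetical helper, not GENER8 code)."""
    args = call_text.split()  # runs of spaces collapse to one separator
    # Highest numbered placeholder used in the definition (0 if none).
    numbers = [int(m.group(1)) for m in PLACEHOLDER.finditer(definition)
               if m.group(1).isdigit()]
    highest = max(numbers, default=0)
    # Text left after the numbered arguments, rejoined with single spaces.
    rest = " ".join(args[highest:])

    def substitute(match):
        code = match.group(1)
        if code == "%":
            return "%"                # %% is a literal percent sign
        if code in "*?":
            return rest
        if code == "#":
            return str(len(args))     # assumption: counts all arguments
        index = int(code) - 1
        # Assumption: a missing argument expands to nothing.
        return args[index] if index < len(args) else ""

    return PLACEHOLDER.sub(substitute, definition)

print(expand_placeholders('<a href="%1%2">%2</a>', "http://www. gnu.org"))
# <a href="http://www.gnu.org">gnu.org</a>
print(expand_placeholders("%%3 and %10", "x"))
# %3 and x0
```

The second call shows two of the rules at once: %%3 comes out as the text %3, and %10 is the first argument followed by a zero.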
For an example with %?
, consider this macro from
my personal macro file for Web pages:
#define copyright %? Copyright © %1 \
Stan Brown, Oak Road Systems
This macro definition says that when the copyright
macro is called, it must contain one argument, which gets
placed after the copyright symbol. But the call may contain
extra text, which if present gets placed before the word Copyright.
Here are two sample calls of this macro:
(#copyright 2002#)
(#copyright 2002 portions of this page are#)
Earlier in this section, the example with table cells used
%*
alone in a macro definition. Here’s an example using
%*
and numbered arguments:
#define ti83pic <img src="(#pics#)%2.gif" \
%* width=200 height=%1>
I put many screen shots from the TI-83 in my Web pages for my
students. All of the images are 200 pixels wide, but they have varying
heights (%1
) and of course different filenames
(%2
). In addition, each one needs some alternative text
(%*
), and some need special alignment. I might call this
macro like this:
(#ti83pic 136 tdist alt="t distribution"#)
EMPTY
When you define a macro, you need to
give it a definition. But sometimes you define a macro just to act as
a switch to be used with an
#ifdef
directive or
#ifndef
directive.
With such a macro, all you care about is whether it has been defined.
To define a macro with an empty definition, use the special
code (#EMPTY#)
for the definition.
As an alternative, you can set macro pickiness
to accept empty macro definitions.
Under special circumstances, you might want to
call the EMPTY
macro with arguments.
Consider this example:
#ifdef something
#define pageref (See page %1.)
#else
#define pageref (#EMPTY %?#)
#endif
Suppose that you have page references scattered through your
document. If something
is defined, you want to display all of them;
otherwise you want to suppress them all. It would be cumbersome to
bracket each page reference with an
#ifdef-#endif
pair. Instead, you
code them all in the form (#pageref 162#)
and define
the macro to emit no text if something
is not defined. But you need to
eat up the argument to the pageref
macro; hence the
%?
in the definition.
(%*
or %1
would work as well in this case,
but %?
will always work.)
The EMPTY
macro is special in that it simply ignores
any arguments. Also, unlike other predefined macros, it
cannot be redefined.
You can call one macro inside the definition of another. For
example, in creating this document I defined a macro
pcode
to make a one-line “paragraph” consisting of a line
of code, and a second macro ex
to show the word
Example
followed by a line of code. It makes sense to define
ex
in terms of pcode
, like this:
#define ex <p class="brk">Example:</p>(#pcode %*#)
#define pcode <pre class="codeline">%*</pre>
When you nest macro definitions in this way, GENER8 simply
stores each definition as an unrelated text string.
Inner macros are not evaluated until the outer macro is called.
In the example
above, note that ex
calls pcode
before
pcode
has been defined. That’s perfectly legitimate:
GENER8 doesn’t pay any attention to the contents of
ex
until it is called in the
document. As long as the inner macro pcode
is defined
by then, everything is fine.
When you call a macro whose definition contains a
call to another macro, GENER8 completely evaluates the inner macro
call before the outer macro call. This lets you do things like
(#chap(#chapnum#)#)
, where chapnum
is a
macro that will be defined later. If chapnum
currently
has a value of 14, then the macro call is a call to macro
chap14
.
Example:
#define coursenum 200
#define course MATH(#coursenum#)
At this point the definition of the course
macro is
not MATH200
; it is
MATH(#coursenum#)
. You can change the definition of
coursenum
, and when you use course
the
then-current value of coursenum
will appear. Continuing,
#define coursenum 105
The prerequisites for (#course#) are
The text The prerequisites for MATH105 are
will be written to
the output file.
Example:
#define sqrt (#ARITH %%9.4f %1^.5#)
This macro computes the square root of a number and
displays it to four decimal places.
For example, the value of (#sqrt 1127#) is (#ARITH %9.4f 1127^.5#), which is “  33.5708” (the width of 9 pads the result with two leading spaces).
The %1
in this definition refers to the first
argument of the outer macro, sqrt
. The
formatting argument to the ARITH
macro normally takes a single
percent sign, but it needs a double percent sign when it’s nested
inside another macro; otherwise it would be taken as the ninth
argument to sqrt
.
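The arithmetic in this example is easy to check outside GENER8. The Python lines below are only a cross-check, on the assumption that the format argument of ARITH follows the usual printf conventions, as Python’s % operator does. They are not part of GENER8.

```python
# Square root of 1127, formatted like %9.4f: width 9, four decimal places.
value = 1127 ** 0.5
formatted = "%9.4f" % value
print(repr(formatted))  # '  33.5708' (padded on the left to nine characters)
```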
#freeze macroname definition
But suppose you want to define one macro in terms of the current value of another, regardless of how the inner macro might be redefined later? This is where #freeze comes in. Unlike #define, #freeze evaluates any macro calls in the definition right away. Modifying the previous example,
#define coursenum 200
#freeze course MATH(#coursenum#)
At this point GENER8 evaluates (#coursenum#)
and stores the text MATH200
as the definition of
the course
macro. Any later changes in coursenum
have no effect on course
. For example, suppose these two
lines occur later in the input file:
#define coursenum 105
The prerequisites for (#course#) are
The text The prerequisites for MATH200 are
will be written to
the output file.
With #freeze
, changes to inner macros don’t affect the outer macro.
When you are nesting macros, use #define
to make
the outer macro change with the changes in the inner macro; use
#freeze
to freeze the outer macro and make it
invariable with changes in the inner macro. When you are not nesting
macros — when the definition doesn’t contain any macro
calls — it doesn’t matter whether you use
#define
or #freeze
.
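If it helps to see the difference spelled out, here is a toy model in Python. It is a sketch under stated assumptions, not GENER8’s implementation: the class MacroTable and the macro name frozen_course are invented for the illustration, only calls without arguments are handled, and an undefined macro simply raises an error.

```python
import re

# Innermost macro call with no arguments, e.g. (#coursenum#).
CALL = re.compile(r"\(#([^()#\s]+)#\)")

class MacroTable:
    """Toy model of #define and #freeze (hypothetical, not GENER8 code)."""

    def __init__(self):
        self.macros = {}

    def expand(self, text):
        # Replace innermost calls repeatedly until nothing changes.
        while True:
            new_text = CALL.sub(lambda m: self.macros[m.group(1)], text)
            if new_text == text:
                return new_text
            text = new_text

    def define(self, name, definition):
        # #define: store the definition as raw text, calls and all.
        self.macros[name] = definition

    def freeze(self, name, definition):
        # #freeze: evaluate macro calls now and store the result.
        self.macros[name] = self.expand(definition)


table = MacroTable()
table.define("coursenum", "200")
table.define("course", "MATH(#coursenum#)")
table.freeze("frozen_course", "MATH(#coursenum#)")
table.define("coursenum", "105")

print(table.expand("(#course#)"))         # MATH105
print(table.expand("(#frozen_course#)"))  # MATH200
```

Because the model always replaces the innermost call first, a nested call such as (#chap(#chapnum#)#) also resolves the way the text describes.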
#freeze
is also useful to store the
result of an expression, particularly an expensive one like a test for
the existence of a file. If you write
#define gotit (#EXISTS somefile.htm#)
then the macro definition contains a call to
EXISTS
, and every time you write (#gotit#)
a system call will check for the existence of the file. On the other
hand, if you write
#freeze gotit (#EXISTS somefile.htm#)
then the result
of EXISTS
is
“frozen” as a 1 or 0 in the definition of
gotit
.
You can play this same game with complicated expressions that
don’t actually change. Just wrap them in
(#ARITH…#)
and freeze
them. This expression isn’t
complicated, but illustrates the technique:
#freeze myval (#ARITH 88*44#)
#undef macroname: Removing a Macro Definition
If you want a macro to be defined for part of a document but not defined for the rest, use the #undef directive to remove the definition.
#undef
doesn’t care whether the macro was originally
defined with #define
or #freeze
,
but it is an error to undefine a macro that is not presently defined
one way or the other.
There are two ways to tell GENER8 to ignore lines and not write them to the output file (or not process them, if they’re directives):
For single-line comments, put the text
<!-- ignore -->
anywhere on the line. GENER8 checks for this marker after pasting any continued lines together, so the entire logical line is ignored.
You can change the comment marker with the
#commentregexp
directive.
For a block of comments, use an
#if
directive:
#if 0
All these lines will be
ignored, not written to the
output file.
#endif
There is no way to tell GENER8 to write only part of a line to the output file.
There are two situations where you may need continuation lines:
#
directives or
macro calls
If you put a \
character at the very end of a line, GENER8
will remove the \
and paste the following line to the end before
doing any other processing.
Example:
#define row <tr><td align=center>%1</td>\
<td align=center>%2</td><td>%*</td></tr>
Here, a long macro definition is split into two input lines for convenience in editing.
Be careful with spaces in continued lines! If you want a space
where the lines were joined, you must provide one either before the
\
as shown above, or at the start of the next input
line. This example is properly coded to get spaces between the
words:
You want all \
of these lines \
to be joined in one \
line in the output file.
If you actually want a backslash at the end of a line, code it as \\. GENER8 will translate it to a single \ but will not append the next line to this one.
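The joining rule can be sketched in Python as follows. This only models the behavior described in this section. The helper name join_continued_lines is hypothetical, and the handling of input that ends in mid-continuation is an assumption.

```python
def join_continued_lines(lines):
    """Model of continuation-line handling (hypothetical helper, not GENER8 code)."""
    logical = []
    pending = ""
    for line in lines:
        if line.endswith("\\\\"):
            # Doubled backslash: keep one literal backslash, do not join.
            logical.append(pending + line[:-1])
            pending = ""
        elif line.endswith("\\"):
            # Single backslash: drop it and paste the next line on.
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        # Assumption: input that ends in mid-continuation is kept as is.
        logical.append(pending)
    return logical

lines = [
    "You want all \\",
    "of these lines \\",
    "to be joined in one \\",
    "line in the output file.",
]
print(join_continued_lines(lines))
# ['You want all of these lines to be joined in one line in the output file.']
```

Notice that the spaces between words survive only because each input line has a space before its backslash.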
The #
character is special in two contexts:
When # is the first non-blank character on a line, it marks the line as a directive.
(# and #) mark the start and end of a macro call.
If you need a #
in the output file in either of these
two contexts, code it as \#
to remove the special
meaning and have GENER8 output it as plain text. In fact, GENER8 will
change every \#
sequence to plain #
, so if
you want a #
character output as text it’s always safe
to code it as \#
.
Example:
\#1. Put out the cat.
\#2. Lock all doors.
\#3. Turn off lights.
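As a rough model of these two rules, the Python sketch below tests whether a line would count as a directive and shows what the \# escape turns into. Both helper names are hypothetical, and the sketch ignores macro calls entirely.

```python
def is_directive(line):
    """True when # is the first non-blank character (hypothetical helper)."""
    return line.lstrip().startswith("#")

def unescape_hash(line):
    """Turn every backslash-# sequence into a plain #."""
    return line.replace("\\#", "#")

for line in ["#define qtr third quarter of 2002", "\\#1. Put out the cat."]:
    kind = "directive" if is_directive(line) else "text"
    print(kind, "->", unescape_hash(line))
# directive -> #define qtr third quarter of 2002
# text -> #1. Put out the cat.
```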
ARITH expression
ARITH format expression
DATE format date time
DATE format date
DATE format
#define DATE_MONTHS4 0_or_1
#define DATE_SYSFORMAT format
DEFINED macroname
EMPTY
ENV variable
EXISTS file
FILEDATE file format
FILENAME
FILESIZE format filespec
GDEL old how text
GSUB old new how text
HOME
IIF condition iftrue iffalse
IIF condition iftrue
IIFDEF name ifdefined ifundefined
IIFDEF name ifdefined
INCLUDEFILE
LOWER text
REGINC macroname
REGPRE macroname
REGSET macroname value
RELHOME
SYSTEM command
SYSTEMINLINE command
TARGET
TARGETDIR
TARGETNAME
TOCB minlevel maxlevel awkexe awkprog
TOCB minlevel maxlevel awkexe awkprog ul_identifier
TOCF minlevel maxlevel awkexe awkprog sourcefile
TOCF minlevel maxlevel awkexe awkprog sourcefile ul_identifier
TOCMIN
TOCMAX
UPPER text
Updates and new info: https://BrownMath.com/utils/