Updated 9 Dec 2021

GREP — Find Regular Expressions in Files

Reference Manual

Program release 8.01 dated 9 Dec 2021

Copyright © 1986–2022 by Stan Brown, BrownMath.com

Summary: This is the exhaustive user manual on GREP, and it contains everything I think someone might possibly want to know. Before you tackle this manual, consider reading the GREP Quick Start Guide, which you might think of as “GREP 101”.

1. Command Line

TIP: Rename your preferred version, GREP32.EXE or GREP16.EXE, to GREP.EXE.

The GREP command form is

        grep options regex inputs outputs

All arguments are optional, except that you must specify either a regex or an /F option.

As with any command, you can redirect or pipe inputs or output. GREP can return a useful value in ERRORLEVEL, as explained below.

Command-line options can actually appear anywhere, not only before the regex. The first thing that isn’t an option is taken as the regex, and everything else that isn’t an option is taken as input filespecs. All the options are processed before any input files are scanned — it doesn’t matter whether a given option comes before, after, or among the filespecs.

This means that if your regex begins with a hyphen - or slash /, GREP will think it’s an option; please see Regex Starting with - or /. Quotes are also problematic; please see Quotes in a Regex.

For a quick summary of operating instructions, type

        grep /? | more 

You can run GREP from the desktop in Windows XP through Windows 10. Click Start and then Run, or press the Windows and R keys together. Then type

        cmd /k grep options regex inputs outputs

In Windows 95, 98, and ME, it’s the same except that you type command instead of cmd.

You can also put a GREP command in a Windows shortcut, using the same form.

This assumes that your PATH contains the directory where you’ve placed GREP; if it doesn’t, you’ll need to specify the path. PATH is a standard Windows environment variable; to set it, right-click Computer or This PC and select Properties, then Advanced system settings, then Environment Variables, and look in the System list.

As an alternative to setting the path, you could use DOSKEY to define an alias that includes the path to the GREP executable. Despite the name, DOSKEY is also included in most versions of Windows.

2. Input Files


2.1  Named Input Files

A filespec is a file name, possibly containing wild cards or preceded by a path or both. A filespec can also be a directory name, which tells GREP to search all files in the directory.

You can specify named input files or have GREP read the standard input (possibly with redirection or piping). Redirection and piping are discussed later; this section tells you how to specify named input files.

List your input files on the command line or in a list file referenced with the /@ option, or both. You can also exclude files or groups of files by using the /X option. Input filespecs and /X exclusion filespecs use normal command-line conventions augmented by some features from UNIX-style filename globbing.

These rules apply to all filespecs, whether or not they contain wildcards:

2.1.1  Wildcard Expansion (Globbing)

Beginning with release 7.0, GREP16 and GREP32 treat wildcards in filenames identically. The rules are derived from Windows conventions and UNIX “globbing”. There are three wildcard characters, namely * ? [. Here are the extra rules, in addition to the rules in the previous section for all filespecs.

Caution: Globbing is not regexes. In a filespec, [0-9]* doesn’t mean zero or more digits; it means one digit followed by zero or more characters (which need not be digits).
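The difference shows up when the same pattern is read both ways. The sketch below uses Python's standard fnmatch and re modules as stand-ins; they are not GREP's own globbing or regex code, and the file names are made up for illustration.

```python
import fnmatch
import re

names = ["7up.txt", "2021", "notes.txt"]   # hypothetical file names
pattern = "[0-9]*"

# Read as a glob: one digit, then any characters at all.
as_glob = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

# Read as a regex that must cover the whole name: zero or more digits.
as_regex = [n for n in names if re.fullmatch(pattern, n)]

print(as_glob)    # ['7up.txt', '2021']
print(as_regex)   # ['2021']
```

As a glob the pattern accepts 7up.txt, because only the first character has to be a digit.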

2.1.2  Hidden and System Files

Normally, GREP ignores hidden and system files when expanding wildcards. If you want to include hidden and system files in the search, use the /A option.

If you name a specific file, without wildcards, GREP always tries to open it and there’s no need for the /A option.

2.1.3  Missing Files

It may happen that you mistype an input filespec on the command line. At the end of execution, GREP warns you about each input filespec that didn’t match any files. That warning is suppressed like the rest if you specify the /Q3 option.

GREP gives you a similar warning about filespecs from a list file (/@ option) that don’t match any actual files. That warning appears right after GREP reads that filespec from the list file.

Caution: If you exclude files with the /X option, you may cause GREP to bypass existing files that actually match your input filespecs. Consider this example:

        grep regex abcde.htm /X*.htm 

In this situation, GREP tells you that no files matched abcde.htm. This is correct, since /X*.htm makes GREP exclude every *.HTM file. GREP reminds you of this possibility when you have /X option file exclusions.

When in doubt about which files GREP is scanning, you can use the /B option to make GREP tell you the name of every file it examines. If you want to know why GREP is bypassing certain files, use the /D option for full debugging display.

2.2  Standard Input and Redirection

If you don’t specify any named input files, GREP takes its input from the standard input. That can mean any of three sources: the keyboard, a pipe from another command, or a file redirected into GREP.

GREP actually can have up to three types of file inputs: regular expressions (/F option), lines to be scanned for matches, and a list of files to scan for matches (/@ option). Any of the three can come from standard input (depending on options), and standard input could be from the keyboard, piped, or redirected. When GREP is waiting for keyboard input, it prompts you for the specific type it’s expecting.

The standard input is always read in text mode regardless of the /R option.

Example:

        grep /F- inputfiles 

tells GREP to read one or more regexes from the keyboard, rather than take a regex from the command line. GREP prompts you with regex: for each regex; then, after you’ve entered your regex(es), it reads the named input files and matches them against the regex(es) you typed.

For another example of redirection, please see the /L option.

2.3  Subdirectory Searches

If you set the /S option, GREP searches not only the filespecs indicated on the command line, but also the same-named files in subdirectories.

The /S option searches all the way to the bottom of a directory tree. The search is depth first, meaning that if a directory has subdirectories A and B, and A has A1 and A2, then the order is A\A1 (and its descendants), A\A2 (and its descendants), B (and its descendants).

For example, with the command

        grep /S regex \hazax* *.c g:\mumble\*.htm 

GREP examines all files on the entire current drive whose names start with hazax; then it looks at all C source files in the current directory and all subdirectories under it; finally it looks at all HTML files in directory g:\mumble and all subdirectories under it.

Perhaps a more realistic example: you have a document about Vandelay Industries somewhere on your disk, but you can’t remember where. You can find it this way:

        grep Vandelay /S \* 

or

        grep Vandelay /S \*.* 

or even

        grep Vandelay /S \ 

(Both * and *.* select all files; see Wildcard Expansion. And just specifying a directory is equivalent to specifying all files in the directory; see Named Input Files.) You might want to add the /I option if you can’t remember how “Vandelay” was capitalized.

Subdirectory search follows the normal file-searching rules, and therefore GREP normally ignores hidden and system subdirectories. (Yes, they exist in Windows 95 and later.) The /A option also applies during subdirectory search: with /S /A together, GREP searches every subdirectory including hidden and system subdirectories. There’s no way to search every subdirectory but only normal files, or to search only normal subdirectories but to search for hidden files in them.

2.3.1  Search Order

You may want to know in what order GREP examines multiple filespecs when the /S option is set. (If not, skip the rest of this section.)

First, consider the situation without the /@ option. All file inputs are listed on the command line. Ordinarily, GREP examines all files in the first file argument and its subdirectory tree, then proceeds to the second file argument and its subdirectory tree, and so on — GREP calls this column order. However, when you use the /S option and none of the file arguments are directories or have paths, then GREP first scans the current directory for all of them, then scans the first subdirectory for all of them, then the next subdirectory, and so on — GREP calls this row order.

Example:

        grep /S regex sampro f*

If SAMPRO is not a directory, GREP applies row order: it looks first for file SAMPRO and any file starting with F in the current directory, then for the same files in all subdirectories. But if SAMPRO is a directory, then GREP applies column order, looking first for all files in SAMPRO and its subdirectories, then for all files beginning with F in the current directory and its subdirectories.

For contrast, consider the command

        grep /S regex sam\pro f*

Here one of the filespecs includes a path, and therefore GREP applies column order: it looks first for PRO in subdirectory SAM and its descendants, then for any file starting with F in the current directory and its descendants.
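Row order and column order can be pictured as two nestings of the same pair of loops. The Python sketch below is only an illustration, not GREP's code: the function name visit_order and the directory names are hypothetical, and it simplifies by giving every filespec the same list of directories, whereas in real column order each filespec starts from its own directory.

```python
def visit_order(filespecs, directories, row_order):
    """Return (directory, filespec) pairs in the order they would be visited."""
    if row_order:
        # Row order: finish every filespec in one directory before moving on.
        return [(d, f) for d in directories for f in filespecs]
    # Column order: finish one filespec's whole tree before the next filespec.
    return [(d, f) for f in filespecs for d in directories]

dirs = [".", "A", "B"]      # hypothetical: current directory and two subdirectories
specs = ["sampro", "f*"]

for d, f in visit_order(specs, dirs, row_order=True):
    print("row   ", d, f)
for d, f in visit_order(specs, dirs, row_order=False):
    print("column", d, f)
```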

Now suppose you specify a list of input files with the /@ option. GREP processes the first filespec in that list and all subdirectories, then the second filespec and subdirectories, and so on. When the /@ list file is exhausted, GREP goes on to process any filespecs on the command line, in row or column order as described above.

The /D option shows you every directory and wildcard search as GREP performs it. The output also contains lots of other stuff, but the string GX: starts the record of each file visit.

2.4  Binary Files and Text Files

GREP was originally written with plain text files in mind, but you can also use it quite well with binary files.

2.4.1  Differences between Text and Binary Files

Text Files

A text file can be displayed without special processing, for instance by the TYPE and MORE commands. HTML files are always text files; program source code and files with extension .TXT are nearly always text files.

Text files are separated into variable-length lines by carriage return plus line feed (ASCII 13 and 10) or either one alone; and other control characters (ASCII 0–8, 9, 11, and 14–31) are usually not present.

Starting with release 8.0, GREP can handle any form of line termination: CR+LF (Windows), LF (UNIX), CR (Macintosh), even LF+CR if any system uses it. Also starting with release 8.0, GREP continues reading a file even if it contains an embedded Control-Z (ASCII 26). I am grateful to Scott Brueckner for an excellent set of test cases in his Reading Text Files in C. Although I didn’t use any of his C code, I did benefit greatly from his research into the issues with the standard C library.
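One way to accept all four terminator forms is to try the two-character forms before the single characters. The Python sketch below shows that idea only; it is not GREP's reading code, and the sample text is invented.

```python
import re

# Two-character terminators are listed first so CR+LF and LF+CR each count once.
TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")

def split_lines(text):
    """Split text on any of the four terminator forms."""
    return TERMINATOR.split(text)

sample = "windows\r\nunix\nmac\rodd\n\rlast"
lines = split_lines(sample)
print(lines)   # ['windows', 'unix', 'mac', 'odd', 'last']
```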

By default, GREP adjusts automatically to the longest line in the input. However, if you specify the /G0 option and give a maximum line length with the /W option, GREP processes any lines longer than that in chunks.

If you think some matches might begin on one line and continue to another line, use the /G2 option to tell GREP to read in paragraph mode.

Binary Files

A binary file contains numbers and sometimes even text in a special internal form that looks like gibberish if simply printed character by character. Any file byte can contain any value 0–255. There are two types of binary files: record-oriented, made up of fixed-length records, and free-form.

How Does GREP Know?

Windows doesn’t mark a file as text or binary; the program that reads the file just has to know. GREP “knows” files are binary when you tell it via the /R2 or /R3 option; otherwise it treats input files as text. If GREP reads a file in text mode but the file is actually binary, some matches may be missed. It’s important, therefore, to scan binary files in binary mode.

You can also use the /R-1 or /R-2 option to have GREP examine each file and decide whether it’s text or free-form binary. (Please see the /R option for details on how GREP makes that decision.) I recommend /R-1.

2.4.2  How Does GREP Read Binary and Text Files?

Here’s a comparison of the three ways GREP can read input files:

The three modes are line-oriented text (/R0), record-oriented binary (/R2), and free-form binary (/R3); each item below is tagged with the mode or modes it describes.
(/R0) The file is read a line at a time. (But if you specify the /G0 option, any line bigger than the /W option value is read in chunks with each chunk treated as a line.) (/R2) The file is read a record at a time; the record length is given by the /W option. (/R3) The file is read in overlapping half-buffers. The /W option gives the buffer size; see that option description for recommended buffer size.
(/R0) A line ends with a carriage return or line feed (ASCII 13 or 10) or both. (/R2 or /R3) ASCII 13 and 10 have no special meaning.
(all) The file length is given by the directory entry. Control-Z is just another character.
(/R0 or /R2) The regex characters ^ and $ mean the start and end of a line or record. (/R3) The characters ^ and $ in an extended regex match a newline (ASCII 10). In a basic regex they don’t match anything useful.
(/R0 or /R2) The /V option looks for lines or records that don’t contain a match. (/R3) The /V option makes no sense with free-form binary processing, unless you use it with the /L option to report files that contain no matches to the regex at all.
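Reading in overlapping half-buffers means a match that straddles a buffer boundary still lies wholly inside some buffer. The Python sketch below is one way such a scheme could work, not GREP's actual implementation; the function name and the sample data are hypothetical, and it assumes an even buffer size of at least 2.

```python
def overlapping_buffers(data, size):
    """Yield (offset, chunk) pairs; each chunk starts half a buffer after the last."""
    half = size // 2          # assumes size >= 2, so the window always advances
    offset = 0
    while True:
        yield offset, data[offset:offset + size]
        if offset + size >= len(data):
            break
        offset += half

data = b"0123456789abcdef"
needle = b"789a"              # straddles the boundary between bytes 0-7 and 8-15

hits = [off + chunk.find(needle)
        for off, chunk in overlapping_buffers(data, 8)
        if needle in chunk]
print(hits)                   # [7]
```

Non-overlapping 8-byte buffers would split the needle across two reads and miss it; the window starting at byte 4 contains it whole.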

2.4.3  How Does GREP Display Hits?

The file format not only affects how the file is read (above), but it also affects how hits are displayed. The table below shows the default output formats for the various input formats; however, you can use the letter /o option to specify any output format for any input format.

Input formats: line-oriented text (/R0), record-oriented binary (/R2), free-form binary (/R3).
(/R0 or /R2) When a match is found, the matching line or record is displayed, unless you used the /C option, /J option, or /L option. (/R3) The /C option, /J option, or /L option is strongly recommended. But if you don’t use any of them, then when a match is found, GREP displays the buffer that contains it.
(/R0) In paragraph mode (/G2 option), matching lines are output as character streams. GREP doesn’t check for control characters like form feed (ASCII 12) and backspace (8); if they’re in the input, output may be formatted strangely.

In line modes (/G0 or /G1 option), any non-printable characters in an output line are replaced by a control code like ^Q or a hex sequence like <7F>.

(/R2 or /R3) Hits are displayed in both hex and text form, 16 bytes per line. In the text output, non-printable characters appear as dots. (GREP16 considers characters 0-31 and 127-255 as non-printable characters; in GREP32 that is the default but you can change it by setting a character mapping with the /M option.) Here’s a sample:
---------- d:\abc\web\LN10HT.200
 792E6F72 672F7374 6174732F 61736B2F  > y.org/stats/ask/ <
 6E6E742E 61737022 3E6E756D 62657220  > nnt.asp">number  <
 6E656564 65640D0A 746F2074 72656174  > needed..to treat <
 3C2F                                 > </               <

It takes more than one line to display an output record or buffer that contains more than 16 bytes. In this case, GREP inserts a blank line after each record or buffer. Exception: with the /R2 option and /N option together, records are identified by record number and are not separated by blank lines.

(/R0 or /R2) With the /N option, GREP displays the line or record number with each hit. (/R3) With the /N option, GREP displays the starting byte number with each hit. The first byte in the file is numbered 0.
(/R0 or /R2) The /P option specifies how many lines or records from the file to display before and after each hit. (/R3) The /P option is ignored.
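The hex-and-text layout in the sample above can be imitated in a few lines. The Python sketch below is a reconstruction from that sample, not GREP's source; it assumes the default rule that bytes 0-31 and 127-255 are non-printable, and the function name hex_dump is hypothetical.

```python
def hex_dump(data):
    """Format bytes as lines of 16: four 4-byte hex words, then the text form."""
    lines = []
    for start in range(0, len(data), 16):
        row = data[start:start + 16]
        words = [row[i:i + 4].hex().upper() for i in range(0, len(row), 4)]
        # Non-printable bytes (0-31 and 127-255) are shown as dots.
        text = "".join(chr(b) if 32 <= b <= 126 else "." for b in row)
        lines.append(" " + " ".join(words).ljust(35) + "  > " + text.ljust(16) + " <")
    return lines

for line in hex_dump(b"needed\r\nto treat</"):
    print(line)
```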

3. Outputs


GREP displays the matches (with filespecs and line numbers, depending on your options) to the standard output. Normally, the standard output is your screen, but you can redirect or pipe the standard output.

GREP displays the program logo and all messages to the standard error stream. Normally, that’s also your screen, but some systems let you redirect standard error output; see Redirected Error Output for details.

3.1  List of Hits

Normally, GREP displays hits on your screen. Hits are the text lines, binary records, or binary buffers that contain matches for the regex(es). As part of the output, GREP displays the file path and name as a header above the group of hits from that file. Here’s an example:

        ---------- GREP.C
                op_showhead = ShowNoHeads;
                else if (op_showhead == ShowNoHeads)
                op_showhead = ShowNoHeads;

        ---------- GREP_MAT.C
                op_showhead == ShowNoHeads) 

Many people prefer UNIX-style output (/U option). UNIX style shows the file path and name on the same line as the hit, like this:

        GREP.C:        op_showhead = ShowNoHeads;
        GREP.C:        else if (op_showhead == ShowNoHeads)
        GREP.C:        op_showhead = ShowNoHeads;
        GREP_MAT.C:        op_showhead == ShowNoHeads) 

3.1.1  Variations on the Hit List

Output options give you a lot of control over what GREP produces.

In addition to these options, under the /R2 or /R3 option GREP reads files in binary mode, and binary hits are displayed in a slightly different format.

3.1.2  Redirected Output

You can redirect GREP’s list of hits into a file or pipe it to another command (even another GREP command). To redirect GREP output, follow the command prompt rules and put one of these at the end of the GREP command line:

You can pipe or redirect output regardless of whether input was piped or redirected.

3.1.3  Redirected Error Output

Errors and warning messages are sent to the standard error stream. That is usually your screen, though some OSes or shell replacements let you redirect error output. For instance, in the Windows XP command prompt and in the 4NT command processor,

        grep … 2>file

will redirect the standard error stream to the named file with no effect on the standard output. In those systems, you can redirect standard output and standard error to different files, like this:

        grep … >file1 2>file2

3.2  Debugging Output

The /D option lets you create extra debugging output and send it to a named file or the standard error stream. This can be very handy when GREP isn’t doing what you expect.

You may be asked to include debugging output with any trouble report.

3.3  Return Values (ERRORLEVEL)

GREP returns a status number to the command shell. You can test the exit status with IF ERRORLEVEL in a batch file or script. (In TCC, 4NT, and 4DOS, %? gets you the error level on the command line, not just in a batch file.)

Here are GREP’s status codes returned in ERRORLEVEL:

Code      Meaning
0 or 1    Success: the program read at least one input file and ran to completion. See below for details.
2         Help message requested (/? option).
3         Warnings were issued, but there were no actual errors. (This status is possible only if you specified the /3 option.)
4         Not a single file matched any of the input filespecs.
128       Internal GREP error in expanding a regex; please report this to BrownMath.com.
253       Insufficient memory for GREP to run with the options selected. (If this happens, see “insufficient memory” in the list of messages.)
254       Couldn’t read specified file for /F option or /@ option, or file-system error while reading any file.
255       Bad option in the environment variable or on the command line, bad regex, or some other mistake by user.

You might want to use GREP in a batch file or a makefile and take different actions depending on whether hits were found. To do this, use the /0 or /1 option; each tells GREP what to return in ERRORLEVEL if any hits were found.

Here are GREP’s success codes, the codes it returns if it ran to completion and processed at least one file or the standard input:

Value in ERRORLEVEL:

Outcome          with /0   with /1   with neither
Hits found          0         1          0
No hits found       1         0          0

Hits found: GREP ran to completion and found at least one hit in at least one file or the standard input. (With the /V option: GREP found at least one line that wasn’t a match.)
No hits found: GREP ran to completion and read at least one file or standard input, but didn’t find any hits in any files. (With the /V option: Every single line was a match.)
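The success codes reduce to a small decision rule. The Python sketch below restates the table as a function for readers who script around GREP; the function name success_code and its arguments are hypothetical, and it covers only runs that completed successfully.

```python
def success_code(hits_found, option=None):
    """Return the ERRORLEVEL for a successful run.

    option is "/0", "/1", or None when neither option was given.
    """
    if option == "/0":
        return 0 if hits_found else 1
    if option == "/1":
        return 1 if hits_found else 0
    return 0    # with neither option, success is always 0

print(success_code(True, "/1"))    # 1
print(success_code(False, "/0"))   # 1
```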

4. Regular Expressions (Regexes)

A regular expression or regex is a pattern of characters to compare to one or more input files. A line/record/buffer from an input file is a hit if all or part of it agrees with the pattern in the regex. You've already met some examples in the Quick Start Guide.

A regex can be a simple text string, like mother, or it can include a bunch of special characters to express possibilities like “repeated” and “any of these characters or substrings”. (If you want to search only for simple strings, use the /E0 option and ignore all this regex stuff.)

Regexes come in two flavors, basic and extended regexes. If you're new to regexes, you might want to ignore extended regexes while you get comfortable with basic regexes. Use the following Overview to help you find the particular feature you need. On the other hand, if you're already comfortable with regexes, you'll find additional material and tips in Mastering Regular Expressions by Jeffrey Friedl (O'Reilly & Associates).

You specify a single regex on the command line, or you specify one or more regexes in a file with the /F option.


4.1  Overview

A regex is a mix of normal characters and the special characters listed in this section.

The following characters are special if they occur outside of square brackets:

The characters ? { | ( are special in an extended regex, but in a basic regex they’re just normal characters. Please see Basic and Extended Regexes, below.

The following characters are special if they occur within square brackets:

For easy reference, here’s a condensed list of special characters:

Special characters:

                           Basic regex                 Extended regex
                           (default or /E1 option)     (/E2 or /E4 option)
Outside square brackets    \ . * + [ ^ $               the same, plus ? { | (
Inside square brackets     \ ^ - ]                     the same, plus the [: sequence

Every character not listed above is a normal character. Any of the above characters also becomes a normal character if preceded by a backslash, as you’ll see below.

4.1.1  Basic and Extended Regexes

GREP offers two levels of regular expressions. This manual marks certain features as “extended regex”; all others are common to basic and extended regexes.

Basic regexes offer a “core subset” of the regex capabilities. By default, GREP treats your regexes as basic, since that’s the only kind of regex that GREP supported before release 6.0. Special characters marked as “extended regex” are treated as normal characters in basic regexes.

Extended regexes can do much more than basic, including | alternatives, ? optional match, { } quantifiers, and ( ) subexpressions. If you want to use extended regexes, specify the /E2 option, available only in GREP32.

Acknowledgment: Extended regexes were added to GREP in release 6.0, using the open-source PCRE library package, copyright by the University of Cambridge, England. Thanks are due to Philip Hazel for making this available, and in that spirit extended regexes were added to GREP with no increase in price. (Later, in 2019, GREP became completely freeware.) The main PCRE Web site is www.pcre.org.

This manual covers most of the features of extended regexes, but you might want to know about two additional references. For your convenience, the GREP download files include an abridged copy of Philip Hazel’s PCRE man page, PCRE Specification, with just the information relevant to GREP users. His original man page at Library Functions Manual also contains considerable information about incorporating PCRE in programs.

4.1.2  Compatibility Note

Different utilities define regexes differently; the following sections tell you how this GREP defines them. You can find fascinating tables of different interpretations in Jeffrey Friedl’s book Mastering Regular Expressions (pages 63 and 182-183 of the 1997 edition).

A note to UNIX or Vim veterans: This GREP follows the Perl or egrep scheme, which uses | not \| for alternatives, ( ) not \( \) for subexpressions, \b not \< \> for word boundaries. Be alert to differences from the scheme you may know.

4.1.3  Quotes in a Regex

If you put quotes in your regex, most post-1995 versions of DOS and Windows strip them out. Even worse, if you have one quote, most versions treat everything till the end of the line as part of the regex passed to GREP.

Before release 7.3 (July 2004), GREP stripped away any quotes surrounding a regex. Combined with the stripping of quotes by Windows, it was almost impossible to GREP for a regex including quotes.

With release 7.3, GREP no longer strips quotes out of a regex. You can keep DOS and Windows (most versions) from stripping quotes by using backslashes. Now if you type

        grep \"Really!\" *.doc /r-1

Windows or DOS passes "Really!" including quotes as a command argument to GREP, and GREP searches for the string "Really!" including the quotes. But if you type

        grep "Really!" *.doc /r-1

then Windows strips away the quotes and passes Really! to GREP, without quotes. GREP then searches for the string Really! whether quoted or not.

Unfortunately, different versions of DOS or Windows are not consistent in how they handle quotes on the command line. If GREP is not behaving as expected, it might be GREP or the command processor or your own error. You can diagnose the problem easily by adding this to the end of your command:

        /D- | grep "grep G[CR]:"

The /D option shows you debugging information, and the second GREP call selects only information about what arguments GREP received from the command line and how GREP interpreted the regex.

Usually adding backslashes will make things work as you wish. If not, your other options are to use the /F- option and type in the regex at the keyboard, or store it in a file and use the /Ffile option. In a regex read from keyboard or file, a quote is always a normal character.

4.1.4  Regex Starting with - or /

If your regex begins with a hyphen - or slash / and you type it on the command line, GREP will think it’s an option. To prevent this, give it an extra backslash \ at the start. For instance, to search for /x in all .TXT files in the current directory, use a command like

        grep \/x *.txt

When your regex is in a file (/Ffile option) or when GREP prompts you to type it from the keyboard (/F- option), there’s no problem with a leading hyphen or slash, and the extra backslash is unnecessary (though harmless).

A really unlikely scenario, mentioned for completeness: With the /E0 option, the “regex” is actually a plain string with no special characters. If you need to search for a string that begins with - or / and you can’t let it be a regex, then you can’t do it on the command line. In this case you’ll need to use the /F option.

4.1.5  Limitations

For basic regexes, GREP is limited to 127 characters compiled into no more than 511. The “compiled” basic regex is GREP’s internal representation, after character ranges have been expanded and so on.

For extended regexes, the maximum compiled size is 65,539 (sic) bytes. There can be no more than 65,536 capturing subpatterns, and all kinds of subpatterns can be nested no more than 200 levels deep.

4.2  Normal Character (any regex)

Any normal character matches itself. (Any character that is not a special character is a normal character. The special characters were listed in the Overview.) Example: the regex abc matches input lines that contain the three-character sequence abc anywhere on the line.

GREP can handle any character from space through character 255. When using 8-bit characters or certain special characters on the command line, see Special Rules for the Command Line below.

If you specify the /I option, any letter in your regex matches both the upper and lower case of that letter. (By default, only unaccented English letters A-Z and a-z are affected by the /I option. In GREP32, you can use the /M option to select a mapping that includes all letters.)

If you want to match a special character, you must precede it with a backslash \ in your regex.

Example: to search for the string ^abc\def, you must put backslashes before the two special characters:

        \^abc\\def

That makes GREP treat them as normal characters and not give them special meanings. The Overview lists all the special characters.
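The effect of the two backslashes can be checked in any Perl-style regex engine. The Python sketch below uses the standard re module as a stand-in for GREP, with invented input lines.

```python
import re

pattern = r"\^abc\\def"          # the regex from the example above
line = r"see ^abc\def here"      # hypothetical input line

found = bool(re.search(pattern, line))
missed = bool(re.search(pattern, "see abcdef here"))

print(found)    # True
print(missed)   # False
```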

4.3  . for Any Character (any regex)

The period (full stop or dot) in a regex normally matches any character. Example: o.e matches lines that contain “ode”, “one”, “ope”, “ore”, and “owe”. Of course it also matches lines that contain “oae”, “o e”, “o$e”, “o´e”, and so on.

In binary mode, the period matches any character without exception. But in text mode, there are some special cases:

If you want to match a literal period, for instance to search for “3.50”, you need a backslash \ before the period in your regex to turn it into a normal character, like this:

        3\.50

A period between square brackets is just a normal character. For example, [.?!] matches any of the characters that end an unquoted sentence.
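These three uses of the period can be tried side by side. The Python sketch below uses the standard re module rather than GREP, so details such as how the period treats line ends may differ; the word list is invented.

```python
import re

words = ["ode", "one", "o e", "oe", "3.50", "3x50"]

dot_any = [w for w in words if re.search(r"o.e", w)]        # any one character
dot_literal = [w for w in words if re.search(r"3\.50", w)]  # a real period only
enders = re.findall(r"[.?!]", "Stop. Why? Go!")             # period inside brackets

print(dot_any)      # ['ode', 'one', 'o e']
print(dot_literal)  # ['3.50']
print(enders)       # ['.', '?', '!']
```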

4.4  * or + for Repetition (any regex)

A plus sign + after a character, character class, subexpression, or back reference matches one or more occurrences; an asterisk * matches zero or more occurrences. In other words, the plus sign means “one or more” and the asterisk means “any number, including none at all”.

(The note on greediness below applies to * and + in extended regexes.)

Example: Big.*night matches lines that contain “Big” followed by any number of any characters followed by “night”. Since “any number” could be zero, that regex also matches lines that contain “Bignight”.

Examples: snor+ing matches lines that contain “snoring”, “snorring”, “snorrring”, and so on, but not “snoing”. snor*ing matches those and also “snoing”.

Used with a character class or character type, the plus sign + and asterisk * match any multiple characters in the class, not only multiple occurrences of the same character. For instance, sno[rw]+ing matches lines that contain “snowing”, “snorwing”, “snowrring”, and so on.

Obligatory example: [A-Za-z_][A-Za-z0-9_]* matches a C or C++ identifier, which is an English letter or underscore, possibly followed by any number of letters, digits, and underscores. (The square brackets enclose character classes.)

But + and * are normal characters when used between square brackets [ ]. For instance, the regex 2[*+]2 matches lines containing “2+2” and “2*2”.

Anything followed by * always matches. For example, the regex .* would match any number of characters including none, meaning that empty and non-empty lines would match. .* is more useful as part of a regex.
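The examples in this section can be run as a quick check. The Python sketch below uses the standard re module, which treats * and + the same way for these patterns; it is not GREP itself, and the helper name hits is hypothetical.

```python
import re

def hits(regex, candidates):
    """Return the candidates that contain a match for regex."""
    return [c for c in candidates if re.search(regex, c)]

snores = ["snoing", "snoring", "snorrring"]
plus = hits(r"snor+ing", snores)
star = hits(r"snor*ing", snores)
klass = hits(r"sno[rw]+ing", ["snowing", "snowrring", "snoing"])
literal = hits(r"2[*+]2", ["2+2", "2*2", "222"])
big = hits(r"Big.*night", ["Bignight", "Big fun tonight", "Big day"])
idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", "x1 = _tmp + 42")

print(plus)     # ['snoring', 'snorrring']
print(star)     # ['snoing', 'snoring', 'snorrring']
print(klass)    # ['snowing', 'snowrring']
print(literal)  # ['2+2', '2*2']
print(big)      # ['Bignight', 'Big fun tonight']
print(idents)   # ['x1', '_tmp']
```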

4.5  ? for Optional Match (extended regex)

In an extended regex only, a question mark after a character, character class, subexpression, or back reference indicates that the construct is optional. For example, the extended regex move?able matches lines containing “moveable” or “movable”, but not “moveeable”; labou?r matches lines containing “labour” or “labor”.

(The note on greediness below applies to ? in extended regexes.)

? is a normal character when it occurs within square brackets in an extended regex; it’s always a normal character in a basic regex.

Anything followed by ? always matches. For example, the extended regex .? would match one character or none. Since every line contains a string of no characters (whether or not there are some additional characters on the line), every line would be a match.
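A short check of the optional match, again using Python's re module as a stand-in for GREP's extended regexes (the word lists are invented):

```python
import re

def hits(regex, candidates):
    """Return the candidates that contain a match for regex."""
    return [c for c in candidates if re.search(regex, c)]

movers = hits(r"move?able", ["moveable", "movable", "moveeable"])
labors = hits(r"labou?r", ["labour", "labor", "labr"])

# .? can match the empty string, so even an empty line is a hit.
empty_line_matches = re.search(r".?", "") is not None

print(movers)               # ['moveable', 'movable']
print(labors)               # ['labour', 'labor']
print(empty_line_matches)   # True
```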

4.6  { } for Repetition (extended regex)

In an extended regex only, you can use curly braces { } after a character, character class, subexpression, or back reference to specify repetition. The general form is {minimum,maximum} where both numbers are in the range 0 to 65535 and minimum is less than maximum. Here are the three variations:

{n} Exactly n occurrences.
{n,} At least n occurrences.
{n,m} At least n and at most m occurrences.

The braces are normal characters in other contexts. For instance, {,3} is just four normal characters because it doesn’t match any of the three variations listed above. The braces are always normal characters inside square brackets [ ], and the right brace on its own is always a normal character. Both braces are normal characters anywhere in a basic regex.
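Here is a small sketch of brace repetition. It assumes the usual forms {n}, {n,}, and {n,m}, and it uses Python's re module as a stand-in for GREP. One known difference: Python reads {,3} as {0,3} instead of as normal characters, so that case is left out.

```python
import re

words = ["bd", "bad", "baad", "baaad", "baaaad"]

def hits(pattern):
    """Return the test words that contain a match for the pattern."""
    return [w for w in words if re.search(pattern, w)]

exactly_two = hits(r"ba{2}d")      # exactly two a's
two_or_more = hits(r"ba{2,}d")     # at least two a's
two_to_three = hits(r"ba{2,3}d")   # two or three a's

print(exactly_two, two_or_more, two_to_three)
```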

You already know convenient shorthand for the three most common combinations of minimum and maximum:

? is the same as {0,1}.
* is the same as {0,}.
+ is the same as {1,}.

4.6.1  Greedy Quantifiers (extended regex)

(You can skip this advanced topic, unless you’re using capturing subexpressions or back references in your extended regex.)

The quantifiers { }, ?, *, and + can be “greedy” or “ungreedy”. A greedy quantifier consumes as many characters as possible without causing the overall extended regex to fail; an ungreedy quantifier consumes as few as possible without causing the overall extended regex to fail. Because both greedy and ungreedy quantifiers still let the overall regex succeed if possible, you don’t need to worry about the distinction unless you’re using capturing subexpressions and back references.

In an extended regex, all quantifiers are greedy by default. You can make a particular quantifier ungreedy by putting a question mark after it: { }?, ??, *?, or +?.

For details and examples, please see the Repetition section of the included file PCRE Specification.
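To see why greediness matters only when you capture something, here is a sketch using Python's re module as a stand-in (it follows the same greedy-by-default rule and the same trailing ? for ungreedy quantifiers). The sample line is invented.

```python
import re

line = "<b>bold</b> and <i>italic</i>"

# Both regexes match the line; only the captured text differs.
greedy = re.search(r"<(.+)>", line).group(1)     # as many characters as possible
ungreedy = re.search(r"<(.+?)>", line).group(1)  # as few characters as possible

print(repr(greedy))
print(repr(ungreedy))
```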

4.7  [ ] for Character Class (any regex)

To match any one of a group of characters, enclose them in square brackets [ ]. Examples: [Aa] matches a capital or lower-case letter A; sno[wr]ing matches lines that contain “snowing” or “snoring”.

Immediately after the opening [ or [^, a right square bracket is just a normal character. For example, []abc] matches the character ], a, b, or c.

A right square bracket after a left square bracket and at least one other character ends the character class, though as always you can use a backslash to make it normal. For example, [abc\]] is the same character class as []abc].

Finally, a right square bracket with no preceding left square bracket is a normal character.
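The bracket rules above can be checked with a sketch like this one, which assumes Python's re module as a stand-in for GREP (it happens to follow the same three rules for a right square bracket):

```python
import re

text = "a]b[c"

leading = re.findall(r"[]abc]", text)    # ] right after [ is a normal character
escaped = re.findall(r"[abc\]]", text)   # ] made normal with a backslash
stray = re.findall(r"]", text)           # ] with no preceding [ is normal too

print(leading, escaped, stray)
```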

In an extended regex, certain abbreviations and class names are available for commonly used classes.

4.7.1  - for Character Range (any regex)

You can indicate a character range with the minus sign or hyphen -, ASCII 45. For example, [0-9] matches any single digit, and [a-zA-Z] matches any English letter.

A character class can contain both ranges and single characters, mixed any way you like as long as each range within the class is written low-high: T-f is fine since they are ASCII 84 and 102, but f-T is invalid.

There’s no difference to GREP between writing out all the characters in a range and using the minus sign to abbreviate a range: [pqrsty] and [ytsrpq] and [yp-t] and [yq-stp] are just some of the ways to write the same class.

The minus sign is a normal character outside square brackets. It’s also a normal character if it occurs at the beginning or end of a class (immediately after the opening [ or [^ or immediately before the closing ] character). Otherwise, you can always make it a normal character with a backslash.

For example, if you want to search for any of the four arithmetic signs, any of the regexes [+\-*/] and [-+*/] and [+*/-] does the job.
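As a quick check of the range rules, this sketch (Python's re module standing in for GREP) confirms that the four spellings of the p-to-t-plus-y class are equivalent, and that all three ways of writing a normal minus sign find the same characters:

```python
import re
import string

# Four spellings of the same class: p, q, r, s, t and y.
classes = [r"[pqrsty]", r"[ytsrpq]", r"[yp-t]", r"[yq-stp]"]
members = ["".join(re.findall(c, string.ascii_lowercase)) for c in classes]

# Three ways to make the minus sign a normal character inside a class.
sign_patterns = [r"[+\-*/]", r"[-+*/]", r"[+*/-]"]
signs = [re.findall(p, "3+4-5*6/7") for p in sign_patterns]

print(members)
print(signs)
```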

Here’s one final example: To match any Western European letter (under most recent versions of Windows, in North America and Western Europe), a basic regex is

        [a-zA-ZÀ-ÖØ-öø-ÿ] 

(Note 1. That regex works fine on the command line with GREP16 or in a file [/F option] with either GREP. But to enter it on the command line with GREP32, you must use numeric sequences for the 8-bit characters; see Special Rules for the Command Line below.)

(Note 2. In GREP32, you can avoid the above mess. Set an appropriate character mapping with the /M option and use the extended regex [[:alpha:]]. The /E2 option selects extended regexes, and named character classes are discussed below.)

4.7.2  [^ ] for Negative Character Class (any regex)

To match any character that is not in a class, use square brackets with a caret or circumflex, ^, ASCII 94.

Examples: [^0-9 ] matches any character except a digit or a space, and the[^a-z] matches “the” followed by anything except a lower-case letter.

The negative character class matches any character not within the square brackets, but it does match a character. It might help to read it as “a character other than …” rather than just “not …”. For instance, the[^a-z] matches “the” followed by a character other than a lower-case letter, but it does not match “the” at the end of a line where “the” is not followed by any characters. For further explanation, please see Finding a Word under the rules for ^ and $, below.

The caret ^ has a different meaning when it occurs outside square brackets. And when it occurs within square brackets but not immediately after the opening left square bracket, the caret is a normal character.
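The point that a negative class still has to match some character is easy to miss, so here is a sketch using Python's re module as a stand-in for GREP. The sample lines are invented.

```python
import re

pattern = r"the[^a-z]"

lines = ["the end", "then", "see the"]
matched = [ln for ln in lines if re.search(pattern, ln)]

# "see the" fails: nothing follows "the", so [^a-z] has no character to match.

# A caret that is not first inside the brackets is a normal character.
carets = re.findall(r"[a^b]", "a^b")

print(matched, carets)
```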

4.7.3  Character Class and Case-Blind Matching (any regex)

If you use the /I option to specify case-blind matching, then the character class [abc] matches an upper-case or lower-case a, b, or c. With the /I option in effect, [^abc] matches any character except A, a, B, b, C, or c.
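A short sketch of the same behavior, assuming Python's re.IGNORECASE flag as a rough equivalent of the /I option:

```python
import re

inside = re.findall(r"[abc]", "AbCd", re.IGNORECASE)    # either case of a, b, c
outside = re.findall(r"[^abc]", "AbCd", re.IGNORECASE)  # anything else

print(inside, outside)
```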

4.7.4  Character Class Names (extended regex)

Extended regexes support POSIX character class names, such as [:lower:] for any lower-case letter and [:^lower:] for any character except a lower-case letter. Notice that you can negate a character class name by putting a caret or circumflex ^ after the first colon.

These are not character classes, but special names that you can insert within square brackets as (part of) a character class. For instance, the extended regex

        [AB[:^alpha:]]

matches a capital A or B or any non-alphabetic character.

Here’s the complete list of POSIX character class names. Remember that they occur inside the normal square brackets for a character class. Also remember that they must be surrounded by [: :], or [:^ :] for negation.

word Any “word” character (letters, digits and underscore, same as \w and can be redefined with the /M option).
alnum Any letter or digit.
alpha Any letter.
lower Any lower case letter.
upper Any upper case letter.
digit Any decimal digit (same as \d).
xdigit Any hexadecimal digit, decimal digits plus A-F and a-f.
space Any whitespace character (same as \s).
graph Any printing character, excluding space.
print Any printing character, including space.
punct Any printing character, excluding letters and digits and the space character.
ascii Any ASCII character (see note below).
cntrl Any control character.

The exact definitions of the above classes depend on the character mapping in effect. In the default C locale, the above classes match only 7-bit characters (character positions 0-127); in other mappings, 8-bit characters also match. You can set the character mapping with the /M option.

Use the supplied file TEST255 to test the meaning of any character class in your selected locale; see examples in the supplied TOUR.BAT file.

4.8  ^ and $ for Start and End of Line (any regex)

A caret or circumflex ^, ASCII 94, at the start of a regex means that the regex starts at the beginning of a line. (In paragraph mode, /G2 option, a caret in a basic regex matches the beginning of a paragraph and a caret in an extended regex matches the beginning of a paragraph or a line.)

A dollar sign $, ASCII 36, at the end of a regex means that the regex ends at the end of a line in the file(s) being searched. (In paragraph mode, /G2 option, a dollar sign matches the end of a paragraph.)

The caret and dollar are sometimes called anchors because they anchor a regex to the start or end of a line (or both). They’re also the two best-known examples of assertions, constructs that match a condition rather than a character.

Caution: Basic and extended regexes treat ^ differently when it’s not at the start of the regex, and they treat $ differently when it’s not at the end of a regex. In a basic regex, ^ and $ are normal characters when they’re not in their “anchor positions”; in an extended regex they keep their anchor meaning (and therefore won’t match anything). For safety, always use a backslash before ^ and $ if you want them interpreted as normal characters.

The caret and dollar are always normal characters inside square brackets, except that the caret is special if it’s the first character after the left square bracket.
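For a feel of how the anchors narrow a search, here is a sketch that uses Python's re module as a stand-in for GREP in ordinary text mode. The sample lines are invented, and each string plays the role of one line with its line ending already removed.

```python
import re

lines = ["cat", "cats", "tomcat", "a cat nap"]

def hits(pattern):
    """Return the sample lines that contain a match for the pattern."""
    return [ln for ln in lines if re.search(pattern, ln)]

starts = hits(r"^cat")    # line begins with "cat"
ends = hits(r"cat$")      # line ends with "cat"
whole = hits(r"^cat$")    # line is exactly "cat"

# A backslash makes the dollar sign a normal character.
price = re.search(r"\$5", "costs $5 today") is not None

print(starts, ends, whole, price)
```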

Examples:

You should probably use ^ and $ only in text mode or record-oriented binary mode, not in free-format binary mode. Also, they make sense only at the beginning and end of your regex. For those who prefer to live life on the edge, here are the full rules:

With line-oriented text or record-oriented binary (/R0 or /R2)

Basic regex: ^ at the start of a basic regex matches the start of a line or record; everywhere else it’s a normal character. $ at the end of a basic regex matches the end of a line or record; everywhere else it’s a normal character. In paragraph mode (/G2 option), ^ matches only the start of a paragraph and $ matches only the end of a paragraph.

Extended regex: ^ and $ outside square brackets always mean start and end of a line or record, and if used anywhere but at their “anchor positions” they won’t match anything. In paragraph mode (/G2 option), ^ matches the start of a line or paragraph and $ matches only the end of a paragraph.

With free-form binary (/R3)

Basic regex: ^ and $ outside square brackets match the start and end of GREP’s read buffer, which isn’t likely to be useful.

Extended regex: ^ and $ outside square brackets match a newline (ASCII 10).

When GREP senses file format (/R-1 or /R-2)

Basic or extended regex: Don’t use ^ and $ in a regex with the /R-1 or /R-2 option. If you do use them, they work correctly in text files, but in binary files they match the start and end of every buffer, arbitrary file positions that are not likely to be useful.

It’s a historical artifact that the rules for basic and extended regexes are not quite the same.

4.8.1  Finding a Word (Lengthy Example)

Suppose you want to find the word “the” in a file, whether in caps or lower case. You can use the /I option to make the search case blind, and concentrate on constructing the regexes.

This section shows progressive refinements of the search technique. If using GREP32, you might want to skip it and just use the /E4 option.

At first glance, [^a-z]the[^a-z] seems adequate: anything other than a letter, followed by “the”, followed by anything but a letter. That lets in “the” and rules out “then” and “mother”. But it also rules out “the” at the beginning or end of a line. (Remember that a negative character class does insist on matching some character. Read it as “any character other than …” rather than as simply “not …”.) The solution with basic regexes requires four of them, for “the” at the beginning, middle, or end of a line, or on a line by itself:

        ^the[^a-z]
        [^a-z]the[^a-z]
        [^a-z]the$
        ^the$ 

To search for just the occurrences of the word “the”, put those four lines in a file and then use the /F option on GREP.

But this becomes much easier with the power of extended regular expressions (/E2 option, GREP32 only). You can search for the word “the”, not embedded in larger words, with one extended regex:

        grep /E2 \bthe\b 

Read this as “a word boundary, followed by t-h-e, followed by a word boundary.” As you would expect, start and end of line count as word boundaries.

Easiest of all, the /E4 option (GREP32 only) supplies the \b sequences for you:

        grep /E4 the 

There might be one problem with the above regular expression: it would not match “the6” or “the_” since the underscore and the digits are considered “word” characters. (This is how the -w option works in most UNIX greps, too.) It’s not likely you’d get such sequences in a text file, but if you want to be absolutely precise you should use something like the /Mfr,alpha option to define “word” characters as just letters.
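The progression in this section can be replayed with a sketch. It assumes Python's re module as a stand-in for GREP, with re.IGNORECASE playing the part of the /I option; the sample lines are invented. Note how the four basic regexes accept “the_end” while \bthe\b does not, because the underscore counts as a “word” character.

```python
import re

lines = ["the", "The cat", "over the hill", "see the",
         "then", "mother", "bathe", "the_end"]

# The four basic regexes: start, middle, end of line, and line by itself.
four = [r"^the[^a-z]", r"[^a-z]the[^a-z]", r"[^a-z]the$", r"^the$"]

def any_of(patterns, line):
    """True if at least one of the patterns matches the line, ignoring case."""
    return any(re.search(p, line, re.IGNORECASE) for p in patterns)

basic_hits = [ln for ln in lines if any_of(four, ln)]
word_hits = [ln for ln in lines if any_of([r"\bthe\b"], ln)]

print(basic_hits)
print(word_hits)
```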

4.9  | for Alternatives (extended regex)

In an extended regex only, the vertical bar or pipe |, ASCII 124, separates two or more alternatives. The extended regex matches lines that contain any of the alternatives. It’s legal for an alternative to be empty, and this can be useful in subexpressions.

Example: the extended regex cat|dog matches any input line that contains the string “cat” or “dog”.

If you want alternatives for part of an extended regex, use parentheses or round brackets to form a subexpression. See the examples in the section on subexpressions.

If the alternatives must occur at the start or end of a line, the anchor needs to be in each alternative. Example: to match lines that start with “cat” or “dog”, use ^cat|^dog as your extended regex. Another way to express that is with a subexpression, ^(cat|dog).

Efficiency note: Alternatives can be slower than character classes. The extended regexes bar|bat and ba(r|t) are logically equivalent to the basic regex ba[rt], but the latter generally executes faster (even as an extended regex). You may or may not notice any time difference, depending on the speed of your computer and the size of the files that you’re searching.
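A sketch of alternatives, with Python's re module standing in for GREP's extended regexes. It checks only that the patterns select the same lines; it says nothing about relative speed. The sample lines are invented.

```python
import re

lines = ["cat food", "hot dog", "dog days", "bird"]

def hits(pattern, candidates):
    """Return the candidates that contain a match for the pattern."""
    return [c for c in candidates if re.search(pattern, c)]

anywhere = hits(r"cat|dog", lines)
at_start = hits(r"^cat|^dog", lines)       # anchor repeated in each alternative
grouped = hits(r"^(cat|dog)", lines)       # same thing with a subexpression

# Three logically equivalent ways to match "bar" or "bat".
words = ["bar", "bat", "bad"]
equivalent = [hits(p, words) for p in (r"bar|bat", r"ba(r|t)", r"ba[rt]")]

print(anywhere, at_start, grouped, equivalent)
```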

Caution: The vertical bar | has special meaning on the command line. If your operating system doesn’t let you override that meaning, use the /F- option to enter your regex from the keyboard, or see Backslash for Character Encoding below.

4.10  ( ) for Subexpressions (extended regex)

In an extended regex only, the parentheses or round brackets ( ) have several uses, but only two are discussed in this manual. (Parentheses are normal characters anywhere in a basic regex, and inside square brackets in an extended regex.)

The first use is straightforward: to set up alternatives as part of an extended regex. For example, the extended regex

        the quick (brown fox|white rabbit) 

matches lines containing either “the quick brown fox” or “the quick white rabbit”. Here’s another example, adapted from the PCRE manual page:

        cat(aract|erpillar|)s 

matches lines containing “cataracts”, “caterpillars”, or “cats”.
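Both examples can be checked with a brief sketch, again assuming Python's re module as a stand-in for GREP's extended regexes. The extra test strings are invented.

```python
import re

fox = r"the quick (brown fox|white rabbit)"
fox_hits = [s for s in ["the quick brown fox", "the quick white rabbit",
                        "the quick white fox"]
            if re.search(fox, s)]

# The empty third alternative lets "cat" be followed directly by "s".
cats = r"cat(aract|erpillar|)s"
cat_hits = [s for s in ["cataracts", "caterpillars", "cats", "catfish"]
            if re.search(cats, s)]

print(fox_hits, cat_hits)
```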

The second use of parentheses is to set up a “capturing subpattern”, which can be referred to with a “back reference”; see Backslash for Back References, below.

4.10.1  ( ) Advanced Topics (extended regex)

The parentheses or round brackets have several other meanings in an extended regex. To save space in this manual, they are not documented here but you can read about them in the accompanying PCRE Specification file.

4.11  The Backslash \

In a basic regex, the backslash (\) has only one use: “escaping” a special character to treat it as a normal character.

In an extended regex, the backslash has five uses: “escaping” a special character, designating character types, asserting a word boundary or other conditions, making a back reference, and encoding ASCII characters.

4.11.1  Backslash as Escape (any regex)

When the backslash precedes any special character it makes that character normal. For example, the regex 2+2 normally matches a string of two or more 2s. (The 2+ construct means “one or more occurrences of the character 2”.) If you want to match that middle character as an actual plus sign, you must “escape” it with a backslash: 2\+2.

If you want to match a backslash itself, you escape it in the same way. For example, the regex ^c:\\ matches every line that begins with “c:\”.

The backslash functions as an escape both inside and outside of square brackets. If you’re not sure when a non-alphabetic character like ] or $ is special and when it’s not, just precede it with a backslash to make it a normal character whether it already was one or not.

Example: To match any of the four signs of arithmetic, you might write the regex [+-*/]. But that minus sign has a special meaning inside square brackets. To treat it as a normal character you must escape it with the backslash, like this: [+\-*/].
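Here is a sketch of the escape rule, with Python's re module standing in for GREP (raw strings keep Python itself from interpreting the backslashes):

```python
import re

unescaped_222 = re.search(r"2+2", "222") is not None    # a run of 2s
unescaped_sum = re.search(r"2+2", "2+2") is not None    # the + is not literal here
escaped_sum = re.search(r"2\+2", "2+2") is not None     # literal plus sign

# Matching a backslash itself: the line must begin with c:\
drive = re.search(r"^c:\\", "c:\\windows") is not None

print(unescaped_222, unescaped_sum, escaped_sum, drive)
```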

This is the only use of the backslash in basic regexes; the others that follow all relate to extended regexes.

4.11.2  Backslash for Character Types (extended regex)

Many regexes involve a type of character: digit (or not), blank (or not), and so forth. While you can always use ordinary character classes, in an extended regex you can also use these shortcuts on their own or as part of a character class:

\w Any “word” character, meaning any letter or decimal digit or an underscore — can be redefined with the /M option.
\W Any character except a “word” character.
\d Any of the decimal digits.
\D Any character except a decimal digit.
\s Any whitespace character (see below).
\S Any character except a whitespace character.

The definition of a whitespace character depends on the locale. In GREP16, and by default in GREP32, the whitespace characters are ASCII 9-13 and 32 (TAB, LF, VT, FF, CR, and space). You can change the locale in GREP32 with the /M option. For instance, with /Mfr the non-breaking space (160) is added to the above list of whitespace characters. To list the whitespace characters in a given locale, use the supplied TEST255 file with this GREP command:

        grep32 /R2 /W21 /E2 ^\s TEST255

The exact definitions of the above types depend on the character mapping in effect. In the default C locale, no 8-bit characters (characters 128-255) are considered as possible "word" characters, digits, or whitespace; in other mappings, some 8-bit characters also match. You can set the character mapping with the /M option. Use the supplied file TEST255 to test the meaning of any character type in your selected locale; see examples in the supplied TOUR.BAT file.

Example: To scan a file for four-digit numbers, your regex could repeat the \d four times or use curly braces { }: \d\d\d\d or \d{4}.

Did you spot the problem with this example? Yes, either of those extended regexes matches lines containing four-digit numbers. But it also matches lines containing five-digit numbers, since a five-digit number contains four consecutive digits. One way to match numbers of exactly four digits is to mark them as being preceded by start of line or a non-digit, and followed by end of line or a non-digit:

        (^|\D)\d{4}($|\D) 

Of course, if you know something about the files you’re scanning you may not need to get so elaborate.
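The difference between the loose and the exact regex shows up in a sketch like this, which assumes Python's re module as a stand-in for GREP's extended regexes. The sample lines are invented.

```python
import re

lines = ["1234", "year 2021.", "12345", "id 987"]

loose = r"\d{4}"                  # four digits anywhere, even inside a longer number
exact = r"(^|\D)\d{4}($|\D)"      # four digits bounded by non-digits or line ends

loose_hits = [ln for ln in lines if re.search(loose, ln)]
exact_hits = [ln for ln in lines if re.search(exact, ln)]

print(loose_hits)
print(exact_hits)
```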

Example: To scan for four hexadecimal digits, use the extended regex

        [\da-fA-F]{4} 

(This one has the same problem as the previous example: it also matches five or more hex digits. Fixing it is left as an exercise for the reader!)

4.11.3  Backslash for Assertions (extended regex)

The assertions in this section look like the above character types, but there’s an important difference. The difference is that while a character type matches a character of specified type, an assertion matches a position in the line and doesn’t “consume” a character. (You already know two examples of assertions, namely the anchors ^ and $.)

\b Word boundary, namely the transition between a word and a non-word character or vice versa, or the beginning or end of line if the adjacent character is a word character.
\B Not a word boundary.
\A Similar to ^ but matches start of buffer even in free-form binary mode (/R3 option).
\Z Similar to $ but matches end of buffer even in free-form binary mode (/R3 option).

These assertions are not valid inside square brackets, and in fact \b has a different meaning inside a character class; see Backslash for Character Encoding, below.

4.11.4  Backslash for Back References (extended regex)

Outside square brackets, a backslash followed by a digit other than 0 is interpreted as a back reference to a capturing subpattern in the regex. For example, \6 refers to the sixth capturing subpattern in the extended regex.

Example (from the PCRE man page): the extended regex

        (sens|respons)e and \1ibility 

matches “sense and sensibility” or “response and responsibility” but not “sense and responsibility”. A back reference always refers to the actual matching subpattern in this particular instance, not to just any alternative.

Example: U.S. toll-free area codes are 800, 888, 877, 866, and 855. The regex 8[08765]{2} would be wrong because it matches strings like “867” and “808”. You need a back reference to ensure that the third digit is the same as the second:

        8([08765])\1

is your regex. That says you must have an 8, followed by 0, 8, 7, 6, or 5, followed by a second occurrence of the same digit.
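Both back-reference examples can be verified with a sketch that uses Python's re module as a stand-in for GREP's extended regexes (\1 has the same meaning in both):

```python
import re

pair = r"(sens|respons)e and \1ibility"
phrases = ["sense and sensibility", "response and responsibility",
           "sense and responsibility"]
pair_hits = [p for p in phrases if re.search(pair, p)]

codes = ["800", "888", "877", "866", "855", "867", "808"]
naive = [c for c in codes if re.search(r"8[08765]{2}", c)]     # too permissive
backref = [c for c in codes if re.search(r"8([08765])\1", c)]  # repeated digit only

print(pair_hits)
print(naive)
print(backref)
```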

A “back reference” can actually be a forward reference: any of \1 through \9 refers to the first through ninth capturing subpattern in the extended regex, even if that subpattern comes after the “back reference” in the regex. But \10 and greater can refer only to subpatterns that precede the back reference. If something looks like a back reference but the number is greater than 9 and greater than the number of capturing subexpressions to the left of it, it is read as an encoded character in octal.

4.11.5  Backslash for Character Encoding (extended regex)

The last use of backslash in extended regexes is also the ugliest. You can use a backslash to encode certain characters, either non-printing characters or those that DOS or Windows doesn’t allow in command arguments.

But be aware that you may not need these rules: if you use the /F option to enter a regex from the keyboard or in a file, you can include any character in it except NUL (ASCII 0), CR (ASCII 13), LF (ASCII 10), and Control-Z (ASCII 26).

Also note that these rules for extended regexes are quite different from the Special Rules for the Command Line. It’s an unfortunate incompatibility, but neither can be changed because PCRE is a supplied library for extended regexes and users rely on existing behavior of basic regexes.

Except as noted, each of these sequences has the indicated meaning anywhere in an extended regex:

\a “Alarm”, the BEL character, ASCII 7.
\b Backspace character, ASCII 8, but only inside square brackets. Outside square brackets it is an assertion.
\cx A control character. If x is a letter, it’s straightforward: \cb and \cB are both Control-B, ASCII 2. If x is not a letter, it is XORed with 64 (hex 40).
\e Escape, ASCII 27.
\f Form feed, ASCII 12.
\n “Newline”, line feed, ASCII 10. This character is never seen in a text file, since it marks a line break, but it can occur in a binary file.
\r Carriage return, ASCII 13. This character is never seen in a text file, since it marks a line break, but it can occur in a binary file.
\t Tab, ASCII 9.
\xhh The character with the given hex code hh (zero, one, or two digits). Examples: \x7c or \x7C is hex 7C (ASCII 124), the | character. \x or \x0 or \x00 is the NUL character, ASCII 0.
\0dd The character whose code is an octal number of one to three digits. \032 is Control-Z, ASCII 26.
\ddd This sequence, a backslash followed by one to three digits where the first one is not zero, is complicated. Outside square brackets, it’s read as a decimal number and is interpreted as a back reference (above) if possible. Otherwise, or always inside square brackets, it’s read as an octal number and the least significant 8 bits are taken as its value. Examples: \7 is a back reference. \11 is a back reference if there have already been eleven capturing subpatterns; otherwise it’s octal 11, ASCII 9, the tab character.
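A few of these encodings can be tried in a sketch with Python's re module. Python is only a partial stand-in here: it accepts two-digit \xhh, \t, and leading-zero octal in the same way, but it does not support \cx or \e and it handles \ddd differently, so those are left out.

```python
import re

bar = re.search(r"a\x7cb", "a|b") is not None        # \x7c is the | character
tab = re.search(r"one\ttwo", "one\ttwo") is not None  # \t is the tab character
ctrl_z = re.search(r"\032", chr(26)) is not None     # octal 032 is Control-Z

# Inside square brackets, \b means backspace (ASCII 8), not a word boundary.
backspace = re.search(r"[\b]", chr(8)) is not None

print(bar, tab, ctrl_z, backspace)
```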

4.12  Special Rules for the Command Line

GREP defines some special sequences starting with a backslash \ to let you get problem characters into your regex.

These rules date back to a much earlier release of GREP. Better ways are available now (see the /F option), but the special rules are maintained for upward compatibility.

When the special rules are in effect, you can find out how GREP applied them by using the /D option and looking for the “massaged” string or regex.

The special rules are in effect by default, but you can turn them on or off with the /E option. The special rules never apply when regexes are read from file or keyboard (/F option).

4.12.1  When Do You Need the Special Rules?

You need them only when you enter a regex or search string on the command line (no /F option), and either of these is true:

When you select extended regexes (/E2 option), you probably don’t want the special rules given below. Extended regexes come with their own ways of using a backslash for character encoding, and therefore the /E2 option turns off the special rules automatically.

4.12.2  What Exactly Are the Special Rules?

Special “escape sequences” give you a way to enter special characters in a regex on the command line, as follows:

        instead of              you can use any of
        < (less)                \l   \60    \0x3C   \074
        > (greater)             \g   \62    \0x3E   \076
        | (vertical bar)        \v   \124   \0x7C   \0174
        " (double quote)        \"   \34    \0x22   \042
        , (comma)               \c   \44    \0x2C   \054
        ; (semicolon)           \i   \59    \0x3B   \073
        = (equal)               \q   \61    \0x3D   \075
        (the space character)   \s   \32    \0x20   \040
        (tab)                   \t   \9     \0x09   \011
        (escape)                \e   \27    \0x1B   \033

You can enter any character as a numeric sequence, not just the special characters in the above list. Use decimal, hex (leading 0x), or octal (leading zero). Example: capital A would be \65, \0x41, or \0101. \0 is not allowed; either code something like [^\1-\255] (“any character except ASCII 1 to 255”) in your basic regex, or use an extended regex.
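The decimal, hex, and octal rule can be expressed as a short sketch. This is not GREP's own code: the function name decode_numeric is hypothetical, and the 1 to 255 limit is an assumption drawn from the statement that \0 is not allowed.

```python
def decode_numeric(seq):
    """Decode a numeric sequence such as \\65, \\0x41 or \\0101 to one character."""
    if not seq.startswith("\\"):
        raise ValueError("sequence must start with a backslash")
    digits = seq[1:]
    if digits.lower().startswith("0x"):
        code = int(digits[2:], 16)   # hex: leading 0x
    elif digits.startswith("0"):
        code = int(digits, 8)        # octal: leading zero
    else:
        code = int(digits, 10)       # decimal: anything else
    if not 1 <= code <= 255:         # assumed limit; \0 is rejected
        raise ValueError("code must be between 1 and 255")
    return chr(code)

capital_a = [decode_numeric(s) for s in ("\\65", "\\0x41", "\\0101")]
vertical_bar = [decode_numeric(s) for s in ("\\124", "\\0x7C", "\\0174")]

print(capital_a, vertical_bar)
```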

There are additional problems with quotes in a regex, because the command line interprets them differently from GREP, strips them out, or even welds command-line arguments together in ways that can surprise you. Please see Quotes in a Regex for discussion and some strategies for coping.

5. Options

The first section tells you how to specify options, and the last four describe the options in detail by functional groups: input file options, pattern-matching options, output options, and general options.

5.1  Specifying Options

5.1.1  On the Command Line

On the command line, options can appear anywhere, before or after the regex and the input files. All options are processed before any files are read.

You have a lot of freedom about how you enter options: use a leading hyphen or slash, use upper- or lower-case letters, leave spaces between options or combine them. For instance, the following are just some of the different ways of turning on the /P3 option and /B option:

        /p3 -b    /b/P3    /p3B    -B/P3    -P3 -b 

For clarity, you should always use a hyphen or slash before the numeric /0 option and /1 option. Example: /E0 means the /E option with a value of 0, but /E/0 means the /E option with no value specified, followed by the /0 option.

When you set up GREP commands in a batch file or script or a makefile, I strongly recommend you begin each GREP call with the /Z option. This will neutralize any options that may be stored in the environment variable.

5.1.2  In an Environment Variable (ORS_GREP)

If you use certain options frequently, you can put them in the ORS_GREP environment variable. You have the same freedom as on the command line: leading slashes or hyphens, space separation or options run together, caps or lower case.

Example: If you prefer to have GREP sense the type of each file (/R-1 option) and you prefer UNIX-style output (/U option) with line numbers (/N option), then you want to set the environment variable as

        /R-1UN 

or

        /R-1 /U /N 

or similar.

Only options can be put in the environment variable. If you want to store a regex, put it in a file and put the /Ffile option in the environment variable; if you want to store a list of input filespecs, put them in a file and put the /@file option in the environment variable.

Setting the Environment Variable

You can use a SET command on the command line to set the environment variable temporarily for that session. For example:

        set ORS_GREP=/u /i 

In Windows versions all the way back to Windows XP, you can set an environment variable permanently as follows. Press the Windows logo key and Break key at the same time, then Advanced system settings » Environment Variables. (If those keys are not on your keyboard, right-click Computer or This PC and select Properties » Advanced system settings » Environment Variables.) Set the variable in the user list for just yourself, or in the system list for all users of this computer. This will be effective in any command windows that you open after setting the variable.

In DOS and very old versions of Windows, you can make a permanent setting by including a SET command in your AUTOEXEC.BAT file.

Overriding Environment Variable Options on the Command Line

If you have some options in the ORS_GREP environment variable but you don’t want one of them for a particular run of GREP, you don’t have to edit the environment variable. You can make most changes on the command line, like this:

Extended example: Suppose you have set the environment variable as

        set ORS_GREP=/UNI 

because you usually run GREP with UNIX-style output (/U option) with line numbers (/N option), ignoring case of letters (/I option).

If you want to run case sensitive for one particular run of GREP, simply put the /I option on the command line to reverse the setting from the environment variable.

If you don’t know what’s in the environment variable (perhaps because you’re on an unfamiliar machine), either put the /Z option on the command line followed by the options you want, or set them positively by specifying for instance /I+.

Finally, if you want to turn an option definitely off, without regard to the environment variable, turn it on and then toggle it. To turn off line numbers, /N+N always works, whether N was set in the environment variable or not. (/N- might be more logical, but for historical reasons options with leading minus signs are allowed to run together, and such a usage would conflict.)
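The toggle logic described above can be modeled in a few lines. This is a sketch of the stated behavior only, not GREP's option parser: the function apply_toggles and its token format are hypothetical, and it covers just toggle-type options such as /I and /N.

```python
def apply_toggles(tokens, state=None):
    """Apply option tokens in order: 'X' flips option X, 'X+' forces it on."""
    state = dict(state or {})
    for tok in tokens:
        name = tok[0]
        if tok.endswith("+"):
            state[name] = True                         # set positively
        else:
            state[name] = not state.get(name, False)   # toggle
    return state

env = apply_toggles(["U", "N", "I"])        # as if ORS_GREP were /UNI

one_run = apply_toggles(["I"], env)         # /I on the command line reverses it
forced_on = apply_toggles(["I+"], env)      # /I+ is on regardless of the environment

# /N+N ends up off whether or not N was already set.
off_from_set = apply_toggles(["N+", "N"], env)
off_from_unset = apply_toggles(["N+", "N"], {})

print(one_run, forced_on, off_from_set, off_from_unset)
```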

If you’re ever in doubt about the interaction of options between the command line and the environment variable, simply add /D-|more to the end of your command line and GREP tells you all the option settings in effect and how it interprets your regex.

5.2  Input File Options

/@- or /@file — Take Input Filespecs from Standard Input or File

If you have too many input filespecs to put on the command line, you can put them in a list file for GREP to read. This can also be useful when GREP or another program generates a list of files and you want to have GREP examine every file in the list; see an example below.

file must follow the @ with no intervening space, and ends at the next space; it must not contain wildcards. If you use a minus sign for the file (the /@- option), GREP accepts filespecs from standard input. Standard input is the keyboard, unless you redirect it from a file with the < character or pipe it from another command with the | character.

In the list file, filespecs must appear one per line. They may contain wildcards. Spaces are legal within a filename; don’t put quotes around a filename that contains spaces. Leading and trailing spaces are automatically removed; if you actually want a space at the start or end of the filespec you can specify it as [ ] in square brackets.

Interactions:

Example: Suppose you want a list of files that contain both “this” and “that”, but not necessarily on the same line. You can GREP once for “this” and produce a file list with the /L option, then GREP a second time for “that”, using just the files that contain “this”:

        grep this * /L | grep that /@- /L 
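The two-pass idea behind that pipeline looks like this in a sketch. It does not run GREP: the files are an invented in-memory dictionary, and the hypothetical function files_matching plays the part of one GREP run with the /L option, with its optional names argument standing in for the list read through /@-.

```python
import re

def files_matching(pattern, files, names=None):
    """Return names of files with at least one line matching the pattern."""
    names = list(files) if names is None else names
    return [n for n in names
            if any(re.search(pattern, ln) for ln in files[n].splitlines())]

files = {
    "a.txt": "this one\nthat one",
    "b.txt": "only this",
    "c.txt": "only that",
}

first_pass = files_matching("this", files)               # like: grep this * /L
both = files_matching("that", files, names=first_pass)   # like: grep that /@- /L

print(first_pass, both)
```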

/A — Include Hidden and System Files

Ordinarily, GREP ignores hidden and system files while searching for files that match a wildcard. But with the /A option, GREP includes hidden and system files when expanding wildcards (*, ?, [) in filespecs.

The /A option also modifies the action of the /S option (if present), determining whether subdirectories marked hidden or system are searched.

The /A option matters only when expanding wildcards or searching subdirectories. If you explicitly specify a file on the command line, the /A option is irrelevant and GREP always reads it even if it’s a hidden or system file.

The /A option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /A+.

/Gn — Read Variable-Length Text Lines or Paragraphs

/G0 When a line is longer than the txwid value in the /W option, chop the line and treat the remainder as a new line. (GREP always did this before release 7.5, but now if you want this behavior you must specify it with /G0.)
/G1 (default) Handle text lines of any length. The txwid value in the /W option is simply an initial allocation; GREP quietly reallocates buffers as needed when it reads long lines.

Since GREP32 can use Windows virtual memory, you’ll almost certainly have no problems with the default /G1. But GREP16 will run out of memory if

(actual line length) times (1 + before from the /P option)

totals around 60K. If this happens, run GREP again either with a lower /Pbefore option or with the /G0 option.

/G2 Search paragraphs rather than lines, to find matches that might begin and end on different lines. GREP /G2 defines a paragraph to be any sequence of non-blank lines ended by one or more blank lines or by end of file. A blank line is one that contains no characters or contains only whitespace characters.

Please see More about Paragraph Mode below.

Interactions: If you specify binary mode with the /R2 or /R3 option, GREP ignores the /G option and displays a warning message. If you specify the /G option with the /R-1 or /R-2 option, GREP applies the /G option only to the files that are actually text but doesn’t display a warning message about the binary files.

More about Paragraph Mode (/G2)

When GREP reads text files in paragraph mode (/G2), it pastes together a set of lines with newlines (ASCII 10) as separators, and tests regexes against the whole paragraph. If there’s a match, GREP normally displays the paragraph (followed by a blank line); you may want to consider the /J option to limit how much text is displayed.
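The paragraph rule (non-blank lines ended by one or more blank lines or by end of file, joined with ASCII 10) can be sketched in Python as follows. The function name split_paragraphs is hypothetical, and this is an illustration of the rule as stated, not GREP's actual reader.

```python
def split_paragraphs(lines):
    """Group lines into paragraphs separated by one or more blank lines.

    A blank line is empty or contains only whitespace. Lines within a
    paragraph are joined with a single newline (ASCII 10).
    """
    paragraphs = []
    current = []
    for line in lines:
        if line.strip() == "":
            if current:                  # a blank line ends the paragraph
                paragraphs.append("\n".join(current))
                current = []
        else:
            current.append(line)
    if current:                          # end of file also ends a paragraph
        paragraphs.append("\n".join(current))
    return paragraphs

print(split_paragraphs(["Mozart or", "Beethoven", "", "   ", "Bach"]))
# ['Mozart or\nBeethoven', 'Bach']
```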

You can’t use the /P option with /G2.

Regexes: When writing your regex, bear in mind that lines within a paragraph are separated by a single ASCII 10 character (LF). If you actually want to test for this character, code it as \012 either with extended regexes (/E2 option), or with basic regexes when the special rules for the command line are in effect (no /E option and no /F option). For example, "Mozart or\012Beethoven" matches “Mozart or” at the end of a line and “Beethoven” at the start of the next line. If there might be trailing blanks after “Mozart or”, you want "Mozart or *\012Beethoven".

But probably you want to check for a phrase in a paragraph, without worrying about exactly where the line break occurs. To do this, there’s an easy way with extended regexes and a less easy way with basic regexes.

With an extended regex (/E2 option), use the character type \s as a shortcut. For instance,

grep /G2E2 Mozart\s+or\s+Beethoven filespecs

will find “Mozart”, “or”, “Beethoven” separated by one or more whitespace characters. In other words, you don’t care whether “Mozart or Beethoven” is all on a line or has a line break after the first or second word.

In a basic regex, you need a character class to accomplish the same thing. Assuming the special rules for the command line are in effect:

grep /G2 Mozart[\s\t\012]+or[\s\t\012]+Beethoven filespecs

As you see, in a basic regex you need to specify the space, tab, and newline (octal 012 = ASCII 10) as a character class. (If you want, add \014 for a form feed.)

Two special notes: The period or full stop in a regex matches any character; with /G2, that includes the ASCII 10 that separates lines in a paragraph. Also with /G2, the dollar sign matches the end of paragraph but not the ends of those interior lines.
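These points can be tried out with Python's re module, which behaves like the description above when the DOTALL flag is set and MULTILINE is not. This is only an analogy: Python's regex syntax is not identical to GREP's, and the sample paragraph is invented.

```python
import re

paragraph = "I like Mozart or\nBeethoven best"

# \s+ matches spaces, tabs, and the newline between lines.
phrase = re.compile(r"Mozart\s+or\s+Beethoven")
print(bool(phrase.search(paragraph)))                          # True

# An explicit character class does the same job as \s here.
basic = re.compile(r"Mozart[ \t\n]+or[ \t\n]+Beethoven")
print(bool(basic.search(paragraph)))                           # True

# DOTALL lets the period match the newline too, as with /G2.
print(bool(re.search(r"or.Beethoven", paragraph, re.DOTALL)))  # True

# Without MULTILINE, $ matches only at the end of the paragraph.
print(bool(re.search(r"or$", paragraph)))                      # False
print(bool(re.search(r"best$", paragraph)))                    # True
```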

Output: Hits are displayed with the line breaks and spacing from the original file; paragraphs in output are separated by blank lines. If your paragraphs are long, you might consider the /J option to reduce the volume of output.

Line numbers (/N option) are replaced by paragraph numbers.

Memory usage: Since GREP32 can use Windows virtual memory, you most likely won’t have memory problems even with long paragraphs. But GREP16 could run out of memory if you have long paragraphs. If this happens, one workaround is to use the /R3 option with a suitable buffer size (/W option) instead of the /G option.

/Rn — Read Input Files as Binary or Text

Binary Files and Text Files gives detailed information about the differences among text files, free-format binary, and record-oriented binary.

GREP normally matches the output format automatically to the input file type. However, you can specify the output format yourself by using the letter /o option.

You can choose from these file input modes:

/R0 (default) Read all input files as text.
/R1 (reserved for future use)
/R2 Read all input files as record-oriented binary. The fixed record length is given by the /W option. (You’ll see a warning message if the last record is incomplete.)
/R3 Read all input files as free-format binary, using the buffer size given in the /W option.

To find at least one match, make sure your buffer size is at least twice the longest string you expect to find; you’ll probably want to use the /J option to restrict the size of output to something manageable.

To find all matches, use the /J2 or /J3 option and the largest buffer you can; ideally the buffer would be large enough to hold the whole file.

/R-1 or /R-2 Examine each input file to decide whether to read it as free-format binary (like /R3) or text (like /R0); display “binary” or “text” with the filespec in the header. /R-1 reads only the first 256 bytes and /R-2 reads the whole file or until it finds a binary character; otherwise the two are identical.

If you gave two numbers with the /W option, the first number is used as line width for text files and the second as buffer size for binary files.

How does GREP infer the file type with the /R-1 or /R-2 option? GREP reads until it finds a binary character, namely any of the characters ASCII 0-6 or 14-26. The file is binary if it contains any of those characters; otherwise it’s treated as text.
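The test is simple enough to sketch in Python. The function name looks_binary and its limit parameter are hypothetical; only the set of "binary" characters and the 256-byte limit come from the description above.

```python
# ASCII 0-6 and 14-26 are the characters treated as binary.
BINARY_BYTES = frozenset(range(0, 7)) | frozenset(range(14, 27))

def looks_binary(data, limit=None):
    """Return True if the examined bytes contain a binary character.

    limit=256 mimics /R-1 (first 256 bytes only); limit=None mimics /R-2
    (scan until a binary character or the end of the data).
    """
    examined = data if limit is None else data[:limit]
    return any(byte in BINARY_BYTES for byte in examined)

late_binary = b"A" * 300 + b"\x00"        # first binary byte after byte 256
print(looks_binary(late_binary, 256))     # False
print(looks_binary(late_binary))          # True
print(looks_binary(b"tab\there\r\n"))     # False
```

The first two calls show the kind of file where /R-1 and /R-2 disagree: the only binary byte lies beyond the first 256 bytes.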

Caution: After GREP decides whether the file is text or binary, it either rewinds the file (if it’s binary) or closes and reopens it (if it’s text). Ordinarily that’s not a problem, but if you specify a pseudo-file like COM1 or CON, GREP discards the bytes it used to sense the file format. Use /R-1 or /R-2 only with real files.

Should you use /R-1 or /R-2? Experiments show that 256 bytes is plenty for a correct decision for most file types, including picture files, executable programs, and MS Office files of all types. Adobe Acrobat PDF files are an exception, in that the first binary byte can show up well after byte 256; but the displayed text is encrypted in those files so you can’t search for text in them anyway. (If anyone knows of another file type where binary bytes show up only after byte 256, I’d be grateful for information.)

Thus /R-2 is theoretically safer than /R-1, but by the same token /R-2 is slower on a big file that is actually text. The difference may or may not be noticeable, depending on how fast your disk and your CPU are and how your operating system buffers file reads.

So which one should you use? My own choice is to put /R-1/X*.pdf in the environment variable. That way I’m confident that GREP will correctly sense the type of non-PDF binary files, without taking a long time to decide that a big text file is actually text.

Setting the /R option correctly lets you search for regexes in .EXE and .DLL files, word-processing files, and so forth. /R-1 or /R-2 can be particularly useful when you don’t know whether files are text or binary. (For instance, Microsoft Word writes some .DOC files in a binary format and some .DOC files in a text format. Or you might have some source files and some object files and want to search them all in one go.)

Only named input files can be read in binary mode — GREP always scans the standard input in text mode.

Also, when you use the /@ option to read a list of input files, or the /F option to read regexes from a file, GREP reads that file in normal text mode.

/S — Scan Subdirectories

With this option, GREP searches not only the files named on the command line (and in any list file specified with /@ option), but files of the same names in subdirectories. For full details, please see the section on subdirectory searches.

The /S option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /S+.

/Wwidth or /Wtxwid,bnwid — Specify Line Width or Binary Block Length

With this option, you tell GREP to expect text lines up to txwid characters long, or process binary files in records or buffers of bnwid bytes. (If you specify only one number, it’s used for both txwid and bnwid.)

txwid and bnwid default to 4096 in GREP32, and you can specify anything from 2 to 2147483645; the default for GREP16 is 256 and you can specify 2 to 32765. (The widths are also limited to available memory, which depends on your system configuration, what other programs you have running at the time, and what you specify with the /P option. With GREP32, available memory includes Windows virtual memory. GREP16 is limited to its 64 KB data segment.)

For full details of binary and text file modes, please see Binary and Text Files.

Text Mode (/W Option without /R or with /R0)

Starting with release 7.5, by default txwid is simply an initial value and not an absolute limit on the width of a text line. Therefore, by default this section no longer applies.

However, if you actually want input lines to be split every txwid characters, specify the /G0 option. The CR/LF (ASCII 13 or 10 or both) line terminator doesn’t count against the specified txwid. Reading a long line from the input, GREP breaks the line after txwid characters and treats the remainder as a separate line. GREP scans the whole line and finds any match within one of the fragments, but misses any match that starts before the break and ends after the break. Therefore, if possible you should set txwid large enough to hold the longest line in the file (or simply don’t specify the /G0 option). If GREP does find any lines longer than the specified or default txwid, it displays a warning message at the end of execution, telling you the length of the longest line. (This warning is suppressed by the /Q3 option.) GREP also logs every such file in the debug output; look for “exceeds your specified length”.
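A short Python model shows why a match that straddles a break is missed. The names chop and fragments_matching are hypothetical, and the sketch ignores line terminators, which the text above says don't count against txwid.

```python
import re

def chop(line, txwid):
    """Split a line into fragments of at most txwid characters."""
    return [line[i:i + txwid] for i in range(0, len(line), txwid)] or [""]

def fragments_matching(line, pattern, txwid):
    """Return the fragments that contain a match."""
    regex = re.compile(pattern)
    return [frag for frag in chop(line, txwid) if regex.search(frag)]

line = "alpha beta gamma delta"
print(chop(line, 8))                         # ['alpha be', 'ta gamma', ' delta']
print(fragments_matching(line, "gamma", 8))  # ['ta gamma']
print(fragments_matching(line, "beta", 8))   # [] because "beta" straddles a break
```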

Record-oriented Binary Mode (/W Option with /R2)

Files are read in records of bnwid bytes. Make sure that you set bnwid to the exact length of the records in the binary file.

Free-form Binary Mode (/W Option with /R3)

Files are read in buffers of bnwid bytes. (If you specify an odd number, GREP will increase it by 1.) The recommended value of bnwid depends on the /J option value, as follows:

When GREP Chooses File Mode (/W Option with /R-1 or /R-2)

txwid is used as a line width for any file that is treated as a text file, and bnwid is used as buffer width for any file that is treated as free-form binary. bnwid must be an even number; if not, GREP will increase it by 1.

When you use the /R-1 or /R-2 option, I recommend that you specify two numbers with the /W option. The first number, text line width, doesn’t much matter unless you specify the /G0 option. If you do, you should specify a largish text width so that every line is kept as a unit. For the second number, binary buffer width, see the advice in Free-form binary mode, just above.

/Xpattern — Exclude Matching Files from Scan

When expanding wildcards in the named input files, GREP ignores any files that match the pattern. The pattern may contain *, ?, and [ ] wildcards, but no drive or path information. pattern must follow the X with no intervening space, and ends at the next space.

For example, if you specify /X*.exe to exclude all .EXE files, and your list file contains ABC*, GREP processes all files starting with ABC except for ABC*.EXE.

An input filespec without wildcards will be honored without regard to any /X option exclusions. For instance, if you have /X*exe in the environment variable, and you type the command

        grep /r3j2 warning prog.exe

then GREP will read the file prog.exe even though it matches the exclusion, on the theory that when you specify a particular file you must want it read. The same is true on the command line:

        grep /r3j2 /x*exe warning prog.exe

Here again, GREP will read the file prog.exe; it doesn’t distinguish between options on the command line and options in the environment variable. However, if you specify a wildcard in the input filespec, like this:

        grep /r3j2 /x*exe warning prog*.exe

then GREP will search for files matching prog*.exe but will exclude all of them because they match the exclusion *exe.
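The rule in these three examples (explicit names bypass exclusions, wildcard expansions don't) might be modeled like this in Python. It is a sketch under assumptions: files_to_scan and matches are made-up names, Python's fnmatch wildcards stand in for GREP's, and matching is assumed to ignore case as Windows file names do.

```python
from fnmatch import fnmatchcase

def matches(name, pattern):
    """Case-insensitive wildcard match (assumption: as on Windows)."""
    return fnmatchcase(name.lower(), pattern.lower())

def files_to_scan(filespec, directory_listing, exclusions):
    """Expand one filespec against a list of names, honoring /X patterns."""
    has_wildcard = any(ch in filespec for ch in "*?[")
    if not has_wildcard:
        return [filespec]            # explicit names bypass every exclusion
    return [name for name in directory_listing
            if matches(name, filespec)
            and not any(matches(name, pat) for pat in exclusions)]

listing = ["prog.exe", "prog2.exe", "prog.c", "notes.txt"]
print(files_to_scan("prog.exe", listing, ["*exe"]))    # ['prog.exe']
print(files_to_scan("prog*.exe", listing, ["*exe"]))   # []
print(files_to_scan("prog*", listing, ["*exe"]))       # ['prog.c']
```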

To specify multiple exclusion patterns, specify multiple /X options. Example: To exclude MS-Word documents, Excel spreadsheets, and ABC.DEF from the search, type something like this:

        grep regex /x*.doc /x*.xls /xabc.def *

or

        grep regex * /x*.doc /x*.xls /xabc.def 

Remember that GREP reads and interprets all options before it looks at the input files. Therefore the exclusions that you specify with /X will be applied to all input filespecs, even those that come before the /X on the command line and those specified in a list file (/@ option). For example, the two commands shown just above this paragraph have exactly the same effect.

You can store one or more /X options permanently in the environment variable. Any /X exclusions on the command line are equally effective with those in the environment variable. The special case /X* tells GREP to disregard all previous exclusions specified with /X.

5.3  Pattern-Matching Options

/Eregex_level — Select Extended Regexes or Strings

This option tells GREP how to interpret the regex(es) you enter on the command line, from keyboard, or in a file.

Basic and extended regexes are fully explained under Regular Expressions, later. An extended regex supports all the features of a basic regex plus the quantifiers ? and {…}, alternatives |, subexpressions (…), some special constructs with the backslash \, and more.

/E0 Don’t use regular expressions at all. Treat the regex(es) as simple literal strings and search files for exact match with no special treatment of any characters.
/E1 (default) Treat regexes as basic regexes. (This is how GREP always worked before release 6.0.)
/E2 (GREP32 only) Treat regexes as extended regexes.
/E4 (GREP32 only) Treat regexes as stand-alone words. For example, if you specify the regex other, GREP finds all occurrences of “other” but ignores it where it occurs as “others”, “mother”, “brothers”, and so on.

By default, a “word” is any group of letters, digits, and underscores bounded by start or end of line and/or by any other characters. For instance, if you’re searching for other with the /E4 option, then “other55” would not be found because the 5s are part of the “word”. If this is a problem, you can redefine a “word” to be any sequence of non-blanks, or any sequence of letters. Please see the /M option for details.

When you use /E4 you probably won’t put special characters in your search regex. But if you do, it’s treated as an extended regex. In fact, the /E4 option is the same as /E2 except that GREP slaps a \b (assert word boundary) at the beginning and end of your regex.
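Python's re module has the same \b assertion with the same default notion of a word (letters, digits, and underscores), so the effect is easy to try. This is an analogy only; the helper name word_regex is hypothetical and Python's regex syntax is not GREP's.

```python
import re

def word_regex(regex):
    """Wrap a regex in word-boundary assertions, as /E4 is described to do."""
    return re.compile(r"\b" + regex + r"\b")

other = word_regex("other")
for text in ["the other one", "others", "mother", "brothers", "other55"]:
    print(text, "->", bool(other.search(text)))
# the other one -> True
# others -> False
# mother -> False
# brothers -> False
# other55 -> False
```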

/E0\ /E1\ /E2\ /E4\ are the same as /E0 /E1 /E2 /E4 except that they turn on the (deprecated) Special Rules for the Command Line, which are described later in this manual. The Special Rules are the old way to have a regex contain characters like < and | that have special meanings on the command line. The better way to bypass command-line restrictions is to use the /F option and enter your regex.

If you never specify the /E option at all, the effect is the same as /E1\, which is basic regexes with the Special Rules for the Command Line enabled; this default was chosen to match GREP’s behavior before release 6.0. /E with no number is the same as /E1, which specifies basic regexes without the Special Rules.

/F- or /Ffile — Read Regexes from Keyboard or File

GREP reads one or more regexes from file instead of taking a single regex from the command line, and reports lines from the input file(s) that match any of the regexes read from file. You must enter the regexes one per line in the file; don’t put quotes around them.

file must follow the F with no intervening space, and ends at the next space; it must not contain wildcards.

If you use a minus sign for the file (/F- option), GREP accepts regexes from standard input. Standard input is the keyboard, unless you redirect it from a file with the < character or pipe it from another command with the | character.

When you supply two or more regexes, GREP normally reports each line from the input file that matches one or more of the regexes. If you set the /V option or /Y option or both, you modify that behavior according to the rules of logic. Specifically:

No /V, no /Y: GREP reports every line that matches one or more of the regexes.

/Y, no /V: GREP reports a line only if that line matches all of the regexes in any order.

/V, no /Y: GREP reports only the lines that match none of the regexes. (If the input line matches one or more of the regexes, GREP doesn’t report it.)

/V /Y: GREP reports every line that matches less than all of the regexes, i.e. every line that matches 0 to N-1 of your N regexes in any order. (If the input line matches all the regexes, GREP doesn’t report it; if it matches some of the regexes but not all, or none of the regexes, GREP reports it.)
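The four cases reduce to two Boolean steps: /Y chooses between "any" and "all", and /V negates the result. A minimal Python sketch (the function name reports_line is made up, and Python's re stands in for GREP's regex engine):

```python
import re

def reports_line(line, regexes, v=False, y=False):
    """Decide whether a line is reported, given the /V and /Y settings."""
    hits = [re.search(rx, line) is not None for rx in regexes]
    matched = all(hits) if y else any(hits)   # /Y: AND instead of OR
    return not matched if v else matched      # /V: invert the test

regexes = ["brown", "fox"]
line = "I see a brown smudge"                 # matches one regex of two
print(reports_line(line, regexes))                  # True
print(reports_line(line, regexes, y=True))          # False
print(reports_line(line, regexes, v=True))          # False
print(reports_line(line, regexes, v=True, y=True))  # True
```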

When using multiple regexes, you can speed up GREP’s searching slightly. With the /Y option, put the regex first that you expect to match the fewest input lines; without the /Y option, put the regex first that you expect to match the most input lines. If you’re not sure, don’t worry about it: unless the regexes are very complex, most of the time is spent reading lines from disk, not matching them against the regexes.

/I — Ignore Case in Matching

GREP ignores case, treating capitals and lower case as matching each other.

Caution: By default, the /I option does not apply to 8-bit characters (characters 128-255). You can turn on 8-bit character support in GREP32 with the /M option.

In GREP16, the /I option does not apply to 8-bit characters (characters 128-255) because Microsoft C 16-bit code does not support setting the locale. Therefore, if you want case-blind comparisons in GREP16, you must explicitly code any 8-bit upper and lower case in your regex. For instance, to search for the French word “thé” in upper or lower case, code it as th[éÉE] since é can be upper-cased as É or as plain E. The “th”, being 7-bit ASCII characters, will be found as upper or lower case by the /I option. (You may need to code 8-bit characters like éÉ in a special way if you enter them on the command line; either use the /F option or see Special Rules for the Command Line below.)

The /I option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /I+.

/Mloc or /Mloc,word — Specify Character Mapping and Define “Word”

This option is available only in GREP32, because Microsoft 16-bit C does not support setting the locale. There are four issues with locale: binary output, case-blind matching, the definition of a “word”, and character classes in general. Details about all four are given below, after the list of mappings.

While many locales (character mappings) are supported in GREP32, most are duplicates. The six unique locales are:

/Mc (default) The C locale, in which none of the characters 128–255 are considered letters, digits, punctuation, space, or printing characters.
/Mfr Code page 1252, valid for most European languages including Danish, Dutch, English, Finnish, French, German, Icelandic, Italian, Norwegian (both), Portuguese, Spanish, and Swedish; this also matches the MS-Windows U.S.A. character set.
/Mcsy Code page 1250, valid for Czech, Hungarian, Polish, and Slovak.
/Mell Code page 1253, valid for Greek.
/Mrus Code page 1251, valid for Russian.
/Mtrk Code page 1254, valid for Turkish.

I suggest you put an /M option in your environment variable with the appropriate locale and then forget about it. The locale affects the following issues:

The /M mapping affects how GREP interprets each character. But it does not affect the appearance of characters on your screen; that’s controlled by Windows language settings, or by DOS commands like CHCP.

/V — Display Lines That Don’t Contain a Match

GREP shows or counts the lines that don’t match the regex instead of those that do.

Interactions:

The /V option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /V+.

/Y — Multiple Regexes Must All Match

When multiple regexes are given (/F option), GREP normally reports a hit if the line, record, or buffer contains a match for any of the regexes. If you also set the /Y option, GREP reports a hit only if the line, record, or buffer matches every regex, though not necessarily in order. The normal test is an OR; the test with the /Y option is an AND.

For example, if you use the /F option and enter the two regexes brown and fox, then all of these lines match:

        The quick brown fox
        I see a brown smudge
        Crazy like a fox
        The foxtail is brown 

But if you also use the /Y option, then GREP matches only lines that contain both the regular expressions, namely the first and fourth lines in the example.

As you see from the example, with the /Y option, input lines must match all the regexes, but in any order. If you want to match all regexes in a specific order, specify them as a single regex connected with period and asterisk. For instance, to match lines that contain “brown” somewhere before “fox”, use the regex brown.*fox.
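Here is the difference between "all regexes in any order" and "one regex that fixes the order", tried on the four sample lines with Python's re module as a stand-in for GREP:

```python
import re

lines = [
    "The quick brown fox",
    "I see a brown smudge",
    "Crazy like a fox",
    "The foxtail is brown",
]

# Like /Y with the regexes brown and fox: both must match, in any order.
both = [ln for ln in lines if re.search("brown", ln) and re.search("fox", ln)]
print(both)      # ['The quick brown fox', 'The foxtail is brown']

# A single regex that requires "brown" somewhere before "fox".
in_order = [ln for ln in lines if re.search("brown.*fox", ln)]
print(in_order)  # ['The quick brown fox']
```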

Interactions:

The /Y option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /Y+.

5.4  Output Options

Before reading about output options, you might like to glance over the standard forms of GREP output.

/B — Display a Header for Every File Scanned

Ordinarily, GREP displays a file header for each file that contains matches, but with the /B option GREP displays a file header for every file examined, even if the file contains no matches.

This option is meaningful only with FIND-style output, when the /U option is not set.

The /B option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /B+.

/C — Display the Hit Count, Not the Actual Hits

Use the /C option when you don’t want to display the matches, only count them. (You can use the /K option with /C to stop reading a particular file after a certain number of matches are found.)

GREP can count as high as 2^32−1 (4,294,967,295).

Ordinarily, the /C option will count the lines, records, or buffers that contain matches; in other words, even if a line contains five matches it will count only as one. However, if you use the /J2 or /J3 option with /C, GREP will count the actual matches rather than the lines containing them.
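The two ways of counting can be compared in Python. This is only a model of the distinction: count_lines and count_matches are hypothetical names, and count_matches counts non-overlapping matches, which corresponds to the /J2 description.

```python
import re

def count_lines(lines, pattern):
    """Count lines containing at least one match (like plain /C)."""
    regex = re.compile(pattern)
    return sum(1 for line in lines if regex.search(line))

def count_matches(lines, pattern):
    """Count every non-overlapping match (like /C with /J2)."""
    regex = re.compile(pattern)
    return sum(1 for line in lines for _ in regex.finditer(line))

lines = ["the cat and the hat", "no match here", "the end"]
print(count_lines(lines, "the"))    # 2
print(count_matches(lines, "the"))  # 3
```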

For free-form binary (the /R3 option), I recommend the /J2 or /J3 option since otherwise the buffer size may affect the number of matches found.

The /C option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /C+.

/H — Don’t Display Filespecs in Output

The /H option is most appropriate when you’re using GREP as a filter to extract lines from one or more named files for processing by another program, like this:

        grep /H "Directory" inputfiles | other program 

The /H option is not needed and has no effect with redirected input, such as

        grep /H "Directory" <inputfile

or

        other program | grep /H "Directory" 

GREP never displays a filespec header for redirected input.

If you want to keep the filename with each extracted line, use the /U option instead of the /H option.

The /H option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /H+.

/Jn — Display Just the Parts of Each Line That Match

Normally, GREP displays the whole line (or binary record or buffer) that contains a match. But with the /J option, you can display only the part(s) of the line that match the regex(es):

/J0 (default) Display the entire line that contains a match.
/J1 Display the first part of the line that matches the regex.

When there are multiple regexes (/F option), GREP matches the regexes in order against the line, displays the first part of the line that matches the first matching regex, and doesn’t try to match that line against any later regexes.

Historical note: The behavior of /J with no number hasn’t changed from previous releases, and it’s the same as the new /J1. But it was a half measure, and chances are good that you want to use /J2 or /J3 now that they’re available.

/J2 Display every non-overlapping part of the line that matches. (After displaying a match GREP continues from the first character after the end of the match, looking for further matches against any regex.)

For details and examples, please see Displaying every match in a file with /J2 or /J3, below.

/J3 Display every part of the line that matches every regex, allowing all overlapping matches. (After displaying a match GREP continues from the first character after the start of the match, looking for further matches against any regex.)

For details and examples, please see Displaying every match in a file with /J2 or /J3, below.

With /J0 (default) or /J1 there’s always exactly one output line for each input line that contains a match, but /J2 and /J3 can produce multiple lines of output from one input line. In fact, by allowing overlaps /J3 can report more matches than /J2.

The /J option is not allowed with the /V option, because it doesn’t make any sense to display only non-matches but display the part of each line that was a match. /J is also forbidden with the /Y option: /Y matches all regexes anywhere on the line and it’s not likely any particular part will match all of them.

Displaying Every Match in a File with /J2 or /J3

The /J2 and /J3 options were added in release 7.4 to let you display every match in the file. There’s only one difference between them: GREP /J3 reports all matches including overlapping matches, and GREP /J2 reports all non-overlapping matches.

Example (one variable-length regex): Suppose your regex is str[ab]*str and the line or buffer is straaastrbbbstr. GREP /J2 reports the match straaastr, continues scanning at character 10, and doesn’t find another match. But GREP /J3 reports the match straaastr, scans again from character 2, and reports strbbbstr as a second match.

Example (one fixed-length regex): Even with a fixed-length regex, /J2 and /J3 can give you different results. Suppose your regex is abcabc and the line is abcabcabc. GREP /J2 reports abcabc, then GREP continues scanning at character 7 and doesn’t find another match. But GREP /J3 reports abcabc (characters 1–6), then continues scanning at character 2 and reports a second match abcabc (characters 4–9).
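Both examples can be reproduced with a small Python model of the two scanning rules. The function names matches_j2 and matches_j3 are made up, and Python's re module stands in for GREP's regex engine; the only thing taken from the text is where scanning resumes after each match.

```python
import re

def matches_j2(pattern, text):
    """Non-overlapping matches: resume after the end of each match."""
    return [m.group() for m in re.finditer(pattern, text)]

def matches_j3(pattern, text):
    """Overlapping matches: resume one character after each match's start."""
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        found.append(m.group())
        pos = m.start() + 1
    return found

print(matches_j2("str[ab]*str", "straaastrbbbstr"))  # ['straaastr']
print(matches_j3("str[ab]*str", "straaastrbbbstr"))  # ['straaastr', 'strbbbstr']
print(matches_j2("abcabc", "abcabcabc"))             # ['abcabc']
print(matches_j3("abcabc", "abcabcabc"))             # ['abcabc', 'abcabc']
```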

With multiple regexes (/F option), both /J2 and /J3 display all matches in the order they occur in the input file. For instance, if you specified the regexes [A-Z]+ and [a-z]+ in that order, and GREP reads an input line wonderful Copenhagen, then GREP displays the three results wonderful, C, and openhagen in that order.

(This represents a change from release 7.4 to release 7.5. Release 7.4 of GREP displayed multiple matches in regex order — that would be C, wonderful, and openhagen for the same example.)
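The two orderings can be illustrated in Python. This is a sketch, not GREP's algorithm: the function names are hypothetical, the file-order version follows the /J2 rule (resume after the end of each match), and ties between regexes are broken in favor of the earlier regex, which is an assumption.

```python
import re

def matches_in_file_order(patterns, line):
    """Report non-overlapping matches in the order they occur in the line."""
    regexes = [re.compile(p) for p in patterns]
    found, pos = [], 0
    while pos <= len(line):
        candidates = [m for m in (rx.search(line, pos) for rx in regexes) if m]
        if not candidates:
            break
        first = min(candidates, key=lambda m: m.start())  # earliest match wins
        found.append(first.group())
        pos = max(first.end(), pos + 1)                   # always move forward
    return found

def matches_in_regex_order(patterns, line):
    """Report all matches for the first regex, then the next, and so on."""
    return [m.group() for p in patterns for m in re.finditer(p, line)]

patterns = ["[A-Z]+", "[a-z]+"]
print(matches_in_file_order(patterns, "wonderful Copenhagen"))
# ['wonderful', 'C', 'openhagen']
print(matches_in_regex_order(patterns, "wonderful Copenhagen"))
# ['C', 'wonderful', 'openhagen']
```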

Things work as you expect and GREP always reports all matches when scanning text files or record-oriented binary files (/R0 or /R2 option). But there can be some surprises with free-format binary (/R3 option) if your regex is variable length or if you have multiple regexes.

The rest of this section goes into some detail about the interaction between free-form binary and /J2 or /J3. Since the only potential problems have to do with a match that straddles a buffer boundary, you can avoid them all by setting the buffer size (/W option) at least as large as the file size, if you have enough memory (including Windows virtual memory) on your machine.

With free-format binary files (/R3 option with /J2 or /J3), GREP adopts a flexible read length. That means that GREP /J2 reports all matches in a buffer, then discards everything up through the end of the last match and reads additional characters to make a full buffer again; GREP /J3 is similar but discards everything up to and including the start of the last match. This flexible read length guarantees that GREP finds every match for a fixed-length regex, even in free-format binary files.

If your regex is variable length — if it contains quantifiers like * ? + or {,} — and a match happens to cross a buffer boundary, every match will be reported but GREP may or may not report the longest possible match.

Example (variable-length regex with buffer boundary): Your regex is aa[a-z]* and the string aabbccddeeff happens to occur in the file, but aabbccdd is at the end of one buffer and eeff is at the start of the next. In this case GREP reports aabbccdd as the match.

This situation arises only with (a) a variable-length regex in (b) a free-format binary file where (c) the end of the buffer matches the regex but (d) if you added some characters from the start of the next buffer you would have a longer match for the regex — and that’s a rare combination. If the match straddles a boundary but the part at the end of the buffer isn’t a match, GREP will keep reading and will report the whole match as long as it fits in a buffer.
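The shortened match is easy to reproduce in Python by matching against the first buffer alone and then against the whole data. The sample data and the position of the boundary are invented for illustration, and Python's re stands in for GREP's regex engine.

```python
import re

regex = re.compile(r"aa[a-z]*")
data = "1234aabbccddeeff5678"
boundary = data.index("eeff")    # assumed position of the buffer boundary

first_buffer = data[:boundary]   # ends with aabbccdd
print(regex.search(first_buffer).group())   # aabbccdd
print(regex.search(data).group())           # aabbccddeeff
```

The end of the first buffer already satisfies the regex, so the shorter match is a legitimate match in its own right; nothing signals that more letters follow in the next buffer.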

Finally, in a free-format binary file when you have multiple regexes (/F option), a match can be missed if it straddles a buffer boundary under a combination of other circumstances (below). Though I don’t recommend doing multiple regexes with the /R3 option and /J2 or /J3, GREP allows it and matches the regexes in order against the buffer. It displays all the matches for the first regex, then all the matches for the next regex, and so on. But when (a) there are multiple regexes being matched against (b) a free-format binary file, and (c) a particular buffer matches more than one regex, and (d) the later regex’s match straddles a buffer boundary, then that later match may be missed.

Example (multiple regexes with buffer boundary): Suppose your regexes are somestr and string, and the file contains somestring. Normally GREP /J2 or GREP /J3 would report somestr and string as matches. (They overlap, but they match separate regexes.) But suppose the buffer boundary happens to fall between the somestrin and the g. GREP /J2 finds and reports somestr, then starts scanning from the end of the match, which is the letter i. Coming to the end of the buffer without a match, GREP reads enough characters after in to make up a full buffer. Since that buffer begins with ing, GREP doesn’t find a match; thus the match string is missed.

In that example, GREP /J3 does the right thing: after reporting somestr it resumes scanning from the o, then reads additional characters and reports the string match. But it’s possible to construct other scenarios where /J3 would miss a straddling match: for instance, regexes of 456abcdefg and [a-z]+ where the string 456abcdefg actually exists in the file but there’s a buffer boundary between d and e. I don’t know of any way logically to guarantee that /J2 or /J3 will find all matches that straddle a buffer boundary when there are multiple regexes and the buffer matches more than one.
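
The somestring example can be traced with a small Python model of the two discard rules. The file contents, the buffer width of 12, and the function names are hypothetical, and the model covers only this one refill step rather than GREP's real buffering code.

```python
import re

REGEXES = ["somestr", "string"]
DATA = "xxxsomestring and more"   # hypothetical file contents
WIDTH = 12                        # hypothetical width: first buffer is "xxxsomestrin"

def hits(buffer):
    """Match the regexes in order against one buffer."""
    return [m for rx in REGEXES for m in re.finditer(rx, buffer)]

def next_buffer(start, last_match, mode):
    """Slide the buffer window after reporting last_match (a model only)."""
    if mode == "J2":    # discard through the end of the last match
        start += last_match.end()
    else:               # "J3": discard through the start of the last match
        start += last_match.start() + 1
    return DATA[start:start + WIDTH]

first = DATA[:WIDTH]
found = hits(first)               # only somestr fits in the first buffer
last = found[-1]

j2_second = next_buffer(0, last, "J2")   # begins with "ing": string is missed
j3_second = next_buffer(0, last, "J3")   # begins with "omestring": string is found

print([m.group() for m in found])
print(repr(j2_second), [m.group() for m in hits(j2_second)])
print(repr(j3_second), [m.group() for m in hits(j3_second)])
```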

This discussion has focused on the bad cases. Don’t lose perspective: for most practical uses /J2 or /J3 will work exactly as you expect. In particular, there are never any worries for text files or record-oriented binary; and even for free-format binary there are no worries with a fixed-length regex.

Recommendation: With /J2 or /J3, with multiple regexes or a variable-length regex, make the /W option buffer width at least as large as the file. If you can’t do this, search for multiple regexes one at a time.

/Kcount — Report Only the First Few Hits Per File

GREP stops reading each file and moves on to the next after reporting the first count hits. count may be any number from 0 to 9999. /K0 means to report all matches, and it is the default.

No special message is displayed in the output when GREP stops reading a file early because of the /K option. However, the event is noted in the debug output (/D option).

The /K option displays up to the indicated number of matches per file. There is no option in GREP to stop after displaying a certain number of matches total. But you can always redirect GREP output (>reportfile or |more) and then just look at the beginning of the output.

If you also use the /P option to report context lines before and after matches, you may see more matches than requested. For example, suppose you specify /K2P5,5 to get the first two hits per file, with five lines of context before and after each one. Five lines are reported after the second and last requested hit, naturally. Those five context lines might contain additional hits, which are shown, but the context doesn’t extend past the five lines that follow the second hit, the last one you actually requested.
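
The interaction between /K and /P described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name and the row layout are invented for illustration, and Python's re module stands in for GREP's matcher.

```python
import re

def first_hits_with_context(lines, pattern, count, before, after):
    """Return (line number, is_hit, text) rows for the first `count` hits.

    count == 0 means all hits, like /K0. Context is drawn only around the
    requested hits, but any displayed line that matches is marked as a hit.
    """
    rx = re.compile(pattern)
    hit_numbers = [n for n, text in enumerate(lines, 1) if rx.search(text)]
    requested = hit_numbers if count == 0 else hit_numbers[:count]
    shown = set()
    for n in requested:
        shown.update(range(max(1, n - before), min(len(lines), n + after) + 1))
    return [(n, n in hit_numbers, lines[n - 1]) for n in sorted(shown)]

lines = ["hit a", "x", "hit b", "hit c", "y", "z", "hit d"]
for row in first_hits_with_context(lines, "hit", count=2, before=0, after=1):
    print(row)
```

With two hits requested and one line of trailing context, line 4 appears as an extra hit because it falls inside the context of the second hit, but nothing after line 4 is shown.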

Interactions:

/L — List Files That Contain Hits, Not the Actual Hits

This option lets you get a bare list of files, usually for further processing.

The /L option and /V option together list the files that don’t contain any matches. When you bring in multiple regexes with the /F option and possibly the /Y option, things get a bit trickier:

With the /L option:

(no /V, no /Y) GREP reports every file that contains a match for one or more of the regexes.
(/Y, no /V) GREP reports a file only if at least one of its lines matches all of the regexes in any order.
(/V, no /Y) GREP reports only the files that match none of the regexes. (If any line of the input file matches even one of the regexes, GREP doesn’t report the file.)
(/V /Y) GREP reports only files that don’t contain any single line matching all of the regexes. (If the file contains even one line that matches all the regexes in any order, GREP doesn’t report the file.)
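
The four combinations above reduce to one small decision rule. The Python sketch below is an illustration only: the function name and arguments are hypothetical, and re.search stands in for GREP's matcher.

```python
import re

def lists_file(lines, regexes, v=False, y=False):
    """Decide whether /L would list a file, per the four cases above."""
    combine = all if y else any          # /Y: one line must match every regex
    has_hit = any(
        combine(re.search(rx, line) for rx in regexes) for line in lines
    )
    return has_hit != v                  # /V inverts the decision

lines = ["alpha beta", "gamma"]          # hypothetical file contents
regexes = ["alpha", "gamma"]

print(lists_file(lines, regexes))                  # True
print(lists_file(lines, regexes, y=True))          # False
print(lists_file(lines, regexes, v=True))          # False
print(lists_file(lines, regexes, v=True, y=True))  # True
```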

The /L option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /L+.

Finding Files That Match a List of Regexes

It might not be obvious how to list files that contain every one of a set of regexes, not necessarily on the same line. This can’t be done in one pass, but you can get the same effect by chaining calls to GREP:

        grep /L this * | grep /@- /L that | grep /@- /L tother

The first call to GREP identifies the files that contain “this”; the second call reads only those files and identifies those that also contain “that”; the third call reads only that smaller group of files and identifies the ones that also contain “tother”.

This sort of chain runs faster if you search first for the string you expect in the fewest files, thus minimizing the number of files that have to be read multiple times.
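
The chained calls amount to filtering a shrinking set of files once per regex. A minimal Python sketch of that idea follows; the in-memory file table and the function name are assumptions made so the example is self-contained.

```python
import re

def files_containing(files, regex):
    """Model of one `grep /L` pass: keep files with at least one hit."""
    return {name: text for name, text in files.items() if re.search(regex, text)}

# Hypothetical file contents, keyed by filename.
files = {
    "a.txt": "this and that\nand the tother",
    "b.txt": "this only",
    "c.txt": "that\ntother",
}

remaining = files
for regex in ["this", "that", "tother"]:   # each pass reads only the survivors
    remaining = files_containing(remaining, regex)

print(sorted(remaining))   # ['a.txt']
```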

If you want the actual lines that contain all three regexes, use the /Y option instead of the /L option:

        grep /F- /Y *

and when prompted enter the three regexes in succession.

/N — Show Line Numbers with Hits

FIND-style output with the /N option looks like this:

    ---------- GREP.C
    [ 144]        op_showhead = ShowNoHeads;
    [ 178]        else if (op_showhead == ShowNoHeads)
    [ 366]        op_showhead = ShowNoHeads;

    ---------- GREP_MAT.C
    [  98]        op_showhead == ShowNoHeads) 

With /N and the /U option used together, the UNIX-style output looks like this:

    GREP.C:144:        op_showhead = ShowNoHeads;
    GREP.C:178:        else if (op_showhead == ShowNoHeads)
    GREP.C:366:        op_showhead = ShowNoHeads;
    GREP_MAT.C:98:        op_showhead == ShowNoHeads) 

UNIX-style output is suitable for use with the excellent freeware editor Vim.
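
If you want to post-process UNIX-style output yourself, each hit line splits into filespec, line number, and text. The following Python sketch assumes the filespec:number:text layout shown above; the second sample path with a drive letter is hypothetical.

```python
import re

# filespec, line number, text; the lazy first group tolerates a drive letter.
UNIX_HIT = re.compile(r"^(.+?):(\d+):(.*)$")

def parse_hit(line):
    """Split one UNIX-style hit line into (filespec, line number, text)."""
    m = UNIX_HIT.match(line)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)

print(parse_hit("GREP.C:144:        op_showhead = ShowNoHeads;"))
print(parse_hit(r"C:\SRC\GREP_MAT.C:98:        op_showhead == ShowNoHeads)"))
```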

When is a “line number” not a line number? The identifying number depends on the file read options:

The /N option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /N+.

/O — Set Output Format

By default, GREP uses an output format that’s associated with the type of the input file, as shown below. But you can override that, if you wish, by using the letter /o option and one of the following modes. (Don’t confuse letter /o with the numeric /0 option.)

/o0 (default for binary files)
That’s letter oh followed by digit zero. Use raw binary output, in the form of a hex dump side by side with any printable characters (see Note 1). Here’s a sample:
        [  11]0A202863 6F646520 706F696E 74202031  >. (code point  1<
              30292020 20                          >0)              <
        [  12]0B202863 6F646520 706F696E 74202031  >. (code point  1<
              31292020 20                          >1)              <
        [  13]0C202863 6F646520 706F696E 74202031  >. (code point  1<
              32292020 20                          >2)              <
        [  14]0D202863 6F646520 706F696E 74202031  >. (code point  1<
              33292020 20                          >3)              <
        [  15]0E202863 6F646520 706F696E 74202031  >. (code point  1<
              34292020 20                          >4)              <

For another sample, see How Does GREP Display Hits?
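
The layout of the sample above (16 bytes per row in groups of four, followed by the printable characters between > and <) can be imitated with a short Python sketch. The function name is hypothetical, and the printable range follows the GREP16 rule in Note 1; this is an imitation of the format, not GREP's code.

```python
def hex_dump(record: bytes, number: int) -> list[str]:
    """Format one record like the /o0 sample: hex on the left, text on the right."""
    rows = []
    for offset in range(0, len(record), 16):
        chunk = record[offset:offset + 16]
        groups = [chunk[i:i + 4].hex().upper() for i in range(0, len(chunk), 4)]
        hex_part = " ".join(groups).ljust(35)       # 4 groups of 8 digits + 3 spaces
        text = "".join(chr(b) if 32 <= b <= 126 else "." for b in chunk).ljust(16)
        prefix = f"[{number:4d}]" if offset == 0 else " " * 6
        rows.append(f"{prefix}{hex_part}  >{text}<")
    return rows

# The bytes of record 11 in the sample above.
for row in hex_dump(b"\n (code point  10)   ", 11):
    print(row)
```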

/o1 (default for text files read as paragraphs, /G2 option)
Display text output naïvely — similar to the output of the type command on the Windows command line. If the file contains only printable characters and line breaks, this is the best mode. If it contains control characters like backspace (ASCII 8), tab (9), Control-Z (26), or NUL (0), it may not display properly.

Before GREP release 8.0, this was the output mode for all text files, and it’s still the default output for paragraph input. I suggest you select this output mode for all plain text input files. Here’s what the output shown above would look like with /o1:

[Image: program output, with funny characters and formatting]

You may have different funny characters at code points 11, 12, and 14, depending on your country and code page among other things. Regardless, /o1 isn’t a good choice for text files that contain control characters.

One interesting side effect of /o1 mode is that it’s perfect for converting UNIX or Macintosh-format text files to Windows format:

        grep /o1 .* <unixfile >dosfile

Implementation note: The other output modes all format one character at a time, but this mode just dumps the whole line — that’s puts( ), for C programmers.
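
For comparison, here is what that conversion amounts to when written out in Python. It assumes the goal is simply to end every line with CR+LF; the function name is hypothetical, and this is not how GREP performs the conversion internally.

```python
import re

def to_windows_line_endings(data: bytes) -> bytes:
    """Rewrite UNIX (LF) and old Macintosh (CR) line breaks as CR+LF."""
    # CR+LF is listed first so an existing Windows line break is left as one break.
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

print(to_windows_line_endings(b"unix\nmac\rdos\r\nend"))
```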

/o2 (default for text files read as lines)
Display text output with this “safety filter”: control characters (ASCII 0–27) in ^X format, printable characters (see Note 1) as themselves, and other characters as the sequence <NN> where NN is the two-digit hexadecimal code for the character. The same output looks like this under /o2:
        [  11]^J (code point  10)
        [  12]^K (code point  11)
        [  13]^L (code point  12)
        [  14]^M (code point  13)
        [  15]^N (code point  14)

This filter is recommended when you have text that may contain a sprinkling of control characters, and it’s the default for text files read as lines rather than paragraphs.

/o3
Display text output with this “control filter”: Start a new line after any CR+LF or LF+CR pair or any single CR or LF. With this filter, CR (^M) and LF (^J) are displayed but also have the control function of “new line”; all other characters are treated the same as for /o2. Here’s the above output once more, this time with /o3:
        [  11]^J
               (code point  10)
        [  12]^K (code point  11)
        [  13]^L (code point  12)
        [  14]^M
               (code point  13)
        [  15]^N (code point  14)
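
The /o2 and /o3 filters can be sketched together in Python, since /o3 differs only in how it treats CR and LF. Several details are assumptions: the function names are invented, the control range follows the 0–27 figure stated for /o2, printable means ASCII 32–126 (the GREP16 rule in Note 1), and uppercase hex digits are a guess.

```python
CR, LF = 13, 10

def show(byte: int) -> str:
    """Render one byte the way the /o2 description reads."""
    if byte <= 27:                       # range as stated in the text above
        return "^" + chr(byte + 64)      # 10 -> ^J, 13 -> ^M
    if 32 <= byte <= 126:                # printable per Note 1 (GREP16 rule)
        return chr(byte)
    return f"<{byte:02X}>"               # hex case is an assumption

def safety_filter(data: bytes) -> str:
    """/o2 model: every character on one display line."""
    return "".join(show(b) for b in data)

def control_filter(data: bytes) -> list[str]:
    """/o3 model: same rendering, but CR and LF also end the display line."""
    lines, current, i = [], "", 0
    while i < len(data):
        b = data[i]
        current += show(b)
        if b in (CR, LF):
            nxt = data[i + 1] if i + 1 < len(data) else None
            if nxt in (CR, LF) and nxt != b:   # CR+LF or LF+CR counts once
                current += show(nxt)
                i += 1
            lines.append(current)
            current = ""
        i += 1
    if current:
        lines.append(current)
    return lines

record = b"\n (code point  10)"
print(safety_filter(record))     # ^J (code point  10)
print(control_filter(record))    # ['^J', ' (code point  10)']
```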

/o-1
Select an output format to match the input; this is GREP’s default. The normal selections, unless you override them, are that binary input is displayed with /o0, paragraph input is displayed with /o1, and line-oriented text input is displayed with /o2.

Note 1: Printable characters in GREP16 are ASCII 32–126. In GREP32, the definition of a printable character depends on your locale, which you can set with the /M option.

Note 2: With the /R-1 or /R-2 option you can specify two output formats. For instance, grep /R-1 /o3,0 says that you want text files displayed in mode 3 and binary files in mode 0 (zero).

/Pbefore,after — Show Context Lines around Matching Lines

GREP shows matches in context by displaying the given number of input lines before each match (before) and after each match (after). If you omit after, GREP uses the before count on both sides of each match. Plain /P is the same as /P2,2.

Either number can be 0. For instance, use /P0,4 if you want to show every match and the four lines that follow it. /P0 or /P0,0 tells GREP to show only the matching lines with no context lines, and is the default.
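
The defaulting rules for the two numbers can be summarized in a short Python sketch. The function is hypothetical and handles only the text that follows /P; it does not model other options run together with it, as in /P1,1N.

```python
def parse_context(arg: str) -> tuple[int, int]:
    """Interpret the text after /P as (before, after) line counts."""
    if arg == "":
        return 2, 2                      # plain /P is the same as /P2,2
    before, _, after = arg.partition(",")
    if after == "":
        after = before                   # omitted after: reuse before
    return int(before), int(after)

print(parse_context(""))      # (2, 2)
print(parse_context("3"))     # (3, 3)
print(parse_context("0,4"))   # (0, 4)
```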

An alternative way to show context is provided by the /G2 option.

If you use the /P option, you probably want to use the /N option as well, to display line numbers. In that case, the punctuation of the line numbers distinguishes which lines are actual matches and which are displayed for context. Here is some FIND-style output from a run with the options /P1,1N set:

        ---------- GREP.C
          143     if (opcount >= argc)
        [ 144]        op_showhead = ShowNoHeads;
          145
          177             PRTDBG "with each matching line");
        [ 178]        else if (op_showhead == ShowNoHeads)
          179             PRTDBG "NO");
          365     if (myToggle('L') || myToggle('U'))
        [ 366]        op_showhead = ShowNoHeads;
          367     else if (myToggle('B'))

        ---------- GREP_MAT.C
           97         op_showwhat == ShowMatchCount ||
        [  98]        op_showhead == ShowNoHeads)
           99         headered = TRUE; 

You can see that the actual matches have square brackets around the line numbers, and the context lines do not.

In UNIX format, with the /U option in addition to /N /P, GREP displays colons around the numbers of matching lines and spaces around the numbers of context lines:

        GREP.C 143     if (opcount >= argc)
        GREP.C:144:        op_showhead = ShowNoHeads;
        GREP.C 145
        GREP.C 177             PRTDBG "with each matching line");
        GREP.C:178:        else if (op_showhead == ShowNoHeads)
        GREP.C 179             PRTDBG "NO");
        GREP.C 365     if (myToggle('L') || myToggle('U'))
        GREP.C:366:        op_showhead = ShowNoHeads;
        GREP.C 367     else if (myToggle('B'))
        GREP_MAT.C 97         op_showwhat == ShowMatchCount ||
        GREP_MAT.C:98:        op_showhead == ShowNoHeads)
        GREP_MAT.C 99         headered = TRUE; 
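
The punctuation rules in the two samples above can be captured by one small formatting function. This Python sketch is inferred from the sample output only; the function name and parameters are hypothetical, and the FIND-style file header line is left out.

```python
def format_line(filespec, number, text, is_hit, unix=False):
    """Format one output line in the /N /P layouts shown above."""
    if unix:                                   # with /U
        mark = ":" if is_hit else " "
        return f"{filespec}{mark}{number}{mark}{text}"
    if is_hit:                                 # FIND style, matching line
        return f"[{number:4d}]{text}"
    return f" {number:4d} {text}"              # FIND style, context line

print(format_line("GREP.C", 143, "    if (opcount >= argc)", False))
print(format_line("GREP.C", 144, "        op_showhead = ShowNoHeads;", True))
print(format_line("GREP.C", 143, "    if (opcount >= argc)", False, unix=True))
print(format_line("GREP.C", 144, "        op_showhead = ShowNoHeads;", True, unix=True))
```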

Interactions between the /P option and the /R option:

GREP16 has to allocate space for the preview lines within the same 64 KB data segment as all other data. Consequently, if you specify a moderately large value, particularly with a large line width (/W option), you may get a message that GREP can’t allocate space for the lines. To resolve this, use GREP32 if possible; otherwise reduce either the line width or the first number after /P (the before number). The second number, after, has no effect on memory use.

/U — UNIX-style Output: Show Filespec with Each Hit

GREP shows the filespec on the line with each hit, instead of just once in a separate header. This UNIX-style output is useful with editors like Vim that can automatically jump to the file that contains a match. Some examples of UNIX-style output were given in List of Hits and with the /N option and the /P option.

There’s one small difference from UNIX grep output: UNIX grep suppresses the filespec when there’s only one input file, but GREP assumes that if you didn’t want the filespec you wouldn’t have specified the /U option. Neither GREP nor UNIX grep displays a filespec if input comes from a file via < redirection.

The /U option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /U+.

5.5  General Options

/Dfile or /D- or /D — Display Debugging Output

Debugging information includes whether you’re running GREP16 or GREP32, the contents of the environment variable, the values of all options specified or implied, the input files specified, the raw and interpreted values of the regex(es), details of every file scanned, execution timings, and more. This information is normally suppressed, but you may find it helpful if GREP seems to behave in a way you don’t expect or if you have a bug report.

Since the debugging information can be voluminous, if you want to see it at all you’ll usually want to specify an output file:

/Dfile Write all debug information to the given filespec. file must follow the D with no intervening space, and ends at the next space; it must not contain wildcards. GREP appends to the file if it already exists.
/D- Send debugging information to the standard output, which you can redirect (>) or pipe (|). This intersperses debug information with the normal output of GREP.
/D Send debugging information to the standard error stream (normally the screen). Be careful not to specify any other options between /D and the next space, or they’ll be taken as a filespec.

You can weed through the debugging output to some extent. GREP writes the following unique strings on most lines of debugging output, so that you can send debug output to a file and then grep the file for the bits that interest you:

/Qlevel — Suppress the Logo and Unwanted Warnings

You can set the quietness level to suppress messages you may not want to see:

/Q0 (default) Show all messages.
/Q1 Suppress the information messages, namely the program logo and the final message with counts of matches and files; all warnings still appear.
/Q2 Suppress the information messages, as well as warnings about invalid combinations of options. Warnings about missing files still appear, and so does the warning about lines that were broken in the middle, possibly missing matches (see the /W option).
/Q3 Suppress all information messages and warnings; only the actual matches (if any) will appear. This level is not recommended unless you definitely know what you’re doing, because you might miss important error messages about your input files.

Messages that force GREP to stop execution — user alerts, failure messages, and insufficient memory — are always displayed. The /Q option also has no effect on debug output (/D option).

All messages are listed later in this manual.

For compatibility with earlier releases of GREP, if you specify a plain /Q option with no level number, it means /Q3 (suppress all warnings). After any /Q option with or without a number, a plain /Q acts like /Q0 to re-enable all messages.
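
The quietness levels and the special handling of a plain /Q can be modeled as follows. This Python sketch is an interpretation of the rules above; the message-kind names and both function names are invented for illustration.

```python
def quiet_level(options):
    """Resolve a sequence of /Q options (e.g. ["Q2", "Q"]) to a level 0-3."""
    level, seen = 0, False
    for opt in options:
        if opt == "Q":
            level = 0 if seen else 3     # plain /Q: /Q3 first, /Q0 afterwards
        else:
            level = int(opt[1:])
        seen = True
    return level

# Lowest level at which each kind of message disappears (names are ad hoc).
SUPPRESSED_AT = {
    "information": 1,          # logo and final counts
    "option warning": 2,       # invalid combinations of options
    "file warning": 3,         # missing files, lines broken in the middle
}

def is_shown(kind, level):
    """Messages that stop execution are not in the table and always appear."""
    return level < SUPPRESSED_AT.get(kind, 4)

print(quiet_level(["Q"]))                 # 3
print(quiet_level(["Q2", "Q"]))           # 0
print(is_shown("file warning", 2))        # True
```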

See also: Several of the output options act to reduce GREP’s actual output, as opposed to messages. See the /H option, the /J option, the /K option, the /C option, and the /L option.

/Z — Reset All Options

If you use the /Z option on the command line, any options in the environment variable are disregarded, and so are any preceding options on the command line. I recommend putting /Z as the first option on every GREP command in a batch file. This makes sure that GREP behaves as expected, uninfluenced by any settings in the environment variable.

The /Z option is the only single-letter option whose effect can’t be reversed. If you use /Z more than once, GREP disregards the environment variable and all command-line options up through the last /Z.
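
The combined effect of /Z and the toggle options (/L, /N, /U, and so on) can be illustrated with a small Python model. It assumes options arrive as separate uppercase tokens, and it models only toggles; the function name is hypothetical.

```python
def effective_toggles(env_options, command_options):
    """Return the set of toggle letters left on after all options are read."""
    command_options = list(command_options)
    if "/Z" in command_options:
        # Disregard the environment variable and everything through the last /Z.
        last_z = len(command_options) - 1 - command_options[::-1].index("/Z")
        options = command_options[last_z + 1:]
    else:
        options = list(env_options) + command_options
    on = set()
    for opt in options:
        letter = opt[1]
        if opt.endswith("+"):
            on.add(letter)                             # /L+ forces the option on
        else:
            on.symmetric_difference_update({letter})   # plain /L flips it
    return on

print(effective_toggles(["/L"], ["/L"]))                  # set()
print(effective_toggles(["/L"], ["/L+"]))                 # {'L'}
print(effective_toggles(["/L", "/N"], ["/U", "/Z", "/N"]))  # {'N'}
```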

/0 or /1 — Set ERRORLEVEL to Show Whether Matches Were Found

These options control the values that GREP returns to the command shell. /0 returns 0 if there are matches or 1 if there are no matches; /1 returns 1 for matches or 0 for no matches. For more details and interactions with the /V option, see Return Values (ERRORLEVEL).

Be careful to distinguish the zero /0 option from the letter /o option.

/3 — Set ERRORLEVEL to 3 for Warnings

By default, GREP considers a run “successful” if there were no error messages, whether or not there were any warnings. In other words, the value of ERRORLEVEL doesn’t reflect any warning messages.

If you’d like to test for a “partially successful” run of GREP, meaning one where there were warnings but no errors, specify the /3 option. If you do this, then if there were warnings GREP will set ERRORLEVEL to 3 rather than the usual 0 or 1 that would indicate success.
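
The return values described for /0, /1, and /3 can be condensed into one function. This Python sketch covers only runs that end without errors and ignores the /V interactions covered elsewhere in the manual; the function and parameter names are hypothetical.

```python
def errorlevel(matches_found, mode="0", warnings=False, option_3=False):
    """Model the success-path return value for /0, /1, and /3."""
    if warnings and option_3:
        return 3                          # /3: warnings take over the result
    if mode == "0":
        return 0 if matches_found else 1  # /0: 0 means matches were found
    return 1 if matches_found else 0      # /1: 1 means matches were found

print(errorlevel(True, mode="0"))                                # 0
print(errorlevel(True, mode="1"))                                # 1
print(errorlevel(False, mode="0", warnings=True, option_3=True))  # 3
```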

The /3 option is a toggle. If you specify it twice (counting any uses in the environment variable and on the command line), the second occurrence cancels the first. If you don't know what's in the environment variable and definitely want to turn this option on, use /3+.

/? — Display Help

GREP displays a help message and summary of input filespecs, options, and regex forms, then exits with no further processing.

Since the help message is more than 100 lines long, you probably want to pipe it through more or a similar filter, like this:

        grep /? | more 

You can also redirect this information. For instance,

        grep /? >grephelp.txt 

sends the help text to a text file so that you can print it or view it in an editor window.

6. Messages

This section lists the error, warning, and information messages and prompts produced by GREP, with explanations for most of them. Only debug messages (/D option) are omitted.

All messages listed here are written to the standard error stream. If you redirect GREP's output, the redirected output will contain only the hits found by GREP (or the counts per file, if you used the /C option).

6.1  Failure Messages

Any message that begins “GREP failure” indicates that GREP failed. While this might be a problem in your operating system, it could also be a problem in the code of GREP itself. If you suspect the latter, please send full details to BrownMath.com. If possible, first re-run the program with the /D- option and redirect output with >file; then send that output file with your trouble report.

With most of these errors, GREP returns 128 in ERRORLEVEL. Exceptions are noted in the description.

grep failure: expression length > n in expand_char_class

Your [ ] character class expands into too many characters for a basic regex. You can probably complete your task by using the /E2 option to specify an extended regex. Please consult the description of differences between basic and extended regexes.

grep failure: expression too complex in expand_char_class

See the following explanation.

grep failure: expression too complex in makepat

GREP could not parse your basic regex because it was too complicated. You may be able to complete your task by using the /E2 option to specify an extended regex. Please consult the description of differences between basic and extended regexes.

grep failure: internal error in expand_char_class

GREP got into a fugue state over your character class [ ] in a basic regex. Please report this problem to the address above.

You can probably complete your task by using the /E2 option to specify an extended regex. Please consult the description of differences between basic and extended regexes.

grep failure: internal error (pattern) in skip_match

Two lists in GREP of the constructs allowed in a basic regex are out of sync. Please report this to the address indicated above.

grep failure: internal error (pattern) in skip_pat

Two lists in GREP of the constructs allowed in a basic regex are out of sync. Please report this to the address indicated above.

grep failure: no read function in do_stream

Two lists in GREP of the file read modes are out of sync. Please report this error to the address indicated above.

6.2  Insufficient Memory

grep: insufficient memory ...

GREP couldn’t allocate enough memory from the heap, and returns 253 in ERRORLEVEL. You might try one or more of these general suggestions:

6.3  User Alert Messages

These messages all indicate something you did that prevents GREP from finishing its task. (Most programs would call them “fatal errors.”) Except as noted, GREP returns 255 in ERRORLEVEL after any of these.
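
A script that launches GREP may want to translate the ERRORLEVEL values mentioned in this manual into readable text. The lookup table below is a sketch that collects only values stated in this manual; the wording of each entry and the function name are my own, and any value not listed is reported as undocumented rather than guessed.

```python
MEANINGS = {
    0: "match status (see the /0 and /1 options)",
    1: "match status (see the /0 and /1 options)",
    3: "warnings were issued (only with the /3 option)",
    4: "no files were found for any input filespec",
    128: "GREP failure",
    253: "insufficient memory",
    254: "user alert: a file could not be opened or read",
    255: "user alert",
}

def describe(code):
    """Return a short description of a GREP return value."""
    return MEANINGS.get(code, "not documented in this section")

for code in (0, 4, 128, 254, 7):
    print(code, describe(code))
```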

grep user alert: bad chars or no value in /x option

The option requires a following number, but either you didn’t give one or you included forbidden characters. (For instance, some options allow negative numbers and others don’t.) Please consult the description of the option.

grep user alert: bad token 'token' in environment variable ORS_GREP -- options must start with - or /

You can’t store a regular expression or input filespecs in the environment variable, only options. Type

            echo %ORS_GREP%

to see the contents of the variable. (You can store a regex or input filespecs in a file and reference them in the environment variable with the /F option or /@ option respectively.)

grep user alert: can't open debug file file for append

You specified a file with the /D option, but it can’t be opened for output. Check whether your disk is full or write protected, or the file is in use by another process. GREP returns 254 in ERRORLEVEL.

grep user alert: can't open file file to read input filespecs

The file you specified with the /@ option either doesn’t exist or can’t be opened for reading. GREP returns 254 in ERRORLEVEL.

grep user alert: can't open file file to read regular expressions

The file you specified with the /F option either doesn’t exist or can’t be opened for reading. GREP returns 254 in ERRORLEVEL.

grep user alert: characters out of order in regex

You used a character range (- between square brackets), but the characters were out of order. For instance, [a-Z] is an error because Z (ASCII 90) precedes a (ASCII 97) in the collating sequence.

grep user alert: character class 'class' never ended

Your regex specified a [ to begin a character class, but there was no ] or the only ] character immediately followed the [ or [^.

To search for an actual left square bracket character, you need to precede it with a backslash \[ or else use [[] to make it a class all its own.

grep user alert: empty character class

This error message is obsolete since [] now means a character class including a ] character, not an empty class. If it occurs, please report this error to the address indicated above.

grep user alert: error at offset n of extended regex: details

Offset 0 is the first character of the extended regex.

grep user alert: for /E2 to /E4 you need 32-bit GREP

The /E2 option specifies extended regexes, but these are not supported in GREP16. Either remove the /E2, /E3, or /E4 option, or use GREP32.

grep user alert: invalid option 'x'

For a quick list of options, please type grep /? | more or look at the GREP Quick Reference Card or the GREP Quick Start Guide. You’ll find the full descriptions in this manual.

grep user alert: line n of listfile file exceeds limit chars

Each input filespec read from file (/@ option) is limited to the longest path and filename allowed by the Microsoft run-time code. The limit is 128 characters in GREP16 and 260 in GREP32.

grep user alert: line n of regex file file exceeds 127 chars

Each regex read from file (/F option) is limited to 127 characters, even if you are reading extended regexes. This limit could be increased in a future release if it’s too restrictive.

grep user alert: malformed filespec

You specified an improper character class […] in filename globbing (wildcard expansion).

grep user alert: malformed /X pattern

Your exclusion pattern was not valid. The error message gives details of the problem, or you can consult the description of the /X option.

grep user alert: no input filespecs found in file

You specified the /@ option to submit a list of input filespecs in a file, but that file was empty.

grep user alert: no more than one @ option is allowed

grep user alert: no more than one F option is allowed

grep user alert: no regex was specified

You didn’t specify a regex on the command line, and either

grep user alert: nothing on command line

You didn’t specify options, a regex, or any files. If you were trying to generate the help message, use

            grep /? | more

grep user alert: pattern exceeds the limit of 127 characters

A basic regex can’t be longer than 127 characters. If this limit is truly a problem, it could be increased in a future release. In the meantime, you may be able to complete your task by using the /E2 option to specify an extended regex. Please consult the description of differences between basic and extended regexes.

grep user alert: /R1 is reserved for future use.

grep user alert: read error in filespec

After initially opening the named input file for reading, GREP received an error from the operating system when it tried to read another line. Perhaps some other process deleted the file while GREP had it open, or the disk drive became unavailable. GREP returns 254 in ERRORLEVEL.

grep user alert: regex can't contain \0; use [^\1-\255]

While the NUL character (ASCII 0) causes no problem in an input file, it signals the end of a basic regex. Either use the workaround suggested by the message, or use an extended regex (/E2 option).

grep user alert: search string too long in makestr

With the /E0 option, your search string must be 511 characters or less. You may be able to complete your task by using the /E2 option to specify an extended regex, but you may need to put backslashes before certain characters. Please consult the description of extended regexes.

grep user alert: second argument to M option was not recognized

The second argument to the /M option allows only specified strings.

grep user alert: the @ option requires a filespec or a hyphen

Please see the description of the /@ option.

grep user alert: the F option requires a filespec or a hyphen

Please see the description of the /F option.

grep user alert: the V option is incompatible with J

The /J option displays only matches, not the full line, record, or buffer containing them. The /V option displays lines, records, or buffers that don’t contain a match. Remove one of the options and run GREP again.

grep user alert: the X option requires a pattern

Please see the description of the /X option.

grep user alert: the X option pattern must not include a path

Exclusion patterns are tested only against the filename and extension, not the path. Please see the description of the /X option.

grep user alert: the Y option is incompatible with J

The /J option displays only matches, not the full line, record, or buffer containing them. The /Y option displays lines, records, or buffers that match every one of multiple regexes. Remove one of the options and run GREP again.

grep user alert: unsupported locale x in M option

See the description of the /M option for the supported locales. Some additional locales are supported, but if you look at Microsoft’s documentation you’ll see some locales listed that are not actually supported in the run-time library.

grep user alert: value out of range (must be min to max) for option

You specified a numeric value that is not allowed for the option. Please check the option description in this manual. Note that GREP16 and GREP32 have different valid ranges for some options.

6.4  Warning Messages

You can suppress most of these warnings with the /Q option as indicated in the message description.

grep warning: binary buffer width was increased to bnwid bytes

If you specify the /R3 or /R-1 or /R-2 option, GREP needs the binary buffer to be an even number of bytes. If you specify an odd number of bytes in the /W option, GREP adjusts it to the next higher even number.

(warning suppressed by /Q2 or higher)

grep warning: line n of listfile file is empty -- ignored

Blank lines are ignored when reading input filespecs from file (/@ option).

(warning suppressed by /Q2 or higher)

grep warning: line n of regex file file is empty -- ignored.

Blank lines are ignored when reading regexes from file (/F option).

(warning suppressed by /Q2 or higher)

grep warning: no files exist like filespec

At the end of execution, GREP checks whether it opened at least one file for each filespec on the command line. It displays this warning for each filespec that didn’t match any actual files. If you have the /S option in effect for subdirectory searches, this warning appears for each filespec when not even one directory contains a file that matches the filespec.

(GREP performs a similar diagnosis for each filespec while reading a list file; see the /@ option.)

If you used the /X option, GREP adds the reminder “Maybe your /X exclusions ruled out matching files?” See Missing Files.

(warning suppressed by /Q3)

grep warning: no files were found for any of your input filespecs

GREP displays the preceding warning (“no files exist like filespec”) for each input filespec that doesn’t lead to opening any files. (No warning is displayed for files that exist but contain no hits.) If you have multiple filespecs on the command line or you used the /@ option, and if none of your input filespecs actually led to opening any files, GREP displays this additional warning.

If no files are found for any of your input filespecs, and there are no more serious problems, GREP returns 4 in ERRORLEVEL whether or not this warning is displayed.

(warning suppressed by /Q2 or higher)

grep warning: redirected input (<file) is ignored with named input files

You specified some input files on the command line, but you also redirected input and you weren’t using the redirected input for a list of regexes with the /F- option or a list of input files with the /@ option.

(warning suppressed by /Q3)

grep warning: short record in file file -- expected bnwid bytes, got n. Did you specify the /W option correctly?

With fixed-length binary records (/R2 option), the file size should be an exact multiple of the record size that you specify with the /W option. A partial record at the end means either that you specified the record size incorrectly, or that you meant to use the /R3 option to read free-form binary.
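
The size check behind this warning is simple arithmetic, sketched here in Python. This is only an illustration of the rule described above, not GREP’s actual code; the function name and the sample numbers are assumptions made up for the example.

```python
def trailing_partial_record(file_size: int, record_width: int) -> int:
    """Return the number of leftover bytes after the last full record.

    A result of 0 means the file divides evenly into fixed-length records.
    """
    if record_width <= 0:
        raise ValueError("record width must be positive")
    return file_size % record_width


# Hypothetical example: a 1000-byte file read as 64-byte records.
# 15 full records use 960 bytes, so 40 bytes form a short final record.
print(trailing_partial_record(1000, 64))

# A 1024-byte file divides evenly into 64-byte records: no short record.
print(trailing_partial_record(1024, 64))
```

A nonzero result corresponds to the situation in which the warning appears.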

(warning suppressed by /Q3)

grep warning: Some matches in the middle of long lines may have been missed. You might want to try the /Wn option, or drop the /G0 option.

When reading text files, GREP releases 7.5 and later automatically expand the read buffer to accommodate the longest line in the input. You’ll never see the above message unless you use the /G0 option and have long lines in your input.

With the /G0 option, GREP uses a fixed-size buffer for text files and breaks long lines in the middle. (See the /W option.) In this case, GREP keeps track of every line that is longer than your stated maximum. At the end of the run, it gives you this warning and suggests the value needed with /W to solve the problem.

You should either re-run GREP with the suggested /W option value, or drop the /G0 option, to make sure you don’t miss any matches. If you want to know which files have the oversize lines, use the /D option.

If the /G0 option is in your environment variable, you can override it by putting the /G1 option on the command line.

(warning suppressed by /Q3)

grep warning: Special Rules for the Command Line don't apply with F option

The Special Rules are a set of hacks to let you get various special characters, which have special meaning to the command prompt, into a regex. (The Special Rules, turned on by default, can be turned on or off with the /E option.) When you use the /F option to enter regexes from file or keyboard, there is no need for those hacks and they are not applied.

(warning suppressed by /Q2 or higher)

grep warning: second argument to M option requires E2 or greater

The long form of the /M option lets you redefine a “word” character for purposes of extended regexes (/E2 option). That has no effect with simple searches or basic regexes (/E0 or /E1 option).

(warning suppressed by /Q2 or higher)

grep warning: the A option is ignored when scanning only standard input

The /A option says to include hidden and system files when expanding wildcard filespecs, but that doesn’t make any sense when no input files were named. If you didn’t specify any input files on the command line or via the /@ option, the /A option is ignored.

(warning suppressed by /Q2 or higher)

grep warning: the B option is ignored when L is set

The /B option shows the name of every file read, whether or not it contains any hits. But the /L option shows only the names of files that contain hits. If you specify both options, the /L option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the B option is ignored when U is set

The /B option shows the filespec of every file read, whether or not it contains any hits, on a separate header line. But the /U option shows hits in UNIX style, with the filespec on every line. If you specify both options, the /U option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the E3 option is treated as E2 plus J1

GREP release 6.0 added the ability to display matches without the lines, records, or buffers that contained them (like the present /J1 option), but only when you specified extended regexes (like the present /E2 option). That combination was specified by /E3. In the next release the /J option was added, independent of your choice of basic or extended regex, and /E3 became obsolete; then in release 7.4 /J became /J1. /E3 is still honored for users who may have embedded it in batch files or their environment variable.

(warning suppressed by /Q2 or higher)

grep warning: the G option is ignored with /R2 or /R3

The /R2 or /R3 option tells GREP to read all files as binary. With that setting, the /G option, which tells GREP how to handle text lines, is useless.

(warning suppressed by /Q2 or higher)

grep warning: the H option is ignored when B is set

The /B option shows a file header for every file examined, whether or not it contains any hits. The /H option suppresses all file headers. If you specify both options, the /B option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the H option is ignored when C is set

The /C option shows the count of hits with every file header (and doesn’t show the actual hits), but the /H option suppresses all filespec headers. If you specify both options, the /C option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the H option is ignored when L is set

The /L option shows the names of files that contain hits (and doesn’t show the actual hits), but the /H option suppresses all filespec headers. If you specify both options, the /L option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the H option is ignored when U is set

The /U option shows matches in UNIX style, putting the filename on every line instead of displaying filename headers. The /H option suppresses filename headers, and therefore it is included in the action of the /U option.

(warning suppressed by /Q2 or higher)

grep warning: the J option is ignored when L is set

The /L option shows only the filespecs of files that contain matches, but the /J option shows actual matches. If you specify both options, the /L option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the J1 option is ignored when C is set

The /C option shows the count of hits with every file header (and doesn’t show the actual hits). The /J1 option shows actual matches (though not the lines, records, or buffers that contain them). If you specify both options, the /C option is honored.

Notice that this message mentions the J1 option specifically. You can use /J2 /C or /J3 /C to count individual matches rather than the number of lines that contain matches.
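
The difference between counting lines that contain matches and counting individual matches can be sketched in Python. This uses Python’s re module rather than GREP itself, the sample lines are invented, and it assumes that individual matches are counted without overlap.

```python
import re

# Hypothetical input lines; the regex is the plain string "the".
lines = ["the cat sat on the mat", "a dog barked", "the end"]
pattern = re.compile(r"the")

# Count of lines that contain at least one match.
matching_lines = sum(1 for line in lines if pattern.search(line))

# Count of individual (non-overlapping) matches across all lines.
individual_matches = sum(len(pattern.findall(line)) for line in lines)

print(matching_lines)      # 2
print(individual_matches)  # 3
```

The first line contains two matches, so the two counts differ.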

(warning suppressed by /Q2 or higher)

grep warning: the K option is ignored when L is set

The /K option displays a set maximum number of hits per file. The /L option stops and reports the file name as soon as it finds one hit. If you specify /L, that option is honored and /K is ignored.

(warning suppressed by /Q2 or higher)

grep warning: the L option is ignored when C is set

The /C option shows the filespecs of files that contain hits, with the count of hits in each file, but the /L option shows only the filespecs without the count of hits. If you specify both options, the /C option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the M option is available only in 32-bit GREP

Microsoft’s 16-bit run-time code doesn’t support locale settings, which are required to implement the /M option.

(warning suppressed by /Q3)

grep warning: the N option is ignored when C is set

The /C option shows the count of hits with every file header (and doesn’t show the actual hits), but the /N option shows line numbers with the hits. If you specify both options, the /C option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the N option is ignored when L is set

The /L option shows the filespecs of files that contain hits (not the actual hits), but the /N option shows line numbers with the hits. If you specify both options, the /L option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the O option is ignored when C is set
grep warning: the O option is ignored when L is set

The /C option shows the filespecs of files that contain hits, with the count of hits in each file, and the /L option shows only the filespecs without the count of hits. The /O option (the letter O) specifies the format of output lines, but with the /C option or /L option there aren’t any output lines.

(warning suppressed by /Q2 or higher)

grep warning: the P option is ignored when C, J, or L is set

The /P option displays context lines or records around every line or record that contains a match. The /C option, /J option, and /L option all display abbreviated information instead of the actual lines or records that contain matches. If you specify the /P option with any of the others, the other option is honored.

Exception: When the /R3 option and the /P option are both set, then the /P option specifies context bytes rather than lines, and the /J option is not ignored but required.

(warning suppressed by /Q2 or higher)

grep warning: the P option is ignored when /R3 is set but /J isn't

When you specify free-form binary (the /R3 option), the /P option refers to context bytes rather than context lines. But without the /J option, you get the whole buffer anyway. The /P option doesn’t make any sense in this case, and it’s ignored. Probably you want to run GREP again and specify a /J option.

(warning suppressed by /Q2 or higher)

grep warning: the P option is ignored with /Gmode

The /G2 option tells GREP to read and search paragraphs rather than lines. This is a more flexible way to provide context, and therefore the /P option (display context lines or records) isn’t needed with /G2.

(warning suppressed by /Q2 or higher)

grep warning: the R option applies only to named files, not standard input

You didn’t specify any filespecs on the command line or via the /@ option, but you used the /R option to specify some file format other than text. GREP always reads redirected input files (<file) and keyboard input as text.

(warning suppressed by /Q3)

grep warning: the S option is ignored when scanning only standard input

The /S option says to search subdirectories for the named files, but that can’t be done when GREP is reading only standard input because no input files were named on the command line or via the /@ option.

(warning suppressed by /Q2 or higher)

grep warning: the U option is ignored when L is set

The /L option shows the names of files that contain hits (not the actual hits), but the /U option shows hits in UNIX format (with the filespec on each line). If you specify both options, the /L option is honored.

(warning suppressed by /Q2 or higher)

grep warning: the X option is ignored when reading only standard input

You didn’t specify any filespecs on the command line or via the /@ option, but you used the /X option on the command line to exclude certain filespecs. When input is from standard input, the /X option is ignored. It is ignored silently if the /X options are in the environment variable but not on the command line.

(warning suppressed by /Q2 or higher)

grep warning: the Y option is ignored unless you read regexes from file

The /Y option says that a hit must match all of the (multiple) regexes. But you can specify only one regex on the command line. Use the /F option to specify multiple regexes in a file or from the keyboard.

(warning suppressed by /Q2 or higher)

grep warning: the Y option means nothing when there's only one regex

The /Y option says that a hit must match all of the regexes. It has no effect if you enter only one regex via the keyboard or regex file.

(warning suppressed by /Q2 or higher)

grep warning: /V /R3 will probably produce useless results without /L

The /R3 option reads files as free-form binary, and the /V option displays buffers that don’t contain matches. Probably what you want to know is which files don’t contain matches. To do this, run GREP again and add the /L option on the command line.

(warning suppressed by /Q3)

6.5  Information Messages

grep16 8.01  Copyright 1986-2021 by Stan Brown, https://BrownMath.com
grep32 8.01  Copyright 1986-2021 by Stan Brown, https://BrownMath.com

This is the program logo for GREP16 or GREP32. (In December 2021, GREP moved from OakRoadSystems.com to my other site, BrownMath.com.)

(logo suppressed by /Q1 or higher)

grep: found count matches in count files of count examined

GREP displays this summary information at the end of a run, after examining all files. The number of files examined does not include any files excluded by the /X option.

If you specified the /L option, GREP omits the number of matches and displays only the file counts.

(message suppressed by /Q1 or higher)

6.6  Prompts

If you see any of these messages, GREP is waiting for you to provide input.

filespec (n chars max):

You specified the /@- option (without redirection) to read input filespecs from the keyboard. GREP is ready for you to type the next one. If you have no more filespecs to enter, press Control-Z immediately after this prompt. (With GREP16, you need to press Enter after Control-Z.)

The longest allowed filespec is 128 characters in GREP16, 260 characters in GREP32.

line to test (n chars max):

You didn’t specify any input filespecs, and you didn’t redirect input from a file (<file) or pipe it from another command (other-command|grep). This can be a good way to explore the effects of certain regexes. After parsing the command line, GREP takes input lines from the keyboard and tests them against the regex(es). Lines are limited to the width specified with the /W option.

Only lines that contain a match are echoed to the output. (If you set the /V option, only lines that don’t contain a match are echoed.)

Press Control-Z immediately after this prompt to end the GREP run. (With GREP16, you need to press Enter after Control-Z.)

regex (127 chars max):

You specified the /F- option (without redirection) to read regular expressions from the keyboard. GREP is ready for you to type the next one. If you have no more regexes to enter, press Control-Z immediately after this prompt. (With GREP16, you need to press Enter after Control-Z.)

type filespecs for GREP to scan, one per line. When finished, type Control-Z alone on a line.

You specified the /@- option (without redirection) to read input filespecs from the keyboard. GREP has finished parsing the command line and is ready for you to type them in.

type lines to be tested. When finished, type Control-Z alone on a line. GREP will echo lines that contain a match.

If you simply forgot to specify inputs or redirection, type Control-Z right away. Otherwise please see “line to test:” above.

With the /V option, the prompt changes to “… lines that don’t contain a match.”

type regular expressions, one per line. When finished, type Control-Z alone on a line.

You specified the /F- option (without redirection) to read regular expressions from the keyboard. GREP has finished parsing the command line and is ready for you to type them in.

7. Troubleshooting and How-to

Please share any questions that had you scratching your head. They'll be added to a future version of this manual, space permitting.

7.1  Regex Matching Problems

GREP is missing matches in my Word or WordPerfect files, even though I know they’re in there!

Binary files, including most word-processing files, may contain ASCII 26 (Control-Z) characters. These have no special meaning in a binary file but signal the end of a file being read as text. To read such files, use the /R3 option. Better yet, you can use the /R-1 or /R-2 option and let GREP figure out the type of each file automatically.
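
To see why matches go missing, here is a small Python simulation of the effect. It does not use any platform’s real text mode; it simply cuts the data at the first ASCII 26 byte, which is the behavior described above. The function name and sample bytes are assumptions for illustration.

```python
CTRL_Z = b"\x1a"  # ASCII 26, treated as end of file when reading as text


def text_mode_view(data: bytes) -> bytes:
    """Return the part of the data seen by a reader that stops at Control-Z."""
    return data.split(CTRL_Z, 1)[0]


# Hypothetical file contents: the search target sits after a Control-Z byte.
data = b"header" + CTRL_Z + b"payload with the word needle"

print(b"needle" in text_mode_view(data))  # False: the text view ends early
print(b"needle" in data)                  # True: a binary read sees it all
```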

How do I show all matches, not just all lines that contain a match?

Use the /J2 option.

How do I search for a word? For example, how do I get “plain” without also getting lines with “explain”, “plains”, etc.?

GREP searches for lines that contain the string of characters represented by your regex. If you want that string of characters only when it’s a whole word, you have to tell GREP.

With GREP32, the /E4 option makes this task easy. For example,

        grep plain /e4 file1 file2 

finds “plain” as a word. Note that the definition of “word” includes letters, digits, and the underscore. For searching most text that doesn’t matter, but if your input contains something like “plain55” you might want to define “word” to be just letters, or to be any printing character. See the /M option.

With GREP16, the task can still be done but it’s less convenient. For techniques to find a single word with basic regexes, please see Finding a Word.
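
The whole-word idea can be illustrated with Python’s re module, whose \b word boundary also treats letters, digits, and the underscore as word characters. This is a sketch of the concept only, with invented sample lines; it is not how GREP implements the /E4 option.

```python
import re

# Hypothetical input lines.
lines = [
    "a plain answer",
    "let me explain",
    "the plains of Kansas",
    "plain55 is a label",
]

# \b marks a boundary between a word character and a non-word character.
word = re.compile(r"\bplain\b")

hits = [line for line in lines if word.search(line)]
print(hits)  # ['a plain answer']
```

Note that “plain55” is not a hit, because the digit 5 counts as a word character.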

How do I find all lines that contain “this” but not “that”?

Use GREP as a filter and execute it twice, the first time to find all lines that contain “this” and the second time with the /V option to filter out any lines that contain “that”:

        grep this files … | grep /v that 
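
The same two-stage filtering logic, sketched in Python with invented sample lines. Plain substring tests stand in for the regexes here, which is a simplification.

```python
# Hypothetical input lines.
lines = ["this one", "this and that", "only that", "neither"]

# Stage 1: keep lines that contain "this".
stage1 = [line for line in lines if "this" in line]

# Stage 2: like the second GREP with /V, drop lines that contain "that".
stage2 = [line for line in stage1 if "that" not in line]

print(stage2)  # ['this one']
```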

How do I find all files that contain “this” and “that”?

Do you want “this” and “that” on the same line and in that order? Use the regex this.*that on the command line.

Do you want files that contain “this” and “that” on the same line in either order? Use the /F option to enter the two regexes and the /Y option to make the AND condition:

        grep /F- /Y files

and then when prompted enter these two lines:

        this
        that

and Control-Z at the third prompt.

Do you want files that contain “this” and “that” anywhere in the same file, not just on the same line? Use two grep calls connected with the “|” pipe character. You’ll find an example with the /L option.
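
The three variations differ only in where the AND condition is applied. Here is a Python sketch of the logic, using Python’s re module and invented sample lines; the function names are hypothetical and this is not GREP’s own code.

```python
import re

PATTERNS = ("this", "that")


def same_line_in_order(line: str) -> bool:
    """True if "this" is followed later on the same line by "that"."""
    return re.search(r"this.*that", line) is not None


def same_line_any_order(line: str) -> bool:
    """True if the line matches every pattern, in any order."""
    return all(re.search(p, line) for p in PATTERNS)


def anywhere_in_file(lines: list) -> bool:
    """True if every pattern matches at least one line of the file."""
    return all(any(re.search(p, line) for line in lines) for p in PATTERNS)


# Hypothetical file contents.
lines = ["that comes before this", "only this here", "and that there"]

print([same_line_in_order(line) for line in lines])   # [False, False, False]
print([same_line_any_order(line) for line in lines])  # [True, False, False]
print(anywhere_in_file(lines[1:]))                    # True
```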

I put quotes in my regex, but GREP isn’t matching correctly on them.

Command-line processors like the Windows command prompt, and also the C run-time library, treat quotes specially before GREP gets to see them. In most cases, you can have GREP treat the quotes as you intended by putting backslashes (\) in front of them. For example, if your command is

        grep "this" files …

then GREP searches for the four-character string this. GREP never sees the quotes because the operating system or the C run-time library has stripped them off. In most operating systems, the command

        grep \"this\" files …

searches for the six-character string "this". For further information, please see Quotes in a Regex.

Tip: when you’re not sure what’s going on, use GREP’s /D option to display the regex as GREP sees and interprets it:

        grep regex inputfiles /D- | grep "grep G[CR]:"

I’ve got a bunch of backslashes in my regex, and I don’t think GREP is interpreting it the way I want.

Tip: use the /D option to reveal what GREP is doing with your regex. The output can be voluminous, but you can cut it down to size. Repeat your command with this added at the end:

        /D-|grep /P0,5 "grep GR:" 

You’ll see only the interpretation of the regex with most irrelevant information suppressed.

If the displayed original regex is different from what you typed, then either Windows or the C startup code has altered some of your characters. Use the /F- option and enter your regex from the keyboard, or see Special Rules for the Command Line.

If you see a line about a “massaged” regex, you’re probably running afoul of the Special Rules for the Command Line. Try entering your regex from keyboard or file with the /F option, or keep the regex on the command line but use the /E option to turn off those special rules.

Other possibilities: check whether you entered extended regex characters but didn’t specify the /E2 option to tell GREP you’re using extended regexes.

I’m trying to GREP for a character like (, ?, or {, but it doesn’t work.

These have special meanings in extended regular expressions but not in basic regexes. Make sure you haven’t turned on extended regexes; or use a backslash \ to make GREP match them as normal characters.

GREPping on a word boundary with \< and \> doesn’t work.

My subpattern with \( doesn’t work!

\| doesn’t work for alternatives!

With extended regular expressions (/E2 option), GREP uses Perl-style regexes: \b for a word boundary, ( ) for subexpressions, and plain | for alternatives.

With basic regular expressions (/E1 option, or no /E option), a word boundary can’t be used directly. However, you can still search for whole words; see Finding a Word.

\w, [:alpha:], and similar only take account of English letters. I need to work with 8-bit letters.

In GREP32, use the /M option to select an appropriate character mapping.

In GREP16, your only choice is to code the extra letters explicitly as shown in the character range example.

(Backslash for Character Types (extended regex) describes \w and similar assertions. Character Class Names (extended regex) describes [:alpha:] and other character class names.)

I used the -w option to find a word, but it didn’t work.

GREP32 uses the /E4 option to search for a regex as a stand-alone word. GREP16 users need to use the techniques shown in Finding a Word.

When I enter a character like é in my regex, the search doesn’t seem to work.

This is a problem (in GREP32 only) with how Microsoft’s startup code processes the command line. Here are three ways to get around this problem:

7.2  General Problems

What does this error or warning message mean?

Please find the message in this manual, where you should see a problem description and what you can do about it. Please let me know if the explanation is inadequate or could be improved in any way.

I got the message “insufficient memory”.

For what you can do if this occurs, see grep: insufficient memory.

I put * on the command line, but 16-bit GREP searched every file.

This is a change between releases 6.9 and 7.0. GREP16 and GREP32 now follow identical wildcard rules, and * now means “all files” in GREP16 as it always has in GREP32. If you want files with no extension, *. does the trick.

Redirecting output with >outputfile makes GREP hang up!

You’re doing something like

        grep abcde * >myout

Typically, the file myout is opened before GREP finishes reading the directory to expand that * input filespec. If GREP finds any matches, it writes them to file myout; but then when it comes to file myout in the directory it starts reading it. Finding the matches that it previously wrote, it adds them to the end of the file. The read pointer never catches up with the write pointer, and therefore GREP hangs.

The bad news is that there’s nothing in the C library to let GREP detect this problem and automatically skip reading file myout. The good news is that you can do something about it. Use the /X option to exclude file myout from reading:

        grep abcde * /Xmyout >myout
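
The effect of the exclusion can be sketched in Python. The directory listing and function name are invented, and fnmatchcase stands in for GREP’s own wildcard rules, which may differ.

```python
from fnmatch import fnmatchcase


def expand_inputs(names, pattern, exclude=()):
    """Return the names that match the wildcard, skipping excluded ones."""
    return [n for n in names if fnmatchcase(n, pattern) and n not in exclude]


# Hypothetical directory listing; "myout" already exists because the
# redirection created it before the wildcard was expanded.
directory = ["a.txt", "b.txt", "myout"]

# Without an exclusion, the output file is also one of the inputs.
print(expand_inputs(directory, "*"))  # ['a.txt', 'b.txt', 'myout']

# With the exclusion, the output file is never read.
print(expand_inputs(directory, "*", exclude={"myout"}))  # ['a.txt', 'b.txt']
```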

I typed my GREP command and hit the Enter key, and it just sat there.

Did GREP prompt you for keyboard input? You can halt it by pressing Control-Z then Enter.

Are you piping GREP output ( | ) to MORE or another command? No output appears until GREP has scanned all the files and the second command has done its work.

Is the disk light on your computer flashing? GREP is reading lots of input but not finding any hits.

Did you enter an extended regex with the | character? Windows interprets that character as a pipe, so it’s waiting for GREP to finish and then Windows will run GREP’s output through the second command. Press Control-Z to end GREP. Some systems, like 4DOS and TCC, accept the | if you enclose the whole regex in double quotes "…". Otherwise, use the /F- option and enter your regex from the keyboard; or see Backslash for Character Encoding (extended regex) or Special Rules for the Command Line.

I used the -r option, but GREP won’t scan files in subdirectories.

You need the -s option for subdirectories, not the -r option. GREP diverges from UNIX in this respect.

8. Wish List

Release 8.0 will be the last one for new features in this code base of GREP. Since years have passed since its release, it’s unlikely that even a bug-fix version will ever be needed.

One day, if I can ever find the time, I’ll rewrite GREP as a Windows program, with support for Unicode and ZIP files. You’d also be able to put options, regexes, and input filespecs together in a job file.

In addition, the new GREP should probably offer at least some of these enhancements, which users have requested:

Your comments and questions are welcome.

Because this program helps you,
please donate at
BrownMath.com/donate.

Updates and new info: https://BrownMath.com/utils/
