1PCRE2TEST(1)                General Commands Manual               PCRE2TEST(1)
2
3
4
5NAME
6       pcre2test - a program for testing Perl-compatible regular expressions.
7
8SYNOPSIS
9
10       pcre2test [options] [input file [output file]]
11
12       pcre2test is a test program for the PCRE2 regular expression libraries,
13       but it can also be used for  experimenting  with  regular  expressions.
14       This  document  describes the features of the test program; for details
15       of the regular expressions themselves, see the pcre2pattern  documenta-
16       tion.  For  details  of  the  PCRE2  library  function  calls and their
17       options, see the pcre2api documentation.
18
19       The input for pcre2test is a sequence of  regular  expression  patterns
20       and  subject  strings  to  be matched. There are also command lines for
21       setting defaults and controlling some special actions. The output shows
22       the  result  of  each  match attempt. Modifiers on external or internal
23       command lines, the patterns, and the subject lines specify PCRE2  func-
24       tion  options, control how the subject is processed, and what output is
25       produced.
26
27       As the original fairly simple PCRE library evolved,  it  acquired  many
28       different  features,  and  as  a  result, the original pcretest program
29       ended up with a lot of options in a messy, arcane  syntax  for  testing
30       all the features. The move to the new PCRE2 API provided an opportunity
31       to re-implement the test program as pcre2test, with a cleaner  modifier
32       syntax.  Nevertheless,  there are still many obscure modifiers, some of
33       which are specifically designed for use in conjunction  with  the  test
34       script  and  data  files that are distributed as part of PCRE2. All the
35       modifiers are documented here, some  without  much  justification,  but
36       many  of  them  are  unlikely  to  be  of  use  except when testing the
37       libraries.
38
39
40PCRE2's 8-BIT, 16-BIT AND 32-BIT LIBRARIES
41
42       Different versions of the PCRE2 library can be built to support charac-
43       ter  strings  that  are encoded in 8-bit, 16-bit, or 32-bit code units.
44       One, two, or  all  three  of  these  libraries  may  be  simultaneously
45       installed. The pcre2test program can be used to test all the libraries.
46       However, its own input and output are  always  in  8-bit  format.  When
47       testing  the  16-bit  or 32-bit libraries, patterns and subject strings
48       are converted to 16-bit or 32-bit format before  being  passed  to  the
49       library  functions.  Results are converted back to 8-bit code units for
50       output.
51
52       In the rest of this document, the names of library functions and
53       structures are given in generic form, for example, pcre2_compile().
54       The actual names used in the libraries have a suffix _8, _16, or _32,
55       as appropriate.
56
57
58INPUT ENCODING
59
60       Input  to  pcre2test is processed line by line, either by calling the C
61       library's fgets() function, or via the  libreadline  library.  In  some
62       Windows  environments  character 26 (hex 1A) causes an immediate end of
63       file, and no further data is read, so this character should be  avoided
64       unless you really want that action.
65
66       The input is processed using C's string functions, so it must not
67       contain binary zeros, even though in  Unix-like  environments,  fgets()
68       treats  any  bytes  other  than newline as data characters. An error is
69       generated if a binary zero is encountered. By default subject lines are
70       processed for backslash escapes, which makes it possible to include any
71       data value in strings that are passed to the library for matching.  For
72       patterns,  there  is a facility for specifying some or all of the 8-bit
73       input characters as hexadecimal  pairs,  which  makes  it  possible  to
74       include binary zeros.
75
76   Input for the 16-bit and 32-bit libraries
77
78       When testing the 16-bit or 32-bit libraries, there is a need to be able
79       to generate character code points greater than 255 in the strings  that
80       are  passed to the library. For subject lines, backslash escapes can be
81       used. In addition, when the  utf  modifier  (see  "Setting  compilation
82       options" below) is set, the pattern and any following subject lines are
83       interpreted as UTF-8 strings and translated  to  UTF-16  or  UTF-32  as
84       appropriate.
85
86       For  non-UTF testing of wide characters, the utf8_input modifier can be
87       used. This is mutually exclusive with  utf,  and  is  allowed  only  in
88       16-bit  or  32-bit  mode.  It  causes the pattern and following subject
89       lines to be treated as UTF-8 according to the original definition  (RFC
90       2279), which allows for character values up to 0x7fffffff. Each charac-
91       ter is placed in one 16-bit or 32-bit code unit (in  the  16-bit  case,
92       values greater than 0xffff cause an error to occur).
93
94       UTF-8  (in  its  original definition) is not capable of encoding values
95       greater than 0x7fffffff, but such values can be handled by  the  32-bit
96       library. When testing this library in non-UTF mode with utf8_input set,
97       if any character is preceded by the byte 0xff (which is an invalid byte
98       in  UTF-8)  0x80000000  is  added to the character's value. This is the
99       only way of passing such code points in a pattern string.  For  subject
100       strings, using an escape sequence is preferable.
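
       The decoding rules described above can be sketched in Python. This is
       an illustration only, not the code used by pcre2test; the function name
       decode_utf8_input and its arguments are hypothetical. It decodes UTF-8
       in its original (RFC 2279) form, treats a 0xff prefix byte as "add
       0x80000000" in 32-bit mode only, and rejects values above 0xffff in
       16-bit mode.

```python
def decode_utf8_input(data, width):
    """Decode original-definition UTF-8 bytes into 16-bit or 32-bit code units.

    Hypothetical helper that mirrors the documented utf8_input behaviour.
    """
    if width not in (16, 32):
        raise ValueError("utf8_input is allowed only in 16-bit or 32-bit mode")
    units = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF and width == 32:
            # 0xff is never valid in UTF-8; here it flags "add 0x80000000".
            extra = 0x80000000
            i += 1
            if i == len(data):
                raise ValueError("0xff must be followed by a character")
        lead = data[i]
        if lead < 0x80:
            count, value = 0, lead
        elif 0xC0 <= lead < 0xE0:
            count, value = 1, lead & 0x1F
        elif 0xE0 <= lead < 0xF0:
            count, value = 2, lead & 0x0F
        elif 0xF0 <= lead < 0xF8:
            count, value = 3, lead & 0x07
        elif 0xF8 <= lead < 0xFC:
            count, value = 4, lead & 0x03
        elif 0xFC <= lead < 0xFE:
            count, value = 5, lead & 0x01
        else:
            raise ValueError("invalid lead byte 0x%02x" % lead)
        tail = data[i + 1:i + 1 + count]
        if len(tail) != count or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        value += extra
        if width == 16 and value > 0xFFFF:
            raise ValueError("value does not fit in one 16-bit code unit")
        units.append(value)
        i += 1 + count
    return units
```

       For example, the six-byte sequence FD BF BF BF BF BF decodes to
       0x7fffffff, the largest value the original UTF-8 definition can encode.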
101
102
103COMMAND LINE OPTIONS
104
105       -8        If the 8-bit library has been built, this option causes it to
106                 be used (this is the default). If the 8-bit library  has  not
107                 been built, this option causes an error.
108
109       -16       If  the  16-bit library has been built, this option causes it
110                 to be used. If only the 16-bit library has been  built,  this
111                 is  the  default.  If  the 16-bit library has not been built,
112                 this option causes an error.
113
114       -32       If the 32-bit library has been built, this option  causes  it
115                 to  be  used. If only the 32-bit library has been built, this
116                 is the default. If the 32-bit library  has  not  been  built,
117                 this option causes an error.
118
119       -ac       Behave as if each pattern has the auto_callout modifier, that
120                 is, insert automatic callouts into every pattern that is com-
121                 piled.
122
123       -AC       As  for  -ac,  but in addition behave as if each subject line
124                 has the callout_extra  modifier,  that  is,  show  additional
125                 information from callouts.
126
127       -b        Behave  as  if each pattern has the fullbincode modifier; the
128                 full internal binary form of the pattern is output after com-
129                 pilation.
130
131       -C        Output  the  version  number  of  the  PCRE2 library, and all
132                 available information about the optional  features  that  are
133                 included,  and  then  exit  with  zero  exit  code. All other
134                 options are ignored. If both -C and -LM are  present,  which-
135                 ever is first is recognized.
136
137       -C option Output  information  about a specific build-time option, then
138                 exit. This functionality is intended for use in scripts  such
139                 as  RunTest.  The  following options output the value and set
140                 the exit code as indicated:
141
142                   ebcdic-nl  the code for LF (= NL) in an EBCDIC environment:
143                                0x15 or 0x25
144                                0 if used in an ASCII environment
145                                exit code is always 0
146                   linksize   the configured internal link size (2, 3, or 4)
147                                exit code is set to the link size
148                   newline    the default newline setting:
149                                CR, LF, CRLF, ANYCRLF, ANY, or NUL
150                                exit code is always 0
151                   bsr        the default setting for what \R matches:
152                                ANYCRLF or ANY
153                                exit code is always 0
154
155                 The following options output 1 for true or 0 for  false,  and
156                 set the exit code to the same value:
157
158                   backslash-C  \C is supported (not locked out)
159                   ebcdic       compiled for an EBCDIC environment
160                   jit          just-in-time support is available
161                   pcre2-16     the 16-bit library was built
162                   pcre2-32     the 32-bit library was built
163                   pcre2-8      the 8-bit library was built
164                   unicode      Unicode support is available
165
166                 If  an  unknown  option is given, an error message is output;
167                 the exit code is 0.
168
169       -d        Behave as if each pattern has the debug modifier; the
170                 internal form of and information about the compiled pattern
171                 are output after compilation; -d is equivalent to -b -i.
172
173       -dfa      Behave as if each subject line has the dfa modifier; matching
174                 is  done  using the pcre2_dfa_match() function instead of the
175                 default pcre2_match().
176
177       -error number[,number,...]
178                 Call pcre2_get_error_message() for each of the error  numbers
179                 in  the  comma-separated list, display the resulting messages
180                 on the standard output, then exit with zero  exit  code.  The
181                 numbers  may  be  positive or negative. This is a convenience
182                 facility for PCRE2 maintainers.
183
184       -help     Output a brief summary of these options and then exit.
185
186       -i        Behave as if each pattern has the info modifier;  information
187                 about the compiled pattern is given after compilation.
188
189       -jit      Behave  as  if  each pattern line has the jit modifier; after
190                 successful compilation, each pattern is passed to  the  just-
191                 in-time compiler, if available.
192
193       -jitverify
194                 Behave  as  if  each pattern line has the jitverify modifier;
195                 after successful compilation, each pattern is passed  to  the
196                 just-in-time  compiler,  if  available, and the use of JIT is
197                 verified.
198
199       -LM       List modifiers: write a list of available pattern and subject
200                 modifiers  to  the  standard output, then exit with zero exit
201                 code. All other options are ignored.  If both -C and -LM  are
202                 present, whichever is first is recognized.
203
204       -pattern modifier-list
205                 Behave as if each pattern line contains the given modifiers.
206
207       -q        Do not output the version number of pcre2test at the start of
208                 execution.
209
210       -S size   On Unix-like systems, set the size of the run-time  stack  to
211                 size mebibytes (units of 1024*1024 bytes).
212
213       -subject modifier-list
214                 Behave as if each subject line contains the given modifiers.
215
216       -t        Run  each compile and match many times with a timer, and out-
217                 put the resulting times per compile or  match.  When  JIT  is
218                 used,  separate  times  are given for the initial compile and
219                 the JIT compile. You can control  the  number  of  iterations
220                 that  are used for timing by following -t with a number (as a
221                 separate item on the command line). For  example,  "-t  1000"
222                 iterates 1000 times. The default is to iterate 500,000 times.
223
224       -tm       This is like -t except that it times only the matching phase,
225                 not the compile phase.
226
227       -T -TM    These behave like -t and -tm, but in addition, at the end  of
228                 a  run, the total times for all compiles and matches are out-
229                 put.
230
231       -version  Output the PCRE2 version number and then exit.
232
233
234DESCRIPTION
235
236       If pcre2test is given two filename arguments, it reads from  the  first
237       and writes to the second. If the first name is "-", input is taken from
238       the standard input. If pcre2test is given only one argument,  it  reads
239       from that file and writes to stdout. Otherwise, it reads from stdin and
240       writes to stdout.
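
       The file-selection rules above can be summarized by the following
       Python sketch. The function resolve_files is hypothetical and is not
       part of pcre2test; None stands for the corresponding standard stream.
       Treating "-" as standard input when it is the only argument is an
       assumption, because the text above describes it only for the
       two-argument case.

```python
def resolve_files(names):
    """Map the filename arguments to (input, output); None means stdin/stdout."""
    if len(names) > 2:
        raise ValueError("at most two filename arguments are accepted")
    infile = names[0] if names else None
    if infile == "-":
        infile = None  # "-" selects the standard input
    outfile = names[1] if len(names) == 2 else None
    return infile, outfile
```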
241
242       When pcre2test is built, a configuration option  can  specify  that  it
243       should  be linked with the libreadline or libedit library. When this is
244       done, if the input is from a terminal, it is read using the  readline()
245       function. This provides line-editing and history facilities. The output
246       from the -help option states whether or not readline() will be used.
247
248       The program handles any number of tests, each of which  consists  of  a
249       set  of input lines. Each set starts with a regular expression pattern,
250       followed by any number of subject lines to be matched against that pat-
251       tern. In between sets of test data, command lines that begin with # may
252       appear. This file format, with some restrictions, can also be processed
253       by  the perltest.sh script that is distributed with PCRE2 as a means of
254       checking that the behaviour of PCRE2 and Perl is the same. For a speci-
255       fication of perltest.sh, see the comments near its beginning.
256
257       When the input is a terminal, pcre2test prompts for each line of input,
258       using "re>" to prompt for regular expression patterns, and  "data>"  to
259       prompt  for subject lines. Command lines starting with # can be entered
260       only in response to the "re>" prompt.
261
262       Each subject line is matched separately and independently. If you  want
263       to do multi-line matches, you have to use the \n escape sequence (or \r
264       or \r\n, etc., depending on the newline setting) in a  single  line  of
265       input  to encode the newline sequences. There is no limit on the length
266       of subject lines; the input buffer is automatically extended if  it  is
267       too  small.  There  are  replication features that make it possible to
268       generate long repetitive pattern or subject  lines  without  having  to
269       supply them explicitly.
270
271       An  empty  line  or  the end of the file signals the end of the subject
272       lines for a test, at which point a  new  pattern  or  command  line  is
273       expected if there is still input to be read.
274
275
276COMMAND LINES
277
278       In  between sets of test data, a line that begins with # is interpreted
279       as a command line. If the first character is followed by white space or
280       an  exclamation  mark,  the  line is treated as a comment, and ignored.
281       Otherwise, the following commands are recognized:
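
       As an illustration, the distinction between comments and commands can
       be expressed in a few lines of Python. The function classify_hash_line
       is hypothetical, and treating a bare "#" as a comment is an assumption
       (in a real input file it is followed by a newline, which is white
       space).

```python
def classify_hash_line(line):
    """Return (kind, name, arguments) for a line that begins with '#'."""
    if not line.startswith("#"):
        raise ValueError("not a command line")
    rest = line[1:]
    if rest == "" or rest[0].isspace() or rest[0] == "!":
        return ("comment", "", "")
    parts = rest.split(None, 1)  # command name, then the remainder
    arguments = parts[1].strip() if len(parts) == 2 else ""
    return ("command", parts[0], arguments)
```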
282
283         #forbid_utf
284
285       Subsequent  patterns  automatically  have   the   PCRE2_NEVER_UTF   and
286       PCRE2_NEVER_UCP  options  set, which locks out the use of the PCRE2_UTF
287       and PCRE2_UCP options and the use of (*UTF) and (*UCP) at the start  of
288       patterns.  This  command  also  forces an error if a subsequent pattern
289       contains any occurrences of \P, \p, or \X, which  are  still  supported
290       when  PCRE2_UTF  is not set, but which require Unicode property support
291       to be included in the library.
292
293       This is a trigger guard that is used in test files to ensure  that  UTF
294       or  Unicode property tests are not accidentally added to files that are
295       used when Unicode support is  not  included  in  the  library.  Setting
296       PCRE2_NEVER_UTF  and  PCRE2_NEVER_UCP as a default can also be obtained
297       by the use of #pattern; the difference is that  #forbid_utf  cannot  be
298       unset,  and the automatic options are not displayed in pattern informa-
299       tion, to avoid cluttering up test output.
300
301         #load <filename>
302
303       This command is used to load a set of precompiled patterns from a file,
304       as  described  in  the  section entitled "Saving and restoring compiled
305       patterns" below.
306
307         #newline_default [<newline-list>]
308
309       When PCRE2 is built, a default newline  convention  can  be  specified.
310       This  determines which characters and/or character pairs are recognized
311       as indicating a newline in a pattern or subject string. The default can
312       be  overridden when a pattern is compiled. The standard test files con-
313       tain tests of various newline conventions,  but  the  majority  of  the
314       tests  expect  a  single  linefeed  to  be  recognized  as a newline by
315       default. Without special action the tests would fail when PCRE2 is com-
316       piled with either CR or CRLF as the default newline.
317
318       The #newline_default command specifies a list of newline types that are
319       acceptable as the default. The types must be one of CR, LF, CRLF,  ANY-
320       CRLF, ANY, or NUL (in upper or lower case), for example:
321
322         #newline_default LF Any anyCRLF
323
324       If the default newline is in the list, this command has no effect. Oth-
325       erwise, except when testing the POSIX  API,  a  newline  modifier  that
326       specifies  the  first  newline  convention in the list (LF in the above
327       example) is added to any pattern that does not already have  a  newline
328       modifier. If the newline list is empty, the feature is turned off. This
329       command is present in a number of the standard test input files.
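
       The selection logic can be sketched as follows. This is a hypothetical
       Python model of the behaviour described above, not pcre2test's own
       code; the function names and the representation of modifiers as a list
       of strings are assumptions made for the example.

```python
VALID_NEWLINES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY", "NUL"}


def forced_newline(build_default, acceptable):
    """Return the newline type to force on patterns, or None for no effect."""
    types = [t.upper() for t in acceptable]
    for t in types:
        if t not in VALID_NEWLINES:
            raise ValueError("invalid newline type: " + t)
    if not types or build_default.upper() in types:
        return None  # empty list turns the feature off; default is acceptable
    return types[0]  # otherwise the first listed convention is used


def apply_newline_default(modifiers, forced):
    """Add a newline modifier unless the pattern already has one."""
    if forced is None or any(m.startswith("newline=") for m in modifiers):
        return list(modifiers)
    return list(modifiers) + ["newline=" + forced.lower()]
```

       With a library built to use CR by default, the example command above
       would cause newline=lf to be added to any pattern that has no newline
       modifier of its own.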
330
331       When the POSIX API is being tested there is  no  way  to  override  the
332       default  newline  convention,  though it is possible to set the newline
333       convention from within the pattern. A warning is given if the posix  or
334       posix_nosub  modifier is used when #newline_default would set a default
335       for the non-POSIX API.
336
337         #pattern <modifier-list>
338
339       This command sets a default modifier list that applies  to  all  subse-
340       quent patterns. Modifiers on a pattern can change these settings.
341
342         #perltest
343
344       The  appearance of this line causes all subsequent modifier settings to
345       be checked for compatibility with the perltest.sh script, which is used
346       to  confirm that Perl gives the same results as PCRE2. Also, apart from
347       comment lines, #pattern commands, and #subject  commands  that  set  or
348       unset  "mark", no command lines are permitted, because they and many of
349       the modifiers are specific to pcre2test, and should not be used in test
350       files  that  are  also  processed by perltest.sh. The #perltest command
351       helps detect tests that are accidentally put in the wrong file.
352
353         #pop [<modifiers>]
354         #popcopy [<modifiers>]
355
356       These commands are used to manipulate the stack of  compiled  patterns,
357       as  described  in  the  section entitled "Saving and restoring compiled
358       patterns" below.
359
360         #save <filename>
361
362       This command is used to save a set of compiled patterns to a  file,  as
363       described  in  the section entitled "Saving and restoring compiled pat-
364       terns" below.
365
366         #subject <modifier-list>
367
368       This command sets a default modifier list that applies  to  all  subse-
369       quent  subject lines. Modifiers on a subject line can change these set-
370       tings.
371
372
373MODIFIER SYNTAX
374
375       Modifier lists are used with both pattern and subject lines. Items in a
376       list are separated by commas followed by optional white space. Trailing
377       whitespace in a modifier list is ignored. Some modifiers may  be  given
378       for  both patterns and subject lines, whereas others are valid only for
379       one  or  the  other.  Each  modifier  has  a  long  name,  for  example
380       "anchored",  and  some of them must be followed by an equals sign and a
381       value, for example, "offset=12". Values cannot  contain  comma  charac-
382       ters,  but may contain spaces. Modifiers that do not take values may be
383       preceded by a minus sign to turn off a previous setting.
384
385       A few of the more common modifiers can also be specified as single let-
386       ters,  for  example "i" for "caseless". In documentation, following the
387       Perl convention, these are written with a slash ("the /i modifier") for
388       clarity.  Abbreviated  modifiers  must all be concatenated in the first
389       item of a modifier list. If the first item is not recognized as a  long
390       modifier  name, it is interpreted as a sequence of these abbreviations.
391       For example:
392
393         /abc/ig,newline=cr,jit=3
394
395       This is a pattern line whose modifier list starts with  two  one-letter
396       modifiers  (/i  and  /g).  The lower-case abbreviated modifiers are the
397       same as used in Perl.
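
       A simplified Python model of this syntax is shown below. It is a sketch
       only: parse_modifiers is a hypothetical function, the caller supplies
       the set of known long names, and the abbreviation table covers only a
       few letters (the long name "global" for /g is assumed here). The
       special handling of a repeated x is not modelled.

```python
ABBREVIATIONS = {
    "i": "caseless", "s": "dotall", "x": "extended", "m": "multiline",
    "n": "no_auto_capture", "B": "bincode", "I": "info",
    "g": "global",  # assumed long name for the /g abbreviation
}


def parse_modifiers(text, known_long):
    """Return a list of (name, value, enabled) tuples for a modifier list."""
    result = []
    # Trailing white space is ignored; white space after a comma is optional.
    items = [item.lstrip() for item in text.rstrip().split(",")]
    for index, item in enumerate(items):
        if not item:
            continue
        enabled = not item.startswith("-")
        body = item if enabled else item[1:]
        name, sep, value = body.partition("=")
        if index == 0 and name not in known_long:
            # Not a long name: treat the first item as abbreviations.
            for letter in item:
                if letter not in ABBREVIATIONS:
                    raise ValueError("unknown abbreviation: " + letter)
                result.append((ABBREVIATIONS[letter], None, True))
            continue
        result.append((name, value if sep else None, enabled))
    return result
```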
398
399
400PATTERN SYNTAX
401
402       A pattern line must start with one of the following characters  (common
403       symbols, excluding pattern meta-characters):
404
405         / ! " ' ` - = _ : ; , % & @ ~
406
407       This  is  interpreted  as the pattern's delimiter. A regular expression
408       may be continued over several input lines, in which  case  the  newline
409       characters are included within it. It is possible to include the delim-
410       iter within the pattern by escaping it with a backslash, for example
411
412         /abc\/def/
413
414       If you do this, the escape and the delimiter form part of the  pattern,
415       but since the delimiters are all non-alphanumeric, this does not affect
416       its interpretation. If the terminating delimiter  is  immediately  fol-
417       lowed by a backslash, for example,
418
419         /abc/\
420
421       then  a  backslash  is added to the end of the pattern. This is done to
422       provide a way of testing the error condition that arises if  a  pattern
423       finishes with a backslash, because
424
425         /abc\/
426
427       is  interpreted as the first line of a pattern that starts with "abc/",
428       causing pcre2test to read the next line as a continuation of the  regu-
429       lar expression.
430
431       A pattern can be followed by a modifier list (details below).
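
       The delimiter rules can be illustrated with a short Python sketch. The
       function split_pattern_line is hypothetical and handles a single input
       line; it returns None when the pattern is not terminated, in which case
       a real reader would append the next line (including the newline) and
       try again.

```python
DELIMITERS = "/!\"'`-=_:;,%&@~"


def split_pattern_line(line):
    """Return (pattern, modifier_text), or None if the pattern continues."""
    if not line or line[0] not in DELIMITERS:
        raise ValueError("invalid pattern delimiter")
    delimiter = line[0]
    i = 1
    while i < len(line):
        if line[i] == "\\" and i + 1 < len(line):
            i += 2  # the escape and the escaped character stay in the pattern
            continue
        if line[i] == delimiter:
            pattern, rest = line[1:i], line[i + 1:]
            if rest.startswith("\\"):
                # A backslash right after the closing delimiter is appended
                # to the pattern, to allow testing a trailing-backslash error.
                pattern += "\\"
                rest = rest[1:]
            return pattern, rest
        i += 1
    return None
```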
432
433
434SUBJECT LINE SYNTAX
435
436       Before    each   subject   line   is   passed   to   pcre2_match()   or
437       pcre2_dfa_match(), leading and trailing white space is removed, and the
438       line is scanned for backslash escapes, unless the subject_literal modi-
439       fier was set for the pattern. The following provide a means of encoding
440       non-printing characters in a visible way:
441
442         \a         alarm (BEL, \x07)
443         \b         backspace (\x08)
444         \e         escape (\x1b)
445         \f         form feed (\x0c)
446         \n         newline (\x0a)
447         \r         carriage return (\x0d)
448         \t         tab (\x09)
449         \v         vertical tab (\x0b)
450         \nnn       octal character (up to 3 octal digits); always
451                      a byte unless > 255 in UTF-8 or 16-bit or 32-bit mode
452         \o{dd...}  octal character (any number of octal digits)
453         \xhh       hexadecimal byte (up to 2 hex digits)
454         \x{hh...}  hexadecimal character (any number of hex digits)
455
456       The use of \x{hh...} is not dependent on the use of the utf modifier on
457       the pattern. It is recognized always. There may be any number of  hexa-
458       decimal  digits  inside  the  braces; invalid values provoke error mes-
459       sages.
460
461       Note that \xhh specifies one byte rather than one  character  in  UTF-8
462       mode;  this  makes it possible to construct invalid UTF-8 sequences for
463       testing purposes. On the other hand, \x{hh} is interpreted as  a  UTF-8
464       character  in UTF-8 mode, generating more than one byte if the value is
465       greater than 127.  When testing the 8-bit library not  in  UTF-8  mode,
466       \x{hh} generates one byte for values less than 256, and causes an error
467       for greater values.
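
       For the 8-bit library, the difference between the two forms can be
       modelled as follows. This Python sketch is illustrative only; the
       function expand_hex_escape and its parameters are hypothetical. It
       relies on Python's own UTF-8 encoder, so it rejects surrogates and
       values above 0x10ffff, which is a limitation of the sketch rather than
       a statement about pcre2test.

```python
def expand_hex_escape(digits, braced, utf):
    """Return the bytes produced by \\xhh (braced=False) or \\x{hh...}."""
    value = int(digits, 16)
    if not braced:
        if len(digits) > 2:
            raise ValueError("\\xhh takes at most two hex digits")
        return bytes([value])  # always exactly one byte, even in UTF-8 mode
    if utf:
        return chr(value).encode("utf-8")  # one character, possibly many bytes
    if value > 0xFF:
        raise ValueError("value too large for the 8-bit library in non-UTF mode")
    return bytes([value])
```

       In UTF-8 mode, \xe9 therefore yields the single (invalid as UTF-8) byte
       E9, whereas \x{e9} yields the two bytes C3 A9.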
468
469       In UTF-16 mode, all 4-digit \x{hhhh} values are accepted. This makes it
470       possible to construct invalid UTF-16 sequences for testing purposes.
471
472       In  UTF-32  mode,  all  4- to 8-digit \x{...} values are accepted. This
473       makes it possible to construct invalid  UTF-32  sequences  for  testing
474       purposes.
475
476       There is a special backslash sequence that specifies replication of one
477       or more characters:
478
479         \[<characters>]{<count>}
480
481       This makes it possible to test long strings without having  to  provide
482       them as part of the file. For example:
483
484         \[abc]{4}
485
486       is  converted to "abcabcabcabc". This feature does not support nesting.
487       To include a closing square bracket in the characters, code it as \x5D.
488
489       A backslash followed by an equals sign marks the  end  of  the  subject
490       string and the start of a modifier list. For example:
491
492         abc\=notbol,notempty
493
494       If  the  subject  string is empty and \= is followed by whitespace, the
495       line is treated as a comment line, and is not used  for  matching.  For
496       example:
497
498         \= This is a comment.
499         abc\= This is an invalid modifier list.
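
       A minimal Python sketch of how a subject line might be divided is given
       below. The function split_subject_line is hypothetical; it only
       separates the subject from the modifier text and recognizes comment
       lines, leaving the validation of the modifier list to a later step.

```python
def split_subject_line(line):
    """Return (kind, subject, modifier_text) for one subject line."""
    line = line.strip()  # leading and trailing white space is removed
    i = 0
    while i < len(line) - 1:
        if line[i] == "\\":
            if line[i + 1] == "=":
                subject, modifiers = line[:i], line[i + 2:]
                if subject == "" and modifiers[:1].isspace():
                    return ("comment", "", "")
                return ("subject", subject, modifiers)
            i += 2  # skip the escaped character
        else:
            i += 1
    return ("subject", line, "")
```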
500
501       A  backslash  followed  by  any  other  non-alphanumeric character just
502       escapes that character. A backslash followed by anything else causes an
503       error.  However,  if the very last character in the line is a backslash
504       (and there is no modifier list), it is ignored. This  gives  a  way  of
505       passing  an  empty line as data, since a real empty line terminates the
506       data input.
507
508       If the subject_literal modifier is set for a pattern, all subject lines
509       that follow are treated as literals, with no special treatment of back-
510       slashes.  No replication is possible, and any subject modifiers must be
511       set as defaults by a #subject command.
512
513
514PATTERN MODIFIERS
515
516       There  are  several types of modifier that can appear in pattern lines.
517       Except where noted below, they may also be used in #pattern commands. A
518       pattern's  modifier  list can add to or override default modifiers that
519       were set by a previous #pattern command.
520
521   Setting compilation options
522
523       The following modifiers set options for pcre2_compile(). Most  of  them
524       set  bits  in  the  options  argument of that function, but those whose
525       names start with PCRE2_EXTRA are additional options that are set in the
526       compile  context.  For  the  main options, there are some single-letter
527       abbreviations that are the same as Perl options. There is special  han-
528       dling  for  /x:  if  a second x is present, PCRE2_EXTENDED is converted
529       into  PCRE2_EXTENDED_MORE  as  in  Perl.  A   third   appearance   adds
530       PCRE2_EXTENDED  as  well,  though  this  makes no difference to the way
531       pcre2_compile() behaves. See pcre2api for a description of the  effects
532       of these options.
533
534             allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
535             allow_surrogate_escapes   set PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES
536             alt_bsux                  set PCRE2_ALT_BSUX
537             alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
538             alt_verbnames             set PCRE2_ALT_VERBNAMES
539             anchored                  set PCRE2_ANCHORED
540             auto_callout              set PCRE2_AUTO_CALLOUT
541             bad_escape_is_literal     set PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL
542         /i  caseless                  set PCRE2_CASELESS
543             dollar_endonly            set PCRE2_DOLLAR_ENDONLY
544         /s  dotall                    set PCRE2_DOTALL
545             dupnames                  set PCRE2_DUPNAMES
546             endanchored               set PCRE2_ENDANCHORED
547         /x  extended                  set PCRE2_EXTENDED
548         /xx extended_more             set PCRE2_EXTENDED_MORE
549             firstline                 set PCRE2_FIRSTLINE
550             literal                   set PCRE2_LITERAL
551             match_line                set PCRE2_EXTRA_MATCH_LINE
552             match_unset_backref       set PCRE2_MATCH_UNSET_BACKREF
553             match_word                set PCRE2_EXTRA_MATCH_WORD
554         /m  multiline                 set PCRE2_MULTILINE
555             never_backslash_c         set PCRE2_NEVER_BACKSLASH_C
556             never_ucp                 set PCRE2_NEVER_UCP
557             never_utf                 set PCRE2_NEVER_UTF
558         /n  no_auto_capture           set PCRE2_NO_AUTO_CAPTURE
559             no_auto_possess           set PCRE2_NO_AUTO_POSSESS
560             no_dotstar_anchor         set PCRE2_NO_DOTSTAR_ANCHOR
561             no_start_optimize         set PCRE2_NO_START_OPTIMIZE
562             no_utf_check              set PCRE2_NO_UTF_CHECK
563             ucp                       set PCRE2_UCP
564             ungreedy                  set PCRE2_UNGREEDY
565             use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
566             utf                       set PCRE2_UTF
567
568       As well as turning on the PCRE2_UTF option, the utf modifier causes all
569       non-printing characters in output  strings  to  be  printed  using  the
570       \x{hh...}  notation. Otherwise, those less than 0x100 are output in hex
571       without the curly brackets. Setting utf in 16-bit or 32-bit  mode  also
572       causes  pattern  and  subject  strings  to  be  translated to UTF-16 or
573       UTF-32, respectively, before being passed to library functions.
574
575   Setting compilation controls
576
577       The following modifiers  affect  the  compilation  process  or  request
578       information  about  the  pattern. There are single-letter abbreviations
579       for some that are heavily used in the test files.
580
581             bsr=[anycrlf|unicode]     specify \R handling
582         /B  bincode                   show binary code without lengths
583             callout_info              show callout information
584             convert=<options>         request foreign pattern conversion
585             convert_glob_escape=c     set glob escape character
586             convert_glob_separator=c  set glob separator character
587             convert_length            set convert buffer length
588             debug                     same as info,fullbincode
589             framesize                 show matching frame size
590             fullbincode               show binary code with lengths
591         /I  info                      show info about compiled pattern
592             hex                       unquoted characters are hexadecimal
593             jit[=<number>]            use JIT
594             jitfast                   use JIT fast path
595             jitverify                 verify JIT use
596             locale=<name>             use this locale
597             max_pattern_length=<n>    set the maximum pattern length
598             memory                    show memory used
599             newline=<type>            set newline type
600             null_context              compile with a NULL context
601             parens_nest_limit=<n>     set maximum parentheses depth
602             posix                     use the POSIX API
603             posix_nosub               use the POSIX API with REG_NOSUB
604             push                      push compiled pattern onto the stack
605             pushcopy                  push a copy onto the stack
606             stackguard=<number>       test the stackguard feature
607             subject_literal           treat all subject lines as literal
608             tables=[0|1|2]            select internal tables
609             use_length                do not zero-terminate the pattern
610             utf8_input                treat input as UTF-8
611
612       The effects of these modifiers are described in the following sections.
613
614   Newline and \R handling
615
616       The bsr modifier specifies what \R in a pattern should match. If it  is
617       set  to  "anycrlf",  \R  matches  CR, LF, or CRLF only. If it is set to
618       "unicode", \R matches any Unicode newline sequence. The default can  be
619       specified when PCRE2 is built; if it is not, the default is set to Uni-
620       code.
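
       As an illustration of the two settings, the following sketch expresses
       each \R variant as a pattern for Python's re module. Python's re is
       used only as a stand-in; PCRE2 treats \R as an atomic group, which
       these plain alternations do not reproduce. The names BSR_ANYCRLF,
       BSR_UNICODE and matches_bsr are hypothetical.

```python
import re

# "anycrlf": CR, LF, or CRLF only (CRLF is tried first).
BSR_ANYCRLF = r"(?:\r\n|\r|\n)"

# "unicode": CRLF, or any one of LF, VT, FF, CR, NEL, LS, PS.
BSR_UNICODE = r"(?:\r\n|[\n\x0b\x0c\r\x85\u2028\u2029])"


def matches_bsr(text, bsr="unicode"):
    """Return True if text is exactly one newline sequence for this setting."""
    rx = BSR_UNICODE if bsr == "unicode" else BSR_ANYCRLF
    return re.fullmatch(rx, text) is not None


if __name__ == "__main__":
    for candidate in ["\r\n", "\n", "\x85", "\u2028"]:
        print(repr(candidate),
              matches_bsr(candidate, "anycrlf"),
              matches_bsr(candidate, "unicode"))
```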
621
622       The newline modifier specifies which characters are to  be  interpreted
623       as newlines, both in the pattern and in subject lines. The type must be
624       one of CR, LF, CRLF, ANYCRLF, ANY, or NUL (in upper or lower case).
625
626   Information about a pattern
627
628       The debug modifier is a shorthand for info,fullbincode, requesting  all
629       available information.
630
631       The bincode modifier causes a representation of the compiled code to be
632       output after compilation. This information does not contain length  and
633       offset values, which ensures that the same output is generated for dif-
634       ferent internal link sizes and different code  unit  widths.  By  using
635       bincode,  the  same  regression tests can be used in different environ-
636       ments.
637
638       The fullbincode modifier, by contrast, does include length  and  offset
639       values.  This is used in a few special tests that run only for specific
640       code unit widths and link sizes, and is also useful for one-off tests.
641
642       The info modifier  requests  information  about  the  compiled  pattern
643       (whether  it  is anchored, has a fixed first character, and so on). The
644       information is obtained from the  pcre2_pattern_info()  function.  Here
645       are some typical examples:
646
647           re> /(?i)(^a|^b)/m,info
648         Capturing subpattern count = 1
649         Compile options: multiline
650         Overall options: caseless multiline
651         First code unit at start or follows newline
652         Subject length lower bound = 1
653
654           re> /(?i)abc/info
655         Capturing subpattern count = 0
656         Compile options: <none>
657         Overall options: caseless
658         First code unit = 'a' (caseless)
659         Last code unit = 'c' (caseless)
660         Subject length lower bound = 3
661
662       "Compile  options"  are those specified by modifiers; "overall options"
663       have added options that are taken or deduced from the pattern. If  both
664       sets  of  options are the same, just a single "options" line is output;
665       if there are no options, the line is  omitted.  "First  code  unit"  is
666       where  any  match must start; if there is more than one they are listed
667       as "starting code units". "Last code unit" is  the  last  literal  code
668       unit  that  must  be  present in any match. This is not necessarily the
669       last character. These lines are omitted if no starting or  ending  code
670       units are recorded.
671
672       The  framesize modifier shows the size, in bytes, of the storage frames
673       used by pcre2_match() for handling backtracking. The  size  depends  on
674       the number of capturing parentheses in the pattern.
675
676       The  callout_info  modifier requests information about all the callouts
677       in the pattern. A list of them is output at the end of any other infor-
678       mation that is requested. For each callout, either its number or string
679       is given, followed by the item that follows it in the pattern.
680
681   Passing a NULL context
682
683       Normally, pcre2test passes a context block to pcre2_compile().  If  the
684       null_context  modifier  is  set,  however,  NULL is passed. This is for
685       testing that pcre2_compile() behaves correctly in this  case  (it  uses
686       default values).
687
688   Specifying pattern characters in hexadecimal
689
690       The  hex  modifier specifies that the characters of the pattern, except
691       for substrings enclosed in single or double quotes, are  to  be  inter-
692       preted  as  pairs  of hexadecimal digits. This feature is provided as a
693       way of creating patterns that contain binary zeros and other non-print-
694       ing  characters.  White space is permitted between pairs of digits. For
695       example, this pattern contains three characters:
696
697         /ab 32 59/hex
698
699       Parts of such a pattern are taken literally  if  quoted.  This  pattern
700       contains  nine characters, only two of which are specified in hexadeci-
701       mal:
702
703         /ab "literal" 32/hex
704
705       Either single or double quotes may be used. There is no way of  includ-
706       ing  the delimiter within a substring. The hex and expand modifiers are
707       mutually exclusive.
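
       The decoding rules above can be sketched as follows for the 8-bit
       case. This is an illustrative model, not pcre2test's implementation;
       decode_hex_pattern is a hypothetical name, and encoding quoted text as
       Latin-1 is an assumption.

```python
import string


def decode_hex_pattern(text):
    """Turn the body of a /.../hex pattern into bytes (8-bit sketch)."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            # White space is allowed between pairs of digits.
            i += 1
        elif ch in "'\"":
            # A quoted substring is taken literally up to the same quote.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("missing closing quote")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            # Anything else must be a pair of hexadecimal digits.
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("expected a pair of hexadecimal digits")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)


if __name__ == "__main__":
    print(len(decode_hex_pattern("ab 32 59")))
    print(len(decode_hex_pattern('ab "literal" 32')))
```

       The two calls correspond to the two example patterns above, which
       contain three and nine characters respectively.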
708
709   Specifying the pattern's length
710
711       By default, patterns are passed to the compiling functions as zero-ter-
712       minated  strings.  The  use_length  modifier  causes  a  pattern  to be
713       passed by length instead. A length  is  also  used  automatically  when
714       hex  is  set  (whether  or  not  use_length  is  set), because patterns
715       specified in hexadecimal may contain binary zeros, which  would  other-
716       wise terminate the string early.
717
718       If hex or use_length is used with the POSIX wrapper API (see "Using the
719       POSIX wrapper API" below), the REG_PEND extension is used to  pass  the
720       pattern's length.
721
722   Specifying wide characters in 16-bit and 32-bit modes
723
724       In 16-bit and 32-bit modes, all input is automatically treated as UTF-8
725       and translated to UTF-16 or UTF-32 when the utf modifier  is  set.  For
726       testing the 16-bit and 32-bit libraries in non-UTF mode, the utf8_input
727       modifier can be used. It is mutually exclusive with  utf.  Input  lines
728       are interpreted as UTF-8 as a means of specifying wide characters. More
729       details are given in "Input encoding" above.
730
731   Generating long repetitive patterns
732
733       Some tests use long patterns that are very repetitive. Instead of  cre-
734       ating  a very long input line for such a pattern, you can use a special
735       repetition feature, similar to the  one  described  for  subject  lines
736       above.  If  the  expand  modifier is present on a pattern, parts of the
737       pattern that have the form
738
739         \[<characters>]{<count>}
740
741       are expanded before the pattern is passed to pcre2_compile(). For exam-
742       ple, \[AB]{6000} expands to "AB" repeated 6000 times. This construction
743       cannot be nested. An initial "\[" sequence is recognized only  if  "]{"
744       followed  by  decimal  digits and "}" is found later in the pattern. If
745       not, the characters remain in the pattern unaltered. The expand and hex
746       modifiers are mutually exclusive.
747
748       If  part  of an expanded pattern looks like an expansion, but is really
749       part of the actual pattern, unwanted expansion can be avoided by giving
750       two values in the quantifier. For example, \[AB]{6000,6000} is not rec-
751       ognized as an expansion item.
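
       The expansion rule can be approximated with a regular expression, as
       in the sketch below. It is not pcre2test's code: expand_pattern is a
       hypothetical name, and the sketch assumes that the repeated characters
       do not themselves contain "]".

```python
import re

# \[<characters>]{<count>} with a single decimal count. A quantifier
# with two values, such as {6000,6000}, does not match and is left alone.
_EXPAND = re.compile(r"\\\[([^\]]*)\]\{(\d+)\}")


def expand_pattern(pattern):
    """Replace each expansion item by its characters repeated count times."""
    return _EXPAND.sub(lambda m: m.group(1) * int(m.group(2)), pattern)


if __name__ == "__main__":
    print(expand_pattern(r"x\[AB]{3}y"))
    print(expand_pattern(r"\[AB]{3,3}"))
    print(len(expand_pattern(r"\[AB]{6000}")))
```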
752
753       If the info modifier is set on an expanded pattern, the result  of  the
754       expansion is included in the information that is output.
755
756   JIT compilation
757
758       Just-in-time  (JIT)  compiling  is  a heavyweight optimization that can
759       greatly speed up pattern matching. See the pcre2jit  documentation  for
760       details.  JIT  compiling  happens, optionally, after a pattern has been
761       successfully compiled into an internal form. The JIT compiler  converts
762       this to optimized machine code. It needs to know whether the match-time
763       options PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used,
764       because  different  code  is generated for the different cases. See the
765       partial modifier in "Subject Modifiers" below for details of how  these
766       options are specified for each match attempt.
767
768       JIT  compilation  is  requested  by the jit pattern modifier, which may
769       optionally be followed by an equals sign and a number in the range 0 to
770       7.   The  three bits that make up the number specify which of the three
771       JIT operating modes are to be compiled:
772
773         1  compile JIT code for non-partial matching
774         2  compile JIT code for soft partial matching
775         4  compile JIT code for hard partial matching
776
777       The possible values for the jit modifier are therefore:
778
779         0  disable JIT
780         1  normal matching only
781         2  soft partial matching only
782         3  normal and soft partial matching
783         4  hard partial matching only
784         6  soft and hard partial matching only
785         7  all three modes
786
787       If no number is given, 7 is  assumed.  The  phrase  "partial  matching"
788       means a call to pcre2_match() with either the PCRE2_PARTIAL_SOFT or the
789       PCRE2_PARTIAL_HARD option set. Note that such a call may return a  com-
790       plete match; the options enable the possibility of a partial match, but
791       do not require it. Note also that if you request JIT  compilation  only
792       for  partial  matching  (for example, jit=2) but do not set the partial
793       modifier on a subject line, that match will not use  JIT  code  because
794       none was compiled for non-partial matching.
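
       The bit encoding of the jit number, and the consequence noted above
       for jit=2, can be sketched as follows. The names jit_modes and
       jit_code_available are hypothetical and model only the description
       given here, not the library's behaviour in full.

```python
JIT_NORMAL = 1   # non-partial matching
JIT_SOFT = 2     # soft partial matching
JIT_HARD = 4     # hard partial matching


def jit_modes(number=7):
    """List the JIT modes selected by a jit=<number> value (default 7)."""
    if not 0 <= number <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    names = []
    if number & JIT_NORMAL:
        names.append("normal")
    if number & JIT_SOFT:
        names.append("soft partial")
    if number & JIT_HARD:
        names.append("hard partial")
    return names


def jit_code_available(number, partial=None):
    """True if JIT code was compiled for this kind of match.

    partial is None for a non-partial match, or "soft" or "hard".
    """
    needed = {None: JIT_NORMAL, "soft": JIT_SOFT, "hard": JIT_HARD}[partial]
    return bool(number & needed)


if __name__ == "__main__":
    print(jit_modes(3))
    print(jit_code_available(2))           # jit=2, no partial modifier
    print(jit_code_available(2, "soft"))   # jit=2, soft partial match
```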
795
796       If  JIT compilation is successful, the compiled JIT code will automati-
797       cally be used when an appropriate type of match  is  run,  except  when
798       incompatible  run-time options are specified. For more details, see the
799       pcre2jit documentation. See also the jitstack modifier below for a  way
800       of setting the size of the JIT stack.
801
802       If  the  jitfast  modifier is specified, matching is done using the JIT
803       "fast path" interface, pcre2_jit_match(), which skips some of the  san-
804       ity  checks that are done by pcre2_match(), and of course does not work
805       when JIT is not supported. If jitfast is specified without  jit,  jit=7
806       is assumed.
807
808       If  the jitverify modifier is specified, information about the compiled
809       pattern shows whether JIT compilation was or  was  not  successful.  If
810       jitverify  is  specified without jit, jit=7 is assumed. If JIT compila-
811       tion is successful when jitverify is set, the text "(JIT)" is added  to
812       the first output line after a match or non-match when JIT-compiled code
813       was actually used in the match.
814
815   Setting a locale
816
817       The locale modifier must specify the name of a locale, for example:
818
819         /pattern/locale=fr_FR
820
821       The given locale is set, pcre2_maketables() is called to build a set of
822       character  tables for the locale, and this is then passed to pcre2_com-
823       pile() when compiling the regular expression. The same tables are  used
824       when  matching the following subject lines. The locale modifier applies
825       only to the pattern on which it appears, but can be given in a #pattern
826       command  if  a default is needed. The locale and  tables  modifiers are
827       mutually exclusive.
828
829   Showing pattern memory
830
831       The memory modifier causes the size in bytes of the memory used to hold
832       the  compiled  pattern  to be output. This does not include the size of
833       the pcre2_code block; it is just the actual compiled data. If the  pat-
834       tern  is  subsequently  passed to the JIT compiler, the size of the JIT
835       compiled code is also output. Here is an example:
836
837           re> /a(b)c/jit,memory
838         Memory allocation (code space): 21
839         Memory allocation (JIT code): 1910
840
841
842   Limiting nested parentheses
843
844       The parens_nest_limit modifier sets a limit  on  the  depth  of  nested
845       parentheses  in  a  pattern.  Breaching  the limit causes a compilation
846       error.  The default for the library is set when  PCRE2  is  built,  but
847       pcre2test  sets  its  own default of 220, which is required for running
848       the standard test suite.
849
850   Limiting the pattern length
851
852       The max_pattern_length modifier sets a limit, in  code  units,  to  the
853       length of pattern that pcre2_compile() will accept. Breaching the limit
854       causes a compilation  error.  The  default  is  the  largest  number  a
855       PCRE2_SIZE variable can hold (essentially unlimited).
856
857   Using the POSIX wrapper API
858
859       The  posix  and posix_nosub modifiers cause pcre2test to call PCRE2 via
860       the POSIX wrapper API rather than its native API. When  posix_nosub  is
861       used,  the  POSIX  option  REG_NOSUB  is passed to regcomp(). The POSIX
862       wrapper supports only the 8-bit library. Note that it  does  not  imply
863       POSIX matching semantics; for more detail see the pcre2posix documenta-
864       tion. The following pattern modifiers set  options  for  the  regcomp()
865       function:
866
867         caseless           REG_ICASE
868         multiline          REG_NEWLINE
869         dotall             REG_DOTALL     )
870         ungreedy           REG_UNGREEDY   ) These options are not part of
871         ucp                REG_UCP        )   the POSIX standard
872         utf                REG_UTF8       )
873
874       The  regerror_buffsize  modifier  specifies a size for the error buffer
875       that is passed to regerror() in the event of a compilation  error.  For
876       example:
877
878         /abc/posix,regerror_buffsize=20
879
880       This  provides  a means of testing the behaviour of regerror() when the
881       buffer is too small for the error message. If  this  modifier  has  not
882       been set, a large buffer is used.
883
884       The  aftertext  and  allaftertext  subject  modifiers work as described
885       below. All other modifiers are either ignored, with a warning  message,
886       or cause an error.
887
888       The  pattern  is  passed  to  regcomp()  as a zero-terminated string by
889       default, but if the use_length or hex modifiers are set,  the  REG_PEND
890       extension is used to pass it by length.
891
892   Testing the stack guard feature
893
894       The  stackguard  modifier  is  used  to  test the use of pcre2_set_com-
895       pile_recursion_guard(), a function that is  provided  to  enable  stack
896       availability  to  be checked during compilation (see the pcre2api docu-
897       mentation for details). If the number  specified  by  the  modifier  is
898       greater than zero, pcre2_set_compile_recursion_guard() is called to set
899       up a callback from pcre2_compile() to a local function. The argument it
900       receives  is  the current parenthesis nesting depth; if this is greater
901       than the value given by the modifier, non-zero is returned, causing the
902       compilation to be aborted.
903
904   Using alternative character tables
905
906       The  value  specified for the tables modifier must be one of the digits
907       0, 1, or 2. It causes a specific set of built-in character tables to be
908       passed to pcre2_compile(). This is used in the PCRE2 tests to check be-
909       haviour with different character tables. The digit specifies the tables
910       as follows:
911
912         0   do not pass any special character tables
913         1   the default ASCII tables, as distributed in
914               pcre2_chartables.c.dist
915         2   a set of tables defining ISO 8859 characters
916
917       In  table 2, some characters whose codes are greater than 128 are iden-
918       tified as letters, digits, spaces, etc. The tables and locale modifiers
919       are mutually exclusive.
920
921   Setting certain match controls
922
923       The following modifiers are really subject modifiers, and are described
924       under "Subject Modifiers" below. However, they may  be  included  in  a
925       pattern's  modifier  list, in which case they are applied to every sub-
926       ject line that is processed with that pattern. These modifiers  do  not
927       affect the compilation process.
928
929             aftertext                  show text after match
930             allaftertext               show text after captures
931             allcaptures                show all captures
932             allusedtext                show all consulted text
933             altglobal                  alternative global matching
934         /g  global                     global matching
935             jitstack=<n>               set size of JIT stack
936             mark                       show mark values
937             replace=<string>           specify a replacement string
938             startchar                  show starting character when relevant
939             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
940             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
941             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
942             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
943
944       These  modifiers may not appear in a #pattern command. If you want them
945       as defaults, set them in a #subject command.
946
947   Specifying literal subject lines
948
949       If the subject_literal modifier is present on a pattern, all  the  sub-
950       ject lines that it matches are taken as literal strings, with no inter-
951       pretation of backslashes. It is not possible to set  subject  modifiers
952       on  such  lines, but any that are set as defaults by a #subject command
953       are recognized.
954
955   Saving a compiled pattern
956
957       When a pattern with the push modifier is successfully compiled,  it  is
958       pushed  onto  a  stack  of compiled patterns, and pcre2test expects the
959       next line to contain a new pattern (or a command) instead of a  subject
960       line. This facility is used when saving compiled patterns to a file, as
961       described in the section entitled "Saving and restoring  compiled  pat-
962       terns"  below.  If pushcopy is used instead of push, a copy of the com-
963       piled pattern is stacked, leaving the original  as  current,  ready  to
964       match  the  following  input  lines. This provides a way of testing the
965       pcre2_code_copy()  function.  The  push  and  pushcopy  modifiers   are
966       incompatible  with pattern-line modifiers such as  global  that act  at
967       match time. Any that are specified are ignored (for the stacked  copy),
968       with a warning message, except for replace, which causes an error. Note
969       that jitverify, which is allowed, does not carry through to any  subse-
970       quent matching that uses a stacked pattern.
971
972   Testing foreign pattern conversion
973
974       The  experimental  foreign pattern conversion functions in PCRE2 can be
975       tested by setting the convert modifier. Its argument is  a  colon-sepa-
976       rated  list  of  options,  which  set  the  equivalent  option  for the
977       pcre2_pattern_convert() function:
978
979         glob                    PCRE2_CONVERT_GLOB
980         glob_no_starstar        PCRE2_CONVERT_GLOB_NO_STARSTAR
981         glob_no_wild_separator  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
982         posix_basic             PCRE2_CONVERT_POSIX_BASIC
983         posix_extended          PCRE2_CONVERT_POSIX_EXTENDED
984         unset                   Unset all options
985
986       The "unset" value is useful for turning off a default that has been set
987       by a #pattern command. When one of these options is set, the input pat-
988       tern is passed to pcre2_pattern_convert(). If the  conversion  is  suc-
989       cessful,  the  result  is  reflected  in  the output and then passed to
990       pcre2_compile(). The normal utf and no_utf_check options, if set, cause
991       the  PCRE2_CONVERT_UTF  and  PCRE2_CONVERT_NO_UTF_CHECK  options  to be
992       passed to pcre2_pattern_convert().
993
994       By default, the conversion function is allowed to allocate a buffer for
995       its  output.  However, if the convert_length modifier is set to a value
996       greater than zero, pcre2test passes a buffer of the given length.  This
997       makes it possible to test the length check.
998
999       The  convert_glob_escape  and  convert_glob_separator  modifiers can be
1000       used to specify the escape and separator characters for  glob  process-
1001       ing, overriding the defaults, which are operating-system dependent.
1002
1003
1004SUBJECT MODIFIERS
1005
1006       The modifiers that can appear in subject lines and the #subject command
1007       are of two types.
1008
1009   Setting match options
1010
1011       The   following   modifiers   set   options   for   pcre2_match()    or
1012       pcre2_dfa_match(). See pcre2api for a description of their effects.
1013
1014             anchored                  set PCRE2_ANCHORED
1015             endanchored               set PCRE2_ENDANCHORED
1016             dfa_restart               set PCRE2_DFA_RESTART
1017             dfa_shortest              set PCRE2_DFA_SHORTEST
1018             no_jit                    set PCRE2_NO_JIT
1019             no_utf_check              set PCRE2_NO_UTF_CHECK
1020             notbol                    set PCRE2_NOTBOL
1021             notempty                  set PCRE2_NOTEMPTY
1022             notempty_atstart          set PCRE2_NOTEMPTY_ATSTART
1023             noteol                    set PCRE2_NOTEOL
1024             partial_hard (or ph)      set PCRE2_PARTIAL_HARD
1025             partial_soft (or ps)      set PCRE2_PARTIAL_SOFT
1026
1027       The  partial matching modifiers are provided with abbreviations because
1028       they appear frequently in tests.
1029
1030       If the posix or posix_nosub modifier was present on the pattern,  caus-
1031       ing the POSIX wrapper API to be used, the only option-setting modifiers
1032       that have any effect are notbol, notempty, and noteol, causing REG_NOT-
1033       BOL,  REG_NOTEMPTY,  and  REG_NOTEOL,  respectively,  to  be  passed to
1034       regexec(). The other modifiers are ignored, with a warning message.
1035
1036       There is one additional modifier that can be used with the POSIX  wrap-
1037       per. It is ignored (with a warning) if used for non-POSIX matching.
1038
1039             posix_startend=<n>[:<m>]
1040
1041       This  causes  the  subject  string  to be passed to regexec() using the
1042       REG_STARTEND option, which uses offsets to specify which  part  of  the
1043       string  is  searched.  If  only  one number is given, the end offset is
1044       passed as the end of the subject string. For more detail  of  REG_STAR-
1045       TEND,  see the pcre2posix documentation. If the subject string contains
1046       binary zeros (coded as escapes such as \x{00}  because  pcre2test  does
1047       not support actual binary zeros in its input), you must use posix_star-
1048       tend to specify its length.
1049
1050   Setting match controls
1051
1052       The following modifiers affect the matching process  or  request  addi-
1053       tional  information.  Some  of  them may also be specified on a pattern
1054       line (see above), in which case they apply to every subject  line  that
1055       is matched against that pattern.
1056
1057             aftertext                  show text after match
1058             allaftertext               show text after captures
1059             allcaptures                show all captures
1060             allusedtext                show all consulted text (non-JIT only)
1061             altglobal                  alternative global matching
1062             callout_capture            show captures at callout time
1063             callout_data=<n>           set a value to pass via callouts
1064             callout_error=<n>[:<m>]    control callout error
1065             callout_extra              show extra callout information
1066             callout_fail=<n>[:<m>]     control callout failure
1067             callout_no_where           do not show position of a callout
1068             callout_none               do not supply a callout function
1069             copy=<number or name>      copy captured substring
1070             depth_limit=<n>            set a depth limit
1071             dfa                        use pcre2_dfa_match()
1072             find_limits                find match and depth limits
1073             get=<number or name>       extract captured substring
1074             getall                     extract all captured substrings
1075         /g  global                     global matching
1076             heap_limit=<n>             set a limit on heap memory (Kbytes)
1077             jitstack=<n>               set size of JIT stack
1078             mark                       show mark values
1079             match_limit=<n>            set a match limit
1080             memory                     show heap memory usage
1081             null_context               match with a NULL context
1082             offset=<n>                 set starting offset
1083             offset_limit=<n>           set offset limit
1084             ovector=<n>                set size of output vector
1085             recursion_limit=<n>        obsolete synonym for depth_limit
1086             replace=<string>           specify a replacement string
1087             startchar                  show startchar when relevant
1088             startoffset=<n>            same as offset=<n>
1089             substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
1090             substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1091             substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1092             substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
1093             zero_terminate             pass the subject as zero-terminated
1094
1095       The effects of these modifiers are described in the following sections.
1096       When matching via the POSIX wrapper API, the  aftertext,  allaftertext,
1097       and  ovector subject modifiers work as described below. All other modi-
1098       fiers are either ignored, with a warning message, or cause an error.
1099
1100   Showing more text
1101
1102       The aftertext modifier requests that as well as outputting the part  of
1103       the subject string that matched the entire pattern, pcre2test should in
1104       addition output the remainder of the subject string. This is useful for
1105       tests where the subject contains multiple copies of the same substring.
1106       The allaftertext modifier requests the same action  for  captured  sub-
1107       strings as well as the main matched substring. In each case the remain-
1108       der is output on the following line with a plus character following the
1109       capture number.
1110
1111       The  allusedtext modifier requests that all the text that was consulted
1112       during a successful pattern match by the interpreter should  be  shown.
1113       This  feature  is not supported for JIT matching, and if requested with
1114       JIT it is ignored (with  a  warning  message).  Setting  this  modifier
1115       affects the output if there is a lookbehind at the start of a match, or
1116       a lookahead at the end, or if \K is used  in  the  pattern.  Characters
1117       that  precede or follow the start and end of the actual match are indi-
1118       cated in the output by '<' or '>' characters underneath them.  Here  is
1119       an example:
1120
1121           re> /(?<=pqr)abc(?=xyz)/
1122         data> 123pqrabcxyz456\=allusedtext
1123          0: pqrabcxyz
1124             <<<   >>>
1125
1126       This  shows  that  the  matched string is "abc", with the preceding and
1127       following strings "pqr" and "xyz"  having  been  consulted  during  the
1128       match (when processing the assertions).
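
       The layout of the two output lines in this example can be reproduced
       with the sketch below. The function name allusedtext_lines and its
       offset parameters are hypothetical; the sketch only shows how the
       marker line lines up with the consulted text.

```python
def allusedtext_lines(subject, used_start, match_start, match_end, used_end,
                      number=0):
    """Build the text line and the marker line for one match (a sketch).

    used_start..used_end is the consulted text; match_start..match_end is
    the actual match within it. All four are offsets into subject.
    """
    prefix = "%2d: " % number
    shown = subject[used_start:used_end]
    markers = ("<" * (match_start - used_start)     # consulted before match
               + " " * (match_end - match_start)    # the match itself
               + ">" * (used_end - match_end))      # consulted after match
    return prefix + shown, " " * len(prefix) + markers.rstrip()


if __name__ == "__main__":
    text_line, marker_line = allusedtext_lines("123pqrabcxyz456", 3, 6, 9, 12)
    print(text_line)
    print(marker_line)
```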
1129
1130       The  startchar  modifier  requests  that the starting character for the
1131       match be indicated, if it is different to  the  start  of  the  matched
1132       string. The only time when this occurs is when \K has been processed as
1133       part of the match. In this situation, the output for the matched string
1134       is  displayed  from  the  starting  character instead of from the match
1135       point, with circumflex characters under  the  earlier  characters.  For
1136       example:
1137
1138           re> /abc\Kxyz/
1139         data> abcxyz\=startchar
1140          0: abcxyz
1141             ^^^
1142
1143       Unlike  allusedtext, the startchar modifier can be used with JIT.  How-
1144       ever, these two modifiers are mutually exclusive.
1145
1146   Showing the value of all capture groups
1147
1148       The allcaptures modifier requests that the values of all potential cap-
1149       tured parentheses be output after a match. By default, only those up to
1150       the highest one actually used in the match are output (corresponding to
1151       the  return  code from pcre2_match()). Groups that did not take part in
1152       the match are output as "<unset>". This modifier is  not  relevant  for
1153       DFA  matching  (which does no capturing); it is ignored, with a warning
1154       message, if present.
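
       The difference between the default output and allcaptures can be
       sketched with Python's re module standing in for PCRE2. The function
       capture_lines is hypothetical, and the line format is only an
       approximation of pcre2test's output.

```python
import re


def capture_lines(match, allcaptures=False):
    """Format group values, stopping at the highest set group by default."""
    groups = [match.group(0)] + list(match.groups())
    if allcaptures:
        last = len(groups) - 1
    else:
        # Group 0 is always set, so this sequence is never empty.
        last = max(i for i, g in enumerate(groups) if g is not None)
    return ["%2d: %s" % (i, "<unset>" if groups[i] is None else groups[i])
            for i in range(last + 1)]


if __name__ == "__main__":
    m = re.match(r"(a)|(b)|(c)", "b")
    print("\n".join(capture_lines(m)))
    print("\n".join(capture_lines(m, allcaptures=True)))
```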
1155
1156   Testing callouts
1157
1158       A callout function is supplied when pcre2test calls the library  match-
1159       ing  functions,  unless callout_none is specified. Its behaviour can be
1160       controlled by various modifiers listed above  whose  names  begin  with
1161       callout_. Details are given in the section entitled "Callouts" below.
1162
1163   Finding all matches in a string
1164
1165       Searching for all possible matches within a subject can be requested by
1166       the global or altglobal modifier. After finding a match,  the  matching
1167       function  is  called  again to search the remainder of the subject. The
1168       difference between global and altglobal is that  the  former  uses  the
1169       start_offset  argument  to  pcre2_match() or pcre2_dfa_match() to start
1170       searching at a new point within the entire string (which is  what  Perl
1171       does), whereas the latter passes over a shortened subject. This makes a
1172       difference to the matching process if the pattern begins with a lookbe-
1173       hind assertion (including \b or \B).
1174
1175       If  an  empty  string  is  matched,  the  next  match  is done with the
1176       PCRE2_NOTEMPTY_ATSTART and PCRE2_ANCHORED flags set, in order to search
1177       for another, non-empty, match at the same point in the subject. If this
1178       match fails, the start offset is advanced,  and  the  normal  match  is
1179       retried.  This  imitates the way Perl handles such cases when using the
1180       /g modifier or the split() function.  Normally,  the  start  offset  is
1181       advanced  by  one  character,  but if the newline convention recognizes
1182       CRLF as a newline, and the current character is CR followed by  LF,  an
1183       advance of two characters occurs.
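
       The advance rule in the last sentence can be written out as a small
       function. This is a sketch in Python with an invented name; it indexes
       a Python string by character, whereas PCRE2 itself works in code units,
       and the crlf_is_newline flag stands for "the newline convention
       recognizes CRLF".

```python
def advance_after_empty_match(subject, offset, crlf_is_newline):
    """Return the next start offset after the anchored retry has failed."""
    # Step over CR LF as a unit when CRLF counts as a newline.
    if crlf_is_newline and subject[offset:offset + 2] == "\r\n":
        return offset + 2
    # Otherwise advance by a single character.
    return offset + 1


print(advance_after_empty_match("a\r\nb", 1, crlf_is_newline=True))
print(advance_after_empty_match("a\r\nb", 1, crlf_is_newline=False))
```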
1184
1185   Testing substring extraction functions
1186
1187       The  copy  and  get  modifiers  can  be  used  to  test  the pcre2_sub-
1188       string_copy_xxx() and pcre2_substring_get_xxx() functions.  They can be
1189       given  more than once, and each can specify a group name or number, for
1190       example:
1191
1192          abcd\=copy=1,copy=3,get=G1
1193
1194       If the #subject command is used to set default copy and/or  get  lists,
1195       these  can  be unset by specifying a negative number to cancel all num-
1196       bered groups and an empty name to cancel all named groups.
1197
1198       The getall modifier tests  pcre2_substring_list_get(),  which  extracts
1199       all captured substrings.
1200
1201       If  the  subject line is successfully matched, the substrings extracted
1202       by the convenience functions are output with  C,  G,  or  L  after  the
1203       string  number  instead  of  a colon. This is in addition to the normal
1204       full list. The string length (that is, the return from  the  extraction
1205       function) is given in parentheses after each substring, followed by the
1206       name when the extraction was by name.
1207
1208   Testing the substitution function
1209
1210       If the replace modifier is  set,  the  pcre2_substitute()  function  is
1211       called  instead of one of the matching functions. Note that replacement
1212       strings cannot contain commas, because a comma signifies the end  of  a
1213       modifier. This is not thought to be an issue in a test program.
1214
1215       Unlike  subject strings, pcre2test does not process replacement strings
1216       for escape sequences. In UTF mode, a replacement string is  checked  to
1217       see  if it is a valid UTF-8 string. If so, it is correctly converted to
1218       a UTF string of the appropriate code unit width. If it is not  a  valid
1219       UTF-8  string, the individual code units are copied directly. This pro-
1220       vides a means of passing an invalid UTF-8 string for testing purposes.
1221
1222       The following modifiers set options (in addition to the  normal  match
1223       options) for pcre2_substitute():
1224
1225         global                      PCRE2_SUBSTITUTE_GLOBAL
1226         substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
1227         substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
1228         substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
1229         substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
1230
1231
1232       After  a  successful  substitution, the modified string is output, pre-
1233       ceded by the number of replacements. This may be zero if there were  no
1234       matches. Here is a simple example of a substitution test:
1235
1236         /abc/replace=xxx
1237             =abc=abc=
1238          1: =xxx=abc=
1239             =abc=abc=\=global
1240          2: =xxx=xxx=
1241
1242       Subject  and replacement strings should be kept relatively short (fewer
1243       than 256 characters) for substitution tests, as fixed-size buffers  are
1244       used.  To  make it easy to test for buffer overflow, if the replacement
1245       string starts with a number in square brackets, that number  is  passed
1246       to  pcre2_substitute()  as  the  size  of  the  output buffer, with the
1247       replacement string starting at the next character. Here is  an  example
1248       that tests the edge case:
1249
1250         /abc/
1251             123abc123\=replace=[10]XYZ
1252          1: 123XYZ123
1253             123abc123\=replace=[9]XYZ
1254         Failed: error -47: no more memory
1255
1256       The    default    action    of    pcre2_substitute()   is   to   return
1257       PCRE2_ERROR_NOMEMORY when the output buffer is too small.  However,  if
1258       the  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  option is set (by using the sub-
1259       stitute_overflow_length modifier), pcre2_substitute() continues  to  go
1260       through  the  motions of matching and substituting, in order to compute
1261       the size of buffer that is required. When this happens, pcre2test shows
1262       the required buffer length (which includes space for the trailing zero)
1263       as part of the error message. For example:
1264
1265         /abc/substitute_overflow_length
1266             123abc123\=replace=[9]XYZ
1267         Failed: error -47: no more memory: 10 code units are needed
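
       The arithmetic behind these two examples is that the result "123XYZ123"
       needs nine code units plus one for the trailing zero. The Python sketch
       below only simulates that bookkeeping; it does not call
       pcre2_substitute(), the function names are invented, the error number
       is left out, and len() equals the code unit count only because the
       strings are ASCII.

```python
import re


def parse_replacement(text):
    """Split an optional leading "[n]" buffer size from a replacement."""
    m = re.match(r"\[(\d+)\]", text)
    if m:
        return int(m.group(1)), text[m.end():]
    return None, text


def simulate_substitute(pattern, subject, replace_arg, overflow_length=False):
    """Mimic the documented outcome of a single (non-global) substitution."""
    buffer_size, replacement = parse_replacement(replace_arg)
    # A function replacement stops Python interpreting backslashes itself.
    result, count = re.subn(pattern, lambda m: replacement, subject, count=1)
    needed = len(result) + 1  # one extra unit for the trailing zero
    if buffer_size is not None and needed > buffer_size:
        message = "Failed: no more memory"
        if overflow_length:
            message += f": {needed} code units are needed"
        return message
    return f"{count}: {result}"


print(simulate_substitute("abc", "123abc123", "[10]XYZ"))
print(simulate_substitute("abc", "123abc123", "[9]XYZ"))
print(simulate_substitute("abc", "123abc123", "[9]XYZ", overflow_length=True))
```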
1268
1269       A replacement string is ignored with POSIX and DFA matching. Specifying
1270       partial  matching  provokes  an  error return ("bad option value") from
1271       pcre2_substitute().
1272
1273   Setting the JIT stack size
1274
1275       The jitstack modifier provides a way of setting the maximum stack  size
1276       that  is  used  by the just-in-time optimization code. It is ignored if
1277       JIT optimization is not being used. The value is a number of  kibibytes
1278       (units  of  1024  bytes). Setting zero reverts to the default of 32KiB.
1279       Providing a stack that is larger than the default is necessary only for
1280       very  complicated  patterns.  If  jitstack is set non-zero on a subject
1281       line it overrides any value that was set on the pattern.
1282
1283   Setting heap, match, and depth limits
1284
1285       The heap_limit, match_limit, and depth_limit modifiers set  the  appro-
1286       priate  limits  in the match context. These values are ignored when the
1287       find_limits modifier is specified.
1288
1289   Finding minimum limits
1290
1291       If the find_limits modifier is present on  a  subject  line,  pcre2test
1292       calls  the  relevant matching function several times, setting different
1293       values   in   the    match    context    via    pcre2_set_heap_limit(),
1294       pcre2_set_match_limit(),  or pcre2_set_depth_limit() until it finds the
1295       minimum value for each parameter that allows the  match  to  complete
1296       without error. If JIT is being used, only the match limit is relevant.
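
       The text above does not say how the repeated calls choose their values.
       One plausible way to find such a minimum, shown here only as a sketch
       and not as pcre2test's actual strategy, is a bisection over the limit.
       The callback match_succeeds is hypothetical: it stands for "run the
       match with this limit and report whether it completed without a limit
       error", and is assumed to stay true once the limit is large enough.

```python
def find_minimum_limit(match_succeeds, upper_bound):
    """Smallest limit in 0..upper_bound for which match_succeeds is true."""
    if not match_succeeds(upper_bound):
        return None  # even the largest allowed limit is not enough
    low, high = 0, upper_bound
    while low < high:
        mid = (low + high) // 2
        if match_succeeds(mid):
            high = mid      # mid is enough; the minimum is mid or lower
        else:
            low = mid + 1   # mid is too small
    return low


# Stand-in for a real match that happens to need a limit of at least 37.
print(find_minimum_limit(lambda limit: limit >= 37, 1000))
```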
1297
1298       When using this modifier, the pattern should not contain any limit set-
1299       tings such as (*LIMIT_MATCH=...)  within  it.  If  such  a  setting  is
1300       present and is lower than the minimum matching value, the minimum value
1301       cannot be found because pcre2_set_match_limit() etc. are only  able  to
1302       reduce the value of an in-pattern limit; they cannot increase it.
1303
1304       For  non-DFA  matching,  the minimum depth_limit number is a measure of
1305       how much nested backtracking happens (that is, how deeply the pattern's
1306       tree  is  searched).  In the case of DFA matching, depth_limit controls
1307       the depth of recursive calls of the internal function that is used  for
1308       handling pattern recursion, lookaround assertions, and atomic groups.
1309
1310       For non-DFA matching, the match_limit number is a measure of the amount
1311       of backtracking that takes place, and learning the minimum value can be
1312       instructive.  For  most  simple matches, the number is quite small, but
1313       for patterns with very large numbers of matching possibilities, it  can
1314       become  large very quickly with increasing length of subject string. In
1315       the case of DFA matching, match_limit  controls  the  total  number  of
1316       calls, both recursive and non-recursive, to the internal matching func-
1317       tion, thus controlling the overall amount of computing resource that is
1318       used.
1319
1320       For  both  kinds  of  matching,  the  heap_limit  number,  which  is in
1321       kibibytes (units of 1024 bytes), limits the amount of heap memory  used
1322       for matching. A value of zero disables the use of any heap memory; many
1323       simple pattern matches can be done without using the heap, so  zero  is
1324       not an unreasonable setting.
1325
1326   Showing MARK names
1327
1328
1329       The mark modifier causes the names from backtracking control verbs that
1330       are returned from calls to pcre2_match() to be displayed. If a mark  is
1331       returned  for a match, non-match, or partial match, pcre2test shows it.
1332       For a match, it is on a line by itself, tagged with  "MK:".  Otherwise,
1333       it is added to the non-match message.
1334
1335   Showing memory usage
1336
1337       The  memory modifier causes pcre2test to log the sizes of all heap mem-
1338       ory  allocation  and  freeing  calls  that  occur  during  a  call   to
1339       pcre2_match()  or  pcre2_dfa_match().  These  occur  only  when a match
1340       requires a bigger vector than the default for remembering  backtracking
1341       points  (pcre2_match())  or for internal workspace (pcre2_dfa_match()).
1342       In many cases there will be no heap memory used and therefore no  addi-
1343       tional output. No heap memory is allocated during matching with JIT, so
1344       in that case the memory modifier never has any effect. For  this  modi-
1345       fier  to  work,  the  null_context modifier must not be set on both the
1346       pattern and the subject, though it can be set on one or the other.
1347
1348   Setting a starting offset
1349
1350       The offset modifier sets an offset  in  the  subject  string  at  which
1351       matching starts. Its value is a number of code units, not characters.
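
       In UTF mode the distinction matters whenever the subject contains
       characters that occupy more than one code unit. The following Python
       sketch (the function name is invented) converts a character index into
       the corresponding code unit offset for each of the three code unit
       widths, which is the kind of value the offset modifier expects.

```python
def code_unit_offset(subject, char_index, width=8):
    """Offset, in code units of the given width, of subject[char_index]."""
    encoding = {8: "utf-8", 16: "utf-16-le", 32: "utf-32-le"}[width]
    prefix = subject[:char_index].encode(encoding)
    return len(prefix) // (width // 8)


# The fourth character of "naïve" ("v") is at character index 3.
for width in (8, 16, 32):
    print(width, code_unit_offset("naïve", 3, width))
```

       For "naïve", the character at index 3 is at offset 4 in the 8-bit
       library, because "ï" takes two UTF-8 code units, but at offset 3 in the
       16-bit and 32-bit libraries.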
1352
1353   Setting an offset limit
1354
1355       The  offset_limit  modifier  sets  a limit for unanchored matches. If a
1356       match cannot be found starting at or before this offset in the subject,
1357       a "no match" return is given. The data value is a number of code units,
1358       not characters. When this modifier is used, the use_offset_limit  modi-
1359       fier must have been set for the pattern; if not, an error is generated.
1360
1361   Setting the size of the output vector
1362
1363       The  ovector  modifier  applies  only  to  the subject line in which it
1364       appears, though of course it can also be used to set  a  default  in  a
1365       #subject  command. It specifies the number of pairs of offsets that are
1366       available for storing matching information. The default is 15.
1367
1368       A value of zero is useful when testing the POSIX API because it  causes
1369       regexec() to be called with a NULL capture vector. When not testing the
1370       POSIX API, a value of  zero  is  used  to  cause  pcre2_match_data_cre-
1371       ate_from_pattern()  to  be  called, in order to create a match block of
1372       exactly the right size for the pattern. (It is not possible to create a
1373       match  block  with  a zero-length ovector; there is always at least one
1374       pair of offsets.)
1375
1376   Passing the subject as zero-terminated
1377
1378       By default, the subject string is passed to a native API matching func-
1379       tion with its correct length. In order to test the facility for passing
1380       a zero-terminated string, the zero_terminate modifier is  provided.  It
1381       causes  the length to be passed as PCRE2_ZERO_TERMINATED. When matching
1382       via the POSIX interface, this modifier is ignored, with a warning.
1383
1384       When testing pcre2_substitute(), this modifier also has the  effect  of
1385       passing the replacement string as zero-terminated.
1386
1387   Passing a NULL context
1388
1389       Normally,   pcre2test   passes   a   context  block  to  pcre2_match(),
1390       pcre2_dfa_match() or pcre2_jit_match(). If the null_context modifier is
1391       set,  however,  NULL  is  passed. This is for testing that the matching
1392       functions behave correctly in this case (they use default values). This
1393       modifier  cannot  be used with the find_limits modifier or when testing
1394       the substitution function.
1395
1396
1397THE ALTERNATIVE MATCHING FUNCTION
1398
1399       By default, pcre2test  uses  the  standard  PCRE2  matching  function,
1400       pcre2_match(), to match each subject line. PCRE2 also supports an
1401       alternative matching function, pcre2_dfa_match(), which operates in a
1402       different way, and has some restrictions. The differences between the
1403       two functions are described in the pcre2matching documentation.
1404
1405       If the dfa modifier is set, the alternative matching function is  used.
1406       This  function  finds all possible matches at a given point in the sub-
1407       ject. If, however, the dfa_shortest modifier is set,  processing  stops
1408       after  the  first  match is found. This is always the shortest possible
1409       match.
1410
1411
1412DEFAULT OUTPUT FROM pcre2test
1413
1414       This section describes the output when the  normal  matching  function,
1415       pcre2_match(), is being used.
1416
1417       When  a  match  succeeds,  pcre2test  outputs the list of captured sub-
1418       strings, starting with number 0 for the string that matched  the  whole
1419       pattern.    Otherwise,  it  outputs  "No  match"  when  the  return  is
1420       PCRE2_ERROR_NOMATCH, or "Partial  match:"  followed  by  the  partially
1421       matching  substring  when the return is PCRE2_ERROR_PARTIAL. (Note that
1422       this is the entire substring that  was  inspected  during  the  partial
1423       match;  it  may  include  characters before the actual match start if a
1424       lookbehind assertion, \K, \b, or \B was involved.)
1425
1426       For any other return, pcre2test outputs the PCRE2 negative error number
1427       and  a  short  descriptive  phrase. If the error is a failed UTF string
1428       check, the code unit offset of the start of the  failing  character  is
1429       also output. Here is an example of an interactive pcre2test run.
1430
1431         $ pcre2test
1432         PCRE2 version 10.22 2016-07-29
1433
1434           re> /^abc(\d+)/
1435         data> abc123
1436          0: abc123
1437          1: 123
1438         data> xyz
1439         No match
1440
1441       Unset capturing substrings that are not followed by one that is set are
1442       not shown by pcre2test unless the allcaptures modifier is specified. In
1443       the following example, there are two capturing substrings, but when the
1444       first data line is matched, the second, unset substring is  not  shown.
1445       An  "internal" unset substring is shown as "<unset>", as for the second
1446       data line.
1447
1448           re> /(a)|(b)/
1449         data> a
1450          0: a
1451          1: a
1452         data> b
1453          0: b
1454          1: <unset>
1455          2: b
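
       The display rule can be restated as "drop unset groups from the end of
       the list, but keep unset groups that come before a set one". The Python
       sketch below applies that rule to a match object from Python's re
       module, which is used only as a stand-in for PCRE2; the function name
       and the allcaptures parameter are invented to mirror the modifier.

```python
import re


def format_captures(match, allcaptures=False):
    """List group 0 and the capture groups the way described above."""
    groups = [match.group(0), *match.groups()]
    if not allcaptures:
        # Trailing unset groups are not shown; group 0 is always set.
        while groups[-1] is None:
            groups.pop()
    return [f"{i}: {'<unset>' if g is None else g}"
            for i, g in enumerate(groups)]


pattern = re.compile(r"(a)|(b)")
print(format_captures(pattern.search("a")))
print(format_captures(pattern.search("b")))
print(format_captures(pattern.search("a"), allcaptures=True))
```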
1456
1457       If the strings contain any non-printing characters, they are output  as
1458       \xhh  escapes  if  the  value is less than 256 and UTF mode is not set.
1459       Otherwise they are output as \x{hh...} escapes. See below for the defi-
1460       nition  of  non-printing  characters. If the aftertext modifier is set,
1461       the output for substring 0 is followed by the rest  of  the  subject
1462       string, identified by "0+" like this:
1463
1464           re> /cat/aftertext
1465         data> cataract
1466          0: cat
1467          0+ aract
1468
1469       If  global  matching  is  requested, the results of successive matching
1470       attempts are output in sequence, like this:
1471
1472           re> /\Bi(\w\w)/g
1473         data> Mississippi
1474          0: iss
1475          1: ss
1476          0: iss
1477          1: ss
1478          0: ipp
1479          1: pp
1480
1481       "No match" is output only if the first match attempt fails. Here is  an
1482       example  of  a  failure  message (the offset 4 that is specified by the
1483       offset modifier is past the end of the subject string):
1484
1485           re> /xyz/
1486         data> xyz\=offset=4
1487         Error -24 (bad offset value)
1488
1489       Note that whereas patterns can be continued over several lines (a plain
1490       ">"  prompt is used for continuations), subject lines may not. However,
1491       newlines can be included in a subject by means of the \n escape (or \r,
1492       \r\n, etc., depending on the newline sequence setting).
1493
1494
1495OUTPUT FROM THE ALTERNATIVE MATCHING FUNCTION
1496
1497       When the alternative matching function, pcre2_dfa_match(), is used, the
1498       output consists of a list of all the matches that start  at  the  first
1499       point in the subject where there is at least one match. For example:
1500
1501           re> /(tang|tangerine|tan)/
1502         data> yellow tangerine\=dfa
1503          0: tangerine
1504          1: tang
1505          2: tan
1506
1507       Using  the normal matching function on this data finds only "tang". The
1508       longest matching string is always  given  first  (and  numbered  zero).
1509       After  a  PCRE2_ERROR_PARTIAL  return,  the output is "Partial match:",
1510       followed by the partially matching substring. Note  that  this  is  the
1511       entire  substring  that  was inspected during the partial match; it may
1512       include characters before the actual match start if a lookbehind asser-
1513       tion, \b, or \B was involved. (\K is not supported for DFA matching.)
1514
1515       If global matching is requested, the search for further matches resumes
1516       at the end of the longest match. For example:
1517
1518           re> /(tang|tangerine|tan)/g
1519         data> yellow tangerine and tangy sultana\=dfa
1520          0: tangerine
1521          1: tang
1522          2: tan
1523          0: tang
1524          1: tan
1525          0: tan
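
       The shape of this output can be reproduced with a brute-force sketch in
       Python. It is not a DFA and does not use PCRE2: it simply asks Python's
       backtracking re module, for each possible end point, whether the
       pattern can match exactly that span, and lists the successes longest
       first. The function names are invented, and because the end point
       truncates what the pattern can see, lookaheads near the end of a span
       would not behave as they do in pcre2_dfa_match().

```python
import re


def all_matches_at_first_point(pattern, subject, start=0):
    """Find the first start point with a match; list every match there."""
    compiled = re.compile(pattern)
    for begin in range(start, len(subject) + 1):
        # Try the longest span first so that the list is longest-first.
        ends = [end for end in range(len(subject), begin - 1, -1)
                if compiled.fullmatch(subject, begin, end)]
        if ends:
            return begin, [subject[begin:end] for end in ends]
    return None, []


def all_matches_global(pattern, subject):
    """Repeat the search, resuming at the end of each longest match."""
    results, start = [], 0
    while start <= len(subject):
        begin, found = all_matches_at_first_point(pattern, subject, start)
        if begin is None:
            break
        results.append(found)
        # Step at least one character so an empty match cannot loop forever.
        start = begin + max(len(found[0]), 1)
    return results


for group in all_matches_global(r"(tang|tangerine|tan)",
                                "yellow tangerine and tangy sultana"):
    print(group)
```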
1526
1527       The alternative matching function does not support  substring  capture,
1528       so  the  modifiers  that are concerned with captured substrings are not
1529       relevant.
1530
1531
1532RESTARTING AFTER A PARTIAL MATCH
1533
1534       When the alternative matching function has given  the  PCRE2_ERROR_PAR-
1535       TIAL return, indicating that the subject partially matched the pattern,
1536       you can restart the match with additional subject data by means of  the
1537       dfa_restart modifier. For example:
1538
1539           re> /^\d?\d(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\d\d$/
1540         data> 23ja\=P,dfa
1541         Partial match: 23ja
1542         data> n05\=dfa,dfa_restart
1543          0: n05
1544
1545       For  further  information  about partial matching, see the pcre2partial
1546       documentation.
1547
1548
1549CALLOUTS
1550
1551       If the pattern contains any callout requests, pcre2test's callout func-
1552       tion  is  called during matching unless callout_none is specified. This
1553       works with both matching functions, and with JIT, though there are some
1554       differences  in behaviour. The output for callouts with numerical argu-
1555       ments and those with string arguments is slightly different.
1556
1557   Callouts with numerical arguments
1558
1559       By default, the callout function displays the callout number, the start
1560       and  current positions in the subject text at the callout time, and the
1561       next pattern item to be tested. For example:
1562
1563         --->pqrabcdef
1564           0    ^  ^     \d
1565
1566       This output indicates that  callout  number  0  occurred  for  a  match
1567       attempt  starting  at  the fourth character of the subject string, when
1568       the pointer was at the seventh character, and  when  the  next  pattern
1569       item  was  \d.  Just  one circumflex is output if the start and current
1570       positions are the same, or if the current position precedes  the  start
1571       position, which can happen if the callout is in a lookbehind assertion.
1572
1573       Callouts numbered 255 are assumed to be automatic callouts, inserted as
1574       a result of the auto_callout pattern modifier. In this case, instead of
1575       showing  the  callout  number, the offset in the pattern, preceded by a
1576       plus, is output. For example:
1577
1578           re> /\d?[A-E]\*/auto_callout
1579         data> E*
1580         --->E*
1581          +0 ^      \d?
1582          +3 ^      [A-E]
1583          +8 ^^     \*
1584         +10 ^ ^
1585          0: E*
1586
1587       If a pattern contains (*MARK) items, an additional line is output when-
1588       ever  a  change  of  latest mark is passed to the callout function. For
1589       example:
1590
1591           re> /a(*MARK:X)bc/auto_callout
1592         data> abc
1593         --->abc
1594          +0 ^       a
1595          +1 ^^      (*MARK:X)
1596         +10 ^^      b
1597         Latest Mark: X
1598         +11 ^ ^     c
1599         +12 ^  ^
1600          0: abc
1601
1602       The mark changes between matching "a" and "b", but stays the  same  for
1603       the  rest  of  the match, so nothing more is output. If, as a result of
1604       backtracking, the mark reverts to being unset, the  text  "<unset>"  is
1605       output.
1606
1607   Callouts with string arguments
1608
1609       The output for a callout with a string argument is similar, except that
1610       instead of outputting a callout number before the position  indicators,
1611       the  callout  string  and  its  offset in the pattern string are output
1612       before the reflection of the subject string, and the subject string  is
1613       reflected for each callout. For example:
1614
1615           re> /^ab(?C'first')cd(?C"second")ef/
1616         data> abcdefg
1617         Callout (7): 'first'
1618         --->abcdefg
1619             ^ ^         c
1620         Callout (20): "second"
1621         --->abcdefg
1622             ^   ^       e
1623          0: abcdef
1624
1625
1626   Callout modifiers
1627
1628       The  callout  function in pcre2test returns zero (carry on matching) by
1629       default, but you can use a callout_fail modifier in a subject  line  to
1630       change this and other parameters of the callout (see below).
1631
1632       If the callout_capture modifier is set, the current captured groups are
1633       output when a callout occurs. This is useful only for non-DFA matching,
1634       as  pcre2_dfa_match()  does  not  support capturing, so no captures are
1635       ever shown.
1636
1637       The normal callout output, showing the callout number or pattern offset
1638       (as  described above) is suppressed if the callout_no_where modifier is
1639       set.
1640
1641       When using the interpretive  matching  function  pcre2_match()  without
1642       JIT,  setting  the callout_extra modifier causes additional output from
1643       pcre2test's callout function to be generated. For the first callout  in
1644       a  match  attempt at a new starting position in the subject, "New match
1645       attempt" is output. If there has been a backtrack since the last  call-
1646       out (or start of matching if this is the first callout), "Backtrack" is
1647       output, followed by "No other matching paths" if  the  backtrack  ended
1648       the previous match attempt. For example:
1649
1650          re> /(a+)b/auto_callout,no_start_optimize,no_auto_possess
1651         data> aac\=callout_extra
1652         New match attempt
1653         --->aac
1654          +0 ^       (
1655          +1 ^       a+
1656          +3 ^ ^     )
1657          +4 ^ ^     b
1658         Backtrack
1659         --->aac
1660          +3 ^^      )
1661          +4 ^^      b
1662         Backtrack
1663         No other matching paths
1664         New match attempt
1665         --->aac
1666          +0  ^      (
1667          +1  ^      a+
1668          +3  ^^     )
1669          +4  ^^     b
1670         Backtrack
1671         No other matching paths
1672         New match attempt
1673         --->aac
1674          +0   ^     (
1675          +1   ^     a+
1676         Backtrack
1677         No other matching paths
1678         New match attempt
1679         --->aac
1680          +0    ^    (
1681          +1    ^    a+
1682         No match
1683
1684       Notice  that  various  optimizations must be turned off if you want all
1685       possible matching paths to be  scanned.  If  no_start_optimize  is  not
1686       used,  there  is an immediate "no match", without any callouts, because
1687       the starting optimization fails to find "b" in the  subject,  which  it
1688       knows  must  be  present for any match. If no_auto_possess is not used,
1689       the "a+" item is turned into "a++", which reduces the number  of  back-
1690       tracks.
1691
1692       The  callout_extra modifier has no effect if used with the DFA matching
1693       function, or with JIT.
1694
1695   Return values from callouts
1696
1697       The default return from the callout  function  is  zero,  which  allows
1698       matching to continue. The callout_fail modifier can be given one or two
1699       numbers. If there is only one number, 1 is returned instead of 0 (caus-
1700       ing matching to backtrack) when a callout of that number is reached. If
1701       two numbers (<n>:<m>) are given, 1 is  returned  when  callout  <n>  is
1702       reached  and  there  have been at least <m> callouts. The callout_error
1703       modifier is similar, except that PCRE2_ERROR_CALLOUT is returned, caus-
1704       ing  the entire matching process to be aborted. If both these modifiers
1705       are set for the same callout number,  callout_error  takes  precedence.
1706       Note  that  callouts  with string arguments are always given the number
1707       zero.
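
       The decision described in this paragraph can be summarized as a small
       function. This is a Python sketch with invented names, not pcre2test's
       code. It assumes that the callout count includes the current callout
       and counts callouts of every number, and that a single-number setting
       behaves like <n>:1; the text above does not spell these points out.

```python
def callout_return(number, count, fail=None, error=None):
    """Return value for one callout; fail and error are (n, m) pairs."""
    def triggered(setting):
        if setting is None:
            return False
        n, m = setting
        return number == n and count >= m

    if triggered(error):
        return "PCRE2_ERROR_CALLOUT"  # callout_error takes precedence
    if triggered(fail):
        return 1                      # forces a backtrack
    return 0                          # carry on matching


print(callout_return(3, 2, fail=(3, 4)))
print(callout_return(3, 4, fail=(3, 4)))
print(callout_return(3, 4, fail=(3, 4), error=(3, 1)))
```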
1708
1709       The callout_data modifier can be given an unsigned or a  negative  num-
1710       ber.   This  is  set  as the "user data" that is passed to the matching
1711       function, and passed back when the callout  function  is  invoked.  Any
1712       value  other  than  zero  is  used as a return from pcre2test's callout
1713       function.
1714
1715       Inserting callouts can be helpful when using pcre2test to check compli-
1716       cated  regular expressions. For further information about callouts, see
1717       the pcre2callout documentation.
1718
1719
1720NON-PRINTING CHARACTERS
1721
1722       When pcre2test is outputting text in the compiled version of a pattern,
1723       bytes  other  than 32-126 are always treated as non-printing characters
1724       and are therefore shown as hex escapes.
1725
1726       When pcre2test is outputting text that is a matched part of  a  subject
1727       string,  it behaves in the same way, unless a different locale has been
1728       set for the pattern (using the locale  modifier).  In  this  case,  the
1729       isprint()  function  is  used  to distinguish printing and non-printing
1730       characters.
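
       Combined with the rule given under "DEFAULT OUTPUT FROM pcre2test"
       (\xhh for values below 256 when UTF mode is not set, \x{hh...}
       otherwise), the default behaviour without a locale can be sketched in
       Python as follows. The function name is invented, and the use of
       lower-case hex digits with at least two digits is an assumption about
       the exact formatting.

```python
def show_string(text, utf=False):
    """Render text with non-printing characters as hex escapes."""
    out = []
    for ch in text:
        value = ord(ch)
        if 32 <= value <= 126:
            out.append(ch)                       # printing character
        elif value < 256 and not utf:
            out.append(f"\\x{value:02x}")        # \xhh form
        else:
            out.append(f"\\x{{{value:02x}}}")    # \x{hh...} form
    return "".join(out)


print(show_string("a\tb"))
print(show_string("a\tb", utf=True))
print(show_string("\u0100"))
```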
1731
1732
1733SAVING AND RESTORING COMPILED PATTERNS
1734
1735       It is possible to save compiled patterns  on  disc  or  elsewhere,  and
1736       reload them later, subject to a number of restrictions. JIT data cannot
1737       be saved. The host on which the patterns are reloaded must  be  running
1738       the same version of PCRE2, with the same code unit width, and must also
1739       have the same endianness, pointer width  and  PCRE2_SIZE  type.  Before
1740       compiled  patterns  can be saved they must be serialized, that is, con-
1741       verted to a stream of bytes. A single byte stream may contain any  num-
1742       ber  of  compiled  patterns,  but  they must all use the same character
1743       tables. A single copy of the tables is included in the byte stream (its
1744       size is 1088 bytes).
1745
1746       The  functions  whose  names  begin  with pcre2_serialize_ are used for
1747       serializing and de-serializing. They are described in the  pcre2serial-
1748       ize  documentation.  In  this  section  we  describe  the  features  of
1749       pcre2test that can be used to test these functions.
1750
1751       Note that "serialization" in PCRE2 does not convert compiled patterns to
1752       an abstract format, as serialization in Java or .NET does; it just makes a
1753       reloadable byte code stream. Hence the reloading restrictions given above.
1754
1755       In pcre2test, when a pattern with the push modifier  is  successfully
1756       compiled, it is pushed onto a stack of compiled patterns, and pcre2test
1757       expects the next line to contain a new pattern (or command) instead  of
1758       a subject line. By contrast, the pushcopy modifier causes a copy of the
1759       compiled pattern to be stacked,  leaving  the  original  available  for
1760       immediate matching. By using push and/or pushcopy, a number of patterns
1761       can be compiled and retained. These  modifiers  are  incompatible  with
1762       posix, and control modifiers that act at match time are ignored (with a
1763       message) for the stacked patterns. The jitverify modifier applies  only
1764       at compile time.
1765
1766       The command
1767
1768         #save <filename>
1769
1770       causes all the stacked patterns to be serialized and the result written
1771       to the named file. Afterwards, all the stacked patterns are freed.  The
1772       command
1773
1774         #load <filename>
1775
1776       reads  the  data in the file, and then arranges for it to be de-serial-
1777       ized, with the resulting compiled patterns added to the pattern  stack.
1778       The  pattern  on the top of the stack can be retrieved by the #pop com-
1779       mand, which must be followed by  lines  of  subjects  that  are  to  be
1780       matched  with  the pattern, terminated as usual by an empty line or end
1781       of file. This command may be followed by  a  modifier  list  containing
1782       only  control  modifiers that act after a pattern has been compiled. In
1783       particular,  hex,  posix,  posix_nosub,  push,  and  pushcopy  are  not
1784       allowed,  nor are any option-setting modifiers.  The JIT modifiers are,
1785       however, permitted. Here is an example that saves and reloads two
1786       patterns.
1787
1788         /abc/push
1789         /xyz/push
1790         #save tempfile
1791         #load tempfile
1792         #pop info
1793         xyz
1794
1795         #pop jit,bincode
1796         abc
1797
1798       If  jitverify  is  used with #pop, it does not automatically imply jit,
1799       which is different behaviour from when it is used on a pattern.
1800
1801       The #popcopy command is analogous to the pushcopy modifier in  that  it
1802       makes current a copy of the topmost stack pattern, leaving the original
1803       still on the stack.
1804
1805
1806SEE ALSO
1807
1808       pcre2(3), pcre2api(3), pcre2callout(3), pcre2jit(3), pcre2matching(3),
1809       pcre2partial(3), pcre2pattern(3), pcre2serialize(3).
1810
1811
1812AUTHOR
1813
1814       Philip Hazel
1815       University Computing Service
1816       Cambridge, England.
1817
1818
1819REVISION
1820
1821       Last updated: 21 July 2018
1822       Copyright (c) 1997-2018 University of Cambridge.
1823