README file for PCRE2 (Perl-compatible regular expression library)
------------------------------------------------------------------

PCRE2 is a re-working of the original PCRE library to provide an entirely new
API. The latest release of PCRE2 is always available in three alternative
formats from:

  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.gz
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.tar.bz2
  ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre/pcre2-xxx.zip

There is a mailing list for discussion about the development of PCRE (both the
original and new APIs) at pcre-dev@exim.org. You can access the archives and
subscribe or manage your subscription here:

  https://lists.exim.org/mailman/listinfo/pcre-dev

Please read the NEWS file if you are upgrading from a previous release. The
contents of this README file are:

  The PCRE2 APIs
  Documentation for PCRE2
  Contributions by users of PCRE2
  Building PCRE2 on non-Unix-like systems
  Building PCRE2 without using autotools
  Building PCRE2 using autotools
  Retrieving configuration information
  Shared libraries
  Cross-compiling using autotools
  Making new tarballs
  Testing PCRE2
  Character tables
  File manifest


The PCRE2 APIs
--------------

PCRE2 is written in C, and it has its own API. There are three sets of
functions, one for the 8-bit library, which processes strings of bytes, one for
the 16-bit library, which processes strings of 16-bit values, and one for the
32-bit library, which processes strings of 32-bit values. There are no C++
wrappers.

The distribution does contain a set of C wrapper functions for the 8-bit
library that are based on the POSIX regular expression API (see the pcre2posix
man page). These can be found in a library called libpcre2-posix.
Note that this just provides a POSIX calling interface to PCRE2; the regular
expressions themselves still follow Perl syntax and semantics. The POSIX API is
restricted, and does not give full access to all of PCRE2's facilities.

The header file for the POSIX-style functions is called pcre2posix.h. The
official POSIX name is regex.h, but I did not want to risk possible problems
with existing files of that name by distributing it that way. To use PCRE2 with
an existing program that uses the POSIX API, pcre2posix.h will have to be
renamed or pointed at by a link.

If you are using the POSIX interface to PCRE2 and there is already a POSIX
regex library installed on your system, as well as worrying about the regex.h
header file (as mentioned above), you must also take care when linking programs
to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
may pick up the POSIX functions of the same name from the other library.

One way of avoiding this confusion is to compile PCRE2 with the addition of
-Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
compiler flags (CFLAGS if you are using "configure" -- see below). This has the
effect of renaming the functions so that the names no longer clash. Of course,
you have to do the same thing for your applications, or write them using the
new names.


Documentation for PCRE2
-----------------------

If you install PCRE2 in the normal way on a Unix-like system, you will end up
with a set of man pages whose names all start with "pcre2". The one that is
just called "pcre2" lists all the others. In addition to these man pages, the
PCRE2 documentation is supplied in two other forms:

  1. There are files called doc/pcre2.txt, doc/pcre2grep.txt, and
     doc/pcre2test.txt in the source distribution.
     The first of these is a concatenation of the text forms of all the
     section 3 man pages except the listing of pcre2demo.c and those that
     summarize individual functions. The other two are the text forms of the
     section 1 man pages for the pcre2grep and pcre2test commands. These text
     forms are provided for ease of scanning with text editors or similar
     tools. They are installed in <prefix>/share/doc/pcre2, where <prefix> is
     the installation prefix (defaulting to /usr/local).

  2. A set of files containing all the documentation in HTML form, hyperlinked
     in various ways, and rooted in a file called index.html, is distributed in
     doc/html and installed in <prefix>/share/doc/pcre2/html.


Building PCRE2 on non-Unix-like systems
---------------------------------------

For a non-Unix-like system, please read the file NON-AUTOTOOLS-BUILD, though if
your system supports the use of "configure" and "make" you may be able to build
PCRE2 using autotools in the same way as for many Unix-like systems.

PCRE2 can also be configured using CMake, which can be run in various ways
(command line, GUI, etc). This creates Makefiles, solution files, etc. The file
NON-AUTOTOOLS-BUILD has information about CMake.

PCRE2 has been compiled on many different operating systems. It should be
straightforward to build PCRE2 on any system that has a Standard C compiler and
library, because it uses only Standard C functions.


Building PCRE2 without using autotools
--------------------------------------

The use of autotools (in particular, libtool) is problematic in some
environments, even some that are Unix or Unix-like. See the NON-AUTOTOOLS-BUILD
file for ways of building PCRE2 without using autotools.


Building PCRE2 using autotools
------------------------------

The following instructions assume the use of the widely used "configure; make;
make install" (autotools) process.

To build PCRE2 on a system that supports autotools, first run the "configure"
command from the PCRE2 distribution directory, with your current directory set
to the directory where you want the files to be created. This command is a
standard GNU "autoconf" configuration script, for which generic instructions
are supplied in the file INSTALL.

Most commonly, people build PCRE2 within its own distribution directory, and in
this case, on many systems, just running "./configure" is sufficient. However,
the usual methods of changing standard defaults are available. For example:

CFLAGS='-O2 -Wall' ./configure --prefix=/opt/local

This command specifies that the C compiler should be run with the flags '-O2
-Wall' instead of the default, and that "make install" should install PCRE2
under /opt/local instead of the default /usr/local.

If you want to build in a different directory, just run "configure" with that
directory as current. For example, suppose you have unpacked the PCRE2 source
into /source/pcre2/pcre2-xxx, but you want to build it in
/build/pcre2/pcre2-xxx:

cd /build/pcre2/pcre2-xxx
/source/pcre2/pcre2-xxx/configure

PCRE2 is written in C and is normally compiled as a C library. However, it is
possible to build it as a C++ library, though the provided building apparatus
does not have any features to support this.

There are some optional features that can be included or omitted from the PCRE2
library. They are also documented in the pcre2build man page.

. By default, both shared and static libraries are built. You can change this
  by adding one of these options to the "configure" command:

  --disable-shared
  --disable-static

  (See also "Shared libraries" below.)

. By default, only the 8-bit library is built. If you add --enable-pcre2-16 to
  the "configure" command, the 16-bit library is also built.
  If you add --enable-pcre2-32 to the "configure" command, the 32-bit library
  is also built. If you want only the 16-bit or 32-bit library, use
  --disable-pcre2-8 to disable building the 8-bit library.

. If you want to include support for just-in-time (JIT) compiling, which can
  give large performance improvements on certain platforms, add --enable-jit to
  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error. If in doubt, use --enable-jit=auto, which
  enables JIT only if the current hardware is supported.

. If you are enabling JIT under SELinux you may also want to add
  --enable-jit-sealloc, which enables the use of an execmem allocator in JIT
  that is compatible with SELinux. This has no effect if JIT is not enabled.

. If you do not want to make use of the default support for UTF-8 Unicode
  character strings in the 8-bit library, UTF-16 Unicode character strings in
  the 16-bit library, or UTF-32 Unicode character strings in the 32-bit
  library, you can add --disable-unicode to the "configure" command. This
  reduces the size of the libraries. It is not possible to configure one
  library with Unicode support, and another without, in the same configuration.
  It is also not possible to use --enable-ebcdic (see below) with Unicode
  support, so if this option is set, you must also use --disable-unicode.

  When Unicode support is available, the use of a UTF encoding still has to be
  enabled by setting the PCRE2_UTF option at run time or starting a pattern
  with (*UTF). When PCRE2 is compiled with Unicode support, its input can only
  either be ASCII or UTF-8/16/32, even when running on EBCDIC platforms.

  As well as supporting UTF strings, Unicode support includes support for the
  \P, \p, and \X sequences that recognize Unicode character properties.
  However, only the basic two-letter properties such as Lu are supported.
  Escape sequences such as \d and \w in patterns do not by default make use of
  Unicode properties, but can be made to do so by setting the PCRE2_UCP option
  or starting a pattern with (*UCP).

. You can build PCRE2 to recognize either CR or LF or the sequence CRLF, or any
  of the preceding, or any of the Unicode newline sequences, or the NUL (zero)
  character as indicating the end of a line. Whatever you specify at build time
  is the default; the caller of PCRE2 can change the selection at run time. The
  default newline indicator is a single LF character (the Unix standard). You
  can specify the default newline indicator by adding --enable-newline-is-cr,
  --enable-newline-is-lf, --enable-newline-is-crlf,
  --enable-newline-is-anycrlf, --enable-newline-is-any, or
  --enable-newline-is-nul to the "configure" command, respectively.

. By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

. In a pattern, the escape sequence \C matches a single code unit, even in a
  UTF mode. This can be dangerous because it breaks up multi-code-unit
  characters. You can build PCRE2 with the use of \C permanently locked out by
  adding --enable-never-backslash-C (note the upper case C) to the "configure"
  command. When \C is allowed by the library, individual applications can lock
  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.

. PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
  example,

  --with-parens-nest-limit=500

. PCRE2 has a counter that can be set to limit the amount of computing resource
  it uses when matching a pattern. If the limit is exceeded during a match, the
  match fails. The default is ten million. You can change the default by
  setting, for example,

  --with-match-limit=500000

  on the "configure" command. This is just the default; individual calls to
  pcre2_match() or pcre2_dfa_match() can supply their own value. There is more
  discussion in the pcre2api man page (search for pcre2_set_match_limit).

. There is a separate counter that limits the depth of nested backtracking
  (pcre2_match()) or nested function calls (pcre2_dfa_match()) during a
  matching process, which indirectly limits the amount of heap memory that is
  used, and in the case of pcre2_dfa_match() the amount of stack as well. This
  counter also has a default of ten million, which is essentially "unlimited".
  You can change the default by setting, for example,

  --with-match-limit-depth=5000

  There is more discussion in the pcre2api man page (search for
  pcre2_set_depth_limit).

. You can also set an explicit limit on the amount of heap memory used by
  the pcre2_match() and pcre2_dfa_match() interpreters:

  --with-heap-limit=500

  The units are kibibytes (units of 1024 bytes). This limit does not apply when
  the JIT optimization (which has its own memory control features) is used.
  There is more discussion on the pcre2api man page (search for
  pcre2_set_heap_limit).

. In the 8-bit library, the default maximum compiled pattern size is around
  64 kibibytes.
  You can increase this by adding --with-link-size=3 to the
  "configure" command. PCRE2 then uses three bytes instead of two for offsets
  to different parts of the compiled pattern. In the 16-bit library,
  --with-link-size=3 is the same as --with-link-size=4, which (in both
  libraries) uses four-byte offsets. Increasing the internal link size reduces
  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
  link size setting is ignored, as 4-byte offsets are always used.

. For speed, PCRE2 uses four tables for manipulating and identifying characters
  whose code point values are less than 256. By default, it uses a set of
  tables for ASCII encoding that is part of the distribution. If you specify

  --enable-rebuild-chartables

  a program called dftables is compiled and run in the default C locale when
  you obey "make". It builds a source file called pcre2_chartables.c. If you do
  not specify this option, pcre2_chartables.c is created as a copy of
  pcre2_chartables.c.dist. See "Character tables" below for further
  information.

. It is possible to compile PCRE2 for use on systems that use EBCDIC as their
  character code (as opposed to ASCII/Unicode) by specifying

  --enable-ebcdic --disable-unicode

  This automatically implies --enable-rebuild-chartables (see above). However,
  when PCRE2 is built this way, it always operates in EBCDIC. It cannot support
  both EBCDIC and UTF-8/16/32. There is a second option, --enable-ebcdic-nl25,
  which specifies that the code value for the EBCDIC NL character is 0x25
  instead of the default 0x15.

. If you specify --enable-debug, additional debugging code is included in the
  build. This option is intended for use by the PCRE2 maintainers.

. In environments where valgrind is installed, if you specify

  --enable-valgrind

  PCRE2 will use valgrind annotations to mark certain memory regions as
  unaddressable. This allows it to detect invalid memory accesses, and is
  mostly useful for debugging PCRE2 itself.

. In environments where the gcc compiler is used and lcov version 1.6 or above
  is installed, if you specify

  --enable-coverage

  the build process implements a code coverage report for the test suite. The
  report is generated by running "make coverage". If ccache is installed on
  your system, it must be disabled when building PCRE2 for coverage reporting.
  You can do this by setting the environment variable CCACHE_DISABLE=1 before
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

. When JIT support is enabled, pcre2grep automatically makes use of it, unless
  you add --disable-pcre2grep-jit to the "configure" command.

. There is support for calling external programs during matching in the
  pcre2grep command, using PCRE2's callout facility with string arguments. This
  support can be disabled by adding --disable-pcre2grep-callout to the
  "configure" command.

. The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
  specifying one or both of

  --enable-pcre2grep-libz
  --enable-pcre2grep-libbz2

  Of course, the relevant libraries must be installed on your system.

. The default starting size (in bytes) of the internal buffer used by pcre2grep
  can be set by, for example:

  --with-pcre2grep-bufsize=51200

  The value must be a plain integer. The default is 20480.
  The amount of memory used by pcre2grep is actually three times this number,
  to allow for "before" and "after" lines. If very long lines are encountered,
  the buffer is automatically enlarged, up to a fixed maximum size.

. The default maximum size of pcre2grep's internal buffer can be set by, for
  example:

  --with-pcre2grep-max-bufsize=2097152

  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
  whichever is the larger.

. It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,

  --enable-pcre2test-libreadline or --enable-pcre2test-libedit

  If this is done, when pcre2test's input is from a terminal, it reads it using
  the readline() function. This provides line-editing and history facilities.
  Note that libreadline is GPL-licenced, so if you distribute a binary of
  pcre2test linked in this way, there may be licensing issues. These can be
  avoided by linking with libedit (which has a BSD licence) instead.

  Enabling libreadline causes the -lreadline option to be added to the
  pcre2test build. In many operating environments with a system-installed
  readline library this is sufficient. However, in some environments (e.g. if
  an unmodified distribution version of readline is in use), it may be
  necessary to specify something like LIBS="-lncurses" as well. This is
  because, to quote the readline INSTALL, "Readline uses the termcap functions,
  but does not link with the termcap or curses library itself, allowing
  applications which link with readline the to choose an appropriate library."
  If you get error messages about missing functions tgetstr, tgetent, tputs,
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

. There is a special option called --enable-fuzz-support for use by people who
  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
  be built, but not installed. This contains a single function called
  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
  length of the string. When called, this function tries to compile the string
  as a pattern, and if that succeeds, to match it. This is done both with no
  options and with some random option bits that are generated from the string.
  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
  be created. This is normally run under valgrind or used when PCRE2 is
  compiled with address sanitizing enabled. It calls the fuzzing function and
  outputs information about what it is doing. The input strings are specified
  by arguments: if an argument starts with "=" the rest of it is a literal
  input string. Otherwise, it is assumed to be a file name, and the contents of
  the file are the test string.

. Releases before 10.30 could be compiled with --disable-stack-for-recursion,
  which caused pcre2_match() to use individual blocks on the heap for
  backtracking instead of recursive function calls (which use the stack). This
  is now obsolete since pcre2_match() was refactored always to use the heap (in
  a much more efficient way than before). This option is retained for backwards
  compatibility, but has no effect other than to output a warning.

The "configure" script builds the following files for the basic C library:

. Makefile             the makefile that builds the library
. src/config.h         build-time configuration options for the library
. src/pcre2.h          the public PCRE2 header file
. pcre2-config         script that shows the building settings such as CFLAGS
                         that were set for "configure"
. libpcre2-8.pc        )
. libpcre2-16.pc       ) data for the pkg-config command
. libpcre2-32.pc       )
. libpcre2-posix.pc    )
. libtool              script that builds shared and/or static libraries

Versions of config.h and pcre2.h are distributed in the src directory of PCRE2
tarballs under the names config.h.generic and pcre2.h.generic. These are
provided for those who have to build PCRE2 without using "configure" or CMake.
If you use "configure" or CMake, the .generic versions are not used.

The "configure" script also creates config.status, which is an executable
script that can be run to recreate the configuration, and config.log, which
contains compiler output from tests that "configure" runs.

Once "configure" has run, you can run "make". This builds whichever of the
libraries libpcre2-8, libpcre2-16 and libpcre2-32 are configured, and a test
program called pcre2test. If you enabled JIT support with --enable-jit, another
test program called pcre2_jit_test is built as well. If the 8-bit library is
built, libpcre2-posix and the pcre2grep command are also built. Running
"make" with the -j option may speed up compilation on multiprocessor systems.

The command "make check" runs all the appropriate tests. Details of the PCRE2
tests are given below in a separate section of this document. The -j option of
"make" can also be used when running the tests.

You can use "make install" to install PCRE2 into live directories on your
system.
The following are installed (file names are all relative to the
<prefix> that is set when "configure" is run):

  Commands (bin):
    pcre2test
    pcre2grep (if 8-bit support is enabled)
    pcre2-config

  Libraries (lib):
    libpcre2-8      (if 8-bit support is enabled)
    libpcre2-16     (if 16-bit support is enabled)
    libpcre2-32     (if 32-bit support is enabled)
    libpcre2-posix  (if 8-bit support is enabled)

  Configuration information (lib/pkgconfig):
    libpcre2-8.pc
    libpcre2-16.pc
    libpcre2-32.pc
    libpcre2-posix.pc

  Header files (include):
    pcre2.h
    pcre2posix.h

  Man pages (share/man/man{1,3}):
    pcre2grep.1
    pcre2test.1
    pcre2-config.1
    pcre2.3
    pcre2*.3 (lots more pages, all starting "pcre2")

  HTML documentation (share/doc/pcre2/html):
    index.html
    *.html (lots more pages, hyperlinked from index.html)

  Text file documentation (share/doc/pcre2):
    AUTHORS
    COPYING
    ChangeLog
    LICENCE
    NEWS
    README
    pcre2.txt         (a concatenation of the man(3) pages)
    pcre2test.txt     the pcre2test man page
    pcre2grep.txt     the pcre2grep man page
    pcre2-config.txt  the pcre2-config man page

If you want to remove PCRE2 from your system, you can run "make uninstall".
This removes all the files that "make install" installed. However, it does not
remove any directories, because these are often shared with other programs.


Retrieving configuration information
------------------------------------

Running "make install" installs the command pcre2-config, which can be used to
recall information about the PCRE2 configuration and installation. For example:

  pcre2-config --version

prints the version number, and

  pcre2-config --libs8

outputs information about where the 8-bit library is installed.
This command can be included in makefiles for programs that use PCRE2, saving
the programmer from having to remember too many details. Run pcre2-config with
no arguments to obtain a list of possible arguments.

The pkg-config command is another system for saving and retrieving information
about installed libraries. Instead of separate commands for each library, a
single command is used. For example:

  pkg-config --libs libpcre2-16

The data is held in *.pc files that are installed in a directory called
<prefix>/lib/pkgconfig.


Shared libraries
----------------

The default distribution builds PCRE2 as shared libraries and static libraries,
as long as the operating system supports shared libraries. Shared library
support relies on the "libtool" script which is built as part of the
"configure" process.

The libtool script is used to compile and link both shared and static
libraries. They are placed in a subdirectory called .libs when they are newly
built. The programs pcre2test and pcre2grep are built to use these uninstalled
libraries (by means of wrapper scripts in the case of shared libraries). When
you use "make install" to install shared libraries, pcre2grep and pcre2test are
automatically re-built to use the newly installed shared libraries before being
installed themselves. However, the versions left in the build directory still
use the uninstalled libraries.

To build PCRE2 using static libraries only you must use --disable-shared when
configuring it. For example:

./configure --prefix=/usr/gnu --disable-shared

Then run "make" in the usual way. Similarly, you can use --disable-static to
build only shared libraries.


Cross-compiling using autotools
-------------------------------

You can specify CC and CFLAGS in the normal way to the "configure" command, in
order to cross-compile PCRE2 for some other host.
However, you should NOT specify --enable-rebuild-chartables, because if you do,
the dftables.c source file is compiled and run on the local host, in order to
generate the inbuilt character tables (the pcre2_chartables.c file). This will
probably not work, because dftables.c needs to be compiled with the local
compiler, not the cross compiler.

When --enable-rebuild-chartables is not specified, pcre2_chartables.c is
created by making a copy of pcre2_chartables.c.dist, which is a default set of
tables that assumes ASCII code. Cross-compiling with the default tables should
not be a problem.

If you need to modify the character tables when cross-compiling, you should
move pcre2_chartables.c.dist out of the way, then compile dftables.c by hand
and run it on the local host to make a new version of pcre2_chartables.c.dist.
Then when you cross-compile PCRE2 this new version of the tables will be used.


Making new tarballs
-------------------

The command "make dist" creates three PCRE2 tarballs, in tar.gz, tar.bz2, and
zip formats. The command "make distcheck" does the same, but then does a trial
build of the new distribution to ensure that it works.

If you have modified any of the man page sources in the doc directory, you
should first run the PrepareRelease script before making a distribution. This
script creates the .txt and HTML forms of the documentation from the man pages.


Testing PCRE2
-------------

To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
There is another script called RunGrepTest that tests the pcre2grep command.
When JIT support is enabled, a third test program called pcre2_jit_test is
built. Both the scripts and all the program tests are run if you obey "make
check". For other environments, see the instructions in NON-AUTOTOOLS-BUILD.

The RunTest script runs the pcre2test test program (which is documented in its
own man page) on each of the relevant testinput files in the testdata
directory, and compares the output with the contents of the corresponding
testoutput files. RunTest uses a file called testtry to hold the main output
from pcre2test. Other files whose names begin with "test" are used as working
files in some tests.

Some tests are relevant only when certain build-time options were selected. For
example, the tests for UTF-8/16/32 features are run only when Unicode support
is available. RunTest outputs a comment when it skips a test.

Many (but not all) of the tests that are not skipped are run twice if JIT
support is available. On the second run, JIT compilation is forced. This
testing can be suppressed by putting "nojit" on the RunTest command line.

The entire set of tests is run once for each of the 8-bit, 16-bit and 32-bit
libraries that are enabled. If you want to run just one set of tests, call
RunTest with either the -8, -16 or -32 option.

If valgrind is installed, you can run the tests under it by putting "valgrind"
on the RunTest command line. To run pcre2test on just one or more specific test
files, give their numbers as arguments to RunTest, for example:

  RunTest 2 7 11

You can also specify ranges of tests such as 3-6 or 3- (meaning 3 to the
end), or a number preceded by ~ to exclude a test. For example:

  RunTest 3-15 ~10

This runs tests 3 to 15, excluding test 10, and just ~13 runs all the tests
except test 13. Whatever order the arguments are in, the tests are always run
in numerical order.

You can also call RunTest with the single argument "list" to cause it to output
a list of tests.

The test sequence starts with "test 0", which is a special test that has no
input file, and whose output is not checked.
This is because it will be different on different hardware and with different
configurations. The test exists in order to exercise some of pcre2test's code
that would not otherwise be run.

Tests 1 and 2 can always be run, as they expect only plain text strings (not
UTF) and make no use of Unicode properties. The first test file can be fed
directly into the perltest.sh script to check that Perl gives the same results.
The only difference you should see is in the first few lines, where the Perl
version is given instead of the PCRE2 version. The second set of tests checks
auxiliary functions, error detection, and run-time flags that are specific to
PCRE2. It also uses the debugging flags to check some of the internals of
pcre2_compile().

If you build PCRE2 with a locale setting that is not the standard C locale, the
character tables may be different (see next paragraph). In some cases, this may
cause failures in the second set of tests. For example, in a locale where the
isprint() function yields TRUE for characters in the range 128-255, the use of
[:isascii:] inside a character class defines a different set of characters, and
this shows up in this test as a difference in the compiled code, which is being
listed for checking. For example, where the comparison test output contains
[\x00-\x7f] the test might contain [\x00-\xff], and similarly in some other
cases. This is not a bug in PCRE2.

Test 3 checks pcre2_maketables(), the facility for building a set of character
tables for a specific locale and using them instead of the default tables. The
script uses the "locale" command to check for the availability of the "fr_FR",
"french", or "fr" locale, and uses the first one that it finds. If the "locale"
command fails, or if its output doesn't include "fr_FR", "french", or "fr" in
the list of available locales, the third test cannot be run, and a comment is
output to say why.
If running this test produces an error like this:

  ** Failed to set locale "fr_FR"

it means that the given locale is not available on your system, despite being
listed by "locale". This does not mean that PCRE2 is broken. There are three
alternative output files for the third test, because three different versions
of the French locale have been encountered. The test passes if its output
matches any one of them.

Tests 4 and 5 check UTF and Unicode property support, test 4 being compatible
with the perltest.sh script, and test 5 checking PCRE2-specific things.

Tests 6 and 7 check the pcre2_dfa_match() alternative matching function, in
non-UTF mode and UTF-mode with Unicode property support, respectively.

Test 8 checks some internal offsets and code size features, but it is run only
when Unicode support is enabled. The output is different in 8-bit, 16-bit, and
32-bit modes and for different link sizes, so there are different output files
for each mode and link size.

Tests 9 and 10 are run only in 8-bit mode, and tests 11 and 12 are run only in
16-bit and 32-bit modes. These are tests that generate different output in
8-bit mode. In each pair, the first test is for general cases and the second is
for Unicode support.

Test 13 checks the handling of non-UTF characters greater than 255 by
pcre2_dfa_match() in 16-bit and 32-bit modes.

Test 14 contains some special UTF and UCP tests that give different output for
different code unit widths.

Test 15 contains a number of tests that must not be run with JIT. They check,
among other non-JIT things, the match-limiting features of the interpretive
matcher.

Test 16 is run only when JIT support is not available. It checks that an
attempt to use JIT has the expected behaviour.

Test 17 is run only when JIT support is available. It checks JIT complete and
partial modes, match-limiting under JIT, and other JIT-specific features.

Tests 18 and 19 are run only in 8-bit mode. They check the POSIX interface to
the 8-bit library, without and with Unicode support, respectively.

Test 20 checks the serialization functions by writing a set of compiled
patterns to a file, and then reloading and checking them.

Tests 21 and 22 test \C support when the use of \C is not locked out, without
and with UTF support, respectively. Test 23 tests \C when it is locked out.

Tests 24 and 25 test the experimental pattern conversion functions, without and
with UTF support, respectively.


Character tables
----------------

For speed, PCRE2 uses four tables for manipulating and identifying characters
whose code point values are less than 256. By default, a set of tables that is
built into the library is used. The pcre2_maketables() function can be called
by an application to create a new set of tables in the current locale. These
are passed to PCRE2 by calling pcre2_set_character_tables() to put a pointer
into a compile context.

The source file called pcre2_chartables.c contains the default set of tables.
By default, this is created as a copy of pcre2_chartables.c.dist, which
contains tables for ASCII coding. However, if --enable-rebuild-chartables is
specified for ./configure, a different version of pcre2_chartables.c is built
by the program dftables (compiled from dftables.c), which uses the ANSI C
character handling functions such as isalnum(), isalpha(), isupper(),
islower(), etc. to build the table sources. This means that the default C
locale that is set for your system will control the contents of these default
tables. You can change the default tables by editing pcre2_chartables.c and
then re-building PCRE2. If you do this, you should take care to ensure that the
file does not get automatically re-generated.
The best way to do this is to
move pcre2_chartables.c.dist out of the way and replace it with your customized
tables.

When the dftables program is run as a result of --enable-rebuild-chartables,
it uses the default C locale that is set on your system. It does not pay
attention to the LC_xxx environment variables. In other words, it uses the
system's default locale rather than whatever the compiling user happens to have
set. If you really do want to build a source set of character tables in a
locale that is specified by the LC_xxx variables, you can run the dftables
program by hand with the -L option. For example:

  ./dftables -L pcre2_chartables.c.special

The first two 256-byte tables provide lower casing and case flipping functions,
respectively. The next table consists of three 32-byte bit maps which identify
digits, "word" characters, and white space, respectively. These are used when
building 32-byte bit maps that represent character classes for code points less
than 256. The final 256-byte table has bits indicating various character types,
as follows:

    1   white space character
    2   letter
    4   decimal digit
    8   hexadecimal digit
   16   alphanumeric or '_'
  128   regular expression metacharacter or binary zero

You should not alter the set of characters that contain the 128 bit, as that
will cause PCRE2 to malfunction.


File manifest
-------------

The distribution should contain the files listed below.

(A) Source files for the PCRE2 library functions and their headers are found in
    the src directory:

  src/dftables.c           auxiliary program for building pcre2_chartables.c
                           when --enable-rebuild-chartables is specified

  src/pcre2_chartables.c.dist  a default set of character tables that assume
                           ASCII coding; unless --enable-rebuild-chartables is
                           specified, used by copying to pcre2_chartables.c

  src/pcre2posix.c         )
  src/pcre2_auto_possess.c )
  src/pcre2_compile.c      )
  src/pcre2_config.c       )
  src/pcre2_context.c      )
  src/pcre2_convert.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
  src/pcre2_extuni.c       )
  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
  src/pcre2_maketables.c   )
  src/pcre2_match.c        )
  src/pcre2_match_data.c   )
  src/pcre2_newline.c      )
  src/pcre2_ord2utf.c      )
  src/pcre2_pattern_info.c )
  src/pcre2_serialize.c    )
  src/pcre2_string_utils.c )
  src/pcre2_study.c        )
  src/pcre2_substitute.c   )
  src/pcre2_substring.c    )
  src/pcre2_tables.c       )
  src/pcre2_ucd.c          )
  src/pcre2_valid_utf.c    )
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test
  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
  src/pcre2posix.h         header for the external POSIX wrapper API
  src/pcre2_internal.h     header for internal use
  src/pcre2_intmodedep.h   a mode-specific internal header
  src/pcre2_ucp.h          header for Unicode property handling

  sljit/*                  source files for the JIT compiler

(B) Source files for programs that use PCRE2:

  src/pcre2demo.c          simple demonstration of coding calls to PCRE2
  src/pcre2grep.c          source of a grep utility that
                           uses PCRE2
  src/pcre2test.c          comprehensive test program
  src/pcre2_jit_test.c     JIT test program

(C) Auxiliary files:

  132html                  script to turn "man" pages into HTML
  AUTHORS                  information about the author of PCRE2
  ChangeLog                log of changes to the code
  CleanTxt                 script to clean nroff output for txt man pages
  Detrail                  script to remove trailing spaces
  HACKING                  some notes about the internals of PCRE2
  INSTALL                  generic installation instructions
  LICENCE                  conditions for the use of PCRE2
  COPYING                  the same, using GNU's standard name
  Makefile.in              ) template for Unix Makefile, which is built by
                           )   "configure"
  Makefile.am              ) the automake input that was used to create
                           )   Makefile.in
  NEWS                     important changes in this release
  NON-AUTOTOOLS-BUILD      notes on building PCRE2 without using autotools
  PrepareRelease           script to make preparations for "make dist"
  README                   this file
  RunTest                  a Unix shell script for running tests
  RunGrepTest              a Unix shell script for pcre2grep tests
  aclocal.m4               m4 macros (generated by "aclocal")
  config.guess             ) files used by libtool,
  config.sub               )   used only when building a shared library
  configure                a configuring shell script (built by autoconf)
  configure.ac             ) the autoconf input that was used to build
                           )   "configure" and config.h
  depcomp                  ) script to find program dependencies, generated by
                           )   automake
  doc/*.3                  man page sources for PCRE2
  doc/*.1                  man page sources for pcre2grep and pcre2test
  doc/index.html.src       the base HTML page
  doc/html/*               HTML documentation
  doc/pcre2.txt            plain text version of the man pages
  doc/pcre2test.txt        plain text documentation of test program
  install-sh               a shell script for installing files
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
  libpcre2-posix.pc.in     template for
                           libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
  mkinstalldirs            script for making install directories
  perltest.sh              script for running a Perl test program
  pcre2-config.in          source of script which retains PCRE2 information
  testdata/testinput*      test data for main library tests
  testdata/testoutput*     expected test results
  testdata/grep*           input and output for pcre2grep tests
  testdata/*               other supporting test files

(D) Auxiliary files for cmake support

  cmake/COPYING-CMAKE-SCRIPTS
  cmake/FindPackageHandleStandardArgs.cmake
  cmake/FindEditline.cmake
  cmake/FindReadline.cmake
  CMakeLists.txt
  config-cmake.h.in

(E) Auxiliary files for building PCRE2 "by hand"

  src/pcre2.h.generic      ) a version of the public PCRE2 header file
                           )   for use in non-"configure" environments
  src/config.h.generic     ) a version of config.h for use in non-"configure"
                           )   environments

Philip Hazel
Email local part: ph10
Email domain: cam.ac.uk
Last updated: 17 June 2018