Change Log for PCRE2
--------------------


Version 10.22 29-July-2016
--------------------------

1. Applied Jason Hood's patches to RunTest.bat and testdata/wintestoutput3
to fix problems with running the tests under Windows.

2. Implemented a facility for quoting literal characters within hexadecimal
patterns in pcre2test, to make it easier to create patterns with just a few
non-printing characters.

3. Binary zeros are not supported in pcre2test input files. It now detects them
and gives an error.

4. Updated the valgrind parameters in RunTest: (a) changed smc-check=all to
smc-check=all-non-file; (b) changed obj:* in the suppression file to obj:??? so
that it matches only unknown objects.

5. Updated the maintenance script maint/ManyConfigTests to make it easier to
select individual groups of tests.

6. When the POSIX wrapper function regcomp() is called, the REG_NOSUB option
used to set PCRE2_NO_AUTO_CAPTURE when calling pcre2_compile(). However, this
disables the use of back references (and subroutine calls), which are supported
by other implementations of regcomp() with REG_NOSUB. Therefore, REG_NOSUB no
longer causes PCRE2_NO_AUTO_CAPTURE to be set, though it still ignores nmatch
and pmatch when regexec() is called.
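
As an illustration of 6, here is a minimal sketch of a REG_NOSUB call through
the POSIX interface. The helper name matches_nosub() is hypothetical. The
sketch includes the system <regex.h>; to exercise PCRE2's wrapper instead, the
assumption is that one includes pcre2posix.h, links with libpcre2-posix, and
writes the pattern in Perl syntax, for example "(ab)\\1".

```c
#include <regex.h>
#include <stddef.h>

/* Returns 1 on match, 0 on no match, -1 if the pattern fails to compile.
   REG_NOSUB tells regcomp() that no match offsets are wanted, so regexec()
   is called with nmatch = 0 and pmatch = NULL. */
int matches_nosub(const char *pattern, const char *subject)
{
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, subject, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

With a system library that uses POSIX basic syntax by default, a back
reference pattern would be written as "\\(ab\\)\\1"; the point is that the
back reference still works even though no offsets are returned.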

7. Because of 6 above, pcre2test has been modified with a new modifier called
posix_nosub, to call regcomp() with REG_NOSUB. Previously the no_auto_capture
modifier had this effect. That option is now ignored when the POSIX API is in
use.

8. Minor tidies to the pcre2demo.c sample program, including more comments
about its 8-bit-ness.

9. Detect unmatched closing parentheses and give the error in the pre-scan
instead of later. Previously the pre-scan carried on and could give a
misleading incorrect error message. For example, /(?J)(?'a'))(?'a')/ gave a
message about invalid duplicate group names.

10. It has happened that pcre2test was accidentally linked with another POSIX
regex library instead of libpcre2-posix. In this situation, a call to regcomp()
(in the other library) may succeed, returning zero, but of course putting its
own data into the regex_t block. In one example the re_pcre2_code field was
left as NULL, which made pcre2test think it had not got a compiled POSIX regex,
so it treated the next line as another pattern line, resulting in a confusing
error message. A check has been added to pcre2test to see if the data returned
from a successful call of regcomp() are valid for PCRE2's regcomp(). If they
are not, an error message is output and the pcre2test run is abandoned. The
message points out the possibility of a mis-linking. Hopefully this will avoid
some head-scratching the next time this happens.

11. A pattern such as /(?<=((?C)0))/, which has a callout inside a lookbehind
assertion, caused pcre2test to output a very large number of spaces when the
callout was taken, making the program appear to loop.

12. A pattern that included (*ACCEPT) in the middle of a sufficiently deeply
nested set of parentheses of sufficient size caused an overflow of the
compiling workspace (which was diagnosed, but of course is not desirable).

13. Detect missing closing parentheses during the pre-pass for group
identification.

14. Changed some integer variable types and put in a number of casts, following
a report of compiler warnings from Visual Studio 2013 and a few tests with
gcc's -Wconversion (which still throws up a lot).

15. Implemented pcre2_code_copy(), and added pushcopy and #popcopy to pcre2test
for testing it.

16. Change 66 for 10.21 introduced the use of snprintf() in PCRE2's version of
regerror(). When the error buffer is too small, my version of snprintf() puts a
binary zero in the final byte. Bug #1801 seems to show that other versions do
not do this, leading to bad output from pcre2test when it was checking for
buffer overflow. It no longer assumes a binary zero at the end of a too-small
regerror() buffer.
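
The portability problem in 16 can be sidestepped by never relying on snprintf()
to terminate a truncated string. The following is only a sketch of that
defensive idea, not the code used in PCRE2; copy_message() is a hypothetical
helper that mimics the regerror() convention of returning the size needed.

```c
#include <stdio.h>
#include <string.h>

/* Copies msg into buf (capacity size) and returns the buffer size needed to
   hold the whole message, terminator included, as regerror() does. */
size_t copy_message(char *buf, size_t size, const char *msg)
{
    if (size > 0) {
        snprintf(buf, size, "%s", msg);
        buf[size - 1] = '\0';   /* do not rely on snprintf() to terminate */
    }
    return strlen(msg) + 1;
}
```

A caller can compare the return value with the buffer size to find out whether
the message was truncated.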

17. Fixed typo ("&&" for "&") in pcre2_study(). Fortunately, this could not
actually affect anything, by sheer luck.

18. Two minor fixes for MSVC compilation: (a) removal of apparently incorrect
"const" qualifiers in pcre2test and (b) defining snprintf as _snprintf for
older MSVC compilers. This has been done both in src/pcre2_internal.h for most
of the library, and also in src/pcre2posix.c, which no longer includes
pcre2_internal.h (see 24 below).

19. Applied Chris Wilson's patch (Bugzilla #1681) to CMakeLists.txt for MSVC
static compilation. Subsequently applied Chris Wilson's second patch, putting
the first patch under a new option instead of being unconditional when
PCRE_STATIC is set.

20. Updated pcre2grep to set stdout as binary when run under Windows, so as not
to convert \r\n at the ends of reflected lines into \r\r\n. This required
ensuring that other output that is written to stdout (e.g. file names) uses the
appropriate line terminator: \r\n for Windows, \n otherwise.

21. When a line is too long for pcre2grep's internal buffer, show the maximum
length in the error message.

22. Added support for string callouts to pcre2grep (Zoltan's patch with PH
additions).

23. RunTest.bat was missing a "set type" line for test 22.

24. The pcre2posix.c file was including pcre2_internal.h, and using some
"private" knowledge of the data structures. This is unnecessary; the code has
been re-factored and no longer includes pcre2_internal.h.

25. A race condition in JIT, reported by Mozilla, is fixed.

26. Minor code refactor to avoid "array subscript is below array bounds"
compiler warning.

27. Minor code refactor to avoid "left shift of negative number" warning.

28. Add a bit more sanity checking to pcre2_serialize_decode() and document
that it expects trusted data.

29. Fix typo in pcre2_jit_test.c.

30. Due to an oversight, pcre2grep was not making use of JIT when available.
This is now fixed.

31. The RunGrepTest script is updated to use the valgrind suppressions file
when testing with JIT under valgrind (compare 10.21/51 below). The suppressions
file is updated so that it is now the same as for PCRE1: it suppresses the
Memcheck warnings Addr16 and Cond in unknown objects (that is, JIT-compiled
code). Also changed smc-check=all to smc-check=all-non-file as was done for
RunTest (see 4 above).

32. Implemented the PCRE2_NO_JIT option for pcre2_match().

33. Fix typo that gave a compiler error when JIT not supported.

34. Fix comment describing the returns from find_fixedlength().

35. Fix potential negative index in pcre2test.

36. Calls to pcre2_get_error_message() with error numbers that are never
returned by PCRE2 functions were returning empty strings. Now the error code
PCRE2_ERROR_BADDATA is returned. A facility has been added to pcre2test to
show the texts for given error numbers (i.e. to call pcre2_get_error_message()
and display what it returns) and a few representative error codes are now
checked in RunTest.

37. Added "&& !defined(__INTEL_COMPILER)" to the test for __GNUC__ in
pcre2_match.c, in anticipation that this is needed for the same reason it was
recently added to pcrecpp.cc in PCRE1.

38. Using -o with -M in pcre2grep could cause unnecessary repeated output when
the match extended over a line boundary, as it tried to find more matches "on
the same line" - but it was already over the end.

39. Allow \C in lookbehinds and DFA matching in UTF-32 mode (by converting it
to the same code as '.' when PCRE2_DOTALL is set).

40. Fix two clang compiler warnings in pcre2test when only one code unit width
is supported.

41. Upgrade RunTest to automatically re-run test 2 with a large (64M) stack if
it fails when running the interpreter with a 16M stack (and if changing the
stack size via pcre2test is possible). This avoids having to manually set a
large stack size when testing with clang.

42. Fix register overwrite in JIT when SSE2 acceleration is enabled.

43. Detect integer overflow in pcre2test pattern and data repetition counts.
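
A check of the kind described in 43 can be written without ever performing the
overflowing arithmetic. This is a sketch under the assumption that counts are
held in a uint32_t; parse_count() is a hypothetical name and is not taken from
pcre2test.

```c
#include <stdint.h>

/* Parses decimal digits at s into *value. Returns the number of characters
   consumed, or -1 if the number would not fit in a uint32_t. */
int parse_count(const char *s, uint32_t *value)
{
    uint32_t n = 0;
    int used = 0;

    while (s[used] >= '0' && s[used] <= '9') {
        uint32_t digit = (uint32_t)(s[used] - '0');
        if (n > (UINT32_MAX - digit) / 10)
            return -1;               /* n * 10 + digit would overflow */
        n = n * 10 + digit;
        used++;
    }
    *value = n;
    return used;
}
```

The test is made before the multiplication, so the unsigned value never wraps
round silently.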

44. In pcre2test, ignore "allcaptures" after DFA matching.

45. Fix unaligned accesses on x86. Patch by Marc Mutz.

46. Fix some more clang compiler warnings.


Version 10.21 12-January-2016
-----------------------------

1. Improve matching speed of patterns starting with + or * in JIT.

2. Use memchr() to find the first character in an unanchored match in 8-bit
mode in the interpreter. This gives a significant speed improvement.
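
The idea behind 2 can be sketched as follows, assuming an 8-bit subject and a
pattern whose first code unit is known. The helper find_first_unit() is
hypothetical and much simpler than the interpreter's real start-of-match code.

```c
#include <string.h>

/* Returns the offset of the first occurrence of first_unit at or after
   start, or length if there is none (no match can begin in that case). */
size_t find_first_unit(const unsigned char *subject, size_t length,
                       size_t start, unsigned char first_unit)
{
    const unsigned char *p;

    if (start >= length)
        return length;
    p = memchr(subject + start, first_unit, length - start);
    return p == NULL ? length : (size_t)(p - subject);
}
```

The gain comes from memchr() usually being hand-optimized for the platform,
whereas a byte-by-byte loop in the matcher is not.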

3. Removed a redundant copy of the opcode_possessify table in the
pcre2_auto_possessify.c source.

4. Fix typos in dftables.c for z/OS.

5. Change 36 for 10.20 broke the handling of [[:>:]] and [[:<:]] in that
processing them could involve a buffer overflow if the following character was
an opening parenthesis.

6. Change 36 for 10.20 also introduced a bug in processing this pattern:
/((?x)(*:0))#(?'/. Specifically: if a setting of (?x) was followed by a (*MARK)
setting (which (*:0) is), then (?x) did not get unset at the end of its group
during the scan for named groups, and hence the external # was incorrectly
treated as a comment and the invalid (?' at the end of the pattern was not
diagnosed. This caused a buffer overflow during the real compile. This bug was
discovered by Karl Skomski with the LLVM fuzzer.

7. Moved the pcre2_find_bracket() function from src/pcre2_compile.c into its
own source module to avoid a circular dependency between src/pcre2_compile.c
and src/pcre2_study.c.

8. A callout with a string argument containing an opening square bracket, for
example /(?C$[$)(?<]/, was incorrectly processed and could provoke a buffer
overflow. This bug was discovered by Karl Skomski with the LLVM fuzzer.

9. The handling of callouts during the pre-pass for named group identification
has been tightened up.

10. The quantifier {1} can be ignored, whether greedy, non-greedy, or
possessive. This is a very minor optimization.

11. A possessively repeated conditional group that could match an empty string,
for example, /(?(R))*+/, was incorrectly compiled.

12. The Unicode tables have been updated to Unicode 8.0.0 (thanks to Christian
Persch).

13. An empty comment (?#) in a pattern was incorrectly processed and could
provoke a buffer overflow. This bug was discovered by Karl Skomski with the
LLVM fuzzer.

14. Fix infinite recursion in the JIT compiler when certain patterns such as
/(?:|a|){100}x/ are analysed.

15. Some patterns with character classes involving [: and \\ were incorrectly
compiled and could cause reading from uninitialized memory or an incorrect
error diagnosis. Examples are: /[[:\\](?<[::]/ and /[[:\\](?'abc')[a:]/. The
first of these bugs was discovered by Karl Skomski with the LLVM fuzzer.

16. Pathological patterns containing many nested occurrences of [: caused
pcre2_compile() to run for a very long time. This bug was found by the LLVM
fuzzer.

17. A missing closing parenthesis for a callout with a string argument was not
being diagnosed, possibly leading to a buffer overflow. This bug was found by
the LLVM fuzzer.

18. A conditional group with only one branch has an implicit empty alternative
branch and must therefore be treated as potentially matching an empty string.

19. If (?R was followed by - or +, incorrect behaviour happened instead of a
diagnostic. This bug was discovered by Karl Skomski with the LLVM fuzzer.

20. Another bug that was introduced by change 36 for 10.20: conditional groups
whose condition was an assertion preceded by an explicit callout with a string
argument might be incorrectly processed, especially if the string contained \Q.
This bug was discovered by Karl Skomski with the LLVM fuzzer.

21. Compiling PCRE2 with the sanitize options of clang showed up a number of
very pedantic coding infelicities and a buffer overflow while checking a UTF-8
string if the final multi-byte UTF-8 character was truncated.

22. For Perl compatibility in EBCDIC environments, ranges such as a-z in a
class, where both values are literal letters in the same case, omit the
non-letter EBCDIC code points within the range.
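
To show why 22 matters: in EBCDIC the letters are not contiguous, so a naive
range a-z also covers code points that are not letters. The sketch below uses
hypothetical helper names and is not the PCRE2 implementation; it cannot tell
literal letters from escaped ones, which the real rule requires.

```c
/* EBCDIC letters sit in three blocks per case: a-i 0x81-0x89, j-r 0x91-0x99,
   s-z 0xA2-0xA9; upper case is the same plus 0x40. */
static int ebcdic_is_lower(unsigned int c)
{
    return (c >= 0x81 && c <= 0x89) || (c >= 0x91 && c <= 0x99) ||
           (c >= 0xA2 && c <= 0xA9);
}

static int ebcdic_is_upper(unsigned int c)
{
    return c >= 0xC1 && c <= 0xE9 && ebcdic_is_lower(c - 0x40);
}

/* Returns 1 if code point c belongs to the class range lo-hi. */
int ebcdic_range_contains(unsigned int lo, unsigned int hi, unsigned int c)
{
    if (c < lo || c > hi)
        return 0;
    if (ebcdic_is_lower(lo) && ebcdic_is_lower(hi))
        return ebcdic_is_lower(c);
    if (ebcdic_is_upper(lo) && ebcdic_is_upper(hi))
        return ebcdic_is_upper(c);
    return 1;   /* not a same-case letter range: keep every code point */
}
```

For the range a-z (0x81 to 0xA9) this accepts 0x89 (i) but rejects 0x8A, which
lies in the gap between i and j.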

23. Finding the minimum matching length of complex patterns with back
references and/or recursions can take a long time. There is now a cut-off that
gives up trying to find a minimum length when things get too complex.

24. An optimization has been added that speeds up finding the minimum matching
length for patterns containing repeated capturing groups or recursions.

25. If a pattern contained a back reference to a group whose number was
duplicated as a result of appearing in a (?|...) group, the computation of the
minimum matching length gave a wrong result, which could cause incorrect "no
match" errors. For such patterns, a minimum matching length cannot at present
be computed.

26. Added a check for integer overflow in conditions (?(<digits>) and
(?(R<digits>). This omission was discovered by Karl Skomski with the LLVM
fuzzer.

27. Fixed an issue when \p{Any} inside an xclass did not read the current
character.

28. If pcre2grep was given the -q option with -c or -l, or when handling a
binary file, it incorrectly wrote output to stdout.

29. The JIT compiler did not restore the control verb head in case of *THEN
control verbs. This issue was found by Karl Skomski with a custom LLVM fuzzer.

30. The way recursive references such as (?3) are compiled has been re-written
because the old way was the cause of many issues. Now, conversion of the group
number into a pattern offset does not happen until the pattern has been
completely compiled. This does mean that detection of all infinitely looping
recursions is postponed till match time. In the past, some easy ones were
detected at compile time. This re-writing was done in response to yet another
bug found by the LLVM fuzzer.

31. A test for a back reference to a non-existent group was missing for items
such as \987. This caused incorrect code to be compiled. This issue was found
by Karl Skomski with a custom LLVM fuzzer.

32. Error messages for syntax errors following \g and \k were giving inaccurate
offsets in the pattern.

33. Improve the performance of starting single character repetitions in JIT.

34. (*LIMIT_MATCH=) now gives an error instead of setting the value to 0.

35. Error messages for syntax errors in *LIMIT_MATCH and *LIMIT_RECURSION now
give the right offset instead of zero.

36. The JIT compiler should not check repeats after a {0,1} repeat byte code.
This issue was found by Karl Skomski with a custom LLVM fuzzer.

37. The JIT compiler should restore the control chain for empty possessive
repeats. This issue was found by Karl Skomski with a custom LLVM fuzzer.

38. A bug which was introduced by the single character repetition optimization
was fixed.

39. Match limit check added to recursion. This issue was found by Karl Skomski
with a custom LLVM fuzzer.

40. Arrange for the UTF check in pcre2_match() and pcre2_dfa_match() to look
only at the part of the subject that is relevant when the starting offset is
non-zero.

41. Improve first character match in JIT with SSE2 on x86.

42. Fix two assertion failures in JIT. These issues were found by Karl Skomski
with a custom LLVM fuzzer.

43. Correct the setting of CMAKE_C_FLAGS in CMakeLists.txt (patch from Roy Ivy
III).

44. Fix bug in RunTest.bat for new test 14, and adjust the script for the added
test (there are now 20 in total).

45. Fixed a corner case of range optimization in JIT.

46. Add the ${*MARK} facility to pcre2_substitute().

47. Modifier lists in pcre2test were splitting at spaces without the required
commas.

48. Implemented PCRE2_ALT_VERBNAMES.

49. Fixed two issues in JIT. These were found by Karl Skomski with a custom
LLVM fuzzer.

50. The pcre2test program has been extended by adding the #newline_default
command. This has made it possible to run the standard tests when PCRE2 is
compiled with either CR or CRLF as the default newline convention. As part of
this work, the new command was added to several test files and the testing
scripts were modified. The pcre2grep tests can now also be run when there is no
LF in the default newline convention.

51. The RunTest script has been modified so that, when JIT is used and valgrind
is specified, a valgrind suppressions file is set up to ignore "Invalid read of
size 16" errors because these are false positives when the hardware supports
the SSE2 instruction set.

52. It is now possible to have comment lines amid the subject strings in
pcre2test (and perltest.sh) input.

53. Implemented PCRE2_USE_OFFSET_LIMIT and pcre2_set_offset_limit().

54. Add the null_context modifier to pcre2test so that calling pcre2_compile()
and the matching functions with NULL contexts can be tested.

55. Implemented PCRE2_SUBSTITUTE_EXTENDED.

56. In a character class such as [\W\p{Any}] where both a negative-type escape
("not a word character") and a property escape were present, the property
escape was being ignored.

57. Fixed integer overflow for patterns whose minimum matching length is very,
very large.

58. Implemented --never-backslash-C.

59. Change 55 above introduced a bug by which certain patterns provoked the
erroneous error "\ at end of pattern".

60. The special sequences [[:<:]] and [[:>:]] gave rise to incorrect compiling
errors or other strange effects if compiled in UCP mode. Found with libFuzzer
and AddressSanitizer.

61. Whitespace at the end of a pcre2test pattern line caused a spurious error
message if there were only single-character modifiers. It should be ignored.

62. The use of PCRE2_NO_AUTO_CAPTURE could cause incorrect compilation results
or segmentation errors for some patterns. Found with libFuzzer and
AddressSanitizer.

63. Very long names in (*MARK) or (*THEN) etc. items could provoke a buffer
overflow.

64. Improve error message for overly-complicated patterns.

65. Implemented an optional replication feature for patterns in pcre2test, to
make it easier to test long repetitive patterns. The tests for 63 above are
converted to use the new feature.

66. In the POSIX wrapper, if regerror() was given too small a buffer, it could
misbehave.

67. In pcre2_substitute() in UTF mode, the UTF validity check on the
replacement string was happening before the length setting when the replacement
string was zero-terminated.

68. In pcre2_substitute() in UTF mode, PCRE2_NO_UTF_CHECK can be set for the
second and subsequent calls to pcre2_match().

69. There was no check for integer overflow for a replacement group number in
pcre2_substitute(). An added check for a number greater than the largest group
number in the pattern means this is not now needed.

70. The PCRE2-specific VERSION condition didn't work correctly if only one
digit was given after the decimal point, or if more than two digits were given.
It now works with one or two digits, and gives a compile time error if more are
given.
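
The rule in 70 can be sketched as a small parser. The name parse_version() is
hypothetical, and the reading that a single digit after the point counts as
tens (so that 10.2 means 10.20) is an assumption of this sketch rather than
something stated above.

```c
/* Parses "major.minor" (or just "major") at s. On success stores the values
   and returns 0; returns -1 for a malformed version or more than two digits
   after the decimal point. A single minor digit counts as tens (an
   assumption of this sketch), so "10.2" means 10.20. */
int parse_version(const char *s, int *major, int *minor)
{
    int maj = 0, min = 0;

    if (*s < '0' || *s > '9')
        return -1;
    while (*s >= '0' && *s <= '9') {
        if (maj > 1000)
            return -1;                  /* keep the sketch overflow-free */
        maj = maj * 10 + (*s++ - '0');
    }
    if (*s == '.') {
        s++;
        if (*s < '0' || *s > '9')
            return -1;
        min = (*s++ - '0') * 10;
        if (*s >= '0' && *s <= '9')
            min += *s++ - '0';
        if (*s >= '0' && *s <= '9')
            return -1;                  /* third digit: error */
    }
    *major = maj;
    *minor = min;
    return 0;
}
```

The sketch stops at the first character that is not part of the number and
does not check what follows it.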

71. In pcre2_substitute() there was the possibility of reading one code unit
beyond the end of the replacement string.

72. The code for checking a subject's UTF-32 validity for a pattern with a
lookbehind involved an out-of-bounds pointer, which could potentially cause
trouble in some environments.

73. The maximum lookbehind length was incorrectly calculated for patterns such
as /(?<=(a)(?-1))x/ which have a recursion within a backreference.

74. Give an error if a lookbehind assertion is longer than 65535 code units.

75. Give an error in pcre2_substitute() if a match ends before it starts (as a
result of the use of \K).

76. Check the length of subpattern names and the names in (*MARK:xx) etc.
dynamically to avoid the possibility of integer overflow.

77. Implement pcre2_set_max_pattern_length() so that programs can restrict the
size of patterns that they are prepared to handle.

78. (*NO_AUTO_POSSESS) was not working.

79. Adding group information caching improves the speed of compiling when
checking whether a group has a fixed length and/or could match an empty string,
especially when recursion or subroutine calls are involved. However, this
cannot be used when (?| is present in the pattern because the same number may
be used for groups of different sizes. To catch runaway patterns in this
situation, counts have been introduced to the functions that scan for empty
branches or compute fixed lengths.

80. Allow for the possibility of the size of the nest_save structure not being
a factor of the size of the compiling workspace (it currently is).

81. Check for integer overflow in minimum length calculation and cap it at
65535.
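
A capped calculation of the kind described in 81 might look like the sketch
below. The names and the use of uint32_t are assumptions for illustration; the
real code in the study phase is organised differently.

```c
#include <stdint.h>

#define MIN_LENGTH_CAP 65535u

/* Adds two partial minimum lengths, saturating at the cap. */
uint32_t add_min_length(uint32_t a, uint32_t b)
{
    if (a > MIN_LENGTH_CAP) a = MIN_LENGTH_CAP;
    if (b > MIN_LENGTH_CAP) b = MIN_LENGTH_CAP;
    return (a + b > MIN_LENGTH_CAP) ? MIN_LENGTH_CAP : a + b;
}

/* Multiplies a group's minimum length by its minimum repeat count. */
uint32_t repeat_min_length(uint32_t length, uint32_t count)
{
    if (length == 0 || count == 0)
        return 0;
    if (length > MIN_LENGTH_CAP / count)
        return MIN_LENGTH_CAP;          /* product would exceed the cap */
    return length * count;
}
```

Capping is safe here because a minimum length is only a lower bound: reporting
a smaller value than the true minimum can never cause a match to be missed.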

82. Small optimizations in code for finding the minimum matching length.

83. Lock out configuring for EBCDIC with non-8-bit libraries.

84. Test for error code <= 0 in regerror().

85. Check for too many replacements (more than INT_MAX) in pcre2_substitute().

86. Avoid the possibility of computing with an out-of-bounds pointer (though
not dereferencing it) while handling lookbehind assertions.

87. Failure to get memory for the match data in regcomp() is now given as a
regcomp() error instead of waiting for regexec() to pick it up.

88. In pcre2_substitute(), ensure that CRLF is not split when it is a valid
newline sequence.
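
One place where 88 plausibly applies is the step taken after an empty match in
a global substitution. The following sketch assumes an 8-bit, non-UTF subject;
advance_after_empty_match() and its crlf_is_newline flag are hypothetical and
do not reproduce the code in pcre2_substitute().

```c
#include <stddef.h>

/* Returns the offset at which to retry after an empty match at pos.
   Normally this is pos + 1, but when crlf_is_newline is non-zero and pos
   sits on "\r\n", both characters are skipped so the pair stays together. */
size_t advance_after_empty_match(const char *subject, size_t length,
                                 size_t pos, int crlf_is_newline)
{
    if (pos >= length)
        return length;
    if (crlf_is_newline && subject[pos] == '\r' &&
        pos + 1 < length && subject[pos + 1] == '\n')
        return pos + 2;
    return pos + 1;
}
```

Without the extra test, the next match attempt would start between the CR and
the LF, and a replacement could be inserted in the middle of the newline.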

89. Paranoid check in regcomp() for bad error code from pcre2_compile().

90. Run test 8 (internal offsets and code sizes) for link sizes 3 and 4 as well
as for link size 2.

91. Document that JIT has a limit on pattern size, and give more information
about JIT compile failures in pcre2test.

92. Implement PCRE2_INFO_HASBACKSLASHC.

93. Re-arrange valgrind support code in pcre2test to avoid spurious reports
with JIT (possibly caused by SSE2?).

94. Support offset_limit in JIT.

95. A sequence such as [[:punct:]b], that is, a POSIX character class followed
by a single ASCII character in a class item, was incorrectly compiled in UCP
mode. The POSIX class got lost, but only if the single character followed it.

96. [:punct:] in UCP mode was matching some characters in the range 128-255
that should not have been matched.

97. If [:^ascii:] or [:^xdigit:] are present in a non-negated class, all
characters with code points greater than 255 are in the class. When a Unicode
property was also in the class (if PCRE2_UCP is set, escapes such as \w are
turned into Unicode properties), wide characters were not correctly handled,
and could fail to match.

98. In pcre2test, make the "startoffset" modifier a synonym of "offset",
because it sets the "startoffset" parameter for pcre2_match().

99. If PCRE2_AUTO_CALLOUT was set on a pattern that had a (?# comment between
an item and its quantifier (for example, A(?#comment)?B), pcre2_compile()
misbehaved. This bug was found by the LLVM fuzzer.

100. The error for an invalid UTF pattern string always gave the code unit
offset as zero instead of where the invalidity was found.

101. Further to 97 above, negated classes such as [^[:^ascii:]\d] were also not
working correctly in UCP mode.

102. Similar to 99 above, if an isolated \E was present between an item and its
quantifier when PCRE2_AUTO_CALLOUT was set, pcre2_compile() misbehaved. This
bug was found by the LLVM fuzzer.

103. The POSIX wrapper function regexec() crashed if the option REG_STARTEND
was set when the pmatch argument was NULL. It now returns REG_INVARG.

104. Allow for up to 32-bit numbers in the ordin() function in pcre2grep.

105. An empty \Q\E sequence between an item and its quantifier caused
pcre2_compile() to misbehave when auto callouts were enabled. This bug
was found by the LLVM fuzzer.

106. If both PCRE2_ALT_VERBNAMES and PCRE2_EXTENDED were set, and a (*MARK) or
other verb "name" ended with whitespace immediately before the closing
parenthesis, pcre2_compile() misbehaved. Example: /(*:abc )/, but only when
both those options were set.

107. In a number of places pcre2_compile() was not handling NULL characters
correctly, and pcre2test with the "bincode" modifier was not always correctly
displaying fields containing NULLS:

   (a) Within /x extended #-comments
   (b) Within the "name" part of (*MARK) and other *verbs
   (c) Within the text argument of a callout

108. If a pattern that was compiled with PCRE2_EXTENDED started with white
space or a #-type comment that was followed by (?-x), which turns off
PCRE2_EXTENDED, and there was no subsequent (?x) to turn it on again,
pcre2_compile() assumed that (?-x) applied to the whole pattern and
consequently mis-compiled it. This bug was found by the LLVM fuzzer. The fix
for this bug means that a setting of any of the (?imsxJU) options at the start
of a pattern is no longer transferred to the options that are returned by
PCRE2_INFO_ALLOPTIONS. In fact, this was an anachronism that should have
changed when the effects of those options were all moved to compile time.

109. An escaped closing parenthesis in the "name" part of a (*verb) when
PCRE2_ALT_VERBNAMES was set caused pcre2_compile() to malfunction. This bug
was found by the LLVM fuzzer.

110. Implemented PCRE2_SUBSTITUTE_UNSET_EMPTY, and updated pcre2test to make it
possible to test it.

111. "Harden" pcre2test against ridiculously large values in modifiers and
command line arguments.

112. Implemented PCRE2_SUBSTITUTE_UNKNOWN_UNSET and
PCRE2_SUBSTITUTE_OVERFLOW_LENGTH.

113. Fix printing of *MARK names that contain binary zeroes in pcre2test.


Version 10.20 30-June-2015
--------------------------

1. Callouts with string arguments have been added.

2. Assertion code generator in JIT has been optimized.

3. The invalid pattern (?(?C) has a missing assertion condition at the end. The
pcre2_compile() function read past the end of the input before diagnosing an
error. This bug was discovered by the LLVM fuzzer.

4. Implemented pcre2_callout_enumerate().

5. Fix JIT compilation of conditional blocks whose assertion is converted to
(*FAIL). E.g: /(?(?!))/.

6. The pattern /(?(?!)^)/ caused references to random memory. This bug was
discovered by the LLVM fuzzer.

7. The assertion (?!) is optimized to (*FAIL). This was not handled correctly
when this assertion was used as a condition, for example (?(?!)a|b). In
pcre2_match() it worked by luck; in pcre2_dfa_match() it gave an incorrect
error about an unsupported item.

8. For some types of pattern, for example /Z*(|d*){216}/, the
auto-possessification code could take exponential time to complete. A recursion
depth limit of 1000 has been imposed to limit the resources used by this
optimization. This infelicity was discovered by the LLVM fuzzer.

9. In a pattern such as /(*UTF)[\S\V\H]/, which contains a negated special
class such as \S in non-UCP mode, explicit wide characters (> 255) can be
ignored because \S ensures they are all in the class. The code for doing this
was interacting badly with the code for computing the amount of space needed to
compile the pattern, leading to a buffer overflow. This bug was discovered by
the LLVM fuzzer.

10. A pattern such as /((?2)+)((?1))/ which has mutual recursion nested inside
other kinds of group caused stack overflow at compile time. This bug was
discovered by the LLVM fuzzer.

11. A pattern such as /(?1)(?#?'){8}(a)/ which had a parenthesized comment
between a subroutine call and its quantifier was incorrectly compiled, leading
to buffer overflow or other errors. This bug was discovered by the LLVM fuzzer.

12. The illegal pattern /(?(?<E>.*!.*)?)/ was not being diagnosed as missing an
assertion after (?(. The code was failing to check the character after (?(?<
for the ! or = that would indicate a lookbehind assertion. This bug was
discovered by the LLVM fuzzer.

13. A pattern such as /X((?2)()*+){2}+/ which has a possessive quantifier with
a fixed maximum following a group that contains a subroutine reference was
incorrectly compiled and could trigger buffer overflow. This bug was discovered
by the LLVM fuzzer.

14. Negative relative recursive references such as (?-7) to non-existent
subpatterns were not being diagnosed and could lead to unpredictable behaviour.
This bug was discovered by the LLVM fuzzer.

15. The bug fixed in 14 was due to an integer variable that was unsigned when
it should have been signed. Some other "int" variables, having been checked,
have either been changed to uint32_t or commented as "must be signed".

16. A mutual recursion within a lookbehind assertion such as (?<=((?2))((?1)))
caused a stack overflow instead of the diagnosis of a non-fixed length
lookbehind assertion. This bug was discovered by the LLVM fuzzer.

17. The use of \K in a positive lookbehind assertion in a non-anchored pattern
(e.g. /(?<=\Ka)/) could make pcre2grep loop.

18. There was a similar problem to 17 in pcre2test for global matches, though
the code there did catch the loop.

19. If a greedy quantified \X was preceded by \C in UTF mode (e.g. \C\X*),
and a subsequent item in the pattern caused a non-match, backtracking over the
repeated \X did not stop, but carried on past the start of the subject, causing
reference to random memory and/or a segfault. There were also some other cases
where backtracking after \C could crash. This set of bugs was discovered by the
LLVM fuzzer.

20. The function for finding the minimum length of a matching string could take
651a very long time if mutual recursion was present many times in a pattern, for
652example, /((?2){73}(?2))((?1))/. A better mutual recursion detection method has
653been implemented. This infelicity was discovered by the LLVM fuzzer.
654
65521. Implemented PCRE2_NEVER_BACKSLASH_C.
656
65722. The feature for string replication in pcre2test could read from freed
658memory if the replication required a buffer to be extended, and it was not
659working properly in 16-bit and 32-bit modes. This issue was discovered by a
660fuzzer: see http://lcamtuf.coredump.cx/afl/.
661
66223. Added the PCRE2_ALT_CIRCUMFLEX option.
663
66424. Adjust the treatment of \8 and \9 to be the same as the current Perl
665behaviour.
666
66725. Static linking against the PCRE2 library using the pkg-config module was
668failing on missing pthread symbols.
669
67026. If a group that contained a recursive back reference also contained a
671forward reference subroutine call followed by a non-forward-reference
672subroutine call, for example /.((?2)(?R)\1)()/, pcre2_compile() failed to
673compile correct code, leading to undefined behaviour or an internally detected
674error. This bug was discovered by the LLVM fuzzer.
675
67627. Quantification of certain items (e.g. atomic back references) could cause
677incorrect code to be compiled when recursive forward references were involved.
678For example, in this pattern: /(?1)()((((((\1++))\x85)+)|))/. This bug was
679discovered by the LLVM fuzzer.
680
68128. A repeated conditional group whose condition was a reference by name caused
682a buffer overflow if there was more than one group with the given name. This
683bug was discovered by the LLVM fuzzer.
684
68529. A recursive back reference by name within a group that had the same name as
686another group caused a buffer overflow. For example: /(?J)(?'d'(?'d'\g{d}))/.
687This bug was discovered by the LLVM fuzzer.
688
68930. A forward reference by name to a group whose number is the same as the
690current group, for example in this pattern: /(?|(\k'Pm')|(?'Pm'))/, caused a
691buffer overflow at compile time. This bug was discovered by the LLVM fuzzer.
692
69331. Fix -fsanitize=undefined warnings for left shifts of 1 by 31 (it treats 1
694as an int; fixed by writing it as 1u).
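
The change in item 31 can be illustrated with a short sketch. Where int is 32
bits wide, 1 << 31 shifts into the sign bit of a signed int, which is undefined
behaviour in C; making the operand unsigned gives a well-defined result. The
helper name bit_mask is hypothetical and is not taken from the PCRE2 sources.

```c
#include <stdint.h>

/* Return a 32-bit mask with only bit n set; n must be in the range 0..31.
   The operand is unsigned, so shifting by 31 is well defined. */
uint32_t bit_mask(unsigned int n)
{
    return (uint32_t)1 << n;
}
```

Calling bit_mask(31) yields 0x80000000 without tripping
-fsanitize=undefined.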
695
32. Fix pcre2grep compile when -std=c99 is used with gcc, though it still gives
a warning for "fileno" unless -std=gnu99 is used.
698
69933. A lookbehind assertion within a set of mutually recursive subpatterns could
700provoke a buffer overflow. This bug was discovered by the LLVM fuzzer.
701
70234. Give an error for an empty subpattern name such as (?'').
703
35. Make pcre2test give an error if a pattern that follows #forbid_utf contains
\P, \p, or \X.
706
70736. The way named subpatterns are handled has been refactored. There is now a
708pre-pass over the regex which does nothing other than identify named
709subpatterns and count the total captures. This means that information about
710named patterns is known before the rest of the compile. In particular, it means
711that forward references can be checked as they are encountered. Previously, the
712code for handling forward references was contorted and led to several errors in
713computing the memory requirements for some patterns, leading to buffer
714overflows.
715
71637. There was no check for integer overflow in subroutine calls such as (?123).
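
One way to guard against the overflow mentioned in item 37 is to test each
digit before it is accumulated. The sketch below shows the general technique
only; the function name parse_group_number and the choice of int as the result
type are assumptions, not the code used in pcre2_compile().

```c
#include <limits.h>
#include <stdbool.h>

/* Parse the decimal digits at s into *out, stopping at the first non-digit.
   Returns false if there are no digits or if the value would exceed
   INT_MAX. */
bool parse_group_number(const char *s, int *out)
{
    int n = 0;
    if (*s < '0' || *s > '9') return false;
    while (*s >= '0' && *s <= '9') {
        int d = *s++ - '0';
        /* n * 10 + d fits in an int only if n <= (INT_MAX - d) / 10. */
        if (n > (INT_MAX - d) / 10) return false;
        n = n * 10 + d;
    }
    *out = n;
    return true;
}
```

Given the text "123)" the function stores 123 and returns true; given a run of
twenty 9s it returns false instead of wrapping around.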
717
71838. The table entry for \l in EBCDIC environments was incorrect, leading to its
719being treated as a literal 'l' instead of causing an error.
720
72139. If a non-capturing group containing a conditional group that could match
722an empty string was repeated, it was not identified as matching an empty string
723itself. For example: /^(?:(?(1)x|)+)+$()/.
724
40. In an EBCDIC environment, pcre2test was mishandling the escape sequences
\a and \e in test subject lines.
727
72841. In an EBCDIC environment, \a in a pattern was converted to the ASCII
729instead of the EBCDIC value.
730
73142. The handling of \c in an EBCDIC environment has been revised so that it is
732now compatible with the specification in Perl's perlebcdic page.
733
43. Single character repetition in JIT has been improved. A speedup of 20-30%
was achieved on certain patterns.
736
73744. The EBCDIC character 0x41 is a non-breaking space, equivalent to 0xa0 in
738ASCII/Unicode. This has now been added to the list of characters that are
739recognized as white space in EBCDIC.
740
74145. When PCRE2 was compiled without Unicode support, the use of \p and \P gave
742an error (correctly) when used outside a class, but did not give an error
743within a class.
744
74546. \h within a class was incorrectly compiled in EBCDIC environments.
746
47. JIT now returns an error when the compiled pattern requires more stack
space than the maximum.
749
75048. Fixed a memory leak in pcre2grep when a locale is set.
751
752
753Version 10.10 06-March-2015
754---------------------------
755
1. When a pattern is compiled, it remembers the highest back reference so that
when matching, if the ovector is too small, extra memory can be obtained to
use instead. A conditional subpattern whose condition is a check on a capture
having happened, as in the pattern /^(?:(a)|b)(?(1)A|B)/, is another kind of
back reference, but it was not setting the highest back reference number. This
mattered only if pcre2_match() was called with an ovector that was too small to
hold the capture, and there was no other kind of back reference (a situation
which is probably quite rare). The effect of the bug was that the condition was
always treated as FALSE when the capture could not be consulted, leading to
incorrect behaviour by pcre2_match(). This bug has been fixed.
767
7682. Functions for serialization and deserialization of sets of compiled patterns
769have been added.
770
7713. The value that is returned by PCRE2_INFO_SIZE has been corrected to remove
772excess code units at the end of the data block that may occasionally occur if
773the code for calculating the size over-estimates. This change stops the
774serialization code copying uninitialized data, to which valgrind objects. The
775documentation of PCRE2_INFO_SIZE was incorrect in stating that the size did not
776include the general overhead. This has been corrected.
777
7784. All code units in every slot in the table of group names are now set, again
779in order to avoid accessing uninitialized data when serializing.
780
7815. The (*NO_JIT) feature is implemented.
782
7836. If a bug that caused pcre2_compile() to use more memory than allocated was
784triggered when using valgrind, the code in (3) above passed a stupidly large
785value to valgrind. This caused a crash instead of an "internal error" return.
786
7877. A reference to a duplicated named group (either a back reference or a test
788for being set in a conditional) that occurred in a part of the pattern where
789PCRE2_DUPNAMES was not set caused the amount of memory needed for the pattern
790to be incorrectly calculated, leading to overwriting.
791
7928. A mutually recursive set of back references such as (\2)(\1) caused a
793segfault at compile time (while trying to find the minimum matching length).
794The infinite loop is now broken (with the minimum length unset, that is, zero).
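
The idea of breaking the loop in item 8 can be sketched with a toy model. This
is not PCRE2's minimum-length code: the toy_group structure, the function name
toy_min_length, and the caller-supplied in_progress array are all hypothetical.
Each group contributes a literal length plus, optionally, the length of one
back-referenced group, and a reference that would re-enter a group already
being measured contributes zero.

```c
#include <stdbool.h>

typedef struct {
    int literal_len;  /* code units matched directly by the group */
    int backref;      /* index of the referenced group, or -1 for none */
} toy_group;

/* Compute a minimum length for group g. in_progress must have one entry per
   group, all initially false; it marks the groups currently being measured. */
int toy_min_length(const toy_group *groups, bool *in_progress, int g)
{
    int len;
    if (in_progress[g]) return 0;   /* cycle detected: treat as unset */
    in_progress[g] = true;
    len = groups[g].literal_len;
    if (groups[g].backref >= 0)
        len += toy_min_length(groups, in_progress, groups[g].backref);
    in_progress[g] = false;
    return len;
}
```

Modelling (\2)(\1) as two groups with no literal content that refer to each
other, the function returns zero for either group instead of recursing until
the stack is exhausted.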
795
7969. If an assertion that was used as a condition was quantified with a minimum
797of zero, matching went wrong. In particular, if the whole group had unlimited
798repetition and could match an empty string, a segfault was likely. The pattern
799(?(?=0)?)+ is an example that caused this. Perl allows assertions to be
800quantified, but not if they are being used as conditions, so the above pattern
801is faulted by Perl. PCRE2 has now been changed so that it also rejects such
802patterns.
803
80410. The error message for an invalid quantifier has been changed from "nothing
805to repeat" to "quantifier does not follow a repeatable item".
806
80711. If a bad UTF string is compiled with NO_UTF_CHECK, it may succeed, but
808scanning the compiled pattern in subsequent auto-possessification can get out
809of step and lead to an unknown opcode. Previously this could have caused an
810infinite loop. Now it generates an "internal error" error. This is a tidyup,
811not a bug fix; passing bad UTF with NO_UTF_CHECK is documented as having an
812undefined outcome.
813
81412. A UTF pattern containing a "not" match of a non-ASCII character and a
815subroutine reference could loop at compile time. Example: /[^\xff]((?1))/.
816
81713. The locale test (RunTest 3) has been upgraded. It now checks that a locale
818that is found in the output of "locale -a" can actually be set by pcre2test
819before it is accepted. Previously, in an environment where a locale was listed
820but would not set (an example does exist), the test would "pass" without
821actually doing anything. Also the fr_CA locale has been added to the list of
822locales that can be used.
823
82414. Fixed a bug in pcre2_substitute(). If a replacement string ended in a
825capturing group number without parentheses, the last character was incorrectly
826literally included at the end of the replacement string.
827
82815. A possessive capturing group such as (a)*+ with a minimum repeat of zero
829failed to allow the zero-repeat case if pcre2_match() was called with an
830ovector too small to capture the group.
831
83216. Improved error message in pcre2test when setting the stack size (-S) fails.
833
83417. Fixed two bugs in CMakeLists.txt: (1) Some lines had got lost in the
835transfer from PCRE1, meaning that CMake configuration failed if "build tests"
836was selected. (2) The file src/pcre2_serialize.c had not been added to the list
837of PCRE2 sources, which caused a failure to build pcre2test.
838
83918. Fixed typo in pcre2_serialize.c (DECL instead of DEFN) that causes problems
840only on Windows.
841
84219. Use binary input when reading back saved serialized patterns in pcre2test.
843
84420. Added RunTest.bat for running the tests under Windows.
845
84621. "make distclean" was not removing config.h, a file that may be created for
847use with CMake.
848
84922. A pattern such as "((?2){0,1999}())?", which has a group containing a
850forward reference repeated a large (but limited) number of times within a
851repeated outer group that has a zero minimum quantifier, caused incorrect code
852to be compiled, leading to the error "internal error: previously-checked
853referenced subpattern not found" when an incorrect memory address was read.
854This bug was reported as "heap overflow", discovered by Kai Lu of Fortinet's
855FortiGuard Labs. (Added 24-March-2015: CVE-2015-2325 was given to this.)
856
23. A pattern such as /((?+1)(\1))/ containing a forward reference subroutine
call within a group that also contained a recursive back reference caused
incorrect code to be compiled. This bug was reported as "heap overflow",
discovered by Kai Lu of Fortinet's FortiGuard Labs. (Added 24-March-2015:
CVE-2015-2326 was given to this.)
862
24. Computing the size of the JIT read-only data in advance has been a source
of various issues, and new ones still appear, unfortunately. To fix existing
and future issues, size computation has been eliminated from the code and
replaced by on-demand memory allocation.
867
86825. A pattern such as /(?i)[A-`]/, where characters in the other case are
869adjacent to the end of the range, and the range contained characters with more
870than one other case, caused incorrect behaviour when compiled in UTF mode. In
871that example, the range a-j was left out of the class.
872
873
874Version 10.00 05-January-2015
875-----------------------------
876
877Version 10.00 is the first release of PCRE2, a revised API for the PCRE
878library. Changes prior to 10.00 are logged in the ChangeLog file for the old
879API, up to item 20 for release 8.36.
880
881The code of the library was heavily revised as part of the new API
882implementation. Details of each and every modification were not individually
883logged. In addition to the API changes, the following changes were made. They
884are either new functionality, or bug fixes and other noticeable changes of
885behaviour that were implemented after the code had been forked.
886
8871. Including Unicode support at build time is now enabled by default, but it
888can optionally be disabled. It is not enabled by default at run time (no
889change).
890
8912. The test program, now called pcre2test, was re-specified and almost
892completely re-written. Its input is not compatible with input for pcretest.
893
8943. Patterns may start with (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the
895PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is
896matched by that pattern.
897
8984. For the benefit of those who use PCRE2 via some other application, that is,
899not writing the function calls themselves, it is possible to check the PCRE2
900version by matching a pattern such as /(?(VERSION>=10)yes|no)/ against a
901string such as "yesno".
902
9035. There are case-equivalent Unicode characters whose encodings use different
904numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is
905theoretically possible for this to happen in UTF-16 too.) If a backreference to
906a group containing one of these characters was greedily repeated, and during
907the match a backtrack occurred, the subject might be backtracked by the wrong
908number of code units. For example, if /^(\x{23a})\1*(.)/ is matched caselessly
909(and in UTF-8 mode) against "\x{23a}\x{2c65}\x{2c65}\x{2c65}", group 2 should
910capture the final character, which is the three bytes E2, B1, and A5 in UTF-8.
911Incorrect backtracking meant that group 2 captured only the last two bytes.
912This bug has been fixed; the new code is slower, but it is used only when the
913strings matched by the repetition are not all the same length.
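
The length difference described in item 5 follows directly from the UTF-8
encoding rules, as this short sketch shows. The helper name utf8_length is
hypothetical, and the function assumes a valid code point (it does not reject
surrogates or values above U+10FFFF).

```c
#include <stdint.h>

/* Number of UTF-8 code units (bytes) needed to encode code point cp. */
int utf8_length(uint32_t cp)
{
    if (cp < 0x80) return 1;      /* 0xxxxxxx */
    if (cp < 0x800) return 2;     /* 110xxxxx 10xxxxxx */
    if (cp < 0x10000) return 3;   /* 1110xxxx 10xxxxxx 10xxxxxx */
    return 4;                     /* 11110xxx plus three continuation bytes */
}
```

For the pair mentioned above, utf8_length(0x023A) is 2 and
utf8_length(0x2C65) is 3, so a caseless back reference can match strings of
different byte lengths and backtracking cannot assume a fixed step.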
914
9156. A pattern such as /()a/ was not setting the "first character must be 'a'"
916information. This applied to any pattern with a group that matched no
917characters, for example: /(?:(?=.)|(?<!x))a/.
918
7. When an (*ACCEPT) is triggered inside capturing parentheses, it arranges for
those parentheses to be closed with whatever has been captured so far. However,
it was failing to mark any other groups between the highest capture so far and
the current group as "unset". Thus, the ovector for those groups contained
whatever was previously there. An example is the pattern /(x)|((*ACCEPT))/ when
matched against "abcd".
925
9268. The pcre2_substitute() function has been implemented.
927
9289. If an assertion used as a condition was quantified with a minimum of zero
929(an odd thing to do, but it happened), SIGSEGV or other misbehaviour could
930occur.
931
93210. The PCRE2_NO_DOTSTAR_ANCHOR option has been implemented.
933
934****
935