regex
=====
A Rust library for parsing, compiling, and executing regular expressions. Its
syntax is similar to Perl-style regular expressions, but lacks a few features
like look-around and backreferences. In exchange, all searches execute in
linear time with respect to the size of the regular expression and search text.
Much of the syntax and implementation is inspired
by [RE2](https://github.com/google/re2).

[![Build status](https://github.com/rust-lang/regex/workflows/ci/badge.svg)](https://github.com/rust-lang/regex/actions)
[![](https://meritbadge.herokuapp.com/regex)](https://crates.io/crates/regex)
[![Rust](https://img.shields.io/badge/rust-1.28.0%2B-blue.svg?maxAge=3600)](https://github.com/rust-lang/regex)

### Documentation

[Module documentation with examples](https://docs.rs/regex).
The module documentation also includes a comprehensive description of the
syntax supported.

Documentation with examples for the various matching functions and iterators
can be found on the
[`Regex` type](https://docs.rs/regex/*/regex/struct.Regex.html).

### Usage

Add this to your `Cargo.toml`:

```toml
[dependencies]
regex = "1"
```

and this to your crate root (if you're using Rust 2015):

```rust
extern crate regex;
```

Here's a simple example that matches a date in YYYY-MM-DD format and prints the
year, month and day:

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(?x)
(?P<year>\d{4})  # the year
-
(?P<month>\d{2}) # the month
-
(?P<day>\d{2})   # the day
").unwrap();
    let caps = re.captures("2010-03-14").unwrap();

    assert_eq!("2010", &caps["year"]);
    assert_eq!("03", &caps["month"]);
    assert_eq!("14", &caps["day"]);
}
```

If you have lots of dates in text that you'd like to iterate over, then it's
easy to adapt the above example with an iterator:

```rust
use regex::Regex;

const TO_SEARCH: &'static str = "
On 2010-03-14, foo happened. On 2014-10-14, bar happened.
";

fn main() {
    let re = Regex::new(r"(\d{4})-(\d{2})-(\d{2})").unwrap();

    for caps in re.captures_iter(TO_SEARCH) {
        // Note that all of the unwraps are actually OK for this regex
        // because the only way for the regex to match is if all of the
        // capture groups match. This is not true in general though!
        println!("year: {}, month: {}, day: {}",
                 caps.get(1).unwrap().as_str(),
                 caps.get(2).unwrap().as_str(),
                 caps.get(3).unwrap().as_str());
    }
}
```

This example outputs:

```text
year: 2010, month: 03, day: 14
year: 2014, month: 10, day: 14
```

### Usage: Avoid compiling the same regex in a loop

It is an anti-pattern to compile the same regular expression in a loop since
compilation is typically expensive. (It takes anywhere from a few microseconds
to a few **milliseconds** depending on the size of the regex.) Not only is
compilation itself expensive, but this also prevents optimizations that reuse
allocations internally to the matching engines.

In Rust, it can sometimes be a pain to pass regular expressions around if
they're used from inside a helper function. Instead, we recommend using the
[`lazy_static`](https://crates.io/crates/lazy_static) crate to ensure that
regular expressions are compiled exactly once.

For example:

```rust,ignore
use lazy_static::lazy_static;
use regex::Regex;

fn some_helper_function(text: &str) -> bool {
    lazy_static! {
        // Compiled on first use, then shared by every later call.
        static ref RE: Regex = Regex::new("...").unwrap();
    }
    RE.is_match(text)
}
```

Specifically, in this example, the regex will be compiled when it is used for
the first time. On subsequent uses, it will reuse the previous compilation.

### Usage: match regular expressions on `&[u8]`

The main API of this crate (`regex::Regex`) requires the caller to pass a
`&str` for searching. In Rust, an `&str` is required to be valid UTF-8, which
means the main API can't be used for searching arbitrary bytes.

To match on arbitrary bytes, use the `regex::bytes::Regex` API. The API
is identical to the main API, except that it takes an `&[u8]` to search
on instead of an `&str`. It also permits disabling Unicode mode even when the
resulting pattern could match invalid UTF-8. For example, `(?-u:.)` matches any
*byte* except for `\n`, while `.` matches any *UTF-8 encoded Unicode scalar
value* except for `\n`, just as it does in the main API.

This example shows how to find all null-terminated strings in a slice of bytes:

```rust
use regex::bytes::Regex;

// `(?-u)` disables Unicode mode so the class below operates on raw bytes.
let re = Regex::new(r"(?-u)(?P<cstr>[^\x00]+)\x00").unwrap();
let text = b"foo\x00bar\x00baz\x00";

// Extract all of the strings without the null terminator from each match.
// The unwrap is OK here since a match requires the `cstr` capture to match.
let cstrs: Vec<&[u8]> =
    re.captures_iter(text)
      .map(|c| c.name("cstr").unwrap().as_bytes())
      .collect();
assert_eq!(vec![&b"foo"[..], &b"bar"[..], &b"baz"[..]], cstrs);
```

Notice here that, with Unicode mode disabled via `(?-u)`, the `[^\x00]+` will
match any *byte* except for `NUL`, including bytes that are not valid UTF-8.
When using the main API, `[^\x00]+` would instead match any valid UTF-8
sequence except for `NUL`.

### Usage: match multiple regular expressions simultaneously

This demonstrates how to use a `RegexSet` to match multiple (possibly
overlapping) regular expressions in a single scan of the search text:

```rust
use regex::RegexSet;

let set = RegexSet::new(&[
    r"\w+",
    r"\d+",
    r"\pL+",
    r"foo",
    r"bar",
    r"barfoo",
    r"foobar",
]).unwrap();

// Iterate over and collect all of the matches.
let matches: Vec<_> = set.matches("foobar").into_iter().collect();
assert_eq!(matches, vec![0, 2, 3, 4, 6]);

// You can also test whether a particular regex matched:
let matches = set.matches("foobar");
assert!(!matches.matched(5));
assert!(matches.matched(6));
```

### Usage: enable SIMD optimizations

SIMD optimizations are enabled automatically on Rust stable 1.27 and newer.
For nightly versions of Rust, this requires a recent version with the SIMD
features stabilized.


### Usage: a regular expression parser

This repository contains a crate that provides a well-tested regular expression
parser, abstract syntax and a high-level intermediate representation for
convenient analysis. It provides no facilities for compilation or execution.
This may be useful if you're implementing your own regex engine or otherwise
need to do analysis on the syntax of a regular expression. It is otherwise not
recommended for general use.

[Documentation for `regex-syntax`.](https://docs.rs/regex-syntax)


### Crate features

This crate comes with several features that permit tweaking the trade-off
between binary size, compilation time and runtime performance. Users of this
crate can selectively disable Unicode tables, or choose which of the
optimizations performed by this crate to disable.

When all of these features are disabled, runtime match performance may be much
worse, but if you're matching on short strings, or if high performance isn't
necessary, then such a configuration is perfectly serviceable. To disable
all such features, use the following `Cargo.toml` dependency configuration:

```toml
[dependencies.regex]
version = "1.3"
default-features = false
# regex currently requires the standard library, so you must re-enable it.
features = ["std"]
```

This will reduce the dependency tree of `regex` down to a single crate
(`regex-syntax`).

The full set of features one can disable is listed
[in the "Crate features" section of the documentation](https://docs.rs/regex/*/#crate-features).


### Minimum Rust version policy

This crate's minimum supported `rustc` version is `1.28.0`.

The current **tentative** policy is that the minimum Rust version required
to use this crate can be increased in minor version updates. For example, if
regex 1.0 requires Rust 1.20.0, then regex 1.0.z for all values of `z` will
also require Rust 1.20.0 or newer. However, regex 1.y for `y > 0` may require a
newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum
supported version of Rust.


### License

This project is licensed under either of

 * Apache License, Version 2.0, ([LICENSE-APACHE](LICENSE-APACHE) or
   https://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   https://opensource.org/licenses/MIT)

at your option.

The data in `regex-syntax/src/unicode_tables/` is licensed under the Unicode
License Agreement
([LICENSE-UNICODE](https://www.unicode.org/copyright.html#License)).
257