| <!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="Match regular expressions on arbitrary bytes."><meta name="keywords" content="rust, rustlang, rust-lang, bytes"><title>regex::bytes - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../SourceSerif4-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../FiraSans-Regular.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../FiraSans-Medium.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../SourceCodePro-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../SourceSerif4-Bold.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../SourceCodePro-Semibold.ttf.woff2"><link rel="stylesheet" href="../../normalize.css"><link rel="stylesheet" href="../../rustdoc.css" id="mainThemeStyle"><link rel="stylesheet" href="../../ayu.css" disabled><link rel="stylesheet" href="../../dark.css" disabled><link rel="stylesheet" href="../../light.css" id="themeStyle"><script id="default-settings" ></script><script src="../../storage.js"></script><script defer src="../../main.js"></script><noscript><link rel="stylesheet" href="../../noscript.css"></noscript><link rel="alternate icon" type="image/png" href="../../favicon-16x16.png"><link rel="alternate icon" type="image/png" href="../../favicon-32x32.png"><link rel="icon" type="image/svg+xml" href="../../favicon.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle">☰</button><a class="sidebar-logo" href="../../regex/index.html"><div class="logo-container"><img class="rust-logo" 
src="../../rust-logo.svg" alt="logo"></div></a><h2></h2></nav><nav class="sidebar"><a class="sidebar-logo" href="../../regex/index.html"><div class="logo-container"><img class="rust-logo" src="../../rust-logo.svg" alt="logo"></div></a><h2 class="location"><a href="#">Module bytes</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li><li><a href="#traits">Traits</a></li></ul></section></div></nav><main><div class="width-limiter"><nav class="sub"><form class="search-form"><div class="search-container"><span></span><input class="search-input" name="search" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" title="help" tabindex="-1"><a href="../../help.html">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../wheel.svg"></a></div></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1 class="fqn">Module <a href="../index.html">regex</a>::<wbr><a class="mod" href="#">bytes</a><button id="copy-path" onclick="copy_path(this)" title="Copy item path to clipboard"><img src="../../clipboard.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="srclink" href="../../src/regex/lib.rs.html#760">source</a> · <a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">[<span class="inner">−</span>]</a></span></div><details class="rustdoc-toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Match regular expressions on arbitrary bytes.</p> |
| <p>This module provides an API nearly identical to the one found at the |
| top level of this crate. There are two important differences:</p> |
| <ol> |
| <li>Matching is done on <code>&[u8]</code> instead of <code>&str</code>. Additionally, <code>Vec&lt;u8&gt;</code> |
| is used where <code>String</code> would have been used.</li> |
| <li>Unicode support can be disabled even when disabling it would result in |
| matching invalid UTF-8 bytes.</li> |
| </ol> |
| <h2 id="example-match-null-terminated-string"><a href="#example-match-null-terminated-string">Example: match null-terminated strings</a></h2> |
| <p>This shows how to find all null-terminated strings in a slice of bytes:</p> |
| |
| <div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>regex::bytes::Regex; |
| |
| <span class="kw">let </span>re = Regex::new(<span class="string">r"(?-u)(?P&lt;cstr&gt;[^\x00]+)\x00"</span>).unwrap(); |
| <span class="kw">let </span>text = <span class="string">b"foo\x00bar\x00baz\x00"</span>; |
| |
| <span class="comment">// Extract all of the strings without the null terminator from each match. |
| // The unwrap is OK here since a match requires the `cstr` capture to match. |
| </span><span class="kw">let </span>cstrs: Vec<<span class="kw-2">&</span>[u8]> = |
| re.captures_iter(text) |
| .map(|c| c.name(<span class="string">"cstr"</span>).unwrap().as_bytes()) |
| .collect(); |
| <span class="macro">assert_eq!</span>(<span class="macro">vec!</span>[<span class="kw-2">&</span><span class="string">b"foo"</span>[..], <span class="kw-2">&</span><span class="string">b"bar"</span>[..], <span class="kw-2">&</span><span class="string">b"baz"</span>[..]], cstrs);</code></pre></div> |
| <h2 id="example-selectively-enable-unicode-support"><a href="#example-selectively-enable-unicode-support">Example: selectively enable Unicode support</a></h2> |
| <p>This shows how to match an arbitrary byte pattern followed by a UTF-8 encoded |
| string (e.g., to extract a title from a Matroska file):</p> |
| |
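| <div class="example-wrap"><pre class="rust rust-example-rendered"><code><span class="kw">use </span>std::str; |
| |
| <span class="kw">use </span>regex::bytes::Regex; |
| |
| <span class="kw">let </span>re = Regex::new( |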
| <span class="string">r"(?-u)\x7b\xa9(?:[\x80-\xfe]|[\x40-\xff].)(?u:(.*))" |
| </span>).unwrap(); |
| <span class="kw">let </span>text = <span class="string">b"\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65"</span>; |
| <span class="kw">let </span>caps = re.captures(text).unwrap(); |
| |
| <span class="comment">// Notice that despite the `.*` at the end, it will only match valid UTF-8 |
| // because Unicode mode was enabled with the `u` flag. Without the `u` flag, |
| // the `.*` would match the rest of the bytes. |
| </span><span class="kw">let </span>mat = caps.get(<span class="number">1</span>).unwrap(); |
| <span class="macro">assert_eq!</span>((<span class="number">7</span>, <span class="number">10</span>), (mat.start(), mat.end())); |
| |
| <span class="comment">// If there was a match, Unicode mode guarantees that `title` is valid UTF-8. |
| </span><span class="kw">let </span>title = str::from_utf8(<span class="kw-2">&</span>caps[<span class="number">1</span>]).unwrap(); |
| <span class="macro">assert_eq!</span>(<span class="string">"☃"</span>, title);</code></pre></div> |
| <p>In general, if the Unicode flag is enabled in a capture group and that capture |
| is part of the overall match, then the capture is <em>guaranteed</em> to be valid |
| UTF-8.</p> |
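<p>The <code>(7, 10)</code> offsets asserted in the example can be cross-checked without the regex engine. The following is a minimal sketch using only the standard library. The helper <code>valid_utf8_prefix</code> and the constant <code>TEXT</code> are hypothetical names introduced for illustration and are not part of this crate. The sketch shows that, starting at offset 7 of the example haystack, only three bytes form valid UTF-8, which is consistent with the Unicode-mode capture ending at offset 10:</p>

```rust
use std::str;

/// The haystack from the example above.
const TEXT: &[u8] = b"\x12\xd0\x3b\x5f\x7b\xa9\x85\xe2\x98\x83\x80\x98\x54\x76\x68\x65";

/// Hypothetical helper (not part of the regex crate): returns the longest
/// prefix of `bytes` that is valid UTF-8, as a `&str`.
fn valid_utf8_prefix(bytes: &[u8]) -> &str {
    match str::from_utf8(bytes) {
        // The whole slice is valid UTF-8.
        Ok(s) => s,
        // `valid_up_to` is the length of the valid prefix, so re-decoding
        // that prefix cannot fail.
        Err(e) => str::from_utf8(&bytes[..e.valid_up_to()]).unwrap(),
    }
}
```

<p>Calling <code>valid_utf8_prefix(&amp;TEXT[7..])</code> yields the three-byte snowman, while the bytes from offset 10 onward begin with a stray continuation byte (<code>\x80</code>) and are rejected by <code>str::from_utf8</code>.</p>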
| <h2 id="syntax"><a href="#syntax">Syntax</a></h2> |
| <p>The supported syntax is nearly the same as the syntax for Unicode |
| regular expressions, with a few changes that make sense for matching arbitrary |
| bytes:</p> |
| <ol> |
| <li>The <code>u</code> flag can be disabled even when disabling it might cause the regex to |
| match invalid UTF-8. When the <code>u</code> flag is disabled, the regex is said to be in |
| “ASCII compatible” mode.</li> |
| <li>In ASCII compatible mode, neither Unicode scalar values nor Unicode |
| character classes are allowed.</li> |
| <li>In ASCII compatible mode, Perl character classes (<code>\w</code>, <code>\d</code> and <code>\s</code>) |
| revert to their typical ASCII definition. <code>\w</code> maps to <code>[[:word:]]</code>, <code>\d</code> maps |
| to <code>[[:digit:]]</code> and <code>\s</code> maps to <code>[[:space:]]</code>.</li> |
| <li>In ASCII compatible mode, word boundaries use the ASCII compatible <code>\w</code> to |
| determine whether a byte is a word byte or not.</li> |
| <li>Hexadecimal notation can be used to specify arbitrary bytes instead of |
| Unicode codepoints. For example, in ASCII compatible mode, <code>\xFF</code> matches the |
| literal byte <code>\xFF</code>, while in Unicode mode, <code>\xFF</code> is a Unicode codepoint that |
| matches its UTF-8 encoding of <code>\xC3\xBF</code>. The same applies to octal notation when |
| it is enabled.</li> |
| <li>In ASCII compatible mode, <code>.</code> matches any <em>byte</em> except for <code>\n</code>. When the |
| <code>s</code> flag is additionally enabled, <code>.</code> matches any byte.</li> |
| </ol> |
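<p>Two of the points above can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the crate's implementation: <code>utf8_encoding</code> and <code>is_ascii_word_byte</code> are hypothetical helpers. The first shows why <code>\xFF</code> in Unicode mode corresponds to the two bytes <code>\xC3\xBF</code>, and the second spells out the ASCII definition of a word byte, assuming <code>[[:word:]]</code> is <code>[0-9A-Za-z_]</code>:</p>

```rust
/// Hypothetical helper: returns the UTF-8 encoding of a Unicode scalar value.
fn utf8_encoding(c: char) -> Vec<u8> {
    // Four bytes is the maximum length of a UTF-8 encoded scalar value.
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: reports whether `b` is a word byte under the ASCII
/// definition, assumed here to be `[0-9A-Za-z_]`.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}
```

<p>For instance, <code>utf8_encoding('\u{FF}')</code> returns the bytes <code>0xC3, 0xBF</code>, whereas the single byte <code>0xFF</code> is never valid UTF-8 on its own, so only ASCII compatible mode can match it literally.</p>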
| <h2 id="performance"><a href="#performance">Performance</a></h2> |
| <p>In general, one should expect performance on <code>&[u8]</code> to be roughly similar to |
| performance on <code>&str</code>.</p> |
| </div></details><h2 id="structs" class="small-section-header"><a href="#structs">Structs</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.CaptureLocations.html" title="regex::bytes::CaptureLocations struct">CaptureLocations</a></div><div class="item-right docblock-short">CaptureLocations is a low level representation of the raw offsets of each |
| submatch.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.CaptureMatches.html" title="regex::bytes::CaptureMatches struct">CaptureMatches</a></div><div class="item-right docblock-short">An iterator that yields all non-overlapping capture groups matching a |
| particular regular expression.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.CaptureNames.html" title="regex::bytes::CaptureNames struct">CaptureNames</a></div><div class="item-right docblock-short">An iterator over the names of all possible captures.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Captures.html" title="regex::bytes::Captures struct">Captures</a></div><div class="item-right docblock-short">Captures represents a group of captured byte strings for a single match.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Match.html" title="regex::bytes::Match struct">Match</a></div><div class="item-right docblock-short">Match represents a single match of a regex in a haystack.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Matches.html" title="regex::bytes::Matches struct">Matches</a></div><div class="item-right docblock-short">An iterator over all non-overlapping matches for a particular string.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.NoExpand.html" title="regex::bytes::NoExpand struct">NoExpand</a></div><div class="item-right docblock-short"><code>NoExpand</code> indicates literal byte string replacement.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Regex.html" title="regex::bytes::Regex struct">Regex</a></div><div class="item-right docblock-short">A compiled regular expression for matching arbitrary bytes.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.RegexBuilder.html" title="regex::bytes::RegexBuilder struct">RegexBuilder</a></div><div class="item-right docblock-short">A configurable builder for a regular expression.</div></div><div class="item-row"><div class="item-left module-item"><a 
class="struct" href="struct.RegexSet.html" title="regex::bytes::RegexSet struct">RegexSet</a></div><div class="item-right docblock-short">Match multiple (possibly overlapping) regular expressions in a single scan.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.RegexSetBuilder.html" title="regex::bytes::RegexSetBuilder struct">RegexSetBuilder</a></div><div class="item-right docblock-short">A configurable builder for a set of regular expressions.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.ReplacerRef.html" title="regex::bytes::ReplacerRef struct">ReplacerRef</a></div><div class="item-right docblock-short">By-reference adaptor for a <code>Replacer</code></div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.SetMatches.html" title="regex::bytes::SetMatches struct">SetMatches</a></div><div class="item-right docblock-short">A set of matches returned by a regex set.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.SetMatchesIntoIter.html" title="regex::bytes::SetMatchesIntoIter struct">SetMatchesIntoIter</a></div><div class="item-right docblock-short">An owned iterator over the set of matches from a regex set.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.SetMatchesIter.html" title="regex::bytes::SetMatchesIter struct">SetMatchesIter</a></div><div class="item-right docblock-short">A borrowed iterator over the set of matches from a regex set.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Split.html" title="regex::bytes::Split struct">Split</a></div><div class="item-right docblock-short">Yields all substrings delimited by a regular expression match.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.SplitN.html" 
title="regex::bytes::SplitN struct">SplitN</a></div><div class="item-right docblock-short">Yields at most <code>N</code> substrings delimited by a regular expression match.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.SubCaptureMatches.html" title="regex::bytes::SubCaptureMatches struct">SubCaptureMatches</a></div><div class="item-right docblock-short">An iterator that yields all capturing matches in the order in which they |
| appear in the regex.</div></div></div><h2 id="traits" class="small-section-header"><a href="#traits">Traits</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="trait" href="trait.Replacer.html" title="regex::bytes::Replacer trait">Replacer</a></div><div class="item-right docblock-short">Replacer describes types that can be used to replace matches in a byte |
| string.</div></div></div></section></div></main><div id="rustdoc-vars" data-root-path="../../" data-current-crate="regex" data-themes="ayu,dark,light" data-resource-suffix="" data-rustdoc-version="1.66.0-nightly (5c8bff74b 2022-10-21)" ></div></body></html> |