<!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="Provides literal extraction from `Hir` expressions."><meta name="keywords" content="rust, rustlang, rust-lang, literal"><title>regex_syntax::hir::literal - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceSerif4-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../FiraSans-Regular.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../FiraSans-Medium.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceCodePro-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceSerif4-Bold.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceCodePro-Semibold.ttf.woff2"><link rel="stylesheet" href="../../../normalize.css"><link rel="stylesheet" href="../../../rustdoc.css" id="mainThemeStyle"><link rel="stylesheet" href="../../../ayu.css" disabled><link rel="stylesheet" href="../../../dark.css" disabled><link rel="stylesheet" href="../../../light.css" id="themeStyle"><script id="default-settings" ></script><script src="../../../storage.js"></script><script defer src="../../../main.js"></script><noscript><link rel="stylesheet" href="../../../noscript.css"></noscript><link rel="alternate icon" type="image/png" href="../../../favicon-16x16.png"><link rel="alternate icon" type="image/png" href="../../../favicon-32x32.png"><link rel="icon" type="image/svg+xml" href="../../../favicon.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle">&#9776;</button><a class="sidebar-logo" 
href="../../../regex_syntax/index.html"><div class="logo-container"><img class="rust-logo" src="../../../rust-logo.svg" alt="logo"></div></a><h2></h2></nav><nav class="sidebar"><a class="sidebar-logo" href="../../../regex_syntax/index.html"><div class="logo-container"><img class="rust-logo" src="../../../rust-logo.svg" alt="logo"></div></a><h2 class="location"><a href="#">Module literal</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li><li><a href="#enums">Enums</a></li><li><a href="#functions">Functions</a></li></ul></section></div></nav><main><div class="width-limiter"><nav class="sub"><form class="search-form"><div class="search-container"><span></span><input class="search-input" name="search" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" title="help" tabindex="-1"><a href="../../../help.html">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../../wheel.svg"></a></div></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1 class="fqn">Module <a href="../../index.html">regex_syntax</a>::<wbr><a href="../index.html">hir</a>::<wbr><a class="mod" href="#">literal</a><button id="copy-path" onclick="copy_path(this)" title="Copy item path to clipboard"><img src="../../../clipboard.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="srclink" href="../../../src/regex_syntax/hir/literal.rs.html#1-3165">source</a> · <a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">[<span class="inner">&#x2212;</span>]</a></span></div><details class="rustdoc-toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Provides literal extraction from <code>Hir</code> expressions.</p>
<p>An <a href="struct.Extractor.html" title="Extractor"><code>Extractor</code></a> pulls literals out of <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expressions and returns a
<a href="struct.Seq.html" title="Seq"><code>Seq</code></a> of <a href="struct.Literal.html" title="Literal"><code>Literal</code></a>s.</p>
<p>The purpose of literal extraction is generally to provide avenues for
optimizing regex searches. The main idea is that substring searches can be an
order of magnitude faster than a regex search. Therefore, if one can execute
a substring search to find candidate match locations and only run the regex
search at those locations, then large improvements in performance are
possible.</p>
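<p>The candidate-then-verify idea can be sketched with the standard library alone.
This is a minimal illustration, not this crate’s API: <code>find_literal</code>,
<code>find_with_prefilter</code> and <code>foo_digits_at</code> are hypothetical names, the
substring search is deliberately naive, and <code>foo_digits_at</code> is a toy stand-in
for a regex engine matching <code>foo[0-9]+</code> at a fixed offset.</p>

```rust
/// Naive byte-substring search starting at `start`. A real prefilter would
/// use a vectorized searcher; this is only a stand-in (assumption).
fn find_literal(haystack: &[u8], needle: &[u8], start: usize) -> Option<usize> {
    if needle.is_empty() || start > haystack.len() {
        return None;
    }
    haystack[start..]
        .windows(needle.len())
        .position(|window| window == needle)
        .map(|i| start + i)
}

/// Returns the offset of the first match, running the expensive check
/// `matches_at` only at offsets where the prefix literal was found.
fn find_with_prefilter(
    haystack: &[u8],
    prefix: &[u8],
    matches_at: impl Fn(&[u8], usize) -> bool,
) -> Option<usize> {
    let mut start = 0;
    while let Some(candidate) = find_literal(haystack, prefix, start) {
        // Slow path: confirm the candidate with the full matcher.
        if matches_at(haystack, candidate) {
            return Some(candidate);
        }
        // False positive: resume the substring search just past it.
        start = candidate + 1;
    }
    None
}

/// Toy stand-in for a regex engine: does `foo[0-9]+` match starting at `at`?
fn foo_digits_at(haystack: &[u8], at: usize) -> bool {
    let rest = &haystack[at..];
    rest.starts_with(b"foo") && rest.get(3).map_or(false, |b| b.is_ascii_digit())
}
```

<p>For the haystack <code>foo bar foo7</code>, the first candidate at offset 0 is a false
positive, so the expensive check runs a second time at offset 8, where it succeeds.</p>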
<p>With that said, literal optimizations are something of a black art. Even
though substring search is generally faster, a high number of candidates
can create a lot of overhead by ping-ponging between
the substring search and the regex search.</p>
<p>Here are some heuristics that might be used to help increase the chances of
effective literal optimizations:</p>
<ul>
<li>Stick to small <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>s. If you search for too many literals, it’s likely
to lead to substring search that is only a little faster than a regex search,
and thus the overhead of using literal optimizations in the first place might
make things slower overall.</li>
<li>The literals in your <a href="struct.Seq.html" title="Seq"><code>Seq</code></a> shouldn’t be too short. In general, longer is
better. A sequence corresponding to single bytes that occur frequently in the
haystack, for example, is probably a bad literal optimization because it’s
likely to produce many false positive candidates. Longer literals are less
likely to match, and thus probably produce fewer false positives.</li>
<li>If it’s possible to estimate the approximate frequency of each byte according
to some pre-computed background distribution, it is possible to compute a score
of how “good” a <code>Seq</code> is. If a <code>Seq</code> isn’t good enough, you might consider
skipping the literal optimization and just using the regex engine.</li>
</ul>
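<p>The three heuristics above can be combined into a single accept/reject check.
The following is a rough sketch under stated assumptions: <code>toy_rank</code> is a
made-up background ranking (higher means more common), not the table used by this
crate, and <code>is_good_seq</code> and its thresholds are hypothetical. A literal is
scored by its rarest byte, on the theory that the rarest byte bounds how often the
literal can occur.</p>

```rust
/// Hypothetical background ranking: higher means the byte is more common.
/// The values are invented for illustration only.
fn toy_rank(byte: u8) -> u8 {
    match byte {
        b' ' | b'e' | b't' | b'a' => 255,
        b'a'..=b'z' => 150,
        b'0'..=b'9' => 100,
        _ => 50,
    }
}

/// Decides whether a sequence of literals looks worth using as a prefilter.
///
/// - `max_literals`: reject sequences with more literals than this.
/// - `min_len`: reject sequences containing a literal shorter than this.
/// - `max_rank`: every literal must contain a byte ranked at or below this.
fn is_good_seq(
    literals: &[&[u8]],
    max_literals: usize,
    min_len: usize,
    max_rank: u8,
) -> bool {
    if literals.is_empty() || literals.len() > max_literals {
        return false;
    }
    literals.iter().all(|lit| {
        // Rank of the rarest byte in this literal (None for an empty literal).
        let rarest = lit.iter().map(|&b| toy_rank(b)).min();
        lit.len() >= min_len && rarest.map_or(false, |rank| rank <= max_rank)
    })
}
```

<p>When the check fails, the caller would skip the literal optimization and run the
regex engine directly.</p>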
<p>(It should be noted that there are always pathological cases that can make
any kind of literal optimization a net loss. This is why it
might be a good idea to be conservative, or even to provide a means for
literal optimizations to be dynamically disabled if they are determined to be
ineffective according to some measure.)</p>
<p>You’re encouraged to explore the methods on <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>, which permit shrinking
the size of sequences in a preference-order preserving fashion.</p>
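<p>To illustrate what shrinking a sequence while preserving preference order can
look like, here is a simplified model written against plain <code>Vec</code>s. The
<code>Lit</code> type and this <code>keep_first_bytes</code> function are stand-ins defined
for the sketch and are not the crate’s own types; they only assume that a truncated
literal can no longer be treated as exact, and that only adjacent duplicates may be
merged so the relative order of the remaining literals is unchanged.</p>

```rust
/// Simplified stand-in for a literal: its bytes plus whether matching it
/// implies an overall match (`exact`).
#[derive(Clone, Debug, PartialEq)]
struct Lit {
    bytes: Vec<u8>,
    exact: bool,
}

/// Truncates every literal to at most `len` leading bytes, then merges
/// adjacent duplicates. Order is never changed, so preference is preserved.
fn keep_first_bytes(seq: &mut Vec<Lit>, len: usize) {
    for lit in seq.iter_mut() {
        if lit.bytes.len() > len {
            lit.bytes.truncate(len);
            // A truncated literal no longer covers the whole match.
            lit.exact = false;
        }
    }
    // `dedup_by` passes (later, earlier) and removes `later` when we return true.
    seq.dedup_by(|later, earlier| {
        if later.bytes != earlier.bytes {
            return false;
        }
        // If the duplicates disagree on exactness, keep the weaker claim.
        if later.exact != earlier.exact {
            earlier.exact = false;
        }
        true
    });
}
```

<p>Truncating <code>foobar</code>, <code>foobaz</code>, <code>quux</code> to four bytes leaves
an inexact <code>foob</code> followed by an exact <code>quux</code>: fewer literals to search
for, at the cost of more false positive candidates.</p>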
<p>Finally, note that it isn’t strictly necessary to use an <a href="struct.Extractor.html" title="Extractor"><code>Extractor</code></a>. Namely,
an <code>Extractor</code> only uses public APIs of the <a href="struct.Seq.html" title="Seq"><code>Seq</code></a> and <a href="struct.Literal.html" title="Literal"><code>Literal</code></a> types,
so it is possible to implement your own extractor, for example one that extracts
n-grams or “inner” literals (i.e., not prefix or suffix literals). The <code>Extractor</code>
is mostly responsible for the case analysis over <code>Hir</code> expressions. Most of
the “trickier” parts concern how to combine literal sequences, and that is all
implemented on <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>.</p>
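<p>The split between case analysis and sequence combination can be seen in a toy
extractor. Everything here is hypothetical: <code>Node</code> is a tiny stand-in for an
expression tree with only three cases, and <code>prefixes</code> combines plain byte
vectors with a cross product for concatenation and an ordered union for alternation.
It ignores repetition, character classes, exactness and size limits, all of which a
practical extractor has to handle.</p>

```rust
/// Tiny stand-in for a regex expression tree (not the crate's `Hir`).
enum Node {
    Literal(Vec<u8>),
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
}

/// Extracts prefix literals by case analysis over the tree.
fn prefixes(node: &Node) -> Vec<Vec<u8>> {
    match node {
        Node::Literal(bytes) => vec![bytes.clone()],
        // Concatenation: cross product of the children's sequences, in order.
        Node::Concat(children) => {
            children.iter().fold(vec![Vec::new()], |acc, child| {
                let next = prefixes(child);
                let mut out = Vec::with_capacity(acc.len() * next.len());
                for left in &acc {
                    for right in &next {
                        let mut lit = left.clone();
                        lit.extend_from_slice(right);
                        out.push(lit);
                    }
                }
                out
            })
        }
        // Alternation: union of the children's sequences, earlier branches first.
        Node::Alternation(children) => {
            children.iter().flat_map(prefixes).collect()
        }
    }
}
```

<p>For a tree modelling <code>(a|b)(x|y)</code> this yields <code>ax</code>, <code>ay</code>,
<code>bx</code>, <code>by</code>, in that order. The cross product grows multiplicatively,
which is one reason sequences need to be shrunk as they are built.</p>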
</div></details><h2 id="structs" class="small-section-header"><a href="#structs">Structs</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Extractor.html" title="regex_syntax::hir::literal::Extractor struct">Extractor</a></div><div class="item-right docblock-short">Extracts prefix or suffix literal sequences from <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expressions.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Literal.html" title="regex_syntax::hir::literal::Literal struct">Literal</a></div><div class="item-right docblock-short">A single literal extracted from an <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expression.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Seq.html" title="regex_syntax::hir::literal::Seq struct">Seq</a></div><div class="item-right docblock-short">A sequence of literals.</div></div></div><h2 id="enums" class="small-section-header"><a href="#enums">Enums</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="enum" href="enum.ExtractKind.html" title="regex_syntax::hir::literal::ExtractKind enum">ExtractKind</a></div><div class="item-right docblock-short">The kind of literals to extract from an <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expression.</div></div></div><h2 id="functions" class="small-section-header"><a href="#functions">Functions</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="fn" href="fn.rank.html" title="regex_syntax::hir::literal::rank fn">rank</a></div><div class="item-right docblock-short">Returns the “rank” of the given byte.</div></div></div></section></div></main><div id="rustdoc-vars" data-root-path="../../../" data-current-crate="regex_syntax" data-themes="ayu,dark,light" data-resource-suffix="" 
data-rustdoc-version="1.66.0-nightly (5c8bff74b 2022-10-21)" ></div></body></html>