| <!DOCTYPE html><html lang="en"><head><meta charset="utf-8"><meta name="viewport" content="width=device-width, initial-scale=1.0"><meta name="generator" content="rustdoc"><meta name="description" content="Provides literal extraction from `Hir` expressions."><meta name="keywords" content="rust, rustlang, rust-lang, literal"><title>regex_syntax::hir::literal - Rust</title><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceSerif4-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../FiraSans-Regular.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../FiraSans-Medium.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceCodePro-Regular.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceSerif4-Bold.ttf.woff2"><link rel="preload" as="font" type="font/woff2" crossorigin href="../../../SourceCodePro-Semibold.ttf.woff2"><link rel="stylesheet" href="../../../normalize.css"><link rel="stylesheet" href="../../../rustdoc.css" id="mainThemeStyle"><link rel="stylesheet" href="../../../ayu.css" disabled><link rel="stylesheet" href="../../../dark.css" disabled><link rel="stylesheet" href="../../../light.css" id="themeStyle"><script id="default-settings" ></script><script src="../../../storage.js"></script><script defer src="../../../main.js"></script><noscript><link rel="stylesheet" href="../../../noscript.css"></noscript><link rel="alternate icon" type="image/png" href="../../../favicon-16x16.png"><link rel="alternate icon" type="image/png" href="../../../favicon-32x32.png"><link rel="icon" type="image/svg+xml" href="../../../favicon.svg"></head><body class="rustdoc mod"><!--[if lte IE 11]><div class="warning">This old browser is unsupported and will most likely display funky things.</div><![endif]--><nav class="mobile-topbar"><button class="sidebar-menu-toggle">☰</button><a class="sidebar-logo" 
href="../../../regex_syntax/index.html"><div class="logo-container"><img class="rust-logo" src="../../../rust-logo.svg" alt="logo"></div></a><h2></h2></nav><nav class="sidebar"><a class="sidebar-logo" href="../../../regex_syntax/index.html"><div class="logo-container"><img class="rust-logo" src="../../../rust-logo.svg" alt="logo"></div></a><h2 class="location"><a href="#">Module literal</a></h2><div class="sidebar-elems"><section><ul class="block"><li><a href="#structs">Structs</a></li><li><a href="#enums">Enums</a></li><li><a href="#functions">Functions</a></li></ul></section></div></nav><main><div class="width-limiter"><nav class="sub"><form class="search-form"><div class="search-container"><span></span><input class="search-input" name="search" autocomplete="off" spellcheck="false" placeholder="Click or press ‘S’ to search, ‘?’ for more options…" type="search"><div id="help-button" title="help" tabindex="-1"><a href="../../../help.html">?</a></div><div id="settings-menu" tabindex="-1"><a href="../../../settings.html" title="settings"><img width="22" height="22" alt="Change settings" src="../../../wheel.svg"></a></div></div></form></nav><section id="main-content" class="content"><div class="main-heading"><h1 class="fqn">Module <a href="../../index.html">regex_syntax</a>::<wbr><a href="../index.html">hir</a>::<wbr><a class="mod" href="#">literal</a><button id="copy-path" onclick="copy_path(this)" title="Copy item path to clipboard"><img src="../../../clipboard.svg" width="19" height="18" alt="Copy item path"></button></h1><span class="out-of-band"><a class="srclink" href="../../../src/regex_syntax/hir/literal.rs.html#1-3165">source</a> · <a id="toggle-all-docs" href="javascript:void(0)" title="collapse all docs">[<span class="inner">−</span>]</a></span></div><details class="rustdoc-toggle top-doc" open><summary class="hideme"><span>Expand description</span></summary><div class="docblock"><p>Provides literal extraction from <code>Hir</code> expressions.</p> |
| <p>An <a href="struct.Extractor.html" title="Extractor"><code>Extractor</code></a> pulls literals out of <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expressions and returns a |
| <a href="struct.Seq.html" title="Seq"><code>Seq</code></a> of <a href="struct.Literal.html" title="Literal"><code>Literal</code></a>s.</p> |
| <p>The purpose of literal extraction is generally to provide avenues for |
| optimizing regex searches. The main idea is that substring searches can be an |
| order of magnitude faster than a regex search. Therefore, if one can execute |
| a substring search to find candidate match locations and only run the regex |
| search at those locations, then large performance improvements are |
| possible.</p> |
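<p>The candidate-then-verify idea can be sketched with the standard library alone. Everything named here is hypothetical and is not part of this crate: <code>find_with_prefilter</code> is an illustrative helper, and <code>verify_foo_digits</code> stands in for a regex engine confirming the pattern <code>foo[0-9]+</code> at a candidate offset.</p>

```rust
/// Sketch of a literal prefilter (an illustrative helper, not a crate API).
/// `literal` must be a required prefix of every match; `verify` stands in for
/// the regex engine and returns the match end if a match starts at the offset.
fn find_with_prefilter(
    haystack: &[u8],
    literal: &[u8],
    verify: impl Fn(&[u8], usize) -> Option<usize>,
) -> Option<(usize, usize)> {
    if literal.is_empty() {
        // An empty literal cannot narrow anything down. A real implementation
        // would fall back to the regex engine; this sketch just gives up.
        return None;
    }
    let mut pos = 0;
    while pos < haystack.len() {
        // Fast path: plain substring search for the next candidate.
        let offset = haystack[pos..]
            .windows(literal.len())
            .position(|w| w == literal)?;
        let start = pos + offset;
        // Slow path: only run the expensive check at the candidate.
        if let Some(end) = verify(haystack, start) {
            return Some((start, end));
        }
        // False positive: resume just past the candidate start.
        pos = start + 1;
    }
    None
}

/// Hypothetical stand-in for a regex engine matching `foo[0-9]+` at `start`.
fn verify_foo_digits(haystack: &[u8], start: usize) -> Option<usize> {
    let rest = haystack.get(start..)?;
    if !rest.starts_with(b"foo") {
        return None;
    }
    let digits = rest[3..].iter().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        None
    } else {
        Some(start + 3 + digits)
    }
}
```

<p>On the haystack <code>food foo foo42 bar</code>, the substring search proposes three candidates and the verifier rejects the first two, which is exactly the ping-ponging cost that grows with the number of false positives.</p>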
| <p>With that said, literal optimizations are something of a black art: even |
| though substring search is usually faster, a high number of candidates can |
| create a lot of overhead by ping-ponging between the substring search and |
| the regex search.</p> |
| <p>Here are some heuristics that might be used to help increase the chances of |
| effective literal optimizations:</p> |
| <ul> |
| <li>Stick to small <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>s. If you search for too many literals, it’s likely |
| to lead to a substring search that is only a little faster than a regex search, |
| and thus the overhead of using literal optimizations in the first place might |
| make things slower overall.</li> |
| <li>The literals in your <a href="struct.Seq.html" title="Seq"><code>Seq</code></a> shouldn’t be too short. In general, longer is |
| better. A sequence corresponding to single bytes that occur frequently in the |
| haystack, for example, is probably a bad literal optimization because it’s |
| likely to produce many false positive candidates. Longer literals are less |
| likely to match, and thus probably produce fewer false positives.</li> |
| <li>If it’s possible to estimate the approximate frequency of each byte according |
| to some pre-computed background distribution, it is possible to compute a score |
| of how “good” a <code>Seq</code> is. If a <code>Seq</code> isn’t good enough, you might consider |
| skipping the literal optimization and just use the regex engine.</li> |
| </ul> |
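<p>The three heuristics above could be combined into a single check along the following lines. This is a minimal sketch under loud assumptions: <code>toy_rank</code> is a made-up background distribution (not this module's <code>rank</code> function), a sequence is modelled as a plain slice of byte strings rather than a <code>Seq</code>, and the thresholds are arbitrary parameters.</p>

```rust
/// Toy background frequency (an assumption for illustration only):
/// a higher value means the byte is assumed to be more common in haystacks.
fn toy_rank(b: u8) -> u8 {
    match b {
        b' ' | b'e' | b't' | b'a' | b'o' => 255,
        b'a'..=b'z' => 150,
        b'A'..=b'Z' | b'0'..=b'9' => 80,
        _ => 20,
    }
}

/// Returns true if the sequence looks worth using as a prefilter:
/// - it is non-empty and has at most `max_literals` literals,
/// - every literal has at least `min_len` bytes, and
/// - every literal contains at least one byte whose toy rank is
///   at most `max_rank` (that is, one reasonably rare byte).
fn seq_is_good(seq: &[&[u8]], max_literals: usize, min_len: usize, max_rank: u8) -> bool {
    if seq.is_empty() || seq.len() > max_literals {
        return false;
    }
    seq.iter().all(|lit| {
        let rarest = lit.iter().map(|&b| toy_rank(b)).min();
        lit.len() >= min_len && rarest.map_or(false, |r| r <= max_rank)
    })
}
```

<p>Judging a literal by its rarest byte is only one possible scoring rule; a caller that gets <code>false</code> back would skip the literal optimization and use the regex engine directly.</p>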
| <p>(Note that there are always pathological cases in which any kind of |
| literal optimization makes things slower overall. This is why it |
| might be a good idea to be conservative, or even to provide a means for |
| literal optimizations to be dynamically disabled if they are determined to be |
| ineffective according to some measure.)</p> |
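<p>One way to disable a literal optimization dynamically is to track how often its candidates turn into real matches. The sketch below is hypothetical throughout: <code>PrefilterStats</code>, its thresholds, and the "confirmed percentage" measure are illustrative choices, not something this crate provides.</p>

```rust
/// Tracks prefilter effectiveness (hypothetical type, not a crate API).
#[derive(Debug)]
struct PrefilterStats {
    candidates: u64,
    confirmed: u64,
    enabled: bool,
}

impl PrefilterStats {
    /// Do not judge the prefilter before this many candidates (arbitrary).
    const MIN_SAMPLES: u64 = 50;
    /// Disable once fewer than this percentage of candidates match (arbitrary).
    const MIN_HIT_PERCENT: u64 = 10;

    fn new() -> Self {
        PrefilterStats { candidates: 0, confirmed: 0, enabled: true }
    }

    /// Records one candidate and whether the regex engine confirmed it.
    /// Once the prefilter is disabled, it stays disabled.
    fn record(&mut self, confirmed: bool) {
        self.candidates += 1;
        if confirmed {
            self.confirmed += 1;
        }
        if self.candidates >= Self::MIN_SAMPLES
            && self.confirmed * 100 < self.candidates * Self::MIN_HIT_PERCENT
        {
            self.enabled = false;
        }
    }

    fn is_enabled(&self) -> bool {
        self.enabled
    }
}
```

<p>A search loop would consult <code>is_enabled()</code> before each substring search and fall back to running the regex engine directly once it returns <code>false</code>.</p>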
| <p>You’re encouraged to explore the methods on <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>, which permit shrinking |
| the size of sequences in a preference-order preserving fashion.</p> |
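<p>To give a feel for what shrinking a sequence while preserving preference order involves, here is a standard-library-only sketch. It does not use or reproduce the <code>Seq</code> API: <code>Lit</code>, <code>lit</code>, <code>keep_first_bytes</code> and <code>dedup_adjacent</code> are illustrative names defined here, and treating a truncated literal as "inexact" is an assumption of the sketch.</p>

```rust
/// Illustrative literal: its bytes, plus whether it is still "exact"
/// (that is, it has not been truncated). Not the crate's `Literal` type.
#[derive(Clone, Debug, PartialEq, Eq)]
struct Lit {
    bytes: Vec<u8>,
    exact: bool,
}

/// Convenience constructor used in the examples.
fn lit(s: &str, exact: bool) -> Lit {
    Lit { bytes: s.as_bytes().to_vec(), exact }
}

/// Truncates every literal to its first `n` bytes. Truncated literals no
/// longer describe a whole match, so they are marked inexact.
fn keep_first_bytes(seq: &mut Vec<Lit>, n: usize) {
    for l in seq.iter_mut() {
        if l.bytes.len() > n {
            l.bytes.truncate(n);
            l.exact = false;
        }
    }
}

/// Merges adjacent literals with identical bytes. Only neighbours are
/// merged, so the relative order of the surviving literals is unchanged.
/// The survivor stays exact only if both merged literals were exact.
fn dedup_adjacent(seq: &mut Vec<Lit>) {
    seq.dedup_by(|later, earlier| {
        if later.bytes == earlier.bytes {
            earlier.exact = earlier.exact && later.exact;
            true // drop `later`, keep `earlier`
        } else {
            false
        }
    });
}
```

<p>For instance, truncating <code>foobar</code>, <code>foobaz</code>, <code>foo</code>, <code>quux</code> to three bytes and then merging neighbours leaves two inexact literals, <code>foo</code> and <code>quu</code>, in their original order.</p>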
| <p>Finally, note that it isn’t strictly necessary to use an <a href="struct.Extractor.html" title="Extractor"><code>Extractor</code></a>. Namely, |
| an <code>Extractor</code> only uses public APIs of the <a href="struct.Seq.html" title="Seq"><code>Seq</code></a> and <a href="struct.Literal.html" title="Literal"><code>Literal</code></a> types, |
| so it is possible to implement your own extractor, for example one for n-grams |
| or “inner” literals (i.e., not prefix or suffix literals). The <code>Extractor</code> |
| is mostly responsible for the case analysis over <code>Hir</code> expressions. Most of |
| the “trickier” parts concern how to combine literal sequences, and that is all |
| implemented on <a href="struct.Seq.html" title="Seq"><code>Seq</code></a>.</p> |
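<p>As a rough illustration of the n-gram case, the core step of a custom extractor might look like the sketch below. It works on a plain byte string rather than on <code>Hir</code>, and <code>ngrams</code> is a hypothetical helper, not a function in this module.</p>

```rust
/// Returns the distinct `n`-byte windows of `literal`, in order of first
/// occurrence. Hypothetical helper for illustration; a real extractor would
/// walk an `Hir` and combine the results of its sub-expressions.
fn ngrams(literal: &[u8], n: usize) -> Vec<Vec<u8>> {
    let mut out: Vec<Vec<u8>> = Vec::new();
    if n == 0 {
        // `windows(0)` would panic, and 0-grams carry no information anyway.
        return out;
    }
    for w in literal.windows(n) {
        // Linear scan keeps the sketch simple; fine for short literals.
        if !out.iter().any(|g| g.as_slice() == w) {
            out.push(w.to_vec());
        }
    }
    out
}
```

<p>Any haystack region that contains the literal must contain every one of its n-grams, which is what makes them usable as a filter, for example in an n-gram index.</p>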
| </div></details><h2 id="structs" class="small-section-header"><a href="#structs">Structs</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Extractor.html" title="regex_syntax::hir::literal::Extractor struct">Extractor</a></div><div class="item-right docblock-short">Extracts prefix or suffix literal sequences from <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expressions.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Literal.html" title="regex_syntax::hir::literal::Literal struct">Literal</a></div><div class="item-right docblock-short">A single literal extracted from an <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expression.</div></div><div class="item-row"><div class="item-left module-item"><a class="struct" href="struct.Seq.html" title="regex_syntax::hir::literal::Seq struct">Seq</a></div><div class="item-right docblock-short">A sequence of literals.</div></div></div><h2 id="enums" class="small-section-header"><a href="#enums">Enums</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="enum" href="enum.ExtractKind.html" title="regex_syntax::hir::literal::ExtractKind enum">ExtractKind</a></div><div class="item-right docblock-short">The kind of literals to extract from an <a href="../struct.Hir.html" title="Hir"><code>Hir</code></a> expression.</div></div></div><h2 id="functions" class="small-section-header"><a href="#functions">Functions</a></h2><div class="item-table"><div class="item-row"><div class="item-left module-item"><a class="fn" href="fn.rank.html" title="regex_syntax::hir::literal::rank fn">rank</a></div><div class="item-right docblock-short">Returns the “rank” of the given byte.</div></div></div></section></div></main><div id="rustdoc-vars" data-root-path="../../../" data-current-crate="regex_syntax" data-themes="ayu,dark,light" data-resource-suffix="" 
data-rustdoc-version="1.66.0-nightly (5c8bff74b 2022-10-21)" ></div></body></html> |