| <?xml version="1.0"?> |
| <!-- |
| ~ Licensed to the Apache Software Foundation (ASF) under one or more |
| ~ contributor license agreements. See the NOTICE file distributed with |
| ~ this work for additional information regarding copyright ownership. |
| ~ The ASF licenses this file to you under the Apache License, Version 2.0 |
| ~ (the "License"); you may not use this file except in compliance with |
| ~ the License. You may obtain a copy of the License at |
| ~ |
| ~ http://www.apache.org/licenses/LICENSE-2.0 |
| ~ |
| ~ Unless required by applicable law or agreed to in writing, software |
| ~ distributed under the License is distributed on an "AS IS" BASIS, |
| ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| ~ See the License for the specific language governing permissions and |
| ~ limitations under the License. |
| --> |
| |
| <!DOCTYPE document |
| [ |
| <!ENTITY sect-num '21'> |
| ]> |
| <document prev="functions.html" next="hints_and_tips.html" id="$Id$"> |
| |
| <properties> |
| <title>User's Manual: Regular Expressions</title> |
| </properties> |
| |
| <body> |
| |
| <section name="&sect-num;. Regular Expressions" anchor="regex"> |
| <subsection name="&sect-num;.1 Overview" anchor="overview"> |
| <p> |
| JMeter includes the pattern matching software <a href="http://attic.apache.org/projects/jakarta-oro.html">Apache Jakarta ORO</a>. |
| <br/> |
| There is some documentation for this on the Jakarta website, for example |
| <a href="http://archimedes.fas.harvard.edu/scrapbook/jakarta-oro-2.0.6/docs/api/org/apache/oro/text/regex/package-summary.html"> |
| a summary of the pattern matching characters</a>. |
| </p> |
| <p> |
| There is also documentation on an older incarnation of the product at |
| <a href="http://www.savarese.org/oro/docs/OROMatcher/index.html">OROMatcher User's guide</a>, which might prove useful. |
| </p> |
| <note>As of JMeter 5.5, the regex implementation can be switched from ORO to the JDK-based one by setting |
| the JMeter property <code>jmeter.regex.engine</code> to any value other than <code>oro</code>.</note> |
| <p> |
| The pattern matching is very similar to the pattern matching in Perl. |
| A full installation of Perl will include plenty of documentation on regular expressions - look for <code>perlrequick</code>, |
| <code>perlretut</code>, <code>perlre</code> and <code>perlreref</code>. |
| </p> |
| <p> |
| It is worth stressing the difference between "<em>contains</em>" and "<em>matches</em>", as used on the Response Assertion test element: |
| </p> |
| <dl> |
| <dt>"<em>contains</em>"</dt><dd> means that the regular expression matched at least some part of the target, |
| so '<code>alphabet</code>' "<em>contains</em>" '<code>ph.b.</code>' because the regular expression matches the substring '<code>phabe</code>'. |
| </dd> |
| <dt> |
| "<em>matches</em>"</dt><dd> means that the regular expression matched the whole target. |
| So '<code>alphabet</code>' is "<em>matched</em>" by '<code>al.*t</code>'. |
| </dd> |
| </dl> |
| <p>In this case, it is equivalent to wrapping the regular expression in <code>^</code> and <code>$</code>, viz '<code>^al.*t$</code>'. |
| </p> |
| <p>However, this is not always the case. |
| For example, the regular expression '<code>alp|.lp.*</code>' is "<em>contained</em>" in '<code>alphabet</code>', |
| but does not "<em>match</em>" '<code>alphabet</code>'. |
| </p> |
| <p>Why? Because when the pattern matcher finds the sequence '<code>alp</code>' in '<code>alphabet</code>', it stops trying any other |
| combinations - and '<code>alp</code>' is not the same as '<code>alphabet</code>', as it does not include '<code>habet</code>'. |
| </p> |
| <note> |
| Unlike Perl, there is no need to (i.e. do not) enclose the regular expression in <code>//</code>. |
| </note> |
| <p> |
| So how does one use the modifiers <code>ismx</code> etc. if there is no trailing <code>/</code>? |
| The solution is to use <i>extended regular expressions</i>, i.e. <code>/abc/i</code> becomes <code>(?i)abc</code>. |
| See also <a href="#placement">Placement of modifiers</a> below. |
| </p> |
| </subsection> |
| <subsection name="&sect-num;.2 Examples" anchor="examples"> |
| <h3>Extract single string</h3> |
| <p> |
| Suppose you want to match the following portion of a web-page: |
| <br/> |
| <code>name="file" value="readme.txt"></code> |
| <br/> |
| and you want to extract <code>readme.txt</code>. |
| <br/> |
| A suitable regular expression would be: |
| <br/> |
| <code>name="file" value="(.+?)"></code> |
| <p> |
| The special characters above are: |
| </p> |
| <dl> |
| <dt><code>(</code> and <code>)</code></dt><dd>these enclose the portion of the match string to be returned</dd> |
| <dt><code>.</code></dt><dd>match any character</dd> |
| <dt><code>+</code></dt><dd>one or more times</dd> |
| <dt><code>?</code></dt><dd>don't be greedy, i.e. stop when first match succeeds</dd> |
| </dl> |
| <p> |
| Note: without the <code>?</code>, the <code>.+</code> would continue past the first <code>"></code> |
| until it found the last possible <code>"></code> - which is probably not what was intended. |
| </p> |
| <p> |
| Note: although the above expression works, it's more efficient to use the following expression: |
| <br/> |
| <code>name="file" value="([^"]+)"></code> |
| <br/> |
| where <code>[^"]</code> means match anything except <code>"</code>. |
| <br/> |
| In this case, the matching engine can stop looking as soon as it sees the first <code>"</code>, |
| whereas in the previous case the engine has to check that it has found <code>"></code> rather than, say, <code>" ></code>. |
| </p> |
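| <p> |
| The three variants discussed above can be compared side by side. This is a minimal sketch using Python's |
| <code>re</code> module as a stand-in engine (not ORO); the <code>PAGE</code> text, with a second attribute pair |
| appended, is made up for illustration. |
| </p> |
| <source> |
```python
import re

# Hypothetical page fragment: a second attribute pair follows the one we want.
PAGE = 'name="file" value="readme.txt"> name="other" value="notes.txt">'

lazy = re.search(r'name="file" value="(.+?)">', PAGE)
greedy = re.search(r'name="file" value="(.+)">', PAGE)
negated = re.search(r'name="file" value="([^"]+)">', PAGE)

print(lazy.group(1))     # readme.txt
print(greedy.group(1))   # runs on to the last '">' in the text
print(negated.group(1))  # readme.txt
```
| </source> |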
| <h3>Extract multiple strings</h3> |
| <p> |
| Suppose you want to match the following portion of a web-page:<br/> |
| <code>name="file.name" value="readme.txt"</code> |
| <br/> |
| and you want to extract both <code>file.name</code> and <code>readme.txt</code>. |
| <br/> |
| A suitable regular expression would be: |
| <br/> |
| <code>name="([^"]+)" value="([^"]+)"</code> |
| <br/> |
| This would create 2 groups, which could be used in the JMeter Regular Expression Extractor template as <code>$1$</code> and <code>$2$</code>. |
| </p> |
| <p> |
| The JMeter Regular Expression Extractor saves the values of the groups in additional variables. |
| </p> |
| <p> |
| For example, assume: |
| </p> |
| <ul> |
| <li>Reference Name: <code>MYREF</code></li> |
| <li>Regex: <code>name="(.+?)" value="(.+?)"</code></li> |
| <li>Template: <code>$1$$2$</code></li> |
| </ul> |
| <note>Do not enclose the regular expression in <code>/ /</code></note> |
| <p> |
| The following variables would be set: |
| </p> |
| <dl> |
| <dt><code>MYREF</code></dt><dd><code>file.namereadme.txt</code></dd> |
| <dt><code>MYREF_g0</code></dt><dd><code>name="file.name" value="readme.txt"</code></dd> |
| <dt><code>MYREF_g1</code></dt><dd><code>file.name</code></dd> |
| <dt><code>MYREF_g2</code></dt><dd><code>readme.txt</code></dd> |
| </dl> |
| These variables can be referred to later on in the JMeter test plan, as <code>${MYREF}</code>, <code>${MYREF_g1}</code> etc. |
| </p> |
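| <p> |
| How the variables listed above relate to a single match can be sketched as follows. This is an illustration in |
| Python, not the extractor's actual implementation; the function <code>extract</code> and its parameters are |
| hypothetical, and only the variables named above are modelled. |
| </p> |
| <source> |
```python
import re


def extract(ref_name, regex, template, text):
    """Build the reference variable and the _gN group variables for the first match."""
    match = re.search(regex, text)
    if match is None:
        return {}
    # Replace each $n$ placeholder in the template with the text of group n.
    value = re.sub(r"\$(\d+)\$", lambda m: match.group(int(m.group(1))), template)
    variables = {ref_name: value}
    # Group 0 is the whole match; groups 1..n are the parenthesised parts.
    for index in range(match.re.groups + 1):
        variables[f"{ref_name}_g{index}"] = match.group(index)
    return variables


result = extract(
    "MYREF",
    r'name="(.+?)" value="(.+?)"',
    "$1$$2$",
    'name="file.name" value="readme.txt"',
)
for name, value in result.items():
    print(name, "=", value)
```
| </source> |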
| </subsection> |
| <subsection name="&sect-num;.3 Line mode" anchor="line_mode"> |
| <p>Pattern matching behaves slightly differently |
| depending on the setting of the multi-line and single-line modifiers. |
| Note that the single-line and multi-line modifiers have nothing to do with each other; |
| they can be specified independently. |
| </p> |
| <h3>Single-line mode</h3> |
| <p> |
| Single-line mode only affects how the '<code>.</code>' meta-character is interpreted. |
| </p> |
| <p> |
| Default behaviour is that '<code>.</code>' matches any character except newline. |
| In single-line mode, '<code>.</code>' also matches newline. |
| </p> |
| |
| <h3>Multi-line mode</h3> |
| <p> |
| Multi-line mode only affects how the meta-characters '<code>^</code>' and '<code>$</code>' are interpreted. |
| </p> |
| <p> |
| Default behaviour is that '<code>^</code>' and '<code>$</code>' only match at the very beginning and end of the string. |
| When Multi-line mode is used, the '<code>^</code>' metacharacter matches at the beginning of every line, |
| and the '<code>$</code>' metacharacter matches at the end of every line.</p> |
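| <p> |
| The effect of the two modifiers can be seen in a short sketch. It assumes Python's <code>re</code> module as a |
| stand-in for a Perl-style engine (not ORO), and the two-line <code>TEXT</code> value is made up. |
| </p> |
| <source> |
```python
import re

TEXT = "first line\nsecond line"

# Default: '.' stops at the newline, so the match cannot cross lines.
default_dot = re.search(r"first.*second", TEXT)
print(default_dot)                      # None
# Single-line mode (?s): '.' also matches the newline.
single_line = re.search(r"(?s)first.*second", TEXT)
print(repr(single_line.group(0)))       # 'first line\nsecond'

# Default: '^' only matches at the very beginning of the string.
default_caret = re.findall(r"^\w+", TEXT)
print(default_caret)                    # ['first']
# Multi-line mode (?m): '^' matches at the beginning of every line.
multi_line = re.findall(r"(?m)^\w+", TEXT)
print(multi_line)                       # ['first', 'second']

# The two modifiers are independent and can be combined.
both = re.findall(r"(?ms)^\w+ line$", TEXT)
print(both)                             # ['first line', 'second line']
```
| </source> |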
| |
| </subsection> |
| |
| <subsection name="&sect-num;.4 Meta characters" anchor="meta_chars"> |
| <p> |
| Regular expressions use certain characters as meta characters - these characters have a special meaning to the RE engine. |
| Such characters must be escaped by preceding them with <code>\</code> (backslash) in order to treat them as ordinary characters. |
| Here is a list of the meta characters and their meaning (please check the ORO documentation if in doubt). |
| </p> |
| <dl> |
| <dt><code>(</code> and <code>)</code></dt><dd>grouping</dd> |
| <dt><code>[</code> and <code>]</code></dt><dd>character classes</dd> |
| <dt><code>{</code> and <code>}</code></dt><dd>repetition</dd> |
| <dt><code>*</code>, <code>+</code> and <code>?</code></dt><dd>repetition</dd> |
| <dt><code>.</code></dt><dd>wild-card character</dd> |
| <dt><code>\</code></dt><dd>escape character</dd> |
| <dt><code>|</code></dt><dd>alternatives</dd> |
| <dt><code>^</code> and <code>$</code></dt><dd>start and end of string or line</dd> |
| </dl> |
| <note> |
| Please note that ORO does not support the <code>\Q</code> and <code>\E</code> meta-characters. |
| [In other RE engines, these can be used to quote a portion of an RE so that the meta-characters stand for themselves.] |
| You can use a function to do the equivalent; see <a href="functions.html#__escapeOroRegexpChars">${__escapeOroRegexpChars(valueToEscape)}</a>. |
| </note> |
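| <p> |
| Why escaping matters can be illustrated with a short sketch. It assumes Python's <code>re</code> module as a |
| stand-in engine; <code>re.escape</code> plays a role comparable to the JMeter function mentioned above but is a |
| different implementation, and the sample text is made up. |
| </p> |
| <source> |
```python
import re

LITERAL = "price (USD) is $5.00?"

# Used as-is, the parentheses, '$', '.' and '?' act as meta characters,
# so the pattern no longer matches its own text.
unescaped = re.search(LITERAL, LITERAL)
print(unescaped)                      # None

# After escaping, every meta character stands for itself.
escaped_pattern = re.escape(LITERAL)
escaped = re.fullmatch(escaped_pattern, LITERAL)
print(escaped is not None)            # True
```
| </source> |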
| <p> |
| The following Perl5 extended regular expressions are supported by ORO. |
| |
| <dl> |
| <dt><code>(?#text)</code></dt> |
| <dd>An embedded comment causing text to be ignored.</dd> |
| <dt><code>(?:regexp)</code></dt> |
| <dd>Groups things like "<code>()</code>" but doesn't cause the group match to be saved.</dd> |
| <dt><code>(?=regexp)</code></dt> |
| <dd>A zero-width positive lookahead assertion. For example, <code>\w+(?=\s)</code> matches a word followed by whitespace, without including whitespace in the MatchResult.</dd> |
| <dt><code>(?!regexp)</code></dt> |
| <dd>A zero-width negative lookahead assertion. For example <code>foo(?!bar)</code> matches any occurrence of "<code>foo</code>" that |
| isn't followed by "<code>bar</code>". Remember that this is a zero-width assertion, which means that <code>a(?!b)d</code> will |
| match <code>ad</code> because <code>a</code> is followed by a character that is not <code>b</code> (the <code>d</code>) and a <code>d</code> |
| follows the zero-width assertion.</dd> |
| <dt><code>(?imsx)</code></dt> |
| <dd>One or more embedded pattern-match modifiers. <code>i</code> enables case insensitivity, <code>m</code> enables multiline treatment |
| of the input, <code>s</code> enables single line treatment of the input, and <code>x</code> enables extended whitespace comments.</dd> |
| </dl> |
| <b>Note that <code>(?&lt;=regexp)</code> - lookbehind - is not supported.</b> |
| </p> |
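| <p> |
| The constructs listed above can be tried out with a few one-liners. This sketch assumes Python's <code>re</code> |
| module as a stand-in engine (not ORO); the sample strings other than those quoted above are made up. |
| </p> |
| <source> |
```python
import re

# (?=...) positive lookahead: the whitespace is required but not consumed.
words = re.findall(r"\w+(?=\s)", "one two three")
print(words)                                   # ['one', 'two']

# (?!...) negative lookahead: 'foo' that is not followed by 'bar'.
starts = [m.start() for m in re.finditer(r"foo(?!bar)", "foobar foobaz")]
print(starts)                                  # [7]

# Zero-width: 'a' must not be followed by 'b', and 'd' must follow directly.
zero_width = re.fullmatch(r"a(?!b)d", "ad")
print(zero_width is not None)                  # True

# (?:...) groups without saving; only the parenthesised 'c' is captured.
captured = re.search(r"(?:ab)+(c)", "ababc").groups()
print(captured)                                # ('c',)

# (?#...) is an embedded comment and is ignored by the engine.
commented = re.fullmatch(r"ab(?#ignored)c", "abc")
print(commented is not None)                   # True
```
| </source> |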
| |
| </subsection> |
| |
| <subsection name="&sect-num;.5 Placement of modifiers" anchor="placement"> |
| <p> |
| Modifiers can be placed anywhere in the regex, and apply from that point onwards. |
| [A bug in ORO means that they cannot be used at the very end of the regex. |
| However they would have no effect there anyway.] |
| </p> |
| <p> |
| The single-line <code>(?s)</code> and multi-line <code>(?m)</code> modifiers are normally placed at the start of the regex. |
| </p> |
| <p> |
| The ignore-case modifier <code>(?i)</code> may be usefully applied to just part of a regex, |
| for example:</p> |
| <source> |
| Match ExAct case or (?i)ArBiTrARY(?-i) case |
| </source> |
| <p>This would match <code>Match ExAct case or arbitrary case</code> as well as <code>Match ExAct case or ARBitrary case</code>, but not <code>Match exact case or ArBiTrARY case</code>.</p> |
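| <p> |
| The three cases above can be checked with a short sketch. It assumes Python's <code>re</code> module as a stand-in |
| engine; Python has no mid-pattern <code>(?i)</code> ... <code>(?-i)</code> toggle, so the sketch swaps in the scoped |
| form <code>(?i:...)</code>, which limits case-insensitivity to the enclosed part in the same way. |
| </p> |
| <source> |
```python
import re

# Scoped modifier: only 'ArBiTrARY' is matched case-insensitively.
PATTERN = r"Match ExAct case or (?i:ArBiTrARY) case"

CANDIDATES = (
    "Match ExAct case or arbitrary case",
    "Match ExAct case or ARBitrary case",
    "Match exact case or ArBiTrARY case",
)

results = [re.fullmatch(PATTERN, text) is not None for text in CANDIDATES]
for text, matched in zip(CANDIDATES, results):
    print(text, "->", matched)
```
| </source> |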
| </subsection> |
| </section> |
| |
| <section name="&sect-num;.6 Testing Regular Expressions" anchor="testing_expressions"> |
| <p> |
| Since JMeter 2.4, the listener <a href="component_reference.html#View_Results_Tree">View Results Tree</a> |
| includes a RegExp Tester to test regular expressions directly on sampler response data. |
| </p> |
| <p> |
| There is also a <a href="http://www.regexplanet.com/advanced/java/index.html">website</a> for testing Java regular expressions. |
| </p> |
| <p> |
| Another approach is to use a simple test plan to test the regular expressions. |
| The Java Request sampler can be used to generate a sample, or the HTTP Sampler can be used to load a file. |
| Add a Debug Sampler and a View Results Tree listener, and changes to the regular expression can be tested quickly, |
| without needing to access any external servers. |
| </p> |
| </section> |
| |
| </body> |
| </document> |