| <?xml version="1.0"?> |
| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one or more |
| contributor license agreements. See the NOTICE file distributed with |
| this work for additional information regarding copyright ownership. |
| The ASF licenses this file to You under the Apache License, Version 2.0 |
| (the "License"); you may not use this file except in compliance with |
| the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, software |
| distributed under the License is distributed on an "AS IS" BASIS, |
| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| See the License for the specific language governing permissions and |
| limitations under the License. |
| --> |
| <!DOCTYPE document |
| [ |
| <!ENTITY sect-num '20'> |
| ]> |
| <document prev="functions.html" next="hints_and_tips.html" id="$Id$"> |
| |
| <properties> |
| <title>User's Manual: Regular Expressions</title> |
| </properties> |
| |
| <body> |
| |
| <section name="§-num;. Regular Expressions" anchor="regex"> |
| <subsection name="§-num;.1 Overview" anchor="overview"> |
| <p> |
| JMeter includes the pattern matching software <a href="http://jakarta.apache.org/oro/">Apache Jakarta ORO</a> |
| <br/> |
| There is some documentation for this on the Jakarta web-site, for example |
| <a href="http://jakarta.apache.org/oro/api/org/apache/oro/text/regex/package-summary.html"> |
| a summary of the pattern matching characters</a> |
| </p> |
| <p> |
| There is also documentation on an older incarnation of the product at |
| <a href="http://www.savarese.org/oro/docs/OROMatcher/index.html">OROMatcher User's guide</a>, which might prove useful. |
| </p> |
| <p> |
| The pattern matching is very similar to the pattern matching in Perl. |
| A full installation of Perl will include plenty of documentation on regular expressions - look for perlrequick, perlretut, perlre, perlreref. |
| </p> |
| <p> |
| It is worth stressing the difference between "contains" and "matches", as used on the Response Assertion test element: |
| </p> |
| <ul> |
| <li> |
| "contains" means that the regular expression matched at least some part of the target, |
| so 'alphabet' "contains" 'ph.b.' because the regular expression matches the substring 'phabe'. |
| </li> |
| <li> |
| "matches" means that the regular expression matched the whole target. |
| So 'alphabet' is "matched" by 'al.*t'. |
| </li> |
| </ul> |
| <p>In this case, it is equivalent to wrapping the regular expression in ^ and $, viz '^al.*t$'. |
| </p> |
| <p>However, this is not always the case. |
| For example, the regular expression 'alp|.lp.*' is "contained" in 'alphabet', but does not match 'alphabet'. |
| </p> |
| <p>Why? Because when the pattern matcher finds the sequence 'alp' in 'alphabet', it stops trying any other combinations - and 'alp' is not the same as 'alphabet', as it does not include 'habet'. |
| </p> |
| <p> |
| Note: unlike Perl, there is no need to (i.e. do not) enclose the regular expression in //. |
| </p> |
| <p> |
| So how does one use the modifiers ismx etc if there is no trailing /? |
| The solution is to use <i>extended regular expressions</i>, i.e. /abc/i becomes (?i)abc. |
| See also <a href="#placement">Placement of modifiers</a> below. |
| </p> |
| </subsection> |
| <subsection name="§-num;.2 Examples" anchor="examples"> |
| <h3>Extract single string</h3> |
| <p> |
| Suppose you want to match the following portion of a web-page: |
| <br/> |
| <code>name="file" value="readme.txt"></code> |
| <br/> |
| and you want to extract <code>readme.txt</code>. |
| <br/> |
| A suitable regular expression would be: |
| <br/> |
| <code>name="file" value="(.+?)"></code> |
| <p> |
| The special characters above are: |
| </p> |
| <ul> |
| <li>( and ) - these enclose the portion of the match string to be returned</li> |
| <li>. - match any character</li> |
| <li>+ - one or more times</li> |
| <li>? - don't be greedy, i.e. stop when first match succeeds</li> |
| </ul> |
| <p> |
| Note: without the ?, the .+ would continue past the first <code>"></code> |
| until it found the last possible <code>"></code> - which is probably not what was intended. |
| </p> |
| <p> |
| Note: although the above expression works, it's more efficient to use the following expression: |
| <br/> |
| <code>name="file" value="([^"]+)"></code> |
| where<br></br> |
| [^"] - means match anything except "<br></br> |
| In this case, the matching engine can stop looking as soon as it sees the first <code>"</code>, |
| whereas in the previous case the engine has to check that it has found <code>"></code> rather than say <code>" ></code>. |
| </p> |
| <h3>Extract multiple strings</h3> |
| <p> |
| Suppose you want to match the following portion of a web-page:<br/> |
| <code>name="file.name" value="readme.txt"</code> |
| and you want to extract both <code>file.name</code> and <code>readme.txt</code>. |
| <br/> |
| A suitable reqular expression would be: |
| <br/> |
| <code>name="([^"]+)" value="([^"]+)"</code> |
| <br/> |
| This would create 2 groups, which could be used in the JMeter Regular Expression Extractor template as $1$ and $2$. |
| </p> |
| <p> |
| The JMeter Regex Extractor saves the values of the groups in additional variables. |
| </p> |
| <p> |
| For example, assume: |
| </p> |
| <ul> |
| <li>Reference Name: MYREF</li> |
| <li>Regex: name="(.+?)" value="(.+?)"</li> |
| <li>Template: $1$$2$</li> |
| </ul> |
| <note>Do not enclose the regular expression in / /</note> |
| <p> |
| The following variables would be set: |
| </p> |
| <ul> |
| <li>MYREF: file.namereadme.txt</li> |
| <li>MYREF_g0: name="file.name" value="readme.txt"</li> |
| <li>MYREF_g1: file.name</li> |
| <li>MYREF_g2: readme.txt</li> |
| </ul> |
| These variables can be referred to later on in the JMeter test plan, as ${MYREF}, ${MYREF_g1} etc |
| </p> |
| </subsection> |
| <subsection name="§-num;.3 Line mode" anchor="line_mode"> |
| <p>The pattern matching behaves in various slightly different ways, |
| depending on the setting of the multi-line and single-line modifiers. |
| Note that the single-line and multi-line operators have nothing to do with each other; |
| they can be specified independently. |
| </p> |
| <h3>Single-line mode</h3> |
| <p> |
| Single-line mode only affects how the '.' meta-character is interpreted. |
| </p> |
| <p> |
| Default behaviour is that '.' matches any character except newline. |
| In single-line mode, '.' also matches newline. |
| </p> |
| |
| <h3>Multi-line mode</h3> |
| <p> |
| Multi-line mode only affects how the meta-characters '^' and '$' are interpreted. |
| </p> |
| <p> |
| Default behaviour is that '^' and '$' only match at the very beginning and end of the string. |
| When Multi-line mode is used, the '^' metacharacter matches at the beginning of every line, |
| and the '$' metacharacter matches at the end of every line.</p> |
| |
| </subsection> |
| |
| <subsection name="§-num;.4 Meta characters" anchor="meta_chars"> |
| <p> |
| Regular expressions use certain characters as meta characters - these characters have a special meaning to the RE engine. |
| Such characters must be escaped by preceeding them with \ (backslash) in order to treat them as ordinary characters. |
| Here is a list of the meta characters and their meaning (please check the ORO documentation if in doubt). |
| </p> |
| <ul> |
| <li>( ) - grouping</li> |
| <li>[ ] - character classes</li> |
| <li>{ } - repetition</li> |
| <li>* + ? - repetition</li> |
| <li>. - wild-card character</li> |
| <li>\ - escape character</li> |
| <li>| - alternatives</li> |
| <li>^ $ - start and end of string or line</li> |
| </ul> |
| <note> |
| <p>Please note that ORO does not support the \Q and \E meta-characters. |
| [In other RE engines, these can be used to quote a portion of an RE so that the meta-characters stand for themselves.] |
| You can use function to do the equivalent, see <a href="functions.html#__escapeOroRegexpChars">${__escapeOroRegexpChars(valueToEscape)}</a>. |
| </p> |
| </note> |
| <p> |
| The following Perl5 extended regular expressions are supported by ORO. |
| |
| <dl> |
| <dt>(?#text)</dt> |
| <dd>An embedded comment causing text to be ignored.</dd> |
| <dt>(?:regexp)</dt> |
| <dd>Groups things like "()" but doesn't cause the group match to be saved.</dd> |
| <dt>(?=regexp)</dt> |
| <dd>A zero-width positive lookahead assertion. For example, \w+(?=\s) matches a word followed by whitespace, without including whitespace in the MatchResult.</dd> |
| <dt>(?!regexp)</dt> |
| <dd>A zero-width negative lookahead assertion. For example foo(?!bar) matches any occurrence of "foo" that isn't followed by "bar". Remember that this is a zero-width assertion, which means that a(?!b)d will match ad because a is followed by a character that is not b (the d) and a d follows the zero-width assertion.</dd> |
| <dt>(?imsx)</dt> |
| <dd>One or more embedded pattern-match modifiers. i enables case insensitivity, m enables multiline treatment of the input, s enables single line treatment of the input, and x enables extended whitespace comments.</dd> |
| </dl> |
| <b>Note that <code>(?<=regexp)</code> - lookbehind - is not supported.</b> |
| </p> |
| |
| </subsection> |
| |
| <subsection name="§-num;.5 Placement of modifiers" anchor="placement"> |
| <p> |
| Modifiers can be placed anywhere in the regex, and apply from that point onwards. |
| [A bug in ORO means that they cannot be used at the very end of the regex. |
| However they would have no effect there anyway.] |
| </p> |
| <p> |
| The single-line (?s) and multi-line (?m) modifiers are normally placed at the start of the regex. |
| </p> |
| <p> |
| The ignore-case modifier (?i) may be usefully applied to just part of a regex, |
| for example: |
| <pre> |
| Match ExAct case or (?i)ArBiTrARY(?-i) case |
| </pre> |
| </p> |
| </subsection> |
| </section> |
| |
| <section name="§-num;.6 Testing Regular Expressions" anchor="testing_expressions"> |
| <p> |
| Since JMeter 2.4, the listener <a href="component_reference.html#View_Results_Tree">View Results Tree</a> |
| include a RegExp Tester to test regular expressions directly on sampler response data. |
| </p> |
| <p> |
| There is a <a href="http://jakarta.apache.org/oro/demo.html">demo</a> applet for Apache JMeter ORO. |
| </p> |
| <p> |
| Another approach is to use a simple test plan to test the regular expressions. |
| The Java Request sampler can be used to generate a sample, or the HTTP Sampler can be used to load a file. |
| Add a Debug Sampler and a Tree View Listener and changes to the regular expression can be tested quickly, |
| without needing to access any external servers. |
| </p> |
| </section> |
| |
| </body> |
| </document> |