You can use the following string functions in Drill queries:
Function | Return Type |
---|---|
[BYTE_SUBSTR]({{ site.baseurl }}/docs/string-manipulation/#byte_substr) | BINARY or VARCHAR |
[CHAR_LENGTH]({{ site.baseurl }}/docs/string-manipulation/#char_length) | INTEGER |
[CONCAT]({{ site.baseurl }}/docs/string-manipulation/#concat) | VARCHAR |
[ILIKE]({{ site.baseurl }}/docs/string-manipulation/#ilike) | BOOLEAN |
[INITCAP]({{ site.baseurl }}/docs/string-manipulation/#initcap) | VARCHAR |
[ISNUMERIC]({{ site.baseurl }}/docs/string-manipulation/#isnumeric) | INTEGER |
[LENGTH]({{ site.baseurl }}/docs/string-manipulation/#length) | INTEGER |
[LIKE]({{ site.baseurl }}/docs/string-manipulation/#like) | BOOLEAN |
[LOWER]({{ site.baseurl }}/docs/string-manipulation/#lower) | VARCHAR |
[LPAD]({{ site.baseurl }}/docs/string-manipulation/#lpad) | VARCHAR |
[LTRIM]({{ site.baseurl }}/docs/string-manipulation/#ltrim) | VARCHAR |
[POSITION]({{ site.baseurl }}/docs/string-manipulation/#position) | INTEGER |
[REGEXP_MATCHES]({{ site.baseurl }}/docs/string-manipulation/#regexp_matches) | BOOLEAN |
[REGEXP_REPLACE]({{ site.baseurl }}/docs/string-manipulation/#regexp_replace) | VARCHAR |
[REPEAT]({{ site.baseurl }}/docs/string-manipulation/#repeat) | VARCHAR |
[REPLACE]({{ site.baseurl }}/docs/string-manipulation/#replace) | VARCHAR |
[REVERSE]({{ site.baseurl }}/docs/string-manipulation/#reverse) | VARCHAR |
[RPAD]({{ site.baseurl }}/docs/string-manipulation/#rpad) | VARCHAR |
[RTRIM]({{ site.baseurl }}/docs/string-manipulation/#rtrim) | VARCHAR |
[SPLIT]({{ site.baseurl }}/docs/string-manipulation/#split) | INTEGER |
[SPLIT_PART]({{ site.baseurl }}/docs/string-manipulation/#split_part) | VARCHAR |
[STRPOS]({{ site.baseurl }}/docs/string-manipulation/#strpos) | INTEGER |
[SUBSTR]({{ site.baseurl }}/docs/string-manipulation/#substr) | VARCHAR |
[TOASCII]({{ site.baseurl }}/docs/string-manipulation/#toascii) | VARCHAR |
[TRIM]({{ site.baseurl }}/docs/string-manipulation/#trim) | VARCHAR |
[UPPER]({{ site.baseurl }}/docs/string-manipulation/#upper) | VARCHAR |
{% include startnote.html %} As is the case for indexes througout SQL, indexes into character strings are 1-based. {% include endnote.html %}
Returns in binary format a substring of the input string.
BYTE_SUBSTR( string-expression, start [, length [(string-expression)]] )
string-expression is the entire string, a column name having string values for example. start is a start position in the string. 1 is the first position. length is the number of characters to the right of the start position to include in the output expressed in either of the following ways:
Combine the use of BYTE_SUBSTR and CONVERT_FROM to separate parts of a HBase composite key for example.
A composite HBase row key consists of strings followed by a reverse timestamp (long). For example: AMZN_9223370655563575807. Use BYTE_SUBSTR and CONVERT_FROM to separate parts of a HBase composite key.
SELECT CONVERT_FROM(BYTE_SUBSTR(row_key,6,19),'UTF8') FROM root.`mydata` LIMIT 1; |---------------------| | EXPR$0 | |---------------------| | 9223370655563575807 | |---------------------| 1 rows selected (0.271 seconds) SELECT CONVERT_FROM(BYTE_SUBSTR(row_key,6,length(row_key)),'UTF8') FROM root.`mydata` LIMIT 1; |---------------------| | EXPR$0 | |---------------------| | 9223370655563575807 | |---------------------| 1 rows selected (0.271 seconds)
Returns the number of characters in the input string.
CHAR_LENGTH(string)
You can use the alias CHARACTER_LENGTH.
SELECT CHAR_LENGTH('Drill rocks') FROM (VALUES(1)); |------------| | EXPR$0 | |------------| | 11 | |------------| 1 row selected (0.127 seconds)
Concatenates arguments.
CONCAT(string [, string [, ...] )
SELECT CONCAT('Drill', ' ', 1.0, ' ', 'release') FROM (VALUES(1)); |-------------------| | EXPR$0 | |-------------------| | Drill 1.0 release | |-------------------| 1 row selected (0.134 seconds)
Alternatively, you can use the [string concatenation operation]({{ site.baseurl }}/docs/operators/#string-concatenate-operator) to concatenate strings.
Performs a case-insensitive comparison of the input string with a pattern and returns true in the case of a match.
Note that it is necessary to quote ILIKE function calls using backticks because ILIKE is also a SQL keyword.
`ILIKE`(string, pattern)
SELECT `ILIKE`('abcde', 'ABC%') FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | true | |--------| 1 row selected (0.185 seconds)
SELECT `ILIKE`(last_name, 'Spence') FROM cp.`employee.json` limit 3; |--------| | EXPR$0 | |--------| | false | | false | | true | |--------| 3 rows selected (0.17 seconds)
Returns the string using initial caps.
INITCAP(string)
SELECT INITCAP('apache drill release 1.0') FROM (VALUES(1)); |--------------------------| | EXPR$0 | |--------------------------| | Apache Drill Release 1.0 | |--------------------------| 1 row selected (0.106 seconds)
Returns a 1 if the string argument matches the regular expression [-+]?\d+(\.\d+)?
otherwise a 0.
ISNUMERIC(string)
SELECT ISNUMERIC('3.1415926'); |--------| | EXPR$0 | |--------| | 1 | |--------| 1 row selected (0.185 seconds)
Returns the number of characters in the string.
LENGTH( string [, encoding] )
SELECT LENGTH('apache drill release 1.0') FROM (VALUES(1)); |------------| | EXPR$0 | |------------| | 24 | |------------| 1 row selected (0.127 seconds) SELECT LENGTH(row_key, 'UTF8') FROM root.`students`; |------------| | EXPR$0 | |------------| | 8 | | 8 | | 8 | | 8 | |------------| 4 rows selected (0.259 seconds)
Performs a case-sensitive comparison of the input string with a pattern and returns true in the case of a match.
LIKE(string, pattern)
SELECT LIKE('abcde', 'ABC%') FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | false | |--------| 1 row selected (0.185 seconds)
SELECT LIKE(last_name, 'Spence') FROM cp.`employee.json` limit 3; |--------| | EXPR$0 | |--------| | false | | false | | true | |--------| 3 rows selected (0.17 seconds)
Alternatively, you can use the [like operator]({{ site.baseurl }}/docs/operators/#pattern-matching-operators) to compare a string with a pattern.
Converts the characters in the input string to lowercase.
LOWER (string)
SELECT LOWER('Apache Drill') FROM (VALUES(1)); |---------------| | EXPR$0 | |---------------| | apache drill | |---------------| 1 row selected (0.103 seconds)
Pads the string to the length specified by prepending the fill or a space. Truncates the string if it is longer than the specified length. .
LPAD (string, length [, fill text])
SELECT LPAD('Release 1.0', 27, 'of Apache Drill 1.0') FROM (VALUES(1)); |------------------------------| | EXPR$0 | |------------------------------| | of Apache Drill Release 1.0 | |------------------------------| 1 row selected (0.132 seconds)
Removes any characters from the beginning of string1 that match the characters in string2.
LTRIM(string1, string2)
SELECT LTRIM('Apache Drill', 'Apache ') FROM (VALUES(1)); |------------| | EXPR$0 | |------------| | Drill | |------------| 1 row selected (0.131 seconds) SELECT LTRIM('A powerful tool Apache Drill', 'Apache ') FROM (VALUES(1)); |----------------------------| | EXPR$0 | |----------------------------| | owerful tool Apache Drill | |----------------------------| 1 row selected (0.1 seconds)
Returns the location of the first occurrence of a substring of the input string, or 0 if the substring does not occur.
POSITION('substring' in 'string')
SELECT POSITION('c' in 'Apache Drill') FROM (VALUES(1)); |------------| | EXPR$0 | |------------| | 4 | |------------| 1 row selected (0.12 seconds)
Matches a regexp pattern to a target string. Returns a boolean value: true if the value matches the regexp, false if the value does not match the regexp.
REGEXP_MATCHES(string_expression, pattern)
string_expression is the string to be matched.
pattern is the Java regular expression pattern.
{% include startnote.html %} The regular expression . matches any character except a line terminator unless the DOTALL flag is specified.
By default, the regular expressions ^ and $ ignore line terminators and only match at the beginning and the end, respectively, of the entire input sequence. If MULTILINE mode is activated then ^ matches at the beginning of input and after any line terminator except at the end of input. When in MULTILINE mode $ matches just before a line terminator or the end of the input sequence. {% include endnote.html %}
Shows several POSIX metacharacters that return true for the given string expressions:
select regexp_matches('abc', 'abc|def') as a, regexp_matches('cat', '[hc]at$') as b, regexp_matches('cat', '.at') as c, regexp_matches('cat', '[hc]at') as d, regexp_matches('cat', '[^b]at') as e, regexp_matches('cat', '^[hc]at') as f, regexp_matches('[a]', '\[.\]') as g, regexp_matches('sat', 's.*') as h, regexp_matches('sat','[^hc]at') as i, regexp_matches('hat', '[hc]?at') as j, regexp_matches('cchchat', '[hc]*at') as k, regexp_matches('chat', '[hc]+at') as l; |------|------|------|------|------|------|------|------|------|------|------|------| | a | b | c | d | e | f | g | h | i | j | k | l | |------|------|------|------|------|------|------|------|------|------|------|------| | true | true | true | true | true | true | true | true | true | true | true | true | |------|------|------|------|------|------|------|------|------|------|------|------|
Shows case-sensitivity:
select regexp_matches('abc', 'A*.C'); |--------| | EXPR$0 | |--------| | false | |--------| select regexp_matches('abc', 'a*.c'); |--------| | EXPR$0 | |--------| | true | |--------|
Shows how to turn on different pattern flags to enable case-insensitive matching, to permit whitespace and comments in pattern, etc. Complete list of flags you can find in Java Doc):
select regexp_matches('String with newline character ', '(?s).*'); +--------+ | EXPR$0 | +--------+ | true | +--------+ select regexp_matches('ABC', '(?i)abc'); +--------+ | EXPR$0 | +--------+ | true | +--------+ select regexp_matches('abc', '(?x)a b c'); +--------+ | EXPR$0 | +--------+ | true | +--------+
Substitutes new text for substrings that match Java regular expression patterns.
REGEXP_REPLACE(source_char, pattern, replacement)
source is the character expression to be replaced.
pattern is the regular expression.
replacement is the string to substitute for the source.
Capturing groups may defined in pattern using parentheses (...)
and referenced by number from replacement using $1
, $2
, etc.
Replace a‘s with b’s in this string.
SELECT REGEXP_REPLACE('abc, acd, ade, aef', 'a', 'b') FROM (VALUES(1)); |---------------------| | EXPR$0 | |---------------------| | bbc, bcd, bde, bef | |---------------------| 1 row selected (0.105 seconds)
Use the regular expression a followed by a period (.) in the same query to replace all a's and the subsequent character.
SELECT REGEXP_REPLACE('abc, acd, ade, aef', 'a.','b') FROM (VALUES(1)); |----------------| | EXPR$0 | |----------------| | bc, bd, be, bf | |----------------| 1 row selected (0.113 seconds)
Extract the text inside double quotes using a capturing group.
SELECT REGEXP_REPLACE('The quick "brown fox" jumps over...', '.*"(.*)".*', '$1') FROM (VALUES(1)); |-----------| | EXPR$0 | |-----------| | brown fox | |-----------|
Returns the input string repeated the specified number of times.
REPEAT(string, n_times)
SELECT REPEAT('Drill', 3); |-----------------| | EXPR$0 | |-----------------| | DrillDrillDrill | |-----------------|
Replaces all occurrences of the specified substring with another specified substring.
REPLACE(string, from, to)
SELECT REPLACE('The quick brown fox jumps...', 'jumps', 'sleeps') |-------------------------------| | EXPR$0 | |-------------------------------| | The quick brown fox sleeps... | |-------------------------------|
Returns the reverse of the input string.
REVERSE (string)
SELECT REVERSE('abcdefg'); |---------| | EXPR$0 | |---------| | gfedcba | |---------|
Pads the string to the length specified. Appends the text you specify after the fill keyword using spaces for the fill if you provide no text or insufficient text to achieve the length. Truncates the string if it is longer than the specified length.
RPAD (string, length [, fill text])
SELECT RPAD('Apache Drill ', 22, 'Release 1.0') FROM (VALUES(1)); |------------------------| | EXPR$0 | |------------------------| | Apache Drill Release 1 | |------------------------| 1 row selected (0.107 seconds)
Removes any characters from the end of string1 that match the characters in string2.
RTRIM(string1, string2)
SELECT RTRIM('Apache Drill', 'Drill ') FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | Apache | |--------| 1 row selected (0.135 seconds) SELECT RTRIM('1.0 Apache Tomcat 1.0', 'Drill 1.0') from (VALUES(1)); |--------------------| | EXPR$0 | |--------------------| | 1.0 Apache Tomcat | |--------------------| 1 row selected (0.102 seconds)
Splits the input string into a list of substrings using the specified delimiter character.
SPLIT(string, delimiter)
The delimiter must not be null and must contain a single character.
SELECT split('The quick brown fox jumps...', ' '); |------------------------------------| | EXPR$0 | |------------------------------------| | [The, quick, brown, fox, jumps...] | |------------------------------------|
Return the string part at start or from start to end after splitting the input string using the specified delimiter.
SPLIT_PART(string, delimiter, start[, end])
The delimiter string may be multiple characters long. The start can be a positive integer, or a negative integer. The end must have the same sign as and be greater than or equal to start if provided. The start and end count from beginning if are positive, from end if are negative.
SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4); |--------| | EXPR$0 | |--------| | fox | |--------| SELECT split_part('The | quick | brown | fox | jumps', ' | ', -2); |--------| | EXPR$0 | |--------| | fox | |--------| SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4, 5); |--------------| | EXPR$0 | |--------------| | fox | jumps | |--------------| SELECT split_part('The | quick | brown | fox | jumps', ' | ', 4, 10); |--------------| | EXPR$0 | |--------------| | fox | jumps | |--------------| SELECT split_part('The | quick | brown | fox | jumps', ' | ', -2, -1); |--------------| | EXPR$0 | |--------------| | fox | jumps | |--------------| SELECT split_part('The | quick | brown | fox | jumps', ' | ', -10, -4); |--------------| | EXPR$0 | |--------------| | The | quick | |--------------|
Returns the location of the first occurrence of a substring of the input string, or 0 if the substring does not occur.
STRPOS(string, substring)
SELECT STRPOS('Apache Drill', 'Drill') FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | 8 | |--------| 1 row selected (0.22 seconds)
Returns
SUBSTR(string, start) SUBSTR(string, start, length) SUBSTR(string, regexp)
''
is returned.SELECT SUBSTR('Apache Drill', 8) FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | Drill | |--------| 1 row selected (0.134 seconds) SELECT SUBSTR('Apache Drill', 3, 2) FROM (VALUES(1)); |--------| | EXPR$0 | |--------| | ac | |--------| 1 row selected (0.129 seconds) SELECT SUBSTR('On 1970-01-01 the quick brown fox jumped...', '[\d-]+') |------------| | EXPR$0 | |------------| | 1970-01-01 | |------------|
Removes any characters from the beginning, end, or both sides of string2 that match the characters in string1.
TRIM ([leading | trailing | both] [string1] from string2)
SELECT TRIM(trailing 'l' from 'Drill') FROM (VALUES(1)); |------------| | EXPR$0 | |------------| | Dri | |------------| 1 row selected (0.172 seconds) SELECT TRIM(both 'l' from 'long live Drill') FROM (VALUES(1)); |---------------| | EXPR$0 | |---------------| | ong live Dri | |---------------| 1 row selected (0.104 seconds) SELECT TRIM(leading 'l' from 'long live Drill') FROM (VALUES(1)); |-----------------| | EXPR$0 | |-----------------| | ong live Drill | |-----------------| 1 row selected (0.101 seconds)
Converts the characters in the input string to uppercase.
UPPER (string)
SELECT UPPER('Apache Drill') FROM (VALUES(1)); |--------------| | EXPR$0 | |--------------| | APACHE DRILL | |--------------| 1 row selected (0.081 seconds)