src/test/modules/test_regex/README - cloudberry - Git at Google

 test_regex is a module for testing the regular expression package.
 It is mostly meant to allow us to absorb Tcl's regex test suite.
 Therefore, there are provisions to exercise regex features that
 aren't currently exposed at the SQL level by PostgreSQL.

 Currently, one function is provided:

 test_regex(pattern text, string text, flags text) returns setof text[]

 Reports an error if the pattern is an invalid regex.  Otherwise,
 the first row of output contains the number of subexpressions,
 followed by words reporting set bit(s) in the regex's re_info field.
 If the pattern doesn't match the string, that's all.
 If the pattern does match, the next row contains the whole match
 as the first array element.  If there are parenthesized subexpression(s),
 following array elements contain the matches to those subexpressions.
 If the "g" (glob) flag is set, then additional row(s) of output similarly
 report any additional matches.

 The "flags" argument is a string of zero or more single-character
 flags that modify the behavior of the regex package or the test
 function.  As described in Tcl's reg.test file:

 The flag characters are complex and a bit eclectic.  Generally speaking,
 lowercase letters are compile options, uppercase are expected re_info
 bits, and nonalphabetics are match options, controls for how the test is
 run, or testing options.  The one small surprise is that AREs are the
 default, and you must explicitly request lesser flavors of RE.  The flags
 are as follows.  It is admitted that some are not very mnemonic.

 	-	no-op (placeholder)
 	0	report indices not actual strings
 		(This substitutes for Tcl's -indices switch)
 	!	expect partial match, report start position anyway
 	%	force small state-set cache in matcher (to test cache replace)
 	^	beginning of string is not beginning of line
 	$	end of string is not end of line
 	*	test is Unicode-specific, needs big character set
 	+	provide fake xy equivalence class and ch collating element
 		(Note: the equivalence class is implemented, the
 		collating element is not; so references to [.ch.] fail)
 	,	set REG_PROGRESS (only useful in REG_DEBUG builds)
 	.	set REG_DUMP (only useful in REG_DEBUG builds)
 	:	set REG_MTRACE (only useful in REG_DEBUG builds)
 	;	set REG_FTRACE (only useful in REG_DEBUG builds)

 	&	test as both ARE and BRE
 		(Not implemented in Postgres, we use separate tests)
 	b	BRE
 	e	ERE
 	a	turn advanced-features bit on (error unless ERE already)
 	q	literal string, no metacharacters at all

 	g	global match (find all matches)
 	i	case-independent matching
 	o	("opaque") do not return match locations
 	p	newlines are half-magic, excluded from . and [^ only
 	w	newlines are half-magic, significant to ^ and $ only
 	n	newlines are fully magic, both effects
 	x	expanded RE syntax
 	t	incomplete-match reporting
 	c	canmatch (equivalent to "t0!", in Postgres implementation)
 	s	match only at start (REG_BOSONLY)

 	A	backslash-_a_lphanumeric seen
 	B	ERE/ARE literal-_b_race heuristic used
 	E	backslash (_e_scape) seen within []
 	H	looka_h_ead constraint seen
 	I	_i_mpossible to match
 	L	_l_ocale-specific construct seen
 	M	unportable (_m_achine-specific) construct seen
 	N	RE can match empty (_n_ull) string
 	P	non-_P_OSIX construct seen
 	Q	{} _q_uantifier seen
 	R	back _r_eference seen
 	S	POSIX-un_s_pecified syntax seen
 	T	prefers shortest (_t_iny)
 	U	saw original-POSIX botch: unmatched right paren in ERE (_u_gh)
	test_regex is a module for testing the regular expression package.
	It is mostly meant to allow us to absorb Tcl's regex test suite.
	Therefore, there are provisions to exercise regex features that
	aren't currently exposed at the SQL level by PostgreSQL.

	Currently, one function is provided:

	test_regex(pattern text, string text, flags text) returns setof text[]

	Reports an error if the pattern is an invalid regex. Otherwise,
	the first row of output contains the number of subexpressions,
	followed by words reporting set bit(s) in the regex's re_info field.
	If the pattern doesn't match the string, that's all.
	If the pattern does match, the next row contains the whole match
	as the first array element. If there are parenthesized subexpression(s),
	following array elements contain the matches to those subexpressions.
	If the "g" (glob) flag is set, then additional row(s) of output similarly
	report any additional matches.

	The "flags" argument is a string of zero or more single-character
	flags that modify the behavior of the regex package or the test
	function. As described in Tcl's reg.test file:

	The flag characters are complex and a bit eclectic. Generally speaking,
	lowercase letters are compile options, uppercase are expected re_info
	bits, and nonalphabetics are match options, controls for how the test is
	run, or testing options. The one small surprise is that AREs are the
	default, and you must explicitly request lesser flavors of RE. The flags
	are as follows. It is admitted that some are not very mnemonic.

	- no-op (placeholder)
	0 report indices not actual strings
	(This substitutes for Tcl's -indices switch)
	! expect partial match, report start position anyway
	% force small state-set cache in matcher (to test cache replace)
	^ beginning of string is not beginning of line
	$ end of string is not end of line
	* test is Unicode-specific, needs big character set
	+ provide fake xy equivalence class and ch collating element
	(Note: the equivalence class is implemented, the
	collating element is not; so references to [.ch.] fail)
	, set REG_PROGRESS (only useful in REG_DEBUG builds)
	. set REG_DUMP (only useful in REG_DEBUG builds)
	: set REG_MTRACE (only useful in REG_DEBUG builds)
	; set REG_FTRACE (only useful in REG_DEBUG builds)

	& test as both ARE and BRE
	(Not implemented in Postgres, we use separate tests)
	b BRE
	e ERE
	a turn advanced-features bit on (error unless ERE already)
	q literal string, no metacharacters at all

	g global match (find all matches)
	i case-independent matching
	o ("opaque") do not return match locations
	p newlines are half-magic, excluded from . and [^ only
	w newlines are half-magic, significant to ^ and $ only
	n newlines are fully magic, both effects
	x expanded RE syntax
	t incomplete-match reporting
	c canmatch (equivalent to "t0!", in Postgres implementation)
	s match only at start (REG_BOSONLY)

	A backslash-_a_lphanumeric seen
	B ERE/ARE literal-_b_race heuristic used
	E backslash (_e_scape) seen within []
	H looka_h_ead constraint seen
	I _i_mpossible to match
	L _l_ocale-specific construct seen
	M unportable (_m_achine-specific) construct seen
	N RE can match empty (_n_ull) string
	P non-_P_OSIX construct seen
	Q {} _q_uantifier seen
	R back _r_eference seen
	S POSIX-un_s_pecified syntax seen
	T prefers shortest (_t_iny)
	U saw original-POSIX botch: unmatched right paren in ERE (_u_gh)