| NAME |
| pcreposix - POSIX API for Perl-compatible regular expres- |
| sions. |
| |
| |
| |
| SYNOPSIS |
| #include <pcreposix.h> |
| |
| int regcomp(regex_t *preg, const char *pattern, |
| int cflags); |
| |
| int regexec(regex_t *preg, const char *string, |
| size_t nmatch, regmatch_t pmatch[], int eflags); |
| |
| size_t regerror(int errcode, const regex_t *preg, |
| char *errbuf, size_t errbuf_size); |
| |
| void regfree(regex_t *preg); |
| |
| |
| |
| DESCRIPTION |
| This set of functions provides a POSIX-style API to the PCRE |
| regular expression package. See the pcre documentation for a |
| description of the native API, which contains additional |
| functionality. |
| |
| The functions described here are just wrapper functions that |
| ultimately call the native API. Their prototypes are defined |
| in the pcreposix.h header file, and on Unix systems the |
| library itself is called pcreposix.a, so can be accessed by |
| adding -lpcreposix to the command for linking an application |
| which uses them. Because the POSIX functions call the native |
| ones, it is also necessary to add -lpcre. |
| |
| I have implemented only those option bits that can be rea- |
| sonably mapped to PCRE native options. In addition, the |
| options REG_EXTENDED and REG_NOSUB are defined with the |
| value zero. They have no effect, but since programs that are |
| written to the POSIX interface often use them, this makes it |
| easier to slot in PCRE as a replacement library. Other POSIX |
| options are not even defined. |
| |
| When PCRE is called via these functions, it is only the API |
| that is POSIX-like in style. The syntax and semantics of the |
| regular expressions themselves are still those of Perl, sub- |
| ject to the setting of various PCRE options, as described |
| below. |
| |
| The header for these functions is supplied as pcreposix.h to |
| avoid any potential clash with other POSIX libraries. It |
| can, of course, be renamed or aliased as regex.h, which is |
| the "correct" name. It provides two structure types, regex_t |
| for compiled internal forms, and regmatch_t for returning |
| captured substrings. It also defines some constants whose |
| names start with "REG_"; these are used for setting options |
| and identifying error codes. |
| |
| |
| |
| COMPILING A PATTERN |
| The function regcomp() is called to compile a pattern into |
| an internal form. The pattern is a C string terminated by a |
| binary zero, and is passed in the argument pattern. The preg |
| argument is a pointer to a regex_t structure which is used |
| as a base for storing information about the compiled expres- |
| sion. |
| |
| The argument cflags is either zero, or contains one or more |
| of the bits defined by the following macros: |
| |
| REG_ICASE |
| |
| The PCRE_CASELESS option is set when the expression is |
| passed for compilation to the native function. |
| |
| REG_NEWLINE |
| |
| The PCRE_MULTILINE option is set when the expression is |
| passed for compilation to the native function. |
| |
| The yield of regcomp() is zero on success, and non-zero oth- |
| erwise. The preg structure is filled in on success, and one |
| member of the structure is publicized: re_nsub contains the |
| number of capturing subpatterns in the regular expression. |
| Various error codes are defined in the header file. |
| |
| |
| |
| MATCHING A PATTERN |
| The function regexec() is called to match a pre-compiled |
| pattern preg against a given string, which is terminated by |
| a zero byte, subject to the options in eflags. These can be: |
| |
| REG_NOTBOL |
| |
| The PCRE_NOTBOL option is set when calling the underlying |
| PCRE matching function. |
| |
| REG_NOTEOL |
| |
| The PCRE_NOTEOL option is set when calling the underlying |
| PCRE matching function. |
| |
| The portion of the string that was matched, and also any |
| captured substrings, are returned via the pmatch argument, |
| which points to an array of nmatch structures of type |
| regmatch_t, containing the members rm_so and rm_eo. These |
| contain the offset to the first character of each substring |
| and the offset to the first character after the end of each |
| substring, respectively. The 0th element of the vector |
| relates to the entire portion of string that was matched; |
| subsequent elements relate to the capturing subpatterns of |
| the regular expression. Unused entries in the array have |
| both structure members set to -1. |
| |
| A successful match yields a zero return; various error codes |
| are defined in the header file, of which REG_NOMATCH is the |
| "expected" failure code. |
| |
| |
| |
| ERROR MESSAGES |
| The regerror() function maps a non-zero errorcode from |
| either regcomp or regexec to a printable message. If preg is |
| not NULL, the error should have arisen from the use of that |
| structure. A message terminated by a binary zero is placed |
| in errbuf. The length of the message, including the zero, is |
| limited to errbuf_size. The yield of the function is the |
| size of buffer needed to hold the whole message. |
| |
| |
| |
| STORAGE |
| Compiling a regular expression causes memory to be allocated |
| and associated with the preg structure. The function reg- |
| free() frees all such memory, after which preg may no longer |
| be used as a compiled expression. |
| |
| |
| |
| AUTHOR |
| Philip Hazel <ph10@cam.ac.uk> |
| University Computing Service, |
| New Museums Site, |
| Cambridge CB2 3QG, England. |
| Phone: +44 1223 334714 |
| |
| Copyright (c) 1997-1999 University of Cambridge. |