 .. Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.

 .. include:: ../../common.defs

 .. default-domain:: cpp

 TextView
 *************

 Synopsis
 ========

 .. code-block:: cpp

     #include <ts/TextView.h>

 .. class:: TextView

 This class acts as a view into memory allocated and owned elsewhere. It is in effect a pointer and
 should be treated as such (e.g. care must be taken to avoid dangling references by knowing where the
 memory really is). The purpose is to provide string manipulation that is fast, efficient, and
 non-modifying, particularly when temporary "copies" are needed.


 Description
 ===========

 :class:`TextView` is a subclass of :code:`std::string_view` and has all of its methods. In addition it
 provides a number of ancillary methods for common string manipulation.

 A :class:`TextView` should be treated as an enhanced character pointer that has both a location and a
 size. This is what makes it possible to pass substrings around without having to make copies or
 allocate additional memory. This comes at the cost of keeping track of the actual owner of the
 string memory and making sure the :class:`TextView` does not outlive the memory owner, just as with
 a normal pointer type. Internally for |TS|, any place that passes a :code:`char *` and a size is an
 excellent candidate for using a :class:`TextView` as it is more convenient and no more risky than
 the existing arguments.
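
 The following is a minimal sketch of that idea, using plain :code:`std::string_view` as a stand-in
 so that it compiles without the |TS| headers. The function names ``count_dots_legacy`` and
 ``count_dots_view`` are hypothetical and exist only for illustration.

 .. code-block:: cpp

    #include <cstddef>
    #include <string_view>

    // Legacy style: the pointer and the length travel as two separate arguments.
    inline std::size_t count_dots_legacy(const char *s, std::size_t n) {
      std::size_t count = 0;
      for (std::size_t i = 0; i < n; ++i) {
        if (s[i] == '.') ++count;
      }
      return count;
    }

    // View style: a single argument carries both the location and the size.
    inline std::size_t count_dots_view(std::string_view s) {
      std::size_t count = 0;
      for (char c : s) {
        if (c == '.') ++count;
      }
      return count;
    }

 A caller can take a substring such as ``whole.substr(0, whole.find(':'))`` and pass it directly to
 ``count_dots_view``. The substring is another view into the same buffer, so nothing is copied or
 allocated, but the buffer must outlive every view that refers to it.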

 In deciding between :code:`std::string_view` and :class:`TextView` remember that these easily and
 cheaply cross convert. In general if the string is treated as a block of data, :code:`std::string_view`
 is better. If the contents of the string are to be examined / parsed non-uniformly then
 :class:`TextView` is better. For example, if the string is used simply as a key or a hash source,
 use :code:`std::string_view`. If instead the string may contain substrings of interest, such as
 key / value pairs, then use a :class:`TextView`.

 :class:`TextView` provides a variety of methods for manipulating the view as a string. These are
 provided as families of overloads differentiated by how characters are compared. There are four
 flavors.

 * Direct, a pointer to the target character.
 * Comparison, an explicit character value to compare.
 * Set, a set of characters (described by a :class:`TextView`) which are compared, any one of which matches.
 * Predicate, a function that takes a single character argument and returns a bool to indicate a match.

 If the latter three are inadequate, the first, the direct pointer, can be used after finding the
 appropriate character through some other mechanism.
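
 As a rough illustration of these four flavors, the sketch below implements a left trim on
 :code:`std::string_view` once per flavor. It is not the :class:`TextView` implementation, and the
 names ``ltrim_sketch``, ``ltrim_to_sketch`` and ``is_zero_digit`` are hypothetical. The direct
 pointer flavor is given its own name here only to keep string literals from selecting it by
 accident.

 .. code-block:: cpp

    #include <cstddef>
    #include <string_view>

    // Direct flavor: the caller already located the character and passes a pointer to it.
    inline std::string_view ltrim_to_sketch(std::string_view v, const char *p) {
      if (p >= v.data() && p <= v.data() + v.size()) {
        v.remove_prefix(static_cast<std::size_t>(p - v.data()));
      }
      return v;
    }

    // Comparison flavor: drop leading characters equal to c.
    inline std::string_view ltrim_sketch(std::string_view v, char c) {
      while (!v.empty() && v.front() == c) v.remove_prefix(1);
      return v;
    }

    // Set flavor: drop leading characters that appear anywhere in set.
    inline std::string_view ltrim_sketch(std::string_view v, std::string_view set) {
      while (!v.empty() && set.find(v.front()) != std::string_view::npos) v.remove_prefix(1);
      return v;
    }

    // Predicate flavor: drop leading characters for which pred returns true.
    inline std::string_view ltrim_sketch(std::string_view v, bool (*pred)(char)) {
      while (!v.empty() && pred(v.front())) v.remove_prefix(1);
      return v;
    }

    // Example predicate used with the predicate flavor.
    inline bool is_zero_digit(char c) { return c == '0'; }

 Each overload returns a new view and leaves the underlying memory untouched, which is the property
 the method families rely on.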

 The increment operator for :class:`TextView` shrinks the view by one character from the front,
 which allows stepping through the view in the normal way, although the string view itself should be
 the loop condition, not a dereference of it.

 .. code-block:: cpp

    TextView v; // assumed to have been set to the text of interest.
    size_t hash = 0;
    for ( ; v ; ++v) hash = hash * 13 + *v; // v is empty after the loop.

 Because the view acts as a container of characters, this can be done non-destructively.

 .. code-block:: cpp

    TextView v;
    size_t hash = 0;
    for (char c : v) hash = hash * 13 + c;

 Views are cheap to construct, so making a copy to use destructively is very inexpensive.
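
 A small sketch of that pattern follows, again using :code:`std::string_view` as a stand-in. The
 function ``hash_view_sketch`` is hypothetical. It takes its argument by value, so only the pointer
 and length are copied, and consumes that copy while the caller's view stays intact. Unlike the
 loops above, it casts each character to :code:`unsigned char` so the result does not depend on
 whether :code:`char` is signed.

 .. code-block:: cpp

    #include <cstddef>
    #include <string_view>

    // v is a private copy of the caller's view; consuming it has no effect on the caller.
    inline std::size_t hash_view_sketch(std::string_view v) {
      std::size_t hash = 0;
      for (; !v.empty(); v.remove_prefix(1)) {
        hash = hash * 13 + static_cast<unsigned char>(v.front());
      }
      return hash;
    }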

 :class:`MemSpan` provides a :code:`find` method that searches for a matching value. The type of this
 value can be anything that has a fixed size and supports the equality operator. The view is treated
 as an array of the type and searched sequentially for a matching value. The value type is treated
 as having no identity and being cheap to copy, in the manner of an integral type.
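
 The search described above can be sketched as a free function over a raw pointer and a byte count.
 This is not the :class:`MemSpan` signature; ``find_value_sketch`` is a hypothetical name, and the
 sketch assumes the memory is suitably aligned for the value type and really holds objects of that
 type.

 .. code-block:: cpp

    #include <cstddef>

    // Treat [base, base + n_bytes) as an array of T and return a pointer to the
    // first element equal to value, or nullptr if there is none.
    // Assumption: base is aligned for T and the memory holds objects of type T.
    template <typename T>
    const T *find_value_sketch(const void *base, std::size_t n_bytes, T value) {
      const T *first = static_cast<const T *>(base);
      const T *last  = first + n_bytes / sizeof(T); // trailing partial element is ignored
      for (; first != last; ++first) {
        if (*first == value) return first;
      }
      return nullptr;
    }

 The value is passed by copy, which matches the expectation that the value type is cheap to copy.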

 Parsing with TextView
 -----------------------

 A primary use of :class:`TextView` is field-oriented parsing. It is easy and fast to split
 strings into fields without modifying the original data. For example, assume that :arg:`value`
 contains a null-terminated string which is possibly several tokens separated by commas.

 .. code-block:: cpp

    #include <ctype.h>
    void parse_token(const char* value) {
      TextView v(value); // construct assuming null terminated string.
      while (v) {
        TextView token(v.extractPrefix(','));
        token.trim(&isspace);
        if (token) {
          // process token
        }
      }
    }

 If :arg:`value` was ``bob  ,dave, sam`` then :arg:`token` would be successively ``bob``, ``dave``,
 ``sam``. After ``sam`` was extracted :arg:`v` would be empty and the loop would exit. :arg:`token`
 can be empty in the case of adjacent delimiters or a trailing delimiter. Note that no memory
 allocation at all is done because each view is a pointer into :arg:`value`, and there is no need to
 put nul characters in the source string, so there is no need to duplicate it to prevent permanent
 changes.
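
 For readers without the |TS| headers at hand, the same loop can be approximated with
 :code:`std::string_view`. In this sketch ``extract_prefix_sketch``, ``trim_sketch`` and
 ``tokens_sketch`` are hypothetical helpers. The behavior when the delimiter is absent (return the
 whole view and leave the source empty) is an assumption inferred from the example above, not a
 statement of the :class:`TextView` contract. The vector is used only to make the result easy to
 inspect; the tokens themselves are still views into the input.

 .. code-block:: cpp

    #include <cctype>
    #include <cstddef>
    #include <string_view>
    #include <vector>

    // Return the text before the first delim and remove it, plus the delimiter, from v.
    // Assumed behavior: if delim is absent, return all of v and leave v empty.
    inline std::string_view extract_prefix_sketch(std::string_view &v, char delim) {
      const std::size_t pos = v.find(delim);
      std::string_view prefix = v.substr(0, pos); // npos yields the whole view
      v.remove_prefix(pos == std::string_view::npos ? v.size() : pos + 1);
      return prefix;
    }

    // Strip whitespace from both ends of the view.
    inline std::string_view trim_sketch(std::string_view v) {
      auto is_space = [](char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; };
      while (!v.empty() && is_space(v.front())) v.remove_prefix(1);
      while (!v.empty() && is_space(v.back())) v.remove_suffix(1);
      return v;
    }

    // Collect the non-empty, trimmed, comma separated tokens of v.
    inline std::vector<std::string_view> tokens_sketch(std::string_view v) {
      std::vector<std::string_view> out;
      while (!v.empty()) {
        std::string_view token = trim_sketch(extract_prefix_sketch(v, ','));
        if (!token.empty()) out.push_back(token);
      }
      return out;
    }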

 What if the tokens were key / value pairs, of the form ``key=value``? This can be done as in the following example.

 .. code-block:: cpp

    #include <ctype.h>
    void parse_token(const char* source) {
      TextView in(source); // construct assuming null terminated string.
      while (in) {
        TextView value(in.extractPrefix(','));
        TextView key(value.trim(&isspace).splitPrefix('=').rtrim(&isspace));
        if (key) {
          // it's a key=value token with key and value set appropriately.
          value.ltrim(&isspace); // clip potential space after '='.
        } else {
          // it's just a single token which is in value.
        }
      }
    }

 Nested delimiters are handled by further splitting in a recursive way which, because the original
 string is never modified, is straightforward.
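
 A sketch of such nested splitting follows, using :code:`std::string_view` as a stand-in. The input
 format (entries separated by ``;``, a key separated from its items by ``=``, items separated by
 ``|``) and the names ``take_prefix_sketch`` and ``parse_nested_sketch`` are invented for this
 illustration.

 .. code-block:: cpp

    #include <cstddef>
    #include <string_view>
    #include <utility>
    #include <vector>

    // Return the text before the first delim and remove it, plus the delimiter, from v.
    // Assumed behavior: if delim is absent, return all of v and leave v empty.
    inline std::string_view take_prefix_sketch(std::string_view &v, char delim) {
      const std::size_t pos = v.find(delim);
      std::string_view prefix = v.substr(0, pos);
      v.remove_prefix(pos == std::string_view::npos ? v.size() : pos + 1);
      return prefix;
    }

    // Parse text such as "k1=a|b;k2=c" into (key, item) pairs.
    // An entry with no '=' yields a key with no items and so contributes no pairs.
    inline std::vector<std::pair<std::string_view, std::string_view>>
    parse_nested_sketch(std::string_view in) {
      std::vector<std::pair<std::string_view, std::string_view>> out;
      while (!in.empty()) {
        std::string_view entry = take_prefix_sketch(in, ';');   // outer level
        std::string_view key   = take_prefix_sketch(entry, '='); // middle level
        while (!entry.empty()) {                                 // inner level
          out.emplace_back(key, take_prefix_sketch(entry, '|'));
        }
      }
      return out;
    }

 Each level works on its own copy of a view, so splitting an inner field never disturbs the
 position of the outer loop.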

 History
 =======

 The first attempt at this functionality was in the TSConfig library in the :code:`ts::Buffer` and
 :code:`ts::ConstBuffer` classes. Originally intended just as raw memory views,
 :code:`ts::ConstBuffer` in particular was repeatedly enhanced to provide better support for strings.
 The header was eventually moved from :literal:`lib/tsconfig` to :literal:`lib/ts` and was used in
 various parts of the |TS| core.

 There was then a proposal to make these classes available to plugin writers as they proved handy in
 the core. A suggested alternative was `Boost.StringRef
 <http://www.boost.org/doc/libs/1_61_0/libs/utility/doc/html/string_ref.html>`_ which provides
 similar functionality using :code:`std::string` as the base of the pre-allocated memory. A version
 of the header was ported to |TS| (by stripping all the Boost support and cross includes) but in use
 it proved to provide little of the functionality available in :code:`ts::ConstBuffer`. If extensive
 reworking was required in any case, it seemed better to start from scratch and build just what was
 useful in the |TS| context.

 The next step was the :code:`TextView` class, which turned out reasonably well. It was then
 suggested that more support for raw memory (as opposed to memory presumed to contain printable ASCII
 data) would be useful. An attempt was made to do this but the differences in arguments, subtle
 method differences, and return types made that infeasible. Instead :class:`MemSpan` was split off to
 provide a :code:`void*` oriented view. String-specific methods were stripped out and a few
 non-character based methods added.