 Design of APR

 The Apache Portable Run-time libraries have been designed to provide a common
 interface to low level routines across any platform.  The original goal of APR
 was to combine all code in Apache into one common code base.  This is not the
 correct approach, however, so the goal of APR has changed.

 There are places where common code is not a good thing.  For example, how to
 map requests to either threads or processes should be platform specific.
 APR's role is now to combine any code that can be safely combined without
 sacrificing performance.

 To this end we have created a set of operations that are required for cross
 platform development.  There may be other types that are desired, and those
 will be implemented in the future.  The first version of APR will focus on
 what Apache 2.0 needs.  Of course, anything that is submitted will be
 considered for inclusion.

 This document will discuss the structure of APR, and how best to contribute
 code to the effort.

 APR On Windows

 APR on Windows is different from APR on all other systems, because it
 doesn't use autoconf. On Unix, apr_private.h (private to APR) and apr.h
 (public, used by applications that use APR) are generated by autoconf
 from acconfig.h and apr.h.in respectively. On Windows, apr_private.h
 and apr.h are created from apr_private.hw and apr.hw respectively.

 !!!***  If you add code to acconfig.h or tests to configure.in or aclocal.m4,
         please give some thought to whether or not Windows needs this addition
         as well.  A general rule of thumb is that if it is a feature macro,
         such as APR_HAS_THREADS, Windows needs it.  If the definition is going
         to be used in a public APR header file, such as apr_general.h, Windows
         needs it.

         The only time it is safe to add a macro or test without also adding
         the macro to apr*.hw is if the macro tells APR how to build.  For
         example, a test for a header file does not need to be added to Windows.
 ***!!!

 APR Features

 One of the goals of APR is to provide a common set of features across all
 platforms.  This is an admirable goal, but it is also not realistic.  We cannot
 expect to be able to implement ALL features on ALL platforms.  So we are
 going to do the next best thing: provide a common interface to ALL APR
 features on MOST platforms.

 APR developers should create FEATURE MACROS for any feature that is not
 available on ALL platforms.  This should be a simple definition which has
 the form:

 APR_HAS_FEATURE

 This macro should evaluate to true if APR has this feature on this platform.
 For example, Linux and Windows have mmap'ed files, and APR is providing an
 interface for mmap'ing a file.  On both Linux and Windows, APR_HAS_MMAP
 should evaluate to one, and the ap_mmap_* functions should map files into
 memory and return the appropriate status codes.

 If your OS of choice does not have mmap'ed files, APR_HAS_MMAP should evaluate
 to zero, and none of the ap_mmap_* functions should be defined.  Leaving them
 undefined is a precaution that allows us to break at compile time if a
 programmer tries to use unsupported functions.
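 The feature-macro rule can be sketched in C as follows.  HYPO_HAS_MMAP and
 hypo_mmap_available() are hypothetical stand-ins for APR_HAS_MMAP and an
 ap_mmap_* function; they are not APR's real names or signatures.  The sketch
 only shows that the function exists solely under the macro, and that portable
 callers test the macro rather than the platform name.

```c
/* Hypothetical platform header fragment.  HYPO_HAS_MMAP stands in for
 * APR_HAS_MMAP: 1 where the feature exists, 0 where it does not. */
#ifndef HYPO_HAS_MMAP
#define HYPO_HAS_MMAP 1
#endif

#if HYPO_HAS_MMAP
/* Declared only when the feature exists, so a call to it on an
 * unsupported platform breaks the build instead of failing at run time.
 * hypo_mmap_available() is a hypothetical stand-in for an ap_mmap_* call. */
int hypo_mmap_available(void);
int hypo_mmap_available(void)
{
    return 1;
}
#endif

/* Portable caller: tests the feature macro, never the platform name. */
const char *choose_read_strategy(void)
{
#if HYPO_HAS_MMAP
    if (hypo_mmap_available())
        return "mmap";
#endif
    return "read";
}
```

 Building the same file with -DHYPO_HAS_MMAP=0 removes hypo_mmap_available()
 entirely and choose_read_strategy() then returns "read"; a stray call to the
 removed function would be caught at build time rather than at run time.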

 APR types

 The base types in APR are:
 file_io     File I/O, including pipes
 lib         A portable library originally used in Apache.  This contains
             memory management, tables, and arrays.
 locks       Mutex and reader/writer locks
 misc        Any APR type which doesn't have any other place to belong
 network_io  Network I/O
 shmem       Shared Memory (Not currently implemented)
 signal      Asynchronous Signals
 threadproc  Threads and Processes
 time        Time

 Directory Structure

 Each type has a base directory.  Inside this base directory are
 subdirectories, which contain the actual code.  These subdirectories are named
 after the platforms they are compiled on.  Unix is also used as a common
 directory.  If the code you are writing is POSIX based, you should look at the
 code in the unix directory.  A good rule of thumb is that if more than half
 your code needs to be ifdef'ed out, and the structures required for your code
 are substantively different from the POSIX code, you should create a new
 directory.

 Currently, the APR code is written for Unix, BeOS, Windows, and OS/2.  An
 example of the directory structure is the file I/O directory:

 apr
   |
    ->  file_io
           |
            -> unix            The Unix and common base code
           |
            -> win32           The Windows code
           |
            -> os2             The OS/2 code

 Notice that BeOS does not have a directory.  This is because BeOS currently
 uses the Unix directory for its file_io.  In the near future, it will be
 possible to use individual files from the Unix directory.

 There are a few special top level directories.  These are test, inc, include,
 and libs.  Test is a directory which stores all test programs.  It is expected
 that if a new type is developed, there will also be a new test program, to
 help people port this new type to different platforms.  Inc is a directory for
 internal header files.  This directory is likely to go away soon.  Include is
 a directory which stores all required APR header files for external use.  The
 distinction between internal and external header files will be made soon.
 Finally, libs is a generated directory.  When APR finishes building, it will
 store its library files in the libs directory.

 Creating an APR Type

 The current design of APR requires that APR types be incomplete.  It is not
 possible to write flexible portable code if programs can access the internals
 of APR types.  This is because different platforms are likely to define
 different native types.

 For this reason, each platform defines the structure in its own directory.
 Those structures are then typedef'ed in an external header file.  For example,
 in file_io/unix/fileio.h:

     struct ap_file_t {
         ap_context_t *cntxt;
         int filedes;
         FILE *filehand;
         /* ... */
     };

 In include/apr_file_io.h:
     typedef struct ap_file_t    ap_file_t;

 This will cause a compiler error if somebody outside of APR tries to access
 the filedes field in this structure.  Windows does not have a filedes field,
 so it is important that programs not be able to access it.
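 A self-contained sketch of the incomplete-type pattern follows, using
 hypothetical names (hypo_file_t, hypo_file_wrap() and friends) rather than
 APR's own.  Both halves sit in one file only so the example compiles alone;
 in APR the struct definition would live in the platform directory, and code
 that sees only the typedef would get a compile error on f->filedes.

```c
#include <stdlib.h>

/* ---- "Public header" part: callers only see an incomplete type. ---- */
typedef struct hypo_file_t hypo_file_t;   /* hypothetical name */

hypo_file_t *hypo_file_wrap(int fd);
int hypo_file_is_open(const hypo_file_t *f);
void hypo_file_free(hypo_file_t *f);

/* ---- "Platform directory" part: in a real build this is a separate
 *      .c file, so header-only callers cannot touch the fields. ---- */
struct hypo_file_t {
    int filedes;          /* Unix-specific; a Windows build would differ */
};

hypo_file_t *hypo_file_wrap(int fd)
{
    hypo_file_t *f = malloc(sizeof *f);
    if (f != NULL)
        f->filedes = fd;
    return f;
}

int hypo_file_is_open(const hypo_file_t *f)
{
    return f->filedes >= 0;
}

void hypo_file_free(hypo_file_t *f)
{
    free(f);
}
```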

 The only exception to the incomplete type rule can be found in apr_portable.h.
 This file defines the native types for each platform.  Using these types, it
 is possible to extract the native object (for example, the file descriptor)
 underlying any APR type.
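 The apr_portable.h idea can be sketched as a per-platform native typedef plus
 an accessor that hands the native handle back.  The names hypo_os_file_t and
 hypo_get_os_file() are hypothetical, not APR's API; the accessor takes its
 result argument first and returns 0 or an errno value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-platform native type, in the spirit of apr_portable.h. */
#if defined(_WIN32)
typedef void *hypo_os_file_t;            /* e.g. a Win32 HANDLE */
#else
typedef int hypo_os_file_t;              /* a POSIX file descriptor */
#endif

typedef struct hypo_file_t hypo_file_t;  /* incomplete for callers */

/* Normally private to the platform directory; shown here so the
 * example compiles on its own. */
struct hypo_file_t {
    hypo_os_file_t native;
};

/* Hand the native handle back through the result argument,
 * returning 0 on success or an errno value on failure. */
int hypo_get_os_file(hypo_os_file_t *thefile, const hypo_file_t *file)
{
    if (thefile == NULL || file == NULL)
        return EINVAL;
    *thefile = file->native;
    return 0;
}
```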

 You may notice the ap_context_t field.  All APR types have this field.  This
 type is used to allocate memory within APR.

 New Function

 When creating a new function, please try to adhere to these rules.

 1)  Result arguments should be the first arguments.
 2)  If a function needs a context, it should be the last argument.
 3)  These rules are flexible, especially if breaking them makes the code
     easier to understand because it mimics a standard function.
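 A hypothetical function following these rules might look like this: the
 result argument comes first, the context comes last, and an errno-style status
 is returned.  hypo_status_t and hypo_context_t are stand-ins for ap_status_t
 and ap_context_t, and this sketch uses malloc() where APR would allocate from
 the context.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

typedef int hypo_status_t;                      /* stands in for ap_status_t */
typedef struct hypo_context_t { int unused; } hypo_context_t; /* for ap_context_t */

/* Result argument first, context last, status as the return value. */
hypo_status_t hypo_strdup(char **new_str, const char *src,
                          hypo_context_t *cont)
{
    size_t len;

    (void)cont;            /* a real implementation would allocate from cont */
    if (src == NULL)
        return EINVAL;
    len = strlen(src) + 1;
    *new_str = malloc(len);
    if (*new_str == NULL)
        return ENOMEM;
    memcpy(*new_str, src, len);
    return 0;
}
```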

 Documentation

 Whenever a new function is added to APR, it MUST be documented.  New
 functions will not be committed unless there are docs to go along with them.
 The documentation should be a comment block above the function in the header
 file.

 The format for the comment block is:

 /**
  * Brief description of the function
  * @param param_1_name explanation
  * @param param_2_name explanation
  * @param param_n_name explanation
  * @tip Any extra information people should know.
  * @deffunc function prototype if required
  */

 The last line (@deffunc) is not always needed.  The ScanDoc parser is not
 perfect yet, and it cannot parse prototypes in any form other than
 	return_type function_name(type1 param1, type2 param2, ...)
 This means that any function prototype that resembles:
 	APR_EXPORT(ap_status_t) ap_foo(int f1, char *f2)
 will need the @deffunc line.

 For an actual example, look at any file in the include directory (ap_tables.h
 hasn't been done yet).
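 As an illustration of the format, here is a documented function using
 hypothetical names (hypo_count_char, with HYPO_EXPORT standing in for
 APR_EXPORT).  Because the prototype is wrapped in an export macro, the
 @deffunc line is included.  In APR the comment would sit above the
 declaration in the header file; it is shown above the definition here so the
 example is self-contained.

```c
#include <stddef.h>

typedef int hypo_status_t;          /* stands in for ap_status_t */
#define HYPO_EXPORT(type) type      /* stands in for APR_EXPORT */

/**
 * Count the bytes in a buffer that equal a given character.
 * @param count Receives the number of matching bytes
 * @param buf The buffer to scan
 * @param len The number of bytes in buf
 * @param ch The character to look for
 * @tip buf need not be NUL-terminated.
 * @deffunc hypo_status_t hypo_count_char(size_t *count, const char *buf, size_t len, char ch)
 */
HYPO_EXPORT(hypo_status_t) hypo_count_char(size_t *count, const char *buf,
                                           size_t len, char ch)
{
    size_t i, n = 0;

    for (i = 0; i < len; i++)
        if (buf[i] == ch)
            n++;
    *count = n;
    return 0;
}
```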

 APR Error reporting

 Most APR functions should return an ap_status_t type.  The only time an
 APR function does not return an ap_status_t is if it absolutely CAN NOT
 fail.  Examples of this would be filling out an array when you know you are
 not beyond the array's range.  If it cannot fail on your platform, but it
 could conceivably fail on another platform, it should return an ap_status_t.
 Unless you are sure, return an ap_status_t.  :-)

 All platforms return errno values unchanged.  Each platform can also have
 one system error type, which can be returned after an offset is added.
 There are five types of error values in APR, each with its own offset.

     Name                 Purpose
 0)                       This is 0 for all platforms and isn't really defined
                          anywhere, but it is the offset for errno values.
                          (This has no name because it isn't actually defined,
                          but for completeness we are discussing it here.)
 1) APR_OS_START_ERROR    This is platform dependent, and is the offset at which
                          APR errors start to be defined.  (Canonical error
                          values are also defined in this section.  [Canonical
                          error values are discussed later.])
 2) APR_OS_START_STATUS   This is platform dependent, and is the offset at which
                          APR status values start.
 3) APR_OS_START_USEERR   This is platform dependent, and is the offset at which
                          APR apps can begin to add their own error codes.
 4) APR_OS_START_SYSERR   This is platform dependent, and is the offset at which
                          system error values begin.
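 The ranges can be sketched as a small classifier.  The numeric offsets and all
 HYPO_ names are hypothetical (the real values are platform dependent and
 defined in apr_errno.h); the sketch only assumes the ordering
 ERROR < STATUS < USEERR < SYSERR suggested by the list above, and it ignores
 any supplemental ranges.

```c
typedef int hypo_status_t;            /* stands in for ap_status_t */

/* Hypothetical offsets; only their ordering matters here. */
#define HYPO_OS_START_ERROR   20000
#define HYPO_OS_START_STATUS  (HYPO_OS_START_ERROR + 500)
#define HYPO_OS_START_USEERR  (HYPO_OS_START_STATUS + 500)
#define HYPO_OS_START_SYSERR  (HYPO_OS_START_USEERR + 500)

enum hypo_err_class {
    HYPO_CLASS_SUCCESS,
    HYPO_CLASS_ERRNO,       /* offset 0: raw errno values */
    HYPO_CLASS_APR_ERROR,
    HYPO_CLASS_APR_STATUS,
    HYPO_CLASS_USER,        /* reserved for applications */
    HYPO_CLASS_SYSTEM       /* native system error plus offset */
};

/* Decide which range a status value falls into. */
enum hypo_err_class hypo_classify(hypo_status_t s)
{
    if (s == 0)
        return HYPO_CLASS_SUCCESS;
    if (s < HYPO_OS_START_ERROR)
        return HYPO_CLASS_ERRNO;
    if (s < HYPO_OS_START_STATUS)
        return HYPO_CLASS_APR_ERROR;
    if (s < HYPO_OS_START_USEERR)
        return HYPO_CLASS_APR_STATUS;
    if (s < HYPO_OS_START_SYSERR)
        return HYPO_CLASS_USER;
    return HYPO_CLASS_SYSTEM;
}
```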

 All of these definitions can be found in apr_errno.h for all platforms.  When
 an error occurs in an APR function, the function must return an error code.
 If the error occurred in a system call and that system call uses errno to
 report an error, then the code is returned unchanged.  For example:

     if (open(fname, oflags, 0777) < 0)
         return errno;
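 A complete Unix version of that fragment, wrapped in a hypothetical
 hypo_file_open() that takes its result argument first, might look like this.
 The errno value is passed back unchanged, which is the offset-0 case.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

typedef int hypo_status_t;      /* stands in for ap_status_t */

/* Open a file; on failure hand errno back unchanged (offset 0). */
hypo_status_t hypo_file_open(int *fd, const char *fname, int oflags)
{
    int d = open(fname, oflags, 0777);

    if (d < 0)
        return errno;
    *fd = d;
    return 0;
}
```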


 The next place an error can occur is a system call that reports errors
 through some value other than the platform's primary error value (errno).
 This can also be handled by APR: the native error value is returned with
 APR_OS_START_SYSERR added.  For example, on Windows:

     if (CreateFile(fname, oflags, sharemod, NULL,
                    createflags, attributes, 0) == INVALID_HANDLE_VALUE)
         return (GetLastError() + APR_OS_START_SYSERR);

 These two examples implement the same function for two different platforms.
 Even if the underlying problem is the same on both platforms, they will
 return two different error codes.  This is OKAY, and is correct for APR.
 APR relies on the fact that, most of the time when an error occurs, the
 program logs the error and continues; it does not try to solve the problem
 programmatically.  This does not mean we have not provided support for
 programmatically solving the problem; it just isn't the default case.  We'll
 get to how this problem is solved in a little while.

 If the error occurs in an APR function but is not due to a system call
 (it is actually an APR error or just a status code from APR), then the
 appropriate code should be returned.  These codes are defined in apr_errno.h
 and are self-explanatory.

 No APR code should ever return a code between APR_OS_START_USEERR and
 APR_OS_START_SYSERR; those codes are reserved for APR applications.
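 An application could carve its own codes out of the reserved range roughly as
 follows.  The offsets and every MYAPP_ and HYPO_ name are hypothetical; the
 sketch relies only on the rule that APR never returns values from
 APR_OS_START_USEERR up to (but not including) APR_OS_START_SYSERR.

```c
#include <stddef.h>

typedef int hypo_status_t;            /* stands in for ap_status_t */

/* Hypothetical offsets; real values come from apr_errno.h. */
#define HYPO_OS_START_USEERR  21000
#define HYPO_OS_START_SYSERR  21500

/* Application-defined codes inside the reserved user range. */
#define MYAPP_ERR_BADCONFIG   (HYPO_OS_START_USEERR + 1)
#define MYAPP_ERR_QUOTA       (HYPO_OS_START_USEERR + 2)

/* True if s lies in the range APR promises never to return. */
int myapp_is_app_error(hypo_status_t s)
{
    return s >= HYPO_OS_START_USEERR && s < HYPO_OS_START_SYSERR;
}

/* Messages for the application's own codes; NULL means "ask APR". */
const char *myapp_strerror(hypo_status_t s)
{
    switch (s) {
    case MYAPP_ERR_BADCONFIG: return "bad configuration file";
    case MYAPP_ERR_QUOTA:     return "quota exceeded";
    default:                  return NULL;
    }
}
```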

 To programmatically correct an error in a running application, the error codes
 need to be consistent across platforms.  To get consistent error codes, APR
 provides a function, ap_canonical_error().  This function takes any
 ap_status_t value as input and returns one of a small subset of canonical APR
 error codes.  These codes are equivalent to Unix errno values.  Why is it a
 small subset?  Because we don't want to try to convert everything in the first
 pass.  As more programs require more error codes to be converted, they will be
 added to this function.
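 A minimal sketch of a canonicalizing function, assuming a hypothetical
 HYPO_OS_START_SYSERR offset and native codes modeled on Windows'
 ERROR_FILE_NOT_FOUND (2), ERROR_PATH_NOT_FOUND (3) and ERROR_ACCESS_DENIED (5).
 hypo_canonical_error() is not APR's implementation; passing unmapped values
 through unchanged is a design choice of this sketch.

```c
#include <errno.h>

typedef int hypo_status_t;              /* stands in for ap_status_t */

#define HYPO_OS_START_SYSERR  30000     /* hypothetical offset */

/* Native codes modeled on Windows' ERROR_FILE_NOT_FOUND,
 * ERROR_PATH_NOT_FOUND and ERROR_ACCESS_DENIED. */
#define HYPO_NATIVE_FILE_NOT_FOUND  2
#define HYPO_NATIVE_PATH_NOT_FOUND  3
#define HYPO_NATIVE_ACCESS_DENIED   5

/* Map a status to a small errno-like subset; unmapped values,
 * including plain errno values, are returned unchanged. */
hypo_status_t hypo_canonical_error(hypo_status_t s)
{
    if (s >= HYPO_OS_START_SYSERR) {
        switch (s - HYPO_OS_START_SYSERR) {
        case HYPO_NATIVE_FILE_NOT_FOUND:
        case HYPO_NATIVE_PATH_NOT_FOUND:
            return ENOENT;      /* two native codes collapse into one */
        case HYPO_NATIVE_ACCESS_DENIED:
            return EACCES;
        }
    }
    return s;
}
```

 A caller would branch on the result, for instance treating
 hypo_canonical_error(rv) == ENOENT as "create the file".  Native codes 2 and 3
 both become ENOENT, so the original distinction cannot be recovered from the
 canonical value.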

 Why did APR take this approach?  There are two ways to deal with error
 codes portably.

 1)  Return the same error code across all platforms.
 2)  Return platform-specific error codes and convert them when necessary.

 The problem with option number one is that it takes time to convert error
 codes to a common code, and most of the time programs want to just output
 an error string.  If we convert all errors to a common subset, we have four
 steps to output an error string:

     make syscall that fails
         convert to common error code                 step 1
         return common error code
             check for success
             call error output function               step 2
                 convert back to system error         step 3
                 output error string                  step 4

 By keeping the errors platform specific, we can output error strings in two
 steps.

     make syscall that fails
         return error code
             check for success
             call error output function               step 1
                 output error string                  step 2

 Less often, programs change their execution based on what error was returned.
 This is no more expensive using option 2 than it is using option 1, but we
 put the onus of converting the error code on the programmer.
 For example, using option 1:

     make syscall that fails
         convert to common error code
         return common error code
             decide execution based on common error code

 Using option 2:

     make syscall that fails
         return error code
             convert to common error code (using ap_canonical_error)
             decide execution based on common error code

 Finally, there is one more operation on error codes.  You can get a string
 that explains in human readable form what has happened.  To do this using
 APR, call ap_strerror().

 On all platforms ap_strerror takes the form:

 char *ap_strerror(ap_status_t err)
 {
     if (err == 0)
         return "No error was found";
     if (err < APR_OS_START_ERRNO2)
         return (platform dependent error string generator);
     if (err < APR_OS_START_ERROR)
         return (platform dependent error string generator for
                 supplemental error values);
     if (err < APR_OS_START_SYSERR)
         return (APR generated error or status string);
     return "APR doesn't understand this error value";
 }

 Notice that this does not handle canonicalized error values well.  Those will
 return "APR doesn't understand this error value" on some platforms and
 an actual error string on others.  To deal with this, get the error
 string before canonicalizing your error code.
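 The ap_strerror() form above can be made compilable under stated assumptions:
 hypothetical offsets, a hypothetical two-entry table of APR messages, and no
 supplemental APR_OS_START_ERRNO2 range.  The zero check comes first so that
 success is never routed to the errno branch.

```c
#include <string.h>

typedef int hypo_status_t;              /* stands in for ap_status_t */

/* Hypothetical offsets; the real ones are platform dependent. */
#define HYPO_OS_START_ERROR   20000
#define HYPO_OS_START_SYSERR  21500

/* Hypothetical APR messages, indexed by (err - HYPO_OS_START_ERROR). */
static const char *const hypo_apr_msgs[] = {
    "APR: unknown APR error",
    "APR: invalid handle",
};

const char *hypo_strerror(hypo_status_t err)
{
    size_t idx;

    if (err == 0)                          /* test success first */
        return "No error was found";
    if (err > 0 && err < HYPO_OS_START_ERROR)
        return strerror(err);              /* plain errno value */
    if (err >= HYPO_OS_START_ERROR && err < HYPO_OS_START_SYSERR) {
        idx = (size_t)(err - HYPO_OS_START_ERROR);
        if (idx < sizeof hypo_apr_msgs / sizeof hypo_apr_msgs[0])
            return hypo_apr_msgs[idx];
    }
    return "APR doesn't understand this error value";
}
```

 Following the advice above, a caller logs hypo_strerror(rv) on the original
 rv and only then canonicalizes rv to decide what to do next.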

 The other problem with option 1 is that it is a lossy conversion.  For
 example, Windows and OS/2 have a couple hundred error codes, but POSIX errno
 only defines about 50 errno values.  This means that if we convert to a
 canonical error value immediately, there is no way for the programmer to
 get the actual system error.