| <!-- $Id: $ --> |
| |
| <section id="internals"> |
| <title>Rivet Internals</title> |
| <para> |
| This section easily falls out of date, as new code is added, old |
| code is removed, and changes are made. The best place to look |
| is the source code itself. If you are interested in the changes |
| themselves, the Subversion revision control system |
| (<command>svn</command>) can provide you with information about |
| what has been happening with the code. |
| </para> |
| <section> |
| <title>Rivet approach to Apache Multiprocessing Models</title> |
| <para> |
| The Apache HTTP web server has an extremely modular architecture |
| that made it very popular among web developers. Most of the server |
| features can be implemented in external modules, including some of |
| the way the server interfaces to the operative system. The multiprocessing |
| modules are meant to provide different models for distributing the |
| server workload but also to cope with different operative systems |
| having their specific architectures and services. |
| </para> |
| <para> |
| From the very beginning mod_rivet was designed to work with |
| the <ulink url="&apachedoc-prefork;">prefork MPM</ulink> |
| MPM (Multi Processing Module) which assumes the OS to have 'fork' capabilities. |
| This prerequisite basically restricted mod_rivet to work only with |
| Unix-like operative systems. Starting with version 3.0 we reorganized |
| mod_rivet to offer a design that could work together with more MPM and |
| hopefully pave the way to support different OS that have no 'fork' |
| call. At the same time we tried to preserve some of the basic |
| features of mod_rivet when working with the prefork MPM, chiefly the feature of |
| the Unix fork system call of 'cloning' a parent process |
| memory into its child, thus allowing fast initialization of interpreters. |
| </para> |
| <para> |
| The central design of mod_rivet now relies on the idea of <quote>MPM bridges</quote>, |
| loadable modules that are responsible to adapt the module procedural design to |
| a given class of Apache MPMs. This design is open to the development of more |
| MPM bridges coping with different multi-processing models but also to the development of |
| different approaches to resource consumption and workload balance. Currently we have 3 bridges: |
| </para> |
| <itemizedlist> |
| <listitem>rivet_prefork_mpm.c: a bridge for the prefork MPM</listitem> |
| <listitem>rivet_worker_mpm.c: a threaded bridge creating a pool of threads |
| each running Tcl interpreters and communicating with the worker MPM threads |
| through a thread safe queue. This bridge is needed by the worker MPM.</listitem> |
| <listitem>rivet_lazy_mpm.c: a threaded bridge where Tcl threads are |
| started <quote>on demand</quote>. The bridge creates no threads and Tcl interpreters |
| at start up and only when requests come in Tcl execution threads are created. |
| This bridge is explained in detail in the <xref linkend="lazybridge">next section</xref>. |
| Since the resource demand at startup is minimal this bridge should suit |
| development machines that go through frequent web server restarts.</listitem> |
| </itemizedlist> |
| </section> |
| <section> |
| <title>mod_rivet MPM Bridge callbacks</title> |
| <para> |
| A bridge is a loadable library implementing different ways to handle |
| specific features needed to mod_rivet. It was originally meant as a way |
| to handle the prefork/worker/event MPM specificities, at the same time |
| avoiding the need to stuff the |
| code with conditional statements that would have implied useless complexity (an |
| instance of the Apache web server can run only an MPM at a time), |
| error prone programming and performance costs. |
| New bridges could be imagined also to implement different models of workload |
| and resource management (like the resources demanded by the Tcl interpreters). |
| We designed an interface between the core of mod_rivet and its MPM bridges |
| based on a set of functions defined in the rivet_bridge_table structure. |
| </para> |
| <programlisting>typedef struct _mpm_bridge_table { |
| RivetBridge_ServerInit *mpm_server_init; |
| RivetBridge_ChildInit *mpm_child_init; |
| RivetBridge_Request *mpm_request; |
| RivetBridge_Finalize *mpm_finalize; |
| RivetBridge_Exit_Handler *mpm_exit_handler; |
| RivetBridge_Thread_Interp *mpm_thread_interp; |
| } rivet_bridge_table;</programlisting> |
| <para> |
| <itemizedlist> |
| <listitem> |
| <emphasis>mpm_server_init</emphasis>: pointer to any |
| specific server inititalization function. This field can be NULL |
| if no bridge specific initialization is needed. The core of |
| mod_rivet runs the <command>ServerInitScript</command> before |
| calling this function. |
| </listitem> |
| <listitem><emphasis>mpm_child_init</emphasis>: Bridge specific |
| child process initialization. If the pointer is assigned with |
| a non-NULL value the function is called by Rivet_ChildInit. |
| </listitem> |
| <listitem> |
| <emphasis>mpm_request</emphasis>: This pointer must |
| be a valid function pointer to the content generator |
| implemented by the bridge. If the pointer is not defined the Apache |
| web server will stop at start up. This condition is motivated by |
| the need of avoiding useless testing of the pointer. The fundamental |
| purpose of a content generator module (like mod_rivet) is to respond |
| to requests creating content, thus whatever it is |
| a content generating function must exist (during the early stages of |
| development you can create a simple test function for that). In a |
| threaded MPM this function typically prepares the request processing |
| stuffing somewhere the pointer to the request_rec structure |
| passed by the web server and then it calls some method to communicate |
| these data to the Tcl execution thread waiting for result to be |
| returned. The <quote>prefork</quote> bridge is an exception since there |
| are no threads and the bridge calls directly Rivet_SendContent |
| </listitem> |
| <listitem> |
| <emphasis>mpm_finalize</emphasis>: pointer to a finalization |
| function called during a child process exit. This function is registered |
| as child process memory pool cleanup function. If the pointer is NULL |
| the pool is given a default cleanup function (apr_pool_cleanup_null) |
| defined in src/mod_rivet/mod_rivet.c. For instance the finalize function |
| in the <emphasis>worker</emphasis> MPM bridge notifies |
| a supervisor thread demanding the whole pool of threads running Tcl |
| interpreters to orderly exit. This pointer can be NULL if the bridge |
| has no special need when a child process must exit (unlikely if you have |
| multiple threads running) |
| </listitem> |
| <listitem> |
| <emphasis>mpm_exit_handler</emphasis>: mod_rivet replaces |
| the core <command>exit</command> command with a new one |
| (<command>::rivet::exit</command>). This command must handle |
| the process exit in the best possible way for the bridge and the |
| threading model it implements (for the 2 current threaded bridges this implies |
| signaling the threads to exit). The <command>::rivet::exit</command> |
| actually doesn't terminate the process, but interrupts execution |
| returning a specific error code commands <command>::rivet::catch</command> |
| and <command>::rivet::try</command> can detect. Before the process is terminated |
| the <command>AbortScript</command> script is fired and <command>::rivet::abort_code</command> |
| returns a message describing the exit condition. For instance |
| the <emphasis>worker</emphasis> MPM bridge the finalize function |
| is called after the current thread itself is set up for termination. |
| See function Rivet_ExitCmd in |
| <ulink url="https://svn.apache.org/repos/asf/tcl/rivet/trunk/src/mod_rivet_ng/rivetCore.c">rivetCore.c</ulink> |
| to have details on how and at what stage this callback is invoked. |
| </listitem> |
| <listitem> |
| <emphasis>mpm_thread_interp</emphasis> must be a function returning |
| the interpreter object (a pointer to record of type |
| <command>rivet_thread_interp</command>) associated |
| to a given configuration as stored in a <command>rivet_server_conf*</command> |
| object. This element was temporarily introduced in the |
| <command>mpm_bridge_table</command> table and should be accessed |
| through the macro RIVET_PEEK_INTERP. |
| <programlisting>interp_obj = RIVET_PEEK_INTERP(private,private->conf);</programlisting> |
| Every bridge implementation should have its own way to store interpreter data and manage their |
| status. So this macro (and associated function) should hide from the module core function |
| the specific approach followed in a particular bridge |
| </listitem> |
| </itemizedlist> |
| </para> |
| </section> |
| |
| <section> |
| <title>Server Initialization and MPM Bridge</title> |
| <para> |
| </para> |
| </section> |
| <section> |
| <title>RivetChan</title> |
| <para> |
| The <structname>RivetChan</structname> system was created in |
| order to have an actual Tcl channel that we could redirect |
| standard output to. This enables us use, for instance, the |
| regular <command>puts</command> command in .rvt pages. It |
| works by creating a channel that buffers output, and, at |
| predetermined times, passes it on to Apache's I/O system. |
| Tcl's regular standard output is replaced with an instance of |
| this channel type, so that, by default, output will go to the |
| web page. |
| </para> |
| </section> |
| <section> |
| <title>The <command>global</command> Command</title> |
| <para> |
| Rivet aims to run standard Tcl code with as few surprises as |
| possible. At times this involves some compromises - in this |
| case regarding the <command>global</command> command. The |
| problem is that the command will create truly global |
| variables. If the user is just cut'n'pasting some Tcl code |
| into Rivet, they most likely just want to be able to share the |
| variable in question with other procs, and don't really care |
| if the variable is actually persistant between pages. The |
| solution we have created is to create a proc |
| <command>::request::global</command> that takes the place of |
| the <command>global</command> command in Rivet templates. If |
| you really need a true global variable, use either |
| <command>::global</command> or add the :: namespace qualifier |
| to variables you wish to make global. |
| </para> |
| </section> |
| <section> |
| <title>Page Parsing, Execution and Caching</title> |
| <para> |
| When a Rivet page is requested, it is transformed into an |
| ordinary Tcl script by parsing the file for the <? ?> |
| processing instruction tags. Everything outside these tags |
| becomes a large <command>puts</command> statement, and |
| everything inside them remains Tcl code. |
| </para> |
| <para> |
| Each .rvt file is evaluated in its own |
| <constant>::request</constant> namespace, so that it is not |
| necessary to create and tear down interpreters after each |
| page. By running in its own namespace, though, each page will |
| not run afoul of local variables created by other scripts, |
| because they will be deleted automatically when the namespace |
| goes away after Apache finishes handling the request. |
| <note> |
| One current problem with this system is that while |
| variables are garbage collected, file handles are not, so |
| that it is very important that Rivet script authors make |
| sure to close all the files they open. |
| </note> |
| </para> |
| <para> |
| After a script has been loaded and parsed into it's "pure Tcl" |
| form, it is also cached, so that it may be used in the future |
| without having to reload it (and re-parse it) from the disk. |
| The number of scripts stored in memory is configurable. This |
| feature can significantly improve performance. |
| </para> |
| </section> |
| <section> |
| <title>Extending Rivet by developing C code procedures</title> |
| <para> |
| Rivet endows the Tcl interpreter with new commands |
| serving as interface between the application layer and the |
| Apache web server. Many of these commands |
| are meaningful only when a HTTP request is under way and |
| therefore a request_rec object allocated by the framework |
| is existing and was passed to mod_rivet as argument of a callback. |
| In case commands have to gain access to a valid request_rec |
| object the C procedure must check if such |
| a pointer exists and it's initialized |
| with valid data. For this purpose the procedure handling requests |
| (Rivet_SendContent) makes a copy of such pointer and keeps it |
| in an internal structure. The copy is set to NULL just before |
| returning to the framework, right after mod_rivet's has |
| carried out its request processing. When the pointer copy is NULL |
| the module is outside any request processing and this |
| condition invalidates the execution of |
| many of the Rivet commands. In case they are called |
| (for example in a ChildInitScript, GlobalInitScript, |
| ServerInitScript or ChildExitScript) they fail with a Tcl error |
| you can handle with a <command>catch</command> command. |
| </para> |
| <para> |
| For this purpose in <option>src/rivet.h</option> the macro |
| CHECK_REQUEST_REC was defined accepting two arguments: the thread |
| private data object and the command name. If the pointer is NULL |
| the macro calls Tcl_NoRequestRec and returns TCL_ERROR |
| causing the command to fail. These are the steps to follow |
| in order to write a new C language command for mod_rivet |
| </para> |
| <itemizedlist> |
| <listitem> |
| Define the command and associated C language procedure |
| in src/mod_rivet_ng/rivetCore.c using the macro |
| <option>RIVET_OBJ_CMD</option> |
| <programlisting>RIVET_OBJ_CMD("mycmd",Rivet_MyCmd,private)</programlisting> |
| This macro ensures the command is defined as <command>::rivet::mycmd</command> |
| and its ClientData pointer is defined with the thread private data |
| </listitem> |
| <listitem> |
| Add the code of Rivet_MyCmd to src/mod_rivet_ng/rivetCore.c (in case |
| the code resides in a different file also src/Makefile.am should be |
| changed to tell the build system how to compile the code and |
| link it into mod_rivet.so) |
| </listitem> |
| <listitem> |
| If the code must have access to the request record in <command>private->r</command> |
| use the macro THREAD_PRIVATE_DATA in order to claim the thread private data, then |
| check for the validity of the pointer using the macro |
| CHECK_REQUEST_REC(private,"::rivet::<cmd_name>") |
| |
| <programlisting>TCL_CMD_HEADER(Rivet_MyCmd) |
| { |
| /* we have to get the thread private data */ |
| |
| THREAD_PRIVATE_DATA(private) |
| |
| /* if ::rivet::mycmd works within a request processing we have |
| * to check if 'private' is carrying a non null request_rec pointer |
| */ |
| |
| CHECK_REQUEST_REC(private,"::rivet::mycmd"); |
| .... |
| |
| return TCL_OK; |
| }</programlisting> |
| </listitem> |
| <listitem> |
| Add a test for this command in <option>tests/checkfails.tcl</option>. For |
| instance |
| <programlisting>... |
| check_fail no_body |
| check_fail virtual_filename unkn |
| check_fail my_cmd <arg1> <arg2> |
| ....</programlisting> |
| Where <option><arg1> <arg2></option> are optional |
| arguments in case the command has different forms depending on |
| the arguments. Then, if <command>::rivet::mycmd</command> must fail also |
| <option>tests/failtest.tcl</option> should modified as |
| <programlisting>virtual_filename->1 |
| mycmd->1</programlisting> |
| The value associated to the test must be <option>0</option> in case the |
| command doesn't need to test the <command>private->r</command> pointer. |
| </listitem> |
| </itemizedlist> |
| </section> |
| <section> |
| <title>Debugging Rivet and Apache</title> |
| <para> |
| If you are interested in hacking on Rivet, you're welcome to |
| contribute! Invariably, when working with code, things go |
| wrong, and it's necessary to do some debugging. In a server |
| environment like Apache, it can be a bit more difficult to |
| find the right way to do this. Here are some techniques to |
| try. |
| </para> |
| <para> |
| The first thing you should know is that Apache can be launched |
| as a <emphasis>single process</emphasis> with the |
| <option>-X</option> argument: |
| </para> |
| |
| <programlisting>httpd -X</programlisting>. |
| |
| <para> |
| On Linux, one of the first things to try is the system call |
| tracer, <command>strace</command>. You don't even have to |
| recompile Rivet or Apache for this to work. |
| </para> |
| |
| <programlisting>strace -o /tmp/outputfile -S 1000 httpd -X</programlisting> |
| |
| <para> |
| This command will run httpd in the system call tracer, |
| which leaves its output (there is potentially a lot of it) in |
| <filename>/tmp/outputfile</filename>. The <option>-S</option> |
| option tells <command></command>strace to only record the |
| first 1000 bytes of a syscall. Some calls such as |
| <function>write</function> can potentially be much longer than |
| this, so you may want to increase this number. The results |
| are a list of all the system calls made by the program. You |
| want to look at the end, where the failure presumably occured, |
| to see if you can find anything that looks like an error. If |
| you're not sure what to make of the results, you can always |
| ask on the Rivet development mailing list. |
| </para> |
| |
| <para> |
| If <command>strace</command> (or its equivalent on your |
| operating system) doesn't answer your question, it may be time |
| to debug Apache and Rivet. To do this, you will need to rebuild mod_rivet. |
| First of all you have to configure the build by running the |
| <command>./configure</command> script with the |
| <option>-enable-symbols</option> option and after you have |
| set the CFLAGS and LDFLAGS environment variables |
| </para> |
| |
| <programlisting>export CFLAGS="-g -O0" |
| export LDFLAGS="-g" |
| ./configure --enable-symbols ...... |
| make |
| make install</programlisting> |
| <para> |
| Arguments to <command>./configure</command> must fit your Apache HTTP |
| web server installation. See the output produced by |
| </para> |
| <programlisting>./configure --help</programlisting> |
| <para> |
| And check the <xref linkend="installation">installation</xref> page to |
| have further information. |
| Since it's easier to debug a single process, we'll still run |
| Apache in single process mode with -X: |
| </para> |
| |
| <programlisting>@ashland [~] $ gdb /usr/sbin/apache.dbg |
| GNU gdb 5.3-debian |
| Copyright 2002 Free Software Foundation, Inc. |
| GDB is free software, covered by the GNU General Public License, and you are |
| welcome to change it and/or distribute copies of it under certain conditions. |
| Type "show copying" to see the conditions. |
| There is absolutely no warranty for GDB. Type "show warranty" for details. |
| This GDB was configured as "powerpc-linux"... |
| (gdb) run -X |
| Starting program: /usr/sbin/apache.dbg -X |
| [New Thread 16384 (LWP 13598)] |
| . |
| . |
| .</programlisting> |
| |
| <para> |
| When your apache session is up and running, you can request a |
| web page with the browser, and see where things go wrong (if |
| you are dealing with a crash, for instance). |
| </para> |
| </section> |
| </section> |