| <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" |
| "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> |
| |
| <html xmlns="http://www.w3.org/1999/xhtml"> |
| <head> |
| <meta name="generator" content="HTML Tidy, see www.w3.org" /> |
| |
| <title>The Apache EBCDIC Port</title> |
| </head> |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| |
| <body bgcolor="#FFFFFF" text="#000000" link="#0000FF" |
| vlink="#000080" alink="#FF0000"> |
| <!--#include virtual="header.html" --> |
| |
| <blockquote> |
| <strong>Warning:</strong> This document has not been updated |
| to take into account changes made in the 2.0 version of the |
| Apache HTTP Server. Some of the information may still be |
| relevant, but please use it with care. |
| </blockquote> |
| |
| <h1 align="CENTER">Overview of the Apache EBCDIC Port</h1> |
| |
| <p>Version 1.3 of the Apache HTTP Server is the first version |
| which includes a port to a (non-ASCII) mainframe machine which |
| uses the EBCDIC character set as its native codeset.<br /> |
| (It is the SIEMENS family of mainframes running the <a |
| href="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD |
| operating system</a>. This mainframe OS nowadays features a |
| SVR4-derived POSIX subsystem).</p> |
| |
| <p>The port was started initially to</p> |
| |
| <ul> |
| <li>prove the feasibility of porting <a |
| href="http://dev.apache.org/">the Apache HTTP server</a> to |
| this platform</li> |
| |
| <li>find a "worthy and capable" successor for the venerable |
| <a href="http://www.w3.org/Daemon/">CERN-3.0</a> daemon |
| (which was ported a couple of years ago), and to</li> |
| |
| <li>prove that Apache's preforking process model can on this |
| platform easily outperform the accept-fork-serve model used |
| by CERN by a factor of 5 or more.</li> |
| </ul> |
| <br /> |
| <br /> |
| |
| |
| <p>This document serves as a rationale to describe some of the |
| design decisions of the port to this machine.</p> |
| |
| <h2 align="CENTER">Design Goals</h2> |
| |
| <p>One objective of the EBCDIC port was to maintain enough |
| backwards compatibility with the (EBCDIC) CERN server to make |
| the transition to the new server attractive and easy. This |
| required the addition of a configurable method to define |
| whether a HTML document was stored in ASCII (the only format |
| accepted by the old server) or in EBCDIC (the native document |
| format in the POSIX subsystem, and therefore the only realistic |
| format in which the other POSIX tools like grep or sed could |
| operate on the documents). The current solution to this is a |
| "pseudo-MIME-format" which is intercepted and interpreted by |
| the Apache server (see below). Future versions might solve the |
| problem by defining an "ebcdic-handler" for all documents which |
| must be converted.</p> |
| |
| <h2 align="CENTER">Technical Solution</h2> |
| |
| <p>Since all Apache input and output is based upon the BUFF |
| data type and its methods, the easiest solution was to add the |
| conversion to the BUFF handling routines. The conversion must |
| be settable at any time, so a BUFF flag was added which defines |
| whether a BUFF object has currently enabled conversion or not. |
| This flag is modified at several points in the HTTP |
| protocol:</p> |
| |
| <ul> |
| <li><strong>set</strong> before a request is received |
| (because the request and the request header lines are always |
| in ASCII format)</li> |
| |
| <li><strong>set/unset</strong> when the request body is |
| received - depending on the content type of the request body |
| (because the request body may contain ASCII text or a binary |
| file)</li> |
| |
| <li><strong>set</strong> before a reply header is sent |
| (because the response header lines are always in ASCII |
| format)</li> |
| |
| <li><strong>set/unset</strong> when the response body is sent |
| - depending on the content type of the response body (because |
| the response body may contain text or a binary file)</li> |
| </ul> |
| <br /> |
| <br /> |
| |
| |
| <h2 align="CENTER">Porting Notes</h2> |
| |
| <ol> |
| <li> |
| The relevant changes in the source are #ifdef'ed into two |
| categories: |
| |
| <dl> |
| <dt><code><strong>#ifdef |
| CHARSET_EBCDIC</strong></code></dt> |
| |
| <dd>Code which is needed for any EBCDIC based machine. |
| This includes character translations, differences in |
| contiguity of the two character sets, flags which |
| indicate which part of the HTTP protocol has to be |
| converted and which part doesn't <em>etc.</em></dd> |
| |
| <dt><code><strong>#ifdef _OSD_POSIX</strong></code></dt> |
| |
| <dd>Code which is needed for the SIEMENS BS2000/OSD |
| mainframe platform only. This deals with include file |
| differences and socket implementation topics which are |
| only required on the BS2000/OSD platform.</dd> |
| </dl> |
| </li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>The possibility to translate between ASCII and EBCDIC at |
| the socket level (on BS2000 POSIX, there is a socket option |
| which supports this) was intentionally <em>not</em> chosen, |
| because the byte stream at the HTTP protocol level consists |
| of a mixture of protocol related strings and non-protocol |
| related raw file data. HTTP protocol strings are always |
| encoded in ASCII (the GET request, any Header: lines, the |
| chunking information <em>etc.</em>) whereas the file transfer |
| parts (<em>i.e.</em>, GIF images, CGI output <em>etc.</em>) |
| should usually be just "passed through" by the server. This |
| separation between "protocol string" and "raw data" is |
| reflected in the server code by functions like bgets() or |
| rvputs() for strings, and functions like bwrite() for binary |
| data. A global translation of everything would therefore be |
| inadequate.<br /> |
| (In the case of text files of course, provisions must be |
| made so that EBCDIC documents are always served in |
| ASCII)</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>This port therefore features a built-in protocol level |
| conversion for the server-internal strings (which the |
| compiler translated to EBCDIC strings) and thus for all |
| server-generated documents. The hard coded ASCII escapes \012 |
| and \015 which are ubiquitous in the server code are an |
| exception: they are already the binary encoding of the ASCII |
| \n and \r and must not be converted to ASCII a second time. |
| This exception is only relevant for server-generated strings; |
| and <em>external</em> EBCDIC documents are not expected to |
| contain ASCII newline characters.</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>By examining the call hierarchy for the BUFF management |
| routines, I added an "ebcdic/ascii conversion layer" which |
| would be crossed on every puts/write/get/gets, and a |
| conversion flag which allowed enabling/disabling the |
| conversions on-the-fly. Usually, a document crosses this |
| layer twice from its origin source (a file or CGI output) to |
| its destination (the requesting client): <samp>file -> |
| Apache</samp>, and <samp>Apache -> client</samp>.<br /> |
| The server can now read the header lines of a CGI-script |
| output in EBCDIC format, and then find out that the remainder |
| of the script's output is in ASCII (like in the case of the |
| output of a WWW Counter program: the document body contains a |
| GIF image). All header processing is done in the native |
| EBCDIC format; the server then determines, based on the type |
| of document being served, whether the document body (except |
| for the chunking information, of course) is in ASCII already |
| or must be converted from EBCDIC.</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li> |
| For Text documents (MIME types text/plain, text/html |
| <em>etc.</em>), an implicit translation to ASCII can be |
| used, or (if the users prefer to store some documents in |
| raw ASCII form for faster serving, or because the files |
| reside on a NFS-mounted directory tree) can be served |
| without conversion.<br /> |
| <strong>Example:</strong> |
| |
| <blockquote> |
| to serve files with the suffix .ahtml as a raw ASCII |
| text/html document without implicit conversion (and |
| suffix .ascii as ASCII text/plain), use the directives: |
| <pre> |
| AddType text/x-ascii-html .ahtml |
| AddType text/x-ascii-plain .ascii |
| |
| </pre> |
| </blockquote> |
| Similarly, any text/foo MIME type can be served as "raw |
| ASCII" by configuring a MIME type "text/x-ascii-foo" for it |
| using AddType. |
| </li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>Non-text documents are always served "binary" without |
| conversion. This seems to be the most sensible choice for, |
| .<em>e.g.</em>, GIF/ZIP/AU file types. This of course |
| requires the user to copy them to the mainframe host using |
| the "rcp -b" binary switch.</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>Server parsed files are always assumed to be in native |
| (<em>i.e.</em>, EBCDIC) format as used on the machine, and |
| are converted after processing.</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| |
| <li>For CGI output, the CGI script determines whether a |
| conversion is needed or not: by setting the appropriate |
| Content-Type, text files can be converted, or GIF output can |
| be passed through unmodified. An example for the latter case |
| is the wwwcount program which we ported as well.</li> |
| |
| <li style="list-style: none"><br /> |
| </li> |
| </ol> |
| <br /> |
| <br /> |
| |
| |
| <h2 align="CENTER">Document Storage Notes</h2> |
| |
| <h3 align="CENTER">Binary Files</h3> |
| |
| <p>All files with a <samp>Content-Type:</samp> which does not |
| start with <samp>text/</samp> are regarded as <em>binary |
| files</em> by the server and are not subject to any conversion. |
| Examples for binary files are GIF images, gzip-compressed files |
| and the like.</p> |
| |
| <p>When exchanging binary files between the mainframe host and |
| a Unix machine or Windows PC, be sure to use the ftp "binary" |
| (<samp>TYPE I</samp>) command, or use the |
| <samp>rcp -b</samp> command from the mainframe host (the |
| -b switch is not supported in unix rcp's).</p> |
| |
| <h3 align="CENTER">Text Documents</h3> |
| |
| <p>The default assumption of the server is that Text Files |
| (<em>i.e.</em>, all files whose <samp>Content-Type:</samp> |
| starts with <samp>text/</samp>) are stored in the native |
| character set of the host, EBCDIC.</p> |
| |
| <h3 align="CENTER">Server Side Included Documents</h3> |
| |
| <p>SSI documents must currently be stored in EBCDIC only. No |
| provision is made to convert it from ASCII before |
| processing.</p> |
| |
| <h2 align="CENTER">Apache Modules' Status</h2> |
| |
| <table border="1" align="middle"> |
| <tr> |
| <th>Module</th> |
| |
| <th>Status</th> |
| |
| <th>Notes</th> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">http_core</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_access</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_actions</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_alias</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_asis</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_auth</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_auth_anon</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_auth_dbm</td> |
| |
| <td align="CENTER">?</td> |
| |
| <td>with own libdb.a</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_autoindex</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_cern_meta</td> |
| |
| <td align="CENTER">?</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_cgi</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_digest</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_dir</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_so</td> |
| |
| <td align="CENTER">-</td> |
| |
| <td>no shared libs</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_env</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_example</td> |
| |
| <td align="CENTER">-</td> |
| |
| <td>(test bed only)</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_expires</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_headers</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_imap</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_include</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_info</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_log_agent</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_log_config</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_log_referer</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_mime</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_mime_magic</td> |
| |
| <td align="CENTER">?</td> |
| |
| <td>not ported yet</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_negotiation</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_proxy</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_rewrite</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td>untested</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_setenvif</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_speling</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_status</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_unique_id</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_userdir</td> |
| |
| <td align="CENTER">+</td> |
| |
| <td> |
| </td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT">mod_usertrack</td> |
| |
| <td align="CENTER">?</td> |
| |
| <td>untested</td> |
| </tr> |
| </table> |
| |
| <h2 align="CENTER">Third Party Modules' Status</h2> |
| |
| <table border="1" align="middle"> |
| <tr> |
| <th>Module</th> |
| |
| <th>Status</th> |
| |
| <th>Notes</th> |
| </tr> |
| |
| <tr> |
| <td align="LEFT"><a |
| href="http://java.apache.org/">mod_jserv</a> </td> |
| |
| <td align="CENTER">-</td> |
| |
| <td>JAVA still being ported.</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT"><a href="http://www.php.net/">mod_php3</a> |
| </td> |
| |
| <td align="CENTER">+</td> |
| |
| <td>mod_php3 runs fine, with LDAP and GD and FreeType |
| libraries</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT"><a |
| href="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html"> |
| mod_put</a> </td> |
| |
| <td align="CENTER">?</td> |
| |
| <td>untested</td> |
| </tr> |
| |
| <tr> |
| <td align="LEFT"><a |
| href="ftp://hachiman.vidya.com/pub/apache/">mod_session</a> |
| </td> |
| |
| <td align="CENTER">-</td> |
| |
| <td>untested</td> |
| </tr> |
| </table> |
| <!--#include virtual="footer.html" --> |
| </body> |
| </html> |
| |