|  | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> | 
|  | <HTML> | 
|  | <HEAD> | 
|  | <TITLE>The Apache EBCDIC Port</TITLE> | 
|  | </HEAD> | 
|  |  | 
|  | <!-- Background white, links blue (unvisited), navy (visited), red (active) --> | 
|  | <BODY | 
|  | BGCOLOR="#FFFFFF" | 
|  | TEXT="#000000" | 
|  | LINK="#0000FF" | 
|  | VLINK="#000080" | 
|  | ALINK="#FF0000" | 
|  | > | 
|  | <!--#include virtual="header.html" --> | 
|  |  | 
|  | <blockquote><strong>Warning:</strong> | 
|  | This document has not been updated to take into account changes | 
|  | made in the 2.0 version of the Apache HTTP Server.  Some of the | 
|  | information may still be relevant, but please use it | 
|  | with care. | 
|  | </blockquote> | 
|  |  | 
|  | <H1 ALIGN="CENTER">Overview of the Apache EBCDIC Port</H1> | 
|  |  | 
|  | <P> | 
|  | Version 1.3 of the Apache HTTP Server is the first version which | 
|  | includes a port to a (non-ASCII) mainframe machine which uses | 
|  | the EBCDIC character set as its native codeset.<BR> | 
|  | (It is the SIEMENS family of mainframes running the | 
|  | <A HREF="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD | 
|  | operating system</A>. This mainframe OS nowadays features a | 
|  | SVR4-derived POSIX subsystem). | 
|  | </P> | 
|  |  | 
|  | <P> | 
|  | The port was started initially to | 
|  | <UL> | 
|  | <LI> prove the feasibility of porting | 
|  | <A HREF="http://dev.apache.org/">the Apache HTTP server</A> | 
|  | to this platform | 
|  | <LI> find a "worthy and capable" successor for the venerable | 
|  | <A HREF="http://www.w3.org/Daemon/">CERN-3.0</A> daemon | 
|  | (which was ported a couple of years ago), and to | 
|  | <LI> prove that Apache's preforking process model can on this platform | 
|  | easily outperform the accept-fork-serve model used by CERN by a | 
|  | factor of 5 or more. | 
|  | </UL> | 
|  | </P> | 
|  |  | 
|  | <P> | 
|  | This document serves as a rationale to describe some of the design | 
|  | decisions of the port to this machine. | 
|  | </P> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Design Goals</H2> | 
|  | <P> | 
|  | One objective of the EBCDIC port was to maintain enough backwards | 
|  | compatibility with the (EBCDIC) CERN server to make the transition to | 
|  | the new server attractive and easy. This required the addition of | 
|  | a configurable method to define whether a HTML document was stored | 
|  | in ASCII (the only format accepted by the old server) or in EBCDIC | 
|  | (the native document format in the POSIX subsystem, and therefore | 
|  | the only realistic format in which the other POSIX tools like grep | 
|  | or sed could operate on the documents). The current solution to | 
|  | this is a "pseudo-MIME-format" which is intercepted and | 
|  | interpreted by the Apache server (see below). Future versions | 
|  | might solve the problem by defining an "ebcdic-handler" for all | 
|  | documents which must be converted. | 
|  | </P> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Technical Solution</H2> | 
|  | <P> | 
|  | Since all Apache input and output is based upon the BUFF data type | 
|  | and its methods, the easiest solution was to add the conversion to | 
|  | the BUFF handling routines. The conversion must be settable at any | 
|  | time, so a BUFF flag was added which defines whether a BUFF object | 
|  | has currently enabled conversion or not. This flag is modified at | 
|  | several points in the HTTP protocol: | 
|  | <UL> | 
|  | <LI><STRONG>set</STRONG> before a request is received (because the | 
|  | request and the request header lines are always in ASCII | 
|  | format) | 
|  |  | 
|  | <LI><STRONG>set/unset</STRONG> when the request body is | 
|  | received - depending on the content type of the request body | 
|  | (because the request body may contain ASCII text or a binary file) | 
|  |  | 
|  | <LI><STRONG>set</STRONG> before a reply header is sent (because the | 
|  | response header lines are always in ASCII format) | 
|  |  | 
|  | <LI><STRONG>set/unset</STRONG> when the response body is | 
|  | sent - depending on the content type of the response body | 
|  | (because the response body may contain text or a binary file) | 
|  | </UL> | 
|  | </P> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Porting Notes</H2> | 
|  | <P> | 
|  | <OL> | 
|  | <LI> | 
|  | The relevant changes in the source are #ifdef'ed into two | 
|  | categories: | 
|  | <DL> | 
|  | <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE> | 
|  | <DD>Code which is needed for any EBCDIC based machine. This | 
|  | includes character translations, differences in | 
|  | contiguity of the two character sets, flags which | 
|  | indicate which part of the HTTP protocol has to be | 
|  | converted and which part doesn't <EM>etc.</EM> | 
|  | <DT><CODE><STRONG>#ifdef _OSD_POSIX</STRONG></CODE> | 
|  | <DD>Code which is needed for the SIEMENS BS2000/OSD | 
|  | mainframe platform only. This deals with include file | 
|  | differences and socket implementation topics which are | 
|  | only required on the BS2000/OSD platform. | 
|  | </DL> | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | The possibility to translate between ASCII and EBCDIC at the | 
|  | socket level (on BS2000 POSIX, there is a socket option which | 
|  | supports this) was intentionally <EM>not</EM> chosen, because | 
|  | the byte stream at the HTTP protocol level consists of a | 
|  | mixture of protocol related strings and non-protocol related | 
|  | raw file data. HTTP protocol strings are always encoded in | 
|  | ASCII (the GET request, any Header: lines, the chunking | 
|  | information <EM>etc.</EM>) whereas the file transfer parts (<EM>i.e.</EM>, GIF | 
|  | images, CGI output <EM>etc.</EM>) should usually be just "passed through" | 
|  | by the server. This separation between "protocol string" and | 
|  | "raw data" is reflected in the server code by functions like | 
|  | bgets() or rvputs() for strings, and functions like bwrite() | 
|  | for binary data. A global translation of everything would | 
|  | therefore be inadequate.<BR> | 
|  | (In the case of text files of course, provisions must be made so | 
|  | that EBCDIC documents are always served in ASCII) | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | This port therefore features a built-in protocol level conversion | 
|  | for the server-internal strings (which the compiler translated to | 
|  | EBCDIC strings) and thus for all server-generated documents. | 
|  | The hard coded ASCII escapes \012 and \015 which are | 
|  | ubiquitous in the server code are an exception: they are | 
|  | already the binary encoding of the ASCII \n and \r and must | 
|  | not be converted to ASCII a second time. This exception is | 
|  | only relevant for server-generated strings; and <EM>external</EM> | 
|  | EBCDIC documents are not expected to contain ASCII newline characters. | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | By examining the call hierarchy for the BUFF management | 
|  | routines, I added an "ebcdic/ascii conversion layer" which | 
|  | would be crossed on every puts/write/get/gets, and a | 
|  | conversion flag which allowed enabling/disabling the | 
|  | conversions on-the-fly. Usually, a document crosses this | 
|  | layer twice from its origin source (a file or CGI output) to | 
|  | its destination (the requesting client): <SAMP>file -> | 
|  | Apache</SAMP>, and <SAMP>Apache -> client</SAMP>.<BR> | 
|  | The server can now read the header | 
|  | lines of a CGI-script output in EBCDIC format, and then find | 
|  | out that the remainder of the script's output is in ASCII | 
|  | (like in the case of the output of a WWW Counter program: the | 
|  | document body contains a GIF image). All header processing is | 
|  | done in the native EBCDIC format; the server then determines, | 
|  | based on the type of document being served, whether the | 
|  | document body (except for the chunking information, of | 
|  | course) is in ASCII already or must be converted from EBCDIC. | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | For Text documents (MIME types text/plain, text/html <EM>etc.</EM>), | 
|  | an implicit translation to ASCII can be used, or (if the | 
|  | users prefer to store some documents in raw ASCII form for | 
|  | faster serving, or because the files reside on a NFS-mounted | 
|  | directory tree) can be served without conversion. | 
|  | <BR> | 
|  | <STRONG>Example:</STRONG><BLOCKQUOTE> | 
|  | to serve files with the suffix .ahtml as a raw ASCII text/html | 
|  | document without implicit conversion (and suffix .ascii | 
|  | as ASCII text/plain), use the directives:<PRE> | 
|  | AddType  text/x-ascii-html  .ahtml | 
|  | AddType  text/x-ascii-plain .ascii | 
|  | </PRE></BLOCKQUOTE> | 
|  | Similarly, any text/XXXX MIME type can be served as "raw ASCII" by | 
|  | configuring a MIME type "text/x-ascii-XXXX" for it using AddType. | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | Non-text documents are always served "binary" without conversion. | 
|  | This seems to be the most sensible choice for, .<EM>e.g.</EM>, GIF/ZIP/AU | 
|  | file types. This of course requires the user to copy them to the | 
|  | mainframe host using the "rcp -b" binary switch. | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | Server parsed files are always assumed to be in native (<EM>i.e.</EM>, | 
|  | EBCDIC) format as used on the machine, and are converted after | 
|  | processing. | 
|  | </LI><BR> | 
|  |  | 
|  | <LI> | 
|  | For CGI output, the CGI script determines whether a conversion is | 
|  | needed or not: by setting the appropriate Content-Type, text files | 
|  | can be converted, or GIF output can be passed through unmodified. | 
|  | An example for the latter case is the wwwcount program which we ported | 
|  | as well. | 
|  | </LI><BR> | 
|  | </OL> | 
|  | </P> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Document Storage Notes</H2> | 
|  | <H3 ALIGN=CENTER>Binary Files</H3> | 
|  | <P> | 
|  | All files with a <SAMP>Content-Type:</SAMP> which does not | 
|  | start with <SAMP>text/</SAMP> are regarded as <EM>binary files</EM> | 
|  | by the server and are not subject to any conversion. | 
|  | Examples for binary files are GIF images, gzip-compressed | 
|  | files and the like. | 
|  | </P> | 
|  | <P> | 
|  | When exchanging binary files between the mainframe host and a | 
|  | Unix machine or Windows PC, be sure to use the ftp "binary" | 
|  | (<SAMP>TYPE I</SAMP>) command, or use the | 
|  | <SAMP>rcp -b</SAMP> command from the mainframe host | 
|  | (the -b switch is not supported in unix rcp's). | 
|  | </P> | 
|  |  | 
|  | <H3 ALIGN=CENTER>Text Documents</H3> | 
|  | <P> | 
|  | The default assumption of the server is that Text Files | 
|  | (<EM>i.e.</EM>, all files whose <SAMP>Content-Type:</SAMP> starts with | 
|  | <SAMP>text/</SAMP>) are stored in the native character | 
|  | set of the host, EBCDIC. | 
|  | </P> | 
|  |  | 
|  | <H3 ALIGN=CENTER>Server Side Included Documents</H3> | 
|  | <P> | 
|  | SSI documents must currently be stored in EBCDIC only. No | 
|  | provision is made to convert it from ASCII before processing. | 
|  | </P> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Apache Modules' Status</H2> | 
|  | <TABLE BORDER ALIGN=middle> | 
|  | <TR> | 
|  | <TH>Module | 
|  | <TH>Status | 
|  | <TH>Notes | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>http_core | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_access | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_actions | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_alias | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_asis | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_auth | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_auth_anon | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_auth_db | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD>with own libdb.a | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_auth_dbm | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD>with own libdb.a | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_autoindex | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_cern_meta | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_cgi | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_digest | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_dir | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_so | 
|  | <TD ALIGN=CENTER>- | 
|  | <TD>no shared libs | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_env | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_example | 
|  | <TD ALIGN=CENTER>- | 
|  | <TD>(test bed only) | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_expires | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_headers | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_imap | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_include | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_info | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_log_agent | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_log_config | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_log_referer | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_mime | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_mime_magic | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD>not ported yet | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_negotiation | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_proxy | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_rewrite | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD>untested | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_setenvif | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_speling | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_status | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_unique_id | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_userdir | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD> | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT>mod_usertrack | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD>untested | 
|  | </TR> | 
|  | </TABLE> | 
|  |  | 
|  | <H2 ALIGN=CENTER>Third Party Modules' Status</H2> | 
|  | <TABLE BORDER ALIGN=middle> | 
|  | <TR> | 
|  | <TH>Module | 
|  | <TH>Status | 
|  | <TH>Notes | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT><A HREF="http://java.apache.org/">mod_jserv</A> | 
|  | <TD ALIGN=CENTER>- | 
|  | <TD>JAVA still being ported. | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT><A HREF="http://www.php.net/">mod_php3</A> | 
|  | <TD ALIGN=CENTER>+ | 
|  | <TD>mod_php3 runs fine, with LDAP and GD and FreeType libraries | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT | 
|  | ><A HREF="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html">mod_put</A> | 
|  | <TD ALIGN=CENTER>? | 
|  | <TD>untested | 
|  | </TR> | 
|  |  | 
|  | <TR> | 
|  | <TD ALIGN=LEFT | 
|  | ><A HREF="ftp://hachiman.vidya.com/pub/apache/">mod_session</A> | 
|  | <TD ALIGN=CENTER>- | 
|  | <TD>untested | 
|  | </TR> | 
|  |  | 
|  | </TABLE> | 
|  |  | 
|  | <!--#include virtual="footer.html" --> | 
|  | </BODY> | 
|  | </HTML> |