| <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"> |
| <HTML> |
| <HEAD> |
| <TITLE>The Apache EBCDIC Port</TITLE> |
| </HEAD> |
| |
| <!-- Background white, links blue (unvisited), navy (visited), red (active) --> |
| <BODY |
| BGCOLOR="#FFFFFF" |
| TEXT="#000000" |
| LINK="#0000FF" |
| VLINK="#000080" |
| ALINK="#FF0000" |
| > |
| <!--#include virtual="header.html" --> |
| <H1 ALIGN="CENTER">Overview of the Apache EBCDIC Port</H1> |
| |
| <P> |
| Version 1.3 of the Apache HTTP Server is the first version which |
| includes a port to a (non-ASCII) mainframe machine which uses |
| the EBCDIC character set as its native codeset.<BR> |
| (It is the SIEMENS family of mainframes running the |
| <A HREF="http://www.siemens.de/servers/bs2osd/osdbc_us.htm">BS2000/OSD |
| operating system</A>. This mainframe OS nowadays features a |
| SVR4-derived POSIX subsystem). |
| </P> |
| |
| <P> |
| The port was started initially to |
| <UL> |
| <LI> prove the feasibility of porting |
| <A HREF="http://dev.apache.org/">the Apache HTTP server</A> |
| to this platform |
| <LI> find a "worthy and capable" successor for the venerable |
| <A HREF="http://www.w3.org/Daemon/">CERN-3.0</A> daemon |
| (which was ported a couple of years ago), and to |
| <LI> prove that Apache's preforking process model can on this platform |
| easily outperform the accept-fork-serve model used by CERN by a |
| factor of 5 or more. |
| </UL> |
| </P> |
| |
| <P> |
| This document serves as a rationale to describe some of the design |
| decisions of the port to this machine. |
| </P> |
| |
| <H2 ALIGN=CENTER>Design Goals</H2> |
| <P> |
| One objective of the EBCDIC port was to maintain enough backwards |
| compatibility with the (EBCDIC) CERN server to make the transition to |
| the new server attractive and easy. This required the addition of |
| a configurable method to define whether a HTML document was stored |
| in ASCII (the only format accepted by the old server) or in EBCDIC |
| (the native document format in the POSIX subsystem, and therefore |
| the only realistic format in which the other POSIX tools like grep |
| or sed could operate on the documents). The current solution to |
| this is a "pseudo-MIME-format" which is intercepted and |
| interpreted by the Apache server (see below). Future versions |
| might solve the problem by defining an "ebcdic-handler" for all |
| documents which must be converted. |
| </P> |
| |
| <H2 ALIGN=CENTER>Technical Solution</H2> |
| <P> |
| Since all Apache input and output is based upon the BUFF data type |
| and its methods, the easiest solution was to add the conversion to |
| the BUFF handling routines. The conversion must be settable at any |
| time, so a BUFF flag was added which defines whether a BUFF object |
| has currently enabled conversion or not. This flag is modified at |
| several points in the HTTP protocol: |
| <UL> |
| <LI><STRONG>set</STRONG> before a request is received (because the |
| request and the request header lines are always in ASCII |
| format) |
| |
| <LI><STRONG>set/unset</STRONG> when the request body is |
| received - depending on the content type of the request body |
| (because the request body may contain ASCII text or a binary file) |
| |
| <LI><STRONG>set</STRONG> before a reply header is sent (because the |
| response header lines are always in ASCII format) |
| |
| <LI><STRONG>set/unset</STRONG> when the response body is |
| sent - depending on the content type of the response body |
| (because the response body may contain text or a binary file) |
| </UL> |
| </P> |
| |
| <H2 ALIGN=CENTER>Porting Notes</H2> |
| <P> |
| <OL> |
| <LI> |
| The relevant changes in the source are #ifdef'ed into two |
| categories: |
| <DL> |
| <DT><CODE><STRONG>#ifdef CHARSET_EBCDIC</STRONG></CODE> |
| <DD>Code which is needed for any EBCDIC based machine. This |
| includes character translations, differences in |
| contiguity of the two character sets, flags which |
| indicate which part of the HTTP protocol has to be |
| converted and which part doesn't <EM>etc.</EM> |
| <DT><CODE><STRONG>#ifdef _OSD_POSIX</STRONG></CODE> |
| <DD>Code which is needed for the SIEMENS BS2000/OSD |
| mainframe platform only. This deals with include file |
| differences and socket implementation topics which are |
| only required on the BS2000/OSD platform. |
| </DL> |
| </LI><BR> |
| |
| <LI> |
| The possibility to translate between ASCII and EBCDIC at the |
| socket level (on BS2000 POSIX, there is a socket option which |
| supports this) was intentionally <EM>not</EM> chosen, because |
| the byte stream at the HTTP protocol level consists of a |
| mixture of protocol related strings and non-protocol related |
| raw file data. HTTP protocol strings are always encoded in |
| ASCII (the GET request, any Header: lines, the chunking |
| information <EM>etc.</EM>) whereas the file transfer parts (<EM>i.e.</EM>, GIF |
| images, CGI output <EM>etc.</EM>) should usually be just "passed through" |
| by the server. This separation between "protocol string" and |
| "raw data" is reflected in the server code by functions like |
| bgets() or rvputs() for strings, and functions like bwrite() |
| for binary data. A global translation of everything would |
| therefore be inadequate.<BR> |
| (In the case of text files of course, provisions must be made so |
| that EBCDIC documents are always served in ASCII) |
| </LI><BR> |
| |
| <LI> |
| This port therefore features a built-in protocol level conversion |
| for the server-internal strings (which the compiler translated to |
| EBCDIC strings) and thus for all server-generated documents. |
| The hard coded ASCII escapes \012 and \015 which are |
| ubiquitous in the server code are an exception: they are |
| already the binary encoding of the ASCII \n and \r and must |
| not be converted to ASCII a second time. This exception is |
| only relevant for server-generated strings; and <EM>external</EM> |
| EBCDIC documents are not expected to contain ASCII newline characters. |
| </LI><BR> |
| |
| <LI> |
| By examining the call hierarchy for the BUFF management |
| routines, I added an "ebcdic/ascii conversion layer" which |
| would be crossed on every puts/write/get/gets, and a |
| conversion flag which allowed enabling/disabling the |
| conversions on-the-fly. Usually, a document crosses this |
| layer twice from its origin source (a file or CGI output) to |
| its destination (the requesting client): <SAMP>file -> |
| Apache</SAMP>, and <SAMP>Apache -> client</SAMP>.<BR> |
| The server can now read the header |
| lines of a CGI-script output in EBCDIC format, and then find |
| out that the remainder of the script's output is in ASCII |
| (like in the case of the output of a WWW Counter program: the |
| document body contains a GIF image). All header processing is |
| done in the native EBCDIC format; the server then determines, |
| based on the type of document being served, whether the |
| document body (except for the chunking information, of |
| course) is in ASCII already or must be converted from EBCDIC. |
| </LI><BR> |
| |
| <LI> |
| For Text documents (MIME types text/plain, text/html <EM>etc.</EM>), |
| an implicit translation to ASCII can be used, or (if the |
| users prefer to store some documents in raw ASCII form for |
| faster serving, or because the files reside on a NFS-mounted |
| directory tree) can be served without conversion. |
| <BR> |
| <STRONG>Example:</STRONG><BLOCKQUOTE> |
| to serve files with the suffix .ahtml as a raw ASCII text/html |
| document without implicit conversion (and suffix .ascii |
| as ASCII text/plain), use the directives:<PRE> |
| AddType text/x-ascii-html .ahtml |
| AddType text/x-ascii-plain .ascii |
| </PRE></BLOCKQUOTE> |
| Similarly, any text/XXXX MIME type can be served as "raw ASCII" by |
| configuring a MIME type "text/x-ascii-XXXX" for it using AddType. |
| </LI><BR> |
| |
| <LI> |
| Non-text documents are always served "binary" without conversion. |
| This seems to be the most sensible choice for, .<EM>e.g.</EM>, GIF/ZIP/AU |
| file types. This of course requires the user to copy them to the |
| mainframe host using the "rcp -b" binary switch. |
| </LI><BR> |
| |
| <LI> |
| Server parsed files are always assumed to be in native (<EM>i.e.</EM>, |
| EBCDIC) format as used on the machine, and are converted after |
| processing. |
| </LI><BR> |
| |
| <LI> |
| For CGI output, the CGI script determines whether a conversion is |
| needed or not: by setting the appropriate Content-Type, text files |
| can be converted, or GIF output can be passed through unmodified. |
| An example for the latter case is the wwwcount program which we ported |
| as well. |
| </LI><BR> |
| </OL> |
| </P> |
| |
| <H2 ALIGN=CENTER>Document Storage Notes</H2> |
| <H3 ALIGN=CENTER>Binary Files</H3> |
| <P> |
| All files with a <SAMP>Content-Type:</SAMP> which does not |
| start with <SAMP>text/</SAMP> are regarded as <EM>binary files</EM> |
| by the server and are not subject to any conversion. |
| Examples for binary files are GIF images, gzip-compressed |
| files and the like. |
| </P> |
| <P> |
| When exchanging binary files between the mainframe host and a |
| Unix machine or Windows PC, be sure to use the ftp "binary" |
| (<SAMP>TYPE I</SAMP>) command, or use the |
| <SAMP>rcp -b</SAMP> command from the mainframe host |
| (the -b switch is not supported in unix rcp's). |
| </P> |
| |
| <H3 ALIGN=CENTER>Text Documents</H3> |
| <P> |
| The default assumption of the server is that Text Files |
| (<EM>i.e.</EM>, all files whose <SAMP>Content-Type:</SAMP> starts with |
| <SAMP>text/</SAMP>) are stored in the native character |
| set of the host, EBCDIC. |
| </P> |
| |
| <H3 ALIGN=CENTER>Server Side Included Documents</H3> |
| <P> |
| SSI documents must currently be stored in EBCDIC only. No |
| provision is made to convert it from ASCII before processing. |
| </P> |
| |
| <H2 ALIGN=CENTER>Apache Modules' Status</H2> |
| <TABLE BORDER ALIGN=middle> |
| <TR> |
| <TH>Module |
| <TH>Status |
| <TH>Notes |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>http_core |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_access |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_actions |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_alias |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_asis |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_auth |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_auth_anon |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_auth_db |
| <TD ALIGN=CENTER>? |
| <TD>with own libdb.a |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_auth_dbm |
| <TD ALIGN=CENTER>? |
| <TD>with own libdb.a |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_autoindex |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_cern_meta |
| <TD ALIGN=CENTER>? |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_cgi |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_digest |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_dir |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_so |
| <TD ALIGN=CENTER>- |
| <TD>no shared libs |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_env |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_example |
| <TD ALIGN=CENTER>- |
| <TD>(test bed only) |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_expires |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_headers |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_imap |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_include |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_info |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_log_agent |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_log_config |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_log_referer |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_mime |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_mime_magic |
| <TD ALIGN=CENTER>? |
| <TD>not ported yet |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_negotiation |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_proxy |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_rewrite |
| <TD ALIGN=CENTER>+ |
| <TD>untested |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_setenvif |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_speling |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_status |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_unique_id |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_userdir |
| <TD ALIGN=CENTER>+ |
| <TD> |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT>mod_usertrack |
| <TD ALIGN=CENTER>? |
| <TD>untested |
| </TR> |
| </TABLE> |
| |
| <H2 ALIGN=CENTER>Third Party Modules' Status</H2> |
| <TABLE BORDER ALIGN=middle> |
| <TR> |
| <TH>Module |
| <TH>Status |
| <TH>Notes |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT><A HREF="http://java.apache.org/">mod_jserv</A> |
| <TD ALIGN=CENTER>- |
| <TD>JAVA still being ported. |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT><A HREF="http://www.php.net/">mod_php3</A> |
| <TD ALIGN=CENTER>+ |
| <TD>mod_php3 runs fine, with LDAP and GD and FreeType libraries |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT |
| ><A HREF="http://hpwww.ec-lyon.fr/~vincent/apache/mod_put.html">mod_put</A> |
| <TD ALIGN=CENTER>? |
| <TD>untested |
| </TR> |
| |
| <TR> |
| <TD ALIGN=LEFT |
| ><A HREF="ftp://hachiman.vidya.com/pub/apache/">mod_session</A> |
| <TD ALIGN=CENTER>- |
| <TD>untested |
| </TR> |
| |
| </TABLE> |
| |
| <!--#include virtual="footer.html" --> |
| </BODY> |
| </HTML> |