<?xml version='1.0' encoding='UTF-8' ?>
<!DOCTYPE manualpage SYSTEM "../style/manualpage.dtd">
<?xml-stylesheet type="text/xsl" href="../style/manual.en.xsl"?>
<!-- $LastChangedRevision$ -->
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<manualpage metafile="cgi.xml.meta">
<parentdocument href="./">How-To / Tutorials</parentdocument>
<title>Apache Tutorial: Dynamic Content with CGI</title>
<section id="intro">
<title>Introduction</title>
<related>
<modulelist>
<module>mod_alias</module>
<module>mod_cgi</module>
<module>mod_cgid</module>
</modulelist>
<directivelist>
<directive module="mod_mime">AddHandler</directive>
<directive module="core">Options</directive>
<directive module="mod_alias">ScriptAlias</directive>
</directivelist>
</related>
<p>The CGI (Common Gateway Interface) defines a way for a web
server to interact with external content-generating programs,
which are often referred to as CGI programs or CGI scripts. It
is a simple way to put dynamic content on
your web site, using whatever programming language you're most
familiar with. This document will be an introduction to setting
up CGI on your Apache web server, and getting started writing
CGI programs.</p>
</section>
<section id="configuring">
<title>Configuring Apache to permit CGI</title>
<p>In order to get your CGI programs to work properly, you'll
need to have Apache configured to permit CGI execution. There
are several ways to do this.</p>
<note type="warning">Note: If Apache has been built with shared module
support you need to ensure that the module is loaded; in your
<code>httpd.conf</code> you need to make sure the
<directive module="mod_so">LoadModule</directive>
directive has not been commented out. A correctly configured directive
may look like this:
<highlight language="config">
LoadModule cgid_module modules/mod_cgid.so
</highlight>
On Windows, or when using a non-threaded MPM such as prefork, a correctly
configured directive may look like this:
<highlight language="config">
LoadModule cgi_module modules/mod_cgi.so
</highlight></note>
<section id="scriptalias">
<title>ScriptAlias</title>
<p>The
<directive module="mod_alias">ScriptAlias</directive>
directive tells Apache that a particular directory is set
aside for CGI programs. Apache will assume that every file in
this directory is a CGI program, and will attempt to execute
it, when that particular resource is requested by a
client.</p>
<p>The <directive module="mod_alias">ScriptAlias</directive>
directive looks like:</p>
<highlight language="config">
ScriptAlias "/cgi-bin/" "/usr/local/apache2/cgi-bin/"
</highlight>
<p>The example shown is from your default <code>httpd.conf</code>
configuration file, if you installed Apache in the default
location. The <directive module="mod_alias">ScriptAlias</directive>
directive is much like the <directive module="mod_alias"
>Alias</directive> directive, which defines a URL prefix that
is mapped to a particular directory. <directive>Alias</directive>
and <directive>ScriptAlias</directive> are usually used for
directories that are outside of the <directive module="core"
>DocumentRoot</directive> directory. The difference between
<directive>Alias</directive> and <directive>ScriptAlias</directive>
is that <directive>ScriptAlias</directive> has the added meaning
that everything under that URL prefix will be considered a CGI
program. So, the example above tells Apache that any request for a
resource beginning with <code>/cgi-bin/</code> should be served from
the directory <code>/usr/local/apache2/cgi-bin/</code>, and should be
treated as a CGI program.</p>
<p>For example, if the URL
<code>http://www.example.com/cgi-bin/test.pl</code>
is requested, Apache will attempt to execute the file
<code>/usr/local/apache2/cgi-bin/test.pl</code>
and return the output. Of course, the file will have to
exist, and be executable, and return output in a particular
way, or Apache will return an error message.</p>
</section>
<section id="nonscriptalias">
<title>CGI outside of ScriptAlias directories</title>
<p>CGI programs are often restricted to <directive module="mod_alias"
>ScriptAlias</directive>'ed directories for security reasons.
In this way, administrators can tightly control who is allowed to
use CGI programs. However, if the proper security precautions are
taken, there is no reason why CGI programs cannot be run from
arbitrary directories. For example, you may wish to let users
have web content in their home directories with the
<directive module="mod_userdir">UserDir</directive> directive.
If they want to have their own CGI programs, but don't have access to
the main <code>cgi-bin</code> directory, they will need to be able to
run CGI programs elsewhere.</p>
<p>There are two steps to allowing CGI execution in an arbitrary
directory. First, the <code>cgi-script</code> handler must be
activated using the <directive
module="mod_mime">AddHandler</directive> or <directive
module="core">SetHandler</directive> directive. Second,
<code>ExecCGI</code> must be specified in the <directive
module="core">Options</directive> directive.</p>
</section>
<section id="options">
<title>Explicitly using Options to permit CGI execution</title>
<p>You could explicitly use the <directive module="core"
>Options</directive> directive, inside your main server configuration
file, to specify that CGI execution is permitted in a particular
directory:</p>
<highlight language="config">
&lt;Directory "/usr/local/apache2/htdocs/somedir"&gt;
Options +ExecCGI
&lt;/Directory&gt;
</highlight>
<p>The above directive tells Apache to permit the execution
of CGI files. You will also need to tell the server what
files are CGI files. The following <directive module="mod_mime"
>AddHandler</directive> directive tells the server to treat all
files with the <code>cgi</code> or <code>pl</code> extension as CGI
programs:</p>
<highlight language="config">
AddHandler cgi-script .cgi .pl
</highlight>
</section>
<section id="htaccess">
<title>.htaccess files</title>
<p>The <a href="htaccess.html"><code>.htaccess</code> tutorial</a>
shows how to activate CGI programs if you do not have
access to <code>httpd.conf</code>.</p>
</section>
<section id="userdir">
<title>User Directories</title>
<p>To allow CGI program execution for any file ending in
<code>.cgi</code> in users' directories, you can use the
following configuration.</p>
<highlight language="config">
&lt;Directory "/home/*/public_html"&gt;
Options +ExecCGI
AddHandler cgi-script .cgi
&lt;/Directory&gt;
</highlight>
<p>If you wish to designate a <code>cgi-bin</code> subdirectory of
a user's directory where everything will be treated as a CGI
program, you can use the following.</p>
<highlight language="config">
&lt;Directory "/home/*/public_html/cgi-bin"&gt;
Options ExecCGI
SetHandler cgi-script
&lt;/Directory&gt;
</highlight>
</section>
</section>
<section id="writing">
<title>Writing a CGI program</title>
<p>There are two main differences between "regular"
programming and CGI programming.</p>
<p>First, all output from your CGI program must be preceded by
a <glossary>MIME-type</glossary> header. This is an HTTP header that tells the client
what sort of content it is receiving. Most of the time, this
will look like:</p>
<example>
Content-type: text/html
</example>
<p>Secondly, your output needs to be in HTML, or some other
format that a browser will be able to display. Most of the
time, this will be HTML, but occasionally you might write a CGI
program that outputs a GIF image, or other non-HTML
content.</p>
<p>Apart from those two things, writing a CGI program will look
a lot like any other program that you might write.</p>
<section id="firstcgi">
<title>Your first CGI program</title>
<p>The following is an example CGI program that prints one
line to your browser. Type in the following, save it to a
file called <code>first.pl</code>, and put it in your
<code>cgi-bin</code> directory.</p>
<highlight language="perl">
#!/usr/bin/perl
print "Content-type: text/html\n\n";
print "Hello, World.";
</highlight>
<p>Even if you are not familiar with Perl, you should be able
to see what is happening here. The first line tells Apache
(or whatever shell you happen to be running under) that this
program can be executed by feeding the file to the
interpreter found at the location <code>/usr/bin/perl</code>.
The second line prints the content-type declaration we
talked about, followed by two newline characters.
This puts a blank line after the header, to indicate the end
of the HTTP headers, and the beginning of the body. The third
line prints the string "Hello, World.". And that's the end
of it.</p>
<p>If you open your favorite browser and tell it to get the
address</p>
<example>
http://www.example.com/cgi-bin/first.pl
</example>
<p>or wherever you put your file, you will see the one line
<code>Hello, World.</code> appear in your browser window.
It's not very exciting, but once you get that working, you'll
have a good chance of getting just about anything working.</p>
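<p>As a minimal sketch of the non-HTML case mentioned earlier, the
following variant sends plain text instead of HTML. The helper name
<code>plain_text_response</code> is an illustrative assumption, not
something provided by Apache or Perl.</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: prefix a body with the header block for plain text.
sub plain_text_response {
    my ($body) = @_;
    # The blank line ("\n\n") separates the headers from the body.
    return "Content-type: text/plain\n\n" . $body;
}

print plain_text_response("Hello, World.\n");
</highlight>
<p>Only the <code>Content-type</code> value changes. The header line
and the blank line after it are still required.</p>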
</section>
</section>
<section id="troubleshoot">
<title>But it's still not working!</title>
<p>There are four basic things that you may see in your browser
when you try to access your CGI program from the web:</p>
<dl>
<dt>The output of your CGI program</dt>
<dd>Great! That means everything worked fine. If the output is correct,
but the browser is not processing it correctly, make sure you have the
correct <code>Content-Type</code> set in your CGI program.</dd>
<dt>The source code of your CGI program or a "POST Method Not
Allowed" message</dt>
<dd>That means that you have not properly configured Apache
to process your CGI program. Reread the section on
<a href="#configuring">configuring
Apache</a> and try to find what you missed.</dd>
<dt>A message starting with "Forbidden"</dt>
<dd>That means that there is a permissions problem. Check the
<a href="#errorlogs">Apache error log</a> and the section below on
<a href="#permissions">file permissions</a>.</dd>
<dt>A message saying "Internal Server Error"</dt>
<dd>If you check the
<a href="#errorlogs">Apache error log</a>, you will probably
find that it says "Premature end of
script headers", possibly along with an error message
generated by your CGI program. In this case, you will want to
check each of the below sections to see what might be
preventing your CGI program from emitting the proper HTTP
headers.</dd>
</dl>
<section id="permissions">
<title>File permissions</title>
<p>Remember that the server does not run as you. That is,
when the server starts up, it is running with the permissions
of an unprivileged user - usually <code>nobody</code>, or
<code>www</code> - and so it will need extra permissions to
execute files that are owned by you. Usually, the way to give
a file sufficient permissions to be executed by <code>nobody</code>
is to give everyone execute permission on the file:</p>
<example>
chmod a+x first.pl
</example>
<p>Also, if your program reads from, or writes to, any other
files, those files will need to have the correct permissions
to permit this.</p>
</section>
<section id="pathinformation">
<title>Path information and environment</title>
<p>When you run a program from your command line, you have
certain information that is passed to the shell without you
thinking about it. For example, you have a <code>PATH</code>,
which tells the shell where it can look for files that you
reference.</p>
<p>When a program runs through the web server as a CGI program,
it may not have the same <code>PATH</code>. Any programs that you
invoke in your CGI program (like <code>sendmail</code>, for
example) will need to be specified by a full path, so that the
shell can find them when it attempts to execute your CGI
program.</p>
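<p>One way to avoid depending on the server's <code>PATH</code> is
sketched below. It assumes the <code>date</code> utility is installed
at <code>/bin/date</code>, so adjust the path for your system.</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: the date utility lives at /bin/date on this system.
my $date_cmd = '/bin/date';

# Set a minimal, known PATH rather than relying on the server's.
$ENV{PATH} = '/bin:/usr/bin';

# Unbuffer STDOUT so our header is sent before the child's output.
$| = 1;

print "Content-type: text/plain\n\n";

# Call the external program by its full path.
system($date_cmd) == 0
    or print "Could not run $date_cmd: $?\n";
</highlight>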
<p>A common manifestation of this is the path to the script
interpreter (often <code>perl</code>) indicated in the first
line of your CGI program, which will look something like:</p>
<highlight language="perl">
#!/usr/bin/perl
</highlight>
<p>Make sure that this is in fact the path to the
interpreter.</p>
<note type="warning">
When editing CGI scripts on Windows, end-of-line characters may be
appended to the interpreter path. Ensure that files are then
transferred to the server in ASCII mode. Failure to do so may
result in "Command not found" warnings from the OS, due to the
unrecognized end-of-line character being interpreted as a part of
the interpreter filename.
</note>
</section>
<section id="missingenv">
<title>Missing environment variables</title>
<p>If your CGI program depends on non-standard <a
href="#env">environment variables</a>, you will need to
ensure that those variables are passed by Apache.</p>
<p>If HTTP headers are missing from the environment, make
sure they are formatted according to
<a href="http://tools.ietf.org/html/rfc2616">RFC 2616</a>,
section 4.2: Header names must start with a letter,
followed only by letters, numbers, or hyphens. Any header
violating this rule will be dropped silently.</p>
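<p>As a sketch of how such variables might be passed, the
<directive module="mod_env">PassEnv</directive> and
<directive module="mod_env">SetEnv</directive> directives from
<module>mod_env</module> can be used. The variable name
<code>APP_MODE</code> and its value are made up for illustration.</p>
<highlight language="config">
# Pass a variable from the environment that httpd was started in
PassEnv LD_LIBRARY_PATH
# Set a variable to a fixed value (APP_MODE is a hypothetical name)
SetEnv APP_MODE testing
</highlight>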
</section>
<section id="syntaxerrors">
<title>Program errors</title>
<p>Most of the time when a CGI program fails, it's because of
a problem with the program itself. This is particularly true
once you get the hang of this CGI stuff, and no longer make
the mistakes described above. The first thing to do is to make
sure that your program runs from the command line before
testing it via the web server. For example, try:</p>
<example>
cd /usr/local/apache2/cgi-bin<br/>
./first.pl
</example>
<p>(Do not call the <code>perl</code> interpreter. The shell
and Apache should find the interpreter using the <a
href="#pathinformation">path information</a> on the first line of
the script.)</p>
<p>The first thing you see written by your program should be
a set of HTTP headers, including the <code>Content-Type</code>,
followed by a blank line. If you see anything else, Apache will
return the <code>Premature end of script headers</code> error if
you try to run it through the server. See <a
href="#writing">Writing a CGI program</a> above for more
details.</p>
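<p>The check described above can also be scripted. The following is a
rough sketch, not an official tool: the helper name
<code>looks_like_cgi_output</code> is hypothetical, and the script
assumes it is run from the directory that contains
<code>first.pl</code>.</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $output begins with a header block that
# includes Content-type and is terminated by a blank line.
sub looks_like_cgi_output {
    my ($output) = @_;
    return 0 unless defined $output;
    # Split headers from body at the first blank line (LF or CRLF).
    my ($headers, $body) = split /\r?\n\r?\n/, $output, 2;
    return 0 unless defined $headers and defined $body;
    # Every header line must look like "Name: value".
    for my $line (split /\r?\n/, $headers) {
        return 0 unless $line =~ /^[A-Za-z][A-Za-z0-9-]*:/;
    }
    return $headers =~ /^Content-type:/mi ? 1 : 0;
}

# Assumption: run from the directory that contains first.pl.
my $output = `./first.pl`;
my $verdict = looks_like_cgi_output($output)
    ? "headers look OK\n"
    : "headers missing or malformed\n";
print $verdict;
</highlight>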
</section>
<section id="errorlogs">
<title>Error logs</title>
<p>The error logs are your friend. Anything that goes wrong
generates a message in the error log. You should always look
there first. If the place where you are hosting your web site
does not permit you access to the error log, you should
probably host your site somewhere else. Learn to read the
error logs, and you'll find that almost all of your problems
are quickly identified, and quickly solved.</p>
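<p>Your own programs can add to the error log as well, because
output that a CGI program writes to <code>STDERR</code> is normally
recorded there rather than sent to the browser. A minimal sketch,
using a hypothetical <code>log_note</code> helper:</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write a line to STDERR, prefixed with the script
# name so that it is easy to find in the error log.
sub log_note {
    my ($message) = @_;
    my $line = "first.pl: $message\n";
    print STDERR $line;
    return $line;
}

print "Content-type: text/plain\n\n";
log_note("starting request");
print "Done.\n";
</highlight>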
</section>
<section id="suexec">
<title>Suexec</title>
<p>The <a href="../suexec.html">suexec</a> support program
allows CGI programs to be run under different user permissions,
depending on which virtual host or user home directory they are
located in. Suexec has very strict permission checking, and any
failure in that checking will result in your CGI programs
failing with <code>Premature end of script headers</code>.</p>
<p>To check if you are using suexec, run <code>apachectl
-V</code> and check for the location of <code>SUEXEC_BIN</code>.
If Apache finds an <program>suexec</program> binary there on startup,
suexec will be activated.</p>
<p>Unless you fully understand suexec, you should not be using it.
To disable suexec, simply remove (or rename) the <program>suexec</program>
binary pointed to by <code>SUEXEC_BIN</code> and then restart the
server. If, after reading about <a href="../suexec.html">suexec</a>,
you still wish to use it, then run <code>suexec -V</code> to find
the location of the suexec log file, and use that log file to
find what policy you are violating.</p>
</section>
</section>
<section id="behindscenes">
<title>What's going on behind the scenes?</title>
<p>As you become more advanced in CGI programming, it will
become useful to understand more about what's happening behind
the scenes, specifically how the browser and server
communicate with one another. Although it's all very
well to write a program that prints "Hello, World.", it's not
particularly useful.</p>
<section id="env">
<title>Environment variables</title>
<p>Environment variables are values that float around you as
you use your computer. They are useful things like your path
(where the computer searches for the actual file
implementing a command when you type it), your username, your
terminal type, and so on. For a full list of your normal,
everyday environment variables, type
<code>env</code> at a command prompt.</p>
<p>During the CGI transaction, the server and the browser
also set environment variables, so that they can communicate
with one another. These are things like the browser type
(Netscape, IE, Lynx), the server type (Apache, IIS, WebSite),
the name of the CGI program that is being run, and so on.</p>
<p>These variables are available to the CGI programmer, and
are half of the story of the client-server communication. The
complete list of required variables is in the
<a href="http://www.ietf.org/rfc/rfc3875">Common Gateway
Interface RFC</a>.</p>
<p>This simple Perl CGI program will display all of the
environment variables that are being passed around. Two
similar programs are included in the
<code>cgi-bin</code>
directory of the Apache distribution. Note that some
variables are required, while others are optional, so you may
see some variables listed that were not in the official list.
In addition, Apache provides many different ways for you to
<a href="../env.html">add your own environment variables</a>
to the basic ones provided by default.</p>
<highlight language="perl">
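#!/usr/bin/perl
use strict;
use warnings;
# End the header block with a blank line before any other output.
print "Content-type: text/html\n\n";
# Sort the names so the listing is in a stable order.
foreach my $key (sort keys %ENV) {
    print "$key --&gt; $ENV{$key}&lt;br&gt;";
}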
</highlight>
</section>
<section id="stdin">
<title>STDIN and STDOUT</title>
<p>Other communication between the server and the client
happens over standard input (<code>STDIN</code>) and standard
output (<code>STDOUT</code>). In a normal everyday context,
<code>STDIN</code> means the keyboard, or a file that a
program is given to act on, and <code>STDOUT</code>
usually means the console or screen.</p>
<p>When you <code>POST</code> a web form to a CGI program,
the data in that form is bundled up into a special format
and gets delivered to your CGI program over <code>STDIN</code>.
The program can then process that data as though it were
coming in from the keyboard, or from a file.</p>
<p>The "special format" is very simple. A field name and
its value are joined together with an equals (=) sign, and
name/value pairs are joined together with an ampersand
(&amp;). Inconvenient characters like spaces, ampersands, and
equals signs are converted into their hex equivalents so that
they don't gum up the works. The whole data string might look
something like:</p>
<example>
name=Rich%20Bowen&amp;city=Lexington&amp;state=KY&amp;sidekick=Squirrel%20Monkey
</example>
<p>You'll sometimes also see this type of string appended to
a URL. When that is done, the server puts that string
into the environment variable called
<code>QUERY_STRING</code>. That's called a <code>GET</code>
request. Your HTML form specifies whether a <code>GET</code>
or a <code>POST</code> is used to deliver the data, by setting the
<code>METHOD</code> attribute in the <code>FORM</code> tag.</p>
<p>Your program is then responsible for splitting that string
up into useful information. Fortunately, there are libraries
and modules available to help you process this data, as well
as handle other aspects of your CGI program.</p>
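<p>To illustrate what that splitting involves, here is a hand-rolled
sketch that handles both delivery methods. The helper name
<code>parse_form_data</code> is hypothetical, and the code ignores
details such as repeated field names and character encodings.</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split "a=1&amp;b=2" into a hash of decoded values.
sub parse_form_data {
    my ($data) = @_;
    my %fields;
    for my $pair (split /&amp;/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name and length $name;
        $value = '' unless defined $value;
        for ($name, $value) {
            tr/+/ /;                              # "+" also encodes a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # "%20" becomes " "
        }
        $fields{$name} = $value;  # a repeated name keeps only its last value
    }
    return %fields;
}

# POST data arrives on STDIN; GET data arrives in QUERY_STRING.
my $method = $ENV{REQUEST_METHOD} || 'GET';
my $data = '';
if ($method eq 'POST') {
    read(STDIN, $data, $ENV{CONTENT_LENGTH} || 0);
} else {
    $data = $ENV{QUERY_STRING} || '';
}

my %form = parse_form_data($data);

print "Content-type: text/plain\n\n";
print "$_ = $form{$_}\n" for sort keys %form;
</highlight>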
</section>
</section>
<section id="libraries">
<title>CGI modules/libraries</title>
<p>When you write CGI programs, you should consider using a
code library, or module, to do most of the grunt work for you.
This leads to fewer errors, and faster development.</p>
<p>If you're writing CGI programs in Perl, modules are
available on <a href="http://www.cpan.org/">CPAN</a>. The most
popular module for this purpose is <code>CGI.pm</code>. You might
also consider <code>CGI::Lite</code>, which implements a minimal
set of functionality, which is all you need in most programs.</p>
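<p>A short sketch of what using <code>CGI.pm</code> might look like
follows. It assumes the module is installed, and the form field
<code>name</code> and the fallback value are invented for the
example.</p>
<highlight language="perl">
#!/usr/bin/perl
use strict;
use warnings;
use CGI;   # install from CPAN if your Perl does not include it

my $query = CGI-&gt;new;

# param() decodes both GET and POST data; in scalar context it
# returns one value for the named field, or undef if it is absent.
my $name = $query-&gt;param('name');
$name = 'stranger' unless defined $name;

# header() returns the Content-Type line followed by a blank line.
print $query-&gt;header('text/plain');
print "Hello, $name.\n";
</highlight>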
<p>If you're writing CGI programs in C, there are a variety of
options. One of these is the <code>CGIC</code> library, from
<a href="http://www.boutell.com/cgic/"
>http://www.boutell.com/cgic/</a>.</p>
</section>
<section id="moreinfo">
<title>For more information</title>
<p>The current CGI specification is available in the
<a href="http://www.ietf.org/rfc/rfc3875">Common Gateway
Interface RFC</a>.</p>
<p>When you post a question about a CGI problem that you're
having, whether to a mailing list or to a newsgroup, make sure
you provide enough information: what happened, what you
expected to happen, how the two differed, what server you're
running, what language your CGI program is written in, and, if
possible, the offending code. This will
make finding your problem much simpler.</p>
<p>Note that questions about CGI problems should <strong>never</strong>
be posted to the Apache bug database unless you are sure you
have found a problem in the Apache source code.</p>
</section>
</manualpage>