| =head1 NAME |
| |
| mod_perl_tuning - mod_perl performance tuning |
| |
| =head1 DESCRIPTION |
| |
| Described here are examples and hints on how to configure a mod_perl |
| enabled Apache server, concentrating on tips for configuration for |
| high-speed performance. The primary way to achieve maximal |
| performance is to reduce the resources consumed by the mod_perl |
| enabled HTTPD processes. |
| |
| This document assumes familiarity with Apache configuration directives, |
| some familiarity with the mod_perl configuration directives, and that |
| you have already built and installed a mod_perl enabled Apache server. |
| Please also read the mod_perl documentation that comes with mod_perl |
| for programming tips. Some configurations below use features from |
| mod_perl version 1.03 which were not present in earlier versions. |
| |
| These performance tuning hints are collected from my experiences in |
| setting up and running servers for handling large promotional sites, |
| such as The Weather Channel's "Blimp Site-ings" game, the MSIE 4.0 |
| "Subscribe to Win" game, and the MSN Million Dollar Madness game. |
| |
| =head1 BASIC CONFIGURATION |
| |
| The basic configuration for mod_perl is as follows. In the |
| F<httpd.conf> file, I add configuration parameters to make the |
| C<http://www.domain.com/programs> URL be the base location for all |
| mod_perl programs. Thus, access to |
| C<http://www.domain.com/programs/printenv> will run the printenv |
| script, as we'll see below. Also, any *.perl file will be interpreted |
| as a mod_perl program just as if it were in the programs directory, |
| and *.rperl will be mod_perl, but I<without> any HTTP headers |
| automatically sent; you must send them explicitly. If you don't want |
| these last two sections, just leave them out of your configuration. |
| |
| In the configuration files, I use F</var/www> as the C<ServerRoot> |
| directory, and F</var/www/docs> as the C<DocumentRoot>. You will need |
| to change these to match your particular setup. The network address |
| allowed to access perl-status below should also be changed to match |
| yours. |
| |
| Additions to F<httpd.conf>: |
| |
| # put mod_perl programs here |
| # startup.perl loads all functions that we want to use within mod_perl |
| PerlRequire /var/www/perllib/startup.perl |
| <Directory /var/www/docs/programs> |
| AllowOverride None |
| Options ExecCGI |
| SetHandler perl-script |
| PerlHandler Apache::Registry |
| PerlSendHeader On |
| </Directory> |
| |
| # like above, but with PerlSendHeader Off |
| <Directory /var/www/docs/rprograms> |
| AllowOverride None |
| Options ExecCGI |
| SetHandler perl-script |
| PerlHandler Apache::Registry |
| PerlSendHeader Off |
| </Directory> |
| |
| # allow arbitrary *.perl files to be scattered throughout the site. |
| <Files *.perl> |
| SetHandler perl-script |
| PerlHandler Apache::Registry |
| PerlSendHeader On |
| Options +ExecCGI |
| </Files> |
| |
| # like *.perl, but do not send HTTP headers |
| <Files *.rperl> |
| SetHandler perl-script |
| PerlHandler Apache::Registry |
| PerlSendHeader Off |
| Options +ExecCGI |
| </Files> |
| |
| <Location /perl-status> |
| SetHandler perl-script |
| PerlHandler Apache::Status |
| order deny,allow |
| deny from all |
| allow from 204.117.82. |
| </Location> |
| |
| Now, you'll notice that I use a C<PerlRequire> directive to load in the |
| file F<startup.perl>. In that file, I include all of the C<use> |
| statements that occur in any of my mod_perl programs (either from the |
| programs directory, or the *.perl files). Here is an example: |
| |
| #! /usr/local/bin/perl |
| use strict; |
| |
| # load up necessary perl function modules to be able to call from Perl-SSI |
| # files. These objects are reloaded upon server restart (SIGHUP or SIGUSR1) |
| # if PerlFreshRestart is "On" in httpd.conf (as of mod_perl 1.03). |
| |
| # only library-type routines should go in this directory. |
| |
| use lib "/var/www/perllib"; |
| |
| # make sure we are in a sane environment. |
| $ENV{GATEWAY_INTERFACE} =~ /^CGI-Perl/ or die "GATEWAY_INTERFACE not Perl!"; |
| |
| use Apache::Registry (); # for things in the "/programs" URL |
| |
| # pull in things we will use in most requests so it is read and compiled |
| # exactly once |
| use CGI (); CGI->compile(':all'); |
| use CGI::Carp (); |
| use DBI (); |
| use DBD::mysql (); |
| |
| 1; |
| |
| What this does is pull in all of the code used by the programs (but |
| does not C<import> any of the module methods) into the main HTTPD |
| process, which then creates the child processes with the code already |
| in place. You can also put any new modules you like into the |
| F</var/www/perllib> directory and simply C<use> them in your |
| programs. There is no need to put C<use lib "/var/www/perllib";> in |
| all of your programs. You do, however, still need to C<use> the |
| modules in your programs. Perl is smart enough to know it doesn't |
| need to recompile the code, but it does need to C<import> the module |
| methods into your program's name space. |
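| |
| The difference between loading a module and importing from it can be |
| seen in a small standalone script. This is only a sketch: it uses the |
| core C<List::Util> module as a stand-in for your own modules, and the |
| variable names are illustrative. |

```perl
use strict;
use warnings;

# Load the module without importing anything, as startup.perl does.
use List::Util ();

# The module is compiled and recorded in %INC ...
my $loaded = exists $INC{'List/Util.pm'} ? 1 : 0;

# ... but no function name has been installed in our name space yet.
my $before = defined &main::sum ? 1 : 0;

# This is roughly what a later "use List::Util qw(sum);" does:
# require is a no-op because %INC already lists the file,
# and import installs the requested name.
require List::Util;
List::Util->import('sum');
my $after = defined &main::sum ? 1 : 0;

print "loaded=$loaded before=$before after=$after\n";
```

| The second step is cheap because the code is already compiled; only |
| the import happens again. |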
| |
| If you only have a few modules to load, you can use the PerlModule |
| directive to pre-load them with the same effect. |
| |
| The biggest benefit here is that the child process never needs to |
| recompile the code, so it is faster to start, and the child process |
| actually shares the same physical copy of the code in memory due to |
| the way the virtual memory system in modern operating systems works. |
| |
| You will want to replace the C<use> lines above with modules you |
| actually need. |
| |
| =head2 Simple Test Program |
| |
| Here's a sample script called F<printenv> that you can stick in the |
| F<programs> directory to test the functionality of the configuration. |
| |
| #! /usr/local/bin/perl |
| use strict; |
| # print the environment in a mod_perl program under Apache::Registry |
| |
| print "Content-type: text/html\n\n"; |
| |
| print "<HEAD><TITLE>Apache::Registry Environment</TITLE></HEAD>\n"; |
| |
| print "<BODY><PRE>\n"; |
| print map { "$_ = $ENV{$_}\n" } sort keys %ENV; |
| print "</PRE></BODY>\n"; |
| |
| When you run this, check the value of the GATEWAY_INTERFACE variable |
| to see that you are indeed running mod_perl. |
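| |
| If you would rather have a program check this itself, the test can be |
| wrapped in a small helper. This is a minimal sketch; the function name |
| C<running_under_mod_perl> is made up for illustration, and the pattern |
| is the same one used in the F<startup.perl> example. |

```perl
use strict;
use warnings;

# Returns 1 when the given environment hash looks like mod_perl,
# 0 otherwise. Takes a hash reference so it can be tested without
# touching the real %ENV.
sub running_under_mod_perl {
    my ($env) = @_;
    my $gi = $env->{GATEWAY_INTERFACE};
    return (defined $gi && $gi =~ /^CGI-Perl/) ? 1 : 0;
}

my $message = running_under_mod_perl(\%ENV)
    ? "running under mod_perl\n"
    : "running as plain CGI (or from the command line)\n";
print $message;
```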
| |
| =head1 REDUCING MEMORY USE |
| |
| As a side effect of using mod_perl, your HTTPD processes will be |
| larger than without it. There is just no way around it, as you have |
| this extra code to support your added functionality. |
| |
| On a very busy site, the number of HTTPD processes can grow to be |
| quite large. For example, on one large site, the typical HTTPD was |
| about 5MB in size. With 30 of these, all of RAM was exhausted, and we |
| started to go to swap. With 60 of these, swapping turned into |
| thrashing, and the whole machine slowed to a crawl. |
| |
| To reduce thrashing, it is necessary to limit the maximum number of |
| HTTPD processes to a number just larger than what will fit into RAM |
| (in this case, 45). The drawback is that when the server is already |
| serving 45 requests, new requests queue up and wait. However, if you |
| let the maximum number of processes grow, new requests start to get |
| served right away, I<but> they take much longer to complete. |
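| |
| The arithmetic behind such a limit can be sketched as follows. The |
| figures are assumptions for illustration: 150MB of usable RAM is |
| inferred from "30 processes of 5MB exhaust RAM" above, and the 2MB of |
| shared memory is invented. The function name is hypothetical, and the |
| model (shared pages counted once) is a simplification. |

```perl
use strict;
use warnings;

# Estimate how many HTTPD processes fit into RAM.
#   $ram_mb    - memory available to the web server (assumed figure)
#   $proc_mb   - total size of one HTTPD process
#   $shared_mb - part of each process shared with the parent (default 0)
sub max_procs_in_ram {
    my ($ram_mb, $proc_mb, $shared_mb) = @_;
    $shared_mb = 0 unless defined $shared_mb;
    my $private_mb = $proc_mb - $shared_mb;
    die "process must have some private memory\n" if $private_mb <= 0;
    # Shared pages are counted once; each process adds only its private part.
    return int(($ram_mb - $shared_mb) / $private_mb);
}

printf "no sharing:  %d processes\n", max_procs_in_ram(150, 5);
printf "2MB shared:  %d processes\n", max_procs_in_ram(150, 5, 2);
```

| The result is only a starting point. The limit of 45 used above was |
| deliberately set somewhat higher than the number that strictly fits. |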
| |
| One way to reduce the amount of real memory taken up by each process |
| is to pre-load commonly used modules into the primary HTTPD process so |
| that the code is shared by all processes. This is accomplished by |
| inserting the C<use Foo ();> lines into the F<startup.perl> file for |
| any C<use Foo;> statement in any commonly used Registry program. The |
| idea is that the operating system's VM subsystem will share the data |
| across the processes. |
| |
| You can also pre-load Apache::Registry programs using the |
| C<Apache::RegistryLoader> module so that the code for these programs |
| is shared by all HTTPD processes as well. |
| |
| B<NOTE>: When you pre-load modules in the startup script, you may |
| need to kill and restart HTTPD for changes to take effect. A simple |
| C<kill -HUP> or C<kill -USR1> will not reload that code unless you |
| have set the C<PerlFreshRestart> configuration parameter in |
| F<httpd.conf> to be "On". |
| |
| =head1 REDUCING THE NUMBER OF LARGE PROCESSES |
| |
| Unfortunately, simply reducing the size of each HTTPD process is not |
| enough on a very busy site. You also need to reduce the quantity of |
| these processes. This reduces memory consumption even more, and |
| results in fewer processes fighting for the attention of the CPU. If |
| you can reduce the number of processes so that they fit into RAM, your |
| response time improves even more. |
| |
| The idea of the techniques outlined below is to offload the normal |
| document delivery (such as static HTML and GIF files) from the |
| mod_perl HTTPD, and let it only handle the mod_perl requests. This |
| way, your large mod_perl HTTPD processes are not tied up delivering |
| simple content when a smaller process could perform the same job more |
| efficiently. |
| |
| In the techniques below where there are two HTTPD configurations, the |
| same httpd executable can be used for both configurations; there is no |
| need to build HTTPD both with and without mod_perl compiled into it. |
| With Apache 1.3 this can be done with the DSO configuration -- just |
| configure one httpd invocation to dynamically load mod_perl and the |
| other not to do so. |
| |
| These approaches work best when most of the requests are for static |
| content rather than mod_perl programs. Log file analysis becomes a bit |
| of a challenge when you have multiple servers running on the same |
| host, since you must log to different files. |
| |
| =head2 TWO MACHINES |
| |
| The simplest way is to put all static content on one machine, and all |
| mod_perl programs on another. The only trick is to make sure all |
| links are properly coded to refer to the proper host. The static |
| content will be served up by lots of small HTTPD processes (configured |
| I<not> to use mod_perl), and the relatively few mod_perl requests |
| can be handled by the smaller number of large HTTPD processes on the |
| other machine. |
| |
| The drawback is that you must maintain two machines, and this can get |
| expensive. For extremely large projects, this is the best way to go. |
| |
| =head2 TWO IP ADDRESSES |
| |
| Similar to above, but one HTTPD runs bound to one IP address, while |
| the other runs bound to another IP address. The only difference is |
| that one machine runs both servers. Total memory usage is reduced |
| because the majority of files are served by the smaller HTTPD |
| processes, so there are fewer large mod_perl HTTPD processes sitting |
| around. |
| |
| This is accomplished using the F<httpd.conf> directive C<BindAddress> |
| to make each HTTPD respond only to one IP address on this host. One |
| will have mod_perl enabled, and the other will not. |
| |
| =head2 TWO PORT NUMBERS |
| |
| If you cannot get two IP addresses, you can also split the HTTPD |
| processes as above by putting one on the standard port 80, and the |
| other on some other port, such as 8042. The only configuration |
| changes will be the C<Port> and log file directives in the httpd.conf |
| file (and also one of them does not have any mod_perl directives). |
| |
| The major flaw with this scheme is that some firewalls will not allow |
| access to the server running on the alternate port, so some people |
| will not be able to access all of your pages. |
| |
| If you use this approach or the one above with dual IP addresses, you |
| probably do not want to have the *.perl and *.rperl sections from the |
| sample configuration above, as this would require that your primary |
| HTTPD server be mod_perl enabled as well. |
| |
| Thanks to Gerd Knops for this idea. |
| |
| =head2 USING ProxyPass WITH TWO SERVERS |
| |
| To overcome the limitation of the alternate port above, you can use |
| dual Apache HTTPD servers with only a slight difference in |
| configuration. Essentially, you set up two servers just as you would |
| with the two-ports-on-one-IP-address method above. However, in your |
| primary HTTPD configuration you add a line like this: |
| |
| ProxyPass /programs http://localhost:8042/programs |
| |
| Where your mod_perl enabled HTTPD is running on port 8042, and has |
| only the directory F<programs> within its DocumentRoot. This assumes |
| that you have included the mod_proxy module in your server when it was |
| built. |
| |
| Now, when you access http://www.domain.com/programs/printenv it will |
| internally be passed through to your HTTPD running on port 8042 as the |
| URL http://localhost:8042/programs/printenv and the result relayed |
| back transparently. To the client, it all seems as if it is just one |
| server running. This can also be used on the dual-host version to |
| hide the second server from view if desired. |
| |
| =begin html |
| <P> |
| A complete configuration example of this technique is provided by |
| two HTTPD configuration files. |
| <A HREF="httpd.conf.txt">httpd.conf</A> is for the main server for all |
| regular pages, and <A HREF="httpd%2bperl.conf.txt">httpd+perl.conf</A> is |
| for the mod_perl programs accessed in the <CODE>/programs</CODE> URL. |
| </P> |
| |
| The directory structure assumes that F</var/www/documents> is the |
| C<DocumentRoot> directory, and the mod_perl programs are in |
| F</var/www/programs> and F</var/www/rprograms>. I start them as |
| follows: |
| |
| daemon httpd |
| daemon httpd -f conf/httpd+perl.conf |
| |
| =end html |
| |
| Thanks to Bowen Dwelle for this idea. |
| |
| =head2 SQUID ACCELERATOR |
| |
| Another approach to reducing the number of large HTTPD processes on |
| one machine is to use an accelerator such as Squid (which can be found |
| at http://squid.nlanr.net/Squid/ on the web) between the clients and |
| your large mod_perl HTTPD processes. The idea here is that squid will |
| handle the static objects from its cache while the HTTPD processes |
| will handle mostly just the mod_perl requests once the cache is |
| primed. This reduces the number of HTTPD processes and thus reduces |
| the amount of memory used. |
| |
| To set this up, just install the current version of Squid (at this |
| writing, this is version 1.1.22) and use the RunAccel script to start |
| it. You will need to reconfigure your HTTPD to use an alternate port, |
| such as 8042, rather than its default port 80. To do this, you can |
| either change the F<httpd.conf> line C<Port> or add a C<Listen> |
| directive to match the port specified in the F<squid.conf> file. |
| Your URLs do not need to change. The benefit of using the C<Listen> |
| directive is that redirected URLs will still use the default port 80 |
| rather than your alternate port, which might reveal your real server |
| location to the outside world and bypass the accelerator. |
| |
| In the F<squid.conf> file, you will probably want to add C<programs> |
| and C<perl> to the C<cache_stoplist> parameter so that these are |
| always passed through to the HTTPD server under the assumption that |
| they always produce different results. |
| |
| This is very similar to the two port, ProxyPass version above, but the |
| Squid cache may be more flexible to fine tune for dynamic documents |
| that do not change on every view. The Squid proxy server also seems |
| to be more stable and robust than the Apache 1.2.4 proxy module. |
| |
| One drawback to using this accelerator is that the logfiles will |
| always report access from IP address 127.0.0.1, which is the local |
| host loopback address. Also, any access permissions or other user |
| tracking that requires the remote IP address will always see the local |
| address. The following code uses a feature of recent mod_perl |
| versions (tested with mod_perl 1.16 and Apache 1.3.3) to trick Apache |
| into logging the real client address and giving that information to |
| mod_perl programs for their purposes. |
| |
| First, in your F<startup.perl> file add the following code: |
| |
| use Apache::Constants qw(OK); |
| |
| sub My::SquidRemoteAddr ($) { |
| my $r = shift; |
| |
| if (my ($ip) = $r->header_in('X-Forwarded-For') =~ /([^,\s]+)$/) { |
| $r->connection->remote_ip($ip); |
| } |
| |
| return OK; |
| } |
| |
| Next, add this to your F<httpd.conf> file: |
| |
| PerlPostReadRequestHandler My::SquidRemoteAddr |
| |
| This will cause every request to have its C<remote_ip> address |
| overridden by the value set in the C<X-Forwarded-For> header added by |
| Squid. Note that if you have multiple proxies between the client and |
| the server, you want the IP address of the last machine before your |
| accelerator. This will be the right-most address in the |
| X-Forwarded-For header (assuming the other proxies append their |
| addresses to this same header, like Squid does.) |
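| |
| The address extraction can be tried outside the server with a small |
| standalone script. This is a sketch only: the function name |
| C<last_forwarded_ip> and the sample addresses are made up, while the |
| pattern is the one used in C<My::SquidRemoteAddr>. |

```perl
use strict;
use warnings;

# Return the right-most address in an X-Forwarded-For value,
# or undef if the header is missing or the pattern does not match.
sub last_forwarded_ip {
    my ($header) = @_;
    return undef unless defined $header;
    my ($ip) = $header =~ /([^,\s]+)$/;
    return $ip;
}

# One address, then a chain in which an earlier proxy added its own entry.
for my $value ('192.0.2.10', '10.1.1.1, 192.0.2.10') {
    printf "%-24s => %s\n", $value, last_forwarded_ip($value);
}
```

| Both sample values yield 192.0.2.10, the address closest to the |
| accelerator. |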
| |
| If you use Apache with mod_proxy as your front end, you can use Ask |
| Bjørn Hansen's mod_proxy_add_forward module from |
| ftp://ftp.netcetera.dk/pub/apache/ to make it insert the |
| C<X-Forwarded-For> header. |
| |
| =head1 SUMMARY |
| |
| To gain maximal performance of mod_perl on a busy site, one must |
| reduce the amount of resources used by the HTTPD to fit within what |
| the machine has available. The best way to do this is to reduce |
| memory usage. If your mod_perl requests are fewer than your static |
| page requests, then splitting the servers into mod_perl and |
| non-mod_perl versions further allows you to tune the amount of |
| resources used by each type of request. Using the C<ProxyPass> |
| directive allows these multiple servers to appear as one to the |
| users. Using the Squid accelerator also achieves this effect, but |
| Squid takes care of deciding when to access the large server |
| automatically. |
| |
| If all of your requests require processing by mod_perl, then the only |
| thing you can really do is throw a I<lot> of memory at your machine |
| and try to tweak the perl code to be as small and lean as possible, |
| and to share the virtual memory pages by pre-loading the code. |
| |
| =head1 AUTHOR |
| |
| This document was written by Vivek Khera. If you need to contact me, |
| just send email to the mod_perl mailing list. |
| |
| This document is copyright (c) 1997-1998 by Vivek Khera. |
| |
| If you have contributions for this document, please post them to the |
| mailing list. Perl POD format is best, but plain text will do, too. |
| |
| If you need assistance, contact the mod_perl mailing list at |
| modperl@perl.apache.org first (send 'subscribe' to modperl-request@apache.org |
| to subscribe). There are lots of people there that can help. Also, |
| check the web pages http://perl.apache.org/ and http://www.apache.org/ |
| for explanations of the configuration options. |
| |
| $Revision$ |
| $Date$ |