| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <html> |
| <head> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>PageSpeed Authorizing and Mapping Domains</title> |
| <link rel="stylesheet" href="doc.css"> |
| </head> |
| <body> |
| <!--#include virtual="_header.html" --> |
| |
| |
| <div id=content> |
| <h1>PageSpeed Authorizing and Mapping Domains</h1> |
| <h2 id="auth_domains">Authorizing domains</h2> |
| <p> |
| In addition to optimizing HTML, PageSpeed optimizes resources |
| (JavaScript, CSS, images), but restricts itself to resources served from |
| domains, with optional paths, that are explicitly listed in the configuration file. |
| For example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedDomain http://example.com |
| ModPagespeedDomain cdn.example.com |
| ModPagespeedDomain http://styles.example.com/css |
| ModPagespeedDomain *.example.org</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed Domain http://example.com; |
| pagespeed Domain cdn.example.com; |
| pagespeed Domain http://styles.example.com/css; |
| pagespeed Domain *.example.org;</pre> |
| </dl> |
| |
| <p> |
| PageSpeed will rewrite resources served from these explicitly |
| listed domains, although in the case of <code>styles.example.com</code> |
| only resources under the <code>css</code> directory will be rewritten. |
| Additionally, it will rewrite resources that are |
| served from the same domain as the HTML file, or are specified as |
| a path relative to the HTML. When resources are rewritten, their |
| domain and path are not changed. However, the leaf name is changed to |
| encode rewriting information that can be used to identify and serve |
| the optimized resource. |
| </p> |
| |
| <p>The leading "http://" is optional; bare hostnames will be interpreted |
| as referring to HTTP. Wildcards can be used in the domain.</p> |
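| |
| <p>Because bare hostnames are interpreted as HTTP, authorizing a domain that is |
| served over HTTPS means writing the scheme explicitly. A minimal sketch, |
| assuming a hypothetical host <code>secure.example.com</code>:</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedDomain https://secure.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed Domain https://secure.example.com;</pre> |
| </dl> |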
| |
| <p> |
| These directives can be used |
| in <a href="configuration#htaccess">location-specific configuration |
| sections</a>. |
| </p> |
| |
| |
| <h2 id="mapping_origin">Mapping origin domains</h2> |
| |
| <p>In order to improve the performance of web pages, PageSpeed |
| must examine and modify the content of resources referenced on those |
| pages. To do that, it must fetch those resources over HTTP, using |
| the URL specified on the HTML page.</p> |
| |
| <p>In some cases, the URL specified in the HTML file is not the best URL to use |
| to fetch the resource. Scenarios where this is a concern include:</p> |
| <ol> |
| <li>The server is behind a load balancer, and it's more efficient to |
| reference the server directly by its IP address, or as 'localhost'.</li> |
| <li>The server has a special DNS configuration.</li> |
| <li>The server is behind a firewall preventing outbound connections.</li> |
| <li>The server is running in a CDN or proxy, and must go back to the |
| origin server for the resources.</li> |
| <li>The server needs to service https requests.</li> |
| </ol> |
| |
| <p>In these situations the remedy is to map the origin domain:</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapOriginDomain origin_to_fetch_from origin_specified_in_html [host_header]</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapOriginDomain origin_to_fetch_from origin_specified_in_html [host_header];</pre> |
| </dl> |
| |
| <p>Wildcards can also be used in the <code>origin_specified_in_html</code>, e.g. |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapOriginDomain localhost *.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapOriginDomain localhost *.example.com;</pre> |
| </dl> |
| |
| <p>The <code>origin_to_fetch_from</code> can include a path after the domain |
| name, e.g.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapOriginDomain localhost/example *.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapOriginDomain localhost/example *.example.com;</pre> |
| </dl> |
| |
| <p>When a path is specified, the source domain is mapped to the destination |
| domain and the source path is mapped to the concatenation of the path from |
| <code>origin_to_fetch_from</code> and the source path. For example, given the |
| above mapping, <code>http://www.example.com/index.html</code> will be mapped |
| to <code>http://localhost/example/index.html</code>.</p> |
| |
| <p>The <code>origin_specified_in_html</code> can specify https but the |
| <code>origin_to_fetch_from</code> can only specify http, e.g.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapOriginDomain http://localhost https://www.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapOriginDomain http://localhost https://www.example.com;</pre> |
| </dl> |
| |
| <p>This directive lets the server accept https requests for |
| <code>www.example.com</code> without requiring an SSL certificate to fetch |
| resources. For example, given the above mapping, and assuming the server is |
| configured for https support, PageSpeed will fetch and optimize resources |
| accessed using |
| <code>https://www.example.com</code>, fetching the resources from |
| <code>http://localhost</code>, which can be the same server process or a |
| different server process. |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapOriginDomain http://localhost https://www.example.com |
| ModPagespeedShardDomain https://www.example.com \ |
| https://example1.cdn.com,https://example2.cdn.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapOriginDomain http://localhost https://www.example.com; |
| pagespeed ShardDomain https://www.example.com |
| https://example1.cdn.com,https://example2.cdn.com;</pre> |
| </dl> |
| |
| <p>In this example the https origin domain is mapped to <code>localhost</code> |
| <em>and</em> <a href="domains#shard">sharding</a> is used to parallelize |
| downloads across hostnames. Note that the shards also specify https.</p> |
| |
| <p>By specifying a source domain in this directive, you are authorizing |
| PageSpeed to rewrite resources found in that domain. For example, in the |
| above directives, '*.example.com' gets authorized for rewrites from HTML files, |
| but 'localhost' does not. See <a href="#auth_domains"><code |
| >Domain</code></a>.</p> |
| |
| <p>When PageSpeed fetches resources from a mapped origin domain, it |
| specifies the source domain in the <code>Host:</code> header in the |
| request. You can override the <code>Host:</code> header value with the |
| optional third parameter <code>host_header</code>. See |
| <a href="#shared_cdn">Mapping Origins with a Shared CDN</a> for |
| an example.</p> |
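| |
| <p>As a smaller illustration of the third parameter, the sketch below fetches |
| resources referenced on <code>www.example.com</code> from <code>localhost</code> |
| while sending a different <code>Host:</code> header. The hostname |
| <code>backend.example.com</code> is an assumption chosen for illustration.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapOriginDomain localhost www.example.com backend.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapOriginDomain localhost www.example.com backend.example.com;</pre> |
| </dl> |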
| |
| <p> |
| See also |
| <a href="#ModPagespeedLoadFromFile"><code>LoadFromFile</code></a> |
| to load origin resources directly from the filesystem and avoid an HTTP |
| connection altogether. |
| </p> |
| |
| <p> |
| These directives can be used |
| in <a href="configuration#htaccess">location-specific configuration |
| sections</a>. |
| </p> |
| |
| |
| <h2 id="mapping_rewrite">Mapping rewrite domains</h2> |
| |
| <p>When PageSpeed rewrites a resource, it updates the HTML to |
| refer to the resource by its new name. Generally PageSpeed leaves |
| the resource at the same origin and path that was originally found in |
| the HTML. However, it is possible to map the domain of rewritten |
| resources. Examples of why this might be desirable include:</p> |
| |
| <ol> |
| <li>To serve static content from cookieless domains, reducing the size of |
| HTTP requests from the browser. See |
| <a target="_blank" href="https://developers.google.com/speed/docs/best-practices/payload">Minimizing Payload</a>.</li> |
| <li>To move content to a Content Delivery Network (CDN).</li> |
| </ol> |
| |
| <p>This is done using the configuration file directive:</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapRewriteDomain domain_to_write_into_html \ |
| domain_specified_in_html</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapRewriteDomain domain_to_write_into_html |
| domain_specified_in_html;</pre> |
| </dl> |
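| |
| <p>For instance, the cookieless-domain case might be configured along these |
| lines. This is only a sketch: <code>static.example.com</code> is a hypothetical |
| hostname, and it is assumed to serve the same files as |
| <code>www.example.com</code>.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapRewriteDomain static.example.com www.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapRewriteDomain static.example.com www.example.com;</pre> |
| </dl> |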
| |
| <p>Wildcards can also be used in the <code>domain_specified_in_html</code>, e.g. |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapRewriteDomain cdn.example.com *example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapRewriteDomain cdn.example.com *example.com;</pre> |
| </dl> |
| |
| <p>The <code>domain_to_write_into_html</code> can include a path after the |
| domain name, e.g.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedMapRewriteDomain cdn.com/example *.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed MapRewriteDomain cdn.com/example *.example.com;</pre> |
| </dl> |
| |
| <p>When a path is specified, the source domain is rewritten to the destination |
| domain and the source path is rewritten to the concatenation of the path from |
| <code>domain_to_write_into_html</code> and the source path. For example, given |
| the above mapping, <code>http://www.example.com/index.html</code> will be |
| rewritten to <code>http://cdn.com/example/index.html</code>.</p> |
| |
| <p class="note" id="equiv_servers"> |
| <strong>Note:</strong> It is the responsibility of the site administrator to |
| ensure that PageSpeed is installed on |
| the <code>domain_to_write_into_html</code>. This might be a separate server, or |
| there may be a single server with multiple domains mapped into it. The files |
| must be accessible via the same path on the destination server as was specified |
| in the HTML file. No other files should be stored on the |
| <code>domain_to_write_into_html</code> -- it should be functionally equivalent |
| to <code>domain_specified_in_html</code>. See |
| also <a href="#MapProxyDomain">MapProxyDomain</a> which enables proxying content |
| from a different server.</p> |
| |
| <p>For example, if PageSpeed |
| cache-extends <code>http://www.example.com/styles/style.css</code> to |
| <code>http://cdn.example.com/styles/style.css.pagespeed.ce.HASH.css</code>, |
| then <code>cdn.example.com</code> must have a mechanism in place to |
| either rewrite that file in place, or refer back to the origin server to |
| pull the rewritten content. |
| </p> |
| |
| <p class="note"> |
| <strong>Note:</strong> It is the responsibility of the site |
| administrator to ensure that moving resources onto domains does not |
| create a security vulnerability. In particular, if the target domain |
| has cookies, then any JavaScript loaded from a resource moved to a |
| domain with cookies will gain access to those cookies. In general, |
| moving resources to a cookieless domain is a great way to improve |
| security. Be aware that CSS can load JavaScript in certain environments. |
| </p> |
| |
| <p>By specifying a domain in this directive, either as source or destination, |
| you are authorizing PageSpeed to rewrite resources found in this |
| domain. See <a href="#auth_domains"><code>Domain</code></a>.</p> |
| |
| <p>These directives can be used |
| in <a href="configuration#htaccess">location-specific configuration |
| sections</a>.</p> |
| |
| <h3 id="shared_cdn">Mapping Origins with a Shared CDN</h3> |
| |
| <p>Consider a scenario where an installation serving multiple domains |
| uses a single CDN for caching and delivery of all content. The origin |
| fetches need to be routed to the correct VirtualHost on the server. |
| This can be achieved by using a subdirectory per domain in the |
| CDN, and then using that subdirectory to map to the correct |
| VirtualHost at origin. The host-header control offered by the third |
| argument to <a href="#mapping_origin">MapOriginDomain</a> makes this |
| feasible.</p> |
| |
| <p>In the example below, resources with a domain of |
| sharedcdn.example.com and path starting with /vhost1 will be fetched |
| from localhost but with a <code>Host:</code> header value of |
| vhost1.example.com. Without the third argument to MapOriginDomain, |
| the <code>Host:</code> header would be sharedcdn.example.com.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapOriginDomain localhost sharedcdn.example.com/vhost1 vhost1.example.com |
| ModPagespeedMapRewriteDomain sharedcdn.example.com/vhost1 vhost1.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapOriginDomain localhost sharedcdn.example.com/vhost1 vhost1.example.com; |
| pagespeed MapRewriteDomain sharedcdn.example.com/vhost1 vhost1.example.com;</pre> |
| </dl> |
| |
| <p>This would be used in conjunction with a VirtualHost setup for |
| vhost1.example.com, and a single CDN setup for multiple hosts segregated by |
| subdirectory.</p> |
| |
| <h2 id="shard">Sharding domains</h2> |
| |
| <p>Best practices suggest <a target="_blank" href="https://developers.google.com/speed/docs/best-practices/rtt" |
| >minimizing round-trip times</a> by <a |
| target="_blank" href="https://developers.google.com/speed/docs/best-practices/rtt#ParallelizeDownloads" |
| >parallelizing downloads across hostnames</a>. PageSpeed can partially |
| automate this for resources that it rewrites, using the directive: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedShardDomain domain_to_shard shard1,shard2,shard3...</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed ShardDomain domain_to_shard shard1,shard2,shard3...;</pre> |
| </dl> |
| |
| <p>Wildcards cannot be used in this directive.</p> |
| |
| <p>This will distribute the domains for rewritten URLs among the |
| specified shards. The shard selected for a particular URL is computed |
| from the original URL.</p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedShardDomain example.com \ |
| static1.example.com,static2.example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed ShardDomain example.com static1.example.com,static2.example.com;</pre> |
| </dl> |
| |
| |
| <p> |
| Using this directive, PageSpeed will distribute roughly half the |
| resources rewritten from example.com |
| into <code>static1.example.com</code>, and the rest to |
| <code>static2.example.com</code>. You can specify as many shards as |
| you like. The optimum number of shards is a topic of active |
| research, and is browser-dependent. Configuring between 2 and 4 |
| shards should yield good results. Changing the number of shards |
| will cause PageSpeed to choose different names for resources, |
| resulting in a partial cache flush.</p> |
| |
| <p>When used in combination with <code>MapRewriteDomain</code>, the rewrite |
| mappings are applied first, and then the shard selection occurs. Origin domains |
| are always tracked so that when a browser sends a sharded URL back to the |
| server, PageSpeed can find it. |
| </p> |
| <p>Let's look at an example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedShardDomain example.com static1.example.com,static2.example.com |
| ModPagespeedMapRewriteDomain example.com www.example.com |
| ModPagespeedMapOriginDomain localhost example.com</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed ShardDomain example.com static1.example.com,static2.example.com; |
| pagespeed MapRewriteDomain example.com www.example.com; |
| pagespeed MapOriginDomain localhost example.com;</pre> |
| </dl> |
| |
| <p>In this example, <code>example.com</code> |
| and <code>www.example.com</code> are "tied" together via |
| <code>MapRewriteDomain</code>. The origin-mapping |
| to <code>localhost</code> propagates automatically |
| to <code>www.example.com</code>, <code>static1.example.com</code>, and |
| <code>static2.example.com</code>. So when PageSpeed cache-extends an HTML |
| stylesheet reference <code>http://www.example.com/styles.css</code>, it will be: |
| </p> |
| <ol> |
| <li>Fetched by the server rewriting the HTML |
| from <code>localhost</code></li> |
| <li>Rewritten to |
| <code>http://example.com/styles.css.pagespeed.ce.HASH.css</code></li> |
| <li>Sharded to |
| <code>http://static1.example.com/styles.css.pagespeed.ce.HASH.css</code> |
| </li> |
| </ol> |
| |
| <h2 id="MapProxyDomain">Proxying and optimizing resources from |
| trusted domains</h2> |
| |
| <p> |
| Proxying resources is desirable under several scenarios: |
| </p> |
| <ul> |
| <li>The resources on the origin domain may benefit from optimizations |
| done by PageSpeed.</li> |
| <li>SPDY may work better if there are fewer domains on a page.</li> |
| <li>The target domain running PageSpeed may have better serving |
| infrastructure than the origin.</li> |
| </ul> |
| <p> |
| It is possible to proxy and optimize resources whose origin is a trusted |
| domain that may not be running PageSpeed. This cannot be directly achieved |
| with MapRewriteDomain because that is a declaration that the domains listed |
| are functionally equivalent to one another, either because they are backed by |
| the same storage, or because the target is acting as a proxy (e.g. a |
| CDN). <code>MapProxyDomain</code> makes it technically possible to proxy and |
| optimize resources from any domain <b>that you trust</b>. |
| |
| <p class="warning"> |
| You must only proxy resources that are controlled by an organization |
| you <b>trust</b> because it is possible for malicious content (e.g. |
| <a href="http://hackaday.com/2008/08/04/the-gifar-image-vulnerability/" |
| >GIFAR</a>) |
| proxied from an untrustworthy domain to gain access to private |
| content on your domain, compromising your site or its viewers. You |
| must never map directories that may contain files that may be |
| controlled by a third party. |
| </p> |
| <p class="warning"> |
| There may be legal issues restricting the optimization of resources |
| you don't own. If in doubt consult a lawyer. |
| {# TODO(jmarantz): it should be possible to use this directive in #} |
| {# combination with Disallow & rewrite_domains to proxy without #} |
| {# optimizing. A demo/test of that will be left for a follow-up. #} |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapProxyDomain target_domain/subdir \ |
| origin_domain/subdir [rewrite_domain/subdir] |
| </pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapProxyDomain target_domain/subdir |
| origin_domain/subdir [rewrite_domain/subdir];</pre> |
| </dl> |
| |
| <p> |
| If the optional rewrite_domain/subdir argument is supplied then optimized |
| resources will be rewritten to that location. This is useful for rewriting |
| optimized resources proxied from an external origin to a CDN. |
| </p> |
| <p> |
| It is important to specify a subdirectory in the target domain, because |
| PageSpeed will need to be able to unambiguously identify the |
| origin domain given the target when fetching content. Thus each |
| MapProxyDomain command should be given a distinct subdirectory |
| of the target domain. |
| </p> |
| <p> |
| It is important to specify a subdirectory in the origin domain to |
| limit the scope of the proxying. For example, |
| in <a href="https://picasaweb.google.com">picasaweb</a>, all of a user's |
| photos are underneath a single subdirectory; it is critical not to enable |
| proxying for the entire site. |
| </p> |
| <h3>Example</h3> |
| <p> |
| You can see proxy-mapping in action at <code>www.modpagespeed.com</code> on this |
| <a href="https://www.modpagespeed.com/examples/proxy_external_resource.html">example</a>. |
| </p> |
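| |
| <p> |
| As a further sketch of the directive's arguments, the configuration below |
| proxies a single directory from a trusted partner site through a dedicated |
| subdirectory of the site running PageSpeed. All hostnames and paths here are |
| hypothetical. |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedMapProxyDomain www.example.com/partner_images \ |
| images.partner.example.org/public</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed MapProxyDomain www.example.com/partner_images |
| images.partner.example.org/public;</pre> |
| </dl> |
| |
| <p> |
| Under these assumptions, a resource referenced beneath |
| <code>images.partner.example.org/public</code> would be expected to be |
| optimized and served from beneath <code>www.example.com/partner_images</code>, |
| while the rest of the partner site stays outside the scope of the proxying. |
| </p> |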
| |
| <h2 id="fetch_servers">Fetch server restrictions</h2> |
| <p> PageSpeed will only fetch resources from <code>localhost</code> and |
| domains explicitly mentioned in domain configuration directives such |
| as <code>Domain</code>, <code>MapRewriteDomain</code> |
| and <code>MapOriginDomain</code>. As this security restriction is not |
| desirable for some large deployments, in Apache it is possible to disable it |
| starting from 0.10.22.7, via the following configuration directive (which has |
| a global effect): <pre class="prettyprint" |
| >ModPagespeedDangerPermitFetchFromUnknownHosts on</pre> |
| |
| <p class="warning"><strong>Warning: </strong>Enabling |
| <code>DangerPermitFetchFromUnknownHosts</code> could permit |
| hostile third parties to access any machine and port that the server running |
| mod_pagespeed has access to, including potentially those behind firewalls. |
| </p> |
| <p>Before enabling this directive, however, make sure that at least one of the |
| following is true:</p> |
| <ol> |
| <li>The server running mod_pagespeed has no more access to machines or |
| ports than anyone on the Internet, and the machines it can access do |
| not treat its traffic specially (mod_pagespeed 0.10.22.6 and newer will |
| make sure its own traffic to <code>localhost</code> does not appear to be |
| local, but that does not work across machines).</li> |
| <li>Every virtual host in Apache running mod_pagespeed (and, if applicable, |
| the global configuration) has an accurate explicit <code>ServerName</code>, |
| and sets the options <code>UseCanonicalName</code> and |
| <code>UseCanonicalPhysicalPort</code> to <code>On</code>.</li> |
| <li>A proxy running in front of the mod_pagespeed server fully verifies that |
| the URLs and <code>Host:</code> headers that reach it refer only to machines |
| the mod_pagespeed server is expected to contact.</li> |
| </ol> |
| <p>If possible, you are strongly encouraged to use |
| <code>MapOriginDomain</code> in preference to this switch. |
| </p> |
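| |
| <p> |
| The second condition above can be sketched for a single Apache virtual host as |
| follows. The host name <code>www.example.com</code> and the port are |
| assumptions; every virtual host running mod_pagespeed would need equivalent |
| settings. |
| </p> |
| <pre class="prettyprint"> |
| # Global scope, since the directive has a global effect. |
| ModPagespeedDangerPermitFetchFromUnknownHosts on |
| |
| &lt;VirtualHost *:80&gt; |
| # Accurate, explicit name for this virtual host (hypothetical). |
| ServerName www.example.com |
| UseCanonicalName On |
| UseCanonicalPhysicalPort On |
| &lt;/VirtualHost&gt;</pre> |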
| |
| <h2 id="url-valued-attributes">Specifying additional URL-valued attributes</h2> |
| |
| <p> |
| All PageSpeed filters that process URLs need to know which attributes of |
| which elements to consider. By default they consider those in the HTML4 and |
| HTML5 specifications and a few common extensions: |
| </p> |
| <pre class="prettyprint"> |
| &lt;a href=...&gt; |
| &lt;area href=...&gt; |
| &lt;audio src=...&gt; |
| &lt;blockquote cite=...&gt; |
| &lt;body background=...&gt; |
| &lt;button formaction=...&gt; |
| &lt;command icon=...&gt; |
| &lt;del cite=...&gt; |
| &lt;embed src=...&gt; |
| &lt;form action=...&gt; |
| &lt;frame src=...&gt; |
| &lt;html manifest=...&gt; |
| &lt;iframe src=...&gt; |
| &lt;img src=...&gt; |
| &lt;input type="image" src=...&gt; |
| &lt;ins cite=...&gt; |
| &lt;link href=...&gt; |
| &lt;q cite=...&gt; |
| &lt;script src=...&gt; |
| &lt;source src=...&gt; |
| &lt;td background=...&gt; |
| &lt;th background=...&gt; |
| &lt;table background=...&gt; |
| &lt;tbody background=...&gt; |
| &lt;tfoot background=...&gt; |
| &lt;thead background=...&gt; |
| &lt;track src=...&gt; |
| &lt;video src=...&gt; |
| </pre> |
| <p> |
| If your site uses non-standard attributes for URLs, PageSpeed won't |
| know to rewrite them or the resources they reference. To identify them to |
| PageSpeed, use the <code>UrlValuedAttribute</code> directive. |
| For example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedUrlValuedAttribute span src hyperlink |
| ModPagespeedUrlValuedAttribute div background image</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed UrlValuedAttribute span src hyperlink; |
| pagespeed UrlValuedAttribute div background image;</pre> |
| </dl> |
| |
| <p> |
| These would identify <code>&lt;span src=...&gt;</code> and <code>&lt;div |
| background=...&gt;</code> as containing URLs. Further, |
| the <code>background</code> attribute of <code>div</code> elements would be |
| treated as referring to an image, and handled just like an image |
| resource referenced with <code>&lt;img src=...&gt;</code>. The general form |
| is: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedUrlValuedAttribute ELEMENT ATTRIBUTE CATEGORY</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed UrlValuedAttribute ELEMENT ATTRIBUTE CATEGORY;</pre> |
| </dl> |
| |
| <p> |
| All fields are case-insensitive. |
| <span id="categories">Valid categories are:</span> |
| <ul> |
| <li><code>script</code></li> |
| <li><code>image</code></li> |
| <li><code>stylesheet</code> (As of 1.12.34.1)</li> |
| <li><code>otherResource</code> |
| <ul><li>Any other URL that will be automatically loaded by the |
| browser along with the main page. For example, |
| the <code>manifest</code> attribute of the <code>html</code> |
| element or the <code>src</code> attribute of |
| an <code>iframe</code> element.</li></ul> |
| </li> |
| <li><code>hyperlink</code> |
| <ul><li>A link to another page or resource that a browser wouldn't |
| normally load in connection to this page (like |
| the <code>href</code> attribute of an <code>a</code> element). |
| These URLs will still be rewritten |
| by <code>MapRewriteDomain</code> and similar directives, but they |
| will not be sharded and PageSpeed will not load the URL and |
| rewrite the resource.</li></ul> |
| </li> |
| </ul> |
| When in doubt, <code>hyperlink</code> is the safest choice. |
| |
| <p class="note"> |
| <b>Note:</b> Until 1.12.34.1, <code>stylesheet</code> was accepted by the |
| configuration parser, but was non-functional. |
| </p> |
| |
| </p> |
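| |
| <p> |
| To illustrate how the category changes the treatment of a URL, the sketch |
| below registers one image-like attribute and one link-like attribute. Both |
| attribute names (<code>data-src</code> and <code>data-href</code>) are |
| assumptions about a hypothetical site's markup, not attributes PageSpeed |
| recognizes by default. |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedUrlValuedAttribute img data-src image |
| ModPagespeedUrlValuedAttribute li data-href hyperlink</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed UrlValuedAttribute img data-src image; |
| pagespeed UrlValuedAttribute li data-href hyperlink;</pre> |
| </dl> |
| |
| <p> |
| With these settings, <code>data-src</code> URLs would be treated as image |
| resources, while <code>data-href</code> URLs would only be subject to domain |
| mapping and would not be fetched or sharded. |
| </p> |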
| |
| <h2 id="ModPagespeedLoadFromFile">Loading static files from disk</h2> |
| <p> |
| By default PageSpeed loads sub-resources via an HTTP fetch. Loading |
| sub-resources directly from the filesystem would be faster, but it may |
| not be safe because the sub-resources may be dynamically generated or |
| may not be stored on the same server. |
| </p> |
| <p> |
| However, you can explicitly tell PageSpeed to load static sub-resources from |
| disk by using the <code>LoadFromFile</code> directive. For example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile "http://www.example.com/static/" \ |
| "/var/www/static/"</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile "http://www.example.com/static/" |
| "/var/www/static/";</pre> |
| </dl> |
| |
| <p> |
| This tells PageSpeed to load all resources whose URLs start |
| with <code>http://www.example.com/static/</code> from the filesystem |
| under <code>/var/www/static/</code>. For |
| example, <code>http://www.example.com/static/images/foo.png</code> will be |
| loaded from the file <code>/var/www/static/images/foo.png</code>. |
| However, <code>http://www.example.com/bar.jpg</code> will still be fetched |
| using HTTP. |
| </p> |
| <p> |
| If you need more sophisticated prefix-matching behavior, you can use |
| the <code>LoadFromFileMatch</code> directive, which |
| supports <a href="https://github.com/google/re2/wiki/Syntax">RE2-format</a> |
| regular expressions. (Note that this is not the same format as the wildcards |
| used above and elsewhere in PageSpeed.) For example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFileMatch "^https?://example.com/~([^/]*)/static/" \ |
| "/var/www/static/\\1"</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFileMatch "^https?://example.com/~([^/]*)/static/" |
| "/var/www/static/\\1";</pre> |
| </dl> |
| |
| <p> |
| This will load <code>http://example.com/~pat/static/cat.jpg</code> from |
| <code>/var/www/static/pat/cat.jpg</code>, |
| <code>http://example.com/~sam/static/images/dog.jpg</code> from |
| <code>/var/www/static/sam/images/dog.jpg</code>, and |
| <code>https://example.com/~al/static/css/ie</code> from |
| <code>/var/www/static/al/css/ie</code>. The resource |
| <code>http://example.com/~pat/images/static/puppy.gif</code>, however, |
| would not be matched by this directive and would be fetched using HTTP. |
| </p> |
| <p> |
| Because PageSpeed is loading the files directly from the filesystem, no custom |
| headers will be set. For example, no headers set with the <code>Header |
| set</code> (Apache) or <code>add_header</code> (Nginx) directives will be |
| applied to these resources. If you have resources that need to be served with |
| custom headers, such as <code>Cache-Control: private</code>, you need to |
| exclude them from <code>LoadFromFile</code>. For resources PageSpeed |
| rewrites <a href="system#ipro">in-place</a> it will set a 5-minute cache |
| lifetime by default, which you can adjust by |
| changing <a href="system#load_from_file_cache_ttl"><code |
| >LoadFromFileCacheTtlMs</code></a>. |
| </p> |
| <p> |
| Furthermore, the content type will be set based |
| upon only the filename extension and only for common filename extensions we |
| recognize (<code>.html</code>, <code>.css</code>, <code>.js</code>, |
| <code>.jpg</code>, <code>.jpeg</code>, ... see full |
| list: <a href="https://github.com/apache/incubator-pagespeed-mod/blob/master/pagespeed/kernel/http/content_type.cc">content_type.cc</a>). |
| Before 1.9.32.1, filenames with unrecognized extensions were served with no |
| <code>Content-Type</code> header; in 1.9.32.1 and later such filenames will |
| not be loaded from file and instead will fall back to ordinary fetching. |
| </p> |
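| <p> |
| Resources that need custom headers can be carved out of a mapping with |
| the <code>LoadFromFileRule</code> directive described |
| under <a href="#limiting-load-from-file">Limiting Direct Loading</a>. The |
| sketch below assumes a hypothetical <code>/var/www/static/private/</code> |
| directory whose files must keep the headers configured on the server: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile "http://www.example.com/static/" "/var/www/static/" |
| ModPagespeedLoadFromFileRule Disallow /var/www/static/private/</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile "http://www.example.com/static/" "/var/www/static/"; |
| pagespeed LoadFromFileRule Disallow /var/www/static/private/;</pre> |
| </dl> |
| |
| <p> |
| Files under the disallowed prefix fall back to ordinary HTTP fetches, so |
| server-configured headers still apply to them. |
| </p> |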
| <p> |
| You can also use the <code>LoadFromFile</code> directive to |
| load HTTPS resources which would not be otherwise fetchable directly. |
| For example: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile "https://www.example.com/static/" \ |
| "/var/www/static/"</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile "https://www.example.com/static/" |
| "/var/www/static/";</pre> |
| </dl> |
| |
| <p> |
| The filesystem path must be an absolute path. |
| </p> |
| <p> |
| You can specify multiple <code>LoadFromFile</code> associations in |
| configuration files. Note that large numbers of such directives may impact |
| performance. |
| </p> |
| <p> |
| If the sub-resource cannot be loaded from file in the directory |
| specified, the sub-request will fail (rather than fall back to |
| HTTP fetch). Part of the reason for this is to indicate a configuration |
| error more clearly. |
| </p> |
| <p> |
| As an added benefit, if resources are loaded from file, the rewritten |
| versions will be updated immediately when you change the associated file. |
| Resources loaded via normal HTTP fetches are refreshed only when they |
| expire from the cache (by default every 5 minutes). Therefore, the |
| rewritten versions are only updated as often as the cache is refreshed. |
| Resources loaded from file are not subject to caching behavior because |
| they are accessed directly from the filesystem for every request for the |
| rewritten version. |
| </p> |
| |
| <p> |
| See also <a href="#mapping_origin"><code>MapOriginDomain</code></a>. |
| </p> |
| |
| <p> |
| This directive can <strong>not</strong> be used |
| in <a href="configuration#htaccess">location-specific configuration |
| sections</a>. |
| </p> |
| |
| <h4 id="limiting-load-from-file">Limiting Direct Loading</h4> |
| <p> |
| A mapping set up with <code>LoadFromFile</code> allows filesystem loading for |
| anything it matches. If you have directories or file types that cannot be |
| loaded directly from the filesystem, <code>LoadFromFileRule</code> lets you |
| add fine-grained rules to control which files will be loaded directly and |
| which will fall back to the standard process, over HTTP. |
| </p> |
| <p> |
| When given a URL, PageSpeed first determines whether any |
| <code>LoadFromFile</code> mappings apply. If one does, it calculates the |
| mapped filename and checks for applicable <code>LoadFromFileRule</code> |
| directives. Considering rules in the reverse order of definition, it takes |
| the first applicable one and uses that to determine whether to load from |
| file or fall back to HTTP. |
| </p> |
| <p> |
| Some examples may be helpful. Consider a website that is entirely static |
| content except for a <code>/cgi-bin</code> directory: |
| </p> |
| <pre> |
| /var/www/index.html |
| /var/www/pets.html |
| /var/www/images/cat.jpg |
| /var/www/stylesheets/main.css |
| /var/www/stylesheets/ie.css |
| /var/www/cgi-bin/guestbook.pl |
| /var/www/cgi-bin/visitcounter.pl |
| </pre> |
| <p> |
| While most of the site can be loaded directly from the |
| filesystem, <code>guestbook.pl</code> and <code>visitcounter.pl</code> are |
| Perl files that need to be interpreted before serving. Adding a rule |
| disallowing the <code>/cgi-bin</code> directory tells PageSpeed to fall back |
| to HTTP for those files: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile http://example.com/ /var/www/ |
| ModPagespeedLoadFromFileRule Disallow /var/www/cgi-bin/</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile http://example.com/ /var/www/; |
| pagespeed LoadFromFileRule Disallow /var/www/cgi-bin/;</pre> |
| </dl> |
| |
| <p> |
| The <code>LoadFromFileRule</code> directive takes two arguments. |
| The first must be either <code>Allow</code> or <code>Disallow</code> while the |
| second is a prefix that specifies which filesystem paths it should apply to. |
| Because the default is to allow loading from the filesystem for all paths |
| listed in any <code>LoadFromFile</code> statement, most of the time you will |
| be using <code>Disallow</code> to turn off filesystem loading for some subset |
| of those paths. You would use <code>Allow</code> only after |
| a <code>Disallow</code> that was overly general. |
| </p> |
| <p> |
| Not all sites are well suited for prefix-based control. Consider a site with |
| PHP files mixed in with ordinary static files: |
| </p> |
| <pre> |
| /var/www/index.html |
| /var/www/webmail.php |
| /var/www/webmail.css |
| /var/www/blog/index.php |
| /var/www/blog/header.png |
| /var/www/blog/blog.css |
| </pre> |
| <p> |
| Blacklisting just the <code>.php</code> files so they fall back to an HTTP |
| fetch allows everything else to be loaded directly from the filesystem: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile http://example.com/ /var/www/ |
| ModPagespeedLoadFromFileRuleMatch Disallow \.php$</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile http://example.com/ /var/www/; |
| pagespeed LoadFromFileRuleMatch Disallow \.php$;</pre> |
| </dl> |
| |
| <p> |
| The <code>LoadFromFileRuleMatch</code> directive also takes two arguments. |
| The first is either <code>Allow</code> or <code>Disallow</code> and functions |
| just like for <code>LoadFromFileRule</code> above. The second argument, |
| however, is |
| a <a href="https://github.com/google/re2/wiki/Syntax">RE2-format</a> regular |
| expression instead of a file prefix. Remember to escape characters that have |
| special meaning in regular expressions. For example, if instead |
| of <code>\.php$</code> we had simply <code>.php$</code> then a file |
| named <code>example.notphp</code> would still be forced to load over HTTP |
| because "<code>.</code>" is special syntax for "match any single character". |
| </p> |
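| <p> |
| The effect of the missing backslash can be checked with a short script. |
| This is only a sketch: it uses Python's <code>re</code> module as a |
| stand-in for RE2 (the two agree for these simple patterns) and assumes the |
| pattern may match anywhere in the path, as the examples above imply. |
| </p> |

```python
import re

# Python's re module stands in for RE2 here; both treat "." and "\." the
# same way for these simple patterns.
UNESCAPED = re.compile(r".php$")   # "." matches any single character
ESCAPED = re.compile(r"\.php$")    # "\." matches only a literal dot


def matches(pattern, path):
    """Return True if the pattern matches anywhere in the path."""
    return pattern.search(path) is not None


# Print each path with the result for the unescaped and escaped pattern.
for path in ("/var/www/blog/index.php", "/var/www/example.notphp"):
    print(path, matches(UNESCAPED, path), matches(ESCAPED, path))
```

| <p> |
| Both patterns match <code>index.php</code>, but only the unescaped one |
| matches <code>example.notphp</code>. |
| </p> |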
| <p> |
| Consider a site with the opposite problem: a few file types can be reliably |
| loaded from file but the rest need interpretation first. For example: |
| </p> |
| <pre> |
| /var/www/index.html |
| /var/www/site.css |
| /var/www/script-using-ssi.js |
| /var/www/generate-image.pl |
| /var/www/ |
| </pre> |
| <p> |
| This site uses server-side includes |
| (<a href="http://httpd.apache.org/docs/2.2/howto/ssi.html">Apache</a>, |
| <a href="http://wiki.nginx.org/HttpSsiModule">Nginx</a>) |
| in its JavaScript, and <code>generate-image.pl</code> needs to be interpreted |
| to make images. The only resources on the site that are generally safe to |
| load from file are the <code>.css</code> ones. By first blacklisting |
| everything and then whitelisting only the <code>.css</code> files, we can |
| make PageSpeed do this: |
| </p> |
| |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedLoadFromFile http://example.com/ /var/www/ |
| ModPagespeedLoadFromFileRuleMatch Disallow .* |
| ModPagespeedLoadFromFileRuleMatch Allow \.css$</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed LoadFromFile http://example.com/ /var/www/; |
| pagespeed LoadFromFileRuleMatch Disallow .*; |
| pagespeed LoadFromFileRuleMatch Allow \.css$;</pre> |
| </dl> |
| |
| <p> |
| This works because order is significant: later rules take precedence over |
| earlier ones. |
| </p> |
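| <p> |
| The evaluation order described in this section can be modeled in a few lines |
| of Python. This is a toy model, not PageSpeed's implementation: the |
| <code>Config</code> class and its method names are hypothetical, the |
| choice of the first matching mapping when several apply is an assumption, |
| and query strings are ignored. |
| </p> |

```python
import re

ALLOW, DISALLOW = "Allow", "Disallow"


class Config:
    """Toy model of LoadFromFile mappings and rules (not PageSpeed code)."""

    def __init__(self):
        self.mappings = []  # (url_prefix, file_prefix) pairs
        self.rules = []     # (action, predicate) pairs, in definition order

    def load_from_file(self, url_prefix, file_prefix):
        self.mappings.append((url_prefix, file_prefix))

    def rule(self, action, path_prefix):
        # Models LoadFromFileRule: applies to paths with the given prefix.
        self.rules.append((action, lambda p: p.startswith(path_prefix)))

    def rule_match(self, action, pattern):
        # Models LoadFromFileRuleMatch: applies to paths the regex matches.
        regex = re.compile(pattern)
        self.rules.append((action, lambda p: regex.search(p) is not None))

    def resolve(self, url):
        """Return the file path to read, or None to fall back to HTTP."""
        for url_prefix, file_prefix in self.mappings:
            if url.startswith(url_prefix):
                path = file_prefix + url[len(url_prefix):]
                break
        else:
            return None  # no mapping applies
        # Later rules take precedence, so scan in reverse definition order.
        for action, applies in reversed(self.rules):
            if applies(path):
                return path if action == ALLOW else None
        return path  # no rule applies: loading from file is the default


config = Config()
config.load_from_file("http://example.com/", "/var/www/")
config.rule_match(DISALLOW, r".*")
config.rule_match(ALLOW, r"\.css$")

print(config.resolve("http://example.com/site.css"))
print(config.resolve("http://example.com/generate-image.pl"))
```

| <p> |
| With the rules from the last example, <code>site.css</code> resolves to |
| <code>/var/www/site.css</code>, while <code>generate-image.pl</code> |
| resolves to <code>None</code>, meaning it falls back to HTTP. |
| </p> |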
| |
| <h3 id="LoadFromFileScriptVariables">Script Variables with LoadFromFile</h3> |
| <p class="note"><strong>Note: New feature as of 1.9.32.1</strong></p> |
| <p class="note"><strong>Note: Nginx-only</strong></p> |
| |
| <p> |
| As of 1.9.32.1, Nginx <a href="http://nginx.org/en/docs/varindex.html">script |
| variables</a> are supported with the various <code>LoadFromFile</code> |
| directives. This makes it possible to configure a generic mapping of HTTP |
| hosts to disk, which reduces the amount of configuration required when you |
| want to load as much from disk as possible but have many |
| <code>server{}</code> blocks. |
| </p> |
| |
| <p> |
| As an example, consider one server that hosts three sites, each of which has |
| a directory <code>/static</code> that holds static resources and can be loaded |
| from file. One way to configure this server would be: |
| </p> |
| |
| <dl> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| http { |
| ... |
| server { |
| server_name a.example.com; |
| pagespeed LoadFromFile http://a.example.com/static /var/www-a/static; |
| ... |
| } |
| server { |
| server_name b.example.com; |
| pagespeed LoadFromFile http://b.example.com/static /var/www-b/static; |
| ... |
| } |
| server { |
| server_name c.example.com; |
| pagespeed LoadFromFile http://c.example.com/static /var/www-c/static; |
| ... |
| } |
| }</pre> |
| </dl> |
| |
| <p> |
| For three sites this is merely tedious, but the more sites you have the |
| worse it gets. With <code>ProcessScriptVariables</code> you can define one |
| generic <code>LoadFromFile</code> mapping instead of defining each one |
| individually: |
| </p> |
| |
| <dl> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| http { |
| ... |
| pagespeed ProcessScriptVariables on; |
| pagespeed LoadFromFile "http://$host/static" "$document_root/static"; |
| |
| server { |
| server_name a.example.com; |
| ... |
| } |
| server { |
| server_name b.example.com; |
| ... |
| } |
| server { |
| server_name c.example.com; |
| ... |
| } |
| }</pre> |
| </dl> |
| |
| <p> |
| This will use Nginx's <code>$host</code> and <code>$document_root</code> |
| script variables instead of requiring you to write out each mapping |
| explicitly. |
| </p> |
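| <p> |
| To see what the generic mapping stands for, the substitution can be imitated |
| with Python's <code>string.Template</code>. This is only an illustration of |
| the resulting mappings, not of how Nginx evaluates variables. The document |
| roots are assumed to match the per-server example above (the |
| <code>root</code> directives are not shown in the configuration), and |
| <code>expand_mapping</code> is a hypothetical helper. |
| </p> |

```python
from string import Template

# Hypothetical per-request values; the document roots are borrowed from the
# per-server example earlier in this section.
REQUESTS = [
    {"host": "a.example.com", "document_root": "/var/www-a"},
    {"host": "b.example.com", "document_root": "/var/www-b"},
    {"host": "c.example.com", "document_root": "/var/www-c"},
]

URL_TEMPLATE = Template("http://$host/static")
FILE_TEMPLATE = Template("$document_root/static")


def expand_mapping(variables):
    """Expand the generic mapping for one request's variable values."""
    return (URL_TEMPLATE.substitute(variables),
            FILE_TEMPLATE.substitute(variables))


# Print the (URL prefix, file prefix) pair each request would produce.
for variables in REQUESTS:
    print(expand_mapping(variables))
```

| <p> |
| Each request yields the same pair of prefixes that the three explicit |
| <code>LoadFromFile</code> lines spelled out by hand. |
| </p> |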
| |
| <p> |
| For more details on script variables, including how to handle dollar signs, |
| see <a href="system#nginx_script_variables">Script Variable Support</a>. |
| </p> |
| |
| <h3 id="risks">Risks</h3> |
| <p> |
| <code>LoadFromFile</code> should only be used for completely static |
| resources that do not need any custom headers or special server processing. |
| If non-static resources exist in the specified directory, their source code |
| will be used as-is, without applying SSI includes, CGI generation, etc. |
| Furthermore, all the resources should have filenames with common |
| extensions for their Content-Type (e.g. .html, .css, .js, .jpg, .jpeg; see the |
| full list in <a href="https://github.com/apache/incubator-pagespeed-mod/blob/master/pagespeed/kernel/http/content_type.cc">content_type.cc</a>). |
| </p> |
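| <p> |
| Before enabling <code>LoadFromFile</code> for a directory, it can help to |
| list the files whose extensions are not recognized. The sketch below is a |
| hypothetical helper, not part of PageSpeed: its extension set contains only |
| the examples named above and would need to be extended from |
| <code>content_type.cc</code> before real use. |
| </p> |

```python
import os

# Partial list taken from the examples in the text; content_type.cc holds
# the authoritative list.
KNOWN_EXTENSIONS = {".html", ".css", ".js", ".jpg", ".jpeg"}


def unrecognized_files(paths, known=KNOWN_EXTENSIONS):
    """Return the paths whose extension is not in the known set."""
    flagged = []
    for path in paths:
        # Compare case-insensitively; files without an extension yield "".
        extension = os.path.splitext(path)[1].lower()
        if extension not in known:
            flagged.append(path)
    return flagged


# Hypothetical file names used only for illustration.
candidates = [
    "/var/www/static/site.css",
    "/var/www/static/logo.JPG",
    "/var/www/static/README",
    "/var/www/static/data.bin",
]
print(unrecognized_files(candidates))
```

| <p> |
| Here <code>README</code> and <code>data.bin</code> are flagged, which |
| suggests adding a <code>Disallow</code> rule for them or renaming them. |
| </p> |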
| |
| <h2 id="inline_without_auth">Inlining resources without explicit authorization |
| </h2> |
| <p> |
| Several filters in PageSpeed operate by inlining content from resources into |
| the HTML: inline_css, inline_javascript and prioritize_critical_css are a |
| few of the filters that operate in this manner. If resources from |
| third-party domains are not authorized explicitly, the effectiveness of |
| these filters decreases. For instance, prioritize_critical_css attempts to |
| remove blocking CSS requests needed for the initial render by inlining |
| critical CSS snippets into the HTML; however, CSS resources that are not |
| authorized will continue to block. The option described in this section |
| allows such resources to be inlined without having to authorize all the |
| individual domains. |
| </p> |
| <p> |
| The <code>InlineResourcesWithoutExplicitAuthorization</code> |
| directive can be used to allow resources from third-party domains to be |
| inlined into the HTML without requiring explicit authorization for each |
| domain. This option is "off" by default, and takes a |
| comma-separated list of strings representing resource categories for which |
| the option should be enabled. The list of valid resource categories is |
| given <a href="#categories">here</a>. Currently, only Script and |
| Stylesheet resource types are supported for this option. |
| </p> |
| |
| <p> |
| This option can be enabled as follows: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedInlineResourcesWithoutExplicitAuthorization Script,Stylesheet |
| </pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed InlineResourcesWithoutExplicitAuthorization Script,Stylesheet; |
| </pre> |
| </dl> |
| |
| <p class="warning"><strong>Warning: </strong>Enabling |
| <code>InlineResourcesWithoutExplicitAuthorization</code> could permit |
| hostile third parties to access any machine and port that the server running |
| mod_pagespeed has access to, including potentially those behind firewalls. |
| Please read the following information for details. |
| </p> |
| <p> |
| This directive should only be enabled if all of the following conditions are |
| met for the resource types for which this option is enabled: |
| </p> |
| <ol> |
| <li>The webmaster is confident that the resources referenced on their pages are |
| from trusted domains only. |
| </li> |
| <li>The site does not allow user-injected resources for the enabled resource |
| types. |
| </li> |
| <li>Fetches from the PageSpeed server should have no |
| more access to machines or ports than anyone on the Internet, and machines it |
| can access should not treat its traffic specially. Specifically, the |
| PageSpeed servers should not be able to access anything that is internal to a |
| firewall. Please refer to the <a href="#fetch_servers">Fetch server |
| restrictions</a> section for more details. |
| </li> |
| </ol> |
| |
| <p> |
| Note that resources inlined into HTML via this option will not be accessible |
| directly via a pagespeed URL, since that involves different security risks. |
| Resources will also not be inlined into other non-HTML resources via this |
| option. This means that flatten_css_imports will not flatten third-party CSS |
| into another CSS resource, unless the relevant third-party domains are |
| authorized explicitly via one of the techniques mentioned in the previous |
| sections. |
| </p> |
| |
| </div> |
| <!--#include virtual="_footer.html" --> |
| </body> |
| </html> |