| <!-- |
| Licensed to the Apache Software Foundation (ASF) under one |
| or more contributor license agreements. See the NOTICE file |
| distributed with this work for additional information |
| regarding copyright ownership. The ASF licenses this file |
| to you under the Apache License, Version 2.0 (the |
| "License"); you may not use this file except in compliance |
| with the License. You may obtain a copy of the License at |
| |
| http://www.apache.org/licenses/LICENSE-2.0 |
| |
| Unless required by applicable law or agreed to in writing, |
| software distributed under the License is distributed on an |
| "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY |
| KIND, either express or implied. See the License for the |
| specific language governing permissions and limitations |
| under the License. |
| --> |
| |
| <html> |
| <head> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| <title>Configuring Downstream Caches</title> |
| <link rel="stylesheet" href="doc.css"> |
| </head> |
| <body> |
| <!--#include virtual="_header.html" --> |
| |
| |
| <div id=content> |
| <h1>Configuring Downstream Caches</h1> |
| <style> |
| .diff-del { color: darkred } |
| .diff-add { color: green } |
| </style> |
| |
| <h2 id="standard">Standard Configuration</h2> |
| <p class="warning"><strong>Note: This feature is currently experimental. |
| Options and configuration described here are subject to change in future |
| releases. Please subscribe to the <a href="mailing-lists#announcements" |
| >announcements mailing list</a> to keep yourself informed of updates to |
| this feature.</strong></p> |
| |
| <p> |
| By default PageSpeed serves HTML files with <code>Cache-Control: no-cache, |
| max-age=0</code> so that changes to the HTML and its resources are sent |
| fresh on each request. The HTML can be cached, however, if you: |
| <ul> |
| <li> Set up a <code>PURGE</code> handler in your cache. |
| <li> Tell PageSpeed the url for the <code>PURGE</code> handler. |
| <li> Have the cache set the <code>PS-CapabilityList</code> header |
| so PageSpeed will emit HTML that can be sent to any browser. |
| <li> Have the cache occasionally pass through requests to the origin with |
| the <code>PS-ShouldBeacon</code> header set. |
| </ul> |
| </p> |
| |
| <p> |
| For example, if you're running a cache on port 80 that reverse proxies to |
| your site on port 8080, then you'd need to tell PageSpeed to send |
| its <code>PURGE</code> requests to port 80: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedDownstreamCachePurgeLocationPrefix http://localhost:80</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed DownstreamCachePurgeLocationPrefix http://localhost:80;</pre> |
| </dl> |
| <p> |
| You also need to give PageSpeed a key so it can allow the cache to request |
| rebeaconing without allowing external entities to do so: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedDownstreamCacheRebeaconingKey "<your-secret-key>"</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed DownstreamCacheRebeaconingKey "<your-secret-key>";</pre> |
| </dl> |
| <p> |
| These are the only changes you need to make to the PageSpeed configuration |
| file, but before you restart you also need to make some changes to your |
| cache configuration. These vary by cache; below are configurations for |
| Varnish 3.x, Varnish 4.x, and Nginx's proxy_cache: |
| </p> |
| <dl> |
| <dt>Varnish 3.x:<dd><pre class="prettyprint"> |
| acl purge { |
| # If PageSpeed isn't running on the same server as your cache, list the IP(s) |
| # of the PageSpeed machine(s) here. |
| "127.0.0.1"; |
| } |
| sub vcl_recv { |
| # Tell PageSpeed not to use optimizations specific to this request. |
| set req.http.PS-CapabilityList = "fully general optimizations only"; |
| |
| # Don't allow external entities to force beaconing. |
| remove req.http.PS-ShouldBeacon; |
| |
| # Authenticate the purge request by IP. |
| if (req.request == "PURGE") { |
| if (!client.ip ~ purge) { |
| error 405 "Not allowed."; |
| } |
| return (lookup); |
| } |
| } |
| |
| # Mark HTML as uncacheable. If we can't send them purge requests they can't |
| # cache our html. |
| sub vcl_fetch { |
| if (beresp.http.Content-Type ~ "text/html") { |
| remove beresp.http.Cache-Control; |
| set beresp.http.Cache-Control = "no-cache, max-age=0"; |
| } |
| return (deliver); |
| } |
| |
| sub vcl_hit { |
| # Make purging happen in response to a PURGE request. This happens |
| # automatically in Varnish 4.x so we don't need it there. |
| if (req.request == "PURGE") { |
| purge; |
| error 200 "Purged."; |
| } |
| |
| # 5% of the time ignore that we got a cache hit and send the request to the |
| # backend anyway for instrumentation. |
| if (std.random(0, 100) < 5) { |
| set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| return (pass); |
| } |
| } |
| |
| sub vcl_miss { |
| # Make purging happen in response to a PURGE request. This happens |
| # automatically in Varnish 4.x so we don't need it there. |
| if (req.request == "PURGE") { |
| purge; |
| error 200 "Purged."; |
| } |
| |
| # Instrument 25% of cache misses. |
| if (std.random(0, 100) < 25) { |
| set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| return (pass); |
| } |
| }</pre> |
| <dt>Varnish 4.x:<dd><pre class="prettyprint"> |
| acl purge { |
| # If PageSpeed isn't running on the same server as your cache, list the IP(s) |
| # of the PageSpeed machine(s) here. |
| "127.0.0.1"; |
| } |
| |
| sub vcl_recv { |
| # Tell PageSpeed not to use optimizations specific to this request. |
| set req.http.PS-CapabilityList = "fully general optimizations only"; |
| |
| # Don't allow external entities to force beaconing. |
| unset req.http.PS-ShouldBeacon; |
| |
| # Authenticate the purge request by IP. |
| if (req.method == "PURGE") { |
| if (!client.ip ~ purge) { |
| return (synth(405,"Not allowed.")); |
| } |
| return (purge); |
| } |
| } |
| |
| # Mark HTML as uncacheable. If we can't send them purge requests they can't |
| # cache our html. |
| sub vcl_backend_response { |
| if (beresp.http.Content-Type ~ "text/html") { |
| unset beresp.http.Cache-Control; |
| set beresp.http.Cache-Control = "no-cache, max-age=0"; |
| } |
| return (deliver); |
| } |
| |
| sub vcl_hit { |
| # 5% of the time ignore that we got a cache hit and send the request to the |
| # backend anyway for instrumentation. |
| if (std.random(0, 100) < 5) { |
| set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| return (pass); |
| } |
| } |
| sub vcl_miss { |
| # Instrument 25% of cache misses. |
| if (std.random(0, 100) < 25) { |
| set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| return (pass); |
| } |
| }</pre> |
| <dt>Nginx proxy_cache:<dd><pre class="prettyprint"> |
| http { |
| # Define a mapping used to mark HTML as uncacheable. |
| map $upstream_http_content_type $new_cache_control_header_val { |
| default $upstream_http_cache_control; |
| "~*text/html" "no-cache, max-age=0"; |
| } |
| |
| server { |
| # PageSpeed's beacon dependent filters need the cache to let some requests |
| # through to the backend. This code below depends on the <a href="http://wiki.nginx.org/HttpSetMiscModule">ngx_set_misc</a> |
| # module and randomly passes 5% of traffic to the backend for rebeaconing. |
| set $should_beacon_header_val ""; |
| set_random $rand 0 100; |
| if ($rand ~* "^[0-4]$") { |
| set $should_beacon_header_val "<your-secret-key>"; |
| set $bypass_cache 1; |
| } |
| |
| location / { |
| # existing proxy_pass |
| # existing proxy_cache |
| # existing proxy_cache_key |
| |
| # What servers should we accept PURGE requests from? If PageSpeed isn't |
| # running on the same server as your cache, list the IP(s) of the |
| # PageSpeed machine(s) here. |
| # |
| # This requires rebuilding with the ngx_cache_purge module: |
| # https://github.com/FRiCKLE/ngx_cache_purge |
| proxy_cache_purge PURGE from 127.0.0.1; |
| |
| # Mark HTML as uncacheable. If we can't send them purge requests they |
| # can't cache our html. Uses the map defined above. |
| proxy_hide_header Cache-Control; |
| add_header Cache-Control $new_cache_control_header_val; |
| |
| # Tell PageSpeed not to use optimizations specific to this request. |
| proxy_set_header PS-CapabilityList "fully general optimizations only"; |
| |
| # See discussion of rebeaconing above. |
| proxy_cache_bypass $bypass_cache; |
| proxy_hide_header PS-ShouldBeacon; |
| proxy_set_header PS-ShouldBeacon $should_beacon_header_val; |
| } |
| } |
| }</pre> |
| </dl> |
| |
| <p> |
| When running with downstream caching all resources referenced from the HTML |
| will be cache-extended as usual, so if you have resources that need to be |
| cached for a short time then they can be stale. If so, |
| either <code>Disallow</code> those resources, so PageSpeed doesn't inline or |
| cache-extend them, or decrease the cache lifetime on your HTML. |
| </p> |
| |
| <h2 id="additional">Additional Options</h2> |
| |
| The configuration above should be a good fit for most sites, but PageSpeed's |
| downstream caching is highly configurable with many options that allow you to |
| tweak it for your particular setup. |
| |
| <h3 id="beaconing">Beaconing</h3> |
| <p> |
| Several filters such as |
| <a href="reference-image-optimize#inline_images">inline_images</a>, |
| <a href="filter-inline-preview-images">inline_preview_images</a>, |
| <a href="filter-lazyload-images">lazyload_images</a> and |
| <a href="filter-prioritize-critical-css">prioritize_critical_css</a> |
| depend extensively on client beacons to determine critical images and |
| CSS. When such filters are enabled, pages periodically have beaconing |
| JavaScript inserted as part of the rewriting process. |
| The <a href="#standard">standard configuration</a> passes through 5% of cache |
| hits to the backend with a <code>PS-ShouldBeacon</code> header set, so that |
| these filters can continue to receive the beacons they need. |
| </p> |
| <p> |
| If you have a high traffic site, 5% is probably a larger share than you need |
| for PageSpeed to receive sufficient beacons. In that case you can decrease |
| the percentage of traffic to pass through. For example, here's how you'd |
| decrease it to 2%: |
| </p> |
| <dl> |
| <dt>Varnish 3.x or 4.x:<dd><pre> |
| <span class=diff-del>- if (std.random(0, 100) < 5) {</span> |
| <span class=diff-add>+ if (std.random(0, 100) < 2) {</span></pre> |
| <dt>Nginx proxy_cache<dd><pre> |
| <span class=diff-del>- if ($rand ~* "^[0-4]$") {</span> |
| <span class=diff-add>+ if ($rand ~* "^[01]$") {</span></pre> |
| </dl> |
| <p> |
| Alternatively, you may be willing to give up the benefit of the |
| beaconing-dependent filters in exchange for never intentionally bypassing the |
| cache. If so, you should <a href="config_filters#beacons">turn off |
| beaconing</a> and beacon-dependent filters in PageSpeed: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint" |
| >ModPagespeedCriticalImagesBeaconEnabled false |
| ModPagespeedDisableFilters prioritize_critical_css</pre> |
| <dt>Nginx:<dd><pre class="prettyprint" |
| >pagespeed CriticalImagesBeaconEnabled false; |
| pagespeed DisableFilters prioritize_critical_css;</pre> |
| </dl> |
| <p> |
| Additionally you should remove the proxy config that handles beaconing: |
| </p> |
| <dl> |
| <dt>Varnish 3.x:<dd><pre><span class="diff-del" |
| >- remove req.http.PS-ShouldBeacon; |
| ... |
| - if (std.random(0, 100) < 5) { |
| - set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| - return (pass); |
| - } |
| ... |
| - if (std.random(0, 100) < 25) { |
| - set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| - return (pass); |
| - }</span></pre> |
| <dt>Varnish 4.x:<dd><pre><span class="diff-del" |
| >- unset req.http.PS-ShouldBeacon; |
| ... |
| - sub vcl_hit { |
| - if (std.random(0, 100) < 5) { |
| - set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| - return (pass); |
| - } |
| - } |
| - sub vcl_miss { |
| - if (std.random(0, 100) < 25) { |
| - set req.http.PS-ShouldBeacon = "<your-secret-key>"; |
| - return (pass); |
| - } |
| - }</span></pre> |
| <dt>Nginx proxy_cache<dd><pre><span class="diff-del" |
| >- set $should_beacon_header_val ""; |
| - set_random $rand 0 100; |
| - if ($rand ~* "^[0-4]$") { |
| - set $should_beacon_header_val "<your-secret-key>"; |
| - set $bypass_cache 1; |
| - } |
| ... |
| - proxy_cache_bypass $bypass_cache; |
| - proxy_hide_header PS-ShouldBeacon; |
| - proxy_set_header PS-ShouldBeacon $should_beacon_header_val;</span></pre> |
| </dl> |
| |
| |
| <h3 id="ps-resource">PageSpeed Resources</h3> |
| <p> |
| Because PageSpeed already caches its optimized resources, you may want to |
| exclude them caching by the downstream cache. If so, you can set: |
| |
| <dl> |
| <dt>Varnish 3.x and 4.x:<dd><pre><span class="diff-add" |
| >+ if (req.url ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") { |
| + return (pass); |
| + }</span></pre> |
| <dt>Nginx proxy_cache<dd><pre><span class="diff-add" |
| >+ if ($uri ~ "\.pagespeed\.([a-z]\.)?[a-z]{2}\.[^.]{10}\.[^.]+") { |
| + set $bypass_cache "1"; |
| + }</span></pre> |
| </dl> |
| |
| <p> |
| If you have enabled <a href="restricting_urls#url_signatures">URL signing</a>, |
| change the <code>10</code> in the regexp to <code>20</code> to account for the |
| additional characters in the hash. |
| </p> |
| |
| <h3 id="ps-capabilitylist">PS-CapabilityList</h3> |
| |
| <p> |
| Typically PageSpeed will produce different HTML for different browsers. For |
| example, when responding to a request that has <code>Accept: |
| image/webp</code>, PageSpeed knows the requesting browser supports WebP and so |
| it can send these images, while if the <code>Accept</code> header doesn't |
| mention WebP then it will send JPEG or PNG. To suppress this behavior, |
| the <a href="#standard">standard configuration</a> above sets a header: |
| </p> |
| <pre> |
| PS-CapabilityList: fully general optimizations only |
| </pre> |
| <p> |
| This header can also be used to tell PageSpeed to make specific optimizations. |
| There are five capabilities PageSpeed can take advantage of that aren't |
| supported in all browsers, and it gives them each a code: |
| </p> |
| <table> |
| <tr><th>Capability<th>Code |
| <tr><td>Inline Images<td><code>ii</code> |
| <tr><td>Lazyload Images<td><code>ll</code> |
| <tr><td>WebP Images<td><code>jw</code> |
| <tr><td>Lossless WebP Images<td><code>ws</code> |
| <tr><td>Animated WebP Images<td><code>wa</code> |
| <tr><td>Defer Javascript<td><code>dj</code> |
| </table> |
| <p> |
| For example, you could include whether the <code>Accept</code> header |
| includes <code>image/webp</code> in your cache key, and then for the |
| fraction of traffic that claimed webp support send: |
| </p> |
| <pre> |
| PS-CapabilityList: jw: |
| </pre> |
| <p> |
| Every page would go through to your origin twice and be cached twice, once |
| processed with WebP support and once without. |
| </p> |
| <p> |
| You can combine multiple capabilities together with a comma. For example, if |
| you decided to make a cache fragment for Chrome 30+, which supports all of |
| these, for that fragment you would send: |
| </p> |
| <pre> |
| PS-CapabilityList: ll,ii,dj,jw,ws: |
| </pre> |
| <p> |
| For Firefox 4+, which supports all of these but WebP, you would send: |
| </p> |
| <pre> |
| PS-CapabilityList: ll,ii,dj: |
| </pre> |
| <p> |
| To use this header properly, however, you have to know which capabilities are |
| supported by which browsers in the version of PageSpeed you're using and craft |
| regular expressions to match exactly those ones. This is very difficult to do |
| in general because it involves duplicating the code in |
| <a href="https://github.com/apache/incubator-pagespeed-mod/blob/latest-stable/pagespeed/kernel/http/user_agent_matcher.cc"><code>user_agent_matcher.cc</code></a> |
| as regexes, but a simple division is: |
| <ul> |
| <li>Chrome 32+: <code>ll,ii,dj,jw,wa,ws</code> |
| <li>Firefox 4+, Safari, IE10 (but not IE11): <code>ll,ii,dj</code> |
| <li>Everything else: <code>fully general optimizations only</code> |
| </ul> |
| </p> |
| |
| <h3 id="purge-get">Purging with GET</h3> |
| |
| <p> |
| If you're integrating PageSpeed with a cache that doesn't |
| support <code>PURGE</code> requests but does support purging in response to a |
| prefixed <code>GET</code> request, PageSpeed can support that. You would |
| configure your cache to treat a <code>GET</code> to |
| <code>/purge/foo/bar</code> as a request to purge <code>/foo/bar</code> and |
| configure PageSpeed as: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedDownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge |
| ModPagespeedDownstreamCachePurgeMethod GET</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed DownstreamCachePurgeLocationPrefix http://CACHE-HOST:PORT/purge; |
| pagespeed DownstreamCachePurgeMethod GET;</pre> |
| </dl> |
| |
| <h3 id="purge-threshold">Purge Threshold</h3> |
| <p> |
| Whenever PageSpeed serves an HTML response that is not fully optimized it |
| continues rewriting in the background. When it finishes, if the HTML it |
| served was less than 95% optimized, it sends a purge request to the downstream |
| cache. The next request to come in will bypass the cache and come back to |
| PageSpeed where it can serve out the now more highly optimized page. If you |
| want to change what point PageSpeed considers the page done and stops |
| optimizing, you can set a different value for this threshold. For example, to |
| lower it to 80%, so that PageSpeed is satisfied with a page that is only 80% |
| optimized, you would set: |
| </p> |
| <dl> |
| <dt>Apache:<dd><pre class="prettyprint"> |
| ModPagespeedDownstreamCacheRewrittenPercentageThreshold 80</pre> |
| <dt>Nginx:<dd><pre class="prettyprint"> |
| pagespeed DownstreamCacheRewrittenPercentageThreshold 80;</pre> |
| </dl> |
| |
| <h3 id="script-variables">Script Variables</h3> |
| <p class="note"><strong>Note: Nginx-only</strong></p> |
| <p class="note"><strong>Note: New feature as of 1.10.33.0</strong></p> |
| |
| <p> |
| In ngx_pagespeed <code>DownstreamCachePurgeLocationPrefix</code>, |
| <code>DownstreamCachePurgeMethod</code>, and |
| <code>DownstreamCacheRewrittenPercentageThreshold</code> support script |
| variables, so it's possible to set them on a per-request basis. Turn this on |
| with: |
| <pre class="prettyprint"> |
| http { |
| pagespeed ProcessScriptVariables on; |
| ... |
| }</pre> |
| You can then use script variables in arguments for these commands: |
| <pre class="prettyprint"> |
| pagespeed DownstreamCachePurgeLocationPrefix "$purge_location"; |
| pagespeed DownstreamCachePurgeMethod "$cache_purge_method"; |
| pagespeed DownstreamCacheRewrittenPercentageThreshold "$rewrite_threshold";</pre> |
| </p> |
| <p> |
| For more details on script variables, including how to handle dollar signs, |
| see <a href="system#nginx_script_variables">Script Variable Support</a>. |
| </p> |
| <h2 id="implementation">Implementation Details</h2> |
| <p> |
| To support downstream caching PageSpeed sends a purge request to the caching |
| layer whenever it identifies an opportunity for more rewriting to be done on |
| content that was just served. Such opportunities could arise because of, say, |
| the resources now becoming available in the PageSpeed cache or an image |
| compression operation completing. The cache purge forces the next request for |
| the HTML file to come all the way to the backend PageSpeed server and obtain |
| better rewritten content, which is then stored in the cache. This interaction |
| between the PageSpeed server and the downstream caching layer is depicted in |
| the diagram given below. |
| </p> |
| <img src="images/downstream_caching.png" > |
| <p> |
| In the interaction depicted above, note that the partially optimized HTML will |
| be served from the cache until a purge request gets sent by the PageSpeed |
| server. It is recommended to set up PageSpeed and the downstream caching |
| layer servers on a one to one basis so that the purges can be sent to the |
| correct downstream server. |
| </p> |
| </div> |
| <!--#include virtual="_footer.html" --> |
| </body> |
| </html> |