Cache Headers and Enfold Proxy

Introduction

HTTP caching is defined in various specifications used by web servers and clients, particularly RFC2616. As such, RFC2616 should be considered the formal definition of how Enfold Proxy (EP) caching works. This document gives a less formal overview of how EP caching works.

Later in this topic there is an explanation of the core concepts behind caching, with diagrams showing how caching works.

Caching Goals

Enfold Proxy can answer an HTTP request significantly faster than Zope when it is able to return a copy from its own cache. The goal, therefore, is to increase the proportion of all HTTP requests which EP can answer from its cache.

There are three ways to do this (in descending order of importance).

  1. Increase the number of items which are cachable.
  2. Increase the length of time that individual items will be considered fresh.
  3. Increase the number of stale items which can be revalidated from Zope.

To do any of these three things, you need to modify the HTTP headers which Enfold Server/Plone sends to Enfold Proxy. There are three ways to do this:

  1. Configure a cache profile (if using Enfold Server). Enfold Server includes a configuration screen in Plone Site Setup which lets you configure a cache profile.
  2. Configure a cache policy with CacheFu. CacheFu is a third party Plone product.
  3. Manually set the HTTP headers on your Plone/Zope pages. (This must be done programmatically).
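
As a rough illustration of the third option, the sketch below builds a Cache-Control value and sets it on a response. It assumes a response object with a setHeader(name, value) method, as Zope response objects provide. The helper name apply_cache_headers and the FakeResponse stand-in are hypothetical and exist only so the example runs on its own.

```python
class FakeResponse:
    """Hypothetical stand-in for a Zope response; only setHeader() is modeled."""

    def __init__(self):
        self.headers = {}

    def setHeader(self, name, value):
        self.headers[name] = value


def apply_cache_headers(response, max_age, s_maxage=None):
    """Mark a response as cachable for max_age seconds.

    s_maxage, if given, applies only to shared caches such as a proxy.
    """
    directives = ["public", "max-age=%d" % max_age]
    if s_maxage is not None:
        directives.append("s-maxage=%d" % s_maxage)
    response.setHeader("Cache-Control", ", ".join(directives))
    return response


# Browsers may reuse the page for 5 minutes, a shared cache for 1 hour.
response = apply_cache_headers(FakeResponse(), max_age=300, s_maxage=3600)
print(response.headers["Cache-Control"])
```

In a real Plone/Zope page the response object would come from the current request rather than from a stand-in class.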

Important: If you configure your cache settings incorrectly or if you make major modifications to your cache settings, it is best to purge the cache for your proxy definition. (See Purging Cached Content).

Increase the number of items which are cachable

One prerequisite to caching is knowing how much caching is too much and having a way to apply different caching policies to different kinds of Plone/Enfold Server content.

Chasseur (the caching product installed by default with Enfold Server) keeps things simple: all content which can be accessed by the anonymous user is considered cachable. CacheFu lets you mark certain content types as cachable and even lets you modify cache rules for logged in users vs. anonymous users. (Read more information about configuring caching products with Plone).

The best way to increase the number of cachable items is to make sure a Plone caching product is installed and to verify that HTTP requests are generally being cached. (See Verifying that your cache settings are in effect).

Increase the length of time that individual items will be considered fresh

Increasing the time that a content item is fresh will allow Enfold Proxy to handle the request by itself without having to bother Plone/Enfold Server.

You do this in Plone by going to your caching product and increasing the max-age or s-maxage value for a content item or a category of content items. (In Chasseur on Enfold Server, you don't actually see these values, but that is essentially what you are modifying when you change the caching profiles.) If you have enabled the Headers or Debug log level, you can also view the s-maxage or max-age value in the logs.

Increase the number of stale items which can be revalidated from Zope

The biggest bang for the buck comes from increasing the length of time items will be considered fresh. You can also make minor time savings by maximizing the number of stale items which EP can keep for future revalidation. Successful revalidation occurs when EP receives a 304 message from Plone with an updated date stamp. Because Plone does not need to send the full content item in this case (only the 304 header with the date stamp), the overall load on Plone is reduced. Performing that transaction is faster than having Plone process and return a full response. On the other hand, because a revalidation attempt involves two requests (one to EP and one to Plone) instead of one, the second request may cancel out any time saved. The advantage of revalidation comes not from the request that triggers it but from extending the time during which EP's cached copy of the item is fresh for other people seeking it.

Similarly, a revalidation attempt which is not successful will involve waiting for two requests and receiving the full content from Plone.

If a high percentage of revalidation attempts are not succeeding, that could slow down your site significantly. Therefore, when troubleshooting cache settings, a good first step is to verify that your revalidation attempts are actually succeeding. The only way to do this is to monitor traffic between Enfold Proxy and Plone and verify that Plone is sending 304s back to Enfold Proxy. To do so, inspect EP's logs after setting the log level to Headers or Debug. (For more info, see Headers log level).

Concepts

The first time a person makes a request for a content item, here is what typically happens:

images/diagram-not-in-cache.png

At first, Enfold Proxy does not have a cachable copy of the item, so it must fetch it directly from Plone.

When an item is received from the backend server, it is examined to see if it is "cachable" - that is, if it is able to be stored in the cache and used to satisfy future requests for that item.

Unless Plone includes a special header which specifies that the item is not cachable, Enfold Proxy will keep a copy locally on its machine which it can use for responding to future browser requests.

If the sysadmin purges the cache for the proxy definition, then all cached copies are deleted (and Enfold Proxy has to start again from scratch).

After the initial request for a content item, here is what typically happens:

images/diagram-found-in-cache.png

When a client connects to EP and asks for a resource, EP first checks to see if it is in the cache - that is, if a previous request for the resource determined that it was cachable, as described above.

If the item is in the cache, the next thing to be determined is if the item is "stale." A stale item means that although the item is in the cache, the parameters for that item indicate that it is no longer allowed to be used directly.

If an item is fresh, then it can be sent back to the client without contacting the backend server. However, if the item is stale, EP can generally "validate" such items. Validating an item consists of connecting to the server and asking the server if the version fetched before is still the same. If the server responds in the affirmative, EP is still able to use the previously stale item, and although contact was made with the backend server, the data itself was not re-transmitted.
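
To make the validation step concrete, here is a minimal sketch of the conditional request headers a cache sends when it validates a stored item. The header pairing (Last-Modified with If-Modified-Since, ETag with If-None-Match) is standard HTTP; the function names are hypothetical and this is not EP's actual implementation.

```python
def build_validation_headers(last_modified=None, etag=None):
    """Return the request headers used to validate a stored item.

    last_modified and etag are the values received with the original response.
    An empty result means the item has no validator and cannot be validated.
    """
    headers = {}
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    if etag:
        headers["If-None-Match"] = etag
    return headers


def stored_copy_still_valid(status):
    """A 304 (Not Modified) reply means the stored copy may be reused."""
    return status == 304


headers = build_validation_headers(
    last_modified="Tue, 02 Sep 2008 10:00:00 GMT", etag='"abc123"'
)
print(headers)
```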

It is important to understand the conceptual difference between an item being "cachable" and an item being "fresh". It is possible, and quite common, that a server will send a cachable response but indicate that it is immediately stale (i.e., by including max-age=0 or an Expires value which is already out of date). This means that although EP can store the item in its cache, it is not able to use it to answer client requests without first validating it with the server. In this case, the end result is that the cache will never serve old items (because they are never fresh and require revalidation), but bandwidth usage between EP and the backend server will be reduced because Plone won't need to send the content item itself, only the 304 message with an updated date stamp.
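
The lookup, freshness and validation steps described in this section can be sketched as a small simulation. Everything here is illustrative: the CacheEntry class, the handle_request function, the outcome labels and the backend callable (which returns a status, body, validator and freshness lifetime in seconds, with a lifetime of None meaning "not cachable") are assumptions made for the example, not EP internals.

```python
class CacheEntry:
    def __init__(self, body, validator, fresh_until):
        self.body = body                # stored response body
        self.validator = validator      # e.g. an ETag or Last-Modified value
        self.fresh_until = fresh_until  # time after which the entry is stale


def handle_request(cache, url, backend, now):
    """Return (body, outcome); outcome is "MISS", "HIT" or "REVALIDATED"."""
    entry = cache.get(url)
    if entry is None:
        # Not in the cache: fetch from the backend and store it if cachable.
        status, body, validator, lifetime = backend(url, None)
        if lifetime is not None:
            cache[url] = CacheEntry(body, validator, now + lifetime)
        return body, "MISS"
    if now < entry.fresh_until:
        # Fresh: answer without contacting the backend.
        return entry.body, "HIT"
    # Stale: ask the backend whether the stored version is still current.
    status, body, validator, lifetime = backend(url, entry.validator)
    if status == 304:
        entry.fresh_until = now + (lifetime or 0)
        return entry.body, "REVALIDATED"
    # The item changed: replace or drop the stored copy.
    if lifetime is not None:
        cache[url] = CacheEntry(body, validator, now + lifetime)
    else:
        del cache[url]
    return body, "MISS"


def make_backend(lifetime):
    """Build a fake backend whose single page never changes."""
    def backend(url, validator):
        if validator == "v1":
            return 304, None, "v1", lifetime
        return 200, "<html>page</html>", "v1", lifetime
    return backend
```

With a lifetime of 0 (the "cachable but immediately stale" case), the first request is a miss and every later request is revalidated; with a lifetime of 60, requests within 60 seconds of the fetch are served directly from the cache.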

HTTP Headers Reference

There are a number of HTTP headers used to control caching. Headers indicate if the response is cachable at all, and if so, how long the item should be considered "fresh". Other headers can also dictate certain parameters for the cachability and freshness of the response - e.g., "this item can only be served from a cache if the client has these specific request headers". As a result, caching can get quite complex to understand. Unfortunately, that is just the nature of the beast.

This section attempts to detail the more common HTTP headers used to control caching. For more detailed information, please refer to RFC2616.

A number of headers found in CacheFu are not currently supported in Enfold Proxy. If any of these unsupported headers are present, Enfold Proxy will simply ignore them. Here is a list of headers which are not currently supported by Enfold Proxy: vary (limited support), no-transform, pre-check, post-check, stale-while-revalidate, stale-if-error.

With regard to "vary," EP only caches a single copy of an item, not one copy per variation. Thus, pages with different Vary headers may cause more cache misses than you might otherwise expect.

In general, Cache Control headers are supported. That includes max-age, s-maxage, public, no-cache, no-store, must-revalidate, and proxy-revalidate.

(Read an introduction to Cache Control headers at http://www.mnot.net/cache_docs/#CACHE-CONTROL).

Note: Browser-based tools for viewing HTTP headers don't show the full story, because you won't see the HTTP headers which Enfold Proxy and Enfold Server (Plone) exchange. The easiest way to view the HTTP headers which EP is sending and receiving (from both the browser and Plone) is to examine the Enfold Proxy logs. In September 2008 Enfold Proxy introduced a new log level called Headers Log Level to permit more user-friendly viewing of the HTTP traffic. You will first need to enable this log level in your proxy definition.

When your web browser makes an HTTP request through Enfold Proxy, the response may also include an informational header specific to Enfold Proxy:
  • If you see X-Cache: HIT, then yes, Enfold Proxy has sent a cached version to the web browser.
  • If you see X-Cache: MISS, then no, Enfold Proxy did not send a cached version to the browser (i.e., it had to fetch it from Plone).
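
A small helper along these lines can interpret the X-Cache header when checking responses from a script. It is only a sketch: the function name is hypothetical, and it assumes the response headers are available as a mapping of header names to values.

```python
def served_from_cache(headers):
    """Return True for X-Cache: HIT, False for X-Cache: MISS, else None."""
    for name, value in headers.items():
        if name.lower() == "x-cache":  # header names are case-insensitive
            token = value.strip().upper()
            if token == "HIT":
                return True
            if token == "MISS":
                return False
            return None
    return None


print(served_from_cache({"Content-Type": "text/html", "X-Cache": "HIT"}))
```

The headers argument could be, for example, the headers attribute of a response returned by urllib.request.urlopen.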

Last-Modified & ETag: These headers are used when EP needs to validate an item with the backend server after it has become stale. EP sends the values it initially received for these headers, and the server looks at the values to determine whether the copy EP holds is still current. A response without these headers can still be cached (and therefore served from the cache while it is considered fresh), but it cannot be validated with the backend server once it becomes stale.

Expires: This is an older HTTP 1.0 header which indicates a date when the item should be considered stale. Normally this value should be in the future when served with the content. If the Expires date has already arrived, then EP will treat the item as immediately stale.

Pragma: no-cache: This is an old HTTP 1.0 header which indicates the item should not be considered cachable.

Cache-Control: This is a general-purpose header for controlling various aspects of caching. The value of the header determines what control is being requested. Common values are:

private, no-store: Either of these indicates that the item should not be considered cachable.

no-cache: Despite the name, this only indicates that the item should be immediately considered stale - i.e., it can be stored in the cache, but cannot be served before validation. (Note that a variation of this control allows you to specify a specific header that should not be cached, but that is not covered here.)

must-revalidate & proxy-revalidate: Indicate that once the item becomes stale, if there is an error validating the item, then the proxy must return an error rather than serving the cached response (RFC2616 specifies a 504 Gateway Timeout for this case). Suppose the backend Plone server were down. In the absence of either header, EP will serve stale cached content. Plone would need to include either header to ensure that an error message is returned to the user.

max-age: Similar to the 'Expires' header, but indicates a number of seconds for which the item should be considered fresh.

s-maxage: The same as max-age, but used only by 'shared caches', of which EP is one. If 'Expires' and either max-age or s-maxage are given, 'Expires' is ignored.
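
The Cache-Control rules above can be summarized in a short sketch. This is a simplified illustration, not EP's implementation: the function names are hypothetical, the parser does not handle quoted arguments that contain commas, and the headers are assumed to be a dictionary using the exact header names shown.

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value):
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        name, _, arg = part.strip().partition("=")
        if name.strip():
            directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def cache_policy(cc):
    """Summarize what the parsed directives allow a cache to do."""
    return {
        "cachable": "private" not in cc and "no-store" not in cc,
        "validate_before_use": "no-cache" in cc,
        "error_if_validation_fails": (
            "must-revalidate" in cc or "proxy-revalidate" in cc
        ),
    }


def freshness_lifetime(headers, shared_cache=True):
    """Seconds the response stays fresh, or None if no header says."""
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared_cache and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])        # shared caches prefer s-maxage
    if cc.get("max-age") is not None:
        return int(cc["max-age"])         # max-age overrides Expires
    if "Expires" in headers and "Date" in headers:
        try:
            expires = parsedate_to_datetime(headers["Expires"])
            date = parsedate_to_datetime(headers["Date"])
            return max(0, int((expires - date).total_seconds()))
        except (TypeError, ValueError):
            return 0                      # unreadable date: treat as stale
    return None


example = {"Cache-Control": "public, max-age=300, s-maxage=3600"}
print(freshness_lifetime(example), freshness_lifetime(example, shared_cache=False))
```

For the example headers, a shared cache would use the s-maxage value while a browser cache would use max-age.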