How Caching Works

 

Why Cache?

One primary function of Enfold Proxy (EP) is to cache content for Plone. Requests to Plone are expensive in terms of time, and caching minimizes the number of direct requests to Plone. In a typical configuration, Enfold Proxy and Internet Information Services (IIS) reside on the same machine, and Plone runs on an entirely different machine. Enfold Proxy keeps cached versions of many items on its own machine (typically in C:\Program Files\Enfold Proxy\cache) and serves them directly to the visitor's browser without obtaining them from Plone/Enfold. This has two advantages. First, IIS and EP are faster to begin with, so responses served from the cache arrive sooner. Second, the more you cache, the less load Plone receives (and the lower its RAM and CPU use).

This topic provides a general introduction to what caching is and how to monitor it. The next topic, Configure EP for Caching, covers typical configurations and explains how to purge the cache.

Steve Souders wrote in the book High Performance Web Sites (O'Reilly, 2007), "Only 10-20% of the end user response time is spent downloading the HTML document. The other 80-90% is spent downloading all the components in the page." Enfold Proxy is the component that helps deliver that remaining 80-90% more efficiently, and in a way that ensures stale content is not sent.

Introduction to Caching

First, a more basic question: what is web caching?

There are three different kinds of "cache" in the context of websites:

A private cache (also called browser-based caching or user-agent cache). If a browser has already fetched a web resource and needs the same resource again, it knows (through HTTP headers) to use its local copy instead of downloading the resource again. When the browser checks with the server and the resource has not changed, the server answers with a 304 (Not Modified) response, which is the signal for the browser to use its privately cached copy. Ultimately, this kind of cache has nothing to do with Enfold or IIS. By default, the HTTP headers for Enfold Proxy give Age=0, and that setting will always cause EP to check for a cached version. (Some Plone caching solutions use private caches, but Enfold Proxy does not.)
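The 304 exchange described above can be sketched in a few lines of Python. This is only an illustration of the general HTTP mechanism, not of Enfold Proxy or Plone internals: it assumes the server identifies a version of a resource with an ETag header and that the browser sends it back in If-None-Match, and the respond function and the sample values are hypothetical.

```python
def respond(resource_etag, body, request_headers):
    """Answer a GET request, honoring the If-None-Match validator.

    Returns a (status, headers, body) tuple. Hypothetical helper that
    stands in for the server side of the exchange.
    """
    if request_headers.get("If-None-Match") == resource_etag:
        # The browser's copy is still current: send no body at all.
        return 304, {"ETag": resource_etag}, b""
    return 200, {"ETag": resource_etag}, body


# First visit: the browser has nothing cached, so it gets the full body.
status, headers, payload = respond('"v1"', b"<html>...</html>", {})
private_cache = {"etag": headers["ETag"], "body": payload}

# Repeat visit: the browser revalidates its copy with the stored ETag.
status2, _, payload2 = respond(
    '"v1"', b"<html>...</html>", {"If-None-Match": private_cache["etag"]}
)

# On a 304 the browser reuses its private copy instead of the empty body.
page = private_cache["body"] if status2 == 304 else payload2
```

The second response carries no body, which is where the time and bandwidth savings of a private cache come from.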

Forward proxy servers keep local copies of frequently requested resources, allowing large organizations and ISPs to significantly reduce their upstream bandwidth usage and cost while significantly increasing performance. For the most part, a forward proxy server operates independently of Plone and Enfold Proxy and is not really relevant to this document. However, some settings in the HTTP headers and in plone.app.caching are geared to forward proxy servers, so this document mentions them here simply to point out the potential for confusion.

A caching proxy server (or reverse proxy) receives web traffic to a site and returns cached content (usually static content) whenever possible. Enfold Proxy is a caching proxy server because it forwards HTTP requests received from IIS to Plone only after first making sure that it does not have a cached version that it could return. See the Enfold Proxy Architecture for more details.

In the context of Enfold Proxy, we can break the concept down further according to caching method:
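As a rough illustration of "check the cache first, forward only on a miss", here is a minimal Python sketch. It is not how Enfold Proxy is implemented: the CachingProxy class, the fake_plone backend, and the backend_hits counter are all hypothetical names, and the sketch ignores expiry, HTTP headers, purging, and logged-in users.

```python
class CachingProxy:
    """Toy reverse proxy: serve from cache, forward to the backend on a miss."""

    def __init__(self, backend):
        self.backend = backend   # callable taking a URL, returning a body
        self.cache = {}          # URL -> cached response body
        self.backend_hits = 0    # how many requests reached the backend

    def handle(self, url):
        if url in self.cache:
            # Cache hit: the backend is never contacted.
            return self.cache[url]
        # Cache miss: forward the request, then remember the answer.
        self.backend_hits += 1
        body = self.backend(url)
        self.cache[url] = body
        return body


def fake_plone(url):
    """Hypothetical stand-in for an expensive request to Plone."""
    return ("rendered " + url).encode()


proxy = CachingProxy(fake_plone)
first = proxy.handle("/logo.png")    # miss: forwarded to the backend
second = proxy.handle("/logo.png")   # hit: answered from the cache
```

After the two requests, backend_hits is 1: the second request never reached the backend, which is the load reduction the proxy exists to provide.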

  • Disk-based Cache. The cached item is saved as a file on the proxy server's file system. This is probably the most common method.
  • RAM-based Cache. Because directories in Plone refer not to physical files but to objects, these items are not cached on the file system. Instead, they are cached in memory as objects. This object-based cache is discussed in greater detail in the Caching XSLT topic.
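To make the disk-based method concrete, the following Python sketch stores each cached body as one file. It is an assumption-laden illustration, not Enfold Proxy's actual on-disk layout: the DiskCache class is hypothetical, and naming files after a SHA-256 hash of the URL is simply one convenient way to get safe file names.

```python
import hashlib
import tempfile
from pathlib import Path


class DiskCache:
    """Toy disk-based cache: one file per cached URL."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, url):
        # Hash the URL so that any URL maps to a valid file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.directory / digest

    def put(self, url, body):
        self._path(url).write_bytes(body)

    def get(self, url):
        path = self._path(url)
        return path.read_bytes() if path.exists() else None


cache_dir = tempfile.mkdtemp()   # throwaway directory for the example
disk = DiskCache(cache_dir)
disk.put("/images/logo.png", b"\x89PNG...")

hit = disk.get("/images/logo.png")   # returns the stored bytes
miss = disk.get("/not-cached")       # returns None
```

A RAM-based cache follows the same put/get pattern but keeps the objects in a dictionary in memory instead of writing files.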

Clearing Private Cache with your Browser

Web surfers are used to pressing F5 or clicking the Refresh button to obtain the latest version of a web page. This doesn't clear the browser's cache and is not suitable for testing or troubleshooting caching issues. Here is the right way to clear the cache in each browser:

  • Firefox. Tools > Clear Private Data (Ctrl + Shift + Delete). You don't need to clear all the items here. Clearing just Cache is sufficient.
  • Internet Explorer 7. Tools > Internet Options > (General Tab) > Browsing History: Delete > Delete Temporary Internet Files.

(It is not necessary to delete cookies or the other items).

Viewing HTTP Headers

There are several ways to watch HTTP headers. First, you can use the logs that are generated by Enfold Proxy. Since the October 2008 release, Enfold Proxy includes a special log level that shows HTTP headers, with directional arrows in the log to indicate whether the request is coming or going. (Read more about how to view HTTP headers in the proxy log.)

The second way is to install plugins in your browser that let you view the headers in real time. Because of the potential for confusion, this second way is not recommended; it is mainly useful when you do not have access to the Enfold Proxy logs.

Below we describe how to use browser plugins to view HTTP headers and how to interpret them. The HTTP headers in the proxy log may differ slightly from what you see below, but generally the syntax should be similar. Also, the directional arrows in the header logs should indicate where the responses are coming from. (Mainly, you will be checking how often Zope/Plone is handling requests; the less often, the better.)

To verify caching, you will need to view your HTTP response headers. Several browser tools let you do this.

  1. Live HTTP Headers (http://livehttpheaders.mozdev.org/) is the recommended tool. This is a Firefox plugin that lets you view HTTP headers in real time. For easier reading, you can go to the Generator tab and filter out image requests.
  2. Firebug (http://www.getfirebug.com/). This is another Firefox plugin, which not only records HTTP headers for each item but also records other information (like download speed). You can view HTTP information for each HTTP request by clicking on the web resource. Live HTTP Headers is probably better at capturing data in real time, but Firebug shows graphically which resources on a web page are taking the longest time to load. It is commonly assumed that dynamically generated server content takes longer to load than images or scripts. In fact, Firebug reveals how often images and scripts are the culprits, not the HTML content.
  3. Google Chrome Dev Tools.
  4. Fiddler (http://www.fiddlertool.com/). This is a tool for Internet Explorer. A good MSDN tutorial about using Fiddler is here: http://msdn2.microsoft.com/en-us/library/bb250446(VS.85).aspx
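Whichever tool you use, the output is a block of response headers in which only a few lines matter for caching. The short Python sketch below picks those lines out of a captured response. The caching_headers function, the choice of header names to watch, and the sample response are all illustrative assumptions; the sample is not real Enfold Proxy output.

```python
# Standard HTTP headers commonly involved in caching decisions.
CACHE_HEADERS = ("cache-control", "age", "expires", "etag", "last-modified")


def caching_headers(raw_response_head):
    """Return the status line and the caching-related headers of a response."""
    lines = raw_response_head.strip().splitlines()
    status_line = lines[0]
    found = {}
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive, so compare in lower case.
        if sep and name.strip().lower() in CACHE_HEADERS:
            found[name.strip().lower()] = value.strip()
    return status_line, found


# Hypothetical response head, as it might be copied from a header viewer.
sample = """HTTP/1.1 200 OK
Content-Type: text/html
Cache-Control: max-age=0
Age: 0
ETag: "abc123"
"""

status_line, found = caching_headers(sample)
for name, value in found.items():
    print(name + ": " + value)
```

Filtering the headers this way makes it easier to compare the same page as seen by a logged-in user and by an anonymous visitor.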

Tip #1: Surf with two browsers! More than likely, you'll be using Firefox with Live HTTP Headers to inspect your HTTP headers. You can then use one browser specifically to simulate a logged-in user (where different caching rules apply) and the other browser to simulate the anonymous, non-logged-in user (where the most aggressive caching rules apply).

Tip #2: When you are logged in as an administrative user, a smaller percentage of your HTTP requests will be cached. As a result, the response time for a Plone page might seem slower than it would be for an anonymous user.

For more information, see the topic on monitoring your Enfold Proxy logs. See also this general introduction to web caching.