How Caching Works

Why Cache?

One primary function of Enfold Proxy (EP) is to cache content for Enfold Server or Plone. Requests to Plone are expensive in terms of time, and caching minimizes the number of direct requests to Plone. In a typical configuration, Enfold Proxy and Internet Information Services (IIS) reside on the same machine, and Enfold Server (or Plone) resides on an entirely different machine. Enfold Proxy keeps cached versions of many items on its own machine (typically in C:\Program Files\Enfold Proxy\cache) and serves them directly to the visitor's browser without fetching them from Plone/Enfold. This has two advantages. First, IIS and EP are faster to begin with, which improves performance. Second, the more you cache, the less load Plone receives (and the lower its RAM/CPU use).

This topic provides a general introduction to what caching is and how to monitor it. The next topic, Configure EP for Caching, covers typical configurations and how to purge the cache.

Steve Souders wrote in the book High Performance Web Sites (O'Reilly, 2007), "Only 10-20% of the end user response time is spent downloading the HTML document. The other 80-90% is spent downloading all the components in the page." Enfold Proxy is the component that helps deliver that remaining 80-90% more efficiently, and in a way that ensures that stale content is not sent.

Introduction to Caching

First, a more basic question: what is web caching?

There are three different kinds of "cache" in the context of websites.

A Private Cache (also called browser-based caching or user-agent cache). If a browser has already fetched a web resource and needs the same resource again, it knows (through HTTP headers) to use its local copy instead of requesting the resource remotely. When the browser checks with the server and the resource has not changed, the server answers with a 304 (Not Modified) status, which is the signal for the browser to use its privately cached copy. Ultimately, this kind of cache has nothing to do with Enfold or IIS. By default, the HTTP headers for Enfold Proxy give Age=0, and that setting will always cause EP to check for a cached version. (Some Plone caching solutions use private caches, but Enfold Proxy does not.)

Forward Proxy Servers keep local copies of frequently requested resources, allowing large organizations and ISPs to significantly reduce their upstream bandwidth usage and cost while significantly increasing performance. For the most part, a forward proxy server operates independently of Plone and Enfold Proxy and is not really relevant to this document. However, some settings in the HTTP headers and in CacheFu are geared to forward proxy servers, so they are mentioned here simply to point out the potential for confusion.

A caching proxy server (or reverse proxy) receives web traffic to a site and returns cached content (usually static content) whenever possible. Enfold Proxy is a caching proxy server because it forwards HTTP requests received from IIS to Enfold Server/Plone only after first making sure that it does not have a cached version that it could return. See the Enfold Proxy Architecture for more details.
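The check-the-cache-first behavior described above can be sketched in a few lines of Python. This is only a toy in-memory model and not Enfold Proxy's actual code: the CachingProxy class, the fake_backend function, and the fixed one-hour lifetime are all hypothetical names and values chosen for illustration.

```python
import time


class CachingProxy:
    """Toy model of a caching reverse proxy (illustrative only)."""

    def __init__(self, fetch_from_backend, max_age=3600, clock=time.time):
        # fetch_from_backend is a hypothetical callable: path -> response body
        self.fetch_from_backend = fetch_from_backend
        self.max_age = max_age      # seconds a cached copy stays fresh
        self.clock = clock
        self.store = {}             # path -> (body, time it was stored)

    def handle(self, path):
        """Return (body, "HIT" or "MISS") for a requested path."""
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now - entry[1] < self.max_age:
            return entry[0], "HIT"          # fresh copy: backend is not contacted
        body = self.fetch_from_backend(path)  # no usable copy: ask the backend
        self.store[path] = (body, now)
        return body, "MISS"


backend_calls = []


def fake_backend(path):
    """Stand-in for Plone: records each request it has to serve."""
    backend_calls.append(path)
    return "<html>%s</html>" % path


proxy = CachingProxy(fake_backend)
first = proxy.handle("/events")
second = proxy.handle("/events")
print(first[1], second[1], len(backend_calls))
```

The second request for /events is answered from the store, so the stand-in backend is contacted only once.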

In the context of Enfold Proxy, we can break the concept down even further according to caching method:

  • Disk-based Cache. The cached item is saved as a file on the proxy server's file system. This is probably the most common method.
  • RAM-based Cache. Because directories in Plone refer not to physical files but to objects, these items are not cached on the file system. Instead, they are cached as objects. Object-based caching is discussed in greater detail in the Caching XSLT topic.

Clearing Private Cache with your Browser

Web surfers are used to pressing F5 or clicking the Refresh button to obtain the latest version of a web page. This doesn't clear the browser's cache and is not suitable for testing or troubleshooting caching issues. Here is the right way to clear the cache in each browser:

  • Firefox. Tools > Clear Private Data (Ctrl + Shift + Delete). You don't need to clear all the items here; Cache alone is sufficient.
  • Internet Explorer 7. Tools > Internet Options > (General Tab) > Browsing History: Delete > Delete Temporary Internet Files.

(It is not necessary to delete cookies or the other items).

Viewing HTTP Headers

There are several ways to watch HTTP headers. First, you can use the logs generated by Enfold Proxy. The October 2008 release of Enfold Proxy includes a special log level that shows HTTP headers, with directional arrows in the log to indicate whether the request is coming or going. (Read more about how to view HTTP headers in the proxy log.) The second way is to install browser plugins that let you view the headers in real time. Below we describe how to use browser plugins to view HTTP headers and how to interpret them. The HTTP headers in the proxy log may differ slightly from what you see below, but the syntax should generally be similar. Also, the directional arrows in the proxy log should indicate where the responses are coming from. (Mainly, you will be checking how often Zope/Plone is handling requests; the less often, the better.)

To verify caching, you will need to view your HTTP response headers. Several browser tools let you do this.

  1. Live HTTP Headers ( http://livehttpheaders.mozdev.org/ ) is the recommended tool. This is a Firefox plugin that lets you view HTTP headers in real time. For easier reading, you can go to the Generator tab and filter out image requests.
  2. Firebug ( http://www.getfirebug.com/ ) is another Firefox plugin. It not only records HTTP headers for each item but also records other information (like download speed). You can view HTTP information for each HTTP request by clicking on the web resource. Live HTTP Headers is probably better at capturing data in real time, but Firebug shows graphically which resources on a web page are taking the longest time to load. It is commonly assumed that dynamically generated server content takes longer to load than images or scripts. In fact, Firebug reveals how often images and scripts are the culprits, not the HTML content.
  3. Fiddler ( http://www.fiddlertool.com/ ) is a tool for Internet Explorer. A good MSDN tutorial about using Fiddler is at http://msdn2.microsoft.com/en-us/library/bb250446(VS.85).aspx

Tip #1: Surf with two browsers! More than likely, you'll be using Firefox with Live HTTP Headers to inspect your HTTP headers. Use one browser to simulate a logged-in user (where different caching rules apply) and the other to simulate an anonymous, non-logged-in user (where the most aggressive caching rules apply).

Tip #2: When you are logged in as an administrative user, a smaller percentage of your HTTP requests will be cached. As a result, response times for a Plone page might seem slower than they would be for an anonymous user.

Interpreting HTTP Headers

Ultimately, the most accurate way to troubleshoot caching is to look at HTTP headers. This can be a daunting task. Just one web page may involve 10-15 or more separate HTTP requests. However, if you have experience and know what to look for, you can spot problems quickly without becoming bogged down. For example, a simple URL such as http://www.originalfunsite.com/events generates these requests:

GET /events
GET /portal_javascripts/Enfold%20Theme/ploneScripts6490.js
GET /portal_css/Enfold%20Theme/ploneStyles6499.css
GET /portal_css/Enfold%20Theme/ploneStyles1162.css
GET /portal_css/Enfold%20Theme/ploneStyles6975.css
GET /favicon.ico
GET /info_icon.gif
GET /user.gif
GET /rss.gif
GET /mail_icon.gif
GET /print_icon.gif
GET /topheader.png
GET /input_background.gif
GET /portal_css/Enfold%20Theme/ploneStyles2247.css
GET /search_icon.gif
GET /logo.gif
GET /site_icon.gif
GET /folder_icon.gif
GET /topic_icon.gif
GET /topic_icon.gif
GET /linkTransparent.gif
GET /arrowUp.gif
GET /favicon.ico
GET /arrowLeft.gif
GET /arrowRight.gif
GET /plone_powered.gif
GET /enfold_powered.png
GET /colophon_sec508.gif
GET /colophon_wai-aa.gif
GET /colophon_xhtml.png
GET /colophon_anybrowser.png
GET /colophon_css.png

Out of these requests, only the first ( GET /events ) could conceivably be a Plone request. The rest of the HTTP GETs (images, JavaScript and CSS) are automatically cached by Enfold Proxy (how long EP considers them to be fresh is another story). In fact, EP might cache GET /events as well, depending on the rules configured for it.

An HTTP exchange consists of a request and a response, each with its own headers. Usually you will be interested only in the response.

http://www.originalfunsite.com/events

GET /events HTTP/1.1
Host: www.originalfunsite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Sat, 16 Feb 2008 02:04:31 GMT
Server: Microsoft-IIS/6.0, Zope/(Zope 2.9.6-final, python 2.4.4, win32) ZServer/1.1 Plone/2.5.2
X-Powered-By: EnfoldProxy 4.0.0.8015 (http://www.enfoldsystems.com/Products/Proxy)
Content-Length: 25977
Content-Language: en
X-Cache-Headers-Set-By: CachingPolicyManager: /Plone/caching_policy_manager
Expires: Sat, 16 Feb 2008 03:04:31 GMT
Cache-Control: max-age=3600, s-maxage=3600, public
Content-Type: text/html;charset=utf-8
X-Cache: MISS from www.originalfunsite.com
Via: 1.1 www.originalfunsite.com:80

Now let's try it again. This is what happens when you request the same URL again shortly afterward.

http://www.originalfunsite.com/events

GET /events HTTP/1.1
Host: www.originalfunsite.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive

HTTP/1.x 200 OK
Date: Sat, 16 Feb 2008 02:15:02 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: EnfoldProxy 4.0.0.8015 (http://www.enfoldsystems.com/Products/Proxy)
Content-Length: 25977
Content-Type: text/html;charset=utf-8
Cache-Control: max-age=3600, s-maxage=3600, public
Expires: Sat, 16 Feb 2008 03:04:31 GMT
Age: 0
X-Cache: HIT from www.originalfunsite.com

Now, let's interpret. In the first case, you had X-Cache: MISS from www.originalfunsite.com. This means that EP found no suitable cached copy of the response, so it had to make a request to Enfold/Plone. The X-Cache-Headers-Set-By header shows that a Plone product called Caching Policy Manager set the caching headers. That is another sign that the response came directly from Enfold or Plone and not from Enfold Proxy's cache.

In the second HTTP block, we see X-Cache: HIT from www.originalfunsite.com, which means that EP served this item from its cache on the file system. The max-age=3600 refers to the 1-hour expiration time for Aggressive Caching that you declared in the Caching Profile (http://www.originalfunsite.com/chasseur_profiles).
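The relationship between max-age, Date and Expires in the two responses above can be checked with a little arithmetic. The following Python sketch uses only the header values shown in the two blocks; the helper name seconds_between is hypothetical, and the "remaining" figure is simply Expires minus Date, not a value that Enfold Proxy reports.

```python
from email.utils import parsedate_to_datetime


def seconds_between(earlier, later):
    """Whole seconds from one HTTP date string to another."""
    delta = parsedate_to_datetime(later) - parsedate_to_datetime(earlier)
    return int(delta.total_seconds())


# Values copied from the MISS and HIT responses shown above.
miss_date = "Sat, 16 Feb 2008 02:04:31 GMT"
hit_date = "Sat, 16 Feb 2008 02:15:02 GMT"
expires = "Sat, 16 Feb 2008 03:04:31 GMT"

lifetime = seconds_between(miss_date, expires)   # lifetime granted on the MISS
remaining = seconds_between(hit_date, expires)   # time left when the HIT was served

print(lifetime, remaining)
```

The lifetime works out to 3600 seconds, matching max-age=3600, and the cached copy still had 2969 seconds (about 49 minutes) left when the second request was served.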

With CacheFu, the headers will look different (CacheFu throws in some extra headers), but essentially you are looking for the same X-Cache: HIT from www.originalfunsite.com line. The more hits, the better. (Generally, unless you plan to declare the cache headers on your Plone templates, you will need a Plone caching product to enable caching).
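When you have many header blocks to read, a small script can pull out the two things discussed above: the X-Cache verdict and the max-age value. The sketch below is a minimal Python example that works on response text pasted from a tool such as Live HTTP Headers; the function names are hypothetical, and it assumes the X-Cache value begins with HIT or MISS as in the samples shown here.

```python
def parse_response_headers(raw):
    """Split a pasted response block into (status_line, {header: value})."""
    lines = [ln.strip() for ln in raw.strip().splitlines() if ln.strip()]
    status_line, header_lines = lines[0], lines[1:]
    headers = {}
    for line in header_lines:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


def cache_verdict(headers):
    """Return "HIT", "MISS", or "UNKNOWN" based on the X-Cache header."""
    x_cache = headers.get("x-cache", "").upper()
    for verdict in ("HIT", "MISS"):
        if x_cache.startswith(verdict):
            return verdict
    return "UNKNOWN"


def max_age(headers):
    """Return the max-age directive from Cache-Control as an int, or None."""
    for directive in headers.get("cache-control", "").split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


# The second response block from the example above.
sample = """
HTTP/1.x 200 OK
Date: Sat, 16 Feb 2008 02:15:02 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: EnfoldProxy 4.0.0.8015 (http://www.enfoldsystems.com/Products/Proxy)
Content-Length: 25977
Content-Type: text/html;charset=utf-8
Cache-Control: max-age=3600, s-maxage=3600, public
Expires: Sat, 16 Feb 2008 03:04:31 GMT
Age: 0
X-Cache: HIT from www.originalfunsite.com
"""

status, headers = parse_response_headers(sample)
print(status, cache_verdict(headers), max_age(headers))
```

For the sample block this reports a HIT with a max-age of 3600 seconds.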

Measuring Cache Performance

One more thing. If you check Enfold Proxy's proxy.log messages, you will see a rough hit-rate percentage that you can use as a metric of how much caching is taking place:

2008-02-18 10:31:45,250|cache.host originalfunsite|STATS|3500|2856|Cache statistics:
        gets: 88, hits: 65 (33 validated), misses: 23 (0 uncachable)
        hitrate: 73% (58% excluding validations)
        size: 1943668 bytes, 324 items
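The two percentages in this entry can be reproduced from the raw counts. The Python sketch below parses the statistics line and recomputes them. The formulas are an assumption inferred from the sample figures alone (hits divided by gets, and the same ratio with validated hits removed from both numbers, each truncated to a whole percent); they are not taken from Enfold Proxy's documentation or source. The function name hit_rates is hypothetical.

```python
import re

# Matches the "gets/hits/misses" line of a cache statistics entry.
STATS_LINE = re.compile(
    r"gets: (\d+), hits: (\d+) \((\d+) validated\), "
    r"misses: (\d+) \((\d+) uncachable\)"
)


def hit_rates(log_text):
    """Return (overall %, % excluding validations), truncated to integers."""
    match = STATS_LINE.search(log_text)
    if match is None:
        raise ValueError("no cache statistics line found")
    gets, hits, validated = (int(match.group(i)) for i in (1, 2, 3))
    overall = 100 * hits // gets if gets else 0
    # Assumed formula: drop validated hits from both hits and gets.
    remaining_gets = gets - validated
    excluding = 100 * (hits - validated) // remaining_gets if remaining_gets else 0
    return overall, excluding


sample = """\
2008-02-18 10:31:45,250|cache.host originalfunsite|STATS|3500|2856|Cache statistics:
        gets: 88, hits: 65 (33 validated), misses: 23 (0 uncachable)
        hitrate: 73% (58% excluding validations)
        size: 1943668 bytes, 324 items
"""

rates = hit_rates(sample)
print(rates)
```

With the sample entry, 65 hits out of 88 gets gives 73%, and 32 non-validated hits out of 55 non-validated gets gives 58%, which agrees with the hitrate line in the log.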

For more information, see the topic on monitoring your Enfold Proxy logs. See also this general introduction to web caching.