Kevin Cupp

Making EE Sites Fly with Varnish

UPDATE: I wrote a much longer, more accurate primer for pairing ExpressionEngine and Varnish here.

Since publishing my Purge addon, several requests have come in from developers asking how to set up Varnish with ExpressionEngine, how well it works and if I have any tips. When I was first setting up Varnish with EE, there were no guides online specific to the CMS, so here’s a first crack at a guide for caching ExpressionEngine with Varnish.

First, a quick explanation about what Varnish does: Varnish is a caching proxy that sits between your users and your “backend,” which can be Apache, nginx, anything that serves HTTP requests. As requests come in, Varnish caches the fully-rendered output from the backend in memory so that future requests within a specified time frame don’t require the overhead needed to construct the page from scratch.

With that said, Varnish pairs best with EE when your site is mostly static content that users don’t interact with. For example, Plaid Avenger is a site I built in EE where the only dynamic content changes when the site admins create entries in the EE backend. Comments are handled by Disqus, so they load only after the page has landed in the user’s browser1. This makes it a perfect site for Varnish.

For a site like Devot-EE, where user-specific content is delivered to the browser and developers are frequently adding, editing, rating and reviewing addons, you don’t get the full benefit of serving your entire site out of a cache. You could configure Varnish to cache only when a user is not logged in, which would definitely help server load when people are just browsing, but most logged-in requests would still need to hit the backend, and those won’t scale as nicely under high traffic.

Installing

Depending on your flavor of Linux, Varnish will be more or less set up for you out of the package manager. I’ve installed it from the Arch Linux User Repository, which is basically like compiling from source. I won’t go into detail about starting the actual Varnish daemon on your distro (it’s pretty easy, as you’ll see from the wiki), but be sure to use malloc as your storage method so that pages are stored in RAM instead of on disk. Avoiding disk I/O is crucial for achieving a high hit rate.
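If your distro doesn’t ship an init script, the daemon can be started by hand along these lines. This is only a sketch: the listen port, backend address and 256 MB cache size are assumptions, so adjust them to your setup.

```sh
# Listen on port 80, forward cache misses to a backend on port 8080,
# and keep cached objects in RAM via malloc storage
varnishd -a :80 -b 127.0.0.1:8080 -s malloc,256m
```

If you do use an init script, it will typically pass these same flags from a defaults file, so often the only thing you need to change there is `-s`.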

Once you have Varnish running and serving requests from your backend, you’ll need a good VCL file.

The VCL

Here is a bare minimum VCL file to get ExpressionEngine caching properly.

sub vcl_recv {

    if (req.url ~ "^/system" ||
        req.url ~ "ACT=" ||
        req.request == "POST") {
        return (pass);
    }
	
    unset req.http.Cookie;
	
    return (lookup);
}

sub vcl_fetch {

    set beresp.ttl = 60s;

    return (deliver);
}

This returns cached objects for all requests except those for the system directory (otherwise you wouldn’t be able to log into EE). We also choose not to cache action (ACT) requests, which some third-party addons use, and we don’t want POST requests cached, since those will likely be unique.

You may also notice we are unsetting cookies. This is because, by default, Varnish won’t serve a cached object to a request that carries a cookie; it assumes a cookie means user-specific content. EE sets a few session cookies even for guest users, so we need to tell Varnish to ignore those.
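If you later need logged-in members to bypass the cache, one approach is to pass any request carrying an EE session cookie before stripping the rest. This is only a sketch: exp_sessionid is EE’s default session cookie name, but verify it (and whether guests receive it under your session settings) against your own installation.

```vcl
# In vcl_recv, before the unset:
if (req.http.Cookie ~ "exp_sessionid") {
    return (pass); # logged-in member, always fetch fresh from the backend
}

# Guests: drop EE's tracking cookies so their requests can hit the cache
unset req.http.Cookie;
```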

And finally, we set the TTL to 60 seconds, meaning that once an object is fetched from the backend, any requests for it within the next 60 seconds are served from the cache. Say you have a resource-intensive EE template: under high traffic, that template will be generated at most once every 60 seconds. Pretty nice, huh? But even that didn’t satisfy me. I wanted a longer TTL while still letting content administrators update the site, so I created Purge so the cache is cleared only when it needs to be.

Now that everything is being passed through Varnish, you might notice your Apache logs are nearly useless2. Every request appears to come from 127.0.0.1, with no record of the real client. We need to add something else to our vcl_recv function to fix that:

remove req.http.X-Forwarded-For;
set req.http.X-Forwarded-For = client.ip;

The above passes the client’s real IP address along to your backend in the X-Forwarded-For header so it can appear in your server logs.
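Even with the header set, Apache will keep logging 127.0.0.1 as the client until you tell it otherwise. Here’s a sketch of a log format that records the forwarded address instead; the format name and log path are placeholders, so adapt it to whatever format your vhost already uses.

```apache
# Like the standard "combined" format, but log X-Forwarded-For
# in place of the connecting IP (%h), which is always Varnish
LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined
CustomLog logs/access_log varnishcombined
```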

Varnish can also protect you from downtime if your backend is unresponsive. There’s always that time when we make a mistake in an Apache configuration file and Apache fails to restart gracefully, or the process dies for some other reason. The trick is to set a grace time on your requests. Add these lines to vcl_recv and vcl_fetch respectively:

set req.grace = 1h; #vcl_recv
set beresp.grace = 1h; #vcl_fetch

With the addition of backend polling, Varnish will now keep objects for one hour past their expiry time while you work on getting your backend back up and running. Grace also kicks in when a fetch for an object is already in flight: rather than queuing, Varnish returns the stale cached version while that fetch completes, so the backend isn’t hit with multiple simultaneous requests for the same expensive object.

Here is our new VCL file with everything above implemented:

# Configure backend with probing
backend default {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .timeout = 34ms;
        .interval = 1s;
        .window = 10;
        .threshold = 8;
    }
}

sub vcl_recv {

    # Don't cache EE system directory, ACT requests or POST requests
    if (req.url ~ "^/system" ||
        req.url ~ "ACT=" ||
        req.request == "POST") {
        return (pass);
    }
    
    # Purge site from cache when the Purge addon sends an EE_PURGE request
    if (req.request == "EE_PURGE") {
        ban("req.http.host ~ mysite.com && req.url ~ ^/.*$");
        error 200 "Purged";
    }
    
    # Pass along client information to backend
    remove req.http.X-Forwarded-For;
    set req.http.X-Forwarded-For = client.ip;
    
    unset req.http.Cookie;
    
    set req.grace = 1h;
    
    return (lookup);
}

sub vcl_fetch {

    set beresp.grace = 1h;
    
    set beresp.ttl = 24h;
    
    return (deliver);
}

I hope the above information is a good primer for getting started caching your sites with Varnish. There are many more ways to get the most out of Varnish, so please take a look around and work out the best solution for you. For fun, hammer on your newly-cached site with ApacheBench and watch it serve hundreds of requests per second without batting an eyelash.
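As a starting point for that benchmark, a run like this sends 1,000 requests at a concurrency of 100. The hostname is a placeholder, and only point ab at servers you own.

```sh
# 1000 total requests, 100 at a time, against the cached front page
ab -n 1000 -c 100 http://mysite.com/
```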

UPDATE: Simplified VCL and updated syntax to match Varnish 3 (2011-12-04)


  1. Even if comments were managed by EE, it would be OK to use Varnish as long as the commenting volume wasn't very high; I would just modify the Purge extension to also purge when a comment is posted. But if people are commenting every few seconds, caching becomes less effective if you're serving off of a single server.
  2. I often analyze my Apache logs to get accurate numbers on unique requests and referrers, as well as to watch for any unusual traffic.