Kristian Lyngstol's Blog

A free software-hacker's blog

Varnish purges

Varnish purges can be tricky. Both the purging model that Varnish uses and the syntax you need to take advantage of it can be difficult to grasp. It took me about five Varnish Administration Courses before I was happy with how I explained it to the participants, especially because the syntax is the most confusing syntax we have in VCL. However, it’s not very hard to work with once you understand the magic at work.

0. Separating purges and forced expiry

There are two ways to throw something out of the cache in Varnish before its TTL is due. You can either find the object you want gone and set its TTL to 0, forcing it to expire, or use Varnish’s purge mechanism. Setting the TTL to 0 has its advantages, since the object is evicted immediately, but it also means you have to evict one object at a time. This is fairly easy and is usually done by having Varnish look for a “PURGE” request and handle it. That is not what I’ll talk about today, though. Read http://varnish-cache.org/wiki/VCLExamplePurging for information on forcibly expiring an object.

1. The challenges of purging a cache

The main reason people purge their cache is to make room for updated content. A journalist updates an article and you want the old version, possibly cached for days, gone. In addition, you may not know exactly what to purge, or it might be broader than a single item. An example would be a template used to generate multiple PHP pages. Or all sports articles.

In short, you do not purge to conserve memory, because you expect the cache to fill up again soon anyway.

If you need to purge all your PHP pages and you have 150 000 objects, you may not want to go looking for them either. This is the reason some competing cache products are slow at large purges: by looking for all these objects, you might have to hit the disk to fetch cold objects.

In Varnish, we also leave it up to VCL to decide what is unique to an object. That is to say: you can override the cache hash. By default it’s the host name (or the server IP, if the client sends no host name) combined with the URL. This is usually what people want, but sometimes you may want to add a cookie into the mix, for instance. The point is, we don’t know exactly what people cache on.
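To make the default concrete, here is a toy Python sketch of how such a key could be assembled: the URL plus the Host header, falling back to the server IP, with optional extra parts such as a cookie. The function name `hash_key`, the `#` separator and the example server IP are assumptions for illustration; Varnish computes its hash internally and this is not its code.

```python
def hash_key(url, host=None, server_ip="192.0.2.1", extra=()):
    """Toy model of a cache hash key (not Varnish's implementation).

    Follows the default described above: the URL combined with the Host
    header, or with the server IP when the client sends no Host header.
    `extra` stands in for anything VCL adds on top, such as a cookie.
    """
    parts = [url, host if host else server_ip]
    parts.extend(extra)
    return "#".join(parts)


# The same URL on two hosts gives two different cache objects.
print(hash_key("/index.php", "example.com"))  # /index.php#example.com
print(hash_key("/index.php", "example.org"))  # /index.php#example.org
# Without a Host header, the server IP is used instead.
print(hash_key("/index.php"))                 # /index.php#192.0.2.1
# Adding a cookie splits one URL into several cached variants.
print(hash_key("/index.php", "example.com", extra=("lang=no",)))
```

This is why a purge cannot simply look objects up by URL: what identifies an object depends on what your VCL put into the hash.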

2. How Varnish attacks the problem

In Varnish, you purge by adding a purge to a list. This list can grow large if you add several very specific purges, but we try to reduce the overlap as much as possible. The purge in question can be pretty much anything you can match in VCL, including regular expressions on URLs, host names and user-agents for that matter. You can see the list by typing “purge.list” in the command line interface (CLI, or telnet).

Each object in your cache points to the last purge it was tested against. When an object is hit, Varnish checks whether any new purges have been added to the list since then, tests the object against them, and then either evicts the object and fetches a new one, or updates the “last tested against” pointer.
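The mechanism can be modelled in a few lines of Python. This is a toy illustration under simplifying assumptions, not Varnish’s code: purges are plain Python predicates rather than parsed VCL expressions, and the class names `PurgeList` and `Cache` are hypothetical. Each object stores how far down the list it has been tested, and only newer purges are evaluated on a hit.

```python
class PurgeList:
    """Toy model of a purge list (an illustration, not Varnish's code)."""

    def __init__(self):
        self.purges = []  # predicates, oldest first

    def add(self, predicate):
        # Adding a purge is a constant-time append; nothing is tested yet.
        self.purges.append(predicate)


class Cache:
    def __init__(self, purge_list):
        self.purge_list = purge_list
        self.objects = {}  # key -> {"obj": ..., "last_tested": int}

    def insert(self, key, obj):
        # A freshly fetched object should not be affected by older purges,
        # so it starts out pointing at the newest one.
        self.objects[key] = {"obj": obj,
                             "last_tested": len(self.purge_list.purges)}

    def hit(self, key, req):
        """Return the cached object, or None if it was purged (or absent)."""
        entry = self.objects.get(key)
        if entry is None:
            return None
        purges = self.purge_list.purges
        # Only purges added since the last test are evaluated.
        for predicate in purges[entry["last_tested"]:]:
            if predicate(req, entry["obj"]):
                del self.objects[key]  # evict; the caller fetches a new copy
                return None
        entry["last_tested"] = len(purges)  # survived: move the pointer
        return entry["obj"]


plist = PurgeList()
cache = Cache(plist)
cache.insert("/sport/1", {"X-Cache-Channel": "sport"})
cache.insert("/news/1", {"X-Cache-Channel": "news"})

# Stand-in for: purge obj.http.X-Cache-Channel ~ sport
# (a substring test is equivalent to ~ for a pattern with no regex syntax)
plist.add(lambda req, obj: "sport" in obj.get("X-Cache-Channel", ""))

print(len(cache.objects))                    # 2: adding the purge evicted nothing
print(cache.hit("/news/1", {}) is not None)  # True: tested, survives
print(cache.hit("/sport/1", {}))             # None: tested, evicted
```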

Because of this, the ‘req’ structure you are evaluating is actually that of the client that accesses the object next, not that of the client who pulled the object from the backend. It also means that every single object in your cache that is hit will be tested against all newer purges to see if it matches, but the cost is spread out over time. It might sound wasteful, but it means you can add purges in constant time, and not really think about the cost of evaluating them.

It also means that an object that is never hit stays in the cache until it expires, so purging does not free up memory.
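Both consequences show up in a stripped-down Python model. It is a sketch under assumptions, not Varnish’s code: `store`, `add_purge` and `hit` are hypothetical names, and the lambda stands in for `purge req.http.User-Agent ~ MSIE`. Whether the object is purged depends only on the first client to hit it after the purge was added, and an object nobody hits keeps its memory.

```python
import re

# Toy model, not Varnish's code. Each cached object remembers how many
# purges it has already been tested against.
store = {"/page": {"last_tested": 0}, "/cold": {"last_tested": 0}}
purges = []


def add_purge(predicate):
    # Constant time: the purge is appended and no object is touched.
    purges.append(predicate)


def hit(url, req):
    entry = store.get(url)
    if entry is None:
        return "miss"
    for predicate in purges[entry["last_tested"]:]:
        # `req` belongs to whoever hits the object now, not to the
        # client that originally pulled it from the backend.
        if predicate(req):
            del store[url]
            return "purged"
    entry["last_tested"] = len(purges)
    return "hit"


# Stand-in for: purge req.http.User-Agent ~ MSIE
add_purge(lambda req: re.search("MSIE", req.get("User-Agent", "")) is not None)

print(hit("/page", {"User-Agent": "Firefox/3.6"}))  # hit: this client does not match
print(hit("/page", {"User-Agent": "MSIE 8.0"}))     # hit: purge already tested, not re-run
print("/cold" in store)                             # True: never hit, memory not freed
```

Had the MSIE client arrived first, the same object would have been purged.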

3. Adding purges “by hand”

Want to purge http://example.com/somedirectory/ and everything beneath that path?

purge req.http.host == example.com && req.url ~ ^/somedirectory/.*$

or

purge req.url ~ ^/somedirectory/ && req.http.host == example.com
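To check what these two expressions actually match, here they are transcribed into Python predicates. The assumption is that Python’s `re.search` (an unanchored search, hence the `^`) behaves like Varnish’s `~` for patterns this simple, and that `==` is an exact string comparison. The function names and sample requests are hypothetical.

```python
import re


def purge_a(req):
    # purge req.http.host == example.com && req.url ~ ^/somedirectory/.*$
    return (req["host"] == "example.com"
            and re.search(r"^/somedirectory/.*$", req["url"]) is not None)


def purge_b(req):
    # purge req.url ~ ^/somedirectory/ && req.http.host == example.com
    return (re.search(r"^/somedirectory/", req["url"]) is not None
            and req["host"] == "example.com")


samples = [
    {"host": "example.com", "url": "/somedirectory/"},
    {"host": "example.com", "url": "/somedirectory/a/b.html"},
    {"host": "example.com", "url": "/somedirectory"},         # no trailing slash
    {"host": "www.example.com", "url": "/somedirectory/x"},   # == is an exact match
    {"host": "example.com", "url": "/other/somedirectory/"},  # ^ anchors the start
]

for req in samples:
    print(req["host"], req["url"], purge_a(req), purge_b(req))
```

Both versions agree on every sample: the order of the `&&` terms does not matter, and the trailing `.*$` adds nothing. Note that www.example.com is not covered by `== example.com`.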

Want to purge all objects with “Cache-Control: max-age=” set to 3600?

purge obj.http.Cache-Control ~ max-age=3600

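or, to allow an optional space on either side of the equals sign while ruling out trailing digits (the pattern is quoted because it contains spaces, which the CLI would otherwise treat as argument separators):

purge obj.http.Cache-Control ~ "max-age ?= ?3600([^0-9]|$)"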
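The anchoring details are easy to get wrong, so here is a quick check in Python. It assumes Python’s `re.search` behaves like Varnish’s regex matching for patterns this simple, which is worth verifying against your Varnish version. It compares the loose pattern, a version that demands a non-digit after 3600, and a version that also accepts the end of the header; the sample header values are made up.

```python
import re

patterns = {
    "loose": r"max-age=3600",
    "no-trailing-digit": r"max-age ?= ?3600[^0-9]",
    "no-trailing-digit-or-end": r"max-age ?= ?3600([^0-9]|$)",
}

headers = [
    "max-age=3600",            # directive at the very end of the header
    "max-age = 3600, public",  # spaces around '='
    "public, max-age=36000",   # a different max-age that starts with 3600
]

results = {name: [h for h in headers if re.search(p, h)]
           for name, p in patterns.items()}

for name, hits in results.items():
    print(f"{name}: {hits}")
```

The loose pattern wrongly catches max-age=36000 and misses the spaced form; `[^0-9]` alone needs a character after 3600 and so misses a header that ends there; the `([^0-9]|$)` variant matches the first two headers only. Note that ` ?` allows at most one space.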

Notice that all of the variables are in the same “VCL context” as the client that hits the object next, so if you purge on req.http.user-agent, it’s fairly random whether the object is really purged, because you (probably) can’t predict which user-agent the next person to visit a specific object is using. If you wish to purge based on a parameter sent by the “original” client, you will have to store that parameter in obj.http somewhere, and remove it in vcl_deliver if you don’t want to expose it.
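The store-then-strip pattern can be sketched as three Python functions standing in for vcl_fetch, the purge test and vcl_deliver. The function names are hypothetical, and the header name X-Orig-User-Agent is just an example choice. The point is that the purge decision now reads only the object, so it no longer depends on who happens to hit it next.

```python
import re


def fetch(req, backend_headers):
    # vcl_fetch analogue: copy the original client's User-Agent onto the object.
    obj_http = dict(backend_headers)
    obj_http["X-Orig-User-Agent"] = req.get("User-Agent", "")
    return obj_http


def purge_matches(obj_http):
    # Analogue of: purge obj.http.X-Orig-User-Agent ~ MSIE
    # Only the object is consulted, never the request hitting it.
    return re.search("MSIE", obj_http.get("X-Orig-User-Agent", "")) is not None


def deliver(obj_http):
    # vcl_deliver analogue: strip the internal header before it reaches clients.
    resp = dict(obj_http)
    resp.pop("X-Orig-User-Agent", None)
    return resp


obj = fetch({"User-Agent": "MSIE 8.0"}, {"Content-Type": "text/html"})
print(purge_matches(obj))  # True, whoever hits the object next
print(deliver(obj))        # {'Content-Type': 'text/html'}
```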

4. Adding purges in VCL

This is where it gets tricky. The typical example of why is this: purge("req.url == " req.url);

Conventional programming intuition would tell you that this matches everything, since the URL is always equal to itself. This is where VCL string concatenation comes into the picture. What you are really writing is: “add this to the purge list: the string “req.url == ” followed by the value of the variable req.url”.

In other words, if a client accesses http://example.com/foobar and hits the code above, this says: “Add the string “req.url == ” followed by “/foobar” to the purge list,” so purge.list ends up with an entry reading req.url == /foobar. The quotation marks are essential!

I find it easier to think of it as preparing a string for the purge command on the CLI. VCL concatenates adjacent strings without any operator between them.

In the end, this is the rule of thumb: Put everything you expect to see literally when you type “purge.list” inside quotation marks, and put things you wish to replace with the variable of the calling session outside.
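In Python terms, the VCL line behaves like the sketch below: the quoted part is copied literally, the bare variable is expanded when the purge is added, and the resulting string is evaluated later against whichever request hits the object. `vcl_concat` and `evaluate` are hypothetical helpers; the evaluator only understands this one `req.url == <literal>` form.

```python
def vcl_concat(*parts):
    # VCL joins adjacent strings with no operator in between.
    return "".join(parts)


def evaluate(entry, hit_req):
    """Toy evaluator for a single 'req.url == <literal>' purge entry."""
    field, op, literal = entry.split(" ", 2)
    if field != "req.url" or op != "==":
        raise ValueError("this sketch only handles 'req.url == ...'")
    # Here req is the request that hits the object later on.
    return hit_req["url"] == literal


# The request that triggers the purge.
purging_req = {"url": "/foobar"}

# purge("req.url == " req.url);
# The quoted part is copied literally; the bare req.url is expanded now.
entry = vcl_concat("req.url == ", purging_req["url"])

print(entry)                                # req.url == /foobar
print(evaluate(entry, {"url": "/foobar"}))  # True
print(evaluate(entry, {"url": "/other"}))   # False
```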

So you actually have three different VCL contexts to worry about:

  1. The context that originally pulled the object in from a backend (not much you can do here unless you hide things in obj.http)
  2. The context that will hit the object and thereby test the object against the purge. Any variable in this context has to be inside quotation marks.
  3. The context that triggered the purge. Variables from this context should be outside quotation marks, so they are replaced with their string values before being added to the purge list.

The reason you do not need quotation marks when you enter the purge command on the command line interface is that you don’t have the third context. There is no req.url in telnet, since you are not going through VCL at all.
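Here is the same idea walked through all three contexts with the X-Cache-Channel header, as a Python sketch. The header values are made up, the `evaluate` helper is hypothetical and only handles this single expression, and `re.search` stands in for `~`. The string built from VCL ends up identical to the one you would type on the CLI.

```python
import re

# Context 1: the request that pulled the object from the backend.
# Anything needed later must be stored on the object (obj.http).
obj_http = {"X-Cache-Channel": "sport"}

# Context 3: the request that triggers the purge. Its variables are
# expanded now, because they sit outside the quotation marks in
#   purge("obj.http.X-Cache-Channel ~ " req.http.X-Purge-CC);
purging_req_http = {"X-Purge-CC": "sport"}
from_vcl = "obj.http.X-Cache-Channel ~ " + purging_req_http["X-Purge-CC"]

# On the CLI there is no context 3, so you type the final string directly:
#   purge obj.http.X-Cache-Channel ~ sport
from_cli = "obj.http.X-Cache-Channel ~ sport"

PREFIX = "obj.http.X-Cache-Channel ~ "


def evaluate(entry, obj_http):
    # Context 2: evaluated when a later request hits the cached object.
    if not entry.startswith(PREFIX):
        raise ValueError("this sketch only handles X-Cache-Channel purges")
    pattern = entry[len(PREFIX):]
    return re.search(pattern, obj_http.get("X-Cache-Channel", "")) is not None


print(from_vcl == from_cli)          # True
print(evaluate(from_vcl, obj_http))  # True
```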

Some examples. Note that when I say “supplied by the client”, I mean the client initiating the purge, typically some smart system you’ve set up:

Purge objects on the current host whose URLs match the regex stored in the X-Purge-Regex header supplied by the client:

purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);

Purge all PHP pages on any example.com domain:

purge("req.http.host ~ example.com$ && req.url ~ ^/.*\.php");

Same, but for the host provided in the X-Purge-HostPHP header:

purge("req.http.host ~ " req.http.X-Purge-HostPHP " && req.url ~ ^/.*\.php");

Purge objects with X-Cache-Channel set to “sport”:

purge("obj.http.X-Cache-Channel ~ sport");

Same, but purge the cache-channel set in the header ‘X-Purge-CC’:

purge("obj.http.X-Cache-Channel ~ " req.http.X-Purge-CC);
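Following the rule of thumb, this Python sketch shows the expression strings the examples above would add to the purge list for one hypothetical purging request. The header values in `req_http` are made up, and purge.list prints additional bookkeeping fields next to each expression; only the expression part is modelled here.

```python
def concat(*parts):
    # VCL joins adjacent strings; bare variables are replaced by their values.
    return "".join(parts)


# Hypothetical headers sent by the system that initiates the purge.
req_http = {
    "host": "www.example.com",
    "X-Purge-Regex": "^/articles/",
    "X-Purge-HostPHP": "shop.example.com",
    "X-Purge-CC": "sport",
}

entries = [
    concat("req.http.host == ", req_http["host"],
           " && req.url ~ ", req_http["X-Purge-Regex"]),
    concat(r"req.http.host ~ example.com$ && req.url ~ ^/.*\.php"),
    concat("req.http.host ~ ", req_http["X-Purge-HostPHP"],
           r" && req.url ~ ^/.*\.php"),
    concat("obj.http.X-Cache-Channel ~ sport"),
    concat("obj.http.X-Cache-Channel ~ ", req_http["X-Purge-CC"]),
]

for e in entries:
    print(e)
```

The last two entries come out identical: a header value expanded in the purging context is indistinguishable from text typed inside the quotation marks.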

Purge in vcl_fetch if the backend sent an X-Purge-URL header (a weird thing to do, but a fun example):

sub vcl_fetch {
    (....)
    if (obj.http.X-Purge-URL) {
        purge("req.url ~ " obj.http.X-Purge-URL);
    }
    (...)
}

(PS: I have not actually tested all these examples, but they look correct)

7 responses to “Varnish purges”

  1. Per Buer February 2, 2010 at 4:30 pm

    How can you match on User-Agent? That’s a request header.

  2. kristian February 2, 2010 at 5:02 pm

    Exactly. So while you CAN match on req.http.User-Agent, it will be the user-agent of whatever client happens to stroll by next time. If you want to match on the User-Agent of the request that pulled the object from the backend, you have to store the user-agent in obj. “set obj.http.X-Orig-User-Agent = req.http.User-Agent;” for instance, then match on obj.http.X-Orig-User-Agent.

  3. Yvan February 3, 2010 at 12:05 pm

    Thanks a lot for all this info on purging, Kristian. Currently, I use:
    if (req.request == "PURGE")
    {
        purge("req.url ~ " req.url);
        return (lookup);
    }
    in vcl_fetch()

    I’ve got the code from the URL you’ve mentioned at the top of your article. Is the hostname used in such a case? Because in purge.list, I never see any hostnames.
    Your example above states:
    purge("req.http.host == " req.http.host " && req.url ~ " req.http.X-Purge-Regex);
    so I think there’s an issue in my code as I don’t have the host (of course I have multiple hostnames, else the question is silly).

    Another question:
    if you want to purge a whole host, is it better to do:
    purge("req.http.host ~ example.com$");
    or:
    purge("req.http.host ~ example.com$ && req.url ~ ^/.*");

    I think the first one should be faster.

    And finally, I want to track the number of purging rules (to see if there are too many rules). If I run purge.list, I can see 35 rules (I exclude the «G» rules, I think these are duplicates, else it would be 39 rules), but in the stats I can read:
    n_purge 43 . N total active purges
    n_purge_add 4844 0.00 N new purges added
    n_purge_retire 4801 0.00 N old purges deleted
    n_purge_obj_test 5206879 3.06 N objects tested
    n_purge_re_test 34006296 19.97 N regexps tested against
    n_purge_dups 1569 0.00 N duplicate purges removed

    What data contains the real active rules?

  4. kristian February 3, 2010 at 1:12 pm

    To the first question: No, the hostname isn’t used, so you want to add that. You may also want to rewrite the request from purge to GET before issuing lookup, or just issue «error 200 “Purge added”».

    To the second question: Using just req.http.host is faster, though I doubt you’d notice the difference.

    As for ‘G’-purges, those are ‘Gone’: They are just placeholders because an object points to them, but they will never be tested against. They will be removed once there are no objects referencing them. (This is why you always have at least one item on the purge list, even right after you start up).

    n_purge should contain the number of purges you have active, and you can see that if you subtract the retired purges from the added purges (4844 – 4801), you get n_purge (43). I can’t explain why you only see 39 purges, but if varnishstat and purge.list disagree on the number of purges, purge.list is probably correct.

  5. Yvan February 3, 2010 at 1:54 pm

    Thanks Kristian for your answers.

    I was using the code here: http://varnish-cache.org/wiki/VCLExamplePurging . But as the VCL is quite the same as the one you wrote in this article, is the sentence correct: «That means that if you now send a

    PURGE / HTTP/1.0
    Host: www.example.com

    to Varnish over port 80 (restricted to client with IP 192.0.2.14), your / document from the http://www.example.com website will be purged.»

    With your answer, I would guess that any «/» document will be purged, was it on http://www.example.com or not. If so, can someone fix the wiki? Along with your remark about the fact that it’s a purging command, so an error should be sent right away, or a GET should be used (that’s the core of your posting here).

    Thanks a lot!

  6. Pingback: Smart bans with Varnish « Kristian Lyngstol's Blog

  7. J C November 5, 2010 at 2:07 pm

    So is it correct to say that if an incoming request matches a purge (based on say, a header in the request), the cached object is purged for all clients? Or does by some (as yet to me) unknown mechanism varnish retain the object to use for requests that do not match?
