Setting up Squid Caching in a ZEO Cluster

Meta:

Valid for:  Silva 0.9
Author:     Jan-Wijbrand Kolman
Email:      jw@infrae.com
CVS:        $Revision: 1.4 $ $Date: 2002/12/23 20:52:08 $

Introduction

These notes were collected during a Squid[1]/Apache/HTTPS/ZEO[7] Cluster setup process to use Silva in a cached, secured and clustered environment.

Although these notes try to be a concise and accurate set of instructions, they require above-average knowledge of and experience with Apache (including HTTPS) and Zope. Squid knowledge is highly recommended.

Feedback and/or corrections are welcome.

Problem description, Goals

Complex web applications may put a strain on system resources, decreasing performance. One way to improve the performance of a web application is to cache content, provided this content stays unchanged for some time (e.g. for minutes, possibly hours, maybe even longer).

Squid can provide such a cache. It can act as a frontend server for underlying application servers. Each web request is handled by the cache first: it checks whether the requested object is in the cache and has not yet expired. If so, the object is served from the cache. If not, the request is forwarded to the web application backend, which computes the object. Squid then stores this object (if certain criteria are met) for subsequent requests, and serves it.
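The hit/miss flow described above can be illustrated with a small sketch. This is a toy in-memory cache, not how Squid is implemented; the class name TinyCache, the fixed ttl_seconds lifetime and the backend callable are all assumptions made for illustration.

```python
import time


class TinyCache:
    """Toy cache showing the hit/miss flow; hypothetical, not Squid's implementation."""

    def __init__(self, backend, ttl_seconds):
        self.backend = backend          # callable: url -> response body
        self.ttl_seconds = ttl_seconds  # assumed: one fixed lifetime for every object
        self.store = {}                 # url -> (expiry timestamp, body)

    def get(self, url, now=None):
        now = time.time() if now is None else now
        entry = self.store.get(url)
        # Serve from cache only if the object is present and not yet expired.
        if entry is not None and entry[0] > now:
            return entry[1], "HIT"
        # Otherwise let the backend compute the object, then store it.
        body = self.backend(url)
        self.store[url] = (now + self.ttl_seconds, body)
        return body, "MISS"
```

The optional now argument only exists to make the expiry behaviour easy to try out with fixed timestamps.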

In a clustered environment, a Squid cache on one cluster node is able to communicate with the caches on sibling nodes. This helps spread the load of the web application over the different nodes even more: only one node needs to compute a requested object, while all other nodes may keep this object in cache.
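A rough sketch of this lookup order, under simplifying assumptions: each cache is modelled as a plain dictionary, expiry is ignored, and the function name fetch is hypothetical. In line with the proxy-only option used in the Squid configuration in this document, an object obtained from a sibling is not stored again locally.

```python
def fetch(url, local, siblings, backend):
    """Return (body, source): try the local cache, then siblings, then the backend."""
    if url in local:
        return local[url], "local"
    for name, sibling_cache in siblings.items():
        if url in sibling_cache:
            # proxy-only: relay the sibling's copy without storing it locally.
            return sibling_cache[url], "sibling:" + name
    # No cache in the cluster has the object: compute it once and keep it.
    body = backend(url)
    local[url] = body
    return body, "backend"
```

In the real setup the sibling query is an ICP request between the Squid instances rather than a dictionary lookup.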

We will set up Squid so it will:

We will also set up the web application backend so it will:

Requirements

Squid configuration

A minimal Squid configuration ([2] and [3]) follows. It leaves out most options for tuning cache performance, file locations, RAM usage etc., which are outside the scope of this document:

## Squid port
http_port 80

## ACLs taken from standard Squid conf
acl all src 0.0.0.0/0.0.0.0
acl localhost src 127.0.0.1/255.255.255.255

## ACL for cache peers in network:
acl peers_src src node-1.domain.tld node-2.domain.tld \
                                             ... node-N.domain.tld
acl peers_dst dst node-1.domain.tld node-2.domain.tld \
                                             ... node-N.domain.tld

## ACL for public 
acl public_access dst virtualhost-1.domain.tld virtualhost-2.domain.tld \
                                           ... virtualhost-N.domain.tld
acl public_access_port port 80 8080

## Define cache siblings. Comment out the lines that point to the
## cache node "itself":
##cache_peer node-1.domain.tld sibling 80 3130 no-digest proxy-only
##cache_peer_access node-1.domain.tld allow public_access
##cache_peer_access node-1.domain.tld deny all

cache_peer node-2.domain.tld sibling 80 3130 no-digest proxy-only
cache_peer_access node-2.domain.tld allow public_access
cache_peer_access node-2.domain.tld deny all

...

cache_peer node-N.domain.tld sibling 80 3130 no-digest proxy-only
cache_peer_access node-N.domain.tld allow public_access
cache_peer_access node-N.domain.tld deny all

## Allow ICP communication between cache peers
icp_access allow peers_src

## FIXME: not sure about this option
prefer_direct off

## Host being accelerated
httpd_accel_host 127.0.0.1
httpd_accel_single_host on
httpd_accel_port 8080

## Proxy on, needed to make cache peers intercommunicate
## Without proper security measures, this could result in an "open
## proxy". 
httpd_accel_with_proxy on

## Keeps information for Apache; needed for virtual hosting       
httpd_accel_uses_host_header on

## Final access control
http_access allow peers_src peers_dst
http_access allow public_access public_access_port
http_access deny all
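Since the sibling section differs per node only in which entry is commented out, it could be generated rather than edited by hand. The following is a minimal sketch under that assumption; the helper name peer_lines is hypothetical, and the directives and options are copied from the configuration above.

```python
def peer_lines(nodes, this_node):
    """Build the cache_peer section, commenting out the entry for this_node."""
    lines = []
    for node in nodes:
        # A node must not list itself as a sibling, so its lines are commented out.
        prefix = "##" if node == this_node else ""
        lines.append(f"{prefix}cache_peer {node} sibling 80 3130 no-digest proxy-only")
        lines.append(f"{prefix}cache_peer_access {node} allow public_access")
        lines.append(f"{prefix}cache_peer_access {node} deny all")
        lines.append("")  # blank line between peers
    return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    print(peer_lines(["node-1.domain.tld", "node-2.domain.tld"], "node-1.domain.tld"))
```

Running it once per node with a different this_node yields the matching fragment for each machine.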

HTTP Response headers

The HTTP protocol [8] defines several response headers to instruct intermediate caches what to do with requested objects.

The values for these HTTP headers need experimentation and are highly dependent on the nature of the web application, the content served, expected request patterns, cluster setup and estimated use of resources (CPU, memory, network capacity, redundancy, etc.).

HTTP header examples (in Zope Page Template TAL expressions):

In these headers, "max-age" is given in seconds and the "Expires" date expects an RFC 1123 formatted string [9].

If both headers are present, "max-age" overrides "Expires" in any case.
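As an illustration of the two header formats, the following sketch computes matching "Cache-Control" and "Expires" values in plain Python. The function name cache_headers is hypothetical and the lifetime is an arbitrary example value; formatdate from the standard library produces the RFC 1123 style date when called with usegmt=True.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Return Cache-Control and Expires values describing the same lifetime."""
    now = time.time() if now is None else now
    return {
        # max-age is a number of seconds relative to the time of the response.
        "Cache-Control": "max-age=%d" % max_age_seconds,
        # Expires is an absolute date, e.g. "Thu, 01 Jan 1970 00:10:00 GMT".
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }
```

In a Zope context, values like these would typically be passed to the response object's setHeader method; how that is wired into a page template depends on the application.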

Caveats / ToDo

References

[1] Squid Caching Proxy

[2] Squid configuration

[3] Squid User Guide

[4] HTTP Caching and Zope

[5] Squid as an Accelerator for Zope

[6] VirtualHosting, HTTPS, Apache and Zope

[7] ZEO Clusters

[8] RFC 2616 HTTP/1.1

[9] RFC 1123 Requirements for Internet Hosts, Application and Support

Copyright © 2002-2004 Infrae. All rights reserved.
See also "LICENSE.txt" in the Silva package.