Proxy Cache - Van-Pool for the Web

Do you wonder why your T-1, cable or ISDN connection is sometimes as slow as a 28.8 kbps modem? Have you ever wondered what happens when you click a link in Netscape? Read on.

In the summer of '96, thousands of you probably followed some of the Olympics on the Web. Thousands of identical copies of the same pages traversed a dozen computers between Atlanta and Vancouver. Does something about this strike you as crazy? There is an alternative; it has been part of the Web's protocol, HTTP, for a long time, but is only now becoming widely used. It is called proxy caching.

Most modern browsers incorporate a local cache; in Netscape, if you open the document "about:cache" you can see the current state of yours. The cache uses part of your hard disk to store pages and images you have seen. If you reload one of these pages, Netscape sends a conditional GET request with an If-Modified-Since header, and the server sends the whole file again only if it has changed. This is a great improvement over the original browsers, but still short of the ideal: each browser's cache serves only the person using it.
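To make the If-Modified-Since exchange concrete, here is a minimal sketch of the decision an origin server makes. The function name origin_status and the dates are hypothetical; it assumes the server simply compares its own last-modification time with the date the browser sends, and real servers may differ in detail. A 304 Not Modified reply carries no body, so only a few hundred bytes of headers cross the line instead of the whole page.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def origin_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Status code a (hypothetical) origin server returns for a GET.

    304 Not Modified: the client's cached copy is current, no body is sent.
    200 OK: the full document is sent.
    """
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        if last_modified <= since:
            return 304
    return 200


# The browser stored the page along with the server's Last-Modified date.
cached_last_modified = datetime(1996, 8, 1, 9, 0, tzinfo=timezone.utc)
header = format_datetime(cached_last_modified, usegmt=True)  # "Thu, 01 Aug 1996 09:00:00 GMT"

print(origin_status(cached_last_modified, header))  # 304: unchanged, reuse the cached copy
print(origin_status(datetime(1996, 8, 5, 12, 0, tzinfo=timezone.utc), header))  # 200: page changed
```

Without the header (a first visit, or Shift-Reload in Netscape), the server always answers 200 and sends the whole document.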

Proxy caching is essentially a simple concept. Suppose that each member of your family has their own computer, you have Ethernet around your house, and an ISDN line to an ISP. If each person selects the CNN homepage, several identical copies will be transferred over the ISDN line at 128kbps. With a proxy cache on the home network, the first person will get the page at 128kbps; the proxy keeps a copy, and everyone else gets it at 10Mbps over Ethernet. The same argument applies to an office with a 100Mbps LAN and a T-1 connection to the net.
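A little arithmetic shows the size of the saving. The sketch below assumes a hypothetical 50,000-byte page and a family of four, and ignores protocol overhead, latency, and people sharing the line at the same moment, so the figures are illustrative rather than measured.

```python
def transfer_seconds(size_bytes: int, bits_per_second: int) -> float:
    """Time to move size_bytes over a link, ignoring overhead and latency."""
    return size_bytes * 8 / bits_per_second


ISDN_BPS = 128_000         # ISDN line to the ISP
ETHERNET_BPS = 10_000_000  # home Ethernet
PAGE_BYTES = 50_000        # hypothetical size of the page plus images
USERS = 4                  # hypothetical number of family members

# Without a proxy, every request crosses the ISDN line: 4 x 3.125 s = 12.5 s.
without_proxy = USERS * transfer_seconds(PAGE_BYTES, ISDN_BPS)

# With a proxy, only the first request crosses ISDN; the rest are served
# from the proxy over Ethernet: 3.125 s + 3 x 0.04 s = about 3.245 s.
with_proxy = (transfer_seconds(PAGE_BYTES, ISDN_BPS)
              + (USERS - 1) * transfer_seconds(PAGE_BYTES, ETHERNET_BPS))

print(f"without proxy: {without_proxy:.2f} s")
print(f"with proxy:    {with_proxy:.2f} s")
```

The ISDN line also carries a quarter of the data, leaving the rest of its capacity free for pages nobody in the house has fetched yet.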

Hierarchical Cache Servers

The use of a cache as outlined above provides significant benefit wherever bandwidth drops, going from a LAN to a WAN, for instance. Cache servers can also be used in a more sophisticated way: they can be configured to request a document not directly from the origin server, but from a parent or neighbour cache. Again, caches placed where bandwidth drops, such as at national or ocean boundaries, can provide great benefits. NLANR in the US has set up such a scheme, as have various organizations in Europe and elsewhere. A browser requesting a page hosted in Japan might look first in its local cache, then in a LAN cache, then in several neighbour caches, then in a national cache, before finally getting the page from the overseas server. Properly used, these schemes can turn the Infohighway from a two-lane road into a six-lane expressway.
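The lookup order described above can be sketched as a walk up a chain of caches, nearest first. This is a simplified model under stated assumptions: every level is treated as a parent that stores what passes through it, caches are plain dictionaries, and the URL and function names are hypothetical. Real cache servers such as Squid query neighbour caches in parallel with the Internet Cache Protocol (ICP) rather than one at a time.

```python
from typing import Callable, Dict, List, Tuple

Cache = Dict[str, bytes]  # URL -> stored document


def hierarchical_get(url: str,
                     levels: List[Tuple[str, Cache]],
                     fetch_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Return (document, name of the level that supplied it).

    `levels` is ordered nearest first, e.g. browser, LAN, national.
    """
    for i, (name, cache) in enumerate(levels):
        if url in cache:
            doc = cache[url]
            # Copy the hit into every nearer cache on its way back down.
            for _, nearer in levels[:i]:
                nearer[url] = doc
            return doc, name
    # Missed everywhere: go to the origin server and cache at every level.
    doc = fetch_origin(url)
    for _, cache in levels:
        cache[url] = doc
    return doc, "origin"


# Hypothetical setup: only the national cache already holds the page.
url = "http://www.example.com/"
browser: Cache = {}
lan: Cache = {}
national: Cache = {url: b"<html>...</html>"}
levels = [("browser", browser), ("lan", lan), ("national", national)]

origin_calls: List[str] = []


def fetch_origin(u: str) -> bytes:
    origin_calls.append(u)  # record each trip to the overseas server
    return b"<html>fresh from origin</html>"


doc, where = hierarchical_get(url, levels, fetch_origin)
print(where)  # national
doc, where = hierarchical_get(url, levels, fetch_origin)
print(where)  # browser: the first lookup left a copy in every nearer cache
```

Each miss that reaches the origin fills every level on the way back, so the next user anywhere below that point is served without crossing the ocean.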

Controlling Cache

Using caches requires a certain amount of thought, or else pages can be refreshed either too slowly or too rapidly. The HTTP Expires header on a document explicitly gives its expiry date, after which caches treat their copy as stale; a new request will then reload the updated document. If the Expires header is zero or "now", the document is already expired and effectively cannot be cached. This feature is supported in some servers such as Apache 1.1.1 and the CERN httpd, and can be used in CGI scripts. If the Expires header is not present, the cache guesses an expiry date based on the document's age. In any case, Reload in Netscape and other browsers will cause the document's modification date to be checked, while Shift-Reload in Netscape will unconditionally get a new document from the origin server.
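The rules above can be sketched as a freshness check a cache might run before serving a stored copy. Two points are assumptions rather than anything the article specifies: the fallback guess uses a hypothetical factor of 10% of the document's age at fetch time (similar in spirit to the last-modified factor some proxy servers use), and any Expires value that does not parse as a date, such as "0" or "now", is treated as already expired. The function names are illustrative.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical heuristic: with no Expires header, a copy stays fresh for
# 10% of the time since the document was last modified.
LM_FACTOR = 0.1


def parse_expires(value: str, date: datetime) -> datetime:
    """Parse an Expires header; unparseable values ("0", "now") mean expired."""
    try:
        expires = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # TypeError on Python < 3.10
        return date  # expired at the moment the response was sent
    if expires.tzinfo is None:
        expires = expires.replace(tzinfo=timezone.utc)
    return expires


def freshness_lifetime(date: datetime, expires: Optional[str],
                       last_modified: Optional[datetime]) -> timedelta:
    """How long a cached copy may be served without asking the origin server."""
    if expires is not None:
        return max(parse_expires(expires, date) - date, timedelta(0))
    if last_modified is not None:
        # Guess from the document's age when it was fetched.
        return max((date - last_modified) * LM_FACTOR, timedelta(0))
    return timedelta(0)  # nothing to go on: check with the server every time


def is_fresh(now: datetime, date: datetime, expires: Optional[str] = None,
             last_modified: Optional[datetime] = None) -> bool:
    """True if a copy fetched at `date` can still be served at `now`."""
    return now - date < freshness_lifetime(date, expires, last_modified)


fetched = datetime(1996, 8, 1, 12, 0, tzinfo=timezone.utc)
print(is_fresh(fetched + timedelta(minutes=30), fetched,
               expires="Thu, 01 Aug 1996 13:00:00 GMT"))  # True
print(is_fresh(fetched, fetched, expires="0"))  # False: expired on arrival
print(is_fresh(fetched + timedelta(hours=12), fetched,
               last_modified=fetched - timedelta(days=10)))  # True: guessed lifetime is 1 day
```

A stale copy is not necessarily thrown away: the cache can revalidate it with an If-Modified-Since request and keep serving it if the server answers 304.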

An online version of this article with links is available at


A.Daviel, Vancouver Webpages