Proxy Cache - Van-Pool for the Web
Do you wonder why your T-1, cable or ISDN connection is sometimes as
slow as a 28.8k modem? Have you ever wondered what happens when you click
on a link in Netscape? Read on.
In the summer of '96, thousands of you probably followed some of the Olympics
on the Web. Thousands of identical copies of the same pages traversed
a dozen computers from Atlanta to Vancouver. Does something about this
strike you as crazy? There is an alternative; it's been part of the
Web protocol for a long time but is only now becoming widely used.
This is the concept of proxy cache.
Most modern browsers incorporate a local cache; in Netscape if you open
the document "about:cache" you can see the current state of yours.
This uses some of your hard disk to store pages and images you've seen.
If you reload one of these pages, Netscape will issue a conditional GET
request carrying an If-Modified-Since header, reloading the whole file from
the server only if it has changed.
This is a great improvement over the original browsers, but still short of the ideal.
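
The reload behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Netscape's actual code: the origin server is a toy function, and the names DOCUMENT, origin_get and reload are made up for the example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical document held by the origin server (an assumption for this sketch).
DOCUMENT = {
    "body": b"<html>Olympic results</html>",
    "last_modified": datetime(1996, 7, 20, 12, 0, 0, tzinfo=timezone.utc),
}

def origin_get(headers):
    """Toy origin server: answer a GET, honouring If-Modified-Since."""
    since = headers.get("If-Modified-Since")
    if since is not None:
        if DOCUMENT["last_modified"] <= parsedate_to_datetime(since):
            return 304, {}, b""  # Not Modified: no body is sent
    stamp = format_datetime(DOCUMENT["last_modified"], usegmt=True)
    return 200, {"Last-Modified": stamp}, DOCUMENT["body"]

def reload(cache_entry):
    """Browser-side reload: revalidate the cached copy if there is one.

    Returns (cache entry to keep, number of body bytes transferred).
    """
    headers = {}
    if cache_entry is not None:
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    status, response_headers, body = origin_get(headers)
    if status == 304:
        return cache_entry, 0  # reuse the copy already on disk
    entry = {"body": body, "last_modified": response_headers["Last-Modified"]}
    return entry, len(body)
```

The first call, reload(None), transfers the whole body. A second reload with the returned entry transfers nothing until the document's modification date changes on the server.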

Proxy Cache is essentially a simple concept. Suppose that each member of
your family has their own computer, you have Ethernet around your house
and an ISDN line to an ISP. If each person selects the CNN homepage,
several identical copies will be transferred over the ISDN at 128kbps.
With a proxy cache, the first person will get the page at 128kbps. The
rest will get it at 10Mbps over Ethernet. The same argument applies
to an office with a 100Mbps LAN and a T-1 connection to the net.
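
To put rough numbers on the household example, here is a small Python sketch. The link speeds come from the text; the 50,000-byte page size, the ProxyCache class and the transfer_seconds helper are assumptions made for illustration, and the timings ignore latency and protocol overhead.

```python
ISDN_BPS = 128_000         # 128 kbps ISDN line to the ISP (from the text)
ETHERNET_BPS = 10_000_000  # 10 Mbps household Ethernet (from the text)

def transfer_seconds(size_bytes, link_bps):
    """Ideal transfer time: ignores latency and protocol overhead."""
    return size_bytes * 8 / link_bps

class ProxyCache:
    """Toy shared cache: fetches each URL upstream once, then serves copies."""

    def __init__(self, fetch_upstream):
        self._fetch_upstream = fetch_upstream  # callable: url -> bytes
        self._store = {}
        self.upstream_bytes = 0                # bytes pulled over the slow link

    def get(self, url):
        if url not in self._store:
            body = self._fetch_upstream(url)
            self.upstream_bytes += len(body)
            self._store[url] = body
        return self._store[url]

# Hypothetical 50,000-byte home page requested by four family members.
page = bytes(50_000)
proxy = ProxyCache(lambda url: page)
for _ in range(4):
    proxy.get("http://www.cnn.com/")

print(proxy.upstream_bytes)                       # 50000, not 200000
print(transfer_seconds(len(page), ISDN_BPS))      # 3.125
print(transfer_seconds(len(page), ETHERNET_BPS))  # 0.04
```

Under these assumptions the ISDN line carries the page once (about 3 seconds) instead of four times (about 12.5 seconds), and the three later readers each wait a few hundredths of a second.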
Hierarchical Cache Servers

The use of cache outlined above provides significant benefit where
there is a reduction of bandwidth - going from a LAN to WAN, for instance.
Cache servers can be used in a more sophisticated way. They can be
configured not to request a document directly from the origin server, but from
a parent or neighbour cache. Again, cache placed where there is a
reduction in bandwidth, such as at national or ocean boundaries, can
provide great benefits.
NLANR in the US has set up such a scheme, as have various organizations
in Europe and elsewhere. A browser requesting a page
from Japan might look first in its local cache, then in a LAN cache, then
in several neighbour caches, then in a national cache, before getting the page
from the overseas server. Properly used, these schemes can turn the
Infohighway from a two-lane road to a six-lane expressway.
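
The lookup order just described (own cache, then neighbours, then parent, then origin server) might be sketched as follows in Python. This is a simplified model, not the protocol used by NLANR or any real cache server; the Cache class, its method names and the cache names are all hypothetical.

```python
class Cache:
    """Toy cache with an optional parent and a list of neighbour caches."""

    def __init__(self, name, parent=None, neighbours=()):
        self.name = name
        self.parent = parent
        self.neighbours = list(neighbours)
        self.store = {}

    def lookup(self, url):
        """Return a locally stored copy or None; never fetches anything."""
        return self.store.get(url)

    def fetch(self, url, origin):
        """Return (body, name of whoever supplied it), keeping a local copy."""
        body = self.lookup(url)
        if body is not None:
            return body, self.name
        # Neighbours are only asked whether they already hold the page.
        for neighbour in self.neighbours:
            body = neighbour.lookup(url)
            if body is not None:
                self.store[url] = body
                return body, neighbour.name
        # Otherwise pass the request up the hierarchy, or go to the origin.
        if self.parent is not None:
            body, source = self.parent.fetch(url, origin)
        else:
            body, source = origin(url), "origin"
        self.store[url] = body
        return body, source
```

For example, with a "national" cache as parent of two LAN caches, the first request for a page reaches the origin server, a later request from the other LAN is answered by the national cache, and the overseas link is crossed only once.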
Controlling Cache
Using cache requires a certain amount of thought, or else pages can be
refreshed either too slowly or too rapidly. The HTTP Expires header
on a document explicitly gives its expiry date, causing caches
to delete it at a specific time. A new request will then reload the
updated document. If the Expires header is zero or "now", the
document cannot be cached.
This feature is supported in some servers such as Apache 1.1.1
and the CERN httpd, and can be used in CGI scripts.
If the Expires header is not present, the cache guesses the expiry date
based on the document's age. In any case, Reload in Netscape
and other browsers will cause the document modification date to be
checked. Shift-Reload in Netscape will unconditionally get a
new document from the origin server.
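
The expiry rules above can be sketched as a small Python function. This is one plausible reading rather than the behaviour of any particular cache: the guess from the document's age is assumed to use the Last-Modified header, and the 10% factor, like the function names, is an assumption chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

HEURISTIC_FRACTION = 0.1  # assumption: keep a document for 10% of its age

def parse_http_date(value):
    """Parse an HTTP date; return None for values such as "0" or "now"."""
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed

def expiry_time(headers, fetched_at):
    """Return when a cached copy expires, or None if it should not be cached."""
    if "Expires" in headers:
        expires = parse_http_date(headers["Expires"])
        if expires is None or expires <= fetched_at:
            return None  # "0", "now" or a past date: do not cache
        return expires
    # No Expires header: guess from the document's age at fetch time.
    modified = parse_http_date(headers.get("Last-Modified", ""))
    if modified is None or modified >= fetched_at:
        return None
    return fetched_at + (fetched_at - modified) * HEURISTIC_FRACTION
```

With these assumptions, a page last modified ten days before it was fetched is kept for about one day, while a page sent with "Expires: 0" is never stored.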
An online version of this article with links is available at
vancouver-webpages.com/proxy/.
A.Daviel,
Vancouver Webpages