4.1. HTTP caching
4.1 Problem
Originally adapted from “Using caching to reduce database load and bandwidth usage”, since expanded.
In modest applications, sending entire images across the HTTP connection is not usually a problem, nor is retrieving entire images from the database. Under higher loads, however, the added bandwidth usage and database load become an issue. Not just your bandwidth, either: clients need to download whatever you send, which adds to page load times.
HTTP provides a solution: caching. Caching makes the application play nicely with browsers, and as a well-behaved ‘net citizen you also take advantage of the wider HTTP infrastructure: external caches, proxy servers, automatic download tools, and search engine spiders all benefit from this additional intelligence.
HTTP caching is a large subject fraught with ambiguities and potential gotchas. It’s also quite wide in scope. In this example we are simply interested in avoiding sending an image if the other end already has a copy.
4.2 Solution
Timestamp the model
The first step is to add a timestamp property to the model. In PostgreSQL this would typically be declared like this:
create table photos
(
...
"updated_at" timestamp not null default now()
);
Here we used a magical field name, so Rails will automatically handle the timestamping.
Control the controller
The controller will determine whether to send the image or direct the browser to use its cached version instead. The decision is based on the optional If-Modified-Since header sent by the browser, which appears in the request environment as HTTP_IF_MODIFIED_SINCE.
def image
  @photo = Photo.find(@params['id'])
  minTime = Time.rfc2822(@request.env["HTTP_IF_MODIFIED_SINCE"]) rescue nil
  if minTime and @photo.updated_at <= minTime
    # use cached version
    render_text '', '304 Not Modified'
  else
    # send image
    @response.headers['Content-Description'] = @photo.description
    @response.headers['Last-Modified'] = @photo.updated_at.httpdate
    send_data @photo.data, :type => @photo.mimeType, :disposition => 'inline'
  end
end
4.3 Discussion
The model—Easy timestamping
If we use updated_at or updated_on as field names in our table, we benefit from automatic ActiveRecord behaviour: any update to our model also updates the timestamp field. See the timestamp API for more details.
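To make the idea concrete without a database, here is a minimal plain-Ruby sketch of the behaviour described above. The class TimestampedRecord and its methods are hypothetical stand-ins invented for illustration; this is not how ActiveRecord is implemented, only a model of the effect: the timestamp moves whenever the record is saved.

```ruby
# Hypothetical stand-in for a model; this is NOT ActiveRecord code.
class TimestampedRecord
  attr_reader :attributes, :updated_at

  def initialize(attributes = {})
    @attributes = attributes
    @updated_at = nil
  end

  # Change an attribute in memory; the timestamp only moves on save.
  def []=(name, value)
    @attributes[name] = value
  end

  # Persistence is simulated. The point is that the timestamp is
  # refreshed here, so callers never have to set it by hand.
  def save(now = Time.now)
    @updated_at = now
    true
  end
end

photo = TimestampedRecord.new(:description => "Test image")
photo.save(Time.utc(2005, 1, 13, 21, 51, 41))
first_saved = photo.updated_at

photo[:description] = "Retouched"
photo.save(Time.utc(2005, 1, 13, 22, 0, 0))

puts photo.updated_at > first_saved
```

The explicit time passed to save exists only to keep the example deterministic; a real model would rely on the current time.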
The controller
Now in the image controller, we need to look at the If-Modified-Since header and see if the client’s timestamp is older than our timestamp. If it is, we send the entire image as usual; but if it’s not, we send nothing—just an HTTP status code indicating that the image has not been modified.
Got a date?
After loading the model, we read the relevant header:
minTime = Time.rfc2822(@request.env["HTTP_IF_MODIFIED_SINCE"]) rescue nil
The date given in the header is in a particular format [1] that is parsed by Time.rfc2822, which returns a Time object. Note the trailing rescue, which catches any exception raised while parsing the date, including the case where the header is absent (not sent by the browser).
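The parsing step can be tried on its own outside Rails. The sketch below wraps it in a helper named parse_if_modified_since, which is a hypothetical name introduced here; it uses the same Time.rfc2822 call as the recipe but rescues ArgumentError explicitly rather than with a blanket rescue, which is a stylistic assumption, not something the recipe requires.

```ruby
require 'time'

# Hypothetical helper: returns a Time, or nil when the header is
# missing or cannot be parsed.
def parse_if_modified_since(raw)
  return nil if raw.nil? || raw.strip.empty?
  Time.rfc2822(raw)
rescue ArgumentError
  # Time.rfc2822 raises ArgumentError for strings it cannot parse.
  nil
end

p parse_if_modified_since("Thu, 13 Jan 2005 21:51:41 GMT")
p parse_if_modified_since(nil)              # header not sent
p parse_if_modified_since("yesterday-ish")  # malformed value
```

Ruby’s time library also offers Time.httpdate, which targets the HTTP date format specifically; either should handle the date shown in this recipe.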
To send or not to send?
The decision to send the image or not is based simply on whether the model is newer than the date given by the browser.
if minTime and @photo.updated_at <= minTime
  # tell browser to use cached version
else
  # send image
end
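The comparison can be isolated into a small predicate and exercised directly. In the sketch below, not_modified? is a hypothetical helper name. It also compares whole seconds only, which is an addition not present in the recipe: HTTP dates carry no fractional seconds, so a stored timestamp with sub-second precision could otherwise look newer than the value the client echoes back.

```ruby
# Hypothetical helper: true means the client's copy is current and a
# 304 can be sent; false means the full image should be sent.
def not_modified?(updated_at, if_modified_since)
  # No usable header from the client: always send the image.
  return false if if_modified_since.nil?
  # Compare whole seconds, since HTTP dates have no fractional part.
  updated_at.to_i <= if_modified_since.to_i
end

saved_at = Time.utc(2005, 1, 13, 21, 51, 41, 500_000)  # 21:51:41.5

puts not_modified?(saved_at, Time.utc(2005, 1, 13, 21, 51, 41))
puts not_modified?(saved_at, Time.utc(2005, 1, 13, 21, 0, 0))
puts not_modified?(saved_at, nil)
```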
Sending HTTP status codes
HTTP servers use status codes [2] to report the result of a browser request. Most of Rails’ render methods take an optional HTTP status parameter. If no status is given, it defaults to “200 OK”.
render_text '', '304 Not Modified'
The return code 304 Not Modified tells the browser to use its cached copy of the resource.
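The difference between the two outcomes can be shown without any framework. The sketch below uses a hypothetical image_response function that returns a status, headers, and body triple; the function and its return shape are assumptions chosen for illustration, not part of the Rails API.

```ruby
# Hypothetical, framework-free response builder.
# Returns [status, headers, body].
def image_response(data, mime_type, client_is_current)
  if client_is_current
    # A 304 carries no body; the browser reuses its cached copy.
    ["304 Not Modified", {}, ""]
  else
    headers = {
      "Content-Type"   => mime_type,
      "Content-Length" => data.bytesize.to_s
    }
    ["200 OK", headers, data]
  end
end

status, headers, body = image_response("fake image bytes", "image/jpeg", true)
puts status
puts body.bytesize

status, headers, body = image_response("fake image bytes", "image/jpeg", false)
puts status
puts headers["Content-Length"]
```

The bandwidth saving comes from the first branch: only the status line and headers cross the wire.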
Sending a Last-Modified date
We should also provide the client with the correct date regarding when the model was last updated. We do this by sending a Last-Modified header with the appropriately formatted date.
@response.headers['Last-Modified'] = @photo.updated_at.httpdate
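The formatting done by httpdate can be checked in isolation. This short sketch uses the timestamp from the test session in this recipe; the second value, with a -05:00 offset, is an assumed example added to show that local times are converted to GMT before formatting.

```ruby
require 'time'

updated_at = Time.utc(2005, 1, 13, 21, 51, 41)
header_value = updated_at.httpdate
puts header_value

# The same instant expressed in a zone five hours behind GMT
# produces the same header value.
local = Time.new(2005, 1, 13, 16, 51, 41, "-05:00")
puts local.httpdate

# The formatted value parses back to the original instant.
puts Time.httpdate(header_value) == updated_at
```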
Test it out
Now let’s test the image using curl:
$ curl --head http://localhost/photos/image/1
HTTP/1.1 200 OK
Date: Thu, 13 Jan 2005 21:51:43 GMT
Server: Apache/1.3.31 (Debian GNU/Linux) mod_fastcgi/2.2.10 PHP/4.3.4
Set-Cookie: _session_id=d7c938ffe6179dcb904d39d8d5604b2f; path=/
Last-Modified: Thu, 13 Jan 2005 21:51:41 GMT
Content-Transfer-Encoding: binary
Content-Length: 26811
Content-Description: Test image
Content-Type: image/jpeg
Let’s see what happens when we pass the If-Modified-Since header to Rails, inputting the timestamp returned above.
$ curl --verbose --time-cond "Thu, 13 Jan 2005 21:51:41 GMT" http://localhost/photos/image/1
* About to connect() to foo port 80
* Connected to localhost (127.0.0.1) port 80
> GET /photos/image/1 HTTP/1.1
User-Agent: curl/7.11.2 (i386-pc-linux-gnu) libcurl/7.11.2 OpenSSL/0.9.7d ipv6 zlib/1.2.2
Host: localhost
Pragma: no-cache
Accept: */*
If-Modified-Since: Thu, 13 Jan 2005 21:51:41 GMT
< HTTP/1.1 304 Not Modified
< Date: Thu, 13 Jan 2005 21:54:38 GMT
< Server: Apache/1.3.31 (Debian GNU/Linux) mod_fastcgi/2.2.10 PHP/4.3.4
< Set-Cookie: _session_id=0b9e7441f12d9c36f4397378018f2f7d; path=/
< Cache-Control: no-cache
< Content-Type: text/html; charset=iso-8859-1
* Connection #0 left intact
* Closing connection #0
Note that this example is somewhat artificial, because Rails is returning a Set-Cookie header, which technically speaking conflicts with the 304 response. However, this is a minor problem, and subsequent responses (once the cookie has been set in the client’s browser) will be valid.
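The same conditional request can be built from Ruby with the standard Net::HTTP library. The sketch below only constructs the request so that it runs without a server; conditional_get is a hypothetical helper name, and the host and port in the commented-out lines are assumptions matching the curl session above.

```ruby
require 'net/http'
require 'time'

# Hypothetical helper: builds a GET that asks for the resource only
# if it has changed since last_seen.
def conditional_get(path, last_seen)
  request = Net::HTTP::Get.new(path)
  request['If-Modified-Since'] = last_seen.httpdate
  request
end

request = conditional_get('/photos/image/1', Time.utc(2005, 1, 13, 21, 51, 41))
puts request['If-Modified-Since']

# Sending it needs a running server (assumed here: localhost, port 80):
# response = Net::HTTP.start('localhost', 80) { |http| http.request(request) }
# puts response.code
```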
References and Resources
[1] RFC 2822