/var/log/mandarg

Mandar Gokhale's weblog on the internets

Sites on S3 and Accept-Encoding

While working on a chat bot that parses cat picture links on the interwebs, a co-worker found that a curl to this URL generated a bunch of gzip data, rather than the expected text.

$ curl -sI http://www.banyanops.com/blog/analyzing-docker-hub/ \
    -H 'Accept-Encoding: identity;q=1, gzip;q=0' \
    | grep -i 'Content-Encoding'
Content-Encoding: gzip
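Incidentally, you can confirm that mystery bytes really are gzip data without any network access at all: gzip streams always begin with the magic bytes 0x1f 0x8b. A quick local check (is_gzip is just an illustrative helper name, not part of curl or S3):

```shell
# Gzip streams begin with the magic bytes 0x1f 0x8b.
# is_gzip is a hypothetical helper, not part of curl or S3.
is_gzip() {
  head -c 2 | od -An -tx1 | tr -d ' \n' | grep -q '^1f8b$'
}

printf 'hello' | gzip | is_gzip && echo "looks like gzip"
printf 'hello' | is_gzip || echo "plain text"
```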

Apparently, this is an Amazon S3 server.

$ curl -sI http://www.banyanops.com/blog/analyzing-docker-hub/ | grep Server
Server: AmazonS3

What’s going on here? We explicitly said we don’t want gzip encoding by giving it a qvalue of 0, but we’re getting it anyway. You’d think this is an RFC violation of some sort.

However, RFC 2616 (Section 14.3) says:

If an Accept-Encoding field is present in a request, and if the
   server cannot send a response which is acceptable according to the
   Accept-Encoding header, then the server SHOULD send an error response
   with the 406 (Not Acceptable) status code.

Note the SHOULD, not MUST. That is, the server can send whatever it likes, though ideally it should send back a 406 Not Acceptable.

But wait! RFC 2616 is technically obsolete, so let’s find out what its successors say. The best I could find was Section 5.3.4 in RFC 7231, which says:

If an Accept-Encoding header field is present in a request
   and none of the available representations for the response have a
   content-coding that is listed as acceptable, the origin server SHOULD
   send a response without any content-coding.

So I guess S3’s behaviour in this case is legit according to both the new and old HTTP specs (since neither specifies a MUST), but the SHOULD condition seems to have changed between 2616 and 7231. That is, we really should have gotten plain text, but not getting it does not violate the spec.

This is exactly what someone from AWS (presumably) says in response to this thread when asked about the issue:

Today, Amazon S3 does not dynamically compress or decompress objects. If you store compressed content, Amazon S3 will serve compressed content. If you store noncompressed content, Amazon S3 will serve noncompressed content. This is not a violation of the HTTP specification. An HTTP-compliant server can serve compressed content in response to a request that didn’t include an Accept-Encoding: gzip header.

…which, I guess, is fair enough, and clear enough.

By the way, a simple workaround? Mine would be to look at the resulting Content-Encoding and conditionally pipe the output through zcat. That works for the example above, anyway.
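A minimal sketch of that workaround, assuming curl and zcat are available (fetch_plain and content_encoding are hypothetical names of my own, not anything curl or S3 provide):

```shell
# Pull the Content-Encoding value out of raw response headers.
content_encoding() {
  tr -d '\r' | awk -F': *' 'tolower($1) == "content-encoding" { print tolower($2) }'
}

# Fetch a URL, gunzipping the body only when the server says it is gzip.
fetch_plain() {
  url=$1
  enc=$(curl -sI "$url" | content_encoding)
  if [ "$enc" = "gzip" ]; then
    curl -s "$url" | zcat
  else
    curl -s "$url"
  fi
}

# e.g. fetch_plain http://www.banyanops.com/blog/analyzing-docker-hub/
```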

❧ Suggestions, comments, etc. can be emailed to comments@mandarg.com