While working on a chat bot that parses cat picture links on the interwebs,
a co-worker found that a
curl to this URL generated
a bunch of gzip data, rather than the expected text.
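Something along these lines reproduces it (the S3 URL here is a hypothetical stand-in for the actual link):

    $ curl -s -H 'Accept-Encoding: gzip;q=0' \
        https://example-bucket.s3.amazonaws.com/cats.txt | file -
    /dev/stdin: gzip compressed data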
Apparently, this is an Amazon S3 server.
What’s going on here? We even explicitly said we don’t want gzip encoding by using a qvalue of 0, but we’re getting it anyway. You’d think this is an RFC violation of some sort.
However, RFC2616 says (Section 14.3):
    If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
Note the SHOULD, not MUST. That is, the server can send whatever it feels like, but ideally it should send back a 406 Not Acceptable.
The newer RFC7231, on the other hand, says (Section 5.3.4):

    If an Accept-Encoding header field is present in a request and none of the available representations for the response have a content-coding that is listed as acceptable, the origin server SHOULD send a response without any content-coding.
So I guess S3’s behaviour in this case is legit according to both the new and old HTTP specs (since neither specifies MUST), but the SHOULD condition seems to have changed between 2616 and 7231. That is, we really should have gotten plain text, but not getting it does not violate the spec.
This is exactly what one of Amazon’s AWS people (presumably) seems to say in response to this thread when asked about the issue.
Today, Amazon S3 does not dynamically compress or decompress objects. If you store compressed content, Amazon S3 will serve compressed content. If you store noncompressed content, Amazon S3 will serve noncompressed content. This is not a violation of the HTTP specification. An HTTP-compliant server can serve compressed content in response to a request that didn’t include an Accept-Encoding: gzip header.
…which, I guess, is fair enough, and clear enough.
By the way, a simple solution? Mine would be to look at the resulting Content-Encoding and conditionally pipe the output through a gzip decompressor such as gunzip; that works for the above example, anyway.
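A minimal sketch of that, assuming gunzip is available and reusing the hypothetical URL from above:

    #!/bin/sh
    # Fetch the object; if the server declares the body as gzip-encoded,
    # pipe it through gunzip, otherwise emit it as-is.
    url="https://example-bucket.s3.amazonaws.com/cats.txt"
    encoding=$(curl -sI "$url" | tr -d '\r' |
        awk -F': ' 'tolower($1) == "content-encoding" { print $2 }')
    if [ "$encoding" = "gzip" ]; then
        curl -s "$url" | gunzip
    else
        curl -s "$url"
    fi

(curl’s --compressed flag also transparently decodes a Content-Encoding: gzip response, though it additionally advertises Accept-Encoding: gzip in the request.)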