/var/log/mandarg

Mandar Gokhale's weblog on the internets

An Interesting Unix Puzzle

This post describes a rather intriguing Unix puzzle. What do you get when you run echo cat | sed statement? Clearly, the first part yields the string cat, but I don’t know sed well enough to get the answer off the cuff. Let’s see what does happen.

$ echo cat | sed statement
cement

Now that was a bit unexpected. So somehow, the at part of cat is being replaced by ement from statement.

Now, due to the t’s, I noticed that this looks very similar to:

s/at/ement

which looks pretty familiar! So maybe you can use arbitrary delimiters, and the ‘t’s are these?

Section 3.5 of the sed info page seems to confirm this:

3.5 The `s' Command
===================

The syntax of the `s` (as in substitute) command is
`s/REGEXP/REPLACEMENT/FLAGS`. The `/` characters may be uniformly
replaced by any other single character within any given `s` command.

This, like a lot of other sed quirks, is news to me.

So, I suppose I was a bit wrong earlier – all of the ts are being used as delimiters instead of the usual /, and it’s just the a in cat that’s being replaced with emen.

Our puzzle is thus equivalent to:

echo cat | sed 's/a/emen/'

…which, since it is a simple replacement of a with emen, will give cement.

Just in order to make sure we have the right explanation, we can try using some random character, say #, as the delimiter.

$ echo cat | sed 's#a#emen#'
cement

Yep, it works!
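To push the check a little further, here is a small sketch that runs the same substitution with a handful of delimiters. The list of delimiters is an arbitrary pick of mine, and try_delims is just a name made up for this example; the last delimiter, t, reproduces the original puzzle.

```shell
# Try the same substitution with several delimiters (an arbitrary selection).
# Each one should behave exactly like s/a/emen/.
try_delims() {
    for d in '/' '#' '|' ',' 't'; do
        # Show the sed script being used, then its result on "cat".
        printf '%s -> ' "s${d}a${d}emen${d}"
        echo cat | sed "s${d}a${d}emen${d}"
    done
}

try_delims
```

Every line of output should end in cement, including the one for the script statement.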

❧ Suggestions, comments, etc. can be emailed to comments@mandarg.com

Natural Language and Scripting

This post by Justin Duke reminded me of something unusual I have been doing recently.

A couple of weeks ago, I reflected on the fact that we use a considerable degree of restriction when “talking” to our phones (something that I do very little of). For example, we will make sure to say, “alarm”, or “wake me” — and we usually specify AM or PM as well. Something like, “Ok Google, make sure I get up at 8, and DON’T allow me to snooze” is probably not going to work.

[As an aside, apart from setting alarms, I’m not a big user of voice recog via “Ok, Google”. I’m secretly worried that their voice recognition will fail catastrophically at some point, and end up ordering 50 inflatable pink hippopotamuses from Google Express. Thankfully, this hasn’t happened. Yet.]

So, my revolutionary thought goes, why not do something similar and use more natural language for scripts? To try this out, I rewrote a bunch of scripts so that they need me to type well-formed phrases or sentences into the terminal. Now I understand that for the seasoned Unix user who can recite the flags to tar off the top of their head, this will seem like absolute heresy – but it appears to work well for me.

xkcd: tar (via xkcd.com)

For example, we have a bunch of common lab resources at work that we ‘reserve’ for ourselves, or free into a common pool as necessary. A script to move a resource from User A to User B might be invoked something like this: ./change_script user1 user2 …and I can never remember which one is the “from” user and which one is the “to” user.

Since I am usually the “from” user while taking and the “to” user while giving, I simplified this into two simple scripts that are invoked as I would say them in real life.

# Take a resource from someone else's pool
take [resource]

# Give a resource to someone else from your pool:
give [resource]

For some more complicated scripts, I introduced semantic-only arguments, that made my script invocation look as follows:

put [resource] in group X

This would traditionally have been called with something like:

put --group X [resource]

I am going to try and keep this up for my own scripts. I would take the overhead of typing a new word over the overhead of remembering how a script is called, or (way worse) reading it and figuring out which argument goes where.
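As a rough illustration of how little code the semantic-only words need, here is a minimal sketch of such a wrapper. The actual scripts are not shown here, so the function body, the resource name server42 and the final echo (standing in for the real work) are all assumptions.

```shell
# Hypothetical sketch: accept "put RESOURCE in group GROUP" and translate it
# into the traditional flag-style call. The echo at the end stands in for
# whatever the real script does.
put() {
    if [ "$#" -ne 4 ] || [ "$2" != "in" ] || [ "$3" != "group" ]; then
        echo "usage: put RESOURCE in group GROUP" >&2
        return 1
    fi
    resource=$1
    group=$4
    # "in" and "group" carry no information; they only make the call readable.
    echo "put --group $group $resource"
}

put server42 in group X
```

The filler words are checked rather than silently skipped, so a mistyped sentence fails loudly instead of putting the wrong thing in the wrong place.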


DNS Trivia: Country Codes and the .gb Domain

This week, we were discussing ccTLD (country code Top Level Domain) confusion at work – specifically in the context of the .om toplevel domain, which apparently is used for tricking people into visiting foo.co.om instead of foo.com.

This led me to dig a little more into how exactly the ccTLDs for each country are assigned. RFC 1591 tells us that they are taken straight from ISO-3166. However, as always, there are exceptions. There are a handful of unused ones from small, obscure regions round the world (including, amusingly, .um and .eh) – but there is one rather large exception. The ISO-3166 country code for the United Kingdom is GB, but the domain used by them is the well-known .uk.

You’d think the story ends there, but it doesn’t. Apparently .gb was created and used at some point, and so it is still available from the DNS root (although not open for registration). I think this technically makes the UK the only country to have two ccTLDs. And there is a dwindling number of servers out there that still serve DNS records under the .gb domain.

{~}$ dig +noall +answer A delos.dra.hmg.gb
delos.dra.hmg.gb. 86400   IN  A   146.80.9.105


How to Renew Your StartSSL Client Certificate

StartSSL offers free SSL certificates that are valid for a period of one year. There are several guides on how to renew your server’s SSL cert every year. However, something that is glossed over a little bit in these articles is that StartSSL uses S/MIME Client Certificates for authentication that also have a validity of one year, and these need to be renewed as well. [1]

We’ll skip ahead to the steps in a little bit, but firstly, what is a client certificate? This site goes into a decent explanation of the UX problems with client certificates.

Talking of UX problems, the StartSSL procedure for renewing your client cert is pretty clunky as well. Two weeks before your client certificate expires, they send you a mildly scary-looking email about it that goes:

This mail is intended for the person who owns a digital certificate
issued by the StartSSL™ Certification Authority (http://www.startssl.com/).

The Class 1, client certificate for StartCom Free Certificate Member
and serial number CCAEB is about to expire in about two weeks. Please log into the StartSSL Control Panel at https://startssl.com/Certificates and get a new certificate for this purpose.

Failing to update your client certificate might result in the
loss of your account.

The confusing thing here is that this is also often the date of expiry for your server’s certificate, if you created your client cert and generated a server SSL cert around the same time.

So here’s what you do after getting said email:

  1. Go to “Certificates Wizard” and in “Select Certificate Purpose”, select “Client S/MIME and Authentication Certificate”.

Client Certificate Selection

[Aside: This is my first gripe with this workflow. It could be improved to a simple “Create New Client Authentication Certificate” button that just automatically takes you to the menu.]

  2. To create a new Client Certificate, you have to generate a Certificate Signing Request. I don’t understand this process in depth, but basically, you generate a private key encrypted with a passphrase, send them a signing request made from it, and they sign that and give you a certificate. This is your client certificate - the next time you log into startssl.com, the site looks at it and says, “Hey, we can verify that this was issued to Mandar!”. You can generate the private key in one of two ways.

    • By using the website itself (this is part and parcel of the CSR if you’re using the website, so just go to the next step).

    • From the command line: Run openssl genrsa -aes256 -out yourdomain.com.key 2048. This will generate a file called yourdomain.com.key that is encrypted using a passphrase (which you will be prompted for).

  3. Make a Certificate Signing Request (CSR): This is the part where you tell startssl.com “Hey look, this is a request made with my key! Validate it please!”. Again, you can do this one of two ways, corresponding to the two ways you generated the key before.

    • From the website itself.

      • Enter your email (there might be some sort of a validation process here, where they send a unique code to your email that you then have to confirm).

      • Select “Generated by PKI system”

        CSR Request from website

      • Click “Submit”

    • From the command line: Run the following (you will get some prompts for Country, Locality etc. – fill those in according to the details you used while signing up for StartSSL):

      openssl req -new -sha256 -key yourdomain.com.key -out yourdomain.com.csr

  4. Submit your CSR. If you used the website, you should be prompted to download a .key file, which is your private key. Download this, and hit “submit”. This will give you a download of two .crt files.

  5. Generate a PKCS file for export. Again, you can do this one of two ways, corresponding to the two ways used in the earlier steps.

    • Using the website itself. If you paste in your key and CSR, the website will generate a .pfx file that you can copy to your computer. On OSX, double clicking this will add your client cert to the system Keychain.

    • On your own from the command line. The download in Step 4 will give you an intermediate .crt file called 1_Intermediate.crt, and your own certificate file, called 2_you@yourdomain.com.crt. Running the following should get you a newcert.pfx file that you can drag to your login keychain in Keychain Access (OSX) or double-click on (Ubuntu).

openssl pkcs12 -export -out newcert.pfx -inkey ssl.key -in 2_you@yourdomain.com.crt -certfile 1_Intermediate.crt

So there. Once you have your client certificate installed, you can then go to https://startssl.com and log in to generate a new cert for your website with a year’s validity.
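If you would rather not wait for the warning email next year, something along these lines can tell you whether a certificate file is inside the two-week window. This is only a sketch: expires_soon is a made-up helper name and cert.crt is a placeholder path, though -checkend itself is a standard option of openssl x509.

```shell
# Sketch: report whether a certificate expires within the next two weeks.
# The helper name and the file name in the usage line are placeholders.
expires_soon() {
    cert=$1
    two_weeks=$((14 * 24 * 60 * 60))
    # -checkend N exits non-zero if the cert will have expired N seconds from now.
    # Note that an unreadable or missing file also ends up in the "renew" branch.
    if openssl x509 -checkend "$two_weeks" -noout -in "$cert" >/dev/null 2>&1; then
        echo "ok: $cert is valid for more than two weeks"
    else
        echo "renew: $cert expires within two weeks"
    fi
}

# Usage (placeholder file name):
# expires_soon cert.crt
```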

[1] I know the Let’s Encrypt project is making this process simpler. It’s in public beta now, so I’ll probably switch over at some point.


Jumble

Dr. Drang’s post referencing puzzle-based games reminded me of a word-puzzle game I used to play a lot and haven’t in a long time.

Jumble was a game that was published first weekly, then daily in the Indian Express, the newspaper that I read daily as a kid. It consists of four anagrams – two five-letter and two six-letter ones. You unscramble the four words, and the circled letters form a larger phrase, that you unscramble from the clue given in the cartoon. Apparently, it has been around since 1954, and was created by a chap who also did covers for DC Comics.

Along with reading Tintin, Hagar the Horrible, and Beetle Bailey, I spent ages trying to figure some of these anagrams out (and sometimes got hauled away to study instead). Sometimes I did not succeed, and would eagerly await the answer in the next day (or week)’s newspaper. During some of the long, hot Indian summers when it was too hot to play outside, and there were not enough books to read inside, they kept me entertained. In any case, it was fun, and also a decent tool for someone learning the English language.

Today's Jumble! game

Several years later, I found them on the web. For some reason, they seem to have gotten a tad easier now that I’m an adult. But not by much.


Absolute / Fully Qualified Domain Names

Type in what you think is the domain name for Google into your browser. I bet you typed www.google.com, or for the no-www folks, google.com.

The correct answer, as far as I know, should be www.google.com., though (or google.com.) – the difference being the terminal dot at the end. According to RFC 1035, an “absolute” domain name (also later referred to as a Fully Qualified Domain Name) should have a terminal dot, in order to prevent path spoofing. RFC 1738, which defines a Common Internet Scheme Syntax, explicitly says that the “host” portion of a URL should be an FQDN. From what I understand (and after confirming with some light testing), most stub resolvers will basically interpret any domain with a dot (not just a trailing one) as an FQDN. This seems to work fine in most cases.

Background: I stumbled across this while casually putting a trailing dot at the end of some random sites to see if resolution works for them. There are interesting results when this interacts with SSL certs, or with CDNs like CloudFront. I was somewhat surprised to see that Amazon India, for example, does not seem to resolve properly for either www.amazon.in. or amazon.in.. I don’t imagine this impacts too many people though.
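For trying this on a list of sites without typing the dot by hand each time, a tiny helper like the following works. It is only a sketch, and to_fqdn is a name I made up for it.

```shell
# Hypothetical helper: make a hostname absolute by adding the trailing dot
# if it is not already there.
to_fqdn() {
    case "$1" in
        *.) printf '%s\n' "$1" ;;   # already ends in a dot, leave as-is
        *)  printf '%s.\n' "$1" ;;  # append the terminal dot
    esac
}

to_fqdn www.google.com
to_fqdn google.com.
```

The output can then be handed to dig or curl to see how a given site copes with the absolute form.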

My website handles trailing dots just fine.


Worst Analogies

Today, while reading Plato and a Platypus Walk into a Bar (which was rather fun), I came across a section with selections from the Worst Analogies ever written in a High School Essay Contest.

I had to dig up the entire list online, and here it is. I find them delightful, and of course will find excuses to use them throughout 2016.

Some of the good ones:

McBride fell 12 stories, hitting the pavement like a Hefty Bag filled with vegetable soup.

__

The red brick wall was the color of a brick-red Crayola crayon.

__

Her date was pleasant enough, but she knew that if her life was a movie this guy would be buried in the credits as something like “Second Tall Man.”

And my absolute favourite,

Long separated by cruel fate, the star-crossed lovers raced across the grassy field toward each other like two freight trains, one having left Cleveland at 6:36 p.m. traveling at 55 mph, the other from Topeka at 4:19 p.m. at a speed of 35 mph.


Cichetti

How to have a decent meal of cichetti and fragolino in Venice:

  • Completely ignore the hotel checkout lady who spent half an hour screaming at someone upstairs. (Idly wonder what happened to the much nicer lady from yesterday who overused her “allora”s.) Signora Grumpi will claim that it’s near impossible to find any fragolino in the city and that it is extremely rare.

  • Spend some time on your second day wandering around the city in 35°C and 99% humidity. Express half-hearted surprise at how on earth this is such a popular tourist destination. Tell yourself that the Bridges of Königsberg have nothing on this place. Wonder if you should ride the vaporetto on the canal again like last evening.

Quite relaxing, at that.

  • After roaming around the city in the morning, ask various places if they have cichetti or fragolino. Get mostly negative responses and “Why are you wasting my time instead of asking for the tourist trap spaghetti and meatballs?” faces. Begin to believe Sig. Grumpi might have been right. Realize that it’s a Sunday and most bacari are probably closed.

  • Come to a place that actually has a sign saying Cichetteria near the door. Ask about fragolino in broken Italian, receive a surprised “Sì!”.

  • Have a very reasonably priced meal full of bites of:

    • Baccalà – creamed salt cod
    • Sepia (yep, the ink) — Cuttlefish cooked in its own ink, on a bit of polenta.
    • Potato-tuna balls – I forget what these are called.
    • Bruschetta

    Repeat as necessary

  • Wash down with glass of fragolino, complete with strawberry aftertaste.


Simple Shell Scripts and Automator

Being a Linux user, I often end up automating small actions in OSX, and then completely forgetting the slightly idiosyncratic way of doing said automation. Hence, this post – largely to keep in my list of things to remember; but also because someone else looking for a very basic example may find it useful.

Having a stamp of the current time at your fingertips is always a useful thing. While logging, org-mode, etc. do these automatically, sometimes you need to log stuff in a text file like “Testing of this problem done at 2015-07-31 19:06:34”, etc. So it’d be nice to have a way to quickly say, ‘insert a timestamp for the current time’. So far as I know, OSX (as of Yosemite) has no built-in way to do this.

It does, however, have the excellent Automator. So, what we need to do is create an Application in Automator that runs the relevant bash snippet as a shell script, like so:

Add shell script using Automator
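Since the screenshot does not carry the snippet as text, here is a minimal sketch of what such a "Run Shell Script" action could contain. It is an assumption on my part, written to match the format of the example timestamp above, and timestamp is just a name for illustration.

```shell
# Assumed snippet: print the current time as YYYY-MM-DD HH:MM:SS,
# the format used in the example sentence above.
timestamp() {
    date '+%Y-%m-%d %H:%M:%S'
}

timestamp
```

On OSX the output could be piped to pbcopy so that it lands on the clipboard, ready to be pasted into whatever file you are logging to.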

I could assign a keyboard shortcut to this using the built in System Preferences, but I find it pretty difficult to remember arbitrary shortcuts for every such script. Instead, I stuck it in Applications, and my faithful butler Alfred remembers it instead. Thus typing Command + Space, followed by ts or even just t (depending upon how frequently I’m inserting timestamps) tends to suffice.

Alfred tab completion


Sites on S3 and Accept-Encoding

While working on a chat bot that parses cat picture links on the interwebs, a co-worker found that a curl to this URL generated a bunch of gzip data, rather than the expected text.

$ curl -sI http://www.banyanops.com/blog/analyzing-docker-hub/\
                        -H 'Accept-Encoding: identity;q=2, gzip;q=0' \
                        | grep -i 'Content-Encoding'
Content-Encoding: gzip

Apparently, this is an Amazon S3 server.

$ curl -sI http://www.banyanops.com/blog/analyzing-docker-hub/ | grep Server
Server: AmazonS3

What’s going on here? We even explicitly said we don’t want gzip encoding by using a qvalue of 0, but we’re getting it anyway. You’d think this is an RFC violation of some sort.

However, RFC2616 says (Section 14.3):

If an Accept-Encoding field is present in a request, and if the
   server cannot send a response which is acceptable according to the
   Accept-Encoding header, then the server SHOULD send an error response
   with the 406 (Not Acceptable) status code.

Note the SHOULD, not MUST. That is, the server can send whatever it feels like, but ideally it should send back a 406 Not Acceptable.

But wait! RFC 2616 is technically obsolete, so let’s try and find out what the new ones say. The best I could find was Section 5.3.4 in RFC 7231, which says:

If an Accept-Encoding header field is present in a request
   and none of the available representations for the response have a
   content-coding that is listed as acceptable, the origin server SHOULD
   send a response without any content-coding.

So I guess S3’s behaviour in this case is legit according to both the new and old HTTP specs (since neither specifies MUST), but the SHOULD condition has changed between 2616 and 7231: the old one asks for a 406 error, the new one for a response without any content-coding. That is, we really should have gotten plain text, but not doing so does not violate the spec.

This is exactly what one of Amazon’s AWS people (presumably) seems to say in response to this thread when asked about the issue.

Today, Amazon S3 does not dynamically compress or decompress objects. If you store compressed content, Amazon S3 will serve compressed content. If you store noncompressed content, Amazon S3 will serve noncompressed content. This is not a violation of the HTTP specification. An HTTP-compliant server can serve compressed content in response to a request that didn’t include an Accept-Encoding: gzip header.

…which, I guess is fair enough, and has sufficient clarity.

By the way, a simple solution? Mine would be to look at the resulting Content-Encoding, and conditionally pipe the output through zcat. This works for the above example anyway.
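A rough sketch of that idea follows. The function names are made up for this example, and I have used gzip -dc rather than zcat, since the two are equivalent for gzip data and zcat on OSX expects .Z files.

```shell
# Hypothetical helper: given a file of response headers and a file holding
# the body, print the body, gunzipping it only if the headers say it is gzip.
maybe_gunzip() {
    if grep -qi '^Content-Encoding:.*gzip' "$1"; then
        gzip -dc "$2"
    else
        cat "$2"
    fi
}

# Hypothetical wrapper: fetch a URL with curl, saving headers (-D) and body
# (-o) to temporary files, then decode the body if needed.
fetch_text() {
    headers=$(mktemp)
    body=$(mktemp)
    curl -s -D "$headers" -o "$body" "$1"
    maybe_gunzip "$headers" "$body"
    rm -f "$headers" "$body"
}

# Usage:
# fetch_text http://www.banyanops.com/blog/analyzing-docker-hub/
```

Keeping the decision in a separate function means it only ever looks at what the server actually sent, regardless of what Accept-Encoding asked for.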
