Are you a Star (Wars|Trek) fan?

Imagine you are a definite Trekkie, and happen to share a computer with a Star Wars fan. Or the other way around. Whichever you prefer, I am not going to debate (in this post) which one is better. And on that computer, you have all episodes of Star Wars and all series of Star Trek. And you forget where these are saved on the computer, so you need to find them. But as the geek you already are, you want to find your favourite films without having to stumble upon any of your friend’s (or are they friends?!) films. This is where the labyrinthine game of regex golf comes into play.

In regex golf, you are meant to find the shortest possible regular expression that matches all terms in one set without matching any of the terms in another set. Just like code golf, where you have to create the shortest possible code to solve a given task. Just like in golf, where you are meant to visit all holes in as few shots as possible. So, given a set of Star Wars subtitles and one of Star Trek subtitles, can you write an as-short-as-possible regex to match only one of them?

Regex Golf

This regex, /m | [tn]|b/, which seems to be case-insensitive, dates back from when only six Star Wars episodes existed. Now that Episode VII has been released, let’s update the set to have the following:

  • “The Phantom Menace”
  • “Attack of the Clones”
  • “Revenge of the Sith”
  • “A New Hope”
  • “The Empire Strikes Back”
  • “Return of the Jedi”
  • “The Force Awakens”
  • “The Motion Picture”
  • “The Wrath of Khan”
  • “The Search For Spock”
  • “The Voyage Home”
  • “The Final Frontier”
  • “The Undiscovered Country”
  • “Generations”
  • “First Contact”
  • “Insurrection”
  • “Nemesis”
  • the one without a subtitle
  • “Into Darkness”

You will notice that the addition of “The Force Awakens” breaks the regex. This episode’s name is not captured by any of the three options given in the regex. So what’s the smallest change we can make to make the regex work again?

I found two equally long possibilities to this end: either we add an ‘a’ to the middle option, or replace the ‘b’ with the sequence ‘ke’.

/m | [tna]|b/
/m | [tn]|ke/

I’ll come back to this topic later on with a heuristic to find you an approximation of the shortest possible regex that you can have when playing golf.

🖖!

Leave a Reply

Your email address will not be published. Required fields are marked *