Many years ago, after working in my first programming job for a couple of years the company was taken over, and coding tests for new hires were introduced. The incumbent developers all decided to take the test, and it was seen as a fun diversion for a couple of hours.
I don’t have access to the actual wording of the requirements given to candidates, but the test required a text file containing around 100k words to be loaded and sorted into the largest set of the longest anagram. For example in the words file I’m using in this blog post, there are 466544 words in the file, 406627 of which are anagrams. The largest set is for a 7 letter anagram, of whih there are 15 words. There are smaller sets of longer anagrams, we’re not interested in those. And, it had to run in in less a second. They had three hours to write it, on a computer not connected to the internet. They had access to Java, through Eclipse, C/C++/C# through Visual Studio and Delphi through Embarcadero Studio.
I don’t know where the test originally came from - I think it originated in a different company which had been acquired by the same company I now worked for, but I’m not sure. I think the intent of the test was to in part gauge how the candiate reacted to the deadline pressure, part how they could understand the requirements given to them, and lastly what sort of code they wrote.
As it has been a long time and the company no longer recruits after moving most development overseas, so, I’m going to present my solution.
The Solution
First we have to load the file, and figure out to generate the anagram and keep track of how many instances of that anagram there are. It turned out for the candidates taking the test that this was the bit that most got stuck on, specifically the short mental leap it took to working out you needed to sort the letters of the word alphabetically to create the key.
words
is a Dictionary<string, List<string>>
, which we use to track the count of anagrams. The rest of the file loading is a fairly standard while
loop over the reader ReadLine
method, checking the dictionary to see if the anagram has already been found, and if so add the new word to the set, otherwise, add the anagram and create a new list to hold the word(s).
Once we have all the words loaded and matched into sets of anagrams, we can process them to work out which is the largest set with the longest word.
Here we simply bruteforce check all of the entries in the dictionary to find the answer. It’s not elegant, but it gets the job done. Running it on my Macbook Pro gives: