Like a lot of people, I’ve been playing Wordle lately. If you’re unfamiliar with Wordle, it’s a website/game where, once per day, you try to guess a five-letter word within six guesses. It’s kind of a cross between Scrabble and Hangman, and it’s surprisingly addictive.
Perhaps the most important step in Wordle is to have a first word that will probably match something, and I decided to try to algorithmically determine the absolute best starting word.
- If you don’t want to know what the best word is, read no further.
- If you just want to know what the word is, but don’t care how I determined that, just scroll to the end of this post.
- If you want to know how I determined the best word, proceed!
What I Built
With that out of the way, let’s get to business. Over the weekend I wrote a little script to determine the best word to use as your first guess. When that was done, I kept fiddling with the script and it kind of evolved into a Text Based Adventure To Solve Wordle.
If you’re interested in running this, you’ll need to be someone who understands Node.js, Git, and NPM. If that’s not you, sorry! If that is you, all you’ve got to do is:
    git clone firstname.lastname@example.org:kiprobinson/wordle-solver.git
    cd wordle-solver
    npm install
    npm start
Then just follow the instructions.
How It Works
The algorithm I came up with is:
- Start with a list of all five-letter English words.
- Look at all of those words, and count how frequently each letter appears in the whole word list, and also how frequently it appears in each individual position.
- Go through the whole list and give each word a score, which is the sum of:
  - The likelihood of each letter appearing in its position (i.e. 1st letter, 2nd letter, etc.)
  - The likelihood of each unique letter appearing anywhere in the word (a repeated letter is counted only once).
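Here is a minimal sketch of those three steps in plain JavaScript. It is not the code from the repository: the function names (`buildFrequencies`, `scoreWord`, `rankWords`) are made up for illustration, and it assumes positional frequencies are percentages of words and overall frequencies are percentages of all letters, which may not match how the real script normalizes them.

```javascript
const WORD_LENGTH = 5;

// Step 2: count how often each letter appears in each position and overall.
// Assumption: positional counts are divided by the number of words and
// overall counts by the total number of letters, both expressed as percentages.
function buildFrequencies(words) {
  const positionCounts = Array.from({ length: WORD_LENGTH }, () => new Map());
  const overallCounts = new Map();
  for (const word of words) {
    [...word].forEach((letter, i) => {
      positionCounts[i].set(letter, (positionCounts[i].get(letter) ?? 0) + 1);
      overallCounts.set(letter, (overallCounts.get(letter) ?? 0) + 1);
    });
  }
  const toPercent = (counts, total) =>
    new Map([...counts].map(([letter, n]) => [letter, (100 * n) / total]));
  return {
    byPosition: positionCounts.map((counts) => toPercent(counts, words.length)),
    overall: toPercent(overallCounts, words.length * WORD_LENGTH),
  };
}

// Step 3: positional frequency of every letter, plus the overall frequency
// of each unique letter (a repeated letter is only counted once here).
function scoreWord(word, freqs) {
  let score = 0;
  [...word].forEach((letter, i) => {
    score += freqs.byPosition[i].get(letter) ?? 0;
  });
  for (const letter of new Set(word)) {
    score += freqs.overall.get(letter) ?? 0;
  }
  return score;
}

// Score every word against frequencies built from the same list, best first.
function rankWords(words) {
  const freqs = buildFrequencies(words);
  return words
    .map((word) => ({ word, score: scoreWord(word, freqs) }))
    .sort((a, b) => b.score - a.score);
}
```

Calling `rankWords(wordList).slice(0, 20)` on a full five-letter word list would give a top-20 table of the same shape as the one later in this post, though the exact scores depend on the word list and the normalization assumed above.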
So let’s say the word to rate is HELLO. The algorithm gives it a score of:
    (freq of H in 1st letter =  3.93 ) +
    (freq of E in 2nd letter = 11.486) +
    (freq of L in 3rd letter =  7.157) +
    (freq of L in 4th letter =  6.31 ) +
    (freq of O in 5th letter =  2.859) +
    (freq of H in any letter =  2.789) +
    (freq of E in any letter = 10.486) +
    (freq of L in any letter =  5.617) +   <== Note L is only counted once here
    (freq of O in any letter =  6.601)
    ------------------------------------
    = 57.23
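As a quick sanity check, the HELLO arithmetic can be reproduced by adding up the figures quoted above. The variable names are just for illustration; the numbers are copied from the example as displayed, so they are already rounded.

```javascript
// Positional frequencies for H, E, L, L, O (1st through 5th letter).
const positional = [3.93, 11.486, 7.157, 6.31, 2.859];
// Overall frequencies for the unique letters H, E, L, O (L counted once).
const anywhere = [2.789, 10.486, 5.617, 6.601];

const sum = (values) => values.reduce((total, value) => total + value, 0);
const helloScore = sum(positional) + sum(anywhere);

console.log(helloScore.toFixed(3)); // prints "57.235"
```

The rounded inputs add up to about 57.235, which lines up with the 57.23 shown above; the last digit presumably differs only because the displayed frequencies are themselves rounded.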
So What’s The Best Word Already?!?
Per my algorithm, if it were a word, “SOAES” would be the best starting word (see the footnote at the end of this post). However, Wordle requires real words as guesses, so here are the top 20 first words, and the scores my algorithm gives them:
     1: tares 122.12
     2: cares 122.02
     3: bares 120.94
     4: sales 120.42
     5: dares 120.18
     6: pares 120.07
     7: tales 120.01
     8: sores 119.12
     9: canes 118.95
    10: bales 118.83
    11: mares 118.73
    12: cores 118.62
    13: dales 118.07
    14: pales 117.97
    15: lanes 117.96
    16: banes 117.87
    17: fares 117.82
    18: lores 117.63
    19: sates 117.56
    20: bores 117.54
A few thoughts I have:
- My algorithm really likes words ending in “es”. In fact, the first word not ending in “es” is TAELS (the 203rd-best option, and also not a word that I personally was familiar with). However, looking at a list of recent Wordle solutions, none of them are plurals, which would make “es” endings much less likely.
- The fact that SALES is the fourth-best word indicates some kind of problem. Even though S is really common as the first and last letter, and also overall, it is still a waste of a letter to guess a word that repeats a letter. I may tweak the algorithm more: maybe if a word has the same letter twice, it only gets the per-position bonus from the position where that letter is most likely. Or I may just leave it, because I’ve already spent a lot of time on this.
- After doing this, I did some Googling to see if anyone else had done what I did. I found a few sites suggesting ADIEU as the first guess because it covers so many vowels. I think (in agreement with my algorithm) that covering R/S/T is still better.
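For the curious, here is a rough sketch of the repeated-letter tweak floated in the SALES bullet above: each distinct letter only earns its positional bonus from the position where it is most likely. This is one possible reading of that idea, not something the script does. The function name `scoreWordPenalizingRepeats` and the shape of `freqs` (an array of five letter-to-percentage Maps plus one overall Map) are assumptions made for illustration. The sample frequencies are just the HELLO figures from earlier.

```javascript
// Assumed shape: freqs.byPosition is an array of Maps (letter -> %),
// freqs.overall is a single Map (letter -> %).
function scoreWordPenalizingRepeats(word, freqs) {
  // For each distinct letter, keep only its best positional frequency.
  const bestPositional = new Map();
  [...word].forEach((letter, i) => {
    const frequency = freqs.byPosition[i].get(letter) ?? 0;
    const previous = bestPositional.get(letter) ?? 0;
    bestPositional.set(letter, Math.max(previous, frequency));
  });
  // Each distinct letter scores its best position plus its overall frequency.
  let score = 0;
  for (const [letter, positional] of bestPositional) {
    score += positional + (freqs.overall.get(letter) ?? 0);
  }
  return score;
}

// Only the entries needed for HELLO, taken from the worked example above.
const helloFreqs = {
  byPosition: [
    new Map([["h", 3.93]]),
    new Map([["e", 11.486]]),
    new Map([["l", 7.157]]),
    new Map([["l", 6.31]]),
    new Map([["o", 2.859]]),
  ],
  overall: new Map([["h", 2.789], ["e", 10.486], ["l", 5.617], ["o", 6.601]]),
};

console.log(scoreWordPenalizingRepeats("hello", helloFreqs).toFixed(3));
```

With this variant, HELLO would lose the 6.31 it earned for the second L (the L in 4th position), since the L in 3rd position scores higher. A word like SALES would take a similar hit for one of its two S’s.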
There’s an Update!
I got some feedback after posting this, and made some adjustments. Read the update here.
Footnote on “SOAES”: The second-best word is “TOAES”, which actually seems like a better first guess to me because it doesn’t waste a letter on the first S. But the algorithm sees that S in first position gets more points (11.853) than T overall (5.246) plus T in first position (6.198), which together come to 11.444. More on this in “Upon Reflection”.