What’s the Best Initial Guess for Wordle?

Recently, a daily word game called Wordle has gone viral on Twitter. The rules are simple: guess a five-letter word in six tries. Some of the hints given after guessing a word are shown in the figure below.

I found the game interesting and challenging: there are hundreds of thousands of words in English, and it is not my native language. Luckily, I know Python! I calculated statistics on letter occurrence in English to make an informed initial guess.


First things first: I load all the words from a .txt file provided on GitHub. Since the quiz only considers five-letter words, we eliminate all words of any other length. Some words might not be in the quiz dictionary, but I just let it be.

Next, we calculate how often each letter of the alphabet occurs in the five-letter words. This is easy with a Python dictionary: iterate through all the words and letters, count them, and normalize the counts. The code can be seen below.
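The loading and filtering step can be sketched as follows. This is a minimal sketch, not the exact code I ran: the function name `load_five_letter_words` and the file name in the comment are assumptions, and the filter also drops anything that is not a plain a-z word so that it matches the lowercase alphabet used in the counting code.

```python
def load_five_letter_words(lines):
    """Return lowercase, purely alphabetic five-letter words from an iterable of lines."""
    words = []
    for line in lines:
        word = line.strip().lower()
        # Keep only plain a-z words of length five
        if len(word) == 5 and word.isascii() and word.isalpha():
            words.append(word)
    return words

# Hypothetical usage with a local file (the file name is an assumption):
# with open("words.txt") as f:
#     words_5_letters = load_five_letter_words(f)

sample_lines = ["apple\n", "Tares\n", "cat\n", "it's!\n", "bumpy\n"]
words_5_letters = load_five_letter_words(sample_lines)
print(words_5_letters)  # ['apple', 'tares', 'bumpy']
```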

import string
letter_count = dict.fromkeys(string.ascii_lowercase, 0)

for word in words_5_letters:
  for letter in word:
    letter_count[letter] = letter_count[letter] + 1

total_count = sum(letter_count.values())

letter_count_normalized = {key: value/total_count for key, value in letter_count.items()}

sorted_letter_prob = {k: v for k, v in sorted(letter_count_normalized.items(), key=lambda item: item[1], reverse=True)}
Statistics 1: English Letter Probability in 5-Letter Words

The result shows that more than 10% of the letters are ‘a’, with ‘e’ trailing behind at almost 10% and ‘s’ at slightly above 8%. This differs slightly from the distribution for words of any length, shown in the image below. With these statistics, we are confident that we should include those letters in our initial guess.

Frequency Table
English Letter Frequency (source)

For another statistic, let’s count the probability of each letter occurring at each position in a word. Using the pandas library, the code is shown below.

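import string
import pandas as pd

# One row per letter, one column per position (1 to 5), all counts start at 0
data = dict.fromkeys(string.ascii_lowercase, [0, 0, 0, 0, 0])
df = pd.DataFrame.from_dict(data, orient='index')
df.columns = [1, 2, 3, 4, 5]

for word in words_5_letters:
  for count, letter in enumerate(word):
    # A single .loc[row, column] lookup updates the DataFrame itself;
    # chained indexing (df.loc[letter][count+1]) may write to a temporary copy
    df.loc[letter, count+1] += 1

# Transpose so rows are positions, then normalize each row to sum to 1
df_transposed = df.transpose()
df_normalized = df_transposed.div(df_transposed.sum(axis=1), axis=0)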

The result is shown in the figure below. The most common letters for the first to fifth positions in a word are ‘s’, ‘a’, ‘r’, ‘e’, and ‘s’. This hints at which word makes a good first guess. However, since using ‘s’ twice is not efficient, we can substitute the first letter with the next most frequent letter, ‘c’.

Statistics 2: Letter Occurrence Probability in the Words

Hold up! What about Statistics 1? Yes, we should consider it too, so let us do the calculation! Suppose Statistics 1 and 2 are equally important, so we can measure the most probable word by averaging the two criteria. The function to calculate the score is written below.

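def count_score(word):
  count_crit_1 = 0  # Statistics 1: overall letter probability
  count_crit_2 = 0  # Statistics 2: probability of the letter at this position

  for count, letter in enumerate(word):
    # sorted_letter_prob already holds fractions, so the /100 here makes
    # criterion 1 weigh less than criterion 2 in the average
    count_crit_1 = count_crit_1 + sorted_letter_prob[letter]/100
    count_crit_2 = count_crit_2 + df_normalized.iloc[count][letter]
  # Average the two criteria and scale the result by 100
  return (count_crit_1 + count_crit_2)/2*100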

Making an Informed Guess

To make an initial guess, let us iterate through all the five-letter words and see which one has the highest score. Note that we should exclude words with repeated letters, since they are not efficient guesses. The code is shown below.

def letter_is_not_doubled(check_string):
  count = {}
  condition = True
  # Count how many times each letter appears in the word
  for s in check_string:
    if s in count:
      count[s] += 1
    else:
      count[s] = 1
  # The word passes only if every letter appears exactly once
  for key in count.keys():
    condition = condition and (count[key] == 1)
  return condition

words_score = {}

for word in words_5_letters:
  if letter_is_not_doubled(word):
    words_score[word] = count_score(word)

# Sort the words from highest to lowest score
sorted_words_score = {k: v for k, v in sorted(words_score.items(), key=lambda item: item[1], reverse=True)}

From this calculation, we find that the highest-scoring word is ‘tares’. Thus, we can use this word as our first guess: an informed guess!

Once you play the quiz, you will notice that one guess is not enough, so we need another. For our next guess, we do not want to include letters that already exist in the first guess. Let us define a function to filter out the letters we want to exclude and calculate the scores again.

def not_contain_this_letter(word, not_contain):
    condition = True
    for letter in not_contain:
        condition = condition and (letter not in word)
    return condition
not_contain = 'tares'

word_guess = [word for word in words_5_letters if not_contain_this_letter(word, not_contain)]

words_score_2 = {}

for word in word_guess:
    if letter_is_not_doubled(word):
        words_score_2[word] = count_score(word)
sorted_words_score_2 = {k: v for k, v in sorted(words_score_2.items(), key=lambda item: item[1], reverse=True)}

From this filtering, we find that the best word for the second guess is ‘colin’! Using the same technique and excluding the letters of ‘tares’ and ‘colin’, we find that the best third guess, in case two are not enough, is ‘bumpy’.

There you go! Make ‘tares’ your initial guess, follow it with ‘colin’ for the second one, and use ‘bumpy’ in case you think you need a third.

Now you can play Wordle with a statistically informed initial guess. Good luck!
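The repeated "exclude the letters already used, then score again" step can be wrapped in one loop. The sketch below is only an illustration: `pick_guesses`, the toy word list, and the toy scoring function are all made up here, and in practice you would pass the real word list and `count_score` instead.

```python
def pick_guesses(words, score, n_guesses=3):
    """Greedily pick guesses: each one is the top-scoring word without repeated
    letters that shares no letter with the guesses already chosen."""
    used_letters = set()
    guesses = []
    for _ in range(n_guesses):
        candidates = [
            w for w in words
            if len(set(w)) == len(w) and not (set(w) & used_letters)
        ]
        if not candidates:
            break
        best = max(candidates, key=score)
        guesses.append(best)
        used_letters |= set(best)
    return guesses

# Toy word list and toy score, both made up for illustration
toy_words = ["tares", "colin", "bumpy", "rates", "lolly", "could"]
toy_freq = {"a": 5, "e": 5, "s": 4, "r": 3, "t": 3, "o": 2, "i": 2}

def toy_score(word):
    return sum(toy_freq.get(ch, 1) for ch in word)

print(pick_guesses(toy_words, toy_score))  # ['tares', 'colin', 'bumpy']
```

With the real data, this would be called as `pick_guesses(words_5_letters, count_score)`.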

Personal Note on Learning New Things: Case Study on Chess

I started playing chess when I was a kid, but not regularly. Years later, after watching The Queen’s Gambit on Netflix in January 2021, I began playing chess again. Because of the pandemic, I limited my time in crowded places outside working hours, and I found chess to be a fun activity.

How the Journey Goes

In my first games on chess.com, I used “fazlurnu” as my username. After several games, my rating was below 800, which is around the average rating on chess.com for rapid chess (10-minute time format). I thought I could improve, so I kept on playing.

Rapid Time Format Rating Distribution on Chess.com

Two months later, in March 2021, my rating reached 1200. After that, I took a break from chess because it was too time-consuming and had started to distract me from important things. I closed my previous account.

On June 1st, 2021, I created a new one with the username toko_material. Within 5 months of opening that account, I reached 1636 in rapid chess, which is higher than 98% of all players, or in other words, the top 2% on chess.com.

My Rapid Chess Rating as of November 9th, 2021

I realize I took some important approaches on my chess journey that allowed me to go from average at the beginning to the top 2% at this moment. I am sure they can serve as a reflection on how to study or master new things.

Here are some important notes.

Know the Basics, Know the Theories

When I started at the average rating, I had no clue about chess theory. I only knew the rules, like how the pieces move, what a checkmate is, etc. As my rating showed, if you only know this, you are an average player.

Luckily, chess.com provides learning features where you can study chess theory, from the opening to the endgame. One idea that opened my mind is that the pieces do not all have the same value. With this, you know when to trade certain pieces. You can even trade two knights and a bishop for a queen without ending up in a losing position.

The Value of the Chess Pieces. During the beginning of our chess… | by The  Chess King Shop | Medium
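The trade above can be checked with simple arithmetic. The point values below are the commonly cited ones (pawn 1, knight 3, bishop 3, rook 5, queen 9); different books and engines use slightly different numbers, so treat this as a rough sketch rather than an exact evaluation.

```python
# Commonly cited point values; exact numbers vary between sources
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material(pieces):
    """Sum the point values of a list of piece names."""
    return sum(PIECE_VALUES[p] for p in pieces)

given = ["knight", "knight", "bishop"]
received = ["queen"]
print(material(given), material(received))  # 9 9
```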

Another piece of theory is the relative advantage of the knight and the bishop. The two pieces may look equal, but in certain situations one is worth more than the other. Take a look at the closed game below: the bishops are nearly useless since their movement is restricted by the pawns. In this situation, a knight is better than a bishop.

Open vs Closed Chess Game - The Chess Website
A closed game

There is much more theory in chess, and learning it gives you a big advantage over those who don’t. Of course, you can discover the theory by yourself through months or years of experience, but you should avoid reinventing the wheel!

Learn from the Professionals

There are millions of chess players in the world, and some of them are at the very top of the rankings. Luckily, in this age, you can watch their games and learn from them!

Personally, I prefer watching the Agadmator Chess Channel on YouTube. Antonio, the host, is a knowledgeable chess player with sharp analysis and fun jokes. He analyzes world-class players’ moves with the help of chess engines.

In the picture below, Magnus Carlsen, the world chess champion, sacrificed his queen for two bishops and a knight against Anish Giri. In the end, with excellent piece coordination, Carlsen won the game. This supports the theory that a queen is worth about as much as those three pieces and that you can win the game after such a queen sacrifice.

Magnus Carlsen vs Anish Giri match review by Agadmator

I have watched countless Agadmator videos, and I think they have helped improve my reasoning when making moves in my own games.

Practice, Practice, and Practice

It is no secret that practice improves our performance; as the saying goes, “practice makes perfect.” Repetition helps me learn, evaluate, and understand chess more.

I have been practicing chess almost every day since June 2021, and I think it is one of the reasons for my improvement. Of course there were times when I lost games, but with evaluation and a short break, I could get back on track.

With practice, I think I have developed a kind of memory. I do not need to think a lot, yet I know that the move I am about to make is a good one and will work. Analyzing patterns or choosing which trade to make also becomes easier.

This chess journey has taught me that mastering a new thing is always possible. I went from an average player to the top 2% on chess.com within 9 months, something I did not expect when I started playing chess regularly.

I am sure that, with a good strategy, anyone can learn new things and master them within a relatively short period.

As the saying goes, “Never stop learning, because life never stops teaching”.

Mourning 1,000 COVID-19 Victims in Central Sulawesi

Palu, August 14, 2021

On March 26, 2020, the first COVID-19 case in Central Sulawesi was detected, in the city of Palu. The patient was a traveler who was probably infected before returning to Palu. More than 500 days later, COVID-19 is still being detected in Central Sulawesi, and in an even more dangerous state.

As of Saturday, August 14, 2021, more than 1,000 deaths caused by COVID-19 have been recorded in Central Sulawesi. There is an important fact we need to reflect on: 614 deaths, or 60.4% of all deaths, occurred in the last 1.5 months. This means that in Central Sulawesi, 1 person dies of COVID-19 roughly every 2 hours. Perhaps our own relatives are already among the victims.
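The two figures in this paragraph can be reproduced with a few lines of arithmetic. The total of 1017 deaths is the count reported in this article; treating "1.5 months" as 45 days is my assumption for this sketch.

```python
total_deaths = 1017    # total COVID-19 deaths reported in this article
recent_deaths = 614    # deaths in the last 1.5 months
recent_days = 45       # "1.5 months" taken as roughly 45 days (an assumption)

share = recent_deaths / total_deaths
hours_per_death = recent_days * 24 / recent_deaths

print(f"{share:.1%}")            # 60.4%
print(f"{hours_per_death:.2f}")  # 1.76
```

So the rate works out to one death about every 1.8 hours, which rounds to the "every 2 hours" figure.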

Of the 1,017 COVID-19-related deaths, the largest number comes from Banggai Regency with 218, followed by Palu City with 165. Banggai Laut, the regency farthest from Palu, has recorded 19 deaths from the virus. No city or regency is free of COVID-19; the disease has spread in every direction, even to remote areas.

More Dire Than Java

The Ministry of Health website lists three community transmission categories for measuring the severity of the spread of COVID-19: Confirmed Cases, Hospital Admissions, and Deaths. The Ministry of Health calculates each category per 100 thousand population per week. This lets us see how “dense” COVID-19 is within the community of a given region.
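The per 100 thousand population per week normalization is a simple ratio. The sketch below uses made-up numbers purely to show the calculation; the function name is hypothetical and the inputs are not real figures for any province.

```python
def weekly_rate_per_100k(weekly_count, population):
    """Weekly count (cases, admissions, or deaths) per 100 thousand population."""
    return weekly_count * 100_000 / population

# Made-up inputs for illustration only
print(weekly_rate_per_100k(6_000, 3_000_000))  # 200.0
```

Because the result is scaled by population, a small province and a large one can be compared directly.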

As of August 11, 2021, Central Sulawesi recorded 203.76 confirmed positive cases per 100 thousand population per week. This is far higher than the peak of COVID-19 cases in West Java, at 128.9, or in Banten, at 189.54 confirmed positive cases per 100 thousand population per week. In this category, the spread of COVID-19 in Central Sulawesi is also “denser” than in Central Java and East Java.

With fewer health facilities than those provinces, the burden on our community health centers (puskesmas), hospitals, and health workers is naturally heavier. Clearly, the pandemic in our province is more dire than in Java. We are in a state of crisis.

On August 13, 2021, Central Sulawesi was among the 6 provinces with the most confirmed cases per 100 thousand population per week. We top the table on the island of Sulawesi, making ours the most dangerous region among its provinces.

This crisis is plainly visible in our communities. In Banggai, for example, on August 4, 2021, 12 bodies were “queuing” in a single day for COVID-19 burial handling. In Palu, on the same day, dr. Rochmat, head of the Palu City COVID-19 Task Force, reported that 7 COVID-19 patients had died while self-isolating that week.

Let us reflect once more: we are in a state of crisis. The COVID-19 situation in Central Sulawesi is far more dire than in Java.


Effective from August 10 to August 23, 2021, three areas in Central Sulawesi must undergo Level IV Enforcement of Community Activity Restrictions (Pemberlakuan Pembatasan Kegiatan Masyarakat, PPKM). The three areas are Palu, Banggai, and Poso, with more than 1,000 active COVID-19 cases. This is of course a government effort to reduce the spread of the virus in the community.

People are not allowed to hold events that draw crowds, including parties and wedding receptions. Activities in public places are also limited in the number of people and in operating hours. Furthermore, according to the governor’s instruction, whenever a crowd does form in the community, the 5M health protocol must be followed.

Rules and reality differ, though: several times we came across reports of crowds forming in the community. For example, the Mayor of Palu encouraged residents to do communal cleanup work (kerja bakti) in pursuit of the Adipura award. We know there is no communal work that does not draw a crowd, and it cannot be done virtually. Instead of working to reduce the spread of COVID-19 in the community, this call for communal work opens a new gap for the virus.

Let us reflect once more: we are in a state of crisis. Stop holding mass gatherings and push down the spread of COVID-19!

Fight COVID-19!

Volunteer networks initiated by residents have begun to form. One is Roa Jaga Roa, a movement by journalists in Palu that aims to help patients in self-isolation (isoman). Other names we can mention are JagaPalu, Sigi Mosijagai, and Relawan Nagasi, run by the Palu City Government.

These volunteer movements focus on caring for patients already confirmed to have COVID-19. Like a leaking roof in heavy rain, they help the government “dry” the floor to prevent loss of life. The volunteers dedicate energy, time, and money in the name of humanity.

Fighting COVID-19 is not enough if we only dry the wet floor. The problem upstream must be addressed by fixing the leaking roof. The rate of spread must be pushed down so that fewer people are infected and die. Fixing the leaking roof can only be done by the government, at both the provincial and the regency/city level, which has the resources and the authority. The instruments available are already laid out in the PPKM Level IV directives, which must then be adapted to the needs of each region.

The public and the government must work hand in hand to suppress the spread of COVID-19. The government is obliged to improve the quality and quantity of testing, tracing, and treatment. The public, meanwhile, must uphold the 5M health protocol: wearing masks, washing hands, keeping distance, avoiding crowds, and reducing mobility. In carrying out their roles, the public and the government need to keep reminding each other so that the handling of this crisis can go well.

Note: The same article has been published in Harian Mercusuar, online and in print, with minor changes.

Data Driven Decision Making: Booking a Flight Amid Corona Virus Outbreak

On February 20th, an outbreak of Corona virus began in Italy. It is believed to have started in the Lombardy region. Since then, the number of cases has grown exponentially in northern Italy. As part of the European Union, Italy shares a “seamless” border, both by land and by air, with other European countries, including France and Germany. This put the neighboring countries to the test: could they contain the spread of the virus?

I’m a student living in Toulouse, France. As part of the curriculum, I have to do an internship at the end of my studies. Being interested in drones, I chose Japan as the country where I will spend 6 months as an intern, starting on the 6th of April. I planned to leave France at the end of March since I had a project presentation on the 26th. However, Corona virus cases began to spread rapidly in France after the outbreak in Italy. This left me with uncertainty: would it be fine, in my case, to stay in France until the end of March?

Toulouse from Above

If France sees a major spike in Corona virus cases, I might have to be quarantined when I arrive in another country. The quarantine period may last up to 14 days. Taking this into consideration, if I leave France on the 27th of March and fly directly to Japan, I won’t be able to start my internship on time. It would be delayed to the following Monday, the 13th of April. Clearly, I should leave as soon as possible and apply for the visa in Indonesia. But when should I leave?

I asked my research project supervisor about the situation. Amid the outbreak, it would be hard to stay until the end of the month. I proposed holding the final presentation online, and he agreed. I wanted to meet him once more before leaving, so I booked a flight to Indonesia for Tuesday, March 17th.

However, things did not go well. I heard from a friend that India would quarantine anyone coming from Europe starting Friday, March 13th. Trump also announced that the US was imposing a travel ban on the continent. I also read in the news that Indonesia banned all flights to and from Italy when the cases there hit 9000.

I have to make a calculation, like, literally!

My calculation should tell me when flights from France, or Europe, will be banned by the Indonesian government, assuming it applies the same restriction as for Italy. I took the number of Covid-19 cases from a website that provides the total cases as well as the new cases every day. I made a logarithmic regression on the number of cases and used it to predict the number of cases on the 17th of March. The result is shown below!

Logarithmic Regression of Total Cases (N)

The result is startling! By the 17th of March, there would be more than 20 thousand cases in France. However, something does not feel right. The new case count is about 500 on March 11th, but the model predicts 1800 on March 12th, and then 1600 new cases on March 13th. The jump from 500 to 1800 is a wild estimate, and the drop to 1600 the next day does not make sense. Something must be wrong with the model, even though the correlation (R²) is high.

I recalled the comment section of a 3Blue1Brown video, where someone mentioned that the “number of cases” does not grow exponentially; it is the “number of new cases” that follows this trend. Therefore, I remodeled assuming exponential growth of the “number of new cases” and got the following result.

Logarithmic Regression of New Cases (delta N)

The regression result seems reasonable: there is no unreasonable spike from day to day. Based on this, I have to reschedule my flight! March 17th is too late to go back to Indonesia. The latest will be this Saturday, March 14th, as the Indonesian government will probably be more prudent and ban flights when the number hits 5 thousand. Another consideration is that there is only one direct flight from Europe to Indonesia. If other countries impose travel bans, it might be harder to reach Indonesia.

Soon after this calculation, I rescheduled my flight for Saturday afternoon. Unfortunately, the night before the flight, the airline canceled all its flights for European countries. Soon after hearing the news, I booked another flight to Indonesia with Singapore Airlines. Fortunately, Singapore’s travel ban only started at 23.59 on March 15th, so I could fly to Indonesia and apply for my visa to Japan when I arrived.

Had I waited until March 17th to go back to Indonesia, the trip would have been extremely difficult, as only limited flights were operating and the French government had put the country into lockdown. Thanks to data-driven decision making, I could go back to Indonesia in a less complicated situation.
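The idea of fitting the new cases and then looking for the day a threshold is crossed can be sketched as below. This is not the spreadsheet or script I actually used: the function names are hypothetical, the data is synthetic (new cases growing 25% per day), and the fit is a plain least-squares line through the logarithm of the daily new cases.

```python
import math

def fit_exponential(days, new_cases):
    """Least-squares line through (day, log(new_cases)).
    Returns (a, b) such that new_cases is approximately a * exp(b * day)."""
    logs = [math.log(c) for c in new_cases]
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, logs))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def project_totals(current_total, a, b, future_days):
    """Add the predicted new cases of each future day to the running total."""
    totals = []
    running = current_total
    for day in future_days:
        running += a * math.exp(b * day)
        totals.append(running)
    return totals

def first_day_reaching(totals, future_days, threshold):
    """First future day whose projected total reaches the threshold, or None."""
    for day, total in zip(future_days, totals):
        if total >= threshold:
            return day
    return None

# Synthetic data, not the real French figures
days = list(range(10))
new_cases = [50 * 1.25 ** d for d in days]

a, b = fit_exponential(days, new_cases)
future_days = list(range(10, 20))
totals = project_totals(sum(new_cases), a, b, future_days)
ban_day = first_day_reaching(totals, future_days, threshold=5000)
print(ban_day)
```

Because the total is built by summing predicted new cases, the projected daily increase grows smoothly and cannot jump up and then drop the way the first model did.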

Moral of the story: in hard times, whenever you want to make a decision, use data! It gives you an estimate of the future, much better than assuming.

Autonomous Mobile Robot on edX: a Review

I completed my first online course several days ago. I was skeptical about this new medium of learning since it does not allow direct contact with the professor. However, I finished the course, and in this post I’m going to share my experience with you.

The online course I took is Autonomous Mobile Robot on edX. It is provided by professors from ETH Zurich, one of the best universities in the world, especially for autonomous systems. To get a certificate, you have to pay an upgrade fee and complete the course within a certain time range.

Source: edX

The course consists of weekly overviews, lecture segments, problem sets, and quizzes. For every new topic, there is a weekly overview of what to expect. Next come videos of 8 to 15 minutes each, showing slides with handwritten notes on them. Lastly, to validate their understanding, students have to answer questions from the problem sets and quizzes.

Discussed Topics

I learnt some new concepts from this course. First, I was introduced to “Locomotion Concepts” and “Mobile Robot Kinematics”. In these topics, I learnt how to model dynamic systems such as legged and wheeled robots. I had to solve problems related to the dynamic modelling as well as its control using inverse and forward kinematics.
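To give a flavor of forward and inverse kinematics for a wheeled robot, here is a minimal sketch of the standard textbook differential-drive model. It is not taken from the course material; the function names and the simple Euler integration step are my own choices for illustration.

```python
import math

def diff_drive_step(x, y, theta, v_left, v_right, wheel_base, dt):
    """Forward kinematics: advance the pose (x, y, theta) by one Euler step."""
    v = (v_right + v_left) / 2.0             # forward speed of the robot center
    omega = (v_right - v_left) / wheel_base  # turning rate in rad/s
    x += v * math.cos(theta) * dt
    y += v * math.sin(theta) * dt
    theta += omega * dt
    return x, y, theta

def wheel_speeds(v, omega, wheel_base):
    """Inverse kinematics: wheel speeds that produce the desired v and omega."""
    v_left = v - omega * wheel_base / 2.0
    v_right = v + omega * wheel_base / 2.0
    return v_left, v_right

# Equal wheel speeds drive the robot straight ahead
print(diff_drive_step(0.0, 0.0, 0.0, 1.0, 1.0, wheel_base=0.5, dt=1.0))  # (1.0, 0.0, 0.0)
```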

The next topic is “Perception”, which introduced me to sensors and computer vision. Essentially, this is how your robot “sees” the environment. This topic is very interesting because it showed me how sensors, especially cameras, work and how to extract meaningful information from these devices. However, the problem set does not go deep into the technical aspects; it mostly asks conceptual questions about computer vision methods. Therefore, to understand the topic deeply, you need to practice a lot by yourself.

After perception comes “Localization”. This topic taught me how to find the position of a robot in a known environment. They teach methods such as Markov localization and Extended Kalman Filter (EKF) localization. From this topic, I understood localization methods for both discretized and continuous environments, as well as their advantages and disadvantages.

In contrast to localization, the next topic, “Simultaneous Localization and Mapping (SLAM)”, is a method to localize a robot in an unknown environment. This is one of the most critical parts of today’s robotics: most robots are deployed in unexplored places, and one has to keep track of them. The course teaches the basics of SLAM, namely the graphical representation of robot states and environment features. It then goes further into the optimization of SLAM and elaborates on EKF SLAM.

Surprisingly, after completing this topic, I had an interview with a Japanese drone company. They asked me a lot about drones and their automation, including SLAM. After I explained the EKF SLAM that I had studied, they said they do not use it in their company and briefly explained what they were working on. This opened my mind: there are a lot of SLAM methods out there, and this course covers only a small part of them.
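As a small illustration of localization in a discretized environment, here is a sketch of Markov localization on a one-dimensional circular corridor. It is not from the course: the map, the sensor probabilities `p_hit` and `p_miss`, and the assumption that the robot moves exactly one cell are all made up for this example.

```python
def sense(belief, world, measurement, p_hit=0.6, p_miss=0.2):
    """Measurement update: reweight each cell by how well it explains the reading."""
    weighted = [
        b * (p_hit if cell == measurement else p_miss)
        for b, cell in zip(belief, world)
    ]
    total = sum(weighted)
    return [w / total for w in weighted]

def move(belief, step):
    """Motion update for an exact shift of `step` cells on a circular corridor."""
    n = len(belief)
    return [belief[(i - step) % n] for i in range(n)]

world = ["door", "wall", "door", "wall", "wall"]
belief = [1 / len(world)] * len(world)  # start fully uncertain

belief = sense(belief, world, "door")   # robot sees a door
belief = move(belief, 1)                # robot moves one cell to the right
belief = sense(belief, world, "wall")   # robot sees a wall

print([round(b, 3) for b in belief])
```

After the three updates, the belief concentrates on the two wall cells that sit directly to the right of a door, which matches the sequence of readings.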

The last topic of this course is “Planning”, which teaches how a robot can navigate from a start to a specified goal. For this topic, they provided lectures on potential field planning as well as A*. In the problem set, one has to solve a planning exercise using both methods. However, the planning problem is not integrated with the “actuation” part. In my opinion, covering that integration would help students understand more deeply how it works in real life.
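For readers who have not seen A* before, here is a minimal sketch on a small grid. It is my own illustration, not the course exercise: the grid, the 4-connected moves with unit cost, and the Manhattan-distance heuristic are assumptions chosen to keep the example short.

```python
import heapq

def astar(grid, start, goal):
    """Shortest 4-connected path on a grid of 0 (free) and 1 (obstacle) cells.
    Returns the list of cells from start to goal, or None if unreachable."""
    def h(cell):
        # Manhattan distance never overestimates for unit-cost 4-connected moves
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    rows, cols = len(grid), len(grid[0])
    open_heap = [(h(start), 0, start)]  # entries are (cost + heuristic, cost, cell)
    came_from = {}
    best_cost = {start: 0}

    while open_heap:
        _, cost, current = heapq.heappop(open_heap)
        if current == goal:
            # Walk back through the parents to rebuild the path
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if cost > best_cost[current]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = current
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    came_from[(nr, nc)] = current
                    heapq.heappush(open_heap, (new_cost + h((nr, nc)), new_cost, (nr, nc)))
    return None

grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
path = astar(grid, (0, 0), (2, 0))
print(path)
```

The planner only outputs a sequence of cells; turning that into wheel commands is the “actuation” part that the problem set leaves out.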


In my opinion, this is a good starting point for learning about robotics. One can build basic knowledge and intuition by finishing this course. The problem sets are really helpful for understanding the concepts and validating your knowledge. There are also two quizzes, which are harder than the problem sets.

However, some questions in the problem sets are not well defined, which makes them harder to solve because you are not certain what is expected or what format your answer should take. Wrong formatting leads to a wrong answer.

The course is also not well maintained; one may only get an answer to a question after a few weeks. This makes it difficult for students to keep progressing. I assume this is because the course launched a long time ago and not many people are taking it right now.

Overall, as long as one has the required basics, such as mathematics and basic programming, it is a good introduction to robotics. Oh, and here’s my certificate from the online course! Check out the photo below or this link!