How to Predict the Outcome of a Chess Game?

A data-driven approach to predicting the outcome of a chess game

I have always wondered: what is the best strategy to win a chess game? Although I am quite sure there is no single strategy that wins every game, we can still explore some of the strategies and tactics from past games played by professionals that favor the player with the black pieces, the player with the white pieces, or both.

Chess is a competitive, strategy-based board game played between two players, either with a physical chess set or online. If you go by the rules, chess is a fairly simple game: all you need to do is understand the role and movement of each piece and play accordingly. But knowing the rules alone will not help you against a good player in a tournament or competition.

Chess is a game of patterns, and building sound plans and strategies is the most important skill for winning. A chess player constantly analyzes the patterns on the board and forms effective plans and strategies for the possible continuations.

Chess literature tells us that a standard game between two professionals can be divided into three phases: the ‘opening’, the ‘middlegame’ and the ‘endgame’, each with its own importance.

Hence, to win a game of chess, it is important to understand and analyze different styles of play and to develop strategies and game plans suited to each phase of the game.

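A game of chess has three possible outcomes: a win for the player with the black pieces, a win for the player with the white pieces, or a draw. A win for either player can be obtained in three ways: ‘checkmate’, ‘resign’ (the opponent resigns) and ‘out of time’ (the opponent's clock runs out). A draw can be obtained in two ways: ‘out of time’ (a player's clock runs out, but the opponent does not have enough material to deliver checkmate) and ‘draw’ (through stalemate, mutual agreement, threefold repetition etc.).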

Figure 1: Major ways of obtaining each outcome of a game

From the heatmap above, we can see that the pattern of wins is the same for both players: most wins were obtained through ‘resign’ (the opponent resigned), followed by ‘checkmate’ and ‘out of time’. Most draws happened through the usual ‘draw’ (stalemate, mutual agreement, the fifty-move rule etc.), and draws due to ‘out of time’ were very rare.
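A heatmap like the one in Figure 1 is usually built from a cross-tabulation of outcome against the way the game ended. The stdlib sketch below assumes each game is a record with fields named `winner` ('white', 'black' or 'draw') and `victory_status` ('resign', 'mate', 'outoftime' or 'draw'); these field names, their values and the six sample games are assumptions for illustration. Shares are normalized within each outcome, matching the "out of all the wins obtained" reading above.

```python
from collections import Counter, defaultdict

# Hypothetical records; field names and values are assumptions for illustration.
games = [
    {"winner": "white", "victory_status": "resign"},
    {"winner": "white", "victory_status": "mate"},
    {"winner": "white", "victory_status": "resign"},
    {"winner": "black", "victory_status": "outoftime"},
    {"winner": "black", "victory_status": "resign"},
    {"winner": "draw", "victory_status": "draw"},
]


def outcome_by_termination(records):
    """Return {outcome: {termination: share}}, normalized within each outcome."""
    counts = defaultdict(Counter)
    for g in records:
        counts[g["winner"]][g["victory_status"]] += 1
    table = {}
    for outcome, by_status in counts.items():
        total = sum(by_status.values())
        table[outcome] = {status: n / total for status, n in by_status.items()}
    return table


for outcome, shares in outcome_by_termination(games).items():
    # List the most common way of reaching this outcome first
    ranked = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
    print(outcome, ranked)
```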

In chess tournaments, games are generally played under a time control. Each player is given a fixed amount of time for the game, and a player whose clock runs out loses, unless the opponent lacks the material to deliver checkmate, in which case the game is drawn.

Time controls are usually written in ‘M+S’ format, where ‘M’ is the number of minutes each player starts with and ‘S’ is the number of seconds added to that player's clock for every move they make. For example, with a ‘10+15’ time control each player starts with 10 minutes, and 15 seconds are added to a player's remaining time each time that player makes a move. In the rest of this article, these codes are referred to as ‘increment codes’.
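The ‘M+S’ notation is easy to work with programmatically. The helpers below are hypothetical (not from the original analysis): one splits a code such as ‘10+15’ into base minutes and per-move increment, the other estimates the total clock time a player has been given after a number of their own moves.

```python
def parse_increment_code(code: str) -> tuple[int, int]:
    """Split an 'M+S' time control such as '10+15' into (minutes, seconds per move)."""
    minutes, increment = code.split("+")
    return int(minutes), int(increment)


def total_clock_seconds(code: str, moves_played: int) -> int:
    """Base time plus the increment earned for each move the player has made."""
    minutes, increment = parse_increment_code(code)
    return minutes * 60 + increment * moves_played


print(parse_increment_code("10+15"))     # (10, 15)
print(total_clock_seconds("10+15", 40))  # 600 + 40 * 15 = 1200
print(total_clock_seconds("10+0", 40))   # no increment: 600
```

With a ‘10+0’ code the clock never grows, no matter how many moves are played.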

a) What are the top 10 most used increment codes?

Figure 2: Top 10 Most Used Increment Codes

From the bar plot above, we can see that the ‘10+0’ increment code was used most often in the dataset, followed by ‘15+0’ and ‘15+15’. The least used of the top 10 increment codes is ‘30+0’.

Out of all 20058 games, the ‘10+0’ increment code was used in 7721, i.e., about 38.5 percent of all games played, and it was used more often than the other nine top increment codes combined.
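Both figures in the paragraph above come from a simple frequency count. The sketch assumes the increment codes are available as a plain list of strings (a hypothetical input); it checks the share quoted above (7721 / 20058) and whether the top code outnumbers the rest of the top 10 combined.

```python
from collections import Counter


def top_codes_summary(codes, k=10):
    """Return the k most common codes, the top code's share, and whether it beats the rest of the top k."""
    top = Counter(codes).most_common(k)
    top_count = top[0][1]
    rest_of_top = sum(n for _, n in top[1:])
    share = top_count / len(codes)
    return top, share, top_count > rest_of_top


# Figures quoted in the text
print(round(7721 / 20058, 3))  # 0.385

# Toy list of codes, for illustration only
sample = ["10+0"] * 6 + ["15+0"] * 2 + ["15+15", "30+0"]
top, share, dominates = top_codes_summary(sample)
print(top[:3], share, dominates)
```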

b) Which outcome dominates each of the top 10 most used increment codes?

Figure 3: Which Outcome is Dominating Each of the Top 10 Increment Codes

Here we can see that the pattern is the same across all top 10 increment codes: white players have won the most games, closely followed by black players, and draws are very hard to come by.
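Finding the dominant outcome per increment code amounts to grouping games by code and counting winners. The sketch below uses hypothetical (increment code, winner) pairs in place of the real data; the same grouping works for the openings discussed later by using the opening name as the key.

```python
from collections import Counter, defaultdict

# Hypothetical (increment_code, winner) pairs; real data would come from the dataset.
games = [
    ("10+0", "white"), ("10+0", "black"), ("10+0", "white"),
    ("15+0", "black"), ("15+0", "black"), ("15+0", "white"),
    ("15+15", "draw"), ("15+15", "white"), ("15+15", "white"),
]


def dominant_outcome(pairs):
    """Map each increment code to its most frequent outcome and that outcome's share."""
    by_code = defaultdict(Counter)
    for code, winner in pairs:
        by_code[code][winner] += 1
    result = {}
    for code, counts in by_code.items():
        outcome, n = counts.most_common(1)[0]
        result[code] = (outcome, n / sum(counts.values()))
    return result


for code, (outcome, share) in dominant_outcome(games).items():
    print(f"{code}: {outcome} ({share:.0%})")
```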

In a time-controlled game, speed, along with accuracy in anticipating the opponent's next move, is of utmost importance. While the clock is ticking, a player has to quickly analyze the opponent's plans and strategies and make a move to counter them in a very short amount of time. If a player spends too much time on a move, he or she may run out of time sooner than the opponent.

Figure 4: Average Number of Turns Played in Each of the Top 10 Increment Codes

From the point plot above, we can see that the ‘8+0’ increment code has the highest average number of turns, followed by ‘10+0’ and ‘15+0’.

In other words, ‘8+0’ and ‘10+0’, which add no time per move, have the highest average number of turns. On the other hand, ‘30+0’, which has the largest base time among the top 10 increment codes, has the lowest average number of turns, along with ‘5+8’.
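The point plot in Figure 4 reports the mean number of turns per increment code. A minimal sketch of that aggregation, using made-up (increment code, turns) pairs rather than the dataset's values:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (increment_code, turns) pairs for illustration only.
games = [("8+0", 70), ("8+0", 64), ("10+0", 62), ("10+0", 58), ("30+0", 40), ("5+8", 44)]


def average_turns(pairs):
    """Average number of turns per increment code, sorted from highest to lowest."""
    turns_by_code = defaultdict(list)
    for code, turns in pairs:
        turns_by_code[code].append(turns)
    averages = {code: mean(t) for code, t in turns_by_code.items()}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


print(average_turns(games))
```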

This suggests that in shorter games, players may sometimes move a piece just to avoid running out of time, and that move may not be part of an actual plan. In short games, if a player's plan fails or backfires, there is little time to come up with a counterattacking plan for the new situation.

Hence, to keep up with the time limit, a player may resort to some redundant moves until he or she figures something out. This may explain the higher average number of turns in the shorter games.

Games with longer time controls, on the other hand, leave more time to think and to come up with counterattacking plans and strategies, which may be why some of them record the lowest average number of turns.

Certain chess openings, if used effectively, favor one particular player (black pieces or white pieces) over the other.

Some openings are so aggressive that a single mistake by the opponent can end the game on the very next move. A lack of understanding of an opening, or a lapse in awareness, can lose the game within a few moves, while still in the opening phase.

a) What are the top 10 most used openings?

Figure 5: Top 10 Most Used Opening Games

Here we can see that the ‘Van’t Kruijs Opening’ was used most often, followed by the ‘Sicilian Defense’ and the ‘Sicilian Defense: Bowdler Attack’. Only the first two openings were used more than 300 times; all the others were used fewer than 300 times.

b) Which outcome of the game is dominating each of the top 10 openings?

Now let’s see which openings favor which outcome.

Figure 6: Outcomes Dominating Each of the Top 10 Openings

Here we can see that, out of the top 10 most used openings, white players won most of the games in only 4 openings, viz. 1, 2, 3 and 9 (numbering the openings from top to bottom on the y-axis), whereas black players won most of the games in 6 openings, viz. 4, 5, 6, 7, 8 and 10. Games ending in a draw are rare. Openings 7 and 8 show a close contest between black and white.

The dataset has 8 numeric and 8 categorical variables. Given these variables, let’s see how well the outcome of a game can be predicted.

a) Model Selection

To select a suitable model, several classifiers were trained using only the numeric variables. Each classifier was evaluated with Scikit-Learn's ‘cross_val_predict’ function, which splits the training set into folds, fits the classifier on all but one fold and predicts the held-out fold, so every training example receives a prediction from a model that did not see it during fitting.
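The article does not list which classifiers were compared, so the three candidates below, and the synthetic data from `make_classification` standing in for the numeric variables, are assumptions. The sketch shows the `cross_val_predict` pattern described above: each classifier produces out-of-fold predictions for the whole training set, which are then scored.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for numeric features (e.g. ratings, turns) and a 3-class target.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_classes=3, random_state=0
)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "k_nearest_neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in candidates.items():
    # Each sample is predicted by a model fit on the other folds.
    y_pred = cross_val_predict(model, X, y, cv=5)
    scores[name] = accuracy_score(y, y_pred)

for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```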

After evaluating the performance of all classifiers, the Random Forest Classifier was selected for further predictions, partly because its hyperparameters are convenient to tune. The performance of the various classifiers on the validation sets is shown below.

Table 1: Evaluation Metrics of Different Classifiers on Validation Sets

Evaluation Metrics:

Evaluation metrics used to measure the performance of each classifier are listed below.

  1. Accuracy: the ratio of correct predictions to all predictions, i.e., the proportion of predictions that are correct.
  2. Precision: the ratio of true positives (cases the classifier correctly predicts as positive) to all cases the classifier predicts as positive. It gives the proportion of positive predictions that are correct.
  3. Recall: also called the ‘true positive rate’ or ‘sensitivity’. It is the ratio of true positives to all actual positives, i.e., the fraction of actual positives that the classifier correctly identifies.
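With three possible outcomes, precision and recall are computed for one class at a time (treating that outcome as "positive") and then averaged; the article does not say which averaging its figures use. The stdlib sketch below computes the three metrics by hand on six made-up predictions so the definitions above can be checked; scikit-learn's `accuracy_score`, `precision_score` and `recall_score` (with an `average` argument for multi-class data) compute the same quantities.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall(y_true, y_pred, positive):
    """Precision and recall when `positive` is treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    predicted_pos = sum(p == positive for _, p in pairs)
    actual_pos = sum(t == positive for t, _ in pairs)
    precision = tp / predicted_pos if predicted_pos else 0.0
    recall = tp / actual_pos if actual_pos else 0.0
    return precision, recall


y_true = ["white", "black", "white", "draw", "black", "white"]
y_pred = ["white", "white", "white", "draw", "black", "black"]

print(f"accuracy: {accuracy(y_true, y_pred):.3f}")  # 4 of 6 correct -> 0.667
for label in ("white", "black", "draw"):
    p, r = precision_recall(y_true, y_pred, label)
    print(f"{label}: precision={p:.3f} recall={r:.3f}")
```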

b) Handling Categorical Data:

As mentioned earlier, the dataset contains 8 numeric and 8 categorical variables. One-hot encoding (dummy variables) was used to convert the categorical variables into numeric ones. Label encoding was avoided because it assigns each category an arbitrary integer, which imposes an order on categories that have none; with several hundred levels in some variables, the classifier could mistake that artificial order for meaningful structure.

Following the rule of thumb of at least 10 observations per column, the variables ‘id’, ‘white_id’, ‘black_id’ and ‘moves’ were removed from the dataset. These variables have a very large number of distinct values (almost every game has its own ‘id’ and move sequence), so one-hot encoding them would have produced more columns than rows. After removing them, the numeric and encoded categorical variables were merged into a single dataset.
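The rule of thumb can be checked before encoding by counting how many columns one-hot encoding would produce: one per numeric variable plus one per distinct level of each categorical variable. The helper names and the 40 toy rows below are hypothetical.

```python
def one_hot_width(rows, numeric_cols, categorical_cols):
    """Columns after one-hot encoding: one per numeric column plus one per category level."""
    width = len(numeric_cols)
    for col in categorical_cols:
        width += len({row[col] for row in rows})
    return width


def satisfies_rule_of_ten(rows, numeric_cols, categorical_cols):
    """Rule of thumb from the text: at least 10 observations per column."""
    return len(rows) >= 10 * one_hot_width(rows, numeric_cols, categorical_cols)


# Hypothetical rows: 'id' is unique per game, 'increment_code' has only two levels.
rows = [
    {"id": f"g{i}", "increment_code": "10+0" if i % 2 else "15+0", "turns": 40 + i}
    for i in range(40)
]

print(one_hot_width(rows, ["turns"], ["id", "increment_code"]))          # 1 + 40 + 2 = 43
print(satisfies_rule_of_ten(rows, ["turns"], ["id", "increment_code"]))  # False
print(satisfies_rule_of_ten(rows, ["turns"], ["increment_code"]))        # 40 >= 30 -> True
```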

After handling the categorical variables, the Random Forest Classifier (RFC) was fit to the dataset with both numeric and categorical variables and evaluated on validation sets created from the training set. The accuracy of the classifier increased from 63% to 66%. Other metrics are shown below.
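One common way to combine both kinds of variables in scikit-learn is a `ColumnTransformer` inside a `Pipeline`, so that encoding is redone within every cross-validation fold. This is a sketch on a synthetic frame: ‘white_rating’, ‘black_rating’ and ‘turns’ come from the text, while ‘increment_code’, ‘opening_name’, ‘winner’ and the toy target rule are assumptions, and the article's actual preprocessing may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

rng = np.random.default_rng(0)
n = 300

# Synthetic frame; 'increment_code', 'opening_name' and 'winner' are assumed names.
df = pd.DataFrame({
    "white_rating": rng.integers(800, 2400, n),
    "black_rating": rng.integers(800, 2400, n),
    "turns": rng.integers(2, 150, n),
    "increment_code": rng.choice(["10+0", "15+0", "15+15"], n),
    "opening_name": rng.choice(["Sicilian Defense", "Van't Kruijs Opening"], n),
})
# Toy target: the higher-rated side wins (no draws), purely for illustration.
df["winner"] = np.where(df["white_rating"] >= df["black_rating"], "white", "black")

numeric = ["white_rating", "black_rating", "turns"]
categorical = ["increment_code", "opening_name"]

model = Pipeline([
    ("prep", ColumnTransformer([
        ("num", "passthrough", numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])),
    ("rfc", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Encoding is fit inside each fold, so categories are learned from training folds only.
scores = cross_val_score(model, df[numeric + categorical], df["winner"], cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```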

Figure 7: Evaluation Metrics of Random Forest Classifier on Dataset with Both Numeric and Categorical Variables

c) Fine-Tuning the Hyperparameters of the Random Forest Classifier:

Better results can often be obtained by tuning a classifier's hyperparameters. Scikit-Learn's ‘GridSearchCV’ class was used to fine-tune the hyperparameters of the RFC.

GridSearchCV fits a Random Forest Classifier for every combination of hyperparameter values in a specified grid, evaluates each combination with cross-validation on the training data, and keeps the best combination for later use.
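A minimal `GridSearchCV` sketch on synthetic data; the grid values are illustrative assumptions, since the article does not report which values were searched. It prints the cross-validated score of the best combination alongside its accuracy on the training and test sets.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grid; not the values used in the article.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
    "min_samples_leaf": [1, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0), param_grid, cv=5, scoring="accuracy"
)
search.fit(X_train, y_train)  # 2 * 2 * 2 = 8 combinations, each scored with 5-fold CV

best_rfc = search.best_estimator_  # refit on the whole training set (refit=True by default)
print("best params:", search.best_params_)
print(f"cross-validated accuracy: {search.best_score_:.3f}")
print(f"training-set accuracy:    {best_rfc.score(X_train, y_train):.3f}")
print(f"test-set accuracy:        {best_rfc.score(X_test, y_test):.3f}")
```

For an unconstrained random forest the training-set line is usually close to 1.0, well above the cross-validated line; that gap reflects overfitting, so the cross-validated and test-set scores are the ones that estimate performance on new games.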

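The best RFC found this way was then used to make predictions on the training set, where its accuracy rose from 66% to 99%. This jump should be read with care: the 66% came from cross-validated predictions on data the model had not seen, while the 99% was measured on the very data the classifier was fit on, so much of the gain reflects the model memorizing the training set rather than a genuine improvement.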

Figure 8: Evaluation Metrics of Fine Tuned RFC

d) Predictions on the Test Set:

As a last step, the best RFC was used to make predictions on the test set, which had been kept separate until now. As expected, the accuracy, precision and recall of the classifier dropped to 62% on the test set.

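62% accuracy may not be a satisfactory result, but it is better than guessing: with three possible outcomes, a uniform random guess would be right only about a third of the time. A stricter baseline is to always predict the most common outcome, a white win, whose accuracy equals the share of games won by white. With more observations, the dropped categorical variables could have been used in the prediction process, which might have improved the performance of the classifier further.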
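Baselines like these can be scored with scikit-learn's `DummyClassifier`, which ignores the features entirely. The class proportions below are invented for illustration; on the real data, the majority-class score equals the share of games with the most common outcome.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical label distribution: 50% white wins, 40% black wins, 10% draws.
y = np.array(["white"] * 50 + ["black"] * 40 + ["draw"] * 10)
X = np.zeros((len(y), 1))  # features are ignored by DummyClassifier

most_frequent = DummyClassifier(strategy="most_frequent").fit(X, y)
uniform = DummyClassifier(strategy="uniform", random_state=0).fit(X, y)

print(f"always predict the majority class: {most_frequent.score(X, y):.2f}")  # 0.50
print(f"uniform random guess:              {uniform.score(X, y):.2f}")      # about 1/3 on average
```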

Figure 9: Performance of Fine Tuned RFC on Test Set

e) Top 10 Important Features that Contributed to the Prediction:

Figure 10 shows the 10 variables that contributed most to the classifier's predictions. ‘white_rating’ and ‘black_rating’ each account for about 8% of the total feature importance, and ‘turns’ for about 6%. These are the top three contributors in the prediction process.
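A sketch of how such a ranking is obtained from a fitted forest via its `feature_importances_` attribute, on synthetic data with placeholder feature names (assumptions, not the article's columns). These impurity-based importances are non-negative and sum to 1, which is why they describe each variable's share of the model's splitting rather than a share of the target explained; scikit-learn's `permutation_importance` is a common cross-check, since impurity-based scores can favor variables with many distinct values.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(
    n_samples=500, n_features=12, n_informative=5, n_classes=3, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # placeholders for real column names

rfc = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are non-negative and sum to 1 across all features.
importances = rfc.feature_importances_
order = np.argsort(importances)[::-1][:10]
top10 = [(feature_names[i], float(importances[i])) for i in order]

for name, score in top10:
    print(f"{name}: {score:.1%}")
```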

Figure 10: Features with their Relative Importance for Making Predictions

So far, we have explored different aspects of a chess game and their impact on its outcome. The main findings of the analysis are summarized below.

  • ‘Resign’ is the most common way of winning a game for both players, followed by ‘checkmate’ and ‘out of time’. Most drawn games end through the usual ‘draw’ (stalemate, mutual agreement, threefold repetition etc.), followed by ‘out of time’.
  • The increment code ‘10+0’ was used most often, followed by ‘15+0’ and ‘15+15’. In all of the top 10 increment codes, white players have won more games than others, closely followed by black players, and draws are hard to come by.
  • The highest average number of turns was played with the ‘8+0’ increment code, followed by ‘10+0’ and ‘15+0’.
  • The most used opening was the ‘Van’t Kruijs Opening’, followed closely by the ‘Sicilian Defense’ and the ‘Sicilian Defense: Bowdler Attack’. Out of the top 10 most used openings, white players won most of the games in only 4 openings, whereas black players won most of the games in 6.
  • The Random Forest Classifier was shortlisted to make predictions on the test set. With only numeric variables, its accuracy on the validation sets was 63%; adding the categorical variables raised it to 66%. After hyperparameter tuning, its accuracy on the training set itself was 99%, but on the held-out test set the fine-tuned RFC reached 62%.
  • ‘white_rating’, ‘black_rating’ and ‘turns’ were the top three contributors to the RFC's predictions.

These are some of the insights I gathered from the chess dataset. For the detailed analysis, you can visit my GitHub repository here.

Data and ML enthusiast
