Data Science

Gen 1 Pokedex

Project Overview

This is my first personal data analysis project since achieving my data science: analytics certification from Codecademy.

I decided to do a data analysis of the original generation 1 Pokemon games. Specifically, this is an analysis of the pokedex as it originally was: no dark, steel or fairy types, all Pokemon with their original stats and typings, and all mechanics as they were in the original games.

 

Here is a link to the GitHub folder containing the Jupyter Notebook file and the CSVs exported using pandas.

The main script file is here.

Finally, I created some visual dashboards in Tableau.

Analysis Overview

While writing the code for this, I wanted to try cloud-based coding, because I was unlikely to finish in a reasonable time if I could only work on it at home. I stumbled upon Google Colab.


The only downside of this service is that once you become inactive, the code state and DataFrames held in memory on Google’s side get cleared (understandably). When you come back to Google Colab, you have to re-execute your code to get back to where you left off.

While a bit of a limitation, it encouraged concise and linear code which helped keep everything manageable and understandable even after periods of absence. If you’re going to open the script file, I’d recommend using this platform.


With a platform chosen, it was time to code! I used this as an exercise in web scraping with Beautiful Soup as well. 

Scraping and retrieving the data took a bit of trial and error. It’s a bit harder when you don’t already know HTML, but I got the hang of it. The source data was split over multiple tables, and I decided to use two separate sites, merging their data to create the first proper DataFrame. I enjoyed the challenge.


This won’t be a play-by-play of the whole analysis (that’s what the script file is for), and viewing the visualisations on Tableau will also be better as the visualisations are interactive. Here are some notable highlights which I found interesting.

Observations

There isn’t a single mono-type (a Pokemon with only one type) in the game that is a flying type. The same can be said for the rock type. There is only one ghost-type line in the game, the Gengar line, and all of its members are dual-type ghost-poison.

 

Contrasting mono-types against a count of all types including dual-types, flying is one of the most common types in the game, which is interesting considering that there are no mono-type flying Pokemon at all. Poison is only the third most common mono-type, yet once dual-types are counted it is the single most common type in the game.
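The mono-type versus all-types comparison can be sketched with the standard library alone. The rows below are a tiny hand-picked sample rather than the full pokedex, and the names SAMPLE, mono_counts and all_counts are made up for illustration. The actual notebook works on pandas DataFrames and may count things differently.

```python
from collections import Counter

# Tiny illustrative sample: (name, primary type, secondary type or None).
# This is NOT the full dataset from the project.
SAMPLE = [
    ("Bulbasaur", "grass", "poison"),
    ("Charizard", "fire", "flying"),
    ("Pidgey", "normal", "flying"),
    ("Ekans", "poison", None),
    ("Zubat", "poison", "flying"),
    ("Pikachu", "electric", None),
    ("Gastly", "ghost", "poison"),
    ("Geodude", "rock", "ground"),
]

# Count only Pokemon that have a single type.
mono_counts = Counter(t1 for _, t1, t2 in SAMPLE if t2 is None)

# Count every type slot, so dual-types contribute to both of their types.
all_counts = Counter()
for _, t1, t2 in SAMPLE:
    all_counts[t1] += 1
    if t2 is not None:
        all_counts[t2] += 1

print("mono-type flying:", mono_counts["flying"])
print("all flying:", all_counts["flying"])
print("most common overall:", all_counts.most_common(1))
```

Even in this small sample, flying never appears as a mono-type but shows up three times once dual-types are counted, which is the same pattern described above.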

The distributions of the pokedex’s individual and combined stats are close to normal in some places and more spread out in others. HP and defense are fairly similar: both are roughly bell-shaped with a slight right skew. Attack, special and speed are, for some reason, more erratic and less centralised. Finally on this, the base stat totals follow a bimodal distribution. Not overly mind-blowing, but worth a mention.

It’s quite something just how much damage an exploding Golem can do to a Chansey. 1633 damage! That’s insane. Abra gets incinerated in the blast, taking a huge 1131 damage, too. Back in generation 1, explosion would halve the opposing Pokemon’s defense stat before it hit, which had to be accounted for in the damage calculation formula.
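To show where the defense halving fits in, here is a minimal sketch based on the commonly cited generation 1 damage formula. It is a simplification and not the code from the script: it assumes the maximum damage roll, no critical hit, and it ignores the games’ handling of very large stats, so it will not necessarily reproduce the figures above. The function name gen1_damage and the attack and defense values are hypothetical.

```python
def gen1_damage(level, power, attack, defense,
                stab=1.0, effectiveness=1.0, halves_defense=False):
    """Simplified gen 1 damage: max roll, no critical hit (assumption)."""
    if halves_defense:
        # Explosion-style moves halve the target's defense before the hit.
        defense = max(1, defense // 2)
    # Integer division at each step mimics the games' rounding down.
    base = (2 * level // 5 + 2) * power * attack // defense // 50 + 2
    return int(base * stab * effectiveness)


EXPLOSION_POWER = 170  # explosion's base power in generation 1

# Hypothetical round-number stats, chosen only to show the effect.
normal_hit = gen1_damage(100, EXPLOSION_POWER, attack=300, defense=100)
explosion_hit = gen1_damage(100, EXPLOSION_POWER, attack=300, defense=100,
                            halves_defense=True)

print(normal_hit, explosion_hit)
```

With these made-up stats the halved defense roughly doubles the result, which is why a high attack stat against a low defense stat produces such large numbers.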

 

It also turns out that having a poor special stat, and having a dual-type typing that is 4x weak to grass, is a fine way to get obliterated by a Venusaur’s solar beam. Geodude, Onix and Rhyhorn, all Pokemon known for having a horrible special stat, take an enormous 1000 damage!

On the lower end of the damage spectrum, poor Pikachu: it can only one-hit KO two Pokemon in the pokedex with thunder, those being Krabby and Magikarp. You might be thinking “I’ve one-hit KO’d more Pokemon than that with Pikachu”, and you probably have. The damage calculation assumes that all Pokemon have perfect IVs and are fully max trained in all stats. NPCs’ Pokemon won’t be grinding out those extra stats in the main game.
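For a sense of what “perfect IVs and max trained” means in numbers, here is a sketch of the commonly cited generation 1 stat formula. In these games the IV equivalent (often called a DV) runs from 0 to 15, and training is tracked as stat experience up to 65535. The function name gen1_stat and its parameters are my own for this example, and the script may calculate its stats differently.

```python
import math

MAX_DV = 15            # best possible DV (generation 1's IV equivalent)
MAX_STAT_EXP = 65535   # fully trained stat experience


def gen1_stat(base, level=100, dv=MAX_DV, stat_exp=MAX_STAT_EXP, is_hp=False):
    """Commonly cited gen 1 stat formula (sketch, not the script's code)."""
    # The square root of stat experience is rounded up and capped at 255.
    exp_bonus = min(255, math.ceil(math.sqrt(stat_exp))) // 4
    core = ((base + dv) * 2 + exp_bonus) * level // 100
    # HP adds level + 10; every other stat adds a flat 5.
    return core + level + 10 if is_hp else core + 5


# Chansey has a base HP of 250; Mewtwo has a base special of 154.
chansey_hp = gen1_stat(250, is_hp=True)
mewtwo_special = gen1_stat(154)
untrained_special = gen1_stat(154, dv=0, stat_exp=0)

print(chansey_hp, mewtwo_special, untrained_special)
```

The gap between the fully trained and untrained values is the reason in-game opponents are usually easier to knock out than the calculation suggests.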

Finally, the most common dual-type in the game is grass-poison, closely followed by normal-flying.

All of the CSVs are stored in the GitHub folder, so you can use them yourself for whatever you’d like to do.

The methodology behind how they were made can be seen in the script that’s also in the folder.

 

Lastly, at the bottom of the script is a sandbox area that you can use to see the effectiveness of individual type combinations, add moves to the database to check more damage matchups, and retrieve a selected Pokemon’s max level 100 stats. I’ll save the rest for the actual analysis!