Cleaning and Visualizing Overwatch Data

Jacob Heyman
Apr 2, 2021

Some EDA and Visualizations made with Tableau

Overwatch

Overwatch is my favorite game. The number of hours I have personally logged in this game over the years borders on embarrassing. My obsession with the game should explain why an Overwatch Kaggle dataset got me a little more excited about data than one should be. The dataset comprises one player's data over the course of 2017. In this blog I will show you some of my cleaning and EDA, as well as the visualizations I made using Tableau.

Cleaning

In my past few blogs I have gone over some of my techniques for early cleaning. I started by reading in the data with pandas, checking the shape and dtypes of the dataframe, and then creating a plot of the missing values for each column.
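For anyone following along, that first look can be sketched roughly as below. The inline CSV is a made-up stand-in for the Kaggle file (the real file name is not given here), so `sample_csv` and `df_sample` are illustrative names only, and the rows are invented.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the Kaggle CSV; the values are invented
sample_csv = io.StringIO(
    "date,sr_start,sr_finish,note\n"
    "01/05/2017,2500,2524,\n"
    "01/06/2017,,2501,\n"
)

df_sample = pd.read_csv(sample_csv)

# First look: how many rows and columns, and what type pandas inferred for each
print(df_sample.shape)
print(df_sample.dtypes)
```

With the real file, the same two checks show how wide the dataset is and which columns came in as strings instead of numbers.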

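import matplotlib.pyplot as plt
import seaborn as sns
# Plot the null mask as a heatmap to see where values are missing
sns.heatmap(df.isnull(), yticklabels=False, cbar=False, cmap='tab20c_r')
plt.title('Missing Data: Overwatch')
plt.show()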

This was a lot of missing data. My next step was to see what percentage of missing data there was in each column.

rows = df.shape[0]
null_total = df.isnull().sum()
missing_percent = (null_total/rows)*100
pd.DataFrame(missing_percent ,columns = ["missing_percent"])

This block of code gives me a dataframe of the percentage of missing values for each feature in the dataset.

For my first step of cleaning I decided to drop any column that was missing over 60% of its data.

df.drop(['note','eliminations','objective_kills','damage','healing','deaths','weapon_accuracy',
'offensive_assists','defensive_assists','scoped',
'Unnamed: 46','Unnamed: 47','Unnamed: 48','Unnamed: 49'],axis=1, inplace= True)
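Typing the column names by hand works, but the same 60% cutoff could also be applied programmatically. This is only a sketch on a toy dataframe (`toy`, `cols_to_drop`, and the values are invented for illustration), not the code I actually ran.

```python
import numpy as np
import pandas as pd

# Toy frame: 'note' is 80% missing, 'sr_start' is 20% missing (invented values)
toy = pd.DataFrame({
    "sr_start": [2500, 2524, np.nan, 2510, 2530],
    "note": [np.nan, np.nan, np.nan, "close game", np.nan],
    "hero": ["Mercy", "Lucio", "Mercy", "Ana", "Lucio"],
})

# isnull().mean() gives the fraction of missing values per column
missing_percent = toy.isnull().mean() * 100

# Collect every column above the 60% cutoff and drop them in one call
cols_to_drop = missing_percent[missing_percent > 60].index.tolist()
toy_clean = toy.drop(columns=cols_to_drop)

print(cols_to_drop)
```

Deriving the list from the threshold keeps the drop step in sync with the percentages if the data is ever refreshed.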

My next step was to fill in the missing data for the SR (skill rating) start and finish columns. These columns had a substantial amount of missing data, but knowing how SR changes from match to match, I decided to fill in the missing values with a forward fill. Filling in the null values with the previous value would not greatly affect the overall game data.

df['sr_start'] = df['sr_start'].ffill()
df['sr_finish'] = df['sr_finish'].ffill()
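To make the forward fill concrete, here is a tiny sketch on an invented series of SR values (`sr` and `filled` are illustrative names, not columns from the dataset).

```python
import numpy as np
import pandas as pd

# Invented SR values with gaps where the rating was not recorded
sr = pd.Series([2500, np.nan, np.nan, 2548, np.nan])

# Forward fill copies the last known value into each gap that follows it
filled = sr.ffill()

print(filled.tolist())
```

Each gap takes the most recent recorded rating. A gap at the very start of the column has nothing to copy from and would stay missing.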

With the SR start and SR finish columns all filled, I decided to completely recalculate the delta SR (the change in SR over a match), so that a gain is positive and a loss is negative.

df['sr_delta'] = df['sr_finish'] - df['sr_start']

Next, I wanted to check for outliers in the date column. I first converted the column to a datetime type and then plotted the SR delta over time.

df['date'] = pd.to_datetime(df['date'], format='%m/%d/%Y')
# Keep the Axes in its own variable so the name plt still refers to matplotlib.pyplot
ax = df.plot(x="date", y="sr_delta", figsize=(10, 10))
plt.show()

There was a single data point outlier in the date column so I removed that from the dataset.
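One way to drop a stray date like that is to keep only the rows inside the expected window. The sketch below assumes every valid match falls within calendar year 2017 and uses invented rows; the actual outlier and how I removed it are not shown here.

```python
import pandas as pd

# Invented rows; the third date is a deliberate outlier
toy = pd.DataFrame({
    "date": pd.to_datetime(
        ["01/05/2017", "03/12/2017", "06/30/2027", "08/01/2017"],
        format="%m/%d/%Y",
    ),
    "sr_delta": [24, -18, 20, 25],
})

# Assumed valid window: calendar year 2017
start = pd.Timestamp("2017-01-01")
end = pd.Timestamp("2017-12-31")

# Keep only rows whose date falls inside the window
in_range = toy["date"].between(start, end)
toy_clean = toy[in_range].reset_index(drop=True)

print(len(toy), "->", len(toy_clean))
```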

For my next few cleaning steps I checked each column that still had null values. I filled in the values on a case-by-case basis and ultimately dropped about 600 rows. Not the best for continuing with ML models, but I could still work with this dataset to make some fun visualizations. Before exporting the dataset to Tableau I did some clerical cleanup, fixing spelling and capitalization in the hero and map columns.

df['hero'] = df['charcter_1'].astype(str)
df['hero'] = df['hero'].str.capitalize()

This is a simple example of cleaning string data to make it more uniform to work with.
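Spelling fixes can be handled in the same pass. The sketch below uses invented hero entries and a hypothetical misspelling map (`spelling_fixes`); the actual typos in the dataset are not listed here.

```python
import pandas as pd

# Invented entries with stray whitespace, mixed case, and one misspelling
heroes = pd.Series([" mercy", "MERCY", "zenyatta ", "Zenyata"])

# Trim whitespace, then capitalize the first letter and lowercase the rest
cleaned = heroes.astype(str).str.strip().str.capitalize()

# Hypothetical map from misspelled names to the correct ones
spelling_fixes = {"Zenyata": "Zenyatta"}
cleaned = cleaned.replace(spelling_fixes)

print(cleaned.unique().tolist())
```

Checking `unique()` afterwards is a quick way to confirm each hero now appears under a single spelling.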

Let's Play with Tableau

With the data mostly cleaned, I could now start the fun part of making visualizations. Tableau is one of my new favorite tools to work with. It is a business intelligence tool that offers a variety of options for creating visualizations with your data. For my first visualization, I wanted to see the change in SR by hero played for each season.

Here you can see the total SR delta by hero for each season. I colored the SR delta blue when positive and orange when negative. I also added filters in case anyone wants to view a specific season or hero.
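Tableau does this aggregation behind the scenes, but a rough pandas equivalent looks like the sketch below. The `season` column name and all of the values are assumptions for illustration.

```python
import pandas as pd

# Invented match rows; 'season' is an assumed column name
toy = pd.DataFrame({
    "season": [4, 4, 4, 5, 5],
    "hero": ["Mercy", "Mercy", "Lucio", "Mercy", "Lucio"],
    "sr_delta": [24, -20, 18, -25, 22],
})

# Total SR change for each hero within each season
totals = toy.groupby(["season", "hero"], as_index=False)["sr_delta"].sum()

# Mirror the chart colors: blue for a net gain, orange otherwise
# (a total of exactly zero lands in orange here, an arbitrary choice)
totals["color"] = totals["sr_delta"].apply(
    lambda d: "blue" if d > 0 else "orange"
)

print(totals)
```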

Up next, I wanted a simple line chart of the average finishing SR for each hero in each season.

I gave each hero its own line color to show how average SR differs over the seasons. I also grouped the heroes into their specific roles and added a filter so anyone can view the change in SR for any of the three roles.

For my final visualization I wanted to see the day to day change in average SR.

In this visualization you can see a day by day breakdown of the average SR with each season color coded.
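The daily averages behind a chart like this can be approximated in pandas as well. This sketch assumes the averaged value is `sr_finish` and that a `season` column exists; the rows are invented.

```python
import pandas as pd

# Invented matches: two on March 1 and one on March 2
toy = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-01", "2017-03-01", "2017-03-02"]),
    "season": [4, 4, 4],
    "sr_finish": [2500, 2524, 2510],
})

# Average finishing SR per day, keeping the season for color coding
daily = toy.groupby(["season", "date"], as_index=False)["sr_finish"].mean()

print(daily)
```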

My final step was an attempt to build a dashboard using the three visualizations. Creating dashboards is still pretty new to me, and I am trying to figure out how to display multiple visualizations without squishing them too much. The dashboard is published at https://public.tableau.com/profile/jacob.heyman#!/vizhome/OverwatchViz/Dashboard1?publish=yes

Here is an attempt to get all three visualizations onto one dashboard. Moving forward, I may keep the first visualization separate from the other two and create multiple dashboards to build a story.
