Creating a Dashboard for the King County Housing Data Set
Looking Back on a Previous Project
In Phase 2 of Flatiron's Data Science Program, we were given the King County housing data set and asked to create a linear regression model to predict the prices of housing listings from a holdout set of data. The project was framed around a friendly competition to see who could create the predictive model with the lowest error.
Looking back at my own work, I can see that I was so focused on producing a working model that I totally ignored understanding the data as housing listings. I focused more on understanding the relationships between the features of the data set and the statistical differences between the listings.
Here is a visualization I created that shows the difference in price distribution between inland and waterfront properties. While I think the visualization helps show that there is a significant difference in the listing prices, it does not really display the data as physical locations. Were I to revisit the project, I would want to include some visualizations that use the geographic aspects of the data.
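The comparison behind that chart can be sketched in a few lines of plain Python. This is only an illustration, not the original project code: the `price` and `waterfront` field names are assumptions, the helper `median_price_by_waterfront` is hypothetical, and the sample rows are made-up values rather than real listings.

```python
from statistics import median

# Made-up sample rows for illustration; the real data set has far more listings.
listings = [
    {"price": 450_000, "waterfront": 0},
    {"price": 520_000, "waterfront": 0},
    {"price": 610_000, "waterfront": 0},
    {"price": 1_400_000, "waterfront": 1},
    {"price": 2_100_000, "waterfront": 1},
]


def median_price_by_waterfront(rows):
    """Group prices by the waterfront flag and return the median of each group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["waterfront"], []).append(row["price"])
    return {flag: median(prices) for flag, prices in groups.items()}


medians = median_price_by_waterfront(listings)
print(medians)
```

A summary like this shows that the two groups differ, but like the chart it says nothing about where the listings are.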
Using Tableau to Create Better Visualizations
Tableau is a business intelligence tool that is amazing at creating in-depth visualizations. After graduating from Flatiron, I have been teaching myself how to use Tableau to aid in data analysis. One of my favorite features in Tableau is the ability to add interactive filters. These let you create a visualization that displays only the data matching the conditions you choose. While I was making my way through the Tableau eLearning lessons, I got the idea to revisit the King County data set. I wanted to not only test out my new skills, but also create some new visualizations for a Flatiron project that I feel is only half done.
King County Data in Tableau
One of the first steps was to read the data into Tableau. Tableau will automatically try to sort the fields into discrete dimensions and continuous measures. The sorting isn’t always the best for the data you are working with, so I first changed certain features such as waterfront, grade and zipcode into discrete dimensions. I also took note of the data types. Tableau assigns icons to specific data types, like a # for numeric data and a globe for geographic data. With geographic data, Tableau will automatically create a map when the field is dragged to a sheet.
Just by dragging the zip codes to a new sheet, I can already create a map of the different zip codes in the data set.
The next step I took was to add in the other features to create an interactive visualization of the housing data. By adding the measure values to the Size option on the Marks card, I created points on the map for all the listings, sized by their price. I also added color to the listing data points to show the difference between high and low prices. To reduce the amount of data displayed, I added filters that allow whoever is viewing the data to display specific listings.
Here you can see that I have added filters for whether the listings are waterfront and for a specific grade, plus a slider for housing prices. The price slider is particularly useful here because of the number of outlier house prices that were throwing off the visualization.
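To give a feel for what sizing points by price means, here is a rough sketch of a linear price-to-size mapping in Python. It is not Tableau's actual sizing algorithm, which I have not looked into; the function name `scale_marker_sizes`, the size range and the sample prices are all assumptions made for illustration.

```python
def scale_marker_sizes(prices, min_size=5.0, max_size=50.0):
    """Linearly map each price onto a marker size between min_size and max_size."""
    low, high = min(prices), max(prices)
    if high == low:
        # All prices are equal: avoid dividing by zero and use the midpoint size.
        return [(min_size + max_size) / 2 for _ in prices]
    span = high - low
    return [min_size + (p - low) / span * (max_size - min_size) for p in prices]


# Made-up prices; the last one plays the role of an expensive outlier.
prices = [300_000, 600_000, 900_000, 3_300_000]
sizes = scale_marker_sizes(prices)
print(sizes)
```

With a mapping like this, a single very expensive listing takes the largest size and squeezes every other point toward the small end of the range.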
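For comparison, the same three filters can be sketched in plain Python. This is a minimal illustration under assumptions, not how Tableau works internally: the `filter_listings` helper is hypothetical, the `waterfront`, `grade` and `price` field names are assumed, and the rows are made-up values.

```python
def filter_listings(rows, waterfront=None, grade=None, min_price=None, max_price=None):
    """Return the rows that pass every filter that is set; None means no filter."""
    result = []
    for row in rows:
        if waterfront is not None and row["waterfront"] != waterfront:
            continue
        if grade is not None and row["grade"] != grade:
            continue
        if min_price is not None and row["price"] < min_price:
            continue
        if max_price is not None and row["price"] > max_price:
            continue
        result.append(row)
    return result


# Made-up sample rows; listing "d" stands in for an outlier price.
listings = [
    {"id": "a", "price": 450_000, "waterfront": 0, "grade": 7},
    {"id": "b", "price": 620_000, "waterfront": 0, "grade": 8},
    {"id": "c", "price": 1_400_000, "waterfront": 1, "grade": 8},
    {"id": "d", "price": 7_000_000, "waterfront": 1, "grade": 12},
]

# Equivalent of dragging the price slider down to cut off the outlier.
capped = filter_listings(listings, max_price=2_000_000)

# Equivalent of selecting waterfront listings of one specific grade.
waterfront_grade_8 = filter_listings(listings, waterfront=1, grade=8)

print([row["id"] for row in capped])
print([row["id"] for row in waterfront_grade_8])
```

The difference is that in Tableau whoever is viewing the dashboard can change these conditions on the fly, while in a script each new combination means rerunning the code.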
More Work to Be Done
I am having so much fun working with Tableau and revisiting old projects like this one to create visualizations I was not able to make with Python alone. The more I learn about Tableau, the more projects I want to revisit to test out my new skills. I look forward to finding new and interesting ways to display the data.