Examples of EDA for Image Analysis
Some code snippets for analyzing your image dataset
A different kind of EDA
Exploratory data analysis (EDA) for images looks quite different from EDA on a standard tabular data set: you are working with a stack of images rather than a dataframe. In this post I'll show some code snippets I have used to display images and identify differences between the classes in my dataset.
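To make "a stack of images" concrete, here is a minimal sketch using synthetic data. The array sizes and the random pixel values are assumptions for illustration only, not the real dataset.

```python
import numpy as np

# Five synthetic 8x8 grayscale "images" (random values stand in for real pixels).
rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(5)]

# Stacking adds a leading axis: (n_images, height, width).
stack = np.stack(images)
print(stack.shape)  # (5, 8, 8)

# Reducing over axis 0 gives one value per pixel position across all images.
pixel_mean = stack.mean(axis=0)
print(pixel_mean.shape)  # (8, 8)
```

Most of the summaries later in this post boil down to a reduction like this over the first axis.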
EDA with the Alzheimer's data set
I'm working with an Alzheimer's data set. Its classes correspond to increasing stages of the disease, which are marked by progressive deterioration of the hippocampus and grey matter in the brain. To visualize the progression of tissue degeneration, here are some methods you can use to display the different classes in the dataset.
First up is the average image for each class. Averaging every image in a class pixel by pixel is a great way to see what a typical image looks like and whether there are any major differences between the classes.
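The snippet that follows calls a helper named find_mean_img, which is not defined in this post. The version here is only a guess at what such a helper might look like. It assumes each class is a NumPy array of shape (n_images, height, width) with values scaled to [0, 1], and it runs on synthetic data.

```python
import numpy as np
import matplotlib.pyplot as plt


def find_mean_img(images, ax, title):
    """Average a stack of images pixel by pixel and draw the result on `ax`.

    Hypothetical helper: assumes `images` has the image index on axis 0
    and pixel values scaled to [0, 1].
    """
    mean_img = np.mean(images, axis=0)
    ax.imshow(mean_img, cmap='gray', vmin=0, vmax=1)
    ax.set_title(f'Average {title}')
    ax.axis('off')
    return mean_img


# Synthetic stand-in for one class: ten random 8x8 grayscale images.
rng = np.random.default_rng(0)
demo_images = rng.random((10, 8, 8))

fig, ax = plt.subplots()
demo_mean = find_mean_img(demo_images, ax, 'Demo')
```

Returning the mean array as well as plotting it is what lets the later steps reuse the class averages.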
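import matplotlib.pyplot as plt

# One panel per class, ordered from healthy to the most advanced stage
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(20, 5))
# find_mean_img: helper that draws the class average on the given axis and returns it
NonD_mean = find_mean_img(NonD_images, ax[0], 'NonDemented')
veryMild_mean = find_mean_img(veryMild_images, ax[1], 'VeryMildDemented')
mild_mean = find_mean_img(mild_images, ax[2], 'MildDemented')
Moderate_mean = find_mean_img(moderate_images, ax[3], 'ModerateDemented')
plt.savefig('./images/mean.png')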
Here you can see clear deterioration in the most advanced stage of Alzheimer's. The large areas of missing tissue become very visible as the disease progresses. To visualize the difference between a healthy brain and a brain in the most advanced class, we plot the contrast between the two average images like so.
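import numpy as np

# Subtract the most advanced class mean from the healthy class mean
contrast_mean = NonD_mean - Moderate_mean
# Scale to 0-255 for display (assumes the means are scaled to [0, 1])
plt.imshow((contrast_mean * 255).astype(np.uint8))
plt.title('Difference Between NonD & ModerateDemented Mean')
plt.axis('off')
plt.savefig('./images/contrast.png')
plt.show()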
The white segments of this image show the large sections of difference between a healthy brain and a moderately demented brain.
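One caveat when subtracting mean images: the difference is negative wherever the second class is brighter, and an unsigned 8-bit image cannot represent negative values. A possible alternative, sketched here on synthetic arrays rather than the real class means, is to keep the signed difference and plot it with a diverging colormap. The variable names and the choice of the 'bwr' colormap are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic stand-ins for two class means (assumed grayscale, values in [0, 1]).
rng = np.random.default_rng(0)
mean_a = rng.random((8, 8))
mean_b = rng.random((8, 8))

# Signed difference: positive where mean_a is brighter, negative where mean_b is.
diff = mean_a - mean_b

# Symmetric limits keep a zero difference at the neutral midpoint of the colormap.
limit = np.abs(diff).max()

fig, ax = plt.subplots()
im = ax.imshow(diff, cmap='bwr', vmin=-limit, vmax=limit)
ax.set_title('Signed difference between two mean images')
ax.axis('off')
fig.colorbar(im, ax=ax)
```

With this layout, one color marks regions where the first class is brighter and the other marks regions where the second class is brighter, so the direction of the change is not lost.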
Similar to the average images, we can also plot the pixel-wise standard deviation for each class:
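The find_std_img helper used for this step is not defined in this post either. A minimal sketch, assuming it takes the same arguments as find_mean_img and that images are stacked along axis 0, might look like this (synthetic data again):

```python
import numpy as np
import matplotlib.pyplot as plt


def find_std_img(images, ax, title):
    """Compute the per-pixel standard deviation of a stack and draw it on `ax`.

    Hypothetical helper: assumes `images` has the image index on axis 0.
    """
    std_img = np.std(images, axis=0)
    ax.imshow(std_img, cmap='gray')
    ax.set_title(f'Standard deviation {title}')
    ax.axis('off')
    return std_img


# Synthetic stand-in for one class: ten random 8x8 grayscale images.
rng = np.random.default_rng(1)
demo_images = rng.random((10, 8, 8))
# Make one pixel identical in every image, so it has no variation.
demo_images[:, 0, 0] = 0.5

fig, ax = plt.subplots()
demo_std = find_std_img(demo_images, ax, 'Demo')
```

Bright regions in a standard deviation image are the pixels that vary most from one image to the next within a class. Pixels that never change, like the fixed one in this demo, come out at zero.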
fig, ax = plt.subplots(nrows=1, ncols=4, figsize=(20, 5))
NonD_std = find_std_img(NonD_images, ax[0], 'NonDemented')
veryMild_std = find_std_img(veryMild_images, ax[1], 'VeryMildDemented')
mild_std = find_std_img(mild_images, ax[2], 'MildDemented')
Moderate_std = find_std_img(moderate_images, ax[3], 'ModerateDemented')
plt.savefig('./images/std.png')
Just like the average image, the standard deviation image shows clear deterioration in the moderately demented class.
For the last example of image EDA, I will show how to compute a histogram of oriented gradients, or HOG. This visualization summarizes how strongly and in which direction the brightness changes within small cells of the image. It is a great method for seeing important features that differ between two classes.
# Histogram of oriented gradients
from skimage import exposure
from skimage.feature import hog

# With visualize=True, hog returns the feature descriptor and a HOG image
fd_NonD, NonD_hog = hog(NonD_mean, orientations=8, pixels_per_cell=(8, 8),
                        cells_per_block=(3, 3), visualize=True)
fd_moderate, moderate_hog = hog(Moderate_mean, orientations=8, pixels_per_cell=(8, 8),
                                cells_per_block=(3, 3), visualize=True)

# Stretch the low HOG intensities so the gradients are easier to see
NonD_hogs = exposure.rescale_intensity(NonD_hog, in_range=(0, 0.04))
moderate_hogs = exposure.rescale_intensity(moderate_hog, in_range=(0, 0.04))

# Plot the two classes side by side
fig = plt.figure(figsize=(16, 16))
fig.suptitle('Histogram of Oriented Gradients', x=.5, y=0.92, fontsize=16)
ax1 = fig.add_subplot(2, 2, 1)
ax1.imshow(NonD_hogs, cmap='binary')
ax1.set_title('Non-Demented')
ax2 = fig.add_subplot(2, 2, 2)
ax2.imshow(moderate_hogs, cmap='binary')
ax2.set_title('Moderately Demented')
plt.savefig('./images/hog.png')
In the moderately demented class you can see a heavy concentration of dark pixels, which mark strong gradients under the binary colormap, in the grey matter and hippocampus.
Happy Coding
These are just a few examples of how to visualize some images. Each dataset is unique and comes with its own challenges and different aspects to explore. Visualizing the differences in your classes can help you understand how your model will make classifications and also bolster your domain knowledge of the dataset.