SVMs are good for image recognition because of the high dimensional space, i.e. one dimension per pixel in the image. We will attempt to classify a dataset of faces. A face is a non-linear problem. For example we can tell that these two characters are related because the same actor did play them. But the only relation is the face, and perhaps the overly muscular body, the characters themselves have very little in common.
We import the usual things and the SVM classifier. Also we import the loader for the Olivetti faces. The Olivetti faces is a set of $400$ images of faces, $10$ faces per person. It is a very clean dataset: the faces are well centered and the support of each class is the same across all classes. And since we are working with images we import PCA for noise reductions and model selection tools.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-talk')
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import KFold, GridSearchCV
from sklearn.datasets import fetch_olivetti_faces
ofaces = fetch_olivetti_faces()
print(ofaces.DESCR)
We should also look at the images to see what we are working with.
fig, axes = plt.subplots(10, 10, figsize=(16, 16))
fig.subplots_adjust(hspace=0, wspace=0)
for i, ax in enumerate(axes.flat):
ax.imshow(ofaces.images[i*2], cmap='gray')
ax.axis('off')