Exercise ratings:
★☆☆ - You should be able to solve it based on Python knowledge plus the text.
★★☆ - You will need to do extra thinking and some extra reading/searching.
★★★ - The answer is difficult to find by a simple search and requires a considerable amount of extra work on your own (feel free to skip these exercises if you're short on time).
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# Matplotlib 3.6+ renames this style to 'seaborn-v0_8-talk'; use that name if this line fails.
plt.style.use('seaborn-talk')
from sklearn.cluster import AgglomerativeClustering, MiniBatchKMeans
from sklearn.datasets import load_digits
from sklearn.metrics import v_measure_score
from sklearn.manifold import TSNE
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
digits = load_digits()
Make a pipeline that joins PCA and k-means into a single model. Does the v-measure improve after the linear preprocessing?
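One possible starting point is sketched below. It is not the reference solution: the number of components (20), the number of clusters (10, one per digit class), `n_init=3` and the fixed `random_state` are all assumptions chosen for illustration, and you should experiment with them.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import v_measure_score
from sklearn.pipeline import make_pipeline

digits = load_digits()

# Baseline: k-means directly on the raw 64-dimensional pixel data.
baseline = MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0)
baseline_labels = baseline.fit_predict(digits.data)
baseline_score = v_measure_score(digits.target, baseline_labels)

# PCA followed by k-means; n_components=20 is an arbitrary value to tune.
pca_kmeans = make_pipeline(
    PCA(n_components=20, random_state=0),
    MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0),
)
pca_labels = pca_kmeans.fit_predict(digits.data)
pca_score = v_measure_score(digits.target, pca_labels)

print("raw data v-measure:     ", baseline_score)
print("PCA + k-means v-measure:", pca_score)
```

Comparing the two printed scores for several values of `n_components` gives a feel for how much the linear projection matters.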
Now use t-SNE as the preprocessing. Does the v-measure improve after the non-linear preprocessing?
Note that the t-SNE implementation of sklearn is incomplete. It does not have a plain transform method and is not applicable beyond the data for which it is fit. This is not a problem for us, since we are only exploring the non-linearity of the digits dataset.
Instead of using plain TSNE in your pipeline, use the class defined below (remember to execute this cell).
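class PipeTSNE(TSNE):
    # Give TSNE a transform method so it can be an intermediate pipeline step.
    def transform(self, x):
        # Re-fits on x: this embeds x itself rather than projecting new data.
        return self.fit_transform(x)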
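The sketch below shows how such a class might be dropped into a pipeline in front of k-means. The class is repeated so the cell runs on its own. The 500-sample subset is only an assumption to keep the run time short, and the cluster count, `n_init` and `random_state` values are illustrative choices, not part of the exercise.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.metrics import v_measure_score
from sklearn.pipeline import make_pipeline


class PipeTSNE(TSNE):
    """TSNE with a transform method so it can sit inside a Pipeline."""

    def transform(self, X):
        # Re-fits on X: the result is an embedding of X, not a projection.
        return self.fit_transform(X)


digits = load_digits()
# Subsample to keep the run short (an assumption of this sketch only).
X, y = digits.data[:500], digits.target[:500]

tsne_kmeans = make_pipeline(
    PipeTSNE(n_components=2, random_state=0),
    MiniBatchKMeans(n_clusters=10, n_init=3, random_state=0),
)
tsne_labels = tsne_kmeans.fit_predict(X)
tsne_score = v_measure_score(y, tsne_labels)
print("t-SNE + k-means v-measure:", tsne_score)
```

Because `transform` simply re-fits, the pipeline is only meaningful when it is fit and evaluated on the same data, which is the case in these exercises.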
Use linkage='ward' for the time being.
Remember to use the PipeTSNE defined above.
Keep linkage='ward' in this exercise.
Remember to use the PipeTSNE defined above.
Now it is time to use linkage='single' in the agglomerative clustering. Does single linkage perform better on the non-linearly preprocessed dataset than it did when we saw it applied to the raw data of the digits dataset?
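A rough way to set up this comparison is sketched here. It is not the reference solution: the 500-sample subset (used only to keep t-SNE fast), `n_clusters=10`, and the `random_state` are assumptions, and the `scores` dictionary is a hypothetical helper for collecting results.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.metrics import v_measure_score
from sklearn.pipeline import make_pipeline


class PipeTSNE(TSNE):
    """TSNE with a transform method so it can sit inside a Pipeline."""

    def transform(self, X):
        return self.fit_transform(X)


digits = load_digits()
# Subsample to keep the run short (an assumption of this sketch only).
X, y = digits.data[:500], digits.target[:500]

scores = {}

# Single linkage on the raw pixels, for reference.
raw_single = AgglomerativeClustering(n_clusters=10, linkage="single")
scores["raw + single"] = v_measure_score(y, raw_single.fit_predict(X))

# Ward and single linkage on the t-SNE embedding.
for linkage in ("ward", "single"):
    model = make_pipeline(
        PipeTSNE(n_components=2, random_state=0),
        AgglomerativeClustering(n_clusters=10, linkage=linkage),
    )
    scores["t-SNE + " + linkage] = v_measure_score(y, model.fit_predict(X))

for name, score in scores.items():
    print(name, "v-measure:", score)
```

Running the same loop on the full dataset, and with different t-SNE settings, shows whether any difference between the linkages is stable.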