★☆☆ - You should be able to solve it based on Python knowledge plus the text.
★★☆ - You will need to do extra thinking and some extra reading/searching.
★★★ - The answer is difficult to find by a simple search and requires a considerable amount of extra work on your own (feel free to ignore these exercises if you're short on time).
We will use the penguin dataset collected by the Palmer Station in Antarctica. The dataset contains data on $344$ penguins found on three islands in the Palmer Archipelago at the end of the Antarctic Peninsula. There are $4$ numeric measurements for each penguin, and its species, gender and location are given. The dataset has $3$ penguin species and $11$ rows with missing data.
The numerical measurements will allow us to build classifiers for the
non-numerical columns.
KNeighborsClassifier works in a similar fashion to the classifier
we have built ourselves.
    import numpy as np
    from sklearn.preprocessing import LabelEncoder
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier
    import seaborn as sns

    # Names of the four numeric measurement columns
    numeric = ['bill_length_mm', 'bill_depth_mm', 'flipper_length_mm', 'body_mass_g']

    # Load the dataset and drop the rows with missing data
    penguins = sns.load_dataset('penguins').dropna()
    penguins
Split the dataset into a training set and a testing set and check the classifier with accuracy_score against the testing set.
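The split-then-score workflow described above can be sketched as follows. This is only an illustration on synthetic stand-in data, not the penguin measurements: the arrays X and y, the labels "a" and "b", the 25 % test size and the fixed random_state are all assumptions made for the example.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the penguin dataset): two well-separated
# clusters of points with 4 features each and made-up class labels.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 4)),
               rng.normal(5.0, 1.0, size=(100, 4))])
y = np.array(["a"] * 100 + ["b"] * 100)

# Hold out 25 % of the rows for testing; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Fit on the training set only (default n_neighbors is 5).
clf = KNeighborsClassifier()
clf.fit(X_train, y_train)

# Compare the predictions for the unseen test rows with the true labels.
score = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {score:.3f}")
```

For the penguin data, X would presumably be the numeric columns of the data frame and y one of the non-numerical columns.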
The number of neighbors in KNeighborsClassifier is specified with the n_neighbors parameter.
Tune n_neighbors in the penguin gender classifier. (★★☆)
Run the classifier training in a loop and print the accuracy score for different values of n_neighbors. Values up to n_neighbors=20 should be enough.
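One possible shape for such a loop is sketched below, again on synthetic stand-in data rather than the penguins. The overlapping clusters, the dictionary scores and the name best_k are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: two overlapping clusters, so accuracy is not trivially 1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 4)),
               rng.normal(1.5, 1.0, size=(150, 4))])
y = np.array([0] * 150 + [1] * 150)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Train one classifier per value of n_neighbors and record its test accuracy.
scores = {}
for k in range(1, 21):
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X_train, y_train)
    scores[k] = accuracy_score(y_test, clf.predict(X_test))
    print(f"n_neighbors={k:2d}  accuracy={scores[k]:.3f}")

# The value with the highest test accuracy (the smallest such value on ties).
best_k = max(scores, key=scores.get)
print("best n_neighbors:", best_k)
```

Keeping the split fixed inside the loop means that differences between the scores come from n_neighbors alone, not from a different train/test partition.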
n_neighbors on this classifier. (★★☆)
Train a classifier (and tune n_neighbors) for the island where the penguins come from. (★★☆)
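Target columns such as the island hold strings rather than numbers. scikit-learn classifiers generally accept string labels directly, but LabelEncoder (imported at the top) can convert them to integers and back. The sketch below uses made-up labels and points, not the penguin data, to show the round trip.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import LabelEncoder

# Made-up string labels and 2-feature points standing in for a non-numerical column.
labels = np.array(["north", "south", "north", "east", "south", "east"])
X = np.array([[0.0, 0.1], [5.0, 5.1], [0.2, 0.0],
              [9.9, 0.1], [5.1, 4.9], [10.0, 0.0]])

# fit_transform maps each distinct label to an integer; classes_ is sorted.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)
print(encoder.classes_)
print(y)

# Train on the integer labels, then map a prediction back to its string label.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
pred = clf.predict(np.array([[5.0, 5.0]]))
print(encoder.inverse_transform(pred))
```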
Run the previous exercises several times to get different splits into training and test sets.
The accuracy may vary depending on the split, but the best values for n_neighbors should be similar in each run.
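Instead of re-running cells by hand, the repetition can be scripted by varying the random_state of the split. The sketch below does this on synthetic stand-in data. The helper best_k_for_split is a hypothetical name introduced here, and five seeds is an arbitrary choice.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (NOT the penguin dataset): two overlapping clusters.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(150, 4)),
               rng.normal(1.5, 1.0, size=(150, 4))])
y = np.array([0] * 150 + [1] * 150)


def best_k_for_split(X, y, seed, max_k=20):
    """Hypothetical helper: return (best n_neighbors, its accuracy) for one split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed)
    scores = {}
    for k in range(1, max_k + 1):
        clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
        scores[k] = accuracy_score(y_test, clf.predict(X_test))
    best = max(scores, key=scores.get)
    return best, scores[best]


# Each seed gives a different train/test split of the same data.
results = [best_k_for_split(X, y, seed) for seed in range(5)]
for seed, (k, acc) in enumerate(results):
    print(f"split {seed}: best n_neighbors={k:2d}  accuracy={acc:.3f}")
```

With small test sets the single best value can jump around between splits, so it is usually more informative to look at the range of values that score well than at one winner.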