04.04 Pandas Exercises

Exercise ratings:

★☆☆ - You should be able to solve it based on Python knowledge plus the text.

★★☆ - You will need to do extra thinking and some extra reading/searching.

★★★ - The answer is difficult to find by a simple search and requires a considerable amount of extra work on your own (feel free to skip these exercises if you're short on time).

We will use the Titanic dataset for this set of exercises. It records the passengers of that tragic voyage, including whether each one survived. Because the historical records are often incomplete, and for many passengers could not be found at all, a good deal of the data is missing, so you will need to handle it carefully. We load the dataset through the seaborn library because it provides an easy-to-use function made specifically for this purpose. You do not need seaborn to solve the exercises.
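
As a small illustration of why missing data needs care, the sketch below uses a made-up toy frame (the name `toy` and its values are assumptions for illustration, not taken from the Titanic data). It shows how pandas counts missing entries and how reductions such as `mean` skip them by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, NOT the Titanic dataset
toy = pd.DataFrame({
    "age": [22.0, np.nan, 35.0, np.nan],
    "fare": [7.25, 71.28, 8.05, 53.10],
})

# Number of missing values in each column
missing_per_column = toy.isna().sum()
print(missing_per_column)

# Reductions skip NaN by default (skipna=True), so this averages 22 and 35 only
mean_skipping_nan = toy["age"].mean()

# With skipna=False a single NaN makes the whole result NaN
mean_with_nan = toy["age"].mean(skipna=False)

# count() reports only the non-missing entries, unlike len()
non_missing_ages = toy["age"].count()
print(mean_skipping_nan, mean_with_nan, non_missing_ages, len(toy))
```

Keep this in mind whenever a statistic looks surprising: it may have been computed on far fewer rows than the frame contains.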

In [ ]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
# The 'seaborn-talk' style was renamed to 'seaborn-v0_8-talk' in matplotlib 3.6
plt.style.use('seaborn-v0_8-talk')
import pandas as pd
import seaborn as sns
titanic = sns.load_dataset('titanic')
titanic

1. Show the general statistics and the memory usage and value counts of the data frame. What can you conclude from them? (★☆☆)

In [ ]:
 
In [ ]:
 

Evaluation:

2. From how many towns did people embark on the Titanic, and which towns were they? (★☆☆)

In [ ]:
 

3. What is the overall survival rate of the people who were on board? (★☆☆)

In [ ]:
 

4. What is the survival rate for each gender? Use a bar plot to show the difference. (★★☆)

In [ ]:
 

5. Is the survival rate different depending on the class in which the person traveled? Use a plot. (★★☆)

In [ ]:
 

6. Compare, with a plot, the survival rate by both gender and class. (★★☆)

In [ ]:
 

7. Is there a relationship between the town in which a passenger embarked on the Titanic and the class in which they traveled? (★★★)

In [ ]: