05.03 Stats Exercises

Exercise ratings:

★☆☆ - You should be able to solve this based on Python knowledge plus the text.

★★☆ - You will need to do extra thinking and some extra reading/searching.

★★★ - The answer is difficult to find with a simple search and requires a considerable amount of extra work on your own (feel free to skip these exercises if you're short on time).

Below, the mean function is implemented directly with NumPy array operations. Let's do the same for the other basic statistical functions, using vectors similar to the ones from the lecture. The objective is to learn how to encode equations using NumPy array operations and broadcasting. From here on we will see more equations, and we will argue that they are implemented in a vectorized programming style. Building a few such equations ourselves now should make the idea seem less overwhelming later.
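
As a warm-up that is not one of the exercises, here is a minimal sketch of how an equation maps onto array operations. It uses the mean absolute deviation, $\frac{1}{N} \sum_{i=1}^{N} |x_i - \bar{x}|$, chosen only for illustration; the function name mean_abs_dev is hypothetical and not part of the lecture material.

```python
import numpy as np


def mean_abs_dev(x):
    # Hypothetical helper, used here only to illustrate broadcasting.
    x = np.asarray(x, dtype=float)
    x_bar = x.sum() / len(x)          # scalar mean, computed without np.mean
    deviations = np.abs(x - x_bar)    # the scalar x_bar is broadcast against every element of x
    return deviations.sum() / len(x)  # the sum over i becomes a single reduction


arr = np.arange(30, 90, 2)
print(mean_abs_dev(arr))  # 15.0
```

The subtraction x - x_bar is where broadcasting happens: no explicit loop over i is written, yet every element is centered.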

Note: It is fine to reuse previous solutions in later exercises. It is not fine to use NumPy's mean, std, var, cov, or corrcoef in your solutions (they appear in the tests only as a reference).

In [ ]:
import numpy as np

arr = np.arange(30, 90, 2)
acr = np.arange(60, 120, 2) + np.random.rand(30)*3 - 1.5
arr, acr

1. Mean (already solved).

$$\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
In [ ]:
def daml_mean(x):
    return x.sum() / len(x)


# test
print(arr.mean())
print(daml_mean(arr))
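
Comparing two printed numbers by eye works, but floating-point results can differ in the last digits. One possible alternative, sketched here under the assumption that a small tolerance is acceptable, is to compare against the NumPy reference with np.isclose. The same pattern could be applied to the later exercises.

```python
import numpy as np


def daml_mean(x):
    return x.sum() / len(x)


arr = np.arange(30, 90, 2)

# np.isclose compares within a small tolerance instead of requiring exact equality
assert np.isclose(daml_mean(arr), arr.mean())
print("daml_mean matches arr.mean()")
```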

2. Variance (★☆☆)

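$$\sigma^2 = \frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})^2$$

Here $d$ is the delta degrees of freedom, passed as ddof in the code: $d = 0$ gives the population variance and $d = 1$ the sample variance.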
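
A small hand-checkable case may help when testing an implementation. The vector $x = (2, 4, 6)$ is chosen here purely for illustration, with $N = 3$ and $\bar{x} = 4$:

$$\sum_{i=1}^{3} (x_i - \bar{x})^2 = (2 - 4)^2 + (4 - 4)^2 + (6 - 4)^2 = 8$$

$$d = 0: \quad \sigma^2 = \frac{8}{3} \approx 2.667 \qquad\qquad d = 1: \quad \sigma^2 = \frac{8}{2} = 4$$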
In [ ]:
def daml_var(x, ddof=0):
    pass


# test
print(arr.var(ddof=0))
print(arr.var(ddof=1))
print(daml_var(arr, 0))
print(daml_var(arr, 1))

3. Standard Deviation (★☆☆)

$$\sigma = \sqrt{\frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})^2}$$
In [ ]:
def daml_std(x, ddof=0):
    pass


# test
print(arr.std(ddof=0))
print(arr.std(ddof=1))
print(daml_std(arr, 0))
print(daml_std(arr, 1))

4. Covariance (★☆☆)

$$cov(X, Y) = \sigma_{xy} = \frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})$$

Note: You only need to calculate the covariance between two arrays. There is no need to calculate the diagonal of the covariance matrix (the variances).
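
To see why the test below indexes the result with [0, 1], it can help to look at what np.cov returns for two arrays. The following sketch uses small made-up vectors x and y (not the lecture data) and only inspects the reference output; it does not implement the exercise.

```python
import numpy as np

# Made-up illustration data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 9.0])

c = np.cov([x, y], ddof=0)  # each row of the input is treated as one variable

print(c.shape)                               # (2, 2)
print(np.isclose(c[0, 0], x.var(ddof=0)))    # diagonal entries are the variances
print(np.isclose(c[0, 1], c[1, 0]))          # off-diagonal entries are cov(x, y), symmetric
```

The exercise asks only for the off-diagonal value, which is a single scalar.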

In [ ]:
def daml_cov(x, y, ddof=0):
    pass


# test
print(np.cov([arr, acr], ddof=0)[0, 1])
print(np.cov([arr, acr], ddof=1)[0, 1])
print(daml_cov(arr, acr, 0))
print(daml_cov(arr, acr, 1))

5. Correlation (★★☆)

$$corr(X, Y) = r = \frac{cov(X, Y)}{\sigma_x \sigma_y} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Note: You only need to implement the correlation coefficient between two arrays. There is no need for the entire correlation matrix. Also, the degrees-of-freedom correction is irrelevant for correlation: the factor $1/(N - ddof)$ cancels between numerator and denominator, provided the same ddof is used for the covariance and both standard deviations.
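
The simplification can be written out explicitly. Writing $d$ for ddof and substituting the definitions of covariance and standard deviation given earlier (with the same $d$ in all three):

$$r = \frac{\frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\frac{1}{N - d} \sum_{i=1}^{N} (x_i - \bar{x})^2} \, \sqrt{\frac{1}{N - d} \sum_{i=1}^{N} (y_i - \bar{y})^2}} = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$

The two square roots in the denominator together contribute a factor $\frac{1}{N - d}$, which cancels the same factor in the numerator.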

In [ ]:
def daml_corr(x, y):
    pass

# test
print(np.corrcoef([arr, acr])[0, 1])
print(daml_corr(arr, acr))