A random walk is a mathematical process in which random steps are taken repeatedly. Each step consists of adding a random number to the current cumulative sum of all previous steps. Variants of the random walk allow the random numbers to be units (-1 or 1), floating-point values, limited to a range, and/or drawn from a distribution.
Several stochastic processes can be modeled as a random walk, notably processes seen in nature. We will simulate a handful of random walks and get the hang of vectorized computing and simple plotting in the process.
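As a rough illustration of the variants mentioned above, the sketch below draws three kinds of steps with NumPy and accumulates each into a walk. The variable names, the seed, the number of steps, and the choice of a standard normal distribution are assumptions made for this example only.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed, chosen only for repeatability
steps = 8

# Unit steps: each step is either -1 or 1.
unit_steps = rng.choice([-1, 1], size=steps)

# Bounded floating-point steps: uniform in [-1.0, 1.0).
float_steps = rng.uniform(-1.0, 1.0, size=steps)

# Steps taken from a distribution: here a standard normal (an assumption).
normal_steps = rng.normal(loc=0.0, scale=1.0, size=steps)

# In every case the walk is the running total of its steps.
unit_walk = unit_steps.cumsum()
float_walk = float_steps.cumsum()
normal_walk = normal_steps.cumsum()
```

Only the way the steps are drawn changes between variants; the accumulation is the same.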
The pyplot module is an interface to many plotting facilities in matplotlib.
For now, do not worry about the %matplotlib inline line.
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('seaborn-talk')  # on matplotlib 3.6 and later this style is named 'seaborn-v0_8-talk'
We generate a simple random walk by generating random numbers and summing them.
rng = np.random.RandomState(42)
steps = 1024
numbers = rng.randint(-9, 10, steps)
sums = numbers.cumsum()
plt.figure(figsize=(14, 6))
plt.plot(sums);
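To make the connection with the definition explicit, here is a minimal sketch that builds the same walk with an ordinary Python loop and compares it with the cumulative sum. The names position, walk and same are introduced only for this illustration.

```python
import numpy as np

rng = np.random.RandomState(42)
numbers = rng.randint(-9, 10, 1024)  # upper bound is exclusive: values lie in -9..9

# Explicit loop: keep a running total and record it after every step.
position = 0
walk = []
for step in numbers:
    position += step
    walk.append(position)

# The vectorized cumulative sum yields the same sequence.
same = np.array_equal(np.array(walk), numbers.cumsum())
```

Here same evaluates to True: cumsum performs the loop for us.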
What we did is a loop without actually writing a loop: an aggregation. Yet, we can do better. If we generate a matrix of random numbers we can sum across columns, and build several random walks at once.
rng = np.random.RandomState(42)
steps = 1024
runs = 32
numbers = rng.randint(-9, 10, (runs, steps))
sums = numbers.cumsum(axis=1)
plt.figure(figsize=(14, 6))
plt.plot(sums.T);
That was a lot to take in, so let's slow down. We will reduce the number of steps and walks and take an in-depth look at every piece of that code.
rng = np.random.RandomState(42)
steps = 6
runs = 3
numbers = rng.randint(-9, 10, (runs, steps))
numbers
We have three sets of random numbers, three rows in a matrix.
sums = numbers.cumsum(axis=1)
sums
axis=1 means: perform the aggregation across columns.
Therefore we performed a cumulative sum inside each row,
and we now have three random walks, one in each row.
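The effect of the axis argument is easier to see on a small matrix written by hand. The sketch below uses made-up values (steps_matrix is a name chosen for this example) and shows the result for each choice of axis.

```python
import numpy as np

# A small hand-written matrix: 2 rows (walks) by 3 columns (steps).
steps_matrix = np.array([[1, 2, 3],
                         [10, 20, 30]])

# axis=1 accumulates along each row, moving across the columns.
per_row = steps_matrix.cumsum(axis=1)
# [[ 1  3  6]
#  [10 30 60]]

# axis=0 accumulates down each column, moving across the rows.
per_column = steps_matrix.cumsum(axis=0)
# [[ 1  2  3]
#  [11 22 33]]

# With no axis given, the matrix is flattened before accumulating.
flat = steps_matrix.cumsum()
# [ 1  3  6 16 36 66]
```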
sums.T
The plotting engine understands columns as separate functions,
therefore we need to transpose (.T) the matrix for plotting.
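One way to check the column convention is to count the lines that plt.plot returns. The sketch below uses a tiny hand-written matrix and assumes it runs outside a notebook, hence the off-screen Agg backend; inside a notebook the matplotlib.use line can be dropped.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, assumed here so the sketch runs as a script
import matplotlib.pyplot as plt
import numpy as np

# 2 walks (rows) of 4 steps (columns), values made up for illustration.
walks = np.array([[1, 2, 3, 4],
                  [-1, -2, -3, -4]])

plt.figure()
lines_as_is = plt.plot(walks)         # one line per column: 4 lines of 2 points each
plt.figure()
lines_transposed = plt.plot(walks.T)  # one line per walk: 2 lines of 4 points each
plt.close("all")

n_as_is = len(lines_as_is)            # 4
n_transposed = len(lines_transposed)  # 2
```

Without the transpose we would get one short line per step instead of one line per walk.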
We only need to call plt.figure to parametrize the image, in this case its size, in inches.
Stock matplotlib (2.0 and later) defaults to 6.4 by 4.8 inches unless a style sheet or backend overrides it, which is quite small for most uses.
Note: The notebook main area uses a resolution of 72 DPI (dots per inch), which means that its width of 14-15 inches is only around 1024 pixels. Any image bigger than that (in pixels or inches) will be scaled down, and its aspect ratio adapted accordingly.
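The arithmetic behind the note is simply inches multiplied by DPI. A quick check, taking the 72 DPI figure from the note as given:

```python
dpi = 72                     # resolution stated in the note
width_in, height_in = 14, 6  # the figsize used in this notebook

width_px = width_in * dpi    # 1008
height_px = height_in * dpi  # 432
widest_px = 15 * dpi         # 1080, the upper end of the 14-15 inch range
```

A 14 by 6 inch figure is therefore about 1008 by 432 pixels at this resolution.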
plt.figure(figsize=(14, 6))
plt.plot(sums.T);
Let's go back to the full example.
rng = np.random.RandomState(42)
steps = 1024
runs = 32
numbers = rng.randint(-9, 10, (runs, steps))
sums = numbers.cumsum(axis=1)
plt.figure(figsize=(14, 6))
plt.plot(sums.T);
We can extract statistics (also called features) about the walks. For example, the walk that reached the highest number at the end.
sums[:, -1].argmax()
Or the walk that ended at the smallest value.
sums[:, -1].argmin()
How many walks ended on the positive side?
np.sum(sums[:, -1] > 0)
How many walks ended above 100 or below -100?
np.sum(np.abs(sums[:, -1]) > 100)
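Note that sums[:, -1] looks only at the final position of each walk. The sketch below, on three hand-written walks with a made-up limit of 5, contrasts that with asking whether a walk ever went beyond the limit; all names in it are chosen for this illustration.

```python
import numpy as np

# Three hand-written walks (rows) and a small limit, for illustration only.
walks = np.array([[1, 3, 6, 7],      # crosses 5 and ends beyond it
                  [2, 6, 4, 1],      # crosses 5 but comes back
                  [-1, -2, -1, 0]])  # never crosses 5
limit = 5

ended_beyond = np.abs(walks[:, -1]) > limit        # last column only
ever_beyond = (np.abs(walks) > limit).any(axis=1)  # every step of each walk

n_ended = np.sum(ended_beyond)  # 1
n_ever = np.sum(ever_beyond)    # 2
```

Summing a boolean array counts its True values, which is why np.sum works as a counter here.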
Which walk was the first to stray beyond 100 (or -100) from the origin, among those that also ended beyond it?
This is slightly more complicated because we want to consider only the walks that stray that much.
Also note that argmax (and argmin) will take the first maximum (or minimum) value in an array, i.e. when there is more than a single maximum value.
In the case of a boolean array (as below) argmax will give the index of the first True value.
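These properties of argmax can be checked on a few tiny arrays. The values below are made up; the last case hints at why walks that never cross the threshold must be filtered out first.

```python
import numpy as np

# Ties: argmax reports the first occurrence of the maximum.
first_max = np.array([3, 7, 7, 1]).argmax()                 # 1

# Boolean array: True is the maximum, so we get the index of the first True.
first_true = np.array([False, False, True, True]).argmax()  # 2

# Caveat: with no True at all, every entry ties at False and the result is 0,
# which cannot be told apart from "True at index 0".
no_true = np.array([False, False, False]).argmax()          # 0
```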
above = np.abs(sums[:, -1]) > 100
(np.abs(sums[above, :]) > 100).argmax(axis=1).argmin()
Let's have a look at that specific random walk.
sums[above, :][4, :128]
Pretty sensible, this random walk reaches 100 very quickly.
Another way to reach the same array is to calculate which row index inside sums corresponds to a given row index inside the filtered array sums[above, :].
idx = 4
sums[(above.cumsum() == (idx + 1)).argmax(), :128]
Can you figure out how this works?
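One way to check your answer is to trace the expression on a small mask. The mask and index below are hypothetical, and np.flatnonzero is shown only as an alternative route to the same row, not as the method used above.

```python
import numpy as np

above = np.array([False, True, False, True, False, True])  # hypothetical mask
idx = 1  # second row of the filtered array

# The running count of True values tells how many selected rows we have seen.
counts = above.cumsum()     # [0 1 1 2 2 3]

# The count first reaches idx + 1 exactly at the row we are looking for.
hits = counts == (idx + 1)  # [False False False  True  True False]
row = hits.argmax()         # 3, the first True

# An alternative: list the positions of the True values and index that list.
row_alt = np.flatnonzero(above)[idx]  # 3
```

The count stays at idx + 1 until the next selected row, so taking the first True with argmax is what picks the right position.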