01.01 Running Code in the Notebook

Let's try running some code in the notebook, and learn about some IPython extras that are not available in plain Python.

Recall that to run a code cell you can use the Run button at the top or one of the keyboard shortcuts. The shortcuts for running a cell are:

  • Alt + Enter runs the current cell and inserts a new code cell below.
  • Shift + Enter runs the current cell and moves to the next cell.
  • Ctrl + Enter runs the current cell and stays on the same cell.

Cell State Spans the Entire Notebook (a.k.a. Global State)

Running a cell changes the kernel state, and any cell can reference variables defined in other cells. This can catch you off guard if you redefine a variable.

In [1]:
x = 42
In [2]:
print('the answer is', x)
the answer is 42

In general software engineering this is called global state, and relying on it is considered very bad practice. In the notebook, however, it is a friendly feature: since you can reference things defined in other (usually earlier) cells, you can run the notebook sequentially and build on previous code. The downside of global state is that a notebook built this way does not translate well into the scripts and modules of a large Python application.
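To make the shared state concrete, here is a minimal sketch that models the kernel as a single dictionary shared by every cell, with Python's built-in `exec` standing in for running a cell. The names `kernel_namespace` and `run_cell` are hypothetical and not part of Jupyter; the point is only to show how re-running an earlier cell after a later redefinition silently picks up the new value.

```python
# Hypothetical stand-in for the kernel: one dictionary shared by every cell.
kernel_namespace = {}

def run_cell(source):
    """Execute one cell's source code against the shared namespace."""
    exec(source, kernel_namespace)

run_cell('x = 42')                              # a cell near the top
run_cell("answer = 'the answer is ' + str(x)")  # a cell further down
print(kernel_namespace['answer'])               # -> the answer is 42

run_cell('x = 7')                               # x quietly redefined in a later cell
run_cell("answer = 'the answer is ' + str(x)")  # re-running the earlier cell
print(kernel_namespace['answer'])               # -> the answer is 7
```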

Also, since 42 is the answer, The Hitchhiker's Guide to the Galaxy may or may not be required reading for understanding some examples. Whether or not it proves useful for machine learning, it is a worthwhile read on the state of modern technology.

If we only want to see a single value, we do not need to print it. The notebook displays the value of the last expression in the cell, unless that value is None or the cell ends with an empty statement. (Adding an empty statement, i.e. a trailing ;, at the end of a cell is a good way to suppress the output of the last value.)

In [3]:
'the answer is ' + str(x)
Out[3]:
'the answer is 42'
In [4]:
'the answer is ' + str(x);
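The display rule can be sketched with the standard `ast` module: parse the cell, execute everything except a trailing expression, then evaluate that expression and return its value unless the cell ends with a semicolon. The helper `cell_display_value` is hypothetical and deliberately simplified (for instance, it does not handle a comment after the semicolon); IPython's actual logic, configurable through its `ast_node_interactivity` setting, covers more cases.

```python
import ast

def cell_display_value(source, namespace):
    """Hypothetical helper: run a cell and return the value the notebook
    would display, or None when nothing would be shown."""
    tree = ast.parse(source)
    if not tree.body:
        return None  # empty cell: nothing to run or show
    last = tree.body[-1]
    if not isinstance(last, ast.Expr):
        # The cell ends with a statement such as an assignment: no output.
        exec(compile(tree, '<cell>', 'exec'), namespace)
        return None
    # Run every statement before the trailing expression...
    head = ast.Module(body=tree.body[:-1], type_ignores=[])
    exec(compile(head, '<cell>', 'exec'), namespace)
    # ...then evaluate the trailing expression itself.
    value = eval(compile(ast.Expression(body=last.value), '<cell>', 'eval'), namespace)
    if source.rstrip().endswith(';'):
        return None  # a trailing semicolon suppresses the display
    return value     # a None value is likewise not displayed

namespace = {'x': 42}
print(repr(cell_display_value("'the answer is ' + str(x)", namespace)))   # -> 'the answer is 42'
print(repr(cell_display_value("'the answer is ' + str(x);", namespace)))  # -> None
```

Note that the suppressed expression is still evaluated; the semicolon only hides its value.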

Global State

[Image: Uncle Sam and Joseph Stalin side by side under a common banner]
There is a curious fact here about how we humans associate concepts and images. We read the global state in this image as a state encompassing the entire planet, and we do so by imagining the two major competitors of the Cold War (in the second half of the 20th century) united under a common banner. Here we have Uncle Sam (initials U.S.), the personification of the United States government, and Joseph Stalin, the leader of the Soviet Union from the 1920s until his death in 1953. The juxtaposition of the two figures brings to mind the Cold War between the Soviet Union and the United States. Yet the association is looser than it seems! Uncle Sam dates from the War of 1812, more than a century before the Cold War, and Stalin saw only its first years, since the Cold War continued for almost four decades after his death.

Restarting the Kernel

The kernel is the process that executes the Python code we run. The kernel may start an operation that takes too long or, sometimes, simply crash. In such cases you need to restart the kernel: there is a restart button at the top of the interface, and the same can be done by pressing 0 twice (00) in command mode.

After a restart, all kernel state (including variables) is lost, and you need to rerun every cell you wish to work with. Likewise, when you open a notebook anew or reopen one that was previously shut down, a new kernel with a clean state is attached to it. In other words, saving the notebook does not save the kernel state.

A good practice, therefore, is to build a notebook sequentially, so that cells that appear later depend only on code above them. This is easier than it sounds: when experimenting, we typically keep moving down the notebook and add new cells below as we write more Python code.
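A rough way to check that a notebook is sequential is to mimic a restart followed by running every cell from top to bottom in a fresh namespace, which is what a clean kernel gives you. The helper `restart_and_run_all` below is a hypothetical sketch built on `exec`, not a Jupyter API; it flags the first cell that raises a NameError, for example because it uses a name that is only defined further down.

```python
def restart_and_run_all(cells):
    """Hypothetical helper mimicking 'restart the kernel, then run all cells':
    every cell is executed top to bottom in one brand-new namespace."""
    namespace = {}  # a freshly restarted kernel holds no user variables
    for number, source in enumerate(cells, start=1):
        try:
            exec(source, namespace)
        except NameError as err:
            # This cell uses a name that no cell above it has defined.
            raise RuntimeError(f'cell {number} is not sequential: {err}') from err
    return namespace

sequential = ['x = 42', "answer = 'the answer is ' + str(x)"]
print(restart_and_run_all(sequential)['answer'])  # -> the answer is 42

out_of_order = ["answer = 'the answer is ' + str(x)", 'x = 42']
try:
    restart_and_run_all(out_of_order)
except RuntimeError as err:
    print(err)  # reports that cell 1 is not sequential
```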

Output

Whatever a cell prints to standard output is displayed as its output, and whatever it prints to the standard error stream is displayed in red. This is rarely useful for the code you write in the notebook itself but, since most Python libraries print warnings to standard error, a red printout usually means that a warning of some sort was triggered.

In [5]:
print('cookies!')
cookies!
In [6]:
import sys
print('the cookie jar is empty', file=sys.stderr)
the cookie jar is empty

Therefore, whenever you get a red printout, it is wise to check it closely. Most warnings can be ignored, but a handful may indicate problems that are very difficult to find and debug. Good practice is to ignore only those warnings whose provenance you fully understand.
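Once you have tracked down a warning and decided it is harmless, one option is to silence just that warning rather than all of them. A minimal sketch with the standard `warnings` module, where `library_function` is a hypothetical stand-in for library code: `filterwarnings` matches the start of the message text (as a regular expression) together with the category, so any other warning still gets through.

```python
import warnings

def library_function():
    """Hypothetical library code emitting a warning we have investigated."""
    warnings.warn('the cookie jar is almost empty', UserWarning)
    return 'cookies'

with warnings.catch_warnings(record=True) as seen:
    warnings.simplefilter('always')  # record every warning by default
    # Ignore only the understood warning, matched by message and category.
    warnings.filterwarnings('ignore', message='the cookie jar is almost empty',
                            category=UserWarning)
    library_function()
    warnings.warn('something unexpected happened')  # still recorded

print([str(w.message) for w in seen])  # -> ['something unexpected happened']
```

Scoping the filter inside `catch_warnings` also restores the original filters on exit, so the silencing does not leak into the rest of the notebook.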

Communication with the Kernel

The kernel executes the code in the cells synchronously, but the interface communicates with the kernel asynchronously. The interface therefore stays responsive, but you need to wait for the previous operation to finish before new output can be generated.

If you are viewing this in the notebook, execute the next two cells in quick succession. If you are reading the plain text, I strongly suggest you try out the notebook: programming, like mathematics, is not a spectator sport.

In [7]:
import time
time.sleep(7)
In [8]:
print('finished')
finished

The second cell only executes after the first cell completes. Yet the notebook interface does not prevent you from executing the second cell while the first one is running; the second cell is simply queued and executes as soon as the first one finishes. This is worth remembering: if you execute a cell and it does not start running straight away, the kernel is queueing cells while it waits for some earlier cell to finish. You then need to find the culprit cell that is taking a long time and decide whether to wait for it or stop its execution.
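The queueing behavior can be modelled with a worker thread that pulls execution requests from a first-in, first-out queue while the submitting side returns immediately. This is only a toy model built on the standard `queue` and `threading` modules, with `time.sleep` standing in for real work; a real Jupyter kernel receives execution requests as messages, but the ordering it illustrates is the same.

```python
import queue
import threading
import time

requests = queue.Queue()  # pending execution requests, first in, first out
log = []                  # records the order in which things happen

def kernel():
    """Hypothetical kernel loop: execute queued cells one at a time, in order."""
    while True:
        label, seconds = requests.get()
        if label is None:      # sentinel value: shut the kernel down
            break
        time.sleep(seconds)    # stand-in for running the cell's code
        log.append(f'finished {label}')

worker = threading.Thread(target=kernel)
worker.start()

# The "interface" submits both cells right away and never blocks on them.
for label, seconds in [('In [7]', 0.5), ('In [8]', 0.0)]:
    requests.put((label, seconds))
    log.append(f'submitted {label}')

requests.put((None, 0))  # ask the kernel to stop once the queue is drained
worker.join()
print(log)
```

Both cells are submitted almost instantly, yet `In [8]` only finishes after the slow `In [7]`, exactly as in the notebook.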

Asynchronous Output

Let's see the asynchronous communication in action. When you run the following code, the Jupyter interface builds up the output gradually.

In [9]:
for i in range(10):
    print(i)
    time.sleep(0.5)
0
1
2
3
4
5
6
7
8
9

If a cell is expected to run for a long time, as will happen when we train expensive models, you can print something every handful of iterations to follow the progress of the code. Complex code is also often easier to debug with well-placed printouts than with a full-blown debugger.
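A minimal sketch of periodic progress reporting: the loop below prints a line every `report_every` steps and on the last step. The function `train` and its arithmetic are hypothetical placeholders for an expensive computation; `flush=True` asks `print` to write the line out immediately rather than leaving it in a buffer.

```python
import time

def train(num_steps, report_every):
    """Hypothetical long-running loop that reports its progress periodically."""
    total = 0.0
    for step in range(1, num_steps + 1):
        total += step ** 0.5  # stand-in for an expensive computation
        time.sleep(0.01)
        if step % report_every == 0 or step == num_steps:
            print(f'step {step}/{num_steps}, running total {total:.2f}', flush=True)
    return total

result = train(num_steps=50, report_every=10)  # prints five progress lines
```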

For very large outputs a scrollbar is added. Scrolling can be controlled with the Enable/Disable Scrolling for Outputs options in the cell's context menu. Of course, if you are reading the plain text, there is no scrollbar. The scrollbar and the asynchronous output work well together, so you do not need to be overly careful about printing a lot from inside a large calculation.

In [10]:
for i in range(20):
    print(i**2)
0
1
4
9
16
25
36
49
64
81
100
121
144
169
196
225
256
289
324
361

These are pretty much all the basics needed to use the Jupyter notebook to run Python code. Jupyter, of course, has much more to offer, but our objective is to learn the tools for machine learning, and this is enough Jupyter for that purpose. Feel free to explore the full Jupyter documentation: