Let's try running some code in the notebook and learn about some IPython extras that are not available in plain Python.
Remind yourself that to run a code cell one can use the run button at the top or use one of the keyboard shortcuts.
The shortcuts for running a cell were:
Alt + Enter runs the current cell and inserts a new code cell below.
Shift + Enter runs the current cell and goes to the next cell.
Ctrl + Enter runs the current cell and stays on the same cell.
The kernel state changes by running cells, and one can reference variables defined in other cells. Note that this can catch you unaware if you redefine a variable.
x = 42
print('the answer is', x)
In general software engineering this is called global state, and it is usually considered bad practice. Yet in the notebook it comes as a friendly feature: since you can reference things in other (often earlier) cells, you can run the notebook sequentially and build on previous code. The downside of global state is that turning notebooks into full Python scripts may not be the best engineering solution for a large Python application.
Also, since $42$ is the answer, The Hitchhiker's Guide to the Galaxy may or may not be required to understand some examples. Whether or not we find it useful for doing machine learning, it is a worthwhile read on the state of modern technology.
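The way shared kernel state can catch you unaware may be sketched in plain Python. The comments below only mark where cell boundaries would be in a notebook (an assumption for illustration), and the names message and message_after are hypothetical.

```python
# Each commented block stands in for one notebook cell.
# All cells share a single namespace, just like cells attached to one kernel.

# Cell 1
x = 42

# Cell 2
message = 'the answer is ' + str(x)

# Cell 3 (written later, further down the notebook)
x = 7

# Cell 2 executed again: the same code now picks up the redefined x.
message_after = 'the answer is ' + str(x)

print(message)        # built while x was still 42
print(message_after)  # built after x was redefined
```

The code of the second cell never changed, only the state it ran against did.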
If we are after a single value only, we do not need to print it.
The notebook displays the value of the last expression in the cell, unless that value is None or the cell ends with an empty statement.
(Note that adding an empty statement, i.e. a trailing ; at the end of a cell, is a good way to suppress the output of the last value.)
'the answer is ' + str(x)
'the answer is ' + str(x);
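The None case is easy to run into with functions that work in place. As a small sketch (the variable names are hypothetical): a cell ending in values.sort() shows no output because the call returns None, whereas a cell ending in sorted(values) would display the new list.

```python
values = [3, 1, 2]

# list.sort() sorts the list in place and returns None,
# so a cell ending with this call would display nothing.
in_place_result = values.sort()

# sorted() returns a new list, so a cell ending with this
# expression would display its value.
sorted_copy = sorted([3, 1, 2])

print(in_place_result)
print(sorted_copy)
```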
The kernel is the process that executes the Python code we run.
The kernel may start an operation that takes too long or, sometimes, just crash.
In such a case one needs to restart the kernel: there is a restart button at the top of the interface, but the same can be done by pressing 0 twice (double zero) in command mode.
After restarting, all state of the kernel (including variables) is lost. You will need to rerun all cells you wish to work with. Also, when you start a notebook anew or continue a notebook that was previously shut down, a new kernel with clean state is attached to it. In other words, saving the notebook does not save kernel state.
A good practice is therefore to build a notebook sequentially, so that cells that appear later depend only on code above them. This sounds harder than it actually is: the typical behavior when experimenting is to keep moving down the notebook, adding new cells at the bottom as we write more Python code.
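What a restart means for cells run out of order can be imitated in plain Python. This is only a rough sketch: the empty dictionary fresh_kernel stands in for the clean kernel state, and exec stands in for running a cell. Neither is how Jupyter works internally, and all names are hypothetical.

```python
# Source code of two notebook cells, kept as strings for the simulation.
cell_defining_x = "x = 42"
cell_using_x = "message = 'the answer is ' + str(x)"

# A stand-in for the kernel state right after a restart: nothing is defined.
fresh_kernel = {}

# Running the later cell first fails, because x does not exist yet.
try:
    exec(cell_using_x, fresh_kernel)
    outcome = 'ran fine'
except NameError as err:
    outcome = 'NameError: ' + str(err)
print(outcome)

# Running the cells top to bottom works.
exec(cell_defining_x, fresh_kernel)
exec(cell_using_x, fresh_kernel)
print(fresh_kernel['message'])
```

A notebook whose cells depend only on code above them can always be brought back to a working state by running it from the top.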
What is printed to standard output in a cell is displayed as its output, and what is printed to the standard error stream is displayed in red. This is not very useful in the code you write in the notebook itself but, since most Python libraries print warnings to standard error, a red printout usually means that an internal warning of some sort was triggered.
print('cookies!')
import sys
print('the cookie jar is empty', file=sys.stderr)
Therefore, whenever you get a red printout, it is wise to check it closely. Most warnings can be ignored, but a handful may indicate problems that are very difficult to find and debug. A good rule is to ignore a warning only if you fully understand where it comes from.
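Library warnings of this kind typically come from Python's standard warnings module, which writes to standard error by default. Below is a minimal sketch of raising a warning and then capturing it for closer inspection. The function risky_mean is a hypothetical helper invented for this example and is not part of any library.

```python
import warnings

def risky_mean(values):
    """Hypothetical helper: warns instead of failing on empty input."""
    if len(values) == 0:
        warnings.warn('mean of an empty list, returning 0.0', RuntimeWarning)
        return 0.0
    return sum(values) / len(values)

# Record warnings instead of printing them, so they can be examined.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # make sure no warning is filtered out
    result = risky_mean([])

for w in caught:
    print(w.category.__name__, '-', w.message)
```

Without the catch_warnings block, the same call would produce the red printout in the notebook.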
The code in the cells is executed by the kernel synchronously but the communication with the kernel is asynchronous. The interface is therefore responsive but one needs to wait for the previous operation to finish before new output can be generated.
If you are seeing this in the notebook, execute the next two cells in quick succession. If you are reading the plain text, I strongly suggest you try out the notebook: programming, like mathematics, is not a spectator sport.
import time
time.sleep(7)
print('finished')
The second cell will only execute after the first cell completes. Yet the notebook interface will not prevent you from executing the second cell whilst the first one is running. The second cell will then be queued to execute as soon as the first cell finishes. This is useful to remember: if you execute a cell and it does not run straight away, the kernel is busy with an earlier cell and is placing new cells on a queue, and it will continue to do so for as long as that earlier cell is executing. You need to find the culprit cell that is taking a long time to execute and decide whether to wait for it or stop the execution.
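The queueing behavior can be pictured with a toy model. This is a sketch under loud assumptions: a deque plays the role of the execution queue and two hypothetical functions play the role of cells. It is not how the Jupyter kernel is actually implemented.

```python
from collections import deque
import time

def slow_cell():
    """Stand-in for a cell that takes a while to finish."""
    time.sleep(0.2)
    return 'slow cell finished'

def quick_cell():
    """Stand-in for a cell submitted while the slow one is still running."""
    return 'finished'

# Cells are queued in the order they were submitted.
queue = deque([slow_cell, quick_cell])

outputs = []
while queue:
    cell = queue.popleft()  # first in, first out
    outputs.append(cell())  # the next cell starts only after this one returns

print(outputs)
```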
Let's see the asynchronous communication in action. When you run the following code, the Jupyter interface builds the output gradually.
for i in range(10):
    print(i)
    time.sleep(0.5)
If a cell is expected to run for a long time, as will happen when we train expensive models, one can print a message every handful of computations in order to follow the progress of the code. Debugging complex code is often easier with well-placed printouts than with complex debuggers.
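A progress printout of this kind could look as follows. This is a minimal sketch: the loop body is a placeholder for a real computation, and the names n_steps and report_every are made up for the example.

```python
import time

n_steps = 50       # total number of computations (placeholder value)
report_every = 10  # print a progress line once per this many steps

total = 0
for step in range(1, n_steps + 1):
    total += step ** 2  # placeholder for an expensive computation
    time.sleep(0.01)    # pretend the computation takes some time
    if step % report_every == 0:
        print('step', step, 'of', n_steps, '- running total:', total)
```

Reporting only every few steps keeps the output short while still showing that the cell is alive.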
For very large outputs a scrollbar will be added. To control the scrolling there are options (Enable/Disable Scrolling for Outputs) in the context menu of the cell. Of course, if you are reading the plain text there will be no scroll bar. The scroll bar and the asynchronous output work well together, so you do not need to worry about printing too much from inside a large calculation.
for i in range(20):
    print(i**2)
These are pretty much all the basics needed to use the Jupyter notebook to run Python code. Jupyter, of course, has much more to offer, but our objective is to learn the tools for machine learning, and that's enough of Jupyter for now. Feel free to explore the full Jupyter documentation: