In data science, presenting our results is as important as achieving them. The Jupyter project (previously called IPython Notebook) provides a notebook application which allows runnable pieces of code to be mixed with text and images explaining them. Support for languages other than Python was added a handful of years ago, although the notebook is still mostly a Python niche.
There are several flavors of notebooks in Jupyter. One may still use (some companies do) old Python 2 libraries and IPython notebooks, but Python 2 is dead and buried; it is time to stop practicing necromancy and update to a modern language. The Jupyter Notebook application superseded the IPython Notebook and has been in use for a long time, although the application is finally starting to show its age. If you are familiar with the Jupyter Notebook application feel free to use it to run the examples here; they all work in it.
What we will use and describe is the Jupyter Lab application, which can run notebooks, open terminals and edit files, and has many features on top of the Jupyter Notebook application. Jupyter Lab will soon supersede the Jupyter Notebook application, so we shall learn it for the future. First a note on terminology: Jupyter Lab is the application that runs, among other things, jupyter notebooks; the Jupyter Notebook application is the older program used to run jupyter notebooks; a jupyter notebook is a file that can be run by either the Jupyter Lab application or the Jupyter Notebook application. Yes, that's as confusing as it looks. Try reading this paragraph thrice and differentiating the concepts in your mind.
From now on we will refer to a jupyter notebook as the running notebook inside the application - preferably Jupyter Lab - on your machine.
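To make the "a jupyter notebook is a file" part of the terminology concrete, here is a minimal sketch that writes a tiny notebook by hand and reads it back. It assumes version 4 of the notebook file format, in which the file is plain JSON holding a list of cells. The file name `tiny_example.ipynb` and the cell contents are made up for illustration.

```python
import json
import os
import tempfile

# A minimal notebook in version 4 of the file format: a JSON object
# holding a list of cells plus some metadata.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# A tiny notebook\n", "Text explaining the code below."],
        },
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print('hello from a code cell')"],
        },
    ],
}

# Hypothetical file name, written to the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "tiny_example.ipynb")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(notebook, handle, indent=1)

# Reading it back shows that the file is ordinary JSON.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)

cell_types = [cell["cell_type"] for cell in loaded["cells"]]
print(cell_types)
```

Either application should be able to open the resulting file, since both read the same format.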
Similar to some well known editors (e.g. vi) the jupyter notebook is modal, i.e. it has modes where different commands are accepted. By default it has two modes: an edit/input mode and a command/run mode. Some extensions provide more modes.
Behind the scenes the notebook is connected to a process which executes the commands and returns their results. The engine (called the kernel by the jupyter project) runs inside a local (or remote) webserver talking to the notebook. Communication between the notebook and the kernel is asynchronous, making for a very responsive interface.
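The asynchronous exchange between notebook and kernel can be pictured with a toy model. The sketch below is not the real Jupyter messaging protocol: it only assumes a worker thread standing in for the kernel and two queues standing in for the connection, and the names `toy_kernel`, `requests` and `replies` are invented for this illustration.

```python
import queue
import threading


def toy_kernel(requests, replies):
    """Execute code sent by the front end until a None sentinel arrives."""
    namespace = {}  # state kept between executions, like a live session
    while True:
        message = requests.get()
        if message is None:
            break
        msg_id, code = message
        try:
            exec(code, namespace)
            replies.put((msg_id, "ok"))
        except Exception as error:
            replies.put((msg_id, f"error: {error!r}"))


requests = queue.Queue()
replies = queue.Queue()
worker = threading.Thread(target=toy_kernel, args=(requests, replies))
worker.start()

# The front end sends both requests without waiting for the first reply.
requests.put((1, "x = 21 * 2"))
requests.put((2, "y = x + undefined_name"))
requests.put(None)  # ask the toy kernel to stop
worker.join()

# Replies are collected afterwards, matched to requests by their id.
results = [replies.get() for _ in range(2)]
print(results)
```

The point of the sketch is only that sending a request and receiving its result are decoupled, which is what lets the interface stay responsive while code is running.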
Opening Jupyter Lab one sees a left sidebar which shows the contents of the file system, and a right work area presenting buttons to create some of the file types supported by Jupyter Lab. The general interface looks like the following:
Below we will describe a typical workflow through the interface. But first let's write out a reference for each part so we have something to come back to.
Left Sidebar
Menu Bar
Work Area
A typical workflow through a session is as follows. First open Jupyter Lab and navigate in the left sidebar to the place where a notebook is to be created. Create the notebook from the launcher in the main work area. By default the notebook will be named Untitled.ipynb; one can rename it by right clicking the file in the left sidebar.
Once the notebook is running one would add cells, often interleaving markdown and code cells. Code cells are the default when adding a cell; they contain Python code and can be run with the run button or with a handful of shortcuts. To run a code cell from the keyboard one can use:
- Alt + Enter to run the cell and insert a new code cell below it.
- Shift + Enter to run the cell and advance to the next cell.
- Ctrl + Enter to run the code cell without moving.
Code cells may produce output which is then displayed below the cell. The output can be a printout from the code or be more complex such as displaying an image or a graph. The last shortcut only works with code cells, yet it is probably the most useful shortcut when exploring notebooks written by someone else.
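As a small illustration of a code cell and its output, the following could be the contents of a single cell. The variable names are made up. With the default Python kernel, printed text shows up below the cell, and so does the value of the last expression.

```python
# Contents of a hypothetical code cell.
squares = [n * n for n in range(5)]
print("squares:", squares)  # printed output appears below the cell

total = sum(squares)
total  # in a notebook, the value of the last expression is displayed too
```

Running this cell repeatedly without moving to the next one is a convenient way to tweak the code and watch the output change.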
Markdown cells exist to annotate the document. The text you are reading was originally written in markdown cells in a jupyter notebook. One changes from a code cell to a markdown cell by using the drop down menu at the top, or using the shortcuts:
- Ctrl + 1 makes the cell a code cell
- Ctrl + 2 makes the cell a markdown cell
There also exist raw cells; these are for jupyter extensions that may create other cell types.
Markdown is a very simple plain text format that can be easily transformed into an HTML presentation. It is similar to LaTeX in that paragraphs are separated by blank lines, and similar to plain text emails in that emphasis is done by surrounding words with asterisks or underscores. There are several flavors of markdown, but one can simply run the cell to see how the syntax renders.
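To show how little is needed to turn such text into HTML, here is a toy converter. It is a sketch under strong assumptions, not a real markdown implementation: it only handles paragraphs separated by blank lines and emphasis marked with single asterisks, and the function name `toy_markdown_to_html` is invented for this example.

```python
import html
import re


def toy_markdown_to_html(text):
    """Convert blank-line separated paragraphs and *emphasis* to HTML."""
    # Split on blank lines and drop empty chunks.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        # Collapse line breaks inside a paragraph and escape HTML characters.
        escaped = html.escape(" ".join(paragraph.split()))
        # Words surrounded by single asterisks become emphasized text.
        emphasized = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", escaped)
        rendered.append(f"<p>{emphasized}</p>")
    return "\n".join(rendered)


sample = "Markdown is *plain* text.\n\nA blank line starts a new paragraph."
converted = toy_markdown_to_html(sample)
print(converted)
```

The notebook itself relies on a complete markdown renderer, so running a markdown cell remains the reliable way to check how a piece of syntax looks.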
The navigation across cells is modal, which means that a different set of commands work when one is editing a cell or when one is moving between cells. One can click on a selected cell or press Enter to enter the edit mode and modify the contents of the cell. By selecting another cell or pressing Esc one exits the edit mode and goes back to command mode, where cells can be moved up and down by dragging.
When one closes the tab containing a notebook, the code running in it keeps going. In order to stop the Python code one must either shut down the entire Jupyter Lab or select Close and Shutdown Notebook in the File menu. There are many more shortcuts in Jupyter Lab, but here is a handful of useful ones:
- Ctrl + Shift + D Single Document Mode, it hides the tabs until you execute the shortcut again.
- Ctrl + Shift + Q Close and Shutdown Notebook, as opposed to keeping it running in the background.
- Ctrl + S Saves the notebook.
Apart from notebooks Jupyter Lab has other uses. Two significant features are the ability to edit text files directly on the file system and the ability to open a terminal in order to perform more advanced operations by hand. Here is a quick reference for these two:
Text Editor
- [...] .txt, changing that will guess the file syntax.
- [...] __init__.py file is still needed.
Terminal View
- PTY connected through a websocket.
- xterm emulator on Linux/MacOS.
- PowerShell on MS Windows.
Other Jupyter Lab features include a CSV (Comma Separated Values) visualizer and a display for several formats of images and graphs.
The ipython program (improved Python) is a command line interface that can be understood as another interface to an IPython Kernel (in reality the IPython Kernel is a modified ipython binary, since ipython is an older project). For quick exploration, or just for people that prefer command line tools (yours truly included), ipython is a good option. Later, after some experimentation, one can move the results to a notebook for presentation.