We have seen a good deal of the capabilities of NumPy,
yet a big chunk still remains.
We saw how we can cleverly build and select from the arrays,
and we saw how the *axis* and *shape* are used
to perform operations across specific dimensions.

Now we jump into operations on the contents of the arrays.
This comes in two parts:
we either operate on the values in the array - a rather easy task with NumPy arrays -
or we *operate on the indexes into the array* - an operation
that can become mind-boggling but also the one that gives
NumPy its full power.
By indexing data in an array we form a new view,
and we can keep creating more and more complex views.
As long as we keep operating on views and indexes
we do not need to copy or move the data in memory,
which gives NumPy optimal computing speed and memory management.
For a start let's import our library.
For a start let's import our library.

In [1]:

```
import numpy as np
```

After all the talk about executing several operations at once we
finally dive into the procedures that execute these operations.
Universal functions, *ufuncs* for short, are optimized operations
that perform element-wise arithmetic on NumPy arrays.
Under the hood the operations are compiled, heavily optimized loops,
but from Python the operations appear to happen all at once.

To perform operations on NumPy arrays one adds, subtracts, multiplies, or divides as if dealing with plain numbers. Behind the scenes NumPy does its magic and performs the operation across the entire array. Let's take an array:

In [2]:

```
x = np.arange(6).reshape((2, 3))
x
```

Out[2]:

We can perform some algebra on the entire array and the operation happens over all its elements.

In [3]:

```
x + 7
```

Out[3]:

Number conversion is allowed but discouraged, as it forces NumPy to build a new array of a different type.

In [4]:

```
x / 3.14
```

Out[4]:

One must remember that we are working with fixed memory locations and fixed types. For example, attempting number conversion with an in-place assignment will fail:

In [5]:

```
x /= 3.14
```

This is because one cannot change the type of an array by assigning to it:
the converted type *may* need a different amount of memory.

As in Python, type conversion is upwards, towards floating point numbers, even if the result could be converted back to integers.

In [6]:

```
x // 3.14
```

Out[6]:

The full list of ufuncs is quite long but some of them are:

ufunc | operator | description
---|---|---
np.add | + | Add arguments element-wise.
np.subtract | - | Subtract arguments element-wise.
np.multiply | * | Multiply arguments element-wise.
np.divide | / | Return a true division of the inputs, element-wise.
np.floor_divide | // | Return the largest integer smaller or equal to the division of the inputs.
np.negative | - | Numerical negative, element-wise.
np.power | ** | First array elements raised to powers from second array, element-wise.
np.mod | % | Return element-wise remainder of division.
np.sin | | Trigonometric sine, element-wise.
np.cos | | Cosine element-wise.
np.tan | | Compute tangent element-wise.
np.arcsin | | Inverse sine, element-wise.
np.arccos | | Inverse cosine, element-wise.
np.arctan | | Inverse tangent, element-wise.

Note that not all operations are linked to an operator.
You can use them by directly invoking the ufunc on the array, e.g. `np.sin(x)`.
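As a quick sketch (not from the original text), the direct ufunc call and the operator form perform the same element-wise operation:

```
import numpy as np

x = np.arange(6).reshape((2, 3))
# the ufunc call and the operator are equivalent
np.add(x, 7)  # same as x + 7
# sine has no operator, the ufunc is the only form
np.sin(x)
```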

And there are aggregation functions too, some of them:

aggregation function | nan-safe version | description
---|---|---
np.sum | np.nansum | Compute sum of elements
np.prod | np.nanprod | Compute product of elements
np.mean | np.nanmean | Compute mean of elements
np.average | | Compute the weighted average of elements
np.median | np.nanmedian | Compute median of elements
np.std | np.nanstd | Compute standard deviation
np.var | np.nanvar | Compute variance
np.min | np.nanmin | Find minimum value
np.max | np.nanmax | Find maximum value
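To illustrate the nan-safe column, a minimal sketch of the difference between a plain aggregation and its nan-safe counterpart:

```
import numpy as np

a = np.array([1.0, 2.0, np.nan, 4.0])
np.sum(a)     # nan - a single NaN poisons the plain aggregation
np.nansum(a)  # 7.0 - the nan-safe version ignores the NaN
```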

What is different about aggregation functions is that they again accept the `axis=` parameter.
Again, the tricky part is that the argument means the *axis along which
the aggregation takes place*, i.e. the dimension that is collapsed.
This is best seen through examples:

In [7]:

```
x = np.arange(18).reshape((3, 6))
x
```

Out[7]:

In [8]:

```
np.sum(x, axis=None)
```

Out[8]:

In [9]:

```
np.sum(x, axis=0)
```

Out[9]:

In [10]:

```
np.sum(x, axis=1)
```

Out[10]:

As with concatenation,
once more dimensions are used the concepts of rows and columns become fuzzy.
With more dimensions one needs to remember that the number given as the *axis*
is the index of the value in the *shape* of the array.
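A small sketch with a three dimensional array makes the point; the shapes of the results show which index of the *shape* was collapsed:

```
import numpy as np

x = np.arange(24).reshape((2, 3, 4))
# the axis argument is the index into the shape (2, 3, 4) that gets collapsed
np.sum(x, axis=0).shape  # (3, 4)
np.sum(x, axis=1).shape  # (2, 4)
np.sum(x, axis=2).shape  # (2, 3)
```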

Apart from algebraic arithmetic, index arithmetic is important in NumPy. We can use an array of indexes to index another array. This is a concept that seems not overly useful at first sight, but it is one of the most powerful concepts in vectorized computing: one can work on indexes of data whilst the data itself is kept immobile in memory.

Let us look at an example with a simple array:

In [11]:

```
x = np.arange(18).reshape((3, 6))
x
```

Out[11]:

We will index the array with two other arrays. The result holds the elements of the original array at positions $(1, 1)$, $(2, 1)$ and $(2, 3)$.

In [12]:

```
x[np.array([1, 2, 2]), np.array([1, 1, 3])]
```

Out[12]:

In the previous example we could have used Python lists but one can also use multidimensional arrays as indexes. For example:

In [13]:

```
x[np.array([[1, 2], [1, 2]]), np.array([[1, 1], [3, 1]])]
```

Out[13]:

Above we got the value at index $(2, 1)$ twice.
We can extend that concept:
we can use the `:` operator to ask for *everything*
in one of the indexing positions.
In this case we will ask for *all columns*.

In [14]:

```
x[np.array([1, 1]), :]
```

Out[14]:

This way we selected the entire second row twice. Operating on indexes is a powerful tool.

Since we can use arrays to index arrays, we can create arrays that work as masks. With fancy indexing we used arrays of integers as indexes. A mask is a boolean-only array of the same shape as the array being indexed, and we retrieve (index) only the entries for which the mask is true. But first, how do we generate a boolean array?

In [15]:

```
x = np.arange(18).reshape((3, 6))
x < 12
```

Out[15]:

This array can then be used as a mask that selects the values where a `True` is present.

In [16]:

```
x[x < 12]
```

Out[16]:

The result's shape is one dimensional because we cannot
tell in advance how many `True`s may be found.
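That said, nothing stops us from counting the `True`s beforehand; since `True` counts as $1$ in arithmetic, a small sketch:

```
import numpy as np

x = np.arange(18).reshape((3, 6))
mask = x < 12
mask.sum()     # 12 - the number of True entries
x[mask].shape  # (12,) - hence a one dimensional result of length 12
```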

On the other hand, we can use a boolean mask on a specific index position to select parts of the array. Here we select specific rows:

In [17]:

```
x[[True, False, True], :]
```

Out[17]:

We can use the `|` (or) and the `&` (and) operators.
Unfortunately, in plain Python these are bitwise operators with very high precedence,
therefore we need to add parentheses around the comparison expressions.

In [18]:

```
x[(x < 12) & (x > 3)]
```

Out[18]:

One can retain the shape by masking entire columns or rows only.
To build row and column based masks one can use `np.all` and `np.any`.

In [19]:

```
x[np.all((x < 12) | (x < -2), axis=1), :]
```

Out[19]:

The boolean operators are a form of NumPy *ufuncs*.

Finally, since Python loops are slow compared to NumPy operations,
NumPy uses boolean arrays as a substitute for if-else constructs.
Given a boolean array, `np.where` chooses from two arrays based on the
boolean values.
For example:

In [20]:

```
np.where([True, False, True, True, True], [7]*5, [2]*5)
```

Out[20]:
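In practice the boolean array is usually built from a condition on the data itself; a small sketch (the zero fill value here is an arbitrary choice):

```
import numpy as np

x = np.arange(18).reshape((3, 6))
# keep the values below 12, replace everything else with 0
np.where(x < 12, x, 0)
```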

Contrary to most operations in NumPy, sorting an array is performed in place. A priori, the in-place nature of sorting appears to contradict the NumPy attempt at not copying or moving data already in memory. But we will soon see that there are ways of sorting without moving the data.

Also, contrary to the aggregations we saw, `axis=None` cannot be used.
The default axis for sorting is `axis=-1`.
As in Python indexing, $-1$ in the NumPy axis means the highest possible value.
For two dimensional arrays `axis=-1` is equivalent to `axis=1`,
which sorts across the columns of each row.

In [21]:

```
x = np.array([
    [7, 7, 3, 1],
    [2, 3, 1, 1],
    [2, 3, 1, 2],
])
x.sort(axis=-1)
x
```

Out[21]:

Even though we cannot use `axis=None`, we can easily simulate such a sort
by first reshaping the array.

In [22]:

```
y = x.reshape(-1)
y.sort()
y
```

Out[22]:

And since sorting is in place, we can see that the original array has been modified. Note as well that this modification is good evidence that NumPy stores all values sequentially in memory.

In [23]:

```
x
```

Out[23]:
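For completeness, a sketch of the copying alternative: the `np.sort` function (as opposed to the `.sort` method) returns a sorted copy and leaves the original untouched:

```
import numpy as np

x = np.array([7, 1, 3, 2])
y = np.sort(x)  # a sorted copy
y               # array([1, 2, 3, 7])
x               # array([7, 1, 3, 2]) - unchanged
```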

Sorting in place is all good, but if we have a big piece of data that we want to
use for several purposes, and only one of those purposes requires sorting,
we may not want to copy all the data or sort it in place.
Moreover, sorting a view may mess up the data in the original place in memory.
To the rescue come the `arg` *ufuncs* of NumPy.

One way to obtain a sorted order without sorting in place is to use `argsort`.
`argsort` returns the indexes needed to sort the array
but does not actually sort the array itself.
Imagine the sort as arrows going from each unsorted position
to its sorted position: the sorting happens when the arrows cross each other,
and the assignment happens when the arrows land in their sequential positions.
What `argsort` does is perform the sorting but not the assignment.
It returns the indexes with which indexing the original array will
return a sorted array.

In [24]:

```
x = np.array([7, 1, 3, 2])
x.argsort()
```

Out[24]:
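To confirm the idea, a quick sketch of feeding the returned indexes back into the array with fancy indexing:

```
import numpy as np

x = np.array([7, 1, 3, 2])
idx = x.argsort()
idx     # array([1, 3, 2, 0]) - positions that would sort x
x[idx]  # array([1, 2, 3, 7]) - x itself is left untouched
```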

In the same way as in-place sorting,
`argsort` uses `axis=-1` as the default sorting axis
for multidimensional arrays.
For example, below we get the indexes needed across columns
to return a row-wise sorted array.

In [25]:

```
x = np.array([
    [7, 7, 3, 1],
    [2, 3, 1, 1],
    [2, 3, 1, 2],
])
x.argsort()
```

Out[25]:

To actually sort the multidimensional array we will use fancy indexing.
But we require one more concept from NumPy: how to force the dimensionality of arrays.
In order to change a one dimensional row array into a two dimensional column
array we use `np.newaxis` to index the array on the dimension we want to add.

In [26]:

```
np.arange(3)[:, np.newaxis]
```

Out[26]:

`np.newaxis` is just a fancy name for `None`,
and you will often see `None` used in place of `np.newaxis`.

Nevertheless, we can now use our multidimensional `argsort`
to sort the array without moving the data.

In [27]:

```
x[np.arange(3)[:, np.newaxis], x.argsort(axis=-1)]
```

Out[27]:

And in order to sort across rows we invert the indexing order.

In [28]:

```
x[x.argsort(axis=0), np.arange(4)]
```

Out[28]:

Combining fancy indexing, `arg` *ufuncs* and axis arguments results
in incredibly powerful operations.
Sorting is not the only operation that has `arg` *ufuncs*;
several aggregations allow for such *ufuncs*, including
`argmin` and `argmax`.
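A small sketch of `argmin` and `argmax` on the array used above; note that without an axis argument the returned index refers to the flattened array:

```
import numpy as np

x = np.array([
    [7, 7, 3, 1],
    [2, 3, 1, 1],
    [2, 3, 1, 2],
])
x.argmax()        # 0 - position of the (first) maximum in the flattened array
x.argmin(axis=1)  # array([3, 2, 2]) - column of each row's minimum
```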

Sorting is an expensive operation, and often we need to sort only a part of the data.
Another common sorting procedure is `partition`/`argpartition`,
which sorts only part of the data.
For example, to retrieve the two smallest elements from an array
we partition it with $2$ as the argument: the element at index $2$
lands in its sorted position, and everything before it is smaller.

In [29]:

```
x = np.array([6, 2, 3, 7, 1, 3, 2])
x[x.argpartition(2)]
```

Out[29]:
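`partition` itself works the same way without the indexing step; a sketch (the two smallest elements come first, but their order between themselves is not guaranteed):

```
import numpy as np

x = np.array([6, 2, 3, 7, 1, 3, 2])
p = np.partition(x, 2)
# the element at index 2 is in its final sorted position;
# everything before it is smaller or equal, everything after larger or equal
sorted(p[:2].tolist())  # [1, 2] - the two smallest elements
```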

There is a little more to *ufuncs* and arithmetic with arrays than
what we have discussed until now.
The *ufuncs*, indexing, concatenation, axis and shape together
abide by a NumPy concept called broadcasting.

When you add (or subtract, or multiply) a number against an array,
the number is applied to all elements; we know that this happens
because of *ufuncs*.
We say that the number is *broadcast* against all elements.

In [30]:

```
x = np.arange(18).reshape((3, 6))
x[0,:] + 42
```

Out[30]:

But we are not limited to broadcasting single numbers only.
As long as arrays have compatible *shapes* they can be broadcast together.

If, for any reason, two arrays need to be operated on together,
be it fancy indexing or *ufunc* arithmetic,
the arrays are broadcast together.
Broadcasting is a concept: the full broadcast structure
is not stored in memory, but the internal process NumPy
performs appears as if the arrays were broadcast before
being operated on.
For all practical purposes one can assume that when
a broadcast happens the array is fully broadcast in memory.
Some examples follow:
Some examples follow:

In [31]:

```
x[:,2:4] + np.arange(1, 3)
```

Out[31]:

In order to create a column vector we need to use `np.newaxis`.

In [32]:

```
x[:,2:4] + np.arange(1, 4)[:, np.newaxis]
```

Out[32]:

The row vector example is equivalent to the following.
Part of the broadcasting procedure is to *extend an axis at the front* of the array;
for finer control one can do it explicitly.

In [33]:

```
x[:,2:4] + np.arange(1, 3)[np.newaxis, :]
```

Out[33]:
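To see that the broadcast really is only conceptual, `np.broadcast_to` builds the broadcast view explicitly; a stride of $0$ on the repeated axis shows that no data is copied (a sketch, the exact stride values depend on the platform's integer size):

```
import numpy as np

row = np.arange(1, 3)             # shape (2,)
b = np.broadcast_to(row, (3, 2))  # the row virtually repeated over 3 rows
b.shape    # (3, 2)
b.strides  # stride 0 along axis 0 - no copy was made
```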

The Alexandra Palace was the birthplace of television.
The British Broadcasting Corporation (the BBC) built its mast there in 1936
and started transmitting the first television channel in the world
in the same year.
The tower was not the first television transmitter;
transmitters for television existed as early as the 1910s,
but fierce competition with radio marginalized television
technology for decades.
The tower in Alexandra Palace carried the first public television broadcast.
And, despite several interruptions, including World War II,
the tower is used to broadcast television channels to this day.

These are NumPy constructs that seem very complicated at first, but once you use them long enough they become second nature. It is important to notice that these two NumPy objects are function-like objects that are not called as functions; instead, we use indexing notation to get their functionality.

`np.c_` concatenates as many arrays as given over the last axis.
It forces all arrays to have at least $2$ dimensions,
hence it makes it easy to add columns to arrays.

In [34]:

```
x = np.arange(12).reshape((3, 4))
np.c_[x, [1, 2, 3], [1, 1, 1], [0, 0, 0], x]
```

Out[34]:

`np.r_` is what `np.c_` uses behind the scenes.
`np.r_` is very powerful but may be quite hard to grasp.
It may receive a string of three numbers; in order, these mean:
the axis to concatenate over,
the minimum number of dimensions to force all arrays to,
and the axis in which to place the existing dimension of the arrays given.

That already sounds quite confusing, but then come all the
remaining indexes into `np.r_`.
These indexes may be arrays but may also be slices.
The slices generate new arrays:
if the step of the slice is an integer then `np.arange`
is used to generate the array;
if the step is a complex number (`j` in Python) then
`np.linspace` is used instead.
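A minimal sketch of the two slice behaviors side by side:

```
import numpy as np

np.r_[1:7:2]   # integer step - np.arange(1, 7, 2): array([1, 3, 5])
np.r_[1:7:4j]  # complex step - np.linspace(1, 7, 4): array([1., 3., 5., 7.])
```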

The most common use of `np.r_` is to quickly generate data.
We will have a look at one such use here:
we generate each column of the following array
as a $10$ point `np.linspace`.

In [35]:

```
n = 10j
np.r_['1,2,0', 1:7:n, 0:3:n, 10:12:n]
```

Out[35]:

These constructs are quite complex, alright.
Yet one needs to be aware of them when reading NumPy code.
Both `np.c_` and `np.r_` are very convenient once harnessed,
and the only way to harness them is practice.