10.09 ANN Architectures

Apart from networks of interconnected perceptrons, several other neural network architectures have been developed over the years. We have only seen the autoencoder so far, but many others exist.



Deep Architectures

  • Deep Neural Networks (DNNs) are simply networks with lots of layers, but training many layers deep through backpropagation turns out to be hard. The vanishing gradient problem is tackled by careful selection of activation functions.

  • Convolutional Neural Nets are networks with one or more convolutional layers. These layers are not fully connected: each neuron sees only part of the input, so features are extracted from parts of the input.

  • Recurrent Neural Networks (RNNs) have connections going backwards, i.e. the output of a neuron feeds into the input of a neuron in the same or a previous layer. RNNs do not need to be deep networks but excel as such.

  • Long Short Term Memory (LSTM) networks are specifically constructed RNNs, built from gated units with very specific activation functions.

  • Autoencoders are DNNs which can reproduce the patterns they were presented with.
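
The vanishing gradient problem mentioned for deep networks can be illustrated with a minimal sketch. It assumes a hypothetical chain of 20 layers with one neuron each, all weights fixed at 1 and the same pre-activation value everywhere; this is a simplification for illustration, not a real training setup. Backpropagation multiplies the local derivatives of the layers, so an activation whose derivative is always below 1 shrinks the gradient at every layer.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    # Derivative of the sigmoid; it never exceeds 0.25.
    s = sigmoid(x)
    return s * (1.0 - s)


def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0


def chained_gradient(activation_grad, pre_activation, depth):
    """Multiply `depth` identical local derivatives, as the chain rule
    does along a path through `depth` layers (weights assumed to be 1)."""
    grad = 1.0
    for _ in range(depth):
        grad *= activation_grad(pre_activation)
    return grad


# Same (assumed) pre-activation of 0.5 in each of the 20 layers.
sigmoid_chain = chained_gradient(sigmoid_grad, 0.5, 20)
relu_chain = chained_gradient(relu_grad, 0.5, 20)

print("sigmoid:", sigmoid_chain)  # a tiny number, the gradient has vanished
print("relu:   ", relu_chain)     # the gradient passes through unchanged
```

With the sigmoid, the gradient reaching the first layer is many orders of magnitude smaller than at the output, whereas ReLU leaves it intact for positive inputs. This is one reason the choice of activation function matters in deep networks.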

Other Architectures

  • Hopfield Networks are early autoencoders; they can reproduce a known pattern when presented with a similar one.

  • Boltzmann Machines (BMs) are networks of probabilistic neurons where all neurons are connected in all directions. Input and output both go through the same neurons (the visible neurons).

  • Restricted Boltzmann Machines (RBMs) are BMs in which neurons within the same layer are not interconnected: connections run only between visible and hidden neurons. These are much easier to train than full BMs.

  • Deep Belief Networks are RBMs stacked on top of each other. Each RBM can be trained separately, and we can stack several layers of RBMs. These were the early DNNs.

  • Self-Organizing Maps are unsupervised networks for data visualization and dimensionality reduction. They use the concept that connections through which data passes should be reinforced whilst all other connections should decay.
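
The pattern recall of a Hopfield network can be sketched in a few lines of plain Python. This is a minimal illustration assuming Hebbian weights, neurons with states +1 and -1, and two small hand-picked patterns; the function names `train_hopfield` and `recall` are made up for this example.

```python
def train_hopfield(patterns):
    """Build Hebbian weights: neurons that agree in a stored pattern get a
    positive connection, neurons that disagree get a negative one."""
    n = len(patterns[0])
    weights = [[0] * n for _ in range(n)]
    for pattern in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += pattern[i] * pattern[j]
    return weights


def recall(weights, state, sweeps=5):
    """Update neurons one at a time until the state stops changing."""
    state = list(state)
    for _ in range(sweeps):
        changed = False
        for i in range(len(state)):
            field = sum(w * s for w, s in zip(weights[i], state))
            # Keep the current value when the input field is exactly zero.
            new_value = state[i] if field == 0 else (1 if field > 0 else -1)
            if new_value != state[i]:
                state[i] = new_value
                changed = True
        if not changed:
            break
    return state


# Two hand-picked patterns of +1/-1 values (illustrative data only).
stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
weights = train_hopfield(stored)

# Present the first pattern with its first value flipped.
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
recalled = recall(weights, noisy)
print(recalled)
```

The network is presented with a corrupted copy of the first pattern and settles back into the stored one. Real Hopfield networks store only a limited number of patterns relative to the number of neurons, so this toy setup is deliberately small.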

ANN Libraries

NN techniques are still evolving very fast today, and NN libraries evolve just as fast with them. Some relevant names follow, all of which are better suited to building NNs than sklearn.


Theano

Theano was one of the first academic libraries that allowed transparent use of GPUs, as if one were working with NumPy arrays. Theano started the trend of using a directed acyclic graph (DAG) to quickly compute derivatives. It was developed at the University of Montreal and is still used in many places. New features are no longer being added to Theano, because developing it further was considered counterproductive when many libraries with commercial backing can do the same.

Tensorflow and Keras

Tensorflow is the current top library for most NN computing in industry. It really is just a DAG processing engine on top of tensors, where tensors are pretty much NumPy arrays with a couple of tricks to allow for quick computation of derivatives. Its main selling point is Tensorboard, a web UI to monitor the processing of the graph, and therefore to monitor the network training. Tensorflow adopted the Keras library as its front end, allowing for a user-friendly API.

Issues with Tensorflow arise when one attempts to extend the library. The focus of Tensorflow is industry, and hence the quick development of known models. Research into completely new NN models can be tricky in Tensorflow.

A demo of a simple Tensorflow interface (limited to a handful of 2-dimensional problems) can be found at the Tensorflow Playground.

Torch (pytorch)

Torch is a library written (mostly) in Lua; in the Python universe it is known as pytorch. The library allows for almost complete freedom in mixing and matching NN parts. It is hence often used to research new methods in NNs.

Just like its predecessors, Torch uses a DAG to compute derivatives. But it borrows from research in automatic graph building, and requires fewer steps to perform the differentiation. Differentiation in Torch is as easy as writing a function and asking for its derivative with a procedure call. Almost all array operations can be differentiated automatically.
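
To make the idea of computing derivatives over a DAG concrete, here is a minimal sketch of reverse-mode automatic differentiation in plain Python. It is not how Torch, Tensorflow or Theano are implemented internally; the `Var` class and its methods are hypothetical names used only for this illustration, and only addition and multiplication are supported.

```python
class Var:
    """A node in the computation graph: a value plus links to its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        # Each entry is (input node, derivative of this node w.r.t. that input).
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        """Propagate derivatives from this node back to all its inputs."""
        order, seen = [], set()

        def visit(node):
            # Depth-first walk that lists every node after its inputs.
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk the graph from the output towards the inputs (chain rule).
        for node in reversed(order):
            for parent, local_derivative in node.parents:
                parent.grad += local_derivative * node.grad


# f(x, y) = x * y + x * x, evaluated at x = 3, y = 4.
x = Var(3.0)
y = Var(4.0)
f = x * y + x * x
f.backward()

print(f.value)  # 21.0
print(x.grad)   # df/dx = y + 2x = 10.0
print(y.grad)   # df/dy = x = 3.0
```

The graph is built automatically while the function is evaluated, and a single backward pass yields the derivative with respect to every input. Libraries such as Torch apply the same principle to whole arrays instead of single numbers.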

JAX, FLAX and Friends

When one needs to go even further in the freedom of mixing and matching, one goes all the way to the automatic differentiation libraries. The original library was autograd, and one can still build reasonable NNs with it.

The issue with autograd is that it does not support GPU computations. For automatic differentiation and nothing else, with GPU support, we have JAX. And on top of JAX there is currently a rather new development, flax, which provides a NN interface to the automatic differentiation.

Hence, for ultimate NN building freedom one would use autograd, or JAX if one has a GPU. But the API of these libraries can be quite a handful, and one needs to understand NN internals very well in order to get anything out of them.