06.00 Machine Learning

Until now we have seen many ways to describe data (including plotting), which allow us to gain insight into the processes that created that data. We can also argue that we gain insight into the process of measuring this data. The analysis and interpretation of data is a branch of statistics, so we can classify what we have been doing until now as an exercise in statistics.

[Figure: Machine (skl-terminator.svg)]

Machine learning can be understood as a branch of statistics, and some classify it as such. On the other hand, several people argue that machine learning is a distinct area that merely overlaps with statistics. Without worrying about the semantics, we will say that machine learning is a collection of techniques that extract information contained in data and use it to predict the behavior of similar data. Note that this is different from the general goal of statistics.

  • Statistics' goal is to interpret the resulting model of the data and from there understand the inherent process that created the data. Whether we can construct a similar process and generate new data in a similar fashion is not a requirement.

  • Machine learning's goal is to construct a model that predicts the behavior of new inputs just as the inherent process would, without necessarily performing or even understanding the inherent process that created the data.

It is viable to, and we often do, perform statistics on the products of machine learning. For example, after running thousands of models we may perform statistics to understand non-linear relations among the hyperparameters of an ML model.

Forms of Machine Learning

Although vastly outdated, the common classification of machine learning techniques into groups is as follows.

In Supervised Learning one has some answers to the problem and plans to automate the solution to this problem. The algorithms attempt to find how data inputs map to the known solutions, after which the resulting models can give a solution for data never seen before. We often subdivide supervised learning further into the following (a minimal code sketch of both subdivisions follows the list).

  • Classification, where one predicts crisp classes. In other words, one identifies, for example, ships from among cars; there is no middle ground, so an amphibious vehicle will be identified either as a car or as a ship. Classification is the most common problem in the world around us, e.g. is the figure I'm walking towards a person or a lamp post? Are they moving away from me or towards me? Hence classification is also the most common ML implementation out there.

  • Regression, where the answers are ordered numbers. The difference from classification is that the answer from a regression algorithm is a value anywhere within a reasonable range. We can have $27$ as the answer to a problem, just as we can have $42$; and also any value in between, such as $32.64$. Regression is most often used for ranking lists of items: yes, your web search results and your recommendation lists are ordered according to regressions. Another common use for regression algorithms is the physical control of matter. Since matter has a continuous nature in the world we see, managing water levels, wind speeds, temperature, or tremor intensity is often done against regression predictions.
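To make the two subdivisions concrete, here is a minimal sketch using scikit-learn; the iris dataset and the synthetic straight line below are illustrative assumptions, not data from this text.

    # Classification vs. regression in scikit-learn (a hedged sketch).
    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression, LinearRegression

    # Classification: predict a crisp class (an iris species), no middle ground.
    X, y = load_iris(return_X_y=True)
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict(X[:3]))        # crisp labels, e.g. [0 0 0]

    # Regression: predict a continuous value anywhere in a reasonable range.
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=(50, 1))
    t = 3.0 * x.ravel() + rng.normal(scale=1.0, size=50)
    reg = LinearRegression().fit(x, t)
    print(reg.predict([[4.2]]))      # a real number, e.g. close to 12.6

The classifier can only ever answer with one of the known classes, while the regressor answers with any value in range, which mirrors the distinction above.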

In Unsupervised Learning we do not have answers to the problem we are attempting to solve, but we are going to try to solve it anyway. Moreover, if we identify patterns that turn out to be useful, we can use these patterns to find identifiers for new data, even if we do not know what these identifiers are or what they mean in the real world. We subdivide this search for patterns into the following (a sketch of both follows the list).

  • Dimensionality Reduction, which is not a pattern-search technique itself but is closely related to the search. Most patterns we search for using ML are not easy to see, otherwise we would have seen them already; patterns in highly dimensional spaces are particularly difficult for us humans to visualize. Dimensionality reduction techniques attempt to reduce the dataset one may have into a manageable number of dimensions without losing the patterns within. This may be achieved by projecting dimensions, from where the group of techniques gets its name. But many other techniques exist, e.g. ones that preserve the distances between data points, or that define close and far away points and then probabilistically place points in a new projection according to how far from or close to each other they are.

  • Clustering is the ML group of techniques for finding patterns in data about which we can tell little. We can find groupings within the data that are similar to each other, despite the fact that we may not know the real reasons why these groupings are close together or far apart. We can then use the groupings we know to determine to which of the groupings new data points belong. The classic example of clustering is social networks. We can collect large numbers of features (dimensions) about individuals (data points), and then cluster them together. The result will be circles of friends among these individuals, yet we have little understanding of why exactly these specific groups of friends form as opposed to others. Arguing, based on a clustering feature count, that playing golf is more likely to win you friends than playing squash is not knowledge. It is just a statistical artifact that some scientists publish as click bait and then consider "science".
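The sketch below, again assuming scikit-learn and synthetic blob data (an assumption made up for illustration), runs both unsupervised groups: a projection from ten dimensions down to two, then a search for groupings without ever seeing a label.

    # Dimensionality reduction and clustering (a hedged sketch).
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.cluster import KMeans

    # 200 points in 10 dimensions, secretly generated around 3 centers.
    X, _ = make_blobs(n_samples=200, n_features=10, centers=3, random_state=0)

    # Dimensionality reduction: project 10 dimensions onto the 2
    # directions of highest variance, keeping the pattern visible.
    X2 = PCA(n_components=2).fit_transform(X)
    print(X2.shape)                  # (200, 2)

    # Clustering: find 3 groupings without ever seeing labels.
    km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X2)
    print(km.labels_[:10])

Note that `km.labels_` are arbitrary group identifiers: the algorithm tells us which points belong together, not why.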

A single technique does not necessarily fit one specific bullet point, e.g. SVMs can be used for classification or regression, and Neural Networks can be used for any of the points above. We will use this grouping of ML techniques as we explore some algorithms. Later we will come back and add several new groupings, needed because some ML techniques do not fit into any of the above.
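As a quick illustration of that point, the same SVM technique can be trained in both supervised flavors; the noisy sine-wave data below is an assumption made up for the example.

    # One technique, two bullet points: SVM as classifier and as regressor.
    import numpy as np
    from sklearn.svm import SVC, SVR

    rng = np.random.default_rng(1)
    X = rng.uniform(-3, 3, size=(100, 1))

    # Classification flavor: crisp labels (the sign of a sine wave).
    y_class = (np.sin(X.ravel()) > 0).astype(int)
    print(SVC().fit(X, y_class).predict([[1.0]]))   # a class: 0 or 1

    # Regression flavor: the continuous value itself.
    y_reg = np.sin(X.ravel()) + rng.normal(scale=0.1, size=100)
    print(SVR().fit(X, y_reg).predict([[1.0]]))     # a real number near sin(1)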

Validation

Since we do not have mathematical rigor in the selection of hyperparameters, we need some way of checking whether our model is (more-or-less) right. Let's see how to build a model in Python, then how those so-called hyperparameters work, and then whether we can think of a way of evaluating our models.
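As a preview of one plausible way to do that checking, the sketch below holds out part of the data and scores the model on the part it never saw; the dataset, the k-nearest-neighbors model, and the 70/30 split are all assumptions for illustration, not necessarily the approach this text settles on.

    # Hold-out validation (a hedged preview sketch with scikit-learn).
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # n_neighbors is a hyperparameter: we pick it, the model does not learn it.
    model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
    print(model.score(X_test, y_test))   # fraction of correct test predictions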