Inside the maths that drives AI


People usually talk about the race to the bottom in artificial intelligence as a bad thing. But it’s different when you’re discussing loss functions.

Loss functions are a crucial but frequently overlooked component of useful artificial intelligence (AI), and they’re all about getting to the bottom — albeit of a literal curve on a graph — as quickly as possible. When training an algorithm to automate tedious data analysis, such as looking for specific features in millions of photographs, you need a way of measuring its performance. That’s the ‘loss function’: it measures an algorithm’s error relative to the ‘ground truth’ of the data — information that is known to be real or true. Then you adjust the algorithm’s parameters, rinse and repeat, and hope the error is smaller next time. “You’re trying to find a minimum: the point where the error is as small as possible — hopefully zero,” says Anna Bosman, a computational-intelligence researcher at the University of Pretoria.
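That measure-adjust-repeat loop can be sketched in a few lines of code. The toy data, one-parameter model and learning rate below are invented for illustration; the point is only that each pass nudges the parameter toward the bottom of the loss curve.

```python
def predict(x, w):
    """A one-parameter model: the prediction is just w * x."""
    return w * x

def loss(w, data):
    """Mean squared error between predictions and ground truth."""
    return sum((predict(x, w) - y) ** 2 for x, y in data) / len(data)

def train(data, w=0.0, lr=0.1, steps=50):
    """Repeatedly step w downhill along the loss curve."""
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (predict(x, w) - y) * x for x, y in data) / len(data)
        w -= lr * grad  # move toward the minimum
    return w

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # ground truth: y = 2x
w = train(data)
```

After training, `w` sits close to 2.0 and the error is close to the hoped-for zero, because the toy data are noise-free.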

Dozens of off-the-shelf loss functions have been written. But choose the wrong one, or just handle it badly, and the algorithm can lead you astray. It could blatantly contradict human observations, make random fluctuations (known as experimental noise) look like data, or even obscure the central results of an experiment. “There are lots of things that can go wrong,” Bosman says. And worst of all, the opacity of AI means you might not even know that you’ve been misled.

That’s why a growing number of scientists are abandoning ready-made loss functions and constructing their own. But how do they get it right? How do you make a home-made loss function your go-to tool and not a time-swallowing mistake?

Error assessment

Machine-learning algorithms are generally trained on annotated data, or by being told when they get the answer wrong. Loss functions provide a mathematical measure of wrongness, but there are multiple ways to quantify that.

‘Absolute error’ functions, for example, report the unsigned difference between the algorithm’s prediction and the target value. Then there’s mean squared error: square the differences between your predictions and the ground truth, and then average them out across the whole data set.
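Written out for a list of prediction–truth pairs, the two measures look like this (the numbers are invented):

```python
def mean_absolute_error(preds, truths):
    """Average unsigned difference between predictions and ground truth."""
    return sum(abs(p - t) for p, t in zip(preds, truths)) / len(preds)

def mean_squared_error(preds, truths):
    """Average squared difference -- large errors count disproportionately."""
    return sum((p - t) ** 2 for p, t in zip(preds, truths)) / len(preds)

preds  = [2.5, 0.0, 2.0, 8.0]
truths = [3.0, -0.5, 2.0, 7.0]
# mean_absolute_error: (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5
# mean_squared_error:  (0.25 + 0.25 + 0.0 + 1.0) / 4 = 0.375
```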

Mean squared error is a simple, straightforward and proven approach that works well when errors are relatively small and consistent. But it can be problematic if your data are full of outliers, because the algorithm amplifies their impact. A loss function called pseudo-Huber (a smooth approximation of an approach called the Huber loss function) considers whether each data point’s error is large or small, providing a compromise between the mean squared error and absolute error loss functions.
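A minimal sketch of the pseudo-Huber idea: near zero the loss behaves like half the squared error, but for large errors it grows roughly linearly, so outliers are not amplified. The threshold `delta` is a free parameter chosen here for illustration, not a value fixed by the article.

```python
import math

def pseudo_huber(error, delta=1.0):
    """Quadratic for small errors, approximately linear for large ones."""
    return delta ** 2 * (math.sqrt(1 + (error / delta) ** 2) - 1)

# Small error: behaves like error**2 / 2 (mean-squared-error regime)
# Large error: grows like |error| (absolute-error regime)
```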

Absolute error, mean squared error and Huber are most useful for regression analysis, which uses past data on continuous variables such as height or weight in a population to predict what shape future data sets will take (see ‘Quantifying loss’). Classification tasks, by contrast, answer questions such as what type of object something is and how many of them are in the data set. In this case, the machine-learning algorithm determines the probability that an object belongs to a particular class — how likely it is that a particular collection of pixels represents a dog, for example. The most useful loss functions include cross entropy, a measure that compares probability distributions related to the model’s output and the real-world value (also known as maximum likelihood), and hinge loss. The latter finds the curve that is the farthest possible distance from every data point, providing the cleanest possible decision about which category a data point falls into.

QUANTIFYING LOSS: figure showing three of the most popular ways to measure accuracy in artificial intelligence.


