

When we train a model, what we really want to know is the loss over all the possibilities. This is where “true risk” comes into the picture: true risk computes the average loss over all possible inputs and outputs. But the problem in the real world is that we don’t know what “all the possibilities” look like. In mathematical terms, we say that we don’t know the true distribution over the inputs and outputs. If we did, we wouldn’t need machine learning in the first place. For example, let’s say you want to build a model that can differentiate between a male and a female based on certain features. If we select 100 random people where the men happen to be really short and the women really tall, then the model might incorrectly assume that height is the differentiating feature.
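To make the gap concrete, here is a small sketch of the idea. All the numbers (heights, sample sizes, the threshold rule) are illustrative assumptions, not from the article: we approximate the true risk of a simple height-based classifier with a very large sample, then evaluate the same classifier on a biased sample of short men and tall women.

```python
import random

random.seed(0)

def sample_person():
    """Draw a (height_cm, label) pair; label 1 = male, 0 = female.
    Illustrative numbers: men are taller on average in this toy distribution."""
    if random.random() < 0.5:
        return random.gauss(178, 7), 1
    return random.gauss(165, 7), 0

def classify_by_height(height):
    """Toy rule: predict male iff taller than 171.5 cm."""
    return 1 if height > 171.5 else 0

def average_loss(data, predict):
    """Zero-one loss averaged over a finite set of (input, output) pairs."""
    return sum(predict(h) != y for h, y in data) / len(data)

# A very large sample approximates the true risk over the whole distribution.
big_sample = [sample_person() for _ in range(100_000)]
print(average_loss(big_sample, classify_by_height))

# A biased sample -- really short men, really tall women -- tells the
# opposite story: the same rule now looks close to useless.
biased_sample = [(random.gauss(160, 5), 1) for _ in range(50)] + \
                [(random.gauss(183, 5), 0) for _ in range(50)]
print(average_loss(biased_sample, classify_by_height))
```

The same classifier that does reasonably well against (an estimate of) the true distribution looks almost completely wrong on the biased sample, which is exactly why a loss measured on one dataset can mislead us about the true risk.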

We can think of the process of supervised learning as choosing a function that achieves a given goal. We have to choose this function from a set of potential functions, and bear in mind that all of these potential functions can achieve the given goal. So how do we find the function that’s the best representative of the true solution? And how can we measure the effectiveness of a chosen function given that we don’t know what the actual distribution looks like? To understand this, we need to talk a bit about the idea of a loss function. Given a set of inputs and outputs, a loss function measures the difference between the predicted output and the true output. But this measurement is applicable only to the given set of inputs and outputs.
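As a sketch of this idea (squared loss is chosen here for illustration; the article doesn’t fix a particular loss), a loss function scores each prediction against the true output, and averaging it over a given set of pairs lets us compare candidate functions — but only on those pairs:

```python
def squared_loss(y_pred, y_true):
    """Pointwise loss: penalizes the gap between prediction and truth."""
    return (y_pred - y_true) ** 2

def average_loss(f, inputs, outputs):
    """Average loss of candidate function f over the given pairs only --
    it says nothing about inputs we have never seen."""
    return sum(squared_loss(f(x), y) for x, y in zip(inputs, outputs)) / len(inputs)

# Two candidate functions aiming at the same goal: predicting y from x.
f1 = lambda x: 2 * x
f2 = lambda x: 2 * x + 1

xs = [0, 1, 2, 3]
ys = [0, 2, 4, 6]  # this toy data happens to follow y = 2x

print(average_loss(f1, xs, ys))  # 0.0 -- perfect on this particular set
print(average_loss(f2, xs, ys))  # 1.0
```

The comparison tells us which candidate fits these four pairs better; it cannot, by itself, tell us which one is closer to the true solution over all possible inputs.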

This is where the concept of Empirical Risk Minimization becomes relevant in the world of supervised learning. Even though it has an ornate name, the underlying concept is actually quite simple and intuitive. The actual goal of supervised learning is to find a model that solves a problem, as opposed to finding a model that best fits the given dataset. Since we don’t have every single data point that represents each class completely, we just use the next best thing available: a dataset that’s representative of the classes.
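The idea above can be sketched as follows: from a set of candidate functions, pick the one with the lowest average loss (the empirical risk) on the dataset we actually have. The threshold classifiers and the tiny dataset here are hypothetical, chosen only to make the selection step concrete:

```python
def zero_one_loss(y_pred, y_true):
    """1 if the prediction is wrong, 0 if it is right."""
    return int(y_pred != y_true)

def empirical_risk(predict, dataset):
    """Average loss over the representative dataset we actually have."""
    return sum(zero_one_loss(predict(x), y) for x, y in dataset) / len(dataset)

def erm(candidates, dataset):
    """Empirical Risk Minimization: choose the candidate whose
    empirical risk on the dataset is lowest."""
    return min(candidates, key=lambda f: empirical_risk(f, dataset))

# Toy dataset of (feature, label) pairs.
dataset = [(1, 0), (2, 0), (3, 1), (4, 1)]

# Candidate functions: threshold classifiers with different cut-offs.
candidates = [lambda x, t=t: int(x > t) for t in range(5)]

best = erm(candidates, dataset)
print(empirical_risk(best, dataset))  # 0.0 -- the cut-off at 2 fits perfectly
```

Note that `erm` can only rank candidates by how they behave on the given dataset — minimizing empirical risk is a stand-in for minimizing the true risk we cannot compute.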
