#9 Machine Learning & Data Science Challenge 9

#9 Machine Learning & Data Science Challenge 9

Why we can't do a classification problem using Regression?

  • With linear regression, you fit a polynomial through the data - say, like in the example below, we fit a straight line through {tumor size, tumor type} sample set:

01.png

  • Above, malignant tumors get 1, and non-malignant ones get 0, and the green line is our hypothesis h(x). To make predictions, we may say that for any given tumor size x, if h(x) gets bigger than 0.5, we predict malignant tumors.

  • Otherwise, we predict benignly. It looks like this way, we could correctly predict every single training set sample, but now let's change the task a bit.

  • Intuitively it's clear that all tumors larger than a certain threshold are malignant. So let's add another sample with a huge tumor size, and run linear regression again:

02.png

  • Now our h(x)>0.5→malignant doesn't work anymore. To keep making correct predictions, we need to change it to h(x)>0.2 or something - but that is not how the algorithm should work.

  • We cannot change the hypothesis each time a new sample arrives. Instead, we should learn it off the training set data, and then (using the hypothesis we've learned) make correct predictions for the data we haven't seen before.

Linear regression is unbounded.