
I get some metrics on validation data while training a model, and in my case they are:

(0.25, 0.31, 0.46, 0.57, 0.65, 0.75, 0.77, 0.78, 0.84, 0.84, 0.85, 0.84, 0.84, 0.84, 0.82, 0.8, 0.8, 0.79, 0.78, 0.77, 0.77, 0.77, 0.75, 0.74, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73)

They can be described like this:

[plot: the validation metric rises to a peak (~0.85) and then steadily declines over further iterations]

In my view, the ideal result should look like this: [plot: the validation metric rises and then stays flat at its peak]

Is this a case of overfitting?

Unfortunately, I tried a few times to change the regularization coefficients to avoid overfitting, and to lower the learning rate to slow training down, but the curve was still "convex" (rising then falling).

How can I achieve the ideal result shown above?

I would much appreciate any constructive tips.

joe
  • What model are you training? – Armen Aghajanyan Dec 09 '16 at 08:34
  • A generalized linear model, such as logistic regression – joe Dec 09 '16 at 08:38
  • Size of dataset? Type of validation? Still see same effect in k-fold cross-validation? – Neil Slater Dec 09 '16 at 09:37
  • I used toy data to validate my recommendation system with Spark, so I simplified the whole process. The dataset is small (15K). I tried a few dozen times to adjust the coefficients, and almost every result came out "convex". – joe Dec 09 '16 at 11:04

2 Answers


Yes, what you are seeing is a classic case of overfitting.

You stated that you use a linear model such as logistic regression. To regularize these types of models, L1 and/or L2 regularization is usually applied: L1 regularization adds a penalty proportional to $||W||_1$ to the loss, and L2 adds one proportional to $||W||_2^2$.
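As a minimal sketch of the effect (pure Python, on hypothetical toy data, not the asker's Spark setup): the gradient of an L2 penalty $\lambda ||w||_2^2$ is $2\lambda w$, which is simply added to the data gradient each step and pulls the weight toward zero.

```python
import math

def fit_logistic(xs, ys, lam, lr=0.1, steps=2000):
    """1-D logistic regression (no bias) via gradient descent with an L2 penalty."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of the average log-loss over the data
        grad = sum((1 / (1 + math.exp(-w * x)) - y) * x
                   for x, y in zip(xs, ys)) / n
        # gradient of the penalty term lam * ||w||_2^2
        grad += 2 * lam * w
        w -= lr * grad
    return w

# toy, linearly separable data (hypothetical)
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w_free = fit_logistic(xs, ys, lam=0.0)  # no regularization: weight keeps growing
w_reg = fit_logistic(xs, ys, lam=0.5)   # L2 penalty: weight stays small
print(w_free, w_reg)
```

On separable data like this, the unregularized weight grows without bound across iterations, while the penalized weight settles at a small finite value, which is exactly the shrinkage effect the penalty is meant to provide.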

Another method is to alter the labels of the model in a specific way, a regularization method I created (shameless plug). Here is the link to the paper: https://arxiv.org/abs/1609.06693

Hope this helps.


Just cross-validate between the training and validation sets to find the iteration at which overfitting starts, then stop there (early stopping) and ignore all further iterations.
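A minimal early-stopping sketch (assuming a higher metric is better): scan the per-iteration validation metrics from the question and stop once the metric has not improved for `patience` iterations. The `patience` value here is an illustrative choice, not something from the question.

```python
def early_stop(metrics, patience=3):
    """Return (best_iteration, best_metric) under a simple patience rule."""
    best_i, best_m = 0, metrics[0]
    for i, m in enumerate(metrics):
        if m > best_m:
            best_i, best_m = i, m
        elif i - best_i >= patience:
            break  # no improvement for `patience` steps: stop training here
    return best_i, best_m

# the validation metrics from the question, one per iteration
validation_metrics = [
    0.25, 0.31, 0.46, 0.57, 0.65, 0.75, 0.77, 0.78, 0.84, 0.84,
    0.85, 0.84, 0.84, 0.84, 0.82, 0.80, 0.80, 0.79, 0.78, 0.77,
    0.77, 0.77, 0.75, 0.74, 0.73, 0.73, 0.73, 0.73, 0.73, 0.73,
]
print(early_stop(validation_metrics))  # → (10, 0.85): the peak, 0-indexed
```

In practice you would evaluate the validation metric after each training iteration, keep the model weights from the best iteration, and discard everything after it.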

JeeyCi