
I've been working on a regularization term that encourages correlated attributes to receive similar weights in a linear model. This helps avoid some of the instability in the weights of correlated attributes. For more insight on the problem I'm trying to solve, see this post.

Here is the regularization term: $\lambda \cdot r^2 \cdot (B_1 - B_2)^2$

Where $\lambda$ is the regularization parameter, $r$ is the correlation coefficient between the two attributes, and $B_1$ and $B_2$ are the model coefficients of the two attributes. You would need to extend the term for each pair of attributes that you are concerned about, which could be all pairs if you're dealing with a small number of attributes, or just the highly correlated ones.
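
Written out over a chosen set of attribute pairs $P$, the full penalty I have in mind is something like:

$$\lambda \sum_{(i,j) \in P} r_{ij}^2 \, (B_i - B_j)^2$$

where $r_{ij}$ is the correlation coefficient between attributes $i$ and $j$.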

If the attributes are highly correlated, and their coefficients in the linear model are very different, then a high loss penalty will be applied. Please let me know what you think of this novel regularization term, or if you've seen something like it already.
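
For concreteness, here is a minimal sketch (in Python) of one way this penalty could be added to an ordinary least-squares objective. The helper name `fit_paired_ridge`, the `pairs` argument, and the toy data are purely illustrative, not a worked-out implementation:

```python
import numpy as np
from scipy.optimize import minimize

def fit_paired_ridge(X, y, pairs, lam=1.0):
    """Minimize ||y - X b||^2 + lam * sum over (i, j) in pairs of r_ij^2 * (b_i - b_j)^2."""
    n, p = X.shape
    corr = np.corrcoef(X, rowvar=False)  # pairwise correlations between the attributes

    def objective(b):
        fit = np.sum((y - X @ b) ** 2)
        penalty = sum(corr[i, j] ** 2 * (b[i] - b[j]) ** 2 for i, j in pairs)
        return fit + lam * penalty

    return minimize(objective, x0=np.zeros(p)).x

# Two nearly collinear attributes: without the penalty the two weights can
# drift apart while their sum stays roughly constant.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1 + 0.05 * rng.normal(size=200)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=200)
print(fit_paired_ridge(X, y, pairs=[(0, 1)], lam=10.0))
```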

Edit: Thank you for your feedback on the potential usefulness and flaws of this term. Based on Broele's suggestion, I first need to figure out how to handle negative correlations. Then I can take Luca Anzalone's suggestion of empirically testing the term on some actual data. So besides my broader question of "Is this a useful term for improving the stability and interpretability of a linear model's weights?" I now have the specific question, "How do I modify this term to handle negative correlations?". My first thought is to handle it with cases. If $r$ is negative, then use addition instead of subtraction:

$\lambda \cdot r^2 \cdot (B_1 + B_2)^2$

This means that when there is a strong negative correlation, the coefficients are pushed towards being the negation of each other. However, I'm concerned that this version might drive all of the weights towards zero. Is there a better way to formulate this regularization term that still achieves the goal of making weights more similar when the correlation is strongly positive, and similar in magnitude but opposite in sign when it is strongly negative?
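
To make the case handling concrete, here is a small sketch of the per-pair penalty I'm describing (`pair_penalty` is just an illustrative name; the $r \ge 0$ branch is the original term, and the two cases can also be written together as $(B_1 - \operatorname{sign}(r) \cdot B_2)^2$):

```python
def pair_penalty(b1, b2, r, lam=1.0):
    # Original term for r >= 0; for r < 0 penalize the sum instead of the difference.
    if r >= 0:
        return lam * r ** 2 * (b1 - b2) ** 2
    return lam * r ** 2 * (b1 + b2) ** 2

# Intended behaviour on toy values:
print(pair_penalty(1.0, -1.0, r=0.9))   # 3.24 -> penalized: positive correlation, opposite-sign weights
print(pair_penalty(1.0, -1.0, r=-0.9))  # 0.0  -> no penalty: negative correlation, negated weights
```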

Brett L
  • What exactly is your question? We are a question-and-answer site, so we require you to articulate a specific question. "Please let me know what you think of this" is likely a bit too vague and open-ended -- see our [help/dont-ask]. – D.W. Jul 29 '23 at 08:35
  • You will face trouble with negatively correlated features. – Broele Jul 29 '23 at 10:26
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 29 '23 at 13:39
  • I think you should compare (and maybe also provide the results) the linear model with and without your regularizer – Luca Anzalone Jul 29 '23 at 16:15
  • @D.W. My broader initial question was, "Is this a useful term for improving the stability and interpretability of a linear model's weights?". Given the initial feedback, I have the more specific question of, "How do I modify this term to handle negative correlations?" – Brett L Jul 29 '23 at 20:38
  • @Broele Thanks, that's a good point! I made an edit with my suggestion for handling negative correlations. I think it makes sense, but I'll be curious if people see it as the best solution. – Brett L Jul 29 '23 at 20:41
  • @LucaAnzalone Thanks, this makes sense as the next step for assessing the usefulness of this term. I think it makes theoretical sense, but it needs empirical testing to see if it works as intended. – Brett L Jul 29 '23 at 20:43
  • 2
    Do you really want to have both weights equal to each other? Imagine two features with $x_2 = 100 * x_1$? In my answer to you other post (it is more a post then a question), I showed that L2 regularization will already lead to a defined state, which would be influenced by this factor of 100. – Broele Jul 29 '23 at 22:14
  • @Broele If I understand your concern correctly, I think standardization should solve this issue, because the relationship $X_2 = 100 \cdot X_1$ would become $Z_2 = Z_1$. That said, I think you are right that L2 regularization is a nice alternative to what I have suggested and would probably serve as a better general-purpose regularization technique that also helps with collinearity issues. However, I still wonder if my proposed technique could serve some niche purpose, such as improving interpretation or handling collinearity in a linear model. – Brett L Jul 31 '23 at 21:44
  • Indeed, standardization would solve this issue. And whether your approach outperforms L2 regularization in cases of collinearity (or other high correlations) is still open. Analyzing this would definitely be out of scope for an answer here. – Broele Jul 31 '23 at 22:13
  • It would be interesting to see how both approaches perform in case of multiple correlations. E.g. X3 = 0.5*X1 + 1.2*X2 + – Broele Jul 31 '23 at 22:17

0 Answers