I've been working on a regularization term that encourages correlated attributes to receive similar weights in a linear model. This helps avoid some of the inconsistency in the weights of correlated attributes. For more insight on the problem I'm trying to solve, see this post.
Here is the regularization term: $\lambda \cdot r^2 \cdot (B_1 - B_2)^2$
Where $\lambda$ is the regularization parameter, $r$ is the correlation coefficient between the two attributes, and $B_1$ and $B_2$ are the model coefficients of the two attributes. You would need to extend the term for each pair of attributes that you are concerned about, which could be all pairs if you're dealing with a small number of attributes, or just the highly correlated ones.
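To spell that out, assuming a plain squared-error base loss (just for illustration; the term itself doesn't depend on that choice), the objective with the term extended over the chosen pairs would be:

$$\text{Loss} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \;+\; \lambda \sum_{j<k} r_{jk}^2 \, (B_j - B_k)^2$$

where $r_{jk}$ is the correlation coefficient between attributes $j$ and $k$, and the second sum runs over whichever pairs you include.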
If the attributes are highly correlated, and their coefficients in the linear model are very different, then a high loss penalty will be applied. Please let me know what you think of this novel regularization term, or if you've seen something like it already.
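To make this concrete, here is a rough NumPy sketch of the penalty summed over all attribute pairs and added to a mean-squared-error loss. The function names (`correlation_penalty`, `regularized_loss`) and the choice of MSE as the base loss are just illustrative, not part of the proposal itself:

```python
import numpy as np

def correlation_penalty(beta, R, lam):
    """Sum of lam * r_jk^2 * (B_j - B_k)^2 over all attribute pairs j < k.

    beta : (p,) coefficient vector
    R    : (p, p) correlation matrix of the attributes
    lam  : regularization strength
    """
    p = len(beta)
    penalty = 0.0
    for j in range(p):
        for k in range(j + 1, p):
            penalty += lam * R[j, k] ** 2 * (beta[j] - beta[k]) ** 2
    return penalty

def regularized_loss(beta, X, y, lam):
    # Pairwise correlations between the attributes (columns of X)
    R = np.corrcoef(X, rowvar=False)
    residuals = y - X @ beta
    return np.mean(residuals ** 2) + correlation_penalty(beta, R, lam)
```

Since the penalty is quadratic in the coefficients, the whole objective stays smooth, so it could be minimized with a generic optimizer such as `scipy.optimize.minimize`.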
Edit: Thank you for your feedback on the potential usefulness and flaws of this term. Based on Broele's suggestion, I first need to figure out how to handle negative correlations. Then I can take Luca Anzalone's suggestion of empirically testing the term on some actual data. So, besides my broader question of "Is this a useful term for improving the stability and interpretability of a linear model's weights?", I now have a more specific one: "How do I modify this term to handle negative correlations?" My first thought is to handle it with cases: if $r$ is negative, use addition instead of subtraction:
$\lambda \cdot r^2 \cdot (B_1 + B_2)^2$
This means that when there is a high negative correlation, the coefficients are pushed towards being the negation of each other. However, I'm concerned that this formulation might drive all of the weights towards zero. Is there a better way to formulate this regularization term that still achieves the goal of making weights similar when the correlation is strongly positive, and similar in magnitude but opposite in sign when the correlation is strongly negative?
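For reference, here is the case-based version sketched in the same NumPy style as above (the two branches can also be written compactly as $\lambda \cdot r^2 \cdot (B_1 - \operatorname{sign}(r) \cdot B_2)^2$, which is algebraically the same thing); the function name is again just a placeholder:

```python
import numpy as np

def signed_correlation_penalty(beta, R, lam):
    """Case-based penalty: (B_j - B_k)^2 when r_jk >= 0, (B_j + B_k)^2 when r_jk < 0,
    each weighted by lam * r_jk^2. Equivalent to lam * r_jk^2 * (B_j - sign(r_jk) * B_k)^2."""
    p = len(beta)
    penalty = 0.0
    for j in range(p):
        for k in range(j + 1, p):
            r = R[j, k]
            if r >= 0:
                penalty += lam * r ** 2 * (beta[j] - beta[k]) ** 2
            else:
                penalty += lam * r ** 2 * (beta[j] + beta[k]) ** 2
    return penalty
```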