
For knowledge graph completion, it is very common to use a margin-based ranking loss.

In the paper, the margin-based ranking loss is defined as

$$ \min \sum_{(h,l,t)\in S} \sum_{(h',l,t')\in S'}[\gamma + d(h,l,t) - d(h',l,t')]_+$$

Here $d(\cdot)$ is the predictive model, $(h,l,t)$ denotes a positive training triple, and $(h',l,t')$ denotes the negative triple corresponding to $(h,l,t)$.
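For concreteness, here is a minimal sketch of how this loss could be computed. The TransE-style distance $d(h,l,t)=\lVert h+l-t\rVert$ and the embedding shapes are illustrative assumptions on my part; the loss itself works with any scoring function $d$:

```python
import numpy as np

def d(h, l, t):
    # Illustrative dissimilarity: a TransE-style distance ||h + l - t||.
    # Any scoring function d would work in the loss below.
    return np.linalg.norm(h + l - t)

def margin_ranking_loss(pos_triples, neg_triples, gamma=1.0):
    # Sum of hinge terms [gamma + d(pos) - d(neg)]_+ over paired
    # positive/corrupted triples; minimizing it pushes d(pos) below
    # d(neg) by at least the margin gamma.
    total = 0.0
    for (h, l, t), (h_n, l_n, t_n) in zip(pos_triples, neg_triples):
        total += max(0.0, gamma + d(h, l, t) - d(h_n, l_n, t_n))
    return total
```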

However, Andrew's paper defines it as

$$ \min \sum_{(h,l,t)\in S} \sum_{(h',l,t')\in S'}[\gamma + d(h',l,t') - d(h,l,t)]_+$$

It seems that they have swapped the terms $d(h',l,t')$ and $d(h,l,t)$.

My question is: does it matter that $d(h',l,t')$ and $d(h,l,t)$ are swapped? It seems like a really strange definition. Thanks.


1 Answer


In the first paper, $d$ denotes a "dissimilarity", which should be minimized for positive samples.

In Andrew's paper, $d$ ($g$ in the paper) denotes a "similarity", which should be maximized for positive samples (or, equivalently, $-g\left(T^{(i)}\right)$ should be minimized).
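As a quick numerical check (a minimal sketch with random embeddings; the TransE-style $d$ is just an assumed example, not prescribed by either paper), substituting $g = -d$ turns one formula into the other, so both losses compute the same value:

```python
import numpy as np

rng = np.random.default_rng(0)
h, l, t = rng.normal(size=(3, 5))        # a positive triple's embeddings
h_n, l_n, t_n = rng.normal(size=(3, 5))  # its corrupted counterpart

def d(h, l, t):
    # Assumed dissimilarity (lower = more plausible triple).
    return np.linalg.norm(h + l - t)

def g(h, l, t):
    # The corresponding similarity (higher = more plausible triple).
    return -d(h, l, t)

gamma = 1.0
# First paper's form: margin over dissimilarities.
loss_dissimilarity = max(0.0, gamma + d(h, l, t) - d(h_n, l_n, t_n))
# Andrew's form: margin over similarities.
loss_similarity = max(0.0, gamma + g(h_n, l_n, t_n) - g(h, l, t))
print(loss_dissimilarity, loss_similarity)  # identical up to float rounding
```

So swapping the terms does not change the objective; it only reflects whether the score is read as a distance to be minimized or a similarity to be maximized.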
