Questions tagged [jaccard-coefficient]

The Jaccard coefficient (or Jaccard similarity) is a function for measuring the similarity between two sets.

The Jaccard coefficient between two sets $A$ and $B$ is defined as

$$\text{jaccard}(A, B) = \cfrac{| \ A \cap B \ | }{| \ A \cup B \ |}$$

That is, it is the ratio of the number of elements that $A$ and $B$ share to the number of distinct elements that appear in $A$ or $B$ (their union).
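The definition above can be sketched in a few lines of Python. The function name `jaccard` and the choice to return 1.0 for two empty sets are assumptions made for illustration; the formula itself is undefined in that case.

```python
def jaccard(a, b):
    """Return |A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical.
        return 1.0
    return len(a & b) / len(union)


# Two shared elements (2, 3) out of four distinct elements (1, 2, 3, 4).
print(jaccard({1, 2, 3}, {2, 3, 4}))  # 0.5
```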

16 questions
3
votes
1 answer

What is the correct formula for Jaccard coefficient with integer vectors?

I understand the Jaccard index is the number of elements in common divided by the total number of distinct elements. But there seems to be some discrepancy or terminology confusion about Jaccard being applied to "binary vectors", meaning a vector with…
Veronica
  • 133
  • 4
3
votes
1 answer

Jaccard Similarity with Binary Data

I have 5400 rows of data and 3211 columns of attributes. The first 4 columns are ID/Name/ParentID/ObjectType - the rest of the 3207 columns are the attributes that are to be used for similarity measures. Huge dimensionality, I know, but I wanted to…
2
votes
1 answer

Metrics - multi-class model comparisons

I am looking for a way to quantify the performance of multi-class model labelers, and thus compare them. I want to account for the fact that some classes are ‘closer’ than others (for example, a ‘car’ is ‘closer’ to a ‘truck’ than a ‘flower’ is). So,…
Tavi
  • 21
  • 1
2
votes
1 answer

When would I use a specific similarity coefficient over another?

For example, using Jaccard over Dice. I want real examples of when I would prefer to use Jaccard, Dice, cosine, or any other similarity coefficient.
2
votes
2 answers

Jaccard similarity between two items

Calculating similarity between two users is rather straightforward. Consider following example: User A = {7,3,2,4,1} User B = {4,1,9,7,5} Products in common = {1,4,7} Union of products = {1,2,3,4,5,7,9} Hence the Jaccard similarity: 3/7 =…
HonzaB
  • 1,669
  • 1
  • 12
  • 20
1
vote
0 answers

Which string distance equation for fuzzy-matching person names is reliable?

A reproducible example with a small bit of R code is available in this Stack Overflow post (linked so I don't need to retype the code). The fuzzytext library in R has the following available string methods c("osa", "lv", "dl", "hamming", "lcs",…
Canovice
  • 121
  • 4
1
vote
0 answers

Monotonicity of Jaccard and Dice in multilabel datasets

I understand that Jaccard and Dice follow a monotonic relation on binary datasets because the two are related as $J = {S \over {(2 - S)}}$, and I guess this would be the case when micro-average is used with multi-label datasets. However, would the…
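As a short sketch of where that relation comes from, assume the usual set definitions $S = \cfrac{2\,|A \cap B|}{|A| + |B|}$ and $J = \cfrac{|A \cap B|}{|A \cup B|}$, and write $i = |A \cap B|$ and $n = |A| + |B|$ (both symbols introduced here for illustration). Then $|A \cup B| = n - i$ and $i = Sn/2$, so

$$J = \cfrac{i}{n - i} = \cfrac{Sn/2}{n - Sn/2} = \cfrac{S}{2 - S}, \qquad \cfrac{dJ}{dS} = \cfrac{2}{(2 - S)^2} > 0 \ \text{ for } 0 \le S \le 1$$

Because the derivative is positive on the whole range, $J$ increases strictly with $S$ for a single pair of sets. This derivation covers only that single-pair case and does not by itself settle how averaged scores behave.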
1
vote
0 answers

What is the state-of-the-art/research metric to compare ellipses, other than the Jaccard coefficient?

I'm looking for the metric, if there is one, to compare ellipses with each other. Last time I had a similar dataset (malaria cells; now it's pupils) I used the Jaccard coefficient, but that was mostly because I didn't have the time to do further research…
Tollpatsch
  • 131
  • 1
1
vote
1 answer

Jaccard similarity calculate similarity

It is not clear to me how to calculate similarity between two products from the example. How do they calculate that?
mitexabel
  • 13
  • 4
1
vote
1 answer

Implementing Frequently bought together using a DB

We have a classic structure of an online shop database (products, customers, sales) and we want to implement a Frequently bought together feature. Our software is in ASP.NET and we do not know PHP to reverse engineer how this is being done in…
1
vote
1 answer

Distance Metric between 2 lists of sets

I have 2 lists of sets and I want to calculate a distance. set1 = [ {'A', 'B', 'C'}, {'A', 'D', 'X'}, {'X', 'A'} ] set2 = [ {'A', 'B', 'C', 'D'}, {'A', 'X'}, {'X', 'A', 'B'} ] So if the set of sets are equal I want the distance to be…
pettinato
  • 143
  • 6
0
votes
2 answers

Similarity of search results using Jaccard

I have a set of search results with ranking position, keyword and URL. I want to make a distance matrix so I can cluster the keywords (or the URLs). One approach would be to take the first n URL rankings for each keyword and use Jaccard similarity.…
HCg
  • 11
  • 1
  • 2
0
votes
1 answer

Efficiently Sending Two Series to a Function For Strings with an application to String Matching (Dice Coefficient)

I am using a Dice Coefficient based function to calculate the similarity of two strings: def dice_coefficient(a,b): try: if not len(a) or not len(b): return 0.0 except: return 0.0 if a == b: return 1.0 if len(a) == 1…
0
votes
0 answers

Weighting features in Jaccard Distance (1-hot-encoding)

I one-hot-encoded features and want to calculate similarity with the Jaccard index. But I am 100% sure that features have different importance for my clustering (i.e. some features are more important than others to calculate distance between my…
0
votes
1 answer

Why does the MinHash algorithm use random permutation rather than random selection?

The MinHash algorithm is used to compute the similarity of two sets. The value calculated by MinHash is close to the Jaccard similarity coefficient. The MinHash steps are: Let f map all shingles (k-grams) of the universe to 1...2^m apply a random…
Atena
  • 109
  • 2
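As a rough illustration of why random permutations estimate the Jaccard similarity, here is a minimal sketch that shuffles the union of two sets many times and counts how often both sets share the same minimum-ranked element. The function name `minhash_estimate` and its parameters `num_perms` and `seed` are hypothetical, and explicit shuffles stand in for the hash functions a practical implementation would use.

```python
import random


def minhash_estimate(a, b, num_perms=512, seed=0):
    """Estimate the Jaccard similarity of two non-empty sets with explicit permutations."""
    a, b = set(a), set(b)
    if not a or not b:
        raise ValueError("both sets must be non-empty")
    # Only elements in the union can ever be a minimum, so permute just those.
    universe = sorted(a | b)
    rng = random.Random(seed)
    matches = 0
    for _ in range(num_perms):
        order = universe[:]
        rng.shuffle(order)
        rank = {item: position for position, item in enumerate(order)}
        # The two minima agree exactly when the first element of the
        # permutation (restricted to the union) lies in the intersection.
        if min(rank[x] for x in a) == min(rank[x] for x in b):
            matches += 1
    return matches / num_perms


# The exact Jaccard similarity of these two sets is 3/7.
print(minhash_estimate({7, 3, 2, 4, 1}, {4, 1, 9, 7, 5}))
```

Under a uniformly random permutation, each element of the union is equally likely to come first, so the match probability equals the size of the intersection divided by the size of the union. The printed value is an estimate and varies with `seed` and `num_perms`.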