Highest Voted 'cosine-distance' Questions - Data Science Stack Exchange

15

votes

4 answers

Alternatives to TF-IDF and Cosine Similarity when comparing documents of differing formats

I've been working on a small personal project that takes a user's job skills and suggests the most suitable career for them based on those skills. I use a database of job listings to achieve this. At the moment, the code works as follows: 1) Process…

asked Jan 02 '17 at 20:41

Richard Knoche

151
1
1
3

12

votes

2 answers

cosine_similarity returns matrix instead of single value

I am using the code below to compute the cosine similarity between two vectors. It returns a matrix instead of the single value 0.8660254. [[ 1. 0.8660254] [ 0.8660254 1. ]] from sklearn.metrics.pairwise import cosine_similarity vec1 =…

machine-learning python scikit-learn cosine-distance

asked Jan 15 '18 at 13:22

Olivia Brown

223
1
2
4
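As a rough illustration of the behaviour described in the question above: scikit-learn's cosine_similarity is a pairwise function, so passing two row vectors yields a 2x2 matrix whose off-diagonal entry is the value being asked for. The sketch below mimics that in plain Python. The vectors are hypothetical (the question's own vec1 and vec2 are truncated in the excerpt), chosen only because they give the same 0.8660254, and the helper names are made up for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_cosine(rows):
    """Similarity of every row with every row, returned as a square matrix."""
    return [[cosine(r, s) for s in rows] for r in rows]


# Hypothetical vectors, not the ones from the question.
vec1 = [1.0, 1.0, 0.0, 1.0]
vec2 = [1.0, 1.0, 1.0, 1.0]

matrix = pairwise_cosine([vec1, vec2])  # 2x2: diagonal is each vector with itself
single = matrix[0][1]                   # similarity of vec1 with vec2
print(matrix)
print(single)
```

With scikit-learn itself, passing the two vectors as separate single-row inputs and indexing the resulting 1x1 array should give the same scalar.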

12

votes

4 answers

Can I use cosine similarity as a distance metric in a KNN algorithm

Most discussions of KNN mention Euclidean, Manhattan and Hamming distances, but they don't mention the cosine similarity metric. Is there a reason for this?

classification recommender-system cosine-distance

asked Jan 09 '18 at 16:05

Victor

591
3
7
19
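To make the question above concrete, here is a minimal sketch of a brute-force nearest-neighbour classifier that ranks neighbours by cosine distance (1 minus cosine similarity). It is only an illustration: the training points, labels and function names are invented for this example. Note that cosine distance does not satisfy the triangle inequality, so it is not a metric in the strict sense, which matters for tree-based neighbour searches but not for a brute-force scan like this one.

```python
import math
from collections import Counter


def cosine_distance(u, v):
    """1 - cosine similarity; assumes non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query."""
    neighbours = sorted(train, key=lambda item: cosine_distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: class "a" points lie near the x-axis, class "b" near the y-axis.
train = [
    ([1.0, 0.1], "a"),
    ([2.0, 0.3], "a"),
    ([5.0, 0.2], "a"),
    ([0.1, 1.0], "b"),
    ([0.2, 3.0], "b"),
]

prediction = knn_predict(train, [10.0, 1.0], k=3)
print(prediction)
```

The query is much longer than any training vector, yet it is still assigned by direction alone, which is the main behavioural difference from Euclidean KNN.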

11

votes

1 answer

Calculate cosine similarity in Apache Spark

I have a DataFrame with IDF of certain words computed. For example (10,[0,1,2,3,4,5],[0.413734499590671,0.4244680552337798,0.4761400657781007, 1.4004620708967006,0.37876590175292424,0.48374466516332]) .... and so on Now give a query Q, I can…

machine-learning nlp apache-spark cosine-distance

asked Aug 10 '16 at 05:43

Ganesh Krishnan

243
1
2
6

9

votes

1 answer

Why is the cosine distance used to measure the similarity between word embeddings?

When computing the similarity between words, cosine similarity or distance is computed on the word vectors. Why aren't other distance metrics, such as Euclidean distance, suitable for this task? Consider two vectors a and b, where a = [-1,2,-3]…

word-embeddings distance cosine-distance

asked Sep 03 '20 at 12:45

Ashwin Geet D'Sa

1,049
1
9
19
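One property often raised in this context can be checked numerically: cosine similarity ignores vector length, while Euclidean distance does not. The sketch below uses the vector a from the excerpt above; the second vector is a hypothetical scaled copy of a (the question's own b is truncated and is not reproduced here), and the function names are made up for the example.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def euclidean_distance(u, v):
    """Straight-line distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


a = [-1.0, 2.0, -3.0]
a_scaled = [10.0 * x for x in a]  # hypothetical: same direction, ten times longer

cos_sim = cosine_similarity(a, a_scaled)  # direction is identical, so this is about 1
euclid = euclidean_distance(a, a_scaled)  # grows with the difference in length
print(cos_sim, euclid)
```

Whether length carries useful information depends on how the embeddings were trained, so this shows a difference between the two measures rather than a reason to prefer one in every case.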

8

votes

1 answer

Cosine Distance > 1 in scipy

I am working on a recommendation engine, and I have chosen to use SciPy's cosine distance as a way of comparing items. I have two vectors: a = [2.7654870801855078, 0.35995355443076027, 0.016221679989074141, -0.012664358453398751,…

python distance cosine-distance

asked Oct 13 '15 at 22:23

redgem

183
1
1
4
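A short numeric check may help with the question above. Cosine distance, as SciPy defines it, is 1 minus the cosine similarity, so it ranges from 0 to 2 and exceeds 1 whenever the two vectors point more than 90 degrees apart. The sketch below reimplements that definition in plain Python with hypothetical vectors, since the question's own vectors are truncated in the excerpt.

```python
import math


def cosine_distance(u, v):
    """1 - cos(angle between u and v); assumes non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Hypothetical vectors covering the three typical cases.
same_direction = cosine_distance([1.0, 2.0], [2.0, 4.0])    # close to 0
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])        # exactly 1
opposite = cosine_distance([1.0, 2.0], [-1.0, -2.0])        # close to 2

print(same_direction, orthogonal, opposite)
```

Vectors with negative components, as in the question, can therefore legitimately produce distances above 1.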

8

votes

2 answers

What should be the value of non-rated field when finding cosine similarity

I am working on a very basic book recommender system. I want to know what to do with the fields that the user hasn't rated when computing cosine similarity. Should we ignore them and calculate only with the rated fields, or should we mark them…

correlation recommender-system cosine-distance

asked Jun 12 '16 at 17:27

divyum

181
2

6

votes

1 answer

Evaluating the performance of a machine learned recommendation system

I have a set of resumes $R=\{r_1,\dots,r_n\}$, which I've transformed to a vector space using TF-IDF. Each resume has a label, which is the name of their current employer. Each of these labels comes from the set of possible employers $E =…

machine-learning recommender-system model-evaluations information-retrieval cosine-distance

asked Dec 06 '19 at 22:27

Data

467
3
11

5

votes

5 answers

Cosine similarity vs The Levenshtein distance

I want to know the difference between them and the situations in which each works best. As per my understanding: Cosine similarity is a measure of similarity between two non-zero vectors of an inner product space that measures the cosine of…

similarity metric cosine-distance levenshtein-distance

asked Nov 18 '19 at 08:52

Pluviophile

3,520
11
29
49
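The contrast in the question above can be sketched with two small functions: Levenshtein distance counts character edits between strings, while cosine similarity compares vectors, here hypothetical bag-of-words counts built by splitting on whitespace. Both the example sentences and the bag-of-words representation are assumptions made for illustration; other vectorisations (TF-IDF, embeddings) would behave differently.

```python
import math
from collections import Counter


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def bag_of_words_cosine(s, t):
    """Cosine similarity between word-count vectors of two non-empty texts."""
    a, b = Counter(s.split()), Counter(t.split())
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)


# Same words in a different order: identical as bags of words, different as strings.
first, second = "the cat sat", "sat the cat"
word_cosine = bag_of_words_cosine(first, second)
edit_distance = levenshtein(first, second)
print(word_cosine, edit_distance)
```

Word order changes the edit distance but leaves the bag-of-words cosine untouched, which hints at why the two measures suit different tasks (spelling-level comparison versus content-level comparison).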

5

votes

1 answer

Calculating cosine similarity between 3D arrays using Python

I have two matrices with multiple columns and three rows each. I calculated the cosine similarity (sklearn) but it gives the result as a matrix. How can I obtain one single value? The matrices are the embeddings of two words each, obtained from…

python cosine-distance bert matrix

asked Jun 18 '19 at 10:36

GAYATRI VENUGOPAL

51
1
2

4

votes

4 answers

How to find similarity/distance matrix with mixed Continuous and Categorical data?

Say I have a dataset like this:

Hotel  HasPool  AvgPrice
1      1        $123
2      0        $234
3      1        $200

Currently I have broken down the dataset into 2 (one containing all continuous, other all categorical). The continuous…

similarity cosine-distance

asked Dec 07 '15 at 15:40

UD1989

258
1
3
6

4

votes

1 answer

word2vec word embeddings creates very distant vectors, closest cosine similarity is still very far, only 0.7

I started using gensim's FastText to create word embeddings on a large corpus of a specialized domain (after finding that existing open source embeddings are not performing well on this domain), although I'm not using its character level n-grams, so…

word2vec word-embeddings gensim embeddings cosine-distance

asked May 31 '19 at 10:35

Oren Matar

221
1
7

4

votes

3 answers

Cosine similarity with arrays containing NaN

I am trying to calculate cosine similarity in Python in order to find similar users based on the ratings they have given to movies. As can be expected, there are a lot of NaN values. I am using a movie dataset from Kaggle. When I use np.dot() on…

python recommender-system numpy cosine-distance

asked Apr 27 '19 at 13:15

user641597

133
3
7
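One possible way to handle the situation described above, sketched under the assumption that missing ratings should simply be skipped, is to compute the similarity over co-rated items only. This is not necessarily what the answers to the question recommend; the function name and the two rating lists are hypothetical, and other strategies (filling with zeros, mean-centring first) give different results.

```python
import math


def cosine_on_co_rated(u, v):
    """Cosine similarity using only positions where both ratings are present.

    Returns NaN when there is no overlap or when either restricted vector is all zeros.
    """
    pairs = [(a, b) for a, b in zip(u, v)
             if not (math.isnan(a) or math.isnan(b))]
    if not pairs:
        return float("nan")
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0.0 or norm_v == 0.0:
        return float("nan")
    return dot / (norm_u * norm_v)


nan = float("nan")
# Hypothetical ratings for four movies; NaN marks a movie the user has not rated.
alice = [5.0, nan, 3.0, 4.0]
bob = [4.0, 2.0, nan, 5.0]

similarity = cosine_on_co_rated(alice, bob)  # uses only the first and last movie
print(similarity)
```

A caveat with this approach is that users who share very few rated items can look highly similar, so a minimum-overlap threshold is often worth considering.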

4

votes

3 answers

clustering 2-dimensional euclidean vectors - appropriate dissimilarity measure

I've got a set of approx. 50 000 2-dimensional Euclidean vectors which are divided among 20 groups, i.e. each group has approx. 2500 2-dimensional Euclidean vectors. My data includes endpoint coordinates, i.e. $x_0, y_0, x_1, y_1$. Now I would…

clustering k-means similarity distance cosine-distance

asked Jul 09 '18 at 13:50

jakes

95
12

4

votes

1 answer

How to find similar time series?

I've got a collection of yearly data (one value per year per category), and I'd like to find series that are most similar to one another. Example data is here. I don't know much about data science, but it seems like cosine similarity might be the…

time-series similarity cosine-distance

asked Mar 19 '18 at 20:16

user2315852

71
1
4
