Highest Voted 'search' Questions - Data Science Stack Exchange

26

votes

2 answers

How to fit pairwise ranking models in XGBoost?

As far as I know, to train learning-to-rank models, you need to have three things in the dataset: a label or relevance, a group or query id, and a feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…

asked Feb 10 '16 at 16:40

tokestermw

418
1
4
8
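
For the ranking question above, the three ingredients it lists (label, query id, feature vector) can be made concrete with a small parser. This is a minimal sketch, assuming the LETOR-style text layout `label qid:<id> <index>:<value> ...`; the helper names `parse_letor_line` and `group_sizes` are hypothetical. The per-query group sizes it computes are the kind of input that XGBoost's `DMatrix.set_group` takes when training with a ranking objective such as `rank:pairwise`.

```python
from itertools import groupby


def parse_letor_line(line):
    """Split one LETOR-style line into (label, query_id, features)."""
    # Drop a trailing comment, if any (everything after '#').
    body = line.split("#", 1)[0].split()
    label = int(body[0])
    query_id = body[1].split(":", 1)[1]
    features = {}
    for token in body[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, query_id, features


def group_sizes(query_ids):
    """Count consecutive rows per query; rows must already be grouped by query."""
    return [len(list(rows)) for _, rows in groupby(query_ids)]


# Illustrative rows only, not taken from any real dataset.
lines = [
    "2 qid:1 1:0.5 2:0.1",
    "0 qid:1 1:0.2 2:0.9",
    "1 qid:2 1:0.7 2:0.3",
]
parsed = [parse_letor_line(line) for line in lines]
sizes = group_sizes([qid for _, qid, _ in parsed])
```

Here `sizes` holds one entry per query, in row order, which is why the rows need to be sorted by query id before the groups are counted.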

15

votes

5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…

descriptive-statistics search anonymization counts privacy

asked Mar 23 '23 at 08:57

mal

253
6
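
One common mitigation for the small-dataset concern above is to suppress any result whose underlying group is too small. The following is a minimal sketch under stated assumptions: the function name `safe_count` and the threshold of 5 are illustrative, not taken from the question. Suppression on its own does not stop an attacker from combining several overlapping queries, so it is a starting point rather than a complete answer.

```python
def safe_count(records, predicate, min_count=5):
    """Return the number of matching records, or None if the group is too small.

    min_count is an assumed, illustrative threshold.
    """
    count = sum(1 for record in records if predicate(record))
    return count if count >= min_count else None


# Illustrative records only.
records = [{"city": "A"}] * 6 + [{"city": "B"}] * 2
count_a = safe_count(records, lambda r: r["city"] == "A")
count_b = safe_count(records, lambda r: r["city"] == "B")
```

With these records, the query for city "A" is answered, while the query for city "B" is withheld because only two people match.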

12

votes

3 answers

How does a query into a huge database return with negligible latency?

For example, when searching something in Google, results return nigh-instantly. I understand that Google sorts and indexes pages with algorithms etc., but I imagine it infeasible for the results of every single possible query to be indexed (and…

bigdata google search

asked May 15 '14 at 11:22

resgh

231
1
7

8

votes

5 answers

Best way to search for a similar document given the ngram

I have a database of about 200 documents whose n-grams I have extracted. I want to find the document in my database that is most similar to a query document. In other words, I want to find the document in the database that shares the most number of…

nlp similarity search information-retrieval

asked Nov 17 '15 at 03:06

okebz

113
4
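
For the n-gram similarity question above, one simple baseline is to compare n-gram sets directly. This is a minimal sketch, assuming documents are already tokenized; it uses Jaccard similarity (shared n-grams divided by all distinct n-grams) rather than a raw shared count, so that long documents are not favoured automatically. The function names and the sample documents are hypothetical.

```python
def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar(query_tokens, documents, n=2):
    """Return (best document name, all scores) for a query document."""
    query_set = ngrams(query_tokens, n)
    scores = {
        name: jaccard(query_set, ngrams(tokens, n))
        for name, tokens in documents.items()
    }
    return max(scores, key=scores.get), scores


# Illustrative documents only.
documents = {
    "a": "the cat sat on the mat".split(),
    "b": "dogs chase the ball".split(),
}
best, scores = most_similar("the cat sat".split(), documents)
```

With about 200 documents, a brute-force scan like this is cheap; an inverted index from n-gram to document would only be worth considering for much larger collections.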

8

votes

2 answers

What are some standard ways of computing the distance between individual search queries?

I asked a similar question about distance between "documents" (Wikipedia articles, news stories, etc.). I made this a separate question because search queries are considerably smaller than documents and are considerably noisier. I hence…

machine-learning nlp search

asked Jul 05 '14 at 16:20

Matt

811
1
7
12

7

votes

1 answer

How can we effectively measure the impact of our data decisions?

Apologies if this is a very broad question. What I would like to know is how effective A/B testing (or other methods) is at measuring the effects of a design decision. For instance we can analyse user interactions or click results,…

search

asked Jul 23 '14 at 08:06

EdChum

355
1
10

6

votes

2 answers

Preparing for a Machine Learning Design Interview

I am not sure if this is a relevant post here but: I made it to the final round for the Machine Learning Engineer position at Facebook. The final round interview is virtual (thanks to Corona) and will consist of: 2 - General Algorithmic Coding…

machine-learning ranking search

asked Mar 20 '20 at 06:54

Wolfy

237
2
9

6

votes

2 answers

Why do popular search engines not follow the usual AND, OR logic for queries?

I am teaching myself Information Retrieval from Christopher Manning's book (PDF link: http://nlp.stanford.edu/IR-book/pdf/01bool.pdf). I tried Exercise 1.13: "Try using the Boolean search features on a couple of major web search engines.…

information-retrieval search search-engine

asked Jan 11 '17 at 05:55

user21595

6

votes

3 answers

Can we quantify how position within search results is related to click-through probability?

Suppose, for example, that the first search result on a page of Google search results is swapped with the second result. How much would this change the click-through probabilities of the two results? How much would its click-through probability drop…

recommender-system search information-retrieval regression

asked Oct 10 '14 at 03:45

zihaolucky

141
4

5

votes

4 answers

How does Google categorize results from its image search?

While doing a Google image search, the page displays some automatically derived categories for the images of the topic being searched for. I'm interested in learning how this works, and how it chooses and creates categories. Unfortunately, I couldn't find…

machine-learning classification google search

asked Jun 26 '14 at 12:11

yakka

51
1

5

votes

1 answer

Where is the cost parameter C in the RBF kernel in SVM?

An SVM using the RBF kernel depends on two parameters, C and gamma. If the equation of the RBF kernel is the following: $K(X,X')= \exp(-\gamma||X-X'||^2)$ In the equation I can see where gamma is used, but I can't find the C parameter. So, can anybody tell…

machine-learning classification svm search

asked May 07 '15 at 21:41

Weam

51
1
2
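
For the RBF question above, it may help to see that the kernel itself takes only gamma. This is a minimal sketch using the standard form with a negative exponent, $\exp(-\gamma\|x-y\|^2)$; the function name `rbf_kernel` is hypothetical. C does not appear in the kernel at all: it is the penalty on the slack variables in the soft-margin SVM objective, $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$, so it is chosen alongside gamma when training rather than when evaluating the kernel.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel exp(-gamma * ||x - y||^2) for two equal-length vectors."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Identical points give the maximum kernel value; distant points decay toward 0.
same = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)
apart = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)
```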

5

votes

2 answers

How to deal with position bias in search?

In search, the position of a search result affects the click-through rate a great deal. How do people usually deal with this? In practice, how can such bias be removed to create unbiased training data for training a learning-to-rank model?

machine-learning recommender-system search

asked Mar 22 '18 at 00:14

Jing

171
3

4

votes

3 answers

When is there enough data for generalization?

Are there any general rules that one can use to infer what can be learned/generalized from a particular data set? Suppose the dataset was taken from a sample of people. Can these rules be stated as functions of the sample or total population? I…

machine-learning data-mining statistics search

asked Aug 04 '14 at 19:10

Matt

811
1
7
12

4

votes

2 answers

Weighted k nearest neighbor search

I've searched quite a bit and haven't landed on any useful results. The problem statement is: Given a set of vectors, I wish to find its approximate k-nearest neighbors. The caveat here is that each of my dimensions represents a different entity and…

machine-learning data search

asked Aug 13 '15 at 13:52

sushant-hiray

141
4
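
For the weighted nearest-neighbor question above, one way to read "weighted" is a per-dimension weight inside the distance. The sketch below makes that assumption explicit: `weights` is a hypothetical list with one non-negative weight per dimension, and the search is an exact brute-force scan, not the approximate search the question asks for. It is meant only as a reference to check an approximate method against.

```python
import heapq
import math


def weighted_distance(x, y, weights):
    """Euclidean distance with an assumed per-dimension weight."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def k_nearest(query, points, weights, k):
    """Return the indices of the k points closest to query (exact scan)."""
    return heapq.nsmallest(
        k,
        range(len(points)),
        key=lambda i: weighted_distance(query, points[i], weights),
    )


# Illustrative data: the second dimension is down-weighted heavily.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
weights = [1.0, 0.01]
neighbors = k_nearest([0.0, 0.0], points, weights, k=2)
```

Because the weights are fixed, the same effect can be had by scaling each dimension by the square root of its weight once and then using any ordinary (including approximate) nearest-neighbor index on the scaled vectors.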

4

votes

2 answers

Algorithm for multiple extended string matching

I need to implement an algorithm for multiple extended string matching in text. Algorithms that match regular expressions would perhaps be too slow. Extended means the presence of wildcards (any number of characters instead of a star), for…

algorithms search

asked Mar 10 '15 at 10:20

Konstantin

153
9

Questions tagged [search]