Questions tagged [search]

53 questions
26
votes
2 answers

How to fit pairwise ranking models in XGBoost?

As far as I know, to train learning-to-rank models, you need to have three things in the dataset: (1) a label or relevance score, (2) a group or query id, and (3) a feature vector. For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…
tokestermw
  • 418
  • 1
  • 4
  • 8
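The three ingredients named in the excerpt above (label, query id, feature vector) can be illustrated with a small parser. This is a minimal sketch, assuming the common `label qid:<id> <index>:<value> ...` text layout; the function names `parse_ltr_line` and `group_sizes` are hypothetical and not taken from the question or from any library.

```python
from typing import Dict, List, Tuple


def parse_ltr_line(line: str) -> Tuple[int, str, Dict[int, float]]:
    """Parse one 'label qid:<id> <idx>:<val> ...' row (assumed layout)."""
    tokens = line.split("#", 1)[0].split()  # drop any trailing comment
    label = int(tokens[0])                  # relevance label
    qid = tokens[1].split(":", 1)[1]        # group / query id
    features: Dict[int, float] = {}
    for token in tokens[2:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, qid, features


def group_sizes(qids: List[str]) -> List[int]:
    """Count consecutive rows per query id; rows of one query must be adjacent."""
    sizes: List[int] = []
    previous = None
    for qid in qids:
        if qid == previous:
            sizes[-1] += 1
        else:
            sizes.append(1)
            previous = qid
    return sizes


rows = [
    "2 qid:1 1:0.5 2:1.0",
    "0 qid:1 1:0.1 2:0.0",
    "1 qid:2 1:0.3 2:0.7",
]
parsed = [parse_ltr_line(row) for row in rows]
sizes = group_sizes([qid for _, qid, _ in parsed])
```

Ranking libraries usually want the per-query group sizes in addition to labels and features, which is why the second helper is included; how they are passed to a particular library is left open here.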
15
votes
5 answers

How can I ensure anonymity with queries to small datasets?

I'm building a service that will contain personal data relating to real people. Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently. An example of a…
12
votes
3 answers

How does a query into a huge database return with negligible latency?

For example, when searching something in Google, results return nigh-instantly. I understand that Google sorts and indexes pages with algorithms etc., but I imagine it infeasible for the results of every single possible query to be indexed (and…
resgh
  • 231
  • 1
  • 7
8
votes
5 answers

Best way to search for a similar document given the ngram

I have a database of about 200 documents whose n-grams I have extracted. I want to find the document in my database that is most similar to a query document. In other words, I want to find the document in the database that shares the most number of…
okebz
  • 113
  • 4
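For a collection of this size (about 200 documents), a brute-force comparison is one plausible baseline. The sketch below assumes word-level n-grams and scores each document by the number of n-grams it shares with the query; both choices, and the names `word_ngrams` and `most_similar`, are assumptions made for illustration rather than details from the question.

```python
from typing import Dict, Set, Tuple

NGram = Tuple[str, ...]


def word_ngrams(text: str, n: int = 2) -> Set[NGram]:
    """Return the set of word n-grams in text (lowercased, whitespace tokens)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def most_similar(query: Set[NGram], corpus: Dict[str, Set[NGram]]) -> Tuple[str, int]:
    """Return (document id, shared n-gram count) for the best-overlapping document.

    Raises ValueError if the corpus is empty.
    """
    scored = ((doc_id, len(query & grams)) for doc_id, grams in corpus.items())
    return max(scored, key=lambda pair: pair[1])


corpus = {
    "a": word_ngrams("the quick brown fox jumps"),
    "b": word_ngrams("a slow brown fox sleeps"),
}
best = most_similar(word_ngrams("quick brown fox jumps high"), corpus)
```

A raw shared count favours long documents; dividing by the size of the union (Jaccard similarity) is a common alternative if that matters.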
8
votes
2 answers

What are some standard ways of computing the distance between individual search queries?

I asked a similar question about the distance between "documents" (Wikipedia articles, news stories, etc.). I made this a separate question because search queries are considerably smaller than documents and are considerably noisier. I hence…
Matt
  • 811
  • 1
  • 7
  • 12
7
votes
1 answer

How can we effectively measure the impact of our data decisions?

Apologies if this is a very broad question. What I would like to know is how effective A/B testing (or other methods) is at measuring the effects of a design decision. For instance we can analyse user interactions or click results,…
EdChum
  • 355
  • 1
  • 10
6
votes
2 answers

Preparing for a Machine Learning Design Interview

I am not sure if this is a relevant post here, but I made it to the final round for the Machine Learning Engineer position at Facebook. The final round interview is virtual (thanks to Corona) and will consist of: 2 - General Algorithmic Coding…
Wolfy
  • 237
  • 2
  • 9
6
votes
2 answers

Why do popular search engines not follow the usual AND, OR logic for queries?

I am teaching myself Information Retrieval from Christopher Manning's book (PDF link: http://nlp.stanford.edu/IR-book/pdf/01bool.pdf). I tried Exercise 1.13: "Try using the Boolean search features on a couple of major web search engines.…
user21595
6
votes
3 answers

Can we quantify how position within search results is related to click-through probability?

Suppose, for example, that the first search result on a page of Google search results is swapped with the second result. How much would this change the click-through probabilities of the two results? How much would its click-through probability drop…
5
votes
4 answers

How does Google categorize results from its image search?

While doing a Google image search, the page displays some inferred categories for the images of the topic being searched for. I'm interested in learning how this works, and how it chooses and creates categories. Unfortunately, I couldn't find…
yakka
  • 51
  • 1
5
votes
1 answer

Where is the cost parameter C in the RBF kernel in SVM?

An SVM with an RBF kernel depends on two parameters, C and gamma. If the equation of the RBF kernel is the following: $K(X,X') = \exp(-\gamma\|X-X'\|^2)$ In the equation I can see where I can use gamma, but I can't find the C parameter. So, can anybody tell…
Weam
  • 51
  • 1
  • 2
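As a rough illustration of where the two parameters live: gamma appears inside the kernel, which in its standard form is exp(-gamma * ||x - x'||^2), while C belongs to the soft-margin training objective, 0.5 * ||w||^2 + C * (sum of slacks), where it weights the penalty on margin violations. The helper names below are hypothetical, and this is only a sketch of the two formulas, not a training routine.

```python
import math
from typing import Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Standard RBF kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def soft_margin_objective(w: Sequence[float], slacks: Sequence[float], C: float) -> float:
    """Primal soft-margin objective: 0.5 * ||w||^2 + C * sum of slack variables."""
    return 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)


same_point = rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5)
unit_apart = rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0)
objective = soft_margin_objective([3.0, 4.0], [1.0, 2.0], C=2.0)
```

A larger C penalises slack more heavily, while a larger gamma makes the kernel value fall off faster with distance.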
5
votes
2 answers

How to deal with position bias in search?

In search, the position of a search result affects its click-through rate a great deal. How do people usually deal with this? In practice, how can such bias be removed to create unbiased training data for a learning-to-rank model?
Jing
  • 171
  • 3
4
votes
3 answers

When is there enough data for generalization?

Are there any general rules that one can use to infer what can be learned/generalized from a particular data set? Suppose the dataset was taken from a sample of people. Can these rules be stated as functions of the sample or total population? I…
Matt
  • 811
  • 1
  • 7
  • 12
4
votes
2 answers

Weighted k nearest neighbor search

I've searched quite a bit and haven't landed on any useful results. The problem statement is: Given a set of vectors, I wish to find its approximate k-nearest neighbors. The caveat here is that each of my dimensions resembles a different entity and…
4
votes
2 answers

Algorithm for multiple extended string matching

I need to implement an algorithm for multiple extended string matching in text. Algorithms for matching regular expressions would perhaps be too slow. Extended means the presence of wildcards (a star stands for any number of characters), for…
Konstantin
  • 153
  • 9
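A naive baseline for the wildcard case can be sketched as follows, assuming that `*` stands for any number of characters (including none) and that a pattern may match anywhere in the text. The excerpt is truncated, so these semantics and the names `wildcard_occurs` and `match_all` are assumptions; dedicated multi-pattern algorithms would scale better than this per-pattern loop.

```python
from typing import Dict, Iterable


def wildcard_occurs(pattern: str, text: str) -> bool:
    """Check whether pattern ('*' = any run of characters) occurs somewhere in text."""
    position = 0
    for piece in pattern.split("*"):
        if not piece:
            continue  # leading, trailing, or doubled stars match the empty string
        found = text.find(piece, position)
        if found < 0:
            return False
        # Taking the leftmost match leaves the most room for later pieces.
        position = found + len(piece)
    return True


def match_all(patterns: Iterable[str], text: str) -> Dict[str, bool]:
    """Run every pattern against the text and report which ones occur."""
    return {pattern: wildcard_occurs(pattern, text) for pattern in patterns}


results = match_all(["foo*bar", "baz", "a*b*c"], "foo and bar, abc")
```

Each literal piece is located with a plain substring search, so no regular-expression engine is involved.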