Questions tagged [search]
53 questions
26 votes, 2 answers
How to fit pairwise ranking models in XGBoost?
As far as I know, to train learning to rank models, you need to have three things in the dataset:
label or relevance
group or query id
feature vector
For example, the Microsoft Learning to Rank dataset uses this format (label, group id, and…
tokestermw
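The three ingredients listed in the question (label, group/query id, feature vector) correspond to the SVMlight-style lines used by the LETOR/MSLR datasets, `label qid:<id> <index>:<value> ...`. Below is a minimal sketch, assuming exactly that line format, that parses such lines and derives the per-query group sizes that XGBoost's ranking objectives consume (for example via `DMatrix.set_group`). The helper names `parse_letor` and `group_sizes` are hypothetical.

```python
from itertools import groupby

def parse_letor(lines):
    """Parse 'label qid:<id> <idx>:<value> ...' lines (SVMlight/LETOR style).

    Returns (labels, qids, features); each feature vector is a sparse dict
    mapping feature index -> value.
    """
    labels, qids, features = [], [], []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        tokens = line.split()
        labels.append(int(tokens[0]))                # relevance label
        qids.append(tokens[1].split(":", 1)[1])      # query / group id
        feats = {}
        for tok in tokens[2:]:
            idx, val = tok.split(":", 1)
            feats[int(idx)] = float(val)
        features.append(feats)
    return labels, qids, features

def group_sizes(qids):
    """Count consecutive rows per query id; rows must be contiguous per query."""
    return [len(list(rows)) for _, rows in groupby(qids)]

sample = [
    "2 qid:1 1:0.5 2:1.0",
    "0 qid:1 1:0.1 2:0.3",
    "1 qid:2 1:0.9 2:0.2",
]
labels, qids, feats = parse_letor(sample)
print(labels, group_sizes(qids))  # [2, 0, 1] [2, 1]
```

The group sizes only make sense if all rows of a query are adjacent, which is why the sketch counts consecutive runs rather than total occurrences.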
15 votes, 5 answers
How can I ensure anonymity with queries to small datasets?
I'm building a service that will contain personal data relating to real people.
Initially the dataset will be quite small, and as such it may be possible to identify individuals if the search parameters are narrowed sufficiently.
An example of a…
mal
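One common safeguard, in the spirit of k-anonymity, is to suppress any query whose result set covers fewer than k people. A minimal sketch, assuming records are plain dicts and that k = 5 is an acceptable threshold (both illustrative, as is the name `safe_search`). Thresholding alone does not stop an attacker from differencing two overlapping queries, so it is a starting point rather than a guarantee.

```python
def safe_search(records, predicate, k=5):
    """Return matching records only if at least k of them match.

    Small result sets are suppressed (None) so that narrowly targeted
    queries cannot single out individuals; k=5 is illustrative.
    """
    matches = [r for r in records if predicate(r)]
    return matches if len(matches) >= k else None

people = [{"city": "Leeds", "age": a} for a in range(20, 30)]
print(len(safe_search(people, lambda r: r["city"] == "Leeds")))  # 10
print(safe_search(people, lambda r: r["age"] == 25))             # None
```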
12 votes, 3 answers
How does a query into a huge database return with negligible latency?
For example, when searching something in Google, results return nigh-instantly.
I understand that Google sorts and indexes pages with algorithms etc., but I imagine it would be infeasible for the results of every single possible query to be indexed (and…
resgh
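The usual answer is that search engines do not precompute results per query; they precompute an inverted index mapping each term to the documents that contain it, so answering a query only touches the posting lists of that query's terms. The toy, in-memory sketch below illustrates the idea (real engines also shard, compress, and cache their indexes); `build_index` and `lookup` are hypothetical names.

```python
from collections import defaultdict

def build_index(docs):
    """Map each lowercase term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def lookup(index, query):
    """Return ids of docs containing every query term (AND semantics).

    Only the posting lists of the query terms are read, not every document.
    """
    postings = [set(index.get(t, ())) for t in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = ["fast search engines", "search latency matters", "engines need indexes"]
idx = build_index(docs)
print(lookup(idx, "search engines"))  # [0]
```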
8 votes, 5 answers
Best way to search for a similar document given its n-grams
I have a database of about 200 documents whose n-grams I have extracted. I want to find the document in my database that is most similar to a query document. In other words, I want to find the document in the database that shares the largest number of…
okebz
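With only about 200 documents, a brute-force comparison is cheap. A raw count of shared n-grams tends to favour long documents, so a common alternative is Jaccard similarity, |A ∩ B| / |A ∪ B|, over the two n-gram sets. A minimal sketch, assuming whitespace-tokenised word bigrams; `ngrams`, `jaccard`, and `most_similar` are hypothetical names.

```python
def ngrams(text, n=2):
    """Return the set of word n-grams in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(query, corpus, n=2):
    """Return (index, score) of the corpus document most similar to query."""
    q = ngrams(query, n)
    scores = [jaccard(q, ngrams(doc, n)) for doc in corpus]
    best = max(range(len(corpus)), key=scores.__getitem__)
    return best, scores[best]

corpus = ["the cat sat on the mat", "a dog ran in the park", "the cat sat down"]
print(most_similar("the cat sat on a mat", corpus))
```

Here document 0 wins with 3 shared bigrams out of 7 distinct ones (score 3/7).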
8 votes, 2 answers
What are some standard ways of computing the distance between individual search queries?
I asked a similar question about the distance between "documents" (Wikipedia articles, news stories, etc.). I made this a separate question because search queries are considerably shorter and noisier than documents. I hence…
Matt
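Because queries are short and often contain typos, a character-level measure can be more informative than the bag-of-words measures used for longer documents. One standard choice is Levenshtein (edit) distance: the minimum number of single-character insertions, deletions, and substitutions that turn one string into the other. A minimal sketch using the usual dynamic program; scaling by the longer query's length in `query_distance` (a hypothetical helper) is just one illustrative normalisation.

```python
def levenshtein(a, b):
    """Minimum single-character insertions, deletions, or substitutions
    turning a into b (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]

def query_distance(q1, q2):
    """Edit distance scaled to [0, 1] by the longer (normalised) query."""
    q1, q2 = q1.lower().strip(), q2.lower().strip()
    longest = max(len(q1), len(q2))
    return levenshtein(q1, q2) / longest if longest else 0.0

print(levenshtein("kitten", "sitting"))  # 3
```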
7 votes, 1 answer
How can we effectively measure the impact of our data decisions?
Apologies if this is a very broad question. What I would like to know is how effective A/B testing (or other methods) is at measuring the effects of a design decision.
For instance, we can analyse user interactions or click results,…
EdChum
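For click-based metrics, one standard way to check whether a difference between two variants exceeds random noise is a two-proportion z-test on the click-through rates. A minimal sketch, assuming independent impressions and samples large enough for the normal approximation; the counts in the example are made up for illustration, and `two_proportion_z_test` is a hypothetical name.

```python
import math

def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for a difference in click-through rate.

    Uses the pooled proportion under the null hypothesis of equal rates.
    Returns (z, p_value).
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p = two_proportion_z_test(clicks_a=200, n_a=10_000, clicks_b=260, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value says the CTR difference is unlikely under equal rates; it does not by itself say the design decision improved the outcome you actually care about.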
6 votes, 2 answers
Preparing for a Machine Learning Design Interview
I am not sure if this is a relevant post here but:
I made it to the final round for the Machine Learning Engineer position at Facebook. The final round interview is virtual (thanks to Corona) and will consist of:
2 - General Algorithmic Coding…
Wolfy
6 votes, 2 answers
Why do popular search engines not follow the usual AND, OR logic for queries?
I am teaching myself Information Retrieval from Christopher Manning's book (PDF link: http://nlp.stanford.edu/IR-book/pdf/01bool.pdf). I tried Exercise 1.13:
"Try using the Boolean search features on a couple of major web search engines.…
user21595
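Chapter 1 of the linked book evaluates a strict Boolean AND by intersecting the sorted posting lists of the two terms with a linear-time merge. The sketch below implements that merge, which is handy for comparing what a strict AND would return with what a web engine actually shows. The posting lists are illustrative doc ids, and `intersect` is a hypothetical name.

```python
def intersect(p1, p2):
    """Intersect two sorted posting lists in O(len(p1) + len(p2)) time."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:       # doc contains both terms
            answer.append(p1[i])
            i += 1
            j += 1
        elif p1[i] < p2[j]:      # advance the list with the smaller doc id
            i += 1
        else:
            j += 1
    return answer

term_a = [1, 2, 4, 11, 31, 45, 173, 174]
term_b = [2, 31, 54, 101]
print(intersect(term_a, term_b))  # [2, 31]
```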
6 votes, 3 answers
Can we quantify how position within search results is related to click-through probability?
Suppose, for example, that the first search result on a page of Google search results is swapped with the second result. How much would this change the click-through probabilities of the two results? How much would its click-through probability drop…
zihaolucky
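A first, purely observational step is to compute the empirical click-through rate at each rank from click logs. This conflates the effect of position with relevance (higher-ranked results are usually also more relevant), which is why swap experiments like the one described in the question are used to isolate the position effect. A minimal sketch over hypothetical `(position, clicked)` log pairs; `ctr_by_position` is a hypothetical name.

```python
from collections import Counter

def ctr_by_position(log):
    """Empirical click-through rate per rank from (position, clicked) pairs."""
    shown, clicked = Counter(), Counter()
    for position, was_clicked in log:
        shown[position] += 1
        clicked[position] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in sorted(shown)}

log = [(1, True), (1, True), (1, False), (1, True),
       (2, True), (2, False), (2, False), (2, False)]
print(ctr_by_position(log))  # {1: 0.75, 2: 0.25}
```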
5 votes, 4 answers
How does Google categorize results from its image search?
While doing a Google image search, the page displays some automatically generated categories for the images of the topic being searched for. I'm interested in learning how this works, and how it chooses and creates categories.
Unfortunately, I couldn't find…
yakka
5 votes, 1 answer
Where is the cost parameter C in the RBF kernel in SVM?
An SVM with an RBF kernel depends on two parameters, C and gamma. If the RBF kernel is defined as follows:
$K(X,X') = \exp(-\gamma\|X-X'\|^2)$
In the equation I can see where gamma is used, but I can't find the C parameter.
So, can anybody tell…
Weam
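The RBF kernel is usually written with a negative exponent, K(x, x') = exp(-γ‖x − x'‖²), and it contains only γ. C does not appear in the kernel at all: it is the weight on the slack variables in the soft-margin SVM objective, min ½‖w‖² + C Σᵢ ξᵢ (equivalently, the upper bound in 0 ≤ αᵢ ≤ C in the dual), where it trades margin width against training errors. A minimal sketch of the kernel alone, with the hypothetical name `rbf_kernel`.

```python
import math

def rbf_kernel(x, x_prime, gamma):
    """RBF (Gaussian) kernel exp(-gamma * ||x - x'||^2).

    Only gamma appears here; the SVM cost parameter C enters the training
    objective (as the weight on slack variables), not the kernel itself.
    """
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-gamma * sq_dist)

print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1), about 0.3679
```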
5 votes, 2 answers
How to deal with position bias in search?
In search, the position of a search result affects its click-through rate a great deal. How do people usually deal with this? In practice, how can this bias be removed to create unbiased training data for a learning-to-rank model?
Jing
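One widely used approach is inverse propensity scoring (IPS): estimate the probability that a user examines a result at rank k (its propensity), then weight each click by 1 / propensity when building training data, so clicks at rarely examined positions count for more. A minimal sketch, assuming a power-law propensity (1/k)^η; that model, the value of η, and the names `propensity` and `ips_weights` are all illustrative. In practice propensities are estimated, for example from randomized or swap experiments.

```python
def propensity(position, eta=1.0):
    """Assumed examination probability at a 1-based rank: (1 / position) ** eta."""
    return (1.0 / position) ** eta

def ips_weights(clicks, eta=1.0):
    """Inverse-propensity weight for each clicked (doc_id, position) pair."""
    return [1.0 / propensity(pos, eta) for _, pos in clicks]

weights = ips_weights([("a", 1), ("b", 3)])
print(weights)  # a click at rank 3 is weighted about 3x a click at rank 1
```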
4 votes, 3 answers
When is there enough data for generalization?
Are there any general rules that one can use to infer what can be learned/generalized from a particular data set? Suppose the dataset was taken from a sample of people. Can these rules be stated as functions of the sample or total population?
I…
Matt
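One concrete rule of this kind comes from Hoeffding's inequality: to estimate a single proportion (for example, the share of a population with some property) to within ±ε with probability at least 1 − δ from n independent samples, n ≥ ln(2/δ) / (2ε²) suffices. The bound depends on the sample size, not on the total population size, assuming independent sampling. It covers one fixed quantity; guarantees for learning across many candidate models need additional arguments such as union bounds or VC dimension. A minimal sketch with the hypothetical name `hoeffding_sample_size`.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Smallest n with 2 * exp(-2 * n * epsilon**2) <= delta (Hoeffding bound).

    With n i.i.d. samples, the sample proportion is within epsilon of the
    true proportion with probability at least 1 - delta.
    """
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

print(hoeffding_sample_size(epsilon=0.05, delta=0.05))  # 738
```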
4 votes, 2 answers
Weighted k nearest neighbor search
I've searched quite a bit and haven't landed on any useful results.
The problem statement is:
Given a set of vectors, I wish to find its approximate k-nearest neighbors.
The caveat here is that each of my dimensions represents a different entity and…
sushant-hiray
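If each dimension carries a different importance, one simple option is a weighted Euclidean distance, d(x, y) = sqrt(Σᵢ wᵢ (xᵢ − yᵢ)²) with wᵢ ≥ 0. Multiplying each coordinate by sqrt(wᵢ) turns this into ordinary Euclidean distance, so any standard (approximate) nearest-neighbour index can be built on the rescaled vectors. The exact brute-force sketch below shows both pieces; the weights and the names `weighted_knn` and `rescale` are hypothetical.

```python
import heapq
import math

def weighted_knn(query, points, weights, k):
    """Exact k nearest neighbours under weighted Euclidean distance.

    dist(x, y) = sqrt(sum_i w_i * (x_i - y_i)**2), with all w_i >= 0.
    Returns the indices of the k closest points, nearest first.
    """
    def dist(p):
        return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, query, p)))
    return heapq.nsmallest(k, range(len(points)), key=lambda i: dist(points[i]))

def rescale(v, weights):
    """Map v so that plain Euclidean distance equals the weighted distance."""
    return [math.sqrt(w) * a for w, a in zip(weights, v)]

points = [[0.0, 10.0], [3.0, 0.0], [1.0, 1.0]]
print(weighted_knn([0.0, 0.0], points, weights=[1.0, 0.01], k=2))  # [0, 2]
```

Down-weighting the second dimension makes point 0 (far away only along that dimension) the nearest neighbour.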
4 votes, 2 answers
Algorithm for multiple extended string matching
I need to implement an algorithm for multiple extended string matching in text. Algorithms for matching regular expressions would perhaps be too slow.
Extended means the presence of wildcards (any number of characters instead of a star), for…
Konstantin
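As a correctness baseline (not the fast multi-pattern algorithm the question asks for), a single pattern in which `*` matches any run of characters can be checked with a greedy two-pointer scan that backtracks to the most recent star. Searching several patterns naively then means looping over them; faster approaches commonly split each pattern on its wildcards and locate the literal pieces with a multi-pattern matcher such as Aho-Corasick. The names `wildcard_match` and `find_patterns` are hypothetical.

```python
def wildcard_match(pattern, s):
    """True if s matches pattern entirely; '*' matches any (possibly empty) run."""
    p = i = 0
    star = -1   # index of the most recent '*' in pattern
    mark = 0    # position in s where that '*' started matching
    while i < len(s):
        if p < len(pattern) and pattern[p] != "*" and pattern[p] == s[i]:
            p += 1
            i += 1
        elif p < len(pattern) and pattern[p] == "*":
            star, mark = p, i
            p += 1
        elif star != -1:
            # Backtrack: let the last '*' absorb one more character.
            mark += 1
            i = mark
            p = star + 1
        else:
            return False
    # Any remaining pattern characters must all be '*'.
    while p < len(pattern) and pattern[p] == "*":
        p += 1
    return p == len(pattern)

def find_patterns(patterns, text):
    """Naive multi-pattern search: patterns that occur anywhere in text."""
    return [pat for pat in patterns if wildcard_match("*" + pat + "*", text)]

print(find_patterns(["data*science", "mach*ne", "deep*net"],
                    "applied data and computer science"))  # ['data*science']
```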