Most Popular
1500 questions
8 votes, 1 answer
What tokenizer does OpenAI's GPT3 API use?
I'm building an application for the API, but I would like to be able to count the number of tokens my prompt will use before I submit an API call. Currently I often submit prompts that yield a 'too-many-tokens' error.
The closest I got to an answer…
Herman Autore
8 votes, 1 answer
What is the difference between "fully developed decision trees" and "shallow decision trees"?
While reading about ensemble methods in the scikit-learn docs, I saw that bagging methods work best with strong and complex models (e.g., fully developed decision trees), in contrast with boosting methods which usually work best with weak models (e.g.,…
Mithril
8 votes, 2 answers
What is the difference between BERT and RoBERTa?
I want to understand the difference between BERT and RoBERTa. I saw the article below.
https://towardsdatascience.com/bert-roberta-distilbert-xlnet-which-one-to-use-3d5ab82ba5f8
It mentions that RoBERTa was trained on 10x more data but I don't…
Noman Tanveer
8 votes, 2 answers
Image clustering by similarity measurement (CW-SSIM)
I'm trying to use scikit-learn and pyssim to cluster a set of fewer than 100 images.
The end goal is to place the images into several buckets (clusters) according to their calculated similarity measure, CW-SSIM.
The task seems to be trivial,…
Oleg Puzanov
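For the CW-SSIM clustering question above, one minimal sketch (an assumption, not the asker's or pyssim's method) treats the pairwise CW-SSIM scores as an already computed similarity matrix and groups images by single-linkage at a fixed threshold: two images share a bucket if a chain of pairs with similarity at least the threshold connects them. The function name `threshold_clusters`, the matrix values, and the threshold are hypothetical.

```python
from typing import Dict, List


def threshold_clusters(sim: List[List[float]], threshold: float) -> List[List[int]]:
    """Single-linkage clustering of a precomputed similarity matrix at a fixed cut.

    sim[i][j] is assumed to be a symmetric similarity score (e.g. CW-SSIM),
    where larger means more similar. Items i and j end up in the same cluster
    if a chain of pairs with similarity >= threshold connects them.
    """
    n = len(sim)
    parent = list(range(n))  # union-find forest: parent[i] == i marks a root

    def find(i: int) -> int:
        # Walk up to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair whose similarity reaches the threshold.
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    # Collect members by root; indices are appended in ascending order.
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=lambda g: g[0])


# Hypothetical 4x4 CW-SSIM-style matrix for four images.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.85],
    [0.1, 0.2, 0.85, 1.0],
]
print(threshold_clusters(sim, 0.8))
```

With fewer than 100 images the loop over all pairs is cheap. The threshold plays the role that the number of clusters plays in k-means, so it would need to be chosen by inspecting results or on a few labeled examples.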
8 votes, 4 answers
How to give name to topics created using LDA?
I have categorized 800,000 documents into 500 categories using Mahout topic modelling.
Instead of representing each topic by its top 5/10 words, I want to infer a generic name for the group using any existing algorithm.
For the…
adihere
8 votes, 2 answers
How to teach neural network a policy for a board game using reinforcement learning?
I need to use reinforcement learning to teach a neural net a policy for a board game. I chose Q-learning as the specific algorithm.
I'd like a neural net to have the following structure:
layer - rows * cols + 1 neurons - input - values of…
Luke
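For the board-game question above, the core of Q-learning with a neural net is building the regression target for each transition. A minimal sketch, assuming the network outputs one Q-value per action and that illegal moves in the next state are excluded from the max: copy the current predictions and overwrite only the entry for the action taken with $r + \gamma \max_{a'} Q(s', a')$, or with just $r$ when the game has ended. It also assumes $s'$ is the learning player's next decision point; alternating-turn games are often handled by negating the opponent's value or waiting for the reply, which this sketch does not do. The function name and argument layout are hypothetical and independent of the asker's rows * cols + 1 input encoding.

```python
from typing import List, Sequence


def q_learning_target(
    q_current: Sequence[float],
    action: int,
    reward: float,
    q_next: Sequence[float],
    legal_next: Sequence[int],
    done: bool,
    gamma: float = 0.99,
) -> List[float]:
    """Build the regression target for one transition (s, a, r, s').

    q_current:  network outputs Q(s, .) for the state where `action` was played.
    q_next:     network outputs Q(s', .) for the resulting state.
    legal_next: indices of moves that are legal in s' (assumed encoding).
    """
    target = list(q_current)  # copy so the caller's predictions are untouched
    if done or not legal_next:
        # Game over (or no legal move): no future value to bootstrap from.
        target[action] = reward
    else:
        # Q-learning bootstraps from the best legal move in the next state.
        best_next = max(q_next[a] for a in legal_next)
        target[action] = reward + gamma * best_next
    return target


# Hypothetical numbers: action 1 was played, and move 1 is illegal in s'.
print(q_learning_target([0.1, 0.5, 0.2], 1, 0.0, [0.3, 0.9, 0.4], [0, 2], False, gamma=0.5))
```

The network would then be fit on (state, target) pairs with a squared-error loss. Because the other entries equal the network's own predictions at the time the target was built, only the chosen action's output contributes to the error.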
8 votes, 1 answer
Why does a restricted Boltzmann machine (RBM) tend to learn very similar weights?
These are 4 different weight matrices that I got after training a restricted Boltzmann machine (RBM) with ~4k visible units and only 96 hidden units/weight vectors. As you can see, the weights are extremely similar - even black pixels on the face are…
ffriend
8 votes, 4 answers
How to select a particular column in Spark (PySpark)?
testPassengerId = test.select('PassengerId').map(lambda x: x.PassengerId)
I want to select the PassengerId column and make an RDD of it. But .select is not working. It says: 'RDD' object has no attribute 'select'
dsl1990
8 votes, 1 answer
Coreference Resolution for German Texts
Does anyone know a library for performing coreference resolution on German texts?
As far as I know, OpenNLP and Stanford NLP are not able to perform coreference resolution for German texts.
The only tool that I know of is CorZu, which is a Python…
Pasmod Turing
8 votes, 1 answer
Where exactly does $\geq 1$ come from in the SVM optimization problem's constraint?
I've understood that SVMs are binary, linear classifiers (without the kernel trick). They have training data $(x_i, y_i)$ where $x_i$ is a vector and $y_i \in \{-1, 1\}$ is the class. As they are binary, linear classifiers, the task is to find a…
Martin Thoma
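For the SVM question above, a short derivation shows where the 1 comes from in the hard-margin (linearly separable) case. It assumes the usual weight vector $w$ and bias $b$ with decision function $f(x) = w^\top x + b$; these symbols do not appear in the truncated excerpt. Correct classification of every training point means

$$ y_i \,(w^\top x_i + b) > 0 \quad \text{for all } i. $$

The hyperplane $\{x : w^\top x + b = 0\}$ is unchanged if $(w, b)$ is replaced by $(c\,w,\ c\,b)$ for any $c > 0$. With finitely many training points, let

$$ \gamma = \min_i \; y_i \,(w^\top x_i + b) > 0. $$

Choosing $c = 1/\gamma$ gives a rescaled pair with

$$ \min_i \; y_i \,(w^\top x_i + b) = 1, \qquad \text{i.e.} \qquad y_i \,(w^\top x_i + b) \geq 1 \ \text{ for all } i, $$

with equality for the points closest to the hyperplane (the support vectors). The distance from $x_i$ to the hyperplane is $y_i (w^\top x_i + b) / \|w\|$, so under this normalization the margin is $1/\|w\|$, and maximizing it is equivalent to

$$ \min_{w,\,b} \; \tfrac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i \,(w^\top x_i + b) \geq 1 \ \text{ for all } i. $$

So the 1 is a normalization: any other positive constant would define the same classifier, only with $w$ and $b$ rescaled.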
8 votes, 2 answers
Machine Learning: Single input to variable number of outputs
Is there a machine learning algorithm that maps a single input to an output list of variable length? If so, are there any implementations of the algorithm for public use? If not, what do you recommend as a workaround?
In my case, the input is a…
ricksmt
8 votes, 1 answer
Recognizing humans in images with a HOG descriptor and SVM classifier performs poorly
I'm using a HOG descriptor, coupled with an SVM classifier, to recognise humans in pictures. I'm using the Python wrappers for OpenCV.
I've used the excellent tutorial at PyImageSearch, which explains what the algorithm does and furnishes hints on how…
martina
8 votes, 2 answers
Pylearn2 vs TensorFlow
I am about to dive into a long NN research project and would like a push toward either Pylearn2 or TensorFlow. As of Dec 2015, has the community started to lean one way or the other?
This link has given me concern about getting tied to…
user3155053
8 votes, 1 answer
When do I have to use aucPR instead of auROC? (and vice versa)
I'm wondering whether it is sometimes better to validate a model with aucPR instead of aucROC. Do these cases depend only on "domain & business understanding"?
In particular, I'm thinking about the "unbalanced class problem", where it seems…
jmvllt
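For the aucPR vs auROC question above, a small pure-Python sketch makes the difference on unbalanced data concrete. `roc_auc` uses the rank interpretation of ROC AUC (the probability that a random positive scores above a random negative), and `average_precision` uses the step-wise average-precision summary of the precision-recall curve, which is one common way to compute aucPR; trapezoidal interpolation of the PR curve is another and can give somewhat different numbers. Both helper names and the toy data are hypothetical, and `average_precision` assumes no tied scores and at least one positive.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Mean of precision@k over the ranks k where a positive appears (no ties assumed)."""
    ranked = sorted(zip(y_score, y_true), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / k  # precision at this cut-off
    return total / hits


# Unbalanced toy data: 2 positives among 20 items, scores in descending order.
labels = [0, 0, 1, 0, 0, 0, 1] + [0] * 13
scores = list(range(20, 0, -1))
print(roc_auc(labels, scores), average_precision(labels, scores))
```

On this toy data ROC AUC is about 0.81 while average precision is about 0.31. ROC AUC is buoyed by the many easy negatives ranked below the positives, whereas precision is penalized by every negative ranked above them, which is why PR-based metrics are often preferred when positives are rare.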
8 votes, 5 answers
Best way to search for a similar document given the ngram
I have a database of about 200 documents whose n-grams I have extracted. I want to find the document in my database that is most similar to a query document. In other words, I want to find the document in the database that shares the largest number of…
okebz
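For the n-gram search question above, a minimal sketch assuming word bigrams and Jaccard similarity (intersection over union of the n-gram sets) is a linear scan over the corpus. Jaccard is swapped in here for the raw "most shared n-grams" count because the raw count tends to favour long documents; `len(q & d)` recovers the raw count if that is what is wanted. If the n-grams are already stored as sets, they can be passed to `jaccard` directly. All function names and the toy corpus are hypothetical.

```python
from typing import Dict, Optional, Set, Tuple


def ngrams(text: str, n: int = 2) -> Set[Tuple[str, ...]]:
    """Set of word n-grams after lower-casing and whitespace tokenization."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: Set[Tuple[str, ...]], b: Set[Tuple[str, ...]]) -> float:
    """len(a & b) / len(a | b), defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query: str, corpus: Dict[str, str], n: int = 2) -> Tuple[Optional[str], float]:
    """Linear scan returning the corpus id with the highest n-gram Jaccard score."""
    q = ngrams(query, n)
    best_id: Optional[str] = None
    best_score = -1.0
    for doc_id, text in corpus.items():
        score = jaccard(q, ngrams(text, n))
        if score > best_score:
            best_id, best_score = doc_id, score
    return best_id, best_score


corpus = {"a": "the cat sat on the mat", "b": "the dog ran in the park"}
print(most_similar("the cat sat on a mat", corpus))
```

With about 200 documents a linear scan is fine. For much larger collections, an inverted index from n-gram to document IDs, or MinHash signatures, would avoid comparing the query against every document.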