Questions tagged [deepmind]

Google's DeepMind is an artificial intelligence company that conducts research to advance the state of the art in machine learning applications. Topics include science, engineering, research, and ethics. It is famous for developing AlphaGo, which was able to defeat the world's best human Go player. Other notable accomplishments include its work on the protein folding problem in computational biology.

18 questions
7
votes
1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…
user141493
3
votes
1 answer

Which Policy Gradient Method was used by Google's DeepMind to teach AI to walk?

I just saw this video on YouTube. Which policy gradient method was used to train the AI to walk? Was it DDPG, D4PG, or something else?
3
votes
1 answer

On what principle did Google's DeepMind learn to walk?

I just saw this video on YouTube. On what principle did Google's DeepMind learn to walk? Was it Q-learning, a genetic algorithm, or policy gradient?
3
votes
2 answers

What does scaling a gradient do?

In the MuZero paper pseudocode, they have the following line of code: hidden_state = tf.scale_gradient(hidden_state, 0.5) What does this do? Why is it there? I've searched for tf.scale_gradient and it doesn't exist in TensorFlow. And, unlike…
Pro Q
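
For readers unfamiliar with the idea, a gradient-scaling operation is commonly written as `x * scale + stop_gradient(x) * (1 - scale)`: the forward value is unchanged, but only a fraction `scale` of the gradient flows backward. The sketch below illustrates that identity with a toy forward-mode dual number in plain Python. The `Dual` class and both helper functions are hypothetical stand-ins written for this illustration; they are not TensorFlow APIs and are not taken from the question.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Toy dual number: a value and its derivative w.r.t. one input."""
    value: float
    grad: float

    def __add__(self, other):
        # Sum rule: values add, derivatives add.
        return Dual(self.value + other.value, self.grad + other.grad)

    def __mul__(self, k):
        # Multiplication by a plain Python constant only.
        return Dual(self.value * k, self.grad * k)


def stop_gradient(x):
    """Keep the value, but treat it as a constant (zero derivative)."""
    return Dual(x.value, 0.0)


def scale_gradient(x, scale):
    """Leave the forward value unchanged; scale the derivative by `scale`."""
    return x * scale + stop_gradient(x) * (1.0 - scale)


# h stands in for a hidden state with derivative 1.0 w.r.t. itself.
h = Dual(value=3.0, grad=1.0)
scaled = scale_gradient(h, 0.5)
print(scaled.value, scaled.grad)
```

With `scale = 0.5`, the value stays at 3.0 while the derivative is halved, which is the effect the line in the question relies on.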
3
votes
1 answer

DQN fails to find optimal policy

Based on a DeepMind publication, I've recreated the environment and I am trying to make the DQN find and converge to an optimal policy. The task of an agent is to learn how to sustainably collect apples (objects), with the regrowth of the apples…
3
votes
1 answer

Game theory in Reinforcement Learning

In a recent blog post, DeepMind says it used game theory in the AlphaStar algorithm. DeepMind AlphaStar: Mastering this problem requires breakthroughs in several AI research challenges including: Game theory: StarCraft is a game…
3
votes
0 answers

Deep Reinforcement Learning for dynamic pricing

I am trying to implement a Deep Q-Network model for dynamic pricing in logistics. I can define the state space (origin, destination, type of shipment, customer, type of product, commodity of the shipment, availability of capacity, etc. Action…
3
votes
1 answer

Question on embedding similarity / nearest neighbor methods [SCANN Paper]

Question on embedding similarity / nearest neighbor methods: In https://arxiv.org/abs/2112.04426 the DeepMind team writes: For a database of T elements, we can query the approximate nearest neighbors in O(log(T)) time. We use the SCaNN library…
Aditya
2
votes
1 answer

Is the "training loop" used in AlphaGo Zero the same as an "epoch"?

I am confused about the training stage of AlphaGo Zero using the data collected from the self-play stage. According to an AlphaGo Zero Cheat Sheet I found, the training routine is: Loop from 1 to 1,000: Sample a mini-batch of 2048 episodes from…
ihavenoidea
2
votes
4 answers

Which AI algorithm is best for chess?

I'm working on my chess bot, and I would like to implement a simple artificial intelligence for it. I'm new to this, so I'm unsure how to do it specifically for chess. I heard about Q-learning, supervised/unsupervised learning, genetic algorithms, etc.,…
1
vote
1 answer

AlphaGo Zero loss function

As far as I understood from the AlphaGo Zero system: During the self-play part, the MCTS algorithm stores a tuple ($s$, $\pi$, $z$) where $s$ is the state, $\pi$ is the probability distribution over the actions in the state and $z$ is an integer…
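
For context, and using the same symbols as the excerpt, the loss reported in the AlphaGo Zero paper for a network $f_\theta(s) = (p, v)$ can be written as below. Here $p$ (predicted move probabilities), $v$ (predicted value), and $c$ (an L2 regularisation coefficient) are symbols added for this note and do not appear in the excerpt.

$$
l = (z - v)^2 - \pi^{\top} \log p + c \,\lVert \theta \rVert^2
$$

The first term fits the value head to the game outcome $z$, the second is a cross-entropy that pulls $p$ toward the search distribution $\pi$, and the third penalises large weights.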
1
vote
0 answers

Temperature variable in Boltzmann exploration in reinforcement learning

I have been using the epsilon-greedy action selection strategy and have recently come across the Boltzmann (softmax) action selection strategy. One thing I am not clear about in Boltzmann exploration is the temperature variable. How should we define this…
chink
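
As a minimal sketch of the strategy named in the question, Boltzmann (softmax) exploration samples action $a$ with probability proportional to $\exp(Q(a)/T)$, where $T$ is the temperature. The function names and the example Q-values below are assumptions made for illustration; they are not from the question.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values: P(a) is proportional to exp(Q(a) / T)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the max Q-value for numerical stability; probabilities are unchanged.
    q_max = max(q_values)
    exps = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values, temperature, rng=random):
    """Draw an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


q = [1.0, 2.0, 3.0]                  # hypothetical action values
hot = boltzmann_probs(q, 100.0)      # high T: close to uniform (more exploration)
cold = boltzmann_probs(q, 0.01)      # low T: close to greedy (more exploitation)
action = sample_action(q, 1.0)
```

A high temperature flattens the distribution toward uniform random choice, while a temperature near zero approaches greedy selection, so the temperature is often decayed over training in the same way epsilon is.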
1
vote
0 answers

DeepMind conditional neural process: evaluation

Going through the DeepMind Jupyter notebook on conditional neural processes, the plots at the bottom of the notebook show that the ground truth and the predicted distribution only overlap around the "context points". These context points are already in…
Shadi
1
vote
0 answers

Can OpenAI's CLIP Model or DeepMind's Flamingo Model Predict Classes Truly Never Before Seen for Zero- or Few-Shot Learning?

One type of statement about zero-shot and few-shot learning that I continually come across in the literature is that these models can predict new, unseen classes at inference time on which they were never trained. However, such sources typically do…
user141493
1
vote
0 answers

How are Learned Latent Arrays for the Perceiver Resampler in DeepMind's Flamingo Vision-Language Model Actually Calculated? By which Technique?

In "Flamingo: a Visual Language Model for Few-Shot Learning" (Alayrac et al. 2022) https://arxiv.org/abs/2204.14198 DeepMind makes use of "learned latent queries" in their "Perceiver Resampler" to ensure that parameters do not scale quadratically…