Questions tagged [deepmind]

Google's DeepMind is an artificial intelligence company that conducts research and advances the state of the art in machine learning applications. Topics include science, engineering, research, and ethics. It is famous for developing AlphaGo, which defeated the world's best human Go player. Other notable accomplishments include its work on the protein folding problem using computational biology.

18 questions

votes

1 answer

How Exactly Does In-Context Few-Shot Learning Actually Work in Theory (Under the Hood), Despite only Having a "Few" Support Examples to "Train On"?

Recent models like the GPT-3 Language Model (Brown et al., 2020) and the Flamingo Visual-Language Model (Alayrac et al., 2022) use in-context few-shot learning. The models are able to make highly accurate predictions even when only presented with a…

asked Oct 24 '22 at 23:26

user141493

votes

1 answer

Which Policy Gradient Method was used by Google's DeepMind to teach AI to walk?

I just saw this video on YouTube. Which policy gradient method was used to train the AI to walk? Was it DDPG, D4PG, or something else?

machine-learning deep-learning reinforcement-learning policy-gradients deepmind

asked Apr 10 '21 at 12:10

learner

votes

1 answer

On what principle did Google's DeepMind learn to walk?

I just saw this video on YouTube. On what principle did Google's DeepMind learn to walk? Was it Q-learning, a genetic algorithm, or policy gradient?

machine-learning deep-learning q-learning genetic-algorithms deepmind

asked Mar 29 '21 at 11:45

learner

votes

2 answers

What does scaling a gradient do?

The MuZero paper's pseudocode contains the following line: hidden_state = tf.scale_gradient(hidden_state, 0.5) What does this do? Why is it there? I've searched for tf.scale_gradient and it doesn't exist in TensorFlow. And, unlike…
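
For illustration, one common way to build a gradient-scaling helper is to combine the tensor with a stop-gradient copy of itself, so the forward value is unchanged while the backward signal is multiplied by the scale. The sketch below assumes that construction and imitates it with a tiny hand-written derivative tracker instead of TensorFlow; `Dual`, `stop_gradient`, and `scale_gradient` here are hypothetical stand-ins, not a library API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative with respect to one input."""
    value: float
    grad: float

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.value + other.value, self.grad + other.grad)

    def scaled(self, k: float) -> "Dual":
        # Multiplying by a constant scales both the value and the derivative.
        return Dual(self.value * k, self.grad * k)


def stop_gradient(x: Dual) -> Dual:
    # Same forward value, but treated as a constant when differentiating.
    return Dual(x.value, 0.0)


def scale_gradient(x: Dual, scale: float) -> Dual:
    # Forward value: x * scale + x * (1 - scale) == x.
    # Derivative: only the first term carries one, so it is multiplied by `scale`.
    return x.scaled(scale) + stop_gradient(x).scaled(1.0 - scale)


hidden_state = Dual(value=3.0, grad=1.0)
scaled = scale_gradient(hidden_state, 0.5)
print(scaled.value, scaled.grad)
```

Under this assumption the operation leaves activations untouched and only halves the gradient flowing back through `hidden_state`.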

machine-learning machine-learning-model ai deepmind

asked Jan 02 '20 at 06:53

Pro Q

votes

1 answer

DQN fails to find optimal policy

Based on a DeepMind publication, I've recreated the environment and am trying to get the DQN to converge to an optimal policy. The agent's task is to learn how to sustainably collect apples (objects), with the regrowth of the apples…

reinforcement-learning q-learning dqn convergence deepmind

asked Apr 01 '19 at 01:23

macwiatrak

votes

1 answer

Game theory in Reinforcement Learning

In a recent blog post, DeepMind says it used game theory in the AlphaStar algorithm. DeepMind on AlphaStar: Mastering this problem requires breakthroughs in several AI research challenges including: Game theory: StarCraft is a game…

deep-learning reinforcement-learning deepmind

asked Mar 25 '19 at 06:59

Karthik Rajkumar

votes

0 answers

Deep Reinforcement Learning for dynamic pricing

I am trying to implement a Deep Q-Network model for dynamic pricing in logistics. I can define the state space (origin, destination, type of shipment, customer, type of product, commodity of the shipment, availability of capacity, etc.). Action…

deep-learning tensorflow reinforcement-learning dqn deepmind

asked Mar 15 '19 at 11:37

Karthik Rajkumar

votes

1 answer

Question on embedding similarity / nearest neighbor methods [SCaNN Paper]

Question on embedding similarity / nearest neighbor methods: In https://arxiv.org/abs/2112.04426 the DeepMind team writes: For a database of T elements, we can query the approximate nearest neighbors in O(log(T)) time. We use the SCaNN library…
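
As a point of comparison for the approximate O(log(T)) lookup quoted above, here is a minimal sketch of exact brute-force nearest-neighbor search, which scans all T elements per query. It does not use the SCaNN library; the squared Euclidean distance, the function names, and the toy vectors are assumptions chosen for illustration.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_neighbors(query: Sequence[float],
                      database: Sequence[Sequence[float]],
                      k: int = 1) -> List[int]:
    """Return the indices of the k database vectors closest to the query.

    Exact search: every element is scored, so the cost grows linearly
    with the database size T (plus the sort).
    """
    order = sorted(range(len(database)),
                   key=lambda i: squared_l2(query, database[i]))
    return order[:k]


# Toy database of three 2-dimensional embeddings (hypothetical data).
database = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
query = [1.0, 0.05]
top2 = nearest_neighbors(query, database, k=2)
print(top2)
```

Approximate methods trade a small loss in recall for avoiding this full scan, which is what makes very large databases practical to query.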

deep-learning word-embeddings deepmind

asked Jan 01 '22 at 03:32

Aditya

votes

1 answer

Is the "training loop" used in AlphaGo Zero the same as an "epoch"?

I am confused about the training stage of AlphaGo Zero, which uses the data collected during the self-play stage. According to an AlphaGo Zero Cheat Sheet I found, the training routine is: Loop from 1 to 1,000: Sample a mini-batch of 2048 episodes from…

deep-learning keras tensorflow training deepmind

asked Mar 16 '20 at 00:10

ihavenoidea

votes

4 answers

Which AI algorithm is best for chess?

I'm working on my chess bot, and I would like to implement a simple artificial intelligence for it. I'm new to this, so I'm unsure how to do it for chess specifically. I've heard about Q-learning, supervised/unsupervised learning, genetic algorithms, etc.,…

machine-learning deep-learning explainable-ai game deepmind

asked Dec 01 '21 at 14:20

Jenia

vote

1 answer

AlphaGo Zero loss function

As far as I understand the AlphaGo Zero system: during the self-play part, the MCTS algorithm stores a tuple ($s$, $\pi$, $z$) where $s$ is the state, $\pi$ is the probability distribution over the actions in the state and $z$ is an integer…
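
For reference, the loss usually quoted for AlphaGo Zero is $l = (z - v)^2 - \pi^\top \log p + c \lVert \theta \rVert^2$, where $v$ and $p$ are the network's value and policy outputs for state $s$. The sketch below evaluates that expression for a single example in plain Python; the function name, the argument layout, and the default value of `c` are assumptions for illustration rather than code from the question.

```python
import math
from typing import Sequence


def alphago_zero_loss(z: float, v: float,
                      pi: Sequence[float], p: Sequence[float],
                      weights: Sequence[float] = (),
                      c: float = 1e-4) -> float:
    """Value error plus policy cross-entropy plus L2 weight penalty."""
    # Squared error between the game outcome z and the predicted value v.
    value_loss = (z - v) ** 2
    # Cross-entropy between the search distribution pi and the network policy p.
    # Terms with pi == 0 contribute nothing, so they are skipped.
    policy_loss = -sum(t * math.log(q) for t, q in zip(pi, p) if t > 0)
    # L2 regularisation over the (flattened) network weights.
    l2_penalty = c * sum(w * w for w in weights)
    return value_loss + policy_loss + l2_penalty


# Toy example: the game was won (z = 1), the network predicted v = 0.5,
# and both distributions are uniform over two moves.
loss = alphago_zero_loss(z=1.0, v=0.5, pi=[0.5, 0.5], p=[0.5, 0.5])
print(loss)
```

In a Keras setting the two data-dependent terms would typically map to a mean-squared-error head and a categorical cross-entropy head, with the penalty handled by weight regularisers.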

deep-learning keras tensorflow loss-function deepmind

asked Nov 17 '19 at 20:04

ihavenoidea

vote

0 answers

Temperature variable in Boltzmann exploration in reinforcement learning

I have been using the epsilon-greedy action selection strategy and recently came across the Boltzmann (softmax) action selection strategy. One thing I am not clear about in Boltzmann exploration is the temperature variable. How should we define this…
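
To make the role of the temperature concrete, here is a minimal sketch of Boltzmann (softmax) action selection over a list of Q-values. The function names and the example Q-values are hypothetical; the only technique assumed is the standard softmax with the Q-values divided by the temperature.

```python
import math
import random
from typing import List, Sequence


def boltzmann_probabilities(q_values: Sequence[float],
                            temperature: float) -> List[float]:
    """Softmax over Q-values divided by a positive temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability;
    # this does not change the resulting probabilities.
    max_q = max(q_values)
    exps = [math.exp((q - max_q) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(q_values: Sequence[float], temperature: float) -> int:
    """Draw one action index according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(q_values, temperature)
    return random.choices(range(len(q_values)), weights=probs, k=1)[0]


q_values = [1.0, 2.0, 3.0]
hot = boltzmann_probabilities(q_values, temperature=100.0)   # close to uniform
cold = boltzmann_probabilities(q_values, temperature=0.01)   # close to greedy
print(hot, cold, sample_action(q_values, temperature=1.0))
```

A high temperature flattens the distribution toward uniform random exploration, while a temperature near zero concentrates almost all probability on the highest-valued action. A common pattern is to start high and decay the temperature during training, but the schedule is a tuning choice.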

deep-learning reinforcement-learning ai softmax deepmind

asked Sep 26 '19 at 08:01

chink

vote

0 answers

DeepMind conditional neural process: evaluation

Going through the DeepMind Jupyter notebook on conditional neural processes, the plots at the bottom of the notebook show that the ground truth and the predicted distribution only overlap around the "context points". These context points are already in…

gaussian deepmind

asked Jan 24 '19 at 12:24

Shadi

vote

0 answers

Can OpenAI's CLIP Model or DeepMind's Flamingo Model Predict Classes Truly Never Before Seen for Zero- or Few-Shot Learning?

One statement about zero-shot and few-shot learning that I continually come across in the literature is that these models can predict new, unseen classes at inference time on which they were never trained. However, such sources typically do…

nlp computer-vision gpt meta-learning deepmind

asked Nov 02 '22 at 19:57

user141493

vote

0 answers

How are Learned Latent Arrays for the Perceiver Resampler in DeepMind's Flamingo Vision-Language Model Actually Calculated? By which Technique?

In "Flamingo: a Visual Language Model for Few-Shot Learning" (Alayrac et al., 2022), https://arxiv.org/abs/2204.14198, DeepMind makes use of "learned latent queries" in their "Perceiver Resampler" to ensure that parameters do not scale quadratically…

deep-learning nlp computer-vision language-model deepmind

asked Oct 13 '22 at 00:31

user141493
