[Deep Reinforcement Learning Nanodegree Chapter 4] Multi-Agent Reinforcement Learning

4. Multi-Agent Reinforcement Learning

Official course description

Most of reinforcement learning is concerned with a single agent that seeks to demonstrate proficiency at a single task. In this agent’s environment, there are no other agents. However, if we’d like our agents to become truly intelligent, they must be able to communicate with and learn from other agents. In the final part of this nanodegree, we will extend the traditional framework to include multiple agents.

You’ll also learn all about Monte Carlo Tree Search (MCTS) and master the skills behind DeepMind’s AlphaZero.

Use Monte Carlo Tree Search to play Connect 4. ([Source](https://github.com/Alfo5123/Connect4))


You’ll also get the third project, where you’ll write an algorithm to train a pair of agents to play tennis.

In Project 3, you will train a pair of agents to play tennis.


[toc]

Multi-Agent Systems

  • Multi-Agent System
    • Introduction
      • There are several agents in the same environment
      • Each agent interacts not only with the environment but also with the actions of the other agents
    • Motivations
      • We live in a multi-agent world
      • Intelligent agents have to interact with humans
      • Agents need to work in complex environments
    • Benefits
      • Agents can share their experience (knowledge) with other agents
      • Robustness: when one agent fails, another agent can take over its tasks
  • Multi-Agent Reinforcement Learning (MARL)
    • Independent training: agents train without considering the other agents
      • The other agents are simply treated as part of the environment (state)
      • Non-stationary environment: from each agent's point of view, the environment keeps changing as the other agents update their policies
    • The meta-agent approach (see the sketch after this list)
      • A single policy takes the joint state and returns an action vector with one action per agent
    • Multi-Agent Environment
      • Cooperation: agents work together to maximize a common reward
      • Competition: each agent maximizes its own reward at the others' expense (one agent's gain is the others' loss)
      • Mixed environment: a combination of cooperative and competitive rewards
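
As a concrete illustration of the meta-agent approach, here is a minimal PyTorch sketch: a single network receives the concatenated observations of all agents and returns one action vector per agent. The layer sizes, the `Tanh` output (continuous actions in $[-1, 1]$), and the default dimensions are illustrative assumptions, not the course's reference implementation.

```python
import torch
import torch.nn as nn

class MetaAgentPolicy(nn.Module):
    """A single policy over the joint state: the observations of all
    agents go in concatenated, and one action vector per agent comes
    out. All sizes here are illustrative assumptions."""

    def __init__(self, n_agents=2, obs_size=24, action_size=2, hidden=128):
        super().__init__()
        self.n_agents = n_agents
        self.action_size = action_size
        self.net = nn.Sequential(
            nn.Linear(n_agents * obs_size, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_agents * action_size),
            nn.Tanh(),  # continuous actions bounded in [-1, 1]
        )

    def forward(self, joint_obs):
        # joint_obs: (batch, n_agents * obs_size)
        joint_action = self.net(joint_obs)
        # split the flat output into one action vector per agent
        return joint_action.view(-1, self.n_agents, self.action_size)

# Usage: a batch of one joint observation -> actions of shape (1, 2, 2)
policy = MetaAgentPolicy()
actions = policy(torch.randn(1, 2 * 24))
```

A known drawback of this approach: with discrete actions the joint action space grows exponentially with the number of agents, so it only scales to small teams.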

AlphaZero

Zero-Sum Game

  • Two agents compete: one agent's gain is the other's loss, so the rewards always sum to zero ($R_1 + R_2 = 0$)

Monte Carlo Tree Search (MCTS)

  • MCTS in a Zero-Sum Game
    • Initialize the top node for the current state, then repeat for some number of iterations $N_{tot}$:
      1. Start from the top node and repeatedly pick the child node with the largest $U$
      2. If $N = 0$ for the node, play a random game. Else, expand the node and play a random game from a randomly selected child
      3. Update the statistics: back-propagate the outcome and update $N$ and $U$ as needed
    • Finally, select the move with the highest visit count $N$
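
The loop above maps directly onto code. Below is a minimal, self-contained Python sketch of MCTS for a two-player zero-sum game. The `game` interface (`legal_moves`, `next_state`, `is_terminal`, `rollout`) and the UCB-style score $U = V + c\sqrt{\log N_{tot}/N}$ are assumptions for illustration, not the exact implementation used in the course.

```python
import math
import random

class Node:
    """One node of the search tree for a two-player zero-sum game.
    V is stored from the perspective of the player who moved INTO the
    node, so a parent can simply pick the child with the largest U."""

    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> Node
        self.N = 0           # visit count
        self.V = 0.0         # mean playout outcome

    def U(self, parent_N, c=1.4):
        # UCB-style score: unvisited nodes are tried first
        if self.N == 0:
            return float("inf")
        return self.V + c * math.sqrt(math.log(parent_N) / self.N)

def mcts(root_state, game, N_tot=1000):
    root = Node(root_state)
    for _ in range(N_tot):
        node = root
        # 1. Selection: descend by largest U until reaching a leaf
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda child: child.U(parent.N + 1))
        # 2. Expansion: if the leaf was visited before, expand it and
        #    continue from a randomly selected child
        if node.N > 0 and not game.is_terminal(node.state):
            for a in game.legal_moves(node.state):
                node.children[a] = Node(game.next_state(node.state, a), node)
            node = random.choice(list(node.children.values()))
        # Random playout: +1 / -1 / 0 for the player to move at node.state
        value = game.rollout(node.state)
        # 3. Back-propagation: flip the sign at every level (zero-sum)
        while node is not None:
            value = -value
            node.N += 1
            node.V += (value - node.V) / node.N
            node = node.parent
    # Final decision: the move with the highest visit count
    return max(root.children.items(), key=lambda kv: kv[1].N)[0]
```

Because the game is zero-sum, the playout result is negated at every level during back-propagation, so each node's value is always expressed from the perspective of the player who chose the move leading to it.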

[Project] Collaboration and Competition

For this project, you will work with the Tennis environment.
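
A minimal interaction loop for the Tennis environment is sketched below, using the `unityagents` package from the project's starter code. The binary file name is a platform-specific assumption (adjust it to your download), and the random actions stand in for the pair of trained agents.

```python
from unityagents import UnityEnvironment
import numpy as np

# The file name is platform-specific; adjust the path to your download.
env = UnityEnvironment(file_name="Tennis.app")
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)               # 2 agents
action_size = brain.vector_action_space_size    # continuous actions per agent
states = env_info.vector_observations           # one observation vector per agent

# Run one episode with a random policy as a placeholder for trained agents
scores = np.zeros(num_agents)
while True:
    actions = np.clip(np.random.randn(num_agents, action_size), -1, 1)
    env_info = env.step(actions)[brain_name]
    states = env_info.vector_observations
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

print("Episode score (max over the two agents):", np.max(scores))
env.close()
```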