Learners

| Learner | PyTorch | TensorFlow | MindSpore |
| --- | --- | --- | --- |
| DQN: Deep Q-Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| C51DQN: Distributional Reinforcement Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Double DQN: DQN with Double Q-learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Dueling DQN: DQN with Dueling Network | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Noisy DQN: DQN with Parameter Space Noise | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PERDQN: DQN with Prioritized Experience Replay | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QRDQN: DQN with Quantile Regression | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VPG: Vanilla Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPG: Phasic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPO: Proximal Policy Optimization | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PDQN: Parameterised DQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SPDQN: Split PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MPDQN: Multi-pass PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| A2C: Advantage Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC: Soft Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC-Dis: SAC for Discrete Actions | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DDPG: Deep Deterministic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| TD3: Twin Delayed DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
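All single-agent learners above implement a value- or policy-based update step for one of the supported backends. As a concrete illustration of the simplest entry, DQN, the following is a minimal PyTorch sketch of a Q-learning update; the `DQNLearner` class and its method names are illustrative assumptions and do not reflect this library's actual API.

```python
import torch
import torch.nn as nn

class DQNLearner:
    """Minimal DQN-style update sketch (illustrative, not the library's class)."""

    def __init__(self, q_net: nn.Module, target_net: nn.Module,
                 lr: float = 1e-3, gamma: float = 0.99):
        self.q_net = q_net
        self.target_net = target_net
        self.gamma = gamma
        self.optimizer = torch.optim.Adam(q_net.parameters(), lr=lr)

    def update(self, obs, actions, rewards, next_obs, dones):
        # Q(s, a) for the actions actually taken in the batch.
        q_values = self.q_net(obs).gather(1, actions.long().unsqueeze(1)).squeeze(1)
        # Bootstrapped target: r + gamma * max_a' Q_target(s', a') for non-terminal s'.
        with torch.no_grad():
            next_q = self.target_net(next_obs).max(dim=1).values
            targets = rewards + self.gamma * (1.0 - dones) * next_q
        loss = nn.functional.mse_loss(q_values, targets)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```

The other DQN variants in the table modify parts of this loop, e.g. Double DQN selects the argmax action with the online network before evaluating it with the target network, and PERDQN reweights the loss by importance-sampling weights from a prioritized replay buffer.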

| Multi-Agent Learner | PyTorch | TensorFlow | MindSpore |
| --- | --- | --- | --- |
| IQL: Independent Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDN: Value-Decomposition Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QMIX: VDN with Q-Mixer | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| WQMIX: Weighted QMIX | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QTRAN: Q-Transformation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DCG: Deep Coordination Graph | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IDDPG: Independent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MADDPG: Multi-Agent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| ISAC: Independent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MASAC: Multi-Agent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IPPO: Independent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MAPPO: Multi-Agent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MATD3: Multi-Agent TD3 | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDAC: Value-Decomposition Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| COMA: Counterfactual Multi-Agent PG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFQ: Mean-Field Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFAC: Mean-Field Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
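The value-based multi-agent learners above differ mainly in how per-agent utilities are combined into a joint action value. Below is a hedged PyTorch sketch of two of those mixing schemes: the additive mixing of VDN, \(Q_{tot} = \sum_i Q_i\), and a simplified QMIX mixer whose state-conditioned weights are kept non-negative so that \(Q_{tot}\) is monotonic in each agent's \(Q_i\). The function and class names are assumptions for illustration and are not this library's classes; a full QMIX implementation typically uses deeper hypernetworks for the biases.

```python
import torch
import torch.nn as nn

def vdn_mix(agent_qs: torch.Tensor) -> torch.Tensor:
    # VDN: Q_tot(s, a) = sum_i Q_i(o_i, a_i).
    # agent_qs: [batch, n_agents] chosen-action value of each agent.
    return agent_qs.sum(dim=1)  # [batch]

class QMIXMixer(nn.Module):
    """Simplified QMIX mixing network (illustrative sketch)."""

    def __init__(self, n_agents: int, state_dim: int, hidden: int = 32):
        super().__init__()
        self.n_agents, self.hidden = n_agents, hidden
        # Hypernetworks producing state-conditioned mixing weights and biases.
        self.w1 = nn.Linear(state_dim, n_agents * hidden)
        self.b1 = nn.Linear(state_dim, hidden)
        self.w2 = nn.Linear(state_dim, hidden)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        batch = agent_qs.size(0)
        # abs() keeps mixing weights non-negative, enforcing monotonicity in each Q_i.
        w1 = torch.abs(self.w1(state)).view(batch, self.n_agents, self.hidden)
        b1 = self.b1(state).view(batch, 1, self.hidden)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)  # [batch, 1, hidden]
        w2 = torch.abs(self.w2(state)).view(batch, self.hidden, 1)
        b2 = self.b2(state).view(batch, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(batch)  # Q_tot: [batch]
```

IQL, by contrast, skips mixing entirely and trains each agent's Q-network independently, while WQMIX and QTRAN relax QMIX's monotonicity restriction in different ways.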