Learners
| Learner | PyTorch | TensorFlow | MindSpore |
|---|---|---|---|
| DQN: Deep Q-Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| C51DQN: Distributional Reinforcement Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Double DQN: DQN with Double Q-learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Dueling DQN: DQN with Dueling network | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Noisy DQN: DQN with Parameter Space Noise | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PERDQN: DQN with Prioritized Experience Replay | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QRDQN: DQN with Quantile Regression | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VPG: Vanilla Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPG: Phasic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPO: Proximal Policy Optimization | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PDQN: Parameterised DQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SPDQN: Split PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MPDQN: Multi-pass PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| A2C: Advantage Actor Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC: Soft Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC-Dis: SAC for Discrete Actions | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DDPG: Deep Deterministic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| TD3: Twin Delayed DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
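Several entries in the table above are members of the DQN family, which differ mainly in how the temporal-difference target is computed. As a framework-agnostic illustration (NumPy only; the array names and shapes here are assumptions for the sketch, not XuanCe's internal learner API), the Double DQN variant listed above selects the greedy next action with the online network but evaluates it with the target network:

```python
import numpy as np

# Illustrative sketch of the Double DQN target computation.
# q_online_next / q_target_next stand in for network outputs Q(s', .).
rng = np.random.default_rng(0)
batch, n_actions = 8, 3
gamma = 0.99

q_online_next = rng.normal(size=(batch, n_actions))  # online net: Q_online(s', .)
q_target_next = rng.normal(size=(batch, n_actions))  # target net: Q_target(s', .)
rewards = rng.normal(size=batch)
dones = np.zeros(batch)                              # 1.0 where the episode ended

# Online network picks the greedy action; target network evaluates it.
greedy_a = q_online_next.argmax(axis=1)
next_q = q_target_next[np.arange(batch), greedy_a]

# Bellman target: y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))
td_target = rewards + gamma * (1.0 - dones) * next_q
print(td_target.shape)
```

Decoupling action selection from action evaluation in this way is what mitigates the overestimation bias of vanilla DQN.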
| Multi-Agent Learner | PyTorch | TensorFlow | MindSpore |
|---|---|---|---|
| IQL: Independent Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDN: Value-Decomposition Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QMIX: VDN with Q-Mixer | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| WQMIX: Weighted QMIX | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QTRAN: Q-Transformation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DCG: Deep Coordination Graph | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IDDPG: Independent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MADDPG: Multi-Agent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| ISAC: Independent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MASAC: Multi-Agent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IPPO: Independent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MAPPO: Multi-Agent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MATD3: Multi-Agent TD3 | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDAC: Value-Decomposition Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| COMA: Counterfactual Multi-Agent PG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFQ: Mean-Field Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFAC: Mean-Field Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
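Several of the multi-agent learners above (VDN, QMIX, WQMIX, VDAC) are value-decomposition methods: the joint action-value is built from per-agent utilities so that each agent can still act greedily on its own value. As a minimal framework-agnostic sketch (NumPy only; the array names and shapes are assumptions for illustration, not XuanCe's learner API), VDN's decomposition \(Q_{tot} = \sum_i Q_i\) looks like:

```python
import numpy as np

# Illustrative sketch of VDN's additive value decomposition.
# per_agent_q stands in for each agent's network output Q_i(tau_i, .).
rng = np.random.default_rng(1)
n_agents, batch, n_actions = 3, 5, 4

per_agent_q = rng.normal(size=(n_agents, batch, n_actions))
chosen = rng.integers(0, n_actions, size=(n_agents, batch))  # actions taken

# Gather each agent's Q-value for its chosen action, then sum over agents:
# Q_tot(s, a_1..a_n) = sum_i Q_i(tau_i, a_i)
q_i = np.take_along_axis(per_agent_q, chosen[..., None], axis=2).squeeze(2)
q_tot = q_i.sum(axis=0)   # one joint value per batch entry
print(q_tot.shape)
```

QMIX replaces the plain sum with a learned monotonic mixing network conditioned on the global state, which is why it appears in the table as "VDN with Q-Mixer".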