Agents



In reinforcement learning, agents are autonomous units that interact with an environment and are capable of making decisions and learning on their own. During each interaction, an agent receives an observation, computes an action from that observation, and executes it, driving the environment into its next state. By repeatedly interacting with the environment, the agent collects experience data and uses that data to train its models, thereby obtaining a better policy. The single-agent and multi-agent reinforcement learning agents included in the "玄策" (XuanCe) platform are listed below.
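
As an illustration of this observe-act-learn loop (not XuanCe's actual API), the sketch below uses a Gymnasium environment together with a hypothetical agent class whose `act`, `store`, and `train` methods are assumptions made for this example; a real agent from the table below (e.g. DQN or PPO) would replace them with a learned policy and a genuine parameter update.

```python
import gymnasium as gym


class RandomAgent:
    """Hypothetical placeholder agent used only to show the interaction loop."""

    def __init__(self, action_space):
        self.action_space = action_space
        self.buffer = []  # collected experience data

    def act(self, observation):
        # A learned policy would map the observation to an action here.
        return self.action_space.sample()

    def store(self, transition):
        # Accumulate (obs, action, reward, next_obs, done) tuples.
        self.buffer.append(transition)

    def train(self):
        # A real agent would update its model from self.buffer here.
        pass


env = gym.make("CartPole-v1")
agent = RandomAgent(env.action_space)

for episode in range(10):
    obs, info = env.reset()
    done = False
    while not done:
        action = agent.act(obs)  # compute an action from the observation
        next_obs, reward, terminated, truncated, info = env.step(action)  # environment enters its next state
        done = terminated or truncated
        agent.store((obs, action, reward, next_obs, done))  # collect experience data
        obs = next_obs
    agent.train()  # improve the policy from the collected experience
env.close()
```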

| Agent | PyTorch | TensorFlow | MindSpore |
|---|---|---|---|
| DQN: Deep Q-Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| C51DQN: Distributional Reinforcement Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Double DQN: DQN with Double Q-learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Dueling DQN: DQN with Dueling network | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| Noisy DQN: DQN with Parameter Space Noise | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PERDQN: DQN with Prioritized Experience Replay | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QRDQN: DQN with Quantile Regression | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VPG: Vanilla Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPG: Phasic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PPO: Proximal Policy Optimization | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| PDQN: Parameterised DQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SPDQN: Split PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MPDQN: Multi-pass PDQN | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| A2C: Advantage Actor Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC: Soft Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| SAC-Dis: SAC for Discrete Actions | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DDPG: Deep Deterministic Policy Gradient | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| TD3: Twin Delayed DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |

| Multi-Agent | PyTorch | TensorFlow | MindSpore |
|---|---|---|---|
| IQL: Independent Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDN: Value-Decomposition Networks | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QMIX: VDN with Q-Mixer | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| WQMIX: Weighted QMIX | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| QTRAN: Q-Transformation | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| DCG: Deep Coordination Graph | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IDDPG: Independent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MADDPG: Multi-Agent DDPG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| ISAC: Independent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MASAC: Multi-Agent SAC | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| IPPO: Independent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MAPPO: Multi-Agent PPO | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MATD3: Multi-Agent TD3 | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| VDAC: Value-Decomposition Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| COMA: Counterfactual Multi-Agent PG | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFQ: Mean-Field Q-Learning | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |
| MFAC: Mean-Field Actor-Critic | \(\checkmark\) | \(\checkmark\) | \(\checkmark\) |