Configs

XuanCe provides a structured way to manage configurations for various DRL scenarios, making it easy to experiment with different setups.



Basic Configurations

The basic parameter configuration is stored in the “xuance/configs/basic.yaml” file, as shown below:

dl_toolbox: "torch"  # The deep learning toolbox. Choices: "torch", "mindspore", "tensorflow"

project_name: "XuanCe_Benchmark"
logger: "tensorboard"  # Choices: "tensorboard", "wandb".
wandb_user_name: "your_user_name"

parallels: 10
seed: 2910
render: True
render_mode: 'rgb_array' # Choices: 'human', 'rgb_array'.
test_mode: False
test_steps: 2000

device: "cpu"

It should be noted that the valid values of the device variable in the basic.yaml file depend on the chosen deep learning framework, as outlined below (a short loading sketch follows the list):

- PyTorch: “cpu”, “cuda:0”;
- TensorFlow: “cpu”/“CPU”, “gpu”/“GPU”;
- MindSpore: “CPU”, “GPU”, “Ascend”, “Davinci”.
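With the PyTorch toolbox, for example, the device string maps directly onto a torch.device. A minimal sketch of reading the field yourself, assuming PyYAML is installed (XuanCe parses these files internally, so this is purely illustrative):

import yaml
import torch

# Read the basic configuration file shipped with XuanCe.
with open("xuance/configs/basic.yaml") as f:
    cfg = yaml.safe_load(f)

# Under the "torch" toolbox, values such as "cpu" or "cuda:0" can be
# passed straight to torch.device.
device = torch.device(cfg["device"])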


Algorithm Configurations for Different Tasks

Take the parameter configuration of the DQN algorithm in the Atari environment as an example: in addition to the basic parameter configuration, the algorithm-specific parameters are stored in the “xuance/configs/dqn/atari.yaml” file.

Since the Atari environment comprises over 60 scenarios that are structurally consistent and differ only in the task, a single default parameter configuration file is sufficient for all of them.

For environments whose scenarios differ significantly, separate configuration files are needed. In the “Box2D” environment, for example, the “CarRacing-v2” scenario takes a 96x96x3 RGB image as its state input, while “LunarLander” observes an 8-dimensional vector. The DQN parameter configurations for these two scenarios are therefore stored in the following two files (a selection sketch follows the list):

  • xuance/configs/dqn/box2d/CarRacing-v2.yaml

  • xuance/configs/dqn/box2d/LunarLander-v2.yaml
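Either file can also be selected explicitly through the config_path argument of get_runner, which is documented under Customized Configurations below. A sketch, where the environment key 'box2d' is an assumption rather than a documented value:

import xuance as xp

# Point get_runner at the scenario-specific configuration file.
# config_path is the documented override (see "Customized Configurations");
# the env key 'box2d' is assumed here for illustration.
runner = xp.get_runner(method='dqn',
                       env='box2d',
                       env_id='CarRacing-v2',
                       config_path="xuance/configs/dqn/box2d/CarRacing-v2.yaml",
                       is_test=False)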

The following subsections provide the preset arguments for each implementation, which can be run by following the steps in Quick Start.



DQN-based Implementations

agent: "DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
vectorize: "Dummy_Gym"
policy: "Basic_Q_network"
representation: "Basic_MLP"
runner: "DRL"

representation_hidden_size: [128,]
q_hidden_size: [128,]
activation: 'ReLU'

seed: 1
parallels: 10
n_size: 10000
batch_size: 256
learning_rate: 0.001
gamma: 0.99

start_greedy: 0.5
end_greedy: 0.01
decay_step_greedy: 10000
sync_frequency: 50
training_frequency: 1
running_steps: 200000  # 200k
start_training: 1000

use_obsnorm: False
use_rewnorm: False
obsnorm_range: 5
rewnorm_range: 5

test_steps: 10000
eval_interval: 20000
test_episode: 1
log_dir: "./logs/dqn/"
model_dir: "./models/dqn/"
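In the preset above, start_greedy, end_greedy, and decay_step_greedy define an annealed exploration rate for the epsilon-greedy behavior policy. A standalone sketch of the schedule these fields suggest, assuming linear annealing (illustrative, not XuanCe's internal code):

def epsilon_greedy_rate(step, start=0.5, end=0.01, decay_steps=10000):
    # Anneal linearly from `start` to `end` over `decay_steps` training
    # steps, then hold at `end` for the remainder of training.
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)

# Example: halfway through the decay, epsilon sits halfway between the bounds.
print(epsilon_greedy_rate(5000))  # 0.255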


Policy Gradient-based Implementations

agent: "PG"
env_name: "Classic Control"
env_id: "CartPole-v1"
representation: "Basic_MLP"
vectorize: "Dummy_Gym"
policy: "Categorical_Actor"
runner: "DRL"

representation_hidden_size: [128,]
actor_hidden_size: [128,]
activation: 'ReLU'

seed: 1
parallels: 10
running_steps: 300000
n_steps: 128
n_epoch: 1
n_minibatch: 1
learning_rate: 0.0004

ent_coef: 0.01
clip_grad: 0.5
clip_type: 1  # Gradient clip for Mindspore: 0: ms.ops.clip_by_value; 1: ms.nn.ClipByNorm()
gamma: 0.98
use_gae: False
gae_lambda: 0.95
use_advnorm: False

use_obsnorm: True
use_rewnorm: True
obsnorm_range: 5
rewnorm_range: 5

test_steps: 10000
eval_interval: 50000
test_episode: 1
log_dir: "./logs/pg/"
model_dir: "./models/pg/"
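The use_gae and gae_lambda entries in this preset toggle generalized advantage estimation (GAE) and set its lambda parameter. A self-contained sketch of the standard GAE recursion these fields refer to (illustrative, not XuanCe's internal implementation):

import numpy as np

def gae_advantages(rewards, values, dones, gamma=0.98, lam=0.95):
    # Standard GAE: `values` must have length T + 1, with the bootstrap
    # value of the final state appended at the end.
    T = len(rewards)
    advantages = np.zeros(T)
    last = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last = delta + gamma * lam * nonterminal * last
        advantages[t] = last
    return advantages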


MARL Implementations

agent: "IQL"  # the learning algorithms_marl
env_name: "mpe"
env_id: "simple_spread_v3"
continuous_action: False
policy: "Basic_Q_network_marl"
representation: "Basic_MLP"
vectorize: "Dummy_Pettingzoo"
runner: "Pettingzoo_Runner"

use_recurrent: False
rnn:  # type of recurrent layer; left unset here since use_recurrent is False
representation_hidden_size: [64, ]
q_hidden_size: [64, ]  # the units for each hidden layer
activation: "ReLU"

seed: 1
parallels: 16
buffer_size: 100000
batch_size: 256
learning_rate: 0.001
gamma: 0.99  # discount factor
double_q: True  # use double q learning

start_greedy: 1.0
end_greedy: 0.05
decay_step_greedy: 2500000
start_training: 1000  # start training after n episodes
running_steps: 10000000  # 10M
train_per_step: False  # True: train model per step; False: train model per episode.
training_frequency: 1
sync_frequency: 100

use_grad_clip: False
grad_clip_norm: 0.5

eval_interval: 100000
test_episode: 5
log_dir: "./logs/iql/"
model_dir: "./models/iql/"
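Assuming the get_runner entry point shown under Customized Configurations below also covers MARL methods, the IQL preset above can be launched as follows (a sketch):

import xuance as xp

# Launch the IQL preset on the MPE simple_spread_v3 scenario.
runner = xp.get_runner(method='iql',
                       env='mpe',
                       env_id='simple_spread_v3',
                       is_test=False)
runner.run()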


Customized Configurations

Users can also choose not to use the default parameters provided by XuanCe, or, in cases where XuanCe does not cover their specific task, customize their own .yaml parameter configuration file in the same manner.

In that case, when obtaining the runner, the location of the configuration file must be specified via the config_path argument, as shown below:

import xuance as xp
runner = xp.get_runner(method='dqn',
                       env='classic_control',
                       env_id='CartPole-v1',
                       config_path="xxx/xxx.yaml",
                       is_test=False)
runner.run()
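A custom .yaml file should follow the same structure as the presets above. A hypothetical minimal fragment for a DQN variant, with field values chosen purely for illustration:

agent: "DQN"
env_name: "Classic Control"
env_id: "CartPole-v1"
policy: "Basic_Q_network"
representation: "Basic_MLP"
vectorize: "Dummy_Gym"
runner: "DRL"
learning_rate: 0.0005  # overrides the preset value of 0.001
# ...remaining fields as in the DQN preset above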