# Training Environment Coding Convention
We found that keeping track of all the reward functions, configuration settings, and robot parameters in the training environment can quickly become overwhelming. To help manage this complexity, we’ve adopted a coding convention that organizes these elements consistently, making them easier to locate and maintain.
## Environment configuration class ordering

The configuration classes within an environment definition are ordered as follows:
```python
@configclass
class YourEnvCfg(ManagerBasedRLEnvCfg):
    """Configuration for your robot learning environment."""

    # Scene settings
    scene: SceneCfg = SceneCfg()
    # Policy commands
    commands: CommandsCfg = CommandsCfg()
    # Policy observations
    observations: ObservationsCfg = ObservationsCfg()
    # Policy actions
    actions: ActionsCfg = ActionsCfg()
    # Policy rewards
    rewards: RewardsCfg = RewardsCfg()
    # Termination conditions
    terminations: TerminationsCfg = TerminationsCfg()
    # Randomization events
    events: EventsCfg = EventsCfg()
    # Curriculum
    curriculum: CurriculumCfg = CurriculumCfg()

    def __post_init__(self):
        # post init of parent
        super().__post_init__()
        # override default configuration parameters
```
## Configuration term parameter ordering

For the parameters within each term, we follow the ordering of the arguments in the term class:
```python
# example observation term
base_ang_vel = ObsTerm(
    func=mdp.base_ang_vel,
    noise=Unoise(n_min=-0.2, n_max=0.2),
    scale=0.2,
)

# example reward term
track_lin_vel_xy_exp = RewTerm(
    func=mdp.track_lin_vel_xy_exp,
    params={"command_name": "base_velocity", "std": math.sqrt(0.25)},
    weight=1.0,
)
```
## Asymmetric observation config

Typically, for asymmetric observations, we add all actor observations to the critic along with some additional privileged observations. To ensure that the configurations of the duplicated terms stay consistent, the critic observation config should inherit from the actor's.

One thing to note is that we do not want to add noise to the critic's observation terms. To achieve this, set the `enable_corruption` parameter to `False`.

The resulting code should look like:
```python
@configclass
class ObservationsCfg:
    """Observation specifications for the MDP."""

    @configclass
    class PolicyCfg(ObsGroup):
        """Observations for policy group."""

        # observation terms (order preserved)
        velocity_commands = ObsTerm(
            func=mdp.generated_commands,
            params={"command_name": "base_velocity"},
        )
        base_ang_vel = ObsTerm(
            func=mdp.base_ang_vel,
            noise=Unoise(n_min=-0.2, n_max=0.2),
            scale=0.2,
        )
        # ... (more terms)

        def __post_init__(self):
            self.enable_corruption = True  # <-- enable noise for actor obs

    @configclass
    class CriticCfg(PolicyCfg):  # <-- inherit all terms from the actor
        """Observations for critic group."""

        # observation terms (order preserved)
        base_lin_vel = ObsTerm(func=mdp.base_lin_vel)
        # ... (more terms)

        def __post_init__(self):
            self.enable_corruption = False  # <-- disable noise for critic obs

    # observation groups
    policy: PolicyCfg = PolicyCfg()
    critic: CriticCfg = CriticCfg()
```
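The inheritance mechanics behind this pattern can be illustrated with a plain-Python sketch using `dataclasses` (names and values here are illustrative stand-ins, not Isaac Lab code, which uses `@configclass` and `ObsGroup`):

```python
from dataclasses import dataclass


@dataclass
class PolicyCfg:
    """Actor observation group (noise enabled)."""

    base_ang_vel: float = 0.2  # stand-in for an ObsTerm
    enable_corruption: bool = True

    def __post_init__(self):
        self.enable_corruption = True  # <-- noise for actor obs


@dataclass
class CriticCfg(PolicyCfg):
    """Critic group: inherits every actor term, adds privileged ones."""

    base_lin_vel: float = 1.0  # stand-in for a privileged ObsTerm

    def __post_init__(self):
        self.enable_corruption = False  # <-- no noise for critic obs
```

Because `CriticCfg` subclasses `PolicyCfg`, every actor term is present on the critic automatically, and the overridden `__post_init__` flips only the corruption flag.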
## Reward terms

Reward terms are grouped by their purpose. Measurements of task-space performance are put at the front due to their importance. These are followed by terms for basic behaviors, such as survival and motion smoothness. Lastly, terms for "fine-tuning" the policy are added.
```python
@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # === Reward for task-space performance ===
    # ... (terms)

    # === Reward for basic behaviors ===
    # ... (terms)

    # === Reward for encouraging behaviors ===
    # ... (terms)
```
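As a concrete sketch, a velocity-tracking locomotion task might fill in the groups as below. The term functions are drawn from the Isaac Lab `mdp` rewards module, but the choice of terms and weights is illustrative, not prescriptive:

```python
@configclass
class RewardsCfg:
    """Reward terms for the MDP."""

    # === Reward for task-space performance ===
    track_lin_vel_xy_exp = RewTerm(
        func=mdp.track_lin_vel_xy_exp,
        params={"command_name": "base_velocity", "std": math.sqrt(0.25)},
        weight=1.0,
    )

    # === Reward for basic behaviors ===
    is_alive = RewTerm(func=mdp.is_alive, weight=0.5)
    action_rate_l2 = RewTerm(func=mdp.action_rate_l2, weight=-0.01)

    # === Reward for encouraging behaviors ===
    flat_orientation_l2 = RewTerm(func=mdp.flat_orientation_l2, weight=-1.0)
```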
## `__post_init__` behavior

Unlike the Isaac Lab code, we encourage setting configuration values directly in the corresponding config classes and config terms. Only override parameters in the `__post_init__()` function:

- during temporary debugging and parameter-space exploration,
- when the base configuration is inaccessible (for example, the `ROBOT_CFG` variable).
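A hypothetical sketch of both cases, assuming `ROBOT_CFG` is imported from an asset module that cannot be edited and that the environment defines the reward term shown earlier:

```python
@configclass
class YourEnvCfg(ManagerBasedRLEnvCfg):
    # ... (config classes as above)

    def __post_init__(self):
        super().__post_init__()
        # case 1: base configuration is inaccessible -- ROBOT_CFG comes from
        # an imported asset module, so it can only be overridden here
        self.scene.robot = ROBOT_CFG.replace(prim_path="{ENV_REGEX_NS}/Robot")
        # case 2: temporary debugging tweak -- remove before committing
        self.rewards.track_lin_vel_xy_exp.weight = 2.0
```

Everything else (noise models, scales, weights) stays in the config classes themselves, so a reader can find a parameter where its term is defined.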