FoPra Beluga Challenge - Reinforcement Learning v1.0
Deep Reinforcement Learning solution for the Beluga Challenge shipping container optimization problem using PPO and MCTS
rl.env.environment.Env Class Reference

Beluga Challenge environment for reinforcement learning.

Public Member Functions

 __init__ (self, str path, int base_index=-1)
 Initialize the Beluga Challenge environment.
 
 step (self, str action_name, params=None)
 Execute a single environment step with the given action.
 
 reset (self)
 Reset the environment with a new problem instance.
 
 reset_specific_problem (self, problem)
 Reset the environment with a specific problem instance.
 
 get_reward (self, bool could_execute, str action_name, production_line_n_old)
 Calculate the reward for the current action.
 
 get_observation_high_level (self)
 Get the high-level observation of the current state.
 
 get_max_steps (self)
 Get the maximum number of steps to solve the current problem.
 
 check_action_execution (self, str action_name, obs)
 Check if the action can be executed without actually executing it.
 

Public Attributes

ProblemState state = None
 
 path = path
 
int step_count = 0
 
 problem_name = None
 
list sorted_problems = []
 
int problem_count = 0
 
 base_index = base_index
 
int problems_solved = 0
 
int block_size = 6
 
dict check_action_map
 

Detailed Description

Beluga Challenge environment for reinforcement learning.

This class implements the main environment for the Beluga Challenge shipping container optimization problem. It manages problem states, action execution, reward calculation, and episode management.
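A minimal usage sketch of this interface; the problems directory and the trivial "always unload" policy are placeholders (the project's actual agent, PPO/MCTS, selects actions and constructs their parameters):

    from rl.env.environment import Env

    env = Env("problems/", base_index=-1)   # placeholder path to the problem JSON files

    obs = env.reset()
    done = False
    while not done and env.step_count < env.get_max_steps():
        action_name, params = "unload_beluga", []   # placeholder policy; a real agent picks the action and its params
        if not env.check_action_execution(action_name, obs):
            break                                   # preconditions no longer hold, stop this toy loop
        obs, reward, done = env.step(action_name, params)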

Constructor & Destructor Documentation

◆ __init__()

rl.env.environment.Env.__init__ ( self,
str path,
int base_index = -1 )

Initialize the Beluga Challenge environment.

Parameters
    path        Path to the directory containing problem JSON files
    base_index  Base index for problem selection
def __init__(self, path: str, base_index: int = -1):
    """!
    @brief Initialize the Beluga Challenge environment
    @param path Path to the directory containing problem JSON files
    @param base_index Base index for problem selection
    """
    # Initialize environment variables here
    self.state: ProblemState = None  # Not initialized yet, will be set in reset()
    self.path = path
    self.step_count = 0  # Number of steps taken; used as a termination condition in training/evaluation
    self.problem_name = None
    self.sorted_problems = []  # Problems sorted by jig count (ascending)
    self.problem_count = 0  # Total number of available problems
    self.base_index = base_index  # Base index for problem selection, used to pick problems in ascending order of jig count
    self.problems_solved = 0  # Counter for the number of problems solved
    self.block_size = 6  # Number of problems to choose from in each block

    # Map action names to their precondition-check functions
    self.check_action_map = {
        "load_beluga": check_load_beluga,
        "unload_beluga": check_unload_beluga,
        "get_from_hangar": check_get_from_hangar,
        "deliver_to_hangar": check_deliver_to_hangar,
        "left_stack_rack": check_left_stack_rack,
        "right_stack_rack": check_right_stack_rack,
        "left_unstack_rack": check_left_unstack_rack,
        "right_unstack_rack": check_right_unstack_rack
    }

    # Find all JSON files in the problems folder
    if os.path.exists(self.path):
        problem_files = [f for f in os.listdir(self.path) if f.endswith('.json')]

        # Extract the number of jigs from each filename using a regular expression
        jig_counts = []
        for file in problem_files:
            match = re.search(r'_j(\d+)_', file)
            if match:
                jig_count = int(match.group(1))
                jig_counts.append((file, jig_count))

        # Sort by number of jigs (ascending)
        self.sorted_problems = sorted(jig_counts, key=lambda x: x[1])
        self.problem_count = len(self.sorted_problems)  # Problem count based on the sorted problems

Member Function Documentation

◆ check_action_execution()

rl.env.environment.Env.check_action_execution ( self,
str action_name,
obs )

Check if the action can be executed without actually executing it.

Parameters
    action_name  Name of the action to check
    obs          Current observation of the environment
Returns
True if the action can be executed, False otherwise
def check_action_execution(self, action_name: str, obs):
    """!
    @brief Check if the action can be executed without actually executing it
    @param action_name Name of the action to check
    @param obs Current observation of the environment
    @return True if the action can be executed, False otherwise
    """
    if action_name in self.check_action_map:
        return self.check_action_map[action_name](self.state, obs)
    return False  # Unknown action names are treated as not executable
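
A small sketch of how this check can serve for action masking, assuming env is an already-reset Env instance:

    obs = env.get_observation_high_level()
    action_names = list(env.check_action_map.keys())
    mask = [env.check_action_execution(name, obs) for name in action_names]
    valid_actions = [name for name, ok in zip(action_names, mask) if ok]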

◆ get_max_steps()

rl.env.environment.Env.get_max_steps ( self)

Get the maximum number of steps to solve the current problem.

Returns
Maximum steps based on the problem size
def get_max_steps(self):
    """!
    @brief Get the maximum number of steps to solve the current problem
    @return Maximum steps based on the problem size
    """
    return len(self.state.jigs) * 20 + 100
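
For example, an instance with 10 jigs gets a budget of 10 * 20 + 100 = 300 steps.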

◆ get_observation_high_level()

rl.env.environment.Env.get_observation_high_level ( self)

Get the high-level observation of the current state.

Returns
High-level observation array
def get_observation_high_level(self):
    """!
    @brief Get the high-level observation of the current state
    @return High-level observation array
    """
    return self.state.get_observation_high_level()

◆ get_reward()

rl.env.environment.Env.get_reward ( self,
bool could_execute,
str action_name,
production_line_n_old )

Calculate the reward for the current action.

Parameters
    could_execute          Boolean indicating if the action was successfully executed
    action_name            Name of the action taken
    production_line_n_old  Number of production lines before the action
Returns
Reward value based on the action and state
def get_reward(self, could_execute: bool, action_name: str, production_line_n_old):
    """!
    @brief Calculate the reward for the current action
    @param could_execute Boolean indicating if the action was successfully executed
    @param action_name Name of the action taken
    @param production_line_n_old Number of production lines before the action
    @return Reward value based on the action and state
    """
    # Goal completed
    if self.state.is_terminal():
        return 10000

    # Penalty if the action fails; it softens as the episode progresses and is capped at -800
    if not could_execute:
        return -1000 + min(20, self.step_count) * 10

    if action_name == "unload_beluga":
        # Reward if beluga is completely unloaded
        if len(self.state.belugas) > 0:
            if len(self.state.belugas[0].current_jigs) == 1:
                return 2000.0
        return 100.0

    if action_name == "load_beluga":
        if len(self.state.belugas) > 0:
            if len(self.state.belugas[0].outgoing) == 1:
                return 2000.0
            return 100.0
        else:
            return 5000.0

    # Actions that stack or unstack racks
    if action_name in ["right_stack_rack", "left_stack_rack", "right_unstack_rack", "left_unstack_rack"]:
        return 10.0

    if action_name == "deliver_to_hangar":
        # Extra reward if the delivery completed a production line
        if production_line_n_old > len(self.state.production_lines):
            return 2000.0
        return 200.0

    if action_name == "get_from_hangar":
        return 50.0

    return 0
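
As a worked example of the failure penalty: a failed action at step 0 yields -1000 + min(20, 0) * 10 = -1000, and from step 20 onward the penalty is capped at -1000 + 20 * 10 = -800.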

◆ reset()

rl.env.environment.Env.reset ( self)

Reset the environment with a new problem instance.

Resets the environment's state from a JSON file in the problems folder, selecting problems in ascending order of jig count (in blocks of 6 problems).

Returns
Initial observation of the new episode
def reset(self):
    """!
    @brief Reset the environment with a new problem instance

    Resets the environment's state from a JSON file in the problems folder,
    selecting problems in ascending order of jig count (in blocks of 6 problems)

    @return Initial observation of the new episode
    """
    number = randint(1, self.block_size)  # Random offset within the current block (randint is inclusive on both ends)

    # Increase base_index only if problems_solved > 0 (to prevent an increase at initialization)
    # and if it is a multiple of block_size * 3 (every 18 solved problems when block_size = 6)
    if self.problems_solved > 0 and (self.problems_solved % (self.block_size * 3)) == 0:
        self.base_index += 1
        self.problems_solved = 0

    # If we have reached the end of the sorted problems, choose a random problem
    if self.base_index + number >= self.problem_count:
        self.problem_name = os.path.join(self.path, self.sorted_problems[randint(0, self.problem_count - 1)][0])
    else:
        self.problem_name = os.path.join(self.path, self.sorted_problems[self.base_index + number][0])

    self.state = load_from_json(self.problem_name)
    self.step_count = 0  # Start the new episode with a fresh step counter
    return self.get_observation_high_level()
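
In effect this implements a simple curriculum: with the default base_index = -1 and block_size = 6, reset() samples from the six smallest instances in sorted_problems; after every block_size * 3 = 18 solved problems the sampling window shifts one instance towards larger jig counts, and once the window runs past the end of the sorted list a problem is drawn at random.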

◆ reset_specific_problem()

rl.env.environment.Env.reset_specific_problem ( self,
problem )

Reset the environment with a specific problem instance.

Parameters
    problem  Path to the specific problem JSON file
Returns
Initial observation of the new episode
def reset_specific_problem(self, problem):
    """!
    @brief Reset the environment with a specific problem instance
    @param problem Path to the specific problem JSON file
    @return Initial observation of the new episode
    """
    self.problem_name = problem
    self.state = load_from_json(self.problem_name)
    self.step_count = 0

    return self.get_observation_high_level()
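
A short sketch of evaluating on one fixed instance; the file name is a hypothetical placeholder:

    problem_path = os.path.join(env.path, "problem_j10_example.json")   # hypothetical instance name
    obs = env.reset_specific_problem(problem_path)
    budget = env.get_max_steps()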

◆ step()

rl.env.environment.Env.step ( self,
str action_name,
params = None )

Execute a single environment step with the given action.

Parameters
    action_name  Name of the action to execute
    params       Parameters for the action (optional)
Returns
Tuple of (observation, reward, done_flag)
def step(self, action_name: str, params=None):
    """!
    @brief Execute a single environment step with the given action
    @param action_name Name of the action to execute
    @param params Parameters for the action (optional)
    @return Tuple of (observation, reward, done_flag)
    """
    n_production_lines = len(self.state.production_lines)
    could_execute = False

    if params == [] and action_name == "unload_beluga":
        could_execute = self.state.apply_action(action_name, {})
    else:
        # params is expected to be a dictionary containing the arguments
        # to be passed to the action (besides the state).
        if params != []:
            could_execute = self.state.apply_action(action_name, params)
        else:
            could_execute = False

    obs = self.get_observation_high_level()  # Observation after the action has been applied
    reward = self.get_reward(could_execute, action_name, n_production_lines)
    self.step_count += 1  # Increment the step count

    if self.state.is_terminal():
        self.problems_solved += 1

    return obs, reward, self.state.is_terminal()
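
A sketch of the two calling conventions implied by the code above; the dictionary keys are illustrative placeholders, since the expected arguments are defined by ProblemState.apply_action rather than by this class:

    # "unload_beluga" needs no parameters: an empty list is translated into an empty dict.
    obs, reward, done = env.step("unload_beluga", [])

    # Every other action expects a non-empty params dict of action arguments;
    # the keys below are hypothetical and must match what apply_action expects.
    obs, reward, done = env.step("left_stack_rack", {"jig": "jig1", "rack": "rack2"})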

Member Data Documentation

◆ base_index

rl.env.environment.Env.base_index = base_index

◆ block_size

int rl.env.environment.Env.block_size = 6

◆ check_action_map

dict rl.env.environment.Env.check_action_map
Initial value:
= {
"load_beluga": check_load_beluga,
"unload_beluga": check_unload_beluga,
"get_from_hangar": check_get_from_hangar,
"deliver_to_hangar": check_deliver_to_hangar,
"left_stack_rack": check_left_stack_rack,
"right_stack_rack": check_right_stack_rack,
"left_unstack_rack": check_left_unstack_rack,
"right_unstack_rack": check_right_unstack_rack
}

◆ path

rl.env.environment.Env.path = path

◆ problem_count

int rl.env.environment.Env.problem_count = 0

◆ problem_name

rl.env.environment.Env.problem_name = None

◆ problems_solved

int rl.env.environment.Env.problems_solved = 0

◆ sorted_problems

list rl.env.environment.Env.sorted_problems = []

◆ state

rl.env.environment.Env.state = None

◆ step_count

rl.env.environment.Env.step_count = 0

The documentation for this class was generated from the following file: