rl_action_simulate.epsilonGreedy.Rd
This implementation of an 'epsilonGreedy' action selection policy accepts a
parameter epsilon, which describes an agent's propensity to explore the action
space. The higher the epsilon, the more likely the agent is to select a random
action; the lower the epsilon, the more likely the agent is to select the
exploitative action (the one with the highest expected value).
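As a minimal sketch of the rule this policy describes (an illustration only, not the package's internal implementation; the helper name below is made up), epsilon-greedy selection amounts to drawing a uniform random number and comparing it against epsilon:

epsilon_greedy_sketch <- function(values, epsilon) {
  if (runif(1) < epsilon) {
    sample(seq_along(values), 1)  # explore: choose any action uniformly at random
  } else {
    which.max(values)             # exploit: choose the action with the highest value estimate
  }
}
epsilon_greedy_sketch(c(0.2, 0.25, 0.15, 0.8), epsilon = 0.1)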
# S3 method for epsilonGreedy
rl_action_simulate(policy = "epsilonGreedy", values, epsilon, ...)
policy
Defines the action selection policy as "epsilonGreedy"; this argument is included in the method to support S3 generics.

values
A numeric vector containing the current value estimates of each action.

epsilon
A parameter (between zero and one) modulating the RL agent's propensity to explore. The higher the epsilon, the fewer exploitative choices the RL agent will make.

...
Additional arguments passed to or from other methods.
A number giving the index of the action that will be taken.
# The lower the epsilon, the less exploration
exploit <- numeric(100)
for (trial in seq_along(exploit)) {
  exploit[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
}
# Choice 4 (value 0.8) is the optimal option, and we see it is selected most often
sum(exploit == 4)
#> [1] 95
# The higher the epsilon, the more exploration
explore <- numeric(100)
for (trial in seq_along(explore)) {
  explore[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.8
  )
}
# Choice 4 (value 0.8) is still the optimal option, but we see more exploration here
sum(explore == 4)
#> [1] 34
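
# Rough sanity check: if exploration draws uniformly over all actions
# (including the best one, which is an assumption about the implementation),
# the expected rate of selecting choice 4 is (1 - epsilon) + epsilon / length(values)
(1 - 0.1) + 0.1 / 4
#> [1] 0.925
(1 - 0.8) + 0.8 / 4
#> [1] 0.4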