rl_action_simulate.epsilonGreedy.Rd
This implementation of an 'epsilonGreedy' action selection policy accepts a
parameter epsilon, which describes an agent's propensity to explore the action
space. The higher the epsilon, the more likely the agent is to select a random
action; the lower the epsilon, the more likely the agent is to select the
exploitative action (the one with the highest expected value).
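As a minimal sketch of the rule this policy describes (an illustration only, not the package's internal implementation; the helper name below is made up), epsilon-greedy selection amounts to drawing a uniform random number and comparing it against epsilon:

epsilon_greedy_sketch <- function(values, epsilon) {
  if (runif(1) < epsilon) {
    sample(seq_along(values), 1)  # explore: choose any action uniformly at random
  } else {
    which.max(values)             # exploit: choose the action with the highest value estimate
  }
}
epsilon_greedy_sketch(c(0.2, 0.25, 0.15, 0.8), epsilon = 0.1)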
# S3 method for epsilonGreedy
rl_action_simulate(policy = "epsilonGreedy", values, epsilon, ...)
policy
Defines the action selection policy as "epsilonGreedy"; this argument is included in the method to support S3 generics.

values
A numeric vector containing the current value estimates of each action.

epsilon
A parameter (between zero and one) modulating the RL agent's propensity to explore. The higher the epsilon, the fewer exploitative choices the RL agent will make.

...
Additional arguments passed to or from other methods.
A number giving the index of the action that will be taken.
# The lower the epsilon, the less exploration
exploit <- numeric(100)
for (trial in seq_along(exploit)) {
  exploit[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.1
  )
}
# Choice 4 (value 0.8) is the optimal option, and we see it is selected most often
sum(exploit == 4)
#> [1] 95
# The higher the epsilon, the more exploration
explore <- numeric(100)
for (trial in seq_along(explore)) {
  explore[trial] <- rl_action_simulate(
    policy = "epsilonGreedy",
    values = c(0.2, 0.25, 0.15, 0.8),
    epsilon = 0.8
  )
}
# Choice 4 (value 0.8) is still the optimal option, but we see more exploration here
sum(explore == 4)
#> [1] 34
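
# Rough sanity check: if exploration draws uniformly over all actions
# (including the best one, which is an assumption about the implementation),
# the expected rate of selecting choice 4 is (1 - epsilon) + epsilon / length(values)
(1 - 0.1) + 0.1 / 4
#> [1] 0.925
(1 - 0.8) + 0.8 / 4
#> [1] 0.4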