rl_action_simulate.softmax.Rd
Description:

This implementation of a softmax action selection policy accepts a
choice temperature parameter, tau, which describes an agent's
propensity to explore the action space. The higher the tau
(temperature), the more random the actions; the lower the tau
(temperature), the more exploitative the actions (i.e., a lower
temperature increases the probability of taking the action with the
highest expected value).
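Concretely, a softmax policy converts value estimates into choice
probabilities. A minimal sketch of the standard softmax rule follows;
the softmax_probs helper is hypothetical, shown only for illustration,
and not part of this package's API:

# Hypothetical helper illustrating the standard softmax rule:
# P(a) = exp(values[a] / tau) / sum(exp(values / tau))
softmax_probs <- function(values, tau) {
  exp(values / tau) / sum(exp(values / tau))
}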
Usage:

# S3 method for softmax
rl_action_simulate(policy = "softmax", values, tau, ...)
Arguments:

policy: Defines the action selection policy as "softmax"; this
  argument is included in the method signature to support S3 dispatch.

values: A numeric vector containing the current value estimates of
  each action.

tau: A choice temperature parameter (greater than zero) defining the
  exploration versus exploitation trade-off: higher tau (temperature)
  values lead to more uncertain choice distributions (more
  exploration), while lower tau values lead to more certain choice
  distributions (more exploitation). See the illustration after this
  list.

...: Additional arguments passed to or from other methods.
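To illustrate the tau trade-off (using the hypothetical softmax_probs
helper sketched above, and assuming the standard softmax rule), the
same value estimates yield very different choice distributions:

values <- c(0.2, 0.25, 0.15, 0.8)
# Low temperature: probability mass concentrates on the best action
round(softmax_probs(values, tau = 0.2), 2)
#> [1] 0.04 0.06 0.03 0.87
# High temperature: the distribution flattens toward uniform
round(softmax_probs(values, tau = 5), 2)
#> [1] 0.24 0.24 0.24 0.27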
Value:

A number representing which action will be taken.
Examples:

# The smaller the tau, the less exploration
cold <- numeric(100)
for (trial in seq_along(cold)) {
  cold[trial] <- rl_action_simulate(
    policy = "softmax",
    values = c(0.2, 0.25, 0.15, 0.8),
    tau = 0.2
  )
}
# Action 4 has the highest value (0.8), so it is chosen most often
sum(cold == 4)
#> [1] 87
hot <- numeric(100)
for (trial in seq_along(hot)) {
  hot[trial] <- rl_action_simulate(
    policy = "softmax",
    values = c(0.2, 0.25, 0.15, 0.8),
    tau = 5
  )
}
# Action 4 still has the highest value (0.8), but we see far more
# exploration here
sum(hot == 4)
#> [1] 23
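Assuming the standard softmax rule sketched above, these empirical
counts track the analytic choice probabilities for action 4 (roughly
0.87 at tau = 0.2 versus 0.27 at tau = 5):

mean(cold == 4)
#> [1] 0.87
mean(hot == 4)
#> [1] 0.23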