rl_action_simulate.softmax.Rd
Description:

This implementation of a softmax action selection policy accepts a
choice temperature parameter, tau, which describes an agent's
propensity to explore the action space. The higher the tau
(temperature), the more random the actions; the lower the tau
(temperature), the more exploitative the actions (i.e., a lower
temperature increases the probability of taking the action with the
highest expected value).
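Concretely, a softmax policy converts value estimates into choice
probabilities. A minimal sketch of the standard softmax rule follows;
the softmax_probs helper is hypothetical, shown only for illustration,
and not part of this package's API:

# Hypothetical helper illustrating the standard softmax rule:
# P(a) = exp(values[a] / tau) / sum(exp(values / tau))
softmax_probs <- function(values, tau) {
  exp(values / tau) / sum(exp(values / tau))
}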
Usage:

# S3 method for softmax
rl_action_simulate(policy = "softmax", values, tau, ...)
Arguments:

policy: Defines the action selection policy as "softmax"; this
  argument is included in the method signature to support S3 dispatch.

values: A numeric vector containing the current value estimates of
  each action.

tau: A choice temperature parameter (greater than zero) defining the
  exploration versus exploitation trade-off: higher tau (temperature)
  values lead to more uncertain choice distributions (more
  exploration), while lower tau values lead to more certain choice
  distributions (more exploitation). See the illustration after this
  list.

...: Additional arguments passed to or from other methods.
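To illustrate the tau trade-off (using the hypothetical softmax_probs
helper sketched above, and assuming the standard softmax rule), the
same value estimates yield very different choice distributions:

values <- c(0.2, 0.25, 0.15, 0.8)
# Low temperature: probability mass concentrates on the best action
round(softmax_probs(values, tau = 0.2), 2)
#> [1] 0.04 0.06 0.03 0.87
# High temperature: the distribution flattens toward uniform
round(softmax_probs(values, tau = 5), 2)
#> [1] 0.24 0.24 0.24 0.27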
Value:

A number representing which action will be taken.
Examples:

# The smaller the tau, the less exploration
cold <- numeric(100)
for (trial in seq_along(cold)) {
  cold[trial] <- rl_action_simulate(
    policy = "softmax",
    values = c(0.2, 0.25, 0.15, 0.8),
    tau = 0.2
  )
}
# Action 4 has the highest value (0.8), so it is chosen most often
sum(cold == 4)
#> [1] 87
hot <- numeric(100)
for (trial in seq_along(hot)) {
  hot[trial] <- rl_action_simulate(
    policy = "softmax",
    values = c(0.2, 0.25, 0.15, 0.8),
    tau = 5
  )
}
# Action 4 still has the highest value (0.8), but we see far more
# exploration here
sum(hot == 4)
#> [1] 23
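Assuming the standard softmax rule sketched above, these empirical
counts track the analytic choice probabilities for action 4 (roughly
0.87 at tau = 0.2 versus 0.27 at tau = 5):

mean(cold == 4)
#> [1] 0.87
mean(hot == 4)
#> [1] 0.23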