Dear author, thanks for making the code available.
I have two questions regarding ICQ_softmax, where the weight is approximated by the softmax over the minibatch:
- Why is len(weights) (e.g., in this line) needed to scale the softmax distribution?
- Why is the softmax computed over the TD error in this line, rather than over the Q value as suggested in the paper?
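
To make the first question concrete, here is a minimal sketch of how I read the computation. It is plain Python rather than the repository's code, and the function names, the temperature parameter, and the scores are my own placeholders. A softmax over the minibatch sums to 1, so multiplying by the batch size makes the weights average to 1; I am not sure whether that is the intended reason for the factor.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)  # subtract the max to avoid overflow in exp
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_softmax_weights(scores, temperature=1.0):
    """Softmax over the minibatch, multiplied by the batch size.

    Assumption: this mirrors the len(weights) factor I am asking about.
    `scores` stands in for whatever the softmax is taken over
    (TD error in the code, Q value in the paper).
    """
    w = softmax(scores, temperature)
    n = len(w)
    return [n * wi for wi in w]


# Made-up per-sample scores for a minibatch of four transitions.
scores = [0.5, -1.0, 2.0, 0.0]
weights = scaled_softmax_weights(scores)

# The plain softmax sums to 1, so the scaled weights average to 1.
mean_weight = sum(weights) / len(weights)
```

If this reading is right, the scaling would keep the weighted loss on the same scale as an unweighted mean over the batch, but please correct me if the factor serves a different purpose.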
Thanks!