Possible bug in the calculation of the state space

It seems that you assume the state space has `8 * 8 * 2 = 128` states. We have a `2 * 4` grid with 8 cells, so one might think that there are 8 ways to place A, 8 ways to place B, and 2 ways to assign the ball. However, this count assumes that A and B can be placed in the same cell. In Littman's original paper (minimax Q-learning), this is not the case: the two players must always be in different cells, so the correct number of states is `8 * 7 * 2 = 112`.
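To make the arithmetic concrete, here is a minimal sketch that enumerates both counts. The names `N_CELLS`, `BALL_OWNERS`, `all_states` and `valid_states` are mine and are not taken from the repository; the sketch assumes a state is a triple `(pos_a, pos_b, ball)` where `ball` says which player holds the ball.

```python
from itertools import product

N_CELLS = 2 * 4   # cells in the 2 x 4 grid
BALL_OWNERS = 2   # assumption: the ball is held either by A or by B

# Every (pos_a, pos_b, ball) triple, including those where A and B overlap.
all_states = list(product(range(N_CELLS), range(N_CELLS), range(BALL_OWNERS)))

# Only the triples in which A and B occupy different cells.
valid_states = [s for s in all_states if s[0] != s[1]]

print(len(all_states))    # 8 * 8 * 2 = 128
print(len(valid_states))  # 8 * 7 * 2 = 112
```

The difference of 16 states corresponds to the 8 cells where both players would overlap, times the 2 ball assignments.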

I'm trying to understand why you defined the state space as `self.state_space = (8, 8, 2)`, which seems to suggest one of the following:

1. you calculated the state space incorrectly,
2. you allow players to be in the same cell, or
3. I am misinterpreting what the variable `self.state_space` is supposed to represent.

In the comments you write `self.state_space: <num of variable1, num of variable2, num of variable3>`, but this is unclear to me. You use `state_space` to define the Q-functions in the agent. These are naturally represented as multi-dimensional arrays in which each entry corresponds to a tuple `(a1, a2, state)`, so I think that part makes sense.
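For reference, this is how I would interpret a Q-table built from `(8, 8, 2)`. It is only a sketch of one possible reading, not the repository's code: `make_q_table`, `N_ACTIONS = 5` (four moves plus standing still) and the index order `q[pos_a][pos_b][ball][a1][a2]` are all assumptions on my part.

```python
N_CELLS, BALL_OWNERS = 8, 2
N_ACTIONS = 5  # assumption: N, S, E, W and "stand"; not taken from the repo

state_space = (N_CELLS, N_CELLS, BALL_OWNERS)


def make_q_table(state_space, n_actions, init=0.0):
    """Build a dense Q-table indexed as q[pos_a][pos_b][ball][a1][a2]."""
    n_a, n_b, n_ball = state_space
    return [[[[[init] * n_actions for _ in range(n_actions)]
              for _ in range(n_ball)]
             for _ in range(n_b)]
            for _ in range(n_a)]


q = make_q_table(state_space, N_ACTIONS)

# Entries allocated by the dense layout vs. entries for states where A != B.
total_entries = N_CELLS * N_CELLS * BALL_OWNERS * N_ACTIONS ** 2
reachable_entries = N_CELLS * (N_CELLS - 1) * BALL_OWNERS * N_ACTIONS ** 2

print(total_entries, reachable_entries)
```

Under this reading, `(8, 8, 2)` would describe the array dimensions rather than the number of valid states: the dense table allocates entries for the overlapping positions too, but they would never be visited if the environment forbids A and B from sharing a cell.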

Could you please clarify your approach to defining the state space, and how it affects, for example, the definition of the Q-function and its shape?
