vw-bandit Python implementation of multi-armed bandit using epsilon-greedy exploration and reward-average sampling estimation