Today we wrapped up the WEP, asked any lingering questions we had, and worked on an independent project for the rest of the day. Naturally, I decided to try to figure out the issue with CartPole. The most likely culprit is a min and a max switched somewhere, but I feel like I’ve checked them all, so I decided to try something else: seeding the environment.
Normally, every time the environment is reset at the beginning of an episode, the initial condition is randomized. With a seed, the environment is reset to the same condition every time. Minecraft’s world generation works on a similar idea: if no seed is given, each game generates a world from a random seed, but if a seed is given, the same world is generated no matter how many times that seed is used.
Seeding lets me tell two cases apart. Either the agent is too dumb to get a general ‘understanding’ of how to win across randomly generated starting conditions but smart enough to figure out a solution after playing a single starting condition enough times, or there is actually just something critically wrong with my code. It’s been running for a bit and I’m feeling a bit optimistic because the score seems to sit consistently around 20 points and is slowly rising, although there’s still a chance it’ll slowly sink back down to 8-10 points.
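
Here's a minimal sketch of what seeding a reset does. It does not use the real CartPole environment: `ToyEnv` is a hypothetical stand-in that only copies the idea of a Gymnasium-style `reset(seed=...)` call, and the four start values drawn from [-0.05, 0.05] mirror how CartPole picks its initial state. Which library and version my actual code uses may change the exact call, so treat this as an illustration and not the real API.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for an environment's reset logic (not a real Gym class)."""

    def __init__(self):
        # Unseeded generator: every run starts from a different random stream.
        self.rng = random.Random()

    def reset(self, seed=None):
        # Passing a seed restarts the random stream from a known point.
        if seed is not None:
            self.rng = random.Random(seed)
        # Four small random numbers, like CartPole's initial
        # cart position, cart velocity, pole angle, and pole angular velocity.
        return [self.rng.uniform(-0.05, 0.05) for _ in range(4)]


env = ToyEnv()

# Seeding on every reset: both episodes start from the same state.
seeded_a = env.reset(seed=42)
seeded_b = env.reset(seed=42)
print("same start when reseeded each time:", seeded_a == seeded_b)

# Resetting without a seed just keeps drawing from the stream,
# so consecutive episodes start from different states.
unseeded_a = env.reset()
unseeded_b = env.reset()
print("same start without reseeding:", unseeded_a == unseeded_b)
```

One thing worth double-checking in the real code: under this convention, seeding only once makes the whole run repeatable, but each episode still starts from a different state. To get the same start every episode, the seed has to be passed on every reset.
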
Update: After a bit more training, the reward is dropping back toward 10 points, even with the seed fixed. So I think the issue is in my code rather than in the randomized starting conditions. Debug time!