An Easy Problem Demonstration

Demonstrating what AI can actually do for a non-technical audience can be a real challenge. To address this, we built a demonstration whose learning results make an AI's learning process easy to visualise.

For this demonstration, we found that an environment built with Unity and ML-Agents was the best fit. Using these tools, we created a program that plays, for all intents and purposes, like a standard rhythm racing game.

As the video footage above shows, the player moves forward through a sci-fi course while avoiding walls and other obstacles. Touching a point block adds to the score, while touching any other block type causes an immediate game over.

As with many other games, the visual elements of the game can essentially be converted into components for learning. Given training data from real players' interactions with the game, an AI can be trained to play it, improving with every attempt through machine learning.

We will list more specifics of our attempt below.

Production/Execution Environment

Unity 2018.2.14f1

ML-Agents v0.5

Outline

Using Unity + ML-Agents, we created a basic reinforcement learning environment. Our objective was not only to establish a workflow covering Unity environment creation and training with ML-Agents, but also to use the result as a way of demonstrating basic learning methods.

Training Scenario Details

In this scenario, the player proceeds along an infinite 3D path with the objective of avoiding obstacles on the path for as long as possible. Within the learning process, 1 STEP corresponds to one rendered frame at 60 fps.
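As a rough illustration of that setup, the sketch below shows how forward motion tied to the frame rate might look in Unity. This is an assumption-laden sketch rather than the project's actual code: the class name ForwardRunner, the speed value, and the explicit 60 fps lock are all illustrative.

```csharp
using UnityEngine;

// Hypothetical sketch: the player advances every rendered frame,
// so one training STEP corresponds to one frame at 60 fps.
public class ForwardRunner : MonoBehaviour
{
    public float speed = 10f; // forward speed in units per second (illustrative value)

    void Start()
    {
        Application.targetFrameRate = 60; // lock rendering to 60 fps
    }

    void Update()
    {
        // Move straight ahead along the endless course each frame.
        transform.Translate(Vector3.forward * speed * Time.deltaTime);
    }
}
```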

Because this is our first attempt, we designed the scenario to reduce variation in the input patterns and to let results accumulate simply, so that the agent can reach more definite learning results. The rules are as follows; a rough C# sketch of an agent following them appears after the list.

・The AI/Player selects one action to perform from the following: 0: Do nothing, 1: Move left, 2: Move right.

・The player casts rays in 11 directions ahead of them and sends the recorded hit information as a Vector Observation.

・If the player character makes contact with the side wall, the episode ends and the player receives a reward of -1.

・If the player character makes contact with an obstacle block, the episode ends and the player receives a reward of -1.

・If the player character makes contact with a point block, the player’s reward is increased by 0.05.
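Below is a minimal sketch of how an agent following these rules might look, written against the ML-Agents v0.5 C# API as we understand it. The class name RunnerAgent, the tags (wall, obstacleBlock, pointBlock), the ray angles, and the movement step are illustrative assumptions, not the project's actual code.

```csharp
using UnityEngine;
using MLAgents;

// Hypothetical runner agent sketch (names and values are illustrative).
public class RunnerAgent : Agent
{
    public RayPerception rayPerception;      // ML-Agents helper component that casts rays
    private readonly float[] rayAngles =     // 11 rays fanned out ahead of the player
        { 30f, 45f, 60f, 75f, 90f, 105f, 120f, 135f, 150f, 165f, 180f };
    private readonly string[] detectable = { "wall", "obstacleBlock", "pointBlock" };

    public override void CollectObservations()
    {
        // Record what each ray hits and pass it on as the Vector Observation.
        AddVectorObs(rayPerception.Perceive(20f, rayAngles, detectable, 0f, 0f));
    }

    public override void AgentAction(float[] vectorAction, string textAction)
    {
        // 0: do nothing, 1: move left, 2: move right
        int action = Mathf.FloorToInt(vectorAction[0]);
        if (action == 1) transform.Translate(Vector3.left * 0.1f);
        else if (action == 2) transform.Translate(Vector3.right * 0.1f);
    }

    private void OnTriggerEnter(Collider other)
    {
        if (other.CompareTag("pointBlock"))
        {
            AddReward(0.05f);   // point block: +0.05
        }
        else if (other.CompareTag("wall") || other.CompareTag("obstacleBlock"))
        {
            AddReward(-1f);     // wall or obstacle block: -1 and end the episode
            Done();
        }
    }
}
```

With the Brain's Vector Action set to three discrete actions and the Vector Observation size matched to the ray output, this is roughly the shape of agent that the steps under "How to Create" below produce.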

Learning Method and Results

To demonstrate basic learning methods, we trained the agent for 1,000,000 STEPS each with standard reinforcement learning (PPO) and with PPO using a recurrent network (RNN/LSTM), observing the reported actions for each. (We will prepare a comparison video for the results below.)

●Reinforcement Learning (PPO): Reinforcement Learning on Individual Frames

・At 100,000 steps, the agent missed many point blocks and made many mistakes. Slight left-and-right movements were recorded frequently.

・At 200,000 steps, the agent missed few point blocks, but mistakes still occurred relatively often. Slight left-and-right movements were recorded frequently.

・At 500,000 steps, the agent missed few point blocks and made fewer mistakes. Slight left-and-right movements were recorded frequently.

・At 1,000,000 steps, the agent missed few point blocks and made fewer mistakes. Slight left-and-right movements were still present.

●RNN (LSTM): Reinforcement Learning Considering the Time Series of Past Frames

・At 100,000 steps, the agent missed many point blocks and made many mistakes. Slight left-and-right movements were recorded.

・At 200,000 steps, the agent still missed many point blocks and made many mistakes. Slight left-and-right movements were recorded.

・At 500,000 steps, both the number of missed point blocks and the number of mistakes decreased. Slight left-and-right movements were recorded.

・At 1,000,000 steps, both the number of missed point blocks and the number of mistakes decreased. Slight left-and-right movements noticeably decreased.

Current Problems and Points of Improvement Regarding Training/Learning

Both training methods still led to cases in which the agent would collide head-on with obstacle blocks. We would therefore like to adjust the randomization parameters, run reinforcement learning again, and observe how the behaviour changes.

We only used vector-based observation data this time, so we would also like to attempt training again using image-based observations.

How to Create

・Copy the template directory within the ML-Agents Examples folder and rename it

・Rename each script and adjust the class names within each file

・Reassign changed scripts as appropriate

・Build the game environment in Unity, set the Brain type to Player, and confirm that you can operate the game yourself

・Set the Brain type to External, launch the training script from the command line, and run training

・Set the Brain type to Internal, assign the model file produced by command-line training, and confirm that it runs correctly

In Closing

In the bitgrit community, we hope to keep offering content like this that covers topics such as the ones above, providing a learning environment that makes it easier for anyone to take part and get involved in the worldwide data science community.


Follow Bitgrit’s socials 📱 to stay updated on talks and upcoming competitions!