DDPG: Score vs Episodes
Some of these models have been deployed on Flux's website. The CartPole example was trained with a Deep Q-Network (DQN). An Atari Pong example, trained with a Dueling DQN, will be added in a few days. Here is a demo of Pong trained using Flux.
Add a wider variety of models, especially those published in the last 18 months.
Create an interface to easily train and test any environment from OpenAIGym.jl.
This mini-project was the most challenging part of GSoC phase 2. AlphaGo Zero is a breakthrough AI by Google DeepMind. It plays Go, which is considered one of the most challenging games in the world, mainly because of the enormous number of states the game can reach. AlphaGo Zero defeated the earlier versions of AlphaGo, which had themselves beaten the world's best Go players. AlphaGo.jl's objective is to reproduce the results of the AlphaGo Zero algorithm on Go and to achieve similar results on any zero-sum game.
We now have a package for training an AlphaGo Zero model in Julia, and training is really simple: we just pass the training parameters and the environment on which we want to train the model, then play against it. For more information on AlphaGo.jl, refer to the blog post.
Game of Go
Monte Carlo tree search
Couldn't train the model well
Game of Gomoku to test the algorithm (since it is an easier game)
Train a model on any game
Generative Adversarial Networks
GANs have been extremely successful at learning the underlying representation of data, which lets them generate realistic fake data. For example, a GAN trained on the MNIST dataset of handwritten digits can produce fake images that look very similar to those in MNIST. These networks have great applications in image editing: depending on the dataset, they can remove certain features from an image or add new ones. A GAN consists of two networks: a generator and a discriminator. The generator's objective is to generate fake images, whereas the discriminator's objective is to differentiate between the fake images generated by the generator and the real images in the dataset.
More GAN models such as InfoGAN, BEGAN, and CycleGAN
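The two competing objectives can be written as a pair of loss functions. The following is a minimal sketch in plain Python, not the Flux code used in this project: the function names are hypothetical, the inputs are the discriminator's output probabilities, and the generator loss uses the common non-saturating form, which is an assumption about the training setup.

```python
import math


def bce(prediction, target, eps=1e-12):
    """Binary cross-entropy for a single probability and a 0/1 target."""
    p = min(max(prediction, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real, d_fake):
    """The discriminator wants to output 1 on real images and 0 on fakes."""
    real_term = sum(bce(p, 1.0) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0.0) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """The generator wants the discriminator to output 1 on its fakes
    (non-saturating form, assumed here)."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)
```

Here `d_real` and `d_fake` are lists of discriminator outputs on real and generated images. A discriminator that separates the two well gets a low loss, while the generator's loss falls as its fakes are scored closer to 1.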
Some cool animations with GANs
A data pipeline for training and producing images, with the dataset and GAN type as input.
Decoupled Neural Interface
Decoupled Neural Interface (DNI) is a new technique for training a model. It does not backpropagate from the output layer all the way to the input layer. Instead, it uses a trick to "estimate" the gradient: a small linear-layer network predicts the gradients, so a layer does not have to wait for backpropagation to deliver the true gradients. The advantage of such a model is that it can be parallelized. The technique results in a slight dip in accuracy, but training is faster when the layers of the network are parallelized.
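To make the idea concrete, here is a minimal sketch in plain Python of a gradient predictor and one decoupled training step. It is an illustration under simplifying assumptions, not the Flux implementation from this project: the network is a two-layer scalar model `y = w2 * (w1 * x)` with squared-error loss, and the names `SyntheticGradient` and `decoupled_step` are hypothetical.

```python
class SyntheticGradient:
    """Tiny linear model g_hat = a * h + b that predicts dL/dh from h."""

    def __init__(self):
        self.a = 0.0
        self.b = 0.0

    def predict(self, h):
        return self.a * h + self.b

    def fit_step(self, h, true_grad, lr=0.1):
        """One SGD step on the squared error between predicted and true gradient."""
        error = self.predict(h) - true_grad
        self.a -= lr * error * h
        self.b -= lr * error
        return error


def decoupled_step(w1, w2, sg, x, target, lr=0.05):
    """One training step of the scalar network y = w2 * (w1 * x).

    Layer 1 is updated from the predicted gradient, so it does not wait for
    layer 2. The true gradient only updates layer 2 and trains the predictor.
    """
    h = w1 * x
    new_w1 = w1 - lr * sg.predict(h) * x  # dL/dw1 is approximated by g_hat * x

    y = w2 * h
    dy = y - target            # dL/dy for L = 0.5 * (y - target)^2
    true_grad_h = dy * w2      # what backpropagation would send to layer 1
    new_w2 = w2 - lr * dy * h
    sg.fit_step(h, true_grad_h)
    return new_w1, new_w2
```

Calling `decoupled_step` in a loop updates the first layer without waiting for the second layer's backward pass. In a real network each layer and its predictor could run on separate devices, which is where the speedup would come from.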
During the past three months, I learned a lot about Reinforcement Learning, and AlphaGo in particular. I experienced training an RL model for days and finally saw it working well! I encountered the issues faced in training the models and learnt to overcome them. All in all, as an aspiring ML engineer, these three months have been my most productive. I am glad that I could meet most of my objectives, and I have worked on some extra models to make up for the ones I could not meet.
I would really like to thank my mentor Mike Innes for guiding me throughout the project, and James Bradbury for his valuable input on improving the code in the Reinforcement Learning models. I would also like to thank Neethu Mariya Joy for deploying the trained models on the web. Last but not least, I thank The Julia Project and NumFOCUS for sponsoring me and all the other JSoC students for JuliaCon'18 in London.