Genetic Algorithms in one night
07 Aug 2016

TL;DR
Tonight I decided to implement a genetic algorithm to take care of the parameter optimization for my master thesis. I wanted to do it in one night, but ended up spending a couple of hours on the following day fixing some stuff. You can find the resulting code here.
Why? How?
I have to optimize 5 models, each with more than 4 continuous parameters, for my master thesis, and I got tired of hand-tuning them.
So I decided to spend one night implementing a genetic algorithm to optimize the parameters for me. Up until now I had been using a mix of grid search and hand-picked parameter values to try to improve the RMSE score of my models.
I have been working in Python 2.7 on my thesis, so I decided to use it here as well, and to make the implementation generic so it could be reused in other projects.
I decided to write this post to help me organize my thoughts while implementing and to see if I would get some feedback on my implementation.
Sources
Most of the ideas for the algorithm came from what I remembered of my Neural Networks class at TU Delft, where we covered the topic, and from Wikipedia, of course.
What do I need from the models to optimize?
First, I need to define the inputs the optimizer needs from the problem.
- Objective function: the function that we want to minimize (I'll treat every problem as a minimization problem);
  - input: parameter tuple;
  - output: score from 0 to 1.
- Parameter space: the list of valid values for each parameter; in Python this will be a list of lists. Let's consider only discrete values, since a continuous parameter can be discretized.
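As a concrete illustration (the values and the toy objective below are made up, not from my thesis), the two inputs could look like this:

```python
# Hypothetical parameter space: two parameters, each with its
# list of valid discrete values.
parameter_space = [
    [0.01, 0.1, 1.0],  # e.g. a learning rate
    [10, 50, 100],     # e.g. a number of iterations
]

def objective_func(params):
    """Toy objective returning a score in [0, 1]; lower is better.

    A real objective would train a model with `params` and return
    a normalized RMSE."""
    lr, iters = params
    return min(1.0, abs(lr - 0.1) + abs(iters - 50) / 100.0)
```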
What do I need to implement?
I will divide the code into two classes: the main GeneticOptimizer class and the Individual class, where each instance represents a possible parameter combination and its score on the objective function.
Individual Class
The Individual class represents each of the solutions tested on the objective function; it stores the parameter combination and the achieved score.
Attributes
An individual consists of:
- genome: parameter tuple;
- score: Objective function value if already known;
Methods
- get_score: Returns score of Individual
- set_score: Sets score of Individual
- to_string: Represents Individual in string format
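A minimal sketch of the class (the attribute and method names come from the lists above; the bodies are my guess at the obvious implementation):

```python
class Individual(object):
    """One candidate solution: a parameter tuple and its score."""

    def __init__(self, genome, score=None):
        self.genome = genome  # parameter tuple
        self.score = score    # objective value; None until evaluated

    def get_score(self):
        return self.score

    def set_score(self, score):
        self.score = score

    def to_string(self):
        return "genome=%s score=%s" % (self.genome, self.score)
```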
Genetic algorithm class
Attributes
The main Attributes that need to be stored by the GeneticOptimizer class are:
- parameter_space: the list of lists of valid parameter values;
- objective_func: the function that receives a parameter combination and outputs its score, from 0 to 1;
- mutation_prob: the probability that an individual mutates during a generation;
- pop_size: the number of individuals in the population;
- population: the list of individuals in the current generation;
- genome_set: the set of distinct genomes present in the population;
- pool: a process pool, to allow for parallel execution of the algorithm.
Methods
- generate_individual: sample the parameter space to generate a new element for the population;
  - output: one valid parameter tuple to add to the population;
- crossover: mix two individuals to create an individual for the next generation;
  - input: two parameter tuples and the score of each individual;
  - output: a new individual;
For the crossover, instead of giving the parameters of each individual equal probability, I decided to weight the choice by their scores: the better (lower-scoring) parent contributes each parameter with higher probability.
- mutate: change one individual, replicating the mutation mechanism that exists in nature;
  - input: the individual to be mutated;
  - output: a new individual;
- score_population: use the objective function to find the score of every individual in the population. Here I added a small trick: since my objective functions are costly to evaluate (some take more than 30 minutes), I keep a dictionary of past parameters and scores as a form of memoization of the results.
- update_population: the core of the algorithm; from the current population it creates the next generation using different mechanisms. The process is as follows:
  - Define the number of individuals generated by each mechanism; I set the following percentages of the population size:
    - 30% elitists: the best-performing individuals of the previous generation;
    - 50% crossed individuals: individuals generated by crossover of the elitists;
    - 20% random new individuals: generated by randomly sampling the parameter space;
  - Generate the individuals with each mechanism, making sure no two individuals in the population have the same genome;
  - Go through the individuals in the new generation and mutate each of them with probability mutation_prob.
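Putting these mechanisms together, here is a simplified, self-contained sketch of one generation update. The function names mirror the methods above, but the details (tie-breaking, de-duplication) are my own simplifications, and the real code differs in that it uses classes and a process pool:

```python
import random

def generate_individual(parameter_space):
    """Randomly sample one valid parameter tuple from the space."""
    return tuple(random.choice(values) for values in parameter_space)

def crossover(genome_a, genome_b, score_a, score_b):
    """Score-weighted crossover: since we minimize, the parent with
    the LOWER score contributes each gene with HIGHER probability."""
    total = score_a + score_b
    p_a = 0.5 if total == 0 else score_b / total  # prob. of taking a gene from genome_a
    return tuple(a if random.random() < p_a else b
                 for a, b in zip(genome_a, genome_b))

def mutate(genome, parameter_space):
    """Replace one randomly chosen gene with another valid value."""
    i = random.randrange(len(genome))
    mutated = list(genome)
    mutated[i] = random.choice(parameter_space[i])
    return tuple(mutated)

score_cache = {}  # memoization: genome -> score, since evaluations are costly

def score_population(population, objective_func):
    """Score every genome, skipping ones we have already evaluated."""
    scores = []
    for genome in population:
        if genome not in score_cache:
            score_cache[genome] = objective_func(genome)
        scores.append(score_cache[genome])
    return scores

def update_population(population, scores, parameter_space, mutation_prob=0.1):
    """Build the next generation: 30% elitists, 50% crossed, 20% random.

    Assumes the parameter space is rich enough that distinct genomes
    can always be found."""
    pop_size = len(population)
    n_elite = int(0.3 * pop_size)
    n_cross = int(0.5 * pop_size)
    ranked = sorted(zip(scores, population))  # ascending score: best first
    elites = [genome for _, genome in ranked[:n_elite]]
    new_pop, seen = list(elites), set(elites)
    while len(new_pop) < n_elite + n_cross:   # crossover among the elitists
        (sa, ga), (sb, gb) = random.sample(ranked[:n_elite], 2)
        child = crossover(ga, gb, sa, sb)
        if child not in seen:
            new_pop.append(child)
            seen.add(child)
    while len(new_pop) < pop_size:            # fill up with random individuals
        candidate = generate_individual(parameter_space)
        if candidate not in seen:
            new_pop.append(candidate)
            seen.add(candidate)
    # Finally, mutate each individual with probability mutation_prob.
    return [mutate(g, parameter_space) if random.random() < mutation_prob else g
            for g in new_pop]
```

With pop_size = 10 this yields 3 elitists, 5 crossed and 2 random individuals per generation; alternating score_population and update_population until the best score stops improving is the whole optimization loop.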
Results
I'm pretty happy with the results I got. This served as a refresher on genetic algorithms and it was fun to implement. I spent a bit more time on it than I was planning, mostly because of the improvements I made after the first runs and because of writing this blog post.
This is a very simple post; I just wanted to write down a bit of the process. If you have any questions, feel free to contact me at jmirandadealme on gmail.
All the code can be found here.