genetic programming, Pyevolve, Python

Genetic Programming meets Python

I’m proud to announce that the new versions of Pyevolve will have Genetic Programming support; after some time fighting with these evil syntax trees, I think I have a very easy and flexible implementation of GP in Python. I was tired of seeing people give up while trying to learn how to implement a simple GP with the hermetic libraries for C/C++ and Java (unfortunately I’m a Java web developer hehe).

The implementation is still undergoing tests and optimization, but it’s working nicely. Here are some details about it:

The implementation is in pure Python, so we still get many bonuses from that, but unfortunately we lose some performance.

The GP core is very flexible, because it compiles the GP trees into Python bytecode to speed up the execution of the function. So you can use even Python objects as terminals, or any possible Python expression. Any Python function can be used too, and you can use all the power of Python to create those functions, which will be automatically detected by the framework using the name prefix =)
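
To give an idea of how detection by name prefix can work in plain Python, here is a minimal sketch written for Python 3. It is not Pyevolve’s actual code: the helper `find_prefixed_functions` and the `namespace` dictionary are hypothetical names used only for illustration.

```python
import inspect
import math

def gp_add(a, b): return a + b
def gp_sqrt(a):   return math.sqrt(abs(a))
def helper(x):    return x   # no "gp" prefix, so it must be ignored

def find_prefixed_functions(namespace, prefix="gp"):
    """Map each function whose name starts with `prefix` to its argument count."""
    found = {}
    for name, obj in namespace.items():
        if name.startswith(prefix) and inspect.isfunction(obj):
            found[name] = len(inspect.signature(obj).parameters)
    return found

# Hypothetical namespace; a module's globals() could be passed instead
namespace = {"gp_add": gp_add, "gp_sqrt": gp_sqrt, "helper": helper}
functions = find_prefixed_functions(namespace)
```

The argument count matters because it tells a GP engine how many children each function node needs in the tree.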

As you can see in the source code, you don’t need to bind variables when calling the syntax tree of the individual; you simply use the “getCompiledCode” method, which returns the compiled Python code ready to be executed.
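
The mechanism behind this can be sketched with Python’s built-in `compile` and `eval` (Python 3 here). This only illustrates the idea, assuming a tree that has already been turned into an expression string; the string and the names `tree_as_text` and `run_tree` are made up for the example and are not part of Pyevolve.

```python
import math

def gp_add(a, b): return a + b
def gp_mul(a, b): return a * b
def gp_sqrt(a):   return math.sqrt(abs(a))

# Hypothetical textual form of a syntax tree
tree_as_text = "gp_sqrt(gp_add(gp_mul(a, a), gp_mul(b, b)))"

# Compile once into a code object...
code_comp = compile(tree_as_text, "<gp-tree>", "eval")

functions = {"gp_add": gp_add, "gp_mul": gp_mul, "gp_sqrt": gp_sqrt}

def run_tree(a, b):
    # ...and evaluate it many times; the terminals "a" and "b"
    # are simply looked up by name in the local namespace
    return eval(code_comp, dict(functions), {"a": a, "b": b})

result = run_tree(3, 4)
```

Compiling once and evaluating many times is what makes this cheaper than re-parsing the expression for every pair of terminals.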

Here is a source-code example:

from pyevolve import *
import math

error_accum = Util.ErrorAccumulator()

# These are the functions used by the GP core;
# Pyevolve will automatically detect them
# and their number of arguments
def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))

def eval_func(chromosome):
   global error_accum
   error_accum.reset()
   code_comp = chromosome.getCompiledCode()

   for a in xrange(0, 5):
      for b in xrange(0, 5):
         # The eval will execute a pre-compiled syntax tree
         # as a Python expression, and will automatically use
         # the "a" and "b" variables (the terminals defined)
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         error_accum += (target, evaluated)
   return error_accum.getRMSE()

def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=5, method="ramped")
   genome.evaluator.set(eval_func)

   ga = GSimpleGA.GSimpleGA(genome)
   # This method will catch and use every function that
   # begins with "gp", but you can also add them manually.
   # The terminals are Python variables, you can use the
   # ephemeral random consts too, using ephemeral:random.randint(0,2)
   # for example.
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")
   # You can even use a function call as terminal, like "func()"
   # and Pyevolve will use the result of the call as terminal
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(1000)
   ga.setMutationRate(0.08)
   ga.setCrossoverRate(1.0)
   ga.setPopulationSize(2000)
   ga.evolve(freq_stats=5)

   print ga.bestIndividual()

if __name__ == "__main__":
   main_run()

I’m very happy with it and I’m still testing the possibilities of this GP implementation in Python.

And of course, everything in Pyevolve can be visualized any time you want:

ramped_small

ramped_big

The visualization is very flexible too: if you use Python decorators to set how functions are represented graphically, you can get many interesting visualization patterns. If I change the function “gp_add” to:

@GTree.gpdec(representation="+", color="red")
def gp_add(a, b): return a+b

We’ll get the following visualization:

full
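
For readers wondering how a decorator like this can carry display settings, one common Python pattern is to attach them to the function as attributes. The sketch below (Python 3) shows that pattern only; `display_hint` is a hypothetical decorator and nothing here is taken from the real implementation of GTree.gpdec.

```python
def display_hint(**hints):
    """Hypothetical decorator: store drawing hints on the function itself."""
    def decorator(func):
        func.display_hints = dict(hints)
        return func   # the function itself is returned unchanged
    return decorator

@display_hint(representation="+", color="red")
def gp_add(a, b): return a + b

# A drawing routine could later read the hints back from the function
label = gp_add.display_hints["representation"]
```

Because the decorator returns the original function, the decorated `gp_add` still behaves exactly as before when the tree is executed.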

I hope you enjoyed it. I’m currently fixing some bugs, implementing new features, writing docs and preparing the next release of Pyevolve, which will still take some time =)

Genetic Algorithms, Time Waste

The Darwin’s cake experiment

Suppose that you are the owner of a famous bakery, and you have a recipe for a really delicious cake that is well known and desired by many of your clients. This is where the Darwin’s cake experiment comes in.

Suppose that you also have nearly 1,000 clients (you are very famous hehe) to whom you can send new cakes made with different amounts of each ingredient, and that these clients tell you how much they liked each new recipe with a rating between 1 and 10, so you can find out which taste is the most popular.

So I was thinking, this is an optimization problem: you want to find the almost “perfect” amounts of each ingredient of the cake for your clients’ most popular taste. If we use a Genetic Algorithm to solve this optimization problem, we can imagine something like this:

Create, let’s say, 1,000 cakes (the individuals) with random amounts of ingredients and send them to the clients for evaluation (the fitness function), then collect the ratings returned by your clients (the fitness). Now you can create a new generation of cake recipes by applying the genetic operators to the first generation based on the clients’ ratings, and so on.
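
Just for fun, the loop described above can be sketched in a few lines of plain Python 3. Everything specific here is an assumption made up for the example: the ingredient list, the `SECRET_TASTE` proportions and the `client_rating` function, which stands in for the real clients.

```python
import random

rng = random.Random(42)
INGREDIENTS = ["flour", "sugar", "butter", "eggs"]
SECRET_TASTE = [0.50, 0.30, 0.15, 0.05]   # hypothetical ideal proportions

def client_rating(cake):
    # Stand-in for the clients: a rating from 1 to 10, higher when
    # the recipe is closer to the (unknown to the baker) ideal taste
    distance = sum(abs(x - y) for x, y in zip(cake, SECRET_TASTE))
    return max(1.0, 10.0 - 10.0 * distance)

def random_cake():
    return [rng.random() for _ in INGREDIENTS]

def crossover(mom, dad):
    cut = rng.randrange(1, len(mom))
    return mom[:cut] + dad[cut:]

def mutate(cake, rate=0.1):
    return [min(1.0, max(0.0, x + rng.gauss(0, 0.05)))
            if rng.random() < rate else x
            for x in cake]

def evolve(pop_size=100, generations=30):
    population = [random_cake() for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=client_rating, reverse=True)
        history.append(client_rating(ranked[0]))
        parents = ranked[:pop_size // 2]        # the best half breeds
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                    for _ in range(pop_size - 1)]
        population = [ranked[0]] + children     # keep the best cake (elitism)
    return max(population, key=client_rating), history

best_cake, history = evolve()
```

Since the best cake of each generation is always carried over, the best rating recorded in `history` can never go down from one generation to the next.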

This is just a joke, but if a big company decides to make it real, I think it’ll be very funny and they will create the first computer-generated cake!

I was also thinking that things like this could be done with chemical products, where the experiments can be run in an automated way; this is a very interesting research field for robotics and AI =)

News, Python, Science

Prime Numbers and the Benford’s Law

Today I read a news article on Physorg.com about a new pattern found in the prime numbers. The article talks about the discovery by Bartolo Luque and Lucas Lacasa:

In a recent study, Bartolo Luque and Lucas Lacasa of the Universidad Politécnica de Madrid in Spain have discovered a new pattern in primes that has surprisingly gone unnoticed until now. They found that the distribution of the leading digit in the prime number sequence can be described by a generalization of Benford’s law.

I was very surprised by the fact that nobody had noticed that before, and after reading the original paper (if you are interested, read it) describing the newly discovered pattern, I was very impressed and impatient to see it in practice!

The new pattern is based on the so-called GBL (Generalized Benford’s Law), which you can see in the paper as Eq. 3.1:

$$P(d) = \frac{1}{10^{1-\alpha} - 1}\left[(d+1)^{1-\alpha} - d^{1-\alpha}\right]$$

Here P(d) is the probability that the leading digit is d, and alpha is the exponent of the original power-law distribution (in the limit alpha → 1, the GBL reduces to Benford’s law).

The authors say that for a given integer interval [1, N], there exists a particular value alpha(N) for which the GBL fits the first-digit distribution of the primes in that interval with extremely good accuracy. They give the functional relation between alpha and N in Eq. 3.2:

$$\alpha(N) = \frac{1}{\ln N - a}$$

Here a = 1.10 ± 0.05 for large values of N. They also cite a GBL extension, but I’ll use just these formulae to plot our distributions.

So I have implemented these formulae into the simple pybenford module as follows:

import math

def gbl(alpha, digit):
   return 1/(10**(1-alpha)-1)*((digit + 1)**(1-alpha)-digit**(1-alpha))

def calc_alpha(n, a=1.10):
   return 1/(math.log(n)-a)

def gbenford_law(alpha):
   return [gbl(alpha, digit)*100.0 for digit in xrange(1,10)]
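
A quick sanity check of these formulae, as a self-contained Python 3 sketch (the functions are repeated so the snippet runs on its own; the variable names at the bottom are only for illustration): the nine probabilities should add up to 1 for any alpha, and for alpha very close to 1 they should approach the classic Benford values log10(1 + 1/d).

```python
import math

def gbl(alpha, digit):
    return 1 / (10 ** (1 - alpha) - 1) * ((digit + 1) ** (1 - alpha) - digit ** (1 - alpha))

def calc_alpha(n, a=1.10):
    return 1 / (math.log(n) - a)

alpha_n8 = calc_alpha(10 ** 8)            # alpha for the interval [1, 10^8]

# The nine leading-digit probabilities must sum to 1
total = sum(gbl(alpha_n8, d) for d in range(1, 10))

# Near alpha = 1 the GBL should be close to Benford's law
near_benford = [gbl(1 + 1e-6, d) for d in range(1, 10)]
benford      = [math.log10(1 + 1.0 / d) for d in range(1, 10)]
max_gap = max(abs(x - y) for x, y in zip(near_benford, benford))
```

Note that alpha = 1 itself cannot be plugged in directly, because the formula becomes 0/0 there; Benford’s law is recovered only as a limit.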

Because we are dealing with an infinite integer sequence, we must always pick the sequence interval [1, N] with N = 10^D (see the Natural Density section of the paper for more information).

The next step is to create a list of the prime numbers in an arbitrarily chosen interval with D = 8, that is, [1, 10^8]. For this step I used the Sieve utility (see more information) to create a file with the prime numbers generated in that interval, using the following command:

sieve2310.exe -s 1 -e 100000000  >>sieve_n8.txt

The sieve is very fast: this creates the file “sieve_n8.txt”, nearly 66 MB in size (don’t worry, the generation is quick, it took 8 seconds for me on an Intel Core 2 Duo 2GHz).
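
If you don’t have the Sieve utility at hand, a plain Sieve of Eratosthenes in Python is enough to try the experiment on a smaller interval. This Python 3 sketch is just a stand-in for the tool used above (the function name `primes_up_to` is made up for the example); it is much slower and hungrier for memory, so it uses N = 10^5 instead of 10^8.

```python
def primes_up_to(n):
    """Return all primes <= n using a simple Sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (n + 1)
    is_prime[0:2] = b"\x00\x00"               # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if is_prime[i]:
            # Cross out every multiple of i, starting at i*i
            is_prime[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, flag in enumerate(is_prime) if flag]

small_primes = primes_up_to(10 ** 5)
```

The resulting list can be fed to the same first-digit counting step as the primes read from the file.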

And now we are ready to use Python and pybenford to read the prime numbers, calculate the frequency of the leading digits and plot our result! Here is the code I wrote:

import pybenford

sieve_file = open("sieve_n8.txt", "r")
prime_list = [int(prime) for prime in sieve_file]
sieve_file.close()

alpha              = pybenford.calc_alpha(10**8)
benford_law        = pybenford.gbenford_law(alpha)
prime_distribution = pybenford.calc_firstdigit(prime_list)
pybenford.plot_comparative(prime_distribution, benford_law, "Prime Numbers")
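
The body of calc_firstdigit is not listed in this post. As an assumption about what such a step needs to do (count the leading digits and turn the counts into percentages, so they are comparable with the output of gbenford_law), a Python 3 sketch could look like the following; `first_digit_percentages` is a hypothetical name.

```python
def first_digit_percentages(numbers):
    """Percentage of numbers starting with each digit 1..9 (hypothetical helper)."""
    counts = [0] * 9
    total = 0
    for number in numbers:
        if number <= 0:
            continue                           # no leading digit 1..9 to count
        counts[int(str(number)[0]) - 1] += 1   # first character of the decimal form
        total += 1
    if total == 0:
        return [0.0] * 9
    return [100.0 * count / total for count in counts]

distribution = first_digit_percentages([1, 10, 19, 2, 35])
```

In this tiny example three of the five numbers start with 1, one with 2 and one with 3.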

And voilà, here is the output plot showing the extremely good accuracy claimed by the paper’s authors:

prime_plot

The plot of the distributions

If you are interested in Benford’s law, there are some posts about it here and here.

I hope you liked this =)

UPDATE 10/05: Mike Loukides did good work generalizing this to other bases; thank you for sharing your experiment, Mike.

UPDATE 08/08 (lol): There are many more comments about this post on Reddit, see here.

Genetic Algorithms, News, Science

Evolving autopilots could boost space slingshots

From the NewScientist article:

COULD space probes use genetic algorithms as autopilots to help them navigate the complexities of the solar system?

Deep-space missions such as NASA’s veteran Voyager probes often rely on gravity assists. They use a planet’s gravitational field as a slingshot, which allows them to visit other celestial bodies without using up too much fuel. But programming a probe with its trajectory years ahead of time can be a problem, says Ian Carnelli of the European Space Agency in Noordwijk, the Netherlands.

Missed launch windows, unexpected winds and misbehaving rockets mean that probes hardly ever leave Earth in the planned position or velocity, and radiation pressure from solar flares can perturb the craft’s course in deep space. If the probe is out of position when it starts a gravity-assisted manoeuvre, the slingshot will be inefficient.

In the Journal of Guidance, Control and Dynamics (DOI: 10.2514/1.32633), Carnelli and colleagues Bernd Dachwald and Massimiliano Vasile suggest that a probe could navigate for itself using a genetic algorithm (GA).

(…)

Carnelli likens this to hundreds of virtual pilots flying simulated spacecraft, with the GA disposing of those that waste fuel or steer a slow course, while “breeding” the best ones together, a process akin to natural selection. “After hundreds of generations of the GA you obtain a ‘pilot’ that is an extremely good performer – able to fly the assist trajectory that uses the least propellant while reaching the next target planet faster,” he says. Carnelli has run successful simulations of GA-enabled missions to Mercury via Venus, and Pluto via Jupiter.

(…)

Read the full article.

News, Science

‘Evolutionary Algorithms’ Mimic Natural Evolution In Silico And Lead To Innovative Solutions For Complex Problems

An interesting news article was recently published by Science Daily. It talks mainly about the use of Evolutionary Algorithms to solve complex problems such as resource management in low-rainfall regions, building bricks and automotive electronics:

Extensive resource management is required in low rainfall regions, where groundwater reserves are rare and must be tapped with great care. Various factors must be taken into account: How the ground water interacts with its environment, where drilling must be performed without disadvantaging neighbours, how the ground water can be protected over a long period of time, and how the development costs can be kept as low as possible: This complex application problem was examined by Tobias Siegfried and Wolfgang Kinzelbach, professor at the Institute for Environmental Engineering at the ETH Zurich, with the help of simulated evolution (…)

090502091200-large

Perfected tower construction with the help of Evolutionary Algorithms.
(Credit: Johannes Bader /ETH Zürich)

Read the full article.

Python, Time Waste

Delicious.com, checking user numbers against Benford’s Law

Sometimes Benford’s Law is used to check datasets and detect fraud. If a dataset that is supposed to follow the Benford’s Law distribution diverges from it, we can say that the dataset is possibly fraudulent (be careful with assumptions, and please note the word “possibly” here).

So I had the idea of checking the user numbers of the Delicious.com website, which are supposed to follow Benford’s Law. I processed the tag “programming” and got 40 pages of links with 376 user numbers from Delicious.com links. So, here is the plot:

delicious_plot

As we can see on the graph, the correlation was 0.95 (on a scale between -1 and 1), so we can say (really!!) that Delicious.com is not lying about the user numbers on the links =) Does anyone know of some suspect sites?
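
The post does not say how the 0.95 was computed. Assuming it is a Pearson correlation between the observed and the expected first-digit percentages, a self-contained Python 3 sketch of that calculation would be the following (`pearson` and `benford` are names chosen for the example, not pybenford functions).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov   = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Expected percentages under Benford's law for the digits 1..9
benford = [100.0 * math.log10(1 + 1.0 / d) for d in range(1, 10)]

perfect_match = pearson(benford, benford)
```

A value near 1 means the observed percentages rise and fall together with the Benford ones, while a dataset with an almost flat digit distribution would give a value much closer to 0.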

Here is the source code of the Python program; it uses a simple regex to get the user numbers from the pages:

import pybenford
import re
import urllib
import time

PAGES         = 40
DELICIOUS_URL = "http://delicious.com/tag/programming?page=%d"

reg         = re.compile('(\d+)', re.DOTALL |  re.IGNORECASE)
users_set = []

for i in xrange(1, PAGES+1):
   print "Reading the page %02d of %02d..." % (i, PAGES),
   site_handle = urllib.urlopen(DELICIOUS_URL % i)
   site_data   = site_handle.read()
   site_handle.close()
   map_to_int = map(int, reg.findall(site_data))
   print "%02d records!" % len(map_to_int)
   users_set.extend(map_to_int)
   time.sleep(5) # Be nice with servers !

print "Total records: %d" % len(users_set)

benford_law   = pybenford.benford_law()
digits_scale = pybenford.calc_firstdigit(users_set)
pybenford.plot_comparative(digits_scale, benford_law, "Delicious.com")