Year: 2009

Pyevolve, Python

Pyevolve in action, solving a 100 cities TSP problem

Here is the video of Pyevolve (the development version) optimizing a 100 cities TSP problem. I’ve used the Edge Recombination and the simple Swap Mutation method:

The video is a composition of the image outputs taken at every 100th generation, and only when the score has changed compared to the previous generation. I’ll release the source together with the new 0.6 release, planned for this month (December) and no later than January.
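For readers who have not met the two operators mentioned above, here is a minimal sketch of them in plain Python. It is not the Pyevolve implementation: the function names (tour_length, swap_mutation, edge_recombination) and the random city layout are assumptions made only for illustration.

```python
import math
import random


def tour_length(tour, cities):
    """Total length of the closed tour visiting cities in the given order."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = cities[tour[i]]
        x2, y2 = cities[tour[(i + 1) % len(tour)]]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


def swap_mutation(tour, rng):
    """Return a copy of the tour with two randomly chosen positions swapped."""
    child = list(tour)
    i, j = rng.sample(range(len(child)), 2)
    child[i], child[j] = child[j], child[i]
    return child


def edge_recombination(parent_a, parent_b, rng):
    """Build a child tour that prefers edges present in either parent."""
    # Adjacency map: for every city, its neighbours in both parent tours.
    neighbours = dict((city, set()) for city in parent_a)
    for parent in (parent_a, parent_b):
        n = len(parent)
        for i, city in enumerate(parent):
            neighbours[city].add(parent[i - 1])
            neighbours[city].add(parent[(i + 1) % n])

    current = rng.choice([parent_a[0], parent_b[0]])
    child = [current]
    while len(child) < len(parent_a):
        # The current city is used up: remove it from every neighbour set.
        for adjacent in neighbours.values():
            adjacent.discard(current)
        candidates = neighbours.pop(current)
        if candidates:
            # Prefer the neighbour that has the fewest neighbours left.
            fewest = min(len(neighbours[c]) for c in candidates)
            pool = sorted(c for c in candidates if len(neighbours[c]) == fewest)
        else:
            # Dead end: fall back to any city that is not in the child yet.
            pool = sorted(neighbours)
        current = rng.choice(pool)
        child.append(current)
    return child


# Hypothetical usage: 100 random cities and two random parent tours.
rng = random.Random(666)
cities = [(rng.random(), rng.random()) for _ in range(100)]
parent_a = rng.sample(range(100), 100)
parent_b = rng.sample(range(100), 100)
child = swap_mutation(edge_recombination(parent_a, parent_b, rng), rng)
```

The child is always a valid permutation of the cities, which is the reason permutation-aware operators like these are used for TSP instead of the usual one-point crossover.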

Best wishes, and in case I can’t post again before next year, Merry Christmas and Happy New Year!

– Christian S. Perone

c, LLVM

A method for JIT’ing algorithms and data structures with LLVM

llvm_dragon

Hello folks, I always post about Python and EvoComp (Pyevolve), but this time it’s about C, LLVM, search algorithms and data structures. This post describes the efforts to implement an idea: to JIT (verb) algorithms and the data structures used by them, together.

AVL Tree Intro

Here is a short intro to AVL Trees from Wikipedia:

In computer science, an AVL tree is a self-balancing binary search tree, and it is the first such data structure to be invented. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; therefore, it is also said to be height-balanced. Lookup, insertion, and deletion all take O(log n) time in both the average and worst cases, where n is the number of nodes in the tree prior to the operation. Insertions and deletions may require the tree to be rebalanced by one or more tree rotations.
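The height-balance property from the quote can be written down in a few lines. This is only a sketch in Python (the post itself is about C and LLVM); the Node class and the function names are assumptions for illustration, and the check recomputes heights naively to keep it short.

```python
class Node(object):
    """A binary tree node; left and right are None when the child is absent."""

    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def height(node):
    """Height of a subtree; an empty subtree has height 0."""
    if node is None:
        return 0
    return 1 + max(height(node.left), height(node.right))


def is_height_balanced(node):
    """True when, at every node, the child subtrees differ in height by at most one."""
    if node is None:
        return True
    if abs(height(node.left) - height(node.right)) > 1:
        return False
    return is_height_balanced(node.left) and is_height_balanced(node.right)


balanced = Node(2, Node(1), Node(3))
chain = Node(1, None, Node(2, None, Node(3)))  # degenerate, list-like tree
```

Here is_height_balanced(balanced) holds, while the chain violates the property at its root, which is the kind of shape the AVL rotations exist to prevent.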

The problem and the idea

When we have a data structure and algorithms to handle (insert, remove and lookup) that structure, the native code of our algorithm is usually full of overhead. In an AVL Tree (a balanced binary tree), for example, the overhead appears in checking whether we really have a left or right node while traversing the nodes for lookups, in accessing nodes inside nodes, etc. This overhead creates unnecessary assembly operations, which in turn create native code overhead, even when the compiler optimizes it, and it directly impacts the performance of our algorithm. The traditional approach, of course, gives us a very flexible structure whose complexity (not Big-O) is easy to handle, but we pay for it with a performance loss.
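To make the overhead concrete, here is a sketch of the traditional, generic lookup, written in Python only for consistency with the other listings on this page (the post itself targets C and LLVM). The Node class and the lookup function are assumptions for illustration; the comments mark the per-node checks and pointer chasing described above.

```python
class Node(object):
    """A binary search tree node holding a key and a value."""

    def __init__(self, key, value, left=None, right=None):
        self.key = key
        self.value = value
        self.left = left
        self.right = right


def lookup(root, key):
    """Generic lookup: works for any tree, but pays for checks at every node."""
    node = root
    while node is not None:       # check: do we really have a node here?
        if key == node.key:       # load the key stored inside the node
            return node.value
        if key < node.key:
            node = node.left      # follow a pointer that may turn out to be None
        else:
            node = node.right     # same on the right side
    return None                   # fell off the tree: key is not present


tree = Node(2, "two", Node(1, "one"), Node(3, "three"))
```

The same loop serves every possible tree, which is what makes the structure flexible, and also why none of the checks can be dropped ahead of time.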

(more…)

News, Pyevolve, Python, Science

Pyevolve on SIGEVOlution

SIGEVOlution200901WebCover

I’m proud to announce that Pyevolve is featured in the latest issue of SIGEVOlution (Volume 4, Issue 1), the newsletter of the ACM Special Interest Group on Evolutionary Computation. I would like to thank the newsletter editor Pier Luca Lanzi and the board for the corrections to the article and for the well-done reformatted version of the paper.

Pyevolve is currently at version 0.5; in a few months I’ll be releasing version 0.6 with the new major features that are currently implemented only in the development version (you can check it out at the Subversion repository on sourceforge.net).

I hope you enjoy the article!

Yours,
– Christian S. Perone

Pyevolve, Python

Pyevolve benchmark on different Python flavors

So I did a comparison of the Pyevolve GP/GA core on different Python interpreters. I used my Intel Core 2 Duo (E4500 @ 2.20GHz, 1GB RAM) running Ubuntu 9.04, and Windows XP SP3 only for IronPython 2.6.1 (IronPython doesn’t run with Mono, so I used Windows XP with .NET 2.0).

The interpreters used were:

Unladen Swallow 2009Q2

I tried 2009Q3 (the current main trunk), but I think it is still unstable, since it was slower than 2009Q2, so I used 2009Q2. I compiled it with GCC 4.3.3 using just the default configure parameters (./configure).

CPython 2.6.2

I used the default CPython package of Ubuntu 9.04.

CPython 2.5.4

I used the default CPython package of Ubuntu 9.04 too, the python2.5 package.

PyPy 1.1.0 (svn:r68612)

I used the latest svn version from the repository, revision 68612. My Core 2 Duo had only 1GB of RAM, and the PyPy translation process eats more RAM than Java (sorry for the joke), so I used a notebook with 3GB of RAM to create the pypy-c, which took 1 hour (I used --opt=3) and produced a beautiful ASCII Mandelbrot fractal!

Jython 2.5.1

I used the default installer from the Jython project site. I used the Sun JRE 1.6.0_16.

IronPython 2.6.10920.0

I’ve used the 2.6 RC1 available at IronPython project site with MS .NET 2.0.
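When juggling this many interpreters it helps to have each run print which one it is. A small sketch, assuming the standard platform module is available; platform.python_implementation() only exists from Python 2.6 on, so the helper (a name made up here) falls back to a generic label on older versions.

```python
import platform


def describe_interpreter():
    """Return the implementation name and version of the running interpreter."""
    # python_implementation() was added in Python 2.6; fall back when it is missing.
    implementation = getattr(platform, "python_implementation", lambda: "Python")()
    return "%s %s" % (implementation, platform.python_version())


print(describe_interpreter())
```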

To test the GA core I’ve used this source-code (a simple sphere function):

from pyevolve import G1DList
from pyevolve import Mutators, Initializators
from pyevolve import GSimpleGA, Consts

# This is the Sphere Function
def sphere(xlist):
   total = 0
   for i in xlist:
      total += i**2
   return total

def run_main():
   genome = G1DList.G1DList(140)
   genome.setParams(rangemin=-5.12, rangemax=5.13)
   genome.initializator.set(Initializators.G1DListInitializatorReal)
   genome.mutator.set(Mutators.G1DListMutatorRealGaussian)
   genome.evaluator.set(sphere)

   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(1500)
   ga.setMutationRate(0.01)
   ga.evolve(freq_stats=500)

   best = ga.bestIndividual()

if __name__ == "__main__":
   run_main()

And to test the GP core, I’ve used this source-code (a simple symbolic regression):

from pyevolve import GTree
from pyevolve import Mutators
from pyevolve import GSimpleGA, Consts, Util
import math

rmse_accum = Util.ErrorAccumulator()

def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))

def eval_func(chromosome):
   global rmse_accum
   rmse_accum.reset()
   code_comp = chromosome.getCompiledCode()

   for a in xrange(0, 10):
      for b in xrange(0, 10):
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         rmse_accum   += (target, evaluated)
   return rmse_accum.getRMSE()

def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=4, method="ramped")
   genome.evaluator += eval_func
   genome.mutator.set(Mutators.GTreeGPMutatorSubtree)

   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")

   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(40)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.08)
   ga.setPopulationSize(800)

   ga(freq_stats=10)
   best = ga.bestIndividual()

if __name__ == "__main__":
   main_run()
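For reference, this is what the fitness above measures, written without Pyevolve. It is a sketch that assumes getRMSE returns the usual root mean squared error; the name rmse_over_grid and the candidate functions are made up for illustration.

```python
import math


def rmse_over_grid(candidate, size=10):
    """RMSE between candidate(a, b) and sqrt(a*a + b*b) over a size x size grid."""
    squared_error = 0.0
    count = 0
    for a in range(size):
        for b in range(size):
            target = math.sqrt((a * a) + (b * b))
            squared_error += (target - candidate(a, b)) ** 2
            count += 1
    return math.sqrt(squared_error / count)


# A perfect individual has zero error; a wrong one gets a positive score.
perfect = rmse_over_grid(lambda a, b: math.sqrt(a * a + b * b))
wrong = rmse_over_grid(lambda a, b: a + b)
```

Since the GA is set to minimize, an individual that reproduces the target expression exactly gets the best possible score of 0.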

UPDATE 19/08: the x-axis is measured in “seconds”, and the y-axis is the Python flavor.

The results are described in the graph below:

pyevolve_pyvms

As we can see, Unladen Swallow 2009Q2 performed a little better than CPython 2.6.2, but Jython and PyPy (experimental) were left behind in this scenario, even behind IronPython 2.6.1.
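The post does not say how the runs were timed. If you want to repeat the comparison, one simple option is a wall-clock harness like the sketch below; time_call is a hypothetical helper, and taking the best of a few repeats is just one common convention, not necessarily what was done here.

```python
from timeit import default_timer


def time_call(func, repeats=3):
    """Return the best wall-clock time, in seconds, over several calls of func."""
    best = None
    for _ in range(repeats):
        start = default_timer()
        func()
        elapsed = default_timer() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Hypothetical usage with a stand-in workload; in the benchmark this would be
# the run_main or main_run function from the listings above.
seconds = time_call(lambda: sum(i * i for i in range(10000)))
```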

genetic programming, Pyevolve, Python

Successful pyevolve multiprocessing speedup for Genetic Programming

As we know, Genetic Programming usually requires intensive processing power for the fitness functions and tree manipulations (in crossover operations), and this can be a huge problem when using a pure Python approach like Pyevolve. To overcome this, I’ve used the Python multiprocessing features to implement parallel fitness evaluation in Pyevolve, and I was surprised by the super-linear speedup I got for a CPU-bound fitness function used to do the symbolic regression of the Pythagorean theorem: c = \sqrt{a^2 + b^2}. I’ve used the same seed for the GP, so it consumed nearly the same CPU resources in both test categories. Here are the results I obtained:

pyevolve_multiprocessing

The first fitness landscape I used had 2,500 points and the second had 6,400 points. To turn on the multiprocessing option you just need to call the setMultiProcessing method, and Pyevolve will then use multiprocessing when you have more than one core; you can enable the logging feature to check what’s going on behind the scenes. Here is the source code I used:

from pyevolve import *
import math

rmse_accum = Util.ErrorAccumulator()

def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))

def eval_func(chromosome):
   global rmse_accum
   rmse_accum.reset()
   code_comp = chromosome.getCompiledCode()

   for a in xrange(0, 80):
      for b in xrange(0, 80):
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         rmse_accum   += (target, evaluated)
   return rmse_accum.getRMSE()

def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=4, method="ramped")
   genome.evaluator += eval_func
   genome.mutator.set(Mutators.GTreeGPMutatorSubtree)

   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")

   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(20)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.08)
   ga.setPopulationSize(800)
   ga.setMultiProcessing(True)

   ga(freq_stats=5)
   best = ga.bestIndividual()

if __name__ == "__main__":
   main_run()

As you can see, the population size was 800 individuals with an 8% mutation rate and a 100% crossover rate for a simple 20-generation evolution. Of course you don’t need so many points in the fitness landscape; I used 2,500+ points to create a CPU-intensive fitness function, otherwise the speedup can be less than 1.0 due to the communication overhead between the processes. For the first case (2,500-point fitness landscape) I got a 3.33x speedup, and for the second case (6,400-point fitness landscape) I got a 3.28x speedup. The tests were executed on a 2-core PC (Intel Core 2 Duo).
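To see why these numbers count as super-linear, it helps to divide the speedup by the number of cores. The sketch below uses the standard definitions (speedup is serial time over parallel time, efficiency is speedup over cores); the function names are made up here, and only the reported speedups are plugged in, since the raw timings are not given.

```python
def speedup(serial_seconds, parallel_seconds):
    """How many times faster the parallel run is than the serial one."""
    return serial_seconds / float(parallel_seconds)


def efficiency(speedup_factor, cores):
    """Speedup per core; a value above 1.0 means super-linear speedup."""
    return speedup_factor / float(cores)


# Reported speedups from the two test cases, both on 2 cores.
for reported in (3.33, 3.28):
    print("%.2fx on 2 cores -> efficiency %.3f" % (reported, efficiency(reported, 2)))
```

An efficiency of 1.0 would mean each core contributes exactly its share; both cases here come out well above that.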

Genetic Algorithms, News, Science

Meanwhile, at the Hall of Justice!

UPDATE 05/10: there is an article on Physorg too.

Sometimes we face new applications for EC, but this one I was not expecting. From EurekAlert:

WASHINGTON, Oct. 5 — Criminals are having a harder time hiding their faces, thanks to new software that helps witnesses recreate and recognize suspects using principles borrowed from the fields of optics and genetics.

(…)

His software generates its own faces that progressively evolve to match the witness’ memories. The witness starts with a general description such as “I remember a young white male with dark hair.” Nine different computer-generated faces that roughly fit the description are generated, and the witness identifies the best and worst matches. The software uses the best fit as a template to automatically generate nine new faces with slightly tweaked features, based on what it learned from the rejected faces.

“Over a number of generations, the computer can learn what face you’re looking for,” says Solomon.

Read the full article here.
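The loop described in the quote is a form of interactive evolution. Here is a toy sketch of the idea, with heavy assumptions: faces are plain lists of numbers, the witness is simulated by a distance to a hidden target, and the worst-match feedback mentioned in the article is ignored. None of this reflects how the actual software works.

```python
import random


def distance(face, target):
    """Euclidean distance between a candidate face and the remembered one."""
    return sum((f - t) ** 2 for f, t in zip(face, target)) ** 0.5


def next_generation(best, rng, count=9, step=0.1):
    """Keep the chosen face and add slightly tweaked copies of it."""
    faces = [list(best)]
    while len(faces) < count:
        faces.append([gene + rng.gauss(0.0, step) for gene in best])
    return faces


def evolve_face(target, generations=30, seed=1):
    """Show nine faces per round and breed the next round from the best match."""
    rng = random.Random(seed)
    faces = [[rng.random() for _ in target] for _ in range(9)]
    for _ in range(generations):
        # The simulated witness picks the face closest to the memory.
        best = min(faces, key=lambda face: distance(face, target))
        faces = next_generation(best, rng)
    return min(faces, key=lambda face: distance(face, target))


target = [0.2, 0.8, 0.5, 0.1]
before = distance(evolve_face(target, generations=0), target)
after = distance(evolve_face(target, generations=30), target)
```

Because the chosen face is always carried over to the next round, the best match can only stay the same or improve from one generation to the next.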

Python, Time Waste

Beautiful Django

The ugly web is over; the trick is to add a Django middleware to process every HttpResponse (with content-type text/html) of Django using BeautifulSoup. The source-code of the middleware is simple:

from BeautifulSoup import BeautifulSoup

class BeautifulMiddleware(object):
    def process_response(self, request, response):
        if response.status_code == 200:
            if response["content-type"].startswith("text/html"):
                beauty = BeautifulSoup(response.content)
                response.content = beauty.prettify()
        return response

We simply check for HTTP response code 200, then check for “text/html” content, and use BeautifulSoup to process the response. See an example of what it does:

1) I had an HTML file in my Django application, very ugly and with missing tags:

imagem

This HTML template will be rendered as shown above by Django without the BeautifulSoup middleware, but with the middleware plugged into the settings of your Django app, it will render this HTML source:

imagem2

BeautifulSoup has figured out sensible places to put the closing tags of the HTML source and has created a pretty indented structure, automagically =)

It’s very easy and interesting to create new Django middlewares; examples could be JavaScript obfuscators, compressors, automatic performance analysis of HTML code to improve browser rendering speed, and that sort of thing.
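As an illustration of the compressor idea, here is a sketch of a middleware with the same process_response hook that collapses whitespace between tags. The class names are made up, and FakeResponse is only a stand-in so the hook can be tried outside Django. It assumes response.content is a plain string, as in the BeautifulMiddleware above; with bytes content the pattern would have to be a bytes pattern.

```python
import re


class WhitespaceCompressorMiddleware(object):
    """Collapse whitespace between tags in successful text/html responses."""

    _between_tags = re.compile(r">\s+<")

    def process_response(self, request, response):
        if response.status_code == 200:
            if response["content-type"].startswith("text/html"):
                response.content = self._between_tags.sub("><", response.content)
        return response


class FakeResponse(object):
    """Stand-in for Django's HttpResponse, only for trying the hook outside Django."""

    def __init__(self, content, content_type="text/html; charset=utf-8"):
        self.status_code = 200
        self.content = content
        self._headers = {"content-type": content_type}

    def __getitem__(self, header):
        return self._headers[header.lower()]


response = FakeResponse("<ul>\n   <li>a</li>\n   <li>b</li>\n</ul>")
response = WhitespaceCompressorMiddleware().process_response(None, response)
print(response.content)  # prints <ul><li>a</li><li>b</li></ul>
```

In a Django project of this era the class would be enabled by adding its dotted path (for example a hypothetical "myapp.middleware.WhitespaceCompressorMiddleware") to MIDDLEWARE_CLASSES in settings.py. Note that removing whitespace between tags can change how inline elements and pre blocks render, so treat this as a toy.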

News, Science

On the irreversibility of evolution

evo_comic

Today I read about an important work by a team of evolutionary biologists at the University of Oregon, which reveals an important result about evolutionary irreversibility. The concept of irreversibility states that the future results of evolution at any point in time must depend on the present state and on the past, showing the determinism of evolution; on the other hand, evolutionary reversibility dictates that natural selection can produce the same forms in any given environment, independent of history.

This question about the irreversibility of evolution has remained unsolved because we rarely know what features the ancestors had and what mechanisms were used to evolve into the current organisms, but the team of Joe Thornton has addressed those issues by studying the problem at the molecular level, resurrecting ancestral proteins (GR1) as they existed long ago and using manipulation to study the evolutionary process in two directions: forward and reverse.

The results of the work done by the research team were:

Our observations suggest that history and contingency during glucocorticoid receptor evolution strongly limited the pathways that could be deterministically followed under selection.

(…)

Selection is an extraordinarily powerful evolutionary force; nevertheless, our observations suggest that, because of the complexity of glucocorticoid receptor architecture, low-probability permissive substitutions were required to open some mutational trajectories to exploration under selection, whereas restrictive substitutions closed other potential paths. Under selection, some kind of adaptation will always occur, but the specific adaptive forms that are realized depend on the historical trajectory that precedes them. The conditions that once facilitated evolution of the glucocorticoid receptor’s ancestors were destroyed during the realization of its present form. The past is difficult to recover because it was built on the foundation of its own history, one irrevocably different from that of the present and its many possible futures.

So my friend, that’s the way nature evolves, possibly never looking back. But this is a great new step for future work and research on the irreversibility of evolution.

References

[1] http://www.nature.com/nature/journal/v461/n7263/abs/nature08249.html
[2] http://www.uoregon.edu/~joet/PDF/bridgham-thornton-nature2009.pdf
[3] http://sciencenow.sciencemag.org/cgi/content/full/2009/923/1
