Terra Incognita

Successful pyevolve multiprocessing speedup for Genetic Programming

As we know, Genetic Programming usually requires intensive processing power for the fitness functions and tree manipulations (in crossover operations), and this fact can be a huge problem when using a pure Python approach like Pyevolve. So, to overcome this situation, I’ve used the Python multiprocessing features to implement a parallel fitness evaluation approach in Pyevolve and I was surprised by the super linear speedup I got for a cpu bound fitness function used to do the symbolic regression of the Pythagoras theorem: $c = \sqrt{a^2 + b^2}$ . I’ve used the same seed for the GP, so it has consumed nearly the same cpu resources for both test categories. Here are the results I obtained:

The first fitness landscape I’ve used had 2.500 points and the later had a fitness landscape of 6.400 points, here is the source code I’ve used (you just need to turn on the multiprocessing option using the setMultiProcessing method, so Pyevolve will use multiprocessing when you have more than one single core, you can enable the logging feature to check what’s going on behind the scenes):

from pyevolve import *
import math

rmse_accum = Util.ErrorAccumulator()

def gp_add(a, b): return a+b
def gp_sub(a, b): return a-b
def gp_mul(a, b): return a*b
def gp_sqrt(a):   return math.sqrt(abs(a))

def eval_func(chromosome):
   global rmse_accum
   rmse_accum.reset()
   code_comp = chromosome.getCompiledCode()

   for a in xrange(0, 80):
      for b in xrange(0, 80):
         evaluated     = eval(code_comp)
         target        = math.sqrt((a*a)+(b*b))
         rmse_accum   += (target, evaluated)
   return rmse_accum.getRMSE()

def main_run():
   genome = GTree.GTreeGP()
   genome.setParams(max_depth=4, method="ramped")
   genome.evaluator += eval_func
   genome.mutator.set(Mutators.GTreeGPMutatorSubtree)

   ga = GSimpleGA.GSimpleGA(genome, seed=666)
   ga.setParams(gp_terminals       = ['a', 'b'],
                gp_function_prefix = "gp")

   ga.setMinimax(Consts.minimaxType["minimize"])
   ga.setGenerations(20)
   ga.setCrossoverRate(1.0)
   ga.setMutationRate(0.08)
   ga.setPopulationSize(800)
   ga.setMultiProcessing(True)

   ga(freq_stats=5)
   best = ga.bestIndividual()

if __name__ == "__main__":
   main_run()

As you can see, the population size was 800 individuals with a 8% mutation rate and a 100% crossover rate for a simple 20 generations evolution. Of course you don’t need so many points in the fitness landscape, I’ve used 2.500+ points to create a cpu intensive fitness function, otherwise, the speedup can be less than 1.0 due the communication overhead between the processes. For the first case (2.500 points fitness landscape) I’ve got a 3.33x speedup and for the last case (6.400 points fitness landscape) I’ve got a 3.28x speedup. The tests were executed in a 2 cores pc (Intel Core 2 Duo).

11/10/2009 by Christian S. Perone

Genetic Algorithms, News, Science

Meanwhile, at the Hall of Justice!

UPDATE 05/10: there is an article in the Physorg too.

Sometimes we face new applications for EC, but for this I was not expecting, from Eurekalert:

WASHINGTON, Oct. 5 — Criminals are having a harder time hiding their faces, thanks to new software that helps witnesses recreate and recognize suspects using principles borrowed from the fields of optics and genetics.

(…)

His software generates its own faces that progressively evolve to match the witness’ memories. The witness starts with a general description such as “I remember a young white male with dark hair.” Nine different computer-generated faces that roughly fit the description are generated, and the witness identifies the best and worst matches. The software uses the best fit as a template to automatically generate nine new faces with slightly tweaked features, based on what it learned from the rejected faces.

“Over a number of generations, the computer can learn what face you’re looking for,” says Solomon.

Read the full article here.

05/10/2009 by Christian S. Perone

Python, Time Waste

Beautiful Django

The ugly web is over; the trick is to add a Django middleware to process every HttpResponse (with content-type text/html) of Django using BeautifulSoup. The source-code of the middleware is simple:

from BeautifulSoup import BeautifulSoup

class BeautifulMiddleware(object):
    def process_response(self, request, response):
        if response.status_code == 200:
            if response["content-type"].startswith("text/html"):
                beauty = BeautifulSoup(response.content)
                response.content = beauty.prettify()
        return response

We simple check for HTTP response code 200 and then check for a “text/html” content and use BeautifulSoup to process the response. See an example of what it does:

1) I’d a html in my Django application, very ugly and with missing tags:

This HTML template will be rendered as showed above by Django without the BeautifulSoup middleware, but with the middleware pluged in the settings of your Django app, it will render that html source:

BeautifulSoup has figured out sensible places to put the closing tags of the HTML source and has created a pretty indented structure, automagically =)

It’s very easy and interesting create new django middlewares, examples can be JavaScript obfuscators, compressors, automatic performance analysis of html code to improve the render speed of browser and these sort of things.

01/10/2009 by Christian S. Perone

News, Science

On the irreversibility of evolution

Today I’ve read about an important work done by a team of evolutionary biologists of the University of Oregon, which reveals an important result about the evolutionary irreversibility. The concept of irreversibility states that the future results of evolution at any point in time must depend on the present state and by the past, showing the determinism of evolution; on the other hand, the evolution reversibility dictates that the natural selection can produce the same forms in any given environment, independent of history.

This question about the irreversibility of evolution has remained unsolved because of the fact that we rarely know what features the ancestors had and what the mechanisms was used to evolve into the actual organisms, but the team of Joe Thornton has solved those issues by studying the problem at the molecular level, resurrecting ancestral proteins (GR1) as they existed long ago and using manipulation to study evolutionary process in two directions: forward and reverse.

The results of the work done by the research team was:

Our observations suggest that history and contingency during glucocorticoid receptor evolution strongly limited the pathways that could be deterministically followed under selection.

(…)

Selection is an extraordinarily powerful evolutionary force; nevertheless, our observations suggest that, because of the complexity of glucocorticoid receptor architecture, low-probability permissive substitutions were required to open some mutational trajectories to exploration under selection, whereas restrictive substitutions closed other potential paths. Under selection, some kind of adaptation will always occur, but the specific adaptive forms that are realized depend on the historical trajectory that precedes them. The conditions that once facilitated evolution of the glucocorticoid receptor’s ancestors were destroyed during the realization of its present form. The past is difficult to recover because it was built on the foundation of its own history, one irrevocably different from that of the present and its many possible futures.

So my friend, that’s the way nature evolve, possible never looking back. But this is a great new step for future works and research on the irreversibility of evolution.

References

[1] http://www.nature.com/nature/journal/v461/n7263/abs/nature08249.html
[2] http://www.uoregon.edu/~joet/PDF/bridgham-thornton-nature2009.pdf
[3] http://sciencenow.sciencemag.org/cgi/content/full/2009/923/1

27/09/2009 by Christian S. Perone

Genetic Algorithms, genetic programming, News, Science

New SIGEvolution issue – Volume 3 Issue 4!

Great news, the new issue of SIGEvolution was released, click in the cover above to go to the SIGEvolution site; this issue has an interview is with Hans-Paul Schwefel.

22/09/2009 by Christian S. Perone

Genetic Algorithms, Pyevolve, Python

n-queens problem using Pyevolve

Last night I’ve read a post on Reddit written by Matthew Rollings showing a code in Python to solve Eight Queens puzzle using EA. So I decided to implement it in Python again but this time using Pyevolve, here is the code:

from pyevolve import *
from random import shuffle

BOARD_SIZE = 64

def queens_eval(genome):
   collisions = 0
   for i in xrange(0, BOARD_SIZE):
      if i not in genome: return 0
   for i in xrange(0, BOARD_SIZE):
      col = False
      for j in xrange(0, BOARD_SIZE):
         if (i != j) and (abs(i-j) == abs(genome[j]-genome[i])):
            col = True
      if col == True: collisions +=1
   return BOARD_SIZE-collisions

def queens_init(genome, **args):
   genome.genomeList = range(0, BOARD_SIZE)
   shuffle(genome.genomeList)

def run_main():
   genome = G1DList.G1DList(BOARD_SIZE)
   genome.setParams(bestrawscore=BOARD_SIZE, rounddecimal=2)
   genome.initializator.set(queens_init)
   genome.mutator.set(Mutators.G1DListMutatorSwap)
   genome.crossover.set(Crossovers.G1DListCrossoverCutCrossfill)
   genome.evaluator.set(queens_eval)

   ga = GSimpleGA.GSimpleGA(genome)
   ga.terminationCriteria.set(GSimpleGA.RawScoreCriteria)
   ga.setMinimax(Consts.minimaxType["maximize"])

   ga.setPopulationSize(100)
   ga.setGenerations(5000)
   ga.setMutationRate(0.02)
   ga.setCrossoverRate(1.0)

   # This DBAdapter is to create graphs later, it'll store statistics in
   # a SQLite db file
   sqlite_adapter = DBAdapters.DBSQLite(identify="queens")
   ga.setDBAdapter(sqlite_adapter)

   ga.evolve(freq_stats=10)

   best = ga.bestIndividual()
   print best
   print "\nBest individual score: %.2f\n" % (best.score,)

if __name__ == "__main__":
   run_main()

It tooks 49 generations to solve a 64×64 (4.096 chess squares) chessboard, here is the output:

Gen. 0 (0.00%): Max/Min/Avg Fitness(Raw) [20.83(27.00)/13.63(7.00)/17.36(17.36)]
Gen. 10 (0.20%): Max/Min/Avg Fitness(Raw) [55.10(50.00)/39.35(43.00)/45.92(45.92)]
Gen. 20 (0.40%): Max/Min/Avg Fitness(Raw) [52.51(55.00)/28.37(24.00)/43.76(43.76)]
Gen. 30 (0.60%): Max/Min/Avg Fitness(Raw) [67.45(62.00)/51.92(54.00)/56.21(56.21)]
Gen. 40 (0.80%): Max/Min/Avg Fitness(Raw) [65.50(62.00)/19.89(31.00)/54.58(54.58)]

        Evolution stopped by Termination Criteria function !

Gen. 49 (0.98%): Max/Min/Avg Fitness(Raw) [69.67(64.00)/54.03(56.00)/58.06(58.06)]
Total time elapsed: 39.141 seconds.

And here is the plots generated by the Graph Plot Tool of Pyevolve:

19/09/2009 by Christian S. Perone

Python, Time Waste

Word is smart, but OpenOffice is wise

UPDATE 19/09: it seems that some people had misunderstood the post title, so here is a clarification: I’m not comparing Word with OpenOffice or something like that, the title refers to the design choices of OpenOffice in using Python 2.6 as an option for scripting language, it’s a humorous title and should not be considered in literal sense.

This is a very simple, but powerful, Python script to call Google Sets API (in fact it’s not an API call – since Google doesn’t have an official API for Sets service – but an interesting and well done scraper using BeautifulSoup) inside the OpenOffice 3.1.1 Writer… anyway, you can check the video to understand what it really does:

And here is the very complex source-code:

from xgoogle.googlesets import GoogleSets

def growMyLines():
   """ Calls Google Set unofficial API (xgoogle) """
   doc = XSCRIPTCONTEXT.getDocument()
   controller = doc.getCurrentController()
   selection = controller.getSelection()
   count = selection.getCount();
   text_range = selection.getByIndex(0);
   lines_list = text_range.getString().split("\n");
   gset = GoogleSets(lines_list)
   gset_results = gset.get_results()
   results_concat = "\n".join(gset_results)
   text_range.setString(results_concat);

g_exportedScripts = growMyLines,

You need to put the “xgoogle” module inside the “OpenOffice.org 3\Basis\program\python-core-2.6.1\lib” path, and the above script inside “OpenOffice.org 3\Basis\share\Scripts\python”.

I hope you enjoyed =) with new Python 2.6 core in OpenOffice 3, they have increased the productivity potential at the limit.

18/09/2009 by Christian S. Perone

Python

Send Subversion commit messages to Twitter

Hello, this is a bridge between Subversion (svn) and Twitter, the intent of this tool is to update a Twitter account when new commit messages arrives in a Subversion repository. We almost never have access to svn repository to add a post-commit hook in a way to call our script and send updates to twitter, so this tool was done to overcome that situation. Using it, you can monitor for example a svn repository from Google Hosting, from Sourceforge.net, etc…

The process of the tool is simple: it will firstly check Twitter account for the last svn commit message (the messages always start with a “$” prefix, or with another user-defined prefix), and then it will check the svn repository server to verify if it has new commits compared to the last tweet revision, if it has, it will update twitter with newer commit messages.

Here is a simple example of a Twitter account updated with commit messages:

The tool is very simple to use in command-line:

Python twittsvn v.0.1
By Christian S. Perone
https://blog.christianperone.com

Usage: twittsvn.py [options]

Options:
  -h, --help            show this help message and exit

  Twitter Options:
    Twitter Accounting Options

    -u TWITTER_USERNAME, --username=TWITTER_USERNAME
                        Twitter username (required).
    -p TWITTER_PASSWORD, --password=TWITTER_PASSWORD
                        Twitter password (required).
    -r TWITTER_REPONAME, --reponame=TWITTER_REPONAME
                        Repository name (required).
    -c TWITTER_COUNT, --twittercount=TWITTER_COUNT
                        How many tweets to fetch from Twitter, default is
                        '30'.

  Subversion Options:
    Subversion Options

    -s SVN_PATH, --spath=SVN_PATH
                        Subversion path, default is '.'.
    -n SVN_NUMLOG, --numlog=SVN_NUMLOG
                        Number of SVN logs to get, default is '5'.

And here is a simple example:

# python twittsvn.py -u twitter_username -p twitter_password \
-r any_repository_name

You must execute it in a repository directory (not the server, your local files) or use the “-s” option to specify the path of your local repository. You should put this script to execute periodicaly using cron or something like that.

To use the tool, you must install pysvn and python-twitter. To install Python-twitter you can use “easy_install python-twitter”, but for pysvn you must check the download section at the project site.

Here is the source-code of the twittsvn.py:

# python-twittsvn - A SVN/Twitter bridge
# Copyright (C) 2009  Christian S. Perone
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see .

import sys
import re
import time
from optparse import OptionParser, OptionGroup

__version__ = "0.1"
__author__  = "Christian S. Perone "

TWITTER_PREFIX = "$"
TWITTER_MSG    = TWITTER_PREFIX + "[%s] Rev. %d by [%s] %s: %s"

try:
   import pysvn
except ImportError:
   raise ImportError, "pysvn not found, check http://pysvn.tigris.org/project_downloads.html"

try:
   import twitter
except ImportError:
   raise ImportError, "python-twitter not found, check http://code.google.com/p/python-twitter"

def ssl_server_trust_prompt(trust_dict):
    return True, 5, True

def run_main():
   parser = OptionParser()
   print "Python twittsvn v.%s\nBy %s" % (__version__, __author__)
   print "https://blog.christianperone.com\n"

   group_twitter = OptionGroup(parser, "Twitter Options",
                       "Twitter Accounting Options")
   group_twitter.add_option("-u", "--username", dest="twitter_username",
                            help="Twitter username (required).",
                            type="string")
   group_twitter.add_option("-p", "--password", dest="twitter_password",
                            help="Twitter password (required).",
                            type="string")
   group_twitter.add_option("-r", "--reponame", dest="twitter_reponame",
                            help="Repository name (required).",
                            type="string")
   group_twitter.add_option("-c", "--twittercount", dest="twitter_count",
                            help="How many tweets to fetch from Twitter, default is '30'.",
                            type="int", default=30)
   parser.add_option_group(group_twitter)

   group_svn = OptionGroup(parser, "Subversion Options",
                       "Subversion Options")
   group_svn.add_option("-s", "--spath", dest="svn_path",
                        help="Subversion path, default is '.'.",
                        default=".", type="string")
   group_svn.add_option("-n", "--numlog", dest="svn_numlog",
                        help="Number of SVN logs to get, default is '5'.",
                        default=5, type="int")
   parser.add_option_group(group_svn)

   (options, args) = parser.parse_args()   

   if options.twitter_username is None:
      parser.print_help()
      print "\nError: you must specify a Twitter username !"
      return

   if options.twitter_password is None:
      parser.print_help()
      print "\nError: you must specify a Twitter password !"
      return

   if options.twitter_reponame is None:
      parser.print_help()
      print "\nError: you must specify any repository name !"
      return

   twitter_api = twitter.Api(username=options.twitter_username,
                             password=options.twitter_password)
   twitter_api.SetCache(None) # Dammit cache !
   svn_api     = pysvn.Client()

   svn_api.callback_ssl_server_trust_prompt = ssl_server_trust_prompt

   print "Checking Twitter synchronization..."
   status_list = twitter_api.GetUserTimeline(options.twitter_username,
                                             count=options.twitter_count)
   print "Got %d statuses to check..." % len(status_list)

   last_twitter_commit = None
   for status in status_list:
      if status.text.startswith(TWITTER_PREFIX):
         print "SVN Commit messages found !"
         last_twitter_commit = status
         break     

   print "Checking SVN logs for ['%s']..." % options.svn_path
   log_list = svn_api.log(options.svn_path, limit=options.svn_numlog)

   if last_twitter_commit is None:
      print "No twitter SVN commit messages found, posting last %d svn commit messages..." % options.svn_numlog
      log_list.reverse()

      for log in log_list:
         message = log["message"].strip()
         date    = time.ctime(log["date"])
         if len(message) <= 0: message = "(no message)"
         twitter_api.PostUpdate(TWITTER_MSG % (options.twitter_reponame,
                                               log["revision"].number,
                                               log["author"], date, message))
      print "Posted %d svn commit messages to twitter !" % len(log_list)
   else:
      print "SVN commit messages found in twitter, checking last revision message...."
      msg_regex        = re.compile(r'Rev\. (\d+) by')
      last_rev_twitter = int(msg_regex.findall(last_twitter_commit.text)[0])

      print "Last revision detected in twitter is #%d, checking for new svn commit messages..." % last_rev_twitter
      rev_num = pysvn.Revision(pysvn.opt_revision_kind.number, last_rev_twitter+1)

      try:
         log_list = svn_api.log(options.svn_path, revision_end=rev_num,
                                limit=options.svn_numlog)
      except pysvn.ClientError:
         print "No more revisions found !"
         log_list = []

      if len(log_list) <= 0:
         print "No new SVN commit messages found !"
         print "Updated !"
         return

      log_list.reverse()

      print "Posting new messages to twitter..."
      posted_new = 0
      for log in log_list:
         message = log["message"].strip()
         date    = time.ctime(log["date"])
         if len(message) <= 0:             
            message = "(no message)"
            if log["revision"].number > last_rev_twitter:
            twitter_api.PostUpdate(TWITTER_MSG % (options.twitter_reponame,
                                                  log["revision"].number,
                                                  log["author"], date, message ))
            posted_new+=1
      print "Posted new %d messages to twitter !" % posted_new
      print "Updated!"

if __name__ == "__main__":
   run_main()

12/08/2009 by Christian S. Perone