News: Algorithm searches for models that best explain experimental data

From the original news article from PhysOrg:

An evolutionary computation approach developed by Franklin University’s Esmail Bonakdarian, Ph.D., was used to analyze data from two classical economics experiments. As can be seen in this figure, optimization of the search over subsets of the maximum model proceeds initially at a quick rate and then slowly continues to improve over time until it converges. The top curve (red) shows the optimum value found so far, while the lower, jagged line (green) shows the current average fitness value for the population in each generation.


Regression analysis has been the traditional tool for finding and establishing statistically significant relationships in research projects, such as for the economics examples Bonakdarian chose. As long as the number of independent variables is relatively small, or the experimenter has a fairly clear idea of the possible underlying relationship, it is feasible to derive the best model using standard software packages and methodologies.

However, Bonakdarian cautioned that if the number of independent variables is large, and there is no intuitive sense about the possible relationship between these variables and the dependent variable, “the experimenter may have to go on an automated ‘fishing expedition’ to discover the important and relevant independent variables.”

You can see the original research paper here.

The bad good news for optimization

A new published paper called “NP-hardness of Deciding Convexity of Quartic Polynomials and Related Problems” brings light (or darkness) into a 20 years-old problem related to the whether or not does exist a polynomial time algorithm that can decide if a multivariate polynomial is globally convex. The answer is: NO.

Convex vs Non Convex Functions
Convex (top) vs Non Convex (bottom) Functions (Source: MIT News)

From the news article:

For complex functions, finding global minima can be very hard. But it’s a lot easier if you know in advance that the function is convex, meaning that the graph of the function slopes everywhere toward the minimum. Convexity is such a useful property that, in 1992, when a major conference on optimization selected the seven most important outstanding problems in the field, one of them was whether the convexity of an arbitrary polynomial function could be efficiently determined.

Almost 20 years later, researchers in MIT’s Laboratory for Information and Decision Systems have finally answered that question. Unfortunately, the answer, which they reported in May with one paper at the Society for Industrial and Applied Mathematics (SIAM) Conference on Optimization, is no. For an arbitrary polynomial function — that is, a function in which variables are raised to integral exponents, such as 13x4 + 7xy2 + yz — determining whether it’s convex is what’s called NP-hard. That means that the most powerful computers in the world couldn’t provide an answer in a reasonable amount of time.

At the same conference, however, MIT Professor of Electrical Engineering and Computer Science Pablo Parrilo and his graduate student Amir Ali Ahmadi, two of the first paper’s four authors, showed that in many cases, a property that can be determined efficiently, known as sum-of-squares convexity, is a viable substitute for convexity. Moreover, they provide an algorithm for determining whether an arbitrary function has that property.

Read the full news article here, or the paper here.

Using pyearthquake to plot Japan USGS earthquake data into the near real-time MODIS satellite imagery

The aim of this post is to show to the reader how to plot the recent Japan earthquake data from the USGS using the pyearthquake module. If you want to know more information about the pyearthquake module, take a look in this post where I previously used it. pyearthquake is a pure-python module which exposes an API to retrieve data from the USGS site as well from the MODIS Rapid Response System (recently, they created a separate project to handle Japan daily images).

Installing pyearthquake

The first thing we need to do is to install the Python pyearthquake module, you can do it using easy_install from setuptools:

easy_install pyearthquake

The easy_install should automatically handle the required modules, but if you face problems (specially in Ubuntu with basemap, here is the version requirements: matplotlib >= 0.99.0, numpy >= 1.3.0, PIL >= 1.1.6 and basemap >= 0.99.4.

Retrieving USGS catalogs

USGS provides the follow catalog datasets:

M1+ Earthquakes (past hour) – M1+PAST_HOUR

This is the worldwide catalog with earthquake data from the past hour;

M1+ Earthquakes (past day) – M1+PAST_DAY

This is the worldwide catalog with earthquake data from the past day;

M1+ Earthquakes (past 7 days) – M1+PAST_7DAY

This is the worldwide catalog with earthquake data from the past 7 days;

This is how you retrieve any of these catalogs using pyearthquake:

>>> from pyearthquake import *
>>> catalog = usgs.retrieve_catalog("M1+PAST_7DAY")
>>> len(catalog)

In that context, we have 1179 incidents with magnitude 1+ from the past 7 days.

Lets filter now only events with magnitude 6+, which represents the recent significant earthquakes:

>>> mag6_list = [event for event in catalog if float(event["Magnitude"]) >= 6.0]
>>> len(mag6_list)

We have now 30 events with magnitude 6+ from the past 7 days, let’s print it:

>>> for row in mag6_list:
...    print row["Eqid"], row["Magnitude"], row["Depth"],
...          row["Datetime"], row["Depth"], row["Region"]
c0001z5z 6.3 8.70 Friday, March 11, 2011 20:11:22 UTC 8.70 near the east coast of Honshu, Japan
c0001z4n 6.6 10.00 Friday, March 11, 2011 19:46:49 UTC 10.00 near the west coast of Honshu, Japan
c0001z2t 6.1 24.80 Friday, March 11, 2011 19:02:58 UTC 24.80 near the east coast of Honshu, Japan
c0001z2a 6.2 10.00 Friday, March 11, 2011 18:59:15 UTC 10.00 near the west coast of Honshu, Japan
c0001yib 6.2 18.90 Friday, March 11, 2011 15:13:14 UTC 18.90 near the east coast of Honshu, Japan
c0001y4u 6.5 11.60 Friday, March 11, 2011 11:36:39 UTC 11.60 near the east coast of Honshu, Japan

As you can see, almost all last events are earthquakes from the Japan coast. Here, you can extract any individual event and retrieve USGS products like ShakeMaps, etc (see more information in this post on how to fetch and show those USGS products).

Plotting events into the map

Now we will plot those events into the map (pyearthquake uses the matplotlib toolkit called basemap to plot events into the map):

>>> usgs.plot_events(catalog)

The “catalog” variable is the same we retrieved before using the USGS M1+PAST_7DAY catalog.

Here is what what this statement will show:

Now we can zoom into Japan using the button , and this is the result:

The colored dots are the events from the retrieved catalog, the more strong the color the more strong was the earthquake; see the dark red color near the cost, that event was the unfortunate and catastrophic 8.9 magnitude earthquake which devastated Japan yesterday.

Retrieving near real-time MODIS Rapid Response images

MODIS has processed image subsets of Aqua and Terra satellites. One of these subsets is called “japan” and it has the entire country coverage. Let’s retrieve this MODIS subset and then plot our events in this same map:

>>> import datetime
>>> now =
>>> bmap = modis.get_modis_subset(now,
>>> usgs.plot_events(catalog, bmap)

What pyearthquake is going to do here is to download (this may take some time) the entire subset for the resolution of 250m (the best available), parse the subset metadata, align image into the map between the lat/lng bounds and then plot the events over this satellite image, so we’ll have the last high resolution satellite images from the Japan together with the earthquake events plot, and here is the result:

And here is the zoom near the coast:

matplotlib >= 0.99.0, numpy >= 1.3.0, PIL >= 1.1.6 and basemap >= 0.99.4

Health and Genetic Algorithms

From R&D Mag – Developing a potential life-saving mathematical tool -:

Math and medicine are coming together to help people who have suffered an abdominal aortic aneurysm, which with 15,000 is the 13th-leading cause of death in the United States.

At the heart of the effort are genetic algorithms written by Oak Ridge National Laboratory researchers that allow physicians to more efficiently assess and organize the often vast amounts of information contained in patient reports. Ultimately, with this tool—a sophisticated way to quickly extract key phrases—doctors will be able to characterize features and findings in reports and provide better patient care.


This work builds on previous studies involving genetic algorithms developed for mammography. That system allows doctors to quickly identify trends specific to an individual patient and match images and text to a database of known cancerous and pre-cancerous conditions.

Read full article here.

toggle Sign up now! View the newsletter sample.

Developing a potential life-saving mathematical tool

Darwin on the track

From The Economist article:

WHILE watching the finale of the Formula One grand-prix season on television last weekend, your correspondent could not help thinking how Darwinian motor racing has become. Each year, the FIA, the international motor sport’s governing body, sets new design rules in a bid to slow the cars down, so as to increase the amount of overtaking during a race—and thereby make the event more interesting to spectators and television viewers alike. The aim, of course, is to keep the admission and television fees rolling in. Over the course of a season, Formula One racing attracts a bigger audience around the world than any other sport.

Read the full article here.

PyOhio 2010: Genetic Programming in Python

Eric Floehr (from Intellovations)  kindly sent me the presentation he presented at PyOhio 2010. I think Eric has captured some nice features of Pyevolve which few people use, like DB Adapters, dot plotting, Interactive Mode, Real Time statistics, etc. He also presents an interesting use case where he uses Genetic Programming in order to forecast weather based on some historical data:

Thank you Eric !