The ugly web is over; the trick is to add a Django middleware to process every HttpResponse (with content-type text/html) of Django using BeautifulSoup. The source-code of the middleware is simple:
from BeautifulSoup import BeautifulSoup
class BeautifulMiddleware(object):
def process_response(self, request, response):
if response.status_code == 200:
if response["content-type"].startswith("text/html"):
beauty = BeautifulSoup(response.content)
response.content = beauty.prettify()
return response
We simple check for HTTP response code 200 and then check for a “text/html” content and use BeautifulSoup to process the response. See an example of what it does:
1) I’d a html in my Django application, very ugly and with missing tags:
This HTML template will be rendered as showed above by Django without the BeautifulSoup middleware, but with the middleware pluged in the settings of your Django app, it will render that html source:
BeautifulSoup has figured out sensible places to put the closing tags of the HTML source and has created a pretty indented structure, automagically =)
It’s very easy and interesting create new django middlewares, examples can be JavaScript obfuscators, compressors, automatic performance analysis of html code to improve the render speed of browser and these sort of things.
Today I’ve read about an important work done by a team of evolutionary biologists of the University of Oregon, which reveals an important result about the evolutionary irreversibility. The concept of irreversibility states that the future results of evolution at any point in time must depend on the present state and by the past, showing the determinism of evolution; on the other hand, the evolution reversibility dictates that the natural selection can produce the same forms in any given environment, independent of history.
This question about the irreversibility of evolution has remained unsolved because of the fact that we rarely know what features the ancestors had and what the mechanisms was used to evolve into the actual organisms, but the team of Joe Thornton has solved those issues by studying the problem at the molecular level, resurrecting ancestral proteins (GR1) as they existed long ago and using manipulation to study evolutionary process in two directions: forward and reverse.
The results of the work done by the research team was:
Our observations suggest that history and contingency during glucocorticoid receptor evolution strongly limited the pathways that could be deterministically followed under selection.
(…)
Selection is an extraordinarily powerful evolutionary force; nevertheless, our observations suggest that, because of the complexity of glucocorticoid receptor architecture, low-probability permissive substitutions were required to open some mutational trajectories to exploration under selection, whereas restrictive substitutions closed other potential paths. Under selection, some kind of adaptation will always occur, but the specific adaptive forms that are realized depend on the historical trajectory that precedes them. The conditions that once facilitated evolution of the glucocorticoid receptor’s ancestors were destroyed during the realization of its present form. The past is difficult to recover because it was built on the foundation of its own history, one irrevocably different from that of the present and its many possible futures.
So my friend, that’s the way nature evolve, possible never looking back. But this is a great new step for future works and research on the irreversibility of evolution.
Great news, the new issue of SIGEvolution was released, click in the cover above to go to the SIGEvolution site; this issue has an interview is with Hans-Paul Schwefel.
UPDATE 19/09: it seems that some people had misunderstood the post title, so here is a clarification: I’m not comparing Word with OpenOffice or something like that, the title refers to the design choices of OpenOffice in using Python 2.6 as an option for scripting language, it’s a humorous title and should not be considered in literal sense.
This is a very simple, but powerful, Python script to call Google Sets API (in fact it’s not an API call – since Google doesn’t have an official API for Sets service – but an interesting and well done scraper using BeautifulSoup) inside the OpenOffice 3.1.1 Writer… anyway, you can check the video to understand what it really does:
You need to put the “xgoogle” module inside the “OpenOffice.org 3\Basis\program\python-core-2.6.1\lib” path, and the above script inside “OpenOffice.org 3\Basis\share\Scripts\python”.
I hope you enjoyed =) with new Python 2.6 core in OpenOffice 3, they have increased the productivity potential at the limit.
Benford’s law is one of those very weird things that we can’t explain, and when we discover more and more phenomena that obey the law, we became astonished. Two people (Simon Newcomb – 1881 and Frank Benford – 1938) noted the law in the same way, while flipping pages of a logarithmic table book; they noticed that the pages at the beginning of the book were dirtier than the pages at the end.
Currently, there are no a priori criteria that say to us when a dataset will or will not obey the Benford’s Law. And it is because of this, that I’ve done an analysis on the Twitter Public Timeline.
The Twitter API to get Public Timeline is simply useless for this analysis because in the API Docs, they say that the Public Timeline is cached for 60 seconds ! 60 seconds is an eternity, and there is a request rating of 150 request/hour. So, it doesn’t help, buuuuuut, there is an alpha testing API with pretty and very useful streams of data from the Public Timeline; there are many methods in the Twitter Streaming API, and the most interesting one is the “Firehose”, which returns ALL the public statuses, but this method is only available for intere$ting people, and I’m not one of them. Buuuut, we have “Spritzer”, which returns a portion of all public statuses, since it’s only what we have available in the moment, it MUST be useful, and it’s a pretty stream of data =)
So, I’ve got the Spritzer real-time stream of data and processed each new status which arrived with a regex to find all the numbers in the status; if the status was “I have 5 dogs and 3 cats”, the numbers collected should be “[5, 3]”. All those accumulated numbers were then checked against the Benford’s Law. I’ve used Matplotlib to plot the two curves (the Benford’s Law and the Twitter statuses digits distribution) to empirically observe the correlation between them. You can note in the upper right corner of the video, the Pearson’s correlation between the two distributions too.
Here is the video (I’ve seen only after creating of the video, but the color of the curves are the inverse as seen in the legend):
The video represents the 15 minutes (3.160 captured statuses) of the Twitter Public Timeline. At the end of the video, you can see a Pearson’s correlation of 0.95. It seems that we have found another Benford’s son =)
The little tool to handle the Twitter Spritzer stream of data and plot the correlation graph in real-time was entirely written in Python, I’ll do a clean-up and post it here soon I got time. The tool has generated 1823 png images that were merged using ffmpeg.
As many (or very few in the real life haha) people know, today is the Pi Approximation Day ! So it’s time to make a contribution to celebrate this funny day =)
My contribution is to use Python and Pyevolve to approximate Pi number using Genetic Programming approach. I’ve created the functions gp_add(+), gp_sub(-), gp_div(/), gp_mul(*) and gp_sqrt (square root) to use as non-terminals of the GP. The fitness function is very simple too, it simple returns the absolute difference between the Python math.pi and the evaluated individual. I’ve used also a population size of 1k individuals with max tree depth of 8 and the random ephemeral constants as random integers. The best approximation I’ve got while running the GP for about 8 minutes (40 generations) was 3.1416185511, best for 3 digits, you can improve it and let it run for more time to get better approximations.
Here is the formulae I’ve got with the GP (click to enlarge):
I’m very proud of my city today, because it is here (in Porto Alegre, RS, Brazil) that the FISL 2009 (10th) is happening ! The FISL is a conference which is attracting more than 7.000 attendees in the same time I’m writing this post. All these people (developers, companies, etc..) are here to talk and focus on “free software”. The conference is being presented with the presence of distinguished speakers like Richard Stallman (FSF), Peter Sunde (from Pirate Bay), Jon “Maddog” Hall (founder of Open Source Internation) and many other great speakers and Python people too =) even the president of Brazil will be present tomorrow (friday) !!!
Unfortunately, almost the all sessions are in Portuguese, but this conference is a “must-go” for every developer ! See here some photos of the event.
This post is just to point some interesting analysis using Benford’s Law to check anomalies in the Iran’s election, the first is from Walter Melbane, an expert in electoral fraud, the article “Note on the presidential election in Iran, June 2009” is available here. The second paper is from Boudewijn F. Roukema from Torun Centre for Astronomy, the title is “Benford’s Law anomalies in the 2009 Iranian presidential election” and it’s available here.
I’ve done some other posts about the Benford’s Law too, like here, here and here.
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy
Privacy Overview
This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.