Beautiful Django
The ugly web is over. The trick is to add a Django middleware that runs every HttpResponse with content type text/html through BeautifulSoup. The source code of the middleware is simple:
from BeautifulSoup import BeautifulSoup

class BeautifulMiddleware(object):
    def process_response(self, request, response):
        if response.status_code == 200:
            if response["content-type"].startswith("text/html"):
                beauty = BeautifulSoup(response.content)
                response.content = beauty.prettify()
        return response
We simply check for HTTP status code 200 and a “text/html” content type, then use BeautifulSoup to process the response. Here is an example of what it does:
1) I had an HTML page in my Django application, very ugly and with missing tags:
Without the BeautifulSoup middleware, Django renders this HTML template exactly as shown above. With the middleware plugged into the settings of your Django app, it renders this HTML source instead:
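The middleware above uses the old-style process_response hook and the BeautifulSoup 3 import. A rough sketch of the same idea for a current Django and the bs4 package could look like the block below. It assumes bs4 is installed; the `prettify` constructor argument is a hypothetical hook added here only so the formatter can be swapped out, it is not part of Django or of the original middleware.

```python
class BeautifulMiddleware:
    """Sketch of a new-style Django middleware that pretty-prints HTML."""

    def __init__(self, get_response, prettify=None):
        self.get_response = get_response
        # `prettify` is a hypothetical hook (not a Django feature); Django
        # itself only passes `get_response`, so the bs4 default is used.
        self.prettify = prettify or self._bs4_prettify

    @staticmethod
    def _bs4_prettify(html):
        # Imported lazily so the class can be loaded without bs4 installed.
        from bs4 import BeautifulSoup
        return BeautifulSoup(html, "html.parser").prettify()

    def __call__(self, request):
        response = self.get_response(request)
        # Streaming responses have no .content to rewrite.
        if getattr(response, "streaming", False):
            return response
        content_type = response.get("Content-Type", "")
        if response.status_code == 200 and content_type.startswith("text/html"):
            response.content = self.prettify(response.content)
            # Keep an already-set Content-Length in sync with the new body.
            if response.has_header("Content-Length"):
                response["Content-Length"] = str(len(response.content))
        return response
```

To enable it, the dotted path of the class (for example a hypothetical myproject.middleware.BeautifulMiddleware) would be added to the MIDDLEWARE list in settings.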
BeautifulSoup has figured out sensible places to put the closing tags of the HTML source and has created a pretty indented structure, automagically =)
Creating new Django middlewares is easy and interesting. Examples include JavaScript obfuscators, compressors, and automatic analysis of the HTML to improve rendering speed in the browser, and that sort of thing.
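As an illustration of the "compressor" idea, the transformation step of such a middleware might be as small as the function below. This is only a sketch: the function name is made up for this example, and collapsing whitespace this way can change how inline elements or pre blocks render, so it is not safe for every page.

```python
import re

# Runs of whitespace that sit directly between a closing '>' and the next '<'.
_BETWEEN_TAGS = re.compile(r">\s+<")


def collapse_whitespace_between_tags(html):
    """Remove whitespace between adjacent tags; text inside tags is untouched."""
    return _BETWEEN_TAGS.sub("><", html)
```

A process_response method like the one shown earlier could call this on the decoded response body instead of calling prettify().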
I’d advise to cache the output before using this plugin on a production site.
Reformatting HTML on the fly, as well as obfuscating JavaScript, on each request will raise the server load dramatically and reduce the number of concurrent requests it can handle.
I believe caching should be used regardless, in any aspect of the application that can be cached. However, for those who have not fully embraced things like memcached and have powerful enough server(s), the processing cost is minimal (the number of concurrent requests is still reduced regardless).
However, with vanilla Django being the fastest web framework in existence (followed by vanilla CodeIgniter), it would be best to present benchmarks for the people who count its high performance among their main reasons for using it.
1) I highly doubt that django is the fastest web framework in existence. Nice, but hardly the fastest. I might buy YAWS + erlyweb.
2) Regarding using Beautiful Soup as middleware: I think it is a very cute idea, but malformed html is inherently ambiguous. You should get rid of the malformation rather than postprocessing all of your html. The postprocessing will hide errors, make debugging more complicated, plus it looks like it will add whitespace weight to your pages.
I’ve never seen a comparison of web framework performance, and it’s clearly impossible to do one fairly, because each framework has features that another lacks, and some of those features cost performance while others don’t, etc… so it strongly depends on your needs.
Of course BeautifulSoup cannot fix all tags perfectly, simply because nobody can, but sometimes fixing missing tags can improve browser rendering speed. BeautifulSoup should be used with prudence to fix subtle problems, not with the attitude of “I’ll create this wrong markup and BeautifulSoup will take care of it for me”; that is not BeautifulSoup’s goal. Indeed, the main goal of this post is to show how people can use Django middlewares to process a Django HttpResponse in an easy way, and I’m trying to design something like this to increase the rendering speed of the HTML in the browser. Using BeautifulSoup as a middleware is just an initial idea that occurred to me and that I found interesting, fun, and useful as a starting point.
How would someone go about “cache”ing the output? If I were using CloudFront, wouldn’t the formatted HTML be cached when first loaded?
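One simple way to cache the output, sketched under the assumption that many requests produce byte-identical HTML, is to memoize the formatter on a hash of its input so each distinct page is reformatted only once. The class name, the `format_func` parameter, and the crude eviction policy are all illustrative; in a real Django project the dictionary would more likely be replaced by Django's cache framework or a cache in front of the application.

```python
import hashlib


class CachedFormatter:
    """Memoize an expensive HTML formatter, keyed by a hash of the input."""

    def __init__(self, format_func, max_entries=1024):
        self.format_func = format_func  # e.g. a function that calls prettify()
        self.max_entries = max_entries
        self._cache = {}

    def __call__(self, html):
        key = hashlib.sha256(html.encode("utf-8")).hexdigest()
        if key not in self._cache:
            # Crude eviction: drop everything once the cache is full.
            if len(self._cache) >= self.max_entries:
                self._cache.clear()
            self._cache[key] = self.format_func(html)
        return self._cache[key]
```

The middleware would then call the wrapped formatter instead of the raw one, so repeated requests for an unchanged page skip the reformatting step.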
That’s a great idea, and I’ve often thought of doing it myself as Django’s template language often leaves ugly spaces and incorrectly indented code…but the last time I used BeautifulSoup, it was REALLY SLOW. How is your performance with this middleware?
@H3, I’ve not done performance tests, but it should reduce performance. As Thalin said, it is better to use a cache with this kind of middleware. Middlewares can be great friends or evil enemies, but we can write them in C too. This is just my first test using BeautifulSoup as middleware.
+1 for a C library version.
I wrote something like this 2 years ago but found that Beautiful Soup sometimes messes up content and produces correct HTML that is not what was there originally. Maybe it was my HTML, maybe it was my version of Beautiful Soup, but I opted not to use it when Firebug already formats everything nicely for me.
Also it makes your HTML grow because of all the tabs and newlines.
Sometimes it grows the HTML, but when it fixes the tags, this should improve rendering performance (in the browser).
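To get a feel for the size overhead discussed here, one can compare the length of a compact fragment with its indented form. The sketch below uses the standard library's xml.dom.minidom as a stand-in for BeautifulSoup, so it only works on well-formed markup and the exact numbers will differ from what prettify() produces; the function name is made up for this example.

```python
from xml.dom import minidom


def whitespace_overhead(markup, indent="  "):
    """Return (compact_length, pretty_length) for well-formed markup."""
    element = minidom.parseString(markup).documentElement
    # Pretty-print only the root element, so no XML declaration is added.
    pretty = element.toprettyxml(indent=indent)
    return len(markup), len(pretty)
```

Calling it on a small list such as "<ul><li>one</li><li>two</li></ul>" shows the pretty-printed form is longer, since every nested tag gains a newline and indentation.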
Why not just write templates that generate valid HTML?
Have you worked with big templates in projects of significant complexity? It’s hard to keep a nice and valid structure, especially when there are many developers/designers.
Btw, what is important here is the example of automatic and simple processing of the Django output using middlewares. What I’m really trying to do now is write a middleware that processes the output in a way that speeds up rendering in the browser, but this is a more complex task.
It’s not hard to get correct HTML if you require the use of my validator app ( http://lukeplant.me.uk/resources/djangovalidator/ ) while in development (or something similar). You’ll be immediately notified of any HTML validation errors, which are recorded. It makes it very easy to have perfect HTML. Obviously it adds a performance hit to every page, but it’s not much, and it’s only for development, not production.
User generated content can be checked by other means (e.g. tidy, BeautifulSoup) when it is entered or rendered.
Rewriting content in a middleware will cause you lots of other grief. (see http://code.djangoproject.com/ticket/9163 for example). It will interact especially badly with per view caching (which happens before the content rewriting), and things like ETags.
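The ETag concern can be illustrated with a small sketch: if a validator is computed from the body before a later middleware rewrites that body, the validator no longer describes the bytes actually sent. The helper names and the choice of MD5 are assumptions for illustration only, not a description of how Django or the linked ticket handles it.

```python
import hashlib


def make_etag(body):
    """Build a quoted ETag value from the response body (bytes)."""
    return '"%s"' % hashlib.md5(body).hexdigest()


def etag_is_stale(etag, body_sent):
    """True if `etag` no longer matches the body that is actually sent."""
    return etag != make_etag(body_sent)
```

For example, an ETag computed for b"<p>hello</p>" is stale once a reformatting step turns the body into b"<p>\n hello\n</p>\n", even though the page content is logically the same.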
I think you’re talking about a different idea of HTML beauty than Perone is. He is simply suggesting it’s hard to keep the code looking ‘Beautiful’ with white space and correct indentation. Use any template engine to create even a semi-complicated HTML page, and you will quickly see that indentation and white space are nearly impossible to get right in the output source code.
You sound like you’re talking about valid HTML, which I don’t think Beautiful Soup will even fix for you. Not to mention, no one here is talking about user generated content. Come to think of it.. Did you even read the article and comments? ..or are you here to troll around? 🙂