How To Build Your Own Blog

The title is a bit of a fraud. This is not so much an article on how to build your own weblog as a short piece about what I think is important in building a weblogging system, what is not important, and how to drive most directly toward a system that is simple, flexible and works. It also describes some experiments that I’ve been performing and will soon deploy on my orangecone website.

My experiences with MovableType and WordPress have confirmed one thing: while both systems have strong advantages, they really are more like weblog construction sets than actual applications. WordPress in particular seems to require modifying PHP, HTML and CSS code to make any substantial change to the look and feel of the weblog. In addition, WordPress requires installing MySQL, then creating and configuring a database and its users. WordPress also isn’t very cleanly modular: if you make changes to index.php, then every future upgrade becomes an error-prone code merging problem.

Another slightly annoying problem is one of web standards. WordPress thoughtfully includes a link to an XHTML validator which will check your current page and make sure that it lives up to the appropriate XHTML web standard. But the users of the weblog can foil this quite simply by inserting or importing noncompliant XHTML. This might be understandable, given that in my old MovableType weblog the users often typed in HTML by hand. But in theory weblog and wiki software have been moving towards increased use of so-called structured text, which doesn’t have (or in some cases, allow) traditional markup. If you could be sure that any given structured text always produced well-formed XHTML, there would be no need to expose this validation to the user: the system would simply create valid XHTML.

Toward this end, I’ve been experimenting with Python’s docutils, which includes a parser and XML converter for structured text in a format called reStructuredText. This is a fairly obvious plaintext format which can be converted easily into clean XML that captures structural information such as sections, lists, tables and the markup that produces links. This XML is rather simple to transform into traditional HTML. For instance, if your browser is fairly modern and speaks XML, you can look at some text, and the transformed text. Adding an appropriate XSLT stylesheet will transform it into HTML, and some browsers can apply the XSLT stylesheet directly and produce the same page. You can even add a cascading style sheet and get a decorated page if you like.
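As a rough illustration of that pipeline, here is a minimal sketch that feeds a small reStructuredText posting through docutils twice: once to get the XML document tree, and once to get an HTML fragment. It assumes docutils is installed; the sample posting and the function names `rst_to_xml` and `rst_to_html_body` are made up for this example and are not part of docutils.

```python
# Hypothetical sample posting, written in reStructuredText.
SAMPLE_POST = """\
My first post
=============

Some *structured* text with a `link <https://example.org/>`_.

- one
- two
"""


def rst_to_xml(source: str) -> str:
    """Return the docutils XML document tree for a reStructuredText string."""
    # Imported lazily so this module still loads where docutils is missing.
    from docutils.core import publish_string

    return publish_string(
        source,
        writer_name="xml",
        # Ask docutils for a str instead of encoded bytes.
        settings_overrides={"output_encoding": "unicode"},
    )


def rst_to_html_body(source: str) -> str:
    """Return only the HTML body fragment for a reStructuredText string."""
    from docutils.core import publish_parts

    return publish_parts(source, writer_name="html")["html_body"]


if __name__ == "__main__":
    print(rst_to_xml(SAMPLE_POST))
    print(rst_to_html_body(SAMPLE_POST))
```

The XML form is what the rest of a weblog system would store and rearrange; the HTML fragment is one possible rendering of it.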

So here is one basic idea: the user will enter his text as structured text, not HTML. These postings will be automatically processed into an XML form, which will then be translated into appropriate HTML.

This approach has a number of other advantages:

Outputs to other formats (HTML, XHTML, or even PDF) are possible.
The output can be rearranged easily using any XML processing tools.
There is a separation of concerns: the backend processes easily extensible entities which can be updated separately from the layout and display processes.
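To make the last two points concrete, here is a small sketch that rearranges a parsed posting using only Python's standard `xml.etree.ElementTree`. An XSLT stylesheet would be the more conventional tool; this tree walk simply stands in for one. The sample XML is hand-written to resemble a docutils document tree, and `TAG_MAP`, `to_html` and `render` are names invented for the example.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a parsed posting. The element names mimic a
# docutils document tree, but this sample itself is an assumption.
POST_XML = """\
<document title="My first post">
  <title>My first post</title>
  <paragraph>Some <emphasis>structured</emphasis> text.</paragraph>
  <bullet_list>
    <list_item><paragraph>one</paragraph></list_item>
    <list_item><paragraph>two</paragraph></list_item>
  </bullet_list>
</document>
"""

# All layout decisions live in this table, apart from the parsing backend.
TAG_MAP = {
    "document": "div",
    "title": "h2",
    "paragraph": "p",
    "emphasis": "em",
    "bullet_list": "ul",
    "list_item": "li",
}


def to_html(node: ET.Element) -> ET.Element:
    """Recursively copy a node, renaming its tag and dropping attributes."""
    out = ET.Element(TAG_MAP.get(node.tag, "span"))
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_html(child))
    return out


def render(xml_text: str) -> str:
    """Parse the structural XML and serialize the HTML-flavoured copy."""
    return ET.tostring(to_html(ET.fromstring(xml_text)), encoding="unicode")


if __name__ == "__main__":
    print(render(POST_XML))
```

Changing the look of the site then means editing `TAG_MAP` (or swapping in a real stylesheet), while the code that parses postings stays untouched.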

A weblog system which I do find incredibly interesting is blosxom. Blosxom is a very short Perl program that maintains a blog stored as a bunch of plain text files arranged in hierarchical directories. It is only a few pages long, and yet provides most of the major functionality needed by weblogs, and being written in Perl, is easy to extend using alternative libraries. I like many of the ideas in blosxom, and will probably borrow several of them. Personally, I find Python to be a better language, and since I’m already experienced with its structured text libraries, I see no reason to switch away from it.
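The files-in-directories idea is easy to sketch in Python. The following is not blosxom's code, only a minimal imitation of its storage model under a few assumptions: entries are `.txt` files, the first line of a file is the title, the directory path is the category, and the file modification time is the posting date. `Entry` and `load_entries` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Entry:
    category: str  # directory relative to the blog root, "" at the top level
    title: str     # first line of the file
    body: str      # everything after the first line
    mtime: float   # modification time, used as the posting date


def load_entries(root: Path, suffix: str = ".txt") -> list[Entry]:
    """Collect every posting under root, newest first."""
    entries = []
    for path in root.rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        title, _, body = path.read_text(encoding="utf-8").partition("\n")
        category = path.parent.relative_to(root).as_posix()
        if category == ".":
            category = ""
        entries.append(Entry(category, title, body, path.stat().st_mtime))
    entries.sort(key=lambda entry: entry.mtime, reverse=True)
    return entries


if __name__ == "__main__":
    for entry in load_entries(Path(".")):
        print(entry.category or "(top)", "-", entry.title)
```

With this layout, posting is just saving a text file, and recategorizing is just moving it to another directory.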

There are several features of weblogs which I think are essential. You should produce nice RSS feeds for your site so that others can easily keep track of updates. I also like weblogs to serve as RSS aggregators: I use my blog as my homepage, so I like to be just a click away from my regular news, and I don’t mind telling others which web resources I access frequently. If they read my site, they are probably interested in them as well.
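Producing a basic feed takes very little code. Here is a minimal sketch that builds an RSS 2.0 document with the standard library; it covers only the title, link, description and publication date elements, and the `build_rss` function, the site name and the example.org URLs are placeholders invented for the example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(site_title, site_link, description, items):
    """Build an RSS 2.0 feed; items is a list of (title, link, datetime)."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = description
    for title, link, published in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        # RSS expects RFC 822 style dates, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    posts = [
        ("Second post", "https://example.org/2",
         datetime(2004, 1, 2, 9, 30, tzinfo=timezone.utc)),
        ("First post", "https://example.org/1",
         datetime(2004, 1, 1, 12, 0, tzinfo=timezone.utc)),
    ]
    print(build_rss("Example weblog", "https://example.org/",
                    "A placeholder feed", posts))
```

Because the entries already exist as structured data, the feed is just one more output format alongside the HTML pages.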

When I get time, I am a bit of an amateur photographer, so I like to have at least simple photoblog functionality integrated into the website. I suspect I know how I want to do that, and I’ll have more to say about it later. To round out the list, I think I would like a weather forecast for my area, a quote of the day from my expanding quotes file, and maybe a stock report or two.

There are also lots of things that I think my system does not require. During my brief, abortive attempt to allow comments on my weblog, most of what arrived was spam for gay sex websites, so I can easily postpone reader participation in the website for now. I don’t really need to worry much about XMLRPC interfaces, compatibility with other weblog tools, or forums. If you need a weblog like that, you can easily find one. I won’t be providing one.

Anyway, these are my thoughts for the moment. I should have a prototype of this system on my orange cone website shortly. Until then, consider the beauty of smaller, more modular and hence more understandable software. And write some.