Well, my experiments with podcasting are revealing a few problems with the current podcasting solutions, and I thought I’d write some of them down so that others can comment on them and/or integrate them into their future developments.
Please, please, please, set the user agent fields in your ipodder requests…
Programmers out there, please set the user agent field in your HTTP requests, including a version number and a web address where we can learn more about your client. I’ve had a number of clients that seem to access my podcast directory in a particularly unfriendly way, often repeatedly downloading the same file up to a dozen times. I would like to be able to determine whether I am doing something wrong, or potentially report a bug in your client, but I can’t if I cannot identify which client was being used.
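Setting a useful user agent is a one-liner in most HTTP libraries. Here’s a minimal sketch in Python; the client name, version number, and info URL are made-up placeholders, not a real client:

```python
# Build a feed request that identifies the client to the server operator.
# "ExamplePodClient" and its URL are hypothetical placeholders.
import urllib.request

req = urllib.request.Request(
    "https://example.com/podcast.xml",  # hypothetical feed URL
    headers={
        # Name, version, and a page where operators can learn more,
        # so misbehaving downloads can be traced back to a client.
        "User-Agent": "ExamplePodClient/1.2 (+https://example.com/podclient)"
    },
)
print(req.get_header("User-agent"))
```

With that one header in place, a server operator staring at a dozen duplicate downloads in the access log at least knows whom to contact.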
Be a polite client
Please, use the ETag and Last-Modified headers. Try to do something sensible with all status codes returned by the HTTP server. In particular, make sure that you act appropriately on redirects, accept status codes like 503 (Service Unavailable) and honor the suggested Retry-After response header, and handle ranges and partial downloads. Take some time to really make the HTTP transaction processing in your client bulletproof.
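To make the conditional-request part concrete, here’s a sketch of a polite fetch. The feed URL and client name are hypothetical, and the cached ETag/Last-Modified values are assumed to have been saved from a previous successful download:

```python
# Sketch of a conditional, 503-aware feed fetch.
import urllib.request
import urllib.error

def conditional_headers(etag=None, last_modified=None):
    """Build request headers that let the server reply 304 Not Modified."""
    headers = {"User-Agent": "ExamplePodClient/1.2"}  # hypothetical client
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers

def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (status, payload). Payload is None when nothing changed,
    or the Retry-After value when the server asked us to back off."""
    req = urllib.request.Request(
        url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:   # Not Modified: reuse the cached copy
            return 304, None
        if err.code == 503:   # Service Unavailable: honor Retry-After
            return 503, err.headers.get("Retry-After")
        raise
```

A client built this way downloads the enclosure only when it has actually changed, and backs off when the server says it is busy, instead of hammering the same file a dozen times.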
Scheduling ipodder updates…
Most ipodder clients now have a way of scheduling the times when the client polls and downloads new content. I’m going to put forth the bold suggestion that most of these schedules are actually bad: the intervals chosen are often too short to be friendly to the remote site’s bandwidth, and the update times tend to cluster around the top of the hour and around certain common hours. I’m beginning to see this fairly clearly in my access logs: occasionally my HTTP server is completely idle, and at other times people are contending for my bandwidth trying to download my latest/greatest podcast.
A much more polite approach would be for the downloading client itself to pick random times to update, and at infrequent intervals. This would help even out the access patterns at the server, and would make for better download speeds on the client side. If your client properly handles status code 503, you can even have it back off and retry (perhaps with exponentially decaying frequency à la Ethernet).
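A randomized schedule with exponential backoff takes only a few lines. This sketch assumes a nominal four-hour polling interval and a one-day backoff cap; both numbers are my illustrative choices, not anything the clients actually use:

```python
# Sketch of a randomized, backing-off polling schedule.
import random

BASE_INTERVAL = 4 * 60 * 60    # nominal 4-hour poll interval (assumed)
MAX_INTERVAL = 24 * 60 * 60    # never back off past a day (assumed)

def next_poll_delay(consecutive_503s=0):
    """Seconds to wait before the next poll.

    Jitter spreads clients away from the top of the hour; each
    consecutive 503 doubles the delay, Ethernet-style, up to a cap.
    """
    backoff = min(BASE_INTERVAL * (2 ** consecutive_503s), MAX_INTERVAL)
    # Randomize within +/- 50% so clients never synchronize.
    return backoff * random.uniform(0.5, 1.5)
```

Because every client picks its own jittered delay, the top-of-the-hour stampede dissolves into a roughly uniform trickle of requests.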
Addendum: it is probably counterproductive to do this with BitTorrent. In that world, you actually want lots of people to collide at the same time, so that you can take maximum advantage of the available parallelism. Perhaps your webserver should deny requests for BitTorrent feeds until the number of waiting clients reaches some threshold, and schedule them (perhaps using Retry-After) to all come back at the same later time, creating a more efficient torrent network.
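The batching idea might look something like the following server-side sketch. Everything here is hypothetical: the class, the batch size, and the rendezvous delay are invented for illustration, not part of any existing server:

```python
# Hypothetical gate for torrent-feed requests: reply 503 with a shared
# Retry-After rendezvous time until enough clients have queued up, then
# serve them all at once so they can seed each other.
import time

class TorrentGate:
    def __init__(self, batch_size=20, rendezvous_delay=300):
        self.batch_size = batch_size              # target swarm size (assumed)
        self.rendezvous_delay = rendezvous_delay  # seconds until the gate opens
        self.waiting = 0
        self.open_at = None                       # shared rendezvous time

    def handle_request(self, now=None):
        """Return (status, retry_after_seconds_or_None)."""
        now = time.time() if now is None else now
        if self.open_at is not None and now >= self.open_at:
            return 200, None                      # rendezvous reached: serve
        if self.open_at is None:
            # First arrival fixes the rendezvous time for everyone after it.
            self.open_at = now + self.rendezvous_delay
        self.waiting += 1
        if self.waiting >= self.batch_size:
            self.open_at = now                    # swarm big enough: open now
            return 200, None
        return 503, max(1, int(self.open_at - now))
```

Because every deferred client is told the same Retry-After, they all return together and the torrent swarm starts with maximum parallelism instead of a lone downloader.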
Integrating functionality into the http server
Some of the above requires cooperation from the HTTP server, so why not integrate the functionality directly into one? If you pick a high-performance but simple server like thttpd or boa, you could (at least in theory) directly implement throttling, access control, and perhaps even P2P functionality like BitTorrent in a single server that would be simpler to set up and manage.
Just some ideas and hints. Feel free to comment.