March 2005 Archives

At not-so-regular intervals, I look at the web server access logs to see who's accessing the site and where the bandwidth is going. One of the things I look at is bandwidth used per IP address and per User Agent.

One issue that always causes high bandwidth usage is someone using an RSS feed reader / aggregator that is misconfigured or badly programmed. After an RSS feed has been retrieved the first time, an RSS reader / aggregator is supposed to ask the server whether the feed has changed before downloading it again (an HTTP conditional GET). RSS readers / aggregators that do not check first download the full feed over and over, usually once an hour (but I've encountered ones downloading 3-4 times per hour), 24 hours a day, seven days a week. Just one person's RSS reader / aggregator misbehaving in this way will chew up a minimum of 250MB of my bandwidth in one month.
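For illustration, here is a minimal sketch of the check a well-behaved reader is expected to make. It assumes the server sends the standard `ETag` and/or `Last-Modified` response headers; the names `CachedFeed`, `conditional_headers`, and `apply_response` are hypothetical and not taken from any particular reader. The HTTP transfer itself is left out so that only the caching logic is shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedFeed:
    """What a reader keeps from the last successful download (hypothetical type)."""
    body: bytes
    etag: Optional[str] = None
    last_modified: Optional[str] = None


def conditional_headers(cached: Optional[CachedFeed]) -> Dict[str, str]:
    """Build the request headers that ask 'has this feed changed?'."""
    headers: Dict[str, str] = {}
    if cached is None:
        return headers  # first fetch: nothing to validate against
    if cached.etag:
        headers["If-None-Match"] = cached.etag
    if cached.last_modified:
        headers["If-Modified-Since"] = cached.last_modified
    return headers


def apply_response(cached: Optional[CachedFeed], status: int,
                   response_headers: Dict[str, str], body: bytes) -> CachedFeed:
    """Keep the cached copy on 304 Not Modified, otherwise store the new one.

    Assumes response_headers is a plain dict using the canonical header names.
    """
    if status == 304 and cached is not None:
        return cached  # unchanged: the server sent no feed body
    return CachedFeed(body=body,
                      etag=response_headers.get("ETag"),
                      last_modified=response_headers.get("Last-Modified"))
```

When the feed has not changed, the server answers with a short 304 response instead of the whole feed, which is where the bandwidth saving comes from.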

There's really no good way to deal with this abuse, other than to block the IP address from retrieving the RSS feeds. (Apache's mod_rewrite is a very good tool for accomplishing this.) The IP address alone is usually not sufficient to identify the user and contact them about whatever RSS reader / aggregator they are using. Until now, I had been using mod_rewrite to just serve up a blank HTML page to banned IP addresses, hoping they would take the hint.

I wasn't satisfied with that solution, so I've created a special rss-abuse.xml feed that will be served to IP addresses that have been blocked for RSS feed abuse. A user from a blocked IP address trying to retrieve one of my RSS feeds will see this message in their RSS reader / aggregator:

IP address blocked for RSS feed abuse

Your IP address has been blocked from retrieving RSS feeds from this web site, because your RSS aggregator is not properly checking whether my RSS feeds have changed before downloading them again. The repeated downloading of my feeds every hour, 24 hours a day, 7 days a week when they have not changed, causes excessive bandwidth usage for this web site, constituting RSS feed / bandwidth abuse.

If you wish to be unblocked, you will need to correct the problem with your aggregator, or use a different aggregator, then contact me and let me know what you've done to address the problem.
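The blocking itself is done with mod_rewrite on this site, and that configuration is not shown here. Purely as an illustration of the decision being made, the following sketch expresses the same idea in Python: blocked addresses get the rss-abuse.xml feed, everyone else gets the feed they asked for. The blocked addresses come from the reserved documentation ranges, and the feed file names other than rss-abuse.xml are made up for the example.

```python
import ipaddress

# Hypothetical block list; these are reserved documentation addresses,
# not real offenders.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.15/32"),    # a single blocked address
    ipaddress.ip_network("198.51.100.0/24"),  # a whole blocked range
]

ABUSE_FEED = "rss-abuse.xml"


def feed_for(client_ip: str, requested_feed: str) -> str:
    """Return the feed file to serve to this client."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in network for network in BLOCKED_NETWORKS):
        return ABUSE_FEED  # blocked clients see the explanation instead
    return requested_feed
```

Serving a valid feed rather than a blank page means the explanation shows up inside the reader itself, where the person using it is most likely to notice it.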

Part of the reason I'm posting this is that, in addition to the two new IP addresses I've blocked today, Bloglines is currently abusing my blogging category RSS feed (downloading it every hour even though it has not changed since January 24th). Bloglines users are currently subscribed to four of my RSS feeds, but only the blogging category one is being abused. I've sent a complaint to Bloglines through their contact form asking them to address the issue.

So, if you see the message from the rss-abuse feed in your RSS reader / aggregator, hopefully this post will shed some more light on why you're seeing it and what you need to do to fix the problem.

Update 18-Mar-2005: One user who has been blocked since January saw the rss-abuse feed and contacted me. He was using the built-in RSS reader in Mozilla Thunderbird. Based on this currently open bug, Mozilla Thunderbird's RSS reader does not appear to properly check whether RSS feeds have changed before downloading them again. As a result, I am now blocking Mozilla Thunderbird from retrieving any of my RSS feeds.