Tuesday, February 16, 2010

Bandwidth problem? Investigate your log files

On some of my APEX sites I seem to have a bandwidth problem. The sites don't become slow; they just consume a lot of bandwidth for no apparent reason.

We set a limit on the bandwidth a site can use per month, so we stay in control of the hosting/bandwidth fees we have to pay. If we see that a site is using much more bandwidth than we expected, we enable full logging for that site.

Next we investigate the logs. You can read the raw data, but that is comparable to reading a trace file: you can understand it if you look closely, but it's not very user friendly. For trace files you use tkprof to get more human-readable output, and the same goes for Apache logs: there are programs that format them in a better way. I used AWStats to convert my logs into a nice overview. The result looks very much like StatCounter stats (without the chart).
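To get a feel for what a tool like AWStats does under the hood, here is a minimal Python sketch that splits one line of a raw Apache log into its fields. It assumes the site logs in Apache's standard "combined" format; the sample line (IP address, URL, timestamp) is made up for illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumption: the site logs with Apache's "combined" format, i.e.
# LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

class Hit(NamedTuple):
    ip: str
    time: str
    request: str
    status: int
    size: int  # bytes sent; Apache writes "-" for an empty body, stored here as 0
    referrer: str
    agent: str

def parse_line(line: str) -> Optional[Hit]:
    """Return the fields of one combined-format line, or None if it doesn't match
    (for example lines containing escaped quotes, which this simple regex skips)."""
    m = COMBINED.match(line)
    if m is None:
        return None
    size = 0 if m["size"] == "-" else int(m["size"])
    return Hit(m["ip"], m["time"], m["request"], int(m["status"]),
               size, m["referrer"], m["agent"])

# Made-up sample line for illustration.
SAMPLE = ('192.0.2.10 - - [16/Feb/2010:10:15:32 +0100] '
          '"GET /pls/apex/f?p=100:1 HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"')

if __name__ == "__main__":
    print(parse_line(SAMPLE))
```

Feeding every line of the access log through parse_line gives you the raw material for all the statistics below.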

From these logs you can get very valuable information about your customers: not only do you see how many visitors you have per month/day/hour or which IPs are accessing your site, you also get statistics about the platforms they use.
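A rough idea of how such numbers are derived: the sketch below counts hits per hour from the Apache timestamp and guesses the platform from the user-agent string. The substring rules are a hypothetical, deliberately crude table; AWStats uses far more detailed detection.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, deliberately crude rules: (substring in user agent, platform name).
# Order matters because the first match wins: Android agents also contain
# "Linux", and iPhone agents also contain "Mac OS X".
PLATFORM_RULES = [
    ("Windows", "Windows"),
    ("Android", "Android"),
    ("iPhone", "iOS"),
    ("Mac OS X", "Mac OS X"),
    ("Linux", "Linux"),
]

def platform(agent: str) -> str:
    """Guess the visitor's platform from a user-agent string."""
    for needle, name in PLATFORM_RULES:
        if needle in agent:
            return name
    return "Other"

def platform_share(agents) -> Counter:
    """Count hits per guessed platform."""
    return Counter(platform(a) for a in agents)

def hits_per_hour(timestamps) -> Counter:
    """Count hits per hour of day from Apache %t values such as
    '16/Feb/2010:10:15:32 +0100' (hour in the time zone written in the log)."""
    return Counter(
        datetime.strptime(ts, "%d/%b/%Y:%H:%M:%S %z").hour for ts in timestamps
    )
```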

For me it was interesting to see that, for that site, Windows and IE were used less than I expected.

Further on in the log statistics you see how people reached your site: which keywords they used in a search engine, which other site they came from, and how long they stayed on your site.
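The search keywords come from the referrer: when someone clicks a search result, the referrer URL usually carries the search phrase as a query parameter. The sketch below assumes Google and Bing use the "q" parameter and Yahoo uses "p"; treat that table as illustrative rather than complete.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Assumption: host fragment -> query parameter holding the search phrase.
SEARCH_PARAMS = {"google.": "q", "bing.": "q", "search.yahoo.": "p"}

def search_keywords(referrer: str) -> Optional[str]:
    """Return the search phrase from a search-engine referrer URL, or None."""
    parts = urlsplit(referrer)
    host = parts.netloc.lower()
    for marker, param in SEARCH_PARAMS.items():
        if marker in host:
            # parse_qs decodes "+" and %XX escapes back into plain text.
            values = parse_qs(parts.query).get(param)
            return values[0] if values else None
    return None  # not a known search engine (or "-" for no referrer)
```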

Most important for me was to get insight into where my bandwidth went. It didn't take me long to see that robots/spiders were the problem! It's unbelievable how much bandwidth Yahoo's bot is eating!
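The bandwidth question boils down to summing the bytes sent per user agent and checking how much of the total goes to robots. A minimal sketch, assuming the (user agent, bytes) pairs have already been extracted from the log; the bot markers are a hypothetical heuristic, not an authoritative crawler list.

```python
from collections import defaultdict

# Hypothetical heuristic: lowercase substrings that identify most crawlers
# ("slurp" is Yahoo's crawler; "bot" also catches Googlebot, msnbot, ...).
BOT_MARKERS = ("slurp", "bot", "spider", "crawl")

def is_bot(agent: str) -> bool:
    a = agent.lower()
    return any(marker in a for marker in BOT_MARKERS)

def bandwidth_by_agent(hits):
    """hits: iterable of (user_agent, bytes_sent) pairs.
    Returns (agents ranked by total bytes, fraction of all bytes sent to bots)."""
    totals = defaultdict(int)
    for agent, size in hits:
        totals[agent] += size
    total = sum(totals.values())
    bot_bytes = sum(b for a, b in totals.items() if is_bot(a))
    ranking = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    share = bot_bytes / total if total else 0.0
    return ranking, share
```

Sorting by bytes rather than by hits matters here: a crawler that downloads every page once can outweigh many human visitors who only view a few pages.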

To solve this issue you need to create a robots.txt with some parameters, like the ones specified in this link.
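The parameters from that link aren't reproduced here. As an assumption, one common approach is a Crawl-delay for Yahoo's crawler (Slurp): a non-standard directive that Yahoo documented for Slurp but that not every crawler honours. The sketch below uses a hypothetical robots.txt (the /downloads/ path is made up) and checks it with Python's standard urllib.robotparser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Yahoo's crawler down instead of blocking it,
# and keep all robots out of a made-up /downloads/ directory.
# A crawler follows only the most specific group that matches it, so the
# Disallow line is repeated in the Slurp group.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Disallow: /downloads/

User-agent: *
Disallow: /downloads/
"""

def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content directly, without fetching it over HTTP."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp

if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    print(rp.crawl_delay("Slurp"))  # Crawl-delay value that applies to Slurp
    print(rp.can_fetch("Googlebot", "http://www.example.com/downloads/big.zip"))
```

Slowing the crawler down rather than blocking it completely keeps the site in the search index while reducing the bandwidth it consumes.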

Looking at your stats and investigating the full logs can be very useful from time to time...

Supported by hosting

1 comment:

Janick said...

Interesting concerning the Yahoo bot.
But I think it got stuck on dynamic URLs (if you look at the number of hits), so it might be better to have a look at http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-03.html;_ylt=AkE7gZurcP4oHTgip8PLx5CygiN4,
section "Dynamic URL Rewrite".

If you exclude Yahoo from crawling, you will lose your ranking after a while.