Site Security Sanity

I’ve been reading through IncrediBILL’s Random Rants blog, and I just have to love his stance on bad bots, both of the spammy variety and otherwise.  It’s a constant battle, one that I deal with on a daily basis for this site, and a few others.  His vitriol echoes my own sentiments, although I am generally not heard to utter most of it in public.

Notice that I said bad bots, not bots in general.  There are some genuinely good bots out there, like GoogleBot and Yahoo Slurp.  Those actually serve a purpose, don’t destroy your site, and give you something in return (traffic).  Those are the ones I leave alone.

With about five sites under my watch right now, I’ve had to write some of my own tools to collect and analyze logfile data to determine who really is good and who is bad.  I’m starting to think I should write a “bad bot” service and link the sites together under one centralized umbrella.  But that will wait for another time.
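
The tools themselves are nothing fancy; the core idea is just tallying requests per user agent out of the access logs and seeing who floats to the top.  A stripped-down sketch of that idea in Python (assuming Apache’s combined log format, and nothing like the real thing):

```python
import re
from collections import Counter

# The user agent is the last quoted string in Apache's combined log format.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')

def tally_user_agents(logfile_path):
    """Count requests per user agent so the noisy ones stand out."""
    counts = Counter()
    with open(logfile_path, errors="replace") as log:
        for line in log:
            match = UA_PATTERN.search(line)
            if match:
                counts[match.group(1)] += 1
    return counts

if __name__ == "__main__":
    for agent, hits in tally_user_agents("access.log").most_common(20):
        print(f"{hits:8d}  {agent}")
```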

In the meantime, there’s a pre-flight checklist of sorts that I go through before making a site live:

  1. Set up .htaccess with mod_rewrite. This alone is worth its weight in gold.  If you can learn to master its intricacies, you will save yourself a mountain of headaches later.  I find myself constantly going back to update it, but it’s worth it: add one or two lines and BAM, no more bad bot (there’s a sample after this list).  Thank you, Apache gods, for giving us such a wonderful tool in our arsenal!  It’s also quite handy against hotlinkers that steal your bandwidth.
  2. Write a clear, concise robots.txt file. This goes in the root of the site and lets all of the good bots know where to get their cheese (a bare-bones example follows the list).
  3. Enable a form mutator system, also known as a negative captcha. Before this, we were receiving tons of spambot-generated junk.  After installing and running it for over a year, it has proven 100% effective at stopping the spambots, while never once blocking a real person.  In other words, if you really are a human filling out a form, you will pass.  There’s a bit of honeypot code in there as well, which will catch you if you’re a bot.  It’s a two-pronged attack that has proven extremely effective and has given me a lot of insight into the nature of spambotting (there’s a sketch of the idea after the list).
  4. Check all file and directory permissions, from the web root on down. If you don’t think this is worth the bother, think again.  Directory security is one of the most basic things you can do to prevent unwanted intrusions.  Check the permissions on each and every file and directory, see whether they jibe with what they should be, then check them again (a small script for this follows the list).
  5. Keep all password files outside of the web root. Just do it.  Even if it means typing longer absolute paths into your .htaccess files, just do it.  If a password file sits in the web root or anywhere under it, it can potentially be exposed and downloaded (see the example after the list).
  6. Scrub all incoming data. The customer may always be right, but not necessarily about what you want.  Stripping out SQL injection attempts, JavaScript redirects, encoded nasties like malware, and the like can save you and your company endless amounts of time later on.  Sure, it’s more processing and more work up front, but you won’t have to spend ten times that later reconstructing a lost database and explaining to everybody where it went.  And make sure you do all of your validation server-side, even if it means a longer round trip or more processing power; in my experience, it’s not that much more (there’s a sketch of that after the list, too).
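
To give a flavor of item 1, here’s the sort of thing one or two lines of mod_rewrite buys you.  The user-agent strings and the domain are placeholders, not a recommended blocklist:

```apache
RewriteEngine On

# Send known bad bots packing (the user-agent substrings here are just examples).
RewriteCond %{HTTP_USER_AGENT} (EmailSiphon|WebStripper|SiteSucker) [NC]
RewriteRule .* - [F,L]

# Stop hotlinkers from stealing bandwidth (replace example.com with your own domain).
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteRule \.(gif|jpe?g|png)$ - [F,L]
```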
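
For item 2, a robots.txt doesn’t need to be clever; something along these lines (the disallowed paths are just examples) is plenty:

```
# Good bots read this; bad bots ignore it, which is what the .htaccess rules are for.
User-agent: *
Disallow: /cgi-bin/
Disallow: /admin/
Disallow: /tmp/
```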
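
Item 3 sounds more exotic than it is.  A rough Python sketch of the two prongs (the field names, secret, and session salt are made up for illustration, not what we actually run):

```python
import hashlib
import hmac

SECRET = b"change-me"          # assumption: a per-site secret, not our real one
SESSION_SALT = "session-1234"  # assumption: something unique per visitor session

def mutated_name(real_name):
    """Turn a real field name (e.g. 'email') into a per-session alias, so a bot
    replaying canned field names ends up posting into thin air."""
    digest = hmac.new(SECRET, (SESSION_SALT + real_name).encode(), hashlib.sha256)
    return "f_" + digest.hexdigest()[:12]

def looks_like_spambot(form):
    """Two-pronged check: the honeypot field (hidden via CSS) must come back empty,
    and the real fields must come back under their mutated names."""
    if form.get("website", "").strip():        # honeypot tripped: a human never sees it
        return True
    if mutated_name("comment") not in form:    # bot posted the stock field names
        return True
    return False
```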
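
For item 4, even a quick script beats eyeballing a few thousand files.  A rough sketch that flags anything world-writable under the web root (the path is a placeholder):

```python
import os
import stat

WEB_ROOT = "/var/www/example"   # assumption: adjust to your own document root

def find_world_writable(root):
    """Walk the tree and report anything that other users can write to."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if mode & stat.S_IWOTH:
                print(f"{oct(stat.S_IMODE(mode))}  {path}")

if __name__ == "__main__":
    find_world_writable(WEB_ROOT)
```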
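
Item 5, in practice, just means pointing .htaccess at an absolute path Apache can read but no URL can reach.  The paths here are placeholders:

```apache
AuthType Basic
AuthName "Members Only"
# Absolute path to a file that lives outside the document root,
# so it can never be served up as a URL.
AuthUserFile /home/example/private/.htpasswd
Require valid-user
```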
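
And for item 6, the biggest single win is to stop pasting user input into SQL strings and to validate on the server no matter what the browser claims to have checked.  A bare-bones sketch (the table, field names, and checks are invented for the example):

```python
import re
import sqlite3  # stand-in for whatever database you actually use

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def save_comment(db, email, comment):
    """Server-side validation plus a parameterized query, so user input is
    never spliced directly into SQL."""
    if not EMAIL_RE.match(email):
        raise ValueError("that does not look like an email address")
    if len(comment) > 5000 or "<script" in comment.lower():  # crude, but cheap
        raise ValueError("comment rejected")
    # Placeholders let the driver handle quoting; no injection, no surprises.
    db.execute("INSERT INTO comments (email, body) VALUES (?, ?)", (email, comment))
    db.commit()
```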

Those are just the basics right there.  I do have a longer list, which I will post up here later.
