March 27, 2006

Downtime

The Penny-Arcade Forums are down. Now what am I supposed to do at work?

Obviously the addicted hordes must be hounding the sysadmin for updates. So he's turned the placeholder page into a Downtime FAQ.

Okay, update on progress. I uncovered what (I believe) was causing all of our problems. An overzealous RAID controller is trying a bit too hard to ensure data integrity. On periods of high load (ie, when you guys would all hit the forums) the combined load between the disk checks and regular server load was too much for the power supply. The result was the machine writing bad bits into RAM causing a whole host of problems. I need to tune the settings to reduce the priority of those checks, and then I can pick back up with the work I was planning on. The work will end up doing some load distribution that was sorely needed, but I didn't have enough motivation to do before the data started corrupting. After all, nobody likes downtime.

So apparently the thought of a test server never occurred to the guy. The forum's been falling down regularly over the last few months, and he couldn't be bothered to actually do any work on it until the machine started randomly spewing bits and the database blew up. How about some preventative maintenance, eh?

Now, some serious data was lost, and unfortunately MySQL does nothing gracefully, so I am uncertain if I will be able to recover some of those lost posts. Of note the loss of the ban thread saddens me. We maintain an internal list of bans and the reasons behind them for the staff to see, but I liked keeping everyone in the know. There is also a list of users who got deleted accidentally, for my own sake I'll list it here (BucketMan, CovertOperative, SenorAmor, Bogart, & TheSentinel). MySQL told me that we lost at-least 50 users in one of its "table repairs,"

What the fuck is the point of having a database if it can't gracefully recover from failure? The whole theory of relational databases is based upon the idea of data integrity. And it's not really much of a "repair" if it's eating data in the meantime, is it?

Q: If MySQL sucks so much, why not just move to PostgreSQL now and not worry about building a new forum software?

A: A tempting idea. phpBB does have builtin support for PostgreSQL, however, after many years of hacking this code, a lot of the changes we have in place are not built to be database independent. A good example is the jail code Ramius wrote before he left. Ramius wrote some very slick MySQL code which is optimized for MySQL 4.0+, the problem is, the code won't run on a PostgreSQL server. By the time I found all those little broken things and fixed them, I would be close to rewriting everything. The bigger issue is that we would just be delaying the inevitable. phpBB has stalled somewhat from a development standpoint. Despite their recent attempts to resume doing bugfixes and other fairly simple tasks, we need to jump ship soon.

"Very slick MySQL code" that's married to a data-eating system and not portable to other SQL databases? I'd pull that kind of crap if I were writing a one-off program for my own use, but I'd never release that kind of shit into the light of day. And so much for documentation and maintenance: once the last guy leaves, nobody else knows what to do with his fancy hackz.

What it comes down to is this; if I wanted a "traditional" forum software I would just patch phpBB to use postgresql and be done with it. phpBB powers the largest forum on the internet, with some hacks it would serve us too. I want something more though. The current theory in large webforums is to tune your web and database server software and throw more hardware at your problems. The better sites will even introduce things like memcache to scale better. I think that we need to rethink the way forum software is written. We have learned a lot in the past 6 years, and I think it is time we use some of that knowledge. I have nothing against PHP, it is a good language, but Ruby on Rails provides a better framework for what I have in mind.
If you like the idea of helping out with a project like this, shoot me an email. Due to the nature of the project, I am looking for fairly experienced programmers...

Maybe they should do some basic tuning before trying to reinvent the wheel? Even if you have fancy new code, you still need a scalable front-end if you want to grow. There's nothing wrong with a good caching server, and a good caching webserver system can be put in front of whatever forum backend they end up with. Heck, it's not as if Ruby is even all that speedy. One suspects zealotry whenever someone mentions Ruby on Rails as the solution, especially when the guy would rather rewrite entire systems than fix the problem at hand. To add to the zealot-quotient, make it Open Source, but we only want experienced programmers. Because they have nothing better to do.

To-do List

...Rebuild FreeBSD 6.1 Beta - Done!

Because running a beta operating system really helps your stability and uptime.

I'm sure all his friends and cow-orkers worship him as the righteous Geek God, but crap like this makes me wish more work would get outsourced to India so wannabe hackz like him can starve to death on the street or get a real job. Not that our admins are that much better, as our database server blew up last weekend and we couldn't get it fixed for two days because our three-letter-initialed service contractor only works M-F. Oy vey.

Posted by mikewang on 03:33 PM