This is something that's been bugging me for a long time now. Over the years, I've come to realize that programming time is 10% about writing the code to do the work, 70% about figuring out where failures might occur and dealing with them, 10% about documentation, and 10% about documentation. (That last 10% may be substituted with Desktop Tower Defense or something equally time wasting.)
Or something like that. The point is that writing the code to do what I want isn't hard. It's dealing with all the other things that do--especially error conditions. There are so many weird corner cases to consider. And when you're working on code for a high volume web site that has its servers under load 24 hours a day, it doesn't take long to encounter those odd situations.
Murphy is always watching.
Years ago, after battling similar problems at Yahoo, I began to develop certain ideas about how errors should be detected, handled, and reported. An important idea here is that the developer should always be in control of when the script/program/process dies. Aside from something truly fatal (like a segfault) library routines should detect errors and report them back to their caller in the form of a known-to-be-bad return value.
The problem is that I keep running into code I want to use that breaks that rule in multiple places. In Perl terms, that means that I'll be happily testing my code and suddenly something goes wrong and my script dies in a place I didn't expect. Upon digging into it, I find that the CPAN library I'm using has something like this lurking in it:
if (not $good) {
Carp::croak("bad stuff happened!");
}
Or...
if (not $good) {
die "badness here!";
}
Sigh!
This means I have to read the code a bit more and see if I can discern why the developer wants my script to die in some cases, but in others he's content to just do this:
if (not $good) {
$@ = "bad things happened";
return undef;
}
What is it about some errors that makes them fatal while others aren't so bad that I'm deemed able to deal with them? Why has this developer taken that decision away from me? It makes no sense at all.
What this means is that I then need to litter my code with ugly crap like this:
eval {
$object->methodThatMayDie;
};
if ($@) {
# handle error here
}
The problem with that, aside from the fact that I'm dealing with another developer's inconsistent coding, is that it pollutes my code and forces me to make yet another frustrating decision.
Do I use a small number of big eval blocks and give up knowing exactly where the code died? Or do I pollute my code with a larger number of smaller eval blocks so that I can react to specific problems with a more specific solution? That means the module developer would have had to document which methods or functions may die on me. Otherwise I have to go trudging through their code and waste my time figuring that out. Guess which is more frequent.
Or do I override the module's use of die or Carp or whatever. I can do it, but that has other side effects I probably don't want to deal with either.
Why do I even need to deal with this in the first place? Can't people provide consistent interfaces? Is there something so bad about returning an error code and leaving it up to the user of your code to decide how to handle error conditions?
Maybe they do want to exit() or die(). Maybe they want to retry the logic after waiting a bit. Maybe they want to page someone and log the failure. Maybe...
You get the idea.
This whole concept of "fatal" exceptions seems wrong to me. Unless things are so bad that the kernel is going to kill my process, I should be the one in charge of deciding when my code will blow up. And I shouldn't have to do extra work to asset that authority. Should I?
I know that in the Java world, it's common to do a bunch of stuff in a big try block and then try to figure out what, if anything, blew up later. But I'm a firm believer in dealing with specific problems at the exact place they occur.
I really wish more people thought that way. It'd make my life easier.
(comments)