I would like to address some of the issues raised in the recent debate over whether data belongs in tables or not. I think Honroy hit the nail on the head when he wrote, "It's important to consider the likelihood of change as well." But the issues are subtle and complex, which is the reason it has taken me a full day to get around to writing this response.
So let me begin with one of Jeff's responses, since he's the one I'm mainly picking on here :-)
there are no problems. That was an example. If you need historical reporting or tracking, you simply model it that way. As the article clearly says.
Ah, but the devil is in the details. How exactly are you going to model it? If you actually sit down to work it out you will find that the complications mount very quickly. For example, one way to model historical changes is to add effective-start-date and effective-end-date columns. But now you have to deal with the possibility that the time periods could overlap, or that there could be gaps between the periods. (Actually, you already have a similar problem even in the original simple example because there is nothing to prevent shifts from overlapping or having gaps, but that's another can of worms.) If you have an effective-end-date, how do you model the currently effective shift schedule? Do the current shifts have NULL end dates, or end dates far into the future (creating a Y2K-like problem)? What do you do if, due to an operational error, some incorrect data finds its way into the shifts table? What if that incorrect data was used in subsequent dependent calculations? Do you log all your database updates so that you can tell which calculations used the wrong data? Or do you have to go back and recompute everything from scratch?
I'm not saying that these problems can't be solved. Obviously they can. But the structure of the shifts table, and in particular the way in which the contents of the shifts table evolves over time, is such it is not at all clear that storing it in a database is really the best way to do it. The shifts table evolves over time in a very particular way. First, changes are rare. Second, retroactive changes are even rarer, and are generally an indication that some kind of mistake has been made and needs to be dealt with. And third, it is important to know not just what the shift schedule was on date X, but also what the system on date Y thought the shifts shedule was on date X. And it is not at all clear that that information might not be better stored in a config file under revision control.
The devil is always in the details, and it gets worse. What if your data volume is huge and performance is an issue? In cases like that it is not at all clear that it would not be best to hard-code certain rarely-changing parameters directly into the code instead of sticking them in a table and hoping that your database optimizes the join properly. What if you're writing an embedded system and the data is parameters that, if they are wrong, can make something blow up?
There are certainly cases where it is appropriate to put data in tables, and some of Jeffrey's examples (though not all of them) are good examples of such cases. But to extrapolate from there to claim validity for a general rule that "data belongs in your tables, not in your code" is a serious mistake.