Wednesday, July 25, 2007

(Some) Data belongs in revision control

I would like to address some of the issues raised in the recent debate over whether data belongs in tables or not. I think Honroy hit the nail on the head when he wrote, "It's important to consider the likelihood of change as well." But the issues are subtle and complex, which is the reason it has taken me a full day to get around to writing this response.

So let me begin with one of Jeff's responses, since he's the one I'm mainly picking on here :-)

there are no problems. That was an example. If you need historical reporting or tracking, you simply model it that way. As the article clearly says.

Ah, but the devil is in the details. How exactly are you going to model it? If you actually sit down to work it out you will find that the complications mount very quickly. For example, one way to model historical changes is to add effective-start-date and effective-end-date columns. But now you have to deal with the possibility that the time periods could overlap, or that there could be gaps between the periods. (Actually, you already have a similar problem even in the original simple example because there is nothing to prevent shifts from overlapping or having gaps, but that's another can of worms.) If you have an effective-end-date, how do you model the currently effective shift schedule? Do the current shifts have NULL end dates, or end dates far into the future (creating a Y2K-like problem)? What do you do if, due to an operational error, some incorrect data finds its way into the shifts table? What if that incorrect data was used in subsequent dependent calculations? Do you log all your database updates so that you can tell which calculations used the wrong data? Or do you have to go back and recompute everything from scratch?

I'm not saying that these problems can't be solved. Obviously they can. But the structure of the shifts table, and in particular the way in which the contents of the shifts table evolves over time, is such it is not at all clear that storing it in a database is really the best way to do it. The shifts table evolves over time in a very particular way. First, changes are rare. Second, retroactive changes are even rarer, and are generally an indication that some kind of mistake has been made and needs to be dealt with. And third, it is important to know not just what the shift schedule was on date X, but also what the system on date Y thought the shifts shedule was on date X. And it is not at all clear that that information might not be better stored in a config file under revision control.

The devil is always in the details, and it gets worse. What if your data volume is huge and performance is an issue? In cases like that it is not at all clear that it would not be best to hard-code certain rarely-changing parameters directly into the code instead of sticking them in a table and hoping that your database optimizes the join properly. What if you're writing an embedded system and the data is parameters that, if they are wrong, can make something blow up?

There are certainly cases where it is appropriate to put data in tables, and some of Jeffrey's examples (though not all of them) are good examples of such cases. But to extrapolate from there to claim validity for a general rule that "data belongs in your tables, not in your code" is a serious mistake.

26 comments:

Jeff said...

Ok, now i think you've gone off the deep end. Since you are so intent in proving me wrong, let me ask you this one favor. I think you owe me this much:

Please summarize for me, in a sentence or two, what you think my position is on the topic of storing data in tables versus storing data in code.

Thanks in advance.

As for your comments, they aren't even worth commenting on ... they have absolutely no point. I just don't see anything but rambling. One more favor, if you could: in a sentence or two, state your position. I have no idea what your point is; I really don't. I am not trying to be a jerk or a troll, but I jsut simply have no idea what you are disagreeing with me about or what your point is. Of *course* you don't store *every* piece of information that your apps will *ever* use in database!!! who ever said that?

Ron said...

Please summarize for me, in a sentence or two, what you think my position is on the topic of storing data in tables versus storing data in code.

I think your position is that "Data belongs in your tables -- not in your code." That is the title of your original post after all.

One more favor, if you could: in a sentence or two, state your position

My position is that the decision of where to store data is complex, and depends on many factors, not just the structure of the data, but also how it evolves over time, efficiency concerns, security concerns, etc. and that any unqualified rule of the form "Data belongs in X, not in Y" is at best vacuous and at worst actively harmful.

Does that help?

Jeff said...

unfortunately, no. Sounds like you are arguing a title and not any of the words I actually wrote ... I was hoping for maybe an intellgent debate and to learn something. oh well. thanks, anyway.

Ron said...

you are arguing a title and not any of the words I actually wrote

You don't write your own titles? OK, how about your first sentence:

Try your best to store all of the data relating to your database in the tables.

Did you not write those words either?

If you had written something like "Sometimes people put data in code that ought to be in tables instead" we would not be having this discussion. But that isn't what you said. You made an unqualified recommendation to put data in tables irrespective of any other considerations, and that is bad advice. (I see in the comments on your blog that you are trying to backpedal a bit, but that doesn't change the fact that your original post is an unqualified endorsement of putting data in tables, no ifs ands or buts.)

Jeff said...

One last thing: re-reading your post for the 3rd time to find a point, I think I may have found it.

Here's the gist of our discussion:

I say: We are working with a database, we have data in our tables already, shifts are data, so let's model them as a shifts table instead of writing IF-THEn and CASSE statements throughout our code. now we can join to that table, track changes, descriptions, etc all in one place in our data. It is faster, shorter, ultimately much simpler.

You say: no, because that will require a lot of work. I might model it wrong. I am not sure how to handle effective dates or history. The constraints sound confusing. the SQL might get confusing or long or inefficient, the joins might not be easy. The data might get edited or corrupt or changed.

I say: if that stuff is difficult for you, then you shouldn't be putting ANY data in a database! Why are you writing a data-driven application in the first place? If you cannot handle modeling shifts, how are you modeling your employees and rates and customers and everything else? Learn database modeling. Lean how to create constraints. Learn permissions. Learn how to write efficient joins. In short, learn SQL and how relational databases work!

* * * * *

I can't argue your point if it is that storing any data in a table is not easy and straightforward, and that is requires planning and knowledge of relational databases. Of course it does! I thought it was implied that we understood how to use databases, since my entire article focused on T-SQL code and data-driven applications!! If *you* don't know SQL, or *you* are not comfortable modeling data in a database, then *of course* don't stuff things in random SQL tables in your applications! Who ever suggested that ????

It's funny .. you are presenting my point of view as if I am saying this:

"You are writing an application without a database? what?? use a database! I see data there! get it out of your code! All data must go in a database! I don't care if you know SQL or not, use a database!! Even a poorly modeled database is better than no database at all! All data always belongs in tables and nowhere else ever!"

Ok, that's enough.

Jeff said...

yes, that is correct. The first line states:

Try your best to store all of the *data* relating to your *database* in the tables.

Not application. DATABASE!!

Ron said...

Not application. DATABASE!!

Yes, I see that. It doesn't alter my position in any way. In fact, it just makes your advice even more vacuous because trying to figure out whether something is "related to your database" is even harder than trying to figure out whethr it's code or data. Everything is related to everything else in one way or another. In effect, your first sentence could be restated as "Put things in the database that ought to be there." I can't really dispute that, but it's not very helpful either.

Ron said...

You say: no, because that will require a lot of work. I might model it wrong.

Not quite. I say: not necessarily because of all the reasons you cite, and others.

I can't argue your point if it is that storing any data in a table is not easy and straightforward, and that is requires planning and knowledge of relational databases.

No, that is not my point. My point is that deciding where to put data is a complex decision that has many factors that ought to be considered, and any simple rule like the one you propose is as likely to make things worse as it is to make them better.

Go back and re-read the parable of the razor.

Jeff said...

Ah, I get it. You have no interest in my point of view, my experience, my opinion or thoughts or what I was trying to write; no interest in determining the truth of the matter or coming up with a solid understanding of the matters at hand, or listening to my subsequent comments or clarifications. Your only goal is to simply attack the wording of a sentence or two in a random blog post. Well, you won. I didn't word that sentence perfectly. There was room for interpretation. You have shamed and embarrassed me. I admit defeat. My writing skills are not infallible.

I'll be sure to fire my editor over this one. thanks for wasting my time.

Ron said...
This comment has been removed by the author.
Ron said...

[Reposting to fix a typo]

You have no interest in my point of view

It seems to me that train runs both ways.

listening to my subsequent comments or clarifications

This is actually a perfect example of what I'm talking about. We're conducting a debate in two different blogs using a combination of postings and comments (and there's a branch over in Reddit too, and for all I know one in Digg as well). The result is a great deal of confusion and heat, much of which is due to (I would claim) the fact that the structure of blogs is a poor match for the structure of the temporal evolution of our particular debate (since people successfully conduct debates in blogs all the time). My only claim is that getting a good impedance match between data storage mechanisms and the demands of a particular problem domain is a subtle art. And the ironic beauty is that the more this argument spins wildly out of control the more it proves my point.

Jeff said...

Sorry, just saw this:

In fact, it just makes your advice even more vacuous because trying to figure out whether something is "related to your database" is even harder than trying to figure out whethr it's code or data. Everything is related to everything else in one way or another. In effect, your first sentence could be restated as "Put things in the database that ought to be there." I can't really dispute that, but it's not very helpful either.


Holy. crap. so in other words, you *did* understand what I wrote. because how you suggest I can reword that sentence because it has the same meaning is the EXACT meaning I wished to imply! It was just a throwaway intro line to a blog post! "Put stuff in your database that ought to be there." And then i gave some examples for ideas! that was the ENTIRE point of my ENTIRE post!

So, if I had written a post that started like this:

"Take the time to make the most of your day!"

And then listed some "stop and smell the roses" examples, I could expect someone like you to reply:

"That is the worst advice I ever read. How much time should be taken? An hour? 13 hours? the entire day? And which days of the week? Just weekends? Maybe all of them? if so, when do you have time to work! you will lose your job! And smelling roses -- what if you have an allergy? What if you get stung by a bee? Sure, you *could* say take time from a busy day to enjoy yourself, but who defines enjoy? What if you enjoy murdering people? Should you do that? This blog post is harmful and if you blindfully follow this author's irresponsible advice, you will end up not only will no job, no family, no money, but also in prison with lots of bee stings. "

The more I think about this, the funnier it gets .... so, in retrospect, I will admit that while you wasted my time from an intelligent database/application design perspective, at the very least you did provide me with a good laugh! That's about all this debate was good for, unfortunately, but better than nothing.

The internet sure is lucky to have semantic police like yourself out there, keeping us all honest!

Jeff said...

>>You have no interest in my point of view

It seems to me that train runs both ways.


I have to respond to that. I've asked over and over and over for clarifications on your part .. questions that you did not answer ... tried hard to find ways to reword questions and statements to get you to make a valid, concrete point. I am (was) VERY interested in your opinion and your thoughts, it intrigued me greatly how anyone take your point of view on a simple back-and-white Shifts table example and claim that it should not be stored in a database. You, however, just keep fixating on one or two sentences ....

OK -- I promise -- I really am gone this time. I swear!! :)

Ron said...

you *did* understand what I wrote

Glad you noticed :-)

So, if I had written a post that started like this...I could expect someone like you to reply...

Nope. If your post had started like that I would not have bothered to respond because it would have been self-evident that it was not particularly useful advice.

The internet sure is lucky

I don't think it's "going off the deep end" as you put it to imagine that some impressionable young hacker might read your blog and suppose that you actually meant what you actually wrote, and then look at your qualifications and suppose that it might actually be a good idea to follow what you actually said (instead of what you now say you meant to say). Notwithstanding your sarcasm, I will consider challenging you on this to have been time well spent, especially if it makes even one person think twice about blindly following a simple and intuitively "obvious" coding rule.

Ron said...

I've asked over and over and over for clarifications on your part .. questions that you did not answer

Sorry, I really have done my best to answer all your questions, but there's a lot of comments flying back and forth here and I might easily have overlooked one. I have restated my position (several times) and told you what I think your position is, mostly by quoting you directly. If I have overlooked a question I apologize. If you would tell me which question you think I haven't answered I'll take another shot at it.

Jeff said...

...suppose that it might actually be a good idea to follow what you actually said (instead of what you now say you meant to say).
Are you for real?

*You* just TOLD me what I said: "Put things in the database that ought to be there." And *you* even admitted that it's not bad advice!! That's ALL I said!! Arggghh!! You acknowledged that's how my exact words translate yourself! And then the post followed with some simple "food for thought" examples ... the entire post was to get you stop and think objectively about your code and decide if things can be done differently, that's it! Why do you claim I am stating a rigid manifesto when you even SAID that what I wrote directly translates to just a simple, general introduction!!

I am more concerned about the impressionable young hacker who has learned this from you: "sure, we already have a solid database with hours and employees and full of data, but don't put Shifts in there! Code that concept in your application instead -- shifts with titles and assignments and history do not belong in your database."

What about the impressionable young hacker who reads THAT? Any concern for him? Why should he store employees in his database? Or hours? Or managers? Or stores or buildings or anything else? All of those things might change, right? He needs to track history on all of those, but that is tough to do in a database, right? The only thing anyone has learned from you, the only concrete ideas and thoughts other than general criticism of my wording, is that storing stuff in a database is difficult and maybe be easier to stuff things in your code. That is the VERY DEFINITION of someone who doesn't understand data modeling.

Ron said...

*You* just TOLD me what I said: "Put things in the database that ought to be there."

No, I said that is how it *could* be restated. I was trying to do you the courtesy of interpreting it in a way that didn't make it vacuous.

That's ALL I said!!

OK, as y ou wish.

And *you* even admitted that it's not bad advice!!

No, I didn't say it wasn't bad advice, I just said I couldn't dispute it. The reason I can't dispute it is that it's tautological, so it's not wrong, it's just vacuous. I'll leave it up to the audience to decide whether or not it's "bad."

Jeff said...

>>No, I said that is how it *could* be restated. I was trying to do you the courtesy of interpreting it in a way that didn't make it vacuous.

Yes. And as I said over and over and over, and as the entire rest of my article clearly indicated, that is HOW IT WAS MEANT. It was a generic, boring, plain, INTRO sentence to a "food for thought" article that encourage people to examine their code to see if there were alternate ways to do things! How can I get you to understand this??

It's funny. As you know doubt realize, I am not an English expert, don't claim to be, and I don't fixate and obsess over minor wording of phrases. (unless it involves code, of course!) Lots of times I write things in which something is not worded clearly, or in which a situation I did not think of might alter the best way to do something that I discuss. And you know what most people do? They add a comment: "Hey, Jeff -- that first part wasn't clear. Do you mean ALWAYS do that, or just sometimes? Because what about this case..."

Happens all the time. that's what most normal, polite people do who are interested in learning and discussing and not picking apart grammar and language without attempting to understand what I am saying. And we usually have a great discussion, I (try my best to) address or clarify in the feedback, sometimes update or amend my original post, and we all learned something.

Sometimes, if the issue is more complicated or subtle, someone might reference my blog post in their blog and point out some alternate scenarios or situations that I did not account for. Those are GREAT posts as well, and I enjoy them, and again, we all benefit. I link to them, they link to me, together we work together to not obsess over wording or which title is more logically correct, but we discuss actual SQL or coding or situations and everyone benefits. Even if they strongly disagree with my entire post, but they back it up with facts and good arguments and it makes me feel silly, it all works and in the end I have learned something new.

That's the idea, right? not to pick fights, not to draw conclusions from a single vague title or sentence but to take the time to understand what the author is saying, and to be part of a community that wants to learn SQL and make themselves better programmers. Right?

Unfortunately, what did you decide to do?

* You call me out by name and use a clumsy analogy of a bad product design to compare my thoughts and ideas
* You misrepresent my post and my viewpoint, with NO attempt ahead of time to learn what I am *actually* saying -- by asking me directly or simply carefully reading my article
* You state that it is BAD advice that should not be followed
* You call me irresponsible
* You imply that I am stating hard and fast rules that apply to all situations and all applications with rigid authority -- yet my post NEVER even comes CLOSE to saying any of that, regardless of how YOU chose to personally interpret a sentence
* you say I did not back up my statements with anything than examples, and yet YOU don't back up any of your statements by doing anything OTHER than referencing my example! And you did such a bad job, and picked such a bad example, and ignored not only what I wrote but also basic data modeling fundamentals, which ironically proved you 100% wrong!
* you gave the following reason to not always store data in a database : it might be difficult. that was it. it might be hard and it will take some amount of planing to do it right. (unlike all those other tables in the database, right??)
* you suggest that following my advice means that we should store SQL statements and lines of code and everything else we can possibly find n our database. (EVERYTHING is data, right? That's what I wrote apparently.. hmmm ... I can't find that sentence)
* you made sure to warn again of my dangerous advice towards the end
* finally, you conclude: "just because something looks like data, doesn't mean it belongs in a table." yet not one single real example was given why that took actual intelligent data modeling and the fact that we are writing database driven code into account. Not one. That assertion by YOU, the strongest one made in either post, was completely lacking in any facts or details or examples to back it up that took into account actual data modeling for database driven applications, which was the environment that my original article took place it. (all my examples where in T-SQL! On the SQLTeam website! Is that not a clue that perhaps it implies a database is already present and we have some knowledge of basic data modeling??)
* you ignore key points in my post and basic database fundamentals in your argument against me, and then brush them off when brought to your attention as if they were not relevant.

I was trying to think about why I was getting so worked up by you, and this is why. I love feedback. I love criticism and advice on ways to do things better. But what you did is the very definition of the word trolling.

And, worse -- EVERY TIME I asked you a real question that had ANYTHING to do with databases in any of the follow ups, you either a) ignored it or b) answered in a manner that shows absolutely no skill or basic knowledge whatsoever in database modeling. None. I hate to offend you, but that's a fact until you prove otherwise. And your concerns about not modeling things in a database because of the inherent difficultly, or Y2K-Like problems (!) only backs that up.

If you want to stop obsessing over how *you* interpreted something and keep arguing over it, and if you want to stop clinging to a sentence or two and ignoring every other word that I've written every where else to "prove me wrong", then you can keep continuing what you are doing and have at it. Sounds like you enjoy it.

If you ever want to intelligently discuss database design modeling, with actual examples, let me know. The one caveat: we will not disregard proper modeling of data in our database for reasons such as "it might be difficult". As I said over and over and over (which you ignore over and over), if you can't model an entity such as "Shifts" in your database in an acceptable matter then you have no business storing ANYTHING -- customers, offices, hours, managers, etc -- in a database, either. You absolutely should *not* be storing *any* data in your database if you have no idea how to do it.

. . .

(prediction: every sentence in this post relating to databases and data modeling will be ignored.)

Jeff said...

As you know doubt realize, I am not an English expert, don't claim to be, and I don't fixate and obsess over minor wording of phrases.

I wish I could say I wrote it that way on purpose to illustrate my point, but no such luck!

please ignore ... meant "no" as I'm sure you realize.

Ron said...

Dude, you are taking this much too seriously. I'm not trying to "prove" anything, I'm just stating my opinion. It's just a blog.

I'm really sorry if this has caused hard feelings, that was not my intent.

As you no doubt realize, I am not an English expert

Actually, no, I didn't realize that. I think your English is just fine. (Is English not your native language? If not, what is, if you don't mind my asking?)

every sentence in this post relating to databases and data modeling will be ignored

It's hard to tease out the substance of what you say from all the histrionics. The fact of the matter is that most of the charges you levy are false (i.e. that I didn't give examples, that I didn't answer your questions, that I didn't read your post), but I don't have time to refute you point-by-point. I'll just say that if your message really was intended to be, "Put into the database what is appropriate to put in to the database" then I think it is not inappropriate to respond by pointing out that figuring out what is appropriate to put into the database can be really really hard, and hence that advice is not as useful as it might appear to be at first glance (and moving data from code to DB can in some cases cause more problems than it solves -- c.f. the Slim Twin tab).

Once again, I apologize if the style of my response caused hard feelings. But I stand by its substance.

Jeff said...

I win! You didn't address a single database modeling question or point! :)

Anyway,

I'll just say that if your message really was intended to be, "Put into the database what is appropriate to put in to the database" ...

As stated for the 10th time or so now, that WAS my point, and you even admitted that my very words can be translated to that EXACT thought.


.. then I think it is not inappropriate to respond by pointing out that figuring out what is appropriate to put into the database can be really really hard, and hence that advice is not as useful as it might appear to be at first glance ...


which is a fine, but that is not how you responded. You did not state it as an opinion at all; you first misrepresented my post, and then stated over and over how WRONG and misleading my post was. And then you did not give one single good example of WHY or HOW it is wrong OTHER than to use a lack of basic database knowledge (data modeling, security, planning, etc) as a reason. And your chosen example was horrendous -- to imply that we can and should model all that other data in our database, and write SQL against that data, but not store shifts in a table along with everything else because "modeling it can be tricky" ... unlike when we modeled the data in all of those other tables, right?


(and moving data from code to DB can in some cases cause more problems than it solves -- c.f. the Slim Twin tab).


which again you did not back up with one, single fact that had ANY basis in data modeling. Not one. if you gave ONE solid scenario where storing appropriate data in a properly modeled database causes "problems", and where instead embedding it into your code solves those problems, then you'd be somewhere. But not even close. Your only example backfired horribly.

Once again, I apologize if the style of my response caused hard feelings. But I stand by its substance.


I still haven't decided which was worse -- your style, or the substance. Both were equally shallow and equally wrong on so many different levels.

Do you know how frustrating it is to read your post, and for you to tease your readers with the "big reason why Jeff Smith is wrong ... but keep reading, I won't spoil it yet!" and then when your big reveal occurs: "jeff is wrong because ... shifts might change!" Especially when it is pure nonsense because you didn't bother to closely read what I wrote in the *very* *next* *paragraph* that addresses that!? Is that good writing? Is that a good response to my article? Was that responsible blogging (if there is such a thing!) ? Are you proud of that?

Can you see no reason why i would be annoyed or take things personally?

Ron said...

You did not state it as an opinion at all

Everything in my blog is my opinion.

I win!

Indeed. Congratulations.

Jeff said...
This comment has been removed by the author.
Jeff said...

>I win!

>>Indeed. Congratulations.

hmmm ... That kind of sums the whole thing up, doesn't? I write several paragraphs in an attempt to convey a somewhat complex idea or concept, yet you completely ignore all of it except for a tiny segment that you (purposefully?) interpret literally and out of context.

That kind of neatly recaps this entire episode, doesn't it?

niv said...

Chill out, people.
a. jeffrey, you didn't say "everything is data", ron did (if I read correctly).
b. the line between data and code is fine indeed.
c. should you store queries in tables, because code is data? maybe. There are times when storing queries in tables (kind of like a vector/list/collection of functions, something which I'm fond of) is a good idea.

A. Nonny Mouse said...

<snarkiness>I wonder if Jeffrey knew who he was talking to if he would be slightly more respectful of Ron's thoughts about software design...
</snarkiness>