"So let's go back to the input validation", Mr. X said. "How do you want to do the validation of the names now?"

"Well", I said. "I guess we should still not allow characters that are not commonly used in names, because that makes them invalid as names."
"Correct", Mr. X replied. "But how do you know which characters are not allowed?"
"Well, for one, characters like slashes and plus signs shouldn't be allowed."
"Yes, but how do you know you've covered them all?"
"Ehm...", I thought about this for a while.
"Think about the item numbers", Mr. X said. "All characters other than numbers are not allowed, right?"
"Yes?"
"So instead of specifying the characters that are not allowed, you could...?"
"Specify the characters that are allowed!"
"Exactly. Specifying the allowed input is called whitelisting. Specifying the input that is not allowed is called blacklisting. You should always prefer whitelisting over blacklisting if possible. That way you won't accidentally allow malicious input you didn't think of. I think you are ready to work on the input validation code now."
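
What Mr. X describes could look roughly like the sketch below. It is only an illustration in Python, not the code from the story: the function names, the length limits and the exact set of allowed name characters (ASCII letters, spaces, hyphens and apostrophes) are assumptions made for the example, and a real application would probably need a wider set to cover international names.

```python
import re

# Whitelists: each pattern describes what IS allowed; everything else is rejected.
# The character sets and length limits are assumptions made for this sketch.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z' -]{0,63}")
ITEM_NUMBER_PATTERN = re.compile(r"[0-9]{1,10}")


def is_valid_name(value: str) -> bool:
    """Return True only if the whole string consists of whitelisted characters."""
    return NAME_PATTERN.fullmatch(value) is not None


def is_valid_item_number(value: str) -> bool:
    """Return True only if the whole string is made of 1 to 10 ASCII digits."""
    return ITEM_NUMBER_PATTERN.fullmatch(value) is not None
```

Note that the apostrophe is on the whitelist, because it does occur in real names. Validation alone therefore does not make a value safe to paste into a SQL statement.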

Mr. X left my office, and I was eager to start implementing the fix. I started off by writing some unit tests for my valid and invalid names and order items, and implemented the input validation routines accordingly. I also added my quote escaping routine, and used it wherever I saw a SQL statement.

Forty minutes or so later, Mr. X came by my office.
"How is it going?"
"Well, I'm done with a lot of the inputs, but I'm having problems with this one", I said and opened the order registration form. "At the bottom here there is a field where users can add comments to their orders, and I find it hard to figure out which characters I should allow..."
"Aha", said Mr. X. He didn't sound surprised. "Comments are really hard to validate, because to validate them you would almost have to implement a natural language parser. Comments can contain things like mathematical expressions, so the set of characters that may appear in a comment is quite hard to determine. This is why it's necessary to use both input validation AND output escaping."
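
To make the combination of the two concrete, here is a minimal sketch in Python. It is not the code from the story. It assumes a hypothetical `order_comments` table and a hypothetical length limit, and it swaps the hand-written quote escaping routine for the parameter binding of Python's `sqlite3` module, with `html.escape` handling the output when the comment is shown on a web page.

```python
import html
import sqlite3

MAX_COMMENT_LENGTH = 1000  # hypothetical limit chosen for this sketch


def is_valid_comment(comment: str) -> bool:
    """Loose validation: limit the length and reject control characters only."""
    if len(comment) > MAX_COMMENT_LENGTH:
        return False
    return all(ch.isprintable() or ch in "\n\t" for ch in comment)


def store_comment(conn: sqlite3.Connection, order_id: int, comment: str) -> None:
    """Output to SQL: the comment is bound as a parameter, never concatenated."""
    conn.execute(
        "INSERT INTO order_comments (order_id, comment) VALUES (?, ?)",
        (order_id, comment),
    )


def render_comment(comment: str) -> str:
    """Output to HTML: escape the comment at the moment it is written out."""
    return "<p>" + html.escape(comment) + "</p>"
```

The validation stays deliberately loose, so a comment like `if x < 3 then y = 'a'` is accepted and stored unchanged. What makes it harmless is that each output (the SQL statement, the HTML page) escapes it according to its own rules.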

Continue to part 6...
