Many web sites have SQL-injection and XSS (Cross Site Scripting) vulnerabilities, and security articles often mention lack of input validation as the reason for these problems. This isn't necessarily correct.

The metacharacter problem
Both SQL-injection and XSS are metacharacter problems. A metacharacter is a character that some part of the system treats as a control character rather than as plain data, using it to control the display or flow of data. These problems can occur every time a system communicates with a system of a different flavour, be it a browser, a database or a legacy system.

Why input validation can fail to solve these problems
Consider a blogging system allowing users to post comments on the entries. While this is a simple system, it contains enough functionality to illustrate the point. The blogging system has a comment form containing the fields: name, e-mail, headline and comment body.

Let's start with the name field. To avoid XSS or SQL injection, input validation needs to block any characters which are not used in a name. Following best practice, we create a white list of allowed characters. Creating this white list would be easy if all names had the same format ("Joe Smith"), but a problem arises when Conan O'Brian comes along. The ' in his name is a SQL metacharacter, and it can also delimit strings in JavaScript or attribute values in HTML tags. So how do we handle this using input validation? We have to allow this character. We also have to allow foreign names, which do not necessarily follow standard formats.
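To make the O'Brian case concrete, here is a minimal sketch in Python. The white list pattern, the comments table and the helper names are my own assumptions for illustration, not part of any particular blogging system. It shows a name that passes a reasonable white list and still breaks a query built by string concatenation.

```python
import re
import sqlite3

# Hypothetical white list: words made of letters (any alphabet),
# separated by a single space, hyphen or apostrophe.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '\-][^\W\d_]+)*")


def is_valid_name(name: str) -> bool:
    """Return True if the name passes the (assumed) white list."""
    return NAME_PATTERN.fullmatch(name) is not None


def naive_insert_sql(name: str) -> str:
    """Deliberately unsafe: builds the statement by concatenation."""
    return "INSERT INTO comments (name) VALUES ('" + name + "')"


def naive_insert_fails(name: str) -> bool:
    """Run the concatenated statement against an in-memory database
    and report whether SQLite rejected it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (name TEXT)")
    try:
        conn.execute(naive_insert_sql(name))
        return False
    except sqlite3.OperationalError:
        return True
    finally:
        conn.close()
```

With these helpers, is_valid_name("Conan O'Brian") is true, yet naive_insert_fails("Conan O'Brian") is also true: the validation did its job and the query still broke, because the apostrophe is legitimate data for the application but a metacharacter for the database.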

Next, consider the comment field. If a user wants to post some code to show how a certain piece of JavaScript is written, maybe the blog should allow that. This means input validation cannot remove the characters used for XSS, or for SQL injection for that matter.

So what do people do? During input validation, many people escape the HTML in the data, or escape the quote characters (', "). But is this really the best solution? Why should a comment exist in escaped format inside the application? Java, C# or whatever language you are using does not require the data to be escaped while it is held in a string. You can quickly run into trouble when you start displaying data from multiple sources. What data is escaped, and what data isn't? Is all data escaped? What is it escaped for? HTML? SQL? Both? Some weird legacy system?
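A small sketch of the confusion, assuming Python and its standard html.escape function. The function names escape_on_input and render_html are hypothetical. If one part of the application escapes on input and another, not knowing that, escapes again when rendering, the visitor sees the escape codes instead of the name.

```python
import html


def escape_on_input(raw: str) -> str:
    """What many applications do: HTML-escape the value as it arrives."""
    return html.escape(raw)


def render_html(value: str) -> str:
    """A template that (sensibly) escapes everything it prints."""
    return "<p>" + html.escape(value) + "</p>"


# The stored value is now tied to HTML, even though the application
# may later need it for SQL, e-mail or a legacy system.
stored = escape_on_input("Conan O'Brian")

# Escaped twice: the browser will display the entity text literally.
double_escaped = render_html(stored)

# Escaped once, at output time: the browser displays the name correctly.
escaped_once = render_html("Conan O'Brian")
```

Here stored is "Conan O&#x27;Brian", double_escaped is "<p>Conan O&amp;#x27;Brian</p>" and escaped_once is "<p>Conan O&#x27;Brian</p>". Only the last one shows up in the browser as the name the user typed.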

The solution
Don't get me wrong. I still think input validation should be present in every application. But to avoid metacharacter problems, data needs to be escaped when it leaves the system, not when it enters it. This means that the web application needs to escape data just before sending it to the database (preferably by using prepared statements) or to a legacy system. Data presented on an HTML page needs to be escaped when it is written to the HTML page. And best practice for escaping should be "escaping by default", which means you need a reason whenever you print unescaped data.
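As a minimal sketch of this approach, assuming Python with its standard sqlite3 and html modules (the comments table and the function names are hypothetical): the data is stored exactly as the user typed it, the database boundary is handled by a parameterized statement, and the HTML boundary is handled by escaping at the moment of output.

```python
import html
import sqlite3


def save_comment(conn: sqlite3.Connection, name: str, body: str) -> None:
    """Leaving the system towards the database: the ? placeholders let
    the driver keep data and SQL apart, so no manual escaping is needed."""
    conn.execute(
        "INSERT INTO comments (name, body) VALUES (?, ?)", (name, body)
    )


def render_comments(conn: sqlite3.Connection) -> str:
    """Leaving the system towards the browser: every value is escaped
    as it is written into the HTML, by default."""
    rows = conn.execute(
        "SELECT name, body FROM comments ORDER BY rowid"
    ).fetchall()
    return "\n".join(
        "<div><b>{}</b>: {}</div>".format(html.escape(name), html.escape(body))
        for name, body in rows
    )


def demo():
    """Store a name with an apostrophe and a comment containing script
    tags, then return both the stored row and the rendered HTML."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (name TEXT, body TEXT)")
    save_comment(conn, "Conan O'Brian", "<script>alert(1)</script>")
    stored = conn.execute("SELECT name, body FROM comments").fetchone()
    page = render_comments(conn)
    conn.close()
    return stored, page
```

Calling demo() returns the stored row unchanged, apostrophe and script tags included, while the rendered page contains only the escaped forms. Inside the application the data is just data; it is adapted to each receiving system at the point where it is handed over.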

In the figure on the right, I have marked where input validation (blue) or output escaping (red) should be performed in a web application.

For more information on the metacharacter problem, please check out "Innocent Code" by Sverre Huseby.


VidarK

Good post

Very useful clarification. :-)
Jack

Must read for everyone

It's amazing how such a simple concept escapes most people. In a company audit, I had to painstakingly explain to the auditor why we do not escape user inputs. It's amazing that most security websites still teach you to cripple your product by taking out "dangerous" punctuation.
Dan Bergh Johnsson
I am going to present at OWASP Europe 2010 on this and how it can be addressed using a domain driven design approach. I would like to acknowledge your post with a citation to "preceding work", and will do so unless you think it unwanted for any reason.

Some of my thoughts on the issue:
http://dearjunior.blogspot.com/2010/06/problem-with-xss.html
http://dearjunior.blogspot.com/search/label/domain%20driven%20security

Erlend
@Dan Bergh Johnsson: Please feel free to do that. I'm honoured :-) There's a good chance I'll attend your talk.
mehdy
oh
RahulDash
nice article