I just want to understand the thinking here and arrive at a correct and accepted approach to this issue. For context, this is in a web environment and we are talking about escaping on input to the database.
I understand many of the reasons behind not escaping on input when taking user input and storing it into a database. You might want to use that input in a variety of different ways (as JSON, as SMS etc) and you also might want to show that input to the user in its original form.
Before putting anything into the database, we make sure there are no SQL injection attacks, to protect the database.
However, the principles set out here and here suggest saving user input as is. This user input might not be an SQL injection attack, but it could be other malicious code. In these cases, is it OK to store JavaScript-based XSS attacks in the database?
I just want to know if my assumptions here are correct: are we all fine with storing malicious code in the database so long as that malicious code doesn't directly affect the database? Is it a case of it not being the database's problem, so that it can hold this malicious code and it's up to the output device to avoid the pitfalls of the malicious code?
Or should we be doing more escaping on input than these principles suggest? Do the security concerns come before the idea of escaping on output? Should we take the approach that no malicious code enters the database? Why would we want to store malicious code anyway?
What is the correct approach for saving malicious code into a database in the context of a web client/server environment?
[For the purposes of this I am ignoring any sites that specifically allow code to be shared on them, I am thinking of "normal" inputs such as Name, Comment and Description fields.]
1 Answer
Definition: I use the term "sanitize" instead of filter or escape, because there's a third option: rejecting invalid input. For example, returning an error to the user saying "character ‽ may not be used in a title" prevents ever having to store it at all.
saving user input as is
The security principle of "defense in depth" suggests that you should sanitize any potentially malicious input as early and as often as possible. Whitelist only the values and strings useful to your application. But even if you do, you'll still have to encode/escape these values too.
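As a rough illustration of the "reject" option, here is a minimal sketch of whitelist validation. The function name validate_title and the particular set of allowed characters are assumptions made up for this example, not rules from any specific application.

```python
import re

# Hypothetical whitelist: letters, digits, spaces and a little punctuation,
# between 1 and 100 characters. Adjust to what your application really needs.
TITLE_PATTERN = re.compile(r"[A-Za-z0-9 .,'!?-]{1,100}")


def validate_title(title: str) -> str:
    """Return the title unchanged if it matches the whitelist, else raise."""
    if not TITLE_PATTERN.fullmatch(title):
        # Rejecting means nothing questionable is ever stored for this field.
        raise ValueError("title contains characters that may not be used")
    return title
```

A title that passes this check still has to be escaped when it is later written into HTML, SQL, or anything else; validation only narrows what can arrive.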
Why would we want to store malicious code anyway?
There are times where accuracy is more important than paranoia. For example: user feedback may need to include potentially disruptive code. I could imagine writing user feedback that says, "Every time I type %00 as part of a wiki title the application crashes." Even if wiki titles don't need the %00 characters, the comment should still transmit them accurately. Failing to allow this in comments prevents operators from learning about a serious issue. See: Null Byte Injection
up to the output device to avoid the pitfalls of the malicious code
If you need to store arbitrary data, the correct approach is to escape as you switch to any other encoding type. Note that you must decode (unescape) and then encode (escape); there is no such thing as non-encoded data - even binary is at least big-endian or little-endian. Most folks use the language's built-in strings as the 'most decoded' format, but even that can get wonky when considering Unicode vs ASCII. User input in web applications will be URL-encoded, HTTP-encoded, or encoded according to the "Content-Type" header. See: http://www.ietf.org/rfc/rfc2616.txt
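To make the decode-then-encode step concrete, here is a small sketch using only the Python standard library. The helper names form_value_to_text and text_to_html are made up for this example, and it assumes the input arrived as a URL-encoded form value and is headed for an HTML page.

```python
import html
from urllib.parse import unquote_plus


def form_value_to_text(raw: str) -> str:
    """Decode a URL-encoded form value into a plain Python string."""
    return unquote_plus(raw)


def text_to_html(text: str) -> str:
    """Encode a plain string for HTML element content or a quoted attribute."""
    return html.escape(text, quote=True)


raw = "Tom+%26+Jerry+%3Cscript%3E"   # as it might arrive in a form body
text = form_value_to_text(raw)        # "Tom & Jerry <script>"
fragment = text_to_html(text)         # "Tom &amp; Jerry &lt;script&gt;"
```

The plain string in the middle is the form worth storing; each output gets its own encoding step at the point where the data leaves for that output.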
Most systems now do this for you as part of templating or parameterized queries. For example, a parameterized query function like Query("INSERT INTO table VALUES (?)", name) would prevent the need to escape single quotes or anything else in the name. If you don't have a convenience like this, it helps to create objects that track data per encoding type, such as HTMLString with a constructor like NewHTMLString(string) and a Decode() function.
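Both ideas can be sketched in Python. The parameterized query uses the standard sqlite3 module with an in-memory database; the HTMLString class is a hypothetical helper written for this example (loosely mirroring the NewHTMLString / Decode names above), not part of any library.

```python
import html
import sqlite3


class HTMLString:
    """Wraps text that is known to be HTML-encoded (hypothetical helper)."""

    def __init__(self, encoded: str):
        self._encoded = encoded

    @classmethod
    def from_text(cls, text: str) -> "HTMLString":
        """Encode a plain string; plays the role of NewHTMLString(string)."""
        return cls(html.escape(text, quote=True))

    def decode(self) -> str:
        """Return the original plain string."""
        return html.unescape(self._encoded)

    def __str__(self) -> str:
        return self._encoded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

name = "Robert'); DROP TABLE users;--"
# The driver passes `name` as data, so its quotes never become SQL syntax.
conn.execute("INSERT INTO users VALUES (?)", (name,))

stored = conn.execute("SELECT name FROM users").fetchone()[0]
safe_for_page = HTMLString.from_text(stored)
```

Here the database holds the name exactly as typed, and the escaping happens only when the value is wrapped for HTML output. Keeping the encoded form in its own type makes it harder to escape twice or to forget to escape at all.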
Should we take the approach that no malicious code enters the database?
Because the database cannot determine all future possible encodings, it is impossible to sanitize against all potential injections. For example, SQL and HTML may not care about backticks, but JavaScript and bash do.
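A short sketch of why one sanitizing pass at input time cannot cover every destination: the same stored string needs different treatment for HTML, JSON, and a shell argument. The sample value is invented for illustration; the functions are from the Python standard library.

```python
import html
import json
import shlex

stored = "`id` <b>&</b>"  # made-up value, stored exactly as entered

# HTML escaping handles <, > and & but leaves backticks alone.
as_html = html.escape(stored)

# JSON encoding wraps the value in double quotes and escapes what JSON needs.
as_json = json.dumps(stored)

# Shell quoting wraps the value so bash does not run `id` as a command.
as_shell_arg = shlex.quote(stored)
```

Each encoder only knows the rules of its own target, which is why the escaping belongs at the point of output rather than in front of the database.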