Kingdom: Input Validation and Representation

Input validation and representation problems ares caused by metacharacters, alternate encodings and numeric representations. Security problems result from trusting input. The issues include: "Buffer Overflows," "Cross-Site Scripting" attacks, "SQL Injection," and many others.

Out-of-Bounds Read

The program reads data from outside the bounds of allocated memory.
Buffer overflow is probably the best known form of software security vulnerability. Most software developers know what a buffer overflow vulnerability is, but buffer overflow attacks against both legacy and newly-developed applications are still quite common. Part of the problem is due to the wide variety of ways buffer overflows can occur, and part is due to the error-prone techniques often used to prevent them.

In a classic buffer overflow exploit, the attacker sends data to a program, which it stores in an undersized stack buffer. The result is that information on the call stack is overwritten, including the function's return pointer. The data sets the value of the return pointer so that when the function returns, it transfers control to malicious code contained in the attacker's data.

Although this type of stack buffer overflow is still common on some platforms and in some development communities, there are a variety of other types of buffer overflow, including heap buffer overflows and off-by-one errors among others. There are a number of excellent books that provide detailed information on how buffer overflow attacks work, including Building Secure Software [1], Writing Secure Code [2], and The Shellcoder's Handbook [3].

At the code level, buffer overflow vulnerabilities usually involve the violation of a programmer's assumptions. Many memory manipulation functions in C and C++ do not perform bounds checking and can easily exceed the allocated bounds of the buffers they operate upon. Even bounded functions, such as strncpy(), can cause vulnerabilities when used incorrectly. The combination of memory manipulation and mistaken assumptions about the size or makeup of a piece of data is the root cause of most buffer overflows.

In this case, the program reads from outside the bounds of allocated memory, which can allow access to sensitive information, introduce incorrect behavior, or cause the program to crash.

Example 1: In the following code, the call to memcpy() reads memory from outside the allocated bounds of cArray, which contains MAX elements of type char, while iArray contains MAX elements of type int.

void MemFuncs() {
char array1[MAX];
int array2[MAX];
memcpy(array2, array1, sizeof(array2));
Example 2: The following short program uses an untrusted command line argument as the search buffer in a call to memchr() with a constant number of bytes to be analyzed.

int main(int argc, char** argv) {
char* ret = memchr(argv[0], 'x', MAX_PATH);
printf("%s\n", ret);

The program is meant to print a substring of argv[0], by searching the argv[0] data up to a constant number of bytes. However, as the (constant) number of bytes might be larger than the data allocated for argv[0], the search might go on beyond the data allocated for argv[0]. This will be the case when x is not found in argv[0].
[1] J. Viega, G. McGraw Building Secure Software Addison-Wesley
[2] M. Howard, D. LeBlanc Writing Secure Code, Second Edition Microsoft Press
[3] J. Koziol et al. The Shellcoder's Handbook: Discovering and Exploiting Security Holes John Wiley & Sons