CAPEC-80: Using UTF-8 Encoding to Bypass Validation Logic

ID CAPEC-80
Typical Severity High
Likelihood Of Attack High
Status Draft

This attack is a specific variation on leveraging alternate encodings to bypass validation logic. It encodes potentially harmful input in UTF-8 and submits it to applications that do not expect this encoding or do not validate it effectively, which makes input filtering difficult. UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. Legal UTF-8 characters are one to four bytes long. However, early versions of the UTF-8 specification got some entries wrong (in some cases they permitted overlong characters). UTF-8 encoders are supposed to use the "shortest possible" encoding, but naive decoders may accept encodings that are longer than necessary. According to RFC 3629, a particularly subtle form of this attack can be carried out against a parser that performs security-critical validity checks against the UTF-8 encoded form of its input but interprets certain illegal octet sequences as characters.
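
The following is a minimal Python sketch of the overlong-encoding problem described above, not a definitive implementation. The names `naive_decode`, `is_safe_path`, and the sample payload are hypothetical and chosen only for illustration. It assumes a filter that searches the raw bytes for "../" and a deliberately lax decoder that skips the shortest-form check; the two-byte sequence C0 AF is an overlong encoding of "/" (U+002F).

```python
def naive_decode(data: bytes) -> str:
    """Deliberately lax UTF-8 decoder (illustration only).

    It reads the length from the lead byte and concatenates payload bits,
    but never checks that the shortest possible encoding was used.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            cp, n = b, 1
        elif b >> 5 == 0b110:
            cp, n = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, n = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, n = b & 0x07, 4
        else:
            raise ValueError("invalid lead byte")
        # Append 6 payload bits from each continuation byte, unchecked.
        for c in data[i + 1:i + n]:
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp))
        i += n
    return "".join(out)


def is_safe_path(data: bytes) -> bool:
    """Decode strictly first, then validate the decoded text.

    Python's built-in UTF-8 codec rejects overlong sequences, so illegal
    input is refused before the "../" check runs.
    """
    try:
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return "../" not in text


# Hypothetical payload: every "/" is written as the overlong pair C0 AF.
payload = b"..\xc0\xaf..\xc0\xafetc\xc0\xafpasswd"

passes_byte_filter = b"../" not in payload   # the raw-byte check sees nothing
decoded_by_lax_parser = naive_decode(payload)  # the lax decoder restores "/"
accepted_by_strict_check = is_safe_path(payload)
```

In this sketch the raw-byte filter lets the payload through while the lax decoder turns it into a traversal path; decoding strictly before validating rejects it. Real path validation would need more than a "../" substring check.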

https://capec.mitre.org/data/definitions/80.html

Weaknesses

ID Name Type
CWE-20 Improper Input Validation weakness
CWE-73 External Control of File Name or Path weakness
CWE-74 Improper Neutralization of Special Elements in Output Used by a Downstream Component ('Injection') weakness
CWE-172 Encoding Error weakness
CWE-173 Improper Handling of Alternate Encoding weakness
CWE-180 Incorrect Behavior Order: Validate Before Canonicalize weakness
CWE-181 Incorrect Behavior Order: Validate Before Filter weakness
CWE-692 Incomplete Denylist to Cross-Site Scripting weakness
CWE-697 Incorrect Comparison weakness