OWASP Cross Site Scripting Prevention Cheat Sheet

Posted : admin On 1/2/2022

This article provides a simple positive model for preventing XSS through the proper use of output escaping and encoding. While there are a huge number of XSS attack vectors, following a few simple rules can completely defend against this serious attack.

This article does not explore the technical or business impact of XSS. Suffice it to say that it can lead to an attacker gaining the ability to do anything a victim can do through their browser.

Both reflected and stored XSS can be addressed by performing the appropriate validation and escaping on the server-side. DOM Based XSS can be addressed with a special subset of rules described in the DOM based XSS Prevention Cheat Sheet.

For a cheatsheet on the attack vectors related to XSS, please refer to the XSS Filter Evasion Cheat Sheet. More background on browser security and the various browsers can be found in the Browser Security Handbook.

Before reading this cheatsheet, it is important to have a fundamental understanding of Injection Theory.


A Positive XSS Prevention Model

This article treats an HTML page like a template, with slots where a developer is allowed to put untrusted data. These slots cover the vast majority of the common places where a developer might want to put untrusted data. Putting untrusted data in other places in the HTML is not allowed. This is a 'whitelist' model, that denies everything that is not specifically allowed.

Given the way browsers parse HTML, each of the different types of slots has slightly different security rules. When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not break out of that slot into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept in specific places and is isolated from code contexts with escaping.

This document sets out the most common types of slots and the rules for putting untrusted data into them safely. Based on the various specifications, known XSS vectors, and a great deal of manual testing with all the popular browsers, we have determined that the rules proposed here are safe.

The slots are defined and a few examples of each are provided. Developers SHOULD NOT put data into any other slots without a very careful analysis to ensure that what they are doing is safe. Browser parsing is extremely tricky and many innocuous looking characters can be significant in the right context.

Why Can't I Just HTML Entity Encode Untrusted Data?

HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.

You Need a Security Encoding Library

Writing these encoders is not tremendously difficult, but there are quite a few hidden pitfalls. For example, you might be tempted to use some of the escaping shortcuts like \" in JavaScript. However, these values are dangerous and may be misinterpreted by the nested parsers in the browser. You might also forget to escape the escape character, which attackers can use to neutralize your attempts to be safe. OWASP recommends using a security-focused encoding library to make sure these rules are properly implemented.

Microsoft provides an encoding library named the Microsoft Anti-Cross Site Scripting Library for the .NET platform, and the ASP.NET Framework has a built-in ValidateRequest function that provides limited sanitization.

The OWASP Java Encoder Project provides a high-performance encoding library for Java.

The following rules are intended to prevent all XSS in your application. While these rules do not allow absolute freedom in putting untrusted data into an HTML document, they should cover the vast majority of common use cases. You do not have to allow all the rules in your organization. Many organizations may find that allowing only Rule #1 and Rule #2 is sufficient for their needs. Please add a note to the discussion page if there is an additional context that is often required and can be secured with escaping.

Do NOT simply escape the list of example characters provided in the various rules. It is NOT sufficient to escape only that list. Blacklist approaches are quite fragile. The whitelist rules here have been carefully designed to provide protection even against future vulnerabilities introduced by browser changes.

RULE #0 - Never Insert Untrusted Data Except in Allowed Locations

The first rule is to deny all - don't put untrusted data into your HTML document unless it is within one of the slots defined in Rule #1 through Rule #5. The reason for Rule #0 is that there are so many strange contexts within HTML that the list of escaping rules gets very complicated. We can't think of any good reason to put untrusted data in these contexts. This includes 'nested contexts' like a URL inside JavaScript -- the encoding rules for those locations are tricky and dangerous.

If you insist on putting untrusted data into nested contexts, please do a lot of cross-browser testing and let us know what you find out.

Locations where untrusted data must never be placed include:

  • Directly in a script.
  • Inside an HTML comment.
  • In an attribute name.
  • In a tag name.
  • Directly in CSS.

Most importantly, never accept actual JavaScript code from an untrusted source and then run it. For example, a parameter named 'callback' that contains a JavaScript code snippet. No amount of escaping can fix that.

RULE #1 - HTML Escape Before Inserting Untrusted Data into HTML Element Content

Rule #1 is for when you want to put untrusted data directly into the HTML body somewhere. This includes inside normal tags like div, p, b, td, etc. Most web frameworks have a method for HTML escaping for the characters detailed below. However, this is absolutely not sufficient for other HTML contexts. You need to implement the other rules detailed here as well.

Escape the following characters with HTML entity encoding to prevent switching into any execution context, such as script, style, or event handlers. Using hex entities is recommended in the spec. In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.
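As a minimal sketch of Rule #1 in Python (an illustration only, not the encoder OWASP ships), the six characters can be mapped to entities in a single pass. The function name `html_entity_encode` is hypothetical, and a vetted security encoding library remains the better choice in production.

```python
# Hypothetical helper illustrating Rule #1; not a replacement for a vetted library.
_HTML_ESCAPES = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
    "/": "&#x2F;",
}


def html_entity_encode(untrusted: str) -> str:
    """Replace the six significant characters with HTML entities."""
    # Each input character is looked up once, so "&" is never double-escaped.
    return "".join(_HTML_ESCAPES.get(ch, ch) for ch in untrusted)


if __name__ == "__main__":
    print(html_entity_encode("<script>alert('x')</script>"))
```

The result is only safe for HTML element content; the other contexts need the encoders described in the later rules.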

RULE #2 - Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes

Rule #2 is for putting untrusted data into typical attribute values like width, name, value, etc. This should not be used for complex attributes like href, src, style, or any of the event handlers like onmouseover. It is extremely important that event handler attributes should follow Rule #3 for HTML JavaScript Data Values.

Rule #2 applies to attribute values in each of these forms:

  • Inside an unquoted attribute.
  • Inside a single quoted attribute.
  • Inside a double quoted attribute.

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute.

The reason this rule is so broad is that developers frequently leave attributes unquoted. Properly quoted attributes can only be escaped from with the corresponding quote.

Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.
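The broad attribute rule can be sketched in Python as follows. This is a hedged illustration: `html_attribute_encode` is a hypothetical name, and treating only ASCII letters and digits as "alphanumeric" is an assumption made here to keep the sketch conservative.

```python
def html_attribute_encode(untrusted: str) -> str:
    """Encode every non-alphanumeric character below 256 as &#xHH;."""
    out = []
    for ch in untrusted:
        code = ord(ch)
        if ch.isascii() and ch.isalnum():
            # ASCII letters and digits pass through unchanged (assumption:
            # only ASCII counts as alphanumeric in this sketch).
            out.append(ch)
        elif code < 256:
            out.append(f"&#x{code:02X};")
        else:
            # Characters at or above 256 are outside the rule and left as-is.
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(html_attribute_encode("x onmouseover=alert(1)"))
```

Because spaces, quotes, and the equals sign are all encoded, the value cannot close the attribute even when the attribute is left unquoted.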

RULE #3 - JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values

Rule #3 concerns dynamically generated JavaScript code - both script blocks and event-handler attributes. The only safe place to put untrusted data into this code is inside a quoted 'data value.' Including untrusted data inside any other JavaScript context is quite dangerous, as it is extremely easy to switch into an execution context with characters including (but not limited to) semi-colon, equals, space, plus, and many more, so use with caution.

Quoted data values appear in places such as:

  • Inside a quoted string.
  • One side of a quoted expression.
  • Inside a quoted event handler.

Please note there are some JavaScript functions that can never safely use untrusted data as input - EVEN IF JAVASCRIPT ESCAPED!


For example:

Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute. DO NOT use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible to escape-the-escape attacks where the attacker sends \" and the vulnerable code turns that into \\" which enables the quote.

If an event handler is properly quoted, breaking out requires the corresponding quote. However, we have intentionally made this rule quite broad because event handler attributes are often left unquoted. Unquoted attributes can be broken out of with many characters including [space] % * + , - / ; < = > ^ and |.

Also, a </script> closing tag will close a script block even though it is inside a quoted string because the HTML parser runs before the JavaScript parser. Please note this is an aggressive escaping policy that over-encodes. If there is a guarantee that proper quoting is accomplished then a much smaller character set is needed. Please look at the OWASP Java Encoder JavaScript escaping examples for examples of proper JavaScript use that requires minimal escaping.
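A hedged Python sketch of the aggressive JavaScript escaping policy follows. The name `javascript_encode` is hypothetical. Encoding characters at or above 256 as \uXXXX is an extra step taken in this sketch (the rule itself only covers characters below 256), and only ASCII letters and digits are treated as alphanumeric.

```python
def javascript_encode(untrusted: str) -> str:
    """Escape a value for use inside a quoted JavaScript data value."""
    out = []
    for ch in untrusted:
        code = ord(ch)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        elif code < 256:
            # \xHH form, never a shortcut such as \" or \'.
            out.append(f"\\x{code:02X}")
        else:
            # Assumption of this sketch: higher code points become \uXXXX,
            # using UTF-16 code units so that astral characters form pairs.
            units = ch.encode("utf-16-be", "surrogatepass")
            for i in range(0, len(units), 2):
                out.append(f"\\u{units[i]:02X}{units[i + 1]:02X}")
    return "".join(out)


if __name__ == "__main__":
    print(javascript_encode("';alert(1)//"))
```

Since `<` and `/` are encoded, a `</script>` sequence in the input cannot terminate the surrounding script block.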

RULE #3.1 - HTML escape JSON values in an HTML context and read the data with JSON.parse

In a Web 2.0 world, the need for having data dynamically generated by an application in a JavaScript context is common. One strategy is to make an AJAX call to get the values, but this isn't always performant. Often, an initial block of JSON is loaded into the page to act as a single place to store multiple values. This data is tricky, though not impossible, to escape correctly without breaking the format and content of the values.

Ensure the returned Content-Type header is application/json and not text/html. This instructs the browser not to misunderstand the context and execute injected script.

Bad HTTP response:

Good HTTP response:

A common anti-pattern one would see:

JSON serialization

A safe JSON serializer will allow developers to serialize JSON as a string of literal JavaScript which can be embedded in HTML in the contents of the <script> tag. HTML characters and JavaScript line terminators need to be escaped. Consider the Yahoo JavaScript Serializer for this task.

HTML entity encoding

This technique has the advantage that HTML entity escaping is widely supported and helps separate data from server side code without crossing any context boundaries. Consider placing the JSON block on the page as a normal element and then parsing the innerHTML to get the contents. The JavaScript that reads the span can live in an external file, thus making the implementation of CSP enforcement easier.

An alternative to escaping and unescaping JSON directly in JavaScript is to normalize JSON server-side by converting < to \u003c before delivering it to the browser.
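The server-side normalization can be sketched in Python with the standard `json` module. The helper name `json_for_html` is hypothetical. The sketch relies on two facts: `<` can only occur inside JSON strings, and `\u003c` is a valid JSON string escape, so the replacement does not change the parsed data.

```python
import json


def json_for_html(value) -> str:
    """Serialize to JSON and replace every '<' with the \\u003c escape."""
    # json.dumps escapes non-ASCII characters by default (ensure_ascii=True),
    # so only ASCII remains in the output before the replacement.
    text = json.dumps(value)
    return text.replace("<", "\\u003c")


if __name__ == "__main__":
    payload = json_for_html({"comment": "</script><script>alert(1)</script>"})
    print(payload)
    # The data round-trips unchanged through a JSON parser.
    print(json.loads(payload))
```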

RULE #4 - CSS Escape And Strictly Validate Before Inserting Untrusted Data into HTML Style Property Values

Rule #4 is for when you want to put untrusted data into a stylesheet or a style tag. CSS is surprisingly powerful, and can be used for numerous attacks. Therefore, it's important that you only use untrusted data in a property value and not into other places in style data. You should stay away from putting untrusted data into complex properties like url, behavior, and custom (-moz-binding).

You should also not put untrusted data into IE’s expression property value which allows JavaScript.


Property value:

Please note there are some CSS contexts that can never safely use untrusted data as input - EVEN IF PROPERLY CSS ESCAPED! You will have to ensure that URLs only start with http not javascript and that properties never start with 'expression'.

For example:

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the \HH escaping format. DO NOT use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible to escape-the-escape attacks where the attacker sends \" and the vulnerable code turns that into \\" which enables the quote.

If an attribute is quoted, breaking out requires the corresponding quote. All attributes should be quoted but your encoding should be strong enough to prevent XSS when untrusted data is placed in unquoted contexts.

Unquoted attributes can be broken out of with many characters including [space] % * + , - / ; < = > ^ and |.

Also, the </style> tag will close the style block even though it is inside a quoted string because the HTML parser runs before the JavaScript parser. Please note that we recommend aggressive CSS encoding and validation to prevent XSS attacks for both quoted and unquoted attributes.
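A hedged Python sketch of CSS hex encoding is shown here. The name `css_encode` is hypothetical. The sketch uses the zero-padded six-digit escape so that a following character can never be absorbed into the escape sequence, and it escapes every character that is not an ASCII letter or digit, which is broader than the rule requires.

```python
def css_encode(untrusted: str) -> str:
    """Escape a value for use as a CSS property value."""
    out = []
    for ch in untrusted:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Six hex digits is the longest CSS escape, so the escape
            # always ends exactly where intended.
            out.append(f"\\{ord(ch):06X}")
    return "".join(out)


if __name__ == "__main__":
    print(css_encode("red; background: url(javascript:alert(1))"))
```

Encoding alone is not sufficient for this rule: the value should still be strictly validated, for example against the set of values the property is expected to take.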

RULE #5 - URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values

Rule #5 is for when you want to put untrusted data into an HTTP GET parameter value.

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the %HH escaping format. Including untrusted data in data: URLs should not be allowed as there is no good way to disable attacks with escaping to prevent switching out of the URL.

All attributes should be quoted. Unquoted attributes can be broken out of with many characters including [space] % * + , - / ; < = > ^ and |. Note that entity encoding is useless in this context.

WARNING: Do not encode complete or relative URLs with URL encoding! If untrusted input is meant to be placed into href, src or other URL-based attributes, it should be validated to make sure it does not point to an unexpected protocol, especially javascript links. URLs should then be encoded based on the context of display like any other piece of data. For example, user driven URLs in HREF links should be attribute encoded.
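The two cases in this rule can be sketched in Python with `urllib.parse`. Both function names are hypothetical, and the choice to accept only absolute http and https URLs is an assumption of this sketch. Note that `quote` leaves the unreserved characters `_`, `.`, `-` and `~` unencoded in addition to letters and digits.

```python
from urllib.parse import quote, urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def url_param_encode(untrusted: str) -> str:
    """Percent-encode a single GET parameter value (not a whole URL)."""
    # safe="" makes quote() encode "/" as well.
    return quote(untrusted, safe="")


def is_allowed_url(untrusted_url: str) -> bool:
    """Accept only absolute URLs whose scheme is http or https."""
    scheme = urlsplit(untrusted_url.strip()).scheme.lower()
    return scheme in ALLOWED_SCHEMES


if __name__ == "__main__":
    print("/site/search?value=" + url_param_encode("a b&c=d/e"))
    print(is_allowed_url("javascript:alert(1)"))
    print(is_allowed_url("https://example.com/profile"))
```

A URL that passes `is_allowed_url` still needs attribute encoding before it is written into an href or src attribute.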

For example:

RULE #6 - Sanitize HTML Markup with a Library Designed for the Job

If your application handles markup -- untrusted input that is supposed to contain HTML -- it can be very difficult to validate. Encoding is also difficult, since it would break all the tags that are supposed to be in the input. Therefore, you need a library that can parse and clean HTML formatted text. There are several available at OWASP that are simple to use:

An open-source .NET library. The HTML is cleaned with a white list approach. All allowed tags and attributes can be configured. The library is unit tested with the OWASP XSS Filter Evasion Cheat Sheet.

For more information on OWASP Java HTML Sanitizer policy construction, see here.

The SanitizeHelper module provides a set of methods for scrubbing text of undesired HTML elements.

Other libraries that provide HTML Sanitization include:

  • HTML sanitizer from Google Closure Library
  • PHP HTML Purifier.
  • JavaScript/Node.js Bleach.
  • Python Bleach.

RULE #7 - Prevent DOM-based XSS

For details on what DOM-based XSS is, and defenses against this type of XSS flaw, please see the OWASP article on DOM based XSS Prevention Cheat Sheet.

Bonus Rule #1: Use HTTPOnly cookie flag

Preventing all XSS flaws in an application is hard, as you can see. To help mitigate the impact of an XSS flaw on your site, OWASP also recommends you set the HTTPOnly flag on your session cookie and any custom cookies you have that are not accessed by any JavaScript you wrote. This cookie flag is typically on by default in .NET apps, but in other languages you have to set it manually. For more details on the HTTPOnly cookie flag, including what it does, and how to use it, see the OWASP article on HTTPOnly.
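As a small illustration of setting the flag manually, the Python standard library can produce a Set-Cookie header with HttpOnly enabled. The cookie name `session` and the helper name are assumptions made for this sketch; web frameworks usually expose an equivalent option.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    """Build a Set-Cookie header line with the HttpOnly flag set."""
    cookie = SimpleCookie()
    cookie["session"] = session_id       # hypothetical cookie name
    cookie["session"]["httponly"] = True  # hide the cookie from page scripts
    cookie["session"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


if __name__ == "__main__":
    print(session_cookie_header("abc123"))
```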

Bonus Rule #2: Implement Content Security Policy

Content Security Policy (CSP) is another good, though more complex, way to mitigate the impact of an XSS flaw. It is a browser-side mechanism that lets you create source whitelists for the client-side resources of your web application, e.g. JavaScript, CSS, images, etc. Through a special HTTP header, CSP instructs the browser to only execute or render resources from those sources.

For example, a CSP can instruct the web browser to load all resources only from the page's origin, and JavaScript source code files additionally from static.domain.tld. For more details on Content Security Policy, including what it does, and how to use it, see the article on Content Security Policy.
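A policy with that effect can be assembled as a header value. The Python sketch below is illustrative only: `build_csp` is a hypothetical helper, and the directive values are reconstructed from the description above rather than copied from the original example.

```python
def build_csp(directives: dict) -> str:
    """Join CSP directives into a single header value."""
    return "; ".join(
        f"{name} {' '.join(sources)}" for name, sources in directives.items()
    )


# Everything from the page's own origin, scripts additionally from
# static.domain.tld.
policy = build_csp(
    {
        "default-src": ["'self'"],
        "script-src": ["'self'", "static.domain.tld"],
    }
)
response_headers = {"Content-Security-Policy": policy}

if __name__ == "__main__":
    print(response_headers)
```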

Bonus Rule #3: Use an Auto-Escaping Template System

Many web application frameworks provide automatic contextual escaping functionality such as AngularJS strict contextual escaping and Go Templates. Use these technologies when you can.

Bonus Rule #4: Use the X-XSS-Protection Response Header

This HTTP response header enables the Cross-site scripting (XSS) filter built into some modern web browsers. This header is usually enabled by default anyway, so the role of this header is to re-enable the filter for this particular website if it was disabled by the user.

Bonus Rule #5: Properly use modern JS frameworks like Angular (2+) or ReactJS

Modern JavaScript frameworks have pretty good XSS protection built in. It is important to use them properly to benefit from it.

When using ReactJS, do not use dangerouslySetInnerHTML. If you really, really, really have to use dangerouslySetInnerHTML, remember that all framework protections are then turned off and you have to escape or sanitize all the data yourself.

For Angular (2+), remember to build Angular templates with the --prod parameter (ng build --prod) in order to avoid template injection.

And also remember to update your framework to the newest version, with all possible bug fixes, as soon as possible.

The following snippets of HTML demonstrate how to safely render untrusted data in a variety of different contexts.

Data Type | Context | Code Sample | Defense
String | HTML Body | <span>UNTRUSTED DATA </span> | HTML Entity Encoding (rule #1).
String | Safe HTML Attributes | <input type='text' name='fname' value='UNTRUSTED DATA '> | Aggressive HTML Entity Encoding (rule #2), Only place untrusted data into a whitelist of safe attributes (listed below), Strictly validate unsafe attributes such as background, id and name.
String | GET Parameter | <a href='/site/search?value=UNTRUSTED DATA '>clickme</a> | URL Encoding (rule #5).
String | Untrusted URL in a SRC or HREF attribute | <a href='UNTRUSTED URL '>clickme</a> <iframe src='UNTRUSTED URL ' /> | Canonicalize input, URL Validation, Safe URL verification, Whitelist http and https URLs only (Avoid the JavaScript Protocol to Open a new Window), Attribute encoder.
String | CSS Value | <div style='width: UNTRUSTED DATA ;'>Selection</div> | Strict structural validation (rule #4), CSS Hex encoding, Good design of CSS Features.
String | JavaScript Variable | <script>var currentValue='UNTRUSTED DATA ';</script> <script>someFunction('UNTRUSTED DATA ');</script> | Ensure JavaScript variables are quoted, JavaScript Hex Encoding, JavaScript Unicode Encoding, Avoid backslash encoding (\" or \' or \\).
HTML | HTML Body | <div>UNTRUSTED HTML</div> | HTML Validation (JSoup, AntiSamy, HTML Sanitizer...).
String | DOM XSS | <script>document.write('UNTRUSTED INPUT: ' + document.location.hash );</script> | DOM based XSS Prevention Cheat Sheet

Safe HTML Attributes include: align, alink, alt, bgcolor, border, cellpadding, cellspacing, class, color, cols, colspan, coords, dir, face, height, hspace, ismap, lang, marginheight, marginwidth, multiple, nohref, noresize, noshade, nowrap, ref, rel, rev, rows, rowspan, scrolling, shape, span, summary, tabindex, title, usemap, valign, value, vlink, vspace, width.

The purpose of output encoding (as it relates to Cross Site Scripting) is to convert untrusted input into a safe form where the input is displayed as data to the user without executing as code in the browser. The following chart details a list of critical output encoding methods needed to stop Cross Site Scripting.

Encoding Type | Encoding Mechanism
HTML Entity Encoding | Convert & to &amp;, Convert < to &lt;, Convert > to &gt;, Convert " to &quot;, Convert ' to &#x27;, Convert / to &#x2F;
HTML Attribute Encoding | Except for alphanumeric characters, escape all characters with the HTML Entity &#xHH; format, including spaces. (HH = Hex Value)
URL Encoding | Standard percent encoding. URL encoding should only be used to encode parameter values, not the entire URL or path fragments of a URL.
JavaScript Encoding | Except for alphanumeric characters, escape all characters with the \uXXXX unicode escaping format (X = Integer).
CSS Hex Encoding | CSS escaping supports \XX and \XXXXXX. Using a two character escape can cause problems if the next character continues the escape sequence. There are two solutions: (a) add a space after the CSS escape (it will be ignored by the CSS parser), or (b) use the full amount of CSS escaping possible by zero padding the value.

XSS Attack Cheat Sheet

The following article describes how to exploit different kinds of XSS Vulnerabilities that this article was created to help you avoid:

  • OWASP: XSS Filter Evasion Cheat Sheet - Based on - RSnake's: 'XSS Cheat Sheet'.

Description of XSS Vulnerabilities

  • OWASP article on XSS Vulnerabilities.

Discussion on the Types of XSS Vulnerabilities

  • Types of Cross-Site Scripting.

How to Review Code for Cross-site scripting Vulnerabilities

  • OWASP Code Review Guide article on Reviewing Code for Cross-site scripting Vulnerabilities.

How to Test for Cross-site scripting Vulnerabilities

  • OWASP Testing Guide article on Testing for Cross site scripting Vulnerabilities.

Jeff Williams

Jim Manico

Neil Mattatall

This article is focused on providing clear, simple, actionable guidance for providing Input Validation security functionality in your applications.

Input validation is performed to ensure only properly formed data is entering the workflow in an information system, preventing malformed data from persisting in the database and triggering malfunction of various downstream components. Input validation should happen as early as possible in the data flow, preferably as soon as the data is received from the external party.

Data from all potentially untrusted sources should be subject to input validation, including not only Internet-facing web clients but also backend feeds over extranets, from suppliers, partners, vendors or regulators, each of which may be compromised on their own and start sending malformed data.

Input Validation should not be used as the primary method of preventing XSS, SQL Injection and other attacks which are covered in respective cheat sheets but can significantly contribute to reducing their impact if implemented properly.

Input validation should be applied on both the syntactic and the semantic level.

Syntactic validation should enforce correct syntax of structured fields (e.g. SSN, date, currency symbol).

Semantic validation should enforce correctness of their values in the specific business context (e.g. start date is before end date, price is within expected range).

It is always recommended to prevent attacks as early as possible in the processing of the user’s (attacker's) request. Input validation can be used to detect unauthorized input before it is processed by the application.

Input validation can be implemented using any programming technique that allows effective enforcement of syntactic and semantic correctness, for example:

  • Data type validators available natively in web application frameworks (such as Django Validators, Apache Commons Validators etc).
  • Validation against JSON Schema and XML Schema (XSD) for input in these formats.
  • Type conversion (e.g. Integer.parseInt() in Java, int() in Python) with strict exception handling.
  • Minimum and maximum value range check for numerical parameters and dates, minimum and maximum length check for strings.
  • Array of allowed values for small sets of string parameters (e.g. days of week).
  • Regular expressions for any other structured data covering the whole input string (^...$) and not using 'any character' wildcard (such as . or \S).
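Several of the techniques in this list can be sketched together in Python. All names, limits, and the username pattern below are hypothetical choices for illustration. Note that Python's `int()` tolerates surrounding whitespace, so stricter applications may want to check the raw string first.

```python
import re

ALLOWED_DAYS = {"mon", "tue", "wed", "thu", "fri", "sat", "sun"}
# fullmatch() anchors the pattern to the whole input, like ^...$.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{3,20}")


def parse_quantity(raw: str, minimum: int = 1, maximum: int = 100) -> int:
    """Type conversion with exception handling, then a range check."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        raise ValueError("quantity must be an integer") from None
    if not minimum <= value <= maximum:
        raise ValueError("quantity out of range")
    return value


def is_valid_day(raw: str) -> bool:
    """Allow-list check against a small fixed set of values."""
    return raw in ALLOWED_DAYS


def is_valid_username(raw: str) -> bool:
    """Whole-string regular expression with an explicit character set."""
    return USERNAME_RE.fullmatch(raw) is not None


if __name__ == "__main__":
    print(parse_quantity("5"), is_valid_day("mon"), is_valid_username("bob_1"))
```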

Whitelisting vs blacklisting

It is a common mistake to use blacklist validation to try to detect possibly dangerous characters and patterns like the apostrophe ' character, the string 1=1, or the <script> tag, but this is a massively flawed approach as it is trivial for an attacker to avoid getting caught by such filters.

Plus, such filters frequently prevent authorized input, like O'Brian, where the ' character is fully legitimate. For more information on XSS filter evasion, please see the XSS Filter Evasion Cheat Sheet.

Defense In Depth Techniques

White list validation is appropriate for all input fields provided by the user. White list validation involves defining exactly what IS authorized, and by definition, everything else is not authorized.

If it's well structured data, like dates, social security numbers, zip codes, e-mail addresses, etc. then the developer should be able to define a very strong validation pattern, usually based on regular expressions, for validating such input.

If the input field comes from a fixed set of options, like a drop down list or radio buttons, then the input needs to match exactly one of the values offered to the user in the first place.

Validating free-form Unicode text

Free-form text, especially with Unicode characters, is perceived as difficult to validate due to a relatively large space of characters that need to be whitelisted.

It's also free-form text input that highlights the importance of proper context-aware output encoding and quite clearly demonstrates that input validation is not the primary safeguard against Cross-Site Scripting. If your users want to type an apostrophe ' or a less-than sign < in their comment field, they might have a perfectly legitimate reason for that, and the application's job is to properly handle it throughout the whole life cycle of the data.

The primary means of input validation for free-form text input should be:

  • Normalization: Ensure canonical encoding is used across all the text and no invalid characters are present.
  • Character category whitelisting: Unicode allows whitelisting categories such as 'decimal digits' or 'letters' which not only covers the Latin alphabet but also various other scripts used globally (e.g. Arabic, Cyrillic, CJK ideographs etc).
  • Individual character whitelisting: Use this if you allow letters and ideographs in names and also want to allow the apostrophe ' for Irish names, but don't want to allow the whole punctuation category.
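These three steps can be sketched with Python's `unicodedata` module. The choice of NFC, the allowed categories, and the extra characters are assumptions made for this illustration. Some scripts also need combining marks (categories Mn and Mc), which this sketch would reject.

```python
import unicodedata

# Hypothetical policy for a "name" field.
ALLOWED_EXTRA_CHARS = {"'", " ", "-"}


def normalize_text(raw: str) -> str:
    """Normalization: bring the text into one canonical form (NFC here)."""
    return unicodedata.normalize("NFC", raw)


def is_allowed_name(raw: str) -> bool:
    """Category allow-list (letters, decimal digits) plus a few characters."""
    text = normalize_text(raw)
    if not text:
        return False
    for ch in text:
        if ch in ALLOWED_EXTRA_CHARS:
            continue  # individual character allow-list
        category = unicodedata.category(ch)
        if category.startswith("L") or category == "Nd":
            continue  # any letter, or a decimal digit, in any script
        return False
    return True


if __name__ == "__main__":
    print(is_allowed_name("O'Brian"), is_allowed_name("<script>"))
```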


Regular expressions

Developing regular expressions can be complicated, and is well beyond the scope of this cheat sheet.

There are lots of resources on the internet about how to write regular expressions, including the OWASP Validation Regex Repository.

In summary, input validation should:

  • Be applied to all input data, at minimum.
  • Define the allowed set of characters to be accepted.
  • Define a minimum and maximum length for the data (e.g. {1,25} ).

Validating a U.S. Zip Code (5 digits plus optional -4)
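One way to express this check is sketched below in Python. The pattern is written for this illustration from the heading's description (five digits, optionally followed by a hyphen and four digits), and the names are hypothetical.

```python
import re

# [0-9] is used instead of \d so that only ASCII digits are accepted.
ZIP_RE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_valid_zip(raw: str) -> bool:
    """Return True for 12345 or 12345-6789 style ZIP codes."""
    # fullmatch() requires the whole string to match, so trailing
    # characters such as a newline are rejected.
    return ZIP_RE.fullmatch(raw) is not None


if __name__ == "__main__":
    print(is_valid_zip("12345-6789"), is_valid_zip("1234"))
```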

Validating U.S. State Selection From a Drop-Down Menu

Java Regex Usage Example

Example validating the parameter “zip” using a regular expression.

Some white list validators have also been predefined in various open source packages that you can leverage. For example:

Be aware that any JavaScript input validation performed on the client can be bypassed by an attacker that disables JavaScript or uses a Web Proxy. Ensure that any input validation performed on the client is also performed on the server.

It is very difficult to validate rich content submitted by a user. For more information, please see the XSS cheatsheet on Sanitizing HTML Markup with a Library Designed for the Job.

All user-controlled data must be encoded when returned in the HTML page to prevent the execution of malicious data (e.g. XSS). For example, <script> would be returned as &lt;script&gt;

The type of encoding is specific to the context of the page where the user controlled data is inserted. For example, HTML entity encoding is appropriate for data placed into the HTML body. However, user data placed into a script would need JavaScript specific output encoding.

Detailed information on XSS prevention here: OWASP XSS Prevention Cheat Sheet

Many websites allow users to upload files, such as a profile picture or more. This section helps provide that feature securely.

Additional information on upload protection here: File Upload Protection Cheat Sheet.

Upload Verification

  • Use input validation to ensure the uploaded filename uses an expected extension type.
  • Ensure the uploaded file is not larger than a defined maximum file size.
  • If the website supports ZIP file upload, perform a validation check before unzipping the file. The check includes the target path, the level of compression, and the estimated unzipped size.

Upload Storage

  • Use a new filename to store the file on the OS. Do not use any user controlled text for this filename or for the temporary filename.
  • When a file is uploaded to the web server, rename it on storage. For example, if the uploaded filename is test.JPG, rename it to a random file name such as JAI1287uaisdjhf.JPG. The purpose is to prevent the risks of direct file access and of ambiguous filenames used to evade the filter, such as test.jpg;.asp or /../../../../../test.jpg.
  • Uploaded files should be analyzed for malicious content (anti-malware, static analysis, etc).
  • The client side should not be able to specify the file path. It is decided by the server side.
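The renaming step can be sketched in Python as follows. The allowed extension set and the function name are assumptions for this illustration. Only the final extension of the client-supplied name is inspected, and nothing else from that name reaches the stored filename.

```python
import secrets
from pathlib import PurePosixPath

# Hypothetical allow-list for an image upload feature.
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def storage_name(uploaded_filename: str) -> str:
    """Return a random server-side filename with a validated extension."""
    extension = PurePosixPath(uploaded_filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError("file extension not allowed")
    # 16 random bytes rendered as 32 hex characters; no client text is kept.
    return secrets.token_hex(16) + extension


if __name__ == "__main__":
    print(storage_name("test.JPG"))
    print(storage_name("/../../../../../test.jpg"))
```

The directory in which the file is stored would be chosen by the server in the same way, independent of anything in the request.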

Public Serving of Uploaded Content

  • Ensure uploaded images are served with the correct content-type (e.g. image/jpeg, application/x-xpinstall)

Beware of 'special' files

The upload feature should be using a whitelist approach to only allow specific file types and extensions. However, it is important to be aware of the following file types that, if allowed, could result in security vulnerabilities:

  • crossdomain.xml / clientaccesspolicy.xml: allows cross-domain data loading in Flash, Java and Silverlight. If permitted on sites with authentication this can permit cross-domain data theft and CSRF attacks. Note this can get pretty complicated depending on the specific plugin version in question, so it's best to just prohibit files named 'crossdomain.xml' or 'clientaccesspolicy.xml'.
  • .htaccess and .htpasswd: Provides server configuration options on a per-directory basis, and should not be permitted. See HTACCESS documentation.
  • Web executable script files, such as aspx, asp, css, swf, xhtml, rhtml, shtml, jsp, js, pl, php, cgi, should not be allowed.

Upload Verification

  • Use image rewriting libraries to verify the image is valid and to strip away extraneous content.
  • Set the extension of the stored image to be a valid image extension based on the detected content type of the image from image processing (e.g. do not just trust the header from the upload).
  • Ensure the detected content type of the image is within a list of defined image types (jpg, png, etc)

Email Validation Basics

Many web applications do not treat email addresses correctly due to common misconceptions about what constitutes a valid address. Specifically, it is completely valid to have a mailbox address which:

  • Is case sensitive in the local portion of the address (left of the rightmost @ character).
  • Has non-alphanumeric characters in the local-part (including + and @).
  • Has zero or more labels.

At the time of writing, RFC 5321 is the current standard defining SMTP and what constitutes a valid mailbox address. Please note, email addresses should be considered to be public data.

Many web applications contain computationally expensive and inaccurate regular expressions that attempt to validate email addresses. Recent changes to the landscape mean that the number of false-negatives will increase, particularly due to:

  • Increased popularity of sub-addressing by providers such as Gmail (commonly using + as a token in the local-part to affect delivery)
  • New gTLDs with long names (many regular expressions check the number and length of each label in the domain)

Following RFC 5321, best practice for validating an email address would be to:

  • Check for presence of at least one @ symbol in the address.
  • Ensure the local-part is no longer than 64 octets.
  • Ensure the domain is no longer than 255 octets.
  • Ensure the address is deliverable.
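The first three checks in this list can be sketched in Python. The function name is hypothetical, and measuring octets on the UTF-8 encoding is an assumption of this sketch. Deliverability cannot be checked in code like this and still requires a confirmation email.

```python
def passes_basic_email_checks(address: str) -> bool:
    """Apply the @-presence and length checks; nothing more."""
    # Split at the rightmost "@": everything to its left is the local-part.
    local_part, separator, domain = address.rpartition("@")
    if not separator:
        return False  # no "@" at all
    if len(local_part.encode("utf-8")) > 64:
        return False
    if len(domain.encode("utf-8")) > 255:
        return False
    return True


if __name__ == "__main__":
    print(passes_basic_email_checks("user+tag@example.com"))
    print(passes_basic_email_checks("no-at-sign"))
```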

The only way to ensure an address is deliverable is to send the user an email and have the user take action to confirm receipt. Beyond confirming that the email address is valid and deliverable, this also provides a positive acknowledgement that the user has access to the mailbox and is likely to be authorized to use it.

This does not mean that other users cannot access this mailbox, for example when the user makes use of a service that generates a throw away email address.

  • Email verification links should only satisfy the requirement of verifying email address ownership and should not provide the user with an authenticated session (e.g. the user must still authenticate as normal to access the application).
  • Email verification codes must expire after the first use or expire after 8 hours if not used.

Address Normalization

As the local-part of an email address is, in fact, case sensitive, it is important to store and compare email addresses correctly. To normalise an email address input, you would convert the domain part ONLY to lowercase.

Unfortunately this does and will make input harder to normalise and correctly match to a user's intent. It is reasonable to only accept one unique capitalisation of an otherwise identical address, however in this case it is critical to:

  • Store the user-part as provided and verified by user verification.
  • Perform comparisons by lowercase(provided) == lowercase(persisted).
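Both the normalisation and the comparison can be sketched in Python. The function names are hypothetical. Only the domain is lowercased for storage, while the comparison lowercases both full addresses.

```python
def normalize_email(address: str) -> str:
    """Lowercase the domain part only; keep the local-part as provided."""
    local_part, separator, domain = address.rpartition("@")
    if not separator:
        raise ValueError("address has no @")
    return local_part + "@" + domain.lower()


def same_mailbox(provided: str, persisted: str) -> bool:
    """Compare by lowercase(provided) == lowercase(persisted)."""
    return provided.lower() == persisted.lower()


if __name__ == "__main__":
    print(normalize_email("John.Doe@EXAMPLE.com"))
    print(same_mailbox("john.doe@example.com", "John.Doe@example.com"))
```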


Dave Wichers