Why URL validation with filter_var might not be a good idea

Since PHP 5.2 brought us the filter_var function, the days of monsters like this one have been over (taken from here):

$urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
if (eregi($urlregex, $url)) {echo "good";} else {echo "bad";}

The simple, yet effective syntax:

filter_var($url, FILTER_VALIDATE_URL)

As a third parameter, filter flags can be passed. For URL validation, the following four flags are available:

FILTER_FLAG_SCHEME_REQUIRED
FILTER_FLAG_HOST_REQUIRED
FILTER_FLAG_PATH_REQUIRED 
FILTER_FLAG_QUERY_REQUIRED 

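The first two, FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED, are applied by default. Note that both constants were deprecated in PHP 7.3 and removed in PHP 8.0, where a scheme and a host are always required.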
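To make the flag behaviour concrete, here is a minimal sketch using the two optional flags, FILTER_FLAG_PATH_REQUIRED and FILTER_FLAG_QUERY_REQUIRED. The helper name url_passes() is made up for this illustration and is not part of PHP. filter_var returns the input string on success and false on failure, which is why the result is compared with !== false.

```php
<?php
// Hypothetical helper for this illustration: true if $url passes
// FILTER_VALIDATE_URL with the given flags, false otherwise.
function url_passes(string $url, int $flags = 0): bool
{
    // filter_var() returns the validated string or false, never true.
    return filter_var($url, FILTER_VALIDATE_URL, $flags) !== false;
}

$checks = [
    ['http://example.com',        0],                          // no extra flags
    ['http://example.com',        FILTER_FLAG_PATH_REQUIRED],  // no path at all
    ['http://example.com/',       FILTER_FLAG_PATH_REQUIRED],  // path is "/"
    ['http://example.com/',       FILTER_FLAG_QUERY_REQUIRED], // no query string
    ['http://example.com/?q=php', FILTER_FLAG_QUERY_REQUIRED], // query present
];

foreach ($checks as [$url, $flags]) {
    echo $url, ' (flags ', $flags, '): ', var_export(url_passes($url, $flags), true), "\n";
}
```

The second and fourth checks should fail, because the required path or query string is missing; the other three should pass.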

Get started!

Alright, let’s look at some critical examples.

filter_var('http://example.com/"><script>alert("xss")</script>', FILTER_VALIDATE_URL) !== false; //true

Well, nobody said that filter_var was built to fight XSS. Let’s accept this and move on:

filter_var('php://filter/read=convert.base64-encode/resource=/etc/passwd', FILTER_VALIDATE_URL) !== false; //true

This is far more critical: any scheme passes the filter. http(s) and ftp would have been acceptable, but letting php:// through is a problem. filter_var has to cope with everything nasty that a URL can contain.

filter_var('foo://bar', FILTER_VALIDATE_URL) !== false; //true

And the best one:

filter_var('javascript://test%0Aalert(321)', FILTER_VALIDATE_URL) !== false; //true

Let’s take a closer look: javascript is the scheme. Of course, if you enter javascript:alert(1+2+3+4); in the address bar of your browser, you’ll see:

Javascript-URL

This is how bookmarklets work, and it is no secret. But let’s move on: the double slash // starts an ordinary JavaScript comment and, at the same time, convinces filter_var that it is dealing with a valid URL scheme (compare the examples above). It is followed by the sequence %0A, which is exactly the output of the following code:

echo urlencode("\n");

Get it? The URL-encoded newline ends the JavaScript comment started with //, and whatever follows is arbitrary JavaScript code. Imagine a dating site where user-supplied URLs are validated with filter_var and displayed on the front page. Very evil. Try it yourself.
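The mechanism can be made visible by decoding the payload by hand. This is only a rough sketch: it assumes that a browser percent-decodes the part after javascript: before running it, and real browsers differ in the details. The variable names are chosen for this illustration.

```php
<?php
// The attack string from the example above.
$url = 'javascript://test%0Aalert(321)';

// urlencode("\n") yields "%0A", so decoding "%0A" gives back a line break.
$encodedNewline = urlencode("\n");

// Drop the "javascript:" prefix and percent-decode the rest
// (assumption: roughly what ends up being executed as JavaScript).
$body  = rawurldecode(substr($url, strlen('javascript:')));
$lines = explode("\n", $body);

// First line:  "//test"      is a JavaScript line comment.
// Second line: "alert(321)"  is code on a line of its own.
foreach ($lines as $number => $line) {
    echo ($number + 1), ': ', $line, "\n";
}
```

The comment only covers the first line, so the alert call on the second line is no longer commented out.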

And now?

The following wrapper around filter_var could be worthwhile:

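function validate_url($url)
{
	$url = trim($url);

	// Only accept http:// and https:// URLs; filter_var() alone lets any scheme through.
	// FILTER_VALIDATE_URL requires scheme and host by default, so no flags are passed
	// (FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED were removed in PHP 8.0).
	return ((strpos($url, "http://") === 0 || strpos($url, "https://") === 0) &&
		filter_var($url, FILTER_VALIDATE_URL) !== false);
}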

But even with this wrapper function, the (at the very least highly unusual) URL http://x passes validation. Maybe the regex monsters are not that bad ;). And before I forget: filter_var is not multibyte capable. The perfectly valid URL http://스타벅스코리아.com is rejected:

var_dump(filter_var("http://스타벅스코리아.com", FILTER_VALIDATE_URL) !== false); //bool(false)
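Both weaknesses can be worked around in a stricter wrapper. The following is a sketch under several assumptions, not a definitive implementation: validate_http_url() is a made-up name, URLs with embedded credentials are rejected for simplicity, a host must contain at least one dot (which also rejects localhost), and internationalized hosts are converted with idn_to_ascii(), which is only available when the intl extension is loaded.

```php
<?php
// Hypothetical stricter variant of validate_url(); not part of PHP.
function validate_http_url(string $url): bool
{
    $url   = trim($url);
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false;
    }

    // Allowlist the scheme, compared case-insensitively.
    if (!in_array(strtolower($parts['scheme']), ['http', 'https'], true)) {
        return false;
    }

    // Assumption: user:password@host URLs are not wanted here. Rejecting them
    // also guarantees that the host starts right after "scheme://".
    if (isset($parts['user'])) {
        return false;
    }

    $host = $parts['host'];

    // Convert a non-ASCII host to Punycode if the intl extension is loaded.
    // Without intl the host stays as it is and filter_var() rejects it below.
    if (preg_match('/[^\x00-\x7F]/', $host) && function_exists('idn_to_ascii')) {
        $ascii = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
        if ($ascii === false) {
            return false;
        }
        $offset = strlen($parts['scheme']) + 3; // skip "scheme://"
        $url    = substr_replace($url, $ascii, $offset, strlen($host));
        $host   = $ascii;
    }

    // Assumption: a public host name contains at least one dot, so "http://x" fails.
    if (strpos($host, '.') === false) {
        return false;
    }

    return filter_var($url, FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_http_url('http://example.com/path?x=1'));
var_dump(validate_http_url('http://x'));
var_dump(validate_http_url('javascript://test%0Aalert(321)'));
```

This only narrows what counts as a valid URL. It does not make the value safe to print: a URL that passes still has to be escaped with htmlspecialchars() when it is written into HTML.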

To conclude: use filter_var with care, adapt it to your situation, and be aware of its weaknesses. Finally, I’d like to recommend this nice collection of filter_var tests, grouped by filter flag. Ah, and have a look at Symfony 2’s URL validator, if you like.

This post was filed under php, Security, PHP-WTF, webdev.

12 responses to Why URL validation with filter_var might not be a good idea

  1. Sebastian says:

    Many thanks for this helpful article. I put your URL validation to use right away, see: http://sklueh.de/2012/09/lightweight-validator-in-php/

    public function check_url($mValue)
    {
    //Thanks to David Müller (https://d-mueller.de)

  2. ganaysa says:

    Thank you. I’m using Regex to validate URLs.

  3. Marc Gutt says:

    And I thought I was upgrading with filter_var(), not downgrading. What nonsense. By the way, validate_url() is still missing htmlspecialchars(), since the first XSS attack gets through as well:
    validate_url('http://example.com/"><script>alert("xss")</script>')

  4. Jabari Hunt says:

    Sanitize the string first, check if it starts with "http", then check if it’s a URL…


    $url = filter_var($url, FILTER_SANITIZE_STRING);

    if (
        substr($url, 0, 4) == 'http' &&
        filter_var($url, FILTER_VALIDATE_URL)
    ) {
        // do your stuff here
    }

    1. Benny says:

      Wow, that is nice and simple, Jabari Hunt. I haven’t been able to find an exploit in the 15 minutes I tried. Are you sure this is watertight? Thanks for showing me this.

  5. Pingback: URL Validation
  6. ubaid says:

    filter_var FILTER_VALIDATE_URL is also not able to recognize urls with parameters, for example http://www.example.com/searchform.php3?keysearch3=479&keysearch2=27

    1. John Wick says:

      You must have made a mistake; this URL is expected to be allowed. I just tested it and it does pass.

  7. For anyone developing with WordPress, just use

    esc_url_raw($url) === $url

    to validate a URL (here’s the WordPress documentation on `esc_url_raw`: https://developer.wordpress.org/reference/functions/esc_url_raw/).
