{"id":669,"date":"2012-09-19T23:13:33","date_gmt":"2012-09-19T21:13:33","guid":{"rendered":"https:\/\/d-mueller.de\/blog\/?p=669"},"modified":"2012-09-19T23:13:33","modified_gmt":"2012-09-19T21:13:33","slug":"why-url-validation-with-filter_var-might-not-be-a-good-idea","status":"publish","type":"post","link":"https:\/\/d-mueller.de\/blog\/why-url-validation-with-filter_var-might-not-be-a-good-idea\/","title":{"rendered":"Why URL validation with filter_var might not be a good idea"},"content":{"rendered":"<div style=\"background:#FFFD91;padding:10px;margin-bottom:20px\">Prefer this in German? <a href=\"https:\/\/d-mueller.de\/blog\/warum-url-validierung-mit-filter_var-keine-gute-idee-ist\/\">Warum URL-Validierung mit filter_var keine gute Idee ist<\/a><\/div>\n<p>When PHP 5.2 introduced the <a href=\"http:\/\/php.net\/manual\/de\/function.filter-var.php\">filter_var<\/a> function, the days of regex monsters like this one (<a href=\"http:\/\/phpcentral.com\/208-url-validation-in-php.html\">taken from here<\/a>) were over:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\n$urlregex = &quot;^(https?|ftp)\\:\\\/\\\/([a-z0-9+!*(),;?&amp;=\\$_.-]+(\\:[a-z0-9+!*(),;?&amp;=\\$_.-]+)?@)?[a-z0-9+\\$_-]+(\\.[a-z0-9+\\$_-]+)*(\\:[0-9]{2,5})?(\\\/([a-z0-9+\\$_-]\\.?)+)*\\\/?(\\?[a-z+&amp;\\$_.-][a-z0-9;:@\/&amp;%=+\\$_.-]*)?(#[a-z_.-][a-z0-9+\\$_.-]*)?\\$&quot;;\r\n\/\/ note: eregi() was deprecated in PHP 5.3 and removed in PHP 7.0\r\nif (eregi($urlregex, $url)) {echo &quot;good&quot;;} else {echo &quot;bad&quot;;}\r\n<\/pre>\n<p>Compare that to the simple yet effective filter_var syntax:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfilter_var($url, FILTER_VALIDATE_URL)\r\n<\/pre>\n<p>Filter flags can be passed as the third parameter. 
For URL validation, four flags are available:<\/p>\n<pre data-enlighter-language=\"enlighter\" class=\"EnlighterJSRAW\">\r\nFILTER_FLAG_SCHEME_REQUIRED\r\nFILTER_FLAG_HOST_REQUIRED\r\nFILTER_FLAG_PATH_REQUIRED\r\nFILTER_FLAG_QUERY_REQUIRED\r\n<\/pre>\n<p>The first two, <i>FILTER_FLAG_SCHEME_REQUIRED<\/i> and <i>FILTER_FLAG_HOST_REQUIRED<\/i>, are applied by default. (PHP 7.3 later deprecated passing them explicitly, and PHP 8.0 removed them, because a scheme and a host are always required.)<\/p>\n<h2>Get started!<\/h2>\n<p>Alright, let&#8217;s look at some critical examples.<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfilter_var(&#039;http:\/\/example.com\/&quot;&gt;&lt;script&gt;alert(&quot;xss&quot;)&lt;\/script&gt;&#039;, FILTER_VALIDATE_URL) !== false; \/\/true\r\n<\/pre>\n<p>Well, nobody said that filter_var was built to fight XSS. Let&#8217;s accept this and move on:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfilter_var(&#039;php:\/\/filter\/read=convert.base64-encode\/resource=\/etc\/passwd&#039;, FILTER_VALIDATE_URL) !== false; \/\/true\r\n<\/pre>\n<p>This one is far more critical: any scheme passes the filter. Allowing http(s) and ftp would have been acceptable, but a <i>php:\/\/<\/i> stream wrapper URL becomes a real problem as soon as the value reaches a file function such as <i>file_get_contents()<\/i>. filter_var validates against the generic URI syntax of <a href=\"http:\/\/www.ietf.org\/rfc\/rfc2396.txt\">RFC 2396<\/a>, so it has to let through all the evilness that a URL can contain, including made-up schemes:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfilter_var(&#039;foo:\/\/bar&#039;, FILTER_VALIDATE_URL) !== false; \/\/true\r\n<\/pre>\n<h2>And the best one<\/h2>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfilter_var(&#039;javascript:\/\/test%0Aalert(321)&#039;, FILTER_VALIDATE_URL) !== false; \/\/true\r\n<\/pre>\n<p>Let&#8217;s take a closer look: <i>javascript<\/i> is the scheme. 
And yes, that is a legitimate scheme: enter <i>javascript:alert(1+2+3+4);<\/i> in your browser&#8217;s address bar and you&#8217;ll see:<\/p>\n<div id=\"attachment_638\" style=\"width: 469px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/d-mueller.de\/blog\/wp-content\/uploads\/2012\/09\/javascript-url.png\"><img loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-638\" src=\"https:\/\/d-mueller.de\/blog\/wp-content\/uploads\/2012\/09\/javascript-url.png\" alt=\"Javascript-URL\" title=\"Javascript-URL\" width=\"459\" height=\"234\" class=\"size-full wp-image-638\" srcset=\"https:\/\/d-mueller.de\/blog\/wp-content\/uploads\/2012\/09\/javascript-url.png 459w, https:\/\/d-mueller.de\/blog\/wp-content\/uploads\/2012\/09\/javascript-url-300x152.png 300w\" sizes=\"auto, (max-width: 459px) 100vw, 459px\" \/><\/a><p id=\"caption-attachment-638\" class=\"wp-caption-text\">Javascript-URL<\/p><\/div>\n<p>This is how bookmarklets work, so it is no secret. The interesting part comes next: to JavaScript, the double \/\/ starts an ordinary single-line comment, while to filter_var it looks just like the <i>scheme:\/\/host<\/i> pattern from the examples above. After that comes the sequence <i>%0A<\/i>, which is exactly what the following code outputs:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\necho urlencode(&quot;\\n&quot;); \/\/ %0A\r\n<\/pre>\n<p>Get it? The browser percent-decodes a javascript: URL before running it, so <i>%0A<\/i> turns into a real newline. That newline ends the comment started by <i>\/\/<\/i>, and everything after it runs as arbitrary JavaScript. Imagine a dating site that validates user-supplied homepage URLs with filter_var and renders them as links on the front page. Very evil. 
<a href=\"http:\/\/codepen.io\/anon\/pen\/logCd\">Try it yourself<\/a>.<\/p>\n<h2>And now?<\/h2>\n<p>The following wrapper around filter_var could be worthwhile:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nfunction validate_url($url)\r\n{\r\n\t$url = trim($url);\r\n\t\r\n\t\/\/ Only accept URLs starting with http:\/\/ or https:\/\/ (case-sensitive check).\r\n\t\/\/ FILTER_VALIDATE_URL always requires a scheme and a host; passing\r\n\t\/\/ FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED explicitly is\r\n\t\/\/ redundant and fails on PHP 8, where those constants were removed.\r\n\treturn ((strpos($url, &quot;http:\/\/&quot;) === 0 || strpos($url, &quot;https:\/\/&quot;) === 0) &amp;&amp;\r\n\t\t    filter_var($url, FILTER_VALIDATE_URL) !== false);\r\n}\r\n<\/pre>\n<p>But even with this wrapper, the rather unusual URL <i>http:\/\/x<\/i> still passes validation. Maybe the regex monsters aren&#8217;t that bad after all ;). And before I forget: filter_var is not multibyte-aware. The perfectly valid internationalized URL <a href=\"http:\/\/\uc2a4\ud0c0\ubc85\uc2a4\ucf54\ub9ac\uc544.com\">http:\/\/\uc2a4\ud0c0\ubc85\uc2a4\ucf54\ub9ac\uc544.com<\/a> is rejected:<\/p>\n<pre data-enlighter-language=\"php\" class=\"EnlighterJSRAW\">\r\nvar_dump(filter_var(&quot;http:\/\/\uc2a4\ud0c0\ubc85\uc2a4\ucf54\ub9ac\uc544.com&quot;, FILTER_VALIDATE_URL) !== false); \/\/bool(false)\r\n<\/pre>\n<p>To conclude: use filter_var with care, adapt it to your situation, and be aware of its weaknesses. Finally, I&#8217;d like to recommend <a href=\"http:\/\/www.hashbangcode.com\/examples\/filter_var_url_validate\/\">this nice collection<\/a> of filter_var tests, broken down by filter flag. And if you like, have a look at <a href=\"https:\/\/github.com\/symfony\/Validator\/blob\/master\/Constraints\/UrlValidator.php\">Symfony 2&#8217;s URL validator<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Prefer this in German? 
Warum URL-Validierung mit filter_var keine gute Idee ist Since PHP 5.2 brought us the filter_var function, the time of such monsters was over (taken from here): $urlregex = &quot;^(https?|ftp)\\:\\\/\\\/([a-z0-9+!*(),;?&amp;=\\$_.-]+(\\:[a-z0-9+!*(),;?&amp;=\\$_.-]+)?@)?[a-z0-9+\\$_-]+(\\.[a-z0-9+\\$_-]+)*(\\:[0-9]{2,5})?(\\\/([a-z0-9+\\$_-]\\.?)+)*\\\/?(\\?[a-z+&amp;\\$_.-][a-z0-9;:@\/&amp;%=+\\$_.-]*)?(#[a-z_.-][a-z0-9+\\$_.-]*)?\\$&quot;; if (eregi($urlregex, $url)) {echo &quot;good&quot;;} else {echo &hellip; <a href=\"https:\/\/d-mueller.de\/blog\/why-url-validation-with-filter_var-might-not-be-a-good-idea\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[4,6,14,3],"tags":[],"class_list":["post-669","post","type-post","status-publish","format-standard","hentry","category-php","category-security","category-php-wtf","category-webdev"],"_links":{"self":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts\/669","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/comments?post=669"}],"version-history":[{"count":0,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/posts\/669\/revisions"}],"wp:attachment":[{"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/media?parent=669"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/categories?post=669"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/d-mueller.de\/blog\/wp-json\/wp\/v2\/tags?post=669"}],"curies":[{"name":"wp","
href":"https:\/\/api.w.org\/{rel}","templated":true}]}}