Since PHP 5.2 gave us the filter_var function, the days of monsters like this one are over (taken from here):
$urlregex = "^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$";
if (eregi($urlregex, $url)) {
    echo "good";
} else {
    echo "bad";
}
The simple, yet effective syntax:
filter_var($url, FILTER_VALIDATE_URL)
As a third parameter, filter flags can be passed. For URL validation, the following four flags are available:
FILTER_FLAG_SCHEME_REQUIRED
FILTER_FLAG_HOST_REQUIRED
FILTER_FLAG_PATH_REQUIRED
FILTER_FLAG_QUERY_REQUIRED
The first two, FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED, are the default. Because they are always implied, they were deprecated in PHP 7.3 and removed in PHP 8.0.
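To illustrate the two optional flags, here is a small sketch. The example.com URLs and the variable names are placeholders chosen for this illustration, not taken from the article.

```php
<?php
// FILTER_FLAG_PATH_REQUIRED: the URL must contain a path component.
$noPath   = filter_var('http://example.com',  FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);
$withPath = filter_var('http://example.com/', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);

// FILTER_FLAG_QUERY_REQUIRED: the URL must contain a query string.
$noQuery   = filter_var('http://example.com/',     FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);
$withQuery = filter_var('http://example.com/?q=1', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);

var_dump($noPath !== false);    // bool(false)
var_dump($withPath !== false);  // bool(true)
var_dump($noQuery !== false);   // bool(false)
var_dump($withQuery !== false); // bool(true)
```

On success filter_var returns the validated string, on failure it returns false, which is why the comparisons use !== false.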
Get started!
Alright, let’s look at some critical examples.
filter_var('http://example.com/"><script>alert("xss")</script>', FILTER_VALIDATE_URL) !== false; //true
Well, nobody said that filter_var was built to fight XSS. Let’s accept this and move on:
filter_var('php://filter/read=convert.base64-encode/resource=/etc/passwd', FILTER_VALIDATE_URL) !== false; //true
This is far more critical: any scheme passes the filter. Accepting http(s) and ftp would have been fine, but letting php:// through is a problem. A URL validator has to cope with every nasty thing a URL can contain.
filter_var('foo://bar', FILTER_VALIDATE_URL) !== false; //true
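Since the filter accepts any scheme, one possible countermeasure is to check the scheme against an allowlist before (or in addition to) calling filter_var. The following is only a sketch; the helper name has_allowed_scheme and the default allowlist are assumptions made for this example.

```php
<?php
/**
 * Hypothetical helper: accept a URL only if its scheme is on an allowlist.
 */
function has_allowed_scheme(string $url, array $allowed = ['http', 'https']): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);

    // parse_url() yields null when no scheme is present and false for
    // seriously malformed input; neither is acceptable here.
    if (!is_string($scheme)) {
        return false;
    }

    // Schemes are case-insensitive, so compare in lower case.
    return in_array(strtolower($scheme), $allowed, true);
}

var_dump(has_allowed_scheme('php://filter/read=convert.base64-encode/resource=/etc/passwd')); // bool(false)
var_dump(has_allowed_scheme('foo://bar'));            // bool(false)
var_dump(has_allowed_scheme('https://example.com/')); // bool(true)
```

This check only looks at the scheme; it is meant to complement FILTER_VALIDATE_URL, not to replace it.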
And the best one:
filter_var('javascript://test%0Aalert(321)', FILTER_VALIDATE_URL) !== false; //true
Let’s take a closer look: javascript is the scheme. Enter javascript:alert(1+2+3+4); in the address bar of your browser and you’ll see an alert box showing the sum, 10.
This is how bookmarklets work, and it is no secret. But let’s move on: the double slash // starts an ordinary JavaScript comment and convinces filter_var that we are dealing with a valid URL scheme (compare the examples above). It is followed by the sequence %0A, which is exactly the output of the following code:
echo urlencode("\n");
Get it? Because of the URL-encoded newline, the JavaScript comment started by // ends, and what follows is arbitrary JavaScript code. Imagine a dating site where user URLs are validated with filter_var and displayed on the front page. Very evil. Try it yourself.
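To make the trick visible, the following sketch decodes the part after the scheme, which is roughly what ends up being executed as script once the URL is percent-decoded. The variable names are made up for this illustration.

```php
<?php
$payload = 'javascript://test%0Aalert(321)';

// %0A really is the URL-encoded newline.
var_dump(urlencode("\n") === '%0A'); // bool(true)

// Strip the "javascript:" prefix and decode the rest.
$decoded = rawurldecode(substr($payload, strlen('javascript:')));

echo $decoded, PHP_EOL;
// prints:
// //test
// alert(321)
```

The first line of the decoded text is a harmless comment; the second line is live code.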
And now?
The following wrapper around filter_var could be worthwhile:
function validate_url($url)
{
    $url = trim($url);

    // Scheme and host are always required by FILTER_VALIDATE_URL; the explicit
    // FILTER_FLAG_SCHEME_REQUIRED / FILTER_FLAG_HOST_REQUIRED flags were removed in PHP 8.0.
    return (strpos($url, "http://") === 0 || strpos($url, "https://") === 0)
        && filter_var($url, FILTER_VALIDATE_URL) !== false;
}
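A quick way to see what the wrapper buys us is to run it against the problematic inputs from above. The function is restated here without the explicit scheme and host flags (they are the default anyway) so that the snippet also runs on PHP 8; the $samples list is just an illustration.

```php
<?php
function validate_url($url)
{
    $url = trim($url);

    return (strpos($url, 'http://') === 0 || strpos($url, 'https://') === 0)
        && filter_var($url, FILTER_VALIDATE_URL) !== false;
}

$samples = [
    'https://example.com/path?x=1',                                 // accepted
    'javascript://test%0Aalert(321)',                               // rejected
    'php://filter/read=convert.base64-encode/resource=/etc/passwd', // rejected
    'foo://bar',                                                    // rejected
];

foreach ($samples as $sample) {
    printf("%-65s %s\n", $sample, validate_url($sample) ? 'accepted' : 'rejected');
}
```

Every input that does not start with http:// or https:// is rejected before filter_var is even consulted.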
But even with this wrapper function, the highly unusual URL http://x passes validation. Maybe the regex monsters are not that bad after all ;). And before I forget: filter_var is not multibyte-capable. The perfectly valid URL http://스타벅스코리아.com is rejected:
var_dump(filter_var("http://스타벅스코리아.com", FILTER_VALIDATE_URL) !== false); //bool(false)
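One possible workaround, sketched here under several assumptions, is to convert the host name to its ASCII (Punycode) form before validating. This relies on the intl extension's idn_to_ascii function, which the article does not mention; the helper name validate_idn_url is hypothetical, and the naive host replacement may misbehave on unusual URLs (for example when the host string also appears in the user info part).

```php
<?php
/**
 * Hypothetical helper: convert an internationalized host name to ASCII
 * before handing the URL to filter_var(). Assumes the intl extension;
 * without it, this falls back to plain filter_var().
 */
function validate_idn_url(string $url): bool
{
    $url = trim($url);

    if (!function_exists('idn_to_ascii')) {
        // intl is not loaded, so no conversion is possible.
        return filter_var($url, FILTER_VALIDATE_URL) !== false;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    $hostPos   = strpos($url, $host);
    if ($asciiHost === false || $hostPos === false) {
        return false;
    }

    // Swap only the first occurrence of the host, leaving the rest untouched.
    $asciiUrl = substr_replace($url, $asciiHost, $hostPos, strlen($host));

    return filter_var($asciiUrl, FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_idn_url('http://스타벅스코리아.com'));
```

The result for the Korean domain depends on whether intl is available, so no expected output is shown here.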
To conclude: use filter_var with care, adapt it to your situation, and be aware of its weaknesses. Finally, I’d like to recommend this nice collection of filter_var tests for the various filter flags. Oh, and have a look at Symfony 2’s URL validator, if you like.
Thank you very much for this helpful article. I put your URL validation to use right away, see: http://sklueh.de/2012/09/lightweight-validator-in-php/
public function check_url($mValue)
{
// Thanks to David Müller (https://d-mueller.de)
…
Thank you. I’m using Regex to validate URLs.
And I thought I was upgrading with filter_var(), not downgrading. What nonsense. By the way, validate_url() is still missing htmlspecialchars(), since the first XSS attack also gets through:
validate_url('http://example.com/"><script>alert("xss")</script>')
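A minimal sketch of that suggestion: escape the URL with htmlspecialchars() at the moment it is written into HTML. The helper name render_link is hypothetical and only serves this illustration.

```php
<?php
/**
 * Hypothetical helper: render a user-supplied URL as an HTML link,
 * escaping it for both the attribute and the link text.
 */
function render_link(string $url): string
{
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

    return '<a href="' . $safe . '">' . $safe . '</a>';
}

$html = render_link('http://example.com/"><script>alert("xss")</script>');
echo $html, PHP_EOL;
```

Escaping only prevents the URL from breaking out of the markup. It does nothing against a javascript: scheme, so the scheme check is still needed.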
Sanitize the string first, check if it starts with "http", then check if it’s a URL…
$url = filter_var($url, FILTER_SANITIZE_STRING);
if (substr($url, 0, 4) == 'http' && filter_var($url, FILTER_VALIDATE_URL)) {
    // do your stuff here
}
Wow, that is nice and simple, Jabari Hunt. I haven’t been able to find an exploit in the 15 minutes I tried. Are you sure this is watertight? Thanks for showing me this.
filter_var with FILTER_VALIDATE_URL is also unable to recognize URLs with parameters, for example http://www.example.com/searchform.php3?keysearch3=479&keysearch2=27
You must have made a mistake; this URL should be accepted. I just tested it and it passes.
For anyone developing with WordPress, just use
esc_url_raw($url) === $url
to validate a URL (here’s the WordPress documentation on `esc_url_raw`: https://developer.wordpress.org/reference/functions/esc_url_raw/).