Regex not working on extracting URL from text

kencook · December 12, 2012, 8:03pm

Eventually I want to end up with the 9bbfc150 part of the URL (not the iframe id, in case the user only pastes the URL into the form). The code below currently finds 0 URLs.

[php]<?php

$url_string = '';

preg_match_all('#(www.|https?://){?}[a-zA-Z0-9]{2,254}.[a-zA-Z0-9]{2,4}(\S*)#i', $url_string, $a);

$count = count($a[1]);
echo "URLs = " . $count . "\n\n";
for ($row = 0; $row < $count; $row++) {
    echo $a[1][$row] . "\n";
}

?>[/php]

m11 · December 12, 2012, 8:45pm

The regex isn’t valid. I’m not sure why you have # at the beginning and end but they should be forward slashes.

I also don’t know what {?} is supposed to do.

With these two errors fixed, you are only matching the following string:

/embed/9bbfc150/?f=1&autoplay=0&player=full&secret=754743&loop=0&nologo=0&hd=0"></iframe>
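
For what it's worth, one way to see both effects is to run the pattern as posted next to a cleaned-up one. This is only a sketch: the original $url_string did not survive in the post, so the iframe below is a made-up sample on example.com that reuses the query string quoted above, and the escaped dots are an assumption about what was intended. As far as I can tell, PCRE reads {?} as an optional literal { followed by a required literal }, which would explain the 0 matches.

```php
<?php
// Hypothetical sample: example.com is a placeholder domain, not the poster's real one.
$sample = '<iframe src="http://example.com/embed/9bbfc150/?f=1&autoplay=0&player=full&secret=754743&loop=0&nologo=0&hd=0"></iframe>';

// Pattern as posted. "{?}" is not a valid quantifier, so PCRE treats it as
// an optional literal "{" followed by a required literal "}".
$posted = '#(www.|https?://){?}[a-zA-Z0-9]{2,254}.[a-zA-Z0-9]{2,4}(\S*)#i';
$postedCount = preg_match_all($posted, $sample, $before);

// Same pattern with "{?}" removed and the dots escaped (assumed intent).
$fixed = '#(www\.|https?://)[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}(\S*)#i';
$fixedCount = preg_match_all($fixed, $sample, $after);

echo "posted: $postedCount, fixed: $fixedCount\n";
echo "group 1: " . $after[1][0] . "\n"; // the scheme prefix
echo "group 2: " . $after[2][0] . "\n"; // everything up to the next whitespace
```

Group 2 runs all the way through the closing iframe tag because \S* only stops at whitespace, so the ID still has to be cut out of it separately.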

Also, why the loop when there is only 1 URL in the string?

m11 · December 12, 2012, 9:31pm

Here are a couple of options:

You could extract the URL from the src value, then extract the ID from that:

[php]preg_match_all('/src="(.*?)"/i', $url_string, $a);[/php]
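
A rough sketch of that two-step idea, in case it helps: pull the src value with the regex above, then let parse_url() split the URL instead of writing a second regex. The sample iframe and the example.com domain are placeholders, and the code assumes the ID is always the path segment right after embed.

```php
<?php
// Hypothetical input; example.com stands in for the real host.
$url_string = '<iframe src="http://example.com/embed/9bbfc150/?f=1&autoplay=0"></iframe>';

$id = null;

// Step 1: capture the src attribute value.
if (preg_match('/src="(.*?)"/i', $url_string, $m)) {
    // Step 2: take the path part of the URL, e.g. "/embed/9bbfc150/".
    $path = parse_url($m[1], PHP_URL_PATH);
    if (is_string($path)) {
        $segments = explode('/', trim($path, '/'));
        // Assumes the ID is the segment that follows "embed".
        if (count($segments) >= 2 && $segments[0] === 'embed') {
            $id = $segments[1];
        }
    }
}

echo $id === null ? "no ID found\n" : "ID = $id\n";
```

If the user pastes only the URL there is no src attribute, so this version finds nothing; in that case parse_url() would have to be called on the raw input instead.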

You could extract the ID directly if the URL always has the form /embed/ID:

[php]preg_match_all('/src=".*?\/embed\/([^\/]+)/i', $url_string, $a);[/php]
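
Since the user might paste only the URL, here is a sketch that does not depend on src at all. extract_embed_id() is a made-up helper name, the inputs are placeholder examples on example.com, and it assumes the ID never contains a slash, question mark, quote, or whitespace. The pattern uses # as its delimiter so the slashes need no escaping.

```php
<?php
/**
 * Hypothetical helper: returns the path segment after /embed/, or null.
 * Accepts either a full iframe tag or a bare URL.
 */
function extract_embed_id(string $input): ?string
{
    // The character class stops at "/", "?", a quote, or whitespace.
    if (preg_match('#/embed/([^/?"\'\s]+)#i', $input, $m)) {
        return $m[1];
    }
    return null;
}

// Placeholder inputs for illustration only.
$fromIframe = extract_embed_id('<iframe src="http://example.com/embed/9bbfc150/?f=1"></iframe>');
$fromUrl    = extract_embed_id('http://example.com/embed/9bbfc150?f=1');
$none       = extract_embed_id('http://example.com/watch/9bbfc150');

var_dump($fromIframe, $fromUrl, $none);
```

Using preg_match() rather than preg_match_all() also removes the need for a loop when only one URL is expected.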

This is the same method, but it also contains the domain validation from your original regex:

[php]preg_match_all('/(www\.|https?:\/\/)[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}\/embed\/([^\/]+)/i', $url_string, $a);[/php]
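
One thing to watch with this version: the first set of parentheses captures the www. or http:// prefix, so the ID ends up in $a[2] rather than $a[1]. A possible tweak, sketched below on a made-up example.com iframe, is to make the prefix group non-capturing with (?: ) so the ID stays in group 1. The # delimiter is used here only to avoid escaping the slashes.

```php
<?php
// Hypothetical input; the real host is not shown in the thread.
$url_string = '<iframe src="http://example.com/embed/9bbfc150/?f=1&autoplay=0"></iframe>';

// (?: ... ) groups the alternatives without creating a capture group.
$pattern = '#(?:www\.|https?://)[a-zA-Z0-9]{2,254}\.[a-zA-Z0-9]{2,4}/embed/([^/]+)#i';
$count = preg_match_all($pattern, $url_string, $a);

// $a[0] holds the full matches, $a[1] holds the captured IDs.
foreach ($a[1] as $id) {
    echo $id . "\n";
}
```

Note that the host part of this pattern only allows letters and digits, so a domain containing a hyphen would not pass the validation.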