preg_replace question on filtering spaces

datapharmer · October 6, 2007, 8:36pm

I am trying to put http:// in front of all my links automatically, but I keep getting an extra space in front of the http. Could someone tell me what is wrong with this filter or how I can fix this problem? I am going crazy trying to figure it out!

/* add http:// anywhere that it is missing */
$content = preg_replace("/([^\w\/,\.,@])([\w-]+\.[\w-]+)/i", "$1http://$2",$content);

Thanks in advance!

Q1712 · October 7, 2007, 1:42am

What is in $content before replacing?

And why is it that complicated?

datapharmer · October 7, 2007, 2:44pm

It sorts out data that has already been processed with a WYSIWYG editor and any data like embedded video etc, then looks at anything else that is a url. The reason it is so complicated is that it must be able to find all websites, not just ones within a very specific context, but it must be able to sort out things that aren’t websites too.

phpbb has a function that does something similar, but it doesn’t work well.

For example:

If I type as text:
http://phphelp.com/
or
http://www.phphelp.com
it will show up to you as a link

but if I only type this:
phphelp.com
You will see it as only text.

This is a problem, because you and I can both recognize that all three are links and all three should go to the same place under most circumstances.

There is more to my code than the snippet I included, but that seems to be the section that is picking up an extra space. I was able to fix the display of the code by making

[^\w\/,\.,@]

become

[^\w\/|\.|@]

It still generates a space before the link, but the result now renders properly in the browser, and all modern browsers seem to handle it, so it may no longer be an issue. I would still like it to generate proper HTML if possible.
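For comparison, here is a minimal sketch of prefixing bare domains with a negative lookbehind instead of capturing the character in front of the match. Nothing before the domain is consumed or copied, and a domain at the very start of the text is found too. The function name `add_scheme` and the sample text are made up for illustration, and the pattern is a simplification: it would also prefix things like `file.txt` or `3.14`.

```php
<?php
// Hypothetical helper, not part of the code posted in this thread.
function add_scheme(string $text): string
{
    // (?<![\w\/.@-])       the match must not follow a word character, slash, dot, @ or hyphen
    // [\w-]+(?:\.[\w-]+)+  one or more dot-separated labels, e.g. phphelp.com
    return preg_replace('/(?<![\w\/.@-])([\w-]+(?:\.[\w-]+)+)/', 'http://$1', $text);
}

echo add_scheme("phphelp.com and http://phphelp.com/ and me@example.com");
// prints: http://phphelp.com and http://phphelp.com/ and me@example.com
```

Because the lookbehind only checks the preceding character without matching it, there is no `$1` in front of `http://` that could carry a space into the output.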

Q1712 · October 7, 2007, 6:40pm

Wow, long explanation, thanks.

Now that I know how it should work, I tested it and can’t see the extra space coming from that regex:

<pre><?php
$content='dfg php.net fdh
php.net/manual';

$content = preg_replace("/([^\w\/,\.,@])([\w-]+\.[\w-]+)/i", "$1http://$2",$content);

echo $content;
?></pre>

Maybe another line of code is doing that?

datapharmer · October 7, 2007, 9:02pm

aha! you are correct. It isn’t coming from there, but I am now certain it is coming from here:

<pre><?php
$content='dfg http://php.net fdh
http://php.net/manual';
$content = preg_replace("/([^data=\"\w|-\w|>|][\w]+:\/\/[\w-?&;#~=.\/@]+[\w\/])/i"," <a HREF=\"$1\">$1</a>",$content);
echo $content;
?></pre>

If it isn’t possible to prevent the space from being included to begin with, maybe it is possible to trim spaces and hard returns from $1 without removing the rest of the spacing in $content? Any help would be appreciated!

Q1712 · October 7, 2007, 9:35pm

That one looks strange to me.

The solution would be to change the backreferences, so that the character in front of the URL gets its own group ($1) and the URL becomes $2:

$content = preg_replace("/([^data=\"\w|-\w|>|])([\w]+:\/\/[\w-?&;#~=.\/@]+[\w\/])/i","$1<a href=\"$2\">$2</a>",$content);
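To see where the space comes from, the two variants can be run side by side. This is only a sketch: the exclusion class is reduced to `[^"\w>]` (an assumption standing in for the original one), and the hyphen in the URL class is escaped because PHP 7.3 and later may reject an unescaped hyphen after `\w` as an invalid range.

```php
<?php
$content = 'dfg http://php.net fdh';

// One group: the character in front of the URL is part of $1,
// so the space ends up inside href and inside the link text.
$one = preg_replace(
    '/([^"\w>][a-z]+:\/\/[\w\-?&;#~=.\/@]+[\w\/])/i',
    '<a href="$1">$1</a>',
    $content
);

// Two groups: $1 is the preceding character, $2 is the URL alone.
$two = preg_replace(
    '/([^"\w>])([a-z]+:\/\/[\w\-?&;#~=.\/@]+[\w\/])/i',
    '$1<a href="$2">$2</a>',
    $content
);

echo $one, "\n"; // dfg<a href=" http://php.net"> http://php.net</a> fdh
echo $two, "\n"; // dfg <a href="http://php.net">http://php.net</a> fdh
```

In the second variant the space is written back in front of the link, where it belongs, instead of inside it.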

But there are more problems in there:
inside [] you can only list single characters:

[^data=\"\w|-\w|>|]

a char that is not:
d
a
t
a
=
"
a word character (\w)
|
-
a word character (\w)
|
>
|

I know what it is meant to do, but I’m not sure how to do that with regex alone.

I’ll get back to you as soon as I have an idea.
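The difference between a character class and a literal sequence can be checked directly. The sketch below compares `[^data]` with a negative lookbehind, which is one way to test for a whole sequence such as `data="` in front of a match. It assumes the original intent was to skip URLs that sit in a `data="..."` attribute; the helper name `link_unless_data_attr` is made up for illustration.

```php
<?php
// [^data] is a set: any ONE character other than a, d or t.
$class = '/[^data]x/';
// (?<!data) is a negative lookbehind: x must not follow the sequence "data".
$look = '/(?<!data)x/';

foreach (['bx', 'tax', 'datax'] as $s) {
    printf("%s: class=%d lookbehind=%d\n", $s, preg_match($class, $s), preg_match($look, $s));
}
// bx: class=1 lookbehind=1
// tax: class=0 lookbehind=1
// datax: class=0 lookbehind=0

// Hypothetical helper: wrap URLs in a link unless they directly follow data="
function link_unless_data_attr(string $html): string
{
    return preg_replace(
        '/(?<!data=")\b([a-z]+:\/\/[\w\-?&;#~=.\/@]+[\w\/])/i',
        '<a href="$1">$1</a>',
        $html
    );
}

echo link_unless_data_attr('see http://php.net now'), "\n";
echo link_unless_data_attr('<object data="http://php.net/x.swf">'), "\n";
```

The class rejects `tax` only because the single character before `x` happens to be an `a`, while the lookbehind rejects a match only when the full sequence is present.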

Q1712 · October 7, 2007, 10:30pm

had an idea:

[code]

<?php

error_reporting(E_ALL);
$content='dfg http://php.net fdh

http://php.net/manual



http://php.net/manual
http://php.net/manual
site: "http://php.net/manual"';
$content= preg_replace("/([\w]+:\/\/[\w-?&;#~=.\/@]+[\w\/])/i","<a href=\"$1\">$1</a>",$content);

$content= preg_replace("/(<[^>]*)<a [^>]*>([^<]*)<\/a>/i","$1$2",$content);

$content= preg_replace("/(<a[^>]*>[^<]*)<a [^>]*>([^<]*)<\/a>/i","$1$2",$content);
echo $content;

?>

[/code]

This wraps every URL in a link first and then undoes the wrapping if the URL was inside an HTML tag or between an opening and a closing a-tag.

It seems to work, but some testing should be done anyway.
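As a starting point for that testing, here is a small sketch that runs the three steps on inputs where the HTML is spelled out. The wrapper name `autolink` and the sample strings are made up for illustration. The hyphen in the URL class is escaped because PHP 7.3 and later may reject `\w-?` as an invalid range.

```php
<?php
// Hypothetical wrapper; the three patterns follow the approach described above.
function autolink(string $content): string
{
    // 1. Wrap everything that looks like a URL in a link.
    $content = preg_replace('/(\w+:\/\/[\w\-?&;#~=.\/@]+[\w\/])/i', '<a href="$1">$1</a>', $content);
    // 2. Undo the wrapping where the URL sits inside a tag, e.g. in an attribute value.
    $content = preg_replace('/(<[^>]*)<a [^>]*>([^<]*)<\/a>/i', '$1$2', $content);
    // 3. Undo the wrapping where the URL was already the text of a link.
    $content = preg_replace('/(<a[^>]*>[^<]*)<a [^>]*>([^<]*)<\/a>/i', '$1$2', $content);
    return $content;
}

$samples = [
    'dfg http://php.net fdh',                                    // plain text: gets linked
    '<img src="http://php.net/logo.png">',                       // attribute value: left alone
    '<a href="http://php.net/manual">http://php.net/manual</a>', // existing link: left alone
];

foreach ($samples as $s) {
    echo autolink($s), "\n";
}
```

Only the first sample should come out changed. Markup that these patterns do not anticipate, such as a `>` inside an attribute value, would need separate testing.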

datapharmer · October 7, 2007, 11:44pm

Wow, that’s great! It seems to work like a charm, and is a lot simpler than what I had come up with! Just so I am certain I am following the code correctly:

// wraps everything that looks like a url as a hyperlink
$content= preg_replace("/([\w]+:\/\/[\w-?&;#~=.\/@]+[\w\/])/i","<a href=\"$1\">$1</a>",$content);

// gets rid of the link wrapped around any url that sits inside a tag
$content= preg_replace("/(<[^>]*)<a [^>]*>([^<]*)<\/a>/i","$1$2",$content);

// gets rid of the extra link (and its trailing </a>) around a url that was already inside an <a>...</a>
$content= preg_replace("/(<a[^>]*>[^<]*)<a [^>]*>([^<]*)<\/a>/i","$1$2",$content);

Thanks again for all your invaluable help!

Q1712 · October 8, 2007, 6:57am

right
