preg_replace a word in a string, but not when it is somewhere between tags

Hi
It seems I have finally arrived at a problem requiring regular expressions (never needed them before). I tried my best, but I could use some help.

Simplified problem: I need to replace a word in a text, but with exceptions. If the word is somewhere between two specific (HTML) tags, it must not be replaced. Example: in the sentence below I need to make every ‘sun’ bold, except the second one, because it is between italics:

“The word sun must be bold. But not this instance of sun, because it is between italics. This sun though, should be bold again.”

So I went like this:

[php]$triggerword = "sun";
$replace_with = "<b>sun</b>";
$string = "The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.";[/php]

This will make all instances of sun bold:
[php]preg_replace("/$triggerword/", $replace_with, $string);[/php]

I figured I needed negative lookahead to spot the closing tag:
[php]preg_replace("/$triggerword(?!.*<\/i>)/", $replace_with, $string);[/php]

That works: the first and second ‘sun’ are no longer bold, as expected.
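For reference, here is a quick run of the lookahead-only pattern against the sample string, with the italic section written out and the `/` in `</i>` escaped (because `/` is also the pattern delimiter). It is a minimal sketch of this intermediate step, not a final answer: only the third `sun` ends up bold, so the lookahead by itself also skips the first one.

```php
<?php
// Sample string from the question, with the italic section written out.
$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.';

// Negative lookahead only: skip "sun" whenever a </i> appears anywhere later.
// The "/" inside </i> is escaped because "/" is also the pattern delimiter.
$lookahead_only = preg_replace('/sun(?!.*<\/i>)/', '<b>sun</b>', $string);

echo $lookahead_only, "\n";
```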

Now a negative lookbehind to spot the opening tag:
[php]preg_replace("/(?<!<i>.*)$triggerword(?!.*<\/i>)/", $replace_with, $string);[/php]

But that doesn’t seem to work, apparently because a lookbehind may not be variable-length. But I cannot use a fixed length, because the number of characters following the tag is variable.

So now I am kind of stuck. Any ideas? Part of me thinks I am missing the easy way out. I am also interested in a completely different approach to achieve the same result. Thanks in advance.
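The lookbehind failure is easy to reproduce, and PCRE does offer a different route. The sketch below (assuming PHP 7.4+ with its bundled PCRE) first shows the unbounded lookbehind being rejected, then swaps in a technique that is not used elsewhere in this thread: match each whole `<i>...</i>` section and discard it with the backtracking verbs `(*SKIP)(*FAIL)`, so only `sun` outside italics is replaced. It copes with any number of italic sections, but assumes they are not nested.

```php
<?php
$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.';

// An unbounded lookbehind like (?<!<i>.*) is rejected when PCRE compiles the
// pattern: preg_replace() raises a warning (silenced here with @) and returns null.
$broken = @preg_replace('/(?<!<i>.*)sun(?!.*<\/i>)/', '<b>sun</b>', $string);
var_dump($broken); // NULL

// Workaround: first match a whole <i>...</i> section and discard it with the
// backtracking verbs (*SKIP)(*FAIL); the next match attempt resumes after that
// section, so the second alternative only ever sees "sun" outside italics.
// $0 in the replacement re-inserts the matched word with its original case.
$fixed = preg_replace('~<i>.*?</i>(*SKIP)(*FAIL)|sun~is', '<b>$0</b>', $string);
echo $fixed, "\n";
```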

Here is a solution; it’s a bit of a hack, but it works. :slight_smile:
It involves cutting the $string before and after the italic tags, processing these bits, then putting the string back together and returning it.

There is probably an easier way, but when I started pressing keys, this is where I arrived :stuck_out_tongue:
[php]<?php
$triggerword = "sun";
$replace_with = "<b>sun</b>";
$string = "The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.";

// Split the string around the (first) <i>...</i> section, replace the word only
// in the parts before and after it, then glue the three parts back together.
function meddle_with_string($string, $triggerword, $replace_with) {
    $st_str = strpos($string, '<i>');
    $end_str = strpos($string, '</i>');
    if ($st_str === false || $end_str === false) {
        // No italics at all: replace everywhere
        return preg_replace("/$triggerword/", $replace_with, $string);
    }
    $end_str += strlen('</i>'); // position just after </i>
    $substring_a = preg_replace("/$triggerword/", $replace_with, substr($string, 0, $st_str));
    $substring_b = substr($string, $st_str, $end_str - $st_str); // italic part, untouched
    $substring_c = preg_replace("/$triggerword/", $replace_with, substr($string, $end_str));
    return $substring_a . $substring_b . $substring_c;
}
$newstring = meddle_with_string($string, $triggerword, $replace_with);
print $string . "\n";
print $newstring . "\n";
?>[/php]
Output:
[code]The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.
The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics.</i> This <b>sun</b> though, should be bold again.[/code]

Hope that helps,
Red :wink:

Thank you, Red, I really appreciate this.
Your ‘hack’ could be the trick I need, but I fear it will only work if there is just one pair of italic tags in the text string, won't it?

My problem involves replacing the word any number of times in a large string of text, any part of which may or may not be between italics…

You could possibly include a counter (count how many occurrences of <i> are present in the string, then use a for loop based on strpos); however, I feel that could get a little messy.
Let me see what else I can come up with. :wink:
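The loop idea could look roughly like this: instead of counting tags first, a hypothetical `bold_outside_italics()` keeps calling `strpos()` with an offset and copies each `<i>...</i>` section through untouched. It is a sketch that assumes lowercase, non-nested `<i>` tags.

```php
<?php
// Hypothetical generalisation of the strpos() splitting idea to any number of
// <i>...</i> sections: walk the string, replace only in the parts between them.
function bold_outside_italics(string $string, string $word, string $replace_with): string
{
    $pattern = '/' . preg_quote($word, '/') . '/';
    $result = '';
    $offset = 0;

    while (($start = strpos($string, '<i>', $offset)) !== false) {
        $end = strpos($string, '</i>', $start);
        if ($end === false) {
            break; // unclosed <i>: treat the rest as ordinary text
        }
        $end += strlen('</i>');
        // Text before this <i>: replace. The italic section itself: copy as is.
        $result .= preg_replace($pattern, $replace_with, substr($string, $offset, $start - $offset));
        $result .= substr($string, $start, $end - $start);
        $offset = $end;
    }
    // Whatever follows the last </i> (or the whole string if there was no <i>).
    return $result . preg_replace($pattern, $replace_with, substr($string, $offset));
}

$string = 'The sun must be bold. <i>But not this sun.</i> This sun is bold. <i>Nor this sun.</i> Last sun.';
$out = bold_outside_italics($string, 'sun', '<b>sun</b>');
echo $out, "\n";
```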

Thank you very much. :smiley:

[php]$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again. The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again. The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.';

echo preg_replace('%sun(?![^<]*</i>)%i', '<b>sun</b>', $string);[/php]

Output:
[code]The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics.</i> This <b>sun</b> though, should be bold again. The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics.</i> This <b>sun</b> though, should be bold again. The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics.</i> This <b>sun</b> though, should be bold again.[/code]

See, told you there was an easier way :stuck_out_tongue:

Fantastic, JimL! I'm not quite sure I follow the expression, but it does indeed work.

So I went ahead and dropped the variables back in, since they need to be variables:
[php]echo preg_replace('%$triggerword(?![^<]*</i>)%i', $replace_with, $string);[/php]

The $replace_with variable works fine, but as soon as the $triggerword variable is in the pattern, it stops replacing anything.

WAIT, I got it: the pattern needs double quotes, so that $triggerword is actually interpolated:

[php]echo preg_replace("%$triggerword(?![^<]*</i>)%i", $replace_with, $string);[/php]
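The difference between the two attempts is PHP string interpolation: single quotes pass `$triggerword` to PCRE literally, where `$` is an end-of-subject anchor, so nothing can ever match and the string comes back unchanged. A short sketch comparing both, plus a `preg_quote()` variant (an addition, not from the thread) that keeps words containing regex metacharacters, such as `c++`, from breaking the pattern:

```php
<?php
$triggerword = 'sun';
$replace_with = '<b>sun</b>';
$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.';

// Single quotes: no interpolation, so PCRE sees "$triggerword" literally.
// "$" is an end-of-subject anchor, so "$triggerword" can never match.
$single = preg_replace('%$triggerword(?![^<]*</i>)%i', $replace_with, $string);

// Double quotes: $triggerword is replaced by "sun" before PCRE sees the pattern.
$double = preg_replace("%$triggerword(?![^<]*</i>)%i", $replace_with, $string);

// Safer variant: escape regex metacharacters (and the % delimiter) in the word.
$quoted = preg_quote($triggerword, '%');
$safe = preg_replace("%$quoted(?![^<]*</i>)%i", $replace_with, $string);

echo $double, "\n";
```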

That’s the reason I don’t like regular expressions :stuck_out_tongue:

Yet you grasp them!

So I need to sort of get this, so I can work it back into the actual thing I am trying to do. This part:

(?![^<]*</i>)

breaks down to:

(?!  = negative lookahead
[^<] = matches a single character that is not contained within the brackets, so not a <. Huh? What?
*    = zero or more characters
</i> = the closing italic tag
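That breakdown can be written into the pattern itself with the `x` (extended) modifier, which makes PCRE ignore whitespace and treat `#` as the start of a comment. A sketch, assuming the same sample string as above:

```php
<?php
// Same pattern as in the thread, spread out with the x modifier so that each
// piece can carry its own # comment (whitespace in the pattern is ignored).
$pattern = '%
    sun          # the trigger word
    (?!          # negative lookahead: give up on this "sun" if what follows is
        [^<]*    #   zero or more characters that are not "<" (plain text only)
        </i>     #   followed immediately by the closing italic tag
    )
%ix';

$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again.';
$result = preg_replace($pattern, '<b>sun</b>', $string);
echo $result, "\n";
```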

So would you care to clarify the [^<], or is it magic?

The point was to not have matches inside tags, so if you set the trigger word to span, it would not give you something like this:
[php]<span>some text here regarding span or some other stuff</span>[/php] –>
[php]<<b>span</b>>some text here regarding <b>span</b> or some other stuff</<b>span</b>>[/php]

I’m honestly not sure I got it right, though :stuck_out_tongue: Didn’t really have time.
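One way to see what `[^<]*` does is to try it on a case it does not cover. Because `[^<]*` cannot step over a `<`, the lookahead effectively asks whether the *next* tag after the word is `</i>`. In the sketch below (the sample strings are made up for illustration), a word inside italics is still replaced when another tag sits between it and the closing `</i>`:

```php
<?php
// [^<]* can only cross plain text, so (?![^<]*</i>) effectively asks:
// "is the NEXT tag after this word a closing </i>?"
$pattern = '%sun(?![^<]*</i>)%i';

// Protected: the next tag after "sun" is </i>.
$plain = preg_replace($pattern, '<b>sun</b>', 'A <i>sun here</i> stays.');

// Not protected: another tag (<u>) comes before </i>, so the lookahead finds
// no </i> right away and "sun" is replaced even though it is inside italics.
$nested = preg_replace($pattern, '<b>sun</b>', 'A <i>sun and <u>moon</u></i> too.');

echo $plain, "\n", $nested, "\n";
```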

Well, thanks for all the help. I have successfully implemented the solution on a website.

One question comes to mind though: suppose there are more tags, say h1 headers, in which the replacement must also be excluded. Part of me thinks I should be able to add this tag to the same regex as well. My own attempts are failing, but I mean something along the lines of:

original

[php]preg_replace("%$triggerword(?![^<]*</i>)%i", $replace_with, $string);[/php]

becomes

[php]preg_replace("%$triggerword(?![^<]*(</i>|</h1>))%i", $replace_with, $string);[/php]

I think you've got it; the code seems to work here:
[php]$string = 'The sun must be bold. <i>But not this instance of sun because it is between italics.</i> This sun though, should be bold again. The sun must be bold. <h1>But not this instance of sun because it is between italics</h1>. This sun though, should be bold again.';
$triggerword = 'sun';
$replace_with = '<b>sun</b>';
echo preg_replace("%$triggerword(?![^<]*(</i>|</h1>))%i", $replace_with, $string);[/php]

Output:

[code]The <b>sun</b> must be bold. <i>But not this instance of sun because it is between italics.</i> This <b>sun</b> though, should be bold again. The <b>sun</b> must be bold. <h1>But not this instance of sun because it is between italics</h1>. This <b>sun</b> though, should be bold again.[/code]
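If the list of excluded tags keeps growing, the alternation can be built from an array instead of being edited by hand. A sketch with a hypothetical helper `bold_word_except_in()`; it keeps the same logic as the pattern above, so a word is only skipped when the very next tag after it is one of the listed closing tags.

```php
<?php
// Hypothetical helper: build "(?![^<]*(?:</i>|</h1>|...))" from a list of tag names.
function bold_word_except_in(string $string, string $word, array $tags): string
{
    // One "</tag>" alternative per excluded tag; tag names are regex-escaped.
    $closing = array_map(fn(string $t): string => '</' . preg_quote($t, '%') . '>', $tags);
    $pattern = '%' . preg_quote($word, '%') . '(?![^<]*(?:' . implode('|', $closing) . '))%i';
    // $0 re-inserts the matched word, keeping its original case.
    return preg_replace($pattern, '<b>$0</b>', $string);
}

$s = 'The sun. <i>Not this sun.</i> <h1>Nor this sun</h1>. Last sun.';
$out = bold_word_except_in($s, 'sun', ['i', 'h1']);
echo $out, "\n";
```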

Hmm, then I must have missed something else. I should be able to work it out.
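A completely different approach, which the original question also asked about, is to let an HTML parser decide which text sits inside which tag instead of using a regex over the HTML. A sketch using PHP's DOM extension (`DOMDocument`, `DOMXPath`); `bold_outside()` is a hypothetical helper, and it assumes plain ASCII input (UTF-8 text with non-ASCII characters would need an explicit charset hint before `loadHTML()`). Unlike the `[^<]*` lookahead, it also skips words in nested markup such as `<i>sun <u>x</u></i>`.

```php
<?php
// Hypothetical helper (not from the thread): wrap $word in <b> everywhere
// except inside the elements listed in $skip, using PHP's DOM extension.
function bold_outside(string $html, string $word, array $skip = ['i']): string
{
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so it has exactly one root element.
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $xpath = new DOMXPath($doc);

    // Select text nodes that have none of the skipped tags as an ancestor.
    $conditions = array_map(fn(string $tag): string => "ancestor::$tag", $skip);
    $query = '//text()[not(' . implode(' or ', $conditions) . ')]';
    $pattern = '/' . preg_quote($word, '/') . '/i';

    // Copy the node list first, because the tree is modified while iterating.
    foreach (iterator_to_array($xpath->query($query)) as $textNode) {
        if (!preg_match($pattern, $textNode->nodeValue)) {
            continue;
        }
        // Escape the text, add the <b> markup, then parse it back as a fragment.
        $escaped = htmlspecialchars($textNode->nodeValue, ENT_QUOTES | ENT_XML1);
        $fragment = $doc->createDocumentFragment();
        $fragment->appendXML(preg_replace($pattern, '<b>$0</b>', $escaped));
        $textNode->parentNode->replaceChild($fragment, $textNode);
    }

    // Serialise the children of the wrapper <div> back to an HTML string.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = bold_outside('The sun. <i>Not this sun.</i> <h1>Nor this sun</h1>. Last sun.', 'sun', ['i', 'h1']);
echo $result, "\n";
```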

Anyways, thanks a thousand times. I’d be happy to donate a cup-o-coffee, or do you only take karma?

I’m not really here for either :stuck_out_tongue: Happy you got it sorted! Hope you learned something in the process :slight_smile:
