Searching an array containing HTML source code for a particular string value

I’m using file() to read the source of a web page into an array and then search it for a particular tag. Here’s what is confusing me:


$page = file('http://www.example.com');

foreach ($page as $line_num => $line)
{
	if ($line == "<div class="papaya">")
	{
		echo "Found it!!!!!";
	}
}

Which doesn’t work.
As a test, I tried searching for a simpler tag, in case the unescaped double quotes (") were causing a problem, but that doesn’t work either.
I’ve also looped through $page and used is_string() to confirm that each $line is a string, and that all seems OK.

I am stumped as to what I am doing wrong.

Turn on error reporting in your environment. It will show that you have a syntax error in your if statement:

	if ($line == "<div class="papaya">")
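For reference, a common way to turn errors on while developing is sketched below. These are standard PHP settings, but treat this as a minimal sketch and check how your own server is configured. One catch: a syntax (parse) error stops the file from running at all, so these lines never execute in the broken file itself; for that case, set display_errors in php.ini or lint the file from the command line with `php -l yourfile.php`.

```php
<?php
// Development only: report every error level...
error_reporting(E_ALL);
// ...and print errors to the output instead of hiding them.
ini_set('display_errors', '1');

// Note: a parse error in THIS file happens before these lines run,
// so it will only be shown if display_errors is already on in php.ini.
```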

If your string contains double quotes ("), you need to either escape them with backslashes, like this:

	if ($line == "<div class=\"papaya\">")

or use single quotes to wrap the string:

	if ($line == '<div class="papaya">')
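Both forms produce exactly the same string, which a quick check confirms. This is a minimal sketch; the variable names are just for illustration.

```php
<?php
// Two equivalent ways to write a string literal that contains double quotes.
$escaped      = "<div class=\"papaya\">"; // double-quoted: inner quotes escaped
$singleQuoted = '<div class="papaya">';   // single-quoted: no escaping needed

var_dump($escaped === $singleQuoted); // bool(true)
```

Single quotes are often the more readable choice for HTML snippets, since HTML attributes usually use double quotes.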

Thanks for the answer, but that isn’t the issue: I have tried both escaping the quotes and wrapping the string in single quotes. As I said, the code fails even with if ($line == "<html>").

A bit more background:
Using the code:

	echo "Line No.-{$line_num}: " ."contains a ". gettype ($line ). "&nbsp;" . "\"".htmlspecialchars($line) . "\""."<br /><br />";

The output is:

Line No.-0: contains a string "<html>"

Line No.-1: contains a string "<head> "

Line No.-2: contains a string "</head> "

Line No.-3: contains a string "<body> "

Line No.-4: contains a string "<h1>heading 1</h1> "

Line No.-5: contains a string "<div class="papaya"> "

Line No.-6: contains a string "text in side the div. "

Line No.-7: contains a string "</div> "

Line No.-8: contains a string "</body> "

Line No.-9: contains a string "</html>"

So when I later look for the first HTML tag, it should find it, shouldn’t it?

I think I am missing something fundamental about the way PHP handles strings in arrays.

The next problem is that file() returns each line with its line ending (a newline, possibly preceded by a carriage return) still attached, and you’re not accounting for that.
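The trailing space visible in most of the debug output is a hint: the browser renders the line ending as whitespace. var_dump() makes the hidden character easier to spot because it prints the string length. Here is a minimal sketch, with a hard-coded line standing in for the real file() output (an assumption for the example):

```php
<?php
// file() keeps each line's ending, so a line read from the page is "<html>\n"
// (or "<html>\r\n" for Windows-style line endings), not "<html>".
$line = "<html>\n";

var_dump($line == "<html>");       // bool(false): the newline makes the strings differ
var_dump(strlen($line));           // int(7): six visible characters plus "\n"
var_dump(trim($line) == "<html>"); // bool(true): trim() strips the newline
```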

Try this:

    if (trim($line) == '<div class="papaya">')

This trims whitespace (spaces, tabs, "\r" and "\n") from the beginning and end of the string, including the line ending.
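Putting that together, the search loop could look something like the sketch below. findTagLine is a hypothetical helper name, and the $page array is made-up sample data shaped like file() output. file() also accepts a FILE_IGNORE_NEW_LINES flag that drops the line endings, but it leaves any other trailing spaces in place, so trim() is the more forgiving option.

```php
<?php
/**
 * Hypothetical helper: return the index of the first line that equals $tag
 * once surrounding whitespace is trimmed, or null if no line matches.
 */
function findTagLine(array $lines, string $tag): ?int
{
    foreach ($lines as $lineNum => $line) {
        // trim() removes the "\n" / "\r\n" that file() leaves on each line.
        if (trim($line) === $tag) {
            return $lineNum;
        }
    }
    return null;
}

// Sample lines as file() would return them: each still ends with its newline.
$page = [
    "<html>\n",
    "<body>\n",
    "<div class=\"papaya\"> \n",
    "text inside the div.\n",
    "</div>\n",
];

var_dump(findTagLine($page, '<div class="papaya">')); // int(2)
var_dump(findTagLine($page, '<span>'));               // NULL
```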

Thank you. That did it :) I also found another way of doing it:

foreach ($page as $line_num => $line)
{
	if (stristr($line, '<div class="papaya">') == true)
	{
		echo "Found it!!!!!";
	}
}

That will work, but other PHP devs might be confused by your use of stristr(). If you just want to check whether a line contains a string, the standard way is to use stripos():

    if (stripos($line, '<div class="papaya">') !== false)
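The strict !== false is important here, because stripos() returns the match position and a match at the very start of the line is position 0, which PHP treats as false in a loose comparison. A minimal sketch comparing what the two functions return, using a made-up sample line:

```php
<?php
$line = '<div class="papaya">';

// stristr() returns the part of $line from the first match onward, or false.
var_dump(stristr($line, '<div')); // string(20) "<div class="papaya">"

// stripos() returns the match position as an int, or false.
var_dump(stripos($line, '<div')); // int(0): found at the very start

// Position 0 is "falsy", so a loose comparison misreports a match at the start:
var_dump(stripos($line, '<div') == false);  // bool(true): wrong answer
var_dump(stripos($line, '<div') !== false); // bool(true): correct, it was found
```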

Thanks for the tip. I’m confused, though, about why stripos is better to use than stristr.
Also, would I be better off changing my check to test for the non-existence of the string rather than its existence (I assume so, because the function returns false)?

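It’s mostly a convention thing; they’ll both work for what you want, it’s just that stripos is normally used for this. If you use stristr, other devs are going to stop and wonder what you’re doing. It won’t take them long to figure it out, but it’s still a sticking point. The PHP manual also recommends strpos() over strstr() for a simple “does it contain this?” check, since it is faster and uses less memory (it returns a position rather than a copy of the rest of the string).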

Regarding existence vs non-existence: it doesn’t really matter which way round you do it, so choose whichever makes the code make the most sense to you.
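Here is a minimal sketch of both forms, with made-up values for $line and $tag. Whichever way round you write it, keep the comparison strict (!== false or === false) so a match at position 0 is not mistaken for “not found”.

```php
<?php
$line = "text inside the div.\n";
$tag  = '<div class="papaya">';

// Checking for existence...
if (stripos($line, $tag) !== false) {
    echo "Found it!\n";
}

// ...or for non-existence. Both are fine; only the strictness matters.
if (stripos($line, $tag) === false) {
    echo "Not on this line.\n";
}
```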

Thanks again for the help. I’m totally self-taught with PHP, and all this advice is great.
