PHP regexp pattern problem

Euklides · July 20, 2014, 8:33pm

Hello,

I am trying to write a simple script that summarizes the lunch menus of several restaurants.

On one of the websites the data is not structured well enough for me to use a DOM parser, so I am trying to use a regexp to read the relevant lines. However, I am having trouble matching multiple lines. For example, this is part of the HTML page I want to parse:

[code]
Måndag

28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter

Veg:Spaghetti med quornfärssås & riven ost
Wok:Fläskfilé i sötsursås
Grill:Hamburgare med bröd & pommes
Grill: Kebabtallrik med pommes & tillbehör
[/code]

So basically what I want to output with my script is:

28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter
Veg:Spaghetti med quornfärssås & riven ost
Wok:Fläskfilé i sötsursås
Grill:Hamburgare med bröd & pommes
Grill: Kebabtallrik med pommes & tillbehör

However, I am not sure how to handle this with a regexp, since there are several line breaks. So far I have managed to write code that outputs the first line:

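[php]
<?php
$url = "http://www.restauranghusman.se/veckans.html";
$content = file_get_contents($url);

// A date such as "28/7" followed by letters, spaces and a few punctuation marks.
// The character class contains no newline (and no '<'), so the match cannot
// continue past the end of the first menu line.
$regexp = "/[0-9]+\/[0-9]+[A-Za-z:,& öäå\"]+/";

if (preg_match($regexp, $content, $matches)) {
    echo $matches[0];
} else {
    echo "Did not find any match";
}
?>
[/php]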

To summarize: how should I write the regexp pattern to output the wanted text, and how should the line breaks be handled? Also, perhaps there is a better way to do this than with a regexp. If so, please explain which alternative would be better.

Looking forward to your answers.

Topcoder · July 21, 2014, 12:43am

I’m not even sure you need to use a regex…

I would just replace all the <br> tags with carriage return + line feed (\r\n):

[php]$content = str_replace('<br>', "\r\n", $content);[/php]
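Note that "\r\n" has to be in double quotes; in single quotes PHP keeps the backslashes literally. Also, str_replace() only replaces the exact string it is given, so <br/>, <br /> or <BR> would slip through. A minimal sketch, assuming the menu lines are separated by some form of br tag, that normalizes all of these variants with preg_replace() (the helper name normalize_breaks is hypothetical):

```php
<?php
// Hypothetical helper: turn <br>, <br/>, <br /> (in any letter case) into CRLF.
// \s* allows optional whitespace before the optional slash; /i ignores case.
// Tags with attributes, such as <br class="x">, are not covered by this pattern.
function normalize_breaks($html)
{
    return preg_replace('/<br\s*\/?>/i', "\r\n", $html);
}

echo normalize_breaks("Veg:Spaghetti<br>Wok:Fläskfilé<BR />Grill:Hamburgare");
```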

Then strip out all the remaining HTML with strip_tags(), which would make the above statement look like this:

[php]$content = strip_tags(str_replace('<br>', "\r\n", $content));[/php]
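Putting these steps together and cleaning up the result gives a list of non-empty lines. This is only a sketch: the real markup of the page is not shown in the thread, so the sample HTML below and the helper name html_to_lines are hypothetical. It also decodes entities, because strip_tags() leaves &amp; as it is, and it splits on \R, which matches \r\n, \n or \r.

```php
<?php
// Hypothetical helper: HTML fragment -> array of trimmed, non-empty text lines.
function html_to_lines($html)
{
    // 1. Line-break tags become real line breaks (regex so <br /> and <BR> match too).
    $text = preg_replace('/<br\s*\/?>/i', "\r\n", $html);
    // 2. Remove all remaining tags.
    $text = strip_tags($text);
    // 3. strip_tags() does not touch entities such as &amp;, so decode them.
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
    // 4. Split on any line ending; /u keeps multibyte characters like Å intact.
    $lines = array_map('trim', preg_split('/\R/u', $text));
    // 5. Drop empty lines and reindex the array from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}

// Hypothetical markup modeled on the sample in the question.
$html = '<p><b>Måndag</b><br><br>'
      . '28/7, Kycklingryta “Flygande Jakob” med bacon, banan &amp; jordnötter<br><br>'
      . 'Veg:Spaghetti med quornfärssås &amp; riven ost<br />'
      . 'Wok:Fläskfilé i sötsursås</p>';

print_r(html_to_lines($html));
```

The day heading ("Måndag") is still in the result; the date pattern from the question could be used to find where the actual menu starts.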

That should leave you close to the result you’re looking for based on the HTML sample you provided.
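For the original regexp question: once the HTML is gone, the line breaks stop being a problem if the pattern works line by line. A sketch with preg_match_all() and the m modifier, under the assumption that every wanted line starts with a date like "28/7," or with one of the Veg:/Wok:/Grill: labels seen in the sample (other days or restaurants may use different labels):

```php
<?php
// Plain text as it might look after the str_replace()/strip_tags() step.
// (Hypothetical input modeled on the sample in the question.)
$text = "Måndag\r\n\r\n"
      . "28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter\r\n\r\n"
      . "Veg:Spaghetti med quornfärssås & riven ost\r\n"
      . "Wok:Fläskfilé i sötsursås\r\n"
      . "Grill:Hamburgare med bröd & pommes\r\n"
      . "Grill: Kebabtallrik med pommes & tillbehör\r\n";

// /m: ^ matches at the start of every line, not only at the start of the string.
// /u: pattern and subject are UTF-8 (å, ä, ö and the curly quotes).
// [^\r\n]* takes the rest of the line but not the line ending itself.
// Assumption: menu lines start with "28/7,"-style dates or Veg:/Wok:/Grill:.
$pattern = '/^(?:\d+\/\d+,|Veg:|Wok:|Grill:)[^\r\n]*/mu';

if (preg_match_all($pattern, $text, $matches)) {
    // $matches[0] holds the full match for every menu line.
    echo implode("\n", $matches[0]), "\n";
} else {
    echo "Did not find any match\n";
}
```

Because the pattern never tries to cross a line ending, neither the s modifier nor explicit \r\n handling is needed.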