Hello,
I am trying to write a simple script that summarizes lunch menus from different restaurants.
For one of the websites, the data is not formatted well enough for me to use a DOM parser, so I am trying to use a regexp to read the relevant lines. However, I am having a problem matching multiple lines. For example, this is part of the HTML page I want to parse:
""" HTML Website """
Måndag
28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter
Veg:Spaghetti med quornfärssås & riven ost
Wok:Fläskfilé i sötsursås
Grill:Hamburgare med bröd & pommes
Grill: Kebabtallrik med pommes & tillbehör
"""""""""""""""""""""""""""""""""
So basically what I want to output with my script is:
28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter
Veg:Spaghetti med quornfärssås & riven ost
Wok:Fläskfilé i sötsursås
Grill:Hamburgare med bröd & pommes
Grill: Kebabtallrik med pommes & tillbehör
However, I am not sure how to handle this with a regexp, since the text spans several line breaks. I have managed to write code that outputs the first line (see code):
[php]
<?php
$url = "http://www.restauranghusman.se/veckans.html";
$content = file_get_contents($url);
$regexp = "/[0-9]+\/[0-9]+[A-Za-z:,& öäå\"]+/";
if (preg_match($regexp, $content, $matches)) {
    echo $matches[0];
} else {
    echo "Did not find any match";
}
?>[/php]
To summarize the question: how should I write the regexp pattern to output the wanted text, and how should the newlines be handled? Also, perhaps there is a better way to do this than with a regexp. If so, please explain which alternative would be better.
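To make the question more concrete, here is a rough sketch of the direction I have been considering. It runs only against the sample text above, not the live page. It assumes PHP 7.1 or later, that the text has already been reduced to plain UTF-8 lines (for example with strip_tags), and that every extra line of a day's menu starts with a label followed by a colon. The names $sample and extractMenuBlock are made up for illustration. I am not sure this is the right approach, so corrections are welcome.

[php]
<?php
// Sample lines copied from the post; the real page would need its tags removed first.
$sample = implode("\n", [
    'Måndag',
    '28/7, Kycklingryta “Flygande Jakob” med bacon, banan & jordnötter',
    'Veg:Spaghetti med quornfärssås & riven ost',
    'Wok:Fläskfilé i sötsursås',
    'Grill:Hamburgare med bröd & pommes',
    'Grill: Kebabtallrik med pommes & tillbehör',
]);

// Hypothetical helper: returns the first dated menu block, or null if none is found.
function extractMenuBlock(string $text): ?string
{
    // m: ^ matches at the start of every line.
    // u: pattern and subject are treated as UTF-8, so \p{L} covers å, ä, ö.
    // The first part matches a line starting with a date such as "28/7,".
    // The group then consumes each following line that starts with "Label:".
    // \R matches any line break (\n, \r\n or \r).
    $pattern = '~^\d{1,2}/\d{1,2},.*(?:\R\p{L}+:.*)*~mu';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$block = extractMenuBlock($sample);
echo $block ?? 'Did not find any match';
?>[/php]

The idea is to leave the s modifier off so that "." stops at the end of each line, and to consume the line breaks explicitly with \R. That way the match ends at the first line that does not look like "Label: dish", instead of running on through the rest of the page.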
Looking forward to your answers.