Hello:
I am trying to develop a PHP scraper that scrapes all headings h1 to h6 from a web page. The output should show the h tags and their contents. Each line should be something like this
Heading One
,Heading two
etc. I think I am close but it doses not work. If any one can help me I would appreciate it.Thanks,
Randy
Here is my code:
[php]
Headings h1 to h6Find all the headings on a Web Page
<?php if (isset($_POST['chkurl']) && !empty($_POST['chkurl'])) { // process page for headings $url = sprintf('http://www.%s', $_POST["chkurl"]); echo "Webpage being processed - ".$url."
"; //open the web page as a file $fp = @fopen($url,'r') or die('Cannot access web page
'); //read the data while(!feof($fp)) { $line = fgets($fp); // check for a heading...with string functions $pattern = '/\(.+)<\/h[0-6]>/'; $matches = preg_match_all($pattern, $line, $found); if ($matches >= 1 ) { $startpos = strpos($line,""); while ($matches > 0) { $heading = substr($line, $startpos, ($endpos - $startpos + 2)); echo "".$heading."
"; $matches--; $startpos = strpos($line,"",$startpos + 1); } } } // Close the "file" fclose ($fp); } else { ?>Complete URL below. |
www. |
[/php]