Get information from a web page

I have a page where you enter a web address in a form, and the page should then display all the li tags and their contents on the same page. I have looked all over the internet and other forums on this topic and have only seen examples that return the content. I need the form to show the tags as well, something like this:

    <li>Some text</li>

The code does not run. If anyone can help me I would appreciate it.

    Thanks,
    r

    Here is the code I have.

    <!DOCTYPE html>
    <html>
    <head>
    	<title></title>
    </head>
    <body>
    <h1>Find all li tags</h1>
    <?php
    
        if (isset($_POST['seturl']) && !empty($_POST['seturl'])) {
          // process page for li tags
          $url = sprintf('http://www.%s', $_POST["seturl"]);
          echo "<p>Webpage being processed - ".$url."</p>";
    
          //open the web page as a file
          $fp = @fopen($url,'r') or
                die('<h2 align="center">Cannot access web page</h2>');
    
          //read the data
          while(!feof($fp)) {
              $line = fgets($fp);
    
              $pattern = '/<li>(+?)<\/li>/i';
              preg_match_all($pattern, $line, $found);
    		    
    		 if ($found >= 1 ) {
                  $startpos = strpos($line,"<li");
                  $endpos = strpos($line,"li>");
                  while ($found > 0) {
                      $litag = substr($line, $startpos, ($endpos - $startpos + 2));
    				  echo "<p>".$litag."</p>";
                      $found--;
                      $startpos = strpos($line,"<li",$startpos + 1);
                      $endpos = strpos($line,"li>",$startpos + 1);
                  }
              } 
          }
          
          // Close the "file"
    	  fclose ($fp);
       
        }
        else {
        
    ?>
    <form action="find_li.php" method="post">
    <table border="0" cellspacing="2" cellpadding="2" >
    <tr><td>Enter URL below.</td></tr>
    <tr><td>www.<input type="text" name="seturl" size="40" /></td></tr>
    <tr><td><input type="submit" value="Get Web Page" /></td></tr>
    </table>
    </form>
    <?php  }  ?>
    </body>
    </html>
    

    r-man,

    This should get you started, but you’ll need to work your logic out correctly. I changed your pattern; the one you had was syntactically incorrect. I also changed the way you tell whether it matched. preg_match_all fills an array, and you need to count its first element ($found[0], the full matches) to determine whether there was a match.
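To make the shape of that array concrete, here is a minimal sketch. The sample line and the lazy pattern with a capture group are assumptions for illustration, not the code from the post.

```php
<?php
// Hypothetical sample line; any string read with fgets() would do.
$line = '<ul><li>Some text</li><li>More text</li></ul>';

// Lazy quantifier so each <li>...</li> pair is matched separately.
$pattern = '/<li>(.*?)<\/li>/i';

// preg_match_all() returns the number of full matches and fills $found.
$count = preg_match_all($pattern, $line, $found);

// $found[0] holds the full matches, $found[1] the first capture group.
echo $count, "\n";
echo count($found[0]), "\n";
echo $found[1][0], "\n";
```

Counting $found itself would count the groups, not the matches, which is why the check uses $found[0].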

    The rest of your logic I didn’t check, but you should be able to work through that; if not, ask. Also, this code will not find things like

    [php]

  • red
    [/php] - I’m not sure if that’s important to you, but it should match on the example you gave.

    [php]
    <li>Some text</li>
    [/php]

    Here’s the modified code.

    [php]
    <!DOCTYPE html>
    <html>
    <head>
    	<title></title>
    </head>
    <body>
    <h1>Find all li tags</h1>
    <?php
     if (isset($_POST['seturl']) && !empty($_POST['seturl'])) {
       // process page for li tags
       $url = sprintf('http://www.%s', $_POST["seturl"]);
       echo "<p>Webpage being processed - ".$url."</p>";

       //open the web page as a file
       $fp = @fopen($url,'r') or
             die('<h2 align="center">Cannot access web page</h2>');
       echo "test";

       //read the data
       while(!feof($fp)) {
           $line = fgets($fp);

           $pattern = '/<li>.*<\/li>/i';

           preg_match_all($pattern, $line, $found);
           if (count($found[0]) >= 1 ) {
               $startpos = strpos($line,"<li");
               $endpos = strpos($line,"li>");
               // Not checked: $found is an array here, so this loop still needs work
               while ($found > 0) {
                   $litag = substr($line, $startpos, ($endpos - $startpos + 2));
                   echo "<p>".$litag."</p>";
                   $found--;
                   $startpos = strpos($line,"<li",$startpos + 1);
                   $endpos = strpos($line,"li>",$startpos + 1);
               }
           }
       }
       // Close the "file"
       fclose ($fp);

     }
     else {

    ?>
    <form action="find_li.php" method="post">
    <table border="0" cellspacing="2" cellpadding="2" >
    <tr><td>Enter URL below.</td></tr>
    <tr><td>www.<input type="text" name="seturl" size="40" /></td></tr>
    <tr><td><input type="submit" value="Get Web Page" /></td></tr>
    </table>
    </form>
    <?php } ?>
    </body>
    </html>
    [/php]

    Topcoder:

    Thank you for your reply. I tried it and the only thing that comes up is test. I tried a simple web page I created with simple li tags but it did not work. I must be doing something wrong.

    I have been trying the following code that I found on the internet. It kinda works - it does not show the tags completely. Here is the code I have been trying:
    [php]
    while(!feof($fp)) {
        $line = fgets($fp);

        $pattern = '/[<div>](.*?)<\/div>/';

        preg_match_all($pattern, $line, $found);
        if (count($found[0]) >= 1 ) {
            echo $found[1][0] . "\n";
        }
    }
    [/php]

    If you have any other suggestions please let me know.
    Thank you for all your help,
    r

    An easier way to get the right regex for your problem is to visit this site:

    www.myregextester.com

    Click the button for “Show code for PHP”, so you can just cut and paste the code once you’ve got it right.

    Then type this in for the match pattern at the top:

    [php]<div>.*</div>[/php]

    Then for the source text, just paste in the code you want to parse. I was using:

    [php]hello
    <div>hi</div>
    aaaaa
    <div>hi1</div>
    aaaaa
    <div>hi2</div>
    [/php]

    Then click on submit

    TopCoder.

    Thank you for the information about myregextester.com. I went to the web site and tried it out. It does give the code on the site. I copied the code exactly as they had it and placed it in a new page. When I ran the code it only gave me the content between the tags like this:
    [php]
    Array
    (
    [0] => Array
    (
    [0] => hi
    [1] => hi1
    [2] => hi2
    )
    )
    [/php]

    It did not display the information like the web site gave. So I must be doing something wrong. I just haven’t figured it out yet.

    Thanks,
    r.

    You’re not doing anything wrong; it’s working. You just need to click “View Source” and you will see the div tags. The div tags are not rendered on the screen because they’re not supposed to be: the browser treats them as markup.

    Take a look below. See how I wrapped the output in htmlentities? Now the div tags will show up on the screen.

    [php]<?php
    $sourcestring="hello
    <div>hi</div>
    aaaaa
    <div>hi1</div>
    aaaaa
    <div>hi2</div>
    ";
    preg_match_all('/<div>.*<\/div>/',$sourcestring,$matches);
    echo "<pre>".htmlentities(print_r($matches,true));
    ?>[/php]
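Carrying the same idea back to the original li question, a minimal sketch might look like the one below. It assumes the page is already in a string (the sample $html stands in for whatever was read from the URL), and the helper name findListItems is made up for illustration. The pattern also accepts attributes on the opening tag and items that span several lines, which is an assumption about what real pages contain. It uses htmlspecialchars rather than htmlentities, which is enough to make the tags visible.

```php
<?php
/**
 * Return every <li>...</li> element found in $html, tags included.
 * Hypothetical helper, not part of the thread's code.
 */
function findListItems(string $html): array
{
    // \b[^>]* allows attributes; "s" lets . cross newlines; "i" ignores case.
    preg_match_all('/<li\b[^>]*>.*?<\/li>/is', $html, $matches);
    return $matches[0];
}

// Sample input standing in for a fetched page.
$html = "<ul>\n<li>Some text</li>\n<li class=\"x\">red</li>\n</ul>";

foreach (findListItems($html) as $tag) {
    // Escape the markup so the browser shows the tag instead of rendering it.
    echo "<p>" . htmlspecialchars($tag) . "</p>\n";
}
```

A regex like this is only a rough tool: nested lists will not be matched correctly, so treat it as a starting point rather than a full HTML parser.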

    Sorry I did not get right back to you, I took some time off from working on the code. It started to look all the same to me. I have made some changes to my code and with the added information from you the code works. I want to thank you for all your help. You were helpful to me, so you deserve a cup of coffee!
    Thanks again!
    r
