Need help with performance issues

To please my boss I wrote up a quick script that grabs all the links for postings on CL and similar sites, so he only has to check his email instead of the sites themselves. I did my best and he was happy it worked in the end, but I am still an amateur with PHP: when the script runs it costs the server huge processing time, longer than 60 seconds! I know there is a way to fix the code to make it run faster, but I don't want to change anything for fear of it breaking and my boss knocking on my door.

[php]<?php
session_start();

if (!isset($_SESSION['counts'])) {
    $_SESSION['counts'] = 0;
}

// time how long it takes to execute Part A
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$starttime = $mtime;
// end

// your database details
$dbservertype = 'mysql';
$servername   = 'localhost';
$dbusername   = 'root';
$dbpassword   = '';
$dbname       = 'postings';

//////////////////////////////////////

/**
 * Connect to the mysql database.
 */
$conn = mysql_connect($servername, $dbusername, $dbpassword) or die(mysql_error());
mysql_select_db($dbname, $conn) or die(mysql_error());

// now let's spider the webpage and look for new posts

// put some details in
$url = "http://someotherpostsite.com/post/exmaples";
$input = @file_get_contents($url) or die("Could not access file: $url");
$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*)<\/a>";
$link_num = 0;
$done = 0;
// details done

A:

if (preg_match_all("/$regexp/siU", $input, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $match) {

        // $match[2] = link address
        // $match[3] = link text

        $check_url = mysql_query("SELECT auto_id FROM urls WHERE url='{$match[2]}'");
        $url_db = mysql_fetch_array($check_url);
        if ($url_db['auto_id'] > 0) {
            // already seen this link; skip it
        } else {
            $link_num++;
            $time = date('l jS \of F Y h:i:s A');
            mysql_query("INSERT INTO urls (url, time_stamp) VALUES ('{$match[2]}', '{$time}')");

            $mail_content = @file_get_contents($match[2]);

            // now mail it
            $to = "[email protected]";

            if ($done == 1) {
                $from = "Second site:";
            } else {
                $from = "First site:";
            }

            $subject = "{$from} {$match[3]}";

            $headers  = 'MIME-Version: 1.0' . "\r\n";
            $headers .= 'Content-type: text/html; charset=iso-8859-1' . "\r\n";
            $headers .= "From: [email protected]\r\n" . "X-Mailer: php";

            mail($to, $subject, $mail_content, $headers);
        }
    }
}

if ($done == 1) {
    goto B;
}

// now do the 2nd listing
$url = "http://somesitewithpost.com/post/examples";
$input = @file_get_contents($url) or die("Could not access file: $url");
$done = 1;
// done

goto A;

B:

$_SESSION['counts'] = $_SESSION['counts'] + $link_num;

// show time it took to execute Part B
$mtime = microtime();
$mtime = explode(" ", $mtime);
$mtime = $mtime[1] + $mtime[0];
$endtime = $mtime;
$totaltime = ($endtime - $starttime);
echo "All links crawled, checked, and mailed in {$totaltime} seconds with {$link_num} inserted into the database.
{$_SESSION['counts']} added this session.";
// end

?>[/php]

I do not see any code issues except the gotos. GOTOs are not used by most programmers.
They are a holdover from the early years of BASIC-style languages and are NOT used in OOP
languages. Removing them might help the speed issues a bit. Since PHP code is one big
text file, the gotos may be forcing the file to be reparsed several times when they are hit.

You have code that says something like:
if done==1 skip next part and goto b:
blah blah…

B:

You should have it something like:
if done!=1 {
do this code
}

No gotos… Gotos slow all code down somewhat unless the code is compiled. A compiled language such as VB.NET will reorder the goto and convert it into faster code. PHP executes in the order you enter your commands and has to locate the label tag you are jumping to. So, to find A: or B:, it has to start at the top of the code and search for the label… slows you down… Try it and post back the results so we will know for sure whether that is the problem. (Might help someone else…)
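For what it's worth, the structured version described above might look something like this (a sketch only; the crawlSite() function and its parameters are made-up names, not from the original script):

[php]<?php
// Hypothetical refactor: one function handles a whole site, called once
// per URL, so no goto labels or $done flag are needed.
function crawlSite($url, $label)
{
    $input = @file_get_contents($url);
    if ($input === false) {
        die("Could not access file: $url");
    }
    // ... run the preg_match_all() / database / mail logic on $input here,
    //     using $label as the "From" prefix for the subject line ...
    echo "Crawled {$label} ({$url})\n";
}

crawlSite("http://someotherpostsite.com/post/exmaples", "First site:");
crawlSite("http://somesitewithpost.com/post/examples", "Second site:");
?>[/php]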

Before I changed the script from using gotos to a single function, the time to run with no new entries was
0.285s
After removing the gotos and implementing the single-function idea I got
0.827s

With 1 new entry, before the function:
7.937s

With 1 new entry, after the function:
9.574s

So I went back to the old way for now until I can work out something better. This is interesting; I have made much more complex PHP projects before that ran faster than this. Could it be something else I am missing?

Well, thanks for the timing numbers. Can you post the newer code so I can compare how they both work? It seems like a lot of time to pull that little data…

Hold on, I just found out some jerk was uploading a huge file to the server through FTP; that was probably why it took so long to pull the data. I just booted him off (thank you, temp bans) and am going to try the timings again.

Finally got it fixed. The problem was that I was pulling the posting data for each new email sent out, causing massive work time. Instead of pulling the data, I created an IFRAME and just set it to display fully. Now it can do 26 new entries in 4s instead of 7s for 1 entry, and boss man won't even notice the difference!
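For anyone curious, the gist of the iframe trick is that the email body just embeds the posting URL instead of the server downloading the page contents. A sketch (the exact markup is my guess, not the poster's code):

[php]<?php
// Sketch: build an HTML email body that points an iframe at the posting
// page instead of calling file_get_contents() on it server-side.
$posting_url = "http://someotherpostsite.com/post/exmaples"; // example URL
$mail_content = '<html><body>'
    . '<iframe src="' . htmlspecialchars($posting_url) . '"'
    . ' width="100%" height="800" frameborder="0"></iframe>'
    . '</body></html>';
// $mail_content is then passed to mail() with the HTML headers as before.
?>[/php]

(Worth noting: many mail clients refuse to render iframes, so how well this works depends on the recipient's email client.)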

You should be using loops. Code like this is redundant within your loop:

//now do the 2nd listing
$url = "http://somesitewithpost.com/post/examples";
$input = @file_get_contents($url) or die("Could not access file: $url");
$done = 1;
//done

Since the URL is hardcoded, it has to pull this 3 times if I read your code correctly.
So instead of redownloading the same file, copy it to another string and operate on the string instead,
reducing the downloads from 3 times to once.
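A minimal sketch of that idea, looping over the listing URLs and downloading each page just once into a string (the array contents are illustrative):

[php]<?php
// Download each listing page a single time, then reuse the string.
$urls = array(
    "http://someotherpostsite.com/post/exmaples",
    "http://somesitewithpost.com/post/examples",
);

$pages = array();
foreach ($urls as $url) {
    $input = @file_get_contents($url);   // one download per site
    if ($input === false) {
        die("Could not access file: $url");
    }
    $pages[$url] = $input;               // keep it in memory
}

// Later, operate on $pages[$url] as many times as needed
// without hitting the network again.
?>[/php]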

In the new version I made, I fixed this: all the work is done in a function and each URL is pulled only once, instead of once for each new entry. It made a HUGE difference.

Congrats! Always nice to hear when someone succeeds… I am sure the boss is happy!

Thankful I fixed it too, because now he just gave me a huge list of sites to pull data from >.< Well, luckily all I have to do now is call the function with each URL and I am done.
