Keyword Scraping Using PHP Help!


#1

Hello all.

I'm a newbie here, and I'm wondering if there are any keyword scraping tools that I can use or build using PHP. My knowledge of PHP is very limited, so it won't be easy for me. I've been looking for keyword scraping software for some time now and found that most of it is much too pricey. Free keyword scrapers tend to be rubbish as well and always have a catch: they are limited either in how long you can use them (trial versions) or in the number of keywords they scrape. The genuinely free online scrapers are just way too slow and will only scrape about 4000 keywords in 24 hours, which is literally a snail's pace! I'm looking for something that will scrape 10,000+ keywords in a matter of a few hours and can scrape online shops like Amazon and eBay. Can anyone help? :slight_smile:


#2

Well, you are talking about a lot of code, depending on what you need from the resulting data. I have several keyword scrapers for various sites that acquire data for me nightly, but I do find that each website needs its own special handling. Fetching the pages is actually easy. The hard part is pulling out the correct data you need.
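
To give a rough idea of the "pulling out the data" part, here is a minimal sketch. It assumes that "keywords" simply means the most frequent words in a page's visible text, which may not be what you are after. The function name extractKeywords and the minimum word length are made up for this example, and the regex-based tag stripping is crude; a real scraper would need per-site handling.
[php]<?php

/* Hypothetical helper: count how often each word appears in the visible text of a page */
function extractKeywords(string $html, int $minLength = 4): array
{
    /* Drop script and style blocks so their contents are not counted as text */
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? '';

    /* Replace the remaining tags with spaces so neighbouring words do not merge */
    $text = preg_replace('/<[^>]+>/', ' ', $html) ?? '';
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    /* Split into lowercase words and ignore very short ones */
    $words = str_word_count(strtolower($text), 1);
    $words = array_filter($words, function ($word) use ($minLength) {
        return strlen($word) >= $minLength;
    });

    /* Most frequent words first */
    $counts = array_count_values($words);
    arsort($counts);
    return $counts;
}

?>[/php]
You would pass it the HTML returned by file_get_contents() and then keep, say, the first 20 entries of the result with array_slice().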

Please explain what you need to scrape 10k keywords for, and why. You mentioned Amazon, so are you attempting to do price checks? That is quite easy with Amazon’s site. Here is sample code for getting a price from Amazon. You will need to know the product's ASIN (Amazon Standard Identification Number, spelled "ISIN" in the code), which Amazon uses for indexing products. It gives you an idea of what you can do with little code. This code was found with the help of Google.
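[php]<?php

/* Enter the Amazon product ASIN (the variable keeps its original "ISIN" spelling) */
$amazonISIN = "B00OTWNSMM";

/* Grab the content of the HTML web page; file_get_contents() returns false on failure */
$html = @file_get_contents("http://www.amazon.com/gp/aw/d/$amazonISIN");
if ($html === false) {
    exit("Sorry, the page for $amazonISIN could not be fetched");
}

/* Clean-up: strip non-breaking-space entities */
$html = str_replace("&nbsp;", "", $html);

/* The magical regex for extracting the price: a bold label, then the amount */
$regex = '/\<b\>(Prezzo|Precio|Price|Prix Amazon|Preis):?\<\/b\>([^\<]+)/i';

/* Return the price */
if (preg_match($regex, $html, $price)) {
    /* Keep only the digits, then treat the last two as cents.
       Assumes the price is always shown with two decimal places. */
    $digits = preg_replace('/\D/', '', $price[2]);
    $price = number_format((float)$digits / 100, 2, '.', '');
    echo "The price for amazon.com/dp/$amazonISIN is $price";
} else {
    /* No match: the item is out of stock, or the page markup no longer fits the regex */
    echo "Sorry, the item is out-of-stock on Amazon";
}

?>[/php]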
Obviously, this code is simple and handles only one product at a time. You would need to keep a database list of ASINs and loop through them. Also, most websites only allow a certain number of calls to their pages in a set time. Google, for instance, will lock your IP out if you make too many requests in one minute. Often you need to have your code sleep for part of a second between scrapes to make it work correctly. In one of my scrapers, I pause for half a second after each scrape and the site lets me keep going. Note that sleep() only takes whole seconds, so a half-second pause needs " usleep(500000); " rather than " sleep(.5); ". With no delay, the site locks out my IP after about 3 minutes and the code fails.
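Here is a rough sketch of that loop-with-a-pause idea. The function name scrapeAll, the $fetch callback and the hard-coded ID list are all made up for illustration; in practice the IDs would come from your database, and the right delay depends on the site.
[php]<?php

/* Hypothetical helper: run $fetch for every ID, pausing between requests */
function scrapeAll(array $ids, callable $fetch, int $delayMicroseconds = 500000): array
{
    $results = [];
    $ids = array_values($ids);
    $last = count($ids) - 1;

    foreach ($ids as $i => $id) {
        /* $fetch does the actual page request and parsing for one ID */
        $results[$id] = $fetch($id);

        /* Pause between requests, but not after the final one.
           usleep() takes microseconds; sleep() only accepts whole seconds. */
        if ($i < $last) {
            usleep($delayMicroseconds);
        }
    }

    return $results;
}

/* Example fetcher: returns the raw page, or null if the request failed */
$fetchPage = function (string $id) {
    $html = @file_get_contents("http://www.amazon.com/gp/aw/d/$id");
    return $html === false ? null : $html;
};

/* Usage (commented out so nothing is requested until you are ready):
   $pages = scrapeAll(["B00OTWNSMM"], $fetchPage); */

?>[/php]
Passing the fetcher in as a callback keeps the delay logic in one place, so the same loop can be reused with a different fetch-and-parse function for each site.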
In other words, there are a lot of things to think about for a project like this. It is not a complicated process, but you have to design it in the correct manner. Perhaps you should give us a little more info on what you are scraping.