I have the following code, which scrapes the text from multiple pages and displays it:
[php]
<?php
include_once 'simple_html_dom.php';

$urls = array(
    'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=29/04/1150001118/1&judet=29',
    'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=29/04/1150001118/2&judet=29',
);

function scraping($url) {
    // create HTML DOM
    $html = file_get_html($url);

    // get article block
    if ($html && is_object($html) && isset($html->nodes)) {
        foreach ($html->find('/html/body/table') as $article) {
            // get title
            $item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
            // get body
            $item['tr2'] = trim($article->find('/tbody/tr[2]/td[2]', 0)->plaintext);
            $item['tr3'] = trim($article->find('/tbody/tr[3]/td[2]', 0)->plaintext);
            $item['tr4'] = trim($article->find('/tbody/tr[4]/td[2]', 0)->plaintext);
            $item['tr5'] = trim($article->find('/tbody/tr[5]/td[2]', 0)->plaintext);
            $item['tr6'] = trim($article->find('/tbody/tr[6]/td[2]', 0)->plaintext);
            $item['tr7'] = trim($article->find('/tbody/tr[7]/td[2]', 0)->plaintext);
            $item['tr8'] = trim($article->find('/tbody/tr[8]/td[2]', 0)->plaintext);
            $item['tr9'] = trim($article->find('/tbody/tr[9]/td[2]', 0)->plaintext);
            $item['tr10'] = trim($article->find('/tbody/tr[10]/td[2]', 0)->plaintext);
            $item['tr11'] = trim($article->find('/tbody/tr[11]/td[2]', 0)->plaintext);
            $item['tr12'] = trim($article->find('/tbody/tr[12]/td/div/]', 0)->plaintext);
            $ret[] = $item;
        }

        // clean up memory
        $html->clear();
        unset($html);
        return $ret;
    }
}

echo '';
foreach ($urls as $url) {
    $ret = scraping($url);
    foreach ($ret as $v) {
        echo $v['titlu'] . "\n";
        echo $v['tr2'] . "\n";
        echo $v['tr3'] . "\n";
        echo $v['tr4'] . "\n";
        echo $v['tr5'] . "\n";
        echo $v['tr6'] . "\n";
        echo $v['tr7'] . "\n";
        echo $v['tr8'] . "\n";
        echo $v['tr9'] . "\n";
        echo $v['tr10'] . "\n";
        echo $v['tr11'] . "\n";
        echo $v['tr12'] . "\n";
        echo "\n";
        echo "\n";
    }
}
?>[/php]
I need to export all those variables into a CSV file or an Excel worksheet, one row for each URL, like this: http://i.stack.imgur.com/TiPCl.png

I tried working on this:
[php]<?php
include_once 'simple_html_dom.php';

header('Content-Type: application/excel');
header('Content-Disposition: attachment; filename="sample.csv"');

$urls = array(
    'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/01/1150001435/1&judet=50',
    'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/1&judet=50',
    'http://lmvz.anofm.ro:8080/lmv/detalii.jsp?UNIQUEJVID=50/05/1140001657/2&judet=50',
);

function scraping($url) {
    // create HTML DOM
    $html = file_get_html($url);

    // get article block
    if ($html && is_object($html) && isset($html->nodes)) {
        foreach ($html->find('/html/body/table') as $article) {
            // get title
            $item['titlu'] = trim($article->find('/tbody/tr[1]/td/div', 0)->plaintext);
            // get body
            $item['tr2'] = trim($article->find('/tbody/tr[2]/td[2]', 0)->plaintext);
            $item['tr3'] = trim($article->find('/tbody/tr[3]/td[2]', 0)->plaintext);
            $item['tr4'] = trim($article->find('/tbody/tr[4]/td[2]', 0)->plaintext);
            $item['tr5'] = trim($article->find('/tbody/tr[5]/td[2]', 0)->plaintext);
            $item['tr6'] = trim($article->find('/tbody/tr[6]/td[2]', 0)->plaintext);
            $item['tr7'] = trim($article->find('/tbody/tr[7]/td[2]', 0)->plaintext);
            $item['tr8'] = trim($article->find('/tbody/tr[8]/td[2]', 0)->plaintext);
            $item['tr9'] = trim($article->find('/tbody/tr[9]/td[2]', 0)->plaintext);
            $item['tr10'] = trim($article->find('/tbody/tr[10]/td[2]', 0)->plaintext);
            $item['tr11'] = trim($article->find('/tbody/tr[11]/td[2]', 0)->plaintext);
            $item['tr12'] = trim($article->find('/tbody/tr[12]/td/div/]', 0)->plaintext);
            $ret[] = $item;
        }

        // clean up memory
        $html->clear();
        unset($html);
        return $ret;
    }
}

$output = fopen("php://output", "w");
foreach ($urls as $url) {
    $ret = scraping($url);
    fputcsv($output, $ret);
}
fclose($output);
exit();
?>[/php]
But the output is all wrong.
Any ideas on how I can make it right?
Link to include file: http://sourceforge.net/projects/simplehtmldom/files/simple_html_dom.php/download
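
To make the goal concrete, here is a minimal sketch of the row layout I am after, using made-up sample values in place of the scraped data. The rows_to_csv() helper and the sample values are hypothetical and not part of my script; the sketch only assumes that fputcsv() writes one flat array of fields as one CSV line. What I cannot work out is how to get the scraper's output into this shape.

[php]<?php
// Hypothetical helper (not part of the original script): turns a list of
// items into CSV text. Each item is a flat associative array, so each
// item is written with its own fputcsv() call and becomes one CSV row.
function rows_to_csv($rows)
{
    // Write to an in-memory stream so the result can be returned as a string.
    $out = fopen('php://temp', 'w+');
    foreach ($rows as $row) {
        // array_values() drops the keys; fputcsv() only needs the field values.
        fputcsv($out, array_values($row), ',', '"', '\\');
    }
    rewind($out);
    $csv = stream_get_contents($out);
    fclose($out);
    return $csv;
}

// Made-up sample data standing in for the scraped items of two URLs.
$rows = array(
    array('titlu' => 'Job A', 'tr2' => 'Cluj', 'tr3' => '2'),
    array('titlu' => 'Job B', 'tr2' => 'Arad, centre', 'tr3' => '1'),
);

// One line per item; fields containing commas or spaces get quoted.
echo rows_to_csv($rows);
[/php]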