Parse Wikipedia output result

Hi guys. I have been using the Wikipedia API to retrieve information about a topic. I've managed to get a response and retrieve the first section of the topic (in this case, football).

Using this method - '.$search.'&redirects=1&format=json&prop=text&section=0');

However, the first section that is retrieved includes the pictures, and I just want the main text from the introduction.

The output that is sent back from Wikipedia is this:

[parse] => Array
[text] => Array
[*] =>

This article is about sports known as football. For the ball used in these sports, see Football (ball).
Some of the many different games known as football. From top left to bottom right: Association football or soccer, Australian rules football, International rules football, Rugby Union, Rugby League, and American Football.

The game of football is any of several similar team sports, of similar origins which involve advancing a ball into a goal area in an attempt to score. Many of these involve kicking a ball with the foot to score a goal, though not all codes of football using kicking as a primary means of advancing the ball or scoring. The most popular of these sports worldwide is association football, more commonly known as just "football" or "soccer". Unqualified, the word football applies to whichever form of football is the most popular in the regional context in which the word appears, including American football, Australian rules football, Canadian football, Gaelic football, rugby league, rugby union and other related games. These variations are known as "codes".


I want the content that resides in the <p> tags. How would I go about parsing this and removing the rest? I've tried to get Simple HTML DOM Parser to work, but with no luck.

Any help would be greatly appreciated.



You can use a regular expression to parse this HTML:


if(preg_match_all("~…~i", $arr['parse']['text']['*'], $matches)){
print_r($matches[1]); // <-- this will contain all matches found within the <p> tags
}
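
A minimal sketch of the regular-expression approach, assuming the goal is to capture the contents of each `<p>` element. The pattern shown is an illustrative assumption and not necessarily the one intended above; the function name `extract_paragraphs` and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: returns the inner HTML of every <p>...</p> in $html.
// The pattern is an assumed stand-in, not a pattern taken from this thread.
function extract_paragraphs(string $html): array
{
    // ~ is the delimiter; i = case-insensitive, s = let . match newlines.
    // \b keeps <pre> and similar tags from matching; (.*?) is non-greedy
    // so each paragraph is captured separately.
    if (preg_match_all('~<p\b[^>]*>(.*?)</p>~is', $html, $matches)) {
        return $matches[1]; // group 1: the text between the tags
    }
    return [];
}

// Made-up sample resembling the HTML in the API response.
$html = '<div class="thumb">image caption</div>'
      . '<p>The game of <b>football</b> is a team sport.</p>'
      . '<p>These variations are known as "codes".</p>';

print_r(extract_paragraphs($html));
```

A regular expression like this is only reasonable for flat, predictable markup such as an introduction; nested or malformed HTML is usually better handled with a real parser.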


That looks about right. I have a slight issue: the code I'm getting back from Wikipedia is being printed out with print_r. How would I parse the print_r response? Is there any way for me to convert the response from Wikipedia into HTML?

Thanks for your help

In the code I posted there is a reference to the array mentioned in your first post. So I guess you should do something like this:

$arr = response from wiki;
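
To make `$arr` concrete: the API returns a JSON string, so it has to be decoded into a PHP array before `$arr['parse']['text']['*']` can be read. A minimal sketch using PHP's built-in `json_decode`; the `$res` value is a shortened, made-up stand-in for the real response.

```php
<?php
// Made-up, shortened stand-in for the JSON string returned by the API.
$res = '{"parse":{"text":{"*":"<p>The game of football is ...</p>"}}}';

// Passing true as the second argument yields associative arrays
// instead of stdClass objects.
$arr = json_decode($res, true);

// Guard against a failed decode or an unexpected response shape.
$html = is_array($arr) && isset($arr['parse']['text']['*'])
    ? $arr['parse']['text']['*']
    : '';

echo $html;
```

The output of print_r is only a debugging view of the array; the HTML string itself is the value stored under the `*` key.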

Managed to get that working, thank you phpHelp. A slight problem is that when it print_r's the matches, it still displays as an array:

( [0] => <span class="url"><a href="" class="external text" rel="nofollow"></a></span> [1] =>

How would I make it so it doesn't display that and just displays it as HTML?



You can do this:
[php]if(is_array($matches[1])) foreach($matches[1] as $paragraph){
echo $paragraph;
}[/php]

Where would this go? It wouldn't go inside the original if statement's body, would it?


This replaces the line print_r($matches[1]); Isn't that what you requested?

Yeah, that's what I wanted. I placed it in the original statement, but got no response back. Here's my entire code:

<?php
$search = urlencode($_POST['select']);
//$search = "Football";
$url = sprintf(''.$search.'&redirects=1&format=json&prop=text&section=0');

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_USERAGENT, '');
$res = curl_exec($ch);
curl_close($ch);

require_once 'Zend/Json.php';
$val = Zend_Json::decode($res);
$arr = $val;

if(preg_match_all("~…~i", $arr['parse']['text']['*'], $matches)){
    if(is_array($matches[1])) foreach($matches[1] as $paragraph){
        echo $paragraph;
    }
}
?>

Well, try echoing $matches[0] instead. Normally (by default), though, the matches you need are returned by the preg_match_all() function in index [1] of the array. You can view usage examples here.
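
The difference between the two indexes can be seen in a small sketch. With the default `PREG_PATTERN_ORDER` flag, `$matches[0]` holds the full matches (tags included) and `$matches[1]` holds what the first parenthesized group captured. The pattern and input here are illustrative assumptions.

```php
<?php
$html = '<p>first</p><p>second</p>';

preg_match_all('~<p>(.*?)</p>~i', $html, $matches);

// $matches[0]: whole matches, including the <p> tags.
// $matches[1]: contents of capture group 1 only.
foreach ($matches[1] as $i => $inner) {
    echo $matches[0][$i], ' => ', $inner, PHP_EOL;
}
```

If a pattern contains no parenthesized group, `$matches[1]` is not set at all, which is one possible reason a loop over it prints nothing.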

Managed to get it to work with your code. Thanks, phpHelp. I'm glad I signed up at this forum now!
