Help with a search engine for a site I'm working on

Hello.

I’m hoping that I can get some help setting up a search engine for a website I’m working on. I’ve already got a basic search feature functioning. You type words into the search field, press enter, and are taken to a page with results. The results are gathered by querying the database and looking in certain fields for instances of the entered keywords. However, my boss would like me to implement some factors that make it significantly more complicated as far as I can tell.

Let me show you some of the current PHP…

[php]
// Connect to server and select database
require_once($_SERVER[“DOCUMENT_ROOT”]."/content/_main_database_config.php");
ConnectToMainSiteDatabase();
$query = “SELECT * FROM content_subpages”;
$result = mysql_query($query);
while ($row = mysql_fetch_assoc($result)) {
$text_content = str_replace(">","> “,$row[“page_content”]);
$text_content = str_replace(”<"," <",$text_content);
$text_content = strip_tags($text_content);
$text_content = str_replace("&"," “,$text_content);
$text_content = str_replace(” “,” “,$text_content);
$text_content = preg_replace(”/[^A-Za-z0-9 ]/", ‘’, $text_content);
$text_content = preg_replace( “/\s+/”, " “, $text_content);
$text_content = mysql_real_escape_string($text_content);
mysql_query(“UPDATE content_subpages SET text_content = ‘$text_content’ WHERE url_referer = '”.$row[“url_referer”].”’");
}
function highlight($needle, $haystack){
$ind = stripos($haystack, $needle);
$len = strlen($needle);
if($ind !== false){
return substr($haystack, 0, $ind) . ‘’ . substr($haystack, $ind, $len) . “” .
highlight($needle, substr($haystack, $ind + $len));
} else return $haystack;
}
function summarize($haystack,$needle,$wordLimit = 15) {

	// first get summary of text around key word (needle)
	$preg_safe = str_replace(" ", "\s", preg_quote($needle));
	$pattern = "/(\w*\S\s+){0,$wordLimit}\S*($preg_safe)\S*(\s\S+){0,$wordLimit}/ix";
	if (preg_match_all($pattern, $haystack, $matches)) {
		//$summary = str_ireplace($needle, "<strong>$needle</strong>", $matches[0][0]) . '...';
		$summary = '...' . highlight($needle, $matches[0][0]) . '...';
		
	} else {
		$summary = $matches[0][0];
	}

	return $summary;
}	

$keywords = mysql_real_escape_string(trim($_GET[“q”]));
$search_results = ‘’;
$search_results_spacer = ‘



’;
$total_results = 0;

/– SEARCH THE SUBPAGES –/
$query = “SELECT * FROM content_subpages WHERE text_content LIKE ‘%$keywords%’”;
$result = mysql_query($query);
$total_results += mysql_num_rows($result);
while ($row = mysql_fetch_assoc($result)) {
$text_content = summarize($row[“text_content”],$keywords, 25);
$search_results .= ’
<a href=“http://’.$_SERVER[“SERVER_NAME”].’/’.$row[“url_referer”].’”>

’.$row[“title”].’


<a href=“http://’.$_SERVER[“SERVER_NAME”].’/’.$row[“url_referer”].’” class=“search_result_link”>http://’.$_SERVER[“SERVER_NAME”].’/’.$row[“url_referer”].’

‘.$text_content.’

.$search_results_spacer;
}

/– SEARCH THE MEDIA –/
$query = “SELECT * FROM ccm_media WHERE title LIKE ‘%$keywords%’ OR keywords LIKE ‘%$keywords%’”;
$result = mysql_query($query);
$total_results += mysql_num_rows($result);
while ($row = mysql_fetch_assoc($result)) {
//$text_content = summarize($row[“text_content”],$keywords, 25);
$media_link = ‘http://’.$_SERVER[“SERVER_NAME”].’/media&shared=true&keyword=’.$row[“message_number”].’&tab=mostrecent’;
$search_results .= ’

Media: ‘.$row[“title”].’


‘.$media_link.’


.$search_results_spacer;

}

/– SEARCH THE PHOTO GALLERY –/
$query = “SELECT * FROM ccm_photogallery WHERE category LIKE ‘%$keywords%’ OR album_title LIKE ‘%$keywords%’ OR album_desc LIKE ‘%$keywords%’”;
$result = mysql_query($query);
$total_results += mysql_num_rows($result);
while ($row = mysql_fetch_assoc($result)) {
//$text_content = summarize($row[“text_content”],$keywords, 25);
$album_link = ‘http://’.$_SERVER[“SERVER_NAME”].’/photogallery&album=’.$row[“data_folder”];
$search_results .= ’

Photo Album: ‘.$row[“album_title”].’


‘.$album_link.’

‘.$row[“album_desc”].’




';

}

echo ‘

’.$total_results.’ results found.

’;
echo $search_results;[/php]

I know that’s quite a bit to go over so let me summarize what happens. There are different tables in the database that hold different types of content. In this case, we’re checking the subpages, the media, and the photogallery. I want to be able to differentiate the types results on output so I split the process into separate queries. All the output HTML is saved to a variable along with a running count of the results found. I then output that to the page.

This works, but my boss wants it to be a little more intuitive. This was his request:

Think about a way to combine all the results after you have run all the queries. In other words have a way to score the results to be able to sort them by relevance. So a few things that come to mind:
  1. If all words are found that is higher than a single word found within
    a multi word search phrase
  2. Perfectly matched words like if you searched for “multi” and there
    was “multi” on one page and “multiply” on another then the multi would
    be higher.

I really have no idea how I would accomplish this. I’m pretty certain some regular expressions would be involved in the searches as well as perhaps some more looping (to iterate through multi word phrases), but I need help.

I can provide more info if you want to help me and need to know more. Just let me know.

Thank you so much.

Take a look at this tutorial about FULL TEXT searching in a db.
http://www.youtube.com/watch?v=d–v0NhjIfc

Thanks for the link. That short tutorial was quite helpful and informative, however FULLTEXT indexing is for MyISAM table types and we’re using InnoDB. I switched to MyISAM to test this but upon further research I found an answer on Stack Overflow that strongly recommended against using MyISAM for various reasons. This answer was also highly upvoted by the community and I was already a bit wary about changing the table type. Because of this, I don’t think FULLTEXT is an option I can use at this time.

Would somebody help me find another solution or work with me on getting it put together?

Sponsor our Newsletter | Privacy Policy | Terms of Service