Transforming multiple NOT LIKE SQL conditions into an array

can you guys give me a little push on how to avoid writing out multiple NOT LIKE conditions in order to filter out robots? I can use an array instead, can't I? I know there are databases on 3rd-party websites that keep known bot IP addresses, but this website doesn't get the level of traffic to warrant API code to query those DBs. Here's a sample of what I have currently to save the owner some sanity:

$sql = mysqli_query($conn, "SELECT ip
                                 , host
                                 , page
                                 , DATE_FORMAT(date, '%m/%d/%y') AS date
                                 , TIME_FORMAT(time, '%T') AS time
                            FROM tblTraffic
                            WHERE ip NOT LIKE '%googleusercontent%'
                              AND ip NOT LIKE '%spider%'
                              AND ip NOT LIKE '%googlebot%'
                            ORDER BY date DESC, time DESC");

Well, you wouldn't normally do that in a query. Unless you mean you are pulling from a 3rd-party database and using their info; in that case they should have an API set up for you to use. Normally, to block robots, you would add a robots.txt file on the server, and it would tell well-behaved robots not to crawl the pages. I think this can also be done in the .htaccess file.

If you want to just use the robots.txt way, here is a good article on how to do that. It explains how to check your access logs on the server and see if any robots have been crawling it, then how to find out their hosting info and add them to your robots block list. It is quite easy. If you block Google's bots, though, your site may not show up in Google. Anyway, just FYI… Bot Tutorial
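For reference, a minimal robots.txt along those lines looks like this (the bot names here are just examples; use whatever user agents you find in your access logs):

```
User-agent: Yandex
Disallow: /

User-agent: MJ12bot
Disallow: /
```

Keep in mind robots.txt is only a request: polite crawlers honor it, but garbage bots usually ignore it, which is where the .htaccess approach comes in.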

Found this tutorial on how to do this inside the .htaccess file. Just FYI, too… htaccess-bot-tut
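As a sketch of what that tutorial covers, an .htaccess rule set like this (assuming mod_rewrite is enabled; the bot names are just examples) rejects requests whose User-Agent matches your block list:

```apache
# Hypothetical example: refuse requests from known bad bots by user agent
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} (spider|yandex) [NC]
RewriteRule .* - [F,L]
```

The `[F]` flag returns a 403 Forbidden, so those visits never reach your PHP logging code at all.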

thanks Ernie! You have been very helpful, as usual! =)

my query is not inaccurate though, because what I am doing is filtering out the robot visits so the business owner doesn't see that garbage in his report. I do want those robot visits stored as database records. However, the robots that are actually garbage I DO want to exclude from being stored in the database, because I know they yield no marketing potential for this businessman. Those types of robots would match strings like these:

%spider%
%yandex%

What’s why I asked about 3rd party API’s that can query databases across websites.

Okay, I did not understand your comments before. What you want to do is block them from being saved into the database. You would need to look at the place where you save the data from the visit and alter that to skip the saving if it matches your robot text patterns. If you are attempting to clean up your already-created data, you would need to use a script to look for the bad entries and then delete them. Changing the selection in the query to skip them would still leave the bad data in the database. It would make more sense to remove the bad ones and not store them in the future. Not sure if that actually helps or not.
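A minimal sketch of that skip-on-insert idea (the function name, pattern list, and column names are mine, and the table layout is assumed from your earlier query):

```php
<?php
// Substrings that mark a visit as a throwaway bot -- adjust to taste.
$botPatterns = ['spider', 'yandex'];

// True when the user agent (or host name) contains any bot pattern.
function looksLikeBot(string $ua, array $patterns): bool
{
    foreach ($patterns as $p) {
        if (stripos($ua, $p) !== false) {
            return true;
        }
    }
    return false;
}

// Wrap your existing INSERT so bot visits are never stored.
function saveVisit(mysqli $conn, string $ip, string $host, string $page, array $patterns): void
{
    if (looksLikeBot($host, $patterns)) {
        return; // skip bots entirely
    }
    $stmt = $conn->prepare("INSERT INTO tblTraffic (ip, host, page, date, time)
                            VALUES (?, ?, ?, CURDATE(), CURTIME())");
    $stmt->bind_param('sss', $ip, $host, $page);
    $stmt->execute();
}
```

With something like this in place, the bad records never exist, so the report query needs no NOT LIKE filtering at all.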

I may remove the bad historical data and I may not. But if I do, it's easier for me to just go into phpMyAdmin via GoDaddy and run the delete queries to take care of it instead of running a PHP script. But how about my original question? Is there an easier, consolidated way to write a PHP SQL statement to filter out the bad records than what I originally posted? For instance, turning this:

WHERE ip NOT LIKE '%googleusercontent%' AND
      ip NOT LIKE '%spider%' AND
      ip NOT LIKE '%googlebot%'

into something like this:

WHERE ip NOT LIKE array('%googleusercontent%', '%spider%', '%googlebot%')

???

I know that’s not correct, it’s just an example of what I thought was possible.

I use the first version a lot and it works well for me. But I have found that LIKEs can be tricky:
sometimes they flag incorrect results. It is not always easy to log into the control panel to do this work.
If you already have an admin panel on the website, you can create a form that lets you delete entries based on a query. I think it depends on your logic.

Keep letting these bots scan the site, keep getting data you don't want, and run queries often to remove them.

Or just don't let them in to begin with, and not have to do anything further.

Your choice. If you keep these unwanted entries, then yes, the first way works well. The array version takes a little more processing power but is still fast enough. I would just keep the first version if you must.

that’s what I’ll do then! I think the end result will be me creating a robots.txt file and listing all the crap robots that do not get much prospecting traffic from customers, and then me filtering a very small amount other ip records that I’m not sure of in terms of whether or not they are useful bots and indexing engines or not. thanks man! I’ll see you in my next thread! =)

SELECT ip
     , host
     , page
     , DATE_FORMAT(date, '%m/%d/%y') AS date
     , TIME_FORMAT(time, '%T') AS time
FROM tblTraffic
WHERE ip NOT IN (
    SELECT ip
    FROM tblTraffic
    WHERE user_agent LIKE '%googleusercontent%'
       OR user_agent LIKE '%spider%'
       OR user_agent LIKE '%googlebot%')
ORDER BY date DESC, time DESC

By doing something like that.

cipher,

that’s just as long as mine would be if I wrote out all the NOT LIKE statements. actually urs is longer cuz I would have to write the subquery words out in the statement to boot!

The issue is using LIKE and having several comparisons to check, so that makes it very limiting. If the idea is just to limit it in code, you could either limit what goes into that table to begin with by checking the values before they are inserted, or create a view to query against, which is just a pre-done query; you can then use SELECT * because the fields have already been limited in the view itself.
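A view along those lines (the view name is just an example) could be as simple as:

```sql
CREATE VIEW vwHumanTraffic AS
SELECT ip, host, page, date, time
FROM tblTraffic
WHERE ip NOT LIKE '%googleusercontent%'
  AND ip NOT LIKE '%spider%'
  AND ip NOT LIKE '%googlebot%';

-- The report query then shrinks to:
SELECT ip, host, page,
       DATE_FORMAT(date, '%m/%d/%y') AS date,
       TIME_FORMAT(time, '%T') AS time
FROM vwHumanTraffic
ORDER BY date DESC, time DESC;
```

The filter logic lives in one place, so adding a new bot pattern means altering the view once instead of editing every PHP query.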

great idea! I will try it. thanks dude!
