Conditionally comparing two arrays for differences and insert

I am in the process of comparing two tables for differences and insert the original with what has changed into a new table. The only exception is that I don’t want to look at the last_update_date as I have found this may change without any changes to the record/row itself.

My issue is that this script is working across a variety of tables and as I loop through the results I am getting the keys and values, I do know that last_update_date and id are consistent though as columns.

I have tried using array_dif, but when it did find differences it would error out some way down due to “Insert value list does not match column list.”
I have tried using code like $column !== $column and then executing the queries. The problem with this approach is that it inserts the data for the first array, and then only inserts one column for the second when I need all of them (a to compare and b sometimes, the value may be NULL so I need to see the before and after).

What I have below is one of many of today’s iterations, but simply I just want to insert rows from the first if the second has the same id but one of the other columns is not the same (but not based on the last update date). Thanks in advance:

    //Set the arrays up
        $prep = array();
        $prep2 = array();
    //We have run the sql  
     foreach ($database1_sql_row as $result) {

            foreach ($result as $column => $column_value) {            
    /*
     * 
     * 
     * append the column and column value to the array
     * 
     * 
     */       
              $prep[':'.$column] = $column_value;  
    /*
    *  Run the SQL to get the results from the second table where id = id of the first
    *
    */

    $database2_sql_row=$database2_sql->fetchAll(PDO::FETCH_ASSOC);   
    /*
     * 
     * 
     * 
     * Setup the second array for inserting the values that don't match
     * 
     * 
     */


     foreach ($database2_sql_row as $result2) {

            foreach ($result2 as $column2 => $column_value2) {

             $prep2[':'.$column2] = $column_value2;
 if($column2 !== "last_update_date"){
//do something here so that if this is the only change we don't insert both result 1 and 2
         }
       }

     } 
    }


    $aDifferenceArray = array_diff($result, $result2);
    if(!empty($aDifferenceArray)){

             
             
        $sth = $DB_con3->prepare("INSERT INTO $insert_table ( " . implode(', ',array_keys($result)) . ") VALUES (" . implode(', ',array_keys($prep)) . ")");
    $res = $sth->execute($prep);

     


             $sth2 = $DB_con3->prepare("INSERT INTO $insert_table ( " . implode(', ',array_keys($result2)) . ") VALUES (" . implode(', ',array_keys($prep2)) . ")");
    $res2 = $sth2->execute($prep2); 
      }
    }

Well, why are you not just using PHP to do the job for you?

Read up on this PHP function and post some more up-to-date code: array-deff function

( Just be careful which array you send to it and then you can display the results. Much easier! )

Based on your description and looping over column values, it’s likely you have a bad database design, resulting in overly complicated code/query(ies) to accomplish any task.

Post some sets of sample data covering several possibilities and show what result you want for each possibility.

Well, in my opinion $prep really should be refactored $params.

and before you do

$res = $sth->execute($params);

You should do this -

foreach ($params as $key => $value)
        {
            if($key === 'id') { continue; } // Don't include the id, if it should happen to be there:
            $attribute_pairs[] = $value; // Assign it to an array:
        }
$sth->execute($attribute_pairs);

Though like already stated you most likely have a bad database design.

After looking at it again, why do this at all? Just run a query and handle it in the query itself.
Just use JOIN and compare the tables that way and then the results can just be displayed.
A JOIN is similar to a UNION and for this you can just UNION all of the data from each table.
Loosely, this query would select all values where col1 and col2 are not identical:

SELECT col1, col2
FROM (
   SELECT t1.col1, t1.col2
   FROM t1
   UNION ALL
   SELECT t2.col1, t2.col2
   FROM t2
)  t
GROUP BY col1, col2
HAVING COUNT(*) = 1
ORDER BY col1, col2

Not really sure on the ORDER BY, but, something like this should work…
Perhaps you should give us more info on your two tables structures so we can see if it is a DB
design error or if something like this would work for you…

Thanks all for your replies. Perhaps I should have stated that this script is to review ANY tables provided by the client. I am not able to alter the data or the table structures for Compliance reasons. I like the idea of doing it all in a query in this last reply though.

The one thing that I wanted to avoid was inserting where the last update date (which is a constant across tables) has changed but none of the other columns have (like a User hit save without editing any of the other data in the row).

G

We are confused. If the client’s data is in a query, just create the correct query as I showed one example of.

If the user sends you two tables by uploading them, just use PHP’s array-deff function to do the work for you.

Either way, you do not need to walk thru the data. Compliance is not a problem since you do not touch the data, just create a new results and display it. Using either of these ways, you can find all data that has be changed and show the older or newer versions as needed.

By the way, if a user saves a record that does not change anything, they you do not update the last-updated field. You normally only change that when data changes not just a user hitting the save button. If your system does that, then it is not designed correctly.

Thanks ErnieAlex,

To clarify, I am receiving tables not queries. I am writing the queries.

Again I like the idea of letting the db do the work but on that query. The query suggested works great, I am now looking into how to actually return both rows, not just highlight the row where a match wasn’t found in the other. Either in one row or rows in sequence, eg:

id, column1_before, column1_after, column2_before, column2_after

or
t1.id, t1.column1. t1.column2
t1.id t2.column1. t2.column2

Thanks again for your help, I need to spend more time leaning on the DB than script…

In what format do the tables come to you in? CSV files? SQL file? XML? Json?
It would be super simple to create a temp table, import the data in one simple line and run the query…

The query depends on the array layout. If the id’s match in each table, just use that as the main index.
In the example it will return a list of rows that do not match. Meaning changed. If the id’s are the main
index, you can just return both the previous and changed ones. The easiest way would be just return the
id’s themselves, nothing more. Then, do a simple second query to pull out both the before and after from
the two tables and display them. Make sense?

Thanks, that does make sense, and it is logical and relatively easy to do. Thanks!

To load a file you can do it loosely like this:

-- create test table for json data
CREATE TABLE IF NOT EXISTS jsons_temp(
       id INT AUTO_INCREMENT NOT NULL UNIQUE, 
       json_data json);
LOAD DATA LOCAL INFILE 'C:\\some\\path\\test.json'
INTO table json_temp(json_data);

This is just a simple example for Json data and a local file. You didn’t say what format the data was,
so just an example to help you get started. It is just a starter… Good luck!

Sponsor our Newsletter | Privacy Policy | Terms of Service