Skip Duplicates In XML

Hi All,

I am trying to figure out a way to skip duplicate entries in an XML file. To be honest, I haven’t worked with XML much at all and have no idea where to start.

I’ve gone through the SimpleXML manual and have made it as far as echoing out the elements of the XML object that I need.

I am building this so I can take a flat db file from a program and put it into a MySQL database, so I can build a quick and simple way of manually adding songs to the search. Only a couple of the attributes in the Display node will go into the MySQL database.

I really need to use both the Author and Title attributes to compare songs: I don’t want to remove all but one song by a given author, and some songs have been done and done again by different authors.

What I need to do is compare the two together.

Here is an example of the XML content. There are currently 110,609 songs in the XML database.

<Song FilePath="C:\English Rap R&amp;B\iTunes\iTunes Media\Music\JUELZ SANTANA\WHAT THE GAME BEEN MISSING\15 J.m4a" FileSize="7829897">
  <Display Author="JUELZ SANTANA" Title="J" Album="WHAT THE GAME BEEN MISSING" Color="2324603" Cover="2" Tag="1" />
</Song>

<Song FilePath="C:\ALL MP3&apos;s\Music in Folders\DaveMathews Band\#41.mp3" FileSize="4802953">
  <Display Author="DAVE MATTHEWS" Title="#41" Genre="Rock" Album="Crash" Year="2000" Color="3768206" Cover="2" Tag="1" />
</Song>

Here is my current PHP, which displays the Author, Title, and Genre in my web browser. It also keeps a count of the songs, which I will use later to report that X songs have been inserted into the database when a file is used rather than entering the information manually.

[php]
if (file_exists('music_db.xml')) {
    $xml = simplexml_load_file('music_db.xml');

    $x = 1; // running count of displayed songs
    foreach ($xml->children() as $song)
    {
        echo "<pre>";
        foreach ($song->children() as $child)
        {
            if (!empty($child->attributes()->Author) && !empty($child->attributes()->Title) && !empty($child->attributes()->Genre) && !is_numeric((string)$child->attributes()->Title))
            {
                echo "Author:" . $child->attributes()->Author, "<br />";
                echo "Title:" . $child->attributes()->Title, "<br />";
                echo "Genre:" . $child->attributes()->Genre, "<br />";
                echo $x;
                $x++;
            }
        }
        // Close the <pre> once per song, matching the opening tag above
        echo "</pre>";
    }
    foreach (libxml_get_errors() as $error)
    {
        echo "\t", $error->message;
    }
}
else {
    exit('Failed to open music_db.xml.');
}
[/php]

Consider something like this. I also cleaned it up a bit, though it could still use some polishing ^^

I made some changes to the loop and the variables involved in parsing the XML, mainly for readability, but it also saves you an unnecessary loop :)

I save all the songs under an array key containing “artist - song title”; that way the artist/song combo will simply be overwritten if it appears several times.
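To show just the key trick in isolation, here is a minimal sketch. The sample rows and the `songKey()` helper are made up for illustration, and lowercasing/trimming the key is an assumption on my part (it treats "DAVE MATTHEWS" and "Dave Matthews" as the same artist); drop it if you want exact matches only.

```php
<?php
// Hypothetical sample rows; in the real script these come from the Display attributes.
$rows = array(
    array('author' => 'DAVE MATTHEWS', 'title' => '#41'),
    array('author' => 'JUELZ SANTANA', 'title' => 'J'),
    array('author' => 'Dave Matthews', 'title' => '#41'), // same song, different case
);

// Hypothetical helper: builds the dedupe key from author + title.
// Lowercasing and trimming is an assumption, not something the XML requires.
function songKey($author, $title)
{
    return strtolower(trim($author)) . ' - ' . strtolower(trim($title));
}

$unique = array();
foreach ($rows as $row) {
    // A repeated author/title combo lands on the same key and overwrites the earlier row.
    $unique[songKey($row['author'], $row['title'])] = $row;
}

echo count($unique), " unique songs\n";
```

Note that the last duplicate wins here. If you would rather keep the first one, check `isset()` on the key before assigning.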

I also split the logic and the view.

[php]if (file_exists('music_db.xml')) {
    $songs = simplexml_load_file('music_db.xml');

    $parsed = array();
    foreach ($songs as $song)
    {
        $attr = $song->Display->attributes();
        if (!empty($attr->Author) && !empty($attr->Title) && !empty($attr->Genre) && !is_numeric((string)$attr->Title))
        {
            // Keyed by "artist - title", so a repeated combo overwrites the earlier entry
            $parsed[$attr->Author . ' - ' . $attr->Title] = array(
                'author' => (string) $attr->Author,
                'title' => (string) $attr->Title,
                'genre' => (string) $attr->Genre
            );
        }
    }
    foreach (libxml_get_errors() as $error)
    {
        echo "\t", $error->message;
    }
}
else
{
    exit('Failed to open music_db.xml.');
}

asort($parsed); // remove this if you don't want a sorted array (will sort on artist, then song title)

// View: print the de-duplicated songs with a running count
$x = 1;
foreach ($parsed as $song) {
    echo "Author: " . $song['author'], "<br />";
    echo "Title: " . $song['title'], "<br />";
    echo "Genre: " . $song['genre'], "<br />";
    echo $x;
    $x++;
}[/php]
[hr]

Are you sure you want to exclude songs that do not have a genre? One of your two example songs is excluded because of this rule. You could do it like this to include songs without a genre:

[php]if (file_exists('music_db.xml')) {
    $songs = simplexml_load_file('music_db.xml');

    $parsed = array();
    foreach ($songs as $song)
    {
        $attr = $song->Display->attributes();
        if (!empty($attr->Author) && !empty($attr->Title))
        {
            // Keyed by "artist - title", so a repeated combo overwrites the earlier entry
            $parsed[$attr->Author . ' - ' . $attr->Title] = array(
                'author' => (string) $attr->Author,
                'title' => (string) $attr->Title,
                // Genre is optional now; store null when it is missing
                'genre' => !empty($attr->Genre) ? (string) $attr->Genre : null
            );
        }
    }
    foreach (libxml_get_errors() as $error)
    {
        echo "\t", $error->message;
    }
}
else
{
    exit('Failed to open music_db.xml.');
}

asort($parsed); // remove this if you don't want a sorted array (will sort on artist, then song title)

// View: print the de-duplicated songs with a running count
$x = 1;
foreach ($parsed as $song) {
    echo "Author: " . $song['author'], "<br />";
    echo "Title: " . $song['title'], "<br />";
    echo "Genre: " . $song['genre'], "<br />";
    echo $x;
    $x++;
}[/php]
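One caveat on the asort() call: as far as I know it compares the strings byte by byte, so an artist written in uppercase sorts before the same artist written in lowercase. If that matters for your data, an explicit comparator is one option. This is only a sketch; the sample array below is hypothetical and just mimics the shape of $parsed.

```php
<?php
// Hypothetical data shaped like $parsed (key: "artist - title").
$parsed = array(
    'JUELZ SANTANA - J' => array('author' => 'JUELZ SANTANA', 'title' => 'J', 'genre' => null),
    'dave matthews - Zebra' => array('author' => 'dave matthews', 'title' => 'Zebra', 'genre' => null),
    'DAVE MATTHEWS - #41' => array('author' => 'DAVE MATTHEWS', 'title' => '#41', 'genre' => 'Rock'),
);

// Sort by author, then by title, ignoring case; uasort() keeps the array keys.
uasort($parsed, function ($a, $b) {
    $byAuthor = strcasecmp($a['author'], $b['author']);
    return $byAuthor !== 0 ? $byAuthor : strcasecmp($a['title'], $b['title']);
});

foreach ($parsed as $key => $song) {
    echo $key, "\n";
}
```

Both spellings of the artist now end up next to each other, ordered by title, ahead of JUELZ SANTANA.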

Hi JimL,

Thanks for the response. I haven’t been able to sit down and run this yet, but I’m sure that with just a little tweaking I’ll be able to make it into what is needed.

I wasn’t worried about the display being part of the logic, as it was there just for debugging purposes. This will ultimately be part of a class that handles the XML database file from this particular program. In the end it should remove all invalid XML from the file, remove all duplicates, and save the information in the database.
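For the invalid XML part, a first step might be just detecting and reporting what libxml complains about. The following is a rough sketch, not the final class: the broken sample string is made up, and it only lists the errors rather than repairing anything. As far as I can tell, libxml_get_errors() only returns something once libxml_use_internal_errors(true) has been called.

```php
<?php
// Collect libxml errors internally instead of letting them surface as PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical broken input: the second Song element is never closed.
$xmlString = '<Songs><Song><Display Author="A" Title="B" /></Song><Song></Songs>';

$songs = simplexml_load_string($xmlString);

$messages = array();
if ($songs === false) {
    // Parsing failed, so gather a readable message for each reported error.
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('Line %d: %s', $error->line, trim($error->message));
    }
    libxml_clear_errors();
}

echo count($messages) > 0 ? implode("\n", $messages) : "XML parsed cleanly", "\n";
```

The same pattern should work with simplexml_load_file() for the real music_db.xml.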

In the end I plan on posting this script so that others can handle these files and improve on it as they need. I just have to get past the milestone of learning how to properly handle the XML.

Thanks Again,
Anthony
A.K.A. Valandor

Granted, and nothing happens?


OMAIR
