Since I am already using PHP for my web pages, I reused it for a little offline file parsing/processing (I am running PHP from the Win10 command prompt). I am now wondering whether it is simply the wrong language for the job or whether it is slow because of some dumb mistake I made.
What my script does is this: it reads a flat file listing (each line contains the full path to one specific file) and incrementally builds a nested array tree (one array per folder) representing the complete folder/file structure. For each line read, the existing tree is traversed using the “exploded” path info and missing nodes are added as needed. At the end the tree is dumped as a 65MB JSON file.
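To make the description concrete, the approach can be sketched roughly as follows. This is a simplified sketch and not the actual script: the function name `addPath`, the sample paths, and representing files as empty arrays are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: walk the tree along one path and create missing nodes.
function addPath(array &$tree, $path)
{
    $node = &$tree; // reference to the current position in the tree
    // Normalise Windows separators, then split the path into its parts.
    foreach (explode('/', str_replace('\\', '/', $path)) as $part) {
        if ($part === '') {
            continue; // skip empty parts from leading or doubled separators
        }
        if (!isset($node[$part])) {
            $node[$part] = array(); // new folder (or file, as an empty array)
        }
        $node = &$node[$part]; // descend by reference, without copying
    }
}

// Made-up sample input; the real script reads these lines from the listing file.
$lines = array(
    'C:\\data\\a.txt',
    'C:\\data\\sub\\b.txt',
    'C:\\other\\c.txt',
);

$tree = array();
foreach ($lines as $line) {
    addPath($tree, rtrim($line, "\r\n"));
}

echo json_encode($tree), "\n";
```

In this sketch each level is found with a keyed `isset()` lookup and the descent happens by reference. If the real script instead searches child lists linearly or copies sub-arrays by value, that difference alone might explain a slowdown that grows with the tree.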
The input file is about 38MB and contains about 450’000 lines (a "(sub-)folder" in the resulting data structure may contain anywhere from 1 to 8000 child nodes). To my surprise, the script takes more than 30 minutes to build the tree, and it seems to get slower the bigger the existing tree already is (maybe some array reallocation/growing issue?).
PS: I am currently using PHP 5.2.17 - since that is what is used on my web server.