Well, after a lot of playing around with the test data, I came up with the following code.
It basically just loads all of the data into a simple array. Next, we would need to create a parsing routine.
That routine would be a simple loop that goes down through the individual lines and pulls out the data you
need for each question. Since most of the questions start with a page number, we can break the whole
page down into separate questions, using “PAGE:” as the marker for where each one starts. Then you would look
for the question text and its answers, and, of course, the “ID:” line to capture the ID number.
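The splitting step described above could look roughly like this. It is a minimal sketch, assuming each question block begins with a line that starts with “PAGE:”; the function name `splitIntoQuestions` and the sample lines are hypothetical, not taken from your real data.

```php
<?php
// Hypothetical helper: group lines into blocks, starting a new block
// whenever a line begins with "PAGE:" (assumed question marker).
function splitIntoQuestions(array $lines): array
{
    $blocks = array();
    $current = null;
    foreach ($lines as $line) {
        $line = rtrim($line);
        if (strpos($line, "PAGE:") === 0) {
            // Save the previous block (if any) and start a new one
            if ($current !== null) {
                $blocks[] = $current;
            }
            $current = array($line);
        } elseif ($current !== null) {
            // Lines before the first "PAGE:" are ignored
            $current[] = $line;
        }
    }
    if ($current !== null) {
        $blocks[] = $current;
    }
    return $blocks;
}

// Made-up sample lines for illustration only
$sample = array(
    "Header text\n",
    "PAGE: 1\n",
    "ID: 101\n",
    "What is 2 + 2?\n",
    "A. 3\n",
    "C. 4\n",
    "PAGE: 2\n",
    "ID: 102\n",
    "What colour is the sky?\n",
    "A. Blue\n",
);
$blocks = splitIntoQuestions($sample);
echo count($blocks) . " question blocks found\n";
```

Each block is just an array of trimmed lines, so you can then work on one question at a time instead of the whole page.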
Here is the starting point. It assumes the data is in a plain text file: I took your PDF sample, copied the 3 pages of data into a text file and called it test.txt. I then put this code in the same folder as the text file. For now, it just lists the raw lines of the test data; if you look at the output, you will see where I am heading with this code. You would then loop through the $lines array and pull out your data as needed. So if a line starts with “A.” or “C.”, it is an answer. You can test these lines using PHP string functions.
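Testing the lines with PHP string functions might look something like the sketch below. It rests on assumptions about your data: answers are taken to be any line starting with a capital letter followed by a period (so “A.” and “C.” both match), the ID is assumed to be on its own line after “ID:”, and the `classifyLine` name is hypothetical.

```php
<?php
// Hypothetical helper: decide what kind of line we are looking at.
// Assumed formats: "PAGE: ...", "ID: ...", answers like "A. ..." or "C. ...".
function classifyLine(string $line): string
{
    $line = trim($line);
    if ($line === "") {
        return "blank";
    }
    if (strncmp($line, "PAGE:", 5) === 0) {
        return "page";
    }
    if (strncmp($line, "ID:", 3) === 0) {
        return "id";
    }
    // One capital letter followed by a period, e.g. "A." or "C."
    if (preg_match('/^[A-Z]\./', $line) === 1) {
        return "answer";
    }
    // Anything else is treated as part of the question text
    return "question";
}

foreach (array("PAGE: 3", "ID: 205", "Which is larger?", "A. 10", "C. 12", "") as $line) {
    echo str_pad(classifyLine($line), 10) . "| " . $line . "\n";
}
```

Note that a question that happened to begin with something like “Q.” would be classified as an answer, so the pattern may need tightening once you see the real output.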
So, here is a start for you:
[php]
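<?php
// Load the entire file into an array called $lines, one element per line
$lines = array();
$fd = fopen("test.txt", "r");
if ($fd === false) {
    die("Could not open test.txt");
}
// fgets() returns false at end of file, which ends the loop cleanly
while (($buffer = fgets($fd, 4096)) !== false) {
    $lines[] = $buffer;
}
fclose($fd);
// Loop through our array...
foreach ($lines as $line) {
    // Just display the lines for now (replace this with parsing code)
    echo htmlspecialchars(rtrim($line)) . "<br>\n";
}
?>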
[/php]
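As one possible direction for the “replace this with parsing code” comment, the display loop could instead collect each question into an associative array. This is a minimal sketch assuming that “PAGE:” starts a question, that the ID follows “ID:” on its own line, and that answers begin with a capital letter and a period; the `parseQuestions` name and the record keys are made up for illustration.

```php
<?php
// Hypothetical parser: turn raw lines into one record per question.
// Assumed formats: "PAGE:" starts a record, "ID: <value>", answers "A." etc.
function parseQuestions(array $lines): array
{
    $records = array();
    $current = null;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === "") {
            continue;
        }
        if (strncmp($line, "PAGE:", 5) === 0) {
            // Close off the previous question and start a new record
            if ($current !== null) {
                $records[] = $current;
            }
            $current = array("id" => null, "question" => "", "answers" => array());
            continue;
        }
        if ($current === null) {
            continue; // ignore anything before the first "PAGE:"
        }
        if (strncmp($line, "ID:", 3) === 0) {
            // Keep the ID as a string (everything after "ID:")
            $current["id"] = trim(substr($line, 3));
        } elseif (preg_match('/^[A-Z]\./', $line) === 1) {
            $current["answers"][] = $line;
        } else {
            // Join multi-line question text with spaces
            $current["question"] = trim($current["question"] . " " . $line);
        }
    }
    if ($current !== null) {
        $records[] = $current;
    }
    return $records;
}

// With the file loaded as above, this would be: $records = parseQuestions($lines);
$sample = array("PAGE: 1\n", "ID: 101\n", "What is 2 + 2?\n", "A. 3\n", "C. 4\n");
print_r(parseQuestions($sample));
```

Keeping the ID as a string avoids losing leading zeros if your IDs have them; convert with `(int)` if you only ever need the number.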
I know this is not a complete solution, but it starts you off and gets the data into a more usable form.
If you need help with the parsing code, let us know…