PHP text processing with comments and various formats

sawndiddle · March 8, 2013, 10:47pm

I have the text file below. It is just a small snippet of a larger file, but you can see that the data format changes between sections, so I need my PHP script to parse each section differently.

Each section starts with a ~WORD line, which is handy, and I can use if statements once I know which section I am in. But how do I ignore the comment lines while processing, and how do I switch to a different branch when ~WORD is found in the line currently being read?

[php]$txt_file = file_get_contents('welllog.txt'); // read well log file
$rows = explode("\n", $txt_file); // for each line break make a new row.
array_shift($rows); // drop the first row

foreach($rows as $row => $data) // run for each row found.
{
    // get row data
    $row_data = preg_split('/ +/', $data); // split on runs of spaces

    $info[$row]['junk']       = $row_data[0]; // because the file starts with spaces at the front
    $info[$row]['depth']      = $row_data[1];
    $info[$row]['gr_edtc']    = $row_data[2];
    $info[$row]['cmff']       = $row_data[3];
    $info[$row]['cmrp_3ms']   = $row_data[4];
    $info[$row]['tcmr']       = $row_data[5];
    $info[$row]['ksdr']       = $row_data[6];
    $info[$row]['ktim']       = $row_data[7];
}[/php]

#--------------------------------------------------
~WELL INFORMATION
#MNEM.UNIT      DATA             DESCRIPTION
#---- ------ --------------   -----------------------------
STRT .F      6900.0          :START DEPTH     
STOP .F      7400.0          :STOP DEPTH     
COMP .        Cirque Resources, LP                     :COMPANY
WELL .        Trippell 32-16H                          :WELL
RANG .        90 W                                     :Range
TOWN .        160 N                                    :Township
#-----------------------------------------------------------------------------
# 
#     DEPT      GR_EDTC        CMFF       CMRP_3MS       TCMR         KSDR         KTIM
#
~A  
    6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007
    6900.5      37.4412       0.0060       0.0118       0.0432       0.0001       0.0009
    6901.0      32.6030       0.0041       0.0123       0.0391       0.0001       0.0003
    6901.5      26.2366       0.0032       0.0136       0.0324       0.0001       0.0001
    6902.0      23.1347       0.0031       0.0141       0.0263       0.0000       0.0001
    6902.5      22.3031       0.0026       0.0111       0.0228       0.0000       0.0000
    6903.0      20.0020       0.0053       0.0061       0.0155       0.0000       0.0000
    6903.5      21.5065       0.0027       0.0075       0.0204       0.0000       0.0000
    6904.0      24.3387       0.0000       0.0032       0.0175       0.0000       0.0000
    6904.5      29.2992       0.0039       0.0072       0.0237       0.0000       0.0001
    6905.0      31.4188       0.0020       0.0091       0.0289       0.0000       0.0000

m11 · March 9, 2013, 12:33am

What is the problem you are having?

Do you only need the lines with 7 numbers?

sawndiddle · March 9, 2013, 12:39am

No, I can get the 7 columns of data if that is all I have. However, the file contains both that data and the well information data, and they are in different formats. So I need to find a way to:

ignore any lines that start with #
be able to determine which section of the file I am currently reading so that I may parse that data differently.

m11 · March 9, 2013, 12:46am

I would have to see more of the file to see how much different the ‘sections’ are and which data you need

sawndiddle · March 9, 2013, 12:48am

http://hf.shawndibble.com/welllog2.txt

I need:
~WELL INFORMATION
~PARAMETER INFORMATION
~A

m11 · March 9, 2013, 12:52am

Hm… I take it you do not know regular expressions?

sawndiddle · March 9, 2013, 1:06am

No. I am pretty new. I just looked it up. I am thinking I may be able to do something like:

[php]if (preg_match("/~WELL/", $data))[/php]

And also repeat that for
~PARAMETER
~A
#--

If that is correct, how do I go about ignoring the rest of the line and proceeding to the next line?

m11 · March 9, 2013, 1:12am

You could do something like this, basically building an associative array from the file.

[php]
// fetch all lines to array
// note that using file() returns an array rather than using file_get_contents() which returns a string
$lines = file('welllog.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$info = array();
$sect = 0; // lines before the first ~ header are collected under key 0

// loop lines and build info array
foreach($lines as $line) {
    if (preg_match('/^~/', $line)) {
        // section header: keep only the letters, e.g. "~A  " becomes "A"
        $sect = preg_replace('/[^A-Za-z]/', '', $line);
    }
    $info[$sect][] = $line;
}
[/php]

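This will give you an array keyed by section name (the letters of each ~ header line, for example A), where each entry holds the raw lines of that section, header line included. Lines before the first ~ header end up under key 0.

Then you would need to go through and parse the data from each section. Each section would need a different regex to match the proper lines (ignoring comments and such).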
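As a rough sketch of the "ignore comments, track the current section" idea from the question, the loop could skip # lines with continue and switch sections on ~ lines. The sample lines are copied from the snippet in the first post; starting with $sect = null and dropping lines that appear before the first header are assumptions of this sketch, not something the thread specifies.

```php
<?php
// Sample lines from the first post. In practice they would come from
// file('welllog.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES).
$lines = array(
    '#--------------------------------------------------',
    '~WELL INFORMATION',
    '#MNEM.UNIT      DATA             DESCRIPTION',
    'STRT .F      6900.0          :START DEPTH',
    'STOP .F      7400.0          :STOP DEPTH',
    '# ',
    '~A  ',
    '    6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007',
);

$info = array();
$sect = null; // assumption: no section until the first ~ line is seen

foreach ($lines as $line) {
    // Comment lines start with '#': skip straight to the next line.
    if (preg_match('/^\s*#/', $line)) {
        continue;
    }
    // A section header starts with '~': remember its name and move on.
    if (preg_match('/^~/', $line)) {
        $sect = preg_replace('/[^A-Za-z]/', '', $line);
        $info[$sect] = array();
        continue;
    }
    // Anything else is data for the current section.
    if ($sect !== null) {
        $info[$sect][] = $line;
    }
}

print_r(array_keys($info));
```

With this variant the comment and header lines never reach the per-section arrays, so the later parsing step only sees data lines.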

m11 · March 9, 2013, 1:24am

Here’s an example of how you could parse the “A” data from that array:

[php]
if (isset($data['A'])) {
    foreach($data['A'] as $key => $line) {
        // Explanation of: (\d*\.\d+)\s+
        // () parentheses are used to enclose each value we want in $parts
        // \d* matches 0 or more digits
        // \. matches a literal decimal point
        // \d+ matches 1 or more digits
        // \s+ matches 1 or more whitespace characters
        // the same pattern is repeated 7 times for the 7 values expected

        if (preg_match('/(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)/', $line, $parts)) {
            // regex matched, so replace the line with our parsed data
            $data['A'][$key] = array(
                'depth'    => $parts[1],
                'gr_edtc'  => $parts[2],
                'cmff'     => $parts[3],
                'cmrp_3ms' => $parts[4],
                'tcmr'     => $parts[5],
                'ksdr'     => $parts[6],
                'ktim'     => $parts[7],
            );
        }
        else {
            unset($data['A'][$key]); // if the regex fails the line should be removed
        }
    }
}
[/php]

Now, the $data['A'] array will look like this:

    [A] => Array
        (
            [1] => Array
                (
                    [depth] => 6900.0
                    [gr_edtc] => 43.0127
                    [cmff] => 0.0052
                    [cmrp_3ms] => 0.0119
                    [tcmr] => 0.0446
                    [ksdr] => 0.0001
                    [ktim] => 0.0007
                )

            [2] => Array
                (
                    [depth] => 6900.5
                    [gr_edtc] => 37.4412
                    [cmff] => 0.0060
                    [cmrp_3ms] => 0.0118
                    [tcmr] => 0.0432
                    [ksdr] => 0.0001
                    [ktim] => 0.0009
                )

            [3] => Array
                (
                    [depth] => 6901.0
                    [gr_edtc] => 32.6030
                    [cmff] => 0.0041
                    [cmrp_3ms] => 0.0123
                    [tcmr] => 0.0391
                    [ksdr] => 0.0001
                    [ktim] => 0.0003
                )

Here is a regular expression cheat sheet. See if you can make a regex for the other two sections.
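For the ~WELL INFORMATION section, one possible starting point is sketched below. It assumes every data line has the shape MNEM .UNIT DATA :DESCRIPTION shown in the sample, and that the description starts at the last colon on the line. The helper name parse_well_line and the array keys are made up for this example.

```php
<?php
// Hypothetical helper (not from the thread): parse one
// "MNEM .UNIT   DATA   :DESCRIPTION" line into an associative array.
function parse_well_line($line)
{
    // (\w+)    mnemonic, e.g. STRT
    // \.(\S*)  optional unit directly after the dot, e.g. F
    // (.*)     data, greedy, so the split happens at the last colon
    // (.*)     description after that colon
    if (!preg_match('/^(\w+)\s*\.(\S*)\s+(.*):\s*(.*)$/', trim($line), $parts)) {
        return null; // comment, header and blank lines do not match
    }
    return array(
        'mnem'        => $parts[1],
        'unit'        => $parts[2],
        'data'        => trim($parts[3]),
        'description' => $parts[4],
    );
}

// Sample lines copied from the snippet in the first post.
$sample = array(
    '#MNEM.UNIT      DATA             DESCRIPTION',
    'STRT .F      6900.0          :START DEPTH     ',
    'COMP .        Cirque Resources, LP                     :COMPANY',
);

$well = array();
foreach ($sample as $line) {
    $parsed = parse_well_line($line);
    if ($parsed !== null) {
        $well[$parsed['mnem']] = $parsed; // index by mnemonic for easy lookup
    }
}

print_r($well);
```

Because the pattern requires the line to begin with a word character, lines starting with # or ~ fall through to the null branch without a separate comment check. The ~PARAMETER INFORMATION section may or may not follow the same layout; that would need checking against the full file.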

sawndiddle · March 9, 2013, 1:35am

Thank you. This will take some time for me to understand, as it is quite a bit different from what I was using and I am still very new to PHP, but the output you are showing is exactly what I was looking for.

m11 · March 9, 2013, 3:48am

Feel free to ask any questions if there’s something in the code you don’t understand.

Oh, I also messed up in my second example. $data should be $info (to match the first code)
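For reference, here is a sketch of the "A" parsing with $info in place of $data. It also swaps the seven-times-repeated pattern for preg_split plus array_combine, which is a different technique from the regex above rather than a copy of it. The inline $info array is a stand-in for what the first example would build.

```php
<?php
// Stand-in for the $info['A'] lines built by the first example.
$info = array('A' => array(
    '~A  ',
    '    6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007',
    '    6900.5      37.4412       0.0060       0.0118       0.0432       0.0001       0.0009',
));

// Column names in file order.
$keys = array('depth', 'gr_edtc', 'cmff', 'cmrp_3ms', 'tcmr', 'ksdr', 'ktim');

if (isset($info['A'])) {
    foreach ($info['A'] as $key => $line) {
        // Split on runs of whitespace; trim() avoids an empty first field.
        $fields = preg_split('/\s+/', trim($line));

        // Keep only rows with exactly 7 numeric fields.
        if (count($fields) === 7 && count(array_filter($fields, 'is_numeric')) === 7) {
            $info['A'][$key] = array_combine($keys, $fields);
        } else {
            unset($info['A'][$key]); // header, comment or malformed line
        }
    }
}

print_r($info['A']);
```

One difference worth knowing: is_numeric also accepts signed values such as -999.25, whereas the \d*\.\d+ pattern would capture only the digits and ignore the sign.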