PHP text processing with comments and various formats

I have the text file below. It is just a small snippet of a larger file, but you can see that the data format changes between sections. So my PHP script needs to parse each section differently.

Each section starts with a ~WORD line, which helps: I can use an if statement to detect when one of those sections starts. But how do I ignore the comment lines while processing the file, and how do I switch to a different if branch when ~WORD is found in the line currently being read?

[php]$txt_file = file_get_contents('welllog.txt'); // read well log file
$rows = explode("\n", $txt_file); // make a new row for each line break
array_shift($rows);

foreach ($rows as $row => $data) // run for each row found
{
    // get row data
    $row_data = preg_split('/ +/', $data); // split the line on runs of one or more spaces

    $info[$row]['junk']       = $row_data[0]; // empty, because each data line starts with spaces
    $info[$row]['depth']      = $row_data[1];
    $info[$row]['gr_edtc']    = $row_data[2];
    $info[$row]['cmff']       = $row_data[3];
    $info[$row]['cmrp_3ms']   = $row_data[4];
    $info[$row]['tcmr']       = $row_data[5];
    $info[$row]['ksdr']       = $row_data[6];
    $info[$row]['ktim']       = $row_data[7];
}[/php]
#--------------------------------------------------
~WELL INFORMATION
#MNEM.UNIT      DATA             DESCRIPTION
#---- ------ --------------   -----------------------------
STRT .F      6900.0          :START DEPTH     
STOP .F      7400.0          :STOP DEPTH     
COMP .        Cirque Resources, LP                     :COMPANY
WELL .        Trippell 32-16H                          :WELL
RANG .        90 W                                     :Range
TOWN .        160 N                                    :Township
#-----------------------------------------------------------------------------
# 
#     DEPT      GR_EDTC        CMFF       CMRP_3MS       TCMR         KSDR         KTIM
#
~A  
    6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007
    6900.5      37.4412       0.0060       0.0118       0.0432       0.0001       0.0009
    6901.0      32.6030       0.0041       0.0123       0.0391       0.0001       0.0003
    6901.5      26.2366       0.0032       0.0136       0.0324       0.0001       0.0001
    6902.0      23.1347       0.0031       0.0141       0.0263       0.0000       0.0001
    6902.5      22.3031       0.0026       0.0111       0.0228       0.0000       0.0000
    6903.0      20.0020       0.0053       0.0061       0.0155       0.0000       0.0000
    6903.5      21.5065       0.0027       0.0075       0.0204       0.0000       0.0000
    6904.0      24.3387       0.0000       0.0032       0.0175       0.0000       0.0000
    6904.5      29.2992       0.0039       0.0072       0.0237       0.0000       0.0001
    6905.0      31.4188       0.0020       0.0091       0.0289       0.0000       0.0000

What is the problem you are having?

Do you only need the lines with 7 numbers?

No, I can get the 7 columns of data if that is all I have. However, the file contains both that data and the well information data, and they are in different formats. So I need to find a way to:

  1. ignore any lines that start with #
  2. be able to determine which section of the file I am currently reading so that I may parse that data differently.

I would have to see more of the file to see how different the ‘sections’ are and which data you need.

http://hf.shawndibble.com/welllog2.txt

I need:
~WELL INFORMATION
~PARAMETER INFORMATION
~A

Hm… I take it you do not know regular expressions?

No. I am pretty new. Just looked it up. I am thinking I may be able to do something like:

[php]if (preg_match("/~WELL/", "$data"))[/php]

And also repeat that for
~PARAMETER
~A
#--

If that is correct, how do I go about ignoring the rest of the line and proceeding to the next line?
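
For what it's worth, `continue` is the PHP statement that skips the rest of the current loop iteration and moves on to the next line. Below is a minimal sketch of that idea, assuming the lines are already in an array. The `$lines` sample and the `$section` and `$rowsBySection` names are made up for illustration and are not from the posts in this thread.

```php
<?php
// Hypothetical sample lines standing in for the real file contents.
$lines = [
    '~WELL INFORMATION',
    '#MNEM.UNIT      DATA             DESCRIPTION',
    'STRT .F      6900.0          :START DEPTH',
    '~A',
    '    6900.0      43.0127',
];

$section = null;       // name of the section currently being read
$rowsBySection = [];   // section name => list of data lines

foreach ($lines as $line) {
    if (trim($line) === '') {
        continue; // blank line: nothing to do
    }
    if ($line[0] === '#') {
        continue; // comment line: skip it and read the next line
    }
    if ($line[0] === '~') {
        // A new section starts here; remember its name and move on.
        $section = trim(substr($line, 1));
        $rowsBySection[$section] = [];
        continue;
    }
    if ($section === null) {
        continue; // data before any ~SECTION header is ignored in this sketch
    }
    // Any other line belongs to the section seen most recently.
    $rowsBySection[$section][] = $line;
}

print_r($rowsBySection);
```

Inside the loop, an `if ($section === 'A') { ... }` style check could then choose how each remaining line is parsed.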

You could do something like this, basically building an associative array from the file.

[php]
// fetch all lines to array
// note that using file() returns an array rather than using file_get_contents() which returns a string
$lines = file('welllog.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

$info = array();
$sect = 0;

// loop lines and build info array
foreach ($lines as $line) {
    if (preg_match('/^~/', $line)) {
        $sect = preg_replace('/[^A-Za-z]/', '', $line);
    }
    $info[$sect][] = $line;
}
[/php]

This will give you an array that looks like this:

Array
(
    [VERSIONINFORMATION] => Array
        (
            [0] => ~VERSION INFORMATION
            [1] => VERS.           2.0   :CWLS Log ASCII Standard - VERSION 2.0
            [2] => WRAP.           NO    :One Line per depth step
            [3] => PROD.  Schlumberger   :LAS Producer
            [4] => PROG.  DLIS to ASCII 17C0-154                          :LAS Program name and version
            [5] => CREA.        2009/06/21 02:17                          :LAS Creation date {YYYY/MM/DD hh:mm}
            [6] => DLIS_CREA.  2009-Jun-21 01:00                          :DLIS Creation date and time {YYYY-MMM-DD hh:mm}
            [7] => SOURCE.     CMR_ECS_049PUP.DLIS                        :DLIS File Name
            [8] => FILE-ID.     CMR_ECS_049PUP                            :File Identification Number
            [9] => #--------------------------------------------------
        )

    [WELLINFORMATION] => Array
        (
            [0] => ~WELL INFORMATION
            [1] => #MNEM.UNIT      DATA             DESCRIPTION
            [2] => #---- ------ --------------   -----------------------------
            [3] => STRT .F      6900.0          :START DEPTH     
            [4] => STOP .F      7400.0          :STOP DEPTH     
            [5] => STEP .F       0.5            :STEP     
            [6] => NULL .          -999.25      :NULL VALUE
            [7] => COMP .        Cirque Resources, LP                     :COMPANY
            [8] => WELL .        Trippell 32-16H                          :WELL
            [9] => FLD  .        Wildcat                                  :FIELD
            [10] => LOC  .        SE SE                                    :LOCATION
            [11] => CNTY .        Burke                                    :COUNTY
            [12] => STAT .        North Dakota                             :STATE
            [13] => CTRY .                                                 :COUNTRY
            [14] => API  .        33-013-01444                             :API NUMBER
            [15] => UWI  .                                                 :UNIQUE WELL ID
            [16] => DATE .        20-Jun-2009                              :LOG DATE {DD-MMM-YYYY}
            [17] => SRVC .        Schlumberger                             :SERVICE COMPANY
            [18] => LATI .DEG                                              :LATITUDE
            [19] => LONG .DEG                                              :LONGITUDE
            [20] => GDAT .                                                 :GeoDetic Datum
            [21] => SECT .        32                                       :Section
            [22] => RANG .        90 W                                     :Range
            [23] => TOWN .        160 N                                    :Township
            [24] => #-----------------------------------------------------------------------------
        )

    [PARAMETERINFORMATION] => Array
        (
            [0] => ~PARAMETER INFORMATION
            [1] => #MNEM.UNIT    VALUE                      DESCRIPTION
            [2] => #---- -----   --------------------       ------------------------
            [3] => RUN  .          TWO                      :RUN NUMBER
            [4] => PDAT .        GROUND LEVEL               :Permanent Datum
            [5] => EPD  .F         2299.000000              :Elevation of Permanent Datum above Mean Sea Level
            [6] => LMF  .          KELLY BUSHING            :Logging Measured From (Name of Logging Elevation Reference)
            [7] => APD  .F          25.000000               :Elevation of Depth Reference (LMF) above Permanent Datum
            [8] => #-----------------------------------------------------------------------------
        )

    [CURVEINFORMATION] => Array
        (
            [0] => ~CURVE INFORMATION
            [1] => #MNEM.UNIT   API CODE                                  DESCRIPTION
            [2] => #---- -----  --------                                  -----------------------
            [3] => DEPT .F                                                :DEPTH (BOREHOLE) {F10.1}
            [4] => GR_EDTC.GAPI                                           :Gamma Ray {F13.4}
            [5] => CMFF .CFCF                                             :CMR Free Fluid {F13.4}
            [6] => CMRP_3MS.CFCF                                          :CMR 3ms Porosity {F13.4}
            [7] => TCMR .CFCF                                             :Total CMR Porosity {F13.4}
            [8] => KSDR .MD                                               :Permeability from CMR - SDR Model {F13.4}
            [9] => KTIM .MD                                               :Permeability from CMR - Timur Model {F13.4}
            [10] => #-----------------------------------------------------------------------------
            [11] => # 
            [12] => #     DEPT      GR_EDTC        CMFF       CMRP_3MS       TCMR         KSDR         KTIM
            [13] => #
        )

    [A] => Array
        (
            [0] => ~A  
            [1] =>     6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007
            [2] =>     6900.5      37.4412       0.0060       0.0118       0.0432       0.0001       0.0009
            [3] =>     6901.0      32.6030       0.0041       0.0123       0.0391       0.0001       0.0003
            [4] =>     6901.5      26.2366       0.0032       0.0136       0.0324       0.0001       0.0001
            [5] =>     6902.0      23.1347       0.0031       0.0141       0.0263       0.0000       0.0001
        )
)

Then you would need to go through and parse the data from each section. Each section would need a different regex to match the proper lines (ignoring comments and such).

Here’s an example of how you could parse the “A” data from that array:

[php]
if (isset($data['A'])) {
    foreach ($data['A'] as $key => $line) {
        // Explanation of: (\d*\.\d+)\s+
        // ( ) parentheses enclose each value we want in $parts
        // \d* matches 0 or more digits
        // \. matches a literal decimal point
        // \d+ matches 1 or more digits
        // \s+ matches 1 or more whitespace characters
        // the same pattern is repeated 7 times for the 7 expected values

        if (preg_match('/(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)\s+(\d*\.\d+)/', $line, $parts)) {
            // regex matched, so replace the line with our parsed data
            $data['A'][$key] = array(
                'depth'    => $parts[1],
                'gr_edtc'  => $parts[2],
                'cmff'     => $parts[3],
                'cmrp_3ms' => $parts[4],
                'tcmr'     => $parts[5],
                'ksdr'     => $parts[6],
                'ktim'     => $parts[7],
            );
        }
        else {
            unset($data['A'][$key]); // if the regex fails, the line should be removed
        }
    }
}
[/php]

Now, the $data['A'] array will look like this:

    [A] => Array
        (
            [1] => Array
                (
                    [depth] => 6900.0
                    [gr_edtc] => 43.0127
                    [cmff] => 0.0052
                    [cmrp_3ms] => 0.0119
                    [tcmr] => 0.0446
                    [ksdr] => 0.0001
                    [ktim] => 0.0007
                )

            [2] => Array
                (
                    [depth] => 6900.5
                    [gr_edtc] => 37.4412
                    [cmff] => 0.0060
                    [cmrp_3ms] => 0.0118
                    [tcmr] => 0.0432
                    [ksdr] => 0.0001
                    [ktim] => 0.0009
                )

            [3] => Array
                (
                    [depth] => 6901.0
                    [gr_edtc] => 32.6030
                    [cmff] => 0.0041
                    [cmrp_3ms] => 0.0123
                    [tcmr] => 0.0391
                    [ksdr] => 0.0001
                    [ktim] => 0.0003
                )
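
One caveat with the seven-group pattern: `\d*\.\d+` does not allow a leading minus sign, so a row containing the file's NULL value (-999.25) would fail to match and be removed. A possible alternative, sketched here under the assumption that every data row has exactly seven whitespace-separated numeric columns in the order given by the file's header comment, is to trim the line and split it on whitespace. The `COLUMNS` constant and the `parseDataRow` function are hypothetical names, not part of the code posted above.

```php
<?php
// Column names, in the order given by the "#  DEPT  GR_EDTC ..." comment line.
const COLUMNS = ['depth', 'gr_edtc', 'cmff', 'cmrp_3ms', 'tcmr', 'ksdr', 'ktim'];

/**
 * Parse one "~A" data row into an associative array.
 * Returns null for anything that is not seven numeric columns
 * (section headers, comments, blank lines).
 */
function parseDataRow(string $line): ?array
{
    // trim() drops the leading spaces, so there is no empty first column.
    $parts = preg_split('/\s+/', trim($line));

    if (count($parts) !== count(COLUMNS)) {
        return null;
    }
    foreach ($parts as $part) {
        if (!is_numeric($part)) {
            return null; // is_numeric() accepts values such as -999.25
        }
    }
    return array_combine(COLUMNS, $parts);
}

$row = parseDataRow('    6900.0      43.0127       0.0052       0.0119       0.0446       0.0001       0.0007');
print_r($row);
```

The values stay strings here, as in the regex version; casting with `(float)` would be a separate step if numbers are needed.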

Here is a regular expression cheat sheet. See if you can make a regex for the other two sections.
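
As a starting point for the header-style sections (~WELL INFORMATION and ~PARAMETER INFORMATION), one possible pattern is sketched below. It assumes each line has the shape `MNEM .UNIT  DATA  :DESCRIPTION` seen in the snippet, and that the description itself contains no colon, so it would misread a line such as the `CREA.` entry in ~VERSION INFORMATION, whose description ends in `hh:mm}`. The function name `parseHeaderLine` and the array keys are made up for illustration.

```php
<?php
/**
 * Parse one header line such as
 *   "STRT .F      6900.0          :START DEPTH"
 * Returns null for comments, section headers and lines that do not fit.
 */
function parseHeaderLine(string $line): ?array
{
    if ($line === '' || $line[0] === '#' || $line[0] === '~') {
        return null;
    }

    // ([^.\s]+)  mnemonic: everything up to the first dot or space
    // \s*\.      optional spaces, then the literal dot
    // (\S*)      unit, written directly after the dot (may be empty)
    // \s+(.*):   the value, running up to the LAST colon on the line
    // \s*(.*)$   the description after that colon
    $pattern = '/^\s*([^.\s]+)\s*\.(\S*)\s+(.*):\s*(.*)$/';

    if (!preg_match($pattern, $line, $m)) {
        return null;
    }
    return [
        'mnemonic'    => $m[1],
        'unit'        => $m[2],
        'value'       => trim($m[3]),
        'description' => trim($m[4]),
    ];
}

print_r(parseHeaderLine('STRT .F      6900.0          :START DEPTH     '));
print_r(parseHeaderLine('COMP .        Cirque Resources, LP                     :COMPANY'));
```

Lines with an empty value, such as `CTRY .   :COUNTRY`, should come back with `value` set to an empty string rather than being rejected.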

Thank you. This will take some time for me to understand, as it is quite a bit different from what I was using and I am still very new to PHP, but the output you are showing is exactly what I was looking for.

Feel free to ask any questions if there’s something in the code you don’t understand.

Oh, I also messed up in my second example. $data should be $info (to match the first code block).
