Check if a file exists; if so, don't overwrite it

Pedroski55 · September 25, 2019, 3:50am

Students send their homework via a little html form. PHP writes their answers to a file. Sometimes they send the right answers, but then send again and the form is empty, or the answers are wrong.

The file names look like this: 1825010112_18BEwW4data (1825010112 is a student number, 18BE a class)

What I would like to do in php is, before I write the file:

check if the file exists
if it exists, append ‘_copy’ to the name and then write it.

The end of the php file looks like this:

$newname = $studentnr . "_18BEwW5data";
$path = "/home/myusername/public_html/18BE/php/uploads/";
$fp = fopen($path . $newname, 'w');
fwrite($fp, $body);
fclose($fp);
echo ' 成功！ Your data has been saved 1 time in a text file! <br><br> ';
?>

Any tips please?

chorn · September 25, 2019, 6:47am

You can check this with file_exists(). But what if the _copy file already exists?

Pedroski55 · September 25, 2019, 7:47am

It happens a lot that students send their answers, then send again because they made a mistake. With emails, this is not a problem, because each email has a unique UID which I use in the savefile name, so all emails get saved and checked.
Now I am using php to write the files directly to my webpage and I get them with rsync, bypassing emails.

How about a counter which increments on every filesave and is added to the file name?

`$newname = $studentnr . "_18BEwW5data + str(counter)";`

But I can’t see how to do this.

I run the Python that marks these 2 times. The second time, it only overwrites if the score is bigger. I put the scores in a dictionary studentsScores{}

if a student sends the right answers, then another email with wrong answers, the score will be bad (this has happened)

so I run the score function again.

the second time, it will overwrite the score in studentsScores[studentnr] IF AND ONLY IF the score is BIGGER!

for answer in answerFiles:
    with open(path + answer, 'r') as data:
        dataList = data.readlines()
    studentnr = dataList[0].replace('Studentnr = ', '').replace('\n', '')
    print('Studinr is ' + studentnr)
    score = 0
    for i in range(1, len(dataList)):
        thisAnswer = newAnswerData[i].split('|')
        if dataList[i] in thisAnswer:
            score += points
    if score > studentsScores[studentnr]:
        studentsScores[studentnr] = score  # only overwrite if score is bigger!

Pedroski55 · September 25, 2019, 8:45am

If I make a text file “counter” which only contains a number, open that each time before php writes the file, something like

$counter = $path .  "counter";
$cf = fopen($counter, 'r');
echo 'Counter is ' . $cf;
fclose($cf);
$newcounter = $cf++; // this should add 1
$newNum = fopen($counter, 'w');
fwrite($counter, $newcounter);
fclose($newNum);
$newname = $studentnr . "_18BEwW5data" . $newcounter;
$path = "/home/myusername/public_html/18BE/php/uploads/";
$fp = fopen($path . $newname, 'w');
fwrite($fp, $body);
fclose($fp);
echo ' 成功！ Your data has been saved 1 time in a text file! <br><br> ';

Something along these lines? Would that work?

chorn · September 25, 2019, 9:30am

Depends on what your goal is. If you want that counter per student you have to store it accordingly. But it looks like you overcomplicate a task that a database could easily handle by just making a new revision every time a student uploads something.

Pedroski55 · September 25, 2019, 9:35am

Ah, but I don’t know any sql! Is that what I would need?

Basically, I want to assign a uid to every file. That way, they will all be stored and all be checked. If the first file a student sends is 100% correct, and then he sends another only 90% correct, no problem! Python will only overwrite the first one if the second one is bigger, or vice versa.

chorn · September 25, 2019, 10:14am

At worst, on every upload you can run a loop and check on each iteration if the filename, suffixed with the number from the loop, already exists, until you find a filename with a large enough number that doesn’t exist, to store the file in.
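
The loop described above could be sketched roughly like this. The function name writeUniqueFile, the `_1`, `_2` suffix scheme and the limit of 1000 attempts are assumptions made for illustration, not something from the original script. Instead of calling file_exists() and then opening the file, the sketch opens with mode 'x', which fails if the file already exists, so the check and the creation happen in one step.

```php
<?php
// Hypothetical helper: write $body to a new file and never overwrite an old one.
function writeUniqueFile(string $dir, string $baseName, string $body): string
{
    for ($n = 0; $n < 1000; $n++) {
        // The first attempt uses the plain name, later ones add _1, _2, ...
        $name = $n === 0 ? $baseName : $baseName . '_' . $n;
        // Mode 'x' creates the file and fails if it already exists,
        // so the existence check and the creation are a single step.
        $fp = @fopen($dir . $name, 'x');
        if ($fp !== false) {
            fwrite($fp, $body);
            fclose($fp);
            return $dir . $name;
        }
    }
    // Also reached if the directory is missing or not writable.
    throw new RuntimeException('No free filename found for ' . $baseName);
}
```

Called as writeUniqueFile($path, $studentnr . "_18BEwW5data", $body), a second submission from the same student would land in a file ending in `_1` rather than replacing the first one.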

If you want to keep storing every file, and a unique filename each time is possible, you may want to build up a journal or something similar that you can evaluate later. You just store the filename and the student's name as a new line of the journal file every time. But without a database you would miss out on ACID compliance.

Pedroski55 · September 25, 2019, 10:14am

This does what I want. If you have any tips to make it better, please let me know!

$path = "/home/myusername/public_html/18BE/php/uploads/";
$counter = $path .  "counter";
$cf = fopen($counter, 'r');
$num = fread($cf, filesize($counter));
echo 'Counter is ' . $num . '<br>';
fclose($cf);
$newcounter = intval($num);
$newcounter++;//should add 1
echo 'new number is ' . $newcounter . '<br>';
$newNum = fopen($counter, 'w');
fwrite($newNum, $newcounter);
fclose($newNum);
$newname = $studentnr . "_18BEwW5data" . $newcounter;
$fp = fopen($path . $newname, 'w');
fwrite($fp, $body);
fclose($fp);
echo ' 成功！ Your data has been saved 1 time in a text file! <br><br> ';

I get this kind of output:

Counter is 113
new number is 114
成功！ Your data has been saved 1 time in a text file!

phdr · September 25, 2019, 10:29am

Unless you use file locking, you will find that your file-based counter, which has no error checking in it, keeps resetting when there are concurrent requests to your web page. An attempt to read the file while a different instance of your script is writing to it will produce an error, and the code will keep starting back at one.
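
For reference, a locked version of the counter might look roughly like the sketch below. The function name nextCounter is hypothetical. The sketch relies on flock(), which is an advisory lock: it only protects the counter if every script that touches the file goes through the same locking code.

```php
<?php
// Hypothetical helper: read, increment and store the counter under an exclusive lock.
function nextCounter(string $counterFile): int
{
    // 'c+' opens for reading and writing, creates the file if it is missing,
    // and does not truncate it (unlike 'w').
    $fp = fopen($counterFile, 'c+');
    if ($fp === false) {
        throw new RuntimeException('Cannot open ' . $counterFile);
    }
    // Wait until this request holds the exclusive lock.
    if (!flock($fp, LOCK_EX)) {
        fclose($fp);
        throw new RuntimeException('Cannot lock ' . $counterFile);
    }
    // An empty file casts to 0, so the first value handed out is 1.
    $next = (int) stream_get_contents($fp) + 1;
    ftruncate($fp, 0);            // discard the old value
    rewind($fp);                  // write from the start of the file
    fwrite($fp, (string) $next);
    fflush($fp);                  // flush the new value before unlocking
    flock($fp, LOCK_UN);
    fclose($fp);
    return $next;
}
```

The read and the write happen on the same handle while the lock is held, so two requests arriving together should receive two different numbers instead of both reading the same starting value.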

You need to eliminate all this file handling and just store the submitted data in a database. Once the data is in a database, you can query it or produce reports from it simply by writing the appropriate sql queries. It also appears that you have hard-coded course information in the code, which suggests you have repeated this form handling logic for each possible course. You instead need to use variable(s) for things that are different/vary and have a single instance of the code.

Pedroski55 · September 25, 2019, 10:45am

So, now I’ve got to learn sql?? Can you recommend a good book for beginners?

I thought maybe I could get problems if 2 students send at exactly the same time. But I only have around 200 students in 5 different classes. Probably won’t go into overload!

Each php file is generated by a Python file. I just need to tell it how many variables I have each week, it generates the file in 2 seconds, I upload it with the current week’s webpage.

astonecipher · September 25, 2019, 2:00pm

SQL would be better, however, it can be accomplished depending on a few things.

You can do a count of the files matching the current name, with a wildcard to account for multiple uploads, and use that count to add a postfix, like Windows does when you download the same file multiple times.
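
A minimal sketch of that counting idea, using glob(); the function name nextNameByCount and the `_N` postfix format are assumptions for illustration. Two caveats apply: the wildcard also matches any other file that merely starts with the same name, and counting and then writing are two separate steps, so two simultaneous uploads could still be handed the same name.

```php
<?php
// Hypothetical helper: derive the next filename from how many matching files exist.
function nextNameByCount(string $dir, string $baseName): string
{
    // glob() returns the paths matching the pattern; '*' covers any postfix.
    $matches = glob($dir . $baseName . '*');
    // glob() can return false on error, so treat that as "no matches".
    $count = is_array($matches) ? count($matches) : 0;
    // No earlier upload: keep the plain name. Otherwise add _1, _2, ...
    return $count === 0 ? $baseName : $baseName . '_' . $count;
}
```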

Depending on the structure of the document, you can parse it to see if the score is better, but that is far more difficult based on where your present ability seems to be.

anaror · September 25, 2019, 10:46pm

Just use the time() function and append it to the file name. Problem solved.

phdr · September 25, 2019, 11:58pm

Using a time() value won’t solve this because multiple requests can occur in the same second. You have to understand ‘race conditions’ in program execution in order to solve this.

An example - the vBulletin forum software had a bug in the display of new posts because it used the time of the last post that was viewed to determine if there were new posts to display. However, if multiple posts were made in the same second and at the same time that a visitor viewed posts, any posts made in the part of that second after the request was made were never displayed as being a “new” post. You could see the posts via other methods, but they never showed up as being a new post. The correct way to have done this was to use the id of the last post that was viewed to determine if there were new posts to display.

Pedroski55 · September 26, 2019, 12:06am

First of all, I am very grateful for any and all tips and suggestions.

Given the very small number of users, the time they have to complete this (about 36 hours usually) and the tiny size of the data involved, plus the speed of the host server, I think the php will do its work in a millisecond. How likely is it that 2 or more students send at exactly the same millisecond? What might happen then?

That said, I am very open to suggestions from you experts as to how to do this better. For now it works. We can always improve it!

You suggested file-locking. How is that done?

phdr · September 26, 2019, 1:02am

I added a known example of a race condition to my reply above, while you were posting your latest reply. Doing tricks with time() and making assumptions about the execution of your code won’t work. Probability will eventually get you. With real-time, multi-tasking, time-sliced, interrupt/event-driven operating systems, you cannot guarantee the order in which multiple instances of your script perform actions (don’t forget about one student submitting the same form multiple times due to poor connectivity). The result will be each instance getting the same starting value and using it, resulting either in one set of data overwriting another or in errors that you must detect and handle in order to try to save the data to a file again.

Since it’s doubtful you will switch to just storing the submitted student’s data in a database, you can get your feet wet with MySQL/MariaDB by using a database table to produce concurrent-safe counter(s). Because this uses a single update query to both increment the counter and get the count value, the row of data is locked for the duration of the query and it operates correctly for concurrent requests.

The query looks like this -

$sql = "UPDATE some_table SET counter = LAST_INSERT_ID(counter + 1) WHERE some_where_condition";

The WHERE term is present so that you can maintain multiple counters in one table. The use of the LAST_INSERT_ID() MySQL function allows access to the incremented value that the specific instance of the query produced, regardless of other instances running the same query. Once you have executed this query, you can get the incremented value by referencing the “last insert id” value. Using the php PDO extension, which is the simplest api to use to interface between php and a database server, you would use -

$counter = $pdo->lastInsertId();

Pedroski55 · September 26, 2019, 2:21am

Thank you for the advice. I see I am going to have to tackle some sql! I’ll get a book and work through it!

astonecipher · September 26, 2019, 1:18pm

I don’t think you will have an issue using time or, even better, microtime, and appending it to the uploads. While race conditions are possible, it also depends on how many people are accessing. It’s a matter of probability as well. When your user base is 500, the most users at any given time for a project is going to be a few hours before the project is due.

If the file upload follows the format of, {assignment name}-{student name}.{extension}
and you append the time to that so it is now,
{assignment name}-{student name}-{time}.{extension}

You won’t have an issue EVEN IF two users happen to upload a file at the same time.
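
As a rough sketch of that naming scheme, adapted to the student-number filenames used earlier in the thread: the function name timestampedName is hypothetical, and the random suffix is an extra assumption added here (not part of the suggestion above) to cover the case where the same student's form is submitted twice within the same microsecond.

```php
<?php
// Hypothetical helper: build a filename from student number, assignment tag and time.
function timestampedName(string $studentnr, string $tag): string
{
    // microtime(true) returns the seconds since the Unix epoch as a float;
    // format it with six decimals and drop the dot to get a digits-only stamp.
    $stamp = str_replace('.', '', sprintf('%.6F', microtime(true)));
    // Assumption (not from the thread): a few random bytes as a tie-breaker.
    $random = bin2hex(random_bytes(4));
    return $studentnr . '_' . $tag . '_' . $stamp . '_' . $random;
}
```

With this, $newname = timestampedName($studentnr, "18BEwW5data"); would replace the counter lookup, and the stamp also records when each file arrived.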

Now, learning how SQL works is never a bad thing. So you can expand greatly on what you can do after you’ve learned about it. But, at this point I think, you would be overengineering for a problem that will likely never occur in the present usage.

Parker · September 26, 2019, 3:41pm

I have always just issued the INSERT and let the ON ERROR routine handle existing occurrences of the key…

What’s your expected duplicate error rate for each 1,000 occurrences of an INSERT – 3% or 4% or less?

If that’s the case you are issuing 960 to 970 unnecessary seeks into the indices and prime data area.

Just sayin’. Workability still trumps performance.

phdr · September 26, 2019, 5:06pm

That’s what the OP should be doing, but isn’t. He/She’s saving submitted form data to files, downloading the files, then processing the data they contain using a second programming language.

astonecipher · September 26, 2019, 5:30pm

With everything going on, I missed a portion of the use case on this.

What you are basically doing, I have done many times, except it was for a ticketing system.

I would need more info on the what and why, but, I think you could possibly expand on this greatly. But essentially, the student logs in; they select the course they are submitting work to; the backend then adds a record that the assignment was uploaded (and when), moves the assignment to file storage.

On your side, you can see the assignment was uploaded at a specific time (if that matters), a link to every file that was uploaded, either by student, assignment, or course.

The student can also view all files they uploaded, when, and for what course.

It sounds daunting, but it really isn’t. You now have a fully functional student portal. When and if you get the inclination, you could add to it, using messaging or push notifications that a new assignment is available, is due, grades have posted, office hours to chat about issues, whatever.