I have a tool that I wrote probably 5+ years ago. It runs once a week, collects data from a public API, and translates it into files usable by the Asterisk phone server.
I'd totally forgotten about it. Checked: yep, up-to-date files created, all seemingly in the right format.
Sometimes things just keep working.
Meanwhile, I had to debug a script that recursively zipped its own zip, with the new data appended each time. The server had barely any storage left, as the zip took almost 200 GB (the data is only 3 GB). I looked at the logs: last successful run, 2019.
Yes, had the same happen. Something that should be simple failing for stupid reasons.
Well, it’s not that simple… because whoever wrote it made it way too complicated (and the production version had been tweaked without updating the dev version too).
A clean rewrite with some guard clauses got rid of the hadouken ifs, and writing the zip outside of the directory being zipped helped a lot.
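The original script isn't shown, so this is only a minimal Python sketch of the two fixes described, under assumptions: the function name build_archive, the archive name outgoing.zip, and the use of shutil.make_archive are all hypothetical. The guard clauses raise or return early instead of nesting ifs, and the archive is written to a separate directory so a later run can never pick it up as input.

```python
import shutil
from pathlib import Path
from typing import Optional


def build_archive(source_dir: Path, out_dir: Path) -> Optional[Path]:
    """Zip source_dir into out_dir (hypothetical helper), never inside source_dir."""
    source_dir = source_dir.resolve()
    out_dir = out_dir.resolve()

    # Guard clauses: bail out early instead of nesting ifs ("hadouken" code).
    if not source_dir.is_dir():
        raise FileNotFoundError(f"source directory missing: {source_dir}")
    if out_dir == source_dir or source_dir in out_dir.parents:
        # Writing the zip into the tree being zipped is what lets the next
        # run swallow the previous archive.
        raise ValueError("archive must not be written inside the directory being zipped")
    if not any(p.is_file() for p in source_dir.rglob("*")):
        return None  # nothing to send this run

    out_dir.mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to base_name and returns the archive path as a string.
    archive = shutil.make_archive(str(out_dir / "outgoing"), "zip", root_dir=str(source_dir))
    return Path(archive)
```

Passing a fresh directory (for example one from tempfile.mkdtemp()) as out_dir is one simple way to keep the zip out of the tree being zipped.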
I mean, I have to say I’ve hastened my own demise (in program terms) by over-engineering something that should be simple. Sometimes adding protective guardrails actually causes errors when something changes.
Which is what guardrails are for. When something changes, you don’t know what impact the change will have.
By having guardrails, you make sure to limit/eliminate potential critical issues.
Am I understanding that last part correctly?
Did they just automatically create a backup zip-bomb in their script‽
I oversimplified it, but the actual process was to zip files to send to an FTP server.
The cron job zipped the files to send into an archive in the same directory as those files, then sent the zip, then deleted it.
Looks fine, right? But what if the FTP server is slow and the upload takes longer than the hourly cron interval? You now have a second run zipping the whole folder, including the previous zip file, which slows the upload down even more, etc…
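One common way to stop an hourly cron job from overlapping a run that is still uploading is a non-blocking lock. A minimal sketch, assuming a Unix host (fcntl is not available on Windows); single_instance, main and the lock path /tmp/ftp_upload.lock are hypothetical names:

```python
import fcntl
import os
import sys
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def single_instance(lock_path: str) -> Iterator[bool]:
    """Yield True if this run got the lock, False if another run still holds it."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        try:
            # LOCK_NB: fail immediately instead of queueing behind the running job.
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            yield False
            return
        yield True
    finally:
        # Closing the descriptor also releases the lock if we held it.
        os.close(fd)


def main() -> int:
    # Hypothetical lock path; any file that every run agrees on will do.
    with single_instance("/tmp/ftp_upload.lock") as acquired:
        if not acquired:
            print("previous upload still running, skipping this run", file=sys.stderr)
            return 0
        # zip, upload and clean up here
        return 0
```

On Linux, wrapping the crontab command with util-linux's flock -n <lockfile> <command> gets the same effect without touching the script.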
I believe it may have started with an FTP upload erroring out and forcing an early return without any cleanup, and then progressively got worse.
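A minimal Python sketch of the missing cleanup, under assumptions: the upload is passed in as a callable, and send_and_cleanup, ftp_upload, the logger name, the host and the credentials are hypothetical placeholders. The finally clause deletes the zip whether the upload succeeds, fails, or bails out early, and logging's exception() records the error message and traceback instead of only an object reference.

```python
import ftplib
import logging
from pathlib import Path
from typing import Callable

log = logging.getLogger("ftp_push")  # hypothetical logger name


def ftp_upload(archive: Path) -> None:
    # Placeholder host and credentials; replace with the real ones.
    with ftplib.FTP("ftp.example.com", timeout=300) as ftp:
        ftp.login("user", "password")
        with archive.open("rb") as fh:
            ftp.storbinary(f"STOR {archive.name}", fh)


def send_and_cleanup(archive: Path, upload: Callable[[Path], None] = ftp_upload) -> bool:
    """Upload the archive, then delete it no matter how the upload ends."""
    try:
        upload(archive)
        return True
    except Exception:
        # Logs the exception message and traceback, not just the object.
        log.exception("upload of %s failed", archive)
        return False
    finally:
        # Runs after either return, and even if something outside Exception
        # (e.g. KeyboardInterrupt) escapes: the zip never survives into the next run.
        archive.unlink(missing_ok=True)
```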
… I suppose this happened. The logs were actually broken: they never included the message part of the error object, only its memory address.
Need some monitoring!
Oh, no need. The client didn’t notice anything in 6 years, and the only reason we had to check is that they wanted us to see if we could add this feature… which already existed.
My favorite part is when you do some extensive analysis from time to time (e.g. to prepare an upgrade to a new major version) and, as a side effect, stumble upon workflows/pipelines/scripts that have been failing (and alerting the process owner) every five minutes for… at least a few months already.
Then you go and ask the process owner and they’re just like “yeah, we were annoyed by the constant error notification mails, so we made a filter that auto-deletes them”…
I feel like half my job is trying to stop false positives and other noise from hitting important places. Because false positives kill any chance true positives will be noticed/reacted to/processed.
Yeah, all these simple data-processing scripts will keep working as long as both sides stay the same, or at least compatible.
Yep. It seems they haven’t changed a thing about the format. Probably a script much older than mine on their end is generating it too.
Isn’t that true of all data processing?
Maybe. But webdevs have made it their mission to make it seem otherwise.