Software development is hard
In my latest blog post, I compared software engineers to construction engineers. I said that if we, software engineers, want to have the same level of responsibility as other engineers, we need to have the same level of respect other engineers get. But something was haunting me. I had been thinking a lot about software engineering for the past month, and I couldn’t understand why software engineers are different. Could it be “just because”? Software is young compared to other engineering professions. Could it be that we, as humanity, haven’t yet understood software development to its fullest, and so can’t hold software engineers to the same level of responsibility as other engineers? But then it all came to me.
I was writing code that integrates with a third-party API. As usual, I was consulting the API documentation. There were no official libraries, so I had to build the API client from the documentation and trial and error. All went well: I did some testing and eventually pushed the code to production. Sometime afterward, the code broke. My server started returning a 500 error code. Why? Because the API provider had changed their API.
Who is at fault? Me, for not monitoring the API change logs? The API provider, for not communicating these changes to all its consumers? Is that even possible? An average website communicates with a dozen third-party providers. We don’t think about it, and we take a lot of these dependencies for granted.
Let’s look at a database, for example.
Nearly every web service, except for statically rendered web pages, depends on a database.
Most of us take it for granted.
Have you ever checked that your UPDATE table_name SET field = ? WHERE id = ? actually affects any rows?
What if it doesn’t?
What do you do?
And what happens when pool.acquire() fails to acquire a connection from the connection pool?
Sure, it almost never happens, but a sporadic network error could cause your code to fail.
But none of us think of such cases.
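Here is a minimal sketch of what that check could look like. It assumes Python's built-in sqlite3 module as a stand-in for whatever database you use, and the function name update_field and the NoRowsUpdated exception are made up for illustration. It only shows one possible answer to "what do you do?": treat an UPDATE that touched nothing as an error rather than a silent success.

```python
import sqlite3


class NoRowsUpdated(Exception):
    """Hypothetical error type: the UPDATE ran but matched no rows."""


def update_field(conn: sqlite3.Connection, row_id: int, value: str) -> None:
    # Same statement shape as in the text, with bound parameters.
    cur = conn.execute(
        "UPDATE table_name SET field = ? WHERE id = ?", (value, row_id)
    )
    # For an UPDATE, rowcount is the number of rows the statement modified.
    if cur.rowcount == 0:
        conn.rollback()
        raise NoRowsUpdated(f"no row with id={row_id}")
    conn.commit()
```

Whether a missing row deserves an exception, a log line, or an insert instead is a product decision. The point is only that the question gets asked at all.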
Let’s look at another example: writing a file to the disk. In our day-to-day work, we rarely encounter errors when trying to write a file to the disk. But it can happen. A bug in the OS. Lack of disk space. Missing write permissions for the process.
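To make those failure modes concrete, here is a rough Python sketch that tells a couple of them apart. The function name write_file and the returned strings are my own invention; real code would more likely log or re-raise. It cannot do anything about an OS bug, of course. It only handles the errors the OS is willing to report.

```python
import errno
import os


def write_file(path: str, data: bytes) -> str:
    """Write data to path; return "ok" or a short reason for the failure."""
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            # Ask the OS to push the bytes to the device, so that errors
            # such as a full disk surface here and not later.
            os.fsync(f.fileno())
    except PermissionError:
        return "permission denied"
    except OSError as exc:
        if exc.errno == errno.ENOSPC:
            return "disk full"
        return f"os error: {exc.strerror}"
    return "ok"
```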
We take all these things for granted, as if they can’t fail, but they can, and they do. And then the question arises: how can we demand accountability from software engineers when they depend on systems they neither developed nor fully understand? An electrical engineer checks the wiring in your home; he is not responsible for the grid delivering 500 volts instead of 220 volts. When we develop code, we make assumptions. Assumptions that the network will function, because if it doesn’t, the IT engineer needs to fix it. Assumptions that the disk is not full, because we, hopefully, have monitoring alerts on disk size. And worst of all, we integrate with a dozen providers, often using lousy documentation in combination with trial and error or some test data to determine the happy-path flows.
It might sound like I’m whining. Like I’m trying to say “Hey, we are poor software engineers, don’t demand accountability from us!” But I’m not. I truly think that software development is hard. Sure, it might sound like we just put stuff in a database or move pixels on a screen, but it requires so much effort, and so much trust in others, that I don’t think it’s even possible to develop fully fault-tolerant software. We can reduce the number of bugs by, say, doing more tests, but PMs don’t have time for tests. Great software exists, but it comes at a cost. A deadly cost of time. When you don’t have the luxury of time, you end up with “Oops, something went wrong” type of errors.
Software development depends on a lot of verbal agreements that are rarely respected. It depends on a lot of assumptions. Assumptions that the network will be there, that the database will be able to update the record, and that the user won’t take the hard drive out of the computer while running our software. Most of these assumptions are legitimate. That’s why most software works most of the time. But what do we do when we can’t acquire a database connection because the machine has reached its memory limit? Do we retry? For how long?
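One common, though far from universal, answer to "for how long?" is a fixed attempt budget with exponential backoff. The sketch below is an illustration under my own assumptions: the retry helper, its parameters, and the choice of ConnectionError as the retryable failure are all hypothetical, and a real pool would raise its own error type.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    op: Callable[[], T],
    *,
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op until it succeeds or the attempt budget is spent."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except ConnectionError:
            if attempt == attempts:
                raise  # budget spent: surface the failure to the caller
            # Exponential backoff, capped at max_delay, with random jitter
            # so that many clients do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")
```

With a hypothetical pool object, a call would look like retry(pool.acquire, attempts=3). Note that the budget only decides when to give up. What the user sees after that is still on us.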
Software development is hard.