Ever write code that is dead simple to follow and makes you believe that not a thing can go wrong with it, only for it to simply fail, and you cannot figure out for the love of god what is the cause?
Anything that can go wrong will go wrong
- Murphy's Law
On a busy afternoon, I wrapped up writing a service that was expected to call an API. The API in question had already been developed and deployed, but was giving trouble when called. I could not come around to find a cause for the issue at the time of deployment so I deferred to deal with the problem for a later date - and that date had arrived. Having written the service I now had to focus my attention on the API and determine why it was not working.
It was one of those Hello World APIs - well, not a Hello World per se but it was definitely an API that had no complexity, was written in expressjs and had very few endpoints. The endpoint that was giving trouble took the request, checked for existence of a file in the request and handled it (through multer). The endpoint required authorization. Sounds dead simple right?
Apparently not. The API was hosted on Heroku. And each time we made a request to that endpoint, we got back Heroku's
Application Error in the response. Immediately we checked the logs and we could see an entry that read:
H18 Server request interrupted
The documentation for the H18 error did not make much sense to me. Socket connected... but got destroyed... response incomplete... What? Come again?
I then checked for other google results. Lots of them on Stackoverflow. This one caught my attention. But it asked me to ignore the error. Alright. Ignored. But I cannot look the other way - it was clear to me that this error had something to do with my API returning with the
Application Error message. Un-ignored!
Let's skip that SO answer. Back to Google. More scrolling. Found another SO answer. Hmm... error occurs when uploading a large file... But I am not uploading a large file. And in any case, based on my configurations for
multer, it will throw an error if I exceed the file size anyway. That can't be it...
But looks like a reasonable thought process - file is large, connection times out, H18 error received. This also goes in hand with Heroku's own explanation of the error - about the socket getting destroyed.
Alright - time to get dirty. Log all the things!! You get a log! You also get a log! You all get a log! I added as many logs as I could to get a sense of till which step the request reached. I even set the
DEBUG environment variable to
* to see the debug messages from all imported modules.
And then, I observed something. When I called the endpoint again, and traced the logs, it indicated that the request failed authorization. Ok. So the request was not authorized. Big deal. The API could have responded, as developed, with the 401 http status. Why didn't that happen?
And then, doing two plus two equals to four, it then it me! We had our code set up as follows:
// Step 1 - Check if the request has authorization. If it does not, quit. // Step 2 - If authorization exists, use multer to parse the file from the request
See where we went wrong?
multer, which processes the request to extract the file, was AFTER the authorization check. When authorization failed, the API would immediately respond back - however, if the file was still getting uploaded to the API, before
multer could lay its hands on it, the request would terminate - leaving the file upload hanging!
And that is why, we received the H18 errors. We fixed it by rearranging the code as follows:
// Step 1 - Use multer to parse the file from the request // Step 2 - Check if the request has authorization. If it does not, quit. If it does, proceed to use the file extracted by multer
And that fixed it! It took me the whole of the day and lots and lots of anxiety filled moments where I was not able to figure out what we were doing wrong, for me to finally find the cause of the issue and to fix it. It's a simple issue but the error messages were too generic and unhelpful. Thank god Stackoverflow and Google without which I would not have figured this out. We no longer got the H18 errors and the endpoint also successfully returned with the expected response.