NPM gets killed

November 12, 2018 3.3k views
Node.js Ubuntu 16.04

No matter what I do with NPM, the process gets killed.
I know there are many reports about this, and everybody says increasing RAM or swap should help, but for some reason it doesn't in my case.
First: I increased RAM to 3 GB and swap to 2 GB, and it didn't help.
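For reference, the current RAM and swap can be checked with standard tools:

free -h
swapon --show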

deployer@staging:~/apps/naprok/releases/20181112091712$ /usr/bin/env npm audit fix
npm WARN deprecated browserslist@1.5.2: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated text-encoding@0.6.4: no longer maintained
npm WARN deprecated browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated nomnom@1.6.2: Package no longer supported. Contact support@npmjs.com for more info.
npm WARN deprecated circular-json@0.3.3: CircularJSON is in maintenance only, flatted is its successor.
Killed.............] \ fetchMetadata: sill resolveWithNewModule util-deprecate@1.0.2 checking installable status

deployer@staging:~/apps/naprok/releases/20181112091712$ /usr/bin/env npm install 
npm WARN deprecated browserslist@1.5.2: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated text-encoding@0.6.4: no longer maintained
npm WARN deprecated browserslist@1.7.7: Browserslist 2 could fail on reading Browserslist >3.0 config used in other tools.
npm WARN deprecated nomnom@1.6.2: Package no longer supported. Contact support@npmjs.com for more info.
npm WARN deprecated circular-json@0.3.3: CircularJSON is in maintenance only, flatted is its successor.
Killed.............] / loadDep:yargs: sill resolveWithNewModule astral-regex@1.0.0 checking installable status

Second: I'm seeing this on my staging server, but I also have a production server with even less memory, and it works without being killed all the time.

Thanks for any help!

1 Answer

Hey friend!

These logs don't seem to indicate why the process died. Do you have any logs that do? /var/log/syslog or /var/log/messages is where you'd find OOM errors if it's memory. You'd also see a clear memory kill on the web console if that were the cause.
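
If it is the OOM killer, it usually leaves a trail you can grep for; a quick sketch of where to look on Ubuntu (adjust paths to taste):

grep -i -E 'killed process|out of memory' /var/log/syslog
dmesg -T | grep -i -E 'oom|killed process'
journalctl -k | grep -i 'killed process'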

Note that if it is memory, comparing to another server is a difficult thing to do. I know this is a point of confusion for a lot of people when they face this, but the simple reality of the internet is that no two public-facing systems are ever truly the same. I'll give you a theoretical (but plausible) scenario to explain that:

IP 1.1.1.1 has no history of being successfully attacked in the last 5 years, so it appears on fewer attack lists than 1.1.1.2, which was successfully attacked in 2013 and therefore receives more automated attacks, which cause elevated memory usage in public-facing applications. That, in turn, causes 1.1.1.2 to see more memory usage than 1.1.1.1 despite there being no differences in the software stack. This theory is only one of many that you can't really know or prove/disprove, but it's plausible based on what we know about how attackers work (and the fact that not all IPs see equal attack frequency). Everything on the internet is under constant attack by someone; the only things that vary are the frequency of the attacks, the attackers, and the reasons for the attacks. This is why you can never truly compare one server to another unless you've compared all inbound traffic.

Moving beyond that, I wouldn't assume it's memory until you know for sure. If you do know for sure, then you have two options:

  1. Add more memory
  2. Build the application stack to better handle the load

I rarely recommend #1. If you're setting off fireworks in the living room and the ceiling catches fire, raising the ceiling seems like a bad solution to me: the fireworks just go higher and the new ceiling catches fire too. To me, it makes more sense to contain the fireworks and not let them reach the ceiling. It's a clumsy illustration, but I think you get where my brain is going.

If we know it's memory and this is a public-facing application, use a caching layer that handles memory better. This reduces requests to the backend application, and the caching software is likely better at releasing memory back to the system. Nginx is pretty much the top choice for this; just use it as a reverse proxy with caching features enabled.
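
As a rough sketch of that setup (the domain, port, and cache sizes here are placeholders; it assumes the Node app listens on 127.0.0.1:3000):

# proxy_cache_path belongs in the http context, e.g. in a file included from nginx.conf
proxy_cache_path /var/cache/nginx/app levels=1:2 keys_zone=app_cache:10m max_size=200m inactive=60m;

server {
    listen 80;
    server_name example.com;               # placeholder domain

    location / {
        proxy_cache app_cache;             # use the cache zone defined above
        proxy_cache_valid 200 302 10m;     # cache successful responses for 10 minutes
        proxy_cache_use_stale error timeout updating;
        proxy_pass http://127.0.0.1:3000;  # assumed Node.js app address
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}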

Hope that helps :)

Jarland

  • Thanks Jarland for taking a look at the issue!

    Do you have any other ideas for how I can find out why the process is being killed?
    /var/log/syslog didn't tell me anything, just some crontab-related logs. There is no /var/log/messages at all. Nothing new in the web console either: https://i.imgur.com/sTJ5NYb.png

    Yesterday I experimented with droplet sizes and tried an 8 GB droplet, but it still didn't help.
    https://gist.github.com/eugenekorpan/08ab20bf2ee240a0c975c2d18793a121
    Before that it was only 1 GB. I can keep resizing droplets, but that doesn't make much sense.
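
    One way I can still rule memory in or out for sure is to watch it from a second terminal while reproducing the kill:

    # terminal 1: reproduce the problem
    npm install
    # terminal 2: sample memory once per second while it runs
    vmstat 1
    # or: watch -n 1 free -m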

    Thanks!

  • As far as I understand, these are the kernel logs:
    https://gist.github.com/eugenekorpan/8678869524c0b9162cf61458909755b7

    But I can't see any kill events there.
    I couldn't find any info about why the process was killed.
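
    One thing I can still check is the shell exit status right after the kill; 137 (128 + 9) would mean SIGKILL, which usually points at the OOM killer or some external watchdog rather than npm itself:

    npm install
    echo $?   # 137 = killed by SIGKILL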

    Any ideas?

    • Sorry for the delay. So good news and bad news. I'll get them out of the way. Good news: I don't think it's memory related. Bad news: I don't think it's memory related.

      The great part about memory issues is that they're easy to understand. What we have here instead is a process dying with no clear cause and no clear path to resolution. What I dislike the most is that this falls outside my area of expertise, so I'm hoping someone else will see this topic and weigh in on it.

      My feeling is that the application itself is going to be what logs the issue, but perhaps it needs deeper debugging built in to catch and log whatever is causing this. That is admittedly a vague statement, as the Node.js runtime isn't my area of expertise.
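
      One low-effort step along those lines is turning npm's own logging all the way up and capturing it, so the last lines before the kill show what npm was doing at that moment (standard npm flags):

      npm install --loglevel silly 2>&1 | tee /tmp/npm-install.log
      tail -n 30 /tmp/npm-install.log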

      I have a Node.js app that dies every few days as well; no idea why, and I really just don't care that much because I restart it and continue on with my day (it isn't incredibly important). I mention it only to say that I'm no stranger to these processes dying without a clear reason that points to a resolution. If the runtime can't output more data about why the app is dying, perhaps something like strace is the only remaining option:

      http://hokstad.com/5-simple-ways-to-troubleshoot-using-strace
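
      A minimal sketch of that approach (the log path is arbitrary):

      strace -f -tt -o /tmp/npm-strace.log npm install
      # after it dies, the end of the log usually shows the final signal or syscall
      tail -n 50 /tmp/npm-strace.log
      grep -i 'killed by' /tmp/npm-strace.log   # a SIGKILL shows up as "+++ killed by SIGKILL +++"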
