The Hug of Death

2:30 PM · August 27, 2020 · Daniel Tompkins


If a server crashes in the woods, does it make a sound?

This blog recently received some high traffic on the Signs of Life post after I shared the project on Hacker News. After everything was said and done, my analytics reported something like 8,000+ unique visitors over a couple of days.

Urban Dictionary defines the hug of death as:

An accidental DDoS-like effect caused when a website suddenly gains popularity (usually via Reddit), causing a huge amount of traffic.

There are a few other names for this network phenomenon. Generically, or in network science, it's known as a "flash crowd". In the '90s and early 2000s, it was often referred to as "slashdotting", or the Slashdot effect — after slashdot.org, a once-popular tech news forum.

In the same etymological vein, other early social sites — forums and aggregators — took on the "hug of death" verbiage. New and small bloggers were "Farked", "Drudged", and "BoingBoing'd" — their sites crushed to death by the thousands of concurrent requests pouring in from these high-traffic nodes.

Meme of anime guy seeing butterfly, except it's asking if spending time on a personal project is worth it.

Most people trying to access my project were likely getting 503 Service Unavailable as an error response. Apparently the number of requests was beyond what my DigitalOcean server was capable of handling.

I'm on the second cheapest DigitalOcean "Droplet" plan, which provides a server with 2GB RAM and 1 virtual CPU. That's already $10/month for something that doesn't necessarily give me any ROI, so I wouldn't be thrilled to upgrade to a higher plan.

What other options do I have using FOSS to combat the hug of death?

Content Optimization

The decision to use LAMP (Linux, Apache, MySQL, PHP) for loosed was more or less arbitrary. So, I think it's high time to evaluate this decision and find out more about how different servers handle traffic.

There are a few key methods to cutting down response time (that I know of):

  • Compressing, bundling and caching static assets,
  • minifying CSS and JavaScript (or opting not to use client-side JS altogether),
  • distributing static content and caches through a content-delivery network (CDN),
  • upgrading a server's RAM or CPU,
  • using secondary servers, or load-balancers, to distribute requests

These options are all well and good. I minify JS. I compress, lazy-load and cache images. I do my best to use fewer third-party scripts (I could do a lot better). For someone who's hosting a site without ads or revenue, though, I'm not really prepared to pay for a CDN provider (maybe a free one?), additional server resources, or additional servers...
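For what it's worth, compression on Apache can live right next to everything else in .htaccess via mod_deflate. This is only a minimal sketch, and it assumes mod_deflate is actually enabled on the server:

.htaccess
# Compress text-based responses before they leave the server (requires mod_deflate)
<IfModule mod_deflate.c>
  AddOutputFilterByType DEFLATE text/html text/css text/plain
  AddOutputFilterByType DEFLATE application/javascript application/json image/svg+xml
</IfModule>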

My website isn't a complete mess, but there must be a course of action to prevent further server overload without paying an additional $20-50/month.

Basic Security

First, I'll share a little list of "Anti-overload techniques" from Wikipedia...

To partially overcome above-average load limits and to prevent overload, most popular web sites use common techniques like:

Managing network traffic, by using:

  • Firewalls to block unwanted traffic coming from bad IP sources or having bad patterns
  • HTTP traffic managers to drop, redirect or rewrite requests having bad HTTP patterns
  • Bandwidth management and traffic shaping, in order to smooth down peaks in network usage

In the past, I've had some issues with Tor-gateway bots submitting subscription requests. When you subscribe to my blog, you receive a confirmation email — which I pay for!

Luckily, the Mailgun plan I'm on (oops — not anymore, switched to Plunk) allows a certain number of sent emails per month before I'm charged, so I was able to blacklist the offending IPs and fix the issue before it really hurt.

Not all traffic coming through a Tor gateway is harmful, and I encourage people to use tools that protect their anonymity online. However, if you run a website that's receiving spam traffic from IPs leading back to a Tor gateway, I recommend having a look at dan.me.uk.

This is a website run by "Dan" (not me) with many useful tools dedicated to preventing malicious traffic from known offending IPs.
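Once you have a blocklist of abusive addresses, Apache 2.4 can refuse them without touching the firewall. A minimal sketch, where the IP addresses are made-up placeholders:

.htaccess
# Refuse requests from known-bad source IPs (Apache 2.4 authorization syntax)
<RequireAll>
  Require all granted
  Require not ip 203.0.113.42
  Require not ip 198.51.100.0/24
</RequireAll>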

Another (more difficult) option is to analyze requests in realtime and sever connections that appear to deviate from the mean. This, of course, can consume additional server resources and requires some thought to prevent cutting off actual, honest patrons.

If you don't have the practical knowledge to implement something like that, but you're running an Apache server (like me), there's mod_qos. It's a "Quality of Service" Apache module which essentially does what I just described. Not sure how it might affect benevolent load testing...
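I haven't deployed it myself yet, but from skimming the docs a starting configuration might look like the sketch below. The directive names come from mod_qos; the limits are numbers I made up and would need tuning against real traffic (the file path is also just a guess at a typical Debian layout):

/etc/apache2/mods-available/qos.conf
# Cap concurrent connections so a flash crowd degrades gracefully (requires mod_qos)
<IfModule mod_qos.c>
  # maximum concurrent connections allowed from a single client IP
  QS_SrvMaxConnPerIP  30
  # maximum concurrent TCP connections for the whole server
  QS_SrvMaxConn      500
  # stop honoring keep-alive once connections climb past this point
  QS_SrvMaxConnClose 400
</IfModule>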

Caching

More advice from our Wikipedia lesson:

  • Deploying web cache techniques
  • Using different domain names or IP addresses to serve different (static and dynamic) content by separate web servers, e.g.:
    • images.example.com
    • example.com

Right now I have the most basic caching possible — just a header spec for images, CSS, JS, and fonts in my .htaccess file:

.htaccess
<FilesMatch "\.(css|webm|jpg|jpeg|png|gif|ico|svg|woff|woff2|ttf|otf|js)$">
  Header set Cache-Control "max-age=31536000, public"
</FilesMatch>

Using a CDN for most of these assets might help me a lot. Instead of requesting a dozen .gif files from my server, it would free up bandwidth and potentially cut down round-trip time by using a server closer to the client.

On the other hand, if DigitalOcean is up but Cloudflare's services are down, it could be just as bad. However, my current setup clearly has room for improvement in this department.

Load-Balancing and Hardware

  • Using different domain names or computers to separate big files from small and medium-sized files; the idea is to be able to fully cache small and medium-sized files and to efficiently serve big or huge (over 10 – 1000 MB) files by using different settings
  • Using many internet servers (programs) per computer, each one bound to its own network card and IP address
  • Using many internet servers (computers) that are grouped together behind a load balancer so that they act or are seen as one big web server
  • Adding more hardware resources (i.e. RAM, disks) to each computer

I have a pretty wimpy server. I'm curious whether a container cluster of mirrored servers / distributed services would perform better than a single server. If each container has a unique subaddress, would that free up requests? Would the amount of RAM and processing it takes make up the difference?

My technical knowledge of Docker swarms and Kubernetes clusters is pretty infantile, but I think you're still bottlenecked by whichever container is managing the requests among the remainder. So if you're running on a single VPS, it's not worth it. I'll do some extra research on this to be sure I'm not missing out; or if you know better, please leave a comment.

As for additional hardware resources, the goal of this deep-dive is to make improvements while keeping costs at a minimum. The next section is really where I'm hoping to find some answers.

Server Software

  • Tuning OS parameters for hardware capabilities and usage
  • Using more efficient computer programs for web servers, etc.
  • Using other workarounds, especially if dynamic content is involved

So, pretending we didn't see that last vague and unhelpful tip... let's take a look at some popular Web servers and how they handle requests.

Apache2

In development since 1995, Apache2 is the old man in the family of Web servers. The original HTTP Web server, HTTPd, had been started only 5 years earlier by Tim Berners-Lee at CERN and was later taken up by the World Wide Web Consortium (W3C).

Perhaps as a result of its age and maturity, Apache2 is a stable and secure Web server option — and is the most widely used server software (followed closely by Nginx, as of today).

Statistics taken from W3Techs, August 24, 2020.

It's interesting to note another statistic from W3Techs (below), which shows that Node.js is used by more high-traffic sites — followed by Nginx, then Apache.

Statistics taken from W3Techs, August 24, 2020.

Hmmm... So more websites are using Apache — but more high-traffic sites are using Node. What's up with that? Are developers just too stubborn to switch to Node? Are high-traffic sites using Node because it's actually better at handling high traffic, or is this a chicken-and-egg situation...?

To be frank, I don't have a software engineering degree. Most of my programming and sysadmin knowledge comes from Team Treehouse, learning from friends and from my own hobby projects; but I'm going to go ahead and try to shed some light on this. Why not.

One thing I found is the ApacheBench (ab) tool, which is useful for load-testing. There's a good write-up by Pete Freitag on his blog.

The basic usage is as follows:

ab -n 100 -c 10 http://www.yahoo.com/

Where the "-n" argument is the number of requests and "-c" is the number of concurrent requests. This can be tweaked with other flags to produce the most accurate results, which will look something like this (hopefully, less abysmal):

Concurrency Level:      1000
Time taken for tests:   69.959 seconds
Complete requests:      1000
Failed requests:        0
Total transferred:      30738000 bytes
HTML transferred:       30557000 bytes
Requests per second:    14.29 [#/sec] (mean)
Time per request:       69959.429 [ms] (mean)
Time per request:       69.959 [ms] (mean, across all concurrent requests)
Transfer rate:          429.07 [Kbytes/sec] received

The other main tweak I've found for Apache is configuring the multi-processing module (MPM). Apache provides three main MPMs: prefork, worker, and event.

I found that DigitalOcean defaults to "prefork" with its LAMP-stack base server image. After reviewing a lot of information and suggestions online, this seems like a poor choice. I ended up using this StackExchange thread to switch to mpm_event (with PHP-FPM).
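For anyone following along on Debian/Ubuntu, the switch generally boils down to a handful of commands like these. Treat it as a sketch: the PHP version is a placeholder for whatever your server runs, and the php-fpm package has to be installed separately:

# swap mod_php + prefork for the event MPM and PHP-FPM over FastCGI
sudo a2dismod php7.4 mpm_prefork
sudo a2enmod mpm_event proxy_fcgi setenvif
sudo a2enconf php7.4-fpm
sudo systemctl restart apache2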

Once I made the switch, I went in to fuss around with the config file — which, for me, was at:

/etc/apache2/mods-enabled/mpm_event.conf

I was hard-pressed to find some dumbed-down documentation for setting these config values, but I used ApacheBench as I was adjusting them to get the best RPS score.
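For reference, the file is just a handful of directives. The values below are illustrative guesses for a small 2GB / 1 vCPU droplet, not the numbers I landed on and not a recommendation:

/etc/apache2/mods-enabled/mpm_event.conf
# event MPM: a listener thread hands connections to a pool of worker threads
<IfModule mpm_event_module>
  StartServers             2
  MinSpareThreads         25
  MaxSpareThreads         75
  ThreadLimit             64
  ThreadsPerChild         25
  MaxRequestWorkers      150
  MaxConnectionsPerChild   0
</IfModule>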

So, not great, but at least that's some improvement! I managed to speed it up to ~100 RPS at 5000 requests with 20 concurrent requests.
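That figure came from re-running the same kind of ab test with heavier settings, i.e. something along these lines (swap in your own URL):

ab -n 5000 -c 20 https://example.com/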

Nginx and Node

Another factor that could be significantly skewing the above-mentioned Web server stats is the fact that Node uses HTTP by default and is often paired with a reverse proxy in order to securely serve HTTPS. I see a lot of Node devs using Nginx for this purpose. However, there are pure JavaScript alternatives like Redbird.

CPUs | 0 KB      | 1 KB      | 10 KB   | 100 KB
1    | 145,551   | 74,091    | 54,684  | 33,125
2    | 249,293   | 131,466   | 102,069 | 62,554
4    | 543,061   | 261,269   | 207,848 | 88,691
8    | 1,048,421 | 524,745   | 392,151 | 91,640
16   | 2,001,846 | 972,382   | 663,921 | 91,623
32   | 3,019,182 | 1,316,362 | 774,567 | 91,640
36   | 3,298,511 | 1,309,358 | 764,744 | 91,655
The table above shows some results from an official Nginx performance test: requests per second (RPS) on an NGINX server, by CPU count, for response sizes from 0 KB up to 100 KB.

Out of the box, Nginx produces some incredible RPS speeds. Another lesson this demonstrates is that the best solution to preventing server overload is always keeping your requests to a minimum. The falloff from 0 KB to 100 KB is an astronomical 110,000+ requests per second on a single CPU.

I'm sure anyone reading this must be loving the hypocrisy, since this page alone has something like 40+ requests, equaling about 500KB of data. Do I really need comments on every page? Extraneous .svg's and .gif's...? No. But I must have some kind of compulsion disorder, because I really can't stop myself.

If you're looking for a tool similar to ApacheBench, I found a post on yld.io/blog that walks through using wrk2 — another load-testing application written in C.
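I haven't tried it against this site yet, but the basic invocation looks something like the line below; wrk2 adds the -R flag to hold a fixed request rate, and the URL and numbers here are placeholders:

# 2 threads, 100 open connections, 30 seconds, throttled to 1000 requests/sec
wrk -t2 -c100 -d30s -R1000 --latency https://example.com/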

Conclusion

This is honestly a whole new world to me as someone who's started off mainly doing front-end and other surface-level programming. I'm stoked to be learning about the different tools for load-testing and ways to avoid having the server shit the bed. If you think I'm way out of my league and I have no idea what I'm talking about, you win the prize.

This was all I could come up with right now, but I'll definitely be doing more research and (slowly) trying to figure out how to keep my posts soaked in rich media. Let me know if you have any tips or experience that's useful to people dipping their toes in sysadmin and devops!