Last week, I moved my blog to Amazon EC2 in order to have more control over it (blogged here). I chose a Micro instance running Ubuntu Linux because Micro is the least expensive option (2 cents per hour) and, based on my previous experience hosting various websites and services on Micro instances, I felt it was more than adequate to support my blog.
To my surprise, the results were disappointing. I noticed that when accessing my blog, it would sometimes pause for several seconds before serving the requested page. I spent several hours trying to figure out what was going on. When I looked in the Apache logs, I realized that the CPU spiked when my blog was crawled by one of several search engine bots. When a full crawl wasn’t happening, my blog was lightning fast but when one of the search engine bots crawled my site, my blog was nearly unresponsive until the bot finished. Obviously this was not acceptable.
While running top, I quickly discovered that when my site was being crawled, the CPU would spike, but the spike was not the typical user or system CPU. It was showing up as 97%st, which is “CPU steal time”. Here’s a screenshot:
During the first 5 to 10 seconds of the bot crawling my site, everything was super fast, but then things would come to a near halt.
Steal time is a metric that only has meaning in a virtualized computing environment. It represents the share of CPU time that the hypervisor takes away from a virtual machine to serve other purposes, even though the virtual machine had work ready to run.
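If you want to measure steal time yourself rather than watch top, here is a minimal Python sketch. It assumes a Linux guest where the aggregate `cpu` line of /proc/stat lists its counters in the usual order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are my own, picked for illustration, and this is not necessarily how top computes its figure.

```python
def parse_cpu_line(line):
    """Return the eight counters user..steal from a /proc/stat 'cpu' line."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    counters = [int(v) for v in fields[1:9]]
    # Very old kernels report fewer columns; treat the missing ones as zero.
    counters += [0] * (8 - len(counters))
    return counters


def steal_percent(before, after):
    """Percentage of CPU time stolen between two samples of the 'cpu' line."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total  # index 7 is the steal counter


def read_cpu_line(path="/proc/stat"):
    """Read the aggregate 'cpu' line (the first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()
```

To use it, call `read_cpu_line()` twice about a second apart and pass both samples to `steal_percent()`. Only the first eight counters are summed because guest time is already included in user time.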
Amazon’s website describes Micro instances as follows: “Instances of this family provide a small amount of consistent CPU resources and allow you to burst CPU capacity when additional cycles are available. They are well suited for lower throughput applications and web sites that consume significant compute cycles periodically.”
What I’ve discovered is that when you tax the CPU for more than a few seconds, you get throttled back almost to a standstill. I put together a brief video where I demonstrate this happening. In this video, I run a simple script that makes repeated calls to sysbench, each taxing the CPU for 2 seconds. After it runs a few times, you’ll see an immediate and severe reduction in performance, and you’ll see the steal time go to over 97%, leaving very little CPU to run my job.
Advance to the 1:20 mark if you want to skip to the benchmark.
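If you would like to try something similar without the video, here is a rough Python sketch of the same idea. Note that it is not the script from the video: it swaps sysbench for a plain busy loop, and the function names and defaults (10 bursts of 2 seconds) are my own assumptions. It counts how many loop iterations fit into each burst of wall-clock time, so a throttled instance should show the count collapsing after the first few bursts.

```python
import time


def burn(seconds):
    """Busy-loop for `seconds` of wall-clock time; return iterations completed."""
    deadline = time.perf_counter() + seconds
    count = 0
    while time.perf_counter() < deadline:
        count += 1
    return count


def run_bursts(bursts=10, seconds=2.0):
    """Run back-to-back CPU bursts and return the iteration count of each."""
    return [burn(seconds) for _ in range(bursts)]


def slowdown(results):
    """How many times slower the worst burst was compared with the first."""
    worst = min(results)
    if worst == 0:
        return float("inf")
    return results[0] / worst
```

For example, `results = run_bursts()` followed by `print(results, slowdown(results))` prints one count per burst. On a machine that is not being throttled, the counts should stay roughly level and the slowdown should stay close to 1.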
I still think Micro instances are awesome, especially considering the cost, but I’ve now learned that they are not right for all applications. A simple 10-page website with only a few thousand hits a day will do just fine since it will only need CPU in short bursts. These bursts are executed very fast, so a Micro instance is a good fit for this type of application. However, if your application needs sustained good CPU performance, you’ll need to upgrade to a Small instance or higher.
The nice side effect of this whole experience is that my blog is now very well optimized. Pre-cached pages, minified HTML/JS/CSS and a well-tuned DB make for a speedy blog!
Editorial: I think the extreme CPU throttling really limits the usefulness of Micro instances. The next step up in the Amazon EC2 world is a Small instance, but the price jump to a Small instance is substantial. Originally I saw Micro instances as a “foot in the door” to Amazon EC2: a way for someone to try the model and then grow into larger instances as demand grows. It seems to me that a model that basically shuts down the CPU after a few seconds of heavy use gives a lot of new users a bad experience, especially considering that Amazon does not clearly explain the model. I think Amazon needs to either modify the Micro instance to not throttle CPU as severely, or offer a “tiny” instance that provides a good entry into the EC2 world.
NOTE: Since writing this article, I’ve joined the Google Cloud Platform team as the head of developer advocacy. Google Compute Engine is our equivalent to Amazon EC2, and we have a similarly named “micro instance” at a similar price (actually, it’s lower). I ran the same test on the Google instance and did not experience any stolen CPU behavior. It provides fairly consistent horsepower that is, in my opinion, better suited for low-end tasks or even low-traffic websites.