Linux kernel to blame for 'leap second' outage
A number of high-profile outages that took place last weekend can be traced back to how the Linux kernel mishandled a leap second added to the official time, charges the CTO of DataStax, a company that manages the open-source Cassandra database.
"Initial reporting often fingered Java or even Cassandra as the culprit ... but the actual problem was a kind of livelock in the Linux system calls responsible for timers," wrote DataStax CTO and Cassandra creator Jonathan Ellis in a blog post.
At midnight Saturday Greenwich Mean Time (GMT), an extra second was added to Coordinated Universal Time (UTC), the official time used to coordinate servers across the Internet. Although the Network Time Protocol (NTP), the most widely used mechanism for synchronizing clocks across the Internet, was designed to handle leap seconds, a number of popular Internet services briefly went offline after the second was inserted on their servers, including Reddit, LinkedIn and the Qantas airline reservation system.