National broadband outage caused by router bug
By Rick Burgess
A vast number of Time Warner Cable customers arose from bed this Monday morning to discover that their Internet connections were down. Other ISPs have been affected as well, but TWC customers are clearly the loudest.
Time Warner Help tweeted about the problem this morning, saying, "We appear to be recovering from a large but brief internet outage affecting most of our service areas. Please attempt to connect again."
Reportedly, Time Warner Cable subscribers as far west as Texas and as far east as New York may have experienced a network hiccup this morning, which has been described as a "national outage". While Time Warner Cable clearly was affected, there have also been reports of other ISPs that suffered the same issues.
So, what was the reason? Level 3, a major backbone provider for the Internet, just issued an official statement:
"Shortly after 9 am ET today, Level 3's network experienced several outages across North America and Europe relating to some of the routers on our network," the company said. "Our technicians worked quickly to bring systems back online. At this time, all connection issues have been resolved, and we are working hard with our equipment vendors to determine the exact cause of the outage and ensure all systems are stable."
Although the statement is sparse on details, affected companies have shared some unofficial explanations on social media. It appears there is a bug in Juniper Networks' JUNOS which can cause the kernel to crash. JUNOS is found in many enterprise-level network products, including core switches and routers.
The potential for this bug to rear its ugly head was first reported over a year ago. Juniper scrambled and issued a fix for the problem immediately. The fact that anything happened at all a year later may imply that some core network switches and routers simply were not updated. Interestingly, the nature of the bug could allow an attacker to exploit it, but no details were given on why it happened today.
Along with Level 3's admitted difficulties, there are reports of many such issues across a variety of backbones and ISPs. At the moment, Level 3 is fully operational, but there have been and may continue to be connectivity issues throughout the day.
Update: nLayer's CEO contacted me last night to correct my information. The company had five out of about 70 routers that were affected, and connectivity between the U.S. and Europe was mostly unaffected. He described the issue as having a "minor impact."