The Heartbleed internet security bug has exposed just how vulnerable the world is to a digital disaster.
Networked computers have created trillions of dollars in new economic power, but the growing alarm over the Heartbleed internet security bug suggests that humanity may have a new kind of disaster to worry about: a global IT crash.
Although the design of Arpanet, the precursor to the internet, drew on Cold War research into communications networks that could keep working even if central hubs were knocked out in a nuclear war, experts say that many systems' growing reliance on a handful of applications, and their tight integration with one another, make the vulnerability of our IT and communications systems a real threat.
“The internet was created, theoretically and practically, with a very distributed set of servers and mechanisms, so that it would be resilient to such problems. The problem is that although that is correct, as you may notice, the applications which were built on top of the internet don’t share those characteristics,” says Graciela Chichilnisky, a professor of economics at Columbia University in New York who studies the odds of rare “black swan” events. If everyone uses the same email program, for instance, or keeps their data in the same cloud, she says, they all share a single point of vulnerability.
“It is certainly something that we need to worry about,” agrees Martyn Thomas, a British engineering expert who studies the security of large-scale IT systems.
The Heartbleed bug is a good illustration of their point. A flaw in the “heartbeat” feature of OpenSSL, an open-source cryptographic software library used by many major websites, Heartbleed was introduced in a 2012 release of the software and revealed publicly on April 7. Netcraft, a British security company, estimated that 17.5% of all secure sites were vulnerable to Heartbleed.
The architecture of the internet is also more vulnerable than generally believed, according to Thomas, who points to a February attack in which hackers exploited a weakness in the Network Time Protocol, which coordinates the timing of internet-connected devices, to mount a massive denial-of-service attack on an undisclosed site. “Someone’s got a big new cannon,” tweeted Matthew Prince, CEO of CloudFlare, an American internet security company, following the attack. “The start of ugly things to come.”
Thomas first became aware of the vulnerability of technology in the lead-up to the millennium, when he and many other IT experts feared that computers would go haywire when the year 99 turned to 00. Although the Y2K bug didn’t end up being a disaster, he argues that it was a near miss that should have taught the world a lesson about the vulnerabilities of networked technologies. “We found huge numbers of systems that would have failed had they not been fixed in the run up to that date change and this alerted me to the fact that there are these single points of failure in technical systems,” he recalls.
“I think the world ought to have taken the year 2000 as a warning event, a signal event, that we need to be careful about looking out for things that could go wrong that could cause widespread failure–either through some cascade problem or domino effect or a single point of failure–for a lot of different systems,” Thomas says.
Instead, since 2000, people have become more and more dependent on increasingly networked technology that has little redundancy built into it. “A big part of the problem is that redundancy looks like inefficiency and it gets optimized out by engineers,” he says. The result is that many systems that seem independent are in fact closely interconnected.
A 2011 study by the Royal Academy of Engineering, led by Thomas, found that the Global Positioning System (GPS), on which many data networks, financial systems, shipping and air transportation, agriculture, railways and emergency services now depend, is subject to 20 potential kinds of vulnerability, including deliberate jamming and atmospheric interference from a solar storm.
In the United Kingdom, the report concluded, many systems rely entirely on GPS with no real backup, “so a failure of the GPS signal could cause the simultaneous failure of many services that are probably expected to be independent of each other”.
Thomas says the situation in the United Kingdom has since improved somewhat, thanks to the addition of a ground-based backup navigation system and more atomic clocks, but notes that most US navigation systems still rely on GPS with no backup.
The vulnerability of GPS isn’t an isolated case. Instead, Thomas says, it’s “just one example of the danger of an increasingly integrated society that is increasingly dependent on technology.”
In addition, under pressure to keep efficiency up and development costs down, software is often written with relatively little concern for quality or stability. Software development, Thomas says, “is not a profession in any real sense and it’s not an engineering discipline in any real sense yet.” A lot of the world’s software is badly written, he says, or written in what he calls “toy languages” (a category in which he includes the C programming language), and is vulnerable to attack. Nor would these poorly written programs be easy to fix without starting from scratch: after all, Thomas notes, programmers have known for 50 years that testing can only demonstrate the presence of bugs, not their absence, so no amount of testing can certify that flawed code is now safe.
And those factors are only what former US Defense Secretary Donald Rumsfeld might call “the known unknowns”. Disasters can happen even when systems are entirely secure: analysts who have studied the risks of complex systems note that even the most closely observed and managed system can never be entirely risk-free.
Just as the flap of a butterfly’s wings in the Pacific can supposedly set off a storm in Chicago, a spilled Coke in Stuttgart might, in theory at least, stop trains in Beijing. Risk management experts have long argued that complex, tightly coupled systems almost inevitably break down.
Charles Perrow, a professor emeritus of sociology at Yale University and now a visiting professor at Stanford University, specializes in the inherent risks of complex systems. In his 1984 book Normal Accidents: Living with High-Risk Technologies, he argues that disasters in complex, tightly coupled systems are inevitable for three reasons: people make mistakes, big accidents almost always escalate from small incidents, and many disasters stem not from the technology but from organizational failures. Nor, he writes, can engineering redundancy eliminate the risk, because redundancies add complexity to the system, encourage workers to shirk responsibility, or create pressure to increase production speed.
While the basic protocol that runs the internet is arguably not tightly coupled (TCP/IP, the foundational protocol suite of what became the internet, was kept deliberately simple), the systems that have grown up on top of it are highly interconnected and do, very occasionally, suffer from glitches. In the 2010 “flash crash”, for instance, a large sell order triggered a cascade of responses from high-speed computerized trading systems that sent US stock markets down about 9% in five minutes and drove the prices of some well-known stocks as low as a penny and others as high as $100,000. Almost all of these wild swings were driven by pre-programmed responses.
The market recovered a few minutes later, but problems with other high-tech systems, such as power grids or satellite navigation, might not permit such a painless reset.
However, responding to these kinds of risks is not easy. For one thing, security is expensive. A recent study of the risks of cyber theft and cyber attacks by the World Economic Forum and McKinsey & Co. (Risk and Responsibility in a Hyperconnected World) estimated that major technology trends such as massive analytics, cloud computing and Big Data could add between $9.6 trillion and $21.6 trillion in value to the world economy. But if the sophistication of cyber attackers outruns defenders’ capabilities, new regulations and more conservative corporate policies could shave off around $3 trillion of that potential new economic value.
Even the concept of security is increasingly difficult to maintain in a networked world. The old-fashioned idea of security demanded a kind of isolation, the WEF/McKinsey analysts conclude, but it’s an approach that no longer makes economic sense. “This notion of security seems quaint in a world where it is impossible to draw a clean ring around the network of one country or one company, and where large organizations can be the target of 10,000 cyber attacks per day,” they write.
In a survey of 250 companies’ chief information security officers, McKinsey found that few believe their companies are well prepared: the typical security executive gives their company a C or C- grade on six of the seven key measures institutions use to reduce the potential for cyber attacks. Only for incident response and testing did they give themselves a C+. And most of the executives surveyed told McKinsey they don’t put their company’s most sensitive data in the cloud.
Nor are governments necessarily ready to cope. Patrick Lagadec, an expert on the sociology of crises who is a Paris-based crisis management consultant and a senior research scientist at the École Polytechnique, argues that governments, too, are unprepared for disasters they have not seen before.
Generals are often ridiculed for preparing for the last war, but Lagadec believes other government officials are no better at coping with the unknown. “Government are sometimes better for inside-the-box incidents; but they are a war behind when it comes to surprises, the unknown, the unconventional,” says Lagadec, author of Navigating the Unknown: A Practical Lifeline for Decision-Makers in the Dark.
However, Thomas and Chichilnisky both say governments could play a positive role by changing laws to catch up with the new realities of the networked world.
Thomas says that introducing more legal liability for damage caused by bad code could make software companies more careful.
When it comes to internet applications, Chichilnisky argues that there are only two ways to reduce the risk of a catastrophic internet failure: limit the use of individual applications, or strengthen anti-trust laws.
Why anti-trust? Successful internet application businesses tend to have marginal costs that are near zero, she explains, so serving one more user costs them almost nothing. That means winning companies win big, which not only concentrates network power in a few hands but also concentrates the network’s risks: if Google has a problem with Gmail, the half a billion or so people who use Gmail have a problem too.
“Along with the economic issues and inefficiencies that monopolies create, which are standard and well known, there is also now the inefficiencies and damages caused by catastrophic risks, which market concentration leads to in this case. You need more than ever legislation that protects competitive markets,” she says.