Thursday, January 29, 2015

Thoughts from Senate Testimony

Yesterday I testified before the Senate Homeland Security and Governmental Affairs Committee at a hearing on Protecting America from Cyber Attacks: The Importance of Information Sharing. I'd like to share a few thoughts about the experience. You may find these comments helpful if you are asked to testify, want to help someone testify, or want to influence the legislative process.

This was my fifth appearance at a government hearing. In 2012 I appeared before the U.S.-China Economic and Security Review Commission, and in 2013 I appeared before the Senate Armed Services Committee, the House Committee on Homeland Security, and the House Committee on Foreign Affairs.

The process starts with a request from committee staff, who asked if I would be available and willing to testify. Had I declined, they generally would not have forced me to appear; the exception would be some sort of adversarial hearing. This hearing, by contrast, was intended to educate the legislators and the public about a particular topic.

Two days prior to the hearing I had to submit written testimony, available here as a PDF. Writing this document wasn't easy. The committee staff asked me to address specific questions about adversaries and threat intelligence. I had to strike the right tone and write in a way that would be accessible to the Senators and staffers, while still conveying the right information.

I spoke in one of the conference rooms in the Dirksen Senate office building. The location is open to the public, but you have to pass through a metal detector. There was room for about 100 people in the chamber. The attendees were a mix of press, staffers, and interested citizens, along with the witnesses and our colleagues.

The hearing starts when the chairman decides to begin. Senators and staffers enter and leave as they wish. Votes were happening during the hearing, so Senators left periodically to vote. A camera, shown in the lower left of the picture above, records the event and broadcasts it to the Senators' offices. They can watch remotely, in other words. A court stenographer seated in the well creates a transcript in real time.

As you can see in the picture at left, I had to raise my right hand and swear to tell the truth before the committee. This was the first time I had to do that. Chairman Johnson said it was a committee tradition.

This was the first hearing of the new Congress, and some of the members were new to the Committee. The Chairman instructed them on the order for asking questions. Each got 5 minutes.

Witnesses had 6 minutes each for opening statements. In front of each witness is a microphone and an old-school digital timer. When you have a minute left, the light changes from green to yellow. When your time ends, the clock starts counting up from zero, and the light changes to red.

I had my statement ready to go, but the first witness ended about 2 minutes early. This set a possible expectation that the rest of us would finish early as well, so I started crossing out sections of my statement to shorten my remarks.

When I spoke, I kept to my script, but I added color for certain points based on what I heard earlier. I also emphasized a few points based on my sense of the Senators' interest level.

After all the witnesses spoke, we answered questions from the Senators. I thought they asked good questions. They tended to stick with the content of the hearing, namely information sharing. At other appearances I have fielded questions on many aspects of "cyber security." I think the legislators are making progress trying to understand the issues.

One issue I didn't mention in my statement involved the Computer Fraud and Abuse Act (CFAA). I thought of the CFAA based on reactions from the security community, mainly in blog posts and Tweets. Chairman Johnson asked what obstacles he should expect when trying to pass threat intelligence sharing legislation. I responded that there is a trust deficit in the security community. I thought that reform of the CFAA to address some of the security community's concerns would help build goodwill and reduce opposition to other security-themed legislation. I reinforced this point after the hearing when Senators Johnson and Carper spoke privately with the witnesses.

It is important to know that legislators aren't just interested in complaints about their proposals. They are much more likely to want suggested language to change the proposal. That is the best case for both parties.

Sometimes it's not possible to identify a legislative solution to a problem. Sometimes legislation is not appropriate. I made this point when I said that we didn't need greater penalties for "hacking." I think we need reformed hacking laws that are enforced. I also said that it's better for the government to focus on inherently governmental functions, like law enforcement, that are denied to the private sector.

If you have any questions, please post them here or ask via Twitter to @taosecurity.

Tuesday, January 27, 2015

How to Answer the CEO and Board Attribution Question

Elements of the Q Model of Attribution, by Thomas Rid and Ben Buchanan
Earlier today I Tweeted the following:

If you think CEOs & boards don't care about #attribution, you aren't talking to them or working w/them. The 1st question they ask is "who?"

I wrote this to convey the reality of incident response at the highest level of an organization. Those who run breached organizations want to know who is responsible for an intrusion.

As I wrote in Five Reasons Attribution Matters, your perspective on attribution changes depending on your role in the organization.

The question in the title of this blog post, however, is how does one answer the board? The board and CEO will likely ask the CIO or CISO "who." What should the response be?

My recommendation is to respond "how badly do you want to know?" Generally speaking, answering the attribution question is a function of the resources applied to the problem.

For example, I once performed an incident response for a Fortune 50 technology and retail company. They were so determined to identify the intruder that they hired former law enforcement officials, working as private investigators (PIs), to answer the question from the "physical world" perspective. In collaboration with local, federal, and foreign law enforcement officials, the PIs followed leads all the way to Romania. They performed surveillance on the suspect, interviewed his circle of associates, and eventually confirmed his involvement. Unfortunately for both the victim company and the perpetrator, the suspect disappeared. The suspect's family and friends believed that his "employer," an organized crime syndicate, decided the situation had gained too much publicity and that the suspect had become a liability.

The breached organization in my example decided to call in PIs and outside IR consultants once their annual loss rate exceeded $10 million. That was a CEO and board decision. The answer would affect how they conducted business, in myriad ways well beyond IT or information security.

Clearly not every intrusion is going to merit PIs, IR consultants, international legal cooperation, and so on. However, some cases do merit that attention, and attribution can be done.

To more fully answer the question, I strongly recommend reading Attributing Cyber Attacks by Dr Thomas Rid and Ben Buchanan. They discuss the merits of attribution and the importance of communication, as depicted in their Q model.

I know some CEOs and board members read this blog. Other readers work in different capacities. Both points of view are relevant, as mentioned in my previous blog post. I hope this post helps those in the technical world to understand the thought process of those in the nontechnical world.

Saturday, January 24, 2015

The Next Version of testmyids.com

Longtime TaoSecurity Blog readers are likely to remember me mentioning www.testmyids.com. This is a Web site that returns nothing more than

uid=0(root) gid=0(root) groups=0(root)

This content triggers a Snort intrusion detection system alert, due to the signature

alert ip any any -> any any (msg:"GPL ATTACK_RESPONSE id check returned root"; content:"uid=0|28|root|29|"; fast_pattern:only; classtype:bad-unknown; sid:2100498; rev:8;)

You can see the Web page in Firefox, and the alert in Sguil, below.


A visit to this Web site is a quick way to determine if your NSM sensor sees what you expect it to see, assuming you're running a tool that will identify the activity as suspicious. You might just want to ensure your other NSM data records the visit, as well.
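
If you want to script the check rather than use a browser, here is a minimal Python sketch (my own illustration, not something provided by the site). It fetches www.testmyids.com over plain HTTP and confirms the id-check content came back; your sensor still has to see the traffic and fire the alert, so check your console afterward.

# Minimal sketch: fetch www.testmyids.com over plain HTTP and confirm the
# id-check content. A sensor watching this traffic should then raise the
# GPL ATTACK_RESPONSE alert (sid 2100498). Assumes outbound HTTP access
# from the monitored segment.
import urllib.request

URL = "http://www.testmyids.com/"
EXPECTED = b"uid=0(root) gid=0(root) groups=0(root)"

with urllib.request.urlopen(URL, timeout=10) as response:
    body = response.read()

if EXPECTED in body:
    print("Received the id-check content; now check your alert console, e.g. Sguil.")
else:
    print("Unexpected response; the sensor may have nothing to alert on.")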

Site owner Chas Tomlin emailed me today to let me know he's adding some new features to www.testmyids.com. You can read about them in this blog post. For example, you could download a malicious .exe, or other files.

Chas asked me what other sorts of tests I might like to see on his site. I'm still thinking about it. Do you have any ideas?

Friday, January 23, 2015

Is an Alert Review Time of Less than Five Hours Enough?

This week, FireEye released a report titled The Numbers Game: How Many Alerts are too Many to Handle? FireEye hired IDC to survey "over 500 large enterprises in North America, Latin America, Europe, and Asia" and asked director-level and higher IT security practitioners a variety of questions about how they manage alerts from security tools. In my opinion, the following graphic was the most interesting:


As you can see in the far right column, 75% of respondents report reviewing critical alerts in "less than 5 hours." I'm not sure if that is really "less than 6 hours," because the next value is "6-12 hours." In any case, is it sufficient for organizations to have this level of performance for critical alerts?

In my last large enterprise job, as director of incident response for General Electric, our CIO demanded 1 hour or less for critical alerts, from time of discovery to time of threat mitigation. This means we had to do more than review the alert; we had to review it and pass it to a business unit in time for them to do something to contain the affected asset.

The strategy behind this requirement was one of fast detection and response to limit the damage posed by an intrusion. (Sound familiar?)

Also, is it sufficient to have fast response for only critical alerts? My assessment is no. Alert-centric response, which I call "matching" in The Practice of Network Security Monitoring, is only part of the operational campaign model for a high-performing CIRT. The other part is hunting.

Furthermore, it is dangerous to rely on the accuracy of alert severity ratings. A low or moderate alert may be more important than a critical alert. Who classified the alert? Who wrote it? There are many questions to answer.

I'm in the process of doing research for my PhD in the Department of War Studies at King's College London. I'm not sure if my data or research will be able to answer questions like these, but I plan to investigate them.

What do you think?


Try the Critical Stack Intel Client

You may have seen in my LinkedIn profile that I'm advising a security startup called Critical Stack. If you use Security Onion or run the Bro network security monitoring (NSM) platform, you're ready to try the Critical Stack Intel Client.

Bro is not strictly an intrusion detection system that generates alerts, like Snort. Rather, Bro generates a range of NSM data, including session data, transaction data, extracted content data, statistical data, and even alerts -- if you want them.

Bro includes an intelligence framework that facilitates integrating various sources into Bro. These sources can include more than just IP addresses. This Bro blog post explains some of the options, which include:

Intel::ADDR
Intel::URL
Intel::SOFTWARE
Intel::EMAIL
Intel::DOMAIN
Intel::USER_NAME
Intel::FILE_HASH
Intel::FILE_NAME
Intel::CERT_HASH
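
To give a sense of what the framework consumes, here is a minimal Python sketch of the tab-separated file format the Bro intelligence framework reads: a #fields header followed by one indicator per line, typed with one of the values above and tagged with a meta.source. The indicators, feed name, and file path are invented for illustration; in practice the Critical Stack Intel Client generates and manages these files for you.

# Minimal sketch of a Bro intelligence framework input file. The indicators,
# source name, and output path are made-up examples.
entries = [
    ("198.51.100.7",    "Intel::ADDR",   "example-feed", "test address"),
    ("bad.example.com", "Intel::DOMAIN", "example-feed", "test domain"),
]

with open("example.intel", "w") as f:
    # Bro expects tab-separated columns, including the #fields header line.
    f.write("#fields\tindicator\tindicator_type\tmeta.source\tmeta.desc\n")
    for indicator, indicator_type, source, desc in entries:
        f.write("\t".join((indicator, indicator_type, source, desc)) + "\n")

On a standalone Bro sensor you would then point Intel::read_files at the file and load the frameworks/intel/seen scripts so that observed indicators produce intel.log entries.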

The Critical Stack Intel Client makes it easy to subscribe to over 30 threat feeds for the Bro intelligence framework. The screen capture below shows some of the feeds:



Visit intel.criticalstack.com and follow the wizard to get started. Basically, you begin by creating a Collection, which is a container for the threat intelligence you want. Next you select the threat intelligence Feeds you want to populate your collection. Finally you create a Sensor, which is the system where you will deploy the threat intelligence Collection. When you're done, you have an API key that your client will use to access the service.

I wrote a document explaining how to move beyond the wizard and test the client on a sensor running Bro -- either Bro by itself, or as part of the Security Onion NSM distro.

The output of the Critical Stack Intel Client will be new entries in an intel.log file, stored with other Bro logs.

If Bro is completely new to you, I discuss how to get started with it in my latest book The Practice of Network Security Monitoring.

Please take a look at this new free software and let me know what you think.

Thursday, January 22, 2015

Notes on Stewart Baker Podcast with David Sanger

Yesterday Steptoe and Johnson LLP released the 50th edition of their podcast series, titled Steptoe Cyberlaw Podcast - Interview with David Sanger. Stewart Baker's discussion with New York Times reporter David Sanger (pictured at left) begins at the 20:15 mark. The interview was prompted by the NYT story NSA Breached North Korean Networks Before Sony Attack, Officials Say. I took the following notes for those of you who would like some highlights.

Sanger has reported on the national security scene for decades. When he saw President Obama's definitive statement on December 19, 2014 -- "We can confirm that North Korea engaged in this attack [on Sony Pictures Entertainment]." -- Sanger knew the President must have had solid attribution. He wanted to determine what evidence had convinced the President that the DPRK was responsible for the Sony intrusion.

Sanger knew from his reporting on the Obama presidency, including his book Confront and Conceal: Obama's Secret Wars and Surprising Use of American Power, that the President takes a cautious approach to intelligence. Upon assuming his office, the President had little experience with intelligence or cyber issues (except for worries about privacy).

Obama had two primary concerns about intelligence, involving "leaps" and "leaks." First, he feared making "leaps" from intelligence to support policy actions, such as the invasion of Iraq. Second, he worried that leaks of intelligence could "create a groundswell for action that the President doesn't want to take." An example of this second concern is the (mis)handling of the "red line" on Syrian use of chemical weapons.

In early 2009, however, the President became deeply involved with Olympic Games, reported by Sanger as the overall program for the Stuxnet operation. Obama also increased the use of drones for targeted killing. These experiences helped the President overcome some of his concerns with intelligence, but he was still likely to demand proof before taking actions.

Sanger stated in the podcast that, in his opinion, "the only way" to have solid attribution is to be inside adversary systems before an attack, such that the intelligence community can see attacks in progress. In this case, evidence from inside DPRK systems and related infrastructure (outside North Korea) convinced the President.

(I disagree that this is "the only way," but I believe it is an excellent option for performing attribution. See my 2009 post Counterintelligence Options for Digital Security for more details.)

Sanger would not be surprised if we see more leaks about what the intelligence community observed. "There's too many reporters inside the system" to ignore what's happening, he said. The NYT talks with government officials "several times per month" to discuss reporting on sensitive issues. The NYT has a "presumption to publish" stance, although Sanger held back some details in his latest story that would have enabled the DPRK or others to identify "implants in specific systems."

Regarding the purpose of announcing attribution against the DPRK, Sanger stated that deterrence against the DPRK and other actors is one motivation. Sanger reported meeting with NSA director Admiral Mike Rogers, who said the United States needs a deterrence capability in cyberspace. More importantly, the President wanted to signal to the North Koreans that they had crossed a red line. This was a destructive attack, coupled with a threat of physical harm against moviegoers. The DPRK has become comfortable using "cyber weapons" because they are more flexible than missiles or nuclear bombs. The President wanted the DPRK to learn that destructive cyber attacks would not be tolerated.

Sanger and Baker then debated the nature of deterrence, arms control, and norms. Sanger stated that it took 17 years after Hiroshima and Nagasaki before President Kennedy made a policy announcement about seeking nuclear arms control with the Soviet Union. Leading powers don't want arms control, until their advantage deteriorates. Once the Soviet Union's nuclear capability exceeded the comfort level of the United States, Kennedy pitched arms control as an option. Sanger believes the nuclear experience offers the right set of questions to ask about deterrence and arms control, although all the answers will be different. He also hopes the US moves faster on deterrence, arms control, and norms than shown by the nuclear case, because other actors (China, Russia, Iran, North Korea, etc.) are "catching up fast."

(Incidentally, Baker isn't a fan of deterrence in cyberspace. He stated that he sees deterrence through the experience of bombers in the 1920s and 1930s.)

According to Sanger, the US can't really discuss deterrence, arms control, and norms until it is willing to explain its offensive capabilities. The experience with drone strikes is illustrative, to a certain degree. However, to this day, no government official has confirmed Olympic Games.

I'd like to thank Stewart Baker for interviewing David Sanger, and I thank David Sanger for agreeing to be interviewed. I look forward to podcast 51, featuring my PhD advisor Dr Thomas Rid.

Thursday, January 15, 2015

FBI Is Part of US Intelligence Community

Are you surprised to learn that the FBI is officially part of the United States Intelligence Community? Did you know there's actually a list?

If you visit the Intelligence Community Web site at www.intelligence.gov, you can learn more about the IC. The member agencies page lists all 17 organizations.

The FBI didn't always emphasize an intelligence role. The Directorate of Intelligence appeared in 2005 and was part of the National Security Branch, as described here.

Now, as shown on the latest organizational chart, Intelligence is a peer with the National Security Branch. Each has its own Executive Assistant Director. NSB currently houses a division for Counterterrorism, a division for Counterintelligence, and a directorate for Weapons of Mass Destruction.

You may notice that there is a Cyber Division within a separate branch for "Criminal, Cyber, Response, and Services." If the Bureau continues to stay exceptionally engaged in investigating and countering cyber crime, espionage, and sabotage, we might see a separate Cyber branch at some point.

The elevation of the Bureau's intelligence function was a consequence of 9-11 and the Intelligence Reform and Terrorism Prevention Act of 2004.

If you want to read a book on the IC, Jeffrey Richelson publishes a new edition on the topic every few years. His sixth edition dates to 2011. I read an earlier edition and found the writing fairly dry.

Mark Lowenthal's book is also in its sixth edition. I was able to find my review of the fourth edition, if you want my detailed opinion.

In general these books are suitable for students and participants in the IC. Casual readers will probably not find them exciting enough. Reading them and related .gov sites will help keep you up to date on the nature and work of the IC, however.

With this information in mind, it might make more sense to some why the FBI acted both as investigator for recent intrusions and as a spokesperson for the IC.

Cass Sunstein on Red Teaming

On January 7, 2015, FBI Director James Comey spoke to the International Conference on Cyber Security at Fordham University. Part of his remarks addressed controversy over the US government's attribution of North Korea as being responsible for the digital attack on Sony Pictures Entertainment.

Near the end of his talk he noted the following:

We brought in a red team from all across the intelligence community and said, “Let’s hack at this. What else could be explaining this? What other explanations might there be? What might we be missing? What competing hypothesis might there be? Evaluate possible alternatives. What might we be missing?” And we end up in the same place.

I noticed some people in the technical security community expressing confusion about this statement. Isn't a red team a bunch of hackers who exploit vulnerabilities to demonstrate defensive flaws?

In this case, "red team" refers to a group performing the actions Director Comey outlined above. Harvard Professor and former government official Cass Sunstein explains the sort of red team mentioned by Comey in his new book Wiser: Getting Beyond Groupthink to Make Groups Smarter. In this article published by Fortune, Sunstein and co-author Reid Hastie advise the following as one of the ways to avoid group think to improve decision making:

Appoint an adversary: Red-teaming

Many groups buy into the concept of devil’s advocates, or designating one member to play a “dissenting” role. Unfortunately, evidence for the efficacy of devil’s advocates is mixed. When people know that the advocate is not sincere, the method is weak. A much better strategy involves “red-teaming.”

This is the same concept as devil’s advocacy, but amplified: In military training, red teams play an adversary role and genuinely try to defeat the primary team in a simulated mission. In another version, the red team is asked to build the strongest case against a proposal or plan. Versions of both methods are used in the military and in many government offices, including NASA’s reviews of mission plans, where the practice is sometimes called a “murder board.”

Law firms have a long-running tradition of pre-trying cases or testing arguments with the equivalent of red teams. In important cases, some law firms pay attorneys from a separate firm to develop and present a case against them. The method is especially effective in the legal world, as litigators are naturally combative and accustomed to arguing a position assigned to them by circumstance. A huge benefit of legal red teaming is that it can help clients understand the weaknesses of their side of a case, often leading to settlements that avoid the devastating costs of losing at trial.

One size does not fit all, and cost and feasibility issues matter. But in many cases, red teams are worth the investment. In the private and public sectors, a lot of expensive mistakes can be avoided with the use of red teams.

Some critics of the government's attribution statements have ignored the fact that the FBI took this important step. An article in Reuters, titled In cyberattacks such as Sony strike, Obama turns to 'name and shame', adds some color to this action:

The new [name and shame] policy has meant wresting some control of the issue from U.S. intelligence agencies, which are traditionally wary of revealing much about what they know or how they know it.

Intelligence officers initially wanted more proof of North Korea's involvement before going public, according to one person briefed on the matter. A step that helped build consensus was the creation of a team dedicated to pursuing rival theories - none of which panned out.

If you don't trust the government, you're unlikely to care that the intelligence community (which includes the FBI) red-teamed the attribution case. Nevertheless, it's important to understand the process involved. The government and IC are unlikely to release additional details, unless and until they pursue an indictment similar to last year's charges against five PLA members from Unit 61398.

Thanks to Augusto Barros for pointing me to the new "Wiser" book.

Tuesday, January 13, 2015

Does This Sound Familiar?

I read the following in the 2009 book Streetlights and Shadows: Searching for the Keys to Adaptive Decision Making by Gary Klein. It reminded me of the myriad ways operational information technology and security processes fail.

This is a long excerpt, but it is compelling.

== Begin ==

A commercial airliner isn't supposed to run out of fuel at 41,000 feet. There are too many safeguards, too many redundant systems, too many regulations and checklists. So when that happened to Captain Bob Pearson on July 23, 1983, flying a twin-engine Boeing 767 from Ottawa to Edmonton with 61 passengers, he didn't have any standard flight procedures to fall back on.

First the fuel pumps for the left engine quit. Pearson could work around that problem by turning off the pumps, figuring that gravity would feed the engine. The computer showed that he had plenty of fuel for the flight.

Then the left engine itself quit. Down to one engine, Pearson made the obvious decision to divert from Edmonton to Winnipeg, only 128 miles away. Next, the fuel pumps on the right engine went.

Shortly after that, the cockpit warning system emitted a warning sound that neither Pearson nor the first officer had ever heard before. It meant that both the engines had failed.

And then the cockpit went dark. When the engines stopped, Pearson lost all electrical power, and his advanced cockpit instruments went blank, leaving him only with a few battery-powered emergency instruments that were barely enough to land; he could read the instruments because it was still early evening.

Even if Pearson did manage to come in for a landing, he didn't have any way to slow the airplane down. The engines powered the hydraulic system that controlled the flaps used in taking off and in landing. Fortunately, the designers had provided a backup generator that used wind power from the forward momentum of the airplane.

With effort, Pearson could use this generator to manipulate some of his controls to change the direction and pitch of the airplane, but he couldn't lower the flaps and slats, activate the speed brakes, or use normal braking to slow down when landing. He couldn't use reverse thrust to slow the airplane, because the engines weren't providing any thrust. None of the procedures or flight checklists covered the situation Pearson was facing.

  Pearson, a highly experienced pilot, had been flying B-767s for only three months-almost as long as the airplane had been in the Air Canada fleet. Somehow, he had to fly the plane to Winnipeg. However, "fly" is the wrong term. The airplane wasn't flying. It was gliding, and poorly. Airliners aren't designed to glide very well-they are too heavy, their wings are too short, they can't take advantage of thermal currents. Pearson's airplane was dropping more than 20 feet per second.

Pearson guessed that the best glide ratio speed would be 220 knots, and maintained that speed in order to keep the airplane going for the longest amount of time. Maurice Quintal, the first officer, calculated that they wouldn't make it to Winnipeg. He suggested instead a former Royal Canadian Air Force base that he had used years earlier. It was only 12 miles away, in Gimli, a tiny community originally settled by Icelanders in 1875. So Pearson changed course once again.

Pearson had never been to Gimli but he accepted Quintal's advice and headed for the Gimli runway. He steered by the texture of the clouds underneath him. He would ask Winnipeg Central for corrections in his heading, turn by about the amount requested, then ask the air traffic controllers whether he had made the correct turn. Near the end of the flight he thought he spotted the Gimli runway, but Quintal corrected him.

As Pearson got closer to the runway, he knew that the airplane was coming in too high and too fast. Normally he would try to slow to 130 knots when the wheels touched down, but that was not possible now and he was likely to crash.

Luckily, Pearson was also a skilled glider pilot. (So was Chesley Sullenberger, the pilot who landed a US Airways jetliner in the Hudson River in January of 2009. We will examine the Hudson River landing in chapter 6.) Pearson drew on some techniques that aren't taught to commercial pilots. In desperation, he tried a maneuver called a sideslip, skidding the airplane forward in the way ice skaters twist their skates to skid to a stop.

He pushed the yoke to the left, as if he was going to turn, but pressed hard on the right rudder pedal to counter the turn. That kept the airplane on course toward the runway. Pearson used the ailerons and the rudder to create more drag. Pilots use this maneuver with gliders and light aircraft to produce a rapid drop in altitude and airspeed, but it had never been tried with a commercial jet. The sideslip maneuver was Pearson's only hope, and it worked.

  When the plane was only 40 feet off the ground, Pearson eased up on the controls, straightened out the airplane, and brought it in at 175 knots, almost precisely on the normal runway landing point. All the passengers and the crewmembers were safe, although a few had been injured in the scramble to exit the plane after it rolled to a stop.

The plane was repaired at Gimli and was flown out two days later. It returned to the Air Canada fleet and stayed in service another 25 years, until 2008. It was affectionately called "the Gimli Glider."

The story had a reasonably happy ending, but a mysterious beginning. How had the plane run out of fuel? Four breakdowns, four strokes of bad luck, contributed to the crisis.

Ironically, safety features built into the instruments had caused the first breakdown. The Boeing 767, like all sophisticated airplanes, monitors fuel flow very carefully. It has two parallel systems measuring fuel, just to be safe. If either channel 1 or channel 2 fails, the other serves as a backup.

However, when you have independent systems, you also have to reconcile any differences between them. Therefore, the 767 has a separate computer system to figure out which of the two systems is more trustworthy. Investigators later found that a small drop of solder in Pearson's airplane had created a partial connection in channel 2. The partial connection allowed just a small amount of current to flow-not enough for channel 2 to operate correctly, but just enough to keep the default mode from kicking in and shifting to channel 1.

The partial connection confused the computer, which gave up. This problem had been detected when the airplane had landed in Edmonton the night before. The Edmonton mechanic, Conrad Yaremko, wasn't able to diagnose what caused the fault, nor did he have a spare fuel-quantity processor. But he had figured out a workaround. If he turned channel 2 off, that circumvented the problem; channel 1 worked fine as long as the computer let it.

The airplane could fly acceptably using just one fuel-quantity processor channel. Yaremko therefore pulled the circuit breaker to channel 2 and put tape over it, marking it as inoperative. The next morning, July 23, a crew flew the plane from Edmonton to Montreal without any trouble.

The second breakdown was a Montreal mechanic's misguided attempt to fix the problem. The Montreal mechanic, Jean Ouellet, took note of the problem and, out of curiosity, decided to investigate further. Ouellet had just completed a two-month training course for the 767 but had never worked on one before. He tinkered a bit with the faulty Fuel Quantity Indicator System without success. He re-enabled channel 2; as before, the fuel gauges in the cockpit went blank. Then he got distracted by another task and failed to pull the circuit breaker for channel 2, even though he left the tape in place showing the channel as inoperative. As a result, the automatic fuel-monitoring system stopped working and the fuel gauges stayed blank.

  A third breakdown was confusion about the nature of the fuel gauge problem. When Pearson saw the blank fuel gauges and consulted a list of minimum requirements, he knew that the airplane couldn't be flown in that condition. He also knew that the 767 was still very new-it had first entered into airline service in 1982. The minimum requirements list had already been changed 55 times in the four months that Air Canada had been flying 767s. Therefore, pilots depended more on the maintenance crew to guide their judgment than on the lists and manuals.

Pearson saw that the maintenance crews had approved this airplane to keep flying despite the problem with the fuel gauges. Pearson didn't understand that the crew had approved the airplane to fly using only channel 1. In talking with the pilot who had flown the previous legs, Pearson had gotten the mistaken impression that the airplane had just flown from Edmonton to Ottawa to Montreal with blank fuel gauges. That pilot had mentioned a "fuel gauge problem." When Pearson climbed into the cockpit and saw that the fuel gauges were blank, he assumed that was the problem the previous pilot had encountered, which implied that it was somehow acceptable to continue to operate that way.

The mechanics had another way to provide the pilots with fuel information. They could use a drip-stick mechanism to measure the amount of fuel currently stored in each of the tanks, and they could manually enter that information into the computer. The computer system could then calculate, fairly accurately, how much fuel was remaining all through the flight.

In this case, the mechanics carefully determined the amount of fuel in the tanks. But they made an error when they converted that to weight. This error was the fourth breakdown.

Canada had converted to the metric system only a few years earlier, in 1979. The government had pressed Air Canada to direct Boeing to build the new 767s using metric measurements of liters and kilograms instead of gallons and pounds-the first, and at that time the only, airplane in the Air Canada fleet to use the metric system. The mechanics in Montreal weren't sure about how to make the conversion (on other airplanes the flight engineer did that job, but the 767 didn't use a flight engineer), and they got it wrong.

In using the drip-stick measurements, the mechanics plugged in the weight in pounds instead of kilograms. No one caught the error. Because of the error, everyone believed they had 22,300 kg of fuel on board, the amount needed to get them to Edmonton, but in fact they had only a little more than 10,000 kg-less than half the amount they needed.

  Pearson was understandably distressed by the thought of not being able to monitor the fuel flow directly. Still, the figures had been checked repeatedly, showing that the airplane had more fuel than was necessary. The drip test had been repeated several times, just to be sure.

That morning, the airplane had gotten approval to fly from Edmonton to Montreal despite having fuel gauges that were blank. (In this Pearson was mistaken; the airplane used channel 1 and did have working fuel gauges.) Pearson had been told that maintenance control had cleared the airplane.

The burden of proof had shifted, and Pearson would have to justify a decision to cancel this flight. On the basis of what he knew, or believed he knew, he couldn't justify that decision. Thus, he took off, and everything went well until he ran out of fuel and both his engines stopped.

== End ==

This story is an example of why one cannot build "unhackable systems." I also believe it demonstrates that operational and decision-based failures will continue to plague technology. It is no use building systems that theoretically "have no vulnerabilities" so long as people operate those systems and make decisions based on their output.

If you liked this post, I've written about engineering disasters in the past.

You can buy the book that contains this story at Amazon.com.