Security engineering: broken promises

The following draft excerpt comes from my upcoming book. Republished with permission of No Starch Press.


On the face of it, the field of information security appears to be a mature, well-defined, and accomplished branch of computer science. Resident experts eagerly assert the importance of their area of expertise by pointing to large sets of neatly cataloged security flaws, invariably attributed to security-illiterate developers, while their fellow theoreticians note how all these problems would have been prevented by adhering to this year's hottest security methodology. A commercial industry thrives in the vicinity, offering various non-binding security assurances to everyone, from casual computer users to giant international corporations.


Yet, for several decades, we have in essence completely failed to come up with even the most rudimentary, usable frameworks for understanding and assessing the security of modern software; and save for several brilliant treatises and limited-scale experiments, we do not even have any real-world success stories to share. The focus is almost exclusively on reactive, secondary security measures: vulnerability management, malware and attack detection, sandboxing, and so forth; and perhaps on selectively pointing out flaws in somebody else's code. The frustrating, jealously guarded secret is that when it comes to actually enabling others to develop secure systems, we deliver far less value than could be expected.


So, let's have a look at some of the most alluring approaches to assuring information security - and try to figure out why they fail to make a difference to regular users and businesses alike.

Flirting with formal solutions



Perhaps the most obvious and clever tool for building secure programs would be simply to algorithmically prove they behave just the right way. This is a simple premise that, intuitively, should be within the realm of possibility - so why hasn't this approach netted us much?


Well, let's start with the adjective “secure” itself: what is it supposed to convey, precisely? Security seems like a simple and intuitive concept, but in the world of computing, it escapes all attempts to usefully specify it. Sure, we can restate the problem in catchy, yet largely unhelpful ways – but you know we have a problem when one of the definitions most frequently cited by practitioners is:


“A system is secure if it behaves precisely in the manner intended – and does nothing more.”


This definition (originally attributed to Ivan Arce) is neat and vaguely outlines an abstract goal – but it says very little about how to achieve it. It could be computer science - but in terms of specificity, it could just as easily be a passage from Victor Hugo:


“Love is a portion of the soul itself, and it is of the same nature as the celestial breathing of the atmosphere of paradise.”


Now, one could argue that practitioners are not the ones to ask for nuanced definitions - but ask the same question of a group of academics, and they will deliver roughly the same answer. The following common academic definition traces back to the Bell-La Padula security model, published in the early seventies (one of about a dozen attempts to formalize the requirements for secure systems - in this particular case, in terms of a finite state machine – and one of the most notable ones):


“A system is secure if and only if it starts in a secure state and cannot enter an insecure state.”
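
To make the state-machine framing a little more concrete, here is a deliberately trivial Python sketch; the states, the single "secret file" invariant, and the transitions are all hypothetical, and this is not the actual Bell-La Padula formalism - merely an illustration of what "cannot enter an insecure state" means mechanically:

    # Toy "secure state machine": a system is a set of states plus transitions,
    # and it is deemed secure only if no reachable state violates the invariant.
    # Everything here (users, files, the invariant itself) is made up.

    SECRET_FILES = {"payroll.db"}

    def is_secure_state(state):
        """Invariant: only 'root' may hold a handle to a secret file."""
        return all(f not in SECRET_FILES
                   for user, files in state.items() if user != "root"
                   for f in files)

    def transition(state, user, action, filename):
        """One transition: a user opens or closes a file handle."""
        new_state = {u: set(files) for u, files in state.items()}
        new_state.setdefault(user, set())
        if action == "open":
            new_state[user].add(filename)
        elif action == "close":
            new_state[user].discard(filename)
        return new_state

    # "Proving" security means showing that no reachable state breaks the invariant:
    state = {"root": set(), "alice": set()}
    print(is_secure_state(state))                              # True
    state = transition(state, "alice", "open", "notes.txt")
    print(is_secure_state(state))                              # True
    state = transition(state, "alice", "open", "payroll.db")
    print(is_secure_state(state))                              # False - an insecure state was reachable

The difficulty, of course, is not in checking such a toy invariant, but in writing down an invariant that captures what "secure" should mean for a real operating system or browser.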


Definitions along these lines are fundamentally true, of course, and may serve as a basis for dissertations, perhaps a couple of government grants; but in practice, any models built on top of them are bound to be nearly useless for generalized, real-world software engineering. There are at least three reasons for this:


  • There is no way to define desirable behavior of a sufficiently complex computer system: no single authority can spell out what the “intended manner” or “secure states” are supposed to be for an operating system or a web browser. The interests of users, system owners, data providers, business process owners, and software and hardware vendors tend to differ quite significantly and shift rapidly – if all the stakeholders are able and willing to clearly and honestly disclose them to begin with. To add insult to injury, sociology and game theory suggest that computing a simple sum of these individual interests may not actually result in a satisfactory outcome; this dilemma, known as “the tragedy of the commons”, is central to many disputes over the future of the Internet.



  • Wishful thinking does not automatically map to formal constraints: even if a perfect high-level agreement on how the system should behave can be reached in a subset of cases, it is nearly impossible to formalize many expectations as a set of permissible inputs, program states, and state transitions – a prerequisite for almost every type of formal analysis. Quite simply, intuitive concepts such as “I do not want my mail to be read by others” do not translate to mathematical models particularly well - and vice versa. Several exotic approaches exist that let such vague requirements be at least partly formalized, but they put heavy constraints on software engineering processes and often result in rulesets and models far more complicated than the validated algorithms themselves – which, in turn, are likely to need their own correctness proven... yup, recursively.



  • Software behavior is very hard to conclusively analyze: statically analyzing computer programs to prove that they will always behave in accordance with a detailed specification is a task that nobody has managed to believably demonstrate in complex, real-world scenarios (although, as usual, limited success in highly constrained settings or with very narrow goals is possible). Many cases are likely to be impossible to solve in practice (due to computational complexity) – and some may even turn out to be completely undecidable because of the halting problem (see the sketch below).
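
For that last point, the classical argument is worth spelling out. The sketch below assumes a hypothetical, perfect static analyzer reaches_forbidden_call() that always terminates and always answers correctly; feeding it the contrarian program leads to a contradiction, which is why no such analyzer can exist for arbitrary code. This is the standard halting-problem diagonalization, not a working tool:

    # Hypothetical: a perfect, always-terminating analyzer that decides whether
    # calling func() ever executes forbidden(). No such analyzer can exist for
    # arbitrary programs; the stub below only marks the assumption.

    def forbidden():
        """Stand-in for whatever behavior the security policy rules out."""

    def reaches_forbidden_call(func):
        """Assumed perfect static analyzer - deliberately left unimplemented."""
        raise NotImplementedError("no total, exact analyzer can exist")

    def contrarian():
        # Does the opposite of whatever the analyzer predicts about it:
        if reaches_forbidden_call(contrarian):
            return            # predicted to misbehave, so it behaves
        forbidden()           # predicted to behave, so it misbehaves

    # If reaches_forbidden_call(contrarian) returned True, contrarian would never
    # call forbidden(); if it returned False, contrarian would call it. Either
    # answer is wrong, so the assumed analyzer cannot exist.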



Perhaps more frustrating than the vagueness and uselessness of these early definitions is that, as decades fly by, little or no progress has been made on coming up with something better; in fact, a fairly recent academic paper released in 2001 by the Naval Research Laboratory backtracks some of the earlier work and arrives at a much more casual, enumerative definition of software security – one that is explicitly acknowledged to be imperfect and incomplete:


“A system is secure if it adequately protects information that it processes against unauthorized disclosure, unauthorized modification, and unauthorized withholding (also called denial of service). We say 'adequately' because no practical system can achieve these goals without qualification; security is inherently relative.”


The paper also provides a retrospective assessment of earlier efforts, and the unacceptable sacrifices made to preserve the theoretical purity of said models:


“Experience has shown that, on one hand, the axioms of the Bell-La Padula model are overly restrictive: they disallow operations that users require in practical applications. On the other hand, trusted subjects, which are the mechanism provided to overcome some of these restrictions, are not restricted enough. [...] Consequently, developers have had to develop ad hoc specifications for the desired behavior of trusted processes in each individual system.”


In the end, regardless of the number of elegant, competing models introduced, all attempts to understand and evaluate the security of real-world software using algorithmic foundations seem bound to fail. This leaves developers and security experts with no method to make authoritative statements about the quality of produced code. So, what are we left with?

Risk management



In the absence of formal assurances and provable metrics, and given the frightening prevalence of security flaws in key software relied upon by modern societies, businesses flock to another catchy concept: risk management. The idea, applied successfully in the insurance business (as of this writing, with perhaps a bit less to show for it in the financial world), simply states that system owners should learn to live with vulnerabilities that would not be cost-effective to address, and divert resources to cases where the odds are less acceptable, as indicated by the following formula:


risk = probability of an event * maximum loss


The doctrine says that if having some unimportant workstation compromised every year is not going to cost the company more than $1,000 in lost productivity, maybe they should just budget this much and move on – rather than spending $10,000 on additional security measures or contingency and monitoring plans. The money would be better allocated to isolating, securing, and monitoring that mission-critical mainframe that churns billing records for all customers instead.
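
The arithmetic behind that reasoning is trivial; a minimal sketch, using the same illustrative figures as above:

    # By-the-numbers risk management for the workstation example (figures are
    # the illustrative ones from the text, not real data).
    probability_per_year = 1.0        # "compromised every year"
    maximum_loss = 1_000              # lost productivity per incident, in dollars
    mitigation_cost = 10_000          # proposed additional security measures

    risk = probability_per_year * maximum_loss
    print(f"expected annual loss: ${risk:,.0f}")                 # $1,000
    print(f"mitigation pays off: {mitigation_cost < risk}")      # False - budget the loss instead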


Prioritization of security efforts is a prudent step, naturally. The problem is that when risk management is done strictly by the numbers, it does deceptively little to actually understand, contain, and manage real-world problems. Instead, it introduces a dangerous fallacy: that structured inadequacy is almost as good as adequacy, and that underfunded security efforts plus risk management are about as good as properly funded security work.


Guess what? No dice:



  • In interconnected systems, losses are not capped, and not tied to an asset: strict risk management depends on the ability to estimate the typical and maximum cost associated with the compromise of a resource. Unfortunately, the only way to do so is to overlook the fact that many of the most spectacular security breaches in history started at relatively unimportant and neglected entry points, followed by complex access escalation paths, eventually resulting in the near-complete compromise of critical infrastructure (regardless of any superficial compartmentalization in place). In by-the-numbers risk management, the initial entry point would realistically be assigned a low weight, as having little value compared to other nodes; and the internal escalation path to more sensitive resources would likewise be downplayed as unlikely ever to be abused.



  • Statistical forecasting does not tell you much about your individual risks: the fact that, on average, people in the city are more likely to be hit by lightning than mauled by a bear does not mean you should bolt a lightning rod to your hat but then bathe in honey. The likelihood of a compromise associated with a particular component is, on an individual scale, largely irrelevant: security incidents are nearly certain, but out of thousands of exposed, non-trivial resources, any one could be used as an attack vector, and none of them is likely to see a volume of events that would make statistical analysis meaningful within the scope of the enterprise (see the sketch after this list).



  • Security is simply not a sound insurance scheme: an insurance company can use statistical data to offset capped claims that might need to be paid across a large, well-studied populace, using the premiums collected from every participant; and to estimate reserves needed to deal with random events, such as sudden, localized surges in the number of claims, up to a chosen level of event probability. In such a setting, formal risk management works pretty well. In contrast, in information security, there is no meaningful way to measure how dangerous your current practices may be; no way to detect and estimate the impact of breaches when they occur in order to build a baseline; and no way to cleanly offset the costs of a breach with the value contributed by healthy assets.
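
The second point deserves a number or two. Here is a minimal sketch with hypothetical figures: even if each individual asset has a tiny annual probability of compromise, an enterprise with a couple of thousand exposed assets is all but guaranteed to see a breach somewhere, while no single asset ever accumulates enough incidents for per-asset statistics to be meaningful:

    # Hypothetical figures illustrating the forecasting problem: a breach
    # *somewhere* is nearly certain, yet per-asset statistics stay meaningless.
    n_assets = 2_000         # exposed, non-trivial resources in the enterprise
    p_per_asset = 0.002      # assumed annual compromise probability of one asset

    p_any_breach = 1 - (1 - p_per_asset) ** n_assets
    print(f"P(at least one breach this year)  = {p_any_breach:.3f}")   # ~0.982
    print(f"expected incidents per asset/year = {p_per_asset}")        # 0.002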


Enlightenment through taxonomy



The two schools of thought discussed previously have something in common – both assume that it is possible to define security as a set of computable goals, and that the resulting unified theory of a secure system or a model of acceptable risk would then elegantly trickle down, resulting in an optimal set of low-level actions needed to achieve perfection in application design.


There is also the opposite approach, preached by some practitioners – owing less to philosophy and more to the natural sciences: that much like Charles Darwin back in the day, by gathering sufficient amounts of low-level, experimental data, we would be able to observe, reconstruct, and document increasingly sophisticated laws, until some sort of a unified model of secure computing is organically arrived at.


This latter world view brings us projects like the Department of Homeland Security-funded Common Weakness Enumeration (CWE). In the organization's own words, the goal of CWE is to develop a unified “Vulnerability Theory”; to “improve the research, modeling, and classification of software flaws”; and to “provide a common language of discourse for discussing, finding and dealing with the causes of software security vulnerabilities”. A typical, delightfully baroque example of the resulting taxonomy may be:


Improper Enforcement of Message or Data Structure → Failure to Sanitize Data into a Different Plane → Improper Control of Resource Identifiers → Insufficient Filtering of File and Other Resource Names for Executable Content.


Today, there are about 800 names in this dictionary; most of them as discourse-enabling as the one quoted here.


A slightly different school of naturalist thought is manifested in projects such as the Common Vulnerability Scoring System (CVSS), a business-backed collaboration aiming to strictly quantify known security problems in terms of a set of basic, machine-readable parameters. A real-world example of the resulting vulnerability descriptor may be:


AV:LN / AC:L / Au:M / C:C / I:N / A:P / E:F / RL:T / RC:UR / CDP:MH / TD:H / CR:M / IR:L / AR:M


Given this 14-dimensional vector, organizations and researchers are expected to transform it in a carefully chosen, use-specific manner – and arrive at some sort of an objective, verifiable conclusion about the significance of the underlying bug (say, “42”), precluding the need to judge the nature of security flaws more subjectively.
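
Mechanically, "transforming" such a vector amounts to little more than parsing it into metric/value pairs and applying some use-specific weighting. The short sketch below does exactly that with the vector quoted above (compacted) and entirely arbitrary, made-up weights; it is not the official CVSS scoring equation:

    # Parse the CVSS-style vector quoted above and apply a made-up,
    # organization-specific weighting. NOT the official CVSS equations.
    vector = "AV:LN/AC:L/Au:M/C:C/I:N/A:P/E:F/RL:T/RC:UR/CDP:MH/TD:H/CR:M/IR:L/AR:M"

    metrics = dict(part.split(":") for part in vector.split("/"))

    # Arbitrary weights for the three impact metrics (complete/partial/none):
    impact_weight = {"C": 0.6, "P": 0.3, "N": 0.0}
    score = 10 * (impact_weight[metrics["C"]] +
                  impact_weight[metrics["I"]] +
                  impact_weight[metrics["A"]]) / 3

    print(metrics)
    print(f"in-house severity score: {score:.1f}")   # 3.0 with these weights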


I may be poking gentle fun at their expense - but rest assured, I do not mean to belittle CWE or CVSS: both projects serve noble goals, most notably giving a more formal dimension to the risk management strategies implemented by large organizations (any general criticisms of certain approaches to risk management aside). Having said that, neither of them has yielded a grand theory of secure software yet - and I doubt such a framework is within sight.


[...end of excerpt...]


Vulnerability databases and pie charts don't mix

There are quite a few extensive vulnerability databases in existence today. While their value in the field of vulnerability management is clear and uncontroversial, a relatively new usage pattern can also be seen: the data is being incorporated into high-level analyses addressed predominantly to executive audiences and the media to provide insight into the state of the security industry; threat reports from IBM and Symantec are good examples of this. Which vendor is the most responsive? Who has the highest number of high-risk vulnerabilities? These and many other questions are just begging to be objectively answered with a clean-looking pie chart.


Vulnerability researchers - the people behind the data points used - are usually fairly skeptical of such efforts; but their criticisms revolve primarily around the need to factor in bug severity, or the potential for cherry-picking the data to support a particular claim. These flaws are avoidable in a well-designed study. Are we good, then?


Well, not necessarily so. The most important problem is that today, for quite a few software projects, the majority of vulnerabilities are discovered through in-house testing - and the attitudes of vendors when it comes to discussing these findings publicly tend to vary. This has a completely devastating impact on the value of the analyzed data: vulnerability counting severely penalizes forthcoming players, benefits the more secretive ones, and places the ones who do not do any proactive work somewhere in between.


Consider this example from the browser world: in recent years, the folks over at Microsoft started doing a lot of in-house fuzzing, and have undoubtedly uncovered hundreds of security flaws in Internet Explorer and elsewhere. It appears to be their preference not to routinely discuss these problems, however - often silently targeting fixes for service packs or other cumulative updates instead. In fact, here's an anecdote: I reported a bunch of exploitable crashes to them in September 2009, only to see them fixed without attribution in December of that year. The underlying flaws were apparently discovered independently during internal cleanups. So be it: as long as bugs get fixed, we all benefit, and Microsoft is definitely working hard in this area.


Contrast this approach with Mozilla, another vendor doing a lot of successful in-house security testing (in part thanks to the amazing work of Jesse Ruderman). They are pretty forthcoming about their results, and announce internal, fuzzing-related fixes almost every month. Probably to avoid shooting themselves in the foot in vulnerability count tallies, however, they tend to report these cumulatively as crashes with evidence of memory corruption - and usually assign a single CVE number to each monthly batch. Again, sounds good.


Lastly, have a look at Chromium; several folks are fuzzing the hell out of this browser, too - but the project opts to track these issues individually, partly because of the need to coordinate with WebKit developers - and each one of them ends up with a separate CVE entry. The result? Release notes often look like this.
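
To see how badly raw counts reward secrecy, consider a toy tally of the three reporting conventions just described; every number below is invented purely for illustration:

    # Invented numbers only: three vendors each fix the same count of
    # internally-found bugs, but follow different disclosure conventions.
    internally_fixed_bugs = 40

    cve_entries = {
        "silent fixes, no advisories":      0,    # folded into cumulative updates
        "bundled 'memory corruption' CVEs": 12,   # roughly one CVE per monthly batch
        "one CVE per individual bug":       40,   # every issue tracked separately
    }

    for policy, count in cve_entries.items():
        print(f"{policy:36} -> {count:3} CVEs for {internally_fixed_bugs} fixed bugs")

Counted by CVE entries alone, the most transparent project looks like the buggiest one, and the vendor that says nothing looks flawless.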


All these approaches have their merits - but how do you reconcile them for the purpose of vulnerability counting? And, is it fair to compare any of the above players with vendors who do not seem to be doing any proactive security work at all?


Well, perhaps the browser world is special; one could argue that at least some products with matching security practices must exist - and these cases should be directly comparable. Maybe, but the other problem is the quality of the databases themselves: recent changes to the vulnerability handling process, including the emergence of partial- or non-disclosure, the popularity of vulnerability trading, and the demise of centralized vulnerability discussion channels, all make it prohibitively difficult for database maintainers to reliably track issues through their lifetime. Common problems include:


  • The inability to fully understand what the problem actually is, and what severity it needs to be given. Database maintainers cannot be expected to be intimately familiar with every product, and need to process thousands of entries every year - but this often leads to vulnerability notes that may at first sight appear inaccurate, hard to verify, or very likely not worth classifying as security flaws at all.


  • The difficulty of discovering what the disclosure process looked like, and how long the vendor needed to develop a patch. This is perhaps the most important metric to examine when trying to understand the performance of a vendor - yet one that is not captured, or captured very selectively and inconsistently, in most of the databases I am aware of.


  • The difficulty of detecting the moment when a particular flaw is addressed - all the databases contain a considerable number of entries that were not updated to reflect patch status (apologies for Chrome-specific examples). There seems to be a correlation between the prevalence of this problem and the mode in which vendor responses are made available to the general public. Furthermore, when a problem is not fixed in a timely manner, the maintainers of the database generally do not reach out to the vendor to investigate why: is the researcher's claim contested, or is the vendor simply sloppy? This very important distinction is lost.


Comparable problems apply to most other security-themed studies that draw far-fetched conclusions from simple numerical analysis of proprietary data. Pie charts don't immediately invalidate a whitepaper, but blind reliance on these figures warrants a closer investigation of the claims.