Cookies were devised by Lou Montulli, a Netscape engineer, sometime in 1994. Lou outlined his original design in a minimalistic, four-page proposal posted on netscape.com; based on that specification, the implementation shipped in their browser several months later - and other vendors were quick to follow.
It wasn't until 1997 that the first reasonably detailed specification of the mechanism was attempted: RFC 2109. The document captured some of the status quo - but confusingly, also tried to tweak the design, an effort that proved to be completely unsuccessful; for example, contrary to what is implied by this RFC, most browsers do not support multiple comma-delimited NAME=VALUE pairs in a single Set-Cookie header; do not recognize quoted-string cookie values; and do not use max-age to determine cookie lifetime.
Three years later, another, somewhat better structured effort to redesign cookies - RFC 2965 - proved to be equally futile. Meanwhile, browser vendors tweaked or extended the scheme in their own ways: for example, around 2002, Microsoft unilaterally proposed httponly cookies as a security mechanism to slightly mitigate the impact of cross-site scripting flaws - a concept quickly, if prematurely, embraced by the security community.
All these moves led to a very interesting situation: there is simply no accurate, official account of cookie behavior in modern browsers; the two relevant RFCs, often cited by people arguing on the Internet, are completely out of touch with reality. This forces developers to discover compatible behaviors by trial and error - and makes it an exciting gamble to build security systems around cookies in the first place.
In any case - well-documented or not, cookies emerged as the canonical solution to an increasingly pressing problem of session management; and as web applications have grown more complex and more sensitive, the humble cookie took the world by storm. With it came a flurry of fascinating security flaws.
They have Internet over there, too?
Perhaps the most striking issue - and an early sign of trouble - is the problem of domain scoping.
Unlike the more pragmatic approach employed for JavaScript DOM access, cookies can be set for any domain of which the setter is a member - say, foo.example.com is meant to be able to set a cookie for *.example.com. On the other hand, allowing example1.com to set cookies for example2.com is clearly undesirable, as it allows a variety of sneaky attacks: denial of service at best, and altering site preferences, modifying carts, or stealing personal data at worst.
To that effect, the RFC provided this elegant but blissfully naive advice:
"Only hosts within the specified domain can set a cookie for a domain and domains must have at least two (2) or three (3) periods in them to prevent domains of the form: ".com", ".edu", and "va.us". Any domain that fails within one of the seven special top level domains listed below only require two periods. Any other domain requires at least three. The seven special top level domains are: "COM", "EDU", "NET", "ORG", "GOV", "MIL", and "INT"."
Regrettably, there are at least three glaring problems with this scheme - two of which should have been obvious right away:
- Some country-level registrars indeed mirror the top-level hierarchy (e.g., example.co.uk), in which case the three-period rule makes sense; but many others allow direct registrations (e.g., example.fr), or permit both approaches to coexist (say, example.jp and example.co.jp). In the end, the three-period rule managed to break cookies in a significant number of ccTLDs - and consequently, most implementations (Netscape included) largely disregarded the advice. Yup, that's right - as a result, you could set cookies for *.com.pl.
- The RFC missed the fact that websites are reachable by means other than their canonical DNS names; in particular, the rule permitted a website at http://1.2.3.4/ to set cookies for *.3.4, or a website at http://example.com.pl./ to set a cookie for *.com.pl.
- To add insult to injury, the Internet Assigned Numbers Authority eventually decided to roll out a wide range of new top-level domains, such as .biz, .info, or .jobs - and is now attempting to allow arbitrary gTLD registrations. This last step promises to be yet another nail in the coffin of sane cookie management implementations.
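To see how brittle the period-counting advice is, here is a minimal Python sketch of the rule exactly as quoted. It is an illustration only, not code from any browser: the function name naive_domain_allowed and the decision to count periods in the attribute as written (leading dot included) are my assumptions.

```python
# Hypothetical sketch of the period-counting rule quoted above.
# Not taken from any browser; names and counting details are assumptions.

SPECIAL_TLDS = {"com", "edu", "net", "org", "gov", "mil", "int"}


def naive_domain_allowed(cookie_domain: str) -> bool:
    """Return True if the quoted rule would accept this domain= value."""
    # Count periods in the attribute as written, leading dot included.
    periods = cookie_domain.count(".")
    # Ignore empty labels produced by leading or trailing dots.
    labels = [label for label in cookie_domain.lower().split(".") if label]
    tld = labels[-1] if labels else ""
    required = 2 if tld in SPECIAL_TLDS else 3
    return periods >= required


if __name__ == "__main__":
    for domain in (".example.com", ".com", ".example.co.uk",
                   ".example.fr", ".com.pl", ".com.pl."):
        print(domain, naive_domain_allowed(domain))
```

Under these assumptions, .example.fr is rejected even though it is a perfectly legitimate registration, while .com.pl. (note the trailing dot) sails through because the extra period satisfies the count.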
Net effect? All mainstream browsers had a history of embarrassing bugs in this area - and now ship with giant, hairy, and frequently updated lists of real-world "public suffix" domains for which cookies should not be set - as well as an array of checks to exclude non-FQDNs, IPs, and pathological DNS notations of all sorts.
8K ought to be enough for anybody
To make denial-of-service attacks a bit harder, it is well-understood that most web servers limit the size of a request they are willing to process; these limits are very modest - for example, Apache rejects request headers over 8 kB, while IIS draws the line at 16 kB. This is perfectly fine under normal operating conditions - but can be easily exceeded when the browser is attempting to construct a request with a lot of previously set cookies attached.
The specification neglected this possibility, offered no warning to implementers, and proposed no discovery and resolution algorithm. In fact, it mandated minimal jar size requirements well in excess of the limits enforced by HTTP servers:
"In general, user agents' cookie support should have no fixed limits. They should strive to store as many frequently-used cookies as possible. Furthermore, general-use user agents should provide each of the following minimum capabilities [...]:
* at least 300 cookies
* at least 4096 bytes per cookie (as measured by the size of the characters that comprise the cookie non-terminal in the syntax description of the Set-Cookie header)
* at least 20 cookies per unique host or domain name"
As should be apparent, the suggested minimum - 20 cookies of 4096 bytes each - allows HTTP request headers to balloon up to the 80 kB boundary.
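The arithmetic is easy to check. The sketch below uses only the figures cited above (the RFC minimums and the 8 kB and 16 kB server limits); the helper names are mine, and the calculation deliberately ignores separators and all other request headers, which only make matters worse.

```python
# Back-of-the-envelope check using the figures quoted in the text.
COOKIES_PER_HOST = 20             # RFC minimum per host or domain
BYTES_PER_COOKIE = 4096           # RFC minimum per cookie
APACHE_HEADER_LIMIT = 8 * 1024    # limit cited above for Apache
IIS_HEADER_LIMIT = 16 * 1024      # limit cited above for IIS


def worst_case_cookie_bytes(cookies: int = COOKIES_PER_HOST,
                            size: int = BYTES_PER_COOKIE) -> int:
    """Total cookie payload if every permitted cookie is at maximum size."""
    return cookies * size


def cookies_to_exceed(limit: int, size: int = BYTES_PER_COOKIE) -> int:
    """Smallest number of maximum-size cookies whose total exceeds limit."""
    return limit // size + 1


if __name__ == "__main__":
    print(worst_case_cookie_bytes() // 1024, "kB of cookies per host")
    print(cookies_to_exceed(APACHE_HEADER_LIMIT), "cookies to exceed 8 kB")
    print(cookies_to_exceed(IIS_HEADER_LIMIT), "cookies to exceed 16 kB")
```

In other words, a handful of maximum-size cookies is enough to push a request past the server limit - well short of the 20 that browsers are asked to accept.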
Does this matter from the security perspective? At first sight, no - but this is only until you realize that there are quite a few popular sites that rely on user-name.example.com content compartmentalization; and that any malicious user can set top-level cookies to prevent the visitor from ever being able to access any *.example.com site again.
The only recourse domain owners have in this case is to request their site to be added to the aforementioned public suffix list; there are quite a few entries along these lines there already, including operaunite.com or appspot.com - but this approach obviously does not scale particularly well. The list is also not supported by all existing browsers, and not mandated in any way for new implementations.
"Oh, please. Nobody is actually going to depend on them."
In the RFC 2109 paragraph cited earlier, the specification pragmatically acknowledged that implementations will be forced to limit cookie jar sizes - and then, confusingly, demanded that no fixed limits be put in place, yet specified minimum limits that should be obeyed by implementers.
What proved to be missing is any advice on a robust jar pruning algorithm, or even a brief discussion of the security considerations associated with this process; any implementation that enforces the recommended minimums - 300 cookies globally, 20 cookies per unique host name - is clearly vulnerable to a trivial denial-of-service attack: the attacker may use wildcard DNS entries (a.example.com, b.example.com, ...), or even just a couple of throw-away domains, to exhaust the global limit, and have all sensitive cookies purged - kicking the user out of any web applications he is currently logged into. Whoops.
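The attack is simple enough to model in a few lines. The toy jar below assumes a global cap of 300 cookies and oldest-first eviction; both the class FifoCookieJar and the host names are hypothetical, and real browsers use their own limits and pruning policies.

```python
# Toy model of a cookie jar with a global cap and FIFO eviction.
# The eviction policy and all names here are assumptions for illustration.
from collections import OrderedDict


class FifoCookieJar:
    def __init__(self, max_cookies: int = 300):
        self.max_cookies = max_cookies
        # (domain, name) -> value, kept in order of first insertion.
        self._cookies = OrderedDict()

    def set(self, domain: str, name: str, value: str) -> None:
        self._cookies[(domain, name)] = value
        # Drop the oldest entries once the global cap is exceeded.
        while len(self._cookies) > self.max_cookies:
            self._cookies.popitem(last=False)

    def get(self, domain: str, name: str):
        return self._cookies.get((domain, name))


def flood(jar: FifoCookieJar, count: int) -> None:
    """Set one junk cookie on each of `count` attacker-controlled hosts."""
    for i in range(count):
        jar.set(f"h{i}.attacker.example", "junk", "x")


if __name__ == "__main__":
    jar = FifoCookieJar()
    jar.set("bank.example", "SESSION", "secret")
    flood(jar, 300)
    print(jar.get("bank.example", "SESSION"))  # the session cookie is gone
```

With a wildcard DNS entry, generating the 300 host names costs the attacker nothing.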
It is worth noting that given proper warning, browser vendors would not find it significantly more complicated to structure the limits differently, enforce them on functional domain level, or implement pruning strategies other than FIFO (e.g., taking cookie use counters into account). Convincing them to make these changes now is more difficult.
While the ability to trash your cookie jar is perhaps not a big deal - or rather, the ability for sites to behave disruptively is also poorly mitigated on HTML or JavaScript level, making this a boring topic - the weakness has special consequences in certain contexts; see the next section for more.
Be my very special cookie
Two special types of HTTP cookies are supported by all contemporary web browsers: secure, sent only on HTTPS navigation (protecting the cookie from being leaked to or interfered with by rogue proxies); and httponly, exposed only to HTTP servers, but not visible to JavaScript (protecting the cookie against cross-site scripting flaws).
Although these ideas appear to be straightforward, the way they were specified implicitly allowed a number of unintended possibilities - all of which, predictably, plagued web browsers through the years. Consider the following questions:
- Should JavaScript be able to set httponly cookies via document.cookie?
- Should non-encrypted pages be able to set secure cookies?
- Should browsers hide jar-stored httponly cookies from APIs offered to plugins such as Flash or Java?
- Should browsers hide httponly Set-Cookie headers in server responses shared with XMLHttpRequest, Flash, or Java?
- Should it be possible to drop httponly or secure cookies by overflowing the "plain" cookie jar in the same domain, then replace them with vanilla lookalikes?
- Should it be possible to drop httponly or secure cookies by setting tons of httponly or secure cookies in other domains?
All of this is formally permitted - and some of the aforementioned problems are prevalent to this day, and likely will not be fixed any time soon.
At first sight, the list may appear inconsequential - but these weaknesses have profound consequences for web application design in certain environments. One striking example is rolling out HTTPS-only services that are intended to withstand rogue, active attackers on open wireless networks: if secure cookies can be injected on easy-to-intercept HTTP pages, it suddenly gets a whole lot harder.
If it tastes good, who cares where it comes from?
Cookies diverge from the JavaScript same-origin model in two fairly important and inexplicable ways:
- domain= scoping is significantly more relaxed than SOP, paying no attention to protocol, port number, or exact host name. This undermines the SOP-derived security model in many compartmentalized applications that also use cookie authentication. The approach also makes it unclear how to handle document.cookie access from non-HTTP URLs - historically leading to quite a few fascinating browser bugs (set location.host while on a data: page and profit!).
- path= scoping is considerably stricter than what's offered by SOP - and therefore, it is completely useless from the security standpoint: any page in the same origin, whatever its path, can script a document that does see the cookie. Web developers misled by this mechanism often mistakenly rely on it for security compartmentalization; heck, even reputable security consultants get it completely wrong.
On top of this somewhat odd scoping scheme, conflict resolution is essentially ignored in the specification; every cookie is identified by a name-domain-path tuple, allowing identically named but differently scoped cookies to coexist and apply to the same request - but the standard fails to provide servers with any metadata to assist in resolving such conflicts, and does not even mandate any particular ordering of such cookies.
This omission adds another interesting twist to the httponly and secure cookie cases; consider these two cookies:
Set on https://www.example.com/:
FOO=legitimate_value; secure; domain=www.example.com; path=/
Set on http://www.example.com/:
FOO=injected_over_http; domain=.example.com; path=/
The two cookies are considered distinct, so any browser-level mechanism that limits an attacker's ability to clobber secure cookies will not kick in. Instead, the server will at best receive both FOO values in a single Cookie header, their ordering dependent on the browser and essentially unpredictable (and at worst, the cookies will get clobbered - a problem in Internet Explorer). What next?
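To make the coexistence concrete, here is a minimal Python sketch of a jar keyed by the name-domain-path tuple, fed with the two cookies above. The helpers set_cookie, domain_matches, and cookie_header are hypothetical, the matching logic is heavily simplified, and the insertion-order output is just one arbitrary choice - which is precisely the point.

```python
# Simplified model of a cookie jar keyed by (name, domain, path).
# All helper names and the matching rules are assumptions for illustration.


def set_cookie(jar: dict, name: str, value: str, domain: str,
               path: str = "/", secure: bool = False) -> None:
    """Store a cookie; only an identical (name, domain, path) overwrites."""
    jar[(name, domain, path)] = (value, secure)


def domain_matches(host: str, cookie_domain: str) -> bool:
    """True if host equals the cookie domain or is a subdomain of it."""
    bare = cookie_domain.lstrip(".")
    return host == bare or host.endswith("." + bare)


def cookie_header(jar: dict, scheme: str, host: str, path: str) -> str:
    """Build the Cookie header value for a request, in insertion order."""
    parts = []
    for (name, domain, cookie_path), (value, secure) in jar.items():
        if secure and scheme != "https":
            continue  # secure cookies are withheld from plain HTTP
        if not domain_matches(host, domain):
            continue
        if not path.startswith(cookie_path):
            continue
        parts.append(f"{name}={value}")
    return "; ".join(parts)


if __name__ == "__main__":
    jar = {}
    set_cookie(jar, "FOO", "legitimate_value", "www.example.com", "/", secure=True)
    set_cookie(jar, "FOO", "injected_over_http", ".example.com", "/")
    print(cookie_header(jar, "https", "www.example.com", "/"))
```

The server ends up with two FOO values and nothing in the header to tell them apart; an application that simply picks the first or the last one is at the mercy of the browser's ordering.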
Character set murder mystery
The HTTP/1.0 RFC technically allowed high-bit characters in HTTP headers without further qualification; the HTTP/1.1 RFC later disallowed them. Neither of these documents provided any guidance on how such characters should be handled when encountered, though: rejected, transcoded to 7-bit, treated as ISO-8859-1, as UTF-8, or perhaps treated in some other way.
The specification for cookies further aggravated this problem, cryptically stating:
"If there is a need to place such data in the name or value, some encoding method such as URL style %XX encoding is recommended, though no encoding is defined or required."
There is an obvious problem with saying that you can use certain characters, but that their meaning is undefined; the systemic neglect of this topic has profound consequences in two common cases where user-controlled values frequently appear in HTTP headers: Content-Disposition is one (eventually "solved" with browser-specific escaping schemes); another is, of course, the Cookie header.
As can be expected, based on such poor advice, implementers ended up with the least sensible approach; for example, I have a two-year-old bug with Mozilla (418394): the problem is that Firefox has a tendency to mangle high-bit values in HTTP cookies, permitting cookie separators (";") to suddenly materialize in place of UTF-8 in the middle of an otherwise sanitized cookie value; this led to more than one web application vulnerability to date.
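For a feel of how a separator can materialize out of thin air, consider the Python sketch below. It assumes one plausible failure mode - each character being truncated to its low 8 bits - which is my assumption for illustration and not a description taken from the bug report; the function names are hypothetical as well.

```python
# Illustration of an ASSUMED mangling mode: truncating each code point to
# its low 8 bits. This is not a reproduction of the Firefox bug itself.


def naive_sanitize(value: str) -> str:
    """Hypothetical server-side filter that strips cookie separators."""
    return value.replace(";", "")


def truncate_to_low_byte(value: str) -> str:
    """Keep only the low 8 bits of every code point."""
    return "".join(chr(ord(ch) & 0xFF) for ch in value)


if __name__ == "__main__":
    # U+013B has 0x3B, the ASCII code for ";", as its low byte.
    user_input = "abc\u013badmin=1"
    sanitized = naive_sanitize(user_input)   # nothing to strip yet
    print(truncate_to_low_byte(sanitized))   # a ";" appears after "abc"
```

The value passes the filter untouched, because at that point it contains no semicolon; the separator only comes into existence after the mangling step, downstream of any sanitization the application performed.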
A session is forever
The last problem I want to mention in this post is far less pressing - but is also an interesting testament to the shortcomings of the original design.
For some reason, presumably due to privacy concerns, the specification decided to distinguish between session cookies, meant to be non-persistent; and cookies with a specified expiration date, which may persist across browser sessions, are stored on the disk, and may be subject to additional client-enforced restrictions. On the topic of the longevity of the former class of cookies, the RFC conveniently says:
"Each session is relatively short-lived."
Today, however, this is obviously not true, and the distinction feels misguided: with the emergence of portable computers with suspend functionality, and the increased shift toward web-oriented computing, users tend to keep browsers open for weeks or months at a time; session cookies may also be stored and then recovered across auto-updates or software crashes, allowing them to live almost indefinitely.
When session cookies routinely persist longer than many definite-expiry ones, and yet are used as a more secure and less privacy-invasive alternative, we obviously have a problem. We probably need to rethink the concept - and either ditch them altogether, or impose reasonable no-use time limits at which such cookies are evicted from the cookie jar.
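The second option could be as simple as tracking when each session cookie was last sent and evicting the ones that have been idle too long. The Python sketch below is only an illustration of that idea; the function evict_idle, the timestamp bookkeeping, and the threshold are all hypothetical.

```python
# Sketch of a "no-use" time limit for session cookies.
# The data layout and threshold are assumptions, not a browser design.


def evict_idle(last_used: dict, now: float, max_idle_seconds: float) -> dict:
    """Return only the cookies used within the last max_idle_seconds.

    last_used maps a cookie key to the timestamp (in seconds) at which the
    cookie was last attached to a request.
    """
    return {key: stamp for key, stamp in last_used.items()
            if now - stamp <= max_idle_seconds}


if __name__ == "__main__":
    last_used = {"stale_session": 0.0, "active_session": 900.0}
    print(evict_idle(last_used, now=1000.0, max_idle_seconds=600.0))
```

A cookie that is in active use survives indefinitely under this policy, while one forgotten in a long-running browser session quietly expires.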
Closing words
I find it quite captivating to see the number of subtle problems caused by such a simple and seemingly harmless scheme. It is also depressing how poorly documented and fragile the design remains some 15 years later; and that the introduction of well-intentioned security mechanisms, such as httponly, only contributed to the misery. An IETF effort to document and clarify some of the security-critical aspects of the mechanism is underway only now - but it won't be able to fix them all.
Some of the telltale design patterns - rapid deployment of poorly specified features, or leaving essential security considerations as "out of scope" - are still prevalent today in the browser world, and can be seen in the ongoing HTML5 work. Hopefully, that's where the similarities will end.