Possibly the most fascinating HTML parser behavior ever

I learned about this tidbit from sirdarckcat. It is in no way new, but the trick is so cute that I just could not resist sharing.


When parsing HTML documents, browsers recognize two methods of specifying tag parameter values: a "bare" form (such as <img src=image.jpg>), which is terminated by a closing angle bracket, whitespace, and so on; and a quoted form (<img src="image.jpg">), which is terminated only by a matching quote.


Every browser makes the decision by looking at the first non-whitespace character after the name=value separator. If this happens to be a single or a double quotation mark, the second parsing strategy is used; otherwise, the first method is a go. Internet Explorer also recognizes backticks (`) as a faux quote, leading to security flaws in a fair number of HTML filters - but even with this quirk, the behavior is still pretty straightforward. In particular, in the following example, stray quotes will not have any effect on how the tag is interpreted:


<a href=http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>
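To make the decision rule concrete, here is a minimal Python sketch of it. The function name read_attr_value and its return shape are made up for illustration; this is a toy model of the rule described above, not the code of any real browser or HTML library, and it ignores the backtick quirk.

```python
QUOTES = ('"', "'")


def read_attr_value(markup: str, pos: int) -> tuple[str, int]:
    """Read one attribute value that starts right after the '=' separator.

    Hypothetical helper: returns (value, index just past the value).
    """
    # Skip any whitespace between '=' and the value.
    while pos < len(markup) and markup[pos].isspace():
        pos += 1

    if pos < len(markup) and markup[pos] in QUOTES:
        # Quoted form: only the matching quote terminates the value.
        quote = markup[pos]
        end = markup.find(quote, pos + 1)
        if end == -1:
            end = len(markup)
        return markup[pos + 1:end], min(end + 1, len(markup))

    # Bare form: whitespace or '>' terminates the value, and any quote
    # that appears later is just an ordinary character.
    end = pos
    while end < len(markup) and not markup[end].isspace() and markup[end] != ">":
        end += 1
    return markup[pos:end], end


tag = '<a href=http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>'
value, end = read_attr_value(tag, tag.index("=") + 1)
print(value)      # http://www.example.com/?"
print(tag[end:])  # >This text is not a tag parameter anymore.">Click me</a>
```

The stray quote ends up as the last character of the URL, and the tag closes at the first angle bracket, which is the behavior the example above is meant to show.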


But here's the thing: Internet Explorer seems to be doing a substring search for an equals sign followed by a quote anywhere in the parameter name=value pair. Therefore, the following syntax will be parsed in a very different way:


<a href=http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>
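The difference can be modeled with a small Python sketch. This is only a guess at the observable behavior, assuming that an equals sign followed by a quote inside a bare value makes the parser swallow everything up to the matching quote; the function bare_value_end and its ie_quirk flag are hypothetical, and the real Internet Explorer logic may well differ in the details.

```python
def bare_value_end(markup: str, pos: int, ie_quirk: bool = False) -> int:
    """Return the index at which a bare attribute value starting at pos ends.

    With ie_quirk=True, an '=' immediately followed by a quote switches to
    quoted-style scanning until the matching quote (an assumed model only).
    """
    i = pos
    while i < len(markup):
        ch = markup[i]
        if ie_quirk and ch == "=" and markup[i + 1:i + 2] in ('"', "'"):
            quote = markup[i + 1]
            close = markup.find(quote, i + 2)
            if close == -1:
                return len(markup)  # unterminated: consume the rest
            i = close + 1           # resume bare scanning after the quote
            continue
        if ch.isspace() or ch == ">":
            break                   # the usual bare-value terminators
        i += 1
    return i


tag = '<a href=http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>'
start = tag.index("=") + 1

# What a conventional parser (or a server-side filter) sees after the value:
print(tag[bare_value_end(tag, start):])
# >This is still a part of markup indeed!">Click me</a>

# What the modeled quirk leaves after the value:
print(tag[bare_value_end(tag, start, ie_quirk=True):])
# >Click me</a>
```

In the first case the sentence is ordinary page text; in the second it has been absorbed into the tag, which is exactly the kind of disagreement a filter and a browser should never have.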


It's one of the most peculiar and surreal HTML parser quirks I am aware of (and it survives to this day in Internet Explorer 9). In principle, it allows any server-side HTML filter to get out of sync with the browser, leading to parameter splitting and tag consumption. In reality, it has limited practical significance: if your HTML filter is relaxed enough to allow this syntax to go through, it is probably already vulnerable to the abuse of other syntax tricks.
