When parsing HTML documents, browsers recognize two methods of specifying tag parameter values: a "bare" form (such as <img src=image.jpg>), which is terminated by angle brackets, whitespace, and so on; and a quoted form (<img src="image.jpg">), which is terminated only by a matching quote.
Every browser makes the decision by looking at the first non-whitespace character after the = separator in a name=value pair. If this happens to be a single or a double quotation mark, the second parsing strategy is used; otherwise, the browser falls back to the first one. Internet Explorer also recognizes backticks (`) as a faux quote, leading to security flaws in a fair number of HTML filters - but even with this quirk, the behavior is still pretty straightforward. In particular, in the following example, stray quotes will not have any effect on how the tag is interpreted:
<a href=http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>
But here's the thing: Internet Explorer seems to be doing a substring search for an equals sign followed by a quote anywhere in the parameter name=value pair. Therefore, the following syntax will be parsed in a very different way:
<a href=http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>
It's one of the more unusual and surreal HTML parser quirks I am aware of (and it survives to this day in Internet Explorer 9). In principle, it allows any server-side HTML filter to get out of sync with the browser, leading to parameter splitting and tag consumption. In reality, it has limited practical significance: if your HTML filter is relaxed enough to allow this syntax to go through, it is probably already vulnerable to the abuse of other syntax tricks.
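To make the desync a bit more concrete, here is a rough Python sketch of the two strategies. It is a toy model, not a reimplementation of any real parser: the function names standard_value and quirky_value are made up for illustration, the bare form is assumed to end only at whitespace or ">", backticks are ignored, and the quirky variant simply encodes my guess at the observed behavior (a search for an equals sign followed by a quote inside the bare token, with the value then running to the matching quote).

```python
QUOTES = "\"'"


def standard_value(rest: str) -> str:
    """Model of the usual rule. `rest` is the text right after the '=' separator.

    Only the first non-whitespace character decides between the quoted
    and the bare form.
    """
    s = rest.lstrip()
    if s and s[0] in QUOTES:
        # Quoted form: ends only at the matching quote.
        end = s.find(s[0], 1)
        return s[1:end] if end != -1 else s[1:]
    # Bare form: simplified here to end at whitespace or '>'.
    for i, ch in enumerate(s):
        if ch.isspace() or ch == ">":
            return s[:i]
    return s


def quirky_value(rest: str) -> str:
    """Hypothetical model of the quirk: '=' followed by a quote anywhere
    in the bare token switches the parser to quoted mode.

    Returns the raw span consumed as the value, up to and including the
    matching quote. This is an assumption, not documented behavior.
    """
    s = rest.lstrip()
    if s and s[0] in QUOTES:
        return standard_value(s)
    bare = standard_value(s)
    for i in range(len(bare) - 1):
        if bare[i] == "=" and bare[i + 1] in QUOTES:
            close = s.find(bare[i + 1], i + 2)
            return s[: close + 1] if close != -1 else s
    return bare


# The text following "href=" in the two examples above.
plain = 'http://www.example.com/?">This text is not a tag parameter anymore.">Click me</a>'
tricky = 'http://www.example.com/?=">This is still a part of markup indeed!">Click me</a>'

for label, text in (("plain", plain), ("tricky", tricky)):
    print(label, "standard:", repr(standard_value(text)))
    print(label, "quirky:  ", repr(quirky_value(text)))
```

Under this model, both functions agree on the first example, but disagree on the second: the standard rule stops the value at the first ">", while the quirky rule swallows everything up to the second quote. A filter that follows the first rule and a browser that follows the second would therefore disagree about where the tag ends.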