On designing UIs for non-robots

In a typical, attentive human subject, the usual latency between a visual stimulus and a voluntary motor response is between 100 and 300 milliseconds. As should be evident, we do not pause for that long to assess the situation after each and every muscle movement; instead, we routinely schedule series of motor actions well in advance - and process sensory feedback only after the fact, in an asynchronous manner. Within that sub-second window of opportunity, we are simply unable to abort a premeditated action - even if things go wrong.


And here lies an interesting problem: on today's blazing-fast personal computers, a lot can happen in as little as one-tenth of that timeframe. Within a browser, windows can be opened, moved around, and then closed; system prompts can be triggered or destroyed; programs can be launched and terminated. In such an environment, designing security UIs that take human cognitive limitations into account is a tricky game: any security-relevant prompt that does not enforce a certain amount of uninterrupted, distraction-free, in-focus screen time before accepting user input is likely completely broken.
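The "enforced screen time" idea can be reduced to a very small piece of logic. The sketch below is hypothetical (the class and method names are mine, not any browser's API), but it is similar in spirit to what some browsers do internally for sensitive dialogs: the prompt only accepts clicks after it has been continuously visible and focused for a minimum period, and any interruption resets the clock.

```javascript
// Minimal sketch of a "minimum attention time" gate for a security
// prompt. All names are hypothetical; the clock is injectable so the
// logic can be exercised outside a browser.
class PromptGate {
  constructor(requiredMs, now = () => Date.now()) {
    this.requiredMs = requiredMs;
    this.now = now;
    this.focusedSince = null; // when the current uninterrupted focus began
  }
  onFocus() { this.focusedSince = this.now(); }
  onBlur()  { this.focusedSince = null; }    // any interruption resets the clock
  acceptsInput() {
    return this.focusedSince !== null &&
           this.now() - this.focusedSince >= this.requiredMs;
  }
}
```

In a real prompt, onFocus and onBlur would be wired to the dialog's focus, blur, and visibility events, and any click arriving while acceptsInput() returns false would simply be discarded.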


Intuitively, this just feels wrong - surely, humans can't be that bad, so the issue can't be that serious - but this is exactly the sort of fallacy we should be trying to avoid. There is nothing, absolutely nothing, that makes such attacks impractical: increasingly fast JavaScript can programmatically open, position, resize, focus, blur, and close windows, and measure mouse pointer velocity and click timing with extreme accuracy; with a bit of basic ingenuity, any opportunity for a voluntary user reaction can be taken out of the equation. That's it: we suck, and there is nothing you can do to change it.


To back this claim, let's have a look at the recently introduced HTML5 geolocation API: the initial call to navigator.geolocation.getCurrentPosition() spawns a security prompt in Firefox, Opera, Chrome, Safari, and a couple of other browsers. This UI does not implement a meaningful delay before accepting user input - and so, a crude and harmless Firefox proof-of-concept can be used to predict the timing of mouse clicks, and steal your location data with an annoyingly high success rate. This particular vector is tracked as Mozilla bug 583175, but similar problems are endemic to most of the new security UIs in place; the reason is not always simple oversight, but often, explicit opposition to the idea of introducing usability roadblocks: after all, to a perfect human being, they are just a nuisance.
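To give a flavor of how little machinery the timing trick requires, here is a hypothetical sketch (not the actual proof-of-concept, and all names are mine): from two recent mousemove samples, extrapolate when the pointer will reach a target; an attacker can then reveal the real security prompt under the cursor less than 100 ms before the predicted click, well inside the window in which the victim can no longer abort the motion.

```javascript
// Hypothetical sketch: estimate the time at which the pointer will
// reach a target, by linear extrapolation from the last two samples.
// samples: [{x, y, t}, ...] in chronological order, t in milliseconds.
function predictArrival(samples, target) {
  const a = samples[samples.length - 2];
  const b = samples[samples.length - 1];
  const dt = b.t - a.t;
  if (dt <= 0) return null;
  const vx = (b.x - a.x) / dt;               // px per ms
  const vy = (b.y - a.y) / dt;
  const speed = Math.hypot(vx, vy);
  if (speed === 0) return null;              // pointer not moving
  const dist = Math.hypot(target.x - b.x, target.y - b.y);
  return b.t + dist / speed;                 // estimated arrival time
}
```

In an attack page, this estimate would be fed by mousemove listeners while the victim chases some bait element, with the permission prompt swapped in just before the predicted arrival.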


Fine-grained click timing is, of course, not where the story ends; it has been demonstrated time and time again that with minimal and seemingly innocuous conditioning, healthy and focused test subjects can be reliably duped into completely ignoring very prominent and unusual visual signals. The significance of this problem goes unappreciated mostly because not many exploit writers are behavioral scientists - but that's not a very reassuring thought.


There is some admirable work going on to make browser security messaging more accessible to non-technical users, but I'd wager that our problems run deeper than that. We are notoriously prone to overestimating the clarity of our perception, the rationality of our thought, and the accuracy of our actions; this is often a desirable trait when going through life - but it tends to bite us hard when we try to design security-critical software to be used by other human beings.


We need to fight this habit as best we can, and start working on unified, secure human-to-machine interfacing in the browser. If we dismiss our inherent traits as an out-of-scope problem in security engineering, we will lose.


PS. On a somewhat related note, you may also enjoy Jesse Ruderman's recent 10-minute presentation about UI truncation attacks.
