Robin Berjon

Web Applications Security

Trusted Web Applications Considered Harmful

There is a lot of thinking going on around the possibility of using well-known Web technologies in order to create not-served-from-HTTP, not-running-in-the-browser, having-access-to-powerful-additional-APIs applications. I very much applaud the first two aspects; the third is more problematic. Not served from HTTP is great because there's a bunch of stuff that I just want to do locally. Some things are great for the cloud, but I like my local drive and there's a lot of information on it that I want to stay there. Not running in the browser is good too (even if it's using the same engine); frankly, there are only so many tabs you can handle. Heightened access is, however, something that we should be a lot more careful about.

It's not a problem of trust. Different parts of the Web universe believe in different approaches to security. In some cases you're getting something that's open source and trusted by the community, for instance a browser extension. In others, you're distributing applications for which you establish trust in the same way as you do with any other application that's not built on Web technology: you just sort of figure that if it wasn't safe you'd know. In some corners, people believe in a model in which these applications would be controlled by a policy framework that limits which APIs and methods they can access, and the applications themselves are reviewed by professionals before entering an application store to ensure that the APIs that they request access to are legitimate.

Any of the above can make sense with conventional apps. But apps built using Web technology are somewhat different: they're a lot more composable. Or, put in a less unicorny light, they are very conducive to code injection attacks.

Take a simple ebook application. It's all built with beautiful HTML, CSS, and SVG, and it has smooth and classy interactions supported by JavaScript. You're enforcing the most coercive of the systems listed above: a policy that only allows for very specific privileges, a review process that validates that such privileges are required, and an electronic signature on the app that validates that it has indeed been signed by a certificate that you have verifiably tied to a real-world identity. The app needs to store books, so it has file system access. It also needs to access content from the social recommendation system, so it has permission to access URLs in that origin. It's all very legit, and it all passes verification with flying colours. Users download lots of ebooks, and read lots of recommendations. Everyone's happy.

Except that all I need now to gain access to the file system of thousands of users is a single XSS bug in the recommendation system. Why? Because it's the same runtime and context that displays the HTML in the recommendation comments that has heightened access to these APIs. And how do I display HTML comments in an HTML system? Well, by inserting them in the DOM, where any scripts will happily be executed.
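To make that concrete, here is a minimal sketch of the difference between splicing a recommendation comment into markup and treating it as inert text. Everything named here is an assumption made up for illustration: the helpers `escapeHtml`, `renderCommentUnsafe`, and `renderCommentSafe`, as well as the `fs.readAllBooks` call inside the hostile comment, do not belong to any real widget runtime.

```javascript
// Hypothetical helpers for illustration only; no real runtime API is implied.

// Replace the characters that let text break out into markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Unsafe: the comment is concatenated into markup untouched, so any tags or
// event handler attributes it carries become part of the page.
function renderCommentUnsafe(comment) {
  return '<li class="comment">' + comment + "</li>";
}

// Safer: the comment is escaped first, so it can only ever be shown as text.
function renderCommentSafe(comment) {
  return '<li class="comment">' + escapeHtml(comment) + "</li>";
}

// A comment as an attacker might post it to the recommendation system.
// fs.readAllBooks is a made-up stand-in for a privileged file system API.
const hostile = 'Great book! <img src=x onerror="fs.readAllBooks(steal)">';

const unsafeMarkup = renderCommentUnsafe(hostile); // still contains a live <img>
const safeMarkup = renderCommentSafe(hostile); // the tag is reduced to text
```

In a real DOM, setting `textContent` rather than `innerHTML` achieves the same effect as the escaping above. Note that this only closes one hole: the privileged runtime remains one forgotten escape away from handing over the file system, which is the point of this post.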

It's not exactly as if XSS bugs were rare, either. Thomas Roessler has shown that Apple's Dashboard widgets were frequently susceptible to such attacks. I'd be rather surprised if we couldn't find such holes in widgets built using OMTP BONDI or JIL APIs (WAC 1.0 widgets are somewhat safer in that some of the more powerful APIs are not exposed yet).

Could the verification step above have caught that? It's unlikely. That would require code analysis, which in turn requires highly skilled professionals, which would make the app validation process difficult. There certainly are heuristics that can help, but they will only get you so far.

And this isn't an obscure problem. Frankly, I suck at security. I'm the guy who spent years running everything on his Linux box — not just CLI but window manager, apps, browser, etc. — as root because I couldn't be arsed to sudo. If I can find a security issue with your product, worse, if I can craft an actual attack against your product, it's not a manageable security risk that you have. It's a death wish.

How is this different from native apps that may be getting information off the Web? Well, they don't normally interpret that information directly using the same runtime that they use for the rest of their code. When on occasion they do instantiate a Web view of some sort to show HTML content they may have acquired online, that context does not have special privileges compared to a regular browser.

So are we screwed? Do we have to return to using one of those horrible GUI toolkits manipulated with some legacy language?

Hell no. The first thing that we can do is to keep adding features inside of the distributed code security boundary. And when stepping outside of that boundary, make sure it is through clear, understandable, in-flow user action. That is well known to some but it bears repeating because there are still quite a few otherwise smart people who don't understand the value that there is here. This is the model that has been followed by, amongst others, the DAP and WebApps WGs. It certainly has limitations, but frankly, it already gives you a lot to play with.

Beyond that, there are essentially two options. One is blind trust. It sounds scary, but it's what most people who've installed a Firefox extension without checking its code first have done (Firefox extensions have full access to XPCOM, and therefore to the system — it's not the case with all browser extension systems). Frankly, that system sucks and will eventually bite. The other is to use a system like Web Introducer or Web Intents to connect apps to potentially unsafe operations in a safer way, with user mediation. Or whatever comes next — this is still an area ripe with potential. It's exciting!

A post-scriptum note

The observant reader may be wondering (observant readers always seem to wonder about this or that) how one goes about designing APIs with the knowledge that the security context in which they will be used may vary wildly depending on the tastes of the implementer, where it gets reused, and whatever gorgeous new security scheme we may come up with. I ought to return to this in a future post but in many cases (not all) some guidelines in API design can help orthogonalise the API from security (which does not mean that the API does not take security into account, simply that it allows it to take many guises). For instance, say that there is an API that allows one to access the unicorns stored on your device. Unicorns are a precious and oh-so-private resource that you don't just want to share randomly — in other words, it's an API that can't just be exposed willy-nilly, and that could also probably benefit from some minimisation features to enhance privacy.

Well, as far as the API goes, it doesn't require much. It basically has to be asynchronous. The rest is in how it is specified to work appropriately in different contexts. The code would look like this:

  // find all the five-star unicorns
  navigator.unicorns.find({ stars: 5 }, successCB);
  function successCB (unicorns) {
      // do something with the unicorns
  }

You might require find() to be only triggered by a trusted event (and throw otherwise) or you might leave it open. In a normal context I would assume it to trigger a unicorn picker (somewhat similar to the file picker that you get out of <input type=file>.click(), but with more rainbows) showing the results of the search; and in order to support minimisation that picker would allow the user to remove certain parts of the result set before okaying it. But if you're very much certain that you have an amazing security model that protects the user like a superhero then you can invoke the callback with the full list of results immediately. In other words, it's safe in a browser accessing arbitrary content using today's security model and it's open to innovation in the security space. A lot of those ideas are exercised in the Contacts API for instance.
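As a rough sketch of how one calling convention can sit on top of different security policies, the code below puts a swappable "picker" between the search and the callback. All of it is assumed for illustration: `createUnicornFinder`, `mediatedPicker`, `passThroughPicker`, and the in-memory `stable` are invented names, and the user's choices in the picker are simulated by a plain predicate rather than a real UI.

```javascript
// Hypothetical sketch: createUnicornFinder, the in-memory stable, and the
// picker functions are invented for illustration, not taken from any spec.

// Build a find() whose results always pass through a "picker" policy.
function createUnicornFinder(store, picker) {
  return {
    find(filter, successCB) {
      // Match every key in the filter, e.g. { stars: 5 }.
      const matches = store.filter((unicorn) =>
        Object.keys(filter).every((key) => unicorn[key] === filter[key])
      );
      // The picker decides what the caller gets to see, and when.
      picker(matches, successCB);
    },
  };
}

// Browser-like policy: stands in for a picker UI in which the user strikes
// entries from the result set. The user's choice is simulated by a predicate.
function mediatedPicker(userKeeps) {
  return (matches, done) => {
    setTimeout(() => done(matches.filter(userKeeps)), 0);
  };
}

// "Amazing security model" policy: return everything, still asynchronously.
function passThroughPicker(matches, done) {
  setTimeout(() => done(matches), 0);
}

const stable = [
  { name: "Sparkle", stars: 5 },
  { name: "Glitter", stars: 5 },
  { name: "Dusty", stars: 3 },
];

// Same calling code, two security contexts.
const inBrowser = createUnicornFinder(
  stable,
  mediatedPicker((unicorn) => unicorn.name !== "Glitter")
);
const inTrustedRuntime = createUnicornFinder(stable, passThroughPicker);

function successCB(unicorns) {
  console.log(unicorns.map((unicorn) => unicorn.name).join(", "));
}
inBrowser.find({ stars: 5 }, successCB);
inTrustedRuntime.find({ stars: 5 }, successCB);
```

The calling code never learns which policy it is running under. It asks, and at some later point it gets whatever the user (or the runtime) agreed to hand over. That is all the orthogonality amounts to.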

Note that another approach would be to tie the above to HTML using <input type=unicorn>. That is indeed the best approach in some cases, but it carries its own downsides and some people seem to see it as more magical an extension point than it really is. It is indeed useful when appropriate though: it's what DAP did with HTML Media Capture for instance, and will likely do again elsewhere.

Like this? Hate this? A revelation? Totally daft? If so, head over to read this as well and keep that feedback coming!

This article is part of a series on the Device APIs Working Group (DAP).