Session Theft and DPoP

When a user logs in to your product, you create a session for them. How you do this varies from product to product, but it typically involves giving the user something they will include in future requests: an opaque token, a signed JWT, etc.

Providing this session information could be part of an Authorization header or a cookie:

curl -H "Authorization: Bearer {this_is_my_session_token}" ...

# OR

curl -b "session={this_is_my_session_token}" ...

However, this leads to an important question:

What happens if an attacker steals a user's session token?

Well... the attacker would be able to make requests as if they were the user.

Depending on how bad that sounds to you, there are different approaches you can take to mitigate it. Some of these mitigations are straightforward, like invalidating a session if you detect a significant change in a user-agent header.

Some of these are more complex, like computing a "fingerprint" for a device and re-authenticating the user if the fingerprint doesn't match.

In this post, we'll look at DPoP, which can be another layer of defense against this type of attack.

One small note: DPoP is an OAuth extension, so the RFC talks about it in OAuth terms. Most people don't think about OAuth regularly, so I'm going to take some liberties and discuss the core ideas in a simpler context: A web browser making authenticated requests to a backend using a session token.

The underlying ideas are the same, and at the end, I'll show the areas I simplified and what you should do for a real implementation.

Demonstrating Proof of Possession (DPoP)

From the RFC:

The primary aim of DPoP is to prevent unauthorized or illegitimate parties from using leaked or stolen access tokens, by binding a token to a public key upon issuance and requiring that the client proves possession of the corresponding private key when using the token.

The problem with using session tokens alone is that they're bearer tokens: whoever presents one is treated as the user, with no other proof required. What we'd like is a way to tie these session tokens to something that's harder to steal.

As an example, imagine if every time the user made a request to the backend, they needed to provide a valid 2FA code. This has some obvious advantages:

  • It solves our key problem: A stolen session token can no longer be used unless the attacker also steals our 2FA secret.
  • It adds friction to any attack: 2FA secrets are stored separately from our sessions, so it should be a challenge to steal both.

However, it's a terrible user experience. It's actually hard to design a more annoying user experience. Our legitimate users will get interrupted constantly to re-enter their 2FA code.

Our ideal solution here should solve our key problem (making a stolen session token useless), add friction to attacks, and not impact our user experience.

Using Public/Private Key Pairs

A private key allows you to create a signature from some data (signature = sign(data, private_key)) and a public key allows you to verify that signature (is_signature_correct = verify(signature, data, public_key)).
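To make that concrete, here's a minimal sketch of those two operations using the browser's WebCrypto API (assuming an ECDSA key pair over P-256, and an async context for the awaits):

const keyPair = await crypto.subtle.generateKey(
    { name: "ECDSA", namedCurve: "P-256" },
    false, // extractable: false - more on why this matters shortly
    ["sign", "verify"]
);

const data = new TextEncoder().encode("some data");

// signature = sign(data, private_key)
const signature = await crypto.subtle.sign(
    { name: "ECDSA", hash: "SHA-256" },
    keyPair.privateKey,
    data
);

// is_signature_correct = verify(signature, data, public_key)
const isSignatureCorrect = await crypto.subtle.verify(
    { name: "ECDSA", hash: "SHA-256" },
    keyPair.publicKey,
    signature,
    data
);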

DPoP's core idea is to store a public/private key pair in the browser, and require the user to prove they have the private key whenever they make an authenticated request.

To make this concrete: when a user logs in, in addition to their normal credentials (email/password, a 6-digit code, etc.), they also provide a public key. This public key is then tied to their session.

fetch("/login", {
    method: "POST",
    headers: {
        DPoP: public_key, // <- new
    },
    body: JSON.stringify({
        email,
        password,
    }),
});

Later on, when our user is making an authenticated request, instead of just sending us their session token, they also send us something they signed with their private key (we'll talk about what data they sign later on).

fetch('/authenticated-request', {
    method: 'POST',
    headers: {
        'Authorization': `Bearer ${session_token}`,
        'DPoP': data_signed_with_private_key, // <- new
        // ...
    },
    // ...
});
Our server doesn't just verify the session token; it also uses the public key to verify the signature. Assuming the signature is valid, we get a stronger guarantee: the user has both the private key and the session.
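On the server, that check might look something like the sketch below, where lookUpSession and verifySignature are hypothetical helpers standing in for your session store and your signature verification:

async function authenticate(req) {
    // Normal session validation, same as before
    const session = await lookUpSession(req.headers["authorization"]);
    if (!session) throw new Error("Invalid session");

    // New: verify the DPoP signature against the public key
    // that was bound to this session at login
    const validSignature = await verifySignature(
        req.headers["dpop"],
        session.publicKey
    );
    if (!validSignature) throw new Error("Invalid DPoP signature");

    return session.user;
}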

But wait, why wouldn't an attacker just steal the private key too?

This is a very reasonable question: if an attacker is stealing someone's session token, why wouldn't they also steal the private key?

The answer is that there are better ways to securely store a private key than there are to securely store a session token. In the browser, the WebCrypto API can generate public/private key pairs with an option to make the private key non-extractable, meaning you never have access to the key material itself, only a reference to it. There's no equivalent concept for session tokens, which are commonly stored in memory, as a cookie, or in localStorage.
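For example, if the key pair was generated with extractable set to false (as in the earlier sketch), any attempt to export the private key fails, while the public key can still be exported to send to the server:

// The public key of a key pair is always exportable
const publicJwk = await crypto.subtle.exportKey("jwk", keyPair.publicKey);

// The private key is not - this rejects with an InvalidAccessError
await crypto.subtle.exportKey("jwk", keyPair.privateKey);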

A malicious Chrome extension can steal your cookies, and it can inject code into your pages to grab tokens you store in memory or localStorage, but it can't get your private key.

That isn't to say you are fully protected: a malicious Chrome extension can still use the private key to sign things. However, since an attacker can't get the value of the private key itself, we can limit how long they have access to our user's account after the extension is removed.

If we go back to our three criteria from before, we see that this approach does well on all three:

  • It solves our key problem: A stolen session token can no longer be used unless the attacker also steals our private key.
  • It adds friction to any attack: The private key is stored in a non-extractable way, meaning it's harder to steal both the private key and the session token.
  • The user experience is mostly unaffected: Providing a public key and signing some data can all happen in the background without the user knowing. There are some cases where you'd add latency to requests or need to retry a failed request, but it's pretty minimal.

Ok, so, what do we sign?

We mentioned before that the client will send both the session token and a signature, so what does the user have to sign? Here's an example of a signed payload followed by what each field means:

{
    "iat": 1762504611,
    "jti": "9fb5871c-d39a-4ed5-8405-a56435cd48af",
    "htm": "POST",
    "htu": "https://api.example.com/some-endpoint",

    // optionally:
    "ath": base64url(sha256(accessToken)),
    "nonce": "9a001171-d9f6-4cb7-bffa-4522fa9ba5ad"
}

iat (short for issued at) is the timestamp of when the data was signed.

jti is a unique ID, randomly generated on the client. Every request should include a new, randomly generated value. The server is responsible for storing all recently used JTIs to protect against replay attacks where someone reuses a previously signed payload.

htm and htu are the request's method and URL. This binds the signature to one specific API call.

ath is the hash of an access token. Since this RFC is an OAuth extension, this is referring to OAuth's concept of an access token, which would loosely translate to our session token.

This is probably the biggest deviation I'll make from the spec - DPoP considers this field optional only if you don't have an accessToken yet; once you do, ath is required. In the browser context, there are cases where JavaScript can't read the session token (e.g., an HttpOnly cookie), so you may want to leave out ath there.
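When JavaScript can read the session token, computing ath is straightforward - here's a sketch for the browser:

async function computeAth(sessionToken) {
    const digest = await crypto.subtle.digest(
        "SHA-256",
        new TextEncoder().encode(sessionToken)
    );
    // base64url-encode the hash
    return btoa(String.fromCharCode(...new Uint8Array(digest)))
        .replaceAll("+", "-")
        .replaceAll("/", "_")
        .replace(/=+$/, "");
}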

nonce is a randomly generated value provided by the server, to be echoed back in subsequent requests.

While the nonce is technically optional, it's one of the more powerful and useful concepts here, so you likely always want to include it.

Remember that case we talked about earlier, where a malicious Chrome extension can sign data but can't get the value of the private key? If we didn't have a nonce, the attacker could generate signed payloads for themselves to use in the future.

// my malicious chrome extension, creating signatures it can use in the future
const signatureForTomorrow = sign({ iat: tomorrow, ... })
const signatureForFriday = sign({ iat: friday, ... })
// ... you get the idea

These would remain valid for as long as the ath is - that is, for as long as the underlying session token lives.

The nonce, however, means that the server can effectively say:

"Actually, can you include 965b5daf-5bdc-4923-89c5-b34a2fd8ccf7 as the nonce in your requests going forward?"

And any signatures the attacker generated for the future are now useless.

The combination of these six fields provides pretty strong guarantees against session theft. Each request is only valid for a short period of time, can't be replayed, can't be intercepted and used on a different endpoint, and is tied directly to a non-extractable private key stored securely in the browser - and the server can, at any time, render any pre-generated requests invalid.
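Putting that together, the server-side validation might look roughly like this sketch (requestUrl and hashToken are hypothetical helpers, and a production jti store would need a TTL rather than an ever-growing set):

const seenJtis = new Set();

function validateProof(claims, req, sessionToken, expectedNonce) {
    const now = Math.floor(Date.now() / 1000);

    if (Math.abs(now - claims.iat) > 60) return false; // too old (or clock skew)
    if (seenJtis.has(claims.jti)) return false;        // replayed payload
    if (claims.htm !== req.method) return false;       // wrong HTTP method
    if (claims.htu !== requestUrl(req)) return false;  // wrong endpoint
    if (claims.ath && claims.ath !== hashToken(sessionToken)) return false;
    if (expectedNonce && claims.nonce !== expectedNonce) return false;

    seenJtis.add(claims.jti);
    return true;
}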

The challenges of IP address changes for sessions

If you read the OWASP guide on session management, it has a lot of great, explicit, practical advice for managing sessions in your application.

That being said, one area I've always found vague is the section on Binding the Session ID to Other User Properties. This isn't unique to OWASP - it's just a hard problem.

Let's say you bind the session to an IP address, which is one of the recommendations. What happens if the IP address changes? It could be a sign that a user's session was stolen and needs to be invalidated. Or it could mean they started working from a coffee shop, or that they're on a mobile device and moving around.

You could require the user to re-authenticate, but some users will get frustrated by that, especially if it happens frequently. You could detect things like "this IP was in New York but now it's in India" (sometimes called impossible travel), but good attackers know about residential IPs, and you might just be annoying the portion of your audience that uses a VPN.

To me, this is where the nonce really shines.

The power of the DPoP nonce

The DPoP spec itself doesn't actually cover when to change the nonce:

The server determines when to issue a new DPoP nonce challenge and if it is needed, thereby requiring the use of the nonce value in subsequent DPoP proofs. The logic through which the server makes that determination is out of scope of this document.

So, we get to decide when to generate new nonces. An easy win is to generate new nonces every few minutes, but we can do better than that.

What if, instead of just binding a session to other user properties, we also bind a nonce to other user properties?

The flow would look something like this:

  • User sends an authenticated request, but it's coming from a new IP that we haven't seen before.
  • Your server rejects the request and sends back a new nonce for them to use. This nonce is tied to the new IP address.
  • The browser (with no user interaction) signs a new payload that includes the new nonce.
  • The browser retries the request, which succeeds as long as the IP hasn't changed again (a rough sketch of the server side follows below).
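Here's that server-side flow as Express-style middleware, where decodeDpopProof, isNonceValidForIp, and issueNonceForIp are hypothetical helpers backed by whatever store you use:

app.use("/api", (req, res, next) => {
    const proof = decodeDpopProof(req.get("DPoP"));

    if (!proof || !isNonceValidForIp(proof.nonce, req.ip)) {
        // Reject the request and hand back a fresh nonce
        // tied to the caller's current IP address
        res.set("DPoP-Nonce", issueNonceForIp(req.ip));
        return res.status(401).json({ error: "use_dpop_nonce" });
    }

    next();
});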

We now have a silent, background challenge that we can issue whenever we want, and that can only be answered with access to both the private key and the session token.

Because it's relatively cheap, it allows us to trigger this challenge more frequently than we would for a challenge that involves the user. There's no "the IP address changed, but we need to decide if that's scary or not" - you can just always generate a new nonce.
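On the client, the retry can live in a small wrapper around fetch - a sketch, where signProof is a hypothetical helper that builds and signs the payload from earlier:

let currentNonce = null;

async function dpopFetch(url, options = {}) {
    const doFetch = async () =>
        fetch(url, {
            ...options,
            headers: {
                ...options.headers,
                DPoP: await signProof(options.method, url, currentNonce),
            },
        });

    let response = await doFetch();

    // If the server demands a new nonce, store it, re-sign, and retry once
    const newNonce = response.headers.get("DPoP-Nonce");
    if (response.status === 401 && newNonce) {
        currentNonce = newNonce;
        response = await doFetch();
    }

    return response;
}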

What are the downsides?

We can't protect against XSS

XSS (cross-site scripting) attacks enable an attacker to run code in your user's browser. And while we can stop them from getting the value of the private key, we can't stop them from signing their own requests to your server.

Even swapping out the nonce here doesn't help, as they can just sign their own requests with the new nonce.

That being said, once you address the XSS issue and generate new nonces, you have effectively stopped the XSS attack - as any exfiltrated sessions or DPoP signatures will no longer be usable.

OWASP has a cheat sheet with protections you can put in place for XSS attacks, as you'll need to protect against them separately.

It doesn't play nicely with server-side rendering

In order to create a DPoP signature, you need JavaScript to run. On a site that uses server-side rendering, data is often loaded to pre-populate a page before any JavaScript runs.

This means all of those data-loading calls happen before JS is available and can't easily be protected with DPoP.

This might not be a huge deal, as you don't necessarily need to protect every request with DPoP. You might only care about protecting certain mutations which all originate from fetch requests in the browser, in which case everything would work just fine.

Some small clarifications

Throughout the post, I made a few simplifications to keep things easy to follow. The RFC has the full details, but a few important points are worth calling out:

  • When we send our "signed payload," we are actually sending a JWT. We include the public key in the JWT's header, and its type is dpop+jwt so the server knows this JWT was created specifically for this purpose (an example header is shown below).
  • This is also true when we register the device on the /login route: we don't just send the public key, we send a signed JWT with the public key in the header.
  • The ath and nonce are technically optional, but once you have an accessToken, ath is no longer optional. Similarly, once the server returns a nonce, the client is not allowed to ignore it.
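For reference, the header of a DPoP proof JWT looks roughly like this (key values abbreviated):

{
    "typ": "dpop+jwt",
    "alg": "ES256",
    "jwk": {
        "kty": "EC",
        "crv": "P-256",
        "x": "...",
        "y": "..."
    }
}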

Wrapping up

Protecting against session theft can be challenging, as it often relies on heuristics or making your user experience a bit worse.

DPoP, while not completely foolproof, provides some stronger guarantees - both for protecting against session theft and for limiting the effect that various session theft attacks can have.