Refresh Token mechanism does not work well with many concurrent requests

Hi,

I’m pretty sure the current OAuth token refresh mechanism doesn’t always work as it should. I went through many threads on this forum and read many comments. People have problems with token refreshing, and almost always their application’s thread safety is blamed. Yes, in most cases the client application’s thread safety is to blame. But I don’t think that’s always the case.

There is a really nice comment by @bradb (Long-lived token support - #5 by bradb) explaining refreshing. He wrote:

If you don’t want to worry about threading you can build in some error handling. If two threads attempt to refresh the token one will succeed and one will fail. In the case of failure just refetch the token from storage (it should be the new one).
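The pattern bradb describes could be sketched like this. This is a minimal sketch only; `TokenStore`, `InvalidGrantError`, and `do_refresh` are hypothetical stand-ins for real persistent storage and the real HTTP refresh call:

```python
import threading


class InvalidGrantError(Exception):
    """Stand-in for the API's invalid_grant error response."""


class TokenStore:
    """In-memory stand-in for persistent token storage (DB, file, ...)."""

    def __init__(self, refresh_token):
        self._lock = threading.Lock()
        self._refresh_token = refresh_token

    def save(self, refresh_token):
        with self._lock:
            self._refresh_token = refresh_token

    def load(self):
        with self._lock:
            return self._refresh_token


def refresh_with_fallback(store, do_refresh):
    """Try to refresh; if that fails with invalid_grant, assume another
    thread won the race and re-read the newer token from storage."""
    try:
        new_refresh = do_refresh(store.load())
        store.save(new_refresh)
        return new_refresh
    except InvalidGrantError:
        # The winning thread should already have stored the new token.
        return store.load()
```

Note that this pattern only works if at most one concurrent refresh can succeed; as shown below, that assumption does not always hold.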

So when two simultaneous refresh requests are made, one of them should succeed and the other should fail. We can extend that example to 10 simultaneous refresh requests: only one of them should succeed and the other 9 should fail. The thing is, that’s not what happens! For example, we’re getting 3 successful refresh responses out of 10. Here is an example:

→ for i in `seq 1 10`; do sleep 1 && curl -s "https://api.infusionsoft.com/token" -i -d "grant_type=refresh_token&refresh_token=arv8upha5wayybv8y3jhkjh4&client_id=znvtck59tpfmn3yht25du5q2&client_secret=$CLIENT_SECRET" & done


HTTP/1.1 200 OK
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Responder: prod-j-worker-us-west-1b-24.mashery.com
transfer-encoding: chunked
Connection: keep-alive

{"access_token":"u7g7dm5a6avzmnwcp9nr7t6u","token_type":"bearer","expires_in":86400,"refresh_token":"t33raar4h6qwrng58ch2et26","scope":"full|hw308.infusionsoft.com"}


HTTP/1.1 200 OK
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Responder: prod-j-worker-us-west-1c-13.mashery.com
transfer-encoding: chunked
Connection: keep-alive

{"access_token":"savnvejutg6gbpyws36pqdn9","token_type":"bearer","expires_in":86400,"refresh_token":"ypyesws8cwcam6rq297hr98d","scope":"full|hw308.infusionsoft.com"}


HTTP/1.1 200 OK
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Responder: prod-j-worker-us-west-1c-15.mashery.com
transfer-encoding: chunked
Connection: keep-alive

{"access_token":"nmxjnpvfzuu9mxaue4j4qu65","token_type":"bearer","expires_in":86400,"refresh_token":"zwwnavn8pp2mj3e5wkg4jt7a","scope":"full|hw308.infusionsoft.com"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1c-11.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1c-18.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1b-01.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1b-03.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1b-06.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1c-09.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}


HTTP/1.1 400 Bad Request
Cache-Control: no-store
Content-Type: application/json;charset=UTF-8
Date: Tue, 18 Jun 2019 14:08:31 GMT
Pragma: no-cache
Server: Mashery Proxy
X-Mashery-Error-Code: ERR_400_BAD_REQUEST
X-Mashery-Responder: prod-j-worker-us-west-1b-22.mashery.com
Content-Length: 69
Connection: keep-alive

{"error":"invalid_grant","error_description":"Invalid refresh token"}

So we can’t use bradb’s solution, because we got more than one successful response. And only one of the newly returned refresh/access token pairs actually works; the other two are already invalid (expired?). One of them (possibly an invalid one) will overwrite the others in storage.

What one could do is try each of the access tokens (by making an API call) before persisting it in storage, and store only the one that actually works. But this seems like unnecessary work on the API client side. The API should simply return no more than one successful response for refresh requests that use the same refresh token.
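That workaround could be sketched as follows. The `probe` callable is hypothetical; in a real client it would make a cheap authenticated API call and report whether the access token was accepted:

```python
def first_working_token(candidates, probe):
    """Given several (access_token, refresh_token) pairs returned by
    concurrent refreshes, return the first pair whose access token the
    API actually accepts, or None if none of them work."""
    for access_token, refresh_token in candidates:
        if probe(access_token):  # e.g. a cheap GET against the API
            return access_token, refresh_token
    return None
```

Only the pair this returns would be persisted, so an already-invalidated token never overwrites the working one.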

One could also think we should store the refresh token that arrived last. No, we can’t rely on that either. We have no control over scheduling; it’s up to the CPU which threads are processed first and which last. Even in my curl example, where no additional application or library is involved, the second returned token (not the third) is the one that works.

What do you think? Am I right? Thanks.

If you just run a single process on a schedule, then you never have to worry about threads. Using threads just complicates things. Run a background process that manages tokens, and then only worry about reading the active access token from the DB… no threads involved.

Using this approach, I’ve never once, across many projects, had a single failure.
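That single-owner approach could be sketched like this. The `load`/`save`/`do_refresh`/`sleep_fn` callables are hypothetical stand-ins for the DB and the HTTP refresh request; the point is that this loop is the only place a refresh ever happens:

```python
def refresh_loop(load, save, do_refresh, sleep_fn, margin=3600, rounds=None):
    """Single-threaded refresh loop: one process owns the refresh token,
    every other process only reads the current access token from storage.

    load()            -> current refresh token from storage
    save(a, r)        -> persist new access/refresh token pair
    do_refresh(r)     -> (access_token, refresh_token, expires_in) via HTTP
    sleep_fn(seconds) -> wait (time.sleep in real use)
    """
    done = 0
    while rounds is None or done < rounds:
        access_token, refresh_token, expires_in = do_refresh(load())
        save(access_token, refresh_token)
        # Refresh well before expiry so readers never see a stale token.
        sleep_fn(max(expires_in - margin, 1))
        done += 1
```

With `rounds=None` and `sleep_fn=time.sleep` this runs forever as a supervised background process; since only this process ever calls the token endpoint, no two refreshes can race.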


Not to pile on, but there’s no way to build a production quality token refresh subsystem w/o making it single-threaded. Sucks, but that’s the way of the world. It’s not going to get any better.

A few years ago I ran similar tests and noticed the same pattern. Making those simultaneous requests creates race conditions: you end up with tokens that have been overwritten by a more recent set.

I would do the same thing John mentioned. If you are concerned about the tokens, you could always do a secondary check, say 5 or 10 minutes later, to see if they are still valid.

Thanks for your answers.

Not to pile on, but there’s no way to build a production quality token refresh subsystem w/o making it single-threaded. Sucks, but that’s the way of the world. It’s not going to get any better.

Not that I disagree; I’m just not sure about that. Sure, everything is easier in a single-threaded scenario. But we have more than 100 OAuth integrations, and we have never encountered a problem like this one.

Let’s not talk about the single-threaded scenario any more; it works as expected. We all know that. I still want to talk about the multi-threaded scenario. Why? Because the documentation (OAuth2 Authentication - Keap Developer Portal) doesn’t say it shouldn’t be used. Concurrency is the nature of the web. Consider a banking API that allows transferring funds from one account to another. A client can request to move all funds to another account using two concurrent requests, and it’s the server’s responsibility to handle the concurrent requests correctly (accept one request and reject the other). It is not the client’s responsibility to make the requests sequentially. It can’t be; otherwise it would be possible to move more funds than are actually present in the account. That’s why I think this should be fixed by Infusionsoft. It’s their responsibility to handle it correctly.

For me it’s definitely a bug on Infusionsoft’s side, but you could consider it just a limitation. In that case, however, I would expect to see this limitation described in the documentation (OAuth2 Authentication - Keap Developer Portal). There should be something like: “WARNING: Do not use concurrent requests to refresh tokens. We don’t process concurrent refresh requests correctly, and you may receive tokens that are already expired.”
Additionally, as I already mentioned, Bradley Booth recommends using multiple threads (as one of the options) and doesn’t say anything about such a limitation. On the contrary, he suggests that only one of two concurrent requests will be successful, which is not always the case.

The documentation doesn’t talk about pouring milk on your code either… that doesn’t mean it’s something you should do.

Multi-threading has no benefit here, and it’s poor coding.