I don't understand the need for this level of engineering. It appears we are going for an opaque bearer token here. The checksum is pointless because an entire 512 bit token still fits in an x86 cache line. Comparing the whole sequence won't show up in any profiler session you will ever care about.
If you want aspects of the token to be inspectable by intermediaries, then you want json web tokens or a similar technology. You do not want to conflate these ideas. JWTs would solve the stated database concern. All you need to store in a JWT scheme are the private/public keys. Explicit tracking of the session is not required.
> The checksum is pointless because an entire 512 bit token still fits in an x86 cache line
I suppose it’s there to avoid round-trip to the DB. Most of us just need to host the DB on the same machine instead, but given sharding is involved, I assume the product is big enough this is undesirable.
The point of the checksum is to just drop obviously wrong keys. No need to handle revocation or do any DB access if checksum is incorrect, the key can just be rejected.
Hey OP, sorry for the negativity, I think most of these commenters right now are pretty off-base. My company is building a lot of API infrastructure and I thought this was a great write up!
While it's true that API keys are basically prefix + base32Encode(ID + secret), you will want a few more things to make secure API keys: at least versioning and hashing metadata to avoid confused deputy attacks.
Presumably because API keys are n bytes of random data vs. a shitty user-generated password we don’t have to bother using a salt + can use something cheap to compute like SHA256 vs. a multi-round bcrypt-like?
GitHub introduced checksums to their tokens to aid offline secret scanning. AFAIK it’s mostly an optimization for that use case. But the checksums also mean you can reveal a token’s prefix and suffix to show a partially redacted token, which has its benefits.
Side note: the slug prefix is not primarily intended for the end-user / developer to figure out which kind of key it is, but for security scanners to detect when they are committed to code / leaked and invalidate them.
Reading “hex” pointing to a clearly base62-ish string was a bit interesting :-)
Also, could we shard based on a short hash of account_id, and store the same hash in the token? This way we can lose the whole api_key → account_id lookup table in the metashard altogether.
I know sometimes people just like to try things out, but for the love of god do not implement encryption related functionality yourself. Use JWT tokens and OpenSSL or another established library to sign them. This problem is solved. Not essentially solved, solved.
Creating your own API key system has a high likelihood of fucking things up for good!
You don't need any encryption or signing for API keys. Using JWTs is probably more dangerous here, and more annoying for people using the API since you now have to handle refreshing tokens.
Plain old API keys are straightforward to implement. Create a long random string and save it in the DB. When someone connects to the API, check if the API key is in your DB and use that to authenticate them. That's it.
If you want aspects of the token to be inspectable by intermediaries, then you want json web tokens or a similar technology. You do not want to conflate these ideas. JWTs would solve the stated database concern. All you need to store in a JWT scheme are the private/public keys. Explicit tracking of the session is not required.
I suppose it’s there to avoid round-trip to the DB. Most of us just need to host the DB on the same machine instead, but given sharding is involved, I assume the product is big enough this is undesirable.
Experience tells otherwise
Here is a detailed write-up on how to implement production API keys: https://kerkour.com/api-keys
Even the random hex with checksum component seems overkill to me, either the API key is correct or it isn't.
Reading “hex” pointing to a clearly base62-ish string was a bit interesting :-)
Also, could we shard based on a short hash of account_id, and store the same hash in the token? This way we can lose the whole api_key → account_id lookup table in the metashard altogether.
PS : I too am working on a APIs.Take a look here : https://voiden.md/
Plain old API keys are straightforward to implement. Create a long random string and save it in the DB. When someone connects to the API, check if the API key is in your DB and use that to authenticate them. That's it.
This is pretty much just plain-old-api-keys, at least as far as the auth mechanism is concerned.
The prefix slug and the checksum are just there so your vulnerability scanner can find and revoke all the keys folks accidentally commit to github.
But otherwise, yes, for love of everything holy - keep it simple.