tangled
alpha
login
or
join now
evan.jarrett.net
/
at-container-registry
66
fork
atom
A container registry that uses the AT Protocol for manifest storage and S3 for blob storage.
atcr.io
docker
container
atproto
go
66
fork
atom
overview
issues
1
pulls
pipelines
try and create a cache for layer pushing again
evan.jarrett.net
3 months ago
fb7ddd0d
ecf84ed8
verified
This commit was signed with the committer's
known signature
.
evan.jarrett.net
SSH Key Fingerprint:
SHA256:bznk0uVPp7XFOl67P0uTM1pCjf2A4ojeP/lsUE7uauQ=
+1427
-118
14 changed files
expand all
collapse all
unified
split
.tangled
workflows
tests.yml
cmd
appview
serve.go
docs
BLUESKY_PDS_CLOCK_TOLERANCE_BUG.md
TROUBLESHOOTING.md
pkg
appview
handlers
api.go
images.go
logout.go
logout_test.go
oauth_errors.go
middleware
registry.go
routes
routes.go
auth
oauth
client.go
server.go
token
servicetoken.go
+1
-1
.tangled/workflows/tests.yml
···
1
1
when:
2
2
- event: ["push"]
3
3
-
branch: ["main", "test"]
3
3
+
branch: ["*"]
4
4
- event: ["pull_request"]
5
5
branch: ["main"]
6
6
+1
-1
cmd/appview/serve.go
···
388
388
mainRouter.Get("/auth/oauth/callback", oauthServer.ServeCallback)
389
389
390
390
// OAuth client metadata endpoint
391
391
-
mainRouter.Get("/client-metadata.json", func(w http.ResponseWriter, r *http.Request) {
391
391
+
mainRouter.Get("/oauth-client-metadata.json", func(w http.ResponseWriter, r *http.Request) {
392
392
config := oauthClientApp.Config
393
393
metadata := config.ClientMetadata()
394
394
+532
docs/BLUESKY_PDS_CLOCK_TOLERANCE_BUG.md
···
1
1
+
# Bluesky PDS OAuth Provider Clock Tolerance Bug Report
2
2
+
3
3
+
**Status:** Confirmed Bug
4
4
+
**Severity:** High (blocks OAuth authentication)
5
5
+
**Affects:** `@atproto/oauth-provider@0.13.4` (and likely earlier versions)
6
6
+
**Date Identified:** 2025-11-18
7
7
+
**Reported By:** ATCR Project
8
8
+
9
9
+
---
10
10
+
11
11
+
## Executive Summary
12
12
+
13
13
+
The Bluesky PDS OAuth provider (`@atproto/oauth-provider`) incorrectly rejects valid client assertion JWTs when the client's system clock is even milliseconds ahead of the PDS server clock. This occurs because the `jose` library's `jwtVerify` function is called with `maxTokenAge` (which triggers `iat` validation) but without setting `clockTolerance`, causing it to default to 0 seconds.
14
14
+
15
15
+
**Impact:** OAuth authentication fails for any client with normal clock drift ahead of the PDS, violating the ATProto OAuth specification and industry standards (FAPI 2.0, RFC 9068).
16
16
+
17
17
+
**Fix:** Add `clockTolerance: 30` (or 60) parameter to the `jwtVerify` call in `client.ts`.
18
18
+
19
19
+
---
20
20
+
21
21
+
## Problem Description
22
22
+
23
23
+
### Observed Behavior
24
24
+
25
25
+
OAuth client assertion validation fails with the error:
26
26
+
27
27
+
```
28
28
+
InvalidClientError: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)
29
29
+
```
30
30
+
31
31
+
This occurs even when:
32
32
+
- Both systems have proper NTP synchronization
33
33
+
- Clock drift is minimal (observed: 115 milliseconds)
34
34
+
- The drift is well within industry-standard tolerances (30-60 seconds)
35
35
+
36
36
+
### Root Cause
37
37
+
38
38
+
**File:** `packages/oauth/oauth-provider/src/client/client.ts`
39
39
+
**Line:** ~240 (in `authenticate` method)
40
40
+
41
41
+
```typescript
42
42
+
const result = await this.jwtVerify<{
43
43
+
jti: string
44
44
+
exp?: number
45
45
+
}>(input.client_assertion, {
46
46
+
subject: this.id,
47
47
+
audience: checks.authorizationServerIdentifier,
48
48
+
requiredClaims: ['jti'],
49
49
+
maxTokenAge: CLIENT_ASSERTION_MAX_AGE / 1000,
50
50
+
// Missing: clockTolerance parameter
51
51
+
})
52
52
+
```
53
53
+
54
54
+
**The Issue:**
55
55
+
56
56
+
1. `maxTokenAge` is set, which triggers `iat` (Issued At) claim validation in the `jose` library
57
57
+
2. `clockTolerance` is **not set**, so `jose` defaults to `0 seconds`
58
58
+
3. Any client clock drift ahead of the PDS (even 1ms) causes rejection
59
59
+
4. The validation logic in `jose` is: `if (iat > now + clockTolerance) reject()`
60
60
+
61
61
+
---
62
62
+
63
63
+
## Evidence
64
64
+
65
65
+
### Timeline from Production Logs
66
66
+
67
67
+
**Example 1: Failed Authentication (ATCR AppView)**
68
68
+
69
69
+
```json
70
70
+
{
71
71
+
"time": 1763433826885, // PDS received request: 2025-11-18 02:43:46.885 UTC
72
72
+
"error": "iat claim timestamp check failed (it should be in the past)"
73
73
+
}
74
74
+
```
75
75
+
76
76
+
**Client assertion JWT payload:**
77
77
+
```json
78
78
+
{
79
79
+
"iat": 1763433827, // Token issued at: 2025-11-18 02:43:47.000 UTC
80
80
+
"exp": 1763433857
81
81
+
}
82
82
+
```
83
83
+
84
84
+
**Analysis:**
85
85
+
- PDS received request at: `02:43:46.885`
86
86
+
- JWT `iat` claim: `02:43:47.000`
87
87
+
- Time difference: **+115 milliseconds** (client ahead)
88
88
+
- Result: **REJECTED** ❌
89
89
+
90
90
+
---
91
91
+
92
92
+
**Example 2: Successful Authentication (tangled.org server)**
93
93
+
94
94
+
```json
95
95
+
{
96
96
+
"time": 1763434370365, // PDS received request: 2025-11-18 02:52:50.365 UTC
97
97
+
}
98
98
+
```
99
99
+
100
100
+
**Client assertion JWT payload:**
101
101
+
```json
102
102
+
{
103
103
+
"iat": 1763434370, // Token issued at: 2025-11-18 02:52:50.000 UTC
104
104
+
"exp": 1763434400
105
105
+
}
106
106
+
```
107
107
+
108
108
+
**Analysis:**
109
109
+
- PDS received request at: `02:52:50.365`
110
110
+
- JWT `iat` claim: `02:52:50.000`
111
111
+
- Time difference: **-365 milliseconds** (client behind)
112
112
+
- Result: **ACCEPTED** ✅
113
113
+
114
114
+
**Conclusion:** The PDS accepts tokens with `iat` in the past but rejects any token with `iat` in the future, regardless of how small the difference.
115
115
+
116
116
+
---
117
117
+
118
118
+
### Clock Synchronization Status
119
119
+
120
120
+
**ATCR AppView Server (Fedora, chronyd):**
121
121
+
- NTP Status: ✅ Synchronized
122
122
+
- Clock source: time.cloudflare.com
123
123
+
- Drift: Within normal NTP accuracy (5-100ms typical)
124
124
+
125
125
+
**Bluesky PDS (Kubernetes/Talos Linux):**
126
126
+
- NTP Status: ✅ Synchronized
127
127
+
- Clock source: time.cloudflare.com
128
128
+
- Talos node drift: +3.4ms ahead of NTP (observed)
129
129
+
130
130
+
**Both systems are properly synchronized.** The 115ms variance is normal for distributed systems with NTP.
131
131
+
132
132
+
---
133
133
+
134
134
+
## Specification Violations
135
135
+
136
136
+
### 1. ATProto OAuth Specification
137
137
+
138
138
+
**Quote from https://atproto.com/specs/oauth:**
139
139
+
140
140
+
> "Authorization Servers **should not reject client assertion JWTs generated less than a minute ago**"
141
141
+
142
142
+
**Interpretation:** The PDS should accept client assertions with `iat` timestamps within ~60 seconds (past or future) to account for clock skew.
143
143
+
144
144
+
**Current behavior:** Rejects any `iat` in the future, even by 1 millisecond.
145
145
+
146
146
+
**Verdict:** ❌ **VIOLATES ATProto spec**
147
147
+
148
148
+
---
149
149
+
150
150
+
### 2. FAPI 2.0 Security Profile (Financial-grade API)
151
151
+
152
152
+
**Quote from FAPI 2.0 spec:**
153
153
+
154
154
+
> "Authorization servers **MUST accept** JWTs with an `iat` or `nbf` timestamp between 0 and **10 seconds in the future**"
155
155
+
156
156
+
> "Authorization servers **SHALL reject** JWTs with an `iat` or `nbf` timestamp greater than **60 seconds in the future**"
157
157
+
158
158
+
**Rationale from spec:**
159
159
+
> "Even a few hundred milliseconds can cause rejection with clock skew... 10 seconds chosen to not affect security while increasing interoperability... Some ecosystems need 30 seconds to fully eliminate issues"
160
160
+
161
161
+
**Current behavior:** Rejects tokens 115ms in the future.
162
162
+
163
163
+
**Verdict:** ❌ **VIOLATES FAPI 2.0 minimum requirement (10s tolerance)**
164
164
+
165
165
+
---
166
166
+
167
167
+
### 3. RFC 9068 (JWT Profile for OAuth 2.0 Access Tokens)
168
168
+
169
169
+
**Quote:**
170
170
+
171
171
+
> "Implementers **MAY provide for some small leeway, usually no more than a few minutes**, to account for clock skew"
172
172
+
173
173
+
**Industry practice:** 30-60 seconds is the modern standard.
174
174
+
175
175
+
**Current behavior:** 0 seconds tolerance.
176
176
+
177
177
+
**Verdict:** ❌ **Below recommended practice**
178
178
+
179
179
+
---
180
180
+
181
181
+
### 4. RFC 9449 (DPoP - OAuth 2.0 Demonstrating Proof-of-Possession)
182
182
+
183
183
+
**Quote:**
184
184
+
185
185
+
> "To accommodate for clock offsets, the server **MAY accept DPoP proofs** that carry an `iat` time in the **reasonably near future (on the order of seconds or minutes)**"
186
186
+
187
187
+
**Current behavior:** Client assertions use similar JWT structure to DPoP proofs but have 0 tolerance.
188
188
+
189
189
+
**Verdict:** ❌ **Inconsistent with DPoP guidance**
190
190
+
191
191
+
---
192
192
+
193
193
+
## Industry Standards Analysis
194
194
+
195
195
+
### Library Defaults Comparison
196
196
+
197
197
+
| Library | Language | Default clockTolerance | Common Config |
198
198
+
|---------|----------|------------------------|---------------|
199
199
+
| **panva/jose** (PDS uses this) | JavaScript | **0s** ❌ | 30-60s |
200
200
+
| jsonwebtoken | Node.js | 0s | 30-60s |
201
201
+
| Spring Security | Java | 60s | 60s |
202
202
+
| nimbus-jose-jwt | Java | 60s | 60s |
203
203
+
| golang-jwt | Go | 0s | 60s |
204
204
+
| Okta JWT Verifier | Go | 120s | 120s |
205
205
+
206
206
+
**Key insight:** Modern libraries default to 0s (secure by default), but **application code must configure appropriate tolerance**. Enterprise libraries default to 60-120s for usability.
207
207
+
208
208
+
---
209
209
+
210
210
+
### OAuth Provider Recommendations
211
211
+
212
212
+
| Provider | Recommended clockTolerance |
213
213
+
|----------|---------------------------|
214
214
+
| Google | 30 seconds |
215
215
+
| Microsoft Azure AD | 300 seconds (5 minutes) |
216
216
+
| Okta | 120 seconds (2 minutes) |
217
217
+
| Auth0 | 5-30 seconds |
218
218
+
| **FAPI 2.0 (Banking)** | **10-60 seconds (10s minimum)** |
219
219
+
220
220
+
**Consensus:** 30-60 seconds is the modern standard for production OAuth systems.
221
221
+
222
222
+
---
223
223
+
224
224
+
### Real-World Clock Drift Expectations
225
225
+
226
226
+
**NTP Synchronization Accuracy:**
227
227
+
- Internet: 5-100ms typical (90% < 10ms)
228
228
+
- Same cloud provider, different regions: 10-50ms
229
229
+
- Multi-cloud/hybrid: Up to 200ms
230
230
+
- Mobile/edge devices: Up to 5 seconds
231
231
+
232
232
+
**Natural Clock Drift:**
233
233
+
- Typical RTC accuracy: 1-5 ppm (parts per million)
234
234
+
- Daily drift without NTP: ~0.4 seconds/day
235
235
+
- Network latency: Adds milliseconds to seconds
236
236
+
237
237
+
**Conclusion:** 115ms of drift with proper NTP is **completely normal** and expected in distributed systems.
238
238
+
239
239
+
---
240
240
+
241
241
+
## Proposed Fix
242
242
+
243
243
+
### One-Line Code Change
244
244
+
245
245
+
**File:** `packages/oauth/oauth-provider/src/client/client.ts`
246
246
+
**Location:** `authenticate` method (around line 240)
247
247
+
248
248
+
**Current code:**
249
249
+
```typescript
250
250
+
const result = await this.jwtVerify<{
251
251
+
jti: string
252
252
+
exp?: number
253
253
+
}>(input.client_assertion, {
254
254
+
subject: this.id,
255
255
+
audience: checks.authorizationServerIdentifier,
256
256
+
requiredClaims: ['jti'],
257
257
+
maxTokenAge: CLIENT_ASSERTION_MAX_AGE / 1000,
258
258
+
})
259
259
+
```
260
260
+
261
261
+
**Proposed fix:**
262
262
+
```typescript
263
263
+
const result = await this.jwtVerify<{
264
264
+
jti: string
265
265
+
exp?: number
266
266
+
}>(input.client_assertion, {
267
267
+
subject: this.id,
268
268
+
audience: checks.authorizationServerIdentifier,
269
269
+
requiredClaims: ['jti'],
270
270
+
maxTokenAge: CLIENT_ASSERTION_MAX_AGE / 1000,
271
271
+
clockTolerance: 30, // Accept tokens up to 30s in the future (FAPI-compliant)
272
272
+
})
273
273
+
```
274
274
+
275
275
+
**Alternative values:**
276
276
+
- **`clockTolerance: 10`** - FAPI 2.0 minimum requirement
277
277
+
- **`clockTolerance: 30`** - Recommended default (Google's practice)
278
278
+
- **`clockTolerance: 60`** - Maximum per FAPI 2.0, ATProto spec guidance
279
279
+
280
280
+
---
281
281
+
282
282
+
### Justification for 30 Seconds
283
283
+
284
284
+
**Security considerations:**
285
285
+
- 30 seconds is negligible for token expiration windows (typically 5-15 minutes)
286
286
+
- Does not meaningfully increase replay attack window
287
287
+
- Well within FAPI 2.0 maximum (60 seconds)
288
288
+
289
289
+
**Operational benefits:**
290
290
+
- Eliminates 99%+ of clock skew issues
291
291
+
- Accommodates normal NTP accuracy (5-100ms) with huge margin
292
292
+
- Handles network latency (typically <100ms)
293
293
+
- Prevents user-facing authentication failures
294
294
+
295
295
+
**Standards compliance:**
296
296
+
- ✅ Meets FAPI 2.0 minimum (10s) and maximum (60s)
297
297
+
- ✅ Aligns with ATProto spec ("less than a minute ago")
298
298
+
- ✅ Matches industry best practice (30-60s range)
299
299
+
- ✅ Consistent with Google's documented practice
300
300
+
301
301
+
---
302
302
+
303
303
+
## Testing Methodology
304
304
+
305
305
+
### Reproduction Steps
306
306
+
307
307
+
1. Set up two servers with independent NTP synchronization
308
308
+
2. Ensure Server A's clock is 100-500ms ahead of Server B
309
309
+
3. Configure OAuth client on Server A to authenticate against PDS on Server B
310
310
+
4. Attempt client assertion-based OAuth flow
311
311
+
5. Observe validation failure with `iat` error
312
312
+
313
313
+
### Verification After Fix
314
314
+
315
315
+
1. Apply the proposed code change (add `clockTolerance: 30`)
316
316
+
2. Rebuild and deploy PDS
317
317
+
3. Retry OAuth flow from Step 3 above
318
318
+
4. Confirm successful authentication
319
319
+
320
320
+
### Test Cases
321
321
+
322
322
+
**Should ACCEPT (with 30s tolerance):**
323
323
+
- ✅ `iat` 115ms in the future (observed case)
324
324
+
- ✅ `iat` 5 seconds in the future
325
325
+
- ✅ `iat` 29 seconds in the future
326
326
+
- ✅ `iat` exactly 30 seconds in the future
327
327
+
- ✅ `iat` 1 second in the past
328
328
+
- ✅ `iat` 5 minutes in the past (within `maxTokenAge`)
329
329
+
330
330
+
**Should REJECT:**
331
331
+
- ❌ `iat` 31 seconds in the future
332
332
+
- ❌ `iat` more than `maxTokenAge` seconds in the past
333
333
+
- ❌ Invalid JWT signature
334
334
+
- ❌ Missing required claims
335
335
+
336
336
+
---
337
337
+
338
338
+
## Impact Assessment
339
339
+
340
340
+
### Severity: HIGH
341
341
+
342
342
+
**User impact:**
343
343
+
- OAuth authentication fails intermittently based on clock variance
344
344
+
- Affects any OAuth client whose clock is ahead of PDS
345
345
+
- Unpredictable failures (works sometimes, fails other times)
346
346
+
- Poor developer experience (confusing error message)
347
347
+
348
348
+
**Affected scenarios:**
349
349
+
- Docker/Podman registries authenticating to ATCR
350
350
+
- Third-party OAuth clients (tangled.org works only because clock is behind)
351
351
+
- Distributed systems with independent time synchronization
352
352
+
- Cloud environments with clock drift (VMs, containers)
353
353
+
354
354
+
**Current workarounds:**
355
355
+
1. Ensure OAuth client clock is always behind PDS (impractical)
356
356
+
2. Fork indigo library to send older `iat` timestamps (client-side hack)
357
357
+
3. Patch PDS with custom Docker image (deployment complexity)
358
358
+
359
359
+
**None of these are acceptable long-term solutions.**
360
360
+
361
361
+
---
362
362
+
363
363
+
## Recommended Actions
364
364
+
365
365
+
### Immediate (Bluesky Team)
366
366
+
367
367
+
1. **Apply the one-line fix** to `oauth-provider/src/client/client.ts`
368
368
+
2. **Add `clockTolerance: 30`** to the `jwtVerify` call
369
369
+
3. **Publish new version** of `@atproto/oauth-provider` package
370
370
+
4. **Update PDS** to use fixed version
371
371
+
372
372
+
### Short-term (Bluesky Team)
373
373
+
374
374
+
1. **Add configuration option** for `clockTolerance` (allow deployments to adjust)
375
375
+
2. **Document the setting** in PDS configuration docs
376
376
+
3. **Add logging** to track clock skew patterns (for monitoring)
377
377
+
378
378
+
### Long-term (Bluesky Team)
379
379
+
380
380
+
1. **Add comprehensive time validation tests** covering clock skew scenarios
381
381
+
2. **Document OAuth timing requirements** in ATProto spec
382
382
+
3. **Consider** implementing server-provided nonces (DPoP pattern) for stricter validation without clock dependency
383
383
+
384
384
+
### For ATCR Project
385
385
+
386
386
+
**Until upstream fix:**
387
387
+
1. Document this issue in ATCR troubleshooting guide
388
388
+
2. Implement client-side workaround (fork indigo with `-1s` offset in `iat`)
389
389
+
3. Monitor for PDS updates with the fix
390
390
+
391
391
+
**After upstream fix:**
392
392
+
1. Update to fixed PDS version
393
393
+
2. Remove client-side workaround
394
394
+
3. Document resolution in changelog
395
395
+
396
396
+
---
397
397
+
398
398
+
## References
399
399
+
400
400
+
### Official Specifications
401
401
+
402
402
+
1. **ATProto OAuth Specification**
403
403
+
https://atproto.com/specs/oauth
404
404
+
Section: Client Assertion Validation
405
405
+
406
406
+
2. **FAPI 2.0 Security Profile**
407
407
+
https://openid.net/specs/fapi-security-profile-2_0-final.html
408
408
+
Section 5.2.2.1: Authorization Server - Time Validation
409
409
+
410
410
+
3. **RFC 7519 - JSON Web Token (JWT)**
411
411
+
https://datatracker.ietf.org/doc/html/rfc7519
412
412
+
Section 4.1.6: "iat" (Issued At) Claim
413
413
+
414
414
+
4. **RFC 9068 - JWT Profile for OAuth 2.0 Access Tokens**
415
415
+
https://datatracker.ietf.org/doc/rfc9068/
416
416
+
Section 2.2.2: Clock Skew
417
417
+
418
418
+
5. **RFC 9449 - OAuth 2.0 Demonstrating Proof-of-Possession (DPoP)**
419
419
+
https://datatracker.ietf.org/doc/html/rfc9449
420
420
+
Section 4.3: Checking DPoP Proofs
421
421
+
422
422
+
### Library Documentation
423
423
+
424
424
+
6. **panva/jose - JWT Verify Options**
425
425
+
https://github.com/panva/jose/blob/main/docs/jwt/verify/interfaces/JWTVerifyOptions.md
426
426
+
Documentation for `clockTolerance` parameter
427
427
+
428
428
+
7. **jose Source Code - JWT Claims Validation**
429
429
+
https://github.com/panva/jose/blob/main/src/lib/jwt_claims_set.ts
430
430
+
Shows default `clockTolerance = 0` when undefined
431
431
+
432
432
+
### Related Issues
433
433
+
434
434
+
8. **Bluesky atproto Repository**
435
435
+
https://github.com/bluesky-social/atproto
436
436
+
(Issue to be filed with this report)
437
437
+
438
438
+
9. **ATCR Project Documentation**
439
439
+
https://github.com/your-org/atcr
440
440
+
OAuth troubleshooting guide
441
441
+
442
442
+
---
443
443
+
444
444
+
## Appendix: Alternative Solutions Considered
445
445
+
446
446
+
### Option 1: Client-side Workaround (Fork indigo)
447
447
+
448
448
+
**Implementation:** Modify indigo's `NewClientAssertion` to subtract 1 second from `iat`
449
449
+
450
450
+
**Pros:**
451
451
+
- Quick fix for ATCR
452
452
+
- No PDS changes needed
453
453
+
- Full control over timing offset
454
454
+
455
455
+
**Cons:**
456
456
+
- Doesn't fix root cause
457
457
+
- Must maintain fork
458
458
+
- Other OAuth clients still affected
459
459
+
- Not a proper solution
460
460
+
461
461
+
**Verdict:** ⚠️ Temporary workaround only
462
462
+
463
463
+
---
464
464
+
465
465
+
### Option 2: Use Server-Provided Nonces
466
466
+
467
467
+
**Implementation:** PDS provides time-based nonce in error response, client includes in retry
468
468
+
469
469
+
**Pros:**
470
470
+
- Eliminates clock skew dependency entirely
471
471
+
- Stronger security model
472
472
+
- DPoP already uses this pattern
473
473
+
474
474
+
**Cons:**
475
475
+
- Requires significant changes to OAuth flow
476
476
+
- Adds latency (extra round trip)
477
477
+
- Not backward compatible
478
478
+
- Complex implementation
479
479
+
480
480
+
**Verdict:** 🔄 Consider for future enhancement, not immediate fix
481
481
+
482
482
+
---
483
483
+
484
484
+
### Option 3: Disable `maxTokenAge` Validation
485
485
+
486
486
+
**Implementation:** Remove `maxTokenAge` parameter from `jwtVerify` call
487
487
+
488
488
+
**Pros:**
489
489
+
- Eliminates `iat` validation
490
490
+
- Simple one-line change
491
491
+
492
492
+
**Cons:**
493
493
+
- ❌ Removes important security check (token age validation)
494
494
+
- ❌ Allows arbitrarily old tokens to be used
495
495
+
- ❌ Not a proper fix
496
496
+
497
497
+
**Verdict:** ❌ Not recommended - security regression
498
498
+
499
499
+
---
500
500
+
501
501
+
### Option 4: Add `clockTolerance` Parameter (Recommended)
502
502
+
503
503
+
**Implementation:** Add `clockTolerance: 30` to existing `jwtVerify` call
504
504
+
505
505
+
**Pros:**
506
506
+
- ✅ Minimal code change (one line)
507
507
+
- ✅ Fixes root cause
508
508
+
- ✅ Spec-compliant (FAPI 2.0, ATProto, RFCs)
509
509
+
- ✅ Industry standard practice
510
510
+
- ✅ No security regression
511
511
+
- ✅ Benefits all OAuth clients
512
512
+
513
513
+
**Cons:**
514
514
+
- None significant
515
515
+
516
516
+
**Verdict:** ✅ **Recommended solution**
517
517
+
518
518
+
---
519
519
+
520
520
+
## Conclusion
521
521
+
522
522
+
The Bluesky PDS OAuth provider has a clear bug: it validates client assertion JWTs with zero clock tolerance, causing authentication failures for properly synchronized systems with normal clock drift. This violates the ATProto OAuth specification, FAPI 2.0 requirements, and industry best practices.
523
523
+
524
524
+
The fix is trivial (one line of code), has no security downsides, and will improve interoperability for all OAuth clients authenticating to Bluesky PDS instances.
525
525
+
526
526
+
**Recommended action:** Add `clockTolerance: 30` to the `jwtVerify` call in `oauth-provider/src/client/client.ts`.
527
527
+
528
528
+
---
529
529
+
530
530
+
**Report Version:** 1.0
531
531
+
**Last Updated:** 2025-11-18
532
532
+
**Contact:** ATCR Project Team
+433
docs/TROUBLESHOOTING.md
···
1
1
+
# ATCR Troubleshooting Guide
2
2
+
3
3
+
This document provides troubleshooting guidance for common ATCR deployment and operational issues.
4
4
+
5
5
+
## OAuth Authentication Failures
6
6
+
7
7
+
### JWT Timestamp Validation Errors
8
8
+
9
9
+
**Symptom:**
10
10
+
```
11
11
+
error: invalid_client
12
12
+
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)
13
13
+
```
14
14
+
15
15
+
**Root Cause:**
16
16
+
The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the "iat" (issued at) claim appears to be in the future from the PDS's perspective.
17
17
+
18
18
+
**Diagnosis:**
19
19
+
20
20
+
1. Check AppView system time:
21
21
+
```bash
22
22
+
date -u
23
23
+
timedatectl status
24
24
+
```
25
25
+
26
26
+
2. Check if NTP is active and synchronized:
27
27
+
```bash
28
28
+
timedatectl show-timesync --all
29
29
+
```
30
30
+
31
31
+
3. Compare AppView time with PDS time (if accessible):
32
32
+
```bash
33
33
+
# On AppView
34
34
+
date +%s
35
35
+
36
36
+
# On PDS (or via HTTP headers)
37
37
+
curl -I https://your-pds.example.com | grep -i date
38
38
+
```
39
39
+
40
40
+
4. Check AppView logs for clock information (logged at startup):
41
41
+
```bash
42
42
+
docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"
43
43
+
```
44
44
+
45
45
+
Example log output:
46
46
+
```
47
47
+
level=INFO msg="Configured confidential OAuth client"
48
48
+
key_id=did:key:z...
49
49
+
system_time_unix=1731844215
50
50
+
system_time_rfc3339=2025-11-17T14:30:15Z
51
51
+
timezone=UTC
52
52
+
```
53
53
+
54
54
+
**Solution:**
55
55
+
56
56
+
1. **Enable NTP synchronization** (recommended):
57
57
+
58
58
+
On most Linux systems using systemd:
59
59
+
```bash
60
60
+
# Enable and start systemd-timesyncd
61
61
+
sudo timedatectl set-ntp true
62
62
+
63
63
+
# Verify NTP is active
64
64
+
timedatectl status
65
65
+
```
66
66
+
67
67
+
Expected output:
68
68
+
```
69
69
+
System clock synchronized: yes
70
70
+
NTP service: active
71
71
+
```
72
72
+
73
73
+
2. **Alternative: Use chrony** (if systemd-timesyncd is not available):
74
74
+
```bash
75
75
+
# Install chrony
76
76
+
sudo apt-get install chrony # Debian/Ubuntu
77
77
+
sudo yum install chrony # RHEL/CentOS
78
78
+
79
79
+
# Enable and start chronyd
80
80
+
sudo systemctl enable chronyd
81
81
+
sudo systemctl start chronyd
82
82
+
83
83
+
# Check sync status
84
84
+
chronyc tracking
85
85
+
```
86
86
+
87
87
+
3. **Force immediate sync**:
88
88
+
```bash
89
89
+
# systemd-timesyncd
90
90
+
sudo systemctl restart systemd-timesyncd
91
91
+
92
92
+
# Or with chrony
93
93
+
sudo chronyc makestep
94
94
+
```
95
95
+
96
96
+
4. **In Docker/Kubernetes environments:**
97
97
+
98
98
+
The container inherits the host's system clock, so fix NTP on the **host** machine:
99
99
+
```bash
100
100
+
# On Docker host
101
101
+
sudo timedatectl set-ntp true
102
102
+
103
103
+
# Restart AppView container to pick up correct time
104
104
+
docker restart atcr-appview
105
105
+
```
106
106
+
107
107
+
5. **Verify clock skew is resolved**:
108
108
+
```bash
109
109
+
# Should show clock offset < 1 second
110
110
+
timedatectl timesync-status
111
111
+
```
112
112
+
113
113
+
**Acceptable Clock Skew:**
114
114
+
- Most OAuth implementations tolerate ±30-60 seconds of clock skew
115
115
+
- DPoP proof validation is typically stricter (±10 seconds)
116
116
+
- Aim for < 1 second skew for reliable operation
117
117
+
118
118
+
**Prevention:**
119
119
+
- Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
120
120
+
- Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
121
121
+
- Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically
122
122
+
123
123
+
---
124
124
+
125
125
+
### DPoP Nonce Mismatch Errors
126
126
+
127
127
+
**Symptom:**
128
128
+
```
129
129
+
error: use_dpop_nonce
130
130
+
error_description: DPoP "nonce" mismatch
131
131
+
```
132
132
+
133
133
+
Repeated multiple times, potentially followed by:
134
134
+
```
135
135
+
error: server_error
136
136
+
error_description: Server error
137
137
+
```
138
138
+
139
139
+
**Root Cause:**
140
140
+
DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:
141
141
+
1. Multiple concurrent requests create a DPoP nonce race condition
142
142
+
2. Clock skew causes DPoP proof timestamps to fail validation
143
143
+
3. PDS session state becomes corrupted after repeated failures
144
144
+
145
145
+
**Diagnosis:**
146
146
+
147
147
+
1. Check if errors occur during concurrent operations:
148
148
+
```bash
149
149
+
# During docker push with multiple layers
150
150
+
docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
151
151
+
```
152
152
+
153
153
+
2. Check for clock skew (see section above):
154
154
+
```bash
155
155
+
timedatectl status
156
156
+
```
157
157
+
158
158
+
3. Look for session lock acquisition in logs:
159
159
+
```bash
160
160
+
docker logs atcr-appview 2>&1 | grep "Acquired session lock"
161
161
+
```
162
162
+
163
163
+
**Solution:**
164
164
+
165
165
+
1. **If caused by clock skew**: Fix NTP synchronization (see section above)
166
166
+
167
167
+
2. **If caused by session corruption**:
168
168
+
```bash
169
169
+
# The AppView will automatically delete corrupted sessions
170
170
+
# User just needs to re-authenticate
171
171
+
docker login atcr.io
172
172
+
```
173
173
+
174
174
+
3. **If persistent despite clock sync**:
175
175
+
- Check PDS health and logs (may be a PDS-side issue)
176
176
+
- Verify network connectivity between AppView and PDS
177
177
+
- Check if PDS supports latest OAuth/DPoP specifications
178
178
+
179
179
+
**What ATCR does automatically:**
180
180
+
- Per-DID locking prevents concurrent DPoP nonce races
181
181
+
- Indigo library automatically retries with fresh nonces
182
182
+
- Sessions are auto-deleted after repeated failures
183
183
+
- Service token cache prevents excessive PDS requests
184
184
+
185
185
+
**Prevention:**
186
186
+
- Ensure reliable NTP synchronization
187
187
+
- Use a stable, well-maintained PDS implementation
188
188
+
- Monitor AppView error rates for DPoP-related issues
189
189
+
190
190
+
---
191
191
+
192
192
+
### OAuth Session Not Found
193
193
+
194
194
+
**Symptom:**
195
195
+
```
196
196
+
error: failed to get OAuth session: no session found for DID
197
197
+
```
198
198
+
199
199
+
**Root Cause:**
200
200
+
- User has never authenticated via OAuth
201
201
+
- OAuth session was deleted due to corruption or expiry
202
202
+
- Database migration cleared sessions
203
203
+
204
204
+
**Solution:**
205
205
+
206
206
+
1. User re-authenticates via OAuth flow:
207
207
+
```bash
208
208
+
docker login atcr.io
209
209
+
# Or for web UI: visit https://atcr.io/login
210
210
+
```
211
211
+
212
212
+
2. If using app passwords (legacy), verify token is cached:
213
213
+
```bash
214
214
+
# Check if app-password token exists
215
215
+
docker logout atcr.io
216
216
+
docker login atcr.io -u your.handle -p your-app-password
217
217
+
```
218
218
+
219
219
+
---
220
220
+
221
221
+
## AppView Deployment Issues
222
222
+
223
223
+
### Client Metadata URL Not Accessible
224
224
+
225
225
+
**Symptom:**
226
226
+
```
227
227
+
error: unauthorized_client
228
228
+
error_description: Client metadata endpoint returned 404
229
229
+
```
230
230
+
231
231
+
**Root Cause:**
232
232
+
PDS cannot fetch OAuth client metadata from `{ATCR_BASE_URL}/client-metadata.json`
233
233
+
234
234
+
**Diagnosis:**
235
235
+
236
236
+
1. Verify client metadata endpoint is accessible:
237
237
+
```bash
238
238
+
curl https://your-atcr-instance.com/client-metadata.json
239
239
+
```
240
240
+
241
241
+
2. Check AppView logs for startup errors:
242
242
+
```bash
243
243
+
docker logs atcr-appview 2>&1 | grep "client-metadata"
244
244
+
```
245
245
+
246
246
+
3. Verify `ATCR_BASE_URL` is set correctly:
247
247
+
```bash
248
248
+
echo $ATCR_BASE_URL
249
249
+
```
250
250
+
251
251
+
**Solution:**
252
252
+
253
253
+
1. Ensure `ATCR_BASE_URL` matches your public URL:
254
254
+
```bash
255
255
+
export ATCR_BASE_URL=https://atcr.example.com
256
256
+
```
257
257
+
258
258
+
2. Verify reverse proxy (nginx, Caddy, etc.) routes `/.well-known/*` and `/client-metadata.json`:
259
259
+
```nginx
260
260
+
location / {
261
261
+
proxy_pass http://localhost:5000;
262
262
+
proxy_set_header Host $host;
263
263
+
proxy_set_header X-Forwarded-Proto $scheme;
264
264
+
}
265
265
+
```
266
266
+
267
267
+
3. Check firewall rules allow inbound HTTPS:
268
268
+
```bash
269
269
+
sudo ufw status
270
270
+
sudo iptables -L -n | grep 443
271
271
+
```
272
272
+
273
273
+
---
274
274
+
275
275
+
## Hold Service Issues
276
276
+
277
277
+
### Blob Storage Connectivity
278
278
+
279
279
+
**Symptom:**
280
280
+
```
281
281
+
error: failed to upload blob: connection refused
282
282
+
```
283
283
+
284
284
+
**Diagnosis:**
285
285
+
286
286
+
1. Check hold service logs:
287
287
+
```bash
288
288
+
docker logs atcr-hold 2>&1 | grep -i error
289
289
+
```
290
290
+
291
291
+
2. Verify S3 credentials are correct:
292
292
+
```bash
293
293
+
# Test S3 access
294
294
+
aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
295
295
+
```
296
296
+
297
297
+
3. Check hold configuration:
298
298
+
```bash
299
299
+
env | grep -E "(S3_|AWS_|STORAGE_)"
300
300
+
```
301
301
+
302
302
+
**Solution:**
303
303
+
304
304
+
1. Verify environment variables in hold service:
305
305
+
```bash
306
306
+
export AWS_ACCESS_KEY_ID=your-key
307
307
+
export AWS_SECRET_ACCESS_KEY=your-secret
308
308
+
export S3_BUCKET=your-bucket
309
309
+
export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
310
310
+
```
311
311
+
312
312
+
2. Test S3 connectivity from hold container:
313
313
+
```bash
314
314
+
docker exec atcr-hold curl -v $S3_ENDPOINT
315
315
+
```
316
316
+
317
317
+
3. Check S3 bucket permissions (requires PutObject, GetObject, DeleteObject)
318
318
+
319
319
+
---
320
320
+
321
321
+
## Performance Issues
322
322
+
323
323
+
### High Database Lock Contention
324
324
+
325
325
+
**Symptom:**
326
326
+
Slow Docker push/pull operations, high CPU usage on AppView
327
327
+
328
328
+
**Diagnosis:**
329
329
+
330
330
+
1. Check SQLite database size:
331
331
+
```bash
332
332
+
ls -lh /var/lib/atcr/ui.db
333
333
+
```
334
334
+
335
335
+
2. Look for long-running queries:
336
336
+
```bash
337
337
+
docker logs atcr-appview 2>&1 | grep "database is locked"
338
338
+
```
339
339
+
340
340
+
**Solution:**
341
341
+
342
342
+
1. For production, migrate to PostgreSQL (recommended):
343
343
+
```bash
344
344
+
export ATCR_UI_DATABASE_TYPE=postgres
345
345
+
export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
346
346
+
```
347
347
+
348
348
+
2. Or increase SQLite busy timeout:
349
349
+
```go
350
350
+
// In code: db.SetMaxOpenConns(1) for SQLite
351
351
+
```
352
352
+
353
353
+
3. Vacuum the database to reclaim space:
354
354
+
```bash
355
355
+
sqlite3 /var/lib/atcr/ui.db "VACUUM;"
356
356
+
```
357
357
+
358
358
+
---
359
359
+
360
360
+
## Logging and Debugging
361
361
+
362
362
+
### Enable Debug Logging
363
363
+
364
364
+
Set log level to debug for detailed troubleshooting:
365
365
+
366
366
+
```bash
367
367
+
export ATCR_LOG_LEVEL=debug
368
368
+
docker restart atcr-appview
369
369
+
```
370
370
+
371
371
+
### Useful Log Queries
372
372
+
373
373
+
**OAuth token exchange errors:**
374
374
+
```bash
375
375
+
docker logs atcr-appview 2>&1 | grep "OAuth callback failed"
376
376
+
```
377
377
+
378
378
+
**Service token request failures:**
379
379
+
```bash
380
380
+
docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"
381
381
+
```
382
382
+
383
383
+
**Clock diagnostics:**
384
384
+
```bash
385
385
+
docker logs atcr-appview 2>&1 | grep "system_time"
386
386
+
```
387
387
+
388
388
+
**DPoP nonce issues:**
389
389
+
```bash
390
390
+
docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"
391
391
+
```
392
392
+
393
393
+
### Health Checks
394
394
+
395
395
+
**AppView health:**
396
396
+
```bash
397
397
+
curl http://localhost:5000/v2/
398
398
+
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
399
399
+
```
400
400
+
401
401
+
**Hold service health:**
402
402
+
```bash
403
403
+
curl http://localhost:8080/.well-known/did.json
404
404
+
# Should return DID document
405
405
+
```
406
406
+
407
407
+
---
408
408
+
409
409
+
## Getting Help
410
410
+
411
411
+
If issues persist after following this guide:
412
412
+
413
413
+
1. **Check GitHub Issues**: https://github.com/ericvolp12/atcr/issues
414
414
+
2. **Collect logs**: Include output from `docker logs` for AppView and Hold services
415
415
+
3. **Include diagnostics**:
416
416
+
- `timedatectl status` output
417
417
+
- AppView version: `docker exec atcr-appview cat /VERSION` (if available)
418
418
+
- PDS version and implementation (Bluesky PDS, other)
419
419
+
4. **File an issue** with reproducible steps
420
420
+
421
421
+
---
422
422
+
423
423
+
## Common Error Reference
424
424
+
425
425
+
| Error Code | Component | Common Cause | Fix |
426
426
+
|------------|-----------|--------------|-----|
427
427
+
| `invalid_client` (iat timestamp) | OAuth | Clock skew | Enable NTP sync |
428
428
+
| `use_dpop_nonce` | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry |
429
429
+
| `server_error` (500) | PDS | PDS internal error | Check PDS logs |
430
430
+
| `invalid_grant` | OAuth | Expired auth code | Retry OAuth flow |
431
431
+
| `unauthorized_client` | OAuth | Client metadata unreachable | Check ATCR_BASE_URL and firewall |
432
432
+
| `RecordNotFound` | ATProto | Manifest doesn't exist | Verify repository name |
433
433
+
| Connection refused | Hold/S3 | Network/credentials | Check S3 config and connectivity |
+18
pkg/appview/handlers/api.go
···
63
63
// Write star record to user's PDS
64
64
_, err = pdsClient.PutRecord(r.Context(), atproto.StarCollection, rkey, starRecord)
65
65
if err != nil {
66
66
+
// Check if OAuth error - if so, invalidate sessions and return 401
67
67
+
if handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
68
68
+
http.Error(w, "Authentication failed, please log in again", http.StatusUnauthorized)
69
69
+
return
70
70
+
}
66
71
slog.Error("Failed to create star record", "error", err)
67
72
http.Error(w, fmt.Sprintf("Failed to create star: %v", err), http.StatusInternalServerError)
68
73
return
···
121
126
if err != nil {
122
127
// If record doesn't exist, still return success (idempotent)
123
128
if !errors.Is(err, atproto.ErrRecordNotFound) {
129
129
+
// Check if OAuth error - if so, invalidate sessions and return 401
130
130
+
if handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
131
131
+
http.Error(w, "Authentication failed, please log in again", http.StatusUnauthorized)
132
132
+
return
133
133
+
}
124
134
slog.Error("Failed to delete star record", "error", err)
125
135
http.Error(w, fmt.Sprintf("Failed to delete star: %v", err), http.StatusInternalServerError)
126
136
return
···
179
189
// Check if star record exists
180
190
rkey := atproto.StarRecordKey(ownerDID, repository)
181
191
_, err = pdsClient.GetRecord(r.Context(), atproto.StarCollection, rkey)
192
192
+
193
193
+
// Check if OAuth error - if so, invalidate sessions
194
194
+
if err != nil && handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
195
195
+
// For a read operation, just return not starred instead of error
196
196
+
w.Header().Set("Content-Type", "application/json")
197
197
+
json.NewEncoder(w).Encode(map[string]bool{"starred": false})
198
198
+
return
199
199
+
}
182
200
183
201
starred := err == nil
184
202
+15
pkg/appview/handlers/images.go
···
47
47
48
48
// Delete from PDS first
49
49
if err := pdsClient.DeleteRecord(r.Context(), atproto.TagCollection, rkey); err != nil {
50
50
+
// Check if OAuth error - if so, invalidate sessions and return 401
51
51
+
if handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
52
52
+
http.Error(w, "Authentication failed, please log in again", http.StatusUnauthorized)
53
53
+
return
54
54
+
}
50
55
http.Error(w, fmt.Sprintf("Failed to delete tag from PDS: %v", err), http.StatusInternalServerError)
51
56
return
52
57
}
···
127
132
// Delete from PDS
128
133
tagRKey := fmt.Sprintf("%s:%s", repo, tag)
129
134
if err := pdsClient.DeleteRecord(r.Context(), atproto.TagCollection, tagRKey); err != nil {
135
135
+
// Check if OAuth error - if so, invalidate sessions and return 401
136
136
+
if handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
137
137
+
http.Error(w, "Authentication failed, please log in again", http.StatusUnauthorized)
138
138
+
return
139
139
+
}
130
140
http.Error(w, fmt.Sprintf("Failed to delete tag '%s' from PDS: %v", tag, err), http.StatusInternalServerError)
131
141
return
132
142
}
···
144
154
145
155
// Delete from PDS first
146
156
if err := pdsClient.DeleteRecord(r.Context(), atproto.ManifestCollection, rkey); err != nil {
157
157
+
// Check if OAuth error - if so, invalidate sessions and return 401
158
158
+
if handleOAuthError(r.Context(), h.Refresher, user.DID, err) {
159
159
+
http.Error(w, "Authentication failed, please log in again", http.StatusUnauthorized)
160
160
+
return
161
161
+
}
147
162
http.Error(w, fmt.Sprintf("Failed to delete manifest from PDS: %v", err), http.StatusInternalServerError)
148
163
return
149
164
}
+6
-38
pkg/appview/handlers/logout.go
···
1
1
package handlers
2
2
3
3
import (
4
4
-
"log/slog"
5
4
"net/http"
6
5
7
6
"atcr.io/pkg/appview/db"
8
8
-
"atcr.io/pkg/auth/oauth"
9
9
-
indigooauth "github.com/bluesky-social/indigo/atproto/auth/oauth"
10
10
-
"github.com/bluesky-social/indigo/atproto/syntax"
11
7
)
12
8
13
13
-
// LogoutHandler handles user logout with proper OAuth token revocation
9
9
+
// LogoutHandler handles user logout from the web UI
10
10
+
// This only clears the current UI session cookie - it does NOT revoke OAuth tokens
11
11
+
// OAuth sessions remain intact so other browser tabs/devices stay logged in
14
12
type LogoutHandler struct {
15
15
-
OAuthClientApp *indigooauth.ClientApp
16
16
-
Refresher *oauth.Refresher
17
17
-
SessionStore *db.SessionStore
18
18
-
OAuthStore *db.OAuthStore
13
13
+
SessionStore *db.SessionStore
19
14
}
20
15
21
16
func (h *LogoutHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
···
27
22
return
28
23
}
29
24
30
30
-
// Get UI session to extract OAuth session ID and user info
31
31
-
uiSession, ok := h.SessionStore.Get(uiSessionID)
32
32
-
if ok && uiSession != nil && uiSession.DID != "" {
33
33
-
// Parse DID for OAuth logout
34
34
-
did, err := syntax.ParseDID(uiSession.DID)
35
35
-
if err != nil {
36
36
-
slog.Warn("Failed to parse DID for logout", "component", "logout", "did", uiSession.DID, "error", err)
37
37
-
} else {
38
38
-
// Attempt to revoke OAuth tokens on PDS side
39
39
-
if uiSession.OAuthSessionID != "" {
40
40
-
// Call indigo's Logout to revoke tokens on PDS
41
41
-
if err := h.OAuthClientApp.Logout(r.Context(), did, uiSession.OAuthSessionID); err != nil {
42
42
-
// Log error but don't block logout - best effort revocation
43
43
-
slog.Warn("Failed to revoke OAuth tokens on PDS", "component", "logout", "did", uiSession.DID, "error", err)
44
44
-
} else {
45
45
-
slog.Info("Successfully revoked OAuth tokens on PDS", "component", "logout", "did", uiSession.DID)
46
46
-
}
47
47
-
48
48
-
// Delete OAuth session from database (cleanup, might already be done by Logout)
49
49
-
if err := h.OAuthStore.DeleteSession(r.Context(), did, uiSession.OAuthSessionID); err != nil {
50
50
-
slog.Warn("Failed to delete OAuth session from database", "component", "logout", "error", err)
51
51
-
}
52
52
-
} else {
53
53
-
slog.Warn("No OAuth session ID found for user", "component", "logout", "did", uiSession.DID)
54
54
-
}
55
55
-
}
56
56
-
}
57
57
-
58
58
-
// Always delete UI session and clear cookie, even if OAuth revocation failed
25
25
+
// Delete only this UI session and clear cookie
26
26
+
// OAuth session remains intact for other browser tabs/devices
59
27
h.SessionStore.Delete(uiSessionID)
60
28
db.ClearCookie(w)
61
29
-1
pkg/appview/handlers/logout_test.go
···
57
57
58
58
handler := &LogoutHandler{
59
59
SessionStore: sessionStore,
60
60
-
OAuthStore: db.NewOAuthStore(database),
61
60
}
62
61
63
62
req := httptest.NewRequest("GET", "/auth/logout", nil)
+49
pkg/appview/handlers/oauth_errors.go
···
1
1
+
package handlers
2
2
+
3
3
+
import (
4
4
+
"context"
5
5
+
"log/slog"
6
6
+
"strings"
7
7
+
8
8
+
"atcr.io/pkg/auth/oauth"
9
9
+
)
10
10
+
11
11
+
// isOAuthError checks if an error indicates OAuth authentication failure
12
12
+
// These errors indicate the OAuth session is invalid and should be cleaned up
13
13
+
func isOAuthError(err error) bool {
14
14
+
if err == nil {
15
15
+
return false
16
16
+
}
17
17
+
errStr := strings.ToLower(err.Error())
18
18
+
return strings.Contains(errStr, "401") ||
19
19
+
strings.Contains(errStr, "403") ||
20
20
+
strings.Contains(errStr, "invalid_token") ||
21
21
+
strings.Contains(errStr, "invalid_grant") ||
22
22
+
strings.Contains(errStr, "use_dpop_nonce") ||
23
23
+
strings.Contains(errStr, "unauthorized") ||
24
24
+
strings.Contains(errStr, "token") && strings.Contains(errStr, "expired") ||
25
25
+
strings.Contains(errStr, "authentication failed")
26
26
+
}
27
27
+
28
28
+
// handleOAuthError checks if an error is OAuth-related and invalidates UI sessions if so
29
29
+
// Returns true if the error was an OAuth error (caller should return early)
30
30
+
func handleOAuthError(ctx context.Context, refresher *oauth.Refresher, did string, err error) bool {
31
31
+
if !isOAuthError(err) {
32
32
+
return false
33
33
+
}
34
34
+
35
35
+
slog.Warn("OAuth error detected, invalidating sessions",
36
36
+
"component", "handlers",
37
37
+
"did", did,
38
38
+
"error", err)
39
39
+
40
40
+
// Invalidate all UI sessions for this DID
41
41
+
if delErr := refresher.DeleteSession(ctx, did); delErr != nil {
42
42
+
slog.Warn("Failed to delete OAuth session after error",
43
43
+
"component", "handlers",
44
44
+
"did", did,
45
45
+
"error", delErr)
46
46
+
}
47
47
+
48
48
+
return true
49
49
+
}
+221
-54
pkg/appview/middleware/registry.go
···
7
7
"log/slog"
8
8
"net/http"
9
9
"strings"
10
10
+
"sync"
11
11
+
"time"
10
12
11
13
"github.com/distribution/distribution/v3"
12
14
"github.com/distribution/distribution/v3/registry/api/errcode"
···
27
29
// authMethodKey is the context key for storing auth method from JWT
28
30
const authMethodKey contextKey = "auth.method"
29
31
32
32
+
// validationCacheEntry stores a validated service token with expiration
33
33
+
type validationCacheEntry struct {
34
34
+
serviceToken string
35
35
+
validUntil time.Time
36
36
+
err error // Cached error for fast-fail
37
37
+
mu sync.Mutex // Per-entry lock to serialize cache population
38
38
+
inFlight bool // True if another goroutine is fetching the token
39
39
+
done chan struct{} // Closed when fetch completes
40
40
+
}
41
41
+
42
42
+
// validationCache provides request-level caching for service tokens
43
43
+
// This prevents concurrent layer uploads from racing on OAuth/DPoP requests
44
44
+
type validationCache struct {
45
45
+
mu sync.RWMutex
46
46
+
entries map[string]*validationCacheEntry // key: "did:holdDID"
47
47
+
}
48
48
+
49
49
+
// newValidationCache creates a new validation cache
50
50
+
func newValidationCache() *validationCache {
51
51
+
return &validationCache{
52
52
+
entries: make(map[string]*validationCacheEntry),
53
53
+
}
54
54
+
}
55
55
+
56
56
+
// getOrFetch retrieves a service token from cache or fetches it
57
57
+
// Multiple concurrent requests for the same DID:holdDID will share the fetch operation
58
58
+
func (vc *validationCache) getOrFetch(ctx context.Context, cacheKey string, fetchFunc func() (string, error)) (string, error) {
59
59
+
// Fast path: check cache with read lock
60
60
+
vc.mu.RLock()
61
61
+
entry, exists := vc.entries[cacheKey]
62
62
+
vc.mu.RUnlock()
63
63
+
64
64
+
if exists {
65
65
+
// Entry exists, check if it's still valid
66
66
+
entry.mu.Lock()
67
67
+
68
68
+
// If another goroutine is fetching, wait for it
69
69
+
if entry.inFlight {
70
70
+
done := entry.done
71
71
+
entry.mu.Unlock()
72
72
+
73
73
+
select {
74
74
+
case <-done:
75
75
+
// Fetch completed, check result
76
76
+
entry.mu.Lock()
77
77
+
defer entry.mu.Unlock()
78
78
+
79
79
+
if entry.err != nil {
80
80
+
return "", entry.err
81
81
+
}
82
82
+
if time.Now().Before(entry.validUntil) {
83
83
+
return entry.serviceToken, nil
84
84
+
}
85
85
+
// Fall through to refetch
86
86
+
case <-ctx.Done():
87
87
+
return "", ctx.Err()
88
88
+
}
89
89
+
} else {
90
90
+
// Check if cached token is still valid
91
91
+
if entry.err != nil && time.Now().Before(entry.validUntil) {
92
92
+
// Return cached error (fast-fail)
93
93
+
entry.mu.Unlock()
94
94
+
return "", entry.err
95
95
+
}
96
96
+
if entry.err == nil && time.Now().Before(entry.validUntil) {
97
97
+
// Return cached token
98
98
+
token := entry.serviceToken
99
99
+
entry.mu.Unlock()
100
100
+
return token, nil
101
101
+
}
102
102
+
entry.mu.Unlock()
103
103
+
}
104
104
+
}
105
105
+
106
106
+
// Slow path: need to fetch token
107
107
+
vc.mu.Lock()
108
108
+
entry, exists = vc.entries[cacheKey]
109
109
+
if !exists {
110
110
+
// Create new entry
111
111
+
entry = &validationCacheEntry{
112
112
+
inFlight: true,
113
113
+
done: make(chan struct{}),
114
114
+
}
115
115
+
vc.entries[cacheKey] = entry
116
116
+
}
117
117
+
vc.mu.Unlock()
118
118
+
119
119
+
// Lock the entry to perform fetch
120
120
+
entry.mu.Lock()
121
121
+
122
122
+
// Double-check: another goroutine may have fetched while we waited
123
123
+
if !entry.inFlight {
124
124
+
if entry.err != nil && time.Now().Before(entry.validUntil) {
125
125
+
err := entry.err
126
126
+
entry.mu.Unlock()
127
127
+
return "", err
128
128
+
}
129
129
+
if entry.err == nil && time.Now().Before(entry.validUntil) {
130
130
+
token := entry.serviceToken
131
131
+
entry.mu.Unlock()
132
132
+
return token, nil
133
133
+
}
134
134
+
}
135
135
+
136
136
+
// Mark as in-flight and create done channel if needed
137
137
+
if entry.done == nil {
138
138
+
entry.done = make(chan struct{})
139
139
+
}
140
140
+
entry.inFlight = true
141
141
+
done := entry.done
142
142
+
entry.mu.Unlock()
143
143
+
144
144
+
// Perform the fetch (outside the lock to allow other operations)
145
145
+
serviceToken, err := fetchFunc()
146
146
+
147
147
+
// Update the entry with result
148
148
+
entry.mu.Lock()
149
149
+
entry.inFlight = false
150
150
+
151
151
+
if err != nil {
152
152
+
// Cache errors for 5 seconds (fast-fail for subsequent requests)
153
153
+
entry.err = err
154
154
+
entry.validUntil = time.Now().Add(5 * time.Second)
155
155
+
entry.serviceToken = ""
156
156
+
} else {
157
157
+
// Cache token for 45 seconds (covers typical Docker push operation)
158
158
+
entry.err = nil
159
159
+
entry.serviceToken = serviceToken
160
160
+
entry.validUntil = time.Now().Add(45 * time.Second)
161
161
+
}
162
162
+
163
163
+
// Signal completion to waiting goroutines
164
164
+
close(done)
165
165
+
entry.mu.Unlock()
166
166
+
167
167
+
return serviceToken, err
168
168
+
}
169
169
+
30
170
// Global variables for initialization only
31
171
// These are set by main.go during startup and copied into NamespaceResolver instances.
32
172
// After initialization, request handling uses the NamespaceResolver's instance fields.
···
69
209
// NamespaceResolver wraps a namespace and resolves names
70
210
type NamespaceResolver struct {
71
211
distribution.Namespace
72
72
-
defaultHoldDID string // Default hold DID (e.g., "did:web:hold01.atcr.io")
73
73
-
baseURL string // Base URL for error messages (e.g., "https://atcr.io")
74
74
-
testMode bool // If true, fallback to default hold when user's hold is unreachable
75
75
-
refresher *oauth.Refresher // OAuth session manager (copied from global on init)
76
76
-
database storage.DatabaseMetrics // Metrics database (copied from global on init)
77
77
-
authorizer auth.HoldAuthorizer // Hold authorization (copied from global on init)
78
78
-
readmeCache storage.ReadmeCache // README cache (copied from global on init)
212
212
+
defaultHoldDID string // Default hold DID (e.g., "did:web:hold01.atcr.io")
213
213
+
baseURL string // Base URL for error messages (e.g., "https://atcr.io")
214
214
+
testMode bool // If true, fallback to default hold when user's hold is unreachable
215
215
+
refresher *oauth.Refresher // OAuth session manager (copied from global on init)
216
216
+
database storage.DatabaseMetrics // Metrics database (copied from global on init)
217
217
+
authorizer auth.HoldAuthorizer // Hold authorization (copied from global on init)
218
218
+
readmeCache storage.ReadmeCache // README cache (copied from global on init)
219
219
+
validationCache *validationCache // Request-level service token cache
79
220
}
80
221
81
222
// initATProtoResolver initializes the name resolution middleware
···
102
243
// Copy shared services from globals into the instance
103
244
// This avoids accessing globals during request handling
104
245
return &NamespaceResolver{
105
105
-
Namespace: ns,
106
106
-
defaultHoldDID: defaultHoldDID,
107
107
-
baseURL: baseURL,
108
108
-
testMode: testMode,
109
109
-
refresher: globalRefresher,
110
110
-
database: globalDatabase,
111
111
-
authorizer: globalAuthorizer,
112
112
-
readmeCache: globalReadmeCache,
246
246
+
Namespace: ns,
247
247
+
defaultHoldDID: defaultHoldDID,
248
248
+
baseURL: baseURL,
249
249
+
testMode: testMode,
250
250
+
refresher: globalRefresher,
251
251
+
database: globalDatabase,
252
252
+
authorizer: globalAuthorizer,
253
253
+
readmeCache: globalReadmeCache,
254
254
+
validationCache: newValidationCache(),
113
255
}, nil
114
256
}
115
257
···
165
307
}(ctx, client, nr.refresher, holdDID)
166
308
}
167
309
168
168
-
// Get service token for hold authentication
310
310
+
// Get service token for hold authentication (only if authenticated)
311
311
+
// Use validation cache to prevent concurrent requests from racing on OAuth/DPoP
169
312
// Route based on auth method from JWT token
170
313
var serviceToken string
171
314
authMethod, _ := ctx.Value(authMethodKey).(string)
172
315
173
173
-
if authMethod == token.AuthMethodAppPassword {
174
174
-
// App-password flow: use Bearer token authentication
175
175
-
slog.Debug("Using app-password flow for service token",
176
176
-
"component", "registry/middleware",
177
177
-
"did", did)
316
316
+
// Only fetch service token if user is authenticated
317
317
+
// Unauthenticated requests (like /v2/ ping) should not trigger token fetching
318
318
+
if authMethod != "" {
319
319
+
// Create cache key: "did:holdDID"
320
320
+
cacheKey := fmt.Sprintf("%s:%s", did, holdDID)
178
321
179
179
-
var err error
180
180
-
serviceToken, err = token.GetOrFetchServiceTokenWithAppPassword(ctx, did, holdDID, pdsEndpoint)
181
181
-
if err != nil {
182
182
-
slog.Error("Failed to get service token with app-password",
183
183
-
"component", "registry/middleware",
184
184
-
"did", did,
185
185
-
"holdDID", holdDID,
186
186
-
"pdsEndpoint", pdsEndpoint,
187
187
-
"error", err)
322
322
+
// Fetch service token through validation cache
323
323
+
// This ensures only ONE request per DID:holdDID pair fetches the token
324
324
+
// Concurrent requests will wait for the first request to complete
325
325
+
var fetchErr error
326
326
+
serviceToken, fetchErr = nr.validationCache.getOrFetch(ctx, cacheKey, func() (string, error) {
327
327
+
if authMethod == token.AuthMethodAppPassword {
328
328
+
// App-password flow: use Bearer token authentication
329
329
+
slog.Debug("Using app-password flow for service token",
330
330
+
"component", "registry/middleware",
331
331
+
"did", did,
332
332
+
"cacheKey", cacheKey)
333
333
+
334
334
+
token, err := token.GetOrFetchServiceTokenWithAppPassword(ctx, did, holdDID, pdsEndpoint)
335
335
+
if err != nil {
336
336
+
slog.Error("Failed to get service token with app-password",
337
337
+
"component", "registry/middleware",
338
338
+
"did", did,
339
339
+
"holdDID", holdDID,
340
340
+
"pdsEndpoint", pdsEndpoint,
341
341
+
"error", err)
342
342
+
return "", err
343
343
+
}
344
344
+
return token, nil
345
345
+
} else if nr.refresher != nil {
346
346
+
// OAuth flow: use DPoP authentication
347
347
+
slog.Debug("Using OAuth flow for service token",
348
348
+
"component", "registry/middleware",
349
349
+
"did", did,
350
350
+
"cacheKey", cacheKey)
188
351
189
189
-
// Check if app-password is expired/invalid
190
190
-
errMsg := err.Error()
191
191
-
if strings.Contains(errMsg, "expired or invalid") || strings.Contains(errMsg, "no app-password") {
192
192
-
return nil, nr.authErrorMessage("App-password authentication failed. Please re-authenticate with: docker login")
352
352
+
token, err := token.GetOrFetchServiceToken(ctx, nr.refresher, did, holdDID, pdsEndpoint)
353
353
+
if err != nil {
354
354
+
slog.Error("Failed to get service token with OAuth",
355
355
+
"component", "registry/middleware",
356
356
+
"did", did,
357
357
+
"holdDID", holdDID,
358
358
+
"pdsEndpoint", pdsEndpoint,
359
359
+
"error", err)
360
360
+
return "", err
361
361
+
}
362
362
+
return token, nil
193
363
}
364
364
+
return "", fmt.Errorf("no authentication method available")
365
365
+
})
194
366
195
195
-
// Generic service token error
196
196
-
return nil, nr.authErrorMessage(fmt.Sprintf("Failed to obtain storage credentials: %v", err))
197
197
-
}
198
198
-
} else if nr.refresher != nil {
199
199
-
// OAuth flow: use DPoP authentication
200
200
-
slog.Debug("Using OAuth flow for service token",
201
201
-
"component", "registry/middleware",
202
202
-
"did", did)
367
367
+
// Handle errors from cached fetch
368
368
+
if fetchErr != nil {
369
369
+
errMsg := fetchErr.Error()
203
370
204
204
-
var err error
205
205
-
serviceToken, err = token.GetOrFetchServiceToken(ctx, nr.refresher, did, holdDID, pdsEndpoint)
206
206
-
if err != nil {
207
207
-
slog.Error("Failed to get service token with OAuth",
208
208
-
"component", "registry/middleware",
209
209
-
"did", did,
210
210
-
"holdDID", holdDID,
211
211
-
"pdsEndpoint", pdsEndpoint,
212
212
-
"error", err)
371
371
+
// Check for app-password specific errors
372
372
+
if authMethod == token.AuthMethodAppPassword {
373
373
+
if strings.Contains(errMsg, "expired or invalid") || strings.Contains(errMsg, "no app-password") {
374
374
+
return nil, nr.authErrorMessage("App-password authentication failed. Please re-authenticate with: docker login")
375
375
+
}
376
376
+
}
213
377
214
214
-
// Check if this is likely an OAuth session expiration
215
215
-
errMsg := err.Error()
378
378
+
// Check for OAuth specific errors
216
379
if strings.Contains(errMsg, "OAuth session") || strings.Contains(errMsg, "OAuth validation") {
217
380
return nil, nr.authErrorMessage("OAuth session expired or invalidated by PDS. Your session has been cleared")
218
381
}
219
382
220
383
// Generic service token error
221
221
-
return nil, nr.authErrorMessage(fmt.Sprintf("Failed to obtain storage credentials: %v", err))
384
384
+
return nil, nr.authErrorMessage(fmt.Sprintf("Failed to obtain storage credentials: %v", fetchErr))
222
385
}
386
386
+
} else {
387
387
+
slog.Debug("Skipping service token fetch for unauthenticated request",
388
388
+
"component", "registry/middleware",
389
389
+
"did", did)
223
390
}
224
391
225
392
// Create a new reference with identity/image format
+3
-5
pkg/appview/routes/routes.go
···
201
201
})
202
202
203
203
// Logout endpoint (supports both GET and POST)
204
204
-
// Properly revokes OAuth tokens on PDS side before clearing local session
204
204
+
// Only clears the current UI session cookie - does NOT revoke OAuth tokens
205
205
+
// OAuth sessions remain intact so other browser tabs/devices stay logged in
205
206
logoutHandler := &uihandlers.LogoutHandler{
206
206
-
OAuthClientApp: deps.OAuthClientApp,
207
207
-
Refresher: deps.Refresher,
208
208
-
SessionStore: deps.SessionStore,
209
209
-
OAuthStore: deps.OAuthStore,
207
207
+
SessionStore: deps.SessionStore,
210
208
}
211
209
router.Get("/auth/logout", logoutHandler.ServeHTTP)
212
210
router.Post("/auth/logout", logoutHandler.ServeHTTP)
+17
-2
pkg/auth/oauth/client.go
···
27
27
28
28
// If production (not localhost), automatically set up confidential client
29
29
if !isLocalhost(baseURL) {
30
30
-
clientID := baseURL + "/client-metadata.json"
30
30
+
clientID := baseURL + "/oauth-client-metadata.json"
31
31
config = oauth.NewPublicConfig(clientID, redirectURI, scopes)
32
32
33
33
// Generate or load P-256 key
···
47
47
return nil, fmt.Errorf("failed to configure confidential client: %w", err)
48
48
}
49
49
50
50
-
slog.Info("Configured confidential OAuth client", "key_id", keyID, "key_path", keyPath)
50
50
+
// Log clock information for debugging timestamp issues
51
51
+
now := time.Now()
52
52
+
slog.Info("Configured confidential OAuth client",
53
53
+
"key_id", keyID,
54
54
+
"key_path", keyPath,
55
55
+
"system_time_unix", now.Unix(),
56
56
+
"system_time_rfc3339", now.Format(time.RFC3339),
57
57
+
"timezone", now.Location().String())
51
58
} else {
52
59
config = oauth.NewLocalhostConfig(redirectURI, scopes)
53
60
···
239
246
// Delete the session from database since scopes have changed
240
247
if err := r.clientApp.Store.DeleteSession(ctx, accountDID, sessionID); err != nil {
241
248
slog.Warn("Failed to delete session with mismatched scopes", "error", err, "did", did)
249
249
+
}
250
250
+
251
251
+
// Also invalidate UI sessions since OAuth is now invalid
252
252
+
if r.uiSessionStore != nil {
253
253
+
r.uiSessionStore.DeleteByDID(did)
254
254
+
slog.Info("Invalidated UI sessions due to scope mismatch",
255
255
+
"component", "oauth/refresher",
256
256
+
"did", did)
242
257
}
243
258
244
259
return nil, fmt.Errorf("OAuth scopes changed, re-authentication required")
+50
pkg/auth/oauth/server.go
···
2
2
3
3
import (
4
4
"context"
5
5
+
"errors"
5
6
"fmt"
6
7
"html/template"
7
8
"log/slog"
···
10
11
"time"
11
12
12
13
"atcr.io/pkg/atproto"
14
14
+
"github.com/bluesky-social/indigo/atproto/atclient"
13
15
"github.com/bluesky-social/indigo/atproto/auth/oauth"
14
16
)
15
17
16
18
// UISessionStore is the interface for UI session management
17
19
// UISessionStore is defined in client.go (session management section)
20
20
+
21
21
+
// getOAuthErrorHint provides troubleshooting hints for OAuth errors during token exchange
22
22
+
func getOAuthErrorHint(apiErr *atclient.APIError) string {
23
23
+
switch apiErr.Name {
24
24
+
case "invalid_client":
25
25
+
if strings.Contains(apiErr.Message, "iat") && strings.Contains(apiErr.Message, "timestamp") {
26
26
+
return "JWT timestamp validation failed - AppView system clock may be ahead of PDS clock. Check NTP sync: timedatectl status. Typical tolerance is ±30 seconds."
27
27
+
}
28
28
+
return "OAuth client authentication failed during token exchange - check client key and PDS OAuth configuration"
29
29
+
case "invalid_grant":
30
30
+
return "Authorization code is invalid, expired, or already used - user should retry OAuth flow from beginning"
31
31
+
case "use_dpop_nonce":
32
32
+
return "DPoP nonce challenge during token exchange - indigo should retry automatically, persistent failures indicate PDS issue"
33
33
+
case "invalid_dpop_proof":
34
34
+
return "DPoP proof validation failed - check system clock sync between AppView and PDS"
35
35
+
case "unauthorized_client":
36
36
+
return "PDS rejected the client - check client metadata URL is accessible and scopes are supported"
37
37
+
case "invalid_request":
38
38
+
return "Malformed token request - check OAuth flow parameters (code, redirect_uri, state)"
39
39
+
case "server_error":
40
40
+
return "PDS internal error during token exchange - check PDS logs for root cause"
41
41
+
default:
42
42
+
if apiErr.StatusCode == 400 {
43
43
+
return "Bad request during OAuth token exchange - check error details and PDS logs"
44
44
+
}
45
45
+
return "OAuth token exchange failed - see errorName and errorMessage for PDS response"
46
46
+
}
47
47
+
}
18
48
19
49
// UserStore is the interface for user management
20
50
type UserStore interface {
···
112
142
}
113
143
114
144
// Process OAuth callback via indigo (handles state validation internally)
145
145
+
// This performs token exchange with the PDS using authorization code
115
146
sessionData, err := s.clientApp.ProcessCallback(r.Context(), r.URL.Query())
116
147
if err != nil {
148
148
+
// Detailed error logging for token exchange failures
149
149
+
var apiErr *atclient.APIError
150
150
+
if errors.As(err, &apiErr) {
151
151
+
slog.Error("OAuth callback failed - token exchange error",
152
152
+
"component", "oauth/server",
153
153
+
"error", err,
154
154
+
"httpStatus", apiErr.StatusCode,
155
155
+
"errorName", apiErr.Name,
156
156
+
"errorMessage", apiErr.Message,
157
157
+
"hint", getOAuthErrorHint(apiErr),
158
158
+
"queryParams", r.URL.Query().Encode())
159
159
+
} else {
160
160
+
slog.Error("OAuth callback failed - unknown error",
161
161
+
"component", "oauth/server",
162
162
+
"error", err,
163
163
+
"errorType", fmt.Sprintf("%T", err),
164
164
+
"queryParams", r.URL.Query().Encode())
165
165
+
}
166
166
+
117
167
s.renderError(w, fmt.Sprintf("Failed to process OAuth callback: %v", err))
118
168
return
119
169
}
+81
-16
pkg/auth/token/servicetoken.go
···
3
3
import (
4
4
"context"
5
5
"encoding/json"
6
6
+
"errors"
6
7
"fmt"
7
8
"io"
8
9
"log/slog"
···
13
14
"atcr.io/pkg/atproto"
14
15
"atcr.io/pkg/auth"
15
16
"atcr.io/pkg/auth/oauth"
17
17
+
"github.com/bluesky-social/indigo/atproto/atclient"
16
18
)
17
19
20
20
+
// getErrorHint provides context-specific troubleshooting hints based on API error type
21
21
+
func getErrorHint(apiErr *atclient.APIError) string {
22
22
+
switch apiErr.Name {
23
23
+
case "use_dpop_nonce":
24
24
+
return "DPoP nonce mismatch - indigo library should automatically retry with new nonce. If this persists, check for concurrent request issues or PDS session corruption."
25
25
+
case "invalid_client":
26
26
+
if apiErr.Message != "" && apiErr.Message == "Validation of \"client_assertion\" failed: \"iat\" claim timestamp check failed (it should be in the past)" {
27
27
+
return "JWT timestamp validation failed - system clock on AppView may be ahead of PDS clock. Check NTP sync with: timedatectl status"
28
28
+
}
29
29
+
return "OAuth client authentication failed - check client key configuration and PDS OAuth server status"
30
30
+
case "invalid_token", "invalid_grant":
31
31
+
return "OAuth tokens expired or invalidated - user will need to re-authenticate via OAuth flow"
32
32
+
case "server_error":
33
33
+
if apiErr.StatusCode == 500 {
34
34
+
return "PDS returned internal server error - this may occur after repeated DPoP nonce failures or other PDS-side issues. Check PDS logs for root cause."
35
35
+
}
36
36
+
return "PDS server error - check PDS health and logs"
37
37
+
case "invalid_dpop_proof":
38
38
+
return "DPoP proof validation failed - check system clock sync and DPoP key configuration"
39
39
+
default:
40
40
+
if apiErr.StatusCode == 401 || apiErr.StatusCode == 403 {
41
41
+
return "Authentication/authorization failed - OAuth session may be expired or revoked"
42
42
+
}
43
43
+
return "PDS rejected the request - see errorName and errorMessage for details"
44
44
+
}
45
45
+
}
46
46
+
18
47
// GetOrFetchServiceToken gets a service token for hold authentication.
19
48
// Checks cache first, then fetches from PDS with OAuth/DPoP if needed.
20
49
// This is the canonical implementation used by both middleware and crew registration.
···
49
78
if err != nil {
50
79
// OAuth session unavailable - fail
51
80
InvalidateServiceToken(did, holdDID)
52
52
-
slog.Error("Failed to get OAuth session for service token",
53
53
-
"component", "token/servicetoken",
54
54
-
"did", did,
55
55
-
"holdDID", holdDID,
56
56
-
"pdsEndpoint", pdsEndpoint,
57
57
-
"error", err,
58
58
-
"errorType", fmt.Sprintf("%T", err))
81
81
+
82
82
+
// Try to extract detailed error information
83
83
+
var apiErr *atclient.APIError
84
84
+
if errors.As(err, &apiErr) {
85
85
+
slog.Error("Failed to get OAuth session for service token",
86
86
+
"component", "token/servicetoken",
87
87
+
"did", did,
88
88
+
"holdDID", holdDID,
89
89
+
"pdsEndpoint", pdsEndpoint,
90
90
+
"error", err,
91
91
+
"httpStatus", apiErr.StatusCode,
92
92
+
"errorName", apiErr.Name,
93
93
+
"errorMessage", apiErr.Message,
94
94
+
"hint", getErrorHint(apiErr))
95
95
+
} else {
96
96
+
slog.Error("Failed to get OAuth session for service token",
97
97
+
"component", "token/servicetoken",
98
98
+
"did", did,
99
99
+
"holdDID", holdDID,
100
100
+
"pdsEndpoint", pdsEndpoint,
101
101
+
"error", err,
102
102
+
"errorType", fmt.Sprintf("%T", err),
103
103
+
"hint", "OAuth session not found in database or token refresh failed")
104
104
+
}
59
105
60
106
// Delete the stale OAuth session to force re-authentication
61
107
// This also invalidates the UI session automatically
···
92
138
if err != nil {
93
139
// Auth error - may indicate expired tokens or corrupted session
94
140
InvalidateServiceToken(did, holdDID)
95
95
-
slog.Error("OAuth authentication failed during service token request",
96
96
-
"component", "token/servicetoken",
97
97
-
"did", did,
98
98
-
"holdDID", holdDID,
99
99
-
"pdsEndpoint", pdsEndpoint,
100
100
-
"url", serviceAuthURL,
101
101
-
"error", err,
102
102
-
"errorType", fmt.Sprintf("%T", err),
103
103
-
"hint", "This likely means the PDS rejected the OAuth session - refresh token may be expired or invalidated")
141
141
+
142
142
+
// Inspect the error to extract detailed information from indigo's APIError
143
143
+
var apiErr *atclient.APIError
144
144
+
if errors.As(err, &apiErr) {
145
145
+
// Log detailed API error information
146
146
+
slog.Error("OAuth authentication failed during service token request",
147
147
+
"component", "token/servicetoken",
148
148
+
"did", did,
149
149
+
"holdDID", holdDID,
150
150
+
"pdsEndpoint", pdsEndpoint,
151
151
+
"url", serviceAuthURL,
152
152
+
"error", err,
153
153
+
"httpStatus", apiErr.StatusCode,
154
154
+
"errorName", apiErr.Name,
155
155
+
"errorMessage", apiErr.Message,
156
156
+
"hint", getErrorHint(apiErr))
157
157
+
} else {
158
158
+
// Fallback for non-API errors (network errors, etc.)
159
159
+
slog.Error("OAuth authentication failed during service token request",
160
160
+
"component", "token/servicetoken",
161
161
+
"did", did,
162
162
+
"holdDID", holdDID,
163
163
+
"pdsEndpoint", pdsEndpoint,
164
164
+
"url", serviceAuthURL,
165
165
+
"error", err,
166
166
+
"errorType", fmt.Sprintf("%T", err),
167
167
+
"hint", "Network error or unexpected failure during OAuth request")
168
168
+
}
104
169
105
170
// Delete the stale OAuth session to force re-authentication
106
171
// This also invalidates the UI session automatically