# ATCR Troubleshooting Guide
This document provides troubleshooting guidance for common ATCR deployment and operational issues.
## OAuth Authentication Failures

### JWT Timestamp Validation Errors
**Symptom:**

```
error: invalid_client
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)
```
**Root Cause:** The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the `iat` (issued at) claim appears to be in the future from the PDS's perspective.
**Diagnosis:**

- Check the AppView system time:

  ```bash
  date -u
  timedatectl status
  ```

- Check whether NTP is active and synchronized:

  ```bash
  timedatectl show-timesync --all
  ```

- Compare the AppView time with the PDS time (if accessible):

  ```bash
  # On the AppView
  date +%s
  # On the PDS (or via HTTP headers)
  curl -I https://your-pds.example.com | grep -i date
  ```

- Check the AppView logs for clock information (logged at startup):

  ```bash
  docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"
  ```
Example log output:

```
level=INFO msg="Configured confidential OAuth client"
key_id=did:key:z...
system_time_unix=1763389815
system_time_rfc3339=2025-11-17T14:30:15Z
timezone=UTC
```
**Solution:**

- Enable NTP synchronization (recommended). On most Linux systems using systemd:

  ```bash
  # Enable and start systemd-timesyncd
  sudo timedatectl set-ntp true
  # Verify NTP is active
  timedatectl status
  ```

  Expected output:

  ```
  System clock synchronized: yes
  NTP service: active
  ```

- Alternative: use chrony (if systemd-timesyncd is not available):

  ```bash
  # Install chrony
  sudo apt-get install chrony   # Debian/Ubuntu
  sudo yum install chrony       # RHEL/CentOS
  # Enable and start chronyd
  sudo systemctl enable chronyd
  sudo systemctl start chronyd
  # Check sync status
  chronyc tracking
  ```

- Force an immediate sync:

  ```bash
  # systemd-timesyncd
  sudo systemctl restart systemd-timesyncd
  # Or with chrony
  sudo chronyc makestep
  ```

- In Docker/Kubernetes environments, the container inherits the host's system clock, so fix NTP on the host machine:

  ```bash
  # On the Docker host
  sudo timedatectl set-ntp true
  # Restart the AppView container to pick up the correct time
  docker restart atcr-appview
  ```

- Verify the clock skew is resolved:

  ```bash
  # The clock offset should be < 1 second
  timedatectl timesync-status
  ```
**Acceptable Clock Skew:**

- Most OAuth implementations tolerate ±30-60 seconds of clock skew
- DPoP proof validation is typically stricter (±10 seconds)
- Aim for < 1 second of skew for reliable operation
**Prevention:**
- Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
- Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
- Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically
### DPoP Nonce Mismatch Errors
**Symptom:**

```
error: use_dpop_nonce
error_description: DPoP "nonce" mismatch
```

Repeated multiple times, potentially followed by:

```
error: server_error
error_description: Server error
```
**Root Cause:** DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:

- Multiple concurrent requests create a DPoP nonce race condition
- Clock skew causes DPoP proof timestamps to fail validation
- PDS session state becomes corrupted after repeated failures
**Diagnosis:**

- Check whether the errors occur during concurrent operations:

  ```bash
  # During a docker push with multiple layers
  docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
  ```

- Check for clock skew (see the section above):

  ```bash
  timedatectl status
  ```

- Look for session lock acquisition in the logs:

  ```bash
  docker logs atcr-appview 2>&1 | grep "Acquired session lock"
  ```
**Solution:**

- If caused by clock skew: fix NTP synchronization (see the section above).

- If caused by session corruption:

  ```bash
  # The AppView automatically deletes corrupted sessions;
  # the user just needs to re-authenticate
  docker login atcr.io
  ```

- If the errors persist despite clock sync:
  - Check PDS health and logs (this may be a PDS-side issue)
  - Verify network connectivity between the AppView and the PDS
  - Check whether the PDS supports the latest OAuth/DPoP specifications
**What ATCR does automatically:**

- Per-DID locking prevents concurrent DPoP nonce races
- The Indigo library automatically retries with fresh nonces
- Sessions are auto-deleted after repeated failures
- The service token cache prevents excessive PDS requests
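The per-DID locking mentioned above can be pictured as a mutex map keyed by DID: all token operations for one account are serialized (so they cannot race on the single DPoP nonce the PDS tracks per session), while different accounts proceed in parallel. This is a hypothetical sketch of the pattern, not the actual ATCR implementation; all names are illustrative:

```go
package main

import (
	"fmt"
	"sync"
)

// didLocks hands out one mutex per DID, created lazily.
type didLocks struct {
	mu    sync.Mutex
	locks map[string]*sync.Mutex
}

func newDIDLocks() *didLocks {
	return &didLocks{locks: make(map[string]*sync.Mutex)}
}

// lock acquires the mutex for the given DID and returns it so the
// caller can defer Unlock.
func (d *didLocks) lock(did string) *sync.Mutex {
	d.mu.Lock()
	m, ok := d.locks[did]
	if !ok {
		m = &sync.Mutex{}
		d.locks[did] = m
	}
	d.mu.Unlock()
	m.Lock()
	return m
}

func main() {
	locks := newDIDLocks()
	var wg sync.WaitGroup
	counter := 0 // stands in for per-session state like the DPoP nonce
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m := locks.lock("did:plc:example")
			defer m.Unlock()
			counter++ // safe: all work for this DID is serialized
		}()
	}
	wg.Wait()
	fmt.Println(counter) // prints 10
}
```

Without the per-DID lock, concurrent layer uploads during a `docker push` would each present the last nonce they saw, and all but one would fail with `use_dpop_nonce`.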
**Prevention:**
- Ensure reliable NTP synchronization
- Use a stable, well-maintained PDS implementation
- Monitor AppView error rates for DPoP-related issues
### OAuth Session Not Found
**Symptom:**

```
error: failed to get OAuth session: no session found for DID
```
**Root Cause:**

- The user has never authenticated via OAuth
- The OAuth session was deleted due to corruption or expiry
- A database migration cleared the sessions
**Solution:**

- The user re-authenticates via the OAuth flow:

  ```bash
  docker login atcr.io
  # Or for the web UI: visit https://atcr.io/login
  ```

- If using app passwords (legacy), refresh the cached token:

  ```bash
  # Clear and re-create the cached app-password token
  docker logout atcr.io
  docker login atcr.io -u your.handle -p your-app-password
  ```
## AppView Deployment Issues

### Client Metadata URL Not Accessible
**Symptom:**

```
error: unauthorized_client
error_description: Client metadata endpoint returned 404
```
**Root Cause:** The PDS cannot fetch the OAuth client metadata from `{ATCR_BASE_URL}/client-metadata.json`.
**Diagnosis:**

- Verify the client metadata endpoint is accessible:

  ```bash
  curl https://your-atcr-instance.com/client-metadata.json
  ```

- Check the AppView logs for startup errors:

  ```bash
  docker logs atcr-appview 2>&1 | grep "client-metadata"
  ```

- Verify `ATCR_BASE_URL` is set correctly:

  ```bash
  echo $ATCR_BASE_URL
  ```
**Solution:**

- Ensure `ATCR_BASE_URL` matches your public URL:

  ```bash
  export ATCR_BASE_URL=https://atcr.example.com
  ```

- Verify the reverse proxy (nginx, Caddy, etc.) routes `/.well-known/*` and `/client-metadata.json`:

  ```nginx
  location / {
      proxy_pass http://localhost:5000;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-Proto $scheme;
  }
  ```

- Check that firewall rules allow inbound HTTPS:

  ```bash
  sudo ufw status
  sudo iptables -L -n | grep 443
  ```
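Most failures in this section trace back to a malformed `ATCR_BASE_URL`. A small sketch of the kind of validation worth doing at startup, under the assumption that the PDS must fetch the metadata over public HTTPS (the `clientMetadataURL` function is illustrative, not part of ATCR; local development setups using plain HTTP would need a looser check):

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// clientMetadataURL derives the OAuth client metadata URL from the
// configured base URL, rejecting values a remote PDS could never fetch.
func clientMetadataURL(base string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	// The PDS fetches the metadata over the public internet, so the
	// base must be an absolute https URL with a hostname.
	if u.Scheme != "https" || u.Host == "" {
		return "", fmt.Errorf("ATCR_BASE_URL must be an absolute https URL, got %q", base)
	}
	// Tolerate a trailing slash so the path never doubles up.
	return strings.TrimRight(base, "/") + "/client-metadata.json", nil
}

func main() {
	m, err := clientMetadataURL("https://atcr.example.com")
	if err != nil {
		panic(err)
	}
	fmt.Println(m) // prints https://atcr.example.com/client-metadata.json
}
```

Running the derived URL through the `curl` check above should return the metadata document, not a 404.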
## Hold Service Issues

### Blob Storage Connectivity
**Symptom:**

```
error: failed to upload blob: connection refused
```
**Diagnosis:**

- Check the hold service logs:

  ```bash
  docker logs atcr-hold 2>&1 | grep -i error
  ```

- Verify the S3 credentials are correct:

  ```bash
  # Test S3 access
  aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
  ```

- Check the hold configuration:

  ```bash
  env | grep -E "(S3_|AWS_|STORAGE_)"
  ```
**Solution:**

- Verify the environment variables in the hold service:

  ```bash
  export AWS_ACCESS_KEY_ID=your-key
  export AWS_SECRET_ACCESS_KEY=your-secret
  export S3_BUCKET=your-bucket
  export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
  ```

- Test S3 connectivity from the hold container:

  ```bash
  docker exec atcr-hold curl -v $S3_ENDPOINT
  ```

- Check the S3 bucket permissions (requires `PutObject`, `GetObject`, and `DeleteObject`)
## Performance Issues

### High Database Lock Contention
**Symptom:** Slow Docker push/pull operations, high CPU usage on the AppView
**Diagnosis:**

- Check the SQLite database size:

  ```bash
  ls -lh /var/lib/atcr/ui.db
  ```

- Look for long-running queries:

  ```bash
  docker logs atcr-appview 2>&1 | grep "database is locked"
  ```
**Solution:**

- For production, migrate to PostgreSQL (recommended):

  ```bash
  export ATCR_UI_DATABASE_TYPE=postgres
  export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
  ```

- Or reduce write contention by limiting SQLite to a single connection:

  ```go
  // In code: SQLite tolerates only one writer at a time
  db.SetMaxOpenConns(1)
  ```

- Vacuum the database to reclaim space:

  ```bash
  sqlite3 /var/lib/atcr/ui.db "VACUUM;"
  ```
## Logging and Debugging

### Enable Debug Logging
Set the log level to debug for detailed troubleshooting:

```bash
export ATCR_LOG_LEVEL=debug
docker restart atcr-appview
```

Note that `docker restart` keeps the container's original environment, so for Docker deployments set `ATCR_LOG_LEVEL` in your compose file or `docker run -e` flags and recreate the container.
### Useful Log Queries
**OAuth token exchange errors:**

```bash
docker logs atcr-appview 2>&1 | grep "OAuth callback failed"
```

**Service token request failures:**

```bash
docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"
```

**Clock diagnostics:**

```bash
docker logs atcr-appview 2>&1 | grep "system_time"
```

**DPoP nonce issues:**

```bash
docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"
```
### Health Checks
**AppView health:**

```bash
curl http://localhost:5000/v2/
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
```

**Hold service health:**

```bash
curl http://localhost:8080/.well-known/did.json
# Should return the DID document
```
## Getting Help
If issues persist after following this guide:

- Check GitHub Issues: https://github.com/ericvolp12/atcr/issues
- Collect logs: include `docker logs` output for the AppView and Hold services
- Include diagnostics:
  - `timedatectl status` output
  - AppView version: `docker exec atcr-appview cat /VERSION` (if available)
  - PDS version and implementation (Bluesky PDS, other)
- File an issue with reproducible steps
## Common Error Reference

| Error Code | Component | Common Cause | Fix |
|---|---|---|---|
| `invalid_client` (iat timestamp) | OAuth | Clock skew | Enable NTP sync |
| `use_dpop_nonce` | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry |
| `server_error` (500) | PDS | PDS internal error | Check PDS logs |
| `invalid_grant` | OAuth | Expired auth code | Retry OAuth flow |
| `unauthorized_client` | OAuth | Client metadata unreachable | Check `ATCR_BASE_URL` and firewall |
| `RecordNotFound` | ATProto | Manifest doesn't exist | Verify repository name |
| Connection refused | Hold/S3 | Network/credentials | Check S3 config and connectivity |