ATCR (atcr.io) is a container registry that uses the AT Protocol for manifest storage and S3 for blob storage.

ATCR Troubleshooting Guide#

This document provides troubleshooting guidance for common ATCR deployment and operational issues.

OAuth Authentication Failures#

JWT Timestamp Validation Errors#

Symptom:

error: invalid_client
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)

Root Cause: The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the "iat" (issued at) claim appears to be in the future from the PDS's perspective.

Diagnosis:

  1. Check AppView system time:

    date -u
    timedatectl status
    
  2. Check if NTP is active and synchronized:

    timedatectl show-timesync --all
    
  3. Compare AppView time with PDS time (if accessible):

    # On AppView
    date +%s
    
    # On PDS (or via HTTP headers)
    curl -I https://your-pds.example.com | grep -i date
    
  4. Check AppView logs for clock information (logged at startup):

    docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"

Example log output:

level=INFO msg="Configured confidential OAuth client"
  key_id=did:key:z...
  system_time_unix=1763389815
  system_time_rfc3339=2025-11-17T14:30:15Z
  timezone=UTC

Solution:

  1. Enable NTP synchronization (recommended):

    On most Linux systems using systemd:

    # Enable and start systemd-timesyncd
    sudo timedatectl set-ntp true
    
    # Verify NTP is active
    timedatectl status
    

    Expected output:

    System clock synchronized: yes
    NTP service: active
    
  2. Alternative: Use chrony (if systemd-timesyncd is not available):

    # Install chrony
    sudo apt-get install chrony  # Debian/Ubuntu
    sudo yum install chrony       # RHEL/CentOS
    
    # Enable and start chronyd
    sudo systemctl enable chronyd
    sudo systemctl start chronyd
    
    # Check sync status
    chronyc tracking
    
  3. Force immediate sync:

    # systemd-timesyncd
    sudo systemctl restart systemd-timesyncd
    
    # Or with chrony
    sudo chronyc makestep
    
  4. In Docker/Kubernetes environments:

    The container inherits the host's system clock, so fix NTP on the host machine:

    # On Docker host
    sudo timedatectl set-ntp true
    
    # Restart AppView container to pick up correct time
    docker restart atcr-appview
    
  5. Verify clock skew is resolved:

    # Should show clock offset < 1 second
    timedatectl timesync-status
    

Acceptable Clock Skew:

  • Most OAuth implementations tolerate ±30-60 seconds of clock skew
  • DPoP proof validation is typically stricter (±10 seconds)
  • Aim for < 1 second skew for reliable operation

Prevention:

  • Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
  • Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
  • Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically

DPoP Nonce Mismatch Errors#

Symptom:

error: use_dpop_nonce
error_description: DPoP "nonce" mismatch

Repeated multiple times, potentially followed by:

error: server_error
error_description: Server error

Root Cause: DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:

  1. Multiple concurrent requests create a DPoP nonce race condition
  2. Clock skew causes DPoP proof timestamps to fail validation
  3. PDS session state becomes corrupted after repeated failures

Diagnosis:

  1. Check if errors occur during concurrent operations:

    # During docker push with multiple layers
    docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
    
  2. Check for clock skew (see section above):

    timedatectl status
    
  3. Look for session lock acquisition in logs:

    docker logs atcr-appview 2>&1 | grep "Acquired session lock"

Solution:

  1. If caused by clock skew: Fix NTP synchronization (see section above)

  2. If caused by session corruption:

    # The AppView will automatically delete corrupted sessions
    # User just needs to re-authenticate
    docker login atcr.io
    
  3. If persistent despite clock sync:

    • Check PDS health and logs (may be a PDS-side issue)
    • Verify network connectivity between AppView and PDS
    • Check if PDS supports latest OAuth/DPoP specifications

What ATCR does automatically:

  • Per-DID locking prevents concurrent DPoP nonce races
  • Indigo library automatically retries with fresh nonces
  • Sessions are auto-deleted after repeated failures
  • Service token cache prevents excessive PDS requests

Prevention:

  • Ensure reliable NTP synchronization
  • Use a stable, well-maintained PDS implementation
  • Monitor AppView error rates for DPoP-related issues

OAuth Session Not Found#

Symptom:

error: failed to get OAuth session: no session found for DID

Root Cause:

  • User has never authenticated via OAuth
  • OAuth session was deleted due to corruption or expiry
  • Database migration cleared sessions

Solution:

  1. User re-authenticates via OAuth flow:

    docker login atcr.io
    # Or for web UI: visit https://atcr.io/login
    
  2. If using app passwords (legacy), verify token is cached:

    # Check if app-password token exists
    docker logout atcr.io
    docker login atcr.io -u your.handle -p your-app-password
    

AppView Deployment Issues#

Client Metadata URL Not Accessible#

Symptom:

error: unauthorized_client
error_description: Client metadata endpoint returned 404

Root Cause: PDS cannot fetch OAuth client metadata from {ATCR_BASE_URL}/client-metadata.json

Diagnosis:

  1. Verify client metadata endpoint is accessible:

    curl https://your-atcr-instance.com/client-metadata.json
    
  2. Check AppView logs for startup errors:

    docker logs atcr-appview 2>&1 | grep "client-metadata"
    
  3. Verify ATCR_BASE_URL is set correctly:

    echo $ATCR_BASE_URL
    

Solution:

  1. Ensure ATCR_BASE_URL matches your public URL:

    export ATCR_BASE_URL=https://atcr.example.com
    
  2. Verify reverse proxy (nginx, Caddy, etc.) routes /.well-known/* and /client-metadata.json:

    location / {
        proxy_pass http://localhost:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    
  3. Check firewall rules allow inbound HTTPS:

    sudo ufw status
    sudo iptables -L -n | grep 443
    

Hold Service Issues#

Blob Storage Connectivity#

Symptom:

error: failed to upload blob: connection refused

Diagnosis:

  1. Check hold service logs:

    docker logs atcr-hold 2>&1 | grep -i error
    
  2. Verify S3 credentials are correct:

    # Test S3 access
    aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
    
  3. Check hold configuration:

    env | grep -E "(S3_|AWS_|STORAGE_)"
    

Solution:

  1. Verify environment variables in hold service:

    export AWS_ACCESS_KEY_ID=your-key
    export AWS_SECRET_ACCESS_KEY=your-secret
    export S3_BUCKET=your-bucket
    export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
    
  2. Test S3 connectivity from hold container:

    docker exec atcr-hold curl -v $S3_ENDPOINT
    
  3. Check S3 bucket permissions (requires PutObject, GetObject, DeleteObject)


Performance Issues#

High Database Lock Contention#

Symptom: Slow Docker push/pull operations, high CPU usage on AppView

Diagnosis:

  1. Check SQLite database size:

    ls -lh /var/lib/atcr/ui.db
    
  2. Look for long-running queries:

    docker logs atcr-appview 2>&1 | grep "database is locked"
    

Solution:

  1. For production, migrate to PostgreSQL (recommended):

    export ATCR_UI_DATABASE_TYPE=postgres
    export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
    
  2. Or increase the SQLite busy timeout and serialize writes:

    // In Go, limit SQLite to a single connection to avoid writer contention:
    db.SetMaxOpenConns(1)
    // and raise the busy timeout via your SQLite driver's mechanism,
    // e.g. a DSN parameter like ?_busy_timeout=5000 (driver-specific)
    
  3. Vacuum the database to reclaim space:

    sqlite3 /var/lib/atcr/ui.db "VACUUM;"
    

Logging and Debugging#

Enable Debug Logging#

Set log level to debug for detailed troubleshooting:

export ATCR_LOG_LEVEL=debug
docker restart atcr-appview

Useful Log Queries#

OAuth token exchange errors:

docker logs atcr-appview 2>&1 | grep "OAuth callback failed"

Service token request failures:

docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"

Clock diagnostics:

docker logs atcr-appview 2>&1 | grep "system_time"

DPoP nonce issues:

docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"

Health Checks#

AppView health:

curl http://localhost:5000/v2/
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
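Note that the 401/UNAUTHORIZED response is the healthy case here: the Docker Registry HTTP API challenges unauthenticated probes of the `/v2/` base endpoint. A liveness check should therefore accept it, as in this generic sketch (not ATCR's own probe):

```go
package main

import (
	"fmt"
	"net/http"
)

// registryUp reports whether a /v2/ probe indicates a live registry.
// An unauthenticated request legitimately gets 401 from a healthy
// registry, so both 200 and 401 count as "up".
func registryUp(status int) bool {
	return status == http.StatusOK || status == http.StatusUnauthorized
}

func main() {
	fmt.Println(registryUp(401)) // true: auth challenge, service is up
	fmt.Println(registryUp(502)) // false: bad gateway, service likely down
}
```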

Hold service health:

curl http://localhost:8080/.well-known/did.json
# Should return DID document

Getting Help#

If issues persist after following this guide:

  1. Check GitHub Issues: https://github.com/ericvolp12/atcr/issues
  2. Collect logs: Include output from docker logs for AppView and Hold services
  3. Include diagnostics:
    • timedatectl status output
    • AppView version: docker exec atcr-appview cat /VERSION (if available)
    • PDS version and implementation (Bluesky PDS, other)
  4. File an issue with reproducible steps

Common Error Reference#

| Error Code                       | Component  | Common Cause                      | Fix                               |
|----------------------------------|------------|-----------------------------------|-----------------------------------|
| `invalid_client` (iat timestamp) | OAuth      | Clock skew                        | Enable NTP sync                   |
| `use_dpop_nonce`                 | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry      |
| `server_error` (500)             | PDS        | PDS internal error                | Check PDS logs                    |
| `invalid_grant`                  | OAuth      | Expired auth code                 | Retry OAuth flow                  |
| `unauthorized_client`            | OAuth      | Client metadata unreachable       | Check ATCR_BASE_URL and firewall  |
| `RecordNotFound`                 | ATProto    | Manifest doesn't exist            | Verify repository name            |
| Connection refused               | Hold/S3    | Network/credentials               | Check S3 config and connectivity  |