A social knowledge tool for researchers built on ATProto
URL Metadata Strategy (DDD)#
This document outlines the domain-driven approach for fetching, aggregating, and caching URL metadata from multiple external sources.
Problem Statement#
We need to:
- Fetch metadata from multiple sources (Citoid, Iframely, etc.)
- Cache results to avoid redundant API calls
- Aggregate data from different sources intelligently
- Maintain clean domain boundaries and testability
Domain Model#
Value Objects#
UrlMetadata
- Immutable value object containing normalized metadata fields
- Derived from raw API responses through transformation
- Used by the domain layer for business logic
RawMetadataResponse
- Immutable value object containing the original API response
- Includes a source field to track which service provided the data
- Includes a retrievedAt timestamp for cache invalidation
- Preserves the exact JSON structure returned by each API
MetadataSource
- Enumeration of available metadata sources (CITOID, IFRAMELY, etc.)
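A minimal sketch of how these value objects might be shaped in TypeScript. Only `source` and `retrievedAt` are named in this document; the `url` and `payload` fields, the `UrlMetadata` field names, and the `createRawMetadataResponse` helper are assumptions for illustration. A string-literal union stands in for the enumeration.

```typescript
// Hypothetical shapes: only `source` and `retrievedAt` come from this document.
const METADATA_SOURCES = ["CITOID", "IFRAMELY", "OPENGRAPH"] as const;
type MetadataSource = (typeof METADATA_SOURCES)[number];

interface RawMetadataResponse {
  readonly url: string;
  readonly source: MetadataSource; // which service provided the data
  readonly retrievedAt: Date;      // used for cache invalidation
  readonly payload: unknown;       // the JSON exactly as the API returned it
}

// Normalized fields; the names are guesses based on the aggregation rules.
interface UrlMetadata {
  readonly url: string;
  readonly title?: string;
  readonly description?: string;
  readonly author?: string;
  readonly image?: string;
  readonly publishedDate?: string;
}

// Factory that freezes the wrapper so it cannot be reassigned at runtime.
// Note: Object.freeze is shallow, so the payload itself is not frozen.
function createRawMetadataResponse(
  url: string,
  source: MetadataSource,
  payload: unknown,
  retrievedAt: Date = new Date(),
): RawMetadataResponse {
  return Object.freeze({ url, source, retrievedAt, payload });
}
```

Freezing only the wrapper keeps the sketch short; a deep freeze or a defensive copy of the payload would be needed for strict immutability.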
Domain Services#
MetadataAggregationService
- Coordinates fetching from multiple sources
- Implements intelligent merging strategies
- Handles fallback logic when sources fail
Infrastructure Services#
IMetadataProvider (Interface)
- Contract for individual metadata sources
- Returns raw API responses wrapped in RawMetadataResponse
- Implemented by CitoidMetadataService, IframelyMetadataService, etc.
IRawMetadataRepository (Interface)
- Stores and retrieves raw API responses by URL and source
- Enables querying for existing raw data to determine what's missing
- Preserves original API response structure for future reprocessing
IMetadataTransformer (Interface)
- Transforms raw API responses into domain UrlMetadata objects
- Source-specific implementations handle different API response formats
- Enables reprocessing of stored raw data as domain models evolve
Architecture Pattern: Strategy + Repository + Aggregation#
┌─────────────────────────────────────────────────────────────┐
│  Application Layer                                          │
│  GetUrlMetadataUseCase                                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│  Domain Layer                                               │
│  MetadataAggregationService                                 │
│  ├── IRawMetadataRepository (cache check/store)             │
│  ├── IMetadataProvider[] (multiple sources)                 │
│  └── Aggregation Logic                                      │
└─────────────────────┬───────────────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────────────┐
│  Infrastructure Layer                                       │
│  ├── DrizzleMetadataRepository                              │
│  ├── CitoidMetadataService                                  │
│  ├── IframelyMetadataService                                │
│  └── OpenGraphMetadataService                               │
└─────────────────────────────────────────────────────────────┘
Implementation Strategy#
1. Provider Interface#
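export interface IMetadataProvider {
  // Identifies which external service this provider wraps.
  readonly source: MetadataSource;
  // Fetches the unmodified API response for the given URL.
  fetchRawMetadata(url: URL): Promise<Result<RawMetadataResponse>>;
  // Reports whether this provider can currently be used.
  isAvailable(): Promise<boolean>;
}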
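As an illustration of the contract, here is a hedged sketch of a provider that wraps its raw response and converts failures into error results. The `Result` union, the injected `FetchJson` function, the `baseUrl` parameter, and the class name are all stand-ins invented for this example; the project's real `Result` type and HTTP client will differ, and the actual Citoid endpoint should be taken from its documentation.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";
// Stand-in for the project's Result type.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

interface RawMetadataResponse {
  url: string;
  source: MetadataSource;
  retrievedAt: Date;
  payload: unknown;
}

// Hypothetical HTTP port: resolves with parsed JSON, rejects on failure.
type FetchJson = (endpoint: string) => Promise<unknown>;

class CitoidProviderSketch {
  readonly source: MetadataSource = "CITOID";
  private readonly baseUrl: string;
  private readonly fetchJson: FetchJson;

  constructor(baseUrl: string, fetchJson: FetchJson) {
    this.baseUrl = baseUrl;
    this.fetchJson = fetchJson;
  }

  async fetchRawMetadata(url: string): Promise<Result<RawMetadataResponse>> {
    try {
      // The target URL is percent-encoded and appended to the base endpoint.
      const payload = await this.fetchJson(this.baseUrl + encodeURIComponent(url));
      return {
        ok: true,
        value: { url, source: this.source, retrievedAt: new Date(), payload },
      };
    } catch (cause) {
      // Failures become error results so callers can fall back to other sources.
      const message = cause instanceof Error ? cause.message : String(cause);
      return { ok: false, error: new Error(`Citoid request failed: ${message}`) };
    }
  }

  // A real health check is out of scope; this sketch always reports available.
  async isAvailable(): Promise<boolean> {
    return true;
  }
}
```

Injecting the HTTP function keeps the provider easy to exercise in tests without network access.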
2. Repository Interface#
export interface IRawMetadataRepository {
  // All cached raw responses for the URL, across sources
  findByUrl(url: URL): Promise<Result<RawMetadataResponse[]>>;
  findByUrlAndSource(
    url: URL,
    source: MetadataSource,
  ): Promise<Result<RawMetadataResponse | null>>;
  save(rawResponse: RawMetadataResponse): Promise<Result<void>>;
  isStale(rawResponse: RawMetadataResponse, maxAge: Duration): boolean;
}
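For unit tests, an in-memory implementation of this contract can stand in for the database-backed repository. The sketch below is an assumption-laden illustration: it uses plain string URLs, a simplified `Result` union, and a millisecond `maxAgeMs` number in place of the `Duration` type, none of which are confirmed by this document.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";
// Stand-in for the project's Result type.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

interface RawMetadataResponse {
  url: string;
  source: MetadataSource;
  retrievedAt: Date;
  payload: unknown;
}

class InMemoryRawMetadataRepository {
  // One entry per (source, url) pair; saving again overwrites the old entry.
  private readonly store = new Map<string, RawMetadataResponse>();

  private key(url: string, source: MetadataSource): string {
    return `${source}|${url}`;
  }

  async findByUrl(url: string): Promise<Result<RawMetadataResponse[]>> {
    const matches = Array.from(this.store.values()).filter((r) => r.url === url);
    return { ok: true, value: matches };
  }

  async findByUrlAndSource(
    url: string,
    source: MetadataSource,
  ): Promise<Result<RawMetadataResponse | null>> {
    return { ok: true, value: this.store.get(this.key(url, source)) ?? null };
  }

  async save(rawResponse: RawMetadataResponse): Promise<Result<void>> {
    this.store.set(this.key(rawResponse.url, rawResponse.source), rawResponse);
    return { ok: true, value: undefined };
  }

  // `now` is injectable so staleness can be tested deterministically.
  isStale(rawResponse: RawMetadataResponse, maxAgeMs: number, now: Date = new Date()): boolean {
    return now.getTime() - rawResponse.retrievedAt.getTime() > maxAgeMs;
  }
}
```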
3. Transformer Interface#
export interface IMetadataTransformer {
  readonly source: MetadataSource;
  transform(rawResponse: RawMetadataResponse): Result<UrlMetadata>;
}
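A source-specific transformer might look like the following sketch. The payload shape is an assumption: it treats the Citoid response as an array of citation objects with `title`, `abstractNote`, `date`, and `author` given as name-part arrays, which should be checked against a real response. The `Result` union, the `UrlMetadata` fields, and the function name are likewise illustrative.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";
// Stand-in for the project's Result type.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

interface RawMetadataResponse {
  url: string;
  source: MetadataSource;
  retrievedAt: Date;
  payload: unknown;
}

// Hypothetical normalized fields.
interface UrlMetadata {
  url: string;
  title?: string;
  description?: string;
  authors: string[];
  publishedDate?: string;
}

// Returns a trimmed non-empty string, or undefined for anything else.
const asText = (value: unknown): string | undefined =>
  typeof value === "string" && value.trim() !== "" ? value.trim() : undefined;

function transformCitoidSketch(raw: RawMetadataResponse): Result<UrlMetadata> {
  if (raw.source !== "CITOID") {
    return { ok: false, error: new Error(`Unexpected source: ${raw.source}`) };
  }
  // Assumed shape: an array whose first element is the citation object.
  const first: unknown = Array.isArray(raw.payload) ? raw.payload[0] : undefined;
  if (typeof first !== "object" || first === null) {
    return { ok: false, error: new Error("Citoid payload has no citation item") };
  }
  const item = first as Record<string, unknown>;
  // Assumed shape: author is a list of [givenName, familyName] arrays.
  const authors = Array.isArray(item.author)
    ? item.author
        .filter((parts: unknown): parts is unknown[] => Array.isArray(parts))
        .map((parts) => parts.filter((p) => typeof p === "string").join(" ").trim())
        .filter((fullName) => fullName !== "")
    : [];
  return {
    ok: true,
    value: {
      url: raw.url,
      title: asText(item.title),
      description: asText(item.abstractNote),
      authors,
      publishedDate: asText(item.date),
    },
  };
}
```

Because the transformer reads only from the stored payload, it can be rerun over cached raw responses whenever the `UrlMetadata` shape changes.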
4. Aggregation Service#
export class MetadataAggregationService {
  constructor(
    private readonly rawRepository: IRawMetadataRepository,
    private readonly providers: IMetadataProvider[],
    private readonly transformers: Map<MetadataSource, IMetadataTransformer>,
    private readonly maxCacheAge: Duration = Duration.days(7),
  ) {}

  async getMetadata(
    url: URL,
    sources?: MetadataSource[],
  ): Promise<Result<UrlMetadata>> {
    // 1. Check cache for existing raw responses
    const cachedRaw = await this.rawRepository.findByUrl(url);

    // 2. Determine which sources need fresh data
    const sourcesToFetch = this.determineSourcesToFetch(cachedRaw, sources);

    // 3. Fetch raw responses from required sources in parallel
    const freshRawResults = await this.fetchRawFromSources(url, sourcesToFetch);

    // 4. Cache new raw results
    await this.cacheRawResults(freshRawResults);

    // 5. Transform all available raw responses to domain objects
    const allRawResponses = [
      ...(cachedRaw.isOk() ? cachedRaw.value : []),
      ...freshRawResults,
    ];
    const transformedMetadata = this.transformRawResponses(allRawResponses);

    // 6. Aggregate transformed metadata
    return this.aggregateMetadata(transformedMetadata);
  }

  // determineSourcesToFetch, fetchRawFromSources, and cacheRawResults are
  // omitted from this outline.

  private transformRawResponses(
    rawResponses: RawMetadataResponse[],
  ): UrlMetadata[] {
    const transformed: UrlMetadata[] = [];
    for (const raw of rawResponses) {
      const transformer = this.transformers.get(raw.source);
      if (!transformer) continue; // no transformer registered for this source
      const result = transformer.transform(raw);
      // Check isOk() before reading .value so failed transforms are skipped safely.
      if (result.isOk()) transformed.push(result.value);
    }
    return transformed;
  }

  private aggregateMetadata(metadataList: UrlMetadata[]): Result<UrlMetadata> {
    // Intelligent merging logic (still to be written):
    // - Prefer academic sources (Citoid) for scholarly content
    // - Prefer social media optimized sources (Iframely) for rich media
    // - Combine fields from multiple sources (e.g., best title, description, image)
    // Placeholder so the method type-checks until the merge is implemented.
    throw new Error('aggregateMetadata is not implemented yet');
  }
}
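The service outline above leaves `fetchRawFromSources` undefined. One possible shape, sketched here under assumptions, fetches from all providers in parallel and drops any that fail so the remaining sources still contribute. The `ProviderLike` interface and the simplified `Result` union are stand-ins for this example, not the project's types.

```typescript
// Stand-in for the project's Result type.
type Result<T> = { ok: true; value: T } | { ok: false; error: Error };

interface RawMetadataResponse {
  url: string;
  source: string;
  retrievedAt: Date;
  payload: unknown;
}

// Hypothetical narrow view of a provider, just enough for this sketch.
interface ProviderLike {
  readonly source: string;
  fetchRawMetadata(url: string): Promise<Result<RawMetadataResponse>>;
}

async function fetchRawFromSources(
  url: string,
  providers: ProviderLike[],
): Promise<RawMetadataResponse[]> {
  // All requests start together; each one is isolated by its own try/catch.
  const outcomes = await Promise.all(
    providers.map(async (provider) => {
      try {
        const result = await provider.fetchRawMetadata(url);
        return result.ok ? result.value : null;
      } catch {
        // A provider that throws must not sink the other requests.
        return null;
      }
    }),
  );
  // Keep only the successful responses.
  return outcomes.filter((raw): raw is RawMetadataResponse => raw !== null);
}
```

Swallowing errors keeps the example short; a fuller version would probably log or collect them for the health monitoring mentioned under future enhancements.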
5. Use Case#
export class GetUrlMetadataUseCase {
  constructor(
    private readonly aggregationService: MetadataAggregationService,
  ) {}

  async execute(
    url: string,
    preferredSources?: MetadataSource[],
  ): Promise<Result<UrlMetadata>> {
    // URL here appears to be a domain value object with a validating factory,
    // not the global URL class (which has no create method).
    const urlResult = URL.create(url);
    if (urlResult.isErr()) {
      return err(urlResult.error);
    }
    return this.aggregationService.getMetadata(
      urlResult.value,
      preferredSources,
    );
  }
}
Caching Strategy#
Cache Key Structure#
- Raw responses: raw_metadata:{url_hash}:{source}
- Transformed metadata: url_metadata:{url_hash} (optional, for performance)
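The key formats above could be produced along these lines. The document does not say how `url_hash` is computed, so the SHA-256 hex digest and the normalization through the built-in `URL` class are assumptions, as are the function names.

```typescript
import { createHash } from "node:crypto";

type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";

// Assumption: url_hash is a SHA-256 hex digest of the normalized URL.
function urlHash(rawUrl: string): string {
  // new URL(...).href lowercases the scheme and host and adds a root path,
  // so trivially different spellings share one cache entry.
  const normalized = new URL(rawUrl).href;
  return createHash("sha256").update(normalized).digest("hex");
}

function rawCacheKey(rawUrl: string, source: MetadataSource): string {
  return `raw_metadata:${urlHash(rawUrl)}:${source}`;
}

function transformedCacheKey(rawUrl: string): string {
  return `url_metadata:${urlHash(rawUrl)}`;
}
```

Hashing keeps keys at a fixed length regardless of URL size; the trade-off is that keys are no longer human-readable when debugging.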
Cache Invalidation#
- Time-based: 7 days default, configurable per source
- Manual: When user requests fresh metadata
- Source-specific: Different TTL for different providers
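These three invalidation rules can be combined in one small check. In this sketch the 7-day default comes from the text above, while the one-day Iframely override, the millisecond representation, and the function names are purely illustrative.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";

const DAY_MS = 24 * 60 * 60 * 1000;
const DEFAULT_TTL_MS = 7 * DAY_MS; // time-based default from this document

// Hypothetical per-source overrides; the value here is only an example.
const ttlOverridesMs: Partial<Record<MetadataSource, number>> = {
  IFRAMELY: 1 * DAY_MS,
};

function ttlFor(source: MetadataSource): number {
  return ttlOverridesMs[source] ?? DEFAULT_TTL_MS;
}

function needsRefresh(
  source: MetadataSource,
  retrievedAt: Date,
  now: Date,
  forceRefresh: boolean = false,
): boolean {
  // Manual invalidation: the user explicitly asked for fresh metadata.
  if (forceRefresh) return true;
  // Time-based invalidation with a source-specific TTL.
  return now.getTime() - retrievedAt.getTime() > ttlFor(source);
}
```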
Partial Cache Hits#
- If we have Citoid data but need Iframely data, only fetch from Iframely
- Aggregate cached + fresh data intelligently
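A partial cache hit boils down to a set difference between the requested sources and the sources that already have a fresh cached response. The sketch below illustrates that idea; the `CachedEntry` shape, the millisecond age limit, and the function signature are assumptions rather than the service's actual `determineSourcesToFetch`.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";

// Hypothetical minimal view of a cached raw response.
interface CachedEntry {
  source: MetadataSource;
  retrievedAt: Date;
}

function determineSourcesToFetch(
  requested: MetadataSource[],
  cached: CachedEntry[],
  maxAgeMs: number,
  now: Date,
): MetadataSource[] {
  // Sources whose cached response is still within the age limit.
  const freshSources = new Set(
    cached
      .filter((entry) => now.getTime() - entry.retrievedAt.getTime() <= maxAgeMs)
      .map((entry) => entry.source),
  );
  // Everything else (missing or stale) needs a network call.
  return requested.filter((source) => !freshSources.has(source));
}
```

With fresh Citoid data cached and both Citoid and Iframely requested, only Iframely is returned, matching the example in the list above.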
Data Aggregation Rules#
Field Priority (Configurable)#
- Title: Citoid > Iframely > OpenGraph
- Description: Iframely > Citoid > OpenGraph
- Author: Citoid > Iframely
- Image: Iframely > OpenGraph > Citoid
- Published Date: Citoid > Iframely
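The priority table above maps naturally onto a lookup structure: for each field, walk the sources in order and take the first non-empty value. This is a minimal sketch of that idea; the `UrlMetadata` field names and the `mergeByPriority` function are assumptions, and every field is treated as a plain string for simplicity.

```typescript
type MetadataSource = "CITOID" | "IFRAMELY" | "OPENGRAPH";

// Hypothetical normalized fields, all strings to keep the sketch small.
interface UrlMetadata {
  title?: string;
  description?: string;
  author?: string;
  image?: string;
  publishedDate?: string;
}

type MetadataField = keyof UrlMetadata;

// Mirrors the field priority list in this section.
const FIELD_PRIORITY: Record<MetadataField, MetadataSource[]> = {
  title: ["CITOID", "IFRAMELY", "OPENGRAPH"],
  description: ["IFRAMELY", "CITOID", "OPENGRAPH"],
  author: ["CITOID", "IFRAMELY"],
  image: ["IFRAMELY", "OPENGRAPH", "CITOID"],
  publishedDate: ["CITOID", "IFRAMELY"],
};

function mergeByPriority(
  bySource: Partial<Record<MetadataSource, UrlMetadata>>,
): UrlMetadata {
  const merged: UrlMetadata = {};
  for (const field of Object.keys(FIELD_PRIORITY) as MetadataField[]) {
    for (const source of FIELD_PRIORITY[field]) {
      const value = bySource[source]?.[field];
      if (value !== undefined && value !== "") {
        merged[field] = value; // first non-empty value in priority order wins
        break;
      }
    }
  }
  return merged;
}
```

Keeping the priorities in data rather than in branching code is one way to make them configurable, as the heading suggests.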
Conflict Resolution#
- Prefer more recent data when timestamps differ significantly
- Prefer more complete data (fewer null fields)
- Allow manual source preference overrides
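One way to read these rules as a decision procedure is sketched below. The ordering of the checks (manual override, then recency, then completeness), the 30-day threshold for "significantly" different timestamps, and the `Candidate` shape are all assumptions; the document does not specify them.

```typescript
// Hypothetical view of one source's contribution.
interface Candidate {
  source: string;
  retrievedAt: Date;
  fields: Record<string, string | null | undefined>;
}

// Assumed threshold for "timestamps differ significantly".
const SIGNIFICANT_GAP_MS = 30 * 24 * 60 * 60 * 1000;

// Counts fields that carry a usable value.
function completeness(candidate: Candidate): number {
  return Object.keys(candidate.fields).filter((key) => {
    const value = candidate.fields[key];
    return value !== null && value !== undefined && value !== "";
  }).length;
}

function pickPreferred(a: Candidate, b: Candidate, preferredSource?: string): Candidate {
  // 1. A manual source preference overrides everything else.
  if (preferredSource !== undefined) {
    if (a.source === preferredSource) return a;
    if (b.source === preferredSource) return b;
  }
  // 2. If one candidate is much newer, prefer it.
  const gapMs = a.retrievedAt.getTime() - b.retrievedAt.getTime();
  if (Math.abs(gapMs) > SIGNIFICANT_GAP_MS) return gapMs > 0 ? a : b;
  // 3. Otherwise prefer the more complete candidate; ties keep the first.
  return completeness(b) > completeness(a) ? b : a;
}
```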
Benefits of This Approach#
- Domain Purity: Business logic stays in domain layer
- Testability: Easy to mock providers and repository
- Flexibility: Easy to add new metadata sources
- Performance: Intelligent caching reduces API calls
- Reliability: Fallback between sources when one fails
- Configurability: Source preferences can be adjusted per use case
Future Enhancements#
- Source Health Monitoring: Track success rates and response times
- Dynamic Source Selection: Choose sources based on URL patterns
- Batch Processing: Fetch metadata for multiple URLs efficiently
- User Preferences: Allow users to prefer certain sources
- Metadata Enrichment: Combine multiple sources for richer data
Raw Data Storage Benefits#
- Data Preservation: Original API responses preserved exactly as returned
- Reprocessing Capability: Can re-transform data as domain models evolve
- Debugging: Easy to inspect what each API actually returned
- Audit Trail: Complete history of API interactions
- Schema Evolution: Domain objects can change without losing source data
- Multi-version Support: Can support multiple versions of transformers
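The reprocessing and multi-version points can be illustrated with two transformer versions run over the same stored payload. Everything named here (`StoredRaw`, the `site` payload key, `siteName`, the two transformer functions) is hypothetical and only meant to show that no new API call is needed when the domain model gains a field.

```typescript
// Hypothetical stored raw response with an object payload.
interface StoredRaw {
  source: string;
  payload: Record<string, unknown>;
}

interface UrlMetadata {
  title?: string;
  siteName?: string; // field added in a later version of the domain model
}

type MetadataTransformFn = (raw: StoredRaw) => UrlMetadata;

// Version 1 only knew about the title.
const transformV1: MetadataTransformFn = (raw) => ({
  title: typeof raw.payload.title === "string" ? raw.payload.title : undefined,
});

// Version 2 additionally reads a field that was always present in the raw data.
const transformV2: MetadataTransformFn = (raw) => ({
  ...transformV1(raw),
  siteName: typeof raw.payload.site === "string" ? raw.payload.site : undefined,
});

// Re-runs a transformer over everything already stored; no network involved.
function reprocess(stored: StoredRaw[], transformer: MetadataTransformFn): UrlMetadata[] {
  return stored.map((raw) => transformer(raw));
}
```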
Implementation Order#
- Define interfaces (IMetadataProvider, IRawMetadataRepository, IMetadataTransformer)
- Implement raw metadata repository with caching
- Create RawMetadataResponse value object
- Refactor existing CitoidMetadataService to return raw responses
- Implement CitoidMetadataTransformer to convert raw to domain objects
- Implement MetadataAggregationService
- Add additional providers and transformers (Iframely, OpenGraph)
- Implement intelligent aggregation logic
- Add monitoring and health checks