search for standard sites pub-search.waow.tech
search zig blog atproto

fix: use HTTP site URL as base_path for standard.site documents

standard.site documents store the origin URL (e.g., "https://attoshi.com")
in publication_uri, but the indexer only resolves base_path from AT-URIs
via the publications table. When publication_uri is an HTTP URL, the lookup
fails silently and base_path stays empty, breaking frontend links.

Add a fallback: if base_path is still empty and publication_uri starts with
http(s)://, strip the scheme and use the remainder as base_path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+17
backend/src/ingest/indexer.zig
@@ -125,5 +125,22 @@
         }
     }
 
+    // fallback: if publication_uri is an HTTP(S) URL, use its host as base_path
+    // standard.site documents store the origin URL in the "site" field, which
+    // our extractor reads into publication_uri. Strip the scheme to match
+    // base_path convention (frontend prepends "https://").
+    if (base_path.len == 0 and pub_uri.len > 0) {
+        const host = if (std.mem.startsWith(u8, pub_uri, "https://"))
+            pub_uri["https://".len..]
+        else if (std.mem.startsWith(u8, pub_uri, "http://"))
+            pub_uri["http://".len..]
+        else
+            "";
+        if (host.len > 0 and host.len <= base_path_buf.len) {
+            @memcpy(base_path_buf[0..host.len], host);
+            base_path = base_path_buf[0..host.len];
+        }
+    }
+
     // skip .test domains (dev/staging data)
     if (std.mem.endsWith(u8, base_path, ".test")) return;
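
For reviewers who want to check the scheme-stripping behaviour in isolation, here is a minimal sketch of the same prefix logic as a standalone function. The name stripHttpScheme and the AT-URI in the test are hypothetical and not part of this commit; the real change is inlined in indexer.zig and copies the result into base_path_buf.

```zig
const std = @import("std");

/// Hypothetical helper (not in the commit): returns whatever follows an
/// "https://" or "http://" prefix, or null when neither prefix is present.
fn stripHttpScheme(uri: []const u8) ?[]const u8 {
    const schemes = [_][]const u8{ "https://", "http://" };
    for (schemes) |scheme| {
        if (std.mem.startsWith(u8, uri, scheme)) return uri[scheme.len..];
    }
    return null;
}

test "stripHttpScheme mirrors the fallback's prefix handling" {
    // Origin URL from the commit message.
    try std.testing.expectEqualStrings("attoshi.com", stripHttpScheme("https://attoshi.com").?);
    // Anything after the scheme is kept, including a path.
    try std.testing.expectEqualStrings("example.org/blog", stripHttpScheme("http://example.org/blog").?);
    // Made-up AT-URI: no HTTP(S) scheme, so there is nothing to strip.
    try std.testing.expect(stripHttpScheme("at://did:plc:example/app.example.publication/1") == null);
}
```

Saved to a file, this should run with `zig test`. As in the diff, the function keeps everything after the scheme, so a URL with a path or trailing slash would carry that through; the committed code additionally treats an empty remainder as no match and skips values longer than base_path_buf.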