# Run Kagi News aggregator daily at 1 PM UTC (after Kagi updates around noon)
0 13 * * * cd /app && /usr/local/bin/python -m src.main >> /var/log/cron.log 2>&1

# Blank line required at end of crontab

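The entry `0 13 * * *` fires once a day at minute 0 of hour 13. As a rough illustration of what that schedule means, the sketch below computes the next firing time for a given instant. It assumes the cron daemon's clock is UTC, which the comment implies but the crontab itself does not enforce. `next_daily_run` is a hypothetical helper and is not part of the aggregator.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(now: datetime, hour: int = 13, minute: int = 0) -> datetime:
    """Return the next time a daily `minute hour * * *` cron entry fires.

    Hypothetical helper for illustration; assumes `now` is timezone-aware UTC.
    """
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    # If today's slot has already passed (or is exactly now), the next
    # firing is the same time tomorrow.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


before_slot = datetime(2025, 10, 23, 12, 0, tzinfo=timezone.utc)
after_slot = datetime(2025, 10, 23, 13, 30, tzinfo=timezone.utc)
print(next_daily_run(before_slot))  # same day, 13:00 UTC
print(next_daily_run(after_slot))   # next day, 13:00 UTC
```

If the container runs in a different timezone, the job fires at 13:00 local time instead, so the timezone is worth checking against the intended window.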
<?xml version='1.0' encoding='UTF-8'?>
<!-- Sample RSS item from Kagi News - includes quote, highlights, perspectives, sources -->
<item>
  <title>Trump to meet Xi in South Korea on Oct 30</title>
  <link>https://kite.kagi.com/96cf948f-8a1b-4281-9ba4-8a9e1ad7b3c6/world/10</link>
66+ <description><p>The White House confirmed President Trump will hold a bilateral meeting with Chinese President Xi Jinping in South Korea on October 30, at the end of an Asia trip that includes Malaysia and Japan . The administration said the meeting will take place Thursday morning local time, and Mr Trump indicated his first question to Xi would concern fentanyl and other bilateral issues . The talks come amid heightened trade tensions after Beijing expanded export curbs on rare-earth minerals and following Mr Trump's recent threat of additional tariffs on Chinese goods, making the meeting a focal point for discussions on trade, technology supply chains and energy .</p><img src='https://kagiproxy.com/img/Q2SRXQtwTYBIiQeI0FG-X6taF_wHSJaXDiFUzju2kbCWGuOYIFUX--8L0BqE4VKxpbOJY3ylFPJkDpfSnyQYZ1qdOLXbphHTnsOK4jb7gqC4KCn5nf3ANbWCuaFD5ZUSijiK0k7wOLP2fyX6tynu2mPtXlCbotLo2lTrEswZl4-No2AI4mI4lkResfnRdp-YjpoEfCOHkNfbN1-0cNcHt9T2dmgBSXrQ2w' alt='News image associated with coverage of President Trump&#x27;s Asia trip and planned meeting with President Xi' /><br /><h3>Highlights:</h3><ul><li>Itinerary details: The Asia swing begins in Malaysia, continues to Japan and ends with the bilateral meeting in South Korea on Thursday morning local time, White House press secretary Karoline Leavitt said at a briefing .</li><li>APEC context: US officials indicated the leaders will meet on the sidelines of the Asia-Pacific Economic Cooperation gathering, shaping expectations for short, high-level talks rather than a lengthy summit .</li><li>Tariff escalation: President Trump recently threatened an additional 100% tariff on Chinese goods starting in November, a step he has described as unsustainable but that has heightened urgency for talks .</li><li>Rare-earth impact: Beijing's expanded curbs on rare-earth exports have exposed supply vulnerabilities because US high-tech firms rely heavily on those materials, raising strategic and economic stakes for the meeting 
.</li></ul><blockquote>Work out a lot of our doubts and questions - President Trump</blockquote><h3>Perspectives:</h3><ul><li>President Trump: He said his first question to President Xi would be about fentanyl and indicated he hoped to resolve bilateral doubts and questions in the talks. (<a href='https://www.straitstimes.com/world/united-states/trump-to-meet-xi-in-south-korea-on-oct-30-as-part-of-asia-swing'>The Straits Times</a>)</li><li>White House (press secretary): Karoline Leavitt confirmed the bilateral meeting will occur Thursday morning local time during a White House briefing. (<a href='https://www.scmp.com/news/us/diplomacy/article/3330131/donald-trump-meet-chinas-xi-jinping-next-thursday-south-korea-crunch-talks'>South China Morning Post</a>)</li><li>Beijing/Chinese authorities: Officials have defended tighter export controls on rare-earths, a move described in reporting as not explicitly targeting the US though it has raised tensions. (<a href='https://www.rt.com/news/626890-white-house-announces-trump-xi-meeting/'>RT</a>)</li></ul><h3>Sources:</h3><ul><li><a href='https://www.straitstimes.com/world/united-states/trump-to-meet-xi-in-south-korea-on-oct-30-as-part-of-asia-swing'>Trump to meet Xi in South Korea on Oct 30 as part of Asia swing</a> - straitstimes.com</li><li><a href='https://www.scmp.com/news/us/diplomacy/article/3330131/donald-trump-meet-chinas-xi-jinping-next-thursday-south-korea-crunch-talks'>Trump to meet Xi in South Korea next Thursday as part of key Asia trip</a> - scmp.com</li><li><a href='https://www.rt.com/news/626890-white-house-announces-trump-xi-meeting/'>White House announces Trump-Xi meeting</a> - rt.com</li><li><a href='https://www.thehindu.com/news/international/trump-to-meet-xi-in-south-korea-as-part-of-asia-swing/article70195667.ece'>Trump to meet Xi in South Korea as part of Asia swing</a> - thehindu.com</li><li><a 
href='https://www.aljazeera.com/news/2025/10/24/white-house-confirms-trump-to-meet-xi-in-south-korea-as-part-of-asia-tour'>White House confirms Trump to meet Xi in South Korea as part of Asia tour</a> - aljazeera.com</li></ul></description>
  <guid isPermaLink="true">https://kite.kagi.com/96cf948f-8a1b-4281-9ba4-8a9e1ad7b3c6/world/10</guid>
  <category>World</category>
  <category>World/Diplomacy</category>
  <category>Diplomacy</category>
  <pubDate>Thu, 23 Oct 2025 20:56:00 +0000</pubDate>
</item>
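The description in the sample item follows a regular layout: a summary paragraph, then `<h3>` headings (Highlights, Perspectives, Sources) each followed by a `<ul>`, plus a `<blockquote>` of the form `text - attribution`. The sketch below shows one way such a layout could be walked with the standard library's `html.parser`. It is not the project's `KagiHTMLParser`; `SectionCollector` and `split_quote` are hypothetical names, and the sample string is a shortened stand-in for the real description.

```python
from html.parser import HTMLParser


class SectionCollector(HTMLParser):
    """Collect <li> text under each <h3> heading, plus the blockquote text."""

    def __init__(self):
        super().__init__()
        self.sections = {}    # heading (without trailing colon) -> list of items
        self.quote = None     # raw blockquote text, if present
        self._heading = None  # most recent <h3> heading
        self._tag = None      # tag whose text is currently being captured
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h3", "li", "blockquote"):
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        # Text inside nested tags such as <a> is captured as well.
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._tag:
            return
        text = " ".join("".join(self._buf).split())
        if tag == "h3":
            self._heading = text.rstrip(":")
            self.sections[self._heading] = []
        elif tag == "li" and self._heading is not None:
            self.sections[self._heading].append(text)
        elif tag == "blockquote":
            self.quote = text
        self._tag = None


def split_quote(raw):
    """Split 'text - attribution' on the last ' - '; attribution may be None."""
    text, sep, attribution = raw.rpartition(" - ")
    if not sep:
        return raw, None
    return text, attribution


sample = (
    "<p>Summary text.</p>"
    "<h3>Highlights:</h3><ul><li>First point .</li><li>Second point .</li></ul>"
    "<blockquote>Work out a lot of our doubts and questions - President Trump</blockquote>"
    "<h3>Sources:</h3><ul><li><a href='https://example.com/a'>Story A</a> - example.com</li></ul>"
)

collector = SectionCollector()
collector.feed(sample)
collector.close()
print(collector.sections)
print(split_quote(collector.quote))
```

Splitting on the last ` - ` rather than the first keeps quotes that themselves contain a dash intact, at the cost of mis-splitting an attribution that contains one.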
"""
End-to-End Integration Tests.

Tests the complete aggregator workflow against live infrastructure:
- Mocked HTTP for the Kagi RSS feed (via `responses`)
- Real PDS (Coves test PDS via Docker)
- Real community posting
- Real state management

Requires:
- Coves test PDS running on localhost:3001
- Coves AppView running on localhost:8081
- Test database with community: e2e-95206.community.coves.social
- Aggregator account: kagi-news.local.coves.dev
"""
import os
import pytest
import responses
from pathlib import Path
from datetime import datetime

from src.main import Aggregator
from src.coves_client import CovesClient
from src.config import ConfigLoader


# Skip E2E tests by default (require live infrastructure)
pytestmark = pytest.mark.skipif(
    os.getenv('RUN_E2E_TESTS') != '1',
    reason="E2E tests require RUN_E2E_TESTS=1 and live PDS"
)


@pytest.fixture
def test_community(aggregator_credentials):
    """Create a test community for E2E testing."""
    import time
    import requests

    handle, password = aggregator_credentials

    # Authenticate against the PDS; fail early with a clear HTTP error
    auth_response = requests.post(
        "http://localhost:3001/xrpc/com.atproto.server.createSession",
        json={"identifier": handle, "password": password},
        timeout=10,
    )
    auth_response.raise_for_status()
    token = auth_response.json()["accessJwt"]

    # Create community (use short name to avoid handle length limits)
    community_name = f"e2e-{int(time.time()) % 10000}"  # At most 4 digits
    create_response = requests.post(
        "http://localhost:8081/xrpc/social.coves.community.create",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "name": community_name,
            "displayName": "E2E Test Community",
            "description": "Temporary community for aggregator E2E testing",
            "visibility": "public"
        },
        timeout=10,
    )

    if not create_response.ok:
        raise RuntimeError(f"Failed to create community: {create_response.text}")

    community_handle = f"{community_name}.community.coves.social"
    print(f"\n✅ Created test community: {community_handle}")
    return community_handle


@pytest.fixture
def test_config_file(tmp_path, test_community):
    """Create test configuration file with dynamic community."""
    config_content = f"""
coves_api_url: http://localhost:8081

feeds:
  - name: "Kagi World News"
    url: "https://news.kagi.com/world.xml"
    community_handle: "{test_community}"
    enabled: true

log_level: debug
"""
    config_file = tmp_path / "config.yaml"
    config_file.write_text(config_content)
    return config_file


@pytest.fixture
def test_state_file(tmp_path):
    """Create temporary state file."""
    return tmp_path / "state.json"


@pytest.fixture
def mock_kagi_feed():
    """Load real Kagi RSS feed fixture."""
    # Load from data directory (where actual feed is stored)
    fixture_path = Path(__file__).parent.parent / "data" / "world.xml"
    if not fixture_path.exists():
        # Fallback to tests/fixtures if moved
        fixture_path = Path(__file__).parent / "fixtures" / "world.xml"
    return fixture_path.read_text()


@pytest.fixture
def aggregator_credentials():
    """Get aggregator credentials from environment."""
    handle = os.getenv('AGGREGATOR_HANDLE', 'kagi-news.local.coves.dev')
    password = os.getenv('AGGREGATOR_PASSWORD', 'kagi-aggregator-2024-secure-pass')
    return handle, password


class TestEndToEnd:
    """Full end-to-end integration tests."""

    @responses.activate
    def test_full_aggregator_workflow(
        self,
        test_config_file,
        test_state_file,
        mock_kagi_feed,
        aggregator_credentials
    ):
        """
        Test complete workflow: fetch → parse → format → post → verify.

        This test:
        1. Mocks Kagi RSS HTTP request
        2. Authenticates with real PDS
        3. Parses real Kagi HTML content
        4. Formats with rich text facets
        5. Posts to real community
        6. Verifies post was created
        7. Tests deduplication (no repost)
        """
        # Mock Kagi RSS feed
        responses.add(
            responses.GET,
            "https://news.kagi.com/world.xml",
            body=mock_kagi_feed,
            status=200,
            content_type="application/xml"
        )

        # Allow passthrough for localhost (PDS)
        responses.add_passthru("http://localhost")

        # Set up environment
        handle, password = aggregator_credentials
        os.environ['AGGREGATOR_HANDLE'] = handle
        os.environ['AGGREGATOR_PASSWORD'] = password
        os.environ['PDS_URL'] = 'http://localhost:3001'  # Auth through PDS

        # Create aggregator
        aggregator = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )

        # Run first time: should post stories
        print("\n" + "="*60)
        print("🚀 Running first aggregator pass (should post stories)")
        print("="*60)
        aggregator.run()

        # Verify state was updated (stories marked as posted)
        posted_count = aggregator.state_manager.get_posted_count(
            "https://news.kagi.com/world.xml"
        )
        print(f"\n✅ First pass: {posted_count} stories posted and tracked")
        assert posted_count > 0, "Should have posted at least one story"

        # Create new aggregator instance (simulates CRON re-run)
        aggregator2 = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )

        # Run second time: should skip duplicates
        print("\n" + "="*60)
        print("🔄 Running second aggregator pass (should skip duplicates)")
        print("="*60)
        aggregator2.run()

        # Verify count didn't change (deduplication worked)
        posted_count2 = aggregator2.state_manager.get_posted_count(
            "https://news.kagi.com/world.xml"
        )
        print(f"\n✅ Second pass: Still {posted_count2} stories (duplicates skipped)")
        assert posted_count2 == posted_count, "Should not post duplicates"

    @responses.activate
    def test_post_with_external_embed(
        self,
        test_config_file,
        test_state_file,
        mock_kagi_feed,
        aggregator_credentials
    ):
        """
        Test that posts include external embeds with images.

        Verifies:
        - External embed is created
        - Thumbnail URL is included
        - Title and description are set
        """
        # Mock Kagi RSS feed
        responses.add(
            responses.GET,
            "https://news.kagi.com/world.xml",
            body=mock_kagi_feed,
            status=200
        )

        # Allow passthrough for localhost (PDS)
        responses.add_passthru("http://localhost")

        # Set up environment
        handle, password = aggregator_credentials
        os.environ['AGGREGATOR_HANDLE'] = handle
        os.environ['AGGREGATOR_PASSWORD'] = password
        os.environ['PDS_URL'] = 'http://localhost:3001'  # Auth through PDS

        # Run aggregator
        aggregator = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )

        print("\n" + "="*60)
        print("🖼️ Testing external embed creation")
        print("="*60)
        aggregator.run()

        # Verify posts were created
        posted_count = aggregator.state_manager.get_posted_count(
            "https://news.kagi.com/world.xml"
        )
        print(f"\n✅ Posted {posted_count} stories with external embeds")
        assert posted_count > 0

    def test_authentication_with_live_pds(self, aggregator_credentials):
        """
        Test authentication against live PDS.

        Verifies:
        - Can authenticate with aggregator account
        - Receives valid JWT tokens
        - DID matches expected format
        """
        handle, password = aggregator_credentials

        print("\n" + "="*60)
        print(f"🔐 Testing authentication: {handle}")
        print("="*60)

        # Create client and authenticate
        client = CovesClient(
            api_url="http://localhost:8081",  # AppView for posting
            handle=handle,
            password=password,
            pds_url="http://localhost:3001"  # PDS for auth
        )

        client.authenticate()

        print(f"\n✅ Authentication successful")
        print(f"   Handle: {client.handle}")
        print(f"   Authenticated: {client._authenticated}")

        assert client._authenticated is True
        assert hasattr(client, 'did')
        assert client.did.startswith("did:plc:")

    @responses.activate
    def test_state_persistence_across_runs(
        self,
        test_config_file,
        test_state_file,
        aggregator_credentials
    ):
        """
        Test that state persists correctly across multiple runs.

        Verifies:
        - State file is created
        - Posted GUIDs are tracked
        - Last run timestamp is updated
        - State survives aggregator restart
        """
        # Mock empty feed (to avoid posting). The decorator tears the mock
        # down even when an assertion below fails.
        responses.add(
            responses.GET,
            "https://news.kagi.com/world.xml",
            body='<?xml version="1.0"?><rss version="2.0"><channel></channel></rss>',
            status=200
        )

        # Allow passthrough for localhost (PDS)
        responses.add_passthru("http://localhost")

        handle, password = aggregator_credentials
        os.environ['AGGREGATOR_HANDLE'] = handle
        os.environ['AGGREGATOR_PASSWORD'] = password

        print("\n" + "="*60)
        print("💾 Testing state persistence")
        print("="*60)

        # First run
        aggregator1 = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )
        aggregator1.run()

        # Verify state file was created
        assert test_state_file.exists(), "State file should be created"
        print(f"\n✅ State file created: {test_state_file}")

        # Verify last run was recorded
        last_run1 = aggregator1.state_manager.get_last_run(
            "https://news.kagi.com/world.xml"
        )
        assert last_run1 is not None, "Last run should be recorded"
        print(f"   Last run: {last_run1}")

        # Second run (new instance)
        aggregator2 = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )
        aggregator2.run()

        # Verify state persisted
        last_run2 = aggregator2.state_manager.get_last_run(
            "https://news.kagi.com/world.xml"
        )
        assert last_run2 >= last_run1, "Last run should be updated"
        print(f"   Last run (after restart): {last_run2}")
        print(f"\n✅ State persisted across aggregator restarts")

    @responses.activate
    def test_error_recovery(
        self,
        test_config_file,
        test_state_file,
        aggregator_credentials
    ):
        """
        Test that aggregator handles errors gracefully.

        Verifies:
        - Continues processing on feed errors
        - Doesn't crash on network failures
        - Logs errors appropriately
        """
        # Mock feed failure. The decorator tears the mock down even when
        # the test fails.
        responses.add(
            responses.GET,
            "https://news.kagi.com/world.xml",
            body="Internal Server Error",
            status=500
        )

        # Allow passthrough for localhost (PDS)
        responses.add_passthru("http://localhost")

        handle, password = aggregator_credentials
        os.environ['AGGREGATOR_HANDLE'] = handle
        os.environ['AGGREGATOR_PASSWORD'] = password

        print("\n" + "="*60)
        print("🛡️ Testing error recovery")
        print("="*60)

        # Should not crash
        aggregator = Aggregator(
            config_path=test_config_file,
            state_file=test_state_file
        )

        try:
            aggregator.run()
            print(f"\n✅ Aggregator handled feed error gracefully")
        except Exception as e:
            pytest.fail(f"Aggregator should handle errors gracefully: {e}")


def test_coves_client_external_embed_format(aggregator_credentials):
    """
    Test external embed formatting.

    Verifies:
    - Embed structure matches social.coves.embed.external
    - All required fields are present
    - Optional thumbnail is included when provided
    """
    handle, password = aggregator_credentials

    client = CovesClient(
        api_url="http://localhost:8081",
        handle=handle,
        password=password
    )

    # Test with thumbnail
    embed = client.create_external_embed(
        uri="https://example.com/story",
        title="Test Story",
        description="Test description",
        thumb="https://example.com/image.jpg"
    )

    assert embed["$type"] == "social.coves.embed.external"
    assert embed["external"]["uri"] == "https://example.com/story"
    assert embed["external"]["title"] == "Test Story"
    assert embed["external"]["description"] == "Test description"
    assert embed["external"]["thumb"] == "https://example.com/image.jpg"

    # Test without thumbnail
    embed_no_thumb = client.create_external_embed(
        uri="https://example.com/story2",
        title="Test Story 2",
        description="Test description 2"
    )

    assert "thumb" not in embed_no_thumb["external"]
    print("\n✅ External embed format correct")
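The deduplication and persistence tests above rely on a state manager that remembers which GUIDs were posted per feed and when the feed was last processed. The real implementation is not shown in this diff, so the following is only a minimal sketch of the idea, assuming a JSON file keyed by feed URL. `JsonStateStore` and `run_once` are hypothetical names; only `get_posted_count` mirrors a method name that the tests call.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


class JsonStateStore:
    """Hypothetical stand-in for the aggregator's state manager."""

    def __init__(self, path):
        self.path = Path(path)
        # Reload earlier state so a fresh instance sees previous runs.
        self.state = json.loads(self.path.read_text()) if self.path.exists() else {}

    def _feed(self, feed_url):
        return self.state.setdefault(feed_url, {"posted_guids": [], "last_run": None})

    def is_posted(self, feed_url, guid):
        return guid in self._feed(feed_url)["posted_guids"]

    def mark_posted(self, feed_url, guid):
        if not self.is_posted(feed_url, guid):
            self._feed(feed_url)["posted_guids"].append(guid)

    def get_posted_count(self, feed_url):
        return len(self._feed(feed_url)["posted_guids"])

    def record_run(self, feed_url):
        self._feed(feed_url)["last_run"] = datetime.now(timezone.utc).isoformat()

    def save(self):
        self.path.write_text(json.dumps(self.state, indent=2))


def run_once(store, feed_url, guids):
    """Mark unseen GUIDs as posted (actual posting elided); return how many were new."""
    new = [g for g in guids if not store.is_posted(feed_url, g)]
    for guid in new:
        store.mark_posted(feed_url, guid)
    store.record_run(feed_url)
    store.save()
    return len(new)


feed = "https://news.kagi.com/world.xml"
with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "state.json"
    # Two separate store instances simulate two cron invocations.
    first = run_once(JsonStateStore(state_file), feed, ["guid-1", "guid-2"])
    second = run_once(JsonStateStore(state_file), feed, ["guid-1", "guid-2"])
    final_count = JsonStateStore(state_file).get_posted_count(feed)

print(first, second, final_count)
```

Creating a new store for the second pass is what makes this a persistence check rather than an in-memory one, which is the same reason the tests build a second `Aggregator` instance.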
aggregators/kagi-news/tests/test_html_parser.py
"""
Tests for Kagi HTML description parser.
"""
import pytest
from pathlib import Path
from datetime import datetime
import html

from src.html_parser import KagiHTMLParser
from src.models import KagiStory, Perspective, Quote, Source


@pytest.fixture
def sample_html_description():
    """Return sample HTML matching the RSS item fixture (shortened)."""
    # This is the HTML from the RSS description field after XML unescaping
    html_content = """<p>The White House confirmed President Trump will hold a bilateral meeting with Chinese President Xi Jinping in South Korea on October 30, at the end of an Asia trip that includes Malaysia and Japan . The administration said the meeting will take place Thursday morning local time, and Mr Trump indicated his first question to Xi would concern fentanyl and other bilateral issues . The talks come amid heightened trade tensions after Beijing expanded export curbs on rare-earth minerals and following Mr Trump's recent threat of additional tariffs on Chinese goods, making the meeting a focal point for discussions on trade, technology supply chains and energy .</p><img src='https://kagiproxy.com/img/Q2SRXQtwTYBIiQeI0FG-X6taF_wHSJaXDiFUzju2kbCWGuOYIFUX--8L0BqE4VKxpbOJY3ylFPJkDpfSnyQYZ1qdOLXbphHTnsOK4jb7gqC4KCn5nf3ANbWCuaFD5ZUSijiK0k7wOLP2fyX6tynu2mPtXlCbotLo2lTrEswZl4-No2AI4mI4lkResfnRdp-YjpoEfCOHkNfbN1-0cNcHt9T2dmgBSXrQ2w' alt='News image associated with coverage of President Trump&#x27;s Asia trip and planned meeting with President Xi' /><br /><h3>Highlights:</h3><ul><li>Itinerary details: The Asia swing begins in Malaysia, continues to Japan and ends with the bilateral meeting in South Korea on Thursday morning local time, White House press secretary Karoline Leavitt said at a briefing .</li><li>APEC context: US officials indicated the leaders will meet on the sidelines of the Asia-Pacific Economic Cooperation gathering, shaping expectations for short, high-level talks rather than a lengthy summit .</li></ul><blockquote>Work out a lot of our doubts and questions - President Trump</blockquote><h3>Perspectives:</h3><ul><li>President Trump: He said his first question to President Xi would be about fentanyl and indicated he hoped to resolve bilateral doubts and questions in the talks. (<a href='https://www.straitstimes.com/world/united-states/trump-to-meet-xi-in-south-korea-on-oct-30-as-part-of-asia-swing'>The Straits Times</a>)</li><li>White House (press secretary): Karoline Leavitt confirmed the bilateral meeting will occur Thursday morning local time during a White House briefing. (<a href='https://www.scmp.com/news/us/diplomacy/article/3330131/donald-trump-meet-chinas-xi-jinping-next-thursday-south-korea-crunch-talks'>South China Morning Post</a>)</li></ul><h3>Sources:</h3><ul><li><a href='https://www.straitstimes.com/world/united-states/trump-to-meet-xi-in-south-korea-on-oct-30-as-part-of-asia-swing'>Trump to meet Xi in South Korea on Oct 30 as part of Asia swing</a> - straitstimes.com</li><li><a href='https://www.scmp.com/news/us/diplomacy/article/3330131/donald-trump-meet-chinas-xi-jinping-next-thursday-south-korea-crunch-talks'>Trump to meet Xi in South Korea next Thursday as part of key Asia trip</a> - scmp.com</li></ul>"""
    return html_content


class TestKagiHTMLParser:
    """Test suite for Kagi HTML parser."""

    def test_parse_summary(self, sample_html_description):
        """Test extracting summary paragraph."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert result['summary'].startswith("The White House confirmed President Trump")
        assert "bilateral meeting with Chinese President Xi Jinping" in result['summary']

    def test_parse_image_url(self, sample_html_description):
        """Test extracting image URL and alt text."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert result['image_url'] is not None
        assert result['image_url'].startswith("https://kagiproxy.com/img/")
        assert result['image_alt'] is not None
        assert "Trump" in result['image_alt']

    def test_parse_highlights(self, sample_html_description):
        """Test extracting highlights list."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert len(result['highlights']) == 2
        assert "Itinerary details" in result['highlights'][0]
        assert "APEC context" in result['highlights'][1]

    def test_parse_quote(self, sample_html_description):
        """Test extracting blockquote."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert result['quote'] is not None
        assert result['quote']['text'] == "Work out a lot of our doubts and questions"
        assert result['quote']['attribution'] == "President Trump"

    def test_parse_perspectives(self, sample_html_description):
        """Test extracting perspectives list."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert len(result['perspectives']) == 2

        # First perspective
        assert result['perspectives'][0]['actor'] == "President Trump"
        assert "fentanyl" in result['perspectives'][0]['description']
        assert result['perspectives'][0]['source_url'] == "https://www.straitstimes.com/world/united-states/trump-to-meet-xi-in-south-korea-on-oct-30-as-part-of-asia-swing"

        # Second perspective
        assert "White House" in result['perspectives'][1]['actor']

    def test_parse_sources(self, sample_html_description):
        """Test extracting sources list."""
        parser = KagiHTMLParser()
        result = parser.parse(sample_html_description)

        assert len(result['sources']) >= 2

        # Check first source
        assert result['sources'][0]['title'] == "Trump to meet Xi in South Korea on Oct 30 as part of Asia swing"
        assert result['sources'][0]['url'].startswith("https://www.straitstimes.com")
        assert result['sources'][0]['domain'] == "straitstimes.com"

    def test_parse_missing_sections(self):
        """Test parsing HTML with missing sections."""
        html_minimal = "<p>Just a summary, no other sections.</p>"

        parser = KagiHTMLParser()
        result = parser.parse(html_minimal)

        assert result['summary'] == "Just a summary, no other sections."
        assert result['highlights'] == []
        assert result['perspectives'] == []
        assert result['sources'] == []
        assert result['quote'] is None
        assert result['image_url'] is None

    def test_parse_to_kagi_story(self, sample_html_description):
        """Test converting parsed HTML to KagiStory object."""
        parser = KagiHTMLParser()

        # Simulate full RSS item data
        story = parser.parse_to_story(
            title="Trump to meet Xi in South Korea on Oct 30",
            link="https://kite.kagi.com/test/world/10",
            guid="https://kite.kagi.com/test/world/10",
            pub_date=datetime(2025, 10, 23, 20, 56, 0),
            categories=["World", "World/Diplomacy"],
            html_description=sample_html_description
        )

        assert isinstance(story, KagiStory)
        assert story.title == "Trump to meet Xi in South Korea on Oct 30"
        assert story.link == "https://kite.kagi.com/test/world/10"
        assert len(story.highlights) == 2
        assert len(story.perspectives) == 2
        assert len(story.sources) >= 2
        assert story.quote is not None
        assert story.image_url is not None
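The formatter tests that follow compare facet `byteStart`/`byteEnd` values against positions in the formatted content. Those indices are UTF-8 byte offsets, while `str.find` returns character offsets, and the two diverge as soon as the text contains emoji or other multi-byte characters. The sketch below shows the conversion under that assumption. `byte_span` is a hypothetical helper, not part of `RichTextFormatter`, and the sample content is made up for illustration.

```python
def byte_span(content: str, needle: str):
    """Return (byte_start, byte_end) of the first `needle` in `content`, or None.

    Offsets count UTF-8 bytes, matching how facet indices are compared
    in the tests.
    """
    char_start = content.find(needle)
    if char_start == -1:
        return None
    # Encode only the prefix to learn how many bytes precede the match.
    byte_start = len(content[:char_start].encode("utf-8"))
    byte_end = byte_start + len(needle.encode("utf-8"))
    return byte_start, byte_end


content = "Hello 世界 🎉\n\nHighlights:\n• Test highlight"
char_pos = content.find("Highlights:")
span = byte_span(content, "Highlights:")
print(char_pos, span)

# Slicing the encoded bytes with the span recovers the original text.
print(content.encode("utf-8")[span[0]:span[1]].decode("utf-8"))
```

Here the character offset is 12 but the byte offset is 19, because each CJK character takes three bytes and the emoji takes four.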
"""
Tests for Rich Text Formatter.

Tests conversion of KagiStory to Coves rich text format with facets.
"""
import pytest
from datetime import datetime

from src.richtext_formatter import RichTextFormatter
from src.models import KagiStory, Perspective, Quote, Source


@pytest.fixture
def sample_story():
    """Create a sample KagiStory for testing."""
    return KagiStory(
        title="Trump to meet Xi in South Korea",
        link="https://kite.kagi.com/test/world/10",
        guid="https://kite.kagi.com/test/world/10",
        pub_date=datetime(2025, 10, 23, 20, 56, 0),
        categories=["World", "World/Diplomacy"],
        summary="The White House confirmed President Trump will hold a bilateral meeting with Chinese President Xi Jinping in South Korea on October 30.",
        highlights=[
            "Itinerary details: The Asia swing begins in Malaysia, continues to Japan.",
            "APEC context: US officials indicated the leaders will meet on the sidelines."
        ],
        perspectives=[
            Perspective(
                actor="President Trump",
                description="He said his first question to President Xi would be about fentanyl.",
                source_url="https://www.straitstimes.com/world/test"
            ),
            Perspective(
                actor="White House (press secretary)",
                description="Karoline Leavitt confirmed the bilateral meeting.",
                source_url="https://www.scmp.com/news/test"
            )
        ],
        quote=Quote(
            text="Work out a lot of our doubts and questions",
            attribution="President Trump"
        ),
        sources=[
            Source(
                title="Trump to meet Xi in South Korea",
                url="https://www.straitstimes.com/world/test",
                domain="straitstimes.com"
            ),
            Source(
                title="Trump meeting Xi next Thursday",
                url="https://www.scmp.com/news/test",
                domain="scmp.com"
            )
        ],
        image_url="https://kagiproxy.com/img/test123",
        image_alt="Test image"
    )


class TestRichTextFormatter:
    """Test suite for RichTextFormatter."""

    def test_format_full_returns_content_and_facets(self, sample_story):
        """Test that format_full returns content and facets."""
        formatter = RichTextFormatter()
        result = formatter.format_full(sample_story)

        assert 'content' in result
        assert 'facets' in result
        assert isinstance(result['content'], str)
        assert isinstance(result['facets'], list)

    def test_content_structure(self, sample_story):
        """Test that content has correct structure."""
        formatter = RichTextFormatter()
        result = formatter.format_full(sample_story)
        content = result['content']

        # Check all sections are present
        assert sample_story.summary in content
        assert "Highlights:" in content
        assert "Perspectives:" in content
        assert "Sources:" in content
        assert sample_story.quote.text in content
        assert "📰 Story aggregated by Kagi News" in content

    def test_facets_for_bold_headers(self, sample_story):
        """Test that section headers have bold facets."""
        formatter = RichTextFormatter()
        result = formatter.format_full(sample_story)

        # Find bold facets
        bold_facets = [
            f for f in result['facets']
            if any(feat.get('$type') == 'social.coves.richtext.facet#bold'
                   for feat in f['features'])
        ]

        assert len(bold_facets) > 0

        # Check that "Highlights:" is bolded
        content = result['content']
        highlights_pos = content.find("Highlights:")

        # Facet indices are UTF-8 byte offsets, so convert the character position
        highlights_byte_start = len(content[:highlights_pos].encode('utf-8'))
        highlights_byte_end = highlights_byte_start + len("Highlights:".encode('utf-8'))

        # Should have a bold facet covering "Highlights:"
        has_highlights_bold = any(
            f['index']['byteStart'] <= highlights_byte_start and
            f['index']['byteEnd'] >= highlights_byte_end
            for f in bold_facets
        )
        assert has_highlights_bold

    def test_facets_for_italic_quote(self, sample_story):
        """Test that quotes have italic facets."""
        formatter = RichTextFormatter()
        result = formatter.format_full(sample_story)

        # Find italic facets
        italic_facets = [
            f for f in result['facets']
            if any(feat.get('$type') == 'social.coves.richtext.facet#italic'
                   for feat in f['features'])
        ]

        assert len(italic_facets) > 0

        # The quote text is wrapped with quotes, so search for that
        content = result['content']
        quote_with_quotes = f'"{sample_story.quote.text}"'
        quote_char_pos = content.find(quote_with_quotes)

        # Convert character position to byte position
        quote_byte_start = len(content[:quote_char_pos].encode('utf-8'))
        quote_byte_end = len(content[:quote_char_pos + len(quote_with_quotes)].encode('utf-8'))

        has_quote_italic = any(
            f['index']['byteStart'] <= quote_byte_start and
            f['index']['byteEnd'] >= quote_byte_end
            for f in italic_facets
        )
        assert has_quote_italic
142142+143143+ def test_facets_for_links(self, sample_story):
144144+ """Test that URLs have link facets."""
145145+ formatter = RichTextFormatter()
146146+ result = formatter.format_full(sample_story)
147147+148148+ # Find link facets
149149+ link_facets = [
150150+ f for f in result['facets']
151151+ if any(feat.get('$type') == 'social.coves.richtext.facet#link'
152152+ for feat in f['features'])
153153+ ]
154154+155155+ # Should have links for: 2 sources + 2 perspectives + 1 Kagi News link = 5 minimum
156156+ assert len(link_facets) >= 5
157157+158158+ # Check that every source URL has a link facet
159159+ source_urls = [s.url for s in sample_story.sources]
160160+ for url in source_urls:
161161+ has_link = any(
162162+ any(feat.get('uri') == url for feat in f['features'])
163163+ for f in link_facets
164164+ )
165165+ assert has_link, f"Missing link facet for {url}"
166166+167167+ def test_utf8_byte_positions(self):
168168+ """Test UTF-8 byte position calculation with multi-byte characters."""
169169+ # Create story with emoji and non-ASCII characters
170170+ story = KagiStory(
171171+ title="Test 👋 Story",
172172+ link="https://test.com",
173173+ guid="https://test.com",
174174+ pub_date=datetime.now(),
175175+ categories=["Test"],
176176+ summary="Hello 世界 this is a test with emoji 🎉",
177177+ highlights=["Test highlight"],
178178+ perspectives=[],
179179+ quote=None,
180180+ sources=[],
181181+ )
182182+183183+ formatter = RichTextFormatter()
184184+ result = formatter.format_full(story)
185185+186186+ # Verify content contains the emoji
187187+ assert "👋" in result['content'] or "🎉" in result['content']
188188+189189+ # Verify all facet byte positions are valid
190190+ content_bytes = result['content'].encode('utf-8')
191191+ for facet in result['facets']:
192192+ start = facet['index']['byteStart']
193193+ end = facet['index']['byteEnd']
194194+195195+ # Positions should be within bounds
196196+ assert 0 <= start < len(content_bytes)
197197+ assert start < end <= len(content_bytes)
198198+199199+ def test_format_story_without_optional_fields(self):
200200+ """Test formatting story with missing optional fields."""
201201+ minimal_story = KagiStory(
202202+ title="Minimal Story",
203203+ link="https://test.com",
204204+ guid="https://test.com",
205205+ pub_date=datetime.now(),
206206+ categories=["Test"],
207207+ summary="Just a summary.",
208208+ highlights=[], # Empty
209209+ perspectives=[], # Empty
210210+ quote=None, # Missing
211211+ sources=[], # Empty
212212+ )
213213+214214+ formatter = RichTextFormatter()
215215+ result = formatter.format_full(minimal_story)
216216+217217+ # Should still have content and facets
218218+ assert result['content']
219219+ assert result['facets']
220220+221221+ # Should have summary
222222+ assert "Just a summary." in result['content']
223223+224224+ # Should NOT have empty sections
225225+ assert "Highlights:" not in result['content']
226226+ assert "Perspectives:" not in result['content']
227227+228228+ def test_perspective_actor_is_bolded(self, sample_story):
229229+ """Test that perspective actor names are bolded."""
230230+ formatter = RichTextFormatter()
231231+ result = formatter.format_full(sample_story)
232232+233233+ content = result['content']
234234+ bold_facets = [
235235+ f for f in result['facets']
236236+ if any(feat.get('$type') == 'social.coves.richtext.facet#bold'
237237+ for feat in f['features'])
238238+ ]
239239+240240+ # Find "President Trump:" in perspectives section
241241+ actor = "President Trump:"
242242+ perspectives_start = content.find("Perspectives:")
243243+ actor_char_pos = content.find(actor, perspectives_start)
244244+245245+ if actor_char_pos != -1: # If found in perspectives
246246+ # Convert character position to byte position
247247+ actor_byte_start = len(content[:actor_char_pos].encode('utf-8'))
248248+ actor_byte_end = len(content[:actor_char_pos + len(actor)].encode('utf-8'))
249249+250250+ has_actor_bold = any(
251251+ f['index']['byteStart'] <= actor_byte_start and
252252+ f['index']['byteEnd'] >= actor_byte_end
253253+ for f in bold_facets
254254+ )
255255+ assert has_actor_bold
256256+257257+ def test_kagi_attribution_link(self, sample_story):
258258+ """Test that Kagi News attribution has a link to the story."""
259259+ formatter = RichTextFormatter()
260260+ result = formatter.format_full(sample_story)
261261+262262+ # Should have link to Kagi story
263263+ link_facets = [
264264+ f for f in result['facets']
265265+ if any(feat.get('$type') == 'social.coves.richtext.facet#link'
266266+ for feat in f['features'])
267267+ ]
268268+269269+ # Find link to the Kagi story URL
270270+ kagi_link = any(
271271+ any(feat.get('uri') == sample_story.link for feat in f['features'])
272272+ for f in link_facets
273273+ )
274274+ assert kagi_link, "Missing link to Kagi story in attribution"
275275+276276+ def test_facets_do_not_overlap(self, sample_story):
277277+ """Test that facets with same feature type don't overlap."""
278278+ formatter = RichTextFormatter()
279279+ result = formatter.format_full(sample_story)
280280+281281+ # Group facets by type
282282+ facets_by_type = {}
283283+ for facet in result['facets']:
284284+ for feature in facet['features']:
285285+ ftype = feature['$type']
286286+ if ftype not in facets_by_type:
287287+ facets_by_type[ftype] = []
288288+ facets_by_type[ftype].append(facet)
289289+290290+ # Check for overlaps within each type
291291+ for ftype, facets in facets_by_type.items():
292292+ for i, f1 in enumerate(facets):
293293+ for f2 in facets[i+1:]:
294294+ start1, end1 = f1['index']['byteStart'], f1['index']['byteEnd']
295295+ start2, end2 = f2['index']['byteStart'], f2['index']['byteEnd']
296296+297297+ # Check if they overlap
298298+ overlaps = (start1 < end2 and start2 < end1)
299299+ assert not overlaps, f"Overlapping facets of type {ftype}: {f1} and {f2}"
+91
aggregators/kagi-news/tests/test_rss_fetcher.py
···11+"""
22+Tests for RSS feed fetching functionality.
33+"""
44+import pytest
55+import responses
66+from pathlib import Path
77+88+from src.rss_fetcher import RSSFetcher
99+1010+1111+@pytest.fixture
1212+def sample_rss_feed():
1313+ """Return a minimal inline RSS feed (the world.xml fixture is not loaded yet)."""
1414+ fixture_path = Path(__file__).parent / "fixtures" / "world.xml"
1515+ # For now, use a minimal test feed
1616+ return """<?xml version='1.0' encoding='UTF-8'?>
1717+<rss version="2.0">
1818+ <channel>
1919+ <title>Kagi News - World</title>
2020+ <item>
2121+ <title>Test Story</title>
2222+ <link>https://kite.kagi.com/test/world/1</link>
2323+ <guid>https://kite.kagi.com/test/world/1</guid>
2424+ <pubDate>Fri, 24 Oct 2025 12:00:00 +0000</pubDate>
2525+ <category>World</category>
2626+ </item>
2727+ </channel>
2828+</rss>"""
2929+3030+3131+class TestRSSFetcher:
3232+ """Test suite for RSSFetcher."""
3333+3434+ @responses.activate
3535+ def test_fetch_feed_success(self, sample_rss_feed):
3636+ """Test successful RSS feed fetch."""
3737+ url = "https://news.kagi.com/world.xml"
3838+ responses.add(responses.GET, url, body=sample_rss_feed, status=200)
3939+4040+ fetcher = RSSFetcher()
4141+ feed = fetcher.fetch_feed(url)
4242+4343+ assert feed is not None
4444+ assert feed.feed.title == "Kagi News - World"
4545+ assert len(feed.entries) == 1
4646+ assert feed.entries[0].title == "Test Story"
4747+4848+ @responses.activate
4949+ def test_fetch_feed_timeout(self):
5050+ """Test that an HTTP 408 (Request Timeout) response raises."""
5151+ url = "https://news.kagi.com/world.xml"
5252+ responses.add(responses.GET, url, body="timeout", status=408)
5353+5454+ fetcher = RSSFetcher(timeout=5)
5555+5656+ with pytest.raises(Exception): # Should raise on the 408 response
5757+ fetcher.fetch_feed(url)
5858+5959+ @responses.activate
6060+ def test_fetch_feed_with_retry(self, sample_rss_feed):
6161+ """Test fetch with retry on failure then success."""
6262+ url = "https://news.kagi.com/world.xml"
6363+6464+ # First call fails, second succeeds
6565+ responses.add(responses.GET, url, body="error", status=500)
6666+ responses.add(responses.GET, url, body=sample_rss_feed, status=200)
6767+6868+ fetcher = RSSFetcher(max_retries=2)
6969+ feed = fetcher.fetch_feed(url)
7070+7171+ assert feed is not None
7272+ assert len(feed.entries) == 1
7373+7474+ @responses.activate
7575+ def test_fetch_feed_invalid_xml(self):
7676+ """Test handling of invalid XML."""
7777+ url = "https://news.kagi.com/world.xml"
7878+ responses.add(responses.GET, url, body="Not valid XML!", status=200)
7979+8080+ fetcher = RSSFetcher()
8181+ feed = fetcher.fetch_feed(url)
8282+8383+ # feedparser is lenient, but should have bozo flag set
8383+8484+ assert feed.bozo == 1 # holds whether feedparser reports 1 or True
8585+8686+ def test_fetch_feed_requires_url(self):
8787+ """Test that fetch_feed requires a URL."""
8888+ fetcher = RSSFetcher()
8989+9090+ with pytest.raises((ValueError, TypeError)):
9191+ fetcher.fetch_feed("")
+227
aggregators/kagi-news/tests/test_state_manager.py
···11+"""
22+Tests for State Manager.
33+44+Tests deduplication state tracking and persistence.
55+"""
66+import pytest
77+import json
88+import tempfile
99+from pathlib import Path
1010+from datetime import datetime, timedelta
1111+1212+from src.state_manager import StateManager
1313+1414+1515+@pytest.fixture
1616+def temp_state_file():
1717+ """Create a temporary state file for testing."""
1818+ with tempfile.NamedTemporaryFile(mode='w', delete=False, suffix='.json') as f:
1919+ temp_path = Path(f.name)
2020+ yield temp_path
2121+ # Cleanup
2222+ if temp_path.exists():
2323+ temp_path.unlink()
2424+2525+2626+class TestStateManager:
2727+ """Test suite for StateManager."""
2828+2929+ def test_initialize_new_state_file(self, temp_state_file):
3030+ """Test initializing a new state file."""
3131+ manager = StateManager(temp_state_file)
3232+3333+ # Should create an empty state
3434+ assert temp_state_file.exists()
3535+ state = json.loads(temp_state_file.read_text())
3636+ assert 'feeds' in state
3737+ assert state['feeds'] == {}
3838+3939+ def test_is_posted_returns_false_for_new_guid(self, temp_state_file):
4040+ """Test that is_posted returns False for new GUIDs."""
4141+ manager = StateManager(temp_state_file)
4242+ feed_url = "https://news.kagi.com/world.xml"
4343+ guid = "https://kite.kagi.com/test/world/1"
4444+4545+ assert not manager.is_posted(feed_url, guid)
4646+4747+ def test_mark_posted_stores_guid(self, temp_state_file):
4848+ """Test that mark_posted stores GUIDs."""
4949+ manager = StateManager(temp_state_file)
5050+ feed_url = "https://news.kagi.com/world.xml"
5151+ guid = "https://kite.kagi.com/test/world/1"
5252+ post_uri = "at://did:plc:test/social.coves.post/abc123"
5353+5454+ manager.mark_posted(feed_url, guid, post_uri)
5555+5656+ # Should now return True
5757+ assert manager.is_posted(feed_url, guid)
5858+5959+ def test_state_persists_across_instances(self, temp_state_file):
6060+ """Test that state persists when creating new instances."""
6161+ feed_url = "https://news.kagi.com/world.xml"
6262+ guid = "https://kite.kagi.com/test/world/1"
6363+ post_uri = "at://did:plc:test/social.coves.post/abc123"
6464+6565+ # First instance marks as posted
6666+ manager1 = StateManager(temp_state_file)
6767+ manager1.mark_posted(feed_url, guid, post_uri)
6868+6969+ # Second instance should see the same state
7070+ manager2 = StateManager(temp_state_file)
7171+ assert manager2.is_posted(feed_url, guid)
7272+7373+ def test_track_last_run_timestamp(self, temp_state_file):
7474+ """Test tracking last successful run timestamp."""
7575+ manager = StateManager(temp_state_file)
7676+ feed_url = "https://news.kagi.com/world.xml"
7777+ timestamp = datetime.now()
7878+7979+ manager.update_last_run(feed_url, timestamp)
8080+8181+ retrieved = manager.get_last_run(feed_url)
8282+ assert retrieved is not None
8383+ # Compare timestamps (allow small difference due to serialization)
8484+ assert abs((retrieved - timestamp).total_seconds()) < 1
8585+8686+ def test_get_last_run_returns_none_for_new_feed(self, temp_state_file):
8787+ """Test that get_last_run returns None for new feeds."""
8888+ manager = StateManager(temp_state_file)
8989+ feed_url = "https://news.kagi.com/world.xml"
9090+9191+ assert manager.get_last_run(feed_url) is None
9292+9393+ def test_cleanup_old_guids(self, temp_state_file):
9494+ """Test cleanup of old GUIDs (> 30 days)."""
9595+ manager = StateManager(temp_state_file)
9696+ feed_url = "https://news.kagi.com/world.xml"
9797+9898+ # Add recent GUID
9999+ recent_guid = "https://kite.kagi.com/test/world/1"
100100+ manager.mark_posted(feed_url, recent_guid, "at://test/1")
101101+102102+ # Manually add old GUID (> 30 days)
103103+ old_timestamp = (datetime.now() - timedelta(days=31)).isoformat()
104104+ state_data = json.loads(temp_state_file.read_text())
105105+ state_data['feeds'][feed_url]['posted_guids'].append({
106106+ 'guid': 'https://kite.kagi.com/test/world/old',
107107+ 'post_uri': 'at://test/old',
108108+ 'posted_at': old_timestamp
109109+ })
110110+ temp_state_file.write_text(json.dumps(state_data, indent=2))
111111+112112+ # Reload and cleanup
113113+ manager = StateManager(temp_state_file)
114114+ manager.cleanup_old_entries(feed_url)
115115+116116+ # Recent GUID should still be there
117117+ assert manager.is_posted(feed_url, recent_guid)
118118+119119+ # Old GUID should be removed
120120+ assert not manager.is_posted(feed_url, 'https://kite.kagi.com/test/world/old')
121121+122122+ def test_limit_guids_to_100_per_feed(self, temp_state_file):
123123+ """Test that only last 100 GUIDs are kept per feed."""
124124+ manager = StateManager(temp_state_file)
125125+ feed_url = "https://news.kagi.com/world.xml"
126126+127127+ # Add 150 GUIDs
128128+ for i in range(150):
129129+ guid = f"https://kite.kagi.com/test/world/{i}"
130130+ manager.mark_posted(feed_url, guid, f"at://test/{i}")
131131+132132+ # Cleanup (should limit to 100)
133133+ manager.cleanup_old_entries(feed_url)
134134+135135+ # Reload state
136136+ manager = StateManager(temp_state_file)
137137+138138+ # Should have exactly 100 entries (most recent)
139139+ state_data = json.loads(temp_state_file.read_text())
140140+ assert len(state_data['feeds'][feed_url]['posted_guids']) == 100
141141+142142+ # Oldest entries should be removed
143143+ assert not manager.is_posted(feed_url, "https://kite.kagi.com/test/world/0")
144144+ assert not manager.is_posted(feed_url, "https://kite.kagi.com/test/world/49")
145145+146146+ # Recent entries should still be there
147147+ assert manager.is_posted(feed_url, "https://kite.kagi.com/test/world/149")
148148+ assert manager.is_posted(feed_url, "https://kite.kagi.com/test/world/100")
149149+150150+ def test_multiple_feeds_tracked_separately(self, temp_state_file):
151151+ """Test that multiple feeds are tracked independently."""
152152+ manager = StateManager(temp_state_file)
153153+154154+ feed1 = "https://news.kagi.com/world.xml"
155155+ feed2 = "https://news.kagi.com/tech.xml"
156156+ guid1 = "https://kite.kagi.com/test/world/1"
157157+ guid2 = "https://kite.kagi.com/test/tech/1"
158158+159159+ manager.mark_posted(feed1, guid1, "at://test/1")
160160+ manager.mark_posted(feed2, guid2, "at://test/2")
161161+162162+ # Each feed should only know about its own GUIDs
163163+ assert manager.is_posted(feed1, guid1)
164164+ assert not manager.is_posted(feed1, guid2)
165165+166166+ assert manager.is_posted(feed2, guid2)
167167+ assert not manager.is_posted(feed2, guid1)
168168+169169+ def test_get_posted_count(self, temp_state_file):
170170+ """Test getting count of posted items per feed."""
171171+ manager = StateManager(temp_state_file)
172172+ feed_url = "https://news.kagi.com/world.xml"
173173+174174+ # Initially 0
175175+ assert manager.get_posted_count(feed_url) == 0
176176+177177+ # Add 5 items
178178+ for i in range(5):
179179+ manager.mark_posted(feed_url, f"guid-{i}", f"post-{i}")
180180+181181+ assert manager.get_posted_count(feed_url) == 5
182182+183183+ def test_state_file_format_is_valid_json(self, temp_state_file):
184184+ """Test that state file is always valid JSON."""
185185+ manager = StateManager(temp_state_file)
186186+ feed_url = "https://news.kagi.com/world.xml"
187187+188188+ manager.mark_posted(feed_url, "test-guid", "test-post-uri")
189189+ manager.update_last_run(feed_url, datetime.now())
190190+191191+ # Should be valid JSON
192192+ with open(temp_state_file) as f:
193193+ state = json.load(f)
194194+195195+ assert 'feeds' in state
196196+ assert feed_url in state['feeds']
197197+ assert 'posted_guids' in state['feeds'][feed_url]
198198+ assert 'last_successful_run' in state['feeds'][feed_url]
199199+200200+ def test_automatic_cleanup_on_mark_posted(self, temp_state_file):
201201+ """Test that cleanup happens automatically when marking posted."""
202202+ manager = StateManager(temp_state_file)
203203+ feed_url = "https://news.kagi.com/world.xml"
204204+205205+ # Add old entry manually
206206+ old_timestamp = (datetime.now() - timedelta(days=31)).isoformat()
207207+ state_data = {
208208+ 'feeds': {
209209+ feed_url: {
210210+ 'posted_guids': [{
211211+ 'guid': 'old-guid',
212212+ 'post_uri': 'old-uri',
213213+ 'posted_at': old_timestamp
214214+ }],
215215+ 'last_successful_run': None
216216+ }
217217+ }
218218+ }
219219+ temp_state_file.write_text(json.dumps(state_data, indent=2))
220220+221221+ # Reload and add new entry (should trigger cleanup)
222222+ manager = StateManager(temp_state_file)
223223+ manager.mark_posted(feed_url, "new-guid", "new-uri")
224224+225225+ # Old entry should be gone
226226+ assert not manager.is_posted(feed_url, "old-guid")
227227+ assert manager.is_posted(feed_url, "new-guid")
+40
docs/PRD_COMMUNITIES.md
···201201202202---
203203204204+### Blob Upload Proxy System
205205+**Status:** Design documented, implementation TODO
206206+**Priority:** CRITICAL for Beta - Required for image/video posts in communities
207207+208208+**Problem:** Users on external PDSs cannot directly upload blobs to community-owned PDS repositories because they lack authentication credentials for the community's PDS.
209209+210210+**Solution:** Coves AppView acts as an authenticated proxy for blob uploads:
211211+212212+**Flow:**
213213+1. User uploads blob to Coves AppView via `social.coves.blob.uploadForCommunity`
214214+2. AppView validates user can post to community (not banned, community accessible)
215215+3. AppView uses community's PDS credentials to upload blob via `com.atproto.repo.uploadBlob`
216216+4. AppView returns CID to user
217217+5. User creates post record referencing the CID
218218+6. Post and blob both live in community's PDS
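
The flow above could be sketched roughly as follows. This is a minimal illustration, not the Coves implementation: `upload_for_community`, `Community`, `UploadRejected`, the size limit, and the content-type allow-list are all hypothetical, and the call to the community's PDS is passed in as a plain callable (`pds_upload`) so the sketch runs without a network.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Assumed limits; the PRD lists size and content-type checks but fixes no values.
MAX_BLOB_BYTES = 1_000_000
ALLOWED_CONTENT_TYPES = {"image/png", "image/jpeg", "image/webp"}


class UploadRejected(Exception):
    """Raised when the AppView refuses to proxy an upload."""


@dataclass
class Community:
    did: str
    pds_access_token: str  # community credential held by the AppView
    banned_users: Set[str] = field(default_factory=set)


def upload_for_community(
    user_did: str,
    community: Community,
    blob: bytes,
    content_type: str,
    pds_upload: Callable[[str, bytes, str], str],
) -> str:
    """Proxy a blob upload and return the CID reported by the community's PDS."""
    # Step 2: validate that the user may post to the community.
    if user_did in community.banned_users:
        raise UploadRejected("user is banned from this community")
    # Security checks from the implementation checklist.
    if content_type not in ALLOWED_CONTENT_TYPES:
        raise UploadRejected(f"unsupported content type: {content_type}")
    if len(blob) > MAX_BLOB_BYTES:
        raise UploadRejected("blob exceeds size limit")
    # Step 3: upload with the community's credentials, not the user's.
    cid = pds_upload(community.pds_access_token, blob, content_type)
    # Step 4: hand the CID back so the user can reference it in a post record.
    return cid
```

In a real handler, `pds_upload` would wrap the `com.atproto.repo.uploadBlob` call; keeping it injectable also makes the validation path easy to unit test.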
219219+220220+**Implementation Checklist:**
221221+- [ ] Handler: `social.coves.blob.uploadForCommunity` endpoint
222222+- [ ] Validation: Check user authorization to post in community
223223+- [ ] Credential Management: Reuse community token refresh logic
224224+- [ ] Upload Proxy: Forward blob to community's PDS with community credentials
225225+- [ ] Security: Size limits, content-type validation, rate limiting
226226+- [ ] Testing: E2E test with federated user uploading to community
227227+228228+**Why This Approach:**
229229+- ✅ Works with federated users (any PDS)
230230+- ✅ Reuses existing community credential infrastructure
231231+- ✅ Matches V2 architecture (AppView orchestrates, communities own data)
232232+- ✅ Blobs stored on correct PDS (community's repository)
233233+- ❌ AppView becomes upload intermediary (bandwidth cost)
234234+235235+**Alternative Considered:** Direct user uploads to community PDS
236236+- Rejected: Would require creating temporary user accounts on every community PDS (complex, insecure)
237237+238238+**See:** Design discussion in context of ATProto blob architecture
239239+240240+---
241241+204242### Posts in Communities
205243**Status:** Lexicon designed, implementation TODO
206244**Priority:** HIGHEST for Beta 1
···214252- [ ] Decide membership requirements for posting
215253216254**Without posts, communities exist but can't be used!**
255255+256256+**Depends on:** Blob Upload Proxy System (for image/video posts)
217257218258---
219259
+704-884
docs/aggregators/PRD_KAGI_NEWS_RSS.md
···11# Kagi News RSS Aggregator PRD
2233-**Status:** Planning Phase
33+**Status:** ✅ Phase 1 Complete - Ready for Deployment
44**Owner:** Platform Team
55-**Last Updated:** 2025-10-20
55+**Last Updated:** 2025-10-24
66**Parent PRD:** [PRD_AGGREGATORS.md](PRD_AGGREGATORS.md)
77+**Implementation:** Python + Docker Compose
88+99+## 🎉 Implementation Complete
1010+1111+All core components have been implemented and tested:
1212+1313+- ✅ **RSS Fetcher** - Fetches feeds with retry logic and error handling
1414+- ✅ **HTML Parser** - Extracts all structured data (summary, highlights, perspectives, quote, sources)
1515+- ✅ **Rich Text Formatter** - Formats content with proper facets for Coves
1616+- ✅ **State Manager** - Tracks posted stories to prevent duplicates
1717+- ✅ **Config Manager** - Loads and validates YAML configuration
1818+- ✅ **Coves Client** - Handles authentication and post creation
1919+- ✅ **Main Orchestrator** - Coordinates all components
2020+- ✅ **Comprehensive Tests** - 57 tests with 83% code coverage
2121+- ✅ **Documentation** - README with setup and deployment instructions
2222+- ✅ **Example Configs** - config.example.yaml and .env.example
2323+2424+**Test Results:**
2525+```
2626+57 passed, 6 skipped, 1 warning in 8.76s
2727+Coverage: 83%
2828+```
2929+3030+**Ready for:**
3131+- Integration testing with live Coves API
3232+- Aggregator DID creation and authorization
3333+- Production deployment
734835## Overview
936···1542- **Rich metadata**: Categories, highlights, source links included
1643- **Legal & free**: CC BY-NC licensed for non-commercial use
1744- **Low complexity**: No LLM deduplication needed (Kagi does it)
4545+- **Simple deployment**: Python + Docker Compose, runs alongside Coves on same instance
18461947## Data Source: Kagi News RSS Feeds
2048···46744775**Known Categories:**
4876- `world.xml` - World news
4949-- `tech.xml` - Technology (likely)
5050-- `business.xml` - Business (likely)
7777+- `tech.xml` - Technology
7878+- `business.xml` - Business
5179- `sports.xml` - Sports (likely)
5280- Additional categories TBD (need to scrape homepage)
5381···55835684**Update Frequency:** One daily update (~noon UTC)
57858686+**Important Note on Domain Migration (October 2025):**
8787+Kagi migrated their RSS feeds from `kite.kagi.com` to `news.kagi.com`. The old domain now redirects (302) to the new domain, but for reliability, always use `news.kagi.com` directly in your feed URLs. Story links within the RSS feed still reference `kite.kagi.com` as permalinks.
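
One way to guard against stale feed URLs in configuration is to rewrite the old host before fetching. The helper below is a hypothetical sketch (`normalize_feed_url` is not part of the aggregator). It is meant for feed URLs only, since story permalinks are expected to stay on `kite.kagi.com`.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_FEED_HOST = "kite.kagi.com"
NEW_FEED_HOST = "news.kagi.com"


def normalize_feed_url(url: str) -> str:
    """Point a feed URL at news.kagi.com if it still uses the old host."""
    parts = urlsplit(url)
    if parts.hostname == OLD_FEED_HOST:
        # Replace only the host; scheme, path, and query are preserved.
        parts = parts._replace(netloc=NEW_FEED_HOST)
    return urlunsplit(parts)
```

Applying this at config-load time avoids relying on the 302 redirect for every fetch.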
8888+5889---
59906091### RSS Item Schema
···99130</ul>
100131```
101132133133+**✅ Verified Feed Structure:**
134134+Analysis of live Kagi News feeds confirms the following structure:
135135+- **Only 3 H3 sections:** Highlights, Perspectives, Sources (no other sections like Timeline or Historical Background)
136136+- **Historical context** is woven into the summary paragraph and highlights (not a separate section)
137137+- **Not all stories have all sections** - Quote (blockquote) and image are optional
138138+- **Feed contains everything shown on website** except for Timeline (which is a frontend-only feature)
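
As a rough illustration of the structure described above, the sketch below lists the H3 section titles in an item description and notes whether the optional blockquote and image are present. It uses only the standard-library `html.parser` for self-containment (the aggregator itself uses bs4), and `SectionFinder` and `find_sections` are hypothetical names.

```python
from html.parser import HTMLParser


class SectionFinder(HTMLParser):
    """Collect H3 section titles and flag optional quote/image elements."""

    def __init__(self):
        super().__init__()
        self.sections = []
        self.has_quote = False
        self.has_image = False
        self._in_h3 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
            self.sections.append("")
        elif tag == "blockquote":
            self.has_quote = True
        elif tag == "img":
            self.has_image = True

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False

    def handle_data(self, data):
        # Accumulate text only while inside an <h3> element.
        if self._in_h3:
            self.sections[-1] += data


def find_sections(description_html: str):
    """Return (section titles without trailing colon, has_quote, has_image)."""
    finder = SectionFinder()
    finder.feed(description_html)
    finder.close()
    titles = [title.strip().rstrip(":") for title in finder.sections]
    return titles, finder.has_quote, finder.has_image
```

A parser built this way can treat a missing quote or image as normal rather than as an error, matching the "not all stories have all sections" observation.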
139139+102140**Key Features:**
103141- Multiple source citations inline
104142- Balanced perspectives from different actors
105105-- Highlights extract key points
106106-- Direct quotes preserved
143143+- Highlights extract key points with historical context
144144+- Direct quotes preserved (when available)
107145- All sources linked with attribution
146146+- Images from Kagi's proxy CDN
108147109148---
110149···123162 │ HTTP GET one job after update
124163 ▼
125164┌─────────────────────────────────────────────────────────────┐
126126-│ Kagi News Aggregator Service │
127127-│ DID: did:web:kagi-news.coves.social │
165165+│ Kagi News Aggregator Service (Python + Docker Compose) │
166166+│ DID: did:plc:[generated-on-creation] │
167167+│ Location: aggregators/kagi-news/ │
128168│ │
129169│ Components: │
130130-│ 1. Feed Poller: Fetches RSS feeds on schedule │
131131-│ 2. Item Parser: Extracts structured data from HTML │
132132-│ 3. Deduplication: Tracks posted GUIDs (no LLM needed) │
133133-│ 4. Category Mapper: Maps Kagi categories to communities │
170170+│ 1. RSS Fetcher: Fetches RSS feeds on schedule (feedparser) │
171171+│ 2. Item Parser: Extracts structured data from HTML (bs4) │
172172+│ 3. Deduplication: Tracks posted items via JSON state file │
173173+│ 4. Feed Mapper: Maps feed URLs to community handles │
134174│ 5. Post Formatter: Converts to Coves post format │
135135-│ 6. Post Publisher: Calls social.coves.post.create │
175175+│ 6. Post Publisher: Calls social.coves.post.create via XRPC │
176176+│ 7. Blob Uploader: Handles image upload to ATProto │
136177└─────────────────────────────────────────────────────────────┘
137178 │
138179 │ Authenticated XRPC calls
···140181┌─────────────────────────────────────────────────────────────┐
141182│ Coves AppView (social.coves.post.create) │
142183│ - Validates aggregator authorization │
143143-│ - Creates post with author = did:web:kagi-news.coves.social│
184184+│ - Creates post with author = did:plc:[aggregator-did] │
144185│ - Indexes to community feeds │
145186└─────────────────────────────────────────────────────────────┘
146187```
···152193```json
153194{
154195 "$type": "social.coves.aggregator.service",
155155- "did": "did:web:kagi-news.coves.social",
196196+ "did": "did:plc:[generated-on-creation]",
156197 "displayName": "Kagi News Aggregator",
157198 "description": "Automatically posts breaking news from Kagi News RSS feeds. Kagi News aggregates multiple sources per story with balanced perspectives and comprehensive source citations.",
158199 "aggregatorType": "social.coves.aggregator.types#rss",
···160201 "configSchema": {
161202 "type": "object",
162203 "properties": {
163163- "categories": {
164164- "type": "array",
165165- "items": {
166166- "type": "string",
167167- "enum": ["world", "tech", "business", "sports", "science"]
168168- },
169169- "description": "Kagi News categories to monitor",
170170- "minItems": 1
171171- },
172172- "subcategoryFilter": {
173173- "type": "array",
174174- "items": { "type": "string" },
175175- "description": "Optional: only post stories with these subcategories (e.g., 'World/Middle East', 'Tech/AI')"
176176- },
177177- "minSources": {
178178- "type": "integer",
179179- "minimum": 1,
180180- "default": 2,
181181- "description": "Minimum number of sources required for a story to be posted"
182182- },
183183- "includeImages": {
184184- "type": "boolean",
185185- "default": true,
186186- "description": "Include images from Kagi proxy in posts"
187187- },
188188- "postFormat": {
204204+ "feedUrl": {
189205 "type": "string",
190190- "enum": ["full", "summary", "minimal"],
191191- "default": "full",
192192- "description": "How much content to include: full (all sections), summary (main paragraph + sources), minimal (title + link only)"
206206+ "format": "uri",
207207+ "description": "Kagi News RSS feed URL (e.g., https://news.kagi.com/world.xml)"
193208 }
194209 },
195195- "required": ["categories"]
210210+ "required": ["feedUrl"]
196211 },
197212 "sourceUrl": "https://github.com/coves-social/kagi-news-aggregator",
198213 "maintainer": "did:plc:coves-platform",
199199- "createdAt": "2025-10-20T12:00:00Z"
214214+ "createdAt": "2025-10-23T00:00:00Z"
200215}
201216```
202217218218+**Note:** The MVP implementation uses a simpler configuration model. Feed-to-community mappings are defined in the aggregator's own config file rather than per-community configuration. This allows one aggregator instance to post to multiple communities.
219219+203220---
204221205205-## Community Configuration Examples
222222+## Aggregator Configuration (MVP)
206223207207-### Example 1: World News Community
224224+The MVP uses a simplified configuration model where the aggregator service defines feed-to-community mappings in its own config file.
208225209209-```json
210210-{
211211- "aggregatorDid": "did:web:kagi-news.coves.social",
212212- "enabled": true,
213213- "config": {
214214- "categories": ["world"],
215215- "minSources": 3,
216216- "includeImages": true,
217217- "postFormat": "full"
218218- }
219219-}
220220-```
226226+### Configuration File: `config.yaml`
221227222222-**Result:** Posts all world news stories with 3+ sources, full content including images/highlights/perspectives.
228228+```yaml
229229+# Aggregator credentials (from environment variables)
230230+# AGGREGATOR_DID=did:plc:xyz...
231231+# AGGREGATOR_PRIVATE_KEY=base64-encoded-key...
223232224224----
233233+# Coves API endpoint
234234+coves_api_url: "https://api.coves.social"
225235226226-### Example 2: AI/Tech Community (Filtered)
236236+# Feed-to-community mappings
237237+feeds:
238238+ - name: "World News"
239239+ url: "https://news.kagi.com/world.xml"
240240+ community_handle: "world-news.coves.social"
241241+ enabled: true
227242228228-```json
229229-{
230230- "aggregatorDid": "did:web:kagi-news.coves.social",
231231- "enabled": true,
232232- "config": {
233233- "categories": ["tech", "business"],
234234- "subcategoryFilter": ["Tech/AI", "Tech/Machine Learning", "Business/Tech Industry"],
235235- "minSources": 2,
236236- "includeImages": true,
237237- "postFormat": "full"
238238- }
239239-}
240240-```
243243+ - name: "Tech News"
244244+ url: "https://news.kagi.com/tech.xml"
245245+ community_handle: "tech.coves.social"
246246+ enabled: true
241247242242-**Result:** Only posts tech stories about AI/ML or tech industry business news with 2+ sources.
248248+ - name: "Science News"
249249+ url: "https://news.kagi.com/science.xml"
250250+ community_handle: "science.coves.social"
251251+ enabled: false # Can be disabled without removing
243252244244----
245245-246246-### Example 3: Breaking News (Minimal)
253253+# Scheduling
254254+check_interval: "24h" # Run once daily
247255248248-```json
249249-{
250250- "aggregatorDid": "did:web:kagi-news.coves.social",
251251- "enabled": true,
252252- "config": {
253253- "categories": ["world", "business", "tech"],
254254- "minSources": 5,
255255- "includeImages": false,
256256- "postFormat": "minimal"
257257- }
258258-}
256256+# Logging
257257+log_level: "info"
259258```
260259261261-**Result:** Only major stories (5+ sources), minimal format (headline + link), no images.
260260+**Key Decisions:**
261261+- Uses **community handles** (not DIDs) for easier configuration - resolved at runtime
262262+- One aggregator can post to multiple communities
263263+- Feed mappings managed in aggregator config (not per-community config)
264264+- No complex filtering logic in MVP - one feed = one community
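
A minimal sketch of how these mappings might be consumed, assuming the YAML has already been parsed into a dict (for example with PyYAML's `yaml.safe_load`). `FeedMapping`, `load_feed_mappings`, and `enabled_feeds` are hypothetical names, and treating a missing `enabled` key as `True` is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List

REQUIRED_KEYS = {"name", "url", "community_handle"}


@dataclass(frozen=True)
class FeedMapping:
    name: str
    url: str
    community_handle: str
    enabled: bool = True  # assumption: feeds are enabled unless stated otherwise


def load_feed_mappings(config: dict) -> List[FeedMapping]:
    """Validate the `feeds` list and convert each entry to a FeedMapping."""
    mappings = []
    for entry in config.get("feeds", []):
        missing = REQUIRED_KEYS - entry.keys()
        if missing:
            raise ValueError(f"feed entry missing keys: {sorted(missing)}")
        mappings.append(
            FeedMapping(
                name=entry["name"],
                url=entry["url"],
                community_handle=entry["community_handle"],
                enabled=entry.get("enabled", True),
            )
        )
    return mappings


def enabled_feeds(mappings: List[FeedMapping]) -> List[FeedMapping]:
    """One feed maps to one community; disabled entries are skipped, not removed."""
    return [m for m in mappings if m.enabled]
```

Keeping disabled feeds in the parsed list (and filtering at run time) mirrors the "can be disabled without removing" comment in the config.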
262265263266---
264267···269272```json
270273{
271274 "$type": "social.coves.post.record",
272272- "author": "did:web:kagi-news.coves.social",
273273- "community": "did:plc:worldnews123",
275275+ "author": "did:plc:[aggregator-did]",
276276+ "community": "world-news.coves.social",
274277 "title": "{Kagi story title}",
275275- "content": "{formatted content based on postFormat config}",
278278+ "content": "{formatted content - full format for MVP}",
276279 "embed": {
277277- "$type": "app.bsky.embed.external",
280280+ "$type": "social.coves.embed.external",
278281 "external": {
279279- "uri": "https://kite.kagi.com/{uuid}/{category}/{id}",
282282+ "uri": "{Kagi story URL}",
280283 "title": "{story title}",
281281- "description": "{summary excerpt}",
282282- "thumb": "{image blob if includeImages=true}"
284284+ "description": "{summary excerpt - first 200 chars}",
285285+ "thumb": "{Kagi proxy image URL from HTML}"
283286 }
284287 },
285288 "federatedFrom": {
···296299}
297300```
298301302302+**MVP Notes:**
303303+- Uses `social.coves.embed.external` for hot-linked images (no blob upload)
304304+- Community specified as handle (resolved to DID by post creation endpoint)
305305+- Images referenced via original Kagi proxy URLs
306306+- "Full" format only for MVP (no format variations)
307307+- Content uses Coves rich text with facets (not markdown)
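
The embed portion of the record could be assembled along these lines. This is a sketch under stated assumptions: `build_external_embed` is a hypothetical helper, and omitting `thumb` when a story has no image is an assumption based on images being optional in the feed.

```python
from typing import Optional


def build_external_embed(
    story_url: str,
    title: str,
    summary: str,
    image_url: Optional[str] = None,
    max_description: int = 200,
) -> dict:
    """Build a social.coves.embed.external dict with a hot-linked thumbnail."""
    external = {
        "uri": story_url,
        "title": title,
        # Summary excerpt: first 200 characters, as in the template above.
        "description": summary[:max_description],
    }
    if image_url:
        # MVP hot-links the Kagi proxy URL; no blob upload is involved.
        external["thumb"] = image_url
    return {"$type": "social.coves.embed.external", "external": external}
```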
308308+299309---
300310301301-### Content Formatting by `postFormat`
311311+### Content Formatting (MVP: "Full" Format Only)
302312303303-#### Format: `full` (Default)
313313+The MVP implements a single "full" format using Coves rich text with facets:
304314305305-```markdown
315315+**Plain Text Structure:**
316316+```
306317{Main summary paragraph with source citations}
307318308308-**Highlights:**
319319+Highlights:
309320• {Bullet point 1}
310321• {Bullet point 2}
311322• ...
312323313313-**Perspectives:**
314314-• **{Actor}**: {Their perspective} ([Source]({url}))
324324+Perspectives:
325325+• {Actor}: {Their perspective} (Source)
315326• ...
316327317317-> {Notable quote} — {Attribution}
328328+"{Notable quote}" — {Attribution}
318329319319-**Sources:**
320320-• [{Title}]({url}) - {domain}
330330+Sources:
331331+• {Title} - {domain}
321332• ...
322333323334---
324324-📰 Story aggregated by [Kagi News]({kagi_story_url})
335335+📰 Story aggregated by Kagi News
325336```
326337327327-**Rationale:** Preserves Kagi's rich multi-source analysis, provides maximum value.
338338+**Rich Text Facets Applied:**
339339+- **Bold** (`social.coves.richtext.facet#bold`) on section headers: "Highlights:", "Perspectives:", "Sources:"
340340+- **Bold** on perspective actors
341341+- **Italic** (`social.coves.richtext.facet#italic`) on quotes
342342+- **Link** (`social.coves.richtext.facet#link`) on all URLs (source links, Kagi story link, perspective sources)
343343+- Byte ranges calculated using UTF-8 byte positions
328344329329----
345345+**Example with Facets:**
346346+```json
347347+{
348348+ "content": "Main summary [source.com#1]\n\nHighlights:\n• Key point 1...",
349349+ "facets": [
350350+ {
351351+ "index": {"byteStart": 29, "byteEnd": 40},
352352+ "features": [{"$type": "social.coves.richtext.facet#bold"}]
353353+ },
354354+ {
355355+ "index": {"byteStart": 13, "byteEnd": 27},
356356+ "features": [{"$type": "social.coves.richtext.facet#link", "uri": "https://source.com"}]
357357+ }
358358+ ]
359359+}
360360+```
330361331331-#### Format: `summary`
332332-333333-```markdown
334334-{Main summary paragraph with source citations}
335335-336336-**Sources:**
337337-• [{Title}]({url}) - {domain}
338338-• ...
362362+**Rationale:**
363363+- Uses native Coves rich text format (not markdown)
364364+- Preserves Kagi's rich multi-source analysis
365365+- Provides maximum value to communities
366366+- Meets CC BY-NC attribution requirements
367367+- Additional formats ("summary", "minimal") can be added post-MVP
339368340369---
341341-📰 Story aggregated by [Kagi News]({kagi_story_url})
342342-```
343370344344-**Rationale:** Clean summary with source links, less overwhelming.
371371+## Implementation Details (Python MVP)
345372346346----
373373+### Technology Stack
347374348348-#### Format: `minimal`
375375+**Language:** Python 3.11+
349376350350-```markdown
351351-{Story title}
352352-353353-Read more: {kagi_story_url}
377377+**Key Libraries:**
378378+- `feedparser` - RSS/Atom parsing
379379+- `beautifulsoup4` - HTML parsing for RSS item descriptions
380380+- `requests` - HTTP client for fetching feeds
381381+- `atproto` - ATProto Python SDK (the MVP client in Component 5 uses direct HTTP calls instead)
382382+- `pyyaml` - Configuration file parsing
383383+- `pytest` - Testing framework
354384355355-**Sources:** {domain1}, {domain2}, {domain3}...
385385+### Project Structure
356386357357----
358358-📰 Via [Kagi News]({kagi_story_url})
359387```
360360-361361-**Rationale:** Just headlines with link, for high-volume communities or breaking news alerts.
388388+aggregators/kagi-news/
389389+├── Dockerfile
390390+├── docker-compose.yml
391391+├── requirements.txt
392392+├── config.example.yaml
393393+├── crontab # CRON schedule configuration
394394+├── .env.example # Environment variables template
395395+├── scripts/
396396+│ └── generate_did.py # Helper to generate aggregator DID
397397+├── src/
398398+│ ├── main.py # Entry point (single run, called by CRON)
399399+│ ├── config.py # Configuration loading and validation
400400+│ ├── rss_fetcher.py # RSS feed fetching with retry logic
401401+│ ├── html_parser.py # Parse Kagi HTML to structured data
402402+│ ├── richtext_formatter.py # Format content with rich text facets
403403+│ ├── coves_client.py # Coves API authentication and post creation
404404+│ ├── state_manager.py # Deduplication state tracking (JSON)
405405+│ └── models.py # Data models (KagiStory, etc.)
406406+├── tests/
407407+│ ├── test_parser.py
408408+│ ├── test_richtext_formatter.py
409409+│ ├── test_state_manager.py
410410+│ └── fixtures/ # Sample RSS feeds for testing
411411+└── README.md
412412+```
362413363414---
364415365365-## Implementation Details
416416+### Component 1: RSS Fetcher (`rss_fetcher.py`) ✅ COMPLETE
366417367367-### Component 1: Feed Poller
418418+**Responsibility:** Fetch RSS feeds with retry logic and error handling
368419369369-**Responsibility:** Fetch RSS feeds on schedule
420420+**Key Functions:**
421421+- `fetch_feed(url: str) -> feedparser.FeedParserDict`
422422+ - Uses `requests` with timeout (30s)
423423+ - Retry logic: 3 attempts with exponential backoff
424424+ - Returns parsed RSS feed or raises exception
370425371371-```go
372372-type FeedPoller struct {
373373- categories []string
374374- pollInterval time.Duration
375375- httpClient *http.Client
376376-}
426426+**Error Handling:**
427427+- Network timeouts
428428+- Invalid XML
429429+- HTTP errors (404, 500, etc.)
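
The retry behaviour described above (3 attempts, exponential backoff) could look roughly like the following sketch. `with_retries` is a hypothetical helper, not the actual `rss_fetcher.py` code. It takes the fetch step as a callable so the backoff logic can be exercised without network access.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retries(fetch: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fetch() up to `attempts` times, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception as exc:  # in practice: timeouts, HTTP errors, invalid XML
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 1s, then 2s, ...
    raise RuntimeError(f"fetch failed after {attempts} attempts") from last_error
```

In the fetcher, the callable would presumably wrap `requests.get(url, timeout=30)`, `response.raise_for_status()`, and `feedparser.parse(response.content)`.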
377430378378-func (p *FeedPoller) Start(ctx context.Context) error {
379379- ticker := time.NewTicker(p.pollInterval) // 15 minutes
380380- defer ticker.Stop()
381381-382382- for {
383383- select {
384384- case <-ticker.C:
385385- for _, category := range p.categories {
386386- feedURL := fmt.Sprintf("https://news.kagi.com/%s.xml", category)
387387- feed, err := p.fetchFeed(feedURL)
388388- if err != nil {
389389- log.Printf("Failed to fetch %s: %v", feedURL, err)
390390- continue
391391- }
392392- p.handleFeed(ctx, category, feed)
393393- }
394394- case <-ctx.Done():
395395- return nil
396396- }
397397- }
398398-}
399399-400400-func (p *FeedPoller) fetchFeed(url string) (*gofeed.Feed, error) {
401401- parser := gofeed.NewParser()
402402- feed, err := parser.ParseURL(url)
403403- return feed, err
404404-}
405405-```
406406-407407-**Libraries:**
408408-- `github.com/mmcdole/gofeed` - RSS/Atom parser
431431+**Implementation Status:**
432432+- ✅ Implemented with comprehensive error handling
433433+- ✅ Tests passing (5 tests)
434434+- ✅ Handles retries with exponential backoff
409435410436---
411437412412-### Component 2: Item Parser
438438+### Component 2: HTML Parser (`html_parser.py`) ✅ COMPLETE
413439414414-**Responsibility:** Extract structured data from RSS item HTML
415415-416416-```go
417417-type KagiStory struct {
418418- Title string
419419- Link string
420420- GUID string
421421- PubDate time.Time
422422- Categories []string
423423-424424- // Parsed from HTML description
425425- Summary string
426426- Highlights []string
427427- Perspectives []Perspective
428428- Quote *Quote
429429- Sources []Source
430430- ImageURL string
431431- ImageAlt string
432432-}
440440+**Responsibility:** Extract structured data from Kagi's HTML description field
433441434434-type Perspective struct {
435435- Actor string
436436- Description string
437437- SourceURL string
438438-}
442442+**Key Class:** `KagiHTMLParser`
439443440440-type Quote struct {
441441- Text string
442442- Attribution string
443443-}
444444+**Data Model (`models.py`):**
445445+```python
446446+@dataclass
447447+class KagiStory:
448448+ title: str
449449+ link: str
450450+ guid: str
451451+ pub_date: datetime
452452+ categories: List[str]
444453445445-type Source struct {
446446- Title string
447447- URL string
448448- Domain string
449449-}
454454+ # Parsed from HTML
455455+ summary: str
456456+ highlights: List[str]
457457+ perspectives: List[Perspective]
458458+ quote: Optional[Quote]
459459+ sources: List[Source]
460460+ image_url: Optional[str]
461461+ image_alt: Optional[str]
450462451451-func (p *ItemParser) Parse(item *gofeed.Item) (*KagiStory, error) {
452452- doc, err := goquery.NewDocumentFromReader(strings.NewReader(item.Description))
453453- if err != nil {
454454- return nil, err
455455- }
463463+@dataclass
464464+class Perspective:
465465+ actor: str
466466+ description: str
467467+ source_url: str
456468457457- story := &KagiStory{
458458- Title: item.Title,
459459- Link: item.Link,
460460- GUID: item.GUID,
461461- PubDate: *item.PublishedParsed,
462462- Categories: item.Categories,
463463- }
469469+@dataclass
470470+class Quote:
471471+ text: str
472472+ attribution: str
464473465465- // Extract summary (first <p> tag)
466466- story.Summary = doc.Find("p").First().Text()
474474+@dataclass
475475+class Source:
476476+ title: str
477477+ url: str
478478+ domain: str
479479+```
467480468468- // Extract highlights
469469- doc.Find("h3:contains('Highlights')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
470470- story.Highlights = append(story.Highlights, s.Text())
471471- })
481481+**Parsing Strategy:**
482482+- Use BeautifulSoup to parse HTML description
483483+- Extract sections by finding `<h3>` tags (Highlights, Perspectives, Sources)
484484+- Handle missing sections gracefully (not all stories have all sections)
485485+- Clean and normalize text
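
As a rough illustration of the section-by-`<h3>` strategy, the sketch below groups `<li>` text under the preceding heading. It deliberately swaps in the standard library's `html.parser` instead of BeautifulSoup so that it runs without dependencies. `SectionCollector` and `extract_sections` are hypothetical names, and the real `KagiHTMLParser` may work differently.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class SectionCollector(HTMLParser):
    """Collect <li> texts grouped by the preceding <h3> heading (illustrative)."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.sections: Dict[str, List[str]] = {}
        self._current: Optional[str] = None  # section we are currently inside
        self._in_h3 = False
        self._in_li = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3, self._buffer = True, []
        elif tag == "li" and self._current is not None:
            self._in_li, self._buffer = True, []

    def handle_data(self, data):
        if self._in_h3 or self._in_li:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        text = " ".join("".join(self._buffer).split())  # normalize whitespace
        if tag == "h3" and self._in_h3:
            self._in_h3 = False
            self._current = text.rstrip(":")
            self.sections.setdefault(self._current, [])
        elif tag == "li" and self._in_li:
            self._in_li = False
            self.sections[self._current].append(text)


def extract_sections(html: str) -> Dict[str, List[str]]:
    parser = SectionCollector()
    parser.feed(html)
    parser.close()
    return parser.sections
```

Callers can use `sections.get("Perspectives", [])` so that stories without a given section yield an empty list rather than an error.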
472486473473- // Extract perspectives
474474- doc.Find("h3:contains('Perspectives')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
475475- text := s.Text()
476476- link := s.Find("a").First()
477477- sourceURL, _ := link.Attr("href")
487487+**Implementation Status:**
488488+- ✅ Extracts all 3 H3 sections (Highlights, Perspectives, Sources)
489489+- ✅ Handles optional elements (quote, image)
490490+- ✅ Tests passing (8 tests)
491491+- ✅ Validates against real feed data
478492479479- // Parse format: "Actor: Description (Source)"
480480- parts := strings.SplitN(text, ":", 2)
481481- if len(parts) == 2 {
482482- story.Perspectives = append(story.Perspectives, Perspective{
483483- Actor: strings.TrimSpace(parts[0]),
484484- Description: strings.TrimSpace(parts[1]),
485485- SourceURL: sourceURL,
486486- })
487487- }
488488- })
493493+---
489494490490- // Extract quote
491491- doc.Find("blockquote").Each(func(i int, s *goquery.Selection) {
492492- text := s.Text()
493493- parts := strings.Split(text, " - ")
494494- if len(parts) == 2 {
495495- story.Quote = &Quote{
496496- Text: strings.TrimSpace(parts[0]),
497497- Attribution: strings.TrimSpace(parts[1]),
498498- }
499499- }
500500- })
495495+### Component 3: State Manager (`state_manager.py`) ✅ COMPLETE
501496502502- // Extract sources
503503- doc.Find("h3:contains('Sources')").Next("ul").Find("li").Each(func(i int, s *goquery.Selection) {
504504- link := s.Find("a").First()
505505- url, _ := link.Attr("href")
506506- title := link.Text()
507507- domain := extractDomain(s.Text())
497497+**Responsibility:** Track processed stories to prevent duplicates
508498509509- story.Sources = append(story.Sources, Source{
510510- Title: title,
511511- URL: url,
512512- Domain: domain,
513513- })
514514- })
499499+**Implementation:** Simple JSON file persistence
515500516516- // Extract image
517517- img := doc.Find("img").First()
518518- if img.Length() > 0 {
519519- story.ImageURL, _ = img.Attr("src")
520520- story.ImageAlt, _ = img.Attr("alt")
501501+**State File Format:**
502502+```json
503503+{
504504+ "feeds": {
505505+ "https://news.kagi.com/world.xml": {
506506+ "last_successful_run": "2025-10-23T12:00:00Z",
507507+ "posted_guids": [
508508+ "https://kite.kagi.com/uuid1/world/123",
509509+ "https://kite.kagi.com/uuid2/world/124"
510510+ ]
521511 }
522522-523523- return story, nil
512512+ }
524513}
525514```
526515527527-**Libraries:**
528528-- `github.com/PuerkitoBio/goquery` - HTML parsing
516516+**Key Functions:**
517517+- `is_posted(feed_url: str, guid: str) -> bool`
518518+- `mark_posted(feed_url: str, guid: str, post_uri: str)`
519519+- `get_last_run(feed_url: str) -> Optional[datetime]`
520520+- `update_last_run(feed_url: str, timestamp: datetime)`
529521530530----
522522+**Deduplication Strategy:**
523523+- Keep last 100 GUIDs per feed (rolling window)
524524+- Stories older than 30 days are automatically removed
525525+- Simple, no database needed
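
A minimal sketch of the rolling window and atomic save, assuming the state file layout shown above. The function names mirror the key functions, but they are written here as free functions over a plain dict for brevity, and the `post_uri` argument is omitted. The 30-day expiry is not shown because the sample state format carries no per-GUID timestamps.

```python
import json
import os
import tempfile
from typing import Dict, List

MAX_GUIDS = 100  # rolling window size from the strategy above


def mark_posted(state: Dict, feed_url: str, guid: str) -> None:
    feed = state.setdefault("feeds", {}).setdefault(feed_url, {"posted_guids": []})
    guids: List[str] = feed.setdefault("posted_guids", [])
    if guid not in guids:
        guids.append(guid)
    # Drop the oldest entries once the window is full.
    del guids[:-MAX_GUIDS]


def is_posted(state: Dict, feed_url: str, guid: str) -> bool:
    return guid in state.get("feeds", {}).get(feed_url, {}).get("posted_guids", [])


def save_state(state: Dict, path: str) -> None:
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_path, path)  # atomic on POSIX within one filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise
```

Writing the temp file next to the target keeps the rename on one filesystem, which is what makes the replace step atomic.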
531526532532-### Component 3: Deduplication
527527+**Implementation Status:**
528528+- ✅ JSON-based persistence with atomic writes
529529+- ✅ GUID tracking with rolling window
530530+- ✅ Tests passing (12 tests)
531531+- ✅ Thread-safe operations
533532534534-**Responsibility:** Track posted stories to prevent duplicates
533533+---
535534536536-```go
537537-type Deduplicator struct {
538538- db *sql.DB
539539-}
535535+### Component 4: Rich Text Formatter (`richtext_formatter.py`) ✅ COMPLETE
540536541541-func (d *Deduplicator) AlreadyPosted(guid string) (bool, error) {
542542- var exists bool
543543- err := d.db.QueryRow(`
544544- SELECT EXISTS(
545545- SELECT 1 FROM kagi_news_posted_stories
546546- WHERE guid = $1
547547- )
548548- `, guid).Scan(&exists)
549549- return exists, err
550550-}
537537+**Responsibility:** Format parsed Kagi stories into Coves rich text with facets
551538552552-func (d *Deduplicator) MarkPosted(guid, postURI string) error {
553553- _, err := d.db.Exec(`
554554- INSERT INTO kagi_news_posted_stories (guid, post_uri, posted_at)
555555- VALUES ($1, $2, NOW())
556556- ON CONFLICT (guid) DO NOTHING
557557- `, guid, postURI)
558558- return err
559559-}
560560-```
539539+**Key Function:**
540540+- `format_full(story: KagiStory) -> dict`
541541+ - Returns: `{"content": str, "facets": List[dict]}`
542542+ - Builds plain text content with all sections
543543+ - Calculates UTF-8 byte positions for facets
544544+ - Applies bold, italic, and link facets
545545+ - Includes all sections: summary, highlights, perspectives, quote, sources
546546+ - Adds Kagi News attribution footer with link
561547562562-**Database Table:**
563563-```sql
564564-CREATE TABLE kagi_news_posted_stories (
565565- guid TEXT PRIMARY KEY,
566566- post_uri TEXT NOT NULL,
567567- posted_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
568568-);
548548+**Facet Types Applied:**
549549+- `social.coves.richtext.facet#bold` - Section headers, perspective actors
550550+- `social.coves.richtext.facet#italic` - Quotes
551551+- `social.coves.richtext.facet#link` - All URLs (sources, Kagi story link)
569552570570-CREATE INDEX idx_kagi_posted_at ON kagi_news_posted_stories(posted_at DESC);
571571-```
553553+**Key Challenge:** UTF-8 byte position calculation
554554+- Must handle multi-byte characters correctly (emoji, non-ASCII)
555555+- Use `str.encode('utf-8')` to get byte positions
556556+- Test with complex characters
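
The byte-position challenge above can be sketched as a small builder that tracks the running UTF-8 length while text is appended. `RichTextBuilder` is a hypothetical name. The sketch also assumes that `byteEnd` is exclusive, as in ATProto's own rich text facets; the source does not state this for Coves.

```python
from typing import Dict, List, Optional


class RichTextBuilder:
    """Accumulate text and record facet byte ranges as segments are appended."""

    def __init__(self) -> None:
        self._parts: List[str] = []
        self._byte_len = 0
        self.facets: List[Dict] = []

    def add(self, text: str, feature: Optional[Dict] = None) -> "RichTextBuilder":
        start = self._byte_len
        self._byte_len += len(text.encode("utf-8"))  # bytes, not characters
        self._parts.append(text)
        if feature is not None:
            self.facets.append({
                "index": {"byteStart": start, "byteEnd": self._byte_len},
                "features": [feature],
            })
        return self

    def build(self) -> Dict:
        return {"content": "".join(self._parts), "facets": self.facets}


BOLD = {"$type": "social.coves.richtext.facet#bold"}
post = (RichTextBuilder()
        .add("📰 Résumé\n\n")          # 10 characters, but 15 UTF-8 bytes
        .add("Highlights:", BOLD)
        .build())
```

Here the bold facet starts at byte 15 rather than character index 10, which is the kind of offset drift that emoji and accented characters introduce when character positions are used by mistake.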
572557573573-**Cleanup:** Periodic job deletes rows older than 30 days (Kagi unlikely to re-post old stories).
558558+**Implementation Status:**
559559+- ✅ Full rich text formatting with facets
560560+- ✅ UTF-8 byte position calculation working correctly
561561+- ✅ Tests passing (10 tests)
562562+- ✅ Handles all sections: summary, highlights, perspectives, quote, sources
574563575564---
576565577577-### Component 4: Category Mapper
566566+### Component 5: Coves Client (`coves_client.py`) ✅ COMPLETE
578567579579-**Responsibility:** Map Kagi categories to authorized communities
568568+**Responsibility:** Handle authentication and post creation via Coves API
580569581581-```go
582582-func (m *CategoryMapper) GetTargetCommunities(story *KagiStory) ([]*CommunityAuth, error) {
583583- // Get all communities that have authorized this aggregator
584584- allAuths, err := m.aggregator.GetAuthorizedCommunities(context.Background())
585585- if err != nil {
586586- return nil, err
587587- }
570570+**Implementation Note:** Uses direct HTTP client instead of ATProto SDK for simplicity in MVP.
588571589589- var targets []*CommunityAuth
590590- for _, auth := range allAuths {
591591- if !auth.Enabled {
592592- continue
593593- }
572572+**Key Functions:**
573573+- `authenticate() -> dict`
574574+ - Authenticates aggregator using credentials
575575+ - Returns auth token for subsequent API calls
594576595595- config := auth.Config
577577+- `create_post(community_handle: str, title: str, content: str, facets: List[dict], ...) -> dict`
578578+ - Calls Coves post creation endpoint
579579+ - Includes aggregator authentication
580580+ - Returns post URI and metadata
596581597597- // Check if story's primary category is in config.categories
598598- primaryCategory := story.Categories[0]
599599- if !contains(config["categories"], primaryCategory) {
600600- continue
601601- }
582582+**Authentication Flow:**
583583+- Load aggregator credentials from environment
584584+- Authenticate with Coves API
585585+- Store and use auth token for requests
586586+- Handle token refresh if needed
602587603603- // Check subcategory filter (if specified)
604604- if subcatFilter, ok := config["subcategoryFilter"].([]string); ok && len(subcatFilter) > 0 {
605605- if !hasAnySubcategory(story.Categories, subcatFilter) {
606606- continue
607607- }
608608- }
609609-610610- // Check minimum sources requirement
611611- minSources := config["minSources"].(int)
612612- if len(story.Sources) < minSources {
613613- continue
614614- }
615615-616616- targets = append(targets, auth)
617617- }
618618-619619- return targets, nil
620620-}
621621-```
588588+**Implementation Status:**
589589+- ✅ HTTP-based client implementation
590590+- ✅ Authentication and token management
591591+- ✅ Post creation with all required fields
592592+- ✅ Error handling and retries
622593623594---
624595625625-### Component 5: Post Formatter
626626-627627-**Responsibility:** Convert Kagi story to Coves post format
628628-629629-```go
630630-func (f *PostFormatter) Format(story *KagiStory, format string) string {
631631- switch format {
632632- case "full":
633633- return f.formatFull(story)
634634- case "summary":
635635- return f.formatSummary(story)
636636- case "minimal":
637637- return f.formatMinimal(story)
638638- default:
639639- return f.formatFull(story)
640640- }
641641-}
642642-643643-func (f *PostFormatter) formatFull(story *KagiStory) string {
644644- var buf strings.Builder
645645-646646- // Summary
647647- buf.WriteString(story.Summary)
648648- buf.WriteString("\n\n")
649649-650650- // Highlights
651651- if len(story.Highlights) > 0 {
652652- buf.WriteString("**Highlights:**\n")
653653- for _, h := range story.Highlights {
654654- buf.WriteString(fmt.Sprintf("• %s\n", h))
655655- }
656656- buf.WriteString("\n")
657657- }
596596+### Component 6: Config Manager (`config.py`) ✅ COMPLETE
658597659659- // Perspectives
660660- if len(story.Perspectives) > 0 {
661661- buf.WriteString("**Perspectives:**\n")
662662- for _, p := range story.Perspectives {
663663- buf.WriteString(fmt.Sprintf("• **%s**: %s ([Source](%s))\n", p.Actor, p.Description, p.SourceURL))
664664- }
665665- buf.WriteString("\n")
666666- }
598598+**Responsibility:** Load and validate configuration from YAML and environment
667599668668- // Quote
669669- if story.Quote != nil {
670670- buf.WriteString(fmt.Sprintf("> %s — %s\n\n", story.Quote.Text, story.Quote.Attribution))
671671- }
600600+**Key Functions:**
601601+- `load_config(config_path: str) -> AggregatorConfig`
602602+ - Loads YAML configuration
603603+ - Validates structure and required fields
604604+ - Merges with environment variables
605605+ - Returns validated config object
672606673673- // Sources
674674- buf.WriteString("**Sources:**\n")
675675- for _, s := range story.Sources {
676676- buf.WriteString(fmt.Sprintf("• [%s](%s) - %s\n", s.Title, s.URL, s.Domain))
677677- }
678678- buf.WriteString("\n")
607607+**Implementation Status:**
608608+- ✅ YAML parsing with validation
609609+- ✅ Environment variable support
610610+- ✅ Tests passing (3 tests)
611611+- ✅ Clear error messages for config issues
679612680680- // Attribution
681681- buf.WriteString(fmt.Sprintf("---\n📰 Story aggregated by [Kagi News](%s)", story.Link))
613613+---
682614683683- return buf.String()
684684-}
615615+### Main Orchestration (`main.py`) ✅ COMPLETE
685616686686-func (f *PostFormatter) formatSummary(story *KagiStory) string {
687687- var buf strings.Builder
617617+**Responsibility:** Coordinate all components in a single execution (called by CRON)
688618689689- buf.WriteString(story.Summary)
690690- buf.WriteString("\n\n**Sources:**\n")
691691- for _, s := range story.Sources {
692692- buf.WriteString(fmt.Sprintf("• [%s](%s) - %s\n", s.Title, s.URL, s.Domain))
693693- }
694694- buf.WriteString("\n")
695695- buf.WriteString(fmt.Sprintf("---\n📰 Story aggregated by [Kagi News](%s)", story.Link))
619619+**Flow (Single Run):**
620620+1. Load configuration from `config.yaml`
621621+2. Load environment variables (AGGREGATOR_DID, AGGREGATOR_PRIVATE_KEY)
622622+3. Initialize all components (fetcher, parser, formatter, client, state)
623623+4. For each enabled feed in config:
624624+ a. Fetch RSS feed
625625+ b. Parse all items
626626+ c. Filter out already-posted items (check state)
627627+ d. For each new item:
628628+ - Parse HTML to structured KagiStory
629629+ - Format post content with rich text facets
630630+ - Build post record (with hot-linked image if present)
631631+ - Create post via XRPC
632632+ - Mark as posted in state
633633+ e. Update last run timestamp
634634+5. Save state to disk
635635+6. Log summary (posts created, errors encountered)
636636+7. Exit (CRON will call again on schedule)
696637697697- return buf.String()
698698-}
638638+**Error Isolation:**
639639+- Feed-level: One feed failing doesn't stop others
640640+- Item-level: One item failing doesn't stop feed processing
641641+- Continue on non-fatal errors, log all failures
642642+- Exit code 0 even with partial failures (CRON won't alert)
643643+- Exit code 1 only on catastrophic failure (config missing, auth failure)
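
The error isolation rules above might translate into a loop like the one below. `run_once`, `fetch_items`, and `post_item` are hypothetical names that stand in for the real components, so this is a sketch of the control flow only.

```python
import logging
from typing import Callable, Iterable, List

log = logging.getLogger("kagi_news")


def run_once(feeds: Iterable[str],
             fetch_items: Callable[[str], List[dict]],
             post_item: Callable[[str, dict], None]) -> int:
    """Process every feed once and return the process exit code."""
    posted = failed = 0
    for feed_url in feeds:
        try:
            items = fetch_items(feed_url)
        except Exception:
            # Feed-level isolation: log and move on to the next feed.
            log.exception("feed failed: %s", feed_url)
            failed += 1
            continue
        for item in items:
            try:
                post_item(feed_url, item)
                posted += 1
            except Exception:
                # Item-level isolation: one bad story does not stop the feed.
                log.exception("item failed: %s", item.get("guid"))
                failed += 1
    log.info("run complete: %d posted, %d failed", posted, failed)
    return 0  # partial failures still exit 0; fatal setup errors would exit 1
```

Catastrophic failures such as a missing config or failed authentication would be raised before this loop starts, which is where an exit code of 1 would come from.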
699644700700-func (f *PostFormatter) formatMinimal(story *KagiStory) string {
701701- sourceDomains := make([]string, len(story.Sources))
702702- for i, s := range story.Sources {
703703- sourceDomains[i] = s.Domain
704704- }
705705-706706- return fmt.Sprintf(
707707- "%s\n\nRead more: %s\n\n**Sources:** %s\n\n---\n📰 Via [Kagi News](%s)",
708708- story.Title,
709709- story.Link,
710710- strings.Join(sourceDomains, ", "),
711711- story.Link,
712712- )
713713-}
714714-```
645645+**Implementation Status:**
646646+- ✅ Complete orchestration logic implemented
647647+- ✅ Feed-level and item-level error isolation
648648+- ✅ Structured logging throughout
649649+- ✅ Tests passing (9 tests covering various scenarios)
650650+- ✅ Dry-run mode for testing
715651716652---
717653718718-### Component 6: Post Publisher
719719-720720-**Responsibility:** Create posts via Coves API
654654+## Deployment (Docker Compose with CRON)
721655722722-```go
723723-func (p *PostPublisher) PublishStory(ctx context.Context, story *KagiStory, communities []*CommunityAuth) error {
724724- for _, comm := range communities {
725725- config := comm.Config
656656+### Dockerfile
726657727727- // Format content based on config
728728- postFormat := config["postFormat"].(string)
729729- content := p.formatter.Format(story, postFormat)
730730-731731- // Build embed
732732- var embed *aggregator.Embed
733733- if config["includeImages"].(bool) && story.ImageURL != "" {
734734- // TODO: Handle image upload/blob creation
735735- embed = &aggregator.Embed{
736736- Type: "app.bsky.embed.external",
737737- External: &aggregator.External{
738738- URI: story.Link,
739739- Title: story.Title,
740740- Description: truncate(story.Summary, 300),
741741- Thumb: story.ImageURL, // or blob reference
742742- },
743743- }
744744- }
658658+```dockerfile
659659+FROM python:3.11-slim
745660746746- // Create post
747747- post := aggregator.Post{
748748- Title: story.Title,
749749- Content: content,
750750- Embed: embed,
751751- FederatedFrom: &aggregator.FederatedSource{
752752- Platform: "kagi-news-rss",
753753- URI: story.Link,
754754- ID: story.GUID,
755755- OriginalCreatedAt: story.PubDate,
756756- },
757757- ContentLabels: story.Categories,
758758- }
661661+WORKDIR /app
759662760760- err := p.aggregator.CreatePost(ctx, comm.CommunityDID, post)
761761- if err != nil {
762762- log.Printf("Failed to create post in %s: %v", comm.CommunityDID, err)
763763- continue
764764- }
663663+# Install cron
664664+RUN apt-get update && apt-get install -y cron && rm -rf /var/lib/apt/lists/*
765665766766- // Mark as posted
767767- _ = p.deduplicator.MarkPosted(story.GUID, "post-uri-from-response")
768768- }
666666+# Install dependencies
667667+COPY requirements.txt .
668668+RUN pip install --no-cache-dir -r requirements.txt
769669770770- return nil
771771-}
772772-```
670670+# Copy source code and scripts
671671+COPY src/ ./src/
672672+COPY scripts/ ./scripts/
673673+COPY crontab /etc/cron.d/kagi-news-cron
773674774774----
675675+# Set up cron
676676+RUN chmod 0644 /etc/cron.d/kagi-news-cron && \
677677+ crontab /etc/cron.d/kagi-news-cron && \
678678+ touch /var/log/cron.log
775679776776-## Image Handling Strategy
680680+# Create non-root user for security
681681+RUN useradd --create-home appuser && \
682682+ chown -R appuser:appuser /app && \
683683+ chown appuser:appuser /var/log/cron.log
777684778778-### Initial Implementation (MVP)
685685+USER appuser
779686780780-**Approach:** Use Kagi proxy URLs directly in embeds
687687+# Run cron in foreground
688688+CMD ["cron", "-f"]
689689+```
781690782782-**Rationale:**
783783-- Simplest implementation
784784-- Kagi proxy likely allows hotlinking for non-commercial use
785785-- No storage costs
786786-- Images are already optimized by Kagi
691691+### Crontab Configuration (`crontab`)
787692788788-**Risk Mitigation:**
789789-- Monitor for broken images
790790-- Add fallback: if image fails to load, skip embed
791791-- Prepare migration plan to self-hosting if needed
693693+```bash
694694+# Run Kagi News aggregator daily at 1 PM UTC (after Kagi updates around noon)
695695+0 13 * * * cd /app && /usr/local/bin/python -m src.main >> /var/log/cron.log 2>&1
792696793793-**Code:**
794794-```go
795795-if config["includeImages"].(bool) && story.ImageURL != "" {
796796- // Use Kagi proxy URL directly
797797- embed = &aggregator.Embed{
798798- External: &aggregator.External{
799799- Thumb: story.ImageURL, // https://kagiproxy.com/img/...
800800- },
801801- }
802802-}
697697+# Blank line required at end of crontab
803698```
804699805700---
806701807807-### Future Enhancement (If Issues Arise)
702702+### docker-compose.yml
808703809809-**Approach:** Download and re-host images
704704+```yaml
705705+version: '3.8'
810706811811-**Implementation:**
812812-1. Download image from Kagi proxy
813813-2. Upload to Coves blob storage (or S3/CDN)
814814-3. Use blob reference in embed
707707+services:
708708+ kagi-news-aggregator:
709709+ build: .
710710+ container_name: kagi-news-aggregator
711711+ restart: unless-stopped
815712816816-**Code:**
817817-```go
818818-func (p *PostPublisher) uploadImage(imageURL string) (string, error) {
819819- // Download from Kagi proxy
820820- resp, err := http.Get(imageURL)
821821- if err != nil {
822822- return "", err
823823- }
824824- defer resp.Body.Close()
713713+ environment:
714714+ # Aggregator identity (from aggregator creation)
715715+ - AGGREGATOR_DID=${AGGREGATOR_DID}
716716+ - AGGREGATOR_PRIVATE_KEY=${AGGREGATOR_PRIVATE_KEY}
825717826826- // Upload to blob storage
827827- blob, err := p.blobStore.Upload(resp.Body, resp.Header.Get("Content-Type"))
828828- if err != nil {
829829- return "", err
830830- }
718718+ volumes:
719719+ # Config file (read-only)
720720+ - ./config.yaml:/app/config.yaml:ro
721721+ # State directory (read-write for deduplication; mounting the directory lets atomic writes replace state.json)
722722+ - ./data:/app/data
831723832832- return blob.Ref, nil
833833-}
724724+ logging:
725725+ driver: "json-file"
726726+ options:
727727+ max-size: "10m"
728728+ max-file: "3"
834729```
835730836836-**Decision Point:** Only implement if:
837837-- Kagi blocks hotlinking
838838-- Kagi proxy becomes unreliable
839839-- Legal clarification needed
731731+**Environment Variables:**
732732+- `AGGREGATOR_DID`: PLC DID created for this aggregator instance
733733+- `AGGREGATOR_PRIVATE_KEY`: Base64-encoded private key for signing
840734841841----
735735+**Volumes:**
736736+- `config.yaml`: Feed-to-community mappings (user-editable)
737737+- `data/state.json`: Deduplication state (managed by aggregator)
842738843843-## Rate Limiting & Performance
739739+**Deployment:**
740740+```bash
741741+# On same host as Coves
742742+cd aggregators/kagi-news
743743+cp config.example.yaml config.yaml
744744+# Edit config.yaml with your feed mappings
844745845845-### Rate Limits
746746+# Set environment variables
747747+export AGGREGATOR_DID="did:plc:xyz..."
748748+export AGGREGATOR_PRIVATE_KEY="base64-key..."
846749847847-**RSS Fetching:**
848848-- Poll each category feed every 15 minutes
849849-- Max 4 categories = 4 requests per 15 min = 16 req/hour
850850-- Well within any reasonable limit
750750+# Start aggregator
751751+docker-compose up -d
851752852852-**Post Creation:**
853853-- Aggregator rate limit: 10 posts/hour per community
854854-- Global limit: 100 posts/hour across all communities
855855-- Kagi News publishes ~5-10 stories per category per day
856856-- = ~20-40 posts/day total across all categories
857857-- = ~2-4 posts/hour average
858858-- Well within limits
859859-860860-**Performance Targets:**
861861-- Story posted within 15 minutes of appearing in RSS feed
862862-- < 1 second to parse and format a story
863863-- < 500ms to publish a post via API
753753+# View logs
754754+docker-compose logs -f
755755+```
864756865757---
866758867867-## Monitoring & Observability
759759+## Image Handling Strategy (MVP)
868760869869-### Metrics to Track
761761+### Approach: Hot-Linked Images via External Embed
870762871871-**Feed Polling:**
872872-- `kagi_feed_poll_total` (counter) - Total feed polls by category
873873-- `kagi_feed_poll_errors` (counter) - Failed polls by category/error
874874-- `kagi_feed_items_fetched` (gauge) - Items per poll by category
875875-- `kagi_feed_poll_duration_seconds` (histogram) - Poll latency
763763+The MVP uses hot-linked images from Kagi's proxy:
876764877877-**Story Processing:**
878878-- `kagi_stories_parsed_total` (counter) - Successfully parsed stories
879879-- `kagi_stories_parse_errors` (counter) - Parse failures by error type
880880-- `kagi_stories_filtered` (counter) - Stories filtered out by reason (duplicate, min sources, category)
881881-- `kagi_stories_posted` (counter) - Stories successfully posted by community
765765+**Flow:**
766766+1. Extract image URL from HTML description (`https://kagiproxy.com/img/...`)
767767+2. Include in post using `social.coves.embed.external`:
768768+ ```json
769769+ {
770770+ "$type": "social.coves.embed.external",
771771+ "external": {
772772+ "uri": "{Kagi story URL}",
773773+ "title": "{Story title}",
774774+ "description": "{Summary excerpt}",
775775+ "thumb": "{Kagi proxy image URL}"
776776+ }
777777+ }
778778+ ```
779779+3. Frontend renders image from Kagi proxy URL
882780883883-**Post Publishing:**
884884-- `kagi_posts_created_total` (counter) - Total posts created
885885-- `kagi_posts_failed` (counter) - Failed posts by error type
886886-- `kagi_post_publish_duration_seconds` (histogram) - Post creation latency
781781+**Rationale:**
782782+- Simpler MVP implementation (no blob upload complexity)
783783+- No storage requirements on our end
784784+- Kagi proxy is reliable and CDN-backed
785785+- Faster posting (no download/upload step)
786786+- Images already properly sized and optimized
887787888888-**Health:**
889889-- `kagi_aggregator_up` (gauge) - Service health (1 = healthy, 0 = down)
890890-- `kagi_last_successful_poll_timestamp` (gauge) - Last successful poll time by category
788788+**Future Consideration:** If Kagi proxy becomes unreliable, migrate to blob storage in Phase 2.
891789892790---
893791894894-### Logging
792792+## Rate Limiting & Performance (MVP)
895793896896-**Structured Logging:**
897897-```go
898898-log.Info("Story posted",
899899- "guid", story.GUID,
900900- "title", story.Title,
901901- "community", comm.CommunityDID,
902902- "post_uri", postURI,
903903- "sources", len(story.Sources),
904904- "format", postFormat,
905905-)
794794+### Simplified Rate Strategy
906795907907-log.Error("Failed to parse story",
908908- "guid", item.GUID,
909909- "feed", feedURL,
910910- "error", err,
911911-)
912912-```
796796+**RSS Fetching:**
797797+- Poll each feed once per day (1 PM UTC, after Kagi updates around noon)
798798+- No aggressive polling needed (Kagi updates daily)
799799+- ~3-5 feeds = minimal load
913800914914-**Log Levels:**
915915-- DEBUG: Feed items, parsing details
916916-- INFO: Stories posted, communities targeted
917917-- WARN: Parse errors, rate limit approaching
918918-- ERROR: Failed posts, feed fetch failures
801801+**Post Creation:**
802802+- One run per day = 5-15 posts per feed
803803+- Total: ~15-75 posts/day across all communities
804804+- Well within any reasonable rate limits
919805920920----
806806+**Performance:**
807807+- RSS fetch + parse: < 5 seconds per feed
809809+- Image handling: no download/upload step (the Kagi proxy URL is embedded as-is)
809809+- Post creation: < 1 second per post
810810+- Total runtime per day: < 5 minutes
921811922922-### Alerts
923923-924924-**Critical:**
925925-- Feed polling failing for > 1 hour
926926-- Post creation failing for > 10 consecutive attempts
927927-- Aggregator unauthorized (auth record disabled/deleted)
928928-929929-**Warning:**
930930-- Post creation rate < 50% of expected
931931-- Parse errors > 10% of items
932932-- Approaching rate limits (> 80% of quota)
812812+No complex rate limiting needed for MVP.
933813934814---
935815936936-## Deployment
937937-938938-### Infrastructure
939939-940940-**Service Type:** Long-running daemon
816816+## Logging & Observability (MVP)
941817942942-**Hosting:** Kubernetes (same cluster as Coves AppView)
818818+### Structured Logging
943819944944-**Resources:**
945945-- CPU: 0.5 cores (low CPU usage, mostly I/O)
946946-- Memory: 512 MB (small in-memory cache for recent GUIDs)
947947-- Storage: 1 GB (SQLite for deduplication tracking)
820820+**Python `logging` module** with JSON-encoded messages:
948821949949----
822822+```python
823823+import logging
824824+import json
950825951951-### Configuration
826826+logging.basicConfig(
827827+ level=logging.INFO,
828828+ format='%(message)s'
829829+)
952830953953-**Environment Variables:**
954954-```bash
955955-# Aggregator identity
956956-AGGREGATOR_DID=did:web:kagi-news.coves.social
957957-AGGREGATOR_PRIVATE_KEY_PATH=/secrets/private-key.pem
831831+logger = logging.getLogger(__name__)
958832959959-# Coves API
960960-COVES_API_URL=https://api.coves.social
833833+# Example structured log
834834+logger.info(json.dumps({
835835+ "event": "post_created",
836836+ "feed": "world.xml",
837837+ "story_title": "Breaking News...",
838838+ "community": "world-news.coves.social",
839839+ "post_uri": "at://...",
840840+ "timestamp": "2025-10-23T12:00:00Z"
841841+}))
842842+```
961843962962-# Feed polling
963963-POLL_INTERVAL=15m
964964-CATEGORIES=world,tech,business,sports
844844+**Key Events to Log:**
845845+- `feed_fetched`: RSS feed successfully fetched
846846+- `story_parsed`: Story successfully parsed from HTML
847847+- `post_created`: Post successfully created
848848+- `error`: Any failures (with context)
849849+- `run_completed`: Summary of entire run
965850966966-# Database (for deduplication)
967967-DB_PATH=/data/kagi-news.db
851851+**Log Levels:**
852852+- INFO: Successful operations
853853+- WARNING: Retryable errors, skipped items
854854+- ERROR: Fatal errors, failed posts
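The logging example in this section serializes each message with `json.dumps` at the call site. An alternative, sketched here under assumptions, moves the JSON encoding into a `logging.Formatter` subclass so that call sites pass only the event name and fields. The class name `JsonFormatter`, the `fields` key passed through `extra`, and the logger name `kagi_news` are all illustrative choices, not part of the implementation described in this PRD.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc
            ).isoformat(),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Structured fields arrive via logger.info(..., extra={"fields": {...}}).
        payload.update(getattr(record, "fields", {}))
        return json.dumps(payload)


def make_logger(name: str = "kagi_news") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.StreamHandler()
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    return logger
```

A call such as `make_logger().info("post_created", extra={"fields": {"feed": "world.xml"}})` then emits one JSON line with the event name, level, timestamp, and the extra fields.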
968855969969-# Monitoring
970970-METRICS_PORT=9090
971971-LOG_LEVEL=info
972972-```
856856+### Simple Monitoring
973857974974----
858858+**Health Check:** Check the timestamp of the last successful run
859859+- If it is more than 48 hours old: alert (the job should run daily)
860860+- If more than 50% of items in a run fail: investigate
975861976976-### Deployment Manifest
862862+**Metrics to Track (manually via logs):**
863863+- Posts created per run
864864+- Parse failures per run
865865+- Post creation failures per run
866866+- Total runtime
977867978978-```yaml
979979-apiVersion: apps/v1
980980-kind: Deployment
981981-metadata:
982982- name: kagi-news-aggregator
983983- namespace: coves
984984-spec:
985985- replicas: 1
986986- selector:
987987- matchLabels:
988988- app: kagi-news-aggregator
989989- template:
990990- metadata:
991991- labels:
992992- app: kagi-news-aggregator
993993- spec:
994994- containers:
995995- - name: aggregator
996996- image: coves/kagi-news-aggregator:latest
997997- env:
998998- - name: AGGREGATOR_DID
999999- value: did:web:kagi-news.coves.social
10001000- - name: COVES_API_URL
10011001- value: https://api.coves.social
10021002- - name: POLL_INTERVAL
10031003- value: 15m
10041004- - name: CATEGORIES
10051005- value: world,tech,business,sports
10061006- - name: DB_PATH
10071007- value: /data/kagi-news.db
10081008- - name: AGGREGATOR_PRIVATE_KEY_PATH
10091009- value: /secrets/private-key.pem
10101010- volumeMounts:
10111011- - name: data
10121012- mountPath: /data
10131013- - name: secrets
10141014- mountPath: /secrets
10151015- readOnly: true
10161016- ports:
10171017- - name: metrics
10181018- containerPort: 9090
10191019- resources:
10201020- requests:
10211021- cpu: 250m
10221022- memory: 256Mi
10231023- limits:
10241024- cpu: 500m
10251025- memory: 512Mi
10261026- volumes:
10271027- - name: data
10281028- persistentVolumeClaim:
10291029- claimName: kagi-news-data
10301030- - name: secrets
10311031- secret:
10321032- secretName: kagi-news-private-key
10331033-```
868868+No complex metrics infrastructure needed for MVP - Docker logs are sufficient.
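The two health-check thresholds above (48 hours since the last successful run, more than 50% failed items) could be evaluated with a small helper like the one below. It is a sketch only: the function name `check_health`, its parameters, and the returned problem labels are hypothetical, and how the inputs are gathered from logs or state is left open.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_RUN_AGE = timedelta(hours=48)  # the job should run daily
MAX_ERROR_RATIO = 0.5              # investigate above 50% failures


def check_health(last_success: datetime, items_total: int, items_failed: int,
                 now: Optional[datetime] = None) -> List[str]:
    """Return a list of problem labels; an empty list means healthy."""
    now = now or datetime.now(timezone.utc)
    problems = []
    if now - last_success > MAX_RUN_AGE:
        problems.append("stale_run")
    if items_total > 0 and items_failed / items_total > MAX_ERROR_RATIO:
        problems.append("high_error_rate")
    return problems
```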
10348691035870---
103687110371037-## Testing Strategy
872872+## Testing Strategy ✅ COMPLETE
103887310391039-### Unit Tests
874874+### Unit Tests - 57 Tests Passing (83% Coverage)
104087510411041-**Feed Parsing:**
10421042-```go
10431043-func TestParseFeed(t *testing.T) {
10441044- feed := loadTestFeed("testdata/world.xml")
10451045- stories, err := parser.Parse(feed)
10461046- assert.NoError(t, err)
10471047- assert.Len(t, stories, 10)
876876+**Test Coverage by Component:**
877877+- ✅ **RSS Fetcher** (5 tests)
878878+ - Successful feed fetch
879879+ - Timeout handling
880880+ - Retry logic with exponential backoff
881881+ - Invalid XML handling
882882+ - Empty URL validation
104888310491049- story := stories[0]
10501050- assert.NotEmpty(t, story.Title)
10511051- assert.NotEmpty(t, story.Summary)
10521052- assert.Greater(t, len(story.Sources), 1)
10531053-}
884884+- ✅ **HTML Parser** (8 tests)
885885+ - Summary extraction
886886+ - Image URL and alt text extraction
887887+ - Highlights list parsing
888888+ - Quote extraction with attribution
889889+ - Perspectives parsing with actors and sources
890890+ - Sources list extraction
891891+ - Missing sections handling
892892+ - Full story object creation
105489310551055-func TestParseStoryHTML(t *testing.T) {
10561056- html := `<p>Summary [source.com#1]</p>
10571057- <h3>Highlights:</h3>
10581058- <ul><li>Point 1</li></ul>
10591059- <h3>Sources:</h3>
10601060- <ul><li><a href="https://example.com">Title</a> - example.com</li></ul>`
894894+- ✅ **Rich Text Formatter** (10 tests)
895895+ - Full format generation
896896+ - Bold facets on headers and actors
897897+ - Italic facets on quotes
898898+ - Link facets on URLs
899899+ - UTF-8 byte position calculation
900900+ - Multi-byte character handling (emoji, special chars)
901901+ - All sections formatted correctly
106190210621062- story, err := parser.ParseHTML(html)
10631063- assert.NoError(t, err)
10641064- assert.Equal(t, "Summary [source.com#1]", story.Summary)
10651065- assert.Len(t, story.Highlights, 1)
10661066- assert.Len(t, story.Sources, 1)
10671067-}
10681068-```
903903+- ✅ **State Manager** (12 tests)
904904+ - GUID tracking
905905+ - Duplicate detection
906906+ - Rolling window (100 GUID limit)
907907+ - Age-based cleanup (30 days)
908908+ - Last run timestamp tracking
909909+ - JSON persistence
910910+ - Atomic file writes
911911+ - Concurrent access safety
106991210701070-**Formatting:**
10711071-```go
10721072-func TestFormatFull(t *testing.T) {
10731073- story := &KagiStory{
10741074- Summary: "Test summary",
10751075- Sources: []Source{
10761076- {Title: "Article", URL: "https://example.com", Domain: "example.com"},
10771077- },
10781078- }
913913+- ✅ **Config Manager** (3 tests)
914914+ - YAML loading and validation
915915+ - Environment variable merging
916916+ - Error handling for missing/invalid config
107991710801080- content := formatter.Format(story, "full")
10811081- assert.Contains(t, content, "Test summary")
10821082- assert.Contains(t, content, "**Sources:**")
10831083- assert.Contains(t, content, "📰 Story aggregated by")
10841084-}
10851085-```
918918+- ✅ **Main Orchestrator** (9 tests)
919919+ - End-to-end flow
920920+ - Feed-level error isolation
921921+ - Item-level error isolation
922922+ - Dry-run mode
923923+ - State persistence across runs
924924+ - Multiple feed handling
108692510871087-**Deduplication:**
10881088-```go
10891089-func TestDeduplication(t *testing.T) {
10901090- guid := "test-guid-123"
926926+- ✅ **E2E Tests** (6 skipped - require live API)
927927+ - Integration with Coves API (manual testing required)
928928+ - Authentication flow
929929+ - Post creation
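The "UTF-8 byte position calculation" and "multi-byte character handling" items under Rich Text Formatter matter because facet ranges are counted in UTF-8 bytes, while Python string indices count code points. The sketch below shows the conversion only; the function names are hypothetical, and the shape of the facet record that would carry these offsets is not shown in this section, so it is deliberately left out.

```python
from typing import Optional, Tuple


def byte_range(text: str, start: int, end: int) -> Tuple[int, int]:
    """Convert a [start, end) character slice into UTF-8 byte offsets."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_end = byte_start + len(text[start:end].encode("utf-8"))
    return byte_start, byte_end


def find_byte_range(text: str, needle: str) -> Optional[Tuple[int, int]]:
    """Locate the first occurrence of needle and return its byte range."""
    idx = text.find(needle)
    if idx == -1:
        return None
    return byte_range(text, idx, idx + len(needle))
```

For ASCII-only text the byte range equals the character range; any emoji or accented character before the target shifts the byte offsets to the right, which is the case the multi-byte tests guard against.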
109193010921092- posted, err := deduplicator.AlreadyPosted(guid)
10931093- assert.NoError(t, err)
10941094- assert.False(t, posted)
10951095-10961096- err = deduplicator.MarkPosted(guid, "at://...")
10971097- assert.NoError(t, err)
10981098-10991099- posted, err = deduplicator.AlreadyPosted(guid)
11001100- assert.NoError(t, err)
11011101- assert.True(t, posted)
11021102-}
931931+**Test Results:**
932932+```
933933+57 passed, 6 skipped, 1 warning in 8.76s
934934+Coverage: 83%
1103935```
110493611051105----
937937+**Test Fixtures:**
938938+- Real Kagi News RSS item with all sections
939939+- Sample HTML descriptions
940940+- Mock HTTP responses
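The State Manager behaviors covered by the unit tests (duplicate detection, a 100-GUID rolling window, 30-day cleanup, JSON persistence, atomic writes) can be illustrated with a compact sketch. Everything here is an assumption for illustration: the class name `PostedState`, the single flat GUID map (the real state may be tracked per feed), and the on-disk JSON layout.

```python
import json
import os
import tempfile
from datetime import datetime, timedelta, timezone

MAX_GUIDS = 100                 # rolling window size
MAX_AGE = timedelta(days=30)    # age-based cleanup


class PostedState:
    """Tracks posted GUIDs in a JSON file (hypothetical layout)."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.guids = {}  # guid -> ISO timestamp, in posting order
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                self.guids = json.load(fh).get("guids", {})

    def is_posted(self, guid: str) -> bool:
        return guid in self.guids

    def mark_posted(self, guid: str, now: datetime = None) -> None:
        now = now or datetime.now(timezone.utc)
        self.guids[guid] = now.isoformat()
        self._prune(now)
        self._save()

    def _prune(self, now: datetime) -> None:
        cutoff = now - MAX_AGE
        fresh = {g: ts for g, ts in self.guids.items()
                 if datetime.fromisoformat(ts) >= cutoff}
        # Keep only the most recently added MAX_GUIDS entries.
        self.guids = dict(list(fresh.items())[-MAX_GUIDS:])

    def _save(self) -> None:
        # Write to a temp file in the same directory, then rename over the
        # target so a crash never leaves a half-written state file.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump({"guids": self.guids}, fh)
        os.replace(tmp, self.path)
```

The rename via `os.replace` is what makes the write atomic on a single filesystem; it does not by itself coordinate multiple concurrent writers, which would need a lock.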
11069411107942### Integration Tests
110894311091109-**With Mock Coves API:**
11101110-```go
11111111-func TestPublishStory(t *testing.T) {
11121112- // Setup mock Coves API
11131113- mockAPI := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
11141114- assert.Equal(t, "/xrpc/social.coves.post.create", r.URL.Path)
944944+**Manual Integration Testing Required:**
945945+- [ ] Can authenticate with live Coves API
946946+- [ ] Can create post via Coves API
947947+- [ ] Can fetch real Kagi RSS feed
948948+- [ ] Images display correctly from Kagi proxy
949949+- [ ] State persistence works in production
950950+- [ ] CRON scheduling works correctly
111595111161116- var input CreatePostInput
11171117- json.NewDecoder(r.Body).Decode(&input)
952952+**Pre-deployment Checklist:**
953953+- [x] All unit tests passing
954954+- [x] Can parse real Kagi HTML
955955+- [x] State persistence works
956956+- [x] Config validation works
957957+- [x] Error handling comprehensive
958958+- [ ] Aggregator DID created
959959+- [ ] Can authenticate with Coves API
960960+- [ ] Docker container builds and runs
111896111191119- assert.Equal(t, "did:plc:test-community", input.Community)
11201120- assert.NotEmpty(t, input.Title)
11211121- assert.Contains(t, input.Content, "📰 Story aggregated by")
962962+---
112296311231123- w.WriteHeader(200)
11241124- json.NewEncoder(w).Encode(CreatePostOutput{URI: "at://..."})
11251125- }))
11261126- defer mockAPI.Close()
964964+## Success Metrics
112796511281128- // Test story publishing
11291129- publisher := NewPostPublisher(mockAPI.URL)
11301130- err := publisher.PublishStory(ctx, testStory, []*CommunityAuth{testComm})
11311131- assert.NoError(t, err)
11321132-}
11331133-```
966966+### ✅ Phase 1: Implementation - COMPLETE
113496711351135----
968968+- [x] All core components implemented
969969+- [x] 57 tests passing with 83% coverage
970970+- [x] RSS fetching and parsing working
971971+- [x] Rich text formatting with facets
972972+- [x] State management and deduplication
973973+- [x] Configuration management
974974+- [x] Comprehensive error handling
975975+- [x] Documentation complete
113697611371137-### E2E Tests
977977+### 🔄 Phase 2: Integration Testing - IN PROGRESS
113897811391139-**With Real RSS Feed:**
11401140-```go
11411141-func TestE2E_FetchAndParse(t *testing.T) {
11421142- if testing.Short() {
11431143- t.Skip("Skipping E2E test")
11441144- }
979979+- [ ] Aggregator DID created (PLC)
980980+- [ ] Aggregator authorized in 1+ test communities
981981+- [ ] Can authenticate with Coves API
982982+- [ ] First post created end-to-end
983983+- [ ] Attribution visible ("Via Kagi News")
984984+- [ ] No duplicate posts on repeated runs
985985+- [ ] Images display correctly
114598611461146- // Fetch real Kagi News feed
11471147- feed, err := poller.fetchFeed("https://news.kagi.com/world.xml")
11481148- assert.NoError(t, err)
11491149- assert.NotEmpty(t, feed.Items)
987987+### 📋 Phase 3: Alpha Deployment (First Week)
115098811511151- // Parse first item
11521152- story, err := parser.Parse(feed.Items[0])
11531153- assert.NoError(t, err)
11541154- assert.NotEmpty(t, story.Title)
11551155- assert.NotEmpty(t, story.Summary)
11561156- assert.Greater(t, len(story.Sources), 0)
11571157-}
11581158-```
11591159-11601160-**With Test Coves Instance:**
11611161-```go
11621162-func TestE2E_CreatePost(t *testing.T) {
11631163- if testing.Short() {
11641164- t.Skip("Skipping E2E test")
11651165- }
11661166-11671167- // Create post in test community
11681168- post := aggregator.Post{
11691169- Title: "Test Kagi News Post",
11701170- Content: "Test content...",
11711171- }
989989+- [ ] Docker Compose runs successfully in production
990990+- [ ] 2-3 communities receiving posts
991991+- [ ] 20+ posts created successfully
992992+- [ ] Zero duplicates
993993+- [ ] < 10% errors (parse or post creation)
994994+- [ ] CRON scheduling reliable
117299511731173- err := aggregator.CreatePost(ctx, testCommunityDID, post)
11741174- assert.NoError(t, err)
996996+### 🎯 Phase 4: Beta (First Month)
117599711761176- // Verify post appears in feed
11771177- // (requires test community setup)
11781178-}
11791179-```
998998+- [ ] 5+ communities using aggregator
999999+- [ ] 200+ posts created
10001000+- [ ] Positive community feedback
10011001+- [ ] No rate limit issues
10021002+- [ ] < 5% error rate
10031003+- [ ] Performance metrics tracked
1180100411811005---
1182100611831183-## Success Metrics
10071007+## What's Next: Integration & Deployment
1184100811851185-### Pre-Launch Checklist
10091009+### Immediate Next Steps
1186101011871187-- [ ] Aggregator service declaration published
11881188-- [ ] DID created and configured (did:web:kagi-news.coves.social)
11891189-- [ ] RSS feed parser handles all Kagi HTML structures
11901190-- [ ] Deduplication prevents duplicate posts
11911191-- [ ] Category mapping works for all configs
11921192-- [ ] All 3 post formats render correctly
11931193-- [ ] Attribution to Kagi News visible on all posts
11941194-- [ ] Rate limiting prevents spam
11951195-- [ ] Monitoring/alerting configured
11961196-- [ ] E2E tests passing against test instance
10111011+1. **Create Aggregator Identity**
10121012+ - Generate DID for aggregator
10131013+ - Store credentials securely
10141014+ - Test authentication with Coves API
1197101511981198----
10161016+2. **Integration Testing**
10171017+ - Test with live Coves API
10181018+ - Verify post creation works
10191019+ - Validate rich text rendering
10201020+ - Check image display from Kagi proxy
1199102112001200-### Alpha Goals (First Week)
10221022+3. **Docker Deployment**
10231023+ - Build Docker image
10241024+ - Test docker-compose setup
10251025+ - Verify CRON scheduling
10261026+ - Set up monitoring/logging
1201102712021202-- [ ] 3+ communities using Kagi News aggregator
12031203-- [ ] 50+ posts created successfully
12041204-- [ ] Zero duplicate posts
12051205-- [ ] < 5% parse errors
12061206-- [ ] < 1% post creation failures
12071207-- [ ] Stories posted within 15 minutes of RSS publication
10281028+4. **Community Authorization**
10291029+ - Get aggregator authorized in test community
10301030+ - Verify authorization flow works
10311031+ - Test posting to real community
1208103212091209----
10331033+5. **Production Deployment**
10341034+ - Deploy to production server
10351035+ - Configure feeds for real communities
10361036+ - Monitor first batch of posts
10371037+ - Gather community feedback
1210103812111211-### Beta Goals (First Month)
10391039+### Open Questions to Resolve
1212104012131213-- [ ] 10+ communities using aggregator
12141214-- [ ] 500+ posts created
12151215-- [ ] Community feedback positive (surveys)
12161216-- [ ] Attribution compliance verified
12171217-- [ ] No rate limit violations
12181218-- [ ] < 1% error rate (parsing + posting)
10411041+1. **Aggregator DID Creation:**
10421042+ - Need helper script or manual process?
10431043+ - Where to store credentials securely?
1219104412201220----
10451045+2. **Authorization Flow:**
10461046+ - How does community admin authorize aggregator?
10471047+ - UI flow or XRPC endpoint?
1221104812221222-## Future Enhancements
10491049+3. **Image Strategy:**
10501050+ - Confirm Kagi proxy images work reliably
10511051+ - Fallback plan if proxy becomes unreliable?
1223105212241224-### Phase 2 Features
10531053+4. **Monitoring:**
10541054+ - What metrics to track initially?
10551055+ - Alerting strategy for failures?
1225105612261226-**Smart Category Detection:**
12271227-- Use LLM to suggest additional categories for stories
12281228-- Map Kagi categories to community tags automatically
10571057+---
1229105812301230-**Customizable Templates:**
12311231-- Allow communities to customize post format with templates
12321232-- Support Markdown/Handlebars templates in config
10591059+## Future Enhancements (Post-MVP)
1233106012341234-**Story Scoring:**
12351235-- Prioritize high-impact stories (many sources, breaking news)
12361236-- Delay low-priority stories to avoid flooding feed
10611061+### Phase 2
10621062+- Multiple post formats (summary, minimal)
10631063+- Per-community filtering (subcategories, min sources)
10641064+- More sophisticated deduplication
10651065+- Metrics dashboard
1237106612381238-**Cross-posting Prevention:**
12391239-- Detect when multiple communities authorize same category
12401240-- Intelligently cross-post vs. duplicate
10671067+### Phase 3
10681068+- Interactive features (bot responds to comments)
10691069+- Cross-posting prevention
10701070+- Federation support
1241107112421072---
1243107312441244-### Phase 3 Features
10741074+## References
1245107512461246-**Interactive Features:**
12471247-- Bot responds to comments with additional sources
12481248-- Updates megathread with new sources as story develops
12491249-12501250-**Analytics Dashboard:**
12511251-- Show communities which stories get most engagement
12521252-- Trending topics from Kagi News
12531253-- Source diversity metrics
12541254-12551255-**Federation:**
12561256-- Support other Coves instances using same aggregator
12571257-- Shared deduplication across instances
10761076+- Kagi News About: https://news.kagi.com/about
10771077+- Kagi News RSS: https://news.kagi.com/world.xml
10781078+- CC BY-NC License: https://creativecommons.org/licenses/by-nc/4.0/
10791079+- Parent PRD: [PRD_AGGREGATORS.md](PRD_AGGREGATORS.md)
10801080+- ATProto Python SDK: https://github.com/MarshalX/atproto
10811081+- Implementation: [aggregators/kagi-news/](/aggregators/kagi-news/)
1258108212591083---
1260108412611261-## Open Questions
10851085+## Implementation Summary
1262108612631263-### Need to Resolve Before Launch
10871087+**Phase 1 Status:** ✅ **COMPLETE**
1264108812651265-1. **Image Licensing:**
12661266- - ❓ Are images from Kagi proxy covered by CC BY-NC?
12671267- - ❓ Do we need to attribute original image sources?
12681268- - **Action:** Email support@kagi.com for clarification
10891089+The Kagi News RSS Aggregator implementation is complete and ready for integration testing and deployment. All 7 core components have been implemented with comprehensive test coverage (57 tests, 83% coverage).
1269109012701270-2. **Hotlinking Policy:**
12711271- - ❓ Is embedding Kagi proxy images acceptable?
12721272- - ❓ Should we download and re-host?
12731273- - **Action:** Test in staging, monitor for issues
10911091+**What Was Built:**
10921092+- Complete RSS feed fetching and parsing pipeline
10931093+- HTML parser that extracts all structured data from Kagi News feeds (summary, highlights, perspectives, quote, sources)
10941094+- Rich text formatter with proper facets for Coves
10951095+- State management system for deduplication
10961096+- Configuration management with YAML and environment variables
10971097+- HTTP client for Coves API authentication and post creation
10981098+- Main orchestrator with robust error handling
10991099+- Comprehensive test suite with real feed fixtures
11001100+- Documentation and example configurations
1274110112751275-3. **Category Discovery:**
12761276- - ❓ How to discover all available category feeds?
12771277- - ❓ Are there categories beyond world/tech/business/sports?
12781278- - **Action:** Scrape https://news.kagi.com/ for all .xml links
11021102+**Key Findings:**
11031103+- Kagi News RSS feeds contain only 3 structured sections (Highlights, Perspectives, Sources)
11041104+- Historical context is woven into the summary and highlights, not a separate section
11051105+- Timeline feature visible on Kagi website is not in the RSS feed
11061106+- All essential data for rich posts is available in the feed
11071107+- Feed structure is stable and well-formed
1279110812801280-4. **Attribution Format:**
12811281- - ❓ Is "📰 Story aggregated by Kagi News" sufficient?
12821282- - ❓ Do we need more prominent attribution?
12831283- - **Action:** Review CC BY-NC best practices
11091109+**Next Phase:**
11101110+Integration testing with live Coves API, followed by alpha deployment to test communities.
1284111112851112---
1286111312871287-## References
12881288-12891289-- Kagi News About Page: https://news.kagi.com/about
12901290-- Kagi News RSS Example: https://news.kagi.com/world.xml
12911291-- Kagi Kite Public Repo: https://github.com/kagisearch/kite-public
12921292-- CC BY-NC License: https://creativecommons.org/licenses/by-nc/4.0/
12931293-- Parent PRD: [PRD_AGGREGATORS.md](PRD_AGGREGATORS.md)
12941294-- Aggregator SDK: [TBD]
11141114+**End of PRD - Phase 1 Implementation Complete**