A public feed using rss
for a while I’ve wanted a public feed: subscribe to the rss feeds I’d like to follow, but keep it open so others can see what I read and follow. well, I finally built it.
the idea
the concept was pretty simple: pull rss feeds from sources I follow, normalize the content, and display it on the site. make it automated so I don’t have to think about it. I wanted people to see what I’m reading without having to maintain yet another social media presence, and I also wanted my own public feed that updates automatically and that I can read myself, just an aggregation of everything I subscribe to. yes, rss feed readers exist, but then I couldn’t share the result with others so they can also see what I read.
the setup, and a recipe if you want to copy it
I went with a pretty straightforward approach:
- python script to fetch and process feeds
- hugo template to display the content with pagination
- github actions to keep everything updated automatically
Is the code any good? nah. but it works and it’s simple to maintain. and that’s a little fun, when you don’t really care and just want it to work: throwing code at the wall until it sticks, but with some knowledge and precision. I also had claude code review it afterwards; it added some comments and small tweaks that I didn’t bother cleaning up, since everything worked. the most important part was getting the github actions logic right so it didn’t keep re-triggering itself, running in loops and doing weird stuff.
feed configuration
first, I needed a way to configure which feeds to follow. went with a simple toml file:
[[feeds]]
name = "GitHub Blog"
url = "https://github.blog/feed/"

[[feeds]]
name = "Go Blog"
url = "https://go.dev/blog/feed/"
keeps it simple and easy to add new feeds later. why toml? because it’s simple to edit and I remembered that it exists. I also write this in neovim, and for some reason my setup has better support for toml than for other formats, and I haven’t bothered configuring it properly.
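for what it’s worth, this is roughly what that config looks like once python loads it: each [[feeds]] table becomes a dict in a list under the feeds key, which is why the script below can just iterate over config['feeds']. a tiny sketch, assuming feeds.toml sits in the working directory:

import toml

# load the feed config; each [[feeds]] table becomes one dict in the list
config = toml.load("feeds.toml")
print(config["feeds"])
# [{'name': 'GitHub Blog', 'url': 'https://github.blog/feed/'},
#  {'name': 'Go Blog', 'url': 'https://go.dev/blog/feed/'}]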
the python aggregator
the main script does the heavy lifting: fetches feeds, normalizes content, extracts images, and outputs json for hugo to consume. the images were the fiddly part. it turns out many rss feeds embed images in their content, so I used beautifulsoup to pull the first image out of each post:
#!/usr/bin/env python3
"""
RSS Feed Aggregator for Hugo
Fetches RSS feeds, normalizes content, and outputs JSON for Hugo data.
"""

import json
import hashlib
import re
import toml
import requests
import feedparser
from datetime import datetime
from pathlib import Path
from html import unescape
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup

def load_feeds_config(config_path="feeds.toml"):
    """Load feed configuration from TOML file."""
    with open(config_path, 'r') as f:
        config = toml.load(f)
    return config['feeds']

def clean_html(text):
    """Remove HTML tags and decode entities."""
    if not text:
        return ""

    # Remove HTML tags
    text = re.sub(r'<[^>]+>', '', text)
    # Decode HTML entities
    text = unescape(text)
    # Collapse whitespace
    text = re.sub(r'\s+', ' ', text)
    # Trim
    text = text.strip()
    return text

def truncate_summary(text, max_length=300):
    """Truncate text to max_length on word boundary."""
    if not text or len(text) <= max_length:
        return text

    # Find last space before max_length
    truncated = text[:max_length]
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return truncated[:last_space] + "..."
    return truncated + "..."

def generate_id(entry):
    """Generate stable SHA-1 ID from entry content."""
    # Try guid/id first, then link, then title
    identity_source = ""

    if hasattr(entry, 'id') and entry.id:
        identity_source = entry.id
    elif hasattr(entry, 'guid') and entry.guid:
        identity_source = entry.guid
    elif hasattr(entry, 'link') and entry.link:
        identity_source = entry.link
    elif hasattr(entry, 'title') and entry.title:
        identity_source = entry.title

    return hashlib.sha1(identity_source.encode('utf-8')).hexdigest()

def parse_date(entry):
    """Extract and normalize published date."""
    date_str = None

    # Try published first, then updated
    if hasattr(entry, 'published_parsed') and entry.published_parsed:
        date_str = datetime(*entry.published_parsed[:6]).isoformat() + 'Z'
    elif hasattr(entry, 'updated_parsed') and entry.updated_parsed:
        date_str = datetime(*entry.updated_parsed[:6]).isoformat() + 'Z'

    return date_str

def extract_image(entry, feed_url):
    """Extract the first image from RSS entry content."""
    image_url = None

    # Look for images in content:encoded first, then description
    content = ""
    if hasattr(entry, 'content') and entry.content:
        content = entry.content[0].value
    elif hasattr(entry, 'description'):
        content = entry.description

    if content:
        try:
            soup = BeautifulSoup(content, 'html.parser')
            img_tag = soup.find('img')
            if img_tag and img_tag.get('src'):
                src = img_tag.get('src')
                # Convert relative URLs to absolute
                if src.startswith('//'):
                    src = 'https:' + src
                elif src.startswith('/'):
                    base_url = '/'.join(feed_url.split('/')[:3])
                    src = base_url + src
                elif not src.startswith('http'):
                    src = urljoin(feed_url, src)
                image_url = src
        except Exception:
            # If HTML parsing fails, continue without image
            pass

    return image_url

def fetch_feed(feed_config):
    """Fetch and parse a single RSS feed."""
    try:
        print(f"Fetching {feed_config['name']} from {feed_config['url']}")

        # Fetch with user agent
        headers = {'User-Agent': 'RSS Aggregator/1.0'}
        response = requests.get(feed_config['url'], headers=headers, timeout=30)
        response.raise_for_status()

        # Parse feed
        feed = feedparser.parse(response.content)

        if feed.bozo:
            print(f"Warning: Feed {feed_config['name']} has parsing issues")

        items = []
        for entry in feed.entries:
            # Generate stable ID
            item_id = generate_id(entry)

            # Extract title
            title = clean_html(getattr(entry, 'title', '')).strip()
            if not title:
                title = "(untitled)"

            # Extract link
            link = getattr(entry, 'link', '')
            if not link:
                continue  # Skip items without links

            # Extract and clean summary
            summary = ""
            if hasattr(entry, 'summary'):
                summary = truncate_summary(clean_html(entry.summary))
            elif hasattr(entry, 'description'):
                summary = truncate_summary(clean_html(entry.description))

            # Parse date
            published = parse_date(entry)

            # Extract image
            image = extract_image(entry, feed_config['url'])

            item = {
                'id': item_id,
                'title': title,
                'link': link,
                'source': feed_config['name']
            }

            if summary:
                item['summary'] = summary
            if published:
                item['published'] = published
            if image:
                item['image'] = image

            items.append(item)

        print(f"Fetched {len(items)} items from {feed_config['name']}")
        return items

    except Exception as e:
        print(f"Error fetching {feed_config['name']}: {e}")
        return []

def main():
    """Main function."""
    print("Starting RSS feed aggregation...")

    # Load feeds configuration
    feeds = load_feeds_config()

    # Fetch all feeds
    all_items = []
    seen_ids = set()

    for feed_config in feeds:
        items = fetch_feed(feed_config)

        # Deduplicate
        for item in items:
            if item['id'] not in seen_ids:
                all_items.append(item)
                seen_ids.add(item['id'])

    # Sort by published date (newest first), undated items last
    def sort_key(item):
        if 'published' in item:
            return (1, item['published'])
        # Undated items get the lowest key so they end up last after reverse=True
        return (0, '')

    all_items.sort(key=sort_key, reverse=True)

    # Limit to 500 items (remember the original count for the log line)
    total_fetched = len(all_items)
    if total_fetched > 500:
        all_items = all_items[:500]
        print(f"Limited to 500 items (had {total_fetched})")

    # Create output data
    output_data = {
        'generated_at': datetime.utcnow().isoformat() + 'Z',
        'items': all_items
    }

    # Ensure data directory exists
    data_dir = Path('data')
    data_dir.mkdir(exist_ok=True)

    # Write JSON output
    output_path = data_dir / 'feeds.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(output_data, f, indent=2, ensure_ascii=False)

    print(f"Generated {len(all_items)} items in {output_path}")
    print("RSS feed aggregation complete!")

if __name__ == '__main__':
    main()
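for reference, this is roughly the shape of the data/feeds.json file the script writes, shown here by loading it back in python. the field names come from the script above; the values in the comments are just placeholders:

import json
from pathlib import Path

# inspect the generated file (assumes the aggregator has already run)
data = json.loads(Path("data/feeds.json").read_text(encoding="utf-8"))

print(data["generated_at"])  # e.g. "2025-01-01T06:00:00Z"
print(len(data["items"]))    # capped at 500
print(data["items"][0])
# {'id': '<sha1 hex>', 'title': '...', 'link': 'https://...', 'source': 'GitHub Blog',
#  'summary': '...', 'published': '2025-01-01T00:00:00Z', 'image': 'https://...'}
# summary, published and image are only present when the feed provided them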
hugo template with pagination
for the frontend, I built a hugo template that uses javascript for client-side pagination. keeps things snappy since all the data is already there:
document.addEventListener('DOMContentLoaded', function() {
  console.log('DOM loaded, feedData exists:', !!window.feedData);

  if (!window.feedData) {
    console.error('No feedData found!');
    return;
  }

  // Parse JSON string if needed
  let feedItems = window.feedData;
  if (typeof feedItems === 'string') {
    try {
      feedItems = JSON.parse(feedItems);
    } catch (e) {
      console.error('Failed to parse feedData JSON:', e);
      return;
    }
  }

  console.log('Parsed feedItems:', feedItems);
  console.log('feedItems is array:', Array.isArray(feedItems));

  const pageSize = 10;
  const totalItems = feedItems.length;
  const totalPages = Math.ceil(totalItems / pageSize);

  console.log('pageSize:', pageSize, 'totalItems:', totalItems, 'totalPages:', totalPages);

  // Get current page from URL
  const urlParams = new URLSearchParams(window.location.search);
  let currentPage = parseInt(urlParams.get('page')) || 1;
  if (currentPage < 1) currentPage = 1;
  if (currentPage > totalPages) currentPage = totalPages;

  function renderItems(page) {
    const startIndex = (page - 1) * pageSize;
    const endIndex = Math.min(startIndex + pageSize, totalItems);
    const pageItems = feedItems.slice(startIndex, endIndex);

    const itemsContainer = document.getElementById('feed-items');
    itemsContainer.innerHTML = '';

    pageItems.forEach(item => {
      const article = document.createElement('article');
      article.className = 'feed-item';

      let metaHtml = '';
      if (item.published) {
        const date = new Date(item.published).toISOString().split('T')[0];
        metaHtml += `<time>${date}</time>`;
      }
      if (item.source) {
        metaHtml += `<span class="source">${item.source}</span>`;
      }

      article.innerHTML = `
        <div class="feed-item-content">
          ${item.image ? `<div class="feed-item-image">
            <img src="${item.image}" alt="${item.title}" loading="lazy">
          </div>` : ''}
          <div class="feed-item-text">
            <h2><a href="${item.link}" target="_blank" rel="noopener">${item.title}</a></h2>
            <div class="feed-item-meta">${metaHtml}</div>
            ${item.summary ? `<p class="summary">${item.summary}</p>` : ''}
          </div>
        </div>
      `;

      itemsContainer.appendChild(article);
    });

    // Update meta text
    const metaText = document.getElementById('feed-meta-text');
    const generatedDate = new Date(window.feedGeneratedAt).toLocaleString();
    metaText.textContent = `Showing ${startIndex + 1}-${endIndex} of ${totalItems} items • Last updated: ${generatedDate}`;

    // Update pagination
    updatePagination(page);
  }

  function updatePagination(page) {
    const pagination = document.getElementById('pagination');
    const prevBtn = document.getElementById('prev-btn');
    const nextBtn = document.getElementById('next-btn');
    const pageInfo = document.getElementById('page-info');

    if (totalPages <= 1) {
      pagination.style.display = 'none';
      return;
    }

    pagination.style.display = 'flex';
    pageInfo.textContent = `Page ${page} of ${totalPages}`;

    // Previous button
    if (page <= 1) {
      prevBtn.style.opacity = '0.5';
      prevBtn.style.pointerEvents = 'none';
    } else {
      prevBtn.style.opacity = '1';
      prevBtn.style.pointerEvents = 'auto';
      prevBtn.onclick = () => goToPage(page - 1);
    }

    // Next button
    if (page >= totalPages) {
      nextBtn.style.opacity = '0.5';
      nextBtn.style.pointerEvents = 'none';
    } else {
      nextBtn.style.opacity = '1';
      nextBtn.style.pointerEvents = 'auto';
      nextBtn.onclick = () => goToPage(page + 1);
    }
  }

  function goToPage(page) {
    const url = new URL(window.location);
    if (page === 1) {
      url.searchParams.delete('page');
    } else {
      url.searchParams.set('page', page);
    }
    window.history.pushState({}, '', url);
    currentPage = page;
    renderItems(page);
  }

  // Handle browser back/forward
  window.addEventListener('popstate', function() {
    const urlParams = new URLSearchParams(window.location.search);
    const page = parseInt(urlParams.get('page')) || 1;
    currentPage = page;
    renderItems(page);
  });

  // Initial render
  renderItems(currentPage);
});
the layout shows 10 items per page with prev/next navigation and keeps the page number in the url, so you can bookmark specific pages. the hugo template itself mostly just provides the markup and injects the aggregated data as window.feedData (plus the timestamp as window.feedGeneratedAt), which is what the script above reads.
github actions automation
to keep everything updated, I set up a github action that runs the feed script daily, on pushes that touch the feed config or the script, and on manual triggers:
name: Update RSS Feeds

on:
  schedule:
    # Run daily at 6 AM UTC
    - cron: '0 6 * * *'

  # Run when feeds config is updated
  push:
    paths:
      - 'feeds.toml'
      - 'scripts/fetch-feeds.py'
      - '.github/workflows/update-feeds.yml'

  # Allow manual triggering
  workflow_dispatch:

# Permissions needed to commit changes
permissions:
  contents: write

jobs:
  update-feeds:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install Python dependencies
        run: pip install requests feedparser toml beautifulsoup4

      - name: Fetch RSS feeds
        run: python scripts/fetch-feeds.py

      - name: Check for changes
        id: changes
        run: |
          if git diff --quiet data/feeds.json; then
            echo "changed=false" >> $GITHUB_OUTPUT
          else
            echo "changed=true" >> $GITHUB_OUTPUT
          fi

      - name: Commit and push changes
        if: steps.changes.outputs.changed == 'true'
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add data/feeds.json
          git commit -m "Update RSS feeds - $(date '+%Y-%m-%d %H:%M UTC')"
          git push
the key thing here is the paths filter on the push trigger: the workflow only re-runs when feeds.toml, the script, or the workflow itself changes, not when the action commits data/feeds.json. combined with the “check for changes” step that skips the commit when nothing changed, that’s what prevents it from triggering itself in an endless loop.
So what
the technical bits were straightforward. python for data processing, hugo for static generation, github actions for automation. but the result feels pretty satisfying.
one thing worth noting: while I call it an “rss feed aggregator”, it’s really just fetching xml documents over standard http/https with plain GET requests. rss doesn’t have its own transport protocol; it’s just xml served over the web like any other resource, and the feedparser library handles parsing that xml into python objects I can work with (there’s a small sketch of this below). the feed data also just gets overwritten and reset every time it’s pulled, and I think that’s fine. it keeps the json file small and I don’t need a huge archive of links.
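to make the “it’s just xml over http” point concrete, here’s a minimal sketch using the github blog feed from the config above; the comments describe what I’d expect to see, not output captured from a specific run:

import requests
import feedparser

# an rss feed is just an xml document served over plain http(s)
resp = requests.get("https://github.blog/feed/",
                    headers={"User-Agent": "RSS Aggregator/1.0"},
                    timeout=30)
print(resp.text[:60])  # usually starts with the <?xml version="1.0" ...?> prologue

# feedparser turns that xml into plain python objects
feed = feedparser.parse(resp.content)
print(feed.feed.title)
for entry in feed.entries[:3]:
    print(entry.title, entry.link)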
you can check out the feed page to see it. and since everything is automated, it stays current without me having to think about it.