A public feed using rss
for a while I’ve wanted a public feed: subscribe to the rss feeds I’d like to follow, but keep it open so others can see what I read and follow. well, I finally built it.
the idea
the concept was pretty simple: pull rss feeds from sources I follow, normalize the content, and display it on the site. make it automated so I don’t have to think about it. I wanted people to see what I’m reading without having to maintain yet another social media presence, and I also wanted my own public feed that updates automatically and that I can read myself, just an aggregation of everything I subscribe to. yes, rss feed readers exist, but then I couldn’t share the result with others so they can also see what I read.
the setup, and a recipe if you want to copy it
I went with a pretty straightforward approach:
- python script to fetch and process feeds
- hugo template to display the content with pagination
- github actions to keep everything updated automatically
Is the code any good? nah. but it works and it’s simple to maintain. and that’s a little fun, when you don’t really care and just want it to work: throwing code at the wall until it sticks, but with some knowledge and precision. I also had claude code review it afterwards; it added some comments and small tweaks that I didn’t bother cleaning up, since everything worked. the most important part was getting the github actions logic right so it didn’t keep re-triggering itself, running in loops and doing weird stuff.
feed configuration
first, I needed a way to configure which feeds to follow. went with a simple toml file:
[[feeds]]
name = "GitHub Blog"
url = "https://github.blog/feed/"

[[feeds]]
name = "Go Blog"
url = "https://go.dev/blog/feed/"
keeps it simple and easy to add new feeds later. why toml? because it’s simple to edit and I remembered that it exists. I also write this in neovim, and for some reason my setup has better support for toml than for other formats, and I haven’t bothered configuring it properly.
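for what it’s worth, this is roughly what that config looks like once python loads it: each [[feeds]] table becomes a dict in a list under the feeds key, which is why the script below can just iterate over config['feeds']. a tiny sketch, assuming feeds.toml sits in the working directory:

import toml

# load the feed config; each [[feeds]] table becomes one dict in the list
config = toml.load("feeds.toml")
print(config["feeds"])
# [{'name': 'GitHub Blog', 'url': 'https://github.blog/feed/'},
#  {'name': 'Go Blog', 'url': 'https://go.dev/blog/feed/'}]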
the python aggregator
the main script does the heavy lifting: fetches feeds, normalizes content, extracts images, and outputs json for hugo to consume. the images were the fiddly part. it turns out many rss feeds embed images in their content, so I used beautifulsoup to pull the first image out of each post:
#!/usr/bin/env python3
"""
RSS Feed Aggregator for Hugo
Fetches RSS feeds, normalizes content, and outputs JSON for Hugo data.
"""

import json
import hashlib
import re
import toml
import requests
import feedparser
from datetime import datetime
from pathlib import Path
from html import unescape
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup

def load_feeds_config(config_path="feeds.toml"):
    """Load feed configuration from TOML file."""
    with open(config_path, 'r') as f:
        config = toml.load(f)
    return config['feeds']

def clean_html(text):
    """Remove HTML tags and decode entities."""
    if not text:
        return ""

    # Remove HTML tags
    text = re.sub(r'<[^>]+>', '', text)
    # Decode HTML entities
    text = unescape(text)
    # Collapse whitespace
    text = re.sub(r'\s+', ' ', text)
    # Trim
    text = text.strip()
    return text

def truncate_summary(text, max_length=300):
    """Truncate text to max_length on word boundary."""
    if not text or len(text) <= max_length:
        return text

    # Find last space before max_length
    truncated = text[:max_length]
    last_space = truncated.rfind(' ')
    if last_space > 0:
        return truncated[:last_space] + "..."
    return truncated + "..."

def generate_id(entry):
    """Generate stable SHA-1 ID from entry content."""
    # Try guid/id first, then link, then title
    identity_source = ""

    if hasattr(entry, 'id') and entry.id:
        identity_source = entry.id
    elif hasattr(entry, 'guid') and entry.guid:
        identity_source = entry.guid
    elif hasattr(entry, 'link') and entry.link:
        identity_source = entry.link
    elif hasattr(entry, 'title') and entry.title:
        identity_source = entry.title

    return hashlib.sha1(identity_source.encode('utf-8')).hexdigest()

def parse_date(entry):
    """Extract and normalize published date."""
    date_str = None

    # Try published first, then updated
    if hasattr(entry, 'published_parsed') and entry.published_parsed:
        date_str = datetime(*entry.published_parsed[:6]).isoformat() + 'Z'
    elif hasattr(entry, 'updated_parsed') and entry.updated_parsed:
        date_str = datetime(*entry.updated_parsed[:6]).isoformat() + 'Z'

    return date_str

def extract_image(entry, feed_url):
    """Extract the first image from RSS entry content."""
    image_url = None

    # Look for images in content:encoded first, then description
    content = ""
    if hasattr(entry, 'content') and entry.content:
        content = entry.content[0].value
    elif hasattr(entry, 'description'):
        content = entry.description

    if content:
        try:
            soup = BeautifulSoup(content, 'html.parser')
            img_tag = soup.find('img')
            if img_tag and img_tag.get('src'):
                src = img_tag.get('src')
                # Convert relative URLs to absolute
                if src.startswith('//'):
                    src = 'https:' + src
                elif src.startswith('/'):
                    base_url = '/'.join(feed_url.split('/')[:3])
                    src = base_url + src
                elif not src.startswith('http'):
                    src = urljoin(feed_url, src)
                image_url = src
        except Exception:
            # If HTML parsing fails, continue without image
            pass

    return image_url

def fetch_feed(feed_config):
    """Fetch and parse a single RSS feed."""
    try:
        print(f"Fetching {feed_config['name']} from {feed_config['url']}")

        # Fetch with user agent
        headers = {'User-Agent': 'RSS Aggregator/1.0'}
        response = requests.get(feed_config['url'], headers=headers, timeout=30)
        response.raise_for_status()

        # Parse feed
        feed = feedparser.parse(response.content)

        if feed.bozo:
            print(f"Warning: Feed {feed_config['name']} has parsing issues")

        items = []
        for entry in feed.entries:
            # Generate stable ID
            item_id = generate_id(entry)

            # Extract title
            title = clean_html(getattr(entry, 'title', '')).strip()
            if not title:
                title = "(untitled)"

            # Extract link
            link = getattr(entry, 'link', '')
            if not link:
                continue  # Skip items without links

            # Extract and clean summary
            summary = ""
            if hasattr(entry, 'summary'):
                summary = truncate_summary(clean_html(entry.summary))
            elif hasattr(entry, 'description'):
                summary = truncate_summary(clean_html(entry.description))

            # Parse date
            published = parse_date(entry)

            # Extract image
            image = extract_image(entry, feed_config['url'])

            item = {
                'id': item_id,
                'title': title,
                'link': link,
                'source': feed_config['name']
            }

            if summary:
                item['summary'] = summary
            if published:
                item['published'] = published
            if image:
                item['image'] = image

            items.append(item)

        print(f"Fetched {len(items)} items from {feed_config['name']}")
        return items

    except Exception as e:
        print(f"Error fetching {feed_config['name']}: {e}")
        return []

def main():
    """Main function."""
    print("Starting RSS feed aggregation...")

    # Load feeds configuration
    feeds = load_feeds_config()

    # Fetch all feeds
    all_items = []
    seen_ids = set()

    for feed_config in feeds:
        items = fetch_feed(feed_config)

        # Deduplicate
        for item in items:
            if item['id'] not in seen_ids:
                all_items.append(item)
                seen_ids.add(item['id'])

    # Sort by published date (newest first), undated items last
    def sort_key(item):
        if 'published' in item:
            return (1, item['published'])
        # Undated items get the lowest key so they end up last after reverse=True
        return (0, '')

    all_items.sort(key=sort_key, reverse=True)

    # Limit to 500 items (remember the original count for the log line)
    total_fetched = len(all_items)
    if total_fetched > 500:
        all_items = all_items[:500]
        print(f"Limited to 500 items (had {total_fetched})")

    # Create output data
    output_data = {
        'generated_at': datetime.utcnow().isoformat() + 'Z',
        'items': all_items
    }

    # Ensure data directory exists
    data_dir = Path('data')
    data_dir.mkdir(exist_ok=True)

    # Write JSON output
    output_path = data_dir / 'feeds.json'
    with open(output_path, 'w', encoding='utf-8') as f:
        json.dump(output_data, f, indent=2, ensure_ascii=False)

    print(f"Generated {len(all_items)} items in {output_path}")
    print("RSS feed aggregation complete!")

if __name__ == '__main__':
    main()
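for reference, this is roughly the shape of the data/feeds.json file the script writes, shown here by loading it back in python. the field names come from the script above; the values in the comments are just placeholders:

import json
from pathlib import Path

# inspect the generated file (assumes the aggregator has already run)
data = json.loads(Path("data/feeds.json").read_text(encoding="utf-8"))

print(data["generated_at"])  # e.g. "2025-01-01T06:00:00Z"
print(len(data["items"]))    # capped at 500
print(data["items"][0])
# {'id': '<sha1 hex>', 'title': '...', 'link': 'https://...', 'source': 'GitHub Blog',
#  'summary': '...', 'published': '2025-01-01T00:00:00Z', 'image': 'https://...'}
# summary, published and image are only present when the feed provided them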
hugo template with pagination
for the frontend, I built a hugo template that uses javascript for client-side pagination. keeps things snappy since all the data is already there:
document.addEventListener('DOMContentLoaded', function() {
  console.log('DOM loaded, feedData exists:', !!window.feedData);

  if (!window.feedData) {
    console.error('No feedData found!');
    return;
  }

  // Parse JSON string if needed
  let feedItems = window.feedData;
  if (typeof feedItems === 'string') {
    try {
      feedItems = JSON.parse(feedItems);
    } catch (e) {
      console.error('Failed to parse feedData JSON:', e);
      return;
    }
  }

  console.log('Parsed feedItems:', feedItems);
  console.log('feedItems is array:', Array.isArray(feedItems));

  const pageSize = 10;
  const totalItems = feedItems.length;
  const totalPages = Math.ceil(totalItems / pageSize);

  console.log('pageSize:', pageSize, 'totalItems:', totalItems, 'totalPages:', totalPages);

  // Get current page from URL
  const urlParams = new URLSearchParams(window.location.search);
  let currentPage = parseInt(urlParams.get('page')) || 1;
  if (currentPage < 1) currentPage = 1;
  if (currentPage > totalPages) currentPage = totalPages;

  function renderItems(page) {
    const startIndex = (page - 1) * pageSize;
    const endIndex = Math.min(startIndex + pageSize, totalItems);
    const pageItems = feedItems.slice(startIndex, endIndex);

    const itemsContainer = document.getElementById('feed-items');
    itemsContainer.innerHTML = '';

    pageItems.forEach(item => {
      const article = document.createElement('article');
      article.className = 'feed-item';

      let metaHtml = '';
      if (item.published) {
        const date = new Date(item.published).toISOString().split('T')[0];
        metaHtml += `<time>${date}</time>`;
      }
      if (item.source) {
        metaHtml += `<span class="source">${item.source}</span>`;
      }

      article.innerHTML = `
        <div class="feed-item-content">
          ${item.image ? `<div class="feed-item-image">
            <img src="${item.image}" alt="${item.title}" loading="lazy">
          </div>` : ''}
          <div class="feed-item-text">
            <h2><a href="${item.link}" target="_blank" rel="noopener">${item.title}</a></h2>
            <div class="feed-item-meta">${metaHtml}</div>
            ${item.summary ? `<p class="summary">${item.summary}</p>` : ''}
          </div>
        </div>
      `;

      itemsContainer.appendChild(article);
    });

    // Update meta text
    const metaText = document.getElementById('feed-meta-text');
    const generatedDate = new Date(window.feedGeneratedAt).toLocaleString();
    metaText.textContent = `Showing ${startIndex + 1}-${endIndex} of ${totalItems} items • Last updated: ${generatedDate}`;

    // Update pagination
    updatePagination(page);
  }

  function updatePagination(page) {
    const pagination = document.getElementById('pagination');
    const prevBtn = document.getElementById('prev-btn');
    const nextBtn = document.getElementById('next-btn');
    const pageInfo = document.getElementById('page-info');

    if (totalPages <= 1) {
      pagination.style.display = 'none';
      return;
    }

    pagination.style.display = 'flex';
    pageInfo.textContent = `Page ${page} of ${totalPages}`;

    // Previous button
    if (page <= 1) {
      prevBtn.style.opacity = '0.5';
      prevBtn.style.pointerEvents = 'none';
    } else {
      prevBtn.style.opacity = '1';
      prevBtn.style.pointerEvents = 'auto';
      prevBtn.onclick = () => goToPage(page - 1);
    }

    // Next button
    if (page >= totalPages) {
      nextBtn.style.opacity = '0.5';
      nextBtn.style.pointerEvents = 'none';
    } else {
      nextBtn.style.opacity = '1';
      nextBtn.style.pointerEvents = 'auto';
      nextBtn.onclick = () => goToPage(page + 1);
    }
  }

  function goToPage(page) {
    const url = new URL(window.location);
    if (page === 1) {
      url.searchParams.delete('page');
    } else {
      url.searchParams.set('page', page);
    }
    window.history.pushState({}, '', url);
    currentPage = page;
    renderItems(page);
  }

  // Handle browser back/forward
  window.addEventListener('popstate', function() {
    const urlParams = new URLSearchParams(window.location.search);
    const page = parseInt(urlParams.get('page')) || 1;
    currentPage = page;
    renderItems(page);
  });

  // Initial render
  renderItems(currentPage);
});
the layout shows 10 items per page with prev/next navigation and keeps the page number in the url, so you can bookmark specific pages. the hugo template itself mostly just provides the markup and injects the aggregated data as window.feedData (plus the timestamp as window.feedGeneratedAt), which is what the script above reads.
github actions automation
to keep everything updated, I set up a github action that runs the feed script daily, on pushes that touch the feed config or the script, and on manual triggers:
name: Update RSS Feeds

on:
  schedule:
    # Run daily at 6 AM UTC
    - cron: '0 6 * * *'

  # Run when feeds config is updated
  push:
    paths:
      - 'feeds.toml'
      - 'scripts/fetch-feeds.py'
      - '.github/workflows/update-feeds.yml'

  # Allow manual triggering
  workflow_dispatch:

# Permissions needed to commit changes
permissions:
  contents: write

jobs:
  update-feeds:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install Python dependencies
        run: pip install requests feedparser toml beautifulsoup4

      - name: Fetch RSS feeds
        run: python scripts/fetch-feeds.py

      - name: Check for changes
        id: changes
        run: |
          if git diff --quiet data/feeds.json; then
            echo "changed=false" >> $GITHUB_OUTPUT
          else
            echo "changed=true" >> $GITHUB_OUTPUT
          fi

      - name: Commit and push changes
        if: steps.changes.outputs.changed == 'true'
        run: |
          git config --local user.email "action@github.com"
          git config --local user.name "GitHub Action"
          git add data/feeds.json
          git commit -m "Update RSS feeds - $(date '+%Y-%m-%d %H:%M UTC')"
          git push
the key thing here is the paths filter on the push trigger: the workflow only re-runs when feeds.toml, the script, or the workflow itself changes, not when the action commits data/feeds.json. combined with the “check for changes” step that skips the commit when nothing changed, that’s what prevents it from triggering itself in an endless loop.
So what
the technical bits were straightforward. python for data processing, hugo for static generation, github actions for automation. but the result feels pretty satisfying.
one thing worth noting: while I call it an “rss feed aggregator”, it’s really just fetching xml documents over standard http/https with plain GET requests. rss doesn’t have its own transport protocol; it’s just xml served over the web like any other resource, and the feedparser library handles parsing that xml into python objects I can work with (there’s a small sketch of this below). the feed data also just gets overwritten and reset every time it’s pulled, and I think that’s fine. it keeps the json file small and I don’t need a huge archive of links.
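to make the “it’s just xml over http” point concrete, here’s a minimal sketch using the github blog feed from the config above; the comments describe what I’d expect to see, not output captured from a specific run:

import requests
import feedparser

# an rss feed is just an xml document served over plain http(s)
resp = requests.get("https://github.blog/feed/",
                    headers={"User-Agent": "RSS Aggregator/1.0"},
                    timeout=30)
print(resp.text[:60])  # usually starts with the <?xml version="1.0" ...?> prologue

# feedparser turns that xml into plain python objects
feed = feedparser.parse(resp.content)
print(feed.feed.title)
for entry in feed.entries[:3]:
    print(entry.title, entry.link)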
you can check out the feed page to see it. and since everything is automated, it stays current without me having to think about it.