Add article extractor

Sridhar Ratnakumar 2025-11-10 13:34:38 -05:00
parent 7fa46d3969
commit 4a3f7812fb
3 changed files with 42 additions and 3 deletions

@@ -34,9 +34,9 @@ in
skillDirs;
home.packages = [
# Used in ./memory.md
# Used in skills
# TODO: Encapsulate
pkgs.python313Packages.markitdown
pkgs.reader
];
programs.claude-code = {
enable = true;

@@ -8,5 +8,5 @@
# Tools
- **gh**: If `gh` is unavailable, run it from nixpkgs: `nix run nixpkgs#gh`.
- **markitdown**: To convert a web page to Markdown, run `curl -k URL | markitdown` (markitdown is already installed).
- **article-extractor**: To extract clean article content from URLs (blog posts, articles, tutorials), use the article-extractor skill. It strips ads, navigation, and other clutter and saves the readable text to a file.

@@ -0,0 +1,39 @@
---
name: article-extractor
description: Extract clean article content from URLs using reader. Use when the user wants to download, extract, or save an article from a URL.
allowed-tools:
- Bash
- Write
---

# Article Extractor

Extracts clean article content from URLs using `reader`, a command-line tool based on Mozilla's Readability algorithm.

## When to Use

Use this skill when the user wants to:
- Extract an article from a URL
- Download the content of a blog post
- Save an article as plain text

## Workflow

```bash
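# Extract the article into a temporary file; stop if reader fails
TMP=$(mktemp)
if ! reader "$URL" > "$TMP"; then
  echo "✗ reader failed for: $URL" >&2
  rm -f "$TMP"
else
  # Get the title from the first line, dropping a leading Markdown "# " if present
  TITLE=$(head -n 1 "$TMP" | sed 's/^# //')
  # Clean filename: replace characters that are unsafe in filenames, cap at
  # 80 characters, trim trailing spaces, and fall back to "article" if empty
  FILENAME=$(printf '%s' "$TITLE" | tr '/:?"<>|\\' '-' | cut -c 1-80 | sed 's/ *$//')
  FILENAME="${FILENAME:-article}.txt"
  # Save
  mv "$TMP" "$FILENAME"
  # Show preview
  echo "✓ Saved: $FILENAME"
  head -n 10 "$FILENAME"
fi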
```
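
The title-to-filename steps can be checked on their own, without network access or `reader`. Below is a minimal sketch using a hypothetical helper, `title_to_filename`, that chains the same `sed`/`tr`/`cut`/`sed` steps as the workflow. The exact character set (including backslash) and the `article.txt` fallback for an empty title are choices made for this sketch, not requirements of `reader`.

```bash
# Hypothetical helper (not part of reader): turn a title line into a safe .txt filename.
title_to_filename() {
  local name
  # Drop a leading Markdown "# ", map unsafe characters to "-",
  # keep at most 80 characters, and trim trailing spaces
  name=$(printf '%s' "$1" | sed 's/^# //' | tr '/:?"<>|\\' '-' | cut -c 1-80 | sed 's/ *$//')
  # Fall back to "article" when nothing usable is left
  printf '%s.txt\n' "${name:-article}"
}

title_to_filename "# Rust: A Tour / Part 1?"   # prints: Rust- A Tour - Part 1-.txt
```

Note that `cut -c` counts bytes in GNU coreutils, so a very long non-ASCII title may be truncated in the middle of a multibyte character.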