Add article extractor

2026-02-23 05:15:19 +08:00 · 2025-11-10 13:34:38 -05:00 · 2025-11-10 13:34:38 -05:00 · 4a3f7812fb
commit 4a3f7812fb
parent 7fa46d3969
3 changed files with 42 additions and 3 deletions
--- a/modules/home/claude-code/default.nix
+++ b/modules/home/claude-code/default.nix
@ -34,9 +34,9 @@ in
    skillDirs;

  home.packages = [
-    # Used in ./memory.md
+    # Used in skills
    # TODO: Encapsulate
-    pkgs.python313Packages.markitdown
+    pkgs.reader
  ];
  programs.claude-code = {
    enable = true;
--- a/modules/home/claude-code/memory.md
+++ b/modules/home/claude-code/memory.md
@ -8,5 +8,5 @@
 # Tools

 - **gh**: If `gh` is unavailable, get it from nixpkgs, viz.: `nix run nixpkgs#gh`.
- **markitdown**: To convert web URLs to plain text, run `curl -k URL | markitdown` (markitdown is already installed)
+- **article-extractor**: To extract clean article content from URLs (blog posts, articles, tutorials), use the article-extractor skill. It removes ads, navigation, and clutter and saves readable text.

--- a/modules/home/claude-code/skills/article-extractor/SKILL.md
+++ b/modules/home/claude-code/skills/article-extractor/SKILL.md
@ -0,0 +1,39 @@
+---
+name: article-extractor
+description: Extract clean article content from URLs using reader. Use when user wants to download/extract/save an article from a URL.
+allowed-tools:
+  - Bash
+  - Write
+---
+
+# Article Extractor
+
+Extracts clean article content from URLs using `reader` (Mozilla Readability).
+
+## When to Use
+
+User wants to:
+- Extract article from URL
+- Download blog post content
+- Save article as text
+
+## Workflow
+
+```bash
+# Extract article
+reader "$URL" > temp.txt
+
+# Get title from first line
+TITLE=$(head -n 1 temp.txt | sed 's/^# //')
+
+# Clean filename
+FILENAME=$(echo "$TITLE" | tr '/:?"<>|' '-' | cut -c 1-80 | sed 's/ *$//')".txt"
+
+# Save
+mv temp.txt "$FILENAME"
+
+# Show preview
+echo "✓ Saved: $FILENAME"
+head -n 10 "$FILENAME"
+```
+