AgentSkillsCN

blog-create

Automated quality control through file naming (ls-lint), Markdown linting (markdownlint-cli2), and tiered Git hooks.

SKILL.md
---
name: blog-create
description: "This skill should be used when adding blog posts to epub library, discovering new blogs to follow, or syncing blog archives. Triggers include 'add blog', 'harvest blog', 'get all posts from', 'add [author] blog', 'discover blogs', 'suggest blogs'."
---

blog-create

harvest blog articles from individual author websites and convert to epub for reading on e-ink devices.

Purpose: This skill adds blog articles (individual posts from personal blogs) to the epub library, NOT topical feeds or actual books. The resulting epubs are organized in the /blogs/ folder on the X4 device, grouped by author.

Scope: Individual author blogs (Simon Willison, Armin Ronacher, etc.), not curated topical feeds (use feed-create for r/soccer, AI digests, etc.).

when to use

| use | skip |
|---|---|
| adding all articles from an author's blog | single post conversion (use `reader url`) |
| discovering new individual author blogs | topical feeds like r/soccer (use feed-create) |
| batch harvesting blog archives | actual books (use `reader search`) |
| suggesting similar blogs based on library | agent documentation (use agents-digest) |

decision tree: mode selection

```text
What do you want to do?
├── add all posts from URL → harvest mode
├── add posts from author name → author mode (lookup URL first)
├── discover similar blogs → suggest mode (analyze library)
├── check what we have → inventory mode
└── unclear → ask which blog URL
```
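The routing in the tree can be sketched as a keyword match on the trigger phrases from the frontmatter description. This is a hypothetical helper (`pick_mode` is not part of the skill). Real requests still need judgment, and plain string matching only approximates it.

```bash
# hypothetical helper: map a request to a mode by keyword, mirroring the decision tree
pick_mode() {
  local request
  request=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$request" in
    *discover*|*suggest*)                   echo "suggest" ;;    # discover similar blogs
    *inventory*|*"what we have"*)           echo "inventory" ;;  # check what we have
    *http://*|*https://*)                   echo "harvest" ;;    # a URL was given
    *add*|*harvest*|*"get all posts from"*) echo "author" ;;     # author name: look up URL first
    *)                                      echo "ask" ;;        # unclear: ask for the blog URL
  esac
}
```

URL detection runs before the "add"/"harvest" keywords, so "harvest https://…" goes to harvest mode and "add Armin Ronacher blog" goes to author mode.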

harvest mode

extract all post URLs from a blog and convert to epub.

supported platforms

| platform | detection | extraction method |
|---|---|---|
| custom blogs | any HTML | agent-browser snapshot + link extraction |
| substack | /archive path | agent-browser + pagination |
| medium | /@username | agent-browser + infinite scroll |
| ghost | /ghost/ in HTML | agent-browser + RSS fallback |
| wordpress | /wp-content/ | agent-browser + sitemap |
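Below is a rough sketch of the detection column. It assumes the blog's front page has already been saved to a local HTML file; `detect_platform` is a hypothetical helper. `/archive` in the URL is a weak signal because many non-Substack blogs also have an archive page, so treat the result as a hint for choosing an extraction method.

```bash
# hypothetical helper: guess the platform from the URL and the saved front-page HTML
# $1 = blog URL, $2 = path to a local copy of the page's HTML
detect_platform() {
  local url=$1 html=$2
  # URL-based hints first
  case "$url" in
    */@*)       echo "medium";   return ;;
    */archive*) echo "substack"; return ;;
  esac
  # then markers inside the HTML
  if grep -q '/wp-content/' "$html"; then
    echo "wordpress"
  elif grep -q '/ghost/' "$html"; then
    echo "ghost"
  else
    echo "custom"  # fall back to generic link extraction
  fi
}
```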

workflow

  1. discover: use agent-browser to load blog and extract all post URLs
  2. dedupe: check sqlite to skip existing posts
  3. convert: batch process with reader CLI
  4. verify: confirm count matches expected
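The dedupe step works best on URLs rather than titles. Here is a minimal sketch. It assumes the already-converted URLs have been exported to a file, one per line; this section does not show which `library_items` column holds the source URL. `new_urls` is a hypothetical helper.

```bash
# hypothetical helper: print discovered URLs that are not yet in the library
# $1 = discovered URLs (one per line), $2 = already-converted URLs (one per line)
new_urls() {
  # -F literal match, -x whole line, -v keep non-matches; an empty $2 keeps everything
  grep -vxF -f "$2" "$1" | awk '!seen[$0]++'  # also drop in-batch duplicates, keeping order
}
```

Usage: `new_urls ~/.abbie/feeds/harvested/urls.txt existing.txt > todo.txt`, then convert only `todo.txt`.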

tool integration

```bash
# discover posts using agent-browser
agent-browser open https://example.com/blog
agent-browser snapshot -i -c
# extract post URLs (the hrefs, not the link text)
agent-browser eval "Array.from(document.querySelectorAll('a[href*=\"/posts/\"]')).map(a => a.href).join('\\n')"

# check existing articles
sqlite3 ~/.epub/library.db "SELECT title FROM library_items WHERE author = 'Author Name'"

# batch convert, one URL per line (read -r avoids word splitting and backslash mangling)
while IFS= read -r url; do
  [ -n "$url" ] || continue  # skip blank lines
  reader url "$url" --author "Author Name" < /dev/null  # keep reader from consuming the URL list
done < ~/.abbie/feeds/harvested/urls.txt

# or use the built-in batch command
reader blog harvest --file ~/.abbie/feeds/harvested/urls.txt --author "Author Name"

# verify
sqlite3 ~/.epub/library.db "SELECT COUNT(*) FROM library_items WHERE author = 'Author Name'"
```

author mode

look up author's blog URL then harvest.

sources

| source | command | notes |
|---|---|---|
| library | `sqlite3 ~/.epub/library.db "SELECT DISTINCT author FROM library_items"` | existing authors |
| assets | `cat ~/.claude/skills/blog-add/assets/supported-blogs.json` | curated list |
| search | `agent-browser search "[author] blog"` | fallback |

workflow

  1. lookup: find blog URL from author name
  2. confirm: verify URL loads and matches author
  3. harvest: run harvest mode with URL

suggest mode

discover new blogs based on library patterns.

discovery methods

| method | description | tool |
|---|---|---|
| author mentions | blogs mentioned in existing posts | grep epub content |
| similar domains | same hosting/platform | domain analysis |
| blogroll links | "blogroll" sections in existing blogs | agent-browser extraction |
| related topics | tags/categories overlap | sqlite tag query |

workflow

  1. analyze: query library for patterns
  2. extract: find candidate URLs
  3. rank: score by relevance
  4. present: show top 10 with stats

tool integration

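```bash
# get existing authors
sqlite3 ~/.epub/library.db "SELECT DISTINCT author FROM library_items ORDER BY author"

# analyze for mentions: count blog URLs linked from inside existing epubs
# (-a makes grep treat the unzipped images/fonts as text, so it still prints matches)
for epub in ~/.epub/library/*.epub; do
  unzip -p "$epub" 2>/dev/null | grep -ao 'https://[^"<> ]*' | grep blog
done | sort | uniq -c | sort -rn | head -20
```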
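You can approximate the rank step by collapsing the raw URLs from the epub loop (before `uniq -c`) into domains and skipping domains you already follow. The sketch below is minimal: `rank_domains` and the known-domains file are hypothetical. The library stores authors, not domains, so you would have to maintain that list separately.

```bash
# hypothetical helper: rank candidate blog domains by how often they are linked
# stdin = URLs (one per line), $1 = file of already-followed domains (one per line)
rank_domains() {
  sed -E 's#^https?://([^/:]+).*#\1#' \
    | grep -vxF -f "$1" \
    | sort | uniq -c | sort -rn | head -n 10  # top 10, most-linked first
}
```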

inventory mode

check what's in the library and suggest additions.

queries

```bash
# authors with < 10 posts (possibly incomplete archives)
sqlite3 ~/.epub/library.db "
  SELECT author, COUNT(*) AS count
  FROM library_items
  WHERE source IN ('feed', 'backfill', 'url')
  GROUP BY author
  HAVING count < 10
  ORDER BY count DESC
"

# known blogs (supported-blogs.json) whose authors are not in the library yet
comm -13 \
  <(sqlite3 ~/.epub/library.db "SELECT DISTINCT author FROM library_items WHERE source IN ('feed', 'backfill')" | sort -u) \
  <(jq -r '.[].author' ~/.claude/skills/blog-add/assets/supported-blogs.json | sort -u)

# authors whose newest item is the oldest (stale archives worth re-checking)
sqlite3 ~/.epub/library.db "
  SELECT author, MAX(created_at) AS latest
  FROM library_items
  WHERE source IN ('feed', 'backfill')
  GROUP BY author
  ORDER BY latest ASC
  LIMIT 20
"
```

batch conversion

convert multiple URLs efficiently.

parallel processing

```bash
# sequential (safe)
while IFS= read -r url; do
  node ~/.epub/bin/run.js url "$url" --author "Author" < /dev/null  # don't let node eat urls.txt
done < urls.txt

# parallel (faster, 4 workers)
xargs -P 4 -I {} node ~/.epub/bin/run.js url "{}" --author "Author" < urls.txt
```
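The parallel one-liner above has no rate limiting, which the anti-patterns table warns against. One way to combine both is to have each worker pause after its request. The sketch below uses a hypothetical helper, `throttled_parallel`. It relies only on standard `xargs -P`/`-I` and `sh -c`, and the caller marks where each input line goes with `{}`.

```bash
# hypothetical helper: run a command once per stdin line with N parallel workers;
# each worker sleeps PAUSE seconds after its command, keeping that command's exit code
# usage: throttled_parallel <jobs> <pause> <command> [args...]  ({} = the input line)
throttled_parallel() {
  local jobs=$1 pause=$2
  shift 2
  # inside sh -c, $0 is the pause and "$@" is the command with {} already substituted
  xargs -P "$jobs" -I {} sh -c '"$@"; rc=$?; sleep "$0"; exit "$rc"' "$pause" "$@"
}
```

Usage: `throttled_parallel 4 1 node ~/.epub/bin/run.js url {} --author "Author" < urls.txt`.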

error handling

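```bash
# log all output; record failed URLs in failed.txt so they can be retried later
set -o pipefail  # make the `if` see node's exit status, not tee's
mkdir -p ~/.abbie/feeds/harvested
while IFS= read -r url; do
  [ -n "$url" ] || continue
  if ! node ~/.epub/bin/run.js url "$url" --author "Author" < /dev/null 2>&1 \
      | tee -a ~/.abbie/feeds/harvested/conversion.log; then
    echo "$url" >> ~/.abbie/feeds/harvested/failed.txt
  fi
  sleep 1  # rate limiting: pause between requests
done < urls.txt
```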
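The loop above records failures but does not retry them. A small retry wrapper with exponential backoff can go around each conversion. This is a sketch: `retry` and its `RETRY_DELAY` knob are hypothetical, not part of the reader CLI.

```bash
# hypothetical helper: run a command, retrying with exponential backoff
# usage: retry <attempts> <command> [args...]
# RETRY_DELAY (hypothetical, default 1) sets the first pause in seconds
retry() {
  local attempts=$1 delay=${RETRY_DELAY:-1} n=1
  shift
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1  # give up; the caller can log the URL to failed.txt
    fi
    sleep "$delay"
    delay=$((delay * 2))  # 1s, 2s, 4s, ...
    n=$((n + 1))
  done
}
```

Usage: `retry 3 node ~/.epub/bin/run.js url "$url" --author "Author" || echo "$url" >> ~/.abbie/feeds/harvested/failed.txt`.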

agent-browser patterns

extract post list

```bash
# open blog archive
agent-browser open https://blog.example.com/archive

# wait for load
agent-browser wait --load networkidle

# get snapshot
agent-browser snapshot -i -c

# extract links
mkdir -p ~/.abbie/feeds/harvested
agent-browser eval "
  Array.from(document.querySelectorAll('a[href*=\"/posts/\"]'))
    .map(a => a.href)
    .join('\\n')
" > ~/.abbie/feeds/harvested/urls.txt
```
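Archive pages often link the same post several times, for example through the title, a "read more" link, and a `#comments` anchor. A cleanup pass before converting keeps `urls.txt` to one line per post. This is a sketch with a hypothetical `clean_urls` helper. It strips fragments and trailing slashes but keeps `?query` strings, because some WordPress permalinks look like `?p=123`.

```bash
# hypothetical helper: normalize an extracted link list (stdin -> stdout)
clean_urls() {
  sed -E -e 's/#.*$//' -e 's/[[:space:]]+$//' -e 's#/$##' \
    | grep -E '^https?://' \
    | awk '!seen[$0]++'  # de-duplicate, keeping first-seen order
}
```

Usage: `clean_urls < ~/.abbie/feeds/harvested/urls.txt > urls.clean.txt`.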

handle pagination

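```bash
# click "load more" until it disappears
# (assumes `is visible` exits non-zero once the button is gone; the cap prevents an endless loop otherwise)
clicks=0
while [ "$clicks" -lt 50 ] && agent-browser is visible ".load-more"; do
  agent-browser click ".load-more"
  agent-browser wait 2000
  clicks=$((clicks + 1))
done

# extract all loaded links
agent-browser eval "Array.from(document.querySelectorAll('article a')).map(a => a.href).join('\\n')"
```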

infinite scroll

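```bash
# scroll to bottom repeatedly (20 x 1000px; raise the count for long archives)
for i in {1..20}; do
  agent-browser scroll down 1000
  agent-browser wait 1000
done

# inspect the loaded posts, then pull their URLs
agent-browser snapshot -i -c
agent-browser eval "Array.from(document.querySelectorAll('article a')).map(a => a.href).join('\\n')"
```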

validation

post conversion

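```bash
# verify an epub exists (compgen copes with a glob matching several files; test -f does not)
compgen -G "$HOME/.epub/library/*.epub" > /dev/null && echo "✓ epub exists"

# verify in database ($ID = id of the converted library item)
sqlite3 ~/.epub/library.db "SELECT title FROM library_items WHERE id = '$ID'"

# verify author set
sqlite3 ~/.epub/library.db "SELECT author FROM library_items WHERE id = '$ID'" | grep -q "Author Name" && echo "✓ author set"
```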

batch completion

```bash
# expected vs actual count (ACTUAL also includes posts converted before this run)
EXPECTED=$(wc -l < urls.txt)
ACTUAL=$(sqlite3 ~/.epub/library.db "SELECT COUNT(*) FROM library_items WHERE author = 'Author' AND source IN ('url', 'backfill')")
echo "Expected: $EXPECTED, Actual: $ACTUAL"

# check for failures
test -f ~/.abbie/feeds/harvested/failed.txt && echo "⚠ $(wc -l < ~/.abbie/feeds/harvested/failed.txt) failures"
```

references

scripts

assets

anti-patterns

| pattern | problem | fix |
|---|---|---|
| calling these "books" | misleading: they're blog articles | say "articles" or "posts", not "books" |
| using blog-add for topical feeds | blog-add is for individual authors, not r/soccer | use the feed-create skill for curated topics |
| confusing blogs/ and feeds/ | blogs/ = authors, feeds/ = topics | remember the distinction |
| using WebFetch for extraction | fails on JS-heavy blogs | use agent-browser with snapshot |
| no deduplication | re-converts existing posts | check sqlite before converting |
| sequential conversion only | slow for 100+ posts | use `xargs -P` for parallelization |
| missing author name | posts go to Unknown/ | always pass the `--author` flag |
| no error logging | can't retry failures | log errors to file, save failed URLs |
| no rate limiting | gets blocked | add sleep between requests |