AgentSkillsCN

invisible-threads

当用户希望在海量论文、文章或文字稿中发掘那些不为人知的深层主题关联时使用。当您需要编辑文集、寻找反复出现的核心思想,或在不同来源间穿针引线、巧妙引用时,此技能同样适用。若需提取叙事精华,请参考 narrative-snippets。

SKILL.md
--- frontmatter
name: invisible-threads
version: 1.0.0
description: "When the user wants to discover non-obvious thematic connections across a corpus of essays, articles, or transcripts. Also use when editing an anthology, finding recurring ideas, or weaving quotes across sources. For narrative extraction, see narrative-snippets."

Invisible Threads Skill

Discovers non-obvious thematic connections ("invisible threads") across a corpus of essays, articles, or transcripts.

Adapted from invisible-threads.

When to Use

  • Editing an anthology and want to find thematic connections
  • Analyzing a body of work for recurring ideas
  • Finding quotes that support a theme across multiple sources
  • Writing editorial commentary that weaves sources together

Quick Start

bash
# 1. Prepare your corpus as markdown files in a source directory

# 2. Chunk the corpus into a database
python chunk_corpus.py --source /path/to/markdown/files --output corpus.db

# 3. Extract insights using Gemini
python extract_insights.py --db corpus.db --backend gemini

# 4. Find threads
python find_threads.py --input data/insights_*.json

# 5. Review threads_*.json for editorial use

Adapting for Your Project

The key file to modify is the extraction prompt in extract_insights.py.

Default Prompt Categories

The default is set up for Catholic agrarian content (Cross & Plough):

  • industrialism, land, family, property, craft, liturgy
  • organic-farming, distributism, eugenics, totalitarianism
  • natural-law, peasantry, economics, spirituality

For Other Projects

Change the EXTRACTION_PROMPT in extract_insights.py:

  1. Context section: Describe what the corpus is
  2. Examples section: Give 3 examples of genuine insights from this domain
  3. Categories: List 10-15 thematic categories relevant to your content

See references/prompt-templates.md for examples.

Output

insights_*.json

Each insight includes:

  • insight_text: The extracted quote/idea
  • category: Thematic category
  • novelty_score: 1-10 how surprising
  • specificity_score: 1-10 how quotable
  • source: Which document it came from
  • raw_chunk: Original context

threads_*.json

Each thread includes:

  • thread_id: Identifier
  • category: Dominant theme
  • size: Number of connected insights
  • num_sources: How many different documents
  • years_spanned: Timeline (if applicable)
  • insights: All insights in this thread

Editorial Applications

  1. Reorganize structure around discovered threads
  2. Write headnotes referencing how themes recur
  3. Add footnotes pointing to related pieces
  4. Find unused quotes that connect to themes
  5. Write bridge paragraphs between sections

Scripts

ScriptPurpose
chunk_corpus.pySplit markdown files into database
extract_insights.pyExtract insights using LLM (Gemini/Claude/Ollama)
find_threads.pyCluster insights into thematic threads

Requirements

code
google-generativeai>=0.3.0  # For Gemini
sentence-transformers>=2.2.0
networkx>=3.0
python-louvain>=0.16
scikit-learn>=1.0

Cost

Backend~1000 chunksSpeed
Gemini FlashFree tierFast
Gemini Pro~$1-2Fast
Claude Haiku~$0.50Fast
OllamaFreeSlow

Related Skills

  • Enhanced by: skill-creator
  • Feeds into: social-content-creation