SKILL.md
--- frontmatter
name: text-summarizer
description: Generate extractive summaries from long text documents. Control summary length, extract key sentences, and process multiple documents.

Text Summarizer

Create concise summaries from long text documents using extractive summarization. The summarizer identifies the most important sentences and returns them verbatim, so the summary keeps the original wording instead of paraphrasing it.

Quick Start

python
from scripts.text_summarizer import TextSummarizer

# Summarize text
summarizer = TextSummarizer()
summary = summarizer.summarize(long_text, ratio=0.2)  # 20% of original
print(summary)

# Summarize file
summary = summarizer.summarize_file("article.txt", num_sentences=5)

Features

  • Extractive Summarization: Selects key sentences from original text
  • Length Control: By ratio, sentence count, or word count
  • Multiple Algorithms: TextRank, LSA, frequency-based
  • Key Points: Extract bullet-point summaries
  • Batch Processing: Summarize multiple documents
  • Preserve Structure: Optionally keeps the selected sentences in their original order
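
To make the extractive idea concrete, here is a minimal sketch of a frequency-based summarizer written with the standard library only. It is not the code behind `TextSummarizer`, whose internals are not shown here. The names `STOPWORDS`, `split_sentences`, `tokenize`, and `frequency_summary` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real tool would use a fuller one).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "was",
    "it", "for", "on", "with", "that", "this", "as", "by", "at",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    words = re.findall(r"[a-z0-9']+", sentence.lower())
    return [w for w in words if w not in STOPWORDS]


def frequency_summary(text, num_sentences=2):
    """Keep the highest-scoring sentences and return them in original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    scored = []
    for idx, sentence in enumerate(sentences):
        words = tokenize(sentence)
        # Average word frequency, so long sentences are not favoured automatically.
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, idx))

    # Highest score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda pair: (-pair[0], pair[1]))[:num_sentences]
    keep = sorted(idx for _, idx in top)
    return " ".join(sentences[i] for i in keep)


text = "Cats sleep a lot. Cats hunt mice at night. Dogs bark. The weather was mild."
print(frequency_summary(text, num_sentences=2))
```

Every sentence in the output is copied from the input unchanged, which is the defining property of an extractive summary.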

API Reference

Initialization

python
summarizer = TextSummarizer(
    method="textrank",    # textrank, lsa, frequency
    language="english"
)

Summarization

python
# By ratio (20% of original length)
summary = summarizer.summarize(text, ratio=0.2)

# By sentence count
summary = summarizer.summarize(text, num_sentences=5)

# By word count
summary = summarizer.summarize(text, max_words=100)
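
The three length controls above could be resolved roughly as follows. This is a sketch under assumptions: it treats `ratio` as a fraction of the sentence count (the library may measure length differently), and `resolve_sentence_count` and `trim_to_word_budget` are hypothetical helpers, not part of the documented API.

```python
import math


def resolve_sentence_count(total_sentences, ratio=None, num_sentences=None):
    """Turn a length option into the number of sentences to keep."""
    if ratio is not None and num_sentences is not None:
        raise ValueError("pass either ratio or num_sentences, not both")
    if total_sentences == 0:
        return 0
    if num_sentences is not None:
        if num_sentences < 1:
            raise ValueError("num_sentences must be at least 1")
        return min(num_sentences, total_sentences)

    ratio = 0.2 if ratio is None else ratio  # 0.2 mirrors the documented CLI default
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Round up so a short document still yields at least one sentence.
    return max(1, min(total_sentences, math.ceil(total_sentences * ratio)))


def trim_to_word_budget(ranked_sentences, max_words):
    """Walk sentences best-first and keep each one that still fits the budget."""
    kept, used = [], 0
    for sentence in ranked_sentences:
        length = len(sentence.split())
        if used + length > max_words:
            continue  # too long for the remaining budget; a shorter one may fit
        kept.append(sentence)
        used += length
    return kept


print(resolve_sentence_count(10, ratio=0.2))
print(resolve_sentence_count(3, num_sentences=5))
print(trim_to_word_budget(["a b c", "d e f g", "h i"], max_words=5))
```

Rounding up in the ratio case is a design choice: it avoids returning an empty summary for documents with only a handful of sentences.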

Key Points Extraction

python
# Get bullet points
points = summarizer.extract_key_points(text, num_points=5)
for point in points:
    print(f"• {point}")

Batch Processing

python
# Summarize multiple texts
texts = [text1, text2, text3]
summaries = summarizer.summarize_batch(texts, ratio=0.2)

# Summarize files in directory
summaries = summarizer.summarize_directory("./articles/", ratio=0.3)

Options

python
# Preserve original sentence order
summary = summarizer.summarize(text, preserve_order=True)

# Include title/first sentence
summary = summarizer.summarize(text, include_first=True)

# Minimum sentence length filter
summarizer.min_sentence_length = 10
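
As a rough illustration of what `preserve_order` and `include_first` could mean, the sketch below selects sentences from precomputed scores. `select_sentences` is a hypothetical helper, and the choice to swap out the weakest pick when forcing the first sentence in is an assumption made for this example.

```python
def select_sentences(sentences, scores, k, preserve_order=True, include_first=False):
    """Pick k sentences by score, optionally keeping document order."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")

    # Indices sorted from highest to lowest score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    chosen = ranked[:k]

    if include_first and chosen and 0 not in chosen:
        # Replace the weakest pick with the opening sentence so the size stays k.
        chosen = [0] + chosen[:-1]

    if preserve_order:
        chosen = sorted(chosen)  # back to the order used in the source text
    return [sentences[i] for i in chosen]


sentences = ["Intro.", "B.", "C.", "D."]
scores = [0.1, 0.5, 0.2, 0.9]

print(select_sentences(sentences, scores, k=2, preserve_order=False))
print(select_sentences(sentences, scores, k=2, preserve_order=True))
print(select_sentences(sentences, scores, k=2, include_first=True))
```

Without `preserve_order`, sentences come back ranked by importance; with it, the same sentences read in the sequence the author wrote them.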

CLI Usage

bash
# Summarize text file
python text_summarizer.py --input article.txt --ratio 0.2

# Specific sentence count
python text_summarizer.py --input article.txt --sentences 5

# Extract key points
python text_summarizer.py --input article.txt --points 5

# Batch process
python text_summarizer.py --input-dir ./docs --output-dir ./summaries --ratio 0.3

# Output to file
python text_summarizer.py --input article.txt --output summary.txt --ratio 0.2

CLI Arguments

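| Argument | Description | Default |
|---|---|---|
| --input | Input file path | Required (unless --input-dir is used) |
| --output | Output file path | stdout |
| --input-dir | Directory of input files | - |
| --output-dir | Output directory | - |
| --ratio | Summary ratio (0.0-1.0) | 0.2 |
| --sentences | Number of sentences | - |
| --words | Maximum words | - |
| --points | Extract N key points | - |
| --method | Algorithm to use (textrank, lsa, frequency) | textrank |
| --preserve-order | Keep original sentence order | False |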
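
The arguments listed above could be declared with `argparse` along these lines. This is a sketch, not the script's actual parser: `build_parser` is a hypothetical name, and making `--input` and `--input-dir` mutually exclusive is an assumption inferred from the usage examples.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented CLI arguments."""
    parser = argparse.ArgumentParser(
        prog="text_summarizer.py",
        description="Generate extractive summaries from text files.",
    )

    # Assumption: exactly one input source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--input", help="Input file path")
    source.add_argument("--input-dir", help="Directory of input files")

    parser.add_argument("--output", help="Output file path (default: stdout)")
    parser.add_argument("--output-dir", help="Output directory for batch runs")
    parser.add_argument("--ratio", type=float, default=0.2,
                        help="Summary ratio between 0.0 and 1.0")
    parser.add_argument("--sentences", type=int, help="Number of sentences")
    parser.add_argument("--words", type=int, help="Maximum words")
    parser.add_argument("--points", type=int, help="Extract N key points")
    parser.add_argument("--method", default="textrank",
                        choices=["textrank", "lsa", "frequency"],
                        help="Algorithm to use")
    parser.add_argument("--preserve-order", action="store_true",
                        help="Keep original sentence order")
    return parser


args = build_parser().parse_args(["--input", "article.txt", "--sentences", "5"])
print(args.input, args.sentences, args.ratio, args.method, args.preserve_order)
```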

Examples

News Article Summary

python
from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

article = """
[Long news article text...]
"""

# Get a 3-sentence summary
summary = summarizer.summarize(article, num_sentences=3)
print("Summary:")
print(summary)

# Get key points
points = summarizer.extract_key_points(article, num_points=5)
print("\nKey Points:")
for i, point in enumerate(points, 1):
    print(f"{i}. {point}")

Research Paper Abstract

python
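from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer(method="lsa")

# Read the paper; the context manager closes the file afterwards
# (UTF-8 encoding is assumed here)
with open("research_paper.txt", encoding="utf-8") as f:
    paper = f.read()

# Create an abstract-length summary (at most 250 words)
abstract = summarizer.summarize(paper, max_words=250)
print(abstract)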

Meeting Notes Summary

python
from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

notes = """
Meeting started at 2pm. John presented Q3 results showing 15% growth.
Sarah raised concerns about supply chain delays affecting Q4 projections.
The team discussed mitigation strategies including dual-sourcing.
Budget allocation for marketing was approved at $50k.
Next steps include vendor outreach by Friday.
Follow-up meeting scheduled for next Tuesday.
"""

summary = summarizer.summarize(notes, num_sentences=3)
points = summarizer.extract_key_points(notes, num_points=4)

print("Summary:", summary)
print("\nKey Points:")
for point in points:
    print(f"• {point}")

Batch Document Summarization

python
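import os

from scripts.text_summarizer import TextSummarizer

summarizer = TextSummarizer()

# Create the output directory first; open(..., "w") fails if it is missing
os.makedirs("./summaries", exist_ok=True)

for filename in os.listdir("./documents"):
    if filename.endswith(".txt"):
        # UTF-8 encoding is assumed for both input and output files
        with open(f"./documents/{filename}", encoding="utf-8") as f:
            text = f.read()
        summary = summarizer.summarize(text, ratio=0.2)

        with open(f"./summaries/{filename}", "w", encoding="utf-8") as f:
            f.write(summary)

        print(f"Summarized: {filename}")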

Algorithm Comparison

| Algorithm | Speed | Quality | Best For |
|---|---|---|---|
| TextRank | Medium | High | General text |
| LSA | Fast | Good | Technical docs |
| Frequency | Fast | Medium | Quick summaries |
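
For readers unfamiliar with TextRank, the sketch below shows the general idea in pure Python: sentences become nodes in a graph, edges are weighted by word overlap, and a PageRank-style iteration scores each node. It is an illustration only, not the implementation used by this skill. `overlap` and `textrank_scores` are hypothetical names, and the damping factor and iteration count are conventional defaults chosen for the example.

```python
import math
import re


def overlap(words_a, words_b):
    """Overlap measure in the style of TextRank: shared words / (log|A| + log|B|)."""
    denom = math.log(len(words_a) or 1) + math.log(len(words_b) or 1)
    if denom == 0:
        return 0.0  # both sentences have at most one distinct word
    return len(words_a & words_b) / denom


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences with a PageRank-style iteration over a similarity graph."""
    n = len(sentences)
    if n == 0:
        return []

    tokens = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]

    # Symmetric edge weights between every pair of distinct sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            weights[i][j] = weights[j][i] = overlap(tokens[i], tokens[j])

    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            # Each neighbour passes on a share of its score proportional to the edge weight.
            rank = sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            updated.append((1 - damping) / n + damping * rank)
        scores = updated
    return scores


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse on the mat.",
    "Stock prices fell sharply today.",
]
for sentence, score in zip(sentences, textrank_scores(sentences)):
    print(f"{score:.3f}  {sentence}")
```

The unrelated third sentence shares no words with the others, so it receives only the baseline score and ranks last.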

Dependencies

code
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.2.0

Limitations

  • Extractive only (doesn't paraphrase or generate new text)
  • Works best with well-structured text (paragraphs, clear sentences)
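  • Very short texts may not summarize well, since there are few sentences to choose from
  • Doesn't understand context deeply, so it may miss nuance or leave out a sentence that a selected sentence depends on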
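
One way to work around the short-text limitation is to skip summarization when the input has too few sentences. The wrapper below is a sketch: `summarize_or_passthrough` is a hypothetical helper, the threshold of four sentences is an arbitrary assumption, and `summarize` stands for any callable that takes text and returns a summary (for example, a bound `summarizer.summarize`).

```python
import re


def summarize_or_passthrough(text, summarize, min_sentences=4):
    """Return the text unchanged when it is too short to summarize usefully."""
    # Naive sentence count: split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) < min_sentences:
        return text.strip()
    return summarize(text)


# A stand-in summarizer that keeps only the first sentence, for demonstration.
def first_sentence(text):
    return re.split(r"(?<=[.!?])\s+", text.strip())[0]


print(summarize_or_passthrough("One. Two.", first_sentence))
print(summarize_or_passthrough("One. Two. Three. Four.", first_sentence))
```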