Paper Style Extraction Skill

Name: paper-extract-style
Rating: 76
Author: xiaooomd

Goal: Reverse engineer academic style from PDF/TeX files and build the psmfiles/ knowledge base.

Workflow

•
Locate Source Files:
- •Look for PDF, TeX, or TXT files in the current directory or user-specified path.
- •If user requests new papers/datasets, use WebSearch to find them first.
•
Extract Text:
- •Execute the extraction script: python paper-extract-style/extract_text.py <file_path>
- •This creates ref_article/*_cleaned_body.md and ref_article/*_appendix.md.
•
Build Knowledge Base (Incremental Fusion):
- •Read: Process ref_article/*_cleaned_body.md.
- •
  Update psmfiles/lexicon_domain.md:
  - •Extract high-frequency sentence patterns for each section (Abstract, Intro, Methods, Experiments, Conclusion).
  - •Generalize: Replace specific nouns with placeholders (e.g., [Method], [Metric]).
  - •De-duplicate: Do not copy the static skeleton from LEXICON.md.
- •
  Update psmfiles/DOMAINS_Knowledge.md:
  - •Extract "Territory" statements (Industrial impact).
  - •Extract "Niche" statements (Critiques of prior work, Solved problems).
  - •Extract "Trends" and "Citations".
  - •Fusion Logic: Append new insights; merge duplicate views to reinforce arguments. Do NOT overwrite existing data.
- •
  Generate psmfiles/STYLE_GUIDE.md:
  - •Read paper-extract-style/TEMPLATES.md.
  - •Fill in the template based on observed style (Voice, Narrative Flow, Formatting).
  - •Statistics: Calculate average sentence length for each section.
  - •Appendix Check: Check *_appendix.md to define Appendix formatting standards.
•
Completion:
- •Report the location of generated assets in psmfiles/.