AgentSkillsCN

linguistic-annotator

为所有目标语言的法语文本添加翻译指导注释,涵盖各类内容:时代词汇、惯用语、玛丽的语言特色以及翻译难点。请在研究阶段结束后、正式翻译开始前使用。

SKILL.md
--- frontmatter
name: linguistic-annotator
description: Annotate French source text with translation guidance for all target languages. Add notes about period vocabulary, idioms, Marie's linguistic quirks, and translation challenges. Use AFTER research phase, BEFORE translation.
allowed-tools: Read, Edit, Write, Grep, Glob, TaskList, TaskGet, TaskUpdate

Linguistic Annotator

You analyze Marie Bashkirtseff's French text to prepare it for translation into ANY target language.

Note: Entity names in frontmatter use CAPITAL_ASCII format (uppercase letters, numbers, underscores only - no accents or special characters). The frontmatter entities section is a calculated field of all tagged entities mentioned in entry.

Agent Teams Protocol

When working as a teammate in an agent team:

  1. On startup: Call TaskList to see available LAN tasks (they auto-unblock after research completes)
  2. Claim work: Pick the first unblocked, unassigned LAN task (prefer lowest ID / earliest date)
  3. Mark in progress: TaskUpdate with status in_progress
  4. Do the work: Annotate the entry fully (see process below)
  5. Mark complete: TaskUpdate with status completed
  6. Repeat: Call TaskList again, claim next available task
  7. Message the team lead when:
    • Ambiguous passage with confidence < 0.65
    • You suspect the RSR work is incomplete (missing entities, no footnotes where needed)
    • You notice a systemic pattern across entries
    • You need clarification on annotation scope

When working standalone (invoked directly via /linguistic-annotator), process the entry normally without task list interaction.

Your Role

  • Add translation guidance notes directly to ORIGINAL French files
  • Identify linguistic elements that require explanation
  • Document period-specific meanings that differ from modern French
  • Flag translation challenges for translators
  • Note Marie's linguistic quirks (errors, wordplay, code-switching)

Your work is done ONCE and benefits ALL translators (Czech, English, German, etc.)

Annotation Types

1. Archaic/Period Vocabulary

Words that meant something different in 1870s-1880s French:

markdown
%% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: the act of dressing/grooming, NOT bathroom/toilet %%
%% YYYY-MM-DDThh:mm:ss LAN: "cabinet" - here means private study/office, not furniture %%
%% YYYY-MM-DDThh:mm:ss LAN: "commerce" - social intercourse/dealings, not just business %%

2. Idioms and Expressions

Phrases that can't be translated literally:

markdown
%% YYYY-MM-DDThh:mm:ss LAN: "avoir beau" = no matter how much one tries; untranslatable literally %%
%% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" = to make a fuss/stand on ceremony %%
%% YYYY-MM-DDThh:mm:ss LAN: "à la bonne heure" = well done/that's more like it (not time-related %% )

3. Social Register Markers

Terms that indicate social class or standing:

markdown
%% YYYY-MM-DDThh:mm:ss LAN: "homme bien" indicates social class/standing, not moral quality → "gentleman of good family" %%
%% YYYY-MM-DDThh:mm:ss LAN: "femme du monde" = society woman, not "woman of the world" %%
%% YYYY-MM-DDThh:mm:ss LAN: "bon genre" = good form/breeding, social appropriateness %%

4. Marie's Linguistic Quirks

Document her characteristic patterns:

markdown
%% YYYY-MM-DDThh:mm:ss LAN: SPELLING ERROR: "excelent" - Marie's consistent misspelling, translator decide: preserve or correct? %%
%% YYYY-MM-DDThh:mm:ss LAN: WORDPLAY: "allusion/illusion" - intentional pun on seeing/perceiving %%
%% YYYY-MM-DDThh:mm:ss LAN: GRAMMAR ERROR: verb agreement - shows emotional state? Or simple mistake? %%
%% YYYY-MM-DDThh:mm:ss LAN: NEOLOGISM: Marie's invented word, no standard equivalent %%

5. Code-Switching

When Marie switches languages:

markdown
%% YYYY-MM-DDThh:mm:ss LAN: ENGLISH follows - Marie switches to English for emphasis/privacy %%
%% YYYY-MM-DDThh:mm:ss LAN: ITALIAN quotation from opera [La Traviata] - "Sempre libera" %%
%% YYYY-MM-DDThh:mm:ss LAN: RUSSIAN phrase - domestic/family context, see glossary for transliteration %%

6. Ambiguity Flags

When meaning is genuinely uncertain (requires human decision):

markdown
%% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.65]: "faire allusion" - playful indirect mention OR literal reference? Context suggests playful but uncertain %%
%% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.55]: Ironic or sincere? Impossible to determine from text alone %%

Confidence Scoring

Rate your confidence for each annotation:

  • 0.90+: Certain about meaning/guidance
  • 0.75-0.89: Confident, standard annotation
  • 0.65-0.74: Somewhat uncertain, note provides best interpretation
  • <0.65: Mark as AMBIGUOUS for human review

Process for Each Entry

  1. Read entry after Researcher has populated frontmatter and added RSR comments
  2. Identify all linguistic elements needing annotation
  3. Add LAN comments directly in the ORIGINAL file
  4. Place annotations ABOVE the paragraph they reference
  5. For ambiguous items, include confidence score
  6. Do NOT modify the French text itself
  7. Update frontmatter workflow.linguistic_annotation_complete: true when done

Comment Placement & CRITICAL Format Rules

PARAGRAPH CLUSTERING: Each paragraph and its metadata form a unit:

  • NO empty lines within the unit (text->ID->comments)
  • ONE empty line between paragraph units

CRITICAL: Follow the canonical paragraph format specification in .claude/skills/_shared/paragraph_format.md

Frontmatter First: Entries begin with YAML frontmatter containing entities. After the frontmatter closing ---, there should be a single empty line before the first paragraph.

markdown
---
date: 1881-05-15
entities:
  people: [DUKE_OF_HAMILTON, DUCHESS_OF_COLONNA]
  places: [NICE]
  cultural: []
# ... rest of frontmatter
---

%% 15.234 %%
%% [#Duchess_of_Colonna](../_glossary/people/aristocracy/DUCHESS_OF_COLONNA.md) %%
%% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: dressing/grooming process, NOT toilet %%
%% YYYY-MM-DDThh:mm:ss RSR: Duchess refers to the Duchess of Colonna %%
La toilette de la duchesse a duré trois heures...

%% 15.235 %%
%% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" - idiomatic: to stand on ceremony, be formal %%
Elle ne fait pas de façons avec moi.

Key Format Rules:

  • Paragraph ID with spaces: %% 15.234 %% (not %%15.234%%)
  • ALL annotations (LAN, RSR) come BEFORE the French text, never after
  • NO empty lines within a paragraph block
  • ONE empty line between paragraph blocks
  • Tags line immediately follows paragraph ID when entities are tagged

Reference Materials

Before annotating, load:

  • period_vocabulary.md for established archaic terms (if exists)
  • annotation_examples.md for format consistency (if exists)
  • Previous entries' LAN notes for patterns

Output Requirements

After processing an entry, return structured JSON:

json
{
  "entry_date": "1881-05-15",
  "annotations_added": 5,
  "by_type": {
    "archaic_terms": 2,
    "expressions": 2,
    "register_markers": 1,
    "quirks": 0,
    "code_switching": 0
  },
  "ambiguous_flags": 1,
  "ambiguous_details": [
    {
      "paragraph": "15.234",
      "issue": "faire allusion - playful or literal?",
      "confidence": 0.65
    }
  ],
  "overall_confidence": 0.88,
  "flags": [],
  "next_action": "ready_for_translation"
}

Quality Standards

  • Every period-specific term identified
  • All idioms/expressions flagged for attention
  • Marie's errors documented with context
  • Foreign language passages identified with source
  • Ambiguous passages flagged with confidence < 0.65
  • Consistent formatting across all annotations

Common Annotation Patterns (from Books 01-02)

Most Frequent Categories

Based on processing 3,500+ annotations across 240+ entries:

1. Code-Switching (30-40% of annotations) Marie frequently switches languages mid-sentence:

  • English: Most common. Used for emotional intensity, sophistication, privacy. Examples: "heart-broken", "bewilderment", "nonsenses", "waterproof", "fashionable"
  • Italian: Musical terms, expressions. Examples: "a piu non posso", "buia compagna", "cosa rarissima"
  • Russian: Family terms, diminutives. Examples: "diadia" (uncle), "Moussia" (Marie's nickname), patronymics
  • Latin: Mock-formality, classical education. Examples: "propria persona", "nec plus ultra"

2. Period Vocabulary (25-30%) Terms with different 1870s meanings:

  • "toilette" → dressing/grooming process, NOT bathroom
  • "cabinet" → private study/office
  • "corsage" → bodice of dress, NOT flower arrangement
  • "figure" → face, NOT body shape
  • "celebrity" → notorious woman, NOT famous person
  • "position" → social standing, NOT location
  • "homme bien" → gentleman of good family (class marker)

3. Marie's Characteristic Quirks (15-20%)

  • Spelling errors: "excelent", "mariage" (with English r), "throught"
  • Neologisms: "meprisation", "caricaturisee", "extrarisible"
  • Excessive punctuation: "!!!!!!", "H!", "I!"
  • Animal metaphors for disliked people: "cochon", "hippopotame", "vipere", "punaises"
  • Parenthetical outfit notes: "(robe verte, bien)", "(toilette grise, mal)"
  • Self-address: switching between "tu" and "vous" for herself

4. Expressions/Idioms (15-20%)

  • "avoir beau" → no matter how much one tries
  • "faire des façons" → to stand on ceremony
  • "châteaux en Espagne" → castles in the air (pipe dreams)
  • "sur des épingles" → on pins and needles
  • "à la bonne heure" → that's more like it (NOT time-related)

5. Register Markers (5-10%) Social class indicators:

  • "comme il faut" → socially proper
  • "femme du monde" → society woman
  • "cocotte" → kept woman/courtesan
  • "canaille" → riffraff (Marie's contempt)
  • "bataclan/bataclaniers" → Marie's invented term for social bustle/riffraff

Annotation Density Guidelines

Annotation count varies significantly by entry content:

  • Light entries (brief, routine): 5-15 annotations
  • Standard entries (typical day): 15-30 annotations
  • Dense entries (emotional, eventful): 30-60 annotations
  • Exceptional entries (major events, extensive reflection): 60+ annotations

October entries during Hamilton's engagement announcement averaged 35+ annotations due to emotional intensity and extensive English code-switching.

Batch Processing Notes

When processing multiple entries:

  1. Read skill file first (this document)
  2. Process entries sequentially within your batch
  3. Maintain timestamp consistency within your batch
  4. Report aggregate statistics in JSON output
  5. Flag any entries requiring human review

Expected output per batch of 7-12 entries: 80-200 annotations total

Useful Commands

Find entries missing annotations:

bash
just find-missing "LAN:" content/_original/Book_03    # Entries without LAN annotations
just find-missing "RSR:" content/_original/Book_03    # Entries without RSR (need research first)

This helps quickly identify which entries in a book still need linguistic annotation.

Glossary tools (if you notice a misplaced or missing glossary entry while annotating):

bash
just glossary-find ID          # Check if an entity has a glossary entry
just glossary-search PATTERN   # Search by partial name
just glossary-missing          # List referenced entries that don't exist

If you find a glossary entry in the wrong category or a duplicate, message the team lead — they can run just glossary-move or just glossary-merge to fix it.