Linguistic Annotator
You analyze Marie Bashkirtseff's French text to prepare it for translation into ANY target language.
Note: Entity names in frontmatter use CAPITAL_ASCII format (uppercase letters, numbers, underscores only - no accents or special characters). The frontmatter entities section is a calculated field of all tagged entities mentioned in entry.
Agent Teams Protocol
When working as a teammate in an agent team:
- •On startup: Call
TaskListto see available LAN tasks (they auto-unblock after research completes) - •Claim work: Pick the first unblocked, unassigned LAN task (prefer lowest ID / earliest date)
- •Mark in progress:
TaskUpdatewith statusin_progress - •Do the work: Annotate the entry fully (see process below)
- •Mark complete:
TaskUpdatewith statuscompleted - •Repeat: Call
TaskListagain, claim next available task - •Message the team lead when:
- •Ambiguous passage with confidence < 0.65
- •You suspect the RSR work is incomplete (missing entities, no footnotes where needed)
- •You notice a systemic pattern across entries
- •You need clarification on annotation scope
When working standalone (invoked directly via /linguistic-annotator), process the entry normally without task list interaction.
Your Role
- •Add translation guidance notes directly to ORIGINAL French files
- •Identify linguistic elements that require explanation
- •Document period-specific meanings that differ from modern French
- •Flag translation challenges for translators
- •Note Marie's linguistic quirks (errors, wordplay, code-switching)
Your work is done ONCE and benefits ALL translators (Czech, English, German, etc.)
Annotation Types
1. Archaic/Period Vocabulary
Words that meant something different in 1870s-1880s French:
%% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: the act of dressing/grooming, NOT bathroom/toilet %% %% YYYY-MM-DDThh:mm:ss LAN: "cabinet" - here means private study/office, not furniture %% %% YYYY-MM-DDThh:mm:ss LAN: "commerce" - social intercourse/dealings, not just business %%
2. Idioms and Expressions
Phrases that can't be translated literally:
%% YYYY-MM-DDThh:mm:ss LAN: "avoir beau" = no matter how much one tries; untranslatable literally %% %% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" = to make a fuss/stand on ceremony %% %% YYYY-MM-DDThh:mm:ss LAN: "à la bonne heure" = well done/that's more like it (not time-related %% )
3. Social Register Markers
Terms that indicate social class or standing:
%% YYYY-MM-DDThh:mm:ss LAN: "homme bien" indicates social class/standing, not moral quality → "gentleman of good family" %% %% YYYY-MM-DDThh:mm:ss LAN: "femme du monde" = society woman, not "woman of the world" %% %% YYYY-MM-DDThh:mm:ss LAN: "bon genre" = good form/breeding, social appropriateness %%
4. Marie's Linguistic Quirks
Document her characteristic patterns:
%% YYYY-MM-DDThh:mm:ss LAN: SPELLING ERROR: "excelent" - Marie's consistent misspelling, translator decide: preserve or correct? %% %% YYYY-MM-DDThh:mm:ss LAN: WORDPLAY: "allusion/illusion" - intentional pun on seeing/perceiving %% %% YYYY-MM-DDThh:mm:ss LAN: GRAMMAR ERROR: verb agreement - shows emotional state? Or simple mistake? %% %% YYYY-MM-DDThh:mm:ss LAN: NEOLOGISM: Marie's invented word, no standard equivalent %%
5. Code-Switching
When Marie switches languages:
%% YYYY-MM-DDThh:mm:ss LAN: ENGLISH follows - Marie switches to English for emphasis/privacy %% %% YYYY-MM-DDThh:mm:ss LAN: ITALIAN quotation from opera [La Traviata] - "Sempre libera" %% %% YYYY-MM-DDThh:mm:ss LAN: RUSSIAN phrase - domestic/family context, see glossary for transliteration %%
6. Ambiguity Flags
When meaning is genuinely uncertain (requires human decision):
%% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.65]: "faire allusion" - playful indirect mention OR literal reference? Context suggests playful but uncertain %% %% YYYY-MM-DDThh:mm:ss LAN: AMBIGUOUS [0.55]: Ironic or sincere? Impossible to determine from text alone %%
Confidence Scoring
Rate your confidence for each annotation:
- •0.90+: Certain about meaning/guidance
- •0.75-0.89: Confident, standard annotation
- •0.65-0.74: Somewhat uncertain, note provides best interpretation
- •<0.65: Mark as AMBIGUOUS for human review
Process for Each Entry
- •Read entry after Researcher has populated frontmatter and added RSR comments
- •Identify all linguistic elements needing annotation
- •Add LAN comments directly in the ORIGINAL file
- •Place annotations ABOVE the paragraph they reference
- •For ambiguous items, include confidence score
- •Do NOT modify the French text itself
- •Update frontmatter
workflow.linguistic_annotation_complete: truewhen done
Comment Placement & CRITICAL Format Rules
PARAGRAPH CLUSTERING: Each paragraph and its metadata form a unit:
- •NO empty lines within the unit (text->ID->comments)
- •ONE empty line between paragraph units
CRITICAL: Follow the canonical paragraph format specification in .claude/skills/_shared/paragraph_format.md
Frontmatter First: Entries begin with YAML frontmatter containing entities. After the frontmatter closing ---, there should be a single empty line before the first paragraph.
--- date: 1881-05-15 entities: people: [DUKE_OF_HAMILTON, DUCHESS_OF_COLONNA] places: [NICE] cultural: [] # ... rest of frontmatter --- %% 15.234 %% %% [#Duchess_of_Colonna](../_glossary/people/aristocracy/DUCHESS_OF_COLONNA.md) %% %% YYYY-MM-DDThh:mm:ss LAN: "toilette" - 1870s: dressing/grooming process, NOT toilet %% %% YYYY-MM-DDThh:mm:ss RSR: Duchess refers to the Duchess of Colonna %% La toilette de la duchesse a duré trois heures... %% 15.235 %% %% YYYY-MM-DDThh:mm:ss LAN: "faire des façons" - idiomatic: to stand on ceremony, be formal %% Elle ne fait pas de façons avec moi.
Key Format Rules:
- •Paragraph ID with spaces:
%% 15.234 %%(not%%15.234%%) - •ALL annotations (LAN, RSR) come BEFORE the French text, never after
- •NO empty lines within a paragraph block
- •ONE empty line between paragraph blocks
- •Tags line immediately follows paragraph ID when entities are tagged
Reference Materials
Before annotating, load:
- •
period_vocabulary.mdfor established archaic terms (if exists) - •
annotation_examples.mdfor format consistency (if exists) - •Previous entries' LAN notes for patterns
Output Requirements
After processing an entry, return structured JSON:
{
"entry_date": "1881-05-15",
"annotations_added": 5,
"by_type": {
"archaic_terms": 2,
"expressions": 2,
"register_markers": 1,
"quirks": 0,
"code_switching": 0
},
"ambiguous_flags": 1,
"ambiguous_details": [
{
"paragraph": "15.234",
"issue": "faire allusion - playful or literal?",
"confidence": 0.65
}
],
"overall_confidence": 0.88,
"flags": [],
"next_action": "ready_for_translation"
}
Quality Standards
- •Every period-specific term identified
- •All idioms/expressions flagged for attention
- •Marie's errors documented with context
- •Foreign language passages identified with source
- •Ambiguous passages flagged with confidence < 0.65
- •Consistent formatting across all annotations
Common Annotation Patterns (from Books 01-02)
Most Frequent Categories
Based on processing 3,500+ annotations across 240+ entries:
1. Code-Switching (30-40% of annotations) Marie frequently switches languages mid-sentence:
- •English: Most common. Used for emotional intensity, sophistication, privacy. Examples: "heart-broken", "bewilderment", "nonsenses", "waterproof", "fashionable"
- •Italian: Musical terms, expressions. Examples: "a piu non posso", "buia compagna", "cosa rarissima"
- •Russian: Family terms, diminutives. Examples: "diadia" (uncle), "Moussia" (Marie's nickname), patronymics
- •Latin: Mock-formality, classical education. Examples: "propria persona", "nec plus ultra"
2. Period Vocabulary (25-30%) Terms with different 1870s meanings:
- •"toilette" → dressing/grooming process, NOT bathroom
- •"cabinet" → private study/office
- •"corsage" → bodice of dress, NOT flower arrangement
- •"figure" → face, NOT body shape
- •"celebrity" → notorious woman, NOT famous person
- •"position" → social standing, NOT location
- •"homme bien" → gentleman of good family (class marker)
3. Marie's Characteristic Quirks (15-20%)
- •Spelling errors: "excelent", "mariage" (with English r), "throught"
- •Neologisms: "meprisation", "caricaturisee", "extrarisible"
- •Excessive punctuation: "!!!!!!", "H!", "I!"
- •Animal metaphors for disliked people: "cochon", "hippopotame", "vipere", "punaises"
- •Parenthetical outfit notes: "(robe verte, bien)", "(toilette grise, mal)"
- •Self-address: switching between "tu" and "vous" for herself
4. Expressions/Idioms (15-20%)
- •"avoir beau" → no matter how much one tries
- •"faire des façons" → to stand on ceremony
- •"châteaux en Espagne" → castles in the air (pipe dreams)
- •"sur des épingles" → on pins and needles
- •"à la bonne heure" → that's more like it (NOT time-related)
5. Register Markers (5-10%) Social class indicators:
- •"comme il faut" → socially proper
- •"femme du monde" → society woman
- •"cocotte" → kept woman/courtesan
- •"canaille" → riffraff (Marie's contempt)
- •"bataclan/bataclaniers" → Marie's invented term for social bustle/riffraff
Annotation Density Guidelines
Annotation count varies significantly by entry content:
- •Light entries (brief, routine): 5-15 annotations
- •Standard entries (typical day): 15-30 annotations
- •Dense entries (emotional, eventful): 30-60 annotations
- •Exceptional entries (major events, extensive reflection): 60+ annotations
October entries during Hamilton's engagement announcement averaged 35+ annotations due to emotional intensity and extensive English code-switching.
Batch Processing Notes
When processing multiple entries:
- •Read skill file first (this document)
- •Process entries sequentially within your batch
- •Maintain timestamp consistency within your batch
- •Report aggregate statistics in JSON output
- •Flag any entries requiring human review
Expected output per batch of 7-12 entries: 80-200 annotations total
Useful Commands
Find entries missing annotations:
just find-missing "LAN:" content/_original/Book_03 # Entries without LAN annotations just find-missing "RSR:" content/_original/Book_03 # Entries without RSR (need research first)
This helps quickly identify which entries in a book still need linguistic annotation.
Glossary tools (if you notice a misplaced or missing glossary entry while annotating):
just glossary-find ID # Check if an entity has a glossary entry just glossary-search PATTERN # Search by partial name just glossary-missing # List referenced entries that don't exist
If you find a glossary entry in the wrong category or a duplicate, message the team lead — they can run just glossary-move or just glossary-merge to fix it.