AgentSkillsCN

parser-development

在 Biome 中,为新语言构建具备错误恢复功能的解析器指南。适用于为 JavaScript、CSS、JSON、HTML、GraphQL 创建解析器,或为新增语言提供支持时使用。示例:<example>用户需为新语言添加解析支持</example><example>用户希望在解析器中集成错误恢复机制</example><example>用户正以 .ungram 格式编写语法规则</example>

SKILL.md
--- frontmatter
name: parser-development
description: Guide for implementing parsers with error recovery for new languages in Biome. Use when creating parsers for JavaScript, CSS, JSON, HTML, GraphQL, or adding new language support. Examples:<example>User needs to add parsing support for a new language</example><example>User wants to implement error recovery in parser</example><example>User is writing grammar definitions in .ungram format</example>

Purpose

Use this skill when creating or modifying Biome's parsers. Covers grammar authoring with ungrammar, lexer implementation, error recovery strategies, and list parsing patterns.

Prerequisites

  1. Install required tools: just install-tools
  2. Understand the language syntax you're implementing
  3. Read crates/biome_parser/CONTRIBUTING.md for detailed concepts

Common Workflows

Create Grammar for New Language

Create a .ungram file in xtask/codegen/ (e.g., html.ungram):

code
// html.ungram
// Legend:
//   Name =                -- non-terminal definition
//   'ident'               -- token (terminal)
//   A B                   -- sequence
//   A | B                 -- alternation
//   A*                    -- zero or more repetition
//   (A (',' A)* ','?)     -- repetition with separator and optional trailing comma
//   A?                    -- zero or one repetition
//   label:A               -- suggested name for field

HtmlRoot = element*

HtmlElement =
  '<'
  tag_name: HtmlName
  attributes: HtmlAttributeList
  '>'
  children: HtmlElementList
  '<' '/' close_tag_name: HtmlName '>'

HtmlAttributeList = HtmlAttribute*

HtmlAttribute =
  | HtmlSimpleAttribute
  | HtmlBogusAttribute

HtmlSimpleAttribute =
  name: HtmlName
  '='
  value: HtmlString

HtmlBogusAttribute = /* error recovery node */

Naming conventions:

  • Prefix all nodes with language name: HtmlElement, CssRule
  • Unions start with Any: AnyHtmlAttribute
  • Error recovery nodes use Bogus: HtmlBogusAttribute
  • Lists end with List: HtmlAttributeList
  • Lists are mandatory (never optional), empty by default

Generate Parser from Grammar

shell
# Generate for specific language
just gen-grammar html

# Generate for multiple languages
just gen-grammar html css

# Generate all grammars
just gen-grammar

This creates:

  • biome_html_syntax/src/generated/ - Node definitions
  • biome_html_factory/src/generated/ - Node construction helpers
  • Parser skeleton files (you'll implement the actual parsing logic)

Implement a Lexer

Create lexer/mod.rs in your parser crate:

rust
use biome_html_syntax::HtmlSyntaxKind;
use biome_parser::{lexer::Lexer, ParseDiagnostic};

pub(crate) struct HtmlLexer<'source> {
    source: &'source str,
    position: usize,
    current_kind: HtmlSyntaxKind,
    diagnostics: Vec<ParseDiagnostic>,
}

impl<'source> Lexer<'source> for HtmlLexer<'source> {
    const NEWLINE: Self::Kind = HtmlSyntaxKind::NEWLINE;
    const WHITESPACE: Self::Kind = HtmlSyntaxKind::WHITESPACE;
    
    type Kind = HtmlSyntaxKind;
    type LexContext = ();
    type ReLexContext = ();

    fn source(&self) -> &'source str {
        self.source
    }

    fn current(&self) -> Self::Kind {
        self.current_kind
    }

    fn position(&self) -> usize {
        self.position
    }

    fn advance(&mut self, context: Self::LexContext) -> Self::Kind {
        // Implement token scanning logic
        let start = self.position;
        let kind = self.read_next_token();
        self.current_kind = kind;
        kind
    }
    
    // Implement other required methods...
}

Implement Token Source

rust
use biome_parser::lexer::BufferedLexer;
use biome_html_syntax::HtmlSyntaxKind;
use crate::lexer::HtmlLexer;

pub(crate) struct HtmlTokenSource<'src> {
    lexer: BufferedLexer<HtmlSyntaxKind, HtmlLexer<'src>>,
}

impl<'source> TokenSourceWithBufferedLexer<HtmlLexer<'source>> for HtmlTokenSource<'source> {
    fn lexer(&mut self) -> &mut BufferedLexer<HtmlSyntaxKind, HtmlLexer<'source>> {
        &mut self.lexer
    }
}

Write Parse Rules

Example: Parsing an if statement:

rust
use biome_parser::prelude::*;
use biome_js_syntax::JsSyntaxKind::*;

fn parse_if_statement(p: &mut JsParser) -> ParsedSyntax {
    // Presence test - return Absent if not at 'if'
    if !p.at(T![if]) {
        return Absent;
    }

    let m = p.start();

    // Parse required tokens
    p.expect(T![if]);
    p.expect(T!['(']);
    
    // Parse required nodes with error recovery
    parse_any_expression(p).or_add_diagnostic(p, expected_expression);
    
    p.expect(T![')']);
    parse_block_statement(p).or_add_diagnostic(p, expected_block);
    
    // Parse optional else clause
    if p.at(T![else]) {
        parse_else_clause(p).ok();
    }

    Present(m.complete(p, JS_IF_STATEMENT))
}

Parse Lists with Error Recovery

Use ParseSeparatedList for comma-separated lists:

rust
struct ArrayElementsList;

impl ParseSeparatedList for ArrayElementsList {
    type ParsedElement = CompletedMarker;

    fn parse_element(&mut self, p: &mut Parser) -> ParsedSyntax<Self::ParsedElement> {
        parse_array_element(p)
    }

    fn is_at_list_end(&self, p: &mut Parser) -> bool {
        // Stop at array closing bracket or file end
        p.at(T![']']) || p.at(EOF)
    }

    fn recover(
        &mut self,
        p: &mut Parser,
        parsed_element: ParsedSyntax<Self::ParsedElement>,
    ) -> RecoveryResult {
        parsed_element.or_recover(
            p,
            &ParseRecoveryTokenSet::new(
                JS_BOGUS_EXPRESSION,
                token_set![T![']'], T![,]]
            ),
            expected_array_element,
        )
    }
    
    fn separating_element_kind(&mut self) -> JsSyntaxKind {
        T![,]
    }
}

// Use the list parser
fn parse_array_elements(p: &mut Parser) -> CompletedMarker {
    let m = p.start();
    ArrayElementsList.parse_list(p);
    m.complete(p, JS_ARRAY_ELEMENT_LIST)
}

Implement Error Recovery

Error recovery wraps invalid tokens in BOGUS nodes:

rust
// Recovery set includes:
// - List terminator tokens (e.g., ']', '}')
// - Statement terminators (e.g., ';')
// - List separators (e.g., ',')
let recovery_set = token_set![T![']'], T![,], T![;]];

parsed_element.or_recover(
    p,
    &ParseRecoveryTokenSet::new(JS_BOGUS_EXPRESSION, recovery_set),
    expected_expression_error,
)

Handle Conditional Syntax

For syntax only valid in certain contexts (e.g., strict mode):

rust
fn parse_with_statement(p: &mut Parser) -> ParsedSyntax {
    if !p.at(T![with]) {
        return Absent;
    }

    let m = p.start();
    p.bump(T![with]);
    parenthesized_expression(p).or_add_diagnostic(p, expected_expression);
    parse_statement(p).or_add_diagnostic(p, expected_statement);
    
    let with_stmt = m.complete(p, JS_WITH_STATEMENT);

    // Mark as invalid in strict mode
    let conditional = StrictMode.excluding_syntax(p, with_stmt, |p, marker| {
        p.err_builder(
            "`with` statements are not allowed in strict mode",
            marker.range(p)
        )
    });

    Present(conditional.or_invalid_to_bogus(p))
}

Test Parser

Create test files in tests/:

code
crates/biome_html_parser/tests/
├── html_specs/
│   ├── ok/
│   │   ├── simple_element.html
│   │   └── nested_elements.html
│   └── error/
│       ├── unclosed_tag.html
│       └── invalid_syntax.html
└── html_test.rs

Run tests:

shell
cd crates/biome_html_parser
cargo test

Tips

  • Presence test: Always return Absent if the first token doesn't match - never progress parsing before returning Absent
  • Required vs optional: Use p.expect() for required tokens, p.eat() for optional ones
  • Missing markers: Use .or_add_diagnostic() for required nodes to add missing markers and errors
  • Error recovery: Include list terminators, separators, and statement boundaries in recovery sets
  • Bogus nodes: Check grammar for which BOGUS_* node types are valid in your context
  • Checkpoints: Use p.checkpoint() to save state and p.rewind() if parsing fails
  • Lookahead: Use p.at() to check tokens, p.nth_at() for lookahead beyond current token
  • Lists are mandatory: Always create list nodes even if empty - use parse_list() not parse_list().ok()

Common Patterns

rust
// Optional token
if p.eat(T![async]) {
    // handle async
}

// Required token with error
p.expect(T!['{']);

// Optional node
parse_type_annotation(p).ok();

// Required node with error
parse_expression(p).or_add_diagnostic(p, expected_expression);

// Lookahead
if p.at(T![if]) || p.at(T![for]) {
    // handle control flow
}

// Checkpoint for backtracking
let checkpoint = p.checkpoint();
if parse_something(p).is_absent() {
    p.rewind(checkpoint);
    parse_something_else(p);
}

References

  • Full guide: crates/biome_parser/CONTRIBUTING.md
  • Grammar examples: xtask/codegen/*.ungram
  • Parser examples: crates/biome_js_parser/src/syntax/
  • Error recovery: Search for ParseRecoveryTokenSet in existing parsers