iterabledata-development

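Workflow for creating, validating, and implementing OpenSpec proposals. Use this skill when you need to draft a change proposal, implement a specification, or follow OpenSpec conventions.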

SKILL.md

--- frontmatter

name: iterabledata-development
description: Core development workflows, patterns, and conventions for IterableData. Use when implementing features, fixing bugs, or working with the codebase structure.

IterableData Development

Quick Setup

bash

pip install -e ".[dev]"
pytest --verbose
ruff check iterable tests
ruff format iterable tests

Code Style

•Python 3.10+ with type hints where appropriate
•Max line length: 120 characters
•Use ruff for linting and formatting
•Double quotes for strings consistently
•Always use context managers for file operations
•Import order: standard library, third-party, local imports

Project Structure

•iterable/helpers/ - Utility functions (detect, schema, utils)
•iterable/datatypes/ - Format-specific implementations
•iterable/codecs/ - Compression codec implementations
•iterable/engines/ - Processing engines (DuckDB, internal)
•iterable/convert/ - Format conversion utilities
•iterable/pipeline/ - Data pipeline processing
•tests/ - Test suite (one test file per format/feature)

Import Patterns

•Main entry: from iterable.helpers.detect import open_iterable
•Format-specific: from iterable.datatypes.csv import CSVIterable
•Codecs: from iterable.codecs.gzipcodec import GZIPCodec
•Always use open_iterable() for user-facing code

File Handling

•Always use context managers: with open_iterable("file.csv") as source:
•Never call .close() inside a with block; the context manager closes the source on exit
•Reset iterators with the .reset() method when needed
•Handle compression automatically via filename detection
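
The rules above can be illustrated with a small standard-library sketch. LineSource and the OPENERS table are hypothetical stand-ins, not IterableData's implementation; they only show the shape of the pattern: choose the opener from the filename suffix, let the with statement close the file, and rewind with reset().

```python
import bz2
import gzip
import lzma
import os
import tempfile

# Hypothetical suffix-to-opener table; the library's real detection logic may differ.
OPENERS = {".gz": gzip.open, ".bz2": bz2.open, ".xz": lzma.open}


class LineSource:
    """Illustrative context-managed line reader with reset()."""

    def __init__(self, path):
        self.path = path
        suffix = os.path.splitext(path)[1].lower()
        opener = OPENERS.get(suffix, open)  # plain open() when no codec matches
        self._fh = opener(path, "rt", encoding="utf-8")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # The with statement closes the file; callers never call close() themselves.
        self._fh.close()
        return False

    def __iter__(self):
        return (line.rstrip("\n") for line in self._fh)

    def reset(self):
        # Rewind so the same source can be iterated again.
        self._fh.seek(0)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.csv.gz")
        with gzip.open(path, "wt", encoding="utf-8") as out:
            out.write("id,name\n1,alpha\n")
        with LineSource(path) as source:
            first_pass = list(source)
            source.reset()
            second_pass = list(source)
    return first_pass, second_pass
```

In user-facing code the same shape would use open_iterable() in place of LineSource, as the first bullet describes.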

Error Handling

•Format detection failures: provide helpful error messages
•Missing optional dependencies: raise clear ImportError with installation instructions
•Invalid file formats: raise appropriate exceptions (ValueError, TypeError)
•Always handle file I/O errors gracefully
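
As a rough sketch of the first three points, the helpers below show one way to produce actionable messages. The names require_optional, detect_format, and KNOWN_FORMATS are hypothetical and are not part of IterableData; the install hint is passed in by the caller because the real extras are listed in pyproject.toml.

```python
import importlib
import os

# Illustrative subset only; the library's real format registry is not shown in this document.
KNOWN_FORMATS = {".csv": "csv", ".jsonl": "jsonl", ".xml": "xml"}


def require_optional(module_name, install_hint):
    """Import an optional dependency or raise ImportError with install guidance."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"Optional dependency '{module_name}' is required for this format. "
            f"Install it with: {install_hint}"
        ) from exc


def detect_format(filename):
    """Map a filename to a format name, or fail with a message listing the options."""
    suffix = os.path.splitext(filename)[1].lower()
    if suffix not in KNOWN_FORMATS:
        supported = ", ".join(sorted(KNOWN_FORMATS))
        raise ValueError(
            f"Cannot detect format of '{filename}'; supported extensions: {supported}"
        )
    return KNOWN_FORMATS[suffix]
```

Chaining with "from exc" keeps the original traceback available, which helps when the import fails for a reason other than a missing package.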

Code Conventions

•Use open_iterable() for automatic format detection
•Prefer bulk operations (read_bulk, write_bulk) for performance
•Use DuckDB engine when appropriate (CSV, JSONL files)
•Handle encoding automatically via chardet or user specification
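
Bulk operations usually help because per-call overhead is paid once per batch instead of once per record. The signatures of read_bulk and write_bulk are not given here, so the sketch below uses a generic, hypothetical iter_batches helper to show the batching idea on any iterable.

```python
from itertools import islice


def iter_batches(records, batch_size=1000):
    """Yield lists of up to batch_size records from any iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Each yielded list could then be handed to a bulk writer, assuming that writer accepts a list of records.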

Pre-Commit Checks

Before committing:

•Run tests: pytest --verbose
•Run linter: ruff check iterable tests
•Format code: ruff format iterable tests
•Type check: mypy iterable (warnings allowed)
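
The checklist above could be automated with a small runner like the one below. This is a sketch, not a script shipped with the project: run_checks and the blocking flags are assumptions, with mypy marked non-blocking to reflect "warnings allowed".

```python
import subprocess
import sys

# Commands come from the checklist above; the blocking flags are an assumption.
CHECKS = [
    (["pytest", "--verbose"], True),
    (["ruff", "check", "iterable", "tests"], True),
    (["ruff", "format", "iterable", "tests"], True),
    (["mypy", "iterable"], False),  # warnings allowed, so never blocks
]


def run_checks(checks):
    """Run each command in order; return the blocking commands that failed."""
    failed = []
    for command, blocking in checks:
        try:
            returncode = subprocess.run(command, check=False).returncode
        except FileNotFoundError:
            returncode = 127  # the tool is not installed
        if returncode != 0 and blocking:
            failed.append(command)
    return failed


def main():
    """Report blocking failures on stderr and return a process exit code."""
    failed = run_checks(CHECKS)
    for command in failed:
        print("FAILED:", " ".join(command), file=sys.stderr)
    return 1 if failed else 0
```

Calling sys.exit(main()) from a script would make the result usable as a hook exit status.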

Quality Tools

•Security: bandit -r iterable -ll
•Dead code: vulture iterable --min-confidence 80
•Complexity: radon cc iterable --min B
•Coverage: pytest --cov=iterable --cov-report=html

Known Constraints

•DuckDB engine supports CSV, JSONL, and JSON formats with GZIP and ZStandard codecs
•Large files: use streaming (iterator interface) to avoid memory issues
•XML parsing: requires iterableargs={"tagname": "item"}
•Some formats require optional dependencies (see pyproject.toml)
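
A plausible reason for the tagname requirement is that XML has no built-in record boundary, so a streaming reader has to be told which element counts as one record. The sketch below shows that idea with the standard library only; iter_xml_records is a hypothetical helper and not how IterableData parses XML.

```python
import io
import xml.etree.ElementTree as ET


def iter_xml_records(stream, tagname):
    """Stream one dict per <tagname> element without loading the whole document."""
    for _event, element in ET.iterparse(stream, events=("end",)):
        if element.tag == tagname:
            yield {child.tag: child.text for child in element}
            element.clear()  # drop the finished element's children to save memory


def demo():
    xml_bytes = b"<items><item><id>1</id></item><item><id>2</id></item></items>"
    return list(iter_xml_records(io.BytesIO(xml_bytes), "item"))
```

Because records are yielded one at a time, the same generator shape also fits the streaming advice for large files.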