IterableData Development
Quick Setup
bash
pip install -e ".[dev]" pytest --verbose ruff check iterable tests ruff format iterable tests
Code Style
- •Python 3.10+ with type hints where appropriate
- •Max line length: 120 characters
- •Use
rufffor linting and formatting - •Double quotes for strings consistently
- •Always use context managers for file operations
- •Import order: standard library, third-party, local imports
Project Structure
- •
iterable/helpers/- Utility functions (detect, schema, utils) - •
iterable/datatypes/- Format-specific implementations - •
iterable/codecs/- Compression codec implementations - •
iterable/engines/- Processing engines (DuckDB, internal) - •
iterable/convert/- Format conversion utilities - •
iterable/pipeline/- Data pipeline processing - •
tests/- Test suite (one test file per format/feature)
Import Patterns
- •Main entry:
from iterable.helpers.detect import open_iterable - •Format-specific:
from iterable.datatypes.csv import CSVIterable - •Codecs:
from iterable.codecs.gzipcodec import GZIPCodec - •Always use
open_iterable()for user-facing code
File Handling
- •Always use context managers:
with open_iterable('file.csv') as source: - •Never call
.close()when usingwithstatements - •Reset iterators with
.reset()method when needed - •Handle compression automatically via filename detection
Error Handling
- •Format detection failures: provide helpful error messages
- •Missing optional dependencies: raise clear ImportError with installation instructions
- •Invalid file formats: raise appropriate exceptions (ValueError, TypeError)
- •Always handle file I/O errors gracefully
Code Conventions
- •Use
open_iterable()for automatic format detection - •Prefer bulk operations (
read_bulk,write_bulk) for performance - •Use DuckDB engine when appropriate (CSV, JSONL files)
- •Handle encoding automatically via
chardetor user specification
Pre-Commit Checks
Before committing:
- •Run tests:
pytest --verbose - •Run linter:
ruff check iterable tests - •Format code:
ruff format iterable tests - •Type check:
mypy iterable(warnings allowed)
Quality Tools
- •Security:
bandit -r iterable -ll - •Dead code:
vulture iterable --min-confidence 80 - •Complexity:
radon cc iterable --min B - •Coverage:
pytest --cov=iterable --cov-report=html
Known Constraints
- •DuckDB engine supports: CSV, JSONL, JSON formats and GZIP, ZStandard codecs
- •Large files: use streaming (iterator interface) to avoid memory issues
- •XML parsing: requires
iterableargs={'tagname': 'item'} - •Some formats require optional dependencies (see
pyproject.toml)