Format Implementation Guide
Adding New Formats
Step-by-Step Process
- Create format file: `iterable/datatypes/<format>.py`
- Implement BaseIterable: Inherit from `BaseIterable` in `iterable/base.py`
- Required methods: `read()`, `write()`, `read_bulk()`, `write_bulk()`, etc.
- Add detection: Update `iterable/helpers/detect.py`
- Create tests: `tests/test_<format>.py`
- Update dependencies: Add optional dependency to `pyproject.toml` if needed
- Update documentation: Add format to docs
Implementation Pattern
```python
from iterable.base import BaseIterable


class NewFormatIterable(BaseIterable):
    def __init__(self, source, mode='r', **kwargs):
        super().__init__(source, mode, **kwargs)
        # Initialize format-specific resources

    def read(self):
        # Return iterator of dict objects
        pass

    def write(self, data):
        # Write dict objects to file
        pass

    def read_bulk(self, size=1000):
        # Bulk read for performance
        pass

    def write_bulk(self, data):
        # Bulk write for performance
        pass
```
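To make the skeleton concrete, here is a minimal sketch of a line-oriented format in which each record is one line of tab-separated `key=value` pairs. The `BaseIterable` stub is a stand-in so the example runs on its own; the real class in `iterable/base.py` may expect a different constructor or additional methods. The format, the class name `KeyValueIterable`, and the assumption that `read_bulk()` returns the next batch as a list are all hypothetical.

```python
import io
import itertools


class BaseIterable:
    """Stand-in for iterable.base.BaseIterable (assumed interface)."""

    def __init__(self, source, mode="r", **kwargs):
        self.source = source
        self.mode = mode
        self.options = kwargs


class KeyValueIterable(BaseIterable):
    """Hypothetical format: one record per line, tab-separated key=value pairs."""

    def __init__(self, source, mode="r", **kwargs):
        super().__init__(source, mode, **kwargs)
        # Accept either a path or an already-open text stream.
        self._owns_file = isinstance(source, str)
        self._fh = open(source, mode, encoding="utf-8") if self._owns_file else source

    def read(self):
        # Generator: yields one dict per non-empty line, so memory use stays flat.
        for line in self._fh:
            line = line.rstrip("\n")
            if line:
                yield dict(pair.split("=", 1) for pair in line.split("\t"))

    def write(self, data):
        self._fh.write("\t".join(f"{k}={v}" for k, v in data.items()) + "\n")

    def read_bulk(self, size=1000):
        # Assumed semantics: return the next batch of up to `size` records.
        return list(itertools.islice(self.read(), size))

    def write_bulk(self, data):
        # Hand every encoded line to the stream in a single writelines() call.
        self._fh.writelines(
            "\t".join(f"{k}={v}" for k, v in rec.items()) + "\n" for rec in data
        )

    def close(self):
        # Only close handles this object opened itself.
        if self._owns_file:
            self._fh.close()


# Round trip through an in-memory stream. Values come back as strings,
# because this toy format stores no type information.
buffer = io.StringIO()
writer = KeyValueIterable(buffer, mode="w")
writer.write({"id": "1", "name": "a"})
writer.write_bulk([{"id": "2", "name": "b"}, {"id": "3", "name": "c"}])

buffer.seek(0)
reader = KeyValueIterable(buffer)
first_batch = reader.read_bulk(2)
remaining = list(reader.read())
```

Writing `read()` as a generator keeps the class streaming by default, and `read_bulk()` can then be a thin batching layer over it.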
Format Detection
Update `iterable/helpers/detect.py`:
- Add file extension detection
- Add magic number detection (for binary formats)
- Add content-based heuristics if needed
- Update the `detect_file_type()` function
Example:
```python
def detect_file_type(filename, content=None):
    # Check extension
    if filename.endswith('.newformat'):
        return 'newformat'
    # Check magic numbers
    if content and content.startswith(b'MAGIC'):
        return 'newformat'
    # ... existing detection logic
```
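As the number of formats grows, the three detection layers (extension, magic number, content heuristic) are often easier to maintain as lookup tables than as a chain of `if` statements. The sketch below shows one way that could look. The function name `sniff_format` and both tables are hypothetical and do not reflect the actual contents of `detect.py`; the byte signatures are the standard leading bytes for gzip, bzip2, Zstandard, and Parquet files.

```python
import os

# Extension -> format name. Entries are illustrative only.
EXTENSIONS = {".csv": "csv", ".jsonl": "jsonl", ".parquet": "parquet"}

# Leading-byte signatures, checked in order.
MAGIC_NUMBERS = [
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bz2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"PAR1", "parquet"),
]


def sniff_format(filename, content=None):
    """Return a format name, or None if nothing matches."""
    # 1. File extension (case-insensitive).
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSIONS:
        return EXTENSIONS[ext]
    # 2. Magic numbers, when the caller supplied the first bytes of the file.
    if content:
        for signature, name in MAGIC_NUMBERS:
            if content.startswith(signature):
                return name
    # 3. Content heuristic: data that begins with '{' is probably JSON Lines.
    if content and content.lstrip().startswith(b"{"):
        return "jsonl"
    return None
```

Returning `None` instead of raising leaves it to the caller to decide whether an unknown file is an error.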
Adding New Codecs
Step-by-Step Process
- Create codec file: `iterable/codecs/<codec>codec.py`
- Implement codec class: `read()`, `write()`, `close()` methods
- Add detection: Update `iterable/helpers/detect.py`
- Add compression detection: Update format detection logic
- Create tests: Add to relevant test file or create new one
- Update dependencies: Add optional dependency to `pyproject.toml`
Codec Pattern
```python
class NewCodec:
    def __init__(self, fileobj, mode='r'):
        self.fileobj = fileobj
        self.mode = mode
        # Initialize compression library

    def read(self, size=-1):
        # Decompress and return data
        pass

    def write(self, data):
        # Compress and write data
        pass

    def close(self):
        # Clean up resources
        pass
```
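For illustration, the pattern can be filled in with the standard-library `gzip` module. This is a sketch under the assumption that a codec wraps an already-open binary file object; it is not the code in `iterable/codecs/gzipcodec.py`, and the class name `GzipCodecSketch` is made up for this example.

```python
import gzip
import io


class GzipCodecSketch:
    """Illustrative codec built on the standard-library gzip module."""

    def __init__(self, fileobj, mode="r"):
        self.fileobj = fileobj
        self.mode = mode
        # GzipFile works on bytes, so map 'r'/'w' to 'rb'/'wb'.
        self._gz = gzip.GzipFile(fileobj=fileobj, mode=mode[0] + "b")

    def read(self, size=-1):
        # Decompress and return up to `size` bytes (all remaining if -1).
        return self._gz.read(size)

    def write(self, data):
        # Compress `data` (bytes) and write it to the underlying file object.
        return self._gz.write(data)

    def close(self):
        # Flushes the gzip trailer. The wrapped fileobj stays open, so the
        # caller remains responsible for closing it.
        self._gz.close()


# Round trip through an in-memory buffer.
raw = io.BytesIO()
encoder = GzipCodecSketch(raw, "w")
encoder.write(b"hello " * 100)
encoder.close()

raw.seek(0)
decoder = GzipCodecSketch(raw, "r")
restored = decoder.read()
decoder.close()
```

Leaving the wrapped file object open on `close()` follows the behaviour of `gzip.GzipFile` itself when it is given a `fileobj`.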
Testing Requirements
Test File Structure
```python
import pytest
from iterable.helpers.detect import open_iterable


class TestNewFormat:
    def test_read(self):
        # Test basic reading
        pass

    def test_write(self):
        # Test basic writing
        pass

    def test_read_bulk(self):
        # Test bulk operations
        pass

    def test_compressed(self):
        # Test with compression (.gz, .bz2, .zst, etc.)
        pass

    def test_edge_cases(self):
        # Empty files, malformed data, etc.
        pass
```
Test Coverage
- Basic read/write operations
- Bulk operations
- Compressed files (if supported)
- Various encodings (for text formats)
- Edge cases: empty files, malformed data
- Missing optional dependencies (should skip gracefully)
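A few of these coverage items can be sketched as plain test functions that pytest would collect. The example uses the standard-library `csv` and `gzip` modules as stand-ins for a real format class, so the helpers `write_rows` and `read_rows` are hypothetical. The `HAS_ZSTD` flag shows one way to detect an optional library (here `zstandard`, as an example) without importing it; in a pytest suite such a flag could feed `pytest.mark.skipif`, or `pytest.importorskip` could be used instead.

```python
import csv
import gzip
import importlib.util
import os
import tempfile

# True when the optional library is installed; lets a test be skipped
# rather than fail when the dependency is missing.
HAS_ZSTD = importlib.util.find_spec("zstandard") is not None


def write_rows(path, rows, opener=open):
    # `opener` is either the built-in open or gzip.open.
    with opener(path, "wt", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)


def read_rows(path, opener=open):
    with opener(path, "rt", newline="", encoding="utf-8") as fh:
        return list(csv.DictReader(fh))


def test_roundtrip_compressed():
    # Covers compression and non-ASCII text in a single round trip.
    rows = [{"id": "1", "name": "Zoë"}, {"id": "2", "name": "Ōta"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "data.csv.gz")
        write_rows(path, rows, opener=gzip.open)
        assert read_rows(path, opener=gzip.open) == rows


def test_empty_file_yields_no_rows():
    # Edge case: a zero-byte file should produce no records, not an error.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "empty.csv")
        open(path, "w").close()
        assert read_rows(path) == []
```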
Dependencies
Optional Dependencies
Add to `pyproject.toml`:
```toml
[project.optional-dependencies]
newformat = ["newformat-library>=1.0.0"]
```
Import Handling
Handle missing dependencies gracefully:
```python
try:
    import newformat_library
except ImportError as exc:
    # Re-raise with install instructions, keeping the original error chained
    raise ImportError(
        "newformat support requires 'newformat-library'. "
        "Install with: pip install iterabledata[newformat]"
    ) from exc
```
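If a format module is imported eagerly when the package loads, a module-level check like the one above would fail for every user, including those who never touch that format. One alternative is to defer the import until the format is first used. The helper `require_optional` and the class `NewFormatReader` below are hypothetical names used only to illustrate the idea.

```python
import importlib


def require_optional(module_name, extra, package="iterabledata"):
    """Import an optional dependency, or raise an ImportError with install help.

    Hypothetical helper, not part of the library.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{extra} support requires '{module_name}'. "
            f"Install with: pip install {package}[{extra}]"
        ) from exc


class NewFormatReader:
    def __init__(self, source):
        # The import happens here, at first use, not when the package loads.
        self._lib = require_optional("newformat_library", "newformat")
        self.source = source
```

With this approach the error surfaces only when someone actually opens a file of that format, and the message still says which extra to install.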
Format Capabilities
Implement capability reporting:
```python
def get_capabilities(self):
    return {
        'read': True,
        'write': True,
        'bulk': True,
        'totals': False,  # Can't count rows without reading
        'streaming': True,
        'tables': False,  # Single table format
    }
```
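Calling code can use such a dictionary to choose a code path before attempting an operation. The sketch below assumes only that `get_capabilities()` returns a dict of booleans as shown above; the `supports` helper and the `CsvLike` class are hypothetical.

```python
class CsvLike:
    """Hypothetical format class reporting the capabilities shown above."""

    def get_capabilities(self):
        return {
            "read": True,
            "write": True,
            "bulk": True,
            "totals": False,
            "streaming": True,
            "tables": False,
        }


def supports(iterable, feature):
    """Return True if the iterable reports `feature`; unknown keys count as False."""
    return bool(iterable.get_capabilities().get(feature, False))


def enabled_features(iterable):
    """Return the names of all enabled capabilities, sorted alphabetically."""
    caps = iterable.get_capabilities()
    return sorted(name for name, enabled in caps.items() if enabled)
```

Treating unknown keys as unsupported means a caller that asks about a capability added later still gets a safe answer from older format classes.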
Examples
Look at existing implementations:
- `iterable/datatypes/csv.py` - Text format example
- `iterable/datatypes/parquet.py` - Binary format example
- `iterable/codecs/gzipcodec.py` - Compression codec example
Common Pitfalls
- Memory issues: Use streaming for large files
- Encoding: Handle various text encodings automatically
- Compression: Support common codecs (gzip, bz2, zstd, etc.)
- Error messages: Provide helpful errors for missing dependencies
- Context managers: Always support `with` statements
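The streaming and context-manager points can be sketched together. The class below is a hypothetical, self-contained example that reads a text file line by line and closes its handle when the `with` block exits; it does not reflect how `BaseIterable` implements either feature.

```python
import os
import tempfile


class ManagedIterable:
    """Hypothetical iterable showing streaming reads and `with` support."""

    def __init__(self, path, mode="r", encoding="utf-8"):
        self._fh = open(path, mode, encoding=encoding)

    def read(self):
        # Stream line by line instead of loading the whole file into memory.
        for line in self._fh:
            yield line.rstrip("\n")

    def close(self):
        if not self._fh.closed:
            self._fh.close()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always release the file handle; returning False lets exceptions propagate.
        self.close()
        return False


def demo():
    """Write a two-line file, read it back, and report whether the handle closed."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("first\nsecond\n")
        with ManagedIterable(path) as it:
            lines = list(it.read())
        return lines, it._fh.closed
```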