numpy-best-practices

以 NumPy 数组编程、数值计算以及性能优化为重心，践行 Python 编程的最佳实践。

SKILL.md

--- frontmatter

name: numpy-best-practices
description: Best practices for NumPy array programming, numerical computing, and performance optimization in Python

NumPy Best Practices

Expert guidelines for NumPy development, focusing on array programming, numerical computing, and performance optimization.

Code Style and Structure

•Write concise, technical Python code with accurate NumPy examples
•Prefer vectorized operations over explicit loops for performance
•Use descriptive variable names reflecting data content (e.g., weights, gradients, input_array)
•Follow PEP 8 style guidelines for Python code
•Use functional programming patterns when appropriate

Array Creation and Manipulation

•Use appropriate array creation functions: np.array(), np.zeros(), np.ones(), np.empty(), np.arange(), np.linspace()
•Prefer np.zeros() or np.empty() for pre-allocation when array size is known
•Use np.concatenate(), np.vstack(), np.hstack() for combining arrays
•Leverage broadcasting for operations on arrays with different shapes

Indexing and Slicing

•Use advanced indexing with boolean arrays for conditional selection
•Prefer views over copies when possible to save memory
•Use np.where() for conditional element selection
•Understand the difference between fancy indexing (creates copy) and basic slicing (creates view)

Data Types

•Specify appropriate data types explicitly using dtype parameter
•Use np.float32 for memory-efficient computations when full precision is not needed
•Be aware of integer overflow with fixed-size integer types
•Use np.asarray() for type conversion without unnecessary copies

Performance Optimization

Vectorization

•Always prefer vectorized operations over Python loops
•Use NumPy universal functions (ufuncs) for element-wise operations
•Leverage np.einsum() for complex tensor operations
•Use np.dot() or @ operator for matrix multiplication

Memory Management

•Use np.ndarray.flags to check memory layout (C-contiguous vs Fortran-contiguous)
•Prefer in-place operations with out parameter when possible
•Use memory-mapped arrays (np.memmap) for large datasets
•Be mindful of array copies vs views

Computation Efficiency

•Use np.sum(), np.mean(), np.std() with axis parameter for aggregations
•Leverage np.cumsum(), np.cumprod() for cumulative operations
•Use np.searchsorted() for efficient sorted array operations

Error Handling and Validation

•Validate input shapes and data types before computations
•Use assertions for dimension checking with informative messages
•Handle NaN and Inf values appropriately with np.isnan(), np.isinf()
•Use np.errstate() context manager for controlling floating-point error handling

Random Number Generation

•Use np.random.default_rng() for modern random number generation
•Set seeds for reproducibility: rng = np.random.default_rng(seed=42)
•Prefer the new Generator API over legacy np.random functions
•Use appropriate distributions: rng.normal(), rng.uniform(), rng.choice()

Linear Algebra

•Use np.linalg for linear algebra operations
•Leverage np.linalg.solve() instead of computing inverse for linear systems
•Use np.linalg.eig(), np.linalg.svd() for decompositions
•Check matrix condition with np.linalg.cond() before inversion

Testing and Documentation

•Write unit tests using pytest with np.testing assertions
•Use np.testing.assert_array_equal() for exact comparisons
•Use np.testing.assert_array_almost_equal() for floating-point comparisons
•Include comprehensive docstrings following NumPy docstring format

Key Conventions

•Import as import numpy as np
•Use snake_case for variables and functions
•Document array shapes in docstrings
•Profile code with %timeit to identify bottlenecks