Overview
Indexing in NumPy ranges from basic slicing (zero-copy) to advanced "fancy" indexing (always creates a copy). Understanding the distinction is vital for memory management and avoiding unintended side effects in data analysis.
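To make the view/copy distinction concrete, here is a minimal sketch (the array and variable names are illustrative, not from the original) that writes through a slice and through a fancy-indexed result and checks which write reaches the original array.

```python
import numpy as np

arr = np.arange(10)

# Basic slicing returns a view: it shares memory with arr.
view = arr[2:5]
view[0] = 100            # also changes arr[2]

# Advanced (fancy) indexing returns a copy with its own data.
picked = arr[[2, 3, 4]]
picked[0] = -1           # arr is unaffected

print(arr[2])                         # 100
print(np.shares_memory(arr, view))    # True
print(np.shares_memory(arr, picked))  # False
```

The write through the slice is the kind of unintended side effect the overview warns about; the write through the fancy-indexed result stays local.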
When to Use
- Extracting sub-regions of arrays for processing.
- Filtering data based on complex conditional logic (boolean masking).
- Selecting arbitrary elements using coordinate lists.
- Managing memory when dealing with large datasets that have small regions of interest.
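As a small illustration of the "coordinate lists" case, the sketch below (with a made-up 3x4 array) passes one list of row indices and one list of column indices; NumPy pairs them element-wise rather than forming a grid.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

rows = [0, 1, 2]
cols = [3, 0, 2]

# The index lists are paired element-wise: (0, 3), (1, 0), (2, 2).
picked = grid[rows, cols]
print(picked)  # [ 3  4 10]
```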
Decision Tree
- Do you need a view or a copy?
  - View: Use basic slicing (arr[0:5]).
  - Copy: Use advanced indexing (arr[[0, 1, 2]]) or .copy().
- Are you filtering by value?
  - Use a boolean mask: arr[arr > threshold].
- Selecting a grid of values across axes?
  - Use np.ix_ to construct the selection mesh.
Workflows
- Filtering Data with Boolean Masks
  - Apply a comparison operator (e.g., x > 0) to an array to create a boolean mask.
  - Pass the mask into the array's indexing brackets: x[mask].
  - Operate on the resulting array (note that this is a copy, not a view).
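The boolean-mask steps above can be sketched as follows. The sample values are invented for illustration; the last two lines show, for contrast, that assigning through a mask does modify the original in place even though reading through a mask yields a copy.

```python
import numpy as np

x = np.array([-2, 5, 0, 7, -1])

mask = x > 0             # boolean array, same shape as x
positives = x[mask]      # new array holding 5 and 7

positives[0] = 99        # changes only the copy
print(x)                 # [-2  5  0  7 -1]

# Assignment through a mask writes into x itself.
x[x < 0] = 0
print(x)                 # [0 5 0 7 0]
```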
- Memory-Efficient Sub-array Extraction
  - Slice a small portion from a large ndarray.
  - Call .copy() on the slice to create a new independent array.
  - Delete the original large array to free system memory.
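A minimal sketch of the extraction steps above, assuming a 1000x1000 float64 array as a stand-in for a large dataset. The `.base` attribute is used here only to show which array owns the underlying buffer.

```python
import numpy as np

big = np.zeros((1000, 1000))   # stand-in for a large dataset (about 8 MB)

roi_view = big[:5, :5]         # view: holds a reference to big
print(roi_view.base is big)    # True

roi = big[:5, :5].copy()       # independent 5x5 array
print(roi.base is None)        # True

# Drop every reference to the large array, including views of it,
# so that its buffer can be reclaimed.
del roi_view, big
print(roi.nbytes)              # 200
```

Note that deleting `big` alone would not be enough while `roi_view` still exists, since the view keeps the full buffer reachable.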
- Cross-Axis Selection with np.ix_
  - Define row indices and column indices as separate lists.
  - Pass them into np.ix_ to construct the appropriate broadcasting meshes.
  - Apply the resulting objects to the array to select a sub-grid of values.
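The np.ix_ workflow above might look like this in practice. The 4x5 array and the chosen indices are illustrative assumptions.

```python
import numpy as np

arr = np.arange(20).reshape(4, 5)

rows = [0, 2]
cols = [1, 3, 4]

# np.ix_ returns one index array per axis, here shaped (2, 1) and (1, 3).
# They broadcast against each other to cover every (row, col) combination.
mesh = np.ix_(rows, cols)
sub = arr[mesh]
print(sub)
# [[ 1  3  4]
#  [11 13 14]]
```

Indexing with `arr[rows, cols]` directly would instead try to pair the two lists element-wise, which is why the mesh is needed for a sub-grid.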
Non-Obvious Insights
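- Memory Leak Risks: A small view keeps its entire base array alive, so the large array cannot be garbage-collected while the view exists; copy small slices of massive data when the rest is no longer needed.
- Copy vs. View Rule: Basic slicing always returns a view; advanced indexing (indexing with integer or boolean arrays, or with non-tuple sequences such as lists) always returns a copy.
- Adjacent Indexing: When basic and advanced indexing are mixed, the shape of the result depends on whether the advanced indices are adjacent in the index tuple. Adjacent advanced indices leave the broadcast dimension in place; advanced indices separated by a slice move it to the front of the result.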
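The adjacency effect is easiest to see by comparing result shapes. The sketch below uses an arbitrary 4-D array of zeros; only the shapes matter.

```python
import numpy as np

a = np.zeros((3, 4, 5, 6))
idx = [0, 1]

# Advanced indices on adjacent axes (1 and 2): the broadcast
# dimension of length 2 replaces those axes in place.
adjacent = a[:, idx, idx, :]
print(adjacent.shape)    # (3, 2, 6)

# Advanced indices separated by a slice (axes 1 and 3): the
# broadcast dimension moves to the front of the result.
separated = a[:, idx, :, idx]
print(separated.shape)   # (2, 3, 5)
```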
Evidence
- "All arrays generated by basic slicing are always views of the original array." Source
- "Advanced indexing always returns a copy of the data (contrast with basic slicing that returns a view)." Source
Scripts
- scripts/numpy-indexing_tool.py: Demonstrates boolean masking and sub-array extraction.
- scripts/numpy-indexing_tool.js: Simulated coordinate selection logic.
Dependencies
- numpy (Python)