Light Curve Preprocessing
Preprocessing is essential before period analysis. Raw light curves often contain outliers, long-term trends, and instrumental effects that can mask real periodic signals or create false ones.
Overview
Common preprocessing steps:
- Remove outliers
- Remove long-term trends
- Handle data quality flags
- Remove stellar variability (optional)
Outlier Removal
Using Lightkurve
import lightkurve as lk

# Remove outliers using sigma clipping (lc is an existing LightCurve object)
lc_clean, mask = lc.remove_outliers(sigma=3, return_mask=True)
outliers = lc[mask]  # Points that were removed (mask is True for outliers)

# Common sigma values:
# sigma=3: Standard (removes ~0.3% of points for Gaussian noise)
# sigma=5: Conservative (removes fewer points)
# sigma=2: Aggressive (removes more points)
Manual Outlier Removal
import numpy as np

# Calculate median and standard deviation (NaN-safe, so one NaN does not poison both)
median = np.nanmedian(flux)
std = np.nanstd(flux)

# Keep points within 3 sigma of the median; NaN values compare False and are dropped
good = np.abs(flux - median) < 3 * std
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
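The standard deviation used above is itself inflated by the outliers it is meant to detect. A minimal sketch of a more robust variant, assuming the same `flux` array, replaces it with the median absolute deviation (MAD) and repeats the clip until no more points are rejected. The function name `robust_sigma_clip` and its defaults are illustrative, not part of Lightkurve.

```python
import numpy as np

def robust_sigma_clip(flux, sigma=3.0, max_iter=5):
    """Return a boolean mask of points kept after iterative MAD-based clipping."""
    flux = np.asarray(flux, dtype=float)
    keep = np.isfinite(flux)  # drop NaN/inf up front
    for _ in range(max_iter):
        center = np.median(flux[keep])
        # 1.4826 * MAD estimates the standard deviation for Gaussian noise
        scatter = 1.4826 * np.median(np.abs(flux[keep] - center))
        new_keep = keep & (np.abs(flux - center) <= sigma * scatter)
        if new_keep.sum() == keep.sum():
            break  # converged: no further points rejected
        keep = new_keep
    return keep
```

The returned mask is applied the same way as `good` above, for example `time_clean, flux_clean = time[keep], flux[keep]`.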
Removing Long-Term Trends
Flattening with Lightkurve
# Flatten to remove low-frequency variability
# window_length: number of cadences used for smoothing (documented as a positive odd integer)
lc_flat = lc_clean.flatten(window_length=501)

# Common window lengths (approximate; round to an odd value):
# 100-200: Remove short-term trends
# 300-500: Remove medium-term trends (typical for TESS)
# 500-1000: Remove long-term trends
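The flatten() method fits a Savitzky-Golay filter to the light curve and divides it out. This removes slow trends while preserving transit signals, provided the window is several times longer than the transit duration.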
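To illustrate why the window has to be longer than the transit, the sketch below detrends a synthetic light curve with a centered running median. This is a simpler stand-in for the Savitzky-Golay filter, not what `flatten()` does internally, and the function name and synthetic data are assumptions made for the example.

```python
import numpy as np

def running_median_detrend(flux, window_length):
    """Divide flux by a centered running median (window_length must be odd)."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd")
    half = window_length // 2
    padded = np.pad(flux, half, mode="edge")  # repeat edge values at both ends
    windows = np.lib.stride_tricks.sliding_window_view(padded, window_length)
    trend = np.median(windows, axis=1)
    return flux / trend, trend

# Synthetic data: slow sinusoidal trend plus a 1% deep, 20-cadence transit
n = 2000
t = np.arange(n)
flux = 1.0 + 0.01 * np.sin(2 * np.pi * t / 1500)
in_transit = (t >= 1000) & (t < 1020)
flux[in_transit] *= 0.99

flat_long, _ = running_median_detrend(flux, 201)  # window much longer than the transit
flat_short, _ = running_median_detrend(flux, 11)  # window shorter than the transit

depth_long = 1 - flat_long[in_transit].min()
depth_short = 1 - flat_short[in_transit].min()
```

With the long window, `depth_long` stays close to the injected 1% depth. With the short window the trend follows the dip itself, so `depth_short` is almost entirely erased.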
Iterative Sine Fitting
For removing high-frequency stellar variability (rotation, pulsation):
def sine_fitting(lc):
    """Remove dominant periodic signal by fitting sine wave."""
    pg = lc.to_periodogram()
    model = pg.model(time=lc.time, frequency=pg.frequency_at_max_power)
    lc_new = lc.copy()
    lc_new.flux = lc_new.flux / model.flux
    return lc_new, model

# Iterate multiple times to remove multiple periodic components
lc_processed = lc_clean.copy()
for i in range(50):  # Number of iterations
    lc_processed, model = sine_fitting(lc_processed)
Warning: Each iteration removes the strongest periodic signal, whatever its origin, so use this carefully when searching for periodic transits and check that the transit signal survives.
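What a single iteration does can be sketched without Lightkurve: fit a sinusoid at one frequency by linear least squares, then divide it out. This is an illustration of the idea only, not a description of how `pg.model()` is implemented. The frequency is assumed to be known here, whereas `sine_fitting()` takes it from the periodogram peak.

```python
import numpy as np

def fit_sinusoid(time, flux, frequency):
    """Least-squares fit of offset + a*sin + b*cos at a fixed frequency."""
    phase = 2 * np.pi * frequency * np.asarray(time)
    design = np.column_stack([np.ones_like(phase), np.sin(phase), np.cos(phase)])
    coeffs, *_ = np.linalg.lstsq(design, flux, rcond=None)
    return design @ coeffs  # model flux evaluated at the input times

# Synthetic light curve: 3-day rotational modulation plus white noise
time = np.linspace(0, 27, 2000)  # days
rng = np.random.default_rng(1)
flux = (1 + 0.01 * np.sin(2 * np.pi * time / 3.0 + 0.4)
        + 1e-4 * rng.standard_normal(time.size))

model = fit_sinusoid(time, flux, frequency=1 / 3.0)
residual = flux / model  # same division used in sine_fitting()
```

Fitting the sine and cosine amplitudes together recovers the phase without a nonlinear fit, which keeps each iteration cheap.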
Handling Data Quality Flags
IMPORTANT: Quality flag conventions vary by data source!
Standard TESS format
# For standard TESS files (flag=0 is GOOD):
good = flag == 0
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
Alternative formats
# For some exported files (flag=0 is BAD):
good = flag != 0
time_clean = time[good]
flux_clean = flux[good]
error_clean = error[good]
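Always verify your data format! Check the file's documentation or header first. If that is unavailable, compare which convention keeps most of the cadences and gives the cleaner light curve.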
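One way to make that check concrete is to summarize both candidate masks side by side. The helper below is a hypothetical sketch (its name and returned keys are not from any library) and assumes `flux` and `flag` are arrays of equal length.

```python
import numpy as np

def compare_flag_conventions(flux, flag):
    """Summarize both candidate 'good' masks so the right one can be chosen."""
    flux = np.asarray(flux, dtype=float)
    flag = np.asarray(flag)
    summary = {}
    for name, good in (("flag == 0", flag == 0), ("flag != 0", flag != 0)):
        kept = flux[good & np.isfinite(flux)]
        if kept.size:
            # Robust scatter estimate: 1.4826 * median absolute deviation
            scatter = float(1.4826 * np.median(np.abs(kept - np.median(kept))))
        else:
            scatter = float("nan")
        summary[name] = {
            "n_kept": int(kept.size),
            "fraction_kept": kept.size / flux.size,
            "scatter": scatter,
        }
    return summary
```

As a heuristic rather than a guarantee, the correct convention usually keeps the large majority of cadences and shows the lower scatter. A mask that keeps only a few percent of the data is a strong hint that the convention is inverted.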
Preprocessing Pipeline Considerations
When building a preprocessing pipeline for exoplanet detection:
Key Steps (Order Matters!)
- Quality filtering: Apply data quality flags first
- Outlier removal: Remove bad data points (flares, cosmic rays)
- Trend removal: Remove long-term variations (stellar rotation, instrumental drift)
- Optional second pass: Additional outlier removal after detrending
Important Principles
- Always include flux_err: Critical for proper weighting in period search algorithms
- Preserve transit shapes: Use methods like flatten() that preserve short-duration dips
- Don't over-process: Overly aggressive preprocessing can remove real signals
- Verify visually: Plot each step to ensure quality
Parameter Selection
- Outlier removal sigma: Lower sigma (2-3) is aggressive, higher (5-7) is conservative
- Flattening window: Should be longer than transit duration but shorter than stellar rotation period
- When to do two passes: Remove obvious outliers before detrending, then remove residual outliers after
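Because `window_length` is given in cadences while transit durations are usually quoted in hours or days, a small conversion helps when applying the flattening-window guideline. The sketch below is an assumption-laden example: the helper name is hypothetical, and the default factor of three transit durations is a rule of thumb chosen for illustration, not a Lightkurve recommendation.

```python
import numpy as np

def window_length_from_duration(time, transit_duration, factor=3.0):
    """Odd number of cadences spanning `factor` transit durations.

    `time` and `transit_duration` must use the same unit (for example days).
    """
    cadence = np.nanmedian(np.diff(time))  # typical sampling interval
    n = int(np.ceil(factor * transit_duration / cadence))
    return n if n % 2 == 1 else n + 1  # round up to the next odd integer

time = np.arange(0, 27, 2 / 60 / 24)  # 2-minute cadence, in days
window = window_length_from_duration(time, transit_duration=3 / 24)  # 3-hour transit
```

The result can be passed directly as `window_length` to `flatten()`. It is still worth confirming that the window stays well below the stellar rotation period expressed in cadences.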
Preprocessing for Exoplanet Detection
For transit detection, be careful not to remove the transit signal:
- Remove outliers first: Use sigma=3 or sigma=5
- Flatten trends: Use window_length appropriate for your data
- Don't over-process: Too much smoothing can remove shallow transits
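Symmetric clipping at a low sigma can also reject the in-transit points of a deep transit, since they sit well below the median. One option, sketched here with numpy under the assumption that unwanted outliers are mostly upward (flares, cosmic rays), is to clip only points far above the median. The function name is illustrative.

```python
import numpy as np

def clip_upper_only(flux, sigma_upper=3.0):
    """Mask of points kept: reject only points far ABOVE the median."""
    flux = np.asarray(flux, dtype=float)
    center = np.nanmedian(flux)
    # Robust sigma estimate from the median absolute deviation
    scatter = 1.4826 * np.nanmedian(np.abs(flux - center))
    # NaN values compare False here and are therefore dropped as well
    return np.isfinite(flux) & (flux - center <= sigma_upper * scatter)
```

Lightkurve's `remove_outliers()` accepts separate `sigma_lower` and `sigma_upper` arguments for the same purpose, so something like `lc.remove_outliers(sigma_lower=float('inf'), sigma_upper=3)` should have a similar effect. Check the documentation for the installed version before relying on it.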
Visualizing Results
Always plot your light curve to verify preprocessing quality:
import matplotlib.pyplot as plt

# Use .plot() method on LightCurve objects
lc.plot()
plt.show()
Best practice: Plot before and after each major step to ensure you're improving data quality, not removing real signals.
Dependencies
pip install lightkurve numpy matplotlib
Best Practices
- Always check quality flags first: Remove bad data before processing
- Remove outliers before flattening: Outliers can affect trend removal
- Choose appropriate window length: Too short = the filter follows and removes transits, too long = trends are left in
- Visualize each step: Make sure preprocessing improves the data
- Don't over-process: More preprocessing isn't always better