Julia Performance Tips
Essential performance optimization guidelines for Julia code. Reference: https://docs.julialang.org/en/v1/manual/performance-tips/
Core Principles
Functions and Globals
- Put performance-critical code in functions - code inside functions runs faster than top-level code
- Avoid untyped global variables - use `const` for globals, or pass them as function arguments
- Break functions into multiple definitions - prefer `f(x::Vector) = ...` over `if isa(x, Vector) ... end`
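As a rough illustration of the three tips above, here is a minimal sketch. The names `SCALE`, `scaled_sum`, and `describe` are hypothetical and chosen only for this example.

```julia
# Hypothetical names throughout; only the patterns come from the tips above.

const SCALE = 2.5   # const global: the compiler knows its type and value

# Data arrives as an argument instead of being read from an untyped global.
function scaled_sum(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x
    end
    return SCALE * total
end

# Separate methods instead of `if isa(x, Vector) ... end` branches.
describe(x::Vector) = "vector of length $(length(x))"
describe(x::Number) = "scalar"

println(scaled_sum([1.0, 2.0, 3.0]))
println(describe([1, 2, 3]))
println(describe(4))
```

Dispatch picks the `describe` method from the argument type, so each method body can be compiled for one case.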
Type Stability
- Write type-stable functions - return consistent types: use `zero(x)` not `0`, `oneunit(x)` not `1`
- Avoid changing variable types - initialize with the correct type: `x::Float64 = 1` not `x = 1` then `x /= ...`
- Use function barriers - separate type-unstable setup from type-stable computation
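The sketch below shows one possible reading of each tip. All function names (`relu_unstable`, `relu_stable`, `halve_repeatedly`, `fill_twos!`, `make_twos`) are hypothetical.

```julia
# Type-unstable: returns the Int literal 0 for a negative Float64 input.
relu_unstable(x) = x < 0 ? 0 : x

# Type-stable: zero(x) has the same type as x.
relu_stable(x) = x < 0 ? zero(x) : x

# The annotation keeps x a Float64 from the start; a plain `x = 1`
# would begin as Int and change type at the first division.
function halve_repeatedly(n)
    x::Float64 = 1
    for _ in 1:n
        x /= 2
    end
    return x
end

# Function barrier: the kernel is compiled for the concrete array type it receives.
function fill_twos!(a)
    for i in eachindex(a)
        a[i] = 2          # converted to eltype(a) on assignment
    end
    return a
end

function make_twos(n::Integer, use_float::Bool)
    # The element type depends on a run-time value, so it is not inferable here.
    a = Vector{use_float ? Float64 : Int}(undef, n)
    return fill_twos!(a)  # the barrier: the loop runs in a specialized method
end
```

Running `@code_warntype relu_unstable(-1.5)` should show a `Union` return type, while `relu_stable` should infer a single concrete type.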
Type Annotations
- Avoid abstract type parameters - prefer `Vector{Float64}` over `Vector{Real}`
- Use parametric types for struct fields - `struct MyType{T}; a::T; end` not `struct MyType; a::AbstractFloat; end`
- Annotate values from untyped locations - `x = a[1]::Int32` when working with `Vector{Any}`
- Force specialization when needed - `f(t::Type{T}) where T` not `f(t::Type)` for `Type`, `Function`, `Vararg`
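A minimal sketch of these annotations follows. The type and function names (`LooseBox`, `Box`, `double`, `first_plus_one`, `apply_twice`) are hypothetical.

```julia
# Abstractly typed field: the concrete type of `a` is unknown to the compiler.
struct LooseBox
    a::AbstractFloat
end

# Parametric field: every Box{T} has a concretely typed field.
struct Box{T<:AbstractFloat}
    a::T
end

double(b) = 2 * b.a

# Concrete versus abstract element types.
tight = Float64[1.0, 2.0, 3.0]   # Vector{Float64}
loose = Real[1.0, 2.0, 3.0]      # Vector{Real}

# Annotating a value pulled out of an untyped container.
function first_plus_one(a::Vector{Any})
    x = a[1]::Int32              # throws a TypeError if a[1] is not an Int32
    return x + Int32(1)
end

# A type parameter on a function argument asks for specialization on it.
apply_twice(f::F, x) where {F} = f(f(x))
```

The `::Int32` assertion fails loudly on a wrong type, which makes it a run-time check as well as a hint to the compiler.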
Memory Management
- Pre-allocate outputs - use in-place functions `f!(out, args...)` and pre-allocate `out`
- Use views for slices - `@views` or `view()` instead of `array[1:5, :]` when possible
- Fuse vectorized operations - `@. 3x^2 + 4x` fuses into a single loop, `3x.^2 + 4x` creates temporaries
- Unfuse when recomputing - if a broadcast recomputes constant values, pre-compute them: `let s = sqrt.(d); x ./= s end`
- Access arrays column-major - the inner loop should vary the first index: `for col, row` not `for row, col`
- Copy irregular views when beneficial - copying non-contiguous views can speed up repeated operations
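The following sketch puts several of these tips side by side. The function names (`scale!`, `first_col_sum`, `poly`, `colmajor_sum`) are hypothetical, and the inputs are arbitrary.

```julia
# In-place function that writes into a pre-allocated output.
function scale!(out, x, c)
    @. out = c * x                 # fused broadcast, no temporary array
    return out
end

# A view avoids copying the slice before reducing over it.
first_col_sum(A) = sum(@view A[:, 1])

# Fused polynomial: one loop and one output array.
poly(x) = @. 3x^2 + 4x

# Column-major traversal: the last iterator in a combined `for` is the
# innermost loop, so `row` (the first index) varies fastest.
function colmajor_sum(A)
    s = zero(eltype(A))
    for col in axes(A, 2), row in axes(A, 1)
        s += A[row, col]
    end
    return s
end

x = collect(1.0:5.0)
out = similar(x)                   # allocate once, reuse across calls
scale!(out, x, 2.0)

A = reshape(collect(1.0:12.0), 3, 4)
println(first_col_sum(A))
println(poly(x))
println(colmajor_sum(A))
```

Whether a view or a copy is faster depends on the access pattern, so it is worth measuring both on real data.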
Closures
- Type-annotate captured variables - `r::Int = r0` in closure scope
- Use `let` blocks - `f = let r = r; x -> x * r end` avoids boxing
- Use `@__FUNCTION__` for recursive closures - `(@__FUNCTION__)(n-1)` instead of `fib(n-1)`
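The first two closure tips can be sketched as follows, loosely following the style of the Julia manual's examples. The function names are hypothetical, and `@__FUNCTION__` is left out here because its availability depends on the Julia version.

```julia
# `r` is reassigned before capture, so the closure may hold it in a box.
function abmult_boxed(r::Int)
    if r < 0
        r = -r
    end
    return x -> x * r
end

# Annotating the captured variable tells the compiler its type,
# even if the variable itself is still boxed.
function abmult_annotated(r0::Int)
    r::Int = r0
    if r < 0
        r = -r
    end
    return x -> x * r
end

# The let block creates a new binding that is assigned exactly once,
# so the closure can capture it without a box.
function abmult_let(r::Int)
    if r < 0
        r = -r
    end
    f = let r = r
        x -> x * r
    end
    return f
end
```

All three return the same results. The difference is in how the captured `r` is stored, which `@code_warntype` on the returned closure should reveal.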
Advanced Types
- Use `Val` for compile-time values - `f(::Val{N}) where N` when the dimension is known at compile time
- Avoid excessive type parameters - only use values-as-parameters when processing homogeneous collections
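A small sketch of the `Val` pattern, assuming a hypothetical helper `filled_cube` that builds an `N`-dimensional array with side length 3:

```julia
# N is carried in the type, so ntuple can build a tuple of known length
# and the dimensionality of the result can be inferred.
filled_cube(fillval, ::Val{N}) where {N} = fill(fillval, ntuple(_ -> 3, Val(N)))

A = filled_cube(5.0, Val(2))
println(size(A))
```

This helps only when `N` is a literal or otherwise known to the compiler at the call site. Wrapping a run-time integer in `Val(n)` moves the instability to the caller.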
Performance Tools
- `@time` - measure time and allocations (ignore the first run, which includes compilation)
- `@code_warntype` - find type instabilities (red = non-concrete types)
- `@allocated` - measure memory allocations
- Profiling - use Profile.jl or ProfileView.jl for bottlenecks
- JET.jl - static analysis for performance issues
- `--track-allocation=user` - find allocation sources
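A minimal sketch of how the built-in macros might be used together. The functions `unstable` and `stable` are hypothetical, and the reported numbers will vary by machine and Julia version.

```julia
using InteractiveUtils   # provides @code_warntype outside the REPL

unstable(x) = x < 0 ? 0 : x
stable(x) = x < 0 ? zero(x) : x

stable(1.0)                      # first call triggers compilation
@time stable(1.0)                # later calls give a more representative timing
bytes = @allocated stable(1.0)   # bytes allocated by the call
println(bytes)

# For a Float64 argument, the output should show a Union return type.
@code_warntype unstable(1.0)
```

For more reliable timings, running the call several times or inside a function tends to be more informative than a single top-level measurement.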
Performance Annotations
- `@inbounds` - disable bounds checking (use with caution)
- `@fastmath` - allow floating-point optimizations (may change results)
- `@simd` - promise independent loop iterations (experimental, use carefully)
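The annotations can be sketched on a simple reduction. The names `simd_sum` and `fast_hypot` are hypothetical.

```julia
# eachindex(x) only yields valid indices, which is what makes @inbounds safe here.
function simd_sum(x::AbstractVector{T}) where {T}
    s = zero(T)
    @inbounds @simd for i in eachindex(x)
        s += x[i]            # @simd allows this reduction to be reordered
    end
    return s
end

# @fastmath permits rearrangements that may change the last bits of the result.
fast_hypot(x, y) = @fastmath sqrt(x * x + y * y)

println(simd_sum([1, 2, 3, 4]))
println(fast_hypot(3.0, 4.0))
```

Because `@simd` may reorder floating-point additions, results for floating-point input can differ slightly from a plain loop.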
Miscellaneous
- Avoid unnecessary arrays - `x+y+z` not `sum([x,y,z])`
- Use `abs2` for complex numbers - `abs2(z)` not `abs(z)^2`
- Use `div`, `fld`, `cld` - not `trunc(x/y)`, `floor(x/y)`, `ceil(x/y)`
- Fix deprecation warnings - they add lookup overhead
- Avoid string interpolation for I/O - `println(file, a, " ", b)` not `println(file, "$a $b")`
- Use `LazyString` for conditional strings - `lazy"..."` for error paths
- Set `OPENBLAS_NUM_THREADS=1` when using `JULIA_NUM_THREADS>1` for multithreaded code
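A few of these tips in one short sketch. An `IOBuffer` stands in for a file, `checked_sqrt` is a hypothetical function, and `lazy"..."` assumes Julia 1.8 or later.

```julia
z = 3.0 + 4.0im
println(abs2(z))                 # squared magnitude without taking a square root

# Integer division variants instead of rounding a floating-point quotient.
println((div(7, 2), fld(-7, 2), cld(7, 2)))

# Writing the pieces directly avoids building an intermediate string.
io = IOBuffer()
a, b = 1, 2
println(io, a, " ", b)
line = String(take!(io))

# The lazy string is only rendered if the error is actually thrown and shown.
function checked_sqrt(x)
    x >= 0 || throw(ArgumentError(lazy"expected a non-negative value, got $x"))
    return sqrt(x)
end

msg = lazy"value was $a"
println(String(msg))
```

On the success path `checked_sqrt` never formats the message, which is the point of using a lazy string there.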
Package Performance
- Use PrecompileTools.jl - reduce time-to-first-execution
- Minimize dependencies - use package extensions for optional features
- Avoid heavy `__init__()` - minimize compilation in initialization
- Use `@time_imports` - diagnose slow package loading