Text Processing (Awk & Sed)
Advanced manipulation of text streams using the classic Unix power tools: awk and sed.
Knowledge
- The Power of Awk (Data Extraction)
  * **Philosophy:** Awk is a data-driven scripting language. It operates on records (lines, by default) and fields (whitespace-separated columns, by default).
  * **Structure:** `pattern { action }`. If the pattern matches the current record, the action runs.
  * **Variables:** `$0` (whole line), `$1` (first field), `NR` (current record number), `NF` (field count), `FS` (input field separator), `OFS` (output field separator).
  * **Efficiency:** Prefer `awk '/pattern/ { print $2 }'` over `grep 'pattern' | cut -f2`. It saves a process and a pipe. Note that `cut` splits on tabs by default while awk splits on runs of whitespace, so the two can differ on space-separated input.
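The awk points above can be tried out with a small sketch. The log lines, the temporary file, and the variable names (`log`, `errors`, `counts`, `csv`) are made up for illustration and are not part of any real tool.

```shell
# Hypothetical sample data: user, status, and request time in ms.
log=$(mktemp)
printf '%s\n' \
  'alice ok 120' \
  'bob error 340' \
  'carol ok 95' \
  'dave error 410' > "$log"

# pattern { action }: for records matching /error/, print fields 1 and 3.
# A single awk process does the work of a grep-plus-cut pipeline.
errors=$(awk '/error/ { print $1, $3 }' "$log")
echo "$errors"

# NR is the current record number, NF the field count of that record.
counts=$(awk '{ print NR ": " NF " fields" }' "$log")
echo "$counts"

# FS and OFS set in BEGIN: read colon-separated input, write commas.
# Assigning to a field ($1 = $1) makes awk rebuild $0 using OFS.
csv=$(printf 'root:x:0\n' | awk 'BEGIN { FS = ":"; OFS = "," } { $1 = $1; print }')
echo "$csv"

rm -f "$log"
```

The `$1 = $1` idiom is needed because awk only applies `OFS` when a record is reconstructed; printing an unmodified `$0` would keep the original colons.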
- The Power of Sed (Stream Editing)
  * **Philosophy:** Sed is a stream editor for filtering and transforming text.
  * **Syntax:** `s/regexp/replacement/flags`. Without the `g` flag, only the first match on each line is replaced.
  * **Delimiters:** You are not forced to use `/`. If your pattern contains slashes (like paths), use `s|/path/to|/new/path|` to avoid "leaning toothpick syndrome".
  * **Addressing:** Apply commands only to specific lines: `sed '1,5d'` (delete lines 1-5) or `sed '/^#/d'` (delete lines that start with `#`, i.e. comment lines).
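A minimal sketch of the sed features just described, using made-up input strings and illustrative variable names (`first`, `all`, `path`, `no_comments`, `tail_only`):

```shell
# Substitution: without flags only the first match per line is replaced;
# the g flag replaces every match.
first=$(echo 'foo foo foo' | sed 's/foo/bar/')
all=$(echo 'foo foo foo' | sed 's/foo/bar/g')

# Alternate delimiter: with | as the separator, slashes in paths need
# no backslash escaping.
path=$(echo 'cd /usr/local/bin' | sed 's|/usr/local|/opt|')

# Addressing: restrict a command to a regex match or to a line range.
input=$(printf '%s\n' '# comment' 'keep me' '# another' 'keep me too')
no_comments=$(printf '%s\n' "$input" | sed '/^#/d')   # drop lines starting with #
tail_only=$(printf '%s\n' "$input" | sed '1,2d')      # drop lines 1 through 2

echo "$first"
echo "$all"
echo "$path"
echo "$no_comments"
echo "$tail_only"
```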
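- Portability Traps (BSD vs GNU)
  * **In-Place Editing (`-i`):**
    * **GNU (Linux):** `sed -i 's/foo/bar/' file` (no backup suffix needed).
    * **BSD (FreeBSD/macOS):** `sed -i '' 's/foo/bar/' file` (the suffix argument is mandatory; pass an empty string for no backup).
    * **Safe Portable:** Use `sed -i.bak ...` (suffix attached directly to `-i`) to create a backup, which works on both.
  * **Regex:** By default `sed` uses BRE (Basic Regular Expressions), where capturing groups are written `\(...\)` and `+` and `?` are not operators (GNU sed accepts `\+` and `\?` as extensions). Use `sed -E` to enable ERE (Extended Regular Expressions), where `(...)`, `+`, and `?` work unescaped.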
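The portability notes above can be sketched as follows. The temporary directory, the file name `config.txt`, and the variable names are hypothetical, and the sketch assumes a sed that accepts `-E`, as current GNU and BSD versions do.

```shell
# Hypothetical scratch file to edit in place.
tmpdir=$(mktemp -d)
f="$tmpdir/config.txt"
printf 'name=foo\n' > "$f"

# -i.bak with the suffix attached (no space) is accepted by both
# GNU and BSD sed; the original is preserved as config.txt.bak.
sed -i.bak 's/foo/bar/' "$f"
edited=$(cat "$f")
backup=$(cat "$f.bak")
rm -f "$f.bak"   # remove the backup once the edit has been checked

# BRE: groups need backslashes and there is no portable + operator,
# so "one or more" is spelled out as x followed by x*.
bre=$(echo 'aaa-42' | sed 's/\([a-z][a-z]*\)-\([0-9][0-9]*\)/\2-\1/')

# ERE via -E: the same swap with unescaped groups and +.
ere=$(echo 'aaa-42' | sed -E 's/([a-z]+)-([0-9]+)/\2-\1/')

echo "$edited" "$backup" "$bre" "$ere"
rm -rf "$tmpdir"
```

Deleting the `.bak` file afterwards gives the effect of a backup-free in-place edit without branching on the sed flavor.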
Abilities
- Constructing robust one-liners that eliminate the need for heavier Python/Perl scripts for simple text tasks.
- Refactoring inefficient pipelines (e.g., `cat file | grep | awk`) into single-process invocations.
- Using `awk` `BEGIN` and `END` blocks to perform summation, averaging, or header/footer generation.
- Writing `sed` commands that safely handle delimiters inside the search string.
- Detecting when a text processing task is too complex for sed/awk (e.g., parsing nested JSON/XML) and recommending Python instead.
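As one possible illustration of the pipeline refactoring and `BEGIN`/`END` abilities, the sketch below uses an invented access log (method, path, status, bytes); the file and the variable names `before`, `after`, and `report` are assumptions made for the example.

```shell
# Hypothetical access log: method, path, status code, response bytes.
data=$(mktemp)
printf '%s\n' \
  'GET /index 200 512' \
  'GET /missing 404 128' \
  'POST /login 200 256' \
  'GET /about 200 1024' > "$data"

# Before: three processes and two pipes.
before=$(cat "$data" | grep ' 200 ' | awk '{ print $2 }')

# After: one awk process reads the file and filters on the status field.
after=$(awk '$3 == 200 { print $2 }' "$data")

# BEGIN prints a header, the main rule accumulates, END prints a footer
# with the total and the average over the matching records.
report=$(awk '
  BEGIN { print "path bytes" }
  $3 == 200 { sum += $4; n++; print $2, $4 }
  END { if (n > 0) printf "total %d avg %.1f\n", sum, sum / n }
' "$data")

echo "$report"
rm -f "$data"
```

Comparing `$3 == 200` is also stricter than the `grep` it replaces, since it cannot match `200` appearing in another column.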