Backend Hang Debug
Purpose
- Detect and resolve event-loop hangs where the FastAPI app stops responding (e.g., `curl http://localhost:8000/` times out) because of a synchronous executor shutdown in the SSE news stream.
- Provide a repeatable triage flow that uses `py-spy` to capture live stacks and pinpoint blocking code.
Scope
- Backend: `backend/app/api/routes/stream.py` (news stream), `backend/app/services/rss_ingestion.py` (RSS workers), and startup processes.
- Tooling: `py-spy` for live stack dumps; `curl` with timeouts for smoke tests.
Quick Triage
- Reproduce the hang: run `curl -m 5 http://localhost:8000/` and `curl -m 5 http://localhost:8000/health`, and note which requests time out.
- Process check: `ss -tlnp | grep 8000` confirms the listener is up; `ls /proc/$(pgrep -f "uvicorn app.main")/fd | wc -l` rules out a file-descriptor leak.
- Stack capture (inside the backend venv): run `uv pip install py-spy`, then `sudo /home/bender/classwork/Thesis/backend/.venv/bin/py-spy dump --pid $(pgrep -f "uvicorn app.main")` (repeat for each worker PID if uvicorn runs multiple processes). Look for a `ThreadPoolExecutor.shutdown` frame (from `concurrent/futures/thread.py`) called from `api/routes/stream.py`.
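To script the smoke test, or to run it where `curl` is unavailable, a small Python probe can stand in for `curl -m 5`. This is a minimal standard-library sketch. The `probe` helper is hypothetical. urllib's `timeout` applies to each socket operation rather than to the whole request, so it is looser than curl's `-m` limit.

```python
import socket
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (responded, detail) for a GET on `url`, roughly like `curl -m 5 URL`."""
    # Ignore any *_proxy environment variables: we want to hit the local server directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered with an error status, so it is not hung.
        return True, f"HTTP {exc.code}"
    except (socket.timeout, TimeoutError):
        # The request was sent but no response arrived within `timeout`: the hang symptom.
        return False, "timed out"
    except OSError as exc:
        # Connection refused, DNS failure, connect timeout wrapped in URLError, etc.
        return False, f"error: {exc}"


if __name__ == "__main__":
    for path in ("/", "/health"):
        ok, detail = probe(f"http://localhost:8000{path}")
        print(f"{path}: {'OK' if ok else 'NO RESPONSE'} ({detail})")
```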
Fix Pattern (non-blocking executor)
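- Replace the synchronous context manager `with ThreadPoolExecutor(...):` inside `event_generator` with an explicitly managed executor and a non-blocking shutdown:
  - Create the executor directly instead of in a `with` block.
  - On client disconnect, cancel pending futures instead of waiting for shutdown.
  - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
- Rationale: exiting the context manager calls `shutdown(wait=True)` on the event-loop thread. That call blocks until every worker thread finishes, so a single RSS worker hung on network I/O freezes the whole app. `shutdown(wait=False)` returns immediately but does not stop threads that are already running; those exit once their network call returns or times out. (`cancel_futures` requires Python 3.9+.)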
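The rationale can be checked in isolation. In the sketch below, all names are hypothetical and `time.sleep` stands in for an RSS worker stuck on network I/O. A heartbeat task runs on the event loop and records the longest gap between ticks. With the `with` block, the loop stalls for roughly the remaining sleep. With `shutdown(wait=False, cancel_futures=True)`, the shutdown returns at once.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def stuck_fetch(seconds: float) -> str:
    """Hypothetical stand-in for an RSS worker blocked on network I/O."""
    time.sleep(seconds)
    return "done"


async def heartbeat(stop: asyncio.Event, gaps: list) -> None:
    """Tick every 50 ms on the event loop and record the time between ticks."""
    last = time.monotonic()
    while not stop.is_set():
        await asyncio.sleep(0.05)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def stream_then_disconnect(non_blocking: bool, hang: float) -> None:
    """Start one stuck job, let the 'client' disconnect after 100 ms, then clean up."""
    loop = asyncio.get_running_loop()
    if non_blocking:
        executor = ThreadPoolExecutor(max_workers=2)
        try:
            fut = loop.run_in_executor(executor, stuck_fetch, hang)
            await asyncio.sleep(0.1)
            fut.cancel()  # cancels the asyncio wrapper; the running thread is not interrupted
        finally:
            # Returns immediately; the worker thread finishes on its own later.
            executor.shutdown(wait=False, cancel_futures=True)
    else:
        with ThreadPoolExecutor(max_workers=2) as executor:
            loop.run_in_executor(executor, stuck_fetch, hang)
            await asyncio.sleep(0.1)
        # Leaving the block ran shutdown(wait=True) on the loop thread and blocked it.


async def max_stall(non_blocking: bool, hang: float = 1.0) -> float:
    """Return the longest gap (seconds) the heartbeat saw while the handler ran."""
    stop = asyncio.Event()
    gaps: list = []
    beat = asyncio.create_task(heartbeat(stop, gaps))
    await stream_then_disconnect(non_blocking, hang)
    await asyncio.sleep(0.2)  # let the heartbeat tick a few more times
    stop.set()
    await beat
    return max(gaps)


if __name__ == "__main__":
    print(f"with-block shutdown:  max stall {asyncio.run(max_stall(False)):.2f} s")
    print(f"shutdown(wait=False): max stall {asyncio.run(max_stall(True)):.2f} s")
```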
Implementation Steps
- Update executor usage in `backend/app/api/routes/stream.py`:
  - Instantiate `executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)`.
  - Dispatch work via `loop.run_in_executor(executor, _process_source_with_debug, ...)`.
  - On disconnect, `cancel()` pending futures.
  - In `finally`, call `executor.shutdown(wait=False, cancel_futures=True)`.
- Keep the RSS executor in `rss_ingestion.py` as-is, since it runs in background threads, but make sure request timeouts stay reasonable (currently 60 s per RSS `requests.get`).
- Retest:
  - Restart uvicorn; `curl -m 5 http://localhost:8000/health` should respond.
  - Start a stream request and abort the client; the server must stay responsive.
  - Re-run `py-spy dump` and verify that the main thread shows no `ThreadPoolExecutor.shutdown` frame waiting on worker threads (add `--locals` to see the `wait` argument of any `shutdown` frame).
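The steps above can be sketched as an async generator. This illustrates the pattern under stated assumptions and is not the actual `stream.py` code. `process_source` stands in for `_process_source_with_debug`, whose real arguments are elided here. `is_disconnected` would be `request.is_disconnected` in FastAPI/Starlette. The SSE payload format is a guess.

```python
import asyncio
import concurrent.futures
import json
from typing import Any, AsyncIterator, Awaitable, Callable, Iterable


async def event_generator(
    sources: Iterable[str],
    process_source: Callable[[str], Any],  # hypothetical stand-in for _process_source_with_debug
    is_disconnected: Callable[[], Awaitable[bool]],  # e.g. request.is_disconnected
    poll_interval: float = 0.5,
) -> AsyncIterator[str]:
    """Yield one SSE `data:` event per finished source; never block the loop on shutdown."""
    loop = asyncio.get_running_loop()
    # Created directly (not via `with`), so there is no implicit shutdown(wait=True) on exit.
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=5)
    pending = {loop.run_in_executor(executor, process_source, src) for src in sources}
    try:
        while pending:
            # Wake up at least every poll_interval to check for a client disconnect.
            done, pending = await asyncio.wait(
                pending, timeout=poll_interval, return_when=asyncio.FIRST_COMPLETED
            )
            for fut in done:
                try:
                    payload = fut.result()
                except Exception as exc:  # one failing source should not end the stream
                    payload = {"error": str(exc)}
                yield f"data: {json.dumps(payload)}\n\n"
            if await is_disconnected():
                break
    finally:
        # Cancel work that has not started; threads already running finish in the background.
        for fut in pending:
            fut.cancel()
        executor.shutdown(wait=False, cancel_futures=True)
```

The `finally` block contains no `await`. It therefore runs cleanly even when the server closes the generator after a disconnect.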
Verification Checklist
- `curl -m 5 http://localhost:8000/` returns a response (no hang).
- `curl -m 5 http://localhost:8000/health` succeeds.
- Aborting `/news/stream` does not freeze subsequent requests.
- `py-spy dump` shows the event loop is not blocked on `ThreadPoolExecutor.shutdown`.
- The frontend no longer stalls waiting on root/health while the backend is busy with streams.
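The abort-then-probe item can be scripted. This is a minimal sketch that assumes the stream is served at `GET /news/stream` on port 8000; adjust the path if the router is mounted under a prefix. The helper names are hypothetical.

```python
import socket
import time
import urllib.request


def abort_stream(host: str, port: int, path: str, read_for: float = 1.0) -> None:
    """Open a GET on `path`, read for up to `read_for` seconds, then drop the connection."""
    with socket.create_connection((host, port), timeout=read_for) as sock:
        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nAccept: text/event-stream\r\n\r\n"
        sock.sendall(request.encode())
        deadline = time.monotonic() + read_for
        try:
            while time.monotonic() < deadline and sock.recv(4096):
                pass  # discard events; we only need the stream to be in flight
        except OSError:
            pass  # read timeout or reset: either way we are about to disconnect
    # Closing the socket here is what the server sees as the client aborting.


def health_ok(url: str, timeout: float = 5.0) -> bool:
    """True if `url` answers 200 within `timeout` (per socket operation)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # timeouts, refused connections, and HTTP error statuses
        return False


def abort_keeps_server_responsive(
    host: str = "localhost",
    port: int = 8000,
    stream_path: str = "/news/stream",
    health_path: str = "/health",
    rounds: int = 3,
    read_for: float = 1.0,
) -> bool:
    """Abort the stream several times; after each abort the health check must still answer."""
    for _ in range(rounds):
        abort_stream(host, port, stream_path, read_for)
        if not health_ok(f"http://{host}:{port}{health_path}"):
            return False
    return True


if __name__ == "__main__":
    print("responsive" if abort_keeps_server_responsive() else "HANG after stream abort")
```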
Notes & Future Hardening
- Consider adding request-timeout middleware to fail fast on slow handlers. Exempt the long-lived SSE stream. An asyncio-based timeout cannot fire while the event loop itself is blocked.
- Add per-source network timeouts and shorter retries for RSS feeds to reduce long-lived threads.
- If uvicorn runs with multiple workers, run `py-spy` on each worker PID when diagnosing hangs.
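For the RSS hardening item, a per-source fetch with a short timeout and a bounded retry budget might look like the sketch below. It uses `urllib` to stay dependency-free. `fetch_feed` and its defaults are hypothetical. With `requests`, the equivalent knob is `requests.get(url, timeout=(connect_timeout, read_timeout))`.

```python
import time
import urllib.error
import urllib.request
from typing import Optional


def fetch_feed(url: str, timeout: float = 10.0, attempts: int = 2, backoff: float = 1.0) -> bytes:
    """Fetch one RSS source with a short per-operation timeout and a bounded retry budget."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    opener = urllib.request.build_opener()  # honors *_proxy env vars, like requests does
    last_exc: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            # `timeout` bounds each blocking socket operation (connect, each read),
            # not the total download time.
            with opener.open(url, timeout=timeout) as resp:
                return resp.read()
        except urllib.error.HTTPError as exc:
            if exc.code < 500:
                raise  # a 4xx response will not change on retry
            last_exc = exc
        except OSError as exc:  # timeouts, refused connections, DNS failures
            last_exc = exc
        if attempt + 1 < attempts:
            time.sleep(backoff * 2 ** attempt)  # exponential backoff between attempts
    assert last_exc is not None  # every attempt failed
    raise last_exc
```

As with `requests`, the timeout bounds each socket operation rather than the total download. A server that trickles bytes can therefore still hold a worker thread longer than `timeout`.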