--- frontmatter
name: unsw-cse
description: Scrape UNSW CSE course materials — lecture slides, code, tutorials, exams, YouTube recordings. Supports 16+ courses with public CGI sites and WebCMS3 authenticated access.

UNSW CSE Course Scraper

Universal knowledge file for AI agents to scrape UNSW CSE course materials. Compatible with: Claude Code, Codex CLI, Gemini, ChatGPT, Cursor, and any LLM-based tool.

Quick Start

When a user asks to scrape CSE course materials, follow this flow:

  1. Ask: Which course? (e.g. COMP2521) Which term? (e.g. 26T1)
  2. Check: Does this course have a public CGI site? (see table below)
  3. If yes: Scrape directly — no authentication needed
  4. If no: User needs to provide WebCMS3 cookies (exported from browser)

Architecture: Two Independent Systems

System 1: CGI Sites — cgi.cse.unsw.edu.au (Public)

Static course websites hosted by lecturers. Apache serves files directly. No authentication required. Past terms are preserved indefinitely.

  • URL pattern: https://cgi.cse.unsw.edu.au/~cs{code}/{term}/
  • Example: https://cgi.cse.unsw.edu.au/~cs2521/26T1/
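The URL pattern above can be turned into a small helper. This is a minimal sketch: `cgi_base_url` is a hypothetical name, and it assumes every course account follows the `cs` + four-digit form shown in the pattern.

```shell
# Build the public CGI base URL for a course and term.
# cgi_base_url is a hypothetical helper, not part of any UNSW tooling.
cgi_base_url() {
  course=$1   # e.g. COMP2521
  term=$2     # e.g. 26T1
  # Keep only the numeric part of the course code: COMP2521 -> 2521
  digits=$(printf '%s' "$course" | tr -cd '0-9')
  printf 'https://cgi.cse.unsw.edu.au/~cs%s/%s\n' "$digits" "$term"
}

cgi_base_url COMP2521 26T1
# prints https://cgi.cse.unsw.edu.au/~cs2521/26T1
```

The result has no trailing slash, so it can be used directly as the `BASE` variable in the scraping procedures.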

Publicly Accessible Resources

| Resource | URL Pattern | Format |
|---|---|---|
| Lecture slides | /lectures/slides/ or /lectures/ or /slides/ or /Lectures/ | PDF |
| Lecture code | /lectures/code/ | .c, .h, .zip, Makefile |
| Code solutions | /lectures/code/{week}/solution/ | .c source files |
| Revision exercises | /lectures/revision/ | .zip |
| Tutorial questions | /tut/{1-10}/questions | HTML |
| Lab questions | /lab/{1-17}/questions | HTML |
| Assignment specs | /assignments/ass{1-2}/ | HTML |
| Past exams | /past-exam/{term} (e.g. /past-exam/22T3) | HTML |
| Sample exam | /sample-exam | HTML |
| Practice exercises | /practice-exercises/ (with solutions!) | HTML |
| Style guide | /style-guide | HTML |
| DSA manual | /dsa-manual | HTML |

Protected Resources (403 — separate auth required)

  • /labs/ — lab submission system
  • /exams/ — exam papers (current term)
  • /autotest/ — automated testing system
  • /view/main.cgi — CGI portal, uses zID/zPass (NOT WebCMS3 cookies)
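A scraper can branch on the HTTP status before attempting a download. The sketch below is illustrative only: `classify_status` and its labels are hypothetical, and the mapping simply restates the behaviour described in this section (200 for public files, 403 for protected paths).

```shell
# Map an HTTP status code from a CGI site to a scraping decision.
# classify_status and its labels are hypothetical names for this sketch.
classify_status() {
  case $1 in
    200) echo "public" ;;      # fetch directly
    403) echo "protected" ;;   # behind separate auth, skip
    404) echo "missing" ;;     # path not used by this course or term
    *)   echo "unknown" ;;
  esac
}

# With a live request (not run here):
#   code=$(curl -s -o /dev/null -w "%{http_code}" "https://cgi.cse.unsw.edu.au/~cs2521/26T1/labs/")
#   classify_status "$code"
classify_status 403
# prints protected
```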

System 2: WebCMS3 — webcms3.cse.unsw.edu.au (Authenticated)

Course management system built with Flask/gunicorn. Requires browser cookies for access. Only current term data exists.

  • URL pattern: https://webcms3.cse.unsw.edu.au/COMP{CODE}/{term}/
  • Example: https://webcms3.cse.unsw.edu.au/COMP2521/26T1/

Access Levels

| Content | Any authenticated user | Enrolled students only |
|---|---|---|
| Course homepage & description | Yes | |
| Announcements/notices | Yes | |
| Staff names & sidebar | Yes | |
| Resource pages & files | | Yes (403 otherwise) |
| Grades & analytics | | Yes |
| Forum | | Yes |

Required Cookies

Export both cookies from the browser as a Netscape-format .txt file:

  • remember_token — persistent login, format: {zID_number}|{hash}, lasts ~1 year
  • session — Flask signed session, expires when browser closes

Tools to export: "Cookie Editor" or "Get cookies.txt LOCALLY" browser extension.
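Before scraping, it can help to confirm that the exported file actually contains both cookies. The following is a minimal sketch: `has_cookie` and `check_webcms_cookies` are hypothetical helpers, and they rely on the Netscape cookie format being tab-separated with the cookie name in the sixth field.

```shell
# Succeed if a Netscape-format cookie file contains a cookie with the given name.
# has_cookie is a hypothetical helper; field 6 is the name column in that format.
has_cookie() {
  file=$1
  name=$2
  awk -F '\t' -v n="$name" '$6 == n { found = 1 } END { exit !found }' "$file"
}

# Report the presence of the two cookies WebCMS3 needs.
check_webcms_cookies() {
  for name in remember_token session; do
    if has_cookie "$1" "$name"; then
      echo "ok: $name"
    else
      echo "missing: $name"
    fi
  done
}

# Usage (assumes the export was saved to this path):
#   check_webcms_cookies ~/UNSW/cookies.txt
```

This only checks that the cookies are present; it does not verify that they are still valid on the server.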

Key Endpoints

| Endpoint | Description |
|---|---|
| /COMP{code}/{term}/ | Course homepage |
| /COMP{code}/{term}/resources/{id} | Individual resource (sequential integer IDs) |
| /COMP{code}/{term}/notices | Announcements page |
| /users/{zID} | User profile |
| /search | Course search (all terms back to 2014) |
| /messages/ | JSON API endpoint |
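The course-scoped endpoints share one prefix, so a small builder keeps URLs consistent. This is a sketch under assumptions: `webcms3_url` is a hypothetical helper, and uppercasing the course code is a convenience based on the `COMP{CODE}` pattern shown above.

```shell
# Build a WebCMS3 URL for a course, term, and optional sub-path.
# webcms3_url is a hypothetical helper name.
webcms3_url() {
  course=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')  # comp2521 -> COMP2521
  term=$2
  suffix=${3:-}   # optional, e.g. notices or resources/{id}
  printf 'https://webcms3.cse.unsw.edu.au/%s/%s/%s\n' "$course" "$term" "$suffix"
}

webcms3_url comp2521 26T1
# prints https://webcms3.cse.unsw.edu.au/COMP2521/26T1/
webcms3_url COMP2521 26T1 notices
# prints https://webcms3.cse.unsw.edu.au/COMP2521/26T1/notices
```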

Courses with Public CGI Sites

Verified February 2026. Slide directory paths vary by lecturer.

Available (public, no auth):

| Course | Name | Terms | Slide Path |
|---|---|---|---|
| COMP1511 | Programming Fundamentals | 26T1, 25T3, 25T1 | varies |
| COMP1521 | Computer Systems Fundamentals | 26T1, 25T3, 25T1 | varies |
| COMP2041 | Software Construction | 26T1, 25T1 | varies |
| COMP2521 | Data Structures and Algorithms | 26T1, 25T3, 25T1 | /lectures/slides/ |
| COMP3131 | Programming Languages and Compilers | 26T1, 25T1 | /Lectures/ |
| COMP3161 | Concepts of Programming Languages | 25T3 | varies |
| COMP3222 | Digital Circuits and Systems | 26T1, 25T1 | /slides/ |
| COMP3311 | Database Systems | 26T1, 25T1 | /lectures/ |
| COMP3411 | Artificial Intelligence | 26T1, 25T1 | varies |
| COMP3891 | Ext Operating Systems | 26T1, 25T3, 25T1 | (redirect) |
| COMP4337 | Securing Fixed and Wireless Networks | 25T1 | varies |
| COMP6080 | Web Front-End Programming | 26T1, 25T3, 25T1 | /lectures/slides/ |
| COMP9020 | Foundations of Computer Science | 25T3 | varies |
| COMP9024 | Data Structures and Algorithms (PG) | 26T1, 25T3, 25T1 | varies |
| COMP9242 | Advanced Operating Systems | 25T3 | varies |
| COMP9311 | Database Systems (PG) | 26T1, 25T3, 25T1 | varies |
| COMP9315 | DBMS Implementation | 26T1, 25T1 | /lectures/ |
| COMP9334 | Capacity Planning | 25T1 | varies |

NOT Available (WebCMS3 only, needs enrollment):

COMP1531, COMP2121, COMP2511, COMP3141, COMP3153, COMP3211, COMP3231, COMP3331, COMP3421, COMP3900, COMP4336, COMP4511, COMP6443, COMP6451, COMP6452, COMP9319, COMP9417, COMP9444, COMP9517
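Step 2 of the Quick Start (checking whether a course has a public CGI site) can be encoded as a lookup. This is a sketch only: `has_public_cgi` and `access_route` are hypothetical names, and the course list is copied from the "Available" table, so it will go stale as sites change.

```shell
# Succeed if the course appears in the "Available" table above.
# has_public_cgi is a hypothetical helper; the list mirrors that table.
has_public_cgi() {
  case $1 in
    COMP1511|COMP1521|COMP2041|COMP2521|COMP3131|COMP3161|COMP3222|COMP3311|COMP3411|\
    COMP3891|COMP4337|COMP6080|COMP9020|COMP9024|COMP9242|COMP9311|COMP9315|COMP9334)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Print which system to scrape for a course.
access_route() {
  if has_public_cgi "$1"; then
    echo "cgi"        # scrape directly, no authentication
  else
    echo "webcms3"    # ask the user for exported cookies
  fi
}

access_route COMP2521
# prints cgi
access_route COMP1531
# prints webcms3
```

Any course not in the list falls through to `webcms3`, including courses that were never checked, so a live request to the CGI URL remains the more reliable test.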


Scraping Procedures

1. Discover slide directory

Slide paths vary by lecturer. Try these in order:

bash
COURSE=cs2521
TERM=26T1
BASE="https://cgi.cse.unsw.edu.au/~${COURSE}/${TERM}"

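SLIDE_PATH=""
for path in lectures/slides/ lectures/ slides/ Lectures/ lecs/; do
  code=$(curl -s -o /dev/null -w "%{http_code}" "${BASE}/${path}")
  if [ "$code" = "200" ]; then
    # Count PDF links in the directory listing (case-insensitive)
    count=$(curl -s "${BASE}/${path}" | grep -oi 'href="[^"]*\.pdf"' | wc -l)
    if [ "$count" -gt 0 ]; then
      echo "FOUND: ${path} (${count} PDFs)"
      SLIDE_PATH="$path"  # remember the match for step 2
      break
    fi
  fi
done
[ -n "$SLIDE_PATH" ] || echo "No slide directory found; check the course site manually"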

2. Download all lecture slides

bash
SLIDE_PATH="lectures/slides/"  # from step 1
SAVE_DIR=~/UNSW/COMP2521/lectures/slides
mkdir -p "$SAVE_DIR"

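curl -s "${BASE}/${SLIDE_PATH}" \
  | grep -oi 'href="[^"]*\.pdf"' | sed 's/^[^"]*"//;s/"$//' \
  | while IFS= read -r f; do
      echo "Downloading: $f"
      # -f avoids saving an HTML error page as a .pdf;
      # basename keeps hrefs that contain subdirectories in one flat folder
      curl -s -f -o "${SAVE_DIR}/$(basename "$f")" "${BASE}/${SLIDE_PATH}$f"
    done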
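
The link extraction can be tried offline on a saved or hand-written directory listing before pointing it at a live site. A minimal sketch, assuming an Apache-style index where each file is an `href="..."` attribute; `pdf_links` is a hypothetical helper name.

```shell
# Read HTML on stdin and print the unique PDF hrefs it contains.
# pdf_links is a hypothetical helper; matching is case-insensitive.
pdf_links() {
  grep -oi 'href="[^"]*\.pdf"' | sed 's/^[^"]*"//; s/"$//' | sort -u
}

# Sample listing: two slide files and one sort-order link that should be ignored
printf '%s\n' \
  '<a href="?C=N;O=D">Name</a>' \
  '<a href="week01.pdf">week01.pdf</a>' \
  '<a href="week02.PDF">week02.PDF</a>' \
  | pdf_links
# prints:
# week01.pdf
# week02.PDF
```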

3. Download lecture code

bash
SAVE_DIR=~/UNSW/COMP2521/lectures/code
mkdir -p "$SAVE_DIR"

curl -s "${BASE}/lectures/code/" \
  | grep -o 'href="[^"]*/"' | sed 's/href="//;s/\/"$//' \
  | grep -Ev '^(\.|\?|/)' \
  | while read dir; do
      mkdir -p "${SAVE_DIR}/${dir}"
      curl -s -f -o "${SAVE_DIR}/${dir}/all.zip" \
        "${BASE}/lectures/code/${dir}/all.zip" 2>/dev/null && \
        echo "Downloaded: ${dir}/all.zip"
    done

4. Download tutorials and labs

bash
SAVE_DIR=~/UNSW/COMP2521
mkdir -p "${SAVE_DIR}/tutorials" "${SAVE_DIR}/labs"

# Tutorials (week numbers vary by course)
for i in 1 2 3 4 5 7 8 9 10; do
  curl -s -f "${BASE}/tut/${i}/questions" -o "${SAVE_DIR}/tutorials/tut${i}.html" 2>/dev/null
done

# Labs
for i in 1 2 3 4 5 7 8 9 11 12 13 14 15 16 17; do
  curl -s -f "${BASE}/lab/${i}/questions" -o "${SAVE_DIR}/labs/lab${i}.html" 2>/dev/null
done

5. Download exams and guides

bash
SAVE_DIR=~/UNSW/COMP2521
mkdir -p "${SAVE_DIR}/exams" "${SAVE_DIR}/guides"

curl -s -f "${BASE}/sample-exam" -o "${SAVE_DIR}/exams/sample-exam.html"
curl -s -f "${BASE}/style-guide" -o "${SAVE_DIR}/guides/style-guide.html"
curl -s -f "${BASE}/dsa-manual" -o "${SAVE_DIR}/guides/dsa-manual.html"

# Past exams
for t in 21T2 21T3 22T1 22T2 22T3 23T1 23T2 23T3 24T1 24T3 25T1 25T3; do
  curl -s -f "${BASE}/past-exam/${t}" -o "${SAVE_DIR}/exams/past-${t}.html" 2>/dev/null
done

6. Download YouTube lectures

Requires yt-dlp (brew install yt-dlp / pip install yt-dlp).

bash
# List videos in a playlist (no download)
yt-dlp --flat-playlist \
  --print "%(playlist_index)s. %(title)s (%(duration_string)s)" \
  "PLAYLIST_URL"

# Download full playlist (1080p video)
yt-dlp \
  -o '~/UNSW/COMP2521/lectures/videos/%(playlist_index)s-%(title)s.%(ext)s' \
  --format 'bestvideo[height<=1080]+bestaudio/best' \
  --write-subs --sub-langs en \
  --no-overwrites \
  "PLAYLIST_URL"

# Audio only (for commute listening)
yt-dlp \
  -o '~/UNSW/COMP2521/lectures/audio/%(playlist_index)s-%(title)s.%(ext)s' \
  --extract-audio --audio-format mp3 --audio-quality 128K \
  "PLAYLIST_URL"

7. WebCMS3 scraping (needs cookies)

bash
COOKIES=~/UNSW/cookies.txt  # Netscape format

# Fetch course page
curl -s -b "$COOKIES" -L "https://webcms3.cse.unsw.edu.au/COMP2521/26T1/"

# Extract resource IDs (require at least one digit)
curl -s -b "$COOKIES" -L "https://webcms3.cse.unsw.edu.au/COMP2521/26T1/" \
  | grep -o '/resources/[0-9][0-9]*' | sort -u

# Fetch a specific resource: replace {ID} with one of the IDs listed above
curl -s -b "$COOKIES" -L "https://webcms3.cse.unsw.edu.au/COMP2521/26T1/resources/{ID}"

# Extract YouTube links from the course page
curl -s -b "$COOKIES" -L "https://webcms3.cse.unsw.edu.au/COMP2521/26T1/" \
  | grep -o 'https://[^"]*youtube[^"]*' | sed 's/&amp;/\&/g' | sort -u

Output Directory Structure

code
~/UNSW/COMP{CODE}/
├── lectures/
│   ├── slides/              # PDF lecture slides
│   ├── code/                # Source code per week
│   │   ├── wk1-topic/
│   │   │   ├── all.zip
│   │   │   ├── solution/
│   │   │   └── starter/
│   │   └── ...
│   ├── revision/            # Revision exercise zips
│   ├── videos/              # YouTube recordings (via yt-dlp)
│   ├── audio/               # Audio-only versions
│   └── youtube-links.txt    # All YouTube URLs
├── tutorials/               # Tutorial question pages (HTML)
├── labs/                    # Lab question pages (HTML)
├── assignments/             # Assignment specs (HTML)
├── exams/                   # Past exams + sample exam (HTML)
├── guides/                  # Style guide, manuals
└── webcms-pages/            # Raw WebCMS3 pages (HTML)
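
The layout above can be created in one pass before any downloads start. This is a sketch: `make_course_tree` is a hypothetical helper, and the per-week folders under `lectures/code/` are left to the download step because their names depend on the course.

```shell
# Create the output directory layout for one course under the given root.
# make_course_tree is a hypothetical helper name.
make_course_tree() {
  root=$1   # e.g. "$HOME/UNSW/COMP2521"
  for d in lectures/slides lectures/code lectures/revision lectures/videos \
           lectures/audio tutorials labs assignments exams guides webcms-pages; do
    mkdir -p "$root/$d"
  done
}

# Usage:
#   make_course_tree "$HOME/UNSW/COMP2521"
```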

Troubleshooting

| Problem | Solution |
|---|---|
| 403 on CGI resource | Behind CGI portal auth; not accessible via cookies |
| 404 on WebCMS3 | Past terms get deleted; only current term exists |
| Can't find slides | Try all paths: lectures/slides/, lectures/, slides/, Lectures/ |
| Empty directory | Term just started; try previous term |
| yt-dlp fails | Video may be unlisted. Try --cookies-from-browser chrome |
| Cookie expired | remember_token ~1 year, session = browser session. Re-export. |
| CGI site doesn't exist for course | Course only uses WebCMS3; needs enrollment cookies |