Learning Code Generation
Objectives
Generate well-structured, self-documenting Python code for course assignments that meets academic requirements.
Instructions
1. Understand Requirements
Before generating code:
- Read the assignment document thoroughly
- Identify all required steps and their order
- Note submission requirements (file format, naming, structure)
- Check for instructor-specific code style requirements
2. Code Structure
Student Information:
Read student information from .env.local in workspace root:
- NAME: Student name
- NUMBER: Student number
- EMAIL: Student email (optional)
Use python-dotenv to load environment variables at the start of the script.
For Python scripts (.py):
Use the template at templates/standard_bilingual_template.py as base.
# ============================================================
# 配置常量
# Configuration Constants
# ============================================================
RANDOM_STATE = 42
def main():
    # ============================================================
    # 步骤 0:实验初始化 (Noise Reduction)
    # Step 0: Lab Initialization
    # ============================================================
    output_dir, line_width = initialize_lab()

    # ============================================================
    # 步骤 1:数据加载
    # Step 1: Data Loading
    # ============================================================
    # [Bilingual Comments Here]
    print_step("Step 1: Data Loading", "CSV file", "DataFrame (150, 4)", line_width)
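The template call above passes a title, an input description, an output description, and a line width to print_step, but the helper itself is not shown in this section. A minimal sketch, assuming that four-argument signature and a simple Input/Output layout (the parameter names, the format_step helper, and the layout are all hypothetical, not taken from the template file), might look like this:

```python
def format_step(title, input_desc, output_desc, line_width):
    """Build the header text for one step (hypothetical layout)."""
    divider = "=" * line_width
    return "\n".join([
        divider,
        title,
        divider,
        f"Input:  {input_desc}",
        f"Output: {output_desc}",
    ])


def print_step(title, input_desc, output_desc, line_width):
    """Print the step header; the parameter names are assumptions."""
    print(format_step(title, input_desc, output_desc, line_width))


# line_width is passed in by the caller, so it can stay local to main()
print_step("Step 1: Data Loading", "CSV file", "DataFrame (150, 4)", 60)
```

Splitting the formatting from the printing is only a convenience for checking the text; a single print_step function would serve equally well.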
⚠️ RULE: main() Step Dividers
Every step call inside main() MUST use 60-char = dividers with bilingual titles — not plain inline comments. This ensures consistent visual structure between function definitions and their invocations in main.
def main():
    # ============================================================
    # 步骤 0:实验初始化
    # Step 0: Lab Initialization
    # ============================================================
    config = initialize_lab()

    # ============================================================
    # 步骤 1:数据加载
    # Step 1: Data Loading
    # ============================================================
    df = load_data("data.csv")
❌ BAD — plain comments without dividers:
def main():
    # 步骤 1:数据加载
    # Step 1: Data loading
    df = load_data("data.csv")
3. Output Formatting Requirements
⚠️ PRINCIPLE 1: Raw Data Integrity
When printing datasets or statistics, always show the original form of the data.
- No Internal Mapping: Never map numeric labels (e.g., 0, 1) to string names (e.g., class_0) inside the script for the "Statistics" step.
- Show Objects As-Is: Display data exactly as it looks after loading. Do not "beautify" or alter original values.
- Raw Means Clean: The most professional output is one that accurately reflects the raw state of the dataset.
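To make the "No Internal Mapping" rule concrete, the following sketch counts class labels exactly as they were loaded. The label list is made-up illustration data, and the use of collections.Counter is an assumption; any approach that prints the raw values unchanged fits the principle.

```python
from collections import Counter

# Hypothetical numeric labels, exactly as they appear after loading
labels = [0, 1, 1, 0, 2, 1]

# Count the labels as stored; no mapping to names such as "class_0"
label_counts = Counter(labels)
for label, count in sorted(label_counts.items()):
    print(f"Label {label}: {count} instances")
```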
⚠️ PRINCIPLE 2: Concise and Aligned Output
- Header Format: Use exactly 60 '=' characters above and below the step title.
# ============================================================
# Step N: Step Title
# ============================================================
- Avoid Truncation: Use pd.set_option('display.max_columns', None), pd.set_option('display.width', 1000), and pd.set_option('display.expand_frame_repr', False) to ensure all data columns are visible in a single block.
- Overwrite Policy: All operations (executing scripts, capturing output, generating screenshots) should directly overwrite existing files. Do not use temporary or numbered filenames.
- Verification First: Always run the script and verify the console output (saved to output.txt) BEFORE generating screenshots.
Example of Precise Output:
# ✅ GOOD - Precision for "Step 2: Dataset Statistics"
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
pd.set_option('display.expand_frame_repr', False)
print("=" * 60)
print("Step 2: Print dataset statistics")
print("=" * 60)
print(f"Number of instances: {X.shape[0]}")
print(f"Number of attributes: {X.shape[1]}")
print(df.head())
Do NOT include:
- Over-design: No complex tables if a simple DataFrame print is enough.
- Redundant info: Don't print stats in the "Load" step.
- Truncated output: Ensure all columns are shown.
- Mismatched headers: Always use 60 '=' characters.
# ❌ BAD - Results as raw arrays
print(f"Accuracy: {accuracy}")
print(f"Confusion Matrix: {cm}")
# ✅ GOOD - Formatted results table
from tabulate import tabulate
print("=" * 60)
print("Step 10: Results table with accuracies and confusion matrices")
print("=" * 60)
results_table = []
for name, acc, cm in results:
    results_table.append([name, f"{acc:.4f}", str(cm.tolist())])
headers = ["Model", "Accuracy", "Confusion Matrix"]
print(tabulate(results_table, headers=headers, tablefmt="simple"))
Required Dependencies for Formatted Output:
from tabulate import tabulate  # For formatted tables
import pandas as pd  # For DataFrame display
⚠️ PRINCIPLE 3: Noise Isolation (Step 0)
All environment-related "noise" (dotenv, pandas options, directory creation, student info retrieval) MUST be abstracted into an initialize_lab() function.
- Global Scope: Only library imports and core algorithm constants (e.g., RANDOM_STATE) are allowed.
- Local Scope: UI constants like line_width and path constants like output_dir should be defined in main or initialize_lab and passed as arguments.
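One way to gather the Step 0 "noise" into a single function is sketched below. The parameter names, the default directory name "output", and the printed header layout are assumptions made for illustration; the guarded imports are also an assumption, included only so the sketch still runs where python-dotenv or pandas is not installed.

```python
import os
from datetime import datetime

try:
    from dotenv import load_dotenv  # python-dotenv
except ImportError:
    load_dotenv = None  # assumption: rely on variables already in the environment

try:
    import pandas as pd
except ImportError:
    pd = None  # assumption: skip display options when pandas is absent

RANDOM_STATE = 42  # core algorithm constant, allowed at global scope


def initialize_lab(env_file=".env.local", output_dir="output"):
    """Run all Step 0 setup and return the values main() needs."""
    # UI constants stay local instead of living at module level
    line_width = 60
    display_width = 1000

    if load_dotenv is not None:
        load_dotenv(env_file)
    student_name = os.getenv("NAME", "[Your Name]")
    student_number = os.getenv("NUMBER", "[Your Student Number]")
    current_date = datetime.now().strftime("%Y-%m-%d")

    if pd is not None:
        pd.set_option("display.max_columns", None)
        pd.set_option("display.width", display_width)
        pd.set_option("display.expand_frame_repr", False)

    os.makedirs(output_dir, exist_ok=True)

    print("=" * line_width)
    print(f"Author: {student_name} ({student_number})")
    print(f"Date: {current_date}")
    print("=" * line_width)
    return output_dir, line_width
```

With this shape, main() can begin with output_dir, line_width = initialize_lab() and pass both values on to later steps as arguments.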
Default Parameter Documentation:
When using algorithms with default parameters, ALWAYS print them:
# ✅ GOOD - Document default parameters
print("SVM with Linear kernel")
print("  C (regularization): 1.0 (default)")
print("    - Higher C = less regularization, may overfit")
print("    - Lower C = more regularization, may underfit")
For Jupyter Notebooks (.ipynb):
- First cell: Code to load student info from .env.local and print header
- Second cell: Markdown with title, author info (using variables), date
- Each step: Markdown cell + Code cell
- Final cell: Submission reminder (if needed)
For detailed structure examples: See references/structure-examples.md
3. Core Principles
Self-Documenting Code:
- Use clear, descriptive variable names.
- Absolutely No Magic Numbers:
  - All numeric literals with domain meaning (thresholds, sizes, ratios, limits) MUST be extracted to named constants.
  - Constants should be defined in the Configuration Constants section using UPPER_SNAKE_CASE.
  - Only trivially obvious values (0, 1, -1, 2 for halving/doubling) may remain inline.
- Structure code to reveal intent.
- Scientific Notation: Use decimal forms (e.g., 0.001, 0.0003) instead of scientific notation (e.g., 1e-3, 3e-4). Add a comment explaining the value and its role (e.g., "controls the step size of weight updates").
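As an illustration of the rules above, a Configuration Constants section might look like the following. Apart from RANDOM_STATE, the constant names and all values are hypothetical examples, not requirements from any particular assignment.

```python
# ============================================================
# 配置常量
# Configuration Constants
# ============================================================
RANDOM_STATE = 42
# 学习率:控制每次权重更新的步长
# Learning rate: controls the step size of weight updates
LEARNING_RATE = 0.001
# 测试集比例:保留 20% 的样本用于评估
# Test ratio: hold out 20% of the samples for evaluation
TEST_RATIO = 0.2
# 样本总数(示例值)
# Total number of samples (example value)
TOTAL_SAMPLES = 150

# 计算测试集样本数
# Compute the number of test samples
test_sample_count = round(TOTAL_SAMPLES * TEST_RATIO)
print(f"Test samples: {test_sample_count}")
```

Every literal with domain meaning carries a name and a bilingual comment, and the learning rate is written as 0.001 rather than 1e-3.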
Function Usage:
- Only create functions when code is repeated (DRY principle)
- Don't create functions for one-time operations
- Keep main program flow readable and sequential
Comments (Bilingual):
Follow dev-code_comment skill for bilingual comments:
- File docstring: English only
- Inline comments: Chinese line + English line above code
- Complex logic: Add reason (原因/Reason)
- API parameters: Explain what each value does, not just restate parameter names. For complex APIs (e.g., Stable-Baselines3, PyGame, OpenCV), define how each argument affects the algorithm or visualization.
Example:
# 使用StandardScaler标准化数据
# Use StandardScaler to standardize data
# 原因:SVM对特征尺度敏感
# Reason: SVM is sensitive to feature scales
scaler = StandardScaler()

# 参数:127 是明暗分界线(亮度 > 127 的像素变白,≤ 127 的变黑),
#       255 是"变白"后赋予的像素值(纯白),
#       THRESH_BINARY 表示输出只有纯黑(0)和纯白(255)两种结果
# Parameters: 127 is the brightness cutoff (pixels > 127 become white, ≤ 127 become black),
#             255 is the value assigned to "white" pixels (pure white),
#             THRESH_BINARY means output has only two values: black(0) and white(255)
# 原因:二值化处理有助于提取目标轮廓
# Reason: Binarization helps extract object contours
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)
Class Documentation:
Always include a 60-character box header ABOVE class definitions:
# ============================================================
# QLearningAgent: 封装有 Q-Table 及其更新法则的强化学习类
# Reinforcement learning class encapsulating Q-Table and its update rules
# ============================================================
class QLearningAgent:
    """封装有 Q-Table 及其更新法则的强化学习类
    Reinforcement learning class encapsulating Q-Table and its update rules"""
    ...
Avoid AI Appearance:
- No summary/conclusion sections at end
- No "Lab completed successfully!" messages
- No structured final summaries with statistics
- End with last required step + simple submission reminder
For detailed principles and examples: See references/code-principles.md
4. Common Patterns
- Data Analysis: Import → Load → Preprocess → Analyze → Visualize
- Algorithm: Import → Define helpers → Implement → Test → Analyze
- Machine Learning: Import → Load → Engineer → Train → Evaluate → Visualize
For pattern details: See references/common-patterns.md
5. Environment Setup
Required Package:
Ensure python-dotenv is available for loading .env.local:
from dotenv import load_dotenv
import os
load_dotenv('.env.local')
STUDENT_NAME = os.getenv('NAME', '[Your Name]')
STUDENT_NUMBER = os.getenv('NUMBER', '[Your Student Number]')
Date Formatting:
Use datetime for current date:
from datetime import datetime
current_date = datetime.now().strftime('%Y-%m-%d')
6. Language Requirements
Bilingual Comments (Chinese + English):
- Variable names: English
- Function names: English
- Comments: Chinese line first, then English line
- Docstrings: English only
- Print outputs: English
7. Screenshot Separation
CRITICAL: Do NOT include internal logging or screenshot code in scripts.
- No Logger Classes: Avoid defining Logger or OutputCapture classes in the coursework script.
- Manual Redirection: Use terminal redirection to capture output.txt for verification and screenshots.
uv run python lab[n]_*.py > lab[n]_images/output.txt 2>&1
Never include:
- OutputCapture or Logger classes
- save_code_screenshot() functions
- StringIO or screenshot-related imports
- Any code that captures or redirects stdout internally
Keep assignment code focused on:
- Core analysis logic
- Data processing
- Visualization (using plt.savefig() for plots)
- Results output to terminal
For screenshot generation: Use the learning-code_screenshot skill separately.
8. Submission Reminder
⚠️ IMPORTANT: Submission reminders are for debugging purposes ONLY.
The reminder section should:
- Be placed at the very END of the script
- Be separated from the last step's output
- NOT appear in output screenshots (use generate_output_screenshots.py, which captures step-by-step output, not the entire run)
# Only print reminder after all steps are complete (debugging only)
print()
print("=" * 60)
print("Reminder:")
print("1. Take screenshots of code from Google Colab")
print("2. Paste screenshots into Lab1AnswerTemplate.md")
print("3. Fill in descriptions for each step")
print("4. Convert markdown to .docx for submission")
print("=" * 60)
Note: When generating output screenshots with learning-code_screenshot skill, each step's output is captured separately, so submission reminders won't appear in the screenshots.
Validation
After generating code, check:
Documentation:
- Student info and environment "noise" isolated in initialize_lab() (Step 0)
- Current date generated using datetime inside Step 0
- File-level docstring with author info
- Concise function docstrings (two-line bilingual)
- Box-style function headers ABOVE definitions for parameters/returns
- Box-style class headers ABOVE class definitions
- Dividers are exactly 60 characters long
- Every step in main() uses 60-char = dividers (not plain comments)
- No scientific notation (e.g., 1e-3); all such values are written in decimal style (e.g., 0.001)
Code Quality:
- Meaningful, self-explanatory variable names
- Absolutely NO magic numbers (all meaningful numeric literals extracted to constants)
- UI/Formatting constants (like line_width) are LOCAL to main
- Minimal "why" comments above every single line of code
Requirements:
- Follows assignment step order exactly
- All required steps implemented
- No AI-generated appearance (summaries, conclusions)
- English for identifiers and print output; comments bilingual (Chinese line, then English line)
For detailed validation checklist: See references/validation-guide.md
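The divider-length item in the checklist can be checked mechanically. The sketch below is one possible approach, assuming dividers are comment lines made only of '=' characters after a leading "# "; the function name find_bad_dividers and the DIVIDER_LENGTH constant are hypothetical and not part of any referenced tool.

```python
DIVIDER_LENGTH = 60


def find_bad_dividers(source_text):
    """Return (line_number, length) for '=' dividers that are not 60 long."""
    problems = []
    for line_number, line in enumerate(source_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith("# ="):
            continue
        divider = stripped[2:]  # drop the leading "# "
        # Only pure '=' runs count as dividers; titled lines are ignored
        if set(divider) == {"="} and len(divider) != DIVIDER_LENGTH:
            problems.append((line_number, len(divider)))
    return problems


sample = "\n".join([
    "# " + "=" * 60,
    "# Step 1: Data Loading",
    "    # " + "=" * 58,
])
print(find_bad_dividers(sample))
```

An empty list means every divider found has the expected length; indented dividers inside main() are handled because each line is stripped first.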
Workflow
- Read assignment document
- Identify all steps
- Generate code following step order
- Use self-documenting code practices
- Add functions only for repeated operations
- Include docstrings for functions
- Add minimal "why" comments if needed
- Add submission reminder (if appropriate)
- Validate against checklist
- Save to appropriate location
Anti-Patterns
- ❌ Generating code without reading requirements
- ❌ Using Chinese-only comments or Chinese variable names
- ❌ Over-commenting obvious operations
- ❌ Creating functions for one-time operations
- ❌ Adding AI-generated summaries/conclusions
- ❌ Using meaningless variable names (x, y, data1)
- ❌ Not following assignment step order
- ❌ Hardcoding values that should be constants
- ❌ Including screenshot generation code (use the learning-code_screenshot skill)
For more examples: See references/code-principles.md