AgentSkillsCN

machine_learning

Machine learning algorithms, Python implementations, scikit-learn applications, and model evaluation methods.

SKILL.md
---
name: machine_learning
description: ML algorithms, Python implementations, scikit-learn usage, and model evaluation
---

Machine Learning Assistant

Purpose: Explain ML concepts, implement algorithms in Python, and guide practical ML work.


Algorithm Coverage

Supervised Learning

| Algorithm | Type | Use Case | Key Parameters |
|---|---|---|---|
| Linear Regression | Regression | Continuous prediction | fit_intercept |
| Logistic Regression | Classification | Binary/multi-class | C, solver |
| Decision Tree | Both | Interpretable models | max_depth, criterion |
| Random Forest | Both | Ensemble accuracy | n_estimators |
| KNN | Both | Similarity-based prediction | n_neighbors, metric |
| SVM | Both | High-dimensional data | C, kernel, gamma |
| Naive Bayes | Classification | Text, probabilistic | var_smoothing (GaussianNB); alpha (MultinomialNB, for text) |
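A minimal sketch of how the key parameters in the table map onto scikit-learn estimators, assuming scikit-learn's bundled iris (classification) and diabetes (regression) datasets and 5-fold cross-validation. The parameter values are illustrative starting points, mostly the library defaults, not tuned recommendations; the `cls_scores` dictionary is just a name chosen for this example.

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Classifiers from the table, with their key parameters set explicitly.
# Models sensitive to feature scale (regularized logistic regression, KNN, SVM)
# get a StandardScaler in front of them.
classifiers = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000)
    ),
    "Decision Tree": DecisionTreeClassifier(max_depth=3, criterion="gini", random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5, metric="minkowski")),
    "SVM": make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf", gamma="scale")),
    "Naive Bayes": GaussianNB(var_smoothing=1e-9),
}

X, y = load_iris(return_X_y=True)
cls_scores = {}
for name, clf in classifiers.items():
    # Mean 5-fold cross-validated accuracy (the default score for classifiers)
    cls_scores[name] = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name:20s} accuracy = {cls_scores[name]:.3f}")

# Linear regression is regression-only, so score it (R², the default) on a regression dataset
Xr, yr = load_diabetes(return_X_y=True)
r2 = cross_val_score(LinearRegression(fit_intercept=True), Xr, yr, cv=5).mean()
print(f"{'Linear Regression':20s} R² = {r2:.3f}")
```

Wrapping scale-sensitive models in a pipeline means the scaler is refit inside each cross-validation fold, so the reported scores are not inflated by test-fold statistics.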

Unsupervised Learning

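| Algorithm | Use Case | Key Parameters |
|---|---|---|
| K-Means | Clustering | n_clusters, init |
| Hierarchical (agglomerative) | Clustering | linkage, n_clusters |
| PCA | Dimensionality reduction | n_components |
| Apriori | Association rules (not part of scikit-learn) | min_support, min_confidence |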
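The sketch below exercises the scikit-learn entries on the bundled iris dataset; the true species labels are used only afterwards, to score the clusterings with the adjusted Rand index. scikit-learn has no Apriori implementation, so the last part merely computes support and confidence by hand on a hypothetical four-basket dataset and filters item pairs by brute force. It illustrates what `min_support` and `min_confidence` threshold, not Apriori's level-wise candidate pruning.

```python
from itertools import combinations

from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)

# PCA: project the 4 standardized features onto 2 principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))

# K-Means with 3 clusters; "k-means++" is also scikit-learn's default init
kmeans = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
km_labels = kmeans.fit_predict(X_scaled)

# Hierarchical (agglomerative) clustering with Ward linkage
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
agg_labels = agg.fit_predict(X_scaled)

print("K-Means ARI:", round(adjusted_rand_score(y, km_labels), 3))
print("Agglomerative ARI:", round(adjusted_rand_score(y, agg_labels), 3))

# Support and confidence on hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset):
    # Fraction of baskets that contain every item in itemset
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    # Estimated P(consequent | antecedent)
    return support(antecedent | consequent) / support(antecedent)

min_support, min_confidence = 0.5, 0.6
items = sorted(set().union(*transactions))
rules = []
for a, b in combinations(items, 2):
    if support({a, b}) >= min_support:          # keep only frequent pairs
        for lhs, rhs in ((a, b), (b, a)):       # try both rule directions
            conf = confidence({lhs}, {rhs})
            if conf >= min_confidence:
                rules.append((lhs, rhs, conf))

for lhs, rhs, conf in rules:
    print(f"{lhs} -> {rhs}  confidence = {conf:.2f}")
```

Scaling before PCA and clustering matters here because both rely on Euclidean distances or variances, which would otherwise be dominated by the features with the largest units.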

Standard Code Template

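```python
# 1. Import
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier  # swap in any estimator from the tables above
from sklearn.metrics import accuracy_score, confusion_matrix

# 2. Load data (assumes data.csv has a 'target' column; all other columns are features)
df = pd.read_csv('data.csv')
X = df.drop('target', axis=1)
y = df['target']

# 3. Split: hold out 20% for testing; stratify=y keeps class proportions equal (classification only)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# 4. Train
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# 5. Predict
y_pred = model.predict(X_test)

# 6. Evaluate (classification metrics; use MSE or R² for regressors)
print(f"Accuracy: {accuracy_score(y_test, y_pred):.2f}")
print(confusion_matrix(y_test, y_pred))
```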
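Because the template depends on an external `data.csv`, here is a self-contained variant, assuming the bundled iris dataset and an SVM chosen purely for illustration. It also adds a preprocessing step: putting `StandardScaler` and the model in one `Pipeline` means the scaler is fit on the training split only.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Built-in dataset, so the script runs without any external file
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Scaler statistics are learned from X_train inside fit(), so nothing
# from the test split leaks into training
model = Pipeline([
    ("scale", StandardScaler()),
    ("svm", SVC(C=1.0, kernel="rbf", gamma="scale")),
])
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
print(f"Accuracy: {acc:.2f}")
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))
```

`classification_report` prints per-class precision, recall and F1 alongside accuracy, which is more informative than accuracy alone when classes are uneven.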

Evaluation Metrics

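| Metric | Formula | When to Use |
|---|---|---|
| Accuracy | (TP+TN)/(TP+TN+FP+FN) | Balanced classes |
| Precision | TP/(TP+FP) | False positives are costly |
| Recall | TP/(TP+FN) | False negatives are costly |
| F1 Score | 2×(P×R)/(P+R), with P = precision, R = recall | Imbalanced classes |
| MSE | Σ(y-ŷ)²/n | Regression |
| R² | 1 - (SS_res/SS_tot), with SS_res = Σ(y-ŷ)² and SS_tot = Σ(y-ȳ)² | Regression fit |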
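A small worked check of the formulas in the table, on hypothetical toy labels: the counts come from `confusion_matrix`, each metric is computed by hand, and the results are compared with scikit-learn's own metric functions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score, confusion_matrix, f1_score, mean_squared_error,
    precision_score, r2_score, recall_score,
)

# Hypothetical binary labels (1 = positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# With labels {0, 1}, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()  # 5, 1, 1, 3
accuracy = (tp + tn) / (tp + tn + fp + fn)                 # 8/10 = 0.8
precision = tp / (tp + fp)                                 # 3/4 = 0.75
recall = tp / (tp + fn)                                    # 3/4 = 0.75
f1 = 2 * precision * recall / (precision + recall)         # 0.75

# Hypothetical regression targets and predictions
y_reg = np.array([3.0, 5.0, 2.0, 7.0])
y_hat = np.array([2.5, 5.0, 3.0, 8.0])
ss_res = np.sum((y_reg - y_hat) ** 2)                      # 2.25
ss_tot = np.sum((y_reg - y_reg.mean()) ** 2)               # 14.75
mse = ss_res / len(y_reg)                                  # 0.5625
r2 = 1 - ss_res / ss_tot                                   # ≈ 0.847

checks = [
    ("accuracy", accuracy, accuracy_score(y_true, y_pred)),
    ("precision", precision, precision_score(y_true, y_pred)),
    ("recall", recall, recall_score(y_true, y_pred)),
    ("F1", f1, f1_score(y_true, y_pred)),
    ("MSE", mse, mean_squared_error(y_reg, y_hat)),
    ("R²", r2, r2_score(y_reg, y_hat)),
]
for name, by_hand, sklearn_value in checks:
    print(f"{name:9s} by hand = {by_hand:.4f}   sklearn = {sklearn_value:.4f}")
```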

Explanation Rules

  • Always show complete runnable code
  • Include data preprocessing steps
  • Explain hyperparameter choices
  • Show both training and evaluation
  • Use real datasets (iris, digits, diabetes, California housing) for examples; avoid boston, which was removed from scikit-learn in version 1.2
  • Include visualization when helpful
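One common way to justify a hyperparameter choice is to let cross-validation pick it and then confirm on held-out data. A minimal sketch, assuming the bundled digits dataset and a KNN classifier; the grid values are illustrative, not recommendations.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# make_pipeline names each step after its class in lowercase,
# hence the "kneighborsclassifier__" prefix in the grid keys
pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
param_grid = {
    "kneighborsclassifier__n_neighbors": [1, 3, 5, 7, 9],
    "kneighborsclassifier__weights": ["uniform", "distance"],
}

# Every combination is scored with 5-fold CV on the training split only
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")

# The refit best model is evaluated once on the untouched test split
test_acc = search.score(X_test, y_test)
print(f"held-out test accuracy: {test_acc:.3f}")
```

Reporting both the cross-validated score and the held-out score makes the choice explainable: the grid shows which values were considered, and the test score checks that the winner generalizes.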