AgentSkillsCN

peil-evaluation

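Evaluates and judges PEIL-generated prompts for quality and effectiveness. Use this skill when assessing prompt quality, providing feedback on system prompts, rating prompts against established criteria, or improving prompt engineering outputs. Keywords: prompt evaluation, prompt quality, prompt feedback, prompt rating, prompt assessment, LLM evaluation, prompt judging.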

SKILL.md
---
name: peil-evaluation
description: Evaluates and judges PEIL-generated prompts for quality and effectiveness. Use when assessing prompt quality, providing feedback on system prompts, rating prompts against criteria, or improving prompt engineering outputs. Keywords: prompt evaluation, prompt quality, prompt feedback, prompt rating, prompt assessment, LLM evaluation, prompt judge.
license: MIT
metadata:
  author: Tim Haintz
  version: "0.2"
  companion-skill: peil
---

PEIL Evaluation Skill

This skill evaluates prompts generated using the Prompt Engineering Instructional Language (PEIL) methodology for quality and effectiveness.

When to Use This Skill

  • Assessing the quality of generated system prompts
  • Providing constructive feedback on prompt design
  • Rating prompts against established criteria
  • Iteratively improving prompts before deployment
  • Comparing multiple prompt versions

Evaluation Criteria (Weighted)

| Criterion | Weight | Description |
|-----------|--------|-------------|
| Clarity and Coherence | 30% | Is the language clear and unambiguous? Does the prompt have a logical flow? |
| Completeness and Comprehensiveness | 25% | Does the prompt cover all necessary aspects? Are important elements missing? |
| Relevance and Applicability | 20% | How well does the prompt align with its intended purpose? Is it practical? |
| Creativity and Originality | 15% | Does the prompt introduce novel approaches? How original is it? |
| Technical Accuracy | 10% | Are technical details and instructions accurate? |

Evaluation Process

Step 1: Initial Assessment

Read the prompt completely before scoring. Identify:

  • The intended purpose/task
  • Target audience (agent, human, specific domain)
  • Expected output format

Step 2: Criterion-by-Criterion Analysis

For each of the 5 criteria:

  1. Identify specific strengths
  2. Identify areas for improvement
  3. Assign a score (0-100)

Step 3: Calculate Overall Score

code
Overall = (Clarity × 0.30) + (Completeness × 0.25) + (Relevance × 0.20) + (Creativity × 0.15) + (Accuracy × 0.10)
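
The formula above can be sketched in Python as follows. This is a minimal illustration, not part of the skill itself: the dictionary keys and the `overall_score` function name are assumptions chosen for the example, while the weights come from the Evaluation Criteria table.

```python
# Weights from the Evaluation Criteria table. The key names are
# illustrative and not defined by the skill.
WEIGHTS = {
    "clarity": 0.30,
    "completeness": 0.25,
    "relevance": 0.20,
    "creativity": 0.15,
    "accuracy": 0.10,
}


def overall_score(scores: dict[str, float]) -> float:
    """Return the weighted overall score for per-criterion scores in 0-100."""
    if set(scores) != set(WEIGHTS):
        raise ValueError(f"expected scores for exactly: {sorted(WEIGHTS)}")
    for name, value in scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score {value} is outside 0-100")
    return sum(scores[name] * weight for name, weight in WEIGHTS.items())


# Hypothetical scores for a single prompt.
example = {
    "clarity": 80,
    "completeness": 70,
    "relevance": 90,
    "creativity": 60,
    "accuracy": 100,
}
print(round(overall_score(example), 2))  # 24 + 17.5 + 18 + 9 + 10 = 78.5
```

Because the weights sum to 1.0, the overall score stays on the same 0-100 scale as the individual criteria.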

Step 4: Generate Feedback

Provide actionable recommendations for improvement.

Evaluation Output Format

markdown
## Prompt Evaluation Report

### Overall Score: [X]/100

### Criterion Breakdown

| Criterion | Score | Strengths | Areas for Improvement |
|-----------|-------|-----------|----------------------|
| Clarity (30%) | X/100 | ... | ... |
| Completeness (25%) | X/100 | ... | ... |
| Relevance (20%) | X/100 | ... | ... |
| Creativity (15%) | X/100 | ... | ... |
| Technical Accuracy (10%) | X/100 | ... | ... |

### Key Recommendations
1. [Specific, actionable recommendation]
2. [Specific, actionable recommendation]
3. [Specific, actionable recommendation]

### Summary
[2-3 sentence summary of the evaluation]
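
If the report is assembled programmatically, the template above could be filled in along these lines. This is a sketch under assumptions: `CriterionResult`, `render_report`, and the sample values are hypothetical names and data invented for the example. Only the headings and table columns come from the template.

```python
from dataclasses import dataclass


@dataclass
class CriterionResult:
    label: str         # row label, e.g. "Clarity (30%)"
    score: int         # 0-100
    strengths: str
    improvements: str


def _cell(text: str) -> str:
    """Escape pipes so free text cannot break a table row."""
    return text.replace("|", "\\|")


def render_report(overall: float, results: list[CriterionResult],
                  recommendations: list[str], summary: str) -> str:
    """Build the Markdown report using the template's headings and columns."""
    lines = [
        "## Prompt Evaluation Report",
        "",
        f"### Overall Score: {overall:.1f}/100",
        "",
        "### Criterion Breakdown",
        "",
        "| Criterion | Score | Strengths | Areas for Improvement |",
        "|-----------|-------|-----------|----------------------|",
    ]
    for r in results:
        lines.append(
            f"| {r.label} | {r.score}/100 | {_cell(r.strengths)} | {_cell(r.improvements)} |"
        )
    lines += ["", "### Key Recommendations"]
    lines += [f"{i}. {text}" for i, text in enumerate(recommendations, start=1)]
    lines += ["", "### Summary", summary]
    return "\n".join(lines)


# Hypothetical single-row example; a real report would have five rows.
report = render_report(
    78.5,
    [CriterionResult("Clarity (30%)", 80, "Clear role", "Ambiguous output | format")],
    ["Specify the output format"],
    "Solid prompt with minor gaps.",
)
print(report)
```

Escaping the pipe character keeps free-text strengths and improvement notes from splitting a row into extra columns.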

Scoring Guidelines

Clarity and Coherence (30%)

| Score Range | Indicators |
|-------------|------------|
| 90-100 | Crystal clear instructions, perfect logical flow, no ambiguity |
| 70-89 | Mostly clear, minor ambiguities, good structure |
| 50-69 | Some unclear sections, could be better organized |
| 30-49 | Confusing in places, poor flow, significant ambiguity |
| 0-29 | Very unclear, disorganized, highly ambiguous |

Completeness and Comprehensiveness (25%)

| Score Range | Indicators |
|-------------|------------|
| 90-100 | All PEIL components present, thorough coverage |
| 70-89 | Most components present, good coverage with minor gaps |
| 50-69 | Some components missing, moderate coverage |
| 30-49 | Several components missing, incomplete coverage |
| 0-29 | Most components missing, very incomplete |

Relevance and Applicability (20%)

| Score Range | Indicators |
|-------------|------------|
| 90-100 | Perfectly aligned with purpose, immediately applicable |
| 70-89 | Well-aligned, practical with minor adjustments |
| 50-69 | Somewhat aligned, needs modification for use |
| 30-49 | Poorly aligned, limited practical value |
| 0-29 | Not aligned with purpose, impractical |

Creativity and Originality (15%)

| Score Range | Indicators |
|-------------|------------|
| 90-100 | Highly innovative approach, novel techniques |
| 70-89 | Some creative elements, good use of techniques |
| 50-69 | Standard approach, minimal creativity |
| 30-49 | Very basic, formulaic |
| 0-29 | No creativity, copied template |

Technical Accuracy (10%)

| Score Range | Indicators |
|-------------|------------|
| 90-100 | All technical details correct, best practices followed |
| 70-89 | Mostly accurate, minor technical issues |
| 50-69 | Some inaccuracies, deviations from best practices |
| 30-49 | Several inaccuracies, poor technical implementation |
| 0-29 | Major technical errors, incorrect information |
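
All five guideline tables share the same score bands, so a single lookup can map a numeric score to its band. The sketch below uses the Technical Accuracy indicators as sample text. The `BANDS` list and `score_band` function are hypothetical names for illustration, not something the skill defines.

```python
# (lower bound, band label, indicator), highest band first.
# Indicator text is copied from the Technical Accuracy table.
BANDS = [
    (90, "90-100", "All technical details correct, best practices followed"),
    (70, "70-89", "Mostly accurate, minor technical issues"),
    (50, "50-69", "Some inaccuracies, deviations from best practices"),
    (30, "30-49", "Several inaccuracies, poor technical implementation"),
    (0, "0-29", "Major technical errors, incorrect information"),
]


def score_band(score: int) -> tuple[str, str]:
    """Return (band label, indicator) for a score in 0-100."""
    if not 0 <= score <= 100:
        raise ValueError(f"score {score} is outside 0-100")
    for lower, label, indicator in BANDS:
        if score >= lower:
            return label, indicator
    raise AssertionError("unreachable: the last band starts at 0")


label, indicator = score_band(72)
print(f"{label}: {indicator}")
```

Swapping in the indicator text from another table gives the equivalent lookup for that criterion.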

Quick Evaluation Checklist

Before detailed evaluation, check:

  • Does the prompt have a clear Role defined?
  • Is the Context specific and relevant?
  • Are complex questions broken down?
  • Are there specific, actionable instructions?
  • Is there a length/conciseness constraint?
  • Is an appropriate prompting technique applied?
  • Is the desired output format specified?
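
One way to record the checklist outcome is a list of yes/no answers, one per question, from which the unmet items are collected. In this minimal sketch the answers still come from the evaluator's own judgment, and `CHECKLIST` and `unmet_items` are illustrative names rather than part of the skill.

```python
# Questions copied from the Quick Evaluation Checklist, in order.
CHECKLIST = [
    "Does the prompt have a clear Role defined?",
    "Is the Context specific and relevant?",
    "Are complex questions broken down?",
    "Are there specific, actionable instructions?",
    "Is there a length/conciseness constraint?",
    "Is an appropriate prompting technique applied?",
    "Is the desired output format specified?",
]


def unmet_items(answers: list[bool]) -> list[str]:
    """Return the checklist questions that were answered 'no'."""
    if len(answers) != len(CHECKLIST):
        raise ValueError(f"expected {len(CHECKLIST)} answers, got {len(answers)}")
    return [question for question, ok in zip(CHECKLIST, answers) if not ok]


# Hypothetical answers: everything passes except the length constraint
# and the output format.
gaps = unmet_items([True, True, True, True, False, True, False])
for question in gaps:
    print("Unmet:", question)
```

The unmet items give a quick list of gaps to look at during the detailed, criterion-by-criterion evaluation.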

Common Issues and Recommendations

| Issue | Recommendation |
|-------|----------------|
| Vague role definition | Add specific expertise and domain context |
| Missing context | Explain the situation and constraints |
| Overly complex | Break into sub-prompts or stages |
| No output format | Specify Markdown, JSON, bullet points, etc. |
| Wrong technique | Match technique to task type (see PEIL techniques) |
| Too long | Focus on essential instructions, move details to examples |
| Too short | Add constraints, examples, or edge case handling |
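
The table above can also serve as a lookup when drafting the Key Recommendations section of a report. The following is a small sketch that assumes issues are identified by the exact labels in the table. `RECOMMENDATIONS` and `recommendations_for` are hypothetical names.

```python
# Issue labels and recommendations copied from the table above.
RECOMMENDATIONS = {
    "Vague role definition": "Add specific expertise and domain context",
    "Missing context": "Explain the situation and constraints",
    "Overly complex": "Break into sub-prompts or stages",
    "No output format": "Specify Markdown, JSON, bullet points, etc.",
    "Wrong technique": "Match technique to task type (see PEIL techniques)",
    "Too long": "Focus on essential instructions, move details to examples",
    "Too short": "Add constraints, examples, or edge case handling",
}


def recommendations_for(issues: list[str]) -> list[str]:
    """Return one 'issue: recommendation' line per identified issue."""
    unknown = [issue for issue in issues if issue not in RECOMMENDATIONS]
    if unknown:
        raise KeyError(f"no recommendation recorded for: {unknown}")
    return [f"{issue}: {RECOMMENDATIONS[issue]}" for issue in issues]


# Hypothetical set of issues found in a prompt.
for line in recommendations_for(["Missing context", "No output format"]):
    print(line)
```

These generic lines are only a starting point. The report template asks for specific, actionable recommendations, so each one still needs tailoring to the prompt under review.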

Additional Resources