Evidence Evaluation Skill

A systematic framework for assessing the quality and reliability of evidence used in decision-making.

Purpose

Evidence-based decisions require more than just having data - they require understanding how trustworthy that data is. This skill provides tools to distinguish facts from assumptions, evaluate source quality, and document evidence systematically.

Fact vs Assumption

Understanding the difference between facts and assumptions is fundamental to evidence evaluation.

Facts

Definition: Verifiable statements supported by direct observation, measurement, or documentation.

Characteristics:

•Can be independently verified
•Based on direct evidence
•Remain true regardless of perspective
•Have clear provenance

Examples:

Statement	Why It's a Fact
"Server response time averaged 245ms over 1000 requests"	Measured and logged
"The contract expires on March 15, 2026"	Documented in signed agreement
"Three customers reported this bug in the last week"	Tracked in support system
"The test suite has 847 passing tests"	Verified by running tests

Assumptions

Definition: Beliefs or expectations accepted as true without direct verification.

Characteristics:

•May feel obvious but lack proof
•Often based on past patterns
•Can vary by perspective
•May be reasonable but unverified

Examples:

Statement	Why It's an Assumption
"Users prefer the new design"	No user research conducted
"Performance will improve with more servers"	Not tested in this context
"The competitor will respond with a price cut"	Prediction, not fact
"The team can deliver this in two sprints"	Estimate, not commitment

Converting Assumptions to Facts

When an assumption is critical to a decision, consider how to verify it:

code

Assumption: "Most users access the app on mobile"
  -> Verification: Check analytics data for device breakdown
  -> Result: "68% of sessions are from mobile devices" (fact)

Assumption: "The API can handle 10x current load"
  -> Verification: Run load tests
  -> Result: "API failed at 7x load with memory exhaustion" (fact)

Assumption: "Customers want feature X"
  -> Verification: Conduct user interviews or surveys
  -> Result: "23 of 30 interviewed customers expressed need for X" (fact)

Source Quality Framework

Evaluate evidence sources across four dimensions, each rated High/Medium/Low.

1. Credibility (Who)

Assesses the trustworthiness of the information source.

Rating	Criteria	Examples
High	Expert in relevant domain; established track record; no conflicts of interest; peer-reviewed or audited	Academic research, official documentation, recognized industry experts, audited financial statements
Medium	Knowledgeable but not expert; some track record; minor potential conflicts; reviewed but not rigorously	Industry reports, experienced practitioners, trade publications, internal analysis
Low	Unknown expertise; no track record; significant conflicts of interest; unreviewed	Anonymous sources, marketing materials, unverified claims, self-reported without validation

Questions to Ask:

•What are the source's qualifications on this topic?
•Does the source have a track record of accuracy?
•What incentives might bias the source?
•Has the information been reviewed or verified by others?

2. Methodology (How)

Assesses how the information was gathered or conclusions were reached.

Rating	Criteria	Examples
High	Rigorous, documented methodology; appropriate sample size; controlled conditions; reproducible	Scientific studies, comprehensive testing, statistical analysis with proper controls
Medium	Reasonable methodology with some gaps; moderate sample size; some documentation	Surveys with decent response rates, pilot tests, case studies with multiple examples
Low	Unclear or flawed methodology; small or biased samples; anecdotal; not reproducible	Single anecdotes, informal polls, untested hypotheses, "common knowledge"

Questions to Ask:

•How was this data collected?
•Is the sample size adequate and representative?
•Could this result be replicated?
•What variables were controlled?

3. Recency (When)

Assesses how current the information is relative to the decision context.

Rating	Criteria	Examples
High	Current or real-time; collected within relevant timeframe; accounts for recent changes	Live metrics, recent market research, current documentation, data from this quarter
Medium	Reasonably recent; may not reflect latest changes; still largely applicable	Last year's data, recent but not current reports, slightly outdated documentation
Low	Outdated; from before significant changes; may no longer apply	Multi-year-old research, pre-pivot data, documentation from deprecated systems

Questions to Ask:

•When was this information collected or published?
•Have relevant conditions changed since then?
•Is there a more recent source available?
•Does the age of this data matter for this decision?

4. Relevance (What)

Assesses how directly the information applies to the specific decision.

Rating	Criteria	Examples
High	Directly addresses the question; same context and conditions; specific to this situation	Data from this system, research on this exact problem, feedback from target users
Medium	Related but not exact; similar context; requires some extrapolation	Industry benchmarks, analogous case studies, research on similar problems
Low	Tangentially related; different context; significant extrapolation required	General statistics, research from different domains, opinions on different problems

Questions to Ask:

•Does this evidence directly address our specific question?
•How similar is the source context to our situation?
•What assumptions are needed to apply this evidence here?
•Is there more directly relevant evidence available?

Evidence Quality Score

Combine the four dimension ratings into an overall quality assessment.

Scoring Method

Assign points to each rating:

•High = 3 points
•Medium = 2 points
•Low = 1 point

Calculate total (max 12 points):

Total Score	Overall Quality	Guidance
10-12	Strong Evidence	Can rely on for important decisions
7-9	Moderate Evidence	Useful but seek corroboration
4-6	Weak Evidence	Use cautiously; actively seek better sources
1-3	Poor Evidence	Do not rely on; treat as unverified assumption

Example Evaluation

Evidence: "Industry report says 40% of companies are adopting this technology"

Dimension	Rating	Rationale
Credibility	Medium	Reputable firm but funded by vendor
Methodology	Medium	Survey of 500 companies; self-reported
Recency	High	Published this quarter
Relevance	Medium	Includes our industry but broader scope

Score: 2 + 2 + 3 + 2 = 9 (Moderate Evidence)

Conclusion: Useful directional indicator but seek additional sources before major commitments.

Red Flags

Warning signs that evidence may be unreliable:

Conflict of Interest

•Source benefits from a particular conclusion
•Funding source has stake in results
•Vendor-provided benchmarks for their own products
•Example: Security vendor's report on rising cyber threats

Cherry-Picking

•Only favorable data points presented
•Timeframe chosen to support conclusion
•Comparison groups selected to look good
•Example: "Our growth this quarter" ignoring down quarters

Missing Methodology

•No explanation of how data was gathered
•Sample size not disclosed
•Margin of error not provided
•Example: "Studies show..." without citation

Survivorship Bias

•Only successes examined, failures ignored
•Selection based on outcome
•Missing data from dropouts or failures
•Example: "Successful startups all did X"

Correlation vs Causation

•Correlation presented as causation
•No control for confounding variables
•Post-hoc reasoning
•Example: "Companies using our tool grow faster"

Appeal to Authority

•Credentials in unrelated field
•Popularity substituted for expertise
•"Everyone knows" or "experts agree"
•Example: Celebrity endorsement of technical approach

Outdated Context

•Old data applied to changed conditions
•Pre-disruption research
•Technology or market has shifted
•Example: Pre-AI research on software productivity

Sample Issues

•Non-representative sample
•Self-selection bias
•Too small to be meaningful
•Example: Survey of conference attendees as "the industry"

Evidence Documentation Template

Use this template to document and evaluate evidence systematically.

markdown

## Evidence Record

### Claim
[What assertion does this evidence support?]

### Source
- **Origin**: [Where did this evidence come from?]
- **Author/Creator**: [Who produced it?]
- **Date**: [When was it created/published?]
- **URL/Reference**: [How to find the original]

### Quality Assessment

| Dimension | Rating | Rationale |
|-----------|--------|-----------|
| Credibility (Who) | High/Medium/Low | [Why?] |
| Methodology (How) | High/Medium/Low | [Why?] |
| Recency (When) | High/Medium/Low | [Why?] |
| Relevance (What) | High/Medium/Low | [Why?] |

**Overall Score**: [X/12] - [Strong/Moderate/Weak/Poor]

### Red Flags Identified
- [ ] Conflict of interest
- [ ] Cherry-picking
- [ ] Missing methodology
- [ ] Survivorship bias
- [ ] Correlation as causation
- [ ] Appeal to authority
- [ ] Outdated context
- [ ] Sample issues
- [ ] Other: [specify]

### Notes
[Additional context, caveats, or observations]

### Recommendation
[How should this evidence be used in the decision?]

Using This Skill

When Gathering Evidence

•Document each piece of evidence using the template
•Evaluate quality across all four dimensions
•Note any red flags
•Calculate overall quality score

When Reviewing Evidence

•Check that facts are truly facts (verifiable)
•Identify unstated assumptions
•Look for red flags
•Assess whether evidence quality matches decision importance

When Presenting Evidence

•Lead with highest-quality evidence
•Acknowledge limitations and gaps
•Distinguish facts from assumptions
•Be explicit about evidence quality

Integration with Other Skills

•Decision Analysis: Use quality scores to weight evidence
•Risk Assessment: Lower-quality evidence = higher uncertainty
•Stakeholder Communication: Calibrate confidence to evidence quality