Transaction Classification Debugger
Overview
This skill helps you understand and debug Budget Buddy's transaction classification system, which uses fuzzy matching to find similar transactions at an 85% similarity threshold. It's critical for the "smart batch update" feature that suggests applying classifications to similar unclassified transactions.
Prerequisites
- •Database exists with transactions: budget_buddy.db
- •Backend code accessible
- •Understanding of Python's difflib.SequenceMatcher
Quick Start
Step 1: Understand the Fuzzy Matching Algorithm
The core algorithm is in /backend/services/database_service.py - method get_similar_unclassified_transactions:
from difflib import SequenceMatcher
from sqlalchemy import or_

def get_similar_unclassified_transactions(
    transaction_id: int,
    similarity_threshold: float = 0.85
):
    # 1. Get reference transaction
    reference_tx = get_transaction_by_id(transaction_id)
    # 2. Find candidates (same merchant OR similar description)
    candidates = session.query(Transaction).filter(
        # Must NOT be manually classified
        Transaction.bb_category_manual == False,
        # Different transaction
        Transaction.id != transaction_id,
        # Either exact merchant match OR similar description
        or_(
            Transaction.merchant_name == reference_tx.merchant_name,
            # Description will be checked with fuzzy matching below
        )
    ).all()
    # 3. Fuzzy match descriptions
    similar_transactions = []
    for candidate in candidates:
        similarity = SequenceMatcher(
            None,
            reference_tx.description.lower(),
            candidate.description.lower()
        ).ratio()
        if similarity >= similarity_threshold:
            similar_transactions.append({
                'transaction': candidate,
                'similarity_score': similarity,
                # Always 'description' at the default threshold; 'merchant' is
                # only reachable when similarity_threshold is below 0.85
                'match_reason': 'description' if similarity >= 0.85 else 'merchant'
            })
    return similar_transactions
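To experiment with the filtering logic without a database, the steps above can be sketched in memory. This is a minimal sketch, not the backend implementation: the `Tx` dataclass, the `find_similar` helper, and the "LANDLORD" merchant value are hypothetical stand-ins for the ORM model and real data.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Tx:
    # Hypothetical stand-in for the ORM Transaction model
    id: int
    description: str
    merchant_name: str
    bb_category_manual: bool = False


def find_similar(reference, transactions, threshold=0.85):
    """Return (transaction, score) pairs that pass the filters described above."""
    matches = []
    for tx in transactions:
        if tx.id == reference.id or tx.bb_category_manual:
            continue  # skip the reference itself and manual classifications
        if tx.merchant_name != reference.merchant_name:
            continue  # exact, case-sensitive merchant filter
        score = SequenceMatcher(
            None, reference.description.lower(), tx.description.lower()
        ).ratio()
        if score >= threshold:
            matches.append((tx, score))
    return matches


ref = Tx(1, "CHECK #80 - MONTHLY RENT", "LANDLORD")
others = [
    Tx(2, "CHECK #79 - MONTHLY RENT", "LANDLORD"),
    Tx(3, "CHECK #78 - MONTHLY RENT", "LANDLORD", bb_category_manual=True),
    Tx(4, "CHECK #77 - MONTHLY RENT", "Landlord"),  # different case, filtered out
]
result = find_similar(ref, others)
print([(tx.id, round(score, 3)) for tx, score in result])  # [(2, 0.917)]
```

Only transaction 2 survives: transaction 3 is already manually classified, and transaction 4 fails the case-sensitive merchant comparison.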
Step 2: Test Fuzzy Matching
Test with sample descriptions:
from difflib import SequenceMatcher
# Example: Check similarity between two transaction descriptions
desc1 = "CHECK #80 - MONTHLY RENT"
desc2 = "CHECK #79 - MONTHLY RENT"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}") # Should be ~92%
Test in database:
import sqlite3
conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
# Get transaction with ID 123
cursor.execute("SELECT id, description, merchant_name FROM transactions WHERE id = 123")
reference = cursor.fetchone()
print(f"Reference: {reference}")
# Find similar transactions
cursor.execute("""
SELECT id, description, merchant_name, bb_category_manual
FROM transactions
WHERE id != 123
AND bb_category_manual = 0
LIMIT 100
""")
from difflib import SequenceMatcher
for tx in cursor.fetchall():
    similarity = SequenceMatcher(None, reference[1].lower(), tx[1].lower()).ratio()
    if similarity >= 0.85:
        print(f"Match: ID={tx[0]}, Similarity={similarity:.2%}, Desc={tx[1]}")
conn.close()
Step 3: Debug Classification Issues
Check bb_category_manual flag:
sqlite3 budget_buddy.db "
SELECT
id,
description,
merchant_name,
bb_category,
bb_category_manual
FROM transactions
WHERE merchant_name = 'TARGET'
LIMIT 10;
"
Find unclassified transactions:
sqlite3 budget_buddy.db " SELECT COUNT(*) as unclassified_count FROM transactions WHERE bb_category_manual = 0; "
Check for similar descriptions:
import sqlite3
from difflib import SequenceMatcher
conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
# Find transactions similar to "WHOLE FOODS MARKET"
cursor.execute("SELECT id, description FROM transactions LIMIT 1000")
transactions = cursor.fetchall()
target_desc = "WHOLE FOODS MARKET #12345"
for tx_id, desc in transactions:
    similarity = SequenceMatcher(None, target_desc.lower(), desc.lower()).ratio()
    if similarity >= 0.85 and similarity < 1.0: # Similar but not identical
        print(f"ID {tx_id}: {similarity:.2%} - {desc}")
conn.close()
Key Validation Points
Matching Criteria
- •Merchant Name Match (Exact):
  - •merchant_name must be identical
  - •Case-sensitive comparison
  - •Example: "TARGET" ≠ "Target"
- •Description Match (Fuzzy, 85%):
  - •Uses difflib.SequenceMatcher
  - •Threshold: 0.85 (85% similarity)
  - •Case-insensitive (converted to lowercase)
  - •Example: "CHECK #80 - MONTHLY RENT" ≈ "CHECK #79 - MONTHLY RENT" (~92% similar); the bare "CHECK #80" vs "CHECK #79" scores only ~78% because short strings are penalized more per differing character
- •Manual Classification Filter:
  - •bb_category_manual = False (REQUIRED)
  - •Never suggest already-manually-classified transactions
  - •Prevents overwriting user decisions
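The asymmetry between the two comparisons is easy to miss, so here is a small illustration. It is a sketch only: `description_similarity` is a hypothetical helper that mirrors the lowercase-then-compare step, not a function from the backend.

```python
from difflib import SequenceMatcher


def description_similarity(a: str, b: str) -> float:
    """Case-insensitive fuzzy score, mirroring the .lower() calls in the matcher."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Merchant names are compared exactly, so a change in case breaks the match
merchant_match = "TARGET" == "Target"

# Descriptions are lowercased first, so a change in case has no effect
desc_score = description_similarity(
    "Check #80 - Monthly Rent", "CHECK #80 - MONTHLY RENT"
)

print(merchant_match, desc_score)  # False 1.0
```

If merchant names arrive with inconsistent casing, the merchant filter will silently drop candidates even when their descriptions are identical.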
Similarity Threshold Analysis
| Threshold | Strictness | Use Case |
|---|---|---|
| 0.95-1.0 | Very strict | Nearly identical (e.g., "CHECK #1234 - MONTHLY RENT" vs "CHECK #1235 - MONTHLY RENT", ~96%) |
| 0.85-0.95 | Balanced (DEFAULT) | Similar patterns (e.g., same merchant with different check numbers) |
| 0.75-0.85 | Loose | Broader matches (may include false positives) |
| < 0.75 | Very loose | Too many false positives |
Why 0.85?
- •Captures variations like check numbers, dates, locations
- •Avoids false positives from unrelated merchants
- •Proven effective over 70+ commits
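To see where sample descriptions fall relative to the bands in the table, a quick sweep can help. This is an illustrative sketch using description pairs that appear elsewhere in this document; the `score` helper and the chosen threshold list are assumptions for the example.

```python
from difflib import SequenceMatcher

pairs = [
    ("CHECK #1234 - MONTHLY RENT", "CHECK #1235 - MONTHLY RENT"),
    ("CHECK #80 - MONTHLY RENT", "CHECK #79 - MONTHLY RENT"),
    ("WHOLE FOODS MARKET #12345", "WHOLE FOODS MARKET #67890"),
    ("STARBUCKS COFFEE #123", "TARGET STORE #456"),
]
thresholds = [0.75, 0.85, 0.95]


def score(a: str, b: str) -> float:
    """Case-insensitive SequenceMatcher ratio for two descriptions."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


for a, b in pairs:
    s = score(a, b)
    passed = [t for t in thresholds if s >= t]  # thresholds this pair clears
    print(f"{s:.2%}  passes {passed}  {a!r} vs {b!r}")
```

Note that the Whole Foods pair lands in the 0.75-0.85 band: when every digit of a store number differs, the default threshold does not catch it.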
Common Issues & Solutions
Issue: No similar transactions found
Possible Causes:
- •All similar transactions already manually classified (bb_category_manual = True)
- •merchant_name is null/empty AND description similarity < 0.85
- •Reference transaction is the only one of its kind
Debug:
# Check if merchant_name exists
sqlite3 budget_buddy.db "
SELECT COUNT(*)
FROM transactions
WHERE merchant_name = 'YOUR_MERCHANT'
AND bb_category_manual = 0;
"
# Check description patterns
sqlite3 budget_buddy.db "
SELECT description
FROM transactions
WHERE description LIKE '%PATTERN%'
LIMIT 20;
"
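When the SQL checks do not make the cause obvious, it can help to ask, per candidate, which filter rejected it. The following is a hypothetical debugging helper (`explain` is not part of the backend) that takes plain dicts keyed by the column names and reports the first failing condition, assuming the filters behave as described in this document.

```python
from difflib import SequenceMatcher


def explain(reference: dict, candidate: dict, threshold: float = 0.85) -> str:
    """Hypothetical helper: say why a candidate is or is not suggested."""
    if candidate["bb_category_manual"]:
        return "excluded: already manually classified"
    if (
        reference["merchant_name"] is None
        or candidate["merchant_name"] != reference["merchant_name"]
    ):
        return "excluded: merchant_name missing or different"
    score = SequenceMatcher(
        None, reference["description"].lower(), candidate["description"].lower()
    ).ratio()
    if score < threshold:
        return f"excluded: description similarity {score:.2%} < {threshold:.0%}"
    return f"suggested: description similarity {score:.2%}"


ref = {"description": "WHOLE FOODS MARKET #12345", "merchant_name": "WHOLE FOODS",
       "bb_category_manual": True}
candidates = [
    {"description": "WHOLE FOODS MARKET #67890", "merchant_name": "WHOLE FOODS",
     "bb_category_manual": False},
    {"description": "WHOLE FOODS MARKET #12346", "merchant_name": "WHOLE FOODS",
     "bb_category_manual": True},
    {"description": "WHOLE FOODS MARKET #12346", "merchant_name": "Whole Foods",
     "bb_category_manual": False},
    {"description": "WHOLE FOODS MARKET #12346", "merchant_name": "WHOLE FOODS",
     "bb_category_manual": False},
]
reasons = [explain(ref, c) for c in candidates]
for reason in reasons:
    print(reason)
```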
Issue: Too many false positive matches
Cause: Threshold too low or descriptions too generic
Solution:
# Test with higher threshold
similar = get_similar_unclassified_transactions(
    transaction_id=123,
    similarity_threshold=0.90 # Increased from 0.85
)
Example False Positives:
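- •"PAYMENT THANK YOU" vs "PAYMENT RECEIVED" (different meaning; they share only the generic "PAYMENT " prefix and score about 48%, so they surface only when the threshold is set very low)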
- •Generic descriptions matching unrelated transactions
Issue: Missing obvious matches
Cause: Threshold too high or merchant_name mismatch
Solution:
# Test with lower threshold
similar = get_similar_unclassified_transactions(
    transaction_id=123,
    similarity_threshold=0.80 # Decreased from 0.85
)
Example Missed Matches:
- •"WHOLE FOODS #123" vs "WHOLE FOODS MARKET #456" (if threshold too high)
- •Merchant name variations: "TARGET" vs "TARGET CORP"
Issue: Manually classified transactions appearing in suggestions
Cause: bb_category_manual not properly set
Solution:
# Verify flag is set correctly
sqlite3 budget_buddy.db "
SELECT id, description, bb_category, bb_category_manual
FROM transactions
WHERE id IN (123, 456, 789);
"
# Fix if needed
# Caution: this marks EVERY categorized transaction as manual, including
# auto-classified ones. Narrow the WHERE clause if that is not intended.
sqlite3 budget_buddy.db "
UPDATE transactions
SET bb_category_manual = 1
WHERE id IN (SELECT id FROM transactions WHERE bb_category IS NOT NULL);
"
Smart Batch Update Workflow
User Journey
- •User manually classifies transaction (inline or modal)
  - •Updates bb_category and sets bb_category_manual = True
- •Backend checks for similar transactions
  - •Calls get_similar_unclassified_transactions()
  - •Finds matches with merchant_name OR fuzzy description
  - •Filters to only unclassified (bb_category_manual = False)
- •Frontend shows modal with checkboxes
  - •Lists similar transactions
  - •Shows similarity score for each
  - •User selects which to update
- •Batch update endpoint applies classification
  - •Updates selected transactions
  - •Sets bb_category_manual = True for all
  - •Maintains audit trail
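The final step of the journey can be sketched against an in-memory SQLite database. This is an illustration under stated assumptions, not the real endpoint: `apply_batch` is a hypothetical helper, the table holds only the columns the example needs, and the audit trail is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        description TEXT,
        bb_category TEXT,
        bb_category_manual INTEGER DEFAULT 0
    )
    """
)
conn.executemany(
    "INSERT INTO transactions (id, description) VALUES (?, ?)",
    [
        (1, "CHECK #80 - MONTHLY RENT"),
        (2, "CHECK #79 - MONTHLY RENT"),
        (3, "CHECK #78 - MONTHLY RENT"),
    ],
)


def apply_batch(conn, category, selected_ids):
    """Apply one category to the user-selected rows and flag them as manual."""
    if not selected_ids:
        return 0
    placeholders = ",".join("?" for _ in selected_ids)
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE transactions SET bb_category = ?, bb_category_manual = 1 "
            f"WHERE id IN ({placeholders}) AND bb_category_manual = 0",
            [category, *selected_ids],
        )
    return cur.rowcount


updated = apply_batch(conn, "Rent", [2, 3])
rows = conn.execute(
    "SELECT id, bb_category, bb_category_manual FROM transactions ORDER BY id"
).fetchall()
print(updated, rows)  # 2 [(1, None, 0), (2, 'Rent', 1), (3, 'Rent', 1)]
```

The extra `bb_category_manual = 0` condition in the UPDATE is a defensive choice for this sketch: it keeps a stale selection from overwriting a row the user classified in the meantime.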
Integration Points
- •backend/services/database_service.py
  - •Method: get_similar_unclassified_transactions()
  - •Line: varies (search for the method name)
- •Frontend: ClassificationManagement.js
  - •Inline dropdown editing
  - •Triggers similarity check on change
- •Frontend: EnhancedTransactionModal.js
  - •Modal form editing
  - •OnSaveSuccess callback triggers similarity check
- •Frontend: Batch Edit Modal
  - •Checkbox selection
  - •Batch update API call
Testing the Fuzzy Matcher
Test Case 1: Check Numbers
from difflib import SequenceMatcher
desc1 = "CHECK #1234 - MONTHLY RENT"
desc2 = "CHECK #1235 - MONTHLY RENT"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}") # ~96% - MATCH
# Should be found as similar (>= 0.85)
assert similarity >= 0.85
Test Case 2: Merchant Variations
desc1 = "WHOLE FOODS MARKET #12345"
desc2 = "WHOLE FOODS MARKET #67890"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
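print(f"Similarity: {similarity:.2%}") # 80% - NO MATCH at the 0.85 default
# All five store-number digits differ, so only "whole foods market #"
# (20 of 25 characters) matches; found only if the threshold is lowered to 0.80
assert similarity < 0.85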
Test Case 3: Unrelated Transactions
desc1 = "STARBUCKS COFFEE #123"
desc2 = "TARGET STORE #456"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}") # ~42% - NO MATCH
assert similarity < 0.85
Test Case 4: Date Variations
desc1 = "PAYMENT DUE 01/15/2026"
desc2 = "PAYMENT DUE 02/15/2026"
similarity = SequenceMatcher(None, desc1.lower(), desc2.lower()).ratio()
print(f"Similarity: {similarity:.2%}") # ~95% - MATCH
assert similarity >= 0.85
Advanced Debugging
Visualize Similarity Scores
import sqlite3
from difflib import SequenceMatcher
conn = sqlite3.connect('budget_buddy.db')
cursor = conn.cursor()
# Get reference transaction
ref_id = 123
cursor.execute("SELECT description FROM transactions WHERE id = ?", (ref_id,))
ref_desc = cursor.fetchone()[0]
# Get all other transactions
cursor.execute("SELECT id, description FROM transactions WHERE id != ?", (ref_id,))
transactions = cursor.fetchall()
# Calculate similarities
similarities = []
for tx_id, desc in transactions:
    score = SequenceMatcher(None, ref_desc.lower(), desc.lower()).ratio()
    similarities.append((tx_id, score, desc))
# Sort by score
similarities.sort(key=lambda x: x[1], reverse=True)
# Print top 10
print(f"\nTop 10 matches for: {ref_desc}\n")
for tx_id, score, desc in similarities[:10]:
    print(f"{score:.2%} - ID {tx_id}: {desc}")
conn.close()
Test Threshold Variations
thresholds = [0.70, 0.75, 0.80, 0.85, 0.90, 0.95]
for threshold in thresholds:
    similar = get_similar_unclassified_transactions(
        transaction_id=123,
        similarity_threshold=threshold
    )
    print(f"Threshold {threshold:.2f}: {len(similar)} matches")
Technical Details
difflib.SequenceMatcher
from difflib import SequenceMatcher
# Create matcher
matcher = SequenceMatcher(None, "string1", "string2")
# Get similarity ratio (0.0 to 1.0)
ratio = matcher.ratio()
# Get matching blocks
blocks = matcher.get_matching_blocks()
# Get opcodes (insert, delete, replace, equal)
opcodes = matcher.get_opcodes()
Ratio Calculation:
ratio = 2 * M / T
Where:
  M = number of matching characters
  T = total number of characters in both strings
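The formula can be checked by hand on the rent-check example. The sketch below sums the sizes of the matching blocks to get M and compares 2 * M / T with the value `ratio()` returns; the variable names simply follow the formula above.

```python
from difflib import SequenceMatcher

a = "check #80 - monthly rent"
b = "check #79 - monthly rent"
matcher = SequenceMatcher(None, a, b)

# M: total size of all matching blocks (the last block is a zero-length sentinel)
blocks = matcher.get_matching_blocks()
M = sum(block.size for block in blocks)
# T: combined length of both strings
T = len(a) + len(b)

matched_text = [a[blk.a:blk.a + blk.size] for blk in blocks if blk.size]
print(matched_text)      # ['check #', ' - monthly rent']
print(M, T, 2 * M / T)   # 22 48 0.9166666666666666
```

Only the two check-number digits fail to match, which is why the pair scores about 92% rather than 100%.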
Database Schema
Transactions Table:
- •id - Primary key
- •description - Original transaction description
- •merchant_name - Extracted merchant (from Plaid or manual)
- •bb_category - Assigned budget category
- •bb_category_manual - Boolean (0=auto, 1=manual)
- •amount - Transaction amount
- •date - Transaction date
Key Insight: Only transactions with bb_category_manual = 0 (False) are suggested for batch updates.
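For experimenting away from the real database, the columns listed above can be approximated in an in-memory SQLite table. The column types, the default value, and the sample rows below are assumptions made for this sketch; the actual schema in budget_buddy.db may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY,
        description TEXT,
        merchant_name TEXT,
        bb_category TEXT,
        bb_category_manual INTEGER NOT NULL DEFAULT 0,  -- 0=auto, 1=manual
        amount REAL,   -- assumed type
        date TEXT      -- assumed ISO 8601 string
    )
    """
)
conn.execute(
    "INSERT INTO transactions (description, merchant_name, amount, date) "
    "VALUES (?, ?, ?, ?)",
    ("TARGET STORE #456", "TARGET", 23.45, "2026-01-01"),
)
conn.execute(
    "INSERT INTO transactions "
    "(description, merchant_name, bb_category, bb_category_manual, amount, date) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    ("TARGET STORE #789", "TARGET", "Shopping", 1, 12.00, "2026-01-02"),
)

# Only rows with bb_category_manual = 0 are eligible for batch suggestions
eligible = conn.execute(
    "SELECT id, description FROM transactions WHERE bb_category_manual = 0"
).fetchall()
print(eligible)  # [(1, 'TARGET STORE #456')]
```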
Integration with Other Skills
- •Code Explanation - Can explain fuzzy matching algorithm visually
- •Development Diagnostics - Validates database has transactions to classify
- •Testing & Validation Suite - Can include fuzzy matching tests
References
- •/backend/services/database_service.py - get_similar_unclassified_transactions() method
- •/frontend/src/components/transactions/ClassificationManagement.js - Inline classification
- •/frontend/src/components/transactions/EnhancedTransactionModal.js - Modal classification
- •Python docs: https://docs.python.org/3/library/difflib.html#difflib.SequenceMatcher
Last Updated
January 1, 2026