Cleaning Up Research Sessions
Overview
Remove intermediate files created during research workflow while preserving all important data.
Core principle: Conservative cleanup with user confirmation. Never delete anything important.
When to Use
Use this skill when:
- •Research session is complete and consolidated
- •Preparing to archive or share research session folder
- •Research folder has accumulated temporary/intermediate files
- •User explicitly asks to clean up
When NOT to use:
- •Research is still in progress
- •User hasn't reviewed final outputs yet
- •Unsure what files are safe to delete
Files That Are ALWAYS KEPT
NEVER delete these (protected list):
Core outputs:
- •
SUMMARY.md- Enhanced findings with methodology - •
relevant-papers.json- Filtered relevant papers - •
papers-reviewed.json- Complete screening history - •
papers/directory - All PDFs and supplementary files - •
citations/citation-graph.json- Citation relationships
Methodology documentation:
- •
screening-criteria.json- Rubric definition (if exists) - •
test-set.json- Rubric validation papers (if exists) - •
abstracts-cache.json- Cached abstracts for re-screening (if exists) - •
rubric-changelog.md- Rubric version history (if exists)
Auxiliary documentation (if exists):
- •
README.md- Project overview - •
TOP_PRIORITY_PAPERS.md- Curated priority list - •
evaluated-papers.json- Rich structured data
Project configuration:
- •
.claude/directory - Permissions and settings - •
*.pyhelper scripts that were created - Keep for reproducibility
Files That May Be Cleaned Up
Candidates for removal (with confirmation):
Intermediate search results:
- •
initial-search-results.json- Raw PubMed results before screening- •Safe to delete: Data is in papers-reviewed.json
- •Reason to keep: Shows raw search results for reproducibility
Temporary files:
- •
*.tmpfiles - •
*.swpfiles (vim swap files) - •
.DS_Store(macOS) - •
__pycache__/(Python cache) - •
*.pyc(Python compiled)
Log files:
- •
*.logfiles - •
debug-*.txtfiles
Cleanup Workflow
Step 1: Analyze Research Session
cd research-sessions/YYYY-MM-DD-description/
# List all files with sizes
find . -type f -exec ls -lh {} \; | awk '{print $5, $9}' | sort -rh
Identify files by category:
- •Core outputs (MUST keep)
- •Methodology files (SHOULD keep)
- •Intermediate files (candidates for cleanup)
- •Temporary files (safe to delete)
Step 2: Present Cleanup Plan to User
Show what will be deleted:
🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/ Files to KEEP (protected): ✅ SUMMARY.md (45 KB) ✅ relevant-papers.json (12 KB) ✅ papers-reviewed.json (28 KB) ✅ papers/ (14 PDFs, 32 MB) ✅ citations/citation-graph.json (5 KB) ✅ screening-criteria.json (2 KB) ✅ abstracts-cache.json (156 KB) Files that CAN be removed (intermediate): 🗑️ initial-search-results.json (8 KB) - Raw PubMed results 🗑️ .DS_Store (6 KB) - macOS metadata Total space to recover: 14 KB Proceed with cleanup? (y/n/review)
Options:
- •
y- Delete intermediate files - •
n- Cancel cleanup, keep everything - •
review- Show contents of each file before deciding
Step 3: Confirm Deletions
Before deleting ANY file:
- •Verify it's not in protected list
- •Check file isn't referenced in SUMMARY.md
- •Confirm with user one more time
Example confirmation:
About to delete: - initial-search-results.json (8 KB) This file contains raw PubMed search results. The data is preserved in papers-reviewed.json, so this is safe to delete. Confirm deletion? (y/n)
Step 4: Perform Cleanup
Delete confirmed files:
# Move to trash instead of rm (safer) # On macOS: mv initial-search-results.json ~/.Trash/ # On Linux: mv initial-search-results.json ~/.local/share/Trash/files/ # Or use rm if user confirms rm initial-search-results.json
Report results:
✅ Cleanup complete! Removed: - initial-search-results.json (8 KB) - .DS_Store (6 KB) Space recovered: 14 KB Protected files preserved: - All 8 core files kept - All 14 PDFs kept - All methodology documentation kept
Step 5: Verify Integrity
After cleanup, verify critical files:
# Check core files exist test -f SUMMARY.md && echo "✓ SUMMARY.md" test -f relevant-papers.json && echo "✓ relevant-papers.json" test -f papers-reviewed.json && echo "✓ papers-reviewed.json" test -d papers && echo "✓ papers/ directory" # Verify JSON files are valid jq empty relevant-papers.json && echo "✓ relevant-papers.json valid JSON" jq empty papers-reviewed.json && echo "✓ papers-reviewed.json valid JSON"
Report to user:
✅ Integrity check passed - All core files present - All JSON files valid - All PDFs intact
Special Cases
Case 1: Large abstracts-cache.json
If abstracts-cache.json is very large (>100 MB):
⚠️ abstracts-cache.json is 256 MB This file enables re-screening if you update the rubric. Options: 1. Keep (recommended if you might refine rubric) 2. Compress (gzip to ~50 MB, can decompress later) 3. Delete (only if research is final and won't be updated) Choice? (1/2/3)
If user chooses compress:
gzip abstracts-cache.json # Creates abstracts-cache.json.gz echo "Compressed abstracts-cache.json to $(du -h abstracts-cache.json.gz | cut -f1)"
Case 2: Helper Scripts
If user created helper scripts during research:
📝 Found helper scripts: - screen_papers.py (created for batch screening) - deep_dive_papers.py (created for data extraction) These scripts document your methodology. Recommendations: - Keep for reproducibility - Add comments if not already documented - Reference in SUMMARY.md under "Reproducibility" section Keep scripts? (y/n)
Case 3: Multiple Research Sessions
If cleaning up multiple sessions:
# Find all research sessions
find research-sessions/ -maxdepth 1 -type d
# For each session:
for session in research-sessions/*/; do
echo "Analyzing: $session"
# Run cleanup analysis
done
Ask user:
Found 5 completed research sessions. Clean up all sessions? (y/n/select) - y: Analyze and clean all sessions - n: Cancel - select: Choose which sessions to clean
Safety Mechanisms
Protected File List
Maintain hardcoded list of patterns to NEVER delete:
PROTECTED_PATTERNS = [
'SUMMARY.md',
'relevant-papers.json',
'papers-reviewed.json',
'papers/*.pdf',
'papers/*.zip',
'citations/citation-graph.json',
'screening-criteria.json',
'test-set.json',
'abstracts-cache.json',
'rubric-changelog.md',
'README.md',
'TOP_PRIORITY_PAPERS.md',
'evaluated-papers.json',
'*.py', # Helper scripts
'.claude/*', # Project settings
]
Before deleting any file:
def is_protected(filepath):
"""Check if file matches any protected pattern"""
for pattern in PROTECTED_PATTERNS:
if fnmatch(filepath, pattern):
return True
return False
# Never delete protected files
if is_protected(file_to_delete):
print(f"⚠️ ERROR: {file_to_delete} is protected and cannot be deleted")
return
Dry Run Mode
Always show what will be deleted before doing it:
# Dry run (show only, don't delete)
echo "DRY RUN - No files will be deleted"
for file in $candidate_files; do
if is_safe_to_delete "$file"; then
echo "Would delete: $file ($(du -h $file | cut -f1))"
fi
done
echo ""
echo "Proceed with actual deletion? (y/n)"
Integration with Other Skills
After answering-research-questions workflow:
- •Complete Phase 8 (consolidation)
- •User reviews SUMMARY.md and relevant-papers.json
- •Optionally: Run cleaning-up-research-sessions
- •Archive or share research folder
Add to answering-research-questions Phase 8:
### Optional: Cleanup After reviewing outputs, optionally clean up intermediate files: "Research session is complete. Would you like me to clean up intermediate files? I'll show you what will be deleted before removing anything." If yes: Use `cleaning-up-research-sessions` skill
Common Mistakes
Deleting papers-reviewed.json: This is the deduplication database - NEVER delete → Always protect with hardcoded list Deleting abstracts-cache.json: Needed for re-screening → Ask user, default to keep Deleting helper scripts: Important for reproducibility → Keep by default, ask if user wants to remove Not showing user what will be deleted: User needs to see the plan → Always show dry run first Deleting without confirmation: Too risky → Always ask for final confirmation Not verifying after cleanup: Files could be corrupted → Run integrity check after deletion
Quick Reference
| Action | Command |
|---|---|
| List all files with sizes | find . -type f -exec ls -lh {} \; | sort -rh |
| Check if JSON valid | jq empty file.json |
| Move to trash (macOS) | mv file ~/.Trash/ |
| Move to trash (Linux) | mv file ~/.local/share/Trash/files/ |
| Compress large file | gzip file |
| Calculate folder size | du -sh . |
Success Criteria
Cleanup successful when:
- •All protected files still exist and are valid
- •Only confirmed intermediate files deleted
- •User knows what was deleted and why
- •Space recovered is reported
- •Integrity check passed
- •Research session is still fully functional
Example Session
User: "Can you clean up the research session folder?"
Agent: "I'll analyze the research session and show you what can be safely cleaned up."
[Analyzes folder]
Agent: "🧹 Cleanup Analysis for: research-sessions/2025-10-11-btk-selectivity/
Files to KEEP (protected):
✅ SUMMARY.md (45 KB) - Final findings
✅ relevant-papers.json (12 KB) - Filtered results
✅ papers-reviewed.json (28 KB) - Complete history
✅ papers/ (14 PDFs, 32 MB) - Downloaded papers
✅ citations/citation-graph.json (5 KB) - Citation network
✅ screening-criteria.json (2 KB) - Rubric definition
✅ abstracts-cache.json (156 KB) - Cached abstracts
Files that CAN be removed:
🗑️ initial-search-results.json (8 KB) - Raw PubMed results
→ Data is preserved in papers-reviewed.json
🗑️ .DS_Store (6 KB) - macOS metadata
→ Not needed
Total space to recover: 14 KB
Proceed with cleanup? (y/n/review)"
User: "y"
Agent: "Final confirmation - About to delete:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
All protected files will be preserved. Confirm? (y/n)"
User: "y"
Agent: [Deletes files]
"✅ Cleanup complete!
Removed:
- initial-search-results.json (8 KB)
- .DS_Store (6 KB)
Space recovered: 14 KB
Running integrity check...
✅ All core files intact
✅ All JSON files valid
✅ All 14 PDFs present
Research session is ready for archiving or sharing."
Next Steps
After cleanup:
- •Research folder is clean and ready to archive
- •Share folder with collaborators
- •Move to long-term storage
- •Continue with follow-up research if needed