Instructions
1. Understand the Request
- •Parse the user's request to identify:
- •The specific datasets/models in question
- •The user's license selection criteria (e.g., "most permissive for derivative use")
- •Required output format (e.g., issue reply, README updates)
- •Any authentication tokens needed (e.g.,
.hf_token)
2. Investigate Data Sources
For each dataset/model mentioned:
- •GitHub Repositories:
- •Use
github-get_file_contentsto examine repository structure and README - •Look for license files (LICENSE, LICENSE.md, etc.) and documentation
- •Use
- •Hugging Face Resources:
- •Use
huggingface-hub_repo_detailsto get metadata including license tags - •Use
fetch-fetch_markdownto retrieve full README/content - •Search for related datasets/models using
huggingface-dataset_searchorhuggingface-model_search
- •Use
- •Trace Provenance:
- •Follow references to original sources (e.g., "adopted from HuggingFaceTB team")
- •Identify base models used for synthesis (e.g., "uses DeepSeek-V2.5")
3. Determine License Hierarchy
When multiple sources exist:
- •Identify direct sources: For raw/derived datasets, find the original data source
- •Identify synthesis models: For processed/transformed datasets, find the models used for generation
- •Compare licenses: Evaluate which is more permissive for derivative/secondary use based on:
- •Attribution requirements
- •Use-based restrictions
- •Commercial use allowances
- •Redistribution terms
4. Apply License Selection Logic
Follow user's priority rules:
- •If user specifies "most permissive for derivative/secondary use," compare licenses and select the one with fewer restrictions
- •If user specifies "directly reuse license from source," use the identified source's license
- •Document the reasoning for license selection
5. Execute Required Actions
Based on user requirements:
- •Reply to Issues:
- •Format response exactly as specified by user
- •Include all required placeholders filled with determined licenses
- •Update Documentation:
- •Retrieve current README/content using appropriate tools
- •Append license section at the end as specified
- •Maintain existing formatting and content
- •Authentication:
- •Use provided tokens (e.g., read
.hf_tokenfile) - •Pass tokens to API calls as needed
- •Use provided tokens (e.g., read
6. Complete and Close
- •Verify all actions completed successfully
- •Provide confirmation of updates made
- •Close issues if requested
- •Claim task completion
Key Considerations
- •License Types: Be familiar with common licenses (MIT, Apache-2.0, CC-BY, etc.) and custom licenses (DeepSeek License, The Stack Terms of Use)
- •Provenance Chains: Some datasets may have multiple layers of derivation; trace back to the original source
- •Format Compliance: Strictly adhere to user-specified output formats; do not modify, add, or remove content outside placeholders
- •Error Handling: If license information cannot be determined, document the investigation and ask for clarification