PowerBI Analysis Automator Skill
This skill assists in transforming raw CSV data into a Power BI-ready state. It performs data cleaning, generates DAX measures, and provides the necessary Python scripts for integration.
Usage
Input: A path to a CSV file (e.g., data/sales_data.csv).
Context: Optional details about the data schema or specific analysis goals.
Instructions
Follow these steps to process the request:
- Analyze the Request: Identify the CSV file path and any specific analysis requirements (e.g., specific columns to target).
- Generate Data Cleaning Code (Mental Check or Scratchpad):
  - Formulate Pandas code to load the CSV.
  - Remove duplicate rows.
  - Handle missing values (e.g., fill with 0 for numeric, 'Unknown' for categorical, or drop if appropriate).
- Generate DAX Measure:
  - Create a standard DAX measure for "Total Sales" or relevant aggregation based on column names (e.g., Sales, Revenue, Amount).
  - Format: Total Sales = SUM('Table'[ColumnName])
- Generate Power BI Python Script:
  - Create a Python script suitable for a Power BI "Python script visual" or Power Query transformation.
  - The script should:
    - Import pandas and matplotlib.pyplot.
    - Load the dataset (assuming Power BI passes it as dataset in a visual, or loading from source in Power Query). Note: For the output script, prefer the pattern where the user can copy-paste into Power BI's editor.
    - Include a basic plot code snippet (e.g., bar chart).
- Output Format:
  - Present the DAX Measure.
  - Present the Python Script.
  - Explain briefly what the script does.
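The cleaning and DAX steps above can be sketched in a few lines of pandas. This is a minimal illustration, not a fixed implementation: the helper names clean_dataframe and build_sum_measure, the SALES_LIKE tuple, and the sample table name sales_data are all hypothetical, and the fill rules simply follow the "0 for numeric, 'Unknown' for categorical" default from the instructions. It assumes columns typed the way pd.read_csv usually produces them (numeric or plain text).

```python
import pandas as pd

# Column names treated as "sales-like"; taken from the examples in the instructions.
SALES_LIKE = ("Sales", "Revenue", "Amount")


def clean_dataframe(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate rows, then fill gaps: 0 for numeric columns, 'Unknown' otherwise."""
    cleaned = df.drop_duplicates().copy()
    for col in cleaned.columns:
        if pd.api.types.is_numeric_dtype(cleaned[col]):
            cleaned[col] = cleaned[col].fillna(0)
        else:
            cleaned[col] = cleaned[col].fillna("Unknown")
    return cleaned


def build_sum_measure(table: str, columns, label: str = "Total Sales") -> str:
    """Return a DAX SUM measure for the first sales-like column (case-insensitive match)."""
    wanted = {name.lower() for name in SALES_LIKE}
    for col in columns:
        if col.lower() in wanted:
            return f"{label} = SUM('{table}'[{col}])"
    raise ValueError("no sales-like column found")


# Hypothetical sample data: one duplicate row and one row with gaps.
raw = pd.DataFrame(
    {"Region": ["North", "North", None], "Revenue": [100.0, 100.0, None]}
)
tidy = clean_dataframe(raw)
measure = build_sum_measure("sales_data", tidy.columns)
print(tidy)
print(measure)
```

Filling with 0 is only one of the options the instructions allow; where a zero would distort an average or a count, dropping the row may be the better choice.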
Few-Shot Examples
Example 1: Sales Data Analysis
User Input:
"Analyze sales_january.csv. It has columns: Product, Quantity, UnitPrice, Region."
Response:
Here is the analysis setup for sales_january.csv.
1. Data Cleaning (Pandas Logic Applied)
- Duplicates removed.
- Missing Quantity or UnitPrice filled with 0.
- Calculated TotalAmount = Quantity * UnitPrice.
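2. DAX Measure
Use this measure in Power BI to calculate total sales. It assumes TotalAmount exists as a column in the model (for example, added in Power Query); a column created only inside a Python visual is not visible to DAX.
Total Sales = SUM('sales_january'[TotalAmount])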
3. Python Script for Power BI
Copy this into a Power BI Python Visual to visualize Sales by Region:
# 'dataset' holds the input data for this script
import matplotlib.pyplot as plt
import pandas as pd
# Power BI supplies 'dataset' automatically; when testing locally, uncomment the next line to load it from the CSV
# dataset = pd.read_csv('sales_january.csv')
# Data Cleaning
dataset.drop_duplicates(inplace=True)
dataset['Quantity'] = dataset['Quantity'].fillna(0)
dataset['UnitPrice'] = dataset['UnitPrice'].fillna(0)
dataset['TotalAmount'] = dataset['Quantity'] * dataset['UnitPrice']
# Aggregation
data_agg = dataset.groupby('Region')['TotalAmount'].sum()
# Plotting
plt.figure(figsize=(10, 6))
data_agg.plot(kind='bar', color='skyblue')
plt.title('Total Sales by Region')
plt.xlabel('Region')
plt.ylabel('Sales Amount')
plt.tight_layout()
plt.show()
Example 2: Inventory Check
User Input:
"Check inventory.csv. Columns: ItemID, Category, StockLevel."
Response:
1. Data Cleaning
- Duplicates removed based on ItemID.
- Missing StockLevel rows dropped to avoid skewing data.
2. DAX Measure
Total Stock = SUM('inventory'[StockLevel])
3. Python Script for Power BI
Use this to visualize how items are distributed across categories:
import matplotlib.pyplot as plt
import pandas as pd
# Clean missing values
dataset.dropna(subset=['StockLevel'], inplace=True)
dataset.drop_duplicates(subset=['ItemID'], inplace=True)
# Plot
plt.figure(figsize=(8, 8))
dataset['Category'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('Inventory Distribution by Category')
plt.show()
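Either example script can be exercised outside Power BI by defining dataset by hand instead of receiving it from the visual. The sketch below reruns Example 1's cleaning and aggregation steps on a few hypothetical rows (the sample values are invented for illustration and do not come from sales_january.csv). Plotting is left out so the check needs only pandas.

```python
import pandas as pd

# Hypothetical sample rows standing in for the 'dataset' frame Power BI would supply.
dataset = pd.DataFrame(
    {
        "Product": ["A", "A", "B", "C", "D"],
        "Quantity": [2, 2, None, 3, 4],
        "UnitPrice": [10.0, 10.0, 5.0, None, 2.5],
        "Region": ["North", "North", "South", "North", "South"],
    }
)

# Same cleaning steps as Example 1: drop exact duplicates, fill gaps with 0.
dataset = dataset.drop_duplicates().copy()
dataset["Quantity"] = dataset["Quantity"].fillna(0)
dataset["UnitPrice"] = dataset["UnitPrice"].fillna(0)
dataset["TotalAmount"] = dataset["Quantity"] * dataset["UnitPrice"]

# Same aggregation as Example 1: total sales per region.
data_agg = dataset.groupby("Region")["TotalAmount"].sum()
print(data_agg)
```

Note that rows with a filled-in 0 contribute nothing to their region's total, which is the intended effect of the fill rule in Example 1.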