Awsflow Glue
Manage Glue ETL jobs, triggers, crawlers, and query the Data Catalog.
When to Use This Skill
Use this skill when the user:
- Asks about Glue ETL jobs, crawlers, or triggers
- Wants to inspect or query the Glue Data Catalog (databases, tables, partitions)
- Needs to start a Glue job run
- Wants to create a Glue job
- Asks about Glue connections or job bookmarks
- Needs to inspect crawl history
Tool: GlueTool
Execute AWS Glue commands, including Data Catalog queries. Always provide a params object, even when a command takes no parameters (pass an empty object, as in the ListCrawlers example).
Commands
ListJobs
List Glue jobs.
{ "command": "ListJobs", "params": { "MaxResults": 50 } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| MaxResults | number | No | Maximum items to return |
| nextToken | string | No | Pagination token |
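
Listing commands return one page at a time, so collecting every job means passing the pagination token back until none is returned. The sketch below shows that loop. `call_tool` is a hypothetical callable that sends a GlueTool request and returns the parsed response, and the response keys `JobNames` and `NextToken` are assumptions borrowed from the AWS Glue API, not something this skill documents.

```python
from typing import Any, Callable, Dict, Iterator, Optional

# Hypothetical: sends one GlueTool request dict and returns the parsed response dict.
ToolCall = Callable[[Dict[str, Any]], Dict[str, Any]]


def iter_job_names(call_tool: ToolCall, page_size: int = 50) -> Iterator[str]:
    """Yield every job name by following the pagination token across pages."""
    token: Optional[str] = None
    while True:
        params: Dict[str, Any] = {"MaxResults": page_size}
        if token:
            params["nextToken"] = token  # request parameter name from the table above
        response = call_tool({"command": "ListJobs", "params": params})
        # Assumed response keys: "JobNames" for the page, "NextToken" for the cursor.
        yield from response.get("JobNames", [])
        token = response.get("NextToken")
        if not token:
            break
```

The same loop should carry over to the other commands that accept nextToken, with the item key adjusted to whatever the tool actually returns.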
GetJob
Get details of a Glue job.
{ "command": "GetJob", "params": { "JobName": "my-etl-job" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| JobName | string | Yes | Job name |
GetJobRun
Get details of a specific job run.
{ "command": "GetJobRun", "params": { "JobName": "my-etl-job", "RunId": "jr_abc123" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| JobName | string | Yes | Job name |
| RunId | string | Yes | Job run ID |
GetJobRuns
List all runs of a job.
{ "command": "GetJobRuns", "params": { "JobName": "my-etl-job", "MaxResults": 20 } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| JobName | string | Yes | Job name |
| MaxResults | number | No | Maximum items to return |
| nextToken | string | No | Pagination token |
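
A common reason to list runs is to find the ones that did not succeed. The helper below is a minimal sketch that filters a GetJobRuns response. It assumes the tool returns the AWS Glue API shape (a `JobRuns` list whose entries carry `Id`, `JobRunState`, and `ErrorMessage`), which this skill does not confirm.

```python
from typing import Any, Dict, List

# Run states treated as unsuccessful (state names from the AWS Glue API).
UNSUCCESSFUL_STATES = {"FAILED", "ERROR", "TIMEOUT"}


def failed_runs(response: Dict[str, Any]) -> List[Dict[str, str]]:
    """Summarize each unsuccessful run in an (assumed) GetJobRuns response."""
    summary = []
    for run in response.get("JobRuns", []):
        state = run.get("JobRunState", "")
        if state in UNSUCCESSFUL_STATES:
            summary.append({
                "RunId": run.get("Id", ""),
                "State": state,
                "Error": run.get("ErrorMessage", ""),
            })
    return summary
```

Each returned RunId can then be passed to GetJobRun for full details.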
GetJobBookmark
Get the bookmark state for a job.
{ "command": "GetJobBookmark", "params": { "JobName": "my-etl-job" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| JobName | string | Yes | Job name |
StartJobRun
Start a Glue job run.
{ "command": "StartJobRun", "params": { "JobName": "my-etl-job", "Arguments": { "--input-path": "s3://my-bucket/input/" } } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| JobName | string | Yes | Job name |
| Arguments | object | No | Job run arguments (override defaults) |
| Timeout | number | No | Job timeout in minutes |
| MaxCapacity | number | No | Maximum DPU capacity |
| WorkerType | string | No | Standard, G.1X, G.2X, G.025X |
| NumberOfWorkers | number | No | Number of workers |
| SecurityConfiguration | string | No | Security configuration name |
| AllocatedCapacity | number | No | Allocated DPU capacity (deprecated in the AWS Glue API; prefer MaxCapacity) |
| JobRunId | string | No | ID of a previous job run to retry |
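
Starting a run is usually followed by polling GetJobRun until the run reaches a final state. The sketch below chains the two commands. `call_tool` is a hypothetical callable that executes one GlueTool request, and the response keys (`JobRunId`, `JobRun.JobRunState`) are assumed to match the AWS Glue API.

```python
import time
from typing import Any, Callable, Dict, Optional

# Hypothetical: sends one GlueTool request dict and returns the parsed response dict.
ToolCall = Callable[[Dict[str, Any]], Dict[str, Any]]

# States after which a run no longer changes (state names from the AWS Glue API).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR", "EXPIRED"}


def run_job_and_wait(
    call_tool: ToolCall,
    job_name: str,
    arguments: Optional[Dict[str, str]] = None,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start a job run, poll until it finishes, and return the final state."""
    params: Dict[str, Any] = {"JobName": job_name}
    if arguments:
        params["Arguments"] = arguments
    started = call_tool({"command": "StartJobRun", "params": params})
    run_id = started["JobRunId"]  # assumed response key

    for _ in range(max_polls):
        run = call_tool({
            "command": "GetJobRun",
            "params": {"JobName": job_name, "RunId": run_id},
        })
        state = run["JobRun"]["JobRunState"]  # assumed response shape
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError(f"run {run_id} of {job_name} did not finish in {max_polls} polls")
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.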
CreateJob
Create a new Glue job.
{
  "command": "CreateJob",
  "params": {
    "Name": "my-new-job",
    "Role": "arn:aws:iam::123456789012:role/GlueRole",
    "Command": { "Name": "glueetl", "ScriptLocation": "s3://my-bucket/scripts/etl.py" },
    "WorkerType": "G.1X",
    "NumberOfWorkers": 10
  }
}
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| Name | string | Yes | Job name |
| Role | string | Yes | IAM role ARN |
| Command | object | Yes | Command config with Name (glueetl/pythonshell/gluestreaming) and ScriptLocation |
| Description | string | No | Job description |
| LogUri | string | No | S3 URI for job logs |
| DefaultArguments | object | No | Default job arguments |
| MaxRetries | number | No | Maximum retries |
| Timeout | number | No | Timeout in minutes |
| MaxCapacity | number | No | Max DPU capacity |
| WorkerType | string | No | Standard, G.1X, G.2X, G.025X |
| NumberOfWorkers | number | No | Number of workers |
| SecurityConfiguration | string | No | Security config name |
| Tags | object | No | Key-value tags |
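
Because CreateJob takes many optional fields, it can help to assemble the request in one place and reject obvious mistakes before sending it. The builder below is a sketch under these assumptions: the function name is hypothetical, the allowed optional keys are those in the table above, and the capacity check reflects the AWS Glue API guidance that MaxCapacity should not be combined with WorkerType and NumberOfWorkers.

```python
from typing import Any, Dict

COMMAND_NAMES = {"glueetl", "pythonshell", "gluestreaming"}
OPTIONAL_KEYS = {
    "Description", "LogUri", "DefaultArguments", "MaxRetries", "Timeout",
    "MaxCapacity", "WorkerType", "NumberOfWorkers", "SecurityConfiguration", "Tags",
}


def build_create_job(
    name: str,
    role_arn: str,
    script_location: str,
    command_name: str = "glueetl",
    **optional: Any,
) -> Dict[str, Any]:
    """Build a CreateJob request, validating field names and capacity settings."""
    if command_name not in COMMAND_NAMES:
        raise ValueError(f"unknown command name: {command_name}")
    unknown = set(optional) - OPTIONAL_KEYS
    if unknown:
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    # Capacity is expressed either as MaxCapacity or as worker type and count.
    if "MaxCapacity" in optional and {"WorkerType", "NumberOfWorkers"} & set(optional):
        raise ValueError("set MaxCapacity or WorkerType/NumberOfWorkers, not both")

    params: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "Command": {"Name": command_name, "ScriptLocation": script_location},
    }
    params.update(optional)
    return {"command": "CreateJob", "params": params}
```

For example, `build_create_job("my-new-job", "arn:aws:iam::123456789012:role/GlueRole", "s3://my-bucket/scripts/etl.py", WorkerType="G.1X", NumberOfWorkers=10)` produces the same request as the example above.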
ListTriggers
List Glue triggers.
{ "command": "ListTriggers", "params": { "MaxResults": 50 } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DependentJobName | string | No | Filter by dependent job name |
| MaxResults | number | No | Maximum items |
| nextToken | string | No | Pagination token |
GetTrigger
Get details of a trigger.
{ "command": "GetTrigger", "params": { "Name": "my-trigger" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| Name | string | Yes | Trigger name |
GetTriggers
List triggers with optional filter.
{ "command": "GetTriggers", "params": { "DependencyJobName": "my-job" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DependencyJobName | string | No | Filter by dependency job name |
| MaxResults | number | No | Maximum items |
| nextToken | string | No | Pagination token |
ListCrawlers
List Glue crawlers.
{ "command": "ListCrawlers", "params": {} }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| MaxResults | number | No | Maximum items |
| nextToken | string | No | Pagination token |
GetCrawler
Get details of a crawler.
{ "command": "GetCrawler", "params": { "CrawlerName": "my-crawler" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| CrawlerName | string | Yes | Crawler name |
GetCrawlers
List crawlers with details.
{ "command": "GetCrawlers", "params": {} }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| MaxResults | number | No | Maximum items |
| nextToken | string | No | Pagination token |
ListCrawls
List crawl runs for a crawler.
{ "command": "ListCrawls", "params": { "CrawlerName": "my-crawler" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| CrawlerName | string | Yes | Crawler name |
| MaxResults | number | No | Maximum items |
| nextToken | string | No | Pagination token |
GetDatabase
Get a Glue Data Catalog database.
{ "command": "GetDatabase", "params": { "DatabaseName": "my-database" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DatabaseName | string | Yes | Database name |
| CatalogId | string | No | Catalog ID (AWS account ID) |
GetDatabases
List all databases in the Data Catalog.
{ "command": "GetDatabases", "params": {} }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| CatalogId | string | No | Catalog ID |
GetTable
Get a table definition from the Data Catalog.
{ "command": "GetTable", "params": { "DatabaseName": "my-database", "TableName": "my-table" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DatabaseName | string | Yes | Database name |
| TableName | string | Yes | Table name |
| CatalogId | string | No | Catalog ID |
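
A table definition is most often fetched to read its schema. The helper below is a sketch that flattens the data columns and partition keys of a GetTable response into simple tuples. It assumes the tool returns the AWS Glue API shape (`Table.StorageDescriptor.Columns` and `Table.PartitionKeys`), which this skill does not confirm.

```python
from typing import Any, Dict, List, Tuple


def describe_columns(response: Dict[str, Any]) -> List[Tuple[str, str, bool]]:
    """Return (name, type, is_partition_key) for every column of the table."""
    table = response.get("Table", {})
    data_columns = table.get("StorageDescriptor", {}).get("Columns", [])
    partition_keys = table.get("PartitionKeys", [])
    rows = [(col["Name"], col.get("Type", ""), False) for col in data_columns]
    rows += [(col["Name"], col.get("Type", ""), True) for col in partition_keys]
    return rows
```

Partition keys are listed last and flagged, since they are the columns a GetPartitions filter expression can reference.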
GetTables
List tables in a database.
{ "command": "GetTables", "params": { "DatabaseName": "my-database" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DatabaseName | string | Yes | Database name |
| CatalogId | string | No | Catalog ID |
GetPartitions
List partitions for a table.
{ "command": "GetPartitions", "params": { "DatabaseName": "my-database", "TableName": "my-table" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| DatabaseName | string | Yes | Database name |
| TableName | string | Yes | Table name |
| Expression | string | No | Partition filter expression |
| CatalogId | string | No | Catalog ID |
| Segment | object | No | Segment config for parallel scanning |
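
Expression and Segment can be combined to filter partitions and to split a large listing into independent requests. The sketch below builds one request per segment. The Segment keys `SegmentNumber` and `TotalSegments` and the 1 to 10 limit come from the AWS Glue API and are assumed to pass through this tool unchanged; the function name and the sample expression are illustrative.

```python
from typing import Any, Dict, List, Optional


def partition_requests(
    database: str,
    table: str,
    expression: Optional[str] = None,
    total_segments: int = 4,
) -> List[Dict[str, Any]]:
    """Build one GetPartitions request per segment so they can run in parallel."""
    # The AWS Glue API documents 1 to 10 segments; treated as an assumption here.
    if not 1 <= total_segments <= 10:
        raise ValueError("total_segments must be between 1 and 10")
    requests = []
    for segment_number in range(total_segments):  # segment numbers are zero-based
        params: Dict[str, Any] = {
            "DatabaseName": database,
            "TableName": table,
            "Segment": {
                "SegmentNumber": segment_number,
                "TotalSegments": total_segments,
            },
        }
        if expression:
            params["Expression"] = expression
        requests.append({"command": "GetPartitions", "params": params})
    return requests


# Illustrative filter on hypothetical partition keys "year" and "month".
requests = partition_requests("my-database", "my-table", "year = '2024' AND month = '01'", 2)
```

Each request covers a disjoint slice of the partitions, so the results can simply be concatenated.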
GetConnections
List or get Glue connections.
{ "command": "GetConnections", "params": { "ConnectionName": "my-connection" } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| ConnectionName | string | No | Connection name |
| HidePassword | boolean | No | Hide connection password |
| CatalogId | string | No | Catalog ID |
GetTags
Get tags for a Glue resource.
{ "command": "GetTags", "params": { "ResourceArn": "arn:aws:glue:..." } }
Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| ResourceArn | string | Yes | Resource ARN |
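
GetTags needs a full ARN, which usually has to be assembled from the resource name. The helper below is a sketch using the Glue ARN layout `arn:aws:glue:region:account-id:resource-type/name`. The function name and the region and account values in the usage line are placeholders, and only the job, crawler, and trigger resource types are covered.

```python
from typing import Any, Dict

RESOURCE_TYPES = {"job", "crawler", "trigger"}


def glue_arn(region: str, account_id: str, resource_type: str, name: str,
             partition: str = "aws") -> str:
    """Build a Glue resource ARN for a job, crawler, or trigger."""
    if resource_type not in RESOURCE_TYPES:
        raise ValueError(f"unsupported resource type: {resource_type}")
    return f"arn:{partition}:glue:{region}:{account_id}:{resource_type}/{name}"


def get_tags_request(region: str, account_id: str, resource_type: str, name: str) -> Dict[str, Any]:
    """Wrap the ARN in a GetTags request for GlueTool."""
    arn = glue_arn(region, account_id, resource_type, name)
    return {"command": "GetTags", "params": {"ResourceArn": arn}}


# Placeholder region and account ID, for illustration only.
request = get_tags_request("us-east-1", "123456789012", "job", "my-etl-job")
```

The `partition` argument defaults to `aws`; other partitions would need a different value.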
Related Services
- Glue → CloudWatch Logs: Glue job output logs go to `/aws-glue/jobs/output` and error logs to `/aws-glue/jobs/error`. Crawler logs go to `/aws-glue/crawlers`. Use `CloudWatchLogTool` to read them
- Glue → S3: Glue jobs read from and write to S3. Crawler targets are often S3 paths. Job scripts are stored in S3. Use `S3Tool` to inspect
- Glue → IAM: Jobs require IAM roles. Use `IAMTool` to inspect the execution role
- Glue → Data Catalog → Athena/EMR/Redshift: The Glue Data Catalog is shared with Athena, EMR, and Redshift Spectrum
- Glue → CloudFormation: Glue resources can be managed by CloudFormation stacks
- Glue → DynamoDB: Glue can read from DynamoDB tables as data sources