Athena.
AWS Athena runs SQL over data sitting in S3 — Parquet, ORC, JSON, CSV. Orchid connects to Athena through a workgroup, runs queries against your Glue Data Catalog, and streams results back into a notebook.
This connector is on the v1.1 roadmap. The setup steps below are the planned flow.
Requirements
- An AWS account with Athena set up and at least one workgroup.
- An S3 bucket configured as the workgroup's query result location.
- An IAM principal (user or role) with Athena, Glue, and S3 permissions.
- A Glue Data Catalog database containing the tables you want to query.
IAM setup
The IAM principal Orchid uses needs the following permissions. A managed policy that covers this is AmazonAthenaFullAccess, but a tighter custom policy is better. At minimum:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:GetQueryResultsStream",
        "athena:StopQueryExecution",
        "athena:ListWorkGroups",
        "athena:GetWorkGroup"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-data-bucket",
        "arn:aws:s3:::your-data-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-athena-results",
        "arn:aws:s3:::your-athena-results/*"
      ]
    }
  ]
}
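The policy names two distinct S3 buckets: one for the data you're querying (read-only is enough) and one for the query result location (read and write, because Athena writes result files there). Each bucket appears twice, once as the bucket ARN and once as the /* object ARN.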
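If you manage several environments, it can be easier to generate the policy from the two bucket names than to hand-edit ARNs. The following is a minimal Python sketch; `build_orchid_policy` is a hypothetical helper (not part of Orchid or the AWS SDK) that simply reproduces the four statements listed above.

```python
import json


def build_orchid_policy(data_bucket: str, results_bucket: str) -> dict:
    """Return the minimal policy with the two bucket names filled in.

    Hypothetical helper for illustration; the statements mirror the
    policy shown in this section.
    """

    def bucket_arns(name: str) -> list:
        # Bucket-level actions (s3:ListBucket) match the bucket ARN;
        # object-level actions (s3:GetObject, s3:PutObject) match the /* ARN.
        return [f"arn:aws:s3:::{name}", f"arn:aws:s3:::{name}/*"]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:GetQueryResultsStream",
                    "athena:StopQueryExecution",
                    "athena:ListWorkGroups",
                    "athena:GetWorkGroup",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            },
            {
                # Data bucket: read-only.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": bucket_arns(data_bucket),
            },
            {
                # Results bucket: read and write.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": bucket_arns(results_bucket),
            },
        ],
    }


if __name__ == "__main__":
    policy = build_orchid_policy("your-data-bucket", "your-athena-results")
    print(json.dumps(policy, indent=2))
```

Running the script prints a JSON document you can paste into the IAM console as a custom policy.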
Connect
- Open Integrations → + Add connection → Athena.
- Fill in:
  - Region: e.g. us-east-1
  - Workgroup: typically primary; pick the one whose result location your IAM principal can write to
  - Database: the Glue catalog database (e.g. analytics)
  - Output location: S3 URI (e.g. s3://your-athena-results/orchid/). Orchid auto-fills this from the workgroup if it's set there; otherwise paste it explicitly.
  - Auth: Access key + secret, AWS profile, or SSO
- Click Test connection → save.
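A malformed output location is an easy mistake to make when pasting the S3 URI by hand. The sketch below shows one way to check the value before saving; `parse_output_location` is a hypothetical helper, and how Orchid itself validates the field is not documented here.

```python
from urllib.parse import urlparse


def parse_output_location(uri: str) -> tuple:
    """Split an S3 URI such as s3://bucket/prefix/ into (bucket, prefix).

    Hypothetical helper for illustration only.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected an s3://bucket/prefix URI, got {uri!r}")
    prefix = parsed.path.lstrip("/")
    # Normalise to a trailing slash so the prefix reads as a folder.
    if prefix and not prefix.endswith("/"):
        prefix += "/"
    return parsed.netloc, prefix


if __name__ == "__main__":
    print(parse_output_location("s3://your-athena-results/orchid/"))
```

The bucket name returned here should match the results bucket named in the IAM policy; if it doesn't, the principal will not be able to write query results.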
Authentication methods
Access key + secret
Simplest. Paste the access key ID and secret access key. Stored in your OS keychain. Use a dedicated IAM user for Orchid, not your root account.
AWS profile (~/.aws/credentials)
If you have ~/.aws/credentials set up, pick Use AWS profile and provide the profile name. Orchid loads credentials from your local AWS config — same identity you use with the AWS CLI.
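If you're not sure which profile names exist, you can list them from the credentials file. This is a small sketch using only the Python standard library; `list_profiles` and `list_local_profiles` are hypothetical helper names, and the sketch assumes the usual INI layout with one `[profile-name]` section per profile.

```python
import configparser
from pathlib import Path


def list_profiles(credentials_text: str) -> list:
    """Return the profile names (section headers) found in credentials-file text."""
    # Interpolation is disabled so '%' characters in values are never interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    return parser.sections()


def list_local_profiles(path=None) -> list:
    """List profiles in ~/.aws/credentials, or in an explicit path if given."""
    path = Path(path) if path else Path.home() / ".aws" / "credentials"
    return list_profiles(path.read_text()) if path.exists() else []


if __name__ == "__main__":
    print(list_local_profiles())
```

Any name this prints is a candidate for the profile name field. Profiles defined only in ~/.aws/config are not covered by this sketch.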
SSO / IAM Identity Center
For organizations on AWS SSO, run aws sso login --profile your-profile in your shell, then point Orchid at that profile.
Optional settings
Engine version
Athena supports engine version 2 (Presto-based) and engine version 3 (Trino-based). The workgroup determines which one runs your queries. Version 3 is recommended for new workgroups: wider function coverage and better performance.
Result encryption
If your workgroup enforces result encryption (SSE-S3, SSE-KMS, or CSE-KMS), Orchid respects it transparently. KMS-encrypted results require the kms:Decrypt permission on the principal.
Athena charges $5 per TB scanned (on-demand). Orchid shows the bytes-scanned for each query in the result panel. Always filter on partition columns, and prune columns with explicit SELECT a, b rather than SELECT *.
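To turn a bytes-scanned figure into an approximate dollar amount, the arithmetic is a single division. The sketch below assumes the $5 per TB on-demand rate quoted above and decimal terabytes (10^12 bytes); both constants are assumptions you should check against your region's pricing, and any per-query minimum or rounding applied by AWS is not modelled.

```python
PRICE_PER_TB_USD = 5.0    # on-demand rate quoted in this section
BYTES_PER_TB = 10 ** 12   # assumption: decimal terabytes


def estimate_cost_usd(bytes_scanned: int) -> float:
    """Approximate on-demand cost of a query from its bytes-scanned figure."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A full scan of 250 GB versus the same query pruned to 25 GB.
    print(estimate_cost_usd(250 * 10 ** 9))
    print(estimate_cost_usd(25 * 10 ** 9))
```

Under these assumptions, pruning a query from 250 GB scanned to 25 GB scanned cuts its cost by the same factor of ten, which is why partition filters and explicit column lists matter.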
Common gotchas
- "Insufficient permissions to execute the query": your IAM principal can't write to the result location, or can't read from the data S3 bucket. Check both bucket ARNs in the policy.
- "HIVE_PARTITION_SCHEMA_MISMATCH": the table schema and a partition's schema in the Glue catalog disagree, typically after a column was added or a type changed in the underlying Parquet files. MSCK REPAIR TABLE your_table; only registers missing partitions, so drop the mismatched partitions before running it, or recreate the table.
- Query exceeds the Athena query timeout (30 minutes): refactor to scan less data (partition filters, column pruning), or split it into multiple queries.
- Results truncated: Athena's result stream is paginated. Orchid streams the full result back; if you see truncation, it's usually a workgroup setting capping result size.
- "ResourceNotFoundException: Database not found": the database name is wrong, or it lives in a different Glue catalog (not the default one). Athena uses AwsDataCatalog by default.
- Region mismatch: Athena queries data in the same region as the workgroup. Cross-region S3 reads work but cost more and are slower.
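To make the pagination point concrete, here is a sketch of how a client can walk every page of a finished query's results. It is not Orchid's implementation. It assumes `client` behaves like a boto3 Athena client, whose `get_query_results` call returns a `ResultSet` and, when more pages remain, a `NextToken`. The `skip_header` flag reflects the assumption that the query is a SELECT, where the first row of the first page repeats the column names.

```python
def iter_result_rows(client, query_execution_id: str, skip_header: bool = True):
    """Yield each row of a finished query as a list of strings (None for NULL).

    `client` is assumed to expose get_query_results like boto3's Athena client.
    """
    token = None
    first_page = True
    while True:
        kwargs = {"QueryExecutionId": query_execution_id}
        if token:
            kwargs["NextToken"] = token
        page = client.get_query_results(**kwargs)
        rows = page["ResultSet"]["Rows"]
        if first_page and skip_header:
            rows = rows[1:]  # drop the column-name row on the first page only
        first_page = False
        for row in rows:
            # A NULL cell has no VarCharValue key, so .get() returns None.
            yield [cell.get("VarCharValue") for cell in row["Data"]]
        token = page.get("NextToken")
        if not token:
            break
```

A reader that stops after the first page sees only part of the result, which looks like truncation; following `NextToken` until it is absent avoids that.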
Example queries
-- List databases in the default catalog
SHOW DATABASES;
-- Daily event count with partition filter
SELECT date_trunc('day', from_unixtime(event_time)) AS day,
count(*) AS events
FROM events
WHERE year = 2026 AND month = 5
GROUP BY 1
ORDER BY 1;
-- Top requested paths today (CloudFront logs example)
SELECT request_uri, count(*) AS hits
FROM cloudfront_logs
WHERE date = current_date
GROUP BY request_uri
ORDER BY hits DESC
LIMIT 50;
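Outside Orchid, queries like these can be submitted programmatically. The sketch below shows the usual start, poll, and stop pattern. `run_query` is a hypothetical helper, and `client` is assumed to behave like a boto3 Athena client (`start_query_execution`, `get_query_execution`, `stop_query_execution`). It also assumes the workgroup already has a result location configured, so none is passed explicitly.

```python
import time


def run_query(client, sql, database, workgroup="primary",
              timeout_s=1800.0, poll_s=1.0, sleep=time.sleep):
    """Start a query, poll until it finishes, and return (query_id, bytes_scanned).

    Raises RuntimeError if the query fails or is cancelled, and TimeoutError
    (after asking Athena to stop the query) if timeout_s elapses first.
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]
    deadline = time.monotonic() + timeout_s
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state == "SUCCEEDED":
            scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
            return query_id, scanned
        if state in ("FAILED", "CANCELLED"):
            reason = execution["Status"].get("StateChangeReason", "no reason given")
            raise RuntimeError(f"query {query_id} {state}: {reason}")
        if time.monotonic() >= deadline:
            # Stop the query so it doesn't keep scanning (and billing) unattended.
            client.stop_query_execution(QueryExecutionId=query_id)
            raise TimeoutError(f"query {query_id} still {state} after {timeout_s}s")
        sleep(poll_s)
```

With boto3 installed, a client created by `boto3.client("athena", region_name="us-east-1")` could be passed as `client`, for example `run_query(client, "SHOW DATABASES", "analytics")`. The default `timeout_s` of 1800 seconds matches the 30-minute query timeout mentioned under Common gotchas.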
Where to go next
For more on writing SQL cells, see SQL cells.