Athena.

AWS Athena runs SQL over data sitting in S3 — Parquet, ORC, JSON, CSV. Orchid connects to Athena through a workgroup, runs queries against your Glue Data Catalog, and streams results back into a notebook.

Coming soon

This connector is on the v1.1 roadmap. The setup steps below are the planned flow.

Requirements

  • An AWS account with Athena set up and at least one workgroup.
  • An S3 bucket configured as the workgroup's query result location.
  • An IAM principal (user or role) with Athena, Glue, and S3 permissions.
  • A Glue Data Catalog database containing the tables you want to query.

IAM setup

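The IAM principal Orchid uses needs the permissions below. The AWS managed policy AmazonAthenaFullAccess covers the Athena and Glue actions, but it is far broader than Orchid needs and may not grant access to your own S3 buckets, so a tighter custom policy is better. At minimum: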

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:GetQueryResultsStream",
        "athena:StopQueryExecution",
        "athena:ListWorkGroups",
        "athena:GetWorkGroup"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetPartitions"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::your-data-bucket",
        "arn:aws:s3:::your-data-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-athena-results",
        "arn:aws:s3:::your-athena-results/*"
      ]
    }
  ]
}
      

Two distinct S3 ARNs: one for the data you're querying (read-only is enough), one for the query result location (read + write — Athena writes result files there).
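
If you manage several environments, it can help to generate the policy from the two bucket names rather than editing JSON by hand. The following is a minimal sketch, not part of Orchid: the function name build_orchid_policy is hypothetical, and the actions are copied from the policy above.

```python
import json


def build_orchid_policy(data_bucket: str, results_bucket: str) -> dict:
    """Build the minimal policy from this page for the given bucket names.

    Hypothetical helper for illustration; it is not an Orchid or AWS API.
    """

    def bucket_arns(name: str) -> list:
        # One ARN for the bucket itself (ListBucket), one for its objects.
        return [f"arn:aws:s3:::{name}", f"arn:aws:s3:::{name}/*"]

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "athena:StartQueryExecution",
                    "athena:GetQueryExecution",
                    "athena:GetQueryResults",
                    "athena:GetQueryResultsStream",
                    "athena:StopQueryExecution",
                    "athena:ListWorkGroups",
                    "athena:GetWorkGroup",
                ],
                "Resource": "*",
            },
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": "*",
            },
            {
                # Data bucket: read-only.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": bucket_arns(data_bucket),
            },
            {
                # Result bucket: read and write, because Athena writes results here.
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketLocation",
                    "s3:GetObject",
                    "s3:ListBucket",
                    "s3:PutObject",
                ],
                "Resource": bucket_arns(results_bucket),
            },
        ],
    }


if __name__ == "__main__":
    policy = build_orchid_policy("your-data-bucket", "your-athena-results")
    print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the IAM console as a custom policy.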

Connect

  1. Open Integrations → + Add connection → Athena.
  2. Fill in:
    • Region — e.g. us-east-1
    • Workgroup — typically primary; pick the one whose result location your IAM principal can write to
    • Database — the Glue catalog database (e.g. analytics)
    • Output location — S3 URI (e.g. s3://your-athena-results/orchid/). Orchid auto-fills this from the workgroup if it's set there; otherwise paste explicitly.
    • Auth — Access key + secret, AWS profile, or SSO
  3. Click Test connection → save.
The Athena connection form. The output location S3 bucket must be writable by your IAM principal.
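
Before filling in the form, it can be useful to sanity-check the values you plan to enter. The sketch below is illustrative only: the AthenaConnection class and validate function are hypothetical names that mirror the form fields above, and the region pattern is a rough shape check rather than an authoritative list of AWS regions.

```python
import re
from dataclasses import dataclass
from typing import List
from urllib.parse import urlparse


@dataclass
class AthenaConnection:
    """Hypothetical container mirroring the fields of the connection form."""
    region: str
    workgroup: str
    database: str
    output_location: str


def validate(conn: AthenaConnection) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Rough shape check only, e.g. us-east-1 or us-gov-west-1.
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d+", conn.region):
        problems.append(f"region does not look like an AWS region: {conn.region!r}")

    if not conn.workgroup:
        problems.append("workgroup is empty (the default workgroup is named 'primary')")

    if not conn.database:
        problems.append("database is empty")

    # The output location must be an S3 URI with a bucket name.
    parsed = urlparse(conn.output_location)
    if parsed.scheme != "s3" or not parsed.netloc:
        problems.append(f"output location is not an s3:// URI: {conn.output_location!r}")

    return problems


if __name__ == "__main__":
    conn = AthenaConnection(
        region="us-east-1",
        workgroup="primary",
        database="analytics",
        output_location="s3://your-athena-results/orchid/",
    )
    print(validate(conn) or "settings look well-formed")
```

A well-formed value can still be rejected by AWS, so this complements Test connection rather than replacing it.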

Authentication methods

Access key + secret

Simplest. Paste the access key ID and secret access key. Stored in your OS keychain. Use a dedicated IAM user for Orchid, not your root account.

AWS profile (~/.aws/credentials)

If you have ~/.aws/credentials set up, pick Use AWS profile and provide the profile name. Orchid loads credentials from your local AWS config — same identity you use with the AWS CLI.
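
To check which profile names are available before typing one into Orchid, you can read the credentials file directly. This is a small sketch using only the Python standard library; list_profiles and has_profile are hypothetical helper names. It reads section names only and does not validate the keys. Profiles defined only in ~/.aws/config (common for SSO) are not covered here.

```python
import configparser
from pathlib import Path

# Default location used by the AWS CLI on Linux and macOS.
DEFAULT_CREDENTIALS_PATH = Path.home() / ".aws" / "credentials"


def list_profiles(credentials_path=DEFAULT_CREDENTIALS_PATH) -> list:
    """Return the profile names found in an AWS credentials file.

    A missing file yields an empty list, because ConfigParser.read()
    silently skips files it cannot open.
    """
    # Interpolation is disabled so a '%' inside a secret key cannot cause errors.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)
    return parser.sections()


def has_profile(name: str, credentials_path=DEFAULT_CREDENTIALS_PATH) -> bool:
    """True if the named profile exists in the credentials file."""
    return name in list_profiles(credentials_path)


if __name__ == "__main__":
    print("profiles:", list_profiles())
```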

SSO / IAM Identity Center

For organizations on AWS SSO, run aws sso login --profile your-profile in your shell, then point Orchid at that profile.

Optional settings

Engine version

Athena supports engine version 2 (Presto-based) and engine version 3 (Trino-based). The workgroup determines which one runs your queries. Version 3 is recommended for new workgroups: it has wider function coverage and better performance.

Result encryption

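If your workgroup enforces result encryption (SSE-S3, SSE-KMS, or CSE-KMS), Orchid respects it transparently. KMS-encrypted results require the kms:Decrypt permission on the principal, and typically kms:GenerateDataKey as well so that Athena can write the encrypted result files.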

Cost discipline

Athena charges $5 per TB scanned (on-demand). Orchid shows the bytes scanned for each query in the result panel. Always filter on partition columns, and prune columns with an explicit SELECT a, b rather than SELECT *.
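
To turn the bytes-scanned figure into a rough dollar amount, the arithmetic is a single multiplication. The sketch below assumes the $5 per TB on-demand rate quoted above and treats a terabyte as 10^12 bytes; both are assumptions to check against your region's pricing page, and the function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0     # on-demand rate quoted on this page
BYTES_PER_TB = 10 ** 12    # assumption: decimal terabyte


def estimate_cost_usd(bytes_scanned: int) -> float:
    """Rough on-demand cost of a query from its bytes-scanned figure."""
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    full_scan = 200 * 10 ** 9   # a query that scans 200 GB
    pruned_scan = 20 * 10 ** 9  # the same query after partition and column pruning
    print(f"full scan:   ${estimate_cost_usd(full_scan):.2f}")
    print(f"pruned scan: ${estimate_cost_usd(pruned_scan):.2f}")
```

Under these assumptions a 200 GB scan costs about a dollar, so a dashboard that reruns it every few minutes adds up quickly. That is why partition filters and column pruning matter.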

Common gotchas

  • "Insufficient permissions to execute the query": your IAM principal can't write to the result location, or can't read from the data S3 bucket. Check both bucket ARNs in the policy.
  • "HIVE_PARTITION_SCHEMA_MISMATCH": the table schema and a partition's schema in the Glue catalog disagree (column added, type changed). Drop the affected partitions and re-add them (for example with MSCK REPAIR TABLE your_table;), or recreate the table.
  • Query exceeds the Athena query timeout (30 minutes by default): refactor to scan less data (partition filters, column pruning), or split into multiple queries.
  • Results truncated: Athena's result stream is paginated. Orchid streams the full result back; if you see truncation, it's usually a workgroup setting capping result size.
  • "ResourceNotFoundException: Database not found": the database name is wrong, or it lives in a different Glue catalog (not the default one). Athena uses AwsDataCatalog by default.
  • Region mismatch: the region you enter must be the one that hosts the workgroup and the Glue catalog. Athena can read S3 data from other regions, but cross-region reads cost more and are slower.
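
To illustrate what paginated results mean in practice, here is a sketch of following the NextToken chain of the Athena GetQueryResults API. It is not Orchid's implementation. The client argument is assumed to be a boto3 Athena client (or anything with the same get_query_results method), iter_result_rows is a hypothetical name, and the header-row skip assumes a SELECT query, where the first row of the first page holds column names.

```python
from typing import Any, Iterator, List, Optional


def iter_result_rows(client: Any, query_execution_id: str) -> Iterator[List[Optional[str]]]:
    """Yield every data row of a finished query, following NextToken page by page.

    `client` is assumed to behave like boto3.client("athena").
    """
    next_token = None
    first_page = True
    while True:
        kwargs = {"QueryExecutionId": query_execution_id}
        if next_token:
            kwargs["NextToken"] = next_token
        page = client.get_query_results(**kwargs)

        rows = page["ResultSet"]["Rows"]
        if first_page:
            # Assumption: SELECT query, so the first row holds column headers.
            rows = rows[1:]
            first_page = False

        for row in rows:
            # NULL cells come back without a VarCharValue key.
            yield [cell.get("VarCharValue") for cell in row["Data"]]

        next_token = page.get("NextToken")
        if not next_token:
            break


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("athena", region_name="us-east-1")
#   for row in iter_result_rows(client, "your-query-execution-id"):
#       print(row)
```

A client that reads only the first page sees at most one page of rows, which can look like truncation even when the query itself succeeded.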

Example queries

-- List databases in the default catalog
SHOW DATABASES;
      
-- Daily event count with partition filter
SELECT date_trunc('day', from_unixtime(event_time)) AS day,
       count(*) AS events
FROM events
WHERE year = 2026 AND month = 5
GROUP BY 1
ORDER BY 1;
      
-- Top requested paths today (CloudFront logs example)
SELECT request_uri, count(*) AS hits
FROM cloudfront_logs
WHERE "date" = current_date
GROUP BY request_uri
ORDER BY hits DESC
LIMIT 50;
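
For readers who want to try these queries outside a notebook, the start-then-poll flow of the Athena API can be sketched as follows. This is an assumption-laden illustration, not how Orchid itself is implemented: run_query is a hypothetical helper, client is assumed to be a boto3 Athena client, and the default timeout mirrors the 30-minute limit mentioned above.

```python
import time
from typing import Any, Dict

TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_query(
    client: Any,
    sql: str,
    database: str,
    workgroup: str,
    output_location: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 1800.0,
) -> Dict[str, Any]:
    """Start a query, poll until it finishes, and return its id and bytes scanned.

    `client` is assumed to behave like boto3.client("athena").
    """
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
        WorkGroup=workgroup,
    )
    query_id = started["QueryExecutionId"]
    deadline = time.monotonic() + timeout_seconds

    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() >= deadline:
            # Stop the query so it does not keep scanning (and billing).
            client.stop_query_execution(QueryExecutionId=query_id)
            raise TimeoutError(f"query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = execution["Status"].get("StateChangeReason", state)
        raise RuntimeError(f"query {query_id} ended as {state}: {reason}")

    stats = execution.get("Statistics", {})
    return {"query_id": query_id, "bytes_scanned": stats.get("DataScannedInBytes", 0)}


# Usage with a real client (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("athena", region_name="us-east-1")
#   info = run_query(client, "SHOW DATABASES;", "analytics", "primary",
#                    "s3://your-athena-results/orchid/")
```

The returned bytes_scanned value is the same figure Athena bills on, so it can be logged next to each query.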
      

Where to go next

For more on writing SQL cells, see SQL cells.