FromDataLake

TL;DR — Retrieves rows from the configured data lake query and exposes each row as a generated JSON object.

When to use

Use FromDataLake to run a Trino query and emit each returned row as a generated JSON-like object.

This generator is useful when test data already lives in a lakehouse or analytics system and you want the scenario to read it directly instead of exporting it first. Optional ignored columns let you drop fields that are not relevant for the rest of the flow.

YAML configuration

Set Generator to FromDataLake in a DataSources entry, then place the generator-specific fields under GeneratorConfiguration, as in the examples below.

Minimal example

DataSources:
  - Name: OrdersFromDataLake
    Generator: FromDataLake
    GeneratorConfiguration:
      TrinoServerUri: http://trino.local:8080
      Catalog: lakehouse
      Query: SELECT order_id, customer_id, debug_flag FROM qaas.orders
      Username: docs
      Password: docs-password
      ClientTag: qaas-docs
      ColumnsToIgnore:
        - debug_flag

Realistic example

The configuration above connects to Trino at the TrinoServerUri address, runs the statement given in Query, and turns each result row into one generated item.

The debug_flag column is removed before the items are exposed to the rest of the configuration, so downstream consumers only see order_id and customer_id.
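The row-to-item behavior described above can be illustrated with a small Python sketch. It is not the generator's actual implementation: the function name rows_to_items and the sample rows are hypothetical, and the only behavior assumed is what the text states, namely one object per row with the ignored columns removed.

```python
import json


def rows_to_items(column_names, rows, columns_to_ignore=()):
    """Build one dict per result row, skipping any ignored columns.

    Hypothetical helper for illustration only; it mirrors the documented
    effect of ColumnsToIgnore, not the real FromDataLake code.
    """
    ignored = set(columns_to_ignore)
    items = []
    for row in rows:
        # Pair each column name with its value and drop ignored columns.
        item = {
            name: value
            for name, value in zip(column_names, row)
            if name not in ignored
        }
        items.append(item)
    return items


# Hypothetical result set shaped like the query in the minimal example.
columns = ["order_id", "customer_id", "debug_flag"]
rows = [(1001, "C-17", True), (1002, "C-42", False)]

items = rows_to_items(columns, rows, columns_to_ignore=["debug_flag"])
for item in items:
    print(json.dumps(item))
```

With the sample rows above, each printed object contains only order_id and customer_id, which matches what downstream consumers would see in the example configuration.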

Edge cases

  • Missing required configuration keys fail schema validation before the hook runs.
  • Keep the generator name (FromDataLake) and any session or data-source names that refer to this entry (OrdersFromDataLake in the example) consistent with the surrounding YAML.

See also