FromS3

Retrieves data from objects in a configured S3 bucket and prefix.

What It Does

Reads objects from an S3-compatible bucket and emits one generated item per object.

It can walk the bucket directly or load object metadata first, filter keys, skip empty objects, order the results, and attach storage metadata that describes which object was used.

YAML Example

DataSources:
  - Name: BucketPayloads
    Generator: FromS3
    GeneratorConfiguration:
      DataArrangeOrder: AsciiAsc
      LoadMetadataFirst: true
      StorageMetaData: ItemName
      S3:
        AccessKey: access-key
        SecretKey: secret-key
        ServiceURL: http://minio.local:9000
        StorageBucket: qaas-docs
        Prefix: payloads/
        SkipEmptyObjects: true
        ForcePathStyle: true

What This Configuration Does

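This configuration connects to the S3-compatible endpoint at http://minio.local:9000 using path-style addressing, reads objects from the qaas-docs bucket under the payloads/ prefix, skips empty objects, and emits the remaining objects in deterministic ascending ASCII order.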
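As a rough illustration of what ASCII ordering implies for object keys, the Python sketch below sorts keys by their encoded byte values. It assumes that AsciiAsc compares keys character by character using their code values; the function name `ascii_ascending` and the sample keys are hypothetical and are not part of the generator.

```python
def ascii_ascending(keys):
    """Sort keys by raw byte value (assumed AsciiAsc behaviour), ignoring locale rules."""
    return sorted(keys, key=lambda k: k.encode("utf-8"))


# Hypothetical keys under the payloads/ prefix.
sample_keys = [
    "payloads/b.json",
    "payloads/B.json",
    "payloads/10.json",
    "payloads/2.json",
]

ordered = ascii_ascending(sample_keys)
# Digits sort before uppercase letters, which sort before lowercase letters,
# and "10" sorts before "2" because the comparison is per character.
# ordered == ["payloads/10.json", "payloads/2.json",
#             "payloads/B.json", "payloads/b.json"]
```

Under this assumption, numbered object names need zero padding (for example 02 and 10) if they are meant to be emitted in numeric order.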

Because metadata is loaded first, the generator can order and filter the object set before it starts retrieving the actual object bodies.
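The metadata-first flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the generator's implementation: `ObjectMeta`, `plan_retrieval`, `retrieve`, and the injected `fetch_body` callable are hypothetical names, and the listing is passed in as plain data so the example does not depend on any S3 client.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass(frozen=True)
class ObjectMeta:
    """Hypothetical stand-in for one entry of a bucket listing."""
    key: str
    size: int  # size in bytes, as reported by the listing


def plan_retrieval(listing: Iterable[ObjectMeta], prefix: str,
                   skip_empty: bool = True) -> list[str]:
    """Filter and order keys using metadata only; no object body is read here."""
    selected = [
        meta.key
        for meta in listing
        if meta.key.startswith(prefix) and not (skip_empty and meta.size == 0)
    ]
    # Assumed AsciiAsc behaviour: ascending order by byte value.
    return sorted(selected, key=lambda k: k.encode("utf-8"))


def retrieve(listing: Iterable[ObjectMeta], prefix: str,
             fetch_body: Callable[[str], bytes]) -> Iterator[tuple[str, bytes]]:
    """Fetch bodies only for the keys that survived planning, in planned order."""
    for key in plan_retrieval(listing, prefix):
        yield key, fetch_body(key)
```

With this split, empty objects and keys outside the prefix are dropped before any body is downloaded, and the emitted order is fixed before retrieval starts.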