Advanced Use Cases

SQL Transformer

Run SQL queries on files within your Tray workflows.

If you're interested in using this feature, please reach out to your Customer Success Manager or Account Executive.

Overview

The SQL Transformer is a Tray connector that enables you to run SQL queries directly on files within your workflows. Whether your files are stored in the Tray File System or sourced from external systems, the SQL Transformer allows you to query, join, and transform data using standard SQL syntax.

Supported File Formats

The SQL Transformer supports the following file formats as input sources:

  • CSV - Comma-separated values
  • JSON - JavaScript Object Notation
  • Parquet - Columnar storage format
  • Excel - Microsoft Excel files
  • Plain text - Raw text data

How It Works

The SQL Transformer operates as a connector within your Tray workflow:

  1. Add the connector - Insert the SQL Transformer connector into your workflow
  2. Select source files - Choose one or more files to query from the Tray File System or other sources
  3. Write your SQL query - Use DuckDB SQL syntax to query and transform your data
  4. Choose output format - Output results as JSON or save them as a file in your preferred format

[Image: SQL Transformer workflow]

Adding Sources

To query files with the SQL Transformer, you first need to add them as sources.

Each source file requires a unique name that you'll reference in your SQL query.

Source Configuration

For each source, you'll specify:

  • Source Name - A unique identifier to reference in your SQL query (e.g., "customers", "orders")
  • Content Type - The format of your file (CSV, JSON, Parquet, Excel)
  • File - The file to query, selected from the Tray File System
  • Sheet Name (Excel only) - The specific sheet to query within an Excel workbook

[Image: Source configuration]
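
Once a source is configured, you reference it in your SQL by its source name, as if it were a table. As a quick sanity check, assuming a source named "customers":

SELECT *
FROM customers
LIMIT 10;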

File Explorer

Use the built-in file explorer to browse and select files from your Tray File System:

[Image: File explorer]

Writing SQL Queries

The SQL Transformer uses DuckDB SQL syntax to query your data. DuckDB is a fast, in-process SQL database that supports standard SQL operations.

Basic Query Structure

Reference your sources using the names you defined when adding them:

SELECT
    o.order_id,
    o.order_date,
    c.name AS customer_name,
    c.email,
    c.city,
    c.country,
    o.product,
    o.quantity,
    o.price,
    (o.quantity * o.price) AS total_amount
FROM customers AS c
INNER JOIN orders AS o
    ON c.customer_id = o.customer_id
ORDER BY o.order_date DESC;

[Image: SQL query example]

Querying Multiple Files

You can join data across multiple source files in a single query:

SELECT
    customers.name,
    customers.email,
    COUNT(orders.order_id) AS total_orders,
    SUM(orders.amount) AS total_revenue
FROM customers
LEFT JOIN orders
    ON customers.customer_id = orders.customer_id
GROUP BY customers.name, customers.email
ORDER BY total_revenue DESC;

Common SQL Operations

DuckDB supports standard SQL operations including:

  • SELECT - Retrieve specific columns
  • WHERE - Filter rows based on conditions
  • JOIN - Combine data from multiple sources (INNER, LEFT, RIGHT, FULL OUTER)
  • GROUP BY - Aggregate data by specified columns
  • ORDER BY - Sort results
  • Aggregate functions - COUNT, SUM, AVG, MIN, MAX
  • String functions - CONCAT, UPPER, LOWER, SUBSTRING
  • Date functions - Date manipulation and formatting

For detailed DuckDB SQL syntax and functions, refer to the DuckDB documentation.
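
As a brief illustration, the following query combines filtering with string and date functions. It assumes a hypothetical customers source with name, email, and signup_date columns:

SELECT
    customer_id,
    -- build a display string and normalize email casing
    CONCAT(UPPER(name), ' <', LOWER(email), '>') AS contact,
    -- signup_date is a hypothetical column used for illustration
    strftime(signup_date, '%Y-%m') AS signup_month
FROM customers
WHERE signup_date >= DATE '2024-01-01';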

Output Options

The SQL Transformer provides flexible output options to suit your workflow needs:

JSON Result Output

Output query results as JSON data that can be used directly in subsequent workflow steps:

[Image: JSON output option]

Use JSON output when:

  • Passing data to API endpoints
  • Processing results in subsequent workflow steps
  • Working with individual records programmatically

File Output

Save query results as a file in the Tray File System:

[Image: File output option]

File output supports:

  • CSV - Comma-separated values
  • JSON - JavaScript Object Notation
  • Parquet - Columnar storage format
  • Excel - Microsoft Excel files

File output configuration:

  • Parent Folder ID - The Tray File System folder where the file will be saved
  • Output File Name - The name for your output file
  • File Format - Choose your preferred format
  • CSV Options - Configure delimiters and formatting (for CSV output)

Use file output when:

  • Creating reports for download or storage
  • Preparing data for external systems
  • Archiving query results
  • Building data pipelines with file-based handoffs

Use Cases

Data Aggregation and Reporting

Combine order data with customer information to generate sales reports:

SELECT
    customers.country,
    COUNT(DISTINCT customers.customer_id) AS customers,
    COUNT(orders.order_id) AS total_orders,
    SUM(orders.total_amount) AS total_revenue,
    AVG(orders.total_amount) AS avg_order_value
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
GROUP BY customers.country
ORDER BY total_revenue DESC;

Data Cleansing and Filtering

Clean and filter data before passing to downstream systems:

SELECT
    customer_id,
    UPPER(name) AS name,
    LOWER(email) AS email,
    city,
    country
FROM customers
WHERE
    email IS NOT NULL
    AND country IN ('US', 'CA', 'GB')
    AND status = 'active';

Joining Multiple Data Sources

Merge data from different systems before integration:

SELECT
    crm.customer_id,
    crm.name,
    crm.email,
    billing.subscription_tier,
    billing.mrr,
    support.ticket_count,
    support.last_contact_date
FROM crm_customers AS crm
LEFT JOIN billing_data AS billing
    ON crm.customer_id = billing.customer_id
LEFT JOIN support_tickets AS support
    ON crm.customer_id = support.customer_id
WHERE billing.status = 'active';

Data Transformation for ETL

Transform data structure and format for loading into target systems:

SELECT
    order_id,
    customer_id,
    strftime(order_date, '%Y-%m-%d') AS order_date_formatted,
    product,
    quantity,
    price,
    CASE
        WHEN total_amount > 1000 THEN 'High Value'
        WHEN total_amount > 500 THEN 'Medium Value'
        ELSE 'Standard'
    END AS order_category
FROM orders
WHERE order_date >= '2024-01-01';

Best Practices

Query Performance

For optimal performance, filter data early in your query using WHERE clauses before performing joins or aggregations.

  1. Use WHERE clauses - Filter unnecessary data before joins
  2. Select only needed columns - Avoid SELECT * when possible
  3. Test with sample data - Validate queries on smaller datasets first
  4. Consider file sizes - Large files may require additional processing time
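
For example, a CTE can apply the first two guidelines by filtering rows and projecting only the needed columns before a join. This sketch assumes the customers and orders sources used elsewhere on this page:

-- filter and project orders first, then join the smaller result
WITH recent_orders AS (
    SELECT order_id, customer_id, amount
    FROM orders
    WHERE order_date >= DATE '2024-01-01'
)
SELECT
    customers.name,
    SUM(recent_orders.amount) AS revenue
FROM customers
JOIN recent_orders
    ON customers.customer_id = recent_orders.customer_id
GROUP BY customers.name;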

Source Naming

  • Use descriptive, meaningful names for your sources (e.g., "customers", "orders", "products")
  • Keep names short and simple for easier query writing
  • Use consistent naming conventions across your workflow

Error Handling

Configure error handling at the connector level:

  • Stop workflow (default) - Halt execution if the query fails
  • Continue workflow - Proceed even if errors occur
  • Custom error handling - Implement specific logic for query failures

Always validate your SQL syntax before running workflows in production. Invalid queries will cause workflow execution to fail.

Common Patterns

Deduplication

Remove duplicate records while keeping the most recent:

SELECT DISTINCT ON (customer_id)
    customer_id,
    name,
    email,
    updated_at
FROM customers
ORDER BY customer_id, updated_at DESC;
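
DuckDB also supports the QUALIFY clause, which expresses the same deduplication with a window function:

SELECT
    customer_id,
    name,
    email,
    updated_at
FROM customers
-- keep only the most recent row per customer
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY updated_at DESC) = 1;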

Date Range Filtering

Filter records within a specific date range:

SELECT *
FROM orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
ORDER BY order_date;
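
For rolling windows relative to the current date, DuckDB's interval arithmetic works as well:

SELECT *
FROM orders
WHERE order_date >= current_date - INTERVAL 30 DAY
ORDER BY order_date;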

Conditional Aggregation

Create summary statistics with conditional logic:

SELECT
    product_category,
    COUNT(*) AS total_orders,
    SUM(CASE WHEN amount > 100 THEN 1 ELSE 0 END) AS high_value_orders,
    SUM(CASE WHEN status = 'returned' THEN 1 ELSE 0 END) AS returns
FROM orders
GROUP BY product_category;
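
DuckDB also supports the FILTER clause on aggregates, which expresses the same counts more directly:

SELECT
    product_category,
    COUNT(*) AS total_orders,
    COUNT(*) FILTER (WHERE amount > 100) AS high_value_orders,
    COUNT(*) FILTER (WHERE status = 'returned') AS returns
FROM orders
GROUP BY product_category;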

DuckDB SQL Reference

The SQL Transformer uses DuckDB SQL syntax. Key features include:

  • ANSI SQL compliant - Standard SQL operations work as expected
  • Rich function library - Extensive built-in functions for data manipulation
  • Advanced analytics - Window functions, CTEs, and complex aggregations
  • Performance optimized - Efficient query execution on large datasets

For complete SQL reference, see the DuckDB SQL documentation.
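
As a short illustration of CTEs and window functions together, assuming a hypothetical orders source with order_date and amount columns:

-- aggregate revenue by month, then compare each month to the prior one
WITH monthly AS (
    SELECT
        strftime(order_date, '%Y-%m') AS month,
        SUM(amount) AS revenue
    FROM orders
    GROUP BY month
)
SELECT
    month,
    revenue,
    revenue - LAG(revenue) OVER (ORDER BY month) AS change_from_prior_month
FROM monthly
ORDER BY month;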
