Operations (sample payloads)

Main operations

Extract data

Extract text from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
    "file": {
        "name": "invoice.pdf",
        "url": "https://example.com/files/invoice.pdf",
        "mime_type": "application/pdf",
        "expires": 1623456789
    },
    "features": {
        "layout": true,
        "forms": true,
        "tables": true,
        "signatures": true
    },
    "queries": [
        "What is the total amount?",
        "Who is the invoice issued to?"
    ],
    "include_bounding_boxes": true
}
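
A request body like the one above can also be assembled in code. The following is a minimal Python sketch, not an official client: `build_extract_request` and its `ttl_seconds` and `now` parameters are hypothetical names, and treating `expires` as a Unix timestamp in seconds is an assumption based on the sample value rather than something this page confirms.

```python
import json
import time


def build_extract_request(name, url, mime_type, queries=(), ttl_seconds=3600,
                          now=None):
    """Assemble an Extract data payload shaped like the sample input."""
    # Assumption: "expires" is a Unix timestamp in seconds.
    issued_at = int(time.time()) if now is None else int(now)
    return {
        "file": {
            "name": name,
            "url": url,
            "mime_type": mime_type,
            "expires": issued_at + ttl_seconds,
        },
        # Feature flags mirror the sample; switch off what is not needed.
        "features": {
            "layout": True,
            "forms": True,
            "tables": True,
            "signatures": True,
        },
        "queries": list(queries),
        "include_bounding_boxes": True,
    }


payload = build_extract_request(
    "invoice.pdf",
    "https://example.com/files/invoice.pdf",
    "application/pdf",
    queries=["What is the total amount?"],
    now=1623453189,  # fixed clock so the result is reproducible
)
print(json.dumps(payload, indent=4))
```

Passing `now` explicitly keeps the helper deterministic in tests; in normal use it can be omitted so the current time is used.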

Sample Output

{
    "totalPages": 2,
    "pages": [
        {
            "number": 1,
            "lines": [
                {
                    "confidence": 0.98,
                    "items": [
                        {
                            "confidence": 0.99,
                            "text": "INVOICE",
                            "bounding_box": {
                                "top": 50,
                                "left": 100,
                                "width": 200,
                                "height": 40
                            }
                        }
                    ]
                }
            ],
            "layout": {
                "title": {
                    "confidence": 0.99,
                    "text": "INVOICE",
                    "bounding_box": {
                        "top": 50,
                        "left": 100,
                        "width": 200,
                        "height": 40
                    }
                },
                "header": {
                    "confidence": 0.95,
                    "text": "ABC Company",
                    "bounding_box": {
                        "top": 10,
                        "left": 10,
                        "width": 150,
                        "height": 30
                    }
                },
                "footer": {
                    "confidence": 0.9,
                    "text": "Page 1 of 2",
                    "bounding_box": {
                        "top": 750,
                        "left": 450,
                        "width": 100,
                        "height": 20
                    }
                },
                "page_number": {
                    "confidence": 0.99,
                    "text": "1",
                    "bounding_box": {
                        "top": 750,
                        "left": 500,
                        "width": 20,
                        "height": 20
                    }
                },
                "section_headers": [
                    {
                        "confidence": 0.97,
                        "text": "Bill To:",
                        "bounding_box": {
                            "top": 150,
                            "left": 50,
                            "width": 100,
                            "height": 30
                        }
                    }
                ],
                "lists": [
                    {
                        "confidence": 0.95,
                        "items": [
                            {
                                "confidence": 0.96,
                                "text": "Item 1",
                                "bounding_box": {
                                    "top": 300,
                                    "left": 50,
                                    "width": 100,
                                    "height": 20
                                }
                            },
                            {
                                "confidence": 0.97,
                                "text": "Item 2",
                                "bounding_box": {
                                    "top": 330,
                                    "left": 50,
                                    "width": 100,
                                    "height": 20
                                }
                            }
                        ],
                        "bounding_box": {
                            "top": 300,
                            "left": 50,
                            "width": 100,
                            "height": 60
                        }
                    }
                ]
            }
        }
    ],
    "form_items": [
        {
            "pageNumber": 1,
            "confidence": 0.98,
            "key": {
                "confidence": 0.99,
                "text": "Invoice Number:",
                "bounding_box": {
                    "top": 100,
                    "left": 50,
                    "width": 150,
                    "height": 30
                }
            },
            "value": {
                "confidence": 0.99,
                "text": "INV-001",
                "bounding_box": {
                    "top": 100,
                    "left": 200,
                    "width": 100,
                    "height": 30
                }
            }
        }
    ],
    "tables": [
        {
            "pageNumber": 1,
            "confidence": 0.97,
            "header": [
                {
                    "confidence": 0.98,
                    "text": "Description",
                    "rowIndex": 0,
                    "columnIndex": 0,
                    "rowSpan": 1,
                    "columnSpan": 1,
                    "bounding_box": {
                        "top": 400,
                        "left": 50,
                        "width": 200,
                        "height": 30
                    }
                },
                {
                    "confidence": 0.98,
                    "text": "Amount",
                    "rowIndex": 0,
                    "columnIndex": 1,
                    "rowSpan": 1,
                    "columnSpan": 1,
                    "bounding_box": {
                        "top": 400,
                        "left": 250,
                        "width": 100,
                        "height": 30
                    }
                }
            ],
            "rows": [
                {
                    "cells": [
                        {
                            "confidence": 0.99,
                            "text": "Product A",
                            "rowIndex": 1,
                            "columnIndex": 0,
                            "rowSpan": 1,
                            "columnSpan": 1,
                            "bounding_box": {
                                "top": 430,
                                "left": 50,
                                "width": 200,
                                "height": 30
                            }
                        },
                        {
                            "confidence": 0.99,
                            "text": "$100.00",
                            "rowIndex": 1,
                            "columnIndex": 1,
                            "rowSpan": 1,
                            "columnSpan": 1,
                            "bounding_box": {
                                "top": 430,
                                "left": 250,
                                "width": 100,
                                "height": 30
                            }
                        }
                    ]
                }
            ],
            "bounding_box": {
                "top": 400,
                "left": 50,
                "width": 300,
                "height": 60
            }
        }
    ],
    "queries": [
        {
            "pageNumber": 1,
            "query": "What is the total amount?",
            "confidence": 0.95,
            "text": "The total amount is $100.00",
            "bounding_box": {
                "Width": 0.3,
                "Height": 0.05,
                "Left": 0.6,
                "Top": 0.8
            }
        },
        {
            "pageNumber": 1,
            "query": "Who is the invoice issued to?",
            "confidence": 0.93,
            "text": "The invoice is issued to XYZ Corporation",
            "bounding_box": {
                "Width": 0.4,
                "Height": 0.05,
                "Left": 0.1,
                "Top": 0.2
            }
        }
    ],
    "signatures": [
        {
            "pageNumber": 2,
            "confidence": 0.9,
            "bounding_box": {
                "top": 700,
                "left": 400,
                "width": 150,
                "height": 50
            }
        }
    ]
}
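
Once a response like this has been parsed into a dictionary, the form items, tables, and query answers can be flattened into simpler structures. The sketch below is one possible approach in Python; the helper names (`form_items_to_dict`, `table_to_rows`, `query_answers`) and the `min_confidence` threshold are hypothetical, and the `response` literal is a trimmed copy of the sample output with bounding boxes left out.

```python
def form_items_to_dict(response):
    """Map each form key's text to the text of its value."""
    return {
        item["key"]["text"]: item["value"]["text"]
        for item in response.get("form_items", [])
    }


def table_to_rows(table):
    """Return the header row followed by the data rows, as lists of cell text."""
    def texts(cells):
        # Order cells by columnIndex rather than trusting array order.
        return [c["text"] for c in sorted(cells, key=lambda c: c["columnIndex"])]

    rows = [texts(table.get("header", []))]
    rows.extend(texts(row["cells"]) for row in table.get("rows", []))
    return rows


def query_answers(response, min_confidence=0.0):
    """Map each query to its answer, dropping answers below the threshold."""
    return {
        q["query"]: q["text"]
        for q in response.get("queries", [])
        if q["confidence"] >= min_confidence
    }


# Trimmed copy of the sample output (bounding boxes and layout omitted).
response = {
    "form_items": [
        {"pageNumber": 1, "confidence": 0.98,
         "key": {"confidence": 0.99, "text": "Invoice Number:"},
         "value": {"confidence": 0.99, "text": "INV-001"}},
    ],
    "tables": [
        {"pageNumber": 1, "confidence": 0.97,
         "header": [
             {"text": "Description", "columnIndex": 0},
             {"text": "Amount", "columnIndex": 1},
         ],
         "rows": [
             {"cells": [
                 {"text": "Product A", "columnIndex": 0},
                 {"text": "$100.00", "columnIndex": 1},
             ]},
         ]},
    ],
    "queries": [
        {"query": "What is the total amount?", "confidence": 0.95,
         "text": "The total amount is $100.00"},
        {"query": "Who is the invoice issued to?", "confidence": 0.93,
         "text": "The invoice is issued to XYZ Corporation"},
    ],
}

print(form_items_to_dict(response))
print(table_to_rows(response["tables"][0]))
print(query_answers(response, min_confidence=0.94))
```

The helpers read only `text`, `columnIndex`, and `confidence`, so they work the same whether or not bounding boxes were requested. Cells with a `rowSpan` or `columnSpan` greater than 1 are not expanded here.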

DDL operations

Extract HTML (DDL)

Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).

Extract text structure as HTML from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
    "file": {
        "name": "sample_document.pdf",
        "url": "https://example.com/files/sample_document.pdf",
        "mime_type": "application/pdf",
        "expires": 1623456789
    },
    "skipElements": {
        "layout_header": false,
        "layout_footer": true,
        "layout_page_number": true,
        "layout_title": false,
        "layout_section_header": false,
        "layout_table": false,
        "layout_figure": false
    }
}

Sample Output

{
    "html": "<html><body><h1>Sample Document Title</h1><p>This is the first paragraph of the sample document.</p><h2>Section 1</h2><p>Here's some content for section 1.</p><table><tr><th>Column 1</th><th>Column 2</th></tr><tr><td>Data 1</td><td>Data 2</td></tr></table><h2>Section 2</h2><p>Here's some content for section 2.</p><img src='sample_image.jpg' alt='Sample image'><p>This is the last paragraph of the sample document.</p></body></html>"
}
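
Because the `html` field is a single string, it usually has to be parsed again before the structure is useful. As a rough illustration, the Python sketch below uses the standard library's `html.parser` to collect the headings into an outline; `HeadingCollector` is a hypothetical helper, and the `response` literal is a shortened version of the sample output.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for every heading in an HTML string."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside an open heading.
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._buffer).strip()))
            self._level = None


# Shortened version of the sample output.
response = {
    "html": "<html><body><h1>Sample Document Title</h1>"
            "<p>First paragraph.</p><h2>Section 1</h2>"
            "<p>Content.</p><h2>Section 2</h2></body></html>"
}

collector = HeadingCollector()
collector.feed(response["html"])
collector.close()
print(collector.headings)
```

The same pattern extends to tables or images by handling `table`, `tr`, `td`, or `img` tags in `handle_starttag`.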

Extract Markdown (DDL)

Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).

Extract text structure as Markdown from a single image (JPG/PNG) or a multi-page PDF/TIFF document.

Sample Input

{
    "file": {
        "name": "sample_document.pdf",
        "url": "https://example.com/files/sample_document.pdf",
        "mime_type": "application/pdf",
        "expires": 1623456789
    },
    "skipElements": {
        "layout_header": false,
        "layout_footer": true,
        "layout_page_number": true,
        "layout_title": false,
        "layout_section_header": false,
        "layout_table": false,
        "layout_figure": false
    }
}

Sample Output

{
    "markdown": "# Sample Document Title\n\n## Introduction\n\nThis is a sample document to demonstrate the extraction of text structure as Markdown.\n\n### Section 1\n\nHere's some content for the first section.\n\n- Bullet point 1\n- Bullet point 2\n- Bullet point 3\n\n### Section 2\n\nHere's a table:\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| Data 1   | Data 2   | Data 3   |\n| Data 4   | Data 5   | Data 6   |\n\n## Conclusion\n\nThis concludes our sample document."
}
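
Tables in the returned `markdown` string arrive as pipe-delimited text. A minimal Python sketch for turning them back into rows could look like the following; `markdown_tables` is a hypothetical helper, it assumes simple pipe tables like the one in the sample, and it does not handle escaped `|` characters inside cells.

```python
def markdown_tables(markdown):
    """Return each pipe table as a list of rows (lists of cell strings)."""
    tables, current = [], []
    for line in markdown.splitlines():
        stripped = line.strip()
        if stripped.startswith("|") and stripped.endswith("|"):
            cells = [c.strip() for c in stripped.strip("|").split("|")]
            # Skip the |---|---| separator row under the header.
            if all(c and set(c) <= set("-:") for c in cells):
                continue
            current.append(cells)
        elif current:
            # A non-table line ends the table collected so far.
            tables.append(current)
            current = []
    if current:
        tables.append(current)
    return tables


# Excerpt of the sample output.
response = {
    "markdown": "### Section 2\n\nHere's a table:\n\n"
                "| Column 1 | Column 2 | Column 3 |\n"
                "|----------|----------|----------|\n"
                "| Data 1   | Data 2   | Data 3   |\n"
                "| Data 4   | Data 5   | Data 6   |\n\n"
                "## Conclusion\n\nThis concludes our sample document."
}

for table in markdown_tables(response["markdown"]):
    header, *rows = table
    print(header)
    print(rows)
```

The first row of each table is the header; the remaining rows can be zipped with it to build dictionaries if keyed access is more convenient.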