Operations (sample payloads)
Main operations
Extract data
Extract text from a single image (JPG/PNG) or a multipage PDF/TIFF document.
Sample Input
{
"file": {
"name": "invoice.pdf",
"url": "https://example.com/files/invoice.pdf",
"mime_type": "application/pdf",
"expires": 1623456789
},
"features": {
"layout": true,
"forms": true,
"tables": true,
"signatures": true
},
"queries": [
"What is the total amount?",
"Who is the invoice issued to?"
],
"include_bounding_boxes": true
}
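If you assemble this input in code rather than by hand, a small helper keeps the `file`, `features`, and `queries` objects in the shape shown above. This is a minimal sketch, not the connector's own client: the function name `build_extract_data_input` is hypothetical, the MIME type is guessed from the file name with Python's standard `mimetypes` module, and it assumes `expires` is a Unix timestamp in seconds and that features you do not request may be sent as `false`.

```python
import json
import mimetypes
import time

ALL_FEATURES = ("layout", "forms", "tables", "signatures")


def build_extract_data_input(name, url, queries=(), ttl_seconds=3600,
                             features=ALL_FEATURES, include_bounding_boxes=True):
    """Hypothetical helper: build an Extract data payload like the sample input."""
    # Guess the MIME type from the extension, e.g. "invoice.pdf" -> "application/pdf".
    mime_type, _ = mimetypes.guess_type(name)
    if mime_type is None:
        raise ValueError(f"cannot infer a MIME type for {name!r}")
    return {
        "file": {
            "name": name,
            "url": url,
            "mime_type": mime_type,
            # Assumption: `expires` is a Unix timestamp (seconds) after which the URL is invalid.
            "expires": int(time.time()) + ttl_seconds,
        },
        # Assumption: features not requested can be sent explicitly as false.
        "features": {feature: feature in features for feature in ALL_FEATURES},
        "queries": list(queries),
        "include_bounding_boxes": include_bounding_boxes,
    }


payload = build_extract_data_input(
    "invoice.pdf",
    "https://example.com/files/invoice.pdf",
    queries=["What is the total amount?", "Who is the invoice issued to?"],
)
print(json.dumps(payload, indent=2))
```

Passing a narrower tuple such as `features=("tables",)` turns the other three features off, which may reduce processing if you only need tables.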
Sample Output
{
"totalPages": 2,
"pages": [
{
"number": 1,
"lines": [
{
"confidence": 0.98,
"items": [
{
"confidence": 0.99,
"text": "INVOICE",
"bounding_box": {
"top": 50,
"left": 100,
"width": 200,
"height": 40
}
}
]
}
],
"layout": {
"title": {
"confidence": 0.99,
"text": "INVOICE",
"bounding_box": {
"top": 50,
"left": 100,
"width": 200,
"height": 40
}
},
"header": {
"confidence": 0.95,
"text": "ABC Company",
"bounding_box": {
"top": 10,
"left": 10,
"width": 150,
"height": 30
}
},
"footer": {
"confidence": 0.9,
"text": "Page 1 of 2",
"bounding_box": {
"top": 750,
"left": 450,
"width": 100,
"height": 20
}
},
"page_number": {
"confidence": 0.99,
"text": "1",
"bounding_box": {
"top": 750,
"left": 500,
"width": 20,
"height": 20
}
},
"section_headers": [
{
"confidence": 0.97,
"text": "Bill To:",
"bounding_box": {
"top": 150,
"left": 50,
"width": 100,
"height": 30
}
}
],
"lists": [
{
"confidence": 0.95,
"items": [
{
"confidence": 0.96,
"text": "Item 1",
"bounding_box": {
"top": 300,
"left": 50,
"width": 100,
"height": 20
}
},
{
"confidence": 0.97,
"text": "Item 2",
"bounding_box": {
"top": 330,
"left": 50,
"width": 100,
"height": 20
}
}
],
"bounding_box": {
"top": 300,
"left": 50,
"width": 100,
"height": 60
}
}
]
}
}
],
"form_items": [
{
"pageNumber": 1,
"confidence": 0.98,
"key": {
"confidence": 0.99,
"text": "Invoice Number:",
"bounding_box": {
"top": 100,
"left": 50,
"width": 150,
"height": 30
}
},
"value": {
"confidence": 0.99,
"text": "INV-001",
"bounding_box": {
"top": 100,
"left": 200,
"width": 100,
"height": 30
}
}
}
],
"tables": [
{
"pageNumber": 1,
"confidence": 0.97,
"header": [
{
"confidence": 0.98,
"text": "Description",
"rowIndex": 0,
"columnIndex": 0,
"rowSpan": 1,
"columnSpan": 1,
"bounding_box": {
"top": 400,
"left": 50,
"width": 200,
"height": 30
}
},
{
"confidence": 0.98,
"text": "Amount",
"rowIndex": 0,
"columnIndex": 1,
"rowSpan": 1,
"columnSpan": 1,
"bounding_box": {
"top": 400,
"left": 250,
"width": 100,
"height": 30
}
}
],
"rows": [
{
"cells": [
{
"confidence": 0.99,
"text": "Product A",
"rowIndex": 1,
"columnIndex": 0,
"rowSpan": 1,
"columnSpan": 1,
"bounding_box": {
"top": 430,
"left": 50,
"width": 200,
"height": 30
}
},
{
"confidence": 0.99,
"text": "$100.00",
"rowIndex": 1,
"columnIndex": 1,
"rowSpan": 1,
"columnSpan": 1,
"bounding_box": {
"top": 430,
"left": 250,
"width": 100,
"height": 30
}
}
]
}
],
"bounding_box": {
"top": 400,
"left": 50,
"width": 300,
"height": 60
}
}
],
"queries": [
{
"pageNumber": 1,
"query": "What is the total amount?",
"confidence": 0.95,
"text": "The total amount is $100.00",
"bounding_box": {
"Width": 0.3,
"Height": 0.05,
"Left": 0.6,
"Top": 0.8
}
},
{
"pageNumber": 1,
"query": "Who is the invoice issued to?",
"confidence": 0.93,
"text": "The invoice is issued to XYZ Corporation",
"bounding_box": {
"Width": 0.4,
"Height": 0.05,
"Left": 0.1,
"Top": 0.2
}
}
],
"signatures": [
{
"pageNumber": 2,
"confidence": 0.9,
"bounding_box": {
"top": 700,
"left": 400,
"width": 150,
"height": 50
}
}
]
}
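A response like this is usually consumed by pulling out a few flat structures: form key/value pairs, query answers, and table rows. The following is a minimal sketch that reads those fields exactly as they appear in the sample output; the function names are hypothetical, cells are matched to headers by `columnIndex` only (cells with `rowSpan` or `columnSpan` other than 1 are not handled), and `sample` is a trimmed copy of the output above.

```python
def form_fields(result, min_confidence=0.0):
    """Map form key text to value text, keeping pairs at or above min_confidence."""
    return {
        item["key"]["text"]: item["value"]["text"]
        for item in result.get("form_items", [])
        if item["confidence"] >= min_confidence
    }


def query_answers(result):
    """Map each query string to the answer text returned for it."""
    return {q["query"]: q["text"] for q in result.get("queries", [])}


def table_records(table):
    """Turn one table into a list of dicts keyed by header text (simple tables only)."""
    headers = {cell["columnIndex"]: cell["text"] for cell in table["header"]}
    return [
        {headers.get(cell["columnIndex"], cell["columnIndex"]): cell["text"] for cell in row["cells"]}
        for row in table["rows"]
    ]


def box(entry):
    """Return an entry's bounding box with lowercase keys.

    In the sample, query answers use "Top"/"Left"/"Width"/"Height" with values that
    look like page fractions, while other elements use lowercase keys with larger
    numbers. Only the key case is normalized here, not the units.
    """
    return {key.lower(): value for key, value in entry.get("bounding_box", {}).items()}


# Trimmed copy of the sample output above.
sample = {
    "form_items": [{"pageNumber": 1, "confidence": 0.98,
                    "key": {"confidence": 0.99, "text": "Invoice Number:"},
                    "value": {"confidence": 0.99, "text": "INV-001"}}],
    "tables": [{"pageNumber": 1, "confidence": 0.97,
                "header": [{"text": "Description", "columnIndex": 0},
                           {"text": "Amount", "columnIndex": 1}],
                "rows": [{"cells": [{"text": "Product A", "columnIndex": 0},
                                    {"text": "$100.00", "columnIndex": 1}]}]}],
    "queries": [{"pageNumber": 1, "query": "What is the total amount?", "confidence": 0.95,
                 "text": "The total amount is $100.00",
                 "bounding_box": {"Width": 0.3, "Height": 0.05, "Left": 0.6, "Top": 0.8}}],
}

print(form_fields(sample))
print(query_answers(sample))
print(table_records(sample["tables"][0]))
print(box(sample["queries"][0]))
```

Raising `min_confidence` (for example to `0.99`) drops the sample's `Invoice Number:` pair, whose overall confidence is `0.98`; pick a threshold that suits how much manual review you can afford.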
DDL operations
Extract HTML (DDL)
Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).
Extract text structure as HTML from a single image (JPG/PNG) or a multipage PDF/TIFF document.
Sample Input
{
"file": {
"name": "sample_document.pdf",
"url": "https://example.com/files/sample_document.pdf",
"mime_type": "application/pdf",
"expires": 1623456789
},
"skipElements": {
"layout_header": false,
"layout_footer": true,
"layout_page_number": true,
"layout_title": false,
"layout_section_header": false,
"layout_table": false,
"layout_figure": false
}
}
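The `skipElements` object lists seven layout element types, each with a boolean. A minimal sketch for building it from the set of elements you want left out, assuming that `true` means the element is omitted from the output (as the name suggests) and that the seven keys in the sample are the complete set; the helper name `build_skip_elements` is hypothetical. Because the Markdown operation takes an input of the same shape, the same helper would apply there.

```python
import json

# Keys taken from the sample input; the connector may accept others (assumption).
SKIPPABLE_ELEMENTS = (
    "layout_header", "layout_footer", "layout_page_number", "layout_title",
    "layout_section_header", "layout_table", "layout_figure",
)


def build_skip_elements(skip):
    """Hypothetical helper: return a skipElements object with True for each name in `skip`."""
    skip = set(skip)
    unknown = skip - set(SKIPPABLE_ELEMENTS)
    if unknown:
        # Catch typos early instead of silently sending an unrecognized key.
        raise ValueError(f"unknown skipElements keys: {sorted(unknown)}")
    return {name: name in skip for name in SKIPPABLE_ELEMENTS}


# Reproduces the skipElements object in the sample input (footer and page number skipped).
skip_elements = build_skip_elements({"layout_footer", "layout_page_number"})
print(json.dumps({"skipElements": skip_elements}, indent=2))
```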
Sample Output
{
"html": "<html><body><h1>Sample Document Title</h1><p>This is the first paragraph of the sample document.</p><h2>Section 1</h2><p>Here's some content for section 1.</p><table><tr><th>Column 1</th><th>Column 2</th></tr><tr><td>Data 1</td><td>Data 2</td></tr></table><h2>Section 2</h2><p>Here's some content for section 2.</p><img src='sample_image.jpg' alt='Sample image'><p>This is the last paragraph of the sample document.</p></body></html>"
}
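The `html` field is a single string, so downstream mapping usually starts by parsing it. As one illustration, this sketch uses Python's standard `html.parser` to list the document's headings in order, which gives a quick outline of the extracted structure. The class and function names are hypothetical, and `sample` is the sample output above.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (tag, text) for every <h1>..<h6> element, in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._current = None  # heading tag currently open, if any
        self._buffer = []     # text fragments seen inside that heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._current, self._buffer = tag, []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headings.append((tag, "".join(self._buffer).strip()))
            self._current = None


def outline(response):
    """Return the headings found in an Extract HTML response."""
    parser = HeadingCollector()
    parser.feed(response["html"])
    parser.close()
    return parser.headings


sample = {"html": "<html><body><h1>Sample Document Title</h1><p>This is the first paragraph of the sample document.</p><h2>Section 1</h2><p>Here's some content for section 1.</p><table><tr><th>Column 1</th><th>Column 2</th></tr><tr><td>Data 1</td><td>Data 2</td></tr></table><h2>Section 2</h2><p>Here's some content for section 2.</p><img src='sample_image.jpg' alt='Sample image'><p>This is the last paragraph of the sample document.</p></body></html>"}

for tag, text in outline(sample):
    print(tag, text)
```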
Extract markdown (DDL)
Note that DDL operations can only be called directly through the Connectors API, or from CustomJS in the Embedded solution editor (for example, for DDL-dependent data mapping).
Extract text structure as Markdown from a single image (JPG/PNG) or a multipage PDF/TIFF document.
Sample Input
{
"file": {
"name": "sample_document.pdf",
"url": "https://example.com/files/sample_document.pdf",
"mime_type": "application/pdf",
"expires": 1623456789
},
"skipElements": {
"layout_header": false,
"layout_footer": true,
"layout_page_number": true,
"layout_title": false,
"layout_section_header": false,
"layout_table": false,
"layout_figure": false
}
}
Sample Output
{
"markdown": "# Sample Document Title\n\n## Introduction\n\nThis is a sample document to demonstrate the extraction of text structure as Markdown.\n\n### Section 1\n\nHere's some content for the first section.\n\n- Bullet point 1\n- Bullet point 2\n- Bullet point 3\n\n### Section 2\n\nHere's a table:\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| Data 1 | Data 2 | Data 3 |\n| Data 4 | Data 5 | Data 6 |\n\n## Conclusion\n\nThis concludes our sample document."
}
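Because the `markdown` field uses ATX headings (`#`, `##`, `###`), one common next step is to split it into sections. This minimal sketch does that with a regular expression and returns `(level, title, body)` tuples; the function name is hypothetical, `sample` is the sample output above, and the sketch does not account for `#` lines inside fenced code blocks, which it would misread as headings.

```python
import re

# An ATX heading: 1 to 6 '#' characters, whitespace, then the title text.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_sections(markdown):
    """Split Markdown into (level, title, body) tuples at each ATX heading."""
    sections = []
    level, title, body = 0, "", []

    def flush():
        # Keep a section if it has a title or any non-blank body text.
        if title or any(line.strip() for line in body):
            sections.append((level, title, "\n".join(body).strip()))

    for line in markdown.split("\n"):
        match = HEADING.match(line)
        if match:
            flush()
            level, title, body = len(match.group(1)), match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return sections


sample = {"markdown": "# Sample Document Title\n\n## Introduction\n\nThis is a sample document to demonstrate the extraction of text structure as Markdown.\n\n### Section 1\n\nHere's some content for the first section.\n\n- Bullet point 1\n- Bullet point 2\n- Bullet point 3\n\n### Section 2\n\nHere's a table:\n\n| Column 1 | Column 2 | Column 3 |\n|----------|----------|----------|\n| Data 1 | Data 2 | Data 3 |\n| Data 4 | Data 5 | Data 6 |\n\n## Conclusion\n\nThis concludes our sample document."}

sections = split_sections(sample["markdown"])
for level, title, body in sections:
    print(level, title, len(body))
```

The section bodies keep their original Markdown, so lists and tables such as the one under "Section 2" can be passed on to a Markdown renderer unchanged.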