Tray IDP
Tray Intelligent Document Processing (IDP) extracts structured data from unstructured documents including PDFs, images (JPEG, PNG), and multi-page files (TIFF). IDP integrates extracted data directly into your business systems through Tray's 600+ connectors, reducing manual data entry and accelerating workflows such as invoice processing, contract analysis, claims management, and employee onboarding.
For detailed connector operations and API reference, see the Merlin IDP Connector documentation.
When to Use Tray IDP
Use Tray IDP to automate document-heavy processes where:
- Documents arrive in multiple formats - PDFs, images, scanned documents, multi-page files
- Data must flow into downstream systems - ERP, CRM, databases, storage systems
- Manual data entry creates bottlenecks - Time-consuming transcription and errors
- Document volumes justify automation - Processing dozens to thousands of documents per month
- Structured data extraction is needed - Invoices, forms, contracts, receipts
Common Use Cases
Finance & Accounting
- Invoice processing and AP automation - Extract vendor details, line items, totals, and payment terms from invoices for automated entry into accounting systems
- Expense report management - Parse receipts and expense forms to streamline reimbursement workflows
- Purchase order reconciliation - Match purchase orders with invoices and delivery documents
- Financial statement analysis - Extract key figures from statements and reports for consolidation
Legal & Compliance
- Contract data extraction and management - Pull key terms, dates, obligations, and parties from contracts
- Regulatory compliance documentation - Extract required data points from compliance filings and reports
- Legal discovery support - Process and categorize large volumes of documents for e-discovery
Human Resources
- Resume parsing and candidate screening - Extract education, experience, skills, and contact information from resumes
- Employee onboarding documentation - Process ID documents, tax forms, and employment agreements
- Benefits administration - Extract data from benefits enrollment forms and insurance documents
Healthcare
- Patient intake forms - Digitize patient information from registration and medical history forms
- Insurance claims processing - Extract diagnosis codes, treatment details, and billing information
- Medical records digitization - Convert paper records into structured, searchable data
Supply Chain & Logistics
- Bill of lading extraction - Pull shipment details, weights, and routing information
- Customs documentation - Extract required fields from customs forms and declarations
- Shipping manifests - Process cargo lists and tracking information
Insurance
- Policy application processing - Extract applicant information and coverage details
- Claims adjudication - Parse claims forms, supporting documentation, and medical records
- Underwriting document analysis - Extract risk factors and financial information from applications
How It Works
Tray IDP uses natural language queries to extract specific information from documents. The extraction process follows these steps:
- Document Input - Connect to document sources (email attachments, cloud storage, API uploads, webhooks)
- Define Queries - Specify what data to extract using plain English questions
- Process Documents - IDP analyzes document structure, layout, and content to locate requested information
- Integrate Data - Map extracted data to destination systems via any of Tray's 600+ connectors
Example: Invoice Processing
Here's how IDP processes an invoice and sends data to NetSuite:
Query examples for invoice extraction:
"What is the invoice number?"
"What is the vendor name?"
"What is the invoice date?"
"What is the invoice total?"
"What is the due date?"
"Extract all line items with description, quantity, unit price, and total"
Extracted data structure:
{
  "invoice_number": "INV-2024-001",
  "vendor_name": "Acme Supplies Inc.",
  "invoice_date": "2024-03-15",
  "invoice_total": "$4,850.00",
  "due_date": "2024-04-15",
  "line_items": [
    {
      "description": "Office Supplies - Premium Pack",
      "quantity": 10,
      "unit_price": "$250.00",
      "total": "$2,500.00"
    },
    {
      "description": "Printer Toner Cartridges",
      "quantity": 5,
      "unit_price": "$470.00",
      "total": "$2,350.00"
    }
  ]
}
This structured data then flows directly into NetSuite for bill creation, eliminating manual data entry.
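The amounts in the sample above are strings such as "$4,850.00", so a destination that expects numeric values usually needs a normalization step first. The following is a minimal Python sketch of that step, assuming the extraction result arrives as a dictionary shaped like the sample. `parse_amount` and `normalize_invoice` are hypothetical helper names, not part of Tray or the Merlin IDP connector.

```python
from datetime import date
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Convert a currency string such as "$4,850.00" to a Decimal."""
    return Decimal(text.replace("$", "").replace(",", "").strip())


def normalize_invoice(raw: dict) -> dict:
    """Return a copy of the extracted record with typed amounts and dates.

    Assumes the field names used in the sample output above.
    """
    return {
        "invoice_number": raw["invoice_number"],
        "vendor_name": raw["vendor_name"],
        "invoice_date": date.fromisoformat(raw["invoice_date"]),
        "due_date": date.fromisoformat(raw["due_date"]),
        "invoice_total": parse_amount(raw["invoice_total"]),
        "line_items": [
            {
                "description": item["description"],
                "quantity": int(item["quantity"]),
                "unit_price": parse_amount(item["unit_price"]),
                "total": parse_amount(item["total"]),
            }
            for item in raw["line_items"]
        ],
    }
```

Using `Decimal` rather than `float` keeps currency arithmetic exact, which matters when totals are later compared against line items.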
Supported Formats and Capabilities
File Format Support
- PDF - Single or multi-page documents (up to 20 pages)
- JPEG - High-resolution images (minimum 300 DPI recommended)
- PNG - Images with transparency support
- TIFF - Multi-page scanned documents
Input Methods
- File URL - Process documents hosted on web servers or cloud storage (required)
- Email attachments - Extract attachment URLs from incoming emails
- Cloud storage integrations - Connect to Google Drive, Dropbox, OneDrive, or S3 to access document URLs
- API integrations - Receive document URLs from any connected system
Document Types Supported
- Invoices and bills
- Purchase orders
- Receipts and expense reports
- Contracts and agreements
- Forms and applications
- Tax documents
- Medical records and claims
- Shipping documents
- Identity documents
- Resumes and CVs
Key Capabilities
- Natural language queries - Extract data using plain English questions without complex templates
- Multi-field extraction - Extract multiple data points in a single operation
- Table and line item extraction - Parse structured tables with multiple rows
- Context awareness - Understands document layout, headers, footers, and structure
- Multi-page processing - Handle complex documents up to 20 pages
- 600+ integrations - Direct connection to business systems without custom development
Getting Started
Prerequisites
Before using Tray IDP, ensure you have:
- Active Tray.ai account with IDP access enabled (contact your Customer Success representative if not enabled)
- Document source configured (email inbox, cloud storage connection, API endpoint)
- Destination system connection set up (optional but recommended for automation)
Quick Start Guide
- Add the Merlin IDP connector to your Tray workflow from the connector library
- Configure document input with the following parameters:
  - File name - Name of the document for tracking and logging
  - File URL - Web address where the document is accessible (required)
  - MIME type - Specify the file format:
    - application/pdf for PDF files
    - image/jpeg for JPEG images
    - image/png for PNG images
    - image/tiff for TIFF files
  - File expire - URL expiration timestamp for hosted files
- Define extraction queries as a list of natural language questions
  - Be specific about what information you need
  - Use terminology that appears in the documents
  - For tables, specify all columns you want to extract
- Map extracted data to your destination system using Tray's data mapping tools
- Add validation and error handling to ensure data quality and handle exceptions
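The document input parameters from the configuration step can be assembled programmatically before they reach the connector. The sketch below is one way to do that in Python, deriving the file name and MIME type from the document URL. The dictionary keys (`file_name`, `file_url`, `mime_type`, `queries`) and the function name are assumptions for illustration; check the Merlin IDP Connector documentation for the actual input schema.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# File extensions mapped to the MIME types for the supported formats.
MIME_TYPES = {
    ".pdf": "application/pdf",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".png": "image/png",
    ".tif": "image/tiff",
    ".tiff": "image/tiff",
}


def build_document_input(file_url: str, queries: list[str]) -> dict:
    """Build a hypothetical input payload from a document URL and queries."""
    # Use only the URL path so query strings (signed URLs) are ignored.
    path = PurePosixPath(urlparse(file_url).path)
    suffix = path.suffix.lower()
    if suffix not in MIME_TYPES:
        raise ValueError(f"Unsupported file type: {suffix or 'none'}")
    return {
        "file_name": path.name,
        "file_url": file_url,
        "mime_type": MIME_TYPES[suffix],
        "queries": list(queries),
    }
```

Rejecting unsupported extensions before the connector runs avoids spending page quota on files that cannot be processed.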
Sample Workflow Templates
Ready-to-use templates for common IDP scenarios:
- Invoice to NetSuite - Automatically process invoices and create bills in NetSuite
- Contract analysis to Salesforce - Extract contract terms and update Salesforce opportunities
- Resume parsing to ATS - Parse resumes and create candidate records in applicant tracking systems
- Expense report to QuickBooks - Process employee receipts and create expense entries
For step-by-step connector configuration, see the Merlin IDP Connector operations guide.
Best Practices
Document Quality Recommendations
For optimal extraction accuracy:
- Use high-resolution images (minimum 300 DPI, 600 DPI recommended for scanned documents)
- Ensure text is legible and not obscured by stamps, signatures, or annotations
- Avoid skewed or rotated pages - straighten scanned documents before processing
- Remove unnecessary pages - blank pages or cover sheets that don't contain data
- Use color or grayscale rather than black and white for better text recognition
Query Design Tips
Writing effective extraction queries:
- Be specific in your questions
  - Good: "What is the total amount due?"
  - Avoid: "What is the amount?"
- Use terminology that matches the document
  - If the document says "Invoice Number", ask "What is the invoice number?"
  - Adapt queries to different document formats or regional variations
- For tables, specify all required columns
  - Example: "Extract all line items with item description, quantity, unit price, and line total"
- Request dates in a consistent format
  - Example: "What is the invoice date in YYYY-MM-DD format?"
- Ask for full contact information
  - Example: "What is the vendor's complete name, address, and tax ID?"
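When the same query patterns are reused across many document types, generating the query strings from field lists keeps the wording consistent. The Python sketch below builds the table and date queries shown in the tips above. The helper names are hypothetical, and the queries remain plain strings that are passed to the connector unchanged.

```python
def join_columns(columns: list[str]) -> str:
    """Join column names into an English list: "a, b, and c"."""
    if not columns:
        raise ValueError("At least one column is required")
    if len(columns) == 1:
        return columns[0]
    if len(columns) == 2:
        return f"{columns[0]} and {columns[1]}"
    return ", ".join(columns[:-1]) + f", and {columns[-1]}"


def table_query(subject: str, columns: list[str]) -> str:
    """Build a table query that names every required column."""
    return f"Extract all {subject} with {join_columns(columns)}"


def dated_query(field: str) -> str:
    """Build a date query that requests a consistent output format."""
    return f"What is the {field} in YYYY-MM-DD format?"
```

For example, `table_query("line items", ["item description", "quantity", "unit price", "line total"])` reproduces the table query from the tips.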
Workflow Design Patterns
Building robust IDP workflows:
Error Handling
- Add conditional logic to check for missing or unclear data
- Set up fallback paths for documents that fail processing
- Implement retry logic with exponential backoff for transient failures
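Retry logic with exponential backoff can be sketched as follows. This is a minimal, generic Python illustration rather than Tray's built-in error handling: `TransientError` is a hypothetical exception standing in for whatever timeouts or temporary failures your integration surfaces, and the delay doubles after each failed attempt.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example, timeouts)."""


def with_retries(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `call()` until it succeeds or `max_attempts` is reached.

    Waits base_delay, then 2x, then 4x, ... between attempts.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the fallback path handle it.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the behavior can be tested without real waiting. Only transient errors are retried; any other exception propagates immediately so that permanently bad documents go straight to the fallback path.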
Validation Checks
- Cross-check extracted values against expected formats (dates, amounts, IDs)
- Validate totals by summing line items and comparing to extracted total
- Flag mismatches for human review
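The checks above can be expressed as a small validation function. The following Python sketch assumes a record shaped like the invoice sample earlier on this page. The invoice number pattern is an assumption inferred from the sample value "INV-2024-001", and `validate_invoice` is a hypothetical helper, not a Tray feature.

```python
import re
from datetime import date
from decimal import Decimal

# Assumed ID pattern, inferred from the sample value "INV-2024-001".
INVOICE_NUMBER = re.compile(r"^INV-\d{4}-\d{3}$")


def to_decimal(text: str) -> Decimal:
    """Parse a currency string such as "$2,350.00"."""
    return Decimal(text.replace("$", "").replace(",", ""))


def validate_invoice(record: dict) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if not INVOICE_NUMBER.match(record.get("invoice_number", "")):
        problems.append("invoice_number has an unexpected format")
    try:
        date.fromisoformat(record.get("invoice_date", ""))
    except ValueError:
        problems.append("invoice_date is not a valid ISO 8601 date")
    # Compare the sum of line totals with the extracted invoice total.
    line_sum = sum(
        (to_decimal(item["total"]) for item in record.get("line_items", [])),
        Decimal("0"),
    )
    if line_sum != to_decimal(record.get("invoice_total", "0")):
        problems.append("line item totals do not add up to invoice_total")
    return problems
```

A non-empty result is the signal to flag the document for human review instead of writing it to the destination system.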
Human Review Workflows
- Route documents to review queues when:
  - Required fields are missing
  - Extracted values seem unreasonable (negative amounts, future dates)
  - Document quality is poor
- Use approval connectors (Slack, Microsoft Teams, email) for review requests
Batch Processing
- Process multiple documents in loops for efficiency
- Add delays between documents to respect rate limits
- Group similar documents together for consistent processing
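A batch loop with a delay between documents might look like the sketch below. It is a generic Python illustration: `extract` is a hypothetical callable standing in for whatever performs the extraction, and the one-second default delay is an arbitrary placeholder, not a documented rate limit.

```python
import time


def process_batch(documents, extract, delay_seconds=1.0, sleep=time.sleep):
    """Process documents one by one, pausing between calls.

    Returns (results, failures) so one bad document does not stop the batch.
    """
    results, failures = [], []
    for index, doc in enumerate(documents):
        if index > 0:
            sleep(delay_seconds)  # Pause before every call except the first.
        try:
            results.append((doc, extract(doc)))
        except Exception as error:
            failures.append((doc, str(error)))
    return results, failures
```

Collecting failures separately lets the workflow send them down a fallback or review path after the batch completes.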
Logging and Monitoring
- Log all extraction attempts with document IDs
- Track processing times and success rates
- Monitor for patterns in failures to improve queries
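The logging and monitoring points above can be combined in a small tracker. This Python sketch uses the standard `logging` module and in-memory counters. The class name and the log line format are assumptions for illustration, and a production workflow would likely persist these metrics somewhere durable.

```python
import logging
from collections import Counter

logger = logging.getLogger("idp")


class ExtractionLog:
    """Track outcomes, durations, and failure reasons per document."""

    def __init__(self):
        self.outcomes = Counter()
        self.failure_reasons = Counter()
        self.durations = []

    def record(self, document_id, seconds, error=None):
        """Log one extraction attempt; `error` is a short reason string."""
        status = "failed" if error else "succeeded"
        self.outcomes[status] += 1
        self.durations.append(seconds)
        if error:
            self.failure_reasons[error] += 1
        logger.info("document=%s status=%s seconds=%.1f", document_id, status, seconds)

    def success_rate(self):
        total = sum(self.outcomes.values())
        return self.outcomes["succeeded"] / total if total else 0.0

    def average_seconds(self):
        return sum(self.durations) / len(self.durations) if self.durations else 0.0
```

Calling `failure_reasons.most_common()` surfaces recurring failure patterns, which is a reasonable starting point for deciding which queries to refine.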
Integration with Other AI Tools
Enhance IDP workflows by combining with other Tray AI capabilities:
Merlin Guardian - Data masking and PII protection
- Mask sensitive information (SSN, credit card numbers, personal data) before storing or sharing
- Ensure compliance with GDPR, HIPAA, PCI-DSS regulations
- Example: Extract patient data from medical forms, then mask PII before sending to analytics systems
Merlin Text Analysis - Sentiment and classification
- Analyze sentiment of contract clauses or customer feedback in documents
- Classify documents by type, urgency, or department
- Example: Extract text from support tickets, analyze sentiment, and route to appropriate teams
Merlin Generate Text - Summarization and content generation
- Generate executive summaries of extracted contract terms
- Create email notifications with document highlights
- Example: Extract invoice details, then generate approval request email with key information
Integration Pattern Example:
Combine multiple AI connectors in sequence for sophisticated document workflows: Document → Merlin IDP (extract) → Merlin Guardian (mask PII) → Merlin Text Analysis (classify) → Route to appropriate system based on classification.
Limitations and Considerations
Processing Limits
Document Size Constraints
- Pages per execution: 20 pages maximum per document
- Monthly quota: 1,000 pages per month (default allocation)
- File size: Recommended maximum 10MB per file (varies by format)
- Documents per workflow: No hard limit, but consider rate limits
For higher volumes: Contact your Customer Success representative to discuss increasing limits based on your needs.
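To see how these limits interact, the following Python sketch checks a planned month of documents against the 20-page and 1,000-page figures listed above. The function name and return shape are hypothetical; the limits are passed as parameters because your allocation may differ from the defaults.

```python
MAX_PAGES_PER_DOCUMENT = 20   # Per-execution page limit listed above.
MONTHLY_PAGE_QUOTA = 1000     # Default monthly allocation listed above.


def plan_month(page_counts, quota=MONTHLY_PAGE_QUOTA, max_pages=MAX_PAGES_PER_DOCUMENT):
    """Summarize quota use for a list of per-document page counts."""
    oversized = [n for n in page_counts if n > max_pages]
    pages = sum(n for n in page_counts if n <= max_pages)
    return {
        "oversized_documents": len(oversized),  # Exceed the per-document limit.
        "pages_to_process": pages,
        "quota_remaining": quota - pages,       # Negative means over quota.
    }
```

For example, 60 documents of 18 pages each would need 1,080 pages, which exceeds the default allocation by 80 pages.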
Document Constraints
Factors That May Reduce Accuracy:
- Small text or low resolution - Text smaller than 8pt or images below 200 DPI
- Poor scan quality - Faded, blurry, or low-contrast documents
- Handwritten text - Limited support for handwriting recognition
- Complex multi-column layouts - Newspapers or academic papers with intricate formatting
- Heavy redactions or annotations - Stamps, signatures, or highlights that obscure text
- Non-standard fonts - Decorative or stylized fonts may reduce accuracy
- Mixed languages - Documents with multiple languages on a single page
Processing Time
Typical extraction times by document complexity:
- Simple documents (1-2 pages, straightforward layout): 10-20 seconds
- Standard documents (3-5 pages, tables and forms): 20-40 seconds
- Complex documents (10-20 pages, multiple tables): 30-60 seconds
Processing time varies based on:
- Number of pages
- Document complexity
- Number of queries
- System load
For time-sensitive workflows, account for processing time in your automation design and consider parallel processing for batch operations.
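Parallel processing of a batch can be sketched in Python with a thread pool, as below. `extract` is a hypothetical callable that performs one extraction, and the worker count of 4 is an arbitrary example. The estimate function is a rough upper-bound calculation that assumes every document takes the same time; it is not a measured figure.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def extract_in_parallel(documents, extract, max_workers=4):
    """Run `extract` over documents concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(extract, documents))


def estimated_batch_seconds(doc_count, seconds_per_doc, max_workers):
    """Rough wall-clock estimate: documents run in waves of `max_workers`."""
    return math.ceil(doc_count / max_workers) * seconds_per_doc
```

With ten documents at 40 seconds each, four workers give an estimate of 120 seconds compared with 400 seconds sequentially. Parallelism trades against rate limits, so keep the worker count modest.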
Considerations
Extraction Accuracy
Factors Affecting Accuracy:
- Document quality - Higher quality source documents yield better results
- Query specificity - Well-defined questions produce more accurate extractions
- Document standardization - Consistent formats from the same source improve reliability
- Field complexity - Simple fields (dates, numbers) extract more reliably than complex free-text
Improving Accuracy:
- Test with representative sample documents before production deployment
- Refine queries based on test results
- Standardize document sources when possible
- Implement validation checks in workflows
Review and Validation Workflows
For business-critical data, implement human review:
- Extract data with IDP
- Apply validation rules (format checks, range validation, required fields)
- Flag for review when:
  - Required fields are missing
  - Values fall outside expected ranges
  - Document quality is questionable
- Route to approval using Slack, Teams, or email connectors
- Collect corrections and update destination systems
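The flagging step in this workflow can be sketched as a routing decision. The Python below assumes an invoice record with the field names from the earlier sample. The required field list, the two range rules, and the "review" and "auto" labels are illustrative assumptions rather than connector behavior.

```python
from datetime import date
from decimal import Decimal

# Hypothetical required fields for an invoice workflow.
REQUIRED_FIELDS = ("invoice_number", "vendor_name", "invoice_date", "invoice_total")


def review_reasons(record: dict, today: date) -> list[str]:
    """Collect the reasons a record should go to human review."""
    reasons = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    total = record.get("invoice_total")
    if total and Decimal(total.replace("$", "").replace(",", "")) < 0:
        reasons.append("negative invoice_total")
    invoice_date = record.get("invoice_date")
    if invoice_date and date.fromisoformat(invoice_date) > today:
        reasons.append("invoice_date is in the future")
    return reasons


def route(record: dict, today: date) -> str:
    """Return "review" when any rule fires, otherwise "auto"."""
    return "review" if review_reasons(record, today) else "auto"
```

Passing `today` explicitly keeps the rule deterministic and easy to test. The list of reasons can be included in the Slack, Teams, or email review request so the reviewer knows what to check.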
Data Security and Compliance
Security Measures:
- All document processing occurs in SOC 2 Type II compliant environments
- Data is encrypted in transit and at rest using industry-standard protocols
- Documents are not retained after processing completes
- Processing occurs in regional data centers based on your Tray instance location
Compliance Considerations:
- GDPR: IDP processes documents within your specified region and does not store personal data
- HIPAA: Suitable for processing healthcare documents when combined with proper workflow design and BAA
- PCI-DSS: Can process payment-related documents; combine with Merlin Guardian for card number masking
For specific compliance requirements, consult with your Tray Customer Success team.
Testing Best Practices
Before Production Deployment:
- Collect test documents - Gather 20-30 representative samples covering variations you expect
- Define success criteria - Set accuracy targets (e.g., 95% field accuracy)
- Test extraction queries - Iterate on query wording based on results
- Validate with real data - Process actual documents in a test environment
- Measure performance - Track processing time and success rates
- Document edge cases - Identify document types or conditions that require special handling
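Field accuracy against a target such as 95% can be measured by comparing extracted values with hand-labeled expected values for the test set. The Python sketch below shows one simple way to do that, assuming exact string matches count as correct. The function name and data shape are hypothetical, and a real evaluation may need looser matching for dates or amounts.

```python
def field_accuracy(expected_docs: list[dict], extracted_docs: list[dict]) -> float:
    """Fraction of expected fields whose extracted value matches exactly.

    Both lists must be in the same document order.
    """
    correct = total = 0
    for expected, extracted in zip(expected_docs, extracted_docs):
        for name, value in expected.items():
            total += 1
            if extracted.get(name) == value:
                correct += 1
    return correct / total if total else 0.0
```

Comparing the result with the success criterion, for example `field_accuracy(expected, extracted) >= 0.95`, gives a pass or fail signal for each iteration on query wording.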
Pricing and Access
Tray IDP is available on specific pricing plans. Default allocation is 1,000 pages per month with options to increase based on your processing needs.
To get started with IDP:
- Contact your Customer Success representative to enable IDP in your workspace
- Request higher processing limits for increased document volumes
- Discuss enterprise requirements for custom implementations and dedicated support
Contact Options:
- Reach out to your assigned CSM
- Email sales@tray.io for new inquiries
- Contact support@tray.io for technical assistance