Docling
IBM's document conversion tool for AI pipelines
⭐15,000
Data & ETL
Free (open-source)
About Docling
Docling, from IBM Research, converts documents (PDF, DOCX, PPTX, and images) into clean, structured output such as Markdown for use in AI applications. It handles tables, OCR, and complex layouts, and is designed for RAG pipelines.
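A minimal sketch of a single-file conversion, assuming the `docling` package is installed and that its `DocumentConverter` class and `export_to_markdown()` method work as in the project's quick-start. The helper names `convert_to_markdown` and `output_path_for` are hypothetical and exist only for illustration.

```python
from pathlib import Path


def output_path_for(source: str) -> Path:
    """Return the Markdown path that sits next to the source file."""
    return Path(source).with_suffix(".md")


def convert_to_markdown(source: str) -> str:
    """Convert a local file path or URL to Markdown with Docling."""
    # Imported lazily so this module still loads where docling
    # (and its heavy model dependencies) is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def convert_file(source: str) -> Path:
    """Convert one document and write the Markdown beside it."""
    target = output_path_for(source)
    target.write_text(convert_to_markdown(source), encoding="utf-8")
    return target
```

Calling `convert_file("report.pdf")` (a placeholder file name) would write `report.md` in the same directory. The first run is likely to be slower because Docling needs its layout and OCR models available.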
Features
- PDF conversion
- Table extraction
- OCR
- Markdown output
- LlamaIndex integration
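Markdown output is convenient for RAG because headings give natural chunk boundaries. The following is a generic, standard-library sketch of heading-based chunking. It is not Docling's own chunking API, and the function name `split_markdown_by_heading` is an assumption made for this example.

```python
import re

# Matches ATX headings such as "# Title" or "### Subsection".
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_markdown_by_heading(markdown: str) -> list[dict]:
    """Split Markdown text into chunks, one per heading section."""
    chunks = []
    title, lines = "", []

    def flush():
        # Emit the current section only if it contains some text.
        if any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks
```

Each chunk keeps its heading as `title`, so a retriever can show where a passage came from. This simple version does not treat fenced code blocks specially, so a `#` line inside a code fence would be split as if it were a heading.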
Pros & Cons
Pros
- Excellent PDF parsing
- Table extraction
- OCR capability
- IBM Research quality
- LlamaIndex integration
Cons
- Heavy dependencies
- Can be slow on large documents
- Python only
- Complex output format
Platforms
Linux, macOS, Windows
Similar Tools
Unstructured
ETL for unstructured data: PDFs, images, HTML to LLM-ready
Free (open-source) + API
LlamaIndex
Data framework for connecting LLMs to external data
Free (open-source) + Cloud
Firecrawl
Turn websites into LLM-ready markdown or structured data
Free (open-source) + Cloud
Haystack
Open-source LLM framework for building NLP pipelines
Free (open-source)