Docling

IBM's document conversion tool for AI pipelines

15,000
Data & ETL
Free (open-source)

About Docling

Docling, from IBM Research, converts documents (PDF, DOCX, PPTX, images) into clean, structured output such as Markdown for use in AI applications. It handles tables, OCR, and complex layouts, and is designed for RAG pipelines.

Features

PDF conversion
Table extraction
OCR
Markdown output
LlamaIndex integration
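
As a rough illustration of the PDF conversion and Markdown output features, here is a minimal sketch using Docling's Python API (`DocumentConverter`, `convert`, `export_to_markdown`). The helper name `convert_to_markdown` and its optional `converter` parameter are assumptions added for this example, not part of Docling. The import is deferred so that the heavy dependencies load only when a real conversion runs.

```python
def convert_to_markdown(source, converter=None):
    """Convert a document (local path or URL) to a Markdown string.

    `converter` is a hypothetical injection point: any object with a
    Docling-style `convert(source)` method. When omitted, a default
    Docling DocumentConverter is created.
    """
    if converter is None:
        # Lazy import: Docling pulls in large dependencies, so load it
        # only when a real conversion is requested.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # convert() returns a result whose `.document` holds the parsed content.
    result = converter.convert(str(source))
    return result.document.export_to_markdown()
```

With Docling installed (`pip install docling`), calling `convert_to_markdown("report.pdf")` should return the document as Markdown text; `report.pdf` is a placeholder path. Passing your own `converter` lets you reuse one configured instance across many files, which may help with the slowness noted under Cons.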

Pros & Cons

Pros

  • Excellent PDF parsing
  • Table extraction
  • OCR capability
  • IBM Research quality
  • LlamaIndex integration

Cons

  • Heavy dependencies
  • Can be slow on large docs
  • Python only
  • Complex output format

Platforms

Linux, macOS, Windows
