Docling

IBM's document conversion tool for AI pipelines
Data & ETL | Free (open-source) | 15,000

About Docling

Docling, from IBM Research, converts documents (PDF, DOCX, PPTX, images) into clean, structured formats suited to AI use. It handles tables, OCR, and complex layouts, and is designed for retrieval-augmented generation (RAG) pipelines.

Features

PDF conversion
Table extraction
OCR
Markdown output
LlamaIndex integration
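
As a rough illustration of the PDF-to-Markdown path in the feature list, here is a minimal sketch. It assumes the `docling` package is installed and uses its `DocumentConverter` class; `convert_to_markdown` is a hypothetical helper name chosen for this example, not part of Docling. The import is deferred so the heavy dependencies load only when a real conversion is requested.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document (local path or URL) to a Markdown string.

    `converter` may be any object exposing a Docling-style `convert(source)`
    method. When omitted, a Docling DocumentConverter is created on demand.
    """
    if converter is None:
        # Imported lazily: Docling pulls in heavy dependencies at import time.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # The conversion result carries the parsed document, which can be
    # exported to Markdown.
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `convert_to_markdown("report.pdf")` (a placeholder path) returns Markdown text that can be chunked and indexed in a RAG pipeline. Reusing one converter across calls, by passing it in, may help when processing many files.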

The tally

FOR
  • Excellent PDF parsing
  • Table extraction
  • OCR capability
  • IBM Research quality
  • LlamaIndex integration
AGAINST
  • Heavy dependencies
  • Can be slow on large docs
  • Python only
  • Complex output format
