Docling
IBM's document conversion tool for AI pipelines
Data & ETL · Free (open-source) · ★ 15,000
About Docling
Docling, from IBM Research, converts documents (PDF, DOCX, PPTX, and images) into clean, structured formats for AI workloads. It handles tables, OCR, and complex page layouts, and is designed to feed retrieval-augmented generation (RAG) pipelines.
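A minimal sketch of a RAG-oriented flow, assuming Docling is installed (`pip install docling`): `convert_to_markdown` uses Docling's documented quickstart calls (`DocumentConverter().convert(...)`, then `document.export_to_markdown()`). `split_by_heading` is a hypothetical helper, not part of Docling, that cuts the Markdown into per-section chunks for indexing. The Docling import is deferred so the chunking part runs on its own.

```python
import re
from typing import List, Tuple

HEADING = re.compile(r"^#{1,6}\s+(.*)")  # ATX headings: "# Title" ... "###### Title"


def convert_to_markdown(source: str) -> str:
    """Convert a local path or URL to Markdown with Docling (needs `docling`)."""
    from docling.document_converter import DocumentConverter  # deferred import

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()


def _has_text(lines: List[str]) -> bool:
    return any(line.strip() for line in lines)


def split_by_heading(markdown: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: split Markdown into (heading, body) chunks.

    Text before the first heading gets an empty heading. Limitation of this
    sketch: '#' lines inside fenced code blocks are also treated as headings.
    """
    chunks: List[Tuple[str, str]] = []
    heading, body = "", []
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            # Close the previous section before starting a new one.
            if heading or _has_text(body):
                chunks.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if heading or _has_text(body):
        chunks.append((heading, "\n".join(body).strip()))
    return chunks


if __name__ == "__main__":
    sample = "# Intro\nDocling output.\n\n## Results\n| a | b |\n|---|---|\n| 1 | 2 |\n"
    for title, text in split_by_heading(sample):
        print(title, "->", text.splitlines()[0])
```

Splitting on headings keeps each table with the section that introduces it, which usually retrieves better than fixed-size character windows; production pipelines often also cap chunk length.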
Features
PDF conversion
Table extraction
OCR
Markdown output
LlamaIndex integration
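Table extraction and Markdown output meet when a recovered table is serialized as a pipe table. The sketch assumes the header and rows have already been extracted and uses a hypothetical `rows_to_markdown` helper to show the target format; it is not Docling's own exporter, which builds tables from its internal document model.

```python
from typing import List, Sequence


def rows_to_markdown(header: Sequence[str], rows: List[Sequence[object]]) -> str:
    """Hypothetical helper: render header + rows as a GitHub-flavored Markdown table."""

    def cell(value: object) -> str:
        # Escape pipes and flatten newlines so cell text cannot break the layout.
        return str(value).replace("|", "\\|").replace("\n", " ")

    lines = [
        "| " + " | ".join(cell(h) for h in header) + " |",
        "|" + "|".join("---" for _ in header) + "|",  # delimiter row
    ]
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"row has {len(row)} cells, expected {len(header)}")
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(rows_to_markdown(["Quarter", "Revenue"], [["Q1", 10], ["Q2", 12]]))
```

Rejecting ragged rows, rather than padding them, surfaces extraction errors such as merged or split cells instead of silently shifting columns.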
The tally
FOR
- +Excellent PDF parsing
- +Table extraction
- +OCR capability
- +IBM Research quality
- +LlamaIndex integration
AGAINST
- −Heavy dependencies
- −Can be slow on large docs
- −Python only
- −Complex output format
Kept nearby
Unstructured
ETL for unstructured data: turns PDFs, images, and HTML into LLM-ready data
Free (open-source) + API · ★ 9,000
LlamaIndex
Data framework for connecting LLMs to external data
Free (open-source) + Cloud · ★ 38,000
Firecrawl
Turn websites into LLM-ready markdown or structured data
Free (open-source) + Cloud · ★ 20,000
Crawl4AI
Open-source LLM-friendly web crawler and scraper
Free (open-source) · ★ 50,000