Table Question Answering (TQA) requires identifying the right table from a massive corpus before reasoning over its structure to derive answers. Existing methods like DTR and ColBERT are computationally expensive and need dataset-specific fine-tuning — limiting adaptability to new domains.
We propose CRAFT, a cascaded retrieval framework that chains off-the-shelf models in three progressive stages: sparse lexical retrieval (SPLADE) → dense semantic reranking (Sentence Transformer) → neural reranking (text-embedding-3). Table representations are enriched with LLM-generated titles and descriptions via Gemini 1.5 Flash.
CRAFT matches or outperforms SOTA fine-tuned retrievers on the NQ-Tables benchmark — without training a single parameter on the target dataset. End-to-end QA results with Mistral, LLaMA3, and Qwen demonstrate its effectiveness across diverse LLM backends.
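The enriched table representation described above can be sketched as follows. This is a minimal illustration, not the paper's exact format: the field names and flattening scheme are assumptions, and the `title`/`description` strings stand in for the metadata that Gemini 1.5 Flash would generate.

```python
def enrich_table(table, title, description):
    """Build one text document per table for retrieval indexing.

    `title` and `description` stand in for LLM-generated metadata;
    the table itself is flattened into pipe-separated rows.
    """
    header = " | ".join(table["columns"])
    rows = "\n".join(" | ".join(str(c) for c in row) for row in table["rows"])
    return f"{title}\n{description}\n{header}\n{rows}"

doc = enrich_table(
    {"columns": ["Country", "Capital"], "rows": [["France", "Paris"]]},
    title="World capitals",
    description="Countries and their capital cities.",
)
```

The resulting document is what a sparse retriever such as SPLADE would index in place of the raw table.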
A concise snapshot of the paper's main takeaways, performance gains, and practical advantages.
Sparse filtering, dense semantic reranking, and neural reranking work together to progressively narrow candidates without dataset-specific training.
The paper reports 41.13 Recall@1, 87.16 Recall@10, and 96.84 Recall@50 on NQ-Tables, exceeding fine-tuned baselines at deeper cutoffs.
CRAFT remains stable under paraphrased questions, with an average recall change of only −0.04, compared with much larger drops for fine-tuned DTR models.
Using compact sub-table context reduces token usage by more than 70% while still improving answer generation with pretrained LLMs such as Mistral, LLaMA3, and Qwen.
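The sub-table idea in the last point can be sketched with a toy row filter. Real systems would score rows with a retriever; the lexical-overlap rule below is an illustrative stand-in, not CRAFT's actual selection method.

```python
import re

def words(text):
    return set(re.findall(r"\w+", text.lower()))

def mini_table(table, question, max_rows=3):
    """Keep only rows sharing a token with the question (toy heuristic)."""
    q = words(question)
    keep = [r for r in table["rows"] if q & words(" ".join(map(str, r)))]
    return {"columns": table["columns"], "rows": keep[:max_rows]}

def token_count(table):
    cells = list(table["columns"]) + [str(c) for r in table["rows"] for c in r]
    return len(" ".join(cells).split())

full = {"columns": ["Country", "Capital"],
        "rows": [["France", "Paris"], ["Spain", "Madrid"],
                 ["Italy", "Rome"], ["Japan", "Tokyo"]]}
small = mini_table(full, "What is the capital of France?")
```

Passing `small` instead of `full` to the LLM is what drives the token savings, since irrelevant rows never reach the prompt.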
Five distinct advances that make CRAFT practical and competitive.
No fine-tuning on NQ-Tables or any target dataset. Purely off-the-shelf pretrained models chained together.
Three progressive retrieval stages — sparse → dense → neural — each refining candidates from the previous stage.
Outperforms THYME, DTR, BIBERT+SPLADE, and all other baselines at R@10 and R@50 without any training.
Under query perturbation, CRAFT loses only ~0.04 avg. recall points, while fine-tuned DTR drops 8–12 points.
Mini-table context reduces token count by 70%+, enabling cost-effective inference at scale across large corpora.
Progressive filtering from 169k tables down to a precise top-k for answer generation — each stage more expressive than the last.
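The progressive narrowing above can be sketched as a generic cascade. The three scorers here are toy stand-ins for SPLADE, a Sentence Transformer, and text-embedding-3; only the control flow (each stage rescoring the survivors of the previous one with a smaller `k`) reflects the actual design.

```python
import re

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def cascade(query, corpus, k1=4, k2=2, k3=1):
    """Three-stage narrowing: each stage applies a (progressively
    costlier) scorer to the previous stage's survivors."""
    def sparse(doc):   # lexical overlap, a SPLADE stand-in
        return len(tokens(query) & tokens(doc))
    def dense(doc):    # overlap normalized by length, a dense stand-in
        return sparse(doc) / (len(tokens(doc)) + 1)
    def neural(doc):   # reranker stand-in with an exact-phrase bonus
        return dense(doc) + (1.0 if query.lower() in doc.lower() else 0.0)
    pool = corpus
    for scorer, k in [(sparse, k1), (dense, k2), (neural, k3)]:
        pool = sorted(pool, key=scorer, reverse=True)[:k]
    return pool

corpus = [
    "capital of france paris",
    "gdp of france by year",
    "capital cities of europe",
    "population of spain",
    "olympic medal table 2024",
]
top = cascade("capital of france", corpus)
```

In the real pipeline `k1` would be in the hundreds against a 169k-table corpus, so the expensive final stage only ever sees a small candidate pool.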
CRAFT outperforms all fine-tuned retrievers at R@10 and R@50 on the NQ-Tables benchmark.
| Model | R@1 | R@10 | R@50 |
|---|---|---|---|
| **SPARSE** | | | |
| BM25 | 18.49 | 36.94 | 52.61 |
| SPLADE | 39.84 | 83.33 | 94.65 |
| **DENSE** | | | |
| DPR | 45.32 | 85.84 | 95.44 |
| TAPAS | 43.79 | 83.49 | 95.10 |
| DTR | 32.62 | 75.86 | 89.77 |
| T-RAG* | 46.07 | 85.40 | 95.03 |
| **HYBRID** | | | |
| DHR | 43.67 | 84.65 | 95.62 |
| BIBERT+SPLADE | 45.62 | 86.72 | 95.62 |
| THYME | 48.55 | 86.38 | 96.08 |
| **CRAFT (Ours, Training-Free)** | | | |
| 🏆 CRAFT | 41.13 | 87.16 | 96.84 |
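The R@k numbers in the table follow the standard recall-at-k definition: the fraction of questions whose gold table appears among the top k retrieved candidates. A minimal implementation on illustrative data:

```python
def recall_at_k(ranked_ids, gold_ids, k):
    """Percentage of questions whose gold table is in the top k.

    `ranked_ids[i]` is the ranked list of retrieved table ids for
    question i; `gold_ids[i]` is the id of its answer table.
    """
    hits = sum(gold in ranked[:k] for ranked, gold in zip(ranked_ids, gold_ids))
    return 100.0 * hits / len(gold_ids)

# Three toy questions with made-up retrieval runs (not NQ-Tables data).
runs = [["t3", "t1", "t7"], ["t2", "t9", "t4"], ["t5", "t8", "t6"]]
gold = ["t1", "t2", "t6"]
```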
CRAFT paired with off-the-shelf LLMs consistently surpasses all fine-tuned baselines.
Practical advantages that make CRAFT deployable in real-world scenarios.
Zero dataset-specific training. Deploy immediately on new domains without labeled examples.
Uses publicly available SPLADE, Sentence Transformers, and OpenAI embeddings. No proprietary stack needed.
Efficient cascaded filtering handles massive corpora without quadratic cost. Mini-tables cut token use by 70%.
Only −0.04 avg recall drop under paraphrasing. Fine-tuned competitors fall 8–12 points on the same perturbations.
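The robustness number above is an average delta: the mean change in recall between original and paraphrased questions, computed across cutoffs. A sketch with illustrative values (not the paper's exact per-cutoff figures):

```python
def avg_recall_delta(recall_original, recall_paraphrased):
    """Mean change in recall when questions are paraphrased.

    Values near zero indicate a robust retriever; large negative
    values indicate sensitivity to surface wording.
    """
    deltas = [p - o for o, p in zip(recall_original, recall_paraphrased)]
    return sum(deltas) / len(deltas)
```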
Areas where CRAFT has room to grow and open questions for future work.
The three-stage cascade introduces additional inference steps compared to a single-model retriever. While each stage is lightweight, the sequential nature adds latency in real-time applications. Pipeline parallelization could address this in future work.
Evaluation is primarily conducted on the NQ-Tables dataset. Broader validation on other TQA benchmarks (OTT-QA, HybridQA, etc.) would better establish generalizability across diverse table formats and domains.
If CRAFT is useful for your research, please consider citing our paper.
```bibtex
@article{craft2025,
  title         = {CRAFT: Training-Free Cascaded Retrieval for Tabular QA},
  author        = {Singh, Adarsh and Bhandari, Kushal Raj and Gao, Jianxi and Dan, Soham and Gupta, Vivek},
  year          = {2025},
  eprint        = {2505.14984},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```