Data Filtering Networks (Fang et al., 2023)
https://arxiv.org/pdf/2309.17425
TLDR:
The classic paradigm — better data → better models — is extended: better data → better filtering model → better filtered dataset → better final model.
Train a Data Filtering Network (DFN) on a small, high-quality dataset, then use it to score and filter a large noisy data pool. The paper targets CLIP-style models with paired image and text representations, so the DFN's image-text alignment score is the natural filtering signal. The authors show that filtering with even a small DFN (inducing the DFN-2B dataset of roughly 2B image-text pairs) significantly improves zero-shot ImageNet accuracy for ViT-L/14 (77.4% → 81.3%). The method also transfers to visual question answering, and the paper includes instructive plots on how “polluting” the DFN's training data degrades the quality of the final model.
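The filtering step itself reduces to CLIP-score thresholding: embed each image-caption pair with the DFN, score it by cosine similarity, and keep only the best-aligned fraction of the pool. A minimal sketch assuming precomputed DFN embeddings (the `dfn_filter` helper, the `keep_frac` parameter, and the toy data are illustrative, not from the paper):

```python
import numpy as np

def dfn_filter(image_embs, text_embs, keep_frac=0.3):
    """Keep the top `keep_frac` fraction of (image, text) pairs,
    ranked by cosine similarity under the filtering model."""
    # L2-normalize so the row-wise dot product is cosine similarity.
    img = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    scores = np.sum(img * txt, axis=1)           # per-pair alignment score
    k = max(1, int(len(scores) * keep_frac))     # how many pairs survive
    keep = np.argsort(scores)[::-1][:k]          # indices of best-aligned pairs
    return np.sort(keep), scores

# Toy demo: three captions roughly match their images, one is noise.
rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))
texts = imgs + rng.normal(scale=0.1, size=(4, 8))  # well-aligned captions
texts[1] = rng.normal(size=8)                      # a mismatched caption
kept, scores = dfn_filter(imgs, texts, keep_frac=0.5)
```

In this sketch the mismatched pair scores near zero while the aligned pairs score near one, so it is dropped — the same mechanism, at web scale, produces the filtered training set.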