Best Embedding Models for RAG

Embeddings are the silent workhorse of RAG pipelines. The list below is ordered by adoption and includes both bi-encoders (for retrieval) and cross-encoders (for reranking).
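
The split between bi-encoders and cross-encoders can be sketched as a two-stage pipeline: a bi-encoder embeds documents independently (so vectors can be precomputed), and a cross-encoder then rescores a short candidate list by reading the query and document together. The sketch below is only an illustration of that flow. `embed` and `cross_score` are hypothetical stand-ins (a bag-of-words vector and a phrase-overlap score), not any of the models listed here; in practice they would be replaced by calls to a real embedding model and a real reranker.

```python
import math
from collections import Counter

def embed(text):
    """Stand-in bi-encoder (assumption): bag-of-words counts, not a real model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cross_score(query, doc):
    """Stand-in cross-encoder (assumption): scores the pair jointly.
    Shared word pairs in the same order count most, shared words count a little."""
    q, d = query.lower().split(), doc.lower().split()
    shared_bigrams = set(zip(q, q[1:])) & set(zip(d, d[1:]))
    shared_words = set(q) & set(d)
    return len(shared_bigrams) + 0.1 * len(shared_words)

def retrieve(query, doc_vectors, top_k):
    """Stage 1: rank all documents by similarity to the query vector."""
    q_vec = embed(query)
    order = sorted(range(len(doc_vectors)),
                   key=lambda i: cosine(q_vec, doc_vectors[i]),
                   reverse=True)
    return order[:top_k]

def rerank(query, docs, candidates, top_n):
    """Stage 2: rescore only the candidates with the pairwise scorer."""
    order = sorted(candidates,
                   key=lambda i: cross_score(query, docs[i]),
                   reverse=True)
    return order[:top_n]

docs = [
    "embedding models turn text into vectors for search",
    "search for models",
    "quantization reduces model size",
    "rerankers score query and document pairs",
]
doc_vectors = [embed(d) for d in docs]  # computed once, reused for every query

query = "embedding models for search"
candidates = retrieve(query, doc_vectors, top_k=2)
best = rerank(query, docs, candidates, top_n=1)
print(docs[best[0]])
```

In this toy data the first stage ranks the short document "search for models" highest because word order is ignored, and the pairwise stage moves the more relevant document to the top. The reranker runs only on the `top_k` candidates, which keeps the slower stage cheap.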

1
all-MiniLM-L6-v2
0.023B params
Tiny embedding model. Only 23MB. Perfect for on-device search.
Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0
2
BGE Small EN v1.5
0.033B params
Compact English embedding model. Good for basic semantic search.
Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit
3
Nomic Embed Text v1.5
0.137B params
High-quality text embedding model. 137M params. Good for RAG and search.
Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0
4
BGE Large EN v1.5
0.335B params
High-quality English embedding model. The most accurate English option in this list.
Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit
5
BGE Reranker v2 M3
0.568B params
Multilingual reranker. 100+ languages. 1.1GB.
Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit
6
Snowflake Arctic Embed S
0.033B params
Compact English embedding model from Snowflake, tuned for retrieval.
Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: apache-2.0
7
Jina Reranker Tiny EN
0.033B params
Tiny English reranker. Only 67MB. Use with embedding models for better search.
Min VRAM: 0.15GB | Quant: FP16 | Size: 0.067GB | License: apache-2.0
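
The Size figures above track parameter count times bytes per weight fairly closely. A rough sketch of that arithmetic follows. The bytes-per-weight values are assumptions (about 1 byte for Q8_0, ignoring per-block scale overhead, and 2 bytes for FP16), and `estimate_size_gb` is a hypothetical helper, so treat the result as a ballpark rather than a reproduction of the listed numbers.

```python
# Assumed approximate storage cost per weight, in bytes.
# Q8_0 also stores per-block scales, which this rough figure ignores.
BYTES_PER_PARAM = {"Q8_0": 1.0, "FP16": 2.0}

def estimate_size_gb(params_billions, quant):
    """Ballpark weight size in GB: billions of params times bytes per param."""
    return params_billions * BYTES_PER_PARAM[quant]

# Parameter counts and quantization types taken from the list above.
for name, params, quant in [
    ("all-MiniLM-L6-v2", 0.023, "Q8_0"),
    ("BGE Large EN v1.5", 0.335, "Q8_0"),
    ("Jina Reranker Tiny EN", 0.033, "FP16"),
]:
    print(f"{name}: ~{estimate_size_gb(params, quant):.3f} GB")
```

The Min VRAM figures are higher than the weight size because running a model also needs memory for activations and runtime overhead, which this estimate does not cover.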
