Best Embedding Models for RAG
Embeddings are the silent workhorse of RAG pipelines. The list below is ordered by adoption and includes both bi-encoders (for retrieval) and cross-encoders (for reranking).
1. all-MiniLM-L6-v2
   Params: 0.023B
   Tiny embedding model, only 23MB. Well suited to on-device search.
   Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0

2. BGE Small EN v1.5
   Params: 0.033B
   Compact English embedding model. Good for basic semantic search.
   Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit

3. Nomic Embed Text v1.5
   Params: 0.137B
   High-quality text embedding model with 137M params. Good for RAG and search.
   Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0

4. BGE Large EN v1.5
   Params: 0.335B
   High-quality English embedding model. Best accuracy for English search.
   Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit

5. BGE Reranker v2 M3
   Params: 0.568B
   Multilingual reranker covering 100+ languages. 1.1GB.
   Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit

6. Snowflake Arctic Embed S
   Params: 0.033B
   Compact embedding model from Snowflake. Good multilingual support.
   Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: apache-2.0

7. Jina Reranker Tiny EN
   Params: 0.033B
   Tiny English reranker, only 67MB. Pair it with an embedding model for better search.
   Min VRAM: 0.15GB | Quant: FP16 | Size: 0.067GB | License: apache-2.0
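
The split between embedding models (bi-encoders) and rerankers (cross-encoders) in this list maps onto a two-stage search: embed everything once and shortlist by vector similarity, then rescore only the shortlist by reading each query and document together. The sketch below shows that control flow in plain Python. It is an illustration under loud assumptions: `toy_embed` and `toy_pair_score` are hypothetical stand-ins (word counts and shared word pairs), not any of the models above, and `retrieve_then_rerank`, `k_retrieve`, and `k_final` are names made up for this example. In a real pipeline you would swap in an embedding model for the first function and a reranker for the second.

```python
import math
from collections import Counter
from typing import Callable, Dict, List

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a bi-encoder: a bag-of-words count vector."""
    return {tok: float(n) for tok, n in Counter(text.lower().split()).items()}


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_pair_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: reads query and document
    together and counts the query's adjacent word pairs found in the document."""
    def bigrams(text: str) -> set:
        toks = text.lower().split()
        return set(zip(toks, toks[1:]))
    return float(len(bigrams(query) & bigrams(doc)))


def retrieve_then_rerank(
    query: str,
    docs: List[str],
    embed: Callable[[str], Vector],
    pair_score: Callable[[str, str], float],
    k_retrieve: int = 3,
    k_final: int = 2,
) -> List[str]:
    # Stage 1 (retrieval): documents are embedded independently of the query,
    # so in practice their vectors can be precomputed and indexed.
    q_vec = embed(query)
    shortlist = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    shortlist = shortlist[:k_retrieve]
    # Stage 2 (reranking): the pairwise scorer is run only on the shortlist,
    # because it needs one pass per (query, document) pair.
    return sorted(shortlist, key=lambda d: pair_score(query, d), reverse=True)[:k_final]


docs = [
    "on device search with a tiny embedding model",
    "search device on",
    "multilingual reranker for many languages",
    "recipe for tomato soup",
]
query = "on device search"
top = retrieve_then_rerank(query, docs, toy_embed, toy_pair_score)
print(top)
```

In this toy run the word-count similarity ranks "search device on" first, since it ignores word order, and the pairwise scorer then moves the document containing the actual phrase to the top. Keeping `k_retrieve` small is the usual reason rerankers stay affordable: the more expensive scorer only ever sees a handful of candidates.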