Best Embedding Models for RAG

Embeddings are the silent workhorse of RAG pipelines. The list below is ordered by adoption and includes both bi-encoders (for retrieval) and cross-encoders (for reranking).
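To make the bi-encoder / cross-encoder split concrete, here is a minimal two-stage sketch in plain Python. It is illustrative only: `embed`, `cosine`, `rerank_score`, and `retrieve_then_rerank` are hypothetical names, and the scoring functions are toy word-overlap stand-ins, not any of the models listed below. In a real pipeline an embedding model would replace `embed` and a reranker would replace `rerank_score`.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: encodes one text on its own
    as a bag-of-words count vector (assumption, not a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rerank_score(query, doc):
    """Toy stand-in for a cross-encoder: looks at the query and the
    document together and returns the fraction of query words found."""
    query_words = query.lower().split()
    doc_words = set(doc.lower().split())
    if not query_words:
        return 0.0
    return sum(w in doc_words for w in query_words) / len(query_words)


def retrieve_then_rerank(query, docs, top_k=3, top_n=1):
    # Stage 1 (retrieval): document vectors are computed independently of
    # the query, so in practice they can be precomputed and indexed.
    doc_vecs = [embed(d) for d in docs]
    q_vec = embed(query)
    candidates = sorted(
        range(len(docs)),
        key=lambda i: cosine(q_vec, doc_vecs[i]),
        reverse=True,
    )[:top_k]
    # Stage 2 (reranking): the pairwise scorer runs only on the short
    # candidate list, since it must be evaluated once per (query, doc) pair.
    reranked = sorted(
        candidates,
        key=lambda i: rerank_score(query, docs[i]),
        reverse=True,
    )
    return [docs[i] for i in reranked[:top_n]]


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "embedding models map text to vectors",
    "rerankers score query document pairs",
]
print(retrieve_then_rerank("embedding models for text", docs))
```

The design point is the cost asymmetry: the first stage compares against vectors that can be computed ahead of time, while the second stage needs one scoring call per query-document pair and is therefore applied only to the top few candidates.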

  1.

    all-MiniLM-L6-v2

    0.023B params

    Tiny embedding model, only 23MB. Well suited to on-device search.

    Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.023GB | License: apache-2.0
  2.

    BGE Small EN v1.5

    0.033B params

    Compact English embedding model. Good for basic semantic search.

    Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: mit
  3.

    Nomic Embed Text v1.5

    0.137B params

    High-quality text embedding model. Good for RAG and search.

    Min VRAM: 0.3GB | Quant: Q8_0 | Size: 0.139GB | License: apache-2.0
  4.

    BGE Large EN v1.5

    0.335B params

    High-quality English embedding model. The largest and most accurate of the BGE English v1.5 family.

    Min VRAM: 0.83GB | Quant: Q8_0 | Size: 0.334GB | License: mit
  5.

    BGE Reranker v2 M3

    0.568B params

    Multilingual reranker covering 100+ languages. About 1.1GB at FP16.

    Min VRAM: 1.58GB | Quant: FP16 | Size: 1.08GB | License: mit
  6.

    Snowflake Arctic Embed S

    0.033B params

    Compact embedding model from Snowflake, trained primarily for English retrieval.

    Min VRAM: 0.1GB | Quant: Q8_0 | Size: 0.036GB | License: apache-2.0
  7.

    Jina Reranker Tiny EN

    0.033B params

    Tiny English reranker, only 67MB. Pair it with an embedding model to rerank retrieved results.

    Min VRAM: 0.15GB | Quant: FP16 | Size: 0.067GB | License: apache-2.0
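
The Size figures above track parameter count times bytes per weight fairly closely. The sketch below is a rough sanity check, assuming 2 bytes per weight for FP16 and about 1 byte per weight for Q8_0; it ignores quantization block scales, file metadata, and runtime overhead, so it will not reproduce the listed sizes exactly and says nothing about Min VRAM. `BYTES_PER_WEIGHT`, `LISTED`, and `estimate_size_gb` are hypothetical names introduced for illustration.

```python
# Assumed storage cost per weight. Q8_0 is treated as exactly 1 byte,
# which ignores its per-block scale factors (an approximation).
BYTES_PER_WEIGHT = {"FP16": 2.0, "Q8_0": 1.0}

# (name, params in billions, quant, listed size in GB), copied from the list.
LISTED = [
    ("all-MiniLM-L6-v2", 0.023, "Q8_0", 0.023),
    ("BGE Small EN v1.5", 0.033, "Q8_0", 0.036),
    ("Nomic Embed Text v1.5", 0.137, "Q8_0", 0.139),
    ("BGE Large EN v1.5", 0.335, "Q8_0", 0.334),
    ("BGE Reranker v2 M3", 0.568, "FP16", 1.08),
    ("Snowflake Arctic Embed S", 0.033, "Q8_0", 0.036),
    ("Jina Reranker Tiny EN", 0.033, "FP16", 0.067),
]


def estimate_size_gb(params_billions, quant):
    """Estimate weight size in GB: billions of params times bytes per weight."""
    return params_billions * BYTES_PER_WEIGHT[quant]


for name, params, quant, listed in LISTED:
    est = estimate_size_gb(params, quant)
    diff_pct = 100 * abs(est - listed) / listed
    print(f"{name}: estimated {est:.3f}GB, listed {listed}GB ({diff_pct:.1f}% apart)")
```

Under these assumptions every estimate lands within about 10% of the listed size, which is a quick way to spot a mistyped parameter count or quantization label.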
