What is a Vector Database? The Storage Layer for Modern AI

Vector databases store embeddings — numerical representations of text, images, and audio — and find similar items in milliseconds. The infrastructure layer underneath RAG, recommendation, and semantic search systems. Learn what they are, when you need one, and the production tradeoffs.


A vector database stores high-dimensional numerical representations of data — called embeddings — and retrieves similar items by mathematical distance rather than exact match. It is the storage layer that makes semantic search, retrieval-augmented generation, recommendation engines, and most production LLM systems economically viable at enterprise scale.

The vector database market hit roughly $2.2 billion in 2024 and is projected to exceed $10 billion by 2028, driven almost entirely by enterprise RAG deployments. Pinecone, Weaviate, Qdrant, Chroma, and Milvus are the dominant pure-play vendors; pgvector (Postgres) and MongoDB Atlas Vector Search now ship vector capability inside existing relational and document databases.

The shift matters because traditional databases were built to answer questions like "find the row where customer_id equals 12345." Vector databases answer questions like "find the 10 paragraphs in our 8,000-page policy library that are most semantically similar to this customer's question." Same storage problem, fundamentally different retrieval pattern.

How Vector Databases Work (What Matters for Business)

Three components in every production deployment.

Embedding model. A neural network — typically OpenAI's text-embedding-3-large, Cohere's embed-v3, or an open-source model like BGE — converts each piece of text, image, or audio into a vector of 768 to 3,072 numbers. Two pieces of content with similar meaning produce similar vectors. The contract clause "Vendor will indemnify Buyer against all third-party claims" and the search query "who is responsible if a third party sues us" produce vectors that sit close together in vector space, even though they share almost no common words.
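
To see this in code, here is a minimal sketch using the sentence-transformers library with a small open-source BGE model (one of the families named above); the checkpoint name is illustrative, and the exact scores will vary by model.

```python
# Minimal sketch: embed a contract clause and a query, compare by cosine.
# The BGE checkpoint name is illustrative; scores vary by model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

clause = "Vendor will indemnify Buyer against all third-party claims"
query = "who is responsible if a third party sues us"
unrelated = "Invoices are due within 30 days of receipt"

# normalize_embeddings=True makes the dot product equal cosine similarity
vecs = model.encode([clause, query, unrelated], normalize_embeddings=True)

print(f"clause vs query:     {vecs[0] @ vecs[1]:.3f}")  # high despite no shared words
print(f"clause vs unrelated: {vecs[0] @ vecs[2]:.3f}")  # noticeably lower
```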

Vector storage and index. The database stores millions or billions of these vectors and builds an index — typically HNSW (Hierarchical Navigable Small World) or IVF (Inverted File) — that allows similarity search in milliseconds rather than minutes. A linear scan over a billion vectors would take hours. A well-built index returns the top 10 most similar vectors in under 50ms.
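
A sketch of that indexing step with the faiss library (one common HNSW implementation; most vector databases ship their own), using random vectors as stand-ins for real embeddings:

```python
# Sketch: build an HNSW index over 100K vectors and query it with faiss.
# Random vectors stand in for real embeddings; parameters are illustrative.
import faiss
import numpy as np

dim = 768
vectors = np.random.rand(100_000, dim).astype("float32")

index = faiss.IndexHNSWFlat(dim, 32)  # 32 = graph connectivity (HNSW's M)
index.add(vectors)                    # the graph is built as vectors are added

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 10)  # approximate top-10, milliseconds
print(ids[0])
```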

Similarity search. When a query comes in, the database embeds it through the same model, then returns the K most similar stored vectors using cosine similarity or Euclidean distance. The retrieved items get passed to an LLM as context, or returned directly to the user.

When Enterprise Buyers Actually Need a Vector Database

You need one if any of the following is true:

You are building a RAG system on more than a few thousand documents. Below ~50,000 chunks, a flat in-memory index inside your application is enough. Above that, you need a real vector database to manage memory, persistence, updates, and concurrent queries.

Your users search by meaning, not keywords. Internal knowledge bases, support ticket triage, contract clause search, product catalog discovery — any system where "matches the intent" outperforms "matches the words" benefits from vector retrieval.

You need to combine semantic and structured filtering. Modern vector databases support hybrid queries — "find documents semantically similar to this query, but only those tagged 'finance' and modified in the last 90 days." This combination is where vector databases deliver the most over a generic embedding lookup. A sketch of such a filtered query appears after this list.

You operate on multiple data types. Vector databases handle text, image, and audio embeddings in the same store. A multimodal product search ("find products visually similar to this photo and matching this text description") requires a single vector store, not separate systems per data type.
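
Here is the filtered-query sketch referenced above, using the qdrant-client Python library; the collection name, payload fields, and toy vectors are hypothetical.

```python
# Sketch: semantic search constrained by structured metadata (qdrant-client).
# Collection name, payload fields, and the toy vectors are hypothetical.
import time
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue, Range,
)

client = QdrantClient(":memory:")  # in-memory mode, handy for local tests
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=4, distance=Distance.COSINE),  # toy size
)
client.upsert(
    collection_name="documents",
    points=[PointStruct(
        id=1,
        vector=[0.1, 0.9, 0.2, 0.4],
        payload={"tag": "finance", "modified_at": time.time()},
    )],
)

# "Semantically similar, but only 'finance' docs modified in the last 90 days"
hits = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.8, 0.3, 0.4],  # normally the embedded user query
    query_filter=Filter(must=[
        FieldCondition(key="tag", match=MatchValue(value="finance")),
        FieldCondition(key="modified_at", range=Range(gte=time.time() - 90 * 86400)),
    ]),
    limit=10,
)
print(hits)
```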

You probably do not need one if you are running a basic chatbot on a single FAQ document, doing keyword search over structured data, or evaluating fewer than ten documents per query — pgvector inside your existing Postgres instance handles those workloads without a separate system.

Vector Database vs Traditional Database

| Aspect | Traditional Database | Vector Database |
|---|---|---|
| Query type | Exact match, range, joins | Semantic similarity (top-K nearest) |
| Data unit | Row, document, key-value | High-dimensional vector (embedding) |
| Index | B-tree, hash, inverted | HNSW, IVF, PQ |
| Best for | Transactional, structured | Search, retrieval, recommendation |
| Update cost | Cheap | Cheap to add, expensive to re-embed |
| Latency target | Sub-ms | 10-100ms for top-K search |
| Scale boundary | Sharding by key | Sharding by vector cluster or namespace |

Most enterprise systems will use both — traditional databases for transactions and source-of-truth records, vector databases for the retrieval layer that feeds LLMs.

Production Tradeoffs Enterprise Buyers Must Understand

Embedding model choice locks you in. If you index 50 million documents with OpenAI's text-embedding-3-large and then decide to switch to Cohere's embed-v3, every document needs to be re-embedded. Re-embedding 50 million documents costs roughly $5,000-$15,000 in API fees and 24-72 hours of pipeline time. Pick the embedding model deliberately, and version it explicitly.
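
For rough intuition, a back-of-envelope estimate; the tokens-per-document figure and per-token price are assumptions, and retries, longer documents, and pipeline overhead push real costs toward the upper end of the range above.

```python
# Back-of-envelope re-embedding cost. Both inputs are assumptions:
# check your actual average chunk length and current vendor pricing.
docs = 50_000_000
tokens_per_doc = 500              # assumed average length
price_per_million_tokens = 0.13   # assumed list price (USD)

total_tokens = docs * tokens_per_doc
cost = total_tokens / 1_000_000 * price_per_million_tokens
print(f"{total_tokens / 1e9:.0f}B tokens -> ${cost:,.0f}")  # 25B tokens -> $3,250
```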

Hybrid search is now table stakes. Pure vector search underperforms on exact-match queries (product SKUs, dates, names). Production systems combine vector similarity with BM25 keyword scoring and structured metadata filters. Any vendor that doesn't support hybrid search out of the box should be ruled out for enterprise procurement.
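
One common way to merge the keyword and vector result lists is reciprocal rank fusion; a minimal sketch, assuming you already have ranked IDs from a BM25 engine and a vector index (the IDs below are made up):

```python
# Sketch: reciprocal rank fusion (RRF) of BM25 and vector result lists.
# Input rankings are made up; k=60 is the conventional RRF constant.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["sku-123", "doc-7", "doc-2"]   # exact-match strengths
vector_hits = ["doc-2", "doc-9", "doc-7"]   # semantic strengths
print(rrf([bm25_hits, vector_hits]))        # docs ranked well by both lists rise
```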

Cost scales with vectors stored AND queries served. Pinecone's pricing model charges per million vectors stored per month plus per query. A 10M-document index serving 1M queries/month runs roughly $700-$1,500/month at production scale. Self-hosted Qdrant or Weaviate on your own infrastructure cuts that by 60-80% but requires DevOps capacity to operate.

Re-indexing is a planned outage, not a routine operation. Schema changes, embedding model upgrades, or chunking strategy changes all require a full re-index. For systems serving live RAG queries, plan for blue-green deployment with two parallel indexes during the migration window.

Latency at scale is harder than vendor benchmarks suggest. Sub-50ms top-K retrieval works at 10M vectors. At 1B vectors with concurrent queries from a production application, 99th-percentile latency commonly exceeds 250ms unless the index is carefully tuned and adequately resourced. Benchmark on your actual data volumes before finalizing architecture.

Where Vector Databases Sit in an Enterprise AI Stack

In a typical RAG deployment for an enterprise knowledge base or support automation system:

  1. Source documents (Confluence, SharePoint, Notion, Salesforce, internal wikis) flow into a chunking pipeline.
  2. Each chunk is sent to an embedding model and stored in the vector database with metadata (source, author, last-modified, ACL).
  3. A user query is embedded the same way and searched against the index. Top-K relevant chunks come back.
  4. Those chunks are passed as context to an LLM along with the original query.
  5. The LLM generates a grounded answer with citations to the source chunks.
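
A condensed sketch of steps 2 through 4, assuming the openai and qdrant-client Python libraries; the collection name, models, and prompt format are illustrative rather than a prescribed implementation.

```python
# Condensed sketch of steps 2-4: embed and store chunks, retrieve top-K,
# and pass the retrieved context to an LLM. All names are illustrative.
from openai import OpenAI
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

llm = OpenAI()
db = QdrantClient(":memory:")
db.create_collection(
    collection_name="kb",
    vectors_config=VectorParams(size=3072, distance=Distance.COSINE),
)

def embed(texts: list[str]) -> list[list[float]]:
    resp = llm.embeddings.create(model="text-embedding-3-large", input=texts)
    return [d.embedding for d in resp.data]

# Step 2: store chunks with metadata
chunks = [{"text": "Refunds are processed within 14 days.", "source": "policy.pdf"}]
vectors = embed([c["text"] for c in chunks])
db.upsert("kb", points=[
    PointStruct(id=i, vector=v, payload=c)
    for i, (c, v) in enumerate(zip(chunks, vectors))
])

# Step 3: embed the query the same way and retrieve top-K chunks
question = "How long do refunds take?"
hits = db.search("kb", query_vector=embed([question])[0], limit=3)
context = "\n".join(h.payload["text"] for h in hits)

# Step 4: pass the retrieved chunks as context to the LLM
answer = llm.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user",
               "content": f"Context:\n{context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)
```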

The vector database is the silent middle layer. When it works, no one notices. When it fails — stale data, slow queries, irrelevant retrieval — every downstream output gets worse.

Key Takeaways

  • Definition: A vector database stores embeddings and finds the most similar ones in milliseconds. The retrieval layer underneath modern LLM systems.
  • Best for: RAG systems above 50K chunks, semantic search, recommendation, multimodal retrieval, hybrid semantic + structured queries.
  • Not needed for: Small FAQ chatbots, keyword-only search, transactional workloads. Use pgvector inside Postgres for small deployments.
  • Cost: $700-$1,500/month for typical 10M-vector production deployments on managed platforms; 60-80% cheaper self-hosted with operational overhead.
  • Primary risk: Embedding model lock-in. Re-embedding is expensive — pick the embedding model with the same care as the LLM.

Frequently Asked Questions

What is the difference between a vector database and a regular database with vector support?

Pure vector databases (Pinecone, Weaviate, Qdrant, Milvus, Chroma) are built from the ground up around vector indexing, distance computation, and scale. Vector-extended traditional databases (Postgres with pgvector, MongoDB Atlas Vector Search, Elasticsearch with KNN) add vector capability to an existing system. The pure vector databases generally win on latency, scale, and index variety; the extensions win when you want to keep one operational system, share transactions across vector and relational data, or you are below the scale where dedicated infrastructure pays for itself. Most enterprises with under 5 million vectors should start with pgvector. Above that, evaluate dedicated systems.

How do I choose between Pinecone, Weaviate, Qdrant, Chroma, and Milvus?

Five rough decision criteria. Pinecone is the easiest managed option — fastest to production, highest cost. Weaviate has the strongest hybrid search and built-in modules; good for multimodal. Qdrant has the best self-hosted economics and Rust-based performance; common choice for cost-sensitive production. Chroma is the easiest local-development option; many teams prototype on Chroma and migrate to Qdrant or Pinecone for production. Milvus has the deepest scale story (billions of vectors) and is the choice for very large indexes. Run the same workload against two candidates with your actual embedding dimensions, query patterns, and concurrency targets before committing. Vendor benchmarks are not enterprise benchmarks.

Can we just use Postgres with pgvector instead of a dedicated vector database?

For most enterprise deployments under about 5 million vectors, yes. pgvector handles HNSW indexing, supports hybrid queries through standard Postgres SQL, and avoids running a second operational system. The crossover where a dedicated vector database meaningfully outperforms pgvector typically arrives at 10M+ vectors, sub-50ms latency requirements at high concurrency, or workloads that need IVF-PQ compression for memory efficiency. Until you hit one of those constraints, pgvector is the lower-risk default.
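
For reference, a minimal sketch of that setup from Python with psycopg and the pgvector adapter; the connection string, table, and column names are hypothetical.

```python
# Sketch: HNSW-indexed vector search plus a metadata filter in plain Postgres.
# Connection string, table, and column names are hypothetical.
import numpy as np
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("dbname=app", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
register_vector(conn)  # lets psycopg pass numpy arrays as vector values

conn.execute(
    "CREATE TABLE IF NOT EXISTS chunks (id bigserial PRIMARY KEY, "
    "content text, tag text, embedding vector(1536))"
)
conn.execute(
    "CREATE INDEX IF NOT EXISTS chunks_hnsw ON chunks "
    "USING hnsw (embedding vector_cosine_ops)"  # HNSW in pgvector 0.5+
)

# Hybrid query: cosine-nearest neighbors, restricted by a metadata filter
query_vec = np.random.rand(1536).astype(np.float32)  # normally an embedded query
rows = conn.execute(
    "SELECT id, content FROM chunks "
    "WHERE tag = %s ORDER BY embedding <=> %s LIMIT 10",
    ("finance", query_vec),
).fetchall()
```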

  • Retrieval-Augmented Generation — The architecture pattern that uses vector databases to ground LLM outputs in current, authoritative data
  • Large Language Models — The reasoning layer that consumes the retrieved context from a vector database
  • Knowledge Graph — A complementary structured retrieval approach; many production systems combine both
  • Multimodal AI — Vector databases store embeddings for text, image, and audio in the same index, making multimodal retrieval possible

Need help implementing AI?

We build production AI systems that actually ship. Talk to us about your document processing challenges.

Get in Touch