Abstract
Semantic search—the ability to search by meaning rather than just keywords—is rapidly becoming a standard requirement for modern applications. This chapter guides you through building a “Hybrid Search” system in Laravel using OpenAI Embeddings and PostgreSQL’s pgvector extension. We cover the entire pipeline: database migrations for vector columns, generating embeddings on model save, and executing cosine similarity queries.
The Stack Explained
- Laravel: Orchestrates the logic.
- OpenAI API: Converts text (e.g., “A cozy place to stay”) into an embedding vector: a list of floating-point numbers (1,536 for the text-embedding-ada-002 model used here) that represents the text’s semantic meaning [41].
- pgvector: An extension for PostgreSQL that stores these vectors and efficiently computes distances between them, such as cosine distance (one minus cosine similarity).
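To make the distance calculation concrete, here is a minimal plain-PHP sketch of cosine distance, the quantity pgvector's <=> operator returns. The function name cosineDistance is hypothetical and exists only for illustration; in the real system PostgreSQL does this work, not PHP.

```php
<?php

declare(strict_types=1);

/**
 * Cosine distance between two equal-length vectors: 1 - cos(angle between them).
 * Illustrative helper only; pgvector computes this inside PostgreSQL.
 */
function cosineDistance(array $a, array $b): float
{
    // Re-index so both arrays are addressed by position.
    $a = array_values($a);
    $b = array_values($b);

    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $value) {
        $dot += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine distance is undefined for a zero vector.');
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}
```

Vectors pointing in the same direction give a distance of 0, orthogonal vectors give 1, and opposite vectors give 2, which is why a smaller distance means a closer semantic match.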
Step 1: Database Configuration
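First, the database must be prepared. Laravel’s schema builder has no methods for enabling a PostgreSQL extension or creating an HNSW index, so the migration issues those two statements through DB::statement. The vector() column method used in the migration is assumed to come from the pgvector/pgvector Composer package (recent Laravel releases also ship a native vector column type).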
Migration Code:
use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Enable the pgvector extension (a no-op if it is already installed)
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('articles', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->text('content');
            // 1536 dimensions, matching OpenAI's text-embedding-ada-002 model
            $table->vector('embedding', 1536)->nullable();
            $table->timestamps();
        });

        // HNSW index for fast approximate nearest-neighbor search by cosine distance
        DB::statement('CREATE INDEX articles_embedding_index ON articles USING hnsw (embedding vector_cosine_ops)');
    }
};
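The HNSW index matters for performance at scale: without it, PostgreSQL computes the distance to every row (an exact sequential scan) on each search, which is acceptable for small tables but becomes slow as the table grows. The trade-off is that HNSW returns approximate nearest neighbors, so a small fraction of true matches can be missed. HNSW indexes require pgvector 0.5.0 or later.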
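The speed-versus-recall trade-off of HNSW can be tuned. The fragment below is a sketch, not a recommendation: it shows the two build-time options (m, ef_construction) and the query-time setting (hnsw.ef_search) that pgvector exposes, written with their documented defaults except for ef_search, where 100 is an arbitrary example value. The CREATE INDEX statement here would take the place of the plain one in the migration, since both use the same index name.

```php
<?php

use Illuminate\Support\Facades\DB;

// Build-time options (pgvector defaults: m = 16, ef_construction = 64).
// Higher values tend to improve recall at the cost of build time and index size.
DB::statement(
    'CREATE INDEX articles_embedding_index ON articles
     USING hnsw (embedding vector_cosine_ops)
     WITH (m = 16, ef_construction = 64)'
);

// Query-time candidate list size for the current database session (pgvector default: 40).
// 100 is an assumed example value; measure recall and latency before settling on one.
DB::statement('SET hnsw.ef_search = 100');
```

Because SET applies per session, it has to be issued on the same connection that runs the search query.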
Step 2: Generating Embeddings
We can use a Model Observer or the booted method to automatically generate an embedding whenever an article is created or updated.
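use Illuminate\Database\Eloquent\Model;
use OpenAI\Laravel\Facades\OpenAI;
use Pgvector\Laravel\Vector;

class Article extends Model
{
    // Cast the vector column to a Pgvector\Laravel\Vector value object
    protected $casts = ['embedding' => Vector::class];

    protected static function booted(): void
    {
        static::saved(function (Article $article) {
            // Only call the API when the embedded text has changed
            // (attributes are still dirty inside the saved event)
            if (! $article->isDirty(['title', 'content'])) {
                return;
            }

            $response = OpenAI::embeddings()->create([
                'model' => 'text-embedding-ada-002',
                'input' => $article->title . ': ' . $article->content,
            ]);

            // The response holds one embedding per input; take the first
            $article->embedding = $response->embeddings[0]->embedding;

            // saveQuietly() fires no model events, so this handler is not re-entered
            $article->saveQuietly();
        });
    }
}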
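The model above sends the raw title and content to the API. In practice the input is often cleaned first, because text-embedding-ada-002 accepts at most 8,191 tokens per input. The helper below is a hypothetical sketch of that preparation step: the name buildEmbeddingInput, the HTML stripping, and the 8,000-character budget are all assumptions, and a character budget is only a rough stand-in for a real token count (it is not a guarantee for every language). It relies on PHP's mbstring extension.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: builds the text sent to the embeddings endpoint.
 * $maxChars is an assumed, approximate character budget, not an exact token count.
 */
function buildEmbeddingInput(string $title, string $content, int $maxChars = 8000): string
{
    // Assumes the content may contain HTML that should not be embedded.
    $text = trim($title) . ': ' . strip_tags($content);

    // Collapse runs of whitespace (newlines, tabs, repeated spaces) into single spaces.
    $text = trim(preg_replace('/\s+/u', ' ', $text) ?? $text);

    // mb_substr counts characters rather than bytes, so multibyte text is not cut mid-character.
    return mb_substr($text, 0, $maxChars);
}
```

Inside the saved handler, the 'input' value could then be buildEmbeddingInput($article->title, $article->content) instead of the plain concatenation.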
Step 3: Performing the Semantic Search
To search, we convert the user’s query into a vector and then ask PostgreSQL to find the rows with the “closest” vectors.
$userQuery = "How do I fix a flat tire?";

// 1. Vectorize the query with the same model used for the stored articles
$queryVector = OpenAI::embeddings()->create([
    'model' => 'text-embedding-ada-002',
    'input' => $userQuery,
])->embeddings[0]->embedding;

// 2. Rank rows by cosine distance (<=>); a smaller distance means a closer match
$results = Article::query()
    ->selectRaw('*, embedding <=> ? as distance', [json_encode($queryVector)])
    ->whereNotNull('embedding') // skip rows that have not been embedded yet
    ->orderBy('distance', 'asc')
    ->take(5)
    ->get();
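The results can include articles about car maintenance or roadside assistance even when they do not contain the exact words “fix,” “flat,” or “tire,” because rows are ranked by how close their vectors are to the query vector rather than by keyword overlap.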
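A nearest-neighbor query always returns its top rows, even when none of them is actually relevant, so it is common to drop weak matches afterwards. The sketch below assumes rows shaped like the query output (a title plus a distance value) and converts cosine distance to similarity before filtering. The function name filterBySimilarity, the 0.75 threshold, and the sample rows are illustrative assumptions, not measured values.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: converts cosine distances to similarities and drops weak matches.
 *
 * @param array<int, array{title: string, distance: float}> $rows rows ordered by distance
 * @return array<int, array{title: string, similarity: float}>
 */
function filterBySimilarity(array $rows, float $minSimilarity = 0.75): array
{
    $matches = [];

    foreach ($rows as $row) {
        // pgvector's <=> returns cosine distance, so similarity = 1 - distance.
        $similarity = 1.0 - $row['distance'];

        if ($similarity >= $minSimilarity) {
            $matches[] = ['title' => $row['title'], 'similarity' => $similarity];
        }
    }

    return $matches;
}

// Made-up sample rows for illustration only.
$rows = [
    ['title' => 'Changing a punctured tyre', 'distance' => 0.12],
    ['title' => 'Roadside assistance basics', 'distance' => 0.21],
    ['title' => 'Sourdough starter guide', 'distance' => 0.48],
];

$matches = filterBySimilarity($rows);
```

With these sample rows, the first two articles pass the threshold and the third is dropped. A suitable threshold depends on the embedding model and the data, so it is best chosen by inspecting real queries.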