Building AI-Powered Search with Laravel, OpenAI, and pgvector

Abstract

Semantic search—the ability to search by meaning rather than just keywords—is rapidly becoming a standard requirement for modern applications. This chapter guides you through building a “Hybrid Search” system in Laravel using OpenAI Embeddings and PostgreSQL’s pgvector extension. We cover the entire pipeline: database migrations for vector columns, generating embeddings on model save, and executing cosine similarity queries.

The Stack Explained

  • Laravel: Orchestrates the logic.
  • OpenAI API: Converts text (e.g., “A cozy place to stay”) into an embedding vector: a list of floating-point numbers (1,536 of them for the text-embedding-ada-002 model used here) that represents the text’s semantic meaning.
  • pgvector: An extension for PostgreSQL that allows storing these vectors and performing mathematical distance calculations (Cosine Similarity) efficiently.
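
To make the "distance calculation" concrete, here is a minimal plain-PHP sketch of cosine similarity and of cosine distance (1 minus similarity), which is the quantity pgvector's <=> operator returns. The function names and the three-dimensional toy vectors are assumptions for illustration only; real embeddings have 1,536 dimensions, and in the actual application the database does this arithmetic.

```php
<?php
declare(strict_types=1);

/**
 * Cosine similarity between two equal-length vectors.
 * Returns a value in [-1, 1]; 1 means both vectors point the same way.
 */
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);

    if (count($a) !== count($b) || count($a) === 0) {
        throw new InvalidArgumentException('Vectors must be non-empty and of equal length.');
    }

    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }

    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector.');
    }

    return $dot / (sqrt($normA) * sqrt($normB));
}

/** Cosine distance: 0 for identical directions, 1 for orthogonal vectors. */
function cosineDistance(array $a, array $b): float
{
    return 1.0 - cosineSimilarity($a, $b);
}

// Hypothetical toy vectors standing in for real embeddings.
$cabin   = [0.9, 0.1, 0.0];
$lodge   = [0.8, 0.2, 0.1];
$invoice = [0.0, 0.1, 0.9];

printf("cabin vs lodge:   %.3f\n", cosineDistance($cabin, $lodge));
printf("cabin vs invoice: %.3f\n", cosineDistance($cabin, $invoice));
```

The first distance comes out much smaller than the second, which is the property the search in Step 3 relies on: texts with related meaning end up with vectors that point in similar directions.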

Step 1: Database Configuration

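First, prepare the database. The migration below uses DB::statement to run raw SQL for the two pgvector-specific steps: enabling the extension and creating the HNSW index. The $table->vector() column helper is not available in every setup; depending on your Laravel version it is provided either by the framework itself or by the pgvector/pgvector Composer package, which also supplies the Vector cast used in Step 2.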

Migration Code:

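use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        // Enable the pgvector extension (requires sufficient database privileges)
        DB::statement('CREATE EXTENSION IF NOT EXISTS vector');

        Schema::create('articles', function (Blueprint $table) {
            $table->id();
            $table->string('title');
            $table->text('content');
            // Vector column with 1536 dimensions (matching OpenAI's text-embedding-ada-002 model)
            $table->vector('embedding', 1536)->nullable();
            $table->timestamps();
        });

        // HNSW index for fast approximate nearest-neighbor search on cosine distance
        DB::statement('CREATE INDEX articles_embedding_index ON articles USING hnsw (embedding vector_cosine_ops)');
    }

    public function down(): void
    {
        // Dropping the table also drops its index
        Schema::dropIfExists('articles');
    }
};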

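The HNSW index matters for performance. Without it, PostgreSQL computes the distance to every row on each search (a sequential scan), which returns exact results but slows down in proportion to the size of the table. HNSW gives up a small amount of recall in exchange for much faster approximate lookups, which is usually the right trade once the table grows large.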
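
To illustrate what an unindexed search costs, the following plain-PHP sketch does by hand what a sequential scan amounts to: score every row against the query, sort, and keep the best few. The function names and the tiny in-memory "table" are hypothetical; this is not how pgvector is implemented, only a picture of why the work grows with the row count.

```php
<?php
declare(strict_types=1);

/** Cosine distance (1 - cosine similarity) for equal-length, non-zero vectors. */
function cosineDistance(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    $dot = $normA = $normB = 0.0;
    foreach ($a as $i => $value) {
        $dot   += $value * $b[$i];
        $normA += $value * $value;
        $normB += $b[$i] * $b[$i];
    }
    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

/**
 * Exact nearest-neighbour search over every row.
 * $rows maps a row id to its embedding; returns [id => distance] for the
 * $k closest rows, nearest first. One distance computation per row.
 */
function exactNearest(array $rows, array $query, int $k): array
{
    $distances = [];
    foreach ($rows as $id => $embedding) {
        $distances[$id] = cosineDistance($embedding, $query);
    }

    asort($distances); // ascending by distance, keys (row ids) preserved

    return array_slice($distances, 0, $k, true);
}

// Hypothetical three-row table with two-dimensional embeddings.
$rows = [
    1 => [1.0, 0.0],
    2 => [0.0, 1.0],
    3 => [0.9, 0.1],
];

print_r(exactNearest($rows, [1.0, 0.0], 2));
```

An approximate index such as HNSW avoids visiting every row by navigating a graph of neighbouring vectors, so it can occasionally miss a true nearest neighbour that this exhaustive version would find.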

Step 2: Generating Embeddings

We can use a Model Observer or the booted method to automatically generate an embedding whenever an article is created or updated.

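use Illuminate\Database\Eloquent\Model;
use OpenAI\Laravel\Facades\OpenAI;
use Pgvector\Laravel\Vector;

class Article extends Model
{
    // Cast the vector column to a Pgvector\Laravel\Vector object
    protected $casts = ['embedding' => Vector::class];

    protected static function booted(): void
    {
        static::saved(function (Article $article) {
            // Only call the API when the embedded text changed in this save
            // (on create, every attribute counts as dirty)
            if (! $article->isDirty(['title', 'content'])) {
                return;
            }

            $response = OpenAI::embeddings()->create([
                'model' => 'text-embedding-ada-002',
                'input' => $article->title . ': ' . $article->content,
            ]);

            // The response holds one embedding per input; we sent a single input
            $article->embedding = $response->embeddings[0]->embedding;

            // saveQuietly() fires no model events, so this hook is not re-triggered
            $article->saveQuietly();
        });
    }
}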
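
The embedding model accepts only a limited amount of text per request (8,191 tokens for text-embedding-ada-002), so long articles may need trimming before they are sent. The helper below is a rough sketch under stated assumptions: the name buildEmbeddingInput is hypothetical, and the 24,000-character default is an illustrative budget based on the loose rule of thumb of about four characters per token in English text, not an exact token count.

```php
<?php
declare(strict_types=1);

/**
 * Build the text sent to the embeddings API for one article.
 * Collapses runs of whitespace and truncates to a character budget.
 * $maxChars is an assumed safety margin, not a precise token limit.
 */
function buildEmbeddingInput(string $title, string $content, int $maxChars = 24000): string
{
    $text = trim($title) . ': ' . trim($content);

    // Collapse newlines, tabs and repeated spaces into single spaces
    $text = preg_replace('/\s+/u', ' ', $text) ?? $text;

    // mb_substr counts characters, so multi-byte text is not cut mid-character
    return mb_substr($text, 0, $maxChars);
}

echo buildEmbeddingInput("  Tires ", "Fix\n\n a   flat"), "\n";
```

In the model hook, the 'input' value could then be built with this helper instead of the inline concatenation. For an exact limit you would count tokens with a tokenizer rather than characters.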

Step 3: Performing the Semantic Search

To search, we convert the user’s query into a vector and then ask PostgreSQL to find the rows with the “closest” vectors.

$userQuery = "How do I fix a flat tire?";

// 1. Vectorize the query (use the same model as for the stored embeddings)
$queryVector = OpenAI::embeddings()->create([
    'model' => 'text-embedding-ada-002',
    'input' => $userQuery,
])->embeddings[0]->embedding;

// 2. SQL search using the cosine distance operator (<=>); smaller distance = closer match
$results = Article::query()
    ->whereNotNull('embedding') // skip rows that have no embedding yet
    ->selectRaw('*, embedding <=> ? AS distance', [json_encode($queryVector)])
    ->orderBy('distance', 'asc')
    ->take(5)
    ->get();

With suitable content in the table, the results can include articles about car maintenance or roadside assistance, even if they don’t contain the exact words “fix,” “flat,” or “tire.”
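
The abstract mentions hybrid search, while the query above is purely vector-based. One common way to combine a keyword ranking with a vector ranking is reciprocal rank fusion (RRF), where each document scores 1 / (k + rank) in every list it appears in and the scores are summed. The sketch below is an assumption about how that could look, not a method taken from this chapter: the function name is hypothetical, the keyword ranking is presumed to come from something like PostgreSQL full-text search, and k = 60 is simply the conventional default.

```php
<?php
declare(strict_types=1);

/**
 * Reciprocal rank fusion over several ranked lists of document ids.
 * Each list is ordered best-first. Returns the ids ordered by fused score,
 * best first. Ids missing from a list contribute nothing for that list.
 */
function reciprocalRankFusion(array $rankings, int $k = 60): array
{
    $scores = [];
    foreach ($rankings as $ranking) {
        foreach (array_values($ranking) as $position => $id) {
            $rank = $position + 1; // ranks start at 1
            $scores[$id] = ($scores[$id] ?? 0.0) + 1.0 / ($k + $rank);
        }
    }

    arsort($scores); // highest fused score first, ids preserved as keys

    return array_keys($scores);
}

// Hypothetical article ids from a keyword search and from the vector search.
$keywordRanking = [7, 3, 9];
$vectorRanking  = [3, 5, 7];

print_r(reciprocalRankFusion([$keywordRanking, $vectorRanking]));
```

Because RRF uses only rank positions, it sidesteps the problem that keyword relevance scores and cosine distances live on different scales.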
