Blog
Build a Smarter Search: Create AI-Powered Embeddings on Vector Databases



Tutorial
·
Mar 18, 2025
Imagine a search experience where you simply describe your symptoms in plain language—“persistent stomach pain with diarrhea and constipation”—and an intelligent system finds the perfect doctor based on their specialties. With our embedding search workflow built on BuildShip (or Firestore databases), that’s exactly what you can do. This solution not only lets you perform natural language queries but also adds the power to filter results by numeric criteria, such as geographic distance.
Why Embedding Search?
Traditional searches can miss the nuances of language. Embeddings solve this by converting words or entire paragraphs into high-dimensional numerical vectors. For example, think of it like this:
King – Man + Woman = Queen
In reality, embeddings work with hundreds of dimensions to capture the subtle relationships between words. This makes it possible for a system to understand that “persistent stomach pain” might closely relate to certain medical specialties.
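To make the analogy concrete, here is a minimal sketch with toy 3-dimensional vectors (real embedding models use hundreds or thousands of dimensions, and these particular values are illustrative only): the vector arithmetic for “king – man + woman” lands closest to “queen”.

```python
import math

# Toy 3-dimensional word vectors; the values are made up for illustration.
king  = [0.9, 0.8, 0.1]
man   = [0.5, 0.8, 0.1]
woman = [0.5, 0.2, 0.1]
queen = [0.9, 0.2, 0.1]

# Element-wise: king - man + woman
result = [k - m + w for k, m, w in zip(king, man, woman)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(result, queen), 3))  # → 1.0: closest to "queen"
```

Semantic search works the same way, just at higher dimensionality: queries and documents become vectors, and closeness in that space stands in for closeness in meaning.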
What You’ll Build
In this project, you’ll create a matching system that:
Extracts structured data from unstructured text: Convert plain-text inputs into structured data using a JSON schema.
Generates embeddings: Use the best-in-class model from OpenAI to convert text into vectors.
Stores and searches the data: Insert these embeddings into a BuildShip (or Firestore) database and run semantic search queries.
Applies numeric filters: For instance, filter doctors by geographic proximity using latitude and longitude data.
Step-by-Step Breakdown
1. Setting Up the Semantic Search Workflow
Choose Your Database:
Use either a BuildShip-integrated database or your existing Firestore setup. For this demo, the BuildShip database is selected because it’s built directly into the application.
Cloning the Template:
In the BuildShip template library, search for “semantic” to find the semantic search workflow template. This template comes with two key workflows:
Batch Generate Embeddings: This workflow takes your existing structured data (e.g., fields from an Airtable org chart) and creates embeddings for a chosen field.
Query Workflow: This allows natural language queries to be run against the stored embeddings, returning the best matches.
2. Data Preparation and Extraction
Unstructured Data Input:
Start by adding an input field for unstructured data—this could be a paragraph describing a doctor’s role, responsibilities, or expertise.
Defining the Data Schema:
Provide a JSON schema detailing the key fields to extract. For example, include fields like name, role, department, responsibilities, and expertise. This schema guides the AI in extracting the right information from the text.
Embedding Field Selection:
Specify which field from your extracted data will be converted into an embedding. For our doctor matching system, the “specialties” field might be the best candidate.
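A schema for this step might look like the following sketch. The field names follow the ones mentioned above; the exact structure (types, which fields are required) is an assumption, so adapt it to your own data.

```python
import json

# Hypothetical JSON schema guiding the extraction step.
# Field names come from the tutorial; types and "required" are assumptions.
doctor_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "role": {"type": "string"},
        "department": {"type": "string"},
        "responsibilities": {"type": "array", "items": {"type": "string"}},
        "expertise": {"type": "array", "items": {"type": "string"}},
        "specialties": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "specialties"],
}

print(json.dumps(doctor_schema, indent=2))
```

Keeping “specialties” as an array of short strings makes it a clean candidate for the embedding field, since each entry is a self-contained phrase.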
3. Generating Embeddings
Extraction and AI Processing:
The workflow extracts structured data from the unstructured text using natural language instructions. It then uses a high-quality OpenAI model to generate an embedding vector (a long list of numbers) for the selected field.
Tip: A lower temperature (e.g., 0.1) is used for more predictable outputs, reducing the risk of hallucinations.
Database Insertion:
The generated embedding, along with the structured data and raw text, is stored in the database under a specified collection name (e.g., “Doctor's List”).
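Outside of BuildShip, the same step can be sketched with a direct call to OpenAI’s embeddings endpoint (stdlib only). The model name `text-embedding-3-small` and the record layout are assumptions, not BuildShip’s exact internals; check OpenAI’s documentation for current model names.

```python
import json
import os
import urllib.request

def embed_text(text: str, model: str = "text-embedding-3-small") -> list:
    """Call OpenAI's embeddings REST API; requires OPENAI_API_KEY in the env.
    The model name is an assumption - substitute your preferred model."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/embeddings",
        data=json.dumps({"model": model, "input": text}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def build_document(raw_text: str, structured: dict, embedding: list) -> dict:
    """Combine raw text, extracted fields, and the vector into one record,
    mirroring the collection layout described above."""
    return {"raw_text": raw_text, **structured, "embedding": embedding}
```

A record built this way can then be inserted into the collection (e.g., “Doctor's List”) so that the raw text, the structured fields, and the vector all travel together.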
4. Querying the Database
Natural Language Query:
Users can enter a query like “severe headaches behind one eye with sensitivity to light,” which is then converted into an embedding vector using the same model.
The system performs a vector search (using cosine distance) in the database to find the closest matches.
Handling Empty Queries:
If no query is provided, the workflow defaults to returning all entries. This makes the system versatile for both filtered and unfiltered searches.
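A minimal in-memory version of this query step looks like the sketch below: rank stored documents by cosine distance to the query vector, and fall back to returning everything when no query is given. A real deployment would use the database’s native vector search rather than scanning in application code.

```python
import math

def cosine_distance(a: list, b: list) -> float:
    """1 - cosine similarity; smaller means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def search(query_vector, documents, top_k=3):
    """Return the top_k documents closest to query_vector.
    An empty query returns all entries, matching the fallback above."""
    if query_vector is None:
        return documents
    ranked = sorted(
        documents, key=lambda d: cosine_distance(query_vector, d["embedding"])
    )
    return ranked[:top_k]
```

For example, `search([1.0, 0.0], docs, top_k=1)` returns the single document whose embedding points most nearly in the same direction as the query.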
5. Adding Geographic Filtering (Bonus)
Latitude and Longitude Filtering:
To refine results, additional inputs for latitude, longitude, and a radius (in miles) can be added.
The workflow is modified (using AI-assisted editing) to include a numeric filter that only returns doctors within a specified distance.
Testing shows that when the latitude/longitude inputs are adjusted, the results update accordingly—ensuring that only nearby doctors are returned.
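The radius filter can be sketched with a great-circle (haversine) distance check in miles. The `latitude`/`longitude` field names on each doctor record are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))

def within_radius(doctors, lat, lon, radius_miles):
    """Keep only doctors whose stored coordinates fall inside the radius."""
    return [
        d for d in doctors
        if haversine_miles(lat, lon, d["latitude"], d["longitude"]) <= radius_miles
    ]
```

Applying this filter after the vector search keeps the semantic ranking while discarding doctors outside the requested distance.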
6. Integration and Deployment
Connecting via REST API:
Once the workflow is ready, connect a REST API trigger so that it can be called from your application. BuildShip automatically generates an API endpoint with a sample input structure.
Using AI Handoff:
An upcoming feature (AI handoff) will help generate clear instructions and examples for integrating this endpoint into your app, making it even easier to adopt.
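Calling the deployed endpoint from an application might look like this sketch (stdlib only). The endpoint URL and the input field names are placeholders; BuildShip generates the real endpoint and sample payload for you.

```python
import json
import urllib.request

def build_payload(query, latitude=None, longitude=None, radius_miles=None):
    """Assemble the request body; geographic fields are optional.
    Field names are placeholders - match them to your generated endpoint."""
    payload = {"query": query}
    if latitude is not None and longitude is not None:
        payload.update(
            {"latitude": latitude, "longitude": longitude, "radiusMiles": radius_miles}
        )
    return payload

def query_doctors(endpoint_url, **kwargs):
    """POST the payload to the workflow's REST trigger and return JSON."""
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(build_payload(**kwargs)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (hypothetical URL):
# results = query_doctors("https://example.buildship.run/doctor-search",
#                         query="persistent stomach pain after eating",
#                         latitude=40.7, longitude=-74.0, radius_miles=10)
```

Keeping the payload builder separate from the HTTP call makes the input structure easy to test before wiring it into your app.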
7. Testing and Refinement
End-to-End Testing:
Run the workflow and observe each step. For example, when testing with a query like “persistent stomach pain after eating,” the system successfully retrieves matching doctors based on specialties, even when latitude and longitude filtering is applied.
Adjusting and Troubleshooting:
Any issues, like type mismatches in extracted data (e.g., string versus object), are resolved by adjusting node configurations in BuildShip. The iterative testing ensures the workflow is robust before deployment.
For a complete step-by-step video tutorial, click below:
Final Thoughts
This embedding search workflow showcases how AI can revolutionize the way applications perform natural language queries. By converting textual descriptions into meaningful numerical embeddings, the system is capable of matching complex queries to relevant entries—whether it’s finding the right doctor for specific symptoms or filtering results by geographic proximity.
The flexibility of BuildShip, combined with state-of-the-art embedding models, allows you to build powerful, scalable search applications without writing a single line of code. Whether you’re using the integrated BuildShip database or your own Firestore setup, this workflow is a prime example of how AI can simplify and enhance search functionality.
Ready to revolutionize your application’s search? Dive into the tutorial and build a smarter, more intuitive search system today. Happy building!