Build a Multimodal WhatsApp AI Bot That Can Respond to Text, Images and Voice Messages

Tutorial · Jun 18, 2025

Looking to create a WhatsApp bot that can handle text, images, and voice messages? In this guide, we'll walk through how to build a powerful multimodal WhatsApp AI bot using BuildShip's no-code platform. Remix the template here to follow along.

Note: We've recently released a simplified version of this workflow that uses just one flow instead of two. If you want the fastest way to get this working, check out our updated guide here.

Building a Multimodal WhatsApp AI Bot with BuildShip

This multimodal WhatsApp AI bot responds intelligently across formats. It can answer text-based questions with fresh, real-time information, analyze images and reply with insights, generate images from text prompts, and transcribe and respond to voice messages, making it a versatile assistant for everyday communication.

Prerequisites

Before we start building, you'll need:

- A Meta developer account

- A business app created in the Meta dashboard

- A verified business on Facebook

- A registered WhatsApp business phone number

These requirements ensure your bot can operate in a live environment where anyone with WhatsApp can message it.

Building the WhatsApp bot

Step 1: Clone the template

Start by visiting the BuildShip templates library and searching for "WhatsApp." Select the multimodal WhatsApp bot template to clone it to your project.

This will create two workflows in your project:

- A verification flow (for webhook setup)

- The main bot flow

Step 2: Set up the verification flow

Meta only forwards message events to a webhook endpoint that it has verified. The verification flow handles that handshake for your BuildShip workflow. Here's how to set it up:

1. Open the verification flow and configure the REST API Call trigger:

- Remove any existing API path

- Enter /buildship-whatsapp-bot (or your preferred path name)

- Set the HTTP method to GET

- Click Connect

2. In the token verification node, enter any value you want as your verify token (e.g., "secretcode"). Remember this value; you'll enter it in the Meta dashboard in Step 3.

3. Ship your workflow and copy the workflow endpoint URL.
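For readers curious about what the verification flow does behind the scenes, here is a minimal Python sketch of the handshake. Meta sends a GET request with `hub.mode`, `hub.verify_token`, and `hub.challenge` query parameters, and expects the challenge echoed back when the token matches. The function name and the `"secretcode"` value are illustrative assumptions, not BuildShip's actual node code.

```python
from __future__ import annotations

from urllib.parse import parse_qs

# Example value from Step 2; replace it with your own verify token.
VERIFY_TOKEN = "secretcode"


def handle_verification(query_string: str, verify_token: str = VERIFY_TOKEN) -> tuple[int, str]:
    """Return (status_code, body) for Meta's webhook verification GET request."""
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    token = params.get("hub.verify_token", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]

    # Echo the challenge only when Meta is subscribing and the token matches.
    if mode == "subscribe" and token == verify_token:
        return 200, challenge
    return 403, "Forbidden"
```

If the token in the Meta dashboard differs from the one in the node, the handshake fails and "Verify and Save" reports an error, which is why the two values must match exactly.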

Step 3: Configure webhooks in Meta dashboard

1. Go to the Meta dashboard and navigate to the WhatsApp product configuration page.

2. Add a new webhook with your BuildShip API endpoint URL as the callback URL.

3. Enter the verify token you chose in Step 2.

4. Click "Verify and Save."

5. Subscribe to the "messages" field to receive message events.

Step 4: Configure the main bot flow

Now let's set up the main flow that will handle the messages:

1. Configure the WhatsApp bot trigger:

- Set the path to match exactly what you used in the verification flow (e.g., /buildship-whatsapp-bot)

- Set the HTTP method to POST

- Enter your App ID and App Secret from the Meta dashboard

- Connect the trigger

2. Understand the workflow structure:

- The utility extract message node pulls key information from incoming messages

- A branch node checks if we have an actual message to process

- A switch node directs the flow based on message type (text, image, audio)
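The extract, branch, and switch steps above can be sketched in Python as follows. The payload shape (`entry` → `changes` → `value` → `messages`) follows the WhatsApp Cloud API webhook format; the function names and the returned dictionary keys are hypothetical and are not the template's actual node outputs.

```python
from __future__ import annotations

from typing import Any, Optional

SUPPORTED_TYPES = ("text", "image", "audio")


def extract_message(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Pull the first message out of a webhook payload, or None if there is none."""
    try:
        message = payload["entry"][0]["changes"][0]["value"]["messages"][0]
    except (KeyError, IndexError, TypeError):
        # Status updates (sent, delivered, read) carry no "messages" array.
        return None

    kind = message.get("type")
    body = message.get(kind) if kind else None
    if not isinstance(body, dict):
        body = {}

    return {
        "from": message.get("from"),
        "type": kind,
        "text": body.get("body") if kind == "text" else None,
        "media_id": body.get("id") if kind in ("image", "audio") else None,
        "caption": body.get("caption") if kind == "image" else None,
    }


def route(message: Optional[dict[str, Any]]) -> str:
    """Mirror the branch and switch nodes: pick which path handles the message."""
    if message is None:
        return "ignore"  # branch node: nothing to process
    if message["type"] in SUPPORTED_TYPES:
        return message["type"]  # switch node: text, image, or audio path
    return "fallback"  # unsupported message type
```

Checking for a missing message first matters because Meta also posts delivery and read receipts to the same endpoint, and those should not trigger a reply.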

Step 5: Understanding the message handling logic

Let's break down how each message type is processed:

Text messages

When a text message is received:

1. The OpenAI JSON generator node determines if the message is requesting image generation

2. If it's an image generation request:

- The OpenAI image generator node creates the image

- The image is uploaded to BuildShip storage

- The autoresponder node sends the image back to the user

3. If it's a regular text query:

- The Perplexity AI search node generates a response

- The autoresponder node sends the text back to the user
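As a rough illustration of the text branch, the sketch below wires the steps together with placeholder callables. The `classify`, `generate_image`, `upload`, and `search` parameters stand in for the OpenAI JSON generator, OpenAI image generator, BuildShip storage, and Perplexity nodes; the JSON schema with `is_image_request` and `prompt` is an assumption made for this example, not the template's actual schema.

```python
from __future__ import annotations

import json
from typing import Callable


def handle_text(
    text: str,
    classify: Callable[[str], str],
    generate_image: Callable[[str], bytes],
    upload: Callable[[bytes], str],
    search: Callable[[str], str],
) -> dict:
    """Return the reply to send: an image link for image requests, text otherwise."""
    # The classifier is assumed to answer with JSON such as
    # {"is_image_request": true, "prompt": "a red fox"} (hypothetical schema).
    try:
        decision = json.loads(classify(text))
    except json.JSONDecodeError:
        decision = {}
    if not isinstance(decision, dict):
        decision = {}

    if decision.get("is_image_request") is True:
        image_bytes = generate_image(decision.get("prompt") or text)
        return {"type": "image", "link": upload(image_bytes)}

    # Anything else, including malformed classifier output, is a regular query.
    return {"type": "text", "body": search(text)}
```

Treating unparseable classifier output as a regular text query is a design choice in this sketch: it keeps the bot responsive even when the model returns something unexpected.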

Image messages

When an image is received:

1. The download media node retrieves the image from WhatsApp

2. GPT Vision analyzes the image based on the caption or a default prompt

3. The autoresponder node sends the analysis back as text
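To make the download step more concrete, here is a hedged Python sketch of how media retrieval works with the WhatsApp Cloud API: one request looks up a short-lived URL for the media ID, and a second request downloads the bytes from that URL, both authenticated with the access token. The API version `v19.0`, the function names, and the default prompt text are assumptions for this example; the template's download media node may differ in its details.

```python
from __future__ import annotations

import json
from typing import Optional
from urllib.request import Request, urlopen

GRAPH_BASE = "https://graph.facebook.com/v19.0"  # assumed API version
DEFAULT_PROMPT = "Describe this image."  # hypothetical default prompt


def media_lookup_request(media_id: str, access_token: str) -> Request:
    """Build the request that resolves a media ID to a temporary download URL."""
    return Request(
        f"{GRAPH_BASE}/{media_id}",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def media_download_request(media_url: str, access_token: str) -> Request:
    """Build the request that fetches the media bytes; the token is needed here too."""
    return Request(media_url, headers={"Authorization": f"Bearer {access_token}"})


def download_media(media_id: str, access_token: str) -> bytes:
    """Perform both requests and return the raw media bytes (makes network calls)."""
    with urlopen(media_lookup_request(media_id, access_token)) as response:
        media_url = json.load(response)["url"]
    with urlopen(media_download_request(media_url, access_token)) as response:
        return response.read()


def vision_prompt(caption: Optional[str]) -> str:
    """Use the user's caption as the prompt, falling back to a default."""
    if caption and caption.strip():
        return caption.strip()
    return DEFAULT_PROMPT
```

The same two-request pattern applies to voice messages, since audio is delivered as a media ID as well.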

Audio messages

When a voice message is received:

1. The download media node retrieves the audio

2. The utility convert MP3 node converts the audio from WhatsApp's OGG format to MP3

3. The Whisper speech-to-text node transcribes the audio

4. The Perplexity AI search node generates a response based on the transcription

5. The autoresponder node sends the text response back
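If you ever need to reproduce the conversion step outside BuildShip, one common approach is to call `ffmpeg`. The sketch below assumes `ffmpeg` is installed and on your `PATH`; the function names are hypothetical, and this is not necessarily how the template's convert node works internally.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def ogg_to_mp3_command(source: Path, target: Path) -> list[str]:
    """Build the ffmpeg argument list for an OGG-to-MP3 conversion."""
    return [
        "ffmpeg",
        "-y",                # overwrite the target file if it already exists
        "-i", str(source),   # input file (WhatsApp voice notes arrive as OGG)
        "-codec:a", "libmp3lame",  # encode the audio stream as MP3
        "-q:a", "4",         # variable bitrate quality level
        str(target),
    ]


def convert_ogg_to_mp3(source: Path) -> Path:
    """Run ffmpeg and return the path of the new MP3 file (requires ffmpeg)."""
    target = source.with_suffix(".mp3")
    subprocess.run(ogg_to_mp3_command(source, target), check=True, capture_output=True)
    return target
```

Keeping the command builder separate from the function that runs it makes the conversion easy to inspect or log before anything is executed.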

Fallback

If none of the conditions match, the bot sends a generic message informing the user they've sent an unsupported message type.

Step 6: Configure the autoresponder

For the autoresponder node to work in a live environment, you'll need to enter your WhatsApp access token. Meta provides two options:

- Temporary access tokens for test mode

- Permanent tokens for live environments

Make sure to add your token to all instances of the autoresponder node in your workflow.
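To show where the access token ends up, here is a small Python sketch of the kind of request a reply involves with the WhatsApp Cloud API: a POST to the phone number's `messages` endpoint with the token in a Bearer `Authorization` header. The API version `v19.0` and the function names are assumptions for this example, and the autoresponder node takes care of this for you inside BuildShip.

```python
from __future__ import annotations

import json
from urllib.request import Request

GRAPH_BASE = "https://graph.facebook.com/v19.0"  # assumed API version


def text_payload(to: str, body: str) -> dict:
    """Message body for a plain text reply."""
    return {"messaging_product": "whatsapp", "to": to, "type": "text", "text": {"body": body}}


def image_payload(to: str, link: str) -> dict:
    """Message body for an image reply, referencing a publicly reachable URL."""
    return {"messaging_product": "whatsapp", "to": to, "type": "image", "image": {"link": link}}


def send_request(phone_number_id: str, access_token: str, payload: dict) -> Request:
    """Build (but do not send) the POST request that delivers a reply."""
    return Request(
        f"{GRAPH_BASE}/{phone_number_id}/messages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send the message. An expired temporary token typically surfaces as an authorization error at this step, which is why a permanent token is recommended for live use.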

Step 7: Ship your workflow

Once everything is configured, ship your workflow to bring your WhatsApp bot to life. Your bot is now ready to handle messages from anyone with a WhatsApp account.

Customizing your bot

The template we've provided is just a starting point. Here are some ways you can customize it:

- Connect to your CRM or Google Sheets to save user interactions

- Integrate with other AI models beyond OpenAI and Perplexity

- Add custom logic for specific business use cases

- Implement multilingual support for global audiences

- Create personalized responses based on user data

BuildShip's node library offers integrations with popular APIs, AI models, databases, and more. And if you need something truly custom, you can use BuildShip's "Build with AI" feature to generate custom nodes in seconds.

A complete video guide accompanies this tutorial.

Try it yourself

Ready to build your own multimodal WhatsApp bot? You can remix our template and start customizing it for your needs. We'd love to see what you build with this template!

Start building your
BIGGEST ideas
in the *simplest* of ways.
