How to Build a Voice-to-Translation App with BuildShip: Step-by-Step Guide

Tutorial · Jul 24, 2025
Ever wanted to create an app that can translate spoken words from one language to another? With BuildShip's powerful workflow automation platform, you can build a complete voice-to-translation application without writing complex code. This guide walks you through creating an app that lets users upload audio in any language and receive both text and audio translations in their desired language.
What You'll Build
By the end of this tutorial, you'll have created an application that:
- Accepts audio file uploads in any language
- Transcribes speech to text using OpenAI's Whisper API
- Translates the transcribed text to a target language
- Converts the translated text back to speech
- Delivers all three outputs (original transcription, translated text, and translated audio) to your users
Let's get started!
Setting Up Your BuildShip Workflow
Step 1: Create a New Workflow and Define Inputs
Start by creating a new workflow in BuildShip. You'll need two input parameters:
- A file input named "voice file" for the audio upload
- A string input named "language" where users specify their desired translation language

Step 2: Add Speech-to-Text Transcription
1. Add the 'Whisper speech-to-text' node to your workflow
2. Configure the node:
- Set the input to the "voice file" from your workflow inputs
- Set response format to "verbose JSON"
- Enable "timestamps by segment" for detailed output

Before testing, make sure to add your OpenAI API key:
1. Click "Add Key"
2. Navigate to OpenAI's platform to create a secret key
3. Paste your key into BuildShip

Test the node by uploading a sample audio file. You should receive a transcription with timestamps for each segment.
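
If you're curious what this node does behind the scenes, here is a minimal sketch of the equivalent call using OpenAI's official Node SDK; the file name is a stand-in for your upload:

```ts
import fs from "node:fs";
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Transcribe with per-segment timestamps, mirroring the node settings above
const transcription = await openai.audio.transcriptions.create({
  file: fs.createReadStream("sample-voice.mp3"), // stand-in for the uploaded file
  model: "whisper-1",
  response_format: "verbose_json",
  timestamp_granularities: ["segment"],
});

console.log(transcription.text);     // the full transcription
console.log(transcription.segments); // [{ start, end, text, ... }, ...]
```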

Step 3: Translate the Transcribed Text
1. Add the 'OpenAI text generator' node
2. For instructions, enter: "Please translate the provided text into {{language}}"
3. For the prompt, select the transcription output from the Whisper node

Test this step by uploading an audio file and selecting a target language (like German or Spanish). You should see the translated text in your output.
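
Under the hood this is a standard chat-completion call. As a rough sketch (the model choice and variable names here are assumptions, not the node's exact internals):

```ts
import OpenAI from "openai";

const openai = new OpenAI();
const language = "German";                      // the "language" workflow input
const transcriptionText = "Hola, ¿cómo estás?"; // output of the Whisper node

const completion = await openai.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    // Mirrors the {{language}} template in the node's instructions
    { role: "system", content: `Please translate the provided text into ${language}` },
    { role: "user", content: transcriptionText },
  ],
});

const translatedText = completion.choices[0].message.content;
console.log(translatedText); // e.g. "Hallo, wie geht es dir?"
```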

Step 4: Convert Translated Text to Speech
1. Add the 'OpenAI text-to-speech' node
2. Set the input text to the output from the text generator node
3. Optionally customize the voice or speech speed

When you test this step, you'll receive a long base64-encoded string representing the audio file.
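
The node wraps OpenAI's speech endpoint, which returns binary audio; encoding that as base64 yields the string you see. A minimal sketch, with the model and voice as assumptions:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

const speech = await openai.audio.speech.create({
  model: "tts-1",
  voice: "alloy",                   // try other voices to change the speaker
  speed: 1.0,                       // optional: 0.25–4.0
  input: "Hallo, wie geht es dir?", // the translated text from the previous step
});

// Encode the binary response as base64 — the same form the node outputs
const base64Audio = Buffer.from(await speech.arrayBuffer()).toString("base64");
```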

Step 5: Convert Base64 to a Playable Audio File
1. Add the BuildShip 'File storage upload base64 file' node
2. Set the file content to the output from the text-to-speech node
3. For the filename, use JavaScript to generate a dynamic name: `Date.now() + ".mp3"` (see the sketch below)
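
Conceptually, the node decodes the base64 string, writes it out under that timestamped name, and uploads the result to BuildShip's file storage, returning a public URL. The local equivalent looks roughly like this:

```ts
import fs from "node:fs";

const base64Audio = "..."; // the text-to-speech node's output (placeholder)
const filename = Date.now() + ".mp3";

// Decode and write the audio; BuildShip additionally uploads it and returns a URL
fs.writeFileSync(filename, Buffer.from(base64Audio, "base64"));
```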

Step 6: Configure the Final Output
Create a custom output with three components:
1. The original transcription
2. The translated text
3. The URL to the translated audio file
Test your complete workflow to ensure all three outputs are generated correctly.
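
If it helps to picture the result, the JSON your workflow returns might look like the shape below; the exact key names depend on how you label the output fields in BuildShip:

```ts
// Hypothetical response shape — match the keys to your custom output labels
interface TranslationResult {
  transcription: string; // the original speech as text
  translation: string;   // the text in the target language
  audioUrl: string;      // public URL of the translated audio file
}
```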

Adding a Trigger and API Endpoint
To make your workflow accessible as an API:
1. Add a 'REST API file upload' trigger
2. Map your inputs:
- Connect the file object from the trigger data to your "voice file" input
- Map the language parameter from the trigger body to your "language" input
3. Deploy your workflow changes (a sample request is sketched below)
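
Once deployed, you can exercise the endpoint directly. Here is a sketch of the multipart request it expects; the URL and the field names "voiceFile" and "language" are assumptions, so match them to your trigger's mapping:

```ts
// Assumed endpoint URL — copy the real one from your deployed trigger
const ENDPOINT_URL = "https://<your-project>.buildship.run/voice-translate";

const fileInput = document.querySelector<HTMLInputElement>("#voice-file");
const audioFile = fileInput?.files?.[0];
if (!audioFile) throw new Error("No audio file selected");

const form = new FormData();
form.append("voiceFile", audioFile); // maps to the "voice file" workflow input
form.append("language", "German");   // maps to the "language" workflow input

const res = await fetch(ENDPOINT_URL, {
  method: "POST",
  body: form, // fetch sets the multipart boundary header automatically
});
console.log(await res.json());
```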
Creating a User Interface with Bolt.new
Now that your API is ready, you can create a simple interface:
1. Go to Bolt.new
2. Create a new app
3. Design an interface with:
- A file upload component for audio files
- A text input or dropdown for selecting the target language
- Display areas for showing the transcription, translation, and audio playback

To connect your Bolt app to your BuildShip workflow:
1. Go to the Usage tab in BuildShip
2. Copy the API usage code
3. Implement this code in your Bolt app to handle the API calls (see the handler sketch below)
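
The generated usage code varies, but the handler you wire into the Bolt app will look something like this sketch; the element IDs and response keys are assumptions based on the output configured in Step 6:

```ts
async function translateVoice(endpointUrl: string, form: FormData): Promise<void> {
  const res = await fetch(endpointUrl, { method: "POST", body: form });
  if (!res.ok) {
    throw new Error(`Workflow call failed: ${res.status} ${res.statusText}`);
  }

  // Keys assumed from the custom output defined in Step 6
  const { transcription, translation, audioUrl } = await res.json();

  document.querySelector("#transcription")!.textContent = transcription;
  document.querySelector("#translation")!.textContent = translation;
  document.querySelector<HTMLAudioElement>("#player")!.src = audioUrl; // an <audio> element
}
```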

Debugging and Testing
If you encounter issues with your integration:
1. Check the BuildShip logs to verify your workflow is processing correctly
2. Ensure your API calls from Bolt include the correct parameters
3. Verify that your Bolt app correctly handles the response format
Customization Options
Your voice-to-translation app can be customized in several ways:
- Voice options: Experiment with different voices in the text-to-speech node
- Speech speed: Adjust the speed parameter for faster or slower audio output
- Additional languages: Support multiple language options in your interface
- Error handling: Add validation and error messages for a better user experience
- Timestamps: Use the segment timestamps to create synchronized subtitles (see the sketch below)
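
For example, the per-segment timestamps from Step 2 map almost directly onto the SRT subtitle format. A minimal sketch of that conversion, assuming segment times in seconds as in Whisper's verbose JSON:

```ts
interface Segment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Format seconds as the SRT timestamp "HH:MM:SS,mmm"
function toTimestamp(seconds: number): string {
  const ms = Math.round(seconds * 1000);
  const h = String(Math.floor(ms / 3_600_000)).padStart(2, "0");
  const m = String(Math.floor((ms % 3_600_000) / 60_000)).padStart(2, "0");
  const s = String(Math.floor((ms % 60_000) / 1_000)).padStart(2, "0");
  return `${h}:${m}:${s},${String(ms % 1_000).padStart(3, "0")}`;
}

// Turn Whisper segments into an SRT document
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      `${i + 1}\n${toTimestamp(seg.start)} --> ${toTimestamp(seg.end)}\n${seg.text.trim()}\n`)
    .join("\n");
}
```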
Limitations and Considerations
When building your voice-to-translation app, keep in mind:
- API costs: OpenAI's APIs have usage-based pricing, so monitor your usage
- File size limits: Very large audio files may encounter processing limits
- Language support: While Whisper supports many languages, accuracy varies
- Translation quality: Complex idioms or technical terms may not translate perfectly
Conclusion
Congratulations! You've built a complete voice-to-translation application using BuildShip's workflow automation platform. This powerful tool combines OpenAI's speech recognition, translation, and text-to-speech capabilities into a seamless experience for your users.
The best part is that you've accomplished this without writing complex code or managing multiple API integrations manually. BuildShip handles the orchestration, allowing you to focus on creating a great user experience.
Ready to take your app further? Consider adding features like:
- Support for real-time recording instead of just file uploads
- Multiple language detection for automatic source language identification
- Custom vocabulary for domain-specific terminology
- Integration with other services in your tech stack
We can't wait to see what you build with these powerful tools!