Minimal Web Example
This guide walks you through a minimal, self-contained web example using React and TypeScript. It demonstrates how to connect to the Sanas Language Translation API via WebRTC to get real-time audio translation and text captions.
The full example is broken down into logical steps, explaining how each piece of the puzzle works to establish a connection and handle the data flow.
If you would like to jump right into the full demo, check out:
1. Managing State for Captions
First, we need a way to manage the incoming transcription and translation data. We create a custom React hook, useLanguageTranslationCaptions, to handle the state for both pending (partial) and complete sentences.
This hook exposes a handleMessage function that will be connected to our WebRTC data channel. When a message of type transcription or translation arrives, this function updates the appropriate state variable, which causes our UI to re-render with the latest captions.
import { LTMessage } from "@/types/ltMessages";
import { Phrase, Word } from "@/types/words";
import { useCallback, useState } from "react";

function useLanguageTranslationCaptions() {
  const [pendingTranscriptions, setPendingTranscriptions] = useState<Word[]>([]);
  const [completeTranscriptions, setCompleteTranscriptions] = useState<Phrase[]>([]);
  const [pendingTranslations, setPendingTranslations] = useState<Word[]>([]);
  const [completeTranslations, setCompleteTranslations] = useState<Phrase[]>([]);

  const handleMessage = useCallback((message: LTMessage) => {
    switch (message.type) {
      case "transcription":
        if (message.transcription.type === "complete") {
          if (message.transcription.transcriptions.length > 0) {
            // Append the finished phrase (an array of words) as a single entry.
            setCompleteTranscriptions((prev) => [
              ...prev,
              message.transcription.transcriptions,
            ]);
          }
          setPendingTranscriptions([]);
        } else {
          setPendingTranscriptions(message.transcription.transcriptions);
        }
        break;
      case "translation":
        // Logic for handling translation messages
        if (message.translation.type === "complete") {
          if (message.translation.translations.length > 0) {
            setCompleteTranslations((prev) => [
              ...prev,
              message.translation.translations,
            ]);
          }
          setPendingTranslations([]);
        } else {
          setPendingTranslations(message.translation.translations);
        }
        break;
      default:
        console.error("Unknown message type", message.type);
    }
  }, []);

  const reset = useCallback(() => {
    setPendingTranscriptions([]);
    setCompleteTranscriptions([]);
    setPendingTranslations([]);
    setCompleteTranslations([]);
  }, []);

  return {
    reset,
    state: {
      pendingTranscriptions,
      completeTranscriptions,
      pendingTranslations,
      completeTranslations,
    },
    handleMessage,
  };
}
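The imports at the top of this hook pull LTMessage, Word, and Phrase from shared type files that are not shown in this walkthrough. As a rough sketch only, shapes consistent with how the hook uses these types might look like the following; the authoritative definitions live under @/types in the full demo and may differ.

// Sketch: a caption word and a finished phrase (an array of words).
export interface Word {
  word: string;
}
export type Phrase = Word[];

// Sketch: the envelope of messages exchanged over the data channel.
// Field names follow the code above; the demo's real definitions may differ.
export interface LTMessage {
  type: "reset" | "ready" | "transcription" | "translation";
  reset?: {
    id: string;
    lang_in: string;
    lang_out: string;
    voice_id: string | null;
    glossary: string[];
  };
  ready?: { id: string };
  transcription?: {
    type: string; // "complete" for finalized phrases; anything else is treated as in-progress
    transcriptions: Word[];
  };
  translation?: {
    type: string; // same convention as transcription
    translations: Word[];
  };
}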
2. Orchestrating the WebRTC Connection
Next, we create another custom hook, useLanguageTranslationConnection, to manage the entire WebRTC connection lifecycle. This hook encapsulates all the logic for starting, stopping, and managing the connection. Let's break down its internal functions; a sketch that assembles them back into the hook appears at the end of this section.
Getting User Audio
The createInputStream function uses the browser's navigator.mediaDevices.getUserMedia API to request access to the user's microphone. This provides the MediaStream that we will send to the Sanas API for translation.
async function createInputStream() {
  const inputStream = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      echoCancellation: true,
      noiseSuppression: false,
      sampleRate: SAMPLE_RATE,
      autoGainControl: true,
    },
  });
  return inputStream;
}
Creating the Peer Connection
The createPeerConnection function initializes the RTCPeerConnection object, which is the heart of a WebRTC session. 🤝
- peerConnection.ontrack: This event handler is triggered when the translated audio stream arrives from the server. We then set this stream as our output audio.
- peerConnection.onconnectionstatechange: This monitors the connection status and automatically calls our stopCall function if the connection is lost.
function createPeerConnection(
  onAudioTrack: (audioTrack: MediaStream) => void,
  stopCall: () => void,
) {
  const peerConnection = new RTCPeerConnection();

  peerConnection.ontrack = (e) => {
    // The translated audio arrives as a remote track; hand its stream to the UI.
    onAudioTrack(e.streams[0]);
  };

  peerConnection.onconnectionstatechange = () => {
    console.log(`connection state: ${peerConnection.connectionState}`);
    if (
      peerConnection.connectionState === "disconnected" ||
      peerConnection.connectionState === "failed"
    ) {
      stopCall();
    }
  };

  return peerConnection;
}
Establishing the Connection
To connect, the client and server must agree on connection parameters using the Session Description Protocol (SDP).
createOffer: We generate an SDP "offer" that describes our desired connection settings (e.g., we want to receive audio).
async function createOffer(peerConnection: RTCPeerConnection) {
  const offer = await peerConnection.createOffer({
    offerToReceiveAudio: true,
    offerToReceiveVideo: false,
  });
  await peerConnection.setLocalDescription(offer);
  return offer;
}
receiveAnswer: We send this offer in a POST request to the /session endpoint. The server responds with an SDP "answer". We then set this answer as the remoteDescription on our peer connection, which finalizes the connection parameters.
async function receiveAnswer(
  peerConnection: RTCPeerConnection,
  offer: RTCSessionDescriptionInit,
) {
  const sdpResponse = await fetch(
    `${process.env.NEXT_PUBLIC_LT_ENDPOINT}/session`,
    {
      method: "POST",
      body: JSON.stringify({
        ...offer,
        input_sample_rate: SAMPLE_RATE,
        output_sample_rate: SAMPLE_RATE,
      }),
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY", // Replace with your key
      },
    },
  );
  const answer = await sdpResponse.json();
  await peerConnection.setRemoteDescription(answer);
}
Initializing the Translation Session
With the connection established, we must initialize the translation session over the WebRTC Data Channel. The data channel is used for sending control messages and receiving text captions, separate from the audio stream.
- createDataChannel: A data channel named "messaging" is created (a sketch of this helper follows below).
- startTranslationSession: We wait for the data channel to open. Then, we send a reset message containing the input/output languages and a unique ID.
- Wait for Ready: The client listens for a ready message back from the server with a matching ID. Once received, the session is officially started! 🎉
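The createDataChannel helper itself isn't shown in this walkthrough. A minimal sketch, assuming it simply creates the "messaging" channel and forwards each parsed message to a callback (such as handleMessage from our captions hook), might look like this:

function createDataChannel(
  peerConnection: RTCPeerConnection,
  onMessage: (message: LTMessage) => void,
) {
  // Create the channel before the offer so it is included in the SDP negotiation.
  const dataChannel = peerConnection.createDataChannel("messaging");
  dataChannel.addEventListener("message", (event) => {
    // Forward every caption/control message to the caller.
    onMessage(JSON.parse(event.data) as LTMessage);
  });
  return dataChannel;
}

The startTranslationSession function from the demo is shown next.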
async function startTranslationSession(
  dataChannel: RTCDataChannel,
  {
    langIn,
    langOut,
    voiceId,
    glossary,
  }: {
    langIn: string;
    langOut: string;
    voiceId: string | null;
    glossary: string[];
  },
) {
  // Wait for data channel to be open
  await new Promise((resolve, reject) => {
    const removeListeners = () => {
      dataChannel.removeEventListener("open", onOpen);
      dataChannel.removeEventListener("error", onError);
    };
    const onOpen = () => {
      removeListeners();
      resolve(true);
    };
    const onError = (event: Event) => {
      removeListeners();
      reject(new Error(`Error: ${event.type}`));
    };
    dataChannel.addEventListener("open", onOpen);
    dataChannel.addEventListener("error", onError);
  });

  // Send reset message, and wait for corresponding ready message
  const resetMessage: LTMessage = {
    type: "reset",
    reset: {
      id: window.crypto.randomUUID(),
      lang_in: langIn,
      lang_out: langOut,
      voice_id: voiceId,
      glossary: glossary,
    },
  };
  console.log("Sending reset message", resetMessage);

  await new Promise((resolve) => {
    const onMessage = (event: MessageEvent) => {
      const message: LTMessage = JSON.parse(event.data);
      if (
        message.type === "ready" &&
        message.ready.id === resetMessage.reset.id
      ) {
        resolve(true);
        dataChannel.removeEventListener("message", onMessage);
      }
    };
    dataChannel.addEventListener("message", onMessage);
    dataChannel.send(JSON.stringify(resetMessage));
  });
  console.log("Language translation session ready");
}
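The helpers above are orchestrated by the useLanguageTranslationConnection hook itself, which the demo defines alongside them. A minimal sketch of how they might be wired together, assuming the createDataChannel sketch above and leaving out the demo's full error handling:

// Requires: import { useCallback, useRef, useState } from "react";
function useLanguageTranslationConnection(onMessage: (message: LTMessage) => void) {
  const [outputAudio, setOutputAudio] = useState<MediaStream | null>(null);
  const peerConnectionRef = useRef<RTCPeerConnection | null>(null);
  const inputStreamRef = useRef<MediaStream | null>(null);

  const stopCall = useCallback(() => {
    // Stop the microphone and tear down the peer connection.
    inputStreamRef.current?.getTracks().forEach((track) => track.stop());
    peerConnectionRef.current?.close();
    inputStreamRef.current = null;
    peerConnectionRef.current = null;
    setOutputAudio(null);
  }, []);

  const startCall = useCallback(
    async (config: {
      langIn: string;
      langOut: string;
      voiceId: string | null;
      glossary: string[];
    }) => {
      // 1. Capture microphone audio.
      const inputStream = await createInputStream();
      inputStreamRef.current = inputStream;

      // 2. Create the peer connection and send our audio over it.
      const peerConnection = createPeerConnection(setOutputAudio, stopCall);
      peerConnectionRef.current = peerConnection;
      inputStream.getTracks().forEach((track) => peerConnection.addTrack(track, inputStream));

      // 3. Open the data channel for control messages and captions.
      const dataChannel = createDataChannel(peerConnection, onMessage);

      // 4. Exchange the SDP offer/answer with the /session endpoint.
      const offer = await createOffer(peerConnection);
      await receiveAnswer(peerConnection, offer);

      // 5. Initialize the translation session over the data channel.
      await startTranslationSession(dataChannel, config);
    },
    [onMessage, stopCall],
  );

  return { startCall, stopCall, outputAudio };
}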
3. The User Interface (UI)
Finally, the Demo component ties everything together:
- It uses our two custom hooks: useLanguageTranslationCaptions for state and useLanguageTranslationConnection for WebRTC logic.
- An <audio> element is used to play the incoming translated audio stream. A useRef and useEffect ensure the srcObject of the audio player is updated when the stream is available.
- The startCallWrapper and stopCallWrapper functions handle the UI state (isLoading, isRunning) and call the corresponding functions from our connection hook.
- The component renders the start/stop buttons and maps over the state variables to display the complete and partial transcriptions and translations.
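The component also references a config object defined outside this snippet, with the same shape that startTranslationSession expects. A minimal sketch with placeholder values (the language codes and voice are assumptions, not the demo's actual settings):

const config = {
  langIn: "en-US",  // placeholder source language
  langOut: "es-ES", // placeholder target language
  voiceId: null,    // use the service's default voice
  glossary: [],     // no custom glossary terms
};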
export default function Demo() {
  const [isRunning, setIsRunning] = useState(false);
  const [isLoading, setIsLoading] = useState(false);

  const { state, handleMessage, reset } = useLanguageTranslationCaptions();
  const { startCall, stopCall, outputAudio } =
    useLanguageTranslationConnection(handleMessage);

  const audio = useRef<HTMLAudioElement | null>(null);

  useEffect(() => {
    if (audio.current) {
      audio.current.srcObject = outputAudio;
    }
  }, [outputAudio]);

  const startCallWrapper = useCallback(() => {
    setIsRunning(true);
    setIsLoading(true);
    startCall(config)
      .catch(alert)
      .finally(() => setIsLoading(false));
  }, [startCall]);

  const stopCallWrapper = useCallback(() => {
    setIsRunning(false);
    stopCall();
    reset();
  }, [stopCall, reset]);

  return (
    <div className="p-6 max-w-lg mx-auto bg-white rounded-xl shadow-md space-y-4">
      <h1 className="text-2xl font-bold text-center text-gray-800">
        Language Translation ({config.langIn} to {config.langOut})
      </h1>
      <div className="flex justify-center space-x-4">
        {isLoading ? (
          <div className="text-center text-gray-500">Loading...</div>
        ) : isRunning ? (
          <button
            onClick={stopCallWrapper}
            className="px-4 py-2 bg-red-500 text-white rounded-lg hover:bg-red-600 focus:outline-none focus:ring-2 focus:ring-red-400 cursor-pointer"
          >
            Stop Call
          </button>
        ) : (
          <button
            onClick={startCallWrapper}
            className="px-4 py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 focus:outline-none focus:ring-2 focus:ring-blue-400 cursor-pointer"
          >
            Start Call
          </button>
        )}
      </div>
      <audio ref={audio} autoPlay className="w-full mt-4" />
      {!isLoading && isRunning ? (
        <>
          <h2 className="text-xl font-semibold text-gray-700">
            Complete Transcriptions
          </h2>
          <div className="space-y-2">
            {state.completeTranscriptions.map((transcription, index) => (
              <div key={index} className="p-2 bg-gray-100 rounded-md">
                {transcription.map((word) => word.word).join("")}
              </div>
            ))}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Partial Transcription
          </h2>
          <div className="p-2 bg-gray-100 rounded-md">
            {state.pendingTranscriptions.map((word) => word.word).join("")}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Complete Translations
          </h2>
          <div className="space-y-2">
            {state.completeTranslations.map((translation, index) => (
              <div key={index} className="p-2 bg-gray-100 rounded-md">
                {translation.map((word) => word.word).join("")}
              </div>
            ))}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Partial Translation
          </h2>
          <div className="p-2 bg-gray-100 rounded-md">
            {state.pendingTranslations.map((word) => word.word).join("")}
          </div>
        </>
      ) : null}
    </div>
  );
}
Find the full demo website at: