Minimal Web Example

This guide walks you through a minimal, self-contained web example using React and TypeScript. It demonstrates how to connect to the Sanas Language Translation API via WebRTC to get real-time audio translation and text captions.

The full example is broken down into logical steps, explaining how each piece of the puzzle works to establish a connection and handle the data flow.

If you would like to jump right into the full demo, check out:


1. Managing State for Captions

First, we need a way to manage the incoming transcription and translation data. We create a custom React hook, useLanguageTranslationCaptions, to handle the state for both pending (partial) and complete sentences.

This hook exposes a handleMessage function that will be connected to our WebRTC data channel. When a message of type transcription or translation arrives, this function updates the appropriate state variable, which will cause our UI to re-render with the latest captions.

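The code below imports a few shared types (Word, Phrase, LTMessage) from the demo's @/types modules, which are not reproduced in this guide. The following is a minimal sketch of their likely shape, inferred from how the snippets use them — the "partial" variant name and the exact field layout are assumptions, so check the full demo for the real definitions.

// Hypothetical type sketches inferred from usage in this guide.
// See @/types/words and @/types/ltMessages in the full demo for the
// authoritative definitions.
export interface Word {
  word: string;
}

export type Phrase = Word[];

export type LTMessage =
  | {
      type: "transcription";
      transcription: { type: "partial" | "complete"; transcriptions: Word[] };
    }
  | {
      type: "translation";
      translation: { type: "partial" | "complete"; translations: Word[] };
    }
  | {
      type: "reset";
      reset: {
        id: string;
        lang_in: string;
        lang_out: string;
        voice_id: string | null;
        glossary: string[];
      };
    }
  | { type: "ready"; ready: { id: string } };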

import { LTMessage } from "@/types/ltMessages";
import { Phrase, Word } from "@/types/words";
import { useCallback, useState } from "react";

function useLanguageTranslationCaptions() {
  const [pendingTranscriptions, setPendingTranscriptions] = useState<Word[]>([]);
  const [completeTranscriptions, setCompleteTranscriptions] = useState<Phrase[]>([]);
  const [pendingTranslations, setPendingTranslations] = useState<Word[]>([]);
  const [completeTranslations, setCompleteTranslations] = useState<Phrase[]>([]);

  const handleMessage = useCallback((message: LTMessage) => {
    switch (message.type) {
      case "transcription":
        if (message.transcription.type === "complete") {
          if (message.transcription.transcriptions.length > 0) {
            setCompleteTranscriptions((prev) => [
              ...prev,
              message.transcription.transcriptions,
            ]);
          }
          setPendingTranscriptions([]);
        } else {
          setPendingTranscriptions(message.transcription.transcriptions);
        }
        break;
      case "translation":
        if (message.translation.type === "complete") {
          if (message.translation.translations.length > 0) {
            setCompleteTranslations((prev) => [
              ...prev,
              message.translation.translations,
            ]);
          }
          setPendingTranslations([]);
        } else {
          setPendingTranslations(message.translation.translations);
        }
        break;
      default:
        console.error("Unknown message type", message.type);
    }
  }, []);

  const reset = useCallback(() => {
    setPendingTranscriptions([]);
    setCompleteTranscriptions([]);
    setPendingTranslations([]);
    setCompleteTranslations([]);
  }, []);

  return {
    reset,
    state: {
      pendingTranscriptions,
      completeTranscriptions,
      pendingTranslations,
      completeTranslations,
    },
    handleMessage,
  };
}

2. Orchestrating the WebRTC Connection

Next, we create another custom hook, useLanguageTranslationConnection, to manage the entire WebRTC connection lifecycle. This hook encapsulates all the logic for starting, stopping, and managing the connection. Let's break down its internal functions.

Getting User Audio

The createInputStream function uses the browser's navigator.mediaDevices.getUserMedia API to request access to the user's microphone. This provides the MediaStream that we will send to the Sanas API for translation.

// Shared sample rate for the input and output audio. 16 kHz is an assumed
// value for this guide — use a rate the API supports.
const SAMPLE_RATE = 16000;

async function createInputStream() {
  const inputStream = await navigator.mediaDevices.getUserMedia({
    video: false,
    audio: {
      echoCancellation: true,
      noiseSuppression: false,
      sampleRate: SAMPLE_RATE,
      autoGainControl: true,
    },
  });
  return inputStream;
}

Creating the Peer Connection

The createPeerConnection function initializes the RTCPeerConnection object, which is the heart of a WebRTC session. 🤝

  • peerConnection.ontrack: This event handler is triggered when the translated audio stream arrives from the server. We then set this stream as our output audio.

  • peerConnection.onconnectionstatechange: This monitors the connection status and automatically calls our stopCall function if the connection is lost.

function createPeerConnection(
  onAudioTrack: (audioTrack: MediaStream) => void,
  stopCall: () => void,
) {
  const peerConnection = new RTCPeerConnection();

  peerConnection.ontrack = (e) => {
    onAudioTrack(e.streams[0]);
  };

  peerConnection.onconnectionstatechange = () => {
    console.log(`connection state: ${peerConnection.connectionState}`);
    if (
      peerConnection.connectionState === "disconnected" ||
      peerConnection.connectionState === "failed"
    ) {
      stopCall();
    }
  };

  return peerConnection;
}

Establishing the Connection

To connect, the client and server must agree on connection parameters using the Session Description Protocol (SDP).

createOffer: We generate an SDP "offer" that describes our desired connection settings (e.g., we want to receive audio).

async function createOffer(peerConnection: RTCPeerConnection) {
  const offer = await peerConnection.createOffer({
    offerToReceiveAudio: true,
    offerToReceiveVideo: false,
  });
  await peerConnection.setLocalDescription(offer);
  return offer;
}

receiveAnswer: We send this offer in a POST request to the /session endpoint. The server responds with an SDP "answer". We then set this answer as the remoteDescription on our peer connection, which finalizes the connection parameters.

async function receiveAnswer(
  peerConnection: RTCPeerConnection,
  offer: RTCSessionDescriptionInit,
) {
  const sdpResponse = await fetch(
    `${process.env.NEXT_PUBLIC_LT_ENDPOINT}/session`,
    {
      method: "POST",
      body: JSON.stringify({
        ...offer,
        input_sample_rate: SAMPLE_RATE,
        output_sample_rate: SAMPLE_RATE,
      }),
      headers: {
        "Content-Type": "application/json",
        "X-API-Key": "YOUR_API_KEY", // Replace with your key
      },
    },
  );
  if (!sdpResponse.ok) {
    throw new Error(`Failed to create session: ${sdpResponse.status}`);
  }
  const answer = await sdpResponse.json();
  await peerConnection.setRemoteDescription(answer);
}

Initializing the Translation Session

With the connection established, we must initialize the translation session over the WebRTC Data Channel. The data channel is used for sending control messages and receiving text captions, separate from the audio stream.

  1. createDataChannel: A data channel named "messaging" is created (a one-line sketch follows this list).

  2. startTranslationSession: We wait for the data channel to open. Then, we send a reset message containing the input/output languages and a unique ID.

  3. Wait for Ready: The client listens for a ready message back from the server with a matching ID. Once received, the session is officially started! 🎉
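
Step 1 is a single call on the peer connection, made before creating the SDP offer so that the channel is negotiated as part of the session:

// "messaging" is the channel label used throughout this guide
const dataChannel = peerConnection.createDataChannel("messaging");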

async function startTranslationSession(
  dataChannel: RTCDataChannel,
  {
    langIn,
    langOut,
    voiceId,
    glossary,
  }: {
    langIn: string;
    langOut: string;
    voiceId: string | null;
    glossary: string[];
  },
) {
  // Wait for data channel to be open
  await new Promise((resolve, reject) => {
    const removeListeners = () => {
      dataChannel.removeEventListener("open", onOpen);
      dataChannel.removeEventListener("error", onError);
    };
    const onOpen = () => {
      removeListeners();
      resolve(true);
    };
    const onError = (event: Event) => {
      removeListeners();
      reject(new Error(`Error: ${event.type}`));
    };
    dataChannel.addEventListener("open", onOpen);
    dataChannel.addEventListener("error", onError);
  });

  // Send reset message, and wait for corresponding ready message
  const resetMessage: LTMessage = {
    type: "reset",
    reset: {
      id: window.crypto.randomUUID(),
      lang_in: langIn,
      lang_out: langOut,
      voice_id: voiceId,
      glossary: glossary,
    },
  };
  console.log("Sending reset message", resetMessage);
  await new Promise((resolve) => {
    const onMessage = (event: MessageEvent) => {
      const message: LTMessage = JSON.parse(event.data);
      if (
        message.type === "ready" &&
        message.ready.id === resetMessage.reset.id
      ) {
        dataChannel.removeEventListener("message", onMessage);
        resolve(true);
      }
    };
    dataChannel.addEventListener("message", onMessage);
    dataChannel.send(JSON.stringify(resetMessage));
  });
  console.log("Language translation session ready");
}
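
The guide shows the helpers but not the hook body that ties them together. Here is a minimal sketch of how useLanguageTranslationConnection might assemble them, assuming the helper functions defined above — the actual implementation in the full demo may differ in details such as error handling:

import { useCallback, useRef, useState } from "react";

// A minimal sketch of the connection hook, assuming the helpers above.
function useLanguageTranslationConnection(
  onMessage: (message: LTMessage) => void,
) {
  const [outputAudio, setOutputAudio] = useState<MediaStream | null>(null);
  const peerRef = useRef<RTCPeerConnection | null>(null);
  const inputRef = useRef<MediaStream | null>(null);

  const stopCall = useCallback(() => {
    // Release the microphone and tear down the peer connection
    inputRef.current?.getTracks().forEach((track) => track.stop());
    peerRef.current?.close();
    inputRef.current = null;
    peerRef.current = null;
    setOutputAudio(null);
  }, []);

  const startCall = useCallback(
    async (config: {
      langIn: string;
      langOut: string;
      voiceId: string | null;
      glossary: string[];
    }) => {
      const inputStream = await createInputStream();
      const peerConnection = createPeerConnection(setOutputAudio, stopCall);
      inputRef.current = inputStream;
      peerRef.current = peerConnection;

      // Send the microphone audio to the server
      for (const track of inputStream.getAudioTracks()) {
        peerConnection.addTrack(track, inputStream);
      }

      // Negotiate the session and initialize translation
      const dataChannel = peerConnection.createDataChannel("messaging");
      const offer = await createOffer(peerConnection);
      await receiveAnswer(peerConnection, offer);
      await startTranslationSession(dataChannel, config);

      // Forward incoming caption messages to the caller
      dataChannel.onmessage = (event) => onMessage(JSON.parse(event.data));
    },
    [onMessage, stopCall],
  );

  return { startCall, stopCall, outputAudio };
}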

3. The User Interface (UI)

Finally, the Demo component ties everything together.

  • It uses our two custom hooks: useLanguageTranslationCaptions for state and useLanguageTranslationConnection for WebRTC logic.

  • An <audio> element is used to play the incoming translated audio stream. A useRef and useEffect ensure the srcObject of the audio player is updated when the stream is available.

  • startCallWrapper and stopCallWrapper functions handle the UI state (isLoading, isRunning) and call the corresponding functions from our connection hook.

  • The component renders the start/stop buttons and maps over the state variables to display the complete and partial transcriptions and translations.

import { useCallback, useEffect, useRef, useState } from "react";

// Example session configuration. These values are placeholders — use
// language codes and a voice ID supported by your account.
const config = {
  langIn: "en",
  langOut: "es",
  voiceId: null,
  glossary: [] as string[],
};

export default function Demo() {
  const [isRunning, setIsRunning] = useState(false);
  const [isLoading, setIsLoading] = useState(false);
  const { state, handleMessage, reset } = useLanguageTranslationCaptions();
  const { startCall, stopCall, outputAudio } =
    useLanguageTranslationConnection(handleMessage);
  const audio = useRef<HTMLAudioElement | null>(null);

  useEffect(() => {
    if (audio.current) {
      audio.current.srcObject = outputAudio;
    }
  }, [outputAudio]);

  const startCallWrapper = useCallback(() => {
    setIsRunning(true);
    setIsLoading(true);
    startCall(config)
      .catch((error) => {
        alert(error);
        // Roll the UI back if the call failed to start
        setIsRunning(false);
        stopCall();
      })
      .finally(() => setIsLoading(false));
  }, [startCall, stopCall]);

  const stopCallWrapper = useCallback(() => {
    setIsRunning(false);
    stopCall();
    reset();
  }, [stopCall, reset]);

  return (
    <div className="p-6 max-w-lg mx-auto bg-white rounded-xl shadow-md space-y-4">
      <h1 className="text-2xl font-bold text-center text-gray-800">
        Language Translation ({config.langIn} to {config.langOut})
      </h1>
      <div className="flex justify-center space-x-4">
        {isLoading ? (
          <div className="text-center text-gray-500">Loading...</div>
        ) : isRunning ? (
          <button
            onClick={stopCallWrapper}
            className="px-4 py-2 bg-red-500 text-white rounded-lg hover:bg-red-600 focus:outline-none focus:ring-2 focus:ring-red-400 cursor-pointer"
          >
            Stop Call
          </button>
        ) : (
          <button
            onClick={startCallWrapper}
            className="px-4 py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 focus:outline-none focus:ring-2 focus:ring-blue-400 cursor-pointer"
          >
            Start Call
          </button>
        )}
      </div>
      <audio ref={audio} autoPlay className="w-full mt-4" />
      {!isLoading && isRunning ? (
        <>
          <h2 className="text-xl font-semibold text-gray-700">
            Complete Transcriptions
          </h2>
          <div className="space-y-2">
            {state.completeTranscriptions.map((transcription, index) => (
              <div key={index} className="p-2 bg-gray-100 rounded-md">
                {transcription.map((word) => word.word).join("")}
              </div>
            ))}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Partial Transcription
          </h2>
          <div className="p-2 bg-gray-100 rounded-md">
            {state.pendingTranscriptions.map((word) => word.word).join("")}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Complete Translations
          </h2>
          <div className="space-y-2">
            {state.completeTranslations.map((translation, index) => (
              <div key={index} className="p-2 bg-gray-100 rounded-md">
                {translation.map((word) => word.word).join("")}
              </div>
            ))}
          </div>
          <h2 className="text-xl font-semibold text-gray-700">
            Partial Translation
          </h2>
          <div className="p-2 bg-gray-100 rounded-md">
            {state.pendingTranslations.map((word) => word.word).join("")}
          </div>
        </>
      ) : null}
    </div>
  );
}
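
To try the component, render it from a page in your app. A minimal Next.js example (the file path and import alias are assumptions):

// app/page.tsx (hypothetical path)
import Demo from "@/components/Demo";

export default function Page() {
  return <Demo />;
}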

Find the full demo website at:
