
@qvac/transcription-parakeet

Automatic speech recognition (ASR) for speech-to-text with speaker diarization.

Overview

A Bare module that adds transcription support to QVAC using NVIDIA Parakeet ASR models, with ONNX Runtime as the inference engine.

Parakeet supports multiple model variants:

  • TDT — multilingual transcription (~25 languages) with automatic language detection.
  • CTC — English-only, fast transcription with punctuation and capitalization.
  • EOU — real-time streaming with end-of-utterance detection (optimized for low latency).
  • Sortformer — speaker diarization (up to 4 speakers).

Models

Parakeet uses multiple model files depending on the variant:

TDT (multilingual):

  • Encoder ONNX model
  • Encoder data file
  • Decoder ONNX model
  • Vocabulary file
  • Preprocessor ONNX model

CTC (English-only):

  • Model ONNX file
  • Model data file
  • Tokenizer file

Sortformer (diarization):

  • Single ONNX model file

EOU (streaming):

  • Encoder ONNX model
  • Decoder ONNX model
  • Tokenizer file

Model files are available from Hugging Face.

Requirement

Bare ≥ v1.20

Installation

npm i @qvac/transcription-parakeet

Quickstart

If you don't have the Bare runtime, install it:

npm i -g bare

Create a new project:

mkdir qvac-parakeet-quickstart
cd qvac-parakeet-quickstart
npm init -y

Install dependencies:

npm i @qvac/dl-filesystem @qvac/transcription-parakeet bare-fs bare-path bare-process

Download the TDT model files and place them in models/parakeet-tdt-0.6b-v3-onnx/:

  • encoder.onnx
  • encoder.onnx_data
  • decoder.onnx
  • vocab.txt
  • preprocessor.onnx

Download from Hugging Face.

Create index.js:

'use strict'

const path = require('bare-path')
const process = require('bare-process')
const binding = require('@qvac/transcription-parakeet/binding')
const { ParakeetInterface } = require('@qvac/transcription-parakeet/parakeet')

async function main () {
  const modelPath = path.join('.', 'models', 'parakeet-tdt-0.6b-v3-onnx')
  const audioPath = path.join('.', 'my-audio.wav')

  const config = {
    modelPath,
    modelType: 'tdt',
    maxThreads: 4,
    useGPU: false
  }

  const transcriptions = []

  const outputCallback = (handle, event, data, error) => {
    if (event === 'transcription' && data && data.text) {
      transcriptions.push(data.text)
    }
  }

  const parakeet = new ParakeetInterface(binding, config, outputCallback)

  await parakeet.loadWeights()
  await parakeet.activate()

  const fs = require('bare-fs')
  const audioBuffer = fs.readFileSync(audioPath)
  const audioData = audioBuffer.subarray(44) // Skip the canonical 44-byte WAV header
  // audioData.buffer is the whole underlying ArrayBuffer (header included),
  // so slice out exactly the PCM bytes before handing them over
  const pcm = audioData.buffer.slice(audioData.byteOffset, audioData.byteOffset + audioData.byteLength)

  await parakeet.append({ type: 'audio', data: pcm })
  await parakeet.append({ type: 'end of job' })

  // Wait briefly for processing
  await new Promise(resolve => setTimeout(resolve, 5000))

  console.log('=== TRANSCRIPTION ===')
  console.log(transcriptions.join(' '))
  console.log('=====================')

  await parakeet.destroyInstance()
}

main().catch(err => {
  console.error(err)
  process.exit(1)
})

Run index.js:

bare index.js
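The fixed 5-second sleep in the quickstart is only a placeholder. A sturdier sketch, assuming the output callback receives a 'complete' event when the job finishes and an 'error' event on failure, builds the callback together with a promise that settles when the job ends (`makeCallback` is a hypothetical helper, not part of the package):

```javascript
// Hypothetical helper: returns an output callback plus a promise that
// settles on 'complete' or 'error', replacing the fixed setTimeout wait
function makeCallback (transcriptions) {
  let settle
  const done = new Promise((resolve, reject) => { settle = { resolve, reject } })

  const outputCallback = (handle, event, data, error) => {
    if (event === 'transcription' && data && data.text) {
      transcriptions.push(data.text)
    } else if (event === 'complete') {
      settle.resolve()
    } else if (event === 'error') {
      settle.reject(new Error(error))
    }
  }

  return { outputCallback, done }
}
```

In the quickstart you would create the callback with `const { outputCallback, done } = makeCallback(transcriptions)`, pass `outputCallback` to `ParakeetInterface`, and replace the `setTimeout` wait with `await done`.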

Usage

1. Choose a Data Loader

First, select and instantiate a data loader that provides access to model files:

// Option A: Filesystem Data Loader - for local model files
const FilesystemDL = require('@qvac/dl-filesystem')
const fsDL = new FilesystemDL({
  dirPath: './path/to/model/files'
})

// Option B: Hyperdrive Data Loader - for peer-to-peer distributed models
const HyperDriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperDriveDL({
  key: 'hd://<driveKey>',
  store: corestore
})

2. Configure Parakeet Parameters

The addon accepts the following configuration:

Key         Type      Description
----------  --------  -------------------------------------
modelPath   string    Path to the model directory
modelType   string    'ctc', 'tdt', 'eou', or 'sortformer'
maxThreads  number    Maximum CPU threads to use
useGPU      boolean   Enable GPU acceleration
language    string    Language code or 'auto' (TDT only)

3. Configuration Example

// TDT (multilingual, recommended)
const config = {
  modelPath: './models/parakeet-tdt-0.6b-v3-onnx',
  modelType: 'tdt',
  maxThreads: 4,
  useGPU: false,
  language: 'auto'
}

// CTC (English-only, fastest)
const ctcConfig = {
  modelPath: './models/parakeet-ctc-0.6b-ONNX',
  modelType: 'ctc',
  maxThreads: 4,
  useGPU: false
}

// Sortformer (speaker diarization)
const sortformerConfig = {
  modelPath: './models/sortformer',
  modelType: 'sortformer',
  maxThreads: 4,
  useGPU: false
}
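The EOU variant is configured the same way as the others; the directory name below is an assumption, so point it at wherever you placed the EOU encoder, decoder, and tokenizer files:

```javascript
// EOU (real-time streaming with end-of-utterance detection)
// Note: the model path below is a placeholder, not a published directory name
const eouConfig = {
  modelPath: './models/parakeet-eou',
  modelType: 'eou',
  maxThreads: 4,
  useGPU: false
}
```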

4. Create Model Instance

const binding = require('@qvac/transcription-parakeet/binding')
const { ParakeetInterface } = require('@qvac/transcription-parakeet/parakeet')

const outputCallback = (handle, event, data, error) => {
  if (event === 'transcription' && data && data.text) {
    console.log('Transcription:', data.text)
  }
}

const parakeet = new ParakeetInterface(binding, config, outputCallback)

5. Load Model

Load model weights and activate the inference engine:

try {
  await parakeet.loadWeights()
  await parakeet.activate()
} catch (error) {
  console.error('Failed to load model:', error)
}

6. Run Transcription

Pass audio data to the model for transcription:

try {
  const fs = require('bare-fs')
  const audioBuffer = fs.readFileSync('path/to/your/audio.wav')
  const audioData = audioBuffer.subarray(44) // Skip the canonical 44-byte WAV header
  // audioData.buffer is the whole underlying ArrayBuffer (header included),
  // so slice out exactly the PCM bytes before handing them over
  const pcm = audioData.buffer.slice(audioData.byteOffset, audioData.byteOffset + audioData.byteLength)

  await parakeet.append({ type: 'audio', data: pcm })
  await parakeet.append({ type: 'end of job' })
} catch (error) {
  console.error('Transcription failed:', error)
}
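The fixed 44-byte skip assumes a canonical WAV header; files with extra RIFF chunks (LIST, fact, etc.) have longer headers, and the first 44 bytes would then not line up with the PCM samples. A minimal sketch of a chunk-walking helper (hypothetical, not part of the package):

```javascript
// Hypothetical helper: locate the PCM samples by walking RIFF chunks instead
// of assuming a fixed 44-byte header, and return them as a standalone
// ArrayBuffer suitable for append({ type: 'audio', data: ... })
function pcmFromWav (buf) {
  if (buf.toString('ascii', 0, 4) !== 'RIFF' || buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file')
  }
  let offset = 12 // first sub-chunk starts after the 12-byte RIFF header
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4)
    const size = buf.readUInt32LE(offset + 4)
    if (id === 'data') {
      const start = offset + 8
      return buf.buffer.slice(buf.byteOffset + start, buf.byteOffset + start + size)
    }
    offset += 8 + size + (size & 1) // chunks are padded to even byte boundaries
  }
  throw new Error('no data chunk found')
}
```

With it, the audio append becomes `await parakeet.append({ type: 'audio', data: pcmFromWav(audioBuffer) })`.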

Output Callback Events:

  • transcription — partial or complete transcription result (data.text, data.confidence, data.isFinal)
  • progress — processing progress (data.percent, data.timeElapsed)
  • diarization — speaker identification (data.speakerId, data.startTime, data.endTime)
  • complete — job completed successfully
  • error — error occurred (error string)
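A callback that dispatches on each documented event might look like this (field names as listed above):

```javascript
// Sketch of an output callback covering each documented event
const outputCallback = (handle, event, data, error) => {
  switch (event) {
    case 'transcription':
      if (data && data.isFinal) console.log('final:', data.text)
      else if (data) console.log('partial:', data.text)
      break
    case 'progress':
      console.log(`progress: ${data.percent}% (${data.timeElapsed}s elapsed)`)
      break
    case 'diarization':
      console.log(`speaker ${data.speakerId}: ${data.startTime}s-${data.endTime}s`)
      break
    case 'complete':
      console.log('job complete')
      break
    case 'error':
      console.error('transcription error:', error)
      break
  }
}
```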

7. Release Resources

Always destroy the instance when finished to free memory and resources:

try {
  await parakeet.destroyInstance()
} catch (error) {
  console.error('Failed to destroy instance:', error)
}
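Steps 4-7 can be combined into one helper in which try/finally guarantees destroyInstance() runs even if loading or transcription throws. This is a sketch, not part of the package; the constructor is taken as a parameter only to keep the example self-contained:

```javascript
// Sketch combining create -> load -> transcribe -> destroy; destroyInstance()
// always runs, even when an earlier step throws
async function transcribe (ParakeetInterface, binding, config, pcmData) {
  const pieces = []
  const parakeet = new ParakeetInterface(binding, config, (handle, event, data, error) => {
    if (event === 'transcription' && data && data.text) pieces.push(data.text)
  })
  try {
    await parakeet.loadWeights()
    await parakeet.activate()
    await parakeet.append({ type: 'audio', data: pcmData })
    await parakeet.append({ type: 'end of job' })
  } finally {
    await parakeet.destroyInstance()
  }
  return pieces.join(' ')
}
```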

More resources

Package at npm
