# @qvac/transcription-parakeet

Automatic speech recognition (ASR) for speech-to-text with speaker diarization.
## Overview
Bare module that adds transcription support to QVAC using NVIDIA Parakeet ASR models, with ONNX Runtime as the inference engine.
Parakeet supports multiple model variants:
- TDT — multilingual transcription (~25 languages) with automatic language detection.
- CTC — English-only, fast transcription with punctuation and capitalization.
- EOU — real-time streaming with end-of-utterance detection (optimized for low latency).
- Sortformer — speaker diarization (up to 4 speakers).
## Models
Parakeet uses multiple model files depending on the variant:
**TDT (multilingual):**

- Encoder ONNX model
- Encoder data file
- Decoder ONNX model
- Vocabulary file
- Preprocessor ONNX model

**CTC (English-only):**

- Model ONNX file
- Model data file
- Tokenizer file

**Sortformer (diarization):**

- Single ONNX model file

**EOU (streaming):**

- Encoder ONNX model
- Decoder ONNX model
- Tokenizer file
Model files are available from Hugging Face:

- CTC: `parakeet-ctc-0.6b-ONNX`
- TDT: `parakeet-tdt-0.6b-v3-onnx`
- EOU: `parakeet-rs` `realtime_eou_120m-v1-onnx`
- Sortformer: `parakeet-rs` `sortformer`
## Requirements

- Bare v1.20
## Installation

```sh
npm i @qvac/transcription-parakeet
```

## Quickstart
If you don't have the Bare runtime, install it:

```sh
npm i -g bare
```

Create a new project:
```sh
mkdir qvac-parakeet-quickstart
cd qvac-parakeet-quickstart
npm init -y
```

Install dependencies:
```sh
npm i @qvac/dl-filesystem @qvac/transcription-parakeet bare-path bare-process
```

Download the TDT model files and place them in `models/parakeet-tdt-0.6b-v3-onnx/`:
- `encoder.onnx`
- `encoder.onnx_data`
- `decoder.onnx`
- `vocab.txt`
- `preprocessor.onnx`
Download from Hugging Face.
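A missing or misplaced file is a common cause of load failures, so it can help to verify the directory before loading. A small sketch (the file names come from the TDT list above; the helper itself is illustrative, not part of the package):

```js
// File names from the TDT list above; the helper itself is illustrative.
const TDT_FILES = ['encoder.onnx', 'encoder.onnx_data', 'decoder.onnx', 'vocab.txt', 'preprocessor.onnx']

// Given the names actually present in the model directory
// (e.g. fs.readdirSync(modelPath) under bare-fs), report which files are missing.
function missingModelFiles (presentNames, required = TDT_FILES) {
  const present = new Set(presentNames)
  return required.filter(f => !present.has(f))
}
```

For example, `missingModelFiles(fs.readdirSync(modelPath))` returns `[]` when the directory is complete.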
Create `index.js`:
```js
'use strict'

const path = require('bare-path')
const process = require('bare-process')
const fs = require('bare-fs')
const binding = require('@qvac/transcription-parakeet/binding')
const { ParakeetInterface } = require('@qvac/transcription-parakeet/parakeet')

async function main () {
  const modelPath = path.join('.', 'models', 'parakeet-tdt-0.6b-v3-onnx')
  const audioPath = path.join('.', 'my-audio.wav')

  const config = {
    modelPath,
    modelType: 'tdt',
    maxThreads: 4,
    useGPU: false
  }

  const transcriptions = []
  const outputCallback = (handle, event, data, error) => {
    if (event === 'transcription' && data && data.text) {
      transcriptions.push(data.text)
    }
  }

  const parakeet = new ParakeetInterface(binding, config, outputCallback)
  await parakeet.loadWeights()
  await parakeet.activate()

  const audioBuffer = fs.readFileSync(audioPath)
  const audioData = audioBuffer.subarray(44) // Skip the 44-byte canonical WAV header
  // subarray() shares the whole file's ArrayBuffer, so copy the PCM bytes explicitly
  const pcm = audioData.buffer.slice(audioData.byteOffset, audioData.byteOffset + audioData.byteLength)
  await parakeet.append({ type: 'audio', data: pcm })
  await parakeet.append({ type: 'end of job' })

  // Wait briefly for processing (a real app would wait for the 'complete' callback event)
  await new Promise(resolve => setTimeout(resolve, 5000))

  console.log('=== TRANSCRIPTION ===')
  console.log(transcriptions.join(' '))
  console.log('=====================')

  await parakeet.destroyInstance()
}

main().catch(err => {
  console.error(err)
  process.exit(1)
})
```

Run `index.js`:
```sh
bare index.js
```

## Usage
### 1. Choose a Data Loader
First, select and instantiate a data loader that provides access to model files:
```js
// Option A: Filesystem Data Loader - for local model files
const FilesystemDL = require('@qvac/dl-filesystem')
const fsDL = new FilesystemDL({
  dirPath: './path/to/model/files'
})

// Option B: Hyperdrive Data Loader - for peer-to-peer distributed models
const HyperDriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperDriveDL({
  key: 'hd://<driveKey>',
  store: corestore
})
```

### 2. Configure Parakeet Parameters
The addon accepts the following configuration:
| Key | Type | Description |
|---|---|---|
| `modelPath` | string | Path to the model directory |
| `modelType` | string | `'ctc'`, `'tdt'`, `'eou'`, or `'sortformer'` |
| `maxThreads` | number | Maximum CPU threads to use |
| `useGPU` | boolean | Enable GPU acceleration |
| `language` | string | Language code or `'auto'` (TDT only) |
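Only `modelPath` and `modelType` vary across the examples below, so a small helper can fill in the rest. A hedged sketch (the helper and its default values are illustrative, not part of the package):

```js
// Hypothetical helper: merge caller options over illustrative defaults.
// Keys come from the table above; the default values are assumptions.
function buildParakeetConfig (modelPath, modelType, overrides = {}) {
  const valid = ['ctc', 'tdt', 'eou', 'sortformer']
  if (!valid.includes(modelType)) {
    throw new Error(`modelType must be one of: ${valid.join(', ')}`)
  }
  return {
    modelPath,
    modelType,
    maxThreads: 4,
    useGPU: false,
    ...(modelType === 'tdt' ? { language: 'auto' } : {}),
    ...overrides
  }
}
```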
### 3. Configuration Examples
```js
// TDT (multilingual, recommended)
const config = {
  modelPath: './models/parakeet-tdt-0.6b-v3-onnx',
  modelType: 'tdt',
  maxThreads: 4,
  useGPU: false,
  language: 'auto'
}

// CTC (English-only, fastest)
const ctcConfig = {
  modelPath: './models/parakeet-ctc-0.6b-ONNX',
  modelType: 'ctc',
  maxThreads: 4,
  useGPU: false
}

// Sortformer (speaker diarization)
const sortformerConfig = {
  modelPath: './models/sortformer',
  modelType: 'sortformer',
  maxThreads: 4,
  useGPU: false
}
```

### 4. Create Model Instance
```js
const binding = require('@qvac/transcription-parakeet/binding')
const { ParakeetInterface } = require('@qvac/transcription-parakeet/parakeet')

const outputCallback = (handle, event, data, error) => {
  if (event === 'transcription' && data && data.text) {
    console.log('Transcription:', data.text)
  }
}

const parakeet = new ParakeetInterface(binding, config, outputCallback)
```

### 5. Load Model
Load model weights and activate the inference engine:
```js
try {
  await parakeet.loadWeights()
  await parakeet.activate()
} catch (error) {
  console.error('Failed to load model:', error)
}
```

### 6. Run Transcription
Pass audio data to the model for transcription:
```js
const fs = require('bare-fs')

try {
  const audioBuffer = fs.readFileSync('path/to/your/audio.wav')
  const audioData = audioBuffer.subarray(44) // Skip the 44-byte canonical WAV header
  // subarray() shares the whole file's ArrayBuffer, so copy the PCM bytes explicitly
  const pcm = audioData.buffer.slice(audioData.byteOffset, audioData.byteOffset + audioData.byteLength)
  await parakeet.append({ type: 'audio', data: pcm })
  await parakeet.append({ type: 'end of job' })
} catch (error) {
  console.error('Transcription failed:', error)
}
```

**Output Callback Events:**
- `transcription` — partial or complete transcription result (`data.text`, `data.confidence`, `data.isFinal`)
- `progress` — processing progress (`data.percent`, `data.timeElapsed`)
- `diarization` — speaker identification (`data.speakerId`, `data.startTime`, `data.endTime`)
- `complete` — job completed successfully
- `error` — an error occurred (`error` string)
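Skipping a fixed 44 bytes only works for canonical PCM WAV files; files with extra chunks (e.g. `LIST` metadata) store their samples at a different offset. A minimal sketch (not part of the package API) that locates the `data` chunk by walking the RIFF structure:

```js
// Locate the PCM payload in a RIFF/WAVE buffer by walking its chunks,
// instead of assuming the canonical 44-byte header. Illustrative sketch only.
function findWavData (buf) {
  if (buf.length < 12 || buf.toString('ascii', 0, 4) !== 'RIFF' ||
      buf.toString('ascii', 8, 12) !== 'WAVE') {
    throw new Error('not a RIFF/WAVE file')
  }
  let offset = 12
  while (offset + 8 <= buf.length) {
    const id = buf.toString('ascii', offset, offset + 4)
    const size = buf.readUInt32LE(offset + 4)
    if (id === 'data') return buf.subarray(offset + 8, offset + 8 + size)
    offset += 8 + size + (size % 2) // chunks are word-aligned
  }
  throw new Error('no data chunk found')
}
```

The returned view can then be copied to a standalone `ArrayBuffer` and passed to `parakeet.append()` as shown above.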
### 7. Release Resources
Always destroy the instance when finished to free memory and resources:
```js
try {
  await parakeet.destroyInstance()
} catch (error) {
  console.error('Failed to destroy instance:', error)
}
```
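The load/use/destroy steps above can be wrapped so cleanup always runs, even when transcription throws. A sketch (the `withParakeet` helper is not part of the package; it assumes only the methods shown in this document):

```js
// Hypothetical helper: run `fn` against a loaded instance and always release it.
// Assumes only loadWeights(), activate() and destroyInstance() from the API above.
async function withParakeet (parakeet, fn) {
  await parakeet.loadWeights()
  await parakeet.activate()
  try {
    return await fn(parakeet)
  } finally {
    await parakeet.destroyInstance()
  }
}
```

For example, `await withParakeet(new ParakeetInterface(binding, config, cb), p => transcribe(p))` releases the instance whether `transcribe` resolves or rejects.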