jvosk
Modern desktop application for speech-to-text transcription using Vosk offline speech recognition.
Built with Java Swing and FlatLaf for a polished, cross-platform UI with multi-model support.
Download 0.1.0-SNAPSHOT
Made for macOS 🍎 
Looks decent on Windows too :) 🪟 
Features
Model Management
- Multi-Model Support: Download and manage multiple Vosk models
- Automatic Updates: Check for model updates at startup
- 150+ Models Available: All models from alphacephei.com/vosk/models
- Easy Switching: Switch between models on the fly
- Smart Downloads:
  - Small models (< 500MB) for quick downloads
  - Big models (> 500MB) with download confirmation
  - Progress tracking for all downloads
- 40+ Languages: English, Chinese, Russian, French, German, Spanish, and many more
- Model Manager UI:
  - View all available models with details (size, language, accuracy)
  - Download new models with progress bar
  - Delete unused models
  - Check for updates
  - Filter by installed/available status
Audio Support
- Multiple Formats: WAV, MP3, M4A, FLAC, OGG, AAC, WMA, OPUS
- Automatic Conversion: Built-in audio conversion via JAVE2, which bundles FFmpeg (no separate ffmpeg install required!)
- Drag & Drop: Simply drag audio files into the app
- File Browser: Standard file picker with format filtering
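As a rough illustration of the format filtering mentioned above, the following sketch checks a file name against the listed extensions. The class and method names (`AudioFormats`, `isSupported`) are hypothetical and not taken from the jvosk source; the app may filter files differently.

```java
import java.util.Locale;
import java.util.Set;

final class AudioFormats {
    // Extensions taken from the "Multiple Formats" bullet; the set name is an assumption.
    static final Set<String> SUPPORTED =
            Set.of("wav", "mp3", "m4a", "flac", "ogg", "aac", "wma", "opus");

    /** Returns true when the file name ends in one of the supported extensions. */
    static boolean isSupported(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension at all
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUPPORTED.contains(ext);
    }
}
```

A check like this could back both the file picker filter and the drag & drop handler, so the two entry points accept the same set of files.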
Transcription
- Offline Processing: No internet required, privacy-first
- Real-time Progress: Visual feedback during transcription
- Accurate Results: Powered by Vosk speech recognition
- Optional Timestamps: Add [HH:MM:SS] timestamps to each segment
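The [HH:MM:SS] timestamp format can be produced from a segment's start offset in seconds. This is a minimal sketch; the `Timestamps` class is a hypothetical name, and how jvosk actually derives segment offsets is not described here.

```java
import java.util.Locale;

final class Timestamps {
    /** Formats a non-negative offset in seconds as [HH:MM:SS]. */
    static String format(long totalSeconds) {
        if (totalSeconds < 0) {
            throw new IllegalArgumentException("offset must be non-negative");
        }
        long hours = totalSeconds / 3600;
        long minutes = (totalSeconds % 3600) / 60;
        long seconds = totalSeconds % 60;
        return String.format(Locale.ROOT, "[%02d:%02d:%02d]", hours, minutes, seconds);
    }
}
```

For example, an offset of 3725 seconds is rendered as `[01:02:05]`.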
Export Options
- Plain Text (.txt)
- Subtitle Formats (SRT, VTT)
- Structured Data (JSON)
- Markdown (.md)
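To show what the subtitle export involves, here is a sketch that renders timed segments as SRT cues (a cue number, a `start --> end` line with comma-separated milliseconds, the text, and a blank line). The `Segment` record and `SrtWriter` class are assumptions for illustration, not jvosk's actual exporter.

```java
import java.util.List;
import java.util.Locale;

final class SrtWriter {
    /** Hypothetical transcript segment: start/end offsets in milliseconds plus text. */
    record Segment(long startMs, long endMs, String text) {}

    /** Formats milliseconds as the SRT timecode HH:MM:SS,mmm. */
    static String timecode(long ms) {
        long h = ms / 3_600_000;
        long m = (ms % 3_600_000) / 60_000;
        long s = (ms % 60_000) / 1_000;
        long milli = ms % 1_000;
        return String.format(Locale.ROOT, "%02d:%02d:%02d,%03d", h, m, s, milli);
    }

    /** Renders the segments as numbered SRT cues separated by blank lines. */
    static String toSrt(List<Segment> segments) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < segments.size(); i++) {
            Segment seg = segments.get(i);
            out.append(i + 1).append('\n')
               .append(timecode(seg.startMs())).append(" --> ")
               .append(timecode(seg.endMs())).append('\n')
               .append(seg.text()).append("\n\n");
        }
        return out.toString();
    }
}
```

WebVTT output is close to this: the file starts with a `WEBVTT` header line and timecodes use a period instead of a comma before the milliseconds.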
User Interface
- Modern Design: Clean, professional interface with FlatLaf
- Dark Mode: System-aware dark/light theme toggle
- Keyboard Shortcuts: Streamlined workflow
  - Cmd/Ctrl+O - Open file
  - Cmd/Ctrl+S - Save transcript
  - Cmd/Ctrl+N - Clear/New
  - Cmd/Ctrl+Shift+C - Copy to clipboard
  - Cmd/Ctrl+Shift+M - Manage models
  - Cmd/Ctrl+± - Adjust font size
- Statistics: Word count, character count, WPM
- Recent Files: Quick access to previously transcribed files
- Audio Info: Display duration, format, sample rate
- Progress Tracking: Real-time transcription progress
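The statistics listed above are simple to derive from the transcript text and the audio duration. The sketch below assumes WPM means words divided by audio length in minutes; that definition, along with the `TranscriptStats` name, is an assumption rather than something confirmed by the jvosk source.

```java
final class TranscriptStats {
    /** Counts whitespace-separated words; blank text has zero words. */
    static int wordCount(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Counts characters, including whitespace. */
    static int characterCount(String text) {
        return text.length();
    }

    /** Words per minute relative to the audio duration (assumed definition). */
    static double wordsPerMinute(String text, double audioSeconds) {
        if (audioSeconds <= 0) {
            return 0.0; // avoid division by zero for unknown durations
        }
        return wordCount(text) * 60.0 / audioSeconds;
    }
}
```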
Quality of Life
- Copy to Clipboard: One-click copy of transcription
- Cancel Anytime: Stop long transcriptions mid-process
- Unsaved Changes Warning: Never lose work accidentally
- Persistent Preferences: Remembers your settings and selected model
- Adjustable Font: Customize text size for comfort
Quick Start
Requirements
- Java 17 or higher
- Maven 3.6+
- No additional dependencies needed!
Build & Run
# Clone the repository
git clone https://github.com/palaashatri/jvosk.git
cd jvosk
# Build the project
mvn clean package
# Run the application
mvn exec:java -Dexec.mainClass=atri.palaash.jvosk.App
First Use
- Launch the app
- Open Models → Manage Models… (Cmd/Ctrl+Shift+M)
- Download a model for your language:
  - For English: vosk-model-small-en-us-0.15 (40MB) or vosk-model-en-us-0.22 (1.8GB)
  - For other languages, browse the available models
- Click “Download Model” and wait for completion
- Select the downloaded model and click “Use This Model”
- Click “Browse Files…” or drag & drop an audio file
- Wait for transcription to complete
- Save or copy your transcript!
Model Management
Accessing Model Manager
- Menu: Models → Manage Models…
- Keyboard: Cmd/Ctrl+Shift+M
Available Models
The app provides access to 150+ models from the official Vosk repository:
Popular Languages:
- English: 10+ models (US, Indian accents)
- Chinese: 3 models
- Russian: 4 models
- French: 3 models
- German: 4 models
- Spanish: 2 models
- Japanese: 2 models
- And many more (40+ languages in total)!
Model Types:
- Small Models (< 100MB): Fast, good for mobile/desktop, lightweight
- Big Models (> 500MB): Higher accuracy, server-grade
- Punctuation Models: Add punctuation and capitalization
- Speaker ID Models: Identify different speakers
Downloading Models
- Open Model Manager
- Browse available models in the table
- Select a model
- Click “Download Model”
- Wait for download and extraction (progress shown)
- Model is automatically installed and ready to use
Note: Large models will show a confirmation dialog before downloading.
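The confirmation rule and progress display can be sketched as follows, using the 500MB threshold from the Smart Downloads feature list. The `DownloadPolicy` class, the decimal interpretation of "MB", and the integer percentage are all assumptions made for this example.

```java
final class DownloadPolicy {
    // 500 MB threshold from the feature list; treating 1 MB as 1,000,000 bytes is an assumption.
    static final long BIG_MODEL_BYTES = 500L * 1_000_000L;

    /** Returns true when the model is large enough to warrant a confirmation dialog. */
    static boolean needsConfirmation(long sizeBytes) {
        return sizeBytes > BIG_MODEL_BYTES;
    }

    /** Returns download progress as a whole percentage between 0 and 100. */
    static int progressPercent(long downloadedBytes, long totalBytes) {
        if (totalBytes <= 0) {
            return 0; // total size unknown
        }
        long clamped = Math.max(0, Math.min(downloadedBytes, totalBytes));
        return (int) (clamped * 100 / totalBytes);
    }
}
```

With these numbers, the 1.8GB English model would trigger the dialog, while the 40MB small model would download immediately.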
Switching Models
Quick Switch:
- Menu: Models → Switch Model…
- Select from installed models
- Confirm selection
Or via Model Manager:
- Open Model Manager
- Select an installed model
- Click “Use This Model”
Automatic Updates
- On startup, the app checks for model updates
- If updates are available, you’ll see a notification
- Updates can be downloaded through the Model Manager
- Models are never auto-updated without your confirmation
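How the app decides that an update exists is not spelled out here. One plausible approach, shown purely as a sketch, is to compare the version suffix of an installed model name (such as the `0.15` in vosk-model-small-en-us-0.15) with the latest version listed for the same model. The `ModelVersions` class is hypothetical and only handles purely numeric dotted versions.

```java
final class ModelVersions {
    /** Compares dotted numeric versions such as "0.15" and "0.22" part by part. */
    static int compare(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            // Missing parts count as zero, so "1" equals "1.0".
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    /** Returns true when the latest version is newer than the installed one. */
    static boolean isUpdateAvailable(String installed, String latest) {
        return compare(installed, latest) < 0;
    }
}
```

Comparing parts numerically rather than as strings matters: as text, "0.9" sorts after "0.10", but as version numbers it is older.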
Deleting Models
- Open Model Manager
- Select an installed model
- Click “Delete Model”
- Confirm deletion
Technical Details
Architecture
New Components:
- VoskModel: Data class for model metadata
- ModelRegistry: Parses Vosk models page and fetches model information
- ModelManager: Handles downloading, installing, version checking, and loading models
- ModelManagerDialog: UI for managing models
- Enhanced VoskTranscriber: Supports switching between models
- Updated App: Checks for model updates on startup
Model Storage:
- All models stored in models/ directory
- Each model in its own subdirectory
- Models are standard Vosk format (can be used with other Vosk tools)
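Given this layout (one subdirectory per model under models/), the installed models can be discovered by listing the directory. The sketch below uses only `java.nio.file`; the `InstalledModels` class is a hypothetical helper, not the actual `ModelManager` code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

final class InstalledModels {
    /** Returns the sorted names of all subdirectories of the given models directory. */
    static List<String> list(Path modelsDir) throws IOException {
        if (!Files.isDirectory(modelsDir)) {
            return List.of(); // nothing installed yet
        }
        // Files.list holds a directory handle, so close it with try-with-resources.
        try (Stream<Path> entries = Files.list(modelsDir)) {
            return entries.filter(Files::isDirectory)
                          .map(p -> p.getFileName().toString())
                          .sorted()
                          .toList();
        }
    }
}
```

Calling `InstalledModels.list(Path.of("models"))` would return names such as vosk-model-small-en-us-0.15 for each model that has been downloaded and extracted.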
Dependencies
<dependencies>
    <!-- Core speech recognition -->
    <dependency>
        <groupId>com.alphacephei</groupId>
        <artifactId>vosk</artifactId>
        <version>0.3.38</version>
    </dependency>
    <!-- UI framework -->
    <dependency>
        <groupId>com.formdev</groupId>
        <artifactId>flatlaf</artifactId>
        <version>3.4.1</version>
    </dependency>
    <!-- Audio conversion -->
    <dependency>
        <groupId>ws.schild</groupId>
        <artifactId>jave-all-deps</artifactId>
        <version>3.5.0</version>
    </dependency>
    <!-- Web scraping for model registry -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.17.2</version>
    </dependency>
    <!-- JSON processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.17.1</version>
    </dependency>
</dependencies>
Technology Stack
- Speech Recognition: Vosk
- Audio Processing: JAVE2 (FFmpeg wrapper)
- UI Framework: Java Swing with FlatLaf
- Build Tool: Maven
License
MIT
Contributing
Contributions welcome! Please open an issue or submit a pull request.