jvosk

A modern desktop application for speech-to-text transcription using Vosk offline speech recognition.

Built with Java Swing and FlatLaf for a polished, cross-platform UI, with support for multiple recognition models.

Download 0.1.0-SNAPSHOT

Made for macOS 🍎 [screenshot]

Looks decent on Windows too :) 🪟 [screenshot]

Features

Model Management

Audio Support

Transcription

Export Options

User Interface

Quality of Life

Quick Start

Requirements

Build & Run

# Clone the repository
git clone https://github.com/palaashatri/jvosk.git
cd jvosk

# Build the project
mvn clean package

# Run the application
mvn exec:java -Dexec.mainClass=atri.palaash.jvosk.App

First Use

  1. Launch the app
  2. Open Models → Manage Models… (Cmd/Ctrl+Shift+M)
  3. Download a model for your language:
    • For English: vosk-model-small-en-us-0.15 (40MB) or vosk-model-en-us-0.22 (1.8GB)
    • For other languages, browse the available models
  4. Click “Download Model” and wait for completion
  5. Select the downloaded model and click “Use This Model”
  6. Click “Browse Files…” or drag & drop an audio file
  7. Wait for transcription to complete
  8. Save or copy your transcript!
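
Between steps 6 and 7 the audio has to be in a form the recognizer accepts. As a rough sketch, assuming the recognizer wants 16-bit signed little-endian mono PCM at 16 kHz (a common setup for Vosk models, but an assumption here rather than something this README states), a check like the one below could decide whether a file needs conversion first. The class name, method name, and target rate are all hypothetical.

```java
import javax.sound.sampled.AudioFormat;

/** Hypothetical helper: decides whether audio needs converting before recognition. */
final class AudioFormatCheck {
    // Assumed target rate; the real value depends on the model in use.
    static final float TARGET_SAMPLE_RATE = 16_000f;

    /** Returns true for 16-bit signed little-endian mono PCM at the target rate. */
    static boolean isRecognizerReady(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getChannels() == 1
                && format.getSampleSizeInBits() == 16
                && !format.isBigEndian()
                && format.getSampleRate() == TARGET_SAMPLE_RATE;
    }

    public static void main(String[] args) {
        // A format that matches the assumed target, and one that does not.
        AudioFormat mono16k = new AudioFormat(16_000f, 16, 1, true, false);
        AudioFormat cdStereo = new AudioFormat(44_100f, 16, 2, true, false);
        System.out.println("mono 16 kHz ready: " + isRecognizerReady(mono16k));
        System.out.println("stereo 44.1 kHz ready: " + isRecognizerReady(cdStereo));
    }
}
```

Files that fail the check would go through the audio conversion step before being handed to the recognizer.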

Model Management

Accessing Model Manager

Available Models

The app provides access to 150+ models from the official Vosk repository:

Popular Languages:

Model Types:

Downloading Models

  1. Open Model Manager
  2. Browse available models in the table
  3. Select a model
  4. Click “Download Model”
  5. Wait for download and extraction (progress shown)
  6. Model is automatically installed and ready to use
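
The extraction in step 5 can be sketched with the standard library alone. This is a minimal illustration, assuming the downloaded model arrives as a zip archive; `ModelArchive` and its methods are hypothetical names, not the app's actual classes. The path check guards against archive entries that try to write outside the target directory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: unpacks a downloaded model archive into a target directory. */
final class ModelArchive {
    /** Resolves an entry name under targetDir, rejecting names that would escape it. */
    static Path resolveInside(Path targetDir, String entryName) {
        Path root = targetDir.toAbsolutePath().normalize();
        Path resolved = root.resolve(entryName).normalize();
        if (!resolved.startsWith(root)) {
            throw new IllegalArgumentException("Entry escapes target directory: " + entryName);
        }
        return resolved;
    }

    /** Extracts every entry of the zip stream into targetDir and returns the file count. */
    static int extract(InputStream zipData, Path targetDir) throws IOException {
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(zipData)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = resolveInside(targetDir, entry.getName());
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry; the zip stream stays open.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```

A caller would pass the downloaded file's input stream and the directory where models are kept, then report progress from the returned count or from the loop itself.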

Note: Large models will show a confirmation dialog before downloading.
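
One way the "large model" check could work is to parse the size label shown for each model and compare it with a cutoff. The sketch below is illustrative only: the 500 MB threshold, the decimal units, and the `DownloadConfirmation` class are assumptions, since the app's real rule is not documented here.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decides whether a download is large enough to ask first. */
final class DownloadConfirmation {
    // Assumed cutoff; the app's actual threshold may differ.
    static final long CONFIRM_THRESHOLD_BYTES = 500_000_000L;

    private static final Pattern SIZE = Pattern.compile(
            "\\s*(\\d+(?:\\.\\d+)?)\\s*([KMG])B?\\s*", Pattern.CASE_INSENSITIVE);

    /** Parses labels such as "40MB" or "1.8G" into bytes, using decimal units. */
    static long parseSize(String label) {
        Matcher m = SIZE.matcher(label);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized size label: " + label);
        }
        String unitLetter = m.group(2).toUpperCase(Locale.ROOT);
        long unit;
        if (unitLetter.equals("K")) {
            unit = 1_000L;
        } else if (unitLetter.equals("M")) {
            unit = 1_000_000L;
        } else {
            unit = 1_000_000_000L; // "G"
        }
        // BigDecimal avoids floating-point rounding for values like 1.8.
        return new BigDecimal(m.group(1)).multiply(BigDecimal.valueOf(unit)).longValue();
    }

    /** Returns true when the labeled size meets or exceeds the assumed cutoff. */
    static boolean needsConfirmation(String label) {
        return parseSize(label) >= CONFIRM_THRESHOLD_BYTES;
    }
}
```

With this cutoff, the 40MB English model from First Use would download immediately, while the 1.8GB model would prompt first.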

Switching Models

Quick Switch:

  1. Menu: Models → Switch Model…
  2. Select from installed models
  3. Confirm selection

Or via Model Manager:

  1. Open Model Manager
  2. Select an installed model
  3. Click “Use This Model”
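
Both switching paths need a list of installed models. A minimal sketch of building that list follows, assuming each installed model lives in its own subdirectory of a single models folder; that layout and the `InstalledModels` class are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: lists installed models by scanning a models folder. */
final class InstalledModels {
    /** Returns the names of the subdirectories of modelsDir, sorted alphabetically. */
    static List<String> list(Path modelsDir) throws IOException {
        if (!Files.isDirectory(modelsDir)) {
            return Collections.emptyList(); // nothing installed yet
        }
        try (Stream<Path> children = Files.list(modelsDir)) {
            return children
                    .filter(Files::isDirectory)          // skip stray files such as partial downloads
                    .map(p -> p.getFileName().toString())
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```

The returned names could populate the Switch Model dialog directly, with the selected name resolved back to a path under the same folder.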

Automatic Updates

Deleting Models

  1. Open Model Manager
  2. Select an installed model
  3. Click “Delete Model”
  4. Confirm deletion

Technical Details

Architecture

New Components:

Model Storage:

Dependencies

<dependencies>
    <!-- Core speech recognition -->
    <dependency>
        <groupId>com.alphacephei</groupId>
        <artifactId>vosk</artifactId>
        <version>0.3.38</version>
    </dependency>
    
    <!-- UI framework -->
    <dependency>
        <groupId>com.formdev</groupId>
        <artifactId>flatlaf</artifactId>
        <version>3.4.1</version>
    </dependency>
    
    <!-- Audio conversion -->
    <dependency>
        <groupId>ws.schild</groupId>
        <artifactId>jave-all-deps</artifactId>
        <version>3.5.0</version>
    </dependency>
    
    <!-- Web scraping for model registry -->
    <dependency>
        <groupId>org.jsoup</groupId>
        <artifactId>jsoup</artifactId>
        <version>1.17.2</version>
    </dependency>
    
    <!-- JSON processing -->
    <dependency>
        <groupId>com.fasterxml.jackson.core</groupId>
        <artifactId>jackson-databind</artifactId>
        <version>2.17.1</version>
    </dependency>
</dependencies>

Technology Stack

License

MIT

Contributing

Contributions welcome! Please open an issue or submit a pull request.