Phonikud

Hebrew Grapheme-to-Phoneme Conversion
for Real-Time Text-to-Speech

Yakov Kolani¹ Maxim Melichov² Cobi Calev¹ Morris Alper³

¹Independent Researcher ²Reichman University ³Tel Aviv University

Introduction

Real-time text-to-speech (TTS) for Modern Hebrew is challenging because the Hebrew writing system is complex and leaves phonetic features such as stress underspecified.

To address this, we present Phonikud, a lightweight, open-source grapheme-to-phoneme system that produces fully-specified IPA transcriptions with minimal latency.

We also introduce ILSpeech, a new Hebrew speech dataset with expert IPA annotations, designed for both benchmarking and training.

Our results demonstrate that Phonikud improves phoneme prediction accuracy and enables fast, effective Hebrew TTS models.

What Makes Us Different

⏱️

Real-Time Inference

Works with real-time TTS systems such as Piper by using standard IPA phonemes.

🏠

Edge Deployment

Runs locally on a Raspberry Pi, under Home Assistant, or on other edge devices for private, efficient operation.

📊

Data-Efficient Training

TTS models can be fine-tuned with as little as 2 hours of data by leveraging models from other languages.

🌍

Hebrew Phonetics

Handles Hebrew stress and vocal shva, which existing methods miss.

♿

Assistive Tech

Can be used in screen readers with low latency, even offline or remotely.

🎙️

Open TTS Dataset

Published a studio-quality Hebrew speech dataset with ~2 hours of audio and hand-annotated IPA phonemes.

📦

Open Models & Training

Released Phonikud weights, Hebrew TTS models, and full training code.

🎛️

Fine-Grained Phonetic Control

Edit phonemes directly or let the G2P system generate them automatically, giving you control over stress and vowel sounds.

From Text to Speech

See how Phonikud transforms Hebrew text through each stage.

Text

השפה העברית נשמעת יפה כשמבטאים אותה נכון

Input: Regular Hebrew text without vowel markings

Diacritics

הַשָּׂפָה הָעִבְרִית נִשְׁמַ֫עַת יָפָה כְּשֶׁמְּֽבַטְּאִים אוֹתָהּ נָכוֹן

Enhanced diacritics with stress markers and vocal shva
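As a rough illustration of what these marks look like at the character level, the Python sketch below strips combining marks to recover the undiacritized text and lists the words that carry a given mark. It assumes, based only on the example line above, that stress is written with U+05AB and vocal shva with U+05BD. The function names are hypothetical and are not part of the Phonikud API.

```python
import unicodedata

# Assumptions read off the example line above, not taken from Phonikud's docs:
STRESS_MARK = "\u05ab"      # HEBREW ACCENT OLE, appears on the stressed syllable
VOCAL_SHVA_MARK = "\u05bd"  # HEBREW POINT METEG, appears next to a vocal shva


def strip_diacritics(text: str) -> str:
    """Drop every combining mark (Unicode category Mn), keeping letters and spaces."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def marked_words(text: str, mark: str) -> list[str]:
    """Return the whitespace-separated words that contain the given mark."""
    return [word for word in text.split() if mark in word]
```

Under these assumptions, calling `strip_diacritics` on the diacritized line should give back the plain input text, and `marked_words(line, STRESS_MARK)` picks out the words with an explicit stress mark.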

Phonemes

hasafˈa haʔivʁˈit niʃmˈaʔat jafˈa kʃemevatʔˈim ʔotˈa naχˈon.

Phonikud converts the diacritized text to a precise IPA transcription
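In the transcription above, every word carries exactly one primary stress mark (ˈ). A small sanity check along these lines could flag words where stress is missing or duplicated, for instance after editing phonemes by hand. This is an illustrative sketch with hypothetical function names, not part of Phonikud, and it does not assume that every Hebrew word must be stressed.

```python
PRIMARY_STRESS = "\u02c8"  # IPA primary stress mark
PUNCTUATION = ".,!?"


def stress_counts(ipa: str) -> list[tuple[str, int]]:
    """Pair each word of an IPA string with its number of primary stress marks."""
    words = [w.strip(PUNCTUATION) for w in ipa.split()]
    return [(w, w.count(PRIMARY_STRESS)) for w in words if w]


def words_needing_review(ipa: str) -> list[str]:
    """Return words that do not carry exactly one primary stress mark."""
    return [w for w, n in stress_counts(ipa) if n != 1]
```

For the example transcription, `words_needing_review` would return an empty list, since each of the seven words has a single stress mark.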

Audio

Real-time speech synthesis from the phonemes

💡

Flexible Input

Pro tip: You can enter input at any stage! Let the model add diacritics, add them yourself, or input phonemes directly. Try it in the demo!

Full control over the pipeline - input text, diacritics, or phonemes
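One way to picture this flexibility is a front end that guesses which stage an input string is already at and then runs only the remaining steps. The sketch below is a heuristic under stated assumptions: it treats any string without Hebrew letters as phonemes, and any Hebrew string containing combining marks as already diacritized. The stage names and functions are hypothetical and do not reflect how the demo is implemented.

```python
import unicodedata

STAGES = ["text", "diacritics", "phonemes", "audio"]


def detect_stage(text: str) -> str:
    """Guess the pipeline stage of an input string (heuristic, Hebrew or IPA only)."""
    has_letter = any("\u05d0" <= ch <= "\u05ea" for ch in text)
    has_mark = any(
        "\u0591" <= ch <= "\u05c7" and unicodedata.category(ch) == "Mn"
        for ch in text
    )
    if not has_letter:
        return "phonemes"  # assumption: non-Hebrew input is an IPA string
    return "diacritics" if has_mark else "text"


def remaining_steps(text: str) -> list[str]:
    """List the stages still to be produced for this input."""
    return STAGES[STAGES.index(detect_stage(text)) + 1:]
```

With this heuristic, plain Hebrew text still needs diacritics, phonemes, and audio, while an IPA string only needs audio.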

Method Comparison

Comparative evaluation of Phonikud against existing Hebrew TTS approaches

Text Sample	ElevenLabs (Eleven v3)	Google (Gemini v2.5)	RoboShaul (1st place)	Phonikud (ours, v1 alpha)
הוא צפה בס֫רט וראה חיה שצ֫פה במ֫ים 🐸
הוא רצה את זה גם אבל היא ר֫צה מהר והקד֫ימה אותו 🏃‍♀️
בוא תרד לאכול יש בור֫קס עם ת֫רד 🥬

Citation

@misc{kolani2025phonikud,
  title={Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech},
  author={Yakov Kolani and Maxim Melichov and Cobi Calev and Morris Alper},
  year={2025},
  eprint={2506.12311},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.12311},
}