Phonikud

Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

1 Independent Researcher, 2 Reichman University, 3 Tel Aviv University
[Figure: Model Architecture]

Introduction

Real-time text-to-speech (TTS) for Modern Hebrew is challenging due to its complex writing system and underspecified phonetic features such as stress.

To address this, we present Phonikud, a lightweight, open-source grapheme-to-phoneme system that produces fully specified IPA transcriptions with minimal latency.

We also introduce ILSpeech, a new Hebrew speech dataset with expert IPA annotations, designed for both benchmarking and training.

Our results demonstrate that Phonikud improves phoneme prediction accuracy and enables fast, effective Hebrew TTS models.

What Makes Us Different

Real-Time Inference

Works with real-time TTS engines such as Piper by using standard IPA phonemes.

Edge Deployment

Runs locally, so it can be used on a Raspberry Pi, with Home Assistant, or on other edge devices for private, efficient operation.

Data-Efficient Training

TTS models can be fine-tuned with as little as 2 hours of data by starting from models trained on other languages.

Hebrew Phonetics

Predicts Hebrew stress and vocal shva, which existing methods miss.

Assistive Tech

Can be used in screen readers with low delay, whether offline or remote.

Open TTS Dataset

Published a studio-quality Hebrew speech dataset with ~2 hours of audio and hand-annotated IPA phonemes.

Open Models & Training

Released Phonikud weights, Hebrew TTS models, and full training code.

Fine-Grained Phonetic Control

Edit phonemes directly for control over stress and vowel sounds, or let the G2P system produce them automatically.
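
Editing phonemes directly can be as simple as moving the IPA primary stress mark. The following is a minimal sketch, not part of the Phonikud API: `move_stress` is a hypothetical helper, and it assumes that the stress mark (ˈ, U+02C8) sits immediately before the stressed vowel and that the vowels are limited to a, e, i, o, u.

```python
STRESS = "\u02c8"        # IPA primary stress mark
VOWELS = set("aeiou")    # assumed vowel inventory for this sketch


def move_stress(word: str, vowel_index: int) -> str:
    """Return `word` with the stress mark placed before its n-th vowel (0-based).

    Hypothetical helper for illustration; any existing stress mark is removed first.
    """
    bare = word.replace(STRESS, "")
    positions = [i for i, ch in enumerate(bare) if ch in VOWELS]
    if not 0 <= vowel_index < len(positions):
        raise IndexError(f"{word!r} has {len(positions)} vowels")
    i = positions[vowel_index]
    return bare[:i] + STRESS + bare[i:]


if __name__ == "__main__":
    # Shift stress from the final vowel to the first vowel.
    print(move_stress("jaf\u02c8a", 0))
```

The edited phoneme string can then be passed to the synthesis stage in place of the automatically generated one.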

From Text to Speech

See how Phonikud transforms Hebrew text through each stage.

1. Text
השפה העברית נשמעת יפה כשמבטאים אותה נכון
Input: regular Hebrew text without vowel markings.

2. Diacritics
הַשָּׂפָה הָעִבְרִית נִשְׁמַ֫עַת יָפָה כְּשֶׁמְּֽבַטְּאִים אוֹתָהּ נָכוֹן
Enhanced diacritics with stress markers and vocal shva.

3. Phonemes
hasafˈa haʔivʁˈit niʃmˈaʔat jafˈa kʃemevatʔˈim ʔotˈa naχˈon.
Phonikud converts the diacritized text to a precise IPA phonetic transcription.

4. Audio
Real-time TTS synthesis from the phonemes.
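
To inspect what the diacritics stage adds, the marks in a diacritized string can be listed by code point. This is a small sketch using only Python's standard library; it assumes the diacritics are encoded as ordinary Unicode combining marks, and the helper names `list_marks` and `strip_marks` are hypothetical. The page does not state which marks encode stress and vocal shva, so the sketch simply reports whatever marks are present.

```python
import unicodedata


def list_marks(text: str) -> list[tuple[str, str]]:
    """Return (code point, Unicode name) for every combining mark in `text`."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch))
        for ch in text
        if unicodedata.combining(ch)
    ]


def strip_marks(text: str) -> str:
    """Remove all combining marks, leaving only the base letters."""
    return "".join(ch for ch in text if not unicodedata.combining(ch))


if __name__ == "__main__":
    word = "נִשְׁמַ֫עַת"
    for code_point, name in list_marks(word):
        print(code_point, name)
    print(strip_marks(word))
```

Stripping the marks from a diacritized word recovers the plain spelling used as input in step 1.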
Flexible Input
You can enter the pipeline at any stage: give plain text and let the model add diacritics, supply your own diacritized text, or input phonemes directly. Try it in the demo!
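
One way to route input to the right stage is to look at which characters it contains. The sketch below is an illustration under stated assumptions, not how Phonikud itself decides: `detect_stage` is a hypothetical helper, Hebrew letters are taken to be U+05D0 to U+05EA, diacritics are taken to be combining marks in U+0591 to U+05C7, and anything without Hebrew letters is treated as phonemes.

```python
import unicodedata


def detect_stage(text: str) -> str:
    """Guess the pipeline stage of `text`: 'text', 'diacritics', or 'phonemes'.

    Hypothetical helper for illustration only.
    """
    has_letters = any("\u05d0" <= ch <= "\u05ea" for ch in text)
    has_marks = any(
        "\u0591" <= ch <= "\u05c7" and unicodedata.combining(ch) for ch in text
    )
    if has_letters and has_marks:
        return "diacritics"   # Hebrew letters carrying vowel or accent marks
    if has_letters:
        return "text"         # plain, unvocalized Hebrew
    return "phonemes"         # no Hebrew letters: assume IPA input


if __name__ == "__main__":
    for sample in ("יפה", "יָפָה", "jaf\u02c8a"):
        print(sample, "->", detect_stage(sample))
```

A caller could use the returned label to skip the stages that the input has already passed.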

Method Comparison

Comparative evaluation of Phonikud against existing Hebrew TTS approaches

Systems compared: ElevenLabs (Eleven v3), Google (Gemini v2.5), RoboShaul (1st place), Phonikud (ours, v1 alpha)

Text samples:
1. הוא צפה בס֫רט וראה חיה שצ֫פה במ֫ים 🐸
2. הוא רצה את זה גם אבל היא ר֫צה מהר והקד֫ימה אותו 🏃‍♀️
3. בוא תרד לאכול יש בור֫קס עם ת֫רד 🥬

Citation

@misc{kolani2025phonikud,
  title={Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech},
  author={Yakov Kolani and Maxim Melichov and Cobi Calev and Morris Alper},
  year={2025},
  eprint={2506.12311},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.12311},
}