Research · Hebrew G2P

ReNikud

Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

Standard Hebrew G2P is usually trained on nikud-style text; ReNikud adds weak supervision from speech to model spoken pronunciation.

Maxim Melichov*¹, Yakov Kolani*², Morris Alper³

¹Reichman University, ²Independent Researcher, ³Carnegie Mellon University

01

The gap

Prescriptive vocalization and spoken pronunciation do not always match.

Standard G2P predicts vowel diacritics and follows prescriptive rules. Everyday speech shifts vowels, simplifies conjunctions, and pronounces loanwords by ear.

ReNikud uses weak supervision from speech to train a character-aligned pseudo-vocalization model for spoken Hebrew G2P.

Colloquial phonology · conjunction

ובדרך

Prescriptive nikud uvadˈeʁeχ

Spoken norm vebadˈeʁeχ

Colloquial pronunciation

בירושלים

Prescriptive nikud biʁuʃalˈajim

Spoken norm bejeʁuʃalˈajim

02

Method

Weak labels from audio, then per-character pseudo-vocalization.

1

Audio pseudo-labeling

Two ASR passes run in parallel over unlabeled speech and produce a Hebrew transcript and an IPA transcript for each clip. An FST filter keeps only the pairs that align, which leaves 1.52M sentences from ~1.7k hours.

Same clip · two ASR transcripts

Hebrew ASR: שלום
IPA ASR: ʃalˈom
FST filter:
שלום ʃalˈom → keep
שלום ʃagˈom → drop
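The keep/drop decision above can be sketched in a few lines of Python. This is only an illustration of the idea, not the project's FST: a memoized matcher stands in for the transducer, and the letter-to-consonant table `ALLOWED` and the function name `aligns` are hypothetical, covering just the letters in the example. Each Hebrew letter may emit one allowed consonant (or nothing), optionally followed by a stress mark and a vowel.

```python
from functools import lru_cache

VOWELS = "aeiou"
STRESS = "ˈ"

# Hypothetical toy table: consonants each letter may realize ("" = silent).
ALLOWED = {
    "ש": ("ʃ", "s"),
    "ל": ("l",),
    "ו": ("v", "w", ""),
    "מ": ("m",),
    "ם": ("m",),
}

def aligns(hebrew: str, ipa: str) -> bool:
    """Return True if the IPA string can be produced letter by letter."""

    @lru_cache(maxsize=None)
    def match(i: int, j: int) -> bool:
        if i == len(hebrew):
            return j == len(ipa)  # all letters used: IPA must be consumed too
        for cons in ALLOWED.get(hebrew[i], ()):
            if not ipa.startswith(cons, j):
                continue
            k = j + len(cons)
            if match(i + 1, k):  # letter carries no vowel
                return True
            s = k + 1 if ipa.startswith(STRESS, k) else k  # optional stress
            if s < len(ipa) and ipa[s] in VOWELS and match(i + 1, s + 1):
                return True
        return False

    return match(0, 0)

print(aligns("שלום", "ʃalˈom"))  # True  -> keep
print(aligns("שלום", "ʃagˈom"))  # False -> drop
```

The second pair is rejected because ל has no allowed realization that matches the g in the IPA transcript, so no letter-by-letter alignment exists.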
2

Pseudo-vocalization

Character-level encoder with three parallel heads per letter — consonant, vowel, stress. Trained on FST-aligned pseudo-labels from step 1; constrained decoding yields spoken IPA.

FST-aligned labels

Per character · שלום shalom ʃalˈom

Aligned inputs
Heb: ש · ל · ו · ם
IPA: ʃa · lˈo · ∅ · m
Predicted heads per Hebrew character
C: /ʃ/ · /l/ · ∅ · /m/
V: a · o · ∅ · ∅
σ: 0 · 1 · 0 · 0

Read by column: Hebrew letter → consonant, vowel, stress.
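As a rough sketch of how one aligned IPA chunk per letter maps to the three head targets and back, the Python below splits each chunk into (consonant, vowel, stress) and then reassembles the IPA string. The helper names `split_chunk` and `join_heads` are hypothetical, the empty string stands in for ∅, and the sketch assumes each chunk holds at most one vowel, placed last.

```python
VOWELS = set("aeiou")
STRESS = "ˈ"

def split_chunk(chunk: str) -> tuple[str, str, int]:
    """Split one letter's IPA chunk into (consonant, vowel, stress)."""
    stress = int(STRESS in chunk)
    body = chunk.replace(STRESS, "")
    # Assumption: the vowel, if present, is the last symbol of the chunk.
    vowel = body[-1] if body and body[-1] in VOWELS else ""
    consonant = body[: len(body) - len(vowel)]
    return consonant, vowel, stress

def join_heads(heads) -> str:
    """Rebuild IPA: consonant, then stress mark (if any), then vowel."""
    return "".join(c + (STRESS if s else "") + v for c, v, s in heads)

chunks = ["ʃa", "lˈo", "", "m"]  # aligned to ש, ל, ו, ם
heads = [split_chunk(ch) for ch in chunks]
print(heads)              # [('ʃ', 'a', 0), ('l', 'o', 1), ('', '', 0), ('m', '', 0)]
print(join_heads(heads))  # ʃalˈom
```

Placing the stress mark between consonant and vowel matches the lˈo chunk shown for ל in the example.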

Consonant · 25 classes
Consonant inventory: abdefhijklmnopstuvwzɡʁʃʒʔχ ∅
All non-vowel phoneme symbols, or no consonant.

Vowel · 7 classes
Vowel targets: a e i o u ∅
Hebrew vowel output is one vowel or none.

Stress · binary
Stress target: ˈ / ∅
Binary head: stress mark or no stress.
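The page does not spell out how decoding is constrained. One plausible reading, sketched below in Python, is that the consonant head's choice for each letter is restricted to the consonants that letter can realize. The table `ALLOWED_CONSONANTS`, the function `constrained_argmax`, and the scores are all hypothetical and exist only to illustrate a masked argmax.

```python
# Hypothetical per-letter constraint table ("" stands in for ∅, no consonant).
ALLOWED_CONSONANTS = {
    "ש": {"ʃ", "s"},
    "ל": {"l"},
    "ו": {"v", "w", ""},
    "ם": {"m"},
}

def constrained_argmax(letter: str, scores: dict[str, float]) -> str:
    """Pick the best-scoring consonant among those allowed for the letter."""
    allowed = ALLOWED_CONSONANTS[letter]
    candidates = {c: s for c, s in scores.items() if c in allowed}
    if not candidates:
        raise ValueError(f"no allowed consonant scored for {letter!r}")
    return max(candidates, key=candidates.get)

# Made-up head scores for the letter ל: the raw argmax would be ɡ.
scores = {"ɡ": 2.1, "l": 1.7, "m": -0.3}
print(constrained_argmax("ל", scores))  # l
```

Under this assumption the mask rules out outputs that no Hebrew letter in that position could spell, whatever the unconstrained scores say.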
03

Spoken Hebrew

Spoken Hebrew pronunciation cases from the MILIM Benchmark.

Lexical slang

פאדיחה

Prescriptive nikud padiχˈa

Slang norm fadˈiχa

Colloquial phonology · conjunction

ומשפחה

Normative /u/ umiʃpaχˈa

Colloquial /ve/ vemiʃpaχˈa

Penultimate stress · loanword

קונספט

Mil’ra bias konsˈept

Mil’el target kˈonsept

Rare phoneme · /w/

וויסקי

Vav · /v/ vˈiski

Loanword · /w/ wˈiski
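The stress contrasts in the cases above can be read straight from the IPA strings. The Python sketch below is a hypothetical helper, not part of the benchmark. It assumes one vowel per syllable and counts the vowels from the stress mark onward: one means final stress (mil'ra), two means penultimate stress (mil'el).

```python
VOWELS = set("aeiou")
STRESS = "ˈ"

def stress_position(ipa: str) -> str:
    """Classify stress as 'final', 'penultimate', 'other', or 'unmarked'."""
    if STRESS not in ipa:
        return "unmarked"
    tail = ipa.split(STRESS, 1)[1]  # everything after the stress mark
    n_vowels = sum(ch in VOWELS for ch in tail)
    return {1: "final", 2: "penultimate"}.get(n_vowels, "other")

for word in ["konsˈept", "kˈonsept", "padiχˈa", "fadˈiχa"]:
    print(word, stress_position(word))
# konsˈept final
# kˈonsept penultimate
# padiχˈa final
# fadˈiχa penultimate
```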

Abstract

Modern Hebrew grapheme-to-phoneme conversion is hard because Hebrew orthography is an abjad, with most vowels omitted in writing. Common pipelines infer pronunciation via nikud, but this depends on scarce annotated vocalization data, does not fully represent lexical stress, and reflects prescriptive norms more than everyday spoken usage.

ReNikud is an audio-supervised Hebrew G2P method. It builds phonemic pseudo-labels from large unlabeled Hebrew speech corpora using ASR, then trains a character-aligned pseudo-vocalization model that predicts IPA realizations at each grapheme position. On established Hebrew G2P benchmarks and targeted spoken-Hebrew evaluations, ReNikud improves over prior baselines.

Citation

@misc{melichov2026renikud,
  title={ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion},
  author={Maxim Melichov and Yakov Kolani and Morris Alper},
  year={2026},
  url={https://arxiv.org/pdf/2606.20179},
}