Research · Hebrew G2P

ReNikud

Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

Standard Hebrew G2P is usually trained on nikud-style text; ReNikud adds weak supervision from speech to model spoken pronunciation.

Maxim Melichov*¹ Yakov Kolani*² Morris Alper³

¹Reichman University ²Independent Researcher ³Carnegie Mellon University

Hebrew → IPA from real speech

ובדרך

Nikud-aligned

uvadˈeʁeχ

Spoken

vebadˈeʁeχ

Weak supervision from 1,700+ hrs Israeli speech

The gap

Prescriptive vocalization and spoken pronunciation do not always match.

Standard G2P predicts vowel diacritics and then applies prescriptive pronunciation rules. Everyday speech shifts vowels, simplifies conjunctions, and pronounces loanwords by ear.

ReNikud uses weak supervision from speech to train a character-aligned pseudo-vocalization model for spoken Hebrew G2P.

Colloquial phonology · conjunction

ובדרך

Prescriptive nikud uvadˈeʁeχ

Spoken norm vebadˈeʁeχ

Colloquial pronunciation

בירושלים

Prescriptive nikud biʁuʃalˈajim

Spoken norm bejeʁuʃalˈajim

Method

Weak labels from audio, then per-character pseudo-vocalization.

Audio pseudo-labeling

ASR runs in parallel on unlabeled speech and produces two transcripts per clip, one in Hebrew script and one in IPA. An FST filter keeps only the pairs whose transcripts align, which yields 1.52M sentences from ~1.7k hours of audio.

Same clip · two ASR transcripts

Hebrew ASR

שלום

IPA ASR

ʃalˈom

שלום ʃalˈom

keep

שלום ʃagˈom

drop
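The keep/drop decision above can be sketched as a small acceptor that checks whether an IPA transcript can be split into one chunk per Hebrew letter. This is a minimal illustration, not the project's FST: the function name `aligns`, the chunk shape (optional consonant, optional stress mark, optional vowel), and the reduced letter-to-consonant table `ALLOWED` are all assumptions made for this example.

```python
from functools import lru_cache

VOWELS = set("aeiou")
STRESS = "ˈ"

# Hypothetical, heavily reduced letter-to-consonant table (illustration only).
# The empty string means the letter may carry no consonant of its own.
ALLOWED = {
    "ש": {"ʃ", "s"},
    "ל": {"l"},
    "ו": {"v", "w", ""},
    "ם": {"m"},
}

def aligns(hebrew: str, ipa: str) -> bool:
    """True if ipa splits into one (consonant?, stress?, vowel?) chunk per letter."""

    @lru_cache(maxsize=None)
    def walk(i: int, j: int) -> bool:
        # i indexes Hebrew letters, j indexes IPA symbols.
        if i == len(hebrew):
            return j == len(ipa)
        # Letters missing from the table have no options, so they never align.
        for cons in ALLOWED.get(hebrew[i], set()):
            if not ipa.startswith(cons, j):
                continue
            k = j + len(cons)
            for stress in (STRESS, ""):
                if not ipa.startswith(stress, k):
                    continue
                k2 = k + len(stress)
                # Option 1: the chunk ends with a vowel.
                if k2 < len(ipa) and ipa[k2] in VOWELS and walk(i + 1, k2 + 1):
                    return True
                # Option 2: no vowel, allowed only when no stress mark was consumed.
                if stress == "" and walk(i + 1, k2):
                    return True
        return False

    return walk(0, 0)

pairs = [("שלום", "ʃalˈom"), ("שלום", "ʃagˈom")]
kept = [pair for pair in pairs if aligns(*pair)]
print(kept)
```

Only the first pair survives the filter: in the second, the symbol after `ʃa` cannot be produced by the letter ל under the toy table, so no segmentation exists.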

Pseudo-vocalization

A character-level encoder with three parallel heads per letter: consonant, vowel, and stress. It is trained on the FST-aligned pseudo-labels from the audio pseudo-labeling step, and constrained decoding yields spoken IPA.

FST-aligned labels

Per character · שלום shalom → ʃalˈom

Aligned inputs

Heb ש ל ו ם

IPA ʃa lˈo ∅ m

Predicted heads per Hebrew character

C /ʃ/ /l/ ∅ /m/

V a o ∅ ∅

σ 0 1 0 0

Read by column: Hebrew letter → consonant, vowel, stress.
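Reading the three heads by column and joining the results gives the final IPA string. The sketch below assumes the stress mark is written between the consonant and the vowel of the stressed letter, as in the aligned chunk lˈo above; the function name `assemble_ipa` and the use of empty strings for ∅ are choices made for this example only.

```python
def assemble_ipa(consonants, vowels, stresses):
    """Join per-letter head outputs into one IPA string.

    Each argument has one entry per Hebrew letter. An empty string stands
    for the null class, and stress is 0 or 1.
    """
    parts = []
    for consonant, vowel, stressed in zip(consonants, vowels, stresses):
        mark = "ˈ" if stressed else ""
        parts.append(consonant + mark + vowel)
    return "".join(parts)

# Head outputs for the letters of שלום, copied from the rows above.
C = ["ʃ", "l", "", "m"]
V = ["a", "o", "", ""]
S = [0, 1, 0, 0]

print(assemble_ipa(C, V, S))  # ʃalˈom
```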

Consonant · 25 classes
Consonant inventory: abdefhijklmnopstuvwzɡʁʃʒʔχ ∅
All non-vowel phoneme symbols, or no consonant.

Vowel · 7 classes
Vowel targets: a e i o u ∅
Hebrew vowel output is one vowel or none.

Stress · binary
Stress target: ˈ / ∅
Binary head: stress mark or no stress.
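The page does not say how decoding is constrained. One plausible form, shown here purely as an assumption, restricts the consonant head to the classes a given Hebrew letter can realize and takes the best-scoring class among them. The table `ALLOWED_CONSONANTS`, the function `constrained_argmax`, and the scores are hypothetical.

```python
# Hypothetical per-letter constraints (illustration only, not the paper's table).
# The empty string stands for the null consonant class.
ALLOWED_CONSONANTS = {
    "ש": {"ʃ", "s"},
    "ו": {"v", "w", ""},
}

def constrained_argmax(letter, scores):
    """Pick the best-scoring consonant class that the letter permits.

    scores maps each consonant class to a model score. Letters missing
    from the table are left unconstrained.
    """
    allowed = ALLOWED_CONSONANTS.get(letter)
    candidates = {
        cls: score
        for cls, score in scores.items()
        if allowed is None or cls in allowed
    }
    if not candidates:
        raise ValueError(f"no permitted class has a score for letter {letter!r}")
    return max(candidates, key=candidates.get)

# The unconstrained best class is "m", but the letter ו cannot produce it.
scores = {"m": 2.0, "v": 1.5, "w": 0.2, "": -1.0}
print(constrained_argmax("ו", scores))  # v
```

Under this kind of masking, a loanword such as וויסקי can still come out with /w/ whenever the model scores that class above /v/, since both remain permitted for the letter.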

Spoken Hebrew

Spoken Hebrew pronunciation cases from the MILIM benchmark.

Lexical slang

פאדיחה

Prescriptive nikud padiχˈa

Slang norm fadˈiχa

Colloquial phonology · conjunction

ומשפחה

Normative /u/ umiʃpaχˈa

Colloquial /ve/ vemiʃpaχˈa

Penultimate stress · loanword

קונספט

Mil’ra bias konsˈept

Mil’el target kˈonsept

Rare phoneme · /w/

וויסקי

Vav · /v/ vˈiski

Loanword · /w/ wˈiski

—

Abstract

Modern Hebrew grapheme-to-phoneme conversion is hard because Hebrew orthography is an abjad, with most vowels omitted in writing. Common pipelines infer pronunciation via nikud, but this depends on scarce annotated vocalization data, does not fully represent lexical stress, and reflects prescriptive norms more than everyday spoken usage.

ReNikud is an audio-supervised Hebrew G2P method. It builds phonemic pseudo-labels from large unlabeled Hebrew speech corpora using ASR, then trains a character-aligned pseudo-vocalization model that predicts IPA realizations at each grapheme position. On established Hebrew G2P benchmarks and targeted spoken-Hebrew evaluations, ReNikud improves over prior baselines.

—

Citation

@misc{melichov2026renikud,
  title={ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion},
  author={Maxim Melichov and Yakov Kolani and Morris Alper},
  year={2026},
  url={https://arxiv.org/pdf/2606.20179},
}