The phonology of Japanese features about 15 consonant phonemes, the cross-linguistically typical five-vowel
system of /a, e, i, o, u/, and a relatively simple phonotactic distribution of phonemes allowing few consonant clusters. It is traditionally described as having a mora as the unit of timing, with each mora taking up about the same length of time, so that the disyllabic [ɲippoɴ] ("Japan") may be analyzed as /niQpoN/ and dissected into four moras, /ni/, /Q/, /po/, and /N/. Standard Japanese is a pitch-accent language, wherein the position or absence of a pitch drop may determine the meaning of a word: [haꜜɕi] "chopsticks", [haɕiꜜ] "bridge", [haɕi] "edge" (see Japanese pitch accent). Unless otherwise noted, the following describes the standard variety of Japanese based on the
Tokyo dialect.


*Voiceless stops /p, t, k/ are slightly aspirated: less aspirated than English stops, but more so than Spanish.
*/p/, a remnant of
Old Japanese
, now occurs almost always medially in compounds, typically as a result of gemination (as in 切符 ''kippu'', 切腹 ''seppuku'' or 北方 ''hoppō'') or after /N/ (as in 音符 ''onpu''), and in a few older compounds as a result of contraction over time (as in 河童 ''kappa''). It occurs initially or medially in onomatopoeia. A few non-onomatopoeic words also have it initially, such as 風太郎 ''pūtarō'', although as a personal name this is still pronounced ''Fūtarō''. As ''gairaigo'', loanwords of non-Middle-Chinese origin (Chinese borrowings not of Middle Chinese origin such as パオズ ''paozu'' and ペテン ''peten'', as well as borrowings from non-Chinese languages such as パーティ ''pāti''), enter the language, /p/ is increasingly used in transcription, initially or medially.
*/t, d, n/ are laminal denti-alveolar (that is, the blade of the tongue contacts the back of the upper teeth and the front part of the alveolar ridge) and /s, z/ are laminal alveolar. /w/ is traditionally described as a velar [ɰ] or labialized velar approximant [w] or something between the two, or as the semivocalic equivalent of /u/ with little to no rounding, while a 2020 real-time MRI study found it is better described as a bilabial approximant [β̞].
*Consonants inside parentheses are allophones of other phonemes, at least in native words. In loanwords, [ɸ, ɕ, ʑ, t͡s, d͡z, t͡ɕ, d͡ʑ] sometimes occur phonemically, outside of the allophonic variation described below.
*/s, t/ before /i/ and /sj, tj/ are alveolo-palatal [ɕ, t͡ɕ]. /t/ before /u/ is [t͡s]. /z, d/ before /i/ and /zj, dj/ are [ʑ, d͡ʑ], but in most dialects they are neutralized as free variation between the two realizations; /z, d/ before /u/ are [z, d͡z], but are also neutralized in most dialects (see below). Traditionally, it is described that, in neutralizing varieties, the affricates [d͡z, d͡ʑ] occur when word-initial or preceded by /N/, and the fricatives [z, ʑ] otherwise. However, a 2010 corpus study found that both variants were found in all positions, and that the time it takes to produce the consonant or consonant cluster (to which /N/, /Q/, and pauses contribute) was the most reliable predictor for affricate realization.
*/h/ is [ç] before /i/ and /j/, and [ɸ] before /u/, coarticulated with the labial compression of that vowel. Geminate /h/ is now found only in recent loanwords (e.g. ''Gohho'' '(van) Gogh', ''Bahha'' 'Bach') and rarely in Sino-Japanese or mixed compounds (e.g. ''juhhari'' 'ten stitches', ''zeffuchō'' 'terrible slump').
*/N/ is a syllable-final moraic nasal with variable pronunciation depending on what follows. It may be considered an allophone of /n, m/ in syllable-final position or a distinct phoneme.
*Realization of the liquid phoneme /r/ varies greatly depending on environment and dialect. The prototypical and most common pronunciation is an apical tap, either alveolar [ɾ] or postalveolar. Utterance-initially and after /N/, the tap is typically articulated in such a way that the tip of the tongue is at first momentarily in light contact with the alveolar ridge before being released rapidly by airflow. This sound is described variably as a tap, a "variant of [ɾ]", "a kind of weak plosive", and "an affricate with short friction". The apical alveolar or postalveolar lateral approximant [l] is a common variant in all conditions, particularly utterance-initially and before /i, j/. According to one account, utterance-initially and intervocalically (that is, except after /N/), the lateral variant is better described as a tap [ɺ] rather than an approximant. The retroflex lateral approximant [ɭ] is also found before /i, j/. In Tokyo's Shitamachi dialect, the alveolar trill [r] is a variant marked with vulgarity. Other reported variants include the alveolar approximant [ɹ], the alveolar stop [d], the
retroflex flap [ɽ]
, the lateral fricative [ɮ], and the retroflex stop [ɖ].


Non-coronal voiced stops /b, ɡ/ between vowels may be weakened to fricatives ([β] and [ɣ] respectively), especially in fast or casual speech. However, /ɡ/ is further complicated by its variant realization as a velar nasal [ŋ]. Standard Japanese speakers can be categorized into three groups (A, B, C), explained below. If a speaker pronounces a given word consistently with the allophone [ŋ] (i.e., a B-speaker), that speaker will never have [ɣ] as an allophone in that same word. If a speaker varies between [ŋ] and [ɡ] (i.e., an A-speaker) or is generally consistent in using [ɡ] (i.e., a C-speaker), then the velar fricative [ɣ] is always another possible allophone in fast speech.

/ɡ/ may be weakened to nasal [ŋ] when it occurs within words; this includes not only between vowels but also between a vowel and a consonant. There is a fair amount of variation between speakers, however. One account suggests that the variation follows social class, while another suggests that it follows age and geographic location. The generalized situation is as follows.

;At the beginning of words:
* all present-day standard Japanese speakers generally use the stop [ɡ] at the beginning of words: /ɡaijuu/ > [ɡaijɯː] ''gaiyū'' 'overseas trip' (but not *[ŋaijɯː])
;In the middle of simple words (i.e. non-compounds):
* A. a majority of speakers use either [ŋ] or [ɡ] in free variation: /kaɡu/ > [kaŋɯ] or [kaɡɯ] ''kagu'' 'furniture'
* B. a minority of speakers consistently use [ŋ]: /kaɡu/ > [kaŋɯ] (but not *[kaɡɯ])
* C. most speakers in western Japan and a smaller minority of speakers in Kantō consistently use [ɡ]: /kaɡu/ > [kaɡɯ] (but not *[kaŋɯ])
;In the middle of compound words morpheme-initially:
* B-speakers mentioned directly above consistently use [ɡ].

So, for some speakers the following two words are a minimal pair while for others they are homophonous:
* ''sengo'' (せんご, 千五) 'one thousand and five' = [seŋɡo] for B-speakers
* ''sengo'' (せんご, 戦後) 'postwar' = [seŋŋo] for B-speakers

To summarize using the example of ''hage'' 'baldness':
* A-speakers: /haɡe/ > [haŋe] or [haɡe] or [haɣe]
* B-speakers: /haɡe/ > [haŋe]
* C-speakers: /haɡe/ > [haɡe] or [haɣe]

Some phonologists posit a distinct phoneme /ŋ/, citing pairs such as [oːɡaɾasɯ] 'big sheet of glass' vs. [oːŋaɾasɯ] 'big raven'.
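The three speaker groups summarized above can be expressed as a small lookup. The following is only a minimal sketch: the function name `g_allophones`, the `word_initial` flag, and the symbols used for the stop, nasal, and fricative are illustrative assumptions, and the compound-internal case is deliberately left out.

```python
# Hypothetical lookup of the reported realizations of the velar stop phoneme.
# Group labels A, B, C follow the text; the symbols are illustrative assumptions.
MEDIAL_REALIZATIONS = {
    "A": {"ŋ", "ɡ", "ɣ"},  # free variation, plus the fricative in fast speech
    "B": {"ŋ"},            # consistently nasal, never the fricative
    "C": {"ɡ", "ɣ"},       # consistently the stop, plus the fricative in fast speech
}


def g_allophones(group, word_initial=False):
    """Return the set of realizations reported for a speaker group.

    Only word-initial position and the middle of simple words are modelled.
    """
    if group not in MEDIAL_REALIZATIONS:
        raise ValueError(f"unknown speaker group: {group!r}")
    if word_initial:
        # All present-day standard speakers generally use the stop word-initially.
        return {"ɡ"}
    return set(MEDIAL_REALIZATIONS[group])
```

For instance, `g_allophones("B")` contains only the nasal, while any group with `word_initial=True` yields only the stop.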

Palatalization and affrication

The palatals /i/ and /j/ palatalize the consonants preceding them. For coronal consonants, the palatalization goes further so that alveolo-palatal consonants correspond with dental or alveolar consonants ([ta] 'field' vs. [t͡ɕa] 'tea'). /i/ and /j/ also palatalize /h/ to a palatal fricative ([ç]): /hito/ > [çito] ''hito'' ('person'). Of the allophones of /z/, the affricate [d͡z] is most common, especially at the beginning of utterances and after /N/, while
fricative [z]
may occur between vowels. Both sounds, however, are in free variation. In the case of /s/, /z/, and /t/ when followed by /j/, historically, the consonants were palatalized, with /j/ merging into a single pronunciation. In modern Japanese, these are arguably separate phonemes, at least for the portion of the population that pronounces them distinctly in English borrowings. The vowel /u/ also affects consonants that it follows: /h/ is realized as [ɸ] and /t/ as [t͡s]. Although [ɸ] and [t͡s] occur before other vowels in loanwords (e.g. ''faito'' 'fight'; ''fyūjon'' 'fusion'; ''tsaitogaisuto'' 'Zeitgeist'; ''eritsin'' 'Yeltsin'), [ɸ] and [h] are distinguished before vowels except [ɯ] (e.g. English ''fork'' vs. ''hawk'' > ''fōku'' vs. ''hōku''). *[hɯ] is still not distinguished from [ɸɯ] (e.g. English ''hood'' vs. ''food'' > ''fūdo''). Similarly, *[si] and *[(d)zi] usually do not occur even in loanwords, so that English ''cinema'' becomes ''shinema''; although they may be written スィ and ズィ respectively, they are rarely found even among the most innovative speakers and do not occur phonemically.
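The allophonic rules in this section amount to a lookup keyed on the consonant and the following vowel. The sketch below assumes a deliberately broad transcription (no tie bars, /u/ written as `u`) and covers native-word patterns only; the names `ALLOPHONES`, `realize`, and `transcribe` are hypothetical.

```python
# Illustrative table of the palatalization and affrication patterns described above.
# Keys are (consonant phoneme, following vowel); values are broad phonetic symbols.
ALLOPHONES = {
    ("s", "i"): "ɕ",
    ("z", "i"): "ʑ",   # the affricate variant is equally possible in most dialects
    ("t", "i"): "tɕ",
    ("h", "i"): "ç",
    ("h", "u"): "ɸ",
    ("t", "u"): "ts",
}


def realize(consonant, vowel):
    """Return the allophone of a consonant before a vowel, or the consonant itself."""
    return ALLOPHONES.get((consonant, vowel), consonant)


def transcribe(moras):
    """Join a list of (consonant, vowel) pairs into a broad transcription.

    An onsetless mora is given as ("", vowel).
    """
    return "".join(realize(c, v) + v for c, v in moras)
```

With this table, `transcribe([("h", "i"), ("t", "o")])` gives `çito` for ''hito'', matching the example in the text.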


The contrast between /d/ and /z/ is neutralized before /i/ and /u/: [(d)ʑi, (d)zɯ]. By convention, it is often assumed to be /z/, though some analyze it as /d͡z/, the voiced counterpart to [t͡s]. The writing system preserves morphological distinctions, though spelling reform has eliminated historical distinctions except in cases where a mora is repeated once voiceless and once voiced, or where rendaku occurs in a compound word: つづく ''tsuzuku'', いちづける ''ichizukeru'' from ''ichi'' + ''tsukeru''. Some dialects retain the distinctions between /zi/ and /di/ and between /zu/ and /du/, while others retain only /zu/ and /du/ but not /zi/ and /di/, or merge all four.

Moraic nasal

Some analyses of Japanese treat the moraic nasal as an archiphoneme /N/; other less abstract approaches take its uvular or alveolar realization as basic (i.e., /ɴ/ or /n/). It undergoes a variety of assimilatory processes. It is variously:
* bilabial [m] before /p, b, m/.
* laminal [n] before coronals /d, t, n/; never found utterance-finally. Apical [n] is found before liquid /r/.
* alveolo-palatal [ɲ] before alveolo-palatals [t͡ɕ, d͡ʑ, ɲ].
* velar [ŋ] before /k, ɡ/. Before palatalized consonants, it is also palatalized.
* some sort of nasalized vowel before vowels, approximants /j, w/, liquid /r/, and fricatives /s, z, h/. Depending on context and speaker, the vowel's quality may closely match that of the preceding vowel or be more constricted in articulation. It is thus broadly transcribed with [ɰ̃], an ''ad hoc'' semivocalic notation undefined for the exact place of articulation. It is also found utterance-finally.

These assimilations occur beyond word boundaries. When utterance-final, the moraic nasal is traditionally described as uvular [ɴ], sometimes with the qualification that the occlusion may not always be complete or that it is, or approaches, velar [ŋ] after front vowels. However, instrumental studies in the 2010s showed that there is considerable variability in the realization of utterance-final /N/ and that it often involves a lip closure or constriction. A 2021 real-time MRI study found that the tongue position of utterance-final /N/ largely corresponds to that of the preceding vowel, though with overlapping locations, leading the researcher to conclude that /N/ has no specified place of articulation rather than a clear allophonic rule. 5% of the samples of utterance-final /N/ were realized as nasalized vowels with no closure, where appreciable tongue raising was observed only when following /a/.
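The place assimilation just described can be sketched as a function of the following segment. This is an illustrative simplification, not a definitive rule set: the function name `moraic_nasal`, the segment inventories, and the output symbols are assumptions, and the liquid is left to the fallback branch because the text reports both an apical nasal and a nasalized vowel before it.

```python
def moraic_nasal(next_segment=None):
    """Return a broad label for the moraic nasal before next_segment.

    next_segment=None stands for utterance-final position.
    """
    if next_segment is None:
        # Traditional description; instrumental studies report considerable variability.
        return "ɴ"
    if next_segment in {"p", "b", "m"}:
        return "m"   # bilabial
    if next_segment in {"t", "d", "n"}:
        return "n"   # laminal, before coronals
    if next_segment in {"tɕ", "dʑ", "ɲ"}:
        return "ɲ"   # alveolo-palatal
    if next_segment in {"k", "g", "ɡ"}:
        return "ŋ"   # velar (both ASCII g and IPA ɡ accepted)
    # Vowels, approximants, and fricatives; the liquid also falls through here,
    # which is an assumption given the two descriptions in the text.
    return "nasalized vowel"
```

A caller would pass the first segment of the next mora, even across a word boundary, since the assimilations are described as applying beyond word boundaries.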


While Japanese features consonant gemination, there are some limitations on what can be geminated. Most saliently, voiced geminates are prohibited in native Japanese words. This can be seen with suffixation that would otherwise produce voiced geminates. For example, Japanese has a suffix, |ri|, that contains what has been called a "floating mora" that triggers gemination in certain cases (e.g. |tapu| + |ri| > [tappɯɾi] 'a lot of'). When this would otherwise lead to a geminated voiced obstruent, a moraic nasal appears instead as a sort of "partial gemination" (e.g. |zabu| + |ri| > [zambɯɾi] 'splashing'). In the late 20th century, voiced geminates began to appear in loanwords, though they are marked and have a strong tendency to devoice. A frequent example is loanwords from English such as ''bed'' and ''dog'' that, though they end with voiced singletons in English, are geminated (with an epenthetic vowel) when borrowed into Japanese. These geminates frequently undergo devoicing to become less marked, which gives rise to variability in voicing:
: ''doggu'' → ''dokku'' ('dog')
: ''beddo'' → ''betto'' ('bed')
The distinction is not rigorous. For example, when voiced obstruent geminates appear with another voiced obstruent they can undergo optional devoicing (e.g. ''doreddo'' ~ ''doretto'' 'dreadlocks'). This has been attributed to a less reliable distinction between voiced and voiceless geminates compared to the same distinction in non-geminated consonants: speakers may have difficulty distinguishing them due to the partial devoicing of voiced geminates and their resistance to the weakening process mentioned above, both of which can make them sound like voiceless geminates. There is some dispute about how gemination fits with Japanese phonotactics. One analysis, particularly popular among Japanese scholars, posits a special "mora phoneme" (モーラ音素 ''Mōra onso'') /Q/, which corresponds to the sokuon っ. However, not all scholars agree that the use of this "moraic obstruent" is the best analysis.
In those approaches that incorporate the moraic obstruent, it is said to completely assimilate to the following obstruent, resulting in a geminate (that is, double) consonant. The assimilated /Q/ remains unreleased and thus the geminates are phonetically long consonants. /Q/ does not occur before vowels or nasal consonants. This can be seen as an archiphoneme in that it has no underlying place or manner of articulation, and instead manifests as several phonetic realizations depending on context; ''Nippon'', for example, is analyzed as /niQpoN/. Another analysis of Japanese dispenses with /Q/. In such an approach, geminates are phonemicized as sequences of two identical consonants (e.g. /nippoN/). Gemination can of course also be transcribed with a length mark (e.g. [ɲipːoɴ]), but this notation obscures mora boundaries.


Various forms of sandhi exist; the Japanese term for sandhi generally is ''ren'on'' (連音), while sandhi in Japanese specifically is called ''renjō'' (連声). Most commonly, a terminal /n/ on one morpheme results in /n/ or /m/ being added to the start of the next morpheme, as in ''tennō'' (''ten'' + ''ō'' = ''tennō''). In some cases, such as this example, the sound change is used in writing as well, and is considered the usual pronunciation. See 連声 (''in Japanese'') for further examples.


* /u/ is a close near-back vowel with the lips unrounded ([ɯ]) or compressed. When compressed, it is pronounced with the side portions of the lips in contact but with no salient protrusion. In conversational speech, compression may be weakened or completely dropped. It is centralized after /s, z, t/ and palatalized consonants, and possibly also after /n/.
* /e, o/ are mid.
* /a/ is central.

Except for /u/, the short vowels are similar to their Spanish counterparts. Vowels have a phonemic length contrast (i.e. short vs. long). Compare contrasting pairs of words like ''ojisan'' 'uncle' vs. ''ojiisan'' 'grandfather', or ''tsuki'' 'moon' vs. ''tsūki'' 'airflow'. Some analyses make a distinction between a long vowel and a succession of two identical vowels, citing pairs such as ''satōya'' 'sugar shop' vs. ''satooya'' 'foster parent'. They are usually identical in normal speech, but when enunciated a distinction may be made with a pause or a glottal stop inserted between two identical vowels. Within words and phrases, Japanese allows long sequences of phonetic vowels without intervening consonants, pronounced with hiatus, although the
pitch accent
and slight rhythm breaks help track the timing when the vowels are identical. Sequences of two vowels within a single word are extremely common, occurring at the end of many ''i''-type adjectives, for example, and sequences of three or more vowels within a word also occur, as in ''aoi'' 'blue/green'. In phrases, sequences with multiple ''o'' sounds are most common, because the direct object particle ''wo'' (which comes after a word) is realized as ''o'' and can be followed by the honorific prefix ''o-''; both may follow a word that itself ends in an ''o'' sound, and these may be dropped in rapid speech. A fairly common construction exhibiting these is ''... (w)o o-okuri-shimasu'' 'humbly send ...'.


In many dialects, the close vowels /i/ and /u/ become voiceless when placed between two voiceless consonants or, unless accented, between a voiceless consonant and a pausa. Generally, devoicing does not occur in a consecutive manner. This devoicing is not restricted to fast speech, though consecutive devoicing may occur in fast speech. To a lesser extent, /o, a/ may be devoiced, with the further requirement that there be two or more adjacent moras containing the same phoneme. The common sentence-ending copula ''desu'' and polite suffix ''masu'' are typically pronounced [desɯ̥] and [masɯ̥]. Japanese speakers are usually not even aware of the difference between the voiced and devoiced pair. On the other hand, gender roles play a part in prolonging the terminal vowel: it is regarded as effeminate to prolong it, particularly the terminal /u/ as in ''arimasu''. Some nonstandard varieties of Japanese can be recognized by their hyper-devoicing, while in some Western dialects and some registers of formal speech, every vowel is voiced. Recent research has argued that "vowel deletion" more accurately describes the phenomenon. However, Japanese contrasts a devoiced vowel between two identical voiceless fricatives with voiceless fricative gemination. A vowel between two identical voiceless fricatives may have either a weak voiceless approximant release or a revoiced vowel, depending on the rate of speech and individual speech habits.
* ''Nisshinbashi'' (a place name) vs. ''Nishi-shinbashi'' (a place name).
* a second pair glossed 'check out' vs. 'while erasing'.
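The basic devoicing environment described in this section can be sketched as a scan over a list of segments. This is a rough illustration under stated assumptions: the inventory `VOICELESS`, the function name `devoice`, and the way consecutive devoicing is avoided (keeping the earlier site and skipping the next one) are hypothetical choices, since the text does not say which vowel retains voicing.

```python
# Illustrative inventories; the segment symbols are assumptions for this sketch.
VOICELESS = {"p", "t", "k", "s", "h", "ts", "tɕ", "ɕ", "ç", "ɸ"}
CLOSE_VOWELS = {"i", "u"}


def devoice(segments, accented=frozenset()):
    """Return the indices of close vowels predicted to be devoiced.

    segments: list of segment strings for one utterance (CV structure assumed).
    accented: indices of accented vowels; only consulted before a pause.
    """
    devoiced = []
    for i, seg in enumerate(segments):
        if seg not in CLOSE_VOWELS or i == 0:
            continue
        if segments[i - 1] not in VOICELESS:
            continue
        if i == len(segments) - 1:
            # Between a voiceless consonant and a pause: only if unaccented.
            eligible = i not in accented
        else:
            eligible = segments[i + 1] in VOICELESS
        # Assumption: skip a site whose preceding vowel (two segments back)
        # was already devoiced, so devoicing is never consecutive.
        if eligible and devoiced and devoiced[-1] == i - 2:
            eligible = False
        if eligible:
            devoiced.append(i)
    return devoiced
```

For ''desu'' given as `["d", "e", "s", "u"]`, the function returns index 3, the final vowel, unless that index is listed in `accented`.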


Japanese vowels are slightly nasalized when adjacent to nasals /m, n/. Before the moraic nasal /N/, vowels are heavily nasalized.

Glottal stop insertion

At the beginning and end of utterances, Japanese vowels may be preceded and followed by a glottal stop [ʔ], respectively. This can be observed in words pronounced in isolation. When an utterance-final word is uttered with emphasis, this glottal stop is plainly audible, and is often indicated in the writing system with a small letter ''tsu'' (っ) called a sokuon. This is also found in interjections like あっ and えっ. These words are likely to be romanized as ''a''' and ''e'''.


Japanese words have traditionally been analysed as composed of moras, a distinct concept from that of syllables. Each mora occupies one rhythmic unit, i.e. it is perceived to have the same time value. A mora may be "regular", consisting of just a vowel (V) or a consonant and a vowel (CV), or may be one of two "special" moras, /N/ and /Q/. A glide /j/ may precede the vowel in "regular" moras (CjV). Some analyses posit a third "special" mora, /R/, the second part of a long vowel (a chroneme). Traditionally, moras were divided into plain and palatal sets, the latter of which entail palatalization of the consonant element. In such a classification scheme, the plain counterparts of moras with a palatal glide are onsetless moras. /N/ is restricted from occurring word-initially, and /Q/ is found only word-medially. Vowels may be long, and the voiceless consonants may be geminated (doubled). In the analysis with archiphonemes, geminate consonants are the realization of the sequences /Nn/, /Nm/ and sequences of /Q/ followed by a voiceless obstruent, though some words are written with geminate voiced obstruents. In the analysis without archiphonemes, geminate clusters are simply two identical consonants, one after the other. In English, stressed syllables in a word are pronounced louder, longer, and with higher pitch, while unstressed syllables are relatively shorter in duration. Japanese is often considered a mora-timed language, as each mora tends to be of the same length, though not strictly: geminate consonants and moras with devoiced vowels may be shorter than other moras. Factors such as pitch have negligible influence on mora length.
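Because kana spelling is largely moraic, the mora analysis in this section can be approximated by counting kana. The sketch below is illustrative only: the function name `count_moras` is hypothetical, and it assumes the input is written entirely in kana, that small glide and small vowel kana attach to the preceding mora, and that the sokuon, the moraic nasal, and the long-vowel mark each count as one mora.

```python
# Small kana that combine with the preceding kana into a single mora
# (palatal glides and the small vowels used in loanword spellings).
COMBINING_SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")


def count_moras(kana):
    """Count moras in a kana-only string.

    Every kana counts as one mora except the small combining kana above;
    the sokuon (っ), the moraic nasal (ん), and the long-vowel mark (ー)
    are each counted as a mora of their own.
    """
    return sum(1 for ch in kana if ch not in COMBINING_SMALL_KANA)
```

Applied to にっぽん, the count is four, in line with the four-mora analysis of that word given in the introduction, even though the word has only two syllables.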


Standard Japanese has a distinctive pitch accent system: a word can have one of its moras bearing an accent or not. An accented mora is pronounced with a relatively high tone and is followed by a drop in pitch. The various Japanese dialects have different accent patterns, and some exhibit more complex tonic systems.

Sound change

As an agglutinative language, Japanese has generally very regular pronunciation, with much simpler morphophonology than a fusional language would. Nevertheless, there are a number of prominent sound change phenomena, primarily in morpheme combination and in conjugation of verbs and adjectives. Phonemic changes are generally reflected in the spelling, while those that are not indicate either informal or dialectal speech, which further simplifies pronunciation.



In Japanese, sandhi is prominently exhibited in ''rendaku'': consonant mutation of the initial consonant of a morpheme from unvoiced to voiced in some contexts when it occurs in the middle of a word. This phonetic difference is reflected in the spelling via the addition of dakuten, as in ''ka'', ''ga'' (か, が). In cases where this combines with the yotsugana mergers, notably ''ji'' (じ, ぢ) and ''zu'' (ず, づ) in standard Japanese, the resulting spelling is morphophonemic rather than purely phonemic.


The other common sandhi in Japanese is conversion of つ or く (''tsu, ku''), and ち or き (''chi, ki''), and rarely ふ or ひ (''fu, hi'') as a trailing consonant to a geminate consonant when not word-final; orthographically, this is written with the sokuon っ, as it occurs most often with つ. So that
* 一 (いつ ''itsu'') + 緒 (しょ ''sho'') = 一緒 (いっしょ ''issho'')
* 学 (''gaku'') + 校 (''kō'') = 学校 (''gakkō'')
Some long vowels derive from an earlier combination of a vowel and ''fu'' ふ (see onbin). The ''f'' often causes gemination when it is joined with another word:
* 法 (''hafu'' はふ > ''hō'' ほう) + 被 (''hi'' ひ) = 法被 (''happi'' はっぴ), instead of ''hōhi'' ほうひ
* 合 (''kafu'' かふ > ''gō'' ごう) + 戦 (''sen'' せん) = 合戦 (''kassen''), instead of ''gōsen''
* 入 (''nifu'' > ''nyū'') + 声 (''shō'') = 入声 (''nisshō''), instead of ''nyūshō''
* 十 (''jifu'' > ''jū'') + 回 (''kai'') = 十回 (''jikkai''), instead of ''jūkai''
Most words exhibiting this change are Sino-Japanese words deriving from
Middle Chinese
morphemes ending in /t/, /k/ or /p/, which were borrowed on their own into Japanese with a prop vowel after them (e.g., 日 ''nichi'') but in compounds were assimilated to the following consonant (e.g. 日本 ''Nippon'').
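The gemination pattern in the ''issho'' and ''gakkō'' examples can be sketched on Hepburn-style romanization. This is a minimal illustration, not a general rule: the function name `geminate` and the list of onsets are hypothetical, and in the real lexicon the change is conditioned by the individual morphemes (it is typical of Sino-Japanese compounds) rather than applying to every sequence of this shape.

```python
# Romanized morpheme endings that may turn into a geminate (tsu, chi, ku, ki).
GEMINATING_ENDINGS = ("tsu", "chi", "ku", "ki")
# First letters of voiceless onsets in Hepburn romanization (assumed inventory).
VOICELESS_ONSET_LETTERS = set("kstchpf")


def geminate(first, second):
    """Join two romanized morphemes, geminating where the pattern applies.

    If `first` ends in one of the listed syllables and `second` begins with a
    voiceless onset, the final syllable is replaced by a copy of the onset.
    Otherwise the morphemes are simply concatenated.
    """
    if not second or second[0] not in VOICELESS_ONSET_LETTERS:
        return first + second
    for ending in GEMINATING_ENDINGS:
        if first.endswith(ending):
            stem = first[: -len(ending)]
            break
    else:
        return first + second
    # Hepburn writes a doubled "ch" as "tch"; other onsets repeat their first letter.
    doubled = "t" if second.startswith("ch") else second[0]
    return stem + doubled + second
```

Calling `geminate("itsu", "sho")` and `geminate("gaku", "kō")` reproduces the two compounds cited above, while a voiced or nasal onset leaves the first morpheme intact.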


Sandhi also occurs much less often in ''renjō'' (連声), where, most commonly, a terminal /N/ or /t/ on one morpheme results in /n/ (or /m/ when derived from historical ''m'') or /t/ respectively being added to the start of a following morpheme beginning with a vowel or
semivowel
, as in ''ten'' + ''ō'' → ''tennō''. Examples:
;First syllable ending with /N/:
* ''ginnan'': ''gin'' + ''an'' → ''ginnan''
* ''kannon'': ''kwan'' + ''om'' → ''kwannom'' → ''kannon''
* ''tennō'': ''ten'' + ''wau'' → ''tennau'' → ''tennō''
;First syllable ending with /N/ from original /m/:
* ''sanmi'': ''sam'' + ''wi'' → ''sammi'' → ''sanmi''
* ''onmyō'': ''om'' + ''yau'' → ''ommyau'' → ''onmyō''
;First syllable ending with /t/:
* ''setchin'': ''setsu'' + ''in'' → ''setchin''
* ''kuttaku'': ''kutsu'' + ''waku'' → ''kuttaku''


Another prominent feature is ''onbin'' (euphonic sound change), particularly historical sound changes. In cases where this has occurred within a morpheme, the morpheme itself is still distinct but with a different sound, as in ''hōki'' 'broom', which underwent two sound changes from earlier ''hahaki'' → ''hauki'' (onbin) → ''houki'' (historical vowel change) → ''hōki'' (long vowel, sound change not reflected in kana spelling). However, certain forms are still recognizable as irregular morphology, particularly forms that occur in basic verb conjugation, as well as some compound words.

Verb conjugation

Polite adjective forms

The polite adjective forms (used before the polite copula ''gozaru'' and the verb ''zonjiru'') exhibit a one-step or two-step sound change. Firstly, these use the continuative form, ''-ku'', which exhibits ''onbin'', dropping the ''k'' as ''-ku'' → ''-u''. Secondly, the vowel may combine with the preceding vowel, according to historical sound changes; if the resulting new sound is palatalized, meaning ''yu'' or ''yo'', this combines with the preceding consonant, yielding a palatalized syllable. This is most prominent in certain everyday terms that derive from an ''i''-adjective ending in ''-ai'' changing to ''-ō'' (''-ou''), because these terms are abbreviations of polite phrases ending in ''gozaimasu'', sometimes with a polite ''o-'' prefix. The terms are also used in their full form, with notable examples being:
* ''arigatō'', from ''arigatai''.
* ''ohayō'', from ''hayai''.
* ''omedetō'', from ''medetai''.
Other transforms of this type are found in polite speech.


The morpheme (with ''rendaku'' ) has changed to or , respectively, in a number of compounds. This in turn often combined with a historical vowel change, resulting in a pronunciation rather different from that of the components, as in (see below). These include: * , from → → . * , from → → . * , from → → . * , from → → . * , from → → → . * , from → → → . * , from → → → . * , from → → → → . is also found, as a variant of .


In some cases morphemes have effectively fused and will not be recognizable as being composed of two separate morphemes.

See also

* Japanese grammar
* Japanese writing system
* Japanese honorifics
* Japanese language and computers
* Japanese language education
* Japanese literature
* Transcription into Japanese
* Yotsugana, the different distinctions of historical *zi, *di, *zu, *du in different regions of Japan
* Okinawan Japanese, a variant of Standard Japanese influenced by the Ryukyuan languages
* Japanese loanwords in Hawaii



