A guide to the International Phonetic Alphabet, part I (2024)

Brian Smith

What the IPA is and what it isn’t, for those with no linguistics background (especially data scientists)

As a teacher of phonetics for the past few years, I’ve witnessed a lot of misunderstanding and mystery surrounding the International Phonetic Alphabet (IPA). This article explains what the IPA is, with a focus on its limitations and assumptions, written with non-linguists in mind (especially data scientists). The too-long-didn’t-read is that the IPA is extremely useful…but probably not as powerful as you think.

This is the first part of a series. In the next part, I walk through the Arpabet, a common way of representing the IPA in machine learning applications, and the CMU Pronouncing Dictionary, a great resource when working with English speech.

The International Phonetic Alphabet (IPA) is a notational system that’s used to represent spoken language as text. For example, the IPA symbol [ð] corresponds to the th sound at the beginning of the word the. You’ll notice that the IPA symbol is enclosed within square brackets ([ð]), which are typically used to indicate the sequence of symbols is a phonetic transcription. Symbols like [ð] can be put together to transcribe entire phrases. For example, we could transcribe the entire phrase the facts as [ðə fækts].¹ There are also diacritics, which can be combined with symbols to transcribe even more sounds.
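Since an IPA transcription is just text, it can be inspected like any other string. The sketch below is one way to look at the example transcription in Python. It uses only the standard library, and the function name describe_symbols is made up for this illustration.

```python
import unicodedata

def describe_symbols(transcription: str) -> list[tuple[str, str, str]]:
    """Return (symbol, code point, Unicode name) for each non-space character."""
    rows = []
    for symbol in transcription:
        if symbol.isspace():
            continue
        rows.append((symbol, f"U+{ord(symbol):04X}", unicodedata.name(symbol)))
    return rows

# The transcription of "the facts" from the text, without its brackets.
for symbol, code_point, name in describe_symbols("ðə fækts"):
    print(symbol, code_point, name)

# A diacritic is often a separate combining character that attaches to the
# symbol before it. Here a combining tilde (U+0303) marks a nasalized vowel.
nasalized_a = "a\u0303"
print(len(nasalized_a))  # counts code points: the base symbol plus the diacritic
```

One practical consequence is that the length of a transcription string is not necessarily the number of sounds it represents.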

The main purpose of the IPA is to provide a single set of symbols and diacritics that can be used to transcribe all of the words in every language. If there are two distinct words in a particular language, and those words don’t have the same pronunciation, the IPA will let you transcribe the difference. So any time you need to compactly represent the pronunciation of a word, the IPA can help. It’s useful not only to linguists documenting languages, but also to actors, singers, language teachers, and engineers working on speech.

Below you’ll find the consonant portion of the chart, with the diacritics omitted. The rows and columns of the chart refer to the articulatory characteristics of the sounds — how the sounds are made using the tongue, lips, vocal cords, etc. You’ll notice that there are lots of gaps, some white and some grey. The white gaps represent combinations of articulatory characteristics that don’t have an IPA symbol (although you can probably create one using diacritics). The grey gaps are sounds that are judged to be impossible to make, given the limitations of the human vocal tract.

[Figure: the consonant portion of the IPA chart, with the diacritics omitted]

Resist the temptation to read too much into whether a sound has a simple symbol or requires diacritics, or whether it appears in the main chart or is tucked away on the side. In reality, some sounds are common in the world’s languages (e.g., clicks!) but aren’t included in the main chart, and some really common sounds require diacritics to transcribe.

So far, I bet this all sounds really straightforward (and boring!), but in truth, reality is messy (and interesting!).

When talking about phonetic transcription, it’s important to realize that representing spoken language as text is not easy! Speech is really just sound: pressure waves hitting your eardrum or a recording device. Every time you say a word like hello, the sound waves are a little different. The differences between your hello’s and another person’s hello’s are pretty noticeable. In fact, they’re so noticeable that someone can recognize your voice when you answer the phone, based on hearing just a single word.

The IPA abstracts away from all of these small differences. It assumes that the sound waves of speech can be represented as a sequence of discrete symbols, which are called speech segments, and it only has symbols for the aspects of pronunciation that are linguistically relevant.² This means a transcription of hello — [hɛloʊ] — loses a lot of fine-grained information that’s present in the acoustic signal. It doesn’t distinguish between loud vs. quiet hello’s, fast vs. slow hello’s, or my hello vs. your hello.
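To make the "sequence of discrete symbols" idea concrete, here is a rough sketch of splitting a transcription string into segments. It rests on assumptions of mine, not on anything the IPA prescribes: combining marks and modifier letters (Unicode categories Mn and Lm) are attached to the symbol before them, and every other character starts a new segment. That rule does not handle tie bars or stress marks, and it splits the diphthong in [hɛloʊ] into two segments, which is itself a judgment call. The function name split_segments is hypothetical.

```python
import unicodedata

def split_segments(transcription: str) -> list[str]:
    """Split a transcription into base symbols with their trailing diacritics."""
    segments: list[str] = []
    # NFD separates precomposed characters into base symbol + combining marks.
    for ch in unicodedata.normalize("NFD", transcription):
        if ch.isspace():
            continue
        category = unicodedata.category(ch)
        if segments and category in ("Mn", "Lm"):
            # Assumption: the mark modifies the preceding symbol.
            segments[-1] += ch
        else:
            segments.append(ch)
    return segments

print(split_segments("hɛloʊ"))  # ['h', 'ɛ', 'l', 'o', 'ʊ']
print(split_segments("biːt"))   # ['b', 'iː', 't']
```

Everything that distinguishes one hello from another (loudness, speed, voice) is simply absent from these lists.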

Since we use the same symbols for every language and every speaker, it’s often the case that noticeably different sounds are transcribed the same. For example, [fil] can be used to transcribe both feel in English and fil in French, even though the words sound pretty different. Below are recordings of me saying both. Although I have a French degree, my attempt pales in comparison to a native speaker.

Given everything I just said, you’re probably wondering, “Well how does anyone know what symbols to use?” The answer is that the symbols in the IPA chart correspond to descriptions of how sounds are made, and so we use the symbol that best matches how the sound seems to be made. For example, the symbol [ð] is used to transcribe a sound that’s made with your tongue touching your teeth (dental), noisy continuous airflow (fricative), and vibrating vocal folds (voiced). If you’re transcribing a sound that’s a voiced dental fricative, you’d probably use [ð].
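The link between descriptions and symbols can be pictured as a lookup table. The sketch below covers only a small, hand-picked fragment of the consonant chart, and the names Articulation, CONSONANTS, and symbol_for are invented for this illustration. A real transcriber is matching what a sound seems to be against these descriptions, which is where the "probably" comes from.

```python
from typing import NamedTuple, Optional

class Articulation(NamedTuple):
    voicing: str  # "voiced" or "voiceless"
    place: str    # where in the vocal tract the sound is made
    manner: str   # how the airflow is shaped

# A tiny fragment of the IPA consonant chart, not the whole thing.
CONSONANTS = {
    Articulation("voiceless", "bilabial", "plosive"): "p",
    Articulation("voiced", "bilabial", "plosive"): "b",
    Articulation("voiceless", "labiodental", "fricative"): "f",
    Articulation("voiced", "labiodental", "fricative"): "v",
    Articulation("voiceless", "dental", "fricative"): "θ",
    Articulation("voiced", "dental", "fricative"): "ð",
    Articulation("voiceless", "alveolar", "fricative"): "s",
    Articulation("voiced", "alveolar", "fricative"): "z",
}

def symbol_for(voicing: str, place: str, manner: str) -> Optional[str]:
    """Return the symbol for a description, or None if this fragment lacks it."""
    return CONSONANTS.get(Articulation(voicing, place, manner))

print(symbol_for("voiced", "dental", "fricative"))  # ð
```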

I bet you’re thinking, “Really, probably use?”

Yes, probably use.

Since transcriptions are abstract approximations, it’s ultimately up to a linguist to decide the best way to go about transcribing. There are a lot of judgment calls, which is why transcriptions should always be accompanied by an explanation, and are best when accompanied by sound files. The official Handbook of the International Phonetic Association says this directly:

“A transcription always consists of a set of symbols and a set of conventions for their interpretation. Furthermore, the IPA consists of symbols and diacritics whose meaning cannot be learned entirely from written descriptions of the phonetic categories involved.” (emphasis mine)

If you’re reading a transcription, it’s really important to discover the transcriber’s system as quickly as possible, and if you’re writing a transcription, it’s really important to include some guidance for future readers of your work to interpret your transcriptions correctly. If you’d like to see some examples, check out the Journal of the International Phonetic Association, which publishes illustrations of the IPA for different languages, each of which includes transcriptions, explanations, and recordings.

To demonstrate the sorts of judgment calls linguists make, here are four of the things that a linguist considers when transcribing. Each of these challenges the idea that the IPA is a truly universal and objective method of transcribing speech (although the IPA doesn’t claim to be that!). These aren’t the only considerations, but they’re the ones that seem to pop up most often.

Transcription is based on an analysis of the language. When a linguist writes a transcription, they’re making a decision (consciously or unconsciously) about what aspects of pronunciation matter. They do this based on what they know about the system of sounds in the language.

For example, consider the words beat and bit. They are definitely different words with different vowel sounds. If you study the pronunciation of the words and compare them, you’ll notice that the vowel in beat has a longer duration and is made with your tongue higher and more forward (relative to the vowel in bit). The typical treatment of these vowels in English is to ignore the difference in duration and only transcribe the difference in tongue position, using the symbol [ɪ] for bit and [i] for beat, where the symbol [ɪ] is used for a vowel sound with a slightly lower tongue position than [i]. However, it would be just as correct to ignore the difference in tongue position and transcribe the difference in vowel duration instead, if that’s what you thought mattered, using the symbol [i] for bit and [iː] for beat (the diacritic [ː] means the preceding sound is relatively long).

The moral: how a linguist transcribes is influenced by their analysis of the language. In this example, is it tongue position or duration that really matters in the English vowel system?
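For anyone handling transcriptions as data, the beat and bit example has a concrete consequence: the same string can stand for different words depending on the convention. Here is a minimal sketch using the two conventions described above. The dictionary and function names are hypothetical, and each table holds only these two words.

```python
# Convention 1: transcribe tongue position, ignore duration.
QUALITY_CONVENTION = {"bɪt": "bit", "bit": "beat"}

# Convention 2: transcribe duration, ignore tongue position.
LENGTH_CONVENTION = {"bit": "bit", "biːt": "beat"}

def word_for(transcription: str, convention: dict[str, str]) -> str:
    """Look up which English word a transcription stands for under a convention."""
    return convention[transcription]

# The identical string [bit] decodes to two different words.
print(word_for("bit", QUALITY_CONVENTION))  # beat
print(word_for("bit", LENGTH_CONVENTION))   # bit
```

Without knowing which convention produced a transcription, [bit] is ambiguous between the two words.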

Transcription is based on tradition. Linguists are often building on an existing body of work, which has its own established conventions, and these traditions influence transcription. For example, the English vowel sound found in the word cat is usually transcribed as [æ], but it could also be represented by [a]. In fact, many other languages have vowel sounds that are very similar to English’s [æ], and those are usually transcribed as [a]. Why do transcribers of English use [æ]? The tradition of English transcription dictates [æ], that’s why.³

Transcription is based on what’s easier to type or read. Transcriptions should be easy to type and read, and so many linguists use simpler symbols when possible. To keep things simple, they might even use a symbol that’s normally used for a different sound. For example, the r sound in English is transcribed by some linguists as [ɹ] and by other linguists as [r]. The articulatory description for [ɹ] in the IPA chart closely matches how the sound is made in English, so that’s a correct symbol to use, but [r] is easier to type, so some linguists use that one instead. Without a written explanation, this usage might be confusing, since the symbol [r] in the IPA chart refers to a very non-English sound (the rolled r of Spanish).
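If you combine transcriptions from several sources, the [r] shortcut is the kind of thing that has to be resolved by hand, because nothing in the string itself says which sound was meant. The sketch below assumes each source documents its own convention. The function name and the flag r_is_english_approximant are hypothetical.

```python
def normalize_r(transcription: str, r_is_english_approximant: bool) -> str:
    """Rewrite [r] as [ɹ] only when the source says [r] means the English r sound."""
    if r_is_english_approximant:
        return transcription.replace("r", "ɹ")
    # Otherwise leave [r] alone: it may really be a rolled r.
    return transcription

# English "red" from a source that types [r] for convenience.
print(normalize_r("rɛd", r_is_english_approximant=True))    # ɹɛd

# Spanish "perro", where [r] really is the rolled r.
print(normalize_r("pero", r_is_english_approximant=False))  # pero
```

The flag has to come from the transcriber's stated conventions. It cannot be inferred reliably from the symbols alone.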

Transcription is based on what a sound sounds like. The IPA chart mainly describes sounds based on how they’re produced using the vocal tract, but in practice, transcription is usually based on what the transcriber hears. This is most common in vowels, where articulatory differences between vowels are difficult to measure accurately. To transcribe vowels, linguists commonly use a set of reference vowels — symbols that have established vowel sounds (where the ‘official’ sounds have been recorded) — and they choose the symbol that sounds closest, while taking into account the other concerns mentioned above (analysis of the language, tradition, typography, etc.). Transcribing vowels is definitely more of an art than a science, and transcribers sometimes disagree about which reference vowel is closest.⁴

In summary, the IPA is a really useful tool: one set of symbols and diacritics to transcribe every word of every language. You can read the IPA for a language you don’t know, and get a sense of how it sounds, and you can use transcribed speech to develop sophisticated speech technologies.

However, for the IPA to work, a linguist needs to take a complex sound wave and turn it into a series of symbols. Doing so requires a lot of abstraction and judgment calls, and two linguists may hear the same exact sounds and come up with different transcriptions. As long as the transcriber can justify and explain their choices, the transcription is “correct”.

The moral is to always pay attention to a transcriber’s conventions for transcription, and remember that the IPA was not designed to be a completely objective way to represent speech. We have microphones and recordings for that.

[1] Brackets are commonly used for narrow / phonetic transcription, while slashes (/ðə fækts/) are used for broad / phonemic transcription. The narrowest transcription contains as much phonetic detail as possible, while the broadest transcription contains as little detail as possible. Even the narrowest phonetic transcriptions are subject to the limitations discussed in this article, since we’re still attempting to map sound waves to a set of discrete and universal symbols.
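As a small illustration of the delimiter convention, a script could read the enclosing characters to guess which kind of transcription it has been given. This is only a sketch: the function name transcription_type is made up, and it trusts that the transcriber followed the bracket and slash convention.

```python
def transcription_type(text: str) -> tuple[str, str]:
    """Return (kind, content), where kind is 'phonetic', 'phonemic', or 'unknown'."""
    text = text.strip()
    if len(text) >= 2 and text[0] == "[" and text[-1] == "]":
        return "phonetic", text[1:-1]
    if len(text) >= 2 and text[0] == "/" and text[-1] == "/":
        return "phonemic", text[1:-1]
    # No recognizable delimiters: leave the text untouched.
    return "unknown", text

print(transcription_type("[ðə fækts]"))  # ('phonetic', 'ðə fækts')
print(transcription_type("/ðə fækts/"))  # ('phonemic', 'ðə fækts')
```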

[2] I’m not going to get into what it means for a difference between two sounds to be linguistically relevant. The short answer is that it depends on what’s contrastive (https://en.wikipedia.org/wiki/Phonemic_contrast). Deciding what’s linguistically relevant isn’t easy, and linguists don’t always agree on it. Those decisions are made by the International Phonetic Association (confusingly, also called the IPA), which maintains the alphabet and publishes the Handbook of the IPA. They choose when to add, remove, or change symbols.

[3] In the 2014 edition of his phonetics textbook, The Sounds of Language, Henry Rogers dedicates a whole section to the puzzle of [æ] in English transcription. Why not use [a] like everyone else? He speculates that the historical reason is typographic. There are two a-like symbols in the IPA, [a] and [ɑ], but typewriters only had a single a key. The solution was to use ae, which could be distinguished from a. This is as good a time as any to warn you about the a-like symbols in the IPA, which vary widely in usage when transcribing English. Sometimes [a] is used to transcribe the vowel sound in cat, sometimes [a] is used to transcribe the vowel sound in father, and sometimes [a] describes a third, distinct vowel sound. This is another example of why you should always interpret a transcription carefully.

[4] The reference vowels are called cardinal vowels. The Wikipedia page for cardinal vowels links to some recordings, discusses the oral tradition of training linguists to use cardinal vowels, and cites some interesting papers showing that transcribers often disagree when it comes to vowel transcription.
