Transcription of Speech

/ tɹæn.ˈsgɹɪp.ʃn̩ əv sbit͡ʃ /


Transcription, the ability to create an accurate written representation of speech, which may then be decoded exactly, is achieved through the use of the International Phonetic Alphabet (IPA).

Transcription data may also be analysed to arrive at a differential diagnosis. A Speech Sound Disorder (SSD) may then be identified, such as Articulation Disorder, Phonological Delay, Consistent Phonological Disorder, or Inconsistent Phonological Disorder (IPD).

This then leads to a care plan, where knowledge of speech and mapping meaning into the speech code (phonology) is essential.

Transcription, analysis and the identification of an appropriate evidence-based care plan for disordered speech are skills taught to pre-qualification speech and language therapy students, and are unique to the profession.



The days of listening to an audio cassette and referring to the IPA charts are long gone (although I do still have a comprehensive collection of 1980s cassettes, but that's another story).

You can now see and hear phones using this brilliant resource by the University of Glasgow and Queen Margaret University, Edinburgh:

Making high quality recordings of speech

Speech and Language Therapists should transcribe 'live', ideally sitting opposite the child to see the articulators moving. There is often insufficient time to check transcriptions at a later time and many children will have a few, easily identifiable articulation errors and/or commonly encountered phonological processes.

However, a high quality audio recording is recommended for children with moderate to severe SSD, where intelligibility is low, where a child is reluctant to speak or repeat items, and/or where a second opinion or an objective outcome measure is required.

Recordings may be made at low cost with free software, a computer and a USB microphone. Portable equipment may also be suitable.

See the page below for more information.

IPA Online: Online resources for practical phonetics

Ghada Khattab & Gerry Docherty realise the phones of the IPA and extIPA charts.

Ideal for ear training.

QUIZ - Speech sounds

A simple multiple choice quiz by Dr Sean Pert.

Practice Voice, Place and Manner for English Phones

A sorting learning resource by Dr Sean Pert.

QUIZ - English phones: Consonants

A simple 'reveal' quiz to practise voice-place-manner by Dr Sean Pert.


  • The English vowels are not the same as IPA vowels. We usually use the English vowel set, not the IPA vowels, for routine English transcriptions.
  • Northern English Vowel set. This is used in my teaching for the Greater Manchester area.
    • Note that length diacritics are only used in phonetic transcriptions (square brackets), not targets (slant brackets), since long and short vowels have different symbols. Only use the length diacritic if the child/client's realisation is much longer than expected.
    • Some vowels are not found in northern English, such as the strut vowel / ʌ /, and other vowels are distributed differently, such as [ ɑ ], typically produced as a short vowel in words such as 'path', but retained in words such as 'gala'.
    • Accents are just as complex as each other, and Speech and Language Therapists should not view these differences as disorders, nor attempt to treat these (we are not elocutionists!)

  • IPA Vowel chart - for describing vowel disorder, new languages or for research.
  • Vowel disorder in children is surprisingly rare. Most speech sound errors occur on the consonant segments.
  • Children are thought to acquire vowels by age 3 (Ball & Gibbon, 2002; Speake et al., 2012)
  • Speech and Language Therapists who encounter children aged 3+ or those with unusual vowel systems should consult the references below, as well as carrying out a full assessment.
  • Clinical Assessment of Vowels - English Systems (CAV-ES) is a free assessment available here:

RCSLT Guidelines

Child Speech Disorder Research Network (2017) Good practice guidelines for the transcription of children’s speech in clinical practice and research. Published on RCSLT members webpage ( and Bristol Speech and Language Therapy Research Unit webpage (www.speech-
Download here from RCSLT or locally, here.

Sally Bates, Jill Titterington and members of UK and Ireland’s Child Speech Disorder Research Network (authors). 2021. Good Practice Guidelines for the Analysis of Child Speech. This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. To view a copy of this license, visit or send a letter to Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
Download here from RCSLT or locally, here.

Writing IPA symbols

Most IPA transcription occurs 'live' in the clinical situation. Transcriptions of picture-word naming and connected speech are essential in the assessment of speech sound disorder.

These transcriptions may be scanned into computerised case note systems, but the fastest and most efficient way of transcribing in the moment is to use handwritten symbols.

The relative sizes, orientations and shapes of these symbols are fixed and cannot be changed in the same way one might change English letters to achieve a unique handwriting style.

IPA symbols must be 'printed' across three imaginary lines so that another clinician can read what the child, young person or adult said accurately.

Click the button below to access the excellent guide by Gareth Walker (2022) Writing IPA symbols. The University of Sheffield.

You should not attempt to transcribe 'live' speech using this resource. It is, however, extremely useful for making your own resources such as recording forms, probes etc.

The resulting text pastes into most modern word processors (Microsoft Word, Apple Pages) and presentation software (Microsoft PowerPoint, Apple Keynote) without error or changing into random symbols.

Transcribing single sounds or phones

Single sound productions, such as those elicited during a stimulability assessment where the speech and language therapist asks a child to repeat a single sound (phone) after a model, are transcribed in square brackets.
This is because single phones have no meaning and are not being realised in the context of meaningful words.

So, transcribing a therapist and a client responding would look like this:

1. [ s ] [ ɬ ]
2. [ s ] [ s ]

In example #1, the therapist has said the single phone [ s ] and the child has tried to imitate this sound, but has instead realised [ s ], a voiceless alveolar fricative, as [ ɬ ], a voiceless alveolar lateral fricative. If consistent, and the child is old enough to have acquired the [ s ] phone, then this would be an Articulation Disorder. The child is not stimulable for [ s ].

In example #2, the child is stimulable for [ s ].

Note that in the above examples, the sounds are made and are real compressions and rarefactions in the air which we could capture with a microphone. These sounds are called phones and NOT 'Phonemes'.

Transcribing sounds in the context of words (phonemes) realised as sounds (phones)

Phonemes carry meaning, and are psycholinguistic units. They are NOT real sounds, but rather a contrastive unit in the mind.

We refer to each phoneme by the word position. This is because children may treat phonemes differently according to word position.

Word initial (WI) - The first phoneme in the word, or the prevocalic (before the vowel) cluster
For example, / s- / is the word initial phoneme of 'sun' / sʊn /
For example, / sb- / is the word initial di-cluster in 'spider' / ˈsbaɪ.də /
For example, / sdɹ- / is the word initial tri-cluster in 'string' / sdɹɪŋ /

Word final (WF) - the last phoneme in the word, or the postvocalic (after the vowel) cluster
For example, / -n / is the word final phoneme of 'sun' / sʊn /.
For example, / -nt / is the word final di-cluster of 'elephant' / ˈe.lɪ.fənt /
For example, / -sks / is the word final tri-cluster of 'desks' / desks /

Within Word
Note that if we accept that speech is formed of syllables, then the only thing 'word medial' is the syllable diacritic! Instead of 'word medial', it is more accurate to speak of 'within word'.
This gives two options:

Within Word Syllable Initial (WWSI)
For example, / k- / is the WWSI phoneme for Syllable 3 in 'helicopter' / ˈhe.lɪ.kɒp.tə /
For example, / sg- / is the WWSI di-cluster for Syllable 2 in 'biscuit' / ˈbɪ.sgɪt /
For example, / sgɹ- / is the WWSI tri-cluster for Syllable 2 in 'ice cream' / aɪ.ˈsgɹim /

Within Word Syllable Final (WWSF)
For example, / -p / is the WWSF phoneme for Syllable 3 in 'helicopter' / ˈhe.lɪ.kɒp.tə /
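
The word position labels above can also be worked out mechanically from a syllabified transcription. The following is a minimal Python sketch, not a clinical tool: the function names are hypothetical, the vowel inventory is an assumption limited to the symbols used in the examples on this page, and syllabic consonants are not handled.

```python
# Assumed nucleus symbols: only the vowels needed for the examples on this
# page, plus the length mark so that it stays attached to its vowel.
VOWELS = set("aeiouɪʊɒəæɑʌɛɔɜː")
STRESS_MARKS = "ˈˌ"


def split_syllable(syllable):
    """Split one syllable into (onset, nucleus, coda)."""
    segments = "".join(ch for ch in syllable if ch not in STRESS_MARKS)
    vowel_indexes = [i for i, ch in enumerate(segments) if ch in VOWELS]
    if not vowel_indexes:
        # Syllabic consonants are outside the scope of this sketch.
        raise ValueError(f"no vowel found in syllable {syllable!r}")
    first, last = vowel_indexes[0], vowel_indexes[-1]
    return segments[:first], segments[first:last + 1], segments[last + 1:]


def word_positions(transcription):
    """Label every onset and coda of a syllabified word by word position.

    Returns a list of (label, syllable number, phoneme or cluster) tuples,
    where syllables are separated by '.' as in the examples above.
    """
    syllables = transcription.split(".")
    final_index = len(syllables) - 1
    labels = []
    for index, syllable in enumerate(syllables):
        onset, _nucleus, coda = split_syllable(syllable)
        if onset:
            label = "WI" if index == 0 else "WWSI"
            labels.append((label, index + 1, onset + "-"))
        if coda:
            label = "WF" if index == final_index else "WWSF"
            labels.append((label, index + 1, "-" + coda))
    return labels


for entry in word_positions("ˈhe.lɪ.kɒp.tə"):
    print(entry)
```

For 'helicopter' this lists / k- / as WWSI and / -p / as WWSF in Syllable 3, matching the examples above. The sketch depends entirely on the syllable boundaries the transcriber has marked.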

Notation for transcribing / TARGET / [ REALISATION ] shows that the phoneme has been realised as a phone.
For example,

5. 'spider' / ˈsbaɪ.də / [ baɪgə ]
  • Cluster reduction? WI CC => WI C (5 examples needed to confirm);
  • Not Context Sensitive Voicing as / sb- / is the target cluster.
  • WWSI Voiced alveolar plosive / d- / phoneme realised as the phone [ g ]. Backing of alveolar to velar plosive? (5 examples needed to confirm)

Each transcribed item is made up of:

  • Number of the item;
  • 'English gloss';
  • / Phonemic transcription /;
  • CVC word structure of the target;
  • [ Phonetic transcription ];
  • CVC structure of the realisation;
  • Clinical notes and observations.
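
For anyone building their own recording forms or spreadsheets, the notation can be read back into its parts quite easily. The following Python sketch is an illustration only: the function names and the record layout are hypothetical, the vowel inventory is an assumption covering just the symbols used on this page, adjacent vowel symbols within a syllable are counted as one V (a diphthong), and affricates written with a tie bar would be counted as CC.

```python
import re
import unicodedata

# Assumed vowel inventory: only the symbols needed for the examples on this page.
VOWELS = set("aeiouɪʊɒəæɑʌɛɔɜ")
SKIPPED = set("ˈˌː ")  # stress marks, length mark and spaces take no C/V slot

RECORD = re.compile(
    r"^\s*(?P<item>\d+)\.\s*"                  # item number
    r"'(?P<gloss>[^']+)'\s*"                   # 'English gloss'
    r"/\s*(?P<target>[^/]+?)\s*/\s*"           # / phonemic target /
    r"\[\s*(?P<realisation>[^\]]+?)\s*\]\s*$"  # [ phonetic realisation ]
)


def cv_structure(transcription):
    """Return the consonant/vowel shape of a transcription, e.g. 'CCVCV'."""
    shapes = []
    for syllable in transcription.split("."):
        shape = "".join(
            "V" if ch in VOWELS else "C"
            for ch in syllable
            # Combining diacritics (Unicode category M*) take no slot either.
            if ch not in SKIPPED and not unicodedata.category(ch).startswith("M")
        )
        # Treat a run of vowel symbols inside one syllable as a single V.
        shapes.append(re.sub(r"V+", "V", shape))
    return "".join(shapes)


def parse_record(line):
    """Split one line of the notation into its labelled parts."""
    match = RECORD.match(line)
    if match is None:
        raise ValueError(f"not in the expected notation: {line!r}")
    record = match.groupdict()
    record["item"] = int(record["item"])
    record["target_structure"] = cv_structure(record["target"])
    record["realisation_structure"] = cv_structure(record["realisation"])
    return record


print(parse_record("5. 'spider' / ˈsbaɪ.də / [ baɪgə ]"))
```

For the 'spider' example, the target structure comes out as CCVCV and the realisation as CVCV, which makes the possible cluster reduction easy to spot. The clinical notes still need to be written by the clinician.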

s-clusters in English - Voiced plosive elements

Many people transcribe the s-clusters of English incorrectly and as a result there is a risk of misdiagnosing Context Sensitive Voicing when the child is simply reducing the cluster to the adult form with a voiced plosive.

Please see below for further information, and Minimal Pairs that correctly contrast prevocalic s-clusters with voiced word initial plosives.

ACCENT/DIALECT and the role of the Speech and Language Therapist

I do not speak English with a Southern British Standard (SBS) accent. Born in Derbyshire and raised in North Nottinghamshire, I lived as a young adult in Manchester and now live in West Yorkshire. I therefore speak with a broadly northern accent. As my father was a coal miner, I also have lexical items and idioms from that community. These important aspects of identity are not speech defects or errors. Just as home language is important for bilingual speakers, accents or dialects are valuable aspects of people's identities. It is therefore morally repugnant to consider modifying accent, and this is not the role of speech and language therapists. Those wishing to use a range of accents for acting purposes, or those seeking to assimilate to a higher status accent for career progression, should consider a voice coach.

'Standard English', 'Received Pronunciation', 'BBC English' and even 'The Queen's (King's?) English' and other labels are sometimes used for the accent often heard on television programmes, news programmes and on courses for adults learning English.

This assumption that SBS is the 'gold standard' or 'correct' form is a victory for the class system and the propaganda machine, the BBC World Service. Even though this is now being addressed, and one can hear more regional accents on the radio and on television, the use of regional accents continues to be judged negatively. This affects social mobility, job prospects and how people's intelligence is perceived.

The role of the speech and language therapist is to transcribe what the person says, without judgement. The aim of therapy should be to restore speech to the type used by their peers, family and community. Elocution, accent modification, or treating non-standard English pronunciation is not the role of the speech and language therapist.

This includes:
  • The use of phonemes. Some speech communities do not use all/the same contrastive system as the standard form. In Manchester, many speakers use labiodental fricatives instead of dental fricatives, such as 'thin' / θɪn / => [ fɪn ]; 'these' / ðiz / => [ viz ]. These differences would NOT be targets for therapy.
  • The use of allophones. Many accents have acceptable realisations of a phoneme which are sound differences, not errors. These include the use of a glottal stop for syllable/word final /t/, such as 'cat' / kæt / => [ kæʔ ]
  • The use of two syllables and a [w] or [j] instead of a triphthong. Consider 'fire', 'player' and 'flower'. These are all two syllable words in my northern accent.
  • The use of a different vowel system. Northern accents don't tend to have the 'strut' vowel / ʌ / and use / ʊ / for words such as 'butter'. There are other differences between vowel systems in the accents of English.
  • The use of dialectal forms of words. A warm drink such as tea or coffee would be described as 'a brew', which would confuse Australian English speakers, who would think that they were being offered a beer! If you want to spark a heated debate, ask English speakers how they pronounce 'scone' (see this map of distribution of the different pronunciations across the UK) and what they call small bread portions ('roll', 'stottie', 'bread cake', 'muffin', 'oven bottom muffin', 'bap', 'cob', 'morning roll', 'batch', 'Vienna', 'teacake', 'oggie', or 'scuffler'!)
    • This matters during word naming assessments such as vocabulary naming assessments and speech sound word naming assessments.
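
The distinction between accent features and possible errors can be made explicit when reviewing transcription data. The Python sketch below is purely illustrative: the table holds only the Manchester examples given above, the names are hypothetical, it assumes the target and realisation have already been aligned segment by segment, and the decision always rests on the clinician's knowledge of the child's speech community.

```python
# Illustrative table only, built from the examples on this page.
# Each entry: (target phoneme, realised phone, word positions where it applies).
# None means the difference is treated as an accent feature in any position.
ACCENT_FEATURES = [
    ("θ", "f", None),            # 'thin' / θɪn / => [ fɪn ]
    ("ð", "v", None),            # 'these' / ðiz / => [ viz ]
    ("t", "ʔ", {"WF", "WWSF"}),  # glottal stop for syllable/word final /t/
]


def classify(target, realisation, position):
    """Classify one aligned target/realisation pair.

    Returns 'match', 'accent feature' or 'review'. 'review' only means the
    pair is not in the table above; it is not a diagnosis.
    """
    if target == realisation:
        return "match"
    for feature_target, feature_phone, positions in ACCENT_FEATURES:
        same_pair = (feature_target, feature_phone) == (target, realisation)
        if same_pair and (positions is None or position in positions):
            return "accent feature"
    return "review"


print(classify("θ", "f", "WI"))    # 'thin' realised with [ f ]
print(classify("t", "ʔ", "WF"))    # 'cat' realised with a final glottal stop
print(classify("d", "g", "WWSI"))  # 'spider' realised with [ g ]
```

Only the last pair is returned as 'review'. The first two are differences that would not be targets for therapy in this speech community.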

Does it matter if you use your own accent and treat in standard English?

Yes! It is not just that accent correction is not our role, but young children will often fail to understand instructions if the speaker does not use the child's vowel system. Nathan et al. (1998) found that four year old children "…found it harder to understand words spoken in an unfamiliar accent than in their own accent." (p. 359). This could potentially affect a child's score on a vocabulary or language comprehension assessment. Speech and Language Therapists should therefore only provide treatment for genuine speech sound errors, and not for accent/dialect differences. When working with a speech community, it is important to try and use the accent of that speech community.

Levon, E., Sharma, D., & Ilbury, C. (2022). Speaking up: Accents and social mobility.

Brown, M. (2022, June 12). Accent discrimination is alive and kicking in England, study suggests. The Guardian.

Parveen, N. (2020, October 19). Students from northern England facing 'toxic attitude' at Durham University. The Guardian.

Listen to a Manchester accent at the British Library.


Ball, M. J., & Gibbon, F. E. (2002). Vowel Disorders. Butterworth-Heinemann.

Bates, S., Milne, S., Sinclair, A., Sweet, J., & Watson, J. (2010). Clinical Assessment of Vowels–English Systems (CAV-ES). Marjon University, Plymouth and Queen Margaret University, Edinburgh.

Nathan, L., Wells, B., & Donlan, C. (1998). Children’s comprehension of unfamiliar regional accents: a preliminary investigation. Journal of Child Language, 25, 343-365.

Speake, J., Stackhouse, J., & Pascoe, M. (2012). Vowel targeted intervention for children with persisting speech difficulties: Impact on intelligibility. Child Language Teaching and Therapy, 28(3), 277-295.
