Text to Speech Apps for Kids Who Can't Talk

SabiKo Team
February 13, 2026 (9 min read)
Tags: AAC, text to speech, TTS, voices, kids, nonverbal

When a child can't speak, a text-to-speech app becomes their voice. The quality of that voice matters more than most people think. It affects whether the child wants to use the app, whether listeners take them seriously, and whether communication feels natural or mechanical.

This guide covers how TTS works in AAC apps, why voice quality has improved dramatically, and what to look for when choosing an app for a nonverbal or minimally verbal child.

What Text-to-Speech Actually Does in AAC

In an AAC context, text-to-speech converts symbols or text into spoken words. The child taps a symbol for "I," then "want," then "water." The app combines those words and speaks the sentence aloud: "I want water."

This is different from how most people encounter TTS. Siri and Alexa use TTS for short responses. Screen readers use TTS to read web pages. But in AAC, TTS is someone's primary voice. It's how they ask for help, tell a joke, say "I love you," and argue with their siblings.

That difference changes what matters. A voice that's fine for reading a Wikipedia article aloud may not be acceptable as a child's personal voice.

Neural Voices vs. Robotic Voices

TTS technology has gone through several generations.

Robotic/concatenative voices

Early TTS systems, and many budget apps today, use concatenative synthesis. This approach records a human speaker saying thousands of short sound fragments, then stitches them together. The result often sounds choppy, mechanical, and obviously artificial. Pauses land in odd places. Intonation is flat.

Neural voices

Modern TTS uses neural networks trained on large datasets of human speech. The output sounds remarkably natural. Inflection, rhythm, and emphasis follow the patterns of real speech. In short utterances, many listeners struggle to distinguish a high-quality neural voice from a recording of a human speaker.

The difference is not subtle. Put a robotic voice and a neural voice side by side, and the gap is immediately obvious.

Why Voice Quality Matters for Identity

For a speaking child, their voice is part of who they are. It carries personality, emotion, and identity. The same is true for an AAC user's synthesized voice. It just gets overlooked more often.

Consider a 7-year-old girl using an AAC app with a deep adult male voice. Or a teenager using a voice that sounds like a GPS unit from 2008. These mismatches affect how the user feels about their communication device and how others respond to them.

Good AAC apps offer multiple voice options so users can find one that matches their age, gender identity, and personality. Some apps now offer voices in multiple accents and languages, which matters for bilingual families using AAC.

Voice banking

Some people who are losing their speech due to progressive conditions like ALS can record their natural voice and create a synthetic version of it. This is called voice banking. It's a separate process from standard TTS, but some AAC apps support importing custom voice profiles.

How TTS Works with Symbol Boards

In most AAC apps, TTS and symbols work together in a pipeline:

  1. The user taps symbols on their communication board.
  2. Each symbol maps to a word or phrase stored in the app's vocabulary database.
  3. Selected words appear in a message bar at the top of the screen.
  4. The user taps a "speak" button (or the message bar itself).
  5. The app sends the text string to the TTS engine.
  6. The TTS engine generates audio and plays it through the device speaker.

This entire process happens in under a second on modern devices. The delay between tapping "speak" and hearing the voice is barely noticeable.
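
As a rough illustration, the six steps above could be modeled as follows. This is a minimal Python sketch, not the code of SabiKo or any other app: the VOCABULARY mapping, the symbol IDs, the MessageBar class, and the speak_fn callback (a stand-in for a real TTS engine) are all names invented for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical vocabulary: symbol IDs mapped to the words they stand for.
VOCABULARY: Dict[str, str] = {
    "sym_i": "I",
    "sym_want": "want",
    "sym_water": "water",
}


@dataclass
class MessageBar:
    """Collects words for tapped symbols until the user taps "speak"."""

    speak_fn: Callable[[str], None]  # stand-in for a real TTS engine call
    words: List[str] = field(default_factory=list)

    def tap(self, symbol_id: str) -> None:
        # Steps 1-3: look up the tapped symbol and add its word to the bar.
        self.words.append(VOCABULARY[symbol_id])

    def speak(self) -> str:
        # Steps 4-6: join the words, hand the text to the engine, clear the bar.
        text = " ".join(self.words)
        self.speak_fn(text)
        self.words.clear()
        return text


if __name__ == "__main__":
    # Here the "engine" just prints; a real app would synthesize audio.
    bar = MessageBar(speak_fn=print)
    for symbol in ("sym_i", "sym_want", "sym_water"):
        bar.tap(symbol)
    bar.speak()  # prints: I want water
```

Passing the engine in as a callback keeps the board logic independent of any particular TTS engine, which is one plausible way an app could swap voices without touching the symbol handling.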

Grammar and sentence construction

Some AAC apps, including SabiKo, apply basic grammar rules before sending text to the TTS engine. If a child taps "I," "want," "go," "park," the app can output "I want to go to the park" with appropriate articles and prepositions inserted. This makes the spoken output sound more natural and grammatically correct.

Without grammar support, TTS reads exactly what's there, including awkward phrasing like "I want go park."
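
To make the idea concrete, here is a toy Python sketch of rule-based expansion. It is an assumption-heavy illustration, not how SabiKo or any other app actually implements grammar support: the word lists and the expand function are hypothetical, and real systems need far richer rules.

```python
from typing import List

# Toy word lists, assumed purely for illustration.
VERBS = {"want", "go", "eat", "play"}
MOTION_VERBS = {"go"}
PLACES = {"park", "store", "playground"}


def expand(words: List[str]) -> str:
    """Insert "to" between two consecutive verbs, and "to the" between
    a motion verb and a place that directly follows it."""
    out: List[str] = []
    for word in words:
        prev = out[-1] if out else None
        if prev in VERBS and word in VERBS:
            out.append("to")
        elif prev in MOTION_VERBS and word in PLACES:
            out.extend(["to", "the"])
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    print(expand(["I", "want", "go", "park"]))  # I want to go to the park
    print(expand(["I", "want", "water"]))       # I want water
```

Sequences that match no rule pass through unchanged, so the expansion only ever adds words and never drops or reorders what the child selected.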

Offline vs. Cloud TTS

This is a critical distinction for AAC users.

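| Feature           | Offline TTS                    | Cloud TTS                  |
|-------------------|--------------------------------|----------------------------|
| Requires internet | No                             | Yes                        |
| Voice quality     | Good (neural voices available) | Excellent                  |
| Latency           | Very low                       | Depends on connection      |
| Privacy           | Data stays on device           | Audio processed on servers |
| Availability      | Always works                   | Fails without Wi-Fi        |
| Voice variety     | Limited selection              | Larger selection           |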

Why offline matters for AAC

A child needs to communicate at the playground, in the car, at grandma's house, and at the grocery store. None of these places guarantee internet access. If the TTS engine requires a cloud connection, the child literally loses their voice when the Wi-Fi drops.

This is not a minor inconvenience. It's a safety issue. A child who can't call for help because they're out of cell range is a child without communication.

The best approach: both

The ideal setup uses offline TTS as the default, so communication is always available, with the option to use higher-quality cloud voices when an internet connection is available. Some apps switch between the two automatically.
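
One way such a setup could be structured is sketched below in Python. Everything here is hypothetical: the Synth type, the speak_with_fallback function, and the online flag are made up for this example, and the engines are plain functions rather than calls to any real TTS library.

```python
from typing import Callable, Optional

# A synthesizer takes text and returns audio bytes (assumed interface).
Synth = Callable[[str], bytes]


def speak_with_fallback(
    text: str,
    offline: Synth,
    cloud: Optional[Synth] = None,
    online: bool = False,
) -> bytes:
    """Use the cloud voice only when one is configured and the device is
    online; fall back to the offline voice if the cloud call fails."""
    if cloud is not None and online:
        try:
            return cloud(text)
        except Exception:
            # The connection may drop mid-request: fall through to offline.
            pass
    return offline(text)
```

The offline engine is a required argument and the cloud engine is optional, which mirrors the priority described above: the child always has a voice, and the cloud voice is an upgrade rather than a dependency.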

Voice Options Across AAC Apps

Here's how major AAC apps compare on voice features:

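| App         | Voice Type        | Languages              | Offline | Free Voices            |
|-------------|-------------------|------------------------|---------|------------------------|
| SabiKo      | Neural            | 5 languages, 37 voices | Yes     | 6 neural voices free   |
| Proloquo2Go | Neural            | 18 languages           | Yes     | Included with purchase |
| TouchChat   | Acapela/NeoSpeech | Multiple               | Yes     | Included with purchase |
| TD Snap     | Acapela           | Multiple               | Yes     | Included with purchase |
| CoughDrop   | System TTS        | Device-dependent       | Varies  | System voices only     |
| LetMeTalk   | System TTS        | Device-dependent       | Varies  | System voices only     |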

A few things stand out. Premium apps like Proloquo2Go and TouchChat include voices, but the apps themselves cost $100 to $250. Budget options like LetMeTalk and CoughDrop rely on whatever voices the device's operating system provides, which vary widely in quality.

SabiKo includes 6 neural voices in the free tier, with 37 total voices across 5 languages available with Pro. That's an unusual amount of voice variety for a free app.

Choosing the Right Voice for Your Child

Selecting a voice is a personal decision. Here are practical guidelines:

Match age range. Use a child's voice for a child. Adult voices on children's devices create a disconnect that affects how peers and adults interact with the user.

Match language and accent. If your family speaks a specific language at home, find a voice in that language. If you speak English with a regional accent, look for voice options that reflect that.

Let the user choose. Whenever possible, let the AAC user listen to voice samples and pick the one they prefer. Self-determination matters, even in seemingly small choices. This is their voice.

Consider volume and clarity. Some voices sound great through headphones but are hard to hear in noisy environments. Test voices in real-world settings, like a busy kitchen or a classroom.

Test full sentences. A voice that sounds good saying single words may sound odd in full sentences. Listen to how it handles "Can I have more please" and "I don't want that" before deciding.

SabiKo's Voice System

SabiKo uses neural text-to-speech voices that work fully offline. Every voice is downloaded to the device, so communication is never dependent on internet access. For a detailed look at SabiKo's voice options, including how to choose the right one, see our neural voices feature guide.

The free tier includes 6 voices. Pro users get access to all 37 voices across 5 languages: English, Spanish, French, German, and Portuguese. Each language has multiple voice options so users can find one that fits.

All voices support the message bar and grammar correction features, so spoken output sounds natural even when the user selects symbols in a simplified order.

Common Questions

Will TTS replace my child's natural speech? No. Research consistently shows that AAC does not prevent or delay speech development. A 2006 research review by Millar, Light, and Schlosser examined 23 studies and found that AAC either had no effect on speech production or increased it. TTS gives your child a way to communicate now while speech skills continue to develop.

Can my child use TTS at school? Yes. In the US, the Individuals with Disabilities Education Act (IDEA) requires IEP teams to consider a child's need for assistive technology, which includes AAC devices and apps on tablets. Once AAC is written into the IEP, the school is obligated to support it. If you encounter resistance, request an IEP meeting and involve your child's SLP.

What if my child doesn't like any of the voices? Try different apps. Voice preferences are personal and valid. Some children prefer a higher pitch. Some prefer a lower one. Some want a voice that sounds like their parent. Keep trying until you find one that feels right.

Do I need to buy an expensive app to get good voices? Not anymore. Neural voices were once exclusive to premium apps costing hundreds of dollars. SabiKo offers 6 neural voices for free. The gap between free and paid voice quality has narrowed significantly.

Getting Started

  1. Download a free AAC app with neural voices. SabiKo is a good place to start.
  2. Let your child listen to the available voices and pick a favorite.
  3. Set up a basic communication board with core words.
  4. Practice in a quiet environment first, then gradually use it in noisier settings.
  5. Work with your child's SLP to integrate the app into therapy and daily routines.

A text-to-speech app can be the most important tool in a nonverbal child's life. The technology exists. The voices sound natural. Many options are free. There's no reason to wait.

Download SabiKo free and give your child a voice today.
