A Humble Narrator

Last revised: 04Apr2006 GLG

Introduction

The Humble Narrator is a Facade, a simplifying overlay, for performing text-to-speech synthesis in Java programs running on Mac OS X. It is not a speech synthesizer. It only overlays another speech synthesizer, providing a uniform and easy-to-use API.

The Humble Narrator is written entirely in Java, and doesn't use any native-code (JNI) libraries. It can be redistributed under the open source Artistic License.

Concrete implementations are provided only for Mac OS X 10.1 or higher, although all the basic abstractions are platform-neutral. Other implementations can be written and incorporated, such as one based on the FreeTTS speech engine. This is left as an exercise for the interested reader.

There were two main design goals for the Humble Narrator:
   1. It should be simple and easy to use.
   2. Speech should be asynchronous but waitable.

Goal #2 means that speech occurs asynchronously by default, but a program can choose to wait for speech to finish.

Files

The API docs are in the "API" directory.

A pre-compiled JAR is provided in the "Files" directory. It is not double-clickable, since there isn't a single main class. The provided JAR has a manifest whose class-path attribute specifies "/System/Library/Java/", so you can use the provided NSNarrator on Mac OS X without making further additions to your classpath.

All the Java source is provided in the "Source" sub-directory of "Files".

Design Summary

The central abstraction is the Narrator interface, and its central method is speak(). You call Narrator.speak() with the name of a speaking voice and some speakable text in String or char[] form. It then creates an Utterance, enqueues it to be spoken asynchronously, and returns the Utterance to the caller. The calling thread is delayed only for as long as it takes to produce the Utterance and enqueue it. A Narrator's queue size is typically limited only by available memory space.

You can await the completion of each Utterance a Narrator returns, or you can continue to queue up more text to be spoken, each as its own Utterance. You can also enqueue an unspoken "empty" Utterance, such as to mark the end of a series of other Utterances, then await its completion.
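The asynchronous-but-waitable behavior described above can be sketched with standard java.util.concurrent classes. This is only an illustration, not the Humble Narrator's own code: the QueueSketch class, its nested Utterance type, and the speak(voiceName, text) parameter order are assumptions, "Victoria" is just an example voice name, and speech is simulated by appending text to a list.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for illustration; not the glguerin.narrator classes.
class QueueSketch {
    // A queued piece of text whose completion can be awaited.
    static final class Utterance {
        final String text;  // null marks an unspoken "empty" Utterance
        private final CountDownLatch done = new CountDownLatch(1);

        Utterance(String text) { this.text = text; }

        void awaitCompletion() throws InterruptedException { done.await(); }
        boolean isComplete() { return done.getCount() == 0; }
    }

    // A single worker thread handles Utterances in the order they were queued.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final List<String> spoken = new CopyOnWriteArrayList<>();

    // Enqueues the text and returns immediately with its Utterance.
    // The voice name is accepted but unused in this simulation.
    Utterance speak(String voiceName, String text) {
        Utterance u = new Utterance(text);
        worker.execute(() -> {
            if (u.text != null) spoken.add(u.text);  // simulated speech
            u.done.countDown();
        });
        return u;
    }

    void close() { worker.shutdown(); }

    // Queues two sentences plus an empty end marker, then waits on the marker.
    static List<String> demo() {
        QueueSketch narrator = new QueueSketch();
        try {
            narrator.speak("Victoria", "First sentence.");
            narrator.speak("Victoria", "Second sentence.");
            Utterance marker = narrator.speak("Victoria", null);
            marker.awaitCompletion();  // everything queued earlier is done by now
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            narrator.close();
        }
        return narrator.spoken;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Waiting on the empty marker is enough here because the single worker thread processes the queue strictly in order.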

At any time, you can ask the Narrator for the Utterance currently being spoken, or you can tell the Narrator to shut up, discarding all current and queued Utterances. You can also ask the Narrator for an Enumeration of speaking-voice names, or retrieve a Voice object that characterizes the gender, age, and regional nature of a speaking-voice.

Those are the basic abstractions.

The basic concrete implementation is HumbleNarrator. It knows about several provided Mac OS X Narrator implementations, and picks one to do the actual work. HumbleNarrator works its way through these possible choices, from most capable to most compatible, and keeps the first one that can be successfully created. This acts as a creational proxy that hides details of implementation choices. You only write this Java code: new HumbleNarrator(), but behind the scenes, HumbleNarrator finds the best underlying Narrator it knows about that will actually work for the platform's current configuration.

That's about all there is to the API, because it was intended to be very simple. There is no specified phonemic representation, no pronunciation dictionaries, no sophisticated progress-callback mechanism, etc. A particular Narrator implementation may provide any of these, if it can, but how it does so is left to the specific Narrator implementation.

Package Summary

The Humble Narrator class-library has these packages:

Class and Interface Summary

glguerin.narrator.Narrator
This is the principal interface. Its principal method is speak(), which comes in two forms. One form takes a String to speak, while the other takes a char-array and two ints that define a range of text to speak. Both forms take a voice-name String parameter and return an Utterance.

The voiceNames() method returns an Enumeration of voice-name Strings. The getVoice() method returns a Voice for a given voice-name. Voice-names and voices may depend on the voice-synthesis engine and its voices installed on a particular computer.

The getParam() and setParam() methods get and set named integer parameters that affect optional capabilities of a particular Narrator implementation. For example, you can change an enforceable limit on text length, enable and disable tracing diagnostics, and discover the maximum speakable text length. Different Narrator implementations may support various parameters that affect their behavior. See each implementation's API docs for complete explanations.
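As a rough illustration of named integer parameters, the sketch below keeps them in a map. Everything in it is an assumption made for the example: the ParamSketch class, the method signatures, the TEXT_MAX value, the clamping of "text.limit", and the truncation in limited(). Only the parameter names and the 2K default limit come from this document; consult the API docs for the real behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real getParam()/setParam() signatures may differ.
class ParamSketch {
    static final int TEXT_MAX = 32768;  // made-up ceiling for this sketch only
    private final Map<String, Integer> params = new HashMap<>();

    ParamSketch() {
        params.put("text.limit", 2048);  // 2K default, as in the revision history
        params.put("trace", 0);          // tracing off
    }

    // Unknown names read as 0 in this sketch.
    int getParam(String name) {
        if ("text.max".equals(name)) return TEXT_MAX;  // discoverable, not settable
        Integer value = params.get(name);
        return value == null ? 0 : value;
    }

    // Returns the value actually in effect, which may differ from the request.
    int setParam(String name, int value) {
        if ("text.limit".equals(name)) value = Math.max(0, Math.min(value, TEXT_MAX));
        if (params.containsKey(name)) params.put(name, value);
        return getParam(name);
    }

    // Applies the current limit to text about to be queued (assumed: truncate).
    String limited(String text) {
        int limit = getParam("text.limit");
        return text.length() <= limit ? text : text.substring(0, limit);
    }

    public static void main(String[] args) {
        ParamSketch p = new ParamSketch();
        p.setParam("text.limit", 5);
        p.setParam("trace", 1);
        System.out.println(p.limited("Hello, world"));
        System.out.println(p.getParam("text.max"));
    }
}
```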

The speaking() method returns any Utterance currently being spoken. When nothing is being spoken it returns null, so you can also use this method to test whether the Narrator is speaking or not.

The last method is shutup(), which cancels all current and queued Utterances. After calling shutup(), you can queue more speech and it will be spoken.
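The bookkeeping behind speaking() and shutup() might look something like the single-threaded model below. It is a sketch under stated assumptions, not the actual implementation: ShutupSketch and its fields are hypothetical, advance() stands in for the real speaking thread, and the choice to mark cancelled Utterances as complete but unsuccessful is inferred from the description of isSuccessful() rather than confirmed.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical single-threaded model; no real threads or speech are involved.
class ShutupSketch {
    static final class Utterance {
        final String text;
        boolean complete;
        boolean successful;
        Utterance(String text) { this.text = text; }
    }

    private final Deque<Utterance> queue = new ArrayDeque<>();
    private Utterance current;

    // Adds text to the end of the queue and returns its Utterance.
    Utterance speak(String text) {
        Utterance u = new Utterance(text);
        queue.add(u);
        return u;
    }

    // The Utterance being spoken now, or null when nothing is being spoken.
    Utterance speaking() { return current; }

    // Stand-in for the speaking thread: finish the current Utterance, start the next.
    void advance() {
        if (current != null) { current.complete = true; current.successful = true; }
        current = queue.poll();
    }

    // Cancels the current and all queued Utterances (assumed: complete, not successful).
    void shutup() {
        if (current != null) { current.complete = true; current = null; }
        for (Utterance u; (u = queue.poll()) != null; ) u.complete = true;
    }

    public static void main(String[] args) {
        ShutupSketch narrator = new ShutupSketch();
        Utterance first = narrator.speak("first");
        narrator.speak("second");
        narrator.advance();                                // "first" starts speaking
        narrator.shutup();                                 // both are cancelled
        System.out.println(narrator.speaking() == null);   // silent again
        System.out.println(first.complete + " " + first.successful);
        narrator.speak("third");                           // queueing still works
        narrator.advance();
        System.out.println(narrator.speaking().text);
    }
}
```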

glguerin.narrator.Utterance
This interface represents a speech utterance, such as a phrase, sentence, or paragraph. It can also represent individual spoken words, but that's not a good way to use a Narrator unless you want each utterance to sound like individual spoken words.

The main use for an Utterance in an application is to wait for its completion with the awaitCompletion() method. You can wait indefinitely, or for a limited interval measured in milliseconds. You can always find out the current completion state with isComplete().

After an Utterance is complete, you can determine whether it was successfully completed (spoken) or not with isSuccessful(). Before completion, this method always returns false.
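One way to get this kind of completion state is a small monitor built on wait() and notifyAll(). The sketch below is hypothetical: the CompletionSketch class, the finish() method, the boolean return value of awaitCompletion(), and the convention that 0 means "wait indefinitely" are all assumptions for illustration, and the real Utterance interface may differ.

```java
// Hypothetical completion monitor; not the actual Utterance implementation.
class CompletionSketch {
    private boolean complete;
    private boolean successful;

    synchronized boolean isComplete() { return complete; }

    // Always false before completion; afterwards, whether the text was spoken.
    synchronized boolean isSuccessful() { return complete && successful; }

    // Called once by whatever does the speaking.
    synchronized void finish(boolean spoken) {
        complete = true;
        successful = spoken;
        notifyAll();
    }

    // Waits up to millis (0 means indefinitely) and returns the completion state.
    synchronized boolean awaitCompletion(long millis) throws InterruptedException {
        long deadline = System.nanoTime() + millis * 1_000_000L;
        while (!complete) {
            if (millis == 0) { wait(); continue; }
            long remaining = (deadline - System.nanoTime()) / 1_000_000L;
            if (remaining <= 0) break;  // timed out
            wait(remaining);
        }
        return complete;
    }

    // Returns { timed-out wait, isSuccessful before, indefinite wait, isSuccessful after }.
    static boolean[] demo() {
        CompletionSketch u = new CompletionSketch();
        try {
            boolean early = u.awaitCompletion(10);     // nothing has finished it yet
            boolean before = u.isSuccessful();
            new Thread(() -> u.finish(true)).start();  // simulated speaking thread
            boolean late = u.awaitCompletion(0);
            return new boolean[] { early, before, late, u.isSuccessful() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new boolean[0];
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The loop around wait() guards against spurious wakeups, and the deadline is recomputed on each pass so a limited wait never runs much longer than requested.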

Any Utterance can have an "extra" Object attached to it, which is entirely for your own use. The getExtra() and setExtra() methods manage this.

glguerin.narrator.Voice
This class characterizes the name, gender, age, spoken language, and regional nature of a speaking voice. The getName() method returns the voice name String, as used by Narrator.speak(). The getGender() method returns an int whose sign represents the gender (masculine, feminine, or neuter). The getAge() method returns an int representing the approximate age of the speaking voice, in years. The getRegion() method returns a String representing the language and country of the speaking voice. It's similar to the Strings used with Locales.
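A program might interpret these values along the following lines. Note the assumptions: this document does not say which sign means which gender, so the mapping below (negative feminine, zero neuter, positive masculine) is a guess for illustration only, and the region String is assumed to have the language_COUNTRY form that Locale.toString() produces, such as "en_US". VoiceSketch and its methods are hypothetical helpers, not part of the library.

```java
import java.util.Locale;

// Hypothetical helpers for values like those returned by Voice.
class VoiceSketch {
    // ASSUMPTION: the sign convention is not documented here; this mapping is a guess.
    static String genderName(int gender) {
        return gender < 0 ? "feminine" : gender > 0 ? "masculine" : "neuter";
    }

    // ASSUMPTION: region looks like "en_US" or just "en".
    static Locale toLocale(String region) {
        String[] parts = region.split("_", 2);
        Locale.Builder builder = new Locale.Builder().setLanguage(parts[0]);
        if (parts.length > 1) builder.setRegion(parts[1]);
        return builder.build();
    }

    public static void main(String[] args) {
        Locale locale = toLocale("en_US");
        System.out.println(genderName(-1) + ", " + locale.getDisplayName(Locale.ENGLISH));
    }
}
```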

Available Implementations

See the API docs for details and limitations.

glguerin.narrator.HumbleNarrator
This class is not an actual implementation of Narrator, but a proxy or surrogate that knows how to create an underlying Narrator implementation. It then passes its method calls through to that underlying instance.

HumbleNarrator used to be an actual implementation, but that functionality has been moved to SayNarrator and OSANarrator. HumbleNarrator also knows about NSNarrator, and will prefer it if it's loadable. The order in which implementations are tried is:

  1. The fully qualified classname given by the "glguerin.narrator.class" property.
  2. NSNarrator.
  3. SayNarrator.
  4. OSANarrator.
  5. Narrate, a diagnostic-only implementation that emits text to System.out.
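The "try each candidate in order and keep the first that can be created" strategy listed above can be sketched with reflection. This is an approximation of the idea, not HumbleNarrator's source: FallbackSketch and its methods are hypothetical, and JDK classes stand in for the Narrator implementations so the example runs on any platform. Only the "glguerin.narrator.class" property name is taken from this document.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "try each candidate, keep the first that works".
class FallbackSketch {
    // Returns an instance of the first class that loads and constructs, else null.
    static Object firstCreatable(List<String> classNames) {
        for (String name : classNames) {
            if (name == null || name.isEmpty()) continue;  // e.g. property not set
            try {
                return Class.forName(name).getDeclaredConstructor().newInstance();
            } catch (ReflectiveOperationException | LinkageError | SecurityException e) {
                // Not loadable or not constructible on this system; try the next one.
            }
        }
        return null;
    }

    // Candidate order: the property's class first, then the built-in fallbacks.
    static List<String> candidates(String propertyValue, String... builtIns) {
        List<String> names = new ArrayList<>();
        names.add(propertyValue);
        names.addAll(Arrays.asList(builtIns));
        return names;
    }

    public static void main(String[] args) {
        String override = System.getProperty("glguerin.narrator.class");
        // JDK classes stand in for Narrator implementations in this sketch.
        Object chosen = firstCreatable(
            candidates(override, "no.such.Narrator", "java.lang.StringBuilder"));
        System.out.println(chosen == null ? "none" : chosen.getClass().getName());
    }
}
```

Catching LinkageError as well as the reflective exceptions matters for a case like NSNarrator, where a class may be present but depend on platform classes that are missing.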

glguerin.narrator.imp.macosx.NSNarrator
This class uses the Cocoa-Java class NSSpeechSynthesizer, if it's available. If unavailable, it can't be instantiated, so HumbleNarrator will skip it. NSNarrator requires Mac OS X 10.3 or higher, because earlier OS versions lack NSSpeechSynthesizer. It will work on both PowerPC and Intel-based Macs.

NSNarrator doesn't start another process, so it has low speech-starting latency. NSNarrator also doesn't filter its text, so it works better with voices that speak languages other than English, where accented letters make a difference to correct pronunciation.

NSNarrator supports an "after.ms" parameter, in addition to the usual "text.limit", "text.max", and "trace" parameters. See the API docs for details.

glguerin.narrator.imp.macosx.SayNarrator
This class uses the "/usr/bin/say" command, if it's available. If unavailable, it can't be instantiated, so HumbleNarrator will skip it. SayNarrator requires Mac OS X 10.3 or higher, because earlier OS versions lack the "/usr/bin/say" command. It will work on both PowerPC and Intel-based Macs.

SayNarrator starts another process for each Utterance, which produces a slight speech-starting latency. SayNarrator also filters its text, eliminating accents from accented letters (among other things). This can affect pronunciation in non-English spoken languages.
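To make the approach concrete, here is a rough sketch of stripping accents and assembling a "/usr/bin/say" command line. It is not SayNarrator's actual filter, whose exact rules are not described here: SaySketch and its methods are hypothetical, the accent removal uses Unicode decomposition as one possible technique, and the command is only built, not executed, so the example runs on any platform.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; SayNarrator's real text filter may work differently.
class SaySketch {
    // Decomposes accented letters, then drops the combining marks.
    static String stripAccents(String text) {
        String decomposed = Normalizer.normalize(text, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Builds the argument list for one Utterance: say -v <voice> <text>.
    static List<String> sayCommand(String voiceName, String text) {
        return Arrays.asList("/usr/bin/say", "-v", voiceName, stripAccents(text));
    }

    public static void main(String[] args) {
        // "Victoria" is only an example voice name.
        System.out.println(sayCommand("Victoria", "Caf\u00e9 cr\u00e8me"));
    }
}
```

On a Mac, the resulting list could be handed to new ProcessBuilder(command).start(), which is where the per-Utterance process latency comes from.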

SayNarrator supports the "text.limit", "text.max", and "trace" parameters. See the API docs for details.

glguerin.narrator.imp.macosx.OSANarrator
This class uses the "/usr/bin/osascript" command, if it's available. If unavailable, it can't be instantiated, so HumbleNarrator will skip it. OSANarrator requires Mac OS X 10.1 or higher. It will work on both PowerPC and Intel-based Macs. It will not work on 10.0 because the 'osascript' command on that OS version doesn't support the '-e' option.

OSANarrator is a lot like SayNarrator -- it's even a subclass of SayNarrator. It starts another process for each Utterance, so it has a similar speech-starting latency. Its latency is longer than SayNarrator's, because there's also an initial scripting latency. OSANarrator also filters its text, stripping accents and other things, which can affect pronunciation in non-English spoken languages.
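The osascript route can be illustrated by building a one-line AppleScript that uses the "say" scripting verb and passing it with the '-e' option. As before, this is a hedged sketch rather than OSANarrator's code: OsaSketch and its methods are hypothetical, and the command is only assembled, not run.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; OSANarrator's real script and filtering may differ.
class OsaSketch {
    // Escapes backslashes and double quotes for an AppleScript string literal.
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Produces a script such as: say "Hello" using "Victoria"
    static String script(String voiceName, String text) {
        return "say \"" + escape(text) + "\" using \"" + escape(voiceName) + "\"";
    }

    // Builds the argument list: osascript -e <script>.
    static List<String> osaCommand(String voiceName, String text) {
        return Arrays.asList("/usr/bin/osascript", "-e", script(voiceName, text));
    }

    public static void main(String[] args) {
        // "Victoria" is only an example voice name.
        System.out.println(osaCommand("Victoria", "Say \"hi\""));
    }
}
```

Escaping the backslash before the quote keeps the two replacements from interfering with each other.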

OSANarrator supports the "text.limit", "text.max", and "trace" parameters. See the API docs for details.

SayNarrator and OSANarrator functionality used to be in HumbleNarrator itself. It was factored out when NSNarrator was added.

There is some variation in pronunciation for different versions of Mac OS X. You should take this into account if you use any of the above Narrators across several OS versions.

The above Narrators can accept phonemic text for Apple's voices. Two Apple references describe how. Although these documents refer to the classical Macintalk speech synthesizer, the Mac OS X speech synthesizer seems to use the same rules and phonemes:
    Writing Embedded Speech Commands
    Phonemic Representation of Speech.
Voices other than Apple's, such as those from Cepstral's web site, do not use the same phonemic representations.

Creating phonemic text entirely by hand is not easy. A Mac OS X program that can help is Apple's Cocoa Speech Synthesis Example, which can be found by following the "Speech Technologies" link on Apple's Accessibility Sample Code page. At the time of this writing, the download contains a ready-to-use double-clickable app in its "build" subdirectory. You shouldn't have to compile it with Project Builder or Xcode unless you want to.

Revision History

06-Jun-2006 -- Update release
  • FIXED -- Voices in locations other than /System/Library/Speech/Voices would not be found. This was fixed by looking for voices in two other standard locations:
       /Library/Speech/Voices
       user-home/Library/Speech/Voices
  • FIXED -- Voices for speaking languages other than English are now properly recognized as such, and return an appropriate String from getRegion().
  • ADDED -- NSNarrator class, based on NSSpeechSynthesizer.
  • IMPROVED -- The stdout-based Narrate implementation for platforms other than Mac OS X.
  • DEPRECATED -- The HumbleNarrator.LIMIT field is now only used as an initial text-limit parameter. To change a Narrator's limit after creation, use Narrator.setParam().
  • Many other changes were also made, which should have no major effect on the public Narrator API.

13-Apr-2004 -- Update release
  • FIXED -- There was a race condition between shutup() and the speaking thread that would lose Utterances queued shortly after a shutup(). This has been fixed, and an illustration can be seen in the class glguerin.narratortrials.Abortive.
  • ADDED -- Utterance.isSuccessful() returns a boolean indicating successful completion. You can use this to discover whether an Utterance was successfully spoken or not.
  • IMPROVED -- On Mac OS X 10.3 and higher, the HumbleNarrator will now use the /usr/bin/say command if it's available, otherwise it continues to use the 'osascript' command and "say" scripting verb. The advantage of /usr/bin/say is that it's more economical with CPU usage, and there may be a slightly lower starting latency.
  • IMPROVED -- The HumbleNarrator.LIMIT size was increased to 2K characters by default.
  • IMPROVED -- The HumbleNarrator and its Humble Utterance were refactored and revised. This should make it easier to make future improvements, or to create alternative implementations.
  • Other changes were also made, which should have no significant effect on the public Narrator API.

05-Apr-2004 -- Initial public release
  • IMPLEMENTATION -- Initial version of HumbleNarrator for Mac OS X executes 'osascript' command and "say" scripting verb to produce all speech.

