Showing posts with label text-to-speech.

Friday, September 21, 2018

Using TTS on Android: Punctuation is read aloud


CONTEXT: My application is sending sentences to whatever TTS engine the user has. Sentences are user-generated and may contain punctuation.

PROBLEM: Some users report that the punctuation is read aloud (the TTS says "comma", etc.) on SVOX, Loquendo, and possibly other engines.

QUESTION:

  1. Should I strip all punctuation?
  2. Should I transform the punctuation using this kind of API?
  3. Should I let the TTS engine deal with the punctuation?

The same user who sees the problem with Loquendo does not have it with another Android application called FBReader, so I guess the third option is not the right thing to do.

2 Answers

Answer 1

I had the same problem with one of my apps.

The input string was:

Next alarm in 10 minutes,it will be 2:45 pm

and the TTS engine would say:

Next alarm in 10 minutes comma it will be 2:45 pm.

The problem was fixed just by adding a space after the comma like this:

Next alarm in 10 minutes, it will be 2:45 pm

This is a stupid mistake, and maybe your problem is more complicated than that, but it worked for me. :)
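A minimal sketch of that kind of normalisation, applied before handing text to the engine (the helper name and the regex are mine, not from the answer; note it would also space out commas inside numbers):

    // Hypothetical helper: insert a space after any comma that is not already
    // followed by whitespace, so the engine doesn't read the word "comma" aloud.
    private static String normalizePunctuationSpacing(String text) {
        return text.replaceAll(",(?=\\S)", ", ");
    }

For example, normalizePunctuationSpacing("Next alarm in 10 minutes,it will be 2:45 pm") returns "Next alarm in 10 minutes, it will be 2:45 pm".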

Answer 2

So, you're worried about what back-alley-acquired text-to-speech engine the user might happen to have selected as their default... presumably because you don't want your app to look bad due to this engine's unknown/bad behavior. Understandable.

The (good) fact is, though, that the TTS's behavior is not actually your responsibility unless you decide to embed an engine in the app itself (Difficulty: Hard, Recommended? No).

Engines can and should be presumed to adhere to the Android rules and behaviors dictated here... and presumed to supply their own sufficient set of configuration options in the Android system settings (Settings > Language & locale > Text-to-speech), which may or may not include pronunciation options. The user should also be presumed intelligent enough to install an engine that they are satisfied with.

It is a slippery slope to take on the job of anticipating and "correcting" for unknown and unwanted engine behaviors (at least in engines that you haven't tested yourself).

A SIMPLE AND GOOD OPTION (Difficulty: Easy):

  • Make a setting in your app: "ignore punctuation."

A BETTER OPTION (Difficulty: Medium):

  • Do the above, but only show the "ignore punctuation" setting-option if the engine you have detected on the user's device is prone to this issue (a sketch of applying such a setting follows below).
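A minimal sketch of how the setting might be applied, assuming a hypothetical boolean preference key "ignore_punctuation" (the key and helper names are illustrative, not from the answer):

    // Sketch only: strip punctuation before speaking when the user has enabled
    // the hypothetical "ignore_punctuation" preference.
    private String prepareForSpeech(Context context, String text) {
        SharedPreferences prefs = PreferenceManager.getDefaultSharedPreferences(context);
        boolean ignorePunctuation = prefs.getBoolean("ignore_punctuation", false);
        if (ignorePunctuation) {
            // Replace punctuation with spaces rather than deleting it, so words don't merge.
            return text.replaceAll("\\p{Punct}+", " ");
        }
        return text;
    }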

Also, one thing to note is that there are many, many differences between engines (whether they use embedded voices vs online, response time, initialization time, reliability/adherence to Android specs, behavior across Android API levels, behavior across their own version history, the quality of voices, not to mention language capability)... differences that may be even more important to users than whether or not punctuation is pronounced.

You say "My application is sending sentences to whatever TTS engine the user has." Well... "That's yer problem right there." Why not give the user a choice on what engine to use?

And that leads us to...

AN EVEN BETTER OPTION (Difficulty: Hard and Good! [in my humble opinion]):

  • Decide on some "known-good" engines your app will "support," starting with Google and Samsung. I would guess that there are less than 5% of devices out there these days that don't have either of those engines on them.
  • Study and test these engines as much as possible across all Android API levels that you plan to support... at least in as far as whether they pronounce punctuation or not.
  • Over time, test more engines if you like, and add them to your supported engines in subsequent app updates.
  • Run an algorithm when your app starts that detects which engines are installed, then use that info against your own list of supported engines:

private ArrayList<String> whatEnginesAreInstalled(Context context) {
    final Intent ttsIntent = new Intent();
    ttsIntent.setAction(TextToSpeech.Engine.ACTION_CHECK_TTS_DATA);
    final PackageManager pm = context.getPackageManager();
    final List<ResolveInfo> list = pm.queryIntentActivities(ttsIntent, PackageManager.GET_META_DATA);
    ArrayList<String> installedEngineNames = new ArrayList<>();
    for (ResolveInfo r : list) {
        String engineName = r.activityInfo.applicationInfo.packageName;
        installedEngineNames.add(engineName);

        // just logging the version number out of interest
        String version = "null";
        try {
            version = pm.getPackageInfo(engineName, PackageManager.GET_META_DATA).versionName;
        } catch (Exception e) {
            Log.i("XXX", "try catch error");
        }
        Log.i("XXX", "we found an engine: " + engineName);
        Log.i("XXX", "version: " + version);
    }
    return installedEngineNames;
}

  • In your app's settings, present all engines that you've decided to support as options (even if not currently installed). This could be a simple group of RadioButtons with titles corresponding to the different engine names. If the user selects one that isn't installed, notify them of that and give them the option of installing it with an intent.
  • Save the user's selected engine name (String) in SharedPreferences, and use their selection as the last argument of the TextToSpeech constructor any time you need a TTS in your app (see the sketch after this list).
  • If the user has some weird engine installed, present it as a choice also, even if it is unrecognized/unsupported, but inform them that they have selected an unknown/untested engine.
  • If the user selects an engine that is supported but is known to pronounce punctuation (bad), then upon selection of that engine, have an alert dialog pop up warning the user about that, explaining that they can turn this bad behavior off with the "ignore punctuation" setting referred to already.
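A minimal sketch of wiring the saved selection into the constructor, assuming a hypothetical preference key "tts_engine" (the key name and fallback package are illustrative):

    // Sketch: create the TTS instance with the engine the user picked in your settings.
    // The third constructor argument is the engine's package name.
    SharedPreferences prefs = PreferenceManager.getDefaultSharedPreferences(context);
    String enginePackage = prefs.getString("tts_engine", "com.google.android.tts");
    TextToSpeech tts = new TextToSpeech(context, onInitListener, enginePackage);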

SIDE-NOTES:

  • Don't let the SVOX/PICO (emulator) engine get you too worried -- it has many flaws and is not even designed or guaranteed to run on Android above API ~20, but is still included on emulator images up to API ~24, resulting in "unpredictable results" that don't actually reflect reality. I have yet to see this engine on any real hardware device made within the last seven years or so.

  • Since you say that "sentences are user generated," I would be more worried about solving the problem of what language they are going to be typing in! I'll look out for a question on that! :)


Tuesday, June 12, 2018

TTS dynamic library available


I have installed the ScanSoft Isabel voice (RealSpeak2) and I want a library to generate the phrases and play them on my PC.

Is there a text-to-speech DLL ready to use? I need a dynamic library callable with a function like speak(), as easy as that. I have heard that a DLL can be created using the Microsoft Speech COM API, but I do not have access to Microsoft Visual Studio.

0 Answers


Tuesday, June 27, 2017

Android TTS checking for supported locale with missing/not downloaded voice data


I'm using Android's TextToSpeech class. Everything is working normally. However, there are languages/locales that aren't installed by default but are supported by the TTS engine, and I can't capture the state of missing voice data.

With the internet on, when I try to setLanguage to a new locale whose voice data hasn't been downloaded, it simply downloads the voice data and performs the speak method normally/successfully.

However, with the internet off, when I try to setLanguage to a new locale whose voice data hasn't been downloaded, it attempts to download the voice data. But with no internet, it just indicates "downloading" on the "TTS voice data" settings screen under "Language and input" for the selected locale, without any progress. And, as expected, the speak method doesn't work since the voice data isn't downloaded. When this happens, I would expect the TTS methods setLanguage/isLanguageAvailable to return LANG_MISSING_DATA so I could capture this state; however, they simply return LANG_COUNTRY_AVAILABLE.

I want to be able to detect when the voice data of the chosen locale isn't downloaded/missing and either give a toast message or direct the user to download it. I have seen several posts suggesting the use of isLanguageAvailable, like this one. I also looked at the Android documentation, and it seems like isLanguageAvailable's return values should capture the state of missing voice data with LANG_MISSING_DATA.

I also tried sending an intent with ACTION_CHECK_TTS_DATA as the other way to check for missing data as suggested in the Android documentation I linked. However, the resultCode again didn't capture/indicate that the voice data is missing (CHECK_VOICE_DATA_FAIL) but returned CHECK_VOICE_DATA_PASS instead.

In this case, how should I capture the state of a language/locale being available/supported but with the voice data missing? I'm also curious why CHECK_VOICE_DATA_FAIL and LANG_MISSING_DATA aren't the values returned; when the voice data is missing, shouldn't it return these values? Thanks! Below is the return value when I use setLanguage and isLanguageAvailable on locales whose voice data hasn't been downloaded (0 and 1 are the returned values of the methods shown in the logs; -1 is the one that corresponds to missing voice data).

1 Answer

Answer 1

You can find all of the available Locales on the device using the following snippet. I hope this code helps you.

// Note: Locale.getAvailableLocales() lists the locales known to the Java runtime;
// it does not tell you whether the TTS engine has voice data installed for them.
Locale[] availableLocales = Locale.getAvailableLocales();
boolean available = false;
for (Locale locale : availableLocales) {
    if (locale.getDisplayLanguage().equals("your_locale_language")) {
        available = true;
        // TODO: handle the matching locale
    }
}
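As an alternative (not mentioned in the question), on API 21+ you could inspect the engine's Voice objects after initialisation; engines may flag voices whose data is not installed via the notInstalled feature. A rough sketch, under the assumption that the engine reports this feature correctly:

    // Rough sketch (API 21+): after onInit, check whether any voice matches the
    // target locale and does not report the KEY_FEATURE_NOT_INSTALLED feature.
    private boolean isVoiceDataInstalled(TextToSpeech tts, Locale target) {
        Set<Voice> voices = tts.getVoices();
        if (voices == null) return false;
        for (Voice voice : voices) {
            if (voice.getLocale().equals(target)
                    && !voice.getFeatures().contains(TextToSpeech.Engine.KEY_FEATURE_NOT_INSTALLED)) {
                return true;
            }
        }
        return false;
    }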

Monday, May 22, 2017

Speech Synthesis API blocks main thread


I am using the Speech Synthesis API to pronounce a list of different words. My app animates the words in and out as they're being spoken via canvas. I realized that when I perform a new utterance:

var msg = new SpeechSynthesisUtterance(word);
window.speechSynthesis.speak(msg);

the spoken word appears to block the main thread, temporarily stalling the animation. This happens every time I call window.speechSynthesis.speak().

Is there a way to have the speech synthesis run on a separate thread in JavaScript, so it doesn't interfere with my animation on the main thread?

(I am primarily testing this in Chrome)

3 Answers

Answer 1

I'd use a setTimeout to fake an asynchronous call:

var msg = new SpeechSynthesisUtterance(word);
setTimeout(function() {
    window.speechSynthesis.speak(msg);
}, 1);

I must admit I'm not sure about this.

Answer 2

An approach to follow here would be to use a worker thread that plays the audio corresponding to the text message.

A web worker, however, does not have access to the window object, so we cannot call the window.speechSynthesis.speak method directly inside the worker.

A nice workaround, which the text-to-speech library from Francois Laberge implements, is:

  1. Send the text to be spoken to the worker thread.
  2. The worker thread then converts this text to an audio (WAV) file and returns it to the main thread.
  3. The main thread, on receiving the message from the worker, plays the WAV file using an audio element.

In my opinion, for even better performance, a worker pool could be created.

Please have a look at the demo here

Answer 3

I really suggest you watch this lovely summary by Philip Roberts about what the event loop is and why setTimeout 0 makes sense: https://www.youtube.com/watch?v=8aGhZQkoFbQ

In short, a quick solution may be what Booster2ooo suggests: wrapping the call in a setTimeout:

setTimeout(function() { window.speechSynthesis.speak(msg); }, 0); 

Tuesday, March 7, 2017

iOS: AVSpeechSynthesizer: need to speak text in the headphone's left channel


I am using AVSpeechSynthesizer for text-to-speech. I have to play the speech in the headphone's left channel (Mono 2). I have the following code to set the output channel.

func initalizeSpeechForRightChannel() {
    let avSession = AVAudioSession.sharedInstance()
    let route = avSession.currentRoute
    let outputPorts = route.outputs
    var channels: [AVAudioSessionChannelDescription] = []
    var leftAudioChannel: AVAudioSessionChannelDescription? = nil
    var leftAudioPortDesc: AVAudioSessionPortDescription? = nil

    for outputPort in outputPorts {
        for channel in outputPort.channels! {
            leftAudioPortDesc = outputPort
            //print("Name: \(channel.channelName)")
            if channel.channelName == "Headphones Left" {
                channels.append(channel)
                leftAudioChannel = channel
            } else {
                // leftAudioPortDesc?.channels?.removeObject(channel)
            }
        }
    }

    if channels.count > 0 {
        if #available(iOS 10.0, *) {
            print("Setting Left Channel")
            speechSynthesizer.outputChannels = channels
            print("Checking output channel : \(speechSynthesizer.outputChannels?.count)")
        } else {
            // Fallback on earlier versions
        }
    }
}

I have 2 problems with this code:

1. I can't set outputChannels; it is always nil (this happens the first time the method is called; consecutive calls work fine).

2. outputChannels is supported from iOS 10.*, but I need to support iOS 8.0 onwards.

Please provide the best way to do that.

1 Answer

Answer 1

  1. Instead of checking the channelName, which is descriptive (i.e. for the user), check the channelLabel. There is an enumeration containing the left channel.

  2. I suspect this may not be possible pre-iOS 10. AVAudioSession doesn't appear to have any method to select only the left output channel. You may be able to use overrideOutputAudioPort:error:, but it would affect the entire app.


Saturday, April 9, 2016

mbrola voice throws ProcessException “No audio data read” on Linux CentOS


I am using the mbrola voice (us1) on CentOS. I am trying to save the audio as a WAV file, but at the voice.speak() line (marked below) it throws a ProcessException "No audio data read". It works fine when I run it in a Windows environment, and it even works on Linux with the Kevin16 voice. I tried googling why voice.speak() behaves this way for mbrola voices but could not find anything. Below is the code; any clue?

public static void createAudioFile(String text, String fileName) {
    AudioPlayer audioPlayer = null;

    System.setProperty("mbrola.base", Constants.mbrolaDiskPath);
    VoiceManager vm = VoiceManager.getInstance();
    Voice voice = vm.getVoice("mbrola_us1");
    //voice = vm.getVoice("kevin16");
    voice.allocate();

    try {
        String directoryPath = audioDir + fileName;
        audioPlayer = new SingleFileAudioPlayer(directoryPath, Type.WAVE);
        voice.setAudioPlayer(audioPlayer);
        voice.speak(text);   // <-- throws ProcessException "No audio data read" with the mbrola voice
        voice.deallocate();
        audioPlayer.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

0 Answers


Thursday, March 31, 2016

Initialising the TextToSpeech object on a worker thread


For years (literally), my application has suffered woes from poorly performing text to speech engines, in particular, the initialisation time when calling:

tts = new TextToSpeech(context, myOnInitListener); 

The above can cause the UI to lag, and if you search for 'Text to Speech initialization slow' on SO, you'll find many posts. The embedded high-quality IVONA voices used to be the worst culprit, but the Google TTS engine has now taken the prize.

Their most recent APK update causes major lag on initialisation. No code is necessary to test this: go to your Android text-to-speech settings and try switching between the available engines whilst pressing 'listen to a sample'; the lag is demonstrated 'nicely'.

To try and combat this, I've implemented the following:

private volatile TextToSpeech tts;

AsyncTask.execute(new Runnable() {
    @Override
    public void run() {
        tts = new TextToSpeech(context, volatileOnInitListener);
    }
});

This has completely cured the lag on initialisation, but I fear there may be side-effects to this that I've not considered. Can anyone think of any?

I'm also puzzled, as I had believed that the TextToSpeech constructor was asynchronous, and therefore moving the constructor to a worker thread should make no difference. If this implementation is the way forward, then why don't Google implement it in their TextToSpeechSettings?

Hope someone can clarify the above. Thanks in advance.

Edit - When I said the 'constructor was asynchronous', I was really referring to the engine initialisation process it starts, and the eventual call to onInit.

1 Answer

Answer 1

I had believed that the TextToSpeech constructor was asynchronous

That is only partially true. A lot of the initialization is executed synchronously. Here is the source.

If this implementation is the way forward, then why don't Google implement it in their TextToSpeechSettings?

It seems like Google seldom checks how their code runs on mid- and low-end devices; I guess the lag doesn't show on high-end devices. (Another example of this can be seen in the current YouTube app, for which I can personally confirm lag on a mid-spec device and no lag on a high-end device.) This is pure speculation, as I am not affiliated with Google.

I fear there may be side-effects to this that I've not considered. Can anyone think of any?

The only (obvious) side effect is that you can't use the TTS engine synchronously, but have to wait for the asynchronous task to finish. But that is already the case anyway. The only thing you do is execute some code outside of the UI thread, and that code does not expect to be run on the UI thread. This should never be a problem. And even if there is a problem, you will only find it by testing it in an application.

In general you're good to go.
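For completeness, a minimal sketch along the lines of the question's approach, with an explicit readiness guard so speak() is only called after onInit reports SUCCESS (the field and method names are illustrative):

    // Sketch: construct the engine off the UI thread and only speak once
    // initialisation has reported SUCCESS.
    private volatile TextToSpeech tts;
    private volatile boolean ttsReady = false;

    private void initTts(final Context context) {
        AsyncTask.execute(new Runnable() {
            @Override
            public void run() {
                tts = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
                    @Override
                    public void onInit(int status) {
                        ttsReady = (status == TextToSpeech.SUCCESS);
                    }
                });
            }
        });
    }

    private void speakIfReady(String text) {
        if (ttsReady) {
            // API 21+ signature of speak(); older APIs use the HashMap variant.
            tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "utteranceId");
        }
    }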
