Saturday, March 4, 2017

Bypassing Google TTS Engine initialization lag in Android


I am trying to play speech through a TextToSpeech object whenever a specific event is triggered on the phone.

However, I am facing issues with the default Google TTS engine that is installed on most phones. At the moment I speak some text immediately after the TextToSpeech object is initialized, and shut the resource down as soon as the speech is completed, as per the following code:

    import android.content.Context;
    import android.speech.tts.TextToSpeech;
    import android.speech.tts.UtteranceProgressListener;
    import android.util.Log;

    import java.util.HashMap;
    import java.util.Locale;

    public class VoiceGenerator {

        private Context context = null;
        private static TextToSpeech voice = null;

        public VoiceGenerator(Context context) {
            this.context = context;
        }

        // text is final so the anonymous inner classes below can capture it
        public void voiceInit(final String text) {
            try {
                if (voice == null) {
                    new Thread(new Runnable() {
                        @Override
                        public void run() {
                            voice = new TextToSpeech(context, new TextToSpeech.OnInitListener() {
                                @Override
                                public void onInit(final int status) {
                                    try {
                                        if (status != TextToSpeech.ERROR) {
                                            voice.setLanguage(Locale.US);
                                            Log.d("VoiceTTS", "TTS being initialized");
                                            HashMap<String, String> p = new HashMap<String, String>();
                                            p.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ThisUtterance");

                                            // Register the listener before speaking so no callback is missed
                                            voice.setOnUtteranceProgressListener(new UtteranceProgressListener() {
                                                @Override
                                                public void onStart(String utteranceId) {
                                                }

                                                @Override
                                                public void onDone(String utteranceId) {
                                                    Log.d("VoiceTTS", "TTS being released");
                                                    clearTtsEngine();
                                                }

                                                @Override
                                                public void onError(String utteranceId) {
                                                }
                                            });

                                            // Speaking here
                                            voice.speak(text, TextToSpeech.QUEUE_ADD, p);
                                        }
                                    } catch (Exception e) {
                                        clearTtsEngine();
                                        Log.d("ErrorLog", "Error occurred while voice play");
                                        e.printStackTrace();
                                    }
                                }
                            });
                        }
                    }).start();
                }
            } catch (Exception e) {
                clearTtsEngine();
                Log.d("ErrorLog", "Error occurred while voice play");
                e.printStackTrace();
            }
        }

        // Stops any ongoing speech and releases the engine binding
        public static void clearTtsEngine() {
            if (voice != null) {
                voice.stop();
                voice.shutdown();
                voice = null;
            }
        }
    }

The problem is the noticeable delay involved in initializing the Google TTS Engine: about 6-8 seconds on my devices.

I have read in other posts that this delay can be avoided by using other TTS engines. Since I always develop on my Samsung phone, which has its own proprietary TTS configured by default, I never noticed this issue until I checked my app on phones from other brands, which have the Google TTS engine configured as default. Ideally, though, I don't want to force users to install another app along with my own, so I would like this to work with the default Google TTS Engine itself.

Through some erroneous coding which I later rectified, I realized that if I could keep the TextToSpeech object initialized beforehand, and never null once initialized, I could seemingly bypass this delay.

However, the resource has to be shut down once we are done with it, so I cannot keep the object alive and initialized for long. I also do not know when to initialize or shut down the resource, since the voice needs to play whenever the specific event occurs, which will mostly be when my app is not open on the phone.

So my questions are the following :

  1. Can we somehow reduce or eliminate the initialization delay of Google TTS Engine, programmatically or otherwise?

  2. Is there any way through which I can keep the TextToSpeech object alive and initialized at all times like say, through a service? Or would this be a bad, resource-consuming design?

  3. Also is using a static TextToSpeech object the right way to go, for my requirements?

Any solutions along with code would be appreciated.

Update: I have confirmed that the delay is associated exclusively with Google TTS engine, as I have tried using other free and paid TTS engines, wherein there is little or no lag. But I would still prefer to not have any third party dependencies, if possible, and would like to make this work with Google TTS Engine.

UPDATE: I have seemingly bypassed this issue by binding this TTS object to a service and accessing it from the service. The service is STICKY (if the service terminates due to memory issue, Android OS will restart the service when memory is available again) and is configured to restart on reboot of the device.

The service only initializes the TTS object and does no other work. I am not explicitly stopping the service, allowing it to run as long as possible. I have declared the TTS object static, so that I can access it from other classes of my app.

Although this seems to be working amazingly well, I am concerned about whether it could lead to memory or battery issues (in my specific situation, where the service handles only object initialization and then remains dormant). Is there any problem with my design, or are there further improvements or checks I could add?

Manifest file :

    <uses-permission android:name="android.permission.RECEIVE_BOOT_COMPLETED"/>

    <application
        android:allowBackup="false"
        android:icon="@drawable/ic_launcher"
        android:label="@string/app_name" >
        <activity
            android:name="activity.MainActivity"
            android:label="@string/app_name"
            android:screenOrientation="portrait" >
            <intent-filter>
                <action android:name="android.intent.action.MAIN" />

                <category android:name="android.intent.category.LAUNCHER" />
            </intent-filter>
        </activity>

        <receiver
            android:name="services.BroadcastReceiverOnBootComplete"
            android:enabled="true"
            android:exported="false">
            <intent-filter>
                <action android:name="android.intent.action.BOOT_COMPLETED" />
            </intent-filter>
            <intent-filter>
                <action android:name="android.intent.action.PACKAGE_REPLACED" />
                <data android:scheme="package" />
            </intent-filter>
            <intent-filter>
                <action android:name="android.intent.action.PACKAGE_ADDED" />
                <data android:scheme="package" />
            </intent-filter>
        </receiver>

        <service android:name="services.TTSService"></service>

BroadcastReceiver code :

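    import android.content.BroadcastReceiver;
    import android.content.Context;
    import android.content.Intent;

    public class BroadcastReceiverOnBootComplete extends BroadcastReceiver {

        @Override
        public void onReceive(Context context, Intent intent) {
            // Constant-first comparison avoids a NullPointerException when the action is null
            if (Intent.ACTION_BOOT_COMPLETED.equalsIgnoreCase(intent.getAction())) {
                Intent serviceIntent = new Intent(context, TTSService.class);
                context.startService(serviceIntent);
            }
        }
    }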

TTSService code:

    import android.app.Service;
    import android.content.Intent;
    import android.os.IBinder;
    import android.speech.tts.TextToSpeech;
    import android.support.annotation.Nullable; // androidx.annotation.Nullable on AndroidX projects
    import android.util.Log;

    public class TTSService extends Service {

        private static TextToSpeech voice = null;

        public static TextToSpeech getVoice() {
            return voice;
        }

        @Nullable
        @Override
        public IBinder onBind(Intent intent) {
            // not supporting binding
            return null;
        }

        public TTSService() {
        }

        @Override
        public int onStartCommand(Intent intent, int flags, int startId) {
            try {
                // onStartCommand can run more than once; only create the engine if none is held
                if (voice == null) {
                    Log.d("TTSService", "Text-to-speech object initializing");

                    voice = new TextToSpeech(TTSService.this, new TextToSpeech.OnInitListener() {
                        @Override
                        public void onInit(final int status) {
                            Log.d("TTSService", "Text-to-speech object initialization complete");
                        }
                    });
                }
            } catch (Exception e) {
                e.printStackTrace();
            }

            return Service.START_STICKY;
        }

        @Override
        public void onDestroy() {
            clearTtsEngine();
            super.onDestroy();
        }

        // Stops any ongoing speech and releases the engine binding
        public static void clearTtsEngine() {
            if (voice != null) {
                voice.stop();
                voice.shutdown();
                voice = null;
            }
        }
    }

Modified VoiceGenerator code:

    import android.content.Context;
    import android.speech.tts.TextToSpeech;
    import android.speech.tts.UtteranceProgressListener;
    import android.util.Log;

    import java.util.HashMap;
    import java.util.Locale;

    public class VoiceGenerator {

        private final Context context;

        public VoiceGenerator(Context context) {
            this.context = context;
        }

        public void voiceInit(final String text) {
            try {
                new Thread(new Runnable() {
                    @Override
                    public void run() {
                        // Fetch the engine from the service on every call; caching it in a
                        // field would block all calls after the first one
                        TextToSpeech voice = TTSService.getVoice();
                        if (voice == null)
                            return;

                        voice.setLanguage(Locale.US);
                        HashMap<String, String> p = new HashMap<String, String>();
                        p.put(TextToSpeech.Engine.KEY_PARAM_UTTERANCE_ID, "ThisUtterance");

                        // Register the listener before speaking so no callback is missed
                        voice.setOnUtteranceProgressListener(new UtteranceProgressListener() {
                            @Override
                            public void onStart(String utteranceId) {
                            }

                            @Override
                            public void onDone(String utteranceId) {
                            }

                            @Override
                            public void onError(String utteranceId) {
                            }
                        });

                        voice.speak(text, TextToSpeech.QUEUE_ADD, p);
                    }
                }).start();
            } catch (Exception e) {
                Log.d("ErrorLog", "Error occurred while voice play");
                e.printStackTrace();
            }
        }
    }

1 Answer

I'm the developer of the Android application utter! That isn't a shameless plug, it's to demonstrate that I use the design pattern you are considering and I've 'been through' what has prompted your question.

It's fresh in my mind, as I've spent the last year rewriting my code and had to give great consideration to the surrounding issue.

  • Can we somehow reduce or eliminate the initialization delay of Google TTS Engine, programmatically or otherwise?

I asked a similar question some time ago. Initialising the TextToSpeech object on a background thread, where it is not competing with other tasks, can reduce the delay slightly (as I see you are already doing in your posted code).

You can also make sure that the request to speak is not being delayed further by selecting an embedded voice, rather than one dependent on a network:

In API 21+ check out the options on the Voice class. Particularly getFeatures() where you can examine the latency and requirement for a network.

In API <21 - Set the KEY_FEATURE_NETWORK_SYNTHESIS to false inside your parameters.
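As a rough illustration of the API 21+ check, the selection logic could look like the sketch below. It is written so that it runs off-device: `VoiceInfo` and `EmbeddedVoicePicker` are hypothetical names, and the two fields stand in for what `Voice.isNetworkConnectionRequired()` and `Voice.getLatency()` report on a real device.

```java
import java.util.Collection;

// Hypothetical off-device stand-in for android.speech.tts.Voice.
final class VoiceInfo {
    final String name;
    final boolean networkRequired; // mirrors Voice.isNetworkConnectionRequired()
    final int latency;             // mirrors Voice.getLatency(); lower means faster

    VoiceInfo(String name, boolean networkRequired, int latency) {
        this.name = name;
        this.networkRequired = networkRequired;
        this.latency = latency;
    }
}

final class EmbeddedVoicePicker {
    private EmbeddedVoicePicker() {
    }

    /** Returns the lowest-latency voice that needs no network, or null if there is none. */
    static VoiceInfo pick(Collection<VoiceInfo> voices) {
        VoiceInfo best = null;
        for (VoiceInfo v : voices) {
            if (v.networkRequired) {
                continue; // skip voices that would wait on the network
            }
            if (best == null || v.latency < best.latency) {
                best = v;
            }
        }
        return best;
    }
}
```

On a device, the same loop would run over the set returned by `tts.getVoices()` and hand the winner to `tts.setVoice(...)`. On API < 21 the closest equivalent is putting `TextToSpeech.Engine.KEY_FEATURE_NETWORK_SYNTHESIS` with the value "false" into the params map passed to `speak()`.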

Regardless of the above, the Google TTS Engine has the longest initialisation time of any of the engines I've tested (all of them I think). I believe this is simply because they are using all available resources on the device to deliver the highest quality voice they can.

From my own personal testing, this delay depends directly on the hardware of the device. The more RAM it has and the faster its processor, the shorter the initialisation time. The same can be said for the current state of the device: I think you'll find that after a reboot, when there is free memory and Android does not need to kill other processes, the initialisation time is reduced.

In summary: other than the measures mentioned above, no, you cannot reduce the initialisation time.

  • Is there any way through which I can keep the TextToSpeech object alive and initialized at all times like say, through a service? Or would this be a bad, resource-consuming design?

  • Also is using a static TextToSpeech object the right way to go, for my requirements?

As you've noted, one way to avoid the initialisation time is to remain bound to the engine. But there are further problems that you may wish to consider before doing this.

If the device is in a state where it needs to free up resources, which is the same state that causes an extended initialisation delay, Android is well within its rights to reclaim this binding. If you hold this binding in a background service, the service can be killed, putting you back to square one.

Additionally, if you remain bound to the engine, your users will see the collective memory usage in the Android running application settings. For the many, many users who incorrectly consider (dormant) memory usage directly proportional to battery drain, from my experience, this will cause uninstalls and poor app ratings.

At the time of writing, Google TTS is bound to my app at a cost of 70 MB.

If you still want to proceed on this basis, you can attempt to get Android to prioritise your process and kill it last. You'd do this by using a Foreground Service. This opens another can of worms though, which I won't go into.

Effectively, binding to the engine in a service, and checking that the service is running when you want the engine to speak, is a 'singleton pattern'. Making the engine static within this service would serve no purpose that I can think of.

Finally, to share my experience as to how I've dealt with the above.

I have 'Google is slow to initialise' at the top of my 'known bugs' and 'FAQ' in the application.

I monitor the time it takes for the engine to call onInit. If it's taking too long, I raise a notification to the user and direct them to the FAQ, where they are gently advised to try another TTS engine.
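A minimal sketch of that kind of timing check follows. `InitTimer` is a hypothetical helper, not part of any Android API, and the threshold is an assumption you would tune yourself. Timestamps are passed in explicitly so the class can be exercised off-device.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: measures the gap between creating the engine and onInit.
final class InitTimer {
    private final long thresholdNanos;
    private long startNanos;

    InitTimer(long thresholdMillis) {
        this.thresholdNanos = TimeUnit.MILLISECONDS.toNanos(thresholdMillis);
    }

    /** Call immediately before constructing the TextToSpeech object. */
    void start(long nowNanos) {
        startNanos = nowNanos;
    }

    /** Call from onInit; returns true if initialisation took longer than the threshold. */
    boolean finishedTooSlowly(long nowNanos) {
        return nowNanos - startNanos > thresholdNanos;
    }
}
```

In an app, both timestamps would come from `System.nanoTime()`, and a `true` result from `finishedTooSlowly` would be the point at which to raise the notification.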

I run a background timer, that releases the engine after a period of inactivity. This amount of time is configurable by the user and comes with initialisation delay warnings...
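One way such an inactivity timer could be built is sketched below, using a plain `ScheduledExecutorService`. `IdleReleaser` is a hypothetical name, and the `release` callback is assumed to be whatever stops and shuts down the engine in your app (for example a call like `clearTtsEngine()` from the question).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: runs a release callback after a period with no activity.
final class IdleReleaser {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable release;
    private final long idleMillis;
    private ScheduledFuture<?> pending;

    IdleReleaser(long idleMillis, Runnable release) {
        this.idleMillis = idleMillis;
        this.release = release;
    }

    /** Call whenever the engine is used; cancels the old countdown and starts a new one. */
    synchronized void touch() {
        if (pending != null) {
            pending.cancel(false);
        }
        pending = scheduler.schedule(release, idleMillis, TimeUnit.MILLISECONDS);
    }

    /** Cancels any countdown and stops the timer thread; touch() must not be called afterwards. */
    synchronized void close() {
        if (pending != null) {
            pending.cancel(false);
        }
        scheduler.shutdownNow();
    }
}
```

The idle period would come from the user's setting, and `touch()` would be called each time an utterance finishes, so the engine is only released once it has genuinely been quiet for that long.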

I know the above doesn't solve your problems, but perhaps my suggestions will pacify your users, which is a distant second to solving the problem, but hey...

I've no doubt Google will gradually increase the initialisation performance - Four years ago, I was having this problem with IVONA, who eventually did a good job on their initialisation time.
