Google Cloud Messaging is Extremely Unreliable for Push Notifications

Dec 4, 2014

When Google Cloud Messaging first came out, I started using it immediately. Everything looked great: unlimited quotas let developers send as many push notifications as they needed, the service was completely free, and best of all, it was backed by Google. I implemented the code in my Android apps and server-side backend scripts, and I was good to go.

I published one of my GCM-dependent apps to Google Play. In the first few weeks, everything was fine. I had a few hundred downloads, and no complaints regarding push notifications. But after my app started onboarding more and more users, I started receiving numerous complaints that were related to GCM.

SERVICE_NOT_AVAILABLE

The first, and possibly most common, complaint was caused by a SERVICE_NOT_AVAILABLE exception thrown by GoogleCloudMessaging.register(), the call that registers the device for push notifications and returns a registration token: a unique token that Google assigns to the device and that is used to send it push notifications. After some research, I found numerous bug reports and cries for help from other Android developers, all asking how to fix this dreaded exception.

The following solutions were suggested:

  1. SERVICE_NOT_AVAILABLE might mean that the user's device can't read the response to the registration request, or that the server returned a 500/503 error code (source). Developers have no way to fix this error because it is on Google's end, so all we can do is blindly suggest that the user try again in a few hours.
  2. SERVICE_NOT_AVAILABLE may occur on some devices even though the registration succeeded (source). This can be fixed by implementing a workaround broadcast receiver to catch the token when the call fails. I implemented this workaround and it may have fixed the problem for some users, but I still received many other SERVICE_NOT_AVAILABLE complaints.
  3. SERVICE_NOT_AVAILABLE may occur because of an outdated or missing Google Play Services library on the device (source). In this case, the app could theoretically notify the user to update Google Play Services by opening the respective Google Play app listing. However, the app cannot know that this is the reason for the exception, so it cannot blindly redirect the user to the Google Play Services app page on Google Play, because SERVICE_NOT_AVAILABLE is thrown in several situations.
  4. SERVICE_NOT_AVAILABLE may occur when the device's clock is not synchronized with the network (source). Again, developers have no way of knowing that this is the exact problem, so all we can do is blindly suggest that the user check their system clock synchronization, hoping they are one of the very few whose clocks are not synchronized.
  5. SERVICE_NOT_AVAILABLE may occur when a rooted user has deleted the Hangouts/GTalk app from their device (because they considered it bloatware). GCM is implemented and handled by Hangouts/GTalk, so it is not possible to use GCM without it.
  6. SERVICE_NOT_AVAILABLE may occur if the user is running a device that does not have Google APIs installed (such as the Amazon Kindle). There is nothing to do here; these users will never receive push notifications from your app.
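
For the cases where the only advice is "try again later" (such as the first one above), the retrying can at least be done by the app instead of the user. Here is a minimal sketch of retrying with exponential backoff, written in plain Java so it runs outside of Android. The RegistrationRetry class, its method names, and the delay values are my own assumptions for illustration; the Callable is a hypothetical stand-in for the real GoogleCloudMessaging.register() call, which reports this failure as an IOException whose message is SERVICE_NOT_AVAILABLE.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RegistrationRetry {
    // GCM reports this failure as the message of an IOException.
    static final String SERVICE_NOT_AVAILABLE = "SERVICE_NOT_AVAILABLE";

    // Delay before the next attempt: baseMs doubled once per failed attempt, capped at maxMs.
    static long backoffDelayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }

    // Calls `register` until it returns a token or maxAttempts is exhausted.
    // Only SERVICE_NOT_AVAILABLE is retried; any other failure is rethrown immediately.
    static String registerWithRetry(Callable<String> register, int maxAttempts,
                                    long baseMs, long maxMs) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return register.call();
            } catch (IOException e) {
                if (!SERVICE_NOT_AVAILABLE.equals(e.getMessage())) {
                    throw e;
                }
                last = e;
                // Sleep between attempts, but not after the final one.
                if (attempt < maxAttempts - 1) {
                    Thread.sleep(backoffDelayMs(attempt, baseMs, maxMs));
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the real call: fails twice, then succeeds.
        int[] calls = {0};
        String token = registerWithRetry(() -> {
            if (calls[0]++ < 2) {
                throw new IOException(SERVICE_NOT_AVAILABLE);
            }
            return "example-token";
        }, 5, 100, 1000);
        System.out.println(token + " after " + calls[0] + " calls");
    }
}
```

In an actual app this would have to run off the main thread, and it does nothing for the other causes in the list: a device without Google APIs will fail every attempt no matter how long you wait.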

These issues alone were enough to get me to start looking for GCM alternatives. I'd get a 1-star review on my app every day or two, with a comment containing the error message displayed when a SERVICE_NOT_AVAILABLE was thrown. There was nothing I could do to help these users, because the majority of them were receiving it for reasons out of their (and my) control.

But this wasn't the end of my troubles with GCM. It turns out that the registration phase isn't the only problematic part of GCM. In some situations, users who were able to register for push notifications and are fully connected to the Internet may still not receive push notifications in real time, for one major reason.

Unrealistic GCM Heartbeat Interval

This is possibly the most frustrating bug in Google Cloud Messaging. GCM works by maintaining an idle socket connection from an Android device to Google's servers. This is great because it barely consumes battery power (unlike polling), and it allows the device to be woken up instantly when a message arrives.

To make sure that the connection remains active, Android sends a heartbeat every 28 minutes on a mobile connection and every 15 minutes on Wi-Fi (source). If the heartbeat fails, the connection has been terminated, and GCM will re-establish it and attempt to deliver any pending push notifications. The higher the heartbeat interval, the less battery is consumed and the fewer times the device has to be woken up from sleep.

However, this comes at a great price: the higher the heartbeat interval, the longer it takes to identify a broken socket connection.

Google did not test these intervals thoroughly enough in the real world before deploying GCM.

The problem with these intervals is caused by network routers and mobile carriers that disconnect idle socket connections after a few minutes of inactivity. This is most common with home routers, whose manufacturers decided on a maximum lifespan for idle socket connections (usually 5 - 10 minutes) and terminate them after that to save resources. In addition, in developing countries, cellular carriers aggressively terminate idle socket connections after several minutes of inactivity in an effort to reduce network load.

This results in passively-terminated GCM sockets, since these network components do not usually notify the device that the GCM connection was terminated. When the time comes to deliver a GCM message, it does not reach the device. The device will only realize that the connection has been broken when it's time to send a heartbeat, 0 - 28 minutes later, rendering the push notification useless in some situations (when the message is time-critical, for example).
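
To put rough numbers on this, here is a small sketch of the delay under a simplified model. It assumes the router silently drops the socket once it has been idle for a fixed timeout, that a heartbeat is the only traffic that resets that timer, and that reconnecting takes no time. The HeartbeatDelay class and its method names are my own for illustration; the 28 and 15 minute intervals are the ones quoted above, and the 5 minute timeout is the low end of the range mentioned for home routers.

```java
public class HeartbeatDelay {
    // Minutes a message waits until the next heartbeat exposes the dead socket.
    // messageAtMin is how long after the last heartbeat the message was sent.
    static double deliveryDelayMinutes(double heartbeatIntervalMin,
                                       double idleTimeoutMin,
                                       double messageAtMin) {
        if (messageAtMin < 0 || messageAtMin > heartbeatIntervalMin) {
            throw new IllegalArgumentException("messageAtMin must fall within one heartbeat interval");
        }
        // Heartbeats arrive before the router's timeout, so the socket never goes stale.
        if (idleTimeoutMin >= heartbeatIntervalMin) {
            return 0;
        }
        // The socket has not been idle long enough to be dropped yet.
        if (messageAtMin < idleTimeoutMin) {
            return 0;
        }
        // The socket is already dead; nothing happens until the next heartbeat.
        return heartbeatIntervalMin - messageAtMin;
    }

    // Longest possible wait: the message is sent just as the socket is dropped.
    static double worstCaseDelayMinutes(double heartbeatIntervalMin, double idleTimeoutMin) {
        return Math.max(0, heartbeatIntervalMin - idleTimeoutMin);
    }

    public static void main(String[] args) {
        double idleTimeout = 5; // assumed router timeout, in minutes
        System.out.println("Mobile worst case: " + worstCaseDelayMinutes(28, idleTimeout) + " min");
        System.out.println("Wi-Fi worst case: " + worstCaseDelayMinutes(15, idleTimeout) + " min");
        System.out.println("Mobile, message sent 6 min after a heartbeat: "
                + deliveryDelayMinutes(28, idleTimeout, 6) + " min");
    }
}
```

Under this model the only way to bring the delay to zero is a heartbeat interval shorter than the router's idle timeout, which is exactly what the stock intervals fail to provide.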

One Android developer went forth and developed an application called Push Notifications Fixer to try and fix the heartbeat interval bug. It works by scheduling a repeating task which sends a broadcast intent asking GCM to send a heartbeat, and it does this at a faster rate than the original heartbeat intervals, thereby keeping the socket connection alive and preventing network devices from terminating it for inactivity. But we can't start asking users to install another app to use our app. It's clunky and unprofessional. And this is just for those users who were able to get past GCM registration. What about the ones who can't even register because of SERVICE_NOT_AVAILABLE?

Popular Android Apps Don't Use GCM

I really wanted to know how the "big players" were able to overcome these issues with GCM. Well, they didn't. They worked around them by implementing their own push notification gateways. Facebook uses MQTT (source) and WhatsApp uses XMPP (source). These are similar to GCM, and work by maintaining a long-lived socket connection. I'm betting that the heartbeat intervals they use are much lower than GCM's, and that they can adjust them without requiring an Android OS update.

This just goes to show that there is a fundamental problem with GCM which makes it unsuitable for today's popular apps - it is not reliable or stable enough for such large deployments. Developers are forced to seek or develop their own alternatives. Google is reluctant to address these issues (source), and even if they did, it would most likely be impossible to deploy updates to existing devices to fix GCM.

An Idle Connection Per App

The fact that every popular app is implementing its own socket connection is terrible. It means that instead of having one socket connection to Google open at all times to receive push notifications for all apps, our phones now maintain 5 or more simultaneous idle socket connections, to WhatsApp, Facebook, GCM, Viber, and any other app that had to drop GCM because of its instability. This can greatly reduce the phone's battery life because of the constant wakeups these apps request every few minutes to send heartbeats and reestablish terminated socket connections.
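
A back-of-the-envelope sketch of what that costs in wakeups: each connection contributes 60 divided by its heartbeat interval (in minutes) heartbeats per hour. The WakeupEstimate class is my own, and the 5 minute interval used for the third-party apps is purely an assumption for illustration, since I don't know their real intervals. Only the 28 minute GCM mobile interval comes from the numbers quoted earlier.

```java
public class WakeupEstimate {
    // Total heartbeats per hour across connections, given each one's interval in minutes.
    static double heartbeatsPerHour(double... intervalsMin) {
        double total = 0;
        for (double interval : intervalsMin) {
            if (interval <= 0) {
                throw new IllegalArgumentException("interval must be positive");
            }
            total += 60.0 / interval;
        }
        return total;
    }

    public static void main(String[] args) {
        // A single shared GCM connection on a mobile network.
        System.out.println("GCM only: " + heartbeatsPerHour(28) + " heartbeats/hour");
        // GCM plus four apps with their own sockets, each assumed to use a 5 minute interval.
        System.out.println("GCM + 4 apps: " + heartbeatsPerHour(28, 5, 5, 5, 5) + " heartbeats/hour");
    }
}
```

This counts heartbeats only. Reconnecting after a dropped socket costs extra wakeups on top, and the system may batch some of them together, so treat the result as a rough comparison rather than a measurement.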

If push notifications are an important part of your application, you should find an alternative to GCM. GCM will not be reliable enough and may hinder your app's success. Hopefully, a few years from now, GCM will finally be stable enough and all apps will utilize it to save battery life. Until then, find an alternative or face your users' wrath.

Update: Check out Pushy - An Alternative to Google Cloud Messaging for a reliable, drop-in replacement for GCM.