Hi all,

It’s been a few weeks since I posted an update (more on why in a mo). I wanted to write this partly to give myself a little perspective, so we can improve our emergency processes in the future, but mostly to say sorry.

Even though we technically had no warning of the Gmail change, I still feel an immense tightening of the chest imagining everyone sat there wondering why ActiveInbox wasn’t loading, and the frustration that caused.

Perhaps even more so because, after 10 years, I still haven’t anticipated and safeguarded against every conceivable way that Gmail can break ActiveInbox. Hopefully this post-mortem will take us another step closer.

What Caused The Outage?

A small pleasantry in one of our minor features – the ability to show your name against your calendars when picking a due date – relied on a piece of data buried deep in Gmail.

Today, Gmail made a small update that broke that request for your name, which set off a series of escalating events that tripped up ActiveInbox and stopped it from loading.

How Was It Detected?

Around 10am, we started getting the first notifications that two people couldn’t load ActiveInbox. Past experience means we react very quickly to these kinds of issues, as they are often a “canary in the coal mine” sign that Gmail is changing, and that we have a limited window before many people are affected.

As a result, by 11am I was screensharing with Dale in the UK (one of our oldest customers), who very kindly let me run my diagnostic tools on his Gmail to find the problem.

How Quickly Was Everyone Informed?

While I was talking to Dale, Lisa tweeted that we were aware of the problem and began responding to everyone who emailed in. The Get Satisfaction post that Dale had started became our official channel around 2pm.

How Long Did The Fix Take?

As a team we stopped everything to tackle this. The actual fix took about 2 hours and was published to the Chrome Web Store as soon as we were done.

Frustratingly, though, in recent months Chrome has slowed the rollout of our updates from 30 minutes to 24 hours. This has taken the biggest toll on our responsiveness.

Is There A Workaround In The Meantime?

Joeri Cohen found that ActiveInbox still worked if you went back to the old Gmail (because it didn’t include the damaging Gmail change). That info was very kindly shared on the forum thread. Thank you, Joeri!

A more basic workaround was to find your tasks via Gmail labels, because that’s how ActiveInbox stores them (it tries to keep as much data as it can entirely within Gmail).

For example, for your Low Priority items, look for the label “!Low Priority”; for items due today, look for the label ZD/20180710 (10th July 2018).
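If you want to work out the due-date label for any given day, here is a minimal sketch in Python. It assumes the label is the prefix ZD/ followed by the date as YYYYMMDD, a pattern inferred only from the single example above rather than a documented format, and `due_date_label` is a hypothetical helper name.

```python
from datetime import date


def due_date_label(due: date) -> str:
    """Return the Gmail label name for tasks due on `due`.

    Assumption: labels follow the ZD/YYYYMMDD pattern inferred from the one
    example in this post (ZD/20180710 for 10th July 2018).
    """
    # date objects support strftime-style format specs inside f-strings
    return f"ZD/{due:%Y%m%d}"


print(due_date_label(date(2018, 7, 10)))  # prints ZD/20180710
```

For tasks due today, you would look for `due_date_label(date.today())`.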

How Could We Handle It Better In The Future?

We’ve had time to reflect on how to reduce the chance of this happening again. (As engineers we never say never, but we want an extremely high likelihood of everything running perfectly.)

In terms of raw development speed, I don’t think we could have actually fixed it any faster than we did, and I’m immensely grateful to Dale.

The bottleneck at present is distributing updates. Because that step is largely out of our hands, we’re going to try to minimise the causes of emergency releases in the first place:

  • We’re going to adopt a new technology that will make Gmail UI changes less likely to impact us (InboxSDK, for those wondering).
  • We’re going to refine our code review process so that we look for and isolate any piece of code that depends on Gmail data. That way, if Gmail changes, the breakage won’t bring down the entire app (as happened today).
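To make the second point concrete, here is a minimal sketch of that kind of isolation. It is written in Python purely for illustration (ActiveInbox itself is a Chrome extension) and is not ActiveInbox’s actual code; `isolated` and `read_user_name_from_gmail` are hypothetical names. The idea is that every lookup of Gmail data gets its own failure boundary and a sensible fallback, so a missing value degrades one small feature, such as showing your name against your calendars, instead of stopping the whole app from loading.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("gmail_isolation")


def isolated(fetch: Callable[[], T], fallback: T, feature: str) -> T:
    """Run one Gmail-dependent lookup so that a failure stays local to that feature.

    Any exception is logged and replaced by `fallback`, so the caller (and the
    rest of the app) keeps working even if Gmail's internals have changed.
    """
    try:
        return fetch()
    except Exception:
        log.exception("Gmail-dependent feature %r failed; using fallback", feature)
        return fallback


def read_user_name_from_gmail() -> str:
    # Hypothetical stand-in for reading a value buried deep in Gmail's page data.
    # It raises here to simulate that data disappearing after a Gmail update.
    raise KeyError("user_name")


# The calendar picker simply shows no name, rather than the app failing to load.
owner_name = isolated(read_user_name_from_gmail, fallback="", feature="calendar owner name")
print(repr(owner_name))  # prints ''
```

The trade-off is that failures become quieter, which is why the sketch logs every caught exception: the breakage still needs to be noticed and fixed, it just shouldn’t take everything else down with it.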

Anything Else?

You may also have noticed I’ve been a little quiet for the last 5 weeks. That’s because we’re still dealing with the aftershocks of the major Gmail change a few months ago, and I’ve had to go back to coding to help out the rest of the team.

The good news is that, as a result of that work, another major improvement to the ActiveInbox code is about to begin testing. It will include:

  • Faster loading, with a much more sophisticated cache system.
  • Greater robustness, including handling periods offline.
  • The return of the ability to add or update tasks and notes while you’re composing an email.
  • A more robust approach to diagnosing problems. We’ll be able to ‘replay’ issues much more easily, so that when things do go wrong (as they sometimes must), we can fix them much faster.

This post was written by AndyM

  • Braydan

    Can you guys give us a link to the unapproved Chrome Extension so we can update it manually in the meantime? I have my entire team in a fuss because their beloved ActiveInbox isn’t working.

  • Leslie Stompor

    Thanks for sharing all this info — very transparent of you! And I’ve learned something, too — how to get to my Tasks when ActiveInbox isn’t there!
    Carry on…

  • Benjamin Lilley

    Thanks for the note – and the transparency. You don’t realize how much you love it till you don’t have it!

  • Rob Payne

    I, too, thank you for the transparency. I rely heavily on ActiveInbox. It may help others to know: I was able to limp along yesterday by using the mobile app. Thanks for creating such a great product and for continuing to make it even better.

  • Thank you for sharing your experience. I think an even better solution for customer satisfaction would be a defined communication process. For example, a simple email alert with updates and workarounds when it’s down would have been really useful. Or you could create a status.activeinboxhq.com page with instructions on workarounds or a list of known issues, and have ActiveInbox direct you to that page if it can’t load.

  • Dave Plummer

    nice work!

  • Thanks for the update, and kudos to your team for the quick response yesterday as well as the follow-up email this morning. As mentioned, AI is an essential tool for us and greatly missed when it suddenly disappears!

  • Thanks for having it back as quick as you could and for all the work.

  • Thanks very much for the transparency and the little peek into your processes! You have a fan in me! 🙂

  • John Bowen

    This type of open communication is what I have always loved about Andy and his team. Thanks!

  • Deirdre

    I appreciate how quickly you guys fix any (very rare) errors. So much so that when I saw the notice, I just shrugged, because I knew it would be fixed soon and that in the meantime I could find my tasks in folders. OK, it’s not so crucial for me because I’m not as heavy a user as I used to be, but even if I had been, I’m sure I could have coped. Keep doing what you’re doing!

  • Kudos to you for sharing this with your users. As a customer experience professional, this is the kind of apology I recommend to clients: a sincere apology, an explanation, and how you will deal with it going forward. Keep up the good (and often frustrating) work!

  • snake

    Doesn’t Google give vendors like you advance access to updates and new releases so you can test for issues?