Google services experienced an outage on Monday, Dec 14th, and Gmail experienced further disruption on Dec 15th and 16th. This massive outage prompted reactions from IT professionals around the world. From the perspective of mailops admins, however, it was nothing new, as they have to deal with deliverability interruptions very often. The blog post below, originally published on the Lightmeter website under the title “Googledown happens every day for mailops admins”, describes how the internet reacted when one of the most commonly used tools became inaccessible. Lightmeter brings peace and transparency to mailops management in a portable Open Source package. Real-time monitoring of mailserver problems produces early warnings and recommendations, reducing response time and disruption.
Warning: memes ahead
So on Monday Google services went down, and much productivity was disrupted.
As well as YouTube, Google Search, Ads, Stadia, and Docs, Gmail (the world’s most popular email host) went down too. Unlike the other services, however, Gmail remained disrupted days later, with millions of sent messages having been permanently lost. Hacker News took note.
Email is resilient by design, with baked-in capabilities for handling networks and mailservers which are overloaded or unavailable. Email is the world’s largest peer-to-peer network, so mailservers have to be fault-tolerant, and Gmail’s problems would normally have resulted only in temporary delays to mail delivery, lasting until Gmail came back online.
Instead, Gmail has gone haywire, sending out false signals rather than sending nothing at all.
If you messaged your own Gmail account every five minutes from a non-Gmail address like Protonmail, you would have seen confusing results. Google has been alternating between accepting deliveries and permanently refusing mail sent to Gmail, with hard bounce messages like:
550-5.1.1 The email account that you tried to reach does not exist
These hard bounce messages tell other mailservers that messages sent to this address can never arrive, and that the message should therefore be dropped because all future delivery attempts will be futile. They also tell bulk mailing systems, such as those sending app registration notifications and newsletters, that the recipient address is invalid and should be permanently removed from mailing lists.
The effect is chaotic and long lasting:
- All mail sent to affected Gmail accounts has been unnecessarily lost, and Gmail has generated millions of false and unnecessary bounce messages
- All automatically blocked email addresses on mail automation platforms like SendGrid and phpList will, by default, never receive messages from those senders again
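To make the mechanics concrete, here is a minimal sketch of how a sending system might read a reply like the one quoted above. The first digit of the three-digit SMTP reply code distinguishes success (2), temporary failure (4) and permanent failure (5). The names `parse_reply`, `classify` and `handle_bounce` are hypothetical, and the "suppress on any permanent failure" policy is an assumption for illustration, not the documented behaviour of any particular platform.

```python
import re
from typing import NamedTuple, Optional, Set


class SmtpReply(NamedTuple):
    code: int                 # three-digit reply code, e.g. 550
    enhanced: Optional[str]   # enhanced status code, e.g. "5.1.1", if present
    text: str                 # human-readable explanation


# Matches "550-5.1.1 text" or "550 5.1.1 text". A hyphen after the code marks
# a continuation line of a multi-line reply; a space marks the final line.
_REPLY_RE = re.compile(r"^(\d{3})[ -](?:([245]\.\d{1,3}\.\d{1,3})\s+)?(.*)$")


def parse_reply(line: str) -> SmtpReply:
    """Split one SMTP reply line into code, enhanced status code and text."""
    match = _REPLY_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not an SMTP reply line: {line!r}")
    code, enhanced, text = match.groups()
    return SmtpReply(int(code), enhanced, text)


def classify(reply: SmtpReply) -> str:
    """Classify a reply by the first digit of its reply code."""
    first_digit = reply.code // 100
    if first_digit == 2:
        return "accepted"
    if first_digit == 4:
        return "temporary"   # sender should keep the message and retry later
    if first_digit == 5:
        return "permanent"   # sender should give up: a hard bounce
    return "other"


def handle_bounce(suppressed: Set[str], address: str, reply: SmtpReply) -> None:
    """Hypothetical list-manager policy: suppress on any permanent failure."""
    if classify(reply) == "permanent":
        suppressed.add(address)


reply = parse_reply(
    "550-5.1.1 The email account that you tried to reach does not exist"
)
suppressed: Set[str] = set()
handle_bounce(suppressed, "someone@gmail.com", reply)
print(classify(reply), reply.enhanced, sorted(suppressed))
```

Under a policy like this, a single false 550 is enough to take a perfectly valid address off a list for good, which is why the erroneous bounces outlast the outage itself.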
The curious thing about this major but no doubt unintentional SNAFU on Google’s part is that it stands out in the context of a decade of deliberately false messages of a very similar type.
Federico Leva was the first of several Lightmeter users who messaged us on Monday and Tuesday to report that Google were up to their old tricks on a grand scale. The difference this week was that the behaviour was unintentional.
Gmail is known to habitually misuse technical messages between mail servers, inaccurately reporting the fate of messages which Gmail receives.
SMTP’s most famous status response code, “250 OK”, is designed to communicate successful delivery: no worries, message received successfully. Not with Google. Gmail can report success but actually delete the message before it reaches the user, or hide it from view in Junk. This status is sometimes called “dead air”.
Other times Gmail honestly reports that a message delivery attempt failed, but confuses other mailservers with inaccurate explanations of that outcome. Much like in the case of this week’s erroneous 550-5.1.1 errors, Gmail intentionally responds with error codes claiming temporary technical faults, instead of the reality: the message has been permanently rejected.
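To see why a false "temporary" error is costly for the sender, consider how a sending mailserver typically treats one: it keeps the message queued and retries at growing intervals until a queue lifetime expires. The sketch below is illustrative only. The function name `retry_schedule` and the values used (a 5-minute first delay, doubling up to a 4-hour cap, and a 5-day queue lifetime) are assumptions, and real servers use whatever their administrators have configured.

```python
from datetime import timedelta
from typing import List


def retry_schedule(
    initial: timedelta = timedelta(minutes=5),   # assumed first retry delay
    cap: timedelta = timedelta(hours=4),         # assumed maximum delay
    lifetime: timedelta = timedelta(days=5),     # assumed queue lifetime
) -> List[timedelta]:
    """Return the elapsed time of each retry made before the queue gives up."""
    attempts: List[timedelta] = []
    elapsed = timedelta(0)
    delay = initial
    # Keep retrying while the next attempt still falls inside the lifetime.
    while elapsed + delay <= lifetime:
        elapsed += delay
        attempts.append(elapsed)
        delay = min(delay * 2, cap)  # exponential backoff, capped
    return attempts


schedule = retry_schedule()
print(f"{len(schedule)} retries, the last one {schedule[-1]} after the first attempt")
```

Every one of those retries is a connection, a queue slot and a log entry spent on a message that the receiving side had already decided never to accept.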
Such behaviour is presumed to be part of a strategy to waste the resources of third-party mailservers from whom Google does not wish to receive mail. The effect is to decrease the reliability and efficiency of email as a whole, and to prioritise email from Gmail users above the millions of independent mail operators which constitute the world’s largest communication network. Worse, other large operators could follow Google’s example, and indeed some, like Microsoft, already have. This trend risks a breakdown of the distributed network as a whole, as it gets harder and harder for independent mail networks to interact with dominant ones who are exploiting their power.
Lightmeter’s mission is to strengthen the foundations of digital society by making mailtech easy and convenient, and keeping communications free. That’s why we have built Lightmeter Control Center to detect problems like Gmail’s errors this week, and are working towards automating solutions for such problems, thereby lowering barriers for indie networks, and growing the diversity and robustness of email overall.
With the problem resolved, Gmail users will be back in their inboxes, happy that the disruption is over. But the mailops administrators who deal with deliberately false Gmail message reporting are back where they were before the events and the memes of #GoogleDown: struggling to interoperate with the largest host in the network, which is committed to using its power to break the network’s rules.
Author: Sam Tuke – Founder & CEO