What does it take to keep gaming's most popular online service not just alive but thriving? An inhuman amount of work and a staggering level of planning. We make an exclusive pilgrimage to the never-before-photographed home of online gaming's guardian angels: the Xbox Live Operations Centre.
If you imagine Xbox Live as a living, breathing organism, the Xbox Live Operations Centre is the nervous system, the first line of defence in maintaining an immense global network that serves millions of users daily. Every technical hiccup, no matter how isolated or insignificant, is instantly identified and attacked with unflinching efficiency by a team of first responders, with the full might of the Xbox team just an email away.
Yet, the command centre itself - abbreviated as the XOC - projects a modesty that belies its grand purpose. Behind an unmarked door in an anonymous building on Microsoft's sprawling Redmond campus, a single windowless room houses the entire operation. The floor is terraced, affording every shift member a clear view of the various screens mounted at the front of the room, just in case the three monitors that populate each workstation prove somehow insufficient. The room is manned 24 hours a day, 365 days a year, and analogue clocks display current times in various parts of the world since, hey, it's always prime time somewhere.
And looming ominously above it all is a massive red digital timer. "Any time there's a major incident going on that's either user-impacting or we think it might be user-impacting, we start that clock," warns Microsoft Gaming Ninja (seriously, that's actually his title) Eric Neustadter. "We use that to make sure we're doing the right things at each point in our process. We have playbooks for every kind of major incident, which tell us what to do at each point in the process. There's actually someone in the room during an incident whose job is incident commander. They don't troubleshoot. Their job is to watch the clock and to follow the process and make sure we don't skip anything."
As if on cue, the clock suddenly starts ticking. Neustadter laughs; apparently this isn't a rare occurrence. "What started that clock could have been one of our monitoring systems alerting - the equivalent of a big red light," he explains. "We've got alerts on essentially anything a customer could do where it's going to tell us if it runs into an error. So when a console tries to do something and fails, it lets us know instantly."
Of course, given the breadth of the XOC's monitoring duties - which include not only the public face of Xbox Live but behind-the-scenes developer systems as well - there are a variety of other possibilities. "It could have been a report from @XboxSupport or someone else on the Xbox team saying that, 'Hey, something happened. Please go look into it,'" Neustadter continues. "It also could be an external event. It could be one of our partners' data centres just had a complete outage and now users can't play game 'blank.' But our job is to find out, 'Is it ours? Is it external?' Manage the communications, make sure we tell xbox.com, customer support, all those different customer touchpoints what's going on, and make sure that all the right people are engaged."
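For readers who think in code, the incident clock and playbook Neustadter describes can be pictured roughly as follows. This is only an illustrative sketch, not a description of Microsoft's actual tooling: the names `Step`, `IncidentClock` and `PLAYBOOK` are made up for the example, and the minute marks are invented, since the article gives no real timings. The three actions are paraphrased from the duties he lists.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    due_minute: int  # minute on the incident clock by which the step is due (invented values)
    action: str


@dataclass
class IncidentClock:
    """What an incident commander tracks: the clock and the playbook, not the fix."""

    playbook: list[Step]
    done: set[str] = field(default_factory=set)

    def complete(self, action: str) -> None:
        # Refuse anything that is not in the playbook, so typos cannot hide a skipped step.
        if action not in {step.action for step in self.playbook}:
            raise ValueError(f"not a playbook step: {action!r}")
        self.done.add(action)

    def overdue(self, elapsed_minutes: float) -> list[str]:
        # Steps whose deadline has passed on the clock and that nobody has completed yet.
        return [
            step.action
            for step in self.playbook
            if step.due_minute <= elapsed_minutes and step.action not in self.done
        ]


# Hypothetical playbook: the timings are assumptions made for this sketch.
PLAYBOOK = [
    Step(5, "classify: ours or external"),
    Step(10, "notify xbox.com and customer support"),
    Step(15, "engage the right people"),
]
```

In this sketch the elapsed time is passed in rather than read from a real clock, so the same check works for a live incident or a rehearsal: twelve minutes in, with only the classification done, `overdue(12)` would flag the customer-facing notification as the step being skipped.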
While the team spends plenty of time working to resolve known issues, just as often, they have to go looking for trouble. "That's what the majority of these people are doing all the time," says Neustadter. "They're watching for any interruptions in the service. Any time we see anything unusual, we have to chase it down until we understand it, whether it's an actual problem or not, whether it's our problem or someone else's. Until you understand it, you can't say that nothing's wrong. You can guess, but you'll be wrong often enough that we just chase them all down."