
Roblox is one of the most popular games in the world, and it was offline for more than 60 hours over the Halloween weekend as part of a massive outage.

Roblox founder and CEO David Baszucki has now published a blog post that explains what went wrong. Issues with Roblox started to appear on the afternoon of Thursday, October 28, when players reported problems logging in. Baszucki said it was at this time that the company's teams began "working around the clock" to find the source of the issue and get the game back up and running. But this was not easy, Baszucki said.

In short, the server issues stemmed from a "growth in the number of servers" in Roblox's datacenters, together with a "subtle bug" in the game's backend. "This was an especially difficult outage in that it involved a combination of several factors. A core system in our infrastructure became overwhelmed, prompted by a subtle bug in our backend service communications while under heavy load," Baszucki said. "This was not due to any peak in external traffic or any particular experience. Rather the failure was caused by the growth in the number of servers in our datacenters. The result was that most services at Roblox were unable to effectively communicate and deploy." The reason it took the team multiple days to get Roblox back up and running came down to the "difficulty in diagnosing the actual bug," Baszucki said. "Recovery took longer than any of us would have liked. Upon successfully identifying this root cause, we were able to resolve the issue through performance tuning, re-configuration, and scaling back of some load."
