Tonight's red post collection features Sonicdeathmonk, Riot's VP of Network Operations, with a statement on the recent NA and EU server issues, Riot tmx with context on why the EU platform needed to be restarted earlier today, Riot Gradius continuing to discuss DDoS attacks, and more.
NA and EU stability update
Sonicdeathmonk, Riot's VP of Network Operations, has posted a statement regarding the recent server stability issues plaguing both NA and EU.
"We are sorry for the ongoing server issues in many parts of the world including very recent North America and Europe problems. Lots of work has been done to help improve things and more work is still being done.
We have seen significant increases in hackers running distributed denial of service attacks (DDoS) against game and platform servers. League of Legends has not been the only target; most large game companies have been feeling the pain along with other online businesses, but we have been a more frequent target than most. Of all the DDoS attacks we have seen against LoL in the last 6 months, over half came in the last 2 months. We are currently seeing 2 to 4 attacks every day against one or more regions. The strength of these attacks has increased significantly over the last few months as the arms race against hackers continues to grow in scale.
League of Legends is made up of dozens of components, and some of these are easier to protect against DDoS than others. Typically websites are easiest to protect, while game servers are hardest (most standard forms of protection would create lag in a game like League of Legends).
In the last 2 weeks, world-record-breaking DDoS attacks have been seen by several companies, including Riot. These attacks are so big that the ISPs and telecoms that service League of Legends have not been able to keep their Internet circuits up while the attack is underway. This means that our servers often do not even see or feel the malicious traffic itself, because the Internet connections feeding our data centers are being cut off as the attacks fill the very large Internet circuits the telecoms and ISPs have.
When our data centers lose the Internet, in the worst case your connections to game servers and/or PVP.net services like Chat and Login get disconnected; otherwise, lots of players will see serious in-game lag for the duration of the attack. If lots of players get disconnected, you will often see a login queue to get back into PVP.net or the game.
We expect the situation with the current attacks to improve soon. We have already been able to fend off many attacks before they created problems for you. Our network team has been working almost non-stop for several weeks, both fighting off DDoS attacks around the world and strengthening our Internet services. We have also been working with telecoms and ISPs to help them understand how to block these attacks so that their services are not affected. Some fixes have required new equipment and ISP circuits to be ordered, which can take weeks to ship and install. Some telecoms and ISPs are slower than others in applying the fixes needed to ensure they do not fall over from these attacks. We depend on many ISPs, so it's important that all of them make the fix; otherwise the bad traffic finds a way to knock our servers offline.
Of course, this is an arms race: as one defense gets put in place, we can eventually expect new attack methods that will often take time to defend against. About half of the equipment investment in League of Legends is network gear, with a large part of that in place just to handle the ongoing rise in the frequency and size of DDoS attacks. We are investing even more every day around the world and will continue to work hard to win the fight. We are also working closely with our equipment providers and other services to make sure we are doing everything we can to protect the many components that make up League of Legends.
Sometimes we lose the fight, and that sucks because it means players like you feel the pain too. It also means that instead of building cool stuff for players, our engineers have to fight off bad stuff from hackers, so players lose again. It's frustrating, but it's the reality of large-scale Internet services, so we have to get smarter and better to make sure players around the world can have fun whenever they want to.
I apologize for the pain and promise that we will continue to get better. New data centers with more and stronger Internet connections are close to going live for European and North American players, with all other regions scheduled for complete overhauls as 2014 goes on. These new data centers will help us keep up in the hacker arms race. They won't solve every problem we deal with, but they should level up our side of the fight a great deal.
In the meantime, we are not helpless: teams are still working long hours, and you can expect daily improvements.
See you on the Fields of Justice,
VP of Network Operations
P.S. For the more technically curious, here is some background reading on the current attacks:
When asked about an ETA on improvements to NA servers, he commented:
"Work has been underway on NA for many months. We need to get EUW on our new architecture and make sure we have it baked into our automation. Then on to NA and EUN. Should start to see better network to game servers real soon. Also we are working on improving ping time for NA East Coast players. Cool things coming."
Feb 23 EU Emergency Platform Restart
To build on the above, here is Riot tmx with more info on what happened to the EU platform today:
"We had a serious platform outage today and I want to give you a quick summary of what really happened. Before I start, I want to clearly admit that this was our own platform failure, the first incident of this nature since December last year. So this time we won’t be talking about our providers but about our own infrastructure, which failed for the first time in 60 days.
At approximately 15:21 GMT, our teams received alerts about games not starting. We quickly escalated to the System Administrators, Platform and Network engineers on call, who joined the investigation within the next few minutes. They quickly spotted platform errors on various services and an inability to process internal calls, which left EUW unable to create new games. Additionally, some players were dropped from the platform and were unable to log back in, as there were also issues with queues. Those of you already in games were mostly able to complete them; however, the game servers were no longer connected to the platform at this point, so results may have been lost or not viewable after the games ended.
After 30 minutes, we realized that the platform was in bad health and needed a full restart. However, we spent some more time on network investigation to confirm whether we were stable there. The network is a key aspect to clarify before a restart, as we have to be certain that the issue won’t immediately strike us again. After an additional 20 minutes, it was clear that all connections and virtual networks were up and running. We proceeded with a platform restart, which consisted of two separate steps: shutting down (45 mins) and restarting (30 mins). Both went smoothly, and we then ran a full game QA loop to make sure that all services were properly brought up.
Overall, EUW was unplayable for 3 hours today. Our next step is to further investigate the root cause of the incident and take additional steps to make sure it doesn’t happen again. I apologize for the weekend incident and hope that you were not massively affected.
I personally also want to thank our Community and NOC teams which led the forums and Twitter messaging in all languages during this outage. It was timely, quick and visible and I think we’re making progress here."
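tmx's timeline can be cross-checked with a little time arithmetic. This Python sketch only adds up the phase durations stated in the post. The date (Feb 23, 2014, inferred from the heading and the post's mention of 2014) is a placeholder that doesn't affect the result, and counting the "3 hours unplayable" from the 15:21 GMT alert is an assumption, so the QA time it derives is an estimate, not a reported figure.

```python
from datetime import datetime, timedelta

# Phase durations (minutes) as stated in the post. The QA loop duration
# was not given; it is estimated below under an explicit assumption.
PHASES = [
    ("diagnosis: platform needs a full restart", 30),
    ("network verification", 20),
    ("platform shutdown", 45),
    ("platform restart", 30),
]


def timeline(start: datetime, phases):
    """Return (phase, end_time) pairs by accumulating phase durations."""
    t = start
    out = []
    for name, minutes in phases:
        t += timedelta(minutes=minutes)
        out.append((name, t))
    return out


# 15:21 GMT comes from the post; the calendar date is only a placeholder.
start = datetime(2014, 2, 23, 15, 21)
steps = timeline(start, PHASES)
for name, end in steps:
    print(f"{end:%H:%M} GMT  end of {name}")

stated_minutes = sum(m for _, m in PHASES)
# Assumption: the 3-hour outage is counted from the 15:21 alert.
implied_qa_minutes = 3 * 60 - stated_minutes
print(stated_minutes, implied_qa_minutes)
```

The stated phases account for 125 minutes (restart finished around 17:26 GMT), which under that assumption leaves roughly 55 minutes for the QA loop before EUW was playable again.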
More Context on DDoS attacks
Following up on his comments from the other day, Riot Gradius returned to the forums to elaborate on why these DDoS attacks have been so hard to combat:
"I can't speak for the stability/providers of other companies, but I can try and shed some light on why these particular DDoSes are challenging to protect against.
Prior to NTP/DNS DRDoSes (http://en.wikipedia.org/wiki/Distrib...Spoofed_attack), it was extremely hard to build up this kind of traffic flow for a DDoS attack. You had to have a fairly formidable botnet, or a large group of followers willing to spin up a client-based DoSer (LOIC, HOIC, etc.). This is why groups like Anonymous and 4chan were so successful: they usually had enough users to create attacks that were larger than most botnet-based DoSes.
With NTP reflection DDoSes, it's MUCH easier to create huge volumes of garbage traffic, enough to take down ISPs and large CDN/DoS protection services like Cloudflare (http://www.informationweek.com/secur...d/d-id/1113787). The only real work you have to do is collect the IP addresses of enough vulnerable NTP servers."
He continued, replying to someone suggesting the solution was as simple as "blocking all 123 traffic and allowing only specific servers":
"Herein lies the problem. If the volume of traffic that the attacker is sending in is greater than the entire bandwidth of the provider, nothing can come in or out of that network. Blocking port 123 is a great approach, but it's not something a high-tier provider can do across the board (at least, I have a feeling they'd have trouble justifying it). Many people and companies use NTP. If high-tier providers blocked it without first evaluating the effects on ALL their customers, it could cause serious problems. I wouldn't doubt that some very important things use NTP (read: hospitals, emergency response systems, etc.).
This is why we're taking multiple approaches to the situation. We're working with our upstream providers to get proper access controls on THEIR equipment, so that it doesn't take down the whole network below them. We're also working with different groups in the security industry to share knowledge about the core of the problem and how to deal with the broken NTP servers."
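Gradius's point about volume is mostly arithmetic, so here is a back-of-the-envelope sketch in Python. Every number in it (the attacker's uplink, the amplification factor, the link capacity, the legitimate load) is a made-up assumption for illustration, not a figure from Riot or its providers, and the drop model (an oversubscribed link sheds traffic proportionally) is a deliberate simplification. It shows why a reflected attack can swamp a circuit, and why dropping port 123 traffic only helps if it happens upstream of the saturated link.

```python
# Back-of-the-envelope model of an NTP reflection attack against one circuit.
# All inputs are hypothetical; real amplification factors vary by reflector.


def reflected_gbps(attacker_uplink_gbps: float, amplification: float) -> float:
    """Traffic hitting the victim when spoofed NTP requests are answered by
    reflectors whose replies are `amplification` times larger than the requests."""
    return attacker_uplink_gbps * amplification


def delivered_legit_gbps(attack_gbps: float, legit_gbps: float,
                         link_capacity_gbps: float, filtered_upstream: bool) -> float:
    """Legitimate traffic that makes it across a link of the given capacity.

    If the provider drops the reflected traffic *before* the bottleneck
    (e.g. by filtering UDP source port 123), only legitimate traffic competes
    for the link. Otherwise an oversubscribed link is assumed to drop packets
    proportionally, regardless of whose they are.
    """
    if filtered_upstream:
        attack_gbps = 0.0
    total = attack_gbps + legit_gbps
    if total <= link_capacity_gbps:
        return legit_gbps
    return legit_gbps * link_capacity_gbps / total


attack = reflected_gbps(attacker_uplink_gbps=2.0, amplification=200.0)
print(attack)                                                       # 400.0
print(round(delivered_legit_gbps(attack, 10.0, 100.0, False), 2))   # 2.44
print(delivered_legit_gbps(attack, 10.0, 100.0, True))              # 10.0
```

Filtering at the victim's own edge corresponds to `filtered_upstream=False` in this model: the garbage packets are discarded only after they have already crowded everything else off the circuit, which is why the approach described above targets the providers' equipment and the vulnerable NTP servers themselves.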
When asked how his week has been as a security guy on the receiving end of a record-breaking attack, Riot Gradius commented:
"Well, it's been quite a week to say the least. We've actually had a lot of great discussion at Riot as well. A bunch of different teams pretty much holed up in a war room all last week to keep a close watch on the situation as well as keep cracking away at different solutions. Meetings on top of meetings as well as lots of time spent on networking equipment command lines.
There is one thing I would REALLY love to stress, though. All last week, one of the guys in the war room kept reminding us:
'Turn on LCS or a stream. (We have TVs in some of the meeting rooms.) We might not be able to make the time to play LoL, but we can't forget why we're doing what we're doing. We love this game, and we love our players who love this game. They're the reason we're here in this room together all day.'"
As Sonicdeathmonk mentioned above, Riot Gradius also fielded a question about ping for East Coast players:
"We're actually working on solutions to the latency issues for players that are farther away from our server datacenters as we speak. It's a lot of the same work we're doing for the Europe environments, where we're working with multiple providers to allow us better control of where our traffic goes, which then allows us to find the quickest path from your network to our servers :) (very similar to Riot Direct, found here: http://forums.euw.leagueoflegends.co...light=riot+tmx)"
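Riot hasn't described how its provider selection works, so the following Python snippet is only a toy illustration of the idea Gradius describes: with several upstream providers available, measure latency over each and prefer the fastest. The provider names and RTT samples are hypothetical, and in practice this kind of traffic steering is done at the routing layer rather than in application code.

```python
from statistics import median


def best_provider(rtt_samples_ms: dict[str, list[float]]) -> str:
    """Pick the upstream provider with the lowest median round-trip time.

    Median rather than mean, so a single latency spike on one path
    doesn't dominate the comparison.
    """
    return min(rtt_samples_ms, key=lambda p: median(rtt_samples_ms[p]))


# Hypothetical RTT samples (ms) from one player network to the datacenter.
samples = {
    "provider_a": [82.0, 85.0, 240.0],  # usually fine, one big spike
    "provider_b": [70.0, 71.0, 73.0],   # consistently fast
    "provider_c": [65.0, 90.0, 95.0],   # best single sample, worse overall
}
print(best_provider(samples))  # provider_b
```

Choosing on the median rather than the single best sample is why `provider_c` loses here despite its 65 ms reading.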
Champion Sound Videos
When asked if it was possible to get a video showing how Riot created the sounds for Mad Scientist Ziggs, Riot Eno commented:
"We don't have any vids of recording Mad Scientist Ziggs' sounds, but I'll ask BelligerentSwan if he can give you some insight into how those sounds were created.
Also, we really want to start posting more vids of us recording sounds for new Champs, etc.; it's good to know you guys are interested in that stuff."
It's been a while, but an example of this would be the video Riot Eno shared of the team creating sound effects for Tibbers!