Red Post Collection: Lyte & WookieCookie talk chat restrictions, Phreak on 4.9 Jungle changes, and more!

Posted on at 6:58 AM by Moobeat
[ ~10 PM PST Update: Added in more on chat restrictions, player behavior, and Nidalee! ] 

This morning's red post collection is short and sweet, featuring Lyte and WookieCookie discussing the recent chat restrictions and other player behavior topics. You'll also find Phreak chatting about the jungle changes mentioned in the 4.9 live gameplay patch forecast and RiotRepertoir with more to say on the upcoming Nidalee gameplay update. 
Continue reading for more information!

[ Continued ]  Player Behavior and Chat Restrictions discussion

Following up on his discussion from a few days ago, Lyte returned to the forums to talk more about the recent chat restrictions, the Tribunal, and player behavior. 

When criticized about their methods of research, Lyte replied: 
"You know, it's a bit unfair to call out researchers at Riot in this fashion. Most players haven't worked with data of this scale, nor the problems that come up when dealing with online player behavior; in fact, most scientists haven't worked in this space before either.
For example, you talk about sample sizes, p-values and hypotheses. You really think revealing those values are the problem and if we did, we'd convince players that we conduct strong, robust science here? Do you understand why calculating p-values might not be the best approach when your sample sizes are over 1 trillion? You get a "significant" effect with almost every pair-wise comparison. That's meaningless. So how do you analyze an experiment with millions of interactions? What kinds of multiple-comparison techniques are appropriate? Do we need to develop new ones because this scale of data has never been analyzed before? Are effect sizes of 1% meaningful when you have 1 billion samples of data? Maybe, maybe not. If you knew that 1% of games were worse because of a feature or experiment, and look at the millions of games played every hour... that's a pretty big practical impact isn't it? These are tough questions, and there are rarely easy or straight-forward answers. The researchers at Riot are some of the best in their respective fields whether it's neuroscience, aeronautics, bioinformatics or economics. They've published papers in Nature and the top journals in their fields. They've worked with former Nobel Prize laureates and their proteges. We still spend some time reaching out to top institutions like Harvard, MIT, Stanford, York, USC and more to collaborate on studies, and review papers for journals.
We ask researchers at Riot to have even higher standards than academia. Why? Because we value players and their experiences. In academics, some scientists are OK with 95% confidence intervals or 99% confidence intervals--do you laugh at their suggestion that they are 99% confident about a result? Yet here, whether we choose 95% or 99% confidence intervals is a big deal. One mistake here could impact a player forever if they lose Ranked Rewards or lose their accounts and we treat that responsibility extremely seriously. There was one time where an analysis could have provided a negative experience for 6 players. Just 6. However, 6 is 6 too much and we went back into the analysis to clean it up. Do we make mistakes? Of course. Mistakes are always going to happen. But, to give researchers at Riot **** for not being "scientists" and not strongly adhering to scientific standards is bull****
Is the problem really the science? Or, is it that some players are angry they got a punishment, and maybe deserved it? A negative player can disagree with what we believe is "OK" in League of Legends, but it's never been just Riot's opinion of what's OK or not OK--it's been the community's decision and we back the community 100%. A negative player can disagree with the community's subjective perspective on what's OK or not OK, but that doesn't make the science bad. If the problem is actually the science, let's talk about it."
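
Lyte's point that huge samples make almost any comparison "significant" can be sketched with a pooled two-proportion z-test. This is only an illustration under assumed numbers: the 50.0% and 50.1% rates and the sample sizes below are hypothetical, not Riot data, and the function names are made up for this example. The difference between the groups is held fixed, so the effect size (Cohen's h) never changes while the p-value collapses as n grows.

```python
import math


def two_proportion_p_value(p_a: float, p_b: float, n: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test, n samples per group."""
    pooled = (p_a + p_b) / 2.0
    std_err = math.sqrt(2.0 * pooled * (1.0 - pooled) / n)
    z = abs(p_a - p_b) / std_err
    # For a standard normal Z, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


def cohens_h(p_a: float, p_b: float) -> float:
    """Cohen's h effect size for two proportions; it does not depend on n."""
    return abs(2.0 * math.asin(math.sqrt(p_a)) - 2.0 * math.asin(math.sqrt(p_b)))


# Hypothetical rates for two groups of games (assumption, not real data).
RATE_A, RATE_B = 0.500, 0.501

for n in (10_000, 1_000_000, 100_000_000):
    p_value = two_proportion_p_value(RATE_A, RATE_B, n)
    effect = cohens_h(RATE_A, RATE_B)
    print(f"n per group = {n:>11,}  p-value = {p_value:.3g}  Cohen's h = {effect:.4f}")
```

The same tiny difference goes from clearly non-significant at ten thousand samples per group to overwhelmingly significant at a hundred million, which is why the quote above leans on effect sizes and practical impact rather than the p-value alone.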
He continued to drop the science bomb, saying:
"We have to be very careful using standard approaches to statistics when dealing with data of this scale. Are you familiar with the latest trends in functional neuroimaging? There have been recent studies on the analyses used in the field of functional neuroimaging and how simply calculating multiple comparisons or p-values is extremely faulty. When you have a sample size of a trillion, if you just calculate a p-value for a random pairwise comparison, it would be bad science because the sample size just kills the equation. Scientists used to joke that given a large enough sample size, every comparison results in a significant p-value. Another example would be studies on pre-cognition--something we maybe all agree is silly; however, with a large enough sample size, some scientists have suggested a significant difference and that pre-cognition does exist. This is an easy example of scientists clearly mis-using basic statistics. There are numerous papers in the past 5 years about how just calculating p-value is very bad for sample sizes over 10,000 (for example : . Like other scientists, we've generally agreed that it becomes more about effect sizes and other metrics than just p-value.
I'm kind of surprised we're still worried about p-values, given the problem space. When we first started working in this problem space, we obviously started with the basics. P-values, t-tests, chi-squared tests, Bonferroni corrections, multiple-comparisons corrections, all kinds of ANOVAs... but it always came back down to, "Were these the right choices given these problems and data of this scale?"

I don't know your background in statistics or the sciences in general so apologies if I oversimplify. Think about this: we actually have the population statistics for many of our variables. A traditional p-value is used to reject (or fail to reject) a hypothesis (and can't "prove" a hypothesis as true). Some people talk about p-values as the probability of data replication, or the probability that you would have observed the results you recorded (and we could talk all day about which is philosophically correct when talking about p-values). However, what happens when we actually have data for every single person in the ecosystem? When you ask a question in a lab, "Is Group A different than Group B," you calculate the odds that they might be truly different because you are randomly sampling from two populations. However, when you manage the entire population in the whole ecosystem, you can just say "Yes, they are different" because you have every single data point from the two populations.

So you have to ask yourself: what if I collected data on Variable A, and every single time I get exactly the same output because I have data from every single person in the ecosystem? What's the variance in these cases? How does that change how we do statistics? What do you do when you don't need to random sample and actually have data on the entire population?"
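
The contrast between sampling and holding the whole population can be sketched as follows. This is a toy example under stated assumptions: the two "populations" are synthetic numbers generated here, and the group sizes, means, and function names are hypothetical. Repeated random samples give a different estimate each time, while the full-population calculation returns exactly the same value on every run, so there is no sampling variance left to test against.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical "ecosystem": one score per player in each of two groups.
group_a = [rng.gauss(50.0, 10.0) for _ in range(20_000)]
group_b = [rng.gauss(50.5, 10.0) for _ in range(20_000)]


def population_difference() -> float:
    """Difference in means using every data point; no sampling involved."""
    return statistics.fmean(group_b) - statistics.fmean(group_a)


def sampled_difference(sample_size: int) -> float:
    """Difference in means estimated from a random sample of each group."""
    sample_a = rng.sample(group_a, sample_size)
    sample_b = rng.sample(group_b, sample_size)
    return statistics.fmean(sample_b) - statistics.fmean(sample_a)


sample_estimates = [sampled_difference(200) for _ in range(5)]
census_values = [population_difference() for _ in range(5)]

print("estimates from samples:", [round(x, 2) for x in sample_estimates])
print("values from full data: ", [round(x, 2) for x in census_values])
```

Whether the exact difference matters in practice is then a question of effect size, not of rejecting a null hypothesis.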

As for publishing these results, Lyte commented:
"Do players want us to focus on publishing papers, or making new features? If the resource of time wasn't a concern, then of course we'd just do everything. To add to this, we could publish papers and we'd still have to deal with skeptics--scientists deal with that every single day. 
We've already revealed more data and methodology than most studios--check out GDC talks and talks at MIT, Harvard, York, USC, etc. Players have long been able to download and run their own analyses on Tribunal data too (when it was up)--not surprisingly, they often had the same results we talked about regarding its accuracy."

When asked about what sort of education is required to work as a data analyst, Lyte noted:
"We have a few analysts who have a Bachelor's in a related field, but most have a Masters or PhD. It's less about the degree and more about the skillsets and critical problem solving--if you can solve some of the questions we face every day and have no degree.. we'd still hire you :)

Over the next few years, I think you'll see an increasing demand for data analysts and player behavior experts (whether it's game design, analytics, communications, etc) in the games industry. At Riot, we already actively seek people with these skillsets and have at least 3 different teams (Player Behavior, Business Intelligence, Research) that have people with similar backgrounds to you. I don't think the demand will ever decrease at Riot, so work hard and maybe we'll see you in a few years!"

When asked why players haven't been getting any information on what specifically earned them chat restrictions, Lyte noted:
"Completely agree that players currently are not getting the "Reform Card" equivalent to see what exactly they said to earn a restrictive chat game; however, this is because we're currently running temporary experiments versus launching a permanent feature. 
In the old days, we used game bans often. We had data before Reform Cards, after Reform Cards, before Justice Reviews, after Justice Reviews, etc. However, we didn't really have as much data about restricted chat bans alone. How effective are restrictive chat bans without a reform card of sorts? How effective are they for handling issues that may not be related to communication? These are the types of questions that (hopefully!) are answered in these experiments so we know what features to incorporate into the new Tribunal."
Lyte also commented on a few of the problems the current iteration of the Tribunal faces and on upcoming Tribunal improvements:
"I could probably write entire blogs about this topic, and maybe we'll touch on some in the future. Basically, there were multiple factors related to what you were seeing: 
1) The Tribunal was conservative in many cases, so people who may have deserved punishment earlier were often playing games far longer than they should until Tribunal gathered more evidence. 
2) The Tribunal didn't benefit from a lot of the research we developed while we were building Team Builder. One consequence is that a Tribunal case takes a lot longer to 'close' than it should. For example, let's say a Tribunal case typically requires 1000 votes. We now have a lot of data that we can look at patterns of votes to determine when a case should be closed early with really high accuracy, so we could close cases in as little as 20 votes and expedite the penalty (or reward, in the future). 
3) The Tribunal didn't upgrade its tech over time and wasn't very scalable. This is why players often saw the system go up and down, trying to build cases and close cases as fast as it could but the scale League operates at is just unprecedented."
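
Lyte does not say how vote patterns would be used to close a case early, so the following is only one plausible sketch, not the Tribunal's actual method. It assumes a hypothetical rule: after a minimum of 20 votes, close the case once the Wilson score lower bound (at roughly 99% confidence) on the leading verdict's share is above 50%, and otherwise keep collecting up to 1000 votes. The function names, vote labels, and thresholds are all assumptions made for illustration.

```python
import math
from typing import Iterable, Tuple


def wilson_lower_bound(successes: int, total: int, z: float = 2.576) -> float:
    """Lower end of the Wilson score interval for a proportion (z=2.576 is ~99%)."""
    if total == 0:
        return 0.0
    p_hat = successes / total
    denom = 1.0 + z * z / total
    centre = p_hat + z * z / (2.0 * total)
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / total + z * z / (4.0 * total * total))
    return (centre - margin) / denom


def close_case(votes: Iterable[str], min_votes: int = 20, max_votes: int = 1000) -> Tuple[int, str]:
    """Return (votes_used, verdict) for a stream of 'punish' / 'pardon' votes.

    Hypothetical rule: stop as soon as the leading verdict is confidently
    above a 50% share, but never before min_votes or after max_votes.
    """
    punish = 0
    seen = 0
    for vote in votes:
        seen += 1
        if vote == "punish":
            punish += 1
        if seen >= max_votes:
            break
        if seen >= min_votes:
            leading = max(punish, seen - punish)
            if wilson_lower_bound(leading, seen) > 0.5:
                break
    # Ties fall back to "pardon" in this sketch (an arbitrary choice).
    verdict = "punish" if 2 * punish > seen else "pardon"
    return seen, verdict


# Made-up vote streams: one lopsided case and one evenly split case.
clear_case = ["punish"] * 19 + ["pardon"] + ["punish"] * 980
split_case = ["punish", "pardon"] * 500

print(close_case(clear_case))
print(close_case(split_case))
```

A lopsided case closes at the 20-vote minimum while an evenly split one runs the full 1000 votes. Checking the bound after every vote inflates the error rate compared with a single test, so a production rule would need a proper sequential-testing correction.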
As for players being able to still report despite the Tribunal being down, he noted:
"Players can still report when Tribunal is down. Report System and Tribunal System are separate things. But secondly, if players were reported a long time ago, they could still get penalized now. All the penalties from before were just queued up while Tribunal was having issues, so some players might have been getting penalties they earned a few weeks or months ago but Tribunal just queued it up while it was having issues."

WookieCookie, a player support lead and member of the player behavior team, also weighed in on a set of temporary bans that went out to a small subsection of chat restricted users, replying directly to a summoner who felt they had been unjustly punished:
"On Friday we decided to review the data we collected after placing chat restrictions on accounts with high levels of recorded toxicity. We were pretty pleased with the results, a large majority of players actually showed signs of less harassment and toxic behavior. 
Unfortunately a small % of accounts actually increased in recorded levels of poor behavior. For these players we decided to place a 3 Day Suspension on their account.
As a lot of you are probably aware, the Tribunal is currently in extended recess while the Player Behavior and Justice dev team works on some upgraded features for it. 
During this time we're not going to sit idle while some players try to exploit others in game. Even without the Tribunal we have numerous tools at our disposal to find and take action on high offenders. 
In this particular case, we sent e-mails to those affected. I am bee bee sea you might want to check that the e-mail on your account is up to date, as this is the primary way you'll receive messages on the status of your account from us.
To help you out here, I decided to take another look at your account. In your case, during the period of chat restrictions your account continued to receive reports in over 40% of the games you played.

During your chat restricted period you had a habit of passively aggressively treating your team mates poorly. Either by feeding the enemy, rambo'ing on your own, or just waiting around not contributing until your team surrendered. Even with chat restrictions you did manage to say quite a few terrible things that I wouldn't repeat here. But I'm pretty sure that Lee Sin jungle in one of your games didn't appreciate being called that terrible racial slur just because he wasn't performing well. 
Everyone has bad games, it's how we deal with them that we excel and grow as players at League of Legends. Calling someone names doesn't help them get better and it certainly doesn't help you win."
He continued:
"The observed behavior of those which were banned was that they used what little chat they had in game to harass and berate others. In other cases they decided to feed or play against their own team in order to "prove a point". 
But what I find most interesting is that of the players we chat restricted last week (and there were a lot!) we only had to place manual suspensions on less than .05% of the players. By and large, the vast majority of players had no problem adjusting their behavior in game with limited chat."
He continued, explaining that the system for these punishments is not currently automated:

"The system is not automated at all currently. This wave of bans, and the wave of chat restrictions prior have been the result of close collaboration between a number of teams including Player Behavior and Justice, Business Intelligence, Player Support, and our regional offices. Human interaction has been a core component of each step the entire way through."

[ Update ]

Lyte also took the time to clear up a few things regarding chat restrictions:
"This is a great opportunity to clarify some points about chat restrictions. 
We don't believe that just having a mute option means that players can be verbally abusive. In many cases, if you have to mute someone, then the damage has already been done. What's more important is the ability to prevent these experiences. If somebody randomly punches you in the face, you aren't happy that you can stop them from punching you further--the damage is already done and you wish you could have avoided that punch to begin with. 
Before I explain chat restrictions a bit more, no, smurfs cannot simply make new accounts to avoid chat restrictions--you have to play through them or you will never remove them off your main account. 
Secondly, there's a common misunderstanding that restricted chat hinders a player's ability to communicate effectively with the team. When we tested the design of restricted chat, we specifically collaborated with numerous players that were neutral to positive standing in the community and analyzed their communication patterns. How often did they communicate? At what pace? How often did they use short versus long phrases? We eventually came to the current design and most neutral to positive players have no trouble with the available chat resources in restricted chat mode if they utilize their chat with smart pings. To be fair, we do recognize that Junglers have a bit more difficulty and may need more chat resources than currently provided. I think you have to remember who is being placed into restricted chat--players who normally use their chat resources for verbal abuse. When given freedom of chat, these are the players that opted to use chat to freely verbally abuse others--how much of their chat was really used for teamwork to begin with? 
Finally, chat restrictions force players to make a conscious choice--should I use my chat resources for verbal abuse and yelling at my teammates, or should I use them to communicate strategy with my team to help win games? Most players actually come out of chat restrictions with much improved communication patterns and never end up chat restricted again. However, chat restrictions don't improve everyone's behaviors.

On your last point about feedback, we do see these improvements even without providing direct feedback to players on what they did wrong, although we intend on giving more feedback in the future to show players exactly where the problems might be."

When asked if pre-game and post-game are visible spaces on reports, Lyte clarified:
"Yes, pre-game and post-game are reportable. Everything is recorded, and some of our experiments specifically tackle these spaces."

Phreak on Upcoming Jungle Changes

Phreak punned his way over to the forums to chat a bit about the jungle experience changes mentioned in the 4.9 live gameplay patch forecast.

First, here's a refresher on what was said in the 4.9 live gameplay patch forecast:
"7) Jungle XP 
We have bigger plans for the jungle, but currently we're seeing problems where junglers are getting really far ahead of the game, especially if they have a strong start. We think it might have to do with lane EXP versus monster EXP, as lane minion EXP doesn't scale with average champion level, but jungle monster EXP does. Basically a snowballing jungler gets his teammates some kills, they get levels and bring up the average level of the game, and then the jungler gets even more experience as he clears camps for additional rewards. No other lanes get this kind of bonus, and we've always been aware of strong junglers controlling the pace of the game, so we're looking for ways to tone that down."
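
The feedback loop described in the forecast can be sketched with a toy model. Every number and function name below is made up for illustration and is not an actual game value or formula. The only thing taken from the quote is the shape of the rule: monster experience scales with the average champion level, lane minion experience does not.

```python
# All constants are hypothetical placeholders, not real game values.
BASE_MONSTER_XP = 100.0   # assumed XP for a camp when the average level is 1
XP_PER_AVG_LEVEL = 5.0    # assumed bonus XP per average champion level
MINION_WAVE_XP = 100.0    # assumed flat XP for a lane minion wave


def monster_xp(average_champion_level: float) -> float:
    """Jungle camp XP that grows with the game's average champion level."""
    return BASE_MONSTER_XP + XP_PER_AVG_LEVEL * (average_champion_level - 1.0)


def minion_xp(average_champion_level: float) -> float:
    """Lane minion XP that ignores the average champion level."""
    return MINION_WAVE_XP


for avg_level in (3.0, 6.0, 9.0):
    print(
        f"average level {avg_level:.0f}: "
        f"camp XP = {monster_xp(avg_level):.0f}, wave XP = {minion_xp(avg_level):.0f}"
    )
```

In this sketch, kills that raise the average level raise the jungler's income per camp while a laner's income per wave stays flat, which is the snowball effect the forecast wants to tone down.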

Here are Phreak's comments on why the jungle has been so difficult to get right over the years:
"The goal has always been to have a mix. In the first few seasons, it was basically impossible to carry from the jungle because there wasn't any gold in it. We upped the gold, a little too much, and the junglers who carried became massively overpowered, so we have to tune it back down. We're happy that guys like Wukong and Kha'Zix can jungle now, but we want Nautilus to be able to be picked, too. 
As an example, saying, "Yeah, you know we'd really like Rumble to get played" then giving him 5,000 damage on Flamespitter, nerfing it next patch, and players going, "WTF I thought you wanted to buff Rumble, Riot!" And the answer is yes we do, but we don't want it broken in the other direction."
When asked why these things weren't properly sorted during the preseason, Phreak commented:
"Quite simply, it's because this game is complex. 
We do everything we can in internal testing, but a few dozen bros compared to millions of players playing millions of games is just massively different. 
Why didn't people realize Twitch was really good all along? Why did it take four months for mid lane Lulu to catch on? Why did people think Zed was underpowered on release? 
People don't figure stuff out right away. Us included."

[ Continued ] Upcoming Nidalee Gameplay Update

Following his MASSIVE discussion from earlier this week, Riot Repertoir has returned to talk more about Nidalee's upcoming gameplay update.

[ Be sure to read the original and continued discussion on Nidalee before jumping in here! ]

When asked about the "new" Nidalee's viability as a jungler and the base damages of her Javelin Toss, Repertoir explained:
"I'm sure some people will be able to be successful jungling her following the changes, but I don't imagine it will become a mainstream thing. 
As for Javelin Toss, it will probably just see straight up nerfs. It's important to the spell's identity that the spell varies quite meaningfully in damage from point blank to max range, so the more that is true, the better players will feel about those long range snipes. The problem with increasing the base damage on Javelin Toss is that it deals immense damage at level 9 when maxed first."

When asked about making her new passive apply Hunt only when a trap or spear brings an enemy below 50% HP, he shared an example he tested out:
"Hi Rinzan, 
I tried a Hunt version that applied the debuff only when it hit targets below 50% or took them below 50% Health, actually. The reason I steered away from it was that it severely limited the number of decisions the Nidalee player made about them, largely due to how rare they could become. Especially with nerfed Javelin Toss damage, the number of Javelins or Bushwhack traps that drop an enemy below half Health is probably about 1 in 3, and that's only the hits. When this is the case, the Nidalee player just wants to jump in on all of them, even if it's not necessarily a good idea. Ever had a Lee Sin on your team that took every Q2? That's pretty much what it felt like to play with this version of Nidalee. 
Thanks for the feedback!"

In response to a summoner concerned that removing Nidalee's massive, long range nuke would also remove her uniqueness, he replied:
"Thanks for the thoughts. I actually agree that unique options are great for the game, but the value of the uniqueness needs to outweigh the frustration that comes with it in an interesting way. The unique thing about Javelin Toss is the tension that builds while it's in the air due to the large reward if it hits. Hopefully this new version will maintain most of that tension while providing opponents some chance to play against if they get hit."

Repertoir also replied to questions on what exactly you can do with the Hunt pounce bonus and why her trap doesn't offer any CC:
"Hi kovu88,

The Pounce reset can be used however the player sees fit, and I'm sure there will be several different uses. It can be used to continue a chase, to jump away from harm, to farm a minion wave, or just held onto so that it's ready next time you need it. 
As for the trap, I've avoided adding a CC to it because Nidalee would have to pay for that somewhere else in her kit, probably in reducing her effectiveness at chasing things down. Plus, we don't have many champions without CC nowadays, so it's nice to be able to preserve that where possible.

Thanks for the questions!"

[ Update ]

When asked about the fate of support Nidalee, Repertoir noted:
"I don't think support Nidalee will be fantastic following these changes (not that she was before) since she's losing her W shred, but if players are able to succeed with it, it will probably be a bit more of an all-in pattern. I imagine there will be some pretty high overall threat around level 3 or so."
Regarding concern on Javelin Toss becoming irrelevant, he noted:
"I'm hoping that even following these changes, Javelin Toss and Bushwhack will remain very valuable tools for Nidalee, despite behaving a bit differently. Both will allow her to pursue runners as well as give her a stronger entrance into a fight. So while Javelin Toss may deal less damage, it will remain her most important tool for slightly changed reasons. Try not to worry too much. I'm trying to keep Javelin Toss and Bushwhack feeling powerful despite the changes to their functionality. We'll have to see how that plays out in the next month or so!"

Repertoir also hit a big block of feedback on a point by point basis:
"Hi Meowmix86, 
It sounds like we mostly agree that Javelin does too much damage, even if we don't necessarily agree that it is/isn't acceptable given the circumstances under which it does that damage.
If you want to reduce the safety factor of Nid, why not just reduce the range of the spear?
As to the Javelin Toss range, I was interested for a while in reducing its range to hit her safety, but I went a different route intentionally after playing around with it. I think one of the coolest and most defining things about being a good Nidalee player is range management and feeling like you're in control of a situation. The more I could keep intact the discrepancy between melee range and max range spear, the better I thought I would be able to preserve some of the cool aspects of Nidalee.
Easy way to solve this, just remove the heal altogether. Give Nid a useful passive skill like guinsoos blade which increases attack speed and ability power with each skill use. This way you synergize her entire kit as well as remove her ability to heal up herself and the entire team.
One of my original Nidalee kits tested a Rageblade-esque thing where she got stacks of something for doing things in one form that she then realized when she swapped to the other form. The two main problems I had with this were that it led to stance swapping incentives that weren't necessarily about gaining access to the other form's abilities, and more relevantly to your suggestion, it led to a stack management mini game (like Vlad's E) that didn't feel all that Nidalee. Rageblade, even if underpowered, is pretty cool because it allows players to opt into that gameplay, but forcing players into it seemed like a poor long term decision for Nidalee. As to why I haven't just removed Primal Surge, even though it's not the most exciting ability ever and leads to some frustration, it helps her do her job in poke scenarios, which everyone knows is one of Nidalee's strengths.
Change bushwack so it still does damage AND slows, but has no shred.
Some brainstorming and kit ideation helped me come to the conclusion that Nidalee is a lot cooler as a character whose opportunism comes from quickness and agility rather than crowd control. Many people suggest that Nidalee should have some crowd control, but I just think it too dramatically changes her without it being necessary. 
Thanks for the questions! I had forgotten about some of the really old iterations until you brought up the Rageblade-esque passive, so it was nice to get thinking about why I didn't go down that route with her to the version we have today."
