Explaining the Research Behind DUPR’s Jan. 30th Updates
You've probably seen the quick-hitter algo update and FAQs we posted on Jan 25th previewing the Jan 30th algo improvements, or saw me talk briefly about them in my Reddit post last week. I wanted to take the opportunity to go a little more in depth here for those of you who are interested in how we took the methods from my last blog post and applied them to these decisions, or who just want a better understanding of the algo's inner workings.
Our goal is, in essence, to solve an “optimization” style problem: maximize X while obeying Y. In our case here, “X” is the accuracy of the overall system (we want to predict future matches as accurately as possible) and “Y” is the standard of pre-match transparency and post-match interpretability (it should be intuitive to a user whether their DUPR will go up, down, or stay the same as a result of the match they just played).
The original weekly algo had high accuracy and low transparency/interpretability — it did well on the optimization but didn’t satisfy the requirements. The instantaneous algo had lower accuracy but high transparency/interpretability — it easily satisfied the requirements but with plenty of room to grow on accuracy.
I see transparency as having two key components:
1. Causality: you should be able to point to exactly which match caused your DUPR to change.
2. Intuition: you should be able to anticipate, before a match is played, which direction your DUPR will move and roughly by how much.
These components are crucial for users because they build trust in the system through clear and justifiable results, ensuring the output is viewed as fair and reliable for all.
The instantaneous ratings update presented a solution to the first transparency component by effectively telling the user: “this match, and only this match, is the reason your DUPR just changed”.
The win-and-go-up aspect of the instantaneous rating was an initial solution to the second transparency component because, while you may not know the exact amount a given match result will move your DUPR, you can at least have a good feel for the directionality and scale on intuition by comparing everyone’s ratings and the simple final W/L result.
So, how do we improve our accuracy while staying within the defined bounds? DUPR can either operate more accurately within the current solutions to these constraints (by building better models with the same requirements of instant results and win-and-go-up movements) or it can build additional solutions by adding channels of communication which show that causality or intuition more clearly to users.
Given the workflow hurdles created by the fact that many users only look at DUPR after a match has been played, and the technical lift required to build those communication channels, our January 30th algo update focuses on optimizing within the solutions already available to us, making sure each step adds both accuracy and intuition.
Win intensity is stage one of fully reintroducing points to our ecosystem in a transparent and effective way. The question becomes: how do we reintegrate the accuracy of counting points back into the rating such that it satisfies the intuitive movement constraint?
The first question is: does counting points actually improve accuracy? If someone wins 11-8 rather than 11-0, should our prediction of their future performance be relatively lower in the 11-8 case? With the framework discussed in the last blog, we can now answer this question explicitly. We spin up a version of the algo that scales a player's rating movement based on the points they've won, and, as many of you may have guessed, points do in fact add predictive value!
Calculating all of the points, even without much additional sophistication, adds an additional 25% or so of accuracy!
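To make the distinction concrete, here is a toy sketch of the difference between a win/loss-only update and a pure points-scaled update. This is not DUPR's actual model — the function names, the k step size, and the Elo-style logistic expectation are all my own illustrative assumptions.

```python
def update_wl(rating_a, rating_b, a_won, k=0.05):
    """Win/loss-only update: fixed-size step toward the binary result."""
    # Elo-style expected score for player A given the rating gap.
    expected_a = 1 / (1 + 10 ** (rating_b - rating_a))
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta

def update_points(rating_a, rating_b, pts_a, pts_b, k=0.05):
    """Pure points update: step toward the observed point share."""
    expected_a = 1 / (1 + 10 ** (rating_b - rating_a))
    # 11-8 -> 0.579 of the points, 11-0 -> 1.0 of the points.
    actual_a = pts_a / (pts_a + pts_b)
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta
```

Note that under the pure points model an 11-0 win moves the winner more than an 11-8 win does, and a heavily favored winner who only squeaks by can even move down — which is exactly the catch described next.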
But there’s a catch. If Albus was supposed to win 11-5 against Barnabus and instead Albus only won 11-8, this pure points model would actually indicate Albus’ rating is too high as it currently stands — or else it would’ve known to predict 11-8 from the get go! So the result would nudge both Albus’ and Barnabus’ ratings closer to the ratings that would’ve predicted 11-8: Albus goes down and Barnabus goes up, despite Albus beating Barnabus. While Barnabus may love the fact that his rating went up “unexpectedly”, Albus certainly could be frustrated. Maybe he had the game in hand at 11-4 and went easy for the last few points because he didn’t know he had to win 11-5 or better. Without a way of communicating that 11-5 target to users before the match is played, we simply don’t feel comfortable imposing that negative experience on users at this stage. We’re working on making this possible, but since many users don’t enter matches until after they’re complete, we’ll have to rethink the entire user workflow first.
So what can we do now? Well, we can design another model that still uses points but keeps the inflection at whether you’ve won or lost. By tuning the hyperparameters around exactly how this model works, we can bring the accuracy of this method to within 10% of the pure points model — a significant step forward from where we are now! We’re calling this model feature Win Intensity. As before, if you win a match, you will go up, and if you lose, you will go down — given the issue above, this is the only way to guarantee transparency.
But, now when you win by just a little, your rating barely moves up. If you win by a lot, your rating moves up much more. More of that point spread information is being incorporated.
Representative example of how DUPR can use Win Intensity to get close to the Pure Points-based ratings movement while still having an intuitive behavior. The exact shape here is for illustrative purposes only.
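In the same illustrative spirit as the figure, here is a minimal sketch of a Win Intensity-style update: the sign of the movement is locked to the win/loss result, while the magnitude scales with the point margin. The margin formula, the k and floor constants, and the shaping are all assumptions for illustration — DUPR's actual curve is not public.

```python
def win_intensity_delta(expected_a, pts_a, pts_b, k=0.05, floor=0.001):
    """Rating change for player A: winners always move up (at least
    `floor`), losers always move down, scaled by margin of victory."""
    a_won = pts_a > pts_b
    # Normalized margin in [0, 1]: 11-0 -> 1.0, 11-9 -> ~0.18.
    margin = abs(pts_a - pts_b) / max(pts_a, pts_b)
    # Pure-points-style signal based on the share of points won...
    points_delta = k * (pts_a / (pts_a + pts_b) - expected_a)
    # ...clamped so its direction always matches the W/L result.
    if a_won:
        return max(points_delta * (0.5 + margin), floor)
    return min(points_delta * (0.5 + margin), -floor)
```

Under this sketch, a heavily favored player who wins 11-8 still moves up, just barely (the floor kicks in where pure points would have moved them down), while an 11-0 win moves them up substantially more.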
So yes, if I hypothetically took a bunch of lessons and managed to lose to Ben Johns by only 11-9, I would move down a very, very, very small amount after that match even though I would ideally show as having played quite above my current station. But when I turn around and start beating other people now by increasingly larger spreads, my rating would begin to rise much faster than in the former format. With more matches played, the relative differences from the Win Intensity model get averaged out and have less and less of an influence.
This is an area of ongoing focus for us, of course. We’d love to have all 25% of that accuracy bonus back (and then some), but this is a significant step in a positive direction while staying within the transparency conditions that have generated a significantly more active user-base.
I promise the next few sections will be shorter!
On the topic of taking a bunch of lessons and getting significantly better, we needed a way to incorporate drift and movement in player skill. Players get better with practice, more focus, or just generally more experience. Maybe they’re transitioning from another racquet sport and picking up a new pickleball-specific skill every week they touch a paddle. In the pre-Jan 30th instant algo, movement was constant regardless of when the games were played.
In this release, ratings will move proportionally to their match recency weighting. Recent matches count for more in your rating, and their weights slowly decay over time. As a match recedes into the past, more recently entered matches carry relatively more weight in your rating. If you take a few months off, the matches on your account will continue to decay in weight; once you start entering matches into DUPR again, those recent matches will be much fresher than the old ones, and your rating will move more than if you had been playing every day leading up to your latest match. This helps us learn faster, particularly when you’ve taken some time off to practice and are coming back into the game with a few new tricks up your sleeve!
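One common way to implement this kind of weighting is exponential decay with a half-life. The actual shape and rate DUPR selected are not public, so the half-life below is purely a placeholder for illustration.

```python
def recency_weights(days_ago, half_life_days=90.0):
    """Weight of each match in the rating, given its age in days.
    A match's weight halves every `half_life_days` (hypothetical rate)."""
    return [0.5 ** (d / half_life_days) for d in days_ago]

# A player returning from a 3-month break: today's match carries far
# more weight than the cluster of matches from before the layoff.
weights = recency_weights([0, 95, 100, 110])
```

The practical effect matches the description above: after time off, your old matches have decayed, so the fresh ones dominate and your rating can move faster.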
How did DUPR come up with the particular shape and decay rate? You may even be able to guess my answer at this point! We built a few hypotheses based on statistical theory and experience, fit them into our model, ran them against our dataset, and the one that best captured the natural trend in player skill movement was selected!
The last big change that was burning a hole in our quantitative pockets for this Jan 30 release date was the idea of match count and learning rate. Imagine you flipped a coin one time and have no prior knowledge of coins. They are an absolute mystery. It landed on heads. Our best guess at this time is that it would always land on heads — we have no idea what else it could be since we’ve only ever seen one flip! The next time we flip it, it lands on tails, and our best guess is now that it's only a 50/50 chance of being heads since we’ve now observed one of each. That's a move of 50 whole percent! On the other side of the spectrum, imagine we had flipped a million coins in our life and we had meticulously recorded 500k heads and 500k tails. Our next coin flip, be it heads or tails, doesn't change our estimate of how likely this coin is to land on heads; we've already seen so much information to support our hypothesis that coin flips are 50/50 that this next flip adds relatively little new information.
Playing pickleball acts statistically in quite the same way; the more you play, the more we know about you and the less new information we gain from the latest match. An MLP/PPA pro is probably pretty close to their true rating, whereas someone just joining DUPR has a lot of unknown variables to discover! You may think “Wait, every time I play you should be learning more about me, right?” That is still true! But if it was your first game ever on DUPR, we’d have 100% of our knowledge about your rating coming from one single match and would have to update accordingly. As you play more matches, there are certain pieces of information that are redundant from match to match, and less and less of the match’s details are “surprises” to the algo that it can learn from. Let’s say you play the same team 10 times in a row. By the tenth match, we probably have a pretty good idea of how you typically match up against them and that tenth match usually just helps to confirm that constructed view rather than forming a new view altogether.
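The coin-flip intuition is easy to make concrete: if your estimate of P(heads) after n flips is simply heads/n, then flip number n can move that estimate by at most 1/n. A quick sketch (illustrative only, not DUPR's actual learning-rate schedule):

```python
def estimate_swings(flips):
    """For a sequence of flips (1=heads, 0=tails), return how far each
    new flip moved our running estimate of P(heads)."""
    swings, heads, prev = [], 0, None
    for n, f in enumerate(flips, start=1):
        heads += f
        est = heads / n  # running estimate after n flips
        swings.append(abs(est - prev) if prev is not None else est)
        prev = est
    return swings
```

With flips `[1, 0, 1, 0]`, the second flip moves the estimate by a full 0.5, while later flips move it less and less — exactly the "each match adds relatively less new information" effect described above.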
But you’ll notice this effect is acting in the opposite direction of the Match Recency effect; where Match Recency says we learn more from recent matches, Match Count says we are learning relatively less. With our new framework, we can calibrate these two effects together to determine the appropriate rates for DUPR to both “learn” and “forget” to maximize the predictive nature of the ecosystem.
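One simple way to picture the two effects interacting is as a product of a recency term (the "forget" rate) and a count-based damper (the "learn" rate). The functional forms and constants below are my own assumptions for illustration, not DUPR's calibrated values.

```python
def match_weight(days_ago, match_index, half_life_days=90.0, c=10.0):
    """Hypothetical combined per-match weight: exponential recency decay
    times a count-based learning-rate damper."""
    recency = 0.5 ** (days_ago / half_life_days)  # older -> forget more
    count = c / (c + match_index)                 # more matches -> learn less
    return recency * count
```

Calibration then amounts to choosing the decay and damping rates jointly so that out-of-sample match predictions are as accurate as possible.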
With Win Intensity, Match Recency, and Match Count all affecting ratings as of January 30th, how will matches entered before then be adapted into the upgraded system? This becomes an especially thorny technological and operational concern when we have to go back and amend old matches or merge player accounts. The best way for us to do this, while also capturing that 15% accuracy bump as soon as possible, is to batch adjust everyone’s ratings to the new model’s best guess. This, as you might imagine, would frustrate quite a few people, particularly those of you who rate slightly higher in the old system than in the new.
Our compromise is to batch adjust only the ratings that will go UP in the updated model. If you notice your rating has jumped overnight, congratulations! The upgraded model likes you more than the prior version did. This will create some bias in the average rating since we’re not equivalently moving others down, but as we move further into the future, this bias will decay through the Match Recency effect and players will converge to their unbiased ratings as they play.
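The migration rule itself is a one-liner. This is a sketch of the policy as described, with hypothetical names:

```python
def migrated_rating(old_rating, new_model_rating):
    """One-time migration policy: adopt the new model's rating only if it
    is higher; otherwise keep the old rating and let the Match Recency
    effect decay the gap over future matches."""
    return max(old_rating, new_model_rating)
```

So a player the new model rates 4.3 against an old 4.1 jumps overnight, while a player the new model rates 3.9 stays at 4.1 and drifts toward their unbiased rating as they play.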
With these updates to the DUPR algorithm, we've focused on making the system not only more accurate but also more intuitive and accessible, ensuring that every DUPR user can easily understand and benefit from their personal rating progression. We’re excited about these changes and the process through which they resulted and hope you’ve enjoyed seeing a little bit more into how we came to the decisions we did. As we work towards future algorithm updates and improvements, or maybe just as interesting topics pop up in the space, I hope to see you back here with me!
Written By: Scott Mendelssohn - Head of Analytics, DUPR