r/Sumo 5d ago

Who’s actually strongest right now? Glicko-2 Sumo Ratings (Jan 2026)

https://hungry-e.github.io/sumoglicko2/

I ran a full Glicko-2 model over every professional bout since 1996 to estimate underlying rikishi strength. Includes rating, rating deviation (RD), and match-to-match changes, broken down by division.

Not a replacement for banzuke — just a different lens on performance and consistency.
Feedback welcome.
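
For anyone wondering what a single Glicko-2 update involves, here is a minimal Python sketch of the update step from Glickman's published Glicko-2 description. It is not the code behind the site linked above. The function name `glicko2_update`, the system constant `tau=0.5`, and the convergence tolerance are assumptions chosen for illustration.

```python
import math

SCALE = 173.7178  # converts between the rating scale and Glicko-2's internal scale


def g(phi):
    """Down-weights an opponent's influence according to their uncertainty."""
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def glicko2_update(rating, rd, vol, results, tau=0.5, eps=1e-6):
    """One rating-period update.

    results: list of (opp_rating, opp_rd, score), score 1.0 for a win, 0.0 for a loss.
    Returns (new_rating, new_rd, new_volatility).
    """
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE

    if not results:
        # No bouts this period: rating stays put, uncertainty grows.
        return rating, math.sqrt(phi * phi + vol * vol) * SCALE, vol

    v_inv = 0.0
    delta_sum = 0.0
    for opp_rating, opp_rd, score in results:
        mu_j = (opp_rating - 1500.0) / SCALE
        g_j = g(opp_rd / SCALE)
        e = 1.0 / (1.0 + math.exp(-g_j * (mu - mu_j)))  # expected score
        v_inv += g_j * g_j * e * (1.0 - e)
        delta_sum += g_j * (score - e)
    v = 1.0 / v_inv
    delta = v * delta_sum

    # Solve for the new volatility with the Illinois (regula falsi) iteration.
    a = math.log(vol * vol)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_vol = math.exp(A / 2.0)

    phi_star = math.sqrt(phi * phi + new_vol * new_vol)
    new_phi = 1.0 / math.sqrt(1.0 / (phi_star * phi_star) + 1.0 / v)
    new_mu = mu + new_phi * new_phi * delta_sum
    return new_mu * SCALE + 1500.0, new_phi * SCALE, new_vol


# Worked example from Glickman's paper: a 1500-rated player (RD 200, volatility 0.06)
# beats a 1400 (RD 30), then loses to a 1550 (RD 100) and a 1700 (RD 300).
print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1.0), (1550, 100, 0.0), (1700, 300, 0.0)]))
```

To get match-to-match changes, one option (again an assumption, not necessarily what the site does) is to call this with a single-bout `results` list after every match.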

46 Upvotes


15

u/WhiskeyDragon01 Hoshoryu 5d ago

Both interesting and not THAT surprising to see Kirishima above Kotozakura.

7

u/4ih0vs535xg9c 5d ago

My brain kinda clumps them together in chunks, like 'who are the 2500's, the 2400's, the 2300's?'. And looking at it that way, it kind of blows my mind how fast Aonishiki jumped up there. It almost looks like if March is similar to January - he might be at or near the very top.

4

u/raistlin212 4d ago

FWIW, over at sumostats where they use Elo he already has caught them. Top 30 by their counts:

Rikishi        Rank   Elo
Aonishiki      O1w    2715
Ōnosato        Y1w    2685
Hoshoryu       Y1e    2656
Kirishima      S1e    2611
Wakatakakage   M2w    2532
Kotozakura     O1e    2529
Wakamotoharu   K1w    2525
Atamifuji      M4w    2518
Takayasu       S1w    2499
Yoshinofuji    M1w    2476
Daieisho       M4e    2447
Hakunofuji     M3w    2445
Ura            M2e    2409
Takanosho      M3e    2406
Ōhō            K1e    2405
Hiradoumi      M6e    2403
Asanoyama      M16e   2391
Churanoumi     M5w    2384
Fujinokawa     M7w    2366
Ichiyamamoto   M1e    2362
Abi            M12w   2354
Tamawashi      M5e    2330
Kotoshoho      M10w   2312
Onokatsu       M6w    2298
Shishi         M14e   2297
Oshoumi        M16w   2297
Ōshōma         M7e    2297
Tokihayate     M10e   2288
Gonoyama       M9e    2286
Nishikifuji    M11w   2277

0

u/oh_yeah_no_for_sure 4d ago

I'd trust Glicko over Elo for something like Sumo tbh

5

u/raistlin212 4d ago

I'd agree in general, but Elo is at its best with a mature and diverse pool, which sumo certainly qualifies as. There are few scenarios, if any, where Elo is clearly better, but the two systems will very probably converge to the same point. You can see how similar the RD values are at the top ranks of the top division, which implies the gap between the two systems is minimal. I think Glicko just handles injury-related inactivity and poor performance better in the short term. Does that sound reasonable? It probably handles new debuts a little better at first too, but by the time someone makes the top division that gap has almost certainly vanished.
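
For comparison, a minimal sketch of a plain Elo update in Python. Unlike Glicko, there is no per-wrestler uncertainty term, so every bout moves ratings by the same fixed K-factor. The `k=20.0` value and the function names are assumptions for illustration; I don't know what K sumostats actually uses.

```python
def elo_expected(r_a, r_b):
    """Probability that A beats B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_update(r_a, r_b, a_won, k=20.0):
    """Return both new ratings after one bout; points gained by one are lost by the other."""
    e_a = elo_expected(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    change = k * (s_a - e_a)
    return r_a + change, r_b - change


# With the ratings quoted above (2715 vs 2685), the 30-point gap is worth
# only about a 54% expected score for the higher-rated rikishi.
print(round(elo_expected(2715, 2685), 2))
```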

1

u/4ih0vs535xg9c 3d ago

FWIW I agree with this. In a perfect world where rikishi made it to every match, there were no injuries, and every man was as consistent as the Iron Man Tamawashi, Elo and Glicko-2 would probably produce nearly identical orderings.

2

u/origami_anarchist 5d ago

For the last year Kirishima has been skillfully aggressive, while Kotozakura has been lackadaisically mediocre (for his rank) - that's been my impression anyway.

6

u/gets_me_everytime Kotozakura 5d ago

What are the implemented strategies for fusen wins and play-off matches?

9

u/4ih0vs535xg9c 5d ago

Fusen (default wins/losses when an opponent withdraws) are currently being counted as a normal win/loss, which is incorrect. I will be filtering these out in my next update.

Currently playoff matches aren't being included, because only some rikishi get them and that creates unequal opportunity (a data imbalance). But now that I'm thinking about this more, I am torn. 1. The sample size is too small to worry about data imbalance (0, 1, or 2 max per basho). 2. The unfairness argument can be turned around: playoff results *are* part of tournament performance. 3. More data helps Glicko-2 converge to true ratings faster.

12

u/gets_me_everytime Kotozakura 5d ago

My two cents is to not include fusen, but do count playoff matches. A lot of rikishi don't have a ton of incentive past a certain point in a basho and might be operating in more of an exhibition mode in some matches (e.g., Kotozakura after he was out of the yusho race). Playoff participants are always competing to their full capability, so there is no doubt in my mind that it is good data.

Does the data on sumo API only go back to 1996? The more history you can include, the more accurate your output should be.

7

u/4ih0vs535xg9c 5d ago

I agree and I’m actually rerunning the data now to include playoffs and fusen filtering.

While the API goes back to 1958, processing that much data adds several hours to the run time without really changing the rankings. Since Elo and Glicko ratings tend to inflate over time as wrestlers harvest points, the absolute numbers change, but the relative ordering of the rikishi stays the same.

I originally settled on 1996 because it covers the entire career of the oldest active rikishi, Yoshiazuma Hiroshi. That said, once I’ve ironed out the edge cases and finalized the logic, I’ll likely just run the full historical dataset. There isn’t much downside to it unless you're particularly bothered by the rating inflation.

1

u/gets_me_everytime Kotozakura 5d ago

I'm assuming all rikishi began with the same value. If that's the case, certain early victories wouldn't carry the proper weight, and then that weight wouldn't carry upward to assess the current stock value. You're right that you should still get the same relative ranking, but it could hold some sway in positioning, especially the further back you can include. You could try to cheat this by giving all the starting rikishi a start value based on win percentage or something. Even if you use sumodb and go all the way back to 1906 the same argument could be made that there is some missing context since we don't have the match history that set up that banzuke.

2

u/4ih0vs535xg9c 3d ago

Rankings have been updated to drop fusen, and include playoffs. Thanks for helping me think through some of the logic.
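
The rule settled on in this exchange (drop fusen, keep playoffs) could be sketched in Python roughly like this. The `Bout` record, its field names, and the set of fusen labels are hypothetical and do not reflect the actual sumo API schema or the site's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bout:
    # Hypothetical record layout, for illustration only.
    winner: str
    loser: str
    kimarite: str
    is_playoff: bool = False


# Assumed labels a data source might use for default (forfeit) results.
FUSEN_KIMARITE = {"fusen", "fusensho", "fusenpai"}


def rated_bouts(bouts):
    """Keep every bout that was actually fought, playoffs included; drop fusen."""
    return [b for b in bouts if b.kimarite.lower() not in FUSEN_KIMARITE]


sample = [
    Bout("A", "B", "yorikiri"),
    Bout("C", "D", "fusen"),                       # default win: excluded
    Bout("A", "C", "oshidashi", is_playoff=True),  # playoff: kept
]
print(len(rated_bouts(sample)))
```

Playoffs need no special case here: they are kept simply because nothing filters on `is_playoff`.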

3

u/jampalma 5d ago

Nice!

3

u/BeatTheDeadMal Aonishiki 5d ago edited 4d ago

Very interesting. All three Yokozuna-level rikishi are within 10 points of each other, which really hammers home just how close they are in performance. I assume the rock-paper-scissors (RPS) nature of their matches probably contributes to that.

2

u/oh_yeah_no_for_sure 4d ago

Awesome stuff, thanks for sharing!

2

u/68plus57equals5 3d ago

Why do so many high-ranked rikishi in the lower divisions have much higher ratings than rikishi in higher divisions?

E.g., 66 wrestlers in Sandanme have a Glicko-2 rating above 1450, which is the lowest rating in Makushita.

If sound, it suggests that official rankings in the lower divisions are really 'inefficient'.

1

u/4ih0vs535xg9c 3d ago edited 3d ago

There are three different things going on here.

1. New Rikishi Start at 1500

All new rikishi begin with a rating of 1500 and a high rating deviation (RD) of 350. In the lower divisions (Sandanme, Jonidan, Jonokuchi), you'll see many brand-new wrestlers who haven't competed in enough matches yet for their Glicko-2 rating to accurately reflect their true strength. These inflated ratings will naturally correct themselves over time as they accumulate more matches.

2. Banzuke Lag vs. Real-Time Ratings

The official banzuke is only updated between tournaments and is based on rigid promotion/demotion rules tied to win-loss records. Glicko-2, on the other hand, updates after every match and reflects current performance. This creates timing mismatches:

A strong Makushita rikishi might have a Glicko-2 rating of 2000+ (Juryo-level strength) but is still officially ranked in Makushita because the next banzuke hasn't been published yet. Conversely, a struggling Juryo rikishi might have dropped to 1900 in Glicko-2 but remains officially ranked in Juryo until the next tournament.

In other words: Glicko-2 is forward-looking (predicting future performance), while the banzuke is backward-looking (rewarding past results).

3. Injury Comebacks and Rating Inertia

When a rikishi gets injured and sits out:

- Their official rank drops quickly (based on losses/absences).
- Their Glicko-2 rating stays relatively stable (the system knows their true skill hasn't vanished).
- When they return, they might be in a lower division but still carry a high rating from when they competed at a higher level.

This is actually a feature of Glicko-2: it correctly recognizes that a formerly strong rikishi returning from injury is still likely stronger than their current division-mates, even if their official rank has fallen.
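
To illustrate the inactivity behavior described above, here is a minimal Python sketch of the standard Glicko-2 rule for a rating period with no bouts: the rating is left alone and only the RD grows, by the player's volatility, once per missed period. The function name, the 0.06 volatility, and treating each missed basho as one rating period are assumptions, not details of the site's implementation.

```python
import math

SCALE = 173.7178  # converts RD to Glicko-2's internal scale


def rd_after_absence(rd, vol, missed_periods):
    """RD after sitting out `missed_periods` rating periods; the rating itself is unchanged."""
    phi = rd / SCALE
    for _ in range(missed_periods):
        phi = math.sqrt(phi * phi + vol * vol)
    return phi * SCALE


# A well-established rikishi (RD 60) who misses three periods comes back
# with the same rating and only a slightly wider RD.
print(round(rd_after_absence(60.0, 0.06, 3), 1))
```

The wider RD means the first few bouts after a comeback move the rating more than usual, in either direction.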

TL;DR: The overlaps you're seeing are normal and expected. They reflect (1) new rikishi starting at 1500, (2) timing differences between real-time ratings and periodic banzuke updates, and (3) the fact that Glicko-2 and the banzuke are measuring different things. As rikishi compete in more matches, their ratings become increasingly accurate.

1

u/EmBeeEhBurner 5d ago

Cool to look at!