
The world is full of rankings and hierarchies. They show up in tennis, where a tournament like Roland-Garros ends with a final ranking of players, champion at the top. They appear in pandemics, as when public health officials record new infections and use contact tracing to sketch the networks of COVID-19 spread. Systems of competition, conflict, and contagion can all give rise to hierarchies.
However, these hierarchies are observed after the fact, which makes it difficult to know the true ranking of the system: who was really the best player? Who infected whom? “You can’t go back in time and know exactly how this thing happened,” says Santa Fe Institute postdoctoral fellow George Cantwell. One could build a model of the network and compare all possible outcomes, but such a brute-force approach quickly becomes untenable: rank a group of just 60 participants, for example, and the number of possible orderings already rivals the number of particles in the known universe.
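To make that arithmetic concrete, a couple of lines of Python show the blow-up (nothing here is specific to the paper):

    import math

    # 60 competitors can be ordered in 60! distinct ways.
    n = 60
    print(math.factorial(n))              # an 82-digit integer, about 8.3e81
    print(math.log10(math.factorial(n)))  # ~81.9, versus roughly 10^80 particles
                                          # in the observable universe

Enumerating and comparing each of those orderings one by one is plainly out of reach.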
In a recent paper published in Physical Review E, Cantwell collaborated with SFI Professor Cris Moore, a computer scientist and mathematician, to describe a new way to evaluate rankings. Their goal was not to find one true hierarchy, but to calculate the probability distribution over all possible hierarchies, each weighted by how likely it is.
“We were prepared to not be quite right, but we wanted to get good answers with a sense of their quality,” Cantwell says. The new algorithm is inspired by physics: ranks are modeled as interacting entities that can move up or down. Through this lens, the system behaves like a physical one and can be analyzed using methods drawn from spin glass theory.
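The paper’s actual machinery isn’t detailed here, but the flavor of “weight every ranking by how well it explains the data” can be sketched with a toy Metropolis sampler. Everything below is an illustrative assumption rather than the authors’ algorithm: players get continuous scores, a logistic (Bradley-Terry-style) model turns score gaps into win probabilities, and hypothetical (winner, loser) pairs in matches stand in for the observed results.

    import math
    import random

    def log_likelihood(scores, matches, beta=1.0):
        # Log-probability of the observed wins under an assumed logistic model:
        # P(w beats l) = 1 / (1 + exp(-beta * (scores[w] - scores[l]))).
        total = 0.0
        for w, l in matches:
            total -= math.log1p(math.exp(-beta * (scores[w] - scores[l])))
        return total

    def sample_rankings(matches, n_players, n_steps=10_000, step=0.5):
        scores = [0.0] * n_players
        ll = log_likelihood(scores, matches)
        samples = []
        for _ in range(n_steps):
            i = random.randrange(n_players)
            old = scores[i]
            scores[i] += random.gauss(0.0, step)   # nudge one rank up or down
            new_ll = log_likelihood(scores, matches)
            if random.random() < math.exp(min(0.0, new_ll - ll)):
                ll = new_ll                        # accept the move
            else:
                scores[i] = old                    # reject and restore
            samples.append(list(scores))
        return samples  # many plausible score vectors: a distribution over rankings

Rankings that explain more of the results get visited more often, so the output is a probability-weighted collection of hierarchies rather than a single answer.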
Shortly after the onset of the COVID-19 pandemic, Cantwell and Moore began thinking about models of disease spread through a network. They quickly recognized the situation as an ordering problem that emerges over time, much like the spread of a meme on social media or the emergence of league standings in professional sports. “How do you order things when you have incomplete information?” Cantwell asks.
They started by imagining a function that could score a ranking on its accuracy. For example: a good ranking would be one that agrees with the match results 98% of the time. A ranking that agrees with the results only 10% of the time would be lousy, worse than guessing with no prior knowledge at all.
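As a rough sketch, a scoring function of this kind can be just the fraction of recorded matches won by the higher-ranked player; the names here (rank, matches) are illustrative:

    def agreement(rank, matches):
        # rank[p]: player p's position (lower is better);
        # matches: list of (winner, loser) pairs.
        wins_as_expected = sum(1 for w, l in matches if rank[w] < rank[l])
        return wins_as_expected / len(matches)

    rank = {"A": 1, "B": 2, "C": 3}
    matches = [("A", "B"), ("B", "C"), ("C", "A")]
    print(agreement(rank, matches))  # 0.667: two of three results fit this ranking

A score near 1.0 marks a ranking that explains the results well; a score near 0.5 is no better than guessing, and anything below that is worse.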
One problem with rankings is that they are usually discrete, meaning they track whole numbers: 1, 2, 3, and so on. Such an ordering suggests that the “distance” between the first- and second-ranked members is the same as that between the second and third. But that’s not the case, says Cantwell. The best players in a game, worldwide, are going to be close in skill, so the differences among the top-ranked players may be narrower than they look.
“You see quite often that the lower ranked players can beat the higher ranked players, and the only way the model can make sense and fit the data is to smash all the ranks together,” Cantwell says.
Cantwell and Moore instead described a system that evaluates rankings on a continuous numbering system. A leaderboard could assign any real number, whether a whole number, a fraction, or an infinitely repeating decimal, to each player in the network. “It’s easier to work with continuous numbers,” Cantwell explains, and those continuous numbers can always be translated back into discrete rankings.
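That translation is just a sort; a minimal illustration with made-up scores:

    # Hypothetical continuous scores; higher means stronger.
    scores = {"Alice": 2.71, "Bob": 0.5, "Carol": 3.14159, "Dan": -1.0}

    # Recover a discrete leaderboard by sorting on the real-valued scores.
    leaderboard = sorted(scores, key=scores.get, reverse=True)
    for place, player in enumerate(leaderboard, start=1):
        print(place, player, scores[player])
    # 1 Carol 3.14159
    # 2 Alice 2.71
    # 3 Bob 0.5
    # 4 Dan -1.0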
Additionally, this new approach can be used both to predict the future, like the outcome of a tennis tournament, and to infer the past, like how a disease spread. “These rankings could tell us the order of sports teams from best to worst. But they could also tell us the order in which members of a community were infected with a disease,” says Moore. “Even before his postdoc, George was working on this problem as a way to improve contact tracing in an outbreak. Just as we can predict which team will win a game, we can infer which of two people infected the other when they came into contact with each other.”
In future work, the researchers say they plan to investigate some of the deeper questions that have emerged. More than one ranking may agree with the data yet radically disagree with other rankings, for example. Or a ranking that appears to be wrong may carry enough uncertainty that it is not actually inaccurate. Cantwell says he also wants to compare the model’s predictions against results from real-world competitions. Ultimately, he says, the model could be used to improve predictions in a wide range of systems that give rise to rankings, from infectious disease models to sports betting.
Cantwell says he’ll keep his money – for now. “I’m not quite ready to start betting on that,” he says.