Probability-turbulence divergence:
A tunable allotaxonometric instrument for comparing heavy-tailed categorical distributions

P. S. Dodds, J. R. Minot, M. V. Arnold, T. Alshaabi, J. L. Adams, D. R. Dewhurst, A. J. Reagan, and C. M. Danforth


Paper (to appear):  arXiv version  |  arXiv page 



Abstract:


Real-world complex systems often comprise many distinct types of elements as well as many more types of networked interactions between elements. When the relative abundances of types can be measured well, we further observe heavy-tailed categorical distributions for type frequencies.

For the comparison of type frequency distributions of two systems, or of a system with itself at different points in time (a facet of allotaxonometry), a great range of probability divergences is available.

Here, we introduce and explore 'probability-turbulence divergence', a tunable, straightforward, and interpretable instrument for comparing normalizable categorical frequency distributions.

We model probability-turbulence divergence (PTD) after rank-turbulence divergence (RTD). While probability-turbulence divergence is more limited in application than rank-turbulence divergence, it is more sensitive to changes in type frequency.

We build allotaxonographs to display probability turbulence, incorporating a way to visually accommodate zero probabilities for 'exclusive types', which are types that appear in only one system.

We explore comparisons of example distributions taken from literature, social media, and ecology.

We show how probability-turbulence divergence either explicitly or functionally generalizes many existing kinds of distances and measures, including, as special cases, $L^{(p)}$ norms, the Sørensen-Dice coefficient (the $F_1$ statistic), and the Hellinger distance.
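
For orientation, here is a minimal Python sketch of three of the classical measures named above, written in their standard textbook forms. It does not implement probability-turbulence divergence itself, whose definition is given in the paper rather than in this abstract. The function names (`hellinger`, `sorensen_dice`, `lp_distance`) and the representation of a distribution as a dict mapping type to probability are assumptions made for illustration. A type missing from one dict is treated as having zero probability in that system, that is, as an exclusive type.

```python
import math


def hellinger(p, q):
    """Standard Hellinger distance between two categorical distributions.

    p and q are dicts mapping type -> probability (hypothetical layout).
    Types absent from one dict count as probability zero.
    """
    types = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(t, 0.0)) - math.sqrt(q.get(t, 0.0))) ** 2
        for t in types
    )
    return math.sqrt(total / 2.0)


def sorensen_dice(p, q):
    """Sorensen-Dice coefficient on the sets of types with nonzero probability."""
    a = {t for t, v in p.items() if v > 0}
    b = {t for t, v in q.items() if v > 0}
    return 2 * len(a & b) / (len(a) + len(b))


def lp_distance(p, q, power=1.0):
    """L^p distance between the two probability vectors (power >= 1)."""
    types = set(p) | set(q)
    total = sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) ** power for t in types)
    return total ** (1.0 / power)


# Two toy systems that share no types (every type is exclusive).
system_1 = {"a": 0.5, "b": 0.5}
system_2 = {"c": 1.0}

h = hellinger(system_1, system_2)
d = sorensen_dice(system_1, system_2)
l1 = lp_distance(system_1, system_2, power=1.0)
```

For these two disjoint toy systems, the Hellinger distance reaches its maximum of 1 (up to floating-point rounding), the Sørensen-Dice coefficient is 0, and the $L^{(1)}$ distance is 2. Comparing a system with itself gives 0, 1, and 0 respectively.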

We discuss similarities with the generalized entropies of Rényi and Tsallis, and the diversity indices (or Hill numbers) from ecology.

We close with thoughts on open problems concerning the optimization of the tuning of rank- and probability-turbulence divergence.