Methodology

OpenRarity Principles

The core tenets of the OpenRarity methodology are:

  • It must be easy for creators, consumers, and developers to understand

  • It must be objective and grounded in mathematical principles (open-source, introspectable)

  • It must be easy to recalculate as the dataset changes (new mints, metadata typos, mutable attributes)

  • It must provide consistent rarity ranks across all publishers

Methodology

We evaluated several platforms and collections to understand the methodologies currently used across providers. While several collections apply some form of customization, we found the most commonly adopted rarity function to be a rarity score that sums the probability of each trait and is normalized by category distribution (Trait Normalization).

The problem here is that summing probabilities answers the wrong question. Summing the probabilities of two mutually exclusive traits gives the probability of a token having one or the other (a Green Hat or a Blue Hat), while multiplying the probabilities of independent traits gives the probability of a token having all of them at once. We believe that the rarity of any given token is rooted in its set of traits occurring together.
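To make the difference concrete, here is a minimal Python sketch. The trait frequencies and the "Gold Eyes" trait are made-up values for illustration only, and the multiplication step assumes the two traits are statistically independent, which a real collection may not satisfy.

```python
# Hypothetical trait frequencies, chosen only for illustration.
P_GREEN_HAT = 0.10  # 10% of tokens have a Green Hat
P_BLUE_HAT = 0.05   # 5% of tokens have a Blue Hat
P_GOLD_EYES = 0.02  # 2% of tokens have Gold Eyes (hypothetical trait)


def prob_either(p_a, p_b):
    """P(A or B) for two mutually exclusive traits, e.g. two hats."""
    return p_a + p_b


def prob_both(p_a, p_b):
    """P(A and B) for two traits assumed to be independent."""
    return p_a * p_b


# Summing answers "how likely is one hat or the other?"
either_hat = prob_either(P_GREEN_HAT, P_BLUE_HAT)

# Multiplying answers "how likely is this exact combination?"
green_hat_and_gold_eyes = prob_both(P_GREEN_HAT, P_GOLD_EYES)

print(f"Green Hat or Blue Hat:   {either_hat:.3f}")
print(f"Green Hat and Gold Eyes: {green_hat_and_gold_eyes:.3f}")
```

The sum grows as more traits are considered, whereas the product shrinks, which matches the intuition that a specific combination of traits is rarer than any one of them.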

Surprisal Ranking Algorithm

Information content is an alternative way of expressing probabilities that is better suited to assessing rarity. Think of it as a measure of how surprised someone would be upon discovering something.

  1. Probabilities of 1 (i.e. every single token has the Trait) convey no rarity and add zero information to the score.

  2. As the probability approaches zero (i.e. the Trait becomes rarer), the information content continues to rise with no bound. See equation below for explanation.

  3. It is valid to perform linear operations (e.g. addition or arithmetic mean) on information, but not on raw probabilities.
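The three properties above can be checked with a short Python sketch. It assumes the standard Shannon definition of information content, -log2(p), measured in bits; the function name `information_content` is a hypothetical helper, not part of any library.

```python
import math


def information_content(probability):
    """Return the information content (surprisal) of an event in bits.

    Computed as log2(1 / p), which is equivalent to -log2(p).
    """
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in the interval (0, 1]")
    return math.log2(1.0 / probability)


# 1. A trait that every token has carries zero information.
shared_by_all = information_content(1.0)

# 2. Rarer traits carry more bits, and the value keeps growing as p shrinks.
one_in_two = information_content(1 / 2)
one_in_1024 = information_content(1 / 1024)

# 3. Bits add up: the information of two independent traits occurring
#    together equals the sum of their individual information.
joint = information_content(0.5 * 0.25)
summed = information_content(0.5) + information_content(0.25)

print(shared_by_all, one_in_two, one_in_1024)
print(math.isclose(joint, summed))
```

Because multiplication of probabilities turns into addition of bits, sums and averages of information content are meaningful in a way that sums of raw probabilities are not.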

Information content is used to solve many problems that involve something being unlikely (i.e. rare or scarce). This video shows how it was used to solve Wordle and also explains the equations, with graphics to make them easier to understand. You can skip straight to the part on information theory if you’d like.

The score is defined as:

This can look daunting, so let’s break it down:

    • Each of these points is actually called a “bit” of information.

    • The important thing is that even if there was a one-off grail in an impossibly large NFT collection, we could keep assigning points!

    • Unlike with probabilities, it’s valid to add together bits of information.
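As a rough illustration of adding bits together, the following Python sketch sums the information content of each token's traits and sorts tokens from rarest to most common. It is a simplified sketch under stated assumptions, not the reference implementation: it omits any normalization or handling of missing traits that the full score may apply, and the function names and the four-token collection are hypothetical.

```python
import math
from collections import Counter


def trait_frequencies(tokens):
    """Map each (category, value) pair to the fraction of tokens that have it."""
    counts = Counter()
    for traits in tokens.values():
        counts.update(traits.items())
    total = len(tokens)
    return {trait: count / total for trait, count in counts.items()}


def token_bits(traits, frequencies):
    """Sum the information content, in bits, of all traits on one token."""
    return sum(math.log2(1.0 / frequencies[trait]) for trait in traits.items())


def rank_tokens(tokens):
    """Return (token_id, bits) pairs sorted from rarest to most common."""
    frequencies = trait_frequencies(tokens)
    scores = {
        token_id: token_bits(traits, frequencies)
        for token_id, traits in tokens.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical collection used only to exercise the functions above.
collection = {
    "token_1": {"hat": "green", "eyes": "brown"},
    "token_2": {"hat": "green", "eyes": "brown"},
    "token_3": {"hat": "blue", "eyes": "brown"},
    "token_4": {"hat": "blue", "eyes": "gold"},
}

ranking = rank_tokens(collection)
for token_id, bits in ranking:
    print(f"{token_id}: {bits:.3f} bits")
```

In this sample, `token_4` ranks first because its gold eyes appear on one token in four (2 bits) and its blue hat on two in four (1 bit). Recomputing the ranking after a new mint or a metadata fix only requires calling `rank_tokens` again on the updated collection.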
