The famous iris dataset contains 150 iris flowers divided into three classes: 50 of type setosa, 50 of type versicolor, and 50 of type virginica. Four features of each flower were measured: the length and width of the petals, and the length and width of the sepals.
The three types of flowers each occupy a region on the map. Their position is based on their shared characteristics.
Without any extra labeling, this dataset contains only two obvious clusters that can clearly be distinguished on the map.
A list of passengers of the Titanic, organized by various attributes: their gender, their age, whether they traveled in first, second, or third class, where they embarked, and whether they survived the voyage.
You can use the map to gain valuable insight into which passengers survived and which were not as fortunate. For example, women had a much higher chance of survival than men. Can you find other relationships?
Individual players' statistics for the 2015 NBA playoffs. These include their position, age, minutes played per game, two- and three-point averages, and many more.
Players with similar statistics are positioned together on the map. Note, for example, how star players Stephen Curry, LeBron James, and James Harden are all positioned next to each other, even though they play different positions and have different playing styles.
Source: NBA Stuffer
A map of the world's universities based on characteristics like the number of citations, student-to-staff ratio, or expected student income. This data informed the Times University Rankings. The rank of each university is excluded when generating the map, so that only the source statistics influence a university's position on the map.
Note how the large research institutions cluster together. Note also how they do not necessarily correspond to universities with a high expected income for students after graduation.
Source: Times Higher Education
Ponder uses an unsupervised machine learning technique called "self-organizing maps", which generates two-dimensional layouts of multidimensional data. It is unsupervised because the technique attempts to detect clusters in the data without any human intervention. All the user needs to do is indicate which variables (i.e. "columns") should be taken into account.
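To make the idea of a self-organizing map more concrete, here is a minimal sketch in plain Python. It is not Ponder's actual implementation. The function names (train_som, best_matching_unit), the grid size, the learning rate, and the neighbourhood schedule are all assumptions chosen for illustration. Each cell of a small grid holds a weight vector; for every data row, the closest cell and its grid neighbours are nudged toward that row, so similar rows end up in nearby cells.

```python
import math
import random


def best_matching_unit(weights, row):
    """Return the grid cell whose weight vector is closest to the row."""
    return min(
        weights,
        key=lambda cell: sum((a - b) ** 2 for a, b in zip(weights[cell], row)),
    )


def train_som(rows, width=4, height=4, epochs=30, seed=0):
    """Train a tiny self-organizing map (illustrative parameters, not Ponder's)."""
    rng = random.Random(seed)
    dim = len(rows[0])
    # One weight vector per grid cell, initialised at random in [0, 1).
    weights = {
        (x, y): [rng.random() for _ in range(dim)]
        for x in range(width)
        for y in range(height)
    }
    total_steps = epochs * len(rows)
    step = 0
    for _ in range(epochs):
        order = rows[:]
        rng.shuffle(order)
        for row in order:
            progress = step / total_steps
            rate = 0.5 * (1.0 - progress)  # learning rate decays over time
            # The neighbourhood radius shrinks from wide to narrow.
            radius = 1.0 + (max(width, height) / 2.0) * (1.0 - progress)
            bmu = best_matching_unit(weights, row)
            for cell, w in weights.items():
                grid_dist_sq = (cell[0] - bmu[0]) ** 2 + (cell[1] - bmu[1]) ** 2
                influence = math.exp(-grid_dist_sq / (2.0 * radius * radius))
                # Pull this cell's weights toward the row, scaled by its
                # distance on the grid from the best matching unit.
                for i in range(dim):
                    w[i] += rate * influence * (row[i] - w[i])
            step += 1
    return weights


# Two made-up groups of rows with three numeric columns scaled to [0, 1].
group_a = [[0.10, 0.12, 0.08], [0.12, 0.10, 0.11], [0.09, 0.11, 0.10]]
group_b = [[0.90, 0.88, 0.91], [0.92, 0.90, 0.89], [0.89, 0.91, 0.90]]
som = train_som(group_a + group_b)
cell_a = best_matching_unit(som, group_a[0])
cell_b = best_matching_unit(som, group_b[0])
print("group A lands in cell", cell_a)
print("group B lands in cell", cell_b)
```

The position of a row on the map is simply the grid cell of its best matching unit. Because no class labels are passed to the training function, any grouping that appears comes from the data alone.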
The time it takes to map the data depends on the number of rows and columns in the spreadsheet, the number of unique values in the columns, and on how powerful your computer is. So there is no straightforward answer. Files with many rows can be mapped very quickly if the variables are of low complexity, while small files with few rows may take longer if the variables contain many different values.
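One way to see why unique values matter is to count how many numbers each row turns into before mapping. The sketch below assumes that text columns are expanded into one dimension per distinct value (one-hot encoding). That is an assumption made for illustration, not a documented detail of Ponder, and encoded_width is a hypothetical helper name.

```python
def encoded_width(rows, columns):
    """Count input dimensions, assuming text columns are one-hot encoded."""
    width = 0
    for col in columns:
        values = [row[col] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            width += 1  # a numeric column stays a single dimension
        else:
            width += len(set(values))  # one dimension per distinct value
    return width


# A long file with simple columns (made-up data).
many_rows = [
    {"sex": "male" if i % 2 else "female", "age": 20 + i % 50}
    for i in range(1000)
]
# A short file where one column has a different value in every row.
few_rows = [{"name": f"passenger {i}", "age": 30} for i in range(20)]

print(encoded_width(many_rows, ["sex", "age"]))
print(encoded_width(few_rows, ["name", "age"]))
```

Under this assumption, the 1000-row file produces far narrower rows than the 20-row file, which is consistent with small files sometimes taking longer than large ones.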