User interface add-ons to make better parallel sets visualizations:
(skip to the data visualization here.)
Parallel sets is a data visualization type that shows how attributes of different types are distributed across a large number of instances. Common examples include datasets of the characteristics of passengers on the Titanic and the nutrition information of many different types of cereal. Parallel sets visualizations are good at showing how attributes do or don’t cluster over a large dataset of instances. The type is favored when the data is categorical rather than strictly numerical.
I built a parallel sets data visualization based on a dataset of battles from the Game of Thrones series (TV version, not book). Example questions that can be answered by a parallel sets visualization built from this data include: Which battle type has the best success rate? Which house has the best track record in terms of number of battles won or lost? Were there more pitched battles than ambushes in later years compared with earlier years? Are battles in certain regions more likely to result in a major character’s death?
The parallel sets visualizations that I’ve seen work best when the number of types of attributes is between 3 and 10, the number of different attributes within each type is <10, and the total number of instances is >100. Outside this range, parallel sets tend to get too cluttered.
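As a rough illustration, the rule of thumb above can be written as a small check to run before drawing anything. This is only a sketch in plain JavaScript: the function name `paraSetsFit` and the row format (an array of objects keyed by dimension name) are assumptions made for this example, and the thresholds are simply the ones quoted above.

```javascript
// Hypothetical helper: returns a list of warnings when a dataset falls
// outside the rule-of-thumb ranges described in the text.
// `rows` is an array of objects; `dimensions` is an array of property names.
function paraSetsFit(rows, dimensions) {
  const warnings = [];

  // Number of attribute types (dimensions): suggested 3 to 10.
  if (dimensions.length < 3 || dimensions.length > 10) {
    warnings.push(`${dimensions.length} dimensions (suggested: 3 to 10)`);
  }

  // Number of distinct values within each dimension: suggested fewer than 10.
  for (const dim of dimensions) {
    const distinct = new Set(rows.map((row) => row[dim])).size;
    if (distinct >= 10) {
      warnings.push(`${dim} has ${distinct} categories (suggested: fewer than 10)`);
    }
  }

  // Total number of instances: suggested more than 100.
  if (rows.length <= 100) {
    warnings.push(`${rows.length} rows (suggested: more than 100)`);
  }

  return warnings;
}
```

An empty result means the data sits inside the suggested ranges; anything else is a hint to drop a dimension or group some categories.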
When I used parallel sets for a project at work, I found it to be successful for the immediate problem, but it easily became cluttered and messy if I tried to include all the characteristics found in the dataset. Although a subset of the data was fine for showing what I was interested in at the time, I knew I would eventually want to show a larger dataset and let the user explore the data. This led to brainstorming potential improvements for my initial parallel sets code base, which I had adapted from Jason Davies’ parsets project. Both use d3.js. I’ve completed the first two interactivity improvements listed below in this visualization.
- User can select which data dimensions to visualize (types of attributes: region, year, defender, etc.)
- User can limit the visualization to only data that contains a certain value in a certain dimension (battles with Stark as the defender)
- User can choose to turn numerical information into small categorical groups (size of battle was > or < some number)
a. user selects n number of divisions of the data between min and max
b. or n number of divisions based on equal number in each category
c. or by log divisions (>10, >100, >1000, etc.)
- Interactive combination of histogram and parallel sets data visualizations via dc.js.
- Pre-determined combinations of data dimensions available via a series of buttons to assist in story-telling to user.
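The three ways of turning a numerical column into categories (equal-width divisions, equal-count divisions, and log divisions) could look roughly like the sketch below. It is plain JavaScript with no d3 calls, all function names are made up for this example, and it is not the code used in the visualization. One assumption to flag: the log bins use "at least 10, at least 100, ..." so that exact powers of ten land in the higher bin.

```javascript
// (a) Equal-width: split the range [min, max] into n bins of equal width
// and return the zero-based bin index for a value.
function equalWidthBin(value, min, max, n) {
  if (max === min) return 0; // all values identical: a single bin
  const index = Math.floor(((value - min) / (max - min)) * n);
  return Math.min(index, n - 1); // the maximum itself belongs to the last bin
}

// (b) Equal-count: return the n - 1 cut points that split the values
// into n groups of roughly equal size.
function equalCountCuts(values, n) {
  const sorted = [...values].sort((a, b) => a - b);
  const cuts = [];
  for (let i = 1; i < n; i++) {
    cuts.push(sorted[Math.floor((i * sorted.length) / n)]);
  }
  return cuts;
}

// Bin index for a value given ascending cut points:
// the number of cut points that are less than or equal to the value.
function binByCuts(value, cuts) {
  let index = 0;
  while (index < cuts.length && value >= cuts[index]) index++;
  return index;
}

// (c) Log divisions: label a value by the largest power of ten it reaches.
function logBinLabel(value) {
  if (value < 10) return "<10";
  let threshold = 10;
  while (value >= threshold * 10) threshold *= 10;
  return `>=${threshold}`;
}
```

Whichever option the user picks, the output is a category label or index per row, which can then be treated like any other categorical dimension in the parallel sets.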
Letting the user select the input data and try different combinations reduces clutter. Giving the user the option to pick the input data makes this data visualization more of a data exploration tool and less of a straightforward storytelling image.
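To make the first two improvements concrete, here is a minimal sketch of the data side: filter the rows on one value, then count rows per combination of the chosen dimensions. The function names, the field names, and the sample rows are all invented for illustration (they are not taken from the battles dataset), and the sketch is plain JavaScript rather than the d3.js code behind the visualization.

```javascript
// Keep only the rows whose value in `dimension` equals `value`
// (for example, battles with a particular defender).
function filterRows(rows, dimension, value) {
  return rows.filter((row) => row[dimension] === value);
}

// Count rows for each combination of values across the selected dimensions.
// Each count corresponds to the width of one ribbon.
function countCombinations(rows, dimensions) {
  const counts = new Map();
  for (const row of rows) {
    const key = dimensions.map((dim) => row[dim]).join(" | ");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up rows, only to show the shape of the input.
const sampleRows = [
  { type: "ambush", defender: "Stark", outcome: "win" },
  { type: "siege", defender: "Stark", outcome: "win" },
  { type: "ambush", defender: "Lannister", outcome: "loss" },
  { type: "ambush", defender: "Stark", outcome: "win" },
];

const starkOnly = filterRows(sampleRows, "defender", "Stark");
const ribbons = countCombinations(starkOnly, ["type", "outcome"]);
```

Because the user picks both the filter and the list of dimensions, only the combinations they asked for are counted and drawn, which is what keeps the picture from getting cluttered.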
It is also useful if you don’t have HBO but want to study up for water cooler conversations.
Examples from the Game of Thrones Parallel Sets Visualization:
ribbon width = number of battles
Ambushes, razings, and sieges perform better for the attacker than pitched battles.
Freefolk (black ribbon) have a horrible track record at attacking. Houses Stark and Baratheon are also not that great.
Lannister, Stark, and Baratheon have the highest number of battles where they are the main attacker and an important character dies.
The attacker usually wins in a Game of Thrones battle, but when they lose, it tends to be when the main defender is a Lannister. In those battles where a defending Lannister wins, it is also common for a major character of some type to die.