Sets are used to isolate a dimension into two fragments. That's all.
In the Superstore dataset, there is a field called **Order ID**, where each order is composed of many purchases.
We will use a set to isolate the Order ID dimension into two fragments:
- Orders with average discounts below 9%
- Orders with average discounts above 9%
To do so, right-click the field that we wish to make a set from (Order ID) and create a set. The set will be called Discount > 9. The members of the set are determined by a condition.
The condition relies on the Discount field: members of the set have an average discount **above** 9% (**> 0.09**). Conversely, orders not in the set have an average discount of 9% or below.
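For readers who think in code, the membership condition can be sketched outside Tableau. This is only an illustration in plain Python: the rows in `purchases` are made-up toy values, not Superstore data, and the names `purchases`, `THRESHOLD`, `discount_set` and `outside_set` are hypothetical.

```python
from collections import defaultdict

# Toy purchase rows (order_id, discount). Hypothetical values, not Superstore data.
purchases = [
    ("A-1", 0.00), ("A-1", 0.10),   # order average 0.05
    ("B-2", 0.20), ("B-2", 0.40),   # order average 0.30
    ("C-3", 0.00),                  # order average 0.00
]

THRESHOLD = 0.09  # 9%

# Collect the discounts of every purchase under its order.
discounts_by_order = defaultdict(list)
for order_id, discount in purchases:
    discounts_by_order[order_id].append(discount)

# An order is a member of the set when its average discount exceeds the threshold.
discount_set = {
    order_id
    for order_id, discounts in discounts_by_order.items()
    if sum(discounts) / len(discounts) > THRESHOLD
}

# Every other order falls outside the set.
outside_set = set(discounts_by_order) - discount_set
```

With this toy data only order B-2 lands in the set. Every order falls into exactly one of the two fragments.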
Press OK, and the set shows up in our data pane. Drag it to Colour, and you'll see two marks: a blue one for all orders inside the set (all orders with average discounts above 9%), and a grey one for all orders outside the set (all orders with average discounts of 9% or below).
To validate this, we would expect the average of all discount values for members inside the set to be above 9%, and the average for members outside the set to be below 9%. We can check by dragging Discount to Tooltip and specifying the aggregation as AVG.
Hovering over the two marks shows that all orders inside the set average a discount of 33.25%, which is way above 9%. All orders outside the set average a discount of 0.8%, which is way below 9%. Both results are as expected.
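The same sanity check can be sketched in plain Python. The rows are again made-up toy values (so the numbers differ from the 33.25% and 0.8% above), and all variable names are hypothetical. The tooltip average is taken over every purchase in a segment, not over per-order averages, and the sketch mirrors that.

```python
from collections import defaultdict
from statistics import mean

# Toy purchase rows (order_id, discount). Hypothetical values, not Superstore data.
purchases = [
    ("A-1", 0.00), ("A-1", 0.10),
    ("B-2", 0.20), ("B-2", 0.40),
    ("C-3", 0.00),
]
THRESHOLD = 0.09  # 9%

# Decide set membership at the order level.
by_order = defaultdict(list)
for order_id, discount in purchases:
    by_order[order_id].append(discount)
in_set = {o for o, ds in by_order.items() if mean(ds) > THRESHOLD}

# Average the discount over all purchases in each segment.
avg_inside = mean(d for o, d in purchases if o in in_set)
avg_outside = mean(d for o, d in purchases if o not in in_set)
```

Because the pooled average of a segment is a weighted average of its per-order averages, `avg_inside` always ends up above the threshold and `avg_outside` at or below it.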
What if we wanted to compare the profits between these two segments? Simply drag profit to columns.
It aggregates the sum of profit for all orders inside the set, and all orders outside the set.
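As a rough sketch of that aggregation in plain Python: profit is summed over every purchase belonging to orders inside the set, and separately over every purchase belonging to orders outside it. The data and names below are hypothetical toy values, not Superstore data.

```python
from collections import defaultdict

# Toy purchase rows (order_id, discount, profit). Hypothetical values.
purchases = [
    ("A-1", 0.00, 50.0), ("A-1", 0.10, 20.0),
    ("B-2", 0.20, -15.0), ("B-2", 0.40, -30.0),
    ("C-3", 0.00, 40.0),
]
THRESHOLD = 0.09  # 9%

# Set membership: orders whose average discount exceeds the threshold.
discounts = defaultdict(list)
for order_id, discount, _ in purchases:
    discounts[order_id].append(discount)
in_set = {o for o, ds in discounts.items() if sum(ds) / len(ds) > THRESHOLD}

# Sum profit separately for the "In" and "Out" segments.
profit = {"In": 0.0, "Out": 0.0}
for order_id, _, amount in purchases:
    profit["In" if order_id in in_set else "Out"] += amount
```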
We see a stark contrast between the two segments. It becomes clear that orders with high discounts generate lower profits, whereas orders with lower discounts generate higher profits.
Let's make use of size, the pre-attentive property which we discussed in lecture 27. Make this visualization into a dot plot, drag the Discount field to the Size mark, and specify the aggregation as average.
This sizes the points based on the magnitude of their average discount value. It becomes even clearer that larger discounts are associated with lower profits. This relationship becomes more apparent still if we extend the granularity to evaluate the average discount at the level of categories.
All we've done is extend the granularity to evaluate the average discount of all technology, office supplies and furniture purchases from orders inside the set, or outside it.
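Extending the granularity amounts to grouping by one more field. A minimal plain-Python sketch, assuming made-up toy rows and hypothetical names, computes the average discount and total profit per (segment, category) pair:

```python
from collections import defaultdict

# Toy purchase rows (order_id, category, discount, profit). Hypothetical values.
purchases = [
    ("A-1", "Technology", 0.00, 50.0),
    ("A-1", "Furniture", 0.10, 20.0),
    ("B-2", "Technology", 0.20, -15.0),
    ("B-2", "Furniture", 0.40, -30.0),
    ("C-3", "Office Supplies", 0.00, 40.0),
]
THRESHOLD = 0.09  # 9%

# Membership is still decided per order, exactly as before.
discounts = defaultdict(list)
for order_id, _, discount, _ in purchases:
    discounts[order_id].append(discount)
in_set = {o for o, ds in discounts.items() if sum(ds) / len(ds) > THRESHOLD}

# Group purchases by (segment, category) instead of by segment alone.
groups = defaultdict(lambda: {"discounts": [], "profit": 0.0})
for order_id, category, discount, amount in purchases:
    key = ("In" if order_id in in_set else "Out", category)
    groups[key]["discounts"].append(discount)
    groups[key]["profit"] += amount

# (average discount, total profit) for each mark in the view.
summary = {
    key: (sum(g["discounts"]) / len(g["discounts"]), g["profit"])
    for key, g in groups.items()
}
```

Note that a single mark outside the set can still show an average discount above 9% (here, furniture from order A-1), because membership is evaluated on the whole order rather than on each category.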
We can make this even more convincing by plotting the members of the set at the level of sub-categories.
In summary, I hope you've realized how powerful it can be to isolate segments of a dimension using sets, and to use these isolated segments in different ways to find insights in your data. Lectures 29 & 30 of the bootcamp provide more examples on sets; for this summary we will stop here. Just know this:
Sets are used to isolate segments of a dimension. These segments are then used in different ways to find insights in your data.