1
00:00:00,540 --> 00:00:05,400
Hi! Today I'm going to talk to you about 
our paper "Misleading Beyond Visual Tricks:  

2
00:00:05,400 --> 00:00:10,620
How People Actually Lie With Charts."
If you search the term "deceptive  

3
00:00:10,620 --> 00:00:15,300
visualizations," typically the charts that 
will show up will contain an inverted y-axis,  

4
00:00:15,300 --> 00:00:21,360
a truncated y-axis, and other visual tricks that 
violate common visualization design guidelines. 

5
00:00:22,380 --> 00:00:26,580
These techniques may interfere with our 
ability to quickly make accurate readings from  

6
00:00:26,580 --> 00:00:32,220
the chart, so we often call such charts "lying."
But why do we always see these exact same charts  

7
00:00:32,220 --> 00:00:36,540
when we talk about deceptive visualizations?
Well, this can probably be traced back  

8
00:00:36,540 --> 00:00:41,040
to Edward Tufte and his notions of 
"graphical integrity" and "lie factor." 

9
00:00:41,040 --> 00:00:46,020
Hopefully everybody has seen such prototypical 
examples where if a chart, for instance,  

10
00:00:46,020 --> 00:00:52,440
shows this line to denote 18 miles per gallon 
but this line to denote 27 1/2, we call it lying. 

11
00:00:53,160 --> 00:00:57,960
Much research has followed up on these ideas and 
to this day we're studying the effects of the  

12
00:00:57,960 --> 00:01:04,500
lie factor on human perception. But, taking a step 
back: is this how people actually lie with charts? 

13
00:01:05,520 --> 00:01:08,640
To answer this question, we collected and analyzed  

14
00:01:08,640 --> 00:01:13,500
10,000 COVID-19 data visualizations shared on 
Twitter and used them to create a typology of  

15
00:01:13,500 --> 00:01:18,300
common attributes of misleading visualizations. 
And today we'll share with you our findings.

16
00:01:19,500 --> 00:01:24,180
This talk is structured as follows:
first, we'll discuss the notion of  

17
00:01:24,180 --> 00:01:30,120
reasoning errors in visualization posts and talk 
about how people lie with charts. Secondly, we'll  

18
00:01:30,120 --> 00:01:35,040
discuss what role visualization guidelines still 
play in this framework. And lastly, we'll talk  

19
00:01:35,040 --> 00:01:39,360
about how even well-designed charts can support 
misinformation and what can be done about it. 

20
00:01:40,500 --> 00:01:44,580
Let's start with reasoning errors. So, we 
define reasoning errors in visualization  

21
00:01:44,580 --> 00:01:50,820
tweets as unsupported assertions or logical 
fallacies as the basis of an argument, and we find  

22
00:01:50,820 --> 00:01:57,240
that 84% of visualization posts that express 
an opinion about or offer an interpretation of  

23
00:01:57,240 --> 00:02:03,060
COVID-19 data contain such reasoning errors.
We identified seven common types of errors:  

24
00:02:03,900 --> 00:02:09,480
cherry-picking, setting an arbitrary 
threshold, incorrect causal inference,  

25
00:02:10,560 --> 00:02:17,040
issues with data validity, failure to account 
for statistical nuance, misrepresentation of  

26
00:02:17,040 --> 00:02:22,680
scientific results, and incorrect reading of the 
chart. Let's go through a couple of examples.

27
00:02:25,200 --> 00:02:31,020
So, in this post the author shares a screenshot of 
Mexico's government dashboard of excess mortality.  

28
00:02:31,620 --> 00:02:36,060
There's a very visually salient feature 
of the chart: a sharp drop in mortality.

29
00:02:38,100 --> 00:02:42,600
The author adds an annotation that assigns 
a cause-and-effect relationship between the  

30
00:02:42,600 --> 00:02:46,620
introduction of the alternative drug 
Ivermectin and the drop in mortality,  

31
00:02:48,180 --> 00:02:52,500
and the tweet text further 
explains this argument. So,  

32
00:02:52,500 --> 00:02:57,660
in this example the proposition that Ivermectin 
helps with COVID---which the scientific consensus  

33
00:02:57,660 --> 00:03:02,640
has disproven---is supported through incorrect 
causal inference with the help of cherry-picking.

34
00:03:08,820 --> 00:03:10,380
Let's take a look at another example.

35
00:03:14,520 --> 00:03:19,500
So, this is a chart from an actual 
CDC report showing COVID cases in  

36
00:03:19,500 --> 00:03:21,360
a single county over the course of a month.  

37
00:03:22,620 --> 00:03:27,360
Here, light blue represents fully vaccinated 
people and dark blue represents all others.  

38
00:03:29,160 --> 00:03:33,600
There's again a very visually salient feature 
which is the difference in category composition.

39
00:03:35,160 --> 00:03:39,180
And again the tweet text 
assigns a cause-and-effect  

40
00:03:39,180 --> 00:03:42,180
relationship implying that vaccines caused COVID.  

41
00:03:43,860 --> 00:03:47,700
But the elephant in the room is: what is the 
vaccination rate in the general population?  

42
00:03:48,600 --> 00:03:52,500
Without knowing that, it's not possible to draw any 
conclusions about the efficacy of vaccines.  

43
00:03:52,500 --> 00:03:58,200
So even though there are many more cases among 
the vaccinated, there are likely also many more  

44
00:03:58,200 --> 00:04:04,140
vaccinated people at the time. So in this example, 
the proposition that vaccines further spread COVID  

45
00:04:04,140 --> 00:04:09,240
is supported through several attributes, most 
notably failure to account for statistical nuance.

46
00:04:12,960 --> 00:04:19,380
Now let's move on to the role of visual 
tricks. We find that 89% of charts with  

47
00:04:19,380 --> 00:04:24,420
reasoning errors do not violate any common 
visualization design guidelines. Importantly,  

48
00:04:24,420 --> 00:04:29,400
this percentage is very similar across every cut 
of data: so, with or without reasoning errors,  

49
00:04:29,400 --> 00:04:35,280
posts in support of COVID restrictions or 
anti-mask posts are all in the range of 87--89%.  

50
00:04:36,600 --> 00:04:42,960
Let's discuss a couple of examples that do 
have visual tricks. So, this chart shows COVID  

51
00:04:42,960 --> 00:04:48,840
hospitalizations dropping sharply in the UK, with 
the author concluding that "vaccines work." The  

52
00:04:48,840 --> 00:04:53,940
truncated y-axis potentially exaggerates the sharp 
drop and one might assume that cases went to zero.

53
00:04:58,620 --> 00:05:04,620
So let's look at another example: the chart on 
the right plots the rate of people vaccinated  

54
00:05:04,620 --> 00:05:10,800
versus people who died. The careful selection of 
scales on this dual axis chart exaggerates the  

55
00:05:10,800 --> 00:05:16,200
spurious correlation leading the author to state, 
quote: "deaths rising in line with vaccinations."  

56
00:05:18,060 --> 00:05:20,940
So in conclusion, visual tricks may exaggerate the  

57
00:05:20,940 --> 00:05:23,880
effects of reasoning errors that 
are already present in the chart.

58
00:05:27,480 --> 00:05:32,820
So what could be done about it? You 
might have observed from examples  

59
00:05:32,820 --> 00:05:36,960
that the majority of misleading charts 
are screenshots from reputable sources,  

60
00:05:40,380 --> 00:05:45,360
such as government reports, data 
exploration websites, or news media.

61
00:05:48,000 --> 00:05:51,900
Previously shown examples all contain 
screenshots of charts that were not  

62
00:05:51,900 --> 00:05:55,260
intended to support any of these 
conclusions. Their vulnerability  

63
00:05:55,260 --> 00:06:00,900
to misinterpretation primarily comes from having 
very visually salient but unexplained features.  

64
00:06:01,860 --> 00:06:07,020
But it additionally comes from including warnings 
against misinterpretation in the limitations  

65
00:06:07,020 --> 00:06:10,620
section of the report, where they would 
not "survive the screenshot of a chart";  

66
00:06:11,400 --> 00:06:16,860
in the case of data exploration websites, from 
offering an unrestricted set of interactions that  

67
00:06:16,860 --> 00:06:19,860
could be interpreted by the user 
as the set of valid comparisons,  

68
00:06:21,420 --> 00:06:25,020
or from annotations being added directly 
onto the chart with new information.

69
00:06:28,740 --> 00:06:35,700
Let's take a look at the structure of these 
arguments. So, typically we consider a chart that  

70
00:06:35,700 --> 00:06:42,060
shows a sharp increase in cases. This can form the 
base premise of an argument. One can, for example,  

71
00:06:42,060 --> 00:06:46,800
use annotations to add another premise---around 
the time of increase there was an important event,  

72
00:06:46,800 --> 00:06:51,660
say, the start of a vaccination campaign. 
Taking all these premises together,  

73
00:06:51,660 --> 00:06:57,840
one might make a general conclusion that the event 
caused cases. This is an example of an inductive  

74
00:06:57,840 --> 00:07:03,540
argument. But inductive reasoning is inherently 
uncertain and only deals with the extent to which  

75
00:07:03,540 --> 00:07:09,000
the conclusion is credible given the premises are 
logically sound. And in most examples we've seen  

76
00:07:09,000 --> 00:07:13,560
they are logically sound; they're simply not 
plausible enough for the generalization made.

77
00:07:15,120 --> 00:07:20,160
So given this, protecting vulnerable 
visualizations could be operationalized by  

78
00:07:20,160 --> 00:07:25,680
making misinformation arguments less credible and 
promoting skepticism. Importantly this should be  

79
00:07:25,680 --> 00:07:29,940
done through salient features of the chart design, 
such that it cannot be missed or cropped out.  

80
00:07:30,840 --> 00:07:36,060
For example, let's take a look at a COVID 
chart where an Event X might have caused cases  

81
00:07:36,060 --> 00:07:42,720
in Country A. We can make this proposition less 
credible by, for example, providing case charts  

82
00:07:42,720 --> 00:07:49,200
of other regions to not allow for cherry-picked 
examples, or visualizing important events that are  

83
00:07:49,200 --> 00:07:55,680
likely to truly explain a rise in cases---such 
as the appearance of a new variant, or visualizing  

84
00:07:55,680 --> 00:08:01,080
the uncertainty in death estimates that may stem 
from low testing rates or methodological issues.

85
00:08:04,260 --> 00:08:10,380
In conclusion, we find that visual tricks are not 
the main driver of visual misinformation online;  

86
00:08:10,380 --> 00:08:15,300
reasoning errors are. The majority 
of misleading visualizations in our  

87
00:08:15,300 --> 00:08:20,460
data set are screenshots of charts from 
reputable sources. And if a chart is not  

88
00:08:20,460 --> 00:08:24,600
designed with a biased reading in mind, it 
is vulnerable to misinformation arguments.

89
00:08:26,880 --> 00:08:27,480
Thank you!