1
00:00:00,540 --> 00:00:05,400
Hi! Today I'm going to talk to you about our paper "Misleading Beyond Visual Tricks:

2
00:00:05,400 --> 00:00:10,620
How People Actually Lie With Charts." If you search the term "deceptive

3
00:00:10,620 --> 00:00:15,300
visualizations," typically the charts that will show up will contain an inverted y-axis,

4
00:00:15,300 --> 00:00:21,360
a truncated y-axis, and other visual tricks that violate common visualization design guidelines.

5
00:00:22,380 --> 00:00:26,580
These techniques may interfere with our ability to quickly make accurate readings from

6
00:00:26,580 --> 00:00:32,220
the chart, so we often call such charts "lying." But why do we always see these exact same charts

7
00:00:32,220 --> 00:00:36,540
when we talk about deceptive visualizations? Well, this can probably be traced back

8
00:00:36,540 --> 00:00:41,040
to Edward Tufte and his notions of "graphical integrity" and "lie factor."

9
00:00:41,040 --> 00:00:46,020
Hopefully everybody has seen such prototypical examples where, if a chart, for instance,

10
00:00:46,020 --> 00:00:52,440
shows this line to denote 18 miles per gallon but this line to denote 27 1/2, we call it lying.

11
00:00:53,160 --> 00:00:57,960
Much research has followed up on these ideas and to this day we're studying the effects of the

12
00:00:57,960 --> 00:01:04,500
lie factor on human perception. But, taking a step back: is this how people actually lie with charts?

13
00:01:05,520 --> 00:01:08,640
To answer this question, we collected and analyzed

14
00:01:08,640 --> 00:01:13,500
10,000 COVID-19 data visualizations shared on Twitter and used them to create a typology of

15
00:01:13,500 --> 00:01:18,300
common attributes of misleading visualizations. And today we'll share with you our findings.
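
[Editor's note, not part of the talk: Tufte's "lie factor" mentioned above is usually stated as the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below works through the miles-per-gallon example. The data values 18 and 27.5 come from the talk; the two line lengths (0.6 and 5.3 inches) are an assumption taken from Tufte's published version of the example and are not stated in the talk. The function names are illustrative only.]

```python
def relative_change(first: float, second: float) -> float:
    """Change from `first` to `second`, expressed as a fraction of `first`."""
    return (second - first) / first


def lie_factor(data_first: float, data_second: float,
               graphic_first: float, graphic_second: float) -> float:
    """Effect size shown in the graphic divided by effect size in the data.

    A value near 1 means the graphic is proportional to the data;
    values far above 1 mean the graphic exaggerates the change.
    """
    shown = relative_change(graphic_first, graphic_second)
    actual = relative_change(data_first, data_second)
    return shown / actual


# Data values from the talk: 18 and 27.5 miles per gallon.
# Line lengths in inches are an ASSUMPTION (Tufte's figures, not in the talk).
factor = lie_factor(18.0, 27.5, 0.6, 5.3)
print(f"lie factor = {factor:.1f}")
```

[Under these assumed line lengths the graphic overstates the change by a factor of roughly 15, which is the kind of distortion the design guidelines target.]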
16
00:01:19,500 --> 00:01:24,180
This talk is structured as follows: first, we'll discuss the notion of

17
00:01:24,180 --> 00:01:30,120
reasoning errors in visualization posts and talk about how people lie with charts. Secondly, we'll

18
00:01:30,120 --> 00:01:35,040
discuss what role visualization guidelines still play in this framework. And lastly, we'll talk

19
00:01:35,040 --> 00:01:39,360
about how even well-designed charts can support misinformation and what can be done about it.

20
00:01:40,500 --> 00:01:44,580
Let's start with reasoning errors. So, we define reasoning errors in visualization

21
00:01:44,580 --> 00:01:50,820
tweets as unsupported assertions or logical fallacies used as the basis of an argument, and we find

22
00:01:50,820 --> 00:01:57,240
that 84% of visualization posts that express an opinion about or offer an interpretation of

23
00:01:57,240 --> 00:02:03,060
COVID-19 data contain such reasoning errors. We identified seven common types of errors:

24
00:02:03,900 --> 00:02:09,480
cherry-picking, setting an arbitrary threshold, incorrect causal inference,

25
00:02:10,560 --> 00:02:17,040
issues with data validity, failure to account for statistical nuance, misrepresentation of

26
00:02:17,040 --> 00:02:22,680
scientific results, and incorrect reading of the chart. Let's go through a couple of examples.

27
00:02:25,200 --> 00:02:31,020
So, in this post the author shares a screenshot of Mexico's government dashboard of excess mortality.

28
00:02:31,620 --> 00:02:36,060
There's a very visually salient feature of the chart: a sharp drop in mortality.

29
00:02:38,100 --> 00:02:42,600
The author adds an annotation that assigns a cause-and-effect relationship between the

30
00:02:42,600 --> 00:02:46,620
introduction of the alternative drug Ivermectin and the drop in mortality,

31
00:02:48,180 --> 00:02:52,500
and the tweet text further explains this argument.
So,

32
00:02:52,500 --> 00:02:57,660
in this example the proposition that Ivermectin helps with COVID---which the scientific consensus

33
00:02:57,660 --> 00:03:02,640
has disproven---is supported through incorrect causal inference with the help of cherry-picking.

34
00:03:08,820 --> 00:03:10,380
Let's take a look at another example.

35
00:03:14,520 --> 00:03:19,500
So, this is a chart from an actual CDC report showing COVID cases in

36
00:03:19,500 --> 00:03:21,360
a single county over the course of a month.

37
00:03:22,620 --> 00:03:27,360
Here, light blue represents fully vaccinated people and dark blue represents all others.

38
00:03:29,160 --> 00:03:33,600
There's again a very visually salient feature, which is the difference in category composition.

39
00:03:35,160 --> 00:03:39,180
And again the tweet text assigns a cause-and-effect

40
00:03:39,180 --> 00:03:42,180
relationship, implying that vaccines caused COVID.

41
00:03:43,860 --> 00:03:47,700
But the elephant in the room is: what is the vaccination rate in the general population?

42
00:03:48,600 --> 00:03:52,500
Without knowing that, it's not possible to draw any conclusions about the efficacy of vaccines.

43
00:03:52,500 --> 00:03:58,200
So even though there are many more cases among the vaccinated, there were likely also many more

44
00:03:58,200 --> 00:04:04,140
vaccinated people at the time. So in this example, the proposition that vaccines further spread COVID

45
00:04:04,140 --> 00:04:09,240
is supported through several attributes, most notably failure to account for statistical nuance.

46
00:04:12,960 --> 00:04:19,380
Now let's move on to the role of visual tricks. We find that 89% of charts with

47
00:04:19,380 --> 00:04:24,420
reasoning errors do not violate any common visualization design guidelines.
Importantly,

48
00:04:24,420 --> 00:04:29,400
this percentage is very similar across every cut of the data: so, with or without reasoning errors,

49
00:04:29,400 --> 00:04:35,280
posts in support of COVID restrictions or anti-mask posts are all in the range of 87--89%.

50
00:04:36,600 --> 00:04:42,960
Let's discuss a couple of examples that do have visual tricks. So, this chart shows COVID

51
00:04:42,960 --> 00:04:48,840
hospitalizations dropping sharply in the UK, with the author concluding that "vaccines work." The

52
00:04:48,840 --> 00:04:53,940
truncated y-axis potentially exaggerates the sharp drop, and one might assume that cases went to zero.

53
00:04:58,620 --> 00:05:04,620
So let's look at another example: the chart on the right plots the rate of people vaccinated

54
00:05:04,620 --> 00:05:10,800
versus people who died. The careful selection of scales on this dual-axis chart exaggerates the

55
00:05:10,800 --> 00:05:16,200
spurious correlation, leading the author to state, quote: "deaths rising in line with vaccinations."

56
00:05:18,060 --> 00:05:20,940
So in conclusion, visual tricks may exaggerate the

57
00:05:20,940 --> 00:05:23,880
effects of reasoning errors that are already present in the chart.

58
00:05:27,480 --> 00:05:32,820
So what could be done about it? You might have observed from the examples

59
00:05:32,820 --> 00:05:36,960
that the majority of misleading charts are screenshots from reputable sources,

60
00:05:40,380 --> 00:05:45,360
such as government reports, data exploration websites, or news media.

61
00:05:48,000 --> 00:05:51,900
The previously shown examples all contain screenshots of charts that were not

62
00:05:51,900 --> 00:05:55,260
intended to support any of these conclusions. Their vulnerability

63
00:05:55,260 --> 00:06:00,900
to misinterpretation primarily comes from having very visually salient but unexplained features.
64
00:06:01,860 --> 00:06:07,020
But it also comes from including warnings against misinterpretation in the limitations

65
00:06:07,020 --> 00:06:10,620
section of the report, where they would not "survive the screenshot of a chart";

66
00:06:11,400 --> 00:06:16,860
in the case of data exploration websites, from offering an unrestricted set of interactions that

67
00:06:16,860 --> 00:06:19,860
could be interpreted by the user as the set of valid comparisons;

68
00:06:21,420 --> 00:06:25,020
or from annotations being added directly onto the chart with new information.

69
00:06:28,740 --> 00:06:35,700
Let's take a look at the structure of these arguments. So, typically we consider a chart that

70
00:06:35,700 --> 00:06:42,060
shows a sharp increase in cases. This can form the base premise of an argument. One can, for example,

71
00:06:42,060 --> 00:06:46,800
use annotations to add another premise---around the time of the increase there was an important event,

72
00:06:46,800 --> 00:06:51,660
say, the start of a vaccination campaign. Taking all these premises together,

73
00:06:51,660 --> 00:06:57,840
one might make a general conclusion that the event caused the increase in cases. This is an example of an inductive

74
00:06:57,840 --> 00:07:03,540
argument. But inductive reasoning is inherently uncertain and only deals with the extent to which

75
00:07:03,540 --> 00:07:09,000
the conclusion is credible given that the premises are logically sound. And in most examples we've seen

76
00:07:09,000 --> 00:07:13,560
they are logically sound; they're simply not plausible enough for the generalization made.

77
00:07:15,120 --> 00:07:20,160
So given this, protecting vulnerable visualizations could be operationalized by

78
00:07:20,160 --> 00:07:25,680
making misinformation arguments less credible and promoting skepticism.
Importantly, this should be

79
00:07:25,680 --> 00:07:29,940
done through salient features of the chart design, such that they cannot be missed or cropped out.

80
00:07:30,840 --> 00:07:36,060
For example, let's take a look at a COVID chart where an Event X might have caused cases

81
00:07:36,060 --> 00:07:42,720
in Country A. We can make this proposition less credible by, for example, providing case charts

82
00:07:42,720 --> 00:07:49,200
of other regions to not allow for cherry-picked examples, or visualizing important events that are

83
00:07:49,200 --> 00:07:55,680
likely to truly explain a rise in cases---such as the appearance of a new variant, or visualizing

84
00:07:55,680 --> 00:08:01,080
the uncertainty in death estimates that may stem from low testing rates or methodological issues.

85
00:08:04,260 --> 00:08:10,380
In conclusion, we find that visual tricks are not the main driver of visual misinformation online;

86
00:08:10,380 --> 00:08:15,300
reasoning errors are. The majority of misleading visualizations in our

87
00:08:15,300 --> 00:08:20,460
data set are screenshots of charts from reputable sources. And if a chart is not

88
00:08:20,460 --> 00:08:24,600
designed with a biased reading in mind, it is vulnerable to misinformation arguments.

89
00:08:26,880 --> 00:08:27,480
Thank you!
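
[Editor's note, not part of the talk: the CDC county example earlier in the talk turns on a base-rate question: the share of cases among vaccinated people says little without the share of vaccinated people in the population. The sketch below illustrates that arithmetic. Every number in it is hypothetical and chosen only for illustration; none comes from the CDC report or from the paper, and the function name is illustrative.]

```python
def rate_per_100k(cases: int, people: int) -> float:
    """Cases per 100,000 people in a group."""
    return cases / people * 100_000


# HYPOTHETICAL county: all figures below are made up for illustration.
vaccinated_people = 80_000     # assumed 80% of a 100,000-person county
unvaccinated_people = 20_000
cases_vaccinated = 300         # the larger bar segment in a chart like the one shown
cases_unvaccinated = 200

# What the chart makes salient: most cases are among the vaccinated.
share_of_cases_vaccinated = cases_vaccinated / (cases_vaccinated + cases_unvaccinated)

# What the chart leaves out: each group's size, which gives the per-person rate.
rate_vaccinated = rate_per_100k(cases_vaccinated, vaccinated_people)
rate_unvaccinated = rate_per_100k(cases_unvaccinated, unvaccinated_people)

print(f"share of cases among vaccinated: {share_of_cases_vaccinated:.0%}")
print(f"cases per 100k, vaccinated:      {rate_vaccinated:.0f}")
print(f"cases per 100k, unvaccinated:    {rate_unvaccinated:.0f}")
```

[With these made-up numbers, vaccinated people account for most of the cases yet have a lower per-person case rate than everyone else, which is why the vaccination rate of the general population is the missing piece in the tweet's argument.]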