# Data Visualization

Charts can be more memorable, shareable, and quickly understood than written explanations. They help explore data, explain concepts, and [share information effectively](https://www.scientificdiscovery.dev/p/salonis-guide-to-data-visualization). Clear visuals strengthen [[Communication]] and [[Dashboards]].

## Why Visualize Data

- Visualizing data helps spot patterns, trends, and unusual data points that are hard to see in averages or summaries alone. A chart can reveal what an aggregate hides (e.g., [Anscombe's quartet](https://en.wikipedia.org/wiki/Anscombe%27s_quartet)).
- Diagrams can explain concepts faster than text. A few-second visual can replace a long, confusing explanation.
- A good chart communicates faster than 1,000 words, but that power comes with responsibility. Misleading charts spread just as easily as accurate ones.
- Plotting data helps spot potential errors and artefacts before publishing.

## Chart Type Selection

- Pick the chart type and [[Metrics|metric]] that answer the exact question. Rates, counts, and shares reveal different truths, so show multiple small views when one cut feels incomplete.
- Use familiar or practical units (minutes, not standard deviations) when possible. They're easier to interpret and sense-check.

## Clarity

- Keep labels horizontal and close to the data. Direct labels beat legends.
- Don't include a legend; instead, [label data series directly](https://www.eugenewei.com/blog/2017/11/13/remove-the-legend) in the plot area (usually to the right of the most recent data point). Exception: many categories referring to many elements (e.g., maps).
- Use small multiples when too many lines overlap. Splitting into panels makes individual trends easier to follow, though it trades off direct comparison between entities.
- Sort categories logically (inherent order) or alphabetically (easier to skim).
- [Data looks better naked](https://www.darkhorseanalytics.com/blog/data-looks-better-naked).
  - Reduce non-data ink as much as possible without losing communicative power.
- Don't include more precision than needed.
- Format axis labels to match the figures being measured (e.g., currency for dollars).
- Check axis label spacing and widen the tick intervals if labels are crowded.

## Color

- Match colors to concepts (plants → green, bad → red) so readers aren't forced into a [Stroop test](https://en.wikipedia.org/wiki/Stroop_effect).
- Use [color-blind friendly palettes](https://davidmathlogic.com/colorblind/). About 4-5% of the population has some form of color blindness.
- Direct labeling also helps color-blind readers distinguish categories.

## Axes

- Start your y-axis at zero (assuming no negative values).
- Avoid deceptive scale tricks.
  - Leave breathing room on axes instead of extreme zoom.
  - The lowest point shouldn't appear to be the lowest possible value.
- Pair relative effects with absolute numbers (and consider prediction intervals rather than confidence intervals) to show real-world risk.

## Context

- Include explanations for anomalous events directly on the graph.
- For unfamiliar chart types, guide readers with annotations. Add a mini-tutorial if needed.
- Draw targets as reference lines so audiences can see whether you're on track.
- Make the chart standalone. Add purpose, units, timeframe, and source so it can travel without losing meaning and slot into [[Dashboards]] or memos without extra explanation.
- Make the graph's title its conclusion or key takeaway.
- Always note the data source below the graph.

## Reproducibility

- Publish provenance with the chart (data source, assumptions, and ideally a link to code) so others can verify or reuse it and keep [[Data Practices]] consistent.
- A chart with no source isn't much better than claiming a trend was revealed in a dream.
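
One way to make the provenance advice concrete is to save a small sidecar record next to every chart file. The sketch below is only an illustration: the `ChartProvenance` class, its field names, and the example values are all hypothetical placeholders, not a standard format or real data. It refuses to serialize a record that lacks the fields a standalone chart needs (title, source, units, timeframe).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class ChartProvenance:
    """Hypothetical sidecar record; field names are illustrative, not a standard."""

    title: str        # the chart's takeaway, as shown in its title
    data_source: str  # where the numbers came from
    units: str
    timeframe: str
    assumptions: List[str] = field(default_factory=list)
    code_url: Optional[str] = None  # link to the plotting code, if public


def missing_fields(record: ChartProvenance) -> List[str]:
    """Return the names of required text fields that are empty or whitespace."""
    required = ("title", "data_source", "units", "timeframe")
    return [name for name in required if not getattr(record, name).strip()]


def to_sidecar_json(record: ChartProvenance) -> str:
    """Serialize the record so it can be saved next to the chart file."""
    gaps = missing_fields(record)
    if gaps:
        raise ValueError(f"chart is not standalone yet, missing: {gaps}")
    return json.dumps(asdict(record), indent=2, sort_keys=True)


# Placeholder values only; none of this is real data.
example = ChartProvenance(
    title="Placeholder takeaway: metric X rose after change Y",
    data_source="https://example.org/dataset (placeholder)",
    units="minutes",
    timeframe="2020-2023",
    assumptions=["Weekends excluded"],
)
sidecar = to_sidecar_json(example)
print(sidecar)
```

Keeping the record as plain JSON means it can travel with the image into a dashboard or memo, and the same text can be reused for the source note below the graph.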

## Common Pitfalls

- Skip arrows or other glyphs that imply trends you can't support.
- Don't use 3D charts. They distract and make values harder to read.
- Avoid confidence intervals when showing variability. They're often [misread as the range of individual outcomes](https://www.dangoldstein.com/papers/Hofman_Goldstein_Hullman_Visualizing_Uncertainty_Mislead_Scientific.pdf). Consider prediction intervals or underlying percentages instead.
- Try not to have too many data series; 5-8 is the usual limit depending on clustering.

## Guiding Questions

Ask yourself when creating a visualization:

1. Is my chart type meaningful for the question?
2. Can I make it clearer?
3. If complicated, can I guide the viewer through it?
4. Does the chart work on its own?
5. Is my chart's presentation justifiable?
6. Is my chart reproducible?

## Tools

- [Datawrapper](https://www.datawrapper.de/). Quick interactive charts with great defaults.
- [Raw Graphs](https://app.rawgraphs.io/). Open-source, unusual chart types.
- [Observable Plot](https://observablehq.com/plot/). JavaScript-based exploratory charts.
- [Kepler](https://kepler.gl/). Geospatial visualization.

## Resources

- [Saloni's Guide to Data Visualization](https://www.scientificdiscovery.dev/p/salonis-guide-to-data-visualization)
- [The Data Visualization Catalogue](https://datavizcatalogue.com/) and [Data Viz Project](https://datavizproject.com/)
- [Visualization Curriculum](https://jjallaire.github.io/visualization-curriculum/)
- [Guides for Visualizing Reality](https://flowingdata.com/2020/06/01/guides-for-visualizing-reality/)
- [Datawrapper's Do's and Don'ts](https://www.datawrapper.de/blog/category/datavis-dos-and-donts)
- [The Science of Visual Data Communication: What Works](https://journals.sagepub.com/doi/10.1177/15291006211051956)
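
## Example: Anscombe's Quartet in Numbers

The point under Why Visualize Data, that a chart can reveal what an aggregate hides, can be checked with a few lines of standard-library Python. This is a minimal sketch: the helper `pearson` and the `summary` layout are my own naming, and the values are the four classic Anscombe datasets. All four share nearly the same mean, variance, and correlation, yet look completely different when plotted.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed from deviations about the means."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Per dataset: (mean of x, mean of y, sample variance of x, correlation).
summary = {
    name: (mean(xs), round(mean(ys), 2), variance(xs), round(pearson(xs, ys), 2))
    for name, (xs, ys) in QUARTET.items()
}

for name, row in summary.items():
    print(name, row)
```

The printed rows are identical to two decimal places, which is exactly why a summary table alone would miss the curve in dataset II and the outliers in III and IV. Plotting each dataset as its own scatter panel (small multiples) makes the differences obvious.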