Data Science – References

Texts

  • S.Few [2012], Show Me the Numbers: Designing Tables and Graphs to Enlighten, Amazon.ca
  • S.Few [2009], Now You See It: Simple Visualization Techniques for Quantitative Analysis, Amazon.ca
  • D.M.Wong [2013], The Wall Street Journal Guide to Information Graphics: The Do’s And Don’ts Of Presenting Data Facts And Figures, Amazon.ca
  • C.Nussbaumer Knaflic [2015], Storytelling with Data: A Data Visualization Guide for Business Professionals, Amazon.ca
  • N.Yau [2011], Visualize This: The FlowingData Guide to Design, Visualization, and Statistics, Amazon.ca
  • N.Yau [2013], Data Points: Visualization That Means Something, Amazon.ca
  • E.R.Tufte [2006], Beautiful Evidence, Amazon.ca
  • E.R.Tufte [2001], The Visual Display of Quantitative Information, (2nd ed.), Amazon.ca
  • E.R.Tufte [1990], Envisioning Information, Amazon.ca
  • E.R.Tufte [1997], Visual Explanations: Images and Quantities, Evidence and Narrative, Amazon.ca
  • D.McCandless [2012], Visual Miscellaneum, The Revised And Updated: A Colorful Guide to the World’s Most Consequential Trivia, Amazon.ca
  • D.McCandless [2014], Knowledge Is Beautiful: A Visual Miscellaneum of Compelling Information, Amazon.ca
  • N.Iliinsky, J.Steele [2011], Designing Data Visualizations: Representing Informational Relationships, Amazon.ca
  • H.Wainer [2009], Picturing the Uncertain World: How to Understand, Communicate, and Control Uncertainty Through Graphical Display, Amazon.ca
  • W.Lefèvre, J.Renn, U.Schoepflin (eds.) [2003], The Power of Images in Early Modern Science, Amazon.ca
  • P.Murrell [2006], R Graphics, available online
  • J.Leek [2015], The Elements of Data Analytic Style, leanpub
  • J.Avirgan [2016], The Map That May Unmask Banksy, FiveThirtyEight
  • A.Bycoffe [2016], The Endorsement Primary, FiveThirtyEight
  • 2016 National Primary Polls, FiveThirtyEight
  • N.Yau [2016], Data USA makes government data easier to explore, Flowing Data
  • E.Lamb [2016], It Doesn’t Add Up,
  • E.Lamb [2012], Abandoning Algebra Is Not the Answer, Scientific American
  • E.Lamb [2016], Andrew Hacker and the Case of the Missing Trigonometry Question, Scientific American
  • N.Yau [2016], Data Proofer automates the data checking process, Flowing Data
  • K.Dutton, D.Abrams [2016], What Research Says about Defeating Terrorism, Scientific American
  • C.Aschwanden [2016], Failure Is Moving Science Forward, FiveThirtyEight
  • R.Matin, R.Azizi [2015], DEA with Missing Data: An Interval Data Assignment Approach, JOIE
  • R.Wasserstein, N.Lazar [2016], The ASA’s statement on p-values: context, process, and purpose, The American Statistician
  • T.Siegfried [2016], Experts issue warning on problems with P values, Science News
  • R.Arthur [2016], We Now Have Algorithms To Predict Police Misconduct, FiveThirtyEight
  • N.Yau [2016], What I Use to Visualize Data, FlowingData
  • C.Aschwanden [2016], Statisticians Found One Thing They Can Agree On: It’s Time To Stop Misusing P-Values, FiveThirtyEight
  • J.Honaker, G.King, M.Blackwell, Amelia II: A Program for Missing Data, Gary King
  • M.Blackwell, J.Honaker, G.King, A Unified Approach to Measurement Error and Missing Data: Overview and Applications,
  • Y.Zhou, D.Wilkinson, R. Schreiber, R.Pan, Large-scale Parallel Collaborative Filtering for the Netflix Prize, PDF
  • N.Yau [2016], Vega-Lite for quick online charts, Flowing Data
  • B.D.Craven, S.M.N.Islam [2005], Operations Research Methods, Flowing Data
  • M.Panza, D.Napoletani, D.Struppa [2010], Agnostic Science. Towards a Philosophy of Data Analysis, HAL
  • C.Paciorek [2014], An Introduction to Using Distributed File Systems and MapReduce through Spark,
  • J.Cranshaw, R.Schwartz, J.Hong, N.Sadeh [2012], The Livehoods Project: Utilizing Social Media to Understand the Dynamics of a City,
  • Life expectancy at birth, Gapminder
  • Gapminder World 2012 in pdf, Gapminder
  • K.Hsu, N.Pathak, J.Srivastava, G.Tschida, E.Bjorklund [2015], Data Mining Based Tax Audit Selection: A Case Study of a Pilot Project at the Minnesota Department of Revenue,
  • N.Yau [2010], Think Like a Statistician – Without the Math, Flowing Data
  • N.Lorang [2016], Data scientists mostly just do arithmetic and that’s a good thing,
  • N.Yau [2010], Predictive policing,
  • University of Minnesota Duluth Lectures,
  • Artistic License – Statistics, Tvtropes
  • 23 Design, Data Visualization and Presentation Quotes from Edward Tufte, Tvtropes
  • J. DeCoster [2001], Transforming and Restructuring Data,
  • N.Yau [2016], Role of empathy in visualization, Flowing Data
  • Center for Big Data Ethics, Law, and Policy, Data Science Institute
  • A.Barry-Jester [2016], What Went Wrong In Flint, FiveThirtyEight
  • J.Avirgan [2016], A History Of Data In American Politics (Part 2): Obama 2008 To The Present, FiveThirtyEight
  • C.Bialik [2016], Why Betting Data Alone Can’t Identify Match Fixers In Tennis, FiveThirtyEight
  • F.Jacobs [2016], A World Map of Economic Growth, Big Think
  • W.Briggs [2016], Machine Learning, Big Data, Deep Learning, Data Mining, Statistics, Decision & Risk Analysis, Probability, Fuzzy Logic FAQ, WILLIAM M. BRIGGS
  • Imagine Storing All the World’s Archives in a Box of Seeds, New Scientist
  • S.Few [2011], The Chartjunk Debate,
  • H.Enten [2015], Harry’s Guide To 2016 Election Polls, FiveThirtyEight
  • A.Gefter [2015], A Private View of Quantum Reality, Quanta Magazine
  • N.Yau [2015], R growth on StackOverflow reigns supreme, Flowing Data
  • C.Bialik [2015], As A Major Retraction Shows, We’re All Vulnerable To Faked Data, FiveThirtyEight
  • A European defense ministry revamps its logistics strategy and operations, McKinsey&Company
  • JF.Portarrieu [2013], City of Toulouse, IBM
  • K.Bonnes [2014], Predictive Analytics for Supply Chains: a Systematic Literature Review,
  • J. Bencina [2011], Fuzzy Decision Trees as a Decision-making Framework in the Public Sector,
  • The role of quantitative techniques in decision making process, Essay UK
  • R.Larson [2002], Public Sector Operations Research: A Personal Journey,
  • Real-Time Enterprise Stories, Real Time Research REPORTS
  • Decision Science for Housing and Community Development: An interview with co-author Michael Johnson, Statistics Views
  • City of Almere: Statistical analysis and predictive analytics allocate resources to citizens while planning for growth, IBM
  • Woonbedrijf improves tenants’ quality of living, IBM Software
  • M.Rockwell [2015], DHS to expedite data scans for foreign fighters, FCW
  • M.Hansen, A.Stermberg [2015], NOAA’s Data Heads for the Clouds, the White House
  • D.Major [2015], Open data, analytics key to Police Data Initiative, GCN
  • L.Cornish [2015], Data in action: The role of data in humanitarian disasters, Devex
  • Statisticians using social media to track foodborne illness and improve disaster response, PHYS.ORG
  • Z.Mendelson [2015], Cities Can Use Big Data to Find Out What They Really Don’t Know, Next City
  • N.Bishop [2015], Jen Q. Public: Governments can win the improper payment chase with analytics, IBM
  • J.Shueh [2014], Minneapolis Launches Citywide Analytics Platform, Government Technology
  • N.Bishop [2015], Public Sector News: How data and analytics promise a different future, IBM
  • N.Bishop [2015], Public Sector News: The question of citizen’s privacy, IBM
  • N.Bishop [2015], Public Sector News: How governments can unleash the power of analytics, IBM
  • B.Cortez-Neavel [2015], Data Analytics, Prevention Efforts Could Drive Down Child Deaths, The Chronicle of Social Change
  • The Benefits of Analytics in the Public Sector, JMP
  • H.Nicol Local Government and digital services: options for improving local services, Public Service Transformation Network
  • S.Bateman [2014] The Data Science in Government programme: using data in new ways to improve what government does, GOV.UK
  • Big Data for Development: Technocratic & Democratic Considerations, K
  • A.Syvajarvi, J.Stenvall Data mining in public and private sectors: organizational and government applications, Google Books
  • M.Gasco [2012] Proceedings of the 12th European Conference on e-Government, Google Books
  • Y.Zhao R and Data Mining: Examples and Case Studies, Google Books
  • G.K.Gupta Introduction to Data Mining with Case Studies, Google Books
  • P.Putten, G.Melli, B.Kitts Data Mining Case Studies,
  • M.Nguyen-Nielsen, et al., Existing data sources for clinical epidemiology: Danish registries for studies of medical genetic diseases,
  • N.Yau [2016] Using information graphics to calibrate bias, Flowing Data
  • Accounting for Errors with a Non-Normal Distribution, Engineering Statistics Handbook
  • Opinion Research Poll, CNN Opinion Research
  • G.Dvorsky [2014] Computers are providing solutions to math problems that we can’t check, iO9
  • Missing-data imputation, Stat Columbia
  • P. Allison [2012] Modern Methods for Missing Data, Amstat
  • C.Wild [2012] The Wilcoxon Rank-Sum Test, Stat Auckland
  • A.Pan, et al. [2013] Walnut Consumption Is Associated with Lower Risk of Type 2 Diabetes in Women, The Journal of Nutrition
  • E.Inglis-Arkell [2012] Why the Exact Same Lottery Numbers Came Up Twice in One Week, iO9
  • H.Nolan [2014] Exonerations Are on the Rise. Justice Is Not., GAWKER
  • S.Nieuwenhuis, B.Forstmann, E.Wagenmakers [2011] Erroneous analyses of interactions in neuroscience: a problem of significance, Nature Neuroscience
  • Significant, Explain XKCD
  • Log Scale, Explain XKCD
  • D.Hand [2014] Math Explains Likely Long Shots, Miracles and Winning the Lottery, Scientific American
  • A.Koo [2013] A Decade After Moneyball, Have The A’s Found A New Market Inefficiency?, Regressing
  • M.Enserink [2012] Fraud Detection Method Called Credible But Used Like an ‘Instrument of Medieval Torture’, Science
  • R.Harder [2010] How To Generate Your Own Benford’s Law Numbers, Think Harder
  • R.Nuzzo [2014] Scientific method: Statistical errors, nature.com
  • J.P.Ioannidis [2014] Why most published research findings are false, NCBI
  • D.Stapel, S.Lindenberg [2011] Coping with Chaos: How Disordered Contexts Promote Stereotyping and Discrimination, Science
  • E.Callaway [2011] Report finds massive fraud at Dutch universities, nature.com
  • E.Yong [2012] The data detective, nature.com
  • E.Yong [2012] Replication studies: Bad copy, nature.com
  • This Website Exposes a Scientific and Medical Cover Up, nature.com
  • J.Walthoe This Website Exposes a Scientific and Medical Cover Up, nature.com
  • J.Walthoe Looking out for number one, +plus
  • A.Frazier, et al. [2013] Prospective Study of Peripregnancy Consumption of Peanuts or Tree Nuts by Mothers and the Risk of Peanut or Tree Nut Allergy in Their Offspring, JAMA Pediatrics
  • R.Shapiro Prospective Study of Peripregnancy Consumption of Peanuts or Tree Nuts by Mothers and the Risk of Peanut or Tree Nut Allergy in Their Offspring, JAMA Pediatrics
  • J.Dempsey Our Army: Soldiers, Politics, and American Civil-Military Relations, Princeton Press
  • K.Button, et al. [2013] Power failure: why small sample size undermines the reliability of neuroscience, nature.com
  • Public Health England [2014] Measles: guidance, data and analysis, GOV.UK
  • The statisticians at Fox News use classic and novel graphical techniques to lead with data, Simply Statistics
  • N.Yau [2011] Open thread: Can you spot the wrongness in this tax graph?, Flowing Data
  • A. Hart Lies, damn lies, and the ‘Y’ axis, Washington Post
  • A Guide for the statistically perplexed, Polling
  • Lies, Damned Lies, and Statistics, tvtropes
  • A Little Statistics is a Dangerous Thing, TheNib
  • E.Inglis-Arkell [2014] The night the Gambler’s Fallacy lost people millions, iO9
  • E.Inglis-Arkell [2014] Statistics professor challenges midwives’ math on home birth safety, iO9
  • M.Cheyney, et al. [2014] Outcomes of Care for 16,924 Planned Home Births in the United States: The Midwives Alliance of North America Statistics Project, 2004 to 2009, Wiley Online Library
  • R.Misra [2014] One graph explaining why you should always order a larger pizza, iO9
  • P.Clarke [2014] Title IX’s Other Effects: Do Sports Make Women Less Religious?, Regressing
  • B.Barnwell [2014] Bridging the Analytics Gap, Grantland
  • K.Wagner [2014] Two Days At Sloan: How Sports Analytics Got Lost In The Fog, Regressing
  • M.Bruenig [2014] America’s Class System Across The Life Cycle, Demos
  • G.Bluestone [2014] Casino Says World-Famous Gambler Cheated It Out of $10 Million, GAWKER
  • R.Gonzalez [2014] Our New Favorite Website: Spurious Correlations, iO9
  • E.Inglis-Arkell [2014] One Mistake Fooled an Entire Nation About Who Would Be President, iO9
  • N.Yau [2014] Military infographic fascination, iO9
  • J.Raff [2014] How to Read and Understand a Scientific Paper: A Step-by-Step Guide for Non-Scientists, Huffpost Science
  • J.Lepore [2014] The Disruption Machine, The New Yorker
  • N.Yau [2014] Detailed UK census data browser, Flowing Data
  • J.Pinto da Costa, L. Roque [2006] Limit Distribution for the Weighted Rank Correlation Coefficient, REVSTAT
  • A.Weinstein [2014] Adam Weinstein’s Discussions, GAWKER
  • D.Thompson [2014] The Misguided Freakout About Basement-Dwelling Millennials, The Atlantic
  • R.Gonzalez [2014] Statistical Proof That Lionel Messi Is the Best Soccer Player On Earth, iO9
  • D.Mersereau [2014] Why Is a 30% Chance of Rain Different from a 30% Risk of Tornadoes?, The Vane
  • S.Wolfram [2013] Data Science of the Facebook World, Stephen Wolfram
  • B.Fung [2012] The Global Geography of HIV: 20 Years of Change—in 1 GIF, The Atlantic
  • H.Brady [2013] Watch the Country Get Fatter in One Animated Map, Slate
  • R.Gonzalez [2014] U.S. Remains Key Growth Market for Cigarettes, Despite Graphs Like This, iO9
  • A.Newitz [2014] Can Network Theory Help Explain Epic Mythology?, iO9
  • Hawkingdo [2014] I Solved Gerrymandering … sorta!, GERRYMANDERING
  • N.Silver [2014] Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?, FiveThirtyEight
  • B.Morris [2014] Billion-Dollar Billy Beane, FiveThirtyEight
  • E.Lamb [2014] British Rail’s Shocking Defiance of Standard Metrics, Scientific American
  • N.Yau [2014] How well we don’t understand probability, Flowing Data
  • N.Silver [2010] BREAKING: Daily Kos to Sue Research 2000 for Fraud, FiveThirtyEight
  • M.Strauss [2014] Statistician Creates Model To Predict What’s Next In Game Of Thrones, iO9
  • A.Burneko [2014] Numbers One Through 12, Ranked, The Concourse
  • G.Dvorsky [2014] Why The Sudden Surge Of Retractions At Nature Magazine?, iO9
  • S.Burtch [2014] Hockey Analytics: Why They Help And What’s Coming Next, SB Nation
  • R.Gonzalez [2014] How Much Would It Cost To Raise A Kid Like Calvin from Calvin and Hobbes?, iO9
  • Simply Statistics [2014] Data science can’t be point and click, Simply Statistics
  • S.Corinaldi [2015] I created a bot to find love online – reader, it worked, The Guardian
  • N.Yau [2015] The Elements of Data Analytic Style, Flowing Data
  • N.Yau [2015] The Price is Right winner and cancer survivor calculates the odds, Flowing Data
  • N.Yau [2015] Searching for stock market spoofers, Flowing Data
  • C.Bialik [2015] Scare Headlines Exaggerated The U.S. Crime Wave, FiveThirtyEight
  • J.Asher [2015] Murder Rates Don’t Tell Us Everything About Gun Violence, FiveThirtyEight
  • R.Ehrenberg [2015] Analysis gives a glimpse of the extraordinary language of lying, Science News
  • N.Yau [2015] The Most Regional Names in US History, Flowing Data
  • Thanksgiving in Charts and Graphs, The Gentlemans Armchair
  • N.Yau [2014] Lexical distance between European languages, Flowing Data
  • P.Murrell [2014] R Graphics, R Graphics
  • N.Yau [2010] How to visualize data with cartoonish faces à la Chernoff, Flowing Data
  • A Critique of Chernoff Faces, eagereyes
  • R.Misra [2014] 6 P.M. is the most dangerous time of day to be a pedestrian, iO9
  • C.Anders [2014] Fascinating Chart: Top 20 Metropolitan Areas in the U.S.A., 1790-2010, iO9
  • K.Wagner [2014] Every NBA Team’s Season, In One Chart, Regressing
  • T.Ley [2014] Interactive Chart Finds Your New Favorite Beer For You, FoodSpin
  • R.Misra [2014] A graph showing all the languages whose words invaded English, iO9
  • N.Yau [2014] How people really read and share online, Flowing Data
  • R.Fischer-Baum [2014] Which Countries Have Produced The Most World-Famous Athletes?, Regressing
  • N.Yau [2014] Level of road grid, Flowing Data
  • N.Yau [2014] A visual analysis of the Boston subway system, Flowing Data
  • Logistic Modeling with Categorical Predictors, SAS
  • Stressed Out: Americans Tell Us About Stress In Their Lives, NPR
  • N.Yau [2014] Polling for stress, Flowing Data
  • B.Swihart, et al. Lasagna plots: A saucy alternative to spaghetti plots, Lasagna plots
  • What’s the difference between an Infographic and a Data Visualisation?, Jackhagley
  • J.Pavlus, et al. Infographic: If 7 Billion People Lived In One City, How Big Would It Be?, Co.Design
  • Left vs Right v1.5, Information is Beautiful
  • Mike [2011] Most Pirated Artists 2007 – 2010 Word Cloud, The Evil Jam
  • M.Hahsler, S.Chelluboina, Visualizing Association Rules: Introduction to the R-extension Package arulesViz, Visualizing Association Rules
  • R.Misra [2014] An Interactive Chart Showing Which Jobs STEM Majors Really End Up In, iO9
  • N.Yau [2014] Markov Chains explained visually, Flowing Data
  • M.Strauss [2014] Here’s What Your 1.1 Million Comments On Net Neutrality Look Like, iO9
  • N.Yau [2014] State of birth, by state and over time, Flowing Data
  • N.Yau [2014] Finding small villages in big cities, Flowing Data
  • G.Dvorsky [2014] These Simple Tips Will Make Your Science Visualizations Rock, iO9
  • M.Strauss [2014] Transforming Data Into Beer Could Be The Greatest Idea Ever, iO9
  • R.Misra [2015] What Visualization Best Illustrated A Tricky Scientific Concept For You?, iO9
  • N.Yau [2014] Real Chart Rules to Follow, Flowing Data
  • N.Yau [2015] Bar Chart Baselines Start at Zero, Flowing Data
  • N.Yau [2015] Venn Diagrams: Read and Use Them the Right Way, Flowing Data
  • N.Yau [2015] Classic 1939 book on graphs in its entirety, Flowing Data
  • N.Yau [2015] Weight loss and life events, Flowing Data
  • N.Yau [2015] What probability means in different fields, Flowing Data
  • N.Yau [2015] What Does Probability Mean in Your Profession?, Math With Bad Drawings
  • N.Yau [2015] A timeline of history, FlowingData
  • N.Yau [2015] Work Counts, FlowingData
  • N.Yau [2015] Most Common Use of Time, By Age and Sex, FlowingData
  • A.Crossman [2016] Data Cleaning, About Education
  • Data Cleaning, Analysis
  • Top ten ways to clean your data, Microsoft
  • R.Cody, et al., Data Cleaning 101, UCLA
  • T.Orchard, M.Woodbury, A Missing Information Principle: Theory and Applications, Project Euclid
  • P.Allison, Modern Methods for Missing Data, amstat
  • Regression diagnostics and cautions: outliers and influential points, uoregon
  • H.Wickham Tidy Data, Journal of Statistical Software
  • V.Powell Conditional probability, Setosa
  • Qualities of a Good Question, StatPac
  • GOOD DATA FROM BAD QUESTIONS? IMPOSSIBLE!, Cooperative Extension
  • Electronic Information Resources – Myth and Reality, stsci
  • M.Püschel, Small Guide to Making Nice Tables, Carnegie Mellon
  • N.Yau [2014] The important parts of data analysis, FlowingData
  • T.Hothorn, et al. [2006], party: A Laboratory for Recursive Partytioning, R package
  • Z.Weinersmith [2014], An artificial one-liner generator, Scientia Salon
  • N.Webb [2006], Reliability Coefficients and Generalizability Theory, Handbook of Statistics
  • A.Cernat [2013], The impact of mixing modes on reliability in longitudinal studies, ESRC
  • B.Tran, C.Tucker [2010], Using Latent Class Models to Better Understand Reliability in Measures of Labor Force Status, JSM 2010
  • R.Fischer-Baum [2014], Charts: Your Spending Habits Get Lamer As You Age, Regressing
  • G.Dvorsky [2014], 20 Crucial Terms Every 21st Century Futurist Should Know, iO9
  • C.Proust-Lima, et al. [2016], Package ‘lcmm’-Extended Mixed Models Using Latent Classes and Latent Processes, R package
  • Z.Bursac, et al. [2008], Purposeful selection of variables in logistic regression, biomedcentral
  • Y.Zhang [2011], Dimension Reduction, Dimension Reduction Slides
  • Organisational Core Values, Organisational Core Values
  • N.Yau [2013], Getting started with visualization after getting started with visualization, Flowing Data
  • B.Fry [2004], Computational Information Design, Massachusetts Institute of Technology
  • N.Yau [2014], A more visual world data portal, Flowing Data
  • S.Boriah, et al., Similarity Measures for Categorical Data: A Comparative Evaluation, University of Minnesota
  • P.Allison What’s the Best R-Squared for Logistic Regression?, statistical horizons
  • The curse of dimensionality, The Shape of Data
  • Decision Trees, The Shape of Data
  • Duality and Coclustering, The Shape of Data
  • S.Fefilatyev, et al., Detection of Anomalous Particles from Deepwater Horizon Oil Spill Using SIPPER3 Underwater Imaging Platform, Proceedings Template – WORD
  • Pre-Crime Data Mining, Pre-Crime Data Mining
  • A.Bellaachia, E.Guven Predicting Breast Cancer Survivability Using Data Mining Techniques, Predicting Breast Cancer Survivability
  • J.Rath [2014], Data Scientists Predict Oscar Winners, Data Center Knowledge
  • E.Lamb [2014], The Saddest Thing I Know about the Integers, Scientific American
  • V.Velickovic, What Everyone Should Know about Statistical Correlation, American Scientist
  • N.Silver, Rich Data, Poor Data, FiveThirtyEight
  • A.Hoorfar, M.Hassani [2008], Inequalities on the Lambert W Function and Hyperpower Function, JIPAM
  • Investopedia Staff, A Beginner’s Guide To Hedging, investopedia
  • T.Yates, Practical And Affordable Hedging Strategies, investopedia
  • M.Kang [2015], Exploring the 7 Different Types of Data Stories, mediashift
  • H.Chen [2014], Curve Fitting & Multisensory Integration, cogsci.ucsd.edu
  • T.Minka, Building statistical models by visualization, Microsoft Research
  • Y.Zhao [2015], R and Data Mining: Examples and Case Studies, r data mining
  • Y.Zhao [2015], Introduction to Data Mining with R, r data mining
  • D. Meyer [2015], Support Vector Machines, r-project
  • A.Fatahi [2010], Truncated Zero Inflated Binomial Control Chart for Monitoring Rare Health Events, IJRRAS
  • A.Lazarevic, et al. [2004], Data Mining for Analysis of Rare Events: A Case Study in Security, Financial and Medical Applications, University of Minnesota Tutorial
  • D.Farace, J.Schöpfel, Grey Literature in Library and Information Studies, DE GRUYTER
  • A Practical Guide to Statistics for Online Experiments, optimizely


Blogs and Sites

General

Survey and Sampling

Compilations

  • A.Shienkman [2015], Our 47 weirdest charts from 2015, FiveThirtyEight
  • N.Yau [2015], 10 Best Data Visualization Projects of 2015, FlowingData

Code to Produce Graphics

General Code

  • SAS User Guide
  • L. Gau, SAS Global Forum: Write SAS Code to Generate Another SAS Program,
  • H.Wickham, Optimising code,
  • P.Gill, E.Wong, [2014], Methods for Convex and General Quadratic Programming, ucsd
  • C.Gohlke, Unofficial Windows Binaries for Python Extension Packages, University of California, Irvine
  • Visualizing the distribution of a dataset, stanford
  • Emacs Newbie Key Reference, emacswiki
  • N.Yau [2015], Extract data from PDF files and export to CSV, flowing data
  • J.Salvatier, et al., Probabilistic Programming in Python using PyMC, PyMC3
  • Scatterplots, Quick-R
  • Adding a legend to a plot, r-bloggers
  • How I used R to create a word cloud, step by step, Georeferenced
  • Axes and Text, Quick-R
  • SVM example with Iris Data in R, github
  • Cheatsheet – 11 Steps for Data Exploration in R (with codes), analytics vidhya
  • R.Hamer, P.Simpson, SAS Tools for Meta-Analysis, SAS
  • C.Sheu, S.Suzuki [2001], Meta-analysis using linear models, citeseerx
  • R.Butterfield, [2009], The Use of SAS in Meta-Analysis, ncsu.edu
  • J.Gloudemans, et al. [2011], MV_META: A SAS Macro for Multivariate Meta-Analysis, SESUG 2011
  • M.Komaroff, [2012], APPLICATION OF META-ANALYSIS IN CLINICAL TRIALS, PharmaSUG
  • S.Kovalchik, [2013], Tutorial On Meta-Analysis In R, R useR! Conference 2013
  • A.C.Del Re, [2015], A Practical Tutorial on Conducting Meta-Analysis in R, The Quantitative Methods for Psychology
  • J.Rickert, [2014], R and Meta-Analysis, R bloggers

Debugging and Common Questions

Technical Details

  • M.Lin [2013], A color palette optimized for data visualization, MulinBlog
  • PyMC3,
  • Color Code, Coolors
  • Color Code_R,
  • Using colors in R,

Interactive/Dynamic/Animated Data Visualization

  • Keeping Up With the 2014 Winter Olympics, Washington Post (member required for access).
  • Sochi 2014 Winter Olympic Games Calendar, Sports Interaction
  • N.Yau [2016], How You Will Die, FlowingData
  • K.Collins [2015], Why Infectious Bacteria are Winning, Quartz
  • Bokeh, a Python interactive visualization library
  • D3.js, a JavaScript library for manipulating documents based on data
  • You Draw It: How Family Income Predicts Children’s College Chances, The Upshot, New York Times
  • R.Harris, N.Popovich, K.Powell [2015], Watch how the measles outbreak spreads when kids get vaccinated – and when they don’t, The Guardian
  • S.Yee, T.Chu [2015], A Visual Introduction to Machine Learning, part 1, R2D3.us
  • T.Randall, B.Migliozzi [2015], 2014 Was the Hottest Year on Record, Bloomberg
  • J.W.Tulp [2015], Goldilocks, TULP Interactive
  • This is What the Spread of Walmart Looks Like From 1962 to 2006, Cheezburger
  • Player Usage Charts, Hockey Abstract
  • N.Yau [2015], Automatic charts and insights in Google Sheets, FlowingData

Heat Maps

  • M.Simon [2012], The year in MLB heat maps, ESPN
  • Heat map, Wikipedia

Box Plots

Parallel Coordinates/Spaghetti Plots

Maps

  • N.Yau [2014], Where people run, FlowingData.
  • N.Yau [2014], Amount of snow to cancel school, FlowingData, reporting on redditor atrubetskoy’s map.
  • R.Misra [2014], A map of how much snow it takes to cancel school across the U.S., iO9, reporting on redditor atrubetskoy’s map.
  • N.Yau [2013], The most regional names in US history, FlowingData
  • An Unconventional Look at the European Map, The Dialogue
  • N.Yau [2016], Changing river path seen through satellite images, FlowingData
  • D.Walbert The mathematics of projections, LEARN NC
  • This is What the Spread of Walmart Looks Like From 1962 to 2006, Cheezburger
  • C.Maria [2014], Nine beautiful maps that will change how you see the world, The Weather Network
  • A.Newitz [2014], Map shows which countries are contributing the most to climate change, iO9
  • G.Dvorsky [2014], An interactive map showing how baby names spread across the US, iO9
  • Many ways to see the world, ODT Maps
  • Find all the countries of the world in the updated map, Gapminder
  • N.Yau [2014] How to Make an Interactive Treemap, Flowing Data
  • F.Jacobs Current Affairs: European Electricity Exports and Imports, Big Think
  • A.Liptak [2015] Data Visualization Shows How Segregated Our Cities Are, iO9
  • L.Czerniewicz [2015] A World Map Based on Scientific Research Papers Produced, iO9
  • F.Jacobs The Map as Persuader, Big Think
  • Plotting elevation maps and shaded relief images from latitude, longitude, and elevation pairs, StackExchange
  • What Makes a Map Beautiful?, StackExchange
  • A Model of Breast Cancer Causation, Breast Cancer
  • N.Yau [2014], Explorations of People Movements, Flowing Data
  • S.Sayad An Introduction to Data Mining, Saedsayad
  • S.Lynn Self-Organising Maps for Customer Segmentation using R, LinkedIn

Text Analysis

Queueing

  • Queueing Delay,
  • Y.Abdelkader, M.Al-Wohaibi [2011], Computing the Performance Measures in Queueing Models via the Method of Order Statistics, Journal of Applied Mathematics

Data Envelopment Analysis

  • R.Sale, M.Sale, Data Envelopment Analysis: A Primer for Novice Users and Students at all Levels,

Time Series

  • O.Anava, E.Hazan, A.Zeevi [2015], Online Time Series Prediction with Missing Data,
  • D.Fung [2006], Methods for the Estimation of Missing Values in Time Series,
  • M.Vlachos [2005], A practical Time-Series Tutorial with MATLAB,
  • Timeseries class, MathWorks
  • Time Series Decomposition, MathWorks
  • Parametric Trend Estimation, MathWorks
  • Seasonal Adjustment Using S(n,m) Seasonal Filters, MathWorks
  • Moving Average Trend Estimation, MathWorks
  • Seasonal Adjustment Using a Stable Seasonal Filter, MathWorks
  • Resample, MathWorks
  • Seasonal Adjustment, MathWorks
  • Statistics Austria, T.Wien [2012] Interactive adjustment and outlier detection of time dependent data in R, Conference of European Statisticians
  • B.Pecar [2012] Automating Time Series Analysis,
  • PROC X12 Statement, SAS
  • J.Honaker, G.King [2010] What to Do about Missing Values in Time-Series Cross-Section Data,
  • Mann-Kendall Test For Monotonic Trend,
  • Detrending,
  • PROC X12 Example, SAS
  • T.Jackson, M.Leonard Seasonal Adjustment Using the X12 Procedure,
  • Working With Time Series Data,
  • R.Peng [2016] Time Series Analysis in Biomedical Science – What You Really Need to Know,

Bayesian Analysis

  • Understanding empirical Bayes estimation (using baseball statistics),
  • J.Bowers, C.Davis, Bayesian Just-So Stories in Psychology and Neuroscience,
  • J.Horgan Are Brains Bayesian?,
  • H.Thornburg Introduction to Bayesian Statistics,
  • E.Yudkowsky An Intuitive Explanation of Bayes’ Theorem,
  • IS THE BRAIN BAYESIAN?,
  • J.Horgan [2016] Bayes’s Theorem: What’s the Big Deal?, Scientific American
  • Andrew [2008] Why I don’t like Bayesian statistics, Statistical Modeling, Causal Inference, and Social Science
  • Bayes’ Theorem with Lego, COUNT BAYESIE
  • T.Wiecki Bayesian data analysis with PyMC3, Quantopian Inc.
  • A.Gelman, et al. [2014] Stan: A platform for Bayesian inference, Columbia University
  • N.Yau [2012] Bayesian fantasy football 101, flowing data
  • Bayesian Data Analysis with PyMC3, github

Star Diagrams

  • Star (Spider/Radar) Plots and Segment Diagrams, R-Manual
  • Star Plots and Segment Diagrams of Multivariate Data, Basic R package

Chernoff Faces

  • Chernoff Faces, Wolfram MathWorld
  • R.Kosara [2007], A Critique of Chernoff Faces, eagereyes
  • N.Yau [2010], How to visualize data with cartoonish faces à la Chernoff, FlowingData
  • Chernoff Face, Wikipedia
  • The Trouble with Chernoff, Map Hugger
  • L.Golden, M.Sirdesai [1992] Chernoff Faces: a Useful Technique For Comparative Image Analysis and Representation, Map Hugger
  • C.Morris, D.Ebert, P.Rheingans, An Experimental Analysis of the Effectiveness of Features in Chernoff Faces, UMBC
  • Baseball managers Chernoff faces, information aesthetics
  • A.Schwarz [2008] Professor Puts a Face on the Performance of Baseball Managers, The New York Times

Network Visualizations

Neural Network

Association Rules

Classification

  • W.Loh Classification and regression trees,
  • J.Platt, N.Cristianini, [2000] Large Margin DAGs for Multiclass Classification,
  • Statistical classification, Wikipedia
  • Multiclass classification, Wikipedia
  • Accuracy and precision, Wikipedia
  • Binary classification, Wikipedia
  • Classification chart, Wikipedia
  • Supervised vs. unsupervised learning, valpola_thesis
  • Tree-Based Models, Quick-R
  • Decision Trees, r data mining
  • Classification using neural net in r, r-bloggers
  • JP.Vert, Practical session: Introduction to SVM in R, svmbasic_notes
  • Support Vector Regression with R, SVM Tutorial
  • J.Rickert [2013], Draw nicer Classification and Regression Trees with the rpart.plot package, Revolutions
  • Support Vector Machines, scikit-learn
  • Support Vector Machines Tutorial, NEC Labs America
  • Why use SVM?, yaksis
  • Introduction to Support Vector Machines, opencv

Clustering

Predictive Analytics

Uncertainty

  • S.Bell [1999], A Beginner’s Guide to Uncertainty of Measurement, National Physical Laboratory
  • C.Smith, Detecting Anomalies in Your Data Using Benford’s Law, SUGI
  • G.Iaccarino, Uncertainty Analysis and Optimization,
  • D.Kriegman [2001] Uncertainty,
  • N.Yau [2016] An uncertain spreadsheet for estimates, Flowing Data
  • H.Wainer [2009] Picturing the uncertain world: how to understand, communicate, and control uncertainty through graphical display, Information Research
  • E.Inglis-Arkell [2014] How near-complete certainty can make you completely wrong, iO9
  • Almost Sure, Almost Sure
  • N.Yau [2015] Criminal sentencing and a stat lesson on probabilities and uncertainty, Flowing Data
  • N.Yau [2015] Lessons in statistical significance, uncertainty, and their role in science, Flowing Data
  • J.Davies [2015] Why You’re Biased About Being Biased, nautilus
  • Error and Uncertainty, Whole Course Items: Error and Uncertainty

Big Data

  • Big data in the abstract, CQADS
  • Big data software, CQADS
  • E.McNulty [2014], Understanding Big Data: The Seven V’s, Dataconomy
  • B.Marr [2014], Big Data: The 5 Vs Everyone Must Know, LinkedIn
  • B.Marr [2015], Why only one of the 5 Vs of big data really matters, IBM
  • D. Lawson [2013], Time for Vendors (and Fundraisers) to Be Big About Big Data, Working Philanthropy
  • J.Hess [2015], From Police to Pipes: Fresno Leveraging ‘Big Data’ To Improve City Functions, NPR For Central California
  • Embracing the Power of Big Data Correlation in Government, FedTech
  • N.Bishop [2015], Public Sector News: Advancing analytics to transform cities, IBM
  • R.Delgado [2015], The Big Data Obstacles Faced by Developing Nations, TECHVIBES
  • N.Bishop [2015], Public Sector News: The ongoing impact of big data and analytics, IBM
  • M.Jeelani [2015], Chicago uses new technology to solve this very old urban problem, Fortune
  • N.Bishop [2015], Public Sector News: How analytics is changing our world, IBM
  • B.Howarth [2014], Big data: how predictive analytics is taking over the public sector, The Guardian
  • A.Jensen et al. [2014], Temporal disease trajectories condensed from population-wide registry data covering 6.2 million patients, Nature Communications
  • M.Chen [2014], Is ‘Big Data’ Actually Reinforcing Social Inequalities?, The Nation
  • J.Sullivan [2013], Forget the needle, consider the haystack: Uncovering hidden structures in massive data collections, Princeton University
  • R.Misra [2014], How does Big Data help us understand the vastness of space? Ask us now!, iO9
  • L.Greenemeier [2014], Why Big Data Isn’t Necessarily Better Data, Scientific American
  • A.Newitz [2014], Here’s What You Need to Know About Big Data, iO9
  • M.Korolov [2014], 10 big myths about Big Data, network world
  • C.Mims [2014], Why the only thing better than big data is bigger data, Quartz
  • B.Casselman [2015], Big Government Is Getting In The Way Of Big Data, FiveThirtyEight

Do’s and Don’ts

  • Misleading Graph, Wikipedia
  • P.Ford [2014], Amazing Military Infographics: an appreciation, The Message
  • Bad Graphs, Tumblr
  • E.Klein [2010], Lies, Damn Lies and the Y axis, Washington Post
  • J.Leek [2012], The statisticians at Fox News use classic and novel graphical techniques to lead with data, Simply Statistics
  • J.Joyner [2010], Bad Graphs Mislead More Than 1000 Words, Outside the Beltway
  • K.Drum [2011], Fun With Graphs: Making the Rich Look Poor, Mother Jones
  • J.Chait [2011], Does the Middle Class Have All the Money?, New Republic
  • Obama’s Chief Data Scientist Reveals How the Government Uses Big Data, Time
  • S.Dhillon Researchers to study big data collection used on Canadians, The Globe and Mail
  • P.Karon [2015] Can Big Data Help Government Do Better? This Foundation Thinks So, Inside Philanthropy
  • I.Kottasova [2015] Europe’s big data bombshell: What you need to know, CNN
  • J.Higgins [2015] Federal Agencies Warming Up to Big Data, Commerce Times
  • J.Higgins [2015] Federal Investment in Big Data Applications Heads for Liftoff, Commerce Times
  • C.Yiu [2015] The Big Data Opportunity, Policy Exchange
  • Denmark plans to preserve illegally collected medical data, EDRi
  • N.Yau [2016] Bad Data — And Worse Decisions — Poisoned Flint, Flowing Data
  • T.Siegfried [2010] Odds Are, It’s Wrong, ScienceNews
  • The Problem with Small Sample Sizes, The Last Behaviorist
  • Misleading Graphs: Real Life Examples, Statistics How To
  • Bad Graphs, Bad Graphs
  • D.Shere, H.Groch-Begley [2012] A History Of Dishonest Fox Charts, Media Matters
  • J. Grohol [2006] Bad Statistics: USA Today, psychcentral
  • B.Goldacre [2011] These Guardian / Independent stories are dodgy. Traps in data journalism., Bad Science
  • R.Parikh [2014] How to Lie With Data Visualization, Gizmodo
  • Don’t Let Maps Fool You, Fake Science
  • A.Balliett [2011] The Do’s And Don’ts Of Infographic Design, Smashing Magazine
  • T.Farrant-Gonzalez [2013] All That Glitters Is Not Gold: A Common Misconception About Designing With Data, Smashing Magazine
  • N.Veltman [2013] Avoiding mistakes when cleaning your data, School of Data
  • S.Frankel [2015] Data Scientists Don’t Scale, Harvard Business Review
  • J.Breaugh [2003] Effect Size Estimation: Factors to Consider and Mistakes to Avoid, Journal of Management
  • N.Yau [2014] CSV Fingerprint: Spot errors in your data at a glance, Flowing Data
  • J.Hassell [2014] 3 Mistaken Assumptions About What Big Data Can Do For You, CIO
  • M.Michel [2015], 6 Reasons You Can’t Trust Science Anymore, cracked

Others

Videos

  • grantwoolard, Classical Music Mashup
  • D.Arnold, J.Rogness, Mobius Transformations Revealed
  • N.Halloran The Fallen of World War II
  • N.Yau Math of crime and terrorism
  • N.Yau Suite of data tools for beginners, focused on fun
  • P.Boily The Discovery of Elements
  • T.Lehrer (music), Can YOU sing the elements? (video) The Element Song
  • originsX Discovery of the Elements – the Movie
  • FiveThirtyEight How A Data Scientist Who’d Never Heard Of Basketball Mastered March Madness
  • FiveThirtyEight How Data Helped Win The Battle Over Same-Sex Marriage
  • Reason.com Prying Open Government: The Sunlight Foundation’s Fight for Transparency
  • N.Yau Data science, big data, and statistics – all together now
  • Piled Higher and Deeper Who owns your data?
  • N.Yau [2016] Algorithms for the Traveling Salesman Problem visualized
  • FiveThirtyEight How The NYPD Abused Citizens In The Name Of Data, And How One Cop Exposed It
  • R.Vollman [2013] NEW TOOL: PLAYER USAGE CHARTS
  • IBMVisualAnalytics [2013] The Four Pillars of Effective Visualizations
  • iNTERNSiDEA master’s Chanel [2012] David McCandless: “The beauty of data visualization”
  • LinkedIn Tech Talks [2012] Designing Data Visualizations with Noah Iliinsky
  • Office Videos [2015] Welcome to our office: David McCandless, renowned data journalist and speaker
  • N.Yau [2015] US boundary evolution
  • N.Yau [2015] Sometimes the y-axis doesn’t start at zero, and it’s fine
  • N.Yau [2015] Fast image classifications in real-time
  • D.Conway [2011] Tidy Data
  • N.Yau [2014] Statistical concepts explained through dance
  • Explore visualization features
  • N.Yau [2015] White House appoints first US Chief Data Scientist
  • N.Yau [2015] Mathematics of love
  • MLSS Sydney 2015 Bayesian Inference and MCMC with Bob Carpenter

About Dr. Idlewyld

As a youth, Dr. Idlewyld used to read everything he could lay his hands on and he was in a band. For years, he believed that the NHL would have come calling if he hadn't broken his leg as a kid in a hilarious skiing mishap. Nowadays, whatever's left of his hair is slowly turning grey, and that can only mean one thing: he's had the chance to work on plenty of quantitative projects, providing expertise in operations research methods, data science and predictive analytics, stochastic and statistical modeling, and simulations. So he's got that going for him, which is nice. He's not keen on buzzwords, but overall he's glad to see interest in analytical endeavours grow. In the final analysis, he thinks that insights and discoveries are within everyone's reach, and that he would have made a great goalie.