{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Getting Yourself Unstuck" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "```{contents} Table of Contents\n", ":depth: 4\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction\n", "Modeling and analytics get the lion's share of classroom attention in data science programs, but in the real world, data is almost never ready for analysis without a great deal of preparation first. [This article in Forbes](https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#3ebb8cd06f63) describes a survey of data scientists in which the respondents report spending nearly 80% of their time collecting and cleaning data. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Source:\n", "'Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says' by Gil Press\n", "
Source:\n", "Ph.D. Comics, \"Debugging\", by Jorge Cham\n", "
Source:\n", "'Statswars' by Kieran Healy\n", "
Source:\n", "'10 Ways to Learn WordPress Hooks', RachieVee: Rachel's Blog.\n", "\n", "
|  | Country Name | Country Code | Series Name | Series Code | 2012 [YR2012] | 2013 [YR2013] | 2014 [YR2014] | 2015 [YR2015] | 2016 [YR2016] | 2017 [YR2017] | 2018 [YR2018] | 2019 [YR2019] | 2020 [YR2020] | 2021 [YR2021] | 2022 [YR2022] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Afghanistan | AFG | CO2 emissions (kt) | EN.ATM.CO2E.KT | 10208.13 | 9402.05 | 9281.34 | 10057.59 | 9294.93 | 10022.78 | 10972.38 | 11238.83 | 8709.47 | .. | .. |
| 1 | Afghanistan | AFG | GDP per capita (constant 2015 US$) | NY.GDP.PCAP.KD | 570.676064337005 | 582.103978902877 | 576.487820334782 | 566.881132665072 | 564.920843553397 | 563.488239421983 | 553.973308908649 | 559.140956898688 | 529.144912477888 | 407.616507361737 | .. |
| 2 | Afghanistan | AFG | Population, total | SP.POP.TOTL | 30466479 | 31541209 | 32716210 | 33753499 | 34636207 | 35643418 | 36686784 | 37769499 | 38972230 | 40099462 | 41128771 |
| 3 | Albania | ALB | CO2 emissions (kt) | EN.ATM.CO2E.KT | 4541.8 | 4795.4 | 5188 | 4797 | 4573.2 | 5403.7 | 5316.1 | 4993.3 | 4383.2 | .. | .. |
| 4 | Albania | ALB | GDP per capita (constant 2015 US$) | NY.GDP.PCAP.KD | 3736.34007957648 | 3780.69919254096 | 3855.76074409659 | 3952.80357364813 | 4090.37272829183 | 4249.8200493985 | 4431.55559506989 | 4543.38771048312 | 4418.66087378292 | 4857.11194201819 | 5155.29085964151 |
| 5 | Albania | ALB | Population, total | SP.POP.TOTL | 2900401 | 2895092 | 2889104 | 2880703 | 2876101 | 2873457 | 2866376 | 2854191 | 2837849 | 2811666 | 2777689 |
| 6 | Algeria | DZA | CO2 emissions (kt) | EN.ATM.CO2E.KT | 134934.2 | 139024.1 | 147735.2 | 156273 | 154654.3 | 157704.4 | 164534.1 | 170582.4 | 161563 | .. | .. |
| 7 | Algeria | DZA | GDP per capita (constant 2015 US$) | NY.GDP.PCAP.KD | 4025.64148367191 | 4057.76480695524 | 4129.42254868204 | 4197.41998491398 | 4246.24217385308 | 4218.0823188856 | 4188.22038483483 | 4153.00345481624 | 3873.50874565769 | 3939.36086434716 | 3999.75769667754 |
| 8 | Algeria | DZA | Population, total | SP.POP.TOTL | 37260563 | 38000626 | 38760168 | 39543154 | 40339329 | 41136546 | 41927007 | 42705368 | 43451666 | 44177969 | 44903225 |
| 9 | American Samoa | ASM | CO2 emissions (kt) | EN.ATM.CO2E.KT | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. | .. |