"Happiness can change, and does change, according to the quality of the society in which people live." -John F. Helliwell¶

Project Purpose¶

This project had a two-fold purpose for me: to practice data analysis and to explore behavioral economics. The World Happiness data set seemed like the best data for applying both.

In 2017, Richard Thaler’s win of the Nobel Prize in economics marked the third time the prize has been tied to the growing field of behavioral economics. In addition to Nobel Prizes, this new discipline has spawned bestselling books, new agencies within governments and even new majors within universities. What it hasn’t led to is new national statistics.

The most established national statistics – gross domestic product, household income and unemployment – focus on rational behavior: what people spend, how much they make, and whether they have a job. What they don’t capture is how people feel. These "feelings" are important because economic agents are simply humans, and economic models should account for these human elements in how people make decisions. In this way, wellbeing and happiness are critical metrics for a nation's social and economic development.

For another perspective, here is a short two-minute video on why Gallup, a survey company, measures global happiness:

In [1]:

from IPython.display import YouTubeVideo
YouTubeVideo('7QJBqak4GpI')

Out[1]:

About The Data¶

Taken from: https://worldhappiness.report/ed/2020/

Each country's "Happiness Score" is calculated by summing the seven other variables in the table:

Economy: GDP per Capita
Family: Social Support
Health: Life Expectancy
Freedom: Freedom to Make Life Choices
Trust: Perceived Corruption
Generosity: Perceptions of Generosity
Dystopia Residual: Each country is compared to "Dystopia", a hypothetical nation with the lowest value for each of the 6 factors. The column combines that benchmark score with the residual, the part of a country's score that the 6 factors leave unexplained in the report's regression
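As a quick sanity check on that definition, the sketch below re-adds the seven components for two rows copied from the 2015 data shown later in this notebook (Afghanistan and Switzerland). The `sample` frame and the 0.001 tolerance are my own choices for illustration; the published columns are rounded, so the match is only approximate.

```python
import numpy as np
import pandas as pd

# Two rows copied from the 2015 data; column names follow the CSV headers.
sample = pd.DataFrame({
    'Country': ['Afghanistan', 'Switzerland'],
    'Happiness Score': [3.575, 7.587],
    'Economy (GDP per Capita)': [0.31982, 1.39651],
    'Family': [0.30285, 1.34951],
    'Health (Life Expectancy)': [0.30335, 0.94143],
    'Freedom': [0.23414, 0.66557],
    'Trust (Government Corruption)': [0.09719, 0.41978],
    'Generosity': [0.36510, 0.29678],
    'Dystopia Residual': [1.95210, 2.51738],
})

components = sample.columns[2:]                 # the seven component columns
component_sum = sample[components].sum(axis=1)  # row-wise total

# Gap between the rebuilt total and the reported score (rounding only).
gap = (component_sum - sample['Happiness Score']).abs()
scores_match = bool(np.allclose(component_sum, sample['Happiness Score'], atol=1e-3))
print(gap.round(5).tolist(), scores_match)
```

Running the same check on the full `df` after loading would flag any row where the components and the score disagree.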

Questions Explored¶

Which are the happiest and least happy countries and regions in the world?
Is happiness affected by region?
Did the happiness score change significantly from 2015 to 2017?
Is the World Happiness Report an accurate measure of true happiness?

In [2]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import hvplot.pandas

In [3]:

df = pd.read_csv('happiness.csv', index_col=0)
df.head()

Out[3]:

	Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual	Year
0	Afghanistan	Southern Asia	153.0	3.575	0.31982	0.30285	0.30335	0.23414	0.09719	0.36510	1.95210	2015
1	Albania	Central and Eastern Europe	95.0	4.959	0.87867	0.80434	0.81325	0.35733	0.06413	0.14272	1.89894	2015
2	Algeria	Middle East and Northern Africa	68.0	5.605	0.93929	1.07772	0.61766	0.28579	0.17383	0.07822	2.43209	2015
3	Angola	Sub-Saharan Africa	137.0	4.033	0.75778	0.86040	0.16683	0.10384	0.07122	0.12344	1.94939	2015
4	Argentina	Latin America and Caribbean	30.0	6.574	1.05351	1.24823	0.78723	0.44974	0.08484	0.11451	2.83600	2015

In [4]:

#sort by year ascending and happiness score descending
df.sort_values(['Year','Happiness Score'], ascending=[True, False], inplace=True)
df.head()

Out[4]:

	Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual	Year
141	Switzerland	Western Europe	1.0	7.587	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678	2.51738	2015
60	Iceland	Western Europe	2.0	7.561	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630	2.70201	2015
38	Denmark	Western Europe	3.0	7.527	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139	2.49204	2015
108	Norway	Western Europe	4.0	7.522	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699	2.46531	2015
25	Canada	North America	5.0	7.427	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811	2.45176	2015

In [5]:

#size of data
print('rows: {}\ncolumns: {}'.format(df.shape[0],df.shape[1]))

rows: 495
columns: 12

In [6]:

#count of missing values for each column
df.isnull().sum().sort_values(ascending=False)

Out[6]:

Dystopia Residual                25
Generosity                       25
Trust (Government Corruption)    25
Freedom                          25
Health (Life Expectancy)         25
Family                           25
Economy (GDP per Capita)         25
Happiness Score                  25
Happiness Rank                   25
Year                              0
Region                            0
Country                           0
dtype: int64

In [7]:

#all rows with missing values
df[df.isnull().any(axis=1)]

Out[7]:

	Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual	Year
13	Belize	Latin America and Caribbean	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
58	Hong Kong S.A.R., China	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
100	Namibia	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
118	Puerto Rico	Latin America and Caribbean	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
130	Somalia	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
134	South Sudan	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
144	Taiwan Province Of China	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2015
191	Central African Republic	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
204	Djibouti	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
223	Hong Kong S.A.R., China	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
245	Lesotho	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
263	Mozambique	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
274	Oman	Middle East and Northern Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
304	Swaziland	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
309	Taiwan Province Of China	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2016
361	Comoros	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
369	Djibouti	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
387	Hong Kong	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
407	Laos	Southeastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
439	Oman	Middle East and Northern Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
448	Puerto Rico	Latin America and Caribbean	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
461	Somaliland Region	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
468	Suriname	Latin America and Caribbean	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
469	Swaziland	Sub-Saharan Africa	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017
473	Taiwan	Eastern Asia	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	NaN	2017

In [8]:

#drop rows with missing values
df.dropna(inplace=True)

In [9]:

print('2015 entries: ',str(df[df['Year']==2015].shape[0]))
print('2016 entries: ',str(df[df['Year']==2016].shape[0]))
print('2017 entries: ',str(df[df['Year']==2017].shape[0]))

2015 entries:  158
2016 entries:  157
2017 entries:  155

In [10]:

plt.figure(figsize=(12,10))
#correlate numeric columns only; Country and Region are text
sns.heatmap(df.drop(['Happiness Rank','Dystopia Residual'],axis=1)\
            .select_dtypes('number')\
            .corr(),square=True,annot=True,cmap='coolwarm')

Out[10]:

<matplotlib.axes._subplots.AxesSubplot at 0x28590ccb940>

It seems that GDP per capita, life expectancy, and family are strongly correlated with the Happiness Score. This makes sense because, according to the World Happiness Report, the richer the country, the higher people typically rate their lives. A higher life expectancy means you can worry less about survival, and a stronger sense of family gives someone a greater social and financial safety net.

A major caveat when predicting happiness is that all 3 of these factors are also strongly correlated with each other, so their separate effects are hard to tell apart. This is especially true of life expectancy and GDP per capita, as countries with more money are better able to provide proper healthcare.
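One common way to put a number on this overlap is the variance inflation factor (VIF), which is not part of the original analysis. The sketch below is a minimal NumPy version, assuming a hypothetical helper `variance_inflation_factors`; the ten sample rows are the 2015 rows displayed in the two `head()` outputs above, so the result is only illustrative for such a small sample.

```python
import numpy as np
import pandas as pd

def variance_inflation_factors(features: pd.DataFrame) -> pd.Series:
    """VIF per column: 1 / (1 - R^2) from regressing it on the other columns."""
    vifs = {}
    for col in features.columns:
        y = features[col].to_numpy(dtype=float)
        others = features.drop(columns=col).to_numpy(dtype=float)
        # Prepend an intercept column, then fit by ordinary least squares.
        X = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        residuals = y - X @ coef
        r_squared = 1 - residuals.var() / y.var()
        vifs[col] = 1.0 / (1.0 - r_squared)
    return pd.Series(vifs, name='VIF')

# Ten 2015 rows copied from the head() outputs shown earlier.
sample = pd.DataFrame({
    'Economy (GDP per Capita)': [0.31982, 0.87867, 0.93929, 0.75778, 1.05351,
                                 1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
    'Family': [0.30285, 0.80434, 1.07772, 0.86040, 1.24823,
               1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
    'Health (Life Expectancy)': [0.30335, 0.81325, 0.61766, 0.16683, 0.78723,
                                 0.94143, 0.94784, 0.87464, 0.88521, 0.90563],
})

vif = variance_inflation_factors(sample)
print(vif.round(2))
```

A VIF of 1 means a factor shares nothing with the others, and larger values mean more overlap. On the real data the call would be `variance_inflation_factors(df[list(sample.columns)])`.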

Happiness by Year¶

In [11]:

pivot1 = pd.pivot_table(df,
                        index='Year',
                        values='Happiness Score')
pivot1

Out[11]:

	Happiness Score
Year
2015	5.375734
2016	5.382185
2017	5.354019

Global average happiness does not seem to have changed much over the 3 years given.
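Comparing yearly means ignores that the same countries appear in each year. A rough way to ask whether the change is meaningful is a paired comparison by country. The sketch below is one possible approach, not something the report itself does: the helper `paired_change` is hypothetical, and the `toy` values are made up purely to show the mechanics.

```python
import numpy as np
import pandas as pd

def paired_change(frame, start=2015, end=2017):
    """Mean per-country score change and its paired t statistic."""
    # One row per country, one column per year.
    wide = frame.pivot(index='Country', columns='Year', values='Happiness Score')
    # Countries missing in either year produce NaN and are dropped.
    diff = (wide[end] - wide[start]).dropna()
    n = len(diff)
    t_stat = diff.mean() / (diff.std(ddof=1) / np.sqrt(n))
    return diff.mean(), t_stat, n

# Made-up scores for illustration only; country E has no 2017 entry.
toy = pd.DataFrame({
    'Country': ['A', 'B', 'C', 'D', 'E', 'A', 'B', 'C', 'D'],
    'Year': [2015] * 5 + [2017] * 4,
    'Happiness Score': [5.0, 6.0, 4.0, 7.0, 3.0, 5.2, 5.9, 4.1, 7.0],
})

mean_diff, t_stat, n = paired_change(toy)
print(round(mean_diff, 3), round(t_stat, 3), n)
```

On the notebook's data, `paired_change(df)` would give the average change across countries present in both years. As a rule of thumb, a t statistic well inside roughly ±2 suggests no significant change.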

Happiness by Region¶

In [12]:

pivot2 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score')

pivot2.sort_values(by='Happiness Score',ascending=False)

Out[12]:

	Happiness Score
Region
Australia and New Zealand	7.302500
North America	7.227167
Western Europe	6.693000
Latin America and Caribbean	6.069074
Eastern Asia	5.632333
Middle East and Northern Africa	5.387879
Central and Eastern Europe	5.371184
Southeastern Asia	5.364077
Southern Asia	4.590857
Sub-Saharan Africa	4.150957

Australia and New Zealand is the top region for happiness, with Sub-Saharan Africa at the bottom.

Happiness by Year and Region¶

In [13]:

pivot3 = pd.pivot_table(df,
                        index='Region',
                        columns='Year',
                        values='Happiness Score')

pivot3.plot(kind='bar',figsize=(16,6))
plt.legend(bbox_to_anchor=(0.9, 1.0))

Out[13]:

<matplotlib.legend.Legend at 0x28590ccbc18>

Here you can see that happiness shifts differently in different regions. For example, Central and Eastern Europe has risen over the past 3 years, while North America has dropped over the same period.

Adding More Stats For Each Region¶

In [14]:

pivot4 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score',
                        aggfunc=['mean', 'median', 'std', 'min', 'max'])
pivot4

Out[14]:

	mean	median	std	min	max
	Happiness Score	Happiness Score	Happiness Score	Happiness Score	Happiness Score
Region
Australia and New Zealand	7.302500	7.2995	0.020936	7.284	7.334
Central and Eastern Europe	5.371184	5.4010	0.578274	4.096	6.609
Eastern Asia	5.632333	5.6545	0.502100	4.874	6.422
Latin America and Caribbean	6.069074	6.1265	0.728157	3.603	7.226
Middle East and Northern Africa	5.387879	5.3175	1.031656	3.006	7.278
North America	7.227167	7.2175	0.179331	6.993	7.427
Southeastern Asia	5.364077	5.2965	0.882637	3.819	6.798
Southern Asia	4.590857	4.6080	0.535978	3.360	5.269
Sub-Saharan Africa	4.150957	4.1390	0.584945	2.693	5.648
Western Europe	6.693000	6.9070	0.777886	4.857	7.587

The standard deviation helps quantify the variability of happiness within regions. The Middle East and Northern Africa region has the highest standard deviation and a huge range of happiness scores, from near the bottom (Syria, 3.006) to near the top (Israel, 7.278).

Removing Outliers From Each Region¶

In [15]:

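def remove_outliers(x):
    #midhinge: the average of the 25th and 75th percentiles, an
    #outlier-resistant center (no rows are actually dropped)
    mid_quartile = x.quantile([.25,.75])
    return np.mean(mid_quartile)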

pivot5 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score',
                        aggfunc=[np.mean,remove_outliers])

pivot5.plot(kind='bar',figsize=(16,6))
plt.legend(bbox_to_anchor=(0.9, 1.0))

Out[15]:

<matplotlib.legend.Legend at 0x28592e02da0>

After damping the effect of outlier countries by averaging each region's 25th and 75th percentiles instead of taking the plain mean, most regions saw a slight increase in their happiness score, with the exception of Eastern Asia and Australia and New Zealand. Overall, most regions stay fairly similar in happiness and don't change much whether or not outlier countries are discounted.
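For comparison, a stricter variant would drop outlying values before averaging. The sketch below uses the common 1.5 × IQR rule, which is a different technique from the quartile average used above. The helper name `mean_without_outliers` and the toy series are hypothetical.

```python
import pandas as pd

def mean_without_outliers(x: pd.Series, k: float = 1.5) -> float:
    """Mean of the values lying within k * IQR of the quartiles."""
    q1, q3 = x.quantile([.25, .75])
    iqr = q3 - q1
    # Keep only values inside [q1 - k*iqr, q3 + k*iqr].
    kept = x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]
    return kept.mean()

# Toy series: 100 sits far outside the quartile fences and is dropped.
toy = pd.Series([1, 2, 3, 4, 100])
trimmed = mean_without_outliers(toy)
print(toy.mean(), trimmed)  # 22.0 2.5
```

Like `remove_outliers`, this function takes a Series and returns a single number, so it could be passed in the `aggfunc` list of `pd.pivot_table`.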

Binning Countries By Happiness Score¶

In [16]:

score = pd.qcut(x=df['Happiness Score'], 
                q=3, 
                labels=['bottom 1/3','middle 1/3','top 1/3'])

pivot6 = pd.pivot_table(df,
                        index=['Region',score],
                        values='Happiness Score',
                        aggfunc='count',
                        fill_value=0,
                        dropna=False)
pivot6

Out[16]:

		Happiness Score
Region	Happiness Score
Australia and New Zealand	bottom 1/3	0
	middle 1/3	0
	top 1/3	6
Central and Eastern Europe	bottom 1/3	15
	middle 1/3	58
	top 1/3	14
Eastern Asia	bottom 1/3	0
	middle 1/3	11
	top 1/3	7
Latin America and Caribbean	bottom 1/3	4
	middle 1/3	19
	top 1/3	45
Middle East and Northern Africa	bottom 1/3	18
	middle 1/3	20
	top 1/3	20
North America	bottom 1/3	0
	middle 1/3	0
	top 1/3	6
Southeastern Asia	bottom 1/3	6
	middle 1/3	12
	top 1/3	8
Southern Asia	bottom 1/3	13
	middle 1/3	8
	top 1/3	0
Sub-Saharan Africa	bottom 1/3	101
	middle 1/3	16
	top 1/3	0
Western Europe	bottom 1/3	0
	middle 1/3	12
	top 1/3	51

This table counts how many times each region's countries have been in the bottom, middle, or top 1/3 of the world's happiness scores over the 3 years. You can see the huge disparity between regions when binned. For example, Western Europe has 0 entries in the bottom 1/3 of happiness, and Sub-Saharan Africa has 0 entries in the top 1/3, with the vast majority at the bottom.
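Raw counts are hard to compare because regions differ a lot in size. A possible follow-up, sketched here on made-up data, is to turn the counts into within-region shares with `pd.crosstab(..., normalize='index')`. The `toy` frame and its region names are placeholders, not values from the report.

```python
import pandas as pd

# Toy data for illustration only; these are not values from the report.
toy = pd.DataFrame({
    'Region': ['X', 'X', 'X', 'Y', 'Y', 'Y'],
    'Happiness Score': [3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
})

# Same binning as above: three equal-sized groups by score.
score_bin = pd.qcut(toy['Happiness Score'], q=3,
                    labels=['bottom 1/3', 'middle 1/3', 'top 1/3'])

# normalize='index' converts each region's counts into row shares.
shares = pd.crosstab(toy['Region'], score_bin, normalize='index')
print(shares.round(2))
```

Each row of `shares` sums to 1, so a large region such as Sub-Saharan Africa and a two-country region such as North America can be read on the same scale.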

Searching For Any Connections¶

In [17]:

#Add quantile to split countries into 4 levels of happiness 
df['quantile'] = pd.qcut(x=df['Happiness Score'], 
               q=4, 
               labels=['low','low-mid','top-mid','top'])
df.head()

Out[17]:

	Country	Region	Happiness Rank	Happiness Score	Economy (GDP per Capita)	Family	Health (Life Expectancy)	Freedom	Trust (Government Corruption)	Generosity	Dystopia Residual	Year	quantile
141	Switzerland	Western Europe	1.0	7.587	1.39651	1.34951	0.94143	0.66557	0.41978	0.29678	2.51738	2015	top
60	Iceland	Western Europe	2.0	7.561	1.30232	1.40223	0.94784	0.62877	0.14145	0.43630	2.70201	2015	top
38	Denmark	Western Europe	3.0	7.527	1.32548	1.36058	0.87464	0.64938	0.48357	0.34139	2.49204	2015	top
108	Norway	Western Europe	4.0	7.522	1.45900	1.33095	0.88521	0.66973	0.36503	0.34699	2.46531	2015	top
25	Canada	North America	5.0	7.427	1.32629	1.32261	0.90563	0.63297	0.32957	0.45811	2.45176	2015	top

In [18]:

from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

#columns to graph
cols = df.columns[3:10].tolist()

#Only use 2017 data to make graph less busy
temp_df = df[df['Year']==2017].reset_index()

#Transform all data to a scale from 0 to 1 to look for patterns
temp_df.loc[:,cols] = min_max_scaler.fit_transform(temp_df[cols])
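For readers unfamiliar with min-max scaling, each column is mapped to the 0-to-1 range with (x - min) / (max - min). The plain pandas sketch below shows the same arithmetic on a made-up frame; `min_max_scale` is a hypothetical helper, not part of scikit-learn.

```python
import pandas as pd

def min_max_scale(frame: pd.DataFrame) -> pd.DataFrame:
    """Rescale each column to [0, 1]: (x - min) / (max - min)."""
    return (frame - frame.min()) / (frame.max() - frame.min())

# Made-up values for illustration only.
toy = pd.DataFrame({'a': [2.0, 4.0, 6.0], 'b': [10.0, 10.5, 12.0]})
scaled = min_max_scale(toy)
print(scaled)
```

After scaling, every column has a minimum of 0 and a maximum of 1, which is what lets the parallel-coordinates plot draw all the factors on a shared axis. Note that a constant column would divide by zero in this simple version.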

In [19]:

hvplot.parallel_coordinates(temp_df, 'quantile', cols=cols, alpha=.3, tools=['hover', 'tap'], width=800, height=500)

Out[19]:

Is there a pattern to what the happiest countries look like? It appears that they are highest in GDP per capita, family, and life expectancy, as found earlier. Countries tend to be split fairly well by GDP per capita, family, life expectancy, and freedom, while perceived government corruption and generosity look like a free-for-all. The only exception is that top countries tend to have the best government corruption ratings, which makes sense because the top is flooded with Scandinavian countries, which tend to be very transparent about their government actions.

My Concerns¶

Chosen Metrics¶

My main concern with the World Happiness Report is its heavy focus on GDP per capita and on strongly correlated features such as family and life expectancy. Some argue that asking about overall life status leads people to overweight income concerns rather than happiness. A 2017 Gallup poll supports this concern: it rated countries based on the positive and negative experiences in people's lives and found the list dominated by Latin America. El Salvador ranked 2nd on this list, while the World Happiness Report ranked it 45th.

[Figure: how_live, the Gallup ranking by how people live their lives]

In comparison, this same survey also looked at how people perceived their lives (similar to the World Happiness Report) and found a ranking fairly similar to the report's. It seems people tend to over-value their income when it is brought up as a factor for happiness.

[Figure: how_see, the Gallup ranking by how people see their lives]

Also, according to the Wikipedia article on the World Happiness Report, some point out that the ranking results are counterintuitive when it comes to some dimensions. For instance, "if rate of suicide is used as a metric for measuring unhappiness, (the opposite of happiness), then some of the countries which are ranked among the top 20 happiest countries in the world will also feature among the top 20 with the highest suicide rates in the world."

Philosophical¶

Measuring happiness in a group of people can be misleading because happiness is an individual event. It depends on the individual's perception of their life, which is independent of the environment they are placed in. For example, in the book Man's Search for Meaning, Viktor Frankl recounts that while he was imprisoned in a Nazi concentration camp, the prisoners who fared best had a strong reason, a "why", that kept them going. Everyone was in the same situation, but their well-being was deeply affected by their thoughts.

On the other side, the metrics are useful for finding overall trends that can be improved in a country such as healthcare and general well-being, but this is far from the complete formula for individual happiness. A happy or unhappy country is just an average of happy and unhappy individuals.

So, Does the World Happiness Report Measure True Happiness?¶

The answer depends on how you define happiness. If you think happiness is how people see their lives -- then Norwegians are the happiest people in the world. If you think happiness is defined by how people live their lives through experiences such as smiling and laughing, enjoyment and feeling treated with respect each day -- then the happiest people in the world are Latin Americans.

How people reflect on their lives is very different from how people live their lives. For example, if you interview two women -- one with a child and one without a child -- which one has more stress? On average, it's the woman with the child. But if you asked them to rate their overall lives, whose rating is higher? It's also the woman with the child. So, the woman with more stress also rates her life higher.

So, How Should We Measure Happiness?¶

Global happiness studies often involve two measures -- how people see their lives and how they live their lives. The World Happiness Report only uses the former.

We can measure how others live their lives using indexes for positive and negative experiences. According to Gallup, there are 5 main positive and negative experiences:

Positive experiences

feeling well-rested
laughing and smiling
enjoyment
feeling respected
learning or doing something interesting

Negative experiences

stress
sadness
physical pain
worry
anger

Both of these concepts are rooted in behavioral economics, and both are necessary for providing a clearer picture of how people's lives are going. Are people satisfied with their lives? Do they have healthy levels of enjoyment and stress? Both questions are important, and this is exactly why we need to measure both life satisfaction and emotions.
