Analyzing World Happiness


"Happiness can change, and does change, according to the quality of the society in which people live." -John F. Helliwell

Project Purpose

This project had a twofold purpose for me: the World Happiness data set seemed like the best data on which to practice both data analysis and behavioral economics.

In 2017, Richard Thaler’s win of the Nobel Prize in economics marked the third time the prize has been tied to the growing field of behavioral economics. In addition to Nobel Prizes, this new discipline has spawned bestselling books, new agencies within governments and even new majors within universities. What it hasn’t led to is new national statistics.

The most established national statistics – gross domestic product, household income and unemployment – focus on rational behavior: what people spend, how much they make, and whether they have a job. What they don’t capture is how people feel. These "feelings" are important because economic agents are human, and economic models should account for the human elements that shape their decisions. In this way, wellbeing and happiness are critical metrics for a nation's social and economic development.

For another perspective, here is a short two-minute video on why Gallup, a survey company, measures global happiness:

In [1]:
from IPython.display import YouTubeVideo
YouTubeVideo('7QJBqak4GpI')
Out[1]:

About The Data

Taken from: https://worldhappiness.report/ed/2020/

Each country's "Happiness Score" equals, up to rounding, the sum of the seven other variables in the table:

  • Economy: GDP per Capita
  • Family: Social Support
  • Health: Life Expectancy
  • Freedom: Freedom to Make Life Choices
  • Trust: perceived corruption
  • Generosity: perceptions of generosity
  • Dystopia: Each country is compared to "Dystopia" which is a hypothetical nation with the lowest value for each of the 6 factors. The residual error between "Dystopia" and the country is used as a benchmark for regression
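
The relationship between the score and its seven components can be checked directly. The sketch below is a minimal check that uses three 2015 rows copied from the tables shown later in this notebook; the names `sample`, `components`, and `max_gap` are introduced here only for illustration.

```python
import pandas as pd

# Three 2015 rows copied from the tables displayed in this notebook.
sample = pd.DataFrame(
    {
        "Country": ["Afghanistan", "Albania", "Switzerland"],
        "Happiness Score": [3.575, 4.959, 7.587],
        "Economy (GDP per Capita)": [0.31982, 0.87867, 1.39651],
        "Family": [0.30285, 0.80434, 1.34951],
        "Health (Life Expectancy)": [0.30335, 0.81325, 0.94143],
        "Freedom": [0.23414, 0.35733, 0.66557],
        "Trust (Government Corruption)": [0.09719, 0.06413, 0.41978],
        "Generosity": [0.36510, 0.14272, 0.29678],
        "Dystopia Residual": [1.95210, 1.89894, 2.51738],
    }
)

# The seven columns that are expected to add up to the score.
components = [
    "Economy (GDP per Capita)",
    "Family",
    "Health (Life Expectancy)",
    "Freedom",
    "Trust (Government Corruption)",
    "Generosity",
    "Dystopia Residual",
]

# Row-wise sum of the components and its absolute gap to the reported score.
sample["Component Sum"] = sample[components].sum(axis=1)
sample["Gap"] = (sample["Happiness Score"] - sample["Component Sum"]).abs()
max_gap = sample["Gap"].max()

print(sample[["Country", "Happiness Score", "Component Sum", "Gap"]])
```

For these rows the gap stays below 0.001, which is consistent with the score being reported to three decimals. The same check could be run on the full `df` once it is loaded.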

Questions Explored

  • Which are the happiest and least happy countries and regions in the world?
  • Is happiness affected by region?
  • Did the happiness score change significantly from 2015 to 2017?
  • Is the World Happiness Report an accurate measure of true happiness?
In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('darkgrid')
import hvplot.pandas
In [3]:
df = pd.read_csv('happiness.csv', index_col=0)
df.head()
Out[3]:
Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year
0 Afghanistan Southern Asia 153.0 3.575 0.31982 0.30285 0.30335 0.23414 0.09719 0.36510 1.95210 2015
1 Albania Central and Eastern Europe 95.0 4.959 0.87867 0.80434 0.81325 0.35733 0.06413 0.14272 1.89894 2015
2 Algeria Middle East and Northern Africa 68.0 5.605 0.93929 1.07772 0.61766 0.28579 0.17383 0.07822 2.43209 2015
3 Angola Sub-Saharan Africa 137.0 4.033 0.75778 0.86040 0.16683 0.10384 0.07122 0.12344 1.94939 2015
4 Argentina Latin America and Caribbean 30.0 6.574 1.05351 1.24823 0.78723 0.44974 0.08484 0.11451 2.83600 2015
In [4]:
#sort by year ascending and happiness score descending
df.sort_values(['Year','Happiness Score'], ascending=[True, False], inplace=True)
df.head()
Out[4]:
Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year
141 Switzerland Western Europe 1.0 7.587 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738 2015
60 Iceland Western Europe 2.0 7.561 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201 2015
38 Denmark Western Europe 3.0 7.527 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204 2015
108 Norway Western Europe 4.0 7.522 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531 2015
25 Canada North America 5.0 7.427 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176 2015
In [5]:
#size of data
print('rows: {}\ncolumns: {}'.format(df.shape[0],df.shape[1]))
rows: 495
columns: 12
In [6]:
#count of missing values for each column
df.isnull().sum().sort_values(ascending=False)
Out[6]:
Dystopia Residual                25
Generosity                       25
Trust (Government Corruption)    25
Freedom                          25
Health (Life Expectancy)         25
Family                           25
Economy (GDP per Capita)         25
Happiness Score                  25
Happiness Rank                   25
Year                              0
Region                            0
Country                           0
dtype: int64
In [7]:
#all rows with missing values
df[df.isnull().any(axis=1)]
Out[7]:
Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year
13 Belize Latin America and Caribbean NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
58 Hong Kong S.A.R., China Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
100 Namibia Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
118 Puerto Rico Latin America and Caribbean NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
130 Somalia Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
134 South Sudan Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
144 Taiwan Province Of China Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2015
191 Central African Republic Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
204 Djibouti Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
223 Hong Kong S.A.R., China Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
245 Lesotho Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
263 Mozambique Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
274 Oman Middle East and Northern Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
304 Swaziland Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
309 Taiwan Province Of China Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2016
361 Comoros Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
369 Djibouti Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
387 Hong Kong Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
407 Laos Southeastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
439 Oman Middle East and Northern Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
448 Puerto Rico Latin America and Caribbean NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
461 Somaliland Region Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
468 Suriname Latin America and Caribbean NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
469 Swaziland Sub-Saharan Africa NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
473 Taiwan Eastern Asia NaN NaN NaN NaN NaN NaN NaN NaN NaN 2017
In [8]:
#drop rows with missing values
df.dropna(inplace=True)
In [9]:
print('2015 entries: ',str(df[df['Year']==2015].shape[0]))
print('2016 entries: ',str(df[df['Year']==2016].shape[0]))
print('2017 entries: ',str(df[df['Year']==2017].shape[0]))
2015 entries:  158
2016 entries:  157
2017 entries:  155
In [10]:
plt.figure(figsize=(12,10))
#keep only numeric columns so corr() also works on pandas 2.x
sns.heatmap(df.drop(['Happiness Rank','Dystopia Residual'],axis=1)\
            .select_dtypes('number')\
            .corr(),square=True,annot=True,cmap='coolwarm')
Out[10]:
<matplotlib.axes._subplots.AxesSubplot at 0x28590ccb940>

It seems that GDP Per Capita, Life Expectancy, and Family are strongly correlated with the Happiness Score. This makes sense because according to the World Happiness Report, the richer the country, the higher people typically rate their lives. Having a higher life expectancy means you can worry less about survival and having a stronger sense of family gives someone a greater social and financial safety net.

A major caveat when predicting happiness is that all three of these factors are also strongly correlated with each other (multicollinearity), which makes it hard to separate their individual effects. This is especially true of life expectancy and GDP per capita, as countries with more money are better able to provide proper healthcare.
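
The overlap between these three factors can be listed explicitly rather than read off the heatmap. The sketch below is illustrative only: `correlated_pairs` is a hypothetical helper, the 0.7 threshold is an arbitrary assumption, and the ten rows are the 2015 rows displayed earlier in this notebook, so the exact coefficients will differ from those of the full data set.

```python
from itertools import combinations

import pandas as pd

# Ten 2015 rows copied from the df.head() outputs shown in this notebook.
rows = pd.DataFrame(
    {
        "Economy (GDP per Capita)": [0.31982, 0.87867, 0.93929, 0.75778, 1.05351,
                                     1.39651, 1.30232, 1.32548, 1.45900, 1.32629],
        "Family": [0.30285, 0.80434, 1.07772, 0.86040, 1.24823,
                   1.34951, 1.40223, 1.36058, 1.33095, 1.32261],
        "Health (Life Expectancy)": [0.30335, 0.81325, 0.61766, 0.16683, 0.78723,
                                     0.94143, 0.94784, 0.87464, 0.88521, 0.90563],
    }
)


def correlated_pairs(frame, threshold=0.7):
    """Return (col_a, col_b, r) for column pairs whose |Pearson r| >= threshold."""
    corr = frame.corr()
    pairs = []
    for a, b in combinations(frame.columns, 2):
        r = float(corr.loc[a, b])
        if abs(r) >= threshold:
            pairs.append((a, b, round(r, 3)))
    return pairs


strong = correlated_pairs(rows)
for a, b, r in strong:
    print(f"{a} vs {b}: r = {r}")
```

On the full data, passing `df[cols]` for the factor columns of interest would give the pairs to watch before fitting any regression.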

Happiness by Year

In [11]:
pivot1 = pd.pivot_table(df,
                        index='Year',
                        values='Happiness Score')
pivot1
Out[11]:
Happiness Score
Year
2015 5.375734
2016 5.382185
2017 5.354019

The global average happiness score does not seem to have changed much over the three years covered.
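
A flat global average can still hide movement in individual countries. The sketch below shows one way to compute the change per country between two years. It uses made-up scores for three placeholder countries (A, B, C), not values from the report, and `score_change` is a hypothetical helper.

```python
import pandas as pd

# Hypothetical scores for three made-up countries; NOT taken from the report.
toy = pd.DataFrame(
    {
        "Country": ["A", "A", "B", "B", "C", "C"],
        "Year": [2015, 2017, 2015, 2017, 2015, 2017],
        "Happiness Score": [5.0, 5.4, 6.0, 5.5, 4.0, 4.0],
    }
)


def score_change(frame, start, end):
    """Change in score from `start` to `end` for countries present in both years."""
    wide = frame.pivot(index="Country", columns="Year", values="Happiness Score")
    both = wide.dropna(subset=[start, end])
    return (both[end] - both[start]).sort_values()


change = score_change(toy, 2015, 2017)
print(change)
print("mean change:", change.mean())
```

In this toy example the mean change is close to zero even though two of the three countries moved by around half a point. Calling `score_change(df, 2015, 2017)` on the cleaned data would show whether the same holds for the real scores.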

Happiness by Region

In [12]:
pivot2 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score')

pivot2.sort_values(by='Happiness Score',ascending=False)
Out[12]:
Happiness Score
Region
Australia and New Zealand 7.302500
North America 7.227167
Western Europe 6.693000
Latin America and Caribbean 6.069074
Eastern Asia 5.632333
Middle East and Northern Africa 5.387879
Central and Eastern Europe 5.371184
Southeastern Asia 5.364077
Southern Asia 4.590857
Sub-Saharan Africa 4.150957

Australia and New Zealand is the top region for happiness, with Sub-Saharan Africa at the bottom.

Happiness by Year and Region

In [13]:
pivot3 = pd.pivot_table(df,
                        index='Region',
                        columns='Year',
                        values='Happiness Score')

pivot3.plot(kind='bar',figsize=(16,6))
plt.legend(bbox_to_anchor=(0.9, 1.0))
Out[13]:
<matplotlib.legend.Legend at 0x28590ccbc18>

Here you can see that happiness shifts differently in different regions. For example, Central and Eastern Europe has risen over the three years while North America has dropped.

Adding More Stats For Each Region

In [14]:
pivot4 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score',
                        aggfunc=['mean', 'median', 'std', 'min', 'max'])
pivot4
Out[14]:
mean median std min max
Happiness Score Happiness Score Happiness Score Happiness Score Happiness Score
Region
Australia and New Zealand 7.302500 7.2995 0.020936 7.284 7.334
Central and Eastern Europe 5.371184 5.4010 0.578274 4.096 6.609
Eastern Asia 5.632333 5.6545 0.502100 4.874 6.422
Latin America and Caribbean 6.069074 6.1265 0.728157 3.603 7.226
Middle East and Northern Africa 5.387879 5.3175 1.031656 3.006 7.278
North America 7.227167 7.2175 0.179331 6.993 7.427
Southeastern Asia 5.364077 5.2965 0.882637 3.819 6.798
Southern Asia 4.590857 4.6080 0.535978 3.360 5.269
Sub-Saharan Africa 4.150957 4.1390 0.584945 2.693 5.648
Western Europe 6.693000 6.9070 0.777886 4.857 7.587

The standard deviation helps to quantify the variability of happiness within regions. The Middle East and Northern Africa region has the highest standard deviation and a huge range of happiness scores, from near the bottom (Syria, 3.006) to near the top (Israel, 7.278).

Removing Outliers From Each Region

In [15]:
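def remove_outliers(x):
    #robust center: average of the 25th and 75th percentiles (the midhinge),
    #which is insensitive to extreme scores; no rows are actually dropped
    mid_quartile = x.quantile([.25,.75])
    return np.mean(mid_quartile)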

pivot5 = pd.pivot_table(df,
                        index='Region',
                        values='Happiness Score',
                        aggfunc=[np.mean,remove_outliers])

pivot5.plot(kind='bar',figsize=(16,6))
plt.legend(bbox_to_anchor=(0.9, 1.0))
Out[15]:
<matplotlib.legend.Legend at 0x28592e02da0>

After reducing the influence of outlier countries, most regions saw a slight increase in their happiness score, with the exception of Eastern Asia and Australia and New Zealand. Overall, most regions stay fairly similar in happiness and don't change much regardless of whether outlier countries are taken into account.
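
Note that the `remove_outliers` helper above averages the 25th and 75th percentiles rather than dropping any countries. For comparison, the sketch below filters scores with the common 1.5 × IQR rule and then takes the mean of what remains. This is an alternative technique, not the one used for the chart above, and the function names and toy scores are assumptions made for illustration.

```python
import pandas as pd


def midhinge(scores):
    """Average of the 25th and 75th percentiles (what the helper above computes)."""
    q1, q3 = scores.quantile([0.25, 0.75])
    return (q1 + q3) / 2


def mean_without_outliers(scores, k=1.5):
    """Mean of the scores that lie within k * IQR of the quartiles."""
    q1, q3 = scores.quantile([0.25, 0.75])
    iqr = q3 - q1
    kept = scores[(scores >= q1 - k * iqr) & (scores <= q3 + k * iqr)]
    return kept.mean()


# Hypothetical regional scores with one extreme value; not taken from the report.
toy = pd.Series([5.0, 5.2, 5.4, 5.6, 5.8, 9.0])

print("plain mean:           ", toy.mean())
print("midhinge:             ", midhinge(toy))
print("mean without outliers:", mean_without_outliers(toy))
```

Either function can be passed to `pd.pivot_table` through `aggfunc`, for example `aggfunc=['mean', mean_without_outliers]`, to compare regions on the filtered mean.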

Binning Countries By Happiness Score

In [16]:
score = pd.qcut(x=df['Happiness Score'], 
                q=3, 
                labels=['bottom 1/3','middle 1/3','top 1/3'])

pivot6 = pd.pivot_table(df,
                        index=['Region',score],
                        values='Happiness Score',
                        aggfunc='count',
                        fill_value=0,
                        dropna=False)
pivot6
Out[16]:
Happiness Score
Region Happiness Score
Australia and New Zealand bottom 1/3 0
middle 1/3 0
top 1/3 6
Central and Eastern Europe bottom 1/3 15
middle 1/3 58
top 1/3 14
Eastern Asia bottom 1/3 0
middle 1/3 11
top 1/3 7
Latin America and Caribbean bottom 1/3 4
middle 1/3 19
top 1/3 45
Middle East and Northern Africa bottom 1/3 18
middle 1/3 20
top 1/3 20
North America bottom 1/3 0
middle 1/3 0
top 1/3 6
Southeastern Asia bottom 1/3 6
middle 1/3 12
top 1/3 8
Southern Asia bottom 1/3 13
middle 1/3 8
top 1/3 0
Sub-Saharan Africa bottom 1/3 101
middle 1/3 16
top 1/3 0
Western Europe bottom 1/3 0
middle 1/3 12
top 1/3 51

This table shows the number of times each region's countries fell in the bottom, middle, or top 1/3 of the world's happiness scores across the three years (each country is counted once per year). You can see the huge disparity between regions when binned. For example, Western Europe has no entries in the bottom 1/3 of happiness, and Sub-Saharan Africa has none in the top 1/3, with the large majority at the bottom.

Searching For Any Connections

In [17]:
#Add quantile to split countries into 4 levels of happiness 
df['quantile'] = pd.qcut(x=df['Happiness Score'], 
               q=4, 
               labels=['low','low-mid','top-mid','top'])
df.head()
Out[17]:
Country Region Happiness Rank Happiness Score Economy (GDP per Capita) Family Health (Life Expectancy) Freedom Trust (Government Corruption) Generosity Dystopia Residual Year quantile
141 Switzerland Western Europe 1.0 7.587 1.39651 1.34951 0.94143 0.66557 0.41978 0.29678 2.51738 2015 top
60 Iceland Western Europe 2.0 7.561 1.30232 1.40223 0.94784 0.62877 0.14145 0.43630 2.70201 2015 top
38 Denmark Western Europe 3.0 7.527 1.32548 1.36058 0.87464 0.64938 0.48357 0.34139 2.49204 2015 top
108 Norway Western Europe 4.0 7.522 1.45900 1.33095 0.88521 0.66973 0.36503 0.34699 2.46531 2015 top
25 Canada North America 5.0 7.427 1.32629 1.32261 0.90563 0.63297 0.32957 0.45811 2.45176 2015 top
In [18]:
from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

#columns to graph
cols = df.columns[3:10].tolist()

#Only use 2017 data to make graph less busy
temp_df = df[df['Year']==2017].reset_index()

#Transform all data to a scale from 0 to 1 to look for patterns
temp_df.loc[:,cols] = min_max_scaler.fit_transform(temp_df[cols])
In [19]:
hvplot.parallel_coordinates(temp_df, 'quantile', cols=cols, alpha=.3, tools=['hover', 'tap'], width=800, height=500)
Out[19]:

Is there a pattern to what the happiest countries look like? It appears that they are highest in GDP per capita, family, and life expectancy, as found earlier. Countries tend to be split fairly well by GDP per capita, family, life expectancy, and freedom, while perceived government corruption and generosity look like a free-for-all. The only exception is that top countries tend to have the best government corruption ratings, which makes sense because the top is flooded with Scandinavian countries, which tend to be very transparent about their government actions.

My Concerns


Chosen Metrics

My main concern with the World Happiness Report is its heavy focus on GDP per capita and on strongly correlated features such as family and life expectancy. Some argue that asking people about their overall life status leads them to overweight income concerns rather than report their happiness. This concern is supported by a 2017 Gallup poll that rated countries based on the positive and negative experiences in people's lives and found the list dominated by Latin America. El Salvador was rated 2nd on this list, while the World Happiness Report ranked El Salvador 45th.

[Image: how_live]

In comparison, this same survey looked at how people perceived their lives (similar to the World Happiness Report) and found a fairly similar ranking. It seems people tend to over-value their income when it is brought up as a factor for happiness.

[Image: how_see]

Also, according to the Wikipedia article on the World Happiness Report, some point out that the ranking results are counterintuitive when it comes to some dimensions. For instance, "if rate of suicide is used as a metric for measuring unhappiness, (the opposite of happiness), then some of the countries which are ranked among the top 20 happiest countries in the world will also feature among the top 20 with the highest suicide rates in the world."

Philosophical

Measuring happiness in a group of people can be misleading because happiness is an individual experience. It depends on the individual's perception of their life, which can be independent of the environment they are placed in. For example, in the book Man's Search for Meaning, Viktor Frankl describes how, while he was imprisoned in a Nazi concentration camp, he found that the prisoners who fared best had a strong reason, a "why", that kept them going. Everyone was in the same situation, but their well-being was deeply affected by their thoughts.

On the other side, the metrics are useful for finding overall trends that can be improved in a country such as healthcare and general well-being, but this is far from the complete formula for individual happiness. A happy or unhappy country is just an average of happy and unhappy individuals.

So, Does the World Happiness Report Measure True Happiness?

The answer depends on how you define happiness. If you think happiness is how people see their lives -- then Norwegians are the happiest people in the world. If you think happiness is defined by how people live their lives through experiences such as smiling and laughing, enjoyment and feeling treated with respect each day -- then the happiest people in the world are Latin Americans.

How people reflect on their lives is very different from how people live their lives. For example, if you interview two women -- one with a child and one without a child -- which one has more stress? On average, it's the woman with the child. But if you asked them to rate their overall lives, whose rating is higher? It's also the woman with the child. So, the woman with more stress also rates her life higher.

So, How Should We Measure Happiness?

Global happiness studies often involve two measures -- how people see their lives and how they live their lives. The World Happiness Report only uses the former.

We can measure how others live their lives using indexes for positive and negative experiences. According to Gallup, there are 5 main positive and negative experiences:

Positive experiences

  1. feeling well-rested
  2. laughing and smiling
  3. enjoyment
  4. feeling respected
  5. learning or doing something interesting

Negative experiences

  1. stress
  2. sadness
  3. physical pain
  4. worry
  5. anger
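
As a rough illustration of how such an index could be built, the sketch below scores a handful of made-up survey responses. It assumes each experience is recorded as a yes/no answer (1 = yes, 0 = no) and averages the share of "yes" answers per person; this scoring, the column names, and the `experience_index` helper are assumptions made for illustration and are not taken from Gallup's published methodology.

```python
import pandas as pd

# Hypothetical yes/no answers (1 = yes, 0 = no) from three respondents
# to the five positive-experience questions listed above.
answers = pd.DataFrame(
    {
        "well_rested": [1, 0, 1],
        "laughed_or_smiled": [1, 1, 1],
        "enjoyment": [1, 0, 1],
        "respected": [1, 1, 0],
        "learned_something": [0, 0, 1],
    }
)


def experience_index(frame):
    """Average share of 'yes' answers per respondent, scaled to 0-100."""
    per_person = frame.mean(axis=1)  # share of 'yes' answers for each respondent
    return float(per_person.mean() * 100)


index = experience_index(answers)
print(f"positive experience index: {index:.1f}")
```

The same function applied to the five negative-experience answers would give a matching negative index, so the two can be reported side by side with a life-evaluation score.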

Both of these concepts are rooted in behavioral economics, and both are necessary for providing a clearer picture of how people's lives are going. Are people satisfied with their lives? Do they have healthy levels of enjoyment and stress? Both questions are important, and this is exactly why we need to measure both life satisfaction and emotions.