Life Expectancy Prediction using Machine Learning Algorithms

Alessandra Barbosa
Oct 22, 2021

Part I: Data Acquisition

How machine learning algorithms for life expectancy prediction can help guide decisions on social investments

Photo by Danie Franco on Unsplash

Working in a non-governmental organization concerned with the quality of life of the elderly, I constantly encounter initiatives that aim to increase citizens' life expectancy and quality of life. Life expectancy is also one of the most important factors in end-of-life decision making.
Many of these initiatives require financial investment, so it is necessary to decide which ones to prioritize.

In this project, I approach the task of predicting life expectancy as a supervised machine learning problem.
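To make the framing concrete, here is a minimal sketch, assuming each row is a dictionary of numeric columns with life expectancy as the target. The column names (`life_expectancy`, `gdp`, `schooling`) and the toy values are hypothetical, and the shuffled hold-out split is just one simple way to keep evaluation data separate from training data.

```python
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, float]


def to_supervised(rows: Sequence[Row], target: str) -> Tuple[List[List[float]], List[float], List[str]]:
    """Split each row into a feature vector (X) and a label (y) taken from `target`."""
    if not rows:
        return [], [], []
    # Sort feature names so every vector uses the same column order.
    feature_names = sorted(name for name in rows[0] if name != target)
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target] for row in rows]
    return X, y, feature_names


def train_test_split(X, y, test_fraction: float = 0.2, seed: int = 42):
    """Shuffle row indices with a fixed seed and hold out `test_fraction` of them."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = round(len(X) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Toy rows with made-up values, for illustration only.
rows = [{"life_expectancy": 60.0 + i, "gdp": 1000.0 * i, "schooling": 8.0 + i / 2} for i in range(10)]
X, y, names = to_supervised(rows, target="life_expectancy")
X_train, X_test, y_train, y_test = train_test_split(X, y)
print(names, len(X_train), len(X_test))  # ['gdp', 'schooling'] 8 2
```

Because the data is a country-by-year panel, a purely random row split lets the same country appear in both training and test sets; splitting by country or by year may give a more honest estimate.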

The methodology I used is CRISP-DM (CRoss-Industry Standard Process for Data Mining), a process model with six phases that naturally describes the data science life cycle. I usually complete at least two cycles before finishing a project.
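As a rough illustration of the iterative cycle, the sketch below lists the six standard CRISP-DM phases and walks through them for a chosen number of cycles. It is a simplification: real projects often jump back between phases (for example from Modeling to Data Preparation) rather than following a strict linear pass, and `crisp_dm_schedule` is a hypothetical helper, not part of any library.

```python
from enum import Enum
from typing import Iterator, Tuple


class Phase(Enum):
    """The six CRISP-DM phases, in their usual order."""
    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def crisp_dm_schedule(cycles: int = 2) -> Iterator[Tuple[int, Phase]]:
    """Yield (cycle number, phase) pairs, passing through all six phases each cycle."""
    for cycle in range(1, cycles + 1):
        for phase in Phase:  # Enum iteration follows definition order
            yield cycle, phase


for cycle, phase in crisp_dm_schedule(cycles=2):
    print(cycle, phase.name)
```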

CRISP-DM diagram. Inspired by Wikimedia

Datasets

The data covers 193 countries over the period 2000 to 2015 and was grouped into health, economic, mortality, immunization, environmental and demographic factors.

Feature mind map
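The grouping in the mind map can be expressed as a plain lookup from column name to factor group. This is a sketch under assumptions: the column names loosely follow the Kaggle WHO file but should be checked against the real headers, `co2_emissions` is a hypothetical name for the gas-emission feature, and the group assigned to each column is illustrative rather than read off the mind map itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

# Hypothetical column-to-group mapping; verify names against the actual files.
FEATURE_GROUPS: Dict[str, str] = {
    "Adult Mortality": "mortality",
    "infant deaths": "mortality",
    "under-five deaths": "mortality",
    "HIV/AIDS": "health",
    "BMI": "health",
    "Alcohol": "health",
    "Hepatitis B": "immunization",
    "Polio": "immunization",
    "Diphtheria": "immunization",
    "GDP": "economics",
    "percentage expenditure": "economics",
    "Total expenditure": "economics",
    "Population": "demographic",
    "co2_emissions": "environment",  # hypothetical name for the gas-emission feature
}


def group_features(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket column names by factor group; unknown columns go to 'unmapped'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for column in columns:
        name = column.strip()  # tolerate stray spaces in headers
        groups[FEATURE_GROUPS.get(name, "unmapped")].append(name)
    return dict(groups)


print(group_features(["GDP", "Polio", "Life expectancy ", "Adult Mortality"]))
# {'economics': ['GDP'], 'immunization': ['Polio'], 'unmapped': ['Life expectancy'], 'mortality': ['Adult Mortality']}
```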

Part of the data was collected from the Kaggle website, but it originates from the WHO and United Nations websites and was compiled with the help of Deeksha Russell and Duan Wang.

Data acquisition from Kaggle: https://www.kaggle.com/kumarajarshi/life-expectancy-who
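A minimal, standard-library sketch for loading the downloaded CSV might look like this. It assumes one row per country and year, that blank cells mean missing values, and that some headers may carry stray spaces (worth checking against the actual file); `load_who_csv` and the path in the comment are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Union

Value = Union[str, float, None]


def parse_cell(raw: str) -> Value:
    """Blank -> None, numeric text -> float, anything else -> stripped string."""
    text = raw.strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return text


def load_who_csv(lines: Iterable[str]) -> List[Dict[str, Value]]:
    """Parse CSV lines into dicts with whitespace-trimmed headers and typed cells."""
    rows = []
    for record in csv.DictReader(lines):
        rows.append({
            key.strip(): parse_cell(value or "")  # short rows yield None, treated as blank
            for key, value in record.items()
            if key is not None  # drop surplus fields that have no header
        })
    return rows


# Usage with a real download (hypothetical path):
# with open("life_expectancy_who.csv", newline="") as handle:
#     data = load_who_csv(handle)
sample = ["Country,Year,Life expectancy ,GDP", "CountryA,2015,70.5,1200", "CountryB,2015,64.0,"]
data = load_who_csv(sample)
print(data[1])  # {'Country': 'CountryB', 'Year': 2015.0, 'Life expectancy': 64.0, 'GDP': None}
```

Note that `Year` comes back as a float here; casting it to `int` afterwards keeps joins on country and year unambiguous.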

The second part was collected from the “Our World in Data” website, a project of the Global Change Data Lab, a non-profit organization based in the United Kingdom.
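The post does not show how the Kaggle and Our World in Data tables were combined. One common approach, sketched here under the assumption that both tables share `country` and `year` columns (hypothetical names), is a left join that keeps every WHO row and attaches the extra columns where a match exists. Matching on raw country names is fragile because spellings differ between sources, which is one reason to standardize country codes before joining.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Row = Dict[str, object]


def left_join(left: Iterable[Row], right: Iterable[Row], keys: Sequence[str]) -> List[Row]:
    """Attach right-hand columns to each left row whose key values match.

    Unmatched left rows are kept as they are; if both sides share a column,
    the left value is kept. If a key repeats on the right, the last row wins.
    """
    index: Dict[Tuple[object, ...], Row] = {tuple(row[k] for k in keys): row for row in right}
    joined: List[Row] = []
    for row in left:
        merged = dict(row)  # copy so the input rows are not mutated
        for column, value in index.get(tuple(row[k] for k in keys), {}).items():
            merged.setdefault(column, value)
        joined.append(merged)
    return joined


who = [{"country": "X", "year": 2015, "gdp": 1.0}, {"country": "Y", "year": 2015, "gdp": 2.0}]
owid = [{"country": "X", "year": 2015, "co2": 3.5, "gdp": 99.0}]  # toy values
print(left_join(who, owid, keys=("country", "year")))
# [{'country': 'X', 'year': 2015, 'gdp': 1.0, 'co2': 3.5}, {'country': 'Y', 'year': 2015, 'gdp': 2.0}]
```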

I completed the dataset by populating the country and continent codes with a function, and then used a geolocation API to convert the country codes to alpha-2 codes.
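The article does not name the function or the exact geolocation service, so the sketch below replaces the API call with a small hypothetical lookup table covering a few ISO 3166-1 alpha-3 codes, plus two-letter continent codes (one common convention). It fills in `alpha2` and `continent` columns and reports any codes it could not resolve, which are worth reviewing by hand before joining datasets. The `iso3` column name is also an assumption.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the geolocation API: a tiny hand-written table.
# A real pipeline would query the service or use a complete ISO 3166-1 table.
ALPHA3_TO_ALPHA2: Dict[str, str] = {"BRA": "BR", "PRT": "PT", "JPN": "JP", "DEU": "DE", "NGA": "NG"}
# Two-letter continent codes (AF, AS, EU, SA, ...), one common convention.
ALPHA2_TO_CONTINENT: Dict[str, str] = {"BR": "SA", "PT": "EU", "JP": "AS", "DE": "EU", "NG": "AF"}


def standardize_codes(alpha3: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (alpha-2 code, continent code), or (None, None) if the code is unknown."""
    alpha2 = ALPHA3_TO_ALPHA2.get(alpha3.strip().upper())
    if alpha2 is None:
        return None, None
    return alpha2, ALPHA2_TO_CONTINENT.get(alpha2)


def add_codes(rows: List[Dict[str, Optional[str]]], code_column: str = "iso3") -> List[str]:
    """Add 'alpha2' and 'continent' to each row in place; return unresolved codes."""
    unresolved = set()
    for row in rows:
        alpha2, continent = standardize_codes(row[code_column])
        row["alpha2"], row["continent"] = alpha2, continent
        if alpha2 is None:
            unresolved.add(row[code_column])
    return sorted(unresolved)


rows = [{"iso3": "JPN"}, {"iso3": "bra"}, {"iso3": "ZZZ"}]
print(add_codes(rows))  # ['ZZZ']
print(rows[1])  # {'iso3': 'bra', 'alpha2': 'BR', 'continent': 'SA'}
```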

After standardizing the datasets and completing them with demographic data from the United Nations website, I had a final dataset with 2,938 rows and 30 columns.
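A quick sanity check on the final size: if each row is one country-year, 193 countries over the 16 years from 2000 to 2015 (inclusive) would give 193 × 16 = 3,088 rows, so the 2,938-row dataset is 150 rows short of a complete panel. This is arithmetic only, not an inspection of the file, but it suggests some countries are not covered in every year. The hypothetical helpers below repeat the calculation and flag duplicated country-year keys, which would break the one-row-per-country-year assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence, Tuple


def panel_gap(n_rows: int, n_countries: int, first_year: int, last_year: int) -> Tuple[int, int]:
    """Return (rows in a complete country-year panel, how many rows are missing)."""
    n_years = last_year - first_year + 1  # both endpoint years are included
    expected = n_countries * n_years
    return expected, expected - n_rows


def duplicate_keys(rows: Iterable[Dict[str, object]], keys: Sequence[str]) -> List[Tuple[object, ...]]:
    """List key tuples (e.g. country, year) that occur more than once."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return [key for key, count in counts.items() if count > 1]


print(panel_gap(2938, 193, 2000, 2015))  # (3088, 150)
```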

In short, this study will focus on immunization factors, mortality factors, economic factors, social factors and other health-related factors as well as gas emissions in different countries.

Because the observations in this dataset are organized by country, it should be easier for a country to identify the predictors that contribute to lower life expectancy, which helps suggest which areas to prioritize in order to improve its population's life expectancy efficiently.
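As a first, hedged screen for which predictors move with life expectancy, one could rank features by the absolute value of their Pearson correlation with the target. This is only a sketch with toy data and hypothetical column names: correlation is not causation, it ignores confounding between factors, and it is not necessarily the modeling approach used later in the series.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, Optional[float]]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> Optional[float]:
    """Pearson correlation; None when undefined (fewer than 2 points or zero variance)."""
    n = len(xs)
    if n < 2:
        return None
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def rank_predictors(rows: Sequence[Row], target: str, features: Sequence[str]) -> List[Tuple[str, float]]:
    """Rank features by |r| with the target, using only rows where both values exist."""
    scores = []
    for feature in features:
        pairs = [(row[feature], row[target]) for row in rows
                 if row.get(feature) is not None and row.get(target) is not None]
        if not pairs:
            continue
        xs, ys = zip(*pairs)
        r = pearson(xs, ys)
        if r is not None:
            scores.append((feature, r))
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy rows with made-up values; 'const' has zero variance and is skipped,
# and the last row is ignored because its target is missing.
rows = [
    {"life_expectancy": 60.0, "hiv": 4.0, "schooling": 1.0, "const": 1.0},
    {"life_expectancy": 65.0, "hiv": 3.0, "schooling": 3.0, "const": 1.0},
    {"life_expectancy": 70.0, "hiv": 2.0, "schooling": 2.0, "const": 1.0},
    {"life_expectancy": 75.0, "hiv": 1.0, "schooling": 4.0, "const": 1.0},
    {"life_expectancy": None, "hiv": 9.0, "schooling": 9.0, "const": 1.0},
]
print(rank_predictors(rows, "life_expectancy", ["schooling", "hiv", "const"]))
# [('hiv', -1.0), ('schooling', 0.8)]
```

The sign matters as much as the magnitude: a strongly negative feature is a candidate driver of lower life expectancy, which is the kind of signal the paragraph above has in mind.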
