Investigating death probability as computed from ISTAT
AIM: what are the mortality trends in Italy?
Remember:
- I already explored the data in a previous notebook
- Mortality is computed in DCIS_MORTALITA1 as FUNZ_BIO=PROBDEATH
- Data is available for each age (up to 119 years old, which is extrapolated because nobody has lived that long)
- Data is available for each year of observation since 1974
- The methodology for the calculation is presented in https://www.istat.it/it/files/2018/08/volume-tavole-mortalita-1998.pdf
In [1]:
import numpy as np
import pandas as pd
import requests
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import warnings
from istatapi import discovery, retrieval
pio.renderers.default = 'vscode+notebook'
warnings.filterwarnings('ignore')
requests.urllib3.disable_warnings() # avoid "InsecureRequestWarning: Unverified HTTPS request is being made to host 'sdmx.istat.it'. Adding certificate verification is strongly advised"
def get_colors(n, cmap_name="rainbow"):
    """Get colors for px colors_discrete argument, given the number of colors needed, n."""
    cmap = matplotlib.colormaps[cmap_name]
    colors = [cmap(i) for i in np.linspace(0, 1, n)]  # n evenly spaced RGBA tuples, channels in [0, 1]
    # Scale each channel to 0-255 for a CSS-style rgba() string
    colors_str = [f"rgba({int(color[0]*255)}, {int(color[1]*255)}, {int(color[2]*255)}, 1.0)" for color in colors]
    return colors_str
In [2]:
ds = discovery.DataSet(dataflow_identifier="DCIS_MORTALITA1")
ds.set_filters(
    freq="A",
    #eta="TOTAL",
    itter107="IT", sesso="9", funz_bio="PROBDEATH"
)
df5 = retrieval.get_data(ds)
df5.loc[:, lambda dfx: (~dfx.isna()).any(axis=0)]
Out[2]:
In [3]:
dfdp = (
    df5
    .rename(columns={"ETA_CLASSI_ETA": "ETA"})
    .query("ETA!='TOTAL'")
    .query("~ETA.str.contains('-|_')")
    .assign(age=lambda x: [int(a.split("Y")[-1]) for a in x["ETA"]])
    .groupby(["age", "TIME_PERIOD"], as_index=False)["OBS_VALUE"].sum()
    .assign(year=lambda x: x["TIME_PERIOD"].dt.year)
    .rename(columns={"OBS_VALUE": "deaths"})
    .pivot(index="age", columns="year", values="deaths")
    .sort_index()
    .div(1000)  # PROBDEATH is per 1000 people
)
dfdp.to_csv("../data/deathprob_by_age_year.csv")
dfdp
Out[3]:
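The reshaping in cell 3 can be checked on a toy frame. This is a minimal sketch, not the real ISTAT response: the age codes ("Y0", "Y0-4", "Y_GE100") and all values are made up for illustration, and boolean masks stand in for the two .query() calls.

```python
import pandas as pd

# Hypothetical rows mimicking the columns used in cell 3; codes and values are made up.
raw = pd.DataFrame({
    "ETA": ["Y0", "Y1", "Y0-4", "Y_GE100", "TOTAL", "Y0", "Y1"],
    "TIME_PERIOD": pd.to_datetime(["1974-01-01"] * 5 + ["1975-01-01"] * 2),
    "OBS_VALUE": [20.0, 2.0, 5.0, 400.0, 10.0, 18.0, 1.5],
})

# Keep single-year ages only: drop the total and any code containing '-' or '_'.
single = raw[(raw["ETA"] != "TOTAL") & ~raw["ETA"].str.contains("-|_")]

table = (
    single
    .assign(
        age=single["ETA"].str.split("Y").str[-1].astype(int),  # "Y1" -> 1
        year=single["TIME_PERIOD"].dt.year,
    )
    .pivot(index="age", columns="year", values="OBS_VALUE")
    .sort_index()
    .div(1000)  # per-1000 values -> probabilities
)
print(table)
```

The result has one row per single-year age and one column per year, which is the layout the plotting cells rely on.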
In [4]:
fig, axs = plt.subplots(12, 10, figsize=[24, 18], sharex=True)
for iplot, age in enumerate(dfdp.index):
    ax = axs.flatten()[iplot]
    ax.plot(dfdp.columns, dfdp.loc[age] * 100, color="red")
    ax.set_title(f"Age: {age}")
    ax.axvline(2003, color="black", linestyle="-", lw=0.5, alpha=0.5)  # heatwave - investigate in Notebook 53
    ax.axvline(2020, color="black", linestyle="-", lw=0.5, alpha=0.5)  # COVID-19, also change in methodology (see Notebook 25)
    ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.2f}%'))
plt.tight_layout()
plt.show()
In [5]:
# use plotly to plot all ages in a single plot, each series normalized to its value in the first year
fig = go.Figure()
first_year = dfdp.columns.min()
colors = get_colors(len(dfdp.index))
for i, age in enumerate(dfdp.index):
    fig.add_trace(go.Scatter(
        x=dfdp.columns, y=dfdp.loc[age]/dfdp.loc[age, first_year], mode='lines', name=str(age), line_color=colors[i]))
fig.update_layout(
    title="Mortality rate by age",
    xaxis_title="Year",
    yaxis_title=f"Mortality rate (relative to {first_year})",  # baseline is the first available year, not 2003
    legend_title="Age",
    width=1000,
)
fig.show()
Conclusions
- Mortality declined sharply from 1974 to 2000, except for a bump at ages 1-40, especially evident around age 30, that peaks around 1990-1995: it should be due to drug abuse, but why does it affect people so young? Unhealthy parents?
- From 2000 to now, results are mixed: for some ages mortality is still going down as it did before 2000 (e.g., 78-85 years old), for some it has plateaued with noise (90+ years old), and for many it has started to plateau only in the last few years (45-77).
- Covid-19 increased mortality, which for many ages has not yet returned to the pre-Covid trend: is this due to Italian health care worsening?
- In relative terms (current mortality over mortality in 1974), younger people improved the most: while 50-80 year olds had a 60% decrease in mortality (relative to 1974), the decrease reaches 80% for under-18s.
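The relative figures in the last point are one minus the ratio of the latest probability to the 1974 one. A minimal sketch of that arithmetic follows; the function name and the two input probabilities are made up for illustration and are not values from the ISTAT table.

```python
def relative_decrease(q_first, q_last):
    """Fractional drop of q_last relative to q_first (0.6 means a 60% decrease)."""
    if q_first <= 0:
        raise ValueError("baseline probability must be positive")
    return 1.0 - q_last / q_first


# Made-up probabilities of death for one age in the first and last year
example = relative_decrease(q_first=0.010, q_last=0.004)
print(f"{example:.0%}")
```

On the table built above, the same quantity for every age at once would be 1 - dfdp[dfdp.columns.max()] / dfdp[dfdp.columns.min()].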
Follow-up
- Reproduce the PROBDEATH values from raw data of deaths. Only data grouped in 5-year age groups are available: can I find single-age raw data? Or is it possible to do it from the 5-year age-group data?
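As a possible starting point for this follow-up, here is a sketch of the textbook conversion from an observed death rate m to a probability of dying within n years, q = n·m / (1 + n·m/2). It assumes deaths are spread uniformly over the interval. This is not necessarily the method ISTAT uses (that is described in the methodology document linked at the top), and the function name and counts below are hypothetical.

```python
def prob_death_from_rate(deaths, person_years, n=1):
    """Convert an observed central death rate into a probability of dying within n years.

    Assumes deaths are spread uniformly over the interval (textbook approximation,
    not necessarily ISTAT's method).
    """
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    m = deaths / person_years        # central death rate per person-year
    q = n * m / (1 + 0.5 * n * m)    # uniform-distribution-of-deaths conversion
    return min(q, 1.0)               # a probability cannot exceed 1


# Made-up counts, for illustration only
q_single = prob_death_from_rate(deaths=10, person_years=1000)       # one year of age
q_group = prob_death_from_rate(deaths=50, person_years=5000, n=5)   # a 5-year age group
print(round(q_single, 5), round(q_group, 5))
```

With n=5 the result is the probability of dying within the whole 5-year group, so recovering single-age values from grouped data would still need an extra interpolation step.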