
Investigating death probability as computed from ISTAT

AIM: what are the mortality trends in Italy?

Remember:

  • I already explored the data in a previous notebook
  • Mortality is computed in DCIS_MORTALITA1 as FUNZ_BIO=PROBDEATH
  • Data is available for each age (up to 119 years old; the oldest ages are extrapolated, because nobody has lived that long)
  • Data is available for each year of observation since 1974
  • The methodology for the calculation is presented in https://www.istat.it/it/files/2018/08/volume-tavole-mortalita-1998.pdf
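As a quick illustration of what PROBDEATH represents, the sketch below converts three per-1000 values (ages 67 to 69 in 1974, as they appear in the query output further down) into probabilities and chains them into the probability of surviving from age 67 to age 70. It assumes the values are conditional one-year death probabilities, as in a standard period life table; `survival_probability` is a hypothetical helper, not part of istatapi.

```python
import numpy as np

def survival_probability(q_per_1000):
    """Probability of surviving all the given consecutive ages.

    Assumes each value is a one-year death probability per 1000 people.
    """
    q = np.asarray(q_per_1000, dtype=float) / 1000.0
    return float(np.prod(1.0 - q))

# PROBDEATH for ages 67, 68, 69 in 1974 (per 1000 people)
q_1974 = [24.37493, 26.23550, 29.21105]
p_67_to_70 = survival_probability(q_1974)
print(f"P(survive 67 -> 70, 1974 table) = {p_67_to_70:.4f}")
```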
In [1]:
import numpy as np
import pandas as pd
import requests
import matplotlib
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import warnings 
from istatapi import discovery, retrieval

pio.renderers.default = 'vscode+notebook'
warnings.filterwarnings('ignore')
requests.urllib3.disable_warnings() # avoid "InsecureRequestWarning: Unverified HTTPS request is being made to host 'sdmx.istat.it'. Adding certificate verification is strongly advised"

def get_colors(n, cmap_name="rainbow"):
    """Get colors for px colors_discrete argument, given the number of colors needed, n."""
    cmap = matplotlib.colormaps[cmap_name]
    colors = [cmap(i) for i in np.linspace(0, 1, n)]  # Generate colors
    # matplotlib returns RGBA floats in [0, 1]; scale the RGB channels to 0-255
    colors_str = [f"rgba({int(color[0]*255)}, {int(color[1]*255)}, {int(color[2]*255)}, 1.0)" for color in colors]
    return colors_str
In [2]:
ds = discovery.DataSet(dataflow_identifier="DCIS_MORTALITA1") 
ds.set_filters(
    freq="A", 
    #eta="TOTAL", 
    itter107="IT", sesso="9", funz_bio="PROBDEATH"
)
df5 = retrieval.get_data(ds)
df5.loc[:, lambda dfx: (~dfx.isna()).any(axis=0)]
Out[2]:
DATAFLOW FREQ ETA_CLASSI_ETA ITTER107 SESSO FUNZ_BIO TIME_PERIOD OBS_VALUE OBS_STATUS
0 IT1:26_295(1.1) A Y_UN4 IT 9 PROBDEATH 1974-01-01 27.34902 NaN
510 IT1:26_295(1.1) A Y104 IT 9 PROBDEATH 1974-01-01 602.90046 NaN
5304 IT1:26_295(1.1) A Y69 IT 9 PROBDEATH 1974-01-01 29.21105 NaN
5253 IT1:26_295(1.1) A Y68 IT 9 PROBDEATH 1974-01-01 26.23550 NaN
5202 IT1:26_295(1.1) A Y67 IT 9 PROBDEATH 1974-01-01 24.37493 NaN
... ... ... ... ... ... ... ... ... ...
4742 IT1:26_295(1.1) A Y6 IT 9 PROBDEATH 2024-01-01 0.08638 e
4691 IT1:26_295(1.1) A Y59 IT 9 PROBDEATH 2024-01-01 4.27162 e
4640 IT1:26_295(1.1) A Y58 IT 9 PROBDEATH 2024-01-01 3.91415 e
4946 IT1:26_295(1.1) A Y62 IT 9 PROBDEATH 2024-01-01 5.58787 e
7343 IT1:26_295(1.1) A Y99 IT 9 PROBDEATH 2024-01-01 321.53069 e

7344 rows × 9 columns

In [3]:
dfdp = (
    df5
    .rename(columns={"ETA_CLASSI_ETA": "ETA"})
    .query("ETA!='TOTAL'")
    .query("~ETA.str.contains('-|_')")
    .assign(age= lambda x: [ int(a.split("Y")[-1]) for a in x["ETA"] ])
    .groupby(["age", "TIME_PERIOD"], as_index=False)["OBS_VALUE"].sum()
    .assign(year= lambda x: x["TIME_PERIOD"].dt.year)
    .rename(columns={"OBS_VALUE": "probdeath"}) # values are death probabilities, not death counts
    .pivot(index="age", columns="year", values="probdeath")
    .sort_index()
    .div(1000) # PROBDEATH is per 1000 people
)
dfdp.to_csv("../data/deathprob_by_age_year.csv")
dfdp
Out[3]:
year 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 ... 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024
age
0 0.024431 0.023538 0.021786 0.020453 0.018872 0.017614 0.016245 0.015062 0.013304 0.012488 ... 0.002978 0.002962 0.002915 0.002918 0.002784 0.002632 0.002407 0.002346 0.002460 0.002572
1 0.001161 0.001034 0.000992 0.000951 0.000931 0.000891 0.000880 0.000842 0.000818 0.000769 ... 0.000211 0.000190 0.000203 0.000204 0.000208 0.000191 0.000182 0.000175 0.000199 0.000209
2 0.000803 0.000729 0.000687 0.000662 0.000643 0.000619 0.000617 0.000583 0.000570 0.000541 ... 0.000162 0.000148 0.000145 0.000145 0.000152 0.000143 0.000135 0.000125 0.000138 0.000144
3 0.000580 0.000540 0.000499 0.000480 0.000460 0.000443 0.000448 0.000422 0.000415 0.000394 ... 0.000124 0.000115 0.000107 0.000109 0.000114 0.000110 0.000101 0.000094 0.000102 0.000107
4 0.000450 0.000430 0.000392 0.000377 0.000353 0.000338 0.000347 0.000329 0.000326 0.000305 ... 0.000098 0.000092 0.000084 0.000089 0.000089 0.000088 0.000079 0.000076 0.000082 0.000088
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
115 0.830542 0.842068 0.840793 0.832899 0.820353 0.821158 0.830536 0.812013 0.827645 0.844510 ... 0.812349 0.783121 0.790757 0.785699 0.789470 0.825086 0.801922 0.827020 0.812427 0.798995
116 0.840824 0.851811 0.851071 0.844183 0.832275 0.833640 0.842781 0.824728 0.840211 0.855815 ... 0.831453 0.805091 0.811518 0.807117 0.810286 0.843005 0.821794 0.844665 0.832323 0.820611
117 0.849532 0.860006 0.859822 0.854056 0.842874 0.844825 0.853700 0.836179 0.851242 0.865447 ... 0.848947 0.825683 0.830815 0.825405 0.829654 0.859216 0.840126 0.860587 0.850653 0.840784
118 0.856800 0.866782 0.867174 0.862612 0.852221 0.854774 0.863353 0.846417 0.860838 0.873540 ... 0.864874 0.844921 0.848679 0.843982 0.847608 0.873775 0.856952 0.874836 0.867463 0.859555
119 0.862763 0.872263 0.873250 0.869934 0.860379 0.863539 0.871795 0.855488 0.869114 0.880248 ... 0.879288 0.862830 0.865155 0.861231 0.864192 0.886750 0.872318 0.887476 0.882806 0.876970

120 rows × 51 columns

In [4]:
fig, axs = plt.subplots(12, 10, figsize=[24, 18], sharex=True)
for iplot, age in enumerate(dfdp.index):
    ax = axs.flatten()[iplot]
    ax.plot(dfdp.columns, dfdp.loc[age] * 100, color="red")
    ax.set_title(f"Age: {age}") 
    ax.axvline(2003, color="black", linestyle="-", lw=0.5, alpha=0.5) # heatwave - investigate in Notebook 53
    ax.axvline(2020, color="black", linestyle="-", lw=0.5, alpha=0.5) # COVID-19, also change in methodology (see Notebook 25)
    ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f'{x:.2f}%'))
plt.tight_layout()
plt.show()
In [5]:
# use plotly to plot all ages in a single plot
fig = go.Figure()
first_year = dfdp.columns.min()
colors = get_colors(len(dfdp.index))
for i, age in enumerate(dfdp.index):
    fig.add_trace(go.Scatter(
        x=dfdp.columns, y=dfdp.loc[age]/dfdp.loc[age, first_year], mode='lines', name=age, line_color=colors[i]))
# each series is divided by its value in first_year, so label the axis accordingly
fig.update_layout(title="Mortality rate by age group", xaxis_title="Year", yaxis_title=f"Mortality rate (relative to {first_year})", legend_title="Age group", width=1000)
fig.show()

Conclusions

  • Mortality declined sharply from 1974 to 2000, except for a bump at ages 1-40, especially evident in people around 30 years old, with its peak around 1990-1995. It is probably due to drug abuse, but why does it also affect people so young? Unhealthy parents?
  • From 2000 to now, results are mixed: for some ages mortality is still going down as it did before 2000 (e.g., 78-85 years old), for some it has plateaued with noise (90+ years old), and for many it has reached a plateau only in the last few years (45-77).
  • Covid-19 increased mortality, which for many ages has not yet returned to the pre-Covid trend: is this due to a worsening of Italian health care?
  • In relative terms (current mortality over mortality in 1974), younger people improved the most: while people aged 50-80 had a 60% decrease in mortality (relative to 1974), the decrease reaches 80% for those under 18.
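
The relative improvement quoted in the last bullet can be checked with a few lines. This is a minimal sketch on a two-row excerpt of the age-by-year table shown earlier (ages 0 and 1, years 1974 and 2024); `relative_decrease` is a hypothetical helper, and on the full data `dfdp` would be passed instead of the excerpt.

```python
import pandas as pd

def relative_decrease(df, base_year, last_year):
    """Fractional drop in death probability between two year columns, per age."""
    return 1.0 - df[last_year] / df[base_year]

# Excerpt of the age x year table (probabilities, already divided by 1000)
excerpt = pd.DataFrame(
    {1974: [0.024431, 0.001161], 2024: [0.002572, 0.000209]},
    index=pd.Index([0, 1], name="age"),
)
decrease = relative_decrease(excerpt, 1974, 2024)
print((decrease * 100).round(1))  # percent decrease per age
```

For both of these ages the decrease is above 80%.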

Follow-up

  • Reproduce the PROBDEATH values from raw death counts. Only data grouped into 5-year age groups are available: can I find single-age raw data? Or is it possible to do it from the 5-year age-group data?
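
One possible starting point for this follow-up, assuming only deaths and mid-year population per 5-year age group are available, is the textbook actuarial conversion from a central death rate to a death probability over the interval, under the assumption that deaths are spread evenly within the interval. This is not necessarily the method ISTAT uses (the methodology document linked at the top should be checked); the function name is hypothetical and the numbers are made up for illustration.

```python
def death_probability(deaths, population, n_years=5):
    """Approximate probability of dying within an n-year age interval.

    Uses q = n*m / (1 + (n/2)*m), where m = deaths / population is the
    central death rate. Assumes deaths are uniformly distributed over the
    interval; illustrative only, not ISTAT's published method.
    """
    m = deaths / population
    return n_years * m / (1.0 + 0.5 * n_years * m)

# Made-up numbers: 1,000 deaths in one year among 100,000 people aged 60-64
q_5y = death_probability(1_000, 100_000)
print(f"5-year death probability: {q_5y:.5f}")
```

Going from 5-year groups to single ages would additionally need some interpolation of the rates across ages, which this sketch does not attempt.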