Part 2 Data Manipulation: preparing the dataset for analysis

In this tutorial, we will intersect the fire-climate classification (FC) vector data with the stacked and filtered geopackage we created in Part 1 to assess the fire risk for each project. Since our geopackage contains both points and polygons, we will separately determine the fire risk category for each data type and then merge the resulting dataframes.

We begin by importing the required packages and loading the FC vector and the filtered, stacked geopackage.

#Importing packages
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.wkt import loads
from shapely.geometry import mapping
import matplotlib.pyplot as plt

# Load the geopackage and the fire-climate classification vector
data_ad_ifm = gpd.read_file('/home/jupyter/fire_risk/project_boundary/processed/data_ad_ifm.gpkg')

fire_vector = gpd.read_file('/home/jupyter/fire_risk/fire_data/fire_vec/fire_vec_4.geojson')

Finding the fire risk category for polygons

First, inspect the geometry types in the dataframe and keep only the rows whose geometry is a polygon or multipolygon.

#Viewing the unique geometry types
data_ad_ifm.geom_type.unique()

# Subset the dataframe for polygons and multipolygons
data_ad_ifm_pol = data_ad_ifm[data_ad_ifm.geom_type.isin(['Polygon', 'MultiPolygon'])]
data_ad_ifm_pol.geom_type.unique()

Next, we will intersect this data with the FC vector data to calculate the area for each intersection. Before doing so, we will convert the fire classification data from numerical to categorical format.

fire_vector['FC_cat'] = fire_vector['FC'].astype('category')
fire_vector.dtypes

The code below might take some time to run, so hang in there.

#Intersect 'data_ad_ifm_pol' with 'fire_vector' to find overlapping geometries and store the result in 'ad_ifm_pol_fire'.
ad_ifm_pol_fire = gpd.overlay(data_ad_ifm_pol, fire_vector, how='intersection')

# Calculate the area of each intersection geometry (in the units of the layer's CRS)
ad_ifm_pol_fire['intersection_area'] = ad_ifm_pol_fire.geometry.area
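
One caveat on the area step: `.area` returns values in the units of the layer's coordinate reference system, so if the layers happen to be stored in a geographic CRS such as EPSG:4326, the result is in squared degrees rather than square metres. The sketch below shows one way to reproject to an equal-area CRS before measuring. The toy one-degree square and the choice of EPSG:6933 are assumptions made for illustration; they are not taken from the tutorial data.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy layer (hypothetical): a 1 x 1 degree square at the equator in EPSG:4326
toy = gpd.GeoDataFrame(
    {'ProjectID': ['P1']},
    geometry=[box(0, 0, 1, 1)],
    crs='EPSG:4326',
)

# Reproject to an equal-area CRS (EPSG:6933 is an assumed choice) before measuring
toy_equal_area = toy.to_crs('EPSG:6933')

# Areas are now in square metres; convert to square kilometres
toy_equal_area['area_km2'] = toy_equal_area.geometry.area / 1e6
print(toy_equal_area[['ProjectID', 'area_km2']])
```

If you adapt this, check `data_ad_ifm_pol.crs` and `fire_vector.crs` first; a projected CRS suited to your study region may be a better choice than a global one.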

A project polygon may span multiple fire classification categories. We will calculate the area covered by each category and assign the category with the largest area to the project. We will group the data by Project ID and select the fire classification with the maximum area for each project.

ad_ifm_pol_fire

# Group by project ID and find the fire class with the maximum area
predominant_fire_class_pol = ad_ifm_pol_fire.groupby('ProjectID').apply(
    lambda x: x.loc[x['intersection_area'].idxmax(), ['FC_cat', 'intersection_area']]
).reset_index()

# Rename FC_cat to a more descriptive column name
predominant_fire_class_pol = predominant_fire_class_pol.rename(columns={'FC_cat': 'predominant_fire_class'})
predominant_fire_class_pol
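
Note that `idxmax` above selects the single largest intersection piece per project. If the overlay returns several pieces of the same fire class for one project, the class with the largest total area can differ from the class of the largest piece. A minimal sketch of summing the areas per class first, using made-up numbers (the `toy` frame and its values are hypothetical):

```python
import pandas as pd

# Hypothetical overlay output: project P1 has two separate pieces of class 12
toy = pd.DataFrame({
    'ProjectID': ['P1', 'P1', 'P1', 'P2'],
    'FC_cat': [11, 12, 12, 0],
    'intersection_area': [5.0, 3.0, 4.0, 2.0],
})

# Total area per project and fire class
area_by_class = toy.groupby(['ProjectID', 'FC_cat'], as_index=False)['intersection_area'].sum()

# For each project, keep the class with the largest total area
rows = area_by_class.groupby('ProjectID')['intersection_area'].idxmax()
largest = area_by_class.loc[rows].reset_index(drop=True)
print(largest)
```

Here the largest single piece for P1 belongs to class 11 (5.0), but class 12 covers more area in total (7.0), so the summed version assigns class 12.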

Next, merge the predominant_fire_class_pol dataframe with the data_ad_ifm_pol dataframe using Project ID as the key.

data_ad_ifm_pol = pd.merge(data_ad_ifm_pol, predominant_fire_class_pol, on='ProjectID', how='left')
data_ad_ifm_pol
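
A left merge keeps every project, so any project that did not intersect the FC vector ends up with a missing fire class. The sketch below shows one way to surface such rows, using the `validate` and `indicator` options of `pd.merge`. The two small frames are hypothetical stand-ins for the tutorial dataframes.

```python
import pandas as pd

# Hypothetical stand-ins for the project table and the per-project fire class table
projects = pd.DataFrame({'ProjectID': ['P1', 'P2', 'P3'], 'name': ['a', 'b', 'c']})
classes = pd.DataFrame({'ProjectID': ['P1', 'P2'], 'predominant_fire_class': [12, 0]})

# validate raises an error if a ProjectID appears more than once in `classes`;
# indicator adds a `_merge` column recording where each row came from
merged = pd.merge(projects, classes, on='ProjectID', how='left',
                  validate='many_to_one', indicator=True)

# Projects that received no fire class
unmatched = merged.loc[merged['_merge'] == 'left_only', 'ProjectID'].tolist()
print(unmatched)
```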

We have now identified the predominant fire class for each project and incorporated this information into the data_ad_ifm_pol dataframe.

Finding the fire risk category for points

We will apply the same steps to the points data. First, filter the dataframe to include only point geometries.

data_ad_ifm_points = data_ad_ifm[data_ad_ifm.geom_type.isin(['Point', 'MultiPoint'])]
data_ad_ifm_points

Following this, intersect the points dataframe with the FC vector to obtain the fire classification for each point.

# Spatially join 'data_ad_ifm_points' with 'fire_vector' using a left join.
# Note: the 'op' keyword was deprecated in geopandas 0.10 in favor of 'predicate'.
ad_ifm_points_fire = gpd.sjoin(data_ad_ifm_points, fire_vector, how='left', predicate='intersects')

ad_ifm_points_fire
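
To see what the left spatial join returns, here is a small sketch on toy data. The two points and the single square zone are hypothetical. A point that falls outside every FC polygon is kept, with missing values in the columns that come from the right-hand layer.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical inputs: one point inside the zone, one far outside it
points = gpd.GeoDataFrame(
    {'ProjectID': ['P1', 'P2']},
    geometry=[Point(0.5, 0.5), Point(5, 5)],
    crs='EPSG:4326',
)
zones = gpd.GeoDataFrame({'FC': [11]}, geometry=[box(0, 0, 1, 1)], crs='EPSG:4326')

# Left join: every point is kept, matched or not
joined = gpd.sjoin(points, zones, how='left', predicate='intersects')

# Count the points that did not fall inside any zone
n_unmatched = int(joined['FC'].isna().sum())
print(joined[['ProjectID', 'FC']])
print(n_unmatched)
```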

Projects may contain multiple points. In such cases, we aim to identify the most frequently occurring fire class for each project. If no single fire class dominates, we prioritize a non-zero class.

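# Calculate the number of points in each fire class per project
ad_ifm_points_fire['count'] = 1
# observed=True keeps only the (project, class) pairs that actually occur;
# without it, a categorical grouper adds every unused class with a count of 0
fire_class_counts = ad_ifm_points_fire.groupby(['ProjectID', 'FC_cat'], observed=True).agg({'count': 'sum'}).reset_index()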

#Function to find the predominant fire class with tie-breaking logic
def find_predominant_fire_class(group):
    # Sort by count (descending), then by fire class code (descending),
    # so that a tie goes to the higher code and never to class 0
    sorted_group = group.sort_values(by=['count', 'FC_cat'], ascending=[False, False])
    return sorted_group.iloc[0]

# Apply the function to each group
predominant_fire_class_point = fire_class_counts.groupby('ProjectID').apply(find_predominant_fire_class).reset_index(drop=True)

# Rename FC_cat and count to more descriptive column names
predominant_fire_class_point = predominant_fire_class_point.rename(columns={'FC_cat': 'predominant_fire_class', 'count': 'intersection_points'})

predominant_fire_class_point
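
The tie-breaking rule is easiest to check on a small example. The sketch below applies the same two-key sort to made-up counts (the `counts` frame and the helper name `pick_class` are hypothetical). Project P1 has a tie between class 0 and class 12, and P2 has a clear majority.

```python
import pandas as pd

# Hypothetical per-project point counts by fire class
counts = pd.DataFrame({
    'ProjectID': ['P1', 'P1', 'P2', 'P2'],
    'FC_cat': [0, 12, 11, 23],
    'count': [2, 2, 3, 1],
})

def pick_class(group):
    # Highest count first; on a tie, the higher class code comes first
    ordered = group.sort_values(by=['count', 'FC_cat'], ascending=[False, False])
    return ordered.iloc[0]

# Apply the rule to each project and collect the chosen class
picked = {pid: int(pick_class(g)['FC_cat']) for pid, g in counts.groupby('ProjectID')}
print(picked)
```

Because the sort is descending on the class code, a tie between two non-zero classes also resolves to the higher code (for example, 23 would win over 11). If a different preference is needed, the second sort key is the place to change it.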

Merge the predominant_fire_class_point dataframe with the points dataframe using Project ID as the key.

data_ad_ifm_points = pd.merge(data_ad_ifm_points, predominant_fire_class_point, on='ProjectID', how='left')
data_ad_ifm_points

Just as we did with the polygon dataframe (data_ad_ifm_pol), we have now added the predominant fire class to the points dataframe (data_ad_ifm_points).

In the next section, we will merge these two dataframes into one and convert the codes in the ‘predominant_fire_class’ column into more meaningful values.

Merging the points and polygons dataframes into one

ad_ifm_fc = pd.concat([data_ad_ifm_pol, data_ad_ifm_points]).reset_index(drop=True)
ad_ifm_fc

#renaming the predominant_fire_class column
ad_ifm_fc.rename(columns = {'predominant_fire_class':'Fire Risk'}, inplace = True)

Having added the fire risk type for each project, we now need to extract additional information from this column for further analysis. The fire-climate classification codes mentioned in Part 1 allow us to infer both the ecosystem type and the fire class. To facilitate this, we create three separate columns:

The first column maps the codes to both ecosystem type and fire class.
The second column maps only to the fire class.
The third column maps only to the ecosystem type.

We categorize the fire risk as follows: “Recurrent” is translated to “High,” “Occasional” to “Medium,” and “Infrequent” to “Low.”

# Map each numeric code to its ecosystem type and fire class in words
fire_risk_ecosystem_map = {
    11: 'Tropical - dry season - recurrent',
    12: 'Tropical - dry season - occasional',
    13: 'Tropical - dry season - infrequent',
    21: 'Arid - fuel limited - recurrent',
    22: 'Arid - fuel limited - occasional',
    23: 'Arid - fuel limited - infrequent',
    31: 'Temperate - dry hot season - recurrent',
    32: 'Temperate - dry hot season - occasional',
    33: 'Temperate - dry hot season - infrequent',
    41: 'Boreal - hot season - recurrent',
    42: 'Boreal - hot season - occasional',
    43: 'Boreal - hot season - infrequent',
    0: 'Non fire-prone'
}

#Translate number to fire risk class only
fire_risk_type_map = {
    11: 'High',
    12: 'Medium',
    13: 'Low',
    21: 'High',
    22: 'Medium',
    23: 'Low',
    31: 'High',
    32: 'Medium',
    33: 'Low',
    41: 'High',
    42: 'Medium',
    43: 'Low',
    0: 'Non fire-prone'
}

#Translate numerical code to its corresponding ecosystem
ecosystem_type_map = {
    0: 'NA',
    11: 'Tropical',
    12: 'Tropical',
    13: 'Tropical',
    21: 'Arid',
    22: 'Arid',
    23: 'Arid',
    31: 'Temperate',
    32: 'Temperate',
    33: 'Temperate',
    41: 'Boreal',
    42: 'Boreal',
    43: 'Boreal'
}

# Adding new columns using map function
ad_ifm_fc['fire_risk_ecosystem'] = ad_ifm_fc['Fire Risk'].map(fire_risk_ecosystem_map)
ad_ifm_fc['fire_risk_type'] = ad_ifm_fc['Fire Risk'].map(fire_risk_type_map)
ad_ifm_fc['ecosystem_type'] = ad_ifm_fc['Fire Risk'].map(ecosystem_type_map)

ad_ifm_fc
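
The codes in the dictionaries above follow a regular pattern: the tens digit identifies the ecosystem and the units digit identifies the fire frequency. As an optional cross-check, the sketch below rebuilds two of the maps from that pattern. The helper `decode_fc` and the lookup tables are hypothetical names introduced here, and the digit interpretation is inferred from the dictionaries in this tutorial rather than from the FC documentation.

```python
# Digit meanings, inferred from the hand-written dictionaries in this tutorial
ECOSYSTEMS = {1: 'Tropical', 2: 'Arid', 3: 'Temperate', 4: 'Boreal'}
RISK_LEVELS = {1: 'High', 2: 'Medium', 3: 'Low'}

def decode_fc(code):
    """Return (ecosystem type, fire risk level) for a fire-climate code."""
    if code == 0:
        return 'NA', 'Non fire-prone'
    ecosystem_digit, frequency_digit = divmod(int(code), 10)
    return ECOSYSTEMS[ecosystem_digit], RISK_LEVELS[frequency_digit]

# All codes used in the tutorial: 0 plus 11..13, 21..23, 31..33, 41..43
codes = [0] + [10 * e + f for e in range(1, 5) for f in range(1, 4)]

derived_ecosystem_map = {c: decode_fc(c)[0] for c in codes}
derived_risk_map = {c: decode_fc(c)[1] for c in codes}
print(decode_fc(32))
```

Comparing `derived_risk_map` with `fire_risk_type_map` (and likewise for the ecosystem map) is a quick way to catch a typo in the hand-written dictionaries.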

#Save this dataframe to use in the next tutorial
ad_ifm_fc.to_file('path/to/save/dataframe/ad_ifm_fc.gpkg', driver='GPKG')