#Importing packages
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.wkt import loads
from shapely.geometry import mapping
import matplotlib.pyplot as plt
Part 2
Data Manipulation: preparing the dataset for analysis
In this tutorial, we will intersect the fire-climate classification (FC) vector data with the stacked and filtered geopackage we created in Part 1 to assess the fire risk for each project. Since our geopackage contains both points and polygons, we will separately determine the fire risk category for each data type and then merge the resulting dataframes.
We begin by importing the required packages and loading the FC vector and the filtered and stacked geopackage.
##Loading the geopackage and fire-climate classification vector
data_ad_ifm = gpd.read_file('/home/jupyter/fire_risk/project_boundary/processed/data_ad_ifm.gpkg')
fire_vector = gpd.read_file('/home/jupyter/fire_risk/fire_data/fire_vec/fire_vec_4.geojson')
Finding the fire risk category for polygons
First, check which geometry types the dataframe contains, then subset it to polygon and multipolygon geometries.
#Viewing the unique geometry types
data_ad_ifm.geom_type.unique()
#Subsetting the dataframe for polygons and multipolygons
data_ad_ifm_pol = data_ad_ifm[data_ad_ifm.geom_type.isin(['Polygon', 'MultiPolygon'])]
data_ad_ifm_pol.geom_type.unique()
Next, we will intersect this data with the FC vector data to calculate the area for each intersection. Before doing so, we will convert the fire classification data from numerical to categorical format.
fire_vector['FC_cat'] = fire_vector['FC'].astype("category")
fire_vector.dtypes
The code below may take some time to run, so hang in there.
#Intersect 'data_ad_ifm_pol' with 'fire_vector' to find overlapping geometries and store the result in 'ad_ifm_pol_fire'.
ad_ifm_pol_fire = gpd.overlay(data_ad_ifm_pol, fire_vector, how='intersection')
# Calculate area of each intersection geometry
ad_ifm_pol_fire['intersection_area'] = ad_ifm_pol_fire.geometry.area
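Note that `.area` is computed in the units of the layer's coordinate reference system. If the layers are stored in a geographic CRS such as EPSG:4326 (an assumption; the CRS of these files is not stated here), the areas come out in square degrees. The sketch below shows one way to reproject to an equal-area CRS before measuring area. The toy box and the choice of EPSG:6933 are illustrative assumptions, not part of the original workflow.

```python
import geopandas as gpd
from shapely.geometry import box

# Toy 1 degree x 1 degree cell at the equator in WGS 84 (illustrative data only)
toy = gpd.GeoDataFrame({'ProjectID': ['P1']}, geometry=[box(0, 0, 1, 1)], crs='EPSG:4326')

# Reproject to a global equal-area CRS (EPSG:6933 is an assumed choice)
# so that .area is returned in square metres
toy_equal_area = toy.to_crs('EPSG:6933')
toy_equal_area['area_km2'] = toy_equal_area.geometry.area / 1e6
```

For ranking fire classes within a single small project, the choice of CRS may matter little, but it does matter if the areas are reported or compared across latitudes.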
A project polygon may span multiple fire classification categories. We will calculate the area covered by each category and assign the category with the largest area to the project. We will group the data by Project ID and select the fire classification with the maximum area for each project.
ad_ifm_pol_fire
# Group by project ID and find the fire class with the maximum area
predominant_fire_class_pol = ad_ifm_pol_fire.groupby('ProjectID').apply(
    lambda x: x.loc[x['intersection_area'].idxmax(), ['FC_cat', 'intersection_area']]
).reset_index()
# Rename columns if needed
predominant_fire_class_pol = predominant_fire_class_pol.rename(columns={'FC_cat': 'predominant_fire_class'})
predominant_fire_class_pol
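The selection above keeps the fire class of the single largest intersection piece in each project. If one fire class is split across several pieces within a project, summing the area per class first may be closer to the stated goal of finding the class that covers the most area. A minimal sketch on toy data (the values and the names `pieces`, `area_by_class` and `predominant` are hypothetical):

```python
import pandas as pd

# Toy overlay result: in project P1, class 12 is split into two pieces
pieces = pd.DataFrame({
    'ProjectID': ['P1', 'P1', 'P1', 'P2'],
    'FC_cat': [11, 12, 12, 31],
    'intersection_area': [5.0, 3.0, 4.0, 2.0],
})

# Total area per project and fire class
area_by_class = (
    pieces.groupby(['ProjectID', 'FC_cat'], as_index=False)['intersection_area'].sum()
)

# Sort so the largest total comes first, then keep the first row per project
largest_first = area_by_class.sort_values('intersection_area', ascending=False)
predominant = (
    largest_first.drop_duplicates('ProjectID')
    .sort_values('ProjectID')
    .reset_index(drop=True)
)
```

In this toy case the largest single piece of P1 belongs to class 11 (5.0), but class 12 covers more area in total (7.0), so the two approaches give different answers.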
Next, merge the predominant_fire_class dataframe with the data_ad_ifm_pol dataframe using Project ID as the key.
data_ad_ifm_pol = pd.merge(data_ad_ifm_pol, predominant_fire_class_pol, on='ProjectID', how='left')
data_ad_ifm_pol
We have now identified the predominant fire class for each project and incorporated this information into the data_ad_ifm_pol dataframe.
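A left merge keeps every project row, so it can be worth checking that no rows were duplicated and seeing which projects found no fire class. The sketch below uses the `validate` and `indicator` parameters of `pd.merge` on toy tables; the table contents and the names `projects`, `fire_class` and `unmatched` are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the project table and the per-project fire class table
projects = pd.DataFrame({'ProjectID': ['P1', 'P2', 'P3'], 'name': ['a', 'b', 'c']})
fire_class = pd.DataFrame({'ProjectID': ['P1', 'P2'], 'predominant_fire_class': [12, 31]})

# validate raises MergeError if the right table has duplicate ProjectIDs;
# indicator adds a '_merge' column showing which rows found a match
merged = pd.merge(projects, fire_class, on='ProjectID', how='left',
                  validate='many_to_one', indicator=True)

# Projects that did not intersect any fire class polygon
unmatched = merged.loc[merged['_merge'] == 'left_only', 'ProjectID'].tolist()
```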
Finding the fire risk category for points
We will apply the same steps to the points data. First, filter the dataframe to include only point geometries.
data_ad_ifm_points = data_ad_ifm[data_ad_ifm.geom_type.isin(['Point', 'MultiPoint'])]
data_ad_ifm_points
Following this, intersect the points dataframe with the FC vector to obtain the fire classification for each point.
#Spatially join 'data_ad_ifm_points' with 'fire_vector' using a left join.
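# Note: 'predicate' replaces the older 'op' argument, which recent geopandas releases no longer accept.
ad_ifm_points_fire = gpd.sjoin(data_ad_ifm_points, fire_vector, how='left', predicate='intersects')
ad_ifm_points_fire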
Projects may contain multiple points. In such cases, we aim to identify the most frequently occurring fire class for each project. If no single fire class dominates, we prioritize a non-zero class.
#Calculate the number of points in each fire class per project
ad_ifm_points_fire['count'] = 1
# observed=True keeps only the fire classes that actually occur in each project
fire_class_counts = ad_ifm_points_fire.groupby(['ProjectID', 'FC_cat'], observed=True).agg({'count': 'sum'}).reset_index()
#Function to find the predominant fire class with tie-breaking logic
def find_predominant_fire_class(group):
    # Sort by count in descending order and by fire class to prioritize non-zero classes
    sorted_group = group.sort_values(by=['count', 'FC_cat'], ascending=[False, False])
    return sorted_group.iloc[0]
# Apply the function to each group
predominant_fire_class_point = fire_class_counts.groupby('ProjectID').apply(find_predominant_fire_class).reset_index(drop=True)
# Rename columns if needed
predominant_fire_class_point = predominant_fire_class_point.rename(columns={'FC_cat': 'predominant_fire_class', 'count': 'intersection_points'})
predominant_fire_class_point
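To see how the tie-breaking behaves, here is a small sketch on toy counts. It uses sorting plus `drop_duplicates` instead of `groupby().apply()`, which is an alternative formulation rather than the code used above; the values and the names `counts`, `ranked` and `predominant` are hypothetical.

```python
import pandas as pd

# Toy counts of points per project and fire class
counts = pd.DataFrame({
    'ProjectID': ['P1', 'P1', 'P2', 'P2'],
    'FC_cat': [0, 12, 31, 32],
    'count': [2, 2, 3, 1],
})

# Highest count first; on a tie, the higher class code comes first,
# so class 0 (non fire-prone) loses ties against any non-zero class
ranked = counts.sort_values(['ProjectID', 'count', 'FC_cat'], ascending=[True, False, False])

# Keep the top-ranked row of each project
predominant = ranked.drop_duplicates('ProjectID').reset_index(drop=True)
```

In project P1 the classes 0 and 12 are tied at two points each, and class 12 is selected. A tie between two non-zero classes is resolved toward the higher code, which is a side effect of the sort order rather than a rule stated in the text.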
Merge the predominant_fire_class dataframe with the points dataframe using Project ID as the key.
data_ad_ifm_points = pd.merge(data_ad_ifm_points, predominant_fire_class_point, on='ProjectID', how='left')
data_ad_ifm_points
Just as we did with the polygon dataframe (data_ad_ifm_pol), we have now added the predominant fire class to the points dataframe (data_ad_ifm_points).
In the next section, we will merge these two dataframes into one and convert the codes in the ‘predominant_fire_class’ column into more meaningful values.
Merging the points and polygons dataframe into one
ad_ifm_fc = pd.concat([data_ad_ifm_pol, data_ad_ifm_points]).reset_index(drop=True)
ad_ifm_fc
#renaming the predominant_fire_class column
ad_ifm_fc.rename(columns={'predominant_fire_class': 'Fire Risk'}, inplace=True)
Having added the fire risk type for each project, we now need to extract additional information from this column for further analysis. The fire-climate classification codes mentioned in Part 1 allow us to infer both the ecosystem type and the fire class. To facilitate this, we create three separate columns:
- The first column maps the codes to both ecosystem type and fire class.
- The second column maps only to the fire class.
- The third column maps only to the ecosystem type.
We categorize the fire risk as follows: “Recurrent” is translated to “High,” “Occasional” to “Medium,” and “Infrequent” to “Low.”
#Adding a column that translates fire risk from number to its actual classification in words
fire_risk_ecosystem_map = {
    11: 'Tropical - dry season - recurrent',
    12: 'Tropical - dry season - occasional',
    13: 'Tropical - dry season - infrequent',
    21: 'Arid - fuel limited - recurrent',
    22: 'Arid - fuel limited - occasional',
    23: 'Arid - fuel limited - infrequent',
    31: 'Temperate - dry hot season - recurrent',
    32: 'Temperate - dry hot season - occasional',
    33: 'Temperate - dry hot season - infrequent',
    41: 'Boreal - hot season - recurrent',
    42: 'Boreal - hot season - occasional',
    43: 'Boreal - hot season - infrequent',
    0: 'Non fire-prone'
}
#Translate number to fire risk class only
fire_risk_type_map = {
    11: 'High',
    12: 'Medium',
    13: 'Low',
    21: 'High',
    22: 'Medium',
    23: 'Low',
    31: 'High',
    32: 'Medium',
    33: 'Low',
    41: 'High',
    42: 'Medium',
    43: 'Low',
    0: 'Non fire-prone'
}
#Translate numerical code to its corresponding ecosystem
ecosystem_type_map = {
    0: 'NA',
    11: 'Tropical',
    12: 'Tropical',
    13: 'Tropical',
    21: 'Arid',
    22: 'Arid',
    23: 'Arid',
    31: 'Temperate',
    32: 'Temperate',
    33: 'Temperate',
    41: 'Boreal',
    42: 'Boreal',
    43: 'Boreal'
}
# Adding new columns using map function
ad_ifm_fc['fire_risk_ecosystem'] = ad_ifm_fc['Fire Risk'].map(fire_risk_ecosystem_map)
ad_ifm_fc['fire_risk_type'] = ad_ifm_fc['Fire Risk'].map(fire_risk_type_map)
ad_ifm_fc['ecosystem_type'] = ad_ifm_fc['Fire Risk'].map(ecosystem_type_map)
ad_ifm_fc
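The dictionaries above follow a regular pattern: in the codes listed, the tens digit identifies the ecosystem and the units digit the fire frequency. As a sketch, the same lookups could be derived from two small tables instead of being typed out in full. The helper `decode_fc` and the names `ECOSYSTEMS`, `RISK_LEVELS`, `ecosystem_type_derived` and `fire_risk_type_derived` are hypothetical and introduced only for illustration.

```python
# Lookup tables taken from the dictionaries above
ECOSYSTEMS = {1: 'Tropical', 2: 'Arid', 3: 'Temperate', 4: 'Boreal'}
RISK_LEVELS = {1: 'High', 2: 'Medium', 3: 'Low'}  # recurrent, occasional, infrequent

def decode_fc(code):
    """Return (ecosystem, risk level) for a fire-climate code (hypothetical helper)."""
    if code == 0:
        return 'NA', 'Non fire-prone'
    # Tens digit is the ecosystem, units digit is the fire frequency
    ecosystem_digit, frequency_digit = divmod(int(code), 10)
    return ECOSYSTEMS[ecosystem_digit], RISK_LEVELS[frequency_digit]

# Rebuild the per-code dictionaries for all 13 codes
codes = [0] + [10 * e + f for e in ECOSYSTEMS for f in RISK_LEVELS]
ecosystem_type_derived = {c: decode_fc(c)[0] for c in codes}
fire_risk_type_derived = {c: decode_fc(c)[1] for c in codes}
```

Deriving the maps this way keeps the ecosystem and risk labels in one place, at the cost of being less explicit than the hand-written dictionaries.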
#Save this dataframe to use in the next tutorial
ad_ifm_fc.to_file('path/to/save/datafram/ad_ifm_fc.gpkg', driver='GPKG')