Part 3

Data Analysis & Visualization

Determining the vulnerability of forest carbon projects to wildfires

Now that the data is clean and in a usable format, we'll analyze it and create visualizations to determine trends in the vulnerability of projects to wildfire.

#Import required packages
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas as gpd
#Importing the dataframe created at the end of Part 2
ad_ifm_fc = gpd.read_file('path/to/your/geopackage/ad_ifm_fc_test.gpkg')

Number of projects by fire risk category and project type

We will analyze the number of projects in each fire risk category by project type, and then create a table to present the data.

To determine the number of projects in each fire risk category, group the dataframe by the Project Type and fire_risk_type columns, and then count the number of observations within each combination of these categories.

Next, pivot the table so that fire_risk_type becomes the index and Project Type the columns. Finally, add a Total column summing the number of projects across all project types for each fire risk category.

# Group projects by 'Project Type' and 'Fire Risk Type' and count the number of projects in each group
risk_counts = ad_ifm_fc.groupby(['Project Type', 'fire_risk_type']).size().reset_index(name='Number of Projects')

# Create a pivot table with 'Fire Risk Type' as the index and 'Project Type' as columns, summing the number of projects
risk_count_pivot = risk_counts.pivot_table(index='fire_risk_type', columns='Project Type', values='Number of Projects', aggfunc='sum')

# Add a 'Total' column that sums the number of projects across all project types for each fire risk type
risk_count_pivot['Total'] = risk_count_pivot.sum(axis=1)

risk_count_pivot = risk_count_pivot.reset_index()  # Turn the fire_risk_type index back into a regular column

# Print the resulting pivot table
risk_count_pivot

Currently, the data is sorted alphabetically based on the values in the fire_risk_type column.

To sort the data in a more meaningful way, convert the fire_risk_type column to a categorical type with a specific order.

# Convert fire_risk_type to a categorical with an explicit category order so sorting follows it
risk_count_pivot['fire_risk_type'] = pd.Categorical(risk_count_pivot['fire_risk_type'], ["High", "Medium", "Low", "Non fire-prone"])

#Sorting the data by the fire_risk_type column
risk_count_pivot = risk_count_pivot.sort_values('fire_risk_type')

risk_count_pivot
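The effect of the categorical conversion can be shown on a minimal, self-contained sketch (the frame and counts below are invented purely for illustration):

```python
import pandas as pd

# Toy frame with risk labels in arbitrary order (illustrative values only)
df = pd.DataFrame({'risk': ['Low', 'High', 'Non fire-prone', 'Medium'],
                   'n': [3, 7, 2, 5]})

# Without a categorical, sort_values('risk') would sort alphabetically:
# High, Low, Medium, Non fire-prone.
# With an explicit category order, sorting follows the order we specify.
df['risk'] = pd.Categorical(df['risk'], ['High', 'Medium', 'Low', 'Non fire-prone'])
print(df.sort_values('risk')['risk'].tolist())
# ['High', 'Medium', 'Low', 'Non fire-prone']
```

The same mechanism drives the ordering of risk_count_pivot above.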

This table is informative but somewhat basic, so let's enhance its presentation.

#Import package
from great_tables import GT, md, html
gt_fire_class = (
    GT(risk_count_pivot)  # Create a GT table object from the pivoted risk count DataFrame
    .tab_header(
        title=md('**Distribution of Avoided Deforestation (AD) and Improved Forest Management (IFM) Projects by Fire Risk Category**') 
        # Set the table header with a bold title describing the distribution of AD and IFM projects by fire risk category
    )
    .tab_spanner(label=md("**Project Type**"), columns=["AD", "IFM"])  # Create a column spanner labeled "Project Type" for the AD and IFM columns
    .cols_label(
        # Define labels for the columns with bold text formatting
        fire_risk_type=md("**Fire Risk Category**"), # Label for the fire_risk_type column
        AD=md("**Num AD**"),  # Label for the AD column
        IFM=md("**Num IFM**"),  # Label for the IFM column
        Total=md("**Total Projects**")  # Label for the Total column
    )
    .cols_align(
        align="center", columns=["AD", "IFM", "Total"]  # Center-align the text in the AD, IFM, and Total columns
    )
)

gt_fire_class  # Render the formatted table

Percentage of projects in each fire risk category

In this section, we will visualize the distribution of projects across different fire risk categories by creating a pie chart.

We will start by counting the frequency of each fire risk type and then calculate the corresponding percentages for each category.

# Count frequencies of Fire Risk types
fire_risk_type_counts = ad_ifm_fc['fire_risk_type'].value_counts()

# Calculate percentages
fire_risk_type_percentages = (fire_risk_type_counts / fire_risk_type_counts.sum()) * 100

fire_risk_type_percentages
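The counting-and-percentage step can be sketched on made-up values (the labels mirror our categories, but the sample itself is invented):

```python
import pandas as pd

# Invented sample of risk labels, just to show the mechanics
s = pd.Series(['High', 'High', 'High', 'Low', 'Medium', 'Non fire-prone'])

counts = s.value_counts()          # frequency of each label
pct = counts / counts.sum() * 100  # convert counts to percentages
print(pct['High'])                 # 50.0 (3 of 6 samples)
```

The percentages always sum to 100, which makes them a direct input for a pie chart.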

Next, use the percentages in the fire_risk_type_percentages Series to create a pie chart for visualization.

#Labels, taken from the Series index so they always match the order of the percentages
labels = fire_risk_type_percentages.index

# Plotting the pie chart
plt.figure(figsize=(8, 8))  # Adjust the figure size if needed
# Map each category to a fixed color so labels and colors stay aligned regardless of count order
color_map = {'High': '#da1e28', 'Non fire-prone': '#64dfdf', 'Low': '#ffd7d9', 'Medium': '#ff8389'}
colors = [color_map[label] for label in labels]
plt.pie(fire_risk_type_percentages, labels=labels, autopct='%1.1f%%', colors=colors, startangle=140, textprops={'fontsize': 14})

# Adding title
plt.title('Percentage of AD and IFM Projects by Fire Risk Category', fontsize=16, loc='left', pad=20)

# Show plot
plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular.
plt.savefig('fire_risk_high_res.png', dpi=300)

plt.show()

Finding the area under each risk category by ecosystem type

We start by calculating the area of each project.

To ensure accurate area calculations, we need to reproject each project geometry to its respective UTM zone. For this, we will create two functions:

1. Determine the UTM zone: calculates the UTM zone for a project from the longitude and latitude of its centroid.
2. Calculate and convert the area: reprojects the geometry to that UTM zone, computes the area (in square meters by default), and converts it to square kilometers.

#Import packages
import geopandas as gpd
from pyproj import CRS  # Proj and transform are not needed here (pyproj.transform is deprecated)
# Function to determine UTM zone for a given geometry
def get_utm_zone(lon, lat):
    utm_band = str((int((lon + 180) / 6) % 60) + 1)
    if len(utm_band) == 1:
        utm_band = '0' + utm_band
    if lat >= 0:
        epsg_code = f'326{utm_band}'
    else:
        epsg_code = f'327{utm_band}'
    return CRS(f'EPSG:{epsg_code}')

#Convert the geometry to its UTM zone and compute the area in km2
def convert_to_utm_and_compute_area(row):
    geom = row.geometry
    lon, lat = geom.centroid.coords[0]
    utm_crs = get_utm_zone(lon, lat)
    # Create a GeoDataFrame for the single geometry to use the to_crs method
    geom_gdf = gpd.GeoDataFrame(index=[0], crs='EPSG:4326', geometry=[geom])
    geom_utm = geom_gdf.to_crs(utm_crs)
    area_km2 = geom_utm.area.iloc[0] / 1e6  # convert m² to km²
    return area_km2
#Applying the convert_to_utm_and_compute_area function
ad_ifm_fc.crs = 'EPSG:4326'  # Ensure the CRS is set to EPSG:4326
ad_ifm_fc['area_km2'] = ad_ifm_fc.apply(convert_to_utm_and_compute_area, axis=1) #Creating a new column, 'area_km2'.
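The zone arithmetic inside get_utm_zone can be sanity-checked without touching pyproj by reproducing the EPSG-code construction in plain Python (the coordinates below are illustrative test points, not project data):

```python
def utm_epsg(lon, lat):
    # Same arithmetic as get_utm_zone, returning the EPSG code as a string:
    # zone number from the longitude, 326xx for the northern hemisphere, 327xx for the southern
    band = (int((lon + 180) / 6) % 60) + 1
    return f"326{band:02d}" if lat >= 0 else f"327{band:02d}"

print(utm_epsg(-74.0, 40.7))  # New York  -> 32618 (UTM zone 18N)
print(utm_epsg(18.4, -33.9))  # Cape Town -> 32734 (UTM zone 34S)
```

If a centroid lands in an unexpected zone, this standalone check makes it easy to see whether the zone formula or the input coordinates are at fault.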

Since our dataframe includes both point and polygon geometries, the area computation returns 0 for point geometries (a point has no area).

We will first filter out those rows, keeping only projects with a positive area.

Next, we'll group the data by fire_risk_type and ecosystem_type, and sum the areas within each group.

#Keep only rows with a positive area (point geometries yield an area of 0)
ad_ifm_fc_clean = ad_ifm_fc[ad_ifm_fc['area_km2'] > 0].copy()  # .copy() avoids SettingWithCopyWarning downstream

# Group by Fire Risk type and Ecosystem type, then sum the areas
area_by_risk_and_ecosystem = ad_ifm_fc_clean.groupby(['fire_risk_type', 'ecosystem_type'])['area_km2'].sum().reset_index()

area_by_risk_and_ecosystem

Since we do not want to include the "Non fire-prone" category in this graph, we will filter those rows out of the dataframe. (Filtering by value is safer than dropping the last row, since the category can appear in more than one row, one per ecosystem type.)

#Removing the "Non fire-prone" rows
area_by_risk_and_ecosystem = area_by_risk_and_ecosystem[area_by_risk_and_ecosystem['fire_risk_type'] != 'Non fire-prone'].reset_index(drop=True)

As we did when creating the previous table, convert the fire_risk_type column to a categorical type with a specified order for the categories.

area_by_risk_and_ecosystem['fire_risk_type'] = pd.Categorical(area_by_risk_and_ecosystem['fire_risk_type'], ["Low", "Medium", "High"])

Finally, we create a grouped horizontal bar graph with fire_risk_type and ecosystem_type grouped on the y-axis. The x-axis represents the area under each category.

# Define custom colors for each Fire Risk type
colors = {'High': '#da1e28', 'Medium': '#ff8389', 'Low': '#ffd7d9'}
#Labels
labels = ['Low', 'Medium', 'High']

# Plotting
fig, ax = plt.subplots(figsize=(10, 6))
area_by_risk_and_ecosystem.pivot(index='ecosystem_type', columns='fire_risk_type', values='area_km2').plot(kind='barh', stacked=False, 
                            width=0.7, ax=ax, logx=True, color=colors)

# Annotate each bar with its area value
for p in ax.patches:
    width = p.get_width()
    height = p.get_height()
    x, y = p.get_xy() 
    ax.annotate(f'{int(width)}', (x + width * 1.05, y + height / 2), va='center', ha='left')

# Show plot
plt.legend(frameon=False, loc='upper left', bbox_to_anchor=(0.001, 1), ncol=3, labels=labels)
# Remove the x-axis ticks and labels
ax.tick_params(axis='x', which='both', bottom=False, top=False, labelbottom=False)
# Increase y-axis tick label font size
ax.tick_params(axis='y', labelsize=14)

# Customize plot
ax.set_ylabel('')
ax.set_xlabel('Area ($km^2$)', fontsize=14)
ax.set_title('Area of Projects Under Fire Risk Categories by Ecosystem Type', fontsize=16, loc='left', pad=20)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.spines['bottom'].set_visible(False)
ax.spines['left'].set_visible(False)

# Adjust layout and show plot
plt.tight_layout()

plt.savefig('high_resolution_plot.png', dpi=300)

plt.show()