Traffic Occurrences Data Analysis with LightningChart Python
Tutorial
Assisted by AI
Explore traffic accidents data analysis using LightningChart Python to visualize and interpret traffic occurrence patterns effectively.
Introduction
Traffic accidents remain a significant global concern, causing substantial economic losses, injuries, and fatalities each year. Understanding the factors contributing to road accidents is crucial for implementing effective safety measures and reducing risks.
Driver-related attributes, such as age, experience, and decision-making under various conditions, play a pivotal role in accident severity. By visualizing and analyzing the relationships between these factors, we can identify patterns and correlations that inform targeted interventions and policy-making.
This study leverages advanced visualization techniques, like polar charts, to highlight the interplay between driver demographics and accident outcomes. The insights gained aim to guide stakeholders in improving road safety and mitigating accident severity through data-driven strategies.
About the Dataset
This dataset contains detailed information on traffic accidents, including their severity and the factors influencing them. Key features include weather conditions (e.g., clear, rainy, foggy), road types (e.g., highways, city roads, rural roads), time of day (morning, afternoon, evening, night), traffic density, and speed limits.
It also accounts for driver-related factors such as age, experience, and alcohol consumption, alongside vehicle types (e.g., cars, trucks, motorcycles) and road conditions (e.g., dry, wet, icy). The target variable, Accident Severity, classifies accidents as low, moderate, or high, based on damage or injuries, offering a comprehensive view of factors contributing to road safety risks.
LightningChart Python
The visualizations created using LightningChart demonstrate its exceptional capabilities for crafting detailed, interactive, and aesthetically appealing charts. From polar charts highlighting correlations between driver factors and accident severity to multi-panel dashboards comparing traffic variables like weather, road conditions, and time of day, LightningChart excels in showcasing complex datasets with clarity.
Its support for 3D surface plots and correlation heatmaps further emphasizes its power in representing intricate relationships, making it an ideal tool for traffic accident analysis. The smooth interactivity, customizable themes, and advanced visualization options showcase LightningChart as a powerful solution for turning raw data into actionable insights.
Setting Up Python Environment
First, install Python: download and install the latest version from the official Python website. Next, install the required libraries (for example with pip install pandas numpy lightningchart) and import them:
import pandas as pd
import lightningchart as lc
import time
import numpy as np
Visualizing Data with LightningChart Python
Handling Missing Values in the Dataset
To ensure the dataset is clean and ready for analysis, missing values were handled systematically:
- Numerical Columns: Missing values in numerical columns, such as Traffic_Density, Speed_Limit, and Driver_Age, were filled with the median of their respective columns. This approach ensures that outliers do not distort the imputed values.
- Categorical Columns: Missing values in categorical columns, including Weather, Road_Type, and Accident_Severity, were filled with the mode (most frequent value) of each column, maintaining consistency with existing data distributions.
# Load the dataset (the same CSV file that the modeling section reads)
df = pd.read_csv('dataset_traffic_accident_prediction1.csv')
# Fill missing values in numerical columns with the column median
numerical_columns = ['Traffic_Density', 'Speed_Limit', 'Number_of_Vehicles', 'Driver_Alcohol', 'Driver_Age', 'Driver_Experience', 'Accident']
for column in numerical_columns:
    df[column] = df[column].fillna(df[column].median())
# Fill missing values in categorical columns with the most frequent value
categorical_columns = ['Weather', 'Road_Type', 'Time_of_Day', 'Accident_Severity', 'Road_Condition', 'Vehicle_Type', 'Road_Light_Condition']
for column in categorical_columns:
    df[column] = df[column].fillna(df[column].mode()[0])
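To illustrate why the median is the safer fill value for numerical columns, here is a minimal sketch using only the standard library. The speed-limit readings are made up for the example and do not come from the dataset.

```python
from statistics import mean, median

# Hypothetical speed-limit readings (km/h): one missing value, one outlier.
speed_limits = [50, 60, 60, 80, None, 300]

observed = [v for v in speed_limits if v is not None]
fill_mean = mean(observed)      # pulled upward by the 300 outlier
fill_median = median(observed)  # stays at a typical value

# Replace the missing entry with the median, as the tutorial does.
imputed = [fill_median if v is None else v for v in speed_limits]

print(fill_mean, fill_median)  # 110 60
print(imputed)                 # [50, 60, 60, 80, 60, 300]
```

A mean fill of 110 would sit above every ordinary reading, while the median fill of 60 matches the bulk of the data.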
Multiline Chart: Daily Traffic Metrics and Accident Patterns Across Time of Day
This traffic accidents data analysis demonstrates that peak traffic density and higher vehicle counts in the evening are strongly correlated with an uptick in accident rates, marking evening hours as high-risk periods. It also reveals that driver experience significantly influences the likelihood of accidents, pointing to the necessity of implementing targeted safety measures during these risky times.
Conversely, the analysis shows lower accident rates at night, when traffic density is also reduced. Slower driving speeds or fewer distractions may contribute as well, although the chart alone cannot confirm this. These findings emphasize the importance of adopting time-specific traffic management and safety strategies to effectively reduce accident occurrences.
import pandas as pd
import lightningchart as lc
# Map time of day and group data
time_of_day_mapping = {'Morning': 1, 'Afternoon': 2, 'Evening': 3, 'Night': 4}
df['time_of_day_index'] = df['Time_of_Day'].map(time_of_day_mapping)
numerical_columns = ['Traffic_Density', 'Speed_Limit', 'Number_of_Vehicles', 'Driver_Age', 'Driver_Experience', 'Accident']
processed_data = {col: df.groupby('time_of_day_index')[col].mean().reset_index() for col in numerical_columns}
# Initialize chart and axes
chart = lc.ChartXY(theme=lc.Themes.TurquoiseHexagon, title='Metrics and Related Accidents Across Time of Day')
legend = chart.add_legend()
# Add line and point series for each metric, each on its own stacked X axis
for i, (col, df_col) in enumerate(processed_data.items()):
    axis_x = chart.add_x_axis(stack_index=i).set_title(col.replace('_', ' ').title())
    series = chart.add_line_series(x_axis=axis_x, data_pattern='ProgressiveY').add(df_col[col].tolist(), df_col['time_of_day_index'].tolist())
    series.set_name(f'{col.title()} (Average)')
    chart.add_point_series(x_axis=axis_x).add(df_col[col].tolist(), df_col['time_of_day_index'].tolist())
# Configure Y-axis for time of day
y_axis = chart.get_default_y_axis().set_title('Time of Day')
for label, index in time_of_day_mapping.items():
    y_axis.add_custom_tick().set_value(index).set_text(label)
y_axis.set_interval(1, 4)
chart.open('browser')
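The grouping step that feeds this chart can be sketched without pandas. Because the Accident column holds 0 or 1, its mean per time of day is the share of records with an accident. The rows and the helper name mean_by_group below are invented for illustration.

```python
from collections import defaultdict

# Made-up rows: (time of day, traffic density, accident flag).
rows = [
    ("Morning", 1.0, 0),
    ("Morning", 2.0, 1),
    ("Evening", 2.0, 1),
    ("Evening", 2.0, 1),
    ("Night", 0.0, 0),
]

def mean_by_group(rows, value_index):
    """Average the value at value_index for each time-of-day label."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[0]] += row[value_index]
        counts[row[0]] += 1
    return {label: sums[label] / counts[label] for label in sums}

density_by_time = mean_by_group(rows, 1)
accident_rate_by_time = mean_by_group(rows, 2)

print(density_by_time)        # {'Morning': 1.5, 'Evening': 2.0, 'Night': 0.0}
print(accident_rate_by_time)  # {'Morning': 0.5, 'Evening': 1.0, 'Night': 0.0}
```

Each dictionary corresponds to one line in the chart: one averaged metric plotted against the four time-of-day positions.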
Bar Chart: Distribution of Accident Severity Levels
The distribution in the traffic accidents data analysis indicates that most accidents fall into the low severity category, with accidents of moderate severity constituting a smaller fraction, and high severity incidents being the least frequent.
This pattern highlights that, although severe accidents are uncommon, minor incidents are the predominant type of traffic-related issues.
# Count records per severity level and rename the columns to the keys the bar chart expects
severity_counts = df['Accident_Severity'].value_counts().reset_index()
severity_counts.columns = ['category', 'value']
chart = lc.BarChart(
    vertical=True,
    theme=lc.Themes.Dark,
    title='Accident Severity Distribution'
)
chart.set_data(severity_counts.to_dict(orient='records'))
# Spread the three colors evenly across the normalized palette range [0, 1]
colors = ['#FF5733', '#33FF57', '#3357FF']
chart.set_palette_colors(
    steps=[{'value': i / (len(colors) - 1), 'color': color}
           for i, color in enumerate(colors)]
)
chart.set_sorting('disabled')
chart.set_value_label_display_mode('insideBar')
chart.open('browser')
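The palette steps passed to the bar chart are plain dictionaries, so they can be built and checked separately. The helper below, build_palette_steps, is a hypothetical name introduced for this sketch. It assumes, based on the code above, that step values are normalized to the range 0 to 1, and it guards against the division by zero that a single-color list would otherwise cause.

```python
def build_palette_steps(colors):
    """Return evenly spaced palette steps over the normalized range [0, 1]."""
    if not colors:
        raise ValueError("at least one color is required")
    if len(colors) == 1:
        # A single color has no range to spread over.
        return [{'value': 0.0, 'color': colors[0]}]
    last = len(colors) - 1
    return [{'value': i / last, 'color': color} for i, color in enumerate(colors)]

steps = build_palette_steps(['#FF5733', '#33FF57', '#3357FF'])
print([step['value'] for step in steps])  # [0.0, 0.5, 1.0]
```

Deriving the spacing from the number of colors, rather than from the number of categories, keeps the steps inside 0 to 1 even if the dataset has more or fewer severity levels than colors.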
Pie Chart: Accident Frequency by Weather Conditions
The traffic accidents data analysis underscores that the majority of accidents take place under clear weather conditions, followed by occurrences during rainy and foggy weather. Incidents during snowy and stormy conditions account for the fewest accidents.
This distribution suggests that while adverse weather conditions certainly influence accident rates, the prevalence of accidents during clear weather indicates that other elements such as traffic density or driver behavior might have a more significant impact in these instances.
# Note: value_counts() on the full column counts every record per weather condition;
# to count accidents only, filter first, e.g. df.loc[df['Accident'] == 1, 'Weather']
weather_accident_counts = df['Weather'].value_counts()
slices = [{'name': weather, 'value': count} for weather, count in weather_accident_counts.items()]
chart = lc.PieChart(
    title='Impact of Weather Conditions on Accident Frequency',
    theme=lc.Themes.Black
)
chart.add_slices(slices)
chart.set_inner_radius(30)
chart.add_legend(data=chart)
chart.open('browser')
3D Surface Plot: Accident Severity by Weather Conditions and Road Types
The traffic accidents data analysis presented in the 3D surface chart illuminates the relationship between weather conditions, road types, and accident severity. The chart reveals significant variations in accident severity across different combinations of road types and weather conditions.
Certain combinations of weather and road conditions are associated with higher accident severities, highlighting potential risk factors that could impact safety. The continuous gradient and distinct peaks observed in the surface chart suggest specific areas where certain weather conditions or road types contribute to environments more susceptible to severe accidents.
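import lightningchart as lc
import numpy as np
from scipy.interpolate import griddata
chart = lc.Chart3D(theme=lc.Themes.Dark, title='3D Surface Grid Series')
# Weather and Road_Type are text columns, so convert them to integer codes before interpolating
surface_df = pd.DataFrame({
    'weather': df['Weather'].astype('category').cat.codes,
    'road': df['Road_Type'].astype('category').cat.codes,
    'accident': df['Accident'],
})
# Average the Accident column (0 = no accident, 1 = accident) per weather/road pair
# so that every interpolation point is unique
cells = surface_df.groupby(['weather', 'road'])['accident'].mean().reset_index()
def create_surface_chart(chart, x, y, z, grid_size):
    X_grid, Y_grid = np.meshgrid(
        np.linspace(x.min(), x.max(), grid_size),
        np.linspace(y.min(), y.max(), grid_size)
    )
    # Interpolate between the category cells and fill any gaps with the overall mean
    Z_grid = griddata((x, y), z, (X_grid, Y_grid), method='cubic')
    Z_grid[np.isnan(Z_grid)] = np.nanmean(z)
    surface_series = chart.add_surface_grid_series(columns=Z_grid.shape[1], rows=Z_grid.shape[0])
    surface_series.set_start(x=x.min(), z=y.min())
    surface_series.set_end(x=x.max(), z=y.max())
    surface_series.invalidate_height_map(Z_grid.tolist())
    surface_series.invalidate_intensity_values(Z_grid.tolist())
# The grid spans the X-Z plane; the interpolated value is drawn as height on the Y axis
chart.get_default_x_axis().set_title('Weather Condition')
chart.get_default_y_axis().set_title('Accident (mean)')
chart.get_default_z_axis().set_title('Road Type')
create_surface_chart(chart, x=cells['weather'].to_numpy(), y=cells['road'].to_numpy(), z=cells['accident'].to_numpy(), grid_size=100)
chart.open('browser')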
Correlation Heatmap: Relationships Among Traffic Accident Variables
The traffic accidents data analysis displayed in the heatmap uncovers relationships between various variables influencing traffic accidents. Strong positive correlations are evident between related factors such as Number of Vehicles and Traffic Density, indicating their combined significant influence on road conditions.
Conversely, weaker or near-zero correlations between factors such as Weather and Accident Severity suggest a minimal direct impact on the severity of accidents. These insights are crucial for guiding the prioritization of variables in predictive modeling and formulating targeted interventions to enhance road safety.
df_encoded = df.copy()
for column in categorical_columns:
    df_encoded[column] = df_encoded[column].astype('category').cat.codes
corr_array = df_encoded.corr().to_numpy()
min_value, max_value = corr_array.min(), corr_array.max()
chart = lc.ChartXY(title="Correlation Heatmap (All Variables)", theme=lc.Themes.TurquoiseHexagon)
heatmap_series = chart.add_heatmap_grid_series(columns=corr_array.shape[1], rows=corr_array.shape[0])
heatmap_series.set_start(x=0, y=0).set_end(x=corr_array.shape[1], y=corr_array.shape[0])
heatmap_series.invalidate_intensity_values(corr_array.tolist())
heatmap_series.set_palette_coloring(
    steps=[
        {"value": min_value, "color": lc.Color(0, 0, 0)},
        {"value": 0.8, "color": lc.Color(255, 165, 0)},
        {"value": max_value, "color": lc.Color(255, 255, 255)}
    ],
    interpolate=True
)
variables = df_encoded.columns.tolist()
for i, label in enumerate(variables):
    chart.get_default_x_axis().add_custom_tick().set_value(i + 0.5).set_text(label).set_tick_label_rotation(90)
    chart.get_default_y_axis().add_custom_tick().set_value(i + 0.5).set_text(label)
chart.open('browser')
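Each cell of the heatmap is a Pearson correlation coefficient between two columns. The following sketch computes that coefficient by hand on made-up values to show what a strong and a near-zero entry mean; the variable names and numbers are illustrative only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values: density rises in step with the vehicle count.
vehicles = [1, 2, 3, 4, 5]
density = [2, 4, 6, 8, 10]
# Hypothetical values with no linear trend relative to the vehicle count.
unrelated = [1, 0, 0, 0, 1]

strong = pearson(vehicles, density)
weak = pearson(vehicles, unrelated)
print(round(strong, 6), round(weak, 6))
```

One caveat when reading the heatmap: the categorical columns are converted to integer codes whose order is arbitrary, so a near-zero coefficient for a column such as Weather shows the absence of a linear relationship with those codes, not necessarily the absence of any relationship.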
Bar Chart Dashboard: Accident Distribution Across Traffic Variables
The bar chart dashboard provides an insightful comparison of accident occurrences across various factors, highlighting critical patterns in traffic accidents data analysis. Clear weather and dry road conditions account for the highest number of traffic accidents, indicating frequent traffic activity under favorable conditions.
Afternoon and evening hours see a noticeable rise in traffic accidents, likely corresponding to peak traffic times. When analyzing vehicle types, cars are shown to dominate traffic accident figures, reflecting their significant presence on roads compared to trucks, motorcycles, and buses.
Additionally, periods with artificial lighting and daylight have higher rates of traffic accidents than periods with no light, further emphasizing the correlation between traffic density and traffic accidents during these times.
These observations underline the importance of understanding traffic dynamics under different conditions to inform targeted safety measures and road management strategies in traffic accidents data analysis.
dashboard = lc.Dashboard(theme=lc.Themes.TurquoiseHexagon, rows=(len(categorical_columns) // 3) + 1, columns=3)
def create_stacked_bar_chart(data, category, group_by, row_index, column_index):
    # Fix the category order once so every group's counts line up with the same bars
    unique_categories = data[category].value_counts().index.tolist()
    grouped = data.groupby(group_by)
    stacked_data = [
        {
            'subCategory': str(group),
            # Reindex so categories missing from a group count as 0 instead of shifting positions
            'values': group_df[category].value_counts().reindex(unique_categories, fill_value=0).tolist()
        }
        for group, group_df in grouped
    ]
    chart = dashboard.BarChart(vertical=True, column_index=column_index, row_index=row_index)
    chart.set_data_stacked(unique_categories, stacked_data)
    chart.add_legend().add(chart)
# One stacked chart per categorical column, split by the Accident flag
for idx, column in enumerate(categorical_columns):
    create_stacked_bar_chart(df, column, 'Accident', idx // 3, idx % 3)
dashboard.open('browser')
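Stacked bars are only meaningful when every sub-category lists its counts in the same category order, with zeros for categories a group never contains. A minimal standard-library sketch of that alignment follows. The records and the helper name aligned_counts are made up for the example.

```python
from collections import Counter

# Made-up records: (weather, accident flag).
records = [
    ("Clear", 0), ("Clear", 0), ("Rainy", 0),
    ("Rainy", 1), ("Rainy", 1), ("Clear", 1), ("Foggy", 1),
]
categories = ["Clear", "Rainy", "Foggy"]  # fixed bar order

def aligned_counts(records, group, categories):
    """Count categories within one group, reported in the fixed bar order."""
    counts = Counter(category for category, flag in records if flag == group)
    return [counts.get(category, 0) for category in categories]

stacked = [
    {'subCategory': str(group), 'values': aligned_counts(records, group, categories)}
    for group in (0, 1)
]
print(stacked)
# [{'subCategory': '0', 'values': [2, 1, 0]}, {'subCategory': '1', 'values': [1, 2, 1]}]
```

Sorting each group by its own frequency would instead place Rainy first for the accident group and drop Foggy from the non-accident group, so the numbers would land on the wrong bars.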
Polar Chart: Correlation Between Driver Age, Experience, and Accident Severity
The polar chart provides a detailed visualization of the relationship between driver factors such as age and experience and accident severity in traffic accidents data analysis. The green radial lines representing driver age show a consistent spread across the chart, indicating the broad distribution of ages involved in traffic accidents.
The purple lines for driver experience exhibit a more concentrated pattern, suggesting that most traffic accidents involve drivers with moderate experience levels.
The red radial lines representing accident severity stand out, showing distinct peaks corresponding to higher severity levels in traffic accidents. The overlap and alignment between these severity peaks and certain age or experience ranges suggest potential correlations, such as younger or less experienced drivers being more frequently involved in severe accidents.
This visualization underscores the importance of targeted interventions based on driver demographics to reduce severe traffic accidents, highlighting key insights from the traffic accidents data analysis.
polar_chart = lc.PolarChart(theme=lc.Themes.TurquoiseHexagon, title="Driver Factors vs Accident Severity")
# Assign each record an angle between 0 and 360 degrees
angles = np.linspace(0, 360, len(df))
# Encode severity labels as integers for plotting. The codes follow the alphabetical order
# of the labels; use an explicit mapping instead if a low-to-high order is needed.
df['Accident_Severity_Encoded'] = df['Accident_Severity'].astype('category').cat.codes
def add_polar_series(chart, data, color, name):
    series_data = [{'angle': angle, 'amplitude': amp} for angle, amp in zip(angles, data)]
    series = chart.add_line_series().set_data(series_data)
    series.set_name(name)
    series.set_line_color(color)
    return series
add_polar_series(polar_chart, df['Driver_Age'], lc.Color(0, 255, 0), "Driver Age")
add_polar_series(polar_chart, df['Driver_Experience'], lc.Color(128, 0, 128), "Driver Experience")
add_polar_series(polar_chart, df['Accident_Severity_Encoded'] * 10, lc.Color(255, 0, 0), "Accident Severity")
polar_chart.add_legend().add(polar_chart)
polar_chart.open('browser')
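One detail of the angle assignment is worth a small sketch: np.linspace(0, 360, n) includes both endpoints, so the first and last records share the same direction on a polar axis. If that overlap is unwanted, the angles can exclude the endpoint. The helper name evenly_spaced_angles is hypothetical and the amplitudes are made up.

```python
def evenly_spaced_angles(count):
    """Return `count` angles in degrees covering [0, 360) without repeating 0/360."""
    return [360.0 * i / count for i in range(count)]

# Four hypothetical driver ages placed around the circle.
ages = [23, 35, 47, 61]
angles = evenly_spaced_angles(len(ages))
points = [{'angle': angle, 'amplitude': age} for angle, age in zip(angles, ages)]

print(angles)  # [0.0, 90.0, 180.0, 270.0]
```

The same result is available in NumPy through np.linspace(0, 360, n, endpoint=False).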
Predictive Modeling: Classifying Traffic Accidents with Machine Learning
The final stage of the traffic accidents data analysis involved constructing a predictive model to classify traffic accidents based on the identified features such as weather conditions, road types, and driver characteristics.
Utilizing a Random Forest Classifier, the model achieved a strong predictive performance with an accuracy of 78%. The dataset underwent preprocessing to address missing values, and categorical variables were encoded for numerical representation, essential steps in traffic accidents data analysis.
Additionally, the dataset was balanced using SMOTE to ensure a fair representation of all accident classes, which significantly enhanced the robustness of the model. These steps collectively contributed to the effective prediction capabilities of the model in the context of traffic accidents data analysis.
The model’s evaluation metrics reveal valuable insights:
The predictive model performed differently across the two classes. For the non-accident class (0.0), it achieved a precision of 74% and a recall of 87%, so it recovers most true non-accident cases and produces few false negatives for this class.
For the accident class (1.0), it reached a precision of 84% and a recall of 69%: most of its accident predictions are correct, but it misses roughly 31% of actual accidents (false negatives).
The balanced F1-scores for both classes, ranging approximately from 0.76 to 0.80, confirm the model’s consistent performance across both categories in the traffic accidents data analysis. This consistency supports the model’s reliability as a foundation for classifying traffic accidents and can help decision-makers identify high-risk conditions and factors.
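The quoted F1 range can be checked directly from the reported precision and recall values, since F1 is their harmonic mean. This short sketch uses only the four percentages stated above.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_no_accident = f1_score(0.74, 0.87)  # class 0.0
f1_accident = f1_score(0.84, 0.69)     # class 1.0

print(round(f1_no_accident, 2), round(f1_accident, 2))  # 0.8 0.76
```

The two values reproduce the stated range of roughly 0.76 to 0.80.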
Looking ahead, future enhancements to the model might include incorporating additional features or experimenting with different algorithms to further refine accuracy and broaden the model’s applicability in traffic accident prevention strategies.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
from imblearn.over_sampling import SMOTE
# Load the dataset
file_path = 'dataset_traffic_accident_prediction1.csv'
df = pd.read_csv(file_path)
# Handle missing values
numerical_columns = ['Traffic_Density', 'Speed_Limit', 'Number_of_Vehicles',
                     'Driver_Alcohol', 'Driver_Age', 'Driver_Experience', 'Accident']
for column in numerical_columns:
    df[column] = df[column].fillna(df[column].median())
categorical_columns = ['Weather', 'Road_Type', 'Time_of_Day', 'Accident_Severity',
                       'Road_Condition', 'Vehicle_Type', 'Road_Light_Condition']
for column in categorical_columns:
    df[column] = df[column].fillna(df[column].mode()[0])
# Encode categorical variables using LabelEncoder
label_encoders = {}
for column in categorical_columns:
    le = LabelEncoder()
    df[column] = le.fit_transform(df[column])
    label_encoders[column] = le
# Split the dataset into features (X) and target (y)
X = df.drop(columns=['Accident'])
y = df['Accident']
# Apply SMOTE to balance the dataset
# Note: resampling before the train/test split lets synthetic samples derived from test rows
# reach the training set, so the reported scores may be optimistic; resampling only the
# training portion avoids this
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2, random_state=42)
# Train a Random Forest Classifier
rf_model = RandomForestClassifier(random_state=42)
rf_model.fit(X_train, y_train)
# Make predictions on the test set
y_pred = rf_model.predict(X_test)
# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
print("Accuracy:", accuracy)
print("Classification Report:\n", report)
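When a dataset is balanced by oversampling, a common precaution is to hold out the test rows first and oversample only the training rows, so the test set contains nothing but original records. The sketch below shows that ordering with the standard library only. It uses plain random duplication as a stand-in for SMOTE, and the data and the function name split_then_oversample are invented for illustration; it is not the pipeline used for the results above.

```python
import random

def split_then_oversample(rows, test_fraction=0.2, seed=42):
    """Hold out a test set first, then balance classes in the training set only."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Group training rows by label.
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append((features, label))

    # Duplicate minority rows at random until every class matches the largest one.
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced, test

# Made-up imbalanced data: 80 non-accident rows and 20 accident rows.
rows = [((i,), 0) for i in range(80)] + [((i,), 1) for i in range(80, 100)]
train, test = split_then_oversample(rows)

train_labels = [label for _, label in train]
print(len(test), train_labels.count(0), train_labels.count(1))
```

With this ordering, no test row (or a copy of one) can appear in the training data, which keeps the evaluation independent of the balancing step.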
Conclusion
This analysis of traffic accidents, powered by LightningChart Python, effectively combined advanced visualizations and predictive modeling to unearth key insights. The data exposed critical patterns such as the impact of traffic density, road conditions, weather, and driver behavior on accident severity.
Visual tools like polar charts, bar charts, and 3D surface plots provided a detailed understanding of these influencing factors. Additionally, correlation heatmaps showcased strong relationships between variables, which was instrumental in aiding feature selection for the predictive modeling.
The Random Forest Classifier used in this analysis achieved an accuracy of 78% and delivered balanced performance across both classes (accident and non-accident). This predictive tool is a step toward proactive traffic management, helping decision-makers identify and mitigate high-risk conditions.
