Wednesday, 14 July 2021

Predict the Severity of the Collision using Python

 


Introduction 


Background


    A collision is an event in which two or more bodies exert forces on each other over a relatively short time. Although the most common use of the word collision refers to incidents in which two or more objects collide with great force, the scientific use of the term implies nothing about the magnitude of the force.


Problem Description

    A traffic collision, also called a motor vehicle collision, car accident, or car crash, occurs when a vehicle collides with another vehicle, pedestrian, animal, road debris, or other stationary obstruction, such as a tree, pole or building. Traffic collisions often result in injury, disability, death, and property damage as well as financial costs to both society and the individuals involved.






The severity of a collision falls into one of the following categories:

  •     Fatality
  •     Serious Injury
  •     Injury
  •     Property Damage

    
    The field of Active Safety with respect to motor vehicles is concerned with the prevention of accidents before they happen. Warning drivers about the possibility of accidents and their severity due to weather, road, and visibility conditions is a new approach to prevent or reduce accidents before they take place.

Interest

    Traffic control stations would be interested in predicting collision severity, because traffic collisions often result in injury, disability, death, and property damage as well as financial costs to both society and the individuals involved. All collisions are provided by SPD and recorded by Traffic Records.


Data acquisition and cleaning


Data Source

     The collision data for all years is available through Coursera, along with the metadata for that data set. All collisions are provided by SPD and recorded by Traffic Records. This includes all types of collisions. Collisions are displayed at the intersection or mid-block of a segment. Timeframe: 2004 to present.

Data cleaning


1. First, I downloaded the data from the source and examined its information, description, and shape. There were a lot of missing values in the data set because of a lack of record keeping. I decided to use only categorical values, because most column values are categorical and this makes it easier to predict a value.

2. The data set had several problems, so I started cleaning the data.

3. Many columns contain the object datatype, and some other columns hold more complicated values such as dates, floats, and negative values. I dropped the unwanted columns based on further analysis.

4. After fixing these problems, I checked for outliers in the data. I found some extreme outliers, mostly caused by small sample sizes.
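
    The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the project: it assumes pandas, and the column names (`SEVERITY`, `WEATHER`, `ROADCOND`, `REPORT_ID`), the sample rows, and the choice to fill missing categories with "Unknown" are all hypothetical.

```python
import pandas as pd

# Hypothetical sample standing in for the collision data set.
# Column names and values are assumptions made for illustration.
raw = pd.DataFrame({
    "SEVERITY": ["Injury", "Property Damage", "Injury", None, "Property Damage"],
    "WEATHER": ["Clear", "Raining", None, "Clear", "Overcast"],
    "ROADCOND": ["Dry", "Wet", "Dry", "Dry", None],
    "REPORT_ID": [101, 102, 103, 104, 105],
})

# Inspect the shape and the number of missing values per column.
print(raw.shape)
print(raw.isna().sum())

# Drop a column that carries no predictive information.
cleaned = raw.drop(columns=["REPORT_ID"])

# Rows without a severity label cannot be used for training.
cleaned = cleaned.dropna(subset=["SEVERITY"])

# Assumption: mark missing categorical features as "Unknown" instead of dropping rows.
feature_cols = ["WEATHER", "ROADCOND"]
cleaned[feature_cols] = cleaned[feature_cols].fillna("Unknown")

print(cleaned)
```

    Whether to fill missing categories or drop the affected rows depends on how much data is lost either way; the sketch picks filling only to keep the example short.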


Exploratory Data Analysis

    Histograms of the data show the number of collisions for each column, including address type, junction type, collision type, and others. Each column is highlighted in a different color and labeled.
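
    The counts behind such a histogram can be sketched with the standard library alone. The address type values and the sample records below are hypothetical, and the text bars only stand in for a real plot.

```python
from collections import Counter

# Hypothetical (address type, severity) pairs standing in for the real data.
records = [
    ("Block", "Property Damage"),
    ("Intersection", "Injury"),
    ("Block", "Injury"),
    ("Block", "Property Damage"),
    ("Alley", "Property Damage"),
    ("Intersection", "Injury"),
]

# Number of collisions per address type (the bar heights of the histogram).
collisions_per_type = Counter(address for address, _ in records)

# Print a simple text histogram, most frequent category first.
for address, count in collisions_per_type.most_common():
    print(f"{address:<14}{'#' * count} {count}")

# Counts split by severity, useful for comparing categories side by side.
severity_by_type = Counter(records)
```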



    The box plots represent the relationship between severity and each feature, including address type, status, weather, and road condition.




Predictive Modelling

    Predictive modelling uses statistics to predict outcomes. It is the general concept of building a model that is capable of making predictions. Typically, such a model includes a machine learning algorithm that learns certain properties from a training data set in order to make those predictions.


Models

    Models can use one or more classifiers in trying to determine the probability of a set of data belonging to another set. There are two types of models: regression and classification. Regression is a supervised learning task where the output is a continuous value. The goal is to predict a value as close to the actual output value as the model can, and evaluation is done by calculating an error value. The smaller the error, the greater the accuracy of the regression model.

    Classification is a supervised learning task where the output is a defined label (a discrete value). The goal is to predict discrete values belonging to a particular class, and evaluation is based on accuracy. It can be either binary or multi-class classification. In binary classification, the model predicts either 0 or 1 (yes or no), whereas in multi-class classification the model chooses among more than two classes. In this project the target value is categorical (a discrete value), so I chose a classification model.



Applying standard Classification algorithms 

    Classification in machine learning and statistics is a supervised learning approach in which the computer program learns from the data given to it and then classifies new observations. A classification model attempts to draw some conclusion from observed values. Given one or more inputs, a classification model will try to predict the value of one or more outcomes. Outcomes are labels that can be applied to a data set.


    There are a number of classification models, including K nearest neighbor, Naive Bayes, logistic regression, decision tree, and random forest.
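
    To make one of these concrete, here is a minimal sketch of K nearest neighbor on categorical features. It is not the implementation used in the project: the feature tuples, the training rows, and the use of Hamming distance (the number of features that differ) are assumptions chosen for illustration.

```python
from collections import Counter


def hamming_distance(a, b):
    """Count the positions at which two feature tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_rows, query, k=3):
    """Return the majority label among the k training rows closest to query."""
    nearest = sorted(train_rows, key=lambda row: hamming_distance(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training rows: (weather, road condition, address type) -> severity
train_rows = [
    (("Clear", "Dry", "Block"), "Property Damage"),
    (("Clear", "Dry", "Intersection"), "Injury"),
    (("Raining", "Wet", "Intersection"), "Injury"),
    (("Raining", "Wet", "Block"), "Injury"),
    (("Overcast", "Dry", "Block"), "Property Damage"),
    (("Clear", "Wet", "Block"), "Property Damage"),
]

# Classify a new, unseen collision from its three categorical features.
prediction = knn_predict(train_rows, ("Raining", "Wet", "Intersection"), k=3)
print(prediction)
```

    An odd value of k avoids tied votes when there are only two classes; with more classes, ties are still possible and would need an explicit rule.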



Performance of Models

    Model evaluation metrics are required to quantify model performance. The choice of evaluation metrics depends on the machine learning task, which in this case is classification. Precision and recall are useful for multiple tasks.


    I applied some classification metrics for model evaluation: classification accuracy and the confusion matrix. Classification accuracy is the number of correct predictions made as a ratio of all predictions made. The confusion matrix provides a more detailed breakdown of correct and incorrect classifications for each class, comparing actual and predicted values (true and false). The estimated performance of a model tells us how well it performs on unseen data. Based on the table, the best classifier for this problem is the decision tree, which has high accuracy and a better result on the confusion matrix. The table explains the performance of the different models.
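
    These metrics can be computed by hand on a small example. The sketch below uses made-up actual and predicted labels, not results from the project, and the helper names `precision` and `recall` are introduced here only for illustration.

```python
from collections import Counter

# Hypothetical actual and predicted severities for six collisions.
actual = ["Injury", "Property Damage", "Property Damage",
          "Injury", "Property Damage", "Injury"]
predicted = ["Injury", "Property Damage", "Injury",
             "Injury", "Property Damage", "Property Damage"]

# Classification accuracy: correct predictions divided by all predictions.
accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Confusion matrix as a mapping from (actual, predicted) to a count.
confusion = Counter(zip(actual, predicted))
labels = sorted(set(actual) | set(predicted))


def precision(label):
    """Share of predictions of this label that were correct."""
    predicted_as_label = sum(confusion[(a, label)] for a in labels)
    return confusion[(label, label)] / predicted_as_label if predicted_as_label else 0.0


def recall(label):
    """Share of actual occurrences of this label that were found."""
    actually_label = sum(confusion[(label, p)] for p in labels)
    return confusion[(label, label)] / actually_label if actually_label else 0.0


print(f"accuracy: {accuracy:.2f}")
for a in labels:
    print(a, [confusion[(a, p)] for p in labels])
for label in labels:
    print(f"{label}: precision={precision(label):.2f} recall={recall(label):.2f}")
```

    Each printed matrix row is one actual class and each column one predicted class, so the diagonal holds the correct predictions.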





Conclusion 


    Finally, I predicted the severity of the collision based on further analysis. I achieved above 68% accuracy with the classification algorithms, which helps to identify the severity of a collision. Analysis of the major features also helped to get a better result for this problem.

    The purpose of this project was to identify the severity of a collision. The most important predictors are weather condition, road condition, address of the collision, how many people are involved, how many vehicles are present, and the type of collision. These help to predict the type of severity or disability in a collision.



To explore the project further:
