Sunday, 10 October 2021

How to fit a Non Linear Model

 





How to Fit a Non-linear Regression Model


Introduction:


    In this blog, we fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014.

    This post is a step-by-step guide with source code, so you can build your own model. It is not a deep dive into model design, statistical analysis, improvement, or validation. If you want to learn more, please check out my blog site: Techy Scientists.

It contains the following parts:


  1. Set up your environment
  2. Data Preparation
  3. Exploratory data analysis
  4. Non-linear Regression Model
  5. Model Evaluation


Set up your environment


   To run the program on your local computer, install Python and the required libraries listed below:


  1.   python 
  2.   numpy
  3.   pandas
  4.   matplotlib
  5.   scikit-learn
  6.   scipy
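
Before going further, it can help to confirm that the libraries are actually importable. The snippet below is a minimal sketch rather than part of the original program: the helper name find_missing is made up for this illustration, and it assumes the usual import names (scikit-learn is imported as sklearn).

```python
import importlib.util
import sys

# Import names of the libraries listed above (scikit-learn is imported as "sklearn").
REQUIRED = ["numpy", "pandas", "matplotlib", "sklearn", "scipy"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
print("Python version:", sys.version.split()[0])
print("Missing packages:", missing if missing else "none")
```

If anything is reported as missing, it can usually be installed with pip, for example: pip install numpy pandas matplotlib scikit-learn scipy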


Data Preparation


Understand the data


  We use a dataset of China’s GDP, which contains China’s annual gross domestic product in US dollars from 1960 to 2014. The file has a Year column and a Value column (the GDP). For more information about the data, go to China’s GDP data.




Import the Packages


  Create a Python file (for example, model.py). After installing the required packages, import them in your Python file.


import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

Read the Data


  Read the data using Pandas


df = pd.read_csv("china_gdp.csv")

Exploratory data analysis

    Let's start with exploratory data analysis. The plot below shows what the data points look like. The curve resembles either a logistic or an exponential function.

x_data, y_data = (df["Year"].values, df["Value"].values)
plt.plot(x_data, y_data, 'ro')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()






For comparison, the code below plots an exponential function with added random noise. You can change the function or the noise scale to see how the graph changes.

# Evenly spaced x values from -5.0 up to (but not including) 5.0
X = np.arange(-5.0, 5.0, 0.1)

# Exponential curve plus Gaussian noise
Y = np.exp(X)
y_noise = 2 * np.random.normal(size=X.size)
ydata = Y + y_noise

plt.plot(X, ydata, 'bo')
plt.plot(X, Y, '-r')
plt.ylabel('Dependent Variable')
plt.xlabel('Independent Variable')
plt.show()
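
One rough way to tell an exponential shape from a logistic one is to look at log(y): for an exponential curve it is a straight line in x, while for a sigmoid it is not. The sketch below illustrates this on synthetic curves only (not the GDP data); the helper log_linear_residual and the curve parameters are assumptions chosen for the illustration.

```python
import numpy as np

x = np.linspace(0.0, 5.0, 50)
y_exp = np.exp(0.5 * x)                          # synthetic exponential curve
y_sig = 1.0 / (1.0 + np.exp(-3.0 * (x - 2.5)))   # synthetic sigmoid curve


def log_linear_residual(x, y):
    """Fit a straight line to log(y) and return the largest absolute residual."""
    log_y = np.log(y)
    slope, intercept = np.polyfit(x, log_y, 1)
    return np.max(np.abs(log_y - (slope * x + intercept)))


# A residual near zero means log(y) is close to a straight line (exponential-like).
print("exponential:", log_linear_residual(x, y_exp))
print("sigmoid:    ", log_linear_residual(x, y_sig))
```

This is only a quick visual or numeric hint, and it requires all y values to be positive.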





Non-linear Regression Model

  Non-linear regression models the relationship between the independent variable x and the dependent variable y with a non-linear function.

   Now, let's build our regression model. We define a sigmoid function that might fit the data. Beta_1 controls the steepness of the curve and Beta_2 is the x value at which the curve reaches 0.5.

def sigmoid(x, Beta_1, Beta_2):
     y = 1 / (1 + np.exp(-Beta_1*(x-Beta_2)))
     return y
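
To get a feel for the two parameters, the short sketch below evaluates the same sigmoid at a few points. The input values are arbitrary and chosen only for illustration.

```python
import numpy as np


def sigmoid(x, Beta_1, Beta_2):
    # Same definition as above: Beta_1 sets the steepness, Beta_2 the midpoint.
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))


x = np.array([-1.0, 0.0, 1.0])

print(sigmoid(x, 1.0, 0.0))  # gentle slope, midpoint at x = 0
print(sigmoid(x, 5.0, 0.0))  # larger Beta_1: steeper transition around the midpoint
print(sigmoid(x, 1.0, 1.0))  # larger Beta_2: the midpoint moves to x = 1
```

At x = Beta_2 the function always returns exactly 0.5, and its output always lies between 0 and 1.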


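Normalize the data so that both x and y are at most 1. The sigmoid only outputs values between 0 and 1, so y has to be scaled into that range before fitting:

xdata = x_data / max(x_data)
ydata = y_data / max(y_data)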
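
Because the model is fitted on scaled values, its predictions are also scaled. The sketch below shows one way to keep the scale factors and convert back to original units. The numbers are placeholders for illustration only (they are not taken from the GDP dataset), and the helper to_original_units is a hypothetical name.

```python
import numpy as np

# Placeholder values for illustration only; they are not taken from the GDP dataset.
x_raw = np.array([1960.0, 1980.0, 2000.0, 2014.0])
y_raw = np.array([2.0, 5.0, 40.0, 100.0])

# Keep the scale factors so the transformation can be undone later.
x_scale, y_scale = x_raw.max(), y_raw.max()
x_norm = x_raw / x_scale
y_norm = y_raw / y_scale


def to_original_units(y_normalized, scale):
    """Undo the division by the maximum to recover values in original units."""
    return y_normalized * scale


print(x_norm.max(), y_norm.max())           # both maxima are 1.0 after scaling
print(to_original_units(y_norm, y_scale))   # recovers y_raw
```

The same idea applies to a new year: divide it by x_scale before passing it to the fitted sigmoid, then multiply the result by y_scale.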


How do we find the best parameters for our fit line?


We can use curve_fit, which uses non-linear least squares to fit our sigmoid function to the data. It returns popt, the optimized parameter values, and pcov, their estimated covariance matrix.

from scipy.optimize import curve_fit

popt, pcov = curve_fit(sigmoid, xdata, ydata)
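
As a rough illustration of what curve_fit returns, the sketch below fits the same sigmoid to synthetic data with known parameters. The true values (20.0 and 0.6), the noise level, and the starting guess p0 are assumptions made for this example; they are not the values for the GDP data.

```python
import numpy as np
from scipy.optimize import curve_fit


def sigmoid(x, Beta_1, Beta_2):
    return 1 / (1 + np.exp(-Beta_1 * (x - Beta_2)))


# Synthetic data: a known sigmoid plus a little Gaussian noise (seeded for repeatability).
rng = np.random.default_rng(0)
x_syn = np.linspace(0.0, 1.0, 50)
y_syn = sigmoid(x_syn, 20.0, 0.6) + rng.normal(scale=0.01, size=x_syn.size)

# p0 is an optional starting guess; without it curve_fit starts every parameter at 1.
popt_syn, pcov_syn = curve_fit(sigmoid, x_syn, y_syn, p0=[10.0, 0.5])

# The square roots of the diagonal of pcov give one-standard-deviation errors.
perr_syn = np.sqrt(np.diag(pcov_syn))

print("estimated Beta_1, Beta_2:", popt_syn)
print("standard errors:", perr_syn)
```

If a fit fails to converge on your own data, supplying a reasonable p0 is usually the first thing to try.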


    Now we plot our resulting regression model.

x = np.linspace(1960, 2015, 55)
x = x/max(x)
plt.figure(figsize=(8,5))
y = sigmoid(x, *popt)
plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(x,y, linewidth=3.0, label='fit')
plt.legend(loc='best')
plt.ylabel('GDP')
plt.xlabel('Year')
plt.show()





Model Evaluation

    To evaluate a regression model, we compare the actual values with the predicted values. Evaluation metrics play a key role in model development because they show where the model needs improvement. Here we hold out about 20% of the points as a test set, fit the model on the rest, and compute the metrics on the test set.


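from sklearn.metrics import r2_score

# Randomly split the data into roughly 80% training and 20% test points.
# The split is random, so the printed metrics vary slightly from run to run.
msk = np.random.rand(len(df)) < 0.8
train_x = xdata[msk]
test_x = xdata[~msk]
train_y = ydata[msk]
test_y = ydata[~msk]

# Fit the model on the training set only
popt, pcov = curve_fit(sigmoid, train_x, train_y)

# Predict on the held-out test set
y_hat = sigmoid(test_x, *popt)

print("Mean absolute error: %.2f" % np.mean(np.absolute(y_hat - test_y)))
print("Mean squared error (MSE): %.2f" % np.mean((y_hat - test_y) ** 2))
# r2_score expects the true values first, then the predictions
print("R2-score: %.2f" % r2_score(test_y, y_hat))
Out[]:
Mean absolute error: 0.04
Mean squared error (MSE): 0.00
R2-score: 0.97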
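
A random mask gives a different split, and therefore slightly different metrics, on every run. If repeatable numbers are wanted, one option is to seed the random generator. The sketch below assumes 55 data points (one per year from 1960 to 2014); the helper name split_mask and the seed value are arbitrary choices for this illustration.

```python
import numpy as np


def split_mask(n_samples, train_fraction=0.8, seed=42):
    """Return a boolean mask: True marks training points, False marks test points."""
    rng = np.random.default_rng(seed)
    return rng.random(n_samples) < train_fraction


mask = split_mask(55)
print("train size:", mask.sum(), "test size:", (~mask).sum())
```

The mask can then be used exactly like msk above, for example xdata[mask] and xdata[~mask].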


  • Mean absolute error (MAE): The mean of the absolute values of the errors. This is the easiest of the metrics to understand since it is just the average error.
  • Mean squared error (MSE): The mean of the squared errors. It penalizes large errors more heavily than MAE. The root mean squared error (RMSE) is the square root of the MSE.
  • R-squared: Not an error, but a popular measure of how well the model fits. It represents how close the data are to the fitted regression line. The higher the R-squared, the better the model fits your data. The best possible score is 1.0, and it can be negative (because the model can be arbitrarily worse than always predicting the mean).
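
The definitions above can be written out directly with NumPy. This is a minimal sketch on a tiny made-up example (the four values are not from the GDP data), and the function name regression_metrics is hypothetical.

```python
import numpy as np

# Tiny made-up example, not taken from the GDP data.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 2.0, 2.5, 4.0])


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, RMSE, R-squared) for true values and predictions."""
    errors = y_pred - y_true
    mae = np.mean(np.abs(errors))
    mse = np.mean(errors ** 2)
    rmse = np.sqrt(mse)
    ss_res = np.sum(errors ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2


mae, mse, rmse, r2 = regression_metrics(y_true, y_pred)
print("MAE: %.3f  MSE: %.3f  RMSE: %.3f  R2: %.3f" % (mae, mse, rmse, r2))
```

Note that R-squared is not symmetric: the total sum of squares is computed from the true values, so swapping the two arguments generally gives a different number.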


Conclusion:

     In summary, we fit a non-linear model to the data points corresponding to China's GDP from 1960 to 2014, implemented using scikit-learn and SciPy. If you want the source code, check this GitHub link: Non-linear regression

Thank you...
