Prediction of customer based on weather data


In the United States, convenience stores provide basic daily necessities for people around their residence. Business owners are concerned about their daily sales. Several factors such as weather, fuel prices, available inventory, and public holidays play a vital role which directly affect sto

 
  2. Introduction:

Sales forecasting is a crucial component of running a business successfully. In managing a convenience store, placing a balanced order is a critical decision that can improve the store's competitiveness. Freshness and rapid turnover have become important considerations, so store owners need a sales forecast as an essential input. Carrying a more extensive inventory cannot by itself sustain total store sales; instead, customer service has become one of the key elements of running the business profitably. Weather has been identified as a cause of month-to-month fluctuations in convenience store sales. These fluctuations are not regular seasonal variation, but rather amplifications of and departures from the seasonal cycle. Comparing summer and winter, store sales in winter are far lower than in summer. A sunny day can easily boost store sales by 5% to 10%, while a cold, rainy day has the opposite effect and reduces sales.

 

This paper examines the effect of weather on convenience store sales. On holidays, people going out to relax often stop and buy large amounts of groceries at a convenience store, so store sales rise when the weekend comes.

Because of this, store owners have to stock and maintain the store inventory regularly to provide good customer service. Another factor that plays a vital role for convenience stores is fuel price. Low gas prices are good for convenience stores: in-store sales increase because consumers have more money to spend inside the store. Unfortunately, like the weather, which also affects sales, the price of oil is well beyond the control of convenience stores.


 

  3. Literature Review:
  • One study shows that, since 1994, the number of convenience stores in the United States has increased every year, with many stores opening and closing annually. The report shows that the number of convenience stores in operation has grown to more than 33,000 [1].
  • Another study proposes the GMFLN forecasting model, which integrates the Gray analysis model (GRA) to filter the more important factors out of the raw data and then uses them as input to a multilayer functional link network to provide more accurate forecasts. The study evaluated the model on real data provided by a franchise company, and the results showed that the GMFLN model outperforms other time series forecasting models [2].
  • One study points out that if the items customers demand are always out of stock, customer satisfaction will be hard to improve no matter how good the customer service is. A point of sale (POS) system provides information analysis capabilities and can be used to analyse consumers' purchasing behaviour as well as to forecast demand [3].
  • One study describes how changes in customer spending are often attributed to weather conditions. Weather plays a compelling role in explaining daily sales fluctuations, and average daily temperature is the most influential parameter [4].
  • One study found that inventory management can be maintained using a model built from historical data [5].
  • Another article shows that weather has a significant effect on product demand and availability [6].


  4. Data Description:

For our project, the data set is derived from daily convenience store reports from the past 3 years. The sales and customer data were collected from Corner Store (AHM Enterprise), Beaumont, Texas.

The weather data was collected from the following online source:

http://api.wunderground.com/history/airport/KBPT/2018/1/1/DailyHistory.html?req_city=Beaumontreq_state=TXreq_statename=reqdb.zip=reqdb.magic=reqdb.wmo= .

The data set for this project consists of readings of several parameters:

 

  • Number of Customers
  • Total daily Sales
  • Average temperature
  • Average humidity
  • Wind
  • Fuel price
  • Available Inventory
  • Public holidays

 

 

This data set consists of a total of 1127 days of readings for 8 different parameters.

Of these parameters:

  1. Dependent variables: number of customers, total daily sales.
  2. Independent variables: average temperature, average humidity, wind, fuel price, available inventory, public holidays.

Using forward and backward stepwise regression analysis, we aim to determine which of average temperature, average humidity, wind, fuel price, available inventory, and public holidays have a consequential impact on total daily sales and on the number of customers received per day.

 

  5. Methodology:

This research was carried out in RStudio, using the R programming language. We first computed the correlation between the various parameters in our data set.
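The correlation step can be sketched as follows. This is a minimal illustration on synthetic data, not the store's records: the data frame `store`, its values, and the assumed dependence of `Sale` on `Fuel.Price` are all made up for the example, and only the column names follow the ones used in this paper.

```r
# Synthetic stand-in for the store data; all values are made up for illustration.
set.seed(42)
n <- 200
store <- data.frame(
  Avg.Temp     = runif(n, 40, 95),
  Avg.Humidity = runif(n, 40, 100),
  Wind         = runif(n, 0, 25),
  Fuel.Price   = runif(n, 1.8, 3.0)
)
# Hypothetical sales that depend on fuel price plus random noise.
store$Sale <- 5000 + 280 * store$Fuel.Price + rnorm(n, sd = 50)

# Pairwise Pearson correlations between all numeric columns.
corr_matrix <- cor(store)
print(round(corr_matrix, 2))
```

Reading along the `Sale` row of the matrix gives a first impression of which parameters move together with sales before any regression is fitted.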

 

Multiple Linear regression:

This method is used to estimate real values of our parameters. We tried to establish the relationship between a dependent variable and the independent variables using a best-fit approach. The fitted relationship is represented by the linear regression equation

Y = β0 + β1x1 + β2x2 + ... + βkxk.

In this equation,

Y = dependent variable, β0 = intercept, βi = coefficient (slope) of the i-th independent variable, xi = i-th independent variable.

 

5.1) For total daily sales :

Model 1:

Model 1 is fitted on 80% of the total data, which forms the train dataset.

R code: model1 <- lm(Sale ~ Avg.Temp + Avg.Humidity + Wind + Fuel.Price + Inventory + Holiday, data = train)
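For readers who want to reproduce the setup, here is a minimal sketch of an 80/20 split followed by the full regression. It assumes a random split (the paper does not say how the rows were divided) and uses synthetic data; the data frame `store` and every number in it are hypothetical.

```r
# Synthetic stand-in for the store data; all values are made up for illustration.
set.seed(1)
n <- 1127  # number of days reported in the data description
store <- data.frame(
  Avg.Temp     = runif(n, 40, 95),
  Avg.Humidity = runif(n, 40, 100),
  Wind         = runif(n, 0, 25),
  Fuel.Price   = runif(n, 1.8, 3.0),
  Inventory    = runif(n, 40000, 60000),
  Holiday      = rbinom(n, 1, 0.05)
)
store$Sale <- 5000 + 285 * store$Fuel.Price - 0.014 * store$Inventory +
  rnorm(n, sd = 100)

# Assumed random 80/20 split into train and test sets.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- store[train_idx, ]
test  <- store[-train_idx, ]

# Full model with all six independent variables, fitted on the train set only.
model1 <- lm(Sale ~ Avg.Temp + Avg.Humidity + Wind + Fuel.Price +
               Inventory + Holiday, data = train)
print(coef(model1))
```

Fitting only on `train` keeps the `test` rows unseen, so they can later be used to check how well the model predicts new days.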

 

We computed the residuals, standard errors, t-statistics, and p-values for all variables, and assessed the significance of each independent variable using its t-statistic. From this multiple linear regression analysis, we conclude that fuel price and inventory are significant for this model. The p-value for the fuel price variable was 0.00024, which is less than the significance level of 0.05.
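The significance check described above can be read directly from the coefficient table of a fitted model. The sketch below uses synthetic data in which `Fuel.Price` has a real effect and `Wind` has none; the data frame `d` and its values are assumptions made for illustration.

```r
# Synthetic data: Fuel.Price drives Sale, Wind has no true effect.
set.seed(2)
n <- 300
d <- data.frame(
  Fuel.Price = runif(n, 1.8, 3.0),
  Wind       = runif(n, 0, 25)
)
d$Sale <- 5000 + 285 * d$Fuel.Price + rnorm(n, sd = 60)

fit <- lm(Sale ~ Fuel.Price + Wind, data = d)

# Matrix with columns: Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- summary(fit)$coefficients
print(coef_table)

# Terms whose p-value falls below the 0.05 significance level.
p_values <- coef_table[, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)
```

A p-value is a probability, so it always lies between 0 and 1; the sign of an effect is carried by the `Estimate` column instead.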

 

Model 1:

Y = 5177 - 0.7357 (Avg.Temp) - 1.370 (Avg.Humidity) + 0.4109 (Wind) + 284.6 (Fuel.Price) - 0.01378 (Inventory) - 8.695 (Holiday)

 

Stepwise regression :

We use this method to analyse the significance of fuel price, as well as the other parameters, on total daily sales. We performed forward selection and backward elimination using the AIC function from the caret package in R. We found that fuel price and inventory have a significant effect on sales. The model with these two independent variables had the lowest AIC value, 14660.66, so we selected it as the final model.
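A minimal sketch of AIC-based forward selection and backward elimination is given below. Note the substitution: it uses the `step()` function from base R's stats package rather than caret, and it runs on synthetic data, so the data frame `d`, its values, and the resulting AIC numbers are illustrative only.

```r
# Synthetic data: only Fuel.Price and Inventory truly affect Sale.
set.seed(3)
n <- 500
d <- data.frame(
  Avg.Temp   = runif(n, 40, 95),
  Wind       = runif(n, 0, 25),
  Fuel.Price = runif(n, 1.8, 3.0),
  Inventory  = runif(n, 40000, 60000)
)
d$Sale <- 5000 + 285 * d$Fuel.Price - 0.014 * d$Inventory + rnorm(n, sd = 80)

null_model <- lm(Sale ~ 1, data = d)   # intercept only
full_model <- lm(Sale ~ Avg.Temp + Wind + Fuel.Price + Inventory, data = d)

# Forward selection: start empty and add the term that lowers AIC the most.
forward <- step(null_model,
                scope = list(lower = ~ 1, upper = formula(full_model)),
                direction = "forward", trace = 0)

# Backward elimination: start full and drop terms while AIC keeps falling.
backward <- step(full_model, direction = "backward", trace = 0)

print(formula(forward))
print(formula(backward))
print(c(forward = AIC(forward), backward = AIC(backward)))
```

Both searches stop when no single addition or removal lowers the AIC any further, which is the sense in which the selected model has the lowest AIC among the candidates visited.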

 

Model 3 :

After establishing the significance of fuel price and inventory for total daily sales, we constructed the model stated below.

R code: model3 <- lm(Sale ~ Fuel.Price + Inventory, data = train)

 

So our final mathematical model derived was:

Y = 5031 + 286.9 (Fuel.Price) - 0.0139 (Inventory)
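To see how the final sales equation is used, the sketch below wraps the reported coefficients in a small function and evaluates it. The function name `predict_sales` and the input values (a fuel price of 2.50 and an inventory of 50000) are hypothetical and chosen only to show the arithmetic.

```r
# Final sales equation with the coefficients reported above.
predict_sales <- function(fuel_price, inventory) {
  5031 + 286.9 * fuel_price - 0.0139 * inventory
}

# Hypothetical inputs, not taken from the store data.
baseline <- predict_sales(fuel_price = 2.50, inventory = 50000)
print(baseline)

# Change in predicted sales when fuel price rises by 0.10, inventory held fixed.
delta <- predict_sales(2.60, 50000) - predict_sales(2.50, 50000)
print(delta)
```

For these inputs the equation gives 5031 + 717.25 - 695 = 5053.25, and each 0.10 increase in fuel price adds 28.69 to the predicted value.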

 

Prediction :

Next, we used R's predict function with model 3 to predict total daily sales on the test data.

R code: model4 <- predict(model3, test, type = "response")

 

Our aim was to predict total daily sales, which is one of our dependent variables, so we passed "response" as the type. We also predicted total daily sales on the train dataset, which is 80% of the total dataset.

R code: model5 <- predict(model3, train, type = "response")
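The prediction step can be sketched end to end as follows. The data are synthetic and the split is assumed to be random; `d`, `pred_test`, and `pred_train` are illustrative names, while `model3` mirrors the two-variable model described above.

```r
# Synthetic data standing in for the store records.
set.seed(4)
n <- 400
d <- data.frame(
  Fuel.Price = runif(n, 1.8, 3.0),
  Inventory  = runif(n, 40000, 60000)
)
d$Sale <- 5000 + 285 * d$Fuel.Price - 0.014 * d$Inventory + rnorm(n, sd = 80)

# Assumed random 80/20 split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

model3 <- lm(Sale ~ Fuel.Price + Inventory, data = train)

# Predictions on unseen test rows and on the rows used for fitting.
pred_test  <- predict(model3, newdata = test,  type = "response")
pred_train <- predict(model3, newdata = train, type = "response")

# Side-by-side view of actual and predicted sales for a few test days.
print(head(cbind(actual = test$Sale, predicted = pred_test)))
```

Predictions on the train rows equal the model's fitted values, so only the test predictions say anything about performance on new days.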

 

5.2) For total customers:

Model 1:

Model 1 is fitted on 80% of the total data, which forms the train dataset.

R code: model1 <- lm(Customer ~ Avg.Temp + Avg.Humidity + Wind + Fuel.Price + Inventory + Holiday, data = train)

 

As for total sales, we computed the residuals, standard errors, t-statistics, and p-values for all variables, and assessed the significance of each independent variable using its t-statistic. From this multiple linear regression analysis, we conclude that average temperature and inventory are significant for this model. The p-value for the average temperature variable was 0.00314, which is less than the significance level of 0.05. Inventory, with a coefficient of -0.005654, also had a p-value well below 0.05.

 

Model 1 :

Y = 1099 - 0.5802 (Avg.Temp) + 0.09181 (Avg.Humidity) + 0.7494 (Wind) + 18.90 (Fuel.Price) - 0.005654 (Inventory) + 2.319 (Holiday)

 

Stepwise regression :

We use this method to analyse the significance of the different variables on total customers. We performed forward selection and backward elimination using the AIC function from the caret package in R. We found that Inventory, Avg.Temp, and Fuel.Price have a significant effect on the number of customers. The model with these three independent variables had the lowest AIC value, 10215.69, so we selected it as the final model.

 

Model3 :

After establishing the significance of Avg.Temp, Inventory, and Fuel.Price for total customers, we constructed the model stated below.

R code: model3 <- lm(Customer ~ Avg.Temp + Inventory + Fuel.Price, data = train)

 

So our final mathematical model derived was:

Y = 1116 + 0.05987 (Avg.Temp) - 0.005684 (Inventory) + 18.95 (Fuel.Price)

 

Prediction :

Next, we used R's predict function with model 3 to predict the total number of customers on the test data.

R code: model5 <- predict(model3, test, type = "response")

 

Our aim was to predict total customers, which is one of our dependent variables, so we passed "response" as the type. We also predicted the number of customers on the train dataset, which is 80% of the total dataset.

R code: model6 <- predict(model3, train, type = "response")


6) Results and discussion:

Multiple linear regression was performed to determine which independent variables have a significant effect on sales and total customers in the convenience store. The sales model was finalized with inventory and fuel price as the most impactful variables. These variables have p-values in the acceptable region, that is, less than 0.05. Stepwise regression strongly supported this model, which had the lowest AIC value. A multiple linear model was fitted and summarized to obtain the coefficient values for the sales model throughout the month. The same procedure was performed for the customer model.

 

The customer model showed that average temperature, inventory, and fuel price have the most significant effect. This effect was determined by linear regression analysis and confirmed by stepwise regression. The model selected for customers had the lowest p-values for its individual variables and the lowest AIC score. We computed the root mean square error (RMSE) on the train and test data for both models. Sales and customer counts were calculated for periods when fuel price was high. Inventory was counted only every 30 to 40 days, so its recorded value remains constant over each such period, which affects the modeled sales and customer counts negatively. Weather events also impacted the model: Hurricane Harvey in 2017 resulted in store closure for 6 to 7 days, so sales and customer counts were nil during the closure and remained low for the next few days. All of these factors affected model accuracy.
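The RMSE calculation mentioned above can be sketched as follows. The helper `rmse` is a hypothetical function written for this example, and the data are synthetic, so the printed values say nothing about the store models themselves.

```r
# Root mean square error between actual and predicted values.
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Synthetic data standing in for the store records.
set.seed(5)
n <- 400
d <- data.frame(Fuel.Price = runif(n, 1.8, 3.0))
d$Sale <- 5000 + 285 * d$Fuel.Price + rnorm(n, sd = 60)

# Assumed random 80/20 split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- d[train_idx, ]
test  <- d[-train_idx, ]

fit <- lm(Sale ~ Fuel.Price, data = train)

# RMSE on the rows used for fitting and on the held-out rows.
rmse_train <- rmse(train$Sale, predict(fit, newdata = train))
rmse_test  <- rmse(test$Sale,  predict(fit, newdata = test))
print(c(train = rmse_train, test = rmse_test))
```

RMSE is expressed in the same units as the dependent variable, so it is best judged relative to the typical size of daily sales or customer counts. A test RMSE close to the train RMSE suggests the model is not overfitting the train data.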


 

7) Conclusion:

We conclude that the sales model has inventory and fuel price as its notable independent variables, while the customer model has inventory, average temperature, and fuel price as its consequential independent variables. The R-squared value was analysed to assess how the data scatter around the fitted model. The root mean square error (RMSE) was calculated to measure the error between predicted and actual values. We found this value to be low for our models, which suggests that our predictions were accurate.

 

RMSE values were obtained on both the train and test data for each model. For the sales model, the RMSE was 0.99 on the test dataset and 9.211 on the train dataset. For the customer model, the RMSE was 0.929 on both the test and train data. These low values indicate how closely the predicted data follow the actual data. Reducing the RMSE further would increase model accuracy. AIC-based stepwise regression is very helpful for determining the impact of each variable on the model, because a variable with a high p-value and a small coefficient can sometimes still have an impact on the final model.

 

 

8) Future research:

This research study identified the factors impacting the sales and customer models. The effect of inventory on the models was high. However, inventory is counted only every 30 to 40 days, so daily inventory values are not available, which reduces the accuracy of the linear model. If daily inventory values could be obtained, the regression analysis would be more accurate. The resulting model can be used to predict the sales and customer volume that the store can expect.

 

 

 