Up Selling / Cross Selling Analysis Schema with Python
Overview¶
We begin with the overview from this slide. After downloading the data based on instructions in iykra web. The outline as follows:
- ## 1. Read the Data
- ## 2. Plotting the Data
- ## 3. Summary Data
- ## 4. Correlation Analysis
- ## 5. Regression Analysis
- ## 6. Summary
1. Read the Data¶
You can download the data from through this link : This data is transactional data from burger outlet
import csv
import pandas as pd
import numpy as np
#plotting
import matplotlib.pyplot as plt
import seaborn as sns
import IPython.display as ipd
import librosa.display
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go
import plotly.tools as tls
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display,HTML
CSS = """
.output {
flex-direction : row;
}
"""
HTML('<style>{}</style>'.format(CSS))
data = pd.read_csv('Static Dataset Transaction - Burger Star (Outlet A).csv')
data.head()
data = data.sort_values(['Amount Burger','Amount Drink'], ascending = False)
The data is consists of several feature as follows :¶
- Transaction time
- Burger type
- Burger code
- Burger amount
- Drink type
- Drink code
- Drink Amount
- Gender
- Payment Method
Here is the metadata that given by default from the csv file. On the next session, We have to modify some type of feature with domain knowledge that we have. For instance, Gender have to be categorical one (or object). This session is to get a proper intuition and interpretation from the data.
data.dtypes
For the following session, we plot the data to get the sense of the data and know what analysis we have to create¶
2. Plotting the data¶
data['hour'] = data['Timestamp'].str[:2].astype(int)
data['trx'] = 1
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure(figsize=(15,8))
plt.scatter(data['Amount Burger'], data['Amount Drink'], color='red')
fig.suptitle('Amount Burger Vs Amount Drink', fontsize=20)
plt.xlabel('Amount Burger', fontsize=14)
plt.ylabel('Amount Drink', fontsize=14)
plt.grid(False)
plt.show();
From the chart above, we hardly found the linear relationship between Amount Drink and Amount Burger. Since The amount of burger does not change linearly as amount drink change. But we know that, sum of the customers order more than one burger for one drink. It can be customer bring their own drink, or just take away order. For further, may be we can create the cross selling package for increase the revenue of this outlet
Next, we want to see the composition of the burger type by number of transactions. For further research, we can conduct upselling or crosseling by the type of burger
Aggregating the data based on the type of burger¶
#Type burger yang paling banyak dibeli
burger = data.groupby(['Type Burger'])['trx'].sum().reset_index().sort_values(['trx'],ascending=False).reset_index(drop=True)
sumburger = sum(burger['trx'])
burger['pct'] = (100. * burger['trx']/sumburger).round(1).astype(str) + '%'
display(burger)
import plotly.express as px
fig = px.bar(burger, x='Type Burger', y='trx',
#hover_data=['gender'], color='gender',
labels={'Type Burger':'Type of Burger','trx':'Number of Transactions'}, height=400)
fig.show()
#Type burger yang paling banyak dibeli
burger = data.groupby(['Type Burger'])['Amount Burger'].sum().reset_index().sort_values(['Amount Burger'],ascending=False).reset_index(drop=True)
sumburger = sum(burger['Amount Burger'])
burger['pct'] = (100. * burger['Amount Burger']/sumburger).round(1).astype(str) + '%'
display(burger)
import plotly.express as px
fig = px.bar(burger, x='Type Burger', y='Amount Burger',
#hover_data=['gender'], color='gender',
labels={'Type Burger':'Type of Burger','Amount Burger':'Amount of Burger'}, height=400)
fig.show()
From the table above, Beef Burger is the most popular burger in this outlet by 37% of total transactions, also by the size of amount burger ordered as well, Beef burger has the most burger that customers ordered by 27% of total Amount Burger.
datahour = data.groupby(['hour'])['Amount Burger'].sum().reset_index().sort_values(by=['hour'], ascending = True)
We define hour as two first string in timestamp feature. Since, we want to know the relationship between hour transactions and the amount of burger ordered by customers, also timestamp future is too rigit to be aggregated
fig = plt.figure(figsize=(15,8))
plt.scatter(datahour['hour'], datahour['Amount Burger'], color='green')
fig.suptitle('Hour of Transactions Vs Sum of Amount Burger', fontsize=20)
plt.xlabel('Hour', fontsize=14)
plt.ylabel('Amount Burger', fontsize=14)
plt.grid(False)
plt.show();
Based on the scatter plot above, the maximum that the amount burger ordered by customers is at 11 a.m, and the minimum amount of burger ordered by the customers is at 3 p.m.
timedata = data.groupby(['Type Burger','hour'])['Amount Burger'].sum().reset_index().sort_values(['Amount Burger','hour'], ascending = False).reset_index(drop=True)
g = sns.FacetGrid(timedata, col="Type Burger")
g.map(plt.scatter, "hour", "Amount Burger", alpha=.7)
g.add_legend();
datahourdrink = data.groupby(['hour'])['Amount Drink'].sum().reset_index().sort_values(by=['hour'], ascending = True)
fig = plt.figure(figsize=(15,8))
plt.scatter(datahourdrink['hour'], datahourdrink['Amount Drink'], color='blue')
fig.suptitle('Hour of Transactions Vs Sum of Amount Drink', fontsize=20)
plt.xlabel('Hour', fontsize=14)
plt.ylabel('Amount Drink', fontsize=14)
plt.grid(False)
plt.show();
timedatadrink = data.groupby(['Type Drink','hour'])['Amount Drink'].sum().reset_index().sort_values(['Amount Drink','hour'], ascending = False).reset_index(drop=True)
g = sns.FacetGrid(timedatadrink, col="Type Drink")
g.map(plt.scatter, "hour", "Amount Drink", alpha=.7)
g.add_legend();
By aggregating sum of the amount burger, we can see the 11 a.m is the peak hour for the outlet that sold the largest amount of all burger type. On the next pages, we can see what the most popular Type of burger by the hour of the transactions.
Here are the snapshot result from aggregating sum of the data based on the hour feature that created previously.
gendertrx = data.groupby(['Gender'])['trx'].sum().reset_index()
fig = go.Figure(data=[go.Pie(labels=['gender0', 'gender1'], values=gendertrx['trx'])])
fig.show()
genderbtype = data.groupby(['Gender','Type Burger'])['trx'].sum().reset_index()
gender0 = genderbtype[genderbtype['Gender']==0]
gender1 = genderbtype[genderbtype['Gender']==1]
from plotly.subplots import make_subplots
labels = genderbtype['Type Burger'].tolist()
fig = make_subplots(rows=1, cols=2,specs=[[{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=labels, values=gender0['trx'].tolist(),name = 'gender0'),1, 1)
fig.add_trace(go.Pie(labels=labels, values=gender1['trx'].tolist(),name = 'gender1'),1, 2)
fig.update_layout(title_text="Gender Vs Type of Burger Transactions")
fig.show();
PAYMENT¶
payment = data.groupby(['Payment'])['Timestamp'].count().reset_index()
payment.columns = ['Payment','sumtrx']
payment = payment.sort_values(['sumtrx'],ascending = False).reset_index(drop=True)
import plotly.express as px
fig = px.bar(payment, x='Payment', y='sumtrx',
#hover_data=['gender'], color='gender',
labels={'Payment':'Payment Method','sumtrx':'Number of Transactions'}, height=400)
fig.show()
payfood = data.groupby(['Payment','Type Burger'])['trx'].sum().reset_index()
payfood = payfood.sort_values(['Payment','trx'], ascending=False).reset_index(drop=True)
fig = px.bar(payfood, x='Payment', y='trx',
hover_data=['Type Burger'], color='Type Burger',
labels={'Payment':'Payment Method','Amount Burger':'Number of Transactions'}, height=400)
fig.show()
fig = go.Figure()
fig = px.histogram(data, x='Amount Drink',
hover_data=['Gender'],color='Gender',
labels={'count':'Count of Burger','Amount Drink':'Amount of Drink'}, height=400)
fig.update_layout(
margin=dict(l=80, r=30, t=30, b=30),
paper_bgcolor="LightSteelBlue",
)
g = sns.FacetGrid(data, col="Gender")
g.map(plt.hist, "Amount Drink");
data[['Amount Burger','Amount Drink']].describe()
data[data['Code Burger']==1][['Amount Burger','Amount Drink']].describe()
data[data['Code Burger']==2][['Amount Burger','Amount Drink']].describe()
data[data['Code Burger']==3][['Amount Burger','Amount Drink']].describe()
data[data['Code Burger']==4][['Amount Burger','Amount Drink']].describe()
import seaborn as sns
sns.set(style="whitegrid")
ax = sns.boxplot(x=data["Amount Burger"])
import seaborn as sns
sns.set(style="whitegrid")
ax = sns.boxplot(x=data["Amount Drink"])
ax = sns.boxplot(x="hour", y="Amount Burger", data=data)
ax = sns.boxplot(x="hour", y="Amount Drink", data=data)
Correlation between Amount burger & Amount drink¶
data['Gender0'] = (data['Gender']==1).astype(int)
data[['Amount Burger','Amount Drink','Gender0']].corr()
data['Gender1'] = (data['Gender']==0).astype(int)
data[['Amount Burger','Amount Drink','Gender1']].corr()
Based on the table above, assume that the Burger, Drink, and customer gender have a linear relationship. Gender 0 have negative correlation with amount of burger. It means that, customer with gender 1 is more like to order burger and drink than the customers with gender 0. So, if we want to offer the promo, it would be more efficient.
Correlation between Amount Burger & Hour¶
data[['Amount Burger','hour']].corr()
data[['Amount Drink','hour']].corr()
data['food'] = data['Type Burger'] +','+data['Type Drink']
data['sumitem'] = data['Amount Burger'] + data['Amount Drink']
data['countrx'] = 1
Correlation between Sum Item & Hour¶
data[['sumitem','hour']].corr()
# Paket burger yang paling banyak dibeli
foodgroup = data.groupby(['food','hour','Gender'])['sumitem'].sum().reset_index().sort_values(['sumitem'],ascending = False).reset_index(drop=True)
sumfoodgroup = sum(foodgroup['sumitem'])
foodgroup['pct'] = (foodgroup['sumitem']/sumfoodgroup)
foodgroup.head()
# Paket burger yang paling banyak dibeli
foodgrouptrx = data.groupby(['food','hour','Gender'])['countrx'].sum().reset_index().sort_values(['countrx'],ascending = False).reset_index(drop=True)
sumfoodgrouptrx = sum(foodgrouptrx['countrx'])
foodgrouptrx['pct'] = (foodgrouptrx['countrx']/sumfoodgrouptrx)
foodgrouptrx.head()
fig = px.bar(foodgroup, x='hour', y='sumitem',
#hover_data=['gender'], color='gender',
labels={'hour':'Time of Transactions','sumitem':'Sum of Pacakage'}, height=400)
fig.show()
fig = px.bar(foodgrouptrx, x='hour', y='countrx',
#hover_data=['gender'], color='gender',
labels={'hour':'Time of Transactions','countrx':'Number of Transactions'}, height=400)
fig.show()
import plotly.express as px
fig = px.bar(foodgroup, x='food', y='sumitem',
#hover_data=['gender'], color='gender',
labels={'food':'Food Package','sumitem':'Sum of Burger & Drinks'}, height=600)
fig.show()
import plotly.express as px
fig = px.bar(foodgrouptrx, x='food', y='countrx',
#hover_data=['gender'], color='gender',
labels={'food':'Food Package','countrx':'Count of Transactions'}, height=600)
fig.show()
# Paket burger berdasarkan gender
foodgender = data.groupby(['food','Gender'])['sumitem'].sum().reset_index().sort_values(['sumitem','Gender'],ascending = False).reset_index(drop=True)
sumfoodgender = sum(foodgender['sumitem'])
foodgender['pct'] = (foodgender['sumitem']/sumfoodgender)
foodgender
import plotly.express as px
fig = px.bar(foodgender, x='food', y='sumitem',
hover_data=['Gender'], color='Gender',
labels={'food':'Food Package','sumitem':'Number of Transactions'}, height=600)
fig.show()
For the very first suggestions, we can't already decide to make a proper crosseling burger product since the transactions is just sligthly diffrent. Need to explore more feature. We just can see the beef burger is the most popular burger in this outlet.
Yes, the diffrent is clearly view on the chart above. From the previous chart we know that the beef burger is the most popular burger at the outlet. But if we combine with the drinks, the cheese burger is the most burger that the customers want. So, we suggest the package of combination between cheese burger with coke and beef burger with the most profitable drink (call it Beef Cheese package). Since the beef burger is the most burger that customers want, no matter the drink (based on the transactional data).Then the customers can choose the type of drink whenever he/she want. The amount of combinations can't decide yet since we don't know the price and the profitability for each food (burger & drink) yet.
For further scheme, we can offer the crossseling package based on the gender of the customers. Then we can make the cross selling / up selling more effective (Since we know the target of the customers)
gender = data.groupby(['Gender'])['sumitem'].sum().reset_index()
gender
Clearly show from the chart above that the gender 0 are the most customers gender type to buy the Cheese burger and coke. Also with the beef burger with no drinks. So, we can offer more the cross selling in the previous slide to gender 0. Maybe with the gift item specifically for gender 0. For gender 1, we can offer the beef burger + chicken burger with the most profitable drink. Since we knew from the chart above, the gender 1 most likely to buy the beef burger and chicken burger
Then, we also can combine the package based on the peak hour of the transaction made. We can offer the package at the specific time
data['hour'] = data['Timestamp'].str[:2]
peak = data.groupby(['hour'])['sumitem'].sum().reset_index().sort_values(['sumitem'],ascending=False)
peak
fig = px.bar(peak, x='hour', y='sumitem',
#hover_data=['Type Burger'], color='Type Burger',
labels={'hour':'Time Transactions (hour)','sumitem':'Sum of item transactions'}, height=400)
fig.show()
data['trx'] = 1
peakhourbin = data.groupby(['hour','food'])['Amount Burger'].sum().reset_index().sort_values(['Amount Burger'],ascending=False).reset_index(drop=True)
peakhourbin.head()
We can offer the Beef Cheese package at the 11-13 at launch time. Also with beef burger package as well.
Conclusion¶
Finally, we have some simple strategy to create cross/up selling scheme. As a Data Analyst / Data Scientist, we have to discuss with the Business Stake Holder to get more insight. So, the scheme would be align with the business process. I hope, you can gather some insight to get the point.
Thank you :)
from IPython.display import HTML
HTML('''<script>
code_show=true;
function code_toggle() {
if (code_show){
$('div.input').hide();
} else {
$('div.input').show();
}
code_show = !code_show
}
$( document ).ready(code_toggle);
</script>
<form action="javascript:code_toggle()"><input type="submit" value="Click here to toggle on/off the raw code."></form>''')
2 comments