## Statistical Analysis

Q1. LIFT Analysis

Please calculate the following lift values for the table correlating Burgers & Chips below:

• LIFT(Burgers, Chips)
• LIFT(Burgers, ^Chips)
• LIFT(^Burgers, Chips)
• LIFT(^Burgers, ^Chips)

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 600 | 400 | 1000 |
| ^Burgers | 200 | 200 | 400 |
| Total Column | 800 | 600 | 1400 |

1. LIFT(Burgers, Chips)

s(Burgers ∪ Chips) = 600/1400 = 3/7 ≈ 0.43

s(Burgers) = 1000/1400 = 5/7 ≈ 0.71

s(Chips) = 800/1400 = 4/7 ≈ 0.57

LIFT(Burgers, Chips) = s(Burgers ∪ Chips) / (s(Burgers) × s(Chips)) = (3/7) / ((5/7) × (4/7)) = 21/20 = 1.05

LIFT(Burgers, Chips) > 1, meaning that Burgers and Chips are positively correlated

2. LIFT(Burgers, ^Chips)

s(Burgers ∪ ^Chips) = 400/1400 = 2/7 ≈ 0.29

s(Burgers) = 1000/1400 = 5/7 ≈ 0.71

s(^Chips) = 600/1400 = 3/7 ≈ 0.43

LIFT(Burgers, ^Chips) = (2/7) / ((5/7) × (3/7)) = 14/15 ≈ 0.93

LIFT(Burgers, ^Chips) < 1, meaning that Burgers and ^Chips are negatively correlated

3. LIFT(^Burgers, Chips)

s(^Burgers ∪ Chips) = 200/1400 = 1/7 ≈ 0.14

s(^Burgers) = 400/1400 = 2/7 ≈ 0.29

s(Chips) = 800/1400 = 4/7 ≈ 0.57

LIFT(^Burgers, Chips) = (1/7) / ((2/7) × (4/7)) = 7/8 = 0.875

LIFT(^Burgers, Chips) < 1, meaning that ^Burgers and Chips are negatively correlated

4. LIFT(^Burgers, ^Chips)

s(^Burgers ∪ ^Chips) = 200/1400 = 1/7 ≈ 0.14

s(^Burgers) = 400/1400 = 2/7 ≈ 0.29

s(^Chips) = 600/1400 = 3/7 ≈ 0.43

LIFT(^Burgers, ^Chips) = (1/7) / ((2/7) × (3/7)) = 7/6 ≈ 1.17

LIFT(^Burgers, ^Chips) > 1, meaning that ^Burgers and ^Chips are positively correlated
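
The four lift values can be cross-checked with a short script. This is a minimal sketch, not part of the original exercise: the function name `lift_table` and its parameters are hypothetical, and it uses exact fractions to avoid the rounding drift that appears when two-decimal supports are multiplied by hand.

```python
from fractions import Fraction

def lift_table(n11, n10, n01, n00, a="A", b="B"):
    """Return the lift of each cell of a 2x2 contingency table.

    n11: A and B, n10: A without B, n01: B without A, n00: neither.
    (Hypothetical helper written for this worked example.)
    """
    not_a, not_b = "^" + a, "^" + b
    n = n11 + n10 + n01 + n00
    row = {a: n11 + n10, not_a: n01 + n00}   # row totals
    col = {b: n11 + n01, not_b: n10 + n00}   # column totals
    cells = {(a, b): n11, (a, not_b): n10,
             (not_a, b): n01, (not_a, not_b): n00}
    # lift = s(X ∪ Y) / (s(X) * s(Y)) = count * n / (row_total * col_total)
    return {key: Fraction(count * n, row[key[0]] * col[key[1]])
            for key, count in cells.items()}

burger_chips = lift_table(600, 400, 200, 200, a="Burgers", b="Chips")
for (x, y), value in burger_chips.items():
    print(f"LIFT({x}, {y}) = {value} = {float(value):.3f}")
# LIFT(Burgers, Chips) = 21/20 = 1.050
# LIFT(Burgers, ^Chips) = 14/15 = 0.933
# LIFT(^Burgers, Chips) = 7/8 = 0.875
# LIFT(^Burgers, ^Chips) = 7/6 = 1.167
```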

Q2. Please calculate the following lift values for the table correlating Ketchup & Shampoo below:

• LIFT(Ketchup, Shampoo)
• LIFT(Ketchup, ^Shampoo)
• LIFT(^Ketchup, Shampoo)
• LIFT(^Ketchup, ^Shampoo)

| | Shampoo | ^Shampoo | Total Row |
|---|---|---|---|
| Ketchup | 100 | 200 | 300 |
| ^Ketchup | 200 | 400 | 600 |
| Total Column | 300 | 600 | 900 |

1. LIFT(Ketchup, Shampoo)

s(Ketchup ∪ Shampoo) = 100/900 = 1/9 ≈ 0.11

s(Ketchup) = 300/900 = 1/3 ≈ 0.33

s(Shampoo) = 300/900 = 1/3 ≈ 0.33

LIFT(Ketchup, Shampoo) = (1/9) / ((1/3) × (1/3)) = (1/9) / (1/9) = 1

LIFT(Ketchup, Shampoo) = 1, meaning that Ketchup and Shampoo are independent

2. LIFT(Ketchup, ^Shampoo)

s(Ketchup ∪ ^Shampoo) = 200/900 = 2/9 ≈ 0.22

s(Ketchup) = 300/900 = 1/3 ≈ 0.33

s(^Shampoo) = 600/900 = 2/3 ≈ 0.67

LIFT(Ketchup, ^Shampoo) = (2/9) / ((1/3) × (2/3)) = (2/9) / (2/9) = 1

LIFT(Ketchup, ^Shampoo) = 1, meaning that Ketchup and ^Shampoo are independent

3. LIFT(^Ketchup, Shampoo)

s(^Ketchup ∪ Shampoo) = 200/900 = 2/9 ≈ 0.22

s(^Ketchup) = 600/900 = 2/3 ≈ 0.67

s(Shampoo) = 300/900 = 1/3 ≈ 0.33

LIFT(^Ketchup, Shampoo) = (2/9) / ((2/3) × (1/3)) = (2/9) / (2/9) = 1

LIFT(^Ketchup, Shampoo) = 1, meaning that ^Ketchup and Shampoo are independent

4. LIFT(^Ketchup, ^Shampoo)

s(^Ketchup ∪ ^Shampoo) = 400/900 = 4/9 ≈ 0.44

s(^Ketchup) = 600/900 = 2/3 ≈ 0.67

s(^Shampoo) = 600/900 = 2/3 ≈ 0.67

LIFT(^Ketchup, ^Shampoo) = (4/9) / ((2/3) × (2/3)) = (4/9) / (4/9) = 1

LIFT(^Ketchup, ^Shampoo) = 1, meaning that ^Ketchup and ^Shampoo are independent
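
It is no coincidence that all four lifts equal 1: in a 2×2 table, once one cell has lift 1, the other three must as well. A short derivation, writing A for Ketchup, B for Shampoo, and ¬ for the ^ used above (the symbols A and B are shorthand introduced here, not part of the exercise):

```latex
\begin{aligned}
s(A \cup B) &= s(A)\,s(B) \\
s(A \cup \neg B) &= s(A) - s(A \cup B) = s(A)\bigl(1 - s(B)\bigr) = s(A)\,s(\neg B) \\
s(\neg A \cup B) &= s(B) - s(A \cup B) = s(B)\bigl(1 - s(A)\bigr) = s(\neg A)\,s(B) \\
s(\neg A \cup \neg B) &= s(\neg B) - s(A \cup \neg B) = s(\neg B)\bigl(1 - s(A)\bigr) = s(\neg A)\,s(\neg B)
\end{aligned}
```

Each line says the joint support equals the product of the two marginal supports, which is exactly the condition for lift = 1.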

Q3. Chi Squared Analysis

Please calculate the following Chi Squared values for the table correlating Burgers and Chips below (expected values in brackets).

• Burgers & Chips
• Burgers & Not Chips
• Not Burgers & Chips
• Not Burgers & Not Chips

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

| | Chips | ^Chips | Total Row |
|---|---|---|---|
| Burgers | 900 (800) | 100 (200) | 1000 |
| ^Burgers | 300 (400) | 200 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |
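
The bracketed expected values follow from the row and column totals under the assumption of independence. As a worked check (the symbol E and the total N = 1500 are notation introduced here, and ¬ stands for ^):

```latex
\begin{aligned}
E &= \frac{\text{row total} \times \text{column total}}{N} \\
E(\text{Burgers}, \text{Chips}) &= \frac{1000 \times 1200}{1500} = 800 \\
E(\text{Burgers}, \neg\text{Chips}) &= \frac{1000 \times 300}{1500} = 200 \\
E(\neg\text{Burgers}, \text{Chips}) &= \frac{500 \times 1200}{1500} = 400 \\
E(\neg\text{Burgers}, \neg\text{Chips}) &= \frac{500 \times 300}{1500} = 100
\end{aligned}
```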

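$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

$$\chi^2 = \frac{(900-800)^2}{800} + \frac{(100-200)^2}{200} + \frac{(300-400)^2}{400} + \frac{(200-100)^2}{100}$$

$$= \frac{100^2}{800} + \frac{(-100)^2}{200} + \frac{(-100)^2}{400} + \frac{100^2}{100}$$

$$= \frac{10000}{800} + \frac{10000}{200} + \frac{10000}{400} + \frac{10000}{100} = 12.5 + 50 + 25 + 100 = 187.5$$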

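Burgers & Chips are correlated because χ² = 187.5 is far greater than 0 (and well above 3.84, the critical value for 1 degree of freedom at the 0.05 significance level).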

The expected value is 800 and the observed value is 900 (observed > expected), so Burgers & Chips are positively correlated.

The expected value is 200 and the observed value is 100 (observed < expected), so Burgers & ^Chips are negatively correlated.

The expected value is 400 and the observed value is 300 (observed < expected), so ^Burgers & Chips are negatively correlated.

The expected value is 100 and the observed value is 200 (observed > expected), so ^Burgers & ^Chips are positively correlated.
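
The same calculation can be sketched in code. This is an illustrative helper only (the name `chi_squared` and its return format are assumptions made for this example): it derives the expected counts from the row and column totals, sums the chi-squared terms, and labels each cell by comparing observed with expected.

```python
def chi_squared(observed):
    """Chi-squared statistic for a table of observed counts.

    observed: list of rows, e.g. [[n11, n10], [n01, n00]].
    Returns (statistic, direction label per cell).
    Hypothetical helper written for this worked example.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    statistic = 0.0
    directions = []
    for i, row in enumerate(observed):
        labels = []
        for j, obs in enumerate(row):
            # Expected count if the row and column events were independent.
            exp = row_totals[i] * col_totals[j] / n
            statistic += (obs - exp) ** 2 / exp
            if obs > exp:
                labels.append("positive")
            elif obs < exp:
                labels.append("negative")
            else:
                labels.append("independent")
        directions.append(labels)
    return statistic, directions

# Rows: Burgers, ^Burgers; columns: Chips, ^Chips.
chi2, directions = chi_squared([[900, 100], [300, 200]])
print(chi2)        # 187.5
print(directions)  # [['positive', 'negative'], ['negative', 'positive']]
```

Note that the statistic alone says only that the table deviates from independence; the direction of each cell comes from the observed-versus-expected comparison.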

Q4: Chi Squared Analysis

Please calculate the following Chi Squared values for the table correlating Burgers and Sausages below (expected values in brackets).

• Burgers & Sausages
• Burgers & Not Sausages
• Sausages & Not Burgers
• Not Burgers & Not Sausages

For the above options, please also indicate whether each of your answers suggests independence, positive correlation, or negative correlation.

| | Sausages | ^Sausages | Total Row |
|---|---|---|---|
| Burgers | 800 (800) | 200 (200) | 1000 |
| ^Burgers | 400 (400) | 100 (100) | 500 |
| Total Column | 1200 | 300 | 1500 |

$$\chi^2 = \frac{(800-800)^2}{800} + \frac{(200-200)^2}{200} + \frac{(400-400)^2}{400} + \frac{(100-100)^2}{100}$$

$$= \frac{0^2}{800} + \frac{0^2}{200} + \frac{0^2}{400} + \frac{0^2}{100} = 0$$

Burgers & Sausages are independent because χ² = 0.

Burgers & Sausages: observed and expected values are the same (800), so independent

Burgers & ^Sausages: observed and expected values are the same (200), so independent

^Burgers & Sausages: observed and expected values are the same (400), so independent

^Burgers & ^Sausages: observed and expected values are the same (100), so independent

Q5:

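Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?

Lift and Chi Squared are poor choices when the data contain many null transactions, i.e. transactions that contain neither of the two items being compared. Both measures depend on the total number of transactions, so a large number of null transactions can distort the result.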

Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

Null-invariant measures can be used instead: Kulczynski, AllConf, Jaccard, Cosine, or MaxConf. Their values do not change with the number of null transactions.
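
The effect of null transactions can be illustrated with a small sketch. The counts below are made up for illustration and do not come from the exercises above, and the function name `measures` is hypothetical. The same item counts are evaluated twice, once with few and once with many null transactions.

```python
import math

def measures(n_ab, n_a_only, n_b_only, n_null):
    """Lift and five null-invariant measures for items A and B.

    n_ab: transactions with both, n_a_only / n_b_only: with only one,
    n_null: with neither (null transactions).
    Hypothetical helper written for this illustration.
    """
    n = n_ab + n_a_only + n_b_only + n_null
    n_a = n_ab + n_a_only
    n_b = n_ab + n_b_only
    p_b_given_a = n_ab / n_a
    p_a_given_b = n_ab / n_b
    return {
        "lift": n_ab * n / (n_a * n_b),            # depends on n
        "all_conf": min(p_b_given_a, p_a_given_b),
        "max_conf": max(p_b_given_a, p_a_given_b),
        "kulczynski": 0.5 * (p_b_given_a + p_a_given_b),
        "cosine": n_ab / math.sqrt(n_a * n_b),
        "jaccard": n_ab / (n_a + n_b - n_ab),
    }

# Illustrative counts only: identical except for the null transactions.
few_nulls = measures(100, 50, 50, 200)
many_nulls = measures(100, 50, 50, 200_000)

for name in few_nulls:
    print(f"{name:>10}: {few_nulls[name]:.3f} vs {many_nulls[name]:.3f}")
```

Only the lift row changes between the two runs, because it is the only formula here that uses the total transaction count; the other five are computed purely from the counts involving A or B.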