Business Intelligence (BI)

Business intelligence is a data analysis process aimed at boosting business performance by helping corporate executives and other end users make more informed decisions.

“Business intelligence (BI) is a technology-driven process for analyzing data and presenting actionable information to help corporate executives, business managers and other end users make more informed business decisions. BI encompasses a variety of tools, applications and methodologies that enable organizations to collect data from internal systems and external sources, prepare it for analysis, develop and run queries against the data, and create reports, dashboards and data visualizations to make the analytical results available to corporate decision makers as well as operational workers.”

Companies can benefit greatly from business intelligence programs by improving decision making, optimizing business processes, increasing operational productivity and gaining a competitive advantage over their business rivals. Business intelligence systems can also help them identify market trends.

BI analysis can support strategic and tactical decisions by processing historical data and comparing it with present data to predict future movements.

In the early stages of BI, the tools were used by data scientists and other IT personnel, but with the development of self-service BI, business managers and workers are able to use BI software themselves.

Business Intelligence tools

Business intelligence tools are software applications used to retrieve, analyse, sort, filter, process and report data. Some of the top BI tools are:

  1. Spreadsheets – the most commonly used is Microsoft Excel.
  2. Reporting and querying – software, often the company’s own, used to report, query, sort, filter and display data.
  3. OLAP – Online Analytical Processing tools help users interactively analyse data from multiple sources in a multidimensional view.
  4. Digital dashboards – real-time user interfaces that present the current status graphically.
  5. Data mining – discovering patterns in large data sets using different methods (artificial intelligence, machine learning, statistics, database systems).
  6. Data warehousing – a central storage location for data, created by integrating data retrieved from different sources and used to store data for future analysis.
  7. Decision engineering – a framework for decision making that usually brings several techniques together (analytics, reasoning, machine learning) to overcome issues in decision making.
  8. Process mining – analysis of event logs stored in an information system, aimed at providing information for process analysis.
  9. Business performance management – a set of processes for managing the performance of a business.
  10. Local information systems – designed to support geographic reporting.

Below is one well-known example, in which Tesco used BI to save millions of pounds.

Tesco’s Legendary Big Data Benefits

“Tesco, the largest retailer in the UK, was one of the first major companies to discover the endless benefits of big data analytics. Beginning in the mid 1990s, Tesco introduced its own loyalty program with the Clubcard. Many competitors used similar cards as a means to target discounts and coupons, however, Tesco realized the value of the insight it would give into its customers’ behavior patterns.

Tesco began processing the huge flood of data coming in from these cards, and was able to better target mailings of vouchers and coupons to customers, resulting in a huge increase from 3% to 70% in rate of coupon redemption. Seeing its analytics approach work, Tesco began applying it to other fields.

One of the company’s most profitable uses of analytics, was observing historical sales and weather data and using predictive analytics to optimize their stock-keeping system. By being able to forecast sales by product for each store, Tesco was able to save 100 million pounds ($151,718,000 US dollars) in stock that would have otherwise expired and thus wasted.

Now following Tesco’s lead, other competitive retailers are finding creative ways to use big data analytics in order to improve customer satisfaction and increase profits.”

 

References:

http://searchdatamanagement.techtarget.com/definition/business-intelligence

http://businessintelligence.com/big-data-case-studies/tescos-legendary-big-data-benefits/

http://www.predictiveanalyticstoday.com/top-business-intelligence-tools/

Management Information System (MIS)

A Management Information System (MIS) is an automated information-processing system developed to support the activities and functions of company management at all levels.

“The main purpose of the MIS is to give managers feedback about their own performance; top management can monitor the company as a whole. Information displayed by the MIS typically shows “actual” data over against “planned” results and results from a year before;

The MIS receives data from company units and functions. Some of the data are collected automatically from computer-linked check-out counters; others are keyed in at periodic intervals. Routine reports are pre-programmed and run at intervals or on demand while others are obtained using built-in query languages; display functions built into the system are used by managers to check on status at desk-side computers connected to the MIS by networks. Many sophisticated systems also monitor and display the performance of the company’s stock.” 

Companies get great benefits from MIS. The variety of reports it provides lets them identify their strengths and weaknesses and improve their processes and operations. With MIS, a company can get an overall picture of itself and its performance. “MIS can help a company gain a competitive advantage. Competitive advantage is a firm’s ability to do something better, faster, cheaper, or uniquely, when compared with rival firms in the market.”

“The use of Management Information Systems (MIS) has gone from competitive advantage for few to business necessity for all. The true advantages of information technology (IT) come in the forms of efficiency and effectiveness of the gathered information.”

“Information is the starting point for MIS, which is the analysis of collected raw data” 

“Though technology receives most of the credit, managers play an intricate role when dealing with MIS. Their literacy in both technology and information determine how effective strategies will be when implemented. A technology-literate manager will know how and when to apply technology, meaning that she will know what to purchase to execute certain processes and the most appropriate time to make the purchase. An information-literate manager is able to define what information is needed and how to access it, can convert it from information to business intelligence, and can make the best decision based on the information.”

Levels of Management decision making

Managers at all levels of a company have to make informed decisions on its behalf. The nature of those decisions varies with the level at which they are made.

There are three levels of management decision making in a company:

  1. Operational level – makes decisions on day-to-day business processes and daily operations. Operational managers interact directly with employees and customers and use information systems to automate repetitive tasks and improve efficiency. Their decisions are structured and recurring, and can be easily automated.
  2. Managerial level – the area where functional or mid-level managers make decisions. They monitor and control operational-level activities and provide information to the executive level. Their main focus is on utilising and deploying resources effectively, and their goal is to achieve strategic objectives. Mid-level managers’ decisions are semi-structured, contained within a business function and moderately complex.
  3. Executive level – the President, CEO, Vice Presidents and Board of Directors. They make long-term strategic decisions on complex and non-routine issues. Their decisions are unstructured.

Types of Information Systems

Depending on their level in the organization, managers use different types of information systems:

  1. Transaction Processing Systems (TPS) – serve operational managers and staff, who use them to perform and record the daily routine transactions necessary to conduct business. TPS allow managers to monitor the status of operations and relations with the external environment. They also serve predefined, structured goals and decision making. The data is very detailed at this level. The best TPS are integrated throughout the organisation to supply useful information to those who need it, when they need it.
  2. Management Information Systems (MIS) – serve middle management. MIS are designed to produce information on a periodic basis. They provide reports on the firm’s current performance, based on data from TPS, and answer routine questions using predefined procedures. They typically have little analytic capability.
  3. Decision Support Systems (DSS) – serve middle management. DSS support non-routine decision making (what-if analyses) and analyse the results of hypothetical changes. They may use external information as well as internal TPS / MIS data.
  4. Executive Information Systems (EIS) – support senior management. EIS address non-routine decisions that require judgment, evaluation and insight. They also incorporate data about external events (e.g. new tax laws or competitors) as well as summarized information from internal MIS and DSS.

 


 

References:

http://www.inc.com/encyclopedia/management-information-systems-MIS.html

http://en.wikipedia.org/wiki/Management_information_system

http://www.ehow.com/list_6869907_challenges-management-information-system.html

 

Statistical Analysis

Q1. LIFT Analysis

Please calculate the following lift values for the table correlating Burger & Chips below:

  • LIFT(Burger, Chips)
  • LIFT(Burger, ^Chips)
  • LIFT(^Burger, Chips)
  • LIFT(^Burger, ^Chips)

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

 

               Chips   ^Chips   Total Row
Burgers          600      400        1000
^Burgers         200      200         400
Total Column     800      600        1400

1. LIFT(Burgers, Chips)

s(Burgers ∪ Chips) = 600/1400 = 3/7 = 0.43

s(Burgers) = 1000/1400 = 5/7 = 0.71

s(Chips) = 800/1400 = 4/7 = 0.57

LIFT(Burgers, Chips) = s(Burgers ∪ Chips) / (s(Burgers) × s(Chips)) = (3/7) / ((5/7) × (4/7)) = 21/20 = 1.05

LIFT(Burgers, Chips) > 1, meaning that Burgers and Chips are positively correlated.

 

2. LIFT(Burgers, ^Chips)

s(Burgers ∪ ^Chips) = 400/1400 = 2/7 = 0.29

s(Burgers) = 1000/1400 = 5/7 = 0.71

s(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(Burgers, ^Chips) = s(Burgers ∪ ^Chips) / (s(Burgers) × s(^Chips)) = (2/7) / ((5/7) × (3/7)) = 14/15 = 0.93

LIFT(Burgers, ^Chips) < 1, meaning that Burgers and ^Chips are negatively correlated.

 

3. LIFT(^Burgers, Chips)

s(^Burgers ∪ Chips) = 200/1400 = 1/7 = 0.14

s(^Burgers) = 400/1400 = 2/7 = 0.29

s(Chips) = 800/1400 = 4/7 = 0.57

LIFT(^Burgers, Chips) = s(^Burgers ∪ Chips) / (s(^Burgers) × s(Chips)) = (1/7) / ((2/7) × (4/7)) = 7/8 = 0.875

LIFT(^Burgers, Chips) < 1, meaning that ^Burgers and Chips are negatively correlated.

 

4. LIFT(^Burgers, ^Chips)

s(^Burgers ∪ ^Chips) = 200/1400 = 1/7 = 0.14

s(^Burgers) = 400/1400 = 2/7 = 0.29

s(^Chips) = 600/1400 = 3/7 = 0.43

LIFT(^Burgers, ^Chips) = s(^Burgers ∪ ^Chips) / (s(^Burgers) × s(^Chips)) = (1/7) / ((2/7) × (3/7)) = 7/6 = 1.17

LIFT(^Burgers, ^Chips) > 1, meaning that ^Burgers and ^Chips are positively correlated.
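
The four lift calculations for Q1 can also be checked programmatically. The following is a minimal Python sketch, not part of the original exercise; the names `counts`, `lift` and `interpret` are my own. It uses exact fractions so that rounding the intermediate support values cannot shift the result.

```python
from fractions import Fraction

# Observed counts from the Burgers/Chips table in Q1:
# keys are (row, column) pairs.
counts = {
    ("Burgers", "Chips"): 600,
    ("Burgers", "^Chips"): 400,
    ("^Burgers", "Chips"): 200,
    ("^Burgers", "^Chips"): 200,
}


def lift(counts, a, b):
    """Return lift(a, b) = s(a and b) / (s(a) * s(b)) as an exact fraction."""
    total = sum(counts.values())
    s_ab = Fraction(counts[(a, b)], total)
    # Row and column totals give the individual supports.
    s_a = Fraction(sum(v for (row, _), v in counts.items() if row == a), total)
    s_b = Fraction(sum(v for (_, col), v in counts.items() if col == b), total)
    return s_ab / (s_a * s_b)


def interpret(value):
    """Translate a lift value into the wording used in the exercise."""
    if value > 1:
        return "positive correlation"
    if value < 1:
        return "negative correlation"
    return "independent"


for a, b in counts:
    value = lift(counts, a, b)
    print(f"LIFT({a}, {b}) = {value} = {float(value):.3f} -> {interpret(value)}")
```

Swapping in the counts from another 2×2 table (for example the Ketchup/Shampoo table in Q2) only requires changing the `counts` dictionary.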

 

Q2. Please calculate the following lift values for the table correlating Ketchup & Shampoo below:

  • LIFT(Ketchup, Shampoo)
  • LIFT(Ketchup, ^Shampoo)
  • LIFT(^Ketchup, Shampoo)
  • LIFT(^Ketchup, ^Shampoo)

Please also indicate if each of your answers would suggest independent, positive correlation, or negative correlation.

 

               Shampoo   ^Shampoo   Total Row
Ketchup            100        200         300
^Ketchup           200        400         600
Total Column       300        600         900

1. LIFT(Ketchup, Shampoo)

s(Ketchup ∪ Shampoo) = 100/900 = 1/9 = 0.11

s(Ketchup) = 300/900 = 1/3 = 0.33

s(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(Ketchup, Shampoo) = (1/9) / ((1/3) × (1/3)) = 0.11/0.11 = 1

LIFT(Ketchup, Shampoo) = 1, meaning that Ketchup and Shampoo are independent.

 

2. LIFT(Ketchup, ^Shampoo)

s(Ketchup ∪ ^Shampoo) = 200/900 = 2/9 = 0.22

s(Ketchup) = 300/900 = 1/3 = 0.33

s(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(Ketchup, ^Shampoo) = (2/9) / ((1/3) × (2/3)) = 0.22/0.22 = 1

LIFT(Ketchup, ^Shampoo) = 1, meaning that Ketchup and ^Shampoo are independent.

 

3. LIFT(^Ketchup, Shampoo)

s(^Ketchup ∪ Shampoo) = 200/900 = 2/9 = 0.22

s(^Ketchup) = 600/900 = 2/3 = 0.67

s(Shampoo) = 300/900 = 1/3 = 0.33

LIFT(^Ketchup, Shampoo) = (2/9) / ((2/3) × (1/3)) = 0.22/0.22 = 1

LIFT(^Ketchup, Shampoo) = 1, meaning that ^Ketchup and Shampoo are independent.

 

4. LIFT(^Ketchup, ^Shampoo)

s(^Ketchup ∪ ^Shampoo) = 400/900 = 4/9 = 0.44

s(^Ketchup) = 600/900 = 2/3 = 0.67

s(^Shampoo) = 600/900 = 2/3 = 0.67

LIFT(^Ketchup, ^Shampoo) = (4/9) / ((2/3) × (2/3)) = 0.44/0.44 = 1

LIFT(^Ketchup, ^Shampoo) = 1, meaning that ^Ketchup and ^Shampoo are independent.

 

Q3. Chi Squared Analysis

Please calculate the following chi Squared values for the table correlating Burger and Chips below (Expected values in brackets).

  • Burgers & Chips
  • Burgers & Not Chips
  • Not Burgers & Chips
  • Not Burgers & Not Chips

For the above options, please also indicate if each of your answers would suggest independence, positive correlation or negative correlation.

 

               Chips        ^Chips      Total Row
Burgers        900 (800)    100 (200)        1000
^Burgers       300 (400)    200 (100)         500
Total Column   1200         300              1500

 

Χ² = ∑ (observed − expected)² / expected

Χ² = (900 − 800)² / 800 + (100 − 200)² / 200 + (300 − 400)² / 400 + (200 − 100)² / 100

= 100² / 800 + (−100)² / 200 + (−100)² / 400 + 100² / 100

= 10000/800 + 10000/200 + 10000/400 + 10000/100 = 12.5 + 50 + 25 + 100 = 187.5

Burgers & Chips are correlated because Χ² is far greater than 0 (for a 2×2 table, which has one degree of freedom, the critical value at the 5% level is 3.84).

The direction of each correlation comes from comparing the observed value with the expected one:

As the expected value is 800 and the observed value is 900, we can say that Burgers & Chips are positively correlated.

As the expected value is 200 and the observed value is 100, we can say that Burgers & ^Chips are negatively correlated.

As the expected value is 400 and the observed value is 300, we can say that ^Burgers & Chips are negatively correlated.

As the expected value is 100 and the observed value is 200, we can say that ^Burgers & ^Chips are positively correlated.
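
The Q3 calculation can be reproduced with a short script. This is a minimal Python sketch under my own naming (`expected_counts`, `chi_squared`, `directions` are not from the exercise); it derives the expected counts from the row and column totals rather than taking the bracketed values as given, then sums the per-cell contributions.

```python
# Observed counts from the Q3 table:
# rows = (Burgers, ^Burgers), columns = (Chips, ^Chips).
observed = [
    [900, 100],
    [300, 200],
]


def expected_counts(observed):
    """Expected count per cell = row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_squared(observed):
    """Sum of (observed - expected)^2 / expected over all cells."""
    expected = expected_counts(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def directions(observed):
    """Label each cell by comparing its observed and expected counts."""
    expected = expected_counts(observed)
    labels = []
    for obs_row, exp_row in zip(observed, expected):
        row = []
        for o, e in zip(obs_row, exp_row):
            if o > e:
                row.append("positive")
            elif o < e:
                row.append("negative")
            else:
                row.append("independent")
        labels.append(row)
    return labels


print(expected_counts(observed))
print(chi_squared(observed))
print(directions(observed))
```

For a table whose observed counts equal the expected ones in every cell, `chi_squared` returns 0 and every cell is labelled independent.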

 

Q4: Chi Squared Analysis

Please calculate the following chi squared values for the table correlating burger and sausages below (Expected values in brackets).

  • Burgers & Sausages
  • Burgers & Not Sausages
  • Sausages & Not Burgers
  • Not Burgers and Not Sausages

For the above options, please also indicate if each of your answers would suggest independence, positive correlation or negative correlation.

               Chips        ^Chips      Total Row
Burgers        800 (800)    200 (200)        1000
^Burgers       400 (400)    100 (100)         500
Total Column   1200         300              1500

 

Χ² = (800 − 800)² / 800 + (200 − 200)² / 200 + (400 − 400)² / 400 + (100 − 100)² / 100

= 0² / 800 + 0² / 200 + 0² / 400 + 0² / 100 = 0

Burgers & Chips are independent because Χ² = 0.

Burgers & Chips – observed & expected values are the same (800) – independent

Burgers & ^Chips – observed & expected values are the same (200) – independent

^Burgers & Chips – observed & expected values are the same (400) – independent

^Burgers & ^Chips – observed & expected values are the same (100) – independent

 

Q5:

Under what conditions would Lift and Chi Squared analysis prove to be a poor algorithm to evaluate correlation/dependency between two events?

Lift and Chi Squared analysis are poor choices when there are too many null transactions, that is, transactions that contain neither of the two items being compared. Both measures depend on the total number of transactions, so a large number of null transactions can distort the result.

Please suggest another algorithm that could be used to rectify the flaw in Lift and Chi Squared?

There are other measures we can use that are not affected by null transactions: Kulczynski, AllConf, Jaccard, Cosine and MaxConf.
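
To illustrate the Q5 answer, here is a minimal Python sketch (the function name `measures` and the inflated null count are my own, chosen for illustration). It uses the commonly cited definitions of these measures, which depend only on the counts of transactions containing A, B, or both. The same Burgers/Chips counts from Q1 are evaluated twice: once with the original 200 null transactions and once with a hypothetical 200,000.

```python
from math import sqrt


def measures(both, a_only, b_only, null):
    """Compute lift and five null-invariant measures from 2x2 counts."""
    sup_a = both + a_only          # transactions containing A
    sup_b = both + b_only          # transactions containing B
    total = both + a_only + b_only + null
    p_b_given_a = both / sup_a
    p_a_given_b = both / sup_b
    return {
        # Lift uses the grand total, so null transactions change it.
        "lift": (both / total) / ((sup_a / total) * (sup_b / total)),
        # The measures below never use `null` or `total`.
        "all_conf": min(p_b_given_a, p_a_given_b),
        "max_conf": max(p_b_given_a, p_a_given_b),
        "kulczynski": 0.5 * (p_b_given_a + p_a_given_b),
        "cosine": both / sqrt(sup_a * sup_b),
        "jaccard": both / (both + a_only + b_only),
    }


# Q1 counts: Burgers & Chips = 600, Burgers only = 400, Chips only = 200.
few_nulls = measures(600, 400, 200, null=200)
many_nulls = measures(600, 400, 200, null=200_000)  # hypothetical inflation

for name in few_nulls:
    print(f"{name:>10}: {few_nulls[name]:.3f} vs {many_nulls[name]:.3f}")
```

Only the lift value changes between the two runs; the other five measures are identical, which is why they are preferred when null transactions dominate the data.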

R language

Is R just one of the many programming languages available, or is it much more? It looks like it’s becoming a worldwide phenomenon! We can definitely say that R has revolutionised analytics and predictive modelling…

R has become one of the most important tools for statistics, data science and their visualization. A worldwide community of statisticians and data scientists uses R to solve challenging problems. Complex data can be analysed and visualized with the charts and graphs that are part of R.

As part of this exercise we had to create a use case, analyse the data and use R graphics to visualize it.

As a starting point we used the R course from Code School – http://tryr.codeschool.com/.

The free Code School online course was a great introduction to R, but once I started to work on my use case I realized how big R is!

Numerous websites and tutorials are available, and still it didn’t prove as easy as it looked at first, but I suppose nothing worth a challenge is easy…

I flew through the Code School tutorial, which covered the points below:

1. “R Syntax: A gentle introduction to R expressions, variables, and functions. In this first chapter, we’ll cover basic R expressions. We’ll start simple, with numbers, strings, and true/false values. Then we’ll show you how to store those values in variables, and how to pass them to functions. We’ll show you how to get help on functions when you’re stuck. Finally we’ll load an R script in from a file.

2. Vectors: Grouping values into vectors, then doing arithmetic and graphs with them. The name may sound intimidating, but a vector is simply a list of values. R relies on vectors for many of its operations. This includes basic plots – we’ll have you drawing graphs by the end of this chapter (and it’s a lot easier than you might think)!

3. Matrices: Creating and graphing two-dimensional data sets. So far we’ve only worked with vectors, which are simple lists of values. What if you need data in rows and columns? Matrices are here to help. A matrix is just a fancy term for a 2-dimensional array. In this chapter, we’ll show you all the basics of working with matrices, from creating them, to accessing them, to plotting them.

4. Summary Statistics: Calculating and plotting some basic statistics: mean, median, and standard deviation. The median is calculated by sorting the values and choosing the middle one (for sets with an even number of values, the middle two values are averaged). Call the median function on the vector: median(limbs)

5. Factors: Creating and plotting categorized data

6. Data Frames: Organizing values into data frames, loading frames from files and merging them

7. Working With Real-World Data: Testing for correlation between data sets, linear models and installing additional packages”*

You can earn a badge like the one below if you complete the tutorial.

Code school badge

And then the fun part started – creating my own use case and visualizing it.

After long research, an idea was born: to analyse the dependency between chocolate consumption and unemployment. First I had to find data tables, create a CSV file and load it into R.

Country        Kg per Capita   Unemployment %
Switzerland    9               4.3
Germany        7.9             5.3
Austria        7.8             4.9
Ireland        7.5             13.4
USA            7.5             7.3
Norway         6.6             3.5
Estonia        6               9.2
Slovakia       5.4             14.6
Sweden         5.4             8.2
Finland        5.3             8.6
Kazakhstan     5.3             5.5
Russia         5.3             5.8
Belgium        5.2             8.7
Australia      4.9             5.7
Netherlands    4.7             7.2
New Zealand    4.5             6.1
UK             4.3             7.2
Denmark        4.2             7
France         4.2             10.9
Lithuania      4.2             11.5

Once the data was available in R, I was able to plot the two variables and create the graph below.

Chocolate consumption vs Unemployment
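
The plot for this post was made in R. As a complementary check on the same table, the following minimal Python sketch (not the code used for the post; the names `data` and `pearson` are my own) computes the Pearson correlation coefficient between the two columns, which puts a number on the relationship the scatter plot shows.

```python
from math import sqrt

# (country, chocolate kg per capita, unemployment %) from the table above.
data = [
    ("Switzerland", 9.0, 4.3), ("Germany", 7.9, 5.3), ("Austria", 7.8, 4.9),
    ("Ireland", 7.5, 13.4), ("USA", 7.5, 7.3), ("Norway", 6.6, 3.5),
    ("Estonia", 6.0, 9.2), ("Slovakia", 5.4, 14.6), ("Sweden", 5.4, 8.2),
    ("Finland", 5.3, 8.6), ("Kazakhstan", 5.3, 5.5), ("Russia", 5.3, 5.8),
    ("Belgium", 5.2, 8.7), ("Australia", 4.9, 5.7), ("Netherlands", 4.7, 7.2),
    ("New Zealand", 4.5, 6.1), ("UK", 4.3, 7.2), ("Denmark", 4.2, 7.0),
    ("France", 4.2, 10.9), ("Lithuania", 4.2, 11.5),
]


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


chocolate = [kg for _, kg, _ in data]
unemployment = [pct for _, _, pct in data]
r = pearson(chocolate, unemployment)
print(f"Pearson r = {r:.2f}")
```

On this data the coefficient comes out weakly negative, so with only 20 countries it should be read as a loose tendency rather than evidence of a dependency.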

After a few hours spent discovering R’s possibilities, I can confirm with certainty that a new R follower was born…

*from “http://tryr.codeschool.com/”

Fusion tables

Even though Google Fusion tables is still labelled as an experimental app, we can use its great ability to visualize structured data that would probably pass unnoticed if left in a plain table.
Google is putting great effort into research and development in data management and is trying to innovate in the ways data is stored, visualized and shared, hence Fusion tables were created.
As part of our assessment we had to create a Fusion table showing an Irish population heat map based on the 2011 census data. For this particular exercise we had to merge two different sets of data: Table 1 had details about population and Table 2 had geographical data. First we had to upload both files into the Fusion tables app in the required formats (CSV & KML), and once the files were available we were able to merge the two. During the merge we had to confirm “The source of match” in both files and which columns we wanted to be available in our output file. Once our merged table was created we could view the “Map of geometry”.
To create a heat map of the Irish population by county, based on population density, we had to break the range of population numbers into five buckets and apply a colour scheme with a different shade for each bucket. The map legend had to be added as well.
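
The bucketing step can be sketched outside Fusion tables too. The minimal Python example below assumes equal-width buckets, which is only one possible way to split the range, and the county values in it are made-up placeholders, not census figures; `bucket_edges` and `bucket_index` are hypothetical helper names.

```python
def bucket_edges(lo, hi, n=5):
    """Return the n + 1 boundaries of n equal-width buckets over [lo, hi]."""
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


def bucket_index(value, lo, hi, n=5):
    """Return the 0-based bucket a value falls into (the maximum goes in the last)."""
    if hi == lo:
        return 0
    index = int((value - lo) / (hi - lo) * n)
    return min(max(index, 0), n - 1)


# Placeholder values for illustration only; NOT real 2011 census data.
density = {"County A": 10, "County B": 35, "County C": 50, "County D": 99}

# One shade per bucket, lightest to darkest (arbitrary example colours).
shades = ["#fee5d9", "#fcae91", "#fb6a4a", "#de2d26", "#a50f15"]

lo, hi = 0, 100
print(bucket_edges(lo, hi))
for county, value in density.items():
    print(county, shades[bucket_index(value, lo, hi)])
```

Each county ends up with one of five shades, which is the same idea as assigning a fill colour per bucket in the map styling dialog.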
If you wish to share the map you have to change the privacy setting to “Public”. There are multiple options for sharing the map, such as email, Facebook or a blog. As part of this exercise we had to embed the map in a blog post. The embed code is available after clicking on the “Publish” command.

Please find the image of the Irish population heat map for 2011 below.


Looking at the Irish population heat map for 2011, I can see a random distribution of population across the counties, and that could help different Government Departments when planning how and where to invest in infrastructure, education, hospitals etc.
Heat maps in general can help to visualize the content of tabular data in a much more user-friendly way. Users are turning to visualization tools more and more in order to draw attention where it is needed. There are several types of heat maps used in different disciplines (a web heat map shows the areas of a web page most frequently visited by users; biology heat maps are used mostly in molecular biology; geographical heat maps can visualize global food production, etc.). There are also numerous heat map software implementations, and one of them will be covered in the next post – R Statistics.
We could agree that a picture is worth a thousand words, and Fusion tables is a great visualization tool that helps us on the way to achieving the goals we set.
I hope you’ll enjoy the simplicity of Google Fusion tables and that it will enhance and illuminate your future work.