Wednesday, April 1, 2015

Moore's Law, Cloud Computing and DW/BI


Moore's law, proposed by Gordon Moore, states that the number of transistors on an integrated circuit doubles roughly every two years (Moore's original 1965 estimate was a doubling every year, which he revised in 1975). In simple terms, it means that the processing and storage capability of computational machines grows exponentially. Remarkably, this law has held broadly true for 50 years.
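To get a feel for what steady doubling implies, here is a minimal sketch. It assumes the commonly quoted two-year doubling period; the function name and its default are my own choices for illustration.

```python
def growth_factor(years, doubling_period_years=2.0):
    """Return how many times larger a quantity is after `years` of steady doubling."""
    return 2 ** (years / doubling_period_years)

# 50 years at one doubling every two years is 25 doublings.
print(growth_factor(50))  # 33554432.0
```

Passing a different doubling period shows how sensitive the result is to that assumption.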

                                                 
Cloud computing is one technology that has grown on the back of this increase in capability. It involves deploying groups of remote servers and software networks that allow centralized data storage and online access to computer services or resources. As Steve Jobs said, "I don’t need a hard disk in my computer if I can get to the server faster… carrying around these non-connected computers is byzantine by comparison." And I agree with this great man. Cloud computing certainly has a huge impact on data-driven decision making, or Business Intelligence. At a very high level, cloud BI benefits IT through faster deployment of solutions, more flexible capacity, faster and less expensive exploration of new technologies, as well as reduced capital, licensing, and maintenance costs. Let us look at a few examples of how true this is:

Aptaria (a Salesforce and cloud integration company) helps Nutricia

Nutricia, a nutrition products company, leveraged Salesforce across its enterprise. Salesforce typically collects a lot of data, but the sales representatives at Nutricia were not making any use of it. Aptaria recommended GoodData to Nutricia as a reporting and business intelligence (BI) solution. With the help of GoodData, Nutricia became a data-driven company, and its sales representatives were able to:
  • Receive real-time reports on a daily basis
  • Have reports that are "fun and easy to use" – no intimidation factor
  • Tweak reports easily against their business plans when their numbers change
  • Create new reports to answer "what and how" questions
Condé Nast uses Microsoft's Power BI

Condé Nast, a global media company, wanted better insight into consumer behavior to improve the performance of its 20 industry-leading print and digital media brands. For better efficiency and business intelligence (BI), the company implemented a solution based on Power BI for Office 365.

Through highly visual, interactive features such as graphs and pie charts, the reports provide immediate insight into factors such as consumer behavior, market share, competitors, and web traffic across PC browsers, tablets, and mobile devices. Editorial teams and marketers can quickly drill down into more detail. 

Since all the data is in the cloud, performing analytics and generating reports is now possible for everyone at Condé Nast. They have seen exceptional results in terms of improved efficiency of the marketing and analytics teams.

Conclusion

Data will go on increasing, and so will processing and storage capabilities. Business Intelligence will prove quite important in making sense of this large amount of data. But organizations shouldn't become so data-driven that human intelligence loses its significance in decision making. Not being philosophical here, but just saying!

References
http://www.aptaria.com/industries/healthcare/nutricia-bi.html
https://customers.microsoft.com/Pages/CustomerStory.aspx?recid=12466
Wikipedia
http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/extending-enterprise-business-intelligence-and-big-data-to-the-cloud-paper.pdf


Wednesday, March 4, 2015

Presentation And Visualization Methods

Representing data visually is very important: it gives deeper insight and saves time. To quote the British data journalist David McCandless:

"By visualizing information, we can turn it into a landscape that you can explore with your eyes, a sort of information map. And when you're lost in information, an information map is kind of useful"
This quote shows, in a simple manner, why data visualization is useful. In this post I will demonstrate how data can be represented in three different business scenarios.

Order Management
An order placed on an e-commerce website has dimensions such as seller and product, plus multiple dates for order approval, order processing and order shipment. These dates are role-playing dimensions. The facts involved are quantity and price.
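A role-playing dimension is a single dimension table that a fact table references several times, once per role. Here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample rows are all made up for illustration, and the seller and product dimensions are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
    CREATE TABLE fact_order (
        order_id INTEGER PRIMARY KEY,
        approval_date_key INTEGER REFERENCES dim_date(date_key),
        processing_date_key INTEGER REFERENCES dim_date(date_key),
        shipment_date_key INTEGER REFERENCES dim_date(date_key),
        quantity INTEGER,
        price REAL
    );
    INSERT INTO dim_date VALUES (1, '2015-03-01'), (2, '2015-03-02'), (3, '2015-03-04');
    INSERT INTO fact_order VALUES (100, 1, 2, 3, 2, 19.99);
""")

# The same dim_date table is joined three times, once for each role it plays.
row = conn.execute("""
    SELECT f.order_id, a.full_date, p.full_date, s.full_date
    FROM fact_order AS f
    JOIN dim_date AS a ON a.date_key = f.approval_date_key
    JOIN dim_date AS p ON p.date_key = f.processing_date_key
    JOIN dim_date AS s ON s.date_key = f.shipment_date_key
""").fetchone()
print(row)  # (100, '2015-03-01', '2015-03-02', '2015-03-04')
```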

Optimal Way of representation
All the dimensions except the dates can be comprehensively displayed on a standard invoice. Giving the customer a holistic view of the shipment process, however, can be challenging. A customer may want to know where their order is at any given time. The best way to display this information is a timeline, which can be divided into stages such as order processing and delivery.

The image below gives a visual representation of this. The emphasis is on the timeline, which shows where the product is at any given time.
Figure- Order Tracking system by Flipkart.com

Healthcare
Data management in the healthcare industry is complex because of the variety within the industry, and this also affects how information can be represented visually. Consider medical tests. A testing center conducts many tests, each measuring several parameters, which results in a wide variety of data. Every test needs different facts and measurements. There are, of course, common dimensions such as doctor, patient and date.

Optimal Way of representation
The tests involve so many measured parameters that the report can overwhelm and confuse the patient. The best way to represent medical test information is to show a comparison, so that the patient understands the report easily. This could mean comparing each measurement with the normal range or displaying the facts on a scale from low to high.
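The comparison idea can be sketched in a few lines. The parameter names and reference ranges below are hypothetical placeholders, not clinical values; a real report would take its ranges from the testing laboratory.

```python
def classify(value, low, high):
    """Place a measurement on a low / normal / high scale given a reference range."""
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"

# Hypothetical parameters and ranges, for illustration only (not clinical values).
reference_ranges = {"marker_a": (10.0, 20.0), "marker_b": (0.5, 1.5)}
results = {"marker_a": 24.0, "marker_b": 1.0}

report = {name: classify(value, *reference_ranges[name]) for name, value in results.items()}
print(report)  # {'marker_a': 'high', 'marker_b': 'normal'}
```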

The following sample medical test result illustrates this approach.

Figure- Bloodwork Cardiology Result Template from Wired Magazine
Ecommerce
The ecommerce industry is a multi-billion-dollar industry. If you have an ecommerce website, knowing your customers is very important. You need to analyze click-stream data to make important decisions about sales and advertising strategies.


Optimal Way of representation
There are multiple dimensions involved, such as product, source of visits and date. To get an overall view of the website's performance, the owner should have a comprehensive dashboard that shows visits over time, the conversion rate (the ratio of orders to visits), the contribution of each source that directed users to the website, and the performance of different products over time.
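Two of these dashboard figures are simple ratios. A minimal sketch follows; the function names and all the numbers are made up for illustration.

```python
def conversion_rate(orders, visits):
    """Ratio of orders to visits; 0.0 when there were no visits."""
    return orders / visits if visits else 0.0

def source_shares(visits_by_source):
    """Fraction of all visits contributed by each traffic source."""
    total = sum(visits_by_source.values())
    if not total:
        return {}
    return {source: count / total for source, count in visits_by_source.items()}

# Made-up numbers for illustration.
visits_by_source = {"search": 600, "email": 300, "social": 100}
print(conversion_rate(orders=50, visits=1000))  # 0.05
print(source_shares(visits_by_source))  # {'search': 0.6, 'email': 0.3, 'social': 0.1}
```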

Ecwid, with the help of Google Analytics, is a very powerful tool for presenting such facts in an attractive manner. It gives a good bird's-eye view of a website's performance in aspects such as sales and visits, consolidating several dimensions into a single visual representation.


Figure- An ecommerce site performance report by Ecwid


References
1. Figure 1- https://www.flipkart.com/order_details?order_id=OD40105003095&token=43b20e7069204bc6bcca9ba31e0eeb6b&utm_content=click&cmpid=email_oms_delivery_confirmation_email

2. Figure 2- http://www.wired.com/wp-content/uploads/2014/09/ff_bloodwork5_f.jpg

3. Figure 3- http://help.ecwid.com/customer/portal/articles/1167247-how-can-i-get-sales-reports-based-on-my-orders-in-ecwid

Thursday, February 19, 2015

Big Unstructured Data v/s Structured Relational Data




Structured vs Unstructured data
Nowadays there is a lot of hype around unstructured data. So, what is unstructured data? How is it different from structured data?
Unlike structured data, which is organized in a manageable way, unstructured data is raw and unorganized. Structured data can be seamlessly integrated into a database or a well-structured format like XML. In contrast, unstructured data is cumbersome to handle. Examples include emails, documents and social media posts. Unstructured data needs a lot of processing and transformation before it is usable for analysis.
Duncan Pauly, founder and chief technology officer of Coppereye, adds an eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:
  1. The structure of the data itself.
  2. The structure of the container that hosts the data.
  3. The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
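Pauly's closing example can be made concrete with a small sketch: free-form text (unstructured data) stored in a relational table (a structured container) and retrieved with a keyword search (an unstructured access method). It uses Python's built-in sqlite3 module; the table name and the sample messages are my own inventions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Structured container: a relational table with a fixed schema.
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
# Unstructured data: free-form text stored in a single column.
conn.executemany(
    "INSERT INTO messages (body) VALUES (?)",
    [("Invoice attached, please pay by Friday",),
     ("Lunch on Thursday?",),
     ("Reminder: the invoice is overdue",)],
)
# Unstructured access: a keyword search rather than a query on typed fields.
hits = conn.execute(
    "SELECT id FROM messages WHERE body LIKE ? ORDER BY id", ("%invoice%",)
).fetchall()
print(hits)  # [(1,), (3,)]
```

SQLite's LIKE is case-insensitive for ASCII text by default, which is why both "Invoice" and "invoice" match.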
Data in Organizations
Currently, enterprise data comes in many different formats. Data, both structured and unstructured, is flowing in continuously. Broadly, organizations have the following three types of data:

  1. Transactional Data- Transactional data are the elements that support the ongoing operations of an organization and are included in application systems that automate business processes. This can include areas such as sales, order management, manufacturing and purchasing. It refers to the data that is created and updated within operational systems.
  2. Analytical Data- Analytical data are the numerical values, measurements and metrics that provide business intelligence and support organizational decision making. Typically, analytical data is stored in OLAP repositories optimized for decision support. It is characterized as facts and numerical values in a dimensional model.
  3. Master Data- Master data is usually considered to play a key role in the core operation of a business. It refers to key organizational entities that are used by several functional groups and are typically stored in different data systems across an organization.
Fig- Data types in organizations.
Along with the above data types, there is also machine data, social media data and document data. The following figure illustrates the rise of structured and unstructured data in organizations.
Fig- Growth of structured and unstructured data in organizations

Where does the data warehouse fit in analyzing this data?
The data warehouse has primarily been used to store historical data, so structured enterprise data such as transactional data can be easily analyzed using traditional data warehousing techniques. Unstructured data, however, poses a challenge to these techniques, since there is no conventional method for performing ETL on it.

Limitations of Data warehouse
Following are the limitations of a typical data warehouse:
  1. Can't handle extreme integration of conventional and unconventional data sources.
  2. Doesn't support massively parallel relational databases.
  3. Can't handle the petabytes of data that organizations currently have.


Future of Data warehouse
A white paper by the Kimball Group suggests that to support big data, data warehouses must be magnetic, agile and deep.
Magnetic- A magnetic environment places the fewest impediments on the incorporation of new, unexpected and potentially dirty data sources. Specifically, this supports the need to defer declaration of data structures until after the data is loaded.
Agile- An agile environment eschews long-range careful design and planning!
Deep- A deep environment allows running sophisticated analytic algorithms on massive data sets without sampling or even cleaning.
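The "magnetic" idea of deferring the declaration of data structures until after the data is loaded can be sketched as follows. This is only my own illustration of the principle, with made-up records; it is not taken from the Kimball white paper.

```python
import json

# Raw records are kept exactly as they arrive; no schema is declared up front.
raw_lines = [
    '{"user": "a1", "clicks": 3}',
    '{"user": "b2", "device": "mobile"}',
    '{"user": "c3", "clicks": 7, "referrer": "email"}',
]
loaded = [json.loads(line) for line in raw_lines]

# Only after loading do we discover which fields exist.
discovered_fields = sorted({key for record in loaded for key in record})
print(discovered_fields)  # ['clicks', 'device', 'referrer', 'user']

# A structure is applied at analysis time, tolerating missing values.
total_clicks = sum(record.get("clicks", 0) for record in loaded)
print(total_clicks)  # 10
```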

Data warehouses will definitely develop to support big data. That said, sometimes when an exciting new technology arrives, there is a tendency to close the door on older technologies as if they were going to go away. Data warehousing has built an enormous legacy of experience, best practices, supporting structures, technical expertise, and credibility with the business world. This will be the foundation for information management in the upcoming decade as data warehousing expands to include big data analytics.


References
Oracle Blog- https://blogs.oracle.com/fusionecm/entry/structured_and_unstructured_da
BI Insider-http://bi-insider.com/posts/types-of-enterprise-data-transactional-analytical-master/
The Digital Universe 2010
http://www.kimballgroup.com/wp-content/uploads/2011/04/Evolving-Role-of-EDW-in-the-Era-of-Big-Data-Analytics1.pdf


Tuesday, February 3, 2015

Analysis of Business Intelligence tools

MIS 587- Blog 1

In this post, I will start by introducing the Business Intelligence tools chosen for this comparative study.
1. OBIEE
OBIEE comes from a very well-established and widely trusted vendor, Oracle. The latest version, 11g, is built on a robust technological foundation that supports the highest workloads and most complex deployments.
Strengths
·       Advanced enterprise capabilities that can be leveraged to create & manage tons of dashboards.
·       Ad-hoc reports can be created through the web interface.
·       Strong enterprise level security and user management.
Drawbacks
·       A lot of effort is required to create even basic visualizations.
·       Deployment is difficult.

Figure 1. OBIEE Screen capture
2. MicroStrategy

MicroStrategy is a pretty popular BI & Analytics tool that allows even non-technical users to create powerful and insightful dashboards. It is available on a variety of platforms like web and mobile devices.
Strengths
·       Highly and easily customizable dashboards.
·       Ability to download big data during off-performance hours.
·       A very user-friendly interface.
Drawbacks
·       A steep learning curve.
·       It operates within rigid data structures, so a considerable amount of time has to be spent on ETL.
·       Lack of predictive analytical tools.

Figure 2. Dashboard created in MicroStrategy
3. IBM Cognos
Cognos is a web-based business intelligence platform. It comprises around 30 products. One thing that is unique to Cognos is that it can serve both large corporate giants and small to midsize companies on the same system.
Strengths
·       Attention to customer feedback in the design of its products makes Cognos a highly desirable BI tool.
·       Cognos 10 is able to meld together information from multiple platforms, allowing different systems to run together seamlessly.
·       A good amount of internal and external security.
Drawbacks
·       New users find it difficult to decipher error messages.
·       Data reports take time to compile.
·       Support services are not that good.

Figure 3. Dashboard created using IBM Cognos
4. Qlikview
The distinguishing feature of Qlikview is its inference engine, which maintains data associations automatically. This makes it less query-dependent. It comprises three main components: Qlikview Desktop, Qlikview Publisher and Qlikview Server.
Strengths
·       A simplified Google-like search feature which makes it very easy for the user to gain quick insights into data.
·       Performance is very high.
·       A huge user community providing quick answers.
Drawbacks
·       Self-servicing for end users is quite difficult.
·       Debugging is difficult.
Figure 4. Dashboard created using Qlikview
5. Birst
Birst provides an integrated Business Intelligence platform designed for both cloud delivery and on-premise deployment as a software application. It comes in two editions: the Discovery edition and the Enterprise edition.
Strengths
·       Provides data warehousing and ETL capabilities.
·       No need to install anything on the desktop.
Drawbacks
·       Limited reporting capability.
·       Support services are poor.


Figure 5. Dashboard created using Birst
Weighted Analysis

Criteria               Weight  OBIEE  MicroStrategy  IBM Cognos  Qlikview  Birst
Ease of use            30%     8      6              6           10        7
Integration            20%     10     8              7           5         8
Analysis Capabilities  20%     9      8              7           6         7
Cost                   15%     4      6              7           9         6
Reporting              15%     7      9              7           9         6
Weighted Score         100%    7.85   7.25           6.7         7.9       6.9
Rank                           2      3              5           1         4
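For anyone who wants to rerun the numbers with different weights, here is a minimal sketch of the calculation. The weights and scores are copied from the table above; the variable and function names are my own. Weights are kept as whole percentages so that the arithmetic stays exact until the final division.

```python
# Weights (in percent) and scores copied from the table above.
weights = {"Ease of use": 30, "Integration": 20, "Analysis Capabilities": 20,
           "Cost": 15, "Reporting": 15}
criteria = list(weights)
scores = {
    "OBIEE":         dict(zip(criteria, [8, 10, 9, 4, 7])),
    "MicroStrategy": dict(zip(criteria, [6, 8, 8, 6, 9])),
    "IBM Cognos":    dict(zip(criteria, [6, 7, 7, 7, 7])),
    "Qlikview":      dict(zip(criteria, [10, 5, 6, 9, 9])),
    "Birst":         dict(zip(criteria, [7, 8, 7, 6, 6])),
}

def weighted_score(tool_scores):
    """Sum of weight x score over all criteria, with weights in percent."""
    return sum(weights[c] * tool_scores[c] for c in criteria) / 100

totals = {tool: weighted_score(s) for tool, s in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
print(totals)   # {'OBIEE': 7.85, 'MicroStrategy': 7.25, 'IBM Cognos': 6.7, 'Qlikview': 7.9, 'Birst': 6.9}
print(ranking)  # ['Qlikview', 'OBIEE', 'MicroStrategy', 'Birst', 'IBM Cognos']
```

Changing a weight, for example giving Cost more importance, only requires editing the `weights` dictionary, as long as the percentages still add up to 100.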

The five criteria on which these tools have been analyzed are Ease of use, Integration, Analysis capabilities, Reporting and Cost.
Below, I explain each criterion and give the reasons, according to my analysis, for the scores allotted to each tool in the table above.
Ease of use
This is the most important metric for any tool. Since Business Intelligence has been finding its application in every industry, a variety of users are using BI tools and several new users are being introduced to BI tools.
Weight-30%
Qlikview scores the maximum points here because of its simplified search and filtering feature. The automatic data associations provided by Qlikview reduce the work a user has to do. Oracle BI also comes close because of its usability. IBM Cognos falls short on this metric because of its confusing UI and ambiguous errors.

Integration
A BI tool cannot exist in isolation; its data has to come from different sources. This criterion evaluates how many databases and data sources the tool integrates with, and how easily. Integration is also concerned with workflow and with how well the tool collaborates with other applications.
Weight-20%
OBIEE tops this rating because of its wide range of products that integrate and interface with a variety of databases and applications such as Excel. It also integrates seamlessly with big data. Qlikview falls short here, as it has no integration with big data, which is quite an important requirement these days. Birst does pretty well because it integrates ETL capabilities. Cognos and MicroStrategy also do well in this aspect.

Analysis Capabilities
This criterion covers the analytical features of a BI tool, such as OLAP, predictive modelling, data mining and trend indication. These are pretty important for gaining insight into data.
Weight-20%
OBIEE gains the maximum rating because it comprehensively provides all the important capabilities, such as ad hoc analysis, OLAP, predictive analysis and profit analysis. MicroStrategy also provides the same features but lacks a little in advanced analytics. Qlikview scores the minimum because of its lack of OLAP.

Reporting
Reporting is one of the main features that a BI tool provides to an end user. It comprises ad-hoc reporting, automatic scheduled reporting and dashboard capabilities.
Weight-15%
Qlikview is one of the best reporting tools, along with MicroStrategy, because of its comprehensive coverage of all the reporting capabilities mentioned above and the many customization options it gives the user. Birst falls short in this aspect. OBIEE also has stellar reporting capabilities.

Cost
This is the cost-effectiveness of the tool. Since many small to mid-sized organizations are using BI, it is important for tools to be reasonably priced for the value they provide. The cost of a BI tool varies with the software package and the number of users.
Weight-15%
Considering ownership costs for individual users, Qlikview is the cheapest, followed by Cognos. OBIEE is the costliest. MicroStrategy and Birst fall between Qlikview and OBIEE in their pricing.

Conclusion



According to my weighted analysis, Qlikview is the best. OBIEE comes close. Cost makes the difference between the two. IBM Cognos gets the minimum score. Of course, this analysis is limited to only five criteria, and the selection of a good BI tool should be based on how well it fits your requirements.