Thursday, February 19, 2015

Big Unstructured Data v/s Structured Relational Data





Structured vs Unstructured data
Nowadays there is a lot of hype around unstructured data. So, what is unstructured data? How is it different from structured data?
Well, unlike structured data, which is organized in a manageable way, unstructured data is raw and unorganized. Structured data can be seamlessly integrated into a database or a well-structured format such as XML. In contrast, unstructured data is cumbersome to handle. Examples of unstructured data include emails, documents and social media posts. Unstructured data needs a lot of processing and transformation before it is usable for analysis.
Duncan Pauly, founder and chief technology officer of Coppereye, adds eloquent insight to the conversation:
"The labels "structured data" and "unstructured data" are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:
  1.  The structure of the data itself.
  2.  The structure of the container that hosts the data.
  3.  The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
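To make the last point of the quote concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table name, column names and sample rows are hypothetical and only for illustration. Free-text bodies (unstructured data) are stored in a relational table (a structured container) and retrieved with a simple substring search (an unstructured access method).

```python
import sqlite3

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, source TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO documents (source, body) VALUES (?, ?)",
    [
        ("email", "Hi team, the quarterly sales report is attached."),
        ("social", "Loving the new dashboard release!"),
        ("email", "Reminder: the sales review meeting moved to Friday."),
    ],
)


def search(term):
    """Return ids of documents whose free text contains the term."""
    rows = conn.execute(
        "SELECT id FROM documents WHERE body LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]


matches = search("sales")
print(matches)
```

The container knows nothing about the internal structure of each body; only the search term decides what comes back.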
Data in Organizations
Currently, enterprise data comes in many different formats. Data, both structured and unstructured, is flowing in continuously. Broadly, organizations have the following three types of data:

  1. Transactional Data- Transactional data is the data that is created and updated within operational systems. It supports the ongoing operations of an organization and lives in the application systems that automate business processes, in areas such as sales, order management, manufacturing and purchasing.
  2. Analytical Data- Analytical data consists of numerical values, measurements and metrics that provide business intelligence and support organizational decision making. It is typically stored in OLAP repositories optimized for decision support, as facts and numerical values in a dimensional model.
  3. Master Data- Master data plays a key role in the core operation of a business. It refers to key organizational entities that are used by several functional groups and are typically stored in different data systems across an organization.
Fig- Data types in organizations.
Along with the above data types, there is also machine data, social media data and document data. The following figure illustrates the rise of structured and unstructured data in organizations.
Fig- Growth of structured and unstructured data in organizations

Where does the data warehouse fit in analyzing this data?
Data warehouses have primarily been used to store historical data, so structured enterprise data such as transactional data can be analyzed easily using traditional data warehousing techniques. Unstructured data, however, poses a challenge to these techniques, since there is no conventional method to perform ETL on this type of data.

Limitations of Data warehouse
Following are the limitations of a typical data warehouse:
  1. Can't handle extreme integration of conventional and unconventional data sources.
  2. Doesn't support massively parallel relational databases.
  3. Can't handle the petabytes of data that organizations currently have.


Future of Data warehouse
A white paper by the Kimball Group suggests that to support big data, data warehouses must be magnetic, agile and deep.
Magnetic- A magnetic environment places the fewest impediments on the incorporation of new, unexpected and potentially dirty data sources. Specifically, this supports the need to defer declaration of data structures until after the data is loaded.
Agile- An agile environment eschews long-range careful design and planning!
Deep- A deep environment allows running sophisticated analytic algorithms on massive data sets without sampling or even cleaning.
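The "magnetic" idea of deferring the declaration of data structures until after loading can be sketched in a few lines of Python. This is only an illustration under assumptions: the sample records, the field names (user, amount) and the read_amounts helper are hypothetical and do not come from the white paper. Raw lines are loaded untouched, including dirty ones, and a structure is applied only when the data is read.

```python
import json

# Hypothetical raw feed: records arrive with inconsistent fields.
raw_lines = [
    '{"user": "a1", "amount": "19.99", "channel": "web"}',
    '{"user": "b2", "amount": 5}',
    "not valid json",
]

# Load first: keep every line as-is, without declaring a structure.
landing_zone = list(raw_lines)


def read_amounts(lines):
    """Apply a structure at read time; skip rows that do not fit it."""
    amounts = {}
    for line in lines:
        try:
            record = json.loads(line)
            amounts[record["user"]] = float(record["amount"])
        except (ValueError, KeyError, TypeError):
            # Dirty rows stay in the landing zone but are ignored here.
            continue
    return amounts


amounts = read_amounts(landing_zone)
print(amounts)
```

Nothing is rejected at load time; a different reader could later apply a different structure to the same landing zone.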

Data warehouses will definitely develop to support big data. That said, when an exciting new technology arrives, there is sometimes a tendency to close the door on older technologies as if they were going to go away. Data warehousing has built an enormous legacy of experience, best practices, supporting structures, technical expertise, and credibility with the business world. This will be the foundation for information management in the upcoming decade as data warehousing expands to include big data analytics.


References
Oracle Blog- https://blogs.oracle.com/fusionecm/entry/structured_and_unstructured_da
BI Insider- http://bi-insider.com/posts/types-of-enterprise-data-transactional-analytical-master/
The Digital Universe 2010
Kimball Group white paper- http://www.kimballgroup.com/wp-content/uploads/2011/04/Evolving-Role-of-EDW-in-the-Era-of-Big-Data-Analytics1.pdf

dw-bi

Tuesday, February 3, 2015

Analysis of Business Intelligence tools

MIS 587- Blog 1

In this post, I will start by introducing the Business Intelligence tools chosen for this comparative study.
1. OBIEE
OBIEE comes from a well-established and widely trusted vendor, Oracle. The latest version, 11g, is built on a robust technological foundation that supports the highest workloads and most complex deployments.
Strengths
·       Advanced enterprise capabilities that can be leveraged to create & manage tons of dashboards.
·       Ad-hoc reports can be created through the web interface.
·       Strong enterprise level security and user management.
Drawbacks
·       A lot of effort required to create even basic visualizations.
·       Deployment is difficult.

Figure 1. OBIEE Screen capture
2. MicroStrategy

MicroStrategy is a pretty popular BI & Analytics tool that allows even non-technical users to create powerful and insightful dashboards. It is available on a variety of platforms like web and mobile devices.
Strengths
·       Highly and easily customizable dashboards.
·       Ability to download big data during off-performance hours.
·       A very user-friendly interface.
Drawbacks
·       A steep learning curve.
·       It operates within rigid data structures, so a considerable amount of time has to be spent on ETL.
·       Lack of predictive analytical tools.

Figure 2. Dashboard created in MicroStrategy
3. IBM Cognos
Cognos is a web-based business intelligence platform. It comprises around 30 products. One thing that is unique to Cognos is that it can serve large corporate giants and small to midsize companies on the same system.
Strengths
·       Attention to customer feedback in the design of its products makes Cognos a highly desirable BI tool.
·       Cognos 10 is able to meld together information from multiple platforms, allowing different systems to run together seamlessly.
·       A good amount of internal and external security.
Drawbacks
·       New users find it difficult to decipher error messages.
·       Data reports take time to compile.
·       Support services are not that good.

Figure 3. Dashboard created using IBM Cognos
4. Qlikview
The distinguishing feature of Qlikview is its inference engine, which maintains data associations automatically. This means it is less query dependent. It comprises three main components: Qlikview Desktop, Qlikview Publisher and Qlikview Server.
Strengths
·       A simplified Google-like search feature which makes it very easy for the user to gain quick insights into data.
·       Performance is very high.
·       A huge user community providing quick answers.
Drawbacks
·       Self-servicing for end users is quite difficult.
·       Debugging is difficult.
Figure 4.  Dashboard created using Qlikview
5. Birst
Birst provides an integrated Business Intelligence platform designed for both cloud delivery and on-premise deployment as a software application. It comes in two editions: the Discovery edition and the Enterprise edition.
Strengths
·       Provides data warehousing and ETL capabilities.
·       No need to install anything on the desktop.
Drawbacks
·       Limited reporting capability.
·       Support services are poor.


Figure 5. Dashboard created using Birst
Weighted Analysis

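| Criteria | Weight | OBIEE | MicroStrategy | IBM Cognos | Qlikview | Birst |
| --- | --- | --- | --- | --- | --- | --- |
| Ease of use | 30% | 8 | 6 | 6 | 10 | 7 |
| Integration | 20% | 10 | 8 | 7 | 5 | 8 |
| Analysis Capabilities | 20% | 9 | 8 | 7 | 6 | 7 |
| Cost | 15% | 4 | 6 | 7 | 9 | 6 |
| Reporting | 15% | 7 | 9 | 7 | 9 | 6 |
| Weighted Score | 100% | 7.85 | 7.25 | 6.7 | 7.9 | 6.9 |
| Rank | | 2 | 3 | 5 | 1 | 4 |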
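The weighted scores and ranks in the table above can be reproduced with a short Python sketch. The weights and scores are copied from the table; the variable and function names are my own, and integer percent weights are used only to keep the arithmetic exact. Each tool's weighted score is the sum of weight times score over the five criteria, divided by 100.

```python
# Weights (in percent) and scores copied from the table above.
weights = {
    "Ease of use": 30,
    "Integration": 20,
    "Analysis Capabilities": 20,
    "Cost": 15,
    "Reporting": 15,
}

# Scores are listed in the same order as the criteria in `weights`.
raw_scores = {
    "OBIEE": [8, 10, 9, 4, 7],
    "MicroStrategy": [6, 8, 8, 6, 9],
    "IBM Cognos": [6, 7, 7, 7, 7],
    "Qlikview": [10, 5, 6, 9, 9],
    "Birst": [7, 8, 7, 6, 6],
}


def weighted_score(tool_scores):
    """Sum of weight * score over all criteria, on a 0-10 scale."""
    total = sum(w * s for w, s in zip(weights.values(), tool_scores))
    return total / 100


totals = {tool: weighted_score(s) for tool, s in raw_scores.items()}

# Rank tools from highest to lowest weighted score.
ranking = sorted(totals, key=totals.get, reverse=True)

for rank, tool in enumerate(ranking, start=1):
    print(rank, tool, totals[tool])
```

Changing a weight or a score in the dictionaries is enough to see how sensitive the ranking is to any single judgment.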

The five criteria on which these tools have been analyzed are Ease of use, Integration, Analysis Capabilities, Reporting and Cost.
Below, I explain each criterion and give the reasons, according to my analysis, for the scores allotted to each tool in the above table.
Ease of use
This is the most important metric for any tool. Since Business Intelligence is finding applications in every industry, a wide variety of users work with BI tools and new users are constantly being introduced to them.
Weight- 30%
Qlikview scores the maximum points on this criterion because of its simplified search and filtering feature. The automatic data associations provided by Qlikview reduce the work a user has to do. Oracle BI also comes close because of its usability. IBM Cognos falls short on this metric because of its confusing UI and ambiguous errors.

Integration
A BI tool cannot exist in isolation, meaning data has to come from different sources. This criterion evaluates how many databases and data sources the tool integrates with, and how easily. Integration is also concerned with workflow and collaboration between the tool and other applications.
Weight- 20%
OBIEE tops this particular rating because of its wide range of products that offer integration and interfacing with a variety of databases and applications such as Excel. It also integrates seamlessly with big data. Qlikview falls short in this aspect as it has no integration with big data, which is quite an important requirement these days. Birst does pretty well because of its integrated ETL capabilities. Cognos and MicroStrategy also do well in this aspect.

Analysis Capabilities
This criterion covers the different analytical features of a BI tool. Examples of such features are OLAP, predictive modelling, data mining and trend indications. These are pretty important for gaining useful insights into data.
Weight- 20%
OBIEE gains the maximum rating because it provides all the important capabilities such as ad hoc analysis, OLAP, predictive analysis and profit analysis. MicroStrategy also provides the same features but lacks a little in advanced analytics. Qlikview scores the minimum because of its lack of OLAP.

Reporting
Reporting is one of the main features that a BI tool provides to an end user. This feature comprises ad-hoc reporting, automatic scheduled reporting and dashboard capabilities.
Weight- 15%
Qlikview is one of the best reporting tools, along with MicroStrategy, because it covers all the reporting capabilities mentioned above and gives the user a lot of customization options. Birst falls short in this aspect. OBIEE also has stellar reporting capabilities.

Cost
This is the cost effectiveness of the tool. Since many small to mid-sized organizations are using BI, it is important for tools to be reasonably priced according to the value they provide. The cost of a BI tool varies according to the software package and the number of users.
Weight- 15%
Considering ownership costs for individual users, Qlikview is the cheapest, followed by Cognos. OBIEE is the costliest. MicroStrategy and Birst fall between Qlikview and OBIEE in their pricing.

Conclusion



According to my weighted analysis, Qlikview is the best. OBIEE comes close. Cost makes the difference between the two. IBM Cognos gets the minimum score. Of course, this analysis is limited to only 5 criteria, and selection of a good BI tool should be based on how it fits your requirements.
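The claim that cost makes the difference between the top two tools can be checked with a small Python sketch. It recomputes the scores for Qlikview and OBIEE with the Cost criterion dropped. Renormalizing the remaining weights so that they sum to 100% is my own assumption for this check, not part of the analysis above, and the names used are hypothetical.

```python
# Weights (in percent) from the weighted analysis, with Cost left out.
weights_without_cost = {
    "Ease of use": 30,
    "Integration": 20,
    "Analysis Capabilities": 20,
    "Reporting": 15,
}

# Scores for the top two tools, copied from the weighted analysis table.
top_two = {
    "OBIEE": {
        "Ease of use": 8,
        "Integration": 10,
        "Analysis Capabilities": 9,
        "Reporting": 7,
    },
    "Qlikview": {
        "Ease of use": 10,
        "Integration": 5,
        "Analysis Capabilities": 6,
        "Reporting": 9,
    },
}


def score_without_cost(tool):
    """Weighted average over the remaining criteria (weights renormalized)."""
    total = sum(w * top_two[tool][c] for c, w in weights_without_cost.items())
    return total / sum(weights_without_cost.values())


no_cost = {tool: round(score_without_cost(tool), 2) for tool in top_two}
leader = max(no_cost, key=no_cost.get)
print(no_cost, leader)
```

Under this assumption OBIEE moves ahead of Qlikview once Cost is ignored, which is consistent with cost being the deciding factor between the two.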