Big Unstructured Data vs. Structured Relational Data
Structured vs Unstructured data
Nowadays there is a lot of hype around unstructured data. So what is unstructured data, and how is it different from structured data?
Unlike structured data, which is organized in a manageable way, unstructured data is raw and unorganized. Structured data can be integrated seamlessly into a database or into a well-defined format such as XML. Unstructured data, in contrast, is cumbersome to handle. Examples of unstructured data include emails, documents and social media posts. Unstructured data needs a lot of processing and transformation before it is fully usable for analysis.
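To make the contrast concrete, here is a minimal Python sketch (standard library only). The order records, the email text and the helper names `load_structured` and `extract_from_email` are hypothetical illustrations, not taken from any real system. The structured rows map directly onto a database table and can be queried at once, while the same facts buried in an email have to be dug out with a hand-written pattern first.

```python
import csv
import io
import re
import sqlite3

# Structured data (hypothetical): every record has the same, known columns.
STRUCTURED_CSV = """order_id,customer,amount
1001,Acme Corp,250.00
1002,Globex,99.50
"""

# Unstructured data (hypothetical): free text with no fixed layout.
UNSTRUCTURED_EMAIL = """Hi team,
Globex called again about order 1002. They were charged $99.50 but
want to add two more units. Can someone follow up today?
Thanks, Dana"""


def load_structured(conn: sqlite3.Connection) -> None:
    """Structured rows map directly onto a table; no interpretation needed."""
    conn.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")
    rows = csv.DictReader(io.StringIO(STRUCTURED_CSV))
    # Named placeholders are filled from each row dict; SQLite's column
    # affinity converts the CSV strings to INTEGER/REAL values.
    conn.executemany("INSERT INTO orders VALUES (:order_id, :customer, :amount)", rows)


def extract_from_email(text: str) -> dict:
    """Unstructured text must be processed first: hand-written regexes
    recover fields that a table would have stored explicitly."""
    order = re.search(r"order\s+(\d+)", text, re.IGNORECASE)
    amount = re.search(r"\$(\d+(?:\.\d{2})?)", text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "amount": float(amount.group(1)) if amount else None,
    }


conn = sqlite3.connect(":memory:")
load_structured(conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)                                   # 349.5
print(extract_from_email(UNSTRUCTURED_EMAIL))  # {'order_id': 1002, 'amount': 99.5}
```

The regex approach is deliberately simple and brittle: a differently worded email ("PO #1002", "ninety-nine dollars") would slip past it, which is exactly why unstructured data needs far more processing than structured rows.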
Duncan Pauly, founder and chief technology officer of Coppereye, adds eloquent insight to the conversation: "The labels 'structured data' and 'unstructured data' are often used ambiguously by different interest groups; and often used lazily to cover multiple distinct aspects of the issue. In reality, there are at least three orthogonal aspects to structure:
- The structure of the data itself.
- The structure of the container that hosts the data.
- The structure of the access method used to access the data.
These three dimensions are largely independent and one does not need to imply another. For example, it is absolutely feasible and reasonable to store unstructured data in a structured database container and access it by unstructured search mechanisms."
Data in Organizations
Enterprise data currently comes in many different formats, and both structured and unstructured data flow in continuously. Broadly, organizations deal with the following three types of data:
- Transactional Data: Transactional data are the elements that support the ongoing operations of an organization and are included in application systems that automate business processes, in areas such as sales, order management, manufacturing and purchasing. It is the data that is created and updated within operational systems.
- Analytical Data: Analytical data are numerical values, measurements and metrics that provide business intelligence and support organizational decision making. Analytical data is typically stored in OLAP repositories optimized for decision support, and is characterized as facts and numerical values in a dimensional model.
- Master Data: Master data plays a key role in the core operation of a business. It refers to key organizational entities that are used by several functional groups and are typically stored in different data systems across an organization.
Fig- Data types in organizations.
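The way the three types relate can be sketched in a few lines of Python. This is a hypothetical toy model, not a reference architecture: the `Customer` and `Sale` classes, the sample values and the `revenue_by_region` helper are invented for illustration. Master data (customers) is shared and referenced by key, transactional data (sales) records day-to-day operations, and analytical data is a numeric fact summarized along a dimension.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Master data (hypothetical): a key entity shared across functional groups."""
    customer_id: str
    name: str
    region: str


@dataclass(frozen=True)
class Sale:
    """Transactional data (hypothetical): one record per business event."""
    sale_id: int
    customer_id: str  # references master data by key rather than copying it
    amount: float


CUSTOMERS = {
    c.customer_id: c
    for c in [Customer("C1", "Acme Corp", "West"), Customer("C2", "Globex", "East")]
}
SALES = [Sale(1, "C1", 250.0), Sale(2, "C2", 99.5), Sale(3, "C1", 50.0)]


def revenue_by_region(sales, customers):
    """Analytical data: a numeric fact (revenue) aggregated along a
    dimension (region) that comes from the master data."""
    totals = defaultdict(float)
    for sale in sales:
        totals[customers[sale.customer_id].region] += sale.amount
    return dict(totals)


print(revenue_by_region(SALES, CUSTOMERS))  # {'West': 300.0, 'East': 99.5}
```

In a real enterprise these would live in separate systems (an ERP for sales, an MDM hub for customers, an OLAP store for the aggregates); keeping them in one script only shows the dependency between them.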
Along with the above data types, there is also machine data, social media data and document data. The following figure illustrates the growth of structured and unstructured data in organizations.
Fig- Growth of structured and unstructured data in organizations
Where does the data warehouse fit in analyzing this data?
Data warehouses have primarily been used to store historical data, so structured enterprise data such as transactional data can be analyzed easily with traditional data warehousing techniques. Unstructured data, however, poses a challenge to these techniques, since there is no conventional method for performing ETL (extract, transform, load) on it.
Limitations of the Data Warehouse
The following are the limitations of a typical data warehouse:
- Can't handle extreme integration of conventional and unconventional data sources.
- Doesn't support massively parallel relational databases.
- Can't handle petabytes of data that organizations currently have.
Future of the Data Warehouse
A white paper by the Kimball Group suggests that to support big data, data warehouses must be magnetic, agile and deep.
Magnetic: A magnetic environment places the fewest impediments on the incorporation of new, unexpected and potentially dirty data sources. Specifically, this supports the need to defer declaration of data structures until after the data is loaded.
Agile: An agile environment eschews long-range careful design and planning.
Deep: A deep environment allows running sophisticated analytic algorithms on massive data sets without sampling or even cleaning.
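Deferring the declaration of data structures until after loading is often called "schema-on-read". The Python sketch below is one hypothetical way to illustrate it; the event records and the `read_clicks` helper are invented for the example. The load step accepts every raw record, including dirty ones, and the structure (which fields matter and how to coerce them) is decided only when the data is read.

```python
import json

# Hypothetical raw event feed, including dirty records.
RAW_LINES = [
    '{"user": "ann", "event": "click", "ms": 120}',
    '{"user": "bob", "event": "view"}',               # different event, no "ms"
    '{"user": "ann", "event": "click", "ms": "95"}',  # number stored as text
    "not json at all",                                # unparseable record
]

# "Magnetic" load step: accept everything as-is, no schema enforced.
raw_store = list(RAW_LINES)


def read_clicks(store):
    """Schema applied on read: only now do we choose the fields we need
    and coerce their types; records that don't fit are skipped."""
    for line in store:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # dirty record stays in the store but is ignored here
        if not isinstance(rec, dict) or rec.get("event") != "click":
            continue
        try:
            yield rec["user"], int(rec["ms"])
        except (KeyError, ValueError, TypeError):
            continue


print(list(read_clicks(raw_store)))  # [('ann', 120), ('ann', 95)]
```

The trade-off is that validation moves from load time to query time: a different query can read the same raw store with a different schema, but every reader has to cope with the dirty records itself.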
Data warehouses will certainly evolve to support big data. That said, when an exciting new technology arrives, there is sometimes a tendency to close the door on older technologies as if they were going to go away. Data warehousing has built an enormous legacy of experience, best practices, supporting structures, technical expertise and credibility with the business world. This will be the foundation for information management in the coming decade as data warehousing expands to include big data analytics.
References
Oracle Blog: https://blogs.oracle.com/fusionecm/entry/structured_and_unstructured_da
BI Insider: http://bi-insider.com/posts/types-of-enterprise-data-transactional-analytical-master/
The Digital Universe 2010
Kimball Group white paper: http://www.kimballgroup.com/wp-content/uploads/2011/04/Evolving-Role-of-EDW-in-the-Era-of-Big-Data-Analytics1.pdf