Hana & Hadoop - Dream Team for Big Data


SAP users who have had the opportunity to work with Hana are usually enthusiastic about the possibilities and the incredible performance. They would love to store as much data as possible in Hana and process it directly "in-memory".
Unfortunately, there is a catch: the more Hana is used, the more expensive it becomes and the more resources are required. In addition, there is a general limitation of Hana:
The relational data store is ideal for structured data, but it becomes difficult with unstructured data such as logs, social media feeds, documents or images. But that is exactly where Hadoop has its strengths.
A platform like Cloudera Enterprise enables economical and flexible storage, processing and analysis of Big Data, but it does not offer comparable functionality or performance for relational workloads such as online transaction processing (OLTP) or data warehousing.
None of this is new, and in fact many companies operate both platforms side by side, but separately, to leverage their respective strengths.
However, it is often overlooked that an integrated architecture of both solutions can combine the best of both worlds while compensating for the disadvantages.
Cloudera Enterprise can accommodate large volumes of a wide variety of data and limit SAP Hana's hunger for resources. In particular, it ideally complements Hana in the following areas:
- Data volumes and diversity: Traditional data management systems reach their limits here. Cost and technical complexity force organizations to decide which data to keep and which to discard, and unstructured data is difficult to model and store. Cloudera Enterprise is well suited to storing and providing all of an organization's data.
- Resource utilization: In an integrated architecture, processes can be offloaded. Hana can then use the freed-up resources to best serve queries and applications. Most commonly, resource-intensive ETL workloads are offloaded, but it is also possible to transfer queries and analytics to Cloudera Enterprise, especially for very large data sets.
- Capacity issues: It can be beneficial to offload data from Hana, such as historical data or data of low value. This makes it possible to run analyses over longer periods of time without having to keep the complete data history in Hana. Less load on the Hana servers generally means lower costs.
- Analytics and query resources: In an integrated architecture, Hana can handle the rapid processing of structured and online data, for example, OLTP, data warehousing or OLAP (Online Analytical Processing). Cloudera Enterprise provides complementary capabilities to process large volumes of unstructured online and offline data. Organizations can decide which distribution makes the most sense from a cost and performance perspective.
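The division of labor described in the points above can be illustrated with a small placement rule. The following Python sketch is only an illustration under stated assumptions: the `Dataset` fields, the one-year `HOT_WINDOW` cutoff and the function names are hypothetical and not part of any SAP or Cloudera product. It simply routes recent, structured data to Hana and everything else (unstructured or historical data) to Hadoop.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical cutoff: structured data not accessed within this window
# is treated as historical and becomes a candidate for offloading.
HOT_WINDOW = timedelta(days=365)


@dataclass
class Dataset:
    name: str
    structured: bool      # True for relational data, False for logs, documents, images
    last_accessed: date


def choose_platform(ds: Dataset, today: date) -> str:
    """Return 'hana' for recent structured data, 'hadoop' otherwise."""
    if not ds.structured:
        return "hadoop"   # unstructured data goes to Hadoop
    if today - ds.last_accessed > HOT_WINDOW:
        return "hadoop"   # historical data is offloaded
    return "hana"         # recent structured data stays in memory


def plan_placement(datasets, today):
    """Group dataset names by the platform chosen for each."""
    plan = {"hana": [], "hadoop": []}
    for ds in datasets:
        plan[choose_platform(ds, today)].append(ds.name)
    return plan
```

In practice such a rule would also weigh cost and query performance, as noted above; the age-based cutoff here is merely the simplest criterion to demonstrate.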
In an integrated architecture, costs can be significantly reduced compared to a pure Hana operation. Companies can start with simple tasks such as outsourcing ETL workloads and move step by step to a combined analytics platform.
Beyond outsourcing individual processes, it is even possible to pull data from Hana and transfer it to Cloudera Enterprise for analysis. This can be useful for examining data of unknown value without using Hana's expensive resources, which in turn can be used for more important workloads.
The bottom line is that this is a real dream team: Companies don't have to choose one platform over the other; in combination, they get the best of both worlds.