By Amit Walia, SVP & GM, Data Integration & Data Security, Informatica Inc.
Big data has been a buzzword in the industry for years, but we are starting to see hard evidence of both enterprises that have been successful with technologies such as Hadoop and those that are facing challenges. One of the distinguishing elements of successful enterprises is the holistic approach they have taken to big data management, governance and security.
There is clear value in using technologies such as Hadoop to build next-generation "data lakes" for collecting, preparing, and analyzing greater volumes and types of data. Enterprises are augmenting their traditional data warehousing architectures to include Hadoop both as a more efficient and scalable preparation stage and as a place to offload less frequently used data into easily accessible archives. But as more enterprise data moves into Hadoop for collection, preparation, and analysis, concerns around security can arise. Companies in compliance-sensitive industries, such as healthcare or financial services, and in consumer-driven industries, such as retail or CPG, are legally obligated to ensure strict controls on the use of data. These controls must also apply to new data platforms, such as Hadoop.
The risks of data breaches are very clear. At a minimum, data breaches lead to a loss in customer trust and revenue shortfalls from consumer churn. More drastically, we hear of enterprise executives who face fines or incarceration for failing to meet government-mandated compliance policies governing the collection of and access to sensitive consumer information. The risks of data breaches are further complicated by a trend toward more autonomous, self-service access to data by an increasingly information-driven workforce. In a world where data is becoming progressively more strategic to the enterprise, organizations can either manage data as an asset or face the risk of it becoming a liability.
Successful enterprises are taking a holistic approach to big data security by evolving beyond traditional perimeter and endpoint-based security approaches and addressing the security of data itself at multiple levels.
1) Authentication. Most Hadoop distributions have native Kerberos-based systems for controlling access to data. Enterprises are enabling Kerberos-based
access controls and using data preparation technologies that fully integrate with these control systems.
2) Authorization. Most Hadoop distributions are also shipping with fine-grained authorization mechanisms, such as Apache Ranger and Apache Sentry.
Enabling fine-grained authorization to data further ensures protected access to sensitive information.
3) Sensitive Data Protection. Data masking technologies can be used to de-identify/de-sensitize private and confidential data. This limits exposure
in instances where data sets are shared for outsourcing, application testing or third-party analytics.
4) Data Security Intelligence. A new category of data security has recently emerged that improves enterprise security posture through
proactive risk management. Before migrating data to Hadoop, organizations can identify where sensitive data resides and understand how it should be
protected. After migrating data to Hadoop, risk can continue to be monitored with proper data security controls applied so that data proliferation is
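
Items 3 and 4 above can be illustrated together with a minimal Python sketch: first locate fields that look sensitive, then replace them with keyed, non-reversible tokens before the data set is shared. This is only an illustration under stated assumptions, not any vendor's product behavior. The pattern set, the function names (`find_sensitive_fields`, `mask_value`, `mask_record`) and the `tok_` prefix are all hypothetical, and real discovery and masking tools use far richer rules and key management.

```python
import hashlib
import hmac
import re

# Hypothetical patterns for illustration only; real discovery rules are far richer.
SENSITIVE_PATTERNS = {
    "ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
}


def find_sensitive_fields(record):
    """Return {field_name: pattern_name} for string values matching a known pattern."""
    found = {}
    for field, value in record.items():
        if not isinstance(value, str):
            continue
        for name, pattern in SENSITIVE_PATTERNS.items():
            if pattern.match(value):
                found[field] = name
                break
    return found


def mask_value(value, key):
    """Replace a value with a keyed token; the same value and key always give the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def mask_record(record, key):
    """Return a copy of the record with every detected sensitive field masked."""
    sensitive = find_sensitive_fields(record)
    return {
        field: mask_value(value, key) if field in sensitive else value
        for field, value in record.items()
    }
```

Because the token is deterministic for a given key, masked data sets can still be joined on the masked column, which is one reason this style of masking is often chosen for testing and third-party analytics. The key itself would need to be stored and rotated securely, which this sketch does not address.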
Advanced companies that have gained expertise in Hadoop technologies are also leveraging the larger-scale data analysis capabilities of Hadoop to drive even broader security outcomes for the organization. For example, financial services organizations are collecting network access data, physical building access data and other sources of human interaction data in real time into Hadoop to more quickly and comprehensively identify patterns of intrusion. The use of big data analytics as a platform to drive broader security outcomes opens up fundamentally new categories of big-data-driven security analytics.
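
As a rough illustration of combining network access data with physical building access data, the following Python sketch flags on-site logins that are not preceded by a recent badge-in from the same user. It is a toy, in-memory example under assumed inputs: the function name `flag_suspicious_logins`, the `(user, timestamp)` event shape and the 12-hour window are all hypothetical, and a production pipeline on Hadoop would run comparable logic over far larger, streaming data sets.

```python
from datetime import datetime, timedelta


def flag_suspicious_logins(badge_events, login_events, window=timedelta(hours=12)):
    """Flag logins with no badge-in by the same user within `window` before the login.

    Both inputs are lists of (user, timestamp) tuples (an assumed, simplified shape).
    """
    # Merge both streams and process them in time order.
    events = [(t, "badge", user) for user, t in badge_events]
    events += [(t, "login", user) for user, t in login_events]

    last_badge = {}  # user -> most recent badge-in time seen so far
    flagged = []
    for t, kind, user in sorted(events):
        if kind == "badge":
            last_badge[user] = t
        else:
            seen = last_badge.get(user)
            if seen is None or t - seen > window:
                flagged.append((user, t))
    return flagged
```

For example, a login from a user who never badged in that day, or whose last badge-in was more than 12 hours earlier, would be returned for review. A flag of this kind is only a signal to investigate, not proof of intrusion.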
There is no doubt that big data security is one of the key pillars of making big data ready for analytical success. As the use of Hadoop continues to grow and more sensitive information is collected, processed and analyzed in Hadoop, enterprises will need holistic approaches for ensuring big data security. Successful organizations are moving beyond traditional and superficial approaches to security to focus on more intelligent and metadata-driven approaches to data security. By leveraging a systematic understanding of big data, enterprises can more holistically improve their big data security postures and ensure that big data remains an asset, and not a liability.