The big clean up: welcome to data clean rooms


Clean, raw data, for your eyes only – and protecting consumer information


Data has been going through something of a democratization surge. This is not just a result of so-called citizen data scientists springing up to use abstracted, simplified data analytics tools, but also a matter of its wider usage.

During the last half-decade in particular, we have witnessed the rise of data marketplaces, data exchanges and what we now call data clean rooms as mechanisms for collaborating on information resources, some of which will be monetized by their owners and originators, and some of which may be open and free.

Although organizations today realize they can benefit from the sharing and exchange of corporate operational data, they will still want to protect sensitive consumer information and intellectual property. This means companies are sharing the ‘shape and form’ of their data, rather than the actual values and attributes in the traditional sense. 

Anonymized and obfuscated data shapes 

If a clothing retailer is preparing to enter Eastern Europe or Southeast Asia as part of a new expansion drive, it might reasonably want to see the sales cycle, supply chain and seasonal trend data that others who entered those markets before it have experienced. This doesn’t mean data owner A gives data consumer B the names and addresses of its customers or the value of their purchases. Instead, A shares the shape of the data topography it has experienced, typically in an anonymized or obfuscated form. 
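To make the idea of sharing a data set’s “shape” concrete, here is a minimal Python sketch. It assumes hypothetical sales records and an arbitrary suppression threshold of three customers per cell (neither comes from any vendor). Data owner A publishes only per-region, per-month aggregates and drops any cell small enough to point at individual buyers.

```python
from collections import defaultdict

# Hypothetical raw records held privately by "data owner A".
# Each record: (customer_name, region, month, purchase_value)
RAW_SALES = [
    ("Alice", "PL", 11, 120.0),
    ("Bogdan", "PL", 11, 80.0),
    ("Csilla", "PL", 11, 100.0),
    ("Dara", "VN", 11, 200.0),
    ("Emil", "PL", 12, 300.0),
]

MIN_CUSTOMERS = 3  # assumed suppression threshold, not taken from any product


def share_shape(records, min_customers=MIN_CUSTOMERS):
    """Return per-(region, month) aggregates with identities removed.

    Cells backed by fewer than `min_customers` distinct customers are
    suppressed, so a small cell cannot be traced back to one buyer.
    """
    customers = defaultdict(set)
    totals = defaultdict(float)
    for name, region, month, value in records:
        customers[(region, month)].add(name)
        totals[(region, month)] += value

    shape = {}
    for cell, names in customers.items():
        if len(names) >= min_customers:
            shape[cell] = {
                "customers": len(names),
                "avg_spend_per_customer": round(totals[cell] / len(names), 2),
            }
    return shape


print(share_shape(RAW_SALES))
# {('PL', 11): {'customers': 3, 'avg_spend_per_customer': 100.0}}
```

Suppressing small cells is a simple, widely used disclosure-control heuristic; it is not a formal privacy guarantee such as differential privacy.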


Data clean rooms can help make this happen by allowing multiple parties to combine and analyze their data in a protected environment, where participants are unable to see each other’s raw data. 
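The core clean room pattern (combine the data inside, release only aggregates) can be sketched as a toy in-memory class. This is an illustrative assumption, not how Databricks, Snowflake or any other vendor implements it: `ToyCleanRoom`, `load` and `overlap_by` are hypothetical names, and the leading underscore on `_tables` is only a Python naming convention, whereas a real service enforces separation through separate accounts, access controls and encryption.

```python
class ToyCleanRoom:
    """Toy clean room: parties load rows privately, callers get aggregates only.

    Illustrative sketch; not any vendor's design.
    """

    def __init__(self, min_group_size=2):
        self._tables = {}  # party name -> {join_key: attributes}
        self.min_group_size = min_group_size

    def load(self, party, rows):
        # Store a private copy of the party's rows, keyed by join key.
        self._tables[party] = dict(rows)

    def overlap_by(self, left, right, attribute):
        """Count keys both parties hold, grouped by one of `right`'s attributes."""
        a, b = self._tables[left], self._tables[right]
        counts = {}
        for key in a.keys() & b.keys():
            group = b[key].get(attribute)
            counts[group] = counts.get(group, 0) + 1
        # Suppress small groups so no individual can be singled out.
        return {g: n for g, n in counts.items() if n >= self.min_group_size}


room = ToyCleanRoom(min_group_size=2)
room.load("retailer", {"k1": {}, "k2": {}, "k3": {}, "k4": {}})
room.load("publisher", {
    "k1": {"segment": "sports"},
    "k2": {"segment": "sports"},
    "k3": {"segment": "news"},
    "k9": {"segment": "news"},
})
print(room.overlap_by("retailer", "publisher", "segment"))  # {'sports': 2}
```

The overlap here is k1, k2 and k3; the single “news” match falls below the threshold and is withheld, so neither party learns which specific keys the other holds.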

Steve Sobel, global head of the communications, media and entertainment go-to-market (GTM) division at Databricks, says that data clean rooms allow businesses to easily collaborate with their customers and partners on any cloud service in a secure, governed and privacy-safe environment. A key additional benefit is that organizations can then keep pace with the rapidly changing security, compliance and privacy landscape. 

Databricks launched its own data clean rooms functionality as part of its efforts to accelerate its vision for open and collaborative sharing. The offering enables what the company calls collaboration in a fragmented ecosystem, i.e. multiple participants can share and join their existing data and run complex workloads on it in a variety of programming languages (Python, R, SQL, Java and Scala) without the risk of unintentionally exposing data to other participants. 

Overcoming pre-existing challenges 

Discussing the development of his firm’s own solution in this space, Sobel also points to issues with some earlier data clean room solutions, including difficulties with data replication, a restriction to SQL and challenges with scaling. Databricks says it has overcome those issues and now offers what it describes as a truly functional data clean room service. 

Well known for championing this technology is data cloud company Snowflake. The company highlights the relevance and use of data clean rooms in the marketing, media and advertising industries, which today face more regulation than ever – especially when it comes to how data can be used and processed. Data regulations such as GDPR, combined with increasing pressure from consumers to understand what data is being collected and processed, are making it increasingly difficult to give customers the experiences they expect. 

“By offering the ability to analyze and share data in a data clean room, businesses aren’t just protecting themselves from being at the heart of the next headline data breach, they are also ensuring that they are reaching audiences in the most effective way possible,” explains David Fisher, Snowflake’s industry principal for media and entertainment. “Data clean rooms help solve this challenge by enabling multiple parties to combine, collaborate and analyze data in a protected environment, where participants are unable to see each other’s raw data.” 

In Snowflake’s approach, as in several others, sensitive data such as personally identifiable information (PII) can be matched across parties but remains encrypted as far as each party is concerned. Adoption of data clean rooms has been pioneered within the media community for advertising use cases, where collaborative data sets are used for ad targeting and measurement, driving yield and effectiveness. 
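One well-known way to match PII without either side revealing its raw values is a Diffie-Hellman-style private set intersection (PSI), in which each party blinds hashed identifiers with its own secret exponent. The sketch below uses that technique as a stand-in; it is not a description of Snowflake’s implementation, and the Mersenne prime modulus and sample e-mail addresses are toy choices that are not secure enough for production use.

```python
import hashlib
import secrets

# Toy Diffie-Hellman-style private set intersection (PSI).
# Illustrative only: real deployments use vetted libraries and proper groups.
P = 2**127 - 1  # a Mersenne prime, used here purely for demonstration


def normalize(email):
    # Matching only works if both sides normalize identifiers the same way.
    return email.strip().lower()


def hash_to_group(email):
    digest = hashlib.sha256(normalize(email).encode()).digest()
    return int.from_bytes(digest, "big") % P


def blind(values, secret):
    # Raise each value to a secret exponent: x -> x^secret mod P.
    return {pow(v, secret, P) for v in values}


# Each party picks a private exponent it never shares.
a_secret = secrets.randbelow(P - 3) + 2
b_secret = secrets.randbelow(P - 3) + 2

party_a = ["ann@example.com", "Bob@Example.com ", "cara@example.com"]
party_b = ["bob@example.com", "dan@example.com", "ann@example.com"]

# A sends H(x)^a to B, which raises it to b; B sends H(y)^b to A, which raises it to a.
a_once = blind({hash_to_group(e) for e in party_a}, a_secret)
b_once = blind({hash_to_group(e) for e in party_b}, b_secret)
a_twice = blind(a_once, b_secret)  # computed by party B
b_twice = blind(b_once, a_secret)  # computed by party A

# Because (h^a)^b == (h^b)^a mod P, double-blinded values coincide exactly
# for shared e-mails, while neither side sees the other's raw addresses.
shared_count = len(a_twice & b_twice)
print(shared_count)  # 2
```

The normalization step matters as much as the cryptography: without it, “Bob@Example.com ” and “bob@example.com” would hash differently and the match would be missed.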


However, as data clean rooms become more prevalent, any organization or industry handling PII will be able to collaborate securely on data with other parties by using them. 

“Snowflake’s data clean rooms leverage our platform’s core architecture, enabling each organization to maintain full control of their data in their own secure Snowflake account. In other words, two (or more) organizations can use Snowflake data clean rooms to join and collaborate on data in near real-time, without copying, moving or sharing the underlying data. Crucially, they can also perform analysis on large amounts of data with high performance and scalability,” says Fisher, explaining how this works in practice. 

Outlier identifier 

In the widest sense, then, using data collaboratively in this way can help identify high-value targets, pinpoint lapsed customers and uncover opportunities to win new business. 

While it’s easy to start thinking of the data clean room as some kind of bustling marketplace with a number of contributing and consuming entities, it can simply be a way for two separate organizations to set up a bilateral data exchange. Chris Hogg, chief revenue officer at data management and identity solutions company Lotame, explains that data clean rooms are collaborative environments where two or more organizations can share and compare data sets. He reminds us that all data is scrambled with cryptography, so each party retains full ownership of its data, chooses which data is visible and which is hidden, and can remain compliant with privacy regulations. 
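Hogg’s point that each party chooses which data is visible can be pictured as a per-party column policy applied before anything leaves the clean room. `SharingPolicy` and `apply_policy` are hypothetical names used only for this sketch; real products express such rules through their own governance features.

```python
from dataclasses import dataclass, field


@dataclass
class SharingPolicy:
    """Hypothetical per-party policy: the columns others may see in results."""
    visible: set = field(default_factory=set)


def apply_policy(row, policy):
    # Keep only the columns the owning party has marked as visible.
    return {col: val for col, val in row.items() if col in policy.visible}


publisher_policy = SharingPolicy(visible={"segment", "region"})
publisher_row = {
    "email": "ann@example.com",
    "segment": "sports",
    "region": "EU",
    "birth_date": "1990-01-01",
}

print(apply_policy(publisher_row, publisher_policy))
# {'segment': 'sports', 'region': 'EU'}
```

Defaulting to an empty `visible` set means a party that forgets to declare a policy shares nothing, which is the safer failure mode.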

Through this exchange, shared audiences and cross-platform customer journeys may be discovered, but only if the data being shared is in a usable state – as always, this is a case of garbage in, garbage out. 

No cookie crumble panacea 

“Clean rooms aren’t, however, a panacea for the loss of third-party cookies, despite what those marketing the technology might say,” says Hogg. “Unless all parties go in with a clear objective for what they want to get out of a clean room, everyone will leave confused as to what the use case is supposed to be. We’ve found publishers – who have much to gain from expanded audiences – have been early adopters of the tech, but are now questioning the ROI of their efforts, leading to high rates of retirement.” 

Hogg also highlights the importance of distinguishing between third-party clean rooms that meet these criteria of openness and shared understanding and those increasingly being offered by media giants. 


“There is the notion of a perhaps more commercially-pumped clean room, that uses the same technology, but operates more like a data marketplace, where users barter or pay to take a peek at the masses of profiles held within walled gardens,” says Hogg. “In this case, it is a tool for further consolidation and centralization of data rather than mutually beneficial collaboration.” 

Looking ahead over the next five years (always the only sensible window in which to assess the development of ‘the next big thing’), data exchange technologies appear likely to see broader development and deployment. Adam Jennings, senior solutions architect at analytics platform DoubleCloud, is, logically enough, backing the proliferation of data clean rooms; his firm specializes in technology designed to store, analyze and transfer data in an easy and “undeniably” fast way. 

“Data clean rooms are going to be so important in 2023, as organizations look to preserve privacy by stripping out personal identifying information. Not just for the sake of laws such as GDPR and CCPA, but because it’s the right way to handle other people’s information. They will allow businesses to share anonymized information, meaning they can still leverage one of their best assets, data, to continue delivering a great customer experience,” says Jennings. 
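Stripping out personal identifiers before sharing might look like the following sketch, which assumes a hypothetical list of direct-identifier fields and a placeholder secret key. Replacing an e-mail address with a keyed HMAC-SHA256 pseudonym is generally treated as pseudonymization rather than full anonymization under GDPR, which is why the sketch also coarsens exact age into a band.

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # assumed list; adjust per data set
SECRET_KEY = b"replace-with-a-managed-secret"    # hypothetical; keep in a key vault


def strip_pii(record):
    """Drop direct identifiers; keep a keyed pseudonym and coarse attributes."""
    pseudonym = hmac.new(
        SECRET_KEY, record["email"].strip().lower().encode(), hashlib.sha256
    ).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Generalize exact age into a ten-year band to reduce re-identification risk.
    if "age" in cleaned:
        low = (cleaned.pop("age") // 10) * 10
        cleaned["age_band"] = f"{low}-{low + 9}"
    cleaned["pseudonym"] = pseudonym
    return cleaned


row = {"name": "Ann", "email": "Ann@example.com", "phone": "555-0100",
       "age": 34, "city": "Lyon"}
print(strip_pii(row))
```

The output keeps `city`, an `age_band` of "30-39" and a 64-character hex pseudonym. Because the same key always yields the same pseudonym, records can still be joined across data sets, which is useful but also the reason the key itself must be protected.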

DoubleCloud is a newly developed data platform that helps a business build an end-to-end modern data stack and sub-second data analytics functions with fully managed open source technologies, like ClickHouse (for analytical processing) and Apache Kafka (for data streaming).  

Neutral, secure and privacy-compliant 

Michele Szabocsik, VP of product at customer data platform company BlueConic, agrees with many of the sentiments expressed here so far. She points out that a data clean room should provide a “neutral, secure and privacy-compliant environment” to facilitate customer data sharing between two or more parties without revealing PII across parties. 

“It’s especially valuable when two parties have a direct relationship with one another. For example a consumer goods manufacturer and a retailer that sells its products, a publisher and an advertiser that buys ads to target its audience, or a financial institution that offers a rewards credit card in partnership with an airline, hotel chain or retail company,” clarifies Szabocsik. 


Even in a world without third-party cookies, data clean rooms enable privacy-compliant second-party data sharing between multiple parties so they can: 

  • Uncover actionable insights about shared customers and audiences 
  • Measure marketing and advertising campaign impact on a shared audience 
  • Activate valuable audiences that would have otherwise been undiscoverable 

“But it’s the ‘what’ and the ‘how’ in that last point about activation that may have privacy regulators raising their eyebrows and re-opening the books on GDPR, CCPA, and the like,” Szabocsik continues. “With privacy-related fines ramping up in the US and Europe over the last year, I’d urge brands and publishers investing in a clean room to do their due diligence. 

“Questions start to arise when shared audiences are built inside a clean room by combining customer data sets from two different parties for the purposes of directly targeting those audiences with a marketing or advertising message, but in the absence of individual consent across both parties.” 

Even if PII is never revealed in the clean room or shared across parties, the BlueConic team points out that the shared audience output is still the product of matching unique customer IDs across parties, and Szabocsik notes that such practices raise questions like: 

  • Is shared audience activation directly from a data clean room just a data management platform by another name? 
  • Does it violate the intent of consumer privacy laws such as GDPR, CCPA and others that require explicit consent to target an individual based on their personal information? 
  • If you have consent to directly target the individuals that make up the shared audience, then why not just enrich the customer records in your first party data set and bypass the need for a data clean room altogether? 

“A more privacy-compliant alternative is to use a model-based approach to activation. So for example, rather than directly targeting the shared audiences built in the clean room, brands and publishers could develop rich lookalike models based on those shared audiences and then apply those models to their consented, first party data for targeted activation,” concludes Szabocsik. 
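Szabocsik’s model-based alternative can be illustrated with about the simplest possible lookalike model: average the feature vectors of the shared audience into a profile, then score only consented first-party records by cosine similarity. The feature meanings, the 0.9 threshold and the data are hypothetical, and production lookalike models are usually trained classifiers rather than a single centroid.

```python
import math


def centroid(vectors):
    # Element-wise mean of the seed audience's feature vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def lookalikes(seed_features, first_party, threshold=0.9):
    """Return IDs of consented first-party records resembling the seed profile."""
    profile = centroid(seed_features)
    return sorted(
        cid for cid, rec in first_party.items()
        if rec["consented"] and cosine(profile, rec["features"]) >= threshold
    )


# Hypothetical features: [sports_affinity, travel_affinity, basket_size_scaled]
seed = [[0.9, 0.1, 0.5], [0.8, 0.2, 0.6]]
customers = {
    "c1": {"consented": True, "features": [0.85, 0.15, 0.55]},
    "c2": {"consented": True, "features": [0.1, 0.9, 0.2]},
    "c3": {"consented": False, "features": [0.9, 0.1, 0.5]},
}
print(lookalikes(seed, customers))  # ['c1']
```

In this sketch only the averaged profile crosses into the brand’s own systems, and “c3” is excluded despite being a close match because it lacks consent, which mirrors the distinction Szabocsik draws between modeling and direct targeting.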


Cleaning up on clean rooms 

Asking the tech industry for its thoughts on data clean rooms at this point in time has proved timely. Not only do we have data marketplace specialists positively championing the use of this technology, we also have data-centric cloud players showcasing their data collaboration prowess. There is also an expansive universe of data management and data analytics vendors making sure they have skin in the game. 

We can expect the identity platform specialists to help further this discussion as we lock down any cracks or fissures that might have developed in what is still version 1.0 of this technology. Data clean rooms are here and the cleaning staff are currently mop in hand – now, please wash your hands.