Eliminate generative AI risk by uncovering the data nuances


Before organizations go all-in on generative AI, they need to be mindful of the risks of pushing sensitive and proprietary data into publicly hosted Large Language Models (LLMs) – in terms of security, privacy, and governance. Businesses need to understand the impact these risks can have, and adopt an approach that maintains the security of data while complying with current regulations.

Data governance rules surrounding publicly hosted LLMs can be ambiguous. However, it’s understood that risks can arise from an LLM’s ability to ‘learn’ from a business’s internal prompts: as IDC points out, prompt data may be used to further train the model, and the information it contains could then be disclosed to other users who submit related queries. There are also worries about sensitive data a business shares and stores online being exposed to hackers, or accidentally disclosed to the public. So despite the perceived gains, organizations are understandably hesitant to adopt such models, especially those operating in regulated industries such as financial services and the public sector.

So, how can businesses harness the full potential of LLMs and AI, while minimizing risks at the same time?


Setting boundaries

Setting strict boundaries around the use of data and LLMs is essential for security and governance, and most large enterprises will already have these in place. When hosting and deploying LLMs, data should stay strictly within a protected perimeter: rather than sending data out to an external LLM, the LLM should be brought into the centralized, governed platform where the data already resides. That way, employees interact with the LLM, and data teams develop and customize it, entirely within the organization’s already secure infrastructure.
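To make this concrete, here is a minimal sketch of what serving an LLM inside the perimeter can look like, assuming the Hugging Face transformers library and an open model that has already been downloaded to an internal model store (the model path is a hypothetical placeholder):

```python
from transformers import pipeline

# Load the model from a path inside the governed platform, so no prompt
# or response ever leaves the organization's network.
generator = pipeline(
    "text-generation",
    model="/internal/models/llama-2-7b-chat",  # hypothetical internal path
    device_map="auto",  # requires the `accelerate` package
)

response = generator(
    "Summarize the key risks in our Q3 incident reports.",
    max_new_tokens=200,
)
print(response[0]["generated_text"])
```

Because both the model weights and the prompts live on internal infrastructure, the same security and governance controls that already protect the data apply to the model as well.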

The prerequisite for a strong AI strategy is a strong data strategy. That means data cannot be siloed in different parts of the organization, and consistent, strong policies around security and governance need to be implemented. Ultimately, the goal is actionable, trustworthy data that can be easily used with an LLM in a secure and governed environment.


Training models for specific business needs

LLMs such as ChatGPT helped to spark the current flurry of interest in the technology; however, they have their challenges for enterprise users. These models have been ‘trained’ on vast amounts of data from across the internet, which makes them prone to ‘hallucinations’: they can offer inaccuracies, exhibit bias, and provide offensive answers. These foundational LLMs may also be less useful for enterprise users because they have never been exposed to the internal systems and data those users rely on, and are therefore unable to answer business-specific queries.

A worthwhile solution is for a business to fine-tune and customize a model to the needs of the business and its customers. Enterprises already have a wide range of LLMs to choose from to download and use, including StarCoder from Hugging Face, StableLM from StabilityAI and Llama v2 from Meta. These models can be customized so that the LLM is trained on the right information and used behind a firewall. For business leaders, it’s key to choose a unified platform that protects data from unwanted ingestion, allowing them to rapidly build and deploy custom LLM apps.
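As an illustration, the sketch below fine-tunes an open model on internal data using LoRA, a parameter-efficient technique that trains a small set of adapter weights instead of the full model. It assumes the transformers, peft, and datasets libraries; the model and data paths are hypothetical placeholders:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model
from datasets import load_dataset

# Base model and training data both live inside the security perimeter.
BASE = "/internal/models/llama-2-7b"  # hypothetical internal path

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(BASE)

# Wrap the base model with LoRA adapters; target module names vary by
# architecture (these fit Llama-style models).
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

dataset = load_dataset("json", data_files="/internal/data/domain_docs.jsonl")["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="/internal/checkpoints", num_train_epochs=1),
    train_dataset=dataset,
    # mlm=False produces standard next-token (causal) labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Because only the small adapter weights are trained, a run like this needs a fraction of the memory and compute of full-model training, which is what makes domain customization practical for most organizations.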

Choosing a smaller model built for the organization’s specific use cases can also save time and money. Smaller models built for a particular organization’s needs require far less memory and computing power than general-purpose LLMs. They are also far less time-consuming to prepare, as fine-tuning a model to a particular content domain takes a fraction of the time it takes to train a general-purpose model. All of this helps to make the business’s LLM more cost-effective and efficient.


Why multimodal AI matters

When it comes to using AI to its full potential, businesses need to ensure the model can work with all of the data they possess. With around 80 percent of the world’s data being unstructured, organizations are likely to have data in forms other than text, from image and video to weather and social media data.

Multimodal AI can be crucial for organizations here, enabling models to process and analyze all forms of data. Natural language processing (NLP) technologies can offer valuable insights for business leaders, highlighting the relationships between data stored as videos, images, text, and more. Once this data has been ingested and analyzed, business leaders should ensure their platform offers role-based access definitions to safeguard it from unintended use.
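As a simple illustration of what role-based access definitions can look like, the sketch below enforces a deny-by-default policy over data modalities; the roles and the policy table are hypothetical examples:

```python
from dataclasses import dataclass

# Which data modalities each role may query; anything absent is denied.
POLICY = {
    "analyst": {"text", "image"},
    "data_scientist": {"text", "image", "video", "sensor"},
    "support_agent": {"text"},
}

@dataclass
class User:
    name: str
    role: str

def can_access(user: User, modality: str) -> bool:
    """Deny by default: a role must explicitly grant a modality."""
    return modality in POLICY.get(user.role, set())

alice = User("alice", "analyst")
print(can_access(alice, "image"))   # True
print(can_access(alice, "sensor"))  # False: not granted to analysts
```

In a production platform these checks would be enforced by the governance layer itself, but the deny-by-default principle is the same.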


AI caution is crucial to balance risk and reward

Generative AI offers organizations real and measurable benefits, empowering less technical employees to be more productive and creative in their roles. However, caution is advised. Business leaders need to ensure that they fully understand the models and services they use, work with reputable vendors, and put processes in place so their employees can use these models within their security perimeter.

Generative AI is also about balancing risk and reward – embracing the technology is essential for forward-thinking organizations, but they must ensure their teams can use it effectively and safely to drive long-term success.