UPDATED 11:54 EDT / JUNE 28 2023

Databricks adds federation, generative AI front end and development tools to its lakehouse

Databricks Inc. is joining the generative AI parade today with a query engine that lets users ask questions and get contextually relevant answers in natural language.

The engine, called LakehouseIQ, is integrated with the company’s Unity Catalog governance platform to ensure adherence to internal security and governance rules.

The company noted that although large language models are good at answering general questions, they don’t understand jargon, internal acronyms or a company’s unique data sets. LakehouseIQ learns from signals within an organization using schemas, documents, queries, popularity, lineage, notebooks and business intelligence dashboards to improve query performance over time. Integration with Unity Catalog ensures that employees only have access to the data they are authorized to use.

LLMs in a lakehouse

Databricks is also using this week’s Data + AI Summit to announce a set of tools for building generative AI applications, including large language models, within its Lakehouse platform. Called Lakehouse AI, the toolset includes vector search, a curated collection of open-source models, lakehouse monitoring, LLM-optimized model serving and the open-source MLflow 2.5 platform for managing machine learning development projects.

“Databricks has been on a mission to democratize data and AI for more than a decade and we’re continuing to innovate as we make the lakehouse the best place for building, owning and securing generative AI models,” Databricks co-founder and Chief Executive Ali Ghodsi (pictured) said in a statement. He added in this morning’s keynote: “Every company in the future will be a data and AI company.”

Vector search uses vector representations to find similar items or retrieve semantically relevant information without the need for exact keyword matching. As implemented in Lakehouse AI, it can be used to automatically create and manage vector embeddings from files in Unity Catalog.
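To make the idea concrete, here is a minimal sketch of similarity search over embeddings in plain Python. It is not Databricks' implementation: the three-dimensional vectors, the document names and the `top_k` helper are all made up for illustration, and a real system would use embeddings produced by a model and an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, index, k=2):
    """Return the ids of the k vectors in `index` most similar to `query`."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
index = {
    "quarterly_revenue": [0.9, 0.1, 0.0],
    "sales_by_region": [0.8, 0.3, 0.1],
    "employee_handbook": [0.0, 0.2, 0.9],
}

# A query vector that points in roughly the same direction as the finance docs.
results = top_k([1.0, 0.2, 0.0], index, k=2)
```

The query shares no keywords with the stored items. It matches only because its vector points in a similar direction, which is the property the article describes.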

The unified platform reduces the complexity of moving models from experimentation to production and improves data quality, Databricks said. Customers can use foundational models within the platform to train their own custom models securely and bring together operations, monitoring and governance on a single platform.

Secure model training

Databricks AutoML is a low-code approach to fine-tuning LLMs securely with a user’s own data and without having to send data to a third party. MLflow, Unity Catalog and Model Serving integrations enable models to be shared within an organization, governed for authorized use, served for inference in production and monitored.

A curated list of open-source models is available within Databricks Marketplace covering a variety of generative AI use cases. Lakehouse AI capabilities such as Databricks Model Serving have been optimized for performance and cost with these models.

MLflow 2.5 includes a gateway that enables organizations to centrally manage credentials for software-as-a-service models or model APIs and provide access-controlled routes for querying. Routes can be shared across the organization and the backend models can be swapped out without disrupting operations. No-code visual tools enable the output of various models to be compared based on a set of prompts, which are automatically tracked within MLflow.
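The routing pattern described here can be sketched in a few lines of Python. This is an illustration of the concept only and does not use the MLflow API: the `Gateway` and `Route` classes, the group names and the stand-in model functions are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Set


@dataclass
class Route:
    name: str
    backend: Callable[[str], str]  # stand-in for a call to a hosted model
    allowed_groups: Set[str]


class Gateway:
    """Hypothetical gateway: callers know route names, never backend credentials."""

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}

    def register(self, name: str, backend: Callable[[str], str],
                 allowed_groups: Iterable[str]) -> None:
        self._routes[name] = Route(name, backend, set(allowed_groups))

    def swap_backend(self, name: str, backend: Callable[[str], str]) -> None:
        # The route name and its access rules stay the same for callers.
        self._routes[name].backend = backend

    def query(self, name: str, prompt: str, user_groups: Iterable[str]) -> str:
        route = self._routes[name]
        if not route.allowed_groups & set(user_groups):
            raise PermissionError(f"no access to route {name!r}")
        return route.backend(prompt)


# Stand-in backends; a real deployment would call external model APIs here.
def model_a(prompt: str) -> str:
    return f"[model-a] {prompt}"


def model_b(prompt: str) -> str:
    return f"[model-b] {prompt}"


gateway = Gateway()
gateway.register("chat", model_a, allowed_groups={"analysts"})
```

Calling `gateway.swap_backend("chat", model_b)` changes which model answers without any change to the code that calls `gateway.query("chat", ...)`, which is the sense in which backends can be swapped without disrupting operations.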

Databricks Model Serving has been optimized for LLM inferencing to achieve up to 10 times lower latency compared with unoptimized models. The fully managed service now supports graphics processing unit-based inference, automatically logs and monitors all requests and responses to Delta Tables and ensures end-to-end lineage tracking through Unity Catalog. Model Serving scales up and down as demand changes to reduce operational costs.

Databricks Lakehouse Monitoring provides full visibility into data pipelines to give users insight into the lineage of their data and AI assets. Proactive detection and reporting simplify error detection, and the software automatically performs root cause analysis and identifies recommended solutions to common problems.

Enabling data mesh

The company also today announced lakehouse federation capabilities that enable organizations to create a data mesh architecture with unified governance. Lakehouse Federation in Unity Catalog allows users to discover, query and govern data across all of their data platforms from within Databricks without the need for moving or copying. Databricks said the capability is aimed at addressing data fragmentation by making it possible for organizations to expose and govern siloed data as an extension of the lakehouse.

New catalog and querying capabilities consolidate and map data to provide a single view across multiple sources including MySQL, PostgreSQL, Amazon Web Services Inc.’s Redshift, Snowflake Inc.’s Snowflake, Microsoft Corp.’s Azure SQL Database, Azure Synapse and Google LLC’s BigQuery.

A recently announced Hive Metastore interface for Unity Catalog further expands data discovery, governance and management to include Amazon’s Elastic MapReduce, Apache Spark, Amazon Athena, the open-source Presto and Trino distributed query platforms and others.

Unity Catalog enables consistent access policies to be defined for tables, rows, columns and tags on any registered asset. Future enhancements will let customers push those policies to other data warehouses for consistent enforcement.
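As a rough illustration of what row- and column-level policies mean in practice, the sketch below filters rows and masks a column based on a user's groups. It is a conceptual model in plain Python, not Unity Catalog syntax: the `TablePolicy` class, the group names and the sample table are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass(frozen=True)
class TablePolicy:
    # Decides whether a given row is visible to a user with these groups.
    row_filter: Callable[[dict, Set[str]], bool]
    # Maps a column name to the group allowed to see it unmasked.
    masked_columns: Dict[str, str]


def read_table(rows: List[dict], policy: TablePolicy, user_groups: Set[str]) -> List[dict]:
    """Return only the rows and column values this user is allowed to see."""
    visible = []
    for row in rows:
        if not policy.row_filter(row, user_groups):
            continue  # row-level policy hides this row entirely
        masked_row = {}
        for col, val in row.items():
            required = policy.masked_columns.get(col)
            if required is None or required in user_groups:
                masked_row[col] = val
            else:
                masked_row[col] = "REDACTED"  # column-level policy masks the value
        visible.append(masked_row)
    return visible


# Hypothetical sample data and policy.
ROWS = [
    {"region": "EMEA", "customer": "Acme", "email": "a@acme.example"},
    {"region": "APAC", "customer": "Globex", "email": "g@globex.example"},
]

POLICY = TablePolicy(
    row_filter=lambda row, groups: f"{row['region'].lower()}_sales" in groups,
    masked_columns={"email": "pii_readers"},
)

emea_view = read_table(ROWS, POLICY, {"emea_sales"})
```

Because the policy is defined once and applied on every read, two users running the same query see different results, which is the consistency the catalog is meant to enforce.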

The net effect is to allow for distributed domain ownership while reducing the need for data integration, thereby saving on storage costs and improving overall data security and governance.

MLflow 2.5 features will be available in the July release of MLflow. New capabilities such as vector search and lakehouse monitoring are currently in preview.

Lakehouse Federation and the Hive metastore interface will be available soon in a public preview, and a waiting list is open.

Photo: Databricks/livestream
