Tuesday, July 16, 2024

About BIRD, SQL, IBM Granite models, and your business reporting

BIRD-SQL benchmark
Some years ago, when composing SQL queries, I was hoping that those queries would just "fly", performing flawlessly and quickly. Recently, I stumbled over something SQL-related that seems to fly: BIRD-bench. It measures the ability of Large Language Models (LLMs) to generate SQL queries from text input. That idea is at the core of SQL: you describe the result set you need, not how to compute it.

What is BIRD?

In its own words:

BIRD (BIg Bench for LaRge-scale Database Grounded Text-to-SQL Evaluation) represents a pioneering, cross-domain dataset that examines the impact of extensive database contents on text-to-SQL parsing.

Basically, it aims to answer whether LLMs can already serve as a database interface. You would simply describe the results you want, and the LLM together with the database would take care of the rest. The benchmark contains 12,751 text-to-SQL pairs and 95 databases with a total size of 33.4 GB, spanning 37 professional domains.

The benchmark reports two metrics: execution accuracy (EX) and valid efficiency score (VES). EX measures the correctness of the generated SQL statements, judged by the results they produce when executed. VES looks into how well a query would "fly", that is, how efficiently it runs. That is an important property when dealing with massive amounts of data.
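To make the accuracy metric more tangible, here is a minimal sketch of how an execution-based check could look. It assumes that a generated query counts as correct when it returns the same set of rows as the reference ("gold") query. The table, the data, and the helper names execution_match and execution_accuracy are made up for illustration and are not taken from the BIRD tooling. The sketch covers EX only and does not attempt to reproduce VES.

```python
import sqlite3


def execution_match(conn, predicted_sql, gold_sql):
    """Return True if both queries yield the same set of rows."""
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A generated query that fails to execute counts as incorrect.
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    # Assumption: row order and duplicates are ignored in the comparison.
    return set(predicted_rows) == set(gold_rows)


def execution_accuracy(conn, pairs):
    """Fraction of (predicted, gold) query pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, predicted, gold) for predicted, gold in pairs)
    return hits / len(pairs)


# Toy data, invented for this illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'alice', 10.0), (2, 'bob', 25.0), (3, 'alice', 5.0);
""")

pairs = [
    # Different SQL text, same result set: counts as correct.
    ("SELECT customer FROM orders WHERE amount >= 10",
     "SELECT customer FROM orders WHERE amount > 9"),
    # Wrong filter, different result set: counts as incorrect.
    ("SELECT customer FROM orders WHERE amount > 20",
     "SELECT customer FROM orders WHERE amount > 9"),
]

ex = execution_accuracy(conn, pairs)
print(f"EX = {ex:.2f}")
```

Note that the comparison looks at results, not at the SQL text. Two differently written statements can both be correct as long as they return the same rows.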

Details are described in an academic paper available at the BIRD-bench website.

Text-to-SQL

Using a conversational UI (chatbot) to turn text into valid SQL is an important capability. If you have ever tried to get from a description of the desired result to the actual SQL statement, you have probably experienced the challenges involved, especially in more complex scenarios. Having an LLM come up with accurate and efficient SQL statements helps to gain additional insights and to unlock the full value of corporate data.
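As a small illustration of what such a text-to-SQL pair looks like, the following sketch puts a question in plain English next to the kind of statement a model would be expected to produce, and runs it against an in-memory database. The sales table, its data, and the SQL statement are hypothetical and written by hand for this example. No LLM is called here.

```python
import sqlite3

# Hypothetical schema and data, invented for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, revenue REAL);
    INSERT INTO sales VALUES
        ('EMEA', 'widget', 120.0),
        ('EMEA', 'gadget', 80.0),
        ('APAC', 'widget', 250.0);
""")

# The text input: a description of the desired result.
question = "Which region generated the highest total revenue?"

# Hand-written stand-in for the SQL a text-to-SQL model would generate.
generated_sql = """
    SELECT region
    FROM sales
    GROUP BY region
    ORDER BY SUM(revenue) DESC
    LIMIT 1
"""

top_region = conn.execute(generated_sql).fetchone()[0]
print(question, "->", top_region)
```

Even in this tiny case, the statement needs grouping, aggregation, ordering, and a limit, none of which appear in the question itself. That gap is what a text-to-SQL model has to bridge.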

Benchmark results

Let's try a prompt to retrieve benchmark results... :)

Return the creator of the currently best performing LLM for text-to-SQL according to BIRD-bench.

As of now (2024-07-16), that would be IBM:

BIRD-bench and current leaders for Valid Efficiency Score

That's it for today. If you have feedback, suggestions, or questions about this post, please reach out to me on Mastodon (@data_henrik@mastodon.social) or LinkedIn.