Monday, October 30, 2023

πŸŽƒ Unicode characters and Db2 πŸ•Έ️ 🏚️

A smiley query in Db2
Recently, I had a discussion about Unicode characters and Db2. Since Db2 LUW version 9.5, new databases default to the Unicode code page. Instead of having the entire database in the Unicode code page, you can specify a CCSID (coded character set identifier) for either individual columns or the whole table when creating the table (isn't that 😱?). Our discussion was not around emojis (πŸ˜€) and Halloween (πŸŽƒ), but "business" - the Euro sign (€). How can you insert and retrieve Unicode characters when you know their code points? Let's take a look and have some fun...

Db2 character string constants

Unicode characters can be inserted into Db2 as typographical characters (so-called "glyphs") or by their code points. The Db2 documentation on constants has a section discussion character string constants, including two paragraphs on Unicode characters. The typographical character for the Jack-O-Lantern is πŸŽƒ, the code point is U+1F383. It is important to note that a Unicode character can be composed from multiple characters or code points to force other rendering, select from subsets and codespaces, have special orientation and much more. The spider web (πŸ•Έ️) could sometimes be encoded as just U+1F578, but I had to use U+1F578 U+FE0F. The latter "U+FE0F" is a variation selector, namely the emoji variation selector. It enforces the rendering as colorful emoji.

I am not too deep into Unicode strings and the above must do. BTW: The Euro sign is a simple U+20AC.

Emojis and Euro signs in Db2

For my tests, I utilized the free Db2 Community Edition for Docker. In a new test database with Unicode code page, I created a simple table with integer and string columns. Then, after some tests, I inserted the following from the Linux command line:

db2 "insert into uni values(1, U&'\20AC')"
db2 "insert into uni values(2, U&'\+01F602')"
db2 "insert into uni values(3, U&'\+01F600')"
db2 "insert into uni values(4, U&'\+01F609')"

The I retrieved the inserted data and go the result as shown in the screenshot at the top. You probably noticed that I used different ways to specify the Unicode character constants, U&'\..' and U&'\+..'. The first is used for constants with four hexadecimal digits, the second form with the plus sign for six hexadecimal digits. The backslash (\) is the default escape character, but you could change it by specifying the UESCAPE clause.

If you want to just print the Halloween-themed characters from this blog post's title, then use the following command:

db2 "values(U&'\+01F383 \\+01F3DA\FE0F \\+01F578\FE0F')"

It should return and output like this:

Db2 in Halloween mood / mode

To get the second and third character, I had to use the variation selector and hence enter a longer sequence of digits.

I hope you enjoyed πŸ¦„ this look at Unicode support in Db2 and found it not too spooky πŸ‘». If you have feedback, suggestions, or questions about this post, please reach out to me on Twitter (@data_henrik), Mastodon (@data_henrik@mastodon.social), or LinkedIn.