Wednesday, May 16, 2007

National character set and string literals

Until the arrival of Oracle 10gR2 there was an quite interesting limitation regarding string literals and the national character set: If you were using a non-unicode database character set (e.g. in Europe usually WE8ISO8859P1 or WE8ISO8859P15) and attempted to put a character not being part of the database character set into a column of type NCHAR/NVARCHAR/NCLOB using a string literal, you wouldn't succeed. This was because the SQL parser always converted the whole SQL string into the database character set before parsing and executing the statement.

Although some versions ago Oracle introduced the N' as national character set string literal modifier, it didn't really help due to the limitation mentioned above. So even string literals marked with N' were converted into the database character set by the SQL parser.

10gR2 introduces a new OCI mode and a corresponding environment variable which can be set to enable the new mode for OCI applications that do not support turning it on explicitly. Using a 10gR2 client and server, you can set the following environment variable:

ORA_NCHAR_LITERAL_REPLACE=TRUE

before starting the OCI application, e.g. SQL*Plus. You need to enable this new mode explicitly, by default it is disabled (for backwards compatibility, they say).

So, only if these three requirements are met:

1. 1ogR2 server
2. 10gR2 client
3. Environment variable ORA_NCHAR_LITERAL_REPLACE=TRUE is set on client before starting OCI application

then the above mentioned limitation is lifted and you are actually able to put characters only available in the national character set into the database by using a N' string literal.

Note that you need to use the N' modifier for the string literal, otherwise a string literal will be converted into the database character set, and meeting the requirements mentioned above will not help either.

Here is a small script that demonstrates the issue. It was tested in the following environment:

Server:
10gR2 10.2.0.2 Enterprise Edition on Linux (32bit)
Database character set WE8ISO8859P1
National character set AL16UTF16

Client:
10gR2 10.2.0.3 Windows XP (32bit)
Client character set WE8MSWIN1252

The script:

create table test_nls1(text1 nclob);

create table test_nls2(text2 clob);

insert into test_nls1(text1) values (N'äöü@@ßß•');

insert into test_nls2(text2) values (N'äöü@@ßß•');

commit;

insert into test_nls1(text1) values ('äöü@@ßß•');

insert into test_nls2(text2) values ('äöü@@ßß•');

commit;

If you use a application that is capable of showing the character accordingly (funny enough SQL*PlusW, the windows version of SQL*Plus does not, although it can be used to run the script, it shows the last character as "block"), e.g. the Java based Oracle tool SQLDeveloper (which is in turn not capable of storing the data correctly in the database, at least it didn't work for me, may be some settings need to adjusted), you'll find out that only the first of the four inserts was actually successful in storing the last character of the text literal correctly in the database. And it will only be successful if you met the three requirements mentioned above.

Note that the last character of the string literal is a Windows specific ANSI character that is not part of the WE8ISO8859P1 database character set. So if you attempt to insert this string into a normal text column the character set conversion will replace this character with one of the usual "replacement" characters like "¿".

Please note further that you need a database that is using a different database character set than WE8MSWIN1252. This character set is typically used by default databases created under Windows, so be aware of the fact that your results will be different if your database uses this character or if your database uses the same character set as your client (because in this case no conversion at all will take place).

And one final note: My first attempts to test this new functionality failed when using the "Instant Client 10.2.0.1" on Windows XP (32bit). It just didn't seem to use the new setting of the environment variable, I never succeeded in putting the string literal correctly into the NCLOB column.


You can find further details about this topic at the following web locations:

MetaLink Knowledge Base: Character Sets & Conversion - Frequently Asked Questions Doc ID: 227330.1
Oracle 10gR2 Online Documentation - Globalization Support Guide - Programming with Unicode

No comments:

Post a Comment