Monday, March 14, 2016

Can the N function cause problems with existing queries?


We use Oracle 10g and Oracle 11g.

We also have a layer that automatically composes queries from pseudo-SQL code written in .NET (something like SQLAlchemy for Python).

Our layer currently wraps any string in single quotes (') and, if the string contains non-ANSI characters, it automatically composes a UNISTR call with the special characters written as Unicode code points (like \00E0).

Now we created a method for doing multiple inserts with the following construct:
INSERT INTO ... (...) SELECT ... FROM DUAL UNION ALL SELECT ... FROM DUAL ...

This algorithm could compose queries where the same string field is sometimes passed as 'my simple string' and sometimes wrapped as UNISTR('my string with special chars like \00E0').

The described condition causes an ORA-12704: character set mismatch.
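A minimal sketch that reproduces it (the table test and its NVARCHAR2 column are purely illustrative):

   create table test (val nvarchar2(100));

   insert into test (val)
   select 'my simple string' from dual
   union all
   select UNISTR('my string with special chars like \00E0') from dual;
   -- ORA-12704: character set mismatch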

One solution is to use the INSERT ALL construct, but it is very slow compared to the one we use now.

Another solution is to instruct our layer to put N in front of any string (except for the ones already wrapped with UNISTR). This is simple.
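Using the same illustrative table as above, the composed statement would then look something like:

   insert into test (val)
   select N'my simple string' from dual
   union all
   select UNISTR('my string with special chars like \00E0') from dual;
   -- both branches now resolve to the national character set, so no ORA-12704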

I just want to know if this could cause any side-effect on existing queries.

Note: all our string columns in the DB are either NCHAR or NVARCHAR2.


Oracle ref: http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch7progrunicode.htm

3 Answers

Answer 1

Basically, what you are asking is: is there a difference in how a string is stored with or without the N function?

You can check for yourself; consider:

SQL> create table test (val nvarchar2(20));

Table TEST created.

SQL> insert into test select n'test' from dual;

1 row inserted.

SQL> insert into test select 'test' from dual;

1 row inserted.

SQL> select dump(val) from test;

DUMP(VAL)
--------------------------------------------------------------------------------
Typ=1 Len=8: 0,116,0,101,0,115,0,116
Typ=1 Len=8: 0,116,0,101,0,115,0,116

As you can see, the stored values are identical, so there is no side effect.

The reason this works so beautifully is the elegance of Unicode.

If you are interested, here is a nice video explaining it:

https://www.youtube.com/watch?v=MijmeoH9LT4

Answer 2

I assume that you get the error "ORA-12704: character set mismatch" because the data inside quotes is considered CHAR while your columns are NCHAR, so they are handled with different character sets: one uses NLS_CHARACTERSET, the other NLS_NCHAR_CHARACTERSET.

When you use the UNISTR function, it converts data from CHAR to NCHAR (and, besides that, decodes the encoded values into characters), as the Oracle docs say:

"UNISTR takes as its argument a text literal or an expression that resolves to character data and returns it in the national character set."

But when you convert values explicitly using N or TO_NCHAR, you only get the value in NLS_NCHAR_CHARACTERSET, without decoding. So if you have values encoded like "\00E0", they will not be decoded and will be taken as is.

So if you end up with an insert like this:

   insert into ... ( ... )
   select N'my string with special chars like \00E0',
          UNISTR('my string with special chars like \00E0')
   from dual
   ...

the data in the first inserted column will be 'my string with special chars like \00E0', not 'my string with special chars like à'. This is the only side effect I'm aware of. Other queries should already use the NLS_NCHAR_CHARACTERSET encoding, so using explicit conversion shouldn't cause any problems.
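To see the difference directly, here is a small illustrative query (nothing in it is specific to your schema):

   select N'like \00E0'        as with_n,
          UNISTR('like \00E0') as with_unistr
   from dual;

   -- WITH_N returns the literal text 'like \00E0';
   -- WITH_UNISTR returns the decoded value 'like à'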

And by the way, why not just insert all values as N'my string with special chars like à'? Just encode them into UTF-16 first (I suppose you use UTF-16 for NCHARs) if your 'upper level' software uses a different encoding.

Answer 3

  • Use of the N function: you already have answers above.

If you have any chance to change the character set of the database, that would really make your life easier. I have worked on huge production systems and seen the trend: because storage space is cheap, practically everyone moves to AL32UTF8, and the hassle of internationalization slowly becomes a painful memory of the past.

I found the easiest thing is to use AL32UTF8 as the character set of the database instance and simply use VARCHAR2 everywhere. We're reading and writing standard Java Unicode strings via JDBC as bind variables without any harm or fiddling.

Your idea of constructing one huge SQL text of inserts may not scale well, for multiple reasons:

  • there is a maximum allowed length for a SQL statement, so it won't work with 10,000 inserts
  • it is advised to use bind variables, and then you don't have the N'xxx' vs. UNISTR mess either (see the sketch after this list)
  • creating a new SQL statement dynamically each time is very resource-unfriendly: it does not allow Oracle to cache any execution plan, and it forces Oracle to hard parse your long statement on each call.
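As a minimal sketch of the bind-variable approach (reusing the illustrative test table from the first answer; in PL/SQL, local variables used in a SQL statement are bound automatically):

   declare
     v_val nvarchar2(100) := UNISTR('my string with special chars like \00E0');
   begin
     -- the statement text stays constant; only the bound value changes,
     -- so Oracle can reuse the cached execution plan
     insert into test (val) values (v_val);
     commit;
   end;
   /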

What you're trying to achieve is a mass insert. Use the JDBC batch mode of the Oracle driver to perform that at light speed; see e.g.: http://viralpatel.net/blogs/batch-insert-in-java-jdbc/

Note that insert speed is also affected by triggers (which have to be executed) and foreign key constraints (which have to be validated). So if you're about to insert more than a few thousand rows, consider disabling the triggers and foreign key constraints and re-enabling them after the insert. (You'll lose the trigger calls, but the constraint validation after the insert can still have an impact.)
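For example (the trigger and constraint names below are placeholders):

   alter trigger my_table_trg disable;
   alter table my_table disable constraint my_table_fk;

   -- ... perform the mass insert here ...

   alter table my_table enable constraint my_table_fk;
   alter trigger my_table_trg enable;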

Also consider the rollback segment size. Inserting a million records needs a huge rollback segment, which will likely cause serious swapping on the storage media. A good rule of thumb is to commit after every 1000 records.

(Oracle uses versioning instead of shared locks, so a table with uncommitted changes is still consistently available for reading. A commit every 1000 records means roughly one commit per second, slow enough to benefit from write buffers, but frequent enough not to interfere with other people trying to update the same table.)
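A rough sketch of that commit pattern in PL/SQL (staging_table and my_table are placeholder names):

   declare
     v_count pls_integer := 0;
   begin
     for r in (select val from staging_table) loop
       insert into my_table (val) values (r.val);
       v_count := v_count + 1;
       -- commit every 1000 rows to keep rollback/undo usage small
       if mod(v_count, 1000) = 0 then
         commit;
       end if;
     end loop;
     commit;
   end;
   /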
