TRichView Unicode support
Posted: Thu Aug 07, 2008 9:09 am
Hi Sergey,
Regarding support of Chinese characters, I have a question and possibly a few suggestions.
As I was forced to investigate a problem with our application when running under a DBCS default locale, I took a closer look at the source code of TRichView. I understand that TRichView does not support DBCS character sets and that this behaviour is documented. This is OK, although I did not know it when the decision to use TRichView in our product was made.
Problems reported regarding Chinese character support in TRichView include:
- Word wrap behaviour (a double-byte character must not be split)
- Cursor movement (two cursor key presses are necessary to move over one Chinese character)
- Painting of the selection may start inside a double-byte pair (and result in the display of bogus characters)
OK, I think you get the idea. And I fully understand why TRichView does not support DBCS.
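To illustrate the kind of boundary check that is missing, here is a minimal sketch in Python (not TRichView code). It assumes code page 936 (GBK), where a lead byte lies in the range 0x81..0xFE and is followed by exactly one trail byte; other DBCS code pages use different ranges. The function names are hypothetical.

```python
import bisect

# Assumption: code page 936 (GBK). A lead byte is in 0x81..0xFE and is
# always followed by exactly one trail byte.
def is_lead_byte(b):
    return 0x81 <= b <= 0xFE

def dbcs_char_starts(data):
    """Return the byte offsets at which a character begins."""
    starts = []
    i = 0
    while i < len(data):
        starts.append(i)
        # Skip the trail byte together with its lead byte.
        if is_lead_byte(data[i]) and i + 1 < len(data):
            i += 2
        else:
            i += 1
    return starts

def snap_to_char_start(data, offset):
    """Move a byte offset back to the start of the character containing it."""
    if offset >= len(data):
        return len(data)
    starts = dbcs_char_starts(data)
    return starts[bisect.bisect_right(starts, offset) - 1]

data = "A\u4e2dB".encode("gbk")      # 'A', one Chinese character, 'B'
print(len(data))                     # 4
print(dbcs_char_starts(data))        # [0, 1, 3]
print(snap_to_char_start(data, 2))   # 1
```

The scan has to start at the beginning of the text (or at another known boundary), because a trail byte can fall inside the lead byte range, so a single byte cannot be classified in isolation.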
But I am now faced with the dilemma of either switching over to Unicode support or implementing the missing character boundary checks myself.
At the moment I lean towards Unicode, but I have not yet found any indication in the source code that TRichView handles UTF-16 correctly. From what I could see, the RVU_Length function simply counts every 2 bytes as one character. I therefore think that Unicode surrogate pairs (characters outside the BMP, the Basic Multilingual Plane or Plane 0) would have the same problems as DBCS character sets. To my knowledge, such characters are used at least on Traditional Chinese systems. For these, too, word wrap, cursor movement etc. would have to know the actual character boundaries. In fact, I think a fix for this would be very similar to what is needed to support DBCS: in both cases a character consists of either one or two code units (bytes in DBCS, 16-bit words in UTF-16).
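To make the surrogate pair concern concrete, here is a minimal sketch in Python (not TRichView code, and not a description of what RVU_Length actually does). It compares a count of 16-bit code units with a count that treats a high surrogate (0xD800..0xDBFF) followed by a low surrogate (0xDC00..0xDFFF) as one character. The function names are hypothetical.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

def is_high_surrogate(unit):
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit):
    return 0xDC00 <= unit <= 0xDFFF

def char_starts(units):
    """Return the code unit indices at which a code point begins."""
    starts = []
    i = 0
    while i < len(units):
        starts.append(i)
        # A valid surrogate pair occupies two code units but is one character.
        if (is_high_surrogate(units[i]) and i + 1 < len(units)
                and is_low_surrogate(units[i + 1])):
            i += 2
        else:
            i += 1
    return starts

sample = "A\U00020000B"            # U+20000 lies outside the BMP
units = utf16_units(sample)
print(len(units))                  # 4 (a count of 2-byte units)
print(len(char_starts(units)))     # 3 (actual characters)
print(char_starts(units))          # [0, 1, 3]
```

Cursor movement, selection painting and word wrap would only need to restrict themselves to the offsets returned by such a function, which is the same shape of fix as a DBCS lead byte check.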
Could you please confirm whether Unicode code points outside the BMP are supported by TRichView?
If they are supported, I will switch to using Unicode text in my items. Otherwise I would have to implement the necessary changes anyway and could then do the same for DBCS, which hopefully would not affect single-byte encodings.
I would appreciate your thoughts on this issue.
Best regards,
Gunnar
GERMANY