Deleting ZERO WIDTH NON-JOINER (8204) character in footnotes on importing docx by TRvOfficeConverter

saeid2016

Post by saeid2016

Hello,
When we import DOCX or DOC files into TRichViewEdit with TRVOfficeConverter, ZERO WIDTH NON-JOINER (8204) characters in footnotes are deleted, although the same character is imported correctly in the main text.
Sergey Tkachenko
Site Admin

Post by Sergey Tkachenko

Try copy-pasting from MS Word to TRichViewEdit.
If this problem persists, the problem is in our RTF reading procedure. Otherwise, most probably, the problem is in the converter.
saeid2016

Post by saeid2016

Sergey Tkachenko wrote on Sun Feb 18, 2018 10:25 am:
Try copy-pasting from MS Word to TRichViewEdit.
If this problem persists, the problem is in our RTF reading procedure. Otherwise, most probably, the problem is in the converter.
I tried copying and pasting from MS Word into TRichViewEdit, and the problem does not occur.

I use this converter: https://www.microsoft.com/en-us/downloa ... .aspx?id=3
I have downloaded and installed its update from here: https://support.microsoft.com/en-us/hel ... y-pack-sp3

Is there another converter I could use?
Sergey Tkachenko
Site Admin

Post by Sergey Tkachenko

I looked at RTF files generated by the converter and MS Word 2016.
Both of them saved this character (decimal code 8204, hexadecimal code 200C) using the \zwbo keyword, both in the main text and in footnotes.
So (strangely!) on my computer the result is the same regardless of where the character is located and whether the converter is used.

According to the RTF specification:
\zwbo - Zero-width break opportunity. Used to insert break opportunity between two characters.
TRichView ignores this RTF keyword, because no Unicode character works exactly as described, and the RTF specification has a more appropriate keyword:
\zwnj - Zero-width nonjoiner. This is used for unligating a character.
TRichView supports \zwnj, and loads it as ZERO WIDTH NON-JOINER character.

I made one more test: I created an RTF file containing \zwnj, opened it and resaved it. The \zwnj keywords were saved as \zwbo!
So it looks like MS Word treats them as synonyms.
In the next update, I'll include loading \zwbo as the ZERO WIDTH NON-JOINER character. However, I am not sure that this fixes the problem on your side, because you describe different results. To answer more definitely, I need to see the RTF file generated by the converter.
To get it, call rvc.ImportRTF instead of rvc.ImportRV, and then rvc.Stream.SaveToFile(<name of RTF file>).
Sergey Tkachenko
Site Admin

Post by Sergey Tkachenko

Quick fix:
Open RVRTF.pas, find the constant isymMax, and increase its value by 2.
Add two items to the rgsymRtf array declaration (at any position):

    (Keyword:'zwbo';     DefValue:$200C;        UseDef:False;  kwd:rtf_kwd_WideChar; idx:0;               AffectTo:rtf_af_None), // \zwbo -> U+200C ZERO WIDTH NON-JOINER
    (Keyword:'zwnbo';    DefValue:$200D;        UseDef:False;  kwd:rtf_kwd_WideChar; idx:0;               AffectTo:rtf_af_None), // \zwnbo -> U+200D ZERO WIDTH JOINER
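As a standalone illustration of what these entries do (together with the already supported \zwnj), here is a minimal console program sketch that maps the keywords to the code points given above. It is not TRichView code: the record type TZeroWidthKeyword and the function LookupZeroWidth are hypothetical names used only for this example, and it compiles as a Delphi or Free Pascal console program.

```pascal
program RtfZeroWidthKeywords;

{$IFDEF FPC}
  {$MODE DELPHI}
{$ELSE}
  {$APPTYPE CONSOLE}
{$ENDIF}
{$C+} // enable Assert in both Delphi and Free Pascal

uses
  SysUtils;

type
  // Hypothetical record for this illustration only; it is NOT the
  // element type of TRichView's rgsymRtf array.
  TZeroWidthKeyword = record
    Keyword: string;
    CodePoint: Word;
  end;

const
  // \zwnj was already supported; \zwbo and \zwnbo are what the quick fix adds.
  ZeroWidthKeywords: array[0..2] of TZeroWidthKeyword = (
    (Keyword: 'zwnj';  CodePoint: $200C),  // ZERO WIDTH NON-JOINER
    (Keyword: 'zwbo';  CodePoint: $200C),  // MS Word writes ZWNJ as \zwbo
    (Keyword: 'zwnbo'; CodePoint: $200D)   // ZERO WIDTH JOINER
  );

// Returns True and the code point if AKeyword (without the backslash)
// is one of the keywords above; otherwise returns False and 0.
function LookupZeroWidth(const AKeyword: string; out ACodePoint: Word): Boolean;
var
  I: Integer;
begin
  for I := Low(ZeroWidthKeywords) to High(ZeroWidthKeywords) do
    if ZeroWidthKeywords[I].Keyword = AKeyword then
    begin
      ACodePoint := ZeroWidthKeywords[I].CodePoint;
      Result := True;
      Exit;
    end;
  ACodePoint := 0;
  Result := False;
end;

var
  CP: Word;
begin
  // 8204 (decimal) and $200C (hexadecimal) are the same value.
  Assert(8204 = $200C);
  Assert(LookupZeroWidth('zwbo', CP) and (CP = $200C));
  Assert(LookupZeroWidth('zwnbo', CP) and (CP = $200D));
  Assert(not LookupZeroWidth('par', CP));
  if LookupZeroWidth('zwbo', CP) then
    WriteLn('\zwbo -> U+', IntToHex(Integer(CP), 4)); // prints \zwbo -> U+200C
end.
```

With the lookup in place, \zwbo and \zwnj produce the same character, which matches the observation that MS Word saves \zwnj as \zwbo.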
saeid2016

Post by saeid2016

Thank you very much.