RTF to Docx
Posted: Mon Mar 03, 2025 3:38 pm
by tiagosis
Hello, I'm migrating my texts from RTF to DOCX, but the texts are losing their encoding. How can I solve this?
Here's an example.
RTF: PROCURAÇÃO
...after conversion to DOCX
DOCX: PROCURAÇÃO
Re: RTF to Docx
Posted: Mon Mar 03, 2025 3:59 pm
by Sergey Tkachenko
Can you send me a sample RTF file? (to email richviewgmailcom, or attach it here)
Re: RTF to Docx
Posted: Mon Mar 03, 2025 4:26 pm
by tiagosis
Yes, it is attached. My problem occurs exactly when I import the RTF into the editor: the text already comes out with the wrong encoding. However, when I open it in the DBForge viewer, the text displays correctly.
Re: RTF to Docx
Posted: Mon Mar 03, 2025 6:25 pm
by Sergey Tkachenko
This RTF is not completely correct. RTF should specify a character set for its text so that it can be loaded correctly.
This RTF specifies \fcharset1, which means DEFAULT_CHARSET.
As a result, characters in the file depend on the main Windows code page (on the computer where it is read).
I have Cyrillic as the default code page on my computer. So, when I load this document (in MS Word, WordPad, or TRichView), I see the main heading as "PROCURAЗГO".
Unfortunately, the current version of TRichView does not have a property to override the code page used when RTF contains text of DEFAULT_CHARSET: the main Windows code page is always used.
In my working version, I added a new property, TRichView.RTFReadProperties.DefCharsetCodePage.
If you assign 1252 to this property, your file will be loaded correctly, with the main heading as "PROCURAÇÃO".
This change will be included in the next update.
If you want, I can send modified units to you by email.
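The dependence on the code page can be reproduced outside TRichView. The following is a minimal sketch in Python (not Delphi), assuming the heading is stored in the RTF as one Windows-1252 byte per character; the variable names are illustrative only and are not part of any TRichView API.

```python
# Bytes of the heading as they would sit in the RTF file, assuming the
# file was written on a Western European (Windows-1252) system.
raw = "PROCURAÇÃO".encode("cp1252")

# With DEFAULT_CHARSET, the reader falls back to the machine's own code
# page, so the same bytes decode differently on different machines.
as_western = raw.decode("cp1252")   # "PROCURAÇÃO"
as_cyrillic = raw.decode("cp1251")  # "PROCURAЗГO" (0xC7 -> З, 0xC3 -> Г)
```

Forcing the code page to 1252 corresponds to choosing the first decoding regardless of the system settings.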
Re: RTF to Docx
Posted: Mon Mar 03, 2025 6:45 pm
by tiagosis
It would be great if you could send them to me. I have a very urgent request to handle, and this would let me keep working while I wait for the component to be updated.
I looked for the property in "SRichViewEdit1.RTFReadProperties...." but I couldn't find it. Maybe it's only in your units. I'm waiting for you to send them to me, along with, if possible, an example of how I should use the property. Thank you.
Re: RTF to Docx
Posted: Mon Mar 03, 2025 7:09 pm
by tiagosis
Please send them to my email.
Re: RTF to Docx
Posted: Mon Mar 03, 2025 7:16 pm
by Sergey Tkachenko
Sent
Re: RTF to Docx
Posted: Mon Mar 03, 2025 7:55 pm
by tiagosis
What am I doing wrong? I copied the units to "C:\Components\TRichView\TRichView\Source".
I recompiled and tried it in two ways, but the text still has incorrect accents:
RichView1.Clear;
RichView1.RTFReadProperties.DefCharsetCodePage := 1242;
RichView1.LoadFromStream(_stream, rvynaYes, False);
RichView1.Format;
//SRichViewEdit1.Clear;
//SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
//SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
//SRichViewEdit1.RichViewEdit.Format;
Re: RTF to Docx
Posted: Mon Mar 03, 2025 8:01 pm
by tiagosis
I'm sorry, I meant 1252; however, even 1252 didn't work.
Re: RTF to Docx
Posted: Mon Mar 03, 2025 8:21 pm
by Sergey Tkachenko
Most probably, the data is already corrupted in _stream.
How do you load the document into it? For RTF, you should not use TStringStream; use TMemoryStream instead. Conversion of RTF code to Unicode must be avoided, because RTF may contain characters of different code pages, and they would be damaged.
I attached a project that uses LoadFromFile (which uses LoadFromStream with TFileStream internally); it works in my tests.
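The kind of damage meant here can be illustrated with a short Python sketch. It is only an analogy for a text-based round trip, not a description of TStringStream's internals, and the sample RTF fragment is made up for the example.

```python
# Hypothetical RTF fragment containing raw Windows-1252 bytes for "ÇÃ".
raw = b"{\\rtf1\\ansi PROCURA\xc7\xc3O}"

# Binary path (what a memory or file stream does): bytes are untouched.
binary_copy = bytes(raw)

# Text path: the bytes are decoded to Unicode and later re-encoded,
# here as UTF-8, so every non-ASCII character changes its byte form.
text_copy = raw.decode("cp1252").encode("utf-8")

# An RTF reader still treats the result as single-byte text, so each
# accented letter now shows up as two wrong characters.
seen_by_reader = text_copy.decode("cp1252")  # "{\rtf1\ansi PROCURAÃ‡ÃƒO}"
```

Once the bytes have gone through such a conversion, setting the right code page in the reader can no longer restore the original text.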
Re: RTF to Docx
Posted: Mon Mar 03, 2025 8:25 pm
by tiagosis
I use TMemoryStream. My _stream variable receives the Blob with the RTF directly from the database, as shown below; the text is correct in the database.
// getting the RTF data..
_stream := TMemoryStream.Create;
QrTxtOrigemTEXTO.SaveToStream(_stream);
_stream.Position := 0;
_streamDocX := TMemoryStream.Create;
try
Application.ProcessMessages;
// RichView1.Clear;
// RichView1.RTFReadProperties.DefCharsetCodePage := 1252;
// RichView1.LoadFromStream(_stream, rvynaYes, False);
// RichView1.Format;
SRichViewEdit1.Clear;
SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
SRichViewEdit1.RichViewEdit.Format;
Re: RTF to Docx
Posted: Mon Mar 03, 2025 8:29 pm
by tiagosis
I just ran a test with the RTF on disk and it worked correctly. The problem seems to be with the data coming from the database into _stream (of type TMemoryStream)...
Re: RTF to Docx
Posted: Mon Mar 03, 2025 9:07 pm
by Sergey Tkachenko
The data may be damaged when it is saved in the database field.
Usually, RTF code contains only characters with codes below 128 (7-bit ASCII). Such RTFs can be saved in any memo or binary field type, even in a Unicode memo.
However, some RTFs, including this one, contain characters of national alphabets as they are, without conversion to RTF character codes.
As a result, this RTF may be saved only in a database field that accepts any type of data without conversion.
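A quick way to tell the two kinds of RTF apart is to look for bytes of 128 and above. The Python sketch below is a simplified illustration with hypothetical helper names; it ignores special cases such as \bin data, and it relies only on the standard RTF \'hh hexadecimal escape.

```python
def is_seven_bit(rtf: bytes) -> bool:
    """True if every byte is plain ASCII, so any field type can hold it."""
    return all(b < 0x80 for b in rtf)


def escape_high_bytes(rtf: bytes) -> bytes:
    """Rewrite each byte >= 0x80 as the RTF escape \\'hh (naive sketch)."""
    out = bytearray()
    for b in rtf:
        if b < 0x80:
            out.append(b)
        else:
            out += b"\\'%02x" % b
    return bytes(out)


raw = b"PROCURA\xc7\xc3O"          # national characters stored as they are
escaped = escape_high_bytes(raw)   # b"PROCURA\\'c7\\'c3O"
```

The escaped form survives storage in text fields, but it still needs a correct character set (or code page override) to be read as "ÇÃ".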
Re: RTF to Docx
Posted: Mon Mar 03, 2025 9:20 pm
by tiagosis
The problem is that I need to convert these RTFs to DOCX.
Re: RTF to Docx
Posted: Mon Mar 03, 2025 9:24 pm
by Sergey Tkachenko
Save the content of this stream to a file and send this file to me
(call _stream.SaveToFile before calling RichView.LoadFromStream).