RTF to Docx

General TRichView support forum. Please post your questions here
tiagosis
Posts: 64
Joined: Thu Apr 13, 2017 5:34 pm

Post by tiagosis »

Hello, I'm migrating my texts from RTF to DOCX, but the texts are losing their encoding. How can I solve this?
Here's an example.

RTF: PROCURAÇÃO
After conversion to DOCX:
DOCX: PROCURAÇÃO
Sergey Tkachenko
Site Admin
Posts: 17839
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

Can you send me a sample RTF file? (to email richviewgmailcom, or attach it here)

Post by tiagosis »

Yes, here it is. My problem occurs as soon as I import the RTF into the editor: the text already appears with the wrong encoding. However, when I open it in the DBForge viewer, the text displays correctly.
Attachments
databseViewer.jpg (183.45 KiB)
IN_SRichEdit.jpg (155.88 KiB)
rtf demo.txt (5.78 KiB)

Post by Sergey Tkachenko »

This RTF is not completely correct. An RTF file should specify the character set for its text so that it can be loaded correctly.
This RTF specifies \fcharset1, which means DEFAULT_CHARSET.
As a result, characters in the file depend on the main Windows code page (on the computer where it is read).

I have Cyrillic as the default code page on my computer. So, when I load this document (in MS Word, WordPad, or TRichView), I can see the main heading "PROCURAЗГO".
Unfortunately, the current version of TRichView does not have a property to override the code page used when RTF contains DEFAULT_CHARSET text: the main Windows code page is always used.

In my working version, I added a new property, TRichView.RTFReadProperties.DefCharsetCodePage.
If you assign 1252 to this property, your file will be loaded correctly, with the main heading as "PROCURAÇÃO".
This change will be included in the next update.
If you want, I can send modified units to you by email.
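
To illustrate the code-page dependence described above, here is a minimal Delphi console sketch. It assumes the heading is stored as raw single-byte text in which Ç is byte $C7 and Ã is byte $C3, which is consistent with the "PROCURAЗГO" rendering on a Cyrillic (1251) system. The DecodeWith helper is hypothetical and uses only the Delphi RTL (TEncoding), not TRichView.

```delphi
program CodePageDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: decodes raw single-byte text with a given Windows code page.
function DecodeWith(const Raw: TBytes; CodePage: Integer): string;
var
  Enc: TEncoding;
begin
  // GetEncoding returns a new instance that the caller must free.
  Enc := TEncoding.GetEncoding(CodePage);
  try
    Result := Enc.GetString(Raw);
  finally
    Enc.Free;
  end;
end;

var
  Raw: TBytes;
  Latin, Cyr: string;
begin
  // Assumed bytes of the heading: P R O C U R A, then $C7 $C3, then O.
  Raw := TBytes.Create($50, $52, $4F, $43, $55, $52, $41, $C7, $C3, $4F);

  Latin := DecodeWith(Raw, 1252); // Western European
  Cyr := DecodeWith(Raw, 1251);   // Cyrillic

  // Print code points instead of characters to avoid console encoding issues.
  // Characters 8 and 9 (1-based) are the two accented letters.
  Writeln('1252: U+', IntToHex(Ord(Latin[8]), 4), ' U+', IntToHex(Ord(Latin[9]), 4));
  Writeln('1251: U+', IntToHex(Ord(Cyr[8]), 4), ' U+', IntToHex(Ord(Cyr[9]), 4));
end.
```

Under these assumptions, the same two bytes decode to U+00C7 U+00C3 (ÇÃ) with code page 1252 and to U+0417 U+0413 (ЗГ) with code page 1251, which is why the result depends on the machine when the RTF does not name a character set.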

Post by tiagosis »

It would be great if you could send them to me. I have a very urgent deadline, and this would tide me over until the component is updated.
I looked for the property in "SRichViewEdit1.RTFReadProperties...." but I couldn't find it. Maybe it's in your units. I'll wait for you to send them, if possible with an example of how I should use the property. Thank you.

Post by tiagosis »

Please send it to my email.

Post by Sergey Tkachenko »

Sent

Post by tiagosis »

What am I doing wrong? I copied the units to "C:\Components\TRichView\TRichView\Source" and recompiled.
I tried it in two ways, but the text still has incorrect accents:

RichView1.Clear;
RichView1.RTFReadProperties.DefCharsetCodePage := 1242;
RichView1.LoadFromStream(_stream, rvynaYes, False);
RichView1.Format;


//SRichViewEdit1.Clear;
//SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
//SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
//SRichViewEdit1.RichViewEdit.Format;

Post by tiagosis »

Sorry, I meant 1252. However, even 1252 didn't work.

Post by Sergey Tkachenko »

Most probably, the data are already corrupted in _stream.
How do you load the document into it? For RTF, you should not use TStringStream; use TMemoryStream instead. Converting RTF code to Unicode must be avoided, because RTF may contain characters of different code pages, and they would be damaged.
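
The kind of damage meant here can be sketched with the Delphi RTL alone. The example below is only an illustration, not TRichView code: it decodes raw bytes to a string and encodes them back, which stands in for what a string-based stream may do, and reports whether the bytes survived. The SurvivesRoundTrip helper and the sample bytes are assumptions made for the example.

```delphi
program RoundTripDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: True if Raw is unchanged after bytes -> string -> bytes
// using the given code page for both conversions.
function SurvivesRoundTrip(const Raw: TBytes; CodePage: Integer): Boolean;
var
  Enc: TEncoding;
  Back: TBytes;
begin
  Enc := TEncoding.GetEncoding(CodePage);
  try
    try
      Back := Enc.GetBytes(Enc.GetString(Raw));
      Result := (Length(Back) = Length(Raw)) and
        CompareMem(Pointer(Back), Pointer(Raw), Length(Raw));
    except
      // Some RTL versions raise on bytes that are invalid in the encoding.
      on EEncodingError do
        Result := False;
    end;
  finally
    Enc.Free;
  end;
end;

var
  Raw: TBytes;
begin
  // Assumed sample: "PROCURA", then the raw bytes $C7 $C3, then "O".
  Raw := TBytes.Create($50, $52, $4F, $43, $55, $52, $41, $C7, $C3, $4F);

  Writeln('Code page 1252 keeps the bytes: ', SurvivesRoundTrip(Raw, 1252));
  Writeln('UTF-8 (65001) keeps the bytes:  ', SurvivesRoundTrip(Raw, 65001));
end.
```

The bytes $C7 $C3 followed by an ASCII letter are not valid UTF-8, so a conversion that assumes UTF-8 cannot return them unchanged. A TMemoryStream never performs such a conversion, so it avoids the problem entirely.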

I attached a project that uses LoadFromFile (which uses LoadFromStream with TFileStream internally), it works in my tests.
Attachments
RTFCharset.zip (7.4 KiB)

Post by tiagosis »

I use TMemoryStream. My _stream variable receives the blob with the RTF directly from the database, as shown below. The text is correct in the database.

// getting the RTF data
_stream := TMemoryStream.Create;

QrTxtOrigemTEXTO.SaveToStream(_stream);
_stream.Position := 0;

_streamDocX := TMemoryStream.Create;

try
Application.ProcessMessages;
// RichView1.Clear;
// RichView1.RTFReadProperties.DefCharsetCodePage := 1252;
// RichView1.LoadFromStream(_stream, rvynaYes, False);
// RichView1.Format;

SRichViewEdit1.Clear;
SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
SRichViewEdit1.RichViewEdit.Format;

Post by tiagosis »

I just ran a test with the RTF on disk and it worked correctly. The problem seems to be with the data coming from the database into _stream, which is a TMemoryStream.

Post by Sergey Tkachenko »

The data may have been damaged when it was saved to the database field.

Usually, RTF code contains only characters with codes less than 128. Such RTFs can be saved in any memo or binary field type, even in a Unicode memo.
However, some RTFs, including this one, contain characters of national alphabets as they are, without conversion to RTF character codes.
As a result, this RTF can be saved only in a database field that accepts any type of data without conversion.
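
One way to tell which kind of RTF a file is would be to scan it for bytes above 127. The console sketch below does this with the Delphi RTL only. The program name, the FirstNonAsciiOffset helper, and the messages are made up for the example.

```delphi
program RtfAsciiCheck;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper: offset of the first byte above 127, or -1 if there is none.
function FirstNonAsciiOffset(const Data: TBytes): Integer;
var
  I: Integer;
begin
  for I := 0 to High(Data) do
    if Data[I] > 127 then
      Exit(I);
  Result := -1;
end;

var
  Data: TBytes;
  Offset: Integer;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: RtfAsciiCheck <file.rtf>');
    Halt(1);
  end;

  // Read the file as raw bytes, without any text decoding.
  Data := TFile.ReadAllBytes(ParamStr(1));
  Offset := FirstNonAsciiOffset(Data);

  if Offset < 0 then
    Writeln('Only 7-bit bytes found: a text or memo field should be safe.')
  else
    Writeln('Byte $', IntToHex(Data[Offset], 2), ' at offset ', Offset,
      ': store this RTF in a binary field.');
end.
```

If the check reports a byte above 127, the file belongs to the second group described above and needs a field type that stores bytes without conversion.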

Post by tiagosis »

The problem is that I need to convert these RTFs to DOCX.

Post by Sergey Tkachenko »

Save the content of this stream to a file and send this file to me
(call _stream.SaveToFile before calling RichView.LoadFromStream)
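
A minimal sketch of that diagnostic step, using only the Delphi RTL. The file name blob_dump.rtf, the DumpForDiagnosis helper, and the sample bytes are placeholders; in the real code the stream would already be filled from the blob field.

```delphi
program DumpStreamDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: writes the whole stream to a file, then rewinds it
// so that a later LoadFromStream call starts reading at the beginning.
procedure DumpForDiagnosis(Stream: TMemoryStream; const FileName: string);
begin
  Stream.SaveToFile(FileName); // writes all bytes, regardless of Position
  Stream.Position := 0;
end;

var
  S: TMemoryStream;
  Raw: TBytes;
begin
  S := TMemoryStream.Create;
  try
    // Stand-in for the blob content; the real data would come from the database.
    Raw := TBytes.Create($7B, $5C, $72, $74, $66, $31, $7D); // the text {\rtf1}
    S.WriteBuffer(Raw[0], Length(Raw));

    DumpForDiagnosis(S, 'blob_dump.rtf');
    Writeln('Wrote ', S.Size, ' bytes; Position = ', S.Position);
  finally
    S.Free;
  end;
end.
```

Comparing the dumped file byte for byte with the RTF that loads correctly from disk would show whether the data were already changed before reaching the editor.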