RTF to Docx
Hello, I'm migrating my texts from RTF to DOCX, but the accented characters are being corrupted during the conversion. How can I solve this?
Here's an example.
RTF: PROCURAÇÃO
...after conversion to DOCX
DOCX: PROCURAÇÃO
-
- Site Admin
- Posts: 17839
- Joined: Sat Aug 27, 2005 10:28 am
- Contact:
Re: RTF to Docx
Can you send me a sample RTF file? (to email richviewgmailcom, or attach it here)
Re: RTF to Docx
Yes, it is attached. The problem appears as soon as I import the RTF into the editor: the text already shows the wrong encoding. However, when I open it in the DBForge viewer, the text displays correctly.
- Attachments
- databseViewer.jpg (183.45 KiB)
- IN_SRichEdit.jpg (155.88 KiB)
- rtf demo.txt (5.78 KiB)
- Site Admin
Re: RTF to Docx
This RTF is not completely correct. An RTF file should specify a character set for its text so that it loads correctly.
This RTF specifies \fcharset1, which means DEFAULT_CHARSET.
As a result, the characters in the file depend on the main Windows code page of the computer where it is read.
I have Cyrillic as the default code page on my computer. So, when I load this document (in MS Word, WordPad, or TRichView), I see the main heading as "PROCURAЗГO".
Unfortunately, the current version of TRichView does not have a property to override the code page used when RTF contains DEFAULT_CHARSET text: the main Windows code page is always used.
In my working version, I added a new property, TRichView.RTFReadProperties.DefCharsetCodePage.
If you assign 1252 to this property, your file will be loaded correctly, with the main heading as "PROCURAÇÃO".
This change will be included in the next update.
If you want, I can send modified units to you by email.
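To illustrate the code page dependence described above, here is a small language-neutral sketch in Python (it is not TRichView code, and the variable names are only illustrative). It assumes the heading was stored as single-byte Windows-1252 text and decodes the same bytes with two different code pages:

```python
# The heading "PROCURAÇÃO", written with escapes so the source stays ASCII.
heading = "PROCURA\u00c7\u00c3O"

# Bytes as a system with ANSI code page 1252 would store them.
raw = heading.encode("cp1252")

# The same bytes read with two different "default" code pages.
as_western = raw.decode("cp1252")   # Western European (1252)
as_cyrillic = raw.decode("cp1251")  # Cyrillic (1251)

print(as_western)
print(as_cyrillic)
```

The bytes never change; only the code page used to interpret them does, which is why the result differs from one computer to another when the RTF says DEFAULT_CHARSET.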
Re: RTF to Docx
It would be great if you could send it to me. I have a very urgent need, and this would tide me over until the component is updated.
I looked for the property in "SRichViewEdit1.RTFReadProperties...." but I couldn't find it. Maybe it's in your units. I'll wait for you to send them and, if possible, an example of how I should use the property. Thank you.
- Site Admin
Re: RTF to Docx
Sent
Re: RTF to Docx
What am I doing wrong? I copied the units to "C:\Components\TRichView\TRichView\Source".
I recompiled and tried it in two ways, but the accented characters are still wrong:
RichView1.Clear;
RichView1.RTFReadProperties.DefCharsetCodePage := 1242;
RichView1.LoadFromStream(_stream, rvynaYes, False);
RichView1.Format;
//SRichViewEdit1.Clear;
//SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
//SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
//SRichViewEdit1.RichViewEdit.Format;
Re: RTF to Docx
Sorry, I meant 1252. However, even 1252 didn't work.
- Site Admin
Re: RTF to Docx
Most probably, the data in _stream are already corrupted.
How do you load the document into it? For RTF, do not use TStringStream; use TMemoryStream instead. Converting RTF code to Unicode must be avoided, because RTF may contain characters of different code pages, and they would be damaged.
I attached a project that uses LoadFromFile (which uses LoadFromStream with TFileStream internally); it works in my tests.
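The difference between a byte-preserving stream and a text round trip can be sketched as follows. This is a Python analogy under stated assumptions, not Delphi code: the RTF fragment is hypothetical, and the UTF-8 decode stands in for whatever conversion a string-based stream might apply.

```python
import io

# Hypothetical RTF fragment containing raw 8-bit (Windows-1252) characters.
raw = b"{\\rtf1 PROCURA\xc7\xc3O}"

# A byte stream returns exactly what was written to it.
byte_stream = io.BytesIO(raw)
bytes_preserved = byte_stream.getvalue() == raw

# A text round trip with a mismatched encoding: the 8-bit bytes are not
# valid UTF-8, so they are replaced and cannot be recovered afterwards.
text = raw.decode("utf-8", errors="replace")
round_tripped = text.encode("utf-8")
text_preserved = round_tripped == raw

print(bytes_preserved)
print(text_preserved)
```

Once the replacement has happened, no code page setting at load time can restore the original characters, so the stream has to carry the untouched bytes.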
- Attachments
- RTFCharset.zip (7.4 KiB)
Re: RTF to Docx
I use TMemoryStream. My _stream variable receives the BLOB with the RTF directly from the database, as shown below. The text is correct in the database.
// getting the RTF data..
_stream := TMemoryStream.Create;
QrTxtOrigemTEXTO.SaveToStream(_stream);
_stream.Position := 0;
_streamDocX := TMemoryStream.Create;
try
Application.ProcessMessages;
// RichView1.Clear;
// RichView1.RTFReadProperties.DefCharsetCodePage := 1252;
// RichView1.LoadFromStream(_stream, rvynaYes, False);
// RichView1.Format;
SRichViewEdit1.Clear;
SRichViewEdit1.RichViewEdit.LoadFromStream(_stream,rvynaAuto);
SRichViewEdit1.RTFReadProperties.DefCharsetCodePage := 1242;
SRichViewEdit1.RichViewEdit.Format;
Re: RTF to Docx
I just ran a test with the RTF on disk and it worked correctly. The problem seems to be with the data coming from the database into _stream (a TMemoryStream)...
- Site Admin
Re: RTF to Docx
The data may have been damaged when they were saved in the database field.
Usually, RTF code contains only characters in the 7-bit ASCII range (codes below 128). Such RTFs can be saved in any memo or binary field type, even in a Unicode memo.
However, some RTFs, including this one, contain characters of national alphabets as they are, without conversion to RTF character codes.
As a result, this RTF can be saved only in a database field that accepts any type of data without conversion.
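The distinction between the two kinds of RTF can be checked mechanically. The following Python sketch uses hypothetical helper names and assumes a single-byte character set; it tests whether data are pure 7-bit and shows how raw 8-bit characters look when written as standard RTF \'hh hex escapes:

```python
def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte is in the 7-bit ASCII range."""
    return all(b < 0x80 for b in data)


def escape_eight_bit(data: bytes) -> bytes:
    """Rewrite each byte >= 0x80 as an RTF \\'hh hex escape."""
    out = bytearray()
    for b in data:
        if b < 0x80:
            out.append(b)
        else:
            out += b"\\'%02x" % b
    return bytes(out)


raw_heading = b"PROCURA\xc7\xc3O"        # raw 8-bit characters, as in this file
escaped = escape_eight_bit(raw_heading)  # 7-bit form of the same text

print(is_seven_bit(raw_heading))
print(escaped)
print(is_seven_bit(escaped))
```

Only the escaped form survives storage in a field that converts text; the raw form needs a binary field that stores bytes unchanged.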
Re: RTF to Docx
The problem is that I need to convert these RTFs to DOCX.
- Site Admin
Re: RTF to Docx
Save the content of this stream to a file and send the file to me
(call _stream.SaveToFile before calling RichView.LoadFromStream).