HTML paste and HTML export
Posted: Wed Feb 22, 2006 8:13 am
by K.A.
I pasted Russian text from Internet Explorer.
Everything looks OK in the editor.
But then I use SaveHTMLToStream to produce HTML text,
and I have a problem: the text comes out like this: программируемые
Another case: I entered text in the editor from the keyboard, not from the clipboard.
Everything is OK; in the HTML my Russian symbols are readable.
I think this is because of 2-byte Unicode coding. But where?
How can I correct this?
Posted: Wed Feb 22, 2006 9:18 am
by K.A.
Sorry, my incorrect text was automatically converted to correct text by the forum,
so here is only one symbol:
е
Each such symbol is written as "&#" followed by 4 digits.
Posted: Wed Feb 22, 2006 7:45 pm
by Sergey Tkachenko
This means your TRichView uses Unicode.
By default, TRichView saves non-English Unicode characters using their codes, like &#1077; (the character е). This is correct, but files may become too large.
A more elegant way to save a Unicode document is to use UTF-8 encoding: include rvsoUTF8 in the Options parameter of SaveHTML/SaveHTMLEx.
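A minimal sketch of the UTF-8 export described above. It assumes that SaveHTMLToStream takes (Stream, Path, Title, ImagesPrefix, Options), as in TRichView versions of that period, and that rvsoUTF8 is accepted there in the same way as in SaveHTML/SaveHTMLEx. SaveDocAsUtf8Html and the title and image-prefix values are hypothetical; check the declarations in your version.

```pascal
uses
  Classes, RichView, RVStyle; // assumed units; RVStyle is assumed to declare rvsoUTF8

// Hypothetical helper (not part of TRichView): writes the document
// to a stream as UTF-8 HTML instead of &#NNNN; character codes.
function SaveDocAsUtf8Html(rv: TCustomRichView; Stream: TStream): Boolean;
begin
  // Path, Title and ImagesPrefix are placeholder values for illustration.
  Result := rv.SaveHTMLToStream(Stream, '', 'Document', 'img', [rvsoUTF8]);
end;
```

Whatever displays or receives the result has to read it as UTF-8; if the same bytes are interpreted in a single-byte code page, the Russian text will look garbled.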
Posted: Thu Feb 23, 2006 4:05 am
by K.A.
Thanks, but I believe I have already set NO UNICODE in all places in dbinspector. Maybe there is another place where Unicode can be turned off? I do not need it.
I simply want my symbols to be readable. And they are correct if I type from the keyboard rather than paste from the clipboard (Word 2003 or Internet Explorer).
I assume that the problem is in HTMLImport. Can I fix it so that the import does not use Unicode?
I tried rvsoUTF8; as a result I see ÑовмеÑÑ‚Ð¸Ð¼Ñ, which is not readable either.
OK... I wrote a workaround that decodes codes like &#1077; and replaces them with Russian symbols.
But this is not a fully correct way, because the HTML text is large and complex.
I am not sure my parser correctly finds all the wanted tags, and only those.
Posted: Sun Feb 26, 2006 10:23 am
by Sergey Tkachenko
Characters like &#1077; in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.
Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine if it should create Unicode styles or not (if ClearDocument=False).
I can explain how to convert existing documents to ANSI, but TRichView cannot save multilingual ANSI files (containing text of different charsets) in HTML properly. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved properly without Unicode: only the Russian or only the Greek text can be viewed normally.
UTF-8 files can be viewed and edited in capable text editors. For example, the standard WinXP Notepad supports UTF-8.
Posted: Mon Feb 27, 2006 10:22 am
by K.A.
Sergey Tkachenko wrote:Characters like &#1077; in TRichView's HTML export mean that the text in TRichView has Unicode encoding, at least partially.
That is about the export operation. But what about import? How can I turn off Unicode in import?
Sergey Tkachenko wrote:Older versions of RvHtmlImporter did not use Unicode for adding text. The newest version uses TextStyle[0].Unicode to determine if it should create Unicode styles or not (if ClearDocument=False).
I use TextStyle[0].Unicode=false.
Sergey Tkachenko wrote:
I can explain how to convert existing documents to ANSI, but TRichView cannot save multilingual ANSI files (containing text of different charsets) in HTML properly. RUSSIAN_CHARSET contains both Russian and English characters, so there is no problem here, but a Russian+Greek document cannot be saved properly without Unicode: only the Russian or only the Greek text can be viewed normally.
I do not need any other charset. Only Russian and English.
Sergey Tkachenko wrote:
UTF-8 files can be viewed and edited in capable text editors. For example, the standard WinXP Notepad supports UTF-8.
I know. But I post the exported HTML to a system that does not process Unicode correctly.
Posted: Thu Mar 02, 2006 5:30 pm
by Sergey Tkachenko
Unicode characters in RichView HTML output mean that this text has Unicode encoding in TRichView.
You can convert a document from Unicode using the ConvertFromUnicode procedure:
Code:
  // Assumed units (may differ by version): RichView, RVStyle, CRVData, RVItem, RVTable

  // Converts all Unicode text items in RVData to ANSI, recursing into table cells.
  procedure ConvertRVFromUnicode(RVData: TCustomRVData);
  var
    i, r, c, StyleNo: Integer;
    table: TRVTableItemInfo;
  begin
    for i := 0 to RVData.ItemCount-1 do begin
      StyleNo := RVData.GetItemStyle(i);
      if StyleNo>=0 then begin
        // Text item: non-negative style numbers index TextStyles
        if RVData.GetRVStyle.TextStyles[StyleNo].Unicode then begin
          RVData.Items[i] := RVData.GetItemTextA(i);
          Exclude(RVData.GetItem(i).ItemOptions, rvioUnicode);
        end;
      end
      else if StyleNo=rvsTable then begin
        // Table item: convert every existing cell (nil cells are skipped)
        table := TRVTableItemInfo(RVData.GetItem(i));
        for r := 0 to table.Rows.Count-1 do
          for c := 0 to table.Rows[r].Count-1 do
            if table.Cells[r,c]<>nil then
              ConvertRVFromUnicode(table.Cells[r,c].GetRVData);
      end;
    end;
  end;

  // Converts the whole document, then marks every text style as non-Unicode.
  procedure ConvertFromUnicode(rv: TCustomRichView);
  var
    i: Integer;
  begin
    ConvertRVFromUnicode(rv.RVData);
    for i := 0 to rv.Style.TextStyles.Count-1 do
      rv.Style.TextStyles[i].Unicode := False;
  end;
To make sure that this conversion will be to Russian, you can call:
Code:
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;
before calling ConvertFromUnicode.
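Putting the steps in order, one possible sequence is sketched below. It assumes the ConvertFromUnicode procedure from this post is in scope and that SaveHTMLToStream takes (Stream, Path, Title, ImagesPrefix, Options), as in TRichView versions of that period. ExportRussianAnsiHtml and the title and image-prefix values are hypothetical.

```pascal
// RUSSIAN_CHARSET is declared in the Windows unit.
// Hypothetical helper: forces the Russian charset, converts the document
// from Unicode to ANSI in place, then exports HTML without rvsoUTF8.
function ExportRussianAnsiHtml(rv: TCustomRichView; Stream: TStream): Boolean;
var
  i: Integer;
begin
  // 1. Set the charset first, so the conversion is made to Russian.
  for i := 0 to rv.Style.TextStyles.Count-1 do
    rv.Style.TextStyles[i].Charset := RUSSIAN_CHARSET;
  // 2. Convert all text items, including those in table cells, to ANSI.
  ConvertFromUnicode(rv);
  // 3. Export; Path, Title and ImagesPrefix are placeholder values.
  Result := rv.SaveHTMLToStream(Stream, '', 'Document', 'img', []);
end;
```

Note that the conversion changes the document itself, not only the exported HTML.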