Page 1 of 2
loading a Unicode file line by line
Posted: Sat Apr 05, 2008 12:26 pm
by Cosmin3
Hi.
I have a question.
I'm loading a UTF8 Unicode file into a string array like this: I load the entire text into a string, then from the third character to the end I split it into lines (at #13#0#10#0).
After I make some modifications to the lines, I want to load them into RichViewEdit and then save all the text as Unicode or Ansi.
I tried with "AddTextNLW" but I see some strange characters in the text, and when I save I get a file that can't be loaded in any text editor.
What should I do? Please help me... Thank you.
Posted: Sun Apr 06, 2008 7:51 am
by Sergey Tkachenko
Posted: Sun Apr 06, 2008 8:44 am
by Cosmin3
I know this example (from Help), I tried that but it's not working...
This is because it's loading all the text at once - I'm loading line by line...
Posted: Mon Apr 07, 2008 2:09 pm
by Sergey Tkachenko
1. Load a line into s: String. It will contain text in UTF-8 encoding.
2. Convert the text to WideString (UTF-16): ws := UTF8Decode(s), where ws is a WideString.
3. Use AddNLWTag or AddTextNLW to add ws to TRichView.
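The three steps above could be sketched roughly as follows. This is only a sketch for a pre-Unicode Delphi, where String is AnsiString. The helper name AddUtf8Line and the parameter rv are made up for illustration, and the unit name in the comment is an assumption. UTF8Decode is the standard RTL function, and AddTextNLW is called with the same arguments that appear elsewhere in this thread.

```pascal
// Assumed units: SysUtils for the RTL, plus the RichView unit that
// declares TCustomRichView (assumed here to be RichView.pas).

// Hypothetical helper: adds one UTF-8 encoded line to a RichView control.
procedure AddUtf8Line(rv: TCustomRichView; const Utf8Line: AnsiString);
var
  ws: WideString;
begin
  // Step 2: convert the UTF-8 bytes to a UTF-16 WideString
  ws := UTF8Decode(Utf8Line);
  // Step 3: add the text using text style 0 and paragraph style 0
  rv.AddTextNLW(ws, 0, 0, 0, False);
end;
```

After all lines have been added, call Format once on the control, as in the other snippets in this thread.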
Posted: Mon Apr 07, 2008 2:41 pm
by Cosmin3
Yes, I do that but it's not working...
This is what I see in the editor:
http://www.imagehosting.gr/show.php/974 ... e.PNG.html
PS: I checked again the code that extracts the lines from text: works 100% fine.
The text should begin like that: "LETHAL WEAPON 4.#13#10Riggs,...".
Posted: Tue Apr 08, 2008 2:21 pm
by Sergey Tkachenko
Please post here your code
Posted: Tue Apr 08, 2008 2:46 pm
by Cosmin3
If I do that:
ScaleRichView.RichviewEdit.LoadTextW(FileName, 0, 0, False);
ScaleRichView.RichviewEdit.Format;
Then it works ok.
But If I do this:
Stream := TFileStream.Create(FileName, fmOpenRead);
SetLength(s, Stream.Size);
Stream.ReadBuffer(PChar(s)^, Stream.Size);
ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
Stream.Free;
ScaleRichView.RichviewEdit.Format;
Then I get the text as you see in the picture. This code is from TCustomRichView.LoadTextW >> TCustomRVData.LoadTextW >> TCustomRVData.LoadTextFromStreamW.
It doesn't matter if it's a line or an entire text and it doesn't matter if I use UTF8Decode or not.
Maybe I'm doing something wrong - but what is it?
Posted: Tue Apr 08, 2008 4:03 pm
by Sergey Tkachenko
If the file can be loaded by LoadTextW, it is not a UTF-8 file but a UTF-16 file (each character = 2 bytes).
Your code is not correct. You load the file into a String, so each Unicode character is spread over two adjacent one-byte characters. When you pass this string to the WideString parameter of TRichView.AddTextNLW, it is implicitly converted to WideString, which makes no sense for data like this.
Why does similar code work in TCustomRVData? Because TCustomRVData.AddTextNLW is different from TRichView.AddTextNLW and is intended for private use. While TRichView.AddTextNLW expects a WideString parameter, TCustomRVData.AddTextNLW expects a String containing data like yours (each Unicode character in two adjacent string characters).
The correct code:
Code: Select all
var
  s: WideString;
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead);
  try
    if Stream.Size mod 2 = 1 then
      raise Exception.Create('The file is not Unicode UTF-16')
    else begin
      SetLength(s, Stream.Size div 2);
      // read the raw UTF-16 bytes directly into the WideString buffer
      Stream.ReadBuffer(Pointer(s)^, Stream.Size);
      ScaleRichView.RichviewEdit.AddTextNLW(s, 0, 0, 0, False);
    end;
  finally
    Stream.Free;
  end;
  ScaleRichView.RichviewEdit.Format;
end;
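The code above reads the whole file, so if the file begins with a byte order mark (the first two bytes that the original question skips), the mark ends up as the first character of s. Whether AddTextNLW tolerates that is not stated in this thread, so the following is only a hedged sketch that strips it using the RTL alone. The function name LoadUtf16File is hypothetical, and little-endian UTF-16 is assumed.

```pascal
// Needs Classes (TFileStream) and SysUtils (Exception, fmOpenRead).

// Hypothetical helper: reads a little-endian UTF-16 file into a WideString
// and drops a leading byte order mark if one is present.
function LoadUtf16File(const FileName: string): WideString;
var
  Stream: TFileStream;
  ByteCount: Integer;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    ByteCount := Stream.Size;
    if Odd(ByteCount) then
      raise Exception.Create('Not a UTF-16 file: odd number of bytes');
    SetLength(Result, ByteCount div 2);
    if ByteCount > 0 then
      Stream.ReadBuffer(Pointer(Result)^, ByteCount);
  finally
    Stream.Free;
  end;
  // Bytes FF FE read as little-endian UTF-16 give the character $FEFF
  if (Length(Result) > 0) and (Result[1] = WideChar($FEFF)) then
    Delete(Result, 1, 1);
end;
```

The result could then be passed to AddTextNLW(s, 0, 0, 0, False) in the same way as s above.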
Posted: Tue Apr 08, 2008 5:02 pm
by Cosmin3
I understand, thank you very much for your help.
Posted: Sat Apr 12, 2008 10:24 am
by Cosmin3
Just one small problem.
For example, I have the item "Hello world!" (index 0), for which IsFromNewLine returns True.
I insert a special character with Insert >> Symbol.
Now I have 3 items:
Item[0] = 'Hello'
Item[1] = special character
Item[2] = ' world!'
I understand that IsFromNewLine(2) = True (that's normal), but why does IsFromNewLine(0) also return True? What is strange is that the new line is not visible, and when I save the text with "Save..." it is not saved either.
I ask because I don't save the text with "Save..."; instead I get the text of each item with GetTextA/W and add #13#10 if IsFromNewLine returns True. In this case Item[1] is on a new line...
What can I do...?
Posted: Tue Apr 15, 2008 7:30 am
by Sergey Tkachenko
IsFromNewLine(0) is always true (because any document starts from a new line). If you use your own function getting text from RichView, do not add #13#10 for the 0th item.
Posted: Tue Apr 15, 2008 8:21 am
by Cosmin3
Thank you but it doesn't work well if I don't insert the character.
If I have the text on HDD:
Hello World!
How are you?
Item[0] = 'Hello World!' and IsFromNewLine(0) = True.
If I save now like you said the text becomes:
Hello World!How are you?
Posted: Tue Apr 15, 2008 3:31 pm
by Sergey Tkachenko
If there are two paragraphs, each containing 1 item, both IsFromNewLine(0) and IsFromNewLine(1) are True.
Code: Select all
text := '';
for i := 0 to Editor.ItemCount-1 do
begin
  if (i>0) and Editor.IsFromNewLine(i) then
    text := text + #13#10;
  if Editor.GetItemStyle(i)=rvsTab then
    text := text + #9
  else if Editor.GetItemStyle(i)>=0 then
    text := text + Editor.GetItemTextA(i);
end;
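To see the rule in isolation (a line break goes before every item that starts a new line, except item 0), here is a small console sketch that does not use RichView at all. TItem and JoinItems are hypothetical stand-ins: the FromNewLine field plays the role of the value IsFromNewLine(i) would return, and '*' stands in for an inserted symbol.

```pascal
program JoinItemsDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for one text item of a document
  TItem = record
    Text: string;
    FromNewLine: Boolean; // what IsFromNewLine(i) would return
  end;

// Joins item texts, adding a line break before each item that starts
// a new line, but never before the first item.
function JoinItems(const Items: array of TItem): string;
var
  i: Integer;
begin
  Result := '';
  for i := 0 to High(Items) do
  begin
    if (i > 0) and Items[i].FromNewLine then
      Result := Result + #13#10;
    Result := Result + Items[i].Text;
  end;
end;

var
  Doc: array[0..3] of TItem;
begin
  // First paragraph, split into three items by an inserted symbol
  Doc[0].Text := 'Hello';        Doc[0].FromNewLine := True;
  Doc[1].Text := '*';            Doc[1].FromNewLine := False;
  Doc[2].Text := ' world!';      Doc[2].FromNewLine := False;
  // Second paragraph, a single item
  Doc[3].Text := 'How are you?'; Doc[3].FromNewLine := True;
  WriteLn(JoinItems(Doc));
end.
```

With these values the three pieces of the first paragraph stay on one line and only the last item starts a second line.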
Posted: Tue Apr 15, 2008 3:59 pm
by Cosmin3
Seems to be working. Thank you.
Posted: Tue Apr 15, 2008 8:32 pm
by Cosmin3
Sorry to bother you again, but I ran into a text where I can't use your code.
It looks like this:
Item[0] = 'LETHAL WEAPON 4.' IsFromNewLine = True
Item[1] = 'Riggs, are you ...' IsFromNewLine = True
I insert a character in the first item. Now it's like this:
Item[0] = 'LETHAL' IsFromNewLine = True
Item[1] = character IsFromNewLine = False
Item[2] = ' WEAPON 4.' IsFromNewLine = False
Item[3] = 'Riggs, are you...' IsFromNewLine = True
The problem is that IsFromNewLine(2) switched from True to False.
And it's not happening only to the first line of the text. Everywhere I break an item (which is a line) into three, the first piece has IsFromNewLine True and the last has False.
If I convert the text to RTF, load the file with LoadRtf, and test the items before and after I insert the character, it's the same thing.
PS: it's a normal text, nothing special about it but if you want I will send it to you.