DIUnicode provides Unicode Reader and Unicode Writer classes for Delphi (Embarcadero / CodeGear / Borland). The classes implement automatic and native decoding and/or encoding of 150+ character sets when linked against DIConverters.

DIUnicode's Pascal implementation features more than 70 encodings, such as UTF-7, UTF-8, UTF-16, the ISO-8859 family, various Windows and Macintosh codepages, KOI8 character sets, Chinese GB18030, and more. Adding a new character coding is as simple as writing a single conversion procedure.

Key Benefits

DIUnicode is for you if your application needs to handle text with multiple character encodings with high performance and little development time.

Both the Unicode Reader and the Unicode Writer work with strings, buffers, and streams. You can, for example, directly read from or write to database BLOB streams, avoiding all temporary storage of your data.

An efficient buffering system guarantees excellent performance, even when processing huge files.

Simple Usage Examples

DIUnicode makes reading and writing Unicode as simple as ASCII text, regardless of the character set or encoding you are processing. The code snippets below show some of the techniques usually applied with TDIUnicodeReader, the reader class of DIUnicode. Remember that you can use the parsing routine unchanged with any of the available encodings.

Read entire lines from a Unicode text file:
Read individual characters only:
Use overloaded methods to read up to a particular character or a set of characters:
Advanced parsing:
Performance

DIUnicode is extremely fast, even when processing very large files. Both the reader and the writer classes benefit from their internal buffers, which allow them to read and write files one small chunk of data at a time. DIUnicode will never require you to fit the entire file into memory. This way it achieves conversion rates of well over 20 MB per second.
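To make the chunk-at-a-time idea concrete, here is a minimal sketch that uses only the standard TStream / TFileStream classes. It is not DIUnicode's implementation and does not use its API: the names ChunkedReadSketch, ChunkSize and CountLineFeeds are hypothetical and chosen for illustration, and the program assumes that a file named MyFile.txt exists in the working directory. It scans the file through a fixed-size buffer, so memory use stays constant no matter how large the file is.

```pascal
program ChunkedReadSketch;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  ChunkSize = 4096; // hypothetical buffer size, chosen for illustration only

{ Counts line feed bytes in a stream while holding at most
  ChunkSize bytes of the data in memory at any time. }
function CountLineFeeds(Source: TStream): Int64;
var
  Buffer: array[0..ChunkSize - 1] of Byte;
  BytesRead, i: Integer;
begin
  Result := 0;
  repeat
    // TStream.Read returns the number of bytes actually read;
    // 0 means the end of the stream has been reached.
    BytesRead := Source.Read(Buffer, SizeOf(Buffer));
    for i := 0 to BytesRead - 1 do
      if Buffer[i] = 10 then
        Inc(Result);
  until BytesRead = 0;
end;

var
  Stream: TFileStream;
begin
  // Assumes MyFile.txt exists; TFileStream.Create raises an exception otherwise.
  Stream := TFileStream.Create('MyFile.txt', fmOpenRead or fmShareDenyWrite);
  try
    WriteLn('Line feeds: ', CountLineFeeds(Stream));
  finally
    Stream.Free;
  end;
end.
```

The sketch works on raw bytes and knows nothing about character encodings, so the count is only meaningful for ASCII-compatible encodings such as UTF-8. Decoding the bytes into characters is the part a reader class like TDIUnicodeReader adds on top of this kind of buffering.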
Delphi code (read entire lines from a Unicode text file):
{ Setup and initialize. }
Reader := TDIUnicodeReader.Create(nil);
{ Let's say we want to read UTF-8. This could well be any other
  character encoding. }
Reader.ReadMethods := Read_Utf_8;
Reader.SourceStream := TFileStream.Create('MyFile.txt', fmOpenRead);
{ Now the actual reading: }
while Reader.ReadLine do
begin
  TheLine := Reader.DataAsStrW;
  { Your code to process the line goes here. }
end;
Delphi code (read individual characters):
while Reader.ReadChar do
begin
  TheChar := Reader.Char;
  case TheChar of
    'A'..'Z':
      ; // Process alphas
    '0'..'9':
      ; // Process digits
  end;
end;
Delphi code (read up to a particular character or set of characters):
{ Read all characters up to the Dollar sign. }
Reader.ReadCharsTill('$');
{ Read all characters up to either '(' or ')'. }
Reader.ReadCharsTill('(', ')');
{ Skip rest of line and advance to next one. }
Reader.SkipLine;
- An RFC-compliant CSV parser is part of DIUnicode. Source code is available as a feature demonstration.
- The popular DIHtmlParser is built on top of DIUnicode. It implements a full-featured HTML, XHTML and XML parser with Unicode support and a flexible plugin architecture.
Delphi code (peek ahead without consuming characters):
var
  UR: TDIUnicodeReader;
  c: WideChar;
begin
  { ... TDIUnicodeReader creation
    and initialization should go here ... }
  UR.PeekAhead(5); // Read up to 5 characters into the internal buffer.
  if UR.PeekedCount >= 1 then // Test if the 1st peeked character could be read ...
    c := UR.PeekedChars[0]; // ... and examine it.
  if UR.PeekedCount >= 5 then // Same as above ...
    c := UR.PeekedChars[4]; // ... but with the 5th peeked character now.
  if UR.ReadChar then // Continue reading with the next char.
    c := UR.Char;
end;