Internationalization and Unicode Tutorial
XenCraft: Your Source for International and Unicode Training
This tutorial, created by Tex Texin, a leader in software and Web internationalization,
is available to be presented at your site. It can be customized for your organization and its development environment. On-site delivery gives
your development and QA staff expert training with minimal time away from their projects.
To schedule a tutorial for your staff, contact XenCraft.
AGENDA: Internationalization and Unicode Tutorial
Networking and Objective Setting
Attendees will introduce themselves and state their goals in attending the tutorial.
Speakers will introduce themselves and review objectives, customizations, and logistics for the tutorial.
Introduction- What are the business drivers for internationalization?
- Business Without Borders
- Opportunities Internationally
- Opportunities on the Web
- Business and Economic Forces at Work
- ROI
Technological drivers for Unicode and Internationalization
- In Software applications
- On the World Wide Web
- Multilingual applications
World Tour: Regional Customs Affecting Software Design and Implementation, and Efficient Solutions
- Graphics
- Data Formats (Calendars, Dates, Times, Numbers, Currency, Addresses, etc.)
- Linguistic Software Requirements (Externalization, Argument Substitution, Text Expansion, Word Order, Collation, etc.)
- Rendering, Fonts, Writing Directions (Bidirectional, Vertical)
- Input methods
Writing Systems Around the World
A survey of languages and writing systems including ideographic, bidirectional, and complex scripts.
(e.g. Chinese, Japanese, Korean, Thai, Indic, Hebrew, Arabic, and others.)
Models of Character Encoding
- Character Sets and Character Encodings- What are they? What problems do they create?
- Unicode and its Repertoire
- Character-Glyph Model
- Combining Characters
- Unicode Encoding Model and its encodings - Scalar Values, CEF, CES, UTF-8, UTF-16, Surrogates, UTF-32, BOM, etc.
- Character properties (alphabetic, numeric, direction, case, etc.)
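
As a rough illustration of the encoding forms and surrogates listed above, the following Python sketch compares how many bytes one character occupies in UTF-8, UTF-16, and UTF-32, and derives a UTF-16 surrogate pair by hand. The function names `encoded_lengths` and `surrogate_pair` are hypothetical, chosen for this example only; the sketch is not taken from the tutorial materials.

```python
def encoded_lengths(ch):
    """Return the byte length of ch in each encoding form (no BOM)."""
    return {
        "utf-8": len(ch.encode("utf-8")),
        "utf-16": len(ch.encode("utf-16-be")),
        "utf-32": len(ch.encode("utf-32-be")),
    }


def surrogate_pair(ch):
    """Compute the UTF-16 surrogate pair for a supplementary code point."""
    cp = ord(ch)
    if cp < 0x10000:
        raise ValueError("BMP code points are not encoded with surrogates")
    offset = cp - 0x10000              # 20-bit value to split in two
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


if __name__ == "__main__":
    e_acute = "\u00e9"       # U+00E9, inside the Basic Multilingual Plane
    g_clef = "\U0001D11E"    # U+1D11E, outside the BMP
    print(encoded_lengths(e_acute))
    print(encoded_lengths(g_clef))
    print([hex(unit) for unit in surrogate_pair(g_clef)])
```

The `-be` codec variants are used so that Python does not prepend a BOM, which would otherwise inflate the UTF-16 and UTF-32 byte counts.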
Design Decisions
- Choosing the right UTF-n
- Migration to Unicode- programming changes for Unicode-enabling
- Transcoding- Converting legacy encodings to Unicode
- Typical problems with encoding conversions
- Characters that look alike- How to choose the right character
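
Two of the topics above, encoding conversion problems and look-alike characters, can be sketched briefly in Python. This is a minimal example under stated assumptions, not the tutorial's own material: the helper names `mojibake`, `decode_lenient`, and `describe` are hypothetical, and Latin-1 stands in for whichever legacy encoding a real system might misapply.

```python
import unicodedata


def mojibake(text):
    """Encode as UTF-8, then wrongly decode the bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def decode_lenient(data, encoding):
    """Decode bytes, substituting U+FFFD for undecodable sequences."""
    return data.decode(encoding, errors="replace")


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    # A mislabeled encoding corrupts text without raising any error.
    print(mojibake("caf\u00e9"))
    # Latin-1 bytes are not valid UTF-8; lenient decoding marks the damage.
    print(decode_lenient(b"caf\xe9", "utf-8"))
    # Three visually similar capital letters with distinct identities.
    for ch in ("A", "\u0410", "\u0391"):
        print(describe(ch))
```

Printing the code point and name is a simple way to tell which of several visually identical characters a string actually contains.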
Unicode Algorithms - Part I
- Bidirectional Algorithm
- Line-Breaking
- Regular Expressions and Unicode
Unicode Algorithms - Part II
- UCA- Unicode Collation Algorithm
- Tailoring collations
- Canonical Forms and Normalization
- When is normalization required or important?
- Choosing a normalization form
- Private Use Area, Gaiji Characters
- Unicode compression
- Comparing compression approaches
- Working in small spaces: Efficient storage for Unicode tables
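
The normalization topics in this session can be illustrated with Python's standard `unicodedata` module. The sketch below shows why two strings that render identically may compare unequal, and how canonical (NFC, NFD) and compatibility (NFKC) forms differ. The helper name `canonically_equal` is hypothetical, introduced only for this example.

```python
import unicodedata


def canonically_equal(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


if __name__ == "__main__":
    composed = "\u00e9"      # LATIN SMALL LETTER E WITH ACUTE
    decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

    print(composed == decomposed)                   # raw comparison
    print(canonically_equal(composed, decomposed))  # after normalization

    ligature = "\ufb01"      # LATIN SMALL LIGATURE FI
    # Canonical forms keep the ligature; compatibility forms fold it.
    print(unicodedata.normalize("NFC", ligature) == ligature)
    print(unicodedata.normalize("NFKC", ligature))
```

Compatibility forms discard distinctions such as ligatures, which is often useful for searching and matching but can be too lossy for storing original text.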
Migration Techniques
- Migration tools
- Estimating Unicode migration projects
- Unicode footprint requirements (disk, memory, etc.)
- Unicode and Databases (data types, field widths, indexes, queries, collation, database drivers, etc.)
- Multilingual text processing and issues
Unicode on the Wire
- Protocols and Standards on the Internet and the Web (e-mail, URLs, etc.): HTTP, IRI, IDN, Mail (MIME)
- HTML, XML, XHTML
- Encoding declarations and encoding negotiation
- Unicode versus Markup
- Reference Processing model
Unicode in Programming Languages
- identifiers
- parsers
- SQL
- Java
- C/C++
- C#
- Perl
- Debugging tips and tools
Localization with Unicode
Tools, Globalization Management Systems (GMS), translation memory supporting Unicode
Unicode and Real-World Issues
- Surrogates on Windows
- GB18030
- Oracle, SQL Server
- Security