
Comparing Binary Sort with Linguistic Sort

Text data can be sorted linguistically (according to the rules of a language), which is efficient for users, or by a binary ordering, which is efficient for computers. The tables on this page let you compare the two orderings.

Computer programs represent characters internally as numbers. Each character is assigned a unique number called a code point. In general, there is no pattern to the assignment of characters to numbers and no relation to ordering by language. For example, the letter "A" is assigned the value 65 decimal (41 hexadecimal) in many (ASCII-based) code pages. The letter "B" is assigned 66 decimal (42 hexadecimal). The lower-case letter "a" is assigned 97 decimal (61 hexadecimal).
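The values quoted above can be checked directly. The following is a minimal sketch in Python, which is used here only for illustration and is not part of the Progress environment discussed on this page. The helper name code_point is hypothetical.

```python
# Look up the code point of a character and show it in decimal and hexadecimal.
# For the ASCII letters used here, the Unicode code point returned by ord()
# is the same as the ASCII / Windows-1252 value.
def code_point(ch):
    """Return (decimal value, hexadecimal string) for a single character."""
    value = ord(ch)
    return value, format(value, "X")

for ch in ("A", "B", "a"):
    dec, hexa = code_point(ch)
    print(f"{ch!r}: {dec} decimal, {hexa} hexadecimal")
```

Note the gap of 32 between "A" and "a": every upper-case ASCII letter has a lower value than every lower-case one, which matters for the binary ordering described next.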

A program that uses "binary collation rules" sorts text by the code point values of its characters. If you expect upper- and lower-case letters to sort close together, you will therefore be disappointed. In the Binary Table below, you can see that punctuation characters are interspersed among the letters, and that numbers (including fractions, superscript values, etc.) are spread throughout the table. Accented characters are far from their unaccented base letters.
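To make the effect concrete, here is a small Python sketch of a binary sort. It assumes the text is encoded as Windows-1252 and orders strings purely by their byte values. The word list and the function name binary_sort are made up for illustration.

```python
# Binary collation sketch: order strings by their Windows-1252 byte values.
# "\u00e9" is the precomposed letter e-acute (code point E9 hexadecimal).
words = ["apple", "Zebra", "r\u00e9sum\u00e9", "resume", "rot", "Apple"]

def binary_sort(items):
    """Sort strings by comparing their cp1252-encoded bytes."""
    return sorted(items, key=lambda s: s.encode("cp1252"))

print(binary_sort(words))
# ['Apple', 'Zebra', 'apple', 'resume', 'rot', 'résumé']
```

"Zebra" sorts ahead of "apple" because every upper-case letter has a lower code point than every lower-case letter, and "résumé" lands after "rot" because the code point of "é" is higher than that of any unaccented letter.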

The second table relies on a linguistic, or language-based, set of collation rules from the Progress BASIC9 collation table. The BASIC9 rules are intended to meet expectations for sorting text in Romance languages. (There is a separate collation for Spanish.) This is a one-level sort that simply uses each character's code point as an index into the BASIC9 collation table to retrieve the character's sort weight, or rank. The rank determines the position of the character in the table. Some characters have the same rank and so are treated as equivalent for sorting purposes. For example, so that "resume" and "résumé" sort close together, the letters "e" and "é" are given identical ranks.
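The table lookup described above can be sketched in a few lines of Python. The weights below are hypothetical and are NOT the real BASIC9 values: the table starts out as the identity (binary order), and a single entry is overridden so that "é" shares the rank of "e". The names TABLE and sort_key are illustrative only.

```python
# One-level collation sketch: a 256-entry table indexed by Windows-1252
# code point that yields each character's rank (sort weight).
# Hypothetical weights for illustration; not the actual BASIC9 table.
TABLE = list(range(256))          # start with rank == code point
TABLE[0xE9] = TABLE[ord("e")]     # give e-acute the same rank as "e"

def sort_key(text):
    """Map each character's code point to its rank via the table."""
    return bytes(TABLE[b] for b in text.encode("cp1252"))

words = ["rot", "r\u00e9sum\u00e9", "resume", "rest"]
print(sorted(words, key=sort_key))
# ['rest', 'résumé', 'resume', 'rot']
```

Because "résumé" and "resume" produce identical keys, they are equivalent for sorting purposes. Python's sort is stable, so the two simply keep their input order relative to each other.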

Collation tables are used for two purposes: evaluating string comparisons (e.g. IF "string1" >= "string2" THEN...) and sorting (e.g. FOR EACH tablename BY fieldname). Linguistic collations make it easy for users to look up data within sorted lists. Binary collations (or collations based on languages or rules different from those the user is expecting) make it easy for users to make errors. If users do not see data where they expect it, they assume that there are no entries for the data and that the records do not exist. If the data has actually been sorted into a different location, any decision that users make on the assumption that there are no records will be incorrect.
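Both uses, comparison and sorting, can be driven by the same rank table. The Python sketch below assumes a hypothetical ordering in which each upper-case letter is followed immediately by its lower-case form. It handles only the letters A-Z and a-z, and it does not reproduce how Progress itself evaluates comparisons. The names ORDER, RANK and compare are made up for illustration.

```python
import string
from functools import cmp_to_key

# Hypothetical linguistic order: "AaBbCc...Zz". Rank = position in that string.
ORDER = "".join(u + l for u, l in zip(string.ascii_uppercase,
                                      string.ascii_lowercase))
RANK = {ch: i for i, ch in enumerate(ORDER)}

def compare(a, b):
    """Return -1, 0 or 1 by comparing the rank sequences of a and b."""
    ka = [RANK[ch] for ch in a]
    kb = [RANK[ch] for ch in b]
    return (ka > kb) - (ka < kb)

# String comparison: binary rules versus the rank table.
print("apple" >= "Zebra")               # True: 'a' (97) > 'Z' (90)
print(compare("apple", "Zebra") >= 0)   # False: 'a' ranks before 'Z'

# Sorting with the same two rule sets.
names = ["banana", "Zebra", "apple", "Banana"]
print(sorted(names))                           # binary order
print(sorted(names, key=cmp_to_key(compare)))  # rank-table order
```

Under binary rules a user scanning the list for "apple" near the top would not find it there, since it follows "Zebra". With the rank table it appears first, where the user expects it.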

Using the two tables below, you can picture how queries will order their results and compare that with the expectations of users who speak Romance languages. Note that in the BASIC9 table some characters have identical sort weights, but the table does not indicate which ones.

Progress developers using Unicode now have an alternative to binary collation. The XenCraft product XenPUC enables the 4GL and the database to compare and sort Unicode text linguistically.

Table for Binary Sort (Characters ordered by Windows-1252 code point assignments)

! " # $ % & ' ( ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ] ^ _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z { | } ~  € ‚ ƒ „ … † ‡ ˆ ‰ Š ‹ Œ Ž ‘ ’ “ ” • – — ˜ ™ š › œ ž Ÿ   ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ­ ® ¯ ° ± ² ³ ´ µ ¶ · ¸ ¹ º » ¼ ½ ¾ ¿ À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö × Ø Ù Ú Û Ü Ý Þ ß à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ÷ ø ù ú û ü ý þ ÿ

Table for Linguistic Sort (Progress BASIC9 Collation)

" ! ¡ # $ € ¢ £ ¤ ¥ % ‰ & ' ( ) * + , - ± . / × ÷ ¼ ½ ¾ 0 1 ¹ 2 ² 3 ³ 4 5 6 7 8 9 : ; < = > ? ¿ @ [ \ ] ^ _ ` { | } ~ ‚ „ … † ‡ ˆ ‹ ‘ ’ “ ” • – — ˜ ™ › ¦ § ¨ © « ¬ ­ ® ¯ ° ´ µ ¶ · ¸ » A a À à Á á Â â Ã ã Ä Æ ä æ Å å ª B b C c Ç ç D d Ð ð E e È è É é Ê ê Ë ë F f ƒ G g H h I i Ì ì Í í Î î Ï ï J j K k L l M m N n Ñ ñ O o Ò ò Ó ó Ô ô Õ õ Ö ö Œ œ º Ø ø P p Q q R r S s ß Š š T t Þ þ U u Ù ù Ú ú Û û Ü ü V v W w X x Y y Ý ý Ÿ ÿ Z z Ž ž