##Adobe File Version: 1.000
#=======================================================================
#   FTP file name:  KOREAN.TXT
#
#   Contents:       Map (external version) from Mac OS Korean
#                   encoding to Unicode 2.1
#
#   Copyright:      (c) 1996-1999 by Apple Computer, Inc., all rights
#                   reserved.
#
#   Contact:        charsets@apple.com
#
#   Changes:
#
#       b02  1999-Sep-22    Update contact e-mail address. Matches
#                           internal utom<b1>, ufrm<b2>, and Text
#                           Encoding Converter version 1.5.
#       n04  1998-Feb-05    Update to match internal utom<n9>, ufrm<n11>
#                           and Text Encoding Converter version 1.3:
#                           Use single variant tags instead of multiple
#                           tags and add mappings for many more
#                           characters; see details below. Also delete
#                           the Unicode 1.1 mappings, reorder into a
#                           single list, and rewrite the initial
#                           comments.
#       n01  1996-Sep-24    Before internal ufrm.
#
# Standard header:
# ----------------
#
#   Apple, the Apple logo, and Macintosh are trademarks of Apple
#   Computer, Inc., registered in the United States and other countries.
#   Unicode is a trademark of Unicode Inc. For the sake of brevity,
#   throughout this document, "Macintosh" can be used to refer to
#   Macintosh computers and "Unicode" can be used to refer to the
#   Unicode standard.
#
#   Apple makes no warranty or representation, either express or
#   implied, with respect to these tables, their quality, accuracy, or
#   fitness for a particular purpose. In no event will Apple be liable
#   for direct, indirect, special, incidental, or consequential damages
#   resulting from any defect or inaccuracy in this document or the
#   accompanying tables.
#
#   These mapping tables and character lists are subject to change.
#   The latest tables should be available from the following:
#
#   <ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/>
#   <ftp://dev.apple.com/devworld/Technical_Documentation/Misc._Standards/>
#
#   For general information about Mac OS encodings and these mapping
#   tables, see the file "README.TXT".
#
# Format:
# -------
#
#   Three tab-separated columns;
#   '#' begins a comment which continues to the end of the line.
#     Column #1 is the Mac OS Korean code (in hex as 0xNN or 0xNNNN)
#     Column #2 is the corresponding Unicode or Unicode sequence (in
#       hex as 0xNNNN, 0xNNNN+0xNNNN, etc.). Sequences of up to 5
#       Unicode characters are used here.
#     Column #3 is a comment containing the Unicode name.
#       In some cases an additional comment follows the Unicode name.
#
#   The entries are in Mac OS Korean code order. All one-byte
#   characters are at the beginning. The mappings are not complete
#   (about 140 characters are still unmapped); see below.
#
#   Some of these mappings require the use of corporate characters.
#   See the file "CORPCHAR.TXT" and notes below.
#
#   Control character mappings are not shown in this table, following
#   the conventions of the standard UTC mapping tables. However, the
#   Mac OS Korean encoding uses the standard control characters at
#   0x00-0x1F and 0x7F.
#
# Notes on Mac OS Korean:
# -----------------------
#
#   This table covers the standard Mac OS Korean encoding used in Mac OS
#   versions 7.1 and later, including the Korean Language Kit. The Mac OS
#   Korean encoding is based on EUC-KR, but it extends the low-byte range
#   and adds about 1140 characters using code points that are unassigned
#   in EUC-KR and code points using the extended low-byte range.
#
#   For Mac OS Korean, two-byte characters have first/lead/high byte in the
#   range 0xA1-0xFE, and second/trail/low byte in the range 0x41-0x7D or
#   0x81-0xFE (low bytes in the range 0x41-0x7D and 0x81-0xA0 are only used
#   with high bytes in the range 0xA1-0xAD).
#
#   1. Standard EUC-KR
#
#     This includes one-byte characters, which are usually the ASCII set. In
#     addition, it includes two-byte characters with both bytes in the range
#     0xA1-0xFE. The two-byte characters are from KSC 5601, but their code
#     points are transformed from the KSC 5601 range 0x2121-0x7E7E by adding
#     0x8080.
#
#   2. Mac OS Korean additions
#
#     a) One-byte additions
#
#       0x80  NO-BREAK SPACE
#       0x81  WON SIGN
#       0x82  EN DASH              alternate version; standard at 0xA1A9
#       0x83  COPYRIGHT SIGN
#       0x84  FULLWIDTH LOW LINE   alternate version; standard at 0xA3DF
#       0xFF  HORIZONTAL ELLIPSIS  alternate version; standard at 0xA1A6
#
#     b) Two-byte additions
#
#     These include various symbols and dingbat-like number and letter
#     forms. For all of these, the high byte is in the range 0xA1-0xAD.
#     Most of them use code points in the extended low-byte range
#     0x41-0x7D or 0x81-0xA0, although some use unassigned code points
#     in the standard EUC-KR range.
#
#     Many of these additional characters do not correspond to any
#     standard single Unicode character. See mapping issues, below.
#
# Unicode mapping issues and notes:
# ---------------------------------
#
#   1. Mapping the Apple two-byte additions
#
#   The goals in the mappings provided here are:
#   - Ensure roundtrip mapping from every character in the Mac OS Korean
#     encoding to Unicode and back
#   - Use standard Unicode characters as much as possible, to maximize
#     interchangeability of the resulting Unicode text. Whenever possible,
#     avoid having content carried by private-use characters.
#
#   Since not all of the Mac OS Korean characters correspond to
#   distinct, single Unicode characters, we employ various strategies.
#
#   a) Map a single Mac OS Korean character to a sequence of Unicode
#   characters
#
#     For example, the character 0xAA41 in the Apple additions is a
#     square Hangul dingbat. There is no single Unicode character for
#     this. However, it can be mapped to 0xC6B4+0x20DE, a Hangul syllable
#     + COMBINING ENCLOSING SQUARE.
#
#   b) Use private use characters to mark variants or groupings that
#   are similar to a sequence of one or more standard Unicode
#   characters.
#
#     Apple has defined a block of 32 corporate characters as "transcoding
#     hints."
#     These are used in combination with standard Unicode characters
#     to force them to be treated in a special way for mapping to other
#     encodings; they have no other effect. Sixteen of these transcoding
#     hints are "grouping hints" - they indicate that the next 2-4 Unicode
#     characters should be treated as a single entity for transcoding. The
#     other sixteen transcoding hints are "variant tags" - they are like
#     combining characters, and can follow a standard Unicode character (or
#     a sequence consisting of a base character and other combining
#     characters) to cause it to be treated in a special way for
#     transcoding. These always terminate a combining-character sequence.
#
#     The transcoding hints used in this mapping table are:
#
#       0xF860     group next 2 characters
#       0xF861     group next 3 characters
#       0xF862     group next 4 characters
#       0xF863     group next 4 characters, variant 1
#       0xF864     group next 4 characters, variant 2
#       0xF865     group next 4 characters, variant 3
#       0xF866     group next 4 characters, variant 4
#       0xF867     group next 2 characters, variant 1
#       0xF868     group next 2 characters, variant 2
#       0xF869     group next 2 characters, variant 3
#       0xF870-71  variant tags
#       0xF873-7D  variant tags
#       0xF87F     variant tag for other alternate forms
#
#     For example, the Apple addition character 0xA369 is a parenthesized
#     capital A. There is no single Unicode character for this (although
#     there are single Unicode characters for parenthesized small letters).
#     Using the grouping hint 0xF861 in combination with standard Unicode
#     characters, we can map this as 0xF861+0x0028+0x0041+0x0029,
#     i.e. ( + A + ) .
#
#     NOTE: About 140 of the Apple two-byte additions are still unmapped
#     (the mappings are being worked out). These are shown as comment lines
#     with the Mac OS Korean code point followed by a Unicode mapping of
#     0xNNNN (or by a probable Unicode mapping in a few cases).
#
#   2. Mapping the basic EUC-KR characters
#
#     The mappings for KSC 5601-1987 Hangul to Unicode 2.1 are based on
#     the KSC5601.TXT mapping table provided by the Unicode Consortium (UTC),
#     dated 24 July 1995, which was created by Lori Hoerth and K.D.Chang.
#
#     The mappings for KSC 5601-1987 non-Hangul characters are based on the
#     OLD5601.TXT mapping table provided by the Unicode Consortium (UTC),
#     dated 6 December 1993, which was created by Glenn Adams and John
#     Jenkins. That table is Copyright 1991-1994 by Unicode, Inc.
#
#     Some of the non-Hangul mappings were changed from the UTC mappings.
#     There were two reasons for this:
#     - To better match the meaning of the KSC 5601 character as described
#     in the KSC 5601 spec.
#     - If the UTC table mapped the KSC character to a "fullwidth" version
#     but there was no mapping to the "basic" version, then the mapping was
#     changed to the "basic" version. This is more consistent with the other
#     UTC mapping tables, which only map to a compatibility character (such
#     as a fullwidth version) to preserve roundtrip fidelity - i.e. when
#     there is another character in the source encoding that i...
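#
# Illustrative sketch (not part of the Apple tables):
# ---------------------------------------------------
#
#   The three-column format and the grouping hints described in the
#   comments above could be read roughly as follows. This is a minimal
#   sketch, not Apple's converter: the names Mapping, parse_mapping_line,
#   grouped_text and SAMPLE are hypothetical, and the comment text in
#   SAMPLE is made up for illustration. Only the code points for 0xA369
#   and the hint counts come from the notes above.

```python
from typing import NamedTuple, Optional, Tuple

# Grouping hints as listed in the notes: hint code point -> number of
# following Unicode characters to treat as a single entity.
GROUPING_HINTS = {
    0xF860: 2, 0xF861: 3, 0xF862: 4,
    0xF863: 4, 0xF864: 4, 0xF865: 4, 0xF866: 4,
    0xF867: 2, 0xF868: 2, 0xF869: 2,
}


class Mapping(NamedTuple):
    mac_code: int                 # column 1: Mac OS Korean code
    unicode_seq: Tuple[int, ...]  # column 2: one or more Unicode code points
    comment: str                  # column 3: text after the '#'


def parse_mapping_line(line: str) -> Optional[Mapping]:
    """Parse one table line; return None for blank and comment lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # includes the commented-out, still unmapped entries
    fields = stripped.split("\t")
    if len(fields) < 2:
        raise ValueError(f"expected tab-separated columns: {line!r}")
    mac_code = int(fields[0].strip(), 16)  # int() accepts the 0x prefix
    unicode_seq = tuple(int(p.strip(), 16) for p in fields[1].split("+"))
    comment = fields[2].lstrip("# ").strip() if len(fields) > 2 else ""
    return Mapping(mac_code, unicode_seq, comment)


def grouped_text(seq: Tuple[int, ...]) -> Optional[str]:
    """Return the characters covered by a leading grouping hint, if any."""
    if seq and seq[0] in GROUPING_HINTS:
        count = GROUPING_HINTS[seq[0]]
        return "".join(chr(cp) for cp in seq[1:1 + count])
    return None


# Hypothetical sample line built from the 0xA369 example in the notes.
SAMPLE = "0xA369\t0xF861+0x0028+0x0041+0x0029\t# parenthesized capital A"

if __name__ == "__main__":
    entry = parse_mapping_line(SAMPLE)
    print(hex(entry.mac_code), [hex(cp) for cp in entry.unicode_seq])
    print(grouped_text(entry.unicode_seq))
```

#   Variant tags (0xF870-71, 0xF873-7D, 0xF87F) follow the character they
#   modify rather than precede it, so this sketch leaves them in the
#   sequence untouched.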
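#
# Illustrative sketch (not part of the Apple tables):
# ---------------------------------------------------
#
#   The byte-range rules given under "Notes on Mac OS Korean" can be
#   checked with a few comparisons. The sketch below is only an
#   illustration; the function names are hypothetical, and the KSC 5601
#   conversion assumes each KSC 5601 byte lies in 0x21-0x7E, so that
#   adding 0x8080 places both bytes in 0xA1-0xFE.

```python
def is_valid_two_byte(lead: int, trail: int) -> bool:
    """Check a lead/trail byte pair against the Mac OS Korean ranges."""
    if not 0xA1 <= lead <= 0xFE:
        return False
    if 0xA1 <= trail <= 0xFE:
        return True  # standard EUC-KR trail-byte range
    # Extended low-byte range, only used with lead bytes 0xA1-0xAD.
    extended = 0x41 <= trail <= 0x7D or 0x81 <= trail <= 0xA0
    return extended and 0xA1 <= lead <= 0xAD


def ksc5601_to_euc_kr(code: int) -> int:
    """Shift a two-byte KSC 5601 code point into EUC-KR by adding 0x8080."""
    high, low = code >> 8, code & 0xFF
    if not (0x21 <= high <= 0x7E and 0x21 <= low <= 0x7E):
        raise ValueError(f"not a KSC 5601 code point: {code:#06x}")
    return code + 0x8080


if __name__ == "__main__":
    # 0xAA41 and 0xA369 are Apple additions mentioned in the notes.
    print(is_valid_two_byte(0xAA, 0x41), is_valid_two_byte(0xA3, 0x69))
    # The extended trail range is rejected for lead bytes above 0xAD.
    print(is_valid_two_byte(0xB0, 0x41))
    print(hex(ksc5601_to_euc_kr(0x2121)))
```

#   A check like this only says whether a byte pair falls in a range the
#   encoding uses; whether a given code point actually has a mapping is
#   still determined by the table entries themselves.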