Name                      Date         Size       #Lines  LOC

README                    23-Nov-2023  2.7 KiB    80      54
_codecs_cn.c              23-Nov-2023  9.2 KiB    448     335
_codecs_hk.c              23-Nov-2023  4.7 KiB    185     137
_codecs_iso2022.c         23-Nov-2023  33.2 KiB   1,132   974
_codecs_jp.c              23-Nov-2023  18.9 KiB   732     595
_codecs_kr.c              23-Nov-2023  12.1 KiB   453     358
_codecs_tw.c              23-Nov-2023  2 KiB      133     89
alg_jisx0201.h            23-Nov-2023  1.2 KiB    25      23
cjkcodecs.h               23-Nov-2023  12.8 KiB   403     340
emu_jisx0213_2000.h       23-Nov-2023  2.3 KiB    44      35
mappings_cn.h             23-Nov-2023  312.4 KiB  4,104   4,092
mappings_hk.h             23-Nov-2023  179.4 KiB  2,379   2,369
mappings_jisx0213_pair.h  23-Nov-2023  3.7 KiB    60      57
mappings_jp.h             23-Nov-2023  356.9 KiB  4,766   4,743
mappings_kr.h             23-Nov-2023  247.9 KiB  3,252   3,246
mappings_tw.h             23-Nov-2023  198.8 KiB  2,634   2,626
multibytecodec.c          23-Nov-2023  56 KiB     1,835   1,527
multibytecodec.h          23-Nov-2023  4.2 KiB    142     114
README

To generate or modify mapping headers
-------------------------------------
Mapping headers are imported from CJKCodecs in pre-generated form.
If you need to tweak or add to them, please look at the tools/
subdirectory of the CJKCodecs distribution.



Notes on implementation characteristics of each codec
-----------------------------------------------------

1) Big5 codec

  The big5 codec maps the following characters as cp950 does, rather
  than following Unicode.org's mapping, which maps them to U+FFFD.

    BIG5        Unicode     Description

    0xA15A      0x2574      SPACING UNDERSCORE
    0xA1C3      0xFFE3      SPACING HEAVY OVERSCORE
    0xA1C5      0x02CD      SPACING HEAVY UNDERSCORE
    0xA1FE      0xFF0F      LT DIAG UP RIGHT TO LOW LEFT
    0xA240      0xFF3C      LT DIAG UP LEFT TO LOW RIGHT
    0xA2CC      0x5341      HANGZHOU NUMERAL TEN
    0xA2CE      0x5345      HANGZHOU NUMERAL THIRTY

  Because Unicode 0x5341, 0x5345, 0xFF0F and 0xFF3C are already mapped
  to other big5 codes, roundtrip compatibility is not guaranteed for
  them.
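
  The mappings and the roundtrip caveat above can be checked from
  Python. This is a minimal sketch, assuming these sources are the ones
  behind the interpreter's built-in "big5" codec. The names
  README_BIG5_TABLE and roundtrips() are hypothetical helpers
  introduced only for this illustration.

```python
# Big5 byte pairs from the table above, keyed to the code points the
# README says they decode to.
README_BIG5_TABLE = {
    b"\xa1\x5a": "\u2574",
    b"\xa1\xc3": "\uffe3",
    b"\xa1\xc5": "\u02cd",
    b"\xa1\xfe": "\uff0f",
    b"\xa2\x40": "\uff3c",
    b"\xa2\xcc": "\u5341",
    b"\xa2\xce": "\u5345",
}


def roundtrips(raw):
    """Return True if decoding and re-encoding `raw` yields `raw` again."""
    text = raw.decode("big5")
    try:
        return text.encode("big5") == raw
    except UnicodeEncodeError:
        # The decoded character has no Big5 encoding at all.
        return False


if __name__ == "__main__":
    for raw, expected in README_BIG5_TABLE.items():
        text = raw.decode("big5")
        # Columns: Big5 bytes, decoded code point, matches README, roundtrips
        print(raw.hex(), f"U+{ord(text):04X}", text == expected, roundtrips(raw))
```

  For 0xA2CC and 0xA2CE the last column should be False, because
  U+5341 and U+5345 encode to their ordinary Big5 codes instead.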


2) cp932 codec

  To conform to Windows's actual mapping, the cp932 codec maps the
  following code points in addition to the official cp932 mapping.

    CP932     Unicode     Description

    0x80      0x80        UNDEFINED
    0xA0      0xF8F0      UNDEFINED
    0xFD      0xF8F1      UNDEFINED
    0xFE      0xF8F2      UNDEFINED
    0xFF      0xF8F3      UNDEFINED
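
  A short sketch of how the extra single-byte mappings above look from
  Python, assuming the built-in "cp932" codec is built from these
  sources. CP932_EXTRA and decode_single_bytes() are hypothetical
  names used only for this example.

```python
# Extra single-byte mappings from the table above: byte value -> character.
CP932_EXTRA = {
    0x80: "\x80",
    0xA0: "\uf8f0",
    0xFD: "\uf8f1",
    0xFE: "\uf8f2",
    0xFF: "\uf8f3",
}


def decode_single_bytes():
    """Decode each listed byte on its own with the cp932 codec."""
    return {b: bytes([b]).decode("cp932") for b in CP932_EXTRA}


if __name__ == "__main__":
    for byte, ch in decode_single_bytes().items():
        print(f"0x{byte:02X} -> U+{ord(ch):04X}")
```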


3) euc-jisx0213 codec

  The euc-jisx0213 codec maps JIS X 0213 Plane 1 code 0x2140 to
  Unicode U+FF3C instead of U+005C as in unicode.org's mapping.
  Because euc-jisx0213 already has REVERSE SOLIDUS at 0x5c and A1C0
  (the EUC form of 0x2140) is shown as a full-width character,
  mapping it to U+FF3C makes more sense.
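
  The distinction can be observed as follows. This is only a sketch,
  assuming the built-in "euc_jisx0213" codec comes from these sources.
  The bytes A1 C0 are JIS X 0213 Plane 1 code 0x2140 with the high bit
  set on each byte, and the two variable names are hypothetical.

```python
# JIS X 0213 Plane 1 code 0x2140 in its EUC form (each byte OR 0x80).
FULLWIDTH_BACKSLASH = b"\xa1\xc0".decode("euc_jisx0213")

# The single byte 0x5c stays a plain ASCII REVERSE SOLIDUS.
ASCII_BACKSLASH = b"\x5c".decode("euc_jisx0213")

if __name__ == "__main__":
    print(f"A1C0 -> U+{ord(FULLWIDTH_BACKSLASH):04X}")
    print(f"5C   -> U+{ord(ASCII_BACKSLASH):04X}")
```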

  The euc-jisx0213 codec can also decode JIS X 0212 codes on
  codeset 2. Because JIS X 0212 and JIS X 0213 Plane 2 do not
  overlap each other, this does not break standard conformance
  (and JIS X 0213 Plane 2 is intended to be used this way). When
  encoding, the codec tries to encode kanji characters in this
  order:

    JIS X 0213 Plane 1 -> JIS X 0213 Plane 2 -> JIS X 0212


4) euc-jp codec

  The euc-jp codec is a compatibility variant in these respects:
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to EUC-JP A1C0 (and vice versa)
   - U+00A5 YEN SIGN is mapped to EUC-JP 0x5c (one way)
   - U+203E OVERLINE is mapped to EUC-JP 0x7e (one way)
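
  The "one way" mappings above mean that encoding and then decoding
  does not return the original character. A small sketch, assuming the
  built-in "euc_jp" codec is built from these sources. The helper
  name euc_jp_roundtrip() is hypothetical.

```python
def euc_jp_roundtrip(ch):
    """Encode `ch` to EUC-JP and decode it back, returning both results."""
    encoded = ch.encode("euc_jp")
    return encoded, encoded.decode("euc_jp")


if __name__ == "__main__":
    for ch in ("\uff3c", "\u00a5", "\u203e"):
        encoded, decoded = euc_jp_roundtrip(ch)
        print(f"U+{ord(ch):04X} -> {encoded.hex()} -> U+{ord(decoded):04X}")
```

  U+FF3C comes back unchanged, while U+00A5 and U+203E come back as
  U+005C and U+007E respectively.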


5) shift-jis codec

  The shift-jis codec maps the 0x20-0x7e range directly to
  U+20-U+7E instead of using JIS X 0201, for compatibility. The
  differences are:
   - U+005C REVERSE SOLIDUS is mapped to SHIFT-JIS 0x5c.
   - U+007E TILDE is mapped to SHIFT-JIS 0x7e.
   - U+FF3C FULLWIDTH REVERSE SOLIDUS is mapped to SHIFT-JIS 815f.
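
  The three differences above can be checked directly. A minimal
  sketch, assuming the built-in "shift_jis" codec is built from these
  sources. SHIFT_JIS_CASES is a hypothetical name for this example.

```python
# Character -> Shift-JIS bytes expected from the list above.
SHIFT_JIS_CASES = {
    "\\": b"\x5c",          # U+005C REVERSE SOLIDUS
    "~": b"\x7e",           # U+007E TILDE
    "\uff3c": b"\x81\x5f",  # U+FF3C FULLWIDTH REVERSE SOLIDUS
}

if __name__ == "__main__":
    for ch, expected in SHIFT_JIS_CASES.items():
        encoded = ch.encode("shift_jis")
        print(f"U+{ord(ch):04X} -> {encoded.hex()}", encoded == expected)
```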