• Java & Character Encoding


    Contents

    I. Java & Character Encoding

    1. Java's char type

    2. The String class

    3. Charsets available in Java

    4. Converting bytes to a hexadecimal string


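    I. Java & Character Encoding

    1. Java's char type

    When Java processes characters internally it always uses Unicode, and the specific encoding is UTF-16BE. As a quick recap, UTF-16 represents a character with either two or four bytes: characters whose Unicode code point is below 65536 take two bytes, and characters above that range take four. BE (Big Endian) means the high-order byte is written first and the low-order byte second, the same layout as a big-endian integer.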
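    The byte layout described above can be observed by encoding a string explicitly. The following is a minimal sketch; the class name Utf16Bytes and the helper hex are made up for illustration, and the strings are written with Unicode escapes so the file compiles regardless of source encoding.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Bytes {
    // Hypothetical helper: formats bytes as space-separated upper-case hex, e.g. "8D 24".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF)); // mask so the byte is treated as unsigned
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String twoByte = "\u8d24";         // U+8D24, below 65536
        String fourByte = "\uD842\uDFB7";  // U+20BB7, above 65535

        // Big endian: high-order byte first
        System.out.println(hex(twoByte.getBytes(StandardCharsets.UTF_16BE)));  // 8D 24
        // Little endian: low-order byte first
        System.out.println(hex(twoByte.getBytes(StandardCharsets.UTF_16LE)));  // 24 8D
        // A character outside the two-byte range takes four bytes
        System.out.println(hex(fourByte.getBytes(StandardCharsets.UTF_16BE))); // D8 42 DF B7
    }
}
```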

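    A char is essentially an unsigned integer that always occupies two bytes. Its value is a Unicode code point, and it stands for the character assigned to that code point.

    Because it is fixed at two bytes, a char can only represent characters whose code point is below 65536 (U+0000 to U+FFFF); it cannot represent anything beyond that range.

    So how are characters outside that range represented? A single char cannot hold them, so they have to be stored in a String as two chars (a surrogate pair). For example, the Chinese character "𠮷" has the code point 0x20BB7, which clearly exceeds 65535, so it can only be written as a String. When it is pasted into source code, the editor may convert it automatically into the two chars "\uD842\uDFB7".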

    public class CharTest {
        public static void main(String[] args) {
            // Four ways to write the same char, U+8D24
            char c = '贤';        // character literal
            System.out.println(c);
            char c1 = 0x8d24;     // hexadecimal code point
            System.out.println(c1);
            char c2 = 36132;      // decimal code point
            System.out.println(c2);
            char c3 = '\u8d24';   // Unicode escape
            System.out.println(c3);
            // char c4 = '\uD842\uDFB7';  // does not compile: two code units do not fit in one char
            String s = "\uD842\uDFB7";    // U+20BB7 stored as a surrogate pair
            System.out.println(s);
        }
    }
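
    To work with characters above U+FFFF as single units, the standard library offers code point methods alongside the char-based ones. The sketch below is illustrative only; the class name CodePointDemo and the helper codePoints are assumptions, not part of the original article.

```java
public class CodePointDemo {
    // Hypothetical helper: returns the Unicode code points of s,
    // merging each surrogate pair into a single int.
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String s = "\uD842\uDFB7"; // the single character U+20BB7

        System.out.println(s.length());                  // 2 (UTF-16 code units)
        int[] cps = codePoints(s);
        System.out.println(cps.length);                  // 1 (Unicode character)
        System.out.println(Integer.toHexString(cps[0])); // 20bb7

        // Going the other way: build the surrogate pair from the code point.
        char[] units = Character.toChars(0x20BB7);
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
        System.out.println(new String(units).equals(s));         // true
    }
}
```

    Note that length() counts chars, so code that iterates by index over text containing such characters sees two entries per character.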

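    2. The String class

    getBytes(): called with no argument, this method encodes the string with the JVM's default charset. On the JDK 8 used in this article, that charset is taken from the file.encoding system property in effect when the java command starts; since JDK 18 the default is UTF-8 unless file.encoding is overridden.

    Under the hood, getBytes() relies on Charset.defaultCharset() (JDK 8 source):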

    public static Charset defaultCharset() {
        if (defaultCharset == null) {
            synchronized (Charset.class) {
                String csn = AccessController.doPrivileged(
                    new GetPropertyAction("file.encoding"));
                Charset cs = lookup(csn);
                if (cs != null)
                    defaultCharset = cs;
                else
                    defaultCharset = forName("UTF-8");
            }
        }
        return defaultCharset;
    }

    Example:

    import java.util.Arrays;

    public class StringTest {
        public static void main(String[] args) {
            // Print the property that selects the default charset
            System.out.println(System.getProperty("file.encoding"));
            String str = "你好";
            // No charset argument: the default charset is used
            byte[] bytes = str.getBytes();
            System.out.println(Arrays.toString(bytes));
        }
    }

    With explicit charsets:

    import java.util.Arrays;

    public class StringTest {
        public static void main(String[] args) throws Exception {
            String str = "你好";
            // Encode: String -> bytes
            byte[] bytes = str.getBytes("UTF-8");
            System.out.println(Arrays.toString(bytes)); // [-28, -67, -96, -27, -91, -67]
            byte[] gbks = str.getBytes("GBK");
            System.out.println(Arrays.toString(gbks));  // [-60, -29, -70, -61]
            // Decode: bytes -> String, using the charset the bytes were encoded with
            byte[] bytes1 = {-28, -67, -96, -27, -91, -67};
            String str1 = new String(bytes1, "UTF-8");
            System.out.println(str1); // 你好
            byte[] bytes2 = {-60, -29, -70, -61};
            String str2 = new String(bytes2, "GBK");
            System.out.println(str2); // 你好
        }
    }
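
    For the charsets every Java platform must support, the constants in java.nio.charset.StandardCharsets can be passed instead of a name string, which removes the checked UnsupportedEncodingException and the "throws Exception" clause. A minimal sketch follows; the class name StandardCharsetsDemo and its two helpers are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class StandardCharsetsDemo {
    // Hypothetical helper: encodes text as UTF-8; no checked exception to declare.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: decodes UTF-8 bytes back into a String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4f60\u597d"; // the two characters U+4F60 U+597D used in the examples above
        byte[] bytes = toUtf8(text);
        System.out.println(Arrays.toString(bytes));        // [-28, -67, -96, -27, -91, -67]
        System.out.println(fromUtf8(bytes).equals(text));  // true
    }
}
```

    StandardCharsets only covers US-ASCII, ISO-8859-1, UTF-8 and the UTF-16 variants; for GBK a lookup by name such as Charset.forName("GBK") is still needed.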

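    Reversible mojibake demo

    // Belongs in a class that imports java.util.Arrays
    public static void lmknCode() throws Exception {
        String str = "你好";
        byte[] bytes = str.getBytes("GBK");
        System.out.println(Arrays.toString(bytes));
        // Decoding with the wrong charset prints garbled text
        String str1 = new String(bytes, "UTF-8");
        System.out.println(str1);
        // The bytes themselves are untouched, so decoding them as GBK still recovers the text
        String str2 = new String(bytes, "GBK");
        System.out.println(str2); // 你好
    }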

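    Irreversible mojibake demo

    // Belongs in a class that imports java.util.Arrays
    public static void lmbknCode() throws Exception {
        String str = "你好";
        // ISO-8859-1 has no mapping for these characters, so each one becomes '?' (byte 63)
        byte[] bytes = str.getBytes("ISO-8859-1");
        System.out.println(Arrays.toString(bytes)); // [63, 63]
        // The original information is already lost; no charset can bring it back
        String str1 = new String(bytes, "GBK");
        System.out.println(str1); // ??
        String str2 = new String(bytes, "UTF-8");
        System.out.println(str2); // ??
    }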
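
    The difference between the two demos comes down to whether a step throws information away. The sketch below tries to make that measurable: it decodes bytes with a wrong charset, encodes the result back, and compares with the input. It also shows one way to detect unencodable text before calling getBytes. The class name RoundTripDemo and both helpers are hypothetical, and the GBK bytes are copied from the example output above so that the code does not depend on the GBK charset being installed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    // Hypothetical helper: decodes bytes with a (possibly wrong) charset and encodes
    // the resulting String back with the same charset. If the returned array equals
    // the input, the wrong decoding lost nothing and the mistake can be undone.
    static byte[] roundTrip(byte[] bytes, Charset wrong) {
        return new String(bytes, wrong).getBytes(wrong);
    }

    // Hypothetical helper: true if every character of text can be represented in cs.
    static boolean canEncode(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        byte[] gbk = {-60, -29, -70, -61}; // GBK bytes from the earlier example

        // ISO-8859-1 maps every byte value to a character, so the bytes survive.
        System.out.println(Arrays.equals(gbk, roundTrip(gbk, StandardCharsets.ISO_8859_1))); // true
        // These bytes are not valid UTF-8; malformed input is replaced while decoding,
        // so the original bytes cannot be rebuilt from the garbled String.
        System.out.println(Arrays.equals(gbk, roundTrip(gbk, StandardCharsets.UTF_8)));      // false

        // Checking first avoids silently producing '?' bytes.
        String text = "\u4f60\u597d";
        System.out.println(canEncode(text, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(text, StandardCharsets.UTF_8));      // true
    }
}
```

    In other words, keeping the original byte array is always safe, while recovering from an already garbled String only works if the wrong charset happened to preserve every byte.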

    3. Charsets available in Java

    import java.nio.charset.Charset;
    import java.util.Set;

    public class JavaCode {
        public static void main(String[] args) {
            // Canonical names of every charset this JVM supports
            Set<String> charsetNames = Charset.availableCharsets().keySet();
            System.out.println("-----JDK1.8 charset is " + charsetNames.size() + "----- ");
            for (String str : charsetNames) {
                System.out.println(str);
            }
        }
    }

    Result:

    -----JDK1.8 charset is 170-----
    Big5
    Big5-HKSCS
    CESU-8
    EUC-JP
    EUC-KR
    GB18030
    GB2312
    GBK
    IBM-Thai
    IBM00858
    IBM01140
    IBM01141
    IBM01142
    IBM01143
    IBM01144
    IBM01145
    IBM01146
    IBM01147
    IBM01148
    IBM01149
    IBM037
    IBM1026
    IBM1047
    IBM273
    IBM277
    IBM278
    IBM280
    IBM284
    IBM285
    IBM290
    IBM297
    IBM420
    IBM424
    IBM437
    IBM500
    IBM775
    IBM850
    IBM852
    IBM855
    IBM857
    IBM860
    IBM861
    IBM862
    IBM863
    IBM864
    IBM865
    IBM866
    IBM868
    IBM869
    IBM870
    IBM871
    IBM918
    ISO-2022-CN
    ISO-2022-JP
    ISO-2022-JP-2
    ISO-2022-KR
    ISO-8859-1
    ISO-8859-13
    ISO-8859-15
    ISO-8859-2
    ISO-8859-3
    ISO-8859-4
    ISO-8859-5
    ISO-8859-6
    ISO-8859-7
    ISO-8859-8
    ISO-8859-9
    JIS_X0201
    JIS_X0212-1990
    KOI8-R
    KOI8-U
    Shift_JIS
    TIS-620
    US-ASCII
    UTF-16
    UTF-16BE
    UTF-16LE
    UTF-32
    UTF-32BE
    UTF-32LE
    UTF-8
    windows-1250
    windows-1251
    windows-1252
    windows-1253
    windows-1254
    windows-1255
    windows-1256
    windows-1257
    windows-1258
    windows-31j
    x-Big5-HKSCS-2001
    x-Big5-Solaris
    x-euc-jp-linux
    x-EUC-TW
    x-eucJP-Open
    x-IBM1006
    x-IBM1025
    x-IBM1046
    x-IBM1097
    x-IBM1098
    x-IBM1112
    x-IBM1122
    x-IBM1123
    x-IBM1124
    x-IBM1166
    x-IBM1364
    x-IBM1381
    x-IBM1383
    x-IBM300
    x-IBM33722
    x-IBM737
    x-IBM833
    x-IBM834
    x-IBM856
    x-IBM874
    x-IBM875
    x-IBM921
    x-IBM922
    x-IBM930
    x-IBM933
    x-IBM935
    x-IBM937
    x-IBM939
    x-IBM942
    x-IBM942C
    x-IBM943
    x-IBM943C
    x-IBM948
    x-IBM949
    x-IBM949C
    x-IBM950
    x-IBM964
    x-IBM970
    x-ISCII91
    x-ISO-2022-CN-CNS
    x-ISO-2022-CN-GB
    x-iso-8859-11
    x-JIS0208
    x-JISAutoDetect
    x-Johab
    x-MacArabic
    x-MacCentralEurope
    x-MacCroatian
    x-MacCyrillic
    x-MacDingbat
    x-MacGreek
    x-MacHebrew
    x-MacIceland
    x-MacRoman
    x-MacRomania
    x-MacSymbol
    x-MacThai
    x-MacTurkish
    x-MacUkraine
    x-MS932_0213
    x-MS950-HKSCS
    x-MS950-HKSCS-XP
    x-mswin-936
    x-PCK
    x-SJIS_0213
    x-UTF-16LE-BOM
    X-UTF-32BE-BOM
    X-UTF-32LE-BOM
    x-windows-50220
    x-windows-50221
    x-windows-874
    x-windows-949
    x-windows-950
    x-windows-iso2022jp
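
    The list above reflects one particular JDK 1.8 installation, and other JVMs may report a different set. When only one charset matters, it can be queried directly instead of scanning the whole list. The following is a small sketch under that assumption; the class name CharsetLookup and the helper canonicalName are made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetLookup {
    // Hypothetical helper: returns the canonical name for a charset name or alias,
    // or null if this JVM does not provide it. The name must be syntactically legal,
    // otherwise Charset.isSupported throws IllegalCharsetNameException.
    static String canonicalName(String nameOrAlias) {
        return Charset.isSupported(nameOrAlias)
                ? Charset.forName(nameOrAlias).name()
                : null;
    }

    public static void main(String[] args) {
        // Lookups are case-insensitive and accept aliases.
        System.out.println(canonicalName("utf8"));            // UTF-8
        // Result depends on which charsets the running JVM ships with.
        System.out.println(canonicalName("GBK"));
        // An unknown but legal name yields null instead of an exception.
        System.out.println(canonicalName("no-such-charset")); // null
        // Every alias the JVM knows for UTF-8.
        System.out.println(Charset.forName("UTF-8").aliases());
    }
}
```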

    4. Converting bytes to a hexadecimal string

    import ch.qos.logback.core.encoder.ByteArrayUtil; // from logback-core-1.2.10.jar

    public class ByteTest {
        public static void main(String[] args) {
            // Print ten zero bytes as a hex string
            System.out.println(ByteArrayUtil.toHexString(new byte[10]));
        }
    }
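
    If pulling in logback only for this conversion is not desirable, the same result can be produced with plain JDK code. This is a minimal sketch, not logback's implementation; the class name HexDemo and the method toHex are made up, and the choice of lower-case digits is arbitrary.

```java
public class HexDemo {
    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Hypothetical helper: converts each byte to two lower-case hex digits.
    static String toHex(byte[] bytes) {
        char[] out = new char[bytes.length * 2];
        for (int i = 0; i < bytes.length; i++) {
            int v = bytes[i] & 0xFF;             // mask so negative bytes map to 128..255
            out[2 * i] = DIGITS[v >>> 4];        // high nibble
            out[2 * i + 1] = DIGITS[v & 0x0F];   // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(toHex(new byte[10]));               // 00000000000000000000
        System.out.println(toHex(new byte[] {-28, -67, -96})); // e4bda0
    }
}
```

    The mask with 0xFF matters because Java bytes are signed; without it a value such as -28 would index outside the digit table.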

    Take care to write down a little every day. The content may not matter much, but the habit does!

  • Original article: https://blog.csdn.net/weixin_42472027/article/details/126688557