Here is some supplementary material on the concept of a code page and the values it can be set to
Chcp
Displays the number of the active console code page, or changes the active console code page. Used without parameters, chcp displays the number of the active console code page.
Syntax
chcp [nnn]
Parameters
nnn : Specifies the code page. The following table lists each code page supported and its country/region or language:
Code page Country/region or language
437 United States
850 Multilingual (Latin I)
852 Slavic (Latin II)
855 Cyrillic (Russian)
857 Turkish
860 Portuguese
861 Icelandic
863 Canadian-French
865 Nordic
866 Russian
869 Modern Greek
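To see why the active code page matters, the following Python sketch decodes one and the same byte under several of the code pages listed above. It assumes that Python's built-in codecs named cp437, cp850 and cp866 correspond to the console code pages with those numbers; the helper name decode_byte is made up for this illustration.

```python
# Decode the same byte under several code pages from the table above.
# Assumption: Python's built-in "cpNNN" codecs match the console code pages.

def decode_byte(value, code_pages):
    """Return {code page number: character} for a single byte value."""
    raw = bytes([value])
    # errors="replace" keeps going if a code page leaves the byte undefined
    return {cp: raw.decode(f"cp{cp}", errors="replace") for cp in code_pages}

result = decode_byte(0x82, [437, 850, 866])
for cp, ch in result.items():
    print(cp, repr(ch))
```

The byte 0x82 comes out as a Latin letter under 437 and 850 but as a Cyrillic letter under 866, which is why the same file can look correct in one console and garbled in another.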
What a code page is, and how to change the code page in Windows cmd
If cmd cannot display Chinese or other non-ASCII characters correctly, change the active code page with chcp, where the parameter nnn is the code page number. The code page for Simplified Chinese is 936, and the code page for Western European languages is 1252.
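The garbled output that chcp is meant to fix can be reproduced outside the console. The Python sketch below encodes a Chinese string with code page 936 and then reads the bytes back with the wrong code page. It assumes Python's cp936 and cp1252 codecs match the Windows code pages of the same numbers.

```python
# Bytes written under code page 936, then read under 1252 and under 936.
text = "中文"
raw = text.encode("cp936")      # what a program using code page 936 emits

wrong = raw.decode("cp1252")    # console still set to the Western code page
right = raw.decode("cp936")     # console switched with: chcp 936

print(raw.hex(" "))
print(wrong)
print(right)
```

The bytes never change. Only the table used to interpret them does, and that table is what chcp selects.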
History of Code page:
1. Definition and history of Codepage
Character code refers to the internal code used to represent characters. Users rely on internal codes whenever they input and store documents. Internal codes are divided into
Single-byte character sets (SBCS), which can encode up to 256 characters.
Double-byte character sets (DBCS), which can encode up to about 65,000 characters (65,536 two-byte values). They are mainly used for the large character sets of East Asian languages.
A code page is a selected list of character internal codes arranged in a specific order. For the early single-byte languages, the order of the codes in the code page lets the system look up the internal code that corresponds to a keyboard input value. For double-byte codes, the code page provides a mapping table between the multibyte encoding and Unicode, so that characters stored as Unicode can be converted to the corresponding internal codes and vice versa. The corresponding functions in the Linux kernel are utf8_mbtowc and utf8_wctomb.
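The two roles of a code page described here can be sketched in Python. This is only an illustration: it uses Python's own cp437 and cp936 codecs as stand-ins for the code pages, and the variable names are made up.

```python
# A single-byte code page is a 256-entry table from byte value to character.
sbcs_table = [bytes([i]).decode("cp437") for i in range(256)]
print(len(sbcs_table), sbcs_table[0x41], sbcs_table[0x82])

# A double-byte code page acts as a mapping between multibyte codes and Unicode.
char = "中"
internal = char.encode("cp936")        # Unicode -> internal code
restored = internal.decode("cp936")    # internal code -> Unicode
print(f"U+{ord(char):04X} <-> {internal.hex()}")
```

The single-byte table fits in 256 entries, while the double-byte conversion needs a full lookup in both directions.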
Before 1980, there were still no international standards such as ISO-8859 or Unicode to define how to extend US-ASCII encoding for non-English users. Many IT manufacturers invented their own encodings and used hard-to-remember numbers to identify them:
For example, 936 represents Simplified Chinese. 950 represents Traditional Chinese.
1.1 CJK Codepage
Unlike Extended Unix Code (EUC) encodings, all of the following Far East code pages use the C1 control codes (0x80..0x9F) as first bytes and the ASCII values (0x40..0x7E) as second bytes in order to hold up to tens of thousands of double-byte characters. This means that a byte value greater than 0x3F in these encodings does not necessarily represent an ASCII character.
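The practical consequence of second bytes falling in the ASCII range can be seen in a short Python sketch. It assumes Python's cp932 codec as a stand-in for code page 932 and uses the character 表, whose second byte happens to equal the code of the backslash.

```python
# In CP932 the second byte of a two-byte character may fall in the ASCII range.
encoded = "表".encode("cp932")
first, second = encoded
print(f"{first:#04x} {second:#04x}")

# A naive byte-level search finds a "backslash" that is not in the text.
naive_hit = b"\\" in encoded
real_hit = "\\" in encoded.decode("cp932")
print(naive_hit, real_hit)
```

Code that scans such text byte by byte (for path separators or escape characters, for example) therefore has to decode it first or track which bytes are first bytes.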
CP932
Shift-JIS contains the Japanese character sets JIS X 0201 (one byte per character) and JIS X 0208 (two bytes per character). The JIS X 0201 katakana are therefore one-byte half-width characters, and the remaining 60 byte values are used as first bytes for 7,076 kanji and 648 other full-width characters. Unlike EUC-JP, Shift-JIS does not contain the 5,802 kanji defined in JIS X 0212.
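The one-byte and two-byte halves of Shift-JIS can be compared directly. The sketch below assumes Python's cp932 codec and encodes the half-width and the full-width form of the katakana letter A.

```python
# Half-width katakana come from JIS X 0201 (one byte);
# full-width katakana come from JIS X 0208 (two bytes).
half = "\uff71".encode("cp932")   # U+FF71 HALFWIDTH KATAKANA LETTER A
full = "\u30a2".encode("cp932")   # U+30A2 KATAKANA LETTER A

print(len(half), half.hex())
print(len(full), full.hex())
```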
CP936
GBK extends the EUC-CN encoding (GB 2312-80, which includes 6,763 Chinese characters) to the 20,902 Chinese characters defined in Unicode (GB13000.1-93). It is used for Simplified Chinese (zh_CN) in mainland China.
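The superset relationship can be checked with Python's gb2312 and cp936 codecs, assuming they are faithful to GB 2312-80 and code page 936 respectively. The helper can_encode is hypothetical, and the traditional form 國 is used as a sample character that lies outside GB 2312.

```python
def can_encode(ch, codec):
    """Return True if ch has a code in the named Python codec."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# A simplified character present in both sets keeps the same two bytes.
same_bytes = "国".encode("gb2312") == "国".encode("cp936")

# A traditional character is only reachable through the GBK extension.
in_gb2312 = can_encode("國", "gb2312")
in_gbk = can_encode("國", "cp936")
print(same_bytes, in_gb2312, in_gbk)
```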
CP949
Unified Hangul Code (UHC) is a superset of the Korean EUC-KR encoding (KS C 5601-1992, which includes 2,350 Korean syllables and 4,888 Chinese characters) and contains 8,822 additional Korean syllables (in the C1 range).
CP950
This is the Big5 encoding (13,072 Traditional Chinese zh_TW characters), used instead of EUC-TW (CNS 11643-1992). The definitions of these code pages can be found in Ken Lunde's CJK.INF or in the Unicode mapping tables.
Note: Microsoft adopted the four code pages above, so they must be used when accessing Microsoft file systems.
1.2 IBM's Far East language Codepage
IBM's Codepage is divided into SBCS and DBCS:
IBM SBCS Codepage
37 (English) *
290 (Japanese) *
833 (Korean) *
836 (Simplified Chinese) *
891 (Korean)
897 (Japanese)
903 (Simplified Chinese)
904 (Traditional Chinese)
IBM DBCS Codepage
300 (Japanese) *
301 (Japanese)
834 (Korean) *
835 (Traditional Chinese) *
837 (Simplified Chinese) *
926 (Korean)
927 (Traditional Chinese)
928 (Simplified Chinese)
Combining an SBCS code page with a DBCS code page yields an IBM MBCS code page:
930 (Japanese) (Codepage 300 plus 290) *
932 (Japanese) (Codepage 301 plus 897)
933 (Korean) (Codepage 834 plus 833) *
934 (Korean) (Codepage 926 plus 891)
938 (Traditional Chinese) (Codepage 927 plus 904)
936 (Simplified Chinese) (Codepage 928 plus 903)
5031 (Simplified Chinese) (Codepage 837 plus 836) *
5033 (Traditional Chinese) (Codepage 835 plus 37) *
* means using EBCDIC encoding format
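How far an EBCDIC code page differs from an ASCII-based one can be seen with code page 37 from the list above. The sketch assumes Python's cp037 and cp437 codecs correspond to IBM code pages 37 and 437.

```python
# Compare an EBCDIC code page (37) with an ASCII-based one (437).
text = "CHCP 936"
ebcdic = text.encode("cp037")
ascii_based = text.encode("cp437")

print(ebcdic.hex(" "))
print(ascii_based.hex(" "))

# Round trip: the EBCDIC bytes decode back to the same text.
print(ebcdic.decode("cp037"))
```

Even plain letters, digits and the space have different byte values in the two families, so data from the code pages marked with * always needs conversion before an ASCII-based system can read it.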
This shows that Microsoft's CJK code pages are derived from IBM's code pages.