|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
  『楼 主』:
这是个什么字符?
使用 LLM 解释/回答一下
下载一文件,文件名中包含有 "・" (红色表示)这个字符,咋一看,很像中文的 "·" (蓝色表示)符号,即常用于外国人名和姓中间的间隔号,现在这里可能看不出有区别,但在文本编辑器中有时可以看出,间隔号是全角的,而这个不知名的符号(至少看上去)是半角的,但我为什么说有时可以看出呢?另一个奇怪就奇怪在这里,间隔号有时候看上去是半角的?就像在这个帖子中看到的一样。再回正题,我把这两个放到一起放大些可能就稍微看出有点明显差别了 "・·"
对于这个字符,Windows 似乎可以正常操作,在资源管理器中可以看到,也可以复制、移动、删除文件,也可以改名时去掉这个字符,在命令行中也可以复制、移动、删除、改名,但 dir 却无法显示这个字符,该字符位置处显示的是“空白”,那么这个“空白”究竟是否是空格呢?而且,对于有这个字符的文件名,许多应用程序不认,无法对其操作,打开时说找不到文件之类的话。对于包含有这个字符的文本文件,许多纯文本编辑器都无法将其正常保存,保存后该处的字符变成了 “?” 即西文的问号。如果用 Windows 的 Notepad 保存的话会提示你这个文件包含 Unicode 格式的字符,如果保存为 ANSI 编码的文件将会丢失。那看来该字符是一个 Unicode 字符了?但我奇怪的是既然 Windows XP 可以处理 Unicode 字符,那为什么 dir 不能正常显示呢?那些可以正常处理 Unicode 编码的纯文本编辑器为什么不能正常保存呢?
按照 Unicode little endian 编码,该字符的编码是 FB 30
Download a file, the file name contains "・" (red indicates) this character. At first glance, it looks very similar to the Chinese "·" (blue indicates) symbol, which is often used as the separator between foreign given names and surnames. Now it may not be distinguishable here, but sometimes it can be seen in the text editor. The separator is full-width, and this unknown symbol (at least it seems) is half-width. But why do I say sometimes it can be seen? Another strangeness is here: the separator sometimes seems to be half-width? Just like seen in this post. Let's get back to the main topic. I put these two together and enlarge them, maybe we can see a slight obvious difference "・·"
For this character, Windows seems to be able to operate normally. In Windows Explorer, you can see it, and you can copy, move, delete files, and also remove this character when renaming. In the command line, you can also copy, move, delete, rename, but dir cannot display this character, and a "blank" is displayed at the position of this character. Then is this "blank" exactly a space? Moreover, for file names with this character, many applications do not recognize them and cannot operate on them. When opening, it says the file is not found and so on. For text files containing this character, many plain text editors cannot save them normally, and after saving, the character at that position becomes "?" which is the Western question mark. If you save it with Windows Notepad, it will prompt you that this file contains Unicode format characters, and if you save it as an ANSI encoded file, it will be lost. Then it seems that this character is a Unicode character? But I am curious that since Windows XP can handle Unicode characters, why can't dir display it normally? Why can't those plain text editors that can handle Unicode encoding normally save it?
According to the Unicode little endian encoding, the encoding of this character is FB 30
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-8-25 07:05 |
|
|
stockghost
中级用户
  
积分 215
发帖 105
注册 2007-6-2
状态 离线
|
『第 2 楼』:
使用 LLM 解释/回答一下
刚才用UEStudio拷贝过去编辑了,却发现只有一个'?'
SlickEdit也是一样
拷贝到DOS下的编辑器,发现显示正确,只有一个字符,代码0xFA
Just now I used UEStudio to copy and edit, but found only a '?'
SlickEdit is the same
Copied to the editor under DOS, found to be displayed correctly, only one character, code 0xFA
|
|
2008-8-25 09:47 |
|
|
stockghost
中级用户
  
积分 215
发帖 105
注册 2007-6-2
状态 离线
|
|
2008-8-25 09:49 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
『第 4 楼』:
使用 LLM 解释/回答一下
以下是命令行环境下,不同代码页和不同编码环境下该字符的编码和表现(这话听上去怎么有点别扭?谁帮我重新叙述一下。)
CP ANSI Unicode dir
---------------------------------------
936: 3F FB 30 N
437: FA FB 30 Y
可以看出,在简体中文代码页的环境下,该字符转换为相应的 ANSI/ASCII 码值为 0x3F ,对应字符为 "?" ,那么这个问号究竟是确实是应该对应的问号,还是凡是不知道该对应什么字符的一律转换为问号?
在西文US代码页的环境下,该字符转换为相应的 ANSI/ASCII 码值为 0xFA ,对应的字符这里无法正常显示出来,这个字符确实也显示的是一个位置在中间的点,可以看作是西文的间隔号。可以认为这里做了正确的转换。
不论哪一种代码页环境下,Unicode 编码始终都是一样的,看来确实是 Universal 。
在 CP936 环境下,dir 该字符不能正常显示,该处显示的是一个“空白”。
在 CP437 环境下,dir 能够正确显示该字符。
我奇怪的是,既然 WindowsXP 可以处理 Unicode 字符集,为什么以前从未碰到此类问题呢?那么到此为止是不是可以这么说, Unicode 字符集中的字符转换为不同代码页对应的字符还不完善?或者,我在 Windows 中还需要做什么设置?
The following is the encoding and performance of this character under different code pages and different encoding environments in the command line environment (Why does this sound a bit awkward? Who can rephrase it.)
CP ANSI Unicode dir
---------------------------------------
936: 3F FB 30 N
437: FA FB 30 Y
It can be seen that in the environment of the Simplified Chinese code page, this character is converted to the corresponding ANSI/ASCII code value of 0x3F, and the corresponding character is "?", so is this question mark supposed to correspond to the question mark indeed, or is everything that does not know what character it should correspond to converted to a question mark?
In the environment of the Western US code page, this character is converted to the corresponding ANSI/ASCII code value of 0xFA, and the corresponding character cannot be displayed normally here. This character does display a dot in the middle, which can be regarded as a Western interpunct. It can be considered that the conversion is correct here.
Regardless of the code page environment, the Unicode encoding is always the same. It seems that it is indeed Universal.
Under the CP936 environment, the dir cannot display this character normally, and a "blank" is displayed there.
Under the CP437 environment, the dir can display this character correctly.
I am curious that since Windows XP can handle the Unicode character set, why have I never encountered such a problem before? So can it be said so far that the conversion of characters in the Unicode character set to characters corresponding to different code pages is not perfect? Or, what settings do I need to make in Windows?
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-8-25 22:48 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
『第 5 楼』:
使用 LLM 解释/回答一下
又遇到两个类似的字符,在 Windows 的资源管理器中和命令行中的操作表现和前述的一样,该字符 「」 和中文的全角符号 「」 咋看上去很像。而且不像中文全角字符,它们是占半角的。如果把该字符复制以 ANSI 编码保存后都变为 ? 了,这两个字符的 Unicode 编码分别是 0x62FF 和 0x63FF 。
看来类似的 Unicode 中文字符很多?如此看来,Unicode 编码的汉字字符(姑且这么称)都不能正确地转换为相应的 ANSI 编码的中文字符?
Then I encountered two similar characters again. The operations in Windows Explorer and the command line are the same as the previous ones. This character 「」 and the full-width Chinese symbol 「」 look very similar at first glance. Moreover, unlike full-width Chinese characters, they occupy half-width. If this character is copied and saved in ANSI encoding, it all becomes?. The Unicode codes of these two characters are 0x62FF and 0x63FF respectively.
It seems there are many similar Unicode Chinese characters? So it seems that Unicode-encoded Chinese characters (so to speak) cannot all be correctly converted to corresponding ANSI-encoded Chinese characters?
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-9-13 03:44 |
|
|
tigerpower
中级用户
   大师兄
积分 377
发帖 99
注册 2005-8-26
状态 离线
|
『第 6 楼』:
使用 LLM 解释/回答一下
因为各个字符集并非出自同一组织,所以相互之间是交集的关系,大多数字符集包含ASCII,而Unicode包含所有的字符集。
Windows XP把所有的文件名存储为Unicode,所以能在文件名里看见这些字符。
而在中文版Windows XP中的命令行cmd.exe默认以GBK编码运行(就是cp936),所以那些在GBK编码中没有的字符就看不到。
可以正常处理 Unicode 编码的纯文本编辑器,如果没有以unicode编码打开,就无法保存那些字符集中没有的字符。
我们可以看看楼主所找出的都是些什么字符:
打开记事本,将那6个字符复制上去,且一行一个,
文件->另存为->编码:Unicode big endian,文件名:c:\ch.txt
然后打开命令行,运行:
echo d 100 l 22 | debug c:\ch.txt
这条命令查看ch.txt的16进制格式,结果类似于:
-d 100 l 22
0B67:0100 FE FF 30 FB 00 0D 00 0A-00 B7 00 0D 00 0A FF 62 ..0............b
0B67:0110 00 0D 00 0A FF 63 00 0D-00 0A 30 0C 00 0D 00 0A .....c....0.....
0B67:0120 30 0D 0.
然后开始 -> 附件 -> 系统工具 -> 字符映射表
字体选Arial Unicode MS,选中下方“高级查看”,字符集:Unicode,分组:全部
在“转到Unicode”框里填上面红色的四个字符(如30FB),
就找到了该字符,并在最下方有该字符的名字,这6个字符依次是:
片假名中间点
中间点
半形左角括号
半形右角括号
左角括号
右角括号
Last edited by tigerpower on 2008-9-16 at 08:35 PM ]
Because various character sets are not from the same organization, so they are in an intersecting relationship. Most character sets include ASCII, and Unicode includes all character sets.
Windows XP stores all file names as Unicode, so you can see these characters in file names.
In the Chinese version of Windows XP, the command line cmd.exe runs with GBK encoding by default (that is, cp936). So those characters not in the GBK encoding cannot be seen.
A plain text editor that can handle Unicode encoding normally cannot save characters not in the character set if it is not opened with Unicode encoding.
We can take a look at what characters the building-block owner has found:
Open Notepad, copy those 6 characters and put them in, one per line,
File -> Save As -> Encoding: Unicode big endian, File name: c:\ch.txt
Then open the command line and run:
echo d 100 l 22 | debug c:\ch.txt
This command views the hexadecimal format of ch.txt, and the result is similar to:
-d 100 l 22
0B67:0100 FE FF 30 FB 00 0D 00 0A-00 B7 00 0D 00 0A FF 62 ..0............b
0B67:0110 00 0D 00 0A FF 63 00 0D-00 0A 30 0C 00 0D 00 0A .....c....0.....
0B67:0120 30 0D 0.
Then Start -> Accessories -> System Tools -> Character Map
Select font Arial Unicode MS, select "Advanced View" below, Character set: Unicode, Group: All
Fill in the four red characters above (such as 30FB) in the "Go to Unicode" box,
Then find the character, and there is the name of the character at the bottom. These 6 characters are in sequence:
Katakana middle dot
Middle dot
Half-width left angle bracket
Half-width right angle bracket
Left angle bracket
Right angle bracket
Last edited by tigerpower on 2008-9-16 at 08:35 PM ]
|
|
2008-9-15 21:49 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
『第 7 楼』:
使用 LLM 解释/回答一下
我用的是英文版的 WindowsXP ,在字符映射表中,字体选择没有 Arial Unicode MS ,只有 Arial 和 Arial Black ,唯一一个和 Unicode 有关的是 Lucida Sans Unicode 。但是我按照 30fb 填进去所显示的字符不是原来的字符。既然都选 Unicode 字符集,那和选什么字体有什么关系?
一般,应用程序让你所选择的字符集中 Unicode 就是指 Unicode litte endian ,big endian 和 UTF-8 会另外写明,那你为什么一开始就想到要看它的 Unicode big endian 编码?字符映射表中为什么也是让你填 big endian 的编码?
I'm using the English version of Windows XP. In the Character Map, there's no Arial Unicode MS in the font selection; only Arial and Arial Black are available. The only one related to Unicode is Lucida Sans Unicode. But the character displayed when I entered 30fb isn't the original character. Since I'm choosing the Unicode character set, why does it matter which font I choose?
Generally, in applications, when you choose a character set, Unicode just means Unicode little endian. Big endian and UTF-8 are specified separately. Then why did you initially think to look at its Unicode big endian encoding? Why does the Character Map also let you enter big endian encoding?
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-9-16 03:53 |
|
|
tigerpower
中级用户
   大师兄
积分 377
发帖 99
注册 2005-8-26
状态 离线
|
『第 8 楼』:
使用 LLM 解释/回答一下
没有Arial Unicode MS就选MS Mincho或MS Gothic,至于Lucida Sans Unicode,虽然是用Unicode编码的字体,但是不支持日文,所以日文字符(30FB)是找不到的。
Windows XP本身是设计成Little Endian结构的。
修改过游戏的朋友都知道,4字节查找金钱1000(16进制的3E8)在内存里是E8 03 00 00,这就叫Little Endian。
而Windows XP中的程序几乎都是Little Endian的,所以你见到一些程序UTF-16默认使用Little Endian。
但这只是Windows中的规矩,根据 Unicode官方,当UTF-16不标明是BE还是LE时,默认是指BE,
具体还需根据文件头部的 BOM(Byte Order Mark)判别。
至于是怎么想到要看它的 Unicode big endian 编码嘛,我也是试出来的:)
If you don't have Arial Unicode MS, choose MS Mincho or MS Gothic. As for Lucida Sans Unicode, although it is a font encoded with Unicode, it doesn't support Japanese, so the Japanese character (30FB) can't be found.
Windows XP itself is designed in Little Endian structure.
Friends who have modified games all know that looking for 1000 in money in 4 bytes (3E8 in hexadecimal) is E8 03 00 00 in memory, which is called Little Endian.
And most programs in Windows XP are almost Little Endian, so you see that some programs use Little Endian by default for UTF-16.
But this is just the rule in Windows. According to Unicode official, when UTF-16 doesn't indicate whether it is BE or LE, it defaults to BE.
Specifically, it needs to be judged according to the BOM (Byte Order Mark) in the file header.
As for how I thought of looking at its Unicode big endian encoding, I also figured it out : )
|
|
2008-9-16 21:32 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
『第 9 楼』:
使用 LLM 解释/回答一下
呵呵呵呵…… 你小子还记得玩游戏
看了我还是有点糊涂,既然那些没标明 Unicode 的字体为什么可以有 Unicode 字符?
应用程序中选什么字体,可能不影响保存,但为什么不影响显示呢?比如我的记事本中的默认字体为 Lucida Console 而不是 Lucida Sans Unicode ,也不是 MS Mincho或 MS Gothic,其对应的 0x30FB 编码不是这个“片假名中间点”而是其它字符,但为什么照样可以正确显示呢?
顺便再问个问题,字体名前面有个 T 字样的代表是 TrueType 字体,有个 O 字样的是什么意思?
Hehe he he...... You kid still remember playing games :P
After reading, I'm still a bit confused. Since those fonts that don't specify Unicode can have Unicode characters, why is that?
In an application, choosing which font might not affect saving, but why doesn't it affect display? For example, the default font in my Notepad is Lucida Console instead of Lucida Sans Unicode, nor MS Mincho or MS Gothic. The corresponding 0x30FB encoding is not this "katakana middle dot" but some other character, but why does it still display correctly?
By the way, ask another question: the T character in front of the font name represents a TrueType font. What does the O character mean?
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-9-17 01:18 |
|
|
tigerpower
中级用户
   大师兄
积分 377
发帖 99
注册 2005-8-26
状态 离线
|
『第 10 楼』:
使用 LLM 解释/回答一下
没标明Unicode的字体有些也是用Unicode编码的,比如上次提到的那二个日文字体。
记事本对于默认字体中没有的字就去找另外一种字体显示它。
Lucida Console里根本就没有汉字,能显示汉字是因为系统中有宋体。
至于它是以什么规则、什么顺序去找字体就不清楚了(Windows XP中文版中的记事本对找不到的字似乎会寻找宋体)。
但不是都能找到的,下方蓝色方框里第二行是六字真言的藏文写法,你贴到记事本看看,通常不能显示(如浏览器不能正常显示,请安装Arial Unicode MS字体)
另外,有O 字样的是OpenType字体。
六字大明咒:唵嘛呢叭咪吽
ༀ་མ་ཎི་པད་མེ་ཧཱུྃ
Some fonts that don't indicate Unicode are also encoded in Unicode, such as the two Japanese fonts mentioned last time.
Notepad looks for another font to display characters that are not in the default font.
Lucida Console has no Chinese characters at all. It can display Chinese characters because there is SimSun in the system.
As for what rules and order it uses to find fonts, I don't know (In Notepad in Chinese Windows XP, it seems to look for SimSun for characters it can't find).
But not all can be found. The second line in the blue box below is the Tibetan writing of the six-syllable mantra. Paste it into Notepad and usually it can't be displayed (If the browser can't display it normally, install Arial Unicode MS font).
Also, those with the O are OpenType fonts.
六字大明咒:唵嘛呢叭咪吽
ༀ་མ་ཎི་པད་མེ་ཧཱུྃ
|
|
2008-9-17 22:17 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
『第 11 楼』:
使用 LLM 解释/回答一下
我的浏览器确实不能正常显示,而且不同的浏览器对不能正常显示的字符表现也不同。
IE 和以 IE 为核心的浏览器显示的是一个个方框,但同样是非 IE 核心的 Opera 显示的也是方框
非 IE 核心的 NetScape Navigator 和 K-Meleon 显示的是一个个问号
最好的我看是 Firefox 了,它对不能正确显示的字符把它的编码显示出来

但不知这是 big endian 还是 little endian 还是 UTF-16 ?还有我不解的是,你说的是六字,但从这 Firefox 显示出的编码来看,怎么是17个字符?
另外,哪里去单独下载字体,去 Microsoft 的网站上?
My browser indeed can't display normally, and different browsers show different performances for the characters that can't be displayed normally.
IE and browsers with IE as the core show square boxes one by one, but Opera which is also non-IE core shows square boxes too
NetScape Navigator and K-Meleon which are non-IE core show question marks one by one
The best one I think is Firefox, it displays the encoding of the characters that can't be displayed correctly

But I don't know whether this is big endian or little endian or UTF-16? Also, what I don't understand is that you said six characters, but from the encoding displayed by this Firefox, why are there 17 characters?
Also, where can I download the font alone, from the Microsoft website?
|

DOS倒下了,但永远不死
DOS NEVER DIES !
投票调查:
http://www.cn-dos.net/forum/viewthread.php?tid=46187
本人尚未解决的疑难问题:
http://www.cn-dos.net/forum/viewthread.php?tid=15135
http://www.cn-dos.net/forum/viewthread.php?tid=47663
http://www.cn-dos.net/forum/viewthread.php?tid=48747 |
|
2008-9-19 02:54 |
|
|
tigerpower
中级用户
   大师兄
积分 377
发帖 99
注册 2005-8-26
状态 离线
|
|
2008-9-19 20:48 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
|
2008-9-20 02:05 |
|
|
fujianabc
金牌会员
     
积分 3467
发帖 1616
注册 2004-6-21
状态 离线
|
『第 14 楼』:
使用 LLM 解释/回答一下
Originally posted by DOSforever at 2008-9-20 02:05 AM:
嘿嘿,不错,能正确显示了。
原来所说的六字是指翻译成汉字为六个字,原藏文为17个字
藏文是6个字,好像你显示的有问题,把藏文每个字的音素都拆开了,我vista的ie 7下显示的是6个字(每个字以倒三角形符号为间隔)
Last edited by fujianabc on 2008-9-21 at 11:46 AM ]
Originally posted by DOSforever at 2008-9-20 02:05 AM:
Hehe, not bad, it can be displayed correctly.
The originally mentioned six characters refer to six Chinese characters when translated, and the original Tibetan text has 17 characters
The Tibetan text is 6 characters. It seems that your display is problematic, as you have separated the phonetic elements of each Tibetan character. What I see displayed in IE 7 on Vista is 6 characters (each character separated by an inverted triangle symbol)
Last edited by fujianabc on 2008-9-21 at 11:46 AM ]
附件
1: 5o2V6I63_ysamN7Yvq4HK[1].png (2008-9-21 11:45, 2.79 KiB,下载次数: 1)
|
|
2008-9-20 21:14 |
|
|
DOSforever
金牌会员
     
积分 4639
发帖 2239
注册 2005-1-30
状态 离线
|
|
2008-9-21 01:52 |
|