|
sjhf
Junior user
 
Points 161
Posts 7
Registered 2004-4-21
Status Offline
|
[Original poster]:
[Original] A 45,000-character in-depth analysis of the FAT file system! ★★★★★
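The article is long, about 45,000 characters, so the original version includes a table of contents and in-text links. It can be downloaded from http://www.sjhf.net/bbs.
Also, since tables are hard to post on the forum, I replaced them with images in the forum version; the original document still has them as tables.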
一、硬盘的物理结构:
硬盘存储数据是根据电、磁转换原理实现的。硬盘由一个或几个表面镀有磁性物质的金属或玻璃等物质盘片以及盘片两面所安装的磁头和相应的控制电路组成(图1),其中盘片和磁头密封在无尘的金属壳中。
硬盘工作时,盘片以设计转速高速旋转,设置在盘片表面的磁头则在电路控制下径向移动到指定位置然后将数据存储或读取出来。当系统向硬盘写入数据时,磁头中“写数据”电流产生磁场使盘片表面磁性物质状态发生改变,并在写电流磁场消失后仍能保持,这样数据就存储下来了;当系统从硬盘中读数据时,磁头经过盘片指定区域,盘片表面磁场使磁头产生感应电流或线圈阻抗产生变化,经相关电路处理后还原成数据。因此只要能将盘片表面处理得更平滑、磁头设计得更精密以及尽量提高盘片旋转速度,就能造出容量更大、读写数据速度更快的硬盘。这是因为盘片表面处理越平、转速越快就能越使磁头离盘片表面越近,提高读、写灵敏度和速度;磁头设计越小越精密就能使磁头在盘片上占用空间越小,使磁头在一张盘片上建立更多的磁道以存储更多的数据。
二、硬盘的逻辑结构。
硬盘由很多盘片(platter)组成,每个盘片的每个面都有一个读写磁头。如果有N个盘片。就有2N个面,对应2N个磁头(Heads),从0、1、2开始编号。每个盘片被划分成若干个同心圆磁道(逻辑上的,是不可见的。)每个盘片的划分规则通常是一样的。这样每个盘片的半径均为固定值R的同心圆再逻辑上形成了一个以电机主轴为轴的柱面(Cylinders),从外至里编号为0、1、2……每个盘片上的每个磁道又被划分为几十个扇区(Sector),通常的容量是512byte,并按照一定规则编号为1、2、3……形成Cylinders×Heads×Sector个扇区。这三个参数即是硬盘的物理参数。我们下面的很多实践需要深刻理解这三个参数的意义。
三、磁盘引导原理。
3.1 MBR(master boot record)扇区:
计算机在按下power键以后,开始执行主板bios程序。进行完一系列检测和配置以后。开始按bios中设定的系统引导顺序引导系统。假定现在是硬盘。Bios执行完自己的程序后如何把执行权交给硬盘呢。交给硬盘后又执行存储在哪里的程序呢。其实,称为mbr的一段代码起着举足轻重的作用。MBR(master boot record),即主引导记录,有时也称主引导扇区。位于整个硬盘的0柱面0磁头1扇区(可以看作是硬盘的第一个扇区),bios在执行自己固有的程序以后就会jump到mbr中的第一条指令。将系统的控制权交由mbr来执行。在总共512byte的主引导记录中,MBR的引导程序占了其中的前446个字节(偏移0H~偏移1BDH),随后的64个字节(偏移1BEH~偏移1FDH)为DPT(Disk PartitionTable,硬盘分区表),最后的两个字节“55 AA”(偏移1FEH~偏移1FFH)是分区有效结束标志。
MBR不随操作系统的不同而不同,意即不同的操作系统可能会存在相同的MBR,即使不同,MBR也不会夹带操作系统的性质。具有公共引导的特性。
我们来分析一段mbr。下面是用winhex查看的一块希捷120GB硬盘的mbr。

你的硬盘的MBR引导代码可能并非这样。不过即使不同,所执行的功能大体是一样的。 这里找wowocock关于磁盘mbr的反编译,已加了详细的注释,感兴趣可以细细研究一下。
我们看DPT部分。操作系统为了便于用户对磁盘的管理。加入了磁盘分区的概念。即将一块磁盘逻辑划分为几块。磁盘分区数目的多少只受限于C~Z的英文字母的数目,在上图DPT共64个字节中如何表示多个分区的属性呢?microsoft通过链接的方法解决了这个问题。在DPT共64个字节中,以16个字节为分区表项单位描述一个分区的属性。也就是说,第一个分区表项描述一个分区的属性,一般为基本分区。第二个分区表项描述除基本分区外的其余空间,一般而言,就是我们所说的扩展分区。这部分的大体说明见表1。

注:上表中的超过1字节的数据都以实际数据显示,就是按高位到地位的方式显示。存储时是按低位到高位存储的。两者表现不同,请仔细看清楚。以后出现的表,图均同。
也可以在winhex中看到这些参数的意义:
说明: 每个分区表项占用16个字节,假定偏移地址从0开始。如图3的分区表项3。分区表项4同分区表项3。
1、0H偏移为活动分区是否标志,只能选00H和80H。80H为活动,00H为非活动。其余值对microsoft而言为非法值。
2、重新说明一下(这个非常重要):大于1个字节的数被以低字节在前的存储格式格式(little endian format)或称反字节顺序保存下来。低字节在前的格式是一种保存数的方法,这样,最低位的字节最先出现在十六进制数符号中。例如,相对扇区数字段的值0x3F000000的低字节在前表示为0x0000003F。这个低字节在前的格式数的十进制数为63。
3、系统在分区时,各分区都不允许跨柱面,即均以柱面为单位,这就是通常所说的分区粒度。有时候我们分区是输入分区的大小为7000M,分出来却是6997M,就是这个原因。 偏移2H和偏移6H的扇区和柱面参数中,扇区占6位(bit),柱面占10位(bit),以偏移6H为例,其低6位用作扇区数的二进制表示。其高两位做柱面数10位中的高两位,偏移7H组成的8位做柱面数10位中的低8位。由此可知,实际上用这种方式表示的分区容量是有限的,柱面和磁头从0开始编号,扇区从1开始编号,所以最多只能表示1024个柱面×63个扇区×256个磁头×512byte=8455716864byte。即通常的8.4GB(实际上应该是7.8GB左右)限制。实际上磁头数通常只用到255个(由汇编语言的寻址寄存器决定),即使把这3个字节按线性寻址,依然力不从心。 在后来的操作系统中,超过8.4GB的分区其实已经不通过C/H/S的方式寻址了。而是通过偏移CH~偏移FH共4个字节32位线性扇区地址来表示分区所占用的扇区总数。可知通过4个字节可以表示2^32个扇区,即2TB=2048GB,目前对于大多数计算机而言,这已经是个天文数字了。在未超过8.4GB的分区上,C/H/S的表示方法和线性扇区的表示方法所表示的分区大小是一致的。也就是说,两种表示方法是协调的。即使不协调,也以线性寻址为准。(可能在某些系统中会提示出错)。超过8.4GB的分区结束C/H/S一般填充为FEH FFH FFH。即C/H/S所能表示的最大值。有时候也会用柱面对1024的模来填充。不过这几个字节是什么其实都无关紧要了。
虽然现在的系统均采用线性寻址的方式来处理分区的大小。但不可跨柱面的原则依然没变。本分区的扇区总数加上与前一分区之间的保留扇区数目依然必须是柱面容量的整数倍。(保留扇区中的第一个扇区就是存放分区表的MBR或虚拟MBR的扇区,分区的扇区总数在线性表示方式上是不计入保留扇区的。如果是第一个分区,保留扇区是本分区前的所有扇区。
附:分区表类型标志如图4
3.2 扩展分区
扩展分区中的每个逻辑驱动器都存在一个类似于MBR的扩展引导记录( Extended Boot Record, EBR),也有人称之为虚拟mbr或扩展mbr,意思是一样的。扩展引导记录包括一个扩展分区表和该扇区的标签。扩展引导记录将记录只包含扩展分区中每个逻辑驱动器的第一个柱面的第一面的信息。一个逻辑驱动器中的引导扇区一般位于相对扇区32或63。但是,如果磁盘上没有扩展分区,那么就不会有扩展引导记录和逻辑驱动器。第一个逻辑驱动器的扩展分区表中的第一项指向它自身的引导扇区。第二项指向下一个逻辑驱动器的EBR。如果不存在进一步的逻辑驱动器,第二项就不会使用,而且被记录成一系列零。如果有附加的逻辑驱动器,那么第二个逻辑驱动器的扩展分区表的第一项会指向它本身的引导扇区。第二个逻辑驱动器的扩展分区表的第二项指向下一个逻辑驱动器的EBR。扩展分区表的第三项和第四项永远都不会被使用。
通过一幅4分区的磁盘结构图可以看到磁盘的大致组织形式。如图5:
关于扩展分区,如图6所示,扩展分区中逻辑驱动器的扩展引导记录是一个连接表。该图显示了一个扩展分区上的三个逻辑驱动器,说明了前面的逻辑驱动器和最后一个逻辑驱动器之间在扩展分区表中的差异。
除了扩展分区上最后一个逻辑驱动器外,表2中所描述的扩展分区表的格式在每个逻辑驱动器中都是重复的:第一个项标识了逻辑驱动器本身的引导扇区,第二个项标识了下一个逻辑驱动器的EBR。最后一个逻辑驱动器的扩展分区表只会列出它本身的分区项。最后一个扩展分区表的第二个项到第四个项被使用。
扩展分区表项中的相对扇区数字段所显示的是从扩展分区开始到逻辑驱动器中第一个扇区的位移的字节数。总扇区数字段中的数是指组成该逻辑驱动器的扇区数目。总扇区数字段的值等于从扩展分区表项所定义的引导扇区到逻辑驱动器末尾的扇区数。
有时候在磁盘的末尾会有剩余空间,剩余空间是什么呢?我们前面说到,分区是以1柱面的容量为分区粒度的,那么如果磁盘总空间不是整数个柱面的话,不够一个柱面的剩下的空间就是剩余空间了,这部分空间并不参与分区,所以一般无法利用。照道理说,磁盘的物理模式决定了磁盘的总容量就应该是整数个柱面的容量,为什么会有不够一个柱面的空间呢。在我的理解看来,本来现在的磁盘为了更大的利用空间,一般在物理上并不是按照外围的扇区大于里圈的扇区这种管理方式,只是为了与操作系统兼容而抽象出来CHS。可能其实际空间容量不一定正好为整数个柱面的容量吧。关于这点,如有高见,请告知 http://www.sjhf.net 或 zymail@vip.sina.com
### I. Physical Structure of the Hard Disk
A hard disk stores data by converting between electrical and magnetic signals. It consists of one or more platters (metal, glass, or a similar material, coated with a magnetic layer), read/write heads mounted on both sides of each platter, and the associated control circuitry (Figure 1). The platters and heads are sealed in a dust-free metal case.
When the disk is running, the platters spin at their rated speed, and the heads move radially under circuit control to the required position to write or read data. On a write, the write current in the head generates a magnetic field that changes the state of the magnetic material on the platter surface; the new state persists after the field is removed, so the data is stored. On a read, the head passes over the target area and the field from the platter surface induces a current in the head (or changes the impedance of its coil), which the read circuitry converts back into data. It follows that a smoother platter surface, more precise heads, and a higher rotational speed allow larger and faster disks. A smoother surface and a higher speed let the head fly closer to the platter, which improves read/write sensitivity and speed; a smaller, more precise head occupies less of the surface, so more tracks, and therefore more data, fit on one platter.
### II. Logical Structure of the Hard Disk
A hard disk contains one or more platters, and each side of each platter has its own read/write head. With N platters there are 2N surfaces and 2N heads (Heads), numbered 0, 1, 2, and so on. Each surface is divided into concentric tracks; the division is logical and not visible on the platter, and it is normally the same on every surface. The tracks of the same radius R on all surfaces therefore form a logical cylinder (Cylinders) around the spindle axis, numbered 0, 1, 2, ... from the outermost inward. Each track is further divided into several dozen sectors (Sectors), usually 512 bytes each and numbered 1, 2, 3, ... according to a fixed rule. The disk thus holds Cylinders × Heads × Sectors sectors in total. These three values are the disk's physical parameters, and much of what follows depends on understanding them well.
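To make the three parameters concrete, here is a minimal Python sketch. The geometry values are hypothetical, and the `chs_to_lba` formula is the conventional CHS-to-linear conversion rather than something given in this article; it only relies on the numbering rules above (cylinders and heads from 0, sectors from 1).

```python
SECTOR_SIZE = 512  # bytes per sector, the usual value mentioned above


def total_sectors(cylinders, heads, sectors_per_track):
    """Number of sectors addressable with a given C/H/S geometry."""
    return cylinders * heads * sectors_per_track


def chs_to_lba(c, h, s, heads, sectors_per_track):
    """Conventional CHS -> linear sector number (first sector is 0)."""
    if not (0 <= h < heads and 1 <= s <= sectors_per_track):
        raise ValueError("head or sector outside the geometry")
    return (c * heads + h) * sectors_per_track + (s - 1)


# Hypothetical geometry, chosen only for illustration.
CYLINDERS, HEADS, SPT = 1024, 255, 63
capacity_bytes = total_sectors(CYLINDERS, HEADS, SPT) * SECTOR_SIZE
print("sectors:", total_sectors(CYLINDERS, HEADS, SPT))
print("capacity in bytes:", capacity_bytes)
print("C0/H0/S1 is linear sector", chs_to_lba(0, 0, 1, HEADS, SPT))
```

Cylinder 0, head 0, sector 1 maps to linear sector 0, which is why that position can be treated as the first sector of the disk.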
### III. Disk Booting Principle
#### 3.1 The MBR (Master Boot Record) Sector
After the power button is pressed, the computer runs the motherboard BIOS. Once the BIOS has finished its checks and configuration, it boots the system according to the boot order configured in its setup; assume here that the boot device is the hard disk. How does the BIOS hand control to the hard disk, and which stored program runs next? This is where a piece of code called the MBR comes in. The MBR (Master Boot Record), sometimes called the master boot sector, is located at cylinder 0, head 0, sector 1, which is effectively the first sector of the disk. When the BIOS has finished its own work, it jumps to the first instruction of the MBR and hands control of the system to it. Of the 512 bytes in the MBR, the boot code occupies the first 446 bytes (offsets 0H to 1BDH), the next 64 bytes (offsets 1BEH to 1FDH) are the DPT (Disk Partition Table), and the last two bytes, "55 AA" (offsets 1FEH to 1FFH), are the signature that marks the sector as valid.
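The 446 + 64 + 2 byte layout can be sketched in a few lines of Python. This is only an illustration on a synthetic, zero-filled sector; the names `split_mbr`, `BOOT_CODE`, `PARTITION_TABLE` and `SIGNATURE` are my own, and reading a real disk's first sector would need raw device access that is not shown here.

```python
BOOT_CODE = slice(0x000, 0x1BE)        # 446 bytes of boot code
PARTITION_TABLE = slice(0x1BE, 0x1FE)  # 64 bytes: the DPT
SIGNATURE = slice(0x1FE, 0x200)        # 2 bytes: 55 AA


def split_mbr(sector: bytes):
    """Split a 512-byte MBR into boot code and four 16-byte DPT entries."""
    if len(sector) != 512:
        raise ValueError("an MBR is exactly 512 bytes")
    if sector[SIGNATURE] != b"\x55\xaa":
        raise ValueError("missing 55 AA signature")
    table = sector[PARTITION_TABLE]
    entries = [table[i:i + 16] for i in range(0, 64, 16)]
    return sector[BOOT_CODE], entries


# Synthetic sector: empty boot code, empty table, valid signature.
sample = bytes(446) + bytes(64) + b"\x55\xaa"
boot_code, entries = split_mbr(sample)
print(len(boot_code), "bytes of boot code,", len(entries), "table entries")
```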
The MBR does not depend on the operating system: different operating systems may install identical MBR code, and even when the code differs, the MBR carries nothing specific to any one operating system. In that sense it is a common, shared boot stage.
Let's analyze an actual MBR. Below is the MBR of a 120 GB Seagate hard disk as displayed in WinHex.
The MBR boot code on your disk may look different, but it performs broadly the same function. wowocock has published a disassembly of the disk MBR with detailed comments; if you are interested, it is worth studying closely.
Now look at the DPT. To make disks easier to manage, operating systems introduced partitions, which divide one disk logically into several pieces. The number of partitions is limited only by the drive letters C through Z, so how can the 64 bytes of the DPT shown above describe that many partitions? Microsoft solved this with a chained (linked) layout. The 64 bytes of the DPT are divided into 16-byte partition table entries, each describing one partition. The first entry describes one partition, normally the primary partition. The second entry describes the space that remains outside the primary partition, which is normally what we call the extended partition. Table 1 gives an overview of the entry layout.
Note: multi-byte values in the table above are shown as their actual numeric value, that is, written from the high-order byte to the low-order byte. On disk they are stored from the low-order byte to the high-order byte. The two look different, so read carefully. The same applies to all later tables and figures.
You can also see the meaning of these parameters in WinHex:
Explanation: Each partition table entry occupies 16 bytes. Assume that the offset address starts from 0. For example, partition table entry 3 in Figure 3. Partition table entry 4 is the same as partition table entry 3.
1. Offset 0H is the active-partition flag and may only be 00H or 80H: 80H means active and 00H means inactive. Microsoft treats any other value as invalid.
2. To repeat (this is very important): values longer than one byte are stored in little-endian format, also called reverse byte order, in which the least significant byte comes first. For example, the relative sectors field appears in the hex dump as the bytes 3F 00 00 00, which represent the value 0x0000003F, or 63 in decimal.
3. When a disk is partitioned, no partition may straddle a cylinder boundary; partitions are allocated in whole cylinders. This is the so-called partition granularity, and it is why asking for a 7000M partition can produce one of 6997M. In the sector and cylinder fields at offsets 2H and 6H, the sector number takes 6 bits and the cylinder number takes 10 bits. Taking offset 6H as an example, its low 6 bits hold the sector number, its high 2 bits are the top 2 bits of the 10-bit cylinder number, and the 8 bits at offset 7H are the low 8 bits of the cylinder number. The capacity that can be expressed this way is therefore limited. Cylinders and heads are numbered from 0 and sectors from 1, so at most 1024 cylinders × 256 heads × 63 sectors × 512 bytes = 8,455,716,864 bytes can be addressed. This is the well-known 8.4 GB limit (about 7.8 GB when counted in units of 2^30 bytes). In practice the head count usually goes no higher than 255 (a consequence of the registers used for addressing in assembly language), and even if the 3 bytes were treated as a flat 24-bit address they still would not be enough. Later operating systems no longer use C/H/S for partitions beyond 8.4 GB. They use the 4-byte (32-bit) linear fields instead, with offsets CH to FH holding the total number of sectors in the partition. Four bytes can express 2^32 sectors, that is 2 TB = 2048 GB, which is an astronomical figure for most computers today. For partitions below 8.4 GB, the C/H/S fields and the linear fields describe the same partition size, so the two representations agree. If they ever disagree, the linear values take precedence (some systems may report an error).
For partitions beyond 8.4 GB, the ending C/H/S is usually filled with FEH FFH FFH, the largest value C/H/S can express. Sometimes the cylinder number modulo 1024 is stored instead. Either way, the content of these bytes no longer matters.
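The three points above (active flag, little-endian fields, packed C/H/S bits) can be tried out with a short Python sketch. The 16-byte entry below is made up for illustration and is not taken from the disk in the figures; the function and key names are my own.

```python
import struct


def decode_chs(head_byte, sector_byte, cylinder_byte):
    """Unpack the 3-byte C/H/S field: 6 sector bits, 10 cylinder bits."""
    sector = sector_byte & 0x3F                              # low 6 bits
    cylinder = ((sector_byte & 0xC0) << 2) | cylinder_byte   # 2 high + 8 low bits
    return cylinder, head_byte, sector


def decode_entry(entry: bytes):
    """Decode one 16-byte partition table entry (little-endian fields)."""
    if len(entry) != 16:
        raise ValueError("a partition table entry is 16 bytes")
    flag, sh, ss, sc, ptype, eh, es, ec, lba_start, count = struct.unpack("<8B2I", entry)
    return {
        "active": flag == 0x80,
        "type": ptype,
        "start_chs": decode_chs(sh, ss, sc),
        "end_chs": decode_chs(eh, es, ec),
        "lba_start": lba_start,   # offsets 8H..BH, relative sectors
        "sectors": count,         # offsets CH..FH, total sectors
    }


# Hypothetical entry: active, type 0BH, end C/H/S filled with FE FF FF.
raw = bytes.fromhex("80 01 01 00 0B FE FF FF 3F 00 00 00 00 00 10 00")
entry = decode_entry(raw)
print(entry)

# Largest capacity the C/H/S fields can express.
CHS_LIMIT_BYTES = 1024 * 256 * 63 * 512
print(CHS_LIMIT_BYTES, "bytes =", CHS_LIMIT_BYTES / 2**30, "units of 2^30 bytes")
```

The stored bytes `3F 00 00 00` decode to 63, and the filler `FE FF FF` decodes to cylinder 1023, head 254, sector 63, the largest values the fields can hold.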
Although current systems use linear addressing for partition sizes, the rule against straddling cylinders still holds: the partition's total sector count plus the number of reserved sectors between it and the previous partition must be a whole multiple of the cylinder capacity. (The first of the reserved sectors is the one holding the partition table, that is, the MBR or a virtual MBR. The linear total-sectors value does not include the reserved sectors. For the first partition, the reserved sectors are all the sectors that precede it.)
Appendix: the partition type codes are listed in Figure 4.
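The cylinder rule also explains the 7000M versus 6997M example. The sketch below assumes a geometry of 255 heads × 63 sectors per track, reads "M" as 2^20 bytes, and assumes the partitioning tool rounds down to whole cylinders; none of these assumptions is stated in the article, and the function names are hypothetical.

```python
SECTOR_SIZE = 512


def round_down_to_cylinders(requested_mib, heads=255, sectors_per_track=63):
    """Return (cylinders, sectors) for a request rounded down to whole cylinders."""
    sectors_per_cylinder = heads * sectors_per_track
    requested_sectors = requested_mib * 1024 * 1024 // SECTOR_SIZE
    cylinders = requested_sectors // sectors_per_cylinder
    return cylinders, cylinders * sectors_per_cylinder


def is_cylinder_aligned(reserved, total, heads=255, sectors_per_track=63):
    """Reserved sectors plus partition sectors must fill whole cylinders."""
    return (reserved + total) % (heads * sectors_per_track) == 0


cylinders, span = round_down_to_cylinders(7000)
usable_mib = span * SECTOR_SIZE // (1024 * 1024)
print(cylinders, "cylinders,", span, "sectors, about", usable_mib, "M")

# First partition: 63 reserved sectors in front, the rest is the partition.
print(is_cylinder_aligned(63, span - 63))
```

Under these assumptions a 7000M request shrinks to 892 whole cylinders, which comes out at 6997M.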
#### 3.2 Extended Partition
Every logical drive in the extended partition has an Extended Boot Record (EBR), which is similar to the MBR and is also called a virtual MBR or extended MBR. An EBR consists of an extended partition table and the signature word of the sector. It is the only information stored on the first side of the first cylinder of each logical drive in the extended partition, and the boot sector of a logical drive is usually located at relative sector 32 or 63. If the disk has no extended partition, there are no EBRs and no logical drives. The first entry in the first logical drive's extended partition table points to that drive's own boot sector, and the second entry points to the EBR of the next logical drive. If there is no further logical drive, the second entry is unused and filled with zeros. If there are more logical drives, the first entry of the second logical drive's extended partition table points to its own boot sector and the second entry points to the EBR of the next logical drive. The third and fourth entries of an extended partition table are never used.
A diagram of a disk with four partitions shows the overall organization; see Figure 5:
As Figure 6 shows, the EBRs of the logical drives in an extended partition form a linked list. The figure shows three logical drives on an extended partition and illustrates how the extended partition table of the last logical drive differs from those of the preceding ones.
Except for the last logical drive in the extended partition, the extended partition table format described in Table 2 repeats for every logical drive: the first entry identifies the logical drive's own boot sector, and the second entry identifies the EBR of the next logical drive. The extended partition table of the last logical drive lists only its own partition entry; its second through fourth entries are not used.
The relative sectors field in an extended partition table entry gives the offset, in sectors, from the start of the extended partition to the first sector of the logical drive. The total sectors field gives the number of sectors that make up the logical drive, which equals the number of sectors from the boot sector defined by the entry to the end of the logical drive.
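Walking the EBR chain can be sketched as follows. The "disk" here is a synthetic Python dictionary, not a real image, and all names are hypothetical. The sketch follows the commonly documented convention that the first entry's relative sectors value is measured from the EBR that contains it, while the second entry's value is measured from the start of the extended partition; treat that as an assumption and verify it against a real disk before relying on it.

```python
import struct

ENTRY = struct.Struct("<8B2I")  # one 16-byte partition table entry


def make_entry(ptype, rel_start, count):
    """Build an entry with only type, relative sectors and total sectors set."""
    return ENTRY.pack(0, 0, 0, 0, ptype, 0, 0, 0, rel_start, count)


def make_ebr(first, second=None):
    """Build a synthetic 512-byte EBR; entries 3 and 4 stay zero."""
    table = make_entry(*first)
    table += make_entry(*second) if second else bytes(16)
    table += bytes(32)
    return bytes(446) + table + b"\x55\xaa"


def walk_ebr_chain(read_sector, ext_start):
    """Follow the linked list of EBRs; return (boot_sector, sectors) per drive."""
    drives, seen = [], set()
    ebr_lba = ext_start
    while ebr_lba not in seen:            # guard against a looping chain
        seen.add(ebr_lba)
        sector = read_sector(ebr_lba)
        if sector[0x1FE:0x200] != b"\x55\xaa":
            raise ValueError("EBR without 55 AA signature")
        first = ENTRY.unpack_from(sector, 0x1BE)
        second = ENTRY.unpack_from(sector, 0x1CE)
        drives.append((ebr_lba + first[8], first[9]))
        if second[4] == 0 and second[8] == 0:  # zeroed second entry: last drive
            break
        ebr_lba = ext_start + second[8]
    return drives


# Synthetic extended partition starting at sector 1000 with two logical drives.
disk = {
    1000: make_ebr((0x0B, 63, 2000), (0x05, 2063, 3063)),
    3063: make_ebr((0x0B, 63, 3000)),
}
logical_drives = walk_ebr_chain(disk.__getitem__, 1000)
print(logical_drives)
```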
Sometimes some space is left over at the end of the disk. What is it? As noted earlier, partitions are allocated in units of one cylinder, so if the total disk capacity is not a whole number of cylinders, the part smaller than one cylinder is left over. It takes no part in partitioning and is generally unusable. In principle the physical layout of a disk should make its capacity an exact whole number of cylinders, so why would there be a fraction of a cylinder? As I understand it, modern disks, in order to use space more efficiently, are generally no longer laid out physically in the old way, in which sectors on the outer tracks are physically larger than those on the inner tracks; CHS is just an abstraction kept for compatibility with operating systems, so the real capacity need not be a whole number of cylinders. If you know better, please let me know at http://www.sjhf.net or zymail@vip.sina.com
|

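Non-commercial site! The Data Recovery Network is a site for discussing disk storage and software-based data recovery techniques. Anyone interested is welcome to join the discussion, and we can also help friends recover their data free of charge.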
 |
|
2004-4-21 00:00 |
|
|
|
[Post #2]:
四、FAT分区原理。
先来一幅结构图:
现在我们着重研究FAT格式分区内数据是如何存储的。FAT分区格式是MICROSOFT最早支持的分区格式,依据FAT表中每个簇链的所占位数(有关概念,后面会讲到)分为fat12、fat16、fat32三种格式"变种",但其基本存储方式是相似的。
仔细研究图7中的fat16和fat32分区的组成结构。下面依次解释DBR、FAT1、FAT2、根目录、数据区、剩余扇区的概念。提到的地址如无特别提示均为分区内部偏移。
4.1 关于DBR.
DBR区(DOS BOOT RECORD)即操作系统引导记录区的意思,通常占用分区的第0扇区共512个字节(特殊情况也要占用其它保留扇区,我们先说第0扇)。在这512个字节中,其实又是由跳转指令,厂商标志和操作系统版本号,BPB(BIOS Parameter Block),扩展BPB,os引导程序,结束标志几部分组成。 以用的最多的FAT32为例说明分区DBR各字节的含义。见图8。
图8的对应解释见表3
图9给出了winhex对图8 DBR的相关参数解释:
根据上边图例,我们来讨论DBR各字节的参数意义。
MBR将CPU执行转移给引导扇区,因此,引导扇区的前三个字节必须是合法的可执行的基于x86的CPU指令。这通常是一条跳转指令,该指令负责跳过接下来的几个不可执行的字节(BPB和扩展BPB),跳到操作系统引导代码部分。
跳转指令之后是8字节长的OEM ID,它是一个字符串, OEM ID标识了格式化该分区的操作系统的名称和版本号。为了保留与MS-DOS的兼容性,通常Windows 2000格式化该盘是在FAT16和FAT32磁盘上的该字段中记录了“MSDOS 5.0”,在NTFS磁盘上(关于ntfs,另述),Windows 2000记录的是“NTFS”。通常在被Windows 95格式化的磁盘上OEM ID字段出现“MSWIN4.0”,在被Windows 95 OSR2和Windows 98格式化的磁盘上OEM ID字段出现“MSWIN4.1”。
接下来的从偏移0x0B开始的是一段描述能够使可执行引导代码找到相关参数的信息。通常称之为BPB(BIOS Parameter Block),BPB一般开始于相同的位移量,因此,标准的参数都处于一个已知的位置。磁盘容量和几何结构变量都被封在BPB之中。由于引导扇区的第一部分是一个x86跳转指令。因此,将来通过在BPB末端附加新的信息,可以对BPB进行扩展。只需要对该跳转指令作一个小的调整就可以适应BPB的变化。图9已经列出了项目的名称和取值,为了系统的研究,针对图8,将FAT32分区格式的BPB含义和扩展BPB含义释义为表格,见表4和表5。
DBR的偏移0x5A开始的数据为操作系统引导代码。这是由偏移0x00开始的跳转指令所指向的。在图8所列出的偏移0x00~0x02的跳转指令"EB 58 90"清楚地指明了OS引导代码的偏移位置。jump 58H加上跳转指令所需的位移量,即开始于0x5A。此段指令在不同的操作系统上和不同的引导方式上,其内容也是不同的。大多数的资料上都说win98,构建于fat基本分区上的win2000,winxp所使用的DBR只占用基本分区的第0扇区。他们提到,对于fat32,一般的32个基本分区保留扇区只有第0扇区是有用的。实际上,以FAT32构建的操作系统如果是win98,系统会使用基本分区的第0扇区和第2扇区存储os引导代码;以FAT32构建的操作系统如果是win2000或winxp,系统会使用基本分区的第0扇区和第0xC扇区(win2000或winxp,其第0xC的位置由第0扇区的0xAB偏移指出)存储os引导代码。所以,在fat32分区格式上,如果DBR一扇区的内容正确而缺少第2扇区(win98系统)或第0xC扇区(win2000或winxp系统),系统也是无法启动的。如果自己手动设置NTLDR双系统,必须知道这一点。
DBR扇区的最后两个字节一般存储值为0x55AA的DBR有效标志,对于其他的取值,系统将不会执行DBR相关指令。上面提到的其他几个参与os引导的扇区也需以0x55AA为合法结束标志。
FAT16 DBR:
FAT32中DBR的含义大致如此,对于FAT12和FAT16其基本意义类似,只是相关偏移量和参数意义有小的差异,FAT格式的区别和来因,以后会说到,此处不在多说FAT12与FAT16。我将FAT16的扇区参数意义列表。感兴趣的朋友自己研究一下,和FAT32大同小异的。
4.2 关于保留扇区
在上述FAT文件系统DBR的偏移0x0E处,用2个字节存储保留扇区的数目。所谓保留扇区(有时候会叫系统扇区,隐藏扇区),是指从分区DBR扇区开始的仅为系统所有的扇区,包括DBR扇区。在FAT16文件系统中,保留扇区的数据通常设置为1,即仅仅DBR扇区。而在FAT32中,保留扇区的数据通常取为32,有时候用Partition Magic分过的FAT32分区会设置36个保留扇区,有的工具可能会设置63个保留扇区。
FAT32中的保留扇区除了磁盘总第0扇区用作DBR,总第2扇区(win98系统)或总第0xC扇区(win2000,winxp)用作OS引导代码扩展部分外,其余扇区都不参与操作系统管理与磁盘数据管理,通常情况下是没作用的。操作系统之所以在FAT32中设置保留扇区,是为了对DBR作备份或留待以后升级时用。FAT32中,DBR偏移0x34占2字节的数据指明了DBR备份扇区所在,一般为0x06,即第6扇区。当FAT32分区DBR扇区被破坏导致分区无法访问时。可以用第6扇区的原备份替换第0扇区来找回数据。
4.3 FAT表和数据的存储原则。
FAT表(File Allocation Table 文件分配表),是Microsoft在FAT文件系统中用于磁盘数据(文件)索引和定位引进的一种链式结构。假如把磁盘比作一本书,FAT表可以认为相当于书中的目录,而文件就是各个章节的内容。但FAT表的表示方法却与目录有很大的不同。
在FAT文件系统中,文件的存储依照FAT表制定的簇链式数据结构来进行。同时,FAT文件系统将组织数据时使用的目录也抽象为文件,以简化对数据的管理。
★存储过程假想:
我们模拟对一个分区存储数据的过程来说明FAT文件系统中数据的存储原则。
假定现在有一个空的完全没有存放数据的磁盘,大小为100KB,我们将其想象为线形的空间地址。为了存储管理上的便利,我们人为的将这100KB的空间均分成100份,每份1KB。我们来依次存储这样几个文件:A.TXT(大小10KB),B.TXT(大小53.6KB),C.TXT(大小20.5KB)。
最起码能够想到,我们可以顺序的在这100KB空间中存放这3个文件。同时不要忘了,我们还要记下他们的大小和开始的位置,这样下次要用时才能找的到,这就像是目录。为了便于查找,我们假定用第1K的空间来存储他们的特征(属性)。还有,我们设计的存储单位是1KB,所以,A.TXT我们需要10个存储单位(为了说明方便,我们把存储单位叫做“簇”吧。也能少打点字,呵呵。),B.TXT需要54个簇,C.TXT需要21个簇。可能有人会说B.TXT和C.TXT不是各自浪费了不到1簇的空间吗?干嘛不让他们紧挨着,不是省地方吗?我的回答是,如果按照这样的方式存储,目录中原本只需要记下簇号,现在还需要记下簇内的偏移,这样会增加目录的存储量,而且存取没有了规则,读取也不太方便,是得不偿失的。
根据上面所说的思想,我们设计了这样的图4.3.1所示的存储方式。
我们再考虑如何来写这三个文件的目录。对于每个文件而言,一定要记录的有:文件名,开始簇,大小,创建日期、时间,修改日期、时间,文件的读写属性等。这里大小能不能用结束簇来计算呢?一定不能,因为文件的大小不一定就是整数个簇的大小,否则的话像B.TXT的内容就是54KB的内容了,少了固然不行,可多了也是不行的。那么我们怎么记录呢?可以想象一下。为了管理上的方便,我们用数据库的管理方式来管理我们的目录。于是我把1KB再分成10份,假定开始簇号为0,定义每份100B的各个位置的代表含义如图4.3.2
这样设计的结构绝对可以对文件进行正确的读写了。接着让我们设计的文件系统工作吧。先改动个文件,比如A.TXT,增加点内容吧!咦?增加后往哪里放呀,虽然存储块的后面有很多空间,但紧随其后B.TXT的数据还顶着呢?要是把A.TXT移到后边太浪费处理资源,而且也不一定解决问题。这个问题看来暂时解决不了。
那我们换个操作,把B.txt删了,b.txt的空间随之释放。这时候空间如图4.3.3,目录如图4.3.4
这个操作看来还可以,我们接着做,在存入一个文件D.txt(大小为60.3KB),总共100簇的空间只用了31簇,还有68簇剩余,按说能放下。可是?往那里放呢?没有61个连续的空间了,目录行没办法写了,看来无连续块存储暂时也不行。
你一定能够想到我们可以在连续空间不够或增加文件长度的时候转移影响我们操作的其他文件,从而腾出空间来,但我要问你,那不是成天啥也不要干了,就是倒腾东西了吗?
看来我们设计的文件系统有致命的漏洞,怎么解决呢?。。。。
。。。。。。
其实可以这样解决:
首先我们允许文件的不连续存储。目录中依然只记录开始簇和文件的大小。那么我们怎么记录文件占用那些簇呢,以文件映射簇不太方便,因为文件名是不固定的。我们换个思想,可以用簇来映射文件,在整个存储空间的前部留下几簇来记录数据区中数据与簇号的关系。对于上例因为总空间也不大,所以用前部的1Kb的空间来记录这种对应,假设3个文件都存储,空间分配如图4.3.5,同时修改一下目录,如图4.3.6
第一簇用来记录数据区中每一簇的被占用情况,暂时称其为文件分配表。结合文件分配表和文件目录就可以达到完全的文件读取了。我们想到,把文件分配表做成一个数据表,以图4.3.7的形式记录簇与数据的对应。
用图4.3.7的组织方式是完全可以实现对文件占有簇的记录的。但还不够效率。比如文件名在文件分配表中记录太多,浪费空间,而实际上在目录中已经记录了文件的开始簇了。所以可以改良一下,用链的方式来存放占有簇的关系,变成图4.3.8的组织方式。
参照图4.3.8来理解一下文件分配表的意义。如文件a.txt我们根据目录项中指定的a.txt的首簇为2,然后找到文件分配表的第2簇记录,上面登记的是3,我们就能确定下一簇是3。找到文件分配表的第3簇记录,上面登记的是4,我们就能确定下一簇是4......直到指到第11簇,发现下一个指向是FF,就是结束。文件便丝毫无误读取完毕。
我们再看上面提到的第三种情况,就是将b.txt删除以后,存入一个大小为60.3KB的d.txt。利用簇链可以很容易的实现。实现后的磁盘如图4.3.9 4.3.10 4.3.11
上面是我们对文件存储的一种假设,也该揭开谜底的时候了。上面的思想其实就是fat文件系统的思想的精髓(但并不是,尤其像具体的参数的意义与我们所举的例子是完全不同的。请忘掉上边细节,努力记忆下边)。
### IV. Principle of FAT Partitions
First, a structure diagram:
We now focus on how data is stored inside a FAT-formatted partition. FAT is the earliest partition format supported by Microsoft. It comes in three variants, FAT12, FAT16, and FAT32, named after the number of bits each cluster entry occupies in the FAT (these concepts are explained later), but the basic storage scheme is similar in all three.
Study the layout of FAT16 and FAT32 partitions in Figure 7 carefully. The sections below explain, in order, the DBR, FAT1, FAT2, the root directory, the data area, and the remaining sectors. Unless stated otherwise, all addresses are offsets within the partition.
#### 4.1 About the DBR
DBR (DOS Boot Record) is the operating system boot record. It normally occupies sector 0 of the partition, 512 bytes in all (in some cases other reserved sectors are used as well; we start with sector 0). These 512 bytes consist of a jump instruction, the vendor ID and operating system version, the BPB (BIOS Parameter Block), the extended BPB, the OS boot code, and an end signature. FAT32, the most widely used variant, serves as the example for the meaning of each byte in the DBR; see Figure 8.
Table 3 explains Figure 8.
Figure 9 shows how WinHex interprets the DBR fields in Figure 8:
Based on the figures above, let's go through the meaning of each field in the DBR.
The MBR transfers CPU execution to the boot sector, so the first three bytes of the boot sector must be valid, executable x86 instructions. Usually they are a jump instruction that skips the non-executable bytes that follow (the BPB and extended BPB) and lands on the operating system boot code.
The jump instruction is followed by the 8-byte OEM ID, a string that identifies the name and version of the operating system that formatted the partition. To stay compatible with MS-DOS, Windows 2000 normally records "MSDOS5.0" in this field on FAT16 and FAT32 disks; on NTFS disks (NTFS is covered separately) it records "NTFS". Disks formatted by Windows 95 usually show "MSWIN4.0" in the OEM ID field, and disks formatted by Windows 95 OSR2 or Windows 98 show "MSWIN4.1".
Starting at offset 0x0B is a block of information that lets the executable boot code find the parameters it needs, usually called the BPB (BIOS Parameter Block). The BPB always starts at the same offset, so the standard parameters sit at known positions. Disk capacity and geometry values are all stored in the BPB. Because the boot sector begins with an x86 jump instruction, the BPB can be extended in the future by appending new fields at its end; only a small adjustment to the jump instruction is needed to accommodate the change. Figure 9 already lists the field names and values. For a systematic treatment, the meaning of the FAT32 BPB and extended BPB fields in Figure 8 is laid out in Table 4 and Table 5.
The operating system boot code starts at offset 0x5A of the DBR, which is where the jump instruction at offset 0x00 points. The jump instruction "EB 58 90" at offsets 0x00 to 0x02 in Figure 8 shows this clearly: a jump of 58H, plus the 2 bytes occupied by the jump instruction itself, lands at 0x5A. The content of this code differs between operating systems and boot methods. Most references say that the DBR used by Win98, and by Win2000 and WinXP installed on a FAT primary partition, occupies only sector 0 of the primary partition, and that on FAT32 only sector 0 of the usual 32 reserved sectors is useful. In fact, on a FAT32 partition Win98 stores its OS boot code in sector 0 and sector 2 of the primary partition, while Win2000 and WinXP store it in sector 0 and sector 0xC (the location 0xC is given at offset 0xAB of sector 0). So on a FAT32 partition, if sector 0 of the DBR is intact but sector 2 (Win98) or sector 0xC (Win2000 or WinXP) is missing, the system still cannot boot. Anyone setting up an NTLDR dual-boot configuration by hand needs to know this.
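The arithmetic behind "EB 58 90 lands at 0x5A" can be checked with a small Python sketch. It only handles the short-jump opcode EBH seen in the figure; the function name is my own.

```python
def short_jump_target(prefix: bytes) -> int:
    """Offset reached by a short JMP (opcode EB) at the start of a boot sector."""
    opcode, displacement = prefix[0], prefix[1]
    if opcode != 0xEB:
        raise ValueError("not a short JMP instruction")
    if displacement >= 0x80:       # the 8-bit displacement is signed
        displacement -= 0x100
    # The displacement counts from the byte after the 2-byte instruction.
    return 2 + displacement


target = short_jump_target(bytes.fromhex("EB5890"))
print(hex(target))
```

The third byte, 90H, is a NOP that pads the jump to three bytes.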
The last two bytes of the DBR sector normally hold the valid-DBR signature 0x55AA; with any other value the system will not execute the DBR code. The other sectors involved in OS boot mentioned above must also end with 0x55AA.
FAT16 DBR:
That is roughly what the DBR means on FAT32. On FAT12 and FAT16 the basic meaning is similar, with small differences in some offsets and parameters. The differences between the FAT variants and their origins are covered later, so I will not say more about FAT12 and FAT16 here. I have listed the meaning of the FAT16 sector parameters in a table; interested readers can study it themselves, and it differs only slightly from FAT32.
#### 4.2 About Reserved Sectors
Offset 0x0E of the FAT DBR described above stores the number of reserved sectors in 2 bytes. Reserved sectors (sometimes called system sectors or hidden sectors) are the sectors, starting at and including the partition's DBR sector, that belong to the system alone. On FAT16 the reserved sector count is usually 1, that is, just the DBR sector. On FAT32 it is usually 32; FAT32 partitions created with Partition Magic sometimes have 36 reserved sectors, and some tools may use 63.
On FAT32, apart from sector 0, which holds the DBR, and sector 2 (Win98) or sector 0xC (Win2000, WinXP), which holds the extended part of the OS boot code, the reserved sectors take no part in operating system or disk data management and normally serve no purpose. The operating system reserves them on FAT32 to hold a backup of the DBR and to leave room for future upgrades. On FAT32, the 2-byte value at DBR offset 0x32 gives the location of the DBR backup sector, normally 0x06, that is, sector 6. When the DBR of a FAT32 partition is damaged and the partition becomes inaccessible, the backup in sector 6 can be copied over sector 0 to recover the data.
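Reading these fields out of a DBR might look like the Python sketch below. The sector is synthetic and built in the script itself; the function names are hypothetical. Note that the sketch reads the backup boot sector number at offset 0x32, which is where Microsoft's FAT32 specification places that field.

```python
import struct


def make_sample_dbr() -> bytes:
    """Build a synthetic FAT32 DBR containing only the fields used below."""
    dbr = bytearray(512)
    dbr[0:3] = bytes.fromhex("EB5890")        # jump instruction
    dbr[3:11] = b"MSDOS5.0"                   # 8-byte OEM ID
    struct.pack_into("<H", dbr, 0x0B, 512)    # bytes per sector
    struct.pack_into("<H", dbr, 0x0E, 32)     # reserved sector count
    struct.pack_into("<H", dbr, 0x32, 6)      # backup boot sector
    dbr[510:512] = b"\x55\xaa"                # end signature
    return bytes(dbr)


def reserved_info(dbr: bytes) -> dict:
    """Extract the OEM ID, reserved sector count and backup DBR location."""
    if dbr[510:512] != b"\x55\xaa":
        raise ValueError("DBR signature 55 AA not found")
    return {
        "oem_id": dbr[0x03:0x0B].decode("ascii", errors="replace"),
        "bytes_per_sector": struct.unpack_from("<H", dbr, 0x0B)[0],
        "reserved_sectors": struct.unpack_from("<H", dbr, 0x0E)[0],
        "backup_boot_sector": struct.unpack_from("<H", dbr, 0x32)[0],
    }


info = reserved_info(make_sample_dbr())
print(info)
```

With values like these, a recovery tool would look for the DBR copy in sector 6 of the partition when sector 0 is damaged.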
#### 4.3 Storage Principles of the FAT and Data
The FAT (File Allocation Table) is a chained structure that Microsoft introduced in the FAT file system to index and locate data (files) on disk. If the disk is compared to a book, the FAT corresponds to the table of contents and the files to the chapters. The way the FAT represents this, however, is very different from a table of contents.
In the FAT file system, files are stored according to the cluster-chain data structure defined by the FAT. The directories used to organize data are themselves treated as files, which simplifies data management.
★ A thought experiment on storage:
We simulate storing data on a partition to illustrate the storage principles of the FAT file system.
Suppose we have a completely empty disk of 100KB, which we picture as a linear address space. To make storage easier to manage, we divide the 100KB into 100 equal units of 1KB each. We then store the following files in turn: A.TXT (10KB), B.TXT (53.6KB), and C.TXT (20.5KB).
The first idea is to store the three files one after another in the 100KB space. We must also record their sizes and starting positions so that we can find them again later; this record is like a table of contents. To make lookups easy, assume the first 1KB is used to store these properties (attributes). Since our storage unit is 1KB, A.TXT needs 10 storage units (for convenience, and to save some typing, let's call the storage unit a "cluster"), B.TXT needs 54 clusters, and C.TXT needs 21 clusters. Someone may object that B.TXT and C.TXT each waste part of a cluster: why not pack them right next to each other and save space? My answer is that the directory would then have to record an offset within the cluster as well as a cluster number. That makes the directory larger, makes access irregular, and makes reading less convenient, so the loss outweighs the gain.
Following this idea, we arrive at the storage layout shown in Figure 4.3.1.
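The cluster counts and the back-to-back layout can be reproduced with a few lines of Python. Since the figure is not available here, the sketch assumes cluster 0 holds the directory and file data starts at cluster 1; the function names are my own.

```python
import math

CLUSTER_KB = 1  # storage unit of the thought experiment


def clusters_needed(size_kb):
    """A file always occupies a whole number of clusters."""
    return math.ceil(size_kb / CLUSTER_KB)


def contiguous_layout(files, first_data_cluster=1):
    """Place files one after another; return name -> (start cluster, count)."""
    layout, next_free = {}, first_data_cluster
    for name, size_kb in files:
        count = clusters_needed(size_kb)
        layout[name] = (next_free, count)
        next_free += count
    return layout


files = [("A.TXT", 10), ("B.TXT", 53.6), ("C.TXT", 20.5)]
layout = contiguous_layout(files)
for name, (start, count) in layout.items():
    print(name, "starts at cluster", start, "and uses", count, "clusters")
```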
Let's consider how to write the directory of these three files. For each file, what must be recorded are: file name, starting cluster, size, creation date/time, modification date/time, file read/write attributes, etc. Can the size be calculated by the end cluster? It must not be, because the size of the file is not necessarily an integer multiple of the cluster size. Otherwise, the content of B.TXT would be 54KB, which is not acceptable if it is less, but also not acceptable if it is more. So how do we record it? You can imagine it. For the convenience of management, we manage our directory in the way of database management. So I divide 1KB into 10 parts again, and assume that the starting cluster number is 0, and define the representative meaning of each position in each 100B as shown in Figure 4.3.2
The structure designed in this way can definitely correctly read and write files. Then let's make the file system we designed work. Let's change a file, for example, A.TXT, and add some content! Hey? Where to put it after adding? Although there is a lot of space behind the storage block, the data of B.TXT is still there! If we move A.TXT to the back, it is a waste of processing resources, and it may not solve the problem. This problem seems to be temporarily unsolvable.
Then let's change the operation, delete B.txt, and the space of b.txt is released. At this time, the space is shown in Figure 4.3.3, and the directory is shown in Figure 4.3.4
This operation seems okay. Let's continue to do it. Store a file D.txt (size 60.3KB). A total of 100 clusters of space only use 31 clusters, and there are 68 clusters remaining. According to reason, it can be put down. But? Where to put it? There are no 61 consecutive spaces, and the directory line can't be written. It seems that it is also not possible to store without continuous blocks.
You must be able to think that we can transfer other files that affect our operations when the continuous space is not enough or when the file length is increased, so as to free up space, but I want to ask you, isn't that all day long doing nothing but moving things?
It seems the file system we designed has a fatal flaw. How can we solve it?
......
In fact, it can be solved like this:
First, we allow files to be stored non-contiguously. The directory still records only the starting cluster and the size of each file. Then how do we record which clusters a file occupies? Mapping from files to clusters is inconvenient, because file names are not fixed. So we turn the idea around and map from clusters to files: we set aside a few clusters at the front of the storage space to record which data each cluster in the data area holds. In our example the total space is small, so the first 1KB is enough to record this correspondence. Assume all 3 files have been stored; the space allocation is then as shown in Figure 4.3.5, and the modified directory as shown in Figure 4.3.6.
The first cluster records the usage of every cluster in the data area; for now we call it the file allocation table. The file allocation table and the file directory together are enough to read a file completely. Our first thought is to make the file allocation table a plain data table that records the correspondence between clusters and data in the form shown in Figure 4.3.7.
The layout in Figure 4.3.7 does record which clusters each file occupies, but it is not efficient. For example, the file name is repeated many times in the file allocation table, which wastes space. The starting cluster of each file is already recorded in the directory, so we can do better: store the occupied clusters as a chain, which gives the layout shown in Figure 4.3.8.
Refer to Figure 4.3.8 to see how the file allocation table is read. Take a.txt: its directory entry gives 2 as the first cluster, so we look up the record for cluster 2 in the file allocation table; the next cluster registered there is 3. We look up the record for cluster 3, where the next cluster registered is 4, and so on, until we reach cluster 11 and find that its next pointer is FF, which marks the end. The file has been read without error.
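The lookup just described can be sketched as a small loop in C. This models only the toy table of this example (one byte per cluster, FF as the end marker). The names `toy_follow_chain` and `toy_build_a_txt` are made up for illustration, and real FAT entries are wider and use different marker values.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_END 0xFF   /* end-of-chain marker used in the toy example */

/* Follow a chain in a toy allocation table where table[n] holds the number
   of the cluster that comes after cluster n. Visited cluster numbers are
   written to out; the return value is how many were visited. max_out also
   acts as a guard against a chain that loops back on itself. */
size_t toy_follow_chain(const unsigned char *table, size_t table_len,
                        unsigned char first, unsigned char *out, size_t max_out)
{
    size_t count = 0;
    unsigned char cur = first;

    assert(table != NULL && out != NULL);
    while (cur != TOY_END && cur < table_len && count < max_out) {
        out[count++] = cur;
        cur = table[cur];
    }
    return count;
}

/* Fill in the a.txt chain from the walk-through: 2 -> 3 -> ... -> 11 -> FF. */
void toy_build_a_txt(unsigned char *table, size_t table_len)
{
    unsigned char c;

    assert(table != NULL && table_len >= 12);
    for (c = 2; c < 11; c++)
        table[c] = (unsigned char)(c + 1);
    table[11] = TOY_END;
}
```

Starting from cluster 2, the loop visits the ten clusters 2 through 11, which is exactly the 10KB of a.txt. Because only the directory's starting cluster and the table are consulted, the clusters of a file no longer need to be adjacent.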
Now look at the third situation mentioned above: after deleting b.txt, store a d.txt of 60.3KB. With cluster chains this is easy. The resulting disk is shown in Figures 4.3.9, 4.3.10 and 4.3.11.
The above was a thought experiment about file storage, and it is time to lift the veil. The idea is in fact the essence of the FAT file system. (It is not the real thing, though; in particular, the meaning of the specific parameters is completely different from our example. Please forget the details above and try to remember what follows.)
2004-4-21 00:00
『Post 3』:
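★FAT16 storage principle:
When part of the disk space is formatted as a FAT file system, the file system treats the partition as one allocatable region and lays it out for data storage. In general, the layout is as shown in Figure 7. We take the FAT16 part and describe it in detail:
FAT16 is one of Microsoft's earlier file systems. It is highly compatible and is still widely used in personal computers, especially on removable storage devices. Put simply, FAT16 consists of the 6 parts shown in Figure 4.3.12 (mainly the first 5). The boot sector (DBR) has already been covered. FAT16 leaves no further reserved sectors after the DBR; the FAT follows immediately. The FAT is what FAT16 uses to record the cluster-chain structure of the data area. As in our earlier example, FAT divides the disk space into units of a fixed number of sectors, and such a unit is called a cluster. Sectors are normally always 512 bytes. The cluster size is generally 2^n sectors (n an integer), i.e. 512B, 1K, 2K, 4K, 8K, 16K, 32K or 64K; in practice it usually does not exceed 32K. Clusters rather than sectors are used as the allocation unit because on a large partition, managing space in 512B sectors would inflate the number of FAT entries, add overhead when accessing large files, and make the file system inefficient. The cluster size depends on the partition size; see Table 9.
Note: In partitions with fewer than 32,680 sectors, the cluster size can be at most 8 sectors per cluster, and whether the user formats the partition with the disk manager or by typing the format command at the command prompt, the formatter creates a 12-bit FAT. Partitions smaller than 16MB are usually formatted with a 12-bit FAT. FAT12 is the original implementation of FAT and is aimed at small media. A FAT12 file allocation table is smaller than a FAT16 or FAT32 one because each entry uses less space, which leaves more room for data. All 5.25-inch floppy disks and 1.44MB 3.5-inch floppy disks are formatted with FAT12. Apart from the number of bits used for each cluster link in the FAT, FAT12 works on the same principles as FAT16 and is not explained separately here.
When a FAT16 partition is formatted, the formatter first chooses the cluster size from the partition size. It then determines how many sectors the FAT needs from the number of reserved sectors, the number of root directory sectors, the number of clusters the data area can be divided into, and the space taken by the FAT itself, and writes the results to the relevant fields of the DBR.
In the FAT16 DBR, offset 0x11 records the number of root directory entries, which fixes the number of sectors the root directory occupies (512 entries × 32 bytes = 32 sectors of 512 bytes); offset 0x16 records the number of sectors occupied by one FAT; offset 0x10 records the number of FAT copies. Once the system has these parameters, it can work out the sector offset at which the data area starts.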
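The FAT16 file system handles data in units of clusters starting from the first sector after the 32 sectors occupied by the root directory; everything before that is still addressed in sectors. The first cluster after the root directory is not numbered cluster 0 or cluster 1 (perhaps because those values are reserved as special markers) but cluster 2. In other words, the cluster that comes first in the data area carries the number 2.
The FAT file system comes in 12-, 16- and 32-bit versions, and the fundamental difference is the number of bits a FAT entry uses to record the link for a cluster. In FAT16, each cluster takes 2 bytes (16 bits) in the FAT, so the largest cluster number FAT16 can represent is 0xFFFF (65535 decimal). With a cluster size of 32K, the largest space FAT16 can manage is about 32KB × 65535 ≈ 2048MB, which is why FAT16 does not support partitions larger than 2GB.
The FAT is really just a data table whose unit is 2 bytes; for now we call this unit a FAT entry. Normally the 1st and 2nd entries (the first 4 bytes) are used as the media descriptor. Starting with the third entry, the table records the cluster chains of all files and folders other than the root directory. FAT describes the state of each cluster with a corresponding value; see Table 10.
Here is a FAT16 file allocation table captured in WinHex, Figure 10:
As shown, the FAT begins with "F8 FF FF FF". These 4 bytes (two entries) are the media descriptor and take no part in the cluster chains. The small red figures mark the cluster number that each 2 bytes of the FAT sector corresponds to.
Relative offset 0x4~0x5 is the entry for cluster 2 (the first cluster in sequence). It holds FF FF, which means the file (or directory) stored in cluster 2 is small and ends after occupying just 1 cluster.
The entry for cluster 3 holds 0x0005. Cluster 3 is the first cluster of a file or folder, and the value says the next cluster is cluster 5. Following the FAT to the entry for cluster 5, we find "FF FF", meaning the file ends at that cluster.
The entry for cluster 4 holds 0x0006. Cluster 4 is the first cluster of another file or folder, and the next cluster is cluster 6. The entry for cluster 6 holds 0x0007, so the next cluster is cluster 7, and so on, following the FAT chain until we reach relative offset 0x1A~0x1B, the entry for cluster 13, which holds 0x000E and so points to cluster 14. The entry for cluster 14 is "FF FF", meaning the file ends at that cluster.
The rest of the FAT data follows the same logic and is not analyzed further.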
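The same walk can be written as a short C routine over the raw bytes of a FAT16 table. This is a sketch under stated assumptions: the function names are hypothetical, and the stop condition uses the commonly documented FAT16 value ranges (0x0000 free, 0xFFF0 and above reserved, bad or end of chain) rather than the post's Table 10, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* FAT16 entry for cluster n: two bytes at offset n*2, low byte first. */
uint16_t fat16_entry(const uint8_t *fat, uint32_t cluster)
{
    return (uint16_t)(fat[cluster * 2] | (fat[cluster * 2 + 1] << 8));
}

/* Collect the clusters of one chain, starting at first. Returns the number
   of clusters written to out (at most max_out, which also bounds a
   corrupted chain that loops). */
size_t fat16_follow_chain(const uint8_t *fat, size_t fat_bytes,
                          uint16_t first, uint16_t *out, size_t max_out)
{
    size_t count = 0;
    uint32_t cur = first;

    assert(fat != NULL && out != NULL);
    while (cur >= 2 && cur * 2 + 1 < fat_bytes && count < max_out) {
        uint16_t next = fat16_entry(fat, cur);
        out[count++] = (uint16_t)cur;
        /* 0 and 1 are not valid links; 0xFFF0 and above covers the
           reserved, bad-cluster and end-of-chain values */
        if (next < 2 || next >= 0xFFF0)
            break;
        cur = next;
    }
    return count;
}
```

Fed a table that begins F8 FF FF FF, FF FF, 05 00, 06 00, FF FF, 07 00 and continues sequentially up to 0E 00 for cluster 13 and FF FF for cluster 14 (the entries for clusters 7 to 12 are assumed here, since the text skips them), the routine returns the single cluster 2 for the first file, clusters 3 and 5 for the second, and clusters 4, 6, 7 ... 14 for the third.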
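The FAT records the linked lists that describe where file data is stored, so it is extremely important for reading data; so much so that Microsoft keeps a backup of the FAT in the FAT file system, which is the FAT2 we see. FAT2 is normally kept in sync with FAT1 in real time: if FAT1 is changed through normal system reads and writes, FAT2 is updated as well. Seen from that angle, this feature is a disaster for data recovery.
The directory structure of the FAT file system is in fact a directed tree from root to leaves. "Directed" here means that any file (or folder) in a FAT partition has to be located by addressing from the root directory. The root directory can be regarded as the entry point of the directory structure.
Because FAT locates every other file (and folder) through the root directory, the position of the root directory must be known before any data on the disk can be accessed. The file system determines it from the partition's DBR parameters and the already-computed size of the FAT (two copies) stored in the DBR. After formatting, the size and position of the root directory are in fact fixed: it immediately follows FAT2, and its size is usually 32 sectors. Directly after the root directory comes cluster 2 of the data area.
An important idea in the FAT file system is to treat a directory (folder) as a special kind of file. FAT32 even treats the root directory as a file (as an aside, NTFS takes the idea further and abstracts partition parameters, security permissions and much else as files). In FAT16 the root directory does not have quite the same status as an ordinary file or directory, but it is organized no differently from an ordinary directory (folder). Every folder (directory) file in a FAT partition can be regarded as a data table holding the entry parameters of other files (folders). The space a directory occupies is therefore not the total size of everything under it, but it is not 0 either. It usually takes very little space, and a directory file can be viewed as a simple two-dimensional table. The storage principle is as follows:
However many clusters a directory file occupies, and however many bytes a cluster holds, the system divides the directory file's clusters into units of 32 bytes. Each 32-byte unit describes, at fixed offsets, the attributes of one file (or folder) in this directory; it is effectively one row of a simple two-dimensional table.
The byte offsets within these 32 bytes are defined in Table 11: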
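Notes on some of the values in Table 11:
(1) For short file names, the system stores the name in two parts: the base name and the extension. Bytes 0x0~0x7 hold the base name and bytes 0x8~0xA hold the extension, as the ASCII codes of the characters. The "." between base name and extension is not stored. A base name shorter than 8 characters is padded with spaces (20H), and an extension shorter than 3 characters is likewise padded with spaces (20H). If the byte at offset 0x0 is 00H, the directory entry is empty; if it is E5H, the entry was once in use but the corresponding file or folder has been deleted (this is the theoretical basis for recovering accidentally deleted files). A name of "." or ".." indicates an entry inside a subdirectory: "." stands for the current directory and ".." for the parent directory (the same meaning as in DOS or Windows). If the disk data has been damaged, the parameters in these two entries can be used to work out where the data area starts, estimate the cluster size, and so on, so they are quite important.
(2) The attribute field at 0xB: think of the system as splitting the byte at 0xB into 8 bits, each of which indicates the presence or absence of one attribute. Different combinations of the 8 bits then express the different attribute settings. For example, 00000101 means this is a file whose attributes are read-only and system.
(3) 0xC~0x15 are reserved and unused in the original FAT16 definition. Later versions of Windows sometimes use them to record the creation time and the last access date, in which case the fields have the same meaning as in FAT32; see the FAT32 section below.
(4) The time at 0x16~0x17 = hour × 2048 + minute × 32 + second / 2, stored as a 16-bit value. That is: bits 0~4 of byte 0x16 hold the seconds in units of 2 seconds; bits 5~7 of byte 0x16 together with bits 0~2 of byte 0x17 hold the minutes; bits 3~7 of byte 0x17 hold the hours.
(5) The date at 0x18~0x19 = (year - 1980) × 512 + month × 32 + day, stored as a 16-bit value. That is: bits 0~4 of byte 0x18 hold the day; bits 5~7 of byte 0x18 together with bit 0 of byte 0x19 hold the month; bits 1~7 of byte 0x19 hold the year. In the original definition 0~119 stand for 1980~2099; current versions of Windows allow 0~127, so the year can go up to 2107.
(6) 0x1A~0x1B hold the first cluster number of the file or directory. The system uses this first cluster number to find the entry point in the FAT and then follows the cluster chain to its end, using the bytes at 0x1C~0x1F (the file size) to determine how much of the data is valid. The file (or directory) can then be read without error.
(7) An ordinary subdirectory is likewise located through a directory entry in its parent directory. The difference from a data file (that is, a non-directory file) is that bit 4 of the byte at offset 0xB is set to 1, whereas for a data file it is 0.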
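Across a whole FAT partition, the sectors do not always divide evenly into clusters. For example, in a FAT system whose data area has 99 sectors, a cluster size of 2 sectors leaves 1 sector that cannot be assigned to any cluster. These are the partition's leftover sectors, located at the end of the partition. Some systems use the last leftover sector to back up the partition's DBR, which is also a good backup method.
Early FAT16 systems had no notion of long file names, but Windows now fully supports long file names on FAT16. Long file names on FAT16 are defined exactly as on FAT32, so they are explained in detail in the FAT32 section.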
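★FAT32 storage principle:
FAT32 is a file system that has served very well. Microsoft designed and deployed it successfully, and even today, with NTFS sweeping in everywhere, FAT32 still holds an important place among the Microsoft Windows file systems. FAT32 was originally designed to address FAT16's shortcomings, such as no support for large partitions and large clusters that waste a great deal of space. In practical use, FAT32 has been a success.
FAT32 works on basically the same principles as FAT16. Figure 4.3.13 shows the basic structure of a FAT32 partition.
FAT32 builds its DBR during formatting according to the characteristics of the partition. The BPB parameters in it are very important; you may want to look back at Table 4 and Table 5. First, FAT32 has 32 reserved sectors by default, rather than FAT16's single one. This leaves room for the DBR code to grow and for a backup copy of the DBR sector. As mentioned above, the boot code of operating systems built on FAT32, such as Win98, Win2000 or WinXP, no longer fits in a single sector, and the extra reserved sectors give the OS boot code room to expand. The BPB also records the sector number of the backup DBR sector, which lets us restore the DBR if the disk is accidentally damaged.
The FAT32 file allocation table has the same data structure as in FAT16. The difference is that FAT32 widens each cluster-chain entry to 32 bits, hence the name FAT32. A 32-bit entry can in principle number 2^32 (about 4G) clusters, so even with clusters of a single sector the table could in theory cover a partition on the order of 2TB (in practice only the low 28 bits of each entry are used as the cluster number). Real FAT32 volumes do not get that large, however: as the partition grows, the FAT itself becomes bloated and seriously degrades system performance. In practice, partitions larger than 32GB are therefore usually not formatted as FAT32. Win2000 and later no longer directly support formatting a partition larger than 32GB as FAT32; Win98 can still format FAT32 partitions as large as 127GB, but that is unnecessary and not recommended. FAT32 also has a lower limit: a FAT32 volume must have at least 65527 clusters, so small partitions still need FAT16 or FAT12.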
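As a partition grows, the file allocation table grows with it if the clusters are small, and the efficiency problem described above returns. To read and write large files efficiently while wasting as little space as possible, FAT32 also prescribes a cluster size for each range of partition sizes; see Table 12:
The meaning of the FAT entry values in FAT32 is similar to FAT16, only with more bits. For a comparison see Table 13:
Another major change in FAT32 is that the root directory becomes a file, that is, it is treated like an ordinary file. The root directory is therefore no longer subject to FAT16's limit of 512 directory entries; when it runs out of room, the cluster chain is simply extended with free clusters. Nor is the position of the root directory rigidly fixed any more: it can be stored in any addressable cluster of the partition. The root directory is, however, usually the first directory table created (it is generated at format time), so in nearly every case we see, the first cluster of the root directory is the first cluster in sequence of the cluster area. Figure 4.3.12 is also drawn for this case.
FAT32 numbers clusters the same way as FAT16. The first cluster in sequence is still numbered cluster 2 and is usually used by the root directory. (This differs from FAT16, where the root directory takes no space in the cluster area; there the first cluster of the cluster area comes after the 32 sectors of the root directory.)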
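File addressing in FAT32 works as in FAT16, but the meaning of the bytes in a directory entry differs somewhat: FAT32 puts the reserved fields of the FAT16 directory entry to use, and it fully supports long file names.
For directory entries in short-name format, the parameters are defined in Table 14:
Notes:
(1) These are the meanings for FAT32 short-name directory entries. The file name, extension, time and date are computed exactly as in FAT16.
(2) Because FAT32 cluster numbers are up to 32 bits wide, the system also needs 32 bits to record the starting cluster of a file (or folder). FAT32 uses directory entry offsets 0x14~0x15 to hold the high 16 bits of the starting cluster number; the low 16 bits remain at 0x1A~0x1B.
(3) The file length is still held in 4 bytes, which means FAT32 only supports files smaller than 4GB; files larger than that are truncated by the system.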
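An important feature of FAT32 is full support for long file names. Long file names are still recorded in directory entries. So that older operating systems and programs can still read files with long names, the system automatically creates a matching short file name for every long-named file, so the data can be addressed by either name. An OS or program without long file name support ignores the long-name entries, which it considers invalid, while one with long file name support shows and edits the long name as the visible name and hides the short one.
When a file with a long name is created, the system automatically adds the matching short name, generally following these rules:
(1) Take the first 6 characters of the long file name and append "~1" to form the short file name; the extension is unchanged.
(2) If that file name already exists, the number after "~" is incremented, up to 5.
(3) If the number after "~" reaches 5, the short file name uses only the first two letters of the long file name. The next four characters of the short name are generated mathematically from the remaining letters of the long name, followed by the suffix "~1" (or another number if needed to avoid a duplicate file name).
(4) Characters that old operating systems or programs cannot read are replaced with "_".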
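Rules (1), (2) and (4) can be approximated in a few lines of C. This is a deliberately simplified sketch, not the algorithm Windows actually uses: the function names are invented, the caller supplies the number after "~", every character that is not a letter or digit (including spaces) is replaced with "_", and the hashed form of rule (3) is not implemented.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Map one character of the long name to a character allowed in the alias. */
static char alias_char(char c)
{
    unsigned char u = (unsigned char)c;
    return isalnum(u) ? (char)toupper(u) : '_';
}

/* Build an 8.3 alias such as "LONGFI~1.TXT" from long_name.
   n is the number placed after '~' (single digit only in this sketch). */
void make_short_alias(const char *long_name, int n, char out[13])
{
    char base[7], ext[4];
    size_t b = 0, x = 0;
    const char *dot, *p;

    assert(long_name != NULL && out != NULL && n >= 1 && n <= 9);
    dot = strrchr(long_name, '.');            /* last dot starts the extension */

    for (p = long_name; *p != '\0' && p != dot && b < 6; p++)
        base[b++] = alias_char(*p);           /* rule (1): first 6 characters */
    base[b] = '\0';

    if (dot != NULL)
        for (p = dot + 1; *p != '\0' && x < 3; p++)
            ext[x++] = alias_char(*p);        /* extension cut to 3 characters */
    ext[x] = '\0';

    if (x > 0)
        snprintf(out, 13, "%s~%d.%s", base, n, ext);
    else
        snprintf(out, 13, "%s~%d", base, n);
}
```

A caller implementing rule (2) would try n = 1, 2, ... and pick the first alias not already present in the directory.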
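Long file names rely on the attribute byte at offset 0xB of the directory entry. When this byte has the attributes read-only, hidden, system and volume label all set, i.e. its value is 0FH, DOS and WIN32 consider the entry invalid and ignore it. This is what makes long file names possible: with 0xB set to 0F, the rest of the entry is free for the system to define. Windows 9x, Windows 2000 and XP generally support long file names of up to 255 characters. The system cuts the long file name into pieces of 13 characters, each of which occupies one directory entry, so a single file may need several directory entries. In that case the long-name entries are placed in the directory table in reverse order, to avoid confusion with other file names.
The characters of a long file name are encoded in Unicode (a huge step forward), 2 bytes per character. The directory entry layout is defined in Table 15.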
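The 13-character slicing and the reverse ordering can be expressed as simple arithmetic. A minimal sketch, with hypothetical function names, counting characters rather than bytes:

```c
#include <assert.h>
#include <stddef.h>

#define LFN_CHARS_PER_ENTRY 13

/* Number of long-name directory entries needed for a name of name_len
   characters (the short-name entry that follows is not counted). */
size_t lfn_entries_needed(size_t name_len)
{
    assert(name_len >= 1 && name_len <= 255);
    return (name_len + LFN_CHARS_PER_ENTRY - 1) / LFN_CHARS_PER_ENTRY;
}

/* The entries are stored last slice first. For the entry at position pos
   in the directory (0 = the first long-name entry on disk), return the
   index of the first name character it holds. */
size_t lfn_slice_start(size_t name_len, size_t pos)
{
    size_t total = lfn_entries_needed(name_len);

    assert(pos < total);
    return (total - 1 - pos) * LFN_CHARS_PER_ENTRY;
}
```

A 30-character name therefore takes 3 long-name entries: the first one on disk holds characters 26 to 29, the second characters 13 to 25, and the third, right before the short-name entry, characters 0 to 12.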
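When storing a long file name, the system always writes the long-name directory entries first, in reverse order, followed immediately by the matching short-name entry. As Table 15 shows, the long-name entries do not store the file's starting cluster, its size, or any of its time and date attributes; those still live in the short-name entry. A long file name always corresponds one-to-one with its short file name. A short name can still be read without its long name, but a long name with no matching short name is ignored by every system, so the short file name is crucial. Changing the name or extension fields of the short name in an environment without long file name support (including deletion, since deleting rewrites the first character to E5H) leaves the long name orphaned. Position alone is clearly not enough to tie a long name to its short name. In fact, the checksum in byte 0xD of the long-name entry plays a very important role. It is computed from the 11 characters of the short file name, and the system uses it to decide whether a long name and a short name belong together. The algorithm is hard to express as a formula, so we illustrate it with a piece of C code.
Let the 11 characters of the short file name form the unsigned char array shortname, and let chksum be the checksum. It is obtained as follows:
unsigned char chksum = 0;  /* 8-bit sum; wraps modulo 256 */
int i;
for (i = 0; i < 11; i++)   /* rotate right by one bit, then add the next character */
    chksum = ((chksum & 1) ? 0x80 : 0) + (chksum >> 1) + shortname[i];
If the checksum computed from the short file name differs from the value at offset 0xD of a long-name entry, the system will never pair the two.
With the directory entry definitions for long and short file names, together with the numbering and linking of clusters, reading data from FAT32 becomes straightforward.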
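5. Conclusion.
This article comes from Data Recovery Net (数据恢复网, www.sjhf.net). Omissions are inevitable, and corrections are welcome. If you repost it, please keep this notice; if you need to modify it, please contact the author in one of the following ways:
1. http://www.sjhf.net
2. zymail@vip.sina.com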
★FAT16 Storage Principle:
When a part of the disk space is formatted into the FAT file system, the FAT file system plans this partition as a whole allocable area for data storage. Generally speaking, its division form is shown in Figure 7. We extract the FAT16 part and describe it in detail:
FAT16 is an earlier file system launched by Microsoft, with high compatibility. It is still widely used in personal computers, especially in mobile storage devices. Simply speaking, FAT16 is composed of 6 parts as shown in Figure 4.3.12 (mainly the first 5 parts). We have already talked about the boot sector (DBR). There are no reserved sectors left after DBR in FAT16, and the FAT table follows immediately. The FAT table is used by FAT16 to record the cluster chain structure of the disk data area. Just like the example we said before, the FAT divides the disk space into units of a certain number of sectors, and such a unit is called a cluster. Usually, the principle of 512 bytes per sector remains unchanged. The size of the cluster is generally 2n (n is an integer) sectors, such as 512B, 1K, 2K, 4K, 8K, 16K, 32K, 64K. In practice, it usually does not exceed 32K. The reason why the cluster is used as the unit instead of the sector for disk allocation is that when the partition capacity is large, using the sector management with a size of 512B will increase the number of items in the FAT table, increase the consumption for large file access, and the efficiency of the file system is not high. The size of the partition is related to the value of the cluster, see Table 9
Note: In partitions with less than 32680 sectors, the cluster space size can be up to 8 sectors per cluster. Whether the user uses the disk manager to format the partition or types the format command in the command prompt to format, the formatting program creates a 12-bit FAT. For partitions with less than 16MB, the system usually formats it into a 12-bit FAT. FAT12 is the initial implementation form of FAT, which is for small media. The FAT12 file allocation table is smaller than the FAT16 and FAT32 file allocation tables because it uses less space for each entry. This leaves more space for data. All 5.25-inch floppy disks formatted with FAT12 and 3.5-inch floppy disks with 1.44MB are formatted with FAT12. Except that the number of binary bits recording the cluster link in the FAT table is different from FAT16, the rest of the principles are the same as FAT16, and no separate explanation is given...
When formatting a FAT16 partition, the formatting program determines the size of the cluster according to the size of the partition, then determines the number of sectors required for the FAT table according to the number of reserved sectors, the number of sectors of the root directory, the number of clusters that can be divided in the data area, and the space occupied by the FAT table itself, and then writes the calculated result to the relevant position of the DBR.
The offset 0x11 of the FAT16 DBR parameter records the number of sectors occupied by the root directory. Offset 0x16 records the data of the number of sectors occupied by the FAT table. Offset 0x10 records the number of copies of the FAT table. After the system obtains these parameters, it can determine the start sector offset of the data area.
The FAT16 file system processes data in clusters starting from the first sector after the 32 sectors occupied by the root directory, and still uses sectors as the unit before that. For the first cluster after the root directory, the system does not number it as cluster 0 or cluster 1 (probably because it is reserved for keywords), but numbers it as cluster 2, that is, the first cluster in the data area order is also cluster 2 in numbering.
The reason why the FAT file system has different versions of 12, 16, and 32 lies in the number of binary bits used by the FAT table to record the link of any cluster. Taking FAT16 as an example, each cluster occupies 2 bytes (16 binary bits) in the FAT table. Therefore, the maximum cluster number that FAT16 can represent is 0xFFFF (65535 in decimal). If the cluster size is 32K, the maximum disk space that FAT32 can manage is: 32KB × 65535 = 2048MB, which is why FAT16 does not support partitions larger than 2GB.
The FAT table is actually a data table, with 2 bytes as a unit. We temporarily call this unit a FAT record entry. Usually, the first and second record entries (the first 4 bytes) are used as the media description. Starting from the third record entry, the cluster chain conditions of other files and folders except the root directory are recorded. According to the performance of the cluster, FAT uses corresponding values to describe it, see Table 10
Look at a FAT16 file allocation table intercepted in WinHex, Figure 10:
As shown in the figure, the FAT table starts with "F8 FF FF FF", and these 2 bytes are the media description unit, which does not participate in the cluster chain relationship of the FAT table. The small red characters mark the cluster number corresponding to each 2 bytes in the FAT sector.
The relative offset 0x4~0x5 is cluster 2 (the first cluster in order), where it is FF, indicating that the file (directory) stored in cluster 2 is a small file and ends with only 1 cluster.
The data stored in cluster 3 is 0x0005, which is the first cluster of a file or folder. Its content is cluster 5, which means that the next cluster is in cluster 5 → The FAT table guides us to the cluster 5 pointer in the FAT table, and the data written above is "FF FF", meaning that this file has reached the end cluster.
The data stored in cluster 4 is 0x0006, which is the first cluster of another file or folder. Its content is cluster 6, which means that the next cluster is in cluster 6 → The FAT table guides us to the cluster 6 pointer in the FAT table, and the data written above is 0x0007, which means that the next cluster is in cluster 7 → The FAT table guides us to the cluster 7 pointer in the FAT table... until we read to the relative offset 0x1A~0x1B of the sector, which is cluster 13, and the data written above is 0x000E, which means pointing to cluster 14 → The content of cluster 14 is "FF FF", meaning that this file has reached the end cluster.
The subsequent FAT table data is the same as the above principle. No more analysis.
The FAT table records the storage linked list of disk data files, which is extremely important for data reading. So much so that Microsoft created a backup for the FAT table in the FAT file system it developed, which is FAT2. The content of FAT2 and FAT1 is usually synchronized in real time, that is, if changes are made to FAT1 through normal system reading and writing, FAT2 is also updated. From this perspective, this function of the system is a natural disaster during data recovery.
The directory structure of the FAT file system is actually a directed tree from the root to the leaves. The directed here means that for any file (including folders) in the FAT partition, it needs to be addressed from the root directory to find it. It can be considered that the entrance of the directory storage structure is the root directory.
The FAT file system addresses other files (including folders) according to the root directory, so the position of the root directory must be determined before the disk accesses data. The FAT file system determines it according to the relevant DBR parameters of the partition and the size of the FAT table (two copies) stored in the DBR. After formatting, the size and position of the root directory are actually determined: the position follows FAT2, and the size is usually 32 sectors. After the root directory is the second cluster of the data area.
An important idea of the FAT file system is to treat the directory (folder) as a special file. FAT32 even treats the root directory as a file (by the way: NTFS abstracts many things such as partition parameters and security permissions as files, which is an elevation of this idea). In FAT16, although the status of the root directory is not equivalent to an ordinary file or directory, its organization form is not different from an ordinary directory (folder). All folder (directory) files in the FAT partition can actually be regarded as a data table that stores the entry parameters of other files (folders). Therefore, the size of the space occupied by the directory is not equivalent to the size of all data under it, but is not equivalent to 0. It usually occupies a very small space and can be regarded as the directory file is a simple two-dimensional table file. Its specific storage principle is:
No matter how many clusters the directory file occupies and how many bytes per cluster, the system will allocate the clusters occupied by the directory file in units of 32 bytes. These 32 bytes define the attributes of a file (or folder) under this directory with a determined offset, which is actually a simple two-dimensional table.
The offsets of each byte of these 32 bytes are defined as shown in Table 11:
Some values in Table 11 are explained:
(1) For the short file name, the system stores the file name in two parts, that is, the main file name + extension. The 0x0~0x7 bytes record the main file name of the file, 0x8~0xA record the extension, and take the ASCII code value in the file name. The "." between the main file name and the extension is not recorded. The main file name is filled with blanks (20H) if it is less than 8 characters, and the extension is also filled with blanks (20H) if it is less than 3 characters. If the value at offset 0x0 is 00H, it means the directory entry is empty; if it is E5H, it means the directory entry has been used, but the corresponding file or folder has been deleted. (This is also the theoretical basis for undelete recovery). If the first character in the file name is "." or "..", it means that this cluster record is a directory entry of a subdirectory. "." represents the current directory; ".." represents the parent directory (the same meaning as in dos or windows. If the disk data is damaged, you can calculate the start position of the data area of the disk and guess the size of the cluster through the specific parameters of these two directory entries, so it is relatively important)
(2) The attribute field of 0xB: It can be regarded as the system dividing a byte of 0xB into 8 bits, and using one of the bits to represent the presence or absence of a certain attribute. In this way, each bit in the 8 bits of a byte can reflect the different values of each attribute. For example, 00000101 means this is a file, and the attributes are read-only and system.
(3) 0xC~0x15 are reserved and unused in the original FAT16 definition. In high-version WINDOWS systems, it is sometimes used to record the modification time and recent access time. In this case, the meaning of the field is the same as that of FAT32, see the following FAT32.
(4) The time in 0x16~0x17 = hour * 2048 + minute * 32 + second / 2. The result is converted into hexadecimal and filled in. That is: the 0~4 bits of the 0x16 byte are the value in units of 2 seconds; the 5~7 bits of the 0x16 byte and the 0~2 bits of the 0x17 byte are minutes; the 3~7 bits of the 0x17 byte are hours.
(5) The date in 0x18~0x19 = (year - 1980) * 512 + month * 32 + day. The result is converted into hexadecimal and filled in. That is: the 0~4 bits of the 0x18 byte are the date number; the 5~7 bits of the 0x18 byte and the 0 bit of the 0x19 byte are the month; the 1~7 bits of the 0x19 byte are the year number. In the original definition, 0~119 represent 1980~2099, and current high-version Windows allows 0~127, that is, the year number can be up to 2107.
(6) 0x1A~0x1B store the first cluster number representing the file or directory. The system finds the entry in the FAT table according to the master cluster number it has, then tracks the cluster chain until the end of the cluster, and uses the bytes at 0x1C~0x1F to determine validity, and then can read the file (directory) without error.
(7) The addressing process of the ordinary subdirectory is also specified by the directory entry in its parent directory. Different from the data file (referring to the non-directory file), the 4th position of offset 0xB of the directory entry is set to 1, and the data file is 0.
For the entire FAT partition, the allocation of clusters is not always completely clean. For example, in a FAT system with a data area of 99 sectors, if the cluster size is set to 2 sectors, there will be 1 sector that cannot be allocated to any cluster. This is the remaining sector of the partition, located at the end of the partition. Some systems use the last remaining sector to back up the DBR of this partition, which is also a good backup method.
The early FAT16 system did not have long file names. The Windows operating system has fully supported long file names on FAT16. The definition of long file names in FAT16 is the same as that in FAT32. The detailed explanation of long file names will be given in the FAT32 part.
★FAT32 Storage Principle:
FAT32 is a very meritorious file system. Microsoft successfully designed and applied it. Until today, when NTFS is overwhelming, FAT32 still occupies an important position in the Microsoft Windows file system. FAT32 was originally designed for the shortcomings of FAT16 such as not supporting large partitions and large unit cluster capacity leading to rapid space waste. In practical applications, FAT32 is still successful.
The principle of FAT32 is basically the same as that of FAT16. Figure 4.3.13 marks the basic composition of the FAT32 partition.
During the formatting process of FAT32, it builds its DBR according to the characteristics of the partition. The BPB parameters are very important. You can look back at Table 4 and Table 5. First, the number of reserved sectors in FAT32 is 32 by default, instead of just one in FAT16. The advantage of this is that it helps to extend the length of the disk DBR instruction and can leave backup space for the DBR sector. As we mentioned above, the operating system boot code built on FAT32, such as Win98 or Win2000, WinXP, does not only occupy one sector. Having extra reserved sectors can well expand the OS boot code. The BPB also records the backup sector number of the DBR sector. The backup sector can help us restore the DBR when the disk is accidentally damaged.
The data structure of the file allocation table of FAT32 is still the same as that of FAT16. The difference is that FAT32 extends the number of binary bits recording the cluster chain to 32 bits, so this file system is called FAT32. The cluster chain of 32-bit binary bits determines that the FAT table can address up to 2T clusters at most. In this way, even if the cluster size is 1 sector, theoretically, it can still address partitions within 1TB. But in practice, FAT32 cannot address such a large space. As the partition space increases, the number of records in the FAT table will become cumbersome, which seriously affects the performance of the system. Therefore, in practice, FAT32 partitions larger than 32GB are usually not formatted. OS above WIN2000 does not directly support formatting partitions larger than 32GB into FAT32, but WIN98 can still format FAT32 partitions as large as 127GB, but this is unnecessary and not recommended. At the same time, FAT32 also has a small limit. The FAT32 volume must have at least 65527 clusters. Therefore, for small partitions, FAT16 or FAT12 is still needed.
When the partition becomes larger, if the cluster is very small, the file allocation table will also become larger. The above efficiency problem still exists. To effectively read and write large files and minimize space waste as much as possible, FAT32 also specifies the corresponding cluster size for the corresponding partition space, see Table 12:
The meaning of the cluster value in FAT32 is similar to that in FAT16, but the number of bits is longer. For comparison, see Table 13:
Another major reform of FAT32 is the fileization of the root directory, that is, the root directory is equivalent to an ordinary file. In this way, the root directory is not limited by 512 directory entries in FAT16. When it is not enough, increase the cluster chain and allocate empty clusters. Moreover, the position of the root directory is no longer rigidly fixed and can be stored in any addressable cluster in the partition. However, the root directory is usually the earliest created (formatted) directory table. Therefore, the situation we see is basically that the first cluster of the root directory occupies the first cluster in the cluster area order. It is also drawn according to this situation in Figure 4.3.12.
The numbering of clusters in FAT32 is still the same as in FAT16. The first cluster in order is still numbered as cluster 2, which is usually used by the root directory (this is different from FAT16. The root directory of FAT16 does not occupy the cluster area space. The first cluster in the cluster area is after 32 sectors of the root directory)
The file addressing method in FAT32 is the same as in FAT16, but the meaning of each byte parameter in the directory entry is different from that in FAT16. On the one hand, it enables the reserved fields of the directory entry in FAT16, and at the same time fully supports long file names.
For the directory entry in the short file format. Its parameter meaning is shown in Table 14:
Instructions:
(1) This is the meaning of the directory entry in the FAT32 short file format. The algorithms for the file name, extension, time, and date are the same as in FAT16.
(2) Since the cluster number that FAT32 can address reaches 32-bit binary numbers. Therefore, when recording the start cluster address of the file (folder), the system also needs 32 bits to record. FAT32 enables the offset 0x12~0x13 of the directory entry to represent the high 16 bits of the start cluster number.
(3) The file length is still represented by 4 bytes, which means that FAT32 still only supports files (directories) smaller than 4GB. For files (directories) larger than 4GB, the system will truncate them.
An important feature of FAT32 is that it fully supports long file names. Long file names are still recorded in the directory entry. In order for low-version OS or programs to correctly read long file name files, the system automatically creates a corresponding short file name for all long file name files, so that the corresponding data can be addressed by both the long file name and the short file name. OS or programs that do not support long file names will ignore the long file name fields that they think are illegal, while OS or programs that support long file names will record and edit with the long file name as the explicit item and hide the short file name.
When creating a long file name file, the system will automatically add the corresponding short file name. The general principles are:
(1) Take the first 6 characters of the long file name plus "~1" to form the short file name, and the extension remains unchanged.
(2) If this file name already exists, the number after "~" is incremented until 5.
(3) If the number after "~" in the file name reaches 5, the short file name only uses the first two letters of the long file name. Generate the last four letters of the short file name by mathematically manipulating the remaining letters of the long file name, then add the suffix "~1" until the end (if necessary, or other numbers to avoid duplicate file names).
(4) If there are characters that cannot be read by old OS or programs, replace them with "_"
Long file names depend on the attribute byte at offset 0xB of the directory entry. When this byte has the read-only, hidden, system, and volume-label bits all set, that is, when its value is 0FH, DOS and WIN32 consider the entry illegal and ignore it. This is what makes long file names possible: set byte 0xB of a directory entry to 0F, and the rest of the entry is free for the system to define. Windows 9x, Windows 2000, and XP support long file names of up to 255 characters. The system splits a long file name into groups of 13 characters, and each group occupies one directory entry, so a single file may need several directory entries. The long-name entries are stored in reverse order in the directory table to prevent confusion with other file names.
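To make the two numbers above concrete, here is a small sketch that tests the attribute byte and works out how many long-name entries a name needs. The function and macro names are hypothetical, and the check compares the whole byte against 0FH exactly as described in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ATTR_LONG_NAME      0x0F  /* read-only | hidden | system | volume label */
#define LFN_CHARS_PER_ENTRY 13    /* characters stored in one long-name entry */

/* Return 1 if a 32-byte directory entry is a long-name entry. */
static int is_long_name_entry(const uint8_t entry[32])
{
    assert(entry != NULL);
    return entry[0x0B] == ATTR_LONG_NAME;
}

/* Number of long-name directory entries needed for a name of len
 * characters (1..255), not counting the short-name entry. */
static size_t long_name_entries_needed(size_t len)
{
    assert(len >= 1 && len <= 255);
    return (len + LFN_CHARS_PER_ENTRY - 1) / LFN_CHARS_PER_ENTRY;
}
```

A 13-character name fits in one entry, a 14-character name needs two, and the 255-character maximum needs 20.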
The characters of a long file name are stored as Unicode (a major step forward), two bytes per character. The layout of a long-name directory entry is shown in Table 15.
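Since the table itself is an image in the forum version, the following sketch may help show how the 13 characters sit inside one entry. It assumes the standard VFAT layout: 5 characters starting at offset 0x01, 6 starting at 0x0E, and 2 starting at 0x1C, each a two-byte little-endian value, with 0x0000 marking the end of the name. The names lfn_char_offsets and lfn_entry_chars are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offsets of the 13 two-byte characters inside one long-name
 * entry (assumed standard VFAT layout: 5 + 6 + 2 characters). */
static const uint8_t lfn_char_offsets[13] = {
    0x01, 0x03, 0x05, 0x07, 0x09,          /* characters 1-5   */
    0x0E, 0x10, 0x12, 0x14, 0x16, 0x18,    /* characters 6-11  */
    0x1C, 0x1E                             /* characters 12-13 */
};

/* Copy the characters of one long-name entry into out[13] and return
 * how many come before the 0x0000 terminator (at most 13). */
static size_t lfn_entry_chars(const uint8_t entry[32], uint16_t out[13])
{
    size_t i;

    assert(entry != NULL && out != NULL);
    for (i = 0; i < 13; i++) {
        const uint8_t *p = entry + lfn_char_offsets[i];
        uint16_t c = (uint16_t)(p[0] | (p[1] << 8));  /* little-endian */
        if (c == 0x0000)
            break;                 /* the name ends inside this entry */
        out[i] = c;
    }
    return i;
}
```

A full name is rebuilt by calling this on each long-name entry of a file and joining the pieces in sequence order.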
When the system stores a long file name, it first writes the long-name directory entries in reverse order and then writes the corresponding short-name entry immediately after them. As Table 15 shows, a long-name entry does not store the file's start cluster, size, or any of its time and date attributes; these are still kept in the short-name directory entry. Every long file name corresponds to exactly one short file name. A short file name can still be read without its long file name, but a long file name with no matching short file name is ignored by every system. The short file name is therefore very important: changing the name or extension fields of the short-name entry in an environment that does not support long file names (this includes deleting the file, since deletion rewrites the first character to E5H) invalidates the long file name.
Position alone is clearly not enough to tie a long file name to its short file name. The checksum at byte 0xD of each long-name entry plays the key role. It is computed from the 11 characters of the short file name, and the system uses it to decide whether a long file name and a short file name belong together. The algorithm is hard to express as a formula, so a short piece of C is used to illustrate it.
Assume the 11 characters of the file name form the array shortname (of unsigned char) and the checksum is held in chksum. The process is as follows:
unsigned char chksum = 0;   /* 8 bits wide, so the sum wraps modulo 256 */
int i, j = 0;
for (i = 11; i > 0; i--)    /* rotate chksum right by one bit, then add the next character */
    chksum = ((chksum & 1) ? 0x80 : 0) + (chksum >> 1) + shortname[j++];
If the checksum calculated from the short file name is not equal to the data at offset 0xD in the long file name, the system will never match them.
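The matching step can be sketched as a pair of small functions. This is an illustration under the assumptions stated in the text: the 11 name bytes are the first 11 bytes of the short-name entry (8 name characters plus 3 extension characters, padded with spaces, no dot), and the checksum sits at offset 0xD of the long-name entry. The names lfn_checksum and lfn_matches_short are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Checksum of the 11-byte short name: for each byte, rotate the
 * running sum right by one bit, then add the byte (modulo 256). */
static uint8_t lfn_checksum(const uint8_t shortname[11])
{
    uint8_t sum = 0;
    int i;

    assert(shortname != NULL);
    for (i = 0; i < 11; i++)
        sum = (uint8_t)(((sum & 1) ? 0x80 : 0) + (sum >> 1) + shortname[i]);
    return sum;
}

/* Return 1 if a long-name entry belongs to the given short-name entry,
 * i.e. byte 0xD of the long-name entry equals the computed checksum. */
static int lfn_matches_short(const uint8_t lfn_entry[32],
                             const uint8_t short_entry[32])
{
    assert(lfn_entry != NULL && short_entry != NULL);
    return lfn_entry[0x0D] == lfn_checksum(short_entry);
}
```

As a hand-checked case, the short name "A" padded with ten spaces gives a checksum of 0x80.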
Given the definitions of the long-name and short-name directory entries and the way clusters are numbered and chained, reading data from a FAT32 volume is straightforward.
V. End.
This article comes from the Data Recovery Network (www.sjhf.net). Omissions are inevitable, and corrections are welcome. If you reprint it, please keep this notice; if you need to modify it, please contact the author by one of the following:
1. http://www.sjhf.net
2. zymail@vip.sina.com
|

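Non-commercial site! The Data Recovery Network is a site for discussing disk storage and software data recovery techniques. Enthusiasts are welcome to join the discussion, and we can also help friends recover their data for free.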
 |
|
2004-4-21 00:00 |
|
|
sjhf
Junior user
Points 161
Posts 7
Registered 2004-4-21
Status: offline
|
『Floor 4』:
Why can't the pictures be displayed?
The article is fairly long, about 45,000 characters, so I added a table of contents and internal links to the original document. It can be downloaded from http://www.sjhf.net/bbs.
Also, because tables are hard to post on the forum, I replaced them with pictures in the forum version; the original document still uses tables.
|

|
2004-4-21 00:00 |
|
|
aria
Senior user
Points 924
Posts 243
Registered 2003-7-9
Status: offline
|
『Floor 6』:
I recommend everyone take a look at www.sjhf.net
|
|
2004-4-23 00:00 |
|
|
sjhf
Junior user
Points 161
Posts 7
Registered 2004-4-21
Status: offline
|
『Floor 7』:
Bump!!! Hehe
|

|
2004-5-2 00:00 |
|
|
would
Intermediate user
Points 338
Posts 86
Registered 2003-11-22
Status: offline
|
『Floor 10』:
It's very long, but it must be a good article... long, long, long...
|
|
2004-5-12 00:00 |
|
|
726842270
Junior user
枫中残雪
Points 65
Posts 43
Registered 2010-4-29, from Changchun, Jilin Province
Status: offline
|
『Floor 11』:
It really is good. I learned something new again. Strong bump!
|
|
2010-5-11 23:31 |
|
|