China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

China DOS Union Forum
Forum post extracts (Blog section)
Floor 16 Posted 2016-06-22 20:40 · China, Hainan, Sanya (Telecom)
Super Moderator
★★★★
Credits 3,673
Posts 2,020
Joined 2016-02-01 00:00
10-year member
UID 181465
Gender Male
Status Offline
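Because 1MB = 1024KB
So 1.44MB = 1.44 × 1024KB = 1474.56KB
And because 1KB = 1024 bytes
So 1474.56KB = 1474.56 × 1024 bytes ≈ 1509949 bytes
At 2 bytes per Chinese character (a double-byte encoding such as GB2312), that is ≈ 754974 Chinese characters
So 1.44MB holds about 754974 Chinese characters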
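The corrected arithmetic can be checked with a few lines of Python. This sketch assumes the binary convention used in the post (1KB = 1024 bytes) and a double-byte encoding such as GB2312, in which each Chinese character takes 2 bytes. As a side note, a standard "1.44MB" 3.5-inch floppy actually holds 1,474,560 bytes (1440 × 1024), because its "MB" mixes a decimal and a binary factor, so the sketch computes both figures.

```python
KB = 1024            # bytes per kilobyte (binary convention, as in the post)
MB = 1024 * KB       # bytes per megabyte
BYTES_PER_HANZI = 2  # assumption: double-byte encoding such as GB2312

def hanzi_capacity(total_bytes: int) -> int:
    """Number of whole Chinese characters that fit in total_bytes."""
    return total_bytes // BYTES_PER_HANZI

# The post's calculation: 1.44MB taken as 1.44 * 1024 * 1024 bytes.
post_kb = 1.44 * 1024              # 1474.56 KB
post_bytes = int(1.44 * MB)        # 1509949 bytes (fractional byte truncated)
post_chars = hanzi_capacity(post_bytes)

# What a formatted "1.44MB" floppy really holds: 1440 KB = 1440 * 1024 bytes.
floppy_bytes = 1440 * KB           # 1474560 bytes
floppy_chars = hanzi_capacity(floppy_bytes)

print(f"{post_kb:.2f} KB, {post_bytes} bytes, {post_chars} characters")
print(f"real floppy: {floppy_bytes} bytes, {floppy_chars} characters")
```

The first line printed reproduces the post's figures (1474.56 KB, 1509949 bytes, 754974 characters); the second gives 1474560 bytes and 737280 characters for a real floppy.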

[ Last edited by zzz19760225 on 2016-6-22 at 23:10 ]
1<word>, 2, 3/paragraph\, 4{section}, 5(chapter).
Floor 17 Posted 2016-06-22 20:48 · China, Hainan, Sanya (Telecom)
Inside the forum

Usage instructions and command explanations for the DOS command prompt interface in the forum
http://www.cn-dos.net/forum/viewthread.php?tid=26803&sid=4ruh6y

Original: dot-matrix encoding generator and interpreter (shocking subtitles!)
http://www.cn-dos.net/forum/viewthread.php?tid=48088&fpage=1&highlight=%E7%82%B9%E9%98%B5

kiss's uboot
https://pan.baidu.com/s/1qXLwMh2
https://pan.baidu.com/s/1qXLwMh2#list/path=%2F

1543
http://upload.cn-dos.net/img/1543.rar
dos7.1 DOS installation disk, 7.6MB, 2009/06/15 (Mon) 23:13: MS-DOS+v7.10+complete installation CD ISO version.rar

543
http://upload.cn-dos.net/img/534.rar
PPdos: 13 music tracks + player in 64KB, more than 1 hour of music; 60.8KB, 2008/06/23 (Mon) 11:01: 2220298-64kMusics.rar



[ Last edited by zzz19760225 on 2017-8-30 at 13:16 ]
Floor 18 Posted 2016-06-22 20:48 · China, Hainan, Sanya (Telecom)
Chinese character etymology (Richard Sears)
http://www.chineseetymology.org/CharacterEtymology.aspx?characterInput=車&submitButton1=Etymology

Wuyou Boot Forum
Don't click
Building a home website, personal homepage, personal website, newbie needs support
http://bbs.wuyou.net/forum.php?mod=viewthread&tid=363929&extra=page%3D1

History of computers, audio version (lots of ads!)
http://www.tingchina.com/jiaoyu/527/play_527_3.htm

Weibo Disk
http://vdisk.weibo.com/u/1288508191?page=11
http://vdisk.weibo.com/u/3496344845?page=2
http://vdisk.weibo.com/s/uyA1pDHjlQKsP

How does binary convert addition, subtraction, multiplication and division into addition? How is addition implemented by logical operations AND, OR, XOR?
https://www.zhihu.com/question/37319895#answer-23994303
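The second question can be illustrated in software. Below is a minimal Python sketch of a ripple-carry adder built only from AND, OR and XOR on single bits, with subtraction done as addition of the two's complement (invert every bit, add 1) and multiplication as shift-and-add. The 8-bit width and all function names are illustrative choices, not taken from the linked answer.

```python
WIDTH = 8                      # illustrative word size
MASK = (1 << WIDTH) - 1

def full_adder(a: int, b: int, carry_in: int) -> tuple[int, int]:
    """One-bit full adder using only XOR, AND and OR."""
    s = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return s, carry_out

def add(x: int, y: int) -> int:
    """Ripple-carry addition of two WIDTH-bit values (wraps modulo 2**WIDTH)."""
    result, carry = 0, 0
    for i in range(WIDTH):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result

def negate(x: int) -> int:
    """Two's complement: invert every bit, then add 1 with the same adder."""
    return add(~x & MASK, 1)

def sub(x: int, y: int) -> int:
    """x - y computed as x + (-y)."""
    return add(x, negate(y))

def mul(x: int, y: int) -> int:
    """Shift-and-add multiplication: add x << i for every set bit i of y."""
    result = 0
    for i in range(WIDTH):
        if (y >> i) & 1:
            result = add(result, (x << i) & MASK)
    return result

print(add(100, 27))   # 127
print(sub(100, 27))   # 73
print(sub(27, 100))   # 183, i.e. -73 in 8-bit two's complement
print(mul(12, 10))    # 120
```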

How to compile a binary in C that doesn't require an operating system?
https://www.zhihu.com/question/49580321#answer-42183533

INT 21H instruction description and usage
http://www.cnblogs.com/ynwlgh/archive/2011/12/12/2285017.html

What impact did Japan's colonial rule have on Taiwan? (Some of the later replies show that Japan's influence online is as concentrated as it was during the Sino-Japanese War of 1894-1895.)
https://www.zhihu.com/question/53104504
https://www.zhihu.com/topic/19553437/hot
Here the conversation has been led astray by people of Japanese descent; they are of Japanese descent, not Taiwanese. Taiwanese are Chinese; anyone who says otherwise, please go away.



57 Jinqianbao (reportedly a show in the democratic-spirited style of Taiwan Province, unlike the increasingly thuggish, Cultural Revolution-like programs. After all, China has been through the Cultural Revolution, a vortex of populism. What matters is not the excess of emotional content. From a coldly adversarial, interest-based perspective, a group of people of Japanese descent sliding into this state is of long-term benefit to the Chinese mainland. Of course, the Japanese will not object, since many of these people have Han blood; in their eyes, traitors who, like dogs, bite the power and interest groups of their own ancestors are exactly what they are willing to support, and they only need to back and encourage such cannon fodder. But in the end it is mostly the relatively silent, kind-hearted ordinary people who suffer: those deceived by such public opinion who bear the strategic consequences of the times and of history. It is like the square on the mainland back then; fortunately I did not take part, or I would only have regrets.)
http://www.niotv.com/people_support.php?id=1837&name=%E6%A5%8A%E4%B8%96%E5%85%89
http://search.bilibili.com/all?keyword=57%E9%87%91%E9%92%B1%E7%88%86

http://www.niotv.com/index.php
PTT Forum in Taiwan Province
https://pttweb.tw/news


【CCTV】Military Documentary - The Sorrow of 1895
http://static.hdslb.com/miniloader.swf?aid=6772791&page=1
http://www.bilibili.com/video/av6772791/

Never get close to those who injure others' teeth and eyes yet oppose revenge and advocate tolerance. -- Lu Xun

One Tiger One Discussion
http://v.ifeng.com/news/society/201206/afa117ec-d7d4-424e-94ab-11eaca41da65.shtml
http://tv.sogou.com/tvshow/wxt4vu553lcl6igsxo52fuv3z6x4zoa.html?vrid=70052900

Crystal Radio
http://www.crystalradio.cn/thread-834162-1-1.html

Author: ManateeLazyCat (Deepin Operating System)
http://www.jianshu.com/u/E6EbkP
If reading my article makes you want to take up long-distance running too, here are my suggestions:

Start with 1 kilometer and build up gradually, so you won't get exhausted.
Don't run every day. Run for motivation: long-distance running is for health, not competition, and running a marathon out of vanity can ruin your knees and leave you with lifelong regret.
Always warm up before running by stretching your muscles and joints; there are plenty of tutorial videos on Youku.
Don't drink a lot of water before running, or you will get a stomachache after the first kilometer.
On long runs it's best to wear Bluetooth earphones and listen to music rather than your own labored breathing. You will keep going longer, because the more you listen to your breathing, the more it psychologically suggests that you are tired and want to stop.
Once you've run more than 5 kilometers, your body stops feeling tired. Unless your feet are worn out or your legs cramp, running 10 or 20 kilometers is just a matter of time.
Always wear sportswear and running shoes, not jeans and clunky shoes. Don't ask me why: run like that and you'll end up with all kinds of hidden aches, you know what I mean.
Keep your shoulders back and your chest upright. Don't hunch your back and stick out your buttocks when you get tired; that keeps your chest from breathing normally and makes you even more tired.
Keep your arm swing small and swing forward along the running direction, not sideways across your chest. Swinging sideways wastes energy and easily distorts your posture, leaving you hunched with your buttocks stuck out.
Let your feet and legs land naturally and relax your whole body; don't tense up at all. The more tense you are, the more you force your feet to go faster or slower, which disrupts your breathing. The more relaxed your body is, the more naturally it swings by reflex and the smoother your breathing becomes, so you won't tire as soon as you start.
How do you relax? As I said earlier: don't think about anything, not even about relaxing. When you truly think about nothing, running becomes very easy.
I hope long-distance running also brings you a different way of seeing the world.
http://www.jianshu.com/p/e871723f9460 Deepin Desktop Operating System Architecture Design


http://item.btime.com/366is0qe82f8jf96e1m9928plfq?from=so
The navy is a service that can provide many public services and has diplomatic value unmatched by the other services. Competition in the capacity to provide public services is an important part of the all-round competition between China and the United States. An ocean-going Chinese fleet not only raises the threshold the United States must cross to attack China's overseas interests, but also speeds up the building of a community of interests between China and the countries of the Western Pacific-Indian Ocean region.

How a Rubik's Cube works (woodworking forum)
http://www.zuojiaju.com/thread-189070-1-1.html

Longteng (selected translations of foreign content)
http://www.ltaaa.com/
Santaihu (India)
http://www.santaihu.com/
Quora Chinese Network (USA)
http://www.quora123.com/
Translated Japanese opinions (2ch)
http://2chcn.com/
China government white papers (State Council Information Office)
http://www.scio.gov.cn/zfbps/index.htm


Hu Zhouzhou

Feifei's Home > DOS Era
http://www.ffhome.com/category/articles/dos

MAXDOS
http://www.maxdos.net/forum.php

[ Last edited by zzz19760225 on 2017-7-29 at 20:17 ]
Floor 19 Posted 2016-06-22 20:48 · China, Hainan, Sanya (Telecom)
Search the forum for "grub for dos "
http://www.cn-dos.net/forum/search.php?searchid=380707&orderby=lastpost&ascdesc=desc&searchsubmit=yes&sid=ZNDxjT


enjoyer
Intermediate User

Tribe Watcher

Points 351
Posts 140
Registered 2006-6-19
Status Offline
『Post 30』: Good idea, creative!!!

If object-oriented programming in batch files can be achieved, then any application could be text-based!!! In the future, sharing software would just mean transferring text, haha. However, copyright would then be hard to handle :P
If the efficiency is good enough (hardware keeps getting faster, so that seems no big problem), developing programs would also be simple (no need for tools such as a compiler, assembler or linker); DOS itself would be one big development environment. So creative!!!
This should be planned carefully; it may be the future direction of development (done well, this feature could surpass Windows and other GUI operating systems). I support the original poster!!
Imagine: DOS would then be purely a kernel (the core of the onion), and on top of it could run n applications, or even n operating systems (n layers of onion skin) :D

Our DOS should also consider 64-bit architectures; otherwise how can it show off its compact yet powerful nature?? On 64-bit, using batch processing on a large scale should be very meaningful.




Everything starts from the bottom

http://www.cn-dos.net/forum/viewthread.php?tid=16392&fpage=0&highlight=&page=2

[ Last edited by zzz19760225 on 2016-12-11 at 23:26 ]
Floor 20 Posted 2016-06-22 20:49 · China, Hainan, Sanya (Telecom)
Installation
Search for stardict in the Deepin Store to install it.
To install from the command line, run in a terminal:
sudo apt-get install stardict

Uninstallation
Search for stardict in the Deepin Store to uninstall it.
To uninstall from the command line, run in a terminal:
sudo apt-get remove stardict

----------------------------------------------------------------------------------------------------------------

zh_CN Simplified Chinese Dictionary
http://download.huzheng.org/zh_CN/

----------------------------------------------------------------------------------------------------------------

http://www.phpfans.net/article/htmls/200708/MjQxNTEx.html

Author: xuqiangjun Date: 2007-08-17
1. Download dictionary http://stardict.sourceforge.net/Dictionaries_zh_CN.php
stardict-kdic-computer-gb-2.4.2.tar.bz2
stardict-oxford-gb-2.4.2.tar.bz2
2. Download speech
WyabdcRealPeopleTTS.tar.bz2

Put the dictionaries in the installation directory /usr/share/stardict/dic.
Put the speech package directly in /usr/share/.
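The two "put" steps above amount to unpacking each .tar.bz2 archive into the right directory. As a hedged sketch, Python's standard tarfile module can do the unpacking; the archive names come from the post, while the helper name `install_archive` is hypothetical.

```python
import tarfile
from pathlib import Path

def install_archive(archive: str, dest_dir: str) -> list[str]:
    """Unpack a .tar.bz2 archive into dest_dir and return the member names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)        # create the target directory if missing
    with tarfile.open(archive, "r:bz2") as tar:    # "r:bz2": read with bzip2 decompression
        names = tar.getnames()
        # Python versions with extraction filters accept filter="data", which rejects
        # absolute paths, ".." components and other unsafe members.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        tar.extractall(dest, **extra)
    return names
```

For the archives in the post, the calls would be `install_archive("stardict-oxford-gb-2.4.2.tar.bz2", "/usr/share/stardict/dic")` and `install_archive("WyabdcRealPeopleTTS.tar.bz2", "/usr/share")`. Writing under /usr/share normally requires root. The shell equivalent is `tar -xjf <archive> -C <directory>`.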

----------------------------------------------------------------------------------------------------

http://blog.csdn.net/jeep_ouc/article/details/39271065

Move all files in a folder to another folder under Linux
2014-09-14 17:08
Reprinted from

Today I searched online and found nothing on how to move all the contents of a folder under Linux, so I experimented a little and managed to move all the files in one folder into another.

First, as we all know, the Linux command for moving files is mv. Normally, a folder is moved with mv /folder name /new folder name

Next, to move a particular file inside a folder, use mv /folder name/file name /new folder name

Then, to move all the files in the old folder, can we use the * wildcard instead?

mv /folder name/* /new folder name

Sure enough, the test worked. Screenshot:

[Screenshot: qq picture 20130722225416.jpg]

I hope this helps anyone who wants to move all the files in one folder to another under Linux.
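One caveat: in bash and most POSIX shells, `*` does not match names that start with a dot, so the mv command above leaves hidden files behind. The Python sketch below (the function name `move_all` is hypothetical) moves every entry, hidden ones included, using the standard `shutil.move`. It illustrates the same task and is not the author's method.

```python
import shutil
from pathlib import Path

def move_all(src: str, dst: str) -> list[str]:
    """Move every entry of src (including dotfiles) into dst; return the moved names."""
    src_dir, dst_dir = Path(src), Path(dst)
    dst_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    # Path.iterdir() lists hidden entries too, unlike the shell's * glob.
    # sorted() builds the full list before anything is moved.
    for entry in sorted(src_dir.iterdir()):
        shutil.move(str(entry), str(dst_dir / entry.name))
        moved.append(entry.name)
    return moved
```

`move_all("/old folder", "/new folder")` mirrors the mv command above. In bash itself, `shopt -s dotglob` makes `*` include hidden files.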


[ Last edited by zzz19760225 on 2016-12-12-12-15:51 ]
Floor 21 Posted 2016-06-22 20:49 · China, Hainan, Sanya (Telecom)
http://www.jianshu.com/p/009e31c93d7c

Author: Jin Ge Da Wang, 2016.03.26 00:19

Making Your Own Instruction Set Architecture from Scratch
Reading the Classics - "Computer Systems: A Programmer's Perspective" 06
In this article we attempt something bold: implementing a brand-new instruction set architecture from scratch, in order to understand deeply how a processor works.

Overview of Instruction Set Development History
Y86 Instruction Set
Instruction Set and Its Encoding
Hardware Control Language HCL
Memory and Clock
Phased Execution of Instructions
State Change Cycles of SEQ
Implementation of Each Stage of SEQ
General Principles of Pipelining
Pipeline Hazards
More Perfect Design
Gap with Real Instruction Set Architectures
Overview of Instruction Set Development History
Before starting our creative journey, let's first understand what instruction set architectures have been in history.

The instructions supported by a processor and the byte-level encoding of the instructions are called its Instruction Set Architecture (ISA).

The most familiar is the x86 architecture, because the personal computers we use every day run on x86-based processors. The world's two largest processor makers, Intel and AMD, both sell x86 product lines. Starting with the Intel i386, x86 entered the 32-bit era as the IA32 architecture (Intel Architecture 32-bit). When 32 bits were no longer enough, Intel moved into 64-bit processors with the IA64 architecture. However, IA64 is not the 64-bit architecture we use today: it is an entirely new design unrelated to x86 and is not backward compatible. Although it could deliver very high performance, its poor compatibility got a cold reception in the market. AMD seized the opportunity and introduced the x86-64 architecture first, adding 64-bit support while keeping backward compatibility, and took the initiative in its competition with Intel. Intel did not dig in: it adopted x86-64 for its mainstream processors and gradually won back the market share it had lost. Although AMD calls its version AMD64 and Intel calls its version Intel 64, people still usually refer to both as x86-64.

Y86 Instruction Set
To pay tribute to the great x86 instruction set architecture, we name our own instruction set architecture Y86. Its design borrows entirely from x86; it is essentially a simplified x86.

To design an instruction set architecture from scratch, we first define the instructions and their encoding. Next we split the execution of each instruction into several stages, each doing only one or two simple tasks. Finally we connect the hardware units with suitable logic circuits that carry out the work of each stage. We now walk through this process in detail.

Instruction Set and Its Encoding
For a simple instruction set, not too many instructions are needed, and basic data transfer and process control are enough. The following figure lists all the instructions in the Y86 instruction set and the encoding of each instruction.


[Figure: Y86 Instruction Set]
These instructions are very basic, but they look a bit strange because we replaced x86's movl instruction with four separate instructions: rrmovl, irmovl, rmmovl and mrmovl. Each name spells out where the source and destination operands are (i for immediate, r for register, m for memory), which avoids the complexity of x86's many addressing modes.

Instruction lengths range from 1 to 6 bytes, and this variable-length encoding keeps program code compact. The high 4 bits of the first byte are the instruction code (icode), which distinguishes instructions, and the low 4 bits are either 0 or fn. fn is the function code, which distinguishes operations within a group. As shown in the following figure, function codes mean different things in different instructions: in arithmetic instructions they select addition, subtraction, AND or XOR; in jump instructions they select the jump condition; and in conditional move instructions they select the move condition.


[Figure: Function Codes of Y86 Instruction Set]
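Splitting the first byte into icode and ifun is just a nibble extraction. The sketch below decodes that byte and names the instruction. The figures are missing here, so the icode/ifun values are taken from the Y86 encoding in CS:APP (2nd edition, IA32 flavor), which this article follows; treat the tables as a reconstruction.

```python
# Function-code names per instruction group (CS:APP Y86, IA32 flavor).
OPL = {0: "addl", 1: "subl", 2: "andl", 3: "xorl"}
JXX = {0: "jmp", 1: "jle", 2: "jl", 3: "je", 4: "jne", 5: "jge", 6: "jg"}
CMOVXX = {0: "rrmovl", 1: "cmovle", 2: "cmovl", 3: "cmove",
          4: "cmovne", 5: "cmovge", 6: "cmovg"}
GROUPS = {0x2: CMOVXX, 0x6: OPL, 0x7: JXX}
# Instructions whose low nibble must be 0.
FIXED = {0x0: "halt", 0x1: "nop", 0x3: "irmovl", 0x4: "rmmovl",
         0x5: "mrmovl", 0x8: "call", 0x9: "ret", 0xA: "pushl", 0xB: "popl"}

def split_first_byte(b: int) -> tuple[int, int]:
    """High nibble = icode, low nibble = ifun."""
    return (b >> 4) & 0xF, b & 0xF

def name_of(b: int) -> str:
    """Name the instruction encoded by its first byte, or raise ValueError."""
    icode, ifun = split_first_byte(b)
    if icode in GROUPS and ifun in GROUPS[icode]:
        return GROUPS[icode][ifun]
    if icode in FIXED and ifun == 0:
        return FIXED[icode]
    raise ValueError(f"invalid instruction byte {b:#04x}")

print([name_of(b) for b in (0x40, 0x61, 0x73, 0x90)])  # ['rmmovl', 'subl', 'je', 'ret']
```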
The second byte, for most instructions, stores the register identifier. Please see the following figure:


[Figure: Program Register Identifiers in Y86]
Each register maps to a unique identifier (0 through 7), and F means there is no register operand.

Finally, some instructions also contain a four-byte constant word (an immediate value, an address displacement, or a jump/call destination).

Let's take an example to help us better understand the instruction encoding. For example, for the following instruction

rmmovl %esp, 0x12345(%edx)
The corresponding encoding is

40 42 45 23 01 00
From left to right: 40 is the instruction byte (icode 4, ifun 0); 42 packs the register identifiers 4 (%esp) and 2 (%edx); and 45 23 01 00 is the offset 0x12345 written as a 4-byte little-endian word.
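The encoding of the rmmovl example can be reproduced mechanically. This Python sketch packs any Y86 instruction from its parts. It assumes the IA32-flavored register numbering (%eax = 0 through %edi = 7, F for "no register") and icode values from CS:APP, and the helper name `encode` is hypothetical.

```python
import struct

REG = {"%eax": 0, "%ecx": 1, "%edx": 2, "%ebx": 3,
       "%esp": 4, "%ebp": 5, "%esi": 6, "%edi": 7}
RNONE = 0xF  # "no register" identifier

def encode(icode: int, ifun: int = 0, ra=None, rb=None, const=None) -> bytes:
    """Pack one Y86 instruction: [icode:ifun] [rA:rB]? [4-byte constant]?"""
    out = bytes([(icode << 4) | ifun])
    if ra is not None or rb is not None:            # register-specifier byte present
        a = REG[ra] if ra is not None else RNONE
        b = REG[rb] if rb is not None else RNONE
        out += bytes([(a << 4) | b])
    if const is not None:                           # 4-byte constant word present
        out += struct.pack("<I", const & 0xFFFFFFFF)  # little-endian, two's complement
    return out

# rmmovl %esp, 0x12345(%edx): icode 4, rA = %esp, rB = %edx, D = 0x12345
code = encode(4, 0, "%esp", "%edx", 0x12345)
print(code.hex(" "))   # 40 42 45 23 01 00
```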

Hardware Control Language HCL
The various hardware devices in the processor (such as ALU, program counter) usually need specific functional logic circuits to connect. In the design stage, we use a structured language to describe these logical relationships.

HCL (Hardware Control Language) is a hardware control language similar to C, used to describe the control logic of the processor.

As a simple example, consider the following combinational logic circuit:


[Figure: Combinational Circuit]
It can be expressed in HCL language as

e = (a && !(b||c)) || (!d && !(b||c))
This sentence describes the logical relationship between the output and the input. No matter how complex the combinational circuit is, it can be implemented with the most basic AND, OR, and NOT gates. HCL will be applied a lot later.
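HCL's Boolean operators map directly onto Python's `and`, `or` and `not`. As an extra check that is not part of the original article, the sketch below enumerates all 16 input combinations and confirms that the expression for e equals the factored form !(b||c) && (a || !d), which needs fewer gates.

```python
from itertools import product

def e_original(a: bool, b: bool, c: bool, d: bool) -> bool:
    # e = (a && !(b||c)) || (!d && !(b||c))
    return (a and not (b or c)) or (not d and not (b or c))

def e_factored(a: bool, b: bool, c: bool, d: bool) -> bool:
    # Factor out the shared term !(b||c).
    return not (b or c) and (a or not d)

# Exhaustive truth-table comparison over all 2**4 inputs.
for bits in product([False, True], repeat=4):
    assert e_original(*bits) == e_factored(*bits), bits
print("equivalent for all 16 inputs")
```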

Memory and Clock
Careful readers may have noticed that the previous paragraph mentioned "no matter how complex the combinational circuit". Why is combinational circuit emphasized specially? Because there is another kind of circuit - sequential circuit.

You should have some basic circuit knowledge. A combinational circuit simply computes a function: different inputs produce different outputs, and the circuit itself stores no information. Sequential circuits are different: they can store information and respond to their inputs under the control of a clock signal.

Next, the key point comes. There are two types of storage devices in the processor:

Clock register (referred to as register) stores a single bit or word. The clock signal controls the register to load the input value.
Random access memory (referred to as memory) stores multiple words, and the address is used to select which word to read and write. The memory mentioned here can be divided into two types: the virtual memory system of the processor and the register file. The former is the usual memory system, and the latter is the general-purpose register corresponding to the 8 register identifiers in our instruction set.

The figure below shows how a register works. Its output holds the current state until the rising edge of the clock; at that edge the input value is loaded and becomes the new state.


[Figure: Register Operation]
The register file can be regarded as such a functional block:


[Figure: Register File]
It has two read ports and one write port, and supports simultaneous reading and writing operations. It is worth noting that the read operation of the register file is immediate, while the write operation is clock-based. That is to say, the read values valA and valB change with the changes of srcA and srcB at any time, and the value valW to be written can only be written at the rising edge of clock. Think carefully, the read and write characteristics of the register file are exactly the same as those of the register, except that there is an additional address selection operation.
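The register file's timing (combinational reads, clocked writes) can be modeled in software. This is a minimal Python sketch under stated assumptions: eight 32-bit registers, reads return the current value immediately, a write presented on port W only takes effect when `clock_rising_edge()` is called, and reading the "no register" identifier F returns 0. The class and method names are hypothetical.

```python
class RegisterFile:
    NUM_REGS = 8
    RNONE = 0xF  # identifier meaning "no register"

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS
        self.dstW = self.RNONE   # pending write address
        self.valW = 0            # pending write value

    def read(self, srcA: int, srcB: int) -> tuple[int, int]:
        """Combinational read: valA/valB follow srcA/srcB immediately."""
        def get(r: int) -> int:
            return 0 if r == self.RNONE else self.regs[r]   # assumption: F reads as 0
        return get(srcA), get(srcB)

    def set_write_port(self, dstW: int, valW: int) -> None:
        """Drive the write port; nothing is stored yet."""
        self.dstW, self.valW = dstW, valW & 0xFFFFFFFF

    def clock_rising_edge(self) -> None:
        """Sequential write: the pending value is stored only at the clock edge."""
        if self.dstW != self.RNONE:
            self.regs[self.dstW] = self.valW

rf = RegisterFile()
rf.set_write_port(4, 0x100)   # request %esp = 0x100 (pending)
print(rf.read(4, 0))          # (0, 0): the write is not visible before the edge
rf.clock_rising_edge()
print(rf.read(4, 0))          # (256, 0)
```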

Phased Execution of Instructions
At the program level, an instruction is an indivisible basic unit. Inside the processor, however, executing an instruction is still split into several stages, which makes the hardware more efficient. In the Y86 architecture we divide the execution of every instruction into 6 stages:

Fetch: read the current instruction from memory at the address held in the PC, and split it according to the instruction encoding into values such as icode, ifun, rA, rB and valC.
Decode: read the values valA and valB of the registers specified by rA and rB.
Execute: the ALU performs an operation that depends on the instruction, such as an arithmetic/logic operation or an address addition or subtraction. The result is valE, and the operation may set the condition codes.
Memory: read data from memory or write data to it. The value read is valM.
Write back: write the results generated earlier back to the register file.
PC update: set the PC to the address of the next instruction.

These steps may seem messy now, and their purpose may not be obvious. On closer analysis, though, each stage does only one or two hardware-related things, and its output is determined by its inputs, so it can be completed within one clock cycle. The stages are linked by the signals they pass along: for example, the decode stage's output valA can feed the execute stage, and the execute stage's output can feed the write-back stage, so these hardware units can be connected with simple combinational circuits to achieve the functions we need.

In order to help everyone understand the role of each stage more clearly, we use an example to explain in detail.


[Figure: Phased Implementation of Instructions]
The figures above show the staged execution of three instructions: OPl rA, rB; rrmovl rA, rB; and irmovl V, rB. In the fetch stage, M stands for memory, and M1[PC] means reading 1 byte from memory at address PC. Because the instructions differ in length, the fetch stage does different work for each, and at the end of the stage it computes valP, the new value of the PC. The decode stage reads register values from the register file, where R[rA] denotes the value of register rA. In the execute stage, OPl sets the condition codes CC, while the other two instructions leave them unchanged. None of the three instructions accesses memory in the memory stage. Finally, the PC update stage assigns valP to the PC.
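The stage-by-stage behavior of these three instructions can be written out as code. This is a sketch under stated assumptions, not the book's simulator: memory is a Python `bytes` object, registers are a list of eight 32-bit values, icode values follow CS:APP, conditional moves are not modeled, and the condition codes are reduced to ZF and SF (real Y86 also has OF).

```python
MASK32 = 0xFFFFFFFF
RNONE = 0xF

# ALU operations selected by ifun for OPl (valE = valB OP valA).
OPS = {
    0: lambda b, a: b + a,   # addl
    1: lambda b, a: b - a,   # subl
    2: lambda b, a: b & a,   # andl
    3: lambda b, a: b ^ a,   # xorl
}

def step(mem: bytes, pc: int, regs: list, cc: dict) -> int:
    """Run one rrmovl, irmovl or OPl instruction through the stages; return the new PC."""
    # Fetch: split the first byte into icode/ifun and the second into rA/rB.
    icode, ifun = mem[pc] >> 4, mem[pc] & 0xF
    rA, rB = mem[pc + 1] >> 4, mem[pc + 1] & 0xF
    if icode == 0x3 and ifun == 0:                      # irmovl V, rB: 6 bytes
        valC = int.from_bytes(mem[pc + 2:pc + 6], "little")
        valP = pc + 6
    elif (icode == 0x2 and ifun == 0) or icode == 0x6:  # rrmovl / OPl: 2 bytes
        valC, valP = 0, pc + 2
    else:
        raise ValueError("only rrmovl, irmovl and OPl are modeled")

    # Decode: read the source registers (RNONE reads as 0 in this sketch).
    valA = regs[rA] if rA != RNONE else 0
    valB = regs[rB] if rB != RNONE else 0

    # Execute: only OPl uses ifun and sets the condition codes.
    if icode == 0x6:
        valE = OPS[ifun](valB, valA) & MASK32
        cc["ZF"] = valE == 0
        cc["SF"] = (valE >> 31) == 1
    elif icode == 0x2:
        valE = 0 + valA                                 # rrmovl: valE = 0 + valA
    else:
        valE = 0 + valC                                 # irmovl: valE = 0 + valC

    # Memory: none of these three instructions touches memory.
    # Write back: R[rB] <- valE.
    regs[rB] = valE
    # PC update: PC <- valP.
    return valP

# irmovl $10,%edx; irmovl $3,%eax; rrmovl %edx,%ebx; subl %eax,%ebx
prog = bytes.fromhex("30f20a000000" "30f003000000" "2023" "6103")
regs, cc, pc = [0] * 8, {}, 0
while pc < len(prog):
    pc = step(prog, pc, regs, cc)
print(regs[3], cc)   # 7 {'ZF': False, 'SF': False}
```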

When I read this, a big question came up: each stage supposedly does only one simple thing, yet different instructions seem to do different things in the same stage. For example, of the three instructions above, only OPl sets the condition codes in the execute stage. Why? Likewise, in other examples later in the book, the PC update stage does not always assign valP to the PC: call and ret assign valC and valM to the PC, respectively. How is this done?

Have you also thought about these problems? Obviously, it is natural that each stage has different responses to different instructions, otherwise how to adapt to the different functions of each instruction. The HCL hardware control language mentioned earlier is to complete this task, controlling the tasks that each instruction should complete in each stage.

Well, before explaining in detail how to use HCL control logic, let's first give the complete hardware structure diagram.


[Figure: SEQ Hardware Structure]
We need to pay attention to the different colored squares and different thicknesses of lines in the figure, which represent different meanings. The green blocks represent basic hardware units, such as ALU, register file, PC, which we have basically come into contact with. The gray squares will be the key points of our next research. They are combinational logic circuits described by HCL, used to connect green blocks and implement specific selections or logical operations. The white circles have no special meaning, just used to identify the names of signal lines. There are also three types of lines in the figure. The thick solid line represents a signal line with a word length, the thin solid line represents a signal line with a width of 1 byte or narrower, and the dashed line represents a signal line with a single bit.

From bottom to top, the figure shows the fetch, decode (write back), execute, memory and PC update stages introduced above. Because decode and write back both operate on the register file, they are drawn at the same position. The circled signals are the intermediate values produced in each stage. These values often play different roles in different instructions, so a signal may branch into two. For example, valA splits into two lines, one going to the ALU A control logic and the other to the Data control logic; likewise, valM splits into one line going to the New PC control logic and another going to the register file's input. Note that when a signal branches, both receivers can read its value, but reading a value does not mean using it: the receiver's control logic decides whether to use it, as described in detail below.

State Change Cycles of SEQ
I didn't explain the title of the previous figure; I left it as an open question. SEQ stands for Sequential, so "SEQ Hardware Structure" means the sequential implementation of the hardware. What, is there another way to implement it? Of course there is, and we will uncover it later. With the SEQ structure, instructions must execute strictly one after another: the next instruction cannot start until the previous one has finished. This makes the processor very inefficient, because each instruction must pass through all stages within a single clock cycle. Since circuit delays are unavoidable, getting through every stage takes a long time, and that long cycle keeps the clock frequency from being raised. But why must an instruction pass through all stages in one clock cycle?

Because the sequential elements in SEQ (the memory, register file, CC and program counter) only write data at the rising edge of the clock signal. When one instruction ends and the next begins, that rising edge triggers the update of these units. If a new value has not been produced before the rising edge that ends the cycle, the instruction is effectively not executed, or only half executed. Therefore the clock frequency cannot be raised too high; otherwise instruction execution becomes corrupted.

The following figure shows the state changes of each hardware unit controlled by the clock during the process of two instruction cycles.


[Figure: Tracking Two Execution Cycles of SEQ]
In the figure, everything apart from the four sequential elements is treated as a single block of combinational logic. When cycle 3 begins, the combinational logic starts computing, and all results are ready before cycle 3 ends, waiting to be written into the memory and the other state elements. When cycle 4 begins, the memory, register file, CC and program counter are updated; at the same moment the combinational logic reads these new values and starts computing the next results, and so on. The state of SEQ therefore changes once per clock cycle.
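The two-phase behavior described above (combinational logic computes the next values during a cycle, and the state elements latch them together at the rising edge) can be sketched as a pure next-state function. The `State` fields and the toy instruction behavior in `next_state` are hypothetical; the point is that the next state is computed only from the current state and committed all at once, so the state changes exactly once per cycle.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class State:
    pc: int
    regs: tuple   # register file contents
    cc: int       # condition codes, packed into an int for brevity

def next_state(s: State) -> State:
    """Combinational logic: reads only the current state and writes nothing."""
    # Hypothetical behavior for illustration: every instruction is 2 bytes long
    # and increments register 0.
    regs = (s.regs[0] + 1,) + s.regs[1:]
    return replace(s, pc=s.pc + 2, regs=regs)

def run(s: State, cycles: int) -> list:
    history = [s]
    for _ in range(cycles):
        s = next_state(s)    # values settle during the cycle...
        history.append(s)    # ...and are latched together at the rising edge
    return history

trace = run(State(pc=0, regs=(0, 0), cc=0), 2)
print([(t.pc, t.regs) for t in trace])   # [(0, (0, 0)), (2, (1, 0)), (4, (2, 0))]
```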

Implementation of Each Stage of SEQ
The SEQ hardware structure diagram given earlier is only a general implementation, and some details are not given. Now, let's analyze the specific implementation of SEQ stage by stage.

Fetch Stage:


[Figure: SEQ Fetch Stage]
After the instruction is read from memory, its bytes are split into two parts by the Split and Align units: Split produces icode and ifun, and Align produces rA, rB and valC. These are all easy to understand. The key is the PC increment logic: how far the PC advances depends on the length of the instruction, which in turn depends on whether the instruction contains a register-identifier byte and whether it contains a constant valC. The two combinational circuits Need valC and Need regids in the figure make these decisions.

Take Need regids as an example. Its HCL description is:

bool need_regids =
    icode in { IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL };
In other words, need_regids is true when icode is one of the 7 instruction codes in the braces; these 7 instructions contain register identifiers. need_valC can be determined the same way: pick out the instructions that contain valC from the instruction encoding table above and list them in the braces.

Once need_regids and need_valC are known, the new PC value is computed with the following formula, which simply adds the length of the current instruction:

newPC = oldPC + 1 + need_regids + 4*need_valC
So the combinational circuits drawn as gray boxes can be described in HCL. In real hardware, logic synthesis turns these HCL statements into actual combinational circuits. HCL is a useful abstraction here: it separates the logic from its physical implementation, which makes design easier.
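need_regids, need_valC and the PC-increment formula translate directly into code. In this sketch the need_regids set matches the HCL above, while the need_valC set (IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL) is read off the Y86 encoding table in CS:APP; the icode constant names and values are also CS:APP's and should be treated as assumptions here.

```python
IHALT, INOP, IRRMOVL, IIRMOVL, IRMMOVL, IMRMOVL = 0x0, 0x1, 0x2, 0x3, 0x4, 0x5
IOPL, IJXX, ICALL, IRET, IPUSHL, IPOPL = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB

NEED_REGIDS = {IRRMOVL, IOPL, IPUSHL, IPOPL, IIRMOVL, IRMMOVL, IMRMOVL}
NEED_VALC = {IIRMOVL, IRMMOVL, IMRMOVL, IJXX, ICALL}

def next_pc(old_pc: int, icode: int) -> int:
    """newPC = oldPC + 1 + need_regids + 4 * need_valC"""
    need_regids = icode in NEED_REGIDS
    need_valC = icode in NEED_VALC
    return old_pc + 1 + int(need_regids) + 4 * int(need_valC)

for name, code in [("halt", IHALT), ("rrmovl", IRRMOVL), ("jXX", IJXX), ("irmovl", IIRMOVL)]:
    print(name, next_pc(0, code))   # lengths 1, 2, 5 and 6 bytes respectively
```

The resulting lengths (1 to 6 bytes) agree with the instruction lengths stated earlier in the article.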

Decode and Write Back Stages:


[Figure: SEQ Decode and Write Back Stages]
Both of these stages read from or write to the register file. The signals icode, rA and rB from the fetch stage serve as inputs, and several combinational circuits generate the register file's inputs. In the decode stage, for instructions that need particular registers, we read those registers from the register file: the addresses are given by srcA and srcB, and the outputs are valA and valB. In the write-back stage, we write the execute-stage result valE or the memory-stage result valM back to a register whose address is given by dstE or dstM. Take the combinational circuit srcA as an example; its HCL expression is:

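int srcA = [
    icode in { IRRMOVL, IRMMOVL, IOPL, IPUSHL } : rA;
    icode in { IPOPL, IRET } : RESP;
    1 : RNONE; # Don't need register
];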
The square brackets form a case expression, similar to a switch statement in C: the cases are checked in order, and the first one whose condition holds determines the value. If the condition before the first semicolon holds, the result is rA and the remaining cases are not considered; otherwise, if the second condition holds, the result is RESP; otherwise, the final case (1, always true) gives RNONE, meaning no register needs to be read. So in the decode stage, when the instruction is one of the four listed in the first case, register rA is read into valA; when it is one of the two listed in the second case, the stack pointer register (RESP) is read into valA; otherwise, no register is read.
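To make the "first matching case wins" rule concrete, here is a small Python sketch of srcA. hcl_case is a hypothetical helper, not part of any HCL tool: it takes (condition, value) pairs and returns the value of the first pair whose condition is true, mirroring the bracketed expression above. Register names are plain strings, and "RESP" and "RNONE" stand in for the HCL constants.

```python
from typing import Iterable, Tuple

RESP = "RESP"    # register ID of %esp
RNONE = "RNONE"  # "no register": nothing is read


def hcl_case(cases: Iterable[Tuple[bool, str]]) -> str:
    """Return the value of the first case whose condition holds, like HCL [ c : v; ... ]."""
    for condition, value in cases:
        if condition:
            return value
    raise ValueError("no case matched")  # this sketch has no implicit default


def src_a(icode: str, rA: str) -> str:
    """Which register the decode stage reads into valA."""
    return hcl_case([
        (icode in {"IRRMOVL", "IRMMOVL", "IOPL", "IPUSHL"}, rA),
        (icode in {"IPOPL", "IRET"}, RESP),
        (True, RNONE),  # the '1 : RNONE' default case
    ])


if __name__ == "__main__":
    print(src_a("IOPL", "%edx"))   # addl %edx, %eax reads rA = %edx
    print(src_a("IPOPL", "%eax"))  # popl reads the stack pointer instead
    print(src_a("IJXX", RNONE))    # jumps read no register for valA
```

Putting the always-true case last is what makes it a default: it is only reached when no earlier condition matched.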

Like srcA, there are three more combinational circuits: srcB, dstE, and dstM. Their HCL expressions can be derived from the SEQ hardware structure and the instruction set encoding, so we will not go through them one by one.

Execution Stage:


SEQ Execution Stage

The ALU takes two operands and an alufun signal. alufun selects which operation (addition, subtraction, AND, or XOR) the ALU applies to the two operands.

Take the first operand aluA as an example. Its HCL description is as follows:

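int aluA = [
    icode in { IRRMOVL, IOPL } : valA;
    icode in { IIRMOVL, IRMMOVL, IMRMOVL } : valC;
    icode in { ICALL, IPUSHL } : -4;
    icode in { IRET, IPOPL } : 4;
    # Other instructions don't need ALU
];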
As you can see, the operand aluA is sometimes valA, sometimes valC, and sometimes -4 or 4, depending entirely on the instruction type.

The HCL description of the alufun signal is as follows:

int alufun = [
    icode == IOPL : ifun;
    1 : ALUADD;
];
Only for IOPL (the arithmetic and logical instructions) is alufun taken from ifun; in every other case the ALU is used as an adder. This also explains why aluA can be -4 or 4: aluA is one addend, and as the figure shows, the other addend comes from valB. We have not given the HCL for srcB in the decode stage, but in these four cases srcB is RESP, so valB holds the current value of %esp. Hence for ICALL and IPUSHL the ALU computes %esp - 4, and for IRET and IPOPL it computes %esp + 4.
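The execute-stage arithmetic can be checked with a short Python model. alu_a follows the HCL for aluA above; the aluB selection (valB for IRMMOVL, IMRMOVL, IOPL, ICALL, IPUSHL, IRET, IPOPL, and 0 for IRRMOVL and IIRMOVL) and the ifun codes 0 to 3 for add, sub, and, xor are assumptions taken from the CS:APP Y86 definitions. As in Y86, the ALU computes aluB OP aluA, so subl rA, rB yields rB - rA, and results wrap to 32 bits. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# ifun codes of the OPl instructions (assumed from the Y86 encoding table).
IFUN_TO_OP = {0: "ALUADD", 1: "ALUSUB", 2: "ALUAND", 3: "ALUXOR"}

# The ALU computes aluB OP aluA.
ALU_OPS = {
    "ALUADD": lambda b, a: b + a,
    "ALUSUB": lambda b, a: b - a,
    "ALUAND": lambda b, a: b & a,
    "ALUXOR": lambda b, a: b ^ a,
}


def alu_a(icode: str, valA: int, valC: int) -> int:
    if icode in {"IRRMOVL", "IOPL"}:
        return valA
    if icode in {"IIRMOVL", "IRMMOVL", "IMRMOVL"}:
        return valC
    if icode in {"ICALL", "IPUSHL"}:
        return -4
    if icode in {"IRET", "IPOPL"}:
        return 4
    return 0  # other instructions don't use the ALU; 0 is an arbitrary placeholder


def alu_b(icode: str, valB: int) -> int:
    if icode in {"IRMMOVL", "IMRMOVL", "IOPL", "ICALL", "IPUSHL", "IRET", "IPOPL"}:
        return valB
    return 0  # IRRMOVL and IIRMOVL add 0; other instructions don't use the ALU


def alufun(icode: str, ifun: int) -> str:
    return IFUN_TO_OP[ifun] if icode == "IOPL" else "ALUADD"


def execute(icode: str, ifun: int, valA: int, valB: int, valC: int) -> int:
    """Return valE, the 32-bit ALU result."""
    result = ALU_OPS[alufun(icode, ifun)](alu_b(icode, valB), alu_a(icode, valA, valC))
    return result & MASK32


if __name__ == "__main__":
    esp = 0x100
    print(hex(execute("IPUSHL", 0, valA=7, valB=esp, valC=0)))  # %esp after a push
    print(hex(execute("IPOPL", 0, valA=esp, valB=esp, valC=0)))  # %esp after a pop
```

With %esp = 0x100 the two calls print 0xfc and 0x104, i.e. the push decrements and the pop increments the stack pointer by 4 using the same adder.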

Memory Access Stage:


SEQ Memory Access Stage
Mem read and Mem write determine whether the current instruction reads or writes memory, and Mem addr and Mem data supply the address and the data for that access. Taking Mem addr as an example, its HCL description is:

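int mem_addr = [
    icode in { IRMMOVL, IPUSHL, ICALL, IMRMOVL } : valE;
    icode in { IPOPL, IRET } : valA;
    # Other instructions don't need address
];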
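The memory stage control can be modeled the same way. mem_addr follows the HCL above; the MEM_READ and MEM_WRITE sets are assumptions based on the CS:APP SEQ definitions, and the function names are hypothetical. The sketch highlights an asymmetry in the HCL: pushl and call write at the already-decremented stack pointer (valE), while popl and ret read at the old stack pointer (valA). None stands for "no memory access".

```python
from typing import Optional, Tuple

MEM_READ = {"IMRMOVL", "IPOPL", "IRET"}     # assumed, per CS:APP SEQ
MEM_WRITE = {"IRMMOVL", "IPUSHL", "ICALL"}  # assumed, per CS:APP SEQ


def mem_addr(icode: str, valA: int, valE: int) -> Optional[int]:
    if icode in {"IRMMOVL", "IPUSHL", "ICALL", "IMRMOVL"}:
        return valE  # computed address, or the decremented %esp
    if icode in {"IPOPL", "IRET"}:
        return valA  # old %esp, before the increment
    return None      # other instructions don't access memory


def memory_access(icode: str, valA: int, valE: int) -> Tuple[Optional[str], Optional[int]]:
    """Return (operation, address) for the memory stage."""
    addr = mem_addr(icode, valA, valE)
    if icode in MEM_READ:
        return ("read", addr)
    if icode in MEM_WRITE:
        return ("write", addr)
    return (None, None)


if __name__ == "__main__":
    print(memory_access("IPUSHL", 0x100, 0xFC))  # write at the new top of stack
    print(memory_access("IPOPL", 0x100, 0x104))  # read at the old top of stack
```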
Update PC Stage:


SEQ Update PC Stage
The new PC value is selected from valC, valM, and valP. The HCL description of New PC is as follows:

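int new_pc = [
    # Call. Use instruction constant
    icode == ICALL : valC;
    # Taken branch. Use instruction constant
    icode == IJXX && Cnd : valC;
    # Completion of RET instruction. Use value from stack
    icode == IRET : valM;
    # Default: Use incremented PC
    1 : valP;
];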
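A direct Python transcription of new_pc, as a sketch. The parameter cnd stands for the condition signal Cnd computed in the execute stage (true when the jump condition holds, and always true for an unconditional jmp); the function name is hypothetical.

```python
def new_pc(icode: str, cnd: bool, valC: int, valM: int, valP: int) -> int:
    """Select the next PC, checking cases in the same order as the HCL."""
    if icode == "ICALL":
        return valC  # call: jump to the target encoded in the instruction
    if icode == "IJXX" and cnd:
        return valC  # taken branch
    if icode == "IRET":
        return valM  # return address read from the stack
    return valP      # default: fall through to the next instruction


if __name__ == "__main__":
    print(hex(new_pc("IJXX", True, valC=0x40, valM=0, valP=0x11)))   # taken
    print(hex(new_pc("IJXX", False, valC=0x40, valM=0, valP=0x11)))  # not taken
```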
General Principles of Pipelining
That concludes the prelude, and we can finally get to the main topic. (The prelude was admittedly a bit long, haha.)

In "State Change Cycles of SEQ" we left a loose end; now it is time to resolve it. SEQ's clock frequency is too low, so we need a way to raise it. Two approaches come to mind: shorten the execution time of each instruction, or let several instructions execute at the same time. The first is not feasible, because the execution time of an instruction is hard to compress; it is fixed by the inherent delays of the circuit. So we can only take the second approach: pipelining.

Let's start with an analogy. Picture a self-service restaurant with a conveyor belt: plates of food ride along the belt past the customers, who take whatever they like. If a plate is an instruction and the customers along the belt are the stages of instruction execution, then SEQ is like putting only one plate on the belt at a time and adding the next plate only after the first has reached the end. A restaurant run that way would leave its customers starving. In reality, plates are placed on the belt one after another, so as soon as one plate passes a customer the next one arrives, which is far more efficient.

Pipelining in a processor works the same way: every stage is working on some instruction. With 6 stages, up to 6 instructions are in flight at once, which in the ideal case raises throughput to 6 times that of SEQ. Sounds powerful, doesn't it? But things are far from that simple. The most immediate question is: will the instructions interfere with one another?
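The "6 times" figure is an idealized upper bound, and a little arithmetic shows why. The sketch below assumes that an unpipelined design takes time t per instruction and that a k-stage pipeline splits this into k equal stages of length t/k, ignoring the delay of the pipeline registers themselves (an assumption: real pipeline registers add overhead). Then n instructions take n*t unpipelined and (k + n - 1)*t/k pipelined, because the first instruction needs k stage times to get through and each later one finishes one stage time after its predecessor. The function names are hypothetical.

```python
from fractions import Fraction


def unpipelined_time(n: int, t: Fraction) -> Fraction:
    """Each instruction runs start to finish before the next one begins."""
    return n * t


def pipelined_time(n: int, t: Fraction, k: int) -> Fraction:
    """k equal stages of length t/k; pipeline-register delay is ignored."""
    return (k + n - 1) * (t / k)


def speedup(n: int, k: int, t: Fraction = Fraction(1)) -> Fraction:
    return unpipelined_time(n, t) / pipelined_time(n, t, k)


if __name__ == "__main__":
    for n in (1, 10, 1000):
        print(n, float(speedup(n, k=6)))
```

A single instruction gains nothing (speedup 1), ten instructions give a speedup of 4, and as n grows the ratio approaches k = 6 without reaching it.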

Let's revisit the SEQ hardware structure diagram. It contains many connections that cross stages. For example, valC, obtained in the fetch stage, is wired directly to New PC in the PC update stage. In a pipeline this causes trouble, because later instructions overwrite the valC produced by earlier ones: by the time an instruction reaches the PC update stage and reads valC, the value is no longer the one produced by its own fetch stage. What can we do?

The fix is easy to see: save every value an instruction may still need later. In effect, we add a set of registers in front of each stage and, at the start of each clock cycle, load them with the values belonging to the instruction now entering that stage. In pipelining, these registers inserted between stages are called pipeline registers.

Now our processor architecture is updated to PIPE- (Pipeline-, the minus sign means non-final version), as shown in the following figure.


PIPE- Hardware Structure
Compared with SEQ there are two changes. First, the PC update stage is merged into the fetch stage, so the PC is updated before the fetch. Second, a pipeline register is inserted between every two adjacent stages. These pipeline registers are clocked: at the start of each clock cycle they load new values, which passes the state of each instruction on to the next stage.
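The effect of clocked pipeline registers can be illustrated with a toy simulation. This is a hypothetical model, not PIPE- itself: each instruction is a (text, valC) pair, the five registers are labeled F, D, E, M, W after the PIPE- diagram but simply hold the instruction currently in that stage, and every clock edge copies each register's contents into the next one. Because each instruction's valC travels with it, a later instruction can no longer overwrite an earlier one's constant.

```python
from typing import Dict, List, Optional, Tuple

Instr = Tuple[str, int]  # (assembly text, valC)
STAGES = ["F", "D", "E", "M", "W"]


def simulate(program: List[Instr]) -> List[Dict[str, Optional[Instr]]]:
    """Return a snapshot of the pipeline registers for every clock cycle."""
    regs: Dict[str, Optional[Instr]] = {s: None for s in STAGES}  # None = bubble
    pending = list(program)
    trace = []
    for _ in range(len(program) + len(STAGES) - 1):
        # Clock edge: shift from the back so no value is overwritten before it moves on.
        for i in range(len(STAGES) - 1, 0, -1):
            regs[STAGES[i]] = regs[STAGES[i - 1]]
        regs["F"] = pending.pop(0) if pending else None
        trace.append(dict(regs))
    return trace


if __name__ == "__main__":
    prog = [("irmovl $10,%edx", 10), ("irmovl $3,%eax", 3), ("addl %edx,%eax", 0)]
    for cycle, snapshot in enumerate(simulate(prog), start=1):
        print(cycle, {s: (v[0] if v else "-") for s, v in snapshot.items()})
```

Each printed row is one clock cycle; reading down a column shows an instruction advancing one stage per cycle while carrying its own valC.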

Pipeline Hazards
Are we done now? Not yet. Analyzing PIPE- carefully reveals some remaining problems. Although the pipeline registers keep each instruction's intermediate values separate, there can still be dependencies between instructions, of two kinds:

Data dependency: the register or memory location written by an earlier instruction is exactly what a later instruction needs to read. In PIPE-, if the two instructions are adjacent, then when the later instruction reads the register in the decode stage, the earlier instruction has only just reached the execute stage, so the new value has not yet been written. Reading the register at that point returns the old value, which violates the sequential semantics of the program.

Control dependency: when an instruction is a jump, call, or return, it changes the flow of control. For a conditional jump or a return, the address of the next instruction cannot be determined in advance, because it depends on the outcome of the current instruction, so the pipeline may have to be interrupted.

These dependencies can make the pipeline compute wrong results; this phenomenon is called a pipeline hazard. Let's consider data hazards first. The following figure shows the stage-by-stage execution of a code segment.


Execution Process of prog1 Code Segment
Three nop instructions are inserted between irmovl $3, %eax and addl %edx, %eax. As a result, the former finishes its write back stage just as the latter starts its decode stage, so the register is read only after it has been written, and no data hazard occurs.

Now look at the following figure.


Execution Process of prog2 Code Segment
Now one nop is removed, and things immediately go wrong. The write back stage of the instruction at 0x006 and the decode stage of the instruction at 0x00e happen in the same cycle, but the register write does not take effect until the start of cycle 7, so the decode stage still reads the old value: a data hazard.

Removing the remaining two nops as well would obviously cause even more serious data hazards; we won't verify that here. Next, let's consider how to avoid data hazards.

There are two solutions:

Suspension (usually called stalling): much like inserting a nop by hand, the processor automatically inserts a bubble between instructions where a data hazard may occur, holding the dependent instruction in place for one clock cycle.


Execution Process When prog2 Uses Suspension
As the figure above shows, when addl reaches the decode stage the processor detects the impending data hazard, injects a bubble, and keeps addl in the decode stage for one extra clock cycle.

If all nops are removed, stalling can still resolve the data hazard, but several bubbles must be inserted, as shown in the following figure.


Execution Process When prog4 Uses Suspension
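The number of bubbles in these figures follows from simple cycle counting. Under the assumptions of the diagrams (five stages, one instruction fetched per cycle, and a register write taking effect at the start of the cycle after write back), an instruction fetched in cycle c writes back in cycle c + 4, so its result is visible from cycle c + 5, while a reader fetched d instructions later decodes in cycle c + d + 1. The hypothetical helper below computes how many bubbles stalling needs when there is no forwarding.

```python
DECODE = 1      # index of the decode stage (F=0, D=1, E=2, M=3, W=4)
WRITE_BACK = 4  # index of the write back stage


def bubbles_without_forwarding(distance: int) -> int:
    """Bubbles needed when the reader is `distance` instructions after the writer
    (distance 1 = immediately next). The write is visible one cycle after write back."""
    visible_from = WRITE_BACK + 1   # cycles after the writer's fetch
    decode_at = distance + DECODE   # same reference point, assuming no stalls
    return max(0, visible_from - decode_at)


if __name__ == "__main__":
    # prog1..prog4 have 3, 2, 1 and 0 nops between irmovl $3,%eax and addl.
    for name, nops in [("prog1", 3), ("prog2", 2), ("prog3", 1), ("prog4", 0)]:
        print(name, bubbles_without_forwarding(nops + 1))
```

This gives 0, 1, 2 and 3 bubbles for prog1 through prog4, matching the single bubble in the prog2 figure and the several bubbles needed for prog4.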
Forwarding: suspension has a drawback: it lowers program execution efficiency, because the inserted bubbles do no useful work and simply waste cycles. Forwarding makes better use of every cycle.

Let's use the same code segment to show how forwarding works.


Execution Process When prog2 Uses Forwarding
As the figure shows, when addl reaches the decode stage, irmovl is in the write back stage. The register has not been written yet, so reading it would cause a data hazard. However, there is a clever way around this. Since the write back stage cannot update the register until the start of the next cycle, we simply forward the value about to be written directly to the decode stage. The decode stage then no longer needs to read the register and can use the forwarded value instead.

Next, what about the prog3 code segment?


Execution Process When prog3 Uses Forwarding
prog3 differs from prog2 by having one fewer nop, so when addl reaches the decode stage, irmovl has only reached the memory stage. This does not hurt forwarding, because irmovl does not access memory: the value it will write to the register in the next stage has already been produced as M_valE. (Notation: M_valE is the valE field of the M-stage pipeline register; see the PIPE- hardware structure diagram above.) So we can forward M_valE directly to the decode stage.

Next, what about the prog4 code segment?


Execution Process When prog4 Uses Forwarding
Now no nop is left: addl immediately follows irmovl, so when addl reaches the decode stage, irmovl has only reached the execute stage. Surprisingly, forwarding still works. First, the value that will eventually be written to the register is exactly the result computed in the execute stage. Second, we must check whether the execute stage produces this result in time to be forwarded to the decode stage. It does: even if the decode stage received the value early, the value would not be stored into the execute stage's pipeline register until the start of the next cycle. So as long as the result is computed before the start of the next cycle, the timing requirement is met.
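The forwarding decision in the decode stage can be sketched as a priority check. This sketch follows the shape of the d_valA/d_valB selection in CS:APP's PIPE but is simplified (for instance, it omits the case where valA carries valP for call and jump), and the function and table names are hypothetical. Uppercase prefixes (M_valE, W_valE) are pipeline-register fields as defined above; lowercase e_valE and m_valM denote values being produced inside the execute and memory stages during the current cycle. The order matters: the earliest stage holds the most recent instruction, so it must win when several stages target the same register.

```python
RNONE = "RNONE"

# (destination field, value field) pairs in priority order: newest instruction first.
FORWARD_SOURCES = [
    ("e_dstE", "e_valE"),  # ALU result being computed in execute
    ("M_dstM", "m_valM"),  # value being read from memory
    ("M_dstE", "M_valE"),  # ALU result held in the M register
    ("W_dstM", "W_valM"),  # loaded value waiting for write back
    ("W_dstE", "W_valE"),  # ALU result waiting for write back
]


def forward(d_src: str, d_rval: int, pipe: dict) -> int:
    """Value the decode stage should use for register d_src.
    d_rval is what the register file returned (possibly stale)."""
    if d_src == RNONE:
        return d_rval
    for dst_field, val_field in FORWARD_SOURCES:
        if pipe.get(dst_field) == d_src:
            return pipe[val_field]
    return d_rval


if __name__ == "__main__":
    # prog4-like state (assumed: %edx was set by a preceding irmovl $10,%edx):
    # addl %edx,%eax in decode, irmovl $3,%eax in execute, irmovl $10,%edx in memory.
    pipe = {"e_dstE": "%eax", "e_valE": 3, "M_dstE": "%edx", "M_valE": 10}
    print(forward("%edx", 0, pipe), forward("%eax", 0, pipe))
```

Even though the register file still returns stale values (0 here), addl receives 10 for %edx from M_valE and 3 for %eax from the execute stage.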

Pretty magical, isn't it? Forwarding resolves data hazards without lowering program efficiency. But nothing is perfect. The examples so far only had irmovl followed by addl using the same register, while real programs contain a great many other combinations. Can forwarding solve every case? Look at the following example.


Load/Use Data Hazard
The instructions at 0x018 and 0x01e in prog5 form what is called a load/use data hazard: mrmovl loads a value from memory into %eax, and addl uses %eax immediately afterward. If we try forwarding as before, sending the value from mrmovl's execute stage to addl, we get the wrong result. The reason is simple: mrmovl obtains the correct value for %eax only in the memory stage, so there is nothing correct to forward from its execute stage to the decode stage. How do we solve this? By combining the two methods: first stall for one cycle, and then, when mrmovl reaches the memory stage, forward the loaded value to addl.
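The condition that triggers this stall-then-forward combination fits in a one-line check. This sketch is modeled on the load/use condition in CS:APP's PIPE control logic: the instruction in the execute stage is a load (mrmovl or popl) whose memory destination dstM is one of the registers the instruction in decode is about to read. Names are hypothetical. When the check is true, the control logic holds the fetch and decode stages for one cycle and injects a bubble into execute, after which the loaded value can be forwarded from the memory stage.

```python
RNONE = "RNONE"
LOADS = {"IMRMOVL", "IPOPL"}  # instructions whose register result comes from memory


def load_use_hazard(E_icode: str, E_dstM: str, d_srcA: str, d_srcB: str) -> bool:
    """True if the instruction in decode needs a value that is still being loaded."""
    return E_icode in LOADS and E_dstM != RNONE and E_dstM in (d_srcA, d_srcB)


if __name__ == "__main__":
    # prog5-like case (addl operands assumed): mrmovl ...,%eax in execute,
    # addl %ebx,%eax in decode reads %ebx and %eax.
    print(load_use_hazard("IMRMOVL", "%eax", "%ebx", "%eax"))
```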

Well, having solved all these problems, we can finally present the final version of our hardware structure diagram.


PIPE Hardware Structure
Compared with PIPE-, the addition is the forwarding logic that resolves data hazards. The forwarded values are received mainly in the decode stage.

A More Complete Design
We always strive for perfection, and the PIPE design we have now still falls short: some key details have not been considered.

Exception handling: exception handling is an extremely important aspect of a processor. Many instructions can run into exceptions during execution, such as an invalid address on a memory access or an invalid instruction encoding. When a program hits an exception, it should stop immediately, and from the outside it should look as if every earlier instruction has completed and no later instruction has executed at all. This sounds simple but is not easy in PIPE, because several instructions are in the pipeline at once: when one instruction hits an exception in some stage, later instructions may already be partly executed. To make them appear never to have run, any effects they have already produced must be cancelled, which requires more capable control logic.

Control hazards: the control dependencies mentioned under pipeline hazards lead to control hazards. When executing a conditional jump, the processor must predict the branch; if the prediction is wrong, the instructions already started along the wrong path must be cancelled, and the instructions of the correct branch executed instead. When executing a return instruction, the return address has to be read from memory, so the next instruction cannot start until the return reaches the memory stage. These special cases need careful handling in the control logic.

Explaining the implementation of these two parts in detail would take a lot of space. Interested readers can visit the book's official website to learn more.

Gap with Real Instruction Set Architectures
This article has walked through the design of the Y86 instruction set architecture. Even as a rough sketch it took this much space, yet it is tiny compared with the complexity of a real instruction set architecture such as x86. We specified only a very simple instruction set and built a simple implementation. A real instruction set contains many more instructions, including multi-cycle ones such as floating-point arithmetic, which cannot complete in one cycle and need additional hardware units. In Y86 we treated memory as an ideal storage unit that completes every access in one clock cycle, but in reality main memory is far slower than the CPU, so a memory hierarchy with multiple levels of cache is needed to speed up access. Modern processors also use multiple issue and out-of-order execution: rather than moving through the stages one instruction at a time as in Y86, they execute several instructions at once, not necessarily in the order in which they appear in the code. In recent years processors have moved toward multiple cores, which provide more processing power and make parallelism at the level of program code increasingly important. We cannot know what new techniques future processors will adopt, but they will certainly become more and more complex. Whatever the changes, understanding the basic principles of the processor and the instruction set helps us see through them: however complex a system is, it grows step by step from a basic form, and grasping that core is what matters most.


References
Deep Understanding of Computer Systems (4.2): The Charm of Hardware, Zuo Xiaolong

[ Last edited by zzz19760225 on 2016-12-12 at 20:02 ]
Floor 22 Posted 2016-06-22 21:29 · China Hainan Sanya Telecom
Search Engine
Default Search Settings



Sogou

sogou.com

https://www.sogou.com/web?ie={inputEncoding}&query=%s

(Used to switch Google Chrome's default search engine to a domestic product, even if it is as disappointing as the national football team!)




http://www.aisinoha.com/indexShowAction.do

http://www.aisinoha.com/indexActionArticle.do?id=231&addTime=2014-09-29%2015:30:25&sid=25&typeName=%25E6%259C%258D%25E5%258A%25A1%25E6%25B5%2581%25E7%25A8%258B&sectionName=%25E6%259C%258D%25E5%258A%25A1%25E6%2594%25AF%25E6%258C%2581

http://www.aisinoha.com/shipinAction.do

Sogou Guide
http://zhinan.sogou.com/

[ Last edited by zzz19760225 on 2017-1-18 at 11:23 ]
Floor 23 Posted 2016-06-22 21:29 · China Hainan Sanya Telecom
Floor 24 Posted 2016-06-22 21:30 · China Hainan Sanya Telecom
Floor 25 Posted 2016-06-22 22:50 · China Hainan Sanya Telecom
Floor 26 Posted 2016-06-22 22:50 · China Hainan Sanya Telecom
Floor 27 Posted 2016-06-22 22:51 · China Hainan Sanya Telecom
Floor 28 Posted 2016-06-22 22:51 · China Hainan Sanya Telecom
Floor 29 Posted 2016-06-22 22:51 · China Hainan Sanya Telecom
Floor 30 Posted 2016-06-22 22:51 · China Hainan Sanya Telecom
Homemade Compound Optical Biological Microscope
2012-10-8 11:54 | Publisher: Ancient | Views: 99908
http://www.geekfans.com/article-2887-2.html

[ Last edited by zzz19760225 on 2017-4-5 at 04:27 ]