China DOS Union

-- Unite DOS · Advance DOS · Grow DOS --

Union site: www.cn-dos.net Forum site: www.cn-dos.net/forum
DOS stands for freedom, openness and progress. Let us work hard, learn from the openness and GNU spirit of FreeDOS and Linux, and together build and grow a free GNU GPL world!

China DOS Union Forum
China DOS Union Forum » DOS Batch & Script Technology (Batch Room) » [Collaborative Participation] [Challenging Ideas] [Batch Processing: Easily Translate Words]
Floor 16 Posted 2006-10-12 08:02 · China, Guangdong, Foshan, Guangdong Ruijiang Technology Co., Ltd.
Honorary Moderator
★★★★
batch fan
Credits 5,226
Posts 1,737
Joined 2006-03-10 00:38
20-year member
UID 51697
From Chengdu
Just as 3742668 said, the format of the lexicon is very important. To find matching content in the lexicon with CMD, findstr is the best choice. Here is some preliminary code. It requires the following lexicon format: each word occupies its own line (case-insensitive); the translation starts on a new line (lines consisting only of letters are not allowed) and may span multiple lines.

Code:

@echo off
:begin
cls
set input=
set /p input=Please enter the word to be searched (press Enter directly to exit):
if not defined input exit
:: Number the non-empty lines of test.txt ("N:text") and stop at the line that equals the input
for /f "tokens=1,2 delims=:" %%i in ('findstr /n "." test.txt') do (
if /i "%%j"=="%input%" (set line=%%i&&goto display)
)
echo _________________________________
echo.
echo No record of %input% found
echo _________________________________
echo.
pause
goto begin

:display
echo _________________________________
echo.
echo %input%:
:: Print the lines after the headword until the next letters-only line (the next headword)
for /f "skip=%line% tokens=*" %%i in (test.txt) do (
echo %%i|findstr /i "^[a-z]*$">nul &&goto end||echo %%i
)
:end
echo _________________________________
echo.
pause
goto begin


Test content format example:

China
n.
China, porcelain

DOS
n.
Disk operating system

name
n.
name, name, name, reputation
vt.
name, nominate, call out, designate
adj.
name-related, named after

who
pron.
who, the one... (person)


Anyone interested can test it with more content.
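For readers who want to check the lookup rule outside CMD, here is a minimal Python sketch of the same idea (it is not namejm's code): find the line that equals the word exactly, ignoring case, then print the following lines until the next letters-only line. It assumes, as the format above requires, that headwords are the only lines made purely of letters; `look_up` and `SAMPLE` are names made up for this illustration.

```python
import re
from typing import List, Optional

# Assumption: a headword line contains letters only; part-of-speech tags such as
# "n." and glosses with spaces or punctuation count as translation text.
HEADWORD = re.compile(r"[A-Za-z]+")


def look_up(lexicon_text: str, word: str) -> Optional[List[str]]:
    """Return the translation lines for `word`, or None if it is not in the lexicon."""
    lines = [line.strip() for line in lexicon_text.splitlines()]
    for i, line in enumerate(lines):
        # Exact, case-insensitive comparison, like: if /i "%%j"=="%input%"
        if HEADWORD.fullmatch(line) and line.lower() == word.lower():
            entry = []
            for text in lines[i + 1:]:
                if not text:
                    continue  # for /f skips blank lines as well
                if HEADWORD.fullmatch(text):
                    break  # the next headword ends this entry
                entry.append(text)
            return entry
    return None


SAMPLE = """China
n.
China, porcelain

DOS
n.
Disk operating system

who
pron.
who, the one... (person)
"""

print(look_up(SAMPLE, "dos"))  # ['n.', 'Disk operating system']
```

The stop condition is the whole trick: because translation lines may not be letters-only, the next headword is the only thing that can end an entry.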

[ Last edited by namejm on 2006-10-12 at 08:19 ]
A foot may fall short and an inch may suffice; mastering CMD is non-negotiable.
Think problems through thoroughly; solve them simply.
Floor 17 Posted 2006-10-12 08:38 · China, Hubei, Jingmen, China Telecom
Honorary Moderator
★★★
Credits 2,013
Posts 718
Joined 2006-02-18 07:07
20-year member
UID 50550
Re namejm:
As I mentioned in a previous post, whether findstr runs inside or outside the loop makes a huge difference in efficiency. Your method above is very inefficient when processing tens of thousands of lines of records, so it is better to run findstr first and then for. I'm short on time, so here is just an outline:

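@echo off
:: Call findstr once, outside the loop, to get the number of the matching line
for /f "delims=:" %%i in ('findstr /nirc:"^%word% .*" 词库.txt') do set /a col=%%i + 1
:: Then skip straight past that many lines and print the next one
for /f "delims= skip=%col%" %%i in (词库.txt) do echo %%i && goto :next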
:next
.
.
.

The above approach should be much more efficient. Bro, I just bought a new phone and have to go sort out the music, bye.
Floor 18 Posted 2006-10-12 08:56 · China, Guangdong, Foshan, Guangdong Ruijiang Technology Co., Ltd.
Honorary Moderator
★★★★
batch fan
Credits 5,226
Posts 1,737
Joined 2006-03-10 00:38
20-year member
UID 51697
From Chengdu
Hehe, yes, putting findstr inside or outside the for makes a big difference in efficiency. I just switched my antivirus to Kaspersky, which updates frequently; whenever it updates, CPU usage often hits 100% and everything lags, so I only tested a small part of that code. It seems I need to disable Kaspersky and test again.
Floor 19 Posted 2006-10-12 09:12 · China, Hubei, Wuhan, China Telecom
Moderator
★★★★★
Credits 11,386
Posts 4,938
Joined 2006-07-23 17:10
19-year member
UID 59080

Brother namejm's code is good; a thumbs up first.......
Floor 20 Posted 2006-10-12 09:52 · China, Guangdong, Foshan, Guangdong Ruijiang Technology Co., Ltd.
Honorary Moderator
★★★★
batch fan
Credits 5,226
Posts 1,737
Joined 2006-03-10 00:38
20-year member
UID 51697
From Chengdu
I've modified the code from floor 16 as follows. Testing 30,000 lines on my local machine, the result appears instantly. Everyone, please test it with more complex content and see how the efficiency and accuracy hold up:


@echo off
:: Lexicon format: each word occupies its own line (case-insensitive);
:: the translation starts on a new line (letters-only lines are not allowed) and may span multiple lines.
:begin
cls
set input=
set line=
set /p input=Please enter the word to look up (press Enter directly to exit):
if not defined input exit
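:: One findstr pass narrows the candidates; if /i then keeps only the exact (case-insensitive) match
for /f "tokens=1* delims=:" %%i in ('findstr /nirc:"^%input%" 词库.txt') do (if /i "%%j"=="%input%" set line=%%i)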
if not "%line%"=="" (goto display) else (
echo _________________________________
echo.
echo No record of %input% found
echo _________________________________
echo.
pause
goto begin)

:display
echo _________________________________
echo.
echo %input%:
:: Print the lines after the headword until the next letters-only line (the next headword)
for /f "skip=%line% delims=" %%i in (词库.txt) do (
echo %%i|findstr /i "^[a-z]*$">nul &&goto end||echo %%i
)
:end
echo _________________________________
echo.
pause
goto begin
Recent ratings for this post (1 in total)
Rater | Score | Time
redtek | +5 | 2006-11-23 07:10
Floor 21 Posted 2006-10-12 09:53 · China, Beijing, Dongcheng District, China Unicom
Gold Member
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
Originally posted by 3742668 at 2006-10-12 07:25:
If this topic's requirements are handled purely with batch processing, then once the efficiency is there, the lexicon format becomes hard to manage; if the lexicon requirements are relaxed, the efficiency becomes unsatisfactory. Since both cannot be achieved at once, one...


『Floor 15』 is really interesting~ :)
It even works out the rules for entering words into the lexicon in the future, haha...

[ Last edited by redtek on 2006-10-12 at 09:54 ]
    Redtek, a person forever wandering the Internet...

_.,-*~'`^`'~*-,.__.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._,_.,-*~'`^`'~*-,._
Floor 22 Posted 2006-10-12 11:44 · China, Hubei, Jingmen, China Telecom
Honorary Moderator
★★★
Credits 2,013
Posts 718
Joined 2006-02-18 07:07
20-year member
UID 50550
I suggest making it bidirectional: not only English to Chinese, but also Chinese to English.

In addition, the code on floor 20 seems to have a flaw:
I suggest adding $ after %input% in the first for statement; otherwise the search may return incorrect results.

for /f "delims=:" %%i in ('findstr /nirc:"^%input%$" 词库.txt') do if not defined line set line=%%i

Just armchair theorizing; I haven't tried it yet.
Floor 23 Posted 2006-10-12 12:16 · China, Guangdong, Foshan, Guangdong Ruijiang Technology Co., Ltd.
Honorary Moderator
★★★★
batch fan
Credits 5,226
Posts 1,737
Joined 2006-03-10 00:38
20-year member
UID 51697
From Chengdu
Originally posted by 3742668 at 2006-10-12 11:44:
I suggest making it bidirectional: not only English to Chinese, but also Chinese to English.
In addition, the code on floor 20 seems to have a flaw:
I suggest adding $ after %input% in the first for statement; otherwise the search may return incorrect results.

  It shouldn't give wrong results, because after do I used if /i "%%j"=="%input%" to require an exact match. I didn't use the $ form because it failed when I first tested it: I later found that if the word being searched for is on the last line of the file, it is not detected. That looks like a findstr regex bug when matching the last line (most likely $ only matches right before a line break, and the last line has none), so I switched to the if form, although appending $ works fine in this particular application.
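A small Python sketch of the two-step check described here, assuming the lexicon is already split into lines: a prefix-only search (what `^%input%` without `$` does) can also hit longer words and gloss lines, and the exact, `if /i`-style comparison then filters those out. It models only the matching logic, not findstr's own behaviour; `candidate_lines` and `exact_line` are hypothetical names.

```python
from typing import List


def candidate_lines(lines: List[str], word: str) -> List[int]:
    """1-based numbers of the lines that start with `word`, ignoring case.

    This is what a prefix-only pattern such as "^word" finds: it can also hit
    longer words and gloss lines that merely begin with the same letters.
    """
    w = word.lower()
    return [n for n, line in enumerate(lines, start=1) if line.lower().startswith(w)]


def exact_line(lines: List[str], word: str) -> int:
    """Keep only a candidate equal to `word` (the if /i step); 0 means not found."""
    for n in candidate_lines(lines, word):
        if lines[n - 1].lower() == word.lower():
            return n
    return 0


LINES = ["name", "n.", "name, title", "named", "adj.", "DOS"]

print(candidate_lines(LINES, "name"))  # [1, 3, 4]
print(exact_line(LINES, "dos"))        # 6
```

The exact comparison is done on the whole line, so the word on the last line is found just like any other.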

  As for Chinese-to-English, that is extremely difficult; I even doubt whether CMD can do it at all. If it is possible, the breakthrough will have to come from the format of the lexicon.
Floor 24 Posted 2006-10-12 13:12 · China, Beijing, China Unicom
Gold Member
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
If we look for a breakthrough in the lexicon format, this starts to look more and more like searching for, and re-researching, a plain-text database system best suited to batch processing~ :)
Floor 25 Posted 2006-10-12 13:26 · China, Beijing, China Unicom
Gold Member
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
If a Chinese-to-English function is added, the Chinese explanations of English words in the same file may contain any Chinese word, such as "is", "represent" or "meaning". Such words, and even single Chinese characters, appear in the Chinese translations of almost every English word, so searching the whole file for a Chinese word or character with findstr will very likely return a large amount of unrelated text from the explanations of other words : )

Unless the English words and their Chinese explanations are kept in separate files: one file holds the English word list, one word per line, and another holds the Chinese explanations, each of which must also occupy exactly one line. The two files are different, but a word and its explanation must sit on the same line number. Then, to look up any English word or any Chinese explanation, once you know its line number you can pull the matching line straight out of the other file with for /f "skip=N".
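The two-file idea can be sketched in Python as below. This is an illustration only: it assumes both files have already been read into lists of lines and that a query must equal a whole line. The list position plays the role of the N in `for /f "skip=N"`; `translate_by_line` is a made-up helper name.

```python
from typing import List, Optional


def translate_by_line(source: List[str], target: List[str], query: str) -> Optional[str]:
    """Find `query` as a whole line in `source`; return the same-numbered line of `target`."""
    for n, line in enumerate(source):
        if line == query:
            # Same position in the other file: in batch, for /f "skip=n" and read one line
            return target[n]
    return None


WORDS = ["china", "dos", "japan"]                # file 1: one English word per line
MEANINGS = ["中国; 瓷器", "磁盘操作系统", "日本"]  # file 2: one explanation per line

print(translate_by_line(WORDS, MEANINGS, "dos"))   # 磁盘操作系统
print(translate_by_line(MEANINGS, WORDS, "日本"))  # japan
```

Because it compares whole lines, a Chinese query only succeeds when it equals an entire explanation line, which is exactly where multi-meaning entries get awkward.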

However, one English word may have several Chinese explanations, and across the Chinese explanations of up to 100,000 English words the vocabulary overlaps heavily, so findstr is very likely to return a lot of hits when looking up any given Chinese word...
Floor 26 Posted 2006-10-12 13:58 · China, Beijing, China Unicom
Gold Member
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
A small experiment :)

============ Experimental version of Chinese-English/English-Chinese translation ===============
e.bat


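@echo off
echo.
echo =======================================
echo Experimental version, can only translate the following content:
echo.
echo China USA Japan Computer
echo china usa japan computer
echo =======================================
echo.

set /p go=Please enter the word to translate :
:: Jump to the label that has the same name as the input
goto :%go%


:: Stacked labels: execution falls through from one label to the next until a goto,
:: so the English and the Chinese name both reach the same echo line.
:: Label names are case-insensitive, so :china also catches China and CHINA.
:china
:中国
echo china : 中国
goto :eof

:usa
:美国
echo usa : 美国
goto :eof

:japan
:日本
echo japan : 日本
goto :eof

:computer
:计算机
echo computer : 计算机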




Execution process:

C:\TEMP>e.bat

=======================================
Experimental version, can only translate the following content:

China USA Japan Computer
china usa japan computer
=======================================

Please enter the word to translate : USA
usa : 美国

C:\TEMP>e.bat

=======================================
Experimental version, can only translate the following content:

China USA Japan Computer
china usa japan computer
=======================================

Please enter the word to translate : china
china : 中国

C:\TEMP>e.bat

=======================================
Experimental version, can only translate the following content:

China USA Japan Computer
china usa japan computer
=======================================

Please enter the word to translate : China
china : 中国



Batch processing supports Chinese labels for goto (I was curious and just happened to try it).
Also, after the :computer label there is no goto,
so execution simply runs on into the label right below it. So just stack several labels together!
Whether the input is Chinese or English,
it points to the right place :)
Execution only leaves this pile of related labels when it reaches a goto, haha :)


(Just for fun, haha...)
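Outside batch, the same "several names, one entry" pattern is usually written as a lookup table. A minimal Python sketch, assuming Chinese words act as aliases for the English keys as the post describes; `ENTRIES`, `ALIASES` and `translate` are invented for the example, and unlike a goto to a missing label (which stops with an error) it returns "not found".

```python
# One stored entry per word; several names (aliases) lead to it, much like
# stacked labels that fall through to the same echo line.
ENTRIES = {
    "china": "china : 中国",
    "usa": "usa : 美国",
    "japan": "japan : 日本",
    "computer": "computer : 计算机",
}
ALIASES = {"中国": "china", "美国": "usa", "日本": "japan", "计算机": "computer"}


def translate(word: str) -> str:
    key = word.strip()
    # Chinese input is mapped to its English key; English input is lower-cased,
    # mirroring the fact that goto label names are case-insensitive.
    key = ALIASES.get(key, key.lower())
    return ENTRIES.get(key, "not found")


print(translate("USA"))   # usa : 美国
print(translate("中国"))  # china : 中国
```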

[ Last edited by redtek on 2006-10-12 at 14:04 ]
Floor 27 Posted 2006-10-13 01:04 · China, Hunan, Loudi, China Telecom
Silver Member
★★★
Credits 1,218
Posts 485
Joined 2006-07-21 21:24
19-year member
UID 58987
From Hunan, Loudi
Everyone's discussion is wonderful. For word retrieval, the way we look things up in a real dictionary is a very good algorithm.

Chinese dictionaries have methods like looking up by pinyin and stroke count, etc.

With an English dictionary, you turn to the pages for the word's first letter and then look through them one by one. This is very fast for a person doing it by hand, and I believe a computer doing the same would also be fast.
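A rough Python sketch of the "turn to the first letter" approach, assuming a word list that is already sorted alphabetically and contains no empty entries. The stored positions are what a batch script could feed to `for /f "skip=N"`; `build_letter_index` and `find_word` are hypothetical names.

```python
from typing import Dict, List, Optional


def build_letter_index(words: List[str]) -> Dict[str, int]:
    """Map each starting letter to the position of the first word that begins with it.

    `words` must already be sorted alphabetically (case-insensitively).
    """
    index: Dict[str, int] = {}
    for pos, word in enumerate(words):
        index.setdefault(word[0].lower(), pos)
    return index


def find_word(words: List[str], index: Dict[str, int], query: str) -> Optional[int]:
    """Jump to the query's first letter, then scan only that letter's section."""
    q = query.lower()
    if not q or q[0] not in index:
        return None
    for pos in range(index[q[0]], len(words)):
        w = words[pos].lower()
        if w[0] != q[0]:
            break  # we have left this letter's section
        if w == q:
            return pos
    return None


WORDS = ["apple", "axe", "ball", "bat", "cat"]
INDEX = build_letter_index(WORDS)
print(INDEX)                           # {'a': 0, 'b': 2, 'c': 4}
print(find_word(WORDS, INDEX, "bat"))  # 3
```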
Recent ratings for this post (1 in total)
Rater | Score | Time
redtek | +1 | 2006-11-23 07:06
Floor 28 Posted 2006-10-13 01:08 · China, Beijing, Chaoyang District, China Unicom
Gold Member
★★★★
Credits 2,902
Posts 1,147
Joined 2006-09-21 12:00
19-year member
UID 63324
Gender Male
And so the DOS edition of the Xinhua Dictionary is born, haha...
Floor 29 Posted 2006-10-13 01:13 · China, Hunan, Loudi, China Telecom
Silver Member
★★★
Credits 1,218
Posts 485
Joined 2006-07-21 21:24
19-year member
UID 58987
From Hunan, Loudi
Hehe~ If every English word is made into a label here, it still won't be very fast, because searching through that many labels also takes time.

Instead, classify the words and skip the preceding data so the search starts from the matching line. Once the positions corresponding to the first few letters have been calculated, a search may only need to scan dozens or hundreds of words to find the result you want.
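Extending that to the first few letters, the start position can be computed with a binary search instead of a stored table. A sketch using Python's standard `bisect` module, assuming an alphabetically sorted, lower-case word list; `find_with_prefix_jump` and the default two-letter prefix are illustrative choices, not something from the thread.

```python
import bisect
from typing import List, Optional


def find_with_prefix_jump(words: List[str], query: str, prefix_len: int = 2) -> Optional[int]:
    """Return the position of `query` in a sorted, lower-case word list, or None.

    bisect_left finds the first word >= the query's prefix, i.e. where words
    starting with that prefix begin; only that run of words is then scanned.
    """
    q = query.lower()
    prefix = q[:prefix_len]
    pos = bisect.bisect_left(words, prefix)
    while pos < len(words) and words[pos].startswith(prefix):
        if words[pos] == q:
            return pos
        pos += 1
    return None


WORDS = ["cab", "cat", "catalog", "cup", "dog", "dos"]  # already sorted

print(find_with_prefix_jump(WORDS, "catalog"))  # 2
print(find_with_prefix_jump(WORDS, "DOS"))      # 5
```

A longer prefix shrinks the scanned run further, at the cost of needing the list to stay sorted.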
Floor 30 Posted 2006-10-13 01:39 · China, Guangdong, Foshan, Guangdong Ruijiang Technology Co., Ltd.
Honorary Moderator
★★★★
batch fan
Credits 5,226
Posts 1,737
Joined 2006-03-10 00:38
20-year member
UID 51697
From Chengdu
I finally found a lexicon format suitable for Chinese-to-English lookup. See the initial code and format example below. Querying 30,000 lines of data still takes only a moment. So far only exact-match queries are implemented, and I haven't had time to test carefully. Anyone with the time and interest can use it to torture their own machine ^_^.

Code:

@echo off
:: Function: look up the English word for a Chinese word in the lexicon
:: Requirement: the lexicon must have the following format:
:: The first line of the file must be empty, or contain text that will never be searched for
:: The first line of each entry block must be the English word alone (letters only)
:: The Chinese translation starts on a new line (no letters-only lines allowed) and may span multiple lines
:: Examples may follow the translation
:: Put a space before and a half-width semicolon after each explanation (e.g. " 中国;"); explanations may share a line or be on separate lines
:: Avoid the space + text + half-width semicolon pattern in example text below the explanations, so examples are not matched by mistake
:: Blank lines may be used to separate entry blocks

:begin
cls
set input=
set line=
set /p input=Please enter the Chinese word to be queried (press Enter directly to exit):
if not defined input exit
for /f "tokens=1* delims=:" %%i in ('findstr /nrc:" %input%;" 词库.txt') do if not "%%j"=="" set /a line=%%i-2&& goto display
echo _________________________________
echo.
echo No record of %input% found
echo _________________________________
echo.
pause
goto begin

:display
echo _________________________________
echo.
echo %input%:
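:: Walk back from the line before the match until a letters-only line (the English headword) is found
:loop
for /f "skip=%line% delims=" %%i in (词库.txt) do (
echo %%i|findstr /i "^[a-z]*$">nul &&(echo %%i&&goto end)||(set /a line=%line%-1&& goto loop)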
)
:end
echo _________________________________
echo.
pause
goto begin

Format example:

(This is the top line of the file; it must not contain any text that will be searched for)
china
n.
 中国; 瓷器;
说明:……
china

DOS
n.
 磁盘操作系统;
DOS


The code's algorithm has been optimized so that the English word lines in the lexicon no longer need to be marked repeatedly, which reduces the size of the lexicon.
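The reverse lookup can be mirrored in Python to check the format rules: search for " meaning;" with the leading space and trailing half-width semicolon, then walk upward until a letters-only line (the English headword) appears. A minimal sketch, assuming the file is already read into a list of lines; `chinese_to_english` is a made-up name. The delimiters are what stop a single character such as 国 from matching inside 中国.

```python
import re
from typing import List, Optional

HEADWORD = re.compile(r"[A-Za-z]+")  # a letters-only line is an English headword


def chinese_to_english(lines: List[str], meaning: str) -> Optional[str]:
    """Return the headword of the first entry that lists `meaning` as " meaning;"."""
    needle = " " + meaning + ";"
    for i, line in enumerate(lines):
        if needle in line:
            # Walk upwards from the line before the match, like the :loop in the batch code
            for j in range(i - 1, -1, -1):
                if HEADWORD.fullmatch(lines[j].strip()):
                    return lines[j].strip()
            return None
    return None


LEXICON = [
    "",
    "china",
    "n.",
    " 中国; 瓷器;",
    "说明:……",
    "china",
    "",
    "DOS",
    "n.",
    " 磁盘操作系统;",
    "DOS",
]

print(chinese_to_english(LEXICON, "瓷器"))  # china
```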

[ Last edited by namejm on 2006-10-13 at 06:31 ]