Title: [Discussion] Question about the efficiency of processing text with batch scripts
Author: genteman
Time: 2008-9-24 09:39
Here is an excerpt of the text I need to process:
<hgsdp:msisdn=8615082687528,loc;
HLR SUBSCRIBER DATA
SUBSCRIBER IDENTITY
MSISDN IMSI STATE AUTHD
8615082687528 460023826698378 CONNECTED AVAILABLE
NAM
0
LOCATION DATA
VLR ADDRESS MSRN MSC NUMBER LMSID
4-8613744818 8613744818
SGSN NUMBER
4-8613741803
END
<hgsdp:msisdn=8613980325588,loc;
HLR SUBSCRIBER DATA
SUBSCRIBER IDENTITY
MSISDN IMSI STATE AUTHD
8613980325588 460000324808588 CONNECTED AVAILABLE
NAM
0
LOCATION DATA
VLR ADDRESS MSRN MSC NUMBER LMSID
4-99292370000 99292370000
SGSN NUMBER
4-8613746991
MS PURGED IN SGSN
END
<hgsdp:msisdn=8615884933063,loc;
HLR SUBSCRIBER DATA
SUBSCRIBER IDENTITY
MSISDN IMSI STATE AUTHD
8615884933063 460028825794364 CONNECTED AVAILABLE
NAM
0
LOCATION DATA
VLR ADDRESS MSRN MSC NUMBER LMSID
4-8613741249 8613741249
MS PURGED IN VLR
SGSN NUMBER
4-8613742297
END
<hgsdp:msisdn=8613608272886,loc;
HLR SUBSCRIBER DATA
SUBSCRIBER IDENTITY
MSISDN IMSI STATE AUTHD
8613608272886 460008480450049 CONNECTED AVAILABLE
NAM
0
LOCATION DATA
VLR ADDRESS MSRN MSC NUMBER LMSID
4-8613744818 8613744818
SGSN NUMBER
UNKNOWN
END
<hgsdp:msisdn=8613882685885,loc;
HLR SUBSCRIBER DATA
SUBSCRIBER IDENTITY
MSISDN IMSI STATE AUTHD
8613882685885 460028825960924 CONNECTED AVAILABLE
NAM
1
LOCATION DATA
VLR ADDRESS MSRN MSC NUMBER LMSID
4-8613740848 8613740848
END
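For every record that contains MS PURGED IN SGSN, MS PURGED IN VLR, or UNKNOWN, or that has no SGSN NUMBER string, I want to pick out the corresponding line that begins with <hgsdp.
My code is as follows: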
CODE:
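::findmsisdn.bat - In fixed-format text, find the first line of each record that contains MS PURGED IN SGSN, MS PURGED IN VLR, or UNKNOWN, or that has no SGSN NUMBER
::genteman - 2009-09-22 -CMD@WinXP Pro
::contact amdaround@163.com
::The error-handling part borrows Will Sort's code; many thanks!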
@echo off & setlocal enabledelayedexpansion
if [%1]==[:error] goto :error
if [%1]==[] %0 :error 0 "Incomplete argument - Usage:%~n0 <filename>"
if not exist "%~1" %0 :error 1 "%~n1 does not exist"
for /f "delims=" %%i in (%~s1) do (
echo "%%i" | find "hgsdp" >nul && set tmpstr=%%i
set str=!str!,%%i
if %%i==END (
echo "!str!" | find "SGSN NUMBER" >nul || echo !tmpstr! >>"%~dp1NO SGSN NUMBER.txt"
echo "!str!" | find "MS PURGED IN SGSN" >nul && echo !tmpstr! >>"%~dp1MS PURGED IN SGSN.txt"
echo "!str!" | find "MS PURGED IN VLR" >nul && echo !tmpstr! >>"%~dp1MS PURGED IN VLR.txt"
echo "!str!" | find "SGSN NUMBER,UNKNOWN" >nul && echo !tmpstr! >>"%~dp1UNKNOWN.txt"
set str=
)
)
goto :EOF
:error - error handling
echo.
echo Error %2: %3
echo.
exit /b %2
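The problem is that it runs extremely slowly: processing a file larger than 10 MB takes at least 24 hours, whereas sed on UNIX would finish the same job in a few seconds.
I think the code above is slow because every loop iteration performs several checks, each of them piping echo into find, but I really cannot think of a better algorithm.
Does anyone have an optimized version?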
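
One possible direction, offered only as a sketch and not as a batch solution: the same record-by-record logic can be run in a single in-process pass, so that no external find is launched for each line. The version below swaps batch for Python 3 (assuming it is available on the machine). The names `classify` and `write_reports` are made up for this illustration, and it assumes, as the batch script does, that each record contains a line with `<hgsdp` and ends with a line that is exactly `END`. It has not been timed against the real 10 MB file.

```python
import os
import sys

# Categories are named after the output files the batch script writes.
CATEGORIES = ("NO SGSN NUMBER", "MS PURGED IN SGSN", "MS PURGED IN VLR", "UNKNOWN")


def classify(lines):
    """Scan the records once and return {category: [matching <hgsdp lines]}."""
    hits = {name: [] for name in CATEGORIES}
    header = None   # most recent line containing "hgsdp"
    record = []     # lines collected for the record being read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if "hgsdp" in line:
            header = line
        record.append(line)
        if line == "END":
            # Join with commas, mirroring "set str=!str!,%%i" in the batch script,
            # so "SGSN NUMBER,UNKNOWN" only matches UNKNOWN as the SGSN value.
            text = ",".join(record)
            if header is not None:
                if "SGSN NUMBER" not in text:
                    hits["NO SGSN NUMBER"].append(header)
                if "MS PURGED IN SGSN" in text:
                    hits["MS PURGED IN SGSN"].append(header)
                if "MS PURGED IN VLR" in text:
                    hits["MS PURGED IN VLR"].append(header)
                if "SGSN NUMBER,UNKNOWN" in text:
                    hits["UNKNOWN"].append(header)
            header, record = None, []
    return hits


def write_reports(path):
    """Write one <category>.txt file next to the input file.

    Unlike the batch script, which appends with >>, this overwrites the files.
    """
    with open(path, encoding="ascii", errors="replace") as source:
        hits = classify(source)
    out_dir = os.path.dirname(os.path.abspath(path))
    for name, headers in hits.items():
        with open(os.path.join(out_dir, name + ".txt"), "w") as out:
            for header in headers:
                out.write(header + "\n")


if __name__ == "__main__" and len(sys.argv) == 2:
    write_reports(sys.argv[1])
```

Saved under a hypothetical name such as findmsisdn.py, it would be run as `python findmsisdn.py <filename>`. If the job has to stay in pure batch, the same idea applies: keep the per-record tests inside cmd itself and avoid starting a new process for every line.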