Title: [Resolved] Extracting text content
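Author: lxh623
Time: 2009-4-11 22:11
Title: [Resolved] Extracting text content
A folder contains many text files, and I want to extract certain content from them in batch. Each file is fairly large, around 10 MB.
Requirements (the numbering only separates the two tasks; it does not imply an order):
First, extract everything from “UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) ” [inclusive] up to the next “PATENT (Number; Kind; Date): ” [exclusive].
Second, extract every line that contains one of the following: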
BASIC-PATENT:
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU)
A partial sample of the text follows. (One file may contain a hundred sections like this, each delimited by “BASIC-PATENT:”.)
BASIC-PATENT:
European Patent Office (EP) 277,004; A1; August 03, 1988
PATENT FAMILY
Number of Patents: 276
TAIWAN (TW)
PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001
TITLE: Pressure-sensitive adhesive composition suitable for use in a transdermal drug delivery system and preparation method therefor
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1994-178558; A; January 07, 1994
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): Taiwan (TW) 19958410044; A; January 19, 1995
INT-CL: A61K9/00 (Section A, Class 61, Sub-class K, Group 9, Sub-group 00)
A61K31/74 (Section A, Class 61, Sub-class K, Group 31, Sub-group 74)
ABST:
A blend of at least three polymers, including a soluble polyvinylpyrrolidone, in combination with a drug provides a pressure-sensitive adhesive composition for a transdermal drug delivery system in which the drug is delivered from the pressure-sensitive adhesive composition and through dermis when the pressure-sensitive adhesive composition is in contact with human skin. Soluble polyvinylpyrrolidone increases the solubility of drug without negatively affecting the adhesivity of the composition or the rate of drug delivery from the pressure-sensitive adhesive composition.
UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999
TITLE: SOLUBILITY PARAMETER BASED DRUG DELIVERY SYSTEM AND METHOD FOR ALTERING DRUG SATURATION CONCENTRATION
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-433754; A; May 04, 1995
United States of America (US) 1991-722342; A1; June 27, 1991
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
United States of America (US) 1991-671709; A2; April 02, 1991
World Intellectual Property Organisation (WO) 1990US9001750; W; March 28, 1990
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1995433754; A; May 04, 1995
INT-CL: A61F13/02 (Section A, Class 61, Sub-class F, Group 13, Sub-group 02)
NAT-CL: 424448; X426449
EURO-CL: A61F13/02M; A61K9/70E; A61L15/18; A61L15/58; A61L15/58M+C08L33/00; A61L15/58M+C08L31/04
DERWENT NUMBER: C1989-106432; C1990-225696; C1991-230072; C1991-310376; C1993-036110; C1994-109332; C1995-044946; C1997-558092
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X; 116(10)091389M; 118(16)154566F; 120(26)331144F; 128(15)184708C
ABST:
The method of adjusting the saturation concentration of a drug in a transdermal composition for application to the dermis, which comprises mixing polymers having differing solubility parameters, so as to modulate the delivery of the drug. This results in the ability to achieve a predetermined permeation rate of the drug into and through the dermis. In one embodiment, a dermal composition of the present invention comprises a drug, an acrylate polymer, and a polysiloxane. The dermal compositions can be produced by a variety of methods known in the preparation of drug-containing adhesive preparations, including the mixing of the polymers, drug, and additional ingredients in solution, followed by removal of the processing solvents. The method and composition of this invention permit selectable loading of the drug into the dermal formulation and adjustment of the delivery rate of the drug from the composition through the dermis, while maintaining acceptable shear, tack, and peel adhesive properties.
PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994
TITLE: METHOD AND DEVICE FOR THE RELEASE OF DRUGS TO THE SKIN
INVENTOR: SABLOTSKY STEVEN, United States of America (US); GENTILE JOSEPH A, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1991671709; A; April 02, 1991
INT-CL: A61K31/74 (Section A, Class 61, Sub-class K, Group 31, Sub-group 74)
NAT-CL: 424 7818; X424485; X424484; X424448
DERWENT NUMBER: C89-106432; C90-225696; C91-230072
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X
ABST:
A method of increasing the adhesiveness of a shaped pressure sensitive adhesive, comprising adding an adhesiveness and drug release increasing amount of a clay to said adhesive prior to casting of the adhesive. A dermal composition comprising a drug, a pressure sensitive adhesive, an adhesiveness increasing amount of a clay and a solvent. A dermal composition comprising a drug, a multipolymer of ethylene vinyl acetate, an acrylic polymer, a natural or synthetic rubber and a clay, along with optional ingredients known for use in transdermal drug delivery systems.
WORLD INTELLECTUAL PROPERTY ORGANISATION (WO)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO) 9,640,086; A3; February 13, 1997
TITLE: COMPOSITIONS AND METHODS FOR TOPICAL ADMINISTRATION OF PHARMACEUTICALLY ACTIVE AGENTS
INVENTOR: KANIOS DAVID P, United States of America (US); GENTILE JOSEPH A, United States of America (US); MANTELLE JUAN A, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-477361; A; June 07, 1995
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US); KANIOS DAVID P, United States of America (US); GENTILE JOSEPH A, United States of America (US); MANTELLE JUAN A, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
APPLICATION (Number; Kind; Date): World Intellectual Property Organisation (WO) 199608294; A; June 05, 1996
INT-CL: A61K9/70 (Section A, Class 61, Sub-class K, Group 9, Sub-group 70)
EURO-CL: A61K9/00M18D; A61K9/70E
DESIGNATED COUNTRIES: Albania (AL); Armenia (AM); Austria (AT); Australia (AU); Azerbaijan (AZ); Barbados (BB); Bulgaria (BG); Brazil (BR); Belarus (BY); Canada (CA); Switzerland (CH); People's Republic of China (CN); Czech Republic (CZ); Germany (DE); Denmark (DK); Estonia (EE); Spain (ES); Finland (FI); United Kingdom (GB); Georgia (GE); Hungary (HU); Israel (IL); Iceland (IS); Japan (JP); Kenya (KE); Kyrgyzstan (KG); Democratic Peoples Rep. of Korea (KP); Republic of Korea (KR); Kazakhstan (KZ); Sri Lanka (LK); Liberia (LR); Lesotho (LS); Lithuania (LT); Luxembourg (LU); Latvia (LV); Moldova, Republic of (MD); Madagascar (MG); former Yugoslav Republic of Macedonia (MK); Mongolia (MN); Malawi (MW); Mexico (MX); Norway (NO); New Zealand (NZ); Poland (PL); Portugal (PT); Romania (RO); Russian Federation (RU); Sudan (SD); Sweden (SE); Singapore (SG); Slovenia (SI); Slovakia (SK); Tajikistan (TJ); Turkmenistan (TM); Turkey (TR); Trinidad and Tobago (TT); Ukraine (UA); Uganda (UG); United States of America (US); Uzbekistan (UZ); Vietnam (VN); Armenia (AM); Azerbaijan (AZ); Belarus (BY); Kyrgyzstan (KG); Kazakhstan (KZ); Moldova, Republic of (MD); Russian Federation (RU); Tajikistan (TJ); Turkmenistan (TM)
DESIGNATED STATES REGISTERED PATENT: Kenya (KE); Lesotho (LS); Malawi (MW); Sudan (SD); Swaziland (SZ); Uganda (UG); Austria (AT); Belgium (BE); Switzerland (CH); Germany (DE); Denmark (DK); Spain (ES); Finland (FI); France (FR); United Kingdom (GB); Greece (GR); Ireland (IE); Italy (IT); Luxembourg (LU); Monaco (MC); Netherlands (NL); Portugal (PT); Sweden (SE); Burkina Faso (BF); Benin (BJ); Central African Empire (CF); Congo (CG); Ivory Coast (CI); Cameroon (CM); Gabon (GA)
LANGUAGE: English
DERWENT NUMBER: C1997-051830
CHEMICAL ABSTRACT NUMBER: 126(09)122452Q; 128(15)184708C
FILING DETAILS: WO 300000; Without international search report and to be republished upon receipt of that report;
ABST:
Compositions for topical application comprising a therapeutically effective amount of a pharmaceutical agent(s), a pharmaceutically acceptable bioadhesive carrier, a solvent for the pharmaceutical agent(s) in the carrier and a clay, and methods of administering the pharmaceutical agents to a mammal are disclosed.
Thank you!
Attachments
Short version:
http://www.namipan.com/d/1.txt/f ... 5e01d72bf7d77bb0000
Full version:
http://www.namipan.com/d/sour.tx ... c2dc733dcb12471ce00
[ Last edited by lxh623 on 2009-4-13 at 23:30 ]
Author: freeants001
Time: 2009-4-11 22:33
FINDSTR [/B] [/E] [/L] [/R] [/S] [/I] [/X] [/V] [/N] [/M] [/O] [/F:file]
[/C:string] [/G:file] [/D:dir list] [/A:color attributes] [/OFF[LINE]]
strings [[drive:][path]filename[ ...]]
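/B Matches pattern if at the beginning of a line.
/E Matches pattern if at the end of a line.
/L Uses search strings literally.
/R Uses search strings as regular expressions.
/S Searches for matching files in the current directory and all
subdirectories.
/I Specifies that the search is not to be case-sensitive.
/X Prints lines that match exactly.
/V Prints only lines that do not contain a match.
/N Prints the line number before each line that matches.
/M Prints only the filename if a file contains a match.
/O Prints character offset before each matching line.
/P Skips files with non-printable characters.
/OFF[LINE] Does not skip files with the offline attribute set.
/A:attr Specifies color attribute with two hex digits. See "color /?"
/F:file Reads file list from the specified file (/ stands for console).
/C:string Uses specified string as a literal search string.
/G:file Gets search strings from the specified file (/ stands for console).
/D:dir Searches a semicolon-delimited list of directories.
strings Text to be searched for.
[drive:][path]filename
Specifies a file or files to search.
Use spaces to separate multiple search strings unless the argument is prefixed with /C.
For example, 'FINDSTR "hello there" x.y' searches for "hello" or
"there" in file x.y. 'FINDSTR /C:"hello there" x.y' searches for
"hello there" in file x.y.
Regular expression quick reference:
. Wildcard: any character
* Repeat: zero or more occurrences of previous character or class
^ Line position: beginning of line
$ Line position: end of line
[class] Character class: any character in set
[^class] Inverse class: any character not in set
[x-y] Range: any characters within the specified range
\x Escape: literal use of metacharacter x
\<xyz Word position: beginning of word
xyz\> Word position: end of word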
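The difference between space-separated search strings and a /C: literal, as described in the help text above, can be illustrated with a small JavaScript sketch. The function names are made up for illustration, and the code only mimics these two behaviours with plain substring tests; it is not a reimplementation of findstr.

```javascript
// Hypothetical helpers that mimic two findstr behaviours with substring tests.

// Like: FINDSTR "hello there" x.y
// A line matches if it contains ANY of the space-separated words.
function matchesAnyWord(line, searchStrings) {
  return searchStrings
    .split(" ")
    .filter((word) => word !== "")
    .some((word) => line.includes(word));
}

// Like: FINDSTR /C:"hello there" x.y
// A line matches only if it contains the whole phrase.
function matchesPhrase(line, phrase) {
  return line.includes(phrase);
}

const sampleLines = ["hello world", "over there", "hello there", "nothing"];
const anyWordHits = sampleLines.filter((l) => matchesAnyWord(l, "hello there"));
const phraseHits = sampleLines.filter((l) => matchesPhrase(l, "hello there"));
console.log(anyWordHits); // the three lines that contain "hello" or "there"
console.log(phraseHits);  // only the line that contains "hello there"
```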
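Author: yishanju
Time: 2009-4-12 00:47
It would be best to upload some sample text.
This is dizzying to read; I can't tell what you are trying to do,
or what output format you want.
[ Last edited by yishanju on 2009-4-12 at 00:52 ]
Author: yishanju
Time: 2009-4-12 01:04
I recommend a regex find-and-replace tool such as FR for this.
It makes it easy to filter out the unwanted noise and keep only the content you need.
Author: netbenton
Time: 2009-4-12 01:53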
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set bg=PATENT (Number; Kind; Date): United States of America (US)
set en=PATENT (Number; Kind; Date):
set li1=PATENT (Number; Kind; Date):
set li2=BASIC-PATENT:
set "ver="
(for /f "delims=" %%a in (sour.txt) do (set "str=%%a"&call :sub %%a))>dest.txt
start dest.txt
pause
goto :eof
:sub
if defined ver (echo.!str!
if not "!str:%en%=!"=="!str!" set ver=
goto :eof)
if not "!str:%bg%=!"=="!str!" (set ver=y&echo !ho!&echo.!str!&goto :eof)
if not "!str:%li1%=!"=="!str!" echo !str!
if not "!str:%li2%=!"=="!str!" echo !str!
goto :eof
Author: freeants001
Time: 2009-4-12 04:25
Copy the code below and save it as jsConvert.js.
Assume the file to convert is a.txt.
At the command line, enter: cscript /nologo jsConvert.js a.txt
The converted file is a.txt_转换后.txt
File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
var re=/\r\nUNITED STATES OF AMERICA \(US\)[\s]*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?\r\nPATENT \(Number; Kind; Date\)\:.*|\r\nPATENT \(Number; Kind; Date\)\:.*/g
while ((arr=re.exec(sss))!=null)osss=osss+arr+"\r\n";
fl.write(osss);
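Author: lxh623
Time: 2009-4-12 10:25
Quote: |
Originally posted by yishanju at 2009-4-12 00:47:
It would be best to upload some sample text.
This is dizzying to read; I can't tell what you are trying to do,
or what output format you want.
[ Last edited by yishanju on 2009-4-12 at 00:52 ] |
|
New members aren't allowed to upload attachments yet; I'll upload one shortly.
Author: lxh623
Time: 2009-4-12 10:30
Quote: |
Originally posted by netbenton at 2009-4-12 01:53:
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set bg=PATENT (Number; Kind; Date): United States of America (US)
set en=PATENT (Number; Kind; Date):
set li ... |
|
Thank you!
1. I tried it; around section 59 the error “> was unexpected at this time.” shows up.
2. Also, could the PATENT (Number; Kind; Date): lines be limited to the types listed below? Otherwise lines for countries such as Argentina have to be deleted afterwards. If that is how it has to be, that's fine too.
3. How can the batch file process an entire folder automatically and edit the original files?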
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU)
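[ Last edited by lxh623 on 2009-4-12 at 10:51 ]
Author: lxh623
Time: 2009-4-12 10:32
Quote: |
Originally posted by freeants001 at 2009-4-12 04:25:
Copy the code below and save it as jsConvert.js.
Assume the file to convert is a.txt.
At the command line, enter: cscript /nologo jsConvert.js a.txt
The converted file is a.txt_转换后.txt
[code]File_Path=WScript.argum ... |
|
Thank you!
I'm a bit slow; I tried it but still couldn't get it to work. Also, is there a way to process a whole folder?
Author: freeants001
Time: 2009-4-12 10:40
Quote: |
1. I tried it; around section 59 the error “> was unexpected at this time.” shows up. |
|
Please give more detail.
Also, for lines beginning with "PATENT (Number; Kind; Date): ", should only the following countries be kept?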
Quote: |
PATENT (Number; Kind; Date): European Patent Office (EP)
PATENT (Number; Kind; Date): United States of America (US)
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO)
PATENT (Number; Kind; Date): Canada (CA)
PATENT (Number; Kind; Date): People's Republic of China (CN)
PATENT (Number; Kind; Date): Japan (JP)
PATENT (Number; Kind; Date): Republic of Korea (KR)
PATENT (Number; Kind; Date): United Kingdom (GB)
PATENT (Number; Kind; Date): Germany (DE)
PATENT (Number; Kind; Date): France (FR)
PATENT (Number; Kind; Date): Russian Federation (RU) |
|
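Author: lxh623
Time: 2009-4-12 10:47
Quote: |
Originally posted by freeants001 at 2009-4-12 10:40:
Please give more detail.
Also, for lines beginning with "PATENT (Number; Kind; Date): ", should only the following countries be kept?
|
|
Sorry, I quoted the wrong post in post #8: the error comes from the batch file, not the script.
As for the script: after I saved it as Unicode it runs, but the output is rather jumbled. Ideally the original order should be preserved.
It is very fast, and “PATENT (Number; Kind; Date): United States of America (US)”
appears 1186 times, the same as in the source.
However, “BASIC-PATENT:” occurs 100 times in the source, and only 6 came out.
Problems with the batch file:
First, a correction: the message is “< was unexpected at this time.”
Second, yes: keep only those ten or so line types.
I'll upload the file shortly.
[ Last edited by lxh623 on 2009-4-12 at 11:20 ]
Author: netbenton
Time: 2009-4-12 11:28
The part you posted passed my testing. To make sure the whole file works, please upload it first.
Author: lxh623
Time: 2009-4-12 11:37
Quote: |
Originally posted by netbenton at 2009-4-12 11:28:
The part you posted passed my testing. To make sure the whole file works, please upload it first. |
|
The attachments are uploaded; see the bottom of the first post. Thank you both!
Thanks, netbenton. In my final test the message still appeared, but it worked well and produced the result.
It took ten minutes.
1. Is it possible to process multiple files?
2. Can the output be limited to those ten or so countries? Or could I trouble you for another batch file that deletes unwanted lines from every txt file in a folder? For example, a text file a would list the unwanted lines: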
PATENT (Number; Kind; Date): Austria (AT)
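PATENT (Number; Kind; Date): Argentina (AR), and so on.
[ Last edited by lxh623 on 2009-4-12 at 12:00 ]
Author: netbenton
Time: 2009-4-12 12:18
I've revised it. I can't download your attachment, so please test it yourself.
The data you posted passes the test, so it should work this time.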
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set bg=PATENT (Number; Kind; Date): United States of America (US)
set en=PATENT (Number; Kind; Date):
set li1=PATENT (Number; Kind; Date):
set li10=European Patent Office (EP)
set li11=United States of America (US)
set li12=World Intellectual Property Organisation (WO)
set li13=Canada (CA)
set li14=People's Republic of China (CN)
set li15=Japan (JP)
set li16=Republic of Korea (KR)
set li17=United Kingdom (GB)
set li18=Germany (DE)
set li19=France (FR)
set li21=Russian Federation (RU)
::Country matching compares only the first 10 characters, which should be enough.
set li2=BASIC-PATENT:
set "ver="
(for /f "delims=" %%a in (sour.txt) do (set "str=%%a"&call :sub))>dest.txt
start dest.txt
pause
goto :eof
:sub
if defined ver (echo.!str!
if not "!str:%en%=!"=="!str!" set ver=
goto :eof)
if not "!str:%bg%=!"=="!str!" (set ver=y&echo !ho!&echo.!str!&goto :eof)
if not "!str:%li1%=!"=="!str!" (
set "coc=!str:*%li1% =!"
for /l %%a in (10,1,21) do (if "!li%%a:~0,10!"=="!coc:~0,10!" echo !str!&goto :eof)
)
if not "!str:%li2%=!"=="!str!" echo !str!
goto :eof
Author: netbenton
Time: 2009-4-12 12:39
I changed it a bit: it is slightly more efficient and can now process a whole directory.
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set bg=United States of America (US)
set en=PATENT (Number; Kind; Date):
set li2=BASIC-PATENT:
set li10=European Patent Office (EP)
set li11=Russian Federation (RU)
set li12=World Intellectual Property Organisation (WO)
set li13=Canada (CA)
set li14=People's Republic of China (CN)
set li15=Japan (JP)
set li16=Republic of Korea (KR)
set li17=United Kingdom (GB)
set li18=Germany (DE)
set li19=France (FR)
::Country matching compares only the first 10 characters, which should be enough.
for /f %%a in ('dir /b *.txt') do (
set "ver="
(for /f "delims=" %%d in (%%a) do (set "str=%%d"&call :sub))>%%~na_dest.txt
start %%~na_dest.txt
)
echo Processing complete
pause
goto :eof
:sub
if defined ver (echo.!str!
if not "!str:%en%=!"=="!str!" set ver=
goto :eof)
if not "!str:%li2%=!"=="!str!" echo !str!&goto :eof
if "!str:%en%=!"=="!str!" (goto :eof) else (
set "coc=!str:*%en% =!"
if "!bg:~0,10!"=="!coc:~0,10!" (set ver=y&echo !ho!&echo.!str!&goto :eof)
for /l %%a in (10,1,19) do (if "!li%%a:~0,10!"=="!coc:~0,10!" echo !str!&goto :eof)
)
goto :eof
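[ Last edited by netbenton on 2009-4-12 at 10:51 ]
Author: lxh623
Time: 2009-4-12 12:48
Quote: |
Originally posted by netbenton at 2009-4-12 12:18:
I've revised it. I can't download your attachment, so please test it yourself.
The data you posted passes the test, so it should work this time.
[code]@echo off&setlocal enabledelayedexpansio ... |
|
Sorry to trouble you; perhaps I didn't explain clearly.
There is one problem:
UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US)
The start of the block must match these two lines exactly, as one unit rather than as two separate pieces.
Where there is no “UNITED STATES OF AMERICA (US)” heading and only a “PATENT (Number; Kind; Date): United States of America (US)” line, the abstract that follows is not needed. Can that be done?
[ Last edited by lxh623 on 2009-4-12 at 12:50 ]
Author: netbenton
Time: 2009-4-12 12:55
I don't follow. If it is only a small change, please make it yourself.
I have to log off now...
[ Last edited by netbenton on 2009-4-12 at 10:58 ]
Author: lxh623
Time: 2009-4-12 22:21
In EmEditor, I want to use UNITED STATES OF AMERICA (US)/n/nPATENT (Number; Kind; Date): United States of America (US)
as the start string in a regular expression. How do I do that? Please help!
Author: yishanju
Time: 2009-4-12 22:29
You need a multi-line regular expression; on Windows a line break is \r\n.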
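As a sketch of the multi-line match suggested here, the following JavaScript builds a start pattern that tolerates one or more line breaks between the heading and the PATENT line. It is modern JavaScript (for example under Node.js), not EmEditor's own regex dialect, whose syntax may differ. The helper names escapeRegExp and countStarts are made up for illustration.

```javascript
// Escape regex metacharacters so literal text such as "(US)" matches itself.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const heading = "UNITED STATES OF AMERICA (US)";
const usPatent = "PATENT (Number; Kind; Date): United States of America (US)";

// (?:\r?\n)+ accepts one or more line breaks, written either as \r\n or as \n,
// so a blank line between the heading and the PATENT line is allowed.
const startRe = new RegExp(
  escapeRegExp(heading) + "(?:\\r?\\n)+" + escapeRegExp(usPatent),
  "g"
);

// Count how many times the two-line start marker occurs in a text.
function countStarts(text) {
  const hits = text.match(startRe);
  return hits ? hits.length : 0;
}
```

A US PATENT line that is not preceded by the heading does not count as a start, which is the distinction asked for earlier in the thread.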
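Author: lxh623
Time: 2009-4-12 22:52
ho=UNITED STATES OF AMERICA (US)\r\n\r\nPATENT (Number; Kind; Date): United States of America (US)
I replaced the first line of the code with this, but nothing changed: I expect “TITLE” to appear 100 times, and it appears 159 times.
Author: freeants001
Time: 2009-4-12 23:00
This is easy to do with regular expressions; I just don't know exactly which content you want to keep. I'm thoroughly confused.
Author: yishanju
Time: 2009-4-12 23:05
Getting this across in text alone online is hard work.
Haha
Author: yishanju
Time: 2009-4-12 23:12
Out of curiosity, how did the original poster learn regular expressions?
I learned them while studying Python.
Author: freeants001
Time: 2009-4-12 23:20
Out of curiosity, how did the original poster learn regular expressions?
I learned them while studying Python.
The original poster probably doesn't know regular expressions; otherwise there would be no need to ask for help.

Author: lxh623
Time: 2009-4-12 23:47
Quote: |
Originally posted by freeants001 at 2009-4-12 23:00:
This is easy to do with regular expressions; I just don't know exactly which content you want to keep. I'm thoroughly confused. |
|
The part in blue (from “UNITED STATES OF AMERICA (US)\r\n\r\nPATENT (Number; Kind; Date): United States of America (US)” up to the next “PATENT (Number; Kind; Date): ”), plus every line listed in requirement 2.
I know a little about regular expressions because I needed to build filters for the reference manager Biblioscape. I only half understand them, and batch files are a different matter again. I'm a chemist and a layman here; my respects to all of you.
[ Last edited by lxh623 on 2009-4-12 at 23:50 ]
Author: yishanju
Time: 2009-4-13 00:07
Batch files do not support regular expressions...
You need a third-party command-line tool such as the FR I mentioned.
findstr supports regular expressions only partially.
Author: yishanju
Time: 2009-4-13 00:13
Should the original order of the content be preserved after processing?
Author: freeants001
Time: 2009-4-13 00:13
Quote: |
Originally posted by yishanju at 2009-4-13 00:07:
Batch files do not support regular expressions...
You need a third-party command-line tool such as the FR I mentioned.
findstr supports regular expressions only partially. |
|
VBS and JS have them, and both ship with the system.
Author: freeants001
Time: 2009-4-13 00:21
Not sure whether this meets the requirements: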
File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
var re=/\r\nUNITED STATES OF AMERICA \(US\)\s*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?\r\nPATENT \(Number; Kind; Date\)\:.*|PATENT \(Number; Kind; Date\): European Patent Office \(EP\).*|PATENT \(Number; Kind; Date\): United States of America \(US\).*|PATENT \(Number; Kind; Date\): World Intellectual Property Organisation \(WO\).*|PATENT \(Number; Kind; Date\): Canada \(CA\).*|PATENT \(Number; Kind; Date\): People's Republic of China \(CN\).*|PATENT \(Number; Kind; Date\): Japan \(JP\).*|PATENT \(Number; Kind; Date\): Republic of Korea \(KR\).*|PATENT \(Number; Kind; Date\): United Kingdom \(GB\).*|PATENT \(Number; Kind; Date\): Germany \(DE\).*|PATENT \(Number; Kind; Date\): France \(FR\).*|PATENT \(Number; Kind; Date\): Russian Federation \(RU\).*/g
while ((arr=re.exec(sss))!=null)osss=osss+arr+"\r\n";
fl.write(osss);
WScript.echo("ok")
Author: lxh623
Time: 2009-4-13 00:32
Quote: |
Originally posted by freeants001 at 2009-4-13 00:21:
Not sure whether this meets the requirements:
[code]File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
var fl=fso.op ... |
|
Thank you!
How do I run it? It's not a bat file? Still JS?
Author: netbenton
Time: 2009-4-13 00:53
Quote: |
Originally posted by lxh623 at 2009-4-12 10:48:
Sorry to trouble you; perhaps I didn't explain clearly.
There is one problem:
UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US)
The start of the block must match these ... |
|
This is the result of processing the data you posted. Please point out where there is still a problem.
- BASIC-PATENT:
- UNITED STATES OF AMERICA (US)
- PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999
- TITLE: SOLUBILITY PARAMETER BASED DRUG DELIVERY SYSTEM AND METHOD FOR ALTERING DRUG SATURATION CONCENTRATION
- INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
- PRIORITY (Number; Kind; Date):
- United States of America (US) 1995-433754; A; May 04, 1995
- United States of America (US) 1991-722342; A1; June 27, 1991
- United States of America (US) 1989-295847; A2; January 11, 1989
- United States of America (US) 1988-164482; A2; March 04, 1988
- United States of America (US) 1991-671709; A2; April 02, 1991
- World Intellectual Property Organisation (WO) 1990US9001750; W; March 28, 1990
- PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
- APPLICATION (Number; Kind; Date): United States of America (US) 1995433754; A; May 04, 1995
- INT-CL: A61F13/02 (Section A, Class 61, Sub-class F, Group 13, Sub-group 02)
- NAT-CL: 424448; X426449
- EURO-CL: A61F13/02M; A61K9/70E; A61L15/18; A61L15/58; A61L15/58M+C08L33/00; A61L15/58M+C08L31/04
- DERWENT NUMBER: C1989-106432; C1990-225696; C1991-230072; C1991-310376; C1993-036110; C1994-109332; C1995-044946; C1997-558092
- CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X; 116(10)091389M; 118(16)154566F; 120(26)331144F; 128(15)184708C
- ABST:
- The method of adjusting the saturation concentration of a drug in a transdermal composition for application to the dermis, which comprises mixing polymers having differing solubility parameters, so as to modulate the delivery of the drug. This results in the ability to achieve a predetermined permeation rate of the drug into and through the dermis. In one embodiment, a dermal composition of the present invention comprises a drug, an acrylate polymer, and a polysiloxane. The dermal compositions can be produced by a variety of methods known in the preparation of drug-containing adhesive preparations, including the mixing of the polymers, drug, and additional ingredients in solution, followed by removal of the processing solvents. The method and composition of this invention permit selectable loading of the drug into the dermal formulation and adjustment of the delivery rate of the drug from the composition through the dermis, while maintaining acceptable shear, tack, and peel adhesive properties.
- PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994
- PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO) 9,640,086; A3; February 13, 1997
Author: freeants001
Time: 2009-4-13 00:57
Copy the code and save it as a .js file, then drag the file to be processed onto the icon of that JS file.
File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
var re=/\r\nUNITED STATES OF AMERICA \(US\)\s*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?\r\nPATENT \(Number; Kind; Date\)\:.*|PATENT \(Number; Kind; Date\): European Patent Office \(EP\).*|PATENT \(Number; Kind; Date\): United States of America \(US\).*|PATENT \(Number; Kind; Date\): World Intellectual Property Organisation \(WO\).*|PATENT \(Number; Kind; Date\): Canada \(CA\).*|PATENT \(Number; Kind; Date\): People's Republic of China \(CN\).*|PATENT \(Number; Kind; Date\): Japan \(JP\).*|PATENT \(Number; Kind; Date\): Republic of Korea \(KR\).*|PATENT \(Number; Kind; Date\): United Kingdom \(GB\).*|PATENT \(Number; Kind; Date\): Germany \(DE\).*|PATENT \(Number; Kind; Date\): France \(FR\).*|PATENT \(Number; Kind; Date\): Russian Federation \(RU\).*/g
while ((arr=re.exec(sss))!=null)osss=osss+arr+"\r\n";
fl.write(osss);
WScript.echo("ok")
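Author: lxh623
Time: 2009-4-13 02:51
Quote: |
Originally posted by netbenton at 2009-4-13 00:53:
This is the result of processing the data you posted. Please point out where there is still a problem.
- BASIC-PATENT:
- UNITED STATES OF AMERICA (US)
- PATENT (Number; Kind; Date): United States of Am ...
|
|
Within one BASIC-PATENT section there may be several “PATENT (Number; Kind; Date): United States of Am ” lines, but only the first one is preceded by “UNITED STATES OF AMERICA (US)”.
I've been told that batch files don't support regular expressions, so I don't know how to solve this.
The sample I posted was heavily abridged to save space.
Thank you again!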
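The rule described in this post does not strictly need a regular expression; it can be sketched as a line-by-line state machine. The JavaScript below is only an illustration under some assumptions: it is modern JavaScript (for example under Node.js) rather than batch or WSH JScript, the function name extractLines is made up, blank lines are skipped, and lines are trimmed before comparison. Capture starts only when the heading line is directly followed by a US PATENT line, and it ends at the next PATENT line, which is itself kept only if its country is on the list from the first post.

```javascript
const HEADING = "UNITED STATES OF AMERICA (US)";
const PATENT = "PATENT (Number; Kind; Date): ";
const BASIC = "BASIC-PATENT:";
const US = "United States of America (US)";
// Countries whose PATENT lines are kept (the list from the first post).
const KEEP = [
  "European Patent Office (EP)", US,
  "World Intellectual Property Organisation (WO)", "Canada (CA)",
  "People's Republic of China (CN)", "Japan (JP)", "Republic of Korea (KR)",
  "United Kingdom (GB)", "Germany (DE)", "France (FR)", "Russian Federation (RU)",
];

// Hypothetical helper: returns the lines to keep, in their original order.
function extractLines(text) {
  const out = [];
  let sawHeading = false; // previous non-blank line was the US heading
  let capturing = false;  // currently inside a captured US block
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.trim();
    if (line === "") continue;
    const isPatent = line.startsWith(PATENT);
    if (capturing) {
      if (!isPatent) { out.push(line); continue; }
      capturing = false; // the next PATENT line ends the block; evaluate it below
    }
    if (line.startsWith(BASIC)) { out.push(line); sawHeading = false; continue; }
    if (line === HEADING) { sawHeading = true; continue; }
    if (isPatent) {
      const country = line.slice(PATENT.length);
      if (sawHeading && country.startsWith(US)) {
        out.push(HEADING, line); // start of the block: keep heading and PATENT line
        capturing = true;
      } else if (KEEP.some((c) => country.startsWith(c))) {
        out.push(line);          // requirement 2: keep listed PATENT lines only
      }
    }
    sawHeading = false;
  }
  return out;
}
```

Because the output is built in a single pass, the original order of the lines is preserved.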
Author: lxh623
Time: 2009-4-13 02:56
Quote: |
Originally posted by freeants001 at 2009-4-13 00:57:
Copy the code and save it as a .js file, then drag the file to be processed onto the icon of that JS file.
[code]File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject(&quo ... |
|
Very fast! “TITLE” and “UNITED STATES OF AMERICA (US)\r\n\r\nPATENT (Number; Kind; Date): United States of America (US)” each appear 100 times.
Unfortunately, “basic patent” appears only fifty-odd times, and the order is still not perfect: the output starts with “PATENT (Number; Kind; Date): ”.
[ Last edited by lxh623 on 2009-4-13 at 02:57 ]
Author: freeants001
Time: 2009-4-13 03:10
Do you want the following line as well?
-------------------------------------------------
BASIC-PATENT:
Author: freeants001
Time: 2009-4-13 03:22
Is this the result you want from the sample in the first post?
PATENT (Number; Kind; Date): Taiwan (TW) 464,511; B; November 21, 2001 //drop this line
BASIC-PATENT:
UNITED STATES OF AMERICA (US)
PATENT (Number; Kind; Date): United States of America (US) 5,958,446; A; September 28, 1999
TITLE: SOLUBILITY PARAMETER BASED DRUG DELIVERY SYSTEM AND METHOD FOR ALTERING DRUG SATURATION CONCENTRATION
INVENTOR: MIRANDA JESUS, United States of America (US); SABLOTSKY STEVEN, United States of America (US)
PRIORITY (Number; Kind; Date):
United States of America (US) 1995-433754; A; May 04, 1995
United States of America (US) 1991-722342; A1; June 27, 1991
United States of America (US) 1989-295847; A2; January 11, 1989
United States of America (US) 1988-164482; A2; March 04, 1988
United States of America (US) 1991-671709; A2; April 02, 1991
World Intellectual Property Organisation (WO) 1990US9001750; W; March 28, 1990
PATENT ASSIGNEE: NOVEN PHARMA, United States of America (US)
APPLICATION (Number; Kind; Date): United States of America (US) 1995433754; A; May 04, 1995
INT-CL: A61F13/02 (Section A, Class 61, Sub-class F, Group 13, Sub-group 02)
NAT-CL: 424448; X426449
EURO-CL: A61F13/02M; A61K9/70E; A61L15/18; A61L15/58; A61L15/58M+C08L33/00; A61L15/58M+C08L31/04
DERWENT NUMBER: C1989-106432; C1990-225696; C1991-230072; C1991-310376; C1993-036110; C1994-109332; C1995-044946; C1997-558092
CHEMICAL ABSTRACT NUMBER: 111(10)084137W; 114(04)030158X; 116(10)091389M; 118(16)154566F; 120(26)331144F; 128(15)184708C
ABST:
The method of adjusting the saturation concentration of a drug in a transdermal composition for application to the dermis, which comprises mixing polymers having differing solubility parameters, so as to modulate the delivery of the drug. This results in the ability to achieve a predetermined permeation rate of the drug into and through the dermis. In one embodiment, a dermal composition of the present invention comprises a drug, an acrylate polymer, and a polysiloxane. The dermal compositions can be produced by a variety of methods known in the preparation of drug-containing adhesive preparations, including the mixing of the polymers, drug, and additional ingredients in solution, followed by removal of the processing solvents. The method and composition of this invention permit selectable loading of the drug into the dermal formulation and adjustment of the delivery rate of the drug from the composition through the dermis, while maintaining acceptable shear, tack, and peel adhesive properties.
PATENT (Number; Kind; Date): United States of America (US) 5,300,291; A; April 05, 1994
PATENT (Number; Kind; Date): World Intellectual Property Organisation (WO) 9,640,086; A3; February 13, 1997
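Author: lxh623
Time: 2009-4-13 03:22
Quote: |
Originally posted by freeants001 at 2009-4-13 03:10:
Do you want the following line as well?
-------------------------------------------------
BASIC-PATENT: |
|
Yes!
The Taiwan line is not needed. Everything else is what I want; blank lines can be removed.
JS can't operate on a whole folder, can it?
Thanks!
[ Last edited by lxh623 on 2009-4-13 at 03:36 ]
Author: freeants001
Time: 2009-4-13 03:24
Quote: |
Originally posted by lxh623 at 2009-4-13 03:22:
Yes!
The Taiwan line is not needed.
JS can't operate on a whole folder, can it?
Thanks!
[ Last edited by lxh623 on 2009-4-13 at 03:23 ] |
|
It can process a folder, including its subfolders.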
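A minimal sketch of folder-level processing, under stated assumptions: it uses Node.js and its fs and path modules rather than the WSH FileSystemObject used elsewhere in this thread, the function name processFolder and the output suffix are made up, and files are read as latin1 on the assumption that they use an ASCII-compatible single-byte encoding. The transform argument stands for whatever extraction function is applied to each file's text.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: apply `transform` (text -> text) to every .txt file
// under `dir`, including subfolders, and write the result next to each file.
function processFolder(dir, transform, suffix = "_converted.txt") {
  const written = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) {
      written.push(...processFolder(full, transform, suffix));
    } else if (
      entry.isFile() &&
      entry.name.toLowerCase().endsWith(".txt") &&
      !entry.name.endsWith(suffix) // skip outputs from an earlier run
    ) {
      // latin1 maps bytes to characters one-to-one, so ASCII patterns still match.
      const text = fs.readFileSync(full, "latin1");
      const outPath = full + suffix;
      fs.writeFileSync(outPath, transform(text), "latin1");
      written.push(outPath);
    }
  }
  return written;
}
```

The original files are left untouched; each result goes to a new file, so a run can be repeated safely.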
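Author: freeants001
Time: 2009-4-13 03:29
I found that in the attachment the following line has a leading space, but the sample you posted does not:
BASIC-PATENT:
Author: lxh623
Time: 2009-4-13 03:36
Quote: |
Originally posted by freeants001 at 2009-4-13 03:29:
I found that in the attachment the following line has a leading space, but the sample you posted does not:
BASIC-PATENT: |
|
Sorry, I didn't check carefully.
Author: freeants001
Time: 2009-4-13 03:49
The code below should meet the requirements now, right?
To batch-process a folder, combine it with the for command yourself.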
File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new ActiveXObject("scripting.filesystemobject");
var fl=fso.opentextfile(File_Path,1);sss=fl.readall();
fl=fso.opentextfile(File_Path+"_转换后.txt",2,true);
var re=/(?:^|\r\n) ?BASIC-PATENT:|\r\nUNITED STATES OF AMERICA \(US\)\s*PATENT \(Number; Kind; Date\): United States of America \(US\)[\s\S]*?\r\nPATENT \(Number; Kind; Date\)\:.*|PATENT \(Number; Kind; Date\): European Patent Office \(EP\).*|PATENT \(Number; Kind; Date\): United States of America \(US\).*|PATENT \(Number; Kind; Date\): World Intellectual Property Organisation \(WO\).*|PATENT \(Number; Kind; Date\): Canada \(CA\).*|PATENT \(Number; Kind; Date\): People's Republic of China \(CN\).*|PATENT \(Number; Kind; Date\): Japan \(JP\).*|PATENT \(Number; Kind; Date\): Republic of Korea \(KR\).*|PATENT \(Number; Kind; Date\): United Kingdom \(GB\).*|PATENT \(Number; Kind; Date\): Germany \(DE\).*|PATENT \(Number; Kind; Date\): France \(FR\).*|PATENT \(Number; Kind; Date\): Russian Federation \(RU\).*/g
while ((arr=re.exec(sss))!=null)osss=osss+arr+"\r\n";
fl.write(osss);
WScript.echo("ok")
[ Last edited by freeants001 on 2009-4-13 at 03:51 ]
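The long alternation in the pattern above repeats the same prefix for every country. As an alternative sketch, the part that matches single PATENT lines can be generated from a list. This is modern JavaScript (for example under Node.js), not the WSH JScript of the post, and the names COUNTRIES, escapeRegExp, and buildPatentLineRegex are made up for illustration. It covers only requirement 2 (the single PATENT lines), not the US block.

```javascript
const PREFIX = "PATENT (Number; Kind; Date): ";
const COUNTRIES = [
  "European Patent Office (EP)", "United States of America (US)",
  "World Intellectual Property Organisation (WO)", "Canada (CA)",
  "People's Republic of China (CN)", "Japan (JP)", "Republic of Korea (KR)",
  "United Kingdom (GB)", "Germany (DE)", "France (FR)", "Russian Federation (RU)",
];

// Escape regex metacharacters so literal text such as "(US)" matches itself.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build one regex that matches a whole PATENT line for any listed country.
// With the "m" flag, ^ and $ match at line boundaries; "." never matches
// \r or \n, so each hit is exactly one line without its line break.
function buildPatentLineRegex(countries) {
  const alternatives = countries.map(escapeRegExp).join("|");
  return new RegExp("^" + escapeRegExp(PREFIX) + "(?:" + alternatives + ").*$", "gm");
}
```

Adding or removing a country then means editing the list, not the pattern.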
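Author: lxh623
Time: 2009-4-13 04:02
Quote: |
Originally posted by freeants001 at 2009-4-13 03:49:
The code below should meet the requirements now, right?
To batch-process a folder, combine it with the for command yourself.
[code]File_Path=WScript.arguments(0);
var sss,arr="",osss="";
var fso=new Acti ... |
|
Thanks!
It works! You and netbenton have been a tremendous help today. I wish you both all the best!
Author: netbenton
Time: 2009-4-13 07:25
Title: A pure batch version works too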
@echo off&setlocal enabledelayedexpansion
set ho=UNITED STATES OF AMERICA (US)
set en=PATENT (Number; Kind; Date):
set bg=United States of America (US)
set li2=BASIC-PATENT:
set li10=European Patent Office (EP)
set li11=Russian Federation (RU)
set li12=World Intellectual Property Organisation (WO)
set li13=Canada (CA)
set li14=People's Republic of China (CN)
set li15=Japan (JP)
set li16=Republic of Korea (KR)
set li17=United Kingdom (GB)
set li18=Germany (DE)
set li19=France (FR)
set li20=United States of America (US)
::Country matching compares only the first 10 characters, which should be enough.
for /f %%a in ('dir /b *.txt') do (
set "ver="
(for /f "delims=" %%d in (%%a) do (set "str=%%d"&call :sub))>%%~na_dest.txt
start %%~na_dest.txt
)
echo Processing complete
pause
goto :eof
:sub
if defined ver (echo.!str!
if not "!str:%en%=!"=="!str!" set ver=
goto :eof)
if not "!str:%li2%=!"=="!str!" echo !str!&goto :eof
if "!str:%en%=!"=="!str!" (
if "!str!"=="!ho!" (set vho=y&goto :eof) else (set vho=)
goto :eof
) else (
set "coc=!str:*%en% =!"
if defined vho (
if "!bg:~0,10!"=="!coc:~0,10!" (set vho=&set ver=y&echo !ho!&echo.!str!&goto :eof)
)
for /l %%a in (10,1,20) do (if "!li%%a:~0,10!"=="!coc:~0,10!" echo !str!&goto :eof)
)
goto :eof
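Author: lxh623
Time: 2009-4-13 10:39
Thank you both again!
Data this complex, run through a batch file or a JS script, comes out in a form that can be processed further or imported into a reference manager. I am deeply grateful to you both! The two approaches give nearly the same result, which is excellent!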