蜘蛛抱蛋 posted on 2010-12-31 19:46:19


The code is below; run it and you will see what it does. I hope it is of some help. The sample text is taken from an English-language paper.
### Note: this script was generated automatically by Au3.REHelper on 2010/12/31 19:24; its correctness is not guaranteed, so please test it yourself ###
#include <Array.au3>
Local $Str = _
                'Gene slr1393 of the cyanobacterium Synechocystis sp.' & @CRLF & _
                'PCC6803 encodes a red–green photoreversible cyanobacter-' & @CRLF & _
                'iochrome. The full-length protein contains three GAF' & @CRLF & _
                'domains, but GAF3 (aa 441–596) alone is capable of' & @CRLF & _
                'autocatalytically binding PCB to cysteine-528.' & @CRLF & _
                '' & @CRLF & _
                'Addition' & @CRLF & _
                'of PCB to GA results in a reversibly photochromic chromo-' & @CRLF & _
                'protein, termed RGS (red–green switchable protein): state Pr' & @CRLF & _
                '(lmax =650 nm) is strongly fluorescent (FF =0.06); it is' & @CRLF & _
                'reversibly converted by irradiation with red light into state' & @CRLF & _
                'Pg (lmax =539 nm), which has reduced and strongly blue-' & @CRLF & _
                'shifted fluorescence (Table 1, Figure 1a). Photoswitching can' & @CRLF & _
                'be repeated many times; it is stable over a wide pH range, and' & @CRLF & _
                'is retained after RGS is embedded into polyvinyl alcohol' & @CRLF & _
                '(PVA) film (see Figures S1 and S2 in the Supporting'
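MsgBox(0, 'Original string', $Str)
; A word is a run of word characters; a hyphen followed by a line break
; joins the two halves of a word that was split across lines
Local $Test = StringRegExp($Str, "\b(?!'-)(?:\w|-[\r\n]++)+", 3)
; An array cannot be concatenated directly, so join its elements first
If Not @Error Then MsgBox(0, 'Number of matches: ' & UBound($Test), 'Elements: ' & _ArrayToString($Test, ' | '))
_ArrayDisplay($Test, UBound($Test))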
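
Assuming the pattern keeps a line-end hyphen and the line break inside the match, as its `-[\r\n]++` branch suggests, a word that was split across two lines comes back with those characters still in it. Here is a minimal sketch of one way to rejoin the halves afterwards. The sample string and the variable names ($sSample, $aWords) are made up for illustration, and the pattern is a simplified stand-in, not necessarily the one the script was generated with.

```autoit
#include <Array.au3>

; Hypothetical short sample with one word broken across a line end
Local $sSample = 'photoreversible cyanobacter-' & @CRLF & 'iochrome. The full-length protein'

; Word characters, plus a hyphen that is directly followed by a line break
Local $aWords = StringRegExp($sSample, '(?:\w|-[\r\n]+)+', 3)
If Not @error Then
    For $i = 0 To UBound($aWords) - 1
        ; Drop the line-end hyphen and the line break so the two halves join up
        $aWords[$i] = StringRegExpReplace($aWords[$i], '-[\r\n]+', '')
    Next
    _ArrayDisplay($aWords, 'Rejoined words')
EndIf
```

With this sample, "cyanobacter-" and "iochrome" should come out as the single entry "cyanobacteriochrome", while "full-length" is still split into "full" and "length" because its hyphen is not at a line end.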

3mile posted on 2011-1-1 00:35:50


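蜘蛛抱蛋 posted on 2011-1-1 14:50:54

Reply to 2# 3mile

Your method returned 118 words, while Word 2003 reports 127 non-Chinese words. It looks as if Word also counts the letter strings on either side of a hyphen as two separate words.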
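
The effect of the hyphen rule on the total can be checked on a small string. The sketch below only compares two counting conventions on a made-up sample; it is not a reproduction of how Word 2003 counts, and the names $sSample, $sJoined, $aCompound and $aPieces are hypothetical.

```autoit
; Hypothetical sample: one hyphenated compound and one word split at a line end
Local $sSample = 'a red-green photoreversible cyanobacter-' & @CRLF & 'iochrome'

; Rejoin words that were broken by a hyphen at the end of a line
Local $sJoined = StringRegExpReplace($sSample, '-[\r\n]+', '')

; Convention 1: a hyphenated compound counts as one word
Local $aCompound = StringRegExp($sJoined, '\w+(?:-\w+)*', 3)

; Convention 2: each side of a hyphen counts as its own word
Local $aPieces = StringRegExp($sJoined, '\w+', 3)

MsgBox(0, 'Word counts', _
        'Compounds as one word: ' & UBound($aCompound) & @CRLF & _
        'Each side of a hyphen separately: ' & UBound($aPieces))
```

For this sample the first convention gives 4 words and the second gives 5, so every hyphenated compound in a text adds one to the total under the second convention.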