A question for the regex experts: how should this regular expression be written?
This post was last edited by 316428696 on 2010-1-22 23:49
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Yahoo! Singapore Finance</title>
</head>
<body>
<table class="modtbl datatbl" cellpadding="2" cellspacing="0">
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5ESTI">STI</a></small></td>
<td align="right">
<small>2,888.38 </small>
</td>
<td align="right">
<small><span class="neg">-27.73</span></small></td>
<td align="right">
<small><span class="neg">(-0.95%)</span></small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5ESESD">SESDAQ</a></small></td>
<td align="right">
<small>N/A </small>
</td>
<td align="right">
<small>0.00</small></td>
<td align="right">
<small>(0.00%)</small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EBSRI">BT-SRI</a></small></td>
<td align="right">
<small>N/A </small>
</td>
<td align="right">
<small>0.00</small></td>
<td align="right">
<small>(0.00%)</small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EKLSE">KLSE Comp</a></small></td>
<td align="right">
<small>1,289.51 </small>
</td>
<td align="right">
<small><span class="neg">-3.34</span></small></td>
<td align="right">
<small><span class="neg">(-0.26%)</span></small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EN225">Nikkei 225</a></small></td>
<td align="right">
<small>10,735.03 </small>
</td>
<td align="right">
<small><span class="neg">-144.11</span></small></td>
<td align="right">
<small><span class="neg">(-1.32%)</span></small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EHSI">Hang Seng</a></small></td>
<td align="right">
<small>21,748.60 </small>
</td>
<td align="right">
<small><span class="neg">-578.04</span></small></td>
<td align="right">
<small><span class="neg">(-2.59%)</span></small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EDJI">Dow</a></small></td>
<td align="right">
<small>10,626.81 </small>
</td>
<td align="right">
<small><span class="neg">-0.45</span></small></td>
<td align="right">
<small>(-0.00%)</small></td>
</tr>
<tr>
<td class="datatblhlt" align="left" nowrap colspan="1">
<small><a href="/q?s=%5EIXIC">Nasdaq</a></small></td>
<td align="right">
<small>2,279.51 </small>
</td>
<td align="right">
<small><span class="neg">-2.80</span></small></td>
<td align="right">
<small><span class="neg">(-0.12%)</span></small></td>
</tr>
<tr>
<td align="right" valign="top" colspan="4">
<small><a href="/intlindices">More International Indices</a></small></td>
</tr>
</table>
</body>
</html>
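I want to pull out the values in this page, such as STI 2,888.38 -27.73 (-0.95%) and so on. In other words, I want the text inside the tags: <a>the text here</a> or <span>the text here</span>.
In short, the final result should be: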
STI 2,888.38-27.73 (-0.95%)
SESDAQ N/A0.00 (0.00%)
BT-SRI N/A0.00 (0.00%)
KLSE Comp 1,289.51-3.34 (-0.26%)
Nikkei 225 10,735.03-144.11 (-1.32%)
Hang Seng 21,748.60-578.04 (-2.59%)
Dow 10,626.81-0.45 (-0.00%)
Nasdaq 2,279.51-2.80 (-0.12%)
More International Indices
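May I ask the experts how to do this with a regex? The regex gods are around, right?
This post was last edited by afan on 2010-1-22 18:24
Pure grunt work...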
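; a.txt is assumed to hold the HTML from the first post
Dim $str = FileRead('a.txt'), $Error, $txt
; Four captures per index row: name, last price, change, percent change
$sR = StringRegExp($str, '(?s)<tr>.+?<a.+?>([^<]+)<.+?<.+?>([^<]+)</.+?smal.+?>([^<]+)</.+?(\([^\)]+?\))</', 3)
$Error = @error
; The last row only holds the "More International Indices" link
$sR1 = StringRegExp($str, '\/intlindices.>([^<]+)</', 3)
$Error += @error
If $Error = 0 Then
    For $i = 0 To UBound($sR) - 1 Step 4
        $txt &= $sR[$i] & ' ' & $sR[$i + 1] & ' ' & $sR[$i + 2] & ' ' & $sR[$i + 3] & @CRLF
    Next
    ; Flag 3 returns an array, so use its first element rather than the array itself
    MsgBox(0, 0, $txt & $sR1[0])
EndIf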
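To see what each capture group in the row pattern picks up, here is a minimal sketch that runs the same pattern against a single row copied from the question. The variable names ($sRow, $sPattern, $aParts, $aLabels) and the group labels are made up for illustration and are not from the thread. It also trims the trailing space that the second group carries over from the source markup.

```autoit
; One <tr> block copied from the HTML in the question (sample data).
Local $sRow = '<tr>' & @CRLF & _
        '<td class="datatblhlt" align="left" nowrap colspan="1">' & @CRLF & _
        '<small><a href="/q?s=%5ESTI">STI</a></small></td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small>2,888.38 </small>' & @CRLF & _
        '</td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small><span class="neg">-27.73</span></small></td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small><span class="neg">(-0.95%)</span></small></td>' & @CRLF & _
        '</tr>'

; Same pattern as in the post above. Flag 3 returns every capture in one flat array.
Local $sPattern = '(?s)<tr>.+?<a.+?>([^<]+)<.+?<.+?>([^<]+)</.+?smal.+?>([^<]+)</.+?(\([^\)]+?\))</'
Local $aParts = StringRegExp($sRow, $sPattern, 3)

If @error Then
    ConsoleWrite('No match' & @CRLF)
Else
    ; Hypothetical labels for the four groups, in capture order.
    Local $aLabels[4] = ['name', 'last', 'change', 'percent']
    For $i = 0 To UBound($aParts) - 1
        ; StringStripWS flag 3 trims leading and trailing whitespace,
        ; for example the space after "2,888.38 ".
        ConsoleWrite($aLabels[$i] & ': [' & StringStripWS($aParts[$i], 3) & ']' & @CRLF)
    Next
EndIf
```

For this row the four groups should come out as STI, 2,888.38, -27.73 and (-0.95%). Because flag 3 flattens all matches into one array, the full page yields four entries per index row, which is why the loop in the post steps by 4.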
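$file = FileRead('a.txt') ; assumption: the snippet as posted never fills $file
$file = StringRegExpReplace($file, "</*(?<=[^s]).+?>", "") ; strip every tag
$file = StringRegExpReplace($file, "(?<=[\s])\s{2,}", "") ; squeeze long runs of whitespace
$file = StringReplace(StringMid($file, 2, StringLen($file)-2), @CR, "|") ; drop first and last character, turn line breaks into |
$array = StringSplit($file, "|") ; split into fields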
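The tag-stripping idea can also be tried on its own. What follows is a rough sketch, not the poster's exact code: it uses simpler patterns than the ones above, works on one inline sample row instead of a file, and all names ($sHtml, $sText, $aFields) are assumptions for illustration.

```autoit
; One <tr> block copied from the HTML in the question (sample data).
Local $sHtml = '<tr>' & @CRLF & _
        '<td class="datatblhlt" align="left" nowrap colspan="1">' & @CRLF & _
        '<small><a href="/q?s=%5EHSI">Hang Seng</a></small></td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small>21,748.60 </small>' & @CRLF & _
        '</td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small><span class="neg">-578.04</span></small></td>' & @CRLF & _
        '<td align="right">' & @CRLF & _
        '<small><span class="neg">(-2.59%)</span></small></td>' & @CRLF & _
        '</tr>'

; 1. Remove every tag, leaving only text and line breaks.
Local $sText = StringRegExpReplace($sHtml, '<[^>]+>', '')
; 2. Turn each run of whitespace that contains a line break into one "|".
;    Spaces inside a field, as in "Hang Seng", are left alone.
$sText = StringRegExpReplace($sText, '\s*[\r\n]+\s*', '|')
; 3. Trim the separators left over at the start and end.
$sText = StringRegExpReplace($sText, '^\|+|\|+$', '')
; 4. Split into fields. StringSplit stores the field count in element 0.
Local $aFields = StringSplit($sText, '|')

For $i = 1 To $aFields[0]
    ConsoleWrite('[' & $aFields[$i] & ']' & @CRLF)
Next
```

For this row the fields should be Hang Seng, 21,748.60, -578.04 and (-2.59%). On the full page this approach gives one flat list, so regrouping into lines still relies on knowing that each index row has four fields and that the final "More International Indices" row has only one.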
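...Strip out all the tags, strip out the whitespace, trim the head and tail, replace every line break with |, and finally split on it... It takes the long way round... but it is a bit easier to maintain...
My regex is only so-so, so I won't embarrass myself any further. Just passing by.
Reply to 2# afan
Once again I get to see the magic of regex... Looks like I really need to study regular expressions properly!
Quote from rolaka, posted 2010-1-22 20:23: ...Strip out all the tags, strip out the whitespace, trim the head and tail, replace every line break with |, and finally split on it... It takes the long way round... but it is a bit easier to maintain...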
Very impressive. Learning from the experts!