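Extracting specific URLs from a rolling news list
Tencent rolling news list page: http://roll.news.qq.com
Each page shows 20 items. I want to grab the link of every item across 20 pages, with duplicates removed automatically.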
Put them into Notepad and extract them with a regular expression. Waiting for an expert so I can learn from it. Thanks.

#include <ie.au3>
#include <array.au3>
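; Dictionary keyed by URL, so a link seen twice is stored only once
$dic=ObjCreate("scripting.dictionary")
; Open the list page in a new IE window and wait for it to load
$oie=_IECreate("http://roll.news.qq.com/",0,1,1,0)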
For $n=1 To 20
    ; Call the page's own gotoPage() script function to switch to page $n
    $oie.document.parentwindow.execscript('gotoPage('&$n&')')
    $ok=False
    ; Poll until the page is idle and at least one news link is present
    Do
        Sleep(100)
        For $link In $oie.document.links
            If StringRegExp($link.href,"(?is)http\:\/\/news\.qq\.com\/a\/",0)=1 Then
                $ok=True
                ExitLoop
            EndIf
        Next
    Until Not $oie.busy And $oie.readystate=4 And $ok
    ; Store every news link: key = URL, value = link text
    For $link In $oie.document.links
        If StringRegExp($link.href,"(?is)http\:\/\/news\.qq\.com\/a\/",0)=1 Then
            $dic.Item($link.href)=$link.innertext
        EndIf
    Next
Next
$oie.document.parentwindow.execscript('javascript:window.opener=null;window.open("","_self");window.close();')
$arr=$dic.keys
; Two columns per row: [n][0] = URL, [n][1] = link text
Dim $result[UBound($arr)][2]
For $n=0 To UBound($arr)-1
    $result[$n][0]=$arr[$n]
    $result[$n][1]=$dic.Item($arr[$n])
Next
_ArrayDisplay($result)

This isn't much of a method, but maybe it will open up some ideas.
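
For comparison, the Notepad-plus-regex idea from the question could be sketched roughly like this. It is only a minimal sketch: the sample text and the article URLs in it are made up for illustration, and the character class used to end each URL is my own assumption about how the links appear in the page source. StringRegExp with flag 3 returns all matches as an array, and the dictionary keys again serve to drop duplicates.

```autoit
#include <Array.au3>

; Hypothetical sample standing in for page source pasted into Notepad.
; The article URLs below are invented for illustration only.
Local $sText = '<a href="http://news.qq.com/a/20130101/000001.htm">A</a>' & @CRLF & _
        '<a href="http://news.qq.com/a/20130101/000002.htm">B</a>' & @CRLF & _
        '<a href="http://news.qq.com/a/20130101/000001.htm">A again</a>'

; Flag 3 returns an array holding every match; @error is set if nothing matches.
Local $aMatches = StringRegExp($sText, '(?i)http://news\.qq\.com/a/[^"\s<>]+', 3)
If @error Then Exit

; Dictionary keys act as a set, so a repeated URL is kept only once.
Local $oSeen = ObjCreate("Scripting.Dictionary")
For $sUrl In $aMatches
    If Not $oSeen.Exists($sUrl) Then $oSeen.Add($sUrl, True)
Next

; Show the de-duplicated list of links.
Local $aUnique = $oSeen.Keys()
_ArrayDisplay($aUnique, "Unique links")
```

To run it on real data, $sText could be filled with FileRead() from wherever the saved text is kept.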