How do I extract the complete HTML of a table on a web page? [Solved]
Last edited by cashiba on 2017-6-4 09:24
#include <Array.au3>
#include <IE.au3>
Local $oIE = _IE_Example("table")
Local $oTable = _IETableGetCollection($oIE, 1)
Local $aTableData = _IETableWriteToArray($oTable, True)
_ArrayDisplay($aTableData)
_IEQuit($oIE)
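The example above extracts the text content of a table on a web page.
What if I want the complete markup of that table instead: <Table><Tr><Td>......</Td></Tr></Table>
How can that be done?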
------------------------------------------------------------------------------------------------------------------
Similarly, many web pages now rely on CSS and use <Div><UL><Li>......</Li></UL></Div> in place of <Table><Tr><Td>......</Td></Tr></Table>
Take the MSN site as an example: http://www.msn.com/zh-cn
The source code inside the green box is as follows:
<div data-id="40" data-m='{"i":40,"p":10,"n":"linkmodule","y":13,"o":4}' data-aop="linkmodule_spartanlinkmodule">
<h3>新闻</h3>
<ul>
<li><a href="http://news.sina.com.cn" data-id="41" data-m='{"i":41,"p":40,"n":"NavLinks","y":14,"o":1}'>新浪新闻</a></li>
<li><a href="http://www.ifeng.com" data-id="42" data-m='{"i":42,"p":40,"n":"NavLinks","y":14,"o":2}'>凤凰网</a></li>
<li><a href="http://news.qq.com" data-id="43" data-m='{"i":43,"p":40,"n":"NavLinks","y":14,"o":3}'>腾讯新闻</a></li>
<li><a href="http://www.people.com.cn/" data-id="44" data-m='{"i":44,"p":40,"n":"NavLinks","y":14,"o":4}'>人民网</a></li>
<li><a href="http://mini.eastday.com/" data-id="45" data-m='{"i":45,"p":40,"n":"NavLinks","y":14,"o":5}'>头条新闻</a></li>
<li><a href="http://www.ynet.com/" data-id="46" data-m='{"i":46,"p":40,"n":"NavLinks","y":14,"o":6}'>北青网</a></li>
<li><a href="http://news.163.com" data-id="47" data-m='{"i":47,"p":40,"n":"NavLinks","y":14,"o":7}'>网易新闻</a></li>
<li><a href="http://news.sohu.com" data-id="48" data-m='{"i":48,"p":40,"n":"NavLinks","y":14,"o":8}'>搜狐新闻</a></li>
<li><a href="http://www.weibo.com" data-id="49" data-m='{"i":49,"p":40,"n":"NavLinks","y":14,"o":9}'>微博</a></li>
</ul>
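</div>
Would it also work to build a function along the lines of _IEDivUlWriteToArray?
I would also like to extract this <Div><UL><Li>......</Li></UL></Div> block of source code. How should I go about it?
Get hold of the div first; once you have it, you can get whatever is inside it.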
Func _IETableGetCollection(ByRef $oObject, $iIndex = -1)
    If Not IsObj($oObject) Then
        __IEConsoleWriteError("Error", "_IETableGetCollection", "$_IESTATUS_InvalidDataType")
        Return SetError($_IESTATUS_InvalidDataType, 1, 0)
    EndIf
    ;
    $iIndex = Number($iIndex)
    Select
        Case $iIndex = -1
            Return SetError($_IESTATUS_Success, $oObject.document.GetElementsByTagName("table").length, _
                    $oObject.document.GetElementsByTagName("table"))
        Case $iIndex > -1 And $iIndex < $oObject.document.GetElementsByTagName("table").length
            Return SetError($_IESTATUS_Success, $oObject.document.GetElementsByTagName("table").length, _
                    $oObject.document.GetElementsByTagName("table").item($iIndex))
        Case $iIndex < -1
            __IEConsoleWriteError("Error", "_IETableGetCollection", "$_IESTATUS_InvalidValue", "$iIndex < -1")
            Return SetError($_IESTATUS_InvalidValue, 2, 0)
        Case Else
            __IEConsoleWriteError("Warning", "_IETableGetCollection", "$_IESTATUS_NoMatch")
            Return SetError($_IESTATUS_NoMatch, 1, 0)
    EndSelect
EndFunc   ;==>_IETableGetCollection
Func _IETableWriteToArray(ByRef $oObject, $bTranspose = False)
    If Not IsObj($oObject) Then
        __IEConsoleWriteError("Error", "_IETableWriteToArray", "$_IESTATUS_InvalidDataType")
        Return SetError($_IESTATUS_InvalidDataType, 1, 0)
    EndIf
    ;
    If Not __IEIsObjType($oObject, "table") Then
        __IEConsoleWriteError("Error", "_IETableWriteToArray", "$_IESTATUS_InvalidObjectType")
        Return SetError($_IESTATUS_InvalidObjectType, 1, 0)
    EndIf
    ;
    Local $iCols = 0, $oTds, $iCol
    Local $oTrs = $oObject.rows
    For $oTr In $oTrs
        $oTds = $oTr.cells
        $iCol = 0
        For $oTd In $oTds
            $iCol = $iCol + $oTd.colSpan
        Next
        If $iCol > $iCols Then $iCols = $iCol
    Next
    Local $iRows = $oTrs.length
    Local $aTableCells[$iCols][$iRows]
    Local $iRow = 0
    For $oTr In $oTrs
        $oTds = $oTr.cells
        $iCol = 0
        For $oTd In $oTds
            $aTableCells[$iCol][$iRow] = String($oTd.innerText)
            If @error Then ; Trap COM error, report and return
                __IEConsoleWriteError("Error", "_IETableWriteToArray", "$_IESTATUS_COMError", @error)
                Return SetError($_IESTATUS_ComError, @error, 0)
            EndIf
            $iCol = $iCol + $oTd.colSpan
        Next
        $iRow = $iRow + 1
    Next
    If $bTranspose Then
        Local $iD1 = UBound($aTableCells, $UBOUND_ROWS), $iD2 = UBound($aTableCells, $UBOUND_COLUMNS), $aTmp[$iD2][$iD1]
        For $i = 0 To $iD2 - 1
            For $j = 0 To $iD1 - 1
                $aTmp[$i][$j] = $aTableCells[$j][$i]
            Next
        Next
        $aTableCells = $aTmp
    EndIf
    Return SetError($_IESTATUS_Success, 0, $aTableCells)
EndFunc   ;==>_IETableWriteToArray
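Following the pattern of _IETableWriteToArray above, a list-oriented counterpart such as the _IEDivUlWriteToArray asked about in the first post might be sketched as below. This is only a rough sketch: _IEDivUlWriteToArray is a hypothetical name and is not part of IE.au3, the three-column layout is an arbitrary choice, and the MSN page may no longer contain the markup quoted earlier in the thread.

```autoit
#include <Array.au3>
#include <IE.au3>

; Hypothetical helper (NOT part of IE.au3): returns a 2D array with one row
; per <li> inside $oDiv, holding [visible text, first link's href, <li> markup].
Func _IEDivUlWriteToArray($oDiv)
    If Not IsObj($oDiv) Then Return SetError(1, 0, 0)

    Local $oItems = $oDiv.getElementsByTagName("li")
    Local $iCount = $oItems.length
    If $iCount = 0 Then Return SetError(2, 0, 0)

    Local $aItems[$iCount][3]
    Local $iRow = 0, $oLinks
    For $oItem In $oItems
        $aItems[$iRow][0] = String($oItem.innerText)
        $oLinks = $oItem.getElementsByTagName("a")
        If $oLinks.length > 0 Then $aItems[$iRow][1] = String($oLinks.item(0).href)
        $aItems[$iRow][2] = String($oItem.outerHTML)
        $iRow += 1
    Next
    Return $aItems
EndFunc   ;==>_IEDivUlWriteToArray

; Usage sketch: use the first <div> on the page that contains any <li>.
Local $oIE = _IECreate("http://www.msn.com/zh-cn")
Local $oDivs = _IETagNameGetCollection($oIE, "div")
For $oDiv In $oDivs
    If $oDiv.getElementsByTagName("li").length > 0 Then
        Local $aList = _IEDivUlWriteToArray($oDiv)
        If Not @error Then _ArrayDisplay($aList)
        ExitLoop
    EndIf
Next
_IEQuit($oIE)
```

For simply fetching div elements, no new collection function should be needed: _IETagNameGetCollection($oIE, "div", $iIndex) already does for any tag what _IETableGetCollection does for tables.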
Get hold of the div first; once you have it, you can get whatever is inside it.
绿色风 posted on 2017-5-26 23:14
If the target Div has an id attribute, locating it is fairly easy; otherwise it will probably take iterating over the divs.
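Both cases might look roughly like this. It is a sketch under assumptions: the id "news" is made up for illustration, the data-id value 40 comes from the markup quoted in the first post, and the live page may have changed since then.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("http://www.msn.com/zh-cn")

; Case 1: the div has an id. The id "news" is hypothetical.
Local $oDiv = _IEGetObjById($oIE, "news")
If @error Then
    ; Case 2: no usable id, so walk every <div> and match another attribute.
    $oDiv = 0
    Local $oDivs = _IETagNameGetCollection($oIE, "div")
    For $oCandidate In $oDivs
        ; String() guards against a missing attribute being returned as Null.
        If String($oCandidate.getAttribute("data-id")) == "40" Then
            $oDiv = $oCandidate
            ExitLoop
        EndIf
    Next
EndIf

; outerHTML includes the <div> tag itself plus everything inside it.
If IsObj($oDiv) Then ConsoleWrite($oDiv.outerHTML & @CRLF)
_IEQuit($oIE)
```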
Another approach would be regular expressions.....
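A regular-expression version could be sketched as follows. The pattern is an assumption built around the data-id="40" attribute from the markup quoted in the first post, and it only works when the target div contains no nested div, because the lazy match stops at the first closing tag.

```autoit
#include <IE.au3>

Local $oIE = _IECreate("http://www.msn.com/zh-cn")
Local $sHtml = _IEBodyReadHTML($oIE)
_IEQuit($oIE)

; (?is): case-insensitive, and "." also matches line breaks.
; The quote after data-id= is optional because the browser may re-serialize
; attributes differently from the original page source.
Local $sPattern = '(?is)<div\b[^>]*\bdata-id="?40\b[^>]*>.*?</div>'

; Flag 1 returns an array; with no capturing groups, element 0 is the full match.
Local $aMatch = StringRegExp($sHtml, $sPattern, 1)
If @error Then
    ConsoleWrite("No matching div found" & @CRLF)
Else
    ConsoleWrite($aMatch[0] & @CRLF)
EndIf
```

Regular expressions are fragile against nested or reordered markup, so the DOM route is usually the safer choice when the element can be located.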
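It would be handy if this were packaged as custom functions along the lines of _IEDivGetCollection and _IEDivWriteToArray.
This is nice, bookmarked!
I have a similar problem, and it looks like this code will give me some pointers. Thanks!
Reply to 1# cashiba
#include <Array.au3>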
#include <IE.au3>
Local $oIE = _IE_Example("table")
Local $oTable = _IETableGetCollection($oIE, 1)
; outerHTML returns the complete markup, including the <table> tag itself
MsgBox(0, 0, $oTable.outerHTML)
_IEQuit($oIE)
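The same result should also be reachable through the IE.au3 wrapper _IEPropertyGet rather than the raw COM property. A minimal sketch, using the same demo page:

```autoit
#include <IE.au3>

; Open the demo page bundled with IE.au3 that contains sample tables.
Local $oIE = _IE_Example("table")
; Index 1 is the second table on the page (indexes are 0-based).
Local $oTable = _IETableGetCollection($oIE, 1)

; "outerhtml" includes the <table> tag itself; "innerhtml" only its children.
Local $sOuter = _IEPropertyGet($oTable, "outerhtml")
Local $sInner = _IEPropertyGet($oTable, "innerhtml")

ConsoleWrite($sOuter & @CRLF)
ConsoleWrite($sInner & @CRLF)
_IEQuit($oIE)
```

Since outerHTML is a property of every element, the same call applies to a div or ul object once it has been located.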
Reply to cashiba
komaau3 posted on 2017-6-2 22:13
Thank you very much!
Learned something. Thanks.
Learned something. Thanks.