How to extract web page content with a regex [Solved]
Last edited by holley on 2022-7-22 09:47

Background: the thread "How to get the download address returned by http://www.huorong.cn/downloadv5.html [Solved]" in the Solved Questions board of AUTOIT CN, the Autoit Chinese forum (autoitx.com).
From that thread I learned that Huorong's real request address is: http://www.huorong.cn/versionShow.php
It returns:
{"request_type":"2","virustime":"2022-07-20","filesize":"24M","virusVersion":"2022.7.20.1","version":"5.0.69.3","createtime":"2022-07-20 18:17:53","fullName":"sysdiag-full-5.0.69.3-2022.7.20.1.exe","allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlFull":"https:\/\/down7.huorong.cn\/sysdiag-full-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}
My expression:
All['"]:['"]*(\S+)["']
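It only gives me:
https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe
My question: how can I get the actual download address? Should I match it in segments, or is there another expression that matches it directly?

Isn't what you got, with the \ removed, exactly that?

afan wrote on 2022-7-21 13:00:
    Isn't what you got, with the \ removed, exactly that?
I assumed the regex itself could leave the backslashes out.
A question: is there a function or command in au3 (I'm a beginner) that can filter this address?

holley wrote on 2022-7-21 14:44:
    I assumed the regex itself could leave the backslashes out
    A question: is there a function or command in au3 (I'm a beginner) that can filter this ...
You can run StringReplace on what you already captured and replace each \ with an empty string.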
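A minimal sketch of that two-step approach, assuming a shortened copy of the response and a simpler capture pattern than the one in the question. The variable names and the pattern `"urlAll":"([^"]+)"` are illustrative and were not posted in the thread. Replacing the two-character sequence `\/` rather than every backslash leaves any other backslash untouched.

```autoit
; Sketch only: capture the escaped URL first, then undo the JSON "\/" escaping.
; The shortened $sSource and the pattern are assumptions made for illustration.
Local $sSource = '{"urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'

; Flag 1 returns an array holding the captured groups of the first match.
Local $aMatch = StringRegExp($sSource, '"urlAll":"([^"]+)"', 1)
If Not @error Then
    ; $aMatch[0] is the first capture group, still containing "\/".
    Local $sUrl = StringReplace($aMatch[0], '\/', '/')
    ConsoleWrite($sUrl & @CRLF)
EndIf
```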
Of course, if you only need to capture that one address, you can also do it directly with a regex replacement:
Local $sSource = '{"request_type":"2","virustime":"2022-07-20","filesize":"24M","virusVersion":"2022.7.20.1","version":"5.0.69.3","createtime":"2022-07-20 18:17:53","fullName":"sysdiag-full-5.0.69.3-2022.7.20.1.exe","allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlFull":"https:\/\/down7.huorong.cn\/sysdiag-full-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'
;~ MsgBox(0, 'Source string', $sSource)
Local $sSRERe = StringRegExpReplace($sSource, '(?i)^.+All":"(h.+?:)[\\/]+([\w.]+)[\\/]+([^"]+).+$', '\1/\2/\3')
MsgBox(0, 'Replacement result', $sSRERe)

Local $sSource = '{"request_type":"2","virustime":"2022-07-20","filesize":"24M","virusVersion":"2022.7.20.1","version":"5.0.69.3","createtime":"2022-07-20 18:17:53","fullName":"sysdiag-full-5.0.69.3-2022.7.20.1.exe","allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlFull":"https:\/\/down7.huorong.cn\/sysdiag-full-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'
Local $str
Local $aSRE = StringRegExp($sSource, '"urlAll":"(?|(https:)\\/\\(/.*)\\(/.*?)")', 3)
If Not @error Then
    For $i = 0 To UBound($aSRE) - 1
        $str = $str & $aSRE[$i]
    Next
EndIf
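MsgBox(0, 'Match', $str)

In reply to lixiaolong's post of 2022-7-21 17:11:
Thanks for the answer, but the result is empty when I test it here.

holley wrote on 2022-7-22 09:40:
    Thanks for the answer, but the result is empty when I test it here.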
#include <Array.au3>
Local $sSource = '{"request_type":"2","virustime":"2022-07-20","filesize":"24M","virusVersion":"2022.7.20.1","version":"5.0.69.3","createtime":"2022-07-20 18:17:53","fullName":"sysdiag-full-5.0.69.3-2022.7.20.1.exe","allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlFull":"https:\/\/down7.huorong.cn\/sysdiag-full-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'
;~ MsgBox(0, 'Source string', $sSource)
Local $sSRERe = StringRegExpReplace($sSource, '\\/', '/')
Local $aSRE = StringRegExp($sSRERe, '(?i)(?<=urlAll":")(.+?)(?="})', 3)
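; $aSRE is an array, so show its first element rather than concatenating the array itself
If Not @error Then MsgBox(0, 'Number of matches: ' & UBound($aSRE), 'First element: ' & $aSRE[0])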
_ArrayDisplay($aSRE, UBound($aSRE))
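
holley wrote on 2022-7-22 09:40:
    Thanks for the answer, but the result is empty when I test it here.
"urlAll":"(?|(https:)\\/\\(/.*)\\(/.*?)")
The code in your screenshot is not the same as what I posted.

lixiaolong wrote on 2022-7-22 13:13:
    "urlAll":"(?|(https:)\\/\\(/.*)\\(/.*?)")
    The code in your screenshot is not the same as what I posted.
Indeed, the question mark turned into a blank, which is very strange…
However, the (?|..) branch reset serves no purpose here and can be dropped.

afan wrote on 2022-7-22 13:30:
    Indeed, the question mark turned into a blank, which is very strange…
    However, the (?|..) branch reset serves no purpose here and can be dropped.
Thanks for the reminder. I should study regular expressions more.

lixiaolong wrote on 2022-7-22 13:13:
    "urlAll":"(?|(https:)\\/\\(/.*)\\(/.*?)")
    The code in your screenshot is not the same as what I posted.
Thanks again. This gives the same result as moderator afan's version.
For actual use, afan's version needs to be changed to: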
Local $sSRERe = StringRegExpReplace($sSource, '(?i)^.+All":"(h.+?:)[\\/]+([\w.]+)[\\/]+([^"]+).+$', '\1/\/\2/\3')
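
Last edited by afan on 2022-7-22 14:13

holley wrote on 2022-7-22 13:58:
    Thanks again. This gives the same result as moderator afan's version.
    For actual use, afan's version needs to be changed to:
That's right, good fix. My replacement string was missing a /; it should be \1//\2/\3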
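As a quick check of the corrected replacement string, here is a small sketch that reuses afan's pattern with `\1//\2/\3`. The shortened `$sSource` is an illustrative stand-in for the full response, and the `@extended` check (the number of replacements performed) is an addition that was not part of the posted code.

```autoit
; Sketch only: afan's pattern with the corrected replacement '\1//\2/\3'.
; The shortened JSON below is an assumption made for illustration.
Local $sSource = '{"allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'

; Group 1 = scheme ("https:"), group 2 = host, group 3 = file name.
Local $sUrl = StringRegExpReplace($sSource, '(?i)^.+All":"(h.+?:)[\\/]+([\w.]+)[\\/]+([^"]+).+$', '\1//\2/\3')

; @extended holds the number of replacements; 0 means the pattern did not match.
If @extended > 0 Then ConsoleWrite($sUrl & @CRLF)
```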
In fact, for a value with such an obvious signature (it is the rightmost address), the extraction can be much simpler:
Local $sSource = '{"request_type":"2","virustime":"2022-07-20","filesize":"24M","virusVersion":"2022.7.20.1","version":"5.0.69.3","createtime":"2022-07-20 18:17:53","fullName":"sysdiag-full-5.0.69.3-2022.7.20.1.exe","allName":"sysdiag-all-5.0.69.3-2022.7.20.1.exe","urlFull":"https:\/\/down7.huorong.cn\/sysdiag-full-5.0.69.3-2022.7.20.1.exe","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'
Local $aSRE = StringRegExp($sSource, '.+":"(.+)"', 1)
; With flag 1 the result is an array; the captured address is element 0
If Not @error Then MsgBox(0, '', StringReplace($aSRE[0], '\', ''))
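For complex JSON structures it is better to read values with JSON functions, but for a simple case like this a regex is the natural first choice: simple and efficient.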
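For the simple, flat case, the regex approach could be wrapped in a small helper along these lines. This is only a sketch: `_GetJsonString` is a hypothetical name that does not come from the thread or from any UDF, and it assumes the value is a plain string with no escaped double quotes inside it. Nested or more complex JSON is outside what it handles.

```autoit
; Hypothetical helper (not from the thread): return the string value of $sKey
; from a flat JSON object, with "\/" unescaped to "/".
; On failure it returns "" and sets @error to 1.
Func _GetJsonString($sJson, $sKey)
    ; \Q...\E makes the key match literally; [^"]* stops at the closing quote.
    Local $aMatch = StringRegExp($sJson, '"\Q' & $sKey & '\E"\s*:\s*"([^"]*)"', 1)
    If @error Then Return SetError(1, 0, '')
    Return StringReplace($aMatch[0], '\/', '/')
EndFunc

; Usage with a shortened, illustrative copy of the response:
Local $sSource = '{"version":"5.0.69.3","urlAll":"https:\/\/down7.huorong.cn\/sysdiag-all-5.0.69.3-2022.7.20.1.exe"}'
ConsoleWrite(_GetJsonString($sSource, 'urlAll') & @CRLF)
ConsoleWrite(_GetJsonString($sSource, 'version') & @CRLF)
```

Anchoring on the key name instead of on "the rightmost value" keeps the extraction working if the server ever reorders the fields.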