Slimming down a text file with regex [Solved] - Solved Questions - AUTOIT CN

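kevinch posted on 2011-4-29 20:25:11

It probably could not cope because there was too much data; once I reduced the data it worked. The source data is also slightly off: after re-saving it in Notepad, FileReadLine works normally. Here is the code I have now; it is best tested with the attached file:

$o_Dic = ObjCreate("Scripting.Dictionary")
$h_File = FileOpen(@ScriptDir & "\snt.txt", 0)
While 1
    $s_Str = FileReadLine($h_File)
    If @error = -1 Then ExitLoop
    ; Flag 3 returns the capture groups of every match in one flat array.
    $a_Arr = StringRegExp($s_Str, "(?s)(\d{6}).*?\s{6,}((\S+\s?\s?\s?\s?)*\S+)", 3)
    ; Stop one element early so that $i_N + 1 is always a valid index.
    For $i_N = 0 To UBound($a_Arr) - 2
        ; An element made of exactly 6 digits is a stock code; the next element is its name.
        If StringLen($a_Arr[$i_N]) = 6 And StringRegExpReplace($a_Arr[$i_N], "\d+", "") = "" Then
            $s_Tmp = $a_Arr[$i_N] & @TAB & $a_Arr[$i_N + 1]
            $o_Dic($s_Tmp) = "" ; keys are unique, so duplicate lines collapse
        EndIf
    Next
WEnd
FileClose($h_File)

$h_File = FileOpen(@ScriptDir & "\Result.txt", 2)
For $s_Str In $o_Dic.Keys
    FileWriteLine($h_File, $s_Str)
Next
FileClose($h_File)
MsgBox(0, "", "Done")

With this much data, arrays seemed to run into problems, so I store the results in a dictionary instead.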

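love5173 posted on 2011-4-29 22:42:30

This post was last edited by love5173 on 2011-4-29 22:45

This is as far as my knowledge takes me; hopefully someone more skilled can solve it for you.

#include <String.au3>
$txt = FileRead("SNT.txt")
$new = Hex(StringToBinary($txt))
; Put a "-" after every byte so that '00' can only match a whole byte.
$new = StringRegExpReplace($new, '(\w\w)', '\1-')
; Turn null bytes (00) into spaces (20).
$new = StringRegExpReplace($new, '00', '20')
$new = _HexToString(StringRegExpReplace($new, '(\w\w)-', '\1'))
$new = StringRegExpReplace($new, '(?s).*?(\d{6})\h+\H+\h+\H+\h+(.*?)\h{10,}.*?', '\1----\2' & @CRLF)
; Keep the handle returned by FileOpen and use it for writing and closing.
$hOut = FileOpen("new.txt", 2 + 8)
FileWrite($hOut, $new)
FileClose($hOut)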

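love5173 posted on 2011-4-29 23:09:49

The hard part of the original poster's problem is neither the regex nor the sorting. It is that the text contains 00 (null) bytes, which make the string functions misbehave. The file apparently has to be read by its own program to display properly, and I really do not know how to handle that. I searched online for the tnf file format without success. If someone who uses this format could convert it, the whole job would probably become much simpler.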

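netbean posted on 2011-4-30 00:59:36

Thanks a lot, I will give it a try.
The data comes from the 通达信 software, which updates it every day. In the T0002\hq_cache directory, shex.tnf and szex.tnf are the original files; SNT.txt is the two merged together.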

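netbean posted on 2011-4-30 05:09:40

This post was last edited by netbean on 2011-4-30 07:22

Thanks. The code from both kevinch and love5173 slims the file down successfully.

Code 1 needs the file opened and re-saved in Notepad first, and runs faster; code 2 converts the file directly.

File size before and after conversion: 1143 KB / 78 KB. As a test I searched by stock name, e.g. "实达集团", got the code "600734" and wrote it to a file. Looking up 8 stocks took 14-17 seconds with the original file and 4-5 seconds with the slimmed one, a reduction of about two thirds.

I also tried the following code on the 78 KB file to strip everything that is not a stock. The file shrank to 36 KB, but query time did not change, so I dropped this modification.

$new = StringTrimLeft($new, StringInStr($new, "600000") - 1)
$new = StringLeft($new, StringInStr($new, "400001") - 1)
$new1 = StringLeft($new, StringInStr($new, "出版传媒") + 4)
$new2 = StringLeft(StringTrimLeft($new, StringInStr($new, "000001") - 1), StringInStr($new, "国债") - 7)
$new3 = StringTrimLeft($new, StringInStr($new, "300001") - 1)
$new = $new1 & $new2 & $new3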

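netbean posted on 2011-4-30 07:41:07

This post was last edited by netbean on 2011-4-30 07:43

$SNT = FileOpen(@MyDocumentsDir & "\SNT.txt", 0)
$SNTList = FileRead($SNT)
; The 6-digit code sits 7 characters before the name (code plus one separator).
$GetCode = StringMid($SNTList, StringInStr($SNTList, "实达集团") - 7, 6)
FileClose($SNT)
$GetCode = StringRegExpReplace($GetCode, '(\A6)', '1\1')
$GetCode = StringRegExpReplace($GetCode, '(\A0|\A3)', '0\1')
$ZXG = FileOpen("C:\通达信\ZXG.blk", 1)
FileWriteLine($ZXG, $GetCode)
FileClose($ZXG)

This searches for a stock name and returns its code. Can it be optimized?
The 通达信 program requires this format: SH codes get a leading 1 and SZ codes a leading 0, making 7 digits, which is what the two StringRegExpReplace lines are for.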
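
The prefix rule described here can also be written without regular expressions. The following is only a minimal sketch. It assumes, as the two replacements in the post do, that Shanghai codes start with 6 and Shenzhen codes start with 0 or 3. The function name _AddMarketPrefix is made up for this illustration.

```autoit
; Hypothetical helper: turn a 6-digit stock code into the 7-digit form.
; Assumption taken from the post: codes starting with 6 are SH (prefix 1),
; codes starting with 0 or 3 are SZ (prefix 0). Anything else is returned unchanged.
Func _AddMarketPrefix($sCode)
    Switch StringLeft($sCode, 1)
        Case "6"
            Return "1" & $sCode
        Case "0", "3"
            Return "0" & $sCode
        Case Else
            Return $sCode
    EndSwitch
EndFunc   ;==>_AddMarketPrefix

ConsoleWrite(_AddMarketPrefix("600734") & @CRLF) ; writes 1600734
ConsoleWrite(_AddMarketPrefix("000001") & @CRLF) ; writes 0000001
```

Returning unrecognised codes unchanged mirrors the behaviour of the two replacements, which also leave a non-matching string as it is.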

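kevinch posted on 2011-4-30 20:31:41

After processing, put each stock name into a dictionary as the key, with its code as the item. Once everything is stored, look up the item by key name: the result comes back immediately and the lookup costs almost no time.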

netbean posted on 2011-5-1 07:52:15

Many thanks!
I tried it but could not get it to work. Could you post the code?

$h_File=FileOpen(@ScriptDir&"\Result.txt",0)
while 1
$s_Str=FileReadLine($h_File)
if @error=-1 then ExitLoop
$a_Arr=StringRegExp($s_Str,"(\d{6})",3)
...
$o_Dic.item("0")=$a_Arr["0"]
...
WEnd
FileClose($h_File)

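kevinch posted on 2011-5-1 12:22:39

$o_Dic=objcreate("scripting.dictionary")
......
while 1
......
$o_Dic("stock name")="code" ; entries have to be added one at a time inside the loop
......
wend

Here is a skeleton of the lookup code to start with.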

netbean posted on 2011-5-1 15:40:14

$o_Dic=objcreate("scripting.dictionary")
$h_File=FileOpen(@ScriptDir&"\Result.txt",0)
while 1
$s_Str=FileReadLine($h_File)
if @error=-1 then ExitLoop
$o_Dic(StringMid($s_Str, 8))=StringLeft($s_Str, 6)
WEnd
FileClose($h_File)
$a_Arr=$o_Dic.keys
$b_Arr=$o_Dic.items
For $i=0 to UBound($o_Dic.keys)-1
If $AlertName = $a_Arr[$i] Then
$GetCode = $b_Arr[$i]
ExitLoop
EndIf
Next

Is this the right way to write it?
Also, it only looks up one code and then gets stuck in an infinite loop; I have not found the cause yet.
Can the dictionary be built outside the lookup subroutine, so that it does not have to be rebuilt every time I look up one or a few codes?
Thanks!

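easefull posted on 2011-5-1 16:16:59

This post was last edited by easefull on 2011-5-1 16:18

Reply to #17 love5173

After two days of tinkering, the problem seems to lie in how AutoIt's text handling deals with null characters.
The code I came up with is similar to yours. It also still fails to extract the full name when the stock name contains a space (e.g. 00000043 超大盘), which is embarrassing.

#include <Array.au3>
Local $sText
Local $hFile = FileOpen("SNT.txt", 16) ; 16 = binary mode
$sText = FileRead($hFile)
FileClose($hFile)
; Split the hex digits into byte pairs so that '00' can only match a whole byte.
$sText = StringRegExpReplace($sText, '(.{2})', '\1' & " ")
; Turn null bytes (00) into spaces (20), then remove the separators again.
$sText = StringRegExpReplace($sText, '00', '20')
$sText = StringRegExpReplace($sText, '\s', '')
$sText = BinaryToString($sText)
Local $asResult = StringRegExp($sText, '\d{6}\s+\S\s+\S\s+\S+', 3)
_ArrayDisplay($asResult)

Local $sResult = ""
For $i = 0 To UBound($asResult) - 1 Step 1
    $sResult &= StringRegExpReplace($asResult[$i], '(\d{6})(\s+\S\s+\S\s+)(\S+)', '$1' & @TAB & '$3' & @CRLF)
Next
ClipPut($sResult)
MsgBox(4096, "", $sResult)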

netbean posted on 2011-5-1 16:59:40

430034 大地股份
430035 中
430036 鼎普科技
430037 联
430038 信维科技
430039 华高世纪
430040 康
430041 中机非晶
430042 科
430043 世纪东方

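Thanks for the help. I found that some of the data comes out incomplete, as in the sample above where several names are cut off after the first character.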

easefull posted on 2011-5-1 17:29:49

After some more tinkering I made a version that matches on the binary data, to fix names being cut short by whitespace inside the stock name.

#include <Array.au3>
Local $oFile, $sText
$oFile = FileOpen("shex.tnf", 16) ; 16 = binary mode
$sText = FileRead($oFile)
; Split the hex digits into space-separated byte pairs.
$sText = StringRegExpReplace($sText, '(.{2})', '\1' & " ")
FileClose($oFile)

; First pass: find each record (six ASCII digits, fixed padding, name up to the next 00 byte).
Local $sRegExp = _
        '3\d\s3\d\s3\d\s3\d\s3\d\s3\d\s' & _
        '00\s00\s00\s' & _
        '..\s' & _
        '00\s00\s00\s00\s' & _
        '..\s..\s' & _
        '00\s00\s00\s00\s00\s00\s00\s00\s' & _
        '.*?(?:00\s)'
Local $asResult = StringRegExp($sText, $sRegExp, 3)
_ArrayDisplay($asResult)

Local $sResult = ""
; Second pass: the same pattern, now capturing the code bytes and the name bytes.
$sRegExp = _
        '(3\d\s3\d\s3\d\s3\d\s3\d\s3\d\s)' & _
        '00\s00\s00\s' & _
        '..\s' & _
        '00\s00\s00\s00\s' & _
        '..\s..\s' & _
        '00\s00\s00\s00\s00\s00\s00\s00\s' & _
        '(.*?)(?:00\s)'
For $i = 0 To UBound($asResult) - 1 Step 1
    $asTemp = StringRegExp($asResult[$i], $sRegExp, 3)
    If Not IsArray($asTemp) Then ContinueLoop
    ; $asTemp[0] holds the code bytes, $asTemp[1] the name bytes.
    $sResult &= BinaryToString('0x' & StringStripWS($asTemp[0], 8)) & @TAB & BinaryToString('0x' & StringStripWS($asTemp[1], 8)) & @CRLF
Next
ClipPut($sResult)
MsgBox(4096, "", $sResult)

easefull posted on 2011-5-1 17:38:07

Besides, once you have the data, querying it is simple enough.

#include <Array.au3>
Local $sText = FileRead("Result.txt")
Local $sRegExp = InputBox("Query", "Enter a stock name or stock code:")
Local $asResult = StringRegExp($sText, '.*' & $sRegExp & '.*?\v', 3)
_ArrayDisplay($asResult)

The txt file needs to contain one stock code and stock name per line.
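
One caveat with the query in this post: the text typed into the InputBox is spliced into the pattern as is, so input containing regex metacharacters such as * or ( is read as pattern syntax. Below is a minimal sketch of a literal search using PCRE's \Q...\E quoting, under the same assumption of one code and name per line in Result.txt. The function name _FindLines is invented for this example.

```autoit
#include <Array.au3>

; Hypothetical helper: return every line of $sText that contains $sQuery literally.
; Returns an array of lines, or sets @error if the query is empty or nothing matches.
Func _FindLines($sText, $sQuery)
    If $sQuery == "" Then Return SetError(1, 0, 0)
    ; Inside \Q...\E every character is literal. A "\E" in the input would end the
    ; quoting early, so it is closed, emitted as an escaped backslash plus E, and reopened.
    Local $sQuoted = "\Q" & StringReplace($sQuery, "\E", "\E\\E\Q", 0, 1) & "\E"
    ; [^\r\n]* keeps each match inside a single line; flag 3 returns all matches.
    Local $aLines = StringRegExp($sText, "[^\r\n]*" & $sQuoted & "[^\r\n]*", 3)
    If Not IsArray($aLines) Then Return SetError(2, 0, 0)
    Return $aLines
EndFunc   ;==>_FindLines

Local $sText = FileRead("Result.txt")
Local $sQuery = InputBox("Query", "Enter a stock name or stock code:")
Local $asResult = _FindLines($sText, $sQuery)
If @error Then
    MsgBox(0, "", "No match found.")
Else
    _ArrayDisplay($asResult)
EndIf
```

Matching with [^\r\n]* instead of .* also keeps the trailing line break out of each result.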

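kevinch posted on 2011-5-1 18:49:36

Reply to #25 netbean

To query, just call MsgBox(0, "", "The code for this stock name is: " & $o_Dic("stock name")). There is no need to loop any more: the stock name pulls the matching code straight out of the dictionary. In other words, $o_Dic("stock name") gives you the code.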
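
The direct lookup, combined with the earlier question about building the dictionary only once, might look like the sketch below. It assumes Result.txt holds one "code<TAB>name" pair per line, as written by the first script in this thread. The names $g_oDic, _LoadStockDic and _GetCode are made up for illustration.

```autoit
; Build the name -> code dictionary once at startup and reuse it for every lookup.
; Assumption: Result.txt has one "code<TAB>name" pair per line.
Global $g_oDic = _LoadStockDic(@ScriptDir & "\Result.txt")

; Hypothetical helper: read the file once and fill a Scripting.Dictionary.
Func _LoadStockDic($sPath)
    Local $oDic = ObjCreate("Scripting.Dictionary")
    Local $sLine
    Local $hFile = FileOpen($sPath, 0)
    If $hFile = -1 Then Return SetError(1, 0, $oDic) ; file missing: empty dictionary
    While 1
        $sLine = FileReadLine($hFile)
        If @error Then ExitLoop ; end of file or read error
        If StringLen($sLine) < 8 Then ContinueLoop ; skip blank or short lines
        ; Characters 1-6 are the code, character 7 the tab, the rest is the name.
        $oDic.Item(StringMid($sLine, 8)) = StringLeft($sLine, 6)
    WEnd
    FileClose($hFile)
    Return $oDic
EndFunc   ;==>_LoadStockDic

; Hypothetical helper: return the code for a name, or "" if the name is unknown.
Func _GetCode($sName)
    ; Exists() avoids the side effect of Item(), which adds a missing key.
    If $g_oDic.Exists($sName) Then Return $g_oDic.Item($sName)
    Return ""
EndFunc   ;==>_GetCode

MsgBox(0, "", "实达集团: " & _GetCode("实达集团"))
```

Because the dictionary lives in a global variable, calling _GetCode repeatedly does not read the file again.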
