【已解决】如何将文本中部分相同的行归类在一起?
本帖最后由 wujianfu 于 2012-4-24 13:27 编辑有一文本内容如下(邮箱地址是随机生成用于测试的,非真实存在):
4554452@15.com
ting455hai11985@1543.com
kel122ejiabing2002@163.com
shu54eijinat@15.com
445555@15.com
xahjhj1818@163.com
31jk21@15.com
ba24bymes@163.com
df4@15.com
we45nyibaoshe@123.com
hzc45546jf@1543.com
1112hgh233@163.com
sun545mingjie88@163.com
要求整理成如下形式(即相同邮箱主机的行归类在一起):
4554452@15.com
df4@15.com
shu54eijinat@15.com
445555@15.com
31jk21@15.com
ting455hai11985@1543.com
hzc45546jf@1543.com
kel122ejiabing2002@163.com
1112hgh233@163.com
sun545mingjie88@163.com
xahjhj1818@163.com
ba24bymes@163.com
we45nyibaoshe@123.com
自己尝试写了些代码,但效率非常低,而且到后面运行缓慢,求高手改进算法,我的代码如下:
#include <File.au3>
#include <Array.au3>
$CountLines=_FileCountLines(@ScriptDir&"\邮箱地址.txt")
$EmailAddress=FileOpen(@ScriptDir&"\邮箱地址.txt")
Dim $EmailServers
For $i=1 To $CountLines Step 1
$CountLinesDetails=FileReadLine($EmailAddress,$i)
$Server=StringMid($CountLinesDetails,StringInStr($CountLinesDetails,"@",0,-1))
If _ArraySearch($EmailServers,$Server,1)=-1 Then _ArrayAdd($EmailServers,$Server)
Next
If FileExists(@ScriptDir&"\已整理.txt") Then FileDelete(@ScriptDir&"\已整理.txt")
$NewTxtFile=FileOpen(@ScriptDir&"\已整理.txt",1)
For $t=1 To UBound($EmailServers)-1 Step 1
For $i=1 To $CountLines Step 1
$CountLinesDetails=FileReadLine($EmailAddress,$i)
If StringInStr($CountLinesDetails,$EmailServers[$t])<>0 Then FileWriteLine($NewTxtFile,$CountLinesDetails)
Next
Next
FileClose($EmailAddress)
FileClose($NewTxtFile)
邮箱地址.txt请用上面给出的地址测试。
#Include <File.au3>
$sfile = "邮箱地址.txt"
$afile = "已整理.ini"
For $n = 1 To _FileCountLines($sfile)
$line = FileReadLine($sfile,$n)
$sline = StringSplit($line,"@")
IniWrite($afile,$sline,$line,"")
Next
#include <array.au3>
Dim $str="4554452@15.com"&@CRLF& _
"ting455hai11985@1543.com"&@CRLF& _
"kel122ejiabing2002@163.com"&@CRLF& _
"shu54eijinat@15.com"&@CRLF& _
"445555@15.com"&@CRLF& _
"xahjhj1818@163.com"&@CRLF& _
"31jk21@15.com"&@CRLF& _
"ba24bymes@163.com"&@CRLF& _
"df4@15.com"&@CRLF& _
"we45nyibaoshe@123.com"&@CRLF& _
"hzc45546jf@1543.com"&@CRLF& _
"1112hgh233@163.com"&@CRLF& _
"sun545mingjie88@163.com"
$arr=StringSplit($str,@CRLF,1)
$dic=ObjCreate("scripting.dictionary")
For $n=1 To $arr
$arrt=StringSplit($arr[$n]&"@","@")
If $dic.exists($arrt) Then
$dic($arrt)=$dic($arrt)&@crlf&$arr[$n]
Else
$dic($arrt)=$arr[$n]
EndIf
Next
$arrt=$dic.items
$dic=0
MsgBox(0,"",_ArrayToString($arrt,@CRLF))提供另一种思路 回复 2# 502762378
用写配置文件的方法,不错的思路,再把等号删掉就行了 挺麻烦的 呵
#include <array.au3>
Local $fp = FileOpen( @ScriptDir & "\1.txt", 0)
Local $buf = FileRead($fp)
FileClose($fp)
Local $arr1 = StringSplit($buf, @CRLF)
;_ArrayDisplay($arr1, "包含空白行")
Local $i
For $i = $arr1 To 1 Step -1
If StringLen($arr1[$i]) = 0 Then
_ArrayDelete($arr1,$i)
$arr1 -= 1
EndIf
Next
_ArrayDisplay($arr1, "剔除空白行--归类前")
Local $arr2[$arr1+1]
$arr2 = $arr1
For $i = 1 To $arr2
$arr2[$i] = $arr1[$i]
$arr2[$i] = StringMid($arr2[$i],StringInStr($arr2[$i],"@"))
Next
;_ArrayDisplay($arr2, "@后的域名所有")
Local $arr3[$arr1+1]
$arr3 = 0
Local $j
Local $k
$arr3 = $arr2
$arr3 = 1
$i = 2
For $j = 2 To $arr2
For $k = 1 To $j-1
If StringCompare($arr2[$j], $arr2[$k]) = 0 Then ExitLoop
Next
If $k = $j Then
$arr3[$i] = $arr2[$j]
$arr3 += 1
$i += 1
EndIf
Next
For $i = $arr2 To $arr3+1 Step -1
_ArrayDelete($arr3, $i)
Next
;_ArrayDisplay($arr3, "@后的域名无重复")
$k = 1
For $i = 1 To $arr3
For $j = 1 To $arr2
If StringCompare($arr2[$j],$arr3[$i]) = 0 Then
$arr1[$k] = $arr2[$j]
$k += 1
EndIf
Next
Next
_ArrayDisplay($arr1, "最终结果--归类后")
截图
回复 1# wujianfu
这种要求,就不用自己写代码了,直接用工具就可以了。
如用vim来做,只需运行下面的命令就可以了
:sort /@.*/ r 感谢各位,结合502762378 给的代码,我自己再整理了一下,算是解决了。以下是代码:#Include <File.au3>
$EmailAddres="邮箱地址.txt"
$TxtFile="邮箱地址_归类.txt"
$IniFile=@TempDir&"\Temp.ini"
If FileExists($IniFile) Then FileDelete($IniFile)
For $i=1 To _FileCountLines($EmailAddres)
$LineDetails=FileReadLine($EmailAddres,$i)
$var=StringSplit($LineDetails,"@")
IniWriteSection($IniFile,$var,$LineDetails)
Next
If FileExists($TxtFile) Then FileDelete($TxtFile)
$TxtFileOpen=FileOpen($TxtFile,1)
For $i=1 To _FileCountLines($IniFile) Step 1
$LineDetails=FileReadLine($IniFile,$i)
If Not (StringLeft($LineDetails,1)="[" And StringRight($LineDetails,1)="]") Then
FileWriteLine($TxtFileOpen,$LineDetails)
EndIf
Next
FileClose($TxtFile)
FileDelete($IniFile)
ShellExecute($TxtFile) 回复 7# happytc
问题是我不会用vim啊,而且也不知道vim有这个命令。不知道用正则能不能解决,求正则高手。 Local $out
$EmailAddress = FileOpen(@ScriptDir & "\邮箱地址.txt")
$NewTxtFile = FileOpen(@ScriptDir & "\已整理.txt", 2)
$srex = StringRegExp(FileRead($EmailAddress), '.+', 3)
FileClose($EmailAddress)
For $i = 0 To UBound($srex) - 1
$a = StringRegExp($srex[$i], '@([^\.]+)\.com', 1)
For $s = 0 To UBound($srex) - 1
If StringRegExp($srex[$s], '@' & $a & '\.com', 0) And _
Not StringRegExp($out, $srex[$s], 0) Then
$out &= $srex[$s] & @CRLF
EndIf
Next
Next
FileWrite($NewTxtFile, $out)
FileClose($NewTxtFile)
后缀排序后,前缀需不需要排序?
比如将:
4554452@15.com
df4@15.com
shu54eijinat@15.com
445555@15.com
31jk21@15.com
排序成:
31jk21@15.com
445555@15.com
4554452@15.com
df4@15.com
shu54eijinat@15.com
楼上各位的想法很值得学习,2楼的502762378兄思路让人大开眼界,非常有创意.
如果前缀需要排序的话试试下面这个:
#include <array.au3>
Dim $str="4554452@15.com"&@CRLF& _
"ting455hai11985@1543.com"&@CRLF& _
"kel122ejiabing2002@163.com"&@CRLF& _
"shu54eijinat@15.com"&@CRLF& _
"445555@15.com"&@CRLF& _
"xahjhj1818@163.com"&@CRLF& _
"31jk21@15.com"&@CRLF& _
"ba24bymes@163.com"&@CRLF& _
"df4@15.com"&@CRLF& _
"we45nyibaoshe@123.com"&@CRLF& _
"hzc45546jf@1543.com"&@CRLF& _
"1112hgh233@163.com"&@CRLF& _
"sun545mingjie88@163.com"
$arr=StringRegExp($str,'[^\r\n]+',3)
Dim $arr2
For $i=0 to UBound($arr)-1
$arr2[$i]=$arr[$i]
$arr2[$i]=StringFormat('%10s',StringRegExpReplace($arr[$i],'.+@',''))
Next
_ArraySort($arr2,0,0,0,1)
Local $k=1
For $i=0 to UBound($arr2)-2
If $arr2[$i]==$arr2[$i+1] Then
$arr2[$i]=$k
$arr2[$i+1]=$k
Else
$arr2[$i]=$k
$k+=1
EndIf
Next
For $k=1 to $arr2
$kk=_ArrayFindAll($arr2,$k,0,0,0,0,2)
_ArraySort($arr2,0,$kk,$kk,0)
Next
For $i=0 to UBound($arr)-1
$arr[$i]=$arr2[$i]
Next
_ArrayDisplay($arr)
$str=_ArrayToString($arr,@CRLF)
FileWrite("output.txt",$str) 回复 11# 3mile
学习了,考虑更健全了!
页:
[1]