Python: scraping dynamic web pages (algorithm performance, output in Excel, and excluding files that do not meet the requirements)
优采云 Published: 2021-12-03 18:32
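Requirement: find the source files that contain certain specified customer-information fields.
Version 1: search for a single keyword and, if a file contains it, print the file's path to the console.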
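import os

rootDir = os.getcwd()

def scan_file(filename, dirname):
    """Print the file's path if it lies under /src and its name or contents contain "hello"."""
    # Only files under a src directory are of interest, so skip the rest without opening them.
    if "/src" not in dirname:
        return
    path = os.path.join(dirname, filename)
    if "hello" in filename:
        print(path)
        return
    # errors="ignore" keeps binary files (e.g. images) from raising UnicodeDecodeError.
    with open(path, errors="ignore") as f:
        for line in f:
            if "hello" in line:
                print(path)
                break  # one hit is enough; stop reading this file

for dirName, subdirList, fileList in os.walk(rootDir):
    for fname in fileList:
        scan_file(fname, dirName)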
Version 2: search for several keywords and write each matching file, together with the keywords it contains, to test.txt.
import os

rootDir = os.getcwd()
keywords = ["hello", "world", "thanks"]

def scan_file(filename, dirname, keyword):
    """Return True if the file lies under /src and its name or contents contain keyword."""
    if "/src" not in dirname:
        return False
    if keyword in filename:
        return True
    # errors="ignore" keeps binary files (e.g. images) from raising UnicodeDecodeError.
    with open(os.path.join(dirname, filename), errors="ignore") as f:
        return any(keyword in line for line in f)

for dirName, subdirList, fileList in os.walk(rootDir):
    for fname in fileList:
        # Keywords found in this file, in the order they are listed above.
        matched = [k for k in keywords if scan_file(fname, dirName, k)]
        if matched:
            with open('test.txt', 'a') as out:
                # Keywords on one line, the file path on the next.
                out.write("".join(k + " ," for k in matched))
                out.write("\n" + os.path.join(dirName, fname) + "\n")
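This version covers the basic functionality but is still rough. Room for further iteration:
1. Algorithm performance, including the code's time complexity, redundancy, and elegance.
2. Readability of the output: ideally, group the files by module and present the results in Excel.
3. Details: exclude files that do not meet the requirements, such as png files.
These are left for the reader to think about.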
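As one possible starting point for the three items above, here is a minimal sketch, not a definitive solution. It reads each file only once for all keywords, skips files by extension, and writes a CSV file (which Excel can open) sorted by module. The names find_keywords, scan_tree, write_report and SKIP_EXTENSIONS are hypothetical, and so is the rule that a "module" is the directory that directly contains src. Unlike the versions above, it matches src as a whole path component rather than as the substring "/src".

```python
import csv
import os

KEYWORDS = ["hello", "world", "thanks"]
# Hypothetical exclusion list; extend it to match your own project.
SKIP_EXTENSIONS = {".png", ".jpg", ".gif", ".zip"}


def find_keywords(path, keywords):
    """Return the keywords found in the file name or contents, reading the file once."""
    name = os.path.basename(path)
    found = {k for k in keywords if k in name}
    with open(path, encoding="utf-8", errors="ignore") as f:
        for line in f:
            found.update(k for k in keywords if k not in found and k in line)
            if len(found) == len(keywords):
                break  # every keyword has been seen; stop reading
    return [k for k in keywords if k in found]  # preserve the keyword order


def scan_tree(root, keywords=KEYWORDS, skip=SKIP_EXTENSIONS):
    """Yield (module, path, matched_keywords) for matching files under a src directory."""
    for dirname, _subdirs, files in os.walk(root):
        parts = os.path.relpath(dirname, root).split(os.sep)
        if "src" not in parts:
            continue
        # Assumption: the module is the directory that directly contains src.
        idx = parts.index("src")
        module = parts[idx - 1] if idx > 0 else "."
        for fname in files:
            if os.path.splitext(fname)[1].lower() in skip:
                continue  # excluded file type, never opened
            path = os.path.join(dirname, fname)
            matched = find_keywords(path, keywords)
            if matched:
                yield module, path, matched


def write_report(root, out_path):
    """Write all matches to a CSV file, sorted by module and then by path."""
    rows = sorted(scan_tree(root))
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["module", "file", "keywords"])
        for module, path, matched in rows:
            writer.writerow([module, path, ", ".join(matched)])
    return rows
```

Calling write_report(os.getcwd(), "report.csv") would scan the current directory tree. With k keywords, each file is opened once instead of k times, and excluded file types are never opened at all.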