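This article uses the Codemao (编程猫) website as its teaching example. In its third update the author worked out code that scrapes two image formats at once, and the article also includes worked examples such as scraping huitu.com (汇图网) and CSDN.
PS: the two-format code is at the end of the article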
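Contents:
- Preface
- Finding the resources
  - Open the Codemao handbook page and locate the image list
  - Get the URL of the image list
- The code
  - Importing the libraries
    - About the re library
    - Code
  - Fetching the whole response
    - Extension: what status codes mean
  - The rest of the code
  - Complete code
- Running it
  - Before running
  - Fixing one problem
- Example: scraping images from huitu.com
  - Analysing the format
  - Modifying the code
  - Result
- Advanced code
  - Code
  - Example
    - Code
    - Result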
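Preface
I am a beginner at web scraping and have been following Codemao's lessons. They recently moved on from video processing to scraping, so I got to benefit too...
Today I'll share a way to download images in bulk
PS: a method for scraping images from huitu.com is included later in the article
Finding the resources
Open the Codemao handbook page and locate the image list
Open https://shequ.codemao.cn/wiki/book, the handbook (图鉴) page of the official Codemao community, press F12 to open the developer tools, then click Network and XHR in turn
Press F5 to refresh, then click each request one by one and check its Response tab until you find the request named all
Get the URL of the image list
This response contains many image URLs, so it is the one we want! Next we need the URL of the request itself: click Headers and you will find the request URL https://api.codemao.cn/api/sprite/list/all
Open that URL and there is the data!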
{"code":200,"msg":"成功","description":"Http request finish without mistake","data":{"sprite_list":[{"id":32,"name":"编程猫","faction_id":2,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/001%E7%BC%96%E7%A8%8B%E7%8C%AB.png","NO":1,"faction_name":"普通"},{"id":29,"name":"猫老祖","faction_id":12,"star":6,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/002%E7%8C%AB%E8%80%81%E7%A5%96.png","NO":2,"faction_name":"神圣"},{"id":57,"name":"黑色编程猫","faction_id":2,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook/%E9%BB%91%E8%89%B2%E7%BC%96%E7%A8%8B%E7%8C%AB%E5%9B%BE%E9%89%B4%E5%89%AF%E6%9C%AC.png","NO":6,"faction_name":"普通"},{"id":26,"name":"木叶龙","faction_id":3,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/003%E6%9C%A8%E5%8F%B6%E9%BE%99.png","NO":7,"faction_name":"草"},{"id":40,"name":"雷电猴","faction_id":5,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/006%E9%9B%B7%E7%94%B5%E7%8C%B4.png","NO":10,"faction_name":"电"},{"id":24,"name":"星能猫","faction_id":11,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/009%E6%98%9F%E8%83%BD%E7%8C%AB2.png","NO":13,"faction_name":"超能"},{"id":30,"name":"疾风雀","faction_id":10,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/012%E7%96%BE%E9%A3%8E%E9%9B%80.png","NO":16,"faction_name":"飞行"},{"id":22,"name":"导弹鲨","faction_id":7,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/015%E6%8D%A3%E8%9B%8B%E9%B2%A8.png","NO":18,"faction_name":"水"},{"id":71,"name":"花粉虫","faction_id":6,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/021-花粉虫-V2.png","NO":21,"faction_name":"虫"},{"id":33,"name":"花粉蝶","faction_id":6,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E8%8A%B1%E7%B2%89%E8%9D%B62.png","NO":23,"faction_name":"虫"},{"id":20,"name":"呆鲤鱼","faction_id":7,"star":1,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/024%E5%91%86%E9%B2%A4%E9%B1%BC
.png","NO":24,"faction_name":"水"},{"id":21,"name":"大黄鸡","faction_id":9,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/027%E5%A4%A7%E9%BB%84%E9%B8%A1.png","NO":27,"faction_name":"机械"},{"id":28,"name":"熔岩龙","faction_id":8,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E7%86%94%E5%B2%A9%E9%BE%99.png","NO":32,"faction_name":"火"},{"id":31,"name":"笨笨鸭","faction_id":7,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/032%E4%B8%91%E5%B0%8F%E9%B8%AD.png","NO":33,"faction_name":"水"},{"id":34,"name":"草灵灵","faction_id":3,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/034%E8%8D%89%E7%81%B5%E7%81%B5.png","NO":35,"faction_name":"草"},{"id":50,"name":"象牙螺","faction_id":7,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/036%E8%B1%A1%E7%89%99%E8%9E%BA.png","NO":37,"faction_name":"水"},{"id":36,"name":"蓝雀","faction_id":10,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/038%E8%93%9D%E9%9B%80.png","NO":39,"faction_name":"飞行"},{"id":37,"name":"达达蟹","faction_id":7,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/058%E8%BE%BE%E8%BE%BE%E8%9F%B92.png","NO":41,"faction_name":"水"},{"id":42,"name":"飞电鼠","faction_id":5,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/043%E9%A3%9E%E7%94%B5%E9%BC%A02.png","NO":43,"faction_name":"电"},{"id":58,"name":"独角蛛","faction_id":6,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/068%E7%8B%AC%E8%A7%92%E8%9B%9B.png","NO":45,"faction_name":"虫"},{"id":59,"name":"妙音龙","faction_id":11,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/069%E5%A6%99%E9%9F%B3%E9%BE%99.png","NO":47,"faction_name":"超能"},{"id":61,"name":"地龙","faction_id":4,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/070%E5%9C%B0%E9%BE%99.png","NO":49,"faction_name":"地"},{"id":62,"name":"绅士猫","faction_id":2,"star":3,"handbook_image":"https://static.
codemao.cn/sprite/handbook-v2/071%E7%BB%85%E5%A3%AB%E7%8C%AB3.png","NO":52,"faction_name":"普通"},{"id":65,"name":"拳击袋鼠","faction_id":4,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/074%E6%8B%B3%E5%87%BB%E8%A2%8B%E9%BC%A0.png","NO":54,"faction_name":"地"},{"id":64,"name":"冰牙犬","faction_id":7,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/073%E5%86%B0%E7%89%99%E7%8A%AC.png","NO":55,"faction_name":"水"},{"id":70,"name":"小火熊","faction_id":8,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E5%B0%8F%E7%81%AB%E7%86%8A.png","NO":57,"faction_name":"火"},{"id":63,"name":"晴天娃娃","faction_id":10,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/072%E6%99%B4%E5%A4%A9%E5%A8%83%E5%A8%83.png","NO":62,"faction_name":"飞行"},{"id":67,"name":"涂鸦狐","faction_id":2,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E6%B6%82%E9%B8%A6%E7%8B%B8.png","NO":64,"faction_name":"普通"},{"id":72,"name":"炸弹齿轮","faction_id":9,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E7%82%B8%E5%BC%B9%E9%BD%BF%E8%BD%AE.png","NO":66,"faction_name":"机械"},{"id":39,"name":"阿尔法","faction_id":11,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/051-阿尔法-V2.png","NO":78,"faction_name":"超能"},{"id":68,"name":"画笔海龟","faction_id":7,"star":3,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E7%94%BB%E7%AC%94%E6%B5%B7%E9%BE%9F.png","NO":79,"faction_name":"水"},{"id":69,"name":"网络爬虫","faction_id":6,"star":4,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/%E7%BD%91%E7%BB%9C%E7%88%AC%E8%99%AB.png","NO":80,"faction_name":"虫"},{"id":44,"name":"贝塔1","faction_id":11,"star":5,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/115%E8%B4%9D%E5%A1%941.png","NO":81,"faction_name":"超能"},{"id":45,"name":"贝塔2","faction_id":11,"star":5,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/116%E8%B4%9D%E5%A1%942.png","NO":82,"faction_name
":"超能"},{"id":56,"name":"玛洛斯","faction_id":12,"star":6,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/064%E7%8E%9B%E6%B4%9B%E6%96%AF.png","NO":84,"faction_name":"神圣"},{"id":66,"name":"玛洛斯-贝塔","faction_id":11,"star":6,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/075%E6%9A%97%E9%BB%91%E7%8E%9B%E6%B4%9B%E6%96%AF.png","NO":85,"faction_name":"超能"},{"id":73,"name":"黄金呆鲤鱼","faction_id":7,"star":2,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/huangjingdaliyu.png","NO":86,"faction_name":"水"}]}}
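The data follows a pattern: every image URL starts with https and ends with png.
So all we need is a regular expression!
First, a quick look at wildcards: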
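To make the wildcards concrete, here is a small sketch that runs offline on a shortened two-entry excerpt of the response above (the excerpt keeps only a few fields, which is my simplification). It compares a greedy pattern with the lazy `https.*?png` pattern used later in this article.

```python
import re

# Shortened excerpt of the response shown above (two entries, fewer fields).
sample = (
    '{"id":32,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/'
    '001%E7%BC%96%E7%A8%8B%E7%8C%AB.png","NO":1},'
    '{"id":29,"handbook_image":"https://static.codemao.cn/sprite/handbook-v2/'
    '002%E7%8C%AB%E8%80%81%E7%A5%96.png","NO":2}'
)

# .   matches any single character (except a newline)
# *   repeats the previous item zero or more times, as many as possible
# ?   placed after * makes the repetition lazy: as few characters as possible
greedy = re.findall(r'https.*png', sample)
lazy = re.findall(r'https.*?png', sample)

print(len(greedy))  # 1: one long match running up to the last "png"
print(len(lazy))    # 2: one match per image URL
for url in lazy:
    print(url)
```

The lazy form is what lets one pattern pick out each URL separately instead of swallowing everything between the first https and the last png.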
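The code
Importing the libraries
About the re library
This time we need the familiar requests library and the string-matching library re. A short introduction to re: it is part of Python's standard library and is mainly used for string matching.
To use it:
import re
There are two ways to write a regular expression:
Raw string type:
Regular expressions for the re library are usually written as raw strings: r'text'
For example: r'[1-9]\d{5}'
A raw string is a string in which the backslash is not treated as an escape character
Ordinary string type, which is more cumbersome because every backslash must be doubled.
For example: '[1-9]\\d{5}' and '\\d{3}-\\d{8}|\\d{4}-\\d{7}'
When a regular expression contains backslashes, writing it as a raw string is recommended.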
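A quick check of the two notations, as a minimal sketch: the raw string and the doubled-backslash string below are the same pattern. The sample sentence is made up for illustration.

```python
import re

raw = r'[1-9]\d{5}'       # raw string: the backslash is kept literally
escaped = '[1-9]\\d{5}'   # ordinary string: the backslash must be doubled

print(raw == escaped)     # True: both spell the same seven-token pattern
print(len(r'\d'))         # 2: a backslash followed by the letter d

# Six digits that do not start with 0 (sample text is hypothetical).
print(re.findall(raw, 'code 100081, not 012345'))  # ['100081']
```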
The main functions of the re library:
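The sketch below tries the commonly used functions of the standard re module on one made-up sentence. Which functions count as "main" is my selection, and the sample text is hypothetical.

```python
import re

text = "BIT 100081, TSU 100084"   # hypothetical sample text
pattern = r'[1-9]\d{5}'

first = re.search(pattern, text)       # first match anywhere in the string
at_start = re.match(pattern, text)     # matches only at the very beginning
every = re.findall(pattern, text)      # all matches, as a list of strings
pieces = re.split(pattern, text)       # the text between the matches
positions = [m.start() for m in re.finditer(pattern, text)]  # match objects, one by one
masked = re.sub(pattern, "<code>", text)   # replace every match
compiled = re.compile(pattern)         # reusable pattern object

print(first.group())            # 100081
print(at_start)                 # None, because the text starts with "BIT"
print(every)                    # ['100081', '100084']
print(pieces)                   # ['BIT ', ', TSU ', '']
print(positions)                # [4, 16]
print(masked)                   # BIT <code>, TSU <code>
print(compiled.findall(text))   # ['100081', '100084']
```

This article only needs re.compile and findall, which return every matching URL as a plain list.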
Code
Ahem, enough talk. Here is the code that imports the libraries:
import requests
import re
Fetching the whole response
webPage=requests.get("https://api.codemao.cn/api/sprite/list/all")
webPage=webPage.text
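You may ask: why add webPage=webPage.text?
Without it (code below), print shows only the Response object, for example <Response [200]> when the request succeeds, rather than the text that came back: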
import requests
import re
webPage=requests.get("https://api.codemao.cn/api/sprite/list/all")
print(webPage)
Extension: what status codes mean
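The number that print(webPage) shows inside the Response object is the HTTP status code. As a rough guide, the sketch below uses the standard library's http.HTTPStatus to print the official phrase for a few common codes. The selection of codes and the helper name describe are my own.

```python
from http import HTTPStatus

def describe(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase

# 2xx: success, 3xx: redirection, 4xx: client error, 5xx: server error
for code in (200, 301, 403, 404, 500):
    print(code, describe(code))
```

With requests, the same number is available as webPage.status_code, and webPage.raise_for_status() raises an exception for 4xx and 5xx responses.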
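The rest of the code
There is quite a lot of it, so here it is in one go, with the explanations in the comments
image_re = re.compile(r'https.*?png')  # build the regular expression
sprite_image = image_re.findall(webPage)  # extract every matching URL into a list
a = range(len(sprite_image))  # the list indices 0, 1, 2, ...
for b in a:
    sprite_image_1 = requests.get(sprite_image[b])  # request the image
    # Save the data. open() creates the file, or opens it if it already exists:
    # the first argument is the file name (b replaces %s at run time),
    # the second is the mode ("wb" = write bytes).
    with open("图鉴%s.png" % b, "wb") as spritePage:  # closed and saved automatically
        spritePage.write(sprite_image_1.content)  # write the image data
    print("Saved image number %s\n" % b)  # progress message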
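Complete code
import requests
import re

# fetch the content
webPage = requests.get("https://api.codemao.cn/api/sprite/list/all")
webPage = webPage.text
image_re = re.compile(r'https.*?png')  # build the regular expression
sprite_image = image_re.findall(webPage)  # extract every matching URL into a list
a = range(len(sprite_image))  # the list indices 0, 1, 2, ...
for b in a:
    sprite_image_1 = requests.get(sprite_image[b])  # request the image
    # Save the data. open() creates the file, or opens it if it already exists:
    # the first argument is the file name (b replaces %s at run time),
    # the second is the mode ("wb" = write bytes).
    with open("图鉴%s.png" % b, "wb") as spritePage:  # closed and saved automatically
        spritePage.write(sprite_image_1.content)  # write the image data
    print("Saved image number %s\n" % b)  # progress message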
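Since this endpoint returns JSON, an alternative to the regular expression is to parse the response and read the handbook_image field of each entry. This is a minimal sketch rather than the method used in this article: it runs offline on a two-entry excerpt of the response shown earlier (shortened to a few fields), and image_urls is a hypothetical helper name.

```python
import json

# Two entries taken from the response shown earlier, with some fields omitted.
sample_text = (
    '{"code":200,"msg":"成功","data":{"sprite_list":['
    '{"id":32,"name":"编程猫","handbook_image":'
    '"https://static.codemao.cn/sprite/handbook-v2/001%E7%BC%96%E7%A8%8B%E7%8C%AB.png","NO":1},'
    '{"id":73,"name":"黄金呆鲤鱼","handbook_image":'
    '"https://static.codemao.cn/sprite/handbook-v2/huangjingdaliyu.png","NO":86}'
    ']}}'
)

def image_urls(response_text):
    """Return the handbook_image URL of every sprite in the response text."""
    payload = json.loads(response_text)
    return [sprite["handbook_image"] for sprite in payload["data"]["sprite_list"]]

for url in image_urls(sample_text):
    print(url)
```

With requests, the parsed dictionary could come from requests.get(...).json() instead of json.loads, assuming the endpoint still returns this structure. Parsing also gives access to the name and NO fields, which could be used for file names.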
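Running it
Before running
Save the script in a folder of its own
Now run it!
Open the folder and take a look!
The images were downloaded successfully!
Fixing one problem
Because list indices start at 0, one image ends up numbered 0. What if we add b=b+1
……
We have no choice but to rename it by hand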
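One possible way around the renaming, sketched under assumptions: keep the loop over the list untouched and let enumerate supply a separate counter that starts at 1, so the counter is used only for the file name. The URLs below are placeholders standing in for sprite_image, and numbered_names is a hypothetical helper.

```python
# Placeholder list standing in for sprite_image from the code above.
sprite_image = ["https://example.com/a.png", "https://example.com/b.png"]

def numbered_names(urls, template="图鉴%s.png"):
    """Pair each URL with a file name numbered from 1 instead of 0."""
    return [(template % number, url) for number, url in enumerate(urls, start=1)]

for name, url in numbered_names(sprite_image):
    print(name, url)
```

In the download loop, each name would be passed to open() and each url to requests.get(), so no list index is ever shifted.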
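Example: scraping images from huitu.com
Analysing the format
Open http://www.huitu.com/
This site is built very tightly. After searching for ages I found one image URL!
From it we can tell that image URLs have the form http://show.huitu.com/pic/…………jpg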
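When a fixed prefix like this goes into a regular expression, its dots are wildcards unless they are escaped. The sketch below uses re.escape for the prefix and compares the result with the unescaped pattern. Both URLs in the sample text are made up for illustration; only the prefix comes from the article.

```python
import re

prefix = "http://show.huitu.com/pic/"

strict = re.compile(re.escape(prefix) + r'.*?jpg')   # dots in the prefix are literal
loose = re.compile(r'http://show.huitu.com/pic/.*?jpg')  # dots match any character

# Hypothetical page fragment: the second address only looks similar.
sample = (
    'src="http://show.huitu.com/pic/20200101/example.jpg" '
    'alt="http://showXhuitu.com/pic/other.jpg"'
)

print(strict.findall(sample))  # only the genuine show.huitu.com address
print(loose.findall(sample))   # also picks up the look-alike address
```

On a real page the difference rarely matters, but escaping keeps the pattern from matching more than intended.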
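Modifying the code
Change the code to
import requests
import re

# fetch the content
webPage = requests.get("http://www.huitu.com/")
webPage = webPage.text
image_re = re.compile(r'http://show\.huitu\.com/pic/.*?jpg')  # build the regular expression (dots escaped)
sprite_image = image_re.findall(webPage)  # extract every matching URL into a list
a = range(len(sprite_image))  # the list indices 0, 1, 2, ...
for b in a:
    sprite_image_1 = requests.get(sprite_image[b])  # request the image
    # Save the data. The first argument of open() is the file name
    # (b replaces %s at run time), the second is the mode ("wb" = write bytes).
    # The images are jpg files, so they are saved with a .jpg extension.
    with open("图%s.jpg" % b, "wb") as spritePage:  # closed and saved automatically
        spritePage.write(sprite_image_1.content)  # write the image data
    print("Saved image number %s\n" % b)  # progress message
We changed the URL passed to requests.get and the regular expression, and changed the name the files are saved under
Result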
Advanced code
Code
The advanced code can fetch two formats in one run
Code:
import requests
import re

webPage = requests.get("URL")  # placeholder: put the address of the page to scrape here
webPage = webPage.text

# jpg images
image_re_jpg = re.compile(r'https.*?jpg')
sprite_image_jpg = image_re_jpg.findall(webPage)
a = range(len(sprite_image_jpg))
for b in a:
    sprite_image_1_jpg = requests.get(sprite_image_jpg[b])  # request the image
    with open("jpg图%s.jpg" % b, "wb") as spritePage_jpg:
        spritePage_jpg.write(sprite_image_1_jpg.content)  # write the image data
    print("Saved jpg image number %s\n" % b)

# png images
image_re_png = re.compile(r'https.*?png')
sprite_image_png = image_re_png.findall(webPage)
c = range(len(sprite_image_png))
for d in c:
    sprite_image_1_png = requests.get(sprite_image_png[d])  # request the image
    with open("png图%s.png" % d, "wb") as spritePage_png:
        spritePage_png.write(sprite_image_1_png.content)  # write the image data
    print("Saved png image number %s\n" % d)
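The two passes above could also be folded into one, as a sketch under assumptions: a single pattern with the alternation (jpg|png) stops at whichever extension comes first. That also avoids a case where `https.*?jpg` starts at a png address and runs on until the next jpg. The sample HTML and the helper name find_images are hypothetical.

```python
import re

# One pattern for both formats; group 1 captures which extension was found.
image_re = re.compile(r'https.*?(jpg|png)')

def find_images(page_text):
    """Return (url, extension) pairs in the order they appear in the text."""
    return [(m.group(0), m.group(1)) for m in image_re.finditer(page_text)]

# Hypothetical page fragment with one image of each format.
sample = '<img src="https://example.com/a.png"> <img src="https://example.com/b.jpg">'

for index, (url, ext) in enumerate(find_images(sample)):
    # Same naming scheme as above: "png图0.png", "jpg图1.jpg", ...
    print("%s图%s.%s" % (ext, index, ext), url)

# For comparison: the jpg-only pattern spans from the png address to the jpg.
print(re.findall(r'https.*?jpg', sample))
```

Each (url, ext) pair can then be downloaded with requests.get(url) and written to a file with the matching extension, exactly as in the loops above.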
Example
Code
import requests
import re

webPage = requests.get("https://blog.csdn.net/weixin_43233491")
webPage = webPage.text

# jpg images
image_re_jpg = re.compile(r'https.*?jpg')
sprite_image_jpg = image_re_jpg.findall(webPage)
a = range(len(sprite_image_jpg))
for b in a:
    sprite_image_1_jpg = requests.get(sprite_image_jpg[b])  # request the image
    with open("jpg图%s.jpg" % b, "wb") as spritePage_jpg:
        spritePage_jpg.write(sprite_image_1_jpg.content)  # write the image data
    print("Saved jpg image number %s\n" % b)

# png images
image_re_png = re.compile(r'https.*?png')
sprite_image_png = image_re_png.findall(webPage)
c = range(len(sprite_image_png))
for d in c:
    sprite_image_1_png = requests.get(sprite_image_png[d])  # request the image
    with open("png图%s.png" % d, "wb") as spritePage_png:
        spritePage_png.write(sprite_image_1_png.content)  # write the image data
    print("Saved png image number %s\n" % d)