[Java Crawler] 004
I. Two Ways to Fetch HTML
1. Method 1: create a Connection object and fetch the HTML directly
Example code:
package com.zb.book.jsoup;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Document object (jsoup needs an absolute http(s):// URL; the host is omitted here)
        Document document = Jsoup.connect("/?m=vod-type-id-1.html").get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
2. Method 2: obtain a Response object first, then get the HTML from it
(The example also shows how to read other information from the Response object.)
Example code:
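package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import java.io.IOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
public class Main {
    public static void main(String[] args) throws IOException {
        // Obtain the Response object first, then read the HTML from it.
        // By default execute() throws HttpStatusException for a non-OK status;
        // call ignoreHttpErrors(true) if you want to inspect such responses yourself.
        Connection.Response response = Jsoup.connect("/?m=vod-type-id-1.html")
                .method(Connection.Method.GET)
                .execute();
        // Requested URL
        URL url = response.url();
        System.out.println("Request URL: " + url);
        // Response status code
        int statusCode = response.statusCode();
        System.out.println("Status code: " + statusCode);
        // Response content type
        String contentType = response.contentType();
        System.out.println("Content type: " + contentType);
        // Response status message
        String statusMessage = response.statusMessage();
        System.out.println("Status message: " + statusMessage);
        // A status code of 200 means the request succeeded
        if (statusCode == 200) {
            // Get the HTML, decoded as UTF-8
            String html = new String(response.bodyAsBytes(), StandardCharsets.UTF_8);
            // Alternatively, parse into a Document (same content as the HTML, but structured):
            // Document document = response.parse();
            System.out.println(html);
        }
    }
}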
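The example above decodes bodyAsBytes() as UTF-8 unconditionally, which garbles pages served in another encoding (GBK is common on Chinese sites). Below is a minimal JDK-only sketch, assuming the server declares its charset in the Content-Type header (for example text/html; charset=GBK), that picks the decoding charset from that header and falls back to a default. CharsetSniffer and its methods are hypothetical names, not part of jsoup:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class CharsetSniffer {
    /**
     * Extracts the charset parameter from a Content-Type value such as
     * "text/html; charset=GBK". Returns the fallback when the header is null,
     * has no charset parameter, or names a charset this JVM does not support.
     */
    public static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                // Strip optional quotes, e.g. charset="utf-8"
                String name = p.substring("charset=".length()).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both IllegalCharsetNameException and UnsupportedCharsetException
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes a raw body (e.g. from response.bodyAsBytes()) with the declared charset, else UTF-8. */
    public static String decode(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }
}
```

With jsoup you would call CharsetSniffer.decode(response.bodyAsBytes(), response.contentType()). Note that response.body() and response.parse() already take the declared charset into account, so this is mainly useful when you work with the raw bytes.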
II. Setting Request Headers
1. Setting a single request header
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set a single request header
        connect.header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
2. Setting multiple request headers
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set multiple request headers: put them into a Map first
        Map<String, String> headers = new HashMap<>();
        headers.put("Accept", "*/*");
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        headers.put("Referer", "/?m=vod-type-id-1.html");
        headers.put("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36");
        connect.headers(headers);
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
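3. Common practice
Approach:
Use a static Builder class that holds all the parameters you need;
Pick the User-Agent and Referer at random from a list (to make the crawler harder for a site's anti-crawling checks to detect);
Common User-Agent strings: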
1) Chrome
Win7:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1
2) Firefox
Win7:
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0
3) Safari
Win7:
Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50
4) Opera
Win7:
Opera/9.80 (Windows NT 6.1; U; zh-cn) Presto/2.9.168 Version/11.50
5) IE
Win7 + IE9:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; Tablet PC 2.0; .NET4.0E)
Win7 + IE8:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; InfoPath.3)
WinXP + IE8:
Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.0)
WinXP + IE7:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)
WinXP + IE6:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)
6) Maxthon
Maxthon 3.1.7 on Win7 + IE9, high-speed mode:
Mozilla/5.0 (Windows; U; Windows NT 6.1; ) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12
Maxthon 3.1.7 on Win7 + IE9, IE-engine compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)
7) Sogou
Sogou 3.0 on Win7 + IE9, IE-engine compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)
Sogou 3.0 on Win7 + IE9, high-speed mode:
Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.33 Safari/534.3 SE 2.X MetaSr 1.0
8) 360
360 Browser 3.0 on Win7 + IE9:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)
9) QQ Browser
QQ Browser 6.9 (11079) on Win7 + IE9, fast mode:
Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1 QQBrowser/6.9.11079.201
QQ Browser 6.9 (11079) on Win7 + IE9, IE-engine compatibility mode:
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E) QQBrowser/6.9.11079.201
10) Aliyun Browser (阿云浏览器)
Aliyun Browser 1.3.0.1724 Beta (built 2011-12-05) on Win7 + IE9:
Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)
Code example:
The static Builder class:
package com.zb.book.jsoup.data;
import java.util.Arrays;
import java.util.List;
public class Builder {
    // Common User-Agent strings
    private static final String[] userAgentStrs = {
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/14.0.835.163 Safari/535.1",
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0",
            "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/534.50 (KHTML, like Gecko) Version/5.1 Safari/534.50",
            "Opera/9.80 (Windows NT 6.1; U; zh-cn) Presto/2.9.168 Version/11.50",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Win64; x64; Trident/5.0; .NET CLR 2.0.50727; SLCC2; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; Tablet PC 2.0; .NET4.0E)",
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; InfoPath.3)",
            "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB7.0)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)",
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
            "Mozilla/5.0 (Windows; U; Windows NT 6.1; ) AppleWebKit/534.12 (KHTML, like Gecko) Maxthon/3.0 Safari/534.12",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0)",
            "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.33 Safari/534.3 SE 2.X MetaSr 1.0",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E)",
            "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/535.1 (KHTML, like Gecko) Chrome/13.0.782.41 Safari/535.1 QQBrowser/6.9.11079.201",
            "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E) QQBrowser/6.9.11079.201",
            "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"
    };
    // User-Agent pool
    public static List<String> userAgentList = Arrays.asList(userAgentStrs);
    // Size of the User-Agent pool
    public static int userAgentSize = userAgentList.size();
    // Referer pool; add more referers as needed
    public static final String[] refererStrs = {
            "https://www.***/"
    };
    // Referer pool as a list
    public static List<String> refererList = Arrays.asList(refererStrs);
    // Size of the Referer pool
    public static int refererSize = refererList.size();
    // Accept, Accept-Language and Accept-Encoding values
    public static String accept = "*/*";
    public static String acceptLanguage = "zh-cn,zh;q=0.5";
    public static String acceptEncoding = "gzip, deflate";
    public static String host;
}
The Main test class:
package com.zb.book.jsoup;
import com.zb.book.jsoup.data.Builder;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set the host (not configured in this example)
        Builder.host = "";
        // One Random instance is enough for both random picks
        Random random = new Random();
        // Set multiple request headers: put them into a Map first
        Map<String, String> headers = new HashMap<>();
        headers.put("Accept", Builder.accept);
        headers.put("Content-Type", "application/x-www-form-urlencoded");
        // Pick a random Referer
        headers.put("Referer", Builder.refererList.get(random.nextInt(Builder.refererSize)));
        // Pick a random User-Agent
        headers.put("User-Agent", Builder.userAgentList.get(random.nextInt(Builder.userAgentSize)));
        headers.put("Accept-Language", Builder.acceptLanguage);
        headers.put("Accept-Encoding", Builder.acceptEncoding);
        connect.headers(headers);
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
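The random selection done by Builder and Main can also be packaged as a small reusable helper that returns a ready-made header Map for connect.headers(...). This is a sketch assuming Java 9+ (for List.of); RandomHeaders is a hypothetical class, its two-entry User-Agent pool is a shortened illustration, and https://example.com/ is a placeholder Referer:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

public class RandomHeaders {
    // Illustrative pools; in the article these come from Builder.userAgentList / Builder.refererList
    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0",
            "Opera/9.80 (Windows NT 6.1; U; zh-cn) Presto/2.9.168 Version/11.50");
    static final List<String> REFERERS = List.of("https://example.com/");

    /** Picks one element uniformly at random from a non-empty list. */
    static String pick(List<String> pool, Random random) {
        return pool.get(random.nextInt(pool.size()));
    }

    /** Builds a header map suitable for passing to Connection.headers(Map). */
    public static Map<String, String> build(Random random) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Accept", "*/*");
        headers.put("Accept-Language", "zh-cn,zh;q=0.5");
        headers.put("Referer", pick(REFERERS, random));
        headers.put("User-Agent", pick(USER_AGENTS, random));
        return headers;
    }
}
```

Passing a single Random in, instead of creating a new one for every pick, also lets you seed it when you need to reproduce exactly which headers a failing request used.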
III. Setting Request Parameters
1. Six overloads
(The first three are the most commonly used; code examples follow.)
 Connection data(String key, String value);
Connection data(String... keyvals);
Connection data(Map<String, String> data);
Connection data(String key, String filename, InputStream inputStream);
Connection data(String key, String filename, InputStream inputStream, String contentType);
Connection data(Collection<Connection.KeyVal> data);
2. Example of the first method
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set the request parameters (the key step): one key/value pair per call
        connect.data("key1", "value1").data("key2", "value2");
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
3. Example of the second method
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set the request parameters (the key step): keys and values alternate
        connect.data("key1", "value1", "key2", "value2");
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
4. Example of the third method
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/?m=vod-type-id-1.html");
        // Set the request parameters (the key step): pass them all as a Map
        Map<String, String> data = new HashMap<>();
        data.put("key1", "value1");
        data.put("key2", "value2");
        connect.data(data);
        // Get the Document object
        Document document = connect.get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
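For a GET request, jsoup sends these key/value pairs as the URL query string (for POST they go into the form body). The JDK-only sketch below shows roughly what that encoding produces: it pairs up varargs the way data(String... keyvals) expects (an even number of strings) and URL-encodes the result. QueryParams and its methods are hypothetical names, and URLEncoder.encode(String, Charset) assumes Java 10+:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryParams {
    /** Pairs up alternating keys and values; rejects an odd number of arguments. */
    public static Map<String, String> fromPairs(String... keyvals) {
        if (keyvals.length % 2 != 0) {
            throw new IllegalArgumentException("Must supply an even number of key value pairs");
        }
        Map<String, String> map = new LinkedHashMap<>(); // keeps insertion order, unlike HashMap
        for (int i = 0; i < keyvals.length; i += 2) {
            map.put(keyvals[i], keyvals[i + 1]);
        }
        return map;
    }

    /** Encodes the map as key1=value1&key2=value2 (application/x-www-form-urlencoded). */
    public static String encode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }
}
```

For example, encode(fromPairs("q", "a b&c")) yields q=a+b%26c. Because a Map holds one value per key, the Map-based method cannot express a repeated key, while calling data(key, value) once per pair can.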
IV. Setting a Timeout
1. Case 1: setting a timeout with get()
package com.zb.book.jsoup;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Document object, giving up after the timeout
        Document document = Jsoup
                .connect("/?m=vod-type-id-1.html")
                .timeout(3000) // timeout in milliseconds
                .get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
2. Case 2: setting a timeout with execute()
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Response object
        Connection.Response response = Jsoup
                .connect("/?m=vod-type-id-1.html")
                .method(Connection.Method.GET)
                .timeout(3000) // timeout in milliseconds
                .execute();
        // Get the Document object
        Document document = response.parse();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
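3. Note
If no timeout is set, jsoup uses a default of 30 seconds (30,000 ms); a timeout of 0 means wait indefinitely. When the timeout expires, jsoup throws a SocketTimeoutException.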
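To see a timeout fire without depending on a remote site, the sketch below uses the JDK's own HttpURLConnection rather than jsoup. It opens a local server socket that lets the TCP connection complete but never answers, then sends a GET with connect and read timeouts; the request fails with SocketTimeoutException, the same exception type jsoup reports. TimeoutDemo is a hypothetical class, and the sketch assumes the loopback address 127.0.0.1 is available:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.Proxy;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class TimeoutDemo {
    /** Returns true if a GET against a silent local server fails with SocketTimeoutException. */
    public static boolean timesOut(int timeoutMillis) {
        // Port 0 picks a free port. We never call accept(), but the OS still completes
        // the TCP handshake through the listen backlog, so the client ends up waiting for a reply.
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getByName("127.0.0.1"))) {
            String url = "http://127.0.0.1:" + silent.getLocalPort() + "/";
            // NO_PROXY keeps any system-wide proxy settings out of the experiment
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection(Proxy.NO_PROXY);
            conn.setConnectTimeout(timeoutMillis); // limit for establishing the connection
            conn.setReadTimeout(timeoutMillis);    // limit for waiting on response bytes
            try {
                conn.getResponseCode();            // blocks until headers arrive or the timeout hits
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

jsoup's single timeout value plays the role of both limits here: its documentation describes it as covering connection time plus the time to read the full response.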
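V. Using a Proxy Server
1. What is a proxy server
A proxy server is another server that sits between the client and the web server. With a proxy, the browser no longer fetches data from the web server directly: it sends its request to the proxy server, which retrieves the requested content and passes it back to the browser. You can think of it as a middleman.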
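2. Why use a proxy server
Benefit 1:
It hides the crawler's real IP address, which helps prevent the crawler from being blocked by the server;
Benefit 2:
A crawler with a single fixed IP has to pause for random intervals between requests to avoid being blocked; spreading requests across proxy servers reduces the need for these pauses, which raises data-collection throughput;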
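3. Where to get proxy servers
Free proxies are available from some websites or web APIs, but they are unstable;
Commercial proxies can be bought from paid services; the IP addresses they provide have a higher availability rate and are more stable;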
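4. Two ways to set a proxy server
Note:
The examples below use a single proxy IP address and port for demonstration. In practice you usually build a pool of proxy servers and keep switching between them as you work through the list of URLs;
The two methods: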
 Connection proxy(Proxy proxy);
Connection proxy(String host, int port);
Method 1:
package com.zb.book.jsoup;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
public class Main {
    public static void main(String[] args) throws IOException {
        // Configure an HTTP proxy (IP address and port)
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress("171.221.29.11", 808));
        // Get the Document object through the proxy
        Document document = Jsoup
                .connect("/?m=vod-type-id-1.html")
                .proxy(proxy)
                .get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
Method 2:
package com.zb.book.jsoup;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Document object through the proxy
        Document document = Jsoup
                .connect("/?m=vod-type-id-1.html")
                .proxy("171.221.29.11", 808) // proxy host and port
                .get();
        // Print the document's HTML
        System.out.println(document.html());
    }
}
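The note earlier in this section mentions building a proxy pool and rotating through it. A minimal round-robin pool might look like the following; ProxyPool is a hypothetical class, and any addresses you add are placeholders for your own proxies. Each Proxy it returns can be passed to connect.proxy(...):

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class ProxyPool {
    private final List<Proxy> proxies = new ArrayList<>();
    private final AtomicInteger cursor = new AtomicInteger();

    /** Adds an HTTP proxy. Not synchronized: fill the pool before sharing it between threads. */
    public void add(String host, int port) {
        proxies.add(new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, port)));
    }

    /** Returns the proxies in round-robin order; the counter itself is thread-safe. */
    public Proxy next() {
        if (proxies.isEmpty()) {
            throw new IllegalStateException("proxy pool is empty");
        }
        // floorMod keeps the index valid even after the counter overflows to negative values
        int i = Math.floorMod(cursor.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    public int size() {
        return proxies.size();
    }
}
```

Usage with jsoup would be Jsoup.connect(url).proxy(pool.next()).get(). A production pool would also drop proxies that keep failing; this sketch only rotates.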
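VI. Downloading Images, PDFs, and Other Files
1. Overview
When using jsoup to download images, PDFs, or archives, read the response body as a byte stream and write it to the target file through an output stream, so the file is written byte for byte rather than handled as parsed text;
In addition, for images, PDFs, and other non-HTML files, you must call ignoreContentType(true) before executing the request to obtain the Response; otherwise jsoup rejects the content type and throws an UnsupportedMimeTypeException;
2. Code example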
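Why the exception occurs: jsoup only parses responses whose content type it recognizes as text or XML; its UnsupportedMimeTypeException message lists text/*, application/xml, and application/*+xml. The sketch below approximates that rule to show which responses need ignoreContentType(true). ContentTypeCheck is a hypothetical helper, not jsoup's own code:

```java
import java.util.Locale;
import java.util.regex.Pattern;

public class ContentTypeCheck {
    // application/xml or application/<something>+xml, optionally followed by parameters
    private static final Pattern XML = Pattern.compile("application/(\\w[\\w.-]*\\+)?xml(;.*)?");

    /** True if a text/XML-only parser would accept this Content-Type without ignoreContentType(true). */
    public static boolean isParsable(String contentType) {
        if (contentType == null) {
            return true; // no header, nothing to reject
        }
        String ct = contentType.trim().toLowerCase(Locale.ROOT);
        return ct.startsWith("text/") || XML.matcher(ct).matches();
    }
}
```

Under this rule image/gif and application/pdf are rejected, which is exactly the case the download example below handles.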
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object for the image URL
        Connection connect = Jsoup.connect(".gif");
        // Execute the request; ignoreContentType(true) is required for non-HTML content
        Connection.Response response = connect.method(Connection.Method.GET).ignoreContentType(true).execute();
        // Open the input stream and the buffered output stream; try-with-resources closes both
        try (BufferedInputStream in = response.bodyStream();
             BufferedOutputStream out = new BufferedOutputStream(
                     new FileOutputStream("C:\\Users\\ZiBo\\Desktop\\1.gif"))) {
            // Write the image in 1 KB chunks
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, len);
            }
            out.flush();
        }
    }
}
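On Java 9 and later, the manual buffer loop can be replaced by InputStream.transferTo, and try-with-resources still guarantees that both streams are closed even if the copy fails midway. A minimal sketch with a hypothetical helper name, which you could call as StreamDownload.saveTo(response.bodyStream(), Path.of("1.gif")):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamDownload {
    /**
     * Copies every byte from the body stream into the target file (created or truncated)
     * and returns the number of bytes written. Both streams are closed on exit.
     */
    public static long saveTo(InputStream body, Path target) throws IOException {
        try (InputStream in = body;
             OutputStream out = Files.newOutputStream(target)) {
            return in.transferTo(out); // internal buffered read/write loop
        }
    }
}
```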
VII. HTTPS Request Certificates
1. HTTPS overview
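A URL prefixed with https:// uses the HTTPS protocol, which is HTTP with SSL (Secure Sockets Layer; its modern successor is TLS) added on top. SSL secures network communication and is widely used for authentication and encrypted data transfer between clients and servers.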
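SSL supports mutual authentication (server authentication and client authentication): the server's certificate is sent to the client, and the client can send its own certificate back to the server. Client certificates are rarely used when browsing, and most users do not have one; what HTTPS always requires is a server certificate. The most widely used certificate format is X.509.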
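When a crawler requests a URL prefixed with https://, Java validates the server's X.509 certificate against its trust store. If the certificate cannot be validated (for example, it is self-signed or its chain is incomplete), the request fails with an error that no valid certificate can be found, such as "SSLHandshakeException: PKIX path building failed: unable to find valid certification path to requested target". Crawlers therefore often create their own X.509 trust manager.
2. Code example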
package com.zb.book;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;
import java.io.IOException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
public class JsoupConnectSSLInit {
    public static void main(String[] args) throws IOException {
        initUnSecureTSL();
        String url = "/";
        // Create the connection
        Connection connect = Jsoup.connect(url);
        // Request the page
        Document document = connect.get();
        // Print the HTML
        System.out.println(document.html());
    }
    private static void initUnSecureTSL() {
        // Create a trust manager that does not validate certificates
        final TrustManager[] trustAllCerts = new TrustManager[]{new X509TrustManager() {
            // Check client certificates
            @Override
            public void checkClientTrusted(final X509Certificate[] chain, final String authType) {
                // Do nothing: accept any client certificate
            }
            // Check server certificates
            @Override
            public void checkServerTrusted(final X509Certificate[] chain, final String authType) {
                // Do nothing: accept any server certificate
            }
            // Return the trusted X.509 issuers
            @Override
            public X509Certificate[] getAcceptedIssuers() {
                return new X509Certificate[0]; // an empty array rather than null
            }
        }};
        try {
            // Create an SSLContext and initialize it with the trust manager above
            SSLContext sslContext = SSLContext.getInstance("SSL");
            sslContext.init(null, trustAllCerts, new SecureRandom());
            // Create the SSL socket factory based on that trust manager
            SSLSocketFactory sslSocketFactory = sslContext.getSocketFactory();
            // Make it the default SSLSocketFactory for HttpsURLConnection
            HttpsURLConnection.setDefaultSSLSocketFactory(sslSocketFactory);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
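HttpsURLConnection.setDefaultSSLSocketFactory changes the default for every HttpsURLConnection created afterwards in the JVM, so certificate checking is switched off globally. If only one target needs relaxed checking, you can build the trust-all context and apply it to a single connection instead. The sketch below does this with plain JDK classes; InsecureTls is a hypothetical class, and it requests the "TLS" protocol name rather than the legacy "SSL". Recent jsoup releases also accept a per-request factory through Connection.sslSocketFactory(...), but confirm your version provides it:

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.security.cert.X509Certificate;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManager;
import javax.net.ssl.X509TrustManager;

public class InsecureTls {
    // Accepts every certificate chain without validation
    private static final X509TrustManager TRUST_ALL = new X509TrustManager() {
        @Override
        public void checkClientTrusted(X509Certificate[] chain, String authType) {
            // accept any client certificate
        }

        @Override
        public void checkServerTrusted(X509Certificate[] chain, String authType) {
            // accept any server certificate
        }

        @Override
        public X509Certificate[] getAcceptedIssuers() {
            return new X509Certificate[0];
        }
    };

    /** Builds a TLS context whose only trust manager is TRUST_ALL. */
    public static SSLContext trustAllContext() throws GeneralSecurityException {
        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(null, new TrustManager[]{TRUST_ALL}, new SecureRandom());
        return ctx;
    }

    /** Applies the trust-all socket factory to this one connection; the JVM-wide default is untouched. */
    public static void applyTo(HttpsURLConnection conn) throws GeneralSecurityException {
        conn.setSSLSocketFactory(trustAllContext().getSocketFactory());
    }
}
```

Hostname verification is a separate check that this sketch leaves in place.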
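VIII. Raising the Response Size Limit
1. Explanation
By default, jsoup caps the size of the response body it reads (1 MB in the version used here; the default differs between jsoup releases). Anything beyond the cap is truncated, so a larger image or archive ends up as an incomplete file that cannot be opened. Use maxBodySize(int bytes) to raise the limit; a value of 0 means no limit.
2. Code example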
package com.zb.book.jsoup;
import org.jsoup.Connection;
import org.jsoup.Jsoup;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
public class Main {
    public static void main(String[] args) throws IOException {
        // Get the Connection object
        Connection connect = Jsoup.connect("/%E5%AD%A4%E7%8B%AC5.1.apk");
        // Raise the body size limit, then execute the request (binary content needs ignoreContentType)
        Connection.Response response = connect.maxBodySize(Integer.MAX_VALUE)
                .method(Connection.Method.GET)
                .ignoreContentType(true)
                .execute();
        // Open the input stream and the buffered output stream; try-with-resources closes both
        try (BufferedInputStream in = response.bodyStream();
             BufferedOutputStream out = new BufferedOutputStream(
                     new FileOutputStream("C:\\Users\\ZiBo\\Desktop\\1.apk"))) {
            // Write the file in 1 KB chunks
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer, 0, buffer.length)) != -1) {
                out.write(buffer, 0, len);
            }
            out.flush();
        }
    }
}
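To make the effect of the size cap concrete, this sketch reads at most maxBytes from a stream and silently drops the rest, which is what happens to a file larger than the limit; as with jsoup's maxBodySize, 0 is treated as no limit. BoundedRead is a hypothetical helper for illustration, not jsoup's implementation:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BoundedRead {
    /** Reads up to maxBytes from the stream (0 = unlimited) and returns what was read. */
    public static byte[] readCapped(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int remaining = (maxBytes == 0) ? Integer.MAX_VALUE : maxBytes;
        int n;
        // Never ask for more than the bytes still allowed under the cap
        while (remaining > 0 && (n = in.read(buffer, 0, Math.min(buffer.length, remaining))) != -1) {
            out.write(buffer, 0, n);
            remaining -= n;
        }
        return out.toByteArray(); // anything past the cap is left unread
    }
}
```

Reading a 3 MB stream with a 1 MB cap returns exactly 1 MB, which is why a large download looks complete but cannot be opened.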
This article is part of the Tencent Cloud self-media sync program and is shared from the author's personal site/blog. Originally published: 2025-01-06.