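Crawling Web Pages in Java with Jsoup: A Detailed Walkthrough
Date: 2021-05-12 09:11:41 | Category: Java code
This article shows how to fetch a web page in Java and parse it with Jsoup, using a complete code example. It may be a useful reference for study or work.
1. Add the dependencies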
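<!-- Jsoup: HTML parser used for crawling -->
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.10.3</version>
</dependency>
<!-- Apache HttpClient dependency -->
<!-- No version is given in the original; this only resolves if a parent POM or dependencyManagement section supplies one -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
</dependency>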
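2. Write the demo class
Take care to import the right classes: Document and Element come from org.jsoup.nodes, and Elements comes from org.jsoup.select.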
package com.taotao.entity;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.io.IOException;

/**
 * Author: TaoTao 2019/9/26
 */
public class InterfaceTest {
    public static void main(String[] args) throws IOException {
        HttpGet httpGet = new HttpGet("http://www.cnblogs.com/"); // build the GET request
        // try-with-resources closes the client and the response, releasing the connection
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity(); // response body
            String content = EntityUtils.toString(entity, "utf-8"); // page HTML; utf-8 is the fallback charset

            Document doc = Jsoup.parse(content); // parse the HTML into a Document
            Elements elements = doc.getElementsByTag("title"); // all <title> elements
            Element element = elements.get(0); // first match; throws if the page has no <title>
            String title = element.text(); // text() returns plain text, html() returns the inner HTML
            System.out.println("Page title: " + title);

            Element element1 = doc.getElementById("site_nav_top"); // element with id=site_nav_top, or null
            if (element1 != null) {
                String str = element1.text();
                System.out.println("str:" + str);
            }
        }
    }
}
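The parsing half of the demo can also be tried without any network access. The following is a minimal sketch, not part of the original article: the class name JsoupOfflineSketch, the helper methods, and the SAMPLE_HTML string are all hypothetical and exist only for illustration. It uses the same Jsoup calls as the demo (Jsoup.parse, getElementsByTag, getElementById, text, html) and adds null checks for elements that are missing.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupOfflineSketch {
    // Hypothetical sample markup standing in for a downloaded page.
    static final String SAMPLE_HTML =
            "<html><head><title>Sample page</title></head>"
          + "<body><div id=\"site_nav_top\">Code <b>changes</b> the world</div></body></html>";

    // Plain text of the first <title> element, or "" if the page has none.
    static String titleOf(String html) {
        Document doc = Jsoup.parse(html);
        Element title = doc.getElementsByTag("title").first(); // first() is null when nothing matched
        return title == null ? "" : title.text();
    }

    // Plain text of the element with the given id, or null if there is no such element.
    static String textById(String html, String id) {
        Element el = Jsoup.parse(html).getElementById(id);
        return el == null ? null : el.text();
    }

    // Inner HTML (child tags preserved) of the element with the given id, or null if absent.
    static String htmlById(String html, String id) {
        Element el = Jsoup.parse(html).getElementById(id);
        return el == null ? null : el.html();
    }

    public static void main(String[] args) {
        System.out.println(titleOf(SAMPLE_HTML));                  // Sample page
        System.out.println(textById(SAMPLE_HTML, "site_nav_top")); // Code changes the world
        System.out.println(htmlById(SAMPLE_HTML, "site_nav_top")); // markup, including the <b> tag
        System.out.println(textById(SAMPLE_HTML, "missing"));      // null
    }
}
```

Keeping the parsing in small methods that take an HTML string makes it possible to check the extraction logic against saved pages before pointing it at a live site.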






