Python Web Scraping in Practice: Scraping Jokes from Qiushibaike

Founder

2023-05-24 21:55:29

Below is a simple, hands-on Python scraper that shows how to fetch jokes from the text section of Qiushibaike:

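import requests
from bs4 import BeautifulSoup

url = 'https://www.qiushibaike.com/text/'

# Send a browser-like User-Agent instead of the default one used by requests.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36'}

# Set a timeout so the script cannot hang, and fail early on 4xx/5xx responses.
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()

soup = BeautifulSoup(response.text, 'html.parser')

# Each joke is expected in a <div> whose class list includes "article".
articles = soup.find_all('div', class_='article')

for article in articles:
    author_tag = article.find('h2')
    content_div = article.find('div', class_='content')
    content_tag = content_div.find('span') if content_div else None
    if author_tag is None or content_tag is None:
        continue  # skip blocks that do not match the expected layout
    author = author_tag.get_text(strip=True)
    content = content_tag.get_text(strip=True)
    print('Author:', author)
    print('Content:', content)
    print('-' * 50)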

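In this example, we first define the URL of the page to scrape and set the request headers. We then send the HTTP request with the requests library and parse the returned HTML with BeautifulSoup.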

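Next, we use the find_all method to locate every joke on the page, loop over them, extract the author and the content of each one, and print both to the console.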
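The extraction step can be tried without touching the network. The following is a minimal sketch that runs the same find_all logic against a small inline HTML string. The SAMPLE_HTML markup and the parse_articles helper are made up for illustration, and the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page may be structured differently.
SAMPLE_HTML = """
<div class="article">
  <h2> Alice </h2>
  <div class="content"><span>First joke.</span></div>
</div>
<div class="article">
  <div class="content"><span>No author heading here.</span></div>
</div>
<div class="article">
  <h2>Bob</h2>
  <div class="content"><span>Second joke.</span></div>
</div>
"""


def parse_articles(html):
    """Return a list of {'author', 'content'} dicts found in html."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for article in soup.find_all('div', class_='article'):
        author_tag = article.find('h2')
        content_tag = article.find('div', class_='content')
        # Skip blocks that lack either part instead of raising AttributeError.
        if author_tag is None or content_tag is None:
            continue
        results.append({
            'author': author_tag.get_text(strip=True),
            'content': content_tag.get_text(strip=True),
        })
    return results


jokes = parse_articles(SAMPLE_HTML)
for joke in jokes:
    print(joke['author'], '|', joke['content'])
```

Keeping the parsing in a function that takes an HTML string makes it easy to re-check the selectors against a saved copy of the page whenever the site layout changes.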

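Note that Qiushibaike's page structure may change at any time, so the selectors have to be adjusted to match the actual markup. In addition, to reduce the risk of the site banning your IP, consider adding delays between requests and using a randomly chosen User-Agent.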
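One possible way to combine a delay with a random User-Agent is sketched below. The USER_AGENTS pool, the random_headers and polite_get helpers, and the 1 to 3 second default delay are all assumptions chosen for illustration. The session argument is assumed to be a requests.Session (or any object with a compatible get method).

```python
import random
import time

# Hypothetical pool of User-Agent strings; replace with ones you maintain.
USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0',
]


def random_headers():
    """Build request headers with a User-Agent picked at random."""
    return {'User-Agent': random.choice(USER_AGENTS)}


def polite_get(session, url, min_delay=1.0, max_delay=3.0):
    """Sleep for a random interval, then fetch url with random headers.

    session is assumed to be a requests.Session or a compatible object.
    """
    time.sleep(random.uniform(min_delay, max_delay))
    response = session.get(url, headers=random_headers(), timeout=10)
    response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return response
```

With requests installed, this could be called as polite_get(requests.Session(), url) once per page. A random delay and a varied User-Agent only lower the chance of being blocked; they do not guarantee it.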
