Self-hosted hot-list API for major platforms, v1.1.5

By WebRA (webra), updated 2 months ago

v1.1.5 – updated 2024/01/27 16:12:57

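New endpoints:

  • ITHome (IT之家): daily, weekly, most-commented, monthly
  • Tencent News hot topics
  • WeRead (微信读书): rising, new books, masterpieces, fiction, hot searches, high potential, overall
  • Qidian (起点): monthly tickets, bestsellers, reading index, recommendations, favorites, new books by signed authors, new books by public authors, monthly tickets VIP
  • Zongheng (纵横): monthly tickets, 24h bestsellers, new books, recommendations, new-book subscriptions, clicks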

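2023/04/13 10:57:37 (unresolved)
If anyone knows a data source for the WeChat hot list, please tell me; I can take care of deploying the API and keeping it updated.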

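Custom hot-list cache problem (resolved)

In real use the cache problem never went away. I checked the database: the admin panel is set to expire entries after a few minutes, but the expiry time stored in the database is already in the past, so the "expire, then purge the cache" step never actually happens.

I gave up digging through the iotheme theme code and simply created a scheduled event in MySQL that clears the hot-search cache every 6 hours. That way the theme has no way to avoid fetching fresh data.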


CREATE EVENT delete_old_hot_data
ON SCHEDULE EVERY 6 HOUR -- runs once every 6 hours
DO
  DELETE FROM wp_options WHERE option_name like '%hot_data%';

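I am not sure what this does to performance. The hot-list service runs on the same server as WordPress, so response times should in theory be short. I don't know much about caching or PHP, so I'll leave it at that.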

vim ./wp-content/themes/onenav/inc/hot-search.php
---------------
function io_get_hot_search_data(){
    $rule_id    = esc_sql($_REQUEST['id']);
    $type       = esc_sql($_REQUEST['type']);
    #$cache_key  = "io_free_hot_data_{$rule_id}_{$type}";

    #$_data      = get_transient($cache_key);
    # Comment out the two lines below
    #if($_data)
    #    io_error(array("status" => 1, "data" => $_data), false, 10);
-------------
   # Same file as above; this trims the code down a bit. Only do this when the hot-list service and the blog run on the same server
   # Comment out the following
   # $_ua = array(
   #     '[dev]general information acquisition module - level 30 min, version:3.2',
   #     "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.164 Safari/537.36",
   #     "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36",
   # );
   # $default_ua = array('userAgent'=>$_ua[wp_rand(0,2)]);
   # $custom_api = get_option( 'io_hot_search_list' )[$type.'_list'];
   # $custom_data= $custom_api[$rule_id-1];
   # $api_url    = $custom_data['url'];
   # $api_cache  = isset($custom_data['cache']) ? (int)$custom_data['cache'] : 60;
   # $api_data   = isset($custom_data['request_data']) ? io_option_data_to_array($custom_data['request_data']) : '';
   # $api_method = strtoupper(isset($custom_data['request_type']) ? $custom_data['request_type'] : 'get');
   # $api_header = isset($custom_data['headers']) ? io_option_data_to_array($custom_data['headers'], $default_ua) : $default_ua;
   # $api_cookie = isset($custom_data['cookies']) ? io_option_data_to_array($custom_data['cookies']) : '';


   # $http = new Yurun\Util\HttpRequest;
   # $http->headers($api_header);
   # if($api_cookie)
   #     $http->cookies($api_cookie);

   # $response = $http->send($api_url, $api_data, $api_method);
============================================
    # Below the commented-out block, add the following
    $custom_api = get_option( 'io_hot_search_list' )[$type.'_list'];
    $custom_data= $custom_api[$rule_id-1];
    $api_url    = $custom_data['url'];
    $http = new Yurun\Util\HttpRequest;
    $response = $http->get($api_url);

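The theme's custom hot-list JSON data source currently has a caching problem: with the hot-list backend shut down, the home page still shows data, and that data is several days old. The custom source never goes back to the backend to refresh stale data, and the custom cache time has no effect at all.

I have given up hoping for an official fix. If a custom interface is offered, what is the point of it behaving like this?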


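Background

A year ago I bought the One Nav theme from 一为, which came with one year of the hot-list API. The year flew by and the API was about to stop working. For a small site like mine, 98 yuan a year costs more than the server itself.

Since I am too stingy to pay, but also don't want my few visitors to lose the hot lists, I decided to write one myself and share it.

I wrote the code in PyCharm with a plugin called Codeium, which requires registration but is free to use.

As the screenshot below shows, the plugin predicted what I was going to write before I had written any code.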

[Screenshot: Codeium code suggestion in PyCharm]

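My impression: the more repetitive code you write, the better the plugin understands what you are after. It searches everything in the project and offers suggestions; whether to accept one is still your call after checking that the code fits.

Flowchart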

[Flowchart image]

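Changelog and source code

Endpoint                        Description
ip:5000/wuai                    52pojie (吾爱破解) popular threads
ip:5000/zhihu                   Zhihu hot list
ip:5000/bili/day                bilibili site-wide daily ranking
ip:5000/bili/hot                bilibili trending searches
ip:5000/acfun                   AcFun hot list
ip:5000/hupu                    Hupu hot list
ip:5000/smzdm                   SMZDM (什么值得买) hot list
ip:5000/weibo                   Weibo hot list
ip:5000/tieba                   Tieba hot topics
ip:5000/weixin                  not working
ip:5000/ssp                     SSPAI (少数派) hot list
ip:5000/36k/renqi               36Kr popularity ranking
ip:5000/36k/zonghe              36Kr overall ranking
ip:5000/36k/shoucang            36Kr favorites ranking
ip:5000/baidu                   Baidu hot list
ip:5000/douyin                  Douyin hot list
ip:5000/csdn                    CSDN hot list
ip:5000/history                 On this day in history (needs a VPN)
ip:5000/douban                  Douban new movies
ip:5000/ghbk                    果核剥壳
ip:5000/it/day                  ITHome daily
ip:5000/it/week                 ITHome weekly
ip:5000/it/hot                  ITHome most commented
ip:5000/it/month                ITHome monthly
ip:5000/tencent                 Tencent News hot topics
ip:5000/wxbook/soar             WeRead rising
ip:5000/wxbook/new              WeRead new books
ip:5000/wxbook/god              WeRead masterpieces
ip:5000/wxbook/novel            WeRead fiction
ip:5000/wxbook/hot              WeRead hot searches
ip:5000/wxbook/potential        WeRead high potential
ip:5000/wxbook/all              WeRead overall
ip:5000/qidian/yuepiao          Qidian monthly tickets
ip:5000/qidian/changxiao        Qidian bestsellers
ip:5000/qidian/zhisu            Qidian reading index
ip:5000/qidian/tuijina          Qidian recommendations
ip:5000/qidian/shoucang         Qidian favorites
ip:5000/qidian/new              Qidian new books by signed authors
ip:5000/qidian/new_2            Qidian new books by public authors
ip:5000/qidian/yuepiao_vip      Qidian monthly tickets, VIP
ip:5000/zongheng/yuepiao        Zongheng monthly tickets
ip:5000/zongheng/24h            Zongheng 24h bestsellers
ip:5000/zongheng/new            Zongheng new books
ip:5000/zongheng/tuijian        Zongheng recommendations
ip:5000/zongheng/new_dingyue    Zongheng new-book subscriptions
ip:5000/zongheng/dianji         Zongheng clicks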
Endpoint                        Description
ip:5000/get_wuai_data           52pojie (吾爱破解) popular threads
ip:5000/get_zhihu_data          Zhihu hot list
ip:5000/get_bilibili_data       bilibili site-wide daily ranking
ip:5000/get_bilibili_hot        bilibili trending searches
ip:5000/get_acfun_data          AcFun hot list
ip:5000/get_hupu_data           Hupu hot list
ip:5000/get_smzdm_data          SMZDM (什么值得买) hot list
ip:5000/get_weibo_data          Weibo hot list
ip:5000/get_tieba_data          Tieba hot topics
ip:5000/get_weixin_data         not working
ip:5000/get_ssp_data            SSPAI (少数派) hot list
ip:5000/get_36k_data/renqi      36Kr popularity ranking
ip:5000/get_36k_data/zonghe     36Kr overall ranking
ip:5000/get_36k_data/shoucang   36Kr favorites ranking
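
Judging by the source further down, every endpoint returns the same JSON envelope: `secure`, `title`, `update_time`, and a `data` list whose entries carry `index`, `title`, `url`, and either `hot` or `top`. A minimal sketch of a client-side parser under that assumption; `parse_hot_list` and the sample payload are illustrative names and data, not part of the project:

```python
import json


def parse_hot_list(payload: str) -> list:
    """Return (index, title, url, heat) tuples from one endpoint's JSON text."""
    doc = json.loads(payload)
    if not doc.get("secure"):
        # The server sets "secure": false together with "title": "Error".
        raise ValueError("endpoint reported an error")
    rows = []
    for item in doc.get("data", []):
        # Some endpoints name the heat field "hot", others "top".
        heat = item.get("hot", item.get("top", ""))
        rows.append((item["index"], item["title"], item["url"], heat))
    return rows


# Hypothetical sample shaped like the server's output.
sample = json.dumps({
    "secure": True,
    "title": "demo",
    "update_time": "2024-01-27 16:12:57",
    "data": [{"index": 1, "title": "hello", "url": "https://example.com/1", "top": "42"}],
})
print(parse_hot_list(sample))
```

Reading both `hot` and `top` keeps one parser usable across endpoints, since the scrapers are not consistent about which key they emit.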

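New endpoints:

  • ITHome (IT之家): daily, weekly, most-commented, monthly
  • Tencent News hot topics
  • WeRead (微信读书): rising, new books, masterpieces, fiction, hot searches, high potential, overall
  • Qidian (起点): monthly tickets, bestsellers, reading index, recommendations, favorites, new books by signed authors, new books by public authors, monthly tickets VIP
  • Zongheng (纵横): monthly tickets, 24h bestsellers, new books, recommendations, new-book subscriptions, clicks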
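  1. Removed the Xun Proxy (讯代理) code; a free proxy endpoint is used instead (free proxies are less stable, but usable)
  2. New endpoints:
    • On this day in history
      • Your machine must be able to reach zh.wikipedia.org (a VPN may be needed); I could not find a better source
    • Douban new movies
    • 果核剥壳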
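  1. Fixed access to the 52pojie forum and Zhihu
    • On 52pojie, once the request volume goes up the site starts answering 403 Forbidden; the fix is to go through a proxy
      • I use Xun Proxy here; contact me if you need support for another proxy provider
      • Xun Proxy charges 9 yuan for 1,000 IPs. Set the global_timeout_file parameter to 24 so that at most one IP is used per day; 1,000 IPs then last 1,000 days (close to 3 years)
    • Xun Proxy address: that site has since shut down!
    • After registering
      • Click [Buy Proxy] in the top menu bar
      • Choose the premium proxy and click [Select and Buy] below it
      • In the purchase dialog that pops up
        • Set the plan type to [Pay per use]
        • Then click [Confirm Purchase]
    • After purchasing, click [API] and then [Premium Proxy API] in the site's top menu bar
      • Select the order you bought
      • Set the extraction count to [1]
      • Set the data format to [JSON]
      • Click [Generate API Link]
      • Paste the generated link between the double quotes in the code: proxy_address = ""
  2. Fixed the 443 error reported by the code when accessing sites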
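
The 403 workaround described above boils down to "try directly, then retry through a proxy". A minimal sketch of that control flow, assuming a fetcher that returns a `(status, body)` pair; `fetch_with_proxy_fallback` and both fetcher arguments are hypothetical names, and the project's actual proxy handling may differ:

```python
from typing import Callable, Tuple

Fetcher = Callable[[str], Tuple[int, str]]


def fetch_with_proxy_fallback(url: str, direct: Fetcher, via_proxy: Fetcher) -> Tuple[int, str]:
    """Fetch url directly; if the site answers 403, retry once through a proxy."""
    status, body = direct(url)
    if status == 403:
        status, body = via_proxy(url)
    return status, body


# Stand-in fetchers so the flow can be exercised without network access.
def blocked(url: str) -> Tuple[int, str]:
    return 403, ""


def proxied(url: str) -> Tuple[int, str]:
    return 200, "<html>ok</html>"


print(fetch_with_proxy_fallback("https://www.52pojie.cn/", blocked, proxied))
```

With the requests library, the proxied fetcher would typically pass a `proxies={"http": ..., "https": ...}` mapping to `requests.get`, filled in from whatever the proxy provider's API returns.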
  1. Fixed cache files not refreshing after one hour
  2. Added the CSDN hot list
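  1. Trimmed the code to under 500 lines
  2. Updated the Baidu and Douyin hot lists
  3. Shortened the endpoint names
  4. Fixed several bugs
  5. Please use the endpoints of the latest version (this version has bugs); it is kept only for the version history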
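  1. 600+ lines of Python
  2. Covers 52pojie, Zhihu, bilibili, AcFun, Hupu, SMZDM, Weibo, Tieba, SSPAI, and 36Kr
  3. Compared with later versions, this one simply no longer receives endpoint updates; its code has less reuse and is messier, but bugs are still fixed alongside the later versions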
"""
@author: webra
@time: 2023/3/27 13:31
@description: 
@update: 2023/04/03 14:29:35
"""
import glob
import re
import time

import requests
from bs4 import BeautifulSoup
import datetime
import json
from lxml import etree
import random
import os


from flask import Flask

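# Note: both values are computed once, when the module is imported,
# and are reused by every request handled afterwards.
now = datetime.datetime.now()
timestamp = int(time.time())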


app = Flask(__name__)

def read_file(file_path):
    # Read with the same encoding write_file uses, so cached JSON round-trips
    with open(file_path, 'r', encoding='utf-8') as f:
        content = f.read()
        return content

def del_file(file_path):
    if os.path.exists(file_path):
        os.remove(file_path)

def write_file(file_path, content):
    with open(file_path, 'w', encoding='utf-8') as f:
        f.write(content)

@app.route("/")
def index():
    data_dict = {"get_wuai_data": "吾爱热榜",
        "get_zhihu_data": "知乎热榜",
        "get_bilibili_data": "bilibili全站日榜",
        "get_bilibili_hot": "bilibili热搜榜",
        "get_acfun_data": "acfun热榜",
        "get_hupu_data": "虎扑步行街热榜",
        "get_smzdm_data": "什么值得买热榜",
        "get_weibo_data": "微博热榜",
        "get_tieba_data": "百度贴吧热榜",
        "get_weixin_data": "微信热榜",
        "get_ssp_data": "少数派热榜",
        "get_36k_data": "36Kr热榜"}

    json_data = {}
    json_data["secure"] = True
    json_data["title"] = "热榜接口"
    json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
    json_data["data"] = data_dict
    return json.dumps(json_data, ensure_ascii=False)
@app.route("/test")
def test():
    return "test"



user_agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
    "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Mobile Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.135 Safari/537.36 Edge/12.246",
    "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; AS; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; AS; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; AS; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; Trident/7.0; AS; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:54.0) Gecko/20100101 Firefox/54.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36"
]


def random_user_agent():
    return random.choice(user_agents)


def get_html(url, headers, cl=None):
    response = requests.get(url, headers=headers, stream=True)
    if cl is not None:
        return response
    html = BeautifulSoup(response.text, 'html.parser')
    return html

def get_data(filename, filename_re):
    file_names = glob.glob('./data/' + filename)
    file_names_len = len(file_names)
    if file_names_len == 1:
        # file name
        file_name = file_names[0]
        old_timestamp = int(re.findall(filename_re, file_name)[0])
        old_timestamp_datetime_obj = datetime.datetime.fromtimestamp(int(old_timestamp))
        time_diff = datetime.datetime.now() - old_timestamp_datetime_obj
        if time_diff > datetime.timedelta(hours=1):
            del_file(file_name)
            return None
        else:
            return read_file(file_name)
    else:
        if file_names_len > 1:
            [del_file(file_name) for file_name in file_names]
        return None

# 52pojie hot list
@app.route("/get_wuai_data")
def get_wuai_data():
    filename = "wuai_data_*.data"
    filename_re = "wuai_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "吾爱热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7",
            "Accept-Encoding": "gzip, deflate, br",
            "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
            "User-Agent": random_user_agent()
        }

        html = get_html("https://www.52pojie.cn/forum.php?mod=guide&view=hot", headers)
        articles = html.find_all('a', {'class': {"xst"}})
        articles_hot = html.find_all('span', {'class': {"xi1"}})
        num = 1
        for article, hot in zip(articles, articles_hot):
            data_dict = {}
            data_dict["index"] = num
            data_dict["title"] = article.string.strip()
            data_dict["url"] = "https://www.52pojie.cn/" + article.get('href')
            data_dict["hot"] = hot.string.strip()[:-3]
            num += 1
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "wuai_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content


# Zhihu hot list
@app.route("/get_zhihu_data")
def get_zhihu_data():
    filename = "zhihu_data_*.data"
    filename_re = "zhihu_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "知乎热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")

        headers = {
            'cookie': '_zap=3195f691-793a-404b-a1c2-a152ead6b0ef; d_c0=AYBTpKH7IxaPTlkD0uawrZWC1OzdedJsmVg=|1673151895; ISSW=1; YD00517437729195:WM_TID=dCD5z+kvXkZBERAERReVJ4HqjJV+UbEX; _xsrf=ZRKz2ahWNVIiiMuVsgwncHmiJEnJfxNP; __snaker__id=0oLAfHUBBwwPa2yi; YD00517437729195:WM_NI=aUaJNpP+BSZZKrXd9A4FtimocRtXflmIudVaB2GQ3pbhE4PJ/ZZQbOxBl7emji362eQ1fwxZYVDH4VyUpXdoZgqArTuIoIC+WCUJeEIZyTNSJhClos1zt2x+AhoVOlM+SFQ=; YD00517437729195:WM_NIKE=9ca17ae2e6ffcda170e2e6eea2b36bf5f1f783e9468a8a8eb6c15e869b8eb1c8458cafffd8e66da3b781ccd82af0fea7c3b92aa38bafabcd7c94f58eb4d95289ae9885bc4bf89abc8af065a18ebcb9e568af97bb85d56b82ee89b2b27ca2b88bd5d342ed8696b2e145bc9ea4d8e867aab7fdd7b16f93efaa98fb529b8a8ca9b75ebb8bc0b9e7659ab9bfa2f160b797a1aac221959d9babe84f8b86b7b5cd598ab88d8dee79f3a6bcd3fb2183b8a1b7ce688ff59da7d437e2a3; q_c1=0b5f766a729d4411b01730ae52526275|1677133529000|1677133529000; tst=h; z_c0=2|1:0|10:1679897427|4:z_c0|80:MS4xMjVueENnQUFBQUFtQUFBQVlBSlZUVk9CRG1VaWJoVGFZQ0JYaTczbzNLcy1qRXlucUc4SGlRPT0=|2071b379b81c9c1ee8a391002cd616468bc31ac1c1883f75bfaf93f099c5bce3; Hm_lvt_98beee57fd2ef70ccdd5ca52b9740c49=1679672991,1679721877,1679839421,1679897389; SESSIONID=XB0nPsBsUVHWlgxviypJUHFVjvA26oSiK1qkBbd9VO7; JOID=W1gQBEJCI1qbP0u7NExKANeQvpEuE2Ro-28x_EUXfBfOXSzKRx0Xd_E2R7w4R0VDXFi445KUxEew11n4txI14hY=; osd=U1ASB09KK1iYMkOzNk9HCN-SvZwmG2Zr9mc5_kYadB_MXiHCTx8Uevk-Rb81T01BX1Ww65CXyU-41Vr1vxo34Rs=; Hm_lpvt_98beee57fd2ef70ccdd5ca52b9740c49=1679901672; KLBRSID=81978cf28cf03c58e07f705c156aa833|1679902388|1679897426',
            'referer': 'https://www.zhihu.com/',
            'sec-fetch-mode': 'cors',
            'sec-fetch-site': 'same-origin',
            'user-agent': random_user_agent()
        }
        html = get_html("https://www.zhihu.com/hot", headers, 1)
        etree_html = etree.HTML(html.content.decode('utf-8'))
        articles_title = etree_html.xpath('//*[@id="TopstoryContent"]/div/div/div[1]/section/div[2]/a/@title')
        articles_url = etree_html.xpath('//*[@id="TopstoryContent"]/div/div/div[1]/section/div[2]/a/@href')
        articles_hot = etree_html.xpath('//*[@id="TopstoryContent"]/div/div/div[1]/section/div[2]/div/text()')
        num = 1
        for title, url, hot in zip(articles_title, articles_url, articles_hot):
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = title
            data_dict["url"] = url
            data_dict["hot"] = hot[:-2]
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "zhihu_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content

# bilibili site-wide daily ranking (no "hot" field)
@app.route("/get_bilibili_data")
def get_bilibili_data():
    filename = "bilibili_data_*.data"
    filename_re = "bilibili_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "哔哩哔哩全站热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")

        headers = {
            "User-Agent": random_user_agent()
        }

        html = get_html("https://api.bilibili.com/x/web-interface/ranking/v2?rid=0&type=all", headers, 1)
        html_dict = json.loads(html.text)
        num = 1
        for key in html_dict["data"]["list"]:
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = key["title"]
            data_dict["url"] = key["short_link"]
            data_dict["hot"] = ""
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "bilibili_data_" + str(timestamp) + ".data"
        print(filename)
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content

# bilibili trending searches
@app.route("/get_bilibili_hot")
def get_bilibili_hot():
    filename = "bilibili_hot_*.data"
    filename_re = "bilibili_hot_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "bilibili热搜榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        res = requests.get("https://app.bilibili.com/x/v2/search/trending/ranking")
        html_dict = json.loads(res.text)
        for key in html_dict["data"]["list"]:
            data_dict = {}
            data_dict["index"] = key["position"]
            data_dict["title"] = key["show_name"]
            url = "https://search.bilibili.com/all?keyword=" + key[
                "keyword"] + "&from_source=webtop_search&spm_id_from=333.934"
            data_dict["url"] = url
            data_dict["hot"] = ""
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "bilibili_hot_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content


# AcFun hot list
@app.route("/get_acfun_data")
def get_acfun_data():
    filename = "acfun_data_*.data"
    filename_re = "acfun_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "acfun热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://www.acfun.cn/',
            'user-agent': random_user_agent()
        }
        res = get_html("https://www.acfun.cn/rest/pc-direct/rank/channel?channelId=&subChannelId=&rankLimit=30&rankPeriod=DAY", headers, 1)
        html_dict = json.loads(res.text)
        num = 1
        for key in html_dict["rankList"]:
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = key["title"]
            data_dict["url"] = key["shareUrl"]
            data_dict["hot"] = key["viewCountShow"]
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "acfun_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content


# Hupu Buxingjie (步行街) hot list
@app.route("/get_hupu_data")
def get_hupu_data():
    filename = "hupu_data_*.data"
    filename_re = "hupu_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "虎扑步行街热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://hupu.com/',
            'user-agent': random_user_agent()
        }
        res = requests.get("https://bbs.hupu.com/all-gambia", headers=headers)
        etree_html = etree.HTML(res.text)
        articles_title = etree_html.xpath('//*[@id="container"]/div/div[2]/div/div[2]/div/div[2]/div/div/div/div[1]/a/span/text()')
        articles_url = etree_html.xpath('//*[@id="container"]/div/div[2]/div/div[2]/div/div[2]/div/div/div/div[1]/a/@href')
        articles_top = etree_html.xpath('//*[@id="container"]/div/div[2]/div/div[2]/div/div[2]/div/div/div/div[1]/span[1]/text()')

        num = 1
        for title, url, top in zip(articles_title, articles_url, articles_top):
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = title
            data_dict["url"] = "https://bbs.hupu.com/" + url
            data_dict["top"] = top
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "hupu_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content




# SMZDM hot list (daily)
@app.route("/get_smzdm_data")
def get_smzdm_data():
    filename = "smzdm_data_*.data"
    filename_re = "smzdm_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "什么值得买热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://smzdm.com/',
            'user-agent': random_user_agent()
        }
        res = requests.get("https://post.smzdm.com/hot_1/", headers=headers)
        etree_html = etree.HTML(res.text)
        articles_title = etree_html.xpath('//*[@id="feed-main-list"]/li/div/div[2]/h5/a/text()')
        articles_url = etree_html.xpath('//*[@id="feed-main-list"]/li/div/div[2]/h5/a/@href')
        articles_top = etree_html.xpath('//*[@id="feed-main-list"]/li/div/div[2]/div[2]/div[2]/a[1]/span[2]/text()')
        num = 1
        for title, url, top in zip(articles_title, articles_url, articles_top):
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = title
            data_dict["url"] = url
            data_dict["top"] = top
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "smzdm_data_" + str(timestamp) + ".data"
        print(data)
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content



# Weibo hot list
@app.route("/get_weibo_data")
def get_weibo_data():
    filename = "weibo_data_*.data"
    filename_re = "weibo_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "微博热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://weibo.com/',
            'user-agent': random_user_agent()
        }
        res = requests.get("https://weibo.com/ajax/statuses/hot_band", headers=headers)
        html_dict = json.loads(res.text)
        for key in html_dict["data"]["band_list"]:
            try:
                data_dict = {}
                data_dict["index"] = key["realpos"]
                data_dict["title"] = key["word"]
                data_dict["url"] = "https://s.weibo.com/weibo?q=%23" + key["word"] + "%23"
                data_dict["hot"] = key["num"]
            except KeyError:
                continue
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "weibo_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content


# Baidu Tieba hot topics
@app.route("/get_tieba_data")
def get_tieba_data():
    filename = "tieba_data_*.data"
    filename_re = "tieba_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "百度贴吧热议榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://tieba.baidu.com/',
            'user-agent': random_user_agent()
        }
        res = requests.get("https://tieba.baidu.com/hottopic/browse/topicList?res_type=1", headers=headers)
        etree_html = etree.HTML(res.text)
        articles_title = etree_html.xpath('/html/body/div[2]/div/div[2]/div/div[2]/div[1]/ul/li/div/div/a/text()')
        articles_url = etree_html.xpath('/html/body/div[2]/div/div[2]/div/div[2]/div[1]/ul/li/div/div/a/@href')
        articles_top = etree_html.xpath('/html/body/div[2]/div/div[2]/div/div[2]/div[1]/ul/li/div/div/span[2]/text()')
        num = 1
        for title, url, top in zip(articles_title, articles_url, articles_top):
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = title
            data_dict["url"] = url
            data_dict["top"] = top.replace("实时讨论", "")
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "tieba_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content

# WeChat hot list (disabled)
# @app.route("/get_weixin_data")
# def get_weixin_data():
#     json_data = {}
#     data_list = []
#     json_data["secure"] = True
#     json_data["title"] = "微信热榜"
#     json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
#     import requests
#     headers = {
#         'referer': 'https://tophub.today/',
#         'user-agent': random_user_agent()
#     }
#     res = requests.get("https://tophub.today/n/WnBe01o371", headers=headers)
#     etree_html = etree.HTML(res.text)
#     articles_title = etree_html.xpath('//*[@id="page"]/div[2]/div[2]/div[1]/div[2]/div/div[1]/table/tbody/tr/td[2]/a/text()')
#     articles_url = etree_html.xpath('//*[@id="page"]/div[2]/div[2]/div[1]/div[2]/div/div[1]/table/tbody/tr/td[2]/a/@href')
#     articles_top = etree_html.xpath('//*[@id="page"]/div[2]/div[2]/div[1]/div[2]/div/div[1]/table/tbody/tr/td[3]/text()')
#     num = 1
#     for title, url, top in zip(articles_title, articles_url, articles_top):
#         data_dict = {}
#         data_dict["index"] = num
#         num += 1
#         data_dict["title"] = title
#         data_dict["url"] = "https://tophub.today" + url
#         data_dict["top"] = top
#         data_list.append(data_dict)
#     json_data["data"] = data_list
#     return json.dumps(json_data, ensure_ascii=False)



# SSPAI hot list
@app.route("/get_ssp_data")
def get_ssp_data():
    filename = "ssp_data_*.data"
    filename_re = "ssp_data_(.*?).data"
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data = {}
        data_list = []
        json_data["secure"] = True
        json_data["title"] = "少数派热榜"
        json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
        headers = {
            'referer': 'https://sspai.com/',
            'user-agent': random_user_agent()
        }
        res = requests.get(
            "https://sspai.com/api/v1/article/tag/page/get?limit=100000&tag=%E7%83%AD%E9%97%A8%E6%96%87%E7%AB%A0",
            headers=headers)

        html_dict = json.loads(res.text)
        num = 1
        for key in html_dict["data"]:
            try:
                data_dict = {}
                data_dict["index"] = num
                num += 1
                data_dict["title"] = key["title"]
                data_dict["url"] = "https://sspai.com/post/" + str(key["id"])
                data_dict["top"] = key["like_count"]
            except KeyError:
                continue
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "ssp_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content

# 36Kr hot list
@app.route("/get_36k_data")
def html_get_36k_data():
    json_data = {}
    json_data["secure"] = True
    json_data["title"] = "36Kr热榜"
    json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
    data = [
        {"type": "renqi"},
        {"type": "shoucang"},
        {"type": "zonghe"}
    ]
    json_data["data"] = data
    return json.dumps(json_data, ensure_ascii=False)

@app.route("/get_36k_data/<type>")
def get_36k_data_type(type):
    if type == "renqi":
        return get_36k_data("renqi")
    elif type == "shoucang":
        return get_36k_data("shoucang")
    elif type == "zonghe":
        return get_36k_data("zonghe")
    else:
        json_data = {}
        json_data["secure"] = False
        json_data["title"] = "Error"
        return json.dumps(json_data, ensure_ascii=False)



# type =  renqi|  zonghe| shoucang
def get_36k_data(type):
    json_data = {}
    data_list = []
    json_data["update_time"] = now.strftime("%Y-%m-%d %H:%M:%S")
    filename = "36kr_" + type + "_data_*.data"
    filename_re = "36kr_" + type + "_data_(.*?).data"
    if type == "renqi":
        title = "36Kr人气榜"
    elif type == "zonghe":
        title = "36Kr综合榜"
    elif type == "shoucang":
        title = "36Kr收藏榜"
    else:
        json_data["secure"] = False
        json_data["title"] = "Error"
        return json.dumps(json_data, ensure_ascii=False)
    file_content = get_data(filename, filename_re)
    if file_content is None:
        json_data["secure"] = True
        json_data["title"] = title
        headers = {
            'referer': 'https://www.36kr.com/',
            'user-agent': random_user_agent()
        }
        url = "https://www.36kr.com/hot-list/" + type + "/" + now.strftime("%Y-%m-%d") + "/1"
        res = requests.get(url, headers=headers)
        etree_html = etree.HTML(res.text)
        articles_title = etree_html.xpath('//*[@id="app"]/div/div[2]/div[3]/div/div/div[2]/div[1]/div/div/div/div/div[2]/div[2]/a/text()')
        articles_url = etree_html.xpath('//*[@id="app"]/div/div[2]/div[3]/div/div/div[2]/div[1]/div/div/div/div/div[2]/div[2]/a/@href')
        articles_top = etree_html.xpath('//*[@id="app"]/div/div[2]/div[3]/div/div/div[2]/div[1]/div/div/div/div/div[2]/div[2]/div/span/span/text()')
        num = 1
        for title, url, top in zip(articles_title, articles_url, articles_top):
            data_dict = {}
            data_dict["index"] = num
            num += 1
            data_dict["title"] = title
            data_dict["url"] = "https://www.36kr.com" + url
            data_dict["top"] = top.replace("热度", "")
            data_list.append(data_dict)
        json_data["data"] = data_list
        data = json.dumps(json_data, ensure_ascii=False)
        filename = "36kr_" + type + "_data_" + str(timestamp) + ".data"
        write_file("./data/" + filename, data)
        return data
    else:
        return file_content


if __name__ == "__main__":
    app.run()

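I know the code could be optimized; there is a lot of repetition. But it runs and it fetches what I want, so for now I am not going to touch it. The code is simple, essentially one example after another. With a little programming background it should not be hard to follow, so I won't explain further.

2023/04/01 1:32:12: I ended up optimizing it after all, because adding new hot lists turned out to be awkward.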
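
Most of the repetition in the listing is the JSON envelope and the index numbering wrapped around each scraper. A sketch of factoring that out, assuming each scraper is reduced to a function that returns a list of row dicts; `build_payload` and `demo_rows` are hypothetical names, and this is not necessarily the refactoring the author shipped:

```python
import json
from datetime import datetime
from typing import Callable, Dict, List


def build_payload(title: str, fetch_rows: Callable[[], List[Dict]]) -> str:
    """Wrap scraped rows in the envelope every endpoint returns."""
    rows = fetch_rows()
    for position, row in enumerate(rows, start=1):
        # Number rows 1..n unless the scraper already supplied an index.
        row.setdefault("index", position)
    payload = {
        "secure": True,
        "title": title,
        # Taken at call time, so each response carries a current timestamp.
        "update_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "data": rows,
    }
    return json.dumps(payload, ensure_ascii=False)


def demo_rows() -> List[Dict]:
    # Stand-in for a real scraper; returns rows without network access.
    return [{"title": "first", "url": "https://example.com/1", "hot": ""}]


print(build_payload("demo", demo_rows))
```

Under this shape, adding a new hot list means writing only the function that produces the rows.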

Deployment

wiuid/webra_hot_api: self-hosted hot lists for popular sites (github.com)

  1. Open the address above
  2. Download the webra_top.zip package
  3. Put the archive on a linux_x86_64 system
# Unzip the downloaded webra_top.zip
unzip webra_top.zip
cd webra_top
# Make init executable; it is a shell script
chmod +x init

./init
Usage: ./init.sh {start|stop|restart|status}
# Start
./init start
# Stop
./init stop
# Restart
./init restart
# Show status
./init status

# Check that the port is listening
ss -tanpl
State                 Recv-Q                Send-Q                               Local Address:Port                               Peer Address:Port               Process                                         
LISTEN                0                     128                                      127.0.0.1:5000                                    0.0.0.0:*                   users:(("top",pid=106379,fd=3))                

Make sure Python 3 is available in your environment; I won't go into that here.

Everything below is run on your Linux server.

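# Pick a location for the code
# For example, create a top directory under /root
mkdir -p /root/top
cd /root/top
vim top.py
# Press i to enter insert mode
# Paste in all of the code above
:wq # save and quit
# Create a data directory inside top for the cache, so that simultaneous requests scrape each platform only once
mkdir data
# From /root/top/, run
python3 top.py

No Module Named XXX
# At this point you will probably get a missing-module error like the one above
# Run the following for the module it names
pip3 install XXX

# After a few rounds, all required modules should be installed
# Once it runs cleanly, use this last command to keep it running in the background
nohup python3 top.py &

# The default port is 5000; changing it is not supported yet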
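
Instead of discovering missing modules one error at a time, they can be listed in a single pass. A small sketch using only the standard library; `missing_modules` is a hypothetical helper, and the `required` list is read off the imports in top.py:

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Third-party imports used by top.py (import names, not pip package names).
required = ["requests", "bs4", "lxml", "flask"]
print(missing_modules(required))
```

Note that import names and pip package names differ in places: `bs4` is installed as `beautifulsoup4` and `flask` as `Flask`, as the requirements list later in this post shows.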

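Once it is running, go to the One Nav admin panel and configure the custom hot list as in the screenshot below; set up the other lists the same way.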

[Screenshot: custom hot-list settings in the One Nav admin panel]

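2023/04/14 10:32:12: This deployment method is no longer recommended. Deployed this way, the script scrapes the source sites too frequently, which gets your server's IP banned by those sites, after which no data can be fetched.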
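
One possible contributor is visible in the listing above: `timestamp` is assigned once at import, so in a long-running process every cache file written more than an hour after startup is stamped with the startup time and is treated as expired on the very next request. A minimal sketch of a file cache that stamps each file with the current time instead; `save_cached`, `load_cached`, and the one-hour default are assumptions for illustration, not the project's code:

```python
import glob
import os
import re
import tempfile
import time
from typing import Optional


def save_cached(data_dir: str, name: str, content: str) -> str:
    """Write content to <name>_<current unix time>.data, replacing older copies."""
    for old in glob.glob(os.path.join(data_dir, f"{name}_*.data")):
        os.remove(old)
    path = os.path.join(data_dir, f"{name}_{int(time.time())}.data")
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return path


def load_cached(data_dir: str, name: str, ttl_seconds: int = 3600) -> Optional[str]:
    """Return cached content if a file no older than ttl_seconds exists, else None."""
    for path in glob.glob(os.path.join(data_dir, f"{name}_*.data")):
        match = re.search(rf"{re.escape(name)}_(\d+)\.data$", path)
        if match and time.time() - int(match.group(1)) <= ttl_seconds:
            with open(path, "r", encoding="utf-8") as f:
                return f.read()
    return None


with tempfile.TemporaryDirectory() as tmp:
    save_cached(tmp, "wuai_data", '{"secure": true}')
    print(load_cached(tmp, "wuai_data"))                  # fresh entry is served
    print(load_cached(tmp, "wuai_data", ttl_seconds=-1))  # forced expiry yields None
```

The key difference is that `time.time()` is called inside the functions, so the stamp reflects when the file was written, not when the process started.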

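Assuming your site runs on BT Panel (宝塔):
Log in to the BT Panel admin, click Software Store in the left sidebar, then search for and install the following two apps; the latest version is fine

  • Python Project Manager (Python项目管理器)
  • Process Supervisor Manager (进程守护管理器)

On the Installed page of the Software Store, click Settings for the Python Project Manager

  1. Click Version Management, pick the Python version shown by default, click Install Version, and wait for it to finish
  2. Choose a directory on the server to hold the top code
    1. For example /root/top; create a data directory inside it
    2. Paste the hot-list API code into top.py in the top directory
    3. Paste the package list that follows these steps into requirements.txt in the top directory
  3. On the Installed page of the Software Store, click Project Management for the Python Project Manager
  4. Click Add Project and fill it in as shown in the screenshot
  5. After clicking OK the endpoints are reachable; the way you call them does not change with the deployment method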
beautifulsoup4==4.12.0
bs4==0.0.1
certifi==2022.12.7
charset-normalizer==2.0.12
click==8.0.4
dataclasses==0.8
Flask==2.0.3
idna==3.4
importlib-metadata==4.8.3
itsdangerous==2.0.1
Jinja2==3.0.3
lxml==4.9.2
MarkupSafe==2.0.1
numpy==1.19.5
pandas==1.1.5
python-dateutil==2.8.2
pytz==2023.3
requests==2.27.1
six==1.16.0
soupsieve==2.3.2.post1
typing-extensions==4.1.1
urllib3==1.26.15
Werkzeug==2.0.3
zipp==3.6.0
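17 comments

  • mrchen (Author)

    How do I test the API without One Nav?

    • The One Nav admin settings include the theme's built-in custom hot-list source, where you can configure your own endpoints to display hot lists or other data. For setting up the custom script, see this article; for the detailed admin steps, join the QQ group or message me on WeChat.

      • mrchen (Author)

        I've added you, please accept.

  • xiaoruan (Author)

    The images are dead. Image (i.imgur.com/qg3rmss.png) Image (i.imgur.com/fjiqnxg.png)
    I can't tell what the next step is.

    • Fixed now. Can you see them?

  • kylin (Author)

    Thanks. I only saw this today (12.10) and went to the Xun Proxy site, only to find a notice that it has just stopped operating.

    • Really? I have been using this proxy for a long time; how could it have shut down? You can also join the chat group (bottom of the sidebar), or find my WeChat QR code at the very bottom of the page if you want to chat.

      • kylin (Author)

        OK. Take a look at the official site; that is what it says there.

        • Damn, I just saw it. I must be blind. I'll look for another proxy, otherwise 52pojie won't work.

  • 阿星 (Guest)

    You could integrate this into RSSHub; it already ships with rankings for Zhihu, Weibo, and others.

    • I have never really understood RSS, so this is the first I've heard of RSSHub.

    • I had a quick look. The processing is simpler, but the output cannot be used directly as a hot list; it still needs a second pass. My basic framework is already written, and adding new lists to it is easy.

      • 阿星 (Guest)

        Haha, thanks for the reply. I'm a beginner too, so I'll use yours.

  • 11 (Guest)

    Learning from this, thanks for sharing.

  • xibel (Contributor)

    Let me give it a try.

  • wp (Guest)

    Why are they all 7?

    • It looks like the backend did not fetch anything, or something in your configuration is off. Join the group or message me for details.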