Scraping Billboard with Rust

A Rust web-scraping exercise: fetch the top 20 songs from a Billboard chart and convert them to JSON.

Crates used:

  • reqwest (HTTP request library)

  • scraper (HTML parsing library)

  • tokio (async runtime)

  • serde_json, serde_derive, serde with the derive feature (JSON serialization)

Basic structs

Deriving Debug makes the values easy to print while debugging.

Deriving Serialize implements serialization automatically.

use serde::Serialize;

#[derive(Debug, Serialize)]
pub struct BillboardSong {
    rank: usize, // position on the chart
    name: String,
    author: String,
    cover_url: String // cover image URL
}

#[derive(Debug, Serialize)]
pub struct BillboardData {
    songs: Vec<BillboardSong>
}
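To see what the Debug derive gives you on its own, here is a minimal stand-alone sketch. It copies the fields of BillboardSong but leaves out Serialize so that it builds without serde; `sample_song` and `debug_views` are hypothetical helpers added only for illustration, and the sample values are taken from the result listing in this post.

```rust
// Stand-alone sketch: same fields as the post's struct, but only Debug is
// derived so that it compiles with the standard library alone.
#[derive(Debug)]
pub struct BillboardSong {
    rank: usize,
    name: String,
    author: String,
    cover_url: String,
}

/// Hypothetical helper: builds one entry using values from the result listing.
pub fn sample_song() -> BillboardSong {
    BillboardSong {
        rank: 11,
        name: "酔いどれ知らず".to_string(),
        author: "Kanaria".to_string(),
        cover_url: "https://www.billboard-japan.com/scale/rankinfo/00000186/202x_image186267.jpg"
            .to_string(),
    }
}

/// Hypothetical helper: `{:?}` prints the struct on one line,
/// `{:#?}` prints one field per line.
pub fn debug_views(song: &BillboardSong) -> (String, String) {
    (format!("{:?}", song), format!("{:#?}", song))
}
```

In the scraper itself, `println!("{:?}", song)` inside the loop is usually enough to check that a row was parsed the way you expected.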

Fetching the page content

let resp = reqwest::get("https://www.billboard-japan.com/charts/detail?a=niconico")
        .await?
        .text()
        .await?;

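The enclosing function must return a Result, because the ? operator is used to propagate any Err automatically.

reqwest's get function is asynchronous, so its value has to be obtained by awaiting it.

That is why the tokio library is needed: it provides the async runtime that drives the future.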
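The rule about ? and Result can be shown without any network code. The following is a minimal sketch using only the standard library; `parse_rank` and `parse_ranks` are hypothetical helpers, not part of the scraper above.

```rust
use std::num::ParseIntError;

/// Parses a rank cell such as " 7 " into a number.
/// `?` returns early with the Err if the text is not a valid integer.
pub fn parse_rank(text: &str) -> Result<usize, ParseIntError> {
    let rank = text.trim().parse::<usize>()?;
    Ok(rank)
}

/// Because this function also returns a Result, it can use `?` on every call,
/// and the first failure is handed back to its own caller.
pub fn parse_ranks(cells: &[&str]) -> Result<Vec<usize>, ParseIntError> {
    let mut ranks = Vec::with_capacity(cells.len());
    for cell in cells {
        ranks.push(parse_rank(cell)?);
    }
    Ok(ranks)
}
```

If `parse_ranks` returned a plain `Vec<usize>` instead, the compiler would reject the `?` inside it, which is the same constraint the fetching function runs into.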

Parsing the HTML

let fragment = Html::parse_document(&resp);
let selector = Selector::parse("tbody > tr").unwrap();

// Select the table rows that make up the chart listing
let result = fragment.select(&selector);

let mut songs: Vec<BillboardSong> = vec![];

// Build the per-row selectors once, outside the loop
let rank_selector = Selector::parse(".rank_detail > .rank").unwrap();
// `musuc_title` is left as written: the selector matched the page as-is
let name_selector = Selector::parse(".name_detail > .musuc_title").unwrap();
let author_selector = Selector::parse(".name_detail > .artist_name").unwrap();
let cover_selector = Selector::parse("img").unwrap();

for element in result {

    // Don't ask me why the row is parsed again
    let table = Html::parse_fragment(element.html().as_str());

    // So many unwraps and iterator calls: this really is hell
    let rank = table.select(&rank_selector).next().unwrap().text().next().unwrap().trim();
    let name = table.select(&name_selector).next().unwrap().text().next().unwrap().trim();
    let author = table.select(&author_selector).next().unwrap().text().next().unwrap().trim();
    let cover_url = table.select(&cover_selector).next().unwrap().value().attr("src").unwrap();
        
    songs.push(BillboardSong {
        rank: rank.parse::<usize>().unwrap(),
        name: name.to_string(),
        author: author.to_string(),
        // The src attribute is site-relative, so prepend the host
        cover_url: format!("https://www.billboard-japan.com{}", cover_url)
    });

}

Ok(BillboardData { songs })
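One way to tame the unwrap chains is to turn each missing piece into an Err and let ? carry it out of the function. The sketch below uses only the standard library and plain iterators of `&str` as a stand-in for the text nodes of an element (scraper's `text()` yields `&str` items, as far as I know); `first_text` and `row_summary` are hypothetical names, and the String error type is just the simplest choice for a demo.

```rust
/// Takes the first text node from an iterator of string slices and trims it.
/// A missing node becomes an Err naming the field instead of a panic.
pub fn first_text<'a, I>(mut texts: I, field: &str) -> Result<&'a str, String>
where
    I: Iterator<Item = &'a str>,
{
    texts
        .next()
        .map(str::trim)
        .ok_or_else(|| format!("missing {}", field))
}

/// Combines two pieces of one row. Any missing or malformed piece is reported
/// through `?` rather than `unwrap`.
pub fn row_summary(rank: &[&str], name: &[&str]) -> Result<(usize, String), String> {
    let rank = first_text(rank.iter().copied(), "rank")?
        .parse::<usize>()
        .map_err(|e| format!("bad rank: {}", e))?;
    let name = first_text(name.iter().copied(), "name")?;
    Ok((rank, name.to_string()))
}
```

With a helper like this, a row that lacks a title or has a non-numeric rank produces an error message you can log or skip, instead of aborting the whole scrape.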

Testing the output:

let result = billboard::get_top20_vocaloid_song().await?;

println!("{}", serde_json::to_string(&result).unwrap());

Result (pretty-printed for readability):

{
    "songs": [
        {
            "rank": 1,
            "name": "きゅうくらりん",
            "author": "いよわ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000182/202x_image182966.jpg"
        },
        {
            "rank": 2,
            "name": "カゲロウデイズ",
            "author": "じん",
            "cover_url": "https://www.billboard-japan.com/scale/jackets/00000095/202x_P2_G6285292W.JPG"
        },
        {
            "rank": 3,
            "name": "強風オールバック",
            "author": "ゆこぴ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000199/202x_image199248.jpg"
        },
        {
            "rank": 4,
            "name": "リアライズ",
            "author": "柊マグネタイト",
            "cover_url": "https://www.billboard-japan.com/scale/common/202x_img_noimage.png"
        },
        {
            "rank": 5,
            "name": "ラビットホール",
            "author": "DECO*27",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000201/202x_image201119.jpg"
        },
        {
            "rank": 6,
            "name": "少女レイ",
            "author": "みきとP",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000139/202x_image139321.jpg"
        },
        {
            "rank": 7,
            "name": "リレイアウター",
            "author": "稲葉曇",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000204/202x_image204160.jpg"
        },
        {
            "rank": 8,
            "name": "アポカリプスなう",
            "author": "ピノキオピー",
            "cover_url": "https://www.billboard-japan.com/scale/common/202x_img_noimage.png"
        },
        {
            "rank": 9,
            "name": "寝起きヤシの木",
            "author": "ゆこぴ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000203/202x_image203866.jpg"
        },
        {
            "rank": 10,
            "name": "人マニア",
            "author": "原口沙輔",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000204/202x_image204459.jpg"
        },
        {
            "rank": 11,
            "name": "酔いどれ知らず",
            "author": "Kanaria",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000186/202x_image186267.jpg"
        },
        {
            "rank": 12,
            "name": "フォニイ",
            "author": "ツミキ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000175/202x_image175158.jpg"
        },
        {
            "rank": 13,
            "name": "サヨナラ!!ウイルス",
            "author": "キュー",
            "cover_url": "https://www.billboard-japan.com/scale/common/202x_img_noimage.png"
        },
        {
            "rank": 14,
            "name": "マーシャル・マキシマイザー",
            "author": "柊マグネタイト",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000184/202x_image184890.jpg"
        },
        {
            "rank": 15,
            "name": "ライアーダンサー",
            "author": "マサラダ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000202/202x_image202193.jpg"
        },
        {
            "rank": 16,
            "name": "神っぽいな",
            "author": "ピノキオピー",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000176/202x_image176594.jpg"
        },
        {
            "rank": 17,
            "name": "ネ 土 会 ェ 貝 南 犬 ☆ カ ゞ ん I よ ″ る ノ  D A !!。",
            "author": "ぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬ ぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬぬ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000204/202x_image204173.jpg"
        },
        {
            "rank": 18,
            "name": "ラヴィ(Lavie)",
            "author": "すりぃ",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000194/202x_image194878.jpg"
        },
        {
            "rank": 19,
            "name": "おどロボ",
            "author": "海茶",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000204/202x_image204163.jpg"
        },
        {
            "rank": 20,
            "name": "ずんだパーリナイ",
            "author": "なみぐる",
            "cover_url": "https://www.billboard-japan.com/scale/rankinfo/00000198/202x_image198689.jpg"
        }
    ]
}

LICENSED UNDER CC BY-NC-SA 4.0