自动化脚本

爬虫

好久没有写过博客了，最近有感觉，很多时候去钻一个点，会漏掉很多有趣的其他方面的东西，我相信这大概不是我愿意丢弃的，所以适当的脱离一下还是有好处的

爬虫，作为一项很有趣的，满足自己各种xxx的技术，当然不能放过，刚好在知乎刷到了相关的内容，这周就来盘一盘（希望能顺利写完 5.11 19：43

puppeteer

首先这个库和我这个乡巴佬所认知的爬虫就不太一样，这是个JavaScript库（啊已经不是python了吗），因此需要用到node.js，官方可以安装Node.js — 在任何地方运行 JavaScript

shell

node -v

应该输出版本号，验证node.js正确安装

选择一个文件夹，输入命令

shell

npm install puppeteer

然后就会自动生成node_modules，注意要在你的工作目录中做这步操作

然后你就可以开始玩puppeteer啦

const puppeteer = require('puppeteer')
const browser = await puppeteer.launch()
const page = await browser.newPage()

// Navigate the page to a URL.
await page.goto('https://developer.chrome.com/')

// Set screen size.
await page.setViewport({ width: 1080, height: 1024 })

// Type into search box using accessible input name.
await page.locator('aria/Search').fill('automate beyond recorder')

// Wait and click on first result.
await page.locator('.devsite-result-item-link').click()

// Locate the full title with a unique string.
const textSelector = await page.locator('text/Customize and automate').waitHandle()
const fullTitle = await textSelector?.evaluate((el) => el.textContent)

// Print the full title.
console.log('The title of this blog post is "%s".', fullTitle)

await browser.close()

就按着语法去玩就好了

于是就可以浅显的实战了

javascript

const puppeteer = require('puppeteer')

;(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  page.setViewport({ width: 1920, height: 1080 })

  // networkidle0 代表所有网络请求都完成了
  await page.goto('https:xxx', { waitUntil: 'networkidle0' })

  const category = await page.$('#category-0')
  const title = await page.evaluate((el) => {
    if (el) {
      return el.innerHTML
    }
  }, category)

  console.log(title)

  await page.screenshot({ path: 'xxx.png' })
  await browser.close()
})()

代码来自https://kirigaya.cn

爬虫

puppeteer

playwright