1. Process overview
2. Page analysis
3. Crawler script
3.1 Define the structs
Code file: service/crawl/fund/top_crawl.go
    // fundItem maps one table row; each field is filled from the element
    // matched by its selector tag.
    type fundItem struct {
        FundCode         string `selector:"td:nth-of-type(1)"`
        FundName         string `selector:"td:nth-of-type(2)"`
        NetWorth         string `selector:"td:nth-of-type(3) > span.fb"`
        TopDate          string `selector:"td:nth-of-type(3) > span.date"`
        DayChange        string `selector:"td:nth-of-type(4)"`
        WeekChange       string `selector:"td:nth-of-type(5)"`
        MouthChange      string `selector:"td:nth-of-type(6)"`
        ThreeMouthChange string `selector:"td:nth-of-type(7)"`
        SixMouthChange   string `selector:"td:nth-of-type(8)"`
        YearChange       string `selector:"td:nth-of-type(9)"`
        TwoYearChange    string `selector:"td:nth-of-type(10)"`
        ThreeYearChange  string `selector:"td:nth-of-type(11)"`
        CurrentChange    string `selector:"td:nth-of-type(12)"`
        CreateChange     string `selector:"td:nth-of-type(13)"`
    }

    // TopCrawlService collects one fundItem per <tr> of the ranking table.
    type TopCrawlService struct {
        Item []*fundItem `selector:"tr"`
    }
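td:nth-of-type(n) selects the n-th column (the n-th <td>, counting from 1) of each row. Note: NetWorth and TopDate are both read from the third column; they come from two different <span> elements inside that cell (span.fb and span.date).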
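The column mapping can be tried out without a crawler. The following is a minimal sketch using only the standard library: it parses one table row with encoding/xml instead of colly, so it illustrates the 1-based column counting and the two spans in column 3, not colly's own selector engine. The sample row and the helper names (nthOfType, spanByClass, parseRow) are made up for this illustration and are not taken from the real page or the project.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// span and cell model just enough of a table row for this illustration.
type span struct {
	Class string `xml:"class,attr"`
	Text  string `xml:",chardata"`
}

type cell struct {
	Text  string `xml:",chardata"`
	Spans []span `xml:"span"`
}

type row struct {
	Cells []cell `xml:"td"`
}

// sampleRow is made-up markup, not copied from the real page.
const sampleRow = `<tr><td>000001</td><td>Sample Fund</td>` +
	`<td><span class="fb">1.2345</span><span class="date">03-15</span></td>` +
	`<td>0.52</td></tr>`

// parseRow decodes one well-formed <tr> element.
func parseRow(markup string) (row, error) {
	var r row
	err := xml.Unmarshal([]byte(markup), &r)
	return r, err
}

// nthOfType returns the n-th <td> of the row, counting from 1 as the CSS
// selector td:nth-of-type(n) does.
func nthOfType(r row, n int) (cell, bool) {
	if n < 1 || n > len(r.Cells) {
		return cell{}, false
	}
	return r.Cells[n-1], true
}

// spanByClass returns the text of the first <span> in c whose class
// attribute equals class exactly.
func spanByClass(c cell, class string) string {
	for _, s := range c.Spans {
		if s.Class == class {
			return s.Text
		}
	}
	return ""
}

func main() {
	r, err := parseRow(sampleRow)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	third, _ := nthOfType(r, 3)
	fmt.Println("NetWorth:", spanByClass(third, "fb"))
	fmt.Println("TopDate:", spanByClass(third, "date"))
}
```

With the sample row, both values come out of the same cell, which is why the two struct fields share td:nth-of-type(3) and differ only in the span they target.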
3.2 Fetch the page
Code file: service/crawl/fund/top_crawl.go
    // CrawlHtml fetches the ranking page and unmarshals its rows into f.Item.
    func (f *TopCrawlService) CrawlHtml() {
        collector := colly.NewCollector(
            colly.UserAgent(crawl.UserAgent),
        )
        collector.OnRequest(func(request *colly.Request) {
            request.Headers.Set("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8,zh-TW;q=0.7")
        })
        collector.OnError(func(response *colly.Response, err error) {
            global.GvaLogger.Sugar().Errorf("fund ranking: failed to fetch page: %v", err)
        })
        // Map the table with id "tblite_hh" onto the struct via its selector tags.
        collector.OnHTML("#tblite_hh", func(element *colly.HTMLElement) {
            err := element.Unmarshal(f)
            if err != nil {
                fmt.Println("element.Unmarshal error: ", err)
            }
        })
        // colly runs OnResponse callbacks before OnHTML callbacks, so the "%"
        // signs are stripped before the HTML is parsed and values such as
        // "1.23%" can later be converted with strconv.ParseFloat.
        collector.OnResponse(func(response *colly.Response) {
            newBody := strings.ReplaceAll(string(response.Body), "%", "")
            response.Body = []byte(newBody)
        })
        err := collector.Visit("https://fundact.eastmoney.com/banner/hh.html")
        if err != nil {
            global.GvaLogger.Sugar().Errorf("fund ranking: crawl failed: %v", err)
        }
    }
3.3 Data cleaning
Code file: service/crawl/fund/top_crawl.go
    // ConvertEntity turns the scraped string fields into entity.FundDayTop values.
    func (f *TopCrawlService) ConvertEntity() []entity.FundDayTop {
        var topList []entity.FundDayTop
        for _, item := range f.Item {
            // Skip rows that yielded no fund code.
            if item.FundCode == "" {
                continue
            }
            fundTmp := entity.FundDayTop{}
            fundTmp.FundCode = item.FundCode
            // Prefix the scraped date with the current year.
            format := time.Now().Format("2006")
            fundTmp.TopDate = fmt.Sprintf("%s-%s", format, item.TopDate)
            // Conversion errors are discarded in the assignments below.
            fundTmp.FundName, _ = utils.GbkToUtf8(item.FundName)
            fundTmp.NetWorth, _ = strconv.ParseFloat(item.NetWorth, 64)
            fundTmp.DayChange, _ = strconv.ParseFloat(item.DayChange, 64)
            fundTmp.WeekChange, _ = strconv.ParseFloat(item.WeekChange, 64)
            fundTmp.MouthChange, _ = strconv.ParseFloat(item.MouthChange, 64)
            fundTmp.ThreeMouthChange, _ = strconv.ParseFloat(item.ThreeMouthChange, 64)
            fundTmp.SixMouthChange, _ = strconv.ParseFloat(item.SixMouthChange, 64)
            fundTmp.YearChange, _ = strconv.ParseFloat(item.YearChange, 64)
            fundTmp.TwoYearChange, _ = strconv.ParseFloat(item.TwoYearChange, 64)
            fundTmp.ThreeYearChange, _ = strconv.ParseFloat(item.ThreeYearChange, 64)
            fundTmp.CurrentChange, _ = strconv.ParseFloat(item.CurrentChange, 64)
            fundTmp.CreateChange, _ = strconv.ParseFloat(item.CreateChange, 64)
            topList = append(topList, fundTmp)
        }
        return topList
    }
4. Scheduled task
4.1 Implement the cron.Job interface
Code file: crontab/fund_top_cron.go
    type FundTopCron struct{}

    // Run implements cron.Job: crawl the page, convert the rows, and save
    // them unless this ranking date has already been stored.
    func (c FundTopCron) Run() {
        fmt.Println("Fund ranking: scheduled task starting...")
        f := &fund.TopCrawlService{}
        f.CrawlHtml()
        fundDayTopList := f.ConvertEntity()
        if !f.ExistTopDate() {
            result := global.GvaMysqlClient.Create(fundDayTopList)
            if result.Error != nil {
                global.GvaLogger.Sugar().Errorf("failed to save data for this run: %v", result.Error)
                return
            }
            global.GvaLogger.Sugar().Infof("rows saved in this run: %d", result.RowsAffected)
            return
        }
        global.GvaLogger.Sugar().Info("task ran successfully; no data to save")
        fmt.Println("Fund ranking: scheduled task finished!")
    }
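The control flow of Run (crawl, check whether the date is already stored, save only if it is not) can be exercised without a database by injecting the crawl and storage steps. The sketch below is one way to do that. Everything in it is hypothetical: Job only mirrors the single-method shape of cron.Job, Record stands in for entity.FundDayTop, and Store guesses at what ExistTopDate and the Create call do, since their implementations are not shown in the article.

```go
package main

import "fmt"

// Job mirrors the single-method shape of cron.Job so the sketch needs no
// third-party import.
type Job interface {
	Run()
}

// Record is a hypothetical stand-in for entity.FundDayTop.
type Record struct {
	FundCode string
	TopDate  string
}

// Store is a hypothetical persistence seam for the date check and the save.
type Store interface {
	HasDate(date string) (bool, error)
	Save(records []Record) (int, error)
}

// TopJob wires a crawl function and a Store together.
type TopJob struct {
	Crawl func() []Record
	Store Store
}

// Run saves the crawled records unless their date is already stored.
func (j TopJob) Run() {
	records := j.Crawl()
	if len(records) == 0 {
		fmt.Println("nothing scraped; skipping save")
		return
	}
	exists, err := j.Store.HasDate(records[0].TopDate)
	if err != nil {
		fmt.Println("date lookup failed:", err)
		return
	}
	if exists {
		fmt.Println("date already stored; nothing to save")
		return
	}
	n, err := j.Store.Save(records)
	if err != nil {
		fmt.Println("save failed:", err)
		return
	}
	fmt.Printf("saved %d rows\n", n)
}

// memoryStore is an in-memory Store used only to exercise the job.
type memoryStore struct {
	rows []Record
}

func (m *memoryStore) HasDate(date string) (bool, error) {
	for _, r := range m.rows {
		if r.TopDate == date {
			return true, nil
		}
	}
	return false, nil
}

func (m *memoryStore) Save(records []Record) (int, error) {
	m.rows = append(m.rows, records...)
	return len(records), nil
}

func main() {
	store := &memoryStore{}
	var job Job = TopJob{
		Crawl: func() []Record {
			return []Record{{FundCode: "000001", TopDate: "2022-03-15"}}
		},
		Store: store,
	}
	job.Run() // first run stores the row
	job.Run() // second run finds the date and skips the save
}
```

The extra guard for an empty crawl result is an addition in this sketch; the original Run does not check for it.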
4.2 Register the job
Code file: initialize/cron.go
    func addJob(c *cron.Cron) {
        ...
        // Run the fund ranking job every 10 seconds.
        _, _ = c.AddJob("@every 10s", crontab.FundTopCron{})
    }
5. Results
5.1 Run the project
    ...
    [GIN-debug] GET /demo/es/create --> 52lu/fund-analye-system/api/demo.CreateIndex (3 handlers)
    [GIN-debug] GET /demo/es/searchById --> 52lu/fund-analye-system/api/demo.SearchById (3 handlers)
    【 Current environment: dev  Current version: v1.0.0  API address: http://0.0.0.0:8088 】
    Fund ranking: scheduled task starting...
    Fund ranking: scheduled task finished!
5.2 Database table
To get the source code link, follow the WeChat official account 【猿码记】 and reply 【基金】.