Hi, I'm writing a Python script that computes the monthly and daily visit counts for web pages. Input file:
ArticleName Date        Hour    Count/Visit
Aa   20130601    10000   1
Aa   20130601    10000   1
Ew   20130601    10000   1
H    20130601    10000   2
H    20130602    10000   1
R    20130601    20000   2
R    20130602    10000   1
Ra   20130601    0   1
Ra   20130601    10000   2
Ra   20130602    10000   1
Ram  20130601    0   2
Ram  20130601    10000   3
Ram  20130602    10000   4
Re   20130601    20000   1
Re   20130602    10000   3
Rz   20130602    10000   1
For each page, I need the total views per day and the running total of views for the month.
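To pin down the two output columns: DailyView sums Count/Visit over every input row for one article on one date, and MonthlyView, judging from the expected output (for H: 2 on 20130601, then 2 + 1 = 3 on 20130602), is a running total of the daily values within the month. Writing $c_r$ for the Count/Visit value of input row $r$, and assuming $\mathrm{month}(d)$ is the YYYYMM prefix of the date:

$$
\mathrm{DailyView}(a, d) = \sum_{r:\ \mathrm{article}_r = a,\ \mathrm{date}_r = d} c_r,
\qquad
\mathrm{MonthlyView}(a, d) = \sum_{\substack{d' \le d \\ \mathrm{month}(d') = \mathrm{month}(d)}} \mathrm{DailyView}(a, d')
$$

For Ram this gives DailyView = 2 + 3 = 5 on 20130601, and MonthlyView = 5 + 4 = 9 on 20130602.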
Expected output:
ArticleName Date     DailyView MonthlyView
Aa   20130601 2 2
Ew   20130601 1 1
H    20130601 2 2
H    20130602 1 3
R    20130601 2 2
R    20130602 1 3
Ra   20130601 3 3
Ra   20130602 1 4
Ram  20130601 5 5
Ram  20130602 4 9
Re   20130601 1 1
Re   20130602 3 4
Rz   20130602 1 1
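One way to produce this output, sketched under the assumption that the input is sorted by ArticleName and then Date (as in the sample) and that a month is the YYYYMM prefix of the date, is to group consecutive rows with `itertools.groupby`: first by (article, month), then by date within each month. The names `parse` and `aggregate` are hypothetical, not part of the original script.

```python
from itertools import groupby


def parse(lines):
    """Yield (article, date, count) tuples, skipping the header and blank lines."""
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[3].isdigit():
            continue  # header row ("Count/Visit") or malformed/blank line
        article, date, _hour, count = fields
        yield article, date, int(count)


def aggregate(lines):
    """Yield (article, date, daily_total, running_monthly_total).

    Assumes rows are sorted by article, then date, and that the month
    is the first six characters (YYYYMM) of the YYYYMMDD date.
    """
    rows = parse(lines)
    # Outer grouping: one group per (article, month).
    for (article, _month), month_rows in groupby(rows, key=lambda r: (r[0], r[1][:6])):
        monthly = 0
        # Inner grouping: one group per date inside that month.
        for date, day_rows in groupby(month_rows, key=lambda r: r[1]):
            daily = sum(r[2] for r in day_rows)
            monthly += daily
            yield article, date, daily, monthly
```

To use it in place of the script, loop over `aggregate(sys.stdin)` and print each tuple tab-separated. Because `groupby` only groups adjacent rows, unsorted input would need to be sorted first.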
My script:
#!/usr/bin/python
import sys
last_date = 20130601
last_hour = 0
last_count = 0
last_article = None
monthly_count = 0
daily_count = 0
for line in sys.stdin:
    article, date, hour, count = line.split()
    count = int(count)
    date = int(date)
    hour = int(hour)
    # Article matches and date matches
    if last_article == article and last_date == date:
        daily_count = count + last_count
        monthly_count = count + last_count
        # print('%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count))
    # Article matches but date doesn't match
    if last_article == article and last_date != date:
        monthly_count = count
        daily_count = count
        print('%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count))
    # Article doesn't match
    if last_article != article:
        last_article = article
        last_count = count
        monthly_count = count
        daily_count = count
        last_date = date
        print('%s\t%s\t%s\t%s' % (article, date, daily_count, monthly_count))
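I get most of the output right, but it is wrong in two cases:
1. When consecutive rows have the same ArticleName and the same date, I can't find a way to sum them per article. For example, for the Ra rows the script (with the commented-out print in the first if enabled) prints:
Ra 20130601 1 1
Ra 20130601 3 3
Ra 20130602 1 1
so the final monthly total for Ra should be 1 + 2 + 1 = 4, not 1.
2. Because the third if prints every row whose article differs from the previous one, I get two lines for an article that has several rows with the same name and date. For example, Ra 20130601 1 1 should not be printed.
Does anyone know how to fix this? Please let me know if you need more information.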
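Both problems come from printing a row as soon as it is read and from overwriting the counters instead of adding to them. A minimal sketch of a fix that keeps the script's `last_*` state-machine shape: add each count to `daily_count` and `monthly_count`, and emit a line only when the (article, date) key changes, plus once more after the loop for the final group. It assumes sorted input and a YYYYMM month prefix; `reduce_counts` is a hypothetical name.

```python
def reduce_counts(lines):
    """Yield (article, date, daily, monthly) once per finished (article, date) group.

    Assumes rows are sorted by article, then date; the month is the
    YYYYMM prefix of the YYYYMMDD date string.
    """
    last_article = None
    last_date = None
    daily_count = 0
    monthly_count = 0

    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[3].isdigit():
            continue  # skip the header row and blank lines
        article, date, _hour, count = fields
        count = int(count)

        if article != last_article or date != last_date:
            # The previous (article, date) group is complete: emit it exactly once.
            if last_article is not None:
                yield last_article, last_date, daily_count, monthly_count
            # A new article, or a new month, restarts the monthly total.
            if article != last_article or date[:6] != last_date[:6]:
                monthly_count = 0
            daily_count = 0
            last_article, last_date = article, date

        # Add to the running totals instead of overwriting them.
        daily_count += count
        monthly_count += count

    # The last group has no following row to trigger it, so flush it here.
    if last_article is not None:
        yield last_article, last_date, daily_count, monthly_count
```

Run it over stdin with something like `for row in reduce_counts(sys.stdin): print('\t'.join(map(str, row)))`. Resetting `daily_count` on every key change but `monthly_count` only on an article or month change is what turns MonthlyView into a running total.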