My UDP_file.txt contains:
2014-03-02 07:59:37;source-address=123.235.78.125 source-port=1780
2014-03-02 07:59:37;source-address=123.235.132.181 source-port=56399
2014-03-02 07:59:37;source-address=123.234.141.253 source-port=49170
2014-03-02 07:59:37;source-address=123.234.104.225 source-port=39123
2014-03-02 07:59:37;source-address=123.234.104.225 fake-port=0000
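What I need to do is:
- load the file,
- apply a regular expression to each line,
- save the lines that match the pattern to the file 'good_records.txt',
- save the lines that do not match the pattern to the file 'bad_records.txt'.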
val file_in = sc.textFile("UDP_file.txt")
val FullName = """(^.{19}).+source-address=([^"]+) source-port=([^"]+)""".r
When I test the pattern on a single line, it works:
scala> val FullName(ip,sa,sp) = "2014-03-02 07:59:37;source-address=10.114.104.225 source-port=39123"
ip: String = 2014-03-02 07:59:37
sa: String = 10.114.104.225
sp: String = 39123
or
scala> "2014-03-02 07:59:37;source-address=10.115.78.125 source-port=1780" match { case FullName(ip,sa,sp) => (ip,sa,sp) }
(2014-03-02 07:59:37,10.115.78.125,1780)
But I don't know how to apply it to every line of the loaded file.
file_in.AndWhatNow?
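A rough, unverified sketch of one possible direction: wrap the pattern in a predicate and use it to split the lines into matching and non-matching ones. The names `RecordFilter`, `isGood` and `split` are hypothetical helpers introduced only for illustration. The object itself is plain Scala and needs no Spark; the RDD calls are shown only in comments and assume the `file_in` value defined above.

```scala
object RecordFilter {
  // Same pattern as FullName above.
  val FullName = """(^.{19}).+source-address=([^"]+) source-port=([^"]+)""".r

  // True only when the whole line matches the pattern; a regex used as an
  // extractor in a `case` requires a full match of the input string.
  def isGood(line: String): Boolean = line match {
    case FullName(_, _, _) => true
    case _                 => false
  }

  // Splits a local collection into (matching, non-matching) lines.
  def split(lines: Seq[String]): (Seq[String], Seq[String]) =
    lines.partition(isGood)
}

// On the RDD, the same predicate could presumably be passed to filter:
//   file_in.filter(RecordFilter.isGood _).saveAsTextFile("good_records.txt")
//   file_in.filter(line => !RecordFilter.isGood(line)).saveAsTextFile("bad_records.txt")
// Note: saveAsTextFile writes a directory of part files under the given
// name, not a single text file.
```

Filtering the RDD twice reads the input twice; calling `file_in.cache()` first might avoid that, assuming the data fits in memory.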
Can you help me? I would appreciate any advice.
Pavel