这实际上是我几天前提出的问题的延续。我采用了应用函子路线并制作了自己的实例。
我需要逐行解析文件中的大量 json 语句。一个示例 json 语句是这样的 -
{"question_text": "How can NBC defend tape delaying the Olympics when everyone has
Twitter?", "context_topic": {"followers": 21, "name": "NBC Coverage of the London
Olympics (July & August 2012)"}, "topics": [{"followers": 2705,
"name": "NBC"},{"followers": 21, "name": "NBC Coverage of the London
Olympics (July & August 2012)"},
{"followers": 17828, "name": "Olympic Games"},
{"followers": 11955, "name": "2012 Summer Olympics in London"}],
"question_key": "AAEAABORnPCiXO94q0oSDqfCuMJ2jh0ThsH2dHy4ATgigZ5J",
"__ans__": true, "anonymous": false}
对 json 格式感到抱歉。变坏了
我有大约 10000 个这样的 json 语句,我需要解析它们。我写的代码是这样的 -
parseToRecord :: B.ByteString -> Question
parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question
main :: IO()
main = do
-- my first line in the file tells how many json statements
-- are there followed by a lot of other irrelevant info...
ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines
json_text <- B.getContents >>= return . tail . B.lines
let training_data = take (fromIntegral ts) json_text
let questions = map parseToRecord training_data
print $ questions !! 8922
这段代码给了我一个运行时错误Non-exhaustive patterns in lambda
。代码中引用的错误\(Ok x) -> x
。通过反复试验,我得出的结论是,该程序在第 8921 个索引之前工作正常,但在第 8922 个迭代时失败。
我检查了相应的 json 语句,并尝试通过调用它的函数来独立解析它,它可以工作。但是,当我调用map时它不起作用。我真的不明白发生了什么。在“学习haskell for a great good”中学习了一点haskell后,我想潜入一个真实世界的编程项目,但似乎被困在这里。
编辑::完整代码如下
{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -optc-O2 #-}
{-# OPTIONS_GHC -fno-warn-incomplete-uni-patterns #-}
import qualified Data.ByteString.Lazy.Char8 as B
import Data.Maybe
import NLP.Tokenize
import Control.Applicative
import Control.Monad
import Text.JSON
data Topic = Topic
{ followers :: Integer,
name :: String
} deriving (Show)
data Question = Question
{ question_text :: String,
context_topic :: Topic,
topics :: [Topic],
question_key :: String,
__ans__ :: Bool,
anonymous :: Bool
} deriving (Show)
(!) :: (JSON a) => JSObject JSValue -> String -> Result a
(!) = flip valFromObj
instance JSON Topic where
-- Keep the compiler quiet
showJSON = undefined
readJSON (JSObject obj) =
Topic <$>
obj ! "followers" <*>
obj ! "name"
readJSON _ = mzero
instance JSON Question where
-- Keep the compiler quiet
showJSON = undefined
readJSON (JSObject obj) =
Question <$>
obj ! "question_text" <*>
obj ! "context_topic" <*>
obj ! "topics" <*>
obj ! "question_key" <*>
obj ! "__ans__" <*>
obj ! "anonymous"
readJSON _ = mzero
isAnswered (Question _ _ _ _ status _) = status
isAnonymous (Question _ _ _ _ _ status) = status
parseToRecord :: B.ByteString -> Question
parseToRecord bstr = handle decodedObj
where handle (Ok k) = k
handle (Error e) = error (e ++ "\n" ++ show bstr)
decodedObj = decode (B.unpack bstr) :: Result Question
--parseToRecord bstr = (\(Ok x) -> x) decodedObj where decodedObj = decode (B.unpack bstr) :: Result Question
main :: IO()
main = do
ts <- B.getContents >>= return . fst . fromJust . B.readInteger . head . B.lines
json_text <- B.getContents >>= return . tail . B.lines
let training_data = take (fromIntegral ts) json_text
let questions = map parseToRecord training_data
let correlation = foldr (\x acc -> if (isAnonymous x == isAnswered x) then (fst acc + 1, snd acc + 1) else (fst acc, snd acc + 1)) (0,0) questions
print $ fst correlation
这是可以作为可执行文件输入的数据。我正在使用 ghc 7.6.3。如果程序名称是 ans.hs,我按照这些步骤操作。
$ ghc --make ans.hs
$ ./ans < path/to/the/file/sample/answered_data_10k.in
多谢!