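I want to combine several small bzip2 files into a single sequence file. I found some code that creates a sequence file and tried it, but it produces strange output like the following. Is this because it cannot read bzip2 files?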

SEQorg.apache.hadoop.io.Textorg.apache.hadoop.io.Text  

�*org.apache.hadoop.io.compress.DefaultCodec����gWŒ‚ÊO≈îbº¡vœÖ��� ���
    .DS_StorexúÌò±
  ¬0EÔ4.S∫a�6∞¢0P∞=0ì·‡/d)ÄDï˛ì¨w≈ù7÷ùØ›⁄ÖüO;≥X¬`’∂µóÆ Æâ¡=Ñ   B±lP6Û˛ÜbÅå˜C¢3}ª‘�Lp¥oä"ùËL?jK�&:⁄”Åét¢3]Î
º∑¿˘¸68§ÄÉùø:µ√™*é-¿fifi>!~¯·0Ùˆú ¶   eõ¯c‡ÍÉa◊':”ÍÑòù;I1•�∂©���00.json.bz2xúL\gWTK∞%
,Y
ä( HJFêúsŒ\PrRrŒ9ÁCŒ9√0ÃZUÏÌÊΩÔ≤Ù‚Ãô”’UªvÌÍÓ3£oˆä2ä<˝”-”ãȧπË/d;u¥Û£üV;ÀÒÛ¯Ú˜ˇ˚…≥2¢5Í0‰˝8M⁄,S¸¢`f•†`O<ëüD£≈tÃ¥ó`•´D˚~aº˝«õ˜v'≠)(F|§fiÆÕ ?y¬àœTÒÊYåb…U%E?⁄§efiWˇÒY#üÛÓÓ‚
⁄è„ÍåÚÊU5‡  æ‚Â?q‘°�À{©?íWyü÷ÈûF<[˘éŒhãd>x_ÅÁ
fiÒ_eâ5-—|-M)˙)¸R·ªCÆßs„F>UŒ©ß{o„uÔ&∫˚˚Ÿ?Ä©ßW,”◊Ê∫â«õxã¸[yûgÈñFmx|‡ªÍ¶”¶‡Óp-∆ú§ı
<JN t «F4™@Àä¥Jœ¥‰√|E„‘œ„&amp;º§@g|ˆá{iõOx

The code is:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.util.GenericOptionsParser;

public class cinput {

    /**
     * @param args
     * @throws IOException
     * @throws IllegalAccessException
     * @throws InstantiationException
     */
    public static void main(String[] args) throws IOException,
            InstantiationException, IllegalAccessException {
        // TODO Auto-generated method stub

        Configuration conf = new Configuration();

        FileSystem fs = FileSystem.get(conf);
        String[] otherArgs = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
        Path inputFile = new Path(otherArgs[0]);
        Path outputFile = new Path(otherArgs[1]);
        FSDataInputStream inputStream;
        Text key = new Text();
        Text value = new Text();
        SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf,
                outputFile, key.getClass(), value.getClass());
        FileStatus[] fStatus = fs.listStatus(inputFile);

        for (FileStatus fst : fStatus) {
            String str = "";
            System.out.println("Processing file : " + fst.getPath().getName() + " and the size is : " + fst.getLen());
            inputStream = fs.open(fst.getPath());
            key.set(fst.getPath().getName());
            while(inputStream.available()>0) {
                str = str+inputStream.readLine();
               // System.out.println(str);
            }
            value.set(str);
            writer.append(key, value);

        }
        // Close the writer first so buffered records are flushed before the FileSystem is closed.
        IOUtils.closeStream(writer);
        fs.close();
        System.out.println("SEQUENCE FILE CREATED SUCCESSFULLY........");
    }
}

The input I am passing in is a set of JSON bzip2 (.json.bz2) files. Can someone point out why I am getting this strange output?
