Questions tagged [sequencefile]
hadoop - Converting a space-separated file (one vector per line) into a SequenceFile
I created a large text file (4 GB) as shown below.
Each line describes one vector, and each column holds one element of that vector. The elements are separated by single spaces.
Now I want to run K-Means clustering over all the vectors using Apache Mahout, but I get the error "not a SequenceFile".
How can I create a file whose format meets Mahout's requirements?
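A minimal sketch of one way to do the conversion, assuming Mahout's `VectorWritable` and illustrative file names: read each line, split it on spaces, and append a `Text`/`VectorWritable` pair to a `SequenceFile`, which is the input format Mahout's K-Means driver reads.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class TextToVectorSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path out = new Path("vectors.seq"); // illustrative output path
        try (BufferedReader in = new BufferedReader(new FileReader("vectors.txt"));
             SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                     SequenceFile.Writer.file(out),
                     SequenceFile.Writer.keyClass(Text.class),
                     SequenceFile.Writer.valueClass(VectorWritable.class))) {
            String line;
            long i = 0;
            while ((line = in.readLine()) != null) {
                String[] cols = line.trim().split(" ");
                double[] elems = new double[cols.length];
                for (int j = 0; j < cols.length; j++) {
                    elems[j] = Double.parseDouble(cols[j]);
                }
                // One record per input line: key = line number,
                // value = the vector itself.
                writer.append(new Text(Long.toString(i++)),
                              new VectorWritable(new DenseVector(elems)));
            }
        }
    }
}
```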
hadoop - How do I copy the output of the -text HDFS command into another file?
Is there a way, using an HDFS command, to copy the text content of an HDFS file into a file on another file system:
Can I use -cat, or any other method, to print the output of -text into another file?:
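On the shell side, redirecting standard output already does this: hdfs dfs -text /path/file > local.txt. For a programmatic route, here is a minimal Java sketch (paths are illustrative) that streams an HDFS file into a local file:

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsToLocal {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Byte-for-byte copy of the HDFS file into a local file. Note this
        // does not decompress or decode the content the way -text does.
        try (FSDataInputStream in = fs.open(new Path("/data/input.txt"));
             OutputStream out = new FileOutputStream("/tmp/input-copy.txt")) {
            IOUtils.copyBytes(in, out, 4096, false);
        }
    }
}
```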
hadoop - SequenceFiles and Hadoop streaming
I have a use case in which I use Hadoop streaming to run an executable as the map process. On the input side I have a large number of sequence files. Each seq file has, say, 8 keys with corresponding values, which are lists of float arrays. Instead of letting one map process handle one seq file, I would prefer to allocate a group of seq files to each map process. Hence, I decided to merge all those seq files into one large file. Assume this big seq file is made up of 50,000 small seq files.
Now, is it possible to configure my Hadoop streaming utility to allocate a portion of the seq file to each map process?
How do I make each map process get the list of file names it needs to process, and how can I retrieve this information in my map executable? The executable is a plain Groovy script designed to process stdin. In that case, what will my stdin look like (how do I determine the key/value pairs, and what will their contents be)? Or, since I merged the sequence files into one big file, have they lost their individual identities, meaning I cannot get their filenames and instead have to work with the sequence files' keys/values directly?
I think this big seq file will have key/value pairs where the key is a filename and the value is the contents of that file, which in turn contains 8 keys and their corresponding values. If that is the case, when Hadoop splits this big file according to the number of maps possible (let's say 10 maps are possible in my cluster), each map would get around 5,000 keys and their corresponding values. Then, in my map executable, how can I access these keys and values?
Any hint would greatly help.
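For reference, here is a minimal sketch (file name and Writable types are assumptions) of how the merged file's key/value pairs can be walked on the Java side. When the same file is consumed via streaming with -inputformat org.apache.hadoop.mapred.SequenceFileAsTextInputFormat, the Groovy script sees one line per record on stdin: the key's text form, a tab, then the value's text form.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class DumpSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("merged.seq"); // illustrative path
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(path))) {
            // Instantiate key/value holders of whatever types the file declares.
            Writable key = (Writable) ReflectionUtils.newInstance(
                    reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(
                    reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                System.out.println(key + "\t" + value);
            }
        }
    }
}
```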
hadoop - Appending to an existing sequence file
In my use case I need a way to append key/value pairs to an existing sequence file. How can I do that? Any clue would greatly help. I am using Hadoop 2.x.
Also, I came across the following documentation. Can anyone tell me how to use it to append?
public static org.apache.hadoop.io.SequenceFile.Writer createWriter(FileContext fc, Configuration conf, Path name, Class keyClass, Class valClass, org.apache.hadoop.io.SequenceFile.CompressionType compressionType, CompressionCodec codec, org.apache.hadoop.io.SequenceFile.Metadata metadata, EnumSet<CreateFlag> createFlag, org.apache.hadoop.fs.Options.CreateOpts... opts) throws IOException
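A minimal sketch of appending, assuming Hadoop 2.6.1 or later, where SequenceFile.Writer gained an appendIfExists option (HADOOP-7139); the path and key/value types are illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class AppendToSeqFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path("existing.seq"); // illustrative path
        // appendIfExists(true) reopens the file for appending rather than
        // overwriting it; the key/value classes must match the existing file.
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(IntWritable.class),
                SequenceFile.Writer.appendIfExists(true))) {
            writer.append(new Text("new-key"), new IntWritable(42));
        }
    }
}
```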
java - How do I pass Hadoop sequence file values to a Jackson parser?
I have a problem and I really don't know what to do about it. I have a Hadoop sequence file containing links to web pages. For each entry of the sequence file, the key is a web page's URL and the value is its attributes and links. The value is actually in JSON format. I want to read the whole sequence file and pass each value to a Jackson parser to extract the links, but it always fails. Here is my code:
The file "metadata-00000" is the original Hadoop sequence file. As you can see, the value really is in JSON format, and I want to analyze it with the Jackson parser. However, this line always fails:
The exception is:
So how should I handle this? How can I get a Writable value into the JSON parser? Thanks!
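This failure mode often comes from handing Jackson the Writable object or its serialized bytes rather than the decoded string. A minimal sketch, assuming the values are Text and reusing the file name from the question: convert each value with toString() and parse the result.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

public class SeqFileToJson {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        ObjectMapper mapper = new ObjectMapper();
        try (SequenceFile.Reader reader = new SequenceFile.Reader(conf,
                SequenceFile.Reader.file(new Path("metadata-00000")))) {
            Text key = new Text();
            Text value = new Text();
            while (reader.next(key, value)) {
                // Decode the Writable to a String before parsing it as JSON.
                // The "links" field name is an assumption for illustration.
                JsonNode root = mapper.readTree(value.toString());
                System.out.println(key + " -> " + root.path("links"));
            }
        }
    }
}
```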
java - I'm in trouble with K-Means using MapReduce (modified)
I think my code is not wrong, but it doesn't work correctly. This is K-Means clustering using MapReduce. (https://github.com/30stm/K-Means-using-mapreduce/tree/master)
Make a dataset using DatasetWriter.java and centroids using CreateCentroids.java, then execute KMeansClusteringJob.java.
This code works on the first iteration, but it doesn't work from the second iteration onward. I checked the map function and the reduce function, and I think the problem is in the reduce function. (The map function finds the closest centroid for each point; the reduce function calculates the new centroids and replaces the old ones.) After the first iteration, cen.seq (the centroid file) is incomplete.
Somebody help me ;)
P.S.: I wrote a question about the reduce code; my original problem is this one.
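For comparison, a minimal sketch of what the reduce step typically looks like (class name and value encoding are assumptions, not the poster's code): it averages all points assigned to a centroid and emits the new centroid for the next iteration.

```java
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Assumes 2-D points are shipped as comma-separated "x,y" Text values keyed
// by centroid id; real implementations often use custom Writables instead.
public class KMeansReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text centroidId, Iterable<Text> points, Context ctx)
            throws IOException, InterruptedException {
        double sumX = 0, sumY = 0;
        long count = 0;
        for (Text p : points) {
            String[] xy = p.toString().split(",");
            sumX += Double.parseDouble(xy[0]);
            sumY += Double.parseDouble(xy[1]);
            count++;
        }
        // The new centroid is the mean of all points in the cluster.
        ctx.write(centroidId, new Text((sumX / count) + "," + (sumY / count)));
    }
}
```

If the reducer's math is right, it is worth checking how cen.seq is rewritten between iterations: emitting centroids in a different format than the next map phase expects, or replacing the file before all reduce output is committed, would produce exactly the "incomplete after the first iteration" symptom described above.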
hadoop - Writing to a SequenceFile with Pig fails
I want to store some Pig variables into a Hadoop SequenceFile in order to run an external MapReduce job on them.
Suppose my data has a (chararray, int) schema:
I wrote this store function:
And this Pig script:
However, the store fails and I get this error:
Is there any way to solve this?
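For reference, a minimal sketch of a StoreFunc for a (chararray, int) schema (the class name is illustrative; this is not the poster's failing code). A common cause of this kind of store failure is a mismatch between the output key/value classes declared on the job and what putNext actually writes, so the sketch declares them explicitly:

```java
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.OutputFormat;
import org.apache.hadoop.mapreduce.RecordWriter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.pig.StoreFunc;
import org.apache.pig.data.Tuple;

public class SequenceFileStorage extends StoreFunc {
    private RecordWriter<Text, IntWritable> writer;

    @Override
    public OutputFormat getOutputFormat() {
        return new SequenceFileOutputFormat<Text, IntWritable>();
    }

    @Override
    public void setStoreLocation(String location, Job job) throws IOException {
        FileOutputFormat.setOutputPath(job, new Path(location));
        // Declare the key/value classes the SequenceFile will contain.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
    }

    @Override
    public void prepareToWrite(RecordWriter writer) {
        this.writer = writer;
    }

    @Override
    public void putNext(Tuple t) throws IOException {
        try {
            // Field 0 is the chararray, field 1 the int, per the schema above.
            writer.write(new Text((String) t.get(0)),
                         new IntWritable((Integer) t.get(1)));
        } catch (InterruptedException e) {
            throw new IOException(e);
        }
    }
}
```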
hadoop - Is it possible to check whether a file on HDFS is a SequenceFile without (mis)using exceptions?
I want to read specific SequenceFile content from HDFS in a client application. I can do that with a SequenceFile.Reader, and it works fine. But is it also possible to check whether a file is a SequenceFile at all, other than by analyzing the IOExceptions thrown when it is some other kind of file?
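One exception-free approach is to peek at the file header: every SequenceFile starts with the 3-byte magic "SEQ" followed by a version byte. A minimal sketch (the helper name is illustrative):

```java
import java.io.EOFException;
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SeqFileCheck {
    /** Returns true if the file starts with the SequenceFile magic "SEQ". */
    public static boolean isSequenceFile(FileSystem fs, Path path)
            throws IOException {
        byte[] header = new byte[3];
        try (FSDataInputStream in = fs.open(path)) {
            in.readFully(header);
        } catch (EOFException e) {
            return false; // shorter than the magic header, not a SequenceFile
        }
        return header[0] == 'S' && header[1] == 'E' && header[2] == 'Q';
    }
}
```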
hadoop - Converting a text file to sequence format in Spark Java
In Spark Java, how can I convert a text file into a sequence file? The following is my code:
I got the error below.
Does anyone have any ideas? Thanks!
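A minimal sketch of one way to do this with Spark's Java API (paths are illustrative; this is not the poster's failing code): pair each line with a NullWritable key and save via SequenceFileOutputFormat. Creating the Writables inside the lambda, rather than capturing them in the closure, avoids the NotSerializableException that Hadoop Writables otherwise cause:

```java
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

public class TextToSequenceFile {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("TextToSequenceFile");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Build (NullWritable, Text) pairs; the Writables are created per
        // record inside the lambda, not captured from the driver.
        JavaPairRDD<NullWritable, Text> pairs =
            sc.textFile("hdfs:///data/input.txt")
              .mapToPair(line -> new Tuple2<>(NullWritable.get(), new Text(line)));
        pairs.saveAsHadoopFile("hdfs:///data/output-seq",
            NullWritable.class, Text.class, SequenceFileOutputFormat.class);
        sc.stop();
    }
}
```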
c# - Sequence file format in Hadoop
Is there any option to write a Hadoop Distributed File System file as a sequence file using C# code? If so, could you suggest a link or other details?