Find centralized, trusted content and collaborate around the technologies you use most.
Teams
Q&A for work
Connect and share knowledge within a single location that is structured and easy to search.
据我了解,Spark 使用并行 IO 读取文件。该结论来自其他堆栈溢出响应。
我的问题是使用独立方法还是集体方法来触发读取数据?换句话说,是每个工作人员读取一组数据,还是工作人员相互通信并协作以有效地读取数据?
Each Apache Spark workers has Executors, Workers can be deployed as distributed or standalone mode. Each Worker process its own data that it processes. For more detail see this answer or this link
工作人员通过驱动程序进行通信每个工作人员处理自己的数据