
To give some context here, I have been following Project Loom for some time. I have read The State of Loom. I have done asynchronous programming.

Asynchronous programming (as provided by Java NIO) returns the thread to the thread pool when a task is waiting, and goes to great lengths not to block threads. This gives a big performance boost: we can now handle many more requests, because they are no longer directly bound by the number of OS threads. But what we lose here is context. A task is no longer associated with just one thread. Once we divorce tasks from threads, all of the context is lost. Exception stack traces do not provide very useful information, and debugging is difficult.

With Project Loom, the virtual thread becomes the single unit of concurrency, and you can now execute an entire request on a single virtual thread.

So far so good, but the article goes on to say about Project Loom:

A simple, synchronous web server will be able to handle many more requests without requiring more hardware.

What I don't understand is how we get a performance advantage with Project Loom over asynchronous APIs. Asynchronous APIs make sure not to leave any thread idle. So what does Project Loom do to be more efficient and performant than asynchronous APIs?

Edit

Let me rephrase the question. Say we have an HTTP server that takes in requests and performs some CRUD operations against a backing persistent database. Say this HTTP server handles a lot of requests, e.g. 100K RPM. Two ways of implementing it:

  1. The HTTP server has a dedicated thread pool. When a request comes in, a thread carries the task up until it reaches the DB, where the task has to wait for the response from the DB. At this point the thread is returned to the pool and goes on to do other tasks. When the DB responds, the task is again handled by some thread from the pool, and the HTTP response is returned.
  2. The HTTP server just spawns virtual threads for every request. If there is IO, the virtual thread simply waits for the task to complete, and then the HTTP response is returned. Basically, there is no pooling business going on for the virtual threads.
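The second option can be sketched with the virtual-thread executor that ships with Java 21; `fetchFromDb` below is a hypothetical stand-in for a blocking JDBC query, not a real API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadServerSketch {
    // Hypothetical stand-in for a blocking JDBC query.
    static String fetchFromDb(int id) {
        try {
            Thread.sleep(50); // blocking here parks the virtual thread, not an OS thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "row-" + id;
    }

    public static void main(String[] args) {
        // One virtual thread per request; no pooling of virtual threads (Java 21+).
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 3; i++) {
                int requestId = i;
                executor.submit(() -> {
                    String row = fetchFromDb(requestId); // just wait; the carrier is freed
                    System.out.println("response " + requestId + ": " + row);
                });
            }
        } // close() waits for the submitted tasks before returning
    }
}
```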

Given that the hardware and the throughput remain the same, would either solution fare better than the other in terms of response times or handling more throughput?

My guess is that there would not be any difference in performance.


3 Answers


We don't gain a benefit over asynchronous APIs. What we potentially gain is performance similar to asynchronous code, but written as synchronous code.

Answered 2020-08-12T06:34:39.337

@talex put it crisply. Adding to it further.

Loom is more about a native concurrency abstraction, which additionally helps one write asynchronous code. Given that it is a VM-level abstraction, rather than just a code-level one (like what we have been doing until now with CompletableFuture etc.), it lets one implement asynchronous behavior but with reduced boilerplate.

With Loom, a more powerful abstraction is the savior. We have seen repeatedly how abstraction with syntactic sugar lets one write programs effectively, whether it was FunctionalInterfaces in JDK 8 or for-comprehensions in Scala.

With Loom, there isn't a need to chain multiple CompletableFutures (to save on resources); one can write the code synchronously. With each blocking operation encountered (ReentrantLock, I/O, JDBC calls), the virtual thread gets parked. And because these are lightweight threads, the context switch is much cheaper, distinguishing them from kernel threads.

When blocked, the actual carrier thread (the one that was running the run-body of the virtual thread) gets engaged in executing some other virtual thread's run. So effectively the carrier thread is not sitting idle but executing other work, and it comes back to continue the execution of the original virtual thread whenever it is unparked. Just like how a thread pool would work, except that here a single carrier thread in effect executes the body of multiple virtual threads, switching from one to another when they block.
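A minimal sketch of this parking behavior, using the `Thread.ofVirtual()` builder from Java 21 (the thread name is arbitrary):

```java
public class CarrierSketch {
    public static void main(String[] args) throws InterruptedException {
        // Start a virtual thread; when it blocks in sleep(), the runtime
        // parks it, and its carrier thread is free to run other virtual threads.
        Thread vt = Thread.ofVirtual().name("request-1").start(() -> {
            try {
                Thread.sleep(10); // parks the virtual thread, not the carrier
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        vt.join();
        System.out.println("virtual: " + vt.isVirtual());
    }
}
```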

We get the same behavior (and hence performance) as manually written asynchronous code, while avoiding the boilerplate of doing the same thing by hand.
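To make the "same behavior, less boilerplate" point concrete, here is a sketch of the same two-step handler written both ways; `readFromDb` is a hypothetical stand-in for a blocking JDBC read:

```java
import java.util.concurrent.CompletableFuture;

public class StyleComparison {
    // Hypothetical stand-in for a blocking JDBC read.
    static String readFromDb(int id) {
        return "row-" + id;
    }

    // Asynchronous style: stages chained so no platform thread blocks.
    static CompletableFuture<String> handleAsync(int id) {
        return CompletableFuture.supplyAsync(() -> readFromDb(id))
                .thenApply(String::toUpperCase)
                .thenApply(result -> "response:" + result);
    }

    // Loom style: the same steps written top to bottom. Run on a virtual
    // thread, the blocking read parks the virtual thread, not its carrier.
    static String handleSync(int id) {
        String row = readFromDb(id);
        return "response:" + row.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(handleAsync(7).join());
        System.out.println(handleSync(7));
    }
}
```

Both produce the same result; the synchronous version keeps an ordinary stack trace through all the steps.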


Consider the case of a web framework, where there is a separate thread pool to handle I/O and another for the execution of HTTP requests. For simple HTTP requests, one might serve the request from the http-pool thread itself. But if there are any blocking (or high-CPU) operations, we let this activity happen on a separate thread asynchronously.

This thread would collect the information from an incoming request, spawn a CompletableFuture, and chain it with a pipeline (read from the database as one stage, followed by computation from it, followed by another stage to write back to the database, web service calls, etc.). Each one is a stage, and the resultant CompletableFuture is returned back to the web framework.

When the resultant future is complete, the web framework uses the result to be relayed back to the client. This is how Play Framework and others have been dealing with it, providing an isolation between the HTTP thread-handling pool and the execution of each request. But if we dive deeper into this, why is it that we do this?

One core reason is to use the resources effectively, particularly around blocking calls. Hence we chain with thenApply etc., so that no thread is blocked on any activity, and we do more with fewer threads.
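A sketch of such a pipeline, with hypothetical stage names standing in for real database and web-service calls:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RequestPipeline {
    // Hypothetical stages; a real framework would issue JDBC / HTTP calls here.
    static String readFromDb(String key) { return "value-of-" + key; }
    static int compute(String row)       { return row.length(); }
    static String writeBack(int result)  { return "stored:" + result; }

    // The future handed back to the web framework: it completes when the
    // last stage finishes, without holding an http-pool thread in between.
    static CompletableFuture<String> handle(String key, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> readFromDb(key), pool)
                .thenApply(RequestPipeline::compute)
                .thenApply(RequestPipeline::writeBack);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(handle("user42", pool).join());
        } finally {
            pool.shutdown();
        }
    }
}
```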

This works great, but it is quite verbose. Debugging is indeed painful, and if one of the intermediary stages results in an exception, the control flow goes haywire, resulting in further code to handle it.

With Loom, we write synchronous code, and let someone else decide what to do when it is blocked, rather than sleeping and doing nothing.

Answered 2020-12-09T12:47:15.260
  1. The HTTP server has a dedicated thread pool.... How big of a thread pool? (Number of CPUs) * N + C? With N > 1 one can fall back to anti-scaling, as lock contention extends latency; whereas N = 1 can under-utilize available bandwidth. There is a good analysis here.

  2. The HTTP server just spawns.... That would be a very naive implementation of this concept. A more realistic one would strive to gather from a dynamic pool that kept one real thread for every blocked system call plus one for every real CPU. At least that is what the folks behind Go came up with.

The crux is to keep the {handlers, callbacks, completions, virtual threads, goroutines : all PEAs in a pod} from contending for internal resources; thus they do not lean on system-based blocking mechanisms until absolutely necessary. This falls under the banner of lock avoidance, and might be accomplished with various queuing strategies (see libdispatch), etc. Note that this leaves the PEAs divorced from the underlying system threads, because they are internally multiplexed between them. This is your concern about divorcing the concepts. In practice, you pass around your favourite language's abstraction of a context pointer.

As (1) indicates, there are tangible results that can be directly tied to this approach, and a few intangibles. Locking is easy: you just make one big lock around your transactions and you are good to go. That doesn't scale; but fine-grained locking is hard. Hard to get working, and hard to choose the fineness of the grain. When to use {locks, CVs, semaphores, barriers, ...} is obvious in textbook examples, a little less so in deeply nested logic. Lock avoidance makes that, for the most part, go away, and be limited to contended leaf components like malloc().
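The "one big lock" pattern mentioned above can be sketched as follows; the account balances and transfer method are hypothetical illustrations:

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockGranularity {
    // Coarse-grained: one big lock around the whole transaction.
    // Easy to get right, but every transfer serializes on this single lock.
    private static final ReentrantLock bigLock = new ReentrantLock();
    private static int balanceA = 100;
    private static int balanceB = 100;

    static void transferCoarse(int amount) {
        bigLock.lock();
        try {
            balanceA -= amount;
            balanceB += amount;
        } finally {
            bigLock.unlock();
        }
    }

    static int getBalanceA() { return balanceA; }
    static int getBalanceB() { return balanceB; }

    public static void main(String[] args) {
        transferCoarse(10);
        System.out.println(getBalanceA() + " / " + getBalanceB());
    }
    // A fine-grained variant would need one lock per account plus a
    // consistent acquisition order to avoid deadlock; that is exactly the
    // part that is hard to get right in deeply nested logic.
}
```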

I maintain some skepticism, as the research typically shows a poorly scaled system, which is transformed into a lock-avoidance model, then shown to be better. I have yet to see one that unleashes some experienced developers to analyze the synchronization behavior of the system, transform it for scalability, and then measure the result. But even if that were a win, experienced developers are a rare(ish) and expensive commodity; the heart of scalability is really financial.

Answered 2020-08-18T12:03:43.767