Generally speaking, there is no need to worry about the distribution of the response. You are showing only a bivariate plot, so it is quite possible that the multi-modality is explained by X2 (or by other variables that are missing from the model).
It is the distribution of the model residuals that matters (if it matters at all).
If the residuals are non-normal, then certain inferences (for example, p-values and confidence intervals in small samples) may be invalid, although this may not be a problem at all if the model is used only for prediction.
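
To make the point about residuals concrete, here is a minimal simulation sketch in Python with NumPy. Everything in it is an assumption made for illustration (the sample size, the coefficients, and X2 being a binary variable); it is not based on your data. Excess kurtosis is used only as a crude summary of shape: it is about 0 for a normal distribution and strongly negative for a well-separated two-peaked one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical data: x2 is a binary variable that shifts the response a lot,
# which makes the marginal distribution of y bimodal.
x1 = rng.normal(size=n)
x2 = rng.integers(0, 2, size=n)
y = 1.0 + 0.5 * x1 + 6.0 * x2 + rng.normal(size=n)

# Ordinary least squares with an intercept, x1 and x2.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta


def excess_kurtosis(v):
    """Sample excess kurtosis (roughly 0 for normally distributed data)."""
    z = (v - v.mean()) / v.std()
    return float(np.mean(z**4) - 3.0)


kurt_y = excess_kurtosis(y)          # response: two peaks, far from normal
kurt_resid = excess_kurtosis(resid)  # residuals: close to normal

print(f"excess kurtosis of y:         {kurt_y:.2f}")
print(f"excess kurtosis of residuals: {kurt_resid:.2f}")
```

In this simulated setting the response is clearly bimodal, yet the residuals look unremarkable once X2 is in the model. In practice you would look at a histogram or Q-Q plot of the residuals rather than a single summary number.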
If you really do have a curvilinear association then you could consider:
- transformations
- non-linear terms
- splines
- generalised additive models (GAMs)
- non-linear models
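
As a sketch of the simplest of these options, a non-linear (quadratic) term, the following Python/NumPy example compares a straight-line fit with a fit that adds a squared term. The data are simulated and the true curve, noise level and sample size are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Hypothetical curvilinear relationship: y = 1 + 2x - 0.5x^2 + noise.
x = rng.uniform(0, 6, size=n)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.5, size=n)


def fit_ols(X, y):
    """Return least-squares coefficients and R^2 for design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - resid.var() / y.var()
    return beta, float(r2)


# Model 1: intercept + x (straight line).
X_lin = np.column_stack([np.ones(n), x])
beta_lin, r2_lin = fit_ols(X_lin, y)

# Model 2: intercept + x + x^2 (adds a non-linear term; still a linear model
# in the coefficients, so ordinary least squares applies).
X_quad = np.column_stack([np.ones(n), x, x**2])
beta_quad, r2_quad = fit_ols(X_quad, y)

print(f"R^2 straight line:  {r2_lin:.3f}")
print(f"R^2 with x^2 term:  {r2_quad:.3f}")
print("quadratic coefficients:", np.round(beta_quad, 2))
```

Splines and GAMs follow the same idea with a more flexible set of basis functions for x, at the cost of more parameters to estimate.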
Of course, if the underlying problem is that you have missing explanatory variables, then some of these approaches may lead to an overfitted model.
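
A rough way to see this risk is to simulate a response that is linear in X1 but also depends on an omitted X2, then compare a straight line with a very flexible fit in X1 alone on held-out data. The sketch below uses Python with NumPy; the degree-10 polynomial stands in for "a very flexible model", and all the numbers are assumptions for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)


def simulate(n):
    """Hypothetical data: y depends on x1 and on a binary x2 that we 'forget'."""
    x1 = rng.uniform(-1, 1, size=n)
    x2 = rng.integers(0, 2, size=n)
    y = x1 + 4.0 * x2 + rng.normal(size=n)
    return x1, y  # x2 is deliberately not returned: it is the missing variable


x_train, y_train = simulate(30)
x_test, y_test = simulate(1000)


def mse(model, x, y):
    """Mean squared prediction error of a fitted polynomial."""
    return float(np.mean((y - model(x)) ** 2))


# Both models use x1 only; the flexible one can chase the scatter caused by x2.
linear = Polynomial.fit(x_train, y_train, deg=1)
flexible = Polynomial.fit(x_train, y_train, deg=10)

train_mse_linear = mse(linear, x_train, y_train)
train_mse_flexible = mse(flexible, x_train, y_train)
test_mse_linear = mse(linear, x_test, y_test)
test_mse_flexible = mse(flexible, x_test, y_test)

print(f"train MSE  linear: {train_mse_linear:.2f}  flexible: {train_mse_flexible:.2f}")
print(f"test MSE   linear: {test_mse_linear:.2f}  flexible: {test_mse_flexible:.2f}")
```

The flexible fit always does at least as well in-sample, because the straight line is a special case of it. The held-out error is the figure to look at: in a setup like this it will typically be worse for the flexible fit, since the extra wiggles are fitted to variation that really belongs to the missing variable.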