You ask two different questions: learning and performance.
- It took me about a month to become comfortable with functional programming using recursion, pattern matching,
map
, filter
, and fold
. I did all that with ML but it translated to Haskell very easily.
- It took me two or three years to wrap my head around monads, but that's because I read the wrong stuff. I think there are better tutorials now. But if you're beginning, avoid monads for a while.
- It took me several months to get good at creating new type classes, but using the existing ones was easy.
- I'm still not sure I have the hang of lazy evaluation. But I love Haskell's purity and tend to treat lazy evaluation as an unhappy accident that only a few people (like John Hughes) know how to exploit.
You've observed a performance problem only because you've adapted an algorithm loaded with mutation, which Tony Hoare designed for imperative languages, and tried to translate into Haskell. In Haskell as in any other functional language the expensive operation is allocation. Try writing a merge sort and you'll find it's simple and performs very well.
How do you avoid making similar mistakes in the future? Have a look at Chris Okasaki's book Purely Functional Data Structures. Great book, and it will help you learn the 'functional way of doing things' without giving up performance.