Sorry for the lousy title, but let me explain the problem I'm having. I'm working on a project that includes a search engine for addresses, which I have in Elasticsearch. Each time a new character is entered in my search bar, I run a fuzzy_like_this_field query to generate autocomplete results and try to "guess" which of the (~1 million) addresses the user is typing.
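For context, the query body I'm sending looks roughly like this (the field name `address` and `max_query_terms` value are just illustrative, not my exact settings):

```json
{
  "size": 200,
  "query": {
    "fuzzy_like_this_field": {
      "address": {
        "like_text": "100 broad",
        "max_query_terms": 12
      }
    }
  }
}
```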
I currently put a size limit on the query, because returning all of the results was both unnecessary and expensive, time-wise. The problem is that I often don't get the "correct" result unless I return 1000 or more results. For example, if I enter "100 broad" while searching for "100 broadway" and return only 200 results (about the most I can do without it taking too long), "100 broadway" is nowhere to be found, even though every returned result has a higher Levenshtein distance than the result I want. If I return 2000 results, "100 broadway" comes back as the first result, but the query takes too long. I can't even re-rank the returned results client-side to bring the correct one to the top, because it simply isn't in the response.
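To illustrate the client-side re-ranking I mean: a plain-Python Levenshtein distance used to sort whatever hits come back, closest match first. (The `hits` list here is a stand-in for the address strings pulled out of the ES response; this is a sketch, not my production code.)

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

# Re-rank the hits that came back, smallest edit distance first.
hits = ["100 brook ave", "1000 broad st", "100 broadway"]
query = "100 broad"
ranked = sorted(hits, key=lambda h: levenshtein(query, h))
```

The catch, of course, is that this sort only helps if the correct address is among the returned hits in the first place.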
Shouldn't a size limit of N on the query return the best N results, rather than a seemingly random subset of them?
Sorry if this is poorly worded or too vague.