0

I have 50-100mb dataset that users need to have access to. It's static, so doesn't make sense to host a server for it. There are two kinds of operations I'll perform on the data:

  1. Reading objects by unique ObjectId. Each object is ~3kb.
  2. Full text search through ~300.000 strings. Each string is 4-60 characters.

I'm considering to store data as JSON files. The 300k strings will be stored separately. I'll use https://github.com/nextapps-de/flexsearch or something similar to perform search over it. I've done something similar before with ~10mb dataset back in 2016. I used just regex search and it was working flawlessly.

Are there reasons to use RealmDB, SQLite, PouchDB or something else instead of just JSON?

4

3 回答 3

4

I wish I did this question an year ago...

In the office I currently work we tried creating an app by using PouchDB and react native, we basically saw PouchDB as an advantage because it wouldn't require our API to send all data over and over again on every refresh triggered by the user, it would only send the data that changed based on the client's checkpoint. As the data in the server was quite heavy (around 6k entries with more than 200 attributes each) we tried at all costs to go easy on the client's data plan.

Months after this implementation was in place we implemented a search functionality with many different options for sorting and filtering, and not only we had to throw away all our implementation of PouchDB, but we had to start from scratch replacing all its logic with indexed JSON values. PouchDB performance was extremely slow, it was taking more than 5 seconds or so to retrieve results, and we just couldn't afford to delay this time on our scope.

In the end we accomplished to reach a very quick search running flex search inside our indexed JSONs. Don't do the same mistake we did, PouchDB costed us too much budget and precious time. It was a terrible choice.

Unfortunately I cannot offer proof or more details from a reputable source, I can only share the own personal terrible experience I had when I thought we were reaching the end of a project and we had to start from scratch. it was a mess.

于 2020-04-20T16:02:54.637 回答
3

Oh boy, a bountied, opinion based question!

I have about 5 years experience with pouchDB specifically, a little with SQLite. I have but a cursory experience with RealmDB - I tried it out and decided it was not a good fit for my hybrid/mobile needs.

pouchDB exceeds in on one area hands down - synchronization/replication just like it's big brother CouchDB. Providing interaction with an offline database that synchronizes with a remote database is huge for many mobile apps. pouchDB is schemaless, leveraging JSON documents. With pouchDB one may choose among several data stores via adapters. As there can be quota headaches1 for your data size the right choice may likely be the SQLite adapter. pouchDB does not support full text search.

SQLite is what its name implies - a relational database, requiring a schema. An advantage to SQLite is platform support and the size of the database is not subject to quota headaches like web storage (e.g. IndexedDB). SQLite supports full text search, and apps can deploy with a canned database.

Between pouchDB and SQLite lies RealmDB - it is a schema based object database that supports synchronization/replication. Like pouchDB, it does not support full text search.

Now your requirements

  • Looking up object by id
  • 300k static text
  • full-text search

I read 'static' to mean immutable.

Since your data does not change and full-text search is required, pouchDB and RealmDB would not be good choices. If there is a requirement to enhance, remove or add to the data, either would make sense as changes to data on a single server would replicate changes to the local database, practically in a seamless fashion.

SQLite might be a reasonable choice since it supports search and it is possible to deploy a canned database with the app. However, SQLite can be slow in hybrid apps.

So,

  • pouchDB and RealmDB would be massive overkill and not a good fit.
  • SQLite would add a fair bit of complexity.

For your specific requirements I'd stay on your path, though I have a care as it appears flexsearch loads its index into memory - if its performance returns some penalty then SQLite, with it's ability to deploy a canned database and providing a search facility may prove a reasonable trade off versus complexity.

Good luck!


1 Quota Headaches

于 2020-04-20T20:20:23.507 回答
1

I would say it really just depends on whether you want and NEED to leverage the power of relational queries. Because your data is never changing I would use JSON unless you are trying to perform complex comparisons between your data. In your case it sounds like you are just going to be searching for the particular ObjectId so JSON is your best bet especially because you are saying you won't need to change the data later.

If you organize your JSON so that your ObjectId are in a sorted order you will easily be able to search quickly.

于 2020-04-19T23:28:08.887 回答