
Experimented with MongoDB

December 13, 2011

Today, I experimented with MongoDB (the name comes from “humongous”), which is a NoSQL database. Yes, they call it schema-free and document-oriented: it stores JSON-like documents rather than fixed rows and columns. I had a fantastic first-hand experience working with it.

I created a MongoDB database on a high-spec Unix server and inserted 70,000,000 documents into a collection. The database grew to around 2.1 GB, which is not good for this kind of data.

A single BSON document (BSON is a binary encoding of JSON-like documents) is limited to 16 MB, so files larger than that are stored using GridFS, a specification for storing and retrieving large files by splitting them into chunks.
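To make the chunking idea concrete, here is a minimal Python sketch that models, without any server, how GridFS lays a large file out as one metadata document (in `fs.files`) plus numbered chunk documents (in `fs.chunks`, with `files_id`, `n` and `data` fields). The helper names are hypothetical, the metadata document is trimmed down (real drivers also store fields such as `uploadDate`), and the 255 KiB chunk size is an assumption based on current drivers; older releases used 256 KiB.

```python
# Assumed default chunk size (255 KiB); it only changes how many chunks you get.
CHUNK_SIZE = 255 * 1024


def split_into_gridfs_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Hypothetical helper: return a trimmed fs.files-style document and a
    list of fs.chunks-style documents. It only models the layout."""
    chunks = [
        {"files_id": file_id, "n": i, "data": data[start:start + chunk_size]}
        for i, start in enumerate(range(0, len(data), chunk_size))
    ]
    files_doc = {"_id": file_id, "length": len(data), "chunkSize": chunk_size}
    return files_doc, chunks


def reassemble(chunks):
    """Reading a file back = fetch its chunks ordered by "n" and concatenate."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


# Usage: a 20 MiB payload, too big for a single 16 MB BSON document.
data = bytes(range(256)) * (20 * 4096)
files_doc, chunks = split_into_gridfs_chunks("example-file", data)
print(files_doc["length"], len(chunks))
```

The 20 MiB file ends up as 81 chunk documents, each small enough to be stored as an ordinary BSON document.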

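Data is read and written through memory-mapped files, and this is where its limitation on 32-bit systems comes from: all mapped data has to fit into the process’s virtual address space, which limits 32-bit builds to roughly 2 GB of data.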

Probably the best thing was how easily I created a schema-free database in less than 15 minutes. Without any indexing, I was able to retrieve a document (row) based on a field (column) value in around 2300 ms.
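A time like 2300 ms is what you would expect from a collection scan: without an index, the server has to look at every document to answer the query. The toy Python model below (hypothetical helpers, with a plain dict standing in for MongoDB’s B-tree indexes) counts how many documents each approach examines. On a real server you would add the index with `create_index` in the Python driver and check the query plan with `explain()`.

```python
from collections import defaultdict


def find_by_scan(docs, field, value):
    """No index: every document must be examined (a collection scan)."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Hypothetical single-field index: field value -> document positions."""
    index = defaultdict(list)
    for pos, doc in enumerate(docs):
        index[doc.get(field)].append(pos)
    return index


def find_by_index(docs, index, value):
    """With an index, only the matching documents are fetched."""
    positions = index.get(value, [])
    return [docs[p] for p in positions], len(positions)


# Usage: 100,000 documents spread over 1,000 users.
docs = [{"user": f"u{i % 1000}", "seq": i} for i in range(100_000)]
scan_hits, scan_examined = find_by_scan(docs, "user", "u42")
index = build_index(docs, "user")
idx_hits, idx_examined = find_by_index(docs, index, "u42")
```

Both approaches return the same 100 documents, but the scan touches all 100,000 while the index lookup touches only the 100 matches.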
MongoDB also has capped collections: fixed-size collections that preserve insertion order and overwrite their oldest documents once they are full.
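The behavior of a capped collection can be sketched in memory with a bounded `deque`. This is only an analogy under a stated simplification: real capped collections are bounded by a byte size (and optionally a maximum document count), while this hypothetical `CappedLog` class bounds only the document count. With a real server, PyMongo creates one via `create_collection(name, capped=True, size=...)`.

```python
from collections import deque


class CappedLog:
    """Hypothetical stand-in for a capped collection, bounded by document count."""

    def __init__(self, max_docs):
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        # Appending to a full deque drops the oldest entry, mirroring how a
        # capped collection overwrites its oldest documents.
        self._docs.append(doc)

    def find(self):
        # Natural order is insertion order.
        return list(self._docs)

    def find_newest_first(self):
        return list(reversed(self._docs))


# Usage: room for 3 documents, 5 inserted; the two oldest are gone.
log = CappedLog(max_docs=3)
for i in range(5):
    log.insert({"event": i})
```

Because eviction is purely by age, this fits logs and recent-activity feeds where only the latest entries matter.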

I dug into this to build a User Journey graph for an application, so that I can capture some of the prominent user inputs and later represent them on a timeline to analyze user behavior, usage complexity and time-based user workflow.
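A rough sketch of how such a journey could be modeled: one small document per captured input, grouped per user and ordered by time, with consecutive actions turned into graph edges. The field names (`user`, `action`, `ts`) and the sample events are made up for illustration; in MongoDB the grouping and ordering would typically come from a query on the user field sorted by the timestamp field.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event documents: one per captured user input.
events = [
    {"user": "alice", "action": "search", "ts": datetime(2011, 12, 13, 10, 5)},
    {"user": "bob", "action": "login", "ts": datetime(2011, 12, 13, 9, 58)},
    {"user": "alice", "action": "login", "ts": datetime(2011, 12, 13, 10, 0)},
    {"user": "alice", "action": "checkout", "ts": datetime(2011, 12, 13, 10, 12)},
]


def build_journeys(events):
    """Group events per user and order each user's events by timestamp."""
    journeys = defaultdict(list)
    for e in events:
        journeys[e["user"]].append(e)
    for user_events in journeys.values():
        user_events.sort(key=lambda e: e["ts"])
    return dict(journeys)


def journey_edges(user_events):
    """Consecutive actions become graph edges, labeled with the gap in seconds."""
    return [
        (a["action"], b["action"], (b["ts"] - a["ts"]).total_seconds())
        for a, b in zip(user_events, user_events[1:])
    ]


journeys = build_journeys(events)
alice_edges = journey_edges(journeys["alice"])
```

The time gaps on the edges are what make the later timeline analysis possible.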

MongoDB uses the operating system’s virtual memory system to handle writing data back to disk. MongoDB maps the entire database into the OS’s virtual memory and lets the OS decide which pages are ‘hot’ and need to be in physical memory, and which pages are ‘cold’ and can be written out to disk. MongoDB also has a periodic process that makes sure the database on disk is synced with the database in memory. (MongoDB also includes a journaling system to make sure you don’t lose data.)
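The same mechanism is available to any program. This small Python sketch uses the standard `mmap` module on a scratch file: the update is an ordinary memory write, and `flush()` forces dirty pages to disk, roughly what MongoDB’s periodic sync does for its data files. The function name and the 4096-byte file size (a common page size) are illustrative choices, not anything from MongoDB.

```python
import mmap
import os
import tempfile


def write_through_mmap(path, offset, payload):
    """Update part of a file by writing into a memory mapping of it."""
    with open(path, "r+b") as f:
        # Length 0 maps the whole file (which must not be empty).
        with mmap.mmap(f.fileno(), 0) as mm:
            # A plain memory write: the page becomes dirty in RAM and the
            # OS chooses when to write it back to the file...
            mm[offset:offset + len(payload)] = payload
            # ...unless we force it, which is what a periodic sync does.
            mm.flush()


# Usage: a one-page scratch file standing in for a data file.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * 4096)
os.close(fd)

write_through_mmap(path, 100, b"hello")
with open(path, "rb") as f:
    data = f.read()
os.remove(path)
```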

What this means is that behavior differs greatly depending on whether the particular page you’re reading or writing is in memory. If it is not in memory, this is called a page fault, and the query or update has to wait for the very slow operation of loading the necessary data from disk. If the data is in memory (the OS has decided it’s a ‘hot’ page), then an update is literally just a write to system RAM, a very fast operation. This is why 10gen (the makers of MongoDB) recommends that, if at all possible, you have enough RAM on your server to keep the entire working set (the set of data you access frequently) in memory at all times.
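Why the working set matters so much can be shown with a toy page-cache simulation. This Python sketch assumes strict LRU eviction, a simplification (real kernels use approximations of LRU), and counts page faults for a repeated sweep over 100 pages with different amounts of “RAM”. The function name and the access pattern are hypothetical.

```python
from collections import OrderedDict


def count_page_faults(accesses, ram_pages):
    """Simulate a page cache of `ram_pages` pages with LRU eviction and
    count accesses that miss (page faults)."""
    cache = OrderedDict()
    faults = 0
    for page in accesses:
        if page in cache:
            cache.move_to_end(page)  # hot page: mark as recently used
        else:
            faults += 1  # cold page: must be loaded from disk
            cache[page] = True
            if len(cache) > ram_pages:
                cache.popitem(last=False)  # evict least recently used page
    return faults


# Usage: 100 distinct pages, each touched 10 times in a cyclic pattern.
working_set = list(range(100)) * 10
faults_fit = count_page_faults(working_set, ram_pages=128)
faults_tight = count_page_faults(working_set, ram_pages=90)
```

With room for all 100 pages, only the first touch of each page faults (100 faults). Shrink the cache to 90 pages and this cyclic pattern, a worst case for LRU, faults on every one of the 1,000 accesses.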

Next, I need to look at indexing, sharding, replication, scalability and memory constraints in MongoDB.
But it was a really good first-hand experience.

Coming back to this after two years, the pertinent question is: is proper schema modeling more important now with document-based databases?

Is time-ordered or time-series data a good fit for MongoDB?
