MongoDB Tutorial 1 - Introduction
To run mongo shell commands stored in a JavaScript file:
cat source.js | mongo # or
mongo < source.js # or just
mongo source.js
To import/export data:
$ mongoimport -d <database> -c <collection> --file <file>
$ mongoimport -d <database> -c <collection> < file.json
$ mongoexport -d <database> -c <collection> --out file.json
$ mongorestore -d <database> -c <collection> file.bson
# by default mongodump writes BSON files to dump/ in the current directory
$ mongodump -d <database> -c <collection> [--out <path>]
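By default mongoimport reads one JSON document per line. As a minimal sketch of preparing such a file, the plain Python below writes two sample documents to file.json in the system temp directory. The function name write_json_lines, the sample movies, and the output location are all assumptions made for illustration.

```python
import json
import os
import tempfile


def write_json_lines(path, documents):
    """Write one JSON document per line and return how many were written."""
    with open(path, "w") as handle:
        for document in documents:
            handle.write(json.dumps(document) + "\n")
    return len(documents)


# Hypothetical sample data, not taken from the course material.
movies = [
    {"title": "Jaws", "year": 1975, "imdb": "tt0073195"},
    {"title": "Alien", "year": 1979},
]

out_path = os.path.join(tempfile.gettempdir(), "file.json")
count = write_json_lines(out_path, movies)
print("wrote %d documents to %s" % (count, out_path))
```

The resulting file could then be passed to mongoimport with the commands shown above. Note that the two documents do not need the same fields.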
What is MongoDB?
MongoDB is a document-based NoSQL database that stores JSON-like (JavaScript Object Notation) documents. One important advantage is that a common data access pattern can be served by a single query, without joins. MongoDB does not support SQL-style joins in ordinary queries (the aggregation $lookup stage, added in 3.2, is the limited exception), which makes it easier to shard and scale out. Joins and multi-table transactions are difficult to run in parallel across machines, so databases that rely on them tend to scale up instead, onto a single expensive server.
Application Architecture
The mongo shell and the language drivers connect to the mongod server process over TCP. The course builds a blog website with MongoDB as the datastore. The Java course uses SparkJava and Freemarker. The Python course uses Bottle and its simple template engine. The drivers are the MongoDB Java driver and pymongo.
JSON and BSON Documents
For more details on the JSON standard, read here. The general format is key-value pairs in the form { key : value }. Each key must be a string, followed by a colon (:) and the corresponding value. Fields are separated by commas (,). Value types are string, number, boolean, null, array, and object; arrays and objects can nest recursively.
You can find the BSON spec here. MongoDB actually stores data in BSON, a binary JSON format. MongoDB drivers send and receive data as BSON and map it to language-appropriate data types. BSON is designed to be lightweight, traversable (for writing, reading, and indexing), and efficient (quick to encode and decode).
BSON supports more data types than JSON:
- number (int32, int64, double)
- date
- binary (can hold images, for example)
How JSON documents are encoded as BSON:
//JSON
{ "hello" : "world" }
//BSON
"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
// document length, value type, field name, string length, null terminators, etc.
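To make the byte layout above concrete, here is a small sketch that builds the same BSON bytes by hand with Python's struct module. It handles only flat documents whose values are strings, and the helper names encode_string_element and encode_document are made up for this illustration; in practice the driver does this encoding for you.

```python
import struct


def encode_string_element(name, value):
    """Encode one string field: type byte, null-terminated name, length-prefixed value."""
    name_bytes = name.encode("utf-8") + b"\x00"
    value_bytes = value.encode("utf-8") + b"\x00"
    # The little-endian int32 counts the value's bytes including its null terminator.
    return b"\x02" + name_bytes + struct.pack("<i", len(value_bytes)) + value_bytes


def encode_document(fields):
    """Encode a flat dict of string values as a BSON document."""
    body = b"".join(encode_string_element(k, v) for k, v in fields.items())
    # Total length covers the 4-byte length itself, the body, and the final null byte.
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"


encoded = encode_document({"hello": "world"})
print(repr(encoded))
print(len(encoded))  # 22 bytes, which is 0x16
```

The leading \x16 is therefore the total document length (22 bytes), \x02 marks a string value, and \x06 is the length of "world" plus its null terminator.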
Installing MongoDB
Download MongoDB from here. A tip for the Linux versions: after extracting the tarball, you can simply copy the executables from its bin folder into your virtualenv’s path/to/venv/bin folder, assuming you are using pymongo in a virtualenv with Python 2. Alternatively, copy them to /usr/local/bin, as suggested, for global use.
$ tar xvf mongodb-linux-x86_64-ubuntu1404-3.2.6.tgz
$ cp mongodb-linux-x86_64-ubuntu1404-3.2.6/bin/* path/to/venv/bin/
$ cd path/to/venv/
$ source bin/activate
$ sudo mkdir -p /data/db
$ sudo chmod 777 /data
$ sudo chmod 777 /data/db
(venv) $ mongod
# in another terminal window
(venv) $ mongo
MongoDB shell version: 3.2.6
connecting to: test
> db.names.insert({'name':'Andrew Erlicson'})
WriteResult({ "nInserted" : 1 })
> db.names.find()
{ "_id" : ObjectId("5778642550b9dd3f38d82b4e"), "name" : "Andrew Erlicson" }
On Windows, download the MSI installer and install as directed. Add the MongoDB bin folder (C:\Program Files\MongoDB\Server\3.2\bin) to PATH.
CRUD Operations
In mongo shell,
> help //list of mongo commands
> show dbs
local 0.070GB
test 0.001GB
> show collections
names
> // video.movies refers to video database movies collection
> db.names.find() // global variable db refers to current database
> use video // MongoDB creates the database lazily, when data is first inserted
> db.movies.insertOne({"title": "Jaws", "year": 1975, "imdb": "tt0073195"})
{ "acknowledged" : true, "insertedId" : ObjectId("5778b5782430a299a54686b5") }
// MongoDB adds an _id field if none is specified
> db.movies.find()
{ "_id" : ObjectId("5778b5782430a299a54686b5"),
"title" : "Jaws", "year" : 1975, "imdb" : "tt0073195" }
> db.movies.find({}).pretty()
{
"_id" : ObjectId("5778b5782430a299a54686b5"),
"title" : "Jaws",
"year" : 1975,
"imdb" : "tt0073195"
}
> var c = db.movies.find() // returns a cursor
> c.hasNext()
true
> c.next()
{
"_id" : ObjectId("5778b5782430a299a54686b5"),
"title" : "Jaws",
"year" : 1975,
"imdb" : "tt0073195"
}
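The hasNext()/next() calls above follow a simple look-ahead pattern. As a rough illustration only, the toy class below mimics that behaviour over an in-memory list. ToyCursor is a hypothetical name, not part of any driver, and a real MongoDB cursor fetches its results from the server in batches rather than from a local list.

```python
class ToyCursor:
    """In-memory stand-in that mimics the shell cursor's hasNext()/next() calls."""

    def __init__(self, documents):
        self._iterator = iter(documents)
        self._buffered = []  # holds at most one look-ahead document

    def has_next(self):
        # Pull one document ahead so we can answer without consuming it.
        if not self._buffered:
            try:
                self._buffered.append(next(self._iterator))
            except StopIteration:
                return False
        return True

    def next(self):
        if not self.has_next():
            raise StopIteration("cursor exhausted")
        return self._buffered.pop()


c = ToyCursor([{"title": "Jaws", "year": 1975, "imdb": "tt0073195"}])
while c.has_next():
    print(c.next())
```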
Example Project Blog Site
Relational model for the blog. Fully normalized, we will need six tables.
posts | comments | tags | post-tags | post-comments | authors |
---|---|---|---|---|---|
post_id | comment_id | tag_id | post_id | post_id | author_id |
author_id | name | name | tag_id | comment_id | username |
title | comment | - | - | - | password |
post | - | - | - | - | - |
date | - | - | - | - | - |
To show a blog post with its comments and tags, we need to join all six tables.
In the document model, a post is a single JSON document in the posts collection:
{
title : "free online tutorial",
body : "......",
author : "erlicson",
date : ISODate(......),
comments : [ { name: "joe biden", email : "joe@mongodb.org", comment:"..." },
{.....}, {.....}
],
tags: ["cycling", "education", "startups"]
}
We will need an authors collection with the username as primary key (_id):
{
_id : "erlicson",
password : "..."
}
The data is hierarchical. If the email is missing from a comment, you can simply leave the field out; MongoDB is schemaless and flexible about that. We only need one collection, the posts collection, to display a blog post.
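To see what the six-table join amounts to, the sketch below models the relational tables as plain Python lists and assembles one post from them. The rows, the ids, and the function name assemble_post are invented for illustration; the point is only that the result has the same shape as the single post document above, which MongoDB returns from one collection without any of this work.

```python
# Hypothetical rows for the six relational tables.
posts = [{"post_id": 1, "author_id": 10, "title": "free online tutorial",
          "post": "......", "date": "2016-07-03"}]
authors = [{"author_id": 10, "username": "erlicson", "password": "..."}]
comments = [{"comment_id": 100, "name": "joe", "comment": "nice"},
            {"comment_id": 101, "name": "ann", "comment": "thanks"}]
post_comments = [{"post_id": 1, "comment_id": 100},
                 {"post_id": 1, "comment_id": 101}]
tags = [{"tag_id": 1000, "name": "cycling"},
        {"tag_id": 1001, "name": "education"}]
post_tags = [{"post_id": 1, "tag_id": 1000},
             {"post_id": 1, "tag_id": 1001}]


def assemble_post(post_id):
    """Join all six tables to build what the document model stores directly."""
    post = next(p for p in posts if p["post_id"] == post_id)
    author = next(a for a in authors if a["author_id"] == post["author_id"])
    comment_ids = [pc["comment_id"] for pc in post_comments if pc["post_id"] == post_id]
    tag_ids = [pt["tag_id"] for pt in post_tags if pt["post_id"] == post_id]
    return {
        "title": post["title"],
        "body": post["post"],
        "author": author["username"],
        "date": post["date"],
        "comments": [{"name": c["name"], "comment": c["comment"]}
                     for c in comments if c["comment_id"] in comment_ids],
        "tags": [t["name"] for t in tags if t["tag_id"] in tag_ids],
    }


document = assemble_post(1)
print(document)
```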
Introduction to Schema Design
“To embed or not embed, that is the question.”
With a relational database, you consider normal forms (3rd, Boyce-Codd, 4th) and dependencies. Maybe you start with 3rd normal form and combine a few things.
With MongoDB, how do you know when to embed, for example, tags and comments in the posts collection? The answer is to embed data that is typically accessed together. It is very rare to access a tag independently of a post, and a comment never applies to more than one post.
An operation like renaming the tag “cycling” to “biking” across the whole site would be easier in the relational world, where it is a single row update in the tags table, but it is an unusual change, not something you do all the time.
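With tags embedded in each post, the rename described above has to touch every post that carries the tag. The sketch below shows that cost on an in-memory list of post documents; rename_tag and the sample posts are assumptions for illustration, and against a real database the same change would be an update over the posts collection.

```python
def rename_tag(posts, old, new):
    """Rename a tag in every post that carries it; return how many posts changed."""
    changed = 0
    for post in posts:
        if old in post.get("tags", []):
            post["tags"] = [new if tag == old else tag for tag in post["tags"]]
            changed += 1
    return changed


# Hypothetical post documents with embedded tags.
posts = [
    {"title": "free online tutorial", "tags": ["cycling", "education", "startups"]},
    {"title": "weekend ride", "tags": ["cycling"]},
    {"title": "no tags yet"},
]

changed = rename_tag(posts, "cycling", "biking")
print("posts updated:", changed)
```

In the relational model the same rename is one row in the tags table, which is the trade-off the paragraph above describes.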
Another practical concern is document size. In MongoDB, a single document cannot exceed 16 MB.
MongoDB Basics Cheatsheet
command | effect |
---|---|
db.runCommand({dropDatabase: 1}) | drop the current database, deleting associated files |
db.dropDatabase() | same as above |
db.runCommand({drop: "collname"}) | delete the collname collection and its associated indexes from the current database |
db.collname.drop() | same as above |
show dbs | show all databases |
show collections | show all collections in current database |
use dbname | switch to dbname database |
Resources
Link to the MongoDB tutorial series.