MongoDB上手笔记4——在很多数据中进行查询

上一次把简单的Restful API搭建起来了,接下来让我们往数据库里面填充一些数据。

俗话说的好,一天一个苹果,考不上博士。假设小明每天吃一个苹果,为了记录截止到某一天小明一共吃了多少个苹果,我在db logs的collection applesIAte中,记录了截止到某一天小明一共吃了多少苹果。

假设小明命中注定不会读博,所以小明骨骼惊奇的生下来就不吃奶,而是出生第一天就吃苹果,每天以苹果为生(好像一个并吃不饱的样子,请不要在意这些细节)。然后每天为了建设祖国而奋斗,一直活到了100岁,最后在100岁生日的当晚因为看到了祖国的建设成果过于兴奋(以及吃了太多的苹果)而含笑九泉。

按照这以上假设,我写了个脚本,记录了无私而强韧的小明的不平凡的一生中吃了的苹果数量。这段脚本中用了个insert_many()的方法,在一次性写入很多条document的时候比较快。

from flask_pymongo import MongoClient
mongo = MongoClient(host="localhost", port=27999)
db = mongo.logs
auth_res = db.authenticate("readwriter", "123456")
apples = [{
    "day": day,
    "apple": day
} for day in range(1, 100*365+1)]
db.applesIAte.insert_many(apples)

然后我们看一看小明这光辉的一生。

> db.applesIAte.stats()
{
        "ns" : "logs.applesIAte",
        "count" : 36500,
        "size" : 1752064,
        "avgObjSize" : 48,
        "numExtents" : 5,
        "storageSize" : 2793472,
        "lastExtentSize" : 2097152,
        "paddingFactor" : 1,
        "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
        "userFlags" : 1,
        "capped" : false,
        "nindexes" : 1,
        "totalIndexSize" : 1193696,
        "indexSizes" : {
                "_id_" : 1193696
        },
        "ok" : 1
}

恩,小明活了36500天(原谅我忘了闰年这事),真是了不起!

在回顾小明这光辉的一生(中吃了多少苹果)之前,让我们先做一些小手脚。这样我们才能看看我们能以多快的速度回顾(查询)了小明的一生(丫可是活了100年口阿!)

> db.setProfilingLevel(2)
{ "was" : 0, "slowms" : 100, "ok" : 1 }

上面这句话的意思是,我们所有的回顾小明一生的行为(查询)的相关信息都会被记录下来。

接下来我们看看小明在10000天的时候一共吃了多少苹果。(此处也可以用findOne({day: 10000}),不过并不影响结论)

> db.applesIAte.find({day: 10000})
{ "_id" : ObjectId("56a9c39cabf8322aa0828dec"), "day" : 10000, "apple" : 10000 }

显然他吃了10000个,awesome!那让我们看看我们花了多长时间发现他吃了这么多苹果。

> db.system.profile.find()
{
    "op":"query",
    "ns":"logs.applesIAte",
    "query":{
        "day":10000
    },
    "ntoreturn":0,
    "ntoskip":0,
    "nscanned":0,
    "nscannedObjects":36500,
    "keyUpdates":0,
    "writeConflicts":0,
    "numYield":285,
    "locks":{
        "Global":{
            "acquireCount":{
                "r":NumberLong(572)
            }
        },
        "MMAPV1Journal":{
            "acquireCount":{
                "r":NumberLong(286)
            }
        },
        "Database":{
            "acquireCount":{
                "r":NumberLong(286)
            }
        },
        "Collection":{
            "acquireCount":{
                "R":NumberLong(286)
            }
        }
    },
    "nreturned":1,
    "responseLength":62,
    "millis":20,
    "execStats":{
        "stage":"COLLSCAN",
        "filter":{
            "day":{
                "$eq":10000
            }
        },
        "nReturned":1,
        "executionTimeMillisEstimate":10,
        "works":36502,
        "advanced":1,
        "needTime":36500,
        "needFetch":0,
        "saveState":285,
        "restoreState":285,
        "isEOF":1,
        "invalidates":0,
        "direction":"forward",
        "docsExamined":36500
    },
    "ts":    ISODate("2016-01-28T07:53:57.857    Z"),
    "client":"127.0.0.1",
    "allUsers":[
        {
            "user":"boss",
            "db":"admin"
        },
        {
            "user":"readwriter",
            "db":"logs"
        }
    ],
    "user":"readwriter@logs"
}

上面这一大段基本上都没啥用,我们只需要两条——"millis":20,"docsExamined":36500。也就是查询一共花了20ms,查看了36500条document才找到了这条记录。(好慢!)

如果小明不是一个普通的人,是一个脱离了低级趣味的神仙,他活了一万年,甚至十万年前他就和猛犸象谈笑风生了(可他依旧不想读博)。那么我们的记录会越来越多,查询速度会越来越慢。

而我们为days加上索引可以有效的解决这个问题。因为day是单调增长的,所以参数是{day: 1}。

> db.applesIAte.createIndex({day: 1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}
> db.applesIAte.stats()
{
        "ns" : "logs.applesIAte",
        "count" : 36500,
        "size" : 1752064,
        "avgObjSize" : 48,
        "numExtents" : 5,
        "storageSize" : 2793472,
        "lastExtentSize" : 2097152,
        "paddingFactor" : 1,
        "paddingFactorNote" : "paddingFactor is unused and unmaintained in 3.0. It remains hard coded to 1.0 for compatibility only.",
        "userFlags" : 1,
        "capped" : false,
        "nindexes" : 2,
        "totalIndexSize" : 2117584,
        "indexSizes" : {
                "_id_" : 1193696,
                "day_1" : 923888
        },
        "ok" : 1
}

从stats()返回的结果来看,day的索引确实是加上了。那么结果如何呢?

{
    "op":"query",
    "ns":"logs.applesIAte",
    "query":{
        "day":20000
    },
    "ntoreturn":0,
    "ntoskip":0,
    "nscanned":1,
    "nscannedObjects":1,
    "keyUpdates":0,
    "writeConflicts":0,
    "numYield":0,
    "locks":{
        "Global":{
            "acquireCount":{
                "r":NumberLong(2)
            }
        },
        "MMAPV1Journal":{
            "acquireCount":{
                "r":NumberLong(1)
            }
        },
        "Database":{
            "acquireCount":{
                "r":NumberLong(1)
            }
        },
        "Collection":{
            "acquireCount":{
                "R":NumberLong(1)
            }
        }
    },
    "nreturned":1,
    "responseLength":62,
    "millis":0,
    "execStats":{
        "stage":"FETCH",
        "nReturned":1,
        "executionTimeMillisEstimate":0,
        "works":2,
        "advanced":1,
        "needTime":0,
        "needFetch":0,
        "saveState":0,
        "restoreState":0,
        "isEOF":1,
        "invalidates":0,
        "docsExamined":1,
        "alreadyHasObj":0,
        "inputStage":{
            "stage":"IXSCAN",
            "nReturned":1,
            "executionTimeMillisEstimate":0,
            "works":2,
            "advanced":1,
            "needTime":0,
            "needFetch":0,
            "saveState":0,
            "restoreState":0,
            "isEOF":1,
            "invalidates":0,
            "keyPattern":{
                "day":1
            },
            "indexName":"day_1",
            "isMultiKey":false,
            "direction":"forward",
            "indexBounds":{
                "day":[
                    "[20000.0, 20000.0]"
                ]
            },
            "keysExamined":1,
            "dupsTested":0,
            "dupsDropped":0,
            "seenInvalidated":0,
            "matchTested":0
        }
    },
    "ts":    ISODate("2016-01-28T08:05:46.904    Z"),
    "client":"127.0.0.1",
    "allUsers":[
        {
            "user":"boss",
            "db":"admin"
        },
        {
            "user":"readwriter",
            "db":"logs"
        }
    ],
    "user":"readwriter@logs"
}

结果发现"docsExamined":1,"millis":0。我们只查询了一条数据就直接找到了到20000天的时候小明吃了多少苹果,花的时间更是小到近似于0了!所以说当小明成仙之后,我们必须对day加上索引才能知道他为了不读博吃了多少苹果XD。

后记:最近看了一本书,作者比鸟哥还谐,整本书写的很像《春物》这种轻小说的风格。于是尝试着『学习』了一下这种风格,试过之后感觉有些浮夸啊XD