alexdong's notebook http://notes.alexdong.com A software engineering notebook from Alex Dong. For startup diary, please visit http://startup.alexdong.com/. posterous.com
Mon, 25 Jul 2011 01:14:00 -0700 How to configure django/mod_wsgi to avoid a frozen apache because of Python GIL? http://notes.alexdong.com/how-to-configure-djangomodwsgi-to-avoid-a-fro http://notes.alexdong.com/how-to-configure-djangomodwsgi-to-avoid-a-fro

Did you know that a single long-running request can block the whole Apache process that hosts it?

We ran into this problem today, and it took us quite a while to figure out. Here is the story.

Trunk.ly allows our users to add a new link directly via the web interface. Behind the scenes, the Python process hosted by mod_wsgi "within" Apache needs to resolve the DNS name, expand any URL shortener wrapped around the link, download the actual HTML and submit it to our backend search server. We noticed that if someone added a URL that takes a long time for our crawler to download, the whole Apache process froze. Further user requests were blocked, and eventually the front-end nginx started throwing 500 Timeout errors. (We use nginx to serve static files and pass dynamic requests to Apache as explained here. Notice that this tutorial has the same flaw we'll see soon.)

So one request can bring the whole server down. Bad.  

Our settings are: 

  • MPM Worker for Apache: Apache runs a group of processes, each with a number of threads, and every thread takes requests and processes them. (Internals of Apache's scheduling/worker system and some nice diagrams can be found here.)
  • mod_wsgi in daemon mode: the Python code runs in its own process, and mod_wsgi acts as a bridge between Apache and that process.
  • mod_wsgi has 1 process and 25 threads, as specified in mod_wsgi's IntegrationWithDjango:

WSGIDaemonProcess site-1 user=user-1 group=user-1 threads=25

The above setting says: create a single process with 25 threads and let each thread deal with incoming HTTP requests. At first I didn't believe the whole Apache process was frozen because of the Python GIL. In theory the GIL should be released whenever a thread blocks on I/O, but somehow this didn't happen in the setup described above. So a natural solution is to assign each request to a process instead of a thread. Here are the new mod_wsgi settings:

WSGIDaemonProcess site-1 user=user-1 group=user-1 processes=9 threads=1

The above setting tells mod_wsgi to spawn 9 processes, each with 1 thread. This is basically what Python's multiprocessing does as a replacement for threading. After restarting Apache, the frozen process problem disappeared.
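The expectation that a blocked thread releases the GIL can be checked in isolation. The sketch below is only an illustration, not our production code: time.sleep stands in for a blocking network read (an assumption), and the function names are made up for this example. If blocking calls release the GIL, five threads that each wait 0.2 seconds should finish in roughly 0.2 seconds overall, not 1 second.

```python
import threading
import time


def blocking_call(seconds):
    # time.sleep stands in for a blocking network read (assumption);
    # like socket I/O, it releases the GIL while waiting.
    time.sleep(seconds)


def run_in_threads(n_threads, seconds):
    """Run n_threads blocking calls concurrently and return the elapsed wall time."""
    threads = [threading.Thread(target=blocking_call, args=(seconds,))
               for _ in range(n_threads)]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.monotonic() - start


elapsed = run_in_threads(5, 0.2)
print("5 threads x 0.2s took %.2fs" % elapsed)
```

If a check like this passes but real requests still freeze the process, one possibility worth investigating is a call inside a C extension that blocks without releasing the GIL. That is a guess, not something we have confirmed.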

Notes

  1. One thing I still haven't figured out is why the GIL doesn't yield. If you have any hints, please leave a comment.
  2. Our backend crawler and processing pipeline have been running on gevent for many months, so gunicorn looks like a very natural next step for us. I'd love to hear about your experience if you have moved from mod_wsgi to gunicorn.
  3. Here are a few more links trunk.ly users have shared about the GIL: Python Global Interpreter Lock

 

http://files.posterous.com/user_profile_pics/486592/avatar.jpg http://posterous.com/users/3sioqoByX9hT Alex Dong alexdong Alex Dong
Wed, 15 Jun 2011 22:31:00 -0700 How to save 200% RAM by selecting the right key data type for #MongoDB http://notes.alexdong.com/choose-the-right-data-type-for-mongodb http://notes.alexdong.com/choose-the-right-data-type-for-mongodb

At trunk.ly, we need to store and process more than 100,000 links every day. Each URL is represented by its md5 value. Recently, we started to notice an increasing number of page faults in mongostat, and performance started to degrade. After investigation, we realized that we could no longer keep the whole index in RAM.

In this post, I am going to show how, given 10 million md5 records, 100% to 200% of memory can be saved by adopting a slightly different data type for the index key.

Here is the Python code to create 1,000,000 records with 4 different key types: ObjectId, int, binary md5 (the base64s collection) and md5 hex string:

#!/usr/bin/env python

import hashlib

import bson
from pymongo import MongoClient

# Assumes a mongod listening on the default localhost:27017.
db = MongoClient().test_database


def md5(i):
    # md5 of the decimal string form of i
    return hashlib.md5(str(i).encode())


print('ObjectID')  # _id is generated by the driver
for i in range(1, 1000000):
    db.objectids.insert_one({'i': i})

print('int')
for i in range(1, 1000000):
    db.ints.insert_one({'_id': i, 'i': i})

print('Base64 BSON')  # 16 raw digest bytes stored as BSON binary
for i in range(1, 1000000):
    db.base64s.insert_one({
        '_id': bson.Binary(md5(i).digest(), bson.binary.MD5_SUBTYPE),
        'i': i})

print('string')  # 32-character hex digest
for i in range(1, 1000000):
    db.strings.insert_one({'_id': md5(i).hexdigest(), 'i': i})

Here are the index sizes MongoDB reports for each key type:

> db.base64s.stats()
{
        "totalIndexSize" : 67076096,
}
> db.objectids.stats()
{
        "totalIndexSize" : 41598976,
}
> db.ints.stats()
{
        "totalIndexSize" : 32522240,
}
> db.strings.stats()
{
        "totalIndexSize" : 90914816,
}
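To make the comparison easier to read, here is a small sketch that turns the totalIndexSize numbers above into ratios against the smallest index (the int keys). The numbers are copied from the stats() output; the variable names are only for illustration.

```python
# totalIndexSize values in bytes, copied from the stats() output above
index_sizes = {
    "ints": 32522240,
    "objectids": 41598976,
    "base64s": 67076096,
    "strings": 90914816,
}

baseline = index_sizes["ints"]
ratios = {name: size / baseline for name, size in index_sizes.items()}

for name, size in sorted(index_sizes.items(), key=lambda kv: kv[1]):
    print("%-10s %6.1f MB  %.2fx the int index"
          % (name, size / 2 ** 20, ratios[name]))
```

With these figures the string index comes out at roughly 2.8 times the int index, and about 36% larger than the binary one.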

Sat, 19 Mar 2011 02:48:00 -0700 What I didn't know about MongoDB http://notes.alexdong.com/what-i-didnt-know-about-mongodb http://notes.alexdong.com/what-i-didnt-know-about-mongodb

Here are the notes I took while reading through the book http://oreilly.com/catalog/0636920001096

 

Data type

 

Ordered key/value pairs

Key/value pairs in documents are ordered: the first document below is distinct from the second.

{"greeting" : "Hello, world!", "foo" : 3}

{"foo" : 3, "greeting" : "Hello, world!"}

 

Using subcollections is recommended

The MongoDB web console organizes the data in its DBTOP section by subcollection (see Chapter 8 for more information on administration).

• Most drivers provide some syntactic sugar for accessing a subcollection of a given collection. For example, in the database shell, db.blog will give you the blog collection, and db.blog.posts will give you the blog.posts collection.

Subcollections are a great way to organize data in MongoDB, and their use is highly recommended.

 

What a built-in function is doing

A good way of figuring out what a function is doing is to type it without the parentheses. This will print the JavaScript source code for the function. For example, if we are curious about how the update function works or cannot remember the order of parameters, we can do the following:

> db.foo.update
function (query, obj, upsert, multi) {
    assert(query, "need a query");
    assert(obj, "need an object");
    this._validateObject(obj);
    this._mongo.update(this._fullName, query, obj,

 

Avoid overwriting entire documents; 8-byte integer representation

JavaScript has one “number” type. Because MongoDB has three number types (4-byte integer, 8-byte integer, and 8-byte float), the shell has to hack around JavaScript’s limitations a bit. By default, any number in the shell is treated as a double by MongoDB. This means that if you retrieve a 4-byte integer from the database, manipulate its document, and save it back to the database even without changing the integer, the integer will be resaved as a floating-point number. Thus, it is generally a good idea not to overwrite entire documents from the shell.

If you save an 8-byte integer and look at it in the shell, the shell will display it as an embedded document indicating that it might not be exact. For example, if we save a document with a "myInteger" key whose value is the 64-bit integer, 3, and then look at it in the shell, it will look like this:

> doc = db.nums.findOne()
{
    "_id" : ObjectId("4c0beecfd096a2580fe6fa08"),
    "myInteger" : {
        "floatApprox" : 3
    }
}

If you insert an 8-byte integer that cannot be accurately displayed as a double, the shell will add two keys, "top" and "bottom", containing the 32-bit integers representing the 4 high-order bytes and 4 low-order bytes of the integer, respectively. For instance, if we insert 9223372036854775807, the shell will show us the following:

> db.nums.findOne()
{
    "_id" : ObjectId("4c0beecfd096a2580fe6fa09"),
    "myInteger" : {
        "floatApprox" : 9223372036854776000,
        "top" : 2147483647,
        "bottom" : 4294967295
    }
}
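The "top" and "bottom" values can be reproduced with plain integer arithmetic. This is a small sketch of the split described above, not the shell's actual code; split_int64 is a name made up for this note, and it assumes a non-negative value.

```python
def split_int64(value):
    """Split a non-negative 64-bit integer into its (top, bottom) 32-bit halves."""
    top = value >> 32            # the 4 high-order bytes
    bottom = value & 0xFFFFFFFF  # the 4 low-order bytes
    return top, bottom


big = 9223372036854775807  # 2**63 - 1, the value used in the example above
top, bottom = split_int64(big)
print(top, bottom)  # 2147483647 4294967295

# A double cannot hold 2**63 - 1 exactly; the nearest double is 2**63,
# which is why the shell only reports a "floatApprox".
print(float(big) == 2.0 ** 63)  # True
```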

 

ObjectID explained

If you create multiple new ObjectIds in rapid succession, you can see that only the last few digits change each time. In addition, a couple of digits in the middle of the ObjectId will change (if you space the creations out by a couple of seconds). This is because of the manner in which ObjectIds are created. The 12 bytes of an ObjectId are generated as follows:

Bytes 0-3    Bytes 4-6    Bytes 7-8    Bytes 9-11
Timestamp    Machine      PID          Increment

The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This provides a couple of useful properties:

• The timestamp, when combined with the next five bytes (which will be described in a moment), provides uniqueness at the granularity of a second.

• Because the timestamp comes first, it means that ObjectIds will sort in roughly insertion order. This is not a strong guarantee but does have some nice properties, such as making ObjectIds efficient to index.
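As a rough illustration of that layout, the sketch below takes apart one of the ObjectIds quoted earlier in these notes using only the standard library. parse_object_id is a hypothetical helper, and it assumes the 4/3/2/3 byte split described above (as far as I know, newer drivers fill the middle five bytes with a random value instead of machine and PID).

```python
import struct
from datetime import datetime, timezone


def parse_object_id(hex_id):
    """Split a 24-character ObjectId hex string into the four fields above."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    (timestamp,) = struct.unpack(">I", raw[0:4])  # big-endian seconds since the epoch
    (pid,) = struct.unpack(">H", raw[7:9])        # 2-byte process id
    return {
        "timestamp": timestamp,
        "created": datetime.fromtimestamp(timestamp, tz=timezone.utc),
        "machine": raw[4:7].hex(),                # 3-byte machine identifier
        "pid": pid,
        "increment": int.from_bytes(raw[9:12], "big"),  # 3-byte counter
    }


first = parse_object_id("4c0beecfd096a2580fe6fa08")
second = parse_object_id("4c0beecfd096a2580fe6fa09")
print(first["created"], first["machine"], first["pid"], first["increment"])
print(second["increment"] - first["increment"])  # 1
```

The two ids differ only in the increment field, which matches the "only the last few digits change" observation.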

 

MongoDB’s philosophy on pushing tasks to client driver whenever possible

Although ObjectIds are designed to be lightweight and easy to generate, there is still some overhead involved in their generation. The decision to generate them on the client side reflects an overall philosophy of MongoDB: work should be pushed out of the server and to the drivers whenever possible. This philosophy reflects the fact that, even with scalable databases like MongoDB, it is easier to scale out at the application layer than at the database layer. Moving work to the client side reduces the burden requiring the database to scale.

• By generating ObjectIds on the client side, drivers are capable of providing richer APIs than would be otherwise possible. For example, a driver might have its insert method either return the generated ObjectId or inject it directly into the document that was inserted. If the driver allowed the server to generate ObjectIds, then a separate query would be required to determine the value of "_id" for an inserted document.

 

 

CRUD

 

Batch Insert 

If you have a situation where you are inserting multiple documents into a collection, you can make the insert faster by using batch inserts. Batch inserts allow you to pass an array of documents to the database.

Sending dozens, hundreds, or even thousands of documents at a time can make inserts significantly faster. A batch insert is a single TCP request, meaning that you do not incur the overhead of doing hundreds of individual requests. It can also cut insert time by eliminating a lot of the header processing that gets done for each message. When an individual document is sent to the database, it is prefixed by a header that tells the database to do an insert operation on a certain collection. By using batch insert, the database doesn’t need to reprocess this information for each document.
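On the client side, batching mostly means grouping documents before handing them to the driver. The sketch below only shows the grouping step and does not talk to a database; batches is a hypothetical helper, and the batch size of 1000 is an arbitrary choice. Each yielded list could then be passed to the driver's bulk insert call (in current PyMongo that would be Collection.insert_many).

```python
def batches(documents, size):
    """Yield consecutive lists of at most `size` documents."""
    batch = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # leftover documents that did not fill a whole batch
        yield batch


docs = ({"i": i} for i in range(2500))
sizes = [len(b) for b in batches(docs, 1000)]
print(sizes)  # [1000, 1000, 500]
```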

 

Question: if 1 out of 1,000 insertions fails due to a conflict, will the remaining 999 still succeed?

 

Fire-and-forget update by default

Updates are atomic: if two updates happen at the same time, whichever one reaches the server first will be applied, and then the next one will be applied. Thus, conflicting updates can safely be sent in rapid-fire succession without any documents being corrupted: the last update will “win.”

The three operations that this chapter focused on (inserts, removes, and updates) seem instantaneous because none of them waits for a database response. They are not asynchronous; they can be thought of as “fire-and-forget” functions: the client sends the documents to the server and immediately continues. The client never receives an “OK, got that” or a “not OK, could you send that again?” response.

The benefit to this is that the speed at which you can perform these operations is terrific. You are often only limited by the speed at which your client can send them and the speed of your network.

 

Be aware of $push becoming a bottleneck

Using "$push" and other array modifiers is encouraged and often necessary, but it is good to keep in mind the trade-offs of such updates. If "$push" becomes a bottleneck, it may be worth pulling an embedded array out into a separate collection.

 

The save Shell Helper 

save is a shell function that lets you insert a document if it doesn’t exist and update it if it does. It takes one argument: a document. If the document contains an "_id" key, save will do an upsert. Otherwise, it will do an insert. This is just a convenience function so that programmers can quickly modify documents in the shell:

> var x = db.foo.findOne()
> x.num = 42
42
> db.foo.save(x)

Without save, the last line would have been a more cumbersome db.foo.update({"_id" : x._id}, x).

 

Multiupdate 

Multiupdates are a great way of performing schema migrations or rolling out new features to certain users. Suppose, for example, we want to give a gift to every user who has a birthday on a certain day. We can use multiupdate to add a "gift" to their account:

> db.users.update({birthday : "10/13/1978"},
... {$set : {gift : "Happy Birthday!"}}, false, true)

This would add the "gift" key to all user documents with birthdays on October 13, 1978.

To see the number of documents updated by a multiple update, you can run the getLastError database command (which might be better named "getLastOpStatus"). The "n" key will contain the number of documents affected by the update:

> db.count.update({x : 1}, {$inc : {x : 1}}, false, true)
> db.runCommand({getLastError : 1})
{
    "err" : null,
    "updatedExisting" : true,
    "n" : 5,
    "ok" : true
}

 

 

Query snapshot and connection pools

 

For each connection to a MongoDB server, the database creates a queue for that connection’s requests. When the client sends a request, it will be placed at the end of its connection’s queue. Any subsequent requests on the connection will occur after the enqueued operation is processed. Thus, a single connection has a consistent view of the database and can always read its own writes.

Note that this is a per-connection queue: if we open two shells, we will have two connections to the database. If we perform an insert in one shell, a subsequent query in the other shell might not return the inserted document. However, within a single shell, if we query for the document after inserting, the document will be returned. This behavior can be difficult to duplicate by hand, but on a busy server, interleaved inserts/queries are very likely to occur. Often developers run into this when they insert data in one thread and then check that it was successfully inserted in another. For a second or two, it looks like the data was not inserted, and then it suddenly appears.

This behavior is especially worth keeping in mind when using the Ruby, Python, and Java drivers, because all three drivers use connection pooling. For efficiency, these drivers open multiple connections (a pool) to the server and distribute requests across them.

 

 

Querying

 

Limit the fields returned

Sometimes, you do not need all of the key/value pairs in a document returned. If this is the case, you can pass a second argument to find (or findOne) specifying the keys you want. This reduces both the amount of data sent over the wire and the time and memory used to decode documents on the client side.

For example, if you have a user collection and you are interested only in the "username" and "email" keys, you could return just those keys with the following query:

> db.users.find({}, {"username" : 1, "email" : 1})
{
    "_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
    "username" : "joe",
    "email" : "joe@example.com"
}

As you can see from the previous output, the "_id" key is always returned, even if it isn’t specifically listed.

 

Cursor chaining and lazy loading

When you call find, the shell does not query the database immediately. It waits until you actually start requesting results to send the query, which allows you to chain additional options onto a query before it is performed. Almost every method on a cursor object returns the cursor itself so that you can chain them in any order. For instance, all of the following are equivalent:

> var cursor = db.foo.find().sort({"x" : 1}).limit(1).skip(10);
> var cursor = db.foo.find().limit(1).sort({"x" : 1}).skip(10);
> var cursor = db.foo.find().skip(10).limit(1).sort({"x" : 1});

At this point, the query has not been executed yet. All of these functions merely build the query. Now, suppose we call the following:

> cursor.hasNext()

 

Index

 

Avoiding Large Skips 

Using skip for a small number of documents is fine. For a large number of results, skip can be slow (this is true in nearly every database, not just MongoDB) and should be avoided. Usually you can build criteria into the documents themselves to avoid having to do large skips, or you can calculate the next query based on the result from the previous one.

Paginating results without skip

The easiest way to do pagination is to return the first page of results using limit and then return each subsequent page as an offset from the beginning.

> // do not use: slow for large skips
> var page1 = db.foo.find(criteria).limit(100)
> var page2 = db.foo.find(criteria).skip(100).limit(100)
> var page3 = db.foo.find(criteria).skip(200).limit(100)
...

However, depending on your query, you can usually find a way to paginate without skips. For example, suppose we want to display documents in descending order based on "date". We can get the first page of results with the following:

> var page1 = db.foo.find().sort({"date" : -1}).limit(100)

Then, we can use the "date" value of the last document as the criteria for fetching the next page:
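My notes stop before the book's own example, so here is a stand-in sketch of the same idea on an in-memory list rather than a real collection. The helper names are hypothetical, and it assumes the "date" values are unique (with ties you would need a second sort key). Against MongoDB, the filter in next_page would correspond to a query such as {"date": {"$lt": last_date}}.

```python
from datetime import date


def first_page(docs, page_size):
    """Newest documents first, limited to one page."""
    return sorted(docs, key=lambda d: d["date"], reverse=True)[:page_size]


def next_page(docs, last_doc, page_size):
    """Documents strictly older than the last one already shown."""
    older = [d for d in docs if d["date"] < last_doc["date"]]
    return sorted(older, key=lambda d: d["date"], reverse=True)[:page_size]


docs = [{"n": n, "date": date(2011, 1, n)} for n in range(1, 11)]
page1 = first_page(docs, 4)
page2 = next_page(docs, page1[-1], 4)
print([d["n"] for d in page1])  # [10, 9, 8, 7]
print([d["n"] for d in page2])  # [6, 5, 4, 3]
```

No matter how deep the page is, each request only filters and limits, so nothing has to be skipped over.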

 

Key metrics for query performance

"nscanned" : 64
This is the number of documents that the database looked through. You want to make sure this is as close to the number returned as possible.

"n" : 64
This is the number of documents returned. We’re doing pretty well here, because the number of documents scanned exactly matches the number returned. Of course, given that we’re returning the entire collection, it would be difficult to do otherwise.

"millis" : 0
The number of milliseconds it took the database to execute the query. 0 is a good time to shoot for.

 

MongoDB query optimizer and parallel query plan execution model

MongoDB has a query optimizer and is very clever about choosing which index to use. When you first do a query, the query optimizer tries out a number of query plans concurrently. The first one to finish will be used, and the rest of the query executions are terminated. That query plan will be remembered for future queries on the same keys. The query optimizer periodically retries other plans, in case you’ve added new data and the previously chosen plan is no longer best. The only part you should need to worry about is giving the query optimizer useful indexes to choose from.

 

 

Aggregation

 

Counting the total number of documents in a collection is fast regardless of collection size.

 

 

Advanced Topics

 

Capped collection use cases and benefits

First, inserts into a capped collection are extremely fast. When doing an insert, there is never a need to allocate additional space, and the server never needs to search through a free list to find the right place to put a document. The inserted document can always be placed directly at the “tail” of the collection, overwriting old documents if needed. By default, there are also no indexes to update on an insert, so an insert is essentially a single memcpy.

Another interesting property of capped collections is that queries retrieving documents in insertion order are very fast. Because documents are always stored in insertion order, queries for documents in that order just walk over the collection, returning documents in the exact order that they appear on disk. By default, any find performed on a capped collection will always return results in insertion order.

 

When to use DBRefs

In short, the best times to use DBRefs are when you’re storing heterogeneous references to documents in different collections, like in the previous example or when you want to take advantage of some additional DBRef-specific functionality in a driver or tool. Otherwise, it’s generally best to just store an "_id" and use that as a reference, because that representation tends to be more compact and easier to work with.

 

 

Administration

 

Backing up from Slave is recommended

Backing up from a slave is the recommended way to handle data backups with MongoDB. 

 

What happens when “repairing a database”

The underlying process of repairing a database is actually pretty easy to understand: all of the documents in the database are exported and then immediately imported, ignoring any that are invalid. After that is complete, all indexes are rebuilt. Understanding this mechanism explains some of the properties of repair. It can take a long time for large data sets, because all of the data is validated and all indexes are rebuilt. Repairing can also leave a database with fewer documents than it had before the corruption originally occurred, because any corrupt documents are simply ignored.

Repairing a database will also perform a compaction. Any extra free space (which might exist after dropping large collections or removing large number of documents, for example) will be reclaimed after a repair.

 

 

 

Replication

 

Replica Sets vs Master Slave

A replica set is basically a master-slave cluster with automatic failover. The biggest difference between a master-slave cluster and a replica set is that a replica set does not have a single master: one is elected by the cluster and may change to another node if the current master goes down. However, they look very similar: a replica set always has a single master node (called a primary) and one or more slaves (called secondaries).

The nice thing about replica sets is how automatic everything is. First, the set itself does a lot of the administration for you, promoting slaves automatically and making sure you won’t run into inconsistencies.

 

Read Scaling 

Scaling out reads with slaves is easy: just set up master-slave replication like usual, and make connections directly to the slave servers to handle queries. The only trick is that there is a special query option to tell a slave server that it is allowed to handle a query. (By default, queries will not be executed on a slave.) This option is called slaveOkay, and all MongoDB drivers provide a mechanism for setting it. Some drivers also provide facilities to automate the process of distributing queries to slaves; this varies on a per-driver basis, however.

 

Using Slaves for Data Processing 

Another interesting technique is to use slaves as a mechanism for offloading intensive processing or aggregation to avoid degrading performance on the master. To do this, start a normal slave, but with the addition of the --master command-line argument. Starting with both --slave and --master may seem like a bit of a paradox. What it means, however, is that you’ll be able to write to the slave, query on it like usual, and basically treat it like you would a normal MongoDB master node. In addition, the slave will continue to replicate data from the actual master. This way, you can perform blocking operations on the slave without ever affecting the performance of the master node.

When using this technique, you should be sure never to write to any database on the slave that is being replicated from the master. The slave will not revert any such writes in order to properly mirror the master. The slave should also not have any of the databases that are being replicated when it first starts up. If it does, those databases will not ever be fully synced but will just update with new operations.

 

Mon, 03 Jan 2011 17:45:48 -0800 How to tell how many files the apache processes have opened? http://notes.alexdong.com/how-to-tell-how-many-files-the-apache-process http://notes.alexdong.com/how-to-tell-how-many-files-the-apache-process

Using sudo lsof -p <pid>, one can tell how many files the specified process has opened. However, since we have so many apache2 processes running, how can we see how many files all of them have opened? Here is my take:

pgrep apache2 | xargs -n1 -I % sudo lsof -p % | wc -l

Please refer to the comments for how to use xargs to clean up all Lucene indexes.

Wed, 29 Dec 2010 00:49:00 -0800 How to change python log level without restarting the process http://notes.alexdong.com/how-to-change-python-log-level-without-restar http://notes.alexdong.com/how-to-change-python-log-level-without-restar

Let's say we have 6 crawler servers, each with a few crawler.py processes running on them. I noticed that the crawlers on 'clifton' are really slow. It would be really nice if I could change the log level for one of those processes to DEBUG without affecting the others and without restarting the process. Does anyone know how to do it?
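One possible approach, offered as a sketch and not something we run in production: have each process install a signal handler that flips the log level, then send that signal only to the process of interest. The logger name "crawler" and the function name toggle_debug are assumptions made for this example, and SIGUSR1 only exists on Unix.

```python
import logging
import signal

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("crawler")  # hypothetical logger name


def toggle_debug(signum=None, frame=None):
    """Flip the root logger between INFO and DEBUG."""
    root = logging.getLogger()
    new_level = logging.INFO if root.level == logging.DEBUG else logging.DEBUG
    root.setLevel(new_level)
    logger.warning("log level is now %s", logging.getLevelName(new_level))


# SIGUSR1 is Unix-only; signal handlers must be registered from the main thread.
if hasattr(signal, "SIGUSR1"):
    signal.signal(signal.SIGUSR1, toggle_debug)
```

With this in place, running kill -USR1 <pid> on clifton would switch just that one crawler to DEBUG, and sending the signal again would switch it back.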

Sat, 25 Dec 2010 03:25:00 -0800 Fix all lucene indexes in one directory http://notes.alexdong.com/fix-all-lucene-indexes-in-one-directory http://notes.alexdong.com/fix-all-lucene-indexes-in-one-directory
find /var/lib/lucene/index -type d \
    -exec java -cp lucene-core-2.3.1.jar \
    org.apache.lucene.index.CheckIndex {} -fix \;
  • -type d restricts the search to directories under the specified path /var/lib/lucene/index/
  • java -cp lucene-core-2.3.1.jar org.apache.lucene.index.CheckIndex {path} -fix will automatically check and fix index problems for the specified {path}

Tue, 14 Dec 2010 23:15:00 -0800 Rails 3 how to create a compound multi-column named index http://notes.alexdong.com/rails-3-how-to-create-a-compound-multi-column http://notes.alexdong.com/rails-3-how-to-create-a-compound-multi-column

There are some black-magic conventions in how Rails 3 deals with index names in add_index and remove_index. After experimenting with db:migrate and db:rollback, it looks like I have to explicitly specify the index name in the format index_{table_name}_on_{key} in self.up and use only {key} in self.down.

class AddIndexToAudienceDetail < ActiveRecord::Migration
  def self.up
    add_index :audience_details, [:twitter_accounts_id, :week], 
      { :name => 'index_audience_details_on_weekly', :unique => true }
  end 

  def self.down
    remove_index :audience_details, 'weekly'
  end 
end

Sun, 22 Aug 2010 16:50:00 -0700 Prefer TNonblockingServer to TThreadPoolServer or TThreadedServer http://notes.alexdong.com/prefer-tnonblockingserver-to-tthreadpoolserve http://notes.alexdong.com/prefer-tnonblockingserver-to-tthreadpoolserve

I noticed that there are a few options for which Thrift server to use in C++ code, specifically TNonblockingServer, TThreadedServer, and TThreadPoolServer. TNonblockingServer seems to be the way to go, since it can support many more concurrent connections while still using a thread pool behind the scenes to crunch through the tasks. It also avoids the cost of constructing and destroying threads.

Here are a couple of links I’ve collected:

  1. Facebook’s update on thrift 1:

    Here at Facebook, we’re working on a fully asynchronous client and server for C++. This server uses event-driven I/O like the current TNonblockingServer, but its interface to the application code is all based on asynchronous callbacks. This will allow us to write servers that can service thousands of simultaneous requests (each of which requires making calls to other Thrift or Memcache servers) with only a few threads.

  2. Related posts on stackoverflow: 2, 3

    That being said, you won’t necessarily be able to actually do work faster (handlers still execute in a thread pool), but more clients will be able to connect to you at once.

In order to use the non-blocking server, we have to link against not only libthrift.so but also libthriftnb.so and libevent.so. Otherwise, you’ll see error messages like these:

/usr/local/include/thrift/server/TNonblockingServer.h:207: undefined reference to `vtable for apache::thrift::server::TNonblockingServer'
/usr/local/include/thrift/server/TNonblockingServer.h:212: undefined reference to `apache::thrift::server::TNonblockingServer::setThreadManager(boost::shared_ptr)'
hyperserver.o: In function `~TNonblockingServer':
/usr/local/include/thrift/server/TNonblockingServer.h:246: undefined reference to `vtable for apache::thrift::server::TNonblockingServer'
collect2: ld returned 1 exit status
make: *** [server] Error 1

/usr/local/lib/libthriftnb.so: undefined reference to `event_base_loop'
/usr/local/lib/libthriftnb.so: undefined reference to `event_del'
/usr/local/lib/libthriftnb.so: undefined reference to `event_get_method'
/usr/local/lib/libthriftnb.so: undefined reference to `event_add'
/usr/local/lib/libthriftnb.so: undefined reference to `event_init'
/usr/local/lib/libthriftnb.so: undefined reference to `event_get_version'
/usr/local/lib/libthriftnb.so: undefined reference to `event_base_set'
/usr/local/lib/libthriftnb.so: undefined reference to `event_set'
collect2: ld returned 1 exit status
make: *** [server] Error 1

Wed, 18 Aug 2010 01:30:00 -0700 Set operation shortcut: vector http://notes.alexdong.com/set-operation-shortcut-vector http://notes.alexdong.com/set-operation-shortcut-vector

In STL/boost and most general-purpose libraries, set is implemented using a tree structure such as a red-black tree, a binary search tree or even a treap. But if we know that the underlying data is already ordered and unique, we can use a vector instead to dramatically speed up both construction and set operations. Consider the following two programs: the vector version is about 8 times faster than the set version.

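$ cat sets.cc
#include <algorithm>
#include <iterator>
#include <set>
using namespace std;

int main() {
    // std::set keeps its elements sorted, which set_intersection requires
    set<int> l1, l2, lr;
    for (int i = 0; i <= 20000; i++) {
        l1.insert(i);
    }
    for (int i = 17800; i <= 40000; i++) {
        l2.insert(i);
    }

    set_intersection(l1.begin(), l1.end(),
                     l2.begin(), l2.end(),
                     inserter(lr, lr.begin()));

    return 0;
}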

$ g++ sets.cc -o sets; time ./sets
./sets 0.02s user 0.00s system 21% cpu 0.095 total

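$ cat vectors.cc
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

int main() {
    // push_back in ascending order keeps each vector sorted and unique
    vector<int> l1, l2, lr;
    for (int i = 0; i <= 20000; i++) {
        l1.push_back(i);
    }
    for (int i = 17800; i <= 40000; i++) {
        l2.push_back(i);
    }

    set_intersection(l1.begin(), l1.end(),
                     l2.begin(), l2.end(),
                     back_inserter(lr));

    return 0;
}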

$ g++ vectors.cc -o vectors; time ./vectors
./vectors 0.00s user 0.00s system 32% cpu 0.012 total
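
Why sorted input makes this cheap can be sketched in a few lines. The following is a minimal Python 3 sketch, not the STL implementation: the function name intersect_sorted is made up for illustration, and it assumes both inputs are already sorted and free of duplicates. It walks both sequences once, which is the same kind of linear merge that set_intersection relies on.

```python
def intersect_sorted(a, b):
    """Intersect two sorted, duplicate-free lists with one linear pass."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1               # a[i] cannot appear in b, skip it
        elif b[j] < a[i]:
            j += 1               # b[j] cannot appear in a, skip it
        else:
            result.append(a[i])  # present in both inputs
            i += 1
            j += 1
    return result


# Same ranges as the two C++ programs above.
l1 = list(range(0, 20001))
l2 = list(range(17800, 40001))
lr = intersect_sorted(l1, l2)
print(len(lr), lr[0], lr[-1])
```

Appending to the end of a list (or a vector) is also cheaper than rebalancing a tree on every insert, which is where the rest of the speedup plausibly comes from.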

Tue, 03 Aug 2010 18:40:00 -0700 Using hiredis c library in c++ http://notes.alexdong.com/using-hiredis-c-library-in-c http://notes.alexdong.com/using-hiredis-c-library-in-c

antirez wrote a high-level Redis client in C called hiredis. Here are the changes needed to use it from a C++ program.

First, patch hiredis.h to wrap its declarations in extern "C" { }. Here is the diff; if you'd like to learn more about what this means, check out this link.

% git diff hiredis.h
diff --git a/hiredis.h b/hiredis.h
index f28dcdb..0c1df87 100644
--- a/hiredis.h
+++ b/hiredis.h
@@ -30,6 +30,10 @@
 #ifndef __HIREDIS_H
 #define __HIREDIS_H
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define REDIS_REPLY_ERROR 0
 #define REDIS_REPLY_STRING 1
 #define REDIS_REPLY_ARRAY 2
@@ -51,4 +55,9 @@ redisReply *redisConnect(int *fd, const char *ip, int port);
 void freeReplyObject(redisReply *r);
 redisReply *redisCommand(int fd, const char *format, ...);
 
+#ifdef __cplusplus
+}
+#endif
+
+
 #endif

Second, let's build the library and copy the files to the corresponding directories:

make
sudo cp hiredis.h /usr/include/
sudo cp sds.h /usr/include/
sudo cp libhiredis.so /usr/lib/

Then, in your c++ program, you should be able to do things like:

#include <iostream>
#include <hiredis.h>

using namespace std;

int main() {
    int fd;
    redisReply *reply;
    // a non-NULL reply from redisConnect carries the connection error
    reply = redisConnect(&fd, "127.0.0.1", 6379);
    if (reply != NULL) {
        cout << "Connection error: " << reply->reply << endl;
        freeReplyObject(reply);
        return 1;
    } else {
        cout << "You're all set" << endl;
    }
    return 0;
}

To compile the result, use g++ main.cc -o main -lhiredis (the library goes after the source file that uses it).

Wed, 28 Jul 2010 20:34:00 -0700 Mysql disk read/write like crazy. Mr. debian-sys-maint, thank you. http://notes.alexdong.com/mysql-disk-readwrite-like-crazy-mr-debian-sys http://notes.alexdong.com/mysql-disk-readwrite-like-crazy-mr-debian-sys

Today I noticed that lots of connections were backing up in mysql. So I ran dstat real quick to figure out what was going on. Here is the output:

alexdong@server% dstat
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
1   0  97   1   0   0| 398k  204k|   0     0 |  29k   20k| 281   579 
4   0  86  10   0   1|1144k 2488k|2277B 1822B| 572k    0 | 381  1031 
3   0  24  72   0   0|5304k  152k|3304B 1391B|2640k    0 | 483  1187 
2   2  16  80   0   0|5264k  152k|2719B 1948B|2660k    0 | 531  1172 
3   1   4  92   0   0|6048k    0 | 624B  760B|3096k    0 | 549  1193 
0   1   0  99   0   0|7728k    0 | 126B  354B|3820k    0 | 526   946 
1   1   0  98   0   0|6160k  784k|1774B 1177B|3124k    0 | 513   973 
0   0   0 100   0   0|6856k  288k| 132B  436B|3396k    0 | 509   894 
0   0   0  99   0   0|6584k    0 | 297B  783B|3312k    0 | 413   820 
2   0   0  99   0   0|7864k  264k|2601B 1811B|3964k    0 | 478   973 
1   0   0  98   0   0|7376k  112k|1594B  634B|3608k    0 | 428   906 
0   0   0 100   0   0|7488k   32k| 503B 1305B|3732k   16k| 480   883 
1   0   0  98   0   0|8112k  144k|  84k 3489B|4080k 8192B| 554   996 
2   1   0  97   0   0|7240k   96k| 126B  443B|3528k    0 | 421   956 
3   0   0  97   0   0|7976k    0 |1995B 1281B|4068k    0 | 439   924

It’s quite obvious that there is heavy disk IO going on. What’s strange about this case is that very few pages are being paged out. So what’s going on here? Who is using the disk like crazy? After running atop, it’s clear that MySQL is causing the problem.

So what is mysql doing? A quick tail -f /var/log/mysql/mysqld.log shows something like this:

48 Connect      debian-sys-maint@localhost on 
48 Query        select @@version_comment limit 1
48 Query        select count(*) into @discard from `mysql`.`time_zone_transition`
48 Quit
49 Connect      debian-sys-maint@localhost on 
49 Query        select @@version_comment limit 1
49 Query        select count(*) into @discard from `mysql`.`time_zone_transition_type`
49 Quit
50 Connect      debian-sys-maint@localhost on 
50 Query        select @@version_comment limit 1
50 Query        select count(*) into @discard from `mysql`.`user`
50 Quit

No wonder the disk is having such a hard time. So who the hell is this debian-sys-maint? This stackoverflow article provides decent background info. A quick search also brought up this blog post, which suggests switching off parts of the /etc/mysql/debian-start script by commenting out two lines in the file. Here is what I ended up with:

# The following commands should be run when the server is up but in background
# where they do not block the server start and in one shell instance so that
# they run sequentially. They are supposed not to echo anything to stdout.
# If you want to disable the check for crashed tables comment
# "check_for_crashed_tables" out.  
# (There may be no output to stdout inside the background process!)
echo "Checking for corrupt, not cleanly closed and upgrade needing tables."
(
    # upgrade_system_tables_if_necessary;
    check_root_accounts;
    # check_for_crashed_tables;
 ) >&2 &

I had accidentally upgraded MySQL, which I believe triggered the upgrade and the table check on the server. This still seems to me a healthy check; it just needs to be scheduled at a more suitable time.

Wed, 28 Jul 2010 06:49:00 -0700 Getting started with boost.python on ubuntu 10.04 http://notes.alexdong.com/getting-started-with-boostpython-on-ubuntu-10 http://notes.alexdong.com/getting-started-with-boostpython-on-ubuntu-10

This article walks you through using C++ and the Boost library to extend Python. More specifically, we’re going to write a function in C++ and call it from Python. The environment setup steps are specific to Ubuntu 10.04, but the main steps and concepts should be useful on other OSs as well.

Install

#!/bin/sh

# the basic boost library
sudo apt-get install libboost-dev

# install the boost.python library
sudo apt-get install libboost-python-dev

# install the extra libraries we need
sudo apt-get install libboost-date-time-dev

# install the boost build tool: bjam
sudo apt-get install boost-build

Python driver

#!/usr/bin/env python

import plumrain

s = "2010-10-9 8:12:00"
print plumrain.utc_date_str_to_int(s, 10)

C++ code

Here is the implementation and the interface declaration for boost.python. The function does a time zone sensitive date calculation: it shifts a UTC timestamp by a time zone offset and returns the number of days since 1 January 2006.

#include "boost/date_time/gregorian/gregorian.hpp"
#include "boost/date_time/posix_time/posix_time.hpp"
#include <boost/python.hpp>

#include <iostream>
#include <string>

int utc_date_str_to_int(const char* s, int offset) {
    using namespace boost::gregorian;
    using namespace boost::posix_time;

    date base(2006, Jan, 1);

    /* parse input strings into datetime object */
    ptime st = time_from_string(s);

    /* offset the input time with timezone */
    ptime ct = st + hours(offset);
    date cd = ct.date();

    days diff = cd - base;
    return diff.days();
}

BOOST_PYTHON_MODULE(plumrain)
{
    using namespace boost::python;
    def("utc_date_str_to_int", utc_date_str_to_int);
}


int main() {
    // std::cout << utc_date_str_to_int("2010-10-9 8:12:00", 10) << std::endl;
    for(int i=0; i<10000; i++) {
     utc_date_str_to_int("2010-10-9 17:12:00", 10);
    }
    return 0;
}

Worth mentioning is the BOOST_PYTHON_MODULE(plumrain) declaration in lines 25 to 29 of the listing. This is the code that tells boost.python “make these functions visible in python for me”. Also note that besides the boost.python library, we include the header files from the boost date_time library.
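
When testing the extension, it helps to have something to compare against. Below is a minimal pure-Python sketch (Python 3) of the same calculation, assuming the input always has the "YYYY-MM-DD HH:MM:SS" shape used in the driver. It is only a reference for cross-checking results, not a replacement for the C++ code, and boost's time_from_string may accept inputs that strptime rejects.

```python
from datetime import date, datetime, timedelta

BASE = date(2006, 1, 1)  # same base date as the C++ code


def utc_date_str_to_int(s, offset):
    """Days between BASE and the local date of the UTC timestamp `s`."""
    st = datetime.strptime(s, "%Y-%m-%d %H:%M:%S")  # parse the input string
    ct = st + timedelta(hours=offset)               # shift by the time zone offset
    return (ct.date() - BASE).days


print(utc_date_str_to_int("2010-10-9 8:12:00", 10))
print(utc_date_str_to_int("2010-10-9 17:12:00", 10))
```

The second call crosses midnight after the 10 hour shift, so it should come out one day later than the first.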

Configure bjam

The build was actually the most challenging part. We’ll use the bjam build tool to do this for us. Unfortunately, the error messages from bjam aren’t very helpful and are sometimes misleading. So double check the spacing around separators (bjam expects whitespace around tokens such as : and ;), and when things go wrong, read between the lines.

using python ;

# Specify that the boost-python library exists under the name
# boost_python. That is, because the library was installed at the
# standard search path as /usr/lib/libboost_python.so, bjam will find
# it automatically. No need to specify the absolute path.
# lib libboost_python : : <name>boost_python-mt ;
lib libboost_python : : <name>boost_python ;
lib libboost_date_time : : <name>boost_date_time ;

# Set up the project-wide requirements that everything uses the
# boost_python library.
project
    : requirements <library>libboost_python
    : requirements <library>libboost_date_time
;

# Declare the extension modules. You can specify multiple
# source files after the colon separated by spaces.
python-extension plumrain : plumrain.cc ;

Note that the Jamroot file also references the libboost_date_time library. Also, double check that the boost-build path in boost-build.jam exists.

Just build it.

Now that we’ve got everything we want, let’s build it. Here are three ways to build your code:

bjam
bjam release optimization=speed
bjam -a -oMakefile debug

The first builds the library in debug mode. On Ubuntu, it generates the .so file at ./bin/gcc-4.4.3/debug/plumrain.so. The second builds the library in release mode, as its name suggests. The third writes the build instructions to a Makefile, just to shed some light on the “black magic” of bjam.

To test the result, copy plumrain.so to the directory containing the Python driver, then run python perf.py. If everything works, you should see the result.

Common errors

Depending on your platform, you might get this error when running bjam on amd64:

gcc.compile.c++ bin/gcc-4.4.1/release/plumrain.o
gcc.link.dll bin/gcc-4.4.1/release/plumrain.so
/usr/bin/ld: cannot find -lboost_date_time
collect2: ld returned 1 exit status

"g++"    -o "bin/gcc-4.4.1/release/plumrain.so" -Wl,-h -Wl,plumrain.so -shared -Wl,--start-group "bin/gcc-4.4.1/release/plumrain.o"  -Wl,-Bstatic  -Wl,-Bdynamic -lboost_date_time -lboost_python -lutil -lpthread -ldl -Wl,--end-group -Wl,--strip-all 

...failed gcc.link.dll bin/gcc-4.4.1/release/plumrain.so...
...failed updating 1 target...
...updated 4 targets...

When this /usr/bin/ld: cannot find -lboost_date_time error happens, run ls /usr/lib/*boost_date_time* to find the exact name of the .so file you’re referencing. In my case, this meant changing the references in Jamroot to the -mt versions:

lib libboost_python : : <name>boost_python-mt ;
lib libboost_date_time : : <name>boost_date_time-mt ;

Sun, 25 Jul 2010 21:01:00 -0700 Quick introduction to hypertable's Thrift API using Python http://notes.alexdong.com/quick-introduction-to-hypertables-thrift-api http://notes.alexdong.com/quick-introduction-to-hypertables-thrift-api

The purpose of this post is to provide a quick introduction to Hypertable’s Thrift API. We’ll start from the basics, then gradually cover more advanced, performance-related topics. My environment is Ubuntu 10.04 with the hypertable binary package for 0.9.3.4. Like all great open source projects, hypertable is changing fast, so I’ll try my best to link not only to the latest online documentation but also to the source code, in case the documentation lags behind. Before reading this post, please make sure you’ve read the getting started guide and are familiar with HQL.

Minimal bookmark manager: set, get via HQL and hypertable shell

Let’s use hypertable to build a minimal personal bookmark manager, which is essentially a map from descriptions like “work” to links like “tribalytic.com”.

$ ht shell  
hypertable> create table bookmarks (url);  

Elapsed time:  0.03 s  
hypertable> INSERT INTO bookmarks VALUES   
-> ("homepage", "url", "http://alexdong.com");  

Elapsed time:  0.00 s  
Avg value size:  19.00 bytes  
Total cells:  1  
Throughput:  773.40 cells/s  
Resends:  0  
hypertable> INSERT INTO bookmarks VALUES   
("blog", "url", "http://notes.alexdong.com");  

Elapsed time:  0.00 s  
Avg value size:  25.00 bytes  
Total cells:  1  
Throughput:  913.24 cells/s  
Resends:  0  
hypertable> SELECT * from bookmarks;  
blog    url    http://notes.alexdong.com  
homepage        url    http://alexdong.com  

Elapsed time:  0.00 s  
Avg value size:  22.00 bytes  
Avg key size:  9.50 bytes  
Throughput:  198738.17 bytes/s  
Total cells:  2  
Throughput:  6309.15 cells/s  
hypertable> exit

Install

Before diving into the code, make sure we’ve setup the python dev environment properly:

  1. add /opt/hypertable/current/lib/py into your PYTHONPATH. The hyperthrift is in the /opt/hypertable/current/lib/gen-py/. So in order to use it as from hyperthrift.gen.ttypes import *, you’ll need to move it to the PYTHONPATH we just specified like this: cd /opt/hypertable/current/lib/py/; sudo mv gen-py/hyperthrift ./.

  2. install the thrift from cheese shop: sudo easy_install thrift

Basic: HQL exec and query

Now, let’s add a couple of records using the thrift api.

$ cat basic.py  
#!/usr/bin/env python  

from hypertable.thriftclient import *  
from hyperthrift.gen.ttypes import *  

client = ThriftClient("localhost", 38080)  
client.hql_exec("""INSERT INTO bookmarks VALUES   
        ("work", "url", "http://tribalytic.com") """)  
print client.hql_query('SELECT * FROM bookmarks REVS 1')

Now, if you execute the basic.py, you should see something like this:

$ python basic.py  
HqlResult(mutator=None, cells=[  
Cell(value='http://notes.alexdong.com', \
     key=Key(column_family='url', column_qualifier='', \
     timestamp=1280100808495279001L, flag=255, row='blog', \
     revision=1280100808495279001L)),   
Cell(value='http://alexdong.com', \
     key=Key(column_family='url', column_qualifier='', \
     timestamp=1280100783276423001L, flag=255, row='homepage', \
     revision=1280100783276423001L)),   
Cell(value='http://tribalytic.com', \
     key=Key(column_family='url', column_qualifier='', \
     timestamp=1280101746911510002L, flag=255, row='work', \
     revision=1280101746911510002L))],   
results=None, scanner=None)

Notice the REVS 1 in the hql_query. It specifies how many revisions we want to get back from hypertable. Without it, if you execute python basic.py a few times, you’ll see the same data showing up with different revision numbers. The list of all options in the HQL language can be found here, and the source code can be found here: src/cc/Hypertable/Lib/HqlParser.h: struct Parser: grammer.

HqlResult.cells contains the results we’re interested in. We’ll cover scanner and mutator later. For now, the key take-away is that the results we get from hypertable are ‘deserialized’ into a list of Cell objects, each of which is essentially a Key and its value.

Introducing Mutator

For most performance-sensitive applications, we want to buffer insertions and “commit” them in a batch, similar to Unix’s fsync or SQL’s COMMIT. Hypertable provides two buffered input/output channels for this: mutator and scanner. Let’s take a look at mutator first. To make it easy to understand, you can think of ThriftClient as MySQL’s Connection and of mutator as a cursor. For the changes to reach hypertable, we need to call ThriftClient.flush_mutator to “commit” them to the server.

$ cat mutator.py  
#!/usr/bin/env python  

from hypertable.thriftclient import *  
from hyperthrift.gen.ttypes import *  

client = ThriftClient("localhost", 38080)  
mutator = client.open_mutator('bookmarks', 0, 0)  
client.set_cell(mutator,
    Cell(Key('twitter', 'url', None), 'http://twitter.com/alexdong'))
client.set_cell(mutator,
    Cell(Key('github', 'url', None), 'http://github.com/alexdong'))
client.flush_mutator(mutator)  

print client.hql_query('SELECT * FROM bookmarks WHERE row = "twitter" REVS 1')  

$ python mutator.py  
HqlResult(mutator=None, \
  cells=[
    Cell(
      value='http://twitter.com/alexdong', 
      key=Key(column_family='url', column_qualifier='',
              timestamp=1280107204496367002L, flag=255, 
              row='twitter', revision=1280107204496367002L)
      )], 
  results=None, scanner=None)

The Key object specifies the “row key”, “column family” and “column qualifier”. Since we are not using a column qualifier here, we pass None as the third argument. The Cell contains a Key and the value.

Asynchronous updates: put_cell family

set_cell belongs to a family of synchronous operations. Hypertable also offers a set of asynchronous functions, the put_cell family. As explained in the api documentation, “Open a shared periodic mutator which causes cells to be written asyncronously. Users beware: calling this method merely writes cells to a local buffer and does not guarantee that the cells have been persisted. If you want guaranteed durability, use the open_mutator+set_cells* interface instead.” The following code puts two more bookmarks into hypertable using put_cells_as_arrays.

$ cat put_cells.py  
#!/usr/bin/env python  

import time  
from hypertable.thriftclient import *  
from hyperthrift.gen.ttypes import *  

client = ThriftClient("localhost", 38080)  
cells = [ 
    ['delicious', 'url', "", 'http://delicious.com/dongxun'],  
    ['bit.ly', 'url', "", 'http://bit.ly/u/alexdong'] ]  
client.put_cells_as_arrays("bookmarks",
                MutateSpec("bookmark_app", 1000, 2), cells)

# sleep for 2 seconds to let the async operation finish
time.sleep(2)

for cell in client.hql_query('SELECT * FROM bookmarks REVS 1').cells:
    print cell.key.row, "->", cell.value

$ python put_cells.py  
bit.ly -> http://bit.ly/u/alexdong  
blog -> http://notes.alexdong.com  
delicious -> http://delicious.com/dongxun  
github -> http://github.com/alexdong  
homepage -> http://alexdong.com  
twitter -> http://twitter.com/alexdong  
work -> http://tribalytic.com

In the set_cell example, we used two objects to represent a cell. Given the object construction/destruction overhead, some tests have shown roughly a 3x increase in read performance when the Cell and Key objects are replaced with arrays. Using the CellAsArray type, each cell is represented as a ["row_key", "column_family", "column_qualifier", "value", "timestamp"] array. The “as_array” variants feel similar to MySQL’s DictCursor vs. standard Cursor.

Also notice that instead of None in the “column_qualifier” field, we use an empty string "". Otherwise, you’ll receive an error message like this:

% python put_cells.py  
Traceback (most recent call last):  
File "put_cells.py", line 12, in <module>
   cells)  
File "/opt/hypertable/current/lib/py/hyperthrift/gen/ClientService.py", line 1189, in put_cells_as_arrays  
    self.send_put_cells_as_arrays(tablename, mutate_spec, cells)  
File "/opt/hypertable/current/lib/py/hyperthrift/gen/ClientService.py", line 1198, in send_put_cells_as_arrays  
    args.write(self._oprot)  
File "/opt/hypertable/current/lib/py/hyperthrift/gen/ClientService.py", line 4966, in write  
    oprot.writeString(iter150)  
File "/usr/local/lib/python2.6/dist-packages/Thrift-0.2.0-py2.6-linux-i686.egg/thrift/protocol/TBinaryProtocol.py", line 122, in writeString  
    self.writeI32(len(str))  
TypeError: object of type 'NoneType' has no len()
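
To make the difference between the two representations concrete, here is a small Python 3 sketch. Key and Cell below are hypothetical namedtuple stand-ins, not the Thrift-generated classes, and cell_to_array is a made-up helper; the only point is to show the flattening and the None to "" substitution that avoids the error above.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Thrift-generated classes, for illustration only.
Key = namedtuple("Key", "row column_family column_qualifier")
Cell = namedtuple("Cell", "key value")


def cell_to_array(cell):
    """Flatten a Cell into [row, column_family, column_qualifier, value]."""
    # The array form is serialized as strings, so None must become "".
    qualifier = cell.key.column_qualifier or ""
    return [cell.key.row, cell.key.column_family, qualifier, cell.value]


cell = Cell(Key("twitter", "url", None), "http://twitter.com/alexdong")
print(cell_to_array(cell))
```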

Also, if you’re using put_cells_as_arrays in your test code, make sure you call refresh_mutator to retrieve a new one for each test.

High performance reading: Scanner and next_row_as_arrays iterator

Now that we’ve covered the ground of buffered asynchronous writing, let’s take a look at high performance reading using Scanner. To those who are familiar with MySQL, the concept of using scanner is quite similar to the SSCursor. Instead of reading all the records into client side memory, there is a server-side cursor that’s “streaming” the result set to client side.

$ cat scanner.py  
#!/usr/bin/env python  

import time  
from hypertable.thriftclient import *  
from hyperthrift.gen.ttypes import *  

client = ThriftClient("localhost", 38080)  
r = client.hql_exec2("""SELECT url FROM bookmarks  
WHERE "delicious" <= ROW <= "homepage"  
REVS 1""", 0, 1)  
scanner = r.scanner  
while True:  
   cells = client.next_row_as_arrays(scanner)  
   if not len(cells): break  
   print cells  

client.close_scanner(scanner)  

$ python scanner.py  
['delicious', 'url', '', 'http://delicious.com/dongxun', '1280110063691009002']  
['github', 'url', '', 'http://github.com/alexdong', '1280107204496367001']  
['homepage', 'url', '', 'http://alexdong.com', '1280100783276423001']

hql_exec2 takes three parameters, the third of which is “unbuffered”. When this value is true (1 in the example above), the call returns a scanner that we can use to iterate through the results. In the example above, we call next_row_as_arrays to read one row at a time.

After fiddling around with the code in src/cc/Hypertable/Lib/HqlInterpreter.cc for a while, it seems that next_row_as_arrays is not optimal either. We don’t want to pay the setup/teardown cost just to read one row; we want to read enough data per call to justify the underlying cost. Hypertable offers another API called next_cells_as_arrays. The number of cells to read is specified by the ThriftBroker.NextThreshold field in the configuration file. As of writing, there is no way to specify a scanner-specific value. Here is the code to replace the call to next_row_as_arrays:

cells = client.next_cells_as_arrays(scanner)  
if not len(cells): break  
print cells
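
The read loop can be wrapped in a generator so that calling code sees a plain stream of cells. This is a minimal Python 3 sketch under a few assumptions: iter_cells is a made-up helper name, it relies only on the next_cells_as_arrays call shown above, and FakeClient is a hypothetical in-memory stand-in so the sketch runs without a ThriftBroker.

```python
def iter_cells(client, scanner):
    """Yield cells one by one, fetching them from the scanner in batches."""
    while True:
        cells = client.next_cells_as_arrays(scanner)
        if not cells:          # an empty batch means the scanner is exhausted
            break
        for cell in cells:
            yield cell


class FakeClient(object):
    """Hypothetical stand-in that serves pre-canned batches."""

    def __init__(self, batches):
        self._batches = list(batches)

    def next_cells_as_arrays(self, scanner):
        return self._batches.pop(0) if self._batches else []


fake = FakeClient([[["blog", "url", "", "http://notes.alexdong.com"]],
                   [["homepage", "url", "", "http://alexdong.com"]]])
for cell in iter_cells(fake, scanner=None):
    print(cell[0], "->", cell[3])
```

With a real client you would still call close_scanner once the loop is done, as in scanner.py.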

Hope this helps. Please leave a comment if you have any questions.

Wed, 21 Jul 2010 04:05:00 -0700 Wordpress permalink changes using nginx url rewrite http://notes.alexdong.com/wordpress-permalink-changes-using-nginx-url-r http://notes.alexdong.com/wordpress-permalink-changes-using-nginx-url-r

We just changed the wordpress permalinks to a short form like this: http://blog.tribalytic.com/nestle-vs-greenpeace-greenpeace-won-the-battle-but-did-nestle-win-the-war/. The old permalinks looked like this: http://blog.tribalytic.com/2010/06/08/nestle-vs-greenpeace-greenpeace-won-the-battle-but-did-nestle-win-the-war/. After a little research, I figured out how to use nginx to send a 301 for the old permalinks. Here is the final nginx configuration:

1: if (!-e $request_filename) {
2:     rewrite "^/[0-9]{4}/[0-9]{2}/[0-9]{2}/(.+)$" /$1 permanent;
3:     rewrite ^(.+)$ /index.php?q=$1 last;
}

If you have already set up wordpress on nginx, you probably have lines 1 and 3. Line 2 is the 'meat' here. Two things are worth noticing: 1) the regular expression is wrapped in quotation marks because it contains { and }, which nginx would otherwise treat as block delimiters; 2) the permanent flag makes nginx send a 301, so search engine crawlers and visitors are notified right away. Please refer to the nginx rewrite help for details.
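
Before reloading nginx, the pattern from line 2 can be tried out offline. The sketch below uses Python 3's re module as a stand-in for nginx's regex engine (an assumption that holds for a pattern this simple); new_location is a made-up helper name that returns the path nginx would redirect to.

```python
import re

# Same pattern as line 2 of the nginx configuration above.
OLD_PERMALINK = re.compile(r"^/[0-9]{4}/[0-9]{2}/[0-9]{2}/(.+)$")


def new_location(path):
    """Return the 301 target for an old-style permalink, or None if it does not match."""
    match = OLD_PERMALINK.match(path)
    return "/" + match.group(1) if match else None


print(new_location("/2010/06/08/nestle-vs-greenpeace-greenpeace-won-the-battle-but-did-nestle-win-the-war/"))
print(new_location("/about/"))
```

Paths that do not start with a date, such as /about/, fall through to line 3 unchanged.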

Wed, 21 Jul 2010 02:32:00 -0700 Compression schemes in hypertable http://notes.alexdong.com/compression-in-hypertable http://notes.alexdong.com/compression-in-hypertable

In tribalytic, we need to store billions of tweets from all Australian twitter users, plus an inverted list to support free search, in hypertable. One big benefit of choosing hypertable over hbase is the potential compression we can get. Today I did some research, and here is a summary of hypertable's compression schemes.

Compression semantic

In this post, Doug described the semantic for using compression:

CREATE TABLE COMPRESSOR="zlib" foo ( 
  column1, 
  column2 
);

What will be compressed

Luke described what can be compressed and the evaluation criteria in this post:

The main criteria is the throughput for encode/decode typical
commit log and cellstore blocks (default compressed block size is 
64KB, about  100-200KB raw size).

Which compression schemes are available

As for the available compression schemes and their type, a quick grep points to src/cc/Hypertable/Lib/BlockCompressionCodec.h as this:

class BlockCompressionCodec : public ReferenceCount {
  public:
    enum Type { UNKNOWN=-1, NONE=0, BMZ=1, ZLIB=2, LZO=3, QUICKLZ=4,
                COMPRESSION_TYPE_LIMIT=5 };

How to evaluate?

In this post, Doug described how to check the compression ratio:

The best way to measure the compression ratio is to insert data into 
Hypertable and then inspect the CellStore files by dumping the trailer, 
which contains the compression ratio, defined as: 

  compressed_size / uncompressed_size 
Here's example output: 
$ /opt/hypertable/current/bin/ht csdump 
/hypertable/tables/query-log/default/AB2A0D28DE6B77FFDD6C72AF/cs2 
BLOCK INDEX: 
0: offset=0 size=20585 row=5586315 
1: offset=20585 size=67742 row=5589539 
[...] 
945: offset=62186001 size=63272 row=999704 
946: offset=62249273 size=39744 row=9999967 
sizeof(OffsetT) = 4 
BLOOM FILTER SIZE: 0 
TRAILER: 
[CellStoreTrailerV2] 
  fix_index_offset: 62289017 
  var_index_offset: 62292831 
  filter_offset: 62305781 
  index_entries: 0 
  total_entries: 5122831 
  filter_length: 1373625 
  filter_items_estimate: 143309 
  filter_items_actual: 5121851 
  blocksize: 65536 
  revision: 1270524885697818309 
  timestamp_min: 1270524813360446944 
  timestamp_max: 1270524885697818309 
  create_time: 1270524896305355000 
  table_id: 1 
  table_generation: 1 
  flags=0 
  compression_ratio: 0.327462 
  compression_type: 3 
  bloom_filter_mode=ROWS 
  bloom_filter_hash_count=6 
  version: 2 
As far as the BMZ compressor goes, it is specified to AccessGroup 
granularity.  It works best when the data contains lots of replicated data 
that is the length of the bmz fingerprint.  The default fingerprint length 
is 19 bytes.  My guess is that mixing disparate column families within an 
access group probably won't impact the compression ratio that much, but it 
is best to group data within a bmz-compressed access group that has lots of 
fingerprint-length replication. 
We currently don't have any prefix compression, but that will be added soon 
(e.g. within the next couple of months).

Here compression_type is the enum value listed above (3 is LZO), and compression_ratio is the 'gold'. Since it is compressed size over uncompressed size, the smaller, the better.
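
To get a feel for the metric before loading anything into hypertable, the same ratio can be computed on a sample of your own data. This is a rough Python 3 sketch using the standard zlib module on a made-up, highly repetitive sample; it measures plain zlib over one byte string, not Hypertable's block format, so the numbers are only indicative.

```python
import zlib


def compression_ratio(raw):
    """compressed_size / uncompressed_size, as defined in the quoted post."""
    return len(zlib.compress(raw)) / len(raw)


# Made-up, repetitive sample data; real tweets will compress far less well.
sample = b"http://twitter.com/alexdong some tweet text " * 1000
print(round(compression_ratio(sample), 4))
```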

Bonus: prefix compression

Since 0.9.3.2, prefix compression has been added to hypertable. Prefix compression happens before the other block-level compression so that we get the benefits of both. Even better, CellStore blocks are stored in the block cache prefix compressed. So not only does prefix compression save disk space, it also saves block cache memory, effectively increasing the size of the block cache. This ticket has some records of its motivation and some implementation details:

The current cell store block format for storing key value pairs in a block is not very efficient. BMZ block compressor can help but I think there is significant benefit for a more immutable format that efficient to store and traverse. Prefix compression should be implemented as another layer of block compressor instead of just another block compression codec, so it can be combined with other generic compressors.

Fri, 16 Jul 2010 02:53:00 -0700 zsh auto-completion exception: file exists: /dev/null http://notes.alexdong.com/zsh-auto-completion-exception-file-exists-dev http://notes.alexdong.com/zsh-auto-completion-exception-file-exists-dev

I received this when I tried to do tab-completion in zsh:

_expand:81: file exists: /dev/null
_expand:89: file exists: /dev/null
_expand:106: file exists: /dev/null
_path_files:413: file exists: /dev/null
_path_files:413: file exists: /dev/null
_path_files:413: file exists: /dev/null

After looking around for a while, I realized that some ubuntu package may have rendered /dev/null inaccessible. Removing it and recreating it solved the problem. Thanks to this link.

sudo rm /dev/null
sudo mknod -m 0666 /dev/null c 1 3
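
A quick way to confirm the diagnosis before and after running mknod is to check that /dev/null really is a character device. This is a small Python 3 sketch using only the standard library; is_char_device is a made-up helper name.

```python
import os
import stat


def is_char_device(path):
    """True if `path` exists and is a character device."""
    try:
        return stat.S_ISCHR(os.stat(path).st_mode)
    except OSError:
        return False


print(is_char_device("/dev/null"))
```

If this prints False, something has replaced /dev/null with a regular file or removed it, and the mknod fix above applies.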

Fri, 16 Jul 2010 01:43:00 -0700 logrotate settings for mongodb http://notes.alexdong.com/logrotate-settings-for-mongodb http://notes.alexdong.com/logrotate-settings-for-mongodb
/var/log/mongodb/*.log {
       weekly
       rotate 10
       copytruncate
       delaycompress
       compress
       notifempty
       missingok
}

This rotates the logs every week and compresses them automatically. Note the copytruncate option: logrotate copies the log and truncates the original in place, which avoids having to create a new log file and restart the mongodb server.
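
What copytruncate does can be sketched in a few lines of Python 3. This is an illustration of the idea, not logrotate's implementation; the function name and the temporary file names are made up.

```python
import os
import shutil
import tempfile


def copytruncate(log_path, rotated_path):
    """Copy the log aside, then empty the original without replacing it."""
    shutil.copyfile(log_path, rotated_path)
    with open(log_path, "r+") as f:
        f.truncate(0)   # the file itself is kept, only its content is dropped


# Try it on a throwaway file.
workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "mongodb.log")
with open(log, "w") as f:
    f.write("line 1\n")

copytruncate(log, log + ".1")
print(os.path.getsize(log), repr(open(log + ".1").read()))
```

Because the original file is never renamed or recreated, the server keeps writing to the same file. The trade-off, which the logrotate documentation also mentions, is that lines written between the copy and the truncate can be lost.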

Thu, 08 Jul 2010 07:13:00 -0700 Write a munin plugin for redis using python in 5-minutes http://notes.alexdong.com/write-a-munin-plugin-for-redis-using-python-i http://notes.alexdong.com/write-a-munin-plugin-for-redis-using-python-i

We have a bucket-based processing pipeline where each bucket is essentially a Redis "Set". We want to see the size of each bucket over time so that we can optimize our bucket sorting algorithm. Redis has a nice SCARD command for this purpose.

We're using munin as our monitoring system. Munin is quite famous for its extensive Plugin Exchange, but unfortunately most of the scripts there are either shell or perl scripts. I decided to write one in python. Here is the code:

#!/usr/bin/python

import sys, os
import redis

if sys.argv[-1] == 'config':
    print "graph_title Fetcher queue"
    print "graph_vlabel tasks"
    print "pipeline.label length"
    print "graph_info The active tasks in the fetcher pipeline" + \
        " that are waiting to be executed. "
    print "pipeline.info Size of the fetcher pipeline in redis set." + \
        " Retrieved using SCARD"
    print "graph_category pipeline"
else:
    host = os.getenv('host', '127.0.0.1')
    r = redis.Redis(host)

    # it's a shared agreement that we use `pipeline.{{name}}` to index into the
    # redis. In munin plugin terms, `pipeline` is a "field" and `fetcher` is a 
    # value as described in http://munin-monitoring.org/wiki/MuninNomenclature
    pipelines = ['pipeline.fetcher']
    for name in pipelines:
        print name, ' ', r.scard(name)

Here is the corresponding /etc/munin/plugin-conf.d/munin-node file.

[process-pipeline]
env.PATH /usr/sbin:/usr/bin:/sbin:/bin
env.host redis-host

One thing is worth a bit of explanation:

  • Error message: Insecure directory in $ENV{PATH} while running with -T switch at /usr/sbin/munin-run line 416. The solution is to manually specify env.PATH, as described in http://munin-monitoring.org/ticket/863

Mon, 05 Jul 2010 05:55:00 -0700 unix script to merge all css files into a single one http://notes.alexdong.com/unix-script-to-merge-all-css-files-into-a-sin http://notes.alexdong.com/unix-script-to-merge-all-css-files-into-a-sin
for file in *.css
do
 # do something on "$file"
 cat "$file" >> /var/www/cdn.example.com/cache/large.css
done
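
One thing to watch with the loop above is that >> appends, so running it twice doubles the content of large.css. Below is a minimal Python 3 sketch of the same merge that starts from an empty output file each time. The function name and the demo file names are made up for illustration.

```python
import glob
import os
import tempfile


def merge_css(src_dir, out_path):
    """Concatenate every *.css file in src_dir (sorted by name) into out_path."""
    with open(out_path, "w") as out:            # "w" starts from an empty file
        for name in sorted(glob.glob(os.path.join(src_dir, "*.css"))):
            if os.path.abspath(name) == os.path.abspath(out_path):
                continue                        # never merge the output into itself
            with open(name) as src:
                out.write(src.read())


# Try it on two throwaway stylesheets.
src_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
for name, body in [("a.css", "a{}\n"), ("b.css", "b{}\n")]:
    with open(os.path.join(src_dir, name), "w") as f:
        f.write(body)

merged = os.path.join(out_dir, "large.css")
merge_css(src_dir, merged)
merge_css(src_dir, merged)   # a second run does not duplicate anything
print(repr(open(merged).read()))
```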

Mon, 05 Jul 2010 05:39:00 -0700 One-liner to remove all .svn hidden folders in current directory: magic find http://notes.alexdong.com/one-liner-to-remove-all-svn-hidden-folders-in http://notes.alexdong.com/one-liner-to-remove-all-svn-hidden-folders-in
find . -name .svn -exec rm -r -f -d {} \;

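Here "-r" is what lets "rm" delete each .svn directory tree and "-f" suppresses prompts and errors; "-d" (remove directories) is redundant next to "-r" and has nothing to do with the folder being hidden.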
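
For cases where you want a list of what was deleted, the same clean-up can be sketched in Python 3 with os.walk and shutil.rmtree. remove_svn_dirs is a made-up helper name, and the demo tree is created in a temporary directory.

```python
import os
import shutil
import tempfile


def remove_svn_dirs(root):
    """Delete every directory named .svn below root; return the removed paths."""
    removed = []
    for dirpath, dirnames, _ in os.walk(root):
        if ".svn" in dirnames:
            target = os.path.join(dirpath, ".svn")
            shutil.rmtree(target)
            dirnames.remove(".svn")   # do not descend into what was just deleted
            removed.append(target)
    return removed


# Try it on a throwaway working copy.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, ".svn"))
os.makedirs(os.path.join(root, "src", ".svn"))
open(os.path.join(root, "src", "main.py"), "w").close()

removed = remove_svn_dirs(root)
print(len(removed))
```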
