How to configure django/mod_wsgi to avoid a frozen apache because of Python GIL?

Do you know that a long running request could block the whole Apache process that's hosting it?

Yes, we ran into this problem today, and it took us quite a while to figure it out. So here is the story.

Trunk.ly allows our users to add a new link directly via the web interface. Behind the scenes, the python process hosted by mod_wsgi "within" apache needs to resolve the DNS name, expand any url shortener wrapped around the link, download the actual html and submit it to our backend search server. We noticed that if someone added a url that takes a long time for our crawler to download, the whole apache freezes. Further user requests are blocked and eventually the front-end nginx starts throwing 500 Timeout errors. (We use nginx to serve static files and pass dynamic requests to apache as explained here. Notice that this tutorial has the same flaw, as we'll see soon.)

So one request can bring the whole server down. Bad.  

Our settings are: 

  • MPM Worker for Apache: apache has a process group consisting of a few processes, and each process has a number of threads. Each thread takes a request and processes it. (Internals of Apache's scheduling/worker system and some nice diagrams can be found here.)
  • mod_wsgi works in daemon mode: the python code runs in its own processes, and mod_wsgi acts as a bridge between apache and those processes.
  • mod_wsgi has 1 process and 25 threads, as specified in mod_wsgi's IntegrationWithDjango: WSGIDaemonProcess site-1 user=user-1 group=user-1 threads=25

The settings above create a single process with 25 threads in it and let each thread deal with the incoming http requests. At the beginning I didn't think the whole apache process was frozen because of the Python GIL. In theory the GIL should be released whenever a thread waits on IO, but somehow this didn't happen in the context described above. So a natural solution is to assign each request to a process instead of a thread. Here are the new mod_wsgi settings:

WSGIDaemonProcess site-1 user=user-1 group=user-1 processes=9 threads=1

The above setting tells mod_wsgi to spawn 9 processes, each with 1 thread. It is basically the same thing python's multiprocessing does as a replacement for threading. After restarting the apache process, the frozen process problem disappeared.
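The expectation that the GIL is released while a thread waits on IO can be checked in isolation with a small sketch like the one below. It is not our crawler code: time.sleep stands in for a slow download, and the function and thread names are made up for the illustration. It only shows the behaviour I expected to see, not why our mod_wsgi setup froze. A call that blocks without releasing the GIL (inside a C extension, for example) would behave differently.

```python
import threading
import time

def run_demo(slow_seconds=0.5):
    """Start one slow 'request' and one fast one as threads in the same process."""
    finished = []                  # completion order, appended by the worker threads
    lock = threading.Lock()

    def handle(name, delay):
        time.sleep(delay)          # blocking call; CPython releases the GIL while sleeping
        with lock:
            finished.append(name)

    slow = threading.Thread(target=handle, args=("slow", slow_seconds))
    fast = threading.Thread(target=handle, args=("fast", 0.0))
    slow.start()
    fast.start()
    slow.join()
    fast.join()
    return finished

if __name__ == "__main__":
    print(run_demo())
```

If the slow thread held the GIL for its whole wait, the fast one could not finish first.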

Notes

  1. One unknown I still haven't figured out yet is why the GIL doesn't yield. If you have any hints, please leave a comment.  
  2. Our backend crawler and processing pipeline have been running in gevent for many months, so gunicorn looks like a very natural next step for us.  I'd love to hear about your experience if you have moved from mod_wsgi to gunicorn. 
  3. Here are a few more links trunk.ly users have shared about GIL - Python Global Interpreter Lock

 

How to save 200% RAM by selecting the right key data type for #MongoDB

At trunk.ly, we need to store and process more than 100,000 links every day. Each url is represented by the md5 value of the url. Recently, we started to notice increasing page faults in mongostat, and performance started to degrade. After investigating, we realized that we can no longer keep the whole index in RAM.

In this post, I am going to show how, given 10 million md5 records, 100% to 200% of memory can be saved by adopting a slightly different data type for the index.

Here is the python code to create 1,000,000 records with each of 4 different index types: ObjectId, int, md5 string and base64 binary string:

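#!/usr/bin/env python
# Written for Python 2 and the legacy pymongo Connection/insert API.

import hashlib

import bson
from pymongo import Connection

# Connect to a mongod running on the default localhost:27017.
connection = Connection()
db = connection.test_database

# Let the driver generate an ObjectId for _id.
print('ObjectID')
for i in range(1, 1000000):
    db.objectids.insert({'i': i})

# Integer _id.
print('int')
for i in range(1, 1000000):
    db.ints.insert({'_id': i, 'i': i})

# 16-byte md5 digest stored as BSON binary.
print('Base64 BSON')
for i in range(1, 1000000):
    db.base64s.insert({'_id': bson.Binary(hashlib.md5(str(i)).digest(),
                                          bson.binary.MD5_SUBTYPE), 'i': i})

# 32-character hex md5 string.
print('string')
for i in range(1, 1000000):
    db.strings.insert({'_id': hashlib.md5(str(i)).hexdigest(), 'i': i})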

Here are the index sizes mongo reports for each index type (only totalIndexSize is shown):

> db.base64s.stats()
{
        "totalIndexSize" : 67076096,
}
> db.objectids.stats()
{
        "totalIndexSize" : 41598976,
}
> db.ints.stats()
{
        "totalIndexSize" : 32522240,
}
> db.strings.stats()
{
        "totalIndexSize" : 90914816,

}
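To make the comparison concrete, the ratios between these index sizes can be worked out with a few lines of python. This is only arithmetic on the four totalIndexSize values above; the dictionary and function names are made up for the illustration.

```python
# totalIndexSize values (bytes) copied from the stats output above.
INDEX_SIZES = {
    "strings": 90914816,
    "base64s": 67076096,
    "objectids": 41598976,
    "ints": 32522240,
}

def relative_to(baseline, sizes=INDEX_SIZES):
    """Return each index size as a multiple of the baseline collection's index."""
    base = float(sizes[baseline])
    return {name: size / base for name, size in sizes.items()}

if __name__ == "__main__":
    ratios = relative_to("ints")
    for name in sorted(ratios, key=ratios.get):
        print("%-10s %.2fx the int index" % (name, ratios[name]))
```

With these numbers the md5 string index is roughly 2.8 times the size of the int index, about 2.2 times the ObjectId index and about 1.36 times the binary one.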

What I didn't know about MongoDB

Here are the notes I took while reading through the book http://oreilly.com/catalog/0636920001096

 

Data type

 

Ordered key/value pairs

Key/value pairs in documents are ordered, so the first document below is distinct from the second:

{"greeting" : "Hello, world!", "foo" : 3} 

{"foo" : 3, "greeting" : "Hello, world!"} 
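To see what order-sensitive comparison means from python, here is a small sketch. OrderedDict is only a stand-in for an ordered BSON document, and same_document is a hypothetical helper, not a driver function.

```python
from collections import OrderedDict

# OrderedDict is used here as a stand-in for an ordered BSON document.
first = OrderedDict([("greeting", "Hello, world!"), ("foo", 3)])
second = OrderedDict([("foo", 3), ("greeting", "Hello, world!")])

def same_document(a, b):
    """Order-sensitive comparison: same keys, same values, same key order."""
    return list(a.items()) == list(b.items())

if __name__ == "__main__":
    print(same_document(first, second))   # key order differs
    print(dict(first) == dict(second))    # plain dict comparison ignores order
```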

 

Using subcollections is recommended

• The MongoDB web console organizes the data in its DBTOP section by subcollection (see Chapter 8 for more information on administration).
• Most drivers provide some syntactic sugar for accessing a subcollection of a given collection. For example, in the database shell, db.blog will give you the blog collection, and db.blog.posts will give you the blog.posts collection.

Subcollections are a great way to organize data in MongoDB, and their use is highly recommended.

 

What a built-in function is doing

A good way of figuring out what a function is doing is to type it without the parentheses. This will print the JavaScript source code for the function. For example, if we are curious about how the update function works or cannot remember the order of parameters, we can do the following:

> db.foo.update
function (query, obj, upsert, multi) {
    assert(query, "need a query");
    assert(obj, "need an object");
    this._validateObject(obj);
    this._mongo.update(this._fullName, query, obj,

 

Avoid overwriting entire documents; 8-byte integer representation

JavaScript has one “number” type. Because MongoDB has three number types (4-byte integer, 8-byte integer, and 8-byte float), the shell has to hack around JavaScript’s limitations a bit. By default, any number in the shell is treated as a double by MongoDB. This means that if you retrieve a 4-byte integer from the database, manipulate its document, and save it back to the database even without changing the integer, the integer will be resaved as a floating-point number. Thus, it is generally a good idea not to overwrite entire documents from the shell.

If you save an 8-byte integer and look at it in the shell, the shell will display it as an embedded document indicating that it might not be exact. For example, if we save a document with a "myInteger" key whose value is the 64-bit integer, 3, and then look at it in the shell, it will look like this:

> doc = db.nums.findOne()
{
    "_id" : ObjectId("4c0beecfd096a2580fe6fa08"),
    "myInteger" : {
        "floatApprox" : 3
    }
}

If you insert an 8-byte integer that cannot be accurately displayed as a double, the shell will add two keys, "top" and "bottom", containing the 32-bit integers representing the 4 high-order bytes and 4 low-order bytes of the integer, respectively. For instance, if we insert 9223372036854775807, the shell will show us the following:

> db.nums.findOne()
{
    "_id" : ObjectId("4c0beecfd096a2580fe6fa09"),
    "myInteger" : {
        "floatApprox" : 9223372036854776000,
        "top" : 2147483647,
        "bottom" : 4294967295
    }
}
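The "top" and "bottom" values can be reproduced with plain integer arithmetic. The sketch below is my own illustration in python, not the shell's code; split_int64 and float_approx are hypothetical helper names.

```python
def split_int64(value):
    """Split a non-negative 64-bit integer into (top, bottom) 32-bit halves."""
    top = value >> 32              # the 4 high-order bytes
    bottom = value & 0xFFFFFFFF    # the 4 low-order bytes
    return top, bottom

def float_approx(value):
    """The nearest double, which is all a JavaScript number can hold."""
    return float(value)

if __name__ == "__main__":
    n = 9223372036854775807        # 2**63 - 1, the value from the example above
    print(split_int64(n))
    print(float_approx(n) == float(2 ** 63))
```

The last line shows why the value cannot be displayed exactly: as a double, 2**63 - 1 is rounded to 2**63.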

 

ObjectID explained

If you create multiple new ObjectIds in rapid succession, you can see that only the last few digits change each time. In addition, a couple of digits in the middle of the ObjectId will change (if you space the creations out by a couple of seconds). This is because of the manner in which ObjectIds are created. The 12 bytes of an ObjectId are generated as follows:

Bytes 0-3: Timestamp
Bytes 4-6: Machine
Bytes 7-8: PID
Bytes 9-11: Increment

The first four bytes of an ObjectId are a timestamp in seconds since the epoch. This provides a couple of useful properties:

• The timestamp, when combined with the next five bytes (which will be described in a moment), provides uniqueness at the granularity of a second.
• Because the timestamp comes first, it means that ObjectIds will sort in roughly insertion order. This is not a strong guarantee but does have some nice properties, such as making ObjectIds efficient to index.
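As an illustration, the fields can be pulled out of an ObjectId's hex string with a few lines of python. This sketch assumes the timestamp/machine/PID/increment layout described above, split as 4, 3, 2 and 3 bytes; split_object_id is a hypothetical helper, not part of any driver.

```python
from datetime import datetime, timezone

def split_object_id(hex_id):
    """Split a 24-character ObjectId hex string into its four fields."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    return {
        "timestamp": int.from_bytes(raw[0:4], "big"),  # seconds since the epoch
        "machine": raw[4:7].hex(),
        "pid": raw[7:9].hex(),
        "increment": raw[9:12].hex(),
    }

if __name__ == "__main__":
    # An _id taken from one of the shell examples in these notes.
    parts = split_object_id("4c0beecfd096a2580fe6fa08")
    print(parts)
    print(datetime.fromtimestamp(parts["timestamp"], tz=timezone.utc))
```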

 

MongoDB’s philosophy on pushing tasks to the client driver whenever possible

• Although ObjectIds are designed to be lightweight and easy to generate, there is still some overhead involved in their generation. The decision to generate them on the client side reflects an overall philosophy of MongoDB: work should be pushed out of the server and to the drivers whenever possible. This philosophy reflects the fact that, even with scalable databases like MongoDB, it is easier to scale out at the application layer than at the database layer. Moving work to the client side reduces the burden requiring the database to scale.
• By generating ObjectIds on the client side, drivers are capable of providing richer APIs than would be otherwise possible. For example, a driver might have its insert method either return the generated ObjectId or inject it directly into the document that was inserted. If the driver allowed the server to generate ObjectIds, then a separate query would be required to determine the value of "_id" for an inserted document.

 

 

CRUD

 

Batch Insert 

If you have a situation where you are inserting multiple documents into a collection, you can make the insert faster by using batch inserts. Batch inserts allow you to pass an array of documents to the database.

Sending dozens, hundreds, or even thousands of documents at a time can make inserts significantly faster. A batch insert is a single TCP request, meaning that you do not incur the overhead of doing hundreds of individual requests. It can also cut insert time by eliminating a lot of the header processing that gets done for each message. When an individual document is sent to the database, it is prefixed by a header that tells the database to do an insert operation on a certain collection. By using batch insert, the database doesn’t need to reprocess this information for each document.
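On the client side, batching usually means cutting a stream of documents into chunks before handing each chunk to the driver. Here is a minimal python sketch of that chunking step; the helper name batches and the batch size of 1000 are my own choices, and the driver call is only mentioned in a comment.

```python
from itertools import islice

def batches(documents, size):
    """Yield lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

if __name__ == "__main__":
    docs = ({"i": i} for i in range(2500))
    for batch in batches(docs, 1000):
        # With PyMongo, each batch could be sent in one call, e.g.
        # collection.insert_many(batch); here we only report the sizes.
        print(len(batch))
```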

 

Question: if 1 out of 1000 insertions fails due to a conflict, will the remaining 999 still succeed?

 

Fire-and-forget update by default

Updates are atomic: if two updates happen at the same time, whichever one reaches the server first will be applied, and then the next one will be applied. Thus, conflicting updates can safely be sent in rapid-fire succession without any documents being corrupted: the last update will “win.”

The three operations that this chapter focused on (inserts, removes, and updates) seem instantaneous because none of them waits for a database response. They are not asynchronous; they can be thought of as “fire-and-forget” functions: the client sends the documents to the server and immediately continues. The client never receives an “OK, got that” or a “not OK, could you send that again?” response.

The benefit to this is that the speed at which you can perform these operations is terrific. You are often only limited by the speed at which your client can send them and the speed of your network.

 

Be aware of $push becoming a bottleneck

Using "$push" and other array modifiers is encouraged and often necessary, but it is good to keep in mind the trade-offs of such updates. If "$push" becomes a bottleneck, it may be worth pulling an embedded array out into a separate collection.

 

The save Shell Helper 

save is a shell function that lets you insert a document if it doesn’t exist and update it if it does. It takes one argument: a document. If the document contains an "_id" key, save will do an upsert. Otherwise, it will do an insert. This is just a convenience function so that programmers can quickly modify documents in the shell:

> var x = db.foo.findOne()
> x.num = 42
42
> db.foo.save(x)

Without save, the last line would have been a more cumbersome db.foo.update({"_id" : x._id}, x).
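The decision save makes can be modelled in a few lines of python. This is a toy in-memory model I wrote to illustrate the rule, not the shell's implementation; ToyCollection and its id counter are made up.

```python
import itertools

class ToyCollection:
    """In-memory stand-in for a collection, keyed by "_id"."""

    def __init__(self):
        self.docs = {}
        self._ids = itertools.count(1)   # simplistic id generator for the sketch

    def insert(self, doc):
        doc.setdefault("_id", next(self._ids))
        self.docs[doc["_id"]] = dict(doc)
        return doc["_id"]

    def upsert(self, doc):
        # Replace the stored document with this "_id", or create it if missing.
        self.docs[doc["_id"]] = dict(doc)
        return doc["_id"]

    def save(self, doc):
        # The rule described above: "_id" present -> upsert, otherwise insert.
        if "_id" in doc:
            return self.upsert(doc)
        return self.insert(doc)
```

Saving a fetched and modified document therefore replaces the stored copy, while saving a brand new document adds one.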

 

Multiupdate 

Multiupdates are a great way of performing schema migrations or rolling out new features to certain users. Suppose, for example, we want to give a gift to every user who has a birthday on a certain day. We can use multiupdate to add a "gift" to their account:

> db.users.update({birthday : "10/13/1978"},
... {$set : {gift : "Happy Birthday!"}}, false, true)

This would add the "gift" key to all user documents with birthdays on October 13, 1978.

To see the number of documents updated by a multiple update, you can run the getLastError database command (which might be better named "getLastOpStatus"). The "n" key will contain the number of documents affected by the update:

> db.count.update({x : 1}, {$inc : {x : 1}}, false, true)
> db.runCommand({getLastError : 1})
{
    "err" : null,
    "updatedExisting" : true,
    "n" : 5,
    "ok" : true
}

 

 

Query snapshot and connection pools

 

For each connection to a MongoDB server, the database creates a queue for that connection’s requests. When the client sends a request, it will be placed at the end of its connection’s queue. Any subsequent requests on the connection will occur after the enqueued operation is processed. Thus, a single connection has a consistent view of the database and can always read its own writes.

Note that this is a per-connection queue: if we open two shells, we will have two connections to the database. If we perform an insert in one shell, a subsequent query in the other shell might not return the inserted document. However, within a single shell, if we query for the document after inserting, the document will be returned. This behavior can be difficult to duplicate by hand, but on a busy server, interleaved inserts/queries are very likely to occur. Often developers run into this when they insert data in one thread and then check that it was successfully inserted in another. For a second or two, it looks like the data was not inserted, and then it suddenly appears.

This behavior is especially worth keeping in mind when using the Ruby, Python, and Java drivers, because all three drivers use connection pooling. For efficiency, these drivers open multiple connections (a pool) to the server and distribute requests across them.

 

 

Querying

 

Limit the fields returned

Sometimes, you do not need all of the key/value pairs in a document returned. If this is the case, you can pass a second argument to find (or findOne) specifying the keys you want. This reduces both the amount of data sent over the wire and the time and memory used to decode documents on the client side.

For example, if you have a user collection and you are interested only in the "username" and "email" keys, you could return just those keys with the following query:

> db.users.find({}, {"username" : 1, "email" : 1})
{
    "_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
    "username" : "joe",
    "email" : "joe@example.com"
}

As you can see from the previous output, the "_id" key is always returned, even if it isn’t specifically listed.
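The effect of the second argument can be mimicked on a plain python dict. The sketch below is a toy model of an inclusion-style field selector, not driver code; project is a hypothetical helper, and it assumes (as MongoDB allows) that "_id" can be dropped by passing "_id" : 0 explicitly.

```python
def project(document, fields):
    """Return only the requested keys; "_id" is kept unless explicitly excluded."""
    wanted = {key for key, flag in fields.items() if flag}
    if fields.get("_id", 1):
        wanted.add("_id")
    return {key: value for key, value in document.items() if key in wanted}

if __name__ == "__main__":
    user = {"_id": 1, "username": "joe", "email": "joe@example.com", "age": 30}
    print(project(user, {"username": 1, "email": 1}))
```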

 

Cursor chains and lazy loading

When you call find, the shell does not query the database immediately. It waits until you actually start requesting results to send the query, which allows you to chain additional options onto a query before it is performed. Almost every method on a cursor object returns the cursor itself so that you can chain them in any order. For instance, all of the following are equivalent:

> var cursor = db.foo.find().sort({"x" : 1}).limit(1).skip(10);
> var cursor = db.foo.find().limit(1).sort({"x" : 1}).skip(10);
> var cursor = db.foo.find().skip(10).limit(1).sort({"x" : 1});

At this point, the query has not been executed yet. All of these functions merely build the query. Now, suppose we call the following:

> cursor.hasNext()
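The chaining trick is easy to reproduce in python: each option method records its argument and returns the cursor itself, and nothing runs until results are requested. LazyCursor below is a toy I made up to illustrate the idea; it works on an in-memory list and is not how any driver is implemented.

```python
class LazyCursor:
    """Toy cursor: options are recorded when chained and applied only on iteration."""

    def __init__(self, documents):
        self._documents = documents
        self._sort_key = None
        self._skip = 0
        self._limit = None
        self.executed = False

    def sort(self, key):
        self._sort_key = key
        return self            # returning self is what makes chaining possible

    def skip(self, count):
        self._skip = count
        return self

    def limit(self, count):
        self._limit = count
        return self

    def __iter__(self):
        # The "query" runs here, always as sort -> skip -> limit,
        # regardless of the order in which the options were chained.
        self.executed = True
        docs = list(self._documents)
        if self._sort_key is not None:
            docs.sort(key=lambda d: d[self._sort_key])
        docs = docs[self._skip:]
        if self._limit is not None:
            docs = docs[:self._limit]
        return iter(docs)
```

Because the options are applied in a fixed order at execution time, the three chains from the shell example give the same result in this model too.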

 

Index

 

Avoiding Large Skips 

Using skip for a small number of documents is fine. For a large number of results, skip can be slow (this is true in nearly every database, not just MongoDB) and should be avoided. Usually you can build criteria into the documents themselves to avoid having to do large skips, or you can calculate the next query based on the result from the previous one.

Paginating results without skip

The easiest way to do pagination is to return the first page of results using limit and then return each subsequent page as an offset from the beginning.

> // do not use: slow for large skips
> var page1 = db.foo.find(criteria).limit(100)
> var page2 = db.foo.find(criteria).skip(100).limit(100)
> var page3 = db.foo.find(criteria).skip(200).limit(100)
...

However, depending on your query, you can usually find a way to paginate without skips. For example, suppose we want to display documents in descending order based on "date". We can get the first page of results with the following:

> var page1 = db.foo.find().sort({"date" : -1}).limit(100)

Then, we can use the "date" value of the last document as the criteria for fetching the next page:
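My notes don't include the shell snippet for that step, so here is the same idea as a python sketch over an in-memory list. next_page is a hypothetical helper, and it assumes "date" values are unique (with ties, documents sharing the boundary date would be skipped).

```python
from datetime import date

def next_page(documents, page_size, last_date=None):
    """Return the next page in descending "date" order, without skipping.

    `last_date` is the "date" of the final document on the previous page;
    None means "give me the first page".
    """
    # With PyMongo the equivalent query would be along the lines of
    # find({"date": {"$lt": last_date}}).sort("date", -1).limit(page_size).
    candidates = [d for d in documents
                  if last_date is None or d["date"] < last_date]
    candidates.sort(key=lambda d: d["date"], reverse=True)
    return candidates[:page_size]

if __name__ == "__main__":
    docs = [{"date": date(2011, 1, day)} for day in range(1, 11)]
    page1 = next_page(docs, 4)
    page2 = next_page(docs, 4, last_date=page1[-1]["date"])
    print([d["date"].day for d in page1], [d["date"].day for d in page2])
```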

 

Key metrics for query performance

"nscanned" : 64
This is the number of documents that the database looked through. You want to make sure this is as close to the number returned as possible.

"n" : 64
This is the number of documents returned. We’re doing pretty well here, because the number of documents scanned exactly matches the number returned. Of course, given that we’re returning the entire collection, it would be difficult to do otherwise.

"millis" : 0
The number of milliseconds it took the database to execute the query. 0 is a good time to shoot for.

 

MongoDB query optimizer and parallel query plan execution model

MongoDB has a query optimizer and is very clever about choosing which index to use. When you first do a query, the query optimizer tries out a number of query plans concurrently. The first one to finish will be used, and the rest of the query executions are terminated. That query plan will be remembered for future queries on the same keys. The query optimizer periodically retries other plans, in case you’ve added new data and the previously chosen plan is no longer best. The only part you should need to worry about is giving the query optimizer useful indexes to choose from.

 

 

Aggregation

 

Counting the total number of documents in a collection is fast regardless of collection size.

 

 

Advanced Topics

 

Capped collection use cases and benefits

First, inserts into a capped collection are extremely fast. When doing an insert, there is never a need to allocate additional space, and the server never needs to search through a free list to find the right place to put a document. The inserted document can always be placed directly at the “tail” of the collection, overwriting old documents if needed. By default, there are also no indexes to update on an insert, so an insert is essentially a single memcpy.

Another interesting property of capped collections is that queries retrieving documents in insertion order are very fast. Because documents are always stored in insertion order, queries for documents in that order just walk over the collection, returning documents in the exact order that they appear on disk. By default, any find performed on a capped collection will always return results in insertion order.
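A bounded deque in python behaves in a similar way and makes the idea easy to play with. This is only an analogy I find helpful: ToyCappedCollection is a made-up class, and it is capped by document count for simplicity.

```python
from collections import deque

class ToyCappedCollection:
    """Fixed-size, insertion-ordered store; the oldest entries are overwritten."""

    def __init__(self, max_documents):
        self._docs = deque(maxlen=max_documents)

    def insert(self, doc):
        # Always appended at the tail; the deque drops the head when it is full.
        self._docs.append(doc)

    def find(self):
        # Insertion order, oldest first.
        return list(self._docs)

if __name__ == "__main__":
    log = ToyCappedCollection(3)
    for i in range(5):
        log.insert({"i": i})
    print(log.find())
```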

 

When to use DBRefs

In short, the best times to use DBRefs are when you’re storing heterogeneous references to documents in different collections, like in the previous example or when you want to take advantage of some additional DBRef-specific functionality in a driver or tool. Otherwise, it’s generally best to just store an "_id" and use that as a reference, because that representation tends to be more compact and easier to work with.

 

 

Administration

 

Backing up from Slave is recommended

Backing up from a slave is the recommended way to handle data backups with MongoDB. 

 

What happens when “repairing a database”

The underlying process of repairing a database is actually pretty easy to understand: all of the documents in the database are exported and then immediately imported, ignoring any that are invalid. After that is complete, all indexes are rebuilt. Understanding this mechanism explains some of the properties of repair. It can take a long time for large data sets, because all of the data is validated and all indexes are rebuilt. Repairing can also leave a database with fewer documents than it had before the corruption originally occurred, because any corrupt documents are simply ignored.

Repairing a database will also perform a compaction. Any extra free space (which might exist after dropping large collections or removing large number of documents, for example) will be reclaimed after a repair.

 

 

 

Replication

 

Replica Sets vs Master Slave

A replica set is basically a master-slave cluster with automatic failover. The biggest difference between a master-slave cluster and a replica set is that a replica set does not have a single master: one is elected by the cluster and may change to another node if the current master goes down. However, they look very similar: a replica set always has a single master node (called a primary) and one or more slaves (called secondaries).

The nice thing about replica sets is how automatic everything is. First, the set itself does a lot of the administration for you, promoting slaves automatically and making sure you won’t run into inconsistencies.

 

Read Scaling 

Scaling out reads with slaves is easy: just set up master-slave replication like usual, and make connections directly to the slave servers to handle queries. The only trick is that there is a special query option to tell a slave server that it is allowed to handle a query. (By default, queries will not be executed on a slave.) This option is called slaveOkay, and all MongoDB drivers provide a mechanism for setting it. Some drivers also provide facilities to automate the process of distributing queries to slaves; this varies on a per-driver basis, however.

 

Using Slaves for Data Processing 

Another interesting technique is to use slaves as a mechanism for offloading intensive processing or aggregation to avoid degrading performance on the master. To do this, start a normal slave, but with the addition of the --master command-line argument. Starting with both --slave and --master may seem like a bit of a paradox. What it means, however, is that you’ll be able to write to the slave, query on it like usual, and basically treat it like you would a normal MongoDB master node. In addition, the slave will continue to replicate data from the actual master. This way, you can perform blocking operations on the slave without ever affecting the performance of the master node.

When using this technique, you should be sure never to write to any database on the slave that is being replicated from the master. The slave will not revert any such writes in order to properly mirror the master. The slave should also not have any of the databases that are being replicated when it first starts up. If it does, those databases will not ever be fully synced but will just update with new operations.

 

How to tell how many files the apache processes have opened?

It's obvious that using sudo lsof -p <pid>, one can tell how many files the specified process has opened. However, since we have so many apache2 processes running, how can we see how many files all of them have opened? Here is my take:

pgrep apache2 | xargs -n1 -I % sudo lsof -p % | wc -l

Please refer to the comments of how to use xargs to clean up all lucene indexes.

How to change python log level without restarting the process

Let's say we have 6 crawler servers, each with a few crawler.py processes running on them. I noticed that the crawlers on 'clifton' are really slow. It would be really nice if I could change the log level for one of those processes to DEBUG without affecting the others and without restarting the process. Does anyone know how to do it?
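One possible approach, offered as a sketch rather than something we have settled on, is to have each crawler install a signal handler that flips the root logger's level. The function names below are made up, SIGUSR1 is just my choice of signal, and this only works on Unix-like systems.

```python
import logging
import signal

def toggle_debug(signum=None, frame=None):
    """Flip the root logger between INFO and DEBUG."""
    root = logging.getLogger()
    new_level = logging.INFO if root.level == logging.DEBUG else logging.DEBUG
    root.setLevel(new_level)
    root.warning("log level is now %s", logging.getLevelName(new_level))

def install_handler():
    # SIGUSR1 is available on Unix-like systems only.
    signal.signal(signal.SIGUSR1, toggle_debug)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    install_handler()
    # ... the crawler's main loop would run here ...
```

With something like this in place, running kill -USR1 against the pid of the slow crawler.py would switch only that process to DEBUG, and a second signal would switch it back.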

Fix all lucene indexes in one directory

find /var/lib/lucene/index -type d \
  -exec java -cp lucene-core-2.3.1.jar \
  org.apache.lucene.index.CheckIndex {} -fix \;
  • -type d lists all directories in specified path /var/lib/lucene/index/
  • java -cp lucene-core-2.3.1.jar org.apache.lucene.index.CheckIndex {path} -fix will automatically check and fix index problems for specified {path}

Rails 3 how to create a compound multi-column named index

There are some black-magic conventions in how rails 3 deals with index names in add_index and remove_index. After experimenting with db:migrate and db:rollback, it looks like I have to explicitly specify the index name in the format index_{table_name}_on_{key} in self.up and use only {key} in self.down.

class AddIndexToAudienceDetail < ActiveRecord::Migration
  def self.up
    add_index :audience_details, [:twitter_accounts_id, :week], 
      { :name => 'index_audience_details_on_weekly', :unique => true }
  end 

  def self.down
    remove_index :audience_details, 'weekly'
  end 
end

Prefer TNonblockingServer to TThreadPoolServer or TThreadedServer

I noticed that there are a few options for which Thrift server we can use in the c++ code, specifically TNonblockingServer, TThreadedServer, and TThreadPoolServer. It seems like TNonblockingServer is the way to go, since it can support many more concurrent requests while still using a thread pool behind the scenes to crunch through the tasks. It also avoids the cost of constructing/destructing the threads.

Here are a couple of links I’ve collected here:

  1. Facebook’s update on thrift 1:

    Here at Facebook, we’re working on a fully asynchronous client and server for C++. This server uses event-driven I/O like the current TNonblockingServer, but its interface to the application code is all based on asynchronous callbacks. This will allow us to write servers that can service thousands of simultaneous requests (each of which requires making calls to other Thrift or Memcache servers) with only a few threads.

  2. Related posts on stackoverflow: 2, 3

    That being said, you won’t necessarily be able to actually do work faster (handlers still execute in a thread pool), but more clients will be able to connect to you at once.

In order to use the non-blocking server, we’ll have to link against not only the libthrift.so, but also libthriftnb.so and libevent.so, otherwise, you’ll observe these error messages:

/usr/local/include/thrift/server/TNonblockingServer.h:207: undefined reference to `vtable for apache::thrift::server::TNonblockingServer'
/usr/local/include/thrift/server/TNonblockingServer.h:212: undefined reference to `apache::thrift::server::TNonblockingServer::setThreadManager(boost::shared_ptr)'
hyperserver.o: In function `~TNonblockingServer':
/usr/local/include/thrift/server/TNonblockingServer.h:246: undefined reference to `vtable for apache::thrift::server::TNonblockingServer'
collect2: ld returned 1 exit status
make: *** [server] Error 1

/usr/local/lib/libthriftnb.so: undefined reference to `event_base_loop'
/usr/local/lib/libthriftnb.so: undefined reference to `event_del'
/usr/local/lib/libthriftnb.so: undefined reference to `event_get_method'
/usr/local/lib/libthriftnb.so: undefined reference to `event_add'
/usr/local/lib/libthriftnb.so: undefined reference to `event_init'
/usr/local/lib/libthriftnb.so: undefined reference to `event_get_version'
/usr/local/lib/libthriftnb.so: undefined reference to `event_base_set'
/usr/local/lib/libthriftnb.so: undefined reference to `event_set'
collect2: ld returned 1 exit status
make: *** [server] Error 1

Set operation shortcut: vector

In STL/boost, as in most general purpose programming libraries, set is implemented using a tree structure such as a red-black tree, a binary search tree or even a treap. But if we know that the underlying data is already ordered and unique, we can use vector instead to dramatically speed up both construction and set operations. Consider the following two programs: the vector version is about 8 times faster than the set version.

$ cat sets.cc
#include <algorithm>
#include <iterator>
#include <set>
using namespace std;

int main() {
    set<int> l1, l2, lr;
    for (int i = 0; i <= 20000; i++) {
        l1.insert(i);
    }
    for (int i = 17800; i <= 40000; i++) {
        l2.insert(i);
    }

    // set_intersection requires both input ranges to be sorted.
    set_intersection(l1.begin(), l1.end(),
                     l2.begin(), l2.end(),
                     inserter(lr, lr.begin()));

    return 0;
}

$ g++ sets.cc -o sets; time ./sets
./sets 0.02s user 0.00s system 21% cpu 0.095 total

$ cat vectors.cc
#include <algorithm>
#include <iterator>
#include <vector>
using namespace std;

int main() {
    vector<int> l1, l2, lr;
    for (int i = 0; i <= 20000; i++) {
        l1.push_back(i);
    }
    for (int i = 17800; i <= 40000; i++) {
        l2.push_back(i);
    }

    // The vectors are filled in ascending order, so they are already sorted.
    set_intersection(l1.begin(), l1.end(),
                     l2.begin(), l2.end(),
                     back_inserter(lr));

    return 0;
}

$ g++ vectors.cc -o vectors; time ./vectors
./vectors 0.00s user 0.00s system 32% cpu 0.012 total
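The reason sorted, unique input is enough is that intersection can then be done in a single linear pass over both sequences. Here is that pass written out in python as an illustration; intersect_sorted is my own helper and says nothing about how a particular STL implements set_intersection.

```python
def intersect_sorted(a, b):
    """Intersect two sorted sequences of unique values in one linear pass."""
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            i += 1                 # a[i] cannot appear in b any more
        elif b[j] < a[i]:
            j += 1                 # b[j] cannot appear in a any more
        else:
            result.append(a[i])    # present in both inputs
            i += 1
            j += 1
    return result

if __name__ == "__main__":
    l1 = list(range(0, 20001))
    l2 = list(range(17800, 40001))
    print(len(intersect_sorted(l1, l2)))
```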

Using hiredis c library in c++

antirez wrote a C-based high-level redis client called hiredis. Here is the list of changes needed to use it in a c++ program.

First, patch the hiredis.h file to add extern "C" {} guards. Here is the diff; if you’d like to learn more about what this all means, check out this link.

% git diff hiredis.h
diff --git a/hiredis.h b/hiredis.h
index f28dcdb..0c1df87 100644
--- a/hiredis.h
+++ b/hiredis.h
@@ -30,6 +30,10 @@
 #ifndef __HIREDIS_H
 #define __HIREDIS_H

+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define REDIS_REPLY_ERROR 0
 #define REDIS_REPLY_STRING 1
 #define REDIS_REPLY_ARRAY 2
@@ -51,4 +55,9 @@ redisReply *redisConnect(int *fd, const char *ip, int port);
 void freeReplyObject(redisReply *r);
 redisReply *redisCommand(int fd, const char *format, ...);

+#ifdef __cplusplus
+}
+#endif
+
+
 #endif

Second, let’s build the files and copy them to the corresponding directory

make
sudo cp hiredis.h /usr/include/
sudo cp sds.h /usr/include/
sudo cp libhiredis.so /usr/lib/

Then, in your c++ program, you should be able to do things like:

#include <iostream>
#include <hiredis.h>

using namespace std;

int main() {
    int fd;
    redisReply *reply;
    // redisConnect returns NULL on success and an error reply on failure.
    reply = redisConnect(&fd, "127.0.0.1", 6379);
    if (reply != NULL) {
        cout << "Connection error: " << reply->reply << endl;
        freeReplyObject(reply);
        return 1;
    } else {
        cout << "You're all set" << endl;
    }
    return 0;
}

To compile the result, use g++ main.cc -o main -lhiredis.

About

A Programming Artist believes in Minimalism. CTO of http://trunk.ly/. Proud owner of vim, zsh, and wikiReader. A man without a mobile phone.

http://alexdong.com/
http://twitter.com/alexdong/
http://trunk.ly/alexdong/
