Apr 21, 2008

Limits of Django or limits of db back-end?

Fiam has posted a slightly controversial piece titled The Limits of Django. It is a great read, I totally enjoyed it. However, he has come up with some very shocking stats of Django ORM approach vs. C caching daemon (93 seconds vs. 0.8 second to query a database of 500,000 votes of 1,000 users on 5,000 stories. A few have suggested that his database/data model maybe to blame, I think there's a great degree of truth to that. To try it out, I have created a sample database called django_test and populated it with a bunch of data like so:

-- users
create table users (id integer not null primary key);
-- stories
create table stories (id integer not null primary key);
-- votes
CREATE TABLE votes (
  story_id integer not null
     constraint story_id_fk references stories,
  user_id integer not null
     constraint user_id_fk references users,
  score integer default 0);
create index user_story_idx on votes (story_id, user_id);

Populating with some random data:

#!/bin/bash
PSQL="/opt/local/bin/psql8 -q"
for ((i=1;i<=50000;i+=1))
do
    $PSQL -c "insert into stories (id) values ($i);" django_test
done
for ((i=9922;i<=10000;i+=1))
do
    $PSQL -c "insert into stories (id) values ($i);" django_test
done
for ((i=1;i<=500000;i+=1))
do
    score=$RANDOM
    userid=$RANDOM
    storyid=$RANDOM
    let "score %= 100"
    let "userid %= 1000"
    let "storyid %= 10000"
    $PSQL -c "insert into votes (user_id,story_id,score) values ($userid,$storyid,$score)" django_test
done

(Yes, this is very-very crude, I know)

Here are my results on PostgreSQL 8.1.3, running on a G4 PowerBook with 2GB of RAM under Mac OS X 10.5.2:

django_test=# select count(*) from users; select count (*) from stories; select count(*) from votes;
count
-------
 1000
(1 row)

count
-------
10000
(1 row)

count
-------
 34625
(1 row)

I gave up waiting for population of 500K rows into votes table and run a few selects like the following one as it were (inserts were still running on background and I have not done VACUUM ANALYSE yet):

django_test=# \timing
Timing is on.
django_test=# select distinct(story_id), score from votes where votes.user_id=123 order by votes.score desc;
 story_id | score 
----------+-------
     3391 |    95
      812 |    94
     9215 |    93
     9599 |    92
     8510 |    85
      220 |    84
     8611 |    81
      740 |    78
     9193 |    74
     2406 |    70
     2201 |    61
     2740 |    59
     4500 |    52
     5993 |    52
     5692 |    51
     2912 |    49
     6566 |    49
     2950 |    48
     3149 |    47
       36 |    45
     1193 |    43
      629 |    38
     5720 |    36
     9429 |    35
      455 |    33
     6628 |    32
     2598 |    30
     2623 |    28
     2224 |    24
     6794 |    24
     2254 |    20
      934 |     8
     6267 |     8
     3999 |     7
     5879 |     3
(35 rows)

Time: 27.376 ms

That's less than 0.03 seconds. Not bad, eh?

No comments:

Post a Comment