Optimizing ORDER BY changed DESC LIMIT 5

the five latest nodes satisfying some condition. MySQL (and I believe any RDBMS) will pick the whole result set, sort it, and then limit it. On NowPublic homepage this meant tens, sometimes hundreds of thousands nodes to sort. We have a very nice box to serve database queries, but not this nice. So we implemented a cutoff -- surely anything older than some time is not front page worthy. Now it just sorts the nodes that gets submitted in a short timespan: SELECT ... WHERE changed > $cutoff_time

Next problem was cutting off nodes belonging to certain terms. This is tricky. Very tricky. Let's say we wanted precalculate the timestamp when the tenth oldest node belonging to a certain tid was changed. You would think executing INSERT INTO term_cutoff SELECT tid, MIN(changed) FROM node INNER JOIN term_node t USING(nid) WHERE changed IN (SELECT n1.changed FROM node n1 INNER JOIN term_node tn1 USING(nid) WHERE tn1.tid = t.tid ORDER BY changed DESC LIMIT 10) from a cronscript would do the trick. After all, the subquery SELECTs the ten newest and the outer SELECTs the oldest of these. MySQL will tell you that ERROR 1235 (42000): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery. End of story. This can't be circumvented. So, should I go to my boss and tell him, sorry, the monster box just does not cut the mustard? There must be a way.

And there is. Let's retrieve to PHP every distinct tid from term_node, grab the ten latest and store the results into a table. Nice lil' cronscript would this be and surely would finish quick: there are ~200K distinct tids in term_node and on average there are more than 10 nodes belonging to each, so we are talking of loading two million records into PHP (Note: the problem is that some terms have tens of thousands of nodes and sorting these take a long time). And this won't get any better as the site becomes more and more successful. Still, we do not need to run it too often as it takes quite a long while before the cutoff time becomes so obsolete that the performance benefit from cuting off the query becomes neglible. So, this is much better than getting myself fired, but still not something I like.

Time to call in heavy cavalry:

DROP PROCEDURE IF EXISTS calculate_term_cutoff;
DELIMITER //
CREATE PROCEDURE calculate_term_cutoff()
BEGIN
  DECLARE done INT DEFAULT 0;
  DECLARE t INT;
  DECLARE cur1 CURSOR FOR SELECT DISTINCT(tid) FROM term_node;
  DECLARE CONTINUE HANDLER FOR SQLSTATE '02000' SET done = 1;

  OPEN cur1;
  TRUNCATE term_cutoff1;

  REPEAT
    FETCH cur1 INTO t;
    IF NOT done THEN
      INSERT INTO term_cutoff1 (tid, timestamp) SELECT t1.tid, n1.changed FROM node n1 INNER JOIN term_node t1 USING(nid) where t1.tid = t ORDER BY changed DESC LIMIT 10;
    END IF;
  UNTIL done END REPEAT;
  CLOSE cur1;
  CREATE TABLE term_cutoff2 AS SELECT tid, MIN(timestamp) FROM term_cutoff1 GROUP BY tid;
  SET @old_count = (SELECT COUNT(*) FROM term_cutoff);
  SET @new_count = (SELECT COUNT(*) FROM term_cutoff2);
  IF @new_count > @old_count THEN
    RENAME TABLE term_cutoff TO term_cutoff_old, term_cutoff2 TO term_cutoff;
    ALTER TABLE term_cutoff ADD PRIMARY KEY(tid);
    DROP TABLE term_cutoff_old;
  END IF;

END//
DELIMITER ;

I wanted to iterate over it in MySQL. Iterate? They call that cursors in SQL. The cursor skeleton is copied from the MySQL manual. We take special care that if something goes kerplunk then we do not put in a bogus term_cutoff table in place -- we could lose some serious performance benefit. The only other interesting is the double RENAME TABLE -- it's atomic, so the swap happens instantly.

To use the table, I have

$cutoff = db_result(db_query('SELECT cutoff FROM term_cutoff WHERE tid = %d', $tid));
$result = db_query_range("SELECT .... WHERE tid = %d AND n.changed > %d ORDER BY changed DESC", $tid, $cutoff, 0, 5);

This little trick made the home page load on our dev server in about 900-1000ms instead 1300-1400ms...

Commenting on this Story is closed.

DESC LIMIT 5 problem

Submitted by Anonymous on Tue, 2009-07-28 04:19.

Hello,
I have a problem with the next query 642-655 dumps, it supposed to give 5 row's back but i gives just 2 row's back.The temptable with id=5 (NS0-501 dumps) should give 15 row's back if there were no conditions. could somebody tell me what to do about this.

$query6 =" SELECT * FROM temptableWHERE id=5 ORDER BY timestamp DESC LIMIT
0,5";

Thanks in advance

Geovani PK0-002 dumps

Custom delimeter.

Submitted by Anonymous on Tue, 2009-09-15 14:48.

check range delimeter.

------
JJm
Custom Buy Essay online - Buy Term papers online - Buy Report Custom writing