How to select a random row from a table in MySQL?

$0
=
$3
+
$0
No specific Bitcoin Bounty has been announced by author. Still, anyone could send Bitcoin Tips to those who provide a good answer.
0

Let's say I have a table with an integer column called ID being a primary key. It is unique and not-null, but we cannot guarantee it being sequential without gaps. I.e. we may have rows with ID = 1,2,6,7,8 and we do not have rows with ID 3,4,5 etc etc. This is why we cannot just generate a random number and get a row with that corresponding ID.

Also we want all rows to have equal chance of being selected, so b/c of the gaps in ID values we cannot use simple approach with random number in 0 to max(id) range

The number of rows in the table is not known either.

Here is the table schema:

CREATE TABLE IF NOT EXISTS `test_data` (
  `id` int(10) unsigned NOT NULL,
  `create_date` datetime DEFAULT NULL,
  `text_1` varchar(255) NOT NULL DEFAULT 'BMqFXUslYnGsYsPxHTtZVbcwnEWFmSXxTAUV9YxXXDH5ClUEUO8kFz0cW1xC3o9aMSwabnEr43W23KZnKvrk8PHEJv18SU5JHTH72sLTtleitBJBIWmIpul7LtuYOpc4iRDqEAT80UeG7L2l4r1pr2jEMW7222reAOuIcBIUcsH9LYlojeQjVkc9ZhYXgnN3xRGHLJ3L0MGoXO4GHttEv053DqkkKYEye34bpGI2tJ0IE9M8BIFf2u08jB50nhD',
  `text_2` varchar(255) NOT NULL DEFAULT 'hoA6tWi8AEcikkJM50Mz800PGTUKNnyj3OCKhyJ4ExaJf6bYbqXlNWo4y0XXXo7HuvsNgYWnn16211RbKDesQ852QA33s1eni4pBoraEs3YiV0W69yMY7Nf0pvQI198HUVKYPWk9zpK38PDphtPJXO2z5Wb8mbBN0gN8iK5xzUQQDwoAJlO3Z8xXn2OWyVjKswRbZNKW6l0tvn0zN4S4BoR9gkN7s4Ov9tTGeF4uwWYhPEs0WsDqatMjmbnMQmC'
) ENGINE=InnoDB;

ALTER TABLE `test_data`
 ADD PRIMARY KEY (`id`), ADD UNIQUE KEY `id` (`id`);

How can I select a random row from that MySQL table using just SQL syntax? I.e. without using PHP code? Thanks!

Tags: ,

1 Answer - total earned - 12.5 mBTC ($3 USD)

1
=
1
=
$3
1 tip with total amount of 12.5 mBTC($3 USD) have been sent by alex

I've populated your test table with 20,000 rows. Then I have tested (and benchmarked for performance) 3 different ways to get your random row. The approach #2 was the fastest one:

1. Simple Approach - 0.09 sec execution time

SELECT * FROM `test_data`
ORDER BY RAND()
LIMIT 1

Execution plan:

Executing plan

As you may see, it requires full table scan, i.e. reading all 20,000 rows, calculating 20,000 random numbers, then SORTING 20,000 rows (by RAND), then fetching the first row after sorting. Also, the sorting happens not in memory but using I/), i.e. "using filesort".

So I decided to try making it a bit more efficient - see below ...

2. Avoiding Table Scan - 0.07 sec execution time (fastest)

Instead of the full table scan we just go through full index scan to get the random id, then fetch the complete record once. The execution time reduced from 0.09 to 0.07 sec.

SELECT t2.* FROM  
(SELECT id FROM `test_data`
ORDER BY RAND()
LIMIT 1) AS t1
INNER JOIN test_data as t2 ON t1.id = t2.id

3. Procedure - fetching loop with random number of fetches - 0.30 sec (slowest execution)

The idea was to get random row by performing random number of FETCHES from the unsorted cursor in the loop. The last fetched ID would be the ID of resulting random record. In theory, benefits of this approach would include:

  • no sorting needed at all
  • average scan depth should be N/2, not N like in the full scans above
  • random number needs to be calculated only once, not 20,000 times (this is minor, but still...)

here is the code

CREATE PROCEDURE test()
BEGIN
  DECLARE done INT DEFAULT FALSE;
  DECLARE current_id INT;
  DECLARE row_counter INT;
  DECLARE random_row_number INT;
  DECLARE cur1 CURSOR FOR SELECT id FROM test_data;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET @done = TRUE;

  SELECT @row_counter := COUNT(*) FROM test_data;

  SET @random_row_number = floor(1 + (rand() * @row_counter));

  SET @row_counter = 1;                                                   

  OPEN cur1;                                                   

  read_loop: LOOP
    FETCH cur1 INTO current_id;

    IF @done THEN
      LEAVE read_loop;
    END IF;

    IF @row_counter >= @random_row_number THEN
      LEAVE read_loop;
    END IF;

    SET @row_counter = @row_counter + 1;                                       
  END LOOP;

  CLOSE cur1;

  SELECT * FROM test_data WHERE id = current_id; 
 END

unfortunately, this procedural approach was good in theory but turned to be the slowest in practice. Most likely b/c MySQL cursor operations slowed things down. If anyone has any other idea why this approach with random number of fetches was so slow, please comment!

Thanks!

SEND BITCOIN TIPS

Thank you!

0

Post Answer


Top 3 Tippers
Recent Tips

Will Bitcoin Tipping become a new Internet Culture?


Please share our story! THANK YOU!