洗牌问题

Leetcode上最近出了一道新题，是这样的：

Shuffle a set of numbers without duplicates.

Example:

// Init an array with set 1, 2, and 3. int[] nums = {1,2,3}; Solution solution = new Solution(nums);

// Shuffle the array [1,2,3] and return its result. Any permutation of [1,2,3] must equally likely to be returned. solution.shuffle();

// Resets the array back to its original configuration [1,2,3]. solution.reset();

// Returns the random shuffling of array [1,2,3]. solution.shuffle();

这是一道关于洗牌的问题，看到这个问题我就想到了之前的Reservoir Sampling，也是关于随机选择数据的问题。那么这两者是否有关系呢？首先要说明的是，Reservoir Sampling中的Algorithm R可以看作下面要介绍的Fisher-Yates洗牌算法的一种特殊情况。

Fisher-Yates洗牌算法

Fisher-Yates洗牌算法，故名思意，就是用来将有限的序列来生成随机排列的序列。Fisher-Yates洗牌算法是由Ronald Fisher和Frank Yates首先提出的，算法描述如下(使用纸和笔，并用标有数字的纸堆产生随机数)：

列出数字1到N
产生一个随机数k，其范围在1到剩下未划掉的数字个数之间
从未段开始选择没有被划掉的第k个数字并写到另一个集合的末段，同时划掉该数字
重复步骤2-3只到所有数字都被划掉
步骤3得到的集合就是原始集合的一个随机序列

上面算法很好证明，对于第i个数来说：

之前不被选中的概率是P1 = (N-1)/N * (N-2)/(N-1) * (N-i+1)/(N-i+2)
而本次被选中的概率是P2 = 1/(N-i+1)
所以第i个数字被选中的概率为P1 * P2 = 1/N，即对于所有集合中元素来说其概率都是1/N。

Fisher-Yates洗牌算法实现

上面的Fisher-Yates洗牌算法原始实现的空间复杂度是O(2N)，对于较大的集合来说不是很友好，下面实现采用in-place的Fisher-Yates*洗牌算法。代码如下：

class Solution {
public:
    Solution(vector<int> nums) {
        val = nums;
        data = nums;
        srand((int)(time(NULL)));
    }
    
    /** Resets the array to its original configuration and return it. */
    vector<int> reset() {
        data = val;
        return data;
    }
    
    /** Returns a random shuffling of the array. */
    vector<int> shuffle() {
        for(int i = data.size()-1; i > 0; --i) {
            int j = rand()%(i+1);  // key point, generate random in 0 - i
            swap(data[i], data[j]);
        }
        return data;
    }
private:
    vector<int> data, val;
};

Sattolo’s algorithm

Sattolo’s algorithm是上面Fisher-Yates洗牌算法的一个变种, 不同于Fisher-Yates洗牌算法能够产生n! 个序列， Sattolo’s algorithm只能产生(n-1)! 中序列。

/** Returns a random shuffling of the array. */
vector<int> shuffle() {
    for(int i = data.size()-1; i > 0; --i) {
        int j = rand()%i;  // key point, different from Fisher-Yates shufff algorithm
        swap(data[i], data[j]);
    }
    return data;
}

参考

志飞 / 2016-08-15
Published under (CC) BY-NC-SA