LevelDB源碼分析之七:Random
一.原理:
C語(yǔ)言中偽隨機(jī)數(shù)生成算法實(shí)際上是采用了"線性同余法"。具體的計(jì)算如下:?
seed = (seed * A + C ) % M
其中A,C,M都是常數(shù)(一般會(huì)取質(zhì)數(shù))。當(dāng)C=0時(shí),叫做乘同余法。
假設(shè)我們定義隨機(jī)數(shù)函數(shù):
void?rand(int?&seed) { ????seed?=?(seed?*?A?+?C?)?%?M; }
每次調(diào)用rand函數(shù)都會(huì)產(chǎn)生一個(gè)隨機(jī)值賦值給seed,可以看出實(shí)際上用rand函數(shù)生成的是一個(gè)遞推的序列,一切值都來(lái)源于最初的seed。所以當(dāng)初始的seed取一樣的時(shí)候,得到的序列都相同。
我們稱seed為種子,一個(gè)偽隨機(jī)數(shù)常用的原則就是M盡可能的大。例如,對(duì)于32位的機(jī)器來(lái)說(shuō),選擇M=2^31-1=2147483647, A=7^5=16807時(shí)可以取得最佳效果。
二.代碼實(shí)現(xiàn):
現(xiàn)在我們來(lái)看看levelDB里隨機(jī)數(shù)Random類是如何實(shí)現(xiàn)的:
在Random類中,A為16807,M為2147483647,C為0;
#ifndef?STORAGE_LEVELDB_UTIL_RANDOM_H_ #define?STORAGE_LEVELDB_UTIL_RANDOM_H_ #includenamespace?leveldb?{ ???? ????//?A?very?simple?random?number?generator.??Not?especially?good?at ????//?generating?truly?random?bits,?but?good?enough?for?our?needs?in?this ????//?package. ???? ????class?Random ????{ ????private: ????????uint32_t?seed_; ????public: ????????//?0x7fffffffu?==?2147483647L?==?2^31-1?==?01111111?11111111?11111111?11111111 ????????//?表達(dá)式s?&?0x7fffffffu,確保結(jié)果值在[0,2147483647]范圍內(nèi) ????????explicit?Random(uint32_t?s)?:?seed_(s?&?0x7fffffffu)?? ????????{ ????????????//?Avoid?bad?seeds. ????????????//?seed_不能為零或M,否則所有的后續(xù)計(jì)算的值將為零或M ????????????if?(seed_?==?0?||?seed_?==?2147483647L) ????????????{ ????????????????seed_?=?1; ????????????} ????????} ????????//?16807隨機(jī)數(shù) ????????uint32_t?Next() ????????{? ????????????//01111111?11111111?11111111?11111111 ????????????static?const?uint32_t?M?=?2147483647L;???//?2^31-1 ????????????//0100?0001?1010?0111 ????????????static?const?uint64_t?A?=?16807;??//?bits?14,?8,?7,?5,?2,?1,?0 ????????????//?We?are?computing ????????????//???????seed_?=?(seed_?*?A)?%?M,????where?M?=?2^31-1 ????????????// ????????????//?seed_?must?not?be?zero?or?M,?or?else?all?subsequent?computed?values ????????????//?will?be?zero?or?M?respectively.??For?all?other?values,?seed_?will?end ????????????//?up?cycling?through?every?number?in?[1,M-1] ????????????//?這里將seed_*A設(shè)置為隨機(jī)數(shù)生成器product,注意到product是64位的。 ????????????//?那么seed_=product?%?M就相當(dāng)于得到大小在[1,M-1]之間的隨機(jī)數(shù)。 ????????????uint64_t?product?=?seed_?*?A; ????????????//?Compute?(product?%?M)?using?the?fact?that?((x?<<?31)?%?M)?==?x. ????????????//?對(duì)于product?%?M,使用(product?>>?31)?+?(product?&?M)進(jìn)行運(yùn)算優(yōu)化, ????????????//?考慮到右移和與操作的代價(jià)遠(yuǎn)小于取余操作。 ????????????//?下面等式用到了((x?<<?31)?%?M)?==?x的技巧(等式的證明見第三節(jié)) ????????????//?product%M?==?static_cast((product?>>?31)?+?(product?&?M))的證明見第四節(jié) ????????????seed_?=?static_cast((product?>>?31)?+?(product?&?M));? ????????????//?The?first?reduction?may?overflow?by?1?bit,?so?we?may?need?to ????????????//?repeat.??mod?==?M?is?not?possible;?using?>?allows?the?faster ????????????//?sign-bit-based?test. ????????????if?(seed_?>?M) ????????????{ ????????????????seed_?-=?M; ????????????} ????? ????????????return?seed_; ????????} ????????//?Returns?a?uniformly?distributed?value?in?the?range?[0..n-1] ????????//?返回范圍[0..n-1]內(nèi)的均勻分布值。 ????????//?REQUIRES:?n?>?0 ????????uint32_t?Uniform(int?n)?{?return?Next()?%?n;?} ???????? ????????//?Randomly?returns?true?~"1/n"?of?the?time,?and?false?otherwise. ????????//?REQUIRES:?n?>?0 ????????bool?OneIn(int?n)?{?return?(Next()?%?n)?==?0;?} ???????? ????????//?Skewed:?pick?"base"?uniformly?from?range?[0,max_log]?and?then ????????//?return?"base"?random?bits.??The?effect?is?to?pick?a?number?in?the ????????//?range?[0,2^max_log-1]?with?exponential?bias?towards?smaller?numbers. ????????//?偏態(tài):Uniform(max_log?+?1)取值范圍是[0,max_log],1左移[0,max_log]得到 ????????//?范圍是[1,2^max_log],uniform([1,2^max_log])得到的范圍是[0,2^max_log-1] ????????uint32_t?Skewed(int?max_log) ????????{ ????????????return?Uniform(1?<<?Uniform(max_log?+?1)); ????????} ????}; ???? }??//?namespace?leveldb
三.證明等式(x<<31)%M == x成立。其中M等于2^31-1
計(jì)算表達(dá)式左邊(x << 31) % M,由于x<<31等于x*2^31,
則(x << 31) % M=(x*2^31)%M=(x + x*(2^31-1))%M=(x + x*M)%M=x
四.證明等式(product%M) == (product>>31)+(product&M),其中M等于2^31-1
因?yàn)閜roduct類型是uint64_t,可以將product從左到右分解成高33位和低31位,如下:
? ? ? ?高33位 ? ? ? ? ? ? ? ? ? ? ?低31
(product>>31)<<31+product&M
(product>>31)<