Python 的 heapq 模塊源碼分析

CoderBear 發(fā)布于2019-07-30 18:43 / 3122人閱讀

摘要：原文鏈接起步模塊實(shí)現(xiàn)了適用于列表的最小堆排序算法。本文內(nèi)容將分為三個(gè)部分，第一個(gè)部分簡(jiǎn)單介紹模塊的使用第二部分回顧堆排序算法第三部分分析中的實(shí)現(xiàn)。總結(jié)堆排序結(jié)合圖來(lái)理解還是比較好理解的。

原文鏈接：https://www.hongweipeng.com/i...

起步

heapq 模塊實(shí)現(xiàn)了適用于Python列表的最小堆排序算法。

堆是一個(gè)樹(shù)狀的數(shù)據(jù)結(jié)構(gòu)，其中的子節(jié)點(diǎn)都與父母排序順序關(guān)系。因?yàn)槎雅判蛑械臉?shù)是滿二叉樹(shù)，因此可以用列表來(lái)表示樹(shù)的結(jié)構(gòu)，使得元素 N 的子元素位于 2N + 1 和 2N + 2 的位置（對(duì)于從零開(kāi)始的索引）。

本文內(nèi)容將分為三個(gè)部分，第一個(gè)部分簡(jiǎn)單介紹 heapq 模塊的使用；第二部分回顧堆排序算法；第三部分分析heapq中的實(shí)現(xiàn)。

heapq 的使用

創(chuàng)建堆有兩個(gè)基本的方法：heappush() 和 heapify()，取出堆頂元素用 heappop()。

heappush() 是用來(lái)向已有的堆中添加元素，一般從空列表開(kāi)始構(gòu)建：

import heapq

data = [97, 38, 27, 50, 76, 65, 49, 13]
heap = []

for n in data:
    heapq.heappush(heap, n)

print("pop:", heapq.heappop(heap)) # pop: 13
print(heap) # [27, 50, 38, 97, 76, 65, 49]

如果數(shù)據(jù)已經(jīng)在列表中，則使用 heapify() 進(jìn)行重排：

import heapq

data = [97, 38, 27, 50, 76, 65, 49, 13]

heapq.heapify(data)

print("pop:", heapq.heappop(data)) # pop: 13
print(data) # [27, 38, 49, 50, 76, 65, 97]

回顧堆排序算法

堆排序算法基本思想是：將無(wú)序序列建成一個(gè)堆，得到關(guān)鍵字最?。ɑ蜃畲蟮挠涗?；輸出堆頂?shù)淖钚?（大）值后，使剩余的 n-1 個(gè)元素重又建成一個(gè)堆，則可得到n個(gè)元素的次小值；重復(fù)執(zhí)行，得到一個(gè)有序序列，這個(gè)就是堆排序的過(guò)程。

堆排序需要解決兩個(gè)問(wèn)題：

如何由一個(gè)無(wú)序序列建立成一個(gè)堆？

如何在輸出堆頂元素之后，調(diào)整剩余元素，使之成為一個(gè)新的堆？

新添加元素和，如何調(diào)整堆？

先來(lái)看看第二個(gè)問(wèn)題的解決方法。采用的方法叫“篩選”，當(dāng)輸出堆頂元素之后，就將堆中最后一個(gè)元素代替之；然后將根結(jié)點(diǎn)值與左、右子樹(shù)的根結(jié)點(diǎn)值進(jìn)行比較，并與其中小者進(jìn)行交換；重復(fù)上述操作，直至葉子結(jié)點(diǎn)，將得到新的堆，稱(chēng)這個(gè)從堆頂至葉子的調(diào)整過(guò)程為“篩選”。

如上圖所示，當(dāng)堆頂 13 輸出后，將堆中末尾的 97 替代為堆頂，然后堆頂與它的子節(jié)點(diǎn) 38 和 27 中的小者交換；元素 97 在新的位置上在和它的子節(jié)點(diǎn) 65 和 49 中的小者交換；直到元素97成為葉節(jié)點(diǎn)，就得到了新的堆。這個(gè)過(guò)程也叫下沉。

讓堆中位置為 pos 元素進(jìn)行下沉的如下：

def heapdown(heap, pos):
    endpos = len(heap)
    while pos < endpos:
        lchild = 2 * pos + 1
        rchild = 2 * pos + 2
        if lchild >= endpos: # 如果pos已經(jīng)是葉節(jié)點(diǎn)，退出循環(huán)
            break
        childpos = lchild   # 假設(shè)要交換的節(jié)點(diǎn)是左節(jié)點(diǎn)
        if rchild < endpos and heap[childpos] > heap[rchild]:
            childpos = rchild
            
        if heap[pos] < heap[childpos]: # 如果節(jié)點(diǎn)比子節(jié)點(diǎn)都小，退出循環(huán)
            break
        heap[pos], heap[childpos]  = heap[childpos], heap[pos]  # 交換
        pos = childpos

再來(lái)看看如何解決第三個(gè)問(wèn)題：新添加元素和，如何調(diào)整堆？這個(gè)的方法正好與下沉相反，首先將新元素放置列表的最后，然后新元素與其父節(jié)點(diǎn)比較，若比父節(jié)點(diǎn)小，與父節(jié)點(diǎn)交換；重復(fù)過(guò)程直到比父節(jié)點(diǎn)大或到根節(jié)點(diǎn)。這個(gè)過(guò)程使得元素從底部不斷上升，從下至上恢復(fù)堆的順序，稱(chēng)為上浮。

將位置為 pos 進(jìn)行上浮的代碼為：

def heapup(heap, startpos, pos):   # 如果是新增元素，startpos 傳入 0
    while pos > startpos:
        parentpos = (pos - 1) // 2
        if heap[pos] < heap[parentpos]:
            heap[pos], heap[parentpos] = heap[parentpos], heap[pos]
            pos = parentpos
        else:
            break

第一個(gè)問(wèn)題：如何由一個(gè)無(wú)序序列建立成一個(gè)堆？從無(wú)序序列的第 n/2 個(gè)元素（即此無(wú)序序列對(duì)應(yīng)的完全二叉樹(shù)的最后一個(gè)非終端結(jié)點(diǎn) ）起，至第一個(gè)元素止，依次進(jìn)行下沉：

for i in reversed(range(len(data) // 2)):
    heapdown(data, i)

heapq 源碼分析

添加新元素到堆中的 heappush() 函數(shù)：

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

把目標(biāo)元素放置列表最后，然后進(jìn)行上浮。盡管它命名叫 down ,但這個(gè)過(guò)程是上浮的過(guò)程，這個(gè)命名也讓我困惑，后來(lái)我才知道它是因?yàn)樵氐乃饕粩鄿p小，所以命名 down 。下沉的過(guò)程它也就命名為 up 了。

def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if newitem < parent:
            heap[pos] = parent
            pos = parentpos
            continue
        break
    heap[pos] = newitem

一樣是通過(guò) newitem 不斷與父節(jié)點(diǎn)比較。不一樣的是這里缺少了元素交換的過(guò)程，而是計(jì)算出新元素最后所在的位置 pos 并進(jìn)行的賦值。顯然這是優(yōu)化后的代碼，減少了不斷交換元素的冗余過(guò)程。

再來(lái)看看輸出堆頂元素的函數(shù) heappop():

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
        return returnitem
    return lastelt

通過(guò) heap.pop() 獲得列表中的最后一個(gè)元素，然后替換為堆頂 heap[0] = lastelt ，再進(jìn)行下沉：

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # 左節(jié)點(diǎn)，默認(rèn)替換左節(jié)點(diǎn)
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1  # 右節(jié)點(diǎn)
        if rightpos < endpos and not heap[childpos] < heap[rightpos]:
            childpos = rightpos  # 當(dāng)右節(jié)點(diǎn)比較小時(shí)，應(yīng)交換的是右節(jié)點(diǎn)
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

這邊的代碼將準(zhǔn)備要下沉的元素視為新元素 newitem ，將其當(dāng)前的位置 pos 視為空位置，由其子節(jié)點(diǎn)中的小者進(jìn)行取代，反復(fù)如此，最后會(huì)在葉節(jié)點(diǎn)留出一個(gè)位置，這個(gè)位置放入 newitem ，再讓新元素進(jìn)行上浮。

再來(lái)看看讓無(wú)序數(shù)列重排成堆的 heapify() 函數(shù)：

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    for i in reversed(range(n//2)):
        _siftup(x, i)

這部分就和理論上的一致，從最后一個(gè)非葉節(jié)點(diǎn) (n // 2) 到根節(jié)點(diǎn)為止，進(jìn)行下沉。

總結(jié)

堆排序結(jié)合圖來(lái)理解還是比較好理解的。這種數(shù)據(jù)結(jié)構(gòu)常用于優(yōu)先隊(duì)列（標(biāo)準(zhǔn)庫(kù)Queue的優(yōu)先隊(duì)列用的就是堆）。 heapq 模塊中還有很多其他 heapreplace ，heappushpop 等大體上都很類(lèi)似。

GPU云服務(wù)器云服務(wù)器 python的源碼 heapq 源碼編譯模塊 linux 模塊源碼

文章版權(quán)歸作者所有，未經(jīng)允許請(qǐng)勿轉(zhuǎn)載,若此文章存在違規(guī)行為，您可以聯(lián)系管理員刪除。

轉(zhuǎn)載請(qǐng)注明本文地址：http://m.hztianpu.com/yun/42933.html

發(fā)表評(píng)論

登陸后可評(píng)論

0條評(píng)論

CoderBear

男|高級(jí)講師

我要關(guān)注我要私信

TA的文章

ION Cloud：兩周年慶，新加坡CN2/圣何塞CN2 GIA云服務(wù)器年付8折，$336/年起

閱讀 2675·2021-08-20 09:38
Web組件簡(jiǎn)介

閱讀 1428·2019-08-30 15:43
CSS如何修改placeholder樣式

閱讀 653·2019-08-29 17:13
css+html 扁平化博客

閱讀 1670·2019-08-29 14:01
數(shù)據(jù)告訴你，胡歌的微世界

閱讀 1374·2019-08-29 13:29
LoaderX - 圖片加載插件

閱讀 2396·2019-08-23 18:29
RN 從上手到“放棄”

閱讀 2113·2019-08-23 17:51
【3】JavaScript 基礎(chǔ)深入——對(duì)象

閱讀 1993·2019-08-23 17:16

成人无码视频,亚洲精品久久久久av无码,午夜精品久久久久久毛片,亚洲中文字幕日韩无码

資訊專(zhuān)欄INFORMATION COLUMN

上云采購(gòu)季！| 2核2G4M爆款云服務(wù)器低至59元/年，更有多臺(tái)、長(zhǎng)期優(yōu)惠，快來(lái)選購(gòu)！

Python 的 heapq 模塊源碼分析

相關(guān)文章

Python基礎(chǔ)之(十)模塊

python之排序操作及heapq模塊

Python 列表推導(dǎo)及優(yōu)先級(jí)隊(duì)列的實(shí)現(xiàn)

Python每日一練0006

**4-array/heapq/queue模塊**

發(fā)表評(píng)論

0條評(píng)論

CoderBear

男|高級(jí)講師

TA的文章

ION Cloud：兩周年慶，新加坡CN2/圣何塞CN2 GIA云服務(wù)器年付8折，$336/年起

Web組件簡(jiǎn)介

CSS如何修改placeholder樣式

css+html 扁平化博客

數(shù)據(jù)告訴你，胡歌的微世界

LoaderX - 圖片加載插件

RN 從上手到“放棄”

【3】JavaScript 基礎(chǔ)深入——對(duì)象

最新活動(dòng)

資訊專(zhuān)欄INFORMATION COLUMN

上云采購(gòu)季！| 2核2G4M爆款云服務(wù)器低至59元/年，更有多臺(tái)、長(zhǎng)期優(yōu)惠，快來(lái)選購(gòu)！

Python 的 heapq 模塊源碼分析

相關(guān)文章

發(fā)表評(píng)論

0條評(píng)論

男|高級(jí)講師

TA的文章

最新活動(dòng)

上云采購(gòu)季！| 2核2G4M爆款云服務(wù)器低至59元/年，更有多臺(tái)、長(zhǎng)期優(yōu)惠，快來(lái)選購(gòu)！