## Algorithm - Ch3 貪婪演算法 Greedy Algorithm

Chapter 3  貪婪演算法 Greedy Algorithm
3.1   貪婪演算法簡介

• Activity – selection problem : find maximum independent set in interval graph
• Huffman codes
• Min spanning tree problem : Kruskal’s algorithm and Prim’s algorithm (見Graph Algorithm)
• Fractional knapsack problem

• greedy set-covering algorithm (heuristic)
• Approximate-Subset-Sum problem (Knapsack-problem)

[補充] 貪婪演算法可以獲得整體最佳解的充分必要條件是它必須具備一種稱為擬陣(matriod)的數學結構。其實應該說，貪婪演算法的正確性的來源正是擬陣。

• 一個關於有限集合 E 的擬陣是一個 (E,U) 對，U 是一個系統E的子集，滿足下列條件:
1. $\phi \in U$
2. $A \subseteq B , B \in U \Rightarrow A \in U$
3. $A ,B \in U 且\mid A \mid < \mid B \mid \Rightarrow \exists x \in B \setminus A 且 A \cup {x} \in U$
• $\mid A \mid$是集合A的基數。例如 $A=\{a,b,c\}，\mid A \mid = 3 。$

3.2   經典貪婪演算法問題

• 正式定義題目 : We are given n activities $a_1,..., a_n$ where i-th activity $a_i = [s_i , f_i)$ starts at time $s_i$ and finishes at time $f_i$. They require the same resource (for example, one lecture hall). Two activities are $a_i$ and $a_j$ are compatible if $[s_i, f_i) ∩ [s_j , f_j ) = ∅$. The activity-selection problem (ASP) is to select a maximum size subset of mutually compatible activities.
• 解法 : Greedy choice. Select an activity with minimum $f_i$. Remove it and incompatible activities. Continue until no more activities left.
• 複雜度分析 : $O(nlogn)$，花在排序上。
• 概念補充
• 把撞期的行程，表示成圖，稱作 Interval Graph ，有著很特別的數學性質。工作排程問題等價於在interval graph中找maximum independent set。
• 在general graph中找max independent set是NP-hard問題。

• 正式定義題目 : Optimal code problem. Given an alphabet C of n characters and frequency of each character, find optimal prefix code so that the compressed file has minimum length. Huffman algorithm constructs the optimal tree T. The characters are the leaves of T.
• 解法 : select two vertices x and y of lowest frequency and replace them by a vertex z so that x and y are the children of z and the frequency of z is the sum of frequencies of x and y.
• 複雜度分析 : 使用binary min-heap資料結構  (Heap介紹詳見資料結構筆記)
• 建heap花費 $O(n)$ 的時間
• 每次 $ExtractMin()$ 和 $Insert()$ 花費 $O(log n)$ 的時間，這個動作重複做n次
• 因此整個演算法花費 $O(nlogn)$ 的時間

Reference

wiki - 擬陣

wiki - interval graph

CS6363: Design and Analysis of Computer Algorithms Prof. Sergey Bereg