\documentclass[twoside]{article}
\usepackage[T1]{fontenc}
\usepackage[latin9]{inputenc}
\usepackage{amssymb, amsmath}
\usepackage{mathrsfs}
\usepackage{esint}
\oddsidemargin 0in \evensidemargin 0in \topmargin -0.5in
\headheight 0.2in \headsep 0.2in
\textwidth 6.5in \textheight 9in
\parskip 1.5ex \parindent 0ex \footskip 40pt
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% LyX specific LaTeX commands.
\newcommand{\noun}[1]{\textsc{#1}}
\begin{document}
\framebox[6.4in]{
\begin{minipage}{6.4in}
\vspace{1mm}
\center \makebox[6.2in]{{\bf CS369M: Algorithms for Modern Massive Data Set Analysis \hfill Lecture 12 - 11/04/2009}}
\vspace{2mm} \\
\center \makebox[6.2in]{{\Large Introduction to Graph Partitioning}}
\vspace{1mm} \\
\center \makebox[6.2in]{{\it Lecturer: Michael Mahoney \hfill Scribes: Noah Youngs and Weidong Shao}}
\vspace{1mm}
\end{minipage}
} \vspace{2mm} \\
\mbox{{ \it *Unedited Notes}}
\section{Graph Partition}
A graph partitioning problem asks us to cut a graph into two or more ``good'' pieces. The main families of methods are:
\begin{enumerate}
\item Spectral methods, either global (e.g., based on Cheeger's inequality) or local.
\item Flow-based methods: the min-cut/max-flow theorem, LP formulations, embeddings, and local improvement.
\item Combinations of spectral and flow-based methods.
\end{enumerate}
Note that not all graphs have good partitions.

Question: Can we certify that there are no good clusters in a graph?

``Good'' clusters have the following properties:
\begin{enumerate}
\item internally (intra-cluster): well connected;
\item externally (inter-cluster): relatively poorly connected to the rest of the graph.
\end{enumerate}
How do we quantify this?

Extreme cases:
\begin{enumerate}
\item the graph splits into 2 disconnected pieces;
\item the graph splits into $S$ and $\bar S$, each of which induces a maximal complete subgraph.
\end{enumerate}
\section{Min cut problem}
\underbar{\noun{Define}} Given $G=(V, E)$, a cut is a partition of $V$ into $(S, \bar S)$, where $S \subset V$ and $\bar S = V \setminus S$.\\
Given $s, t \in V$, an $(s, t)$ cut is a cut such that $s\in S$ and $t\in \bar S$.\\
The cut set of a cut is $\{(u, v): (u, v) \in E, u\in S, v\in \bar S \}$.

The min cut problem: find the cut whose cut set has the smallest total edge weight.
\begin{enumerate}
\item Good: there is a polynomial-time algorithm (min cut = max flow).
\item Bad: it often returns a very unbalanced cut.
\item In theory: cut algorithms are used as a subroutine in divide-and-conquer algorithms.
\item In practice: we often want to ``interpret'' the clusters or partitions.
\end{enumerate}
\section{Max Flow Problem}
\underbar{\noun{Define}} Write $e_{uv}$ for an edge $(u,v)\in E$.\\
Let $c: E \rightarrow \mathbb{R}^+$ be the capacity function, written $c_{uv}$ or $c_{e}$.\\
Then a flow is a function $f: E \rightarrow \mathbb{R}^+$ satisfying
\begin{enumerate}
\item $f_{uv} \le c_{uv}$ for all $(u, v) \in E$ (capacity constraints)
\item $\sum_{u: (u,v)\in E} f_{uv} = \sum_{w: (v,w)\in E} f_{vw}$ for all $v \in V \setminus \{s, t\}$ (conservation of flow)
\end{enumerate}
The value of the flow is
\[ |f| = \sum _v f_{s v} \]
The max flow problem:
\[ \max |f|\]
The capacity of an $(s, t)$ cut is $c(S, \bar S) =\sum_{(u,v)\in E,\, u \in S,\, v \in \bar S} c_{uv}$.
The min cut problem is
\[\min c(S, \bar S)\]
Note: this is a ``single flow problem'', i.e., there is only one $s$ and one $t$.

Theorem: the maximum value of an $s$-$t$ flow is equal to the minimum capacity of an $s$-$t$ cut.

Proof idea:
$\max \text{flow} \le \min \text{cut}$ (weak duality).
Does there exist a cut that achieves equality?\\
Yes: by the strong duality theorem, we can equivalently solve the dual of the max-flow problem, which is the min-cut problem.

Primal (max flow):
\[ \max |f| \]
subject to
\[ f_{uv} \le c_{uv}, \quad (u,v) \in E \]
together with flow conservation and $f_{uv} \ge 0$.

Dual (min cut):
\[\min \sum_{(i, j) \in E} c_{ij} d_{ij} \]
subject to
\[ d_{ij} - p_i + p_j \ge 0, \quad (i,j) \in E \]
\[ p_s=1, \quad p_t=0, \quad p_i \geq 0, \quad i \in V \]
\[ d_{ij} \geq 0, \quad (i,j) \in E \]
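To make the duality concrete, here is a small worked example; the network is hypothetical and chosen only for illustration. Take $V = \{s, a, b, t\}$ with capacities $c_{sa} = 3$, $c_{sb} = 2$, $c_{ab} = 1$, $c_{at} = 2$, $c_{bt} = 3$. The flow
\[ f_{sa} = 3, \quad f_{sb} = 2, \quad f_{ab} = 1, \quad f_{at} = 2, \quad f_{bt} = 3 \]
respects every capacity, conserves flow at $a$ ($3 = 1 + 2$) and at $b$ ($2 + 1 = 3$), and has value $|f| = f_{sa} + f_{sb} = 5$. On the dual side, the cut $S = \{s\}$ corresponds to $p_s = 1$, $p_a = p_b = p_t = 0$, $d_{sa} = d_{sb} = 1$, and $d_{ij} = 0$ on the other edges. This is feasible, since $d_{ij} - p_i + p_j \ge 0$ holds on every edge, and its objective value is
\[ \sum_{(i,j) \in E} c_{ij} d_{ij} = c_{sa} + c_{sb} = 5. \]
Since a feasible flow and a feasible cut have the same value, both are optimal: the max flow and the min cut are both equal to $5$.
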
Can we add a ``balance'' condition?
\begin{enumerate}
\item We want a good cut value $E(S, \bar S)$.
\item We want $S$ and $\bar S$ to be balanced, i.e., of the same or approximately the same size.
\end{enumerate}
The answer is ``yes''.\\
Explicit balance conditions:\\
Graph bisection: min cut s.t. $|S| = |\bar S| = n/2$.\\
$\beta$-balanced cut: min cut s.t. $|S| = \beta n $, $ |\bar S| = (1-\beta) n$.\\
Implicit balance conditions:
\begin{enumerate}
\item input balance constraints
\item expansion: $h(S) = \frac{E(S, \bar S)}{|S|}$, for $|S| \le n/2$
\item sparsity: $sp(S) = \frac{E(S, \bar S)}{|S| |\bar S|}$
\item conductance: $\frac{E(S, \bar S)}{Vol(S)}$, for $Vol(S) \le Vol(\bar S)$, with $Vol(S) = \sum_{i \in S} \deg(i)$
\item normalized cut: $\frac{E(S, \bar S)}{Vol(S)\, Vol(\bar S)}$
(the latter two are used in ML)
\item quotient cut: $\frac{E(S, \bar S)}{\min(Vol(S), Vol(\bar S))}$
\end{enumerate}
Expansion and sparsity are the ``same'' in the following sense: since $n/2 \le |\bar S| \le n$ whenever $|S| \le n/2$, we have $h(S) \le n \cdot sp(S) \le 2 h(S)$, and therefore
\[\min h(S) \approx n \cdot \min sp(S)\]
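As a quick sanity check of these measures, consider a hypothetical example (not from the lecture): two triangles joined by a single edge, so $n = 6$, and write the expansion as $E(S,\bar S)/|S|$ for $|S| \le n/2$. If $S$ is one of the triangles, then $E(S, \bar S) = 1$, $|S| = |\bar S| = 3$, and $Vol(S) = Vol(\bar S) = 2 + 2 + 3 = 7$, so
\[ \frac{E(S,\bar S)}{|S|} = \frac{1}{3}, \qquad \frac{E(S,\bar S)}{|S||\bar S|} = \frac{1}{9}, \qquad \frac{E(S,\bar S)}{\min(Vol(S), Vol(\bar S))} = \frac{1}{7}. \]
If instead $S'$ is a single vertex of degree 2, then $E(S', \bar S') = 2$, $|S'| = 1$, $|\bar S'| = 5$, $Vol(S') = 2$, and $Vol(\bar S') = 12$, so
\[ \frac{E(S',\bar S')}{|S'|} = 2, \qquad \frac{E(S',\bar S')}{|S'||\bar S'|} = \frac{2}{5}, \qquad \frac{E(S',\bar S')}{\min(Vol(S'), Vol(\bar S'))} = 1. \]
Every ratio prefers the balanced cut $S$ over the unbalanced cut $S'$, even though no balance constraint was imposed explicitly.
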
Quotient cuts yield a tight bound in Cheeger's inequality.\\
In practice: bias towards high-degree nodes.\\
Note:
quotient cuts are balanced only implicitly; there are no explicit constraints on inter- or intra-cluster connectivity.
Graphs such as $Z^2$, random geometric graphs, and nice planar graphs yield good quotient cuts.
More generally, the best quotient cuts can be
\begin{itemize}
\item very unbalanced, or
\item disconnected clusters.
\end{itemize}

Example:
extremely sparse random graphs.
In the $G(n, p)$ model, $p \ge \log n^2 / n $ gives an expander; the other regime to consider is $ p \sim \log n/n $.
\section{Graph Partition Algorithms}
\subsection{Local Improvement}
Developed in the 70's.\\
Often a greedy improvement.\\
Local minima are a big problem.\\
The usual methods improve a given cut by constant factors:\\
- simulated annealing\\
- big difference in practice\\
Notable algorithms:
\begin{itemize}
\item Kernighan-Lin algorithm: fundamental work, no longer used due to its $\Theta (n^2)$ performance.
\item Fiduccia-Mattheyses algorithm: linear time, still commonly used.
\item METIS algorithm from Karypis and Kumar: works very well in practice, especially on low-dimensional graphs.
\end{itemize}
\subsection{Spectral methods}
Developed in the 70's and 80's.\\
Worst-case quality guarantee (Cheeger's inequality).\\
At root, this is a relaxation and rounding method related to the QIP formulation:
\[ \min_{ x\in \{-1, 1\}^n,\; x \perp \mathbf{1}} \frac{x^T L x }{x^T x } \]
- quadratic worst case.
\begin{itemize}
\item hyperplane rounding:
\begin{itemize}
\item compute an eigenvector
\item cut according to some rule
\item post-process with local improvements
\end{itemize}
\end{itemize}
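The link between the QIP and cuts can be made explicit with a short derivation (a sketch, assuming $L = D - A$ is the combinatorial Laplacian of an unweighted graph and that $x_i = 1$ for $i \in S$ and $x_i = -1$ for $i \in \bar S$):
\[ x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2 = 4 E(S, \bar S), \qquad x^T x = n. \]
Each edge inside $S$ or inside $\bar S$ contributes $0$, and each cut edge contributes $(\pm 2)^2 = 4$, so for $\pm 1$ vectors the quotient $\frac{x^T L x}{x^T x}$ is proportional to the cut size. Minimizing this quotient over all real vectors orthogonal to the all-ones vector is an eigenvector computation, whose optimal value is the second smallest eigenvalue $\lambda_2$ of $L$. In one common normalization (normalized Laplacian, with $\phi(G)$ the conductance of the best cut), Cheeger's inequality reads
\[ \frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2 \lambda_2}, \]
which is where the quadratic worst case comes from.
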
\subsection{Flow-based methods}
Developed in the 90's.\\
Consider all pairs of nodes: a multi-commodity flow problem.\\
We want to route the commodities s.t. the constraints are satisfied without bottlenecks.
Idea: bottlenecks in the flow computation correspond to good cuts.\\
$k$-commodity problem: does not satisfy strong duality, but
does satisfy an approximate min-cut/max-flow theorem, with
gap $\le O(\log n)$.
\begin{itemize}
\item relax the flow problem to an LP
\item embed the solution in $\ell_1$
\item round the solution to $\{0,1\}$; $\Theta(\log n)$ worst case.
\end{itemize}
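To illustrate the first step, here is one standard way to write the LP relaxation; this is a sketch assuming the uniform-demand (all pairs) case, and the lecture may have used a different but equivalent form:
\[ \min \sum_{(i,j) \in E} d_{ij} \quad \text{s.t.} \quad \sum_{i < j} d_{ij} = 1, \quad d_{ij} \le d_{ik} + d_{kj}, \quad d_{ij} = d_{ji} \ge 0. \]
Any cut $(S, \bar S)$ gives a feasible solution: set $d_{ij} = \frac{1}{|S||\bar S|}$ when $i$ and $j$ lie on opposite sides of the cut and $d_{ij} = 0$ otherwise. Its objective value is $\frac{E(S,\bar S)}{|S||\bar S|}$, so the LP optimum is a lower bound on the best such ratio. The embedding and rounding steps turn the optimal $d$ back into a cut, losing at most an $O(\log n)$ factor.
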
\subsection{Additional Graph Partitioning Notes}
These methods ``fail'', i.e., achieve their worst case, on the following graphs:
\begin{itemize}
\item spectral methods fail on long stringy pieces;
\item flow-based methods fail on expander graphs: there are $n$ choose 2 pairs, but most pairs are far apart ($\log n$ apart).
\end{itemize}
Improvements/extensions for large data:
\begin{itemize}
\item hybrid flow-based and local methods (find a cut around the given cut);
\item local spectral methods, which find a good cut of a given size around a start node, in time that depends on the size of the output.
\end{itemize}
\subsection{Methods that combine spectral and flow}
\begin{itemize}
\item ARV algorithm (developed a few years ago by Arora, Rao, and Vazirani)
\item most hybrid algorithms are theoretical, but some implementations embed in SDP.
\item approximate solution
(two-player game).
\item boosting \& ensemble methods
\end{itemize}
\section{References}
\begin{enumerate}
\item S. E. Schaeffer (2007). ``Graph Clustering''. Computer Science Review 1(1): 27--64.
\item B. W. Kernighan, S. Lin (1970). ``An Efficient Heuristic Procedure for Partitioning Graphs''. Bell System Technical Journal 49: 291--307.
\item C. M. Fiduccia, R. M. Mattheyses. ``A Linear-Time Heuristic for Improving Network Partitions''. Design Automation Conference.
\item G. Karypis, V. Kumar (1999). ``A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs''. SIAM Journal on Scientific Computing.
\end{enumerate}
\end{document}