update
@@ -17,7 +17,8 @@
\usepackage{tikz}
\usepackage{pgfplots}
\usetikzlibrary{positioning}
%\usegdlibrary{trees}
\usetikzlibrary{trees}
%\usetikzlibrary{graphs, graphdrawing}
%%% math
\usepackage{amsmath}
%%% citations
@@ -41,6 +42,7 @@ In coding theory, the events of an information source are to be encoded in a man
the information provided by the source.
The process of encoding can thus be described by a function $C$ mapping a source alphabet $X$ to a code alphabet $Y$.
Symbols in the alphabets are denoted $x_i$ and $y_j$ respectively; each source symbol $x_i$ occurs with probability $p_i$.
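For instance, a source $X = \{x_1, x_2, x_3\}$ could be encoded over the binary alphabet $\{0, 1\}$ as $C(x_1) = 0$, $C(x_2) = 10$, $C(x_3) = 11$.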
% TODO fix use of alphabet / symbol / code word: alphabet is usually binary -> code word is 010101
\begin{equation}
C: X \rightarrow Y \qquad X=\{x_1, x_2, \dots, x_n\} \qquad Y=\{y_1, y_2, \dots, y_m\}
\label{eq:formal-code}
@@ -96,24 +98,33 @@ In the case of the capital code in fact every word other than the longest possib
lower in the table. As a result, the receiver cannot decode each word instantaneously, but has to wait for the leading 0
of the next code word.

Further, a code is said to be \textit{efficient} if it has the smallest possible average word length, i.e. its average word length matches the entropy of the source.
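For example, a source with probabilities $(\frac{1}{2}, \frac{1}{4}, \frac{1}{4})$ has entropy $H = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{4} \cdot 2 = 1.5$ bits per symbol; the prefix code $\{0, 10, 11\}$ achieves exactly this average word length and is therefore efficient.
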
\section{Kraft-McMillan inequality}
The Kraft-McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code.
In the form shown in \autoref{eq:kraft-mcmillan} it is intuitive to understand given a code tree.
Because prefix codes require code words to be situated only on the leaves of a code tree,
every code word $i$ over an alphabet of size $r$ occupies exactly a fraction $r^{-l_i}$ of the available code space.
The sum over all code words can thus never be larger than one, or else
the code is not uniquely decodable \cite{enwiki:kraft-mcmillan}.
\begin{equation}
\sum_{i=1}^{n} r^{-l_i} \leq 1
\label{eq:kraft-mcmillan}
\end{equation}

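As a quick sanity check, the inequality can be evaluated directly for a list of proposed word lengths. The following Python helper is a minimal sketch (\texttt{satisfies\_kraft} is a hypothetical name, not from the original text):

\begin{verbatim}
from fractions import Fraction

def satisfies_kraft(lengths, r=2):
    """Kraft-McMillan inequality for code word lengths
    over an alphabet of size r (exact rational arithmetic)."""
    return sum(Fraction(1, r ** l) for l in lengths) <= 1

# lengths (1, 2, 2), e.g. {0, 10, 11}: 1/2 + 1/4 + 1/4 = 1
assert satisfies_kraft([1, 2, 2])
# lengths (1, 1, 2): 1/2 + 1/2 + 1/4 > 1, no prefix code exists
assert not satisfies_kraft([1, 1, 2])
\end{verbatim}
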
\section{Shannon-Fano}
Shannon-Fano coding is one of the earliest methods for constructing prefix codes.
It is a top-down method: the symbols, sorted by probability, are recursively partitioned into two groups of roughly equal total probability, so that more frequent symbols receive shorter code words.
While intuitive, Shannon-Fano coding does not always achieve optimal compression,
which paved the way for more advanced techniques like Huffman coding.

\begin{algorithm}
\begin{algorithmic}[1]
\Procedure{ShannonFano}{$S$} \Comment{$S$: symbols sorted by descending probability}
    \If{$|S| \leq 1$}
        \State \Return
    \EndIf
    \State split $S$ into $S_1$ and $S_2$ with total probabilities as equal as possible
    \State append $0$ to the code word of every symbol in $S_1$
    \State append $1$ to the code word of every symbol in $S_2$
    \State \Call{ShannonFano}{$S_1$}
    \State \Call{ShannonFano}{$S_2$}
\EndProcedure
\end{algorithmic}
\caption{Shannon-Fano compression algorithm}
\label{alg:shannon-fano}
\end{algorithm}

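The recursion of \autoref{alg:shannon-fano} can be sketched in Python as follows (an illustrative implementation with hypothetical names, taking (symbol, probability) pairs):

\begin{verbatim}
def shannon_fano(symbols):
    """Recursively split symbols (sorted by probability) into two
    halves of roughly equal total probability, prefixing 0 / 1."""
    if len(symbols) <= 1:
        return {sym: "0" for sym, _ in symbols}
    symbols = sorted(symbols, key=lambda s: s[1], reverse=True)
    total, acc, split = sum(p for _, p in symbols), 0.0, 0
    for _, p in symbols[:-1]:          # keep both halves non-empty
        if acc + p > total / 2:
            break
        acc += p
        split += 1
    split = max(split, 1)
    codes = {}
    for bit, half in (("0", symbols[:split]), ("1", symbols[split:])):
        sub = shannon_fano(half) if len(half) > 1 else {half[0][0]: ""}
        for sym, code in sub.items():
            codes[sym] = bit + code
    return codes

print(shannon_fano([("a", 0.5), ("b", 0.25), ("c", 0.25)]))
# {'a': '0', 'b': '10', 'c': '11'}
\end{verbatim}
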
\section{Huffman Coding}
@@ -136,6 +147,7 @@ The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based compression method th
of recurring patterns in the data.
Unlike entropy-based methods, LZW does not require prior knowledge of symbol probabilities,
making it highly adaptable and efficient for a wide range of applications, including image and text compression.
Because the decoder can rebuild the dictionary symmetrically from the received code stream, it never has to be transmitted explicitly, which also makes LZW useful for streaming data.
\cite{dewiki:lzw}
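The following Python sketch of the LZW encoder illustrates this (\texttt{lzw\_encode} is a hypothetical name; practical implementations additionally fix code widths and dictionary limits):

\begin{verbatim}
def lzw_encode(data: str) -> list[int]:
    """Emit dictionary indices for the longest already-seen
    prefixes, growing the dictionary as encoding proceeds."""
    dictionary = {chr(i): i for i in range(256)}  # all single bytes
    w, out = "", []
    for c in data:
        if w + c in dictionary:
            w += c                           # extend the current match
        else:
            out.append(dictionary[w])        # emit longest known prefix
            dictionary[w + c] = len(dictionary)  # learn a new pattern
            w = c
    if w:
        out.append(dictionary[w])
    return out

print(lzw_encode("abababab"))  # [97, 98, 256, 258, 98]
\end{verbatim}

The decoder sees the same sequence of indices and can therefore grow an identical dictionary step by step, which is why the dictionary never needs to be transmitted.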