@@ -17,7 +17,8 @@
\usepackage{tikz}
\usepackage{pgfplots}
\usetikzlibrary{positioning}
\usetikzlibrary{trees}
%\usetikzlibrary{graphs, graphdrawing}
%%% math
\usepackage{amsmath}
%%% citations
@@ -41,6 +42,7 @@ In coding theory, the events of an information source are to be encoded in a man
the information provided by the source.
The process of encoding can thus be described by a function $C$ mapping a source alphabet $X$ to a code alphabet $Y$.
Symbols in the alphabets are denoted $x_i$ and $y_j$ respectively, and the source symbols occur with probabilities $p_i$.
% TODO fix use of alphabet / symbol / code word: alphabet is usually binary -> code word is 010101
\begin{equation}
C: X \rightarrow Y \qquad X=\{x_1,x_2,\dots,x_n\} \qquad Y=\{y_1,y_2,\dots,y_m\}
\label{eq:formal-code}
\end{equation}
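For instance (an illustrative example only), a fixed-length binary code for a four-symbol source maps
$x_1 \mapsto 00$, $x_2 \mapsto 01$, $x_3 \mapsto 10$, $x_4 \mapsto 11$;
here $Y$ consists of the four code words of length two.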
@@ -96,24 +98,33 @@ In the case of the capital code in fact every word other than the longest possib
lower in the table. As a result, the receiver cannot instantaneously decode each word but rather has to wait for the leading 0
of the next codeword.
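To see this with a small stand-in example (not the capital code itself): in a code such as $\{0, 01, 011\}$,
every word is a prefix of the words below it, so after reading $01$ the receiver cannot tell whether
the word has ended until the leading $0$ of the next word arrives.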
Further, a code is said to be \textit{efficient} if it has the smallest possible average word length, i.e.\ if it matches
the entropy of the source alphabet.
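Spelled out (using $\bar{L}$ and $H(X)$ as ad-hoc symbols, since the text has not fixed notation for
these quantities): the average word length $\bar{L} = \sum_i p_i l_i$ of any uniquely decodable code
satisfies $\bar{L} \geq H(X)$ when the entropy is taken to the base of the code alphabet size,
and an efficient code attains this bound.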
\section{Kraft-McMillan inequality}
The Kraft-McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code with given word lengths.
In the form shown in \autoref{eq:kraft-mcmillan} it is intuitive to understand by means of a code tree:
because prefix codes require code words to be situated only on the leaves of a code tree,
every code word $i$ of length $l_i$ over an alphabet of size $r$ uses up exactly a fraction $r^{-l_i}$ of the available leaves.
The sum over all code words can thus never be larger than one, or else
the code would not be uniquely decodable \cite{enwiki:kraft-mcmillan}.
\begin{equation}
\sum_{i=1}^{n} r^{-l_i} \leq 1
\label{eq:kraft-mcmillan}
\end{equation}
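As a quick check of \autoref{eq:kraft-mcmillan} (a worked example added for illustration): for a binary
alphabet ($r=2$) and word lengths $1, 2, 3, 3$ the sum is $2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 1 \leq 1$,
so a prefix code with these lengths exists, e.g.\ $\{0, 10, 110, 111\}$.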
\section{Shannon-Fano}
Shannon-Fano coding is one of the earliest methods for constructing prefix codes.
It is a top-down method that divides the events into two groups of approximately equal total probability,
recursively partitioning them so that more frequent events receive shorter codewords.

\begin{algorithm}
\begin{algorithmic}
\State Sort the events by non-increasing probability
\State Split the sorted list into two parts of approximately equal total probability
\State Append $0$ to the codewords of the first part and $1$ to those of the second
\State Recurse on each part until every part contains a single event
\end{algorithmic}
\caption{Shannon-Fano compression}
\label{alg:shannon-fano}
\end{algorithm}
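A worked example (illustrative, with made-up dyadic probabilities $0.5, 0.25, 0.125, 0.125$): the first
split separates $\{x_1\}$ (total $0.5$) from $\{x_2, x_3, x_4\}$ (total $0.5$), giving $x_1 \mapsto 0$;
splitting the remainder yields $x_2 \mapsto 10$, $x_3 \mapsto 110$, $x_4 \mapsto 111$, an average word
length of $1.75$ that matches the entropy of this source exactly.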
\section{Huffman Coding}
@@ -136,6 +147,7 @@ The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based compression method th
of recurring patterns in the data.
Unlike entropy-based methods, LZW does not require prior knowledge of symbol probabilities,
making it highly adaptable and efficient for a wide range of applications, including image and text compression.
Because the dictionary does not have to be transmitted explicitly, LZW is also useful for streaming data.
\cite{dewiki:lzw}
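A small trace (an illustrative example over an ad-hoc two-symbol alphabet): encoding $\mathtt{ABABAB}$
with the initial dictionary $\mathtt{A} \mapsto 1$, $\mathtt{B} \mapsto 2$ outputs the codes $1, 2, 3, 3$
while adding the entries $\mathtt{AB} \mapsto 3$, $\mathtt{BA} \mapsto 4$, $\mathtt{ABA} \mapsto 5$;
the decoder rebuilds exactly the same entries from the code stream alone, which is why the dictionary
never has to be transmitted.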