commit b67fa4db89
parent e10e311f0f
Author: eneller
Date: 2025-11-28 12:57:59 +01:00


@@ -17,7 +17,8 @@
\usepackage{tikz}
\usepackage{pgfplots}
\usetikzlibrary{positioning}
\usetikzlibrary{trees}
%\usetikzlibrary{graphs, graphdrawing}
%%% math
\usepackage{amsmath}
%%% citations
@@ -41,6 +42,7 @@ In coding theory, the events of an information source are to be encoded in a man
the information provided by the source.
The process of encoding can thus be described by a function $C$ transforming from a source alphabet $X$ to a code alphabet $Y$.
Symbols in the alphabets are denoted $x_i$ and $y_j$, respectively; the source symbols $x_i$ occur with probabilities $p_i$.
% TODO fix use of alphabet / symbol / code word: alphabet is usually binary -> code word is 010101
\begin{equation}
C: X \rightarrow Y \qquad X=\{x_1,x_2,\dots,x_n\} \qquad Y=\{y_1,y_2,\dots,y_m\}
\label{eq:formal-code}
@@ -96,24 +98,33 @@ In the case of the capital code in fact every word other than the longest possib
lower in the table. As a result, the receiver cannot instantaneously decode each word but rather has to wait for the leading 0
of the next codeword.
Further, a code is said to be \textit{efficient} if it has the smallest possible average word length, i.e. matches
the entropy of the source alphabet.
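Denoting the length of the code word for $x_i$ by $l_i$, the average word length is
\begin{equation}
\bar{l} = \sum_{i=1}^{n} p_i l_i,
\end{equation}
and by Shannon's source coding theorem $\bar{l} \geq H(X)$ holds for every uniquely decodable code, so an efficient code attains this lower bound as closely as possible.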
\section{Kraft-McMillan inequality}
The Kraft-McMillan inequality gives a necessary and sufficient condition for the existence of a prefix code.
In the form shown in \autoref{eq:kraft-mcmillan}, it is intuitive to understand in terms of a code tree.
Because a prefix code requires code words to be situated only on the leaves of the code tree,
every code word $i$ of length $l_i$ over an alphabet of size $r$ uses up exactly a fraction $r^{-l_i}$ of the available leaves.
The sum over all code words can therefore never exceed one, or else
the code is not uniquely decodable \cite{enwiki:kraft-mcmillan}.
\begin{equation}
\sum_{i=1}^{n} r^{-l_i} \leq 1
\label{eq:kraft-mcmillan}
\end{equation}
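As a quick check, for a binary alphabet ($r=2$) the lengths $\{1,2,2\}$ satisfy \autoref{eq:kraft-mcmillan} with $2^{-1}+2^{-2}+2^{-2}=1$, and indeed the prefix code $\{0,10,11\}$ exists; the lengths $\{1,1,2\}$ instead give $2^{-1}+2^{-1}+2^{-2}=\frac{5}{4}>1$, so no uniquely decodable code with these lengths can exist.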
\section{Shannon-Fano}
Shannon-Fano coding is one of the earliest methods for constructing prefix codes.
It is a top-down method that recursively divides the symbols into two groups of approximately equal total probability,
assigning shorter codewords to more frequent events.
While intuitive, Shannon-Fano coding does not always achieve optimal compression,
paving the way for more advanced techniques like Huffman coding.
\begin{algorithm}
\begin{algorithmic}
\Procedure{ShannonFano}{$S$}
\If{$|S| > 1$}
\State split the probability-sorted events of $S$ into $S_1, S_2$ with total probabilities as equal as possible
\State append $0$ to the codewords of all events in $S_1$ and $1$ to those in $S_2$
\State \Call{ShannonFano}{$S_1$}
\State \Call{ShannonFano}{$S_2$}
\EndIf
\EndProcedure
\end{algorithmic}
\caption{Shannon-Fano compression}
\label{alg:shannon-fano}
\end{algorithm}
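As a small worked example (probabilities chosen here purely for illustration), consider five events with probabilities $0.4, 0.2, 0.2, 0.1, 0.1$.
One possible first split yields the groups $\{0.4, 0.2\}$ and $\{0.2, 0.1, 0.1\}$, and recursing within each group produces the code words
$00, 01, 10, 110, 111$ with an average word length of $2.2$ bits, slightly above the source entropy of roughly $2.12$ bits.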
\section{Huffman Coding}
@@ -136,6 +147,7 @@ The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based compression method th
of recurring patterns in the data.
Unlike entropy-based methods, LZW does not require prior knowledge of symbol probabilities,
making it highly adaptable and efficient for a wide range of applications, including image and text compression.
Because the dictionary does not have to be transmitted explicitly, LZW is also useful for streaming data.
\cite{dewiki:lzw} \cite{dewiki:lzw}
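As a minimal sketch (input chosen purely for illustration), consider the sequence $ABABABA$ over the alphabet $\{A,B\}$ with the initial dictionary entries $A=1$ and $B=2$.
The encoder emits $1$ and adds $AB=3$ to the dictionary, emits $2$ and adds $BA=4$, emits $3$ and adds $ABA=5$, and finally emits $5$,
encoding seven symbols in four output codes; the decoder reconstructs exactly the same dictionary from the code stream, which is why it never has to be transmitted.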