eneller
2025-11-26 19:24:53 +01:00
parent 25f725049e
commit e015e816bd
3 changed files with 27 additions and 10 deletions


@@ -78,3 +78,10 @@
url = "https://de.wikipedia.org/w/index.php?title=Kraft-Ungleichung&oldid=172862410",
note = "[Online; Stand 26. November 2025]"
}
@misc{ dewiki:partition,
author = "Wikipedia",
title = "Partitionsproblem --- Wikipedia{,} die freie Enzyklopädie",
year = "2025",
url = "https://de.wikipedia.org/w/index.php?title=Partitionsproblem&oldid=255787013",
note = "[Online; Stand 26. November 2025]"
}


@@ -36,31 +36,41 @@ As the volume of data grows exponentially around the world, compression is only
Not only does it enable the storage of the large amounts of information needed for research in scientific domains
such as DNA sequencing and analysis, but it also plays a vital role in keeping stored data accessible by
facilitating cataloging, search and retrieval.
The concept of entropy introduced in the previous entry is closely related to the design of efficient codes for compression.
\begin{figure}[H]
\begin{minipage}{0.5\textwidth}
\begin{equation}
H = E(I) = - \sum_i p_i \log_2(p_i)
\label{eq:entropy-information}
\end{equation}
\end{minipage}
\begin{minipage}{0.5\textwidth}
\begin{equation}
E(L) = \sum_i p_i l_i
\label{eq:expected-codelength}
\end{equation}
\end{minipage}
\end{figure}
In coding theory, the events of an information source are to be encoded in a manner that minimizes the number of bits
needed to store the information provided by the source.
The understanding of entropy as the expected information $E(I)$ of a message provides the intuition that,
for a source of a given entropy (in bits), no code can achieve an expected codeword length $E(L)$
(\autoref{eq:expected-codelength}) below this entropy without losing information.
This is the content of Shannon's source coding theorem,
introduced in \citeyear{shannon1948mathematical} \cite{enwiki:shannon-source-coding}.
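As a small illustration of this bound, with source probabilities chosen here purely for demonstration,
consider a source emitting three symbols with probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{4}$.
Its entropy is
\begin{equation}
H = \frac{1}{2}\log_2 2 + \frac{1}{4}\log_2 4 + \frac{1}{4}\log_2 4 = 1.5
\end{equation}
bits, and the prefix code $\{0, 10, 11\}$ reaches exactly this value with
$E(L) = \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{4} \cdot 2 = 1.5$ bits,
so no lossless code for this source can do better.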
In his paper, \citeauthor{shannon1948mathematical} proposed two principal ideas to minimize the average length of a code.
The first is to use short codes for symbols with higher probability.
This is an intuitive approach, as more frequent symbols have a greater impact on the average code length.
The second idea is to jointly encode events that frequently occur together, which allows for greater flexibility
in code design.
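One simple way to illustrate this flexibility, with probabilities again chosen only for demonstration,
is a binary source that emits $1$ with probability $0.9$ and $0$ with probability $0.1$,
so its entropy is roughly $0.47$ bits per symbol.
Encoding symbols one at a time costs at least one bit each,
but encoding pairs of symbols (with probabilities $0.81$, $0.09$, $0.09$ and $0.01$)
using the prefix code $\{0, 10, 110, 111\}$ gives
\begin{equation}
E(L) = 1 \cdot 0.81 + 2 \cdot 0.09 + 3 \cdot 0.09 + 3 \cdot 0.01 = 1.29
\end{equation}
bits per pair, i.e.\ about $0.65$ bits per symbol, already much closer to the entropy.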
\section{Kraft-McMillan inequality}
\section{Shannon-Fano}
Shannon-Fano coding is one of the earliest methods for constructing prefix codes.
It recursively divides the symbols, ordered by decreasing probability, into two groups of approximately equal
total probability, assigning shorter codewords to more frequent symbols.
While intuitive, Shannon-Fano coding does not always achieve optimal compression,
which paved the way for more advanced techniques such as Huffman coding.
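As a worked example, with probabilities chosen here purely for illustration,
consider five symbols with probabilities $0.35, 0.17, 0.17, 0.16, 0.15$.
The first split separates $\{0.35, 0.17\}$ (total $0.52$) from $\{0.17, 0.16, 0.15\}$ (total $0.48$);
recursing within each group yields the codeword lengths $2, 2, 2, 3, 3$ and an expected length of
\begin{equation}
E(L) = 2 \cdot (0.35 + 0.17 + 0.17) + 3 \cdot (0.16 + 0.15) = 2.31
\end{equation}
bits, whereas an optimal prefix code for the same source (with codeword lengths $1, 3, 3, 3, 3$,
as produced by Huffman coding) achieves $2.30$ bits; both remain above the entropy of approximately $2.23$ bits.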


@@ -348,7 +348,7 @@ The capacity of the binary symmetric channel is given by:
where $H_2(p) = -p \log_2(p) - (1-p)\log_2(1-p)$ is the binary entropy function.
As $p$ increases, uncertainty grows and channel capacity declines.
When $p = 0.5$, output bits are completely random and no information can be transmitted ($C = 0$).
As illustrated in \autoref{fig:graph-entropy}, an error rate of $p > 0.5$ is equivalent to one of $1 - p < 0.5$,
since the receiver can simply invert the received bits, though this case is not relevant in practice.
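As a brief numerical illustration, an error probability of $p = 0.1$ yields $H_2(0.1) \approx 0.47$
and therefore a capacity of $C = 1 - H_2(0.1) \approx 0.53$ bits per channel use.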
Shannon's theorem is not constructive, as it does not provide an explicit method for building such efficient codes,