\documentclass{article}

%%% basic layouting
\usepackage[utf8]{inputenc} % utf8x is deprecated; plain utf8 suffices
\usepackage[margin=1in]{geometry} % Adjust margins
\usepackage{caption}
\PassOptionsToPackage{hyphens}{url} % allow breaking urls; must precede hyperref
\usepackage{hyperref}
\usepackage{float}
\usepackage{wrapfig}
\usepackage{subcaption}
\usepackage{parskip} % no paragraph indentation; inter-paragraph space instead
\usepackage{xcolor}

%%% algorithms
\usepackage{algorithm}
\usepackage{algpseudocodex}

%%% graphs and plots
\usepackage{tikz}
\usepackage{pgfplots}
\usetikzlibrary{positioning}
%\usegdlibrary{trees}

%%% math
\usepackage{amsmath}

%%% citations
\usepackage[style=ieee, backend=biber, maxnames=1, minnames=1]{biblatex}
%\usepackage{csquotes} % Recommended for biblatex
\addbibresource{compression.bib}

\title{Compression}
\author{Erik Neller}
\date{\today}

\begin{document}

\maketitle

\section{Introduction}
As the volume of data worldwide grows exponentially, compression is becoming ever more important across disciplines.
Not only does it enable the storage of the large amounts of information needed for research in scientific domains
such as DNA sequencing and analysis, it also plays a vital role in keeping stored data accessible by
facilitating cataloging, search, and retrieval.
The concept of entropy is closely related to the design of efficient codes:
the entropy $H$ of a source emitting symbols with probabilities $p_i$ is the expected information content of a symbol,
\begin{equation}
H = E(I) = - \sum_i p_i \log_2(p_i)
\label{eq:entropy-information}
\end{equation}
Understanding entropy as the expected information $E(I)$ of a message provides the intuition that,
for a source of given entropy (in bits), no code can have a lower average word length (in bits)
\begin{equation}
E(l) = \sum_i p_i l_i
\end{equation}
than this entropy without losing information.
This is the content of Shannon's source coding theorem,
introduced in \citeyear{shannon1948mathematical} \cite{enwiki:shannon-source-coding}.
In his paper, \citeauthor{shannon1948mathematical} proposed two principal ideas for minimizing the average length of a code.
The first is to use shorter codes for symbols with higher probability.
This is intuitive, as more frequent symbols have a greater impact on the average code length.
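
For concreteness, consider a small example (constructed here purely for illustration):
a source with three symbols of probabilities $1/2$, $1/4$, and $1/4$ has entropy
\begin{equation*}
H = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4}
  = \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{1}{2} = 1.5 \text{ bits},
\end{equation*}
and the prefix code $\{0, 10, 11\}$, which assigns the shortest codeword to the most probable symbol,
achieves exactly $E(l) = \tfrac{1}{2} \cdot 1 + \tfrac{1}{4} \cdot 2 + \tfrac{1}{4} \cdot 2 = 1.5$ bits,
so no better code exists for this source.
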
\section{Kraft--McMillan inequality}
% https://de.wikipedia.org/wiki/Kraft-Ungleichung
% https://en.wikipedia.org/wiki/Kraft%E2%80%93McMillan_inequality
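
In its standard form, the inequality states that a binary prefix code
with codeword lengths $l_1, \dots, l_n$ exists if and only if
\begin{equation}
\sum_{i=1}^{n} 2^{-l_i} \leq 1.
\label{eq:kraft}
\end{equation}
McMillan extended this necessity to every uniquely decodable code,
so restricting attention to prefix codes costs nothing in terms of achievable codeword lengths.
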
\section{Shannon--Fano}
Shannon--Fano coding is one of the earliest methods for constructing prefix codes.
It divides symbols into groups based on their probabilities, recursively partitioning them to assign shorter codewords
to more frequent symbols (Algorithm~\ref{alg:shannon-fano}).
While intuitive, Shannon--Fano coding does not always produce an optimal code,
which paved the way for more advanced techniques such as Huffman coding.

\begin{algorithm}
\begin{algorithmic}[1]
% a sketch of the standard recursive partitioning (placeholder body filled in)
\Procedure{ShannonFano}{$S$} \Comment{$S$: symbols sorted by decreasing probability}
    \If{$|S| > 1$}
        \State split $S$ into $S_1$ and $S_2$ so their total probabilities are as equal as possible
        \State append bit $0$ to every codeword in $S_1$ and bit $1$ to every codeword in $S_2$
        \State \Call{ShannonFano}{$S_1$}
        \State \Call{ShannonFano}{$S_2$}
    \EndIf
\EndProcedure
\end{algorithmic}
\caption{Shannon--Fano compression algorithm}
\label{alg:shannon-fano}
\end{algorithm}

\section{Huffman Coding}
Huffman coding is an optimal prefix code construction: among all symbol-by-symbol prefix codes,
it minimizes the expected codeword length for a given set of symbol probabilities.
By constructing a binary tree in which the most frequent symbols receive the shortest codewords,
Huffman coding comes within one bit of the entropy of a discrete memoryless source,
matching it exactly when all symbol probabilities are powers of $1/2$.
Its efficiency and simplicity have made it a cornerstone of lossless data compression.

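The tree construction can be sketched as follows; this is the textbook greedy procedure,
included here for illustration in the style of Algorithm~\ref{alg:shannon-fano}.

\begin{algorithm}
\begin{algorithmic}[1]
% illustrative sketch of the standard greedy construction
\Procedure{Huffman}{$S$} \Comment{$S$: symbols with probabilities $p_i$}
    \State create a leaf node for each symbol and insert it into a min-priority queue keyed by probability
    \While{the queue contains more than one node}
        \State remove the two nodes $x$ and $y$ with the smallest probabilities
        \State create a node $z$ with probability $p_x + p_y$, left child $x$ (bit $0$), and right child $y$ (bit $1$)
        \State insert $z$ into the queue
    \EndWhile
    \State \Return the remaining node as the root; codewords are read off the root-to-leaf paths
\EndProcedure
\end{algorithmic}
\caption{Huffman tree construction (illustrative sketch)}
\label{alg:huffman-sketch}
\end{algorithm}
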
\section{Arithmetic Coding}
Arithmetic coding is a compression technique that encodes an entire message as a single subinterval
of the range $[0, 1)$.
By iteratively narrowing this interval according to the probabilities of the symbols in the message,
arithmetic coding can achieve compression rates that approach the entropy of the source.
Its ability to spend, in effect, a non-integer number of bits per symbol makes it particularly powerful
for applications requiring high compression efficiency.
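
As a toy illustration (example constructed here), take symbols $a$, $b$, $c$ with probabilities $1/2$, $1/4$, $1/4$
and subintervals $[0, 0.5)$, $[0.5, 0.75)$, and $[0.75, 1)$.
Encoding the message $ba$ first narrows $[0, 1)$ to $b$'s interval $[0.5, 0.75)$,
then to the first half of that interval, $[0.5, 0.625)$, for the symbol $a$.
Any number in the final interval, such as $0.5 = 0.1_2$, identifies the message once its length is known.
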
\section{LZW Algorithm}
The Lempel--Ziv--Welch (LZW) algorithm is a dictionary-based compression method that dynamically builds a dictionary
of recurring patterns in the data.
Unlike entropy-based methods, LZW does not require prior knowledge of symbol probabilities,
making it highly adaptable and efficient for a wide range of applications,
including image and text compression \cite{dewiki:lzw}.
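
The encoding step can be sketched as follows (an illustrative outline in the style of the algorithms above,
not a definitive implementation).

\begin{algorithm}
\begin{algorithmic}[1]
% illustrative sketch of LZW encoding
\Procedure{LZWEncode}{input string $w$}
    \State initialize the dictionary with all single-character strings
    \State $s \gets$ the empty string
    \For{each character $c$ of $w$}
        \If{the concatenation $sc$ is in the dictionary}
            \State $s \gets sc$
        \Else
            \State output the dictionary index of $s$
            \State add $sc$ to the dictionary as a new entry
            \State $s \gets c$
        \EndIf
    \EndFor
    \State output the dictionary index of $s$
\EndProcedure
\end{algorithmic}
\caption{LZW encoding (illustrative sketch)}
\label{alg:lzw-sketch}
\end{algorithm}
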
\printbibliography

\end{document}