update

2025-11-30 16:38:04 +01:00
parent 597111974e
commit df511b4a3e
1 changed files with 21 additions and 5 deletions
--- a/compression.tex
+++ b/compression.tex
@@ -215,6 +215,26 @@ arithmetic coding can achieve compression rates that approach the entropy of the
 Its ability to handle non-integer bit lengths makes it particularly powerful
 for applications requiring high compression efficiency.

+In its basic form, a message is first written as a fraction in the base given by the alphabet size, with a leading `$0.$': $ \text{ABBCAB} = 0.011201_3$,
+in this case yielding a ternary number because the alphabet has $ |\{A,B,C\}| = 3 $ symbols (with $A=0$, $B=1$, $C=2$).
+This number is then converted to the target base (usually 2) with just enough precision to recover the original number, resulting in $0.0010110010_2$.
+The decoder only gets the rational number $q$ and the length $n$ of the original message.
+The encoding can then be easily reversed by changing base and rounding to $n$ digits.
+
+In general, arithmetic coding can produce near-optimal output for any given source probability distribution.
+This is achieved by sizing the sub-interval of $[0,1)$ assigned to each source symbol in proportion to its probability.
+Given the source probabilities $p_A = \frac{6}{8}$ and $p_B = p_C = \frac{1}{8}$, the intervals become
+$ A = [0,\frac{6}{8}),\ B = [\frac{6}{8}, \frac{7}{8}),\ C = [\frac{7}{8},1)$.
+Instead of changing the base of the number and rounding to an appropriate precision, the encoder recursively narrows the current interval with each symbol and finally chooses a number inside the resulting interval.
+\begin{enumerate}
+    \item \textbf{Symbol: A} selects $A=[0, \frac{6}{8})$.
+    \item This interval is subdivided in the same proportions: $ A= [0,(\frac{6}{8})^2),\ B=[(\frac{6}{8})^2, \frac{7}{8} \cdot \frac{6}{8}),\ C=[\frac{7}{8} \cdot \frac{6}{8}, \frac{6}{8})$.
+    \item \textbf{Symbol: B} selects $B=[(\frac{6}{8})^2, \frac{7}{8} \cdot \frac{6}{8}) = [\frac{36}{64}, \frac{42}{64})$.
+\end{enumerate}
+
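+Depending on the implementation, the source message can also be encoded in a base one larger than the alphabet size (here 4 instead of 3), reserving a digit for a special \verb|END-OF-DATA| symbol.
+The decoder looks for this symbol and stops reading from the input $q$ once it is found, so the message length does not need to be transmitted.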
+
 \section{LZW Algorithm}
 The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based compression method that dynamically builds a dictionary
 of recurring patterns in the data as compression proceeds. Unlike entropy-based methods such as Huffman or arithmetic coding,
@@ -301,16 +321,12 @@ When the dictionary becomes full, most implementations stop adding new entries,

 LZW has seen widespread practical deployment in compression standards and applications.
 The GIF image format uses LZW compression, as does the TIFF image format in some variants.
-The V.42bis modem compression standard incorporates LZW-like techniques.
-More recent variants such as LZSS, LZMA, and Deflate (used in ZIP and gzip)
-extend the LZW concept with additional refinements like literal-length-distance encoding
-and Huffman coding post-processing to achieve better compression ratios.

 The relationship between dictionary-based methods like LZW and entropy-based methods like Huffman
 is complementary rather than competitive. LZW excels at capturing structure and repetition,
 while entropy-based methods optimize symbol encoding based on probability distributions.
 This has led to hybrid approaches that combine both techniques, such as the Deflate algorithm,
-which uses LZSS (a variant of LZ77) followed by Huffman coding of the output.
+which uses LZSS (a variant of LZ77) followed by Huffman coding of the output to achieve better compression ratios.


 \printbibliography
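
The interval refinement added to compression.tex in the first hunk can be checked with a small sketch. This is an illustration under stated assumptions, not code from the repository: the names `PROBS`, `cumulative`, `encode_interval` and `decode` are hypothetical, the symbol order A, B, C is taken from the example, and exact `Fraction` arithmetic is used instead of the fixed-precision integers a practical coder would need.

```python
from fractions import Fraction

# Assumed model: the probabilities from the example, in the order A, B, C.
PROBS = {"A": Fraction(6, 8), "B": Fraction(1, 8), "C": Fraction(1, 8)}


def cumulative(probs):
    """Map each symbol to its half-open sub-interval [low, high) of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges


def encode_interval(message, probs=PROBS):
    """Narrow [0, 1) once per symbol and return the final interval [low, high)."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        s_low, s_high = ranges[sym]
        low += width * s_low           # shift to the symbol's sub-interval
        width *= s_high - s_low        # shrink by the symbol's probability
    return low, low + width


def decode(q, n, probs=PROBS):
    """Recover n symbols from a number q that lies inside the final interval."""
    ranges = cumulative(probs)
    out = []
    for _ in range(n):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= q < s_high:
                out.append(sym)
                # Rescale q so the chosen sub-interval becomes [0, 1) again.
                q = (q - s_low) / (s_high - s_low)
                break
    return "".join(out)
```

For the two-symbol message `"AB"`, `encode_interval` returns the interval from 9/16 to 21/32, which is the $[\frac{36}{64}, \frac{42}{64})$ of the enumerated steps in reduced form. Any number in that interval, for example 5/8, decodes back to `"AB"` when the length 2 is supplied.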