arithmetic coding can achieve compression rates that approach the entropy of the source.
Its ability to handle non-integer bit lengths makes it particularly powerful
for applications requiring high compression efficiency.

In the basic form, a message is first written in the base of the alphabet with a leading `$0.$': $ \text{ABBCAB} = 0.011201_3$,
in this case yielding a ternary number, as the alphabet has size $ |\{A,B,C\}| = 3 $.
This number can then be encoded in the target base (usually 2) with sufficient precision to allow recovery of the original message, resulting in $0.0010110010_2$.
The decoder only receives the rational number $q$ and the length $n$ of the original message.
The encoding can then easily be reversed by converting $q$ back to base 3 and rounding to $n$ digits.
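The following is a minimal Python sketch of this base-conversion view, using exact fractions; the alphabet, the helper names, and the choice of 10 output bits are illustrative assumptions rather than part of a standard implementation.
\begin{verbatim}
from fractions import Fraction

ALPHABET = "ABC"  # illustrative fixed alphabet, A=0, B=1, C=2

def to_fraction(msg):
    # Read msg as the digits of 0.d1d2... in base |ALPHABET|.
    base = len(ALPHABET)
    q = Fraction(0)
    for i, sym in enumerate(msg, start=1):
        q += Fraction(ALPHABET.index(sym), base ** i)
    return q

def to_bits(q, n_bits):
    # Nearest n_bits-bit binary fraction to q; n_bits must be large
    # enough that the decoder's base-3 rounding still succeeds.
    k = round(q * 2 ** n_bits)
    return format(k, f"0{n_bits}b")

def decode(q, n):
    # Round q to n base-|ALPHABET| digits, then map digits to symbols.
    base = len(ALPHABET)
    k = round(q * base ** n)
    out = []
    for _ in range(n):
        k, d = divmod(k, base)
        out.append(ALPHABET[d])
    return "".join(reversed(out))

q = to_fraction("ABBCAB")             # 127/729
bits = to_bits(q, 10)                 # '0010110010'
q2 = Fraction(int(bits, 2), 2 ** 10)  # what the decoder receives
assert decode(q2, 6) == "ABBCAB"
\end{verbatim}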
In general, arithmetic coding can produce near-optimal output for any given source probability distribution.
This is achieved by adjusting the intervals that are interpreted as a given source symbol.
Given the source probabilities $p_A = \frac{6}{8}, p_B = p_C = \frac{1}{8}$, the intervals would be adjusted to
$ A = [0, \frac{6}{8}),\; B = [\frac{6}{8}, \frac{7}{8}),\; C = [\frac{7}{8}, 1) $.
Instead of transforming the base of the number and rounding to the appropriate precision, the encoder recursively refines the interval and in the end chooses a number inside that interval:
\begin{enumerate}
\item \textbf{Symbol A:} $A = [0, \frac{6}{8})$
\item $ A = [0, (\frac{6}{8})^2),\; B = [(\frac{6}{8})^2, \frac{7}{8} \cdot \frac{6}{8}),\; C = [\frac{7}{8} \cdot \frac{6}{8}, 1 \cdot \frac{6}{8}) $
\item \textbf{Symbol B:} $B = [(\frac{6}{8})^2, \frac{7}{8} \cdot \frac{6}{8}) = [\frac{36}{64}, \frac{42}{64})$
\end{enumerate}
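The refinement loop can be written directly with exact fractions; this minimal sketch assumes the probabilities above, and the names are illustrative:
\begin{verbatim}
from fractions import Fraction

PROBS = {"A": Fraction(6, 8), "B": Fraction(1, 8), "C": Fraction(1, 8)}

def subranges(probs):
    # Assign each symbol its slice [c, c + p) of the unit interval.
    ranges, c = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (c, c + p)
        c += p
    return ranges

def encode_interval(msg):
    ranges = subranges(PROBS)
    low, high = Fraction(0), Fraction(1)
    for sym in msg:
        width = high - low
        c_lo, c_hi = ranges[sym]
        low, high = low + width * c_lo, low + width * c_hi
    return low, high  # any number in [low, high) encodes msg

low, high = encode_interval("AB")
# Reproduces step 3 of the worked example above.
assert (low, high) == (Fraction(36, 64), Fraction(42, 64))
\end{verbatim}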
Depending on the implementation, the source message can also be encoded in a base one larger than the alphabet size (here base 4), reserving room for a special \verb|END-OF-DATA| symbol that the decoder
will look for and, upon finding it, stop reading from the input $q$; the length $n$ then no longer needs to be transmitted.
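A sketch of such a decoder in Python; the probabilities below, including the $\frac{1}{8}$ reserved for the end marker \verb|$|, are assumptions chosen for the example:
\begin{verbatim}
from fractions import Fraction

PROBS = {"A": Fraction(3, 8), "B": Fraction(2, 8),
         "C": Fraction(2, 8), "$": Fraction(1, 8)}

def decode_until_eod(q):
    out = []
    while True:
        c = Fraction(0)
        for sym, p in PROBS.items():
            if c <= q < c + p:      # q falls into this symbol's slice
                if sym == "$":      # END-OF-DATA: stop without knowing n
                    return "".join(out)
                out.append(sym)
                q = (q - c) / p     # rescale q into [0, 1) and continue
                break
            c += p

# 29/128 lies inside the interval produced by encoding "AB" plus the marker.
assert decode_until_eod(Fraction(29, 128)) == "AB"
\end{verbatim}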
\section{LZW Algorithm}
The Lempel-Ziv-Welch (LZW) algorithm is a dictionary-based compression method that dynamically builds a dictionary
of recurring patterns in the data as compression proceeds. Unlike entropy-based methods such as Huffman or arithmetic coding,
it requires no prior model of the symbol probabilities.
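The core loop can be sketched compactly in Python; this is an illustrative byte-oriented version, and a real codec would additionally pack the integer codes into a bit stream and implement the matching decompressor:
\begin{verbatim}
def lzw_compress(data):
    # Seed the dictionary with all 256 single-byte strings.
    dictionary = {bytes([i]): i for i in range(256)}
    w, codes = b"", []
    for b in data:
        wc = w + bytes([b])
        if wc in dictionary:
            w = wc                             # keep extending the match
        else:
            codes.append(dictionary[w])        # emit longest known match
            dictionary[wc] = len(dictionary)   # learn the new pattern
            w = bytes([b])
    if w:
        codes.append(dictionary[w])
    return codes

codes = lzw_compress(b"TOBEORNOTTOBEORTOBEORNOT")
\end{verbatim}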
When the dictionary becomes full, most implementations stop adding new entries, while others reset the dictionary and begin rebuilding it from the data that follows.

LZW has seen widespread practical deployment in compression standards and applications.
The GIF image format uses LZW compression, as does the TIFF image format in some variants.
The V.42bis modem compression standard incorporates LZW-like techniques.
Related Lempel-Ziv variants such as LZSS, LZMA, and Deflate (used in ZIP and gzip)
extend the underlying dictionary idea with additional refinements like literal-length-distance encoding
and Huffman coding post-processing to achieve better compression ratios.

The relationship between dictionary-based methods like LZW and entropy-based methods like Huffman
is complementary rather than competitive. LZW excels at capturing structure and repetition,
while entropy-based methods optimize symbol encoding based on probability distributions.
This has led to hybrid approaches that combine both techniques, such as the Deflate algorithm,
which uses LZSS (a variant of LZ77) followed by Huffman coding of the output.
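As an illustration of this hybrid pipeline, Deflate is available through Python's standard \verb|zlib| module; the input below is only a toy example:
\begin{verbatim}
import zlib

# Deflate: LZ77-style matching followed by Huffman coding
# of the resulting literal/length/distance stream.
raw = b"ABBCAB" * 100            # highly repetitive input
packed = zlib.compress(raw, 9)   # level 9 = maximum compression
assert zlib.decompress(packed) == raw
print(len(raw), "->", len(packed))   # packed form is far smaller
\end{verbatim}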
\printbibliography