update

entropy.tex
@@ -2,11 +2,13 @@
\usepackage[utf8x]{inputenc}
\usepackage[margin=1in]{geometry} % Adjust margins
\usepackage{caption}
+\usepackage{wrapfig}
\usepackage{subcaption}
\usepackage{parskip} % dont indent after paragraphs, figures
\usepackage{xcolor}
%\usepackage{csquotes} % Recommended for biblatex
\usepackage{tikz}
+\usepackage{pgfplots}
\usepackage{float}
\usepackage{amsmath}
\PassOptionsToPackage{hyphens}{url}
@@ -55,17 +57,18 @@ $k_B$ refers to the Boltzmann constant, which he himself did not determine.
\textit{Claude Shannon} adapted the concept of entropy to information theory.
In an era of advancing communication technologies, the question he addressed was of increasing importance:
How can messages be encoded and transmitted efficiently?
-As a measure, Shannon's formula uses the \textit{Bit}, quantifying the efficiency of codes
-and media for transmission and storage.
-According to his axioms, a measure for information has to comply with the following criteria:
+He proposed four axioms that a measure of information would have to comply with:
+
\begin{enumerate}
\item $I(1) = 0$: events that always occur do not communicate information.
-\item $I(p)$ is monotonically decreasing in p: an increase in the probability of an event
+\item $I'(p) \leq 0$: $I(p)$ is monotonically decreasing in $p$; an increase in the probability of an event
decreases the information from an observed event, and vice versa.
\item $I(p_1 \cdot p_2) = I(p_1) + I(p_2)$: the information learned from independent events
is the sum of the information learned from each event.
\item $I(p)$ is a twice continuously differentiable function of p.
\end{enumerate}

+As a measure, Shannon's formula uses the \textit{Bit}, quantifying the efficiency of codes
+and media for transmission and storage.
In information theory, entropy can be understood as the expected information of a message.
\begin{equation}
H = E(I) = - \sum_i p_i \log_2(p_i)
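These axioms are satisfied by $I(p) = -\log_2(p)$, the quantity plotted further below in the information figure (fig:graph-information). As a minimal worked example of the entropy formula, assuming a binary source, i.e. a source with two possible messages:

\begin{align*}
H_{\text{fair}}   &= -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1~\text{bit} \\
H_{\text{skewed}} &= -0.9\log_2(0.9) - 0.1\log_2(0.1) \approx 0.14 + 0.33 \approx 0.47~\text{bits}
\end{align*}

The skewed source is more predictable, so the expected information per message is lower.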
@@ -78,6 +81,60 @@ that tomorrows message will be the same. When some day we get the message 'Vanco
not only semantically (because it announces the eruption of a volcano) but statistically because it was very unlikely
given the transmission history.

+However, uncertainty (entropy) in this situation would be relatively low.
+Because we attach high surprise only to the unlikely message of an eruption, the significantly more likely message
+carries less information; we already expected it before it arrived.
+
+Putting the axioms and our intuitive understanding of information and uncertainty together,
+\autoref{fig:graph-information} and \autoref{fig:graph-entropy} show both quantities as functions of the probability $p$.
+
+\begin{figure}[H]
+\begin{minipage}{.5\textwidth}
+\begin{tikzpicture}
+\begin{axis}[
+domain=0.001:1,
+samples=100,
+axis lines=middle,
+xlabel={$p$},
+ylabel={Information},
+xmin=0, xmax=1,
+ymin=0, ymax=6.1,
+grid=both,
+width=8cm,
+height=6cm,
+every axis x label/.style={at={(current axis.right of origin)}, anchor=west},
+every axis y label/.style={at={(current axis.above origin)}, anchor=south},
+]
+\addplot[thick, blue] {-log2(x)};
+\end{axis}
+\end{tikzpicture}
+\caption{Information contained in a message depending on its probability $p$}
+\label{fig:graph-information}
+\end{minipage}%
+\begin{minipage}{.5\textwidth}
+\begin{tikzpicture}
+\begin{axis}[
+domain=0.001:0.999,
+samples=100,
+axis lines=middle,
+xlabel={$p$},
+ylabel={Entropy},
+xmin=0, xmax=1,
+ymin=0, ymax=1.1,
+grid=both,
+width=8cm,
+height=6cm,
+every axis x label/.style={at={(current axis.right of origin)}, anchor=west},
+every axis y label/.style={at={(current axis.above origin)}, anchor=south},
+]
+\addplot[thick, blue] {-x * log2(x) - (1-x) * log2(1-x)};
+\end{axis}
+\end{tikzpicture}
+\caption{Entropy of an event source with two possible events, depending on their probabilities $(p, 1-p)$}
+\label{fig:graph-entropy}
+\end{minipage}
+\end{figure}
+
The base 2 is chosen for the logarithm as our computers rely on a system of the same base, but theoretically
arbitrary bases can be used as they are proportional according to $\log_a b = \frac{\log_c b}{\log_c a} $.

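A concrete instance of the change-of-base relation, assuming the natural logarithm (entropy measured in nats) as the alternative base:

\begin{equation*}
H_{\text{nats}} = \ln 2 \cdot H_{\text{bits}} \approx 0.693 \, H_{\text{bits}}, \qquad 1~\text{bit} \approx 0.693~\text{nats}.
\end{equation*}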
@@ -136,8 +193,23 @@ However, drawbacks include overfitting and poor robustness, where minimal altera
can lead to a change in tree structure.

\subsection{Cross-Entropy}
-Kullback-Leibler = $H(p,q) - H(p)$
-as a cost function in machine learning
+When dealing with two distributions, the \textit{cross-entropy} between a true distribution $p$
+and an estimated distribution $q$ is defined as:
+\begin{equation}
+H(p, q) = -\sum_x p(x) \log_2 q(x)
+\end{equation}
+The \textit{Kullback–Leibler divergence} measures how much information is lost when $q$
+is used to approximate $p$:
+\begin{equation}
+D_{KL}(p \| q) = H(p, q) - H(p)
+\end{equation}
+In machine learning, these quantities appear in many loss functions: the cross-entropy in classification
+(the cross-entropy loss) and the KL divergence in probabilistic models such as Variational Autoencoders (VAEs).
+In classification, the true and the predicted label distribution serve as $p$ and $q$, respectively.
+For a single supervised training example, the cross-entropy loss degenerates to $-\log_2 q_i$, since the
+true label vector is assumed to be the one-hot unit vector $e_i$.
+
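A short numerical sketch of the one-hot case above, assuming a hypothetical three-class example with true class $i = 1$ and model prediction $q = (0.7, 0.2, 0.1)$:

\begin{align*}
p &= e_1 = (1, 0, 0) \\
H(p, q) &= -1 \cdot \log_2(0.7) - 0 \cdot \log_2(0.2) - 0 \cdot \log_2(0.1) = -\log_2(0.7) \approx 0.51~\text{bits} \\
D_{KL}(p \| q) &= H(p, q) - H(p) = H(p, q) - 0 \approx 0.51~\text{bits}
\end{align*}

Because $H(p) = 0$ for a one-hot distribution, cross-entropy and KL divergence coincide here, so minimizing one is equivalent to minimizing the other in this setting.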
\subsection{Coding}
%Coding of a source of an information and communication channel
% https://www.youtube.com/watch?v=ErfnhcEV1O8