Three Econometric Footnotes | Hidden LLN, KL Divergence in MLE, and What Is Machine Learning?
## The Hidden Weight of GMM Consistency Conditions

Consider estimating a parameter $\theta \in \Theta$ from data $\lbrace w_i\rbrace_{i \in [N]}$. Assume:

1. The parameter space $\Theta \subset \R^K$ is compact.
2. The GMM criterion function
$$
s_N(\theta) = s(\vec w; \theta)
$$
is continuous in $\theta$ for every $\vec w$.
3. $s_N(\cdot)$ is well behaved, in the sense of uniform convergence in probability:
$$
\sup_{\theta\in \Theta}\left|s_N(\theta) - s_\infty(\theta)\right| \xrightarrow{p} 0.
$$
4. $s_\infty(\theta)$ has a unique minimum at $\theta_0$.

Then $\hat \theta_{GMM} \xrightarrow{p} \theta_0$.

The proof is essentially a topological argument: uniform convergence of continuous functions on a compact set, plus a unique minimum, pins down the limit of the minimizers. It is clean, elegant, and almost suspiciously general. ...
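The consistency argument above can be illustrated numerically. Below is a minimal sketch (not from the original post), assuming the simplest just-identified moment condition $E[w_i - \theta_0] = 0$ with identity weighting, so that $s_N(\theta) = \bar g_N(\theta)^2$ with $\bar g_N(\theta) = N^{-1}\sum_i (w_i - \theta)$. Minimizing the criterion over a grid on a compact $\Theta$ shows the minimizer drifting toward $\theta_0$ as $N$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
theta0 = 2.0  # true parameter (an illustrative choice)

# Compact parameter space Theta = [-10, 10], discretized for a grid search.
grid = np.linspace(-10.0, 10.0, 20001)

for N in [100, 10_000, 1_000_000]:
    # Data: w_i = theta0 + noise, so E[w_i - theta0] = 0 is the moment condition.
    w = theta0 + rng.standard_normal(N)

    # GMM criterion with identity weighting: s_N(theta) = (mean(w) - theta)^2.
    # It is continuous in theta, and by the LLN its uniform limit on the
    # compact grid is s_inf(theta) = (theta0 - theta)^2, minimized at theta0.
    crit = (w.mean() - grid) ** 2

    theta_hat = grid[np.argmin(crit)]
    print(f"N = {N:>9,d}  theta_hat = {theta_hat:+.4f}")
```

The "hidden LLN" of the section title is exactly the step that turns the pointwise averages `w.mean() - theta` into the deterministic limit criterion; compactness of the grid is what lets the `argmin` of the sample criterion track the `argmin` of the limit.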