Set identification

In statistics and econometrics, '''set identification''' (or '''partial identification''') extends the concept of identifiability (or "point identification") in statistical models to environments where the model and the distribution of observable variables are not sufficient to determine a unique value for the model parameters, but instead constrain the parameters to lie in a strict subset of the parameter space. Statistical models that are set (or partially) identified arise in a variety of settings in economics, including game theory and the Rubin causal model. Unlike approaches that deliver point identification of the model parameters, methods from the literature on partial identification are used to obtain set estimates that are valid under weaker modelling assumptions.{{sfn|Tamer|2010}}

== History ==

Early works containing the main ideas of set identification include {{harvtxt|Frisch|1934}} and {{harvtxt|Marschak|Andrews|1944}}. However, the methods were significantly developed and promoted by Charles Manski, beginning with {{harvtxt|Manski|1989}} and {{harvtxt|Manski|1990}}.

Partial identification continues to be a major theme in research in econometrics. {{harvtxt|Powell|2017}} named partial identification as an example of theoretical progress in the econometrics literature, and {{harvtxt|Bonhomme|Shaikh|2017}} listed partial identification as “one of the most prominent recent themes in econometrics.”

== Definition ==

Let <math> U \in \mathcal{U} \subseteq \mathbb{R}^{d_{u}} </math> denote a vector of latent variables, let <math> Z \in \mathcal{Z} \subseteq \mathbb{R}^{d_{z}} </math> denote a vector of observed (possibly endogenous) explanatory variables, and let <math display="inline"> Y \in \mathcal{Y} \subseteq \mathbb{R}^{d_{y}} </math> denote a vector of observed endogenous outcome variables. A '''structure''' is a pair <math> s= (h,\mathcal{P}_{U\mid Z})</math>, where <math> \mathcal{P}_{U\mid Z} </math> represents a collection of conditional distributions, and <math> h </math> is a structural function such that <math> h(y,z,u) = 0 </math> for all realizations <math> (y,z,u) </math> of the random vectors <math> (Y,Z,U) </math>. A '''model''' is a collection of admissible (i.e. possible) structures <math> s </math>.<ref name=":0">{{Cite journal |title=Generalized Instrumental Variable Models - The Econometric Society |url=https://www.econometricsociety.org/publications/econometrica/2017/05/01/generalized-instrumental-variable-models |access-date=2024-01-05 |website=www.econometricsociety.org |language=en |doi=10.3982/ecta12223|url-access=subscription }}</ref><ref name=":1">{{Cite journal |last=Matzkin |first=Rosa L. |date=2013-08-02 |title=Nonparametric Identification in Structural Economic Models |url=https://www.annualreviews.org/doi/10.1146/annurev-economics-082912-110231 |journal=Annual Review of Economics |language=en |volume=5 |issue=1 |pages=457–486 |doi=10.1146/annurev-economics-082912-110231 |issn=1941-1383|url-access=subscription }}</ref>

Let <math> \mathcal{P}_{Y\mid Z}(s) </math> denote the collection of conditional distributions of <math> Y \mid Z </math> consistent with the structure <math> s </math>. The admissible structures <math> s </math> and <math> s' </math> are said to be '''observationally equivalent''' if <math> \mathcal{P}_{Y\mid Z}(s) = \mathcal{P}_{Y\mid Z}(s')</math>.<ref name=":0" /><ref name=":1" /> Let <math> s^\star </math> denote the true (i.e. data-generating) structure. The model is said to be '''point identified''' if for every admissible <math> s \neq s^\star </math> we have <math> \mathcal{P}_{Y\mid Z}(s) \neq \mathcal{P}_{Y\mid Z}(s^\star)</math>. More generally, the model is said to be '''set''' (or '''partially''') '''identified''' if there exists at least one admissible <math> s\neq s^\star </math> such that <math> \mathcal{P}_{Y\mid Z}(s)\neq \mathcal{P}_{Y\mid Z}(s^\star) </math>. The '''identified set''' of structures is the collection of admissible structures that are observationally equivalent to <math> s^\star </math>.{{sfn|Lewbel|2019}}

In many applications the definition can be substantially simplified. In particular, when <math> U </math> is independent of <math> Z </math> and has a distribution that is known up to a finite-dimensional parameter, and when <math> h </math> is known up to a finite-dimensional vector of parameters, each structure <math> s </math> can be characterized by a finite-dimensional parameter vector <math> \theta \in \Theta \subset \mathbb{R}^{d_{\theta}}</math>. If <math> \theta_0 </math> denotes the true (i.e. data-generating) vector of parameters, then the '''identified set''', often denoted as <math> \Theta_{I} \subset \Theta </math>, is the set of parameter values that are observationally equivalent to <math>\theta_0</math>.{{sfn|Lewbel|2019}}

== Example: missing data ==

This example is due to {{harvtxt|Tamer|2010}}. Suppose there are two binary random variables, {{math|''Y''}} and {{math|''Z''}}. The econometrician is interested in <math>\mathrm P(Y = 1)</math>. There is a missing data problem, however: {{math|''Y''}} can only be observed if <math>Z = 1</math>.

By the law of total probability,
:<math>\mathrm P(Y = 1) = \mathrm P(Y = 1 \mid Z = 1) \mathrm P(Z = 1) + \mathrm P(Y = 1 \mid Z = 0) \mathrm P(Z = 0).</math>
The only unknown object is <math>\mathrm P(Y = 1 \mid Z = 0)</math>, which is constrained to lie between 0 and 1. Therefore, the identified set is
:<math>\Theta_I = \{ p \in [0, 1] : p = \mathrm P(Y = 1 \mid Z = 1) \mathrm P(Z = 1) + q \mathrm P(Z = 0), \text{ for some } q \in [0,1]\}.</math>
Given the missing data constraint, the econometrician can only say that <math>\mathrm P(Y = 1) \in \Theta_I</math>. This makes use of all available information.
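
Because the expression for <math>\mathrm P(Y = 1)</math> is increasing in the unknown <math>q = \mathrm P(Y = 1 \mid Z = 0)</math>, the identified set is an interval whose endpoints are attained at <math>q = 0</math> and <math>q = 1</math>. The following Python sketch illustrates this calculation. The function name <code>missing_data_bounds</code> and the numerical inputs are hypothetical and chosen only for illustration; they do not come from {{harvtxt|Tamer|2010}}.

```python
from __future__ import annotations


def missing_data_bounds(p_y1_given_z1: float, p_z1: float) -> tuple[float, float]:
    """Return the (lower, upper) endpoints of the identified set for P(Y = 1).

    p_y1_given_z1 -- P(Y = 1 | Z = 1), identified from the observed outcomes
    p_z1          -- P(Z = 1), the probability that Y is observed
    """
    for p in (p_y1_given_z1, p_z1):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")

    observed_part = p_y1_given_z1 * p_z1  # P(Y = 1 | Z = 1) P(Z = 1)
    p_z0 = 1.0 - p_z1                     # P(Z = 0)

    lower = observed_part                 # q = P(Y = 1 | Z = 0) = 0
    upper = observed_part + p_z0          # q = P(Y = 1 | Z = 0) = 1
    return lower, upper


if __name__ == "__main__":
    # Illustrative (made-up) values: Y is observed 80% of the time,
    # and 60% of the observed outcomes equal 1.
    lower, upper = missing_data_bounds(p_y1_given_z1=0.6, p_z1=0.8)
    print(f"[{lower:.2f}, {upper:.2f}]")  # prints "[0.48, 0.68]"
```

The width of the interval equals <math>\mathrm P(Z = 0)</math>, so the bounds collapse to a single point only when the outcome is never missing.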

== Statistical inference ==

Set estimation cannot rely on the usual tools for statistical inference developed for point estimation. A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties. For example, a method developed by {{harvtxt|Chernozhukov|Hong|Tamer|2007}} constructs confidence regions that cover the identified set with a given probability.
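
As a rough illustration of what covering the identified set can mean, the Python sketch below builds an interval for the missing-data example by estimating the two endpoints of the identified set and widening each by a normal critical value. This is a simple Bonferroni-style construction assumed here for illustration under independent sampling and a large sample; it is not the procedure of {{harvtxt|Chernozhukov|Hong|Tamer|2007}}. The function name <code>identified_set_confidence_interval</code> and the sample data are hypothetical.

```python
from __future__ import annotations

import math
from statistics import NormalDist, mean, variance


def identified_set_confidence_interval(y_observed, z, level=0.95):
    """Interval intended to cover the whole identified set for P(Y = 1).

    y_observed -- outcomes; entries where z == 0 are ignored (may be None)
    z          -- 1 if the outcome is observed, 0 if it is missing
    level      -- nominal asymptotic coverage of the identified set
    """
    n = len(z)
    if n < 2 or len(y_observed) != n:
        raise ValueError("need two equal-length samples with at least 2 rows")

    # Per-observation terms whose means estimate the endpoints:
    # lower endpoint sets missing outcomes to 0, upper endpoint sets them to 1.
    lower_terms = [float(yi) if zi == 1 else 0.0 for yi, zi in zip(y_observed, z)]
    upper_terms = [float(yi) if zi == 1 else 1.0 for yi, zi in zip(y_observed, z)]

    lower_hat, upper_hat = mean(lower_terms), mean(upper_terms)
    se_lower = math.sqrt(variance(lower_terms) / n)
    se_upper = math.sqrt(variance(upper_terms) / n)

    # Each endpoint is allowed an error probability of (1 - level) / 2,
    # so the union bound gives joint coverage of at least `level`.
    crit = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)

    ci_lower = max(0.0, lower_hat - crit * se_lower)
    ci_upper = min(1.0, upper_hat + crit * se_upper)
    return ci_lower, ci_upper


if __name__ == "__main__":
    # Made-up sample: 100 units, 80 observed, 48 observed outcomes equal 1.
    z = [1] * 80 + [0] * 20
    y = [1] * 48 + [0] * 32 + [None] * 20
    print(identified_set_confidence_interval(y, z))
```

The resulting interval is wider than the estimated identified set on both sides, reflecting sampling uncertainty in addition to the identification problem.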

== Notes ==

{{reflist}}

== References ==

* {{cite journal | last1=Bonhomme | first1=Stephane | last2=Shaikh | first2=Azeem | title=Keeping the Econ in Econometrics: (Micro-)Econometrics in the Journal of Political Economy | journal=The Journal of Political Economy | volume=125 | issue=6 | date=2017 |pages=1846–1853 | doi=10.1086/694620}}
* {{cite journal | last1=Chernozhukov | first1=Victor |authorlink1=Victor Chernozhukov | last2=Hong | first2=Han | last3=Tamer | first3=Elie | title=Estimation and Confidence Regions for Parameter Sets in Econometric Models | journal=Econometrica | publisher=The Econometric Society | volume=75 | issue=5 | year=2007 | issn=0012-9682 | doi=10.1111/j.1468-0262.2007.00794.x | pages=1243–1284| hdl=1721.1/63545 | hdl-access=free }}
* {{cite book | last=Frisch | first=Ragnar |author-link=Ragnar Frisch | title=Statistical Confluence Analysis by means of Complete Regression Systems | publisher=University Institute of Economics, Oslo | date =1934 }}
* {{cite journal | last=Lewbel | first=Arthur |author-link=Arthur Lewbel | title=The Identification Zoo: Meanings of Identification in Econometrics | journal=Journal of Economic Literature | publisher=American Economic Association | volume=57 | issue=4 | date=2019-12-01 | issn=0022-0515 | doi=10.1257/jel.20181361 | pages=835–903 | s2cid=125792293 }}
* {{cite journal | last=Manski | first=Charles | title=Anatomy of the Selection Problem | journal=The Journal of Human Resources | volume=24 | issue=3 | date=1989 |pages=343–360 |doi=10.2307/145818| jstor=145818 }}
* {{cite journal | last=Manski | first=Charles | title=Nonparametric Bounds on Treatment Effects | journal=The American Economic Review | volume=80 | issue=2 | date=1990 |pages=319–323 | jstor = 2006592}}
* {{cite journal | last1=Marschak | first1=Jacob | last2=Andrews | first2=Williams | title=Random Simultaneous Equations and the Theory of Production | journal=Econometrica | publisher=The Econometric Society | volume=12 | issue=3/4 | date=1944 | doi=10.2307/1905432 | pages=143–205 | jstor=1905432 }}
* {{cite journal | last=Powell | first=James | title=Identification and Asymptotic Approximations: Three Examples of Progress in Econometric Theory | journal=Journal of Economic Perspectives | volume=31 | issue=2 | date=2017 |pages=107–124 |doi=10.1257/jep.31.2.107 |doi-access=free}}
* {{Cite journal| doi = 10.1146/annurev.economics.050708.143401| volume = 2| issue = 1| pages = 167–195| last = Tamer| first = Elie| title = Partial Identification in Econometrics| journal = Annual Review of Economics| date = 2010| url = https://nrs.harvard.edu/urn-3:HUL.InstRepos:34728615| url-access = subscription}}

== Further reading ==

* {{cite book | last1=Ho | first1=Kate | author-link=Kate Ho | last2=Rosen | first2=Adam M. | editor-last=Honore | editor-first=Bo |editor-link=Bo Honoré | editor-last2=Pakes | editor-first2=Ariel |editor-link2=Ariél Pakes | editor-last3=Piazzesi | editor-first3=Monika |editor-link3=Monika Piazzesi | editor-last4=Samuelson | editor-first4=Larry |editor-link4=Larry Samuelson | title=Advances in Economics and Econometrics | chapter=Partial Identification in Applied Research: Benefits and Challenges | year=2017 | pages=307–359 | publisher=Cambridge University Press | location=Cambridge | isbn=978-1-108-22722-3 | doi=10.1017/9781108227223.010 | url=https://www.nber.org/papers/w21641.pdf | chapter-url=https://scholar.princeton.edu/sites/default/files/kateho/files/wc-paper-05august2016.pdf}}
* {{Cite journal| doi = 10.1111/1468-0262.00144| issn = 0012-9682| volume = 68| issue = 4| pages = 997–1010| last1 = Manski| first1 = Charles F.| authorlink1 = Charles Manski | last2 = Pepper| first2 = John V.| title = Monotone Instrumental Variables: With an Application to the Returns to Schooling| journal = Econometrica| date = July 2000| jstor = 2999533| url = https://www.nber.org/papers/t0224.pdf}}
* {{Cite book| publisher = Springer-Verlag| isbn = 978-0-387-00454-9| last = Manski| first = Charles F.| author-link = Charles Manski | title = Partial Identification of Probability Distributions| location = New York| date = 2003}}

[[Category:Econometric modeling]]
[[Category:Estimation theory]]