# Value function

> Mediated Wiki article. Canonical URL: https://mediated.wiki/source/Value_function
> Markdown URL: https://mediated.wiki/source/Value_function.md
> Source: https://en.wikipedia.org/wiki/Value_function
> Source revision: 1346694828
> License: Creative Commons Attribution-ShareAlike 4.0 International (https://creativecommons.org/licenses/by-sa/4.0/)

Maximized objective function of an optimization problem

The **value function** of an [optimization problem](/source/Optimization_problem) gives the [value](/source/Value_(mathematics)) attained by the [objective function](/source/Objective_function) at a solution, while depending only on the [parameters](/source/Parameter) of the problem.[1][2] In a [controlled](/source/Control_theory) [dynamical system](/source/Dynamical_system), the value function represents the optimal payoff of the system over the interval $[t,t_{1}]$ when started at the time-$t$ [state variable](/source/State_variable) $x(t)=x$.[3] If the objective function represents some cost that is to be minimized, the value function can be interpreted as the cost to finish the optimal program, and is thus referred to as the "cost-to-go function".[4][5] In an economic context, where the objective function usually represents [utility](/source/Utility), the value function is conceptually equivalent to the [indirect utility function](/source/Indirect_utility_function).[6][7]
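The link to the indirect utility function can be illustrated with a small static problem. The following is a minimal sketch, assuming a two-good Cobb-Douglas utility $x_1^{a}x_2^{1-a}$ and a linear budget constraint $p_1x_1+p_2x_2=m$; these choices and all function names are illustrative assumptions, not taken from the cited sources. The value function depends only on the parameters $(p_1,p_2,m,a)$, and a brute-force search over budget shares should agree with the standard closed-form indirect utility.

```python
def utility(x1, x2, a):
    """Cobb-Douglas utility (assumed functional form for this example)."""
    return x1 ** a * x2 ** (1 - a)


def value_numeric(p1, p2, m, a, n=10001):
    """Approximate the value function by searching over budget shares.

    A share s of income m is spent on good 1 and the rest on good 2,
    so every candidate bundle exhausts the budget.
    """
    best = 0.0
    for i in range(n):
        s = i / (n - 1)
        x1 = s * m / p1
        x2 = (1 - s) * m / p2
        best = max(best, utility(x1, x2, a))
    return best


def value_closed_form(p1, p2, m, a):
    """Closed-form indirect utility for Cobb-Douglas preferences."""
    return (a / p1) ** a * ((1 - a) / p2) ** (1 - a) * m


# The maximized objective is a function of the parameters only.
numeric = value_numeric(2.0, 5.0, 100.0, 0.3)
exact = value_closed_form(2.0, 5.0, 100.0, 0.3)
```

The grid search never exceeds the closed form, and the gap shrinks as the grid is refined.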

In a problem of [optimal control](/source/Optimal_control), the value function is defined as the [supremum](/source/Supremum) of the objective function taken over the set of admissible controls. Given $(t_{0},x_{0})\in [0,t_{1}]\times \mathbb{R}^{d}$, a typical optimal control problem is to

$$\text{maximize}\quad J(t_{0},x_{0};u)=\int_{t_{0}}^{t_{1}}I(t,x(t),u(t))\,\mathrm{d}t+\phi(x(t_{1}))$$

subject to

$$\frac{\mathrm{d}x(t)}{\mathrm{d}t}=f(t,x(t),u(t))$$

with initial state variable $x(t_{0})=x_{0}$.[8] The objective function $J(t_{0},x_{0};u)$ is to be maximized over all admissible controls $u\in U[t_{0},t_{1}]$, where $u$ is a [Lebesgue measurable function](/source/Measurable_function) from $[t_{0},t_{1}]$ to some prescribed arbitrary set in $\mathbb{R}^{m}$. The value function is then defined as

$$V(t,x(t))=\max_{u\in U}\int_{t}^{t_{1}}I(\tau,x(\tau),u(\tau))\,\mathrm{d}\tau+\phi(x(t_{1}))$$

with $V(t_{1},x(t_{1}))=\phi(x(t_{1}))$, where $\phi(x(t_{1}))$ is the "scrap value". If the optimal pair of control and state trajectories is $(x^{\ast},u^{\ast})$, then $V(t_{0},x_{0})=J(t_{0},x_{0};u^{\ast})$. The function $h$ that gives the optimal control $u^{\ast}$ based on the current state $x$ is called a feedback control policy,[4] or simply a policy function.[9]
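The definitions above can be made concrete with a discrete-time analogue. The following is a minimal sketch, assuming time is split into a finite number of stages and that states and controls lie on small integer grids; the helper names `backward_induction` and `rollout`, and the particular payoff and dynamics, are hypothetical choices for illustration. Starting from the scrap value at the final stage, the value function and the policy function are computed backwards one stage at a time.

```python
def backward_induction(horizon, states, controls, step, running, scrap):
    """Return (V, policy) for a finite-horizon, discrete-time problem.

    V[k][x] is the best achievable payoff from state x at stage k;
    policy[k][x] is a control that attains it.
    """
    state_set = set(states)
    V = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for x in states:
        V[horizon][x] = scrap(x)  # terminal condition: V equals the scrap value
    for k in range(horizon - 1, -1, -1):
        for x in states:
            best_u, best_val = None, float("-inf")
            for u in controls:
                nxt = step(x, u)
                if nxt not in state_set:
                    continue  # control would leave the grid: not admissible
                val = running(k, x, u) + V[k + 1][nxt]
                if val > best_val:
                    best_u, best_val = u, val
            V[k][x] = best_val
            policy[k][x] = best_u
    return V, policy


def rollout(x0, policy, step, running, scrap):
    """Total payoff obtained by following the policy from state x0."""
    x, total = x0, 0
    for k, rule in enumerate(policy):
        u = rule[x]
        total += running(k, x, u)
        x = step(x, u)
    return total + scrap(x)


# Assumed example: steer an integer state towards 0 at quadratic cost.
states = list(range(-5, 6))
controls = [-1, 0, 1]
step = lambda x, u: x + u
running = lambda k, x, u: -(x * x + u * u)
scrap = lambda x: -(x * x)

V, policy = backward_induction(4, states, controls, step, running, scrap)
```

Following the computed policy from any grid state reproduces the stored value, which is the discrete counterpart of $V(t_0,x_0)=J(t_0,x_0;u^{\ast})$.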

Bellman's principle of optimality roughly states that any optimal policy at time $t$, $t_{0}\leq t\leq t_{1}$, taking the current state $x(t)$ as the "new" initial condition, must be optimal for the remaining problem. If the value function happens to be [continuously differentiable](/source/Differentiable_function),[10] this gives rise to an important [partial differential equation](/source/Partial_differential_equation) known as the [Hamilton–Jacobi–Bellman equation](/source/Hamilton%E2%80%93Jacobi%E2%80%93Bellman_equation),

$$-\frac{\partial V(t,x)}{\partial t}=\max_{u}\left\{I(t,x,u)+\frac{\partial V(t,x)}{\partial x}f(t,x,u)\right\}$$

where the [maximand](https://en.wiktionary.org/wiki/maximand) on the right-hand side can also be re-written as the [Hamiltonian](/source/Hamiltonian_(control_theory)), $H(t,x,u,\lambda)=I(t,x,u)+\lambda(t)f(t,x,u)$, as

$$-\frac{\partial V(t,x)}{\partial t}=\max_{u}H(t,x,u,\lambda)$$

with $\partial V(t,x)/\partial x=\lambda(t)$ playing the role of the [costate variables](/source/Costate_variable).[11] Given this definition, we further have $\mathrm{d}\lambda(t)/\mathrm{d}t=\partial^{2}V(t,x)/\partial x\partial t+\partial^{2}V(t,x)/\partial x^{2}\cdot f(x)$, and after differentiating both sides of the HJB equation with respect to $x$,

$$-\frac{\partial^{2}V(t,x)}{\partial t\partial x}=\frac{\partial I}{\partial x}+\frac{\partial^{2}V(t,x)}{\partial x^{2}}f(x)+\frac{\partial V(t,x)}{\partial x}\frac{\partial f(x)}{\partial x}$$

which after replacing the appropriate terms recovers the [costate equation](/source/Costate_equation)

$$-\dot{\lambda}(t)=\underbrace{\frac{\partial I}{\partial x}+\lambda(t)\frac{\partial f(x)}{\partial x}}_{=\frac{\partial H}{\partial x}}$$

where $\dot{\lambda}(t)$ is [Newton notation](/source/Newton_notation) for the derivative with respect to time.[12]
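As a worked illustration of the derivation above, consider a scalar linear-quadratic problem. The running payoff, dynamics, zero scrap value, and the quadratic guess for $V$ are all assumptions chosen for this example; they do not come from the cited sources. Under these assumptions the Hamilton–Jacobi–Bellman equation reduces to an ordinary differential equation for a single coefficient $p(t)$, and the costate equation can be checked directly.

```latex
\begin{align*}
&\text{Problem (assumed): } I(t,x,u) = -(x^{2}+u^{2}), \quad f(t,x,u) = u, \quad \phi(x) = 0. \\
&\text{Guess: } V(t,x) = -p(t)\,x^{2}, \quad p(t_{1}) = 0, \quad\text{so}\quad
  \frac{\partial V}{\partial t} = -\dot{p}(t)\,x^{2}, \quad
  \frac{\partial V}{\partial x} = -2p(t)\,x. \\
&\text{Maximizer: } \frac{\partial}{\partial u}\left[-x^{2}-u^{2}+\frac{\partial V}{\partial x}\,u\right] = 0
  \;\Rightarrow\; u^{\ast} = \tfrac{1}{2}\frac{\partial V}{\partial x} = -p(t)\,x. \\
&\text{HJB: } \dot{p}(t)\,x^{2} = -x^{2} - p(t)^{2}x^{2} + 2p(t)^{2}x^{2}
  \;\Rightarrow\; \dot{p}(t) = p(t)^{2} - 1
  \;\Rightarrow\; p(t) = \tanh(t_{1}-t). \\
&\text{Costate: } \lambda(t) = -2p(t)\,x(t), \quad \dot{x} = -p\,x
  \;\Rightarrow\; \dot{\lambda} = -2\dot{p}\,x - 2p\,\dot{x}
  = -2(p^{2}-1)\,x + 2p^{2}x = 2x = -\frac{\partial H}{\partial x}.
\end{align*}
```

Here $\partial H/\partial x = \partial I/\partial x + \lambda\,\partial f/\partial x = -2x$, so the last line agrees with the costate equation, and the resulting feedback rule is $u^{\ast} = -\tanh(t_{1}-t)\,x$.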

The value function is the unique [viscosity solution](/source/Viscosity_solution) to the Hamilton–Jacobi–Bellman equation.[13] In an [online](/source/Online_algorithm) closed-loop approximate optimal control, the value function is also a [Lyapunov function](/source/Lyapunov_function) that establishes global asymptotic stability of the closed-loop system.[14]
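The Lyapunov property can be illustrated in a simpler, exact setting than the approximate online schemes mentioned above. The following is a minimal sketch, assuming the infinite-horizon problem of minimizing $\int_0^{\infty}(x^{2}+u^{2})\,\mathrm{d}t$ subject to $\dot{x}=u$, for which the cost-to-go is $V(x)=x^{2}$ and the optimal feedback is $u=-x$; the problem, the forward Euler discretization, and the function names are assumptions made for this example. Along the closed-loop trajectory the value function should decrease monotonically towards zero.

```python
def value(x):
    """Cost-to-go V(x) = x^2 for the assumed scalar problem."""
    return x * x


def simulate_closed_loop(x0, dt=1e-3, steps=5000):
    """Forward-Euler simulation of dx/dt = u under the feedback u = -x."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        u = -x          # optimal feedback policy for the assumed problem
        x = x + dt * u  # one Euler step of the dynamics dx/dt = u
        xs.append(x)
    return xs


trajectory = simulate_closed_loop(2.0)
values = [value(x) for x in trajectory]
```

Since $\mathrm{d}V/\mathrm{d}t = 2x\dot{x} = -2x^{2}$ is negative away from the origin, the sequence `values` is strictly decreasing in this example.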

## References

1. [Fleming, Wendell H.](/source/Wendell_Fleming); Rishel, Raymond W. (1975). [*Deterministic and Stochastic Optimal Control*](https://books.google.com/books?id=qJDbBwAAQBAJ&pg=PA81). New York: Springer. pp. 81–83. [ISBN](/source/ISBN_(identifier)) [0-387-90155-8](https://en.wikipedia.org/wiki/Special:BookSources/0-387-90155-8).

2. Caputo, Michael R. (2005). [*Foundations of Dynamic Economic Analysis: Optimal Control Theory and Applications*](https://books.google.com/books?id=XZ2yYSVKWJkC&pg=PA185). New York: Cambridge University Press. p. 185. [ISBN](/source/ISBN_(identifier)) [0-521-60368-4](https://en.wikipedia.org/wiki/Special:BookSources/0-521-60368-4).

3. Weber, Thomas A. (2011). *Optimal Control Theory: with Applications in Economics*. Cambridge: The MIT Press. p. 82. [ISBN](/source/ISBN_(identifier)) [978-0-262-01573-8](https://en.wikipedia.org/wiki/Special:BookSources/978-0-262-01573-8).

4. Bertsekas, Dimitri P.; Tsitsiklis, John N. (1996). *Neuro-Dynamic Programming*. Belmont: Athena Scientific. p. 2. [ISBN](/source/ISBN_(identifier)) [1-886529-10-8](https://en.wikipedia.org/wiki/Special:BookSources/1-886529-10-8).

5. ["EE365: Dynamic Programming"](https://stanford.edu/class/ee365/lectures/dp.pdf#page=3) (PDF).

6. [Mas-Colell, Andreu](/source/Andreu_Mas-Colell); [Whinston, Michael D.](/source/Michael_Whinston); Green, Jerry R. (1995). *Microeconomic Theory*. New York: Oxford University Press. p. 964. [ISBN](/source/ISBN_(identifier)) [0-19-507340-1](https://en.wikipedia.org/wiki/Special:BookSources/0-19-507340-1).

7. Corbae, Dean; Stinchcombe, Maxwell B.; Zeman, Juraj (2009). [*An Introduction to Mathematical Analysis for Economic Theory and Econometrics*](https://books.google.com/books?id=j5P83LtzVO8C&pg=PA145). Princeton University Press. p. 145. [ISBN](/source/ISBN_(identifier)) [978-0-691-11867-3](https://en.wikipedia.org/wiki/Special:BookSources/978-0-691-11867-3).

8. [Kamien, Morton I.](/source/Morton_Kamien); Schwartz, Nancy L. (1991). *Dynamic Optimization: The Calculus of Variations and Optimal Control in Economics and Management* (2nd ed.). Amsterdam: North-Holland. p. 259. [ISBN](/source/ISBN_(identifier)) [0-444-01609-0](https://en.wikipedia.org/wiki/Special:BookSources/0-444-01609-0).

9. [Ljungqvist, Lars](/source/Lars_Ljungqvist); [Sargent, Thomas J.](/source/Thomas_J._Sargent) (2018). [*Recursive Macroeconomic Theory*](https://books.google.com/books?id=Jm1qDwAAQBAJ&pg=PA106) (4th ed.). Cambridge: MIT Press. p. 106. [ISBN](/source/ISBN_(identifier)) [978-0-262-03866-9](https://en.wikipedia.org/wiki/Special:BookSources/978-0-262-03866-9).

10. Benveniste and [Scheinkman](/source/Jos%C3%A9_Scheinkman) established sufficient conditions for the differentiability of the value function, which in turn allows an application of the [envelope theorem](/source/Envelope_theorem); see Benveniste, L. M.; Scheinkman, J. A. (1979). "On the Differentiability of the Value Function in Dynamic Models of Economics". *Econometrica*. **47** (3): 727–732. [doi](/source/Doi_(identifier)):[10.2307/1910417](https://doi.org/10.2307%2F1910417). [JSTOR](/source/JSTOR_(identifier)) [1910417](https://www.jstor.org/stable/1910417). Also see Seierstad, Atle (1982). "Differentiability Properties of the Optimal Value Function in Control Theory". *Journal of Economic Dynamics and Control*. **4**: 303–310. [doi](/source/Doi_(identifier)):[10.1016/0165-1889(82)90019-7](https://doi.org/10.1016%2F0165-1889%2882%2990019-7).

11. Kirk, Donald E. (1970). *Optimal Control Theory*. Englewood Cliffs, NJ: Prentice-Hall. p. 88. [ISBN](/source/ISBN_(identifier)) [0-13-638098-0](https://en.wikipedia.org/wiki/Special:BookSources/0-13-638098-0).

12. Zhou, X. Y. (1990). "Maximum Principle, Dynamic Programming, and their Connection in Deterministic Control". *Journal of Optimization Theory and Applications*. **65** (2): 363–373. [doi](/source/Doi_(identifier)):[10.1007/BF01102352](https://doi.org/10.1007%2FBF01102352). [S2CID](/source/S2CID_(identifier)) [122333807](https://api.semanticscholar.org/CorpusID:122333807).

13. Theorem 10.1 in Bressan, Alberto (2019). ["Viscosity Solutions of Hamilton-Jacobi Equations and Optimal Control Problems"](http://personal.psu.edu/axb62/PSPDF/HJlnotes19.pdf#page=54) (PDF). *Lecture Notes*.

14. Kamalapurkar, Rushikesh; Walters, Patrick; Rosenfeld, Joel; Dixon, Warren (2018). ["Optimal Control and Lyapunov Stability"](https://books.google.com/books?id=R3haDwAAQBAJ&pg=PA27). *Reinforcement Learning for Optimal Feedback Control: A Lyapunov-Based Approach*. Berlin: Springer. pp. 26–27. [ISBN](/source/ISBN_(identifier)) [978-3-319-78383-3](https://en.wikipedia.org/wiki/Special:BookSources/978-3-319-78383-3).

## Further reading

- Caputo, Michael R. (2005). ["Necessary and Sufficient Conditions for Isoperimetric Problems"](https://books.google.com/books?id=XZ2yYSVKWJkC&pg=PA174). *Foundations of Dynamic Economic Analysis : Optimal Control Theory and Applications*. New York: Cambridge University Press. pp. 174–210. [ISBN](/source/ISBN_(identifier)) [0-521-60368-4](https://en.wikipedia.org/wiki/Special:BookSources/0-521-60368-4).

- Clarke, Frank H.; Loewen, Philip D. (1986). "The Value Function in Optimal Control: Sensitivity, Controllability, and Time-Optimality". *SIAM Journal on Control and Optimization*. **24** (2): 243–263. [doi](/source/Doi_(identifier)):[10.1137/0324014](https://doi.org/10.1137%2F0324014).

- LaFrance, Jeffrey T.; Barney, L. Dwayne (1991). ["The Envelope Theorem in Dynamic Optimization"](http://ageconsearch.umn.edu/record/259398/files/agecon-montanastate-003.pdf) (PDF). *Journal of Economic Dynamics and Control*. **15** (2): 355–385. [doi](/source/Doi_(identifier)):[10.1016/0165-1889(91)90018-V](https://doi.org/10.1016%2F0165-1889%2891%2990018-V).

- Stengel, Robert F. (1994). ["Conditions for Optimality"](https://books.google.com/books?id=jDjPxqm7Lw0C&pg=PA201). *Optimal Control and Estimation*. New York: Dover. pp. 201–222. [ISBN](/source/ISBN_(identifier)) [0-486-68200-5](https://en.wikipedia.org/wiki/Special:BookSources/0-486-68200-5).

---
Adapted from the Wikipedia article [Value function](https://en.wikipedia.org/wiki/Value_function) by Wikipedia contributors ([contributor history](https://en.wikipedia.org/wiki/Value_function?action=history)). Available under [Creative Commons Attribution-ShareAlike 4.0 International](https://creativecommons.org/licenses/by-sa/4.0/). Changes may have been made.