This application implements a Markov Decision Process (MDP) to determine optimal navigation policies in a grid-based environment using the Value Iteration algorithm. The environment is represented as a 2D grid where each cell corresponds to a state with an associated reward value, initialized in the `states` array. Users can define sources (start points), destinations (goals), and obstacles (negative rewards) through an interactive interface.
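As a rough illustration of this setup, the reward grid might be initialized along the following lines; the grid size, reward values, and the example scenario below are assumptions for the sketch, not values taken from the project:

```python
import numpy as np

# Illustrative constants; the actual project may use different values.
GRID_SIZE = 10           # 10 x 10 grid of states (assumption)
STEP_REWARD = -0.04      # small living cost for ordinary cells (assumption)
GOAL_REWARD = 1.0        # reward at the destination (assumption)
OBSTACLE_PENALTY = -1.0  # negative reward for obstacle cells (assumption)

# Each cell holds the immediate reward R(s) for that state.
states = np.full((GRID_SIZE, GRID_SIZE), STEP_REWARD)

# Example scenario: one goal and a few obstacles.
goal = (7, 8)
obstacles = [(3, 3), (3, 4), (4, 4)]

states[goal] = GOAL_REWARD
for cell in obstacles:
    states[cell] = OBSTACLE_PENALTY
```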
The core algorithm iteratively updates the utility of each state according to the Bellman equation

U(s) = R(s) + γ · max_a Σ_s' P(s'|s, a) · U(s')

where U(s) is the utility of state s, R(s) is the immediate reward, γ (the discount factor, set to 0.6) determines the importance of future rewards, and P(s'|s, a) represents the transition probability from state s to s' via action a. The expected utility of actions is computed with smoothing to avoid abrupt direction changes, using a weighted expected utility of the form

EU(s, act) = a · U(s'_act) + ((1 − a) / 2) · (U(s'_left) + U(s'_right))

where s'_act is the neighbor reached by the chosen action, s'_left and s'_right are its perpendicular neighbors, and a (adjustable via a trackbar) controls the stochastic behavior of transitions.
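A minimal sketch of this update loop, assuming four compass actions and border clamping for edge and corner cells, could look like the following; the helper names `neighbor`, `expected_utility`, and `value_iteration`, and the default smoothing weight a = 0.8, are illustrative and not taken from the project:

```python
import numpy as np

ACTIONS = {              # action -> (row offset, col offset)
    "up":    (-1, 0),
    "down":  ( 1, 0),
    "left":  ( 0, -1),
    "right": ( 0, 1),
}
PERPENDICULAR = {        # the two directions blended in for smoothing
    "up":    ("left", "right"),
    "down":  ("left", "right"),
    "left":  ("up", "down"),
    "right": ("up", "down"),
}

def neighbor(state, action, shape):
    """Return the cell reached by `action`, clamped at the border so
    edge and corner states never index outside the grid."""
    r, c = state
    dr, dc = ACTIONS[action]
    return (min(max(r + dr, 0), shape[0] - 1),
            min(max(c + dc, 0), shape[1] - 1))

def expected_utility(U, state, action, a):
    """Smoothed expected utility: weight `a` on the intended move and
    (1 - a)/2 on each perpendicular move."""
    main = U[neighbor(state, action, U.shape)]
    p1, p2 = PERPENDICULAR[action]
    side = U[neighbor(state, p1, U.shape)] + U[neighbor(state, p2, U.shape)]
    return a * main + (1 - a) / 2 * side

def value_iteration(states, gamma=0.6, a=0.8, tol=1e-4):
    """Iterate the Bellman update: U(s) = R(s) + gamma * max over actions
    of EU(s, action), until the largest change falls below `tol`."""
    U = np.zeros_like(states, dtype=float)
    while True:
        delta = 0.0
        for r in range(states.shape[0]):
            for c in range(states.shape[1]):
                best = max(expected_utility(U, (r, c), act, a)
                           for act in ACTIONS)
                new_u = states[r, c] + gamma * best
                delta = max(delta, abs(new_u - U[r, c]))
                U[r, c] = new_u
        if delta < tol:
            return U
```

With the `states` grid from the first sketch, `U = value_iteration(states)` would return the converged utilities.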
After utility convergence, an optimal policy is extracted, mapping each state to its best next action, stored in the `Policy` array. The agent follows this policy from the start state to the goal, with real-time visualization of paths and utilities in an interactive grid interface.
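Continuing the sketch above (and reusing its `ACTIONS`, `neighbor`, and `expected_utility` helpers), policy extraction and path following might be organized as follows; `extract_policy` and `follow_policy` are illustrative names:

```python
import numpy as np

def extract_policy(states, U, a=0.8):
    """Greedy policy: for each state, pick the action with the highest
    smoothed expected utility under the converged utilities U."""
    Policy = np.empty(states.shape, dtype=object)
    for r in range(states.shape[0]):
        for c in range(states.shape[1]):
            Policy[r, c] = max(
                ACTIONS,
                key=lambda act: expected_utility(U, (r, c), act, a))
    return Policy

def follow_policy(Policy, start, goal, max_steps=200):
    """Walk from the start state toward the goal, recording visited cells."""
    path, state = [start], start
    for _ in range(max_steps):
        if state == goal:
            break
        state = neighbor(state, Policy[state], Policy.shape)
        path.append(state)
    return path
```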
Additional features include heuristic adjustments (`circumspect()`), random scenario generation, and logging of path data and performance metrics such as total distance and minimum distance to the target. Edge and corner cases are handled separately to prevent boundary errors.
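The logged metrics could be computed along these lines; the exact definitions used here (total distance as the number of grid moves, minimum distance as the closest Euclidean approach to the target) are assumptions for the sketch:

```python
import math

def log_path_metrics(path, goal):
    """Summarize a traversed path: total distance as the number of grid
    moves, and the closest Euclidean approach to the target cell."""
    total_distance = len(path) - 1
    min_distance = min(math.dist(cell, goal) for cell in path)
    return {"total_distance": total_distance,
            "min_distance_to_target": min_distance}

# Hypothetical usage:
# metrics = log_path_metrics([(0, 0), (0, 1), (1, 1)], goal=(1, 1))
```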