Hidden Markov Models (HMMs) with Chow–Liu tree emissions for modeling seasonal rainfall patterns in India
This project is inspired by this paper:
Kirshner, S., Smyth, P., & Robertson, A. W. (2004). Conditional Chow–Liu tree structures for modeling discrete-valued vector time series.
This project applies latent variable modeling and structure learning to ten years of daily binary rainfall events across 54 weather stations in India. The goal is to uncover large-scale seasonal rainfall regimes and to understand how spatial dependency patterns change across these regimes. A Hidden Markov Model captures the temporal dynamics of latent seasonal states, while each state's spatial structure is learned using a Chow–Liu tree [5].
For more details, see Report.pdf.
Latent temporal states:
The HMM models daily rainfall occurrence as a sequence of hidden states, with each of the three latent states representing a distinct seasonal rainfall pattern.
Each animation visualizes binary rainfall occurrence over all days assigned to one hidden state:
The plot below shows, for a single year, the inferred latent seasonal state sequence (left) alongside the actual daily rainfall observations across all stations (right). Each day is assigned to the state with the highest posterior probability from the forward–backward algorithm, illustrating how the HMM segments the year into transitional, monsoon, and dry periods.
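The per-day state assignment described above can be sketched as follows. This is a minimal illustration of posterior decoding with the forward–backward algorithm, not the project's actual code: the function name `posterior_decode` and its arguments are hypothetical, and the per-state emission log-probabilities are assumed to have been computed already (for example, from each state's Chow–Liu tree).

```python
import numpy as np

def posterior_decode(log_emis, trans, init):
    """Return per-day state posteriors and the most probable state per day.

    log_emis: (T, K) array of log p(x_t | z_t = k)   (assumed precomputed)
    trans:    (K, K) array, trans[i, j] = p(z_{t+1} = j | z_t = i)
    init:     (K,) array of initial state probabilities
    """
    T, K = log_emis.shape
    # Rescale each day's emissions so the largest is 1. A per-day constant
    # cancels when the posterior is normalized, and this avoids underflow.
    emis = np.exp(log_emis - log_emis.max(axis=1, keepdims=True))

    # Forward pass: alpha[t] is proportional to p(z_t, x_1..x_t).
    alpha = np.zeros((T, K))
    alpha[0] = init * emis[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ trans) * emis[t]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta[t] is proportional to p(x_{t+1}..x_T | z_t).
    beta = np.ones((T, K))
    for t in range(T - 2, -1, -1):
        beta[t] = trans @ (emis[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Posterior marginals p(z_t | x_1..x_T), then the per-day argmax.
    gamma = alpha * beta
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma, gamma.argmax(axis=1)
```

Note that taking the argmax of each day's posterior marginal is not the same as Viterbi decoding, which finds the single most probable state sequence; the two can disagree on individual days.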
A complete ten-year sequence of observed rainfall and inferred hidden states is included here.
Learned spatial dependencies:
The emission distribution p(x | z = k) for each latent state k is modeled using a Chow–Liu tree.
- Node colors represent the average rainfall frequency at each station across all days assigned to the corresponding latent state; edge width indicates the mutual information between the two connected stations.
- z = 0 corresponds to the transitional season, z = 1 to the rainy (monsoon) season, and z = 2 to the dry season.
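For readers unfamiliar with the Chow–Liu procedure [5], the sketch below shows the two steps behind such a tree: estimate the mutual information between every pair of stations, then take a maximum-weight spanning tree over those values. It is an illustrative NumPy version, not the pyGMs-based implementation used in this project. The function names are hypothetical, and the optional `weights` argument is an assumption about how per-day state responsibilities from EM could be plugged in.

```python
import numpy as np

def pairwise_mutual_information(X, weights=None):
    """Empirical mutual information (in nats) between all pairs of columns.

    X:       (T, N) binary array, one row per day, one column per station
    weights: optional (T,) non-negative day weights (hypothetical hook for
             state responsibilities); uniform if omitted
    """
    X = np.asarray(X, dtype=float)
    T, N = X.shape
    w = np.ones(T) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()
    mi = np.zeros((N, N))
    for a in (0, 1):
        Xa = (X == a).astype(float)
        pa = w @ Xa                              # p(x_i = a), shape (N,)
        for b in (0, 1):
            Xb = (X == b).astype(float)
            pb = w @ Xb                          # p(x_j = b), shape (N,)
            pab = (Xa * w[:, None]).T @ Xb       # p(x_i = a, x_j = b), (N, N)
            outer = np.outer(pa, pb)
            mask = pab > 0                       # treat 0 * log(0) as 0
            term = np.zeros_like(pab)
            term[mask] = pab[mask] * np.log(pab[mask] / outer[mask])
            mi += term
    np.fill_diagonal(mi, 0.0)                    # self-pairs are never tree edges
    return mi

def chow_liu_edges(mi):
    """Maximum-weight spanning tree over the MI matrix (Prim's algorithm).

    Returns a list of (parent, child, mutual_information) tuples.
    """
    N = mi.shape[0]
    in_tree = np.zeros(N, dtype=bool)
    in_tree[0] = True                            # arbitrary root
    best_w = mi[0].copy()                        # best MI linking each node to the tree
    best_parent = np.zeros(N, dtype=int)
    edges = []
    for _ in range(N - 1):
        cand = np.where(in_tree, -np.inf, best_w)
        j = int(cand.argmax())                   # strongest link to a new node
        p = int(best_parent[j])
        edges.append((p, j, float(mi[p, j])))
        in_tree[j] = True
        better = mi[j] > best_w
        best_w = np.where(better, mi[j], best_w)
        best_parent = np.where(better, j, best_parent)
    return edges
```

Calling `chow_liu_edges(pairwise_mutual_information(X))` on the days assigned to one state gives one tree per state; the returned mutual information values are the kind of quantity that edge width encodes in the figures.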
Compared to a single global Chow–Liu tree learned on the full ten-year dataset, the EM-trained HMM with state-dependent Chow–Liu emissions improved mean pseudo-log-likelihood per day from −18.59 to −15.89 (about 15%) while yielding more coherent and regionally consistent spatial dependency patterns.
This project was made as the final project for my Graphical Models class at UCI under Prof. Alexander Ihler.
The Baum–Welch learning algorithm was borrowed from the CS179 class material.
The Chow–Liu implementation was borrowed and modified from the pyGMs library (see pyGMs-license.txt).
Data: The rainfall data come from the NOAA GHCN-Daily dataset (India, 1985–1995).
[1] Kirshner, S., Smyth, P., & Robertson, A. W. (2004). Conditional Chow–Liu tree structures for modeling discrete-valued vector time series. UAI’04. https://arxiv.org/abs/1207.4142
[2] Ihler, A. T., et al. Graphical Models for Statistical Inference and Data Assimilation. https://www.ics.uci.edu/~ihler/papers/physd07.pdf
[3] pyGMs Library (Alexander Ihler). https://github.com/ihler/pyGMs
[4] Learning from Data Notebook (pyGMs; Alexander Ihler). https://github.com/ihler/pyGMs/blob/master/notebooks/06%20Learning%20from%20Data.ipynb
[5] Chow, C., & Liu, C. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, 14(3), 462–467. https://ieeexplore.ieee.org/document/1054142





