Skip to content

Gaspardooo/Optimal_text_layout

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Optimal Text Justification: Dynamic Programming

Project Overview

This project solves the algorithmic problem of optimal text layout on a virtual typewriter. The objective is to distribute a list of words across fixed-length lines to minimize the overall "penalty", defined here as the sum of the squares of the remaining spaces at the end of each line.

The study was empirically applied to excerpts from Marcel Proust's work to validate the robustness and efficiency of the algorithms on large texts.

Technical & Algorithmic Skills

The project demonstrates the progression in solving an optimization problem by comparing three distinct approaches:

1. Greedy Algorithm

  • Implementation of a heuristic consisting of placing the maximum possible number of words on the current line before moving to the next.
  • Analysis of its limitations: this local method does not guarantee the global optimum and can generate highly unbalanced lines at the end of a paragraph.

2. Recursive Algorithm with Memoization (Top-Down)

  • Definition of the mathematical recurrence relation to calculate the optimal layout cost.
  • Optimization of the recursive search using memoization (caching intermediate results) to avoid redundant calculations and reduce time complexity.

3. Dynamic Programming (Bottom-Up)

  • Iterative implementation of optimal cost calculations with array storage, thereby avoiding recursion limit exceedances (Stack Overflow) on complete texts.
  • Solution reconstruction: reverse traversal of the cost array to retrieve and format the exact text lines.

Complexity Analysis

  • The project includes an experimental execution time measurement module (mesure_temps_exec) to plot time complexity curves based on the number of words processed.
  • Empirical proof that only dynamic programming can process complete texts within a reasonable timeframe.

Repository Structure

  • code.py: Main script containing text processing functions, algorithmic implementations, and complexity analysis.
  • recherche_p1.txt / recherche_complet.txt: Text files containing the input datasets (raw texts).
  • Rapport_Problème_2.pdf: Detailed report including the mathematical formalization of the recurrence relation and conclusions on performance.

About

Solving the optimal text layout problem (Text Justification) by minimizing the sum of squared spaces at the end of lines. Compares greedy, top-down recursive with memoization, and bottom-up dynamic programming approaches.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages