
rust-deep-learning/rust-logistic-regression


Rust Logistic Regression for Fake News Detection

A custom, high-performance implementation of Logistic Regression in Rust, designed to detect fake news articles. This project implements the machine learning algorithm from scratch without relying on heavy external ML frameworks, offering a transparent look at how text classification works under the hood.

Features

  • Custom Implementation: Logistic Regression algorithm implemented from scratch in pure Rust.
  • Text Classification: Uses a Bag-of-Words model to classify news articles as "REAL" or "FAKE".
  • Model Persistence: Automatically saves the trained model and its vocabulary to model.json so they can be reused without retraining.
  • Interactive CLI: Real-time prediction interface to test headlines and article text.
  • Interpretability: Extracts and displays the top keywords that influence the model's decision (i.e., the words most strongly associated with fake vs. real news).
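The Bag-of-Words step can be pictured with a minimal, std-only sketch: every distinct word gets a column index, and an article becomes a vector of word counts. The tokenizer rule (lowercase, split on non-alphanumeric characters) and the names `tokenize` and `Vocabulary` are assumptions for illustration; the actual logic in src/vocabulary.rs and src/cleanup.rs may differ.

```rust
use std::collections::HashMap;

/// Hypothetical tokenizer: lowercase the text, split on any character that is
/// not alphanumeric, and drop the empty pieces left between separators.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Hypothetical vocabulary: maps each distinct word to a feature (column) index.
struct Vocabulary {
    index: HashMap<String, usize>,
}

impl Vocabulary {
    /// Assigns indices in order of first appearance across the corpus.
    fn build(docs: &[&str]) -> Self {
        let mut index = HashMap::new();
        for doc in docs {
            for token in tokenize(doc) {
                let next = index.len();
                index.entry(token).or_insert(next);
            }
        }
        Vocabulary { index }
    }

    fn len(&self) -> usize {
        self.index.len()
    }

    /// Bag-of-Words vector: the count of each known word.
    /// Words not seen while building the vocabulary are ignored.
    fn vectorize(&self, text: &str) -> Vec<f64> {
        let mut counts = vec![0.0; self.index.len()];
        for token in tokenize(text) {
            if let Some(&i) = self.index.get(&token) {
                counts[i] += 1.0;
            }
        }
        counts
    }
}
```

Word order is discarded entirely, which is what makes the representation a "bag"; the model only sees how often each vocabulary word occurs.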

Installation

Ensure you have Rust and Cargo installed.

  1. Clone the repository:

    git clone https://github.com/lazarcloud/rust-logistic-regression.git
    cd rust-logistic-regression
  2. Build the project:

    cargo build --release

Data Setup

The project expects training data in the data/ directory. You need two CSV files:

  • data/Fake.csv
  • data/True.csv

Note: The current CSV parser (src/csv.rs) is intentionally simple and only handles a specific format. Make sure your files match the structure it expects before training.
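One plausible way to turn the two files into a single labeled dataset is to tag each article by the file it came from, then shuffle before holding out the 20% test split mentioned under Usage. In this hypothetical sketch, FAKE articles get label 0.0 and REAL articles 1.0 (an assumption that matches the "chance of being REAL" phrasing of the predictions), and a small xorshift generator stands in for the `rand` crate so the example needs only the standard library. CSV parsing itself is left out because the expected format is defined by src/csv.rs.

```rust
/// Hypothetical labeling convention: FAKE = 0.0, REAL = 1.0.
fn label_documents(fake: Vec<String>, real: Vec<String>) -> Vec<(String, f64)> {
    fake.into_iter()
        .map(|text| (text, 0.0))
        .chain(real.into_iter().map(|text| (text, 1.0)))
        .collect()
}

/// Fisher-Yates shuffle driven by a xorshift64 generator, followed by a split
/// that moves the last `test_fraction` of the shuffled items into the test set.
fn shuffle_and_split<T>(mut data: Vec<T>, seed: u64, test_fraction: f64) -> (Vec<T>, Vec<T>) {
    let mut state = seed.max(1); // xorshift gets stuck at 0, so avoid a zero seed
    for i in (1..data.len()).rev() {
        state ^= state << 13;
        state ^= state >> 7;
        state ^= state << 17;
        let j = (state % (i as u64 + 1)) as usize; // random index in 0..=i
        data.swap(i, j);
    }
    let n_test = (data.len() as f64 * test_fraction).round() as usize;
    let test = data.split_off(data.len() - n_test);
    (data, test) // (train, test)
}
```

Shuffling before splitting matters here because the articles arrive grouped by file; splitting without it would put only REAL articles in the test set.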

Usage

Training

If no saved model (model.json) exists, running the program automatically starts training:

cargo run --release

During training, the program:

  • Loads data from data/Fake.csv and data/True.csv.
  • Trains the Logistic Regression model (default: 1000 epochs).
  • Evaluates accuracy on a held-out test set (20% of the data).
  • Displays top 10 indicators for both Fake and Real news.
  • Saves the model to model.json.
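The training and evaluation steps correspond to standard logistic regression: the model predicts p = sigmoid(w·x + b) and adjusts w and b to reduce the log-loss. The sketch below uses full-batch gradient descent; the struct name, zero initialization, and learning rate are assumptions, and src/predict.rs may use a different optimizer, schedule, or regularization.

```rust
/// Logistic (sigmoid) function: maps any real score to a probability in (0, 1).
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Hypothetical model struct: one weight per vocabulary word plus a bias.
struct LogisticRegression {
    weights: Vec<f64>,
    bias: f64,
}

impl LogisticRegression {
    fn new(n_features: usize) -> Self {
        Self { weights: vec![0.0; n_features], bias: 0.0 }
    }

    /// P(label = 1 | x) = sigmoid(w . x + b)
    fn predict_proba(&self, x: &[f64]) -> f64 {
        let z: f64 = self.weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + self.bias;
        sigmoid(z)
    }

    /// Full-batch gradient descent on the mean log-loss.
    /// d/dw_j = mean((p - y) * x_j), d/db = mean(p - y).
    fn fit(&mut self, xs: &[Vec<f64>], ys: &[f64], epochs: usize, learning_rate: f64) {
        let n = xs.len() as f64;
        for _ in 0..epochs {
            let mut grad_w = vec![0.0; self.weights.len()];
            let mut grad_b = 0.0;
            for (x, &y) in xs.iter().zip(ys) {
                let error = self.predict_proba(x) - y;
                for (g, xi) in grad_w.iter_mut().zip(x) {
                    *g += error * xi;
                }
                grad_b += error;
            }
            for (w, g) in self.weights.iter_mut().zip(&grad_w) {
                *w -= learning_rate * g / n;
            }
            self.bias -= learning_rate * grad_b / n;
        }
    }

    /// Fraction of examples whose thresholded prediction (p >= 0.5) matches the label.
    fn accuracy(&self, xs: &[Vec<f64>], ys: &[f64]) -> f64 {
        let correct = xs
            .iter()
            .zip(ys)
            .filter(|&(x, &y)| (self.predict_proba(x) >= 0.5) == (y == 1.0))
            .count();
        correct as f64 / xs.len() as f64
    }
}
```

With a Bag-of-Words vocabulary the weight vector can be large, so each epoch costs one pass over all training vectors; the 1000-epoch default from the list above would be passed as `epochs`.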

Prediction (Interactive Mode)

If a model.json file is present, the program launches into interactive mode:

cargo run --release

You will be prompted to enter a news title and text:

Loaded existing model.
Enter news title and text (or type 'exit' to quit):
> "Breaking: Aliens land in Times Square!"
Prediction: 0.00% chance of being REAL news. Classified as: FAKE
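The prediction line above can be reproduced from the model's output probability. This hypothetical helper assumes an article is classified as REAL when the probability is at least 0.5 (the README does not state the threshold), and that the loop stops on the literal input `exit`.

```rust
/// Hypothetical formatter mirroring the interactive output line.
/// Assumes a 0.5 decision threshold.
fn format_prediction(p_real: f64) -> String {
    let label = if p_real >= 0.5 { "REAL" } else { "FAKE" };
    format!(
        "Prediction: {:.2}% chance of being REAL news. Classified as: {}",
        p_real * 100.0,
        label
    )
}

/// Hypothetical check for the quit command; ignores surrounding whitespace
/// such as the trailing newline returned by reading a line from stdin.
fn is_exit_command(line: &str) -> bool {
    line.trim() == "exit"
}
```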

To force retraining, simply delete the model.json file.
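The switch between training and interactive mode amounts to a file-existence check on model.json. A minimal std-only sketch, using a hypothetical `Mode` enum and `startup_mode` function rather than the actual code in src/main.rs:

```rust
use std::path::Path;

/// Hypothetical startup modes.
#[derive(Debug, PartialEq)]
enum Mode {
    Train,
    Interactive,
}

/// A missing model file means "train and save"; an existing one means
/// "load and start the interactive prompt".
fn startup_mode(model_path: &Path) -> Mode {
    if model_path.exists() {
        Mode::Interactive
    } else {
        Mode::Train
    }
}
```

Calling it as `startup_mode(Path::new("model.json"))` also explains why deleting the file forces retraining on the next run.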

Performance & Insights

Accuracy: ~96.43% on the held-out test set

Top 10 Indicators of FAKE News:

  • video: -5.1904
  • breaking: -4.9621
  • just: -4.3035
  • watch: -4.1985
  • gop: -3.9382
  • us: -3.5669
  • 21st: -3.1620
  • isis: -3.0849
  • racist: -2.9408
  • century: -2.9341

Top 10 Indicators of REAL News:

  • washington: 5.7267
  • factbox: 4.5522
  • u: 3.5536
  • moscow: 3.2589
  • york: 2.7964
  • ex: 2.4858
  • berlin: 2.4630
  • said: 2.3459
  • ipsos: 2.3039
  • says: 2.2567
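Judging by the signs above, negative weights push a prediction toward FAKE and positive weights toward REAL, so both lists can be read off one sorted weight vector. A hypothetical sketch (`top_indicators` is not taken from the project's source):

```rust
/// Pair every vocabulary word with its learned weight, sort ascending, and
/// return the k lowest (strongest FAKE signals) and the k highest (strongest
/// REAL signals), each ordered from strongest to weakest.
fn top_indicators(
    words: &[&str],
    weights: &[f64],
    k: usize,
) -> (Vec<(String, f64)>, Vec<(String, f64)>) {
    let mut pairs: Vec<(String, f64)> = words
        .iter()
        .zip(weights)
        .map(|(word, &w)| (word.to_string(), w))
        .collect();
    // total_cmp gives a total order on f64, so sorting never panics on NaN.
    pairs.sort_by(|a, b| a.1.total_cmp(&b.1));
    let fake: Vec<_> = pairs.iter().take(k).cloned().collect();
    let real: Vec<_> = pairs.iter().rev().take(k).cloned().collect();
    (fake, real)
}
```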

Project Structure

  • src/main.rs: Entry point. Handles the flow between training and prediction.
  • src/predict.rs: Core Logistic Regression logic (training, prediction, evaluation).
  • src/vocabulary.rs: Handles text processing and mapping words to numerical indices.
  • src/csv.rs: Data loading and parsing.
  • src/save.rs: Serialization and deserialization of the model.
  • src/cleanup.rs: Data cleaning utilities.

Dependencies

  • anyhow: Flexible error handling.
  • serde & serde_json: For saving and loading the trained model.

About

Logistic Regression for fake news detection implemented in Rust
