A custom, high-performance implementation of Logistic Regression in Rust, designed to detect fake news articles. This project implements the machine learning algorithm from scratch without relying on heavy external ML frameworks, offering a transparent look at how text classification works under the hood.
- Custom Implementation: Logistic Regression algorithm implemented from scratch in pure Rust.
- Text Classification: Uses a Bag-of-Words model to classify news articles as "REAL" or "FAKE".
- Model Persistence: Automatically saves trained models and vocabulary to model.json for instant reuse.
- Interactive CLI: Real-time prediction interface to test headlines and article text.
- Interpretability: Extracts and displays the top keywords that influence the model's decision (e.g., the words most strongly associated with Fake vs. Real news).
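To make the Bag-of-Words and prediction steps concrete, here is a minimal sketch of how text could be turned into word counts and scored. The function names (`tokenize`, `build_vocabulary`, `vectorize`, `predict_proba`) and the tokenization rule are assumptions for illustration and may not match the code in src/; the only detail taken from this README is that the model outputs the chance of an article being REAL.

```rust
use std::collections::HashMap;

/// Lowercase the text and split on anything that is not alphanumeric.
/// (Hypothetical tokenizer; the project's own cleaning rules may differ.)
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Assign each distinct word an index in order of first appearance.
fn build_vocabulary(docs: &[&str]) -> HashMap<String, usize> {
    let mut vocab = HashMap::new();
    for doc in docs {
        for token in tokenize(doc) {
            let next = vocab.len();
            vocab.entry(token).or_insert(next);
        }
    }
    vocab
}

/// Count how often each vocabulary word occurs; unknown words are ignored.
fn vectorize(text: &str, vocab: &HashMap<String, usize>) -> Vec<f64> {
    let mut counts = vec![0.0; vocab.len()];
    for token in tokenize(text) {
        if let Some(&i) = vocab.get(&token) {
            counts[i] += 1.0;
        }
    }
    counts
}

/// Logistic function: maps any real number into (0, 1).
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Probability that the input is REAL: sigmoid(w . x + b).
fn predict_proba(weights: &[f64], bias: f64, features: &[f64]) -> f64 {
    let z: f64 = weights
        .iter()
        .zip(features)
        .map(|(w, x)| w * x)
        .sum::<f64>()
        + bias;
    sigmoid(z)
}
```

Under these assumptions, a headline is classified as REAL when `predict_proba` returns 0.5 or more and as FAKE otherwise. Word order is discarded, which is what makes this a Bag-of-Words model.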
Ensure you have Rust and Cargo installed.
- Clone the repository:

      git clone https://github.com/lazarcloud/rust-logistic-regression.git
      cd rust-logistic-regression

- Build the project:

      cargo build --release
The project expects training data in the data/ directory. You need two CSV files:
- data/Fake.csv
- data/True.csv
Note: The current CSV parser expects a specific simple format. Ensure your data matches the expected structure.
If no saved model exists (model.json), running the program will automatically start the training process:
    cargo run --release

Output:
- Loads data from data/Fake.csv and data/True.csv.
- Trains the Logistic Regression model (default: 1000 epochs).
- Evaluates accuracy on a test set (20% of data).
- Displays top 10 indicators for both Fake and Real news.
- Saves the model to model.json.
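The training and evaluation steps above can be sketched as follows. This is an illustrative version using plain batch gradient descent on the log-loss; the learning rate, the helper names (`logit`, `train`, `accuracy`, `train_test_split`), and the choice of batch rather than stochastic updates are assumptions, not a description of src/predict.rs. Labels are encoded as 1.0 for REAL and 0.0 for FAKE.

```rust
fn sigmoid(z: f64) -> f64 {
    1.0 / (1.0 + (-z).exp())
}

/// Linear score w . x + b for one example.
fn logit(weights: &[f64], bias: f64, x: &[f64]) -> f64 {
    weights.iter().zip(x).map(|(w, xi)| w * xi).sum::<f64>() + bias
}

/// Batch gradient descent on the log-loss.
/// `lr` (learning rate) is a hypothetical parameter chosen for illustration.
fn train(xs: &[Vec<f64>], ys: &[f64], epochs: usize, lr: f64) -> (Vec<f64>, f64) {
    let n = xs.len() as f64;
    let dim = xs.first().map_or(0, Vec::len);
    let mut weights = vec![0.0; dim];
    let mut bias = 0.0;
    for _ in 0..epochs {
        let mut grad_w = vec![0.0; dim];
        let mut grad_b = 0.0;
        for (x, &y) in xs.iter().zip(ys) {
            // The gradient of the log-loss w.r.t. the score is (prediction - label).
            let err = sigmoid(logit(&weights, bias, x)) - y;
            for (g, xi) in grad_w.iter_mut().zip(x) {
                *g += err * xi;
            }
            grad_b += err;
        }
        // Step against the averaged gradient.
        for (w, g) in weights.iter_mut().zip(&grad_w) {
            *w -= lr * g / n;
        }
        bias -= lr * grad_b / n;
    }
    (weights, bias)
}

/// Fraction of examples whose thresholded prediction matches the label.
fn accuracy(weights: &[f64], bias: f64, xs: &[Vec<f64>], ys: &[f64]) -> f64 {
    let mut correct = 0usize;
    for (x, &y) in xs.iter().zip(ys) {
        let predicted_real = sigmoid(logit(weights, bias, x)) >= 0.5;
        if predicted_real == (y >= 0.5) {
            correct += 1;
        }
    }
    correct as f64 / xs.len() as f64
}

/// Split into the first 80% (train) and last 20% (test).
/// Assumes the rows were shuffled beforehand.
fn train_test_split<T>(rows: &[T]) -> (&[T], &[T]) {
    rows.split_at(rows.len() * 8 / 10)
}
```

In this sketch, a call such as `train(&train_xs, &train_ys, 1000, 0.1)` would mirror the default of 1000 epochs, and `accuracy` would then be computed on the held-out 20%. Because Fake.csv and True.csv are loaded separately, the combined rows need shuffling before the split, or the test set would contain only one class.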
If a model.json file is present, the program launches into interactive mode:
    cargo run --release

You will be prompted to enter a news title and text:
Loaded existing model.
Enter news title and text (or type 'exit' to quit):
> "Breaking: Aliens land in Times Square!"
Prediction: 0.00% chance of being REAL news. Classified as: FAKE
To force retraining, simply delete the model.json file.
Accuracy: ~96.43% (on test set)
Top 10 Indicators of FAKE News:

    video      : -5.1904
    breaking   : -4.9621
    just       : -4.3035
    watch      : -4.1985
    gop        : -3.9382
    us         : -3.5669
    21st       : -3.1620
    isis       : -3.0849
    racist     : -2.9408
    century    : -2.9341

Top 10 Indicators of REAL News:

    washington : 5.7267
    factbox    : 4.5522
    u          : 3.5536
    moscow     : 3.2589
    york       : 2.7964
    ex         : 2.4858
    berlin     : 2.4630
    said       : 2.3459
    ipsos      : 2.3039
    says       : 2.2567
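Indicator lists like these can be derived directly from the learned weights: sort the vocabulary by weight and take both ends. The sketch below shows one way to do that; `top_indicators` and its signature are hypothetical and are not taken from the project source.

```rust
use std::collections::HashMap;

/// Return the `k` most negative-weight words (FAKE indicators) and the
/// `k` most positive-weight words (REAL indicators), strongest first.
fn top_indicators(
    vocab: &HashMap<String, usize>,
    weights: &[f64],
    k: usize,
) -> (Vec<(String, f64)>, Vec<(String, f64)>) {
    // Pair every word with the weight stored at its vocabulary index.
    let mut scored: Vec<(String, f64)> = vocab
        .iter()
        .map(|(word, &i)| (word.clone(), weights[i]))
        .collect();
    // Ascending by weight: most FAKE-leaning first, most REAL-leaning last.
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    let fake = scored.iter().take(k).cloned().collect();
    let real = scored.iter().rev().take(k).cloned().collect();
    (fake, real)
}
```

The sign convention follows from the model predicting the chance of REAL: each occurrence of a word adds its weight to the score before the sigmoid is applied. If features are raw word counts, a weight of -5.19 multiplies the odds of REAL by roughly e^-5.19, or about 0.006, per occurrence, while positive weights push the prediction toward REAL.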
- src/main.rs: Entry point. Handles the flow between training and prediction.
- src/predict.rs: Core Logistic Regression logic (training, prediction, evaluation).
- src/vocabulary.rs: Handles text processing and mapping words to numerical indices.
- src/csv.rs: Data loading and parsing.
- src/save.rs: Serialization and deserialization of the model.
- src/cleanup.rs: Data cleaning utilities.
- anyhow: Flexible error handling.
- serde & serde_json: For saving and loading the trained model.