darwin

📖 Documentation: https://zaoqu-liu.github.io/darwin/

darwin is an R package for automatic marker gene selection using multi-objective evolutionary optimization. It implements the NSGA-II algorithm to identify Pareto-optimal gene subsets for bulk RNA-seq deconvolution.

✨ Features

  • Multi-objective optimization using the NSGA-II algorithm
  • High performance with C++ implementations via RcppArmadillo
  • Flexible input: Supports matrices, data.frames, Seurat V4/V5, and SingleCellExperiment objects
  • Multiple objectives: Correlation, distance, condition number, and custom functions
  • Built-in deconvolution: NNLS, NuSVR, and linear regression methods
  • Parallel computing support for large-scale problems
  • Cross-platform: Works on Windows, macOS, and Linux

📦 Installation

From R-universe (Recommended)

install.packages("darwin", repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# Install remotes if needed
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")

# Install darwin
remotes::install_github("Zaoqu-Liu/darwin")

🚀 Quick Start

library(darwin)

# Create reference expression matrix (cell types × genes)
set.seed(42)
reference <- matrix(abs(rnorm(500)), nrow = 5, ncol = 100)
rownames(reference) <- paste0("CellType", 1:5)
colnames(reference) <- paste0("Gene", 1:100)

# Initialize darwin
dw <- darwin(reference)

# Run optimization
dw$optimize(
  ngen = 100,                                  # Number of generations
  objectives = c("correlation", "distance"),  # Objectives to optimize
  weights = c(-1, 1)                           # Minimize corr, maximize dist
)

# Visualize Pareto front
dw$plot()

# Select optimal solution
dw$select(weights = c(-1, 1))

# Get selected genes
genes <- dw$get_genes()
print(genes)

# Perform deconvolution
bulk <- matrix(abs(rnorm(300)), nrow = 3, ncol = 100)
colnames(bulk) <- colnames(reference)
result <- dw$deconvolve(bulk, method = "nnls")
print(result$proportions)
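
To make the deconvolution step concrete, here is a minimal base-R sketch of non-negative least squares on a cell types × genes reference. It is not darwin's implementation: `deconvolve_sketch` is a hypothetical helper, and it swaps a dedicated NNLS solver for bound-constrained least squares via `optim(method = "L-BFGS-B", lower = 0)`. The final rescaling so that proportions sum to 1 is also an assumption.

```r
# Hypothetical helper, not part of darwin: estimate cell-type proportions
# for one bulk sample by least squares with non-negative coefficients.
# reference:   cell types x genes matrix
# bulk_sample: numeric vector over the same genes, in the same order
deconvolve_sketch <- function(reference, bulk_sample) {
  A <- t(reference)                                   # genes x cell types
  rss <- function(w) sum((A %*% w - bulk_sample)^2)   # residual sum of squares
  grad <- function(w) as.vector(2 * crossprod(A, A %*% w - bulk_sample))
  fit <- optim(
    par = rep(1 / ncol(A), ncol(A)),                  # start from equal weights
    fn = rss, gr = grad,
    method = "L-BFGS-B", lower = 0                    # enforce w >= 0
  )
  w <- fit$par
  names(w) <- rownames(reference)
  w / sum(w)                                          # rescale to proportions
}

# Noiseless toy mixture with known proportions
set.seed(1)
reference <- matrix(
  abs(rnorm(500)), nrow = 5, ncol = 100,
  dimnames = list(paste0("CellType", 1:5), paste0("Gene", 1:100))
)
true_prop <- c(0.4, 0.3, 0.15, 0.1, 0.05)
bulk_sample <- as.vector(true_prop %*% reference)
est <- deconvolve_sketch(reference, bulk_sample)
print(round(est, 3))
```

In practice the reference would first be restricted to the genes returned by `dw$get_genes()`, so that only the selected markers enter the fit.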

📚 Documentation

🔧 Seurat Integration

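# From a Seurat object (seurat_obj is assumed to be an existing Seurat
# object whose metadata has a "cell_type" column)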
dw <- darwin(
  seurat_obj,
  celltype_key = "cell_type",
  assay = "RNA",
  layer = "data"
)

# With highly variable genes only
dw <- darwin(
  seurat_obj,
  celltype_key = "cell_type",
  use_highly_variable = TRUE
)
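
Without a Seurat object, a cell types × genes reference like the one in the Quick Start can be assembled by hand from a cells × genes matrix. The sketch below averages expression per cell type in base R. The toy data and the names `expr` and `cell_type` are made up for illustration, and darwin may aggregate cells differently internally.

```r
# Hypothetical toy data: 60 cells x 20 genes, with one cell-type label per cell
set.seed(7)
expr <- matrix(
  abs(rnorm(60 * 20)), nrow = 60, ncol = 20,
  dimnames = list(paste0("Cell", 1:60), paste0("Gene", 1:20))
)
cell_type <- rep(c("B", "T", "NK"), each = 20)

# Sum expression per cell type, then divide by the number of cells of that type
sums <- rowsum(expr, group = cell_type)        # cell types x genes
counts <- table(cell_type)[rownames(sums)]     # cells per type, same row order
reference <- sums / as.vector(counts)          # per-type mean expression

print(dim(reference))
```

The resulting matrix has one row per cell type and one column per gene, which matches the layout passed to `darwin(reference)` in the Quick Start.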

📊 Supported Objective Functions

Objective   | Direction    | Description
------------|--------------|---------------------------------------------------
correlation | Minimize     | Total pairwise correlation between cell types
distance    | Maximize     | Total pairwise Euclidean distance
condition   | Minimize     | Condition number of reference matrix
Custom      | User-defined | Any function returning a scalar
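
As a rough illustration of what these objectives measure, the base-R sketch below scores one candidate gene subset. The function names are hypothetical and the exact definitions inside darwin may differ (for example, whether correlations are taken in absolute value). Treat this as one plausible reading of the table, not the package's code.

```r
# Hypothetical helpers, not darwin's API. Each takes a cell types x genes
# matrix restricted to a candidate gene subset and returns a single number.
objective_correlation <- function(ref) {
  cc <- cor(t(ref))                  # cell type x cell type correlations
  sum(cc[upper.tri(cc)])             # count each unordered pair once
}
objective_distance <- function(ref) {
  sum(dist(ref))                     # pairwise Euclidean distances between rows
}
objective_condition <- function(ref) {
  kappa(ref, exact = TRUE)           # ratio of largest to smallest singular value
}
# A custom objective only needs to return a scalar, e.g. mean expression
objective_custom <- function(ref) mean(ref)

set.seed(42)
reference <- matrix(
  abs(rnorm(500)), nrow = 5, ncol = 100,
  dimnames = list(paste0("CellType", 1:5), paste0("Gene", 1:100))
)
subset_genes <- sample(colnames(reference), 20)
sub <- reference[, subset_genes, drop = FALSE]

scores <- c(
  correlation = objective_correlation(sub),
  distance    = objective_distance(sub),
  condition   = objective_condition(sub),
  custom      = objective_custom(sub)
)
print(scores)
```

Low correlation and a low condition number both indicate that cell-type profiles are easy to tell apart on the chosen genes, while a large total distance indicates the same thing from the opposite direction, which is why the first two are minimized and the third is maximized.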

🔬 Methods

darwin uses NSGA-II (Non-dominated Sorting Genetic Algorithm II) for multi-objective optimization. Its main components are:

  1. Non-dominated sorting: Solutions ranked by Pareto dominance
  2. Crowding distance: Maintains diversity in the Pareto front
  3. Tournament selection: Balances exploitation and exploration
  4. Genetic operators: Crossover and mutation for solution evolution
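
The first two steps can be sketched in a few lines of base R. This is a textbook-style illustration and not darwin's C++ implementation: the function names are hypothetical, every objective is assumed to be minimized (a maximized objective can be negated first), and the simple quadratic-time front peeling is chosen for clarity over speed.

```r
# TRUE if solution a dominates solution b (all objectives minimized)
dominates <- function(a, b) all(a <= b) && any(a < b)

# Assign each row of obj (solutions x objectives) a Pareto front rank:
# rank 1 is the non-dominated set, rank 2 is non-dominated once rank 1
# is removed, and so on.
nondominated_rank <- function(obj) {
  rank <- rep(NA_integer_, nrow(obj))
  current <- 1L
  while (anyNA(rank)) {
    remaining <- which(is.na(rank))
    is_front <- vapply(remaining, function(i) {
      !any(vapply(remaining, function(j) dominates(obj[j, ], obj[i, ]),
                  logical(1)))
    }, logical(1))
    rank[remaining[is_front]] <- current
    current <- current + 1L
  }
  rank
}

# Crowding distance within a single front; boundary solutions get Inf
crowding_distance <- function(obj) {
  n <- nrow(obj)
  d <- numeric(n)
  for (m in seq_len(ncol(obj))) {
    ord <- order(obj[, m])
    span <- obj[ord[n], m] - obj[ord[1], m]
    d[ord[c(1, n)]] <- Inf
    if (n > 2 && span > 0) {
      inner <- 2:(n - 1)
      gap <- obj[ord[inner + 1], m] - obj[ord[inner - 1], m]
      d[ord[inner]] <- d[ord[inner]] + gap / span
    }
  }
  d
}

# Five toy solutions with two objectives each
obj <- rbind(c(1, 5), c(2, 3), c(4, 1), c(3, 4), c(5, 5))
ranks <- nondominated_rank(obj)
print(ranks)                                           # 1 1 1 2 3
print(crowding_distance(obj[ranks == 1, , drop = FALSE]))  # Inf 2 Inf
```

A binary tournament then typically prefers the solution with the lower rank and, on ties, the larger crowding distance, which is how the ranking feeds into step 3.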

📖 Citation

If you use darwin in your research, please cite:

@software{darwin,
  author = {Liu, Zaoqu},
  title = {darwin: Multi-Objective Gene Selection for Bulk Deconvolution},
  year = {2024},
  url = {https://github.com/Zaoqu-Liu/darwin}
}

The algorithm is based on:

Aliee, H., & Theis, F. J. (2021). AutoGeneS: Automatic gene selection using multi-objective optimization for RNA-seq deconvolution. Cell Systems, 12(7), 706-715.e4.

📄 License

MIT © Zaoqu Liu
