FREDA-CV

This repository contains the source code of our research article:

Privacy Preserving Federated Unsupervised Domain Adaptation with Application to Age Prediction from DNA Methylation Data

FREDA-CV is a simplified and task-agnostic implementation of our federated domain adaptation framework originally introduced in the FREDA repository.

The original FREDA implementation relies on domain similarity information, which may not be available in real-world scenarios. FREDA-CV instead tunes the lambda parameter by cross-validation, which makes the method usable and reproducible across datasets without prior knowledge of inter-domain relationships.
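To make the idea of cross-validated lambda selection concrete, here is a minimal, generic sketch. It is not the procedure implemented in this repository: the one-dimensional ridge model, the function names (`k_fold_indices`, `fit_ridge_1d`, `select_lambda`), and the candidate grid are all assumptions chosen only to keep the example self-contained.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and deal them into n_folds roughly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression without intercept (illustrative model)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def select_lambda(xs, ys, candidates, n_folds=5, seed=0):
    """Return the candidate lambda with the lowest mean validation MSE."""
    folds = k_fold_indices(len(xs), n_folds, seed)
    best_lam, best_err = None, float("inf")
    for lam in candidates:
        fold_errors = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(xs)) if i not in held]
            w = fit_ridge_1d([xs[i] for i in train], [ys[i] for i in train], lam)
            mse = sum((ys[i] - w * xs[i]) ** 2 for i in held_out) / len(held_out)
            fold_errors.append(mse)
        mean_err = sum(fold_errors) / len(fold_errors)
        if mean_err < best_err:
            best_lam, best_err = lam, mean_err
    return best_lam

# Toy data: y = 2x plus a little noise
rng = random.Random(1)
xs = [rng.uniform(-1, 1) for _ in range(50)]
ys = [2 * x + rng.gauss(0, 0.1) for x in xs]
best = select_lambda(xs, ys, candidates=[0.0, 0.1, 1.0, 10.0], n_folds=5)
print("selected lambda:", best)
```

Here `n_folds` plays the role that `--cv_folds` plays on the command line; the model that FREDA-CV actually fits in each fold may differ.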

Usage

The following arguments can be configured when running the main.py script:

| Argument | Description | Default Value |
|---|---|---|
| `--setup` | Number of source clients to simulate. | `2` |
| `--use_precomputed_confs` | Whether to use precomputed confidence scores. | `True` |
| `--cv_folds` | Number of folds used in the cross-validation that selects lambda. | `5` |
| `--use_precomputed_lambdas` | Whether to use precomputed optimal lambdas. | `True` |
| `--lambda_path` | Path to a text file containing lambda values. If not provided, default values are used. | `None` |
| `--home_path` | Root directory for the project. Can be set to any desired path. | `Current directory` |
| `--alpha` | Weighting factor for the loss function. | `0.8` |
| `--epochs` | Number of local training epochs. | `20` |
| `--global_iterations` | Number of global iterations. | `100` |
| `--lr_init` | Initial learning rate. | `0.0001` |
| `--lr_final` | Final learning rate. | `0.00001` |
| `--k_value` | Exponent of the weight function that transforms confidences into weights. | `3` |
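The exact weight function behind `--k_value` is not spelled out here. As a rough illustration only, the sketch below assumes the form `(1 - c) ** k`, so that highly confident features receive a small weight. Both that formula and the helper name `confidences_to_weights` are assumptions; check the source code for the definition actually used.

```python
def confidences_to_weights(confidences, k=3):
    """Map confidences in [0, 1] to weights using the ASSUMED form (1 - c) ** k."""
    weights = []
    for c in confidences:
        if not 0.0 <= c <= 1.0:
            raise ValueError(f"confidence out of range: {c}")
        weights.append((1.0 - c) ** k)
    return weights

print(confidences_to_weights([0.0, 0.5, 1.0], k=3))  # [1.0, 0.125, 0.0]
```

Under this assumed form, a larger `k` pushes the weights of mid-confidence features closer to zero, which sharpens the contrast between low- and high-confidence features.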

Example Command

Here’s an example of how to run the experiment with sample arguments:

python main.py --setup 2 --use_precomputed_confs False --cv_folds 5 --use_precomputed_lambdas False --lambda_path ./lambdas.txt --home_path ./FREDA-CV/ --alpha 0.8 --epochs 20 --global_iterations 100 --lr_init 0.0001 --lr_final 0.00001 --k_value 3
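The command above passes booleans as the strings `True` and `False`. How main.py parses them is not shown here; the sketch below is a hypothetical parser covering a few of the documented flags. It uses an explicit string-to-boolean converter because `argparse` with `type=bool` would treat the string `"False"` as true. The names `str2bool` and `build_parser` are illustrative, not taken from the repository.

```python
import argparse

def str2bool(value):
    """Convert common textual booleans to bool, rejecting anything else."""
    if isinstance(value, bool):
        return value
    text = value.strip().lower()
    if text in {"true", "t", "yes", "y", "1"}:
        return True
    if text in {"false", "f", "no", "n", "0"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

def build_parser():
    """Hypothetical parser mirroring some flags and defaults from the table."""
    parser = argparse.ArgumentParser(description="Illustrative FREDA-CV style arguments")
    parser.add_argument("--setup", type=int, default=2)
    parser.add_argument("--use_precomputed_confs", type=str2bool, default=True)
    parser.add_argument("--cv_folds", type=int, default=5)
    parser.add_argument("--use_precomputed_lambdas", type=str2bool, default=True)
    parser.add_argument("--lambda_path", type=str, default=None)
    parser.add_argument("--alpha", type=float, default=0.8)
    return parser

# Parse an explicit argument list instead of sys.argv so the snippet runs anywhere
args = build_parser().parse_args(
    ["--setup", "2", "--use_precomputed_confs", "False", "--cv_folds", "5"]
)
print(args.use_precomputed_confs, args.use_precomputed_lambdas)  # False True
```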

Data

This project includes a utility to prepare federated domain adaptation datasets from any tabular .csv file or pandas.DataFrame.

You can use the generate_federated_simulation_data function (in generate_federated_data.py) to split your dataset across multiple source clients and a single target client. The data is saved in individual folders in plain .txt format.

⚠️ Note: Your dataset must include a column specifying the domain of each sample (e.g., region, hospital, tissues).

How to use

from generate_federated_data import generate_federated_simulation_data
import pandas as pd

# Load your dataset
df = pd.read_csv("your_dataset.csv")

# The DataFrame must contain a domain column; add one if it is missing
df["domain"] = ...  # e.g., region, hospital, tissues, etc.

# Generate federated simulation data
generate_federated_simulation_data(
    df=df,
    label_column="your_label_column",
    domain_column="domain",
    num_clients=3,
    output_dir="data",
    target_domains=["target_domain_1"],  # one or more domain values to act as the target
)

This will create the following folder structure:

data/
├── 0/
│   ├── x_train.txt
│   └── y_train.txt
├── 1/
│   └── ...
├── 2/
└── target/
    ├── x_train.txt
    └── y_train.txt
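To read the generated folders back, a loader along the following lines may be useful. It is a sketch under stated assumptions: the files are assumed to hold numeric values with one sample per line, separated by whitespace or commas, and without a header row. The helper names `read_matrix` and `load_federated_data` are hypothetical and not part of this repository.

```python
from pathlib import Path

def read_matrix(path):
    """Read a text file into a list of float rows, splitting on whitespace or commas."""
    rows = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            rows.append([float(tok) for tok in line.replace(",", " ").split()])
    return rows

def load_federated_data(root):
    """Return {folder_name: (x_rows, y_values)} for each folder holding both files."""
    data = {}
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        x_file, y_file = folder / "x_train.txt", folder / "y_train.txt"
        if not (x_file.exists() and y_file.exists()):
            continue  # skip folders that do not follow the expected layout
        x_rows = read_matrix(x_file)
        y_values = [row[0] for row in read_matrix(y_file)]  # assumes one label per line
        if len(x_rows) != len(y_values):
            raise ValueError(
                f"{folder.name}: {len(x_rows)} samples but {len(y_values)} labels"
            )
        data[folder.name] = (x_rows, y_values)
    return data
```

Calling `load_federated_data("data")` would then return one entry per source client (`"0"`, `"1"`, `"2"`) plus `"target"`, provided the files match the assumed format.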
