Skip to content

add dfe ci plots#31

Merged
xin-huang merged 1 commit into
mainfrom
dev_2
Jun 5, 2026
Merged

add dfe ci plots#31
xin-huang merged 1 commit into
mainfrom
dev_2

Conversation

@nomis-c

@nomis-c nomis-c commented May 16, 2026

Copy link
Copy Markdown
Collaborator

No description provided.

@xin-huang

Copy link
Copy Markdown
Owner

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 772479a3db

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread workflow/scripts/visualization/merge_dfe_ci.py Outdated
Comment thread workflow/scripts/visualization/plot_dfe_params.py Outdated
@xin-huang

Copy link
Copy Markdown
Owner

please address the above two bugs

@xin-huang

Copy link
Copy Markdown
Owner

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 990f17ee20

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread workflow/scripts/visualization/merge_dfe_ci.py
Comment thread workflow/scripts/visualization/plot_dfe_params.py Outdated
@xin-huang

Copy link
Copy Markdown
Owner

I got this error from the great age dataset

line 131, in _run_infer_dm
    args.lbounds = check_params(args.lbounds, args.model, "--lbounds", args.misid)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Exception:
Found 3 demographic parameters from the option --lbounds; however, 2 demographic parameters are required from the two_epoch model
You might be using the wrong model or need to add --nomisid if you did not use ancestral allele information to polarize the fs.

I think this is because no ancestral allele information is provided for the great ape data, so for multiple datasets, the anc_allele flag is True only if all datasets have ancestral allele information. We could print a message to remind users and let them know which dataset does not have ancestral alleles.

@nomis-c

nomis-c commented May 18, 2026

Copy link
Copy Markdown
Collaborator Author

I encountered this issue before.
#16 (comment)

I wonder if adding anc_only=True to rule merge_dfe_confidence_intervals could work like this:

rule merge_dfe_confidence_intervals:
    input:
        bestfit_files=expand_1pop(
            "results/dadi/{species}/{dataset}/dfe/{ppl}/InferDFE/{ppl}.{ref_genome}.{demog}.{dfe}.InferDFE.bestfits",
            anc_only=True, **DADI_1D_KW, dfe=dadi_config["dfe_1d"], 
        ),
        ci_files=expand_1pop(
            "results/dadi/{species}/{dataset}/dfe/{ppl}/StatDFE/{ppl}.{ref_genome}.{demog}.{dfe}.godambe.ci",
            anc_only=True, **DADI_1D_KW, dfe=dadi_config["dfe_1d"], 
        ),
    ...
    params:
        populations=expand_1pop("{ppl}", anc_only=True),     
        datasets=expand_1pop("{dataset}", anc_only=True),      

So it only takes datasets in account that have anc_alleles provided for the plot.

@xin-huang

Copy link
Copy Markdown
Owner

We need to add --nomisid if we do not provide ancestral alllele information in the dadi workflow, which is missing in the current implementation

@xin-huang

Copy link
Copy Markdown
Owner

also revise this:

# Datasets that have ancestral allele information (required for selscan/polarization)
datasets_with_anc = [
ds for ds, cfg in dataset_configs.items()
if cfg.get("anc_alleles")
]

with

# Ancestral allele gating policy: all datasets must provide anc_alleles, otherwise disable anc-only globally.
datasets_without_anc = [
    ds for ds, cfg in dataset_configs.items()
    if not cfg.get("anc_alleles")
]

if datasets_without_anc:
    print(
        "Missing anc_alleles for dataset(s): "
        + ", ".join(datasets_without_anc)
        + ". Running workflow in no-ancestral-alleles mode (anc-only rules disabled for all datasets)."
    )
    datasets_with_anc = []
else:
    datasets_with_anc = list(dataset_configs.keys())

@nomis-c nomis-c requested a review from xin-huang May 19, 2026 11:47
Comment thread workflow/rules/common.smk Outdated
@xin-huang

Copy link
Copy Markdown
Owner
Missing input files for rule get_allele_counts:
    output: results/balancing_selection/betascan/Human/1000g_high/YRI/ac/YRI.hg38.21.ac
    wildcards: species=Human, dataset=1000g_high, ppl=YRI, ref_genome=hg38, i=21
    affected files:
        results/polarized_data/Human/1000g_high/1pop/YRI/YRI.21.biallelic.snps.repeats.removed.vcf.gz

@nomis-c

nomis-c commented May 22, 2026

Copy link
Copy Markdown
Collaborator Author
Missing input files for rule get_allele_counts:
    output: results/balancing_selection/betascan/Human/1000g_high/YRI/ac/YRI.hg38.21.ac
    wildcards: species=Human, dataset=1000g_high, ppl=YRI, ref_genome=hg38, i=21
    affected files:
        results/polarized_data/Human/1000g_high/1pop/YRI/YRI.21.biallelic.snps.repeats.removed.vcf.gz

When one dataset doesn’t have anc_alleles provided, all datasets are analyzed without anc_alleles, so no polarization data is produced. However at some areas, it still looked for polarized data, which caused this error. This should be now resolved. The issue regarding dfe_godambe_ci for PPA still remains.

cat dfe_godambe_ci.Pan.greatape.PPA.hg38.two_epoch.lognormal.log
maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze
  the integrand in order to determine the difficulties.  If the position of a
  local difficulty can be determined (singularity, discontinuity) one will
  probably gain from splitting up the interval and calling the integrator
  on the subranges.  Perhaps a special-purpose integrator should be used.The maximum number of subdivisions (50) has been achieved.
  If increasing the limit yields no improvement it is advised to analyze
  the integrand in order to determine the difficulties.  If the position of a
  local difficulty can be determined (singularity, discontinuity) one will
  probably gain from splitting up the interval and calling the integrator
  on the subranges.  Perhaps a special-purpose integrator should be used.Traceback (most recent call last):
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/bin/dadi-cli", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/dadi_cli/__main__.py", line 72, in main
    args.runner(args)
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/dadi_cli/parsers/stat_dfe_parsers.py", line 64, in _run_stat_dfe
    godambe_stat_dfe(
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/dadi_cli/Stat.py", line 246, in godambe_stat_dfe
    uncerts_adj = dadi.Godambe.GIM_uncert(
                  ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/dadi/Godambe.py", line 289, in GIM_uncert
    GIM, H, J, cU = get_godambe(func_ex, grid_pts, all_boot, p0, data, eps, log,
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/dadi/Godambe.py", line 246, in get_godambe
    J_inv = numpy.linalg.inv(J)
            ^^^^^^^^^^^^^^^^^^^
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 561, in inv
    ainv = _umath_linalg.inv(a, signature=signature, extobj=extobj)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/lisc/data/scratch/admixlab/chen/conda_envs/cac3b26c75fb3f9f245d3329f9110a2f_/lib/python3.11/site-packages/numpy/linalg/linalg.py", line 112, in _raise_linalgerror_singular
    raise LinAlgError("Singular matrix")
numpy.linalg.LinAlgError: Singular matrix

@nomis-c

nomis-c commented May 25, 2026

Copy link
Copy Markdown
Collaborator Author

Hi @xin-huang
Do you have a suggestion on how to handle this issue, mentioned above? I think if we can fix this, then this PR is ready to merge and we can continue with the circos plots.

@xin-huang

Copy link
Copy Markdown
Owner

I do not have the bandwidth to investigate this issue in detail. However, I am not yet convinced that the numpy.linalg.LinAlgError: Singular matrix error was caused by sample size alone. Could you please look into this further, or provide additional evidence supporting that explanation?

@nomis-c

nomis-c commented May 28, 2026

Copy link
Copy Markdown
Collaborator Author

numpy.linalg.LinAlgError: Singular matrix error occured for greatape and 1kg_high human data. Only 1kg_low worked successfully.

pop.list samples

[chen@login01 selscape]$ wc -l results/dadi/Human/1000g_high/dfe/CHS/pop.list
5 results/dadi/Human/1000g_high/dfe/CHS/pop.list
[chen@login01 selscape]$ wc -l results/dadi/Human/1000g_high/dfe/YRI/pop.list
5 results/dadi/Human/1000g_high/dfe/YRI/pop.list
[chen@login01 selscape]$ wc -l results/dadi/Human/1000g_low/dfe/CHS/pop.list
105 results/dadi/Human/1000g_low/dfe/CHS/pop.list
[chen@login01 selscape]$ wc -l results/dadi/Human/1000g_low/dfe/YRI/pop.list
108 results/dadi/Human/1000g_low/dfe/YRI/pop.list
[chen@login01 selscape]$ wc -l results/dadi/Pan/greatape/dfe/PPA/pop.list
10 results/dadi/Pan/greatape/dfe/PPA/pop.list

While 1kg_low have over 100 samples for each population. greatape dataset has 10 PPA samples and 1kg_high has only 5 for each population.

frequency spectrum greatape

[chen@login01 selscape]$ cat results/dadi/Pan/greatape/dfe/PPA/fs/PPA.syn.hg38.dadi.fs
21 folded "PPA"
0 42 11 10 12 5 6 4 1 4 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1
[chen@login01 selscape]$ cat results/dadi/Pan/greatape/dfe/PPA/fs/PPA.nonsyn.hg38.dadi.fs
21 folded "PPA"
0 33 15 12 7 11 2 1 1 2 3 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1

frequency spectrum 1kg_high

[chen@login01 selscape]$ cat results/dadi/Human/1000g_high/dfe/CHS/fs/CHS.syn.hg38.dadi.fs
11 folded "CHS"
0 63 42 44 35 22 0 0 0 0 0
1 0 0 0 0 0 1 1 1 1 1
[chen@login01 selscape]$ cat results/dadi/Human/1000g_high/dfe/CHS/fs/CHS.nonsyn.hg38.dadi.fs
11 folded "CHS"
0 79 45 23 29 21 0 0 0 0 0
1 0 0 0 0 0 1 1 1 1 1

frequency spectrum 1kg_low

[chen@login01 selscape]$ cat results/dadi/Human/1000g_low/dfe/CHS/fs/CHS.syn.hg19.dadi.fs
211 folded "CHS"
0 215 47 21 11 10 8 13 2 3 6 3 3 6 3 4 1 1 2 3 2 8 2 4 3 3 3 1 4 4 4 5 3 0 5 2 4 1 3 6 3 4 1 1 4 3 3 1 2 1 3 2 3 0 2 2 4 2 3 3 6 1 0 4 2 2 2 1 1 4 2 2 1 1 3 4 4 1 4 3 3 3 1 3 3 3 2 2 1 1 3 4 2 0 2 5 0 0 5 2 3 0 0 2 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[chen@login01 selscape]$ cat results/dadi/Human/1000g_low/dfe/CHS/fs/CHS.nonsyn.hg19.dadi.fs
211 folded "CHS"
0 320 43 30 23 9 11 11 7 3 5 4 2 5 3 1 5 3 8 5 4 4 4 2 5 3 1 2 0 2 5 4 5 0 1 1 5 1 3 5 3 1 4 2 1 0 2 0 1 1 3 4 2 1 1 1 1 0 0 4 4 5 6 2 3 5 1 0 1 1 2 1 1 6 2 2 8 2 1 2 0 5 2 1 8 3 2 3 2 1 0 3 0 2 3 2 0 1 4 4 2 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

We can see that 1kg_low has more data within the fs comapred to greatape and 1kg_high. It should be enough to successfully ran the dadi analysi. When I used unfolded data it worked without problems.

bootstraps greatape

[chen@login01 selscape]$ cat results/dadi/Pan/greatape/dfe/PPA/StatDFE/PPA_bootstrapping_syn/PPA.hg38.syn.bootstrapping.25.fs
21 unfolded "PPA"
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[chen@login01 selscape]$ cat results/dadi/Pan/greatape/dfe/PPA/StatDFE/PPA_bootstrapping_syn/PPA.hg38.syn.bootstrapping.40.fs
21 unfolded "PPA"
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

bootstraps 1kg_low

[chen@login01 selscape]$ head -2 results/dadi/Human/1000g_low/dfe/CHS/StatDFE/CHS_bootstrapping_syn/CHS.hg19.syn.bootstrapping.1.fs
head -2 results/dadi/Human/1000g_low/dfe/CHS/StatDFE/CHS_bootstrapping_syn/CHS.hg19.syn.bootstrapping.2.fs
211 unfolded "CHS"
0 139 44 8 6 9 6 6 2 1 2 2 0 8 4 3 2 0 1 0 0 0 1 2 2 1 1 1 2 8 6 3 0 0 2 1 3 0 0 0 0 4 0 1 1 2 3 2 0 0 0 0 0 0 0 2 9 0 1 3 1 0 0 1 1 2 0 2 0 2 0 1 0 2 0 1 1 0 0 2 0 0 5 2 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 3 0 0 2 0 0 0 0 0 0 0 1 0 0 0 2 0 0 1 1 4 0 0 2 0 0 5 0 0 0 4 0 0 0 0 3 1 0 1 0 0 2 1 0 2 0 1 0 0 1 1 1 0 0 0 0 1 0 1 0 5 0 0 0 2 0 0 0 0 2 2 0 5 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 2 0 0 0 0 0 0 0 0 2 0 0 0 0 1 0 1 0 3 0
211 unfolded "CHS"
0 136 36 10 12 8 5 11 2 1 2 3 2 5 0 0 2 0 2 1 2 1 0 0 4 1 1 0 5 3 2 1 2 0 2 0 3 2 0 2 0 2 2 1 5 2 1 1 2 2 2 2 0 0 0 1 3 0 2 2 3 0 0 0 0 4 0 1 0 0 0 1 0 2 0 2 0 0 0 0 0 1 0 0 2 0 2 0 0 2 0 3 0 0 2 3 0 0 2 2 4 0 0 1 2 0 0 2 0 0 1 1 0 0 0 3 0 0 0 1 6 0 0 4 0 4 0 0 0 0 2 2 4 2 0 0 0 0 0 0 0 0 0 0 2 0 0 2 0 0 2 0 0 0 2 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 2 0 0 0 0 2 2 0 0 0 0 0 0 0 0 0 0 1 2 2 0 3 0 0 0 0 0 0 0 0 1 0 0 4 0 0 0 2 0 1 0

bootstraps 1kg_high

[chen@login01 selscape]$ head -2 results/dadi/Human/1000g_high/dfe/YRI/StatDFE/YRI_bootstrapping_syn/YRI.hg38.syn.bootstrapping.1.fs
head -2 results/dadi/Human/1000g_high/dfe/YRI/StatDFE/YRI_bootstrapping_syn/YRI.hg38.syn.bootstrapping.2.fs
head -2 results/dadi/Human/1000g_high/dfe/YRI/StatDFE/YRI_bootstrapping_syn/YRI.hg38.syn.bootstrapping.3.fs
11 unfolded "YRI"
0 0 0 0 0 0 0 0 0 0 0
11 unfolded "YRI"
0 0 0 0 0 0 0 0 0 0 0
11 unfolded "YRI"
0 0 0 0 0 0 0 0 0 0 0

It only shows 0 for 1kg_high and greatape which I looks like main cause of the problem. It also shows unfolded , even if it should display folded instead.

I compared 6_deleterious_dfe_dadi.smk between main and dev_2 branch, to check if there are unwanted changes. The only changes are intentional (adding nomisid_flag and get_dadi_param lambdas instead of dadi_config[...]. dfe_godambe_ci is identical in both branches.

I excluded config/greatape.yaml from config/main.yaml and run the analysis with only human datasets with anc_alleles. There were no issues in this case.

[chen@login01 fs]$ cat CHS.nonsyn.hg38.dadi.fs
11 unfolded "CHS"
0 69 38 15 18 21 9 7 5 10 0
1 0 0 0 0 0 0 0 0 0 1
[chen@login01 CHS_bootstrapping_non]$ cat CHS.hg38.nonsyn.bootstrapping.0.fs
11 unfolded "CHS"
0 58 25 12 12 17 8 2 5 3 0
1 0 0 0 0 0 0 0 0 0 1

So I guess there is something wrong with the current implementation of the ancestral allele gating.

@xin-huang

Copy link
Copy Markdown
Owner

Good, the bootstrapping spectrum indicates the cause of this numpy.linalg.LinAlgError: Singular matrix, because there are no variants in the spectrum.

Also, the sample size is not correct in the 1kg_high_cov spectrum. So you should check if there is anything wrong when generating the spectrum.

@nomis-c

nomis-c commented May 29, 2026

Copy link
Copy Markdown
Collaborator Author

Good, the bootstrapping spectrum indicates the cause of this numpy.linalg.LinAlgError: Singular matrix, because there are no variants in the spectrum.

Also, the sample size is not correct in the 1kg_high_cov spectrum. So you should check if there is anything wrong when generating the spectrum.

rule create_example_metadata:
input:
samples=rules.download_1KG_info.output.samples,
output:
metadata="examples/data/Human/metadata/example_metadata.txt",
params:
pop1=example_pops_high[0],
pop2=example_pops_high[1],
shell:
"""
grep -w {params.pop1} {input.samples} | awk '{{print $1}}' | head -5 \
| awk -v pop={params.pop1} '{{print $1"\\t"pop}}' > {output.metadata}
grep -w {params.pop2} {input.samples} | awk '{{print $1}}' | head -5 \
| awk -v pop={params.pop2} '{{print $1"\\t"pop}}' >> {output.metadata}
sed -i '1iSample\\tPopulation' {output.metadata}
"""

I think head -5 in this rule reduces the size of the samples.

@xin-huang

Copy link
Copy Markdown
Owner

However, the same settings worked previously in version 1.0.0, so the key question is why the bootstrapped spectrum is empty now.

@xin-huang

Copy link
Copy Markdown
Owner

any update?

@nomis-c

nomis-c commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

It is clear that currently using unpolarized data, it leads to empty bootstrapping fs, while using polarized it SNPs exist in the bootstrapping fs.

dadi-cli GenerateFs \
    --vcf results/processed_data/Human/1000g_high/1pop/CHS/CHS.biallelic.syn.snps.hg38.vcf.gz \
    --pop-info results/dadi/Human/1000g_high/dfe/CHS/pop.list \
    --pop-ids CHS \
    --projections 10 \
    --bootstrap 5 \
    --chunk-size 1000000 \
    --output tmp/test_bootstrap_not_polarized/CHS.test.syn

output:

cat tmp/test_bootstrap_not_polarized/CHS.test.syn.bootstrapping.0.fs
11 unfolded "CHS"
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0

output

dadi-cli GenerateFs \
    --vcf results/polarized_data/Human/1000g_high/1pop/CHS/CHS.biallelic.syn.snps.hg38.vcf.gz \
    --pop-info results/dadi/Human/1000g_high/dfe/CHS/pop.list \
    --pop-ids CHS \
    --projections 10 \
    --polarized \
    --bootstrap 5 \
    --chunk-size 1000000 \
    --output tmp/test_bootstrap_polarized/CHS.test.syn
cat tmp/test_bootstrap_polarized/CHS.test.syn.bootstrapping.0.fs
11 unfolded "CHS"
0 66 36 32 22 18 17 13 7 17 0
1 0 0 0 0 0 0 0 0 0 1

The issue is, that I still do not know, where the root problem is. In the past days I was looking at the dadi-cli source code, hoping I can find any clues.
I found this PR for dadi-cli which seems to describe the same issue we have (xin-huang/dadi-cli#136).

- dadi-cli=0.9.13

I suggest to update this version to - dadi-cli=0.9.14.

@nomis-c

nomis-c commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

Because polarized ist always set to True, regardless of the configuration in dadi.yaml or our gating implementation, it explains why unfolded is displayed instead of folded with the bootstrap fs.

I also looked at the input files and 1kg_low contains an AA field, while greatape and 1kg_high do not.

[chen@login01 selscape]$ bcftools view results/processed_data/Human/1000g_low/1pop/CHS/CHS.biallelic.syn.snps.hg19.vcf.gz | grep -v "^#" | head -3 | cut -f8
AC=50;AF=0.251398;AN=210;NS=2504;DP=36169;EAS_AF=0.255;AMR_AF=0.1282;AFR_AF=0.4584;EUR_AF=0.1183;SAS_AF=0.1922;AA=.|||;VT=SNP;EX_TARGET
AC=1;AF=0.000199681;AN=210;NS=2504;DP=39542;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP;EX_TARGET
AC=1;AF=0.000199681;AN=210;NS=2504;DP=32300;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=.|||;VT=SNP;EX_TARGET

[chen@login01 selscape]$ bcftools view results/processed_data/Human/1000g_high/1pop/CHS/CHS.biallelic.syn.snps.hg38.vcf.gz | grep -v "^#" | head -3 | cut -f8
AC=1;AF=0.019;AN=10;BaseQRankSum=0.017;ClippingRankSum=0.041;DP=180978;ExcessHet=2.14748e+09;FS=27.395;InbreedingCoeff=-0.1472;MLEAC=126;MLEAF=0.025;MQ=51.88;MQ0=0;MQRankSum=-2.19;QD=0.54;ReadPosRankSum=0.269;SOR=8.135;VQSLOD=-43.89;culprit=SOR
AC=5;AF=0.499;AN=10;BaseQRankSum=0.182;ClippingRankSum=-0.016;DP=199953;ExcessHet=2.14748e+09;FS=66.534;InbreedingCoeff=-0.9978;MLEAC=2501;MLEAF=0.499;MQ=52.35;MQ0=0;MQRankSum=-6.242;QD=8.35;ReadPosRankSum=0.81;SOR=4.312;VQSLOD=-49.19;culprit=DP
AC=3;AF=0.283;AN=10;BaseQRankSum=0.629;ClippingRankSum=-0.005;DP=472730;ExcessHet=2.14748e+09;FS=0;InbreedingCoeff=-0.3982;MLEAC=1421;MLEAF=0.284;MQ=48.34;MQ0=0;MQRankSum=0.057;QD=2.93;ReadPosRankSum=0.659;SOR=0.69;VQSLOD=-310.3;culprit=DP

[chen@login01 selscape]$ bcftools view results/processed_data/Pan/greatape/1pop/PPA/PPA.biallelic.syn.snps.hg38.vcf.gz | grep -v "^#" | head -3 | cut -f8
AC=1;AF=0.002336;AN=20;BaseQRankSum=0.825;DP=5312;ExcessHet=3.0103;FS=0;InbreedingCoeff=-0.0032;MLEAC=1;MLEAF=0.002336;MQ=60;MQRankSum=0;QD=15.18;ReadPosRankSum=-0.056;SOR=1.022;mapability=1
AC=1;AF=0.002336;AN=20;BaseQRankSum=-0.96;DP=5884;ExcessHet=3.0103;FS=1.492;InbreedingCoeff=-0.0036;MLEAC=1;MLEAF=0.002336;MQ=60;MQRankSum=0;QD=22.87;ReadPosRankSum=-0.096;SOR=0.437;mapability=1
AC=1;AF=0.002336;AN=20;BaseQRankSum=-0.92;DP=5867;ExcessHet=3.0103;FS=0;InbreedingCoeff=-0.0036;MLEAC=1;MLEAF=0.002336;MQ=60;MQRankSum=0;QD=17.19;ReadPosRankSum=-0.99;SOR=0.749;mapability=1

This could explain why 1kg_low worked. Its processed VCF contains an AA field, so even with polarized=True, dadi could read the ancestral alles and generate non-empty bootstraps. greatape and 1kg_high do not have an AA field, so the forced polarized=True results in empty bootstraps.

@nomis-c

nomis-c commented Jun 2, 2026

Copy link
Copy Markdown
Collaborator Author

@xin-huang
By updating the dadi-cli version and environment, the issue should be resolved. There are no more error message from my end. Please check if you receive any errors regarding the dadi analysis.

@nomis-c nomis-c requested a review from xin-huang June 4, 2026 11:56
@xin-huang

Copy link
Copy Markdown
Owner

I will merge this one. Please continue adding the Circos plots.

@xin-huang xin-huang merged commit cd0f705 into main Jun 5, 2026
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants