Releases: Baron-Sun/socialscikit

SocialSciKit v0.1.0 — Zero-code text analysis toolkit

18 Apr 09:23

SocialSciKit v0.1.0 — Initial Release

SocialSciKit is an open-source, zero-code toolkit for social science text analysis. You operate it entirely through a browser UI served by a local Gradio app; it supports GPT / Claude / Ollama backends and ships with a bilingual interface (English / 中文).

This initial release covers the full research lifecycle — from raw data to a publication-ready Methods section — through three independent modules plus a unified visualization dashboard.

📦 Three Core Modules

QuantiKit — Text Classification

End-to-end pipeline for supervised text classification.

  • Method recommendation with CSS-literature citations (zero-shot / few-shot / fine-tuning)
  • Annotation budget estimation via power-law learning-curve fitting, with 80% CI and marginal-return curves
  • Built-in annotator (skip / undo / flag) with real-time progress chart
  • Three classification paths: prompt classification (with APE-based prompt optimization), local transformer fine-tuning, OpenAI fine-tuning API
  • Pipeline log export in JSON for downstream tools
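
The annotation-budget idea above can be illustrated with a small sketch. It assumes the learning curve has the form error(n) = b * n^(-c) and fits it by least squares in log-log space; the exact functional form, the 80% CI, and the marginal-return curves used by QuantiKit are not reproduced here. The function names and the pilot numbers are hypothetical.

```python
import math

def fit_power_law(sizes, errors):
    """Fit error = b * n**(-c) by ordinary least squares on log-transformed data."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (b, c)

def labels_needed(b, c, target_error):
    """Smallest n for which the fitted curve b * n**(-c) reaches target_error."""
    return math.ceil((b / target_error) ** (1.0 / c))

# Hypothetical pilot results: labeled-set sizes and validation error rates.
pilot_sizes = [50, 100, 200, 400]
pilot_errors = [0.40, 0.31, 0.24, 0.19]

b, c = fit_power_law(pilot_sizes, pilot_errors)
n_target = labels_needed(b, c, target_error=0.15)
print(f"fitted b={b:.3f}, c={c:.3f}, estimated labels for 15% error: {n_target}")
```

A log-log fit is the simplest way to estimate a pure power law; a curve with an irreducible error floor would need a nonlinear fit instead.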

QualiKit — Qualitative Coding

End-to-end pipeline for interview transcripts, focus groups, and open-ended surveys.

  • PII de-identification with Chinese + English NER, per-item review and bulk acceptance
  • Interactive research framework (RQs + sub-themes) with LLM-assisted sub-theme suggestion
  • LLM batch coding grounded in a verbatim evidence_span extracted from the source text
  • Review workflow with confidence ranking, bulk accept, manual coding, cascading dropdowns
  • Structured Excel export + pipeline log

Toolbox — Research Methods Tools

Three standalone utilities that work with any CSV or pipeline log.

  • ICR Calculator: Cohen's Kappa, Krippendorff's Alpha, Multi-label Jaccard — supports 2 or more coders with auto metric selection
  • Consensus Coding: dispatch the same coding task to 2–5 LLMs in parallel and aggregate via majority vote
  • Methods Section Generator: auto-draft a bilingual Methods paragraph from an imported pipeline log or a short form
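
For readers unfamiliar with the two computations named above, here is a minimal sketch of Cohen's Kappa for two coders and of majority-vote aggregation. It is not the Toolbox's own code: the function names are hypothetical, and returning `None` on a tie is an assumption, since the release notes do not say how ties are resolved.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labeled the same items in the same order."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty item list")
    n = len(coder_a)
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement from each coder's marginal label frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label for every item
    return (observed - expected) / (1.0 - expected)

def majority_vote(labels):
    """Most common label, or None when the top labels tie (assumed behavior)."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]

# Hypothetical usage: two coders, then three model outputs for one item.
kappa = cohens_kappa(["a", "a", "b", "b"], ["a", "a", "b", "a"])
consensus = majority_vote(["theme_1", "theme_1", "theme_2"])
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it can be 0 even when two coders match on half the items.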

📊 Visualization Dashboard

Academic-style matplotlib charts embedded throughout both pipelines and the Toolbox:

  • QuantiKit Step 5 (Evaluation) — metric summary cards + row-normalized confusion-matrix heatmap + per-class P/R/F1 grouped bar chart
  • QuantiKit Step 3 (Annotation) — live progress donut, updated after every action
  • QualiKit Step 5 (Review) — review-progress donut + confidence histogram (with tier shading and median marker) + theme-distribution horizontal bar chart
  • Toolbox ICR — pairwise agreement bar chart with "Good" and "Moderate" reference lines
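
The numbers behind the evaluation charts can be sketched in a few lines. The example below computes a row-normalized confusion matrix and per-class precision, recall, and F1 in plain Python; it only illustrates the quantities being plotted and is not the dashboard's implementation. All function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Counts with rows = true class, columns = predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def row_normalize(matrix):
    """Divide each row by its total so every true class sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized

def per_class_prf(matrix):
    """(precision, recall, F1) per class, in label order."""
    scores = []
    for i in range(len(matrix)):
        tp = matrix[i][i]
        actual = sum(matrix[i])                    # items truly in class i
        predicted = sum(row[i] for row in matrix)  # items predicted as class i
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores.append((precision, recall, f1))
    return scores

# Hypothetical usage with two classes.
labels = ["pos", "neg"]
cm = confusion_matrix(["pos", "pos", "neg", "neg"],
                      ["pos", "neg", "neg", "neg"], labels)
cm_norm = row_normalize(cm)
prf = per_class_prf(cm)
```

Row normalization puts per-class recall on the diagonal, which keeps the heatmap readable when classes are imbalanced.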

All charts use a consistent blue / green / orange palette and include full CJK font support.

🔍 Evidence Highlighting

LLM coding in QualiKit is grounded in verbatim evidence rather than opaque labels.

  • The coding prompt requires the LLM to return an evidence_span — the exact phrase or sentence from the source text that supports the assigned RQ / sub-theme.
  • In the review UI, the original text is rendered with the supporting quote highlighted in green at the correct position.
  • When the quote can't be matched verbatim (e.g. paraphrased), a fallback "Evidence" block displays the cited text so reviewers can still audit the coding decision.
  • Case-insensitive substring matching makes highlighting robust to minor capitalization differences.
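
The matching and fallback behavior described above can be sketched as follows. This is an illustration under assumptions, not QualiKit's code: the function names are hypothetical, and `[[...]]` markers stand in for the green highlight used in the review UI.

```python
import re

def locate_evidence(source_text, evidence_span):
    """Return (start, end) of the span in source_text, or None if it is not verbatim.

    re.IGNORECASE gives case-insensitive matching while keeping offsets that
    refer to the original, unmodified text.
    """
    span = evidence_span.strip()
    if not span:
        return None
    match = re.search(re.escape(span), source_text, re.IGNORECASE)
    return match.span() if match else None

def render_for_review(source_text, evidence_span):
    """Mark the matched quote inline, or append a fallback Evidence block."""
    located = locate_evidence(source_text, evidence_span)
    if located is None:
        # Paraphrased or otherwise unmatched: show the cited text separately.
        return f"{source_text}\n\nEvidence: {evidence_span}"
    start, end = located
    return f"{source_text[:start]}[[{source_text[start:end]}]]{source_text[end:]}"

# Hypothetical usage.
text = "I felt Isolated after moving."
highlighted = render_for_review(text, "felt isolated")
fallback = render_for_review(text, "was lonely")
```

Searching with `re.IGNORECASE` rather than lowercasing both strings avoids offset drift for characters whose lowercase form has a different length.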

This makes every LLM decision auditable — a critical step for IRB-facing qualitative research.

💾 Project Save & Restore

Save the full state of your research session to a single JSON file: loaded DataFrames, annotation sessions (with cursor, history, and elapsed time), extraction review sessions, research questions, and de-identification results. Reload the file from the Home tab to resume work later. Tagged-union serialization lets complex types (DataFrames, dataclasses, enums) round-trip losslessly.
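
As a rough illustration of tagged-union serialization, the sketch below wraps each non-JSON value in a dict carrying a type tag and rebuilds it on load. The tag key `__type__`, the registry, and the example classes are all hypothetical; SocialSciKit's actual file format is not documented here, and DataFrames would need an additional tag that this sketch omits.

```python
import json
from dataclasses import dataclass, fields, is_dataclass
from enum import Enum

class Status(Enum):          # hypothetical enum, for illustration only
    PENDING = "pending"
    ACCEPTED = "accepted"

@dataclass
class CodedSegment:          # hypothetical dataclass, for illustration only
    text: str
    status: Status

# Classes that may be rebuilt on load, looked up by name.
REGISTRY = {cls.__name__: cls for cls in (Status, CodedSegment)}

def encode(obj):
    """json.dumps `default` hook: wrap non-JSON types in a tagged dict."""
    if isinstance(obj, Enum):
        return {"__type__": "enum", "cls": type(obj).__name__, "name": obj.name}
    if is_dataclass(obj) and not isinstance(obj, type):
        data = {f.name: getattr(obj, f.name) for f in fields(obj)}
        return {"__type__": "dataclass", "cls": type(obj).__name__, "data": data}
    raise TypeError(f"cannot serialize {type(obj).__name__}")

def decode(d):
    """json.loads `object_hook`: rebuild objects from tagged dicts."""
    tag = d.get("__type__")
    if tag == "enum":
        return REGISTRY[d["cls"]][d["name"]]
    if tag == "dataclass":
        return REGISTRY[d["cls"]](**d["data"])
    return d  # ordinary dict, left untouched

def save_state(state):
    return json.dumps(state, default=encode)

def load_state(payload):
    return json.loads(payload, object_hook=decode)

# Hypothetical usage: a session with one coded segment and a cursor position.
state = {"segments": [CodedSegment("hello", Status.ACCEPTED)], "cursor": 3}
restored = load_state(save_state(state))
```

Because `object_hook` runs on the innermost dicts first, nested values such as the enum inside the dataclass are rebuilt before their parent. A plain dict that happens to contain a `__type__` key would collide with this scheme, so a real implementation would need to reserve or escape that key.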

🌐 Runtime

| Component | Tested |
|---|---|
| Python | 3.9 – 3.12 |
| Gradio | 4.44+ |
| LLM backends | OpenAI (gpt-4o / gpt-4o-mini / gpt-4.1), Anthropic (Claude Sonnet 4 / Haiku 4.5), Ollama (Llama 3 / Mistral / Qwen 2.5) |
| Test suite | 676 tests passing |

🚀 Install & Launch

pip install socialscikit
socialscikit            # launches the unified UI at http://127.0.0.1:7860