Releases: Baron-Sun/socialscikit
SocialSciKit v0.1.0 — Zero-code text analysis toolkit
SocialSciKit v0.1.0 — Initial Release
SocialSciKit is an open-source, zero-code toolkit for social science text analysis. Its UI runs in the browser, served by a local Gradio app; it supports GPT / Claude / Ollama backends and ships with a bilingual interface (English / 中文).
This initial release covers the full research lifecycle — from raw data to a publication-ready Methods section — through three independent modules plus a unified visualization dashboard.
📦 Three Core Modules
QuantiKit — Text Classification
End-to-end pipeline for supervised text classification.
- Method recommendation with CSS-literature citations (zero-shot / few-shot / fine-tuning)
- Annotation budget estimation via power-law learning-curve fitting, with 80% CI and marginal-return curves
- Built-in annotator (skip / undo / flag) with real-time progress chart
- Three classification paths: prompt classification (with APE-based prompt optimization), local transformer fine-tuning, OpenAI fine-tuning API
- Pipeline log export in JSON for downstream tools
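The budget-estimation idea above can be illustrated with a minimal sketch. It assumes a two-parameter power law `error = a * n**(-b)` fitted by least squares in log-log space; the function names and the pilot numbers are hypothetical, and the sketch omits the 80% CI and marginal-return curves that QuantiKit reports, so it should not be read as the toolkit's actual implementation.

```python
import math


def fit_power_law(sizes, errors):
    """Least-squares fit of error = a * n**(-b), done in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)


def labels_needed(a, b, target_error):
    """Invert error = a * n**(-b) to find the n that reaches target_error."""
    return math.ceil((a / target_error) ** (1.0 / b))


# Hypothetical pilot results: error rate measured at four training-set sizes.
pilot_sizes = [50, 100, 200, 400]
pilot_errors = [0.40, 0.32, 0.256, 0.2048]

a, b = fit_power_law(pilot_sizes, pilot_errors)
budget = labels_needed(a, b, target_error=0.15)
print(f"a={a:.3f}, b={b:.3f}, labels needed for 15% error: {budget}")
```

Fitting in log-log space keeps the sketch dependency-free; a nonlinear fit with an added error floor may describe real learning curves better.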
QualiKit — Qualitative Coding
End-to-end pipeline for interview transcripts, focus groups, and open-ended surveys.
- PII de-identification with Chinese + English NER, per-item review and bulk acceptance
- Interactive research framework (RQs + sub-themes) with LLM-assisted sub-theme suggestion
- LLM batch coding grounded in a verbatim `evidence_span` extracted from the source text
- Review workflow with confidence ranking, bulk accept, manual coding, cascading dropdowns
- Structured Excel export + pipeline log
Toolbox — Research Methods Tools
Three standalone utilities that work with any CSV or pipeline log.
- ICR Calculator: Cohen's Kappa, Krippendorff's Alpha, Multi-label Jaccard — supports 2 or more coders with auto metric selection
- Consensus Coding: dispatch the same coding task to 2–5 LLMs in parallel and aggregate via majority vote
- Methods Section Generator: auto-draft a bilingual Methods paragraph from an imported pipeline log or a short form
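Two of the calculations named above are small enough to sketch directly. The following is an illustrative version of majority-vote aggregation and two-coder Cohen's Kappa using only the standard library; the function names are hypothetical, and returning `None` on a tied vote is an assumption of this sketch rather than documented Toolbox behavior.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label, or None when the top count is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie: left for a human reviewer (assumption of this sketch)
    return counts[0][0]


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("coders must label the same non-empty set of items")
    n = len(coder_a)
    observed = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# One inner list per item, one label per (hypothetical) LLM.
votes = [["pos", "pos", "neg"], ["neg", "neg", "neg"], ["pos", "neg", "neu"]]
consensus = [majority_vote(v) for v in votes]  # ['pos', 'neg', None]

kappa = cohens_kappa(["pos", "pos", "neg", "neg"],
                     ["pos", "neg", "neg", "neg"])  # 0.5
```

In the kappa example, observed agreement is 0.75 and chance agreement is 0.5, which gives (0.75 - 0.5) / (1 - 0.5) = 0.5.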
📊 Visualization Dashboard
Academic-style matplotlib charts embedded throughout both pipelines and the Toolbox:
- QuantiKit Step 5 (Evaluation) — metric summary cards + row-normalized confusion-matrix heatmap + per-class P/R/F1 grouped bar chart
- QuantiKit Step 3 (Annotation) — live progress donut, updated after every action
- QualiKit Step 5 (Review) — review-progress donut + confidence histogram (with tier shading and median marker) + theme-distribution horizontal bar chart
- Toolbox ICR — pairwise agreement bar chart with "Good" and "Moderate" reference lines
All charts use a consistent blue / green / orange palette and include full CJK font support.
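For readers who want to see what feeds the evaluation charts, here is a rough sketch of a row-normalized confusion matrix and per-class precision / recall / F1 in plain Python. The function names and the toy labels are hypothetical, and the plotting step (for example a matplotlib heatmap) is deliberately left out.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def row_normalize(matrix):
    """Divide each row by its total so every true class sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


def per_class_prf(matrix, labels):
    """Return {label: (precision, recall, f1)} from raw counts."""
    scores = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        actual = sum(matrix[i])                   # row total
        predicted = sum(row[i] for row in matrix)  # column total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


labels = ["neg", "pos"]
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "neg", "pos"]

cm = confusion_matrix(y_true, y_pred, labels)  # [[1, 1], [1, 2]]
cm_norm = row_normalize(cm)
scores = per_class_prf(cm, labels)
```

Row normalization makes each cell a share of the true class, so classes with few examples remain readable next to large ones.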
🔍 Evidence Highlighting
LLM coding in QualiKit is grounded in verbatim evidence rather than opaque labels.
- The coding prompt requires the LLM to return an `evidence_span`: the exact phrase or sentence from the source text that supports the assigned RQ / sub-theme.
- In the review UI, the original text is rendered with the supporting quote highlighted in green at the correct position.
- When the quote can't be matched verbatim (e.g. paraphrased), a fallback "Evidence" block displays the cited text so reviewers can still audit the coding decision.
- Case-insensitive substring matching makes highlighting robust to minor capitalization differences.
This makes every LLM decision auditable — a critical step for IRB-facing qualitative research.
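The matching-plus-fallback behavior described above might look roughly like the following. This is a sketch under stated assumptions: the function name, the HTML markup, and the use of `re.IGNORECASE` for case-insensitive matching are illustrative choices, not the review UI's actual code.

```python
import html
import re


def highlight_evidence(text, evidence_span):
    """Return (html_fragment, matched) for a source text and its cited quote."""
    match = None
    if evidence_span:
        # re.escape treats the quote as a literal; IGNORECASE tolerates
        # capitalization differences while keeping offsets in the original text.
        match = re.search(re.escape(evidence_span), text, flags=re.IGNORECASE)
    if match is None:
        # Fallback: show the text unmarked plus a separate evidence block.
        fallback = (
            html.escape(text)
            + "<blockquote>Evidence: "
            + html.escape(evidence_span or "")
            + "</blockquote>"
        )
        return fallback, False
    before = text[: match.start()]
    quote = text[match.start() : match.end()]
    after = text[match.end() :]
    highlighted = (
        html.escape(before)
        + '<mark style="background: lightgreen">'
        + html.escape(quote)
        + "</mark>"
        + html.escape(after)
    )
    return highlighted, True


source = "I felt Ignored by my manager & left."
fragment, matched = highlight_evidence(source, "ignored by my manager")
paraphrase_fragment, paraphrase_matched = highlight_evidence(source, "was overlooked")
```

Searching the original string with a case-insensitive pattern, instead of lowercasing both sides, avoids offset drift for characters whose lowercase form has a different length.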
💾 Project Save & Restore
Save the full state of your research session — loaded DataFrames, annotation sessions (with cursor + history + elapsed time), extraction review sessions, research questions, de-identification results — to a single JSON file. Reload from the Home tab to resume work later. Tagged-union serialization keeps complex types (DataFrames, dataclasses, enums) losslessly round-tripping.
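A tagged-union encoding of this kind can be sketched with the standard library alone. Everything below is illustrative: the `__type__` tag name, the `REGISTRY` lookup, and the `Status` / `Annotation` types are hypothetical, and DataFrames are left out (they would need their own tag and a pandas dependency).

```python
import dataclasses
import enum
import json


class Status(enum.Enum):  # hypothetical session type
    PENDING = "pending"
    ACCEPTED = "accepted"


@dataclasses.dataclass
class Annotation:  # hypothetical session type
    text: str
    label: str
    status: Status


# Decoding only rebuilds classes listed here, never arbitrary names.
REGISTRY = {cls.__name__: cls for cls in (Status, Annotation)}


def encode(obj):
    """Convert obj to JSON-safe data, tagging types JSON cannot represent."""
    if isinstance(obj, enum.Enum):
        return {"__type__": "enum", "cls": type(obj).__name__, "name": obj.name}
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        fields = {f.name: encode(getattr(obj, f.name))
                  for f in dataclasses.fields(obj)}
        return {"__type__": "dataclass", "cls": type(obj).__name__,
                "fields": fields}
    if isinstance(obj, dict):
        return {k: encode(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [encode(v) for v in obj]  # note: tuples come back as lists
    return obj


def decode(data):
    """Rebuild tagged values; plain JSON values pass through unchanged."""
    if isinstance(data, list):
        return [decode(v) for v in data]
    if isinstance(data, dict):
        tag = data.get("__type__")
        if tag == "enum":
            return REGISTRY[data["cls"]][data["name"]]
        if tag == "dataclass":
            kwargs = {k: decode(v) for k, v in data["fields"].items()}
            return REGISTRY[data["cls"]](**kwargs)
        return {k: decode(v) for k, v in data.items()}
    return data


session = {"cursor": 1,
           "items": [Annotation("hi there", "greeting", Status.ACCEPTED)]}
saved = json.dumps(encode(session))
restored = decode(json.loads(saved))
```

The explicit tag is what lets the decoder tell a serialized enum or dataclass apart from an ordinary dictionary with similar keys.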
🌐 Runtime
| Component | Tested |
|---|---|
| Python | 3.9 – 3.12 |
| Gradio | 4.44+ |
| LLM backends | OpenAI (gpt-4o / gpt-4o-mini / gpt-4.1), Anthropic (Claude Sonnet 4 / Haiku 4.5), Ollama (Llama 3 / Mistral / Qwen 2.5) |
| Test suite | 676 tests passing |
🚀 Install & Launch
pip install socialscikit
socialscikit # launches the unified UI at http://127.0.0.1:7860