This guide covers how to configure IPFS Datasets Python for your specific needs.
The library uses environment variables and configuration files for customization:
# Copy example configuration
cp configs.yaml.example configs.yaml
# Edit configuration
nano configs.yaml

- IPFS_HOST: IPFS daemon host (default: 127.0.0.1)
- IPFS_PORT: IPFS daemon port (default: 5001)
- IPFS_GATEWAY: IPFS gateway URL

- STORAGE_PATH: Local storage directory
- CACHE_SIZE: Cache size limit
- TEMP_DIR: Temporary files directory

- VECTOR_STORE_BACKEND: Choose from faiss, qdrant, elasticsearch
- EMBEDDING_MODEL: Model for embeddings (default: sentence-transformers/all-mpnet-base-v2)
- VECTOR_DIMENSION: Embedding dimension

- GRAPHRAG_MAX_DEPTH: Maximum graph traversal depth
- GRAPHRAG_SIMILARITY_THRESHOLD: Minimum similarity for connections
- CHUNK_SIZE: Document chunk size
- CHUNK_OVERLAP: Overlap between chunks
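To illustrate how the variables listed above could be resolved with their documented defaults, here is a minimal sketch using only the standard library. The function name `read_settings` and the `DEFAULTS` table are hypothetical and are not part of the library's API; the library's own loading logic may differ.

```python
import os

# Defaults documented in this guide; settings without a documented default stay None.
DEFAULTS = {
    "IPFS_HOST": "127.0.0.1",
    "IPFS_PORT": "5001",
    "EMBEDDING_MODEL": "sentence-transformers/all-mpnet-base-v2",
}

SETTING_NAMES = [
    "IPFS_HOST", "IPFS_PORT", "IPFS_GATEWAY",
    "STORAGE_PATH", "CACHE_SIZE", "TEMP_DIR",
    "VECTOR_STORE_BACKEND", "EMBEDDING_MODEL", "VECTOR_DIMENSION",
    "GRAPHRAG_MAX_DEPTH", "GRAPHRAG_SIMILARITY_THRESHOLD",
    "CHUNK_SIZE", "CHUNK_OVERLAP",
]

ALLOWED_BACKENDS = {"faiss", "qdrant", "elasticsearch"}


def read_settings(environ=None):
    """Hypothetical helper: collect the settings above from a mapping of env vars."""
    env = os.environ if environ is None else environ
    settings = {name: env.get(name, DEFAULTS.get(name)) for name in SETTING_NAMES}

    # The port arrives as a string; convert it so callers get an integer.
    settings["IPFS_PORT"] = int(settings["IPFS_PORT"])

    # Reject a backend that is not one of the three documented choices.
    backend = settings["VECTOR_STORE_BACKEND"]
    if backend is not None and backend not in ALLOWED_BACKENDS:
        raise ValueError(f"Unsupported VECTOR_STORE_BACKEND: {backend!r}")
    return settings
```

Calling `read_settings()` with no argument reads from the current process environment; passing a plain dict makes the helper easy to exercise in tests.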
See the Performance Optimization Guide for detailed tuning options.
See the Security & Governance Guide for security settings.
For production deployment configuration, see:
Create a .env file in your project root:
# Copy example
cp .env.example .env
# Edit with your settings
nano .env

Common environment variables:
- OPENAI_API_KEY: For OpenAI models
- ANTHROPIC_API_KEY: For Claude models
- HUGGINGFACE_TOKEN: For Hugging Face models
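If you want to load a .env file from Python without extra dependencies, the following is a minimal sketch of a KEY=VALUE parser. The helper name `load_env_file` is hypothetical; it handles only simple lines (no multi-line values or variable expansion), and whether the library reads .env files on its own is not stated in this guide.

```python
import os
from pathlib import Path


def load_env_file(path=".env", override=False):
    """Hypothetical helper: parse simple KEY=VALUE lines into os.environ.

    Returns the parsed pairs. Existing environment variables are kept
    unless override is True.
    """
    parsed = {}
    env_path = Path(path)
    if not env_path.exists():
        return parsed

    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        # Drop surrounding whitespace and quote characters from the value.
        value = value.strip().strip("'\"")
        parsed[key] = value
        if override or key not in os.environ:
            os.environ[key] = value
    return parsed
```

For more complete parsing rules, the third-party python-dotenv package is a common alternative.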
Main configuration file structure:
ipfs:
  host: "127.0.0.1"
  port: 5001
  gateway: "https://ipfs.io"
storage:
  path: "./data"
  cache_size: "10GB"
vector_store:
  backend: "faiss"
  embedding_model: "sentence-transformers/all-mpnet-base-v2"
graphrag:
  max_depth: 3
  similarity_threshold: 0.7
  chunk_size: 1000
  chunk_overlap: 200

For SQL database configuration (optional):
database:
  type: "postgresql"
  host: "localhost"
  port: 5432
  name: "ipfs_datasets"

Verify your configuration:
# Test IPFS connection
python -c "from ipfs_datasets_py import DatasetManager; dm = DatasetManager(); print('IPFS OK')"
# Test vector store
python -c "from ipfs_datasets_py.embeddings import FAISSVectorStore; vs = FAISSVectorStore(); print('Vector Store OK')"

If you can't connect to IPFS:
- Ensure the IPFS daemon is running:
  ipfs daemon
- Check that the IPFS API is accessible (the RPC API accepts only POST requests):
  curl -X POST http://127.0.0.1:5001/api/v0/version
- Verify firewall settings
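The API check above can also be run from Python. This is a minimal sketch using only the standard library; the helper names `ipfs_version_url` and `check_ipfs_api` are hypothetical and not part of the library. It assumes the daemon is Kubo (go-ipfs), whose RPC API expects POST requests and returns a JSON object with a `Version` field.

```python
import json
import urllib.error
import urllib.request


def ipfs_version_url(host="127.0.0.1", port=5001):
    """Build the URL of the daemon's version endpoint."""
    return f"http://{host}:{port}/api/v0/version"


def check_ipfs_api(host="127.0.0.1", port=5001, timeout=5.0):
    """Hypothetical helper: return the daemon version, or None if unreachable."""
    request = urllib.request.Request(
        ipfs_version_url(host, port),
        data=b"",          # empty body; the RPC API expects POST
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.load(response).get("Version")
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error, or a non-JSON response.
        return None
```

A `None` result means the daemon is not running, is listening on a different host or port, or is blocked by a firewall.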
If you have storage problems:
- Check disk space:
  df -h
- Verify permissions on storage directory
- Clear cache if needed:
  rm -rf ./data/cache/*
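The disk space and permission checks above can be combined into one script. This is a minimal sketch using the standard library; the helper name `check_storage` and the 1 GiB threshold are illustrative assumptions, and `./data` is the storage path from the example configuration.

```python
import os
import shutil


def check_storage(path="./data", min_free_bytes=1024**3):
    """Hypothetical helper: report free space and write access for a directory."""
    exists = os.path.isdir(path)

    # Measure the nearest existing parent so the check also works
    # before the storage directory has been created.
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    free = shutil.disk_usage(probe).free

    return {
        "exists": exists,
        # Creating files in a directory needs both write and execute permission.
        "writable": exists and os.access(path, os.W_OK | os.X_OK),
        "free_bytes": free,
        "enough_space": free >= min_free_bytes,
    }
```

Running `print(check_storage())` from the project root shows whether the directory exists, whether the current user can write to it, and how much space is left on its filesystem.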
- Installation Guide - Install dependencies
- User Guide - Learn how to use the library
- Developer Guide - Contributing and development