LA3D/cogitarelink-experimental

cogitarelink

Installation

pip install cogitarelink

Dependencies

Cogitarelink relies on the following key dependencies:

  • pyld: JSON-LD processing
  • rdflib: RDF data manipulation (optional)
  • pydantic: Data validation and settings management
  • fastcore: Core utilities
  • httpx: HTTP client for Linked Data retrieval (optional)

If you encounter import errors, ensure all dependencies are installed:

pip install pyld rdflib pydantic fastcore httpx

Overview

Cogitarelink (“to think connectedly”) is a Python library for working with Linked Open Data as semantic memory for LLMs and agents. It enables processing, validation, and navigation of JSON-LD 1.1 data with context awareness, allowing intelligent systems to build and maintain verifiable knowledge representations.

Motivation

Current AI systems face significant challenges with knowledge management:

  • Knowledge Fragmentation: Information is scattered across documents, databases, and unstructured sources
  • Context Loss: Knowledge is stripped of its semantic context when converted to vector embeddings
  • Verification Difficulty: Generated content can’t be easily traced to source facts
  • Reasoning Opacity: Most reasoning happens in opaque model weights rather than explainable logical steps
  • Trust Boundaries: Agents lack mechanisms to cryptographically verify information exchanged with other agents

Cogitarelink addresses these challenges by providing a structured, semantic memory system that preserves context, enables verification, and supports transparent reasoning.

Core Philosophy

Cogitarelink follows these key design principles:

  • Micro-Kernel Architecture: Core functionality in ~600 LOC, with business logic in data artifacts (contexts, ontologies, rules)
  • Knowledge as Graphs: All information represented as semantic graphs with explicit relationships
  • Verifiable Provenance: Every fact tracked with its origin, enabling source attribution and trust assessment
  • Agent-Centric Design: Optimized for use by LLMs and agents rather than human developers
  • Domain-Agnostic: Framework provides tools and patterns applicable across knowledge domains

Development Approach

Cogitarelink uses an agent-driven development approach:

  • Large parts of the codebase are written and maintained by AI coding agents (Claude Code and OpenAI Codex)
  • Business logic lives in data artifacts (SPARQL queries, SHACL rules) generated by AI agents at runtime
  • Domain-specific knowledge is encoded in ontologies and rules rather than Python code
  • The core team focuses on architectural patterns and guiding the overall system design
  • Continuous integration and testing ensure high quality of AI-generated contributions

This development model is intended to allow rapid iteration while keeping the architecture coherent.

Key Capabilities

  • Entity Management: Immutable, normalized entities with deterministic signatures
  • Vocabulary Registry: Centralized management of semantic vocabularies with collision detection
  • Context Processing: Expansion, compaction and normalization of JSON-LD contexts
  • Graph Storage: In-memory or RDFLib-backed storage with query capabilities
  • SPARQL Integration: Execute queries against endpoints with ontology-based validation
  • Provenance Tracking: Record source information for all derived facts
  • Temporal Reasoning: Reason about time-based relationships and sequences
  • Verification: Cryptographic signing and validation with SHACL support
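The "deterministic signatures" idea can be illustrated with a minimal sketch. It assumes a sorted-key JSON serialization as a stand-in for real JSON-LD normalization (which typically relies on RDF dataset canonicalization and is not equivalent); the helper names `canonical_bytes` and `signature` are hypothetical and not part of Cogitarelink.

```python
import hashlib
import json

def canonical_bytes(doc: dict) -> bytes:
    """Serialize a JSON document deterministically (sorted keys, fixed separators)."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")

def signature(doc: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(doc)).hexdigest()

a = {"@type": "Person", "name": "Alice Smith"}
b = {"name": "Alice Smith", "@type": "Person"}  # same content, different key order

# Both documents hash to the same value because key order is normalized away.
print(signature(a) == signature(b))
```

The point of the design is that two agents holding the same facts can compare or sign them without first agreeing on a serialization order.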

Architecture Overview

Cogitarelink employs a modular architecture organized around several key subsystems:

Component Structure

cogitarelink/
  ├── core/          # Core data structures and processing
  │   ├── debug.py   # Logging and diagnostics
  │   ├── cache.py   # LRU/TTL caching
  │   ├── context.py # Context processing
  │   ├── entity.py  # Entity representation
  │   ├── graph.py   # Graph storage and query
  │   └── temporal.py # Temporal relationships
  │
  ├── vocab/         # Vocabulary management
  │   ├── registry.py  # Prefix → vocabulary mapping
  │   ├── collision.py # Term collision detection
  │   └── composer.py  # Context composition
  │
  ├── reason/        # Reasoning and inference
  │   ├── afford.py  # Shape/rule hints
  │   ├── obqc.py    # Ontology-based query checking
  │   ├── prov.py    # Provenance tracking
  │   └── sandbox.py # Rule execution environment
  │
  ├── tools/         # Integration tools
  │   ├── reason.py   # Function bridge to reason_over
  │   ├── sparql.py   # SPARQL query execution
  │   └── temporal.py # Temporal reasoning utilities
  │
  ├── verify/        # Verification and security
  │   ├── signer.py    # Cryptographic signatures
  │   └── validator.py # SHACL validation
  │
  ├── cli/           # Command-line interfaces
  │   ├── cli.py       # Base CLI
  │   └── agent_cli.py # Agent-enabled CLI
  │
  └── integration/   # External system integration
      └── retriever.py # Linked Open Data retrieval

Knowledge Artifact Layers

Cogitarelink organizes semantic knowledge into a layered stack:

  1. Context Layer (*.context.jsonld)
    • Term to IRI mappings
    • Localized vocabulary subsets
    • Compact JSON representation (<5KB)
  2. Ontology Layer (ontology.ttl/.jsonld)
    • Class/property definitions
    • Domain/range constraints
    • Semantic relationships
  3. Shapes/Rules Layer (shapes.ttl, rules.ttl)
    • SHACL constraints on data shape
    • SPARQL rules for inference
    • Validation patterns
  4. Data Layer (*.jsonld)
    • Actual entity instances
    • Event records
    • Knowledge statements
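As a rough illustration of what the Context Layer does, the sketch below expands compact terms into full IRIs using a small term-to-IRI mapping. It is a deliberately simplified stand-in for the JSON-LD 1.1 expansion algorithm (no `@vocab`, no compact IRIs, no value objects); the `context` contents and the `expand` helper are assumptions made for this example.

```python
SCHEMA = "http://schema.org/"

# A tiny context: compact term -> full IRI (assumed subset of Schema.org).
context = {
    "Person": SCHEMA + "Person",
    "name": SCHEMA + "name",
    "knows": SCHEMA + "knows",
}

def expand_term(term: str, ctx: dict) -> str:
    """Map a compact term to its IRI; keywords and absolute IRIs pass through."""
    if term.startswith("@") or "://" in term:
        return term
    return ctx.get(term, term)

def expand(node: dict, ctx: dict) -> dict:
    """Recursively expand keys and @type values of a JSON-LD-like node."""
    out = {}
    for key, value in node.items():
        if key == "@type":
            out[key] = expand_term(value, ctx)
        elif isinstance(value, dict):
            out[expand_term(key, ctx)] = expand(value, ctx)
        else:
            out[expand_term(key, ctx)] = value
    return out

expanded = expand({"@type": "Person", "name": "Alice Smith"}, context)
print(expanded)
```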

Data Flow

  1. Input: JSON-LD documents enter through EntityProcessor
  2. Normalization: Documents are expanded and normalized via ContextProcessor
  3. Storage: Entities and relationships are stored in GraphManager
  4. Reasoning: Rules are applied through the sandbox to derive new facts
  5. Verification: Data is validated with SHACL against shapes graphs
  6. Provenance: All derived facts are wrapped with provenance information
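The later steps of this flow can be sketched with plain Python sets. This is a toy model under stated assumptions: triples are tuples, the "rule" treats `knows` as symmetric purely for illustration, and the "shape" only checks that every subject has a name. None of the function names below come from Cogitarelink.

```python
KNOWS = "http://schema.org/knows"
NAME = "http://schema.org/name"

# Steps 1-3: assume documents were already expanded and stored as triples.
store = {
    ("ex:alice", NAME, "Alice Smith"),
    ("ex:bob", NAME, "Bob Jones"),
    ("ex:alice", KNOWS, "ex:bob"),
}

def symmetric_knows(triples: set) -> set:
    """Step 4: toy rule that derives the inverse of every 'knows' triple."""
    return {(o, p, s) for (s, p, o) in triples if p == KNOWS} - triples

def missing_names(triples: set) -> list:
    """Step 5: toy shape check, every subject needs at least one name."""
    subjects = {s for (s, _, _) in triples}
    named = {s for (s, p, _) in triples if p == NAME}
    return sorted(subjects - named)

derived = symmetric_knows(store)

# Step 6: record which rule produced each derived fact.
provenance = {t: {"rule": "symmetric_knows"} for t in derived}
store |= derived

print(len(store), missing_names(store))
```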

Quick Start

Cogitarelink provides several ways to work with semantic data, from basic entity management to advanced reasoning and integration with external knowledge sources.

Basic Entity and Graph Operations

from cogitarelink.core.entity import Entity
from cogitarelink.core.processor import EntityProcessor
from cogitarelink.core.graph import GraphManager
from cogitarelink.core.context import ContextProcessor

# Create the core components
ctx_proc = ContextProcessor()
graph = GraphManager(use_rdflib=True)  # Use RDFLib backend for SPARQL support
processor = EntityProcessor(ctx_proc, graph)

# Add entities with metadata
person = processor.add({
    "@type": "Person",
    "name": "Alice Smith",
    "jobTitle": "Software Engineer",
    "knows": {"@type": "Person", "name": "Bob Jones"}
}, vocab=["schema"])  # Use Schema.org vocabulary

# Query the graph (triple pattern matching)
person_id = person.id
connections = graph.query(subj=person_id, pred="http://schema.org/knows")

# Retrieve entities by ID or relationships
same_person = processor.get_by_id(person_id)
child_entities = processor.get_children(person_id)
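Triple pattern matching of the kind used by `graph.query(subj=..., pred=...)` can be sketched as follows. `TinyGraph` is a hypothetical in-memory store written for this example; it is not Cogitarelink's `GraphManager` and makes no claim about how that class is implemented.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

class TinyGraph:
    """Hypothetical in-memory triple store used only for illustration."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add_triple(self, subj: str, pred: str, obj: str) -> None:
        triple = (subj, pred, obj)
        if triple not in self._triples:  # keep the store duplicate-free
            self._triples.append(triple)

    def query(self, subj: Optional[str] = None, pred: Optional[str] = None,
              obj: Optional[str] = None) -> List[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        pattern = (subj, pred, obj)
        return [t for t in self._triples
                if all(want is None or want == got for want, got in zip(pattern, t))]

g = TinyGraph()
g.add_triple("ex:alice", "http://schema.org/knows", "ex:bob")
g.add_triple("ex:alice", "http://schema.org/name", "Alice Smith")

matches = g.query(subj="ex:alice", pred="http://schema.org/knows")
print(matches[0][2])  # object position of the first matching triple
```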

Using SPARQL for Queries and Knowledge Retrieval

from cogitarelink.tools.sparql import sparql_query
from cogitarelink.tools.sparql_tools import validate_query_against_ontology

# Create a SPARQL query
query = """
PREFIX schema: <http://schema.org/>
SELECT ?person ?name WHERE {
  ?person a schema:Person ;
          schema:name ?name .
}
"""

# Validate the query against an ontology (optional)
validation = validate_query_against_ontology(query, ontology_path="path/to/ontology.ttl")
if validation.get("valid", False):
    # Execute against external endpoint
    results = sparql_query(
        endpoint_url="https://query.wikidata.org/sparql",
        query=query,
        store_result=True,  # Store in GraphManager
        graph_id="people_data"  # Use named graph
    )

Provenance and Temporal Reasoning

from cogitarelink.reason.prov import wrap_patch_with_prov
from cogitarelink.core.temporal import TemporalRelationship

# Record provenance for derived facts
source_graph = "https://example.org/source_data"
agent_id = "https://example.org/agents/reasoner1"
new_triple = (person_id, "http://schema.org/award", "http://example.org/award/123")

# Add triple with provenance
with wrap_patch_with_prov(graph, source=source_graph, agent=agent_id):
    graph.add_triple(*new_triple)

# Work with temporal relationships
temporal_rel = TemporalRelationship.before(
    "http://example.org/event1", 
    "http://example.org/event2"
)
graph.add_temporal_relationship(temporal_rel)

Working with Entities

from cogitarelink.core.entity import Entity
from cogitarelink.vocab.registry import registry
from cogitarelink.vocab.composer import composer

# Explore available vocabularies
print(f"Available vocabularies: {', '.join(registry._v.keys())}")

# Create an entity with Schema.org vocabulary
person = Entity(vocab=["schema"], content={
    "@type": "Person",
    "name": "Alice Smith",
    "jobTitle": "Software Engineer",
    "email": "alice@example.com",
    "knows": {"@type": "Person", "name": "Bob Jones"}
})

# Entities are immutable and automatically assign IDs if not provided
print(f"Entity ID: {person.id}")

# Get the full JSON-LD representation with context
json_ld = person.as_json
print(f"JSON-LD representation includes {len(json_ld['@context'])} context terms")

# Entities automatically extract nested entities with @type as children
print(f"Number of child entities: {len(person.children)}")
if person.children:
    print(f"First child type: {person.children[0].content.get('@type')}")
    
# Entities provide deterministic normalization for signing and hashing
print(f"Entity signature (SHA-256): {person.sha256[:16]}...")

# Normalized representation (canonical form)
print("\nNormalized representation:")
norm = person.normalized
print(f"  - Type: {type(norm)}")
print(f"  - Length: {len(norm)}")
print(f"  - Sample: {norm[:60]}...")

Working with the Entity Processor and Graph

from cogitarelink.core.processor import EntityProcessor
from cogitarelink.core.graph import GraphManager
from cogitarelink.core.context import ContextProcessor

# Create core components
ctx_proc = ContextProcessor()
graph = GraphManager(use_rdflib="auto")  # Uses RDFLib if available, otherwise in-memory
processor = EntityProcessor(ctx_proc, graph)

# Add a document with multiple entities
document = {
    "@type": "CreativeWork",
    "name": "Research Paper on Semantic Web",
    "description": "An exploration of knowledge graphs and semantic technologies.",
    "author": {
        "@type": "Person",
        "name": "Dr. Smith",
        "affiliation": {
            "@type": "Organization",
            "name": "Example University",
            "location": {
                "@type": "Place",
                "name": "Example City"
            }
        }
    },
    "keywords": ["research", "linked data", "semantic web"]
}

# Add to processor - automatically handles nested entities
entity = processor.add(document, vocab=["schema"])

# Query related entities
print(f"Main entity: {entity.id} ({entity.content.get('@type')})")

# Get child entities
children = processor.get_children(entity.id)
print(f"Found {len(children)} direct child entities")
for child in children:
    print(f"  - {child.content.get('@type')}: {child.content.get('name')}")

# Graph queries
print("\nGraph query results:")
author_triples = graph.query(pred="http://schema.org/author")
if author_triples:
    author_id = author_triples[0][2]  # Object of the triple
    print(f"Found author with ID: {author_id}")
    
    # Find information about the author
    name_triples = graph.query(subj=author_id, pred="http://schema.org/name")
    if name_triples:
        print(f"Author name: {name_triples[0][2]}")
        
    # Find author affiliation
    affiliation_triples = graph.query(subj=author_id, pred="http://schema.org/affiliation")
    if affiliation_triples:
        affiliation_id = affiliation_triples[0][2]
        print(f"Author affiliated with: {processor.get_by_id(affiliation_id).content.get('name')}")
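How nested entities might be split out of a document can be sketched as below. This is an assumption-laden illustration: typed child objects are replaced by `@id` references, and missing IDs are filled with content-derived blank node labels. The `flatten` helper and the ID scheme are hypothetical and may differ from what `EntityProcessor` actually does.

```python
import hashlib
import json
from typing import Dict

def _node_id(node: dict) -> str:
    """Derive a stable blank-node style label from the node's content."""
    digest = hashlib.sha256(json.dumps(node, sort_keys=True).encode("utf-8")).hexdigest()
    return f"_:b{digest[:12]}"

def flatten(node: dict, out: Dict[str, dict]) -> str:
    """Store `node` and its typed descendants in `out`; return the node's ID."""
    flat = {}
    for key, value in node.items():
        if isinstance(value, dict) and "@type" in value:
            # Replace the nested entity with a reference to it.
            flat[key] = {"@id": flatten(value, out)}
        else:
            flat[key] = value
    node_id = flat.get("@id") or _node_id(flat)
    flat["@id"] = node_id
    out[node_id] = flat
    return node_id

document = {
    "@type": "CreativeWork",
    "name": "Research Paper on Semantic Web",
    "author": {
        "@type": "Person",
        "name": "Dr. Smith",
        "affiliation": {"@type": "Organization", "name": "Example University"},
    },
}

entities: Dict[str, dict] = {}
root_id = flatten(document, entities)
print(f"{len(entities)} entities extracted")
```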

Working with Vocabularies and Registry

The Vocabulary Registry is a central component for managing semantic vocabularies and composing JSON-LD contexts.

from cogitarelink.vocab.registry import registry
from cogitarelink.vocab.composer import composer
from cogitarelink.vocab.collision import collision_detector

# List available vocabularies in the registry
print("Available vocabularies:")
for prefix in registry._v.keys():
    entry = registry[prefix]
    uri = entry.uris.get("primary", "No primary URI")
    print(f"  {prefix}: {uri}")

# Compose a context from multiple vocabularies
context = composer.compose(["schema", "dc"])
print(f"\nComposed context with {len(context['@context'])} terms")

# Look at a sample of terms
sample_terms = list(context['@context'].keys())[:5]
print(f"Sample terms: {', '.join(sample_terms)}")

# Check for term collisions between vocabularies
collisions = collision_detector.detect_collisions(["schema", "foaf"])
if collisions:
    print("\nDetected vocabulary collisions:")
    for term, entries in collisions.items():
        print(f"  Term '{term}' defined differently in: {', '.join(entries.keys())}")

The registry is designed to handle vocabulary conflicts gracefully. When term collisions are detected, Cogitarelink can:

  1. Apply resolution strategies (prefix, prioritize, merge, or error)
  2. Create custom subsets of vocabularies to avoid conflicts
  3. Generate collision reports to help diagnose issues

This ensures that agent systems using different vocabulary sources can interact without semantic misalignment.
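Collision detection and the "prefix" resolution strategy can be sketched as follows. The two small contexts are assumed subsets of Schema.org and FOAF, and `detect_collisions` / `resolve_with_prefix` are hypothetical helpers written for this example, not the registry's own API.

```python
from typing import Dict

Contexts = Dict[str, Dict[str, str]]

def detect_collisions(contexts: Contexts) -> Dict[str, Dict[str, str]]:
    """Return terms that map to different IRIs in different vocabularies."""
    seen: Dict[str, Dict[str, str]] = {}
    for prefix, terms in contexts.items():
        for term, iri in terms.items():
            seen.setdefault(term, {})[prefix] = iri
    return {term: by_prefix for term, by_prefix in seen.items()
            if len(set(by_prefix.values())) > 1}

def resolve_with_prefix(contexts: Contexts) -> Dict[str, str]:
    """Merge contexts, namespacing only the colliding terms as 'prefix:term'."""
    collisions = detect_collisions(contexts)
    merged: Dict[str, str] = {}
    for prefix, terms in contexts.items():
        for term, iri in terms.items():
            key = f"{prefix}:{term}" if term in collisions else term
            merged[key] = iri
    return merged

contexts = {
    "schema": {"name": "http://schema.org/name", "email": "http://schema.org/email"},
    "foaf": {"name": "http://xmlns.com/foaf/0.1/name", "mbox": "http://xmlns.com/foaf/0.1/mbox"},
}

print(sorted(detect_collisions(contexts)))
print(sorted(resolve_with_prefix(contexts)))
```

Namespacing only the colliding terms keeps the merged context compact while leaving unambiguous terms usable in their short form.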

SPARQL Integration and Linked Data Retrieval

Cogitarelink provides tools for working with SPARQL endpoints and retrieving Linked Open Data, enabling agents to explore the web of data.

SPARQL Query Execution and Validation

from cogitarelink.tools.sparql import sparql_query, sparql_discover
from cogitarelink.tools.sparql_tools import validate_query_against_ontology

# Discover endpoint capabilities
endpoint = "https://query.wikidata.org/sparql"
metadata = sparql_discover(endpoint)
print(f"Discovered {len(metadata.get('predicates', []))} predicates at endpoint")

# Validate a SPARQL query against ontology constraints
query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX schema: <http://schema.org/>

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .  # instance of cat
  ?item schema:described ?desc .  # Intentional error - not a valid property
}
"""

# Get validation results
validation = validate_query_against_ontology(query, ontology_path="cogitarelink/data/system/obqc.ttl")
if not validation.get("valid", True):
    print("Query validation failed:")
    for violation in validation.get("violations", []):
        print(f"  - {violation}")
    
    # Auto-fix the query
    from cogitarelink.tools.sparql_tools import refine_query_with_ontology
    refined = refine_query_with_ontology(query, ontology_path="cogitarelink/data/system/obqc.ttl")
    if refined.get("is_valid", False):
        print("\nRefined query:")
        print(refined.get("refined_query"))

# Execute a valid query and store results
valid_query = """
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .  # instance of cat
  ?item rdfs:label ?itemLabel .
  FILTER(LANG(?itemLabel) = "en")
} LIMIT 5
"""

results = sparql_query(
    endpoint_url=endpoint,
    query=valid_query,
    query_type="SELECT",
    result_format="json",
    store_result=True,
    graph_id="wikidata_cats"
)

print(f"\nQuery returned {len(results.get('results', []))} results")
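If an endpoint response is available in the standard SPARQL 1.1 JSON results format (`head.vars` plus `results.bindings`), it can be flattened into plain rows as sketched below. Whether `sparql_query` returns this raw structure or a processed form is not stated here, so the sample `response` and the `bindings_to_rows` helper are assumptions for illustration.

```python
from typing import Dict, List, Optional

def bindings_to_rows(response: dict) -> List[Dict[str, Optional[str]]]:
    """Flatten SPARQL JSON results into one dict per row (unbound vars become None)."""
    variables = response["head"]["vars"]
    rows = []
    for binding in response["results"]["bindings"]:
        rows.append({var: binding[var]["value"] if var in binding else None
                     for var in variables})
    return rows

# Hand-written sample in the SPARQL 1.1 JSON results layout (not real endpoint data).
response = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [
        {"item": {"type": "uri", "value": "http://example.org/cat/1"},
         "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Example Cat"}},
        {"item": {"type": "uri", "value": "http://example.org/cat/2"}},
    ]},
}

for row in bindings_to_rows(response):
    print(row)
```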

Linked Data Retrieval and Integration

from cogitarelink.integration.retriever import LODRetriever, search_wikidata

# Create a retriever with default settings
retriever = LODRetriever()

# Search for an entity on Wikidata
results = search_wikidata("Douglas Adams", limit=3)
if results:
    print(f"Found {len(results)} search results:")
    for result in results:
        print(f"  - {result.get('label')}: {result.get('uri')}")
    
    # Retrieve data for the first match
    entity_uri = results[0]["uri"]
    result = retriever.retrieve(entity_uri)
    
    if result.get("success", False):
        # Process the retrieved data
        data = result.get("data", {})
        print(f"\nRetrieved {len(data)} data elements for {entity_uri}")
        
        # Add to the graph manager for integration with other data
        from cogitarelink.core.graph import GraphManager
        graph = GraphManager(use_rdflib=True)
        graph.ingest_jsonld(data, graph_id=entity_uri)
        print(f"Ingested data into graph with ID: {entity_uri}")

Provenance and Verification

Cogitarelink provides robust tools for tracking data provenance and cryptographically verifying information, crucial capabilities for trustworthy agent systems.

Provenance Tracking

from cogitarelink.reason.prov import wrap_patch_with_prov
from cogitarelink.core.graph import GraphManager

# Create a graph with provenance tracking
graph = GraphManager(use_rdflib=True)

# Define provenance metadata
source = "https://example.org/dataset/123"
agent = "https://example.org/agents/reasoner1"
activity = "https://example.org/activities/inference/456"

# Add triples with provenance context
with wrap_patch_with_prov(graph, source=source, agent=agent, activity=activity):
    # All triples added in this context will have provenance metadata
    graph.add_triple(
        "http://example.org/entity/alice",
        "http://schema.org/knows",
        "http://example.org/entity/bob"
    )
    
    graph.add_triple(
        "http://example.org/entity/alice",
        "http://schema.org/memberOf",
        "http://example.org/entity/organization1"
    )

# Query the provenance information
prov_triples = graph.query(pred="http://www.w3.org/ns/prov#wasGeneratedBy")
if prov_triples:
    activity_node = prov_triples[0][2]
    print(f"Found provenance activity: {activity_node}")
    
    # Get information about the provenance activity
    agent_triples = graph.query(subj=activity_node, pred="http://www.w3.org/ns/prov#wasAssociatedWith")
    if agent_triples:
        print(f"Activity associated with agent: {agent_triples[0][2]}")
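The context-manager pattern behind provenance wrapping can be sketched with `contextlib`. This is a simplified model, assuming triples live in a plain list and provenance is attached to subjects rather than to individual statements; `record_provenance` is a hypothetical helper and not the implementation of `wrap_patch_with_prov`.

```python
from contextlib import contextmanager

PROV = "http://www.w3.org/ns/prov#"

@contextmanager
def record_provenance(triples: list, source: str, agent: str, activity: str):
    """Attach PROV-O style metadata to every subject added inside the block."""
    start = len(triples)
    yield
    added = triples[start:]
    if not added:
        return
    # Describe the activity once, then link each new subject to it.
    triples.append((activity, PROV + "wasAssociatedWith", agent))
    triples.append((activity, PROV + "used", source))
    for subject in dict.fromkeys(s for s, _, _ in added):  # unique, order-preserving
        triples.append((subject, PROV + "wasGeneratedBy", activity))

triples = []
source = "https://example.org/dataset/123"
agent = "https://example.org/agents/reasoner1"
activity = "https://example.org/activities/inference/456"

with record_provenance(triples, source, agent, activity):
    triples.append(("http://example.org/entity/alice",
                    "http://schema.org/knows",
                    "http://example.org/entity/bob"))

for t in triples:
    print(t)
```

Because the bookkeeping runs only after the block completes normally, a failed patch leaves no provenance records behind.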

Cryptographic Verification

from cogitarelink.verify.signer import generate_keypair, sign, verify
from cogitarelink.verify.validator import validate_entity_shape
from cogitarelink.core.entity import Entity

# Generate a keypair for signing
private_key, public_key = generate_keypair()
print(f"Generated keypair - public key: {public_key[:16]}...")

# Create a credential
credential = {
    "@type": "VerifiableCredential",
    "issuer": "https://example.edu",
    "issuanceDate": "2023-06-15T12:00:00Z",
    "credentialSubject": {
        "id": "did:example:123",
        "name": "Alice Smith",
        "degree": {
            "type": "BachelorDegree",
            "name": "Bachelor of Science and Arts",
            "college": "Example University"
        }
    }
}

# Create entity with required vocab parameter
vc_entity = Entity(vocab=["schema"], content=credential)

# Sign the credential
signature = sign(vc_entity.normalized, private_key)
print(f"Signed entity - signature: {signature[:16]}...")

# Add signature to entity
signed_content = vc_entity.content.copy()
signed_content["proof"] = {
    "type": "Ed25519Signature2020",
    "created": "2023-06-15T12:30:00Z",
    "verificationMethod": "did:example:issuer#key1",
    "proofPurpose": "assertionMethod",
    "proofValue": signature
}
signed_entity = Entity(vocab=["schema"], content=signed_content)

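# Verify the signature against the payload that was actually signed.
# signed_entity now contains the proof block, so its normalized form
# differs from the one passed to sign() above.
is_valid = verify(vc_entity.normalized, public_key, signature)
print(f"Signature verification: {'Valid' if is_valid else 'Invalid'}")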

# Validate against a SHACL shape (if available)
shape_graph = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/> .

ex:CredentialShape a sh:NodeShape ;
  sh:targetClass schema:VerifiableCredential ;
  sh:property [
    sh:path schema:issuer ;
    sh:minCount 1 ;
  ] ;
  sh:property [
    sh:path schema:issuanceDate ;
    sh:datatype schema:DateTime ;
    sh:minCount 1 ;
  ] ;
  sh:property [
    sh:path schema:credentialSubject ;
    sh:minCount 1 ;
  ] .
"""

validation_result = validate_entity_shape(signed_entity, shape_graph)
if validation_result.get("valid", False):
    print("Entity validation against SHACL shape: Passed")
else:
    print("Entity validation against SHACL shape: Failed")
    for violation in validation_result.get("violations", []):
        print(f"  - {violation}")
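One detail worth making explicit is that the signature covers the credential without its proof block, so a verifier has to strip the proof before recomputing. The sketch below shows that detached-proof pattern using HMAC-SHA256 from the standard library as a symmetric stand-in for Ed25519; the function names are hypothetical and this is not how `cogitarelink.verify.signer` is implemented.

```python
import hashlib
import hmac
import json

def canonical(doc: dict) -> bytes:
    """Deterministic serialization used as the signing payload."""
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")

def sign_doc(doc: dict, key: bytes) -> str:
    """HMAC-SHA256 stand-in for a real asymmetric signature."""
    return hmac.new(key, canonical(doc), hashlib.sha256).hexdigest()

def attach_proof(doc: dict, key: bytes) -> dict:
    """Return a copy of the document with a proof over the proof-less payload."""
    signed = dict(doc)
    signed["proof"] = {"proofValue": sign_doc(doc, key)}
    return signed

def verify_doc(signed: dict, key: bytes) -> bool:
    """Strip the proof, recompute the signature, and compare in constant time."""
    payload = {k: v for k, v in signed.items() if k != "proof"}
    claimed = signed.get("proof", {}).get("proofValue", "")
    return hmac.compare_digest(sign_doc(payload, key), claimed)

key = b"demo-key-for-illustration-only"
credential = {"@type": "VerifiableCredential", "issuer": "https://example.edu"}
signed = attach_proof(credential, key)

tampered = dict(signed, issuer="https://evil.example")
print(verify_doc(signed, key), verify_doc(tampered, key))
```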

CLI and Agent Tools

Cogitarelink provides command-line interfaces and agent-compatible tools for working with semantic data.

Command-Line Interface

# Examples of CLI commands (to be run in a terminal, not shown in execution)

# The basic CLI helps manage vocabularies and contexts
from cogitarelink.cli.cli import app

# The primary CLI command structure
"""
$ python -m cogitarelink.cli.cli --help
Usage: cli.py [OPTIONS] COMMAND [ARGS]...

  Cogitarelink CLI for working with semantic data

Options:
  --help  Show this message and exit.

Commands:
  context   Work with JSON-LD contexts
  registry  Manage the vocabulary registry
  validate  Validate entity files
"""

# Working with vocabularies
"""
$ python -m cogitarelink.cli.cli registry list
Available vocabularies:
  schema: http://schema.org/
  dc: http://purl.org/dc/terms/
  ...

$ python -m cogitarelink.cli.cli context compose --vocab schema --vocab dc -o composed_context.jsonld
Composed context with 2987 terms
Wrote context to composed_context.jsonld
"""

Agent-Enabled CLI

The agent_cli module provides an LLM-powered interface to Cogitarelink:

from cogitarelink.cli.agent_cli import agent_app

# Examples of agent CLI commands (to be run in a terminal)
"""
$ python -m cogitarelink.cli.agent_cli search "organizations in New York"
Searching for organizations in New York...
[Agent performs SPARQL queries against appropriate endpoints]
Found 15 organizations matching your query.
1. Metropolitan Museum of Art (https://www.wikidata.org/entity/Q160236)
2. Columbia University (https://www.wikidata.org/entity/Q49088)
...

$ python -m cogitarelink.cli.agent_cli explore Q160236
Exploring Metropolitan Museum of Art (Q160236)...
[Agent retrieves and processes entity data]
The Metropolitan Museum of Art is an art museum in New York City, founded in 1870.
Location: 1000 Fifth Avenue, New York
Collections: 2 million works spanning 5,000 years
Website: https://www.metmuseum.org/
...
"""

Tool Functions for Agent Integration

Cogitarelink provides functions that can be registered as tools with LLM agent frameworks:

from cogitarelink.tools.reason import reason_over
from cogitarelink.tools.sparql import sparql_query, sparql_discover
from cogitarelink.cli.vocab_tools import search_vocabulary

# Example of an OpenAI function-calling compatible tool spec
REASON_OVER_SPEC = {
    "name": "reason_over",
    "description": "Run logical reasoning over a knowledge base using SPARQL rules",
    "parameters": {
        "type": "object",
        "properties": {
            "query": {
                "type": "string", 
                "description": "SPARQL CONSTRUCT query to apply"
            },
            "graph_id": {
                "type": "string",
                "description": "Named graph to reason over"
            }
        },
        "required": ["query", "graph_id"]
    }
}

# Agent tools can be registered with OpenAI, Claude, or other frameworks
"""
tools = [
    {"type": "function", "function": REASON_OVER_SPEC},
    {"type": "function", "function": SPARQL_QUERY_SPEC},
    # Other tool specs...
]

# Then used in agent calls
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Find all museums in Paris and their founding dates"}
    ],
    tools=tools,
    tool_choice="auto"
)
"""

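Once a model returns a tool call, the host application still has to route it to a Python function. A minimal dispatcher might look like the sketch below; `reason_over_stub` is a placeholder standing in for the real `reason_over`, and the registry layout is an assumption made for this example rather than something Cogitarelink prescribes.

```python
import json

def reason_over_stub(query: str, graph_id: str) -> dict:
    """Placeholder for the real tool; echoes what it was asked to do."""
    return {"graph_id": graph_id, "query_length": len(query)}

# Required argument names mirror the "required" list of the tool spec.
TOOL_REQUIRED = {"reason_over": ["query", "graph_id"]}
TOOL_IMPLS = {"reason_over": reason_over_stub}

def dispatch(name: str, arguments_json: str) -> dict:
    """Validate a tool call and invoke the matching Python function."""
    if name not in TOOL_IMPLS:
        return {"error": f"unknown tool: {name}"}
    args = json.loads(arguments_json)
    missing = [k for k in TOOL_REQUIRED[name] if k not in args]
    if missing:
        return {"error": "missing arguments: " + ", ".join(missing)}
    return TOOL_IMPLS[name](**args)

call_args = json.dumps({"query": "CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }",
                        "graph_id": "people_data"})
print(dispatch("reason_over", call_args))
```

Returning errors as data rather than raising lets the agent see what went wrong and retry with corrected arguments.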
Example: Working with Vocabularies and Registry

from cogitarelink.vocab.registry import registry
from cogitarelink.vocab.composer import composer

# List available vocabularies in the registry
print("Available vocabularies:")
for prefix in registry._v.keys():
    entry = registry[prefix]
    uri = entry.uris.get("primary", "No primary URI")
    print(f"  {prefix}: {uri}")

# Compose a context from a vocabulary (schema.org)
context = composer.compose(["schema"])
print(f"\nComposed context with {len(context['@context'])} terms")

# Look at a few key terms
sample_terms = list(context['@context'].keys())[:5]
print(f"Sample terms: {', '.join(sample_terms)}")

The output would show available vocabularies like schema.org, Dublin Core, and FOAF, along with their primary URIs. A composed context would include thousands of terms from the selected vocabulary.

Example: Retrieving Linked Data

from cogitarelink.integration.retriever import LODRetriever, search_wikidata

# Create a retriever
retriever = LODRetriever()

# Search for an entity on Wikidata
results = search_wikidata("Douglas Adams", limit=3)
print(f"Found {len(results)} search results")

if results:
    # Get the first result's Wikidata URI
    entity_uri = results[0]["uri"]
    print(f"Retrieving data for: {entity_uri}")
    
    # Retrieve the entity data
    result = retriever.retrieve(entity_uri)
    
    if result.get("success", False):
        data = result.get("data", {})
        print("Successfully retrieved data")
        
        # Process the retrieved data
        if "@graph" in data:
            # Handle JSON-LD graph format
            graph_items = data["@graph"]
            # Find the main entity
            main_item = next((item for item in graph_items 
                              if item.get("@id") == entity_uri), None)
            
            if main_item:
                # Work with entity properties
                print(f"Entity type: {main_item.get('@type')}")
        else:
            print(f"Data contains {len(data)} top-level keys")
    else:
        print(f"Error retrieving data: {result.get('error')}")

This example shows how to search Wikidata and retrieve structured data for a specific entity. The retriever handles content negotiation and parses the returned data into a JSON-LD format.
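Content negotiation for Linked Data usually comes down to sending an `Accept` header that lists RDF serializations in order of preference. The sketch below only builds such a request with the standard library and does not send it; the media-type ordering and the `build_lod_request` helper are assumptions, not a description of `LODRetriever` internals.

```python
from urllib.request import Request

def build_lod_request(uri: str) -> Request:
    """Build (but do not send) a request that prefers JSON-LD over other RDF formats."""
    accept = "application/ld+json, text/turtle;q=0.8, application/rdf+xml;q=0.5"
    return Request(uri, headers={"Accept": accept})

req = build_lod_request("http://www.wikidata.org/entity/Q42")
print(req.full_url)
print(req.get_header("Accept"))
```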

Example: Using Verifiable Credentials

from cogitarelink.verify.signer import generate_keypair, sign
from cogitarelink.core.entity import Entity

# Generate a keypair for signing
private_key, public_key = generate_keypair()
print(f"Generated keypair - public key: {public_key[:16]}...")

# Create a credential with a simplified structure
credential = {
    "@type": "VerifiableCredential",
    "issuer": "https://example.edu",
    "issuanceDate": "2023-06-15T12:00:00Z",
    "credentialSubject": {
        "name": "Alice Smith",
        "degree": "Bachelor of Science"
    }
}

# Create entity with required vocab parameter
vc_entity = Entity(vocab=["schema"], content=credential)

# Sign the credential
signature = sign(vc_entity.normalized, private_key)
print(f"Signed entity - signature: {signature[:16]}...")

# Access the credential subject data
subject_data = vc_entity.content.get("credentialSubject", {})
print(f"Credential subject: {subject_data.get('name')}")
print(f"Credential contains degree: {subject_data.get('degree')}")

This example demonstrates creating and signing a Verifiable Credential, using cryptographic signatures to establish authenticity. The credential contains claims about a subject that can be verified independently.

Developer Guide

Cogitarelink is developed using nbdev, which allows for literate programming in Jupyter notebooks.

Development Environment Setup

# Create a virtual environment using uv (recommended)
$ uv venv

# Install in development mode with all dev dependencies
$ uv pip install -e ".[dev]"

# Install git hooks for nbdev
$ nbdev_install_hooks

Development Workflow

The recommended development workflow:

  1. Make changes in notebook files (*.ipynb in the project root)

  2. Export changes to Python modules:

    $ nbdev_prepare
    
  3. Run tests to ensure everything works:

    $ nbdev_test
    
  4. Commit changes:

    $ git add .
    $ git commit -m "Description of changes"
    

Agent-Driven Development Workflow

Cogitarelink uses agent-driven development with AI systems like Claude Code:

  1. Define high-level tasks in natural language
  2. Have AI coding agents generate implementations
  3. Review and refine AI-generated code
  4. Add tests and documentation
  5. Export from notebooks to Python modules

For example:

# Ask Claude Code to implement a feature
$ claude "Implement a function to validate SPARQL queries against a SHACL ontology"

# Review the implementation in the notebook
# Add tests and refine as needed
$ nbdev_test
$ nbdev_prepare

This approach is intended to speed up development; human review and tests (steps 3 and 4) remain the quality gate.

Key Development Guidelines

  • Notebook First: All code should start in notebooks and be exported to Python
  • Type Hints: Use Python type hints everywhere for better IDE support
  • Tests: Include tests for all public functions
  • Documentation: Each module/function should have clear documentation
  • SHACL/SPARQL: Business logic belongs in data artifacts (SHACL/SPARQL rules)
  • Provenance: All derived facts must have traceable provenance
  • Caching: Use appropriate caching for performance-critical operations

Installation

Cogitarelink can be installed from PyPI or directly from the GitHub repository:

# Install from PyPI
pip install cogitarelink

# Install latest from GitHub
pip install -U git+https://github.com/la3d/cogitarelink.git

Dependencies

Cogitarelink has a modular dependency structure to accommodate different use cases:

Core Dependencies (Always Required)

  • pyld: JSON-LD 1.1 processing
  • pydantic: Data validation and settings
  • fastcore: Core utilities

Optional Dependencies

  • rdflib: RDF graph handling (required for SPARQL features)
  • httpx: HTTP client for Linked Data retrieval
  • typer: CLI interface support
  • rich: Formatted terminal output
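Code that depends on an optional package can check for it before choosing a backend, similar in spirit to the `use_rdflib="auto"` option shown earlier. The sketch below uses `importlib.util.find_spec`; the `choose_backend` function and its return values are hypothetical and do not reflect how `GraphManager` makes this decision.

```python
from importlib.util import find_spec

def choose_backend(use_rdflib="auto") -> str:
    """Pick 'rdflib' or 'memory' depending on the flag and what is installed."""
    available = find_spec("rdflib") is not None
    if use_rdflib == "auto":
        return "rdflib" if available else "memory"
    if use_rdflib and not available:
        raise ImportError("rdflib backend requested but rdflib is not installed")
    return "rdflib" if use_rdflib else "memory"

print(choose_backend("auto"))
print(choose_backend(False))
```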

Optional dependencies are grouped into extras:

# Full installation with all optional dependencies
pip install "cogitarelink[all]"

# Just core + RDF support
pip install "cogitarelink[rdf]"

Feature-based Installation

For specific use cases, you can install just the features you need:

# For CLI tools
pip install "cogitarelink[cli]"

# For SPARQL and reasoning tools
pip install "cogitarelink[sparql]"

# For development
pip install "cogitarelink[dev]"

Documentation and Resources

Documentation

Documentation for Cogitarelink is available in several forms:

Additional Resources

Community and Support

# make sure cogitarelink package is installed in development mode
$ uv venv
$ uv pip install -e ".[dev]"

# make changes under nbs/ directory
# ...

# compile to have changes apply to cogitarelink
$ nbdev_prepare

Documentation

Documentation is hosted on this repository’s GitHub Pages site.

Contributing

We welcome contributions! Please see our contributing guide for details.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

About

Linked Data Navigation with AI Agents
