Copy-and-Paste: 2. Implement Basic Copy-and-Paste Task #111

@tnaum-ms

⚠️ Important: Use the /feature/copy-and-paste branch as the base branch for this feature.

Covers Development Plan Point: 2

Note: The interface definitions below are an initial draft and are open for discussion and refinement once development begins.

Description:
This task implements the core logic for a "Copy-and-Paste" operation as a task managed by the Task Engine. The initial version provides basic functionality and basic error handling: it aborts on the first error or conflict.

Requirements:

  1. Task Implementation:

    • Create a CopyPasteCollectionTask class that implements the Task interface from the Task Engine.
  2. Configuration:

    • The task should accept a configuration object with:

      • Source: connectionId, databaseName, collectionName.

      • Target: connectionId, databaseName, collectionName (or newCollectionName).

      • Conflict Resolution: A basic configuration, initially hardcoded to 'abort' on any error or conflict.

        enum ConflictResolutionStrategy {
          Abort = 'abort',
          // Future options: Overwrite = 'overwrite', Skip = 'skip'
        }
        
        interface CopyPasteConfig {
          source: { connectionId: string; databaseName: string; collectionName: string };
          target: { connectionId: string; databaseName: string; collectionName: string };
          // For this basic task, conflict resolution is simplified
          onConflict: ConflictResolutionStrategy; // Initially, only ConflictResolutionStrategy.Abort will be implemented
          /**
           * Optional reference to a connection manager or client object.
           * For now, this is typed as `any` to allow flexibility.
           * Specific task implementations (e.g., for MongoDB) will cast this to their
           * required client/connection type. A more generic interface or base class
           * for connection management might be introduced later.
           * This allows the task to potentially reuse existing connections or manage
           * them more effectively if needed, beyond just using connectionId.
           */
          connectionManager?: any; // e.g. could be cast to a MongoDB client instance
        }
  3. Connection Handling:

    • Use connectionId to obtain connections through the existing connection management, which already supports the various authentication methods.
    • Network resilience is handled by the existing Connection class and is out of scope for this task. The task should abort with an error if the connection cannot be recovered.
  4. Data Transfer Mechanism:

    • Implement a buffer-based streaming approach:
      • One asynchronous operation reads documents from the source collection into an in-memory buffer. It should pause reading if the buffer is full.
      • Another asynchronous operation reads documents from the buffer and writes them to the target collection using bulk operations (e.g., MongoDB insertMany).
    • Because the buffer has a fixed capacity, memory usage is bounded by the buffer size rather than by the size of the source collection.
  5. Database-Agnostic Design:

    • The CopyPasteCollectionTask should be database-type-agnostic. It will be constructed with database-specific reader and writer components.

    • Define interfaces for these components:

      interface DocumentDetails {
        // Represents a single document.
        // The `id` is crucial for conflict resolution and tracking.
        id: any; // The document's unique identifier (e.g., _id in MongoDB)
        // The `documentContent` is treated as opaque data by the core task logic.
        // Specific readers/writers will know how to interpret/serialize this.
        // For MongoDB, this would typically be a BSON document.
        documentContent: unknown;
      }
      
      interface DocumentReader {
        // Streams documents from the source
        streamDocuments(
          connectionId: string,
          databaseName: string,
          collectionName: string,
        ): AsyncIterable<DocumentDetails>;
        // Counts documents in the source for progress calculation (initial phase)
        countDocuments(connectionId: string, databaseName: string, collectionName: string): Promise<number>;
      }
      
      interface DocumentWriterOptions {
        // Initially, this might be simple, like batch size.
        // Conflict handling details will be added in a later task.
      }
      
      interface BulkWriteResult {
        insertedCount: number;
        errors: Array<{ documentId?: any; error: any }>;
      }
      
      interface DocumentWriter {
        // Writes documents in bulk to the target
        writeDocuments(
          connectionId: string,
          databaseName: string,
          collectionName: string,
          documents: DocumentDetails[],
          options?: DocumentWriterOptions,
        ): Promise<BulkWriteResult>;
        // May need methods for pre-flight checks or setup, e.g., ensuring collection exists.
        // ensureCollectionExists(connectionId: string, databaseName: string, collectionName: string): Promise<void>;
      }
  6. Initial Count for Progress:

    • The task initialization phase should run a count operation on the source collection to determine the total number of documents to be copied. This will be used for progress reporting.
  7. Basic Testing:

    • Implement basic Jest tests for the CopyPasteCollectionTask to be run on demand.
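
The buffer-based transfer described in requirement 4 could look roughly like the sketch below. It is only an illustration under assumptions: `BoundedBuffer`, `copyWithBuffer`, `bufferSize`, `batchSize`, and the `writeBatch` callback are hypothetical names that do not appear in this issue, and the real task would call a DocumentWriter instead of a plain callback. The reader side waits while the buffer is full, and the writer side drains it in batches.

```typescript
// Hypothetical sketch: none of these names come from the issue except DocumentDetails.
interface DocumentDetails {
  id: any;
  documentContent: unknown;
}

/** Fixed-capacity async queue: push() waits while full, take() waits while empty. */
class BoundedBuffer<T> {
  private items: T[] = [];
  private closed = false;
  private waiters: Array<() => void> = [];

  constructor(private readonly capacity: number) {}

  private wake(): void {
    const pending = this.waiters;
    this.waiters = [];
    pending.forEach((resolve) => resolve());
  }

  private wait(): Promise<void> {
    return new Promise<void>((resolve) => {
      this.waiters.push(resolve);
    });
  }

  /** Resolves to false if the buffer was closed, so the producer can stop. */
  async push(item: T): Promise<boolean> {
    while (this.items.length >= this.capacity && !this.closed) await this.wait();
    if (this.closed) return false;
    this.items.push(item);
    this.wake();
    return true;
  }

  /** Returns up to `max` items; an empty array means closed and drained. */
  async take(max: number): Promise<T[]> {
    while (this.items.length === 0 && !this.closed) await this.wait();
    const batch = this.items.splice(0, max);
    this.wake();
    return batch;
  }

  close(): void {
    this.closed = true;
    this.wake();
  }
}

async function copyWithBuffer(
  source: AsyncIterable<DocumentDetails>,
  writeBatch: (docs: DocumentDetails[]) => Promise<void>,
  bufferSize = 1000,
  batchSize = 100,
): Promise<number> {
  const buffer = new BoundedBuffer<DocumentDetails>(bufferSize);
  let copied = 0;

  // Reader side: fills the buffer and pauses inside push() while it is full.
  const produce = async (): Promise<void> => {
    try {
      for await (const doc of source) {
        if (!(await buffer.push(doc))) break; // writer side aborted
      }
    } finally {
      buffer.close(); // lets the writer side drain and finish
    }
  };

  // Writer side: drains the buffer in batches; any write error aborts the copy.
  const consume = async (): Promise<void> => {
    try {
      for (;;) {
        const batch = await buffer.take(batchSize);
        if (batch.length === 0) return;
        await writeBatch(batch);
        copied += batch.length;
      }
    } finally {
      buffer.close(); // unblocks the reader side if a write failed
    }
  };

  await Promise.all([produce(), consume()]);
  return copied;
}
```

In this sketch a failed write closes the buffer, which stops the reader at its next `push`, and `Promise.all` rejects with the write error. If the reader fails instead, documents already buffered may still be written before the rejection surfaces; whether that is acceptable under the 'abort' strategy is a design decision left open here.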

Acceptance Criteria:

  • CopyPasteCollectionTask can successfully copy all documents from a source collection to a target collection on the same or different servers (using provided DocumentReader and DocumentWriter).
  • The task correctly reports its status (pending, initializing (counting), running, completed, failed) via the Task Engine.
  • The task aborts if any error occurs during reading or writing in this basic version (respecting ConflictResolutionStrategy.Abort).
  • Memory usage stays bounded by the buffer size, independent of the size of the source collection.
  • Basic unit tests pass.
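
One way to exercise these criteria without a real database is to use in-memory test doubles for the reader and writer. The sketch below is only an assumption about how that might look: `InMemoryReader`, `InMemoryWriter`, `runCopy`, `CollectionRef`, `CopyStatus`, and the `onStatus` callback are hypothetical, the actual Task Engine status API is not shown in this issue, and for brevity the copy loop writes batches sequentially rather than through a bounded buffer.

```typescript
// Draft interfaces repeated (trimmed) so the sketch compiles on its own.
interface DocumentDetails { id: any; documentContent: unknown }
interface BulkWriteResult { insertedCount: number; errors: Array<{ documentId?: any; error: any }> }
interface DocumentReader {
  streamDocuments(connectionId: string, databaseName: string, collectionName: string): AsyncIterable<DocumentDetails>;
  countDocuments(connectionId: string, databaseName: string, collectionName: string): Promise<number>;
}
interface DocumentWriter {
  writeDocuments(connectionId: string, databaseName: string, collectionName: string,
    documents: DocumentDetails[]): Promise<BulkWriteResult>;
}

// Everything below is hypothetical and exists only for illustration.
type CollectionRef = { connectionId: string; databaseName: string; collectionName: string };
type CopyStatus = 'pending' | 'initializing' | 'running' | 'completed' | 'failed';
type Store = Map<string, DocumentDetails[]>;
const storeKey = (r: CollectionRef): string => `${r.connectionId}/${r.databaseName}/${r.collectionName}`;

class InMemoryReader implements DocumentReader {
  constructor(private readonly store: Store) {}
  async *streamDocuments(connectionId: string, databaseName: string, collectionName: string): AsyncIterable<DocumentDetails> {
    for (const doc of this.store.get(storeKey({ connectionId, databaseName, collectionName })) ?? []) yield doc;
  }
  async countDocuments(connectionId: string, databaseName: string, collectionName: string): Promise<number> {
    return (this.store.get(storeKey({ connectionId, databaseName, collectionName })) ?? []).length;
  }
}

class InMemoryWriter implements DocumentWriter {
  constructor(private readonly store: Store) {}
  async writeDocuments(connectionId: string, databaseName: string, collectionName: string,
    documents: DocumentDetails[]): Promise<BulkWriteResult> {
    const k = storeKey({ connectionId, databaseName, collectionName });
    const target = this.store.get(k) ?? [];
    this.store.set(k, target);
    const result: BulkWriteResult = { insertedCount: 0, errors: [] };
    for (const doc of documents) {
      // A duplicate id stands in for a unique-key conflict in a real database.
      if (target.some((existing) => existing.id === doc.id)) {
        result.errors.push({ documentId: doc.id, error: new Error('duplicate id') });
      } else {
        target.push(doc);
        result.insertedCount++;
      }
    }
    return result;
  }
}

async function runCopy(
  reader: DocumentReader, writer: DocumentWriter,
  source: CollectionRef, target: CollectionRef,
  onStatus: (status: CopyStatus) => void, batchSize = 2,
): Promise<{ total: number; copied: number }> {
  onStatus('initializing'); // counting phase
  try {
    const total = await reader.countDocuments(source.connectionId, source.databaseName, source.collectionName);
    onStatus('running');
    let copied = 0;
    let batch: DocumentDetails[] = [];
    const flush = async (): Promise<void> => {
      const res = await writer.writeDocuments(target.connectionId, target.databaseName, target.collectionName, batch);
      // 'abort' strategy: any reported write error fails the whole copy.
      if (res.errors.length > 0) throw new Error(`aborted: ${res.errors.length} write error(s)`);
      copied += res.insertedCount;
      batch = [];
    };
    for await (const doc of reader.streamDocuments(source.connectionId, source.databaseName, source.collectionName)) {
      batch.push(doc);
      if (batch.length >= batchSize) await flush();
    }
    if (batch.length > 0) await flush();
    onStatus('completed');
    return { total, copied };
  } catch (err) {
    onStatus('failed');
    throw err;
  }
}
```

A Jest test could seed the store with a few documents, run `runCopy` once and expect `copied` to equal `total`, then run it again into the same target and expect a rejection with the final status 'failed'.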

Dependencies:

  • Copy-and-Paste: 1. Implement Core Task Engine

Status: Done