Skip to content

Latest commit

 

History

History
350 lines (271 loc) · 9.27 KB

File metadata and controls

350 lines (271 loc) · 9.27 KB

Amazon S3 Adapter Integration Guide

This guide provides step-by-step instructions for integrating the Amazon S3 adapter with the Firefly ECM Library.

Overview

The S3 adapter provides document storage capabilities using Amazon S3 as the backend. It supports:

  • Document CRUD operations
  • Binary content storage and retrieval
  • Metadata management using S3 object metadata
  • Folder organization using S3 key prefixes
  • Document versioning using S3 versioning
  • Multipart uploads for large files
  • Server-side encryption
  • Multiple storage classes

Prerequisites

Before integrating the S3 adapter, ensure you have:

  1. AWS Account: An active AWS account with S3 access
  2. S3 Bucket: A dedicated S3 bucket for document storage
  3. IAM Permissions: Appropriate IAM permissions for S3 operations
  4. Java 21+: The ECM library requires Java 21 or higher
  5. Spring Boot 3.0+: Compatible Spring Boot version

Step 1: Add Dependencies

Add the S3 adapter dependency to your pom.xml:

<dependencies>
    <!-- Core ECM Library -->
    <dependency>
        <groupId>org.fireflyframework</groupId>
        <artifactId>fireflyframework-ecm-core</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </dependency>
    
    <!-- S3 Adapter -->
    <dependency>
        <groupId>org.fireflyframework</groupId>
        <artifactId>fireflyframework-ecm-adapter-s3</artifactId>
        <version>1.0.0-SNAPSHOT</version>
    </dependency>
</dependencies>

Step 2: Configure AWS Credentials

Option 1: Environment Variables (Recommended)

Set the following environment variables:

export AWS_ACCESS_KEY_ID=your-access-key
export AWS_SECRET_ACCESS_KEY=your-secret-key
export AWS_REGION=us-east-1
export S3_BUCKET_NAME=your-ecm-bucket

Option 2: IAM Roles (Production Recommended)

For production deployments, use IAM roles instead of access keys:

  1. Create an IAM role with S3 permissions
  2. Attach the role to your EC2 instance or EKS pod
  3. The adapter will automatically use the role credentials

Option 3: YAML Configuration File

Configure credentials in application.yml:

firefly:
  ecm:
    adapter:
      s3:
        access-key: ${AWS_ACCESS_KEY_ID}
        secret-key: ${AWS_SECRET_ACCESS_KEY}
        region: ${AWS_REGION}
        bucket-name: ${S3_BUCKET_NAME}

Step 3: Configure the S3 Adapter

Basic Configuration

Add the following to your application.yml:

firefly:
  ecm:
    adapter-type: s3
    adapter:
      s3:
        bucket-name: ${S3_BUCKET_NAME:firefly-ecm-documents}
        region: ${AWS_REGION:us-east-1}
        path-prefix: documents/
        enable-versioning: true

Advanced Configuration

For production environments, configure additional settings:

firefly:
  ecm:
    adapter:
      s3:
        # Required settings
        bucket-name: ${S3_BUCKET_NAME}
        region: ${AWS_REGION}
        
        # Optional authentication (use IAM roles when possible)
        access-key: ${AWS_ACCESS_KEY_ID:}
        secret-key: ${AWS_SECRET_ACCESS_KEY:}
        
        # Storage settings
        path-prefix: documents/
        enable-versioning: true
        storage-class: STANDARD
        
        # Security settings
        enable-encryption: true
        kms-key-id: ${S3_KMS_KEY_ID:}
        
        # Performance settings
        enable-multipart: true
        multipart-threshold: 5242880  # 5MB
        multipart-part-size: 5242880  # 5MB
        
        # Connection settings
        connection-timeout: PT30S
        socket-timeout: PT30S
        max-retries: 3
        
        # S3-compatible services (e.g., MinIO)
        endpoint: ${S3_ENDPOINT:}
        path-style-access: false

Step 4: Set Up S3 Bucket

Create S3 Bucket

aws s3 mb s3://your-ecm-bucket --region us-east-1

Configure Bucket Versioning (Optional)

aws s3api put-bucket-versioning \
    --bucket your-ecm-bucket \
    --versioning-configuration Status=Enabled

Configure Bucket Encryption (Recommended)

aws s3api put-bucket-encryption \
    --bucket your-ecm-bucket \
    --server-side-encryption-configuration '{
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "AES256"
                }
            }
        ]
    }'

Step 5: Configure IAM Permissions

Create an IAM policy with the following permissions:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject",
                "s3:GetObjectVersion",
                "s3:DeleteObjectVersion",
                "s3:ListBucket",
                "s3:ListBucketVersions",
                "s3:GetBucketLocation",
                "s3:GetBucketVersioning"
            ],
            "Resource": [
                "arn:aws:s3:::your-ecm-bucket",
                "arn:aws:s3:::your-ecm-bucket/*"
            ]
        }
    ]
}

Step 6: Use in Your Application

Basic Document Operations

@Service
public class DocumentService {
    
    @Autowired
    private DocumentPort documentPort;
    
    @Autowired
    private DocumentContentPort contentPort;
    
    public Mono<Document> uploadDocument(String name, byte[] content) {
        Document document = Document.builder()
            .name(name)
            .mimeType("application/pdf")
            .size((long) content.length)
            .ownerId(UUID.randomUUID()) // ownerId is UUID, not Long
            .status(DocumentStatus.ACTIVE)
            .createdAt(Instant.now())
            .build();

        return documentPort.createDocument(document, content);
    }
    
    public Mono<byte[]> downloadDocument(UUID documentId) {
        return contentPort.getContent(documentId);
    }
}

Streaming Operations

public Flux<DataBuffer> streamDocument(UUID documentId) {
    return contentPort.getContentStream(documentId);
}

public Mono<Void> uploadLargeDocument(UUID documentId, Flux<DataBuffer> content) {
    return contentPort.storeContentStream(documentId, content);
}

Step 7: Testing

Unit Tests

@SpringBootTest
@TestPropertySource(properties = {
    "firefly.ecm.adapter-type=s3",
    "firefly.ecm.adapter.s3.bucket-name=test-bucket",
    "firefly.ecm.adapter.s3.region=us-east-1"
})
class S3AdapterIntegrationTest {
    
    @Autowired
    private DocumentPort documentPort;
    
    @Test
    void shouldCreateAndRetrieveDocument() {
        // Test implementation
    }
}

Integration Tests with Testcontainers

@Testcontainers
class S3AdapterTestcontainersTest {
    
    @Container
    static LocalStackContainer localstack = new LocalStackContainer(DockerImageName.parse("localstack/localstack:latest"))
            .withServices(LocalStackContainer.Service.S3);
    
    @Test
    void shouldWorkWithLocalStack() {
        // Test implementation using LocalStack
    }
}

Troubleshooting

Common Issues

  1. Access Denied: Check IAM permissions and bucket policies
  2. Bucket Not Found: Verify bucket name and region configuration
  3. Connection Timeout: Adjust timeout settings or check network connectivity
  4. Large File Upload Fails: Enable multipart upload and adjust thresholds

Logging

Enable debug logging for troubleshooting:

logging:
  level:
    org.fireflyframework.ecm.adapter.s3: DEBUG
    software.amazon.awssdk.services.s3: DEBUG

Quality Assurance & Testing

The S3 adapter maintains 100% test success rate with comprehensive coverage:

Test Coverage

  • 21/21 tests passing
  • Complete CRUD operations: Document creation, retrieval, updates, deletion
  • Content management: Binary content upload, download, streaming
  • Resilience patterns: Circuit breaker and retry mechanism testing
  • Error handling scenarios: S3 exceptions, network failures, timeout handling

Testing Infrastructure

  • Real resilience instances for proper circuit breaker and retry testing
  • Comprehensive S3 mocking with proper response simulation
  • Byte array validation for content integrity verification
  • Reactive stream testing using StepVerifier for async operations
  • Checksum verification with correct SHA-1 hash validation

Recent Improvements

  • Resilience Framework Integration: Successfully resolved complex resilience4j reactive operator testing
  • Mock Configuration: Proper MockitoSettings for clean test execution
  • Content Validation: Accurate binary content comparison and checksum verification
  • Error Simulation: Comprehensive testing of S3 exception scenarios

Best Practices

  1. Use IAM Roles: Avoid hardcoding access keys in production
  2. Enable Versioning: Protect against accidental deletions
  3. Configure Encryption: Use server-side encryption for sensitive data
  4. Monitor Costs: Set up CloudWatch alarms for S3 usage
  5. Lifecycle Policies: Configure automatic archiving for old documents
  6. Cross-Region Replication: Consider replication for disaster recovery
  7. Resilience Configuration: Properly configure circuit breaker and retry patterns

Next Steps