Skip to content

Excel Package Skips Duplicate SharedString Entries #435

Description

@arpitmantri7501

Bug Report: Excel Package Skips Duplicate SharedString Entries

Package Information

Summary

The _SharedStringsMaintainer.add() method uses putIfAbsent() which skips adding duplicate string values to the _list. This causes index mismatches when cells reference sharedString indices that were skipped, resulting in "Unexpected null value" errors.

Root Cause

File: lib/src/sharedStrings/shared_strings.dart
Method: _SharedStringsMaintainer.add()

Current Implementation (Buggy)

void add(SharedString val, String key) {
  _map[val]?.increaseCount();
  _map.putIfAbsent(val, () {  // ❌ Problem: This closure only executes if key doesn't exist
    _mapString[key] = val;
    _list.add(val);  // ❌ This is never called for duplicate values!
    return _IndexingHolder(_index++);
  });
}

The Problem

  1. When parsing sharedStrings.xml, the parser iterates through all <si> elements (e.g., 416 elements)
  2. For each element, it calls _SharedStringsMaintainer.add()
  3. If a duplicate string value is encountered (e.g., "Black" appears at index 143 and again at index 348), putIfAbsent() does not execute the closure because the key already exists
  4. This means _list.add(val) is never called for the duplicate
  5. Result: _list has only 415 entries (indices 0-414) instead of 416 (indices 0-415)
  6. When a cell references index 415, _list[415] throws "Unexpected null value"

Steps to Reproduce

1. Create a corrupted Excel file

Windows Excel sometimes creates duplicate sharedString entries when using "Enable Editing and Save". Example XML:

<sst xmlns="..." count="2723" uniqueCount="416">
  <si><t>Black</t></si>      <!-- Index 143 -->
  <!-- ... other entries ... -->
  <si><t>Black</t></si>      <!-- Index 348: DUPLICATE! -->
  <!-- ... other entries ... -->
  <si><t>Final Value</t></si> <!-- Index 415 -->
</sst>

2. Create a cell referencing the last index

<worksheet>
  <sheetData>
    <row r="87">
      <c r="B87" t="s">
        <v>415</v>  <!-- References index 415 -->
      </c>
    </row>
  </sheetData>
</worksheet>

3. Try to parse the file

final excel = Excel.decodeBytes(fileBytes);
// Error: Unexpected null value

4. Debug Output

📊 XML contains 416 <si> elements
⚠️ DUPLICATE SKIPPED at XML index 348: String value = "Black" (already exists at parser index 143)
📊 Parser loaded 415 sharedStrings (indices 0-414)
❌ Cell B87: SharedString at index 415 is NULL! Total sharedStrings: 415

Expected Behavior

All 416 <si> elements from the XML should be loaded into _list, even if some have duplicate text values. The _list should maintain the exact same indexing as the XML to ensure cells can reference any index 0-415.

Actual Behavior

Only 415 unique entries are loaded into _list. Any cell referencing index 415 (or any other skipped duplicate index) crashes with "Unexpected null value".

Proposed Fix

Option 1: Always Add to List (Recommended)

Modify _SharedStringsMaintainer.add() to always add elements to _list:

void add(SharedString val, String key) {
  // Always add to _list to preserve XML index positions
  _list.add(val);
  
  // Track in map for lookups (update if exists, add if new)
  if (_map.containsKey(val)) {
    _map[val]?.increaseCount();
  } else {
    _map[val] = _IndexingHolder(_index);
    _mapString[key] = val;
  }
  
  _index++;
}

Advantages:

  • Maintains 1:1 correspondence between XML indices and _list indices
  • Fixes the crash for corrupted files
  • No breaking changes to existing API

Considerations:

  • _list may contain duplicate entries
  • Memory usage slightly higher for files with duplicate sharedStrings
  • But this accurately reflects the actual Excel file structure

Option 2: Track Original XML Index

Create a separate index mapping to preserve XML positions while deduplicating:

final List<SharedString> _list = <SharedString>[];
final Map<int, int> _xmlIndexToListIndex = <int, int>{};  // New
int _xmlIndex = 0;  // New: Track position in XML

void add(SharedString val, String key) {
  _map[val]?.increaseCount();
  
  final listIndex = _map.putIfAbsent(val, () {
    _mapString[key] = val;
    final idx = _list.length;
    _list.add(val);
    return _IndexingHolder(idx);
  }).index;
  
  _xmlIndexToListIndex[_xmlIndex] = listIndex;
  _xmlIndex++;
}

SharedString? value(int xmlIndex) {
  final listIndex = _xmlIndexToListIndex[xmlIndex];
  if (listIndex != null && listIndex < _list.length) {
    return _list[listIndex];
  }
  return null;
}

Impact

High Severity: Files created by Windows Excel's "Enable Editing and Save" feature may contain duplicate sharedString entries, causing the parser to crash with "Unexpected null value".

Workaround

Pre-process the Excel file to make duplicate entries unique before parsing:

// Add invisible zero-width space to duplicates
final uniqueText = originalText + '\u200B' + index.toString();

Test Case

test('should load all sharedString entries including duplicates', () {
  // Create Excel file with duplicate sharedStrings
  final archive = Archive();
  
  final sharedStringsXml = '''
    <sst xmlns="..." count="3" uniqueCount="3">
      <si><t>Value1</t></si>
      <si><t>Value2</t></si>
      <si><t>Value1</t></si>  <!-- Duplicate of index 0 -->
    </sst>
  ''';
  
  // ... add to archive and encode ...
  
  final excel = Excel.decodeBytes(bytes);
  
  // Should load all 3 entries
  expect(excel._sharedStrings._list.length, 3);
  expect(excel._sharedStrings.value(0)?.stringValue, 'Value1');
  expect(excel._sharedStrings.value(1)?.stringValue, 'Value2');
  expect(excel._sharedStrings.value(2)?.stringValue, 'Value1');
});

Additional Context

  • Excel files use sharedStrings to avoid storing duplicate text values multiple times
  • However, Windows Excel sometimes creates malformed files with actual duplicate entries in the XML
  • The parser should handle these gracefully rather than crashing
  • The count attribute in <sst> represents total string usage across all cells, not unique strings
  • The uniqueCount attribute should represent unique values, but Windows Excel sometimes sets this incorrectly

Environment

  • Dart SDK: 3.x
  • Flutter: 3.x
  • Platform: Web, Android, iOS, Windows, macOS, Linux
  • File Source: Excel files created/edited by Microsoft Excel on Windows

Related Files

  • lib/src/sharedStrings/shared_strings.dart (main issue)
  • lib/src/parser/parse.dart (calls add() method)
  • lib/src/excel.dart (uses _sharedStrings)

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions