text.linesLimited lets through arbitrarily long lines when the line and its terminator land in the same chunk #3725

@haskiindahouse

Version

Scala 3.8.3, fs2-core 3.13.0 (also reproduces on 3.12.0; verified against current main).

Minimized code

//> using scala 3.8.3
//> using dep co.fs2::fs2-core:3.13.0

import fs2._

@main def repro091(): Unit =
  // 1000 chars, limit is 10
  val longLine = "x" * 1000 + "\n"

  val singleChunk = Stream.emit(longLine).covary[Fallible]
    .through(text.linesLimited(10)).toList
  println(s"Single chunk: $singleChunk")

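  // unchunk puts each one-character String in its own chunk; Stream.emits alone emits a single chunk
  val multiChunk = Stream.emits(longLine.toList.map(_.toString)).unchunk.covary[Fallible]
    .through(text.linesLimited(10)).toList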
  println(s"Multi  chunk: $multiChunk")

Console output

Single chunk: Right(List(xxxxxx...1000 chars))
Multi  chunk: Left(LineTooLongException: ... limit 10)

Expected result

text.linesLimited(maxLineLength) raises LineTooLongException whenever any line exceeds maxLineLength, regardless of how the input is chunked.

Actual result

When a complete line including its terminator arrives inside one chunk, the line is processed entirely inside fillBuffers and pushed straight into linesBuffer (the completed-lines accumulator). The maxLineLength check at line 556 of text.scala only inspects stringBuilder.length — the pending line — which is 0 after the line has been flushed. Lines already in linesBuffer aren't checked against the limit, so any line completed within a chunk passes through.

Source

def fillBuffers(
    stringBuilder: StringBuilder,
    linesBuffer: ArrayBuffer[String],
    string: String,
    ignoreFirstCharNewLine: BoolWrapper
): Unit = {
  var i = if (ignoreFirstCharNewLine.value) {
    ignoreFirstCharNewLine.value = false
    if (string.nonEmpty && string(0) == '\n') {
      1
    } else {
      0
    }
  } else {
    0
  }
  val stringSize = string.size
  while (i < stringSize) {
    val idx = indexForNl(string, stringSize, i)
    if (idx < 0) {
      stringBuilder.appendAll(string.slice(i, stringSize))
      i = stringSize
    } else {
      if (stringBuilder.isEmpty) {
        linesBuffer += string.slice(i, idx)
      } else {
        stringBuilder.appendAll(string.slice(i, idx))
        linesBuffer += stringBuilder.result()
        stringBuilder.clear()
      }
      i = idx + 1
      if (string(i - 1) == '\r') {
        if (i < stringSize) {
          if (string(i) == '\n') {
            i += 1
          }
        } else {
          ignoreFirstCharNewLine.value = true
        }
      }
    }
  }
}

def go(
    stream: Stream[F, String],
    stringBuilder: StringBuilder,
    ignoreFirstCharNewLine: BoolWrapper,
    first: Boolean
): Pull[F, String, Unit] =
  stream.pull.uncons.flatMap {
    case None =>
      if (first) Pull.done
      else {
        val result = stringBuilder.result()
        if (result.nonEmpty && result.last == '\r')
          Pull.output(
            Chunk(
              result.dropRight(1),
              ""
            )
          )
        else Pull.output1(result)
      }
    case Some((chunk, stream)) =>
      val linesBuffer = ArrayBuffer.empty[String]
      chunk.foreach { string =>
        fillBuffers(stringBuilder, linesBuffer, string, ignoreFirstCharNewLine)
      }
      maxLineLength match {
        case Some((max, raiseThrowable)) if stringBuilder.length > max =>
          Pull.raiseError[F](
            new LineTooLongException(stringBuilder.length, max)
          )(using raiseThrowable)
        case _ =>
          Pull.output(Chunk.from(linesBuffer)) >> go(
            stream,
            stringBuilder,
            ignoreFirstCharNewLine,
            first = false
          )
      }

fillBuffers (lines 484-527) scans each input string for line terminators. Each time it finds one, the completed line is pushed to linesBuffer (lines 509 / 512), and stringBuilder is cleared if it was holding a partial line (line 513). After fillBuffers returns, the check at lines 555-559 looks only at stringBuilder.length:

maxLineLength match {
  case Some((max, raiseThrowable)) if stringBuilder.length > max =>
    Pull.raiseError[F](new LineTooLongException(stringBuilder.length, max))(using raiseThrowable)
  case _ =>
    Pull.output(Chunk.from(linesBuffer)) >> go(...)
}
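
To make the gap easy to see without fs2 on the classpath, here is a simplified model of one `go` step. It is a sketch, not the fs2 code: `step` and `demo` are hypothetical names, only `'\n'` is treated as a terminator, and the error is a plain `String` rather than `LineTooLongException`.

```scala
import scala.collection.mutable.ArrayBuffer

// Simplified stand-in for one `go` step: split on '\n', then apply the same
// "pending buffer only" length check. Not the fs2 implementation.
def step(pending: StringBuilder, chunk: Seq[String], max: Int): Either[String, List[String]] = {
  val lines = ArrayBuffer.empty[String]
  chunk.foreach { s =>
    s.foreach {
      case '\n' =>
        lines += pending.result()
        pending.clear()
      case c =>
        pending.append(c)
    }
  }
  // Only the unfinished line is measured; completed lines are never inspected.
  if (pending.length > max) Left(s"line too long: ${pending.length} > $max")
  else Right(lines.toList)
}

@main def demo(): Unit = {
  // Line and terminator in one chunk: the pending buffer is empty at check time.
  val sameChunk = step(new StringBuilder, Seq("x" * 1000 + "\n"), max = 10)
  // Terminator has not arrived yet: 1000 chars are still pending, so the check fires.
  val splitChunks = step(new StringBuilder, Seq("x" * 1000), max = 10)

  println(sameChunk.map(_.map(_.length))) // Right(List(1000))
  println(splitChunks)                    // Left(line too long: 1000 > 10)
}
```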

Two ways to fix:

  • Check the length of each line as it's added to linesBuffer inside fillBuffers and raise immediately, or
  • Check all lines in linesBuffer before outputting them.
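
As a rough illustration of the second option, here is a simplified model (again not the fs2 code) that checks completed lines before emitting anything. `TooLong`, `stepFixed` and `fixDemo` are hypothetical names, and only `'\n'` is treated as a terminator.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for fs2's LineTooLongException.
final case class TooLong(length: Int, max: Int)

// Simplified model of one `go` step ('\n' terminators only), with the second
// fix applied: completed lines are measured before anything is emitted.
def stepFixed(pending: StringBuilder, chunk: Seq[String], max: Int): Either[TooLong, List[String]] = {
  val lines = ArrayBuffer.empty[String]
  chunk.foreach { s =>
    s.foreach {
      case '\n' =>
        lines += pending.result()
        pending.clear()
      case c =>
        pending.append(c)
    }
  }
  lines.find(_.length > max) match {
    case Some(line)                   => Left(TooLong(line.length, max))
    case None if pending.length > max => Left(TooLong(pending.length, max))
    case None                         => Right(lines.toList)
  }
}

@main def fixDemo(): Unit = {
  println(stepFixed(new StringBuilder, Seq("x" * 1000 + "\n"), max = 10)) // Left(TooLong(1000,10))
  println(stepFixed(new StringBuilder, Seq("ab\ncd\n"), max = 10))        // Right(List(ab, cd))
}
```

One difference between the two options that may be worth deciding explicitly: checking the whole buffer emits nothing from a chunk that contains an over-long line, whereas checking inside fillBuffers could in principle allow the lines before the offending one to be emitted first.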

The single existing test in TextSuite.scala line 311 only covers a no-newline single-line input — which always sits in stringBuilder and is properly checked — so this gap was hidden.

Happy to follow up with a PR adding both a fix and a regression test that exercises both chunkings.
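
A rough sketch of what such a regression check might look like, using only the calls from the repro plus `unchunk` to make chunk boundaries explicit. `run` and `regression` are hypothetical names, and the real test would follow TextSuite's conventions rather than a bare `assert`. On affected versions the assertion is expected to fail for the chunking that keeps the line and its terminator together.

```scala
//> using scala 3.8.3
//> using dep co.fs2::fs2-core:3.13.0

import fs2._

// Feeds each element of `pieces` to linesLimited as its own chunk.
def run(pieces: List[String], max: Int): Either[Throwable, List[String]] =
  Stream.emits(pieces).unchunk.covary[Fallible]
    .through(text.linesLimited(max)).toList

@main def regression(): Unit =
  val input = "x" * 1000 + "\n"
  // Same bytes, three chunkings: whole input, 7-char pieces, one char per chunk.
  val chunkings = List(
    List(input),
    input.grouped(7).toList,
    input.toList.map(_.toString)
  )
  chunkings.foreach { pieces =>
    assert(
      run(pieces, 10).isLeft,
      s"1000-char line was accepted when split into ${pieces.size} chunk(s)"
    )
  }
```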
