IO.binstream/2 drops bytes in some cases for line_or_bytes as :line #13717

Closed
wkirschbaum opened this issue Jul 11, 2024 · 4 comments

Comments

@wkirschbaum

wkirschbaum commented Jul 11, 2024

Elixir and Erlang/OTP versions

Erlang/OTP 27 [erts-15.0.1] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns]

Elixir 1.17.1 (compiled with Erlang/OTP 27)

Operating system

GNU/Linux

Current behavior

When running the following code

  test "io.binstream" do
    in_file = Path.expand("~/Downloads/file_in.xlsx")

    out_file = Path.expand("~/Downloads/file_out.xlsx")

    orig_content = File.read!(in_file)
    target_stream = File.stream!(out_file, :line, [:write, :binary])

    StringIO.open(orig_content, fn pid ->
      IO.binstream(pid, :line)
      |> Stream.into(target_stream)
      |> Stream.run()
    end)

    final_content = File.read!(out_file)

    assert byte_size(orig_content) == byte_size(final_content)
  end

The assert fails with a difference of about 3 bytes (in our case).

     Assertion with == failed
    code:  assert byte_size(orig_content) == byte_size(final_content)
    left:  93627
    right: 93624

When we change line_or_bytes from :line to a number like 1000, the assert passes.
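
The passing case can be sketched without the private files. This is a minimal sketch, not the original test: `content` is hypothetical sample data, the `read_back` helper is introduced only for illustration, and the result is collected in memory instead of being written through `File.stream!`.

```elixir
# Hypothetical sample data standing in for the private .xlsx files.
content = "first\r\nsecond\nthird\r\nfourth"

# Illustrative helper: read the whole StringIO device back through
# IO.binstream/2 and concatenate the chunks into one binary.
read_back = fn line_or_bytes ->
  {:ok, chunks} =
    StringIO.open(content, fn pid ->
      pid |> IO.binstream(line_or_bytes) |> Enum.to_list()
    end)

  IO.iodata_to_binary(chunks)
end

# With a byte count, as in the passing variant of the test above.
by_bytes = read_back.(1000)

IO.inspect(byte_size(content), label: "original bytes")
IO.inspect(byte_size(by_bytes), label: "bytes read in 1000-byte chunks")
IO.inspect(by_bytes == content, label: "round-trips unchanged")
```

Calling `read_back.(:line)` on the same data is a quick way to compare the two modes side by side.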

Unfortunately, the files we test contain PSI data, so we cannot share them, but we are busy trying to replicate the issue (it does not happen often and seems somewhat random).

Expected behavior

We would expect the byte size not to change when specifying :line as the chunk size for IO.binstream/2, or for there to be some note explaining why :line cannot be used, as this was quite unexpected. (Some of this code is in a library; we extracted it into the example above to debug the issue.)

@wkirschbaum
Author

wkirschbaum commented Jul 11, 2024

The bytes differ somewhere in the middle: char 35538, line 158 (out of 409).

This is also reproducible on a Mac M1 and on Linux.

@josevalim
Member

Can you check if there is "\r\n" in the file? File.read!(...) =~ "\r\n"
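
A minimal sketch of that check combined with the read from the report, on hypothetical sample data rather than the original file. It assumes, as this question suggests, that reading a `StringIO` device by `:line` turns each `"\r\n"` into `"\n"`; if that holds, the number of bytes lost should equal the number of `"\r\n"` sequences in the input.

```elixir
# Hypothetical sample data containing two "\r\n" sequences.
content = "a\r\nb\nc\r\nd"

# Read the device line by line, as in the failing test.
{:ok, lines} =
  StringIO.open(content, fn pid ->
    pid |> IO.binstream(:line) |> Enum.to_list()
  end)

rebuilt = IO.iodata_to_binary(lines)

# Count the "\r\n" sequences in the original data.
crlf_count = length(String.split(content, "\r\n")) - 1

IO.inspect(content =~ "\r\n", label: "contains \\r\\n")
IO.inspect(crlf_count, label: "\\r\\n sequences")
IO.inspect(byte_size(content) - byte_size(rebuilt), label: "bytes lost with :line")
```

For binary formats such as .xlsx, where a `"\r\n"` byte pair can occur anywhere in the data, reading by a byte count avoids any line-based interpretation of the content.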

@wkirschbaum
Author

@josevalim thanks, you are right. The update to the docs is great :) and hopefully helps the next person.

@josevalim
Member

Fantastic! Thanks for opening the issue!
