[ruby-talk:443038] parallel require in ruby, dirwalking and other windows things

Dear ruby community,

I have one question, a suggestion to speed up directory walking, and a few
brief notes about win32.c.

I'm a Windows ruby user of 3 years or so, and I'm fairly satisfied. I
noticed, however, that some operations get pathologically slow at times. I
have seen 5 minute startup times for Rails at worst. This is partly the
residual MFT congestion left over from when I worked on altWinDirStat, but I
believe it is also partly the result of slow directory tree walking.

It looks like the way directory tree walking is implemented *may* be 1.5-3x
slower than necessary on Windows. Sadly, I do not have the proper build
environment set up to debug or fix this, but someone may be able to!

It's hard for me to fully understand glob_helper, since there are many
#ifdefs and it's nearly 300 lines long, but when globbing a directory
recursively, there appears to be (first) a compatibility layer of sorts
for Windows in rb_w32_opendir that fills out a bunch of fake direntries
from the underlying FindFirstFileW/FindNextFileW calls.

One interesting opportunity here is that it appears ruby makes a wstati128
call before FindFirstFile to find out if the file/path is indeed a
directory, which is roughly the same behavior that Ben Hoyt saw when
working on BetterWalk <https://github.com/benhoyt/betterwalk>. That call
looks redundant, since Windows already provides this info from the
FindFirstFile/FindNextFile APIs. If this happens on every file it
encounters (and I cannot tell), this is a 2x slowdown.
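
To make the access pattern concrete, here is a rough Ruby-level sketch of a walker that only receives names from the directory listing and therefore has to stat every entry to decide whether to recurse. It is an illustration only, not what glob_helper or win32.c actually do; `walk_with_stat` and `demo_walk` are hypothetical names made up for this example.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: walks `root` using only the names a listing returns,
# so it must lstat every entry to learn whether it is a directory.
# Returns [paths, stat_calls].
def walk_with_stat(root)
  paths = []
  stat_calls = 0
  pending = [root]
  until pending.empty?
    dir = pending.pop
    Dir.each_child(dir) do |name|
      path = File.join(dir, name)
      stat_calls += 1 # the extra per-entry call discussed above
      pending << path if File.lstat(path).directory?
      paths << path
    end
  end
  [paths, stat_calls]
end

# Builds a tiny tree (2 directories, 2 files) and reports
# [number of entries found, number of lstat calls made].
def demo_walk
  Dir.mktmpdir do |root|
    FileUtils.mkdir_p(File.join(root, "a", "b"))
    FileUtils.touch(File.join(root, "a", "one.rb"))
    FileUtils.touch(File.join(root, "a", "b", "two.rb"))
    paths, stat_calls = walk_with_stat(root)
    [paths.size, stat_calls]
  end
end
```

The stat count grows one-for-one with the number of entries. A listing API that already reports the entry type, as FindFirstFile/FindNextFile do through the file attributes, would let a walker skip that call.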

Sidenote: there's no reporting/mapping of GetLastError if FindNextFile
fails. This could bite someone in the future. I'd say the same about
CloseHandle, but I'm the only dev I've ever met who checks that return
value.

The other interesting issue I've noticed is that lstrlenW is still used
throughout win32.c. I'm perplexed by this! lstrlenW is a full syscall that
has the same exact behavior as wcslen, just way slower. Is there a good
reason to keep it? I believe it exists as a legacy matter more than for a
useful reason. Back in the day, lstrlenW used to catch and silently ignore
access violations like its friends lstrcat and lstrcpy. There are a few uses
of lstrcat left in ruby; those should be removed for security reasons even
if there's no memory corruption at the moment! I've personally been bitten
by them many times in the past.

The last thing, and what I initially came here to ask: has anybody ever
thought about parallelizing require?

I'm sure this would lead to a speedup of some kind for everybody around the
world who uses ruby! When I saw the aforementioned worst-case Rails
startup, I saw the classic single-CPU-pegged-at-100% pattern that hints at a
possible parallel speedup. Parallelism is one of the things I implemented for
altWinDirStat
which improved performance quite a bit! More than you'd expect too, since
queuing up many NtQueryDirectoryFile calls encourages the OS and even lower
levels of the stack to take maximum advantage of IO caching. I'm sure I'm
not the only one who's ever faced a slow bootup of a ruby app.
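
One cautious way to try the IO side of this idea without touching load order might look like the sketch below: read the files on a few threads first so the OS can overlap and cache the reads, then require them serially as usual. This is only a sketch under that assumption, not an existing Ruby feature; `prefetch_files` and `prefetch_then_require` are hypothetical names, and whether it helps at all would need measuring.

```ruby
# Hypothetical helper: reads every file on a small pool of threads so the
# reads can overlap and land in the OS file cache. Nothing is evaluated.
# Returns the total number of bytes read.
def prefetch_files(paths, threads: 4)
  queue = Queue.new
  paths.each { |path| queue << path }
  queue.close # pop returns nil once the queue is drained

  workers = Array.new(threads) do
    Thread.new do
      total = 0
      while (path = queue.pop)
        total += File.binread(path).bytesize
      end
      total
    end
  end
  workers.sum(&:value)
end

# Hypothetical helper: the requires still run one at a time, in the given
# order; only the file reads above were done in parallel.
def prefetch_then_require(paths)
  prefetch_files(paths)
  paths.map { |path| require path }
end
```

Because the requires themselves stay serial, this sidesteps the question of whether evaluating files concurrently is safe. It only tests whether warming the cache in parallel is worth anything.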

Now that there's more interesting stuff going on with parallelism in ruby
(Ractor looks neat!) I'm wondering if there's more of an appetite for this
kind of thing. It does not look like the current implementation of
directory walking is easily amenable to this, which is how this turned
into such a long message :)

Sincerely,
Alexander Riccio

--
"Change the world or go home."
<http://about.me/ariccio>
If left to my own devices, I will build more.

> Dear ruby community,
>
> I have one question, a suggestion to speed up directory walking, and a few
> brief notes about win32.c.
>
> I'm a Windows ruby user of 3 years or so, and I'm fairly satisfied. I
> noticed, however, that some operations get pathologically slow at times. I
> have seen 5 minute startup times for Rails at worst. This is partly the
> residual MFT congestion left over from when I worked on altWinDirStat, but I
> believe it is also partly the result of slow directory tree walking.
>
> It looks like the way directory tree walking is implemented *may* be
> 1.5-3x slower than necessary on Windows. Sadly, I do not have the proper
> build environment set up to debug or fix this, but someone may be able to!
>
> It's hard for me to fully understand glob_helper, since there are many
> #ifdefs and it's nearly 300 lines long, but when globbing a directory
> recursively, there appears to be (first) a compatibility layer of sorts
> for Windows in rb_w32_opendir that fills out a bunch of fake direntries
> from the underlying FindFirstFileW/FindNextFileW calls.
>
> One interesting opportunity here is that it appears ruby makes a wstati128
> call before FindFirstFile to find out if the file/path is indeed a
> directory, which is roughly the same behavior that Ben Hoyt saw when
> working on BetterWalk <https://github.com/benhoyt/betterwalk>. That call
> looks redundant, since Windows already provides this info from the
> FindFirstFile/FindNextFile APIs. If this happens on every file it
> encounters (and I cannot tell), this is a 2x slowdown.
>
> Sidenote: there's no reporting/mapping of GetLastError if FindNextFile
> fails. This could bite someone in the future. I'd say the same about
> CloseHandle, but I'm the only dev I've ever met who checks that return
> value.

I haven’t used Windows in more than a decade, but there are active Windows
core developers who can be found on ruby-core, and the issue reported here
could be raised more usefully on the Ruby issue tracker
(Issues - Ruby master - Ruby Issue Tracking System).

> The other interesting issue I've noticed is that lstrlenW is still used
> throughout win32.c. I'm perplexed by this! lstrlenW is a full syscall that
> has the same exact behavior as wcslen, just way slower. Is there a good
> reason to keep it? I believe it exists as a legacy matter more than for a
> useful reason. Back in the day, lstrlenW used to catch and silently ignore
> access violations like its friends lstrcat and lstrcpy. There are a few uses
> of lstrcat left in ruby; those should be removed for security reasons even
> if there's no memory corruption at the moment! I've personally been bitten
> by them many times in the past.

I would also suggest that this be raised, but as a separate issue. I
suspect that there are very good reasons to keep it, but there are more
knowledgeable people on this.

> The last thing, and what I initially came here to ask: has anybody ever
> thought about parallelizing require?

I’ve never used it, but Zeitwerk (fxn/zeitwerk on GitHub: "Efficient and
thread-safe code loader for Ruby") is at least one approach to improving
loading. Because `require`s are only *nominally* related to the files that
are loaded, however, I would not count on parallel `require`s being clean,
easy, or thread-safe. As before, the best place to discuss this would be
the Ruby issue tracker (Issues - Ruby master - Ruby Issue Tracking System)
or the ruby-core mailing list.

-a

On Wed, Oct 5, 2022 at 6:42 PM <Alexander G. Riccio> <test35965@gmail.com> wrote:
--
Austin Ziegler • halostatue@gmail.com • austin@halostatue.ca
http://www.halostatue.ca/ • http://twitter.com/halostatue

Hi Alexander,

On 2022-10-6 6:41 am, <Alexander G. Riccio> wrote:

> I have one question, a suggestion to speedup directory walking, and a few brief notes about win32.c.
>
> I'm a Windows ruby user of 3 years or so, and I'm fairly satisfied. I noticed, however, that some operations get pathologically slow at times. I have seen 5 minute startup times for Rails at worst. This is partly the residual MFT congestion leftover from when I worked on altWinDirStat, but I believe also partly the result of slow directory tree walking.

This sounds very good! I would strongly recommend that you contact the Ruby core team and/or raise it on GitHub or Redmine.

Ruby is very much alive on Windows, so I'm sure this would be very welcome.

Best regards,
Mohit.