@Washi1337
Contributor

@Washi1337 Washi1337 commented Nov 28, 2025

What does the pull request do?

This PR addresses some performance regressions and reduces overall allocations when rendering a lot of (complex) text runs.

What is the current behavior?

When I upgraded to the latest version of Avalonia and .NET 10 in AvaloniaHex, I noticed a pretty significant bump in memory allocations and performance degradation in rendering of many (complex) text runs. Upon profiling, I found a couple of related hotspots:

  1. In Avalonia.Media.TextFormatting there are various precompiled tries stored as raw data. This data is recreated on every access of the trie. E.g., here is UnicodeData.trie:

    public static UnicodeTrie Trie
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => new(Data, 0x00100000, 0x00000000);
    }

    private static ReadOnlySpan<uint> Data => new uint[]
    {

    These computed tries are used quite extensively throughout the library. This results in a significant number of unnecessary allocations of very large data blobs (literally millions of instances), significantly slowing down controls that do a lot of (complex) text rendering. Below, an example of how AvaloniaHex is affected when scrolling once down and up in the example project:
    image

  2. FontFamily.Parse eventually calls FontFamily.GetFontSourceIdentifier, which always allocates extra string and string[] instances when parsing a font by name, even if the font name is a simple font without any fallback fonts:

    image
  3. FontShaperImpl.ShapeText in Avalonia.Skia uses a language cache with a GetOrAdd call whose factory is a non-static, closure-capturing lambda.

    buffer.Language = s_cachedLanguage.GetOrAdd(usedCulture.LCID, _ => new Language(usedCulture));

    This results in a closure being allocated for every text run, even though the closure goes unused most of the time because the culture rarely changes:

    image
  4. LineBreakEnumerator always creates a LineBreakState heap allocation (even if a text run does not contain line breaks). This is wasteful, especially considering LineBreakEnumerator is already a ref struct and will never appear on the heap:

    image
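The closure allocation described in item 3 can be illustrated with a minimal sketch. It assumes the language cache is a ConcurrentDictionary keyed by LCID; LanguageStub and LanguageCache are hypothetical stand-ins for illustration, not Avalonia types.

```csharp
using System;
using System.Collections.Concurrent;
using System.Globalization;

// Hypothetical stand-in for the language object created by the shaper.
public sealed class LanguageStub
{
    public LanguageStub(CultureInfo culture) => Name = culture.Name;
    public string Name { get; }
}

public static class LanguageCache
{
    private static readonly ConcurrentDictionary<int, LanguageStub> s_cache = new();

    // Capturing version: the lambda reads the parameter `culture`, so the
    // compiler creates a closure object and a delegate on every call,
    // even when the key is already present in the cache.
    public static LanguageStub GetCapturing(CultureInfo culture) =>
        s_cache.GetOrAdd(culture.LCID, _ => new LanguageStub(culture));

    // Non-capturing version: `culture` is passed through the factory
    // argument, so the static lambda captures nothing and the compiler
    // can reuse a single cached delegate.
    public static LanguageStub GetNonCapturing(CultureInfo culture) =>
        s_cache.GetOrAdd(culture.LCID, static (_, c) => new LanguageStub(c), culture);
}
```

Both methods return the same cached instance for a given LCID; only the per-call allocation behavior differs.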

What is the updated/expected behavior with this PR?

This PR removes the vast majority of these allocations. Running the example project of AvaloniaHex with these changes applied to Avalonia makes scrolling much smoother.

How was the solution implemented (if it's not obvious)?

In the same order as the issues described above:

  1. UnicodeDataGenerator was updated to generate code with a readonly Data field as opposed to a computed property. This removes all RuntimeFieldInfoStub allocations.
  2. FontFamily.GetFontSourceIdentifier was rewritten to avoid any array allocations and most string reallocations using ReadOnlySpan<char>s and slicing.
  3. FontShaperImpl.ShapeText now uses the overload of GetOrAdd that passes an argument to the factory lambda.
  4. LineBreakEnumerator+LineBreakState was turned into a ref struct and is now passed along as a ref parameter to all Unicode rule methods.
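The span-and-slice approach from item 2 might look roughly like the following. This is a sketch under the assumption that a family name is a comma-separated list with the primary font first; FontNameParsing and GetPrimaryFamilyName are hypothetical names and do not reproduce the actual GetFontSourceIdentifier logic.

```csharp
using System;

public static class FontNameParsing
{
    // Returns the first family name of a comma-separated list, trimmed.
    // No string[] is created, and when the input is already a single
    // clean name the original string instance is returned as-is.
    public static string GetPrimaryFamilyName(string name)
    {
        ReadOnlySpan<char> span = name.AsSpan();
        int comma = span.IndexOf(',');
        ReadOnlySpan<char> first = (comma < 0 ? span : span.Slice(0, comma)).Trim();

        // Fast path: the slice covers the whole input, so reuse the string.
        return first.Length == name.Length ? name : first.ToString();
    }
}
```

With this shape, a plain name such as "Arial" costs no allocation, while "Segoe UI, Arial" allocates only the one resulting string.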

Checklist

Breaking changes

None. All changes are made on internal or private APIs.

Obsoletions / Deprecations

None

Fixed issues

Related to #16390

Additional Questions

Maybe out of scope for this PR, but I am still seeing a lot of SKFont instances (scaling linearly with the number of text runs I create) in AvaloniaHex, even though I reuse the same Typeface instances as much as possible. Has there been any discussion about caching SKFont instances?

@avaloniaui-bot

You can test this PR using the following package version: 12.0.999-cibuild0060453-alpha (feed URL: https://nuget-feed-all.avaloniaui.net/v3/index.json).

@Gillibald
Contributor

Gillibald commented Nov 28, 2025

We can try to reuse one SKFont instance and change its properties before we call some API that needs it. Not sure how costly mutating it is. If that isn't improving anything, we can cache SKFont instances per font size.

Thank you for your contribution

@Washi1337
Contributor Author

> We can try to reuse one SKFont instance and change its properties before we call some API that needs it.

I am not sure this would help much, because every GlyphRunImpl at the moment creates a new SKFont. So even if we share typefaces or only slightly change properties of some public Avalonia text/font-related types, it would still be recreating a SKFont.

> we can cache SKFont instances per font size.

This is what I was thinking as well, though, we would also need to cache it by edging type. Most of the instances seem to come from this method:

private SKFont CreateFont(SKFontEdging edging)
{
    var font = _glyphTypefaceImpl.CreateSKFont((float)FontRenderingEmSize);
    font.Hinting = SKFontHinting.Full;
    font.Subpixel = edging != SKFontEdging.Alias;
    font.Edging = edging;
    return font;
}

I am not entirely sure what the best approach would be, do you think we should have GlyphTypefaceImpl cache them?

@MrJul
Member

MrJul commented Nov 28, 2025

Thank you for your contribution!

I'll review and test in depth when I have a bit more time, but the first point seems very strange to me. Which OS, architecture and exact runtime are you using?

For a few versions of the C# compiler now, ReadOnlySpan<T> of primitive types with constant values are actually embedded directly inside the assembly. You get a simple pointer to the static data at runtime, without having to allocate heap memory at all. Said another way, new[] doesn't allocate in this case (nowadays, collection expressions should be used to make that more obvious). #15074 implemented this.

A quick check with https://godbolt.org/z/KP1xnE5Yj shows that this hasn't changed at all and still works as expected in .NET 10. I'll look at Avalonia in detail as soon as I can :)

@Gillibald
Contributor

Gillibald commented Nov 28, 2025

> > We can try to reuse one SKFont instance and change its properties before we call some API that needs it.
>
> I am not sure this would help much, because every GlyphRunImpl at the moment creates a new SKFont. So even if we share typefaces or only slightly change properties of some public Avalonia text/font-related types, it would still be recreating a SKFont.
>
> > we can cache SKFont instances per font size.
>
> This is what I was thinking as well, though, we would also need to cache it by edging type. Most of the instances seem to come from this method:
>
> private SKFont CreateFont(SKFontEdging edging)
> {
>     var font = _glyphTypefaceImpl.CreateSKFont((float)FontRenderingEmSize);
>     font.Hinting = SKFontHinting.Full;
>     font.Subpixel = edging != SKFontEdging.Alias;
>     font.Edging = edging;
>     return font;
> }
>
> I am not entirely sure what the best approach would be, do you think we should have GlyphTypefaceImpl cache them?

Yes, GlyphTypefaceImpl would cache them
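
A cache keyed by font size and edging, as discussed above, could be sketched like this. FontStub, EdgingStub and FontCache are hypothetical stand-ins (the real types would be SKFont and SKFontEdging), and the sketch assumes cached instances are never mutated after creation.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-ins; the real types would be SKFont and SKFontEdging.
public enum EdgingStub { Alias, Antialias, SubpixelAntialias }

public sealed record FontStub(float Size, EdgingStub Edging, bool Subpixel);

public sealed class FontCache
{
    private readonly ConcurrentDictionary<(float Size, EdgingStub Edging), FontStub> _fonts = new();

    // Returns the cached font for this (size, edging) pair, creating it on
    // first use. The static lambda reads everything from the key, so it
    // captures nothing and allocates no closure per call.
    public FontStub GetOrCreate(float size, EdgingStub edging) =>
        _fonts.GetOrAdd((size, edging),
            static key => new FontStub(key.Size, key.Edging, key.Edging != EdgingStub.Alias));
}
```

Note that GetOrAdd may invoke the factory more than once under contention, so with a native resource the losing instance would need to be disposed, and the cache owner would need to dispose all entries when it is itself disposed.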

@Washi1337
Contributor Author

Washi1337 commented Nov 28, 2025

> I'll review and test in depth when I have a bit more time, but the first point seems very strange to me. Which OS, architecture and exact runtime are you using?

Yes, I also found it quite strange and something that probably would've been caught by you guys already.

  • Arch: x64
  • OS: NixOS 25.11/unstable (running in an FHS devshell with x11 libraries in PATH), Kernel 6.12.58
  • WM: Hyprland 0.52.1
  • Editor: JetBrains Rider 2025.3.
  • dotnet: 10.0.100 (but also have other versions installed)

After posting the PR I double-checked my build configs, and it seems I ran my tests under the DEBUG config (Sharplab seems to confirm this too). My bad, I should've run all tests in RELEASE mode. Nonetheless, this change may still be worth it for speeding up debug builds :), and the other issues are still present even in release mode.

@kekekeks
Member

> We can try to reuse one SKFont instance and change its properties before we call some API that needs it.

Please, avoid native objects with mutable state in IGlyphRun and friends. Those can be used from multiple threads and it's really easy to introduce hard to track native memory corruption.

@Gillibald
Contributor

> > We can try to reuse one SKFont instance and change its properties before we call some API that needs it.
>
> Please, avoid native objects with mutable state in IGlyphRun and friends. Those can be used from multiple threads and it's really easy to introduce hard to track native memory corruption.

So you suggest we can't cache the SKFont in the GlyphTypefaceImpl that already holds a SKTypeface?

IGlyphRunImpl is immutable

@TomEdwardsEnscape
Contributor

> My bad, I should've run all tests in RELEASE mode. Nonetheless, this change may still be worth it for speeding up debug builds :), also the other issues are still present even in release mode.

Even without any performance boost, it's worth doing this solely to prevent other developers from seeing the same alarming number of allocations that you did and wasting their time trying to investigate the source.

Can we not get the best of both worlds like this?

private static ReadOnlySpan<uint> Data { get; } = new uint[] ...

I assume that this would still trigger the compiler optimisation, and it definitely avoids those million+ array allocations at runtime.

@Washi1337
Contributor Author

> > My bad, I should've run all tests in RELEASE mode. Nonetheless, this change may still be worth it for speeding up debug builds :), also the other issues are still present even in release mode.
>
> Even without any performance boost, it's worth doing this solely to prevent other developers from seeing the same alarming number of allocations that you did and wasting their time trying to investigate the source.
>
> Can we not get the best of both worlds like this?
>
> private static ReadOnlySpan<uint> Data { get; } = new uint[] ...
>
> I assume that this would still trigger the compiler optimisation, and it definitely avoids those million+ array allocations at runtime.

Sadly, this is not possible because fields (and by extension, property backing fields) cannot be of type ReadOnlySpan<T> unless it is an instance field of a ref struct. This is why I changed it to a uint[]. Happy to hear other options though that could get rid of the single array allocation.
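
The restriction and the two workable shapes can be put side by side in a small sketch. TrieData and its members are hypothetical names chosen for illustration, and the four-element data is a placeholder for the real trie contents.

```csharp
using System;

public static class TrieData
{
    // Not allowed: a static field or auto-property cannot have a
    // ref struct type such as ReadOnlySpan<uint>.
    // private static ReadOnlySpan<uint> Data { get; } = new uint[] { 1, 2, 3, 4 };

    // Computed property: where the compiler and runtime support it, the
    // constants are embedded in the assembly and the span points at that
    // static data without allocating an array.
    public static ReadOnlySpan<uint> Embedded => new uint[] { 1, 2, 3, 4 };

    // Array-backed alternative: a single array is allocated when the type
    // is initialized and is wrapped in a span on each access.
    private static readonly uint[] s_data = { 1, 2, 3, 4 };
    public static ReadOnlySpan<uint> ArrayBacked => s_data;
}
```

Both properties expose the same values; the difference is whether a managed array exists at all, and what the getter does in builds where the embedded-data optimization is not applied.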

@Gillibald
Contributor

We need to keep ReadOnlySpan<uint> Data => new uint[] to get the optimization

@maxkatz6
Member

maxkatz6 commented Dec 2, 2025

Since getter-only property ("=>") doesn't have any state (that could hypothetically be mutated with reflection), it's easier for the compiler to assume optimizations.

But I don't know if .NET 10 is better at optimizing { get; }.

@MrJul
Member

MrJul commented Dec 5, 2025

Note regarding the ReadOnlySpan<T>: even in debug mode, I don't see those allocations at all (I was very surprised at the original claim since I remember running dotMemory on debug builds several times). Tried on Windows x64 and macOS ARM64 with the latest master branch. This is a JIT intrinsic so I'm not sure why that happens on your machine.

Member

@MrJul MrJul left a comment

Aside from the ReadOnlySpan change we've discussed already, the rest looks good to me!

get => new(Data, 0x00100000, 0x00000000);
}

private static ReadOnlySpan<uint> Data => new uint[]
Member

As discussed in the comments, please revert the changes in all *.trie.cs files. Or use collection expressions ([ ]) since they are now available; they will be more obvious than => new.

  }
- private static ReadOnlySpan<uint> Data => new uint[]
+ private static readonly uint[] Data = new uint[]
Member

Also revert this.

@MrJul MrJul added the backport-candidate-11.3.x Consider this PR for backporting to 11.3 branch label Dec 5, 2025
@Washi1337
Contributor Author

Washi1337 commented Dec 7, 2025

> Note regarding the ReadOnlySpan<T>: even in debug mode, I don't see those allocations at all (I was very surprised at the original claim since I remember running dotMemory on debug builds several times). Tried on Windows x64 and macOS ARM64 with the latest master branch. This is a JIT intrinsic so I'm not sure why that happens on your machine.

I have done some additional validation on DEBUG builds, this time on a fresh Ubuntu 25.04 VM, as well as a Windows 10 x64 VM, both using a fresh .NET 10 installation (obtained through preview ppa on Ubuntu and winget on Windows). I am still seeing the same allocations of System.RuntimeFieldInfoStub across all systems when using computed ReadOnlySpan properties.

How to reproduce:

  1. Compile this code for .NET 10:

    Program.cs:

    using System;
    using System.Threading;
    using System.Runtime.CompilerServices;
    
    internal class Program
    {
        public static void Main(string[] args)
        {
            Thread.Sleep(5000); // Added to give some time to enable full memory allocation tracking
            var random = new Random();
            for (int i = 0; i < 1000000; i++)
            {
                DoSomething(Foo[random.Next(Foo.Length)]);
            }
            Console.WriteLine("Done");
            Thread.Sleep(10000); // Added to give some time to create a snapshot
        }
    
        [MethodImpl(MethodImplOptions.NoInlining)]
        private static void DoSomething(uint u)
        {
        }
    
        public static ReadOnlySpan<uint> Foo => new uint[] { 1, 2, 3, 4 };
        public static ReadOnlySpan<uint> Bar => [1, 2, 3, 4];
    }

    Program.csproj:

    <Project Sdk="Microsoft.NET.Sdk">
    
      <PropertyGroup>
        <OutputType>Exe</OutputType>
        <TargetFramework>net10.0</TargetFramework>
        <ImplicitUsings>enable</ImplicitUsings>
        <Nullable>enable</Nullable>
      </PropertyGroup>
    
    </Project>
  2. Start the dotMemory CLI tool.

    $ dotMemory start path/to/Program
    Z:\> dotMemory.exe start path\to\Program.exe
  3. Enable full memory tracking during the first Thread.Sleep call:

    ##dotMemory["collect-allocations-on", {pid: xxxx}]
    
  4. Create snapshot during the second Thread.Sleep call:

    ##dotMemory["get-snapshot", {pid: xxxx}]
    
  5. Open the snapshot in dotMemory. Observe the majority of allocations are dominated by System.RuntimeFieldInfoStub:

    Allocated type : System.RuntimeFieldInfoStub
      Objects : n/a
      Bytes   : 144000000
    
    Allocated by
       100%  FromPtr • 137.33 MB / 137.33 MB • System.RuntimeFieldInfoStub.FromPtr(IntPtr)
         100%  get_Foo • 137.33 MB / - • global::Program.get_Foo()
           100%  Main • 137.33 MB / - • global::Program.Main(String[])
            ►  100%  [AllThreadsRoot] • 137.33 MB / - • [AllThreadsRoot]
    

I am happy to revert the changes on ReadOnlySpan<T> for the tries in this PR, but this seems to be a reliably reproducible hotspot. Arguably, this may not necessarily be related to Avalonia, and we may want to move this specific issue to the dotnet/runtime repo to see what they have to say about it. Let me know what you think :).

EDIT: Dumping the generated x64 code using DOTNET_JitDisasm environment variable also confirms that the get_Foo property does a whole lot more on DEBUG builds than simply returning a static handle to the RVA data.
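
As a rough cross-check that does not need a profiler, the allocation volume of the getter can also be measured in-process. This is only a sketch: AllocationProbe and MeasureFooAllocations are hypothetical names, and the number it reports depends on build configuration and runtime, so no expected value is given.

```csharp
using System;

public static class AllocationProbe
{
    // Same shape as the Foo property in the reproduction above.
    public static ReadOnlySpan<uint> Foo => new uint[] { 1, 2, 3, 4 };

    // Returns the number of bytes allocated on the current thread while
    // reading Foo `iterations` times.
    public static long MeasureFooAllocations(int iterations)
    {
        uint sink = 0;
        sink += Foo[0]; // warm-up so one-time JIT and type-init costs are excluded

        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < iterations; i++)
        {
            sink += Foo[i & 3]; // index stays within the four elements
        }
        long after = GC.GetAllocatedBytesForCurrentThread();

        GC.KeepAlive(sink); // keep the loop from being treated as dead code
        return after - before;
    }
}
```

Calling `AllocationProbe.MeasureFooAllocations(1_000_000)` in a Debug build and again in a Release build gives a quick comparison of the two configurations.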

@MrJul
Member

MrJul commented Dec 8, 2025

Yes you're right, not sure how I missed that last time, or if I wasn't looking at the right thing, sorry about that.

While I still think that keeping things as they are for simplicity is fine (the runtime has tons of ReadOnlySpan<...> => [] usages) and that profiling should only be done in release mode, let's make a change to keep memory allocations low in debug.

Let's generate something like this instead:

#if DEBUG
    public static ReadOnlySpan<uint> Bar => s_bar;
    private static uint[] s_bar =
#else
    public static ReadOnlySpan<uint> Bar =>
#endif
    [1, 2, 3, 4];

Labels

area-perf, backport-candidate-11.3.x (Consider this PR for backporting to 11.3 branch), enhancement
