How to prevent the user from falling asleep while loading a large dataset

Introduction

Recently, in the organization where I work, the idea arose to rewrite a corporate system that has been in operation for more than 20 years and has become somewhat outdated during that time. In brief, it is an information system for maritime freight and port services. I am currently doing research and development and want to share some of the problems the developer of such a system can face and how I plan to solve them. The application server is planned as ASP.NET Core on .NET 6. The database will remain the old one, so that I can move to the new system gradually, adding functionality step by step. The main client software is supposed to be a WPF application on .NET 6; in addition, small narrowly focused clients for web pages and mobile devices are possible.

Problem

The user is too lazy to filter the requested data, and there is quite a lot of it. Forcing the user to filter something is inhumane and reeks of arbitrariness. In the current system such a user sits sadly, curses the developers, and pressures technical support. There the client goes directly to the database through the BDE, so nothing can be done about it. In the new system the client knows nothing about the database: the server sends JSON over HTTP on request. If we do this "head-on", very little changes: at the client's request the server builds a collection of objects to send, loading them from the database or elsewhere; the whole thing is then serialized to JSON, sent in a single HTTP response, deserialized in one go into client objects, and only then does it finally appear before the eyes of the weary user.

Solution

Let the server start loading into its collection on request, as before, but send the client not everything at once: only as many objects as it manages to load before a timeout expires, or some fixed number of objects, or whichever of these two limits is hit first. A special response header tells the client whether the data is complete. If it is not, we also pass a special identifier that the client must present when requesting the next batch. This way the user quickly sees a result and may even be able to start working with it.
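From the client's point of view the contract might look roughly like this (a minimal sketch; the header names and the /catsChunks route are the ones introduced later in the article, the numbers in the URL are just example values, and error handling is omitted):

// Minimal sketch of the client side of the protocol
// (.NET 6 console app with implicit usings assumed).
using HttpClient client = new() { BaseAddress = new Uri("https://localhost:7209") };
HttpResponseMessage response = await client.GetAsync("catsChunks/100000/100/1000/0");

while (response.IsSuccessStatusCode)
{
    // ... deserialize and display the current batch from response.Content here ...

    // The server reports in a header whether the data is complete.
    if (response.Headers.GetValues("X-CatsPartialLoaderState").First() != "Partial")
    {
        break; // "Full": nothing left to fetch.
    }

    // Otherwise request the next batch, presenting the session key
    // the server returned along with the incomplete data.
    HttpRequestMessage next = new(HttpMethod.Get, "catsChunks");
    next.Headers.Add("X-CatsPartialLoaderSessionKey",
        response.Headers.GetValues("X-CatsPartialLoaderSessionKey").First());
    response = await client.SendAsync(next);
}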

Suppose we have the following class:

public enum PartialLoaderState { New, Partial, Full ... }

public class PartialLoader<T> where T : class
{
    // Returns the current state: new, or data received (partially or fully)
    public PartialLoaderState State { get; }
    // Loads the next batch of data
    public virtual Task LoadAsync()
    {
        ...
    }
    // Sets the data source
    public PartialLoader<T> SetDataProvider(IAsyncEnumerable<T> dataProvider)
    {
        ...
    }
    // Sets the timeout after which LoadAsync() returns
    public PartialLoader<T> SetTimeout(TimeSpan timeout)
    {
        ...
    }
    // Sets the batch size; when it is reached, LoadAsync() returns
    public PartialLoader<T> SetPaging(int paging)
    {
        ...
    }
    // Adds a utilizer to the chain of handlers applied to each item
    public PartialLoader<T> AddUtilizer(Action<T> utilizer)
    {
        ...
    }
    ...
}

This class can also be inherited from, with predefined utilizers. In this article we will use the ChunkPartialLoader class, which puts each successive chunk into a List<T>, replacing the previous one; a short usage sketch follows the class listing below.

public class ChunkPartialLoader<T> : PartialLoader<T> where T : class
{
    private readonly List<T> _chunk = new();

    public List<T> Chunk
    {
        get
        {
            ...
            return _chunk;
        }
    }

    public override async Task LoadAsync()
    {
        AddUtilizer(Utilizer);
        _chunk.Clear();
        await base.LoadAsync();
    }

    private void Utilizer(T item)
    {
        _chunk.Add(item);
    }
}
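
Driven in isolation, the loader could be used roughly like this (a minimal sketch: GenerateCats() is a stand-in data source that is not part of the library, and the exact constructor and namespaces of the package may differ):

// Minimal driver loop for ChunkPartialLoader<Cat>
// (.NET 6 top-level statements; the PartialLoader package is assumed to be referenced).
ChunkPartialLoader<Cat> loader = new();
loader.SetTimeout(TimeSpan.FromMilliseconds(100))
      .SetPaging(1000)
      .SetDataProvider(GenerateCats(10_000));

do
{
    await loader.LoadAsync();
    // Chunk contains only the items received since the previous call.
    Console.WriteLine($"{loader.State}: {loader.Chunk.Count} cats in this chunk");
}
while (loader.State == PartialLoaderState.Partial);

// Placeholder data source; any IAsyncEnumerable<Cat> would do.
static async IAsyncEnumerable<Cat> GenerateCats(int count)
{
    for (int i = 0; i < count; i++)
    {
        await Task.Yield();
        yield return new Cat { Name = $"Cat No.{i + 1}" };
    }
}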

Let's see how we can take advantage of it.

Lab setup

To test how everything works, let's create three projects in Visual Studio Community 2022. I will describe them as briefly as possible, since the full sources are available at https://sourceforge.net/p/partialloader/code/ci/v1.1.0/tree/.

BigCatsDataContract is a library of classes shared by the server and the client. It contains the class describing a data element, in our case a cat, and a class with some constants.

        public class Cat
        {
            public string Name { get; set; }
        }

        public class Constants
        {
            public const string PartialLoaderStateHeaderName = 
                "X-CatsPartialLoaderState";
            public const string PartialLoaderSessionKey = 
                "X-CatsPartialLoaderSessionKey";
            public const string Partial = "Partial";
            public const string Full = "Full";
        }

BigCatsDataServer is an ASP.NET Core project. In Program.cs we map two routes with parameters:

// Request count cats in a single batch, with the specified per-cat
// loading delay.
app.MapGet("/cats/{count=1001}/{delay=0}",
    async (HttpContext context, int count, double delay) =>
    await Task.Run(() => CatsGenerator.GetCats(context, count, delay))
);
// Request count cats with the specified per-cat loading delay, in batches
// whose size is limited by a timeout or by a fixed paging value.
app.MapGet("/catsChunks/{count=1001}/{timeout=100}/{paging=1000}/{delay=0}",
    async (HttpContext context, int count, int timeout, int paging,
            double delay) =>
    await Task.Run(() => CatsGenerator.GetCatsChunks(context, count, timeout,
                                                        paging, delay))
);


The CatsGenerator class contains the method that generates the data and the methods that handle the HTTP requests:

public class CatsGenerator
{
    private const string CatNamePrefix = "Cat No.";

    public static async IAsyncEnumerable<Cat> GenerateManyCats(int count,
                                                                double delay)
    {
        ...

        yield return await Task.Run(() =>
        {
            if (delay > 0)
            {
                // If a non-zero delay is set, we simulate vigorous
                // activity for about delay milliseconds.
            }
            return new Cat { Name = $"{CatNamePrefix}{i + 1}" };
        });
    }

    /// <summary>
    ///     Returns all cats at once.
    /// </summary>
    public static async Task GetCats(HttpContext httpContext, int count,
                                        double delay)
    {
        List<Cat> cats = new();
        await foreach (Cat cat in GenerateManyCats(count, delay))
        {
            cats.Add(cat);
        }
        await httpContext.Response.WriteAsJsonAsync<List<Cat>>(cats);
    }

    /// <summary>
    ///     Returns cats in batches.
    /// </summary>
    public static async Task GetCatsChunks(HttpContext context, int count,
                                    int timeout, int paging, double delay)
    {
        IPartialLoader<Cat> partialLoader;
        string key = null!;

        // Get the repository through dependency injection. It is registered
        // as a Singleton, so it lives for the lifetime of the application
        // (a sketch of this class follows the listing).
        CatsLoaderStorage loaderStorage =
                context.RequestServices.GetRequiredService<CatsLoaderStorage>();

        if (!context.Request.Headers.ContainsKey(
            Constants.PartialLoaderSessionKey))
        {
            // If this is the first request, we create an IPartialLoader
            // (we get it from dependency injection, where it is registered
            // as Transient) and start the generation.
            partialLoader = context.RequestServices
                .GetRequiredService<IPartialLoader<Cat>>();
            partialLoader
                    .SetTimeout(TimeSpan.FromMilliseconds(timeout))
                    .SetPaging(paging)
                    .SetDataProvider(GenerateManyCats(count, delay))
                    ;
        }
        else
        {
            // If it is a follow-up request, we take the key from the request
            // header, fetch the PartialLoader from the repository and continue
            // the generation.
            key = context.Request.Headers[Constants.PartialLoaderSessionKey];
            partialLoader = loaderStorage.Data[key];
        }
        await partialLoader.LoadAsync();

        // Add a response header signaling whether this is the last batch
        // or not.
        context.Response.Headers.Add(Constants.PartialLoaderStateHeaderName,
                                        partialLoader.State.ToString());

        if (partialLoader.State == PartialLoaderState.Partial)
        {
            // If it is not the last batch,
            if (key is null)
            {
                // and this batch is the first one, we come up with a key and
                // put the IPartialLoader into the storage.
                key = Guid.NewGuid().ToString();
                loaderStorage.Data[key] = partialLoader;
            }
            // Add the response header with the key.
            context.Response.Headers.Add(
                Constants.PartialLoaderSessionKey, key);
        }
        else
        {
            // If the batch is the last one, remove the IPartialLoader from the repository.
            if (key is not null)
            {
                loaderStorage.Data.Remove(key);
            }
        }

        await context.Response.WriteAsJsonAsync<List<Cat>>(
            partialLoader.Chunk);
    }
}
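
The CatsLoaderStorage class and the service registrations are not shown above; assuming the storage is simply a keyed collection of loaders, they might look something like this (a sketch, not the package's actual code):

// Hypothetical sketch of the loader repository: loaders keyed by session id.
// For heavy concurrent use a ConcurrentDictionary or locking would be safer;
// a plain Dictionary keeps the sketch minimal.
public class CatsLoaderStorage
{
    public Dictionary<string, IPartialLoader<Cat>> Data { get; } = new();
}

// Registration in Program.cs (builder is the usual WebApplicationBuilder;
// assumption: the chunked loader implements IPartialLoader<Cat>):
builder.Services.AddSingleton<CatsLoaderStorage>();
builder.Services.AddTransient<IPartialLoader<Cat>, ChunkPartialLoader<Cat>>();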

BigCatsDataClient is a WPF project. We will not look at the markup; in the code-behind, for simplicity, there are methods requesting the list of cats in full and in parts. For brevity I am not giving the full code here; it is available in the sources.

    private const string Server = "https://localhost:7209";

/// <summary>
///     Loads the whole dataset at once.
/// </summary>
private async Task GetAllCats()
{
    ...
    try
    {
        ...
        using HttpClient _client = new HttpClient();
        ...
        _client.BaseAddress = new Uri(Server);

        // Send a REST-style request to the server.
        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Get,
            $"{Constants.AllUri}/{Count}/{Delay.ToString().Replace(',', '.')}");
        HttpResponseMessage response = await _client.SendAsync(request);

        if (response.StatusCode == System.Net.HttpStatusCode.OK)
        {
            await Dispatcher.BeginInvoke(async () =>
            {
                List<Cat>? list = await JsonSerializer.
                    DeserializeAsync<List<Cat>>(
                        response.Content.ReadAsStream(),
                        new JsonSerializerOptions
                        {
                            PropertyNameCaseInsensitive = true
                        });
                // Add the cats to the cat table.
                foreach (Cat cat in list)
                {
                    Cats.Add(cat);
                }
                ...
            });
        }
        ...
    }
    catch (Exception ex)
    {
        // Something went wrong.
        ...
    }
}

/// <summary>
///    Loads the dataset in parts.
/// </summary>
private async Task GetChunksCats()
{
    ...
    try
    {
        ...
        using HttpClient _client = new HttpClient();
        ...
        _client.BaseAddress = new Uri(Server);

        // Send a REST-style request to the server.
        HttpRequestMessage request = new HttpRequestMessage(
            HttpMethod.Get,
            $"{Constants.ChunkslUri}/{Count}/{Timeout}/{Paging}/{Delay.ToString().Replace(',', '.')}");
        HttpResponseMessage response = await _client.SendAsync(request);
        ...
        while (response.StatusCode == System.Net.HttpStatusCode.OK
                && IsDataLoading)
        {
            await Dispatcher.BeginInvoke(async () =>
            {
                List<Cat>? list = await JsonSerializer.
                    DeserializeAsync<List<Cat>>(
                        response.Content.ReadAsStream(),
                        new JsonSerializerOptions
                        {
                            PropertyNameCaseInsensitive = true
                        });

                // Add the cats to the cat table.
                foreach (Cat cat in list)
                {
                    Cats.Add(cat);
                }

                ...

                if (response.Headers.GetValues(
                    Constants.PartialLoaderStateHeaderName).First() ==
                    Constants.Partial)
                {
                    // If the data came back incomplete, repeat the request.
                    // We can omit the route parameters: the server will
                    // substitute default values, but they will not be used
                    // anyway, because we pass the header with the session
                    // identifier that the server returned along with the
                    // incomplete data.
                    request = new HttpRequestMessage(
                        HttpMethod.Get, $"{Constants.ChunkslUri}");
                    request.Headers.Add(
                        Constants.PartialLoaderSessionKey,
                        response.Headers.GetValues(
                            Constants.PartialLoaderSessionKey).First());
                    response = await _client.SendAsync(request);
                }
                ...
            });
        }
        ...
    }
    catch (Exception ex)
    {
        // Something went wrong.
        ...
    }
}

About the PartialLoader implementation

A few words about the PartialLoader implementation used for this demonstration. The first call to LoadAsync() starts a task that reads the asynchronous enumerable and writes the items to a queue. Both the first and subsequent calls to LoadAsync() read data from this queue for the time, and up to the count, defined by the specified parameters. A ManualResetEventSlim is used to coordinate access to the queue between the threads. It is initially reset. When the writing task adds an object to the queue, it sets the ManualResetEventSlim, and reading from the queue proceeds. When the queue becomes empty, the ManualResetEventSlim is reset again. A more detailed implementation can be seen in the source code.

public async Task StartAsync(IAsyncEnumerable<T> data, PartialLoaderOptions options)
{
    ...

    if (State is PartialLoaderState.New)
    {
        State = PartialLoaderState.Started;
        _manualReset.Reset();

        _loadTask = Task.Run(async () =>
        {
            await foreach (T item in data)
            {
                if (_cancellationTokenSource.Token.IsCancellationRequested)
                {
                    ...
                    break;
                }
                _queue.Enqueue(item);
                _manualReset.Set();
            }
        });

        ...
    }
    else
    {
        State = PartialLoaderState.Continued;
    }
    await ExecuteAsync();
    // Clean up the utilizers after each call to avoid invoking objects that
    // might not remain usable between calls, e.g. ones that use HttpContext.
    _utilizers.Clear();
}

private async Task ExecuteAsync()
{
    _start = DateTimeOffset.Now;
    _count = 0;

    while (!_loadTask.IsCompleted)
    {
        TimeSpan timeLeft = _timeout.Ticks <= 0 ?
                TimeSpan.MaxValue : _timeout - (DateTimeOffset.Now - _start);
        if (timeLeft == TimeSpan.MaxValue || timeLeft.TotalMilliseconds > 0)
        {
            try
            {
                if (timeLeft == TimeSpan.MaxValue)
                {
                    _manualReset.Wait(_cancellationTokenSource!.Token);
                }
                else
                {
                    _manualReset.Wait(timeLeft, _cancellationTokenSource!.Token);
                }
            }
            catch (OperationCanceledException)
            {
                await _loadTask;
            }
            if (_cancellationTokenSource!.Token.IsCancellationRequested)
            {
                await _loadTask;
                State = PartialLoaderState.Canceled;
                return;
            }
            if (UtilizeAndPossiblySetPartialStateAndReturn())
            {
                return;
            }
        }
        else
        {
            if (_cancellationTokenSource!.Token.IsCancellationRequested)
            {
                await _loadTask;
                State = PartialLoaderState.Canceled;
                return;
            }
            State = PartialLoaderState.Partial;
            return;
        }
        if (!_loadTask.IsCompleted)
        {
            _manualReset.Reset();
        }
    }
    if (UtilizeAndPossiblySetPartialStateAndReturn())
    {
        return;
    }
    if (_cancellationTokenSource!.Token.IsCancellationRequested)
    {
        State = PartialLoaderState.Canceled;
        return;
    }
    if (_loadTask.IsFaulted)
    {
        throw _loadTask.Exception!;
    }
    State = PartialLoaderState.Full;
}

private bool UtilizeAndPossiblySetPartialStateAndReturn()
{
    while (_queue.TryDequeue(out T? item))
    {
        if (item is not null)
        {
            foreach (Action<T> utilizer in _utilizers)
            {
                utilizer.Invoke(item);
            }

            _count++;

            if ((_paging > 0 && _count >= _paging)
                || (_timeout.Ticks > 0 && (_timeout - (DateTimeOffset.Now - _start)).Ticks <= 0))
            {
                State = PartialLoaderState.Partial;
                return true;
            }
        }
    }
    return false;
}

Conclusion

If you need to load a large amount of data into a user interface, you can reduce the user's waiting time by loading it in parts and displaying each part as soon as it arrives.

NuGet: https://www.nuget.org/packages/Net.Leksi.PartialLoader/

Original source