New Features P2: Memory Management Variables and Objects

Allen Webster
Intro

It looks like the next build of 4coder (4.0.29) is going to be ready sometime in the next few weeks. The new build has been in development for a couple months and is loaded to the brim with new features that have all gone through interesting architectural and algorithmic design that I believe are worth sharing for several reasons. One it will prepare 4coder users who want to start writing customizations for how to think about the new features. Two it will help anyone who likes to think about the processes of architecture and algorithm design with some examples of my own process. Three it might expose me to some criticisms or suggestions that could help me improve the specifics of the new 4coder features before I put them into the wild.

Directory of all parts:
  1. Memory Management Overview
  2. Memory Management Variables and Objects
  3. Memory Management Scopes
  4. Custom UIs and Various Layers for Lister Wrappers
  5. Custom Cursors, Markers, and Highlights, and the Render Caller


"The Variable"

First let's take a look at the variable feature. We require that it be as easy as possible for users to get the handles to variables without having to ever pass them around if that will be too much hassle. Another way to put it is that we want our handles to be compile time constants. However if we start using compile time constants, we also want to think about how to avoid collisions between modules that don't know about each other, so just using an enum-style compile time constant doesn't work. If my module think variable 1 stores the previous index of the "paste next" command, and your module thinks it stores the mode for a vim emulation, we're going to have a really big problem.

Instead variables are named by strings. Alice's module can set "ALICE_IN_CLIPBOARD_LAND.index" and Bob's module can set "BOB_VIM_MASOCHIST.mode". There's no rule at all about what the string can be, except that it should be non-zero in length and have no nulls before the null terminator. I would like to encourage users to use distinct module names followed by a dot for every variable name, but I don't intend to force anything.

There are other ways I could go about structuring this. Instead I could have everyone name their module with a string, and then within each module use an index to get to each variable. However, I'm not convinced this would actually be better. Customizers still have to use a string to query the module anywhere they interact with the variable, and if Alice and Bob pick the same module name we still have to do a replace all on one of the two modules to integrate them. If anyone has another idea, let me know!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Module Names vs Variable Names
// I don't see any real advantage to one or the other as an API
// but it takes more to implement the module version.
void do_foo_module_version(Application_Links *app){
    uint64_t value = 0;
    read_variable(app, "FOO_MODULE", FooVariable_Bar, &value);
}
void do_foo_variable_version(Application_Links *app){
    uint64_t value = 0;
    read_variable(app, "FOO_MODULE.bar", &value);
}


What should we be worried about with this system? In a statically compiled language we rarely have to worry about variable misspelling mistakes, this variable feature threatens that possibility. There are a few solutions to this. One we can do in the API design, and another users can do in their usage code. Another issue we want to think about is reducing the frequency with which we have to hash a variable name when the usage side could be storing the result of a hash and reusing it. In the API design we can help with both of these problems if we must call a "create variable" before it can be used (as opposed to creating it on the fly the first time we read or write it) and having the create variable return some kind of fixed width handle that doesn't need to redo the hash.

These concerns lead to the API:

1
2
3
4
5
Managed_Variable_ID managed_variable_create(Application_Links *app, char *null_terminated_name, uint64_t default_value);
Managed_Variable_ID managed_variable_get_id(Application_Links *app, char *null_terminated_name);
Managed_Variable_ID managed_variable_create_or_get_id(Application_Links *app, char *null_terminated_name, uint64_t default_value);
bool32 managed_variable_set(Application_Links *app, Managed_Scope scope, Managed_Variable_ID location, uint64_t value);
bool32 managed_variable_get(Application_Links *app, Managed_Scope scope, Managed_Variable_ID location, uint64_t *value_out);


Now users have various options for creating and getting variable ids from their compile time constant names. Technically the names don't even have to be compile time constant, but the API is not trying to help you operate that way, which is why it takes null terminated strings without length specifiers. Anything you would like to do with names generated at runtime should turn out better when implemented through objects, which are meant for more intricate data types. If you're not worried about hashing work, and usually you won't have to be worried about it, you can just use the "create or get" option every time you operate on a variable. If you do care about the hashing, you can store the variable ids in global integers. This will work just fine for now, but when 4coder supports reloading and swapping out the custom layer, global variables will have to be managed with extra care.

Users can reclaim their compile time spellchecking by putting variable names into global constants.

Example of using the variable:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
static const char *aabbb_parent = "ALICE_AND_BOBS_BUFFER_BOOPER.parent";
CUSTOM_COMMAND_SIG(switch_to_parent_buffer)
CUSTOM_DOC("If this buffer has set it's parent buffer, switch to the parent buffer")
{
    View_Summary view = get_active_view(app, AccessProtected);
    Managed_Scope buffer_scope = buffer_get_managed_scope(app, view.buffer_id);
    Managed_Variable_ID varid_parent = managed_variable_create_or_get_id(app, aabbb_parent, 0);
    uint64_t parent_buffer_id = 0;
    if (managed_variable_get(app, buffer_scope, varid_parent, &parent_buffer_id)){
        if (parent_buffer_id != 0){
            view_set_buffer(app, &view, parent_buffer_id, 0);
        }
    }
}


Objects (sans Orientation)

If you're anything like me the word object immediately puts you in a state of unease. The word object is usually reserved for object oriented programming paradigms that I have long ago learned to avoid. In fact the objects in this API are not object oriented at all, it's just that no other generic non-descriptive noun fits very well either. "Entity" is even more reserved for game engines than "object" is for OOP. These objects work as memory allocations, but they can often work as more than just a memory allocation so "Memory" feels inapropriate as well as "Array". "Buffer" would be very appropriate but in a text editor that word binds more closely to the text storage system and would be very confusing. Although I find it unfortunate, "Object", is the best I have been able to do, so instead of infering meaning from the name, try to focus on how I describe it.

Objects were first introduced to store arrays tied to buffers and views. Arrays could have been simulated with variables by appending an index to a string name and doing everything in variables, but I would like to stay out of the business of generating variable names at run time. If we think of an object as a generic array system, most of the API just materializes from the simples possible setup. We will need an allocation call, a free call, a call to store data into the array, and a call to load data out of the array. We have the choice of either always doing byte arrays and letting the usage code manually multiply by sizeof(T) or having the user specify the allocation by item size and count.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
// Always by Byte vs Indexed by Item Size
// Do we want more control at the site of store/load?
void foo_by_byte(Application_Links *app, Managed_Scope scope){
    Managed_Object object = alloc_managed_memory_object_in_scope(app, scope, sizeof(Foo)*100);
    Foo foo[100];
    make_up_these_foos(app, foo, 100);
    // I have more freedom here to do any byte indexes I want,
    // I am not a slave to what the alloc call thought this
    // memory was for.
    managed_object_store_data(app, object , 0, sizeof(Foo)*100, foo);
    Assert(sizeof(Foo)*100 == managed_object_get_size(app, object ));
}
// Do we want more control at the size of alloc?
void foo_by_byte(Application_Links *app, Managed_Scope scope){
    Managed_Object object = alloc_managed_memory_object_in_scope(app, scope, sizeof(Foo), 100);
    Foo foo[100];
    make_up_these_foos(app, foo, 100);
    // My freedom here is gone :( I have to follow the
    // addressing rules set by the alloc call.
    // At least I have less to type!
    managed_object_store_data(app, object , 0, 100, foo);
    Assert(sizeof(Foo) == managed_object_get_item_size(app, object ));
    Assert(100 == managed_object_get_item_count(app, object ));
}


I eventually decided to use separate item size and count, but the reasons for that will not be clear until part 5 of this series.

As I said sometimes objects act as more than just "Arrays". The codebase currently supports two types of objects, and there is a third type very clearly defined in my head that I will probably be adding either in this build or in a subsequent build not too far off. All object types share the trait that they have an item size, an item count, can store and load data, and can be freed. The presence of different types basically just means expanding the API to one allocation call per type, and adding a get type query.

Put together the api for object features:
1
2
3
4
5
6
7
8
9
Managed_Object alloc_managed_memory_in_scope(Application_Links *app, Managed_Scope scope, int32_t item_size, int32_t count);

int32_t managed_object_get_item_size(Application_Links *app, Managed_Object object);
int32_t managed_object_get_item_count(Application_Links *app, Managed_Object object);
Managed_Object_Type managed_object_get_type(Application_Links *app, Managed_Object object);
Managed_Scope managed_object_get_containing_scope(Application_Links *app, Managed_Object object);
bool32 managed_object_free(Application_Links *app, Managed_Object object);
bool32 managed_object_store_data(Application_Links *app, Managed_Object object, uint32_t first_index, uint32_t count, void *data);
bool32 managed_object_load_data(Application_Links *app, Managed_Object object, uint32_t first_index, uint32_t count, void *mem);


This list of calls includes the allocator for managed memory but the other specialized type allocation call is not included because we have a lot to learn and discuss about the new systems before it will make sense.

A Managed_Object handle fits in 64-bits and converts to unsigned integers so the handles can be stored in variables. One thought I have had is using "named objects". Essentially using the variable method of putting a handle onto an object, but then using the object method of allocation. The initial reason I resist that thought is that then every call involved with creating variables and every call involved with querying and modifying objects has to be duplicated into an incompatible set of signatures. If Alice has written some generic code that can do some work given an object as a parameter, and Bob has stored his object as a "named object", we would ideally like YOU to be able to use both of these systems, and pass Bob's object to Alice.

How is that going to work? If Alice has implemented her thing to work with the Managed_Object and names, she will either have to duplicate her code, or she will have to make some kind of generic wrapper for both handle types. We obviously don't want her to duplicate code, and if she has to write her own wrapper, then so will Bob and every other module, and they'll all be incompatible. We could provide a standard wrapper, but the it sounds to me like we added two entry points to a system, and then didn't actually want two entry points, and so wrapped it back up into one entry point. If we really want to write the code:

1
2
3
4
5
6
void foo_named_object(Application_Links *app, Managed_Scope scope){
    Managed_Variable_ID id = managed_variable_create_or_get_id(app, "FOO_MODULE.bar_object", 0);
    Foo foo[100];
    make_up_foos(app, foo, 100);
    named_managed_object_store_data(app, scope, id, 0, 100, &foo);
}


We can just make named_managed_object_store_data as a wrapper, and then Bob's code can work that way, and when we want to pass to Alice we just go below the wrapper and get the Managed_Object. I am also not convinced we ever want this so badly that it's even worth building the wrappers, but if it does become an issue, this is the plan.

In the next part, I will go into scopes, which brings this all together and is way more intricate than you might think!
In both foo_by_byte functions: "object" should be "managed_object" on the allocation line.
1
2
3
4
void foo_by_byte(Application_Links *app, Managed_Scope scope){
    Managed_Object object = alloc_memory_object_in_scope(app, scope, sizeof(Foo)*100);
    ...
}


Mr4thDimention
I am also not convinced we every want this so badly that it's even worth building the wrappers, but if it does become an issue, this is the plan.

I believe you meant "... we ever want...".