4coder 4.0.18: Pushed to the Limit

Hey everyone!

The newest version of 4coder, 4.0.18, will be the first to feature full unicode support. As of today I have all of the unicode features working that I want for the build. 4coder now translates UTF8 into codepoints, and the font system supports rendering all the codepoints provided by a font. I restructured the font system so that now users can drop .ttf files into the new font folder and 4coder will load them at launch. It looks like everything will be ready to ship in just a few days, once I finish doing a few details here and there.

4coder 4.0.X gets pushed to the limit
I first announced that I was going to begin working on line wrapping and unicode support back in September 2016 . I did not fully explain why I saw those two features as related, because I was not sure if my vision for the system was actually going to work, and I wanted to wait until I had really seen everything up close first.

Back then I started to suspect that I would need a dramatic change in the way buffer coordinates and layout are handled to keep the system maintainable. But I know that sometimes doing the big dramatic change is overkill. Sometimes it is actually possible to get everything to fit together nicely by just typing it in and following the rules of the system that are already there.

So since September, six months ago, I've been working on code wrapping and unicode support. To fit those features into the 'complexity budget' I had to buy back some of the complexity I spent in related details of the systems. Improve the syntax and semantics of the buffer streaming over here and now have the code wrapping system use two parallel buffer streams over there. Stop doing incremental updates to this piece of meta data in this situation here and now generalize the meta data measurement system over there. Ultimately it was a six month project to confirm that the rules of the old system just aren't powerful enough to keep the complexity down and the performance up.

In 4.0.18 I have TRIED to keep performance up. And, I think, I mostly succeeded... actually line insertion in big code files feels a little bit squishy on my new Windows 10 laptop. But a lot of things feel squishy on there to me, even 4coder 4.0.17, so it's hard to be sure. Either way I have definitely blown past the complexity budget in this version, and so I think this is the end of the line for 4.0.X.

A little background
First, any text editor will have to use some sort of data structure for the actual text in the buffer. Here are a few of the options:

Flat Array - You just store your text in a contiguous block, to insert you shift characters over and copy new characters in, to delete you shift characters the other direction.

Gap Buffer - Store the text in two contiguous blocks, with a gap of memory in between. When insertions happen first move some text from one side of the gap to the other, then insert into the gap. To delete move the text specified for deletion to the edge of the gap, then shrink the size variable on the appropriate text block to indicate that the text is gone.

Fixed Width Gap Buffer - This idea was brought to my attention by Martin Cohen who is implementing Hale. Here the buffer is split into n gap buffers, with each gap buffer having the same size. To do inserts and deletes you often just apply the gap buffer rule to one of the gap buffers, occasionally you have create new fixed width gap buffers and copy content into them. Under the right conditions it is actually possible to get edits down to O(1) time.

Ropes - Store a lot of separate strings as the leaves of a binary tree. Edits involve splitting strings into two nodes rearranging the tree and discarding nodes or inserting some new nodes. This strucutre is pretty tough to get working, and even tougher to optimize, but if handled properly it can 'magic' a lot of operations down to being almost free by just inserting a node somewhere.

Back in November of 2015 I implemented each of these and the primary driving concern in 2016 (besides supporting Linux) was repositioning 4coder so that I could switch it off of a flat array and onto a gap buffer. I chose the gap buffer because it required the least sophisticated optimizations to achieve very good performance. Both FWGBs and Ropes could have achieved better performace if I did some sophisticated optimizations for them. But their base complexity + the complexity of the sophisticated optimizations was many times greater than the complexity of the gap buffer.

It is unfortunate that the flat array was so much slower than the gap buffer. Even the gap buffer has the disadvantage that the text is not contiguous. This becomes a problem because a lot of operations want to loop over the buffer and look forward and backward through the buffer, and in a contiguous array it is trivial to use arbitrary access patterns. For instance, the first lexer I wrote assumed it could look forward and backward through the text and do arbitrarily complex control flow, with ifs and loops and incrementing and decrementing the 'i' variable for the loop. Any buffer data structure other than a flat array puts the text into two or more places, so a simple for loop does not work and arbitrary access is gets expensive quickly.

One way to fix this is just to write a little routine that takes the place of indexing into the array:

1	char buffer_index(Buffer *buffer, int i);

This way everything is semantically identical. You can still write a simple for loop, and wherever you were reading from the array you just call the indexing function instead. Choosing this option means choosing lower complexity in exchange for losing a little bit of speed.

I went a different direction. I figured that most operations can be written in terms of a loop that goes from front to end, without ever skipping backward or forward. So I made what I have since called a 'buffer streaming' solution. Basically to loop over a buffer you just initialize a buffer stream, setting any start and end indexes that you want, and it handles moving from range to range through the buffer. The stream provides a 'data' array that you can index as if it were contiguous. It is not safe to jump ahead or backwards, but in terms of speed it is pretty close to a contiguous array. Except that whenever you get to the end of the range you have to ask the streaming system to move to the next range. The code looks something like:

Buffer_Stream stream = {0};
if (init_stream(&stream, buffer, start, end)){
    bool32 still_looping = true;
    do{
        for (; i < stream.end; ++i){
            char ch = stream.data[i];
            // do some work
        }
        still_looping = stream_step_forward(&stream);
    }while(still_looping);
}

With the gap buffer the cost of switching to the next range is almost zero, since it only ever happens once.

To be clear, these are just first order problems. No matter what combination of choices I made for buffer data structure + access API, I still would have hit the complexity ceiling that is making 4.0.18 the end of the 4.0.X series. So far I am just making sure that you are prepared think about the second order problems with some solid footing. So the take aways from this section are:

1. The underlying system stores a series of characters in some way, in 4coder it happens to be a 'Gap Buffer'.

2. The access API abstracts away the actual location of text so that systems using the buffer only need to know the right 'i' value to do an operation or a read. This is true whether you use an indexing function or a streaming apparatus.

The API I Want
The basic pieces I have talked about so far are all fine and dandy. To implement lots of little commands like the primitives for reading and writing work just fine. But here are some commands that will come along and complicate the situation:
"Jump to line 100"
"Put the cursor on the nearest character to the mouse coordinates"
"Delete the text inside this rectangle on the screen"

Actually, it's not that you would struggle to implement them correctly. If you just sacrificed performance entirely and wrote the simplest possible thing, these would not be so bad:

"Jump to line 100" -> Start at i = 0 and go forward until you've seen 99 '\n' characters

"Put the cursor on the nearest character to the mouse coordinates" -> Start at i = 0 and go forward keeping track of the x and y values, until you are as close as you can get to the mouse.

"Delete the text inside this rectangle on the screen" -> Start at i = 0 and go forward until you find the first corner of the rectangle, then keep track of the first and last character on each line until you're past the bottom of the rectangle, now perform the delete on all of the ranges.

There is another, related problem: rendering the buffer. If you never want to scroll down, then you're fine. Just render all the characters that fit on the screen and stop. But clearly this is a bad plan, because most files don't fit on the screen. To handle scrolling the view, you first have to store some value to represent the location of the view. If you store the line number, then you've reduced the problem to "Jump to line #" of course this doesn't support horizontal scrolling so maybe you'll need another value for how many characters over from the start of the line the view should be. If you want smooth scrolling you're really going to wish you could store x and y as coordinates and now it reduces to the problem of finding the the character nearest to some x & y position.

The simplest possible approach of measuring from i = 0 just doesn't cut it very long. Once there are some commands that do ~100 of those measurements, or your file grows to 10MB or so, that approach just doesn't hold up.

All of these problems come down to one thing: They want to think about the buffer in terms of 'buffer coordinates' that differ from the natural coordinates. The natural coordinate system is the one the primitive buffer structure understands. 'i = 0' is the first byte of the buffer, 'i = 1' is the second byte, etc. If your problem can be stated in those terms, then you're in good shape. However if you want to "Go to line 100" it all falls apart. In one buffer line 100 could be at i = 100, and in another buffer it could easily be at i = 1400. All this business about starting at i = 0 and counting '\n' is really doing a coordinate convertion from line numbers to natural indexes.

Before I go on, I want to make sure I've doubly convinced you that this is a real and interesting problem. You might be thinking something like "Well why would you start at the beginning of the file every time!? Start somewhere not too far away!" That is exactly the right idea, but if you don't start at i = 0, how do you know which line number you've started on? These second order problems are all about being able to choose a closer starting point and get the values of all the coordinates at the starting point, so you can use it to seek to your query point.

Based on this understanding, I structured 4coder to have seek operations exposed to the custom layer. The seek operations allow the user to query a position in one coordinate system, and get back the coordinates of the nearest real position in all coordinate systems. Because of the power of this API I prefer to add new features in the custom layer whenever I can. With that API in place, the real problem is how to implement it.

Some Basic Methods
Earlier I mentioned Hale, and this problem is another one where Hale does something interesting that I do not do. To be fair I have not chatted with Martin Cohen about Hale for quite some time, so my info might be out of date, but at least back then, Hale used a system that I called "relative positioning". With "relative positioning" you always remember the coordinates of your current cursor. What line number it's on, what it's x and y values are, and any other coordinates you care about. You make sure to get those values right whenever you move the cursor. Then you take advantage of the fact that most operations happen near your cursor. If you want to "Go five lines up" you just start moving backwards until you've seen five '\n' characters and then you find the beginning of the current line and move forward until your x is as close to the original value as possible. In this system all of your coordinate problems are solved 'relative' to the the cursor, or you just revert to the i = 0 approach. If you are on line 105, then "Go to line 100" is not going to be too bad. However if you're on line 500 then you're better off starting at i = 0.

The simple part of the relative position system is that there is very little meta data work, but the complex part is the difficulty of actually performing the relative seeks. For instance, when you're seeking backwards you never know the x and y values of the cursor, because as soon as you back up to a previous line, that line could be arbitrarily long or short. You have to get back to the beginning of the line where x = 0 to know where things are. It also complicates your life if you want to do lots of work in absolute positions.

I decided 4coder should be able to handle all of it's buffer coordinates with absolute positioning without having to seek from i = 0 all the time. This moves all of the complexity off of the 4coder custom layer because the programmer can seek anything they need and not have to worry about performance. It moves the complexity into the application core where some sort of buffer coordinate meta data needs to supplement the buffer itself to make everything work.

When the editor was just getting started the obvious coordinate systems to support were:
1. natural coordinates - indexes into the buffer as an array of bytes
2. line & column - aligns with the system used to specify compilation errors and to enable go to line style commands
3. screen x & y - aligns with the coordinates of the mouse and the renderer

The meta data for supporting those coordinates can be pretty straight forward. An array that marks the 'natural coordinate' of the first character of each line sovles the problem. This is trivial to generate at buffer load time, and then the meta data can be incrementally updated after each edit, with ideal performance. To do coordinate conversions, the user simply specifies a position in the buffer in whatever coordinates they have. If it is natural coordinates, binary search the array to find the starting line. If it is screen x & y just divide y by the line height round down to get the line number. Once you have a line number generate a 'cursor hint' for the first character of the line, and then iterate forward to the position you're looking for, which is always pretty close.

Unforunately that simple system did not last. Line wrapping was the first problem that came up. The problem with line wrapping is that it ruins the rule that each textual line corresponds with each visual line so it could no longer rely on:

1	line_index = floor(y/line_height);

On top of that, with line wrapping the coordinate system 'screen x & y' was suddenly dependent on more state than just the text in the buffer. I decided not to conditionally apply wrapping to the screen x & y coordinates. Instead I just introduced a new coordinate system:
4. wrapped x & y
And I modified the 3rd coordinate system to:
3. unwrapped x & y

So now seeks give back both the unwrapped and wrapped x & y and it's up to the user to choose whichever one is appropriate.

This also complicated the meta data because now I needed to convert between "line_index" <-> "wrapped_y", but that was still possible to do while keeping incremental recomputes working and everything could have gone on it's merry way like that forever... except I had a few more features to fit in.

Where the Situation Gets out of Hand
Long time 4coder users will remember that at one time, 4coder wrapped by character. It would put the wrap point anywhere even right in middle of a word. This setup for the wrappigng system was not arbitrary, because anything more complicated would make the meta data problem worse. As the details work out, in order to support word wrapping in text files it took another array of meta data. But the situation with code is quite a bit trickier than that. With text it's expected that wrapped text just goes back to x = 0. A good code wrapper, on the other hand, should understand how to align the code that it wraps. You don't want to see:

1 2	foo(param1, param2, param3, param4);

What you'd much rather have is:

1 2	foo(param1, param2, param3, param4);

So to support proper wrapping for code I knew I would need to specify an x offset with each wrap, and that, somehow, has to be captured in the meta data system so that it can calculate the line_index <-> wrapped_y conversion correctly. Once I was handling that correctly, an ugly detail emerged. A line that looks like a lot of spaces or tabs followed by code, sometimes actually is a lot of spaces or tabs followed by code, but sometimes it's a wrapped code line with no characters before the code. Internally the picture gets ugly too. The position of the wrap depends on the positioning of the original line of code, and so it is dependent on however many spaces and tabs come before that line and the widths of that leading whitespace, so I would have needed to individually measure each whitespace character leading the line, even though indented code starts at very predictable x values. To solve both of those wrapping problems at once, and to simultaneously solve tabs vs spaces, I needed 'Virtual Whitespace'.

Virtual whitespace acts as if no line has any leading whitespace. The phrase 'virtual whitespace' is meant to emphasise the layer of separation between the space that is actually in the buffer, and the space that is presented to the user. Now you might be thinking "Well there is your problem, Allen! That's just too complicated!" but that's not quite true. For one thing, the real complexity of code wrapping is in the interoperation between the wrapping meta data generation and the code wrapping semantic rules. And it also turns out that the primitive that it takes to support unicode also trivially supports virtual whitespace.

Consider for a moment the problem of adding unicode support to an editor that only supports ascii. In some ways it's not actually too bad if you use UTF8, because UTF8 is designed so that ascii is just a subset of UTF8. Things like the buffer data structure already work correctly, with no changes at all. In some other ways it's pretty daunting. How do you take 3 bytes, which used to mean 3 separate characters, and instead put them on the screen as single unit? When a user deletes a character, they don't expect it to break out into multiple bytes that each get deleted separately, so a 3 byte UTF8 character has to be treated as one character. This is the principle that I saw in common between these two systems: a separation between the bytes that are actually in the buffer, and what gets represented as a character to the user.

If this is done correctly the command that deletes a one byte character and a three byte character should look the same. It shouldn't be a situation where the programmer of the 'delete' command has to check the UTF8 length and delete exactly the right number of bytes. If virtual whitespace is going to work, the writer of 'delete' certainly can't be expected to do all the work of figuring out how many spaces to delete in case the delete happens at the beginning of a line. The core plan for solving all of this is to leverage the coordinate system conversion API, by adding a fifth coordinate system:
5. apparent character - a 0 based count of the number of 'apparent characters' from the start of the buffer

With apparent characters the custom layer author can write 'delete' by specifying the range as one 'apparent character'. Furthermore the cursor cannot land on all possible indexes in the buffer anymore, the cursor can only land on the apparent characters. Adding this to the system required yet another meta data array, but ultimately it was possible. The rendering rule also needed a tweak which is not really captured in the meta data, but that by itself is not very complicated to do.

Back in September that's what I was foreseeing. That connection, through the coordinate system, was the big plan for making both code wrapping and unicode manageable. But if you've been reading from the beginning, you already know how the story ends. That plan wasn't enough. The major failure isn't in the API at all, which has turned out exactly as I envisioned. The problem is that the adhoc meta data system, that was fine originally, now has a few too many interconnections.

Here is a quick summary of what it takes to make the system work:
1. The meta data generation system, the coordinate calculating system, and the rendering system, all have to be independently implemented in such a way that they line up at every point of the buffer's layout across all five coordinate systems.
2. The meta data generation system has to simultaneously measure the widths of lines for wrapping, examine tokes to choose optimal wrapping positions and apply the correct line shifts, generate virtual whitespace meta data, and generate all the other line starts meta data.
3. All of the above systems have to stream the buffer, translate utf8 to codepoints, and then perform their operations on those codepoints instead of on the individual bytes coming from the stream, while also translating out some of the whitespace and applying line shifts.
4. Each edit requires an incremental change to the meta data, which means not only parsing out the edit itself, but also figuring out how to fit the new meta data into the existing meta data, and where to shift the values in the existing meta data.
5. And since unicode is pretty large, it is not a great idea to load all possible codepoints, so the font system has to fully participate at every stage in case a codepoint comes up that is in an unloaded glyph page and the system needs the glyph metrics.

In my mind this is exactly a complexity budget issue. Everything I listed could be done in a way that is performant in theory, but after going through the debugging process to get code wrapping to work properly today, I've realized there is no way for me to get it to that level and keep it there for long. Ultimately I gave up on making the measurement system incremental, so 4coder is now redoing the entire meta data generation with every edit. As I said at the beginning, this appears to work reasonably well, thanks to extremely beefy modern hardware. But the current situation is not going to last whether I gut the system now, or wait until it kills me first.

Summary
So there you have it. The problem that faces me. I need to come up with a novel way to speed up absolute cursor positioning and buffer layout, with support for separating buffer contents from presentation, utf8 decoding, and programmable wrapping rules. Luckily I have been thinking about this problem since the announcement in September, when I first wondered if the current system would hold up, and I think I have a way to do it.

If you've made it to the bottom, then thanks for going on that wild journey with me! And thanks to everyone who has been following and supporting 4coder through more than two years of experimentation!

-Allen

4coder 4.0.18: Pushed to the Limit

Comments