We are a community of programmers producing quality software through deeper understanding.

Originally inspired by Casey Muratori's Handmade Hero, we have grown into a thriving community focused on building truly high-quality software. We're not low-level in the typical sense. Instead we realize that to write great software, you need to understand things at more than a surface level.

Modern software is a mess. The status quo needs to change. But we're optimistic that we can change it.

Latest News

Ben Visness

Hello everyone! Just dropping by to make three announcements of upcoming events and projects here on the network over the summer:

1. Fishbowl on Saturday, May 28

Our next fishbowl conversation will be happening on the Discord on May 28! If you've never joined us, a fishbowl is essentially a panel conversation held over text chat, where a select few participants can hold a focused long-form discussion on a specific topic. We're a big fan of this format because of the way it allows participants to write thoughtful, long-form answers that would be impossible in any other medium.

We've had fishbowls on the future of operating systems in the age of the Internet, the relationship between simplicity and performance, "configuration" and how to avoid it, and many other topics. This time we are excited to tackle a subject near and dear to all our hearts: object-oriented programming.

We hope for this to be the definitive community conversation about OOP, since it's such a frequent topic of conversation and such a confusing issue to discuss. If you'd like a preview of the topic or would like to contribute thoughts or resources, please check out the planning discussion over on GitHub.

The fishbowl will be held in the #fishbowl channel on Discord at noon CDT. To see the time in your local timezone, and be notified when the fishbowl starts, bookmark this whenisit page!

2. Jam in August

We've held two jams as a community in the past, and they have both been smashing successes. In particular, last year's Wheel Reinvention Jam produced tons of great projects and introduced us to lots of amazing community members. Naturally, we're going to do another jam this year!

We are still working out all the details, so a proper announcement and final dates are yet to come. But we will confirm that this year's jam is happening in August! Expect it to follow a similar format to prior years—so, one week, and working in teams is allowed and encouraged.

More soon. 🙂

3. A new education initiative

This project is still in its early stages, but too exciting not to tease. Spearheaded by long-standing community member cloin, we are hard at work behind the scenes on a new kind of education initiative for programmers. This new initiative will subsume and replace the old library, which we know many people sorely miss, and will also hopefully live up to the dream of the old wiki and old education initiative.

We are very excited to share more about this project with you in the coming months. For now, if you're interested, nag us on Discord and maybe we'll drop a few more hints. 😉

A call for volunteers!

These are just a few of the projects we have in the works for the coming year, and there is even more we'd like to be doing. We'd love to get more people involved in the following:

  • Helping maintain the website and build new features for the community
  • Developing and organizing starter projects for people to use (particularly for the jam)
  • Beta testing new initiatives

If you're interested in any of these, please reply to this post or hit us up on Discord!

Around the Network

Recent activity from Shastic, Dawoodoz, x13pixels, Asaf Gartner, Simon Anciaux, Neo Ar, Mārtiņš Možeiko, and Dbug:

  • New forum thread: C with auto generated headers
  • Forum replies: Parallel Stacks, Set Thread Name
  • Blog comments: Three Announcements
  • New blog post: Three Announcements (Ben Visness)

  • New forum thread: Parallel Stacks
  • Forum replies: Parallel Stacks, Set Thread Name

Macoy Madson

This article is mirrored on my blog

My in-development linker was crashing in free. The program allocated with one malloc but then freed with a free belonging to a different malloc implementation. This caused a segmentation fault, because that free expected a metadata structure that the other malloc never created (at least, not at the same size).

This took a lot of sleuthing. I knew it was a problem with my linker/loader, but couldn't step-debug it because my loader doesn't create a program image the way GDB expects.[^1]

This article goes over how I found the issue. It might help you if you are encountering strange things with your work-in-progress linker (ha ha!), or like reading about someone debugging something.[^2]

Finding the problem

The problem manifested in my attempt to statically link[^3] the following simple Cakelisp program:

(add-c-search-directory-module "/home/macoy/musl/include")
(c-import "stdio.h" "stdlib.h")

(defun main (&return int)
  (fprintf stderr "Hello, C runtime!\n")
  (var data (* char) (type-cast (malloc (* (sizeof (type (* char))) 10))
                                (* char)))
  (fprintf stderr "Allocated and got %p!\n" data)
  (set (at 0 data) 0)
  (fprintf stderr "Accessed %p, it's now %d\n" data (at 0 data))
  (free data)
  (fprintf stderr "Freed!\n")
  (return 0))

This program first proved that musl libc was at least partially functional by successfully printing to stderr. However, the program segfaulted in free.

I used a combination of a signal handler and rudimentary stack printing via backtrace.h to find I was calling the incorrect malloc function relative to the later free.
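
The general shape of that setup looks something like the following. This is a minimal stand-in sketched with glibc's execinfo.h rather than the backtrace.h-based printing in my actual loader:

#include <execinfo.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

// On SIGSEGV, dump a raw backtrace so we can see which call chain faulted.
static void on_segfault(int sig)
{
    void *frames[64];
    int num_frames = backtrace(frames, 64);
    fprintf(stderr, "Caught signal %d; %d frames:\n", sig, num_frames);
    // backtrace_symbols_fd writes straight to the file descriptor without
    // calling malloc, which matters when the heap is what's broken.
    backtrace_symbols_fd(frames, num_frames, STDERR_FILENO);
    _exit(1);
}

int main(void)
{
    signal(SIGSEGV, on_segfault);
    // ... load objects, resolve symbols, run the program under test ...
    return 0;
}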

I discovered this by noticing that I could successfully set the data returned by malloc without encountering a segmentation fault, so the memory was at least valid.

I then hacked together a damn simple "interactive debugger", which gets triggered when SIGSEGV is caught by my signal handler[^4]:

;; Very minimal!
(defun-local interactive-debugger ()
  (fprintf stderr "Commands:\n
\tquit\n
\tprint-symbol [symbol-name]\n")
  (var print-symbol-tag-length (const int) (strlen "print-symbol"))
  (while 1
    (fprintf stderr "> ")
    (var input ([] 256 char) (array 0))
    ;; Note: We need to request stdin before running this in a signal handler!
    (fgets input (sizeof input) stdin)
    (cond
      ((or (= 0 (strcmp "quit\n" input))
           (= 0 (strcmp "q\n" input)))
       (break))
      ((= 0 (strncmp "print-symbol " input print-symbol-tag-length))
       (var symbol-name-buffer ([] 128 char) (array 0))
       (strcpy symbol-name-buffer (+ input print-symbol-tag-length 1))
       (set (at (- (strlen symbol-name-buffer) 1) symbol-name-buffer) 0)
       (fprintf stderr "Searching for '%s'\n" symbol-name-buffer)
       ;; This prints where it finds it for us
       (var symbol (* void) (find-symbol-address-in-allocated-sections
                             symbol-name-buffer))))))

The print-symbol command alerted me to the fact that malloc was resolving to the lite_malloc.c implementation, but free was resolving to the mallocng implementation.

I then started looking at lite_malloc.c and untangling the mess.

Let's walk through the issue.

musl's malloc implementation

musl has a "lite" or simple malloc that is defined as a fallback when e.g. mallocng isn't included in your musl build.

It is defined like so:

static void *__simple_malloc(size_t n)
{
    // [Implementation omitted by article author]
}

weak_alias(__simple_malloc, __libc_malloc_impl);

void *__libc_malloc(size_t n)
{
    return __libc_malloc_impl(n);
}

static void *default_malloc(size_t n)
{
    return __libc_malloc_impl(n);
}

weak_alias(default_malloc, malloc);
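
For reference, weak_alias is musl's thin wrapper around GCC's alias attribute. Paraphrased (not the verbatim musl source), it looks roughly like this:

#include <stddef.h>

// Declare `new` as a weak alias of the existing definition `old`.
#define weak_alias(old, new) \
    extern __typeof(old) new __attribute__((__weak__, __alias__(#old)))

// Stand-in for the real static default_malloc in lite_malloc.c.
static void *default_malloc(size_t n) { (void)n; return 0; }

// This single line...
weak_alias(default_malloc, malloc);
// ...expands to roughly:
//   extern __typeof(default_malloc) malloc
//       __attribute__((__weak__, __alias__("default_malloc")));
// so the object file gains a weak symbol `malloc` whose address is
// default_malloc's code.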

After reading several pages on this relatively obscure feature, I came to understand what these weak_alias macros accomplish.

If the user defines their own malloc, that creates a strong definition, thereby overriding musl libc's malloc.

If the user does not define their own malloc, the linker resolves references to malloc to default_malloc instead.

This I deduce is accomplished like so:

  • GCC sees __attribute__((weak, alias("default_malloc"))) on a declaration of malloc (this is what the weak_alias(default_malloc, malloc) line expands to, as shown above)
  • GCC generates the code for default_malloc, and puts it in the object file under the ELF symbol names default_malloc and malloc, which we can confirm with nm:
~/Repositories/linker-loader $ nm --defined /home/macoy/Downloads/musl-1.2.3/obj/src/malloc/lite_malloc.lo
0000000000000000 b brk.2119
0000000000000000 D __bump_lockptr
0000000000000000 b cur.2120
0000000000000000 t default_malloc
0000000000000000 b end.2121
0000000000000000 T __libc_malloc
0000000000000000 W __libc_malloc_impl
0000000000000000 b lock
0000000000000000 W malloc
0000000000000000 b mmap_step.2122
0000000000000000 t __simple_malloc

The W denotes a weak symbol. Note that you would be able to reference default_malloc directly, i.e. the alias isn't an override, but in this case default_malloc is marked static, so it will not be exposed to the linker by its true name.

The other weak_alias, on __simple_malloc, is the one that broke my loader. This alias accomplishes a different goal. When the user has not defined their own malloc, default_malloc will be called, which references __libc_malloc_impl. The weak alias on __simple_malloc says: "if no strong __libc_malloc_impl is defined, then use __simple_malloc instead."
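
To see this resolution rule in isolation, here is a standalone toy (file names hypothetical, nothing to do with musl's sources):

/* impl_weak.c -- plays the role of lite_malloc.c's fallback */
#include <stdio.h>
__attribute__((weak)) void impl(void) { puts("weak fallback"); }

/* impl_strong.c -- plays the role of mallocng's strong definition */
#include <stdio.h>
void impl(void) { puts("strong implementation"); }

/* main.c */
void impl(void);
int main(void) { impl(); return 0; }

/*
 * cc main.c impl_weak.c               -> prints "weak fallback"
 * cc main.c impl_weak.c impl_strong.c -> prints "strong implementation"
 *
 * A correct linker keeps the strong definition and ignores the weak one;
 * resolving the reference to the weak definition even though a strong one
 * exists is exactly the bug described in the next section.
 */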

I believe the intent of this alias is to let users choose, when building musl, which malloc implementation musl will use internally and by default. This is corroborated by the configure --help prompt:

Optional packages:
  --with-malloc=...       choose malloc implementation [mallocng]

The problem with my linker

I end up finding a weak definition of malloc, which resolves to calling default_malloc. This is actually the right behavior: there is no strong definition of malloc, since I never override it in the user program.

However, __libc_malloc_impl resolves to the weak __simple_malloc, when it should instead resolve to the strong __libc_malloc_impl provided by musl libc's mallocng implementation:

~/Repositories/linker-loader $ nm --defined /home/macoy/Downloads/musl-1.2.3/obj/src/malloc/mallocng/malloc.lo
0000000000000000 t alloc_slot
0000000000000000 r debruijn32.3106
0000000000000000 t enframe
0000000000000000 t get_stride
0000000000000000 T __libc_malloc_impl
0000000000000000 T __malloc_alloc_meta
0000000000000000 T __malloc_allzerop
0000000000000000 T __malloc_atfork
0000000000000000 B __malloc_context
0000000000000004 C __malloc_lock
0000000000000000 R __malloc_size_classes
0000000000000000 r med_cnt_tab
0000000000000000 t queue
0000000000000000 t rdlock
0000000000000000 t size_to_class
0000000000000000 r small_cnt_tab
0000000000000000 t step_seq
0000000000000000 t wrlock

Here are the relevant lines adjacent to each other, for comparison:

# lite_malloc.lo:
0000000000000000 W __libc_malloc_impl

# malloc.lo:
0000000000000000 T __libc_malloc_impl

W denotes a weak symbol definition while T denotes a strong public/global symbol defined in the text section of the object file.

In the ELF specification (PDF), the issue becomes quite clear (emphasis mine):

When the link editor combines several relocatable object files, it does not allow multiple definitions of STB_GLOBAL symbols with the same name. On the other hand, if a defined global symbol exists, the appearance of a weak symbol with the same name will not cause an error. The link editor honors the global definition and ignores the weak ones. Similarly, if a common symbol exists (i.e., a symbol whose st_shndx field holds SHN_COMMON), the appearance of a weak symbol with the same name will not cause an error. The link editor honors the common definition and ignores the weak ones.

I was resolving __libc_malloc_impl to the weak definition in lite_malloc.lo instead of the strong definition in malloc.lo.

Later, when I try to free, I end up referencing the free I find in malloc/free.lo, which just calls __libc_free, and __libc_free is only defined in malloc/mallocng/free.c. If there were a corresponding __simple_free, I never would have realized that I was calling __simple_malloc.

Of course, I had the TODO item to implement this all properly before I began debugging:

*** TODO Symbol resolution needs to be addressed, especially once I
         load programs that override "weak" functions

I had put it off not knowing whether it would become an issue. It isn't a straightforward implementation, which is why I didn't do it immediately.

Now, after not doing it and seeing why it is important, I have gained a better understanding of how it is supposed to work.

Takeaways

If you do not implement the specification to a T[^5], you may end up debugging tricky things like this without any debugger or a good idea of what's going wrong.

The advantage is that if you persist, you can learn new tools for understanding the problem and investigating the data.

If you're interested in other linker adventures, read about my "linker-loader" project, which talks about why I even bother with all this work.

You can also read the much simpler Know What Your Linker Knows article where I explain how objdump can be useful when debugging link errors. Around the time I wrote that article was when I started learning more about linkers, which are something you don't really have to think too hard about during regular program development.

[^1]: It does not meet the assumptions required by GDB's add-symbol-file command, so I couldn't use GDB even by manually adding objects one by one at specified offsets in memory

[^2]: There must be like, a dozen people in the world that would meet that criteria, right? Right?!

[^3]: I wanted to statically link to musl libc because dynamic linking to e.g. glibc seemed much more complicated, especially because I did not have any dynamic linking support yet in my linker/loader.

[^4]: Yes, I am aware I am calling functions which aren't safe to call in signal handlers. In my case I am not expecting to "ship" this debugger, it's only a means to an end, so it ended up being fine.

[^5]: Why hadn't I? Well, I find implementing things piece-by-piece and testing as I go to result in higher success rates than all-or-nothing pushes. Sometimes it bites you when the missing pieces are essential to the next test. Also, sometimes you're not really sure how to make things until you're halfway down the road making it, because it's unique/experimental and/or you don't fully understand the purpose of the specification.

New blog post: Imports and modules
Christoffer Lernö

When talking about packages and modules, I think it's useful to start with Java. As a language similar to C/C++ but with an import/module system from the beginning, it ended up being very influential.

Importing a namespace or a graph

Interestingly, the import statement in Java doesn't actually import anything. It's a simple namespace folding mechanism, allowing you to use something like java.util.Random as just Random. The fact that you can later use a fully qualified name in the source code to implicitly use another package means that the imports do not fully define the dependencies of a Java source file.

In Java, given a collection of source files, all of them must be compiled to determine the actual dependencies. However, we can imagine a different model where the import statements create a dependency graph, starting from the source file that is the main entry point. In this model we may have N source files, but not all of them are even compiled, since only a subset M is reachable through the import graph.

This latter model enables some extra features. For example, we can build a feature where importing a source file also implicitly causes a dynamic or static library to be linked. Because only the source code in the graph is compiled, the extra link parameter is only picked up if the imports reach the source file that carries it.

The disadvantage is that the imports need to have a clear way of finding the additional dependencies. This is typically done with a file hierarchy or strict naming scheme, so that importing foo.bar allows the compiler to easily find the file or files that define that particular module.

Folding the import

For module systems that allow sub modules, so that there's both foo.bar and foo.baz, the problem with verbosity appears: do we really want to type std.io.net.Socket everywhere? I think the general consensus is that this is annoying.

The two common ways to solve this are namespace folding and namespace renaming, but I'm going to present one more which I term namespace shortening.

Namespace folding is the easiest. You import std.io.net and now you can use Socket unqualified. This is how it works in Java. However, we should note that in Java any global or function is actually prefixed with the class name, which means that even when folding the namespace, your globals and "functions" (static methods) end up having a prefix.

To overcome collisions and shortcomings of namespace folding, there's namespace renaming, where the import explicitly renames the module name in the file scope, so std.io.net might become n and you now use n.Socket rather than the fully folded or fully qualified name. The downside is naming this namespace alias. Naming things well is known to be one of the harder things in programming, and it can also add to the confusion if the alias is chosen to be different in different parts of the program, e.g. n.Socket in one file and netio.Socket in another.

A way to address the renaming problem is to recognize that usually only the last namespace element is sufficient to distinguish one function from another, so we can allow an abbreviated namespace, allowing the shortened namespace to be used in place of the full one. With this scheme std.io.net.open_socket(), io.net.open_socket() and net.open_socket() are all valid as long as there is no ambiguity (for example, if an import made foo.net.open_socket() available in the current scope, then net.open_socket() would be ambiguous and a longer path, like io.net.open_socket() would be required). C3 uses this scheme for all globals, functions and macros and it seems successful so far.

Lots of imports

In Java, imports quickly became fairly onerous to write, since a class foo.bar.Baz would use another class foo.bar.Bar, and now both needed to be imported. While wildcard imports helped a bit, they would pull in more classes than necessary, so inspecting the import statements would obscure the actual dependencies.

As a workaround, languages like D added the concept of re-exported imports (D calls this feature "public imports"). In our foo.bar.Baz case, Baz could import foo.bar.Bar and re-export it, so that an import of foo.bar.Baz implicitly imports foo.bar.Bar as well. The downside here, again, is that it's not possible to see the actual dependencies just by looking at the imports.

A related feature is implicit imports determined by the namespace hierarchy. So for example in Java, any source file in the package foo.bar.baz has all the classes of foo.bar implicitly folded into its namespace. This folding goes bottom up, but not the other way around. So while foo.bar.baz.AbcClass sees foo.bar.Baz, Baz can't access foo.bar.baz.AbcClass without an explicit import.

An experiment: no imports

For C3 I wanted to try going completely without imports. This was feasible mainly due to two observations: (1) type names tend to be fairly universally unique, and (2) methods and globals are usually unique with a shortened namespace. So given Foo and foo::some_function(), these should mostly be unique without the need for imports. This is a completely implicit import scheme.

This is complemented by the compiler requiring the programmer to explicitly say which libraries should be used for compilation. So imports could be said to be done globally, for the whole program, in the build settings.

This certainly works, but it has a drawback: let's say a program relies on a library like Raylib. Raylib in itself defines a lot of types and functions, and while it's no problem to resolve them, it could confuse a casual reader ("Oh, a Vector2, is this part of the C3 standard library?"), whereas having an import raylib; at the top would immediately hint to the reader where Vector2 might be found.

Wildcard imports for all?

The problem with zero imports suggests an alternative: wildcard imports as the default, so import raylib; would be the standard form of import and would recursively import everything in raylib, and similarly import std; would pull in the whole standard library. This would be more for the reader of the code, to make the dependencies findable, than something the compiler strictly needs.

One problem with this design is the sub-module visibility rules: what do foo::bar::baz and foo::bar see?

Java would allow foo::bar::baz to see the foo::bar parent module, but not vice versa. However, looking at the actual usage patterns, it seems to make sense to make this bidirectional, so that all are visible to each other.

But if parent and child modules are visible to each other, what about sibling modules? E.g. does foo::bar::baz see foo::bar::abc? In actual use cases there are arguments both for and against. But if we have sibling visibility, what about foo::def and foo::bar::abc? Could they be visible to each other? And if not, would such rules get complicated?

To create a more practical scenario, imagine that we have the following:

  1. std::io::file::open_filename_for_read() a function to open a file for reading
  2. std::io::Path representing a general path.
  3. std::io::OpenMode a distinct type for a mask value for file or resource opening
  4. std::io::readoptions::READ_ONLY a constant of type OpenMode

Let's say this is the implementation of (1)

fn File* open_filename_for_read(char[] filename)
{
  Path* p = io::path_from_string(filename);
  defer io::path_free(p);
  return file::open_file(p, readoptions::READ_ONLY);
}

Here we see that std::io::file must be able to use std::io and std::io::readoptions. The readoptions sub module needs std::io but not the file sub module. Note how C3 uses a function in a sub module as other languages would typically use static methods. If we want to avoid excessive imports in this case, then file would need sibling and parent visibility, whereas the readoptions use only requires parent visibility.

Excessive rules around visibility are hard to implement well, hard to test, and hard to remember, so it might be preferable to simply say that a module has visibility into any other module under the same top-level module. The downside would of course be that visibility is much wider than what's probably desired (e.g. std::math having visibility into std::io).

Conclusions and further research for C3

Like everything in language design, imports and modules involve a lot of trade-offs. Import statements may be used to narrow down the dependency graph, but at the same time a language with a lot of imports doesn't necessarily use them in that manner. For namespace folding it matters a lot whether functions are usually grouped as static methods or free functions. Imports can be used to implicitly determine things like linking arguments, in which case the actual import graph matters.

For C3, the scheme with implicit imports works thanks to library use also being restricted by build scripts, but high-level imports could still improve readability. However, such a scheme would probably need recursive imports, which raises the question of implicit imports between sub modules. For C3 in particular this is an important usability concern, as sub modules are used to organize functions and constants more than is common in many other languages. This is the area I'm currently researching, but I hope that within a few weeks I can have a design candidate.

Community Showcase

This is a selection of recent work done by community members. Want to participate? Join us on Discord.