This post is about an important data structure, embedded linked list, which is used massively in the pintos project amd offers a different idea how to create dynamic list in C. But its design usually made it easy to confuse the people at first, so it’s very interesting to look into its implementation.

Pintos is a simple Linux OS for the course CS140. It’s designed to help understand some important functions of OS, including threads, virtual memory and file system. Though it’s very simple compared to the real linux kernel, it still can run an real x86 simulator.

Some Background

Since Standard C doesn’t provide dynamic list library, it’s up to developers to decide how to implement the list when they need to use it. A very simple list, linked list, would be like:

struct node 
  {
    struct foo *data_ptr; // save pointer to the data
    struct node *next;    // save pointer to the next node in the list
  };

However, the biggest drawback of this implementation is that, we cannot reuse this list implementation and its library functions for other struct. If we want to create a reusable list library, we may have to introduce generics. For example, in C++, we can use #include <list> to import a list from STL library for any data types. The list node in the list is defined as below, which a double linked list using generics to allow it any kind of data.

// stl_list.h
struct _List_node_base {
    _List_node_base* _M_next;
    _List_node_base* _M_prev;
    // ...
};

/// An actual node in the list.
template<typename _Tp>
struct _List_node : public __detail::_List_node_base
{
    // ...
    _Tp _M_data;
    // ...
};

But the problem is there is no generics in standard C. So instead of putting data into the list node, C developers come up with the idea of putting list node into the data struct.

Embedded Linked List

Below is the core definitions of double linked list in the Pintos kernel. Some library functions are omitted.

// src/lib/kernel/list.h
struct list_elem 
  {
    struct list_elem *prev;     /* Previous list element. */
    struct list_elem *next;     /* Next list element. */
  };

/* List. */
struct list 
  {
    struct list_elem head;      /* List head. */
    struct list_elem tail;      /* List tail. */
  };

/* Converts pointer to list element LIST_ELEM into a pointer to
   the structure that LIST_ELEM is embedded inside. */ 
#define list_entry(LIST_ELEM, STRUCT, MEMBER)           \
        ((STRUCT *) ((uint8_t *) &(LIST_ELEM)->next     \
                     - offsetof (STRUCT, MEMBER.next)))

void list_init (struct list *);

/* List traversal. */
struct list_elem *list_begin (struct list *);
struct list_elem *list_next (struct list_elem *);
struct list_elem *list_end (struct list *);

Any data structures would potentially inserted into the list will need include a list_elem in its struct definition. For example, in this struct foo, we may want to put it into a list, then we need to add one struct list_elem into the struct foo.

struct foo
  {
    int data;
    struct list_elem elem;
  };

After we create a new foo struct using malloc(), we then can append it into the list without needing create a new node using malloc() again.

static struct list g_list;

int
main ()
{
  list_init (&g_list);
  struct foo *e;

  // Create 10 nodes
  for (size_t i = 0; i < 10; i++)
    {
      e = malloc (sizeof (struct foo));
      e->data = i;
      list_push_back (&g_list, &e->elem);
    }

  struct list_elem *k;

  for (k = list_begin (&g_list); k != list_end (&g_list); k = list_next (k))
    {
      struct foo *f = list_entry (k, struct foo, elem);
      offsetof (struct foo, elem);
      printf ("%d\n", f->data);
    }

  return 0;
}

So from the code above, we can see that, this design makes it possible that, with only extra two pointers in each data struct, we can create generic list structure like c++.

The list can be also illustrated as

db_list

where each struct list_elem elem inside has the addresses of previous and next struct list_elem elem. Thus, how it implements the traverse can basically be the same as any ordinary linked list, reading list_elem->next for the memory address of the next node.

So far, we just traverse using struct list_elem, but how can we obtain the real data’s pointer (e.g., struct foo)? The marco list_entry() helps us to converts from the pointer to list_elem to the pointer to the actual data struct.

// src/lib/kernel/list.h
/* Converts pointer to list element LIST_ELEM into a pointer to
   the structure that LIST_ELEM is embedded inside. */ 
#define list_entry(LIST_ELEM, STRUCT, MEMBER)           \
        ((STRUCT *) ((uint8_t *) &(LIST_ELEM)->next     \
                     - offsetof (STRUCT, MEMBER.next)))

// src/lib/stddef.h
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *) 0)->MEMBER)

The marco does the work in a quiet hack way. Since we have the pointer to the struct list_elem and the layout of the data struct is fixed after compiling. Then, we could calculate the memory address offset between STRUCT and the MEMBER.next. Finally, move the pointer by decreasing the length of offset and we have the pointer to the address of data struct.

ptrs

In terms of how we calculate the offset, as defined in another marco offsetof(), we can create a dummy data struct starting at memory address of 0 and try to obtain the pointer to the member of the data struct, then the address of that member would be offset we want. And since we don’t dereference the pointer, most compilers won’t complain even though 0 is an invalid address.

The reason why list_entry() uses list_elem.next instead of list_elem itself, I believe it’s that if we accidentally pass a member which is not type of list_elem then we may get a wrong offset. Using list_elem.next would allow the compiler to prevent the problem. But certainly, if that member struct also has a member named next, the complier won’t help.

In the Real Linux Kernel

I also went into the code of the real linux kernel, and found a similar implementation in the linux/list.h and linux/types.h

// include/linux/types.h
struct list_head {
	struct list_head *next, *prev;
};

// include/linux/list.h
/**
 * list_entry - get the struct for this entry
 * @ptr:	the &struct list_head pointer.
 * @type:	the type of the struct this is embedded in.
 * @member:	the name of the list_head within the struct.
 */
#define list_entry(ptr, type, member) \
	container_of(ptr, type, member)
// include/linux/kernel.h
/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:	the pointer to the member.
 * @type:	the type of the container struct this is embedded in.
 * @member:	the name of the member within the struct.
 *
 */
#define container_of(ptr, type, member)                                       \
  ({                                                                          \
    void *__mptr = (void *)(ptr);                                             \
    BUILD_BUG_ON_MSG (!__same_type (*(ptr), ((type *)0)->member)              \
                          && !__same_type (*(ptr), void),                     \
                      "pointer type mismatch in container_of()");             \
    ((type *)(__mptr - offsetof (type, member)));                             \
  })

From the source code of linux kernel, we can see that container_of() is very close to list_entry() in pintos but has some extra data type checking included.

References

  1. Pintos Project
  2. Linux Kernel Mirror