Manages storage for variable sized data objects

It is designed especially for embedded devices.

BR This page describes the problem it tries to solve and its design details. BR BR Problem StatementBR Normally when you allocate arbitrary sized objects using kmalloc()/vmalloc() there is big space wastage due to internal fragmentation. So, if memory is at premium, tight storage is required for these variables sized data items which is what VStore does. This however comes at cost of some speed. BR BR How to use it?BR To store data:

vstore_write(objectID /* out */, data_to_store, len)

To restore this data:

vstore_read(objectID, buffer, len /* out */, unsigned long alloc_flags)

NOTE: In vstore_read(), make sure that buffer is big enough to store data assoc. with object 'ObjectID' - the 'len' (out) param will tell how much data was read into the buffer. Thus, to make sure that buffer size is always sufficient, either you know maximum size for objects stored (more common case) or you will have to track size of each object stored to provide buffer of right size. BR BR

Design Details

Design Goals

Minimize fragmentation
Minimize metadata overhead
Fast store and restore
Provide dense storage - minimize spreading of chunks associated with any particular data object into multiple pages. This improves locality and hence performance.
Portability across different archs

BR Each data item stored is split into multiple chunks (small contiguous physical space) and these chunks can then be spread across multiple physical pages. The Metadata associated with each chunk is stored in the beginning of chunk itself with necessary padding added to maintain alignment constraints.

Chunks can be one of three types:

Free chunk
Busy chunk with next chunk in same page
Busy chunk with next chunk in different page

Following gives metadata layout for each of these types of chunks:

Some jargon first:

M     = Metadata - size has to be n * ALIGN_BYTES
M(A)  = address of next chunk relative to page start (PAGE_SHIFT - ALIGN_SHIFT bits)
M(S)  = size of chunk (same as M(A))
F     = flags (2 bits)
        F(1): last chunk
        F(2): next chunk in diff page
FP    = prev chunk's flags
FN    = next chunk's flags
PFN   = PageFrameNo where next chunk exists (BITS_PER_LONG - PAGE_SHIFT bits)
OD(i) = i-th optional metadata field. Optional fields are enclosed on []

- Free chunk

M(A) | F | M(S) | free space

- Busy chunk type-1 (next chunk in same page)

M(A) | F | M(S) | [ M(S) of prev chunk ]
        - OD(1) exists if F(2) is not set

- Busy chunk type-2 (next chunk in different page)

M(A) | F | PFN | [ M(S) of prev chunk ] | [ M(S) of this chunk ]
        - only OD(1) exists if          FP(2) && !F(1)
        - only OD(2) exists if          F(1) && !FP(2)
        - both OD(1) and OD(2) exist if FP(2) && F(1)

NOTE:

Space is not reserved for optional fields that do not exist. Flags let us know which fields really exist.
Each type of chunk has required padding at end of metadata to satisfy alignment constraint. Metadata size at beginning of each chunk is always some multiple of ALIGN_BYTES. So, access to actual data is always aligned.

Chunk management

A Chunk never crosses page boundary.
Adjacent free chunks are always merged.
Chunks belonging to a data item are singly linked (using metadata in beginning of chunk).
Free lists are per page. Free chunks are (singly) links in order of increasing physical address (this eases merging of free chunks and is more hardware caches friendly).
Pages are then placed in free lists according to total free space they have. Pages with maximum free space are selected first for allocation. This gives a dense storage (i.e. we avoid chunks belonging to same data item being spread across many pages).

                          Freelists
                      +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
                 Order|    4      |     5     |     6     |   --- --- --- ---     |     10    |    11     |    12     |
                      +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
                            ^                       ^                                                ^
                            | page->lru             |                                                |
                            V                       V                                                V
                      +----------+  offset -->+-----------+                                 pages with [2^11, 2^12)
                      |    1     |            |  | F      |                                       bytes free
                      |  | M(A)  |            +--|--------+
    page->offset ---> +--|-------+      ----->|  | 2   |  |
                      |  | F  |  +      |     +--|-----|--+
                      +--|----|--+      |     |  | 3   |  |
                      |  | 2  |  |------|     +--|-----|--+
                      +--|----|--+ PFN + M(A) |  |     |  |
                      |  | F  V  |            |  V F   |  |
                      +--|-------+            +--------|--+
                      |  V 1     |            |    2   V  |
                      +----------+            +-----------+
                           ^
                           | page->lru
                           V
                        more pages

Implementation

Now, we can represent these three chunk types as:

#define ALIGN_SHIFT    2    /* power of 2 (min 2)
#define MA_SIZE        PAGE_SHIFT - ALIGN_SHIFT
#define MS_SIZE        MA_SIZE

- Representing Free chunk

struct free_chunk {
        unsigned long MA: MA_SIZE;
        unsigned long flags: 2;
        unsigned long MS: MS_SIZE;
} __attribute__ ((packed));

- Representing Busy chunk type-1 (next chunk in same page)

struct busy_type1_chunk {
        unsigned long MA: MA_SIZE;
        unsigned long flags: 2;
        unsigned long MS1: MS_SIZE;
        unsigned long MS2: MS_SIZE;
} __attribute__ ((packed));

- Representing Busy chunk type-2 (next chunk in different page)

struct busy_type2_chunk {
        unsigned long MA: MA_SIZE;
        unsigned long flags: 2;
        unsigned long PFN: BITS_PER_LONG - PAGE_SHIFT;
        unsigned long MS1: MS_SIZE;
        unsigned long MS2: MS_SIZE;
} __attribute__ ((packed));

VStore operations

vstore_read()

/*
 * Args (IN):
 *      obj_id: handle of the required object (as returned by vstore_write)
 *      data: buffer space to read object data into
 * Args (OUT)
 *      len: length of data read into buffer
 *
 * Return:
 *      0 on success, <0 on error (should NEVER occur)
 *      Errors:
 *              -ENOMEM
 *
 * NOTE: Make sure that buffer 'data' has enough space for object 'obj_id'.
 */
int vstore_read(unsigned long obj_id, void *data, unsigned int *len);

vstore_write()

/*
 * Args (IN):
 *      data: pointer to data to be copied to vstore
 *      len: data length (in bytes)
 *      alloc_flags: allocation flags used when we need to allocate more chunks to store this data (e.g. GFP_KERNEL)
 * Args (OUT):
 *      obj_id: handle to object stored
 *
 * Return:
 *      0 on success, <0 on error
 *      Errors:
 *              -ENOMEM
 */
int vstore_write(unsigned long *obj_id, void *data, unsigned int *len, unsigned long alloc_flags);

vstore_readBR Time: O(num_chunks)

Sub ops:

Collect data from all chunks into given buffer - O(num_chunks)
Free these chunks and merge them with adjacent free chunks - O(num_chunks)

vstore_writeBR Time: O(num_chunks) (if enough free chunks are already in freelists)

Sub ops:

Allocate required no. of chunks from freelists - O(num_chunks)
If sufficient free chunks are not available, allocate additional chunks (allocate 0-order page and add as single chunk in highest order freelist) - might sleep depending on alloc_flags.
Copy data to these chunks - O(num_chunks)