Size: 4752
Comment:
|
Size: 8350
Comment:
|
Deletions are marked like this. | Additions are marked like this. |
Line 24: | Line 24: |
vstore_read(objectID, buffer, len /* out */) | vstore_read(objectID, buffer, len /* out */, unsigned long alloc_flags) |
Line 85: | Line 85: |
'''Chunk management''' * A Chunk never crosses page boundary. * Adjacent free chunks are always merged. * Chunks belonging to a data item are singly linked (using metadata in beginning of chunk). * Free lists are per page. Free chunks are (singly) links in order of increasing physical address (this eases merging of free chunks and is more hardware caches friendly). * Pages are then placed in free lists according to total free space they have. Pages with maximum free space are selected first for allocation. This gives a dense storage (i.e. we avoid chunks belonging to same data item being spread across many pages). {{{ Freelists +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ Order| 4 | 5 | 6 | --- --- --- --- | 10 | 11 | 12 | +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ ^ ^ ^ | page->lru | | V V V +----------+ offset -->+-----------+ pages with [2^11, 2^12) | 1 | | | F | bytes free | | M(A) | +--|--------+ page->offset ---> +--|-------+ ----->| | 2 | | | | F | + | +--|-----|--+ +--|----|--+ | | | 3 | | | | 2 | |------| +--|-----|--+ +--|----|--+ PFN + M(A) | | | | | | F V | | V F | | +--|-------+ +--------|--+ | V 1 | | 2 V | +----------+ +-----------+ ^ | page->lru V more pages }}} ---- === Implementation === |
|
Line 124: | Line 161: |
'''Chunk management''' * A Chunk never crosses page boundary. * Adjacent free chunks are always merged. * Free lists are per page. * Pages are then placed in free lists according to total free space they have. Pages with maximum free space are selected first for allocation. This gives a dense storage (i.e. we avoid chunks belonging to same data item being spread across many pages). |
'''VStore operations''' * vstore_read() {{{ /* * Args (IN): * obj_id: handle of the required object (as returned by vstore_write) * data: buffer space to read object data into * Args (OUT) * len: length of data read into buffer * * Return: * 0 on success, <0 on error (should NEVER occur) * Errors: * -ENOMEM * * NOTE: Make sure that buffer 'data' has enough space for object 'obj_id'. */ int vstore_read(unsigned long obj_id, void *data, unsigned int *len); }}} * vstore_write() {{{ /* * Args (IN): * data: pointer to data to be copied to vstore * len: data length (in bytes) * alloc_flags: allocation flags used when we need to allocate more chunks to store this data (e.g. GFP_KERNEL) * Args (OUT): * obj_id: handle to object stored * * Return: * 0 on success, <0 on error * Errors: * -ENOMEM */ int vstore_write(unsigned long *obj_id, void *data, unsigned int *len, unsigned long alloc_flags); }}} '''vstore_read'''[[BR]] Time: O(num_chunks) Sub ops: 1. Collect data from all chunks into given buffer - O(num_chunks) 2. Free these chunks and merge them with adjacent free chunks - O(num_chunks) '''vstore_write'''[[BR]] Time: O(num_chunks) (if enough free chunks are already in freelists) Sub ops: 1. Allocate required no. of chunks from freelists - O(num_chunks) 2. If sufficient free chunks are not available, allocate additional chunks (allocate 0-order page and add as single chunk in highest order freelist) - might sleep depending on alloc_flags. 3. Copy data to these chunks - O(num_chunks) |
[:NitinGupta:Nitin Gupta]
MailTo(nitingupta910 AT gmail DOT com)
Manages storage for variable sized data objects
It is designed especially for embedded devices.
BR This page describes the problem it tries to solve and its design details. BR BR Problem StatementBR Normally when you allocate arbitrary sized objects using kmalloc()/vmalloc() there is big space wastage due to internal fragmentation. So, if memory is at premium, tight storage is required for these variables sized data items which is what VStore does. This however comes at cost of some speed. BRBR How to use it?BR To store data:
vstore_write(objectID /* out */, data_to_store, len)
To restore this data:
vstore_read(objectID, buffer, len /* out */, unsigned long alloc_flags)
NOTE: In vstore_read(), make sure that buffer is big enough to store data assoc. with object 'ObjectID' - the 'len' (out) param will tell how much data was read into the buffer. Thus, to make sure that buffer size is always sufficient, either you know maximum size for objects stored (more common case) or you will have to track size of each object stored to provide buffer of right size. BRBR
Design Details
Design Goals
- Minimize fragmentation
- Minimize metadata overhead
- Fast store and restore
- Provide dense storage - minimize spreading of chunks associated with any particular data object into multiple pages. This improves locality and hence performance.
- Portability across different archs
BR Each data item stored is split into multiple chunks (small contiguous physical space) and these chunks can then be spread across multiple physical pages. The Metadata associated with each chunk is stored in the beginning of chunk itself with necessary padding added to maintain alignment constraints.
Chunks can be one of three types:
- Free chunk
- Busy chunk with next chunk in same page
- Busy chunk with next chunk in different page
Following gives metadata layout for each of these types of chunks:
Some jargon first:
M = Metadata - size has to be n * ALIGN_BYTES M(A) = address of next chunk relative to page start (PAGE_SHIFT - ALIGN_SHIFT bits) M(S) = size of chunk (same as M(A)) F = flags (2 bits) F(1): last chunk F(2): next chunk in diff page FP = prev chunk's flags FN = next chunk's flags PFN = PageFrameNo where next chunk exists (BITS_PER_LONG - PAGE_SHIFT bits) OD(i) = i-th optional metadata field. Optional fields are enclosed on []
- Free chunk
M(A) | F | M(S) | free space
- Busy chunk type-1 (next chunk in same page)
M(A) | F | M(S) | [ M(S) of prev chunk ] - OD(1) exists if F(2) is not set
- Busy chunk type-2 (next chunk in different page)
M(A) | F | PFN | [ M(S) of prev chunk ] | [ M(S) of this chunk ] - only OD(1) exists if FP(2) && !F(1) - only OD(2) exists if F(1) && !FP(2) - both OD(1) and OD(2) exist if FP(2) && F(1)
NOTE:
- Space is not reserved for optional fields that do not exist. Flags let us know which fields really exist.
- Each type of chunk has required padding at end of metadata to satisfy alignment constraint. Metadata size at beginning of each chunk is always some multiple of ALIGN_BYTES. So, access to actual data is always aligned.
Chunk management
- A Chunk never crosses page boundary.
- Adjacent free chunks are always merged.
- Chunks belonging to a data item are singly linked (using metadata in beginning of chunk).
- Free lists are per page. Free chunks are (singly) links in order of increasing physical address (this eases merging of free chunks and is more hardware caches friendly).
- Pages are then placed in free lists according to total free space they have. Pages with maximum free space are selected first for allocation. This gives a dense storage (i.e. we avoid chunks belonging to same data item being spread across many pages).
Freelists +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ Order| 4 | 5 | 6 | --- --- --- --- | 10 | 11 | 12 | +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+ ^ ^ ^ | page->lru | | V V V +----------+ offset -->+-----------+ pages with [2^11, 2^12) | 1 | | | F | bytes free | | M(A) | +--|--------+ page->offset ---> +--|-------+ ----->| | 2 | | | | F | + | +--|-----|--+ +--|----|--+ | | | 3 | | | | 2 | |------| +--|-----|--+ +--|----|--+ PFN + M(A) | | | | | | F V | | V F | | +--|-------+ +--------|--+ | V 1 | | 2 V | +----------+ +-----------+ ^ | page->lru V more pages
Implementation
Now, we can represent these three chunk types as:
#define ALIGN_SHIFT 2 /* power of 2 (min 2) #define MA_SIZE PAGE_SHIFT - ALIGN_SHIFT #define MS_SIZE MA_SIZE
- Representing Free chunk
struct free_chunk { unsigned long MA: MA_SIZE; unsigned long flags: 2; unsigned long MS: MS_SIZE; } __attribute__ ((packed));
- Representing Busy chunk type-1 (next chunk in same page)
struct busy_type1_chunk { unsigned long MA: MA_SIZE; unsigned long flags: 2; unsigned long MS1: MS_SIZE; unsigned long MS2: MS_SIZE; } __attribute__ ((packed));
- Representing Busy chunk type-2 (next chunk in different page)
struct busy_type2_chunk { unsigned long MA: MA_SIZE; unsigned long flags: 2; unsigned long PFN: BITS_PER_LONG - PAGE_SHIFT; unsigned long MS1: MS_SIZE; unsigned long MS2: MS_SIZE; } __attribute__ ((packed));
VStore operations
- vstore_read()
/* * Args (IN): * obj_id: handle of the required object (as returned by vstore_write) * data: buffer space to read object data into * Args (OUT) * len: length of data read into buffer * * Return: * 0 on success, <0 on error (should NEVER occur) * Errors: * -ENOMEM * * NOTE: Make sure that buffer 'data' has enough space for object 'obj_id'. */ int vstore_read(unsigned long obj_id, void *data, unsigned int *len);
- vstore_write()
/* * Args (IN): * data: pointer to data to be copied to vstore * len: data length (in bytes) * alloc_flags: allocation flags used when we need to allocate more chunks to store this data (e.g. GFP_KERNEL) * Args (OUT): * obj_id: handle to object stored * * Return: * 0 on success, <0 on error * Errors: * -ENOMEM */ int vstore_write(unsigned long *obj_id, void *data, unsigned int *len, unsigned long alloc_flags);
vstore_readBR Time: O(num_chunks)
Sub ops:
- Collect data from all chunks into given buffer - O(num_chunks)
- Free these chunks and merge them with adjacent free chunks - O(num_chunks)
vstore_writeBR Time: O(num_chunks) (if enough free chunks are already in freelists)
Sub ops:
- Allocate required no. of chunks from freelists - O(num_chunks)
- If sufficient free chunks are not available, allocate additional chunks (allocate 0-order page and add as single chunk in highest order freelist) - might sleep depending on alloc_flags.
- Copy data to these chunks - O(num_chunks)