KDDM: A Generic Basis For Distributed Kernel Infrastructure
Renaud Lottiaux (Kerlabs), Erich Focht (NEC)
June 28th - OLS 2007 - www.kerlabs.com
Goal of this BoF
- Introduce the KDDM concept
- Measure the interest in such a subsystem
- Identify who could be interested in using it
- Get an idea of how far we are from integration into the mainline
Linux and Clustering
- The Linux community has been quite skeptical about clustering
- However, several cluster projects are already included, or close to being included, in the mainline:
  - Transparent Inter-Process Communication (TIPC)
  - Distributed Lock Manager (DLM)
  - GFS
  - Oracle Cluster File System 2 (OCFS2)
  - Distributed IPC (DIPC)
  - ...
- This is just the beginning!
- A good time to set up the basis of a kernel-level distributed infrastructure: KDDM
Definition of KDDM
- Distributed cache of objects
- Generic mechanism to share objects between nodes
- Ensures access to data that is:
  - Transparent
  - Efficient
  - Coherent!
- Objects are accessed through a set of functions
  - No need to care about where the data is located
  - Just use the data!
- KDDM can host any kind of data:
  - Memory pages
  - Data structures
Object identifier
- Objects are identified using 3 values:
  - Object id
  - Set id
  - Name space id
- A set is a collection of objects
  - You can freely define your sets
  - e.g. pages from the same System V IPC segment, ...
- A name space is a collection of sets
  - You can freely define your name spaces
  - e.g. regular Linux name spaces, ...
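As an illustration only, the three-part identifier can be modelled as a small key structure. The names below (`struct kddm_obj_key`, `kddm_key_equal`, `kddm_key_hash` and the 64-bit id types) are hypothetical and not part of the KDDM API; the hash is an arbitrary FNV-inspired mixing of the three words, chosen just for the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical id types: widths are an assumption of this sketch. */
typedef uint64_t ns_id_t;
typedef uint64_t set_id_t;
typedef uint64_t obj_id_t;

/* Hypothetical key: an object is uniquely identified by the
 * (name space id, set id, object id) triple. */
struct kddm_obj_key {
    ns_id_t  ns_id;   /* name space: a collection of sets     */
    set_id_t set_id;  /* set: a collection of objects         */
    obj_id_t obj_id;  /* object: unique only within its set   */
};

/* Two keys denote the same object only if all three fields match. */
bool kddm_key_equal(const struct kddm_obj_key *a,
                    const struct kddm_obj_key *b)
{
    return a->ns_id == b->ns_id &&
           a->set_id == b->set_id &&
           a->obj_id == b->obj_id;
}

/* Fold the three fields into one value, e.g. for a hash table bucket
 * (FNV-style xor/multiply, applied word by word). */
uint64_t kddm_key_hash(const struct kddm_obj_key *k)
{
    const uint64_t words[3] = { k->ns_id, k->set_id, k->obj_id };
    uint64_t h = 1469598103934665603ULL;   /* FNV offset basis */

    for (int i = 0; i < 3; i++) {
        h ^= words[i];
        h *= 1099511628211ULL;            /* FNV prime */
    }
    return h;
}
```

The same object id used in two different sets (for example page 0 of two System V segments) yields two distinct keys, which is why the object id alone is not enough.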
Object coherence
- R/W-semaphore-like access to objects
  - Single writer / multiple readers
- Objects are transparently moved / duplicated between nodes
  - Duplication means a coherence problem
  - Coherence is managed using an “invalidation on write” mechanism
- Lighter coherence mechanisms will be implemented:
  - Update on time-out
  - Hardware-assisted data sharing (specific network needed)
  - ...
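The effect of invalidation on write can be shown with a toy, single-process model in which each object keeps a bitmask of the nodes holding a copy. Everything here (`struct toy_object`, `toy_read`, `toy_write`, `toy_copies`, `MAX_NODES`) is hypothetical: it ignores messages, ownership transfer and concurrency, and only tracks which copies survive each access.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_NODES 32   /* assumption: the copy set fits in a 32-bit mask */

/* Hypothetical toy object: one value plus the set of nodes holding a copy. */
struct toy_object {
    int      value;
    uint32_t copyset;   /* bit n set => node n holds a valid copy */
};

/* Read on `node`: the node gains a copy; other readers keep theirs. */
int toy_read(struct toy_object *obj, unsigned node)
{
    assert(node < MAX_NODES);
    obj->copyset |= UINT32_C(1) << node;
    return obj->value;
}

/* Write on `node`: every other copy is invalidated first, so the
 * writer ends up with the only (hence coherent) copy. */
void toy_write(struct toy_object *obj, unsigned node, int value)
{
    assert(node < MAX_NODES);
    obj->copyset = UINT32_C(1) << node;
    obj->value = value;
}

/* Number of nodes currently holding a copy. */
unsigned toy_copies(const struct toy_object *obj)
{
    unsigned n = 0;

    for (uint32_t m = obj->copyset; m != 0; m &= m - 1)
        n++;
    return n;
}
```

Reads on several nodes multiply the copies; a single write collapses them back to one.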
Basic KDDM interface
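/* Get read access to an object (a local copy is fetched if needed) */
void * _kddm_get_object ( struct kddm_set *kddm_set, objid_t objid );
/* Get exclusive write access to an object (other copies are invalidated) */
void * _kddm_grab_object ( struct kddm_set *kddm_set, objid_t objid );
/* Release an object obtained with get or grab */
void * _kddm_put_object ( struct kddm_set *kddm_set, objid_t objid );
/* Remove an object from the KDDM set */
void * _kddm_remove_object ( struct kddm_set *kddm_set, objid_t objid );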
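The get/grab/put discipline behaves like a read/write semaphore around each object. The single-threaded mock below (all names hypothetical, prefixed `mock_`) shows only that locking discipline: `mock_get` takes a shared hold, `mock_grab` an exclusive hold, and `mock_put` releases whichever hold was taken. Where the real calls would wait, the mock returns NULL instead, and no data ever leaves the local process.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mock object: data plus a reader count and a writer flag. */
struct mock_obj {
    char data[64];
    int  readers;   /* holds taken with mock_get   */
    int  writer;    /* 1 while held with mock_grab */
};

/* Shared (read) access; refused while a writer holds the object. */
const char *mock_get(struct mock_obj *o)
{
    if (o->writer)
        return NULL;          /* the real call would wait here */
    o->readers++;
    return o->data;
}

/* Exclusive (write) access; refused while anyone else holds the object. */
char *mock_grab(struct mock_obj *o)
{
    if (o->writer || o->readers > 0)
        return NULL;          /* the real call would wait here */
    o->writer = 1;
    return o->data;
}

/* Release the hold taken by the matching get or grab. */
void mock_put(struct mock_obj *o)
{
    if (o->writer) {
        o->writer = 0;
    } else {
        assert(o->readers > 0);   /* put without a matching get */
        o->readers--;
    }
}
```

Object removal is deliberately left out of the mock.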
IO Linkers
- KDDM sets are instantiated by IO linkers
- IO linkers determine the nature of hosted objects
  - They define object input/output functions
  - One kind of IO linker per kind of object to share: memory pages, file cache pages, inodes, ...
- IO linkers define links between objects and physical nodes
  - File pages are linked to the node hosting the file data
  - Process memory pages are not linked to a given node
IO Linker functions
- IO linkers are mainly a set of function pointers
- They define the behavior of the KDDM set:
  - Object allocation / free
  - First touch
  - Object invalidation
  - Object export / import
  - Object synchronization
  - etc.
- Default functions are provided for kmalloc-based objects
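A minimal user-space sketch of the "function pointers with defaults" idea, assuming a reduced set of two callbacks: a linker fills in only the hooks it needs, and the core falls back to default heap-based functions for the rest. `struct toy_linker`, `toy_alloc`, `toy_remove` and the `calloc`/`free` defaults are hypothetical stand-ins for the kmalloc-based defaults, not the real KDDM code. The `hw_linker` of the Hello World example, which only sets a name and an id, would presumably rely on such defaults.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduced linker: two hooks plus the object size. */
struct toy_linker {
    void *(*alloc_object)(size_t size);
    void  (*remove_object)(void *object);
    size_t object_size;
};

/* Default hooks: plain heap objects, zero-filled on allocation. */
static void *default_alloc(size_t size) { return calloc(1, size); }
static void  default_remove(void *object) { free(object); }

/* The core always calls through the linker, falling back to the
 * defaults for any hook the linker left NULL. */
void *toy_alloc(const struct toy_linker *l)
{
    return (l->alloc_object ? l->alloc_object : default_alloc)(l->object_size);
}

void toy_remove(const struct toy_linker *l, void *object)
{
    (l->remove_object ? l->remove_object : default_remove)(object);
}
```

A linker for memory pages would instead supply its own page-based hooks, as in the SHM example in the backup slides.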
The IO Linker Structure
struct iolinker_struct {
    int (*instantiate) (struct kddm_set *set, void *private_data, int master);
    void (*uninstantiate) (struct kddm_set *set, int destroy);
    int (*first_touch) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*remove_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*invalidate_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*flush_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*insert_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*put_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*sync_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    void (*change_access) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid, dsm_state_t state);
    void *(*alloc_object) (struct kddm_obj *entry, struct kddm_set *set, objid_t objid);
    int (*import_object) (struct kddm_obj *entry, struct rpc_desc *desc);
    int (*export_object) (struct rpc_desc *desc, struct kddm_obj *obj_entry);
    void (*freeze_object) (struct kddm_obj *obj_entry);
    void (*warm_object) (struct kddm_obj *obj_entry);
    int (*is_frozen) (struct kddm_obj *obj_entry);
    char linker_name[16];
    iolinker_id_t linker_id;
};
KDDM Architecture
[Figure: KDDM architecture, layers from top to bottom]
Distributed service | Distributed service
KDDM Core
I/O Linker | I/O Linker
Local resource manager | Local resource manager
Outline
- General overview
- Hello world with KDDM!
- Quick KDDM architecture overview
- System V shared memory example
KDDM “Hello World !” (1/3)
struct iolinker_struct hw_linker = {
    .linker_name = "hw",
    .linker_id = 1,
};

struct kddm_set *hw_set;

void hello_world_setup (void)
{
    register_io_linker (1, &hw_linker);

    hw_set = create_new_kddm_set (kddm_def_ns,          /* Default name space */
                                  1,                    /* IO linker id */
                                  KDDM_SET_NOT_LINKED,
                                  64,                   /* Size of objects to share */
                                  NULL, 0, 0);
}
KDDM “Hello World !” (2/3)
void hello_world_node0 (void)
{
    char *buf_en, *buf_fr;

    buf_en = kddm_grab_object (hw_set, 0);
    strcpy (buf_en, "Hello ");
    kddm_put_object (hw_set, 0);

    buf_fr = kddm_grab_object (hw_set, 1);
    strcpy (buf_fr, "Bonjour ");
    kddm_put_object (hw_set, 1);
}

void hello_world_node1 (void)
{
    char *buf_en, *buf_fr;

    buf_en = kddm_grab_object (hw_set, 0);
    strcpy (&buf_en[6], "world !");    /* append after "Hello " (6 chars) */
    kddm_put_object (hw_set, 0);

    buf_fr = kddm_grab_object (hw_set, 1);
    strcpy (&buf_fr[8], "monde !");    /* append after "Bonjour " (8 chars) */
    kddm_put_object (hw_set, 1);
}
KDDM “Hello World !” (3/3)
Node 0:
    hello_world_setup ();
    hello_world_node0 ();

Node 1:
    hello_world_setup ();
    hello_world_node1 ();

Then:
    char *buf;

    buf = kddm_get_object (hw_set, 0);
    printk ("%s\n", buf);
    kddm_put_object (hw_set, 0);

    buf = kddm_get_object (hw_set, 1);
    printk ("%s\n", buf);
    kddm_put_object (hw_set, 1);

    kddm_remove_object (hw_set, 0);
    kddm_remove_object (hw_set, 1);

Output:
    Hello world !
    Bonjour monde !
How does that work?
[Figure: two nodes, each with a KDDM set on top of an I/O linker and local memory; animation steps:]
1. kddm_grab_object (hw_set, 0) on node 0: object 0 is placed in node 0's memory and filled with "Hello ".
2. kddm_grab_object (hw_set, 0) on node 1: the object ("Hello ") is copied from node 0 to node 1.
3. Node 1 writes "world !": a single copy, "Hello world !", remains, on node 1.
4. kddm_get_object (hw_set, 0): a read copy is sent back, so both nodes hold "Hello world !".
KDDM Design
[Figure: KDDM design. Components shown: Interface; KDDM Core, Object server, NS manager, Set manager, Object manager, Protocol, IO Linker; KDDM Communication interface; HotPlug; Comm Layer; TIPC]
Building Distributed SHM with KDDM
- Based on KDDM, building a distributed SHM mechanism is quite simple
- We need to share:
  - Segment content
    - One SHM memory data IO linker
    - A set of KDDM sets instantiated with this linker, one per memory segment
  - SHM ids
    - One SHM ids IO linker
    - A unique KDDM set hosting the existing ids cluster-wide
Distributed SHM implementation
- On a new segment creation
  - Hook in the kernel newseg function
  - Create a new KDDM set for the segment data
  - Create a new entry for the segment in the ids KDDM set
  - Link the SHM id to the id of the data KDDM set
- On segment removal
  - Hook in the kernel do_shm_rmid function
  - Destroy the data KDDM set
  - Remove the entry from the ids KDDM set
- On segment mapping
  - Hook in the kernel shm_mmap function
  - Set the vm_ops field to our own set of functions
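The bookkeeping done by these hooks can be sketched as a table from SHM id to data KDDM set id. In the design above this table is itself a KDDM set (the ids set), so every node sees the same entries; here it is a plain local array, and all names (`shm_ids_create`, `shm_ids_lookup`, `shm_ids_remove`, `MAX_SEGMENTS`, the id allocation policy) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_SEGMENTS 16   /* assumption: fixed-size table for the sketch */

/* Hypothetical ids entry: SHM id -> id of the KDDM set holding its pages. */
struct shm_ids_entry {
    bool     used;
    unsigned data_set_id;
};

static struct shm_ids_entry shm_ids[MAX_SEGMENTS];
static unsigned next_set_id = 100;   /* arbitrary first data set id */

/* newseg hook: allocate an SHM id and link it to a fresh data set id.
 * Returns the SHM id, or -1 if the table is full. */
int shm_ids_create(void)
{
    for (int id = 0; id < MAX_SEGMENTS; id++) {
        if (!shm_ids[id].used) {
            shm_ids[id].used = true;
            shm_ids[id].data_set_id = next_set_id++;  /* stands for "create the data set" */
            return id;
        }
    }
    return -1;
}

/* shm_lock hook: does the id exist, and which data set backs it? */
bool shm_ids_lookup(int id, unsigned *data_set_id)
{
    if (id < 0 || id >= MAX_SEGMENTS || !shm_ids[id].used)
        return false;
    *data_set_id = shm_ids[id].data_set_id;
    return true;
}

/* do_shm_rmid hook: drop the entry (the data set would be destroyed too). */
void shm_ids_remove(int id)
{
    assert(id >= 0 && id < MAX_SEGMENTS && shm_ids[id].used);
    shm_ids[id].used = false;
}
```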
Distributed SHM implementation
- On a segment lookup
  - Hook in the kernel shm_lock function
  - Check whether the requested id exists in the ids KDDM set (kddm_get_object)
- VM operations
  - no_page: kddm_get_object / kddm_grab_object on the data KDDM set
  - wp_page (present in the 2.2 series): kddm_grab_object on the data KDDM set
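In the no_page path, the choice between get and grab follows from whether the fault is a write. The user-space sketch below assumes one KDDM object per page, with the page index in the segment used as the object id; `shm_fault_action`, `struct fault_decision` and `TOY_PAGE_SIZE` are hypothetical, and a real handler would also have to map the returned page into the faulting process.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

enum kddm_access { ACCESS_GET, ACCESS_GRAB };

/* Hypothetical result: which KDDM call to issue, on which object. */
struct fault_decision {
    enum kddm_access access;
    unsigned long    objid;    /* object (page) id in the data set */
};

/* Decide how to service a fault at byte `offset` inside the segment:
 * a write fault needs exclusive access (grab), a read fault only a
 * copy (get). */
struct fault_decision shm_fault_action(unsigned long offset, bool write)
{
    struct fault_decision d;

    d.objid = offset / TOY_PAGE_SIZE;   /* one KDDM object per page */
    d.access = write ? ACCESS_GRAB : ACCESS_GET;
    return d;
}
```

A wp_page fault (a write to a page already mapped read-only) takes the same ACCESS_GRAB branch.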
Container use in Kerrighed SSI OS
- Used as a basic building block to implement:
  - Process memory migration
  - Cluster-wide memory sharing
  - Cluster-wide file cache sharing
  - Cluster-wide inode sharing
  - Cluster-wide locks
  - Signal sharing
  - etc.
- Could be used by other projects:
  - DIPC
  - OpenSSI
  - etc.
Conclusion
- KDDM is a high-level abstraction for sharing data between nodes at kernel level
- KDDM can be used to implement distributed services more easily
- It could be a very good basis for a distributed kernel infrastructure
Backups
Object Localization
- Localization of objects on the local node
  - KDDM object table
- Localization across the cluster
  - Localization of object copies
    - For each object, one node is said to be the owner
    - The owner hosts a copy set: the list of nodes holding a copy
    - The owner is the last node that made a grab
    - The owner changes over time depending on object accesses
  - Localization of the owner
    - Probable owner chain: each node hosts a pointer to a probable owner node
    - The default probable owner depends on the KDDM set type
    - The chain is updated during coherence management
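Owner lookup through a probable owner chain can be sketched as follows: a request is forwarded along the probable owner pointers until it reaches the node that names itself as owner, and the nodes on the path are then pointed straight at the owner so later lookups are shorter. This is an illustrative single-process version of the classic probable-owner scheme; the array representation and the names `find_owner`, `transfer_ownership` and `NR_NODES` are hypothetical, and the path-shortening policy is an assumption rather than a description of how KDDM updates its chain.

```c
#include <assert.h>

#define NR_NODES 8   /* assumption: small fixed cluster for the sketch */

/* prob_owner[n] is node n's guess of the current owner;
 * the real owner points to itself. */
int find_owner(int prob_owner[NR_NODES], int start, int *hops)
{
    int owner = start;
    int n;

    *hops = 0;
    /* Follow the hints until a node names itself as owner. */
    while (prob_owner[owner] != owner) {
        assert(*hops < NR_NODES);   /* a well-formed chain has no cycle */
        owner = prob_owner[owner];
        (*hops)++;
    }

    /* Shorten the chain: every node visited now points at the owner. */
    n = start;
    while (n != owner) {
        int next = prob_owner[n];
        prob_owner[n] = owner;
        n = next;
    }
    return owner;
}

/* A grab makes `grabber` the new owner; the previous owner
 * forwards future requests to it. */
void transfer_ownership(int prob_owner[NR_NODES], int old_owner, int grabber)
{
    prob_owner[grabber] = grabber;
    if (old_owner != grabber)
        prob_owner[old_owner] = grabber;
}
```

With the chain 0 -> 1 -> 2 -> 3, a first lookup from node 0 takes three hops; afterwards nodes 0, 1 and 2 point directly at node 3, and a later grab by node 5 only adds one hop.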
Data Coherence Management
- An object state is associated with each object on each node
- States used:

  INV_COPY          No local copy, not owner
  INV_OWNER         No local copy, owner
  READ_COPY         Local copy, read-only, not owner
  READ_OWNER        Local copy, read-only, owner
  WRITE_OWNER       Local copy, write, owner
  WRITE_GHOST       Local copy, write, owner, not used locally
  WAIT_OBJ_READ     No local copy, waiting for a read copy
  WAIT_OBJ_WRITE    No or read-only local copy, waiting for a write copy
  WAIT_ACK_INV      Waiting for invalidation ACKs to send a write copy
  WAIT_ACK_WRITE    Waiting for invalidation ACKs to get write access
  WAIT_CHG_OWN_ACK
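The stable states in the table combine three properties: whether the node holds a local copy, whether it is the owner, and whether it may write. The sketch below encodes those states as an enum with predicates read directly off the table; the enum values and helper names are hypothetical, and the transient WAIT_* states are left out because their properties depend on the operation in progress.

```c
#include <assert.h>
#include <stdbool.h>

/* Stable object states from the table (numeric values are arbitrary). */
enum obj_state {
    INV_COPY,     /* no local copy, not owner                  */
    INV_OWNER,    /* no local copy, owner                      */
    READ_COPY,    /* local copy, read-only, not owner          */
    READ_OWNER,   /* local copy, read-only, owner              */
    WRITE_OWNER,  /* local copy, write, owner                  */
    WRITE_GHOST,  /* local copy, write, owner, not used locally */
};

bool has_local_copy(enum obj_state s)
{
    return s != INV_COPY && s != INV_OWNER;
}

bool is_owner(enum obj_state s)
{
    return s != INV_COPY && s != READ_COPY;
}

bool can_write(enum obj_state s)
{
    return s == WRITE_OWNER || s == WRITE_GHOST;
}
```

Every writable state is also an owner state, which matches the single-writer rule: only the owner may hold write access.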
Alloc / First Touch / Remove
#include <linux/mm.h>
#include <linux/pagemap.h>

/* Allocate a page to hold the object */
void *shm_alloc_object (struct kddm_obj *obj_entry, struct kddm_set *set, objid_t objid)
{
    return alloc_page (GFP_HIGHUSER);
}

/* First access to the object: allocate a fresh page for it */
int shm_first_touch (struct kddm_obj *obj_entry, struct kddm_set *set, objid_t objid)
{
    obj_entry->object = alloc_page (GFP_HIGHUSER);
    if (obj_entry->object == NULL)
        return -ENOMEM;

    return 0;
}

/* Release the page backing a removed object */
int shm_remove_object (struct kddm_obj *obj_entry, struct kddm_set *set, objid_t objid)
{
    page_cache_release ((struct page *) obj_entry->object);
    return 0;
}
Import / Export
/* Import: copy the page contents received in the RPC message into the local page */
int shm_import_object (struct kddm_obj *obj_entry, struct rpc_desc *desc)
{
    struct page *page = (struct page *) obj_entry->object;
    char *data;

    data = kmap (page);
    rpc_unpack (desc, 0, data, PAGE_SIZE);
    kunmap (page);

    return 0;
}

/* Export: copy the local page contents into the outgoing RPC message
 * (argument order matches export_object in struct iolinker_struct) */
int shm_export_object (struct rpc_desc *desc, struct kddm_obj *obj_entry)
{
    struct page *page = (struct page *) obj_entry->object;
    char *data;

    data = kmap (page);
    rpc_pack (desc, 0, data, PAGE_SIZE);
    kunmap (page);

    return 0;
}
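Export and import form a serialize/deserialize pair: export copies the page into the message, and import copies it out into the page allocated on the receiving node. The user-space analogue below replaces `struct rpc_desc` with a plain byte buffer; `struct toy_msg`, `toy_pack`, `toy_unpack` and `TOY_PAGE_SIZE` are hypothetical and do not reproduce the Kerrighed RPC layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for an RPC message: a buffer and two cursors. */
struct toy_msg {
    unsigned char buf[2 * TOY_PAGE_SIZE];
    size_t        len;   /* bytes packed so far   */
    size_t        pos;   /* bytes unpacked so far */
};

/* Sender side: append `size` bytes to the message. */
int toy_pack(struct toy_msg *m, const void *data, size_t size)
{
    if (size > sizeof(m->buf) - m->len)
        return -1;
    memcpy(m->buf + m->len, data, size);
    m->len += size;
    return 0;
}

/* Receiver side: consume the next `size` bytes of the message. */
int toy_unpack(struct toy_msg *m, void *data, size_t size)
{
    if (size > m->len - m->pos)
        return -1;
    memcpy(data, m->buf + m->pos, size);
    m->pos += size;
    return 0;
}
```

Packing one page on the sending side and unpacking one page on the receiving side reproduces the page byte for byte; reading past the packed data fails.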
Invalidate
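int shm_invalidate_object (struct kddm_obj *obj_entry, struct kddm_set *set, objid_t objid)
{
    struct page *page = (struct page *) obj_entry->object;

    TestSetPageLocked (page);

    /* Unmap the page from every process page table mapping it */
    SetPageToInvalidate (page);
    try_to_unmap (page);
    ClearPageToInvalidate (page);

    /* Remove the page from the page cache and drop the cache's reference */
    remove_from_page_cache (page);
    page_cache_release (page);

    unlock_page (page);

    /* Take the page off the LRU list if it is on it */
    if (TestClearPageLRU (page))
        del_page_from_lru (page_zone (page), page);

    /* Release the remaining reference on the page */
    page_cache_release (page);

    return 0;
}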