Extended attributes (xattrs) store metadata beyond the standard inode fields. Security contexts, ACLs, and user-defined data all live in xattrs. When you set a security label with setfattr or apply capability flags, those go into xattrs. For NONOS, ext4 support handles mounted external storage like USB drives or network shares.
The implementation handles the full xattr lifecycle: parsing attribute names, validating XATTR_CREATE and XATTR_REPLACE flags, looking up inodes, reading existing xattr blocks, preparing new entries, and serializing everything back to the block device.
The ext4 xattr format is a bit involved. The attributes handled here live in a dedicated block pointed to by i_file_acl in the inode (ext4 can also keep small attributes in spare space inside the inode itself, which this section does not cover). The block has a header, followed by entries growing forward from the start and values growing backward from the end. When the two regions meet, you're out of space.
Xattr Block Structure
src/fs/ext4/xattr/types.rs
pub const EXT4_XATTR_MAGIC: u32 = 0xEA020000;

#[repr(C)]
pub struct Ext4XattrHeader {
    pub h_magic: u32,
    pub h_refcount: u32,
    pub h_blocks: u32,
    pub h_hash: u32,
    pub h_checksum: u32,
    pub h_reserved: [u32; 3],
}

#[repr(C)]
pub struct Ext4XattrEntry {
    pub e_name_len: u8,
    pub e_name_index: u8,
    pub e_value_offs: u16,
    pub e_value_inum: u32,
    pub e_value_size: u32,
    pub e_hash: u32,
    // name follows immediately
}
The header identifies the block as xattr storage (magic 0xEA020000) and tracks a refcount for shared blocks. Each entry stores the attribute name length, a namespace index, the offset of the value measured from the start of the block, and the value size. Values are packed toward the end of the block, but the stored offset is still relative to the block's first byte. The actual name bytes follow immediately after the entry structure.
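To make the layout concrete, here is a minimal sketch of how a reader could walk the entry table. It is not the NONOS parse_xattr_block, which is not shown here; the helper names (le16, le32, parse_entries) and the choice to decode fields with from_le_bytes instead of casting to the structs are assumptions made for illustration. It also assumes little-endian fields, value offsets measured from the start of the block, and an entry list that ends where the next four bytes are zero.

```rust
use std::collections::BTreeMap;

const XATTR_MAGIC: u32 = 0xEA02_0000;
const HEADER_SIZE: usize = 32; // size of Ext4XattrHeader
const ENTRY_SIZE: usize = 16; // size of Ext4XattrEntry, excluding the name

// Bounds-checked little-endian reads; None means "ran off the block".
fn le16(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off.checked_add(2)?)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn le32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off.checked_add(4)?)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Hypothetical helper: maps (namespace index, name bytes) to value bytes.
/// Returns None if the magic is wrong or any offset points outside the block.
fn parse_entries(block: &[u8]) -> Option<BTreeMap<(u8, Vec<u8>), Vec<u8>>> {
    if le32(block, 0)? != XATTR_MAGIC {
        return None;
    }
    let mut out = BTreeMap::new();
    let mut off = HEADER_SIZE;
    // Entries grow forward from the header; four zero bytes end the list.
    while le32(block, off)? != 0 {
        let name_len = *block.get(off)? as usize;
        let name_index = *block.get(off + 1)?;
        let value_offs = le16(block, off + 2)? as usize;
        let value_size = le32(block, off + 8)? as usize;
        let name_start = off + ENTRY_SIZE;
        let name = block.get(name_start..name_start + name_len)?.to_vec();
        // The value sits near the end of the block, addressed from byte 0.
        let value = block
            .get(value_offs..value_offs.checked_add(value_size)?)?
            .to_vec();
        out.insert((name_index, name), value);
        // Each entry (fixed part plus name) is padded to a 4-byte boundary.
        off += (ENTRY_SIZE + name_len + 3) & !3;
    }
    Some(out)
}
```

Returning None on any out-of-range offset keeps a corrupted block from causing a panic, which matters when the storage is a removable drive that may have been written by something else.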
The setxattr Implementation
src/fs/ext4/xattr/set.rs
use alloc::collections::BTreeMap;
use alloc::string::ToString;

/* DEV NOTES eK@nonos.systems
Set an extended attribute on an inode. Handles XATTR_CREATE (fail if exists)
and XATTR_REPLACE (fail if doesn't exist) flags. Allocates xattr block if
needed. Serializes all attributes back to block device after modification.
*/
pub fn ext4_setxattr(
    dev: &str,
    sb: &Ext4Superblock,
    ino: u32,
    name: &str,
    value: &[u8],
    flags: i32
) -> Result<(), i32> {
    let inode = read_inode(dev, sb, ino)?;
    let xattr_block = inode.i_file_acl();
    let block_size = sb.block_size() as usize;

    // Read existing attributes or start fresh
    let (existing, mut buf) = if xattr_block != 0 {
        let mut b = alloc::vec![0u8; block_size];
        crate::drivers::block::read(dev, &mut b, xattr_block as u64 * block_size as u64)?;
        (parse_xattr_block(&b), b)
    } else {
        (BTreeMap::new(), alloc::vec![0u8; block_size])
    };

    // Enforce XATTR_CREATE and XATTR_REPLACE semantics
    let exists = existing.contains_key(name);
    if (flags & XATTR_CREATE) != 0 && exists {
        return Err(-17); // EEXIST
    }
    if (flags & XATTR_REPLACE) != 0 && !exists {
        return Err(-61); // ENODATA
    }

    // Build updated attribute set
    let mut updated = existing.clone();
    updated.insert(name.to_string(), value.to_vec());

    // Serialize back to block format
    serialize_xattr_block(&updated, &mut buf)?;

    // Allocate block if this is the first xattr
    let target_block = if xattr_block != 0 {
        xattr_block
    } else {
        allocate_block(dev, sb)?
    };

    // Write the block to storage
    crate::drivers::block::write(dev, &buf, target_block as u64 * block_size as u64)?;

    // Update inode if we allocated a new block
    if xattr_block == 0 {
        update_inode_xattr_block(dev, sb, ino, target_block)?;
    }
    Ok(())
}
The implementation reads existing attributes into a BTreeMap (or creates an empty one if this is the first xattr), validates the create/replace semantics, updates the map, serializes back to ext4 format, and writes the block. If no xattr block existed, we allocate one and update the inode to point to it.
The XATTR_CREATE and XATTR_REPLACE flags let a caller state its intent in a single call. XATTR_CREATE ensures you don't accidentally overwrite an existing attribute. XATTR_REPLACE ensures the attribute exists before you modify it. Without these, a caller would have to check for the attribute and then set it in two separate steps, leaving a window in which another process could change it.
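The flag rules are easiest to see as a pure function over "does the attribute already exist". The sketch below is illustrative only: check_set_flags is a hypothetical name, and the constants are assumed to match the Linux values (XATTR_CREATE = 1, XATTR_REPLACE = 2, EEXIST = 17, ENODATA = 61).

```rust
// Assumed to match the Linux userspace values.
const XATTR_CREATE: i32 = 0x1;
const XATTR_REPLACE: i32 = 0x2;
const EEXIST: i32 = 17;
const ENODATA: i32 = 61;

/// Hypothetical helper: decide whether a set operation may proceed.
fn check_set_flags(flags: i32, exists: bool) -> Result<(), i32> {
    // Create-only: refuse to overwrite an attribute that is already there.
    if (flags & XATTR_CREATE) != 0 && exists {
        return Err(-EEXIST);
    }
    // Replace-only: refuse to create an attribute that is missing.
    if (flags & XATTR_REPLACE) != 0 && !exists {
        return Err(-ENODATA);
    }
    // No flags: create or overwrite, whichever applies.
    Ok(())
}
```

With flags set to 0 the call succeeds in both cases, which is the "create or overwrite" behavior most callers want by default.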
Serialization
src/fs/ext4/xattr/set.rs (serialize_xattr_block)
fn serialize_xattr_block(
    attrs: &BTreeMap<String, Vec<u8>>,
    buf: &mut [u8]
) -> Result<(), i32> {
    let block_size = buf.len();
    let header_size = core::mem::size_of::<Ext4XattrHeader>();
    let entry_header_size = core::mem::size_of::<Ext4XattrEntry>();
    if block_size < header_size {
        return Err(-22); // EINVAL
    }

    // Clear old contents so the entry list always ends in zero bytes
    buf.fill(0);

    // Write header
    let header = Ext4XattrHeader {
        h_magic: EXT4_XATTR_MAGIC,
        h_refcount: 1,
        h_blocks: 1,
        h_hash: 0,
        h_checksum: 0,
        h_reserved: [0; 3],
    };
    unsafe {
        core::ptr::copy_nonoverlapping(
            &header as *const _ as *const u8,
            buf.as_mut_ptr(),
            header_size
        );
    }

    let mut entry_offset = header_size;
    let mut value_end = block_size;

    for (name, value) in attrs.iter() {
        // e_name_len is a single byte, so longer names cannot be stored
        if name.len() > u8::MAX as usize {
            return Err(-34); // ERANGE
        }

        // Calculate entry size (header + name, 4-byte aligned)
        let entry_size = entry_header_size + name.len();
        let entry_size_aligned = (entry_size + 3) & !3;

        // Calculate value size (4-byte aligned)
        let value_size_aligned = (value.len() + 3) & !3;

        // Check for space collision. Written as a sum so an oversized value
        // cannot underflow a subtraction; the extra 4 bytes keep room for
        // the zero terminator after the last entry.
        if entry_offset + entry_size_aligned + value_size_aligned + 4 > value_end {
            return Err(-28); // ENOSPC
        }

        // Write value at end of block
        value_end -= value_size_aligned;
        buf[value_end..value_end + value.len()].copy_from_slice(value);

        // Write entry
        let entry = Ext4XattrEntry {
            e_name_len: name.len() as u8,
            e_name_index: 1, // user namespace
            e_value_offs: value_end as u16, // offset from start of block
            e_value_inum: 0,
            e_value_size: value.len() as u32,
            e_hash: 0,
        };
        unsafe {
            core::ptr::copy_nonoverlapping(
                &entry as *const _ as *const u8,
                buf.as_mut_ptr().add(entry_offset),
                entry_header_size
            );
        }

        // Write name after entry
        let name_offset = entry_offset + entry_header_size;
        buf[name_offset..name_offset + name.len()].copy_from_slice(name.as_bytes());

        entry_offset += entry_size_aligned;
    }
    Ok(())
}
Serialization writes the header first, then iterates through attributes. Each attribute gets an entry written forward from the header and a value written backward from the end. The 4-byte alignment is required by the ext4 specification. If entries and values would collide, we return ENOSPC.
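The space accounting can be checked by hand with a few lines of arithmetic. This is a sketch under stated assumptions, not the NONOS code: align4, space_needed and fits are hypothetical helpers, the sizes 32 and 16 are the byte sizes of the two structs shown earlier, and the 4 spare bytes for a zero terminator after the last entry are an assumption of this sketch.

```rust
const HEADER_SIZE: usize = 32; // Ext4XattrHeader
const ENTRY_SIZE: usize = 16; // Ext4XattrEntry, excluding the name
const TERMINATOR: usize = 4; // zero bytes that end the entry list (assumed)

/// Round up to the next multiple of 4.
const fn align4(n: usize) -> usize {
    (n + 3) & !3
}

/// Bytes one attribute consumes: padded entry plus padded value.
fn space_needed(name_len: usize, value_len: usize) -> usize {
    align4(ENTRY_SIZE + name_len) + align4(value_len)
}

/// Would these (name_len, value_len) pairs fit in one block?
fn fits(block_size: usize, attrs: &[(usize, usize)]) -> bool {
    let used: usize = attrs.iter().map(|&(n, v)| space_needed(n, v)).sum();
    HEADER_SIZE + used + TERMINATOR <= block_size
}
```

For example, a 3-byte name with a 5-byte value costs align4(19) + align4(5) = 20 + 8 = 28 bytes, so the padding overhead is noticeable when a block holds many small attributes.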
Directory operations in ext4 involve reading the directory's data blocks, parsing the chain of directory entries in each block, and either finding, adding, or removing entries. The entry format is variable-length: each entry's rec_len field gives the size of its record, so adding rec_len to the entry's offset yields the offset of the next entry.
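The record format is small enough to build and walk by hand. The sketch below works on an in-memory block and assumes the common ext4 entry layout with an 8-byte fixed part (inode as u32, rec_len as u16, name_len as u8, file_type as u8, all little-endian) followed by the name. The helper names min_rec_len, put_entry and list_block are made up for illustration.

```rust
const DIRENT_HEADER: usize = 8; // inode (4) + rec_len (2) + name_len (1) + file_type (1)

/// Smallest record that can hold a name of this length, padded to 4 bytes.
fn min_rec_len(name_len: usize) -> usize {
    (DIRENT_HEADER + name_len + 3) & !3
}

/// Write one entry at `off` and return the offset of the next record.
fn put_entry(
    block: &mut [u8],
    off: usize,
    inode: u32,
    rec_len: u16,
    file_type: u8,
    name: &[u8],
) -> usize {
    block[off..off + 4].copy_from_slice(&inode.to_le_bytes());
    block[off + 4..off + 6].copy_from_slice(&rec_len.to_le_bytes());
    block[off + 6] = name.len() as u8;
    block[off + 7] = file_type;
    block[off + 8..off + 8 + name.len()].copy_from_slice(name);
    off + rec_len as usize
}

/// Collect (inode, name) for every live entry in one block.
fn list_block(block: &[u8]) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut off = 0;
    while off + DIRENT_HEADER <= block.len() {
        let inode = u32::from_le_bytes([
            block[off],
            block[off + 1],
            block[off + 2],
            block[off + 3],
        ]);
        let rec_len = u16::from_le_bytes([block[off + 4], block[off + 5]]) as usize;
        let name_len = block[off + 6] as usize;
        // A record shorter than its header, or one that runs past the block,
        // means the chain is corrupt; stop rather than loop or panic.
        if rec_len < DIRENT_HEADER || off + rec_len > block.len() {
            break;
        }
        if inode != 0 && DIRENT_HEADER + name_len <= rec_len {
            let name = &block[off + 8..off + 8 + name_len];
            out.push((inode, String::from_utf8_lossy(name).into_owned()));
        }
        off += rec_len; // rec_len is a length, so this lands on the next record
    }
    out
}
```

The last record in a block normally has a rec_len that stretches to the end of the block, which is why the walk can stop purely on offsets without a separate terminator.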
Directory Entry Removal
src/fs/ext4/dir/remove.rs
/* DEV NOTES eK@nonos.systems
Remove a directory entry by name. Reads through directory blocks, finds matching entry,
and zeros out the inode field to mark as deleted. The rec_len is preserved so directory
traversal still works.
*/
pub fn dir_remove_entry(
    dev: &str,
    sb: &Ext4Superblock,
    dir_ino: u32,
    name: &str
) -> Result<(), i32> {
    let dir_inode = read_inode(dev, sb, dir_ino)?;
    if !dir_inode.is_dir() {
        return Err(-20); // ENOTDIR
    }
    let block_size = sb.block_size() as usize;
    let blocks = (dir_inode.size() + block_size as u64 - 1) / block_size as u64;
    let mut buf = alloc::vec![0u8; block_size];

    for b in 0..blocks {
        let pblock = extent_lookup(dev, sb, &dir_inode, b as u32)?;
        crate::drivers::block::read(dev, &mut buf, pblock * sb.block_size() as u64)?;

        let mut offset = 0usize;
        while offset < block_size {
            let entry = unsafe { &*(buf.as_ptr().add(offset) as *const Ext4DirEntry) };
            if entry.rec_len == 0 {
                break;
            }
            if entry.inode != 0 && entry.name_len as usize == name.len() {
                let entry_name = core::str::from_utf8(
                    &buf[offset + 8..offset + 8 + entry.name_len as usize]
                ).unwrap_or("");
                if entry_name == name {
                    // Found it - zero the inode to mark as deleted
                    unsafe {
                        let entry_mut = &mut *(buf.as_mut_ptr().add(offset) as *mut Ext4DirEntry);
                        entry_mut.inode = 0;
                    }
                    crate::drivers::block::write(dev, &buf, pblock * sb.block_size() as u64)?;
                    return Ok(());
                }
            }
            offset += entry.rec_len as usize;
        }
    }
    Err(-2) // ENOENT
}
The removal algorithm iterates through all directory blocks, following the rec_len chain. When we find an entry with matching name, we zero the inode field and write the block back. We don't reclaim the space (that would require merging with adjacent entries), but the entry is effectively deleted - traversal will skip entries with inode 0.
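For comparison, reclaiming the space is a small change when the removed entry has a predecessor in the same block: extend the predecessor's rec_len to cover the removed record. The sketch below shows that variant on an in-memory block. It is an illustration rather than the NONOS code, remove_in_block and the read helpers are hypothetical names, and it assumes the 8-byte entry header described above and block sizes below 64 KiB so that a merged rec_len fits in a u16.

```rust
fn read_u16(b: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([b[off], b[off + 1]])
}

fn read_u32(b: &[u8], off: usize) -> u32 {
    u32::from_le_bytes([b[off], b[off + 1], b[off + 2], b[off + 3]])
}

/// Remove `name` from one directory block. Returns true if it was found.
fn remove_in_block(block: &mut [u8], name: &[u8]) -> bool {
    let mut prev: Option<usize> = None; // offset of the previous record, if any
    let mut off = 0;
    while off + 8 <= block.len() {
        let inode = read_u32(block, off);
        let rec_len = read_u16(block, off + 4) as usize;
        let name_len = block[off + 6] as usize;
        if rec_len < 8 || off + rec_len > block.len() {
            return false; // corrupt chain, give up on this block
        }
        let matches = inode != 0
            && 8 + name_len <= rec_len
            && &block[off + 8..off + 8 + name_len] == name;
        if matches {
            match prev {
                // Fold this record into its predecessor so the space is reusable.
                Some(p) => {
                    let merged = read_u16(block, p + 4) as usize + rec_len;
                    block[p + 4..p + 6].copy_from_slice(&(merged as u16).to_le_bytes());
                }
                // First record in the block: nothing to merge into, so just
                // clear the inode field, as the function above does.
                None => block[off..off + 4].copy_from_slice(&0u32.to_le_bytes()),
            }
            return true;
        }
        prev = Some(off);
        off += rec_len;
    }
    false
}
```

After a merge the removed record is no longer reachable through the rec_len chain at all, so traversal never visits it and a later insert can split the enlarged predecessor to reuse the space.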
Directory Iteration
src/fs/ext4/dir/iterate.rs
/* DEV NOTES eK@nonos.systems
Iterate through all directory entries, invoking callback for each valid entry.
Callback receives: inode number, entry name, file type. Skips deleted entries
(inode == 0) and entries with zero-length names.
*/
pub fn dir_iterate<F>(
    dev: &str,
    sb: &Ext4Superblock,
    dir_inode: &Ext4Inode,
    mut f: F
) -> Result<(), i32>
where
    // Called as f(inode, name, file_type) for each live entry
    F: FnMut(u32, &str, u8),
{
    if !dir_inode.is_dir() {
        return Err(-20); // ENOTDIR
    }
    let block_size = sb.block_size() as usize;
    let blocks = (dir_inode.size() + block_size as u64 - 1) / block_size as u64;
    let mut buf = alloc::vec![0u8; block_size];

    for b in 0..blocks {
        let pblock = extent_lookup(dev, sb, dir_inode, b as u32)?;
        crate::drivers::block::read(dev, &mut buf, pblock * sb.block_size() as u64)?;

        let mut offset = 0usize;
        while offset < block_size {
            let entry = unsafe { &*(buf.as_ptr().add(offset) as *const Ext4DirEntry) };
            if entry.rec_len == 0 {
                break;
            }
            if entry.inode != 0 && entry.name_len > 0 {
                // Names that are not valid UTF-8 are skipped silently
                if let Ok(entry_name) = core::str::from_utf8(
                    &buf[offset + 8..offset + 8 + entry.name_len as usize]
                ) {
                    f(entry.inode, entry_name, entry.file_type);
                }
            }
            offset += entry.rec_len as usize;
        }
    }
    Ok(())
}
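Iteration uses the same traversal pattern but invokes a callback for each valid entry. The callback-based API is more flexible than returning a vector: the caller can filter entries or build whatever data structure they need without an intermediate allocation, and the closure captures any needed context. As written, the callback's return value is discarded, so the walk always visits every entry; stopping early would require the callback to return a signal that the loop checks.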
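One way to add early termination is to have the callback return core::ops::ControlFlow and let the walker propagate a Break. The sketch below applies that idea to a single in-memory block. It is an assumption about how such an API could look, not part of the code above: walk_block is a hypothetical name, and the 8-byte little-endian entry header is assumed as before.

```rust
use std::ops::ControlFlow;

/// Visit live entries in one block until the callback asks to stop.
fn walk_block<F>(block: &[u8], mut f: F) -> ControlFlow<()>
where
    F: FnMut(u32, &str, u8) -> ControlFlow<()>,
{
    let mut off = 0;
    while off + 8 <= block.len() {
        let inode = u32::from_le_bytes([
            block[off],
            block[off + 1],
            block[off + 2],
            block[off + 3],
        ]);
        let rec_len = u16::from_le_bytes([block[off + 4], block[off + 5]]) as usize;
        let name_len = block[off + 6] as usize;
        let file_type = block[off + 7];
        if rec_len < 8 || off + rec_len > block.len() {
            break; // corrupt chain: stop instead of looping forever
        }
        if inode != 0 && name_len > 0 && 8 + name_len <= rec_len {
            if let Ok(name) = std::str::from_utf8(&block[off + 8..off + 8 + name_len]) {
                // `?` returns immediately when the callback yields Break.
                f(inode, name, file_type)?;
            }
        }
        off += rec_len;
    }
    ControlFlow::Continue(())
}
```

A lookup can then return ControlFlow::Break(()) as soon as it sees the name it wants, and a multi-block iterator would check the returned value after each block to decide whether to read the next one.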