uClibc and Glibc are not the same
There are a number of differences which may or may not cause you problems. This document attempts to list these differences and, when completed, will contain a full list of all relevant differences.
- uClibc is smaller than glibc. We attempt to maintain a glibc compatible interface, allowing applications that compile with glibc to easily compile with uClibc. However, we do not include _everything_ that glibc includes, and therefore some applications may not compile. If this happens to you, please report the failure to the uClibc mailing list, with detailed error messages.
- uClibc is much more configurable than glibc. This means that a developer may have compiled uClibc in such a way that significant amounts of functionality have been omitted.
- uClibc does not even attempt to ensure binary compatibility across releases. When a new version of uClibc is released, you may or may not need to recompile all your binaries.
- malloc(0) in glibc returns a valid pointer to something(!?!?), while in uClibc calling malloc(0) returns NULL. The behavior of malloc(0) is listed as implementation-defined by SUSv3, so both libraries are equally correct. This difference also applies to realloc(NULL, 0). I personally feel glibc's behavior is not particularly safe. To get glibc's behavior, uClibc must be built with the MALLOC_GLIBC_COMPAT option explicitly enabled.
- glibc's malloc() implementation has behavior that is tunable via the MALLOC_CHECK_ environment variable. This is primarily used to provide extra malloc debugging features. These extended malloc debugging features are not available within uClibc. There are many good malloc debugging libraries available for Linux (dmalloc, Electric Fence, Valgrind, etc.) that work much better than the glibc extended malloc debugging. So omitting this functionality from uClibc is not a great loss.
- uClibc does not provide a database library (libdb).
- uClibc does not support NSS (/lib/libnss_*), which allows glibc to easily support various methods of authentication and DNS resolution. uClibc only supports flat password files and shadow password files for storing authentication information. If you need something more complex than this, you can compile and install PAM.
- uClibc's libresolv is only a stub. Some, but not all, of the functionality provided by glibc's libresolv is provided internally by uClibc. Other functions are not implemented at all.
- libnsl provides support for Network Information Service (NIS), originally called “Yellow Pages” or “YP”, an extension of RPC invented by Sun to share Unix password files over the network. I personally think NIS is an evil abomination and should not be used. These days, LDAP is a much more effective mechanism for doing the same thing. uClibc provides a stub libnsl, but has no actual support for NIS. We therefore also do not provide any of the header files provided by glibc under /usr/include/rpcsvc.
- uClibc's locale support is not 100% complete yet. We are working on it.
- uClibc's math library only supports long double as inlines, and even then the long double support is quite limited. Also, very few of the float math functions are implemented. Stick with double and you should be just fine.
- uClibc's libcrypt does not support the reentrant crypt_r, setkey_r and encrypt_r, since these are not required by SUSv3.
- uClibc directly uses kernel types to define most opaque data types.
- uClibc directly uses the Linux kernel's arch-specific 'struct stat'.
- uClibc's librt library currently lacks all aio routines, all clock routines, and all shm routines (only the timer routines and the mq routines are implemented).
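Code that must behave the same on both libraries, whether or not uClibc was built with MALLOC_GLIBC_COMPAT, can avoid depending on malloc(0) at all. A minimal sketch; portable_malloc() is a hypothetical name, not part of either library:

```c
#include <stdlib.h>

/* Hypothetical wrapper giving zero-byte requests one meaning everywhere.
 * SUSv3 lets malloc(0) return either NULL or a unique pointer, so a NULL
 * result for size 0 is not necessarily an out-of-memory error.
 * Asking for at least one byte sidesteps that ambiguity. */
void *portable_malloc(size_t size)
{
    return malloc(size != 0 ? size : 1);
}
```

The returned pointer is released with free() as usual. The alternative, treating a NULL result for size 0 as success, is equally valid; the point is to choose one behavior explicitly instead of inheriting whichever the C library provides.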
Some general comments...
The intended target for all my uClibc code is ANSI/ISO C99 and SUSv3 compliance. While some glibc extensions are present, many will eventually be configurable. Also, even when present, the glibc-like extensions may differ slightly or be more restrictive than the native glibc counterparts. They are primarily meant to be porting _aids_ and not necessarily drop-in replacements.
Now for some details…
- Leap seconds are not supported.
- /etc/timezone and the whole zoneinfo directory tree are not supported. To set the timezone, set the TZ environment variable as specified in http://www.opengroup.org/onlinepubs/007904975/basedefs/xbd_chap08.html, or create an /etc/TZ file of a single line, ending with a newline, containing the TZ setting. For example: echo CST6CDT > /etc/TZ
- Currently, locale-specific eras and alternate digits are not supported. They are on my TODO list.
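uClibc reads /etc/TZ on its own, so the following only illustrates the one-line file format and the standard calls involved. A minimal sketch, assuming a POSIX environment; load_tz_file() is a hypothetical helper that a program could use to apply a TZ file explicitly:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: read a one-line POSIX TZ string (for example the
 * contents of /etc/TZ) and install it as the process's TZ variable.
 * Returns 0 on success and -1 on any failure. */
int load_tz_file(const char *path)
{
    char buf[128];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);

    buf[strcspn(buf, "\n")] = '\0';     /* drop the trailing newline */
    if (buf[0] == '\0')
        return -1;
    if (setenv("TZ", buf, 1) != 0)      /* setenv() is POSIX, not ISO C */
        return -1;
    tzset();                            /* make localtime() etc. see the change */
    return 0;
}
```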
wide char support
- The only multibyte encoding currently supported is UTF-8. The various ISO-8859-* encodings are (optionally) supported. The internal representation of wchar_t values is assumed to be 31-bit Unicode values in native-endian representation. Also, the underlying char encoding is assumed to match ASCII in the range 0-0x7f.
- In the next iteration of locale support, I plan to add support for (at least some) other multibyte encodings.
- The target for support is SUSv3 locale functionality. While nl_langinfo has been extended, similar to glibc, it only returns values for related locale entries.
- Currently, all SUSv3 libc locale functionality should be implemented except for wcsftime and collating item support in regex.
- Conversion of large magnitude floating-point values by printf suffers a loss of precision due to the algorithm used.
- uClibc's printf is much stricter than glibc's, especially regarding positional args. The entire format string is parsed first and an error is returned if a problem is detected. In locales other than C, the format string is checked to be a valid multibyte sequence as well. Also, currently at most 10 positional args are allowed (although this is configurable).
- BUFSIZ is configurable, but no attempt is made at automatic tuning of internal buffer sizes for stdio streams. In fact, the stdio code in general sacrifices sophistication/performance for minimal size.
- uClibc allows glibc-like custom printf functions. However, while not currently checked, the specifier must be <= 0x7f.
- uClibc allows glibc-like custom streams. However, no in-buffer seeking is done.
- The functions fcloseall() and __fpending() can behave differently than their glibc counterparts.
- uClibc's setvbuf is more restrictive about when it can be called than glibc's is. The standards specify that setvbuf must occur before any other operations take place on the stream.
- Right now, %m is not handled properly by printf when the format uses positional args.
- The FILEs created by glibc's fmemopen(), open_memstream(), and fopencookie() are not capable of wide orientation. The corresponding uClibc routines do not have this limitation.
- For scanf, the C99 standard states “The fscanf function returns the value of the macro EOF if an input failure occurs before any conversion.” But glibc's scanf does not respect conversions for which assignment was suppressed, even though the standard states that the value is converted but not stored.
Ulrich Drepper has refused to acknowledge or comment on this (http://sources.redhat.com/ml/libc-alpha/2003-09/).
- The C99 standard says that for printf, a %s conversion makes no special provisions for multibyte characters. SUSv3 is even more clear, stating that bytes are written and a specified precision is in bytes. Yet glibc treats the arg as a multibyte string when a precision is specified and not otherwise.
- Both C99 and C89 state that the %c conversion for scanf reads the exact number of bytes specified by the optional field width (or 1 if not specified). uClibc complies with the standard. There is an argument that perhaps the specified width should be treated as an upper bound, based on some historical use. However, such behavior should be mentioned in the Conformance document.
- glibc's scanf is broken regarding some numeric patterns. Some invalid strings are accepted as valid (“0x.p”, “1e”, digit grouped strings). In spite of my posting examples clearly illustrating the bugs, they remain unacknowledged by the glibc developers.
- glibc's scanf seems to require a 'p' exponent for hexadecimal float strings. According to the standard, this is optional.
- C99 requires that once an EOF is encountered, the stream should be treated as if at end-of-file even if more data becomes available. Further reading can be attempted by clearing the EOF flag though, via clearerr() or a file positioning function. For details concerning the original change, see Defect Report #141. glibc is currently non-compliant, and the developers did not comment when I asked for their official position on this issue.
- glibc's collation routines and/or localedef are broken regarding implicit and explicit UNDEFINED rules.
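The sticky end-of-file rule from Defect Report #141 matters most for programs that follow a file another process is still writing. A minimal sketch, assuming such a polling reader; read_more() is a hypothetical helper:

```c
#include <stdio.h>

/* Hypothetical helper for following a file that may still be growing.
 * Under C99 semantics, once the end-of-file indicator is set, fgetc()
 * keeps returning EOF even if more data has since been written.
 * clearerr() resets the indicator so the next fgetc() retries the file. */
int read_more(FILE *fp)
{
    int c = fgetc(fp);

    if (c == EOF && feof(fp)) {
        clearerr(fp);     /* forget the earlier end-of-file ... */
        c = fgetc(fp);    /* ... and try the underlying file once more */
    }
    return c;             /* still EOF if nothing new has arrived */
}
```

On a library that does not keep EOF sticky, the first fgetc() may already return the new byte; calling clearerr() explicitly gives the same result on both kinds of library.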
More to follow as I think of it…
uClibc no longer supports 'gcc -fprofile-arcs -pg' style profiling, which causes your application to generate a 'gmon.out' file that can then be analyzed by 'gprof'. Not only does this require explicit extra support in uClibc, it requires that you rebuild everything with profiling support. There is both a size and performance penalty to profiling your applications this way, as well as Heisenberg effects, where the act of measuring changes what is measured. There exist a number of less invasive alternatives that do not require you to specially instrument your application, and recompile and relink everything.
- The OProfile system-wide profiler is an excellent alternative: http://oprofile.sourceforge.net/
- Many people have had good results using the combination of Valgrind to generate profiling information and KCachegrind for analysis: http://developer.kde.org/~sewardj/ http://kcachegrind.sourceforge.net/
- Prospect is another alternative based on OProfile: http://prospect.sourceforge.net/
- And the Linux Trace Toolkit (LTT) is also a fine tool: http://www.opersys.com/LTT/
- FunctionCheck: http://www710.univ-lyon1.fr/~yperret/fnccheck/
How uClinux provides MMU-less processors with an alternative
By Michael Durrant and Michael Leslie of Lineo. Embedded.com (02/12/02, 06:06:53 AM EST)
Many software developers in recent years have turned to Linux as their operating system of choice. Until the advent of uClinux, developers of smaller embedded systems, which usually incorporate microprocessors with no memory management unit, could not take advantage of Linux in their designs, according to Michael Durrant and Michael Leslie of Lineo.
uClinux is a variant of mainstream Linux that runs on 'MMU-less' processor architectures. Component costs are of primary concern in embedded systems, which are typically required to be small and inexpensive. Microprocessors with on-chip memory management unit (MMU) hardware tend to be complex and expensive, and as such are not typically selected for small, simple embedded systems that do not require them.
Benefits of Linux
Using Linux in devices which require some intelligence is attractive for many reasons:
- It is a mature, robust operating system
- It already supports a large number of devices, filesystems, and networking protocols
- Bug fixes and new features are constantly being added, tested and refined by a large community of programmers and users
- It gives everyone from developers to end users complete visibility of the source code
- A large number of applications (such as GNU software) exist which require little to no porting effort
- Linux's very low cost
Embedded systems running uClinux may be configured in many ways other than that of the familiar UNIX-like Linux distribution. Nevertheless, an example of a system running uClinux in this way will help to illustrate how it may be used.
Lineo's uCdimm is a complete computer in an SO-DIMM form factor, built around a Motorola 68VZ328 'Dragonball' microcontroller, the latest processor in a family widely popularised by the Palm Pilot. It is equipped with 2MB of flash memory, 8MB of SDRAM, both synchronous and asynchronous serial ports, and an Ethernet controller. There is a custom resident boot monitor on the device, which is capable of downloading a binary image to flash memory and executing it. The image that is downloaded consists of a uClinux kernel and root filesystem.
In UNIX terms, the kernel makes a block device out of the memory range where the root filesystem resides, and mounts this device as root. The root filesystem is in a read-only UNIX-like format called 'ROMFS'.
Since the Dragonball runs at 32MHz, the kernel and optionally user programs execute in-place in flash memory. Faster systems benefit from copying the kernel and root filesystem into RAM and executing there.
Other embedded systems may be inherently network-based, so a kernel in flash memory might mount a root filesystem being served via network file system (NFS). An even more network-centric device might request its kernel image and root filesystems via dynamic host configuration protocol (DHCP) and bootp. Note that drivers for things like IDE and SCSI disk, CD, and floppy support are all still present in the uClinux kernel.
The contents of the root filesystem vary more dramatically between embedded systems using uClinux than between Linux workstations. The uClinux distribution contains a root filesystem which implements a small UNIX-like server, with a console on the serial port, a telnet daemon, a web server, NFS client support, and a selection of common UNIX tools. A system such as an MPEG layer 3 compressed audio CD player might not even have a console. The kernel might contain only support for a CD drive, parallel I/O, and an audio DAC. User space might consist only of an interface program that drives buttons and LEDs, controls the CD, and invokes one other program: an MPEG audio player. Such an application-specific system would obviously require much less memory than the full-fledged uClinux distribution as it is shipped.
Development under uClinux
Developing software for uClinux systems typically involves a cross-compiler toolchain built from the familiar GNU compiler tools. Software that builds under the GNU C Compiler (GCC) for x86 architectures, for example, often builds without modification on any uClinux target. Debugging a target via the GNU debugger (GDB) presents a debugging interface common to all the platforms supported by GDB.
The debugging interface to a uClinux kernel on a target depends on debugging support for that target. If the target processor has hardware support for debugging, such as IEEE's JTAG or Motorola's BDM, GDB may connect non-intrusively to the target to debug the kernel. If the processor lacks such support, a GDB 'stub' may be incorporated into the kernel. GDB communicates with the stub via a serial port, or via Ethernet.
The C library used in uClinux, uClibc, is a smaller implementation than those which ship with most modern Linux distributions. The library has been designed to provide most of the calls that UNIX-like C programs will use. If an application requires a feature that is not implemented in uClibc, the feature may be added to uClibc, it may be linked in as a separate library, or it may be added to the application itself.
Differences between uClinux and Linux
Considering that the absence of MMU support in uClinux constitutes a fundamental difference from mainstream Linux, surprisingly little kernel and user space software is affected. Developers familiar with Linux will notice little difference working under uClinux. Embedded systems developers will already be familiar with some of the issues peculiar to uClinux. Two differences between mainstream Linux and uClinux are a consequence of the removal of MMU support from uClinux. The lack of both memory protection and of a virtual memory model are of importance to a developer working in either kernel or user space. Certain system calls to the kernel are also affected.
One consequence of operating without memory protection is that an invalid pointer reference by even an unprivileged process may trigger an address error, and potentially corrupt or even shut down the system. Obviously code running on such a system must be programmed carefully and tested diligently to ensure robustness and security.
There are three primary consequences of running Linux without virtual memory. One is that processes which are loaded by the kernel must be able to run independently of their position in memory. One way to achieve this is to “fix up” address references in a program once it is loaded into RAM. The other way is to generate code that uses only relative addressing (referred to as PIC, or Position Independent Code). uClinux supports both of these methods.
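To make the fix-up approach concrete, here is a simplified sketch of what a loader does after copying an image into RAM. The assumptions are illustrative and not taken from any real loader: the image was linked as if it would run at address 0, the relocation table lists byte offsets of 32-bit words holding image-relative addresses, and those words are in host byte order. fixup_image() is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical load-time fix-up: add the actual load address to every
 * 32-bit word listed in the relocation table, turning image-relative
 * addresses into absolute ones. */
void fixup_image(unsigned char *image, const uint32_t *relocs,
                 size_t nrelocs, uint32_t load_base)
{
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t word;

        /* memcpy avoids unaligned accesses on strict-alignment CPUs */
        memcpy(&word, image + relocs[i], sizeof word);
        word += load_base;
        memcpy(image + relocs[i], &word, sizeof word);
    }
}
```

The cost is one pass over the relocation table per load, which is the load-time overhead that PIC avoids.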
Another consequence is that memory allocation and deallocation occurs within a flat memory model. Very dynamic memory allocation can result in fragmentation which can starve the system. One way to improve the robustness of applications that perform dynamic memory allocation is to replace malloc() calls with requests from a preallocated buffer pool.
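A preallocated pool can be as simple as a fixed array of equal-sized blocks threaded onto a free list. A minimal sketch; the block size, block count and function names are hypothetical and would be tuned per application:

```c
#include <stddef.h>

/* Illustrative sizes; a real system would size these for its workload. */
#define POOL_BLOCK_SIZE 256
#define POOL_NBLOCKS    32

/* A block is either on the free list (next) or in use (data). */
typedef union pool_block {
    union pool_block *next;
    unsigned char data[POOL_BLOCK_SIZE];
    max_align_t align;                  /* suitable alignment for any type */
} pool_block;

static pool_block pool_storage[POOL_NBLOCKS];   /* reserved once, up front */
static pool_block *free_list;

/* Thread every block onto the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_NBLOCKS; i++) {
        pool_storage[i].next = free_list;
        free_list = &pool_storage[i];
    }
}

/* Take one block; returns NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    pool_block *b = free_list;

    if (b != NULL)
        free_list = b->next;
    return b;
}

/* Return a block obtained from pool_alloc(); NULL is ignored. */
void pool_release(void *p)
{
    pool_block *b = p;

    if (b == NULL)
        return;
    b->next = free_list;
    free_list = b;
}
```

Because every block has the same size and lives in one static array, allocation and release never fragment the system heap. The trade-offs are a fixed capacity and no locking, so a multithreaded program would need to add its own synchronization.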
Because virtual memory is not used in uClinux, swapping pages in and out of memory is not implemented: it cannot be guaranteed that the pages would be reloaded at the same location in RAM. In embedded systems it is also unlikely that it would be acceptable to suspend an application in order to use more RAM than is physically available.
Changes to the interface
The lack of memory management hardware on uClinux target processors has meant that some changes needed to be made to the Linux system interface. Perhaps the greatest difference is the absence of the fork() and brk() system calls.
A call to fork() clones a process to create a child. Under Linux, fork() is implemented using copy-on-write pages. Without an MMU, uClinux cannot completely and reliably clone a process, nor does it have access to copy-on-write.
uClinux implements vfork() in order to compensate for the lack of fork(). When a parent process calls vfork() to create a child, both processes share all their memory space, including the stack. vfork() then suspends the parent's execution until the child process either calls _exit() or execve().
Note that multitasking is not otherwise affected. It does, however, mean that older-style network daemons that make extensive use of fork() must be modified. Since child processes run in the same address space as their parents, the behaviour of both processes may require modification in particular situations.
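The usual vfork() idiom keeps the child's work to a single exec. A minimal sketch; run_and_wait() is a hypothetical helper, and note that SUSv3 marks vfork() obsolescent (later POSIX editions remove it), though Linux C libraries still provide it:

```c
#define _DEFAULT_SOURCE            /* exposes vfork() in glibc's <unistd.h> */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` in a child created with vfork() and
 * wait for it.  Returns the child's exit status, or -1 on error.
 * The child borrows the parent's memory (including the stack), so it must
 * not return or modify variables; it may only exec or call _exit(). */
int run_and_wait(const char *path, char *const argv[])
{
    int status;
    pid_t pid = vfork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                /* exec failed; 127 mirrors the shell's convention */
    }
    /* The parent only resumes once the child has exec'd or exited. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For a forking network daemon, one common restructuring along these lines is to exec a separate handler program that inherits the connected socket as an open file descriptor, instead of letting the child keep running the parent's code.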
Many modern programs rely on child processes to perform basic tasks, allowing the system to maintain an interactive 'feel' even if the processing load is quite heavy. Such programs may require substantial reworking to perform the same task under uClinux. If a key application depends heavily on such structuring, it may be necessary either to re-create the application or to use an MMU-enabled processor.
A hypothetical, simple network daemon, hyped, will illustrate the use of fork(). hyped always listens on a well-known network port (or socket) for connections from a network client. When the client connects, hyped gives it new connection information (a new socket number) and calls fork(). The child process then accepts the client's reconnection to the new socket, freeing the parent to listen for new connections.
uClinux has neither an autogrow stack nor brk(), so user space programs must use the mmap() system call to allocate memory. For convenience, our C library implements malloc() as a wrapper around mmap(). There is a compile-time option to set the stack size of a program.
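The library's actual malloc() is not shown here; as a hypothetical sketch of the general idea only, an allocator that never calls brk() can give each request its own anonymous mapping and record the mapping length in a small header. mmap_alloc() and mmap_release() are illustrative names:

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS in glibc */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Header stored in front of each block so release can find the mapping
 * length.  The max_align_t member keeps the returned pointer aligned. */
typedef union {
    size_t len;
    max_align_t align;
} block_header;

/* Hypothetical malloc-style allocator with no brk(): each request gets
 * its own anonymous private mapping.  Returns NULL on failure. */
void *mmap_alloc(size_t size)
{
    block_header *h;
    size_t len;

    if (size > SIZE_MAX - sizeof(block_header))
        return NULL;                       /* length would overflow */
    len = sizeof(block_header) + size;
    h = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h == MAP_FAILED)
        return NULL;
    h->len = len;
    return h + 1;                          /* user memory follows the header */
}

/* Release a block obtained from mmap_alloc(); NULL is ignored. */
void mmap_release(void *p)
{
    block_header *h;

    if (p == NULL)
        return;
    h = (block_header *)p - 1;
    munmap(h, h->len);
}
```

One mapping per request is simple but can waste memory on small allocations, since the kernel rounds each mapping up to its own allocation granularity; a production allocator would more likely carve many small requests out of larger mappings.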
Anatomy of the uClinux kernel
This section describes the changes that were made to the Linux kernel to allow it to run on MMU-less processors.
The architecture-generic memory management subsystem was modified to remove reliance on MMU hardware by providing basic memory management functions within the kernel software itself.
For those who are familiar with uClinux, this is the role of the directory /mmnommu derived from and replacing the directory /mm. Several subsystems needed to be modified, added, removed, or rewritten. Kernel and user memory allocation and deallocation routines had to be reimplemented.
Support for transparent swapping/paging was removed. Program loaders which support position independent code (PIC) were added. A new binary object code format, named 'flat', was created, which supports PIC and has a very compact header. Other program loaders, such as the one for ELF, were modified to support formats that, instead of using PIC, use absolute references which the kernel is responsible for 'fixing up' at run time.
Each method has advantages and disadvantages. Traditional PIC is quick and compact but has a size restriction on some architectures. For example, the 16-bit relative jump in Motorola 68k architectures limits PIC programs to 32K. The runtime fix-up technique removes this size restriction, but incurs overhead when the program is loaded by the kernel.
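The 32K figure follows from the range of a signed 16-bit displacement: it can encode offsets from -32768 to +32767 bytes. A minimal sketch; fits_pc16() is hypothetical, and it uses the branch's own address as the reference point, whereas real CPUs (the 68k included) measure from a slightly different PC value:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: can a branch at address `from` reach `to` with a
 * signed 16-bit PC-relative displacement? */
bool fits_pc16(int64_t from, int64_t to)
{
    int64_t disp = to - from;

    return disp >= INT16_MIN && disp <= INT16_MAX;
}
```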
Porting uClinux to new platforms
The task of adding support for a new CPU architecture in uClinux is similar to doing so in Linux proper. Fortunately, there is a great deal of code in Linux that can be ported with minor adaptations and reused in uClinux. Machine dependent startup code and header files already exist in Linux for MMU versions of processors in the ARM, Motorola 68k, MIPS, SPARC and other families. This code may be adapted to support non-MMU versions of these processors in uClinux.
Driver code which already exists in Linux is often easily portable to run under uClinux. Porting problems typically involve byte order (endianness) or memory handling code which assumes the presence of MMU support.
Numerous enhancements are in the works for uClinux. The diversity of the innovations that mainstream Linux receives from the community paves a good path for the development of uClinux. The uClinux developer community is very active; enhancements and innovations are frequently made.
Linux is now a platform for hard real-time application development (that is, applications with deterministic latency under varying processor loads). The Linux kernel scheduler already provides non-deterministic, or 'soft', real-time, and systems such as real-time application interface (RTAI) upgrade the Linux kernel to provide hard (deterministic) real-time support. Real-time applications in Linux have access to the extensive resources of the Linux kernel without sacrificing hard real-time performance. Efforts are underway to provide the RTAI subsystem for use on various MMU-less processors.
uClinux 2.4, with support for Motorola Dragonball and ColdFire, was released in January of 2001. Other ports have been made or are being planned to the uClinux 2.4 tree, which is based on Linux 2.4, but enhancements are also still being made to the uClinux 2.0 tree. uClinux 2.4 will give developers access to many of the new features added to Linux since 2.0, including support for USB, IEEE FireWire, IrDA, and new networking features such as bandwidth allocation (a.k.a. QoS: Quality of Service), IP Tables, and IPv6.
Since uClinux is Open Source, development effort spent on uClinux will never be lost. Engineering professionals worldwide are using uClinux to create commercial products, and a significant portion of their work is contributed back to the open source community.
Why create an MMU-less Linux? As early as 1997, Jeff Dionne, Michael Durrant, and others discussed the possibility of implementing Linux on MMU-less processors to act as a low-cost network controller driving data communications between Ethernet and microwave communication systems. However, it was the collaboration of Kenneth Albanowski and Jeff Dionne that resulted in the world's first release. This early uClinux implementation was deployed into a SCADA controller and publicly released into the open source community as an alternative OS for the Palm Pilot (Feb 1998).
Jeff Dionne and Michael Durrant from Lineo Canada (formerly Rt-Control) later designed and built a line of embedded controllers known as uCsimm and uCdimm (see photo of uCdimm), taking advantage of uClinux's compact code size. Meanwhile, Greg Ungerer, Chief Scientist, Lineo Australia (formerly MoretonBay) ported uClinux onto the popular Motorola ColdFire platform and designed several VPN (Virtual Private Network) Internet appliances, including Lineo's NetTEL and SecureEdge routers (see photo of SecureEdge).
The uClinux kernel has been deployed on several other CPU architectures, and platforms including the AXIS Network Camera from Axis Communications (Sweden), and the Voice over IP (VoIP) telephone from Aplio SA (France). Notable contributions to uClinux have been made by engineers at Lineo, Aplio, Axis, and from individuals in the open source community. These contributions are reflected on Lineo's open source web site (opensource.lineo.com).
Michael Durrant is director of engineering at Lineo and Michael Leslie is a senior software developer at Lineo.
uClinux for Linux Programmers
By David McCullough on Thu, 2004-07-01 01:00. Embedded
Adapt your software to run on processors without memory management—it's easier than you think.
uClinux has seen a huge increase in popularity and is appearing in more commodity devices than ever before. Its use in routers (Figure 1), Web cameras and even DVD players is testimony to its versatility. The explosion of low-cost, 32-bit CPUs capable of running uClinux is providing even more options to manufacturers considering uClinux. Now with uClinux's debut as part of the 2.6 kernel, it is set to become even more popular.
Figure 1. The SnapGear LITE2 VPN/Router runs uClinux.
With more embedded developers facing the possibility of working with uClinux, a guide to its differences from Linux and its traps and pitfalls is an invaluable tool. Here we discuss the changes a developer might encounter when using uClinux and how the environment steers the development process.
No Memory Management
The defining and most prevalent difference between uClinux and other Linux systems is the lack of memory management. Under Linux, memory management is achieved through the use of virtual memory (VM). uClinux was created for systems that do not support VM. As VM usually is implemented using a processing unit called an MMU (memory management unit), you often hear the term NOMMU when traveling in uClinux circles.
With VM, all processes run at the same address, albeit a virtual one, and the VM system takes care of what physical memory is mapped to these locations. So even though the virtual memory the process sees is contiguous, the physical memory it occupies can be scattered around. Some of it even may be on a hard disk in swap. Because arbitrarily located memory can be mapped to anywhere in the process' address space, it is possible to add memory to an already running process.
Without VM, each process must be located at a place in memory where it can be run. In the simplest case, this area of memory must be contiguous. Generally, it cannot be expanded as there may be other processes above and below it. This means that a process in uClinux cannot increase the size of its available memory at runtime as a traditional Linux process would.
Although all programs need to be relocated at run time so that they can execute, this is a fairly transparent task for the developer. It is the direct effect of no VM that is the thorn in every uClinux developer's side. The net effect is that no memory protection of any kind is offered; it is possible for any application or the kernel to corrupt any part of the system. Some CPU architectures allow certain I/O areas, instructions and memory regions to be protected from user programs, but that is not guaranteed. Even worse than the corruption that crashes a system is the corruption that goes unnoticed, and tracking down random interprocess corruption can be extremely difficult.
Without VM, swap is effectively impossible, although this limitation is rarely an issue on the kinds of systems that run uClinux. They often do not have hard drives or enough memory to make swap worthwhile.
To a kernel developer, uClinux offers little in the way of differences from Linux. The only real issue is that you cannot take advantage of the paging support provided by an MMU. In practice, this affects little of the kernel; one exception is tmpfs, which does not work on uClinux because it relies on the VM system.
Similarly, all of the standard executable formats are unsupported, because they make use of VM features that do not exist under uClinux. Instead, a new format is required, the flat format. Flat format is a condensed executable format that stores only executable code and data, along with the relocations needed to load the executable into any location in memory.
Device drivers often need some work when you move to uClinux, not because of differences in the kernels, but due to the kinds of devices the kernel needs to support. For example, the SMC network driver supports ISA SMC cards. They usually are 16-bit and are located at I/O addresses below 0x3ff. The same driver easily can be made to support the non-ISA embedded versions of the chip, but it may need to run in 8-, 16- or 32-bit mode, at an I/O address that is a full 32-bit address and at an interrupt number quite often higher than ISA's maximum of 16. So despite the fact that the bulk of the driver is the same, the hardware specifics can require a little porting effort. Quite often, older drivers store I/O addresses in short format, which does not work on an embedded uClinux platform with devices appearing at memory-mapped I/O addresses.
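The short-format problem is plain integer truncation. A minimal user-space illustration; the function names and the 0x40000300 address are hypothetical (in kernel code the wider field would typically be an unsigned long or a void __iomem * cookie):

```c
/* Legacy layout: ISA port numbers all fit below 0x400, so 16 bits seemed enough. */
unsigned long store_short(unsigned long addr)
{
    unsigned short ioaddr = (unsigned short)addr;  /* upper bits silently lost */
    return ioaddr;
}

/* Portable layout: keep the address in a type wide enough for any address. */
unsigned long store_wide(unsigned long addr)
{
    unsigned long ioaddr = addr;                   /* full address preserved */
    return ioaddr;
}
```

An ISA-style port such as 0x300 survives either layout. A memory-mapped device at 0x40000300 is silently turned into 0x300 by the short version, so the driver ends up touching the wrong address without any compile-time error.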
The implementation of mmap within the kernel is also quite different. Though often transparent to the developer, it needs to be understood so it is not used in ways that are particularly inefficient on uClinux systems. Unless the uClinux mmap can point directly to the file within the filesystem, thereby guaranteeing that it is sequential and contiguous, it must allocate memory and copy the data into the allocated memory. The ingredients for efficient mmap usage under uClinux are quite specific. First, the only filesystem that currently guarantees that files are stored contiguously is romfs. So one must use romfs to avoid the allocation. Second, only read-only mappings can be shared, which means a mapping must be read-only in order to avoid the allocation of memory. The developer under uClinux cannot take advantage of copy-on-write features for this reason. The kernel also must consider the filesystem to be “in ROM”, which means a nominally read-only area within the CPU's address space. This is possible if the filesystem is present somewhere in RAM or ROM, both of which are addressable directly by the CPU. One cannot have a zero allocation mmap if the filesystem is on a hard disk, even if it is a romfs filesystem, as the contents are not directly addressable by the CPU.
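In code, the efficient case described above corresponds to a read-only, shared mapping. A minimal sketch; map_readonly() is a hypothetical helper. Whether the kernel can actually avoid the copy still depends on the file living on romfs in CPU-addressable RAM or ROM, and the same call works, with a copy, everywhere else:

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map a whole file with PROT_READ and MAP_SHARED,
 * the only combination that gives uClinux a chance to point directly at
 * the file instead of allocating and copying.
 * Returns NULL on failure; *len receives the mapping length. */
const void *map_readonly(const char *path, size_t *len)
{
    struct stat st;
    void *p;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                     /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return NULL;
    *len = (size_t)st.st_size;
    return p;
}
```

Requesting PROT_WRITE or MAP_PRIVATE instead would, per the text, force the uClinux kernel to allocate memory and copy the file contents.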
Memory Allocation (Kernel and Application)
uClinux offers a choice of two kernel memory allocators. At first it may not seem obvious why an alternative kernel memory allocator is needed, but in small uClinux systems the difference is painfully apparent. The default kernel allocator under Linux uses a power-of-two allocation method. This helps it operate faster and quickly find memory areas of the correct size to satisfy allocation requests. Unfortunately, under uClinux, applications must be loaded into memory that is set aside by this allocator. To understand the ramifications of this, especially for large allocations, consider that an application requiring a 33KB allocation in order to be loaded actually allocates to the next power of two, which is 64KB. The 31KB of extra space allocated cannot be utilized effectively. This order of memory wastage is unacceptable on most uClinux systems. To combat this problem, an alternative memory allocator has been created for the uClinux kernels. It commonly is known as either page_alloc2 or kmalloc2, depending on the kernel version.
page_alloc2 addresses the power-of-two wastage by using a power-of-two allocator only for allocations up to one page in size (a page is 4,096 bytes, or 4KB). Larger allocations are rounded up to the nearest whole page. In the previous example, a 33KB application is allocated 36KB, a saving of 28KB compared with the 64KB required by the power-of-two allocator.
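The arithmetic in the 33KB example can be checked with a short C sketch. pow2_alloc_size() and page_alloc2_size() are hypothetical names; they model only the size rounding described above, not the real allocator in page_alloc2.c.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u   /* 4KB pages, as in the text */

/* Classic power-of-two allocator: round up to the next power of two. */
static size_t pow2_alloc_size(size_t n)
{
    assert(n > 0);
    size_t s = 1;
    while (s < n)
        s <<= 1;
    return s;
}

/* page_alloc2-style sizing: power of two up to one page,
 * whole pages beyond that. */
static size_t page_alloc2_size(size_t n)
{
    if (n <= PAGE_SIZE)
        return pow2_alloc_size(n);
    return (n + PAGE_SIZE - 1) / PAGE_SIZE * PAGE_SIZE;
}
```

For a 33KB request (33,792 bytes), pow2_alloc_size() gives 65,536 bytes (64KB) and page_alloc2_size() gives 36,864 bytes (36KB), the 28KB difference quoted above.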
page_alloc2 also takes steps to avoid fragmenting memory. It allocates all requests of two pages (8KB) or less from the start of memory upward and all larger requests from the end of free memory downward. This stops transient allocations, such as network buffers, from fragmenting memory and preventing large applications from running. For a more detailed look at memory fragmentation, see the example later in this section. page_alloc2 is not perfect, but it works well in practice, as the embedded environments that run uClinux tend to have a relatively static group of long-lived applications.
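The placement policy can be modeled in a few dozen lines of C. This is a toy with hypothetical names and a made-up 64-page memory tracked in a simple array; it only shows small requests filling from the bottom and large ones from the top, and does not reproduce page_alloc2.c.

```c
#include <assert.h>
#include <stdbool.h>

#define NPAGES 64        /* hypothetical: a tiny 256KB system */
#define SMALL_LIMIT 2    /* requests of two pages (8KB) or less */

static bool used[NPAGES];

/* Is the run [start, start + n) entirely free? */
static bool run_free(int start, int n)
{
    for (int i = start; i < start + n; i++)
        if (used[i])
            return false;
    return true;
}

static void mark(int start, int n, bool v)
{
    for (int i = start; i < start + n; i++)
        used[i] = v;
}

/* Toy placement policy: small requests first-fit from the bottom of
 * memory, large ones from the top down. Returns the first page index,
 * or -1 if no contiguous run is big enough. */
static int toy_alloc(int npages)
{
    assert(npages > 0 && npages <= NPAGES);
    if (npages <= SMALL_LIMIT) {
        for (int s = 0; s + npages <= NPAGES; s++)
            if (run_free(s, npages)) {
                mark(s, npages, true);
                return s;
            }
    } else {
        for (int s = NPAGES - npages; s >= 0; s--)
            if (run_free(s, npages)) {
                mark(s, npages, true);
                return s;
            }
    }
    return -1;
}

static void toy_free(int start, int npages)
{
    mark(start, npages, false);
}
```

On a fresh map, toy_alloc(1) returns page 0, toy_alloc(2) returns page 1, and a nine-page request lands at page 55, at the top of memory, so short-lived small blocks never split the region that large requests need.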
Once the developer gets past the kernel memory allocation differences, the real changes appear in the application space. This is where the full impact of uClinux's lack of VM is realized. The first major difference most likely to cause an application to fail under uClinux is the lack of a dynamic stack. On VM Linux, whenever an application tries to write off the top of the stack, an exception is flagged and some more memory is mapped in at the top of the stack to allow the stack to grow. Under uClinux, no such luxury is available, as the stack size must be fixed when the application is built. This means that the developer, who previously was oblivious to stack usage within the application, must now be aware of the stack requirements. The first thing a developer should consider when faced with strange crashes or behavior in a newly ported application is the allocated stack size. By default, the uClinux toolchains allocate 4KB for the stack, which is close to nothing for modern applications. The developer should try increasing the stack size with one of the following methods:
1. Add FLTFLAGS = -s <stacksize> and export FLTFLAGS to the Makefile for the application before building.
2. Run flthdr -s <stacksize> executable after the application has been built.
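Besides raising the limit, it can help to keep large buffers off the stack altogether. This is general C practice rather than a technique from this article, and median_byte() is a hypothetical example: it needs a scratch copy of its input, and takes that copy from the heap instead of declaring a large local array.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_uchar(const void *a, const void *b)
{
    return (int)*(const unsigned char *)a - (int)*(const unsigned char *)b;
}

/*
 * Return the median of len bytes without modifying the input, or -1 if
 * memory is short. A local "unsigned char copy[16384];" would put 16KB on
 * the stack, four times the 4KB uClinux default, so the scratch copy is
 * taken from the heap and sized to the actual input.
 */
static int median_byte(const unsigned char *src, size_t len)
{
    assert(src != NULL && len > 0);
    unsigned char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, src, len);
    qsort(copy, len, 1, cmp_uchar);
    int median = copy[len / 2];
    free(copy);
    return median;
}
```

The stack frame stays a few dozen bytes regardless of input size; the cost is a heap allocation, which on uClinux comes with the fragmentation caveats discussed below.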
The second major difference that strikes a uClinux developer is the lack of a dynamic heap, the area used to satisfy memory allocations with malloc and related functions in C. On Linux with VM, an application can increase its process size, allowing it to have a dynamic heap. This is traditionally implemented at the low level using the sbrk/brk system calls, which change the size of a process' address space. Library functions such as malloc then manage the extra memory obtained by calling sbrk() on behalf of the application. If an application needs more memory at any point, it can get more simply by calling sbrk() again; it also can release memory using brk(). sbrk() works by adding more memory to the end of a process (increasing its size). brk() can set the end of the process to an arbitrary point, either closer to the start of the process (reducing the process size) or further away (increasing it).
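On VM Linux the traditional mechanism can be observed directly. The sketch assumes glibc (hence _DEFAULT_SOURCE for the sbrk() and brk() declarations); heap_grow() and heap_shrink_to() are hypothetical helpers. Mixing direct sbrk() calls with malloc() in real programs is best avoided, because malloc may manage the same program break. None of this is available on uClinux.

```c
#define _DEFAULT_SOURCE   /* sbrk() and brk() declarations in glibc */
#include <assert.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Grow the heap by n bytes the traditional VM-Linux way and return a
 * pointer to the new region (the old program break), or NULL on failure.
 */
static void *heap_grow(intptr_t n)
{
    assert(n > 0);
    void *old = sbrk(n);
    return old == (void *)-1 ? NULL : old;
}

/* Move the break back to `where`, returning that memory to the system.
 * Returns 0 on success, -1 on failure. */
static int heap_shrink_to(void *where)
{
    return brk(where);
}
```

A malloc built this way hands out pieces of the region between the old and new break, which is exactly what requires the process's memory to be extendable at its end.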
Because uClinux cannot implement the functionality of brk and sbrk, it instead implements a global memory pool that is essentially the kernel's free memory pool. There are pitfalls with this method; for example, a runaway process can use all of the system's available memory. Allocating from the system pool is not compatible with sbrk and brk, as they require memory to be added to the end of a process' address space. Thus, a conventional sbrk-based malloc implementation is no good, and a new implementation is needed.
A global pool approach has some advantages. First, only the amount of memory actually required is used, unlike the pre-allocated heap system that some embedded systems use. This is extremely important on uClinux systems, which generally run with little memory. Another advantage is that memory can be returned to the global pool as soon as it is no longer needed, and the implementation can take advantage of the existing in-kernel allocator for managing this memory, reducing the size of application code.
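One way a malloc can draw on the system's free memory without sbrk() is to request each block with an anonymous mmap(), leaving the bookkeeping to the kernel's allocator. The sketch below is our own illustration with hypothetical names, not the allocator shipped with any uClinux C library; it stores each block's size in a small header so the whole block can be handed back with munmap() as soon as it is freed.

```c
#define _DEFAULT_SOURCE   /* MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Header in front of every block; the union keeps the user pointer
 * aligned for any type. */
typedef union {
    size_t size;          /* total mapping size, header included */
    max_align_t align;
} block_header;

/* Take each allocation straight from the system pool. Zero-size
 * requests return NULL here, a choice made for this sketch. */
static void *pool_malloc(size_t n)
{
    if (n == 0 || n > SIZE_MAX - sizeof(block_header))
        return NULL;
    size_t total = sizeof(block_header) + n;
    block_header *h = mmap(NULL, total, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (h == MAP_FAILED)
        return NULL;
    h->size = total;
    return h + 1;
}

/* Return the whole block to the system immediately. */
static void pool_free(void *p)
{
    if (p == NULL)
        return;
    block_header *h = (block_header *)p - 1;
    assert(h->size > sizeof(block_header));
    munmap(h, h->size);
}
```

The trade-off is a system call per allocation and whatever rounding the kernel applies to each mapping, which is why the allocator's granularity discussed earlier matters so much.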
One of the common problems new users encounter is the missing memory problem. The system is showing a large amount of free memory, but an application cannot allocate a buffer of size X. The problem here is memory fragmentation, and all of the uClinux solutions available at this time suffer from it. Because of the lack of VM in the uClinux environment, it is nearly impossible to utilize memory fully due to fragmentation. This is best explained by example. Suppose a system has 500KB of free memory and one wishes to allocate 100KB to load an application. It is easy to think that this would be possible. However, it is important to remember that one must have a contiguous 100KB block of memory in order to satisfy the allocation. Suppose the memory map looks like this. Each character represents approximately 20KB, and X marks areas allocated or in use by other programs or by the kernel:
[Figure: memory map spanning 0KB to 1000KB (map not reproduced)]
In this case, 500KB is free, but the largest contiguous block is only 80KB. There are many ways to arrive at such a situation. A program that allocates some memory and then frees most of it, leaving a small allocation in the middle of a larger free block, is often the cause. Transient programs under uClinux also can affect where and how memory is allocated. The uClinux page_alloc2 kernel allocator has a configuration option that can help identify this problem: it enables a new /proc entry, /proc/mem_map, that shows pages and their allocation grouping. Documenting this is beyond the scope of this article, but more information can be found in the kernel source for page_alloc2.c.
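The gap between total free memory and the largest usable block is easy to compute for a character map of this kind. The original map did not survive, so example_map below is a made-up 50-character map (20KB per character) chosen only to match the figures in the text; map_stats() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Scan a memory map drawn one character per fixed-size unit, with 'X'
 * for allocated and any other character for free. Reports total free
 * memory and the largest contiguous free block, both in KB.
 */
static void map_stats(const char *map, unsigned kb_per_char,
                      unsigned *free_kb, unsigned *largest_kb)
{
    unsigned total = 0, run = 0, best = 0;
    for (const char *c = map; *c != '\0'; c++) {
        if (*c == 'X') {
            run = 0;            /* an allocated unit ends the free run */
        } else {
            total++;
            run++;
            if (run > best)
                best = run;
        }
    }
    *free_kb = total * kb_per_char;
    *largest_kb = best * kb_per_char;
}

/* Hypothetical 1000KB map: 25 free units (500KB) in total, but no free
 * run longer than four units (80KB). */
static const char example_map[] =
    "XX....X....XX....X....XXX....X....X."
    "XXXXXXXXXXXXXX";
```

For example_map, map_stats() reports 500KB free but an 80KB largest block, so the 100KB request in the example cannot be satisfied.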
The question is often asked, why can't this memory be defragmented so it is possible to load a 100KB application? The problem is that we don't have VM and we cannot move memory being used by programs. Programs usually have references to addresses within the allocated memory regions, and without VM to make the memory always appear to be at the correct address, the program will crash if we move its memory. There is no solution to this problem under uClinux. The developer needs to be aware of the problem and, where possible, try to utilize smaller allocation blocks.
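The advice to use smaller allocation blocks can be illustrated with a buffer built from page-sized chunks, so that growing it never needs one large contiguous block. The structure and function names are hypothetical, and the 4KB chunk size is an assumption chosen to match the page size discussed earlier.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sizing: each allocation request is one 4KB page. */
#define CHUNK_ALLOC 4096u

struct chunk {
    struct chunk *next;
    size_t used;              /* bytes of data[] in use */
    unsigned char data[];     /* flexible array member fills the rest */
};

#define CHUNK_DATA (CHUNK_ALLOC - sizeof(struct chunk))

/* A growable byte buffer made of page-sized chunks. */
struct chunked_buf {
    struct chunk *head, *tail;
    size_t len;
};

/* Append n bytes; returns 0 on success, -1 if a chunk cannot be allocated. */
static int cbuf_append(struct chunked_buf *b, const void *src, size_t n)
{
    const unsigned char *p = src;
    assert(b != NULL);
    while (n > 0) {
        if (b->tail == NULL || b->tail->used == CHUNK_DATA) {
            struct chunk *c = malloc(CHUNK_ALLOC);
            if (c == NULL)
                return -1;
            c->next = NULL;
            c->used = 0;
            if (b->tail != NULL)
                b->tail->next = c;
            else
                b->head = c;
            b->tail = c;
        }
        size_t room = CHUNK_DATA - b->tail->used;
        size_t take = n < room ? n : room;
        memcpy(b->tail->data + b->tail->used, p, take);
        b->tail->used += take;
        b->len += take;
        p += take;
        n -= take;
    }
    return 0;
}

/* Free every chunk and reset the buffer to empty. */
static void cbuf_free(struct chunked_buf *b)
{
    struct chunk *c = b->head;
    while (c != NULL) {
        struct chunk *next = c->next;
        free(c);
        c = next;
    }
    b->head = b->tail = NULL;
    b->len = 0;
}
```

This trades contiguity for a little bookkeeping: code that needs one flat array cannot use it directly, but a fragmented system can still satisfy every page-sized request.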
Applications and Processes
Another difference between VM Linux and uClinux is the lack of the fork() system call. This can require quite a lot of work on the developer's part when porting applications that use fork(). The only option under uClinux is to use vfork(). Although vfork() shares many properties with fork(), the differences are what matter the most.
fork() and vfork(), for those unfamiliar with these system calls, allow a process to split into two processes, a parent and a child. A process can split many times to create multiple children. When a process calls fork(), the child is a duplicate of the parent in all ways, but it shares nothing with the parent, and both can operate independently. With vfork() this is not the case. First, the parent is suspended and cannot continue executing until the child exits or calls exec(), the system call used to start a new application. The child, directly after returning from vfork(), is running on the parent's stack and is using the parent's memory and data. This means the child can corrupt the data structures or the stack in the parent, resulting in failure. This is avoided by ensuring that the child never returns from the current stack frame once vfork() has been called and that it calls _exit() when finishing; exit() cannot be called, as it changes data structures in the parent. The child also must avoid changing any information in global data structures or variables, as such changes may break the execution of the parent.
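These rules lead to a fairly fixed pattern for spawning a program with vfork(): the child calls an exec function immediately and uses _exit() if that fails. run_with_vfork() is a hypothetical helper written for a Linux/glibc host, where vfork() is also available, so the pattern can be tried there before moving to uClinux.

```c
#define _DEFAULT_SOURCE   /* vfork() declaration in glibc */
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] (searched in PATH) with vfork() and wait for it.
 * Returns the child's exit status, 127 if exec failed, or -1 on error.
 * After vfork() the child only calls execvp() or _exit(): it must not
 * return from this function, touch globals, or call exit().
 */
static int run_with_vfork(char *const argv[])
{
    pid_t pid = vfork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(127);   /* exec failed; _exit() leaves the parent's state alone */
    }

    /* The parent resumes only once the child has exec'd or exited. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Anything the child must do before exec, such as redirecting file descriptors, has to fit within these restrictions, which is where conversions from fork() become difficult.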
Making an application use vfork instead of fork usually falls into either the absolutely simple or the incredibly difficult category. Generally, if the application does not call exec() almost immediately after forking, it needs to be checked carefully before fork() can be replaced with vfork().
The uClinux flat executable format, though it doesn't directly affect applications and their operations, does allow quite a few options that the usual ELF executables under Linux do not. Flat format executables come in two basic flavors, fully relocated and a variation of position-independent code (PIC). The fully relocated version has relocations for its code and data, while the PIC version generally needs only a few relocations for its data.
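What "fully relocated" means can be illustrated with a simplified loader step: each relocation entry names a word in the image that holds an image-relative address, and the loader adds the address the image was actually loaded at. This is a conceptual sketch with hypothetical names and a made-up layout, not the kernel's bFLT loader, which has further details to handle, such as byte order, separate segments and, for PIC binaries, a global offset table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Simplified relocation pass: every entry in relocs[] is the byte offset
 * of a 32-bit word holding an address relative to the image start.
 * Adds load_base to each such word. Returns 0, or -1 if an entry points
 * outside the image.
 */
static int apply_relocs(unsigned char *image, size_t image_len,
                        const uint32_t *relocs, size_t nrelocs,
                        uint32_t load_base)
{
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t off = relocs[i];
        if (off > image_len || image_len - off < sizeof(uint32_t))
            return -1;
        uint32_t word;
        memcpy(&word, image + off, sizeof word);   /* may be unaligned */
        word += load_base;
        memcpy(image + off, &word, sizeof word);
    }
    return 0;
}
```

A fully relocated executable needs an entry for every such word in both code and data; the PIC variant needs far fewer, which is what makes sharing its code, and XIP, possible.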
One of the most advantageous features to the embedded developer is execute-in-place (XIP). This is where the application executes directly from Flash or ROM, requiring the absolute minimum of memory, because only the memory for the data of the application is needed. This allows the text or code portion to be shared between multiple instances of the application. Not all uClinux platforms are capable of XIP, as it requires compiler support and the PIC form of the flat executable. So unless the toolchain for a given platform can do PIC, it cannot do XIP. Currently, only the m68k and ARM toolchains provide the required level of support for flat format XIP. romfs is the only filesystem to support XIP under uClinux, because the application must be stored contiguously within the filesystem for XIP to be possible.
The flat format also defines the stack size for an application as a field in the flat header. To increase the stack allocated to an application, a simple change of this field is all that is required. This can be done with the flthdr command, like this:
flthdr -s <stacksize> flat-executable
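To see what flthdr changes, the stack size can be read straight from the header. The layout is an assumption based on our reading of the kernel's flat.h: a 4-byte "bFLT" magic followed by big-endian 32-bit fields (rev, entry, data_start, data_end, bss_end, stack_size, ...), putting stack_size at byte offset 24. flat_stack_size() is hypothetical, so check the offsets against your own toolchain's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed offset of the big-endian stack_size field in the flat header. */
#define FLAT_STACK_SIZE_OFFSET 24

/* Decode a big-endian 32-bit value. */
static uint32_t be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

/* Returns the stack size recorded in the header, or 0 if the buffer is
 * too short or does not start with the "bFLT" magic. */
static uint32_t flat_stack_size(const unsigned char *hdr, size_t len)
{
    if (len < FLAT_STACK_SIZE_OFFSET + 4 || memcmp(hdr, "bFLT", 4) != 0)
        return 0;
    return be32(hdr + FLAT_STACK_SIZE_OFFSET);
}
```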
The flat format also allows two compression options. The entire executable can be compressed, providing maximum ROM savings; this also has the often useful side effect that the application is loaded entirely into a contiguous RAM block. Alternatively, you may choose data-segment-only compression, which is important if you want to save ROM space but still want the option to utilize XIP. The command:
flthdr -z flat-executable
creates a fully compressed executable, and
flthdr -d flat-executable
compresses only the data segment.
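The compression choice is likewise recorded in the header, in its flags word. The offset (40) and the bit values (0x4 for whole-file compression, 0x8 for data-only) are assumptions taken from our reading of flat.h, as is the assumption that flthdr -z and -d set those two bits respectively; flat_compression_of() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed header offset and flag bits (verify against your toolchain). */
#define FLAT_FLAGS_OFFSET 40
#define FLAT_FLAG_GZIP    0x0004u   /* whole file compressed (flthdr -z) */
#define FLAT_FLAG_GZDATA  0x0008u   /* data segment only (flthdr -d) */

enum flat_compression {
    FLAT_NOT_FLAT = -1,
    FLAT_UNCOMPRESSED,
    FLAT_FULL,
    FLAT_DATA_ONLY
};

/* Classify a header buffer by its compression flags. */
static enum flat_compression flat_compression_of(const unsigned char *hdr,
                                                 size_t len)
{
    if (len < FLAT_FLAGS_OFFSET + 4 || memcmp(hdr, "bFLT", 4) != 0)
        return FLAT_NOT_FLAT;
    const unsigned char *p = hdr + FLAT_FLAGS_OFFSET;
    uint32_t flags = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                     (uint32_t)p[2] << 8 | (uint32_t)p[3];   /* big-endian */
    if (flags & FLAT_FLAG_GZIP)
        return FLAT_FULL;
    if (flags & FLAT_FLAG_GZDATA)
        return FLAT_DATA_ONLY;
    return FLAT_UNCOMPRESSED;
}
```

Only FLAT_UNCOMPRESSED and FLAT_DATA_ONLY binaries keep the option of running their code in place.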
Although a complete discussion of shared libraries is beyond the scope of this article, they are quite different under uClinux. The currently available solutions require compiler changes and care on the part of the developer. The best way to learn how to create shared libraries is to start with an example. The current uClinux distributions provide shared libraries for both the uC-libc and uClibc libraries. The method for creating a shared library isn't difficult, and both of these libraries provide a good, clean example of how it is done. Be aware that the GCC -shared option plays no part in creating these libraries, so do not expect the process to look familiar. Shared libraries under uClinux are flat format executables, just like applications, and to be truly shared they must be compiled for XIP. Without XIP, each application using a shared library gets its own full copy of the library, which is worse than statically linking your applications.
Moving from Linux to uClinux often involves more than the differences between the two systems. uClinux systems tend to be more deeply embedded, with smaller memories and ROM footprints and an unusual array of devices. The loss of a hard drive and the tight resource limits, coupled with no memory protection and a number of other subtle differences, can make a developer's first adventure into uClinux more difficult than imagined. The best way to get started is to look at the available uClinux emulators (Figure 2) and inexpensive hardware options (Figure 3).
Figure 2. uClinux Running under Xcopilot (Palm Emulator)
Figure 3. uClinux Running on a Real Palm IIIx (with Microwindows)
Hopefully, highlighting these issues will help the wary developer be prepared beforehand and avoid some of the common pitfalls and misconceptions of working with uClinux.
Resources for this article: www.linuxjournal.com/article/7546.
David McCullough is a senior software engineer and a veteran embedded software developer. Prior to working at SnapGear and Lineo, he held software development and engineering management positions at Stallion Technologies and was involved in the development of products based on SCO and BSD UNIX. David ported and maintained XFree86 on SCO UNIX for several years and recently was instrumental in the development of the uClinux port of Linux 2.6.