Swapping via NFS for Linux
Status of this project.
Unfortunately, I no longer have the time to maintain the patches. A pity; it was quite a lot of fun.
Which of the patches are considered to be stable?
In short: NONE. I say this because the patches have not been tested very much. I use an old 486 machine with 8 MB of RAM for quick testing, and it was able to run a recent Linux distribution with X without deadlocking.
Use at your own risk. Comments, bug reports and bug fixes are welcome.
How to apply the patch files
To apply the patches below, change to the location of your kernel source tree, e.g.
/usr/src/hacking/linux/
and run the patch command like this:
gunzip -c /usr/src/linux-2.2.14-nfs-swap.diff.gz | patch -p1 -l -s
Replace /usr/src/linux-2.2.14-nfs-swap.diff.gz with the name of the patch file you are actually using. The example above assumes that you have downloaded the patch file into /usr/src/.
After applying the patch, you have to reconfigure your kernel. Run make menuconfig or make xconfig in the top-level directory of the patched kernel tree, then enable the following configuration options:
- Code maturity level options
  - Prompt for development and/or incomplete code/drivers
- Networking options
  - Swapping via network sockets (EXPERIMENTAL)
- Filesystems/Network File Systems
  - Swapping via NFS (EXPERIMENTAL)
Your computer probably doesn't have a hard disk, so you can also disable the Swapping to block devices option in the Filesystems menu. This saves a little memory, about 2 KB. You will probably need to enable the following options as well:
- Networking options
  - IP: kernel level autoconfiguration
  - IP: DHCP support
  - IP: BOOTP support
  - IP: RARP support
- Filesystems/Network File Systems
  - Root file system on NFS
You probably need only one of DHCP, BOOTP and RARP, depending on your setup.
How to enable swapping using the NFS protocol
After you have recompiled and installed the kernel in the usual way, reboot. Afterwards you can enable swapping to files located on an NFS server using the commands dd, mkswap and swapon, which are available on virtually any Linux machine. Proceed as follows (customise to your needs!). The example assumes that your machine also gets its root file system via NFS.
dd if=/dev/zero of=/SWAPFILE bs=1k count=20480
mkswap /SWAPFILE 20480
swapon /SWAPFILE
That's it. You have created a 20MB swap-file and told your kernel to use it. Please refer to the man-pages for the respective programs for more information (man 8 swapon, man 8 mkswap).
Pitfalls
Swapping via NFS is really slow. It is also insecure, as your memory pages travel unencrypted over the network. You have been warned.
- Because of the way the Linux kernel hands data over to the network, the NFS volume that holds the swap-files must be mounted with an rsize and wsize well below the page size used by the kernel (e.g. 4 KB on Intel machines). Otherwise your machine may run out of memory due to memory fragmentation.
The network layer needs contiguous areas of RAM for its data, and the NFS layer needs slightly more space than one page to transfer a page to the NFS server. Sending a whole page in one request therefore requires a contiguous allocation larger than a page, which can fail once memory is fragmented. You can set rsize and wsize as mount options, for example:
mount -t nfs -o rsize=2048,wsize=2048 YOUR_SERVER_IP:/var/stage/swapvolume /swapfiles
This example would mount the server-side directory /var/stage/swapvolume on the local mount point /swapfiles (on the client).
Note, however, that setting rsize and wsize below the page size of your Linux system disables asynchronous I/O, which is a severe performance hit.
One solution is to mount a separate volume for the swap-files, or to create the swap-files on a volume that does not hold large, frequently accessed files (e.g. large program files), so that the small rsize and wsize affect mainly swap traffic.
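As a rough aid for choosing these mount options, the following C sketch reads the page size with sysconf(_SC_PAGESIZE) and derives an rsize/wsize value from it. Taking half the page size is an assumption of mine, chosen only because it reproduces the 2048-byte example above for 4 KB pages; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Page size of the running system in bytes (4096 on typical Intel machines). */
long current_page_size(void)
{
    return sysconf(_SC_PAGESIZE);
}

/* Hypothetical rule of thumb: use half a page for rsize/wsize, which
 * leaves room for the NFS/RPC headers within a single page. */
long suggest_swap_xfer_size(long page_size)
{
    assert(page_size > 0);
    return page_size / 2;
}

/* Formats the -o argument for mount(8), e.g. "rsize=2048,wsize=2048".
 * Returns the length snprintf() would have written. */
int format_swap_mount_opts(char *buf, size_t len, long xfer_size)
{
    return snprintf(buf, len, "rsize=%ld,wsize=%ld", xfer_size, xfer_size);
}
```

On a machine with 4 KB pages, suggest_swap_xfer_size(current_page_size()) yields 2048. Keep in mind the trade-off described above: any value below the page size also gives up asynchronous I/O.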
Because swapping over the network itself needs memory, it may be a good idea to use the sysctl interface to increase the number of pages that normal processes are not allowed to use up, i.e. to tune the contents of /proc/sys/vm/freepages. The meaning of this file is described in Documentation/sysctl/vm.txt in the Linux kernel source tree.
Note, however, that /proc/sys/vm/freepages is read-only and meaningless on recent v2.4 kernels; see /usr/src/linux/Documentation/vm/ and related documentation instead.
The patch also adds a new entry, /proc/sys/net/swapping/threshold. I use a value of 32 (meaning 32 pages); during system boot one of my scripts does the following:
echo 32 > /proc/sys/net/swapping/threshold
Customize the value according to your experience: if your machine runs out of memory, try increasing it (but don't set it too high ...).
Only for 2.2.* kernels: this value should be set somewhat below the parameter free_pages_min, i.e. the first number in /proc/sys/vm/freepages. The default equals free_pages_min.
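On a 2.2 kernel the relation between the two entries can be automated. The C sketch below parses free_pages_min from the first field of a /proc/sys/vm/freepages line and suggests a threshold below it. The text only says "somewhat below"; using three quarters of free_pages_min is an arbitrary assumption, and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parses free_pages_min, the first number of a 2.2-style freepages line
 * such as "128 256 384". Returns 0 on success, -1 on malformed input. */
int parse_free_pages_min(const char *line, long *min_out)
{
    char *end;
    errno = 0;
    long value = strtol(line, &end, 10);
    if (end == line || errno != 0 || value < 0)
        return -1;
    *min_out = value;
    return 0;
}

/* Reads free_pages_min from a file, normally /proc/sys/vm/freepages.
 * Returns -1 if the file cannot be opened or its first line cannot be parsed. */
int read_free_pages_min(const char *path, long *min_out)
{
    char line[128];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_free_pages_min(line, min_out) : -1;
}

/* Assumed rule of thumb: three quarters of free_pages_min, i.e. "somewhat below" it. */
long suggest_swap_threshold(long free_pages_min)
{
    assert(free_pages_min >= 0);
    return free_pages_min - free_pages_min / 4;
}
```

The suggested value could then be written to /proc/sys/net/swapping/threshold in the same way as with the echo command shown earlier.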
Implementation notes
The main problem is that receiving data packets via the network consumes memory itself: the network device layer first has to copy each packet into the system's RAM. Only after it has been copied into a previously allocated memory block can its contents be inspected to decide what to do with it.
This means that any machine using a network-based swapping mechanism could be brought to its knees simply by flood-pinging it.
The present implementation of the NFS-Swap patch resolves this problem by dropping network packets not needed for swapping when running out of memory, i.e. when the number of free pages falls below a configurable threshold. There is a new sysctl entry /proc/sys/net/swapping/threshold which specifies that threshold. It defaults to the parameter free_pages_min, i.e. the first number of /proc/sys/vm/freepages.
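The drop decision itself fits in a few lines. The following is a hypothetical user-space model of the policy just described, not code from the patch: the struct and function names are mine. In the kernel the decision is made for a packet that already occupies memory, which is then freed instead of being queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the receive-side policy; not the patch's code. */
struct rx_policy {
    long threshold;  /* free-page threshold, cf. /proc/sys/net/swapping/threshold */
};

/* Returns true if an incoming packet should be dropped.
 * free_pages:   number of free pages when the packet arrives.
 * for_swapping: whether the packet belongs to a socket used for swapping. */
bool should_drop_packet(const struct rx_policy *policy, long free_pages,
                        bool for_swapping)
{
    assert(policy != NULL);
    if (for_swapping)
        return false;                      /* packets needed for swapping are kept */
    return free_pages < policy->threshold; /* low on memory: drop everything else */
}
```

With a threshold of 32 pages, ping packets arriving while only 10 pages are free would be dropped in this model, whereas replies arriving on the swap socket would still be delivered.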
There is a new socket option, SO_SWAPPING. A socket that is to be used for swapping should have this option set at the SOL_SOCKET level (see setsockopt(2)). My NFS-swap implementation sets this option when swapon is called for a file located on an NFS volume. This might be of some help when trying to swap to Pavel Machek's nbd (Network Block Device).
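From user space, for instance in an nbd client, marking a socket would presumably look like the C sketch below. SO_SWAPPING is only defined by the headers of a kernel carrying this patch and its numeric value is not given here, so the code checks for the macro and otherwise fails with ENOPROTOOPT. Passing the integer 1 to enable the option is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

/* Marks fd as a socket used for swapping.
 * Returns 0 on success, -1 with errno set on failure. */
int mark_swap_socket(int fd)
{
    assert(fd >= 0);  /* caller must pass an open socket descriptor */
#ifdef SO_SWAPPING
    int enable = 1;   /* assumed: boolean option, non-zero enables it */
    return setsockopt(fd, SOL_SOCKET, SO_SWAPPING, &enable, sizeof enable);
#else
    /* Unpatched headers: the option does not exist on this system. */
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```

For swap files on NFS volumes no such user-space code is needed, since the patch sets the option itself when swapon is called.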
I have changed mm/page_io.c and mm/swapfile.c so that they no longer access block devices or files themselves, but instead use a list of swap methods, which must register and unregister themselves using the register_swap_method() and unregister_swap_method() calls. Methods are identified by their name. Currently, the well-known blkdev and blkdev file methods (swapping to hard disk partitions and to files located on such partitions) and the nfs file method are defined.
The files containing the implementation of these methods are fs/blkdev_swap.c and fs/nfs/nfsswap.c.
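The registration scheme can be pictured as a simple list of named entries. The C sketch below is a hypothetical user-space model: it reuses the names register_swap_method() and unregister_swap_method() from the text, but the struct layout, the rw_page hook and the return codes are assumptions rather than the patch's actual interface, and a kernel version would additionally need locking.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical swap method descriptor; the patch's real layout may differ. */
struct swap_method {
    const char *name;                                            /* methods are looked up by name */
    int (*rw_page)(int write, unsigned long offset, void *page); /* assumed page I/O hook */
    struct swap_method *next;                                    /* link in the registered list */
};

static struct swap_method *swap_methods;  /* head of the list, initially empty */

/* Returns the registered method with the given name, or NULL if there is none. */
struct swap_method *find_swap_method(const char *name)
{
    for (struct swap_method *m = swap_methods; m != NULL; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m;
    return NULL;
}

/* Adds m to the list. Returns 0, or -1 if the name is already registered. */
int register_swap_method(struct swap_method *m)
{
    assert(m != NULL && m->name != NULL);
    if (find_swap_method(m->name) != NULL)
        return -1;
    m->next = swap_methods;
    swap_methods = m;
    return 0;
}

/* Removes m from the list. Returns 0, or -1 if m was not registered. */
int unregister_swap_method(struct swap_method *m)
{
    for (struct swap_method **link = &swap_methods; *link != NULL; link = &(*link)->next) {
        if (*link == m) {
            *link = m->next;
            m->next = NULL;
            return 0;
        }
    }
    return -1;
}
```

In this model, swapon would look up the appropriate method by name and route page I/O through its hook instead of touching block devices or files directly.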
Where to download the patch files from
Protocol  Location
HTTP      http://sourceforge.net/project/showfiles.php?group_id=82543
Old web page (and patches)
Can be found here.
Links
- http://www.netboot.info
The package that contained the original nfs-swap patch. See also Credits below.
- Etherboot Project
I used the etherboot package (now maintained by Ken Yap) when I started implementing the NFS-swap patch for my 8 MB i486 test machine (which is gone now ...). The page http://etherboot.sourceforge.net/related.html gives a good overview of other packages for booting diskless machines.
- Linux NFS project at SourceForge
Tom Dyas' nfs-swap patch for Linux-2.2.17 applies on top of the Trond and DHiggen patches for 2.2.17 available at SourceForge.
- Pavel Machek's Network Block Device
Visit Pavel's nbd page. His hacks to allow swap files to be located on nbd volumes are partly based on my nfs-swap patches. His approach is probably superior to mine as the network block device protocol he has developed is faster than the NFS protocol.
Stories of Success
- Matt Boytim
has reported that linux-2.4.11-nfs-swap.diff [no longer available] works fine for his SH3 machine. Matt Boytim:
[snip] The sh3 board has 16Mbyte of memory and no local storage (no local drives so both nfs root and nfs swap). On the sh3 board I was telenet'ed in three times - two sessions were running large compiles (one was the kernel) and the other running top. From two other machines on the network I ran four flood pings plus one tcpspray - the sh3 system did not hang and in fact seemed quite normal. The swap file was about 6 meg. [snip]
Credits.
- Pavel Machek
Fruitful discussions about one year ago when I started to really implement the thing and to make it more or less stable ...
- Volker Seebode
For testing and using the stuff in professional environments.
- Tom Dyas
For porting the stuff to Linux v2.2.17.
- Jens Wilke
For fixing bugs in the patch for 2.2.18.
- Many others ...
- ... and last but not least: Gero Kuhlmann
AFAIK, the first patch was written by him. I actually fetched his patch from the etherboot-2.0 package, but the original patch was included in a netboot-nfs.tar.gz archive that was uploaded to SunSITE on February 5th, 1995 ("the" SunSITE server at .unc.edu, where most Linux-related software could be found in those days; SunSITE, however, is a network of servers spread over the Internet).
However, his patch barely addressed the deadlocks under high network load, which are the major problem when swapping over network connections. My main concern was to resolve those deadlock problems (see Implementation notes above).
The successor of the netboot-nfs package can still be found at http://www.netboot.info.
Last modified: Wed Mar 2 00:53:11 CET 2005