<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>pixelchaos.net &#187; AoE</title>
	<atom:link href="http://www.pixelchaos.net/category/aoe/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.pixelchaos.net</link>
	<description>random bits for your terminal</description>
	<lastBuildDate>Tue, 29 Sep 2009 11:38:59 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>install debian directly onto an AoE root filesystem</title>
		<link>http://www.pixelchaos.net/2009/05/25/install-debian-directly-onto-an-aoe-root-filesystem/</link>
		<comments>http://www.pixelchaos.net/2009/05/25/install-debian-directly-onto-an-aoe-root-filesystem/#comments</comments>
		<pubDate>Tue, 26 May 2009 02:23:16 +0000</pubDate>
		<dc:creator>jcl</dc:creator>
				<category><![CDATA[AoE]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[HowTo]]></category>

		<guid isPermaLink="false">http://www.pixelchaos.net/?p=131</guid>
		<description><![CDATA[Something that just about no one out there seems to be doing (yet) is trying to install Debian directly onto network block devices. The Debian installer doesn&#8217;t support it (yet), grub doesn&#8217;t support it (usually), and it&#8217;s just generally not an easy thing to do.
Now, there are quite a few ways around this problem. You [...]]]></description>
			<content:encoded><![CDATA[<p>Something that just about no one out there seems to be doing (yet) is trying to install Debian directly onto network block devices. The Debian installer doesn&#8217;t support it (yet), grub doesn&#8217;t support it (usually), and it&#8217;s just generally not an easy thing to do.</p>
<p>Now, there are quite a few ways around this problem. You can install to a &#8216;real&#8217; computer and migrate the installation to a network block device. You can use debootstrap in place of the actual Debian installation system. You can use a combination of these two methods, NFS root filesystems, TFTP hacks, etc. All of these solutions are lacking in my opinion. I want to run the &#8216;real&#8217; Debian installer against a network block device and boot my physical hardware using only the built-in PXE booting capability of the BIOS.</p>
<p>Taking all these issues as a personal challenge, I&#8217;ve outlined below how to go about using the regular old Debian Lenny installer directly against an AoE block device.<br />
<span id="more-131"></span></p>
<p>First of all, you&#8217;ll need the full installer disc 1 for Lenny. We need one of the .debs that&#8217;s on that CD. The netinst or businesscard installers will not work with the steps below. (NOTE: If you want to be really creative, I suppose you could download the .deb from within the installer system via wget or even get it beforehand and put it on a thumb drive or something&#8230;)</p>
<p>1. Boot the Lenny install CD just to the point that /cdrom is mounted. For me, that was the &#8220;Configure the network&#8221; prompt.</p>
<p>2. Switch to a virtual terminal by hitting alt-f2.</p>
<p>3. Unpack the .deb that contains the aoe kernel module</p>
<pre class="brush: plain;">udpkg --unpack /cdrom/pool/main/l/linux-2.6/linux-image-2.6.26-2-486_2.6.26-15_i386.deb</pre>
<p>4. Now load the kernel module we just unpacked</p>
<pre class="brush: plain;">insmod /lib/modules/2.6.26-2-486/kernel/drivers/block/aoe/aoe.ko</pre>
<p>5. Now we create a symlink to the AoE block device. It&#8217;s important that we use a symlink at this point in the process since it causes the installer to set up a lot of paths correctly.</p>
<pre class="brush: plain;">ln -s /dev/etherd/eX.X /dev/sda</pre>
<p>6. Hit alt-f1 and continue with installation until the point that grub fails to install, then switch back to terminal 2.</p>
<p>7. Now chroot into the mostly installed system</p>
<pre class="brush: plain;">chroot /target</pre>
<p>8. Install the aoetools package</p>
<pre class="brush: plain;">apt-get -y install aoetools</pre>
<p>9. Listed below are two scripts that you&#8217;ll need to get into your chroot environment. They are used to build a new custom initrd image. I find it easiest to put them on a webserver somewhere and wget them.</p>
<pre class="brush: plain;">wget http://www.pixelchaos.net/aoe/hooks/aoetools -O /etc/initramfs-tools/hooks/aoetools
chmod 755 /etc/initramfs-tools/hooks/aoetools
wget http://www.pixelchaos.net/aoe/scripts/local-top/aoetools -O /etc/initramfs-tools/scripts/local-top/aoetools
chmod 755 /etc/initramfs-tools/scripts/local-top/aoetools</pre>
<p>10. Now we regenerate the initrd image</p>
<pre class="brush: plain;">update-initramfs -u -k all</pre>
<p>11. Now that all the above is done, we can manually install grub &#8211; which the Debian installer was unable to do.</p>
<pre class="brush: plain;">apt-get -y install grub
mkdir /boot/grub
cp -r /usr/lib/grub/i386-pc/. /boot/grub</pre>
<p>12. Now we can remove our old symlink and create some hard links instead. This is required to allow grub to modify the MBR of our block device.</p>
<pre class="brush: plain;">rm /dev/sda
ln /dev/etherd/eX.X /dev/sda
ln /dev/etherd/eX.Xp1 /dev/sda1
ln /dev/etherd/eX.Xp5 /dev/sda5</pre>
<p>13. Now update grub&#8217;s device.map</p>
<pre class="brush: plain;">echo &quot;(hd0) /dev/sda&quot; &gt; /boot/grub/device.map</pre>
<p>14. Now run &#8220;grub&#8221; and issue the following commands</p>
<pre class="brush: plain;">device (hd0) /dev/sda
root (hd0,0)
setup (hd0)</pre>
<p>15. Now issue the &#8220;update-grub&#8221; command which will generate /boot/grub/menu.lst for the first time</p>
<p>16. Remember that grub has been using the hard link to /dev/sda this whole time. Change the root=/dev/sda1 line in /boot/grub/menu.lst to root=/dev/etherd/eX.Xp1, then issue the &#8220;update-grub&#8221; command again to commit the change to the boot stanzas.</p>
<p>17. For the sake of completeness, let&#8217;s change grub&#8217;s device.map back to its original contents.</p>
<pre class="brush: plain;">echo &quot;(hd0) /dev/etherd/eX.X&quot; &gt; /boot/grub/device.map</pre>
<p>18. Get out of the chroot environment by issuing the &#8220;exit&#8221; command.</p>
<p>19. Go back to the installer and choose &#8220;Continue without boot loader&#8221; to finish the installation and reboot the computer.</p>
<p>That&#8217;s it! Assuming you have properly set up a gPXE boot environment everything should &#8220;just work&#8221;. Setting up gPXE is outside the scope of this document but some good notes are available on the etherboot web site (see the links below).</p>
<p>Please note that there is one caveat to this whole process. Grub does not support installation onto AoE block devices. There very well may be some issues the next time you try to upgrade your kernel if some post-inst script calls grub-install against your root device. Keep that in mind.</p>
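<p>The edit in step 16 can be sketched as a small filter. This is only an illustration under a couple of assumptions: rewrite_root is a helper name I made up, and /dev/etherd/e0.1p1 is a stand-in for whatever your real eX.Xp1 device is.</p>

```shell
# Hypothetical helper (not part of the original steps): reads menu.lst text
# on stdin and writes it to stdout with root=/dev/sda1 pointed at the AoE
# partition given as the first argument.
rewrite_root() {
    sed "s|root=/dev/sda1|root=$1|"
}

# Example with a made-up line; /dev/etherd/e0.1p1 is a placeholder device.
printf '# kopt=root=/dev/sda1 ro\n' | rewrite_root /dev/etherd/e0.1p1
```

<p>In practice you would run it over a copy of /boot/grub/menu.lst, check the result, move it into place, and then run &#8220;update-grub&#8221; again as described in step 16.</p>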
<p><strong>aoe-hooks script</strong></p>
<pre class="brush: bash;">
#!/bin/sh

set  -e

PREREQ=&quot;&quot;

prereqs()
{
    echo &quot;$PREREQ&quot;
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

. /usr/share/initramfs-tools/hook-functions

[ -x /sbin/aoe-discover ] &amp;&amp; copy_exec /sbin/aoe-discover /sbin
manual_add_modules aoe
</pre>
<p><strong>aoe-local-top script</strong></p>
<pre class="brush: bash;">
#!/bin/sh

set -e

PREREQ=&quot;udev&quot;

prereqs()
{
    echo &quot;$PREREQ&quot;
}

case $1 in
prereqs)
    prereqs
    exit 0
    ;;
esac

case $ROOT in
/dev/etherd/e*)
    # Bring up every ethX interface so the AoE target is reachable
    INTERFACES=`sed -ne '/eth.*:/{s/:.*$//;p;}' &lt; /proc/net/dev`
    for i in $INTERFACES; do
        echo &quot;Bringing up interface $i for AoE&quot;
        ifconfig &quot;$i&quot; up
    done
    # Make sure udev has processed all events from adding the NIC
    # modules before loading aoe
    [ -x /sbin/udevsettle ] &amp;&amp; /sbin/udevsettle --timeout=30
    modprobe aoe
    # Wait until aoe device files have been generated.
    [ -x /sbin/udevsettle ] &amp;&amp; /sbin/udevsettle --timeout=30
    aoe-discover
    ;;
esac
</pre>
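<p>The sed expression in the local-top script is probably the least obvious part, so here is a rough sketch of what it does, fed with made-up /proc/net/dev style input instead of the real file. The function name list_eth_ifaces and the extra whitespace trim are my additions for illustration; the script above relies on shell word splitting to drop the leading spaces instead.</p>

```shell
# Hypothetical wrapper around the sed expression from the local-top script.
# Reads /proc/net/dev style text on stdin and prints one ethX name per line.
list_eth_ifaces() {
    # Keep only lines mentioning ethX, cut everything from the colon onward,
    # then drop the padding spaces that /proc/net/dev puts before the name.
    sed -ne '/eth.*:/{s/:.*$//;p;}' | tr -d ' '
}

# Made-up sample input: the loopback line is skipped, eth0 and eth1 are kept.
printf '    lo: 0 0\n  eth0: 10 2\n  eth1: 0 0\n' | list_eth_ifaces
```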
<p><strong>references</strong><br />
<a href="http://maht0x0r.blogspot.com/2009/03/installing-debian-5-for-migration-to.html" target="_blank">http://maht0x0r.blogspot.com/2009/03/installing-debian-5-for-migration-to.html</a><br />
<a href="http://www.etherboot.org/wiki/sanboot/debian_and_ubuntu" target="_blank">http://www.etherboot.org/wiki/sanboot/debian_and_ubuntu</a><br />
<a href="http://mike.neir.org/wiki/articles/AoE_Root_Fedora" target="_blank">http://mike.neir.org/wiki/articles/AoE_Root_Fedora</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.pixelchaos.net/2009/05/25/install-debian-directly-onto-an-aoe-root-filesystem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coraid Odyssey: Part 5 (AoE vs iSCSI)</title>
		<link>http://www.pixelchaos.net/2008/04/16/coraid-odyssey-part-5-aoe-vs-iscsi/</link>
		<comments>http://www.pixelchaos.net/2008/04/16/coraid-odyssey-part-5-aoe-vs-iscsi/#comments</comments>
		<pubDate>Wed, 16 Apr 2008 19:40:53 +0000</pubDate>
		<dc:creator>jcl</dc:creator>
				<category><![CDATA[AoE]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[IOS]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[iSCSI]]></category>

		<guid isPermaLink="false">http://www.pixelchaos.net/?p=51</guid>
		<description><![CDATA[The next phase of this project is choosing AoE or iSCSI. The debate on the relative merits of each protocol continues to rage on the Internet but in my particular case the criteria are pretty simple; which one performs better without causing excessive system load? Just from reading about the two protocols I am already [...]]]></description>
			<content:encoded><![CDATA[<p>The next phase of this project is choosing AoE or iSCSI. The debate on the relative merits of each protocol continues to rage on the Internet but in my particular case the criteria are pretty simple: which one performs better without causing excessive system load? Just from reading about the two protocols I am already leaning toward iSCSI for the simple fact that I can use all my TCP/IP management tools (routing, NAT, firewalling, etc.) on every iSCSI device. The only (potential) drawback is CPU load on the involved systems since they have to calculate TCP checksums for all those packets. Yes, there are many, many other advantages of one protocol over the other. No, they don&#8217;t matter to me in this scenario :-) So here we go!</p>
<p><span id="more-51"></span><br />
In keeping with my character, the first thing I did was start all over again from scratch by reinstalling the operating system. This time around I set up /dev/md0 as /boot (255 MB) and /dev/md1 as an LVM physical volume (the remainder of the disk), within which /, /home, /usr and friends reside as logical volumes. It&#8217;s something I&#8217;ve wanted to start doing with all my systems for a long time now and shouldn&#8217;t have any bearing on the performance tests we are about to do.</p>
<p>Regardless of which protocol will be used we need to enable jumbo frames on all the involved devices. For my setup that means the target (stor01), the initiator (node02), and the switch (a Cisco Catalyst 2970).</p>
<p>First, we turn on jumbo frames for gigabit ethernet at the switch. Beware that this requires a reset (aka reboot) of the switch to take effect:</p>
<p><code type="text"><br />
c2970# system mtu jumbo 9000<br />
</code></p>
<p>Now we enable an MTU of 9000 on both the target and the initiator:</p>
<p><code type="text"><br />
root@stor01:~# ifconfig bond0 mtu 9000<br />
</code></p>
<p><code type="text"><br />
root@node02:~# ifconfig eth0 mtu 9000<br />
</code></p>
<p>For the sake of comparison, here is an iperf test done between the target and initiator with the standard MTU of 1500, and then with an MTU of 9000:</p>
<p><code type="text"><br />
root@stor01:~# iperf -s<br />
------------------------------------------------------------<br />
Server listening on TCP port 5001<br />
TCP window size: 1.00 MByte (default)<br />
------------------------------------------------------------<br />
[  4] local 65.171.150.4 port 5001 connected with 65.171.150.161 port 58731<br />
[  4]  0.0-10.0 sec    780 MBytes    654 Mbits/sec<br />
[  5] local 65.171.150.4 port 5001 connected with 65.171.150.161 port 58732<br />
[  5]  0.0-10.0 sec    916 MBytes    768 Mbits/sec<br />
</code></p>
<p>As you can see, just enabling jumbo frames produces a raw throughput increase of 17.43%. Nothing to sneeze at.</p>
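<p>For anyone who wants to check that percentage, the arithmetic is just (new - old) / old. A minimal sketch, where pct_increase is a name I made up and the two numbers are the Mbits/sec figures from the iperf output above:</p>

```shell
# Hypothetical helper: percentage increase from the first value to the second,
# printed with two decimal places.
pct_increase() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

# MTU 1500 result first, MTU 9000 result second (both in Mbits/sec).
pct_increase 654 768
```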
<p>At this point I tried enabling flow control on the catalyst switch (it is already enabled for both send and receive by default in the e1000 driver) but it did not have any effect on iperf numbers. I turned it back off for now.</p>
<p>So now we set up a 20GB LVM volume on the target and export it using vblade to be mounted on the initiator. We then run a simple dd test to check throughput:</p>
<p><code type="text"><br />
root@node02:~# dd if=/dev/zero of=/mnt/test oflag=direct bs=4M<br />
419+0 records in<br />
419+0 records out<br />
1757413376 bytes (1.8 GB) copied, 133.476 seconds, 13.2 MB/s<br />
</code></p>
<p>CPU load on the target was 10-15% during the dd operation. Now we try writing direct to the (unmounted) block device to rule out any performance penalties of the filesystem itself&#8230;</p>
<p><code type="text"><br />
root@node02:~# dd if=/dev/zero of=/dev/etherd/e0.1 oflag=direct bs=4M<br />
513+0 records in<br />
512+0 records out<br />
2147483648 bytes (2.1 GB) copied, 170.991 seconds, 12.6 MB/s<br />
</code></p>
<p>CPU usage was slightly higher in that test, running 15-20%. So some slight difference but nothing to be too concerned about.</p>
<p>Now we take that same LVM device and share it via iSCSI for the same dd tests:</p>
<p><code type="text"><br />
ladmin@node02:~$ dd if=/dev/zero of=/mnt/test oflag=direct bs=4M<br />
462+0 records in<br />
461+0 records out<br />
1933574144 bytes (1.9 GB) copied, 38.2375 seconds, 50.6 MB/s<br />
</code></p>
<p>CPU load was 6-8% during that test. We also run that same test with flow control enabled at the switch:</p>
<p><code type="text"><br />
ladmin@node02:~$ dd if=/dev/zero of=/mnt/test oflag=direct bs=4M<br />
463+0 records in<br />
462+0 records out<br />
1937768448 bytes (1.9 GB) copied, 38.1851 seconds, 50.7 MB/s<br />
</code></p>
<p>Essentially the same&#8230;</p>
<p>Now this raises the question of why AoE is so much slower than iSCSI on an essentially default install of Debian Etch. To AoE&#8217;s credit, many people report getting performance from AoE on their systems that is just as good (50MB/s or better) as what I&#8217;m seeing with iSCSI. I spent quite a large amount of time playing with flow control, kernel ring buffer values, filesystem options, etc. and was unable to determine why performance is so terrible for me. I did find a pretty high number (half a dozen at least) of recent posts to the AoE mailing list by other people having essentially identical problems so I&#8217;m certainly not alone. In the interest of completing my testing, I&#8217;ve decided to move forward with iSCSI.</p>
<p>Now we try reformatting with the stride option to mkfs:</p>
<p><code type="text"><br />
mkfs.ext3 -E stride=16<br />
</code></p>
<p>The results of several more tests are shown here&#8230;</p>
<p><code type="text"><br />
root@node02:~# dd if=/dev/zero of=/mnt/test oflag=direct bs=4M<br />
2038431744 bytes (2.0 GB) copied, 41.8938 seconds, 48.7 MB/s<br />
2038431744 bytes (2.0 GB) copied, 42.2633 seconds, 48.2 MB/s<br />
2038431744 bytes (2.0 GB) copied, 41.3756 seconds, 49.3 MB/s<br />
</code></p>
<p>So we don&#8217;t see any appreciable difference when using a combination of the stride= option and flow control, at least with a simple dd test.</p>
<p>Next we turn flow control back off, and reformat again without the stride= option. We are now back to our baseline setup for a new test with bonnie++.</p>
<p><code type="text"><br />
ladmin@node02:~$ /usr/sbin/bonnie++ -d /mnt -s 4096Mb -n 10 -x 5 -q<br />
</code></p>
<p>This test produced block writes of about 82MB/s and block reads of about 37MB/s. The cause for the difference in write speed between the dd and bonnie++ tests is still unclear to me. There also appears to be a known issue where writes are much faster than reads which is apparently due to interrupt handling. This is further evidenced by running a quick dd test that does a read instead of a write:</p>
<p><code type="text"><br />
ladmin@node02:~$ dd if=/mnt/test of=/dev/null bs=4M<br />
3602907136 bytes (3.6 GB) copied, 96.2053 seconds, 37.5 MB/s<br />
3602907136 bytes (3.6 GB) copied, 90.5524 seconds, 39.8 MB/s<br />
3602907136 bytes (3.6 GB) copied, 90.0425 seconds, 40.0 MB/s<br />
3602907136 bytes (3.6 GB) copied, 88.5708 seconds, 40.7 MB/s<br />
</code></p>
<p>As you can see, read operations are about 20% slower than write operations which goes against common thinking with regard to striped disk arrays.</p>
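<p>The MB/s figures that dd prints are simply bytes divided by seconds, with a megabyte counted as 1,000,000 bytes. A small sketch for recomputing them from the raw numbers in the output above; dd_rate is a hypothetical helper name, not something dd provides:</p>

```shell
# Hypothetical helper: throughput in MB/s (1 MB = 1000000 bytes) from a byte
# count and an elapsed time in seconds, printed with one decimal place.
dd_rate() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

# One of the iSCSI read runs, then the first iSCSI write run.
dd_rate 3602907136 90.0425
dd_rate 1933574144 38.2375
```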
<p>So there you have it. In my particular situation, with no tuning/optimizing done, iSCSI performs much better than AoE. Even in the event that I were to go to the trouble to performance tune AoE and get it as good as, or even better than, iSCSI I would still be inclined to standardize around iSCSI. Authentication, routing, NAT, etc. can all be done very easily with iSCSI using all the standard TCP/IP tools that are out there. For me that&#8217;s a pretty big advantage.</p>
<p>Next up will be our final piece of the puzzle, left over from the initial system setup &#8211; getting hot swap working with the sata_mv module!</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pixelchaos.net/2008/04/16/coraid-odyssey-part-5-aoe-vs-iscsi/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Coraid Odyssey: Part 3 (performance testing)</title>
		<link>http://www.pixelchaos.net/2008/04/08/coraid-odyssey-part-3/</link>
		<comments>http://www.pixelchaos.net/2008/04/08/coraid-odyssey-part-3/#comments</comments>
		<pubDate>Tue, 08 Apr 2008 16:15:46 +0000</pubDate>
		<dc:creator>jcl</dc:creator>
				<category><![CDATA[AoE]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[Xen]]></category>
		<category><![CDATA[iSCSI]]></category>

		<guid isPermaLink="false">http://www.pixelchaos.net/?p=48</guid>
		<description><![CDATA[Performance and failure testing are next up in building our kickin&#8217; iSCSI/AoE device.
The Debian Etch installer supports building and installing onto software RAID arrays. Because of that&#8230;
during installation I configured the initial RAID1 boot volume with hot spare, consisting of three WD 160GB SATA 3Gbps disks. mdadm sees / and swap as /dev/md0 and /dev/md1 [...]]]></description>
			<content:encoded><![CDATA[<p>Performance and failure testing are next up in building our kickin&#8217; iSCSI/AoE device.</p>
<p>The Debian Etch installer supports building and installing onto software RAID arrays. Because of that&#8230;</p>
<p><span id="more-48"></span>during installation I configured the initial RAID1 boot volume with hot spare, consisting of three WD 160GB SATA 3Gbps disks. mdadm sees / and swap as /dev/md0 and /dev/md1 respectively. There was some remaining space on the drives which I set up as /dev/md2 for future use. The remaining arrays I decided to create manually using mdadm after getting a usable system up and running.</p>
<p>First of all we need to figure out what devices we want in the arrays by probing /dev. The ultimate goal here is to build three arrays: 1 x RAID1 with 1 hot spare (8GB root, 1GB swap, 151GB extra), 1 x RAID6 with 1 hot spare (4TB Xen LVM&#8217;s + 2TB Bacula LVM), 1 x RAID5 (2TB offsite mirror).</p>
<p>I was forward thinking enough to label all the drive carriers with the serial number of the disk in it so all we need to do is get the disk-by-id name from /dev/disk/by-id/ and then build our test array like so:</p>
<p><code type="text"><br />
mdadm --create /dev/md3 --level=6 --raid-devices=8 \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ09HZR \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ09JV3 \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ06913 \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ08XHB \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ09KNM \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ09JHQ \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ0817T \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ09B9P \<br />
--spare-devices=1 \<br />
/dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ07CF3<br />
</code></p>
<p>This assembles our /dev/md3 device with the default options for RAID6 (64k chunk, left-symmetric parity)</p>
<p><code type="text"><br />
stor01:~# cat /proc/mdstat<br />
Personalities : [raid1] [raid6] [raid5] [raid4]<br />
md3 : active raid6 sdl[8](S) sdk[7] sdj[6] sdf[5] sdh[4] sdg[3] sdi[2] sde[1] sdd[0]<br />
5860574976 blocks level 6, 64k chunk, algorithm 2 [8/8] [UUUUUUUU]<br />
[>....................]  resync =  0.8% (8790864/976762496) finish=217.3min speed=74238K/sec<br />
</code></p>
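<p>The block count in that mdstat output is easy to sanity check. RAID6 spends two devices&#8217; worth of space on parity, so the usable size is (devices - 2) times the per-device size. A minimal sketch, where raid6_usable_blocks is a name I made up and 976762496 is the per-device block count from the resync line above:</p>

```shell
# Hypothetical helper: usable 1K blocks of a RAID6 array given the number of
# active devices (spares not counted) and the block count of one device.
raid6_usable_blocks() {
    echo $(( ($1 - 2) * $2 ))
}

# Eight active devices, as in the mdadm --create command above.
raid6_usable_blocks 8 976762496
```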
<p>Here you can see that CPU load is moderate doing the first full sync of that array:</p>
<p><code type="text"><br />
Cpu0  :  0.0%us, 33.3%sy,  0.0%ni, 63.3%id,  0.0%wa,  1.3%hi,  2.0%si,  0.0%st<br />
Cpu1  :  0.0%us,  7.0%sy,  0.0%ni, 91.0%id,  0.0%wa,  0.3%hi,  1.7%si,  0.0%st<br />
</code></p>
<p>So, we wait about 4 hours&#8230; and it&#8217;s done :-)</p>
<p>For formatting the volume, I decided to use the stride= option to mkfs.ext3. This provides for optimal striping across a RAID array. The secret here is to make $stride = $chunk_size / $block_size. In our case, that&#8217;s 64k chunks divided by 4096 byte (4k) blocks. So 16 would be our optimal stride value.</p>
<p><code type="text"><br />
root@stor01:~# mkfs.ext3 -E stride=16 /dev/md3<br />
</code></p>
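<p>The stride calculation can be sketched in a couple of lines of shell. The helper name ext_stride is hypothetical; both arguments are in kilobytes and the values shown are the ones from this array:</p>

```shell
# Hypothetical helper: ext3 stride = RAID chunk size / filesystem block size,
# with both sizes given in kilobytes.
ext_stride() {
    echo $(( $1 / $2 ))
}

# 64k md chunk, 4k filesystem blocks.
ext_stride 64 4
```

<p>The result is the number that goes into the stride= option of the mkfs.ext3 command above.</p>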
<p>So how does it perform? Well enough to saturate the dual gigabit NIC ports &#8211; which is all that matters :-)</p>
<p><code type="text"><br />
root@stor01:~# dd bs=4M if=/dev/zero of=/dev/md3<br />
4469+0 records in<br />
4469+0 records out<br />
18744344576 bytes (19 GB) copied, 97.583 seconds, 192 MB/s<br />
</code></p>
<p>I also ran a more real-world test with bonnie++</p>
<p><code type="text"><br />
/usr/sbin/bonnie++ -d /mnt -s 4096Mb -n 100 -x 10 -q<br />
</code></p>
<p>This test showed approximately 335MB/s read, 170MB/s write. You can download the actual data <a href="http://www.pixelchaos.net/?attachment_id=49">here</a> if you wish.</p>
<p>For failure testing the array, it&#8217;s as simple as removing and inserting disks while the array is up and running. Both tests work fine for the onboard SATA II controller but alas, the sata_mv kernel module does not yet support hotplug so all we can do on the remaining drives is simulate a drive failure by removing one disk. This does work fine but I need to see if there is a way to refresh the SATA bus to get any replacement drive to show up and be added back into the running array. Otherwise, we will need to power down the array to replace a faulty disk which kind of ruins the whole project, don&#8217;t you think? ;-)</p>
<p>So while we wait for sata_mv to start working (or find a different SATA controller to use in this project) we will move on to the remaining issues. Up next; getting port trunking working with a cisco switch&#8230;</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pixelchaos.net/2008/04/08/coraid-odyssey-part-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coraid Odyssey: Part 2 (sata_mv hotplug)</title>
		<link>http://www.pixelchaos.net/2008/04/01/coraid-odyssey-part-2/</link>
		<comments>http://www.pixelchaos.net/2008/04/01/coraid-odyssey-part-2/#comments</comments>
		<pubDate>Tue, 01 Apr 2008 15:33:28 +0000</pubDate>
		<dc:creator>jcl</dc:creator>
				<category><![CDATA[AoE]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[Xen]]></category>
		<category><![CDATA[iSCSI]]></category>

		<guid isPermaLink="false">http://www.pixelchaos.net/?p=47</guid>
		<description><![CDATA[Today&#8217;s adventure with building a SAN on the cheap involves attempting to get hotplug working and changing device mappings.
First of all, hotplug. I have discovered that&#8230;
hotplug doesn&#8217;t seem to work on the Supermicro AOC-SAT2-MV8 cards. The controllers do work, but unless a drive is plugged in during bootup they will not be subsequently detected&#8230;

root@stor01:~# lspci
05:01.0 [...]]]></description>
			<content:encoded><![CDATA[<p>Today&#8217;s adventure with building a SAN on the cheap involves attempting to get hotplug working and changing device mappings.</p>
<p>First of all, hotplug. I have discovered that&#8230;</p>
<p><span id="more-47"></span>hotplug doesn&#8217;t seem to work on the Supermicro AOC-SAT2-MV8 cards. The controllers do work, but unless a drive is plugged in during bootup they will not be subsequently detected&#8230;</p>
<p><code type="text"><br />
root@stor01:~# lspci<br />
05:01.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)<br />
05:02.0 SCSI storage controller: Marvell Technology Group Ltd. MV88SX6081 8-port SATA II PCI-X Controller (rev 09)<br />
</code></p>
<p>You can see that the Linux kernel sees these cards as Marvell MV88SX6081 devices. Further probing reveals that the sata_mv driver is claiming them:</p>
<p><code type="text"><br />
root@stor01:~$ dmesg | grep 05:01.0<br />
sata_mv 0000:05:01.0: version 0.7<br />
ACPI: PCI Interrupt 0000:05:01.0[A] -> GSI 24 (level, low) -> IRQ 233<br />
sata_mv 0000:05:01.0: 32 slots 8 ports SCSI mode IRQ via INTx<br />
</code></p>
<p>So off we go to <a href="http://lkml.org">lkml.org</a> and other sources to find out why&#8230; Basically, hotplug support in the sata_mv module is still a work in progress. A <a href="http://lkml.org/lkml/2007/12/12/143">post</a> to the lkml claims that it&#8217;s slated for inclusion in the 2.6.26 kernel. There is also more information <a href="http://linux-ata.org/driver-status.html">here</a>.</p>
<p>So hotplug isn&#8217;t going to work for us today :-( On the bright side, 2.6.25-rc7 is the current development kernel so I can&#8217;t imagine that 2.6.26 is too far off.</p>
<p>The second (perceived) issue of getting all the hot swap bays to be mapped to /dev/sd* entries in a particular order doesn&#8217;t seem to be as much of an issue as I&#8217;d thought.</p>
<p>The Linux kernel will initialize PCI devices in a strange fashion depending on what is plugged in to the bus. For example, if all fifteen bays have drives plugged into them, the leftmost bay always shows up at /dev/sda. Unplug just one drive and reboot the machine and it will be mapped to /dev/sdl for example. The reason this isn&#8217;t really an issue is that newer Linux kernels have not only udev (which is great but not really a solution for this particular problem) but the /dev/disk/by-* tree. You can find all your disks in there with handy names like /dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ06913. That name belongs to that particular drive and only that particular drive, ever.</p>
<p><code type="text"><br />
root@stor01:~# ls -l /dev/disk/by-id/ | grep scsi | grep -v part<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ06913 -> ../../sde<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ07CF3 -> ../../sdi<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ0817T -> ../../sdb<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ08T21 -> ../../sdh<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ08XHB -> ../../sdc<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09B9P -> ../../sda<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09GMW -> ../../sdd<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09HZR -> ../../sdg<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09JHQ -> ../../sdk<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09JV3 -> ../../sdj<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09KNM -> ../../sdl<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_ST31000340NS_5QJ09LLA -> ../../sdf<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_WDC_WD1600AAJS-_WD-WCAP93952468 -> ../../sdm<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_WDC_WD1600AAJS-_WD-WMAS20341446 -> ../../sdo<br />
lrwxrwxrwx 1 root root  9 2008-04-01 11:15 scsi-SATA_WDC_WD1600AAJS-_WD-WMAS20366839 -> ../../sdn<br />
</code></p>
<p>Cool, huh?</p>
<p>So we will just let mdadm do its job of assembling /dev/md* devices on the fly at bootup and use the /dev/disk/by-id entries to do stuff to particular disks (like temperature monitoring, etc.).</p>
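<p>If you ever need to go the other way and find out which /dev/sd* name a given by-id entry currently points at, following the symlink is enough. A minimal sketch, assuming a readlink that supports -f (GNU coreutils does); kernel_name is a helper name I made up:</p>

```shell
# Hypothetical helper: print the kernel device name (sde, sdk, ...) that a
# /dev/disk/by-id symlink currently resolves to.
kernel_name() {
    basename "$(readlink -f "$1")"
}
```

<p>Going by the listing above, running kernel_name on /dev/disk/by-id/scsi-SATA_ST31000340NS_5QJ06913 would have printed sde on that boot, and may well print something else after the next reboot.</p>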
<p>Coming up in my next post &#8211; performance and failure testing the array.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pixelchaos.net/2008/04/01/coraid-odyssey-part-2/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coraid Odyssey: Part 1 (building the chassis)</title>
		<link>http://www.pixelchaos.net/2008/03/28/coraid-odyssey-part-1/</link>
		<comments>http://www.pixelchaos.net/2008/03/28/coraid-odyssey-part-1/#comments</comments>
		<pubDate>Fri, 28 Mar 2008 21:17:03 +0000</pubDate>
		<dc:creator>jcl</dc:creator>
				<category><![CDATA[AoE]]></category>
		<category><![CDATA[Debian]]></category>
		<category><![CDATA[Hardware]]></category>
		<category><![CDATA[HowTo]]></category>
		<category><![CDATA[Xen]]></category>
		<category><![CDATA[iSCSI]]></category>

		<guid isPermaLink="false">http://www.pixelchaos.net/index.php/archives/46</guid>
		<description><![CDATA[AoE (ATA over Ethernet) and iSCSI are the hot new things. Xen is the hot new thing. I like using hot new things as long as they can be made rock solid.
There happens to be a company (Coraid) that makes a turnkey AoE device. It&#8217;s far cheaper than a true fibre channel SAN or something [...]]]></description>
			<content:encoded><![CDATA[<p>AoE (ATA over Ethernet) and iSCSI are the hot new things. <a href="http://xen.org">Xen</a> is the hot new thing. I like using hot new things as long as they can be made rock solid.</p>
<p>There happens to be a company (<a href="http://www.coraid.com">Coraid</a>) that makes a turnkey AoE device. It&#8217;s far cheaper than a true fibre channel SAN or something similar. Perfect for setting up a SAN over Ethernet device that can serve Xen domU filesystems out to &#8220;thin&#8221; dom0&#8217;s on the network.</p>
<p>Well that&#8217;s all well and good but you see I&#8217;m always looking to save a buck&#8230;</p>
<p><span id="more-46"></span> So I asked myself; why not build my own &#8220;coraid&#8221; with off the shelf parts and save 50% in the process? Herein I&#8217;ll do my best to chronicle my adventures in getting this thing built, tested, and (hopefully) deployed.</p>
<p>First of all the hardware. Thanks to the ever-brilliant <a href="http://mike.neir.org">Mike Neir</a> for helping me figure out what Coraid uses to build these suckers. Here are the parts I ended up ordering:</p>
<ul>
<li> 2 x Supermicro 8-Port SATA Card (AOC-SAT2-MV8)</li>
<li> 1 x Supermicro Xeon Dual-Core Blackford VS ServerBoard (X7DVL-E)</li>
<li> 1 x Supermicro Black 3U Rackmount Case 760W (SC933T-R760B)</li>
<li> 1 x Kingston 2GB 240-Pin DDR2 FB-DIMM ECC (KVR667D2D8F5K2/2G)</li>
<li> 1 x Intel Xeon 5130 Woodcrest 2.0GHz 4M (BX805565130A)</li>
<li> 15 x Seagate Barracuda ES.2 1TB (ST31000340NS)</li>
<li> 3 x Supermicro Drive Carrier (CSE-PT39B)</li>
<li> 3 x Western Digital 7K 8M SATA2 160GB (WD1600AAJS)</li>
</ul>
<p>Now you may ask, why buy 18 drives for a chassis that has 15 hot swap bays? Well, I&#8217;ll be using the three WD drives as a RAID1 boot device with one hot spare, then setting nine of the 1TB drives up into a RAID6 device with one hot spare. The remaining three drives will be in RAID5 and swapped with the last three drives on a regular basis to go off site as part of a hard disk based backup system.</p>
<p>Now that we have all of that out of the way, how did the assembly go? Almost flawless ;-)</p>
<p>There are two gotchas to be dealt with in the assembly of the aforementioned motherboard and server chassis. First, the EPS12V connector for CPU power that comes out of the power distribution block won&#8217;t reach its molex connector on the motherboard. Second, the ribbon cable that connects the front panel controls to their motherboard header just *barely* reaches, which makes me uncomfortable.</p>
<p>To resolve the first problem I ordered up a StarTech 8&#8243; EPS 8 Pin Power Extension Cable (EPS8EXT) which did the job nicely. To fix the second issue, I ordered twenty feet of 16 conductor ribbon cable (might need extra!), some cable ends and a crimper. Problem solved.</p>
<p>With all that taken care of it&#8217;s time to get an operating system installed and test this thing out. I did go ahead and power up the system and test the redundant PSUs and fans. The alarm features of both seem to work fine with audible alarms when any of them are removed from the system. After that I plugged an old CD-ROM into the ATA header on the mobo and installed a <a href="http://www.debian.org">rockin Linux distro</a>.</p>
<p>Some things I discovered:</p>
<ul>
<li>The BIOS and the Linux kernel both initialize the two SATA PCI cards first. Some fiddling will need to be done to make the three WD drives (plugged into the motherboard SATA headers) show up as sda, sdb, and sdc.</li>
<li>You must turn off &#8220;compatibility mode&#8221; in the BIOS for the onboard SATA controllers or hotplug will not work for drives on those headers. After turning it off, hotplug works just dandy. Yay!</li>
<li>Hotplug isn&#8217;t working for the two PCI SATA cards at this point. I suspect it&#8217;s something to do with the sata_mv kernel module but haven&#8217;t gotten to look into it much. That will be the first order of business when next I work on this project.</li>
</ul>
<p>So that&#8217;s it for now. Things are coming together nicely. My next post will detail getting hotplug working for the two PCI SATA cards and getting device detection and mapping working correctly in the Linux kernel (I smell some udev tomfoolery most likely).</p>
]]></content:encoded>
			<wfw:commentRss>http://www.pixelchaos.net/2008/03/28/coraid-odyssey-part-1/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
