LinHES Forums
http://forums.linhes.org/

Under high IO, my disks lose DMA and other settings.
http://forums.linhes.org/viewtopic.php?f=5&t=11276
Page 1 of 1

Author:  abrendel [ Thu Aug 24, 2006 7:53 pm ]
Post subject:  Under high IO, my disks lose DMA and other settings.

Alright, this problem has gotten progressively worse. Before, I was simply annoyed and lived with it, but now it threatens the longevity of Myth in the house... my wife doesn't have my patience with things like this.

These issues cause stuttering at best, often cause the frontend to lock up, and have even caused the backend to freeze, forcing a reset of my whole setup.

First my setup.

Backend:
KM-R5B7
Gigabyte 7N400 Pro MB with 4 disks.
hda 80gb OS drive
hdc 200gb Maxtor (LVM striped w/ hdd) /myth
hdd 200gb Maxtor (LVM striped w/ hdc) /myth
sda 250gb Sata /myth/video

2xHD5000 Tuners, 1xPVR500, 1xPVR150

Frontends:
2x "Almost-a-Dragon" for HD content
1xEpia M12000 based SD frontend

Under light load, everything works fantastic. However, under heavy load (recording/playing 2 HD shows at once) I lose DMA and other settings on either hdd or hdc and sometimes both. When I run hdparm, I normally see:

/dev/hdd:
multcount = 16 (on)
IO_support = 1 (32-bit)
unmaskirq = 1 (on)
using_dma = 1 (on)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 24792/255/63, sectors = 203928109056, start = 0

However when this problem occurs, I see:
/dev/hdd:
multcount = 0 (off)
IO_support = 0 (default 16-bit)
unmaskirq = 0 (off)
using_dma = 0 (off)
keepsettings = 0 (off)
readonly = 0 (off)
readahead = 256 (on)
geometry = 24792/255/63, sectors = 203928109056, start = 0
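
For anyone who wants to detect this state from a script instead of eyeballing the output, here is a minimal Python sketch that parses hdparm-style text like the two dumps above and reports which settings changed. The function names (`parse_hdparm`, `lost_settings`) are made up for illustration, and the only thing assumed about the format is the `name = value (...)` layout shown in this post.

```python
import re

# Matches lines such as "using_dma = 1 (on)"; the geometry line does not match
# because its value is not a plain integer followed by "(".
SETTING_RE = re.compile(r"^\s*(\w+)\s*=\s*(\d+)\s*\(")

def parse_hdparm(text):
    """Return {setting_name: int_value} for each 'name = N (...)' line."""
    settings = {}
    for line in text.splitlines():
        match = SETTING_RE.match(line)
        if match:
            settings[match.group(1)] = int(match.group(2))
    return settings

def lost_settings(good_text, current_text,
                  watched=("multcount", "IO_support", "unmaskirq", "using_dma")):
    """List the watched settings whose value differs from the known-good dump."""
    good = parse_hdparm(good_text)
    current = parse_hdparm(current_text)
    return sorted(name for name in watched if current.get(name) != good.get(name))
```

Fed the "normal" dump as `good_text` and the "problem" dump as `current_text`, this should name the four settings that dropped to 0.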

And when this happens, the following is logged to /var/log/messages.

Aug 21 00:27:17 mythbe01 kernel: ide: failed opcode was: unknown
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 21 00:27:18 mythbe01 kernel: ide: failed opcode was: unknown
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 21 00:27:18 mythbe01 kernel: ide: failed opcode was: unknown
Aug 21 00:27:18 mythbe01 kernel: hdc: DMA disabled
Aug 21 00:27:18 mythbe01 kernel: ide1: reset: success
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 21 00:27:18 mythbe01 kernel: ide: failed opcode was: unknown
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: status=0x53 { DriveReady SeekComplete Index Error }
Aug 21 00:27:18 mythbe01 kernel: hdd: dma_intr: error=0x84 { DriveStatusError BadCRC }
Aug 21 00:27:18 mythbe01 kernel: ide: failed opcode was: unknown
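
The hex values in those log lines are bit fields, and the names in braces are just the set bits spelled out. Below is a small Python sketch of that decoding. The bit-to-name tables are my understanding of how the old Linux IDE driver labels the ATA status and error registers, so treat them as an assumption and not as a reference; the function names are hypothetical.

```python
# Status register bits other than Busy (0x80), in the order the driver prints them.
STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Error register bits other than 0x04 (abort) and 0x80 (CRC / bad sector).
OTHER_ERROR_BITS = [
    (0x40, "UncorrectableError"),
    (0x10, "SectorIdNotFound"),
    (0x02, "TrackZeroNotFound"),
    (0x01, "AddrMarkNotFound"),
]

def decode_status(value):
    """Names of the bits set in the status byte; Busy hides the other bits."""
    if value & 0x80:
        return ["Busy"]
    return [name for bit, name in STATUS_BITS if value & bit]

def decode_error(value):
    """Names of the bits set in the error byte."""
    names = []
    aborted = bool(value & 0x04)
    if aborted:
        names.append("DriveStatusError")
    if value & 0x80:
        # Bit 0x80 is reported as BadCRC when the command was also aborted.
        names.append("BadCRC" if aborted else "BadSector")
    names += [name for bit, name in OTHER_ERROR_BITS if value & bit]
    return names
```

With these tables, 0x51 and 0x84 decode to the same names the kernel printed above. As far as I know, BadCRC on a DMA transfer means the data was corrupted between drive and controller, which does not by itself say whether the cable, the controller, or the drive is at fault.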


If it were consistently one drive having this issue, I would think it was hardware and replace the drive. However, this is two drives, both fairly new, and both have done this for a while. My band-aid right now is a cron job that resets my hdparm settings every five minutes.

0,5,10,15,20,25,30,35,40,45,50,55 * * * * /sbin/hdparm -qd1c1u1m16 /dev/hdc
2,7,12,17,22,27,32,37,42,47,52,57 * * * * /sbin/hdparm -qd1c1u1m16 /dev/hdd
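
The explicit minute lists in those two crontab lines can usually be written more compactly with cron's step syntax, `*/5` for the first and `2-57/5` for the second, assuming the installed cron supports steps (Vixie-style crons do). The Python sketch below only checks that the short forms expand to the same minutes; `expand_minutes` is a hypothetical helper, not part of any cron tooling.

```python
def expand_minutes(field):
    """Expand a cron minute field such as '*/5', '2-57/5' or '0,5,10' into a sorted list."""
    minutes = set()
    for part in field.split(","):
        span, _, step = part.partition("/")
        step = int(step) if step else 1
        if span == "*":
            low, high = 0, 59
        elif "-" in span:
            low, high = (int(x) for x in span.split("-"))
        else:
            low = high = int(span)
        # Cron ranges are inclusive at both ends.
        minutes.update(range(low, high + 1, step))
    return sorted(minutes)
```

Comparing `expand_minutes("*/5")` with the expansion of the first explicit list, and `expand_minutes("2-57/5")` with the second, should give equal lists.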


Any help would be HUGELY appreciated.

Author:  Dak48 [ Fri Aug 25, 2006 6:26 am ]
Post subject: 

Oh, I've had errors like that before. You will not like the answer.

hdd is dying and is on its last legs. Replace it before it dies and you lose data.

Author:  abrendel [ Fri Aug 25, 2006 8:18 am ]
Post subject: 

Dak48 wrote:
Oh, I've had errors like that before. You will not like the answer.

hdd is dying and is on its last legs. Replace it before it dies and you lose data.


Normally I would agree with you. However, this is happening to both drives and both are fairly new. It just seems odd to me that both newish drives would be doing this. I was planning on adding more disk space anyhow so I guess I might as well try and replace them.

Author:  abrendel [ Fri Aug 25, 2006 9:14 am ]
Post subject: 

Alright, just ordered a new 500gb WD5000YS Sata II drive. This way I am eliminating the possibility of it being the IDE drives or the IDE controller on the MB. Hopefully I am not just throwing money at a problem, but I really need it fixed.

I'll set the other two drives up as a mirror and put low-IO data on them, such as music and pictures. That way they won't get hammered nearly as badly as they do with the HD video, and since they will be mirrored, if one of the drives does finally bite it, the other will still have the data safe.

Author:  Dak48 [ Fri Aug 25, 2006 9:19 am ]
Post subject: 

Sounds like a working plan to me.

I shudder when I see errors like that in my logs.

Author:  abrendel [ Fri Aug 25, 2006 1:12 pm ]
Post subject: 

Just found this post in the Myth mailing list archives. It sounds fairly similar to my issue. I'll have to check my ivtv logs and see if they are also giving DMA errors. With the large number of PCI cards I've got in my backend, I can definitely see how this could be the issue. Here was the post:





My backend was locking up constantly and it turned out to be an IRQ
problem. I could see what was happening because I have a split
front/back system and the back-end has only an 80x24 console display.
The symptoms were that the box would freeze during heavy I/O and then
I'd get a DMA error message from ivtv, a timeout message from the LAN
card and then a "drive not ready" error from /dev/hde. The solution in
my case was to shuffle the cards around (so that no card occupied the
PCI slot that was sharing interrupts with the on-board LAN or USB
devices) and using the BIOS to manually assign IRQs to each card.

I still get DMA errors from the ivtv card (a known problem with no known
solution) but at least the ivtv code doesn't block the IDE/ETH0
interrupts from getting through.
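
To see whether cards are sharing an interrupt in the way that post describes, one option is to look at /proc/interrupts, where devices on the same IRQ appear comma-separated on one line. Here is a rough Python sketch that pulls those lines out of the file's text. The function name is hypothetical, and the exact column layout differs between kernels, so the parsing deliberately relies only on the leading `N:` and the comma-separated device list.

```python
def shared_irqs(proc_interrupts_text):
    """Return {irq_number: [device, ...]} for IRQ lines listing more than one device."""
    shared = {}
    for line in proc_interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        irq = irq.strip()
        # Skip the CPU header, named rows such as NMI, and unshared IRQs.
        if not sep or not irq.isdigit() or "," not in rest:
            continue
        devices = [part.strip() for part in rest.split(",")]
        # The first chunk still carries the counters and controller name;
        # the device name is its last whitespace-separated token.
        devices[0] = devices[0].split()[-1]
        shared[int(irq)] = devices
    return shared
```

On a live system the input would come from reading /proc/interrupts; any IRQ that lists a tuner card next to the IDE or LAN controller is a candidate for the slot shuffling described above.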

Author:  tjc [ Fri Aug 25, 2006 4:11 pm ]
Post subject: 

Oh, hang on, a random neuron just fired. You know how the IVTV drivers complain about the low default latency of 32 set by the BIOS and then reset it to 64? You may need to do something like that for your disk controller... Otherwise the PVR card can hog the DMA...
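
To compare the latency timers before changing anything, the `Flags:` line of `lspci -v` output normally includes a `latency N` entry for bus-mastering devices. The Python sketch below extracts that number per device from such text. The function name is hypothetical, and it assumes the usual layout of an unindented device line followed by indented detail lines.

```python
import re

LATENCY_RE = re.compile(r"\blatency (\d+)")

def pci_latencies(lspci_v_text):
    """Return {pci_address: latency} from `lspci -v` style text."""
    latencies = {}
    current = None
    for line in lspci_v_text.splitlines():
        if line and not line[0].isspace():
            # Unindented line starts a new device, e.g. "00:09.0 Multimedia ...".
            current = line.split(" ", 1)[0]
        elif current and "Flags:" in line:
            match = LATENCY_RE.search(line)
            if match:
                latencies[current] = int(match.group(1))
    return latencies
```

If the disk controller turns out to sit at 32 while the tuners are at 64, `setpci -s <address> latency_timer=40` (the value is hexadecimal, so 40 means 64) is, as far as I know, the usual way to raise it by hand; the address has to come from your own lspci output.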

Author:  abrendel [ Tue Sep 12, 2006 2:50 pm ]
Post subject: 

Actually, I discovered the problem last night.

I've got a 4in3 drive cage that holds these disks: 1 IDE OS disk, 1 SATA disk (/myth/video), and the 2 IDE drives in an LVM stripe (/myth/tv).

Well, last night I had a hard lockup. First the TV locked up, then the whole server locked up and I couldn't even use it at the console. I went to turn off the system and for some reason decided to open up the box and look at the drives. When I pulled out one of the IDE drives, it was HOT. I immediately turned the system back on and looked into the 4in3 cage and, sure as can be, the 120mm fan was not running anymore.

The problem is that my drives are overheating... it's got to be. When they record/play multiple HD streams as I described in the OP, the drives generate a massive amount of heat that overheats the OS drive as well.

I bought a new drive to replace this one, but I've been battling other sound issues on a dragon system that is soon to be my Master BE instead of this older system. I wonder how much life I've sucked from these disks.

Author:  gigakev [ Tue Sep 12, 2006 5:34 pm ]
Post subject: 

I have the same errors, but the system seems to work fine.

The motherboard also has DMA issues under Win2k and XP, and it always has. Not sure what chipset your DMA controller is, but I have an old Abit Socket A Athlon board with the Via KT-133 chipset. There was one version of the old Via 4-in-1 drivers that seemed to make 2k happy, but I just live with it most of the time.

Author:  Liv2Cod [ Tue Sep 12, 2006 7:43 pm ]
Post subject: 

Ah, the smell of burning drives in the morning...

Drives do not do too well with overheating, as I have discovered to my detriment more than once. I would not depend on your fried drives to last too long -- they are probably hanging on to life with one foot in the grave and the other on a banana peel.

You might look into that smartmon package that can monitor drive temps and fire off a message if they look hot.
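
As a rough illustration of the monitoring idea, here is a Python sketch that reads the drive temperature out of `smartctl -A` style text, assuming the package meant is smartmontools and that the drive reports a `Temperature_Celsius` attribute in the usual ten-column table. The function names and the 50 C limit are placeholders of mine, not recommendations from any vendor.

```python
def drive_temperature(smartctl_a_text):
    """Return the raw Temperature_Celsius value as an int, or None if absent."""
    for line in smartctl_a_text.splitlines():
        fields = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (which may be followed by extra text).
        if len(fields) >= 10 and fields[1] == "Temperature_Celsius":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None

def is_too_hot(smartctl_a_text, limit_c=50):
    """True when a temperature was found and it exceeds limit_c (placeholder limit)."""
    temperature = drive_temperature(smartctl_a_text)
    return temperature is not None and temperature > limit_c
```

A cron job could feed this the output of `smartctl -A` for each drive and send mail when `is_too_hot` returns True. I believe the smartd daemon from the same package can also watch temperatures on its own, which may be simpler than scripting it.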

All times are UTC - 6 hours