All times are UTC - 6 hours





PostPosted: Mon Oct 26, 2009 10:48 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
In a recent thread, http://knoppmyth.net/phpBB2/viewtopic.php?t=20178, I described how my R5.5 system was randomly rebooting. The reboots always seemed to happen during HD playback and/or recording but they weren't repeatable.

After some messing around I decided to go with a clean install of R6. As a reminder, my hardware is:

Gigabyte X48-DS5 MB with Intel Quad Core 2.66 GHz
4GB RAM (3.3 usable)
NVIDIA GeForce 9800GTX
PVR 250
PVR 150
HDHomerun (on network)
WD 1TB SATA HD
Seagate 250GB PATA HD
Samsung 500GB PATA HD

I installed R6 on Friday afternoon and everything seemed to go well until Sunday afternoon when I noticed I wasn't able to connect to Mythweb. When I turned on the TV, the screen was frozen at a time several hours in the past and the caps lock and scroll lock lights were flashing. I had to reset the machine because I couldn't get to a command line or restart X.

Several hours later it happened again, then happened 2 or 3 times in rapid succession. At the time I was recording one HD program which was flagging commercials and watching another HD program. Each time the machine completely froze and the keyboard lights were blinking.

Let me answer the obvious suggestions before they are made: I ran memtest for 3 days last month and got no errors. All 3 hard drives are in hard drive coolers and never reach 40 C when I check hddtemp. The 4 processors never drop below 95% idle even when recording 2 HD programs and watching a third at the same time.

Two differences from my 5.5 install are that I left on the option to start flagging commercials immediately and allowed a max of 2 jobs at a time.

Today I ran stress (from the Stress Linux CD) on it for 30 minutes and didn't get an error...I'm trying to find a good amount of time to run stress because the Stress Linux documentation is thin.

I also downloaded Inquisitor and ran its readonly benchmark test with no errors.

I've searched the logs and found nothing useful...just a last unrelated entry before the lockup followed by the next entry when I restart. Of course there are so many logs to check that it's likely I've missed an error if one was logged.
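
One way to narrow that search is to look for long silences between consecutive log entries, since a hard lockup shows up as a jump from the last entry before the freeze to the first entry after the restart. Below is a minimal Python sketch of that idea. It assumes the usual syslog stamp at the start of each line (for example "Oct 25 21:35:32"), which carries no year, so the year is passed in as a parameter. The function names and the 30-minute threshold are illustrative choices, not anything from MythTV or LinHES.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year=2009):
    """Parse the leading 15-character syslog stamp; the year is an assumption."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def find_gaps(lines, min_gap=timedelta(minutes=30), year=2009):
    """Return (last_line_before, first_line_after, gap) for each long silence."""
    gaps = []
    prev_line, prev_time = None, None
    for line in lines:
        line = line.rstrip("\n")
        try:
            now = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if prev_time is not None and now - prev_time >= min_gap:
            gaps.append((prev_line, line, now - prev_time))
        prev_line, prev_time = line, now
    return gaps
```

Feeding it the lines of a log, for example `find_gaps(open("/var/log/messages.log", errors="replace"))`, should list each silence together with the entries on either side. A quiet stretch is not proof of a hang, so the threshold may need tuning.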

Since my WAF is zero with the machine up in my office being tested, I'm going to re-install it in the living room and keep a terminal open running top on another machine. My hope is that I'll catch some wind of what's happening when the computer freezes.

After 5 years with Myth this is the first time I've seriously considered giving up, but I know even though my computer appears to be acting randomly there must be a specific issue causing this problem. If anyone can help narrow down the cause I'd appreciate it. In the meantime I'll be reading through all the R6 posts in the forum in case there's something that can help me here already.


 Post subject:
PostPosted: Mon Oct 26, 2009 11:26 pm 
Offline
Site Admin
Joined: Fri Sep 19, 2003 6:37 pm
Posts: 2659
Location: Whittier, Ca
I think logs would be needed to even start to chase this down... I'd tail /var/log/messages.log


 Post subject:
PostPosted: Tue Oct 27, 2009 10:34 am 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
cecil wrote:
I think logs would be needed to even start to chase this down... I'd tail /var/log/messages.log


Here's a snippet from messages.log. It starts midway through the bootup sequence at 21:35 and continues until the last message at 21:55. It stopped somewhere around this time and I powered down the machine to bring it upstairs for testing. I restarted it at 11:18 this morning. If you want I can post a link to a longer section (or the whole file) which follows it through several reboots on 25 October. I don't see anything interesting in there but maybe someone else will.

Code:
Oct 25 21:35:32 mythtv lo: Disabled Privacy Extensions
Oct 25 21:35:32 mythtv ntpd[4417]: ntpd 4.2.4p5@1.1541-o Sat Feb  7 19:38:13 UTC 2009 (3)
Oct 25 21:35:32 mythtv mythfrontend:
Oct 25 21:35:32 mythtv mythfrontend: X.Org X Server 1.5.3
Oct 25 21:35:32 mythtv mythfrontend: Release Date: 5 November 2008
Oct 25 21:35:32 mythtv mythfrontend: X Protocol Version 11, Revision 0
Oct 25 21:35:32 mythtv mythfrontend: Build Operating System: Linux 2.6.26-ARCH i686
Oct 25 21:35:32 mythtv mythfrontend: Current Operating System: Linux mythtv 2.6.28-LinHES #1 SMP PREEMPT Mon Aug 17 05:38:57 UTC 2009 i686
Oct 25 21:35:32 mythtv mythfrontend: Build Date: 12 January 2009  10:25:53PM
Oct 25 21:35:32 mythtv mythfrontend:
Oct 25 21:35:32 mythtv mythfrontend:    Before reporting problems, check http://wiki.x.org
Oct 25 21:35:32 mythtv mythfrontend:    to make sure that you have the latest version.
Oct 25 21:35:32 mythtv mythfrontend: Markers: (--) probed, (**) from config file, (==) default setting,
Oct 25 21:35:32 mythtv mythfrontend:    (++) from command line, (!!) notice, (II) informational,
Oct 25 21:35:32 mythtv mythfrontend:    (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
Oct 25 21:35:32 mythtv mythfrontend: (==) Log file: "/var/log/Xorg.0.log", Time: Sun Oct 25 21:35:32 2009
Oct 25 21:35:32 mythtv mythfrontend: (==) Using config file: "/etc/X11/xorg.conf"
Oct 25 21:35:32 mythtv ntpd[4417]: precision = 1.000 usec
Oct 25 21:35:32 mythtv ntpd[4417]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
Oct 25 21:35:32 mythtv ntpd[4417]: Listening on interface #1 lo, 127.0.0.1#123 Enabled
Oct 25 21:35:32 mythtv ntpd[4417]: Listening on interface #2 eth0, 192.168.2.11#123 Enabled
Oct 25 21:35:32 mythtv ntpd[4417]: kernel time sync status 0040
Oct 25 21:35:32 mythtv ntpd[4417]: frequency initialized 16.823 PPM from /etc/ntp.drift
Oct 25 21:35:33 mythtv mythfrontend: (EE) Failed to load module "type1" (module does not exist, 0)
Oct 25 21:35:33 mythtv ivtv 0000:06:00.0: firmware: requesting v4l-cx2341x-enc.fw
Oct 25 21:35:33 mythtv ivtv0: Loaded v4l-cx2341x-enc.fw firmware (376836 bytes)
Oct 25 21:35:33 mythtv ivtv0: Encoder revision: 0x02060039
Oct 25 21:35:33 mythtv cx25840 0-0044: firmware: requesting v4l-cx25840.fw
Oct 25 21:35:38 mythtv cx25840 0-0044: loaded v4l-cx25840.fw firmware (16382 bytes)
Oct 25 21:35:38 mythtv mythfrontend: (EE) config/hal: couldn't initialise context: (null) ((null))
Oct 25 21:35:38 mythtv mythfrontend: non-network local connections being added to access control list
Oct 25 21:35:38 mythtv mythfrontend: 127.0.0.1 being added to access control list
Oct 25 21:35:38 mythtv mythfrontend: Xlib:  extension "Generic Event Extension" missing on display ":0.0".
Oct 25 21:35:39 mythtv ivtv 0000:06:01.0: firmware: requesting v4l-cx2341x-enc.fw
Oct 25 21:35:39 mythtv ivtv1: Loaded v4l-cx2341x-enc.fw firmware (376836 bytes)
Oct 25 21:35:39 mythtv ivtv1: Encoder revision: 0x02060039
Oct 25 21:35:40 mythtv ntpd[4417]: synchronized to 204.9.54.119, stratum 1
Oct 25 21:35:40 mythtv ntpd[4417]: kernel time sync status change 0001
Oct 25 21:37:23 mythtv mythfrontend: GetModeLine - scrn: 0 clock: 148500
Oct 25 21:37:23 mythtv mythfrontend: GetModeLine - hdsp: 1920 hbeg: 2008 hend: 2052 httl: 2200
Oct 25 21:37:23 mythtv mythfrontend:               vdsp: 1080 vbeg: 1084 vend: 1089 vttl: 1125 flags: 5
Oct 25 21:37:23 mythtv mythfrontend: GetModeLine - scrn: 0 clock: 148500
Oct 25 21:37:23 mythtv mythfrontend: GetModeLine - hdsp: 1920 hbeg: 2008 hend: 2052 httl: 2200
Oct 25 21:37:23 mythtv mythfrontend:               vdsp: 1080 vbeg: 1084 vend: 1089 vttl: 1125 flags: 5
Oct 25 21:55:03 mythtv mythfrontend: GetModeLine - scrn: 0 clock: 148500
Oct 25 21:55:03 mythtv mythfrontend: GetModeLine - hdsp: 1920 hbeg: 2008 hend: 2052 httl: 2200
Oct 25 21:55:03 mythtv mythfrontend:               vdsp: 1080 vbeg: 1084 vend: 1089 vttl: 1125 flags: 5
Oct 25 21:55:03 mythtv mythfrontend: GetModeLine - scrn: 0 clock: 148500
Oct 25 21:55:03 mythtv mythfrontend: GetModeLine - hdsp: 1920 hbeg: 2008 hend: 2052 httl: 2200
Oct 25 21:55:03 mythtv mythfrontend:               vdsp: 1080 vbeg: 1084 vend: 1089 vttl: 1125 flags: 5
Oct 27 11:18:12 mythtv BIOS EBDA/lowmem at: 0009e800/0009e800
Oct 27 11:18:12 mythtv Linux version 2.6.28-LinHES (root@dev) (gcc version 4.3.3 (GCC) ) #1 SMP PREEMPT Mon Aug 17 05:38:57 UTC 2009
Oct 27 11:18:12 mythtv KERNEL supported cpus:
Oct 27 11:18:12 mythtv Intel GenuineIntel
Oct 27 11:18:12 mythtv AMD AuthenticAMD


 Post subject:
PostPosted: Tue Oct 27, 2009 6:05 pm 
Offline
Joined: Wed Nov 16, 2005 8:55 pm
Posts: 1381
Location: Farmington, MI USA
Another hardware-related suggestion after reading your previous thread - Have you eliminated the NIC as a possibility? Seems like the lockups occur when you have network activity from the HDHR. Perhaps a cable/switch problem? The testing you mentioned earlier wouldn't have identified a network problem.


 Post subject:
PostPosted: Wed Oct 28, 2009 10:24 am 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
slowtolearn wrote:
Another hardware-related suggestion after reading your previous thread - Have you eliminated the NIC as a possibility? Seems like the lockups occur when you have network activity from the HDHR. Perhaps a cable/switch problem? The testing you mentioned earlier wouldn't have identified a network problem.


I haven't looked at the NIC. I never considered that it could cause the computer to hang. I'll look into it, thanks.


 Post subject:
PostPosted: Wed Oct 28, 2009 7:18 pm 
Offline
Joined: Wed Mar 07, 2007 9:51 am
Posts: 173
Location: Uniontown, PA
The blinking keyboard lights point to a kernel oops error. Most of the time it's memory related, usually a parity or ECC error. I know you ran memtest, but sometimes real software stresses memory access very well, too.

If you have other memory to try out, please do so, or at least flip/flop what you have into different sockets. You could try to remove one dimm at a time, if your mobo supports it.

Go into the mobo's bios setup and see if you can reset the default 'timing' for memory access. If you're overclocking (doubtful) that can be a factor, too. Is a BIOS update (or downgrade) available?

Another thing to check is whether any of the caps (capacitors) on the mobo are failing. This is what caused my R6 testing system to fail on a regular basis last month. I had it repaired by 'badcaps.net.'

It's something hardware related...that locks the system hard. Timing and/or memory issues. Keep swapping hardware out until the problem goes away.

This isn't the best answer, but it's how I'd approach the issue.


 Post subject:
PostPosted: Mon Nov 02, 2009 7:55 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
After several days with no problems it started locking up again on Sunday (maybe it hates NFL Football). Some part of the machine must still be running in order to transmit the frozen mythfrontend screen to the TV, right? I can't use any keyboard commands and can't ssh in. I had an ssh session running top and when the computer died I didn't see any onscreen error, just a disconnect message.

Since so many people point to memory even though it tested good, I decided to order another 4GB pair today. It's DDR2-1200 vs the DDR2-800 I have now (MB supports 1200). I should have that in a day or two and hope it fixes the problem. In the meantime I've run LinHes upgrade and upgraded the HDHomerun firmware on the off chance one of those will magically fix the error. I've also considered upgrading the Nvidia drivers since I know that video card drivers can lead to all sorts of problems. I'm trying not to change everything at once because I want to figure out what fixed it.

While I'm waiting for the memory, can anyone offer tips on how to capture the kernel oops? I thought about downloading the software from kerneloops.org but don't know if that will help me since nothing else (log files, etc) seems to be written to disk when the system hangs.
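
One approach that does not depend on the failing machine writing to its own disk is the kernel's netconsole module, which sends kernel messages over UDP to another host on the network. Below is a minimal Python sketch of a receiver to run on a second machine. It assumes netconsole on the MythTV box is pointed at this machine on UDP port 6666 (the module's default target port); the log file name and function names are hypothetical, and there is no guarantee the final panic message gets onto the wire before the machine stops. The kernel's netconsole documentation has the exact parameter syntax for the sending side.

```python
import socket
from datetime import datetime


def format_record(data, sender_ip, now=None):
    """Turn one UDP datagram into a timestamped log line."""
    now = now or datetime.now()
    text = data.decode("utf-8", errors="replace").rstrip("\n")
    return f"{now:%Y-%m-%d %H:%M:%S} {sender_ip} {text}"


def listen(port=6666, logfile="netconsole.log"):
    """Append every datagram received on `port` to `logfile` until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    # Line buffering so each message reaches the file as soon as it arrives
    with open(logfile, "a", buffering=1) as out:
        while True:
            data, (sender_ip, _sender_port) = sock.recvfrom(65535)
            out.write(format_record(data, sender_ip) + "\n")
```

Calling `listen()` on the second machine and leaving it running should leave a timestamped copy of whatever kernel messages arrive, which can then be compared against the time of the freeze.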


 Post subject:
PostPosted: Tue Nov 03, 2009 7:06 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
BTW, here's what top looked like last night at the time of failure. Note that I had R5.5 reboots and R6 lockups before activating Folding@Home so I don't think that's the problem:

Code:
top - 22:07:50 up  2:38,  1 user,  load average: 1.56, 1.22, 1.14
Tasks: 193 total,   4 running, 185 sleeping,   0 stopped,   4 zombie
Cpu(s):  7.9%us, 10.7%sy, 27.4%ni, 53.9%id,  0.2%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3373088k total,  3274732k used,    98356k free,     3604k buffers
Swap:  3373640k total,     6752k used,  3366888k free,  2908768k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 5342 root      39  19 31260  19m  800 R   96  0.6 156:53.95 FahCore_78.exe
 5097 root      20   0  660m 127m 108m R   50  3.9   0:44.50 X
 5241 mythtv    20   0  395m 219m 112m S   13  6.7   1:16.52 mythfrontend
 4489 mythtv    20   0  277m  44m 4480 S    1  1.4   0:27.23 mythbackend
  810 root      15  -5     0    0    0 S    0  0.0   1:15.10 lirc_dev
  812 root      15  -5     0    0    0 S    0  0.0   1:00.65 lirc_dev
 5698 mythtv    20   0  2308 1096  820 R    0  0.0   0:00.90 top
    1 root      20   0   744   56   36 S    0  0.0   0:00.80 runit
    2 root      15  -5     0    0    0 S    0  0.0   0:00.00 kthreadd
    3 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/0
    4 root      15  -5     0    0    0 S    0  0.0   0:00.43 ksoftirqd/0
    6 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/1
    7 root      15  -5     0    0    0 S    0  0.0   0:00.06 ksoftirqd/1
    8 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root      RT  -5     0    0    0 S    0  0.0   0:00.00 migration/2
   10 root      15  -5     0    0    0 S    0  0.0   0:00.06 ksoftirqd/2
   11 root      RT  -5     0    0    0 S    0  0.0   0:00.00 watchdog/2


 Post subject:
PostPosted: Sat Nov 07, 2009 7:46 am 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
Update: My new memory arrived yesterday, and 5 minutes after I installed it the video I was watching froze and the keyboard lights were flashing. So I think I've ruled out memory as a cause.

Could this possibly be video driver or xorg.conf related? I might try updating the Nvidia driver to see if it helps.


 Post subject:
PostPosted: Sat Nov 07, 2009 8:23 pm 
Offline
Joined: Wed Mar 07, 2007 9:51 am
Posts: 173
Location: Uniontown, PA
Sorry to hear that new memory is still causing you grief.

Try installing only 2GB of ram, and see if that still hangs. You have a DualCore CPU to try?

You're really at the point of swapping everything hardware-wise. I wouldn't make any software/driver changes yet.

If R5.5 ran just fine...it's probably some odd kernel issue. I'm not so sure that trying a different kernel would help, since KM/LinHES/MythTV is so tightly tied to the kernel.

Can LinHES survive changing to a different/newer kernel version? I know that KM cannot. :(

In perspective....I have an OLD dual PentiumPro box that will NOT run any of the Linux 2.6 SMP kernels. Not sure, but since the mobo is 13 years old, it's not worth tracking down the issue. It works just fine with the 2.4 kernels. Sometimes 'giving up' saves your sanity...and a LOT of time! What else peeves me, is that you make a change, reboot, wait...make another change, reboot, wait. Most of my time is spent waiting for the reboots.


 Post subject: Almost solved--I hope!
PostPosted: Wed Nov 11, 2009 8:33 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
After much frustration I finally captured an error tonight. I don't know why I didn't think of this before, but I started an HD program and then pressed CTRL+ALT+F1 to switch to the first virtual console to see if there was any output. I came back later and saw:

Code:
-CPU2: Machine Check Exception: 0000000000000005
CPU0: Machine Check Exception: 0000000000000004
CPU0: Bank 0: 3200004000000800
CPU0: Bank 5: 3200001004000e0f
Kernel Panic--not syncing: CPU Context corrupt
CPU2: Bank 0: 3200004000000800
CPU2: Bank 5: 3200001004000e0f
Kernel Panic--not syncing: CPU Context corrupt

(Hand copied so might be a misplaced digit here or there; I have a photo on my phone if anyone wants to see it.)
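
For anyone trying to read values like these, here is a rough Python sketch that splits a bank status word into the architectural flag bits described in Intel's machine-check documentation (VAL, OVER, UC, EN, MISCV, ADDRV, PCC) plus the two 16-bit error code fields. It assumes the "Bank N:" numbers are IA32_MCi_STATUS values and that the number after "Machine Check Exception:" is the global MCG_STATUS value; the function names are my own. It does not interpret the error codes any further.

```python
# Architectural flag bits in the upper part of an IA32_MCi_STATUS value
STATUS_FLAGS = {
    63: "VAL",    # register holds valid error information
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds extra information
    58: "ADDRV",  # ADDR register holds an address
    57: "PCC",    # processor context corrupt
}

# Flag bits in the global MCG_STATUS value
MCG_FLAGS = {2: "MCIP", 1: "EIPV", 0: "RIPV"}


def _set_flags(value, table):
    """Names of the flags whose bit is set, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if (value >> bit) & 1]


def decode_bank_status(value):
    """Split one bank status word into flags and the two error code fields."""
    return {
        "flags": _set_flags(value, STATUS_FLAGS),
        "mca_error_code": value & 0xFFFF,               # bits 15:0
        "model_specific_code": (value >> 16) & 0xFFFF,  # bits 31:16
    }


def decode_mcg_status(value):
    """Flags set in the global machine-check status word."""
    return _set_flags(value, MCG_FLAGS)
```

Run on the hand-copied Bank 0 value, `decode_bank_status(0x3200004000000800)` reports UC, EN and PCC set with an MCA error code of 0x0800. PCC (processor context corrupt) lines up with the "CPU Context corrupt" panic text. The VAL bit comes out clear, which would be unusual for a reported error, so a digit may indeed be off in the copy.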

I don't know what this error means yet, but I'm hoping a replacement CPU will eliminate the problem. I'm off to do some research on the specific error and then off to NewEgg to see if my current CPU is under warranty.


 Post subject:
PostPosted: Fri Nov 13, 2009 10:23 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
I feel like I'm overposting here but just want to keep it updated in case it helps someone. It turns out the error I captured wasn't much more help than the flashing lights, although I finally learned I was having a kernel panic and not just an "oops."

Some on the net point right to the RAM, which I feel like I've ruled out here. Others point to bad capacitors which I admit I haven't checked yet. Others link it to SATA, XFS and JFS, BIOS, NIC and network traffic--many things that aren't strictly hardware problems. Since I'm close to exercising whatever warranty these parts have, I figured it can't hurt to try new software suggestions. I updated the BIOS and have not experienced a crash in nearly 48 hours. The last 2 nights I've started an HD program before going to bed and in the morning the program had played through without crashing the machine. It's not definitive since a lack of crash doesn't mean another one's not around the corner. If I don't get another crash by the one-week point I'll post back here. So far I think 48 hours is the longest I've gone without a kernel panic so I'm hopeful.

If the crashes keep coming, then I'll take a hard look at the mobo capacitors to see if I have the dreaded "bad caps" problem. After 4 months of random reboots in 5.5 and kernel panics in R6, I'd be happy to find a physical defect that could be repaired or replaced. In the meantime I'm crossing my fingers hoping the BIOS update fixed it. BTW, since I never explicitly answered, I am not overclocking at all. Thanks for the suggestions along the way.


 Post subject:
PostPosted: Sat Nov 14, 2009 7:29 am 
Offline
Joined: Thu Mar 02, 2006 5:42 pm
Posts: 410
Location: middleton wi usa atsc
Keep posting. Enquiring minds want to know! :)


 Post subject:
PostPosted: Sat Nov 14, 2009 6:42 pm 
Offline
Joined: Wed Mar 07, 2007 9:51 am
Posts: 173
Location: Uniontown, PA
BIOS Update....

I hope that works out for you. It was probably a firmware 'timing' issue.

Crosses Fingers.

Ya gotta keep at it....It's a mind job, but the rewards are awesome. :)


 Post subject:
PostPosted: Sat Nov 21, 2009 11:54 pm 
Offline
Joined: Wed Apr 12, 2006 3:05 pm
Posts: 252
Location: GA, USA
I just realized I passed the 1-week mark yesterday and didn't post. So far I haven't had a single lockup since I updated the BIOS. YMMV but apparently that was at least part of my problem all along. Just because life doesn't want to make itself too easy for me, I had a concurrent problem where many recorded programs were not available to play when I tried watching them. I was afraid my almost-new SATA HD was already dying on me; turns out my cable company moved channels around without announcing it so all those misfires were nonexistent or 0-byte files due to no signal on the channel.

I don't want to jinx myself but I think I've gone from panicking about having to drop MythTV to the everyday bugs and glitches. Hopefully it will keep recording strong while I'm on my 6-month military vacation out of country starting 2 months from today (not that I'm keeping track.) The wife will appreciate being able to watch TV while I'm gone.

