Installing CentOS 7 on a T420s throws a "BUG: soft lockup - CPU#3" error. What is the problem?

BUG: soft lockup - CPU#8 stuck for 22s!
Greetings, all.
Just wanted to drop this out there to see if it rang any bells.
I've been getting a soft lockup (numad thread stuck on a cpu
while attempting to attach a task to a cgroup) for a while now,
but I thought it was only happening when I applied Mel Gorman's
set of AutoNUMA patches. Today, however, it happened on a stock
3.12rc3 kernel as well, so it is in the baseline. And before
anyone asks, I wanted to make sure directed numa activities
such as numad would do interacted safely with the AutoNUMA
stuff so that's why I was running with both enabled.
I believe this started in the 3.11 timeframe (and I'll try to
bisect to narrow things down).
The problem/reproduction environment is:
    + Centos 6.4
    /* The next three lines are to get numad running */
    + mkdir /cgroup/cpuset
    + mount cgroup -t cgroup -o cpuset /cgroup/cpuset
    + service numad start
    + loop running the AutoNUMA tests available at:
      git://gitorious.org/autonuma-benchmark/autonuma-benchmark.git
How long it takes to hit this varies -- since it looks like it
is not due to Mel's changes at all, a stress test for cgroup
interactions would likely kick it faster (anyone care to point
me at one?).
/var/log/messages output attached, trimmed to just one boot+instance
of the problem.
Oct 22 11:05:10 hornet2 kernel: BUG: soft lockup - CPU#8 stuck for 22s!
[numad:27384]
Oct 22 11:05:10 hornet2 kernel: Modules linked in: ebtable_nat ebtables
xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc 8021q garp stp llc
ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
ip6table_filter ip6_tables ipv6 ext2 vhost_net macvtap macvlan vhost tun
kvm_intel kvm uinput hp_wmi sparse_keymap rfkill snd_usb_audio
snd_usbmidi_lib snd_rawmidi acpi_cpufreq freq_table iTCO_wdt
iTCO_vendor_support sg microcode serio_raw pcspkr sb_edac edac_core wmi
i2c_i801 lpc_ich mfd_core xhci_hcd e1000e ptp pps_core ioatdma dca
snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
snd_page_alloc ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif
crct10dif_common firewire_ohci firewire_core crc_itu_t ahci libahci
pata_acpi ata_generic isci libsas scsi_transport_sas radeon ttm
drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log
Oct 22 11:05:10 hornet2 kernel: CPU: 8 PID: 27384 Comm: numad Not
tainted 3.12.0-rc3+ #1
Oct 22 11:05:10 hornet2 kernel: Hardware name: Hewlett-Packard HP Z620
Workstation/158A, BIOS J61 v03.15 05/09/2013
Oct 22 11:05:10 hornet2 kernel: task: ffffc0 ti:
ffff task.ti: ffff
Oct 22 11:05:10 hornet2 kernel: RIP: 0010:[<ffffffff8154256c>]
[<ffffffff8154256c>] _raw_read_lock+0xc/0x20
Oct 22 11:05:10 hornet2 kernel: RSP: 0018:ffffcc8 EFLAGS:
Oct 22 11:05:10 hornet2 kernel: RAX: 0000 RBX:
ffffffff81117b52 RCX: ffff880c073ca6e8
Oct 22 11:05:10 hornet2 kernel: RDX: ffff880c12a1f040 RSI:
ffff RDI: ffffffff81a46cc8
Oct 22 11:05:10 hornet2 kernel: RBP: ffffcc8 R08:
ffff880c2fc55ba0 R09: ffff880c030fa000
Oct 22 11:05:10 hornet2 kernel: R10: f4f0 R11:
f000 R12: 0d0f
Oct 22 11:05:10 hornet2 kernel: R13: ffffce8 R14:
0030 R15: ffff
Oct 22 11:05:10 hornet2 kernel: FS: )
GS:ffff880c2fc) knlGS:0000
Oct 22 11:05:10 hornet2 kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
Oct 22 11:05:10 hornet2 kernel: CR2: ffffffffff600400 CR3:
7000 CR4: 07e0
Oct 22 11:05:10 hornet2 kernel: Stack:
Oct 22 11:05:10 hornet2 kernel: ffffce8 ffffffff810c0ee9
ffff880c0f5c
Oct 22 11:05:10 hornet2 kernel: ffffda8 ffffffff810c4896
Oct 22 11:05:10 hornet2 kernel: 940e 0000
Oct 22 11:05:10 hornet2 kernel: Call Trace:
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c0ee9>]
task_cgroup_from_root+0x29/0xa0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4896>]
cgroup_attach_task+0xe6/0x3b0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4ccf>]
attach_task_by_pid+0x16f/0x1b0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4d26>]
cgroup_tasks_write+0x16/0x20
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c1b3c>]
cgroup_write_X64+0xec/0x150
Oct 22 11:05:10 hornet2 kernel: [<ffffffff81210cb3>] ?
security_file_permission+0x23/0x90
Oct 22 11:05:10 hornet2 kernel: [<ffffffff810c4338>]
cgroup_file_write+0x58/0xc0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff>] ?
file_start_write+0x33/0x40
Oct 22 11:05:10 hornet2 kernel: [<ffffffff81178c68>] vfs_write+0xc8/0x170
Oct 22 11:05:10 hornet2 kernel: [<ffffffff8117927f>] SyS_write+0x5f/0xb0
Oct 22 11:05:10 hornet2 kernel: [<ffffffff8154af92>]
system_call_fastpath+0x16/0x1b
Oct 22 11:05:10 hornet2 kernel: Code: 17 b8 01 00 00 00 ff ca 78 05 c9
c3 0f 1f 00 f0 ff 07 30 c0 c9 c3 66 0f 1f 84 00 00 00 00 00 55 48 89 e5
66 66 66 66 90 f0 ff 0f <79> 05 e8 bd f0 d2 ff c9 c3 66 66 2e 0f 1f 84
00 00 00 00 00 55
Thanks in advance for any input/interest.
Don Morris
cgroup_hang_trimmed (366K)
Re: BUG: soft lockup - CPU#8 stuck for 22s!
On Tue, Oct 22, 2013 at 01:29:22PM -0400, Don Morris wrote:
> Greetings, all.
> Just wanted to drop this out there to see if it rang any bells.
> I've been getting a soft lockup (numad thread stuck on a cpu
> while attempting to attach a task to a cgroup) for a while now,
> but I thought it was only happening when I applied Mel Gorman's
> set of AutoNUMA patches. Today, however, it happened on a stock
> 3.12rc3 kernel as well, so it is in the baseline. And before
> anyone asks, I wanted to make sure directed numa activities
> such as numad would do interacted safely with the AutoNUMA
> stuff so that's why I was running with both enabled.
> I believe this started in the 3.11 timeframe (and I'll try to
> bisect to narrow things down).
> The problem/reproduction environment is:
>     + Centos 6.4
>     /* The next three lines are to get numad running */
>     + mkdir /cgroup/cpuset
>     + mount cgroup -t cgroup -o cpuset /cgroup/cpuset
>     + service numad start
>     + loop running the AutoNUMA tests available at:
>       git://gitorious.org/autonuma-benchmark/autonuma-benchmark.git
> How long it takes to hit this varies -- since it looks like it
> is not due to Mel's changes at all, a stress test for cgroup
> interactions would likely kick it faster (anyone care to point
> me at one?).
I ran this a few times in different configurations and was unable to
reproduce the problem. numad is certainly running because I can see
its effect.
> /var/log/messages output attached, trimmed to just one boot+instance
> of the problem.
> Oct 22 11:05:10 hornet2 kernel: BUG: soft lockup - CPU#8 stuck for 22s!
> [numad:27384]
> Oct 22 11:05:10 hornet2 kernel: Modules linked in: ebtable_nat ebtables
> xt_CHECKSUM iptable_mangle bridge autofs4 sunrpc 8021q garp stp llc
> ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables
> ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack
> ip6table_filter ip6_tables ipv6 ext2 vhost_net macvtap macvlan vhost tun
> kvm_intel kvm uinput hp_wmi sparse_keymap rfkill snd_usb_audio
> snd_usbmidi_lib snd_rawmidi acpi_cpufreq freq_table iTCO_wdt
> iTCO_vendor_support sg microcode serio_raw pcspkr sb_edac edac_core wmi
> i2c_i801 lpc_ich mfd_core xhci_hcd e1000e ptp pps_core ioatdma dca
> snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec
> snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore
> snd_page_alloc ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif
> crct10dif_common firewire_ohci firewire_core crc_itu_t ahci libahci
> pata_acpi ata_generic isci libsas scsi_transport_sas radeon ttm
> drm_kms_helper drm i2c_algo_bit i2c_core dm_mirror dm_region_hash dm_log
> Oct 22 11:05:10 hornet2 kernel: CPU: 8 PID: 27384 Comm: numad Not
> tainted 3.12.0-rc3+ #1
> Oct 22 11:05:10 hornet2 kernel: Hardware name: Hewlett-Packard HP Z620
> Workstation/158A, BIOS J61 v03.15 05/09/2013
> Oct 22 11:05:10 hornet2 kernel: task: ffffc0 ti:
> ffff task.ti: ffff
> Oct 22 11:05:10 hornet2 kernel: RIP: 0010:[<ffffffff8154256c>]
> [<ffffffff8154256c>] _raw_read_lock+0xc/0x20
I assume it's the css_set_lock that is causing the problem. Someone
somewhere has gone to sleep forever holding that lock or there is an
error path that is not releasing it. Does sysrq-t reveal what might have
gone asleep with the lock held? None of the processes currently running
looked like obvious candidates.
Mel Gorman
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to
More majordomo info at ... Please read the FAQ at ...
Re: BUG: soft lockup - CPU#8 stuck for 22s!
On Tue, Oct 22, 2013 at 01:29:22PM -0400, Don Morris wrote:
> Greetings, all.
> Just wanted to drop this out there to see if it rang any bells.
> I've been getting a soft lockup (numad thread stuck on a cpu
> while attempting to attach a task to a cgroup) for a while now,
> but I thought it was only happening when I applied Mel Gorman's
> set of AutoNUMA patches.
This maybe?
---8<---
mm: memcontrol: Release css_set_lock when aborting an OOM scan

css_task_iter_start acquires the css_set_lock and it must be released with
a call to css_task_iter_end. Commit 9cbb78bb (mm, memcg: introduce own
oom handler to iterate only over its own threads) introduced a loop that
was not guaranteed to call css_task_iter_end.

Cc: stable
Signed-off-by: Mel Gorman

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5efd 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -95,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
 		mem_cgroup_iter_break(memcg, iter);
 		if (chosen)
 			put_task_struct(chosen);
+		css_task_iter_end(&it);
 	case OOM_SCAN_OK:
Re: BUG: soft lockup - CPU#8 stuck for 22s!
On 11/04/2013, Mel Gorman wrote:
> On Tue, Oct 22, 2013 at 01:29:22PM -0400, Don Morris wrote:
> > Greetings, all.
> > Just wanted to drop this out there to see if it rang any bells.
> > I've been getting a soft lockup (numad thread stuck on a cpu
> > while attempting to attach a task to a cgroup) for a while now,
> > but I thought it was only happening when I applied Mel Gorman's
> > set of AutoNUMA patches.
> This maybe?
Certainly would make sense. My appreciation for taking a look.
I happen to be on the road today, however -- and away from the
reproduction environment. I'll give it a shot tomorrow morning
and either let you know if it fixes things or report the sysrq-t
output you requested.
Again, my thanks!
Don Morris
> ---8<---
> mm: memcontrol: Release css_set_lock when aborting an OOM scan
> css_task_iter_start acquires the css_set_lock and it must be released with
> a call to css_task_iter_end. Commit 9cbb78bb (mm, memcg: introduce own
> oom handler to iterate only over its own threads) introduced a loop that
> was not guaranteed to call css_task_iter_end.
> Cc: stable
> Signed-off-by: Mel Gorman
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5efd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -95,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		mem_cgroup_iter_break(memcg, iter);
>  		if (chosen)
>  			put_task_struct(chosen);
> +		css_task_iter_end(&it);
>  	case OOM_SCAN_OK:
kernel, n:
A part of an operating system that preserves the medieval traditions
of sorcery and black art.
Re: BUG: soft lockup - CPU#8 stuck for 22s!
On Mon, 4 Nov 2013, Mel Gorman wrote:
> This maybe?
> ---8<---
> mm: memcontrol: Release css_set_lock when aborting an OOM scan
> css_task_iter_start acquires the css_set_lock and it must be released with
> a call to css_task_iter_end. Commit 9cbb78bb (mm, memcg: introduce own
> oom handler to iterate only over its own threads) introduced a loop that
> was not guaranteed to call css_task_iter_end.
> Cc: stable
> Signed-off-by: Mel Gorman
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 5efd 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -95,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
>  		mem_cgroup_iter_break(memcg, iter);
>  		if (chosen)
>  			put_task_struct(chosen);
> +		css_task_iter_end(&it);
>  	case OOM_SCAN_OK:
What tree is this?
I'm afraid I don't understand this at all, I thought css_task_iter_end()
was added to take over for cgroup_task_iter_end() and
mem_cgroup_out_of_memory() was modified with 72ec7029937f ("cgroup: make
task iterators deal with cgroup_subsys_state instead of cgroup")
correctly.  Why do we need to call css_task_iter_end() twice with your
patch?
Re: BUG: soft lockup - CPU#8 stuck for 22s!
On Wed, Nov 06, 2013 at 04:30:05PM -0800, David Rientjes wrote:
> On Mon, 4 Nov 2013, Mel Gorman wrote:
> > This maybe?
> > ---8<---
> > mm: memcontrol: Release css_set_lock when aborting an OOM scan
> > css_task_iter_start acquires the css_set_lock and it must be released with
> > a call to css_task_iter_end. Commit 9cbb78bb (mm, memcg: introduce own
> > oom handler to iterate only over its own threads) introduced a loop that
> > was not guaranteed to call css_task_iter_end.
> > Cc: stable
> > Signed-off-by: Mel Gorman
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5efd 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -95,7 @@ static void mem_cgroup_out_of_memory(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  		mem_cgroup_iter_break(memcg, iter);
> >  		if (chosen)
> >  			put_task_struct(chosen);
> > +		css_task_iter_end(&it);
> >  	case OOM_SCAN_OK:
> What tree is this?
> I'm afraid I don't understand this at all, I thought css_task_iter_end()
> was added to take over for cgroup_task_iter_end() and
> mem_cgroup_out_of_memory() was modified with 72ec7029937f ("cgroup: make
> task iterators deal with cgroup_subsys_state instead of cgroup")
> correctly.  Why do we need to call css_task_iter_end() twice with your
> patch?
I screwed up, patch is broken. I'll recheck for imbalances in the
handling of css_set_lock. Sorry for the noise.
Mel Gorman
Basically, whenever installing VMware Tools fails with an error, first record the error message.
For example, what I hit this time was:
/tmp/modconfig-8mD7iy/vmhgfs-only/page.c:1625:23: error: too many arguments to function 'wait_on_bit'
TASK_UNINTERRUPTIBLE);
The exact error differs from case to case, but the way to handle it is basically the same.
First run uname -a to check the CentOS kernel version:
Linux localhost.localdomain 3.10.0-514.2.2.el7.x86_64 #1 SMP Tue Dec 6 23:06:41 UTC
x86_64 x86_64 GNU/Linux
You can see mine is 3.10.0.
Next, go to the VMware Tools installation directory and find the vmhgfs.tar archive.
In the extracted vmhgfs-only folder, open the file named in the error, in this example line 1625 of page.c.
Change the kernel-version check just above that line from 3,17,0 to your own kernel version, here 3,10,0.
Then rerun the installer.
When the error message differs, read it carefully: the message usually hints at the cause, and the fix should follow from that cause.