Koudink
Author: pharook (kobayashi maru) on 'Koudink'
Time: Mon 02.12.2013 15:49.27
Subject: An unusual OOM killer and unexpected NUMA within a single machine

                                                                                 
Hello,
 
   some machines with newer processors behave as NUMA within a single box,
and thanks to some peculiar heuristics, unexpected OOM kills can occur even
when there is plenty of memory.  This happened a while ago, but since such
machines will probably become more common and the kernel still has not
solved it completely, it may come in handy to someone.
 
   On one machine here we found that the OOM killer was murdering various
processes from time to time, with no clear connection between them.
 
Aug 26 17:22:49 mentat kernel: [11859102.984747]
 mentat-wardenin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug 27 16:34:32 mentat kernel: [11942606.528269]
 mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
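
   To see which processes keep triggering the OOM killer, the log can be
scanned for the "invoked oom-killer" message.  The following is only a rough
sketch in Python: the regular expression is modelled on the two log lines
above, and the function name and SAMPLE text are made up for illustration.
Other kernel versions may format the message differently.

```python
import re

# Pattern modelled on the "invoked oom-killer" lines quoted in this post.
# Assumption: other kernels may print additional or different fields.
OOM_RE = re.compile(
    r"(?P<proc>\S+) invoked oom-killer: "
    r"gfp_mask=(?P<gfp>0x[0-9a-fA-F]+), order=(?P<order>\d+)"
)


def parse_oom_lines(text):
    """Return a list of (process, gfp_mask, order) tuples found in text."""
    events = []
    for m in OOM_RE.finditer(text):
        events.append(
            (m.group("proc"), int(m.group("gfp"), 16), int(m.group("order")))
        )
    return events


# The two log entries from this post, wrapped as they appear above.
SAMPLE = """\
Aug 26 17:22:49 mentat kernel: [11859102.984747]
 mentat-wardenin invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
Aug 27 16:34:32 mentat kernel: [11942606.528269]
 mongod invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0
"""

if __name__ == "__main__":
    for proc, gfp, order in parse_oom_lines(SAMPLE):
        print(f"{proc}: gfp_mask={gfp:#x}, order={order}")
```

   Note that order=0 means a request for a single page, so these were not
large contiguous allocations that failed.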
 
   What put us on the right track was a message from Mongo:
 
Wed Aug 28 12:42:04.337 [initandlisten] ** WARNING: You are running on a NUMA
machine.
 
   I had assumed that NUMA (Non-Uniform Memory Access) machines were farms
of independent machines with shared memory.  It seems that is no longer the
case: NUMA is used on today's ordinary multiprocessor architectures, where
each processor has "easier" access to a certain part of memory:
 
  root@mentat:~# numactl --hardware
  available: 2 nodes (0-1)
  node 0 cpus: 0 2 3 4 5 6
  node 0 size: 16374 MB
  node 0 free: 117 MB
  node 1 cpus: 1 7 8 9 10 11
  node 1 size: 16384 MB
  node 1 free: 161 MB
  node distances:
  node   0   1
    0:  10  16
    1:  16  10
 
  (The distances give the relative "cost" of accessing a memory block.)
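
   The per-node figures can also be checked by a script.  Below is a
minimal sketch in Python that parses text in the format shown above and
lists the nodes whose free memory is below a threshold.  The function names
and the 5% default threshold are assumptions made for this example; a low
"free" figure alone is common on Linux because the page cache uses
otherwise idle memory, so treat the result as a hint only.

```python
import re


def parse_numactl_hardware(text):
    """Parse `numactl --hardware` output into
    {node_id: {"size_mb": int, "free_mb": int}}."""
    nodes = {}
    for m in re.finditer(r"node (\d+) (size|free): (\d+) MB", text):
        node, key, value = int(m.group(1)), m.group(2), int(m.group(3))
        nodes.setdefault(node, {})[key + "_mb"] = value
    return nodes


def nodes_low_on_memory(nodes, min_free_fraction=0.05):
    """Return sorted ids of nodes whose free memory is below the given
    fraction of the node size (the threshold is an arbitrary assumption)."""
    return sorted(
        n for n, info in nodes.items()
        if info["free_mb"] < min_free_fraction * info["size_mb"]
    )


# Output copied from the machine in this post.
SAMPLE = """\
available: 2 nodes (0-1)
node 0 cpus: 0 2 3 4 5 6
node 0 size: 16374 MB
node 0 free: 117 MB
node 1 cpus: 1 7 8 9 10 11
node 1 size: 16384 MB
node 1 free: 161 MB
node distances:
node   0   1
  0:  10  16
  1:  16  10
"""

if __name__ == "__main__":
    parsed = parse_numactl_hardware(SAMPLE)
    print(parsed)
    print("low on free memory:", nodes_low_on_memory(parsed))
```

   On a live system the text would come from running numactl itself, for
example via subprocess.run(["numactl", "--hardware"], capture_output=True,
text=True).stdout.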
 
Mongo documentation:
docs.mongodb.org/manual/administration/production-notes/#production-numa
 
And then Jeremy Cole:
blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/
 
And also Robert Chase:
oracle.com/technetwork/articles/servers-storage-admin/oom-killer-1911807.html
 
  A database thread's preference for the memory block closest to its
processor can lead Linux to actively swap data out of that block in order
to honor the preference.  That hurts database performance more than if the
kernel allocated memory arbitrarily, or rather roughly evenly from both the
nearer and the more distant blocks.  What is more interesting, though:
 
Jeremy Cole:
 
> Using large-pages will keep mysqld from being swapped out, but it won't
> keep the system from swapping something out.  And, if it really needs
> memory on a particular node, and can't swap out pages for mysqld, it will
> swap something else which might be more important to the function of the
> system, fail the allocation, or *OOM-kill a process* (likely mysqld).
 
Robert Chase:
 
> Many NUMA architecture-based systems *can experience OOM conditions*
> because of one node running out of memory triggering an OOM in the kernel
> while plenty of memory is left in the remaining nodes.  More information
> about OOM conditions on machines that have the NUMA architecture can be
> found in the "See Also" section of this article.
 
From the /proc/sys/vm documentation
(https://www.kernel.org/doc/Documentation/sysctl/vm.txt):
 
  zone_reclaim_mode:
 
  Zone_reclaim_mode allows someone to set more or less aggressive approaches
  to reclaim memory when a zone runs out of memory.  If it is set to zero
  then no zone reclaim occurs.  Allocations will be satisfied from other
  zones / nodes in the system.
 
  This is value ORed together of
 
  1    = Zone reclaim on
  2    = Zone reclaim writes dirty pages out
  4    = Zone reclaim swaps pages
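
Since the value is a bitmask, a given number can be decoded bit by bit.  A
small sketch in Python, using only the three flags listed in the excerpt
above (the function and table names are made up for this example):

```python
# Flag values as listed in the vm.txt excerpt quoted in this post.
ZONE_RECLAIM_FLAGS = {
    1: "zone reclaim on",
    2: "zone reclaim writes dirty pages out",
    4: "zone reclaim swaps pages",
}


def describe_zone_reclaim_mode(value):
    """Return the descriptions of all flags set in a zone_reclaim_mode value."""
    if value == 0:
        return ["no zone reclaim"]
    return [
        desc for bit, desc in sorted(ZONE_RECLAIM_FLAGS.items())
        if value & bit
    ]


if __name__ == "__main__":
    for v in (0, 1, 5, 7):
        print(v, "->", ", ".join(describe_zone_reclaim_mode(v)))
```

For example, a value of 5 (1 | 4) means that zone reclaim is on and may swap
pages, but does not write dirty pages out.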
 
  We probably ran into this because the machine is an Opteron (176HE, 2.4,
6MB), but according to Jeremy Cole other processor families should have a
similar architecture as well.
 
> The new architecture for multiple processors, starting with AMD's Opteron
> and Intel's Nehalem processors (we'll call these "modern PC CPUs"), is a
> Non-Uniform Memory Access (NUMA) architecture, or more correctly
> Cache-Coherent NUMA (ccNUMA).
 
   A colleague also came across a note at http://www.poempelfox.de/blog/2010/03/
 
> There is hope the kernel developers might fix or have fixed this feature
> already: The changelog for 2.6.30.1 lists several bugfixes for it, but the
> author doesn't seem to be sure to have catched them all, and asks for
> bugreports if problems still arise.
 
   So if your machines running memory-intensive applications (Mongo,
PostgreSQL, MySQL) behave strangely, try numactl, and try turning off
zone_reclaim (for example via sysctl.conf).
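
   A quick way to check the current setting is to read it from /proc.  The
sketch below (Python, with function names invented for this example) reads
/proc/sys/vm/zone_reclaim_mode, which should exist on NUMA-enabled kernels,
and prints the line one would put in sysctl.conf to turn zone reclaim off.
It only reads the value and changes nothing on the system.

```python
def parse_zone_reclaim_mode(text):
    """Convert the content of the zone_reclaim_mode file to an int."""
    return int(text.strip())


def read_zone_reclaim_mode(path="/proc/sys/vm/zone_reclaim_mode"):
    """Return the current value, or None if the file is missing or unreadable
    (for example on a kernel built without NUMA support)."""
    try:
        with open(path) as f:
            return parse_zone_reclaim_mode(f.read())
    except (OSError, ValueError):
        return None


def sysctl_conf_line(value=0):
    """Return the sysctl.conf line that sets zone_reclaim_mode to value."""
    return f"vm.zone_reclaim_mode = {value}"


if __name__ == "__main__":
    mode = read_zone_reclaim_mode()
    if mode is None:
        print("zone_reclaim_mode is not available on this system")
    elif mode == 0:
        print("zone reclaim is already off")
    else:
        print(f"zone_reclaim_mode is {mode}; to disable, add to sysctl.conf:")
        print(sysctl_conf_line(0))
```

   After editing sysctl.conf, the setting can be applied without a reboot
by running sysctl -p as root.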
 
 
____________________________________________________________________pharook_
"The Moon is more important than the Sun," said the child. "Because it
shines when it's dark."
 
