Skip site navigation (1)Skip section navigation (2)
Date:      Wed, 17 Dec 2008 15:57:13 -0500
From:      Paul MacKenzie <paul@elehost.com>
To:        freebsd-stable@freebsd.org
Subject:   Re: 7.1-PRERELEASE: arcmsr write performance problem
Message-ID:  <494967A9.3000503@elehost.com>

next in thread | raw e-mail | index | archive | help

> > [ns8]# vmstat -i
> > interrupt                          total       rate
> > irq4: sio0                         57065          0
> > irq17: em1                    3989494045        554
> > irq18: arcmsr0                 558098657         77
> > cpu0: timer                  14381393929       2000
> > irq256: em0                     22763077          3
> > cpu1: timer                  14381384902       1999
> > Total                        33333191675       4635
> > [ns8]#
> >
> > arcmsr0: <Areca SATA Host Adapter RAID Controller
>   
>> > > mem 0xe8600000-0xe8600fff,0xe8000000-0xe83fffff irq 18 at device
>>     
> > 14.0 on pci2
> > ARECA RAID ADAPTER0: Driver Version 1.20.00.15 2007-10-07
> > ARECA RAID ADAPTER0: FIRMWARE VERSION V1.43 2007-4-17
> > arcmsr0: [ITHREAD]
> > .....
> > Waiting 5 seconds for SCSI devices to settle
> > (probe16:arcmsr0:0:16:0): inquiry data fails comparison at DV1 step
> > da0 at arcmsr0 bus 0 target 0 lun 0
> > da0: <Areca ARC-1210-VOL#00 R001> Fixed Direct Access SCSI-5 device
> > da0: 166.666MB/s transfers (83.333MHz DT, offset 32, 16bit)
> > da0: 305175MB (624999424 512 byte sectors: 255H 63S/T 38904C)
> > SMP: AP CPU #1 Launched!
>   

Hi and thanks for your reply.

I do not believe the interrupts are the problem at the moment as the
stats. Here is a vmstat when the system usage is spiking and just before
http needs to be killed to get going again most recently.

vmstat -i
interrupt                          total       rate
irq1: atkbd0                           2          0
irq4: sio0                         22880          0
irq14: ata0                           58          0
irq22: uhci1 uhci3                 18068          0
irq23: uhci0 uhci+                     1          0
irq26: arcmsr0                    496094         14
cpu0: timer                     61769334       1791
irq256: em0                            1          0
irq258: em2                        48505          1
irq259: em3                            1          0
cpu1: timer                     61762043       1791
cpu3: timer                     61299367       1777
cpu2: timer                     61299179       1777
cpu4: timer                     61326132       1778
cpu7: timer                     60845245       1764
cpu5: timer                     61326513       1778
cpu6: timer                     60845018       1764
Total                          491058441      14243

There are no errors en the event console for the areca-cli.

ARC-1130-VOL#00  Main Raid Array Raid1+0 1000.0GB 00/00/00   Normal
Main Raid Array      4 2000.0GB    0.0GB 1234               Normal

Main Processor     : 500MHz
CPU ICache Size    : 32KB
CPU DCache Size    : 32KB
CPU SCache Size    : 0KB
System Memory      : 1024MB/333MHz/ECC
Firmware Version   : V1.44 2008-2-1 BOOT ROM Version   : V1.44 2008-1-28

The buildworld taking a really long time was just an example of the
problem I am seeing that is easy to quantify.  If I run boxbackup, dump,
clamscan  or a few other IO intensive  everything gets VERY slow even
when reading files from the server.

When the HTTP locks up (another  issue that is seen and is connected to
the same issue in my view) this is what it looks like. It is almost as
if the http gets backed up from what I can tell and I need a plunger to
clean out the blockage :) I have to kill it and then restart it to get
things back to normal for a bit.

last pid: 46013;  load averages: 105.30, 67.67,
34.45                                               up 4+23:59:42  19:08:40
629 processes: 89 running, 540 sleeping
CPU: 21.9% user,  0.0% nice, 74.5% system,  3.1% interrupt,  0.4% idle
Mem: 1538M Active, 11G Inact, 898M Wired, 303M Cache, 214M Buf, 1346M Free
Swap: 8192M Total, 1036K Used, 8191M Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU
COMMAND
46000 www              1  65    0 86728K 15008K RUN    1   0:01 12.06% httpd
45994 www              1  56    0 86728K 15008K CPU1   3   0:01 10.16% httpd
46002 www              1  -4    0   150M 20648K RUN    3   0:00  6.98% httpd
45195 www              1  68    0   121M 19748K RUN    1   0:29  6.88% httpd
45991 www              1  53    0   150M 21060K select 3   0:01  6.59% httpd
45997 www              1  -4    0   150M 20992K ufs    5   0:01  6.59% httpd
45950 www              1  57    0   153M 23388K RUN    2   0:01  6.49% httpd
45999 www              1  -4    0   150M 20640K ufs    6   0:00  5.96% httpd
45189 www              1  66    0   161M 29660K RUN    6   0:26  5.76% httpd
45974 www              1  -4    0   151M 21564K ufs    3   0:01  5.76% httpd
45984 www              1  -4    0   151M 21376K ufs    5   0:01  5.66% httpd
45998 www              1  -4    0   150M 20652K ufs    3   0:00  5.57% httpd
45780 www              1  -4    0   153M 23516K ufs    6   0:06  5.37% httpd
45972 www              1  -4    0   151M 21332K RUN    4   0:01  5.37% httpd
46001 www              1  20    0   150M 20568K lockf  4   0:00  5.37% httpd
45425 www              1  60    0   164M 31516K RUN    7   0:15  5.18% httpd
45995 www              1  63    0   150M 20820K RUN    2   0:00  5.18% httpd
45845 www              1  -4    0   151M 21692K ufs    6   0:02  4.98% httpd
45854 www              1  52    0   151M 22080K CPU6   0   0:02  4.88% httpd
45977 root             1  47    0 10160K  3260K CPU2   6   0:01  4.88% top
45509 www              1  56    0   155M 25028K RUN    0   0:14  4.79% httpd
45735 www              1  -4    0   158M 27096K RUN    3   0:07  4.79% httpd
45730 www              1  20    0   151M 21728K lockf  2   0:04  4.79% httpd
45847 www              1  -4    0   150M 21092K RUN    5   0:02  4.69% httpd
85338 root             1  46    0   150M 20560K select 7   5:03  4.59% httpd
45835 www              1  -4    0   150M 21100K ufs    0   0:02  4.59% httpd
45443 www              1  -4    0   151M 22220K ufs    6   0:12  4.49% httpd
45699 www              1  -4    0   157M 26528K RUN    0   0:07  4.39% httpd
45722 www              1  -4    0   152M 22908K RUN    0   0:05  4.39% httpd
45701 www              1  -4    0   152M 22268K RUN    2   0:07  4.30% httpd
45849 www              1  -4    0   151M 21748K ufs    6   0:02  4.30% httpd
46010 nagios           1  -4    0  7684K  1136K ufs    5   0:00  4.30%
check_ping
45515 www              1  -4    0   160M 29048K ufs    5   0:13  4.20% httpd
45606 www              1  -4    0   152M 22420K ufs    0   0:09  4.20% httpd

vfs.numvnodes: 355382
kern.maxvnodes: 400000
vfs.ufs.dirhash_docheck: 0
vfs.ufs.dirhash_mem: 3239015
vfs.ufs.dirhash_maxmem: 10485760
vfs.ufs.dirhash_minsize: 2560
kern.ipc.nsfbufs: 0
kern.openfiles: 3395
kern.maxfiles: 12328

Results from netstat -m
------------------------
1185/3360/4545 mbufs in use (current/cache/total)
1062/2856/3918/25600 mbuf clusters in use (current/cache/total/max)
1062/1556 mbuf+clusters out of packet secondary zone in use (current/cache)
10/1550/1560/12800 4k (page size) jumbo clusters in use
(current/cache/total/max)
0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
2460K/12752K/15212K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0/0/0 sfbufs in use (current/peak/max)
0 requests for sfbufs denied
0 requests for sfbufs delayed
46262 requests for I/O initiated by sendfile
0 calls to protocol drain routines

Results from vmstat -m
------------------------
         Type InUse MemUse HighUse Requests  Size(s)
         cdev    22     6K       -       22  256
   acd_driver     1     2K       -        1  2048
        sigio     1     1K       -     1626  64
     filedesc   684   941K       -  1199696 16,32,64,128,256,512,1024,2048,4096
         kenv    68    11K       -       70  16,32,64
       kqueue   368   414K       -  1740632  256,2048,4096
    proc-args    52     4K       -  5389885  16,32,64,128,256
      ithread    99    19K       -       99  32,128,256
      acpisem    13     2K       -       13  128
    CAM queue    12     1K       -      302  16,32,64,128,256
       KTRACE   100    13K       -      100  128
       linker    45     4K       -       71  16,32,64,128,512
        lockf   314    38K       - 16413112  64,128,256,512,1024,2048,4096
      scsi_da     0     0K       -       65  16
       ip6ndp     7     1K       -        7  64,128
       ip6opt     1     1K       -    50503  256
         temp    66   222K       -  6704801 16,32,64,128,256,512,1024,2048,4096
       devbuf 16781 35476K       -   108258 16,32,64,128,256,512,1024,2048,4096
      CAM SIM     2     1K       -        2  256
       module   204    26K       -      204  128
      acpidev    78     5K       -       78  64
     mtx_pool     1     8K       -        1
      subproc  1111  1606K       -  1045562  512,4096
         proc     2    16K       -        2
      session    34     5K       -    20772  128
         pgrp    39     5K       -   158890  128
         cred 24950  6238K       - 11839905  256
      uidinfo    13     3K       -     7337  64,2048
       plimit    24     6K       -   226179  256
   CAM periph     7     2K       -       45  16,32,64,128,256
    sysctltmp     0     0K       -   215050  16,32,64,128,256
    sysctloid  4373   216K       -     4373  16,32,64,128
       sysctl     0     0K       -   828292  16,32,64
         umtx  1692   212K       -     1692  128
     p1003.1b     1     1K       -        1  16
         SWAP     2  1097K       -        2  64
      CAM XPT    51    24K       - 19790153  32,64,128,256,1024
       bus-sc   111   101K       -     1879 16,32
,64,128,256,512,1024,2048,4096
          bus   804    77K       -     5926  16,32,64,128,256,1024
      devstat    10    21K       -       10  32,4096
 eventhandler    57     5K       -       57  64,128
         kobj   125   500K       -      160  4096
       kbdmux     6     9K       -        6  16,256,512,2048,4096
         rman   168    21K       -      576  16,64,128
         sbuf     0     0K       -      840 16,32,64,128,256,512,1024,2048,4096
     pci_link    16     2K       -       16  16,128
        stack     0     0K       -       14  256
    taskqueue    19     2K       -       19  16,32,128
       Unitno    16     1K       -    22074  32,64
          iov     0     0K       - 12126863  16,64,128,256,512
     ioctlops     0     0K       -   388714  16,32,64,128,256,512,1024,2048
          msg     4    30K       -        4  2048,4096
          sem     4     8K       -        4  512,1024,2048,4096
          shm     1    16K       -        1
         ttys  1170   169K       -    80824  128,1024
         ptys     5     2K       -        5  256
         accf     3     1K       -      301  32,64
     mbuf_tag     0     0K       -   520852  32,128
          pcb    47   158K       -  1332310  16,32,128,1024,2048,4096
       soname   187    23K       - 10680643  16,32,128
       biobuf     1     2K       -   143707  2048
     vfscache     1  1024K       -        1
   cl_savebuf     0     0K       -   154293  64,128
     vfs_hash     1   512K       -        1
       vnodes     2     1K       -        3  32,256
  vnodemarker     1     1K       -   124142  512
        mount   111     6K       -      495  16,32,64,128,256,2048
    acpi_perf     8     1K       -        8  64
          BPF     6     1K       -        6  128
  ether_multi    29     2K       -       32  16,32,64
       ifaddr   136    48K       -      136  32,64,128,256,512,4096
        ifnet     7    13K       -        7  256,2048
        clone     6    24K       -        6  4096
       arpcom     5     1K       -        5  16
           lo     1     1K       -        1  32
       acpica  3057   292K       -    68659  16,32,64,128,256,512,1024
     routetbl   303    86K       -     1027  32,64,128
,256,512
     in_multi     4     1K       -        4  64
  IpFw/IpAcct    60     9K       -       60  64,128,2048
    sctp_iter     0     0K       -       65  256
     sctp_ifn     4     1K       -        4  128
     sctp_ifa    66     9K       -       66  128
     sctp_vrf     1     1K       -        1  64
    sctp_a_it     0     0K       -       65  16
    hostcache     1    36K       -        1
      entropy  1024    64K       -     1024  64
     syncache     1   100K       -        1
    in6_multi    16     1K       -       16  32,64,128
audit_evclass   150     5K       -      187  32
     savedino     0     0K       -   406078  256
    newdirblk     0     0K       -     5047  64
       dirrem    18     2K       -  2259617  64
        mkdir     1     1K       -   283528  64
       diradd   183    12K       -  3426340  64
     freefile    55     4K       -  1081462  64
     freeblks    26     7K       -   792864  256
     freefrag     2     1K       -   781740  64
   allocindir     5     1K       -  2842332  128
     indirdep     4     1K       -   116594  64
  allocdirect    62    16K       -  4832896  256
    bmsafemap    12     2K       -   271759  128
       newblk     1     1K       -  7675229  64,512
     inodedep   270   580K       -  2593883  256
      pagedep    12   130K       -   318828  128
  ufs_dirhash  2848  1230K       -    42435 16,32,64,128,256,512,1024,2048,4096
    ufs_quota     1   512K       -        1
    ufs_mount    15   241K       -       51  128,256,512,2048,4096
      UMAHash     5   572K       -       33  512,1024,2048,4096
        USBHC     0     0K       -      660  16
       USBdev    22    10K       -      682  16,128,512
          USB   761   683K       -     4079  16,32,64,128,256,1024
    vm_pgdata     2   129K       -        2  128
       DEVFS1   115    58K       -      115  512
       DEVFS3   250    63K       -      251  256
       DEVFS2   115     2K       -      115  16
   DEVFS_RULE    36    17K       -       36  64,512
        DEVFS    30     1K       -       31  16,128
      io_apic     2     4K       -        2  2048
    pfs_nodes    20     5K       -       20  256
      memdesc     1     4K       -        2  4096
          msi     4     1K       -        4  128
     nexusdev     4     1K       -        4  16
     acpitask     0     0K       -        9  64
         GEOM   104    20K       -      882  16,32,64,128,256,512,1024,2048
     atkbddev     2     1K       -        2  64
       isadev     7     1K       -        7  128
CAM dev queue     2     1K       -        2  128
  ata_generic     1     1K       -        1  1024
      ata_dma     1     1K       -        1  256


Results from systat -v
-----------------------

    1 users    Load   143 90.86 47.13                  Nov 21 19:10

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act 1754100   25964  4719924    55728 1551492  count
All 1916252  113912  9413004   269144          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    179 cow   16002 total
 73     133 454   2   32k  816 2520    3  29k  726    504 zfod        atkbd0 1
                                                          ozfod       sio1 irq3
86.8%Sys   3.5%Intr  9.2%User  0.0%Nice  0.6%Idle        %ozfod       sio0 irq4
|    |    |    |    |    |    |    |    |    |    |       daefr       ata0 irq14
===========================================++>>>>>     16 prcfr       uhci1 uhci
                                       314 dtbuf       90 totfr       uhci0 uhci
Namei     Name-cache   Dir-cache    400000 desvn          react     2 arcmsr0 26
   Calls    hits   %    hits   %    355344 numvn          pdwak  2004 cpu0: time
   76763   76730 100                 24902 frevn          pdpgs       em0 irq256
                                                          intrn     1 em2 irq258
Disks   da0   da1 pass0 pass1 pass2                934624 wire        em3 irq259
KB/t   9.00  0.00  0.00  0.00  0.00               1697060 act    2000 cpu1: time
tps       1     0     0     0     0              12038912 inact  1996 cpu2: time
MB/s   0.01  0.00  0.00  0.00  0.00                308732 cache  2000 cpu3: time
%busy     0     0     0     0     0               1244784 free   2001 cpu7: time
                                                   219632 buf    1999 cpu4: time
                                                                 1999 cpu6: time
                                                                 2000 cpu5: time

Here is a "normal" sysstat -v to compare when there are no "visible" problems:

    3 users    Load  1.67  1.03  1.02                  Nov 25 22:12

Mem:KB    REAL            VIRTUAL                       VN PAGER   SWAP PAGER
        Tot   Share      Tot    Share    Free           in   out     in   out
Act  797576   31388  2318500    57324 4051340  count
All  952256  114916  6781828   226696          pages
Proc:                                                            Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    556 cow   16001 total
  1       6 474      5463 1602 3387    1  31k 1567    853 zfod        atkbd0 1
                                                          ozfod       sio1 irq3
 8.4%Sys   4.0%Intr  2.8%User  0.0%Nice 84.8%Idle        %ozfod       sio0 irq4
|    |    |    |    |    |    |    |    |    |    |       daefr       ata0 irq14
====++>>                                              602 prcfr       uhci1 uhci
                                       125 dtbuf     1443 totfr       uhci0 uhci
Namei     Name-cache   Dir-cache    400000 desvn          react       arcmsr0 26
   Calls    hits   %    hits   %    328748 numvn          pdwak  2026 cpu0: time
   52734   52660 100                 24705 frevn          pdpgs       em0 irq256
                                                          intrn     1 em2 irq258
Disks   da0   da1 pass0 pass1 pass2                857028 wire        em3 irq259
KB/t   0.00  0.00  0.00  0.00  0.00                750716 act    2026 cpu1: time
tps       0     0     0     0     0              10564316 inact  1975 cpu2: time
MB/s   0.00  0.00  0.00  0.00  0.00                303468 cache  1977 cpu3: time
%busy     0     0     0     0     0               3748056 free   1999 cpu7: time
                                                   219632 buf    2000 cpu4: time
                                                                 1997 cpu6: time
                                                                 2000 cpu5: time
----------------------------------

Here are the results of vmstat -w 1 during the problem:

 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 da1   in   sy   cs
us sy id
157 110 13   5544M  1111M  1141   0   0   0  1100  44   0   0   47  115
8744  7 14 79

146 34 98   5546M  1099M  4191   0   0   0   729   0   2   0   17 18583
102586  9 91  0

224 33 15   5548M  1091M  3825   0   0   0   664   0   0   0    7 14115
141707 10 90  0

165 103 11   5633M  1064M 12222   0   0   0  6745   0   2   0   42 41519
403437 14 86  0

214 73 4   5653M  1044M  4539   0   0   0   959   0   0   0    7 15698
94269 11 88  1

260 30 1   5664M  1034M  8457   0   0   0  2171   0   0   0   14 36978
248202 12 87  0

57 182 45   5667M  1029M  4761   0   0   0  2535   0   0   0    6 21004
133617 10 90  0

55 24 16   2152M  2454M  7993   0   0   0  3135   0   0   0   13 20263
173347 13 81  7

20 15 2   1972M  2537M 93820   0   0   0 465955   0  10   0  588 99274
716238 23 76  1

13 11 0   1877M  2581M  7820   0   0   0 31044   0   6   0   38 7859
76120 16 83  1

 9 12 1   1816M  2599M  6889   0   0   0 14550   0  20   0   79 359198
21333 14 79  7

11 13 0   1797M  2613M  6542   0   0   0  8416   0   3   0   17 606119
15341 18 61 21

 1 9 1   1740M  2636M  1744   0   0   0  6267   0   2   0   14 11617
15322  8 63 29

 2 3 0   1694M  2657M  3417   0   0   0  8669   0  15   0   52 50341
12045  6 29 65


Here is another view of top at a later date with the same problem happening focusing on IO setting
in Top:
--------------------------------------------------------------


last pid: 17984;  load averages: 39.26, 37.68,
24.75                                                       up 8+09:25:55  04:34:53
539 processes: 59 running, 480 sleeping
CPU:  9.8% user,  0.5% nice, 87.0% system,  2.3% interrupt,  0.4% idle
Mem: 1146M Active, 9663M Inact, 875M Wired, 582M Cache, 214M Buf, 3577M Free
Swap: 8192M Total, 1036K Used, 8191M Free

  PID USERNAME        VCSW  IVCSW   READ  WRITE  FAULT  TOTAL PERCENT
COMMAND
17587 www              446     62      0      0      0      0   0.00% httpd
17763 www              515     37      0      0      0      0   0.00% httpd
17860 www              538     47      0      0      0      0   0.00% httpd
17703 www              457     43      0      0      0      0   0.00% httpd
17701 www              485     34      0      0      0      0   0.00% httpd
17550 www              423     29      0      0      0      0   0.00% httpd
17579 www                0      0      0      0      0      0   0.00% httpd
17864 www              495     39      0      0      0      0   0.00% httpd
17836 www              520     36      0      0      0      0   0.00% httpd
17847 www              451     28      0      0      0      0   0.00% httpd
17756 www              462     29      0      0      0      0   0.00% httpd
17982 www              445     63      0      0      0      0   0.00% httpd
17581 www              451     60      0      0      0      0   0.00% httpd
17761 www              449     37      0      0      0      0   0.00% httpd
17582 www              509     30      0      0      0      0   0.00% httpd
17709 www              447     28      0      0      0      0   0.00% httpd
17705 www              515     30      0      0      0      0   0.00% httpd
17704 www              469     38      0      0      0      0   0.00% httpd
17706 www              508     53      0      0      0      0   0.00% httpd
17833 www              483     34      0      0      0      0   0.00% httpd
17834 www              499     43      0      0      0      0   0.00% httpd
17974 www              489     38      0      0      0      0   0.00% httpd
17978 www              467     45      0      0      0      0   0.00% httpd
17576 www              447     32      0      0      0      0   0.00% httpd
17570 www              443     37      0      0      0      0   0.00% httpd
17762 www              476     31      0      0      0      0   0.00% httpd
17837 www              508     44      0      0      0      0   0.00% httpd
17548 www              443     32      0      0      0      0   0.00% httpd
17783 www              390     22      0      0      0      0   0.00% httpd
17961 www              534     57      0      0      0      0   0.00% httpd
17590 www              498     50      0      0      0      0   0.00% httpd
17700 www              471     35      0      0      0      0   0.00% httpd
17580 www              438     41      0      0      0      0   0.00% httpd

This used to be on a 4.11x system with 1 cpu and only 1gb of ram and ran flawlessly with much less
resources with the same web site code for a long time. I do not have this problem on the other 7.0
machine. I originally thought it was just a cpu issue but it is very closely tied to when something
is trying to use the raid arrays and this seems to be the way to reproduce it.

I am having a hard time determining why the system load is so high. Can you recommend the best way
to identify the culprit?

Thanks,

Paul







Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?494967A9.3000503>