From owner-svn-src-stable-9@FreeBSD.ORG Tue Jun 18 05:21:45 2013 Return-Path: Delivered-To: svn-src-stable-9@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:1900:2254:206a::19:1]) by hub.freebsd.org (Postfix) with ESMTP id B72E2396; Tue, 18 Jun 2013 05:21:45 +0000 (UTC) (envelope-from scottl@FreeBSD.org) Received: from svn.freebsd.org (svn.freebsd.org [IPv6:2001:1900:2254:2068::e6a:0]) by mx1.freebsd.org (Postfix) with ESMTP id A60EC18F0; Tue, 18 Jun 2013 05:21:45 +0000 (UTC) Received: from svn.freebsd.org ([127.0.1.70]) by svn.freebsd.org (8.14.7/8.14.7) with ESMTP id r5I5LjSW002874; Tue, 18 Jun 2013 05:21:45 GMT (envelope-from scottl@svn.freebsd.org) Received: (from scottl@localhost) by svn.freebsd.org (8.14.7/8.14.5/Submit) id r5I5Leo1002844; Tue, 18 Jun 2013 05:21:40 GMT (envelope-from scottl@svn.freebsd.org) Message-Id: <201306180521.r5I5Leo1002844@svn.freebsd.org> From: Scott Long Date: Tue, 18 Jun 2013 05:21:40 +0000 (UTC) To: src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-9@freebsd.org Subject: svn commit: r251897 - in stable/9/sys: amd64/amd64 arm/arm cam cam/ata cam/scsi dev/ahci dev/md dev/siis dev/tws geom geom/part i386/i386 i386/include i386/xen ia64/ia64 kern mips/mips powerpc/aim ... X-SVN-Group: stable-9 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-BeenThere: svn-src-stable-9@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list List-Id: SVN commit messages for only the 9-stable src tree List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 18 Jun 2013 05:21:45 -0000 Author: scottl Date: Tue Jun 18 05:21:40 2013 New Revision: 251897 URL: http://svnweb.freebsd.org/changeset/base/251897 Log: Merge the second part of the unmapped I/O changes. This enables the infrastructure in the block layer and UFS filesystem as well as a few drivers. The list of MFC revisions is long, so I won't quote changelogs. r248508,248510,248511,248512,248514,248515,248516,248517,248518, 248519,248520,248521,248550,248568,248789,248790,249032,250936 Submitted by: kib Approved by: kib Obtained from: Netflix Modified: stable/9/sys/amd64/amd64/pmap.c stable/9/sys/arm/arm/pmap.c stable/9/sys/cam/ata/ata_da.c stable/9/sys/cam/cam_ccb.h stable/9/sys/cam/cam_periph.c stable/9/sys/cam/scsi/scsi_all.c stable/9/sys/cam/scsi/scsi_all.h stable/9/sys/cam/scsi/scsi_cd.c stable/9/sys/cam/scsi/scsi_da.c stable/9/sys/dev/ahci/ahci.c stable/9/sys/dev/md/md.c stable/9/sys/dev/siis/siis.c stable/9/sys/dev/tws/tws.h stable/9/sys/geom/geom.h stable/9/sys/geom/geom_disk.c stable/9/sys/geom/geom_disk.h stable/9/sys/geom/geom_io.c stable/9/sys/geom/geom_vfs.c stable/9/sys/geom/part/g_part.c stable/9/sys/i386/i386/pmap.c stable/9/sys/i386/include/param.h stable/9/sys/i386/xen/pmap.c stable/9/sys/ia64/ia64/pmap.c stable/9/sys/kern/kern_physio.c stable/9/sys/kern/subr_bus_dma.c stable/9/sys/kern/subr_param.c stable/9/sys/kern/vfs_aio.c stable/9/sys/kern/vfs_bio.c stable/9/sys/kern/vfs_cluster.c stable/9/sys/mips/mips/pmap.c stable/9/sys/powerpc/aim/mmu_oea64.c stable/9/sys/powerpc/powerpc/pmap_dispatch.c stable/9/sys/sparc64/sparc64/pmap.c stable/9/sys/sys/bio.h stable/9/sys/sys/buf.h stable/9/sys/sys/mount.h stable/9/sys/sys/systm.h stable/9/sys/ufs/ffs/ffs_alloc.c stable/9/sys/ufs/ffs/ffs_balloc.c stable/9/sys/ufs/ffs/ffs_rawread.c stable/9/sys/ufs/ffs/ffs_vfsops.c stable/9/sys/ufs/ffs/ffs_vnops.c stable/9/sys/ufs/ufs/ufs_extern.h stable/9/sys/vm/swap_pager.c stable/9/sys/vm/swap_pager.h stable/9/sys/vm/vm.h stable/9/sys/vm/vm_init.c stable/9/sys/vm/vm_kern.c stable/9/sys/vm/vnode_pager.c Directory Properties: stable/9/sys/ (props changed) stable/9/sys/dev/ (props changed) stable/9/sys/sys/ (props changed) Modified: stable/9/sys/amd64/amd64/pmap.c ============================================================================== --- stable/9/sys/amd64/amd64/pmap.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/amd64/amd64/pmap.c Tue Jun 18 05:21:40 2013 (r251897) @@ -4245,6 +4245,8 @@ pmap_copy_page(vm_page_t msrc, vm_page_t pagecopy((void *)src, (void *)dst); } +int unmapped_buf_allowed = 1; + void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) Modified: stable/9/sys/arm/arm/pmap.c ============================================================================== --- stable/9/sys/arm/arm/pmap.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/arm/arm/pmap.c Tue Jun 18 05:21:40 2013 (r251897) @@ -4474,6 +4474,8 @@ pmap_copy_page(vm_page_t src, vm_page_t #endif } +int unmapped_buf_allowed = 1; + void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) Modified: stable/9/sys/cam/ata/ata_da.c ============================================================================== --- stable/9/sys/cam/ata/ata_da.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/ata/ata_da.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1242,6 +1242,8 @@ adaregister(struct cam_periph *periph, v !(softc->flags & ADA_FLAG_CAN_48BIT)) { softc->disk->d_flags |= DISKFLAG_CANDELETE; } + if ((cpi.hba_misc & PIM_UNMAPPED) != 0) + softc->disk->d_flags |= DISKFLAG_UNMAPPED_BIO; strlcpy(softc->disk->d_descr, cgd->ident_data.model, MIN(sizeof(softc->disk->d_descr), sizeof(cgd->ident_data.model))); strlcpy(softc->disk->d_ident, cgd->ident_data.serial, @@ -1526,13 +1528,19 @@ adastart(struct cam_periph *periph, unio return; } #endif + KASSERT((bp->bio_flags & BIO_UNMAPPED) == 0 || + round_page(bp->bio_bcount + bp->bio_ma_offset) / + PAGE_SIZE == bp->bio_ma_n, + ("Short bio %p", bp)); cam_fill_ataio(ataio, ada_retry_count, adadone, - bp->bio_cmd == BIO_READ ? - CAM_DIR_IN : CAM_DIR_OUT, + (bp->bio_cmd == BIO_READ ? CAM_DIR_IN : + CAM_DIR_OUT) | ((bp->bio_flags & BIO_UNMAPPED) + != 0 ? CAM_DATA_BIO : 0), tag_code, - bp->bio_data, + ((bp->bio_flags & BIO_UNMAPPED) != 0) ? (void *)bp : + bp->bio_data, bp->bio_bcount, ada_default_timeout*1000); Modified: stable/9/sys/cam/cam_ccb.h ============================================================================== --- stable/9/sys/cam/cam_ccb.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/cam_ccb.h Tue Jun 18 05:21:40 2013 (r251897) @@ -42,7 +42,6 @@ #include #include - /* General allocation length definitions for CCB structures */ #define IOCDBLEN CAM_MAX_CDBLEN /* Space for CDB bytes/pointer */ #define VUHBALEN 14 /* Vendor Unique HBA length */ @@ -100,7 +99,7 @@ typedef enum { CAM_MSGB_VALID = 0x10000000,/* Message buffer valid */ CAM_STATUS_VALID = 0x20000000,/* Status buffer valid */ CAM_DATAB_VALID = 0x40000000,/* Data buffer valid */ - + /* Host target Mode flags */ CAM_SEND_SENSE = 0x08000000,/* Send sense data with status */ CAM_TERM_IO = 0x10000000,/* Terminate I/O Message sup. */ @@ -572,7 +571,8 @@ typedef enum { PIM_NOINITIATOR = 0x20, /* Initiator role not supported. */ PIM_NOBUSRESET = 0x10, /* User has disabled initial BUS RESET */ PIM_NO_6_BYTE = 0x08, /* Do not send 6-byte commands */ - PIM_SEQSCAN = 0x04 /* Do bus scans sequentially, not in parallel */ + PIM_SEQSCAN = 0x04, /* Do bus scans sequentially, not in parallel */ + PIM_UNMAPPED = 0x02, } pi_miscflag; /* Path Inquiry CCB */ Modified: stable/9/sys/cam/cam_periph.c ============================================================================== --- stable/9/sys/cam/cam_periph.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/cam_periph.c Tue Jun 18 05:21:40 2013 (r251897) @@ -851,7 +851,7 @@ cam_periph_mapmem(union ccb *ccb, struct * into a larger area of VM, or if userland races against * vmapbuf() after the useracc() check. */ - if (vmapbuf(mapinfo->bp[i]) < 0) { + if (vmapbuf(mapinfo->bp[i], 1) < 0) { for (j = 0; j < i; ++j) { *data_ptrs[j] = mapinfo->bp[j]->b_saveaddr; vunmapbuf(mapinfo->bp[j]); Modified: stable/9/sys/cam/scsi/scsi_all.c ============================================================================== --- stable/9/sys/cam/scsi/scsi_all.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/scsi/scsi_all.c Tue Jun 18 05:21:40 2013 (r251897) @@ -5738,7 +5738,11 @@ scsi_read_write(struct ccb_scsiio *csio, u_int8_t *data_ptr, u_int32_t dxfer_len, u_int8_t sense_len, u_int32_t timeout) { + int read; u_int8_t cdb_len; + + read = (readop & SCSI_RW_DIRMASK) == SCSI_RW_READ; + /* * Use the smallest possible command to perform the operation * as some legacy hardware does not support the 10 byte commands. @@ -5755,7 +5759,7 @@ scsi_read_write(struct ccb_scsiio *csio, struct scsi_rw_6 *scsi_cmd; scsi_cmd = (struct scsi_rw_6 *)&csio->cdb_io.cdb_bytes; - scsi_cmd->opcode = readop ? READ_6 : WRITE_6; + scsi_cmd->opcode = read ? READ_6 : WRITE_6; scsi_ulto3b(lba, scsi_cmd->addr); scsi_cmd->length = block_count & 0xff; scsi_cmd->control = 0; @@ -5774,7 +5778,7 @@ scsi_read_write(struct ccb_scsiio *csio, struct scsi_rw_10 *scsi_cmd; scsi_cmd = (struct scsi_rw_10 *)&csio->cdb_io.cdb_bytes; - scsi_cmd->opcode = readop ? READ_10 : WRITE_10; + scsi_cmd->opcode = read ? READ_10 : WRITE_10; scsi_cmd->byte2 = byte2; scsi_ulto4b(lba, scsi_cmd->addr); scsi_cmd->reserved = 0; @@ -5797,7 +5801,7 @@ scsi_read_write(struct ccb_scsiio *csio, struct scsi_rw_12 *scsi_cmd; scsi_cmd = (struct scsi_rw_12 *)&csio->cdb_io.cdb_bytes; - scsi_cmd->opcode = readop ? READ_12 : WRITE_12; + scsi_cmd->opcode = read ? READ_12 : WRITE_12; scsi_cmd->byte2 = byte2; scsi_ulto4b(lba, scsi_cmd->addr); scsi_cmd->reserved = 0; @@ -5819,7 +5823,7 @@ scsi_read_write(struct ccb_scsiio *csio, struct scsi_rw_16 *scsi_cmd; scsi_cmd = (struct scsi_rw_16 *)&csio->cdb_io.cdb_bytes; - scsi_cmd->opcode = readop ? READ_16 : WRITE_16; + scsi_cmd->opcode = read ? READ_16 : WRITE_16; scsi_cmd->byte2 = byte2; scsi_u64to8b(lba, scsi_cmd->addr); scsi_cmd->reserved = 0; @@ -5830,7 +5834,8 @@ scsi_read_write(struct ccb_scsiio *csio, cam_fill_csio(csio, retries, cbfcnp, - /*flags*/readop ? CAM_DIR_IN : CAM_DIR_OUT, + (read ? CAM_DIR_IN : CAM_DIR_OUT) | + ((readop & SCSI_RW_BIO) != 0 ? CAM_DATA_BIO : 0), tag_action, data_ptr, dxfer_len, Modified: stable/9/sys/cam/scsi/scsi_all.h ============================================================================== --- stable/9/sys/cam/scsi/scsi_all.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/scsi/scsi_all.h Tue Jun 18 05:21:40 2013 (r251897) @@ -2469,6 +2469,10 @@ void scsi_write_buffer(struct ccb_scsiio uint8_t *data_ptr, uint32_t param_list_length, uint8_t sense_len, uint32_t timeout); +#define SCSI_RW_READ 0x0001 +#define SCSI_RW_WRITE 0x0002 +#define SCSI_RW_DIRMASK 0x0003 +#define SCSI_RW_BIO 0x1000 void scsi_read_write(struct ccb_scsiio *csio, u_int32_t retries, void (*cbfcnp)(struct cam_periph *, union ccb *), u_int8_t tag_action, int readop, u_int8_t byte2, Modified: stable/9/sys/cam/scsi/scsi_cd.c ============================================================================== --- stable/9/sys/cam/scsi/scsi_cd.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/scsi/scsi_cd.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1571,7 +1571,8 @@ cdstart(struct cam_periph *periph, union /*retries*/ cd_retry_count, /* cbfcnp */ cddone, MSG_SIMPLE_Q_TAG, - /* read */bp->bio_cmd == BIO_READ, + /* read */bp->bio_cmd == BIO_READ ? + SCSI_RW_READ : SCSI_RW_WRITE, /* byte2 */ 0, /* minimum_cmd_size */ 10, /* lba */ bp->bio_offset / Modified: stable/9/sys/cam/scsi/scsi_da.c ============================================================================== --- stable/9/sys/cam/scsi/scsi_da.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/cam/scsi/scsi_da.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1378,7 +1378,7 @@ dadump(void *arg, void *virtual, vm_offs /*retries*/0, dadone, MSG_ORDERED_Q_TAG, - /*read*/FALSE, + /*read*/SCSI_RW_WRITE, /*byte2*/0, /*minimum_cmd_size*/ softc->minimum_cmd_size, offset / secsize, @@ -2030,6 +2030,8 @@ daregister(struct cam_periph *periph, vo softc->disk->d_flags = 0; if ((softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; + if ((cpi.hba_misc & PIM_UNMAPPED) != 0) + softc->disk->d_flags |= DISKFLAG_UNMAPPED_BIO; cam_strvis(softc->disk->d_descr, cgd->inq_data.vendor, sizeof(cgd->inq_data.vendor), sizeof(softc->disk->d_descr)); strlcat(softc->disk->d_descr, " ", sizeof(softc->disk->d_descr)); @@ -2390,14 +2392,18 @@ skipstate: /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/tag_code, - /*read_op*/bp->bio_cmd - == BIO_READ, + /*read_op*/(bp->bio_cmd == BIO_READ ? + SCSI_RW_READ : SCSI_RW_WRITE) | + ((bp->bio_flags & BIO_UNMAPPED) != 0 ? + SCSI_RW_BIO : 0), /*byte2*/0, softc->minimum_cmd_size, /*lba*/bp->bio_pblkno, /*block_count*/bp->bio_bcount / softc->params.secsize, - /*data_ptr*/ bp->bio_data, + /*data_ptr*/ (bp->bio_flags & + BIO_UNMAPPED) != 0 ? (void *)bp : + bp->bio_data, /*dxfer_len*/ bp->bio_bcount, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); Modified: stable/9/sys/dev/ahci/ahci.c ============================================================================== --- stable/9/sys/dev/ahci/ahci.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/dev/ahci/ahci.c Tue Jun 18 05:21:40 2013 (r251897) @@ -3010,7 +3010,7 @@ ahciaction(struct cam_sim *sim, union cc if (ch->caps & AHCI_CAP_SPM) cpi->hba_inquiry |= PI_SATAPM; cpi->target_sprt = 0; - cpi->hba_misc = PIM_SEQSCAN; + cpi->hba_misc = PIM_SEQSCAN | PIM_UNMAPPED; cpi->hba_eng_cnt = 0; if (ch->caps & AHCI_CAP_SPM) cpi->max_target = 15; Modified: stable/9/sys/dev/md/md.c ============================================================================== --- stable/9/sys/dev/md/md.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/dev/md/md.c Tue Jun 18 05:21:40 2013 (r251897) @@ -18,11 +18,16 @@ * Copyright (c) 1988 University of Utah. * Copyright (c) 1990, 1993 * The Regents of the University of California. All rights reserved. + * Copyright (c) 2013 The FreeBSD Foundation + * All rights reserved. * * This code is derived from software contributed to Berkeley by * the Systems Programming Group of the University of Utah Computer * Science Department. * + * Portions of this software were developed by Konstantin Belousov + * under sponsorship from the FreeBSD Foundation. + * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -59,6 +64,7 @@ #include #include #include +#include #include #include #include @@ -162,6 +168,8 @@ static LIST_HEAD(, md_s) md_softc_list = #define NMASK (NINDIR-1) static int nshift; +static int md_vnode_pbuf_freecnt; + struct indir { uintptr_t *array; u_int total; @@ -408,11 +416,103 @@ g_md_start(struct bio *bp) wakeup(sc); } +#define MD_MALLOC_MOVE_ZERO 1 +#define MD_MALLOC_MOVE_FILL 2 +#define MD_MALLOC_MOVE_READ 3 +#define MD_MALLOC_MOVE_WRITE 4 +#define MD_MALLOC_MOVE_CMP 5 + +static int +md_malloc_move(vm_page_t **mp, int *ma_offs, unsigned sectorsize, + void *ptr, u_char fill, int op) +{ + struct sf_buf *sf; + vm_page_t m, *mp1; + char *p, first; + off_t *uc; + unsigned n; + int error, i, ma_offs1, sz, first_read; + + m = NULL; + error = 0; + sf = NULL; + /* if (op == MD_MALLOC_MOVE_CMP) { gcc */ + first = 0; + first_read = 0; + uc = ptr; + mp1 = *mp; + ma_offs1 = *ma_offs; + /* } */ + sched_pin(); + for (n = sectorsize; n != 0; n -= sz) { + sz = imin(PAGE_SIZE - *ma_offs, n); + if (m != **mp) { + if (sf != NULL) + sf_buf_free(sf); + m = **mp; + sf = sf_buf_alloc(m, SFB_CPUPRIVATE | + (md_malloc_wait ? 0 : SFB_NOWAIT)); + if (sf == NULL) { + error = ENOMEM; + break; + } + } + p = (char *)sf_buf_kva(sf) + *ma_offs; + switch (op) { + case MD_MALLOC_MOVE_ZERO: + bzero(p, sz); + break; + case MD_MALLOC_MOVE_FILL: + memset(p, fill, sz); + break; + case MD_MALLOC_MOVE_READ: + bcopy(ptr, p, sz); + cpu_flush_dcache(p, sz); + break; + case MD_MALLOC_MOVE_WRITE: + bcopy(p, ptr, sz); + break; + case MD_MALLOC_MOVE_CMP: + for (i = 0; i < sz; i++, p++) { + if (!first_read) { + *uc = (u_char)*p; + first = *p; + first_read = 1; + } else if (*p != first) { + error = EDOOFUS; + break; + } + } + break; + default: + KASSERT(0, ("md_malloc_move unknown op %d\n", op)); + break; + } + if (error != 0) + break; + *ma_offs += sz; + *ma_offs %= PAGE_SIZE; + if (*ma_offs == 0) + (*mp)++; + ptr = (char *)ptr + sz; + } + + if (sf != NULL) + sf_buf_free(sf); + sched_unpin(); + if (op == MD_MALLOC_MOVE_CMP && error != 0) { + *mp = mp1; + *ma_offs = ma_offs1; + } + return (error); +} + static int mdstart_malloc(struct md_s *sc, struct bio *bp) { - int i, error; u_char *dst; + vm_page_t *m; + int i, error, error1, ma_offs, notmapped; off_t secno, nsec, uc; uintptr_t sp, osp; @@ -425,9 +525,17 @@ mdstart_malloc(struct md_s *sc, struct b return (EOPNOTSUPP); } + notmapped = (bp->bio_flags & BIO_UNMAPPED) != 0; + if (notmapped) { + m = bp->bio_ma; + ma_offs = bp->bio_ma_offset; + dst = NULL; + } else { + dst = bp->bio_data; + } + nsec = bp->bio_length / sc->sectorsize; secno = bp->bio_offset / sc->sectorsize; - dst = bp->bio_data; error = 0; while (nsec--) { osp = s_read(sc->indir, secno); @@ -435,21 +543,45 @@ mdstart_malloc(struct md_s *sc, struct b if (osp != 0) error = s_write(sc->indir, secno, 0); } else if (bp->bio_cmd == BIO_READ) { - if (osp == 0) - bzero(dst, sc->sectorsize); - else if (osp <= 255) - memset(dst, osp, sc->sectorsize); - else { - bcopy((void *)osp, dst, sc->sectorsize); - cpu_flush_dcache(dst, sc->sectorsize); + if (osp == 0) { + if (notmapped) { + error = md_malloc_move(&m, &ma_offs, + sc->sectorsize, NULL, 0, + MD_MALLOC_MOVE_ZERO); + } else + bzero(dst, sc->sectorsize); + } else if (osp <= 255) { + if (notmapped) { + error = md_malloc_move(&m, &ma_offs, + sc->sectorsize, NULL, osp, + MD_MALLOC_MOVE_FILL); + } else + memset(dst, osp, sc->sectorsize); + } else { + if (notmapped) { + error = md_malloc_move(&m, &ma_offs, + sc->sectorsize, (void *)osp, 0, + MD_MALLOC_MOVE_READ); + } else { + bcopy((void *)osp, dst, sc->sectorsize); + cpu_flush_dcache(dst, sc->sectorsize); + } } osp = 0; } else if (bp->bio_cmd == BIO_WRITE) { if (sc->flags & MD_COMPRESS) { - uc = dst[0]; - for (i = 1; i < sc->sectorsize; i++) - if (dst[i] != uc) - break; + if (notmapped) { + error1 = md_malloc_move(&m, &ma_offs, + sc->sectorsize, &uc, 0, + MD_MALLOC_MOVE_CMP); + i = error1 == 0 ? sc->sectorsize : 0; + } else { + uc = dst[0]; + for (i = 1; i < sc->sectorsize; i++) { + if (dst[i] != uc) + break; + } + } } else { i = 0; uc = 0; @@ -466,10 +598,26 @@ mdstart_malloc(struct md_s *sc, struct b error = ENOSPC; break; } - bcopy(dst, (void *)sp, sc->sectorsize); + if (notmapped) { + error = md_malloc_move(&m, + &ma_offs, sc->sectorsize, + (void *)sp, 0, + MD_MALLOC_MOVE_WRITE); + } else { + bcopy(dst, (void *)sp, + sc->sectorsize); + } error = s_write(sc->indir, secno, sp); } else { - bcopy(dst, (void *)osp, sc->sectorsize); + if (notmapped) { + error = md_malloc_move(&m, + &ma_offs, sc->sectorsize, + (void *)osp, 0, + MD_MALLOC_MOVE_WRITE); + } else { + bcopy(dst, (void *)osp, + sc->sectorsize); + } osp = 0; } } @@ -481,7 +629,8 @@ mdstart_malloc(struct md_s *sc, struct b if (error != 0) break; secno++; - dst += sc->sectorsize; + if (!notmapped) + dst += sc->sectorsize; } bp->bio_resid = 0; return (error); @@ -514,6 +663,7 @@ mdstart_vnode(struct md_s *sc, struct bi struct iovec aiov; struct mount *mp; struct vnode *vp; + struct buf *pb; struct thread *td; off_t end, zerosize; @@ -589,7 +739,17 @@ mdstart_vnode(struct md_s *sc, struct bi return (error); } - aiov.iov_base = bp->bio_data; + KASSERT(bp->bio_length <= MAXPHYS, ("bio_length %jd", + (uintmax_t)bp->bio_length)); + if ((bp->bio_flags & BIO_UNMAPPED) == 0) { + pb = NULL; + aiov.iov_base = bp->bio_data; + } else { + pb = getpbuf(&md_vnode_pbuf_freecnt); + pmap_qenter((vm_offset_t)pb->b_data, bp->bio_ma, bp->bio_ma_n); + aiov.iov_base = (void *)((vm_offset_t)pb->b_data + + bp->bio_ma_offset); + } aiov.iov_len = bp->bio_length; auio.uio_iov = &aiov; auio.uio_iovcnt = 1; @@ -620,6 +780,10 @@ mdstart_vnode(struct md_s *sc, struct bi VOP_UNLOCK(vp, 0); vn_finished_write(mp); } + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + pmap_qremove((vm_offset_t)pb->b_data, bp->bio_ma_n); + relpbuf(pb, &md_vnode_pbuf_freecnt); + } VFS_UNLOCK_GIANT(vfslocked); bp->bio_resid = auio.uio_resid; return (error); @@ -628,11 +792,10 @@ mdstart_vnode(struct md_s *sc, struct bi static int mdstart_swap(struct md_s *sc, struct bio *bp) { - struct sf_buf *sf; - int rv, offs, len, lastend; - vm_pindex_t i, lastp; vm_page_t m; u_char *p; + vm_pindex_t i, lastp; + int rv, ma_offs, offs, len, lastend; switch (bp->bio_cmd) { case BIO_READ: @@ -644,6 +807,7 @@ mdstart_swap(struct md_s *sc, struct bio } p = bp->bio_data; + ma_offs = (bp->bio_flags & BIO_UNMAPPED) == 0 ? 0 : bp->bio_ma_offset; /* * offs is the offset at which to start operating on the @@ -661,21 +825,14 @@ mdstart_swap(struct md_s *sc, struct bio vm_object_pip_add(sc->object, 1); for (i = bp->bio_offset / PAGE_SIZE; i <= lastp; i++) { len = ((i == lastp) ? lastend : PAGE_SIZE) - offs; - - m = vm_page_grab(sc->object, i, - VM_ALLOC_NORMAL|VM_ALLOC_RETRY); - VM_OBJECT_UNLOCK(sc->object); - sched_pin(); - sf = sf_buf_alloc(m, SFB_CPUPRIVATE); - VM_OBJECT_LOCK(sc->object); + m = vm_page_grab(sc->object, i, VM_ALLOC_NORMAL | + VM_ALLOC_RETRY); if (bp->bio_cmd == BIO_READ) { if (m->valid == VM_PAGE_BITS_ALL) rv = VM_PAGER_OK; else rv = vm_pager_get_pages(sc->object, &m, 1, 0); if (rv == VM_PAGER_ERROR) { - sf_buf_free(sf); - sched_unpin(); vm_page_wakeup(m); break; } else if (rv == VM_PAGER_FAIL) { @@ -685,23 +842,31 @@ mdstart_swap(struct md_s *sc, struct bio * valid. Do not set dirty, the page * can be recreated if thrown out. */ - bzero((void *)sf_buf_kva(sf), PAGE_SIZE); + pmap_zero_page(m); m->valid = VM_PAGE_BITS_ALL; } - bcopy((void *)(sf_buf_kva(sf) + offs), p, len); - cpu_flush_dcache(p, len); + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + pmap_copy_pages(&m, offs, bp->bio_ma, + ma_offs, len); + } else { + physcopyout(VM_PAGE_TO_PHYS(m) + offs, p, len); + cpu_flush_dcache(p, len); + } } else if (bp->bio_cmd == BIO_WRITE) { if (len != PAGE_SIZE && m->valid != VM_PAGE_BITS_ALL) rv = vm_pager_get_pages(sc->object, &m, 1, 0); else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { - sf_buf_free(sf); - sched_unpin(); vm_page_wakeup(m); break; } - bcopy(p, (void *)(sf_buf_kva(sf) + offs), len); + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + pmap_copy_pages(bp->bio_ma, ma_offs, &m, + offs, len); + } else { + physcopyin(p, VM_PAGE_TO_PHYS(m) + offs, len); + } m->valid = VM_PAGE_BITS_ALL; } else if (bp->bio_cmd == BIO_DELETE) { if (len != PAGE_SIZE && m->valid != VM_PAGE_BITS_ALL) @@ -709,20 +874,16 @@ mdstart_swap(struct md_s *sc, struct bio else rv = VM_PAGER_OK; if (rv == VM_PAGER_ERROR) { - sf_buf_free(sf); - sched_unpin(); vm_page_wakeup(m); break; } if (len != PAGE_SIZE) { - bzero((void *)(sf_buf_kva(sf) + offs), len); + pmap_zero_page_area(m, offs, len); vm_page_clear_dirty(m, offs, len); m->valid = VM_PAGE_BITS_ALL; } else vm_pager_page_unswapped(m); } - sf_buf_free(sf); - sched_unpin(); vm_page_wakeup(m); vm_page_lock(m); if (bp->bio_cmd == BIO_DELETE && len == PAGE_SIZE) @@ -736,6 +897,7 @@ mdstart_swap(struct md_s *sc, struct bio /* Actions on further pages start at offset 0 */ p += PAGE_SIZE - offs; offs = 0; + ma_offs += len; } vm_object_pip_subtract(sc->object, 1); VM_OBJECT_UNLOCK(sc->object); @@ -851,6 +1013,15 @@ mdinit(struct md_s *sc) pp = g_new_providerf(gp, "md%d", sc->unit); pp->mediasize = sc->mediasize; pp->sectorsize = sc->sectorsize; + switch (sc->type) { + case MD_MALLOC: + case MD_VNODE: + case MD_SWAP: + pp->flags |= G_PF_ACCEPT_UNMAPPED; + break; + case MD_PRELOAD: + break; + } sc->gp = gp; sc->pp = pp; g_error_provider(pp, 0); @@ -1311,6 +1482,7 @@ g_md_init(struct g_class *mp __unused) sx_xunlock(&md_sx); } } + md_vnode_pbuf_freecnt = nswbuf / 10; status_dev = make_dev(&mdctl_cdevsw, INT_MAX, UID_ROOT, GID_WHEEL, 0600, MDCTL_NAME); g_topology_lock(); Modified: stable/9/sys/dev/siis/siis.c ============================================================================== --- stable/9/sys/dev/siis/siis.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/dev/siis/siis.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1939,7 +1939,7 @@ siisaction(struct cam_sim *sim, union cc cpi->hba_inquiry = PI_SDTR_ABLE | PI_TAG_ABLE; cpi->hba_inquiry |= PI_SATAPM; cpi->target_sprt = 0; - cpi->hba_misc = PIM_SEQSCAN; + cpi->hba_misc = PIM_SEQSCAN | PIM_UNMAPPED; cpi->hba_eng_cnt = 0; cpi->max_target = 15; cpi->max_lun = 0; Modified: stable/9/sys/dev/tws/tws.h ============================================================================== --- stable/9/sys/dev/tws/tws.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/dev/tws/tws.h Tue Jun 18 05:21:40 2013 (r251897) @@ -137,7 +137,7 @@ enum tws_req_flags { TWS_DIR_IN = 0x2, TWS_DIR_OUT = 0x4, TWS_DIR_NONE = 0x8, - TWS_DATA_CCB = 0x16, + TWS_DATA_CCB = 0x10, }; enum tws_intrs { Modified: stable/9/sys/geom/geom.h ============================================================================== --- stable/9/sys/geom/geom.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/geom.h Tue Jun 18 05:21:40 2013 (r251897) @@ -201,6 +201,7 @@ struct g_provider { #define G_PF_CANDELETE 0x1 #define G_PF_WITHER 0x2 #define G_PF_ORPHAN 0x4 +#define G_PF_ACCEPT_UNMAPPED 0x8 /* Two fields for the implementing class to use */ void *private; Modified: stable/9/sys/geom/geom_disk.c ============================================================================== --- stable/9/sys/geom/geom_disk.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/geom_disk.c Tue Jun 18 05:21:40 2013 (r251897) @@ -299,13 +299,29 @@ g_disk_start(struct bio *bp) do { bp2->bio_offset += off; bp2->bio_length -= off; - bp2->bio_data += off; + if ((bp->bio_flags & BIO_UNMAPPED) == 0) { + bp2->bio_data += off; + } else { + KASSERT((dp->d_flags & DISKFLAG_UNMAPPED_BIO) + != 0, + ("unmapped bio not supported by disk %s", + dp->d_name)); + bp2->bio_ma += off / PAGE_SIZE; + bp2->bio_ma_offset += off; + bp2->bio_ma_offset %= PAGE_SIZE; + bp2->bio_ma_n -= off / PAGE_SIZE; + } if (bp2->bio_length > dp->d_maxsize) { /* * XXX: If we have a stripesize we should really * use it here. */ bp2->bio_length = dp->d_maxsize; + if ((bp->bio_flags & BIO_UNMAPPED) != 0) { + bp2->bio_ma_n = howmany( + bp2->bio_ma_offset + + bp2->bio_length, PAGE_SIZE); + } off += dp->d_maxsize; /* * To avoid a race, we need to grab the next bio @@ -467,6 +483,8 @@ g_disk_create(void *arg, int flag) pp->flags |= G_PF_CANDELETE; pp->stripeoffset = dp->d_stripeoffset; pp->stripesize = dp->d_stripesize; + if ((dp->d_flags & DISKFLAG_UNMAPPED_BIO) != 0) + pp->flags |= G_PF_ACCEPT_UNMAPPED; if (bootverbose) printf("GEOM: new disk %s\n", gp->name); sysctl_ctx_init(&sc->sysctl_ctx); Modified: stable/9/sys/geom/geom_disk.h ============================================================================== --- stable/9/sys/geom/geom_disk.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/geom_disk.h Tue Jun 18 05:21:40 2013 (r251897) @@ -106,6 +106,7 @@ struct disk { #define DISKFLAG_CANDELETE 0x4 #define DISKFLAG_CANFLUSHCACHE 0x8 #define DISKFLAG_LACKS_GONE 0x10 +#define DISKFLAG_UNMAPPED_BIO 0x20 struct disk *disk_alloc(void); void disk_create(struct disk *disk, int version); Modified: stable/9/sys/geom/geom_io.c ============================================================================== --- stable/9/sys/geom/geom_io.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/geom_io.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1,6 +1,7 @@ /*- * Copyright (c) 2002 Poul-Henning Kamp * Copyright (c) 2002 Networks Associates Technology, Inc. + * Copyright (c) 2013 The FreeBSD Foundation * All rights reserved. * * This software was developed for the FreeBSD Project by Poul-Henning Kamp @@ -8,6 +9,9 @@ * under DARPA/SPAWAR contract N66001-01-C-8035 ("CBOSS"), as part of the * DARPA CHATS research program. * + * Portions of this software were developed by Konstantin Belousov + * under sponsorship from the FreeBSD Foundation. + * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: @@ -44,6 +48,7 @@ __FBSDID("$FreeBSD$"); #include #include #include +#include #include #include @@ -51,6 +56,13 @@ __FBSDID("$FreeBSD$"); #include #include +#include +#include +#include +#include +#include +#include +#include static struct g_bioq g_bio_run_down; static struct g_bioq g_bio_run_up; @@ -180,12 +192,17 @@ g_clone_bio(struct bio *bp) /* * BIO_ORDERED flag may be used by disk drivers to enforce * ordering restrictions, so this flag needs to be cloned. + * BIO_UNMAPPED should be inherited, to properly indicate + * which way the buffer is passed. * Other bio flags are not suitable for cloning. */ - bp2->bio_flags = bp->bio_flags & BIO_ORDERED; + bp2->bio_flags = bp->bio_flags & (BIO_ORDERED | BIO_UNMAPPED); bp2->bio_length = bp->bio_length; bp2->bio_offset = bp->bio_offset; bp2->bio_data = bp->bio_data; + bp2->bio_ma = bp->bio_ma; + bp2->bio_ma_n = bp->bio_ma_n; + bp2->bio_ma_offset = bp->bio_ma_offset; bp2->bio_attribute = bp->bio_attribute; /* Inherit classification info from the parent */ bp2->bio_classifier1 = bp->bio_classifier1; @@ -210,11 +227,15 @@ g_duplicate_bio(struct bio *bp) struct bio *bp2; bp2 = uma_zalloc(biozone, M_WAITOK | M_ZERO); + bp2->bio_flags = bp->bio_flags & BIO_UNMAPPED; bp2->bio_parent = bp; bp2->bio_cmd = bp->bio_cmd; bp2->bio_length = bp->bio_length; bp2->bio_offset = bp->bio_offset; bp2->bio_data = bp->bio_data; + bp2->bio_ma = bp->bio_ma; + bp2->bio_ma_n = bp->bio_ma_n; + bp2->bio_ma_offset = bp->bio_ma_offset; bp2->bio_attribute = bp->bio_attribute; bp->bio_children++; #ifdef KTR @@ -575,6 +596,85 @@ g_io_deliver(struct bio *bp, int error) return; } +SYSCTL_DECL(_kern_geom); + +static long transient_maps; +SYSCTL_LONG(_kern_geom, OID_AUTO, transient_maps, CTLFLAG_RD, + &transient_maps, 0, + "Total count of the transient mapping requests"); +u_int transient_map_retries = 10; +SYSCTL_UINT(_kern_geom, OID_AUTO, transient_map_retries, CTLFLAG_RW, + &transient_map_retries, 0, + "Max count of retries used before giving up on creating transient map"); +int transient_map_hard_failures; +SYSCTL_INT(_kern_geom, OID_AUTO, transient_map_hard_failures, CTLFLAG_RD, + &transient_map_hard_failures, 0, + "Failures to establish the transient mapping due to retry attempts " + "exhausted"); +int transient_map_soft_failures; +SYSCTL_INT(_kern_geom, OID_AUTO, transient_map_soft_failures, CTLFLAG_RD, + &transient_map_soft_failures, 0, + "Count of retried failures to establish the transient mapping"); +int inflight_transient_maps; +SYSCTL_INT(_kern_geom, OID_AUTO, inflight_transient_maps, CTLFLAG_RD, + &inflight_transient_maps, 0, + "Current count of the active transient maps"); + +static int +g_io_transient_map_bio(struct bio *bp) +{ + vm_offset_t addr; + long size; + u_int retried; + int rv; + + KASSERT(unmapped_buf_allowed, ("unmapped disabled")); + + size = round_page(bp->bio_ma_offset + bp->bio_length); + KASSERT(size / PAGE_SIZE == bp->bio_ma_n, ("Bio too short %p", bp)); + addr = 0; + retried = 0; + atomic_add_long(&transient_maps, 1); +retry: + vm_map_lock(bio_transient_map); + if (vm_map_findspace(bio_transient_map, vm_map_min(bio_transient_map), + size, &addr)) { + vm_map_unlock(bio_transient_map); + if (transient_map_retries != 0 && + retried >= transient_map_retries) { + g_io_deliver(bp, EDEADLK/* XXXKIB */); + CTR2(KTR_GEOM, "g_down cannot map bp %p provider %s", + bp, bp->bio_to->name); + atomic_add_int(&transient_map_hard_failures, 1); + return (1); + } else { + /* + * Naive attempt to quisce the I/O to get more + * in-flight requests completed and defragment + * the bio_transient_map. + */ + CTR3(KTR_GEOM, "g_down retrymap bp %p provider %s r %d", + bp, bp->bio_to->name, retried); + pause("g_d_tra", hz / 10); + retried++; + atomic_add_int(&transient_map_soft_failures, 1); + goto retry; + } + } + rv = vm_map_insert(bio_transient_map, NULL, 0, addr, addr + size, + VM_PROT_RW, VM_PROT_RW, MAP_NOFAULT); + KASSERT(rv == KERN_SUCCESS, + ("vm_map_insert(bio_transient_map) rv %d %jx %lx", + rv, (uintmax_t)addr, size)); + vm_map_unlock(bio_transient_map); + atomic_add_int(&inflight_transient_maps, 1); + pmap_qenter((vm_offset_t)addr, bp->bio_ma, OFF_TO_IDX(size)); + bp->bio_data = (caddr_t)addr + bp->bio_ma_offset; + bp->bio_flags |= BIO_TRANSIENT_MAPPING; + bp->bio_flags &= ~BIO_UNMAPPED; + return (0); +} + void g_io_schedule_down(struct thread *tp __unused) { @@ -636,6 +736,12 @@ g_io_schedule_down(struct thread *tp __u default: break; } + if ((bp->bio_flags & BIO_UNMAPPED) != 0 && + (bp->bio_to->flags & G_PF_ACCEPT_UNMAPPED) == 0 && + (bp->bio_cmd == BIO_READ || bp->bio_cmd == BIO_WRITE)) { + if (g_io_transient_map_bio(bp)) + continue; + } THREAD_NO_SLEEPING(); CTR4(KTR_GEOM, "g_down starting bp %p provider %s off %ld " "len %ld", bp, bp->bio_to->name, bp->bio_offset, Modified: stable/9/sys/geom/geom_vfs.c ============================================================================== --- stable/9/sys/geom/geom_vfs.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/geom_vfs.c Tue Jun 18 05:21:40 2013 (r251897) @@ -193,14 +193,14 @@ g_vfs_strategy(struct bufobj *bo, struct bip = g_alloc_bio(); bip->bio_cmd = bp->b_iocmd; bip->bio_offset = bp->b_iooffset; - bip->bio_data = bp->b_data; - bip->bio_done = g_vfs_done; - bip->bio_caller2 = bp; bip->bio_length = bp->b_bcount; - if (bp->b_flags & B_BARRIER) { + bdata2bio(bp, bip); + if ((bp->b_flags & B_BARRIER) != 0) { bip->bio_flags |= BIO_ORDERED; bp->b_flags &= ~B_BARRIER; } + bip->bio_done = g_vfs_done; + bip->bio_caller2 = bp; g_io_request(bip, cp); } Modified: stable/9/sys/geom/part/g_part.c ============================================================================== --- stable/9/sys/geom/part/g_part.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/geom/part/g_part.c Tue Jun 18 05:21:40 2013 (r251897) @@ -429,6 +429,7 @@ g_part_new_provider(struct g_geom *gp, s entry->gpe_pp->stripeoffset = pp->stripeoffset + entry->gpe_offset; if (pp->stripesize > 0) entry->gpe_pp->stripeoffset %= pp->stripesize; + entry->gpe_pp->flags |= pp->flags & G_PF_ACCEPT_UNMAPPED; g_error_provider(entry->gpe_pp, 0); } Modified: stable/9/sys/i386/i386/pmap.c ============================================================================== --- stable/9/sys/i386/i386/pmap.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/i386/i386/pmap.c Tue Jun 18 05:21:40 2013 (r251897) @@ -4256,6 +4256,8 @@ pmap_copy_page(vm_page_t src, vm_page_t mtx_unlock(&sysmaps->lock); } +int unmapped_buf_allowed = 1; + void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) Modified: stable/9/sys/i386/include/param.h ============================================================================== --- stable/9/sys/i386/include/param.h Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/i386/include/param.h Tue Jun 18 05:21:40 2013 (r251897) @@ -140,9 +140,12 @@ * Ceiling on size of buffer cache (really only effects write queueing, * the VM page cache is not effected), can be changed via * the kern.maxbcache /boot/loader.conf variable. + * + * The value is equal to the size of the auto-tuned buffer map for + * the machine with 4GB of RAM, see vfs_bio.c:kern_vfs_bio_buffer_alloc(). */ #ifndef VM_BCACHE_SIZE_MAX -#define VM_BCACHE_SIZE_MAX (200 * 1024 * 1024) +#define VM_BCACHE_SIZE_MAX (7224 * 16 * 1024) #endif /* Modified: stable/9/sys/i386/xen/pmap.c ============================================================================== --- stable/9/sys/i386/xen/pmap.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/i386/xen/pmap.c Tue Jun 18 05:21:40 2013 (r251897) @@ -3444,6 +3444,8 @@ pmap_copy_page(vm_page_t src, vm_page_t mtx_unlock(&sysmaps->lock); } +int unmapped_buf_allowed = 1; + void pmap_copy_pages(vm_page_t ma[], vm_offset_t a_offset, vm_page_t mb[], vm_offset_t b_offset, int xfersize) Modified: stable/9/sys/ia64/ia64/pmap.c ============================================================================== --- stable/9/sys/ia64/ia64/pmap.c Tue Jun 18 04:57:36 2013 (r251896) +++ stable/9/sys/ia64/ia64/pmap.c Tue Jun 18 05:21:40 2013 (r251897) @@ -1884,6 +1884,8 @@ pmap_copy_page(vm_page_t msrc, vm_page_t bcopy(src, dst, PAGE_SIZE); } *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***