From owner-freebsd-questions@FreeBSD.ORG Mon Jul 18 09:46:42 2011
Message-ID: <4E240100.5070506@esiee.fr>
Date: Mon, 18 Jul 2011 11:46:40 +0200
From: Frank Bonnet
To: freebsd-questions@freebsd.org
References: <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>
In-Reply-To: <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>
Subject: Re: Tools to find "unlegal" files ( videos , music etc )

On 07/18/2011 11:44 AM, Robert Bonomi wrote:
>> From owner-freebsd-questions@freebsd.org Mon Jul 18 03:55:59 2011
>> Date: Mon, 18 Jul 2011 10:55:58 +0200
>> From: Frank Bonnet
>> To: freebsd-questions@freebsd.org
>> Subject: Re: Tools to find "unlegal" files ( videos , music etc )
>>
>> On 07/18/2011 10:45 AM, Polytropon wrote:
>>> On Mon, 18 Jul 2011 10:38:22 +0200, Frank Bonnet wrote:
>>>> On 07/18/2011 10:10 AM, Polytropon wrote:
>>>>> On Mon, 18 Jul 2011 09:55:09 +0200, Frank Bonnet wrote:
>>>>>> Hello
>>>>>>
>>>>>> Does anyone know of a utility that I could use with the "find"
>>>>>> command in order to detect video, music, game, etc. files?
>>>>>>
>>>>>> I need a tool that can "inspect" the inside of files, because many
>>>>>> users rename those files to "inoffensive" names :-)
>>>>> One way could be to define a list of file extensions that
>>>>> commonly match the content you want to track. Of course,
>>>>> the file name does not directly correspond to the content,
>>>>> but it often gives a good hint to search for *.wmv, *.flv,
>>>>> *.avi, *.mp(e)g, *.mp3, *.wma, *.exe - and of course all
>>>>> the variations of the extensions with uppercase letters.
>>>>> Also consider *.rar and maybe *.zip for compressed content.
>>>>>
>>>>> If file extensions have been manipulated (rare case), the
>>>>> "file" command can still identify the correct file type.
>>>>>
>>>> yes thanks , gonna try with the file command
>>> You could make a simple script that lists "file" output for
>>> all files (just to be sure because of possible suffix renaming)
>>> for further inspection. Sometimes, you can also run "strings"
>>> for a given file - maybe that can be used to identify typical
>>> suspicious string patterns for a "strings + grep" combination
>>> so less manual identification has to be done.
>>>
>> yes , my main problem is the huge number of files
>> but anyway I'm gonna first check files greater than 500 MB
>> it could be a good start
> That's what 'find(1)' is for. Something like (run as superuser):
>
>     find / -exec ./inspect {} \; >> /tmp/suspects
>
> with './inspect' being a trivial (executable!) shell-script:
>
>     #!/bin/sh
>     file "$1" | awk -f ./inspect.awk
>
> and './inspect.awk' is:
>
>     {file = $1 ; $1 = "";}
>     /regex1/ {printf("%s %s\n",file,$0); next;}
>     /regex2/ {printf("%s %s\n",file,$0); next;}
>     /regex3/ {printf("%s %s\n",file,$0); next;}
>     ... ...
>     ... ...
>     {next;}
>
> where 'regex1', 'regex2', etc. are things to select 'files' of interest,
> based on what 'file' reports. The awk code strips out the file name, so
> that the regex will match only against the 'file' output, with no false
> positives against a substring in the file name itself.
>
> See the find(1) manpage for things you can put before the '-exec' param,
> to filter by size, etc. You can also limit the search to a specific
> part of the filesystem tree, by replacing '/' with the name of the directory
> hierarchy you want to search -- e.g. '/home' (if that's where all 'user'
> files are) -- although, 'for completeness' (given the 'legal' issues) you
> may well want to run it over 'everything'.
>
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

Thanks a lot for your help !
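
For the archive, here is one possible variation on the find + file + awk
idea quoted above. It is only a minimal sketch, not something anyone in this
thread posted: the function names filter_media and scan_large_media are made
up for illustration, the script assumes a file(1) that accepts --mime-type
and a find(1) that accepts the M suffix for -size, and the 500 MB threshold
and the /home default are just the figures mentioned earlier. It matches on
the MIME type instead of hand-written regexes, and takes the type from the
last field so that spaces or colons in a file name do not cause false hits.

```shell
#!/bin/sh
# Sketch only: filter_media and scan_large_media are hypothetical names.

# Read "path: mime/type" lines (as printed by `file --mime-type`) on stdin
# and print "type<TAB>path" for audio and video files only.
filter_media() {
    awk '$NF ~ /^(audio|video)\// {
        type = $NF                 # the MIME type is always the last field
        sub(/: +[^ ]+$/, "")       # drop the trailing ": type" from the line
        printf "%s\t%s\n", type, $0
    }'
}

# Scan a directory tree (default: /home) for regular files over 500 MB.
# Assumption: the local find(1) understands "-size +500M".
scan_large_media() {
    find "${1:-/home}" -type f -size +500M -exec file --mime-type {} + |
        filter_media
}
```

After sourcing the script (or pasting the two functions into a shell),
something like "scan_large_media /home >> /tmp/suspects" run as superuser
would collect the candidates for manual review. Renamed archives (*.rar,
*.zip) are not caught by this filter and would need their own pattern.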