Date:      Mon, 18 Jul 2011 11:46:40 +0200
From:      Frank Bonnet <f.bonnet@esiee.fr>
To:        freebsd-questions@freebsd.org
Subject:   Re: Tools to find "unlegal" files ( videos , music etc )
Message-ID:  <4E240100.5070506@esiee.fr>
In-Reply-To: <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>
References:  <201107180944.p6I9iAJ9022931@mail.r-bonomi.com>

On 07/18/2011 11:44 AM, Robert Bonomi wrote:
>>  From owner-freebsd-questions@freebsd.org  Mon Jul 18 03:55:59 2011
>> Date: Mon, 18 Jul 2011 10:55:58 +0200
>> From: Frank Bonnet<f.bonnet@esiee.fr>
>> To: freebsd-questions@freebsd.org
>> Subject: Re: Tools to find "unlegal" files ( videos , music etc )
>>
>> On 07/18/2011 10:45 AM, Polytropon wrote:
>>> On Mon, 18 Jul 2011 10:38:22 +0200, Frank Bonnet wrote:
>>>> On 07/18/2011 10:10 AM, Polytropon wrote:
>>>>> On Mon, 18 Jul 2011 09:55:09 +0200, Frank Bonnet wrote:
>>>>>> Hello
>>>>>>
>>>>>> Does anyone know a utility that I could pipe to the "find" command
>>>>>> in order to detect video, music, games, etc. files?
>>>>>>
>>>>>> I need a tool that could "inspect" inside files, because many users
>>>>>> rename those files to "inoffensive" names :-)
>>>>> One way could be to define a list of file extensions that
>>>>> commonly match the content you want to track. Of course,
>>>>> the file name does not directly correspond to the content,
>>>>> but it is often a good hint, so search for *.wmv, *.flv,
>>>>> *.avi, *.mp(e)g, *.mp3, *.wma, *.exe - and of course all
>>>>> the variations of the extensions with uppercase letters.
>>>>> Also consider *.rar and maybe *.zip for compressed content.
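A minimal sketch of the extension-based search, assuming a find(1) that supports -iname (FreeBSD's find and GNU find both do). The helper name find_by_ext is hypothetical, and the extension list is just the one given above:

```shell
# find_by_ext DIR (hypothetical helper): print the regular files under DIR
# whose names end in one of the extensions listed above. -iname matches
# case-insensitively, so *.AVI and *.Mp3 are caught as well.
find_by_ext() {
    find "$1" -type f \( \
        -iname '*.wmv' -o -iname '*.flv' -o -iname '*.avi' \
        -o -iname '*.mpg' -o -iname '*.mpeg' -o -iname '*.mp3' \
        -o -iname '*.wma' -o -iname '*.exe' \
        -o -iname '*.rar' -o -iname '*.zip' \
    \) -print
}

# Example: find_by_ext /home > /tmp/by_extension
```

Because -iname only looks at names, this catches nothing that has been renamed; it is a cheap first pass before the content-based checks.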
>>>>>
>>>>> If file extensions have been manipulated (rare case), the
>>>>> "file" command can still identify the correct file type.
>>>>>
>>>> Yes, thanks, I'll try the file command.
>>> You could make a simple script that lists "file" output for
>>> all files (just to be sure because of possible suffix renaming)
>>> for further inspection. Sometimes, you can also run "strings"
>>> for a given file - maybe that can be used to identify typical
>>> suspicious string patterns for a "strings + grep" combination,
>>> so that less manual identification has to be done.
>>>
>> Yes, my main problem is the huge number of files,
>> but anyway I'll first check files larger than 500 MB;
>> that could be a good start.
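A minimal sketch of that first pass, assuming a find(1) that accepts size suffixes such as 500M (FreeBSD's find and GNU find both do; M means 1024 x 1024 bytes). The helper name find_larger is hypothetical:

```shell
# find_larger DIR SIZE (hypothetical helper): print the regular files under
# DIR that are larger than SIZE, where SIZE uses find's suffixes, e.g. 500M.
# The leading "+" turns -size into a "greater than" test.
find_larger() {
    find "$1" -type f -size +"$2" -print
}

# Example: find_larger /home 500M > /tmp/big_files
```

The resulting list is usually short enough to feed to "file" one name at a time.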
> That's what 'find(1)' is for.  Something like (run as superuser):
>
>   find / -exec ./inspect {} \; >> /tmp/suspects
>
> with './inspect' being a trivial (executable!) shell-script:
>
>      #!/bin/sh
>      file "$1" | awk -f ./inspect.awk
>
> and './inspect.awk' is:
>
>            {file = $1; $1 = ""}
> /regex1/  {printf("%s  %s\n", file, $0); next}
> /regex2/  {printf("%s  %s\n", file, $0); next}
> /regex3/  {printf("%s  %s\n", file, $0); next}
>    ...      ...
>    ...      ...
>            {next}
>
> where 'regex1', 'regex2', etc. are things to select 'files' of interest,
> based on what 'file' reports.  The awk code strips out the file name, so
> that the regexes match only against the 'file' output, with no false
> positives from a substring of the file name itself.
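As an illustration only, here is one way the regex rules might be filled in. The patterns are assumptions based on typical file(1) descriptions (such as "RIFF (little-endian) data, AVI" or "Audio file with ID3 version 2.3.0"); the exact wording varies between versions of file(1), so check them against your own system. This variant also splits each line at the first ": " rather than at the first blank, so file names containing spaces are kept whole. The function name media_filter is hypothetical:

```shell
# media_filter (hypothetical helper): read "name: description" lines, as
# printed by file(1), on stdin and print "name  description" for entries
# whose description looks like video, audio or an archive.
media_filter() {
    awk '
    {
        # Split at the first ": " so names containing spaces stay intact.
        i = index($0, ": ")
        if (i == 0) next
        name = substr($0, 1, i - 1)
        desc = substr($0, i + 2)
    }
    # Assumed patterns; adjust them to what file(1) reports on your system.
    desc ~ /AVI|MPEG|ISO Media|Matroska|Microsoft ASF|Flash Video|ID3|RAR archive|Zip archive/ {
        printf("%s  %s\n", name, desc)
    }'
}

# Example: one file(1) run per batch of names instead of one shell per file:
#   find /home -type f -exec file {} + | media_filter > /tmp/suspects
```

Using "-exec file {} +" instead of "-exec ./inspect {} \;" avoids starting a shell, file and awk for every single file, which matters with a huge number of files.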
>
> See the find(1) manpage for things you can put before the '-exec' param,
> to filter by size, etc.  You can also limit the search to a specific
> part of the filesystem tree, by replacing '/' with the name of the directory
> hierarchy you want to search -- e.g. '/home' (if that's where all 'user'
> files are) -- although, 'for completeness' (given the 'legal' issues), you
> may well want to run it over 'everything'.
>
>
> _______________________________________________
> freebsd-questions@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-questions
> To unsubscribe, send any mail to "freebsd-questions-unsubscribe@freebsd.org"

Thanks a lot for your help!



