Date:      Tue, 29 Aug 2006 11:39:23 +1000
From:      Antony Mawer <fbsd-arch@mawer.org>
To:        "Marc G. Fournier" <scrappy@freebsd.org>
Cc:        Max Laier <max@love2party.net>, freebsd-arch@freebsd.org
Subject:   Re: BSDStats - What is involved ... ?
Message-ID:  <44F39ACB.6090703@mawer.org>
In-Reply-To: <20060828170450.M82634@hub.org>
References:  <20060825233420.V82634@hub.org>	<20060826112115.GG16768@turion.vk2pj.dyndns.org>	<20060826132138.H82634@hub.org> <200608261848.16513.max@love2party.net>	<20060826165209.V82634@hub.org>	<20060828130247.GA77702@lor.one-eyed-alien.net> <20060828170450.M82634@hub.org>

On 29/08/2006 6:07 AM, Marc G. Fournier wrote:
> On Mon, 28 Aug 2006, Brooks Davis wrote:
> 
>> While I understand (or think I understand) the motivations for this 
>> design goal, it's contrary to allowing collection of statistics from 
>> many people.  I'd love to be able to publish data from the FreeBSD 
>> systems (300+) at work, but unless I can do it in an anonymized 
>> aggregate form it's not going to happen.  I just can't justify leaking 
>> that much internal configuration information given a policy of hiding 
>> it (right or wrong and not subject to debate).  If I could run my own 
>> stats server and publish from it that might be possible.
> 
> Aggregate submissions will never be possible, as it will definitely 
> break any attempts at keeping the data 'clean' :(  I do understand that 
> we will never be able to get *everyone* reporting, but we will try as 
> much as possible to make it easy for as many as possible to report 
> *within* limits ...
> 
> I'm going to work on an 'email submission' method in September, that 
> would allow reporting to go *thru* one mailbox, and will include a 
> confirmation/challenge stage *per* server though ...

Brooks, what sort of information are you looking to "anonymise" before 
sending it out? Aggregating to say that I have X of this kind of CPU, Y 
of this IDE chipset, etc, rather than linking it specifically to each 
machine? Where do you feel a comfortable balance would lie? Obviously 
some effort needs to be made to minimise fraudulent entries.

Perhaps aggregate submissions could be conducted using a registration 
mechanism...

Another thought would be a local stats aggregation server that pushes 
summaries up to the master server... the aggregation server keeps the 
individual details, and some sort of challenge, selected at random by 
the master server, could make the numbers harder to 'fake'?
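
As a very rough illustration of the aggregation-plus-challenge idea, here 
is a minimal sketch in Python. Everything in it is an assumption made for 
the sake of the example and not part of BSDStats: the names 
LocalAggregator, commit and master_spot_check are hypothetical, and the 
salted SHA-256 commitment is just one possible way to let the master 
spot-check entries it has not seen.

```python
import hashlib
import json
import secrets
from collections import Counter


def commit(record, salt):
    """Return a SHA-256 commitment to one host record (hypothetical scheme)."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(salt + payload).hexdigest()


class LocalAggregator:
    """Keeps per-host details on site; only summaries and commitments leave."""

    def __init__(self, records):
        self.records = list(records)
        # One random salt per host so commitments cannot be guessed.
        self.salts = [secrets.token_bytes(16) for _ in self.records]

    def summary(self):
        # Count hosts per (field, value) pair, e.g. ("cpu", "A") -> 2.
        counts = Counter()
        for rec in self.records:
            for field, value in rec.items():
                counts[(field, value)] += 1
        return {
            "counts": dict(counts),
            "commitments": [
                commit(r, s) for r, s in zip(self.records, self.salts)
            ],
        }

    def answer_challenge(self, index):
        # Reveal exactly one record and its salt when the master asks.
        return self.records[index], self.salts[index]


def master_spot_check(summary, aggregator, rounds=2):
    """Challenge randomly chosen entries; return True if all of them verify."""
    n = len(summary["commitments"])
    for _ in range(min(rounds, n)):
        index = secrets.randbelow(n)
        record, salt = aggregator.answer_challenge(index)
        # The revealed record must match what was committed earlier.
        if commit(record, salt) != summary["commitments"][index]:
            return False
        # The revealed record must also be covered by the published counts.
        if any(summary["counts"].get((f, v), 0) < 1 for f, v in record.items()):
            return False
    return True
```

With three placeholder hosts such as {"cpu": "A", "ide": "X"}, summary() 
reports counts per component and master_spot_check() asks for a couple of 
randomly chosen records. Note the trade-off: each challenge reveals one 
machine's details, so this only reduces, rather than removes, what leaves 
the site.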

... just rambling as I thought of potential ways around this ...



