Monday, March 17, 2008

What's the Matter with MediaSentry?

Let me attempt to give a brief sketch of how P2P applications work, targeted toward non-comp-sci people, particularly with respect to the RIAA file sharing suits. This is taken from the general knowledge I have about P2P programs as a computer scientist, and in some cases I know specific things about specific applications.

Basic Architecture
In P2P applications, individual computers act as clients, servers, or both at the same time, depending on whether they're downloading, uploading, or both. Every P2P network has some list of the computers in it, and a user must join the network at some point to appear on that list. With eDonkey, you log into a large network when you start the program, and these large networks contain many peers who may never communicate with each other. With BitTorrent, you are logged into a small network - containing only the files you are downloading/uploading (e.g. MP3s of a single CD) - only while you are downloading/uploading, and you may be connected to many such networks at once.

Somewhere there is a directory of users. This may be stored on a single server (I believe eDonkey was this way), or it may be spread across the network itself: a computer asks another computer "who have you seen recently?", then asks the same question of every computer returned by that query, and so on.
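
For the technically inclined, here is a minimal sketch of that "who have you seen recently?" style of discovery. It isn't any real P2P protocol: the peer names, the RECENTLY_SEEN table, and the ask_who_seen function are all made up for illustration, and a real program would send the question over the network instead of looking it up in a dictionary.

```python
from collections import deque

# Hypothetical toy network: each peer maps to the peers it has "seen recently".
RECENTLY_SEEN = {
    "peer_a": ["peer_b", "peer_c"],
    "peer_b": ["peer_a", "peer_d"],
    "peer_c": ["peer_d"],
    "peer_d": ["peer_e"],
    "peer_e": [],
}


def ask_who_seen(peer):
    """Stand-in for a network query; a real client would contact the peer."""
    return RECENTLY_SEEN.get(peer, [])


def discover_peers(start):
    """Breadth-first crawl: ask each newly found peer who it has seen."""
    known = {start}
    queue = deque([start])
    while queue:
        peer = queue.popleft()
        for neighbor in ask_who_seen(peer):
            if neighbor not in known:
                known.add(neighbor)
                queue.append(neighbor)
    return known


# Starting from one known peer, the crawl reaches every peer in the toy table.
print(sorted(discover_peers("peer_a")))
```

Note that the crawler only ever learns what other peers tell it, so whatever list it builds is only as accurate as their answers.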

In all cases (unless you've got some kind of file-sharing virus, which I'm actually surprised we haven't seen before), the user voluntarily logs into and out of the network(s) through various actions. As well, files which are shared must be "voluntarily" shared, either from a shared files folder or by tracking specific files that should be shared; however, most programs will automatically share any files you download from other users (and I've heard some programs, when installed, automatically search for and share files that the program thinks would be good to share).

Depending on the system, peers and/or the directory server may not know all the files a given computer has available for sharing. Similarly, if you ask a given computer what files it's sharing, the answer may or may not be a complete list. With torrents, a peer only shows the files in the particular torrent; I believe eDonkey lists all shared files. All P2P systems have a way of asking a particular computer what files it is sharing, although the completeness of the response varies.

Okay, getting to the specific legal issues.

Method of Obtaining an IP and File List
First, as there are standard and intended methods of asking a computer what files it's sharing, it's (probably) not true that MediaSentry had to do anything illegal to obtain this list, like hack into the computer. Likewise, they probably didn't have to do anything other users couldn't do (although they probably made a program to scan P2P networks and catalog all files, while the typical user would have to search for people with specific files; I wouldn't call this illegal).

The big question mark is how exactly MediaSentry verified (to the best of its knowledge) that the info it obtained is true. Without knowing this we can't give a good estimate of the false-positive rate, although previous cases have shown this rate to be > 0. This is likely the reason MediaSentry won't say what its methods are; they're probably lying when they say that they have developed proprietary and novel methods of investigation that should be considered trade secrets. There are lots of ways an investigation could go wrong (or become difficult), even if they did see what appears to be a computer sharing copyrighted files.

Outdated Cache Information
It's possible that the directory server or another computer has an outdated list of files shared by a certain computer, in which case it may say that a computer is sharing files that it isn't. One example of how this could happen: a computer was sharing some files on some network and then disconnected from the internet, and another computer logged on and was given the same IP. Such outdated data could indicate that the second computer is sharing files even though it's not (it might not even be on the P2P network at all), and in fact NOBODY at that IP has been sharing files for some time. This goes directly to the issue of not being able to positively identify a person from an IP address, even if the address you get really is the one that computer has at the moment the address is obtained (although this is highly dependent on the P2P program). This risk of false positives (and the next one or two) can more or less be eliminated by verifying that the files can actually be downloaded at the time the IP is seen "sharing" files.
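
To make the stale-cache scenario concrete, here is a small sketch. Everything in it is an assumption for illustration: the DirectoryEntry and Lease records, the subscriber names, and the timestamps (plain seconds on one shared clock) are invented, and real directory servers and ISPs keep their records in their own formats.

```python
from dataclasses import dataclass


@dataclass
class DirectoryEntry:
    ip: str
    files: list
    last_updated: float  # when the directory last heard from this IP


@dataclass
class Lease:
    ip: str
    subscriber: str
    start: float
    end: float


def holder_at(leases, ip, t):
    """Return the subscriber holding `ip` at time `t`, or None if nobody did."""
    for lease in leases:
        if lease.ip == ip and lease.start <= t < lease.end:
            return lease.subscriber
    return None


def is_stale(entry, leases, observed_at):
    """True if the IP changed hands between the entry's last update
    and the moment an investigator read the entry."""
    return holder_at(leases, entry.ip, entry.last_updated) != holder_at(
        leases, entry.ip, observed_at
    )


# Hypothetical data: "alice" shares a file, disconnects, and "bob" gets her IP.
leases = [
    Lease("203.0.113.7", "alice", 0, 100),
    Lease("203.0.113.7", "bob", 120, 300),
]
entry = DirectoryEntry("203.0.113.7", ["song.mp3"], last_updated=90)

print(is_stale(entry, leases, observed_at=95))   # False: alice still holds the IP
print(is_stale(entry, leases, observed_at=150))  # True: the entry now points at bob
```

The investigator, of course, doesn't have the lease table at the time of the observation, which is why actually downloading the file is the more practical check.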

Leeching
It's possible that the user is a "leecher" - somebody who downloads without allowing their computer to upload anything, by messing with their system configuration. This may be done either intentionally (it's not extremely rare for people to leech so they don't have to use upstream bandwidth when all they want is to get something from someone else; such people fall under the "jackass" category) or unintentionally (P2P programs can be a huge pain to set up to work properly when you're behind a home or other type of local network, and some ISPs even block P2P uploading - but not downloading). Obviously if they're a leecher, they haven't so much as made anything available, despite the computer indicating that it's sharing stuff (although intent becomes a big question if they're not intentionally leeching). While some P2P networks will ban leechers, it's possible for leechers to report false info to the server for the explicit purpose of evading leecher banning; consequently, leeching must specifically be ruled out by successfully downloading the "shared" files.
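
As a rough illustration of what "ruling out leeching" means, the sketch below only counts a file as shared if it can actually be fetched and its contents match what was advertised. This is not a description of MediaSentry's method, which isn't public; the verify_shared function, the try_download callback, and the use of SHA-256 hashes are all assumptions made for the example.

```python
import hashlib


def verify_shared(claimed_files, try_download):
    """Return the names of claimed files that could really be downloaded.

    claimed_files: dict mapping file name -> expected SHA-256 hex digest.
    try_download:  hypothetical callable taking a file name and returning
                   the file's bytes, or None if the peer refuses to upload.
    """
    verified = []
    for name, expected_digest in claimed_files.items():
        data = try_download(name)
        if data is None:
            continue  # advertised but not actually obtainable
        if hashlib.sha256(data).hexdigest() == expected_digest:
            verified.append(name)
    return verified


# Hypothetical peers: both advertise the same list, only one ever uploads.
content = b"pretend this is an MP3"
claimed = {"song.mp3": hashlib.sha256(content).hexdigest()}


def leecher(name):
    return None  # claims to share, never sends anything


def sharer(name):
    return content if name == "song.mp3" else None


print(verify_shared(claimed, leecher))  # []
print(verify_shared(claimed, sharer))   # ['song.mp3']
```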

Clock Synchronization
The issue of stale data comes up again at the ISP and organization (if the violating computer is on a large network such as a school's). More importantly, there's no guarantee that the clocks on the MS computer (here I'm assuming they've actually downloaded the files from the sharer) are synchronized with the clocks at the ISP/organization. If these clocks aren't well synchronized, there's always the possibility that the account information they get from the ISP/organization isn't for the account that had the IP at the time the files were shared. Ruling this out would require explicitly testing clock synchronization between everyone involved; I'd imagine it would be troublesome to get an ISP/other organization to put that kind of effort into a response to a subpoena. Alternatively, this possibility can be reduced by the ISP/organization checking that there are no logons near the time sharing supposedly occurred; if there is a very large window in which no logons occurred, the probability of a false positive is probably negligible, even if the clocks aren't precisely synchronized.
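
The "no logons near the time" check can be sketched as follows. The lease records, subscriber names, and the max_skew parameter are hypothetical; the idea is simply to widen the investigator's timestamp by the largest clock disagreement you are willing to assume and see whether more than one account falls inside that window.

```python
def candidate_accounts(leases, ip, observed_at, max_skew):
    """Return every subscriber whose lease on `ip` overlaps the window
    [observed_at - max_skew, observed_at + max_skew].

    leases: list of (ip, subscriber, start, end) tuples, times in seconds
            on the ISP's clock (a made-up record format).
    """
    window_start = observed_at - max_skew
    window_end = observed_at + max_skew
    return sorted(
        {
            subscriber
            for lease_ip, subscriber, start, end in leases
            if lease_ip == ip and start <= window_end and end >= window_start
        }
    )


# Hypothetical ISP log: the IP changes hands shortly before the observation.
leases = [
    ("203.0.113.7", "alice", 0, 100),
    ("203.0.113.7", "bob", 105, 300),
]

# If the clocks are assumed to agree exactly, only one account matches.
print(candidate_accounts(leases, "203.0.113.7", observed_at=110, max_skew=0))
# With 10 seconds of possible skew, the identification is ambiguous.
print(candidate_accounts(leases, "203.0.113.7", observed_at=110, max_skew=10))
# Far from any logon or logoff, the skew no longer matters.
print(candidate_accounts(leases, "203.0.113.7", observed_at=200, max_skew=10))
```

If the function returns exactly one account even for a generous max_skew, imprecise clocks are unlikely to have caused a misidentification.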

Network Address Translation
Next, NATs present a major problem for identifying the offending computer, because it's entirely possible that there are multiple computers using the same IP at the same time. In theory (and subject to the problem in the next paragraph) the router can distinguish which computer has which connection at what time (a NAT assigns a unique port number to each connection made by the computers sharing an IP address), but the probability of this information still being around by the time a suit is filed is low, even under normal (non-destruction-of-evidence-type) use. Whether an IP is a NAT or a single computer can be halfway reliably determined by investigators like MS using public info (I recall you got hung up on that point in one of the early trials of yours). If the IP is a NAT, it's going to be significantly harder to prove which computer shared the files; doing so requires forensic examination of the hard drives or someone on the network confessing (or the RIAA's preferred method: file a suit against the account holder and expect them to give up the person responsible rather than face court or settlement costs). However, this problem is short-circuited if the RIAA gets lucky enough that the P2P application uses user names (some do, some don't), and the name of the sharer is known to be used by a certain person (although I suppose someone could maliciously use the name of somebody they don't like).
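
Here is a toy model of the bookkeeping a NAT router does, to show why the answer to "which computer was it?" exists only inside the router and only for a while. The Nat class, its method names, and the addresses are invented for this sketch; real NATs also track the protocol and remote endpoint, and drop entries on a timeout instead of an explicit call.

```python
class Nat:
    """Toy NAT: many private computers share one public IP address."""

    def __init__(self, public_ip, first_port=40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.by_public_port = {}  # public port -> (private ip, private port)
        self.by_private = {}      # (private ip, private port) -> public port

    def outbound(self, private_ip, private_port):
        """Map an internal connection to (public ip, unique public port)."""
        key = (private_ip, private_port)
        if key not in self.by_private:
            self.by_private[key] = self.next_port
            self.by_public_port[self.next_port] = key
            self.next_port += 1
        return (self.public_ip, self.by_private[key])

    def who_had(self, public_port):
        """Look up which internal computer was behind a public port."""
        return self.by_public_port.get(public_port)

    def expire(self, public_port):
        """Forget a mapping, as a real router does once a connection ends."""
        key = self.by_public_port.pop(public_port, None)
        if key is not None:
            self.by_private.pop(key, None)


nat = Nat("198.51.100.1")
a = nat.outbound("192.168.1.10", 5000)  # first computer in the house
b = nat.outbound("192.168.1.11", 5000)  # second computer, same public IP

print(a, b)                # same IP, different ports
print(nat.who_had(a[1]))   # the router can still tell them apart...
nat.expire(a[1])
print(nat.who_had(a[1]))   # ...until the entry is gone: None
```

From the outside, both computers appear as 198.51.100.1; once the table entry expires, nothing on the router ties the observed port back to a particular machine.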

IP Spoofing/ARP Poisoning
As well, I'm told by people more knowledgeable than me (I came up with the idea, and then asked them to verify that it could be done in real-world networks) that depending on the configuration of a network it's possible to operate under the IP address of somebody close to you (perhaps somebody in your dorm). This would very likely require intent to deceive, but it might be attractive for someone who wants to download stuff without getting in trouble. I don't know if there are tools out there that make this easy enough for your average user to do, but it's definitely technologically possible, given the right network. In fact, I have a friend who is a very skilled network "hacker" (he publishes articles in security journals) who had written a program to disconnect file sharers from his school network because they were hogging bandwidth and making his connection slow (and he did so without being a network administrator, as far as I'm aware; however, that was a simpler case than sustaining two-way communication). Network hacking is outside my field of expertise, but I'm betting this involved what I'm describing in this paragraph. It would depend on how secure the network configuration is, but I'd reason (keeping in mind that I know some about networks but they aren't my specialty - I'm just highly inquisitive, and know a moderate amount about many topics) that wireless networks are especially vulnerable to this. Ruling this out requires knowledge of the physical layout of the network the sharing computer is connected to, and of the network administration policies (I would guess that this is usually not done due to the annoyance for the network administrators, but perhaps some would). This seems like a viable defense, but I'd recommend talking to a network expert about this directly (my friend isn't online at the moment, so the confirmation of feasibility didn't come from him).

Making Available
Finally (at least I think this is the end), there's the nebulous issue of making available. Even if MS knows for sure that this person was on this computer with this IP address at the time, and MS successfully downloaded a valid copy of a copyrighted file, there isn't a guarantee that this file was actually distributed to other people. In P2P applications with very large networks, it's very possible that simply nobody other than MS ever asked for a copy of a file from a specific computer, so there was no actual distribution. In such cases, it becomes very difficult to even estimate the probability of someone else downloading the file. As I've explained, it's not enough simply to ask other computers if they downloaded the file from that computer - even if the P2P application has a way of asking that - as the computer may be lying or propagating incorrect data (it could be that the "sharing" computer is a leecher and only says it uploaded the file). Obviously this is only an issue if making available is not ruled to be equivalent to distribution.

Questions, comments? Is there anything (or everything) I didn't explain well enough for laymen, or does anybody technically apt want to know exactly what I'm referring to in some cases (it might not always be clear, as I didn't explain the technical details behind that list of risks)?

And oops, I did it again - sat down to write something that was supposed to be fairly concise, and ended up writing something that looks like a judge's ruling document. But at least it made me forget about my flu for a couple hours, so that's a good thing.
