Haven Technical Highlight: Peer-to-peer Data Downloads
Abstract: This document is part of the
Haven project
. It is a suggestion for use of peer-to-peer technology to limit
network demands on servers in massively multiplayer scenarios with
large download volumes. It is not an implementation or a design, just
a draft for a construct.
Availability: Public
|
Contents
Note
- Peer-to-peer Data
Downloads
: Terms
, Concept Outline
, Hurdles
, Improving the Concept
, Explanations
- Further Discussion
: Profitability Analysis
, Security Issues
, Client Cache
, Improving the Design
- Additional Notes
- Additional Resources
- Thanks
- Copyright
First Disclaimer: P2P, Not!
Upon pondering the P2P aspects of the things below, I came to
realize that the term peer-to-peer is misleading. This has nothing to
do with distributed networks or self-governing or serverless operations.
This is really about "client-to-client" services, since both "client"
and "server" are definitely governed by a separate "server", a higher
entity. That said, I'm not about to change the usage of the term at this
time. I sort of like it as it is. (Ok, I'm lazy) One day I will revise
it but don't hold your breath.
Note
This is a slightly updated version of
a submission (by me) to the MUD dev mailing list
. It will (hopefully) be updated with improvements and corrections,
as well as added to to further explain and investigate the concept.
Peer-to-peer Data Downloads
I'm sure many readers are more knowledgeable than myself when it comes
to the exact cost of bandwidth, but suffice it to say that bandwidth
is, at least for MMORPGs, a substantial part of the operational budget.
So there is need to limit network bandwidth usage.
MMORPGs have a substantial need for patching, this is no secret. WWIIO
had a ~70M mandatory patch upon release, updates and patches of various
titles frequently demand similar downloads. To address this issue, I
came up with a design to offload the bandwidth demands of the game operator.
Recently peer-to-peer (P2P hereafter) technologies have received a lot
of attention. The gaming industry is interested in control and I'm
not terribly surprised to see that P2P has not been a popular design
feature, media-control-wise speaking. In online games, additional problems
in the form of hacking and cheating in various ways add to developers'
cautiousness for the subject. I believe it is doable, and this is what
I set out to do:
Terms
- system - all the services and machines operated by the provider
of the application
- service - a server providing files, maintained by the owner
of the system
- client - an application running on a customer's computer
- provider - an application making something available to others,
either a service provided by the system or a feature of an application
on an end user's computer
- pairing service - the application performing matching/assignment
of clients needing a file to client providing a file
Concept Outline
The basic characteristics of the service outlined here, and
how it works
- On demand, dynamic background patching. The client receives
updates of the player's surroundings and then knows what media it
needs. If that data is missing, outdated or corrupt, the necessary files
are requested from the server.
- Clients are registered as they connect, collecting up- and
downstream bitrates and whether the client can support hosting media.
This is a run-time registration only, not customer or subscription which
are separate as needed by the system.
- The server system provides download services and client pairing
(see next item)
- Upon request for a media file, the download service either
provides the file directly or notifies one client to act as server
and the other where to get the file. When assigning P2P client and server
roles, the service must track current bandwidth usage at both ends.
The client requesting the download also receives a checksum map of the
file requested with additional data necessary to detect corrupt or tampered
data.
- Clients verify their media files via checksums. If corrupt
data is encountered, the download service is notified and a new data
provider assigned. The service may take steps to correct the download
provider (the client that provided corrupt data - at the least, the client
should be forced to repatch the data on its own installation)
- To additionally prevent malicious tampering, media files are
not taken from a client in their entirety but in fragments. Typically
only a few kbytes in each block. This way only blocks that pass checksum
tests can be provided, and a checksum of the complete file can also
alert the system to tampering or corruption.
- The stream for download must be throttled so as not to interfere
with actual game updates (patching bitmaps must not prevent the user
from seeing what is happening in the game)
- The media stream (the P2P stream mentioned above) is separate
from the game update stream, ie no game data updates occur in this
stream.
Hurdles
The main hurdles to overcome to make this implementation a reality and,
most of all, useful are as I see them:
- Server processing. The workload for keeping the system in
operation is tremendous. I honestly think the advantages in bandwidth
costs should pay the hardware expense, if not at this time then within
a year or two. Continuous operation also pays off - buying a server is
a one time cost (let's skip other financing options for now, they don't
change basic facts), while bandwidth costs every day, every hour, every
minute.
- Customer client bandwidth. Widespread use of broadband services
is necessary. MMORPGs today all use less than what a 56k modem provides
and most of the bandwidth of broadband customers is unused.
- Clients must be configured to accept and allow P2P usage.
- Clients must either be forced to accept being used as hosts
(less than tasteful if you ask me) or given a good reason to provide
it, such as subscription cost reduction or otherwise improved service.
- One additional hurdle may be about to appear in the form of
ISP pricing. A number of ISPs are changing their pricing to limit bandwidth
usage for their customers (in this case, it's players we're talking about)
or to charge per Mb. This may either make customers unwilling to be hosts
(by either opting out due to lack of incentive as the gain from acting
as server decreases, or by the game operator being unable to make anywhere
near a close match in gain and dropping the entire deal), making the number
of available hosts fall below the threshold where the scheme is functional.
- NAT clients are a problem, as they cannot be immediately reached
by another peer since their IP address is phony. Some schemes in existence
use a mediator to negotiate an opening in the NAT service and then transferring
knowledge of this port to the connecting client. This negotiated address
and port are valid but only negotiated ports can be used, i.e. for each port
used, it must be negotiated. An old discussion to the NAT problem can be
found at http://www.alumni.caltech.edu/~dank/peer-nat.html
- NAT and Peer-to-peer networking
by Dan Kegel.
- Firewalls are more of a problem. The only likely solution is using
a push technique to open a connection from the firewalled client - that
connection can then be used bidirectionally. For situations where both peers
are firewalled, I am not aware of any solution.
Improving the Concept
To additionally improve the concept, one could
- Use multicasting for downloads (this will probably not be
feasible until IPv6 is widespread (2006?) - correct me if I'm wrong
- I know very well it's possible with IPv4, but is it practical? I think
not)
- Use an engine where all media is seamless. There are no zones,
the playing area can be continuously expanded on the edges.
- Require no data (or very limited amounts of data) for installation.
(The application can be distributed on CDROM or DVD, that is just more
convenient than downloading)
- LOD schemes for all media, possibly with a "lowest acceptable
level" of media file installation which must be present for the user
to commence playing. Once the client is verified to have the necessary
basic files installed, the rest can be left to background patching.
- Any type of data can be patched this way - this applies to
for example terrain, textures, models, animations, audio clips or binary
excutables and whatever other data the application uses.
- The client installation could maintain the data in separate
files or in indexed composite files. This is not relevant to the construct,
just a separate improvement.
- A verification system must be in place on the client. The
amount of checksums etc that must be transmitted to verify a client's
data must be limited somehow - this is a bigger problem than it might
at first seem, if one takes the unknown state of the client machine into
account. (This is probably where the "lowest acceptable level" client installation
comes into play).
Explanations
Various loose ends:
- The concept of an in-application P2P system is certainly applicable
in other contexts and the demands are quite similar (imagine Windows
Update having a built-in Napster-like function to offload Microsoft's
servers when propagating patches - Kazaa, a file sharing service like
Napster, uses its own service to distribute its binary btw, I'm sure
there are more examples) . I merely choose this (online games) context
because that was the application it was designed for, plus it's the best
fit for the scheme that I can think of. There are security concerns,
certainly. Such concerns is the reason we study computer security and
cryptography and so on - a problem requires a solution, not resignation.
- The 2006 date is what I recall to be the deadline for when
the world needs to have IPv6 in place (we're running out of addresses),
as well as the date for when it could reasonably be in place. Happy
coincidence, that.
- As a reference to a block-comparing file synchronizing service,
see the implementation of rsynch, a very nifty *nix utility/service.
(I have not gotten to this particular point in the implementation yet,
I was only told that it does things this way - it is relevant for block-wise
synchronizing, not for P2P which it does not do).
- The 3D patent mentioned (below) - I don't know whose it was
or when it was published. I know it was reported on Gamasutra among
others, a search in their archives might be in place.
- I am not unaware of the undeniably insecure nature of a client
installation. Any data that is present on a client machine (in persistent
or temporary memory) must be considered cracked and tampered with.
- Run time properties of the scheme - the scheme requires a
massive scenario to be worthwhile. It also becomes profitable only
in cases where there are substantial downloads.
- The mandatory question: where is my download so I can use
this? Answer: Nowhere. I have no working implementation of this, especially
not in its entirety and in particular, nothing for free.
Ok - so why am I posting this here? (i.e. on the MUDdev list) Well first
because I think this is a nice forum. Second, there's the patent stuff...
I came up with this design over a year ago, and at approximately the
same time, an old colleague mailed me a link to a patent watch site,
saying "hey, isn't this similar to what you and I did?" - and it was.
(It wasn't the same, I don't need advice on previous implementation or anything
- I'm not fighting a patent). Then there's this 3D patent - someone has
a patent for limiting data sent to clients in order to improve graphics
performance (silly, I know). Then there's my wish to create ... games.
Online games, of course - that's why I'm reading this list and posting
here.
As part of my own projects, I figured the scheme I outlined
above would be very nice, and I realized it was probably patentable
(a change in patent law in Sweden (where I live) also plays a role
here) and that I might think it fun to try and patent it.
Life isn't always developing the way you'd want it to however,
and I can no longer expect to be able to follow through on this idea.
I now have neither the money, time or resources to do it. Therefore
I'd rather see the idea in the public domain than patented by someone
else. My idea of filing a patent of my own was more for fun and less
for profit - the thought of a multinational filing the patent makes me
uncomfortable. It is my hope that this post makes this particular patent
easy to annul. It is also entirely possible that this whole construct is
not patentable due to the P2P technologies already in use in various places
or that someone already came up with this whole scheme.
So I choose to dump the idea out here instead - perhaps some
of you on this list can find the idea useful or interesting, perhaps
you can spot a fundamental flaw. I realize that it applies very little
to MUDs, but it certainly applies to MMO and therefore I submit it here.
If this topic has been discussed before on the MUDdev list or elsewhere,
I am not aware of it (yes, I have searched).
Any additional data on bandwidth usage in existing MMORPGs
would be interesting to know as well. The bandwidth requirements of
the client is usually well known (as stated on the box) but the actual
usage and/or cost for the game operator would be interesting to know.
I recall a statement from Verant saying the bandwidth cost was somewhere
in the range of 35-50% of their cost of operations - don't know where I
saw it, sorry (I have tried to find it again but failed).
For reference I submit this text to the MUDdev mailing list
and at the same time put it up on my private (hobby) project site,
located at
http://www.abc.se/~m10383/Haven/Highlights_Technical/P2P.html
. Please note that the project is not intended for widespread
consumption, primarily because some pages may contain less than friendly
views of actors, events and habits in the gaming industry - I'm in
the process of cleaning this up. At present I only use this document
collection to easily supply friends and colleagues with data when discussing.
In
this Slashdot post
, two similar implementations are mentioned. One is
BitTorrent
, an open implementation (apparently in Python) of a very similar scheme.
BitTorrent seems not to bother with NAT bridging (I have not researched it
enough to see what it does with firewalls). The other is
FurthurNet
, a Java application. Both are basically the same scheme as outlined in
this paper. Both have history reaching back to before my posting of this
paper but the slashdot post is the first mention of them that I have seen.
Further Discussion
This section provides some further discussion on the topic.
Profitability Analysis
The factors for determining profitability of implementing/using
the scheme outlined above.
- Cost per Gb of client downloads (C)
- Size of download per day per client (d)
- Bandwidth demand Gb for pairing service synchronization per
client (s)
- Number of clients (n)
- Cost for implementing/operating pairing service (I) (one-time
and continuous cost bundled into one per day number)
- Cost for server hardware acquisition (H) (one-time and/or
continuous cost bundled into one per day number)
Gives inequality:
n * C * d <=> (n * C * s) + (I * H)
The pairing setup must have a backup system of ordinary server
download. One of the reasons is that it cannot be taken for granted
that there will be clients with server capabilities available at all
times.
- Fraction of clients usable as P2P server, quality of said
clients and hit-rate for download files on said clients are intanglibles
- Fraction of successfully completed client-to-client downloads
seen totally across the system (k)
- Cost of overhead for missed pairing attempts (m) (additional
cost due to increased processing demands, one-time and/or continuous cost
bundled into one per day number)
Modified inequality right side:
((n * C * s) + (I * H)) * (k * n) + ((n * C * d) + ((1
- k) * m)) * ((1 - k) * n)
or
(nCs + IH) * kn + (nCd + (m - km))(n - kn)
which means we are comparing
nCd <=> (
knCs) + (Cd
n + (m -
km))(
n -
k
n ) + IH
(note: this looks boggled, must verify when less tired :-P)
Of these factors, all except number of clients (n) and success
rate (k) are best seen as constants. To do: Create a spreadsheet or other
visual representation of how the scheme performs cost-wise with varying
k and n.
Relevant (but not directly affecting equation) data
- Network traffic volume for game updates (game events, not
patches), per client and time played, inbound and outbound
Security Issues
A number of steps must be taken to ensure integrity of the
client's system
- The client must use random ports for downloading
- The protocol (specifically the signal to open a connection
of some sort) must choose random ports after using the port assigned
by the pairing service
- [security as related to protocols? UDP vs TCP etc - encryption
is not the issue here, basic topology issues is]
A number of steps must be taken to reduce risk for compromised
data
- The exact data being distributed must not be selectable by
the client providing download
- As far as possible, the client providing data must be unaware
of the exact events and the identity of the downloader. The actual IP
address/port will be available to any network analysis program (port
watcher) - this cannot be hidden.
- Where possible, the client should not be running with priviliged
access. Its ports are openly visible to any peer and its binary is available
for analysis (both peers can be assumed to be running or have access to
the same client - note that a malicious peer must be assumed to have access
to the binary and be capable of mimicking or otherwise replacing/tampering
with it. Assume the worst.). This means that things like buffer overruns
can be exploited. If the client software is running with privileged access
to the local system, the entire system is compromised.
Client Cache
The construction of the client cache (the data section of the
client installation, minus any mandatory files) should have certain
characteristics
- Data should if at all possible reside in composite files,
making tampered or corrupted data easier to spot (this simplifies spotting
some bad data and makes tampering harder. It does not guarantee anything)
Improving the Design, Getting Inspiration
Initial feedback from the MUD dev list confirmed one of my fears, that
the most serious-looking obstacles might prove to be security and network
topology, namely the opening of a host:port that cannot be hidden from the
other party in the communication, and the problems that arise from firewalls
and NAT.
To seek a solution or improvement for this situation I consulted the real
P2P world. Gnutella is a protocol used by many P2P file-sharing applications.
It is completely server-less and works in many awkward topologies. It can
still be blocked out of course, and cannot alter fundamental truths about
open ports. Nevertheless, the
Gnutella protocol
is interesting reading. The Gnucleus site -
http://www.gnucleus.com/
- has a lot of information regarding various aspects of Gnutella. Of
course, so does the Gnutella site
, http://www.gnutella.com/
.
Among the interesting solutions in the Gnutella protocol is the push,
which enables servers behind firewalls to share files - instead of clients
connecting to them, clients request that the server connect to them
and then transfer the payload. The solution does not work if both parties
are behind firewalls. Gnutella and similar services apprently have no problem
with NATs.
Refinement Discussion
Some points about how the system should be brought to use
- There must be an initial patch service to allow the client
to make critical updates needed before the system can be entered. Any
updates necessary may or may not be assigned to P2P servers as usual,
but the critical minimum installation must be verified before anything
else.
- The client must request an initial verification before entering
the game context. For example, before a character currently logged
off in city A can enter the game (perhaps even before the player can
be selected from a login screen) the appropriate data files must be
updated. The exact extent of the update is up to the application - it
may be that only wireframe models at the lowest LOD are needed, or the
complete data set including the most detailed textures may be required.
Some sort of estimate of the character's progress potential when moving
around the world must be weighted in.
- As dynamic patching occurs in the game, the player should
sometimes be made aware of this (or have the opportunity to see such
notifications). For example, if players A and B are initiating trade,
the respective clients should verify that each have installed the appropriate
textures for items that may in a moment appear in the trade window
for inspection. If player A immediately throws up his very fine Box
of Al Kabulla (with a custom model, special animations and textures),
player B should see some predefined lesser approximation until the appropriate
data have been downloaded. While the box's data is being patched, player
B should have the option of seeing that the data transfer is in progress.
- The system may wish to maintain a least recently used list,
and remove data that has not been needed for a long time. A player
with a character that left the frozen wastes months ago has no interest
in megabytes of snow textures clogging up the disk - assuming of course
that the same player does not have a second character still remaining
in the snow.
Additional Notes
The MUD-dev mailing list can be found
here (http://www.kanga.nu/lists/listinfo/mud-dev/)
. The archives are
here (http://www.kanga.nu/archives/MUD-Dev-L/)
. The original post is
here (http://www.kanga.nu/archives/MUD-Dev-L/2002Q1/msg00556.php)
.
Additional Resources
- OpenP2P
, O'Reilly's center for P2P technology coverage.
- The P2P working group p2pwg.org
seems to be unavailable.
- Networking Magazine, Feb 6 2002 article:
Emerging Technology: Peer-To-Peer Networking Security
.
- "Peer-to-Peer - Harnessing the Power of Disruptive Technologies"
, by Andy Oram (ed.), O'Reilly 2001, ISBN 0-596-00110-X.
Sample chapter and outline
.
- Groove Technologies
, Groove Tools. Includes a
Groove Development Kit
(
download
).
- O'Reilly held a P2P and web services conference in November 2001,
" Inventing the Post-Web World
- The O'Reilly Peer-to-Peer and
Web Services Conference
"
- Peer-To-Peer
Communications in a World of NATs and Firewalls
(pdf), Intel paper on P2P
- AVES
(Address Virtualization Enabling Service), a research project at Carnegie-Mellon
University.
-
Short term NAT requirements for UDP based peer-to-peer applications
, IETF internet draft (C. Huitema, Microsoft).
- DMOZ
directory entry for NAT
.
- Google
directoy entry for NAT
.
-
How Do Peers Connect
,
How Peers Connect, Continued
, articles in BYTE Magazine, Nov-Dec 2000, Jon Udell.
- NAT security
at SecurityDogs.com
.
- NetCallback
, a firewall bridging application.
- Internet Firewall
FAQ
(copy at interhack
)
- Koblas, D., "SOCKS", Proceedings: 1992 Usenix Security Symposium
- RFC
1928: SOCKS Protocol Version 5
- RFC
2775: Internet Transparency
- RFC
3027: Protocol Complications with the IP Network Address Translator
, Chapter
2: Common Characteristics of Protocols Broken by NAT
,
Section 5.1: Protocols designed explicitly to work with NAT enroute:
Activision Games
.
-
RFC 3080: The Blocks Extensible Exchange Protocol Core
. BEEP (The Blocks Extensible Exchange Protocol), IETF's BEEP working group.
BEEP is a proposal for a standard protocol enabling peers to exchange arbitrary
data including MIME content.
-
RCF 3081: Mapping the BEEP Core onto TCP
- RFC
3235: Network Address Translator (NAT)-Friendly Application Design
Guidelines
.
Thanks
Thanks to friends and colleagues who helped with this concept.
Copyright
This text was written in its entirety by Olof Ekström. For more information
about the author of this page, see Olof Ekström's personal information
in the Project Profiles
document.
Copyright © 2002 Olof Ekström/Extro System. All rights reserved.
Bälinge/Uppsala, Sweden, February-March 2002