
kalantir
Veteran

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

31 Jan 2009, 1:27 am

I can't really find any alternate methods that seem better to me. What exactly is the drawback of using linked lists again? Actually... I just now got an idea: what if I had a separate (smaller) hash table to handle the collisions? Obviously I'd have to use a different algorithm for the other hash table, but I think sdbm (or whatever it was) was pretty close to as good as rs with my test strings. I think if I did this, I could get it so that it rarely needs more than 2-3 iterations to find the player.
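
Something like this is what I have in mind. Just a rough sketch: the rs and sdbm hashes are the usual ones (e.g. from Arash Partow's hash-function survey), and the table sizes are placeholders.

[code]
#include <cstddef>
#include <string>
#include <vector>

struct Player { std::string name; /* other player data */ };

// The usual "rs" (Robert Sedgewick) and "sdbm" string hashes.
unsigned int rs_hash(const std::string& s)
{
    unsigned int a = 63689, b = 378551, hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        hash = hash * a + (unsigned char)s[i];
        a *= b;
    }
    return hash;
}

unsigned int sdbm_hash(const std::string& s)
{
    unsigned int hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = (unsigned char)s[i] + (hash << 6) + (hash << 16) - hash;
    return hash;
}

// Primary table hashed with rs; collisions spill into a half-size
// secondary table hashed with sdbm. A double collision still needs
// some further fallback (a third level, or a short chain).
class TwoLevelTable
{
    std::vector<Player*> primary;    // pointers default to null
    std::vector<Player*> secondary;

public:
    TwoLevelTable() : primary(1 << 16), secondary(1 << 15) {}

    bool add(Player* p)
    {
        Player*& slot1 = primary[rs_hash(p->name) & 0xFFFF];
        if (!slot1) { slot1 = p; return true; }
        Player*& slot2 = secondary[sdbm_hash(p->name) & 0x7FFF];
        if (!slot2) { slot2 = p; return true; }
        return false;                  // double collision
    }

    Player* find(const std::string& name)
    {
        Player* p = primary[rs_hash(name) & 0xFFFF];
        if (p && p->name == name) return p;
        p = secondary[sdbm_hash(name) & 0x7FFF];
        if (p && p->name == name) return p;
        return NULL;                   // absent (or lost to a double collision)
    }
}; 
[/code]

The point of the second algorithm is that two strings which collide in the first table will usually land in different slots of the second.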


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


Crossbreed
Butterfly

Joined: 15 Jan 2009
Gender: Male
Posts: 9
Location: Northern Wisconsin, USA

31 Jan 2009, 2:32 am

There are parts of your problem I don't understand, but I see a possible solution. :idea:

Instead of inserting and deleting users as they come and go, why don't you just insert and delete them as they subscribe/unsubscribe? Then just change their "present/absent" status as they log on or off. That way you could use your binary search and only act on the found user if his/her logon status is "present".
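
Roughly what I mean, as a sketch (Roster and find_online are just names I made up):

[code]
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

struct Player {
    std::string name;
    bool online;          // the "present/absent" status
    // other player data
};

inline bool by_name(const Player& p, const std::string& n)
{
    return p.name < n;
}

// All subscribed players, kept sorted by name. Insert/erase happen only
// on subscribe/unsubscribe, so the O(n) vector insert is paid rarely;
// every logon/logoff just flips the flag found by binary search.
class Roster
{
    std::vector<Player> players;

public:
    void subscribe(const std::string& name)
    {
        Player p;
        p.name = name;
        p.online = false;
        players.insert(std::lower_bound(players.begin(), players.end(),
                                        name, by_name), p);
    }

    // Binary search: O(log n) whether or not the user is logged on.
    Player* find_online(const std::string& name)
    {
        std::vector<Player>::iterator it =
            std::lower_bound(players.begin(), players.end(), name, by_name);
        if (it != players.end() && it->name == name && it->online)
            return &*it;
        return NULL;
    }
};
[/code]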



Dussel
Veteran

Joined: 19 Jan 2009
Age: 60
Gender: Male
Posts: 1,788
Location: London (UK)

31 Jan 2009, 3:04 am

My class would look a bit like this:

[code]
class Player
{
private:
    // player data ...

    Player *pNextPlayer,
           *pBeforePlayer;

public:
    Player(/* player data, */ Player *pBeforePlayer);

    Player *MakeNext(/* player data */);

    // ...
};
[/code]

With a cover class AllPlayer:

[code]
class AllPlayer
{
private:
    Player *pCurrentPlayer,
           *pPlayerListStart;

    // ...
};
[/code]

----

You can also use the standard library's list template (std::list) - such lists are very flexible for storing varying amounts of data. You just have to be a bit careful with heap administration and pointer handling.
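
For example, a minimal sketch with a made-up Player struct:

[code]
#include <list>
#include <string>

struct Player {
    std::string name;
    // player data ...
};

int main()
{
    std::list<Player> players;      // doubly linked list, managed for you

    Player p;
    p.name = "alice";
    players.push_back(p);           // the list owns its copies, so no
    p.name = "bob";                 // manual new/delete is needed
    players.push_back(p);

    for (std::list<Player>::iterator it = players.begin();
         it != players.end(); ++it)
    {
        if (it->name == "bob") {
            players.erase(it);      // O(1) removal once the node is found
            break;
        }
    }
    return 0;
}
[/code]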



Dussel
Veteran

Joined: 19 Jan 2009
Age: 60
Gender: Male
Posts: 1,788
Location: London (UK)

31 Jan 2009, 3:12 am

kalantir wrote:
That is going to be the player on the end of the linked list for that entry. Compared to a binary search, for example, these results are incredible. A binary search would take 15 (sometimes fewer) iterations to find the object (out of 30k, anyway, because 30k is between 2^14 and 2^15). In addition, a binary search would require me to have all my objects in order, which would cause me to spend even more time sorting my player array. If I wanted to increase the efficiency of the table further (fewer than 6k collisions!), I could double the table size and (very) slightly modify the hash algorithm to give values in a wider range.


You need to sort the list dynamically during the build-up process via extra pointers. For the search, you can then approximate the entry point via a list of pointers to potential entry points, and then search along the sorted pointers. Because you will never run fully through the list, a small recursion is acceptable here and makes life easier for the programmer.



lau
Veteran

Joined: 17 Jun 2006
Age: 76
Gender: Male
Posts: 9,795
Location: Somerset UK

31 Jan 2009, 11:15 am

Dussel wrote:
kalantir wrote:
That is going to be the player on the end of the linked list for that entry. Compared to a binary search, for example, these results are incredible. A binary search would take 15 (sometimes fewer) iterations to find the object (out of 30k, anyway, because 30k is between 2^14 and 2^15). In addition, a binary search would require me to have all my objects in order, which would cause me to spend even more time sorting my player array. If I wanted to increase the efficiency of the table further (fewer than 6k collisions!), I could double the table size and (very) slightly modify the hash algorithm to give values in a wider range.


You need to sort the list dynamically during the build-up process via extra pointers. For the search, you can then approximate the entry point via a list of pointers to potential entry points, and then search along the sorted pointers. Because you will never run fully through the list, a small recursion is acceptable here and makes life easier for the programmer.

Please explain what you mean by "approximate the entry point for the search".

In any case, this does not work. It suffers from an O(n) lookup time.

For a fully compact table (no wasted space), an AVL tree will always give O(ln(n)) time.

With a reasonable hash, and a table that is not too full, the same O(ln(n)) can be achieved, but with a smaller constant, and one can approach O(1), by making the table very sparse.
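
Incidentally, for the fully compact case the standard library already provides a balanced tree: std::map is typically a red-black rather than an AVL tree, but the bound is the same. A trivial sketch:

[code]
#include <map>
#include <string>

struct Player { /* player data */ };

int main()
{
    // A balanced binary search tree: insert, find and erase are all
    // O(log n), with no table sizing or hash function to tune.
    std::map<std::string, Player*> players;

    Player p;
    players["kalantir"] = &p;

    std::map<std::string, Player*>::iterator it = players.find("kalantir");
    if (it != players.end()) {
        // found, in at most ~log2(30,000) = 15 comparisons
    }
    return 0;
}
[/code]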


_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer


Dussel
Veteran

Joined: 19 Jan 2009
Age: 60
Gender: Male
Posts: 1,788
Location: London (UK)

31 Jan 2009, 12:33 pm

lau wrote:
Dussel wrote:
You need to sort the list dynamically during the build-up process via extra pointers. For the search, you can then approximate the entry point via a list of pointers to potential entry points, and then search along the sorted pointers. Because you will never run fully through the list, a small recursion is acceptable here and makes life easier for the programmer.

Please explain what you mean by "approximate the entry point for the search".

In any case, this does not work. It suffers from an O(n) lookup time.

For a fully compact table (no wasted space), an AVL tree will always give O(ln(n)) time.

With a reasonable hash, and a table that is not too full, the same O(ln(n)) can be achieved, but with a smaller constant, and one can approach O(1), by making the table very sparse.


Consider the following code:

[code]
class Data
{
private:
    int A;
    // any other data

    Data *pNext, *pBefore,              // mere administration
         *pSearchCriteriumBefore_A,     // neighbours in the sequence
         *pSearchCriteriumNext_A;       // sorted by A

public:
    // ...
};
[/code]

A second class:

[code]
class DataPointer
{
    DataPointer *pBefore,
                *pNext;
    Data *pData;
    int A;

    // ...
};
[/code]

A third class:

[code]
class DataPointerPointer
{
    DataPointerPointer *pBefore,
                       *pNext;
    DataPointer *pDataPointer;
    int A;

    // ...
};
[/code]

And a cover class:

[code]
class DataCover
{
    DataPointer *pDataPointerStart, *pDataPointerCurrent;
    DataPointerPointer *pDataPointerPointerStart, *pDataPointerPointerCurrent;
    Data *pStart, *pCurrent;
};
[/code]

---

If a new element of type Data is added, it is appended at the end of the list starting at pStart. Then the value A is read, and a nearby element is looked up in the DataPointer list. From that element the search runs along the pSearchCriterium pointers until the new element finds its place; the pSearchCriterium chain is then opened up and the new element inserted. If the gap between the elements represented in the DataPointer list grows too large (say, beyond every 100th element), a new entry is introduced into the DataPointer list.

So you work with a list of entry points pointing to places close to the assumed right place, and you start each search at the nearest such entry point.

An example: we have a list of 1,000,000 elements and build two index levels of 100 entries each on top of it. Looking up in the list of pointers-to-pointers takes at worst 100 steps; if we know the mean, we can decide to start in the middle or at the ends, so each of the three levels needs at worst about 50 steps, roughly 150 steps in total. If we use more levels of lists-of-lists and grow them dynamically, the O(n) can be reduced further.

The idea is not that far from a search tree, and the performance is slightly worse, but I have used this occasionally to gather statistical data along the way, and recycled the structure for search purposes. It can reasonably be set up with templates without defining endless new classes (or, if I am being ugly, even with pointers to void).
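
Boiled down to one index level over std::list, the idea looks roughly like this (a sketch; IndexedList and the stride of 100 are only for illustration):

[code]
#include <cstddef>
#include <list>

// One index level over a sorted std::list<int>: every 100th element
// becomes an "entry point", so a search scans at most ~n/100 index
// entries plus at most ~100 data elements, instead of the whole list.
class IndexedList
{
    typedef std::list<int>::iterator Iter;

    std::list<int>  data;     // kept sorted ascending
    std::list<Iter> index;    // entry points into data
    enum { STRIDE = 100 };

    // Last entry point whose value is <= x (or begin() if none).
    Iter entry_point(int x)
    {
        Iter best = data.begin();
        for (std::list<Iter>::iterator it = index.begin();
             it != index.end() && **it <= x; ++it)
            best = *it;
        return best;
    }

    void rebuild_index()      // crude: re-pick every STRIDE-th node
    {
        index.clear();
        std::size_t i = 0;
        for (Iter it = data.begin(); it != data.end(); ++it, ++i)
            if (i % STRIDE == 0)
                index.push_back(it);
    }

public:
    void insert(int x)
    {
        Iter pos = entry_point(x);
        while (pos != data.end() && *pos < x) ++pos;
        data.insert(pos, x);  // list iterators in the index stay valid
        if (data.size() % STRIDE == 0)
            rebuild_index();
    }

    bool contains(int x)
    {
        Iter pos = entry_point(x);
        while (pos != data.end() && *pos < x) ++pos;
        return pos != data.end() && *pos == x;
    }
};
[/code]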



lau
Veteran

Joined: 17 Jun 2006
Age: 76
Gender: Male
Posts: 9,795
Location: Somerset UK

31 Jan 2009, 1:28 pm

I'm unable to follow why you would want to replace a simple binary tree of 1,000,000 items, which may be searched in a maximum of 20 comparisons, with some linear search that you say takes "worst case 100 steps". I strongly doubt that figure, as you have given no indication of what the distribution of your "int A" might be. What if the values presented for A were the numbers 0...999,999?

The problem under discussion was 30,000 strings to be looked up, not 1,000,000 ints.

Assuming that a good hash function on the strings gives a result in the range 0..65,535 for instance, that is then directly used to pick up a pointer to what will frequently be the correct item. One can make the number of items searched get arbitrarily close to unity.

Everything I said so far has been straight from memory - and having done quite a lot of sorting and searching. Personally, I even like "Bubble Sort"... in its place.

The Wikipedia article has more detail
http://en.wikipedia.org/wiki/Hash_table

As is often the case (see my Bubble Sort comment above), people shy away from techniques too quickly at times. E.g. the graph there shows that mere linear probing keeps to an average of a single extra probe (i.e., instead of merely finding the item, it averages having to look at one other, per lookup) up until a load factor of nearly 80%.

So, for 30,000 items, a hash table of 40,000 spaces, and then a linear probe, takes an average of two comparisons.

For 1,000,000 items, a hash table of 1,300,000 spaces will similarly average about two comparisons.
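
To make that concrete, insertion and lookup with linear probing are only a few lines. A sketch (the Player struct and hash are placeholders):

[code]
#include <cstddef>
#include <string>
#include <vector>

struct Player { std::string name; /* ... */ };

// Any reasonable string hash will do here; sdbm, say.
std::size_t hash_str(const std::string& s)
{
    std::size_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        h = (unsigned char)s[i] + (h << 6) + (h << 16) - h;
    return h;
}

// Open addressing with linear probing: on a collision, try the next
// slot. With 30,000 items in 40,000 slots (load factor 0.75), lookups
// average about two comparisons. (Deletion needs tombstones; omitted.)
class ProbingTable
{
    std::vector<Player*> slots;

public:
    explicit ProbingTable(std::size_t size) : slots(size) {}

    bool add(Player* p)
    {
        std::size_t start = hash_str(p->name) % slots.size();
        for (std::size_t n = 0; n < slots.size(); ++n) {
            Player*& slot = slots[(start + n) % slots.size()];
            if (!slot) { slot = p; return true; }
        }
        return false;                        // table completely full
    }

    Player* find(const std::string& name)
    {
        std::size_t start = hash_str(name) % slots.size();
        for (std::size_t n = 0; n < slots.size(); ++n) {
            Player* p = slots[(start + n) % slots.size()];
            if (!p) return NULL;             // empty slot: not present
            if (p->name == name) return p;
        }
        return NULL;
    }
};
[/code]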

=============

PS. If you wish to present code, please wrap it up with [code] [/code] tags, as the indentation (and other white space) will be preserved.


_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer


kalantir
Veteran

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

12 Feb 2009, 10:07 pm

OK, so I've decided that for now I'm going to stick with the linked list to handle my collisions. If it proves to be a problem for some reason or another, I'm going to set it up so that a collision triggers the creation of a new hash table half the size of the previous one. For example: my main table is going to be of size 2^16. If a collision happens (which it will), a new table gets created of size 2^15 and the collision goes in there, using a different algorithm to hash it. If a collision happens in that table, a new hash table gets created of size 2^14 and it goes in there with yet another algorithm (we'll refer to these as levels of collision; this would be the second level). I know of at least 8 different algorithms which are efficient and give fairly good spread. If I end up with so many collisions that I get down to a table of size 2^9, then all further levels will just stay at 2^9. Tell me what you think of this idea. I haven't seen any articles describing a method like this before; I got the idea from double hashing and just took it a step further.

Also, I currently have it set up as 2 tables. I found out that (at least with epoll) file descriptors get assigned starting at 5, incrementing by 1 for each new connection, and reusing old FDs that aren't connected any more first. So I created another table so I can look up players by FD, which I can do in O(1). Basically, when I call PlayerTable.Add(Player); it hashes the player's name, sticks them in the main table, then takes their FD and sets fdTable[FD] to point to the same location. This way I can look them up by FD, and I only need to look them up by name if that's the only information about them I have available.
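
Roughly what the two-index setup looks like - a simplified sketch, with a chained name table standing in for the collision levels described above, and rs_hash as earlier in the thread:

[code]
#include <cstddef>
#include <string>
#include <vector>

struct Player {
    std::string name;
    int         fd;        // socket file descriptor
    Player*     next;      // collision chain in the name table
};

// Any of the hashes discussed above; rs, say.
unsigned int rs_hash(const std::string& s)
{
    unsigned int a = 63689, hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        hash = hash * a + (unsigned char)s[i];
        a *= 378551;
    }
    return hash;
}

class PlayerTable
{
    std::vector<Player*> nameTable;   // hashed by name, chained on collision
    std::vector<Player*> fdTable;     // indexed directly by fd

public:
    PlayerTable() : nameTable(1 << 16), fdTable(1024) {}

    void Add(Player* p)
    {
        unsigned int i = rs_hash(p->name) & 0xFFFF;
        p->next = nameTable[i];               // push onto the chain
        nameTable[i] = p;

        if ((std::size_t)p->fd >= fdTable.size())
            fdTable.resize(p->fd + 1);        // fds grow one at a time anyway
        fdTable[p->fd] = p;                   // same player, second index
    }

    Player* ByFd(int fd)                      // the O(1) path
    {
        return (std::size_t)fd < fdTable.size() ? fdTable[fd] : NULL;
    }

    Player* ByName(const std::string& name)   // hash, then walk the chain
    {
        for (Player* p = nameTable[rs_hash(name) & 0xFFFF]; p; p = p->next)
            if (p->name == name) return p;
        return NULL;
    }
};
[/code]

The fd array works because the kernel always hands out the lowest free descriptor, so the array stays dense.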

Quote:
Don't be afraid to look outside your horizons. You said there was no way you were going to have a windows server. Why not?

I never did answer this. The reasons are: 1. I find networking on Linux easier; and, the main reason, 2. Linux has epoll, while Windows is stuck using poll or select, which are horrible by comparison when handling large numbers of file descriptors, as you can see in this benchmark image.
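
For the curious, the heart of an epoll server is roughly this (a sketch; listen_fd is assumed to be a socket that is already bound and listening):

[code]
#include <sys/epoll.h>

void event_loop(int listen_fd)
{
    int ep = epoll_create(1024);            // size is only a hint

    epoll_event ev = {};
    ev.events  = EPOLLIN;
    ev.data.fd = listen_fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

    epoll_event events[64];
    for (;;) {
        // Blocks until one or more registered fds have activity; the
        // cost scales with active fds, not total fds (unlike select).
        int n = epoll_wait(ep, events, 64, -1);
        for (int i = 0; i < n; ++i) {
            if (events[i].data.fd == listen_fd) {
                // accept() the new connection, then EPOLL_CTL_ADD its fd
            } else {
                // recv() from events[i].data.fd and handle the data
            }
        }
    }
}
[/code]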


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


t0
Veteran

Joined: 23 Mar 2008
Age: 51
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

13 Feb 2009, 9:48 pm

kalantir wrote:
2. Linux has epoll, while Windows is stuck using poll or select, which are horrible by comparison when handling large numbers of file descriptors, as you can see in this benchmark image.


Win32 uses IoCompletionPorts for efficient network processing. You don't use polling at all.



kalantir
Veteran

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

14 Feb 2009, 3:46 am

t0 wrote:
kalantir wrote:
2. Linux has epoll, while Windows is stuck using poll or select, which are horrible by comparison when handling large numbers of file descriptors, as you can see in this benchmark image.


Win32 uses IoCompletionPorts for efficient network processing. You don't use polling at all.


I don't suppose you have a link to a decent article about it I could read?


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


lau
Veteran

Joined: 17 Jun 2006
Age: 76
Gender: Male
Posts: 9,795
Location: Somerset UK

14 Feb 2009, 8:05 am

A plain Google search gives Microsoft TechNet's fairly useless overview:

http://technet.microsoft.com/en-us/sysi ... 63891.aspx

followed by a somewhat more useful article, by the look of it:

http://xania.org/200807/iocp

which gives a link to the MSDN documentation, but calls it "rubbish".

I don't see anywhere that says which version of Microsoft NT-based OS you have to buy, in order to have access to the calls. I seem to recall that each time I tried to use the more subtle parts of Microsoft APIs, it turned out that they were only available on something that cost extra.


_________________
"Striking up conversations with strangers is an autistic person's version of extreme sports." Kamran Nazeer


t0
Veteran

Joined: 23 Mar 2008
Age: 51
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

14 Feb 2009, 9:01 am

kalantir wrote:
t0 wrote:
kalantir wrote:
2. Linux has epoll, while Windows is stuck using poll or select, which are horrible by comparison when handling large numbers of file descriptors, as you can see in this benchmark image.


Win32 uses IoCompletionPorts for efficient network processing. You don't use polling at all.


I don't suppose you have a link to a decent article about it I could read?


http://msdn.microsoft.com/en-us/magazine/cc302334.aspx

It's an article but has a sample that might get you started.

IoCompletionPorts have been around since NT 3.5 (or longer) and made it into the home-user versions of Windows starting with XP. I've personally used completion ports to handle hundreds of thousands of simultaneous connections (you need multiple local IP addresses for this, since port numbers are limited to 16 bits).

I believe Microsoft's C# implementation of Socket is built on top of IoCompletionPorts as well. You can call Socket.BeginReceive() or BeginSend() to provide your buffer and a callback function. The lower-level code will execute your callback when new data is available or your data has been written.
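
The core of it is only a handful of calls. A bare sketch (error handling omitted; link with ws2_32.lib, and normally you'd run a small pool of worker threads):

[code]
#include <winsock2.h>
#include <windows.h>

struct Conn {
    OVERLAPPED ov;          // first member, so we can cast back from it
    SOCKET     s;
    char       buf[4096];
};

void post_recv(Conn* c)     // arm an asynchronous receive
{
    WSABUF wb;
    wb.len = sizeof(c->buf);
    wb.buf = c->buf;
    DWORD flags = 0;
    ZeroMemory(&c->ov, sizeof(c->ov));
    WSARecv(c->s, &wb, 1, NULL, &flags, &c->ov, NULL);
}

DWORD WINAPI worker(LPVOID arg)
{
    HANDLE iocp = (HANDLE)arg;
    for (;;) {
        DWORD       bytes;
        ULONG_PTR   key;
        OVERLAPPED* ov;
        // Blocks until some pending I/O on any associated socket completes.
        GetQueuedCompletionStatus(iocp, &bytes, &key, &ov, INFINITE);
        Conn* c = (Conn*)ov;            // ov is the first member of Conn
        if (bytes == 0) { closesocket(c->s); delete c; continue; }
        // ... process c->buf[0 .. bytes) ...
        post_recv(c);                   // re-arm for the next chunk
    }
}

void add_connection(HANDLE iocp, SOCKET s)  // call after accept()
{
    Conn* c = new Conn;
    c->s = s;
    CreateIoCompletionPort((HANDLE)s, iocp, 0, 0);  // associate with port
    post_recv(c);
}

// At startup:
//   HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
//   CreateThread(NULL, 0, worker, iocp, 0, NULL);
[/code]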



kalantir
Veteran

Joined: 25 Dec 2008
Gender: Male
Posts: 712
Location: Redmond, WA USA

14 Feb 2009, 7:29 pm

Thanks guys. I've already read some of that, but it didn't quite tell me everything I wanted to know. I very likely won't be using it, since I'm already well into my project on Linux... In fact, I'm far enough along at this point that I need to start doing more work on my client before continuing on the server. Just curious, though... do you know how efficient it is in comparison to epoll or kqueue? I've tried looking it up, but couldn't find any sort of comparison at all.


_________________
2101729 Kalantir-Bar-Orc-Mal-Cha escaped the dungeon


t0
Veteran

Joined: 23 Mar 2008
Age: 51
Gender: Male
Posts: 726
Location: The 4 Corners of the 4th Dimension

15 Feb 2009, 10:53 am

kalantir wrote:
Do you know how efficient it is in comparison to epoll or kqueue? I've tried looking it up, but couldn't find any sort of comparison at all.


I don't think they're directly comparable across different OSes. You'd have to build similar systems that process messages and then measure them on comparable hardware. Last I read, MS recommends 2 threads per processor when using completion ports, which means you have to handle synchronization issues. I would imagine you'd have similar issues if you ran multiple threads calling epoll() as well.



Blake_be_cool
Veteran

Joined: 6 May 2008
Age: 28
Gender: Male
Posts: 860
Location: Australia, NSW, Sydney

24 Feb 2009, 1:52 pm

still learning VB6


_________________
"Not everything that steps out of line, and thus 'abnormal', must necessarily be 'inferior'."
- Hans Asperger (1938)