# "Proper" Networking Help Needed



## jampott (Sep 6, 2003)

One for you boffins, essentially.

I have two servers (WinNT, Compaq boxes) set up on a 172.x.x.x network.

Each also has a teamed pair of NICs on a VLAN on a 192.x.x.x network, connected through an HP ProCurve switch. The other "device" on this VLAN is a telephony ACD switch (this is a CTI platform).

One server is functioning fine, but the other is showing erratic "heartbeat" failures on the ACD Network Team.

I've had the engineers look at the ACD itself and the server chaps look at the configuration and hardware of the server, and I've also ruled out the HP switch by plugging everything into a hub instead.

However, I'm still getting the same problem. The event viewer has errors virtually every minute, along the lines of:

Event Type: Warning

Event Source: CPQTeamMP

Event Category: None

Event ID: 386

Date: 05/05/2004

Time: 08:39:01

User: N/A

Computer: GOLBAT

Description:

ACD Team: PROBLEM: A Failover occurred: The Primary Network Link is not receiving. ACTION: Please check your cabling or switch port status, or run diagnostics to test card. Also make sure all teamed NICs are on the same network.

Data:

0000: 00 00 00 00 02 00 58 00 ......X.

0008: 00 00 00 00 82 01 04 80 ........

0010: 00 00 00 00 00 00 00 00 ........

0018: 00 00 00 00 00 00 00 00 ........

0020: 00 00 00 00 00 00 00 00 ........

Any ideas??????
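For anyone wanting to see what the Data bytes above actually contain, here is a minimal Python sketch that decodes them. It assumes (not confirmed anywhere in this thread) that the driver logged the event with the standard Windows kernel error-log header, `IO_ERROR_LOG_PACKET`, stored little-endian. Under that assumption the 32-bit value at offset 0x0C (`82 01 04 80`) is the event's error code, 0x80040182: its low 16 bits are 0x182 = 386, matching the Event ID, and its top two bits give severity 2, a warning.

```python
import struct

# Hex bytes copied from the "Data:" section of the CPQTeamMP event above.
DUMP_ROWS = [
    "00 00 00 00 02 00 58 00",
    "00 00 00 00 82 01 04 80",
    "00 00 00 00 00 00 00 00",
    "00 00 00 00 00 00 00 00",
    "00 00 00 00 00 00 00 00",
]

SEVERITY_NAMES = {0: "Success", 1: "Informational", 2: "Warning", 3: "Error"}

# Assumed layout of the fixed IO_ERROR_LOG_PACKET header, little-endian:
# MajorFunctionCode (B), RetryCount (B), DumpDataSize (H), NumberOfStrings (H),
# StringOffset (H), EventCategory (H), 2 padding bytes, ErrorCode (I),
# UniqueErrorValue (I), FinalStatus (I), SequenceNumber (I), IoControlCode (I),
# DeviceOffset (q). 40 bytes in total.
HEADER_FORMAT = "<BBHHHH2xIIIIIq"


def decode_error_log_header(data: bytes) -> dict:
    """Unpack the fixed header and split ErrorCode into its bit fields."""
    (_major, _retry, dump_size, n_strings, str_offset, category,
     error_code, _unique, _final, _seq, _ioctl, _offset) = struct.unpack_from(
        HEADER_FORMAT, data)
    return {
        "DumpDataSize": dump_size,
        "NumberOfStrings": n_strings,
        "StringOffset": str_offset,
        "EventCategory": category,
        "ErrorCode": error_code,
        "Severity": SEVERITY_NAMES[error_code >> 30],  # top two bits
        "Facility": (error_code >> 16) & 0x0FFF,        # bits 16-27
        "EventID": error_code & 0xFFFF,                 # low 16 bits
    }


header = decode_error_log_header(bytes.fromhex(" ".join(DUMP_ROWS)))
print(header["EventID"], header["Severity"], hex(header["ErrorCode"]))
# prints: 386 Warning 0x80040182
```

If the decoded Event ID and severity line up with what Event Viewer shows, as they do here, that is a good hint the header assumption holds for this event.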


----------



## Rogue (Jun 15, 2003)

What link speed and duplex is the card set to, mate?
We've had problems here with network cards set to auto-negotiate.
Manually setting them to 100 Mbit/s full duplex cures a lot of connection problems.

Also, you could try upgrading the network card drivers on the server that is causing problems (or both servers if they are the same type of cards).

Rogue


----------



## KevinST (May 6, 2002)

Be careful when setting devices to full duplex... if the other end of the bit of wire is a hub (can't do full duplex... ever) or is set to auto-negotiate, then the link will have a duplex mismatch (half duplex talking to full duplex... resulting in late collisions and CRC errors).
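The duplex warning above can be made concrete with a toy model of how each end settles on a duplex setting. This is a minimal Python sketch, simplified on purpose: it assumes both ends can do full duplex, treats a hub as a port stuck at half duplex, and encodes the standard behaviour that an auto-negotiating port facing a port that doesn't negotiate can detect the speed (parallel detection) but not the duplex, so it falls back to half duplex. The function names are hypothetical.

```python
from typing import Tuple

# Hypothetical toy model: each end of a link is "auto", "full" or "half"
# (forced), or "hub". A hub is treated as a port stuck at half duplex.


def _normalise(setting: str) -> str:
    return "half" if setting == "hub" else setting


def resolve_duplex(a: str, b: str) -> Tuple[str, str]:
    """Return the duplex each end actually runs at (simplified)."""
    a, b = _normalise(a), _normalise(b)
    if a == "auto" and b == "auto":
        # Both advertise their abilities; assume both can do full duplex.
        return ("full", "full")
    # An auto-negotiating port facing a port that doesn't negotiate can
    # detect the speed but not the duplex, so it falls back to half duplex.
    # Forced ports keep whatever they were set to.
    end_a = "half" if a == "auto" else a
    end_b = "half" if b == "auto" else b
    return (end_a, end_b)


def is_duplex_mismatch(a: str, b: str) -> bool:
    end_a, end_b = resolve_duplex(a, b)
    return end_a != end_b


for server, far_end in [("full", "hub"), ("full", "auto"),
                        ("full", "full"), ("auto", "auto")]:
    print(server, "vs", far_end, "->", resolve_duplex(server, far_end),
          "MISMATCH" if is_duplex_mismatch(server, far_end) else "ok")
```

In this model, forcing the server to full duplex is only safe when the switch port is forced to full duplex as well; forcing one end alone recreates the mismatch.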

Obvious stuff...
You've checked that the subnet masks of both NICs on the 172 subnet are the same?
Have you tried doing a ping -t from one machine to the other, leaving it for about 30 minutes and seeing how many pings didn't get through?
Is it possible that there's a cable fault somewhere that's causing intermittent loss of link?
Are there any other devices on this subnet / VLAN that could be causing a broadcast storm and swamping the link?
Could there be a faulty NIC at one end or the other? Does the NIC have any diagnostics?
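To put a number on the 30-minute ping -t test, the output can be saved to a file and tallied afterwards. This is a minimal Python sketch; the helper name and file name are hypothetical, and it assumes English-language Windows ping output, counting only "Reply from ...: bytes=..." lines as successes and "Request timed out." lines as losses (other messages, such as "Destination host unreachable", are ignored).

```python
import re

# Hypothetical helper for output captured from Windows ping, e.g.
#   ping -t <other-server> > ping.log   (stop it with Ctrl+C after ~30 min)
# Assumes English-language output; only these two line forms are counted.
REPLY = re.compile(r"^Reply from \S+: bytes=\d+")
TIMEOUT = re.compile(r"^Request timed out\.")


def ping_loss(lines):
    """Return (replies, timeouts, loss_percent) for ping output lines."""
    replies = timeouts = 0
    for line in lines:
        line = line.strip()
        if REPLY.match(line):
            replies += 1
        elif TIMEOUT.match(line):
            timeouts += 1
    total = replies + timeouts
    loss = 100.0 * timeouts / total if total else 0.0
    return replies, timeouts, loss
```

Usage would be something like `with open("ping.log") as log: print(ping_loss(log))`; even a fraction of a percent of loss between two boxes on the same VLAN is worth chasing.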


----------



## whirlypig (Feb 20, 2003)

I've certainly had issues in the past with teamed cards set to auto, but generally with older drivers and when connected through a switch. The Intel PROSet or CPQ Team applet should give you a summary of I/O through the NIC; are there any errors while it's actually running?
What NICs are you using? I'd guess that if they're Compaq servers then they're Intel 8255x chipset (CPQ NetFlex, CPQ Netelligent, Intel Pro 100, etc.).

I've seen quite a few problems where heartbeat issues cause failover and poor performance on either the primary or secondary NIC. It's sometimes highlighted by odd errors in the traffic status and is usually down to an intermittent fault on one of the cards. The first thing I'd try is a new pair of NICs, presuming you've got some; if not, I've a pile of Intel PRO/100s, a fiver each.

BTW, for those who may not know, I believe the data in the event log is just the MAC address of the NIC backwards; likewise, if it were a TCP/IP issue it would be the IP address backwards and in hex.
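If an address is stored in the event data backwards, as suggested above, reading it back is just a byte reversal. This is a minimal Python sketch; the helper names and the sample bytes are made up for illustration and do not come from the event in this thread.

```python
def _reversed_bytes(hex_text: str, expected_len: int) -> bytes:
    raw = bytes.fromhex(hex_text)  # spaces between bytes are allowed
    if len(raw) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(raw)}")
    return raw[::-1]


def reversed_hex_to_mac(hex_text: str) -> str:
    """Six bytes stored last-byte-first -> MAC address string."""
    return ":".join(f"{b:02X}" for b in _reversed_bytes(hex_text, 6))


def reversed_hex_to_ipv4(hex_text: str) -> str:
    """Four bytes stored last-byte-first -> dotted-quad IPv4 address."""
    return ".".join(str(b) for b in _reversed_bytes(hex_text, 4))


print(reversed_hex_to_mac("55 44 33 22 11 00"))  # 00:11:22:33:44:55
print(reversed_hex_to_ipv4("0A 01 A8 C0"))       # 192.168.1.10
```

Comparing the result against the MAC shown in the CPQ Team applet (or `ipconfig /all`) is a quick way to confirm whether a given dump really holds an address.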


----------



## ADB (May 7, 2002)

I remember a problem a couple of years ago that was down to the MAC address the team was using (failover team). IIRC, in the CPQ team properties there is a place to enter the MAC address the team will use; this is usually pre-populated with the primary NIC's MAC address. If you have made any changes to the config, the primary NIC's MAC address may no longer be the one entered there. Check the configuration and make sure the team MAC address still matches the primary NIC's MAC address.

IIRC, the heartbeat packets are sent by both NICs, but the source MAC address in these packets is each NIC's BIA (burned-in address). This, coupled with _'real'_ traffic from the server, was causing the switches to see the same MAC address on multiple ports, so they kept updating their CAM tables to reflect the move, which in turn caused the team to fail.
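The CAM table effect described above can be sketched as a toy model: a switch learns each frame's source MAC against the port it arrived on and counts how often an entry has to move. This is a minimal Python sketch; the port numbers, MAC values, one-heartbeat-per-NIC-per-round pattern, and the choice of the secondary NIC's BIA as the "stale" team MAC are all assumptions for illustration, not how the Compaq driver actually schedules or formats heartbeats.

```python
from collections import namedtuple

# Toy model, not real switch or Compaq driver behaviour: a frame is reduced
# to its source MAC and the switch port it arrived on.
Frame = namedtuple("Frame", ["src_mac", "port"])


class CamTable:
    """Learns source MAC -> port and counts how often an entry moves."""

    def __init__(self):
        self.entries = {}
        self.moves = 0

    def learn(self, frame):
        old_port = self.entries.get(frame.src_mac)
        if old_port is not None and old_port != frame.port:
            self.moves += 1  # same MAC now seen on a different port
        self.entries[frame.src_mac] = frame.port


def count_moves(team_mac, primary_bia, secondary_bia, rounds=3):
    """Primary NIC on port 1, secondary NIC on port 2.

    Each round both NICs send a heartbeat sourced from their own BIA, and
    the primary carries the server's normal traffic sourced from the team MAC.
    """
    cam = CamTable()
    for _ in range(rounds):
        cam.learn(Frame(primary_bia, 1))    # heartbeat from primary NIC
        cam.learn(Frame(secondary_bia, 2))  # heartbeat from secondary NIC
        cam.learn(Frame(team_mac, 1))       # 'real' traffic via primary
    return cam.moves


PRIMARY = "00:00:00:00:00:01"    # hypothetical BIAs
SECONDARY = "00:00:00:00:00:02"
print(count_moves(PRIMARY, PRIMARY, SECONDARY))    # 0: team MAC = primary BIA
print(count_moves(SECONDARY, PRIMARY, SECONDARY))  # 5: stale team MAC flaps
```

With the team MAC equal to the primary NIC's BIA the table never changes; with a stale team MAC the same entry bounces between ports twice per round, which is the kind of churn that can upset the team's heartbeat checks.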

Let us know how you get on.

Andy


----------

