
List:       openmosix-general
Subject:    Re: [openMosix-general] Migration problem.
From:       vmorgo <vmorgo () alabang ! dyndns ! org>
Date:       2003-03-22 2:25:48

Are you running all systems with the same kernel version 2.4.20, or are
some nodes running 2.4.18 and others 2.4.20?

I don't think you are supposed to mix kernel versions within the same
cluster.  Also, I once tried splitting my cluster into two smaller
clusters without changing the IP addresses, i.e. both clusters had
IPs in the same address range.  They were supposed to be separated only
by differences in the mosix.map files, i.e. one listed the IP addresses
of three machines and the other listed the IPs of the other three, but
all the IPs were in the 192.168.1.x network.

Needless to say, it did not work very well at all.  

A.

P.S.  I'm running 2.4.20-2 on a machine with hardware RAID (3Ware
Escalade Raid 7450) and have not had any problems.  Of course, none of
my nodes has more than 1 GB of RAM, so perhaps that is why.

On Fri, 2003-03-21 at 08:45, mosixview@t-online.de wrote:
> Hi Jozef,
> 
> On Friday, 21 March 2003 12:52, Jozef Ivanecky wrote:
> > Moshe,
> >
> > I have 6 dual 1.7 GHz Xeon IBM servers, with 100 Mbit Ethernet
> > dedicated to the cluster.  Node #1 has 3 GB, the others just 2 GB RAM.
> 
> The large amount of RAM (>2 GB) might be the problem.
> You may want to try the following patch from Bryan Bayerdorffer
> (apply it to the current openMosix kernel sources from CVS).
> It fixes the bad memory values reported for systems with more than
> 2 GB of RAM.  I am currently testing it here, and so far it runs stably.
> 
> ##################################################################
> 
> diff -Naur --exclude=CVS linux-2.4.20-cfnode.save/hpc/hpcproc.c linux-2.4.20-cfnode/hpc/hpcproc.c
> --- linux-2.4.20-cfnode.save/hpc/hpcproc.c      2003-02-10 16:40:44.000000000 -0600
> +++ linux-2.4.20-cfnode/hpc/hpcproc.c   2003-03-18 12:58:30.000000000 -0600
> @@ -1186,9 +1186,10 @@
>               void *buf, size_t *len)
>  {
>         int i, end;
> -       int val, total = 0;
> +       MOSIX_MEMSIZE_T val;
> +       int total = 0;
>  
> -       end = (*len / sizeof(int)) + filp->f_pos;
> +       end = (*len / sizeof(MOSIX_MEMSIZE_T)) + filp->f_pos;
>         if(end > (i = mosix_config_get_limit()))
>                 end = i;
>         for (i = filp->f_pos + 1; i <= end; i++) {
> @@ -1221,11 +1222,11 @@
>                         mosix_panic("ctl_info_fill - bad name");
>                         return (-EDIST);
>                 }
> -               if (copy_to_user(buf, (char *) &val, sizeof(int)))
> +               if (copy_to_user(buf, (char *) &val, sizeof(MOSIX_MEMSIZE_T)))
>                         return (-EFAULT);
>                 filp->f_pos++;
> -               buf += sizeof(int);
> -               total += sizeof(int);
> +               buf += sizeof(MOSIX_MEMSIZE_T);
> +               total += sizeof(MOSIX_MEMSIZE_T);
>         }
>         *len = total;
>         return (0);
> @@ -1685,11 +1686,11 @@
>                                                              pid_or_node, -ENXIO, -ENETUNREACH));
>                 goto copy;
>         case PROC_NODEID_MEM:
> -               result = sprintf(page, "%d\n", (int)get_item(mem,
> +               result = sprintf(page, "%llu\n", (uint64_t)get_item(mem,
>                                                              pid_or_node, -ENXIO, -ENETUNREACH));
>                 goto copy;
>         case PROC_NODEID_RMEM:
> -               result = sprintf(page, "%d\n", (int)get_item(rmem,
> +               result = sprintf(page, "%llu\n", (uint64_t)get_item(rmem,
>                                                              pid_or_node, -ENXIO, -ENETUNREACH));
>                 goto copy;
>         case PROC_NODEID_SPEED:
> @@ -1697,7 +1698,7 @@
>                                                              pid_or_node, -ENXIO, -ENETUNREACH));
>                 goto copy;
>         case PROC_NODEID_TMEM:
> -               result = sprintf(page, "%d\n", (int)get_item(tmem,
> +               result = sprintf(page, "%llu\n", (uint64_t)get_item(tmem,
>                                                              pid_or_node, -ENXIO, -ENETUNREACH));
>                 goto copy;
>         case PROC_NODEID_STATUS:
> diff -Naur --exclude=CVS linux-2.4.20-cfnode.save/include/hpc/hpcctl.h linux-2.4.20-cfnode/include/hpc/hpcctl.h
> --- linux-2.4.20-cfnode.save/include/hpc/hpcctl.h       2003-02-10 16:38:35.000000000 -0600
> +++ linux-2.4.20-cfnode/include/hpc/hpcctl.h    2003-03-18 10:27:16.000000000 -0600
> @@ -10,6 +10,8 @@
>  #ifndef _MOS_MOSCTL_H
>  #define _MOS_MOSCTL_H
>  
> +#include <linux/types.h>
> +#include <linux/autoconf.h>
>  /*
>   * MOSIX API
>   */
> @@ -58,15 +60,21 @@
>  
>  /* load information */
>  
> +#ifdef CONFIG_HIGHMEM
> +#define MOSIX_MEMSIZE_T uint64_t
> +#else
> +#define MOSIX_MEMSIZE_T unsigned long
> +#endif
> +
>  struct mosix_info {
>          unsigned long load;
>          unsigned short speed;
>          unsigned short ncpus;
>          unsigned short util;
>          unsigned short status;
> -        unsigned int mem;
> -        unsigned int rmem;
> -        unsigned int tmem;
> +        MOSIX_MEMSIZE_T mem;
> +        MOSIX_MEMSIZE_T rmem;
> +        MOSIX_MEMSIZE_T tmem;
>  };
>  
>  #endif
> 
> ##################################################################
> 
> hope it helps,
> 
> Matt
-- 
vmorgo <vmorgo@alabang.dyndns.org>



_______________________________________________
openMosix-general mailing list
openMosix-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/openmosix-general