
List:       kfm-devel
Subject:    RFC: code to determine proper TLD for a given hostname
From:       "Dawit A." <adawit () kde ! org>
Date:       2002-05-14 4:35:57

Hi,

We need code to properly determine the topmost TLD of a given URL in 
several places: cookies, SSL certificate management, per-domain 
configuration dialogs like the cookie policy and user-agent settings,
as well as Java/JavaScript configurations.  Writing such code, however, 
is not as easy as it seems because of the sheer number of variations and 
gotchas involved.  Anyway, this is my attempt to take a crack at doing 
just that.  The attached self-contained code is intended to find the topmost 
domain value given a proper fully qualified hostname, e.g. kde.org for 
www.kde.org.

Please test it to see if you get the results you expect and send me feedback.
For those that need it, here is how you compile the program:

g++ -I<qt-include-dir> -L<qt-lib-dir> -lqt-mt tld-test.cpp -o tld-test

where
<qt-include-dir>: the directory where the qt header files are located.
<qt-lib-dir>: the directory where the qt library is located.
"qt-mt" can be just "qt" in the unlikely case you compiled qt-3.x without 
thread support or are still on KDE 2.x and hence qt-2.x.
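
For a quick check without a Qt installation, the same trimming logic can be
sketched in standard C++ (C++11).  This is an illustration only, not the
attached code: splitHost and findTopDomain are hypothetical names, and the
hard-coded list simply mirrors the one in tld-test.cpp.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split a host name on '.', skipping empty parts.
std::vector<std::string> splitHost(const std::string& host)
{
  std::vector<std::string> parts;
  std::istringstream in(host);
  std::string part;
  while (std::getline(in, part, '.'))
    if (!part.empty())
      parts.push_back(part);
  return parts;
}

// Hypothetical stand-in for findTLD(): returns the topmost domain of a
// host name, or the host name unchanged if it has fewer than three parts.
std::string findTopDomain(const std::string& host)
{
  static const std::vector<std::string> tlds = {
    "com", "net", "org", "gov", "edu", "mil", "int",
    "aero", "biz", "coop", "info", "museum", "name", "pro"
  };

  std::vector<std::string> parts = splitHost(host);
  if (parts.size() <= 2)
    return host;

  // Start at the last part; one shorter than three characters is
  // treated as a country code.
  std::size_t i = parts.size() - 1;
  const bool countryCode = parts[i].length() < 3;
  --i;

  // Skip a short second-level part as well, e.g. the "co" in "co.uk".
  if (countryCode && parts[i].length() < 3)
    --i;

  // Skip any further known TLDs, e.g. the "com" in "com.au".
  while (i > 0 && std::find(tlds.begin(), tlds.end(), parts[i]) != tlds.end())
    --i;

  // Join everything from the current part to the end.
  std::string result = parts[i];
  for (std::size_t k = i + 1; k < parts.size(); ++k)
    result += "." + parts[k];
  return result;
}
```

With this sketch, findTopDomain("www.kde.org") gives "kde.org",
"www.bbc.co.uk" gives "bbc.co.uk" and "www.foo.com.au" gives "foo.com.au".
A host such as "www.ab.ca" comes back unchanged, because the short "ab" is
assumed to be a second-level label like "co".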

Known Issues:
===========

* <host>.<domain>.<country-code>, e.g. foo.bar.ca, might not be resolved 
correctly, partly because I am unsure whether such hostnames can or do exist.

* Need to place the known highest TLD names in a separate file in order to 
allow easier management of them, i.e. no recompilation should be necessary 
just to add or remove the highest TLD names.

I would appreciate any feedback.

Regards,
Dawit A.

["tld-test.cpp" (text/x-c++src)]

#include <iostream>

#include <qstring.h>
#include <qstringlist.h>

// Returns the topmost domain of a fully qualified host name, e.g.
// "kde.org" for "www.kde.org".  Host names with fewer than three
// parts are returned unchanged.
QString findTLD (const QString &host)
{
  QString hostname = host;
  
  if ( !hostname.isEmpty () )
  {
    QStringList partList = QStringList::split('.', hostname, false);
    int count = partList.count();
    
    if (count > 2)
    {
      QStringList tlds;
      
      // Classical TLDs
      tlds << "com" << "net" << "org" << "gov" << "edu" << "mil" << "int";
      
      // The seven new TLDs
      tlds << "aero" << "biz" << "coop" << "info" << "museum" << "name" << "pro";
      
      QStringList::Iterator topTLD = partList.fromLast ();
      
      // A last part shorter than three characters is treated as a
      // country code.
      bool countryCode = (*topTLD).length() < 3;
      --topTLD;
      
      // Skip a short second-level part as well, e.g. the "co" in "co.uk".
      if ( countryCode && (*topTLD).length () < 3 )
        --topTLD;
      
      // Skip any more known TLDs, e.g. the "com" in "com.au"...
      while ( topTLD != partList.begin() && tlds.findIndex (*topTLD) != -1 )
        --topTLD;
      
      // Remove everything up to this point...
      partList.erase (partList.begin(), topTLD);
      
      // If we still have something, then create a TLD out of it.
      if (partList.count ())
        hostname = partList.join (".");
    }
  }
  
  return hostname;
}

void usage (const QString& program)
{
  std::cout << "Usage: " << program.local8Bit().data() << " <host name>"
            << std::endl;
}


int main (int argc, char** argv)
{
  if (argc < 2 )
  {
    usage (QString::fromLocal8Bit(argv[0]));
    return 1;
  }
    
  QString hostname = QString::fromLocal8Bit (argv[1]);
  
  // Case-insensitive check for a leading "--help" or "--h".
  if ( hostname.find ("--help", 0, false) == 0 || 
       hostname.find ("--h", 0, false) == 0 )
  {
    usage (QString::fromLocal8Bit(argv[0]));
    return 1;
  }
  
  std::cout << "Hostname: " << hostname.local8Bit().data() << std::endl;
  std::cout << "Top TLD: " << findTLD (hostname).local8Bit().data()
            << std::endl;
  
  return 0;
}

