FOSSology Components and Users

Overview of FOSSology components

To give you a better idea of how to configure your system(s) for FOSSology, let me walk through an example of how FOSSology works. Say you have a file you want to scan. It can be an ISO image, an RPM or Debian (.deb) package, another kind of archive (.bz2, .tar, .gz, ... even various Microsoft installer packages), or a single file such as myfile.c. In this example, we will scan cpio.src.rpm.

Requirement 1: Web Server

You can start a scan from the command line or submit the file through a web page, but most of the resulting FOSSology reports are available only through the web UI. Size the web server and the database server together: the web UI is a heavy user of the DBMS, so there is no point in allowing a huge number of Apache connections only to bottleneck on the DBMS.
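One way to keep the two tiers in proportion is a connection budget. The Python sketch below assumes each Apache worker holds at most one PostgreSQL connection at a time; the function name and the `reserved_for_agents` allowance are hypothetical. `max_connections` and `superuser_reserved_connections` (default 3) are real PostgreSQL settings, and `MaxRequestWorkers` is the Apache 2.4 directive that caps the number of workers.

```python
def max_apache_workers(db_max_connections: int,
                       reserved_for_agents: int,
                       superuser_reserved: int = 3) -> int:
    """Largest Apache worker count that cannot exhaust PostgreSQL connections.

    Assumption: each Apache worker holds at most one database connection.
    `reserved_for_agents` is a hypothetical allowance for scan agents and
    other non-web clients; `superuser_reserved` mirrors PostgreSQL's
    superuser_reserved_connections setting (default 3).
    """
    available = db_max_connections - superuser_reserved - reserved_for_agents
    if available < 1:
        raise ValueError("no database connections left for the web UI")
    return available


# Example: max_connections = 100 with 20 connections kept for agents
# leaves room for at most 77 Apache workers (e.g. MaxRequestWorkers 77).
print(max_apache_workers(100, 20))
```

If measurements show the DBMS saturating well before it runs out of connections, the real limit is CPU or memory, and the worker count should come down further.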

Requirement 2: Storage

When you submit cpio.src.rpm for scanning, it is recursively unpacked into all of its component files. For cpio.src.rpm that means its 539 files are stored in the filesystem, along with the original cpio.src.rpm and any intermediate archives (any .gz, .tar, ... files contained in the RPM). As a result, the storage required generally runs 5x to 10x the size of the file that was scanned.
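The multiplication comes from keeping every layer. The Python sketch below is a hypothetical model of that accounting, not FOSSology's own unpack code: it keeps the file itself and, if the file is a tar archive (plain or compressed), recurses into each member. Python's standard library has no cpio or RPM reader, so tar stands in for the package's outer container, and any other format is treated as a leaf file.

```python
import io
import tarfile


def stored_bytes(data: bytes) -> tuple[int, int]:
    """Return (total bytes, file count) kept for one file and everything unpacked from it.

    Hypothetical model: the file itself is always kept. If it parses as a
    tar archive (plain, gzip, bzip2 or xz), each regular member is read and
    processed recursively, so intermediate archives are counted alongside
    the leaf files they contain. Non-tar formats are treated as leaves.
    """
    total_bytes, total_files = len(data), 1
    try:
        archive = tarfile.open(fileobj=io.BytesIO(data), mode="r:*")
    except tarfile.ReadError:
        return total_bytes, total_files  # not a tar archive: a leaf file
    with archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # directories, links, etc. carry no file content
            extracted = archive.extractfile(member)
            if extracted is None:
                continue
            child_bytes, child_files = stored_bytes(extracted.read())
            total_bytes += child_bytes
            total_files += child_files
    return total_bytes, total_files
```

Each archive is counted once at its own size and again through its unpacked contents, so nested and compressed layers push the total well beyond the size of the scanned file.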

Requirement 3: PostgreSQL

Although the component files are stored in the filesystem, FOSSology stores all the metadata about those files in a PostgreSQL database. Because this is only metadata, it tends to be fairly small. I've never tried to come up with a rule of thumb for how much disk space PostgreSQL needs, because I've never seen it become an issue; you can always move or split the database if it exceeds some hard limit. PostgreSQL's larger requirements are memory and processors. As with any DBMS, tuning for performance means giving it plenty of memory, although this matters less if you can migrate to a smaller or larger system as needed. Whole books have been written on optimizing PostgreSQL, but our source install docs (fossology/install/INSTALL) cover the essential settings.
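For a first pass at the memory settings, the Python sketch below turns a host's RAM into two real postgresql.conf parameters, `shared_buffers` and `effective_cache_size`. The fractions used (about 25% and 75% of RAM) are generic PostgreSQL rules of thumb for a dedicated database host, assumed here rather than taken from FOSSology's INSTALL document, and the function name is hypothetical.

```python
def suggest_memory_settings(ram_mb: int) -> dict[str, str]:
    """Starting values for two memory-related postgresql.conf settings.

    Assumption: generic PostgreSQL rules of thumb for a dedicated database
    host (shared_buffers ~25% of RAM, effective_cache_size ~75% of RAM),
    not values taken from FOSSology's INSTALL document.
    """
    if ram_mb <= 0:
        raise ValueError("ram_mb must be positive")
    return {
        "shared_buffers": f"{ram_mb // 4}MB",
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",
    }


# Print the suggestions as postgresql.conf lines for a 16 GB host.
for name, value in suggest_memory_settings(16 * 1024).items():
    print(f"{name} = {value}")
```

If the web server or scan agents share the host with PostgreSQL, smaller fractions are the safer starting point; either way, check the results against the INSTALL document and real load.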

Requirement 4: Scan agents

When you submit a file to be scanned, a process is started to perform the scan. On larger systems we like to spread these processes across multiple machines to distribute the CPU load. Each machine can have its own local repository storage, and all of the repository storage is cross-mounted via NFS so that every server can access all of it. We refer to this as a Multi-system setup (described below).
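To illustrate spreading scans across machines, the Python sketch below greedily places each job on the host with the lowest load per CPU. It is a hypothetical model, not how the FOSSology scheduler decides: the host names, CPU counts, and the "one job equals one unit of load" assumption are all illustrative.

```python
import heapq


def assign_jobs(jobs: list[str], host_cpus: dict[str, int]) -> dict[str, list[str]]:
    """Greedily place each scan job on the host with the lowest jobs-per-CPU ratio.

    Hypothetical illustration: every job is assumed to cost the same, and
    host names and CPU counts are supplied by the caller.
    """
    if not host_cpus:
        raise ValueError("need at least one host")
    assignment: dict[str, list[str]] = {host: [] for host in host_cpus}
    # Heap entries are (jobs per CPU, host name); equal loads fall back to
    # name order, which keeps the result deterministic.
    heap = [(0.0, host) for host in sorted(host_cpus)]
    heapq.heapify(heap)
    for job in jobs:
        _, host = heapq.heappop(heap)
        assignment[host].append(job)
        heapq.heappush(heap, (len(assignment[host]) / host_cpus[host], host))
    return assignment


# Two hypothetical agent hosts: one with 4 CPUs, one with 2.
plan = assign_jobs([f"scan-{i}" for i in range(6)], {"agent1": 4, "agent2": 2})
for host, host_jobs in plan.items():
    print(host, host_jobs)
```

With six jobs, the 4-CPU host receives four and the 2-CPU host two. Because the repository storage is cross-mounted, any host can read the files a job needs regardless of where it was unpacked.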

Overview of FOSSology Users

FOSSology uses three users/passwords:
  1. System user "fossy". You can see this user with "grep fossy /etc/passwd". Its account login is disabled. There is also a system group, fossy. The system user and group are used to access files in the filesystem.
  2. Database user/password. Depending on how FOSSology was installed, the database user/password is probably in either /usr/local/etc/fossology/Db.conf or /etc/fossology/Db.conf. Only the fossy user should have access to this file.
  3. FOSSology user/password. This is stored in the "users" table of the FOSSology database and is used to log in to the FOSSology web user interface or to run the FOSSology command-line programs.
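The first two checks above can be scripted. The Python sketch below is a hypothetical helper, not a FOSSology tool: it looks up the system account (the same entry `grep fossy /etc/passwd` shows), finds whichever of the two Db.conf paths exists, and reports whether users outside the file's owner and group have any permissions on it. Reading "only fossy should have access" as "no permissions for other users" is an assumption; the login shell is returned because that field is one common place where a disabled login shows up.

```python
import os
import pwd
import stat

# The two Db.conf locations named above; which one exists depends on the install.
DB_CONF_CANDIDATES = ("/usr/local/etc/fossology/Db.conf", "/etc/fossology/Db.conf")


def find_db_conf(candidates=DB_CONF_CANDIDATES):
    """Return the first candidate path that is an existing file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def world_accessible(path: str) -> bool:
    """True if any read, write, or execute bit is set for 'other' users."""
    return bool(os.stat(path).st_mode & stat.S_IRWXO)


def system_user_info(name: str = "fossy"):
    """Return (uid, login shell) for a system account, or None if it is absent."""
    try:
        entry = pwd.getpwnam(name)
    except KeyError:
        return None
    return entry.pw_uid, entry.pw_shell


if __name__ == "__main__":
    print("system user:", system_user_info())
    conf = find_db_conf()
    if conf is None:
        print("Db.conf not found in the expected locations")
    else:
        print(conf, "world-accessible:", world_accessible(conf))
```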

The same name, "fossy", happens to be used for all three of these accounts. However, they are independent users in three separate parts of the system: the FOSSology user/password is used to log in to FOSSology, the database user/password is used to authenticate with PostgreSQL, and the system user and group control access to files in the filesystem.