The database

Main principles

Enhydris supports PostgreSQL (with PostGIS).

In Django parlance, a model is a type of entity, which usually maps to a single database table. Therefore, in Django, we usually talk of models rather than of database tables, and we design models, which is close to conceptual database design, leaving it to Django’s object-relational mapper to translate them to the physical design. In this text, we also speak more of models than of tables. Since a model is a Python class, we describe it as a Python class rather than as a relational database table. If, however, you feel more comfortable with tables, you can generally read the text understanding that a model is a table.

If you are interested in the physical structure of the database, you need to know the model translation rules, which are quite simple:

  • The name of the table is the lowercase name of the model, with a prefix. The prefix for the core of the database is enhydris_. (More on the prefix below.)

  • Tables normally have an implicit integer id field, which is the primary key of the table.

  • Table fields have the same name as model attributes, except for foreign keys.

  • Foreign keys have the name of the model attribute suffixed with _id.

  • When using multi-table inheritance, the primary key of the child table is also a foreign key to the id field of the parent table. The name of the database column for the key of the child table is the lowercased parent model name suffixed with _ptr_id.
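The rules above can be sketched in plain Python. The helper names below (table_name, column_name, child_pk_column) are hypothetical and exist only for illustration; Django derives these names internally, and an individual model can override them.

```python
# Illustrative helpers only; these names are not part of Enhydris or Django.
CORE_PREFIX = "enhydris_"


def table_name(model_name: str, prefix: str = CORE_PREFIX) -> str:
    """Table name: the prefix followed by the lowercased model name."""
    return prefix + model_name.lower()


def column_name(attribute: str, is_foreign_key: bool = False) -> str:
    """Column name: the attribute name, with _id appended for foreign keys."""
    return attribute + "_id" if is_foreign_key else attribute


def child_pk_column(parent_model_name: str) -> str:
    """Primary key column of a child table under multi-table inheritance."""
    return parent_model_name.lower() + "_ptr_id"
```

Under these rules, a child model of Gentity such as Garea would be stored in enhydris_garea, with its primary key in the gentity_ptr_id column and its category foreign key in the category_id column.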

The core of the Enhydris database is a list of measuring stations, with additional information such as photos, videos, and the hydrological and meteorological time series stored for each measuring station. This can be used in or assisted by many more applications, which may or may not be needed in each setup. A billing system is needed for agencies that charge for their data, but not for those that offer it freely or only internally. Some organisations may need to develop additional software for managing aqueducts, and some may not. Therefore, the core is kept as simple as possible.

The core database tables use the enhydris_ prefix; other applications use a different prefix. The name of a table is the lowercased model name preceded by the prefix. For example, the table that corresponds to the Gentity model is enhydris_gentity.

Lookup tables

Lookup tables are those that are used for enumerated values. For example, the list of variables is a lookup table. Most lookup tables in the Enhydris database have three fields: id, descr, and short_descr, and they all inherit the following abstract base class:

class enhydris.models.Lookup

This class contains the common attribute of the lookup tables:

descr

A character field with a descriptive name.

Most lookup tables are described in a relevant section of this document, where their description fits better.

Lentities

The Lentity is the superclass of people and groups. For example, a measuring station can belong either to an organisation or an individual. Lawyers use the word “entity” to refer to individuals and organisations together, but this would create confusion because of the more generic meaning of “entity” in computing; therefore, we use “lentity”, which is something like a legal entity. The lentity hierarchy is implemented by using Django’s multi-table inheritance.

class enhydris.models.Lentity
remarks

A text field of unlimited length.

class enhydris.models.Person
last_name
first_name
middle_names
initials

The above four are all character fields. The initials contain the initials without the last name. For example, for Antonis Michael Christofides, initials would contain the value “A. M.”.

class enhydris.models.Organization
name
acronym

name and acronym are both character fields.

Gentity and its direct descendants: Gpoint, Gline, Garea

A Gentity (short for geographical entity) is a geographical entity. Examples of gentities are measuring stations, cities, boreholes and watersheds. A gentity can be a point (e.g. stations and boreholes), a surface (e.g. lakes and watersheds), a line (e.g. aqueducts), or a network (e.g. a river). The gentities implemented in the core are measuring stations and generic gareas. The gentity hierarchy is implemented by using Django’s multi-table inheritance.

class enhydris.models.Gentity
name

A field with the name of the gentity, such as the name of a measuring station. Up to 200 characters.

code

An optional field with a code for the gentity. Up to 50 characters. It can be useful for entities that have a code, e.g. watersheds are codified by the EU, and the watershed of Nestos River has code EL07.

remarks

A field with general remarks about the gentity. Unlimited length.

geom

This is a GeoDjango GeometryField that stores the geometry of the gentity.

display_timezone

Timestamps of time series records are stored in UTC. This attribute specifies the time zone to which timestamps are converted before displaying or downloading time series. It is a string holding a key from the Olson time zone list. Currently only time zones starting with Etc/GMT are supported.

Although the storage format of the time zone is Etc/GMT[±XX], it is displayed differently on the admin (and elsewhere). Etc/GMT is displayed as UTC; Etc/GMT-2 (2 hours east of UTC) is displayed as UTC+0200; and so on.
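The mapping from the stored key to the displayed label can be sketched as follows. This is a minimal illustration, not the code Enhydris uses; the function name display_label is hypothetical. Note that the sign is inverted: in the Etc/GMT naming scheme a minus sign means east of UTC.

```python
import re


def display_label(tz_key: str) -> str:
    """Convert a stored key such as "Etc/GMT-2" to a label such as "UTC+0200"."""
    match = re.fullmatch(r"Etc/GMT(?:([+-])(\d{1,2}))?", tz_key)
    if match is None:
        raise ValueError(f"Unsupported time zone: {tz_key}")
    sign, hours = match.groups()
    if sign is None or int(hours) == 0:
        return "UTC"
    # Etc/GMT-2 is two hours EAST of UTC, so the displayed sign is flipped.
    shown_sign = "+" if sign == "-" else "-"
    return f"UTC{shown_sign}{int(hours):02d}00"
```

For instance, display_label("Etc/GMT+5") gives "UTC-0500", because Etc/GMT+5 is five hours west of UTC.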

class enhydris.models.Gpoint(Gentity)
altitude

The altitude in metres above mean sea level.

class enhydris.models.Garea(Gentity)
category

A Garea belongs to a category, such as “water basin” or “country”. Foreign key to GareaCategory.

Additional information for generic gentities

This section describes models that provide additional information about gentities.

class enhydris.models.GentityFile
class enhydris.models.GentityImage

These models store files and images for the gentity. The difference between GentityFile and GentityImage is that GentityImage objects are shown in a gallery on the station detail page, whereas files are shown in a much less prominent list.

descr

A short description or legend of the file/image.

remarks

Remarks of unlimited length.

date

For photos, it should be the date the photo was taken. For other kinds of files, it can be any kind of date.

content

The actual content of the file; a Django FileField (for GentityFile) or ImageField (for GentityImage).

featured

This attribute exists for GentityImage only. On the station detail page, one of the images (the “featured” image) is shown in large size (the rest are shown as a thumbnail gallery). This attribute indicates the featured image. If there is more than one featured image (or if there is none), images are sorted by descr, and the first one is featured.
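One possible reading of this selection rule is sketched below in plain Python. The Image class is a stand-in with only the two attributes involved, and pick_featured is a hypothetical name. The sketch assumes that images marked as featured take precedence and that descr breaks ties; the actual implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class Image:
    # Stand-in for GentityImage with only the two attributes used here.
    descr: str
    featured: bool = False


def pick_featured(images):
    """Return the image to show in large size, or None if there are no images."""
    # Images marked as featured sort first; within each group, sort by descr.
    ordered = sorted(images, key=lambda im: (not im.featured, im.descr))
    return ordered[0] if ordered else None
```

With no featured image, the first image by descr is chosen; with several featured images, the first featured image by descr is chosen.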

class enhydris.models.EventType(Lookup)

Stores types of events.

class enhydris.models.GentityEvent

An event is something that happens during the lifetime of a gentity and needs to be recorded. For example, for measuring stations, events such as malfunctions, maintenance sessions, and extreme weather phenomena observations can be recorded and provide a kind of log.

gentity

The Gentity to which the event refers.

date

The date of the event.

type

The EventType.

user

The username of the user who entered the event into the database.

report

A report about the event; a text field of unlimited length.

Autoprocess

enhydris.autoprocess is an app that automatically processes time series to produce new time series. For example, it performs range checking, saving a new time series that is range checked. The app is installed by default. If you don’t need it, remove it from INSTALLED_APPS. When it is installed, in the station page in the admin, under “Timeseries Groups”, there are some additional options, like Range Check, Time Consistency Check, Curve Interpolations and Aggregations.

As an example, suppose you have a meteorological station called “Hobbiton” that measures temperature. Because of sensor, transmission, or other errors, the temperature is sometimes wrong, for example 280 °C. What you want to do (and what this app does, among other things) is delete these measurements automatically as they come in. Assuming that the low and high all-time temperature records in Hobbiton are -18 and +38 °C, you might decide that anything below -25 or above +50 °C (the “hard” limits) is an error, whereas anything below -20 or above +40 °C (the “soft” limits) is a suspect value. In that case, you configure enhydris.autoprocess with the soft and hard limits. Each time data is uploaded, an event is triggered that starts an asynchronous process. This process takes the initial uploaded data, deletes the values outside the hard limits, flags as suspect the values outside the soft limits, and saves the result to the “checked” time series of the time series group.
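The hard and soft limit logic for the Hobbiton example can be sketched in plain Python. This is an illustration of the idea only, not the enhydris.autoprocess code: the function name range_check, the "SUSPECT" flag text, and the choice to represent a deleted value as None are all assumptions.

```python
def range_check(records, soft_limits, hard_limits):
    """Return (timestamp, value, flag) tuples for (timestamp, value) records.

    Values outside the hard limits are deleted (set to None, an assumption);
    values outside the soft limits are kept but flagged as suspect.
    """
    soft_low, soft_high = soft_limits
    hard_low, hard_high = hard_limits
    checked = []
    for timestamp, value in records:
        if value is None or not hard_low <= value <= hard_high:
            checked.append((timestamp, None, ""))
        elif not soft_low <= value <= soft_high:
            checked.append((timestamp, value, "SUSPECT"))  # hypothetical flag
        else:
            checked.append((timestamp, value, ""))
    return checked


hobbiton = [
    ("2024-01-01 10:00", 12.5),
    ("2024-01-01 10:10", 280.0),  # outside the hard limits: deleted
    ("2024-01-01 10:20", 43.0),   # outside the soft limits: suspect
    ("2024-01-01 10:30", -22.0),  # outside the soft limits: suspect
]
checked = range_check(hobbiton, soft_limits=(-20, 40), hard_limits=(-25, 50))
```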

(More specifically, enhydris.autoprocess uses the post_save Django signal for enhydris.Timeseries to trigger a Celery task that does the auto processing—see apps.py and tasks.py.)

Range checking is only one of the ways in which a time series can be auto-processed; there is also aggregation (e.g. deriving hourly from ten-minute time series) and curve interpolation (e.g. deriving discharge from stage, or estimating the wind speed at a height of 2 m above ground when the wind sensor is at a different height). The name we use for all these together (i.e. checking, aggregation, interpolation) is “auto process”. Technically, AutoProcess is the superclass, and it has subclasses such as Checks, Aggregation and CurveInterpolation. These are implemented using Django’s multi-table inheritance. (The checking subclass is called Checks because there can be many checks, such as range checking and time consistency checking; these are performed one after the other and they result in the “checked” time series.)

class AutoProcess
timeseries_group

The time series group to which this auto-process applies.

execute()

Performs the auto-processing. It retrieves the new part of the source time series (i.e. the part that starts after the last date of the target time series) and calls the process_timeseries() method.

source_timeseries

This is a property; the source time series of the time series group for this auto-process. It depends on the kind of auto-process: for Checks it is the initial time series; for Aggregation and CurveInterpolation it is the checked time series if it exists, or the initial otherwise. If no suitable time series exists, it is created.

target_timeseries

This is a property; the target time series of the time series group for this auto-process. It depends on the kind of auto-process: for Checks it is the checked time series; for Aggregation it is the aggregated time series with the target time step; for CurveInterpolation it is the initial time series of the target time series group (CurveInterpolation has an additional target_timeseries_group attribute). The target time series is created if it does not exist.

process_timeseries()

Performs the actual processing.
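The division of labour between execute() and process_timeseries() can be sketched in plain Python. This is a simplified stand-in, not the real model: time series are represented as lists of (timestamp, value) pairs, the class names AutoProcessSketch and DoubleValues are hypothetical, and appending the result to the target inside execute() is an assumption.

```python
from datetime import datetime


class AutoProcessSketch:
    """Plain-Python stand-in for AutoProcess.

    Time series are lists of (timestamp, value) pairs sorted by timestamp.
    """

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def execute(self):
        # The new part of the source starts after the last date of the target.
        last = self.target[-1][0] if self.target else None
        new_part = [r for r in self.source if last is None or r[0] > last]
        # Assumption: the processed records are appended to the target.
        self.target.extend(self.process_timeseries(new_part))

    def process_timeseries(self, records):
        raise NotImplementedError("Subclasses implement the actual processing")


class DoubleValues(AutoProcessSketch):
    # Hypothetical subclass, used only to show where the processing goes.
    def process_timeseries(self, records):
        return [(timestamp, value * 2) for timestamp, value in records]
```

Because execute() only passes on records newer than the target's last timestamp, running it a second time without new source data leaves the target unchanged.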