InfluxDB

InfluxDB is a time series database hosted in the cloud.  

Writing Data

High-performance writes to InfluxDB have the following characteristics:

  • Batch writes in groups of 5000 lines of line protocol.
  • Use the coarsest time precision possible for timestamps (the default is nanoseconds).
  • Compress the write request with gzip.
  • Use the Network Time Protocol (NTP) to synchronize time between the hosts that send the writes and InfluxDB.

Python

Topics for Python: how to write a batch of about one megabyte and of about one gigabyte, and how to enable gzip compression. Python's datetime type has microsecond resolution, so timestamps derived from it are no finer than microseconds by default.
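The batching, precision, and gzip recommendations can be sketched with the Python standard library alone. This is a minimal sketch, not a definitive client: it assumes the InfluxDB v2 HTTP write endpoint (/api/v2/write with org, bucket, and precision query parameters, and a gzip Content-Encoding header). The host name, organization, bucket, token, measurement, and field names are all hypothetical placeholders, and the requests are built but never sent.

```python
import gzip
import urllib.parse
import urllib.request

BATCH_SIZE = 5000  # lines of line protocol per request, per the guidance above


def to_line(measurement, field, value, ts_seconds):
    """Format one point as line protocol with a second-precision timestamp."""
    return f"{measurement} {field}={value} {ts_seconds}"


def batches(lines, size=BATCH_SIZE):
    """Yield successive lists of at most `size` lines."""
    for start in range(0, len(lines), size):
        yield lines[start:start + size]


def build_request(base_url, org, bucket, token, batch):
    """Build (but do not send) one gzip-compressed write request."""
    body = gzip.compress("\n".join(batch).encode("utf-8"))
    # precision=s tells the server the timestamps are in seconds (assumed coarse enough here).
    query = urllib.parse.urlencode({"org": org, "bucket": bucket, "precision": "s"})
    headers = {
        "Authorization": f"Token {token}",
        "Content-Encoding": "gzip",
        "Content-Type": "text/plain; charset=utf-8",
    }
    return urllib.request.Request(
        f"{base_url}/api/v2/write?{query}", data=body, headers=headers, method="POST"
    )


# Hypothetical data: 12,000 float readings, one per second.
lines = [to_line("sensor", "temp", 20.0 + i % 10, 1676996959 + i) for i in range(12000)]
requests_to_send = [
    build_request("https://influx.example.invalid", "my-org", "my-bucket", "MY_TOKEN", b)
    for b in batches(lines)
]
print(len(requests_to_send), "requests built")
```

Each request could then be passed to urllib.request.urlopen against a real host. A megabyte- or gigabyte-sized payload would be split the same way, keeping every individual request under the request size limits of the service.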

Rate Limits

InfluxDB has the following rate limits as of February 2023:

  • Free Plan rate of 5 MB per 5 minutes (average of 17 kB/s) for uncompressed bytes of normalized line protocol.
  • Usage Based Plan rate of 300 MB per 5 minutes (average of 1000 kB/s) for uncompressed bytes of normalized line protocol.
  • Global Limits restrict the HTTP request batch size to a maximum of 50 MB, compressed or uncompressed, and to a maximum of 250 MB after decompression.

Applying the global limit of 250 MB to the Usage Based Plan, that is, one maximum-size request per 5-minute window, the average maximum write rate is 833 kB/s (250 MB / 300 s).
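The average rates quoted above follow from dividing each limit by the 300-second window. A small sketch of that arithmetic, assuming 1 MB = 1000 kB (the function name is illustrative only):

```python
WINDOW_S = 5 * 60  # rate limits are expressed per 5-minute window


def avg_rate_kb_per_s(megabytes_per_window, window_s=WINDOW_S):
    """Average rate in kB/s for a limit given in MB per window (1 MB = 1000 kB)."""
    return megabytes_per_window * 1000 / window_s


free_plan = avg_rate_kb_per_s(5)             # Free Plan
usage_plan = avg_rate_kb_per_s(300)          # Usage Based Plan
single_max_request = avg_rate_kb_per_s(250)  # one 250 MB request per window

print(round(free_plan), round(usage_plan), round(single_max_request))
# prints: 17 1000 833
```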

InfluxDB Data Types & Sizes

A timestamp in RFC 3339 format, "2021-01-01T00:00:00.000Z", is 24 bytes as shown (with millisecond digits; nine fractional digits for full nanosecond precision would make it 30 bytes). A Unix timestamp with nanosecond precision is "9223372036854775806" or 19 bytes. An IEEE-754 64-bit floating-point number is "-1.797693134862315E+308" or 23 bytes as written. A signed 64-bit integer is "-9223372036854775808i" or 21 bytes.
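The byte counts can be checked by encoding each literal as UTF-8 and measuring its length. The sketch below uses the exact strings quoted above; the dictionary labels are only descriptive.

```python
samples = {
    "RFC 3339 timestamp": "2021-01-01T00:00:00.000Z",
    "Unix timestamp (ns)": "9223372036854775806",
    "64-bit float": "-1.797693134862315E+308",
    "64-bit signed integer": "-9223372036854775808i",
}

# All characters are ASCII, so one character is one byte on the wire.
sizes = {name: len(text.encode("utf-8")) for name, text in samples.items()}

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
```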

InfluxDB Performance Comparisons

As of October 2022, InfluxDB has whitepapers posted on its website demonstrating faster write (data ingestion) performance than the following competing time series databases: Cassandra (5x faster), Elasticsearch (3.8x faster), MongoDB (1.9x faster), OpenTSDB (5x faster), Graphite (14x faster), and Splunk (17x faster). The comparisons characterize write performance in terms of values per second. The values vary in data type and are randomly generated, resulting in a random data package size. This makes it easy to assess ingestion performance for a variety of applications, but difficult to convert that ingestion to bytes/second/server for comparison with systems other than those evaluated. In all of the comparisons, 100 servers were used concurrently to process 87,264,000 values, and the performance was measured over the ingestion of 100 values.

In the comparison of InfluxDB to Elasticsearch, the whitepaper claims the average write (ingestion) throughput of InfluxDB was 2,674,948 values per second utilizing 100 servers (or 26,749 values/s per server). In the comparison of InfluxDB to MongoDB, the whitepaper states that the write (ingestion) performance of InfluxDB was 2,644,765 values per second utilizing 100 servers (or 26,447 values/s per server).

NI DIAdem Ingestion Performance

In InfluxDB's performance comparison to MongoDB, a total of 87,264,000 values were created in the test data set, and then the performance was measured as 100 values were ingested.   This type of measurement is difficult to replicate in DIAdem without adversely affecting the ingestion process.  

I wrote an NI DIAdem script to create a CSV text file with 87,264,000 values in 872,640 lines, with each line containing a Unix timestamp with nanosecond precision followed by 100 random IEEE-754 64-bit floating-point numbers, all delimited by semicolons (e.g. 1676996959000000000;3.94714746398025E+299;4.55576917883636E+299;..). It took NI DIAdem 120.7 sec to read the 1.80 GB uncompressed text file and write it to a new TDMS binary file, for an average ingestion rate of 722,982 values/s (87,264,000 values / 120.7 sec), or 53.8 GiB/hr. Reading the TDMS file after it is created takes 0.1 sec or 1.36E+09 values/s, and writing any changes takes only 3.0 sec or 2.86E+07 values/s.
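For readers without DIAdem, the file layout described above can be approximated in Python. This is a hedged sketch and not the DIAdem script used for the test: the value range, the one-second timestamp step, the seed, and the function names are assumptions chosen for illustration, and only 10 lines are generated here.

```python
import io
import random

VALUES_PER_LINE = 100


def make_line(ts_ns, rng):
    """One record: nanosecond Unix timestamp plus 100 random doubles, ';'-delimited."""
    values = (f"{rng.uniform(-1e300, 1e300):.14E}" for _ in range(VALUES_PER_LINE))
    return ";".join([str(ts_ns), *values])


def write_test_file(out, n_lines, start_ns=1676996959000000000,
                    step_ns=1_000_000_000, seed=0):
    """Write n_lines records to a text stream, one timestamp step apart."""
    rng = random.Random(seed)
    for i in range(n_lines):
        out.write(make_line(start_ns + i * step_ns, rng) + "\n")


# Small in-memory demonstration; pass an open file object for a real file.
buffer = io.StringIO()
write_test_file(buffer, n_lines=10)
print(buffer.getvalue().splitlines()[0][:60], "...")
```

At 100 values per line, 87,264,000 values correspond to n_lines=872640, which would produce a file on the order of the size reported above.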

NI DIAdem was able to ingest the data 27 times faster than InfluxDB, based on the InfluxDB rate of 26,447 values/s per server.
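The 27x figure can be reproduced from the numbers already given. The sketch below only restates that arithmetic; the variable names are illustrative.

```python
TOTAL_VALUES = 87_264_000        # values in the test data set
DIADEM_SECONDS = 120.7           # DIAdem CSV-to-TDMS ingestion time
INFLUX_CLUSTER_RATE = 2_644_765  # values/s reported for 100 InfluxDB servers
INFLUX_SERVERS = 100

diadem_rate = TOTAL_VALUES / DIADEM_SECONDS
influx_per_server = INFLUX_CLUSTER_RATE / INFLUX_SERVERS
speedup = diadem_rate / influx_per_server

print(f"DIAdem: {diadem_rate:,.1f} values/s")
print(f"InfluxDB per server: {influx_per_server:,.2f} values/s")
print(f"Ratio: {speedup:.1f}x")
```

The rates in the text are these results with the fractional part dropped, and the ratio rounds to 27.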

Avg Values/sec    Test Conditions
722,982           NI DIAdem, 87,264,000 values
26,447            100 InfluxDB servers and 87,264,000 values (26,447 values/s per server; 2,644,765 values per second utilizing 100 servers)

 

Note several important facts about this test and the comparison:

  • NI DIAdem was able to ingest the data 27x faster than InfluxDB, despite the modest hardware used with NI DIAdem (a laptop with an AMD Ryzen 7 5700U CPU, 16 GB RAM, and an SSD, running NI DIAdem v2022 on Windows 11).
  • The NI DIAdem data set contained only floating point values.   A mix of other data types (string, boolean, integer, etc) could have been used, and probably would have resulted in even better ingestion performance.   Importing full precision floating point values was considered the most challenging case for NI DIAdem.  
  • NI DIAdem can ingest binary and compressed data at a significantly higher rate than text / CSV files.  
  • InfluxDB can match and exceed NI DIAdem's ingestion performance when 100 servers are employed.   However, the InfluxDB servers must be in close proximity to the source data and with a very high speed data connection in order to achieve the results reported in the InfluxDB test.  
  • Both InfluxDB and NI DIAdem will experience significant data transfer delays if the source data is remotely pushed from slower data connections such as cellular, WiFi, internet, etc.  
  • InfluxDB performance was significantly better than that of the other time series databases: Cassandra (5x faster), Elasticsearch (3.8x faster), MongoDB (1.9x faster), OpenTSDB (5x faster), Graphite (14x faster), and Splunk (17x faster). Since NI DIAdem data ingestion was 27x faster than InfluxDB on a per-server basis, it should also be superior to the other time series databases.