MF4 Decoders - DBC Decode CAN Data to CSV/Parquet [Data Lakes]
Need to DBC decode your CAN/LIN data to CSV/Parquet files?
The CANedge records raw CAN/LIN data to an SD card in the popular ASAM MDF format (MF4).
The simple-to-use MF4 decoders let you DBC decode your log files to CSV or Parquet files - enabling easy creation of powerful data lakes and integration with 100+ software/API tools.
Learn more below - and try the decoders yourself!
- DBC decode your MF4 files to interoperable CSV/Parquet files
- Drag & drop files/folders onto the decoder to process them
- Optionally use via the command line or in scripts for automation
- Easily create powerful Parquet data lakes for use in 100+ tools
- Use the decoders on both Windows and Linux operating systems
- The decoders are 100% free and can be integrated into your own solutions
The ASAM MDF (Measurement Data Format) is a popular, open and standardized format for storing bus data e.g. from CAN bus (incl. J1939, OBD2, CAN FD etc) and LIN bus.
The CANedge records raw CAN/LIN data in the latest standardized version, MDF4 (*.MF4). The log file format is ideally suited for pro-spec CAN logging at high bus loads and enables both lossless recording and 100% power safety. Further, the CANedge supports embedded encryption and compression of the log file data (both natively supported by the MF4 decoders).
The raw MF4 data from the CANedge can be loaded natively in various software/API tools, including the asammdf GUI/API, our Python API, the MF4 converters - and the MF4 decoders (detailed in this article).
To learn more about the MDF4 file format, see our MF4 intro.
The MF4 decoders can be deployed in numerous ways - below are two examples:
Example 1: Local PC drag & drop usage
- A CANedge records raw CAN/LIN data to an SD card
- A log file folder from the SD card is copied to a PC
- The relevant DBC files are placed next to the MF4 decoder
- The data is DBC decoded to CSV via drag & drop
- The CSV files can be directly loaded in e.g. Excel
Example 2: Cloud based auto-processing
- A CANedge uploads raw CAN/LIN data to an AWS S3 bucket
- When a log file is uploaded it triggers a Lambda function
- The Lambda uses the MF4 decoder and DBC files from S3
- Via the Lambda, the uploaded file is decoded to Parquet files
- The Parquet files are written to an AWS S3 'output bucket'
- The data lake can be visualized in e.g. Grafana dashboards
Learn more in our Grafana dashboard article.
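To make Example 2 more tangible, below is a minimal Python sketch of the core Lambda logic only. The bucket name, the /opt layer path and the Linux decoder binary name are illustrative assumptions - the plug & play Lambda function we provide in our deployment guides differs in the details.
import os
import subprocess
import boto3

s3 = boto3.client("s3")
OUTPUT_BUCKET = "my-parquet-output-bucket"  # assumption: your Parquet 'output bucket'

def lambda_handler(event, context):
    # 1) Fetch the uploaded MF4 log file referenced by the S3 event
    record = event["Records"][0]["s3"]
    bucket, key = record["bucket"]["name"], record["object"]["key"]
    local_mf4 = os.path.join("/tmp", os.path.basename(key))
    s3.download_file(bucket, key, local_mf4)

    # 2) DBC decode it to Parquet via the decoder (assumed bundled in a layer at /opt)
    os.makedirs("/tmp/output", exist_ok=True)
    subprocess.run(["/opt/mdf2parquet_decode", "-i", "/tmp", "-O", "/tmp/output"], check=True)

    # 3) Upload the resulting Parquet files to the output bucket
    for root, _, files in os.walk("/tmp/output"):
        for name in files:
            path = os.path.join(root, name)
            s3.upload_file(path, OUTPUT_BUCKET, os.path.relpath(path, "/tmp/output"))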
The decoders support DBC files for practically any CAN bus, CAN FD or LIN bus protocol. This includes e.g. OBD2, J1939, NMEA 2000, ISOBUS, CANopen and proprietary OEM-specific DBC files. For details, see the MF4 decoder docs. For each DBC file, you specify which CAN/LIN channel to apply it to (e.g. can1, can2, lin1 etc) and you can provide multiple DBC files per channel.
Yes, the MF4 decoders support CAN based transport protocols, including ISO TP (OBD, UDS), J1939 TP (J1939, ISOBUS, NMEA 2000) and NMEA 2000 TP (NMEA 2000 Fast Packets).
You can customize specific TP-related aspects in the DBC file(s) you use with the MF4 decoders, and the latest DBC files we provide are generally prepared for use with the decoders out of the box. The decoders will automatically identify valid multi-frame sessions, re-assemble the CAN frames and enable DBC decoding of the assembled frames.
The MF4 decoders enable you to output DBC decoded CAN/LIN data as either CSV files or Parquet files. Below we briefly outline the key differences between the formats:
- CSV files are simple, text-based, and universally compatible, ideal for small to medium-sized datasets and ad hoc analyses. They are easy to use, but less efficient for large data
- Parquet files are binary and offer vastly faster performance and storage efficiency vs CSV - but require more specialized tools for analysis
- While CSVs are straightforward for basic tasks, Parquet is generally preferable for performance-intensive analysis and handling large-scale time series data
For most use cases, we recommend using the Parquet file format when both options are available - and many of our plug & play integrations are built around Parquet data lakes for the above reasons.
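To make the difference concrete, here is a small, hedged Python sketch (using pandas with pyarrow) that writes the same decoded data to both formats and reads back only selected columns. The column names and file paths are illustrative, not CANedge-specific.
import os
import pandas as pd

# Dummy 'decoded' data - column names are illustrative only
df = pd.DataFrame({
    "t": pd.date_range("2024-01-01", periods=100_000, freq="ms"),
    "EngineSpeed": 800.0,
    "VehicleSpeed": 50.0,
})

df.to_csv("decoded.csv", index=False)          # text-based, universally readable
df.to_parquet("decoded.parquet", index=False)  # binary, columnar and compressed

print(os.path.getsize("decoded.csv"), os.path.getsize("decoded.parquet"))

# Parquet lets you read back only the columns you need - much faster on large data
speeds = pd.read_parquet("decoded.parquet", columns=["t", "EngineSpeed"])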
Functionality & integration examples
DBC decode raw MF4 data via drag & drop
The simple-to-use MF4 decoders let you drag & drop CANedge log files with raw CAN/LIN data to DBC decode them using your own DBC file(s) - outputting the data as CSV or Parquet files.
Batch decode (nested) folders
You can also drag & drop entire folders of MF4 log files onto a decoder to batch process the files. This also works for nested folders with e.g. thousands of log files.
Automate decoding via CLI/scripts
The decoder executables can be called via the CLI or from any programming language. Ideal for automated DBC decoding locally, in the cloud (e.g. in AWS Lambda), on Raspberry Pis etc.
import subprocess
# Decode all MF4 files in the 'input' folder to Parquet files in the 'output' folder
subprocess.run(["mdf2parquet_decode.exe", "-i", "input", "-O", "output"], check=True)
Easily use with S3 storage
The CANedge2/CANedge3 upload data to your own S3 server. Mount your S3 bucket and use the MF4 decoders as if files were stored locally. Or use in e.g. AWS Lambda for full automation.
Easily decompress and/or decrypt your raw data
The CANedge supports embedded compression and encryption of log files on the SD card. The MF4 decoder natively supports compressed/encrypted files, simplifying post processing.
Create powerful Parquet data lakes
The decoders are ideal for creating powerful Parquet data lakes with an efficient date-partitioned structure of concatenated files - stored locally or e.g. on S3.
Visualize your CAN/LIN data in Grafana dashboards
Many dashboard tools can query data from Parquet data lakes via SQL interfaces (like Athena or ClickHouse), enabling low cost scalable visualization - see e.g. our Grafana-Athena intro.
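If you want to prototype the SQL-over-Parquet idea locally before setting up e.g. Athena or ClickHouse, DuckDB is a quick stand-in. The sketch below is an illustration only - the data lake path, the timestamp column name ('t') and the signal name are assumptions.
import duckdb

# Hourly average of a signal across the full (nested) Parquet data lake
result = duckdb.sql("""
    SELECT date_trunc('hour', t) AS hour, avg(EngineSpeed) AS avg_rpm
    FROM read_parquet('datalake/**/*.parquet')
    GROUP BY hour
    ORDER BY hour
""").df()
print(result.head())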
Use Python to analyze Parquet data lakes
Python supports Parquet data lakes, enabling e.g. big data analysis. With S3 support, you can also analyze data directly in e.g. Colab Jupyter Notebooks. See the docs for script examples.
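As a hedged example of the Python workflow, the sketch below reads a subset of an S3-hosted Parquet data lake via pyarrow - the bucket name, prefix, region and column names are illustrative assumptions.
import pyarrow.dataset as ds
import pyarrow.fs as fs

s3 = fs.S3FileSystem(region="eu-central-1")  # credentials taken from the environment
dataset = ds.dataset(
    "my-parquet-output-bucket/my-device/CAN2_EngineData",  # hypothetical data lake prefix
    filesystem=s3,
    format="parquet",
)

# Column selection and filters are pushed down, so only the relevant data is read from S3
table = dataset.to_table(
    columns=["t", "EngineSpeed"],
    filter=ds.field("EngineSpeed") > 1000,
)
df = table.to_pandas()
print(df.describe())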
Use MATLAB to analyze Parquet data lakes
MATLAB natively supports Parquet data lakes - making it easy to perform advanced analysis at scale with support for S3 and out-of-memory tall arrays. See the docs for script examples.
Use Excel or Power BI to analyze your data lakes
Excel and Power BI let you load DBC decoded CSV/Parquet files for quick analysis - or use e.g. Athena/ClickHouse ODBC drivers to query data (beyond memory) from your data lakes via SQL.
Easily analyze data via ChatGPT
ChatGPT is great for analysing large amounts of DBC decoded CAN/LIN data in CSV format. Learn more in our intro.
Want to try this yourself? Download the decoders and MF4 sample data below:
Store your data lake anywhere - and integrate with everything
Parquet data lakes combine cheap flexible storage with efficient interoperable integration opportunities.
Agnostic low cost storage
Parquet data lakes consist of compact, efficient binary files - meaning they can be stored at extremely low cost in any cloud file storage (e.g. AWS S3, Google Cloud Storage, Azure Blob Storage), in self-hosted S3 buckets (e.g. MinIO) - or simply on your local disk. Storing Parquet files in e.g. AWS S3 is typically ~95% lower cost than storing the equivalent data volume in a database.
Native Parquet support
As illustrated, Parquet data lakes are natively supported by a wide array of tools. For example, you can work directly with Parquet files from any programming language like Python or MATLAB - whether the files are stored locally or on S3. Further, Parquet files can be natively loaded in many desktop tools like Microsoft Power BI or Tableau Desktop.
Powerful interfaces
Parquet data lakes are also natively supported by 'interfaces' like Amazon Athena, Google BigQuery, Azure Synapse and open source options like ClickHouse and DuckDB. These expose SQL query interfaces and ODBC/JDBC drivers that dramatically expand your integration options - and supercharge query speed. You can, for example, use interfaces to visualize your data in Grafana dashboards.
Parquet data lakes consist of files, meaning they can be stored in file storage solutions like AWS S3. Storing data on S3 is incredibly low cost (~$0.023/GB/month) compared to most databases (typically ~$1.5/GB/month) - e.g. 1 TB costs roughly $23/month on S3 vs. ~$1,500/month in a typical database. This matters because many CAN/LIN data logging use cases can require terabytes of storage over time.
The 'downside' to storing files on S3 vs. in a database is generally the fact that it is much slower to query the data, e.g. for analytics or visualization. However, this is where the interface tools like Amazon Athena come into play as outlined below.
We refer to tools like Amazon Athena, Google BigQuery and Azure Synapse Analytics as 'interfaces' for simplicity. They can also be referred to as cloud based serverless data warehouse services. The serverless part is important: It means that it's simple to set up - and you pay only when you query data.
Automotive OEM engineers often need to store terabytes of data for analysis - yet they may only need to access small subsets of the data, and only infrequently. When they do access the data, however, the query speed has to be fast - even if they query gigabytes of data. Tools like Amazon Athena are ideally suited for this: when you query the data, Athena spins up the necessary compute and parallelization in real time - meaning you can extract insights across gigabytes of your S3 data lake in seconds using standard SQL queries. At the same time, all the complexity is abstracted away - and the solutions can be automatically deployed as per our step-by-step guides.
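For illustration, below is a hedged boto3 sketch of how such an Athena query could be issued from Python - the database, table, region and result-bucket names are assumptions, and in practice you may prefer a helper library like pyathena.
import time
import boto3

athena = boto3.client("athena", region_name="eu-central-1")

query = athena.start_query_execution(
    QueryString="SELECT avg(EngineSpeed) FROM candata.can2_enginedata",
    QueryExecutionContext={"Database": "candata"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
query_id = query["QueryExecutionId"]

# Poll until the serverless query completes, then fetch the result rows
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
print(rows)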
There are too many software/API integration examples for us to list - below is a more extensive recap of tools:
Direct integration examples
Below are examples of tools that can directly work with Parquet files:
- MATLAB: Natively supports local or S3 based Parquet data lakes, with powerful support for out-of-memory tall arrays
- Python: Natively supports local or S3 based Parquet data lakes and offers libraries for key interfaces (Athena, ClickHouse etc.)
- Power BI: Supports reading Parquet files from the local filesystem, Azure Blob Storage, and Azure Data Lake Storage Gen2
- Tad: Free desktop tool for viewing and analyzing tabular data incl. Parquet files. Useful for ad hoc review of your data
- Apache Spark: A unified analytics engine for large-scale data processing that supports Parquet files
- Databricks: A platform for massive scale data engineering and collaborative data science, supporting Parquet files
- Tableau: A data visualization tool that can connect to Parquet files through Spark SQL or other connectors
- Apache Hadoop: Supports Parquet file format for HDFS and other storage systems
- PostgreSQL: With the appropriate extensions, it can query Parquet files
- Cloudera: Offers a platform that includes Parquet file support
- Snowflake: A cloud data platform that can load and query Parquet files
- Microsoft SQL Server: Can access Parquet files via PolyBase
- MongoDB: Can import data from Parquet files using specific tools and connectors
- Teradata: Supports querying Parquet files using QueryGrid or other connectors
- Apache Drill: A schema-free SQL Query Engine for Hadoop, NoSQL, and Cloud Storage, which supports Parquet files
- Vertica: An analytics database that can handle Parquet file format
- IBM Db2: Can integrate with tools to load and query Parquet files
Interface based integrations
Below are examples of tools that can integrate via interfaces like Athena, BigQuery, Synapse, ClickHouse etc.:
- Power BI (driver): By installing a JDBC/ODBC driver (for e.g. Athena), you can use SQL to query your data lake
- Excel (driver): By installing a JDBC/ODBC driver (for e.g. Athena), you can use SQL to query your data lake
- Grafana: Offers powerful and elegant dashboards for data visualization, ideal for visualizing decoded CAN/LIN data
- Tableau: Known for its interactive data visualization capabilities, especially popular for business intelligence applications
- Looker: Employs an analytics-oriented application framework, including business intelligence and data exploration features
- Google Data Studio: Customizable reports/dashboards, known for user-friendly design and integration with Google services
- AWS QuickSight: A fast, cloud-powered business intelligence service that integrates easily with e.g. Amazon Athena
- Apache Superset: An open-source data exploration and visualization platform written in Python
- Deepnote: A collaborative data notebook built for teams to discover and share insights
- Zing Data: A data exploration and visualization platform supporting e.g. ClickHouse
- Explo: Customer-facing analytics for any platform, designed for beautiful visualization and engineered for simplicity
- Metabase: An easy-to-use, open source UI tool for asking questions about your data
- Qlik: Offers end-to-end, real-time data integration and analytics solutions, known for the associative exploration user interface
- Domo: Combines a powerful back-end with a user-friendly front-end, ideal for consolidating data systems into one platform
- Sisense: Known for its drag-and-drop user interface, enabling easy creation of complex data models and visualizations
- MicroStrategy: Offers a comprehensive suite of BI tools, emphasizing mobile analytics and hyper-intelligence features
- Splunk: Specializes in processing and analyzing machine-generated big data via a web-style interface
- Exasol: Offers a high-performance, in-memory, MPP database designed for analytics and fast data processing
- Alteryx: Provides an end-to-end platform for data science and analytics, facilitating easy data blending and advanced analytics
- SAP Analytics Cloud: Offers business intelligence, augmented analytics, predictive analytics, and enterprise planning
- IBM Cognos Analytics: Integrates AI to help users visualize, analyze, and share actionable business insights
- GoodData: Provides cloud-based tools for big data and analytics, with a focus on enterprise-level data management and analysis
- Dundas BI: Offers flexible dashboards, reporting, and analytics features, allowing for tailored BI experiences
- Yellowfin BI: Delivers business intelligence tools and a suite of analytics products with collaborative features for sharing insights
- Reveal: Provides embedded analytics and a user-centric design, making data more accessible for decision makers and teams
- Chartio: A cloud-based data exploration tool, known for its ease of use and ability to blend data from multiple sources
Visualize data in Grafana dashboards
Want to create dashboard visualizations across all of your CAN/LIN data?
The CANedge2/CANedge3 is ideal for collecting CAN/LIN data to your own server (cloud or self-hosted). A common requirement for OEMs and system integrators is the ability to create dashboards for visualizing the decoded data. Here, the MF4 decoders can automate the creation of Parquet data lakes at any scale (from MB to TB) stored on S3 - ready for visualization via Grafana dashboards. Learn more in our dashboard article.
Analyze fleet performance in MATLAB/Python
Need to perform advanced large scale analyses of your data?
The CANedge3 lets you record raw CAN data to an SD card and auto-push it to your own S3 server via 3G/4G. Uploaded files can be DBC decoded into a Parquet data lake stored in a separate S3 output bucket. This makes it easy to perform advanced statistical analysis via MATLAB or Python, as both natively support loading Parquet data lakes stored on S3. In turn, this lets you perform advanced analyses - with minimal code. See our script examples to get started.
Quickly analyse data as CSV via Excel
Need to swiftly review your DBC decoded data?
The MF4 decoders are a quick way to understand what can be DBC decoded from your raw CAN/LIN data. By simply drag & dropping your LOG/ folder from the CANedge SD card, you create a copy of the data in DBC decoded CSV form - which can be loaded directly for analysis in Excel. If you wish to perform more efficient analysis of large amounts of data in Excel, you can alternatively use an ODBC driver via e.g. Athena, DuckDB or ClickHouse - enabling efficient out-of-memory analyses.
Create a self-hosted multi-purpose Parquet data lake
Need a 100% self-hosted Parquet data lake - using open source tools only?
If you prefer to self-host everything, you can e.g. deploy a CANedge2/CANedge3 to upload data to your own self-hosted MinIO S3 bucket (100% open source) running on your own Windows/Linux machine (or e.g. a virtual machine in your cloud). You can run a cron job to periodically process new MF4 log files and output the result to your Parquet data lake. The Parquet files can be analysed directly via Python. Further, you can integrate it with an open source tool like ClickHouse for ODBC driver integrations or dashboard visualization via Grafana dashboards.
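As a hedged sketch of the 'cron job' step, the small script below could be scheduled (e.g. every 15 minutes via cron) to decode log files from a locally synced/mounted MinIO bucket into the data lake folder - the paths and the Linux decoder binary name are assumptions, so adapt them to your own setup.
import subprocess

INPUT_DIR = "/data/minio/canedge-upload"     # synced/mounted MinIO input bucket (assumption)
OUTPUT_DIR = "/data/minio/parquet-datalake"  # Parquet data lake output folder (assumption)

# Run the decoder across the input folder; the DBC file(s) are assumed to be
# placed next to the executable, as in the drag & drop use case
subprocess.run(["/opt/mf4-decoders/mdf2parquet_decode", "-i", INPUT_DIR, "-O", OUTPUT_DIR], check=True)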
FAQ
Yes, you control 100% how you create and store your CSV/Parquet data lake.
In our examples, we frequently base things on a setup where your CANedge2/CANedge3 uploads data to an AWS S3 bucket - with uploaded data automatically processed via AWS Lambda functions. This is a common setup that we provide plug & play solutions for - hence it will often be the simplest way to deploy your data processing and data lake.
However, you can set this up in any way you want. For example, you might store your uploaded log files on a self-hosted MinIO S3 bucket instead. In such a scenario, you can periodically process new MF4 log files manually (e.g. via drag & drop or the CLI) to update your data lake - or you can set up e.g. a cron job or similar service to handle this. The data lake can be stored in another MinIO S3 bucket - and you can then directly work with the data lake from here (e.g. in MATLAB/Python) or integrate the data using an open source system like ClickHouse or DuckDB.
The same principle applies if you upload data to Google Cloud Storage or Azure Blob Storage (via an S3 gateway) - here you can use their native data processing services to deploy the MF4 decoders if you wish to fully automate the DBC decoding of incoming data. We do not provide plug & play solutions for deploying this, however.
Of course, you can also simply use the MF4 decoders locally to create a locally stored Parquet data lake. This will often suffice if you're e.g. using a CANedge1 to record your CAN/LIN data - and you simply wish to process this data on your own PC. In such use cases, the Parquet data lake can still be a powerful tool, since it makes it much easier to perform large-scale data processing compared to tools like the asammdf GUI.
We provide two types of MF4 executables for use with the CANedge: The MF4 converters and MF4 decoders.
The MF4 converters let you convert the MDF log files to other formats like Vector ASC, PEAK TRC and CSV. These converters do not perform any form of DBC decoding of the raw CAN/LIN data - they only change the file format.
The MF4 decoders are very similar in functionality to the MF4 converters. However, these executables DBC decode the log files to physical values, outputting them as either CSV or Parquet files. When using the MF4 decoders, you provide your own DBC file(s) to enable the decoding. These executables are ideal if your goal is to analyse the data in human-readable form and/or create 'data lakes' for analysing/visualizing the data at scale.
No, you simply download the executables - no installation is required.
As outlined above, there are almost limitless options for how you deploy the MF4 decoders, how you store the resulting data lake, how you provide interfaces for it - and which software/API tools you integrate it with.
You can use the CANedge and our MF4 decoders to facilitate any of these deployment setups. However, our team offers step-by-step guides and technical support only for a limited subset of deployments, such as our Grafana-Athena integration.
Need an interoperable CAN logger?
Get your CANedge today!