<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Cloudflight Engineering Blog]]></title><description><![CDATA[Cloudflight Engineering Blog]]></description><link>https://engineering.cloudflight.io</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1651840848260/Lykf4cLJC.png</url><title>Cloudflight Engineering Blog</title><link>https://engineering.cloudflight.io</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 10 Apr 2026 17:46:53 GMT</lastBuildDate><atom:link href="https://engineering.cloudflight.io/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[How to setup Universal Links for iOS in React Native]]></title><description><![CDATA[TL;DR
This post explains how iOS Universal Links work and what is required to implement them. It covers how to handle incoming links in an iOS app, how to create and configure the Apple App Site Association (AASA) file, and how to correctly host the ...]]></description><link>https://engineering.cloudflight.io/how-to-setup-universal-links-for-ios-in-react-native</link><guid isPermaLink="true">https://engineering.cloudflight.io/how-to-setup-universal-links-for-ios-in-react-native</guid><category><![CDATA[universal links]]></category><category><![CDATA[iOS]]></category><category><![CDATA[React Native]]></category><dc:creator><![CDATA[Dragos Neghina]]></dc:creator><pubDate>Wed, 21 Jan 2026 07:22:50 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1769081180270/227e1236-d522-4a9e-b047-ab7f62c62e8f.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-tldr"><strong>TL;DR</strong></h1>
<p>This post explains how iOS <strong>Universal Links</strong> work and what is required to implement them. It covers how to handle incoming links in an iOS app, how to create and configure the <strong>Apple App Site Association (AASA) file</strong>, and how to correctly host the file so that iOS can securely associate a domain with an app and route matching URLs from the web into the app.</p>
<h2 id="heading-who-should-read-the-post"><strong>Who should read the post?</strong></h2>
<ul>
<li><p><strong>Software Developers</strong> who plan to integrate this feature into their app.</p>
</li>
<li><p><strong>People in tech</strong> with little to no experience on <strong>Universal Links</strong> who want to learn something new.</p>
</li>
</ul>
<hr />
<h1 id="heading-introduction"><strong>Introduction</strong></h1>
<p>Have you ever tapped on an ad or email link and been taken directly to the desired application without seeing a website or pop-up asking for permission? That seamless transition from the web to the app isn't magic; it's powered by <strong>Universal Links</strong>.</p>
<p><strong>Universal Links</strong> allow iOS (Android offers the equivalent feature as <strong>App Links</strong>) to open your app automatically when a user taps a specific URL belonging to a domain you control. If the app is installed, the link opens the app. If it isn't, the same link simply opens in the browser. This behaviour makes them a powerful tool for ads, emails, deep navigation, onboarding flows, and the overall user experience.</p>
<p>Despite how seamless they feel to users, Universal Links require careful configuration across your iOS app and backend. For developers encountering them for the first time, the setup process can be surprisingly strict and easy to get wrong.</p>
<p>This post will cover the steps required to make <strong>Universal Links</strong> work on iOS:</p>
<ul>
<li><p>Configuring your iOS app to handle incoming links.</p>
</li>
<li><p>Creating the <strong>Apple App Site Association</strong> <strong>(AASA) file</strong>.</p>
</li>
<li><p>Hosting the <strong>AASA file</strong> under the <strong>/.well-known</strong> path.</p>
</li>
</ul>
<p>By the end, you will have a better understanding of how to set up an application to support <strong>Universal Links</strong>, which will give you a head start if you plan on integrating this feature into your software.</p>
<hr />
<h1 id="heading-configuring-your-ios-app-to-handle-incoming-links"><strong>Configuring your iOS app to handle incoming links</strong></h1>
<p>The first step is ensuring that your iOS app knows how to react when someone taps a <strong>Universal Link</strong>. Although Universal Links originate on the web, iOS redirects them to the app via the <strong>App Delegate</strong>. This makes the <strong>App Delegate</strong> the central point where your app decides how to handle an incoming URL. Options include opening a specific screen, triggering navigation, or passing parameters deeper into your <strong>React Native</strong> layer.</p>
<p>When a <strong>Universal Link</strong> is activated, iOS calls two native methods:</p>
<ul>
<li><p><code>scene(_:continue:)</code> for apps using the modern Scenes API</p>
</li>
<li><p><code>application(_:continue:restorationHandler:)</code> for older setups</p>
</li>
</ul>
<p>Your job is to implement at least one of these and route the received URL into your code.</p>
<p>Here’s a simplified example using the <strong>SceneDelegate</strong>, which is common in <strong>React Native</strong> projects:</p>
<pre><code class="lang-swift"><span class="hljs-comment">// SceneDelegate.swift</span>
<span class="hljs-function"><span class="hljs-keyword">func</span> <span class="hljs-title">scene</span><span class="hljs-params">(<span class="hljs-number">_</span> scene: UIScene, <span class="hljs-keyword">continue</span> userActivity: NSUserActivity)</span></span> {
    <span class="hljs-keyword">guard</span> userActivity.activityType == <span class="hljs-type">NSUserActivityTypeBrowsingWeb</span>,
          <span class="hljs-keyword">let</span> incomingURL = userActivity.webpageURL <span class="hljs-keyword">else</span> { <span class="hljs-keyword">return</span> }

    <span class="hljs-comment">// Pass the link to React Native</span>
    <span class="hljs-type">RCTLinkingManager</span>.application(<span class="hljs-type">UIApplication</span>.shared,
                                  <span class="hljs-keyword">open</span>: incomingURL,
                                  options: [:])
}
</code></pre>
<p>If your project still relies on the <strong>AppDelegate</strong>, the equivalent version looks like:</p>
<pre><code class="lang-objectivec"><span class="hljs-comment">// AppDelegate.m (Objective-C)</span>
- (BOOL)application:(UIApplication *)application
            continueUserActivity:(NSUserActivity *)userActivity
              restorationHandler:(<span class="hljs-built_in">void</span> (^)(NSArray *))restorationHandler
{
  <span class="hljs-keyword">return</span> [RCTLinkingManager application:application
                      continueUserActivity:userActivity
                        restorationHandler:restorationHandler];
}
</code></pre>
<p>This implementation ensures your <strong>React Native</strong> app receives the link through the Linking API, where you can listen for it:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { Linking } <span class="hljs-keyword">from</span> <span class="hljs-string">'react-native'</span>;

Linking.addEventListener(<span class="hljs-string">'url'</span>, <span class="hljs-function">(<span class="hljs-params">{ url }</span>) =&gt;</span> {
  <span class="hljs-comment">// Handle navigation here</span>
});
</code></pre>
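<p>Once the URL reaches JavaScript, you typically map it to a screen in your navigator. Below is a minimal, framework-agnostic sketch of such a mapping; the screen names and the <code>/products/*</code> path are illustrative placeholders, not part of any library API:</p>
<pre><code class="lang-javascript">// Map an incoming Universal Link to an app route.
// The screen names ('ProductDetails', 'Home') are illustrative only.
function routeForUrl(url) {
    const pathname = new URL(url).pathname;
    const match = pathname.match(/^\/products\/([^/]+)$/);
    if (match) {
        return { screen: 'ProductDetails', params: { id: match[1] } };
    }
    return { screen: 'Home', params: {} };
}
</code></pre>
<p>You would call <code>routeForUrl(url)</code> inside the listener above and hand the result to your navigation library.</p>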
<p>At this point, the native layer is ready — your app can now receive <strong>Universal Links</strong>.<br />Next, we’ll prepare the website side of the handshake by creating the <strong>AASA file</strong> so iOS knows your app is authorised to handle those links.</p>
<hr />
<h1 id="heading-creating-the-apple-app-site-association-aasa-file"><strong>Creating the Apple App Site Association (AASA) File</strong></h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768901210758/cebfa559-aabf-4a15-a8a4-f2bb4fb96bb9.png" alt /></p>
<p>Now that the app can receive <strong>Universal Links</strong>, the next step is to convince iOS that your domain is allowed to open your app. This is where the <strong>AASA file</strong> comes in.</p>
<p>At a high level, the <strong>AASA file</strong> is a JSON document on your website that tells iOS:</p>
<ul>
<li><p>which app(s) belong to your domain</p>
</li>
<li><p>which URL paths those apps are allowed to handle</p>
</li>
<li><p>whether you support features like Shared Web Credentials or <strong>Universal Links</strong></p>
</li>
</ul>
<p>When a user installs your app, iOS automatically fetches the file from your domain. If the file is valid, correctly formatted, and located in the expected place, iOS registers your app as the handler for all matching paths.</p>
<hr />
<h2 id="heading-aasa-file-structure"><strong>AASA File Structure</strong></h2>
<p>Below is the minimal structure iOS expects in an <code>apple-app-site-association</code> file and how each section influences link handling:</p>
<pre><code class="lang-json">{
  <span class="hljs-attr">"applinks"</span>: {
    <span class="hljs-attr">"details"</span>: [
      {
        <span class="hljs-attr">"appIDs"</span>: [<span class="hljs-string">"ABCDE12345.com.example.app"</span>],
        <span class="hljs-attr">"paths"</span>: [<span class="hljs-string">"/products/*"</span>, <span class="hljs-string">"/profile/*"</span>]
      }
    ]
  }
}
</code></pre>
<p>The <code>paths</code> array lists the URL paths on your domain that should trigger the <strong>Universal Link</strong> behaviour (the <code>*</code> wildcard matches any sequence of characters).</p>
<p>The <strong>appID</strong> is formed as <code>&lt;Application Identifier Prefix&gt;.&lt;Bundle Identifier&gt;</code>.<br />The <code>Application Identifier Prefix</code> is visible in the <strong>Apple Developer</strong> portal under Membership or in <strong>Xcode</strong> by choosing your project → Signing &amp; Capabilities → selecting your team. The <code>Bundle Identifier</code> comes from the same <strong>Xcode</strong> section and uniquely identifies your app (e.g., <strong>com.example.app</strong>).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768901785821/c9ed2ad9-77d7-417b-9313-4fcfdad709d5.png" alt /></p>
<hr />
<h2 id="heading-add-the-associated-domains-entitlement-to-your-app"><strong>Add the Associated Domains Entitlement to Your App</strong></h2>
<p>iOS will only validate the <strong>AASA file</strong> if your app explicitly declares the <strong>domain</strong> <strong>associated</strong> with it. To do this, enable the <strong>Associated Domains</strong> capability in <strong>Xcode</strong> and add an entry such as:</p>
<pre><code class="lang-plaintext">applinks:&lt;your-domain&gt;.com
</code></pre>
<p>This links your app to the domain so iOS knows where to retrieve and verify your <strong>AASA file.</strong></p>
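<p>Behind the scenes, <strong>Xcode</strong> stores this capability in your target's <code>.entitlements</code> file. A minimal sketch, with a placeholder domain:</p>
<pre><code class="lang-xml">&lt;key&gt;com.apple.developer.associated-domains&lt;/key&gt;
&lt;array&gt;
    &lt;string&gt;applinks:your-domain.com&lt;/string&gt;
&lt;/array&gt;
</code></pre>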
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768908058551/38350c5b-4265-4764-8c9f-20f1717bb84c.png" alt /></p>
<hr />
<h1 id="heading-hosting-the-aasa-file-under-the-well-known-path"><strong>Hosting the AASA File under the</strong> <code>/.well-known</code> <strong>Path</strong></h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1768903303505/7731ffef-b0fb-4ffd-92a4-fb5a5bc4d130.png" alt /></p>
<p>For iOS to recognise your <strong>Universal Links</strong>, the <strong>AASA file</strong> must be publicly accessible at:</p>
<pre><code class="lang-plaintext">https://your-domain.com/.well-known/apple-app-site-association
</code></pre>
<p>iOS automatically fetches this file during app installation to verify the domain-app association. Hosting it in any other path or behind redirects will cause verification to fail, so serving it correctly is critical for <strong>Universal Links</strong> to work.</p>
<p>The file must be served directly, without any HTTP redirects (including <code>301</code>, <code>302</code>, or <code>307</code>). iOS will not follow redirects when attempting to fetch the <strong>AASA file</strong>, and even a redirect from <code>http</code> to <code>https</code> can cause the verification to fail.</p>
<p>Additionally, the response must be served with the correct <strong>Content-Type</strong> header:</p>
<pre><code class="lang-plaintext">Content-Type: application/json
</code></pre>
<p>The <strong>AASA file</strong> must be accessible over <strong>HTTPS</strong>, contain valid JSON, and be served as a raw file—<strong>not</strong> wrapped in HTML, compressed, or generated dynamically in a way that alters the response headers.</p>
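<p>To make these requirements concrete, here is an illustrative sketch of a plain <strong>Node.js</strong> handler that serves the file the way iOS expects: a direct <code>200</code> response, <code>Content-Type: application/json</code>, and no redirects. The app ID and paths are the placeholders from the earlier example, and this is only a sketch, not a recommendation for any particular hosting setup:</p>
<pre><code class="lang-javascript">// Illustrative only: serve the AASA file directly, with the exact
// Content-Type iOS expects and without any redirects.
const http = require('http');

const aasa = JSON.stringify({
    applinks: {
        details: [
            {
                appIDs: ['ABCDE12345.com.example.app'],
                paths: ['/products/*', '/profile/*']
            }
        ]
    }
});

const server = http.createServer(function (req, res) {
    if (req.url === '/.well-known/apple-app-site-association') {
        // Raw JSON body: no HTML wrapper, no 301/302, no compression.
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(aasa);
    } else {
        res.writeHead(404);
        res.end();
    }
});
</code></pre>
<p>A quick way to verify any hosting setup is <code>curl -I https://your-domain.com/.well-known/apple-app-site-association</code>: the response should be a direct <code>200</code> with <code>application/json</code> and no <code>Location</code> header.</p>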
<p>Because the requirements are strict and vary depending on your hosting provider or <strong>backend setup</strong> (e.g. Nginx, Apache, S3, Firebase Hosting, or a custom server), it’s easy to get this step wrong even if the file itself is correct.</p>
<p>In later posts, we’ll walk through concrete, step-by-step examples of hosting the <strong>AASA file</strong> correctly on different infrastructures and how to debug common issues when iOS fails to recognise your <strong>Universal Links</strong>.</p>
<hr />
<h1 id="heading-conclusion"><strong>Conclusion</strong></h1>
<p>Although <strong>Universal Links</strong> can seamlessly transition users from the web to your app, they only work when all the necessary components are correctly configured. Once your app and domain are properly associated, iOS can reliably determine whether to open your app or fall back to the browser when a link is clicked.</p>
<p>This post focuses on the core concepts needed to understand how <strong>Universal Links</strong> work. Follow-up posts will provide concrete setup examples and highlight common mistakes to avoid when implementing them.</p>
]]></content:encoded></item><item><title><![CDATA[Choosing the Right Test Automation Design Pattern: Page Object Model, Flow Model Pattern, or Screenplay Pattern?]]></title><description><![CDATA[TL;DR
The Page Object Model (POM) is the industry standard, but it is not easily scalable. The Flow Model Pattern improves upon the basic POM, offering a perfectly balanced solution that is not overly complex yet still scalable. The Screenplay Patter...]]></description><link>https://engineering.cloudflight.io/choosing-the-right-test-automation-design-pattern-page-object-model-flow-model-pattern-or-screenplay-pattern</link><guid isPermaLink="true">https://engineering.cloudflight.io/choosing-the-right-test-automation-design-pattern-page-object-model-flow-model-pattern-or-screenplay-pattern</guid><category><![CDATA[test-automation]]></category><category><![CDATA[test automation framework]]></category><category><![CDATA[Testing]]></category><category><![CDATA[testing framework]]></category><category><![CDATA[end to end testing]]></category><category><![CDATA[End-to-End]]></category><category><![CDATA[Software Testing]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Tomasz Buga]]></dc:creator><pubDate>Tue, 16 Dec 2025 10:40:56 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/26MJGnCM0Wc/upload/87645c69804169276568e03da5e0c24b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-tldr">TL;DR</h1>
<p>The <strong>Page Object Model</strong> <strong>(POM)</strong> is the industry standard, but it is not easily scalable. The <strong>Flow Model Pattern</strong> improves upon the basic <strong>POM</strong>, offering a perfectly balanced solution that is not overly complex yet still scalable. The <strong>Screenplay Pattern</strong> is best suited for highly complex solutions and requires test engineers to have more technical knowledge. Therefore, before implementing it, ensure that your team can maintain a more abstract framework. If that’s not possible, either hold programming workshops or stick with the <strong>Flow Model Pattern</strong>.</p>
<h1 id="heading-preface">Preface</h1>
<h2 id="heading-who-should-read-this-article"><strong>Who should read this article?</strong></h2>
<ul>
<li><p>Test Automation Engineers, Technical Architects, and Software Developers seeking optimal end-to-end test automation solutions</p>
</li>
<li><p>Anyone else involved or interested in developing an end-to-end test automation strategy (e.g. Test Managers).</p>
</li>
</ul>
<h2 id="heading-what-is-this-article-about">What is this article about?</h2>
<p>After years of working with different test automation frameworks, I've learned that choosing the right design pattern isn't about finding the <strong><em>best</em></strong> <strong><em>one</em></strong>—it's about finding the <strong><em>right</em></strong> <strong><em>fit</em></strong> for your project and team.</p>
<p>The three primary patterns are:</p>
<ul>
<li><p><a target="_blank" href="https://www.selenium.dev/documentation/test_practices/encouraged/page_object_models/">Page Object Model</a></p>
</li>
<li><p><a target="_blank" href="https://www.peterfoldhazi.com/flow-model-pattern">Flow Model Pattern</a> (or <a target="_blank" href="https://www.browserstack.com/guide/design-patterns-in-automation-framework#toc3">Facade Design Pattern</a>)</p>
</li>
<li><p><a target="_blank" href="https://serenity-js.org/handbook/design/screenplay-pattern/">Screenplay Pattern</a></p>
</li>
</ul>
<p>This article explores when each pattern is most appropriate, drawing from my hands-on experience building and maintaining automation frameworks at different scales. I'll start with an overview of each approach, then guide you toward choosing the pattern that best fits your needs.</p>
<hr />
<h1 id="heading-api-based-model-a-universal-best-practice"><strong>API-Based Model: A Universal Best Practice</strong></h1>
<p>Before exploring different design patterns, I want to highlight a principle that applies to all of them: <strong>the API-based approach</strong> to test automation.</p>
<p>My colleague <strong>Jovan Ilić</strong> advocates for this in his article <em>"</em><a target="_blank" href="https://engineering.cloudflight.io/test-automation-api-based-model"><em>Test Automation: API-based Model</em></a><em>"</em>, and I strongly recommend it regardless of which pattern you choose.</p>
<h3 id="heading-the-core-principle"><strong>The Core Principle</strong></h3>
<blockquote>
<p><strong>Perform any action in the UI once; every other time, use the API</strong></p>
</blockquote>
<p>Validate that the interface works, but use faster, more reliable API calls for test setup, navigation, and state management in subsequent tests.</p>
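<p>As a minimal sketch of the principle, the helper below performs session setup through an injected <code>httpPost</code> function (a stand-in for your real API client; the endpoint and token shape are placeholders), so only the dedicated login test ever touches the login form:</p>
<pre><code class="lang-javascript">// Illustrative only: the endpoint, payload, and token shape are
// placeholders for whatever your backend actually exposes.
async function createSession(httpPost, user) {
    const response = await httpPost('/api/login', user);
    return { Authorization: 'Bearer ' + response.token };
}

// One UI test validates the login form itself; every other test
// calls createSession(...) in its setup and skips the form entirely.
</code></pre>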
<h3 id="heading-why-it-matters"><strong>Why It Matters</strong></h3>
<p>This approach delivers significant benefits with any pattern:</p>
<ul>
<li><p>Faster test execution</p>
</li>
<li><p>Reduced flakiness</p>
</li>
<li><p>Better test isolation</p>
</li>
<li><p>Easier maintenance</p>
</li>
<li><p>More focused UI coverage</p>
</li>
</ul>
<p>Whether you use <strong>Page Object Model, Flow Model Pattern,</strong> or <strong>Screenplay Pattern,</strong> consider which interactions genuinely require UI validation and which can be handled more efficiently through APIs. This pragmatic approach will improve your test suite regardless of the architectural pattern you choose.</p>
<hr />
<h1 id="heading-page-object-model">Page Object Model</h1>
<h2 id="heading-overview">Overview</h2>
<p>The <a target="_blank" href="https://www.selenium.dev/documentation/test_practices/encouraged/page_object_models/"><strong>Page Object Model</strong> <strong>(POM)</strong></a> is the foundational design pattern that most test automation engineers learn first. The core idea is simple: separate page interaction logic from test scripts by encapsulating UI elements and their actions into dedicated page classes/files.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765358513151/5ded68f9-4cfc-4cc3-83aa-1e40165d6a67.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-description">Description</h2>
<p>The <strong>Page Object Model</strong> remains my preferred starting point for most projects because of its proven effectiveness and universal recognition. This widespread adoption significantly shortens the ramp-up period for newcomers, as they can quickly adapt to a new codebase without learning an entirely new architectural approach.</p>
<p>While <strong>POM</strong> has its limitations (particularly as applications grow in complexity), its simplicity and directness make it an excellent foundation. Think of it as the baseline: straightforward to implement, easy to understand, and flexible enough to evolve as your project's needs change.</p>
<h2 id="heading-code-example">Code Example</h2>
<pre><code class="lang-javascript"><span class="hljs-comment">// For the sake of simplicity and framework-agnosticity</span>
<span class="hljs-comment">// we're skipping the selectors/locators definitions and functions</span>
<span class="hljs-comment">// that are mentioned on the picture, but not used within test case</span>

<span class="hljs-comment">// ===========</span>
<span class="hljs-comment">// Test Script</span>
<span class="hljs-comment">// ===========</span>
it(<span class="hljs-string">'should add yellow t-shirt to cart'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">await</span> ProductsPage.openProductDetails(<span class="hljs-string">'tshirt-yellow-medium'</span>);
    <span class="hljs-keyword">await</span> ProductDetailsPage.clickAddButton();

    <span class="hljs-keyword">const</span> cartCount = <span class="hljs-keyword">await</span> ProductDetailsPage.getCartCount();
    expect(cartCount).toBe(<span class="hljs-number">1</span>);
});

<span class="hljs-comment">// =============</span>
<span class="hljs-comment">// Products Page</span>
<span class="hljs-comment">// =============</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ProductsPage = {
    <span class="hljs-keyword">async</span> openProductDetails(id) { ... }
}

<span class="hljs-comment">// ====================</span>
<span class="hljs-comment">// Product Details Page</span>
<span class="hljs-comment">// ====================</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ProductDetailsPage = {
    <span class="hljs-keyword">async</span> clickAddButton() { ... },
    <span class="hljs-keyword">async</span> getCartCount() { ... }
}
</code></pre>
<h2 id="heading-summary">Summary</h2>
<p>✅ <strong>PROS</strong></p>
<ul>
<li><p>Widely adopted standard – most common pattern in test automation</p>
</li>
<li><p>Very intuitive – easy to understand for newcomers</p>
</li>
<li><p>Strong community support – abundant tutorials, examples, and help available</p>
</li>
<li><p>Flexible across all testing phases – DEV, SIT, UAT, etc.</p>
</li>
<li><p>Reduces code duplication – reusable page classes across multiple tests</p>
</li>
<li><p>Good separation of concerns – test logic separate from locators and interaction logic</p>
</li>
</ul>
<p>❌ <strong>DOWNSIDES</strong></p>
<ul>
<li><p>In its pure form, it can get hard to maintain – repetitive code; page classes can become bloated (hundreds/thousands of lines)</p>
</li>
<li><p>Violates SOLID principles – particularly Single Responsibility Principle</p>
</li>
<li><p>Less efficient for applications where API testing would be more appropriate than UI testing</p>
</li>
<li><p>UI-centric thinking – encourages testing everything through the UI</p>
</li>
<li><p>Lack of standardisation – implementations vary widely between teams</p>
</li>
<li><p>Poor scalability – becomes increasingly difficult to manage as the application grows</p>
</li>
</ul>
<hr />
<h1 id="heading-flow-model-pattern">Flow Model Pattern</h1>
<h2 id="heading-overview-1">Overview</h2>
<p><a target="_blank" href="https://www.peterfoldhazi.com/flow-model-pattern"><strong>Flow Model Pattern</strong></a> extends <strong>Page Object Model</strong> by adding a workflow abstraction layer between tests and page objects. This layer captures reusable business workflows, reducing code duplication and keeping both page objects and test scripts clean and focused.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765361523508/a3a5af8b-db81-4841-82e3-6cb11555f036.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-description-1">Description</h2>
<p>While developing my latest test automation framework, I found myself caught between two extremes: <strong>POMs</strong> felt too simplistic and led to duplication, while the <strong>Screenplay Pattern</strong> seemed too complex for a deadline-sensitive project where the team lacked experience with advanced patterns.</p>
<p>I needed something that combined the best of both worlds—as simple and straightforward as <strong>Page Object Model</strong>, but with better reusability. The solution was to add a single abstraction layer that would keep <strong>POMs</strong> focused on what they do best (mapping application pages) without the complexity of a full <strong>Screenplay</strong> implementation.</p>
<p>I called this approach <strong><em>User Steps</em></strong>, only to discover later while reading the <a target="_blank" href="https://istqb.org/certifications/certified-tester-advanced-level-test-automation-engineering-ctal-tae-v2-0/"><em>ISTQB Test Automation Engineering Syllabus</em></a> that it already had a name: <strong>Flow Model Pattern</strong>. Turns out I wasn't as original as I thought!</p>
<p>The last thing I want to mention about the <strong>Flow Model Pattern</strong> is that it isn't (very well) standardised, so you have complete flexibility to adapt it to your needs. This makes it particularly well-suited for small to medium teams who want better structure than pure <strong>POM</strong> without the learning curve of <strong>Screenplay Pattern</strong>.</p>
<h2 id="heading-code-example-1">Code Example</h2>
<pre><code class="lang-javascript"><span class="hljs-comment">// For the sake of simplicity and framework-agnosticity</span>
<span class="hljs-comment">// we're skipping the selectors/locators definitions and functions</span>
<span class="hljs-comment">// that are mentioned on the picture, but not used within test case</span>

<span class="hljs-comment">// ===========</span>
<span class="hljs-comment">// TEST SCRIPT</span>
<span class="hljs-comment">// ===========</span>
test(<span class="hljs-string">'should add 20 t-shirts to cart'</span>, <span class="hljs-keyword">async</span> () =&gt; {    
    <span class="hljs-keyword">await</span> ShoppingFlow.addDiverseTShirts(<span class="hljs-number">20</span>);

    <span class="hljs-keyword">const</span> cartCount = <span class="hljs-keyword">await</span> ShoppingFlow.getCartItemCount();
    expect(cartCount).toBe(<span class="hljs-number">20</span>);
});

<span class="hljs-comment">// =============</span>
<span class="hljs-comment">// SHOPPING FLOW</span>
<span class="hljs-comment">// =============</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ShoppingFlow = {
    <span class="hljs-keyword">async</span> addProductToCart(productId) {
        <span class="hljs-keyword">await</span> ProductsPage.openProductDetails(productId);
        <span class="hljs-keyword">await</span> ProductDetailsPage.clickAddButton();
        <span class="hljs-keyword">await</span> ProductDetailsPage.goBackToProductsPage();
    },
    <span class="hljs-keyword">async</span> addMultipleProducts(productIds) {
        <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> id <span class="hljs-keyword">of</span> productIds) {
            <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.addProductToCart(id);
        }
    },
    <span class="hljs-keyword">async</span> getCartItemCount() {
        <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> ProductDetailsPage.getCartCount();
    },
    <span class="hljs-keyword">async</span> addDiverseTShirts(count) {
        <span class="hljs-keyword">const</span> tshirts = TShirtFactory.createDiverseSet(count);
        <span class="hljs-keyword">const</span> productIds = tshirts.map(<span class="hljs-function"><span class="hljs-params">shirt</span> =&gt;</span> shirt.id);
        <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.addMultipleProducts(productIds);
    }
}

<span class="hljs-comment">// ================================================================</span>
<span class="hljs-comment">// TEST DATA FACTORY - Clean way to handle the Test Data generation</span>
<span class="hljs-comment">// ================================================================</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> TShirtFactory = {
    create(color, size) {
        <span class="hljs-keyword">const</span> id = <span class="hljs-string">`tshirt-<span class="hljs-subst">${color.toLowerCase()}</span>-<span class="hljs-subst">${size.toLowerCase()}</span>`</span>;
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">id</span>: id,
            <span class="hljs-attr">name</span>: <span class="hljs-string">`T-Shirt Model <span class="hljs-subst">${color}</span> <span class="hljs-subst">${size}</span>`</span>,
            <span class="hljs-attr">color</span>: color,
            <span class="hljs-attr">size</span>: size
        };
    },
    createDiverseSet(count) {
        <span class="hljs-keyword">const</span> colors = [<span class="hljs-string">'yellow'</span>, <span class="hljs-string">'black'</span>, <span class="hljs-string">'blue'</span>, <span class="hljs-string">'white'</span>, <span class="hljs-string">'green'</span>, <span class="hljs-string">'turquoise'</span>];
        <span class="hljs-keyword">const</span> sizes = [<span class="hljs-string">'small'</span>, <span class="hljs-string">'medium'</span>, <span class="hljs-string">'large'</span>];

        <span class="hljs-keyword">return</span> <span class="hljs-built_in">Array</span>(count).fill(<span class="hljs-literal">null</span>).map(<span class="hljs-function">(<span class="hljs-params">_, index</span>) =&gt;</span> {
            <span class="hljs-keyword">const</span> color = colors[index % colors.length];
            <span class="hljs-keyword">const</span> size = sizes[index % sizes.length];
            <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.create(color, size);
        });
    }
};

<span class="hljs-comment">// =============</span>
<span class="hljs-comment">// PRODUCTS PAGE</span>
<span class="hljs-comment">// =============</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ProductsPage = {
    <span class="hljs-keyword">async</span> openProductDetails(id) { ... }
}

<span class="hljs-comment">// ====================</span>
<span class="hljs-comment">// PRODUCT DETAILS PAGE</span>
<span class="hljs-comment">// ====================</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> ProductDetailsPage = {
    <span class="hljs-keyword">async</span> clickAddButton() { ... },
    <span class="hljs-keyword">async</span> getCartCount() { ... },
    <span class="hljs-keyword">async</span> goBackToProductsPage() { ... }
}
</code></pre>
<h3 id="heading-pros">✅ PROS</h3>
<ul>
<li><p>Pragmatic middle ground – balances simplicity with scalability</p>
</li>
<li><p>Reduces code duplication – workflows centralised in one place</p>
</li>
<li><p>Keeps Page Objects clean – POMs focus on page interactions only</p>
</li>
<li><p>Easy to learn – intuitive concept, quick onboarding</p>
</li>
<li><p>Improves test readability – tests focus on WHAT, not HOW</p>
</li>
<li><p>Quick to implement – can be added incrementally to existing frameworks</p>
</li>
<li><p>No framework dependency – works with any tool</p>
</li>
<li><p>Scales well for medium projects – handles growth without overwhelming complexity</p>
</li>
</ul>
<h3 id="heading-downsides">❌ DOWNSIDES</h3>
<ul>
<li><p>Not standardised – no official specification or industry consensus</p>
</li>
<li><p>Boundary decisions – requires judgment on what goes in Flows vs Pages</p>
</li>
<li><p>Limited documentation – far fewer resources than POM or Screenplay</p>
</li>
<li><p>Can violate SOLID – risk of becoming a logic "dumping ground"</p>
</li>
<li><p>Less structure – team must establish own conventions</p>
</li>
<li><p>Risk of over-abstraction – can create unnecessary complexity if used too much</p>
</li>
</ul>
<hr />
<h1 id="heading-screenplay-pattern">Screenplay Pattern</h1>
<h2 id="heading-overview-2">Overview</h2>
<p>The <a target="_blank" href="https://serenity-js.org/handbook/design/screenplay-pattern/"><strong>Screenplay Pattern</strong></a> shifts from page-centric (or UI-centric) to actor-centric testing. Tests describe <strong><em>WHO</em></strong> performs <strong><em>WHAT</em></strong> actions to achieve their goals, using <em>abilities</em>, <em>tasks</em>, <em>interactions</em>, and <em>questions</em>. This actor-based approach excels in complex, multi-interface scenarios.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1765450364979/5a0a0663-971d-44d3-bad6-a6f37b73e77f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-description-2">Description</h2>
<p>Throughout my career working with all major test automation frameworks, I've observed an interesting pattern: while the <strong>Screenplay Pattern</strong> is well-known in the QA community, practical implementation experience remains relatively scarce. I've personally implemented it once in a production environment, and that experience—combined with extensive research—has given me valuable insights into why adoption remains limited.</p>
<p>This implementation gap isn't unique to my experience. The test automation community frequently grapples with understanding when and how to apply different patterns, as evidenced by ongoing discussions like the Reddit thread <strong><em>"</em></strong><a target="_blank" href="https://www.reddit.com/r/QualityAssurance/comments/1b9we3p/i_dont_get_the_screenplay_pattern/"><em>I don't get the ScreenPlay Pattern</em></a><strong><em>"</em></strong>. What's missing isn't just technical documentation—it's practical guidance on pattern selection based on real-world constraints and team capabilities.</p>
<p>This knowledge gap has significant implications. Many teams default to the <strong>Page Object Model</strong> and extend it with their best judgment—not because it's the <em>optimal fit</em>, but because it's <em>familiar</em>. Meanwhile, projects that could benefit from more sophisticated approaches continue using patterns that ultimately hinder their scalability and maintainability. My goal with this article is to bridge that gap by providing the contextual decision-making framework that's currently absent from most automation discussions—along with clear visual explanations that may finally help you <em>“get the Screenplay Pattern”</em>.</p>
<h3 id="heading-why-screenplay-is-different"><strong>Why Screenplay Is Different</strong></h3>
<p>The <strong>Screenplay Pattern</strong> isn't just “<em>POM with extra layers</em>”—it's a fundamentally different way of thinking about test automation. Instead of organising code around pages and web elements, you organise around actors (users or systems) and their intentions.</p>
<p>Here's what makes it unique:</p>
<ul>
<li><p>Actor-centric thinking: Tests read like user stories ("James logs in and adds a product to cart")</p>
</li>
<li><p>Separation of concerns: <em>Abilities</em>, <em>Tasks</em>, <em>Interactions</em>, and <em>Questions</em> each have single, clear responsibilities</p>
</li>
<li><p>Composability: Small, reusable pieces combine to create complex workflows</p>
</li>
<li><p>Multi-interface support: The same actor can interact with UI, API, and database seamlessly</p>
</li>
<li><p>Scalability: Architecture that handles enterprise complexity without becoming unwieldy</p>
</li>
</ul>
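<p>To make the multi-interface bullet concrete, here is a minimal, hypothetical sketch of an actor that holds several abilities at once. The names (<code>actorNamed</code>, <code>abilityTo</code>, <code>CallAnApi</code>) are assumptions for illustration, not the API of any particular Screenplay framework.</p>
```javascript
// Hypothetical multi-ability actor: abilities are looked up by name, so the
// same actor can perform UI tasks and API tasks within one scenario.
function actorNamed(name, abilities) {
    return {
        name,
        abilityTo(type) {
            const ability = abilities[type];
            if (!ability) throw new Error(`${name} cannot ${type}`);
            return ability;
        },
        async attemptsTo(...tasks) {
            for (const task of tasks) {
                await task.performAs(this);
            }
        },
    };
}

// Tasks pick whichever ability they need from the actor.
const AddProductViaApi = {
    withId(productId) {
        return {
            async performAs(actor) {
                await actor.abilityTo('CallAnApi').post('/cart', { productId });
            },
        };
    },
};
```
<p>A UI task would fetch its ability with <code>actor.abilityTo('BrowseTheWeb')</code> in exactly the same way, which is what makes the same actor usable across UI, API, and database interfaces.</p>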
<h3 id="heading-the-trade-off"><strong>The Trade-off</strong></h3>
<p>The price for this sophistication is complexity. Screenplay Pattern requires:</p>
<ul>
<li><p>Strong understanding of object-oriented design principles (especially SOLID)</p>
</li>
<li><p>Longer initial setup and learning curve</p>
</li>
<li><p>Team buy-in and training investment</p>
</li>
<li><p>Commitment to the pattern's structure</p>
</li>
</ul>
<p>For large, long-term projects with experienced teams, these investments pay dividends. For smaller projects or teams without strong programming backgrounds, the pattern can feel like over-engineering. To be completely honest, implementing it in such cases rarely makes sense – the return on investment simply won't justify the effort.</p>
<h2 id="heading-code-example-2">Code Example</h2>
<pre><code class="lang-javascript"><span class="hljs-comment">// For the sake of simplicity and framework-agnosticity</span>
<span class="hljs-comment">// we're skipping the selectors/locators definitions and functions</span>
<span class="hljs-comment">// that are mentioned on the picture, but not used within test case</span>

<span class="hljs-comment">// ===========</span>
<span class="hljs-comment">// TEST SCRIPT</span>
<span class="hljs-comment">// ===========</span>

<span class="hljs-comment">// src/tests/Add20Tshirts.js</span>
test(<span class="hljs-string">'should add 20 t-shirts to cart'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">const</span> sarah = Actor.named(<span class="hljs-string">'Sarah'</span>);

    <span class="hljs-keyword">await</span> sarah.attemptsTo(
        AddDiverseTShirtsToCart.count(<span class="hljs-number">20</span>)
    );

    <span class="hljs-keyword">const</span> cartCount = <span class="hljs-keyword">await</span> sarah.asks(CartItemCount.value());
    expect(cartCount).toBe(<span class="hljs-number">20</span>);
});

<span class="hljs-comment">// ==========================================================================</span>
<span class="hljs-comment">// ACTOR – PEOPLE and EXTERNAL SYSTEMS interacting with the system under test</span>
<span class="hljs-comment">// ==========================================================================</span>

<span class="hljs-comment">// src/actor.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Actor = {
    named(name) {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">name</span>: name,
            <span class="hljs-attr">ability</span>: BrowseTheWeb,

            <span class="hljs-keyword">async</span> attemptsTo(...tasks) {
                <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> task <span class="hljs-keyword">of</span> tasks) {
                    <span class="hljs-keyword">await</span> task.performAs(<span class="hljs-built_in">this</span>);
                }
            },

            <span class="hljs-keyword">async</span> asks(question) {
                <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> question.answeredBy(<span class="hljs-built_in">this</span>);
            }
        };
    }
};

<span class="hljs-comment">// =====================================================================================</span>
<span class="hljs-comment">// ABILITIES – WRAPPERS around any INTEGRATION LIBRARIES (e.g. E2E, API, Cloud Services)</span>
<span class="hljs-comment">// =====================================================================================</span>

<span class="hljs-comment">// src/abilities.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> BrowseTheWeb = {
    <span class="hljs-keyword">async</span> findElement(selector) { <span class="hljs-comment">/* ... */</span> },
    <span class="hljs-keyword">async</span> click(selector) { <span class="hljs-comment">/* ... */</span> },
    <span class="hljs-keyword">async</span> getText(selector) { <span class="hljs-comment">/* ... */</span> }
};

<span class="hljs-comment">// ==========================================================================</span>
<span class="hljs-comment">// TASKS - SEQUENCES OF ACTIVITIES as meaningful steps of a business workflow</span>
<span class="hljs-comment">// ==========================================================================</span>

<span class="hljs-comment">// src/tasks/AddDiverseTShirtsToCart.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> AddDiverseTShirtsToCart = {
    count(numberOfShirts) {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">numberOfShirts</span>: numberOfShirts,

            <span class="hljs-keyword">async</span> performAs(actor) {
                <span class="hljs-keyword">const</span> tshirts = TShirtFactory.createDiverseSet(<span class="hljs-built_in">this</span>.numberOfShirts);

                <span class="hljs-keyword">for</span> (<span class="hljs-keyword">const</span> tshirt <span class="hljs-keyword">of</span> tshirts) {
                    <span class="hljs-keyword">await</span> actor.attemptsTo(
                        AddProductToCart.withId(tshirt.id)
                    );
                }
            }
        };
    }
};

<span class="hljs-comment">// src/tasks/AddProductToCart.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> AddProductToCart = {
    withId(productId) {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">productId</span>: productId,

            <span class="hljs-keyword">async</span> performAs(actor) {
                <span class="hljs-keyword">await</span> actor.attemptsTo(
                    OpenProductDetails.for(<span class="hljs-built_in">this</span>.productId),
                    Click.on(<span class="hljs-string">'.add-button'</span>),
                    Click.on(<span class="hljs-string">'.back-to-products'</span>)
                );
            }
        };
    }
};

<span class="hljs-comment">// ================================================================================</span>
<span class="hljs-comment">// INTERACTIONS - LOW-LEVEL ACTIVITIES an ACTOR can perform using a given interface</span>
<span class="hljs-comment">// ================================================================================</span>

<span class="hljs-comment">// src/interactions/OpenProductDetails.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> OpenProductDetails = {
    <span class="hljs-keyword">for</span>(productId) {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">productId</span>: productId,

            <span class="hljs-keyword">async</span> performAs(actor) {
                <span class="hljs-keyword">const</span> selector = <span class="hljs-string">`[data-product-id="<span class="hljs-subst">${<span class="hljs-built_in">this</span>.productId}</span>"]`</span>;
                <span class="hljs-keyword">await</span> actor.attemptsTo(
                    Click.on(selector)
                );
            }
        };
    }
};

<span class="hljs-comment">// src/interactions/Click.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> Click = {
    on(selector) {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">selector</span>: selector,

            <span class="hljs-keyword">async</span> performAs(actor) {
                <span class="hljs-keyword">await</span> actor.ability.click(<span class="hljs-built_in">this</span>.selector);
            }
        };
    }
};

<span class="hljs-comment">// ====================================================================================</span>
<span class="hljs-comment">// QUESTIONS - RETRIEVE INFORMATION from the system under test and the test environment</span>
<span class="hljs-comment">// ====================================================================================</span>

<span class="hljs-comment">// src/questions/CartItemCount.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> CartItemCount = {
    value() {
        <span class="hljs-keyword">return</span> {
            <span class="hljs-keyword">async</span> answeredBy(actor) {
                <span class="hljs-keyword">const</span> text = <span class="hljs-keyword">await</span> actor.ability.getText(<span class="hljs-string">'.cart-badge'</span>);
                <span class="hljs-keyword">return</span> <span class="hljs-built_in">parseInt</span>(text, <span class="hljs-number">10</span>);
            }
        };
    }
};

<span class="hljs-comment">// ================================================================</span>
<span class="hljs-comment">// TEST DATA FACTORY - Clean way to handle the Test Data generation</span>
<span class="hljs-comment">// ================================================================</span>

<span class="hljs-comment">// src/utils/TShirtFactory.js</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> TShirtFactory = {
    create(color, size) {
        <span class="hljs-keyword">const</span> id = <span class="hljs-string">`tshirt-<span class="hljs-subst">${color.toLowerCase()}</span>-<span class="hljs-subst">${size.toLowerCase()}</span>`</span>;
        <span class="hljs-keyword">return</span> {
            <span class="hljs-attr">id</span>: id,
            <span class="hljs-attr">name</span>: <span class="hljs-string">`T-Shirt Model <span class="hljs-subst">${color}</span> <span class="hljs-subst">${size}</span>`</span>,
            <span class="hljs-attr">color</span>: color,
            <span class="hljs-attr">size</span>: size
        };
    },

    createDiverseSet(count) {
        <span class="hljs-keyword">const</span> colors = [<span class="hljs-string">'yellow'</span>, <span class="hljs-string">'black'</span>, <span class="hljs-string">'blue'</span>, <span class="hljs-string">'white'</span>, <span class="hljs-string">'green'</span>, <span class="hljs-string">'turquoise'</span>];
        <span class="hljs-keyword">const</span> sizes = [<span class="hljs-string">'small'</span>, <span class="hljs-string">'medium'</span>, <span class="hljs-string">'large'</span>];

        <span class="hljs-keyword">return</span> <span class="hljs-built_in">Array</span>(count).fill(<span class="hljs-literal">null</span>).map(<span class="hljs-function">(<span class="hljs-params">_, index</span>) =&gt;</span> {
            <span class="hljs-keyword">const</span> color = colors[index % colors.length];
            <span class="hljs-keyword">const</span> size = sizes[index % sizes.length];
            <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.create(color, size);
        });
    }
};
</code></pre>
<h2 id="heading-summary-1">Summary</h2>
<h3 id="heading-pros-1">✅ PROS</h3>
<ul>
<li><p>Highly maintainable</p>
</li>
<li><p>Multi-interface support – seamlessly combines UI, API, and database interactions</p>
</li>
<li><p>SOLID design principles – follows Single Responsibility, Open-Closed principles</p>
</li>
<li><p>Business-focused language – tests read like user stories/workflows</p>
</li>
<li><p>Better abstraction layers – Tasks, Actions, Questions provide clear organisation</p>
</li>
<li><p>Excellent scalability – handles complex, multi-actor scenarios well</p>
</li>
<li><p>Reusable components – interactions and tasks can be composed and reused</p>
</li>
</ul>
<h3 id="heading-downsides-1">❌ DOWNSIDES</h3>
<ul>
<li><p>Steeper learning curve – much more complex concepts to understand initially</p>
</li>
<li><p>Limited community knowledge – fewer practitioners with hands-on experience</p>
</li>
<li><p>Requires stronger technical skills – team needs understanding of design patterns</p>
</li>
<li><p>Overkill for simple projects – unnecessary overhead for small and even medium-sized applications</p>
</li>
<li><p>Framework dependency – typically requires specific frameworks (e.g. Serenity/JS, Boa Constrictor, ScreenPy)</p>
</li>
<li><p>Longer onboarding time – new team members take longer to become productive</p>
</li>
</ul>
<hr />
<h1 id="heading-choosing-the-right-approach">Choosing the Right Approach</h1>
<p>Here are my recommendations for different project settings. As a general rule of thumb: stick with the <strong>Keep It Stupidly Simple</strong> (KISS) paradigm and don't over-engineer in the early stages.</p>
<p>For specific project contexts, here are my suggestions:</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text">I use the umbrella term <strong><em>Test Maintainers</em></strong> because the test framework can be maintained by QA engineers, technical architects and software developers alike.</div>
</div>

<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Project Type</strong></td><td><strong>Test Maintainers Count</strong></td><td><strong>Recommended Strategy</strong></td></tr>
</thead>
<tbody>
<tr>
<td>Small and Simple</td><td>Small (1-3)</td><td><strong>Page Object Model</strong> (extend to the <strong>Flow Model</strong> if code maintenance becomes unmanageable).</td></tr>
<tr>
<td>Medium and Complex</td><td>Small (1-3)</td><td><strong>Page Object Model</strong>, with a view to extending it to the <strong>Flow Model</strong> once you have identified the areas that would benefit most from it.</td></tr>
<tr>
<td>Large and Complex</td><td>Small (1-3)</td><td><strong>Page Object Model</strong>, with a view to extending it to the <strong>Flow Model</strong> once you have identified the areas that would benefit most from it. If the team grows and has sufficient technical knowledge, gradually introduce the <strong>Screenplay Pattern</strong> alongside your existing framework.</td></tr>
<tr>
<td>Large and Complex</td><td>Large (4 or more)</td><td>The <strong>Screenplay Pattern</strong> is ideal for larger teams, as it is highly effective in complex environments and ensures greater compliance with programming best practices.</td></tr>
</tbody>
</table>
</div><p>If you have any questions, feel free to leave a comment.</p>
<p>Thanks and happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[Using WebdriverIO v9 for Effective Cross-Platform End-to-End Testing]]></title><description><![CDATA[Preface
In a previous article from the start of this year, I mentioned various frameworks that can be used for mobile test automation. The one that came up repeatedly was WebdriverIO, which I have been using actively for over two years now.
It is a s...]]></description><link>https://engineering.cloudflight.io/using-webdriverio-v9-for-effective-cross-platform-end-to-end-testing</link><guid isPermaLink="true">https://engineering.cloudflight.io/using-webdriverio-v9-for-effective-cross-platform-end-to-end-testing</guid><category><![CDATA[Webdriver.io]]></category><category><![CDATA[end to end testing]]></category><dc:creator><![CDATA[Tomasz Buga]]></dc:creator><pubDate>Mon, 17 Nov 2025 08:24:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/G85NA5FuQCo/upload/7ba3cc174bbcc05f97682e0f897c70e7.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-preface">Preface</h2>
<p>In a <a target="_blank" href="https://hashnode.com/post/cm5nm417h000i09la1o9x0uyy">previous article from the start of this year</a>, I mentioned various frameworks that can be used for mobile test automation. The one that came up repeatedly was <strong>WebdriverIO</strong>, which I have been using actively for over two years now.</p>
<p>It is a shame that there are no really comprehensive articles on how to use <strong>WebdriverIO</strong>, especially when it comes to automating complex scenarios such as cross-platform testing of <strong>iOS</strong>, <strong>Android</strong> and <strong>Web</strong> at the same time.</p>
<p>Thanks to <strong>Cloudflight's</strong> courtesy, I was able to open-source the <strong>WebdriverIO</strong> sandbox that I created for internal use and knowledge sharing. I will use this as the basis for this article.</p>
<p><a target="_blank" href="https://github.com/cloudflightio/cross-platform-test-framework">https://github.com/cloudflightio/cross-platform-test-framework</a></p>
<p>The demo setup of the sandbox framework is based on the well-known <strong>Wikipedia</strong> application.</p>
<hr />
<h1 id="heading-getting-started">Getting Started</h1>
<h2 id="heading-want-to-try-out-the-framework">Want to try out the framework?</h2>
<p>The complete setup guide with all prerequisites, installation steps, and your first test run is available in the <a target="_blank" href="https://github.com/cloudflightio/cross-platform-test-framework">project README</a>.</p>
<h2 id="heading-quick-start-for-the-impatient">Quick start for the impatient</h2>
<pre><code class="lang-bash">yarn install
yarn run wdio:web:edge
</code></pre>
<p>The framework supports Web (Edge), Android, and iOS testing out of the box. Detailed platform-specific setup instructions are in the <a target="_blank" href="https://github.com/cloudflightio/cross-platform-test-framework">README</a>.</p>
<hr />
<h1 id="heading-deep-dive">Deep dive</h1>
<h2 id="heading-why-yet-another-boilerplate-project">Why yet another Boilerplate Project?</h2>
<p>Before we take a closer look at the features of our sandbox, I would like to briefly explain why I felt the need to create it in the first place.</p>
<p>When you start working with <strong>WebdriverIO</strong>, the <a target="_blank" href="https://webdriver.io/docs/boilerplates">Boilerplate Projects</a> page is one of the first places you’ll visit. Although there is a plethora of different setups, I couldn’t find anything that suited my personal needs.</p>
<p>You might be wondering what those needs were. Here are the features that I couldn't find in any of the available boilerplate projects:</p>
<ol>
<li><p><strong>Standardised, simple selectors handling</strong> — consistent patterns that work across all platforms.</p>
</li>
<li><p><strong>The Page Object Model pattern</strong> — an industry standard with which every test automation engineer should be familiar. It's effective because it's simple and intuitive.</p>
</li>
<li><p><strong>Simple, maintainable architecture</strong> — avoiding over-engineering and façade frameworks (like Cucumber) that add disproportionate complexity without delivering equivalent practical value. In my 6+ years of test automation experience, over-engineered frameworks consistently become maintenance burdens when the original developers leave, often requiring teams to reverse engineer their own codebase.</p>
</li>
<li><p><strong>Write-once-run-everywhere test spec files</strong> — we'll explore this concept in detail later in this article.</p>
</li>
<li><p><strong>Simple reporting integrated directly into the boilerplate</strong> — no additional setup required.</p>
</li>
<li><p><strong>A true cross-platform example implementation</strong> covering all major platforms (Web, Android, and iOS) that works out-of-the-box.</p>
</li>
</ol>
<p>Now that we've outlined these pain points, let's examine the most important ones and see how our framework solves them.</p>
<hr />
<h1 id="heading-selectors-handling">Selectors Handling</h1>
<h2 id="heading-the-select-function-overview">The <code>select()</code> Function Overview</h2>
<p>The word <em>simple</em> that I'm using here might seem subjective at first glance, especially if you take a look at the implementation, where we have the following:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// test/common/sharedCommands.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">select</span>(<span class="hljs-params">selector: Selector</span>): <span class="hljs-title">ChainablePromiseElement</span> </span>{}
<span class="hljs-keyword">export</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">selectArray</span>(<span class="hljs-params">selector: SelectorArray</span>): <span class="hljs-title">ChainablePromiseArray</span> </span>{}

<span class="hljs-comment">// test/common/selectors.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> selectors = {
    homePage: {
        searchField: {
            android: <span class="hljs-string">'org.wikipedia:id/search_container'</span>,
            ios: <span class="hljs-string">'**/XCUIElementTypeSearchField[`label == "Search Wikipedia"`]'</span>,
            web: <span class="hljs-string">'(//input[@name="search"])[1]'</span>,
            mobileBrowser: <span class="hljs-string">'//form[@id="minerva-overlay-search"]//input[@name="search"]'</span>,
        },
    }
}

<span class="hljs-comment">// test/pageobjects/home.page.ts</span>
searchField: <span class="hljs-function">() =&gt;</span> select({
    ...selectors.homePage.searchField,
    iosSelectionMethod: getByClassChain,
}),
</code></pre>
<p>It looks anything but <em>simple</em>.</p>
<p>However, when you consider the alternatives—<em>how would you approach this differently?</em>—you can see why this solution is actually elegant and robust.</p>
<p>To illustrate this, here's how you might approach cross-platform selectors in a naive way:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// *************</span>
<span class="hljs-comment">// THE NAIVE WAY</span>
<span class="hljs-comment">// *************</span>

<span class="hljs-comment">// test/pageobjects/home.page.ts - example without selectors handling</span>
searchField: <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">if</span> (browser.isNativeContext) {
        <span class="hljs-keyword">return</span> browser.isAndroid ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'org.wikipedia:id/search_container'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'-ios class chain:**/XCUIElementTypeSearchField[`label == "Search Wikipedia"`]'</span>);
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> browser.isMobile ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'//form[@id="minerva-overlay-search"]//input[@name="search"]'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'(//input[@name="search"])[1]'</span>);
    }
}

<span class="hljs-comment">// ********************</span>
<span class="hljs-comment">// THE STANDARDISED WAY</span>
<span class="hljs-comment">// ********************</span>

<span class="hljs-comment">// test/pageobjects/home.page.ts</span>
searchField: <span class="hljs-function">() =&gt;</span> select({
    ...selectors.homePage.searchField,
    iosSelectionMethod: getByClassChain,
}),

<span class="hljs-comment">// test/common/selectors.ts</span>
<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> selectors = {
    homePage: {
        searchField: {
            android: <span class="hljs-string">'org.wikipedia:id/search_container'</span>,
            ios: <span class="hljs-string">'**/XCUIElementTypeSearchField[`label == "Search Wikipedia"`]'</span>,
            web: <span class="hljs-string">'(//input[@name="search"])[1]'</span>,
            mobileBrowser: <span class="hljs-string">'//form[@id="minerva-overlay-search"]//input[@name="search"]'</span>,
        },
    }
}
</code></pre>
<p>With just one selector, this might not seem like much of an improvement, but add a few more selectors and the naive approach becomes ridiculous compared to our standardised way:</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// *************</span>
<span class="hljs-comment">// THE NAIVE WAY</span>
<span class="hljs-comment">// *************</span>

<span class="hljs-comment">// test/pageobjects/home.page.ts - example without selectors handling</span>
searchField: <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">if</span> (browser.isNativeContext) {
        <span class="hljs-keyword">return</span> browser.isAndroid ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'org.wikipedia:id/search_container'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'-ios class chain:**/XCUIElementTypeSearchField[`label == "Search Wikipedia"`]'</span>);
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> browser.isMobile ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'//form[@id="minerva-overlay-search"]//input[@name="search"]'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'(//input[@name="search"])[1]'</span>);
    }
},
searchResultItem: <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">if</span> (browser.isNativeContext) {
        <span class="hljs-keyword">return</span> browser.isAndroid ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'id:org.wikipedia:id/page_list_item_title'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'-ios predicate string:label == "Appium"'</span>);
    } <span class="hljs-keyword">else</span> {
        <span class="hljs-keyword">return</span> browser.isMobile ?
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'//li[@title="Appium"]'</span>) :
            <span class="hljs-keyword">await</span> browser.$(<span class="hljs-string">'//li[@title="Appium"]//a'</span>);
    }
}

<span class="hljs-comment">// ********************</span>
<span class="hljs-comment">// THE STANDARDISED WAY</span>
<span class="hljs-comment">// ********************</span>

<span class="hljs-comment">// test/pageobjects/home.page.ts - an actual excerpt from the codebase</span>
searchField: <span class="hljs-function">() =&gt;</span> select({
    ...selectors.homePage.searchField,
    androidSelectionMethod: getById,
    iosSelectionMethod: getByClassChain,
}),
searchResultItem: <span class="hljs-function">() =&gt;</span> select({
    ...selectors.homePage.searchResultItem,
    androidSelectionMethod: getById,
    iosSelectionMethod: getByPredicateString,
}),
</code></pre>
<p>And don't even get me started on handling the Android and iOS selection methods—just look at the <code>id:</code>, <code>-ios class chain:</code>, and <code>-ios predicate string:</code> prefixes in the naive approach.</p>
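<p>To show the dispatch idea behind those prefixes, here is a minimal sketch in plain JavaScript. It is illustrative only: <code>resolveSelector</code> and the <code>platform</code> flags object are stand-ins of my own, not the sandbox's actual <code>select()</code> implementation.</p>
```javascript
// Illustrative sketch: resolve a cross-platform selector map into the raw
// string WebdriverIO expects, prepending the strategy prefix per platform.
const getById = (selector) => `id:${selector}`;
const getByClassChain = (selector) => `-ios class chain:${selector}`;
const getByPredicateString = (selector) => `-ios predicate string:${selector}`;

function resolveSelector(selectorMap, platform, options = {}) {
    const {
        androidSelectionMethod = (s) => s,
        iosSelectionMethod = (s) => s,
    } = options;

    if (platform.isNativeContext) {
        return platform.isAndroid
            ? androidSelectionMethod(selectorMap.android)
            : iosSelectionMethod(selectorMap.ios);
    }
    // Browser context: mobile browsers get the mobile-skin selector.
    return platform.isMobile ? selectorMap.mobileBrowser : selectorMap.web;
}
```
<p>A real <code>select()</code> would pass the resolved string straight to <code>browser.$()</code>; the page objects never see the platform prefixes.</p>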
<h2 id="heading-the-selectarray-function-overview">The <code>selectArray()</code> Function Overview</h2>
<p>Since we thoroughly covered <code>select()</code>, I don't think we need to delve deeply into <code>selectArray()</code>, as it functions similarly. However, it's important to explain why we need it.</p>
<p>The best time to use the <code>selectArray()</code> function is generally when you need to count the number of elements.</p>
<p>Some might argue that it's useful for extracting an array of <em>WebElements</em>. That's a fair point, but I strongly recommend getting familiar with using variables in selectors (covered in the next section). For me, the <code>select()</code> function combined with variables is sufficient 99% of the time.</p>
<p>One more thing: <code>select()</code> and <code>selectArray()</code> are simply facades over WebdriverIO’s regular <code>$</code> and <code>$$</code> commands. Since they return a <code>ChainablePromiseElement</code> and a <code>ChainablePromiseArray</code> respectively, you can refer to the official WebdriverIO documentation for more information on how to use them in more complex cases:</p>
<ol>
<li><p><a target="_blank" href="https://webdriver.io/docs/selectors">https://webdriver.io/docs/selectors</a></p>
</li>
<li><p><a target="_blank" href="https://webdriver.io/docs/api/browser/$">https://webdriver.io/docs/api/browser/$</a></p>
</li>
<li><p><a target="_blank" href="https://webdriver.io/docs/api/browser/$$">https://webdriver.io/docs/api/browser/$$</a></p>
</li>
</ol>
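<p>To illustrate the facade idea, here is a minimal sketch of the platform-resolution step such a <code>select()</code> wrapper has to perform. All names and the exact shape here are assumptions for illustration, not the framework's actual internals:</p>

```typescript
// Hypothetical sketch: resolve the right selector for the current platform.
// The mobileBrowser selector falls back to the web one when it is missing,
// matching the propagation rule the article describes.
interface PlatformSelectors {
  ios?: string;
  android?: string;
  web?: string;
  mobileBrowser?: string;
}

type Platform = 'ios' | 'android' | 'web' | 'mobileBrowser';

function resolveSelector(selectors: PlatformSelectors, platform: Platform): string {
  const selector =
    platform === 'mobileBrowser'
      ? selectors.mobileBrowser ?? selectors.web
      : selectors[platform];
  if (selector === undefined) {
    throw new Error(`No selector defined for platform: ${platform}.`);
  }
  return selector;
}

// A select() facade would then simply delegate to WebdriverIO, roughly:
// const select = (s: PlatformSelectors) => browser.$(resolveSelector(s, currentPlatform()));
```

<p>The point of the facade is that page objects only ever pass a selector bundle; which platform-specific string (and which <code>id:</code> or <code>-ios predicate string:</code> prefix) actually reaches <code>$</code> is decided in one place.</p>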
<h2 id="heading-selectors-with-variables">Selectors with Variables</h2>
<p>Sometimes, we want to use selectors with variables. The most common scenario would be an attribute that has a dynamically generated counter (e.g., <code>item-0-dropdown</code>, <code>item-1-dropdown</code> etc.).</p>
<h3 id="heading-without-variables">Without Variables</h3>
<p>Let's start again with the naive, straightforward approach—we might end up with a hardcoded selector like this.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>Note: For the sake of simplicity, I'll use the standardised approach with the </em><code>selectors.ts</code><em> file combined with the </em><code>select()</code><em> function for selector management.</em></div>
</div>

<pre><code class="lang-typescript"><span class="hljs-comment">// *************</span>
<span class="hljs-comment">// THE NAIVE WAY</span>
<span class="hljs-comment">// *************</span>

<span class="hljs-comment">// test/common/selectors.ts - example without variables</span>
homePage: {
    firstDropdownItem: {
        ios: <span class="hljs-string">'//XCUIElementTypeStaticText[@name="item-0-dropdown"]'</span>,
        android: <span class="hljs-string">'//android.widget.TextView[@text="item-0-dropdown"]'</span>,
        web: <span class="hljs-string">'//*[@data-testid="item-0-dropdown"]'</span>,
        mobileBrowser: <span class="hljs-string">'//*[@data-testid="item-0-dropdown"]//span'</span>,
    },
    secondDropdownItem: {
        ios: <span class="hljs-string">'//XCUIElementTypeStaticText[@name="item-1-dropdown"]'</span>,
        android: <span class="hljs-string">'//android.widget.TextView[@text="item-1-dropdown"]'</span>,
        web: <span class="hljs-string">'//*[@data-testid="item-1-dropdown"]'</span>,
        mobileBrowser: <span class="hljs-string">'//*[@data-testid="item-1-dropdown"]//span'</span>,
    },
    thirdDropdownItem: {
        ios: <span class="hljs-string">'//XCUIElementTypeStaticText[@name="item-2-dropdown"]'</span>,
        android: <span class="hljs-string">'//android.widget.TextView[@text="item-2-dropdown"]'</span>,
        web: <span class="hljs-string">'//*[@data-testid="item-2-dropdown"]'</span>,
        mobileBrowser: <span class="hljs-string">'//*[@data-testid="item-2-dropdown"]//span'</span>,
    }
}

<span class="hljs-comment">// test/pageobjects/home.page.ts - example without variables</span>
firstDropdownItem: <span class="hljs-function">() =&gt;</span> select(selectors.homePage.firstDropdownItem)
secondDropdownItem: <span class="hljs-function">() =&gt;</span> select(selectors.homePage.secondDropdownItem)
thirdDropdownItem: <span class="hljs-function">() =&gt;</span> select(selectors.homePage.thirdDropdownItem)

<span class="hljs-comment">// test/pageobjects/home.page.ts - example code usage</span>
<span class="hljs-keyword">async</span> selectDropdownItemByIndex(itemIndex: <span class="hljs-built_in">number</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
  <span class="hljs-keyword">switch</span> (itemIndex) {
    <span class="hljs-keyword">case</span> <span class="hljs-number">0</span>:
      <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.firstDropdownItem().click();
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> <span class="hljs-number">1</span>:
      <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.secondDropdownItem().click();
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">case</span> <span class="hljs-number">2</span>:
      <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.thirdDropdownItem().click();
      <span class="hljs-keyword">break</span>;
    <span class="hljs-keyword">default</span>:
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">`Invalid dropdown item index: <span class="hljs-subst">${itemIndex}</span>.`</span>);
  }
},
</code></pre>
<h3 id="heading-with-variables"><strong>With Variables</strong></h3>
<p>The second method allows us to provide dynamic data and makes our tests more robust.</p>
<p>To use variables, create a selector providing the variable in curly braces <code>{{itemIndex}}</code>, then pass the variable to the <code>select()</code> or <code>selectArray()</code> function within your <em>Page Object Model</em> file.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>Note: If the </em><code>mobileBrowser</code><em> selector is the same as the </em><code>web</code><em> selector you can use just the </em><code>web</code><em> one, as it will propagate the value to </em><code>mobileBrowser</code><em> as well.</em></div>
</div>

<pre><code class="lang-typescript"><span class="hljs-comment">// ********************</span>
<span class="hljs-comment">// THE STANDARDISED WAY</span>
<span class="hljs-comment">// ********************</span>

<span class="hljs-comment">// test/common/selectors.ts - example with variables</span>
homePage: {
    dropdownItem: {
        ios: <span class="hljs-string">'//XCUIElementTypeStaticText[@name="item-{{itemIndex}}-dropdown"]'</span>,
        android: <span class="hljs-string">'//android.widget.TextView[@text="item-{{itemIndex}}-dropdown"]'</span>,
        web: <span class="hljs-string">'//*[@data-testid="item-{{itemIndex}}-dropdown"]'</span>
    }
}

<span class="hljs-comment">// test/pageobjects/home.page.ts - example with variables</span>
nthDropdownItem: <span class="hljs-function">(<span class="hljs-params">dropdownItemIndex: <span class="hljs-built_in">number</span></span>) =&gt;</span>
    select({
        ...selectors.homePage.dropdownItem,
        variables: { itemIndex: <span class="hljs-string">`<span class="hljs-subst">${dropdownItemIndex}</span>`</span> },
    }),

<span class="hljs-comment">// test/pageobjects/home.page.ts - example code usage</span>
<span class="hljs-keyword">async</span> selectDropdownItemByIndex(itemIndex: <span class="hljs-built_in">number</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
  <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.nthDropdownItem(itemIndex).click();
},
</code></pre>
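<p>Under the hood, the variable substitution is just string interpolation. The following is a minimal sketch of that step (an assumed implementation, not the framework's actual code):</p>

```typescript
// Hypothetical helper: replace every {{name}} placeholder in a selector with
// the matching entry from the variables map; unknown placeholders are kept as-is.
function applyVariables(
  selector: string,
  variables: Record<string, string> = {},
): string {
  return selector.replace(/\{\{(\w+)\}\}/g, (placeholder, name) =>
    name in variables ? variables[name] : placeholder,
  );
}

// applyVariables('//*[@data-testid="item-{{itemIndex}}-dropdown"]', { itemIndex: '2' })
// returns '//*[@data-testid="item-2-dropdown"]'
```

<p>Because the substitution happens before the selector reaches <code>$</code> or <code>$$</code>, the same pattern works for every platform entry in the bundle.</p>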
<hr />
<h1 id="heading-simple-amp-maintainable-architecture"><strong>Simple &amp; Maintainable Architecture</strong></h1>
<h2 id="heading-end-to-end-tests-are-not-regular-applications-period">End-to-end tests are not regular applications, period.</h2>
<p>While <em>abstractions</em>, <em>inheritance, SOLID, DRY,</em> and <em>design patterns</em> have their place in test automation, you must adjust your mindset. In most cases, you'll be working with less experienced developers or testers fresh from programming courses who lack battle-tested experience. Over-engineering your test framework with advanced patterns may seem elegant, but it often creates a maintenance nightmare for the very people who need to work with it daily.</p>
<blockquote>
<p><strong><em>If you reach the point where you feel your test scripts need their own tests to keep everything in place—you're doing something wrong.</em></strong></p>
</blockquote>
<p>Overly complex frameworks only work when you have senior test automation engineers on the team—but from an economic perspective, most companies can't justify hiring multiple senior-level testers. Moreover, since senior test automation engineers possess skills comparable to senior software developers, they often transition into software development roles for better career opportunities unless they have a genuine passion for testing.</p>
<h2 id="heading-the-solution-focus-on-the-kiss-principle"><strong>The solution? Focus on the KISS principle</strong></h2>
<p>Keep It Stupidly Simple (KISS)—write code that anyone can work with.</p>
<p>The ideal scenario: your test scripts should be simple enough that new tests can be created through straightforward copy-paste with minimal modifications.</p>
<p>Build from the bottom-up: write the simplest code that works first, then refactor. That's why our framework uses straightforward building blocks:</p>
<ul>
<li><p><strong>Page Object Models</strong> — encapsulate page interactions and behaviors</p>
</li>
<li><p><strong>Centralized Selectors</strong> — single source of truth for element identification</p>
</li>
<li><p><strong>Test Scripts</strong> — contain your test logic and assertions</p>
</li>
<li><p><strong>Configuration Files</strong> — pre-configured settings that let you start testing immediately</p>
</li>
<li><p><strong>Shared Commands</strong> — utility functions you can freely modify depending on your needs, avoiding dependencies on inflexible third-party libraries that require forking to extend</p>
</li>
<li><p><strong>Custom Matchers</strong> — extend default WebdriverIO assertions with custom message logging for better reporting and easier debugging</p>
</li>
<li><p><strong>Flows</strong> (optional, for larger frameworks) — reusable test sequences (<a target="_blank" href="https://www.peterfoldhazi.com/flow-model-pattern">learn more</a>)</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1761751080295/2104555a-41aa-481b-a548-03a1e258dc8d.png" alt class="image--center mx-auto" /></p>
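<p>To show how these blocks connect, here is a deliberately stripped-down sketch of a page object that reads from centralized selectors and talks to elements through a pluggable finder. All names are illustrative, not taken from the boilerplate:</p>

```typescript
// Illustrative sketch only: how Page Object Models and centralized selectors
// fit together. Names are made up; the boilerplate's real code differs.
interface UiElement {
  setValue(value: string): Promise<void>;
  click(): Promise<void>;
}
type ElementFinder = (selector: string) => UiElement;

// Centralized selectors: the single source of truth for element identification.
const homePageSelectors = {
  searchField: '[data-testid="search-field"]',
  searchButton: '[data-testid="search-button"]',
};

// Page Object Model: encapsulates interactions; test scripts never see selectors.
class HomePage {
  constructor(private readonly find: ElementFinder) {}

  async searchFor(term: string): Promise<void> {
    await this.find(homePageSelectors.searchField).setValue(term);
    await this.find(homePageSelectors.searchButton).click();
  }
}
```

<p>A test script then only calls something like <code>homePage.searchFor('Test')</code> and asserts on the outcome, which is exactly what keeps new tests copy-paste simple.</p>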
<hr />
<h1 id="heading-write-once-run-everywhere-test-spec-files"><strong>Write-Once-Run-Everywhere Test Spec Files</strong></h1>
<p>When I first encountered cross-platform automation, I was completely clueless about how it should work. But as I started writing my first test cases, I quickly realized there was potential for standardization.</p>
<p>If your application looks and behaves the same across all platforms, it's super easy to automate using our framework. The main effort is extracting selectors for each platform (DevTools for web, Appium Inspector for mobile). If your application behaves slightly differently but shares most functionality, simple conditional statements handle the variations.</p>
<p>The best part? You can reuse the same test code across all platforms, thanks to WebdriverIO's extreme versatility.</p>
<p>Below is an excerpt from our Wikipedia test example, demonstrating how platform-specific variations are handled while keeping the script readable and contained in a single file:</p>
<pre><code class="lang-typescript">describe(<span class="hljs-string">`Wikipedia`</span>, <span class="hljs-function">() =&gt;</span> {
  it(<span class="hljs-string">'Search for an article about "Test"'</span>, <span class="hljs-keyword">async</span> () =&gt; {
    <span class="hljs-keyword">await</span> addTestId(<span class="hljs-string">'TEST-1'</span>);

    <span class="hljs-keyword">if</span> (!browser.isNativeContext) {
      <span class="hljs-keyword">await</span> homePage.openUrl();
    }

    <span class="hljs-keyword">if</span> (browser.isNativeContext) {
      <span class="hljs-keyword">await</span> homePage.pressSkipButton();
    }

    <span class="hljs-keyword">await</span> homePage.enterTextToSearchField(<span class="hljs-string">'Test'</span>);
    <span class="hljs-keyword">await</span> homePage.pressFirstSearchResultItem();
    <span class="hljs-keyword">await</span> wikipediaGamesModal.closeModal();
    <span class="hljs-keyword">await</span> articlePage.waitForPageLoad();
    <span class="hljs-keyword">const</span> pageTitle = <span class="hljs-keyword">await</span> articlePage.getPageTitle();

    <span class="hljs-keyword">await</span> expect(pageTitle).toEqualString(<span class="hljs-string">'Test'</span>);

    <span class="hljs-keyword">await</span> takeScreenshotWithTitle(<span class="hljs-string">'Successful test - Page title with the "Test" value'</span>);
  })
})
</code></pre>
<hr />
<h1 id="heading-simple-reporting-integrated-directly-into-the-boilerplate"><strong>Simple Reporting Integrated Directly into the Boilerplate</strong></h1>
<p>I don't know about you, but I really dislike having too many choices when setting up a new technology. I want something decided for me, with the flexibility to change it if needed.</p>
<p>The reason? A shorter ramp-up period. When you have a clear picture of what's what, you can learn it quickly and then fine-tune. Simply put—I just want to drive a car, not assemble it from a DIY kit.</p>
<p>That's why our framework has reporting baked in out of the box, with three layers:</p>
<ol>
<li><p><strong>Basic logs</strong> — WebdriverIO's standard console output</p>
</li>
<li><p><strong>Extended log steps</strong> — Custom logs embedded in shared commands that output test step descriptions to both the console and Allure reports</p>
</li>
<li><p><strong>Allure Report</strong> — Post-test HTML report that can be launched locally with <code>allure serve</code> or hosted (e.g., on GitHub Pages)</p>
</li>
</ol>
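<p>As a rough sketch of what layer 2 amounts to (assumed names; the boilerplate's real shared commands differ), a step helper fans one description out to every reporting sink before running the action:</p>

```typescript
// Hypothetical "extended log step" helper: one step description is sent to
// every registered sink. Here that is only the console; in a real WebdriverIO
// setup you would also register the Allure reporter as a sink.
type StepSink = (description: string) => void;

const stepSinks: StepSink[] = [
  (description) => console.log(`STEP: ${description}`),
];

async function logStep<T>(description: string, action: () => Promise<T>): Promise<T> {
  for (const sink of stepSinks) {
    sink(description);
  }
  return action();
}
```

<p>A shared command can then wrap its body, e.g. <code>await logStep('Press the skip button', () =&gt; skipButton().click())</code>, so the console output and the Allure report always carry the same wording.</p>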
<hr />
<h1 id="heading-wrapping-up">Wrapping Up</h1>
<p>If you made it this far, thank you for your time. I hope you now have a clearer understanding of what our Test Automation Framework can do for you and how it can speed up your WebdriverIO-based cross-platform end-to-end setup.</p>
<p>If you have any questions, feel free to leave a comment. Contributions are welcome—feel free to open pull requests. While we don't have a formal code of conduct for contributors yet, that's something I plan to add in the future.</p>
<p>Thanks and happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[UX Writing for Business Applications]]></title><description><![CDATA[Let’s be honest - who wakes up and and is excited to read button labels or error messages? But have you ever clicked a vague “Next” button and immediately regretted it because you had no idea where it was taking you? Or stared at an error message lik...]]></description><link>https://engineering.cloudflight.io/ux-writing-for-business-applications</link><guid isPermaLink="true">https://engineering.cloudflight.io/ux-writing-for-business-applications</guid><category><![CDATA[UX]]></category><category><![CDATA[ux writing]]></category><category><![CDATA[Business Applications]]></category><category><![CDATA[b2b]]></category><dc:creator><![CDATA[Sonja Frisch]]></dc:creator><pubDate>Wed, 08 Oct 2025 14:07:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/v9FQR4tbIq8/upload/e7438d6cc4dfcb2cd7dfe36d58ba8c41.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s be honest - who wakes up and and is excited to read button labels or error messages? But have you ever clicked a vague “Next” button and immediately regretted it because you had no idea where it was taking you? Or stared at an error message like “Something went wrong” and thought, <em>Great. That’s helpful</em>.</p>
<p>Now, imagine this is happening in your company’s payroll and budgeting software. A single misleading label or poorly worded error message can have serious consequences, from lost data to financial impacts. UX writing in B2B isn’t just about making things sound nice - it’s about making sure users can do their jobs. It’s the difference between “No data found” and “No entries for this period. Try adjusting your filters.” It’s also about knowing when to keep it formal, when to add a touch of personality, and when to just stay out of the way.</p>
<p>So, let’s dive into the world of UX writing for B2B SaaS - the place where clarity meets complexity, tooltips become lifesavers, and error messages don’t make you want to throw your laptop out the window.</p>
<h3 id="heading-so-what-is-ux-writing">So, what is UX writing?</h3>
<p>UX writing is the art of writing clear, concise and functional text that guides users through digital products. It’s the microcopy you see on buttons, error messages, empty states, onboarding instructions and tooltips—everywhere text helps users take action.</p>
<p>While the fundamentals of usability remain the same, the design approach in enterprise products tends to differ from consumer products. Enterprise products are more often than not complex workflow tools that involve organizing and displaying large amounts of data. This makes strong, well-structured design essential to support users getting their jobs done.</p>
<h3 id="heading-keep-it-simple-not-always">Keep it simple? Not always</h3>
<p>Simplicity is a core design principle, but in B2B software oversimplifying a feature through words can undermine its meaning or the consequence of an action. Workflows in enterprises are complex, and the tools must work in that complexity. What does this mean in terms of UX writing?</p>
<p><strong>Use terms and language your audience understands</strong></p>
<p>Enterprise users are people with specific industry expertise, and they expect to see the terms and phrases relevant to their work. Removing or oversimplifying those terms in an attempt to "keep it simple" can create unnecessary friction and even undermine a feature’s meaning or consequence of an action. UX writers must familiarize themselves with the language their users already understand. One of the best ways to do this is by collecting domain knowledge from users, stakeholders, product managers and other sources like internal training documents or customer support tickets.</p>
<p><strong>Abbreviations matter</strong></p>
<p>Abbreviations are another challenge in B2B UX writing. While they are often necessary in technical or industry-specific contexts, they must be used consistently and, when possible, explained in a way that supports both new and experienced users. For example, an abbreviation might be well known within a particular industry but unfamiliar to a new hire or someone switching from a competitor’s product. Striking the right balance between precision and comprehensibility is key.</p>
<h3 id="heading-product-guidance-hello-tooltips">Product guidance: Hello tooltips!</h3>
<p>B2B SaaS products often come with a learning curve, especially when they involve complex user flows and multiple user roles. Most onboarding information is only needed once, and well-placed tooltips can provide just-in-time guidance without overwhelming the user.</p>
<p>These micro-interactions should be brief and helpful, offering just enough information to guide the user to the next step. A tooltip that appears when hovering over a setting, for example, can explain what the option does without requiring the user to leave their workflow to search for documentation.</p>
<h3 id="heading-delight-with-caution-striking-the-right-balance-in-enterprise-ux">Delight with Caution: Striking the Right Balance in Enterprise UX</h3>
<p>While consumer products often add delight through playful copy and animations, this approach doesn’t always translate well to enterprise software. Users handle complex tasks and delight can become a distraction. Many users are already experts in the tool and rely on it daily, often repeating the same workflows. Flashy animations or quirky copy can quickly feel annoying and disruptive.</p>
<p>This doesn’t mean there’s no room for delight—it just needs to be subtle and well-timed. By being very thoughtful and finding innovative ways to add value, you can create micro-experiences that make someone smile or inspire them.</p>
<p>Here are a couple of ideas that can make a product feel more engaging:</p>
<ul>
<li><p>Language: Use positive, active language and balance your brand voice with the language users actually speak.</p>
</li>
<li><p>Empty states: Have smart empty states or defaults, teach users something they didn’t know.</p>
</li>
<li><p>Success messages: Find small moments of celebration to share.</p>
</li>
</ul>
<h3 id="heading-final-thoughts"><strong>Final thoughts</strong></h3>
<p>UX writing in B2B is about more than just making things sound good—it’s about clarity, precision, and helping users get their work done. The goal is to use the right language for the right audience. Whether it's through well-placed tooltips, clear error messages, or actionable labels, good UX writing reduces friction, improves efficiency, and ultimately leads to higher product adoption and satisfaction.</p>
<p>For teams working on B2B products, now is the time to take a closer look at the microcopy within your application. Are labels explicit? Are tooltips helpful? Are error messages guiding users rather than frustrating them? Small changes in UX writing can have a big impact, making the difference between a product that feels intuitive and one that users struggle with.</p>
]]></content:encoded></item><item><title><![CDATA[Open LLMs, Real Results: A series on Free-to-Use AI Models]]></title><description><![CDATA[Part 1 - An Introduction to the matter
What will this series be about?
During this series of blog posts, I will do my best to provide my opinion on specific models, determine which use case each one fits, discuss the pros and cons of using that model...]]></description><link>https://engineering.cloudflight.io/open-llms-real-results-a-series-on-free-to-use-ai-models</link><guid isPermaLink="true">https://engineering.cloudflight.io/open-llms-real-results-a-series-on-free-to-use-ai-models</guid><category><![CDATA[AI]]></category><category><![CDATA[llm]]></category><dc:creator><![CDATA[Vlad Teodorescu]]></dc:creator><pubDate>Wed, 27 Aug 2025 08:55:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/_0iV9LmPDn0/upload/9b19a72297c2dc14b566423f10b9d8c6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Part 1 - An Introduction to the matter</p>
<h2 id="heading-what-will-this-series-be-about">What will this series be about?</h2>
<p>During this series of blog posts, I will do my best to provide my opinion on specific models, determine which use case each one fits, discuss the pros and cons of using that model, and more. I’d like to start with some general notions and first steps, and then deep-dive into different models and particularities of those models. Disclaimer: All opinions on usage and prompts are from personal experience with these models.</p>
<h2 id="heading-what-prompted-this">What prompted this?</h2>
<p>The year 2025 has been one of new beginnings in my life. Among these are new personal projects, and the frustrations that sometimes come with them. During a rant to a friend about such frustrations (looking at you, configuration issues), he asked me whether this was the tipping point and whether I would finally give ChatGPT a chance.</p>
<p>Now, when it comes to technology, especially software, I am one of those people who want to try the new shiny thing as soon as possible and test its limits. However, there is one piece of tech I avoided since it came out: LLMs. I simply did not believe the technology was there. Yet I was proven at least partially wrong when I finally tried it for my personal project.</p>
<h2 id="heading-data-privacy-and-mistakes-or-why-i-avoided-llms-for-so-long">Data privacy and mistakes, or why I avoided LLMs for so long</h2>
<p>One of the main issues I had with LLMs is that their data privacy rules were not super clear in the beginning. In an ideal world, such information would be only used to provide the answer and, for longer conversations, to be saved locally on the device and referenced only in the context of that conversation.</p>
<p>However, that has somewhat improved in recent times. For example, OpenAI, Microsoft, and Anthropic do not use the prompts and generated answers for training, while others, like Google, gate this control behind a paywall. Or, in the case of DeepSeek, don’t even mention an opt-out feature.</p>
<p>If we look at how the data is stored, we have the following cases:</p>
<ul>
<li><p>OpenAI - 30 days</p>
</li>
<li><p>Microsoft - configurable or ZDR (Zero Data Retention)</p>
</li>
<li><p>Google - 3 years if randomly selected for human review. 3 days with activity tracking off</p>
</li>
<li><p>Anthropic - unspecified</p>
</li>
<li><p>DeepSeek - unclear</p>
</li>
</ul>
<p>Then, there is the issue of ownership over the prompts and results. OpenAI, Anthropic, and Microsoft specify that the user has ownership, while Google imposes some restrictions (the data of free-tier users can be accessed by others and is used in training). DeepSeek is again vague, since it uses user data by default for improving the model. There is one thing all the models used as examples above agree on: no copyright guarantee.</p>
<p>The second issue that plagues LLMs is hallucinations and the mistakes resulting from that (how many rocks did I need to eat? One small rock per day?). So, never take what an LLM says for granted.</p>
<h2 id="heading-what-is-better-online-hosted-or-self-hosted">What is better? Online hosted or self-hosted?</h2>
<p>While there are more ways to use an LLM other than Online and Self-hosting, these are probably the most accessible to the average user. So I will focus on these methods at the moment.</p>
<p>I will start with what I would call online hosting. This is the method in which you use the LLM the way it is provided by the creator company, using their apps or websites with an account created and with the settings tuned to your needs.</p>
<p>This is the easiest way, and it is great if you need speed, want to have state-of-the-art outputs, and zero overhead. However, there are drawbacks. You can’t guarantee privacy, you will need permanent network access, and you are always going to worry about your token limit. And sometimes things might just not work, and you will need to wait for some internal service to be back online.</p>
<p>Self-hosting, on the other hand, fixes some of these issues: in terms of privacy and network connection, you are in control. And in some cases, you can even tune the weights. But the machine or machines these run on need to have some competent hardware, the setup time might not be as fast, and it is unlikely that the self-hosted model is state-of-the-art.</p>
<p>So, in conclusion, every person who wants to start using LLMs in their daily life or work will have to decide the best way of accessing them. For me and my project, self-hosting will be the way forward.</p>
<h2 id="heading-whats-next">What’s next?</h2>
<p>Next time, we’ll talk about the first model that I had some more serious contact with: GPT-3.5. I know this is not an open model, but we will use this as a comparison base for the rest. And, additionally, we will discuss the first steps of self-hosting.</p>
]]></content:encoded></item><item><title><![CDATA[A fresh breath of air: what's new in Apache Airflow 3.0]]></title><description><![CDATA[Apache Airflow 3.0 has officially landed, and it represents a substantial evolution of the platform since its inception. This release is not just a collection of incremental changes, it’s a rethinking of how workflows can and should be managed at sca...]]></description><link>https://engineering.cloudflight.io/a-fresh-breath-of-air-whats-new-in-apache-airflow-30</link><guid isPermaLink="true">https://engineering.cloudflight.io/a-fresh-breath-of-air-whats-new-in-apache-airflow-30</guid><category><![CDATA[data-engineering]]></category><category><![CDATA[airflow]]></category><category><![CDATA[Data orchestration]]></category><category><![CDATA[release]]></category><dc:creator><![CDATA[Markus Thaler]]></dc:creator><pubDate>Mon, 11 Aug 2025 06:45:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753864915088/290a8168-621f-45c0-84d2-f97812e4151c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Apache Airflow 3.0 <a target="_blank" href="https://airflow.apache.org/docs/apache-airflow/stable/installation/upgrading_to_airflow3.html">has officially landed</a>, and it represents a substantial evolution of the platform since its inception. This release is not just a collection of incremental changes, it’s a rethinking of how workflows can and should be managed at scale. Packed with performance enhancements, quality-of-life improvements, and forward-looking features, Airflow 3.0 aims to address many of the longstanding frustrations within the data engineering community while keeping pace with modern orchestration needs. However, there’s still room for refinement. Deployment remains clunky, the developer experience could be more intuitive, and the documentation has yet to fully catch up with the new features, leaving gaps in usability. 
Despite these challenges, Airflow 3.0 lays a strong foundation for future innovation.</p>
<h2 id="heading-a-new-foundation-for-workflow-orchestration">A New Foundation for Workflow Orchestration</h2>
<p>The biggest additions of Airflow 3.0 at a glance:</p>
<ul>
<li><p>Event-based triggers</p>
</li>
<li><p>Workflow (DAG) versioning</p>
</li>
<li><p>New React-based UI</p>
</li>
<li><p>Asset notation</p>
</li>
<li><p>Backfills</p>
</li>
</ul>
<p>One of the most anticipated additions to Airflow 3.0 is its support for <strong>event-driven scheduling</strong>. Previously, Airflow’s strength was in time-based, cron-style orchestration. While this model works well for batch pipelines, it struggles in scenarios where real-time responsiveness is critical. With 3.0, workflows can now respond to data events, such as files appearing in cloud storage buckets or updates occurring in databases, enabling near-real-time orchestration. This positions Airflow to handle streaming and micro-batch use cases more elegantly than ever before (<a target="_blank" href="https://www.datacamp.com/blog/apache-airflow-3-0">DataCamp</a>).</p>
<p>Another major advancement is <strong>built-in DAG versioning</strong>, which allows every execution of a Directed Acyclic Graph (DAG) to be tied to a specific, immutable snapshot of its definition. This feature significantly improves debugging, traceability, and auditing, particularly for organizations in regulated industries where compliance and reproducibility are crucial. The versioning helps answer the critical question of what code is executed at which point in time.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753866573258/d512ca29-5bdb-4823-9360-4bcefca23108.gif" alt class="image--center mx-auto" /></p>
<p>Airflow 3.0 also comes with a <strong>completely overhauled web UI</strong>, rebuilt using modern frontend technologies. The new interface delivers faster performance and a cleaner user experience. With improved tools for visualizing DAG runs, managing tasks, and inspecting logs, the user interface is no longer a pain point, but a productivity enhancer.</p>
<p>Furthermore, the new <strong>asset-centric syntax</strong> enables developers to use the <code>@asset</code> decorator to define workflows directly around data assets. This reduces boilerplate and aligns pipeline logic more naturally with the data itself. In practice, it shifts the paradigm from orchestrating tasks to orchestrating data—a subtle but meaningful conceptual leap.</p>
<p>Lastly, <strong>scheduler-managed backfills</strong> eliminate the hassle of managing historical data reprocessing through fragile CLI commands. Backfills can now be triggered, paused, and monitored directly from the UI or API, dramatically simplifying historical data correction.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753867168041/df34c43e-fab3-4bfe-9b9f-5a1fca19c2f6.gif" alt class="image--center mx-auto" /></p>
<p>Apart from newly added features, Airflow 3.0 also brings some improvements under the hood. Astronomer.io, a key driving force behind Airflow’s ongoing development, played a crucial role in the rollout of Airflow 3.0. As detailed in their <a target="_blank" href="https://www.astronomer.io/airflow/3-0/intro/">overview</a>, the most substantial optimizations include:</p>
<ul>
<li><p><strong>A Faster Scheduler:</strong> The Airflow 3.0 scheduler is optimized for speed and scalability, reducing latency during DAG processing and enabling faster task execution feedback.</p>
</li>
<li><p><strong>Active Dependency Management:</strong> Improved dependency tracking increases responsiveness and execution efficiency, particularly for complex pipelines.</p>
</li>
<li><p><strong>Database Connectivity Improvements:</strong> Airflow now interacts with metadata databases more efficiently, improving stability and reducing load on the backend.</p>
</li>
<li><p><strong>Upgrade Path Enhancements:</strong> Astronomer has simplified the upgrade path for both self-hosted and managed Airflow environments, making it easier for organizations to move from previous versions to 3.0.</p>
</li>
</ul>
<h2 id="heading-advantages-that-elevate-the-experience">Advantages That Elevate the Experience</h2>
<p>The improvements in Airflow 3.0 offer several benefits. Both the architectural changes and the updated UI enhance observability and performance, giving engineers better insight into the health of their workflows through logs, metrics, and task statuses.</p>
<p>Airflow’s maturity continues to be one of its biggest strengths. With wide adoption across industries and robust community support, organizations can confidently deploy and scale their pipelines using a platform with proven reliability. Integration with other tools, like dbt (data build tool), remains strong, allowing users to orchestrate transformations seamlessly in concert with data ingestion and extraction workflows (see: <a target="_blank" href="https://medium.com/google-cloud/execute-dbt-with-airflow-and-cloud-run-fdf02571f394">Execute DBT with Airflow and Cloud Run</a>).</p>
<h2 id="heading-pain-points-that-still-remain">Pain Points That Still Remain</h2>
<p>Despite the significant improvements introduced in Airflow 3.0, the platform still carries several challenges that may hinder its adoption and use. One major issue is its <strong>steep learning curve</strong>, which remains a roadblock for many teams. Configuring DAGs, setting up deployment infrastructure, and troubleshooting failures often require deep technical expertise and careful coordination between multiple components.</p>
<p>Running Airflow at scale continues to be <strong>a resource-intensive undertaking,</strong> particularly for organizations managing hundreds or thousands of orchestrated tasks per day. Additionally, while Airflow 3.0 has taken strides forward, it still doesn’t fully embrace <strong>streaming workflows</strong>.</p>
<p>Historically, Airflow has faced criticisms related to scheduler instability, DAG deadlocks, and challenges with local testing—problems often discussed in forums like <a target="_blank" href="https://www.reddit.com/r/dataengineering/comments/1ijtt2b/why_dagster_instead_airflow/">this Reddit thread</a>. Although some of these issues have been alleviated in version 3.0, others persist and remain in areas that require further focus and improvement.</p>
<h3 id="heading-challenges-specific-to-airflow-30">Challenges Specific to Airflow 3.0</h3>
<p>While Airflow 3.0 introduces several new features, many are underdeveloped and fall short of being ready for seamless use. For instance, the implementation of <strong>Asset adapters</strong> and <strong>Event-based triggers</strong> appears incomplete. These much-anticipated features lack the polish needed for a seamless developer experience, supporting only a limited number of connectors out of the box. Similarly, the UI for visualizing <strong>Assets</strong> and <strong>DAGs</strong> can be confusing, leaving users struggling to fully understand or manage the relationships between these components.</p>
<p>Another pain point relates to <strong>DAG versioning</strong>, which cannot effectively track changes to <strong>dbt models</strong> orchestrated via Cosmos, a capability that many teams would find invaluable.</p>
<p>The overall <strong>developer experience</strong> in Airflow 3.0 also continues to feel clunky. From initial setup to debugging, developers starting out with Airflow may struggle with the platform’s tools and processes. This is coupled with the <strong>complex nature of deployments</strong>, which are especially challenging in advanced scenarios, such as setting up Airflow on Kubernetes or managing hybrid-cloud environments.</p>
<p>One of the most noticeable drawbacks of Airflow 3.0 is the <strong>insufficient or missing documentation</strong> for its new features. Teams are left without clear guidance on several critical areas, such as:</p>
<ul>
<li><p>How to <strong>connect Assets</strong> with classic DAGs.</p>
</li>
<li><p>How to implement and use <strong>Asset adapters</strong>.</p>
</li>
<li><p>How to configure and utilize <strong>Event-based triggers</strong> effectively.</p>
</li>
</ul>
<p>This lack of comprehensive documentation creates a barrier for teams trying to explore and adopt the new functionalities introduced in Airflow 3.0. Without practical examples or detailed tutorials, understanding the usage of these features from the source code can be cumbersome.</p>
<h2 id="heading-comparing-airflow-30-to-other-orchestration-tools">Comparing Airflow 3.0 to Other Orchestration Tools</h2>
<p>In the rapidly evolving world of orchestration tools, how does Airflow 3.0 stack up?</p>
<p><strong>Prefect</strong> has long marketed itself as a more Pythonic and lightweight alternative to Airflow. It offers seamless local development, easy debugging, and a simple function-based interface. For teams seeking quick setup and a modern developer experience, Prefect is hard to beat. However, Prefect may fall short in highly complex enterprise environments where Airflow’s fine-grained scheduling and extensibility are still unmatched. As this <a target="_blank" href="https://medium.com/@lasyachowdary1703/day-19-prefect-vs-apache-airflow-choosing-the-right-data-orchestration-tool-12c8e47c58c6">comparison</a> notes, while Prefect is easier to use, Airflow’s integrations and broader community support make it more suitable for complex, regulated workflows.</p>
<p><strong>Dagster</strong>, by contrast, places a heavy emphasis on data assets and observability. It provides a more modular, testable, and development-friendly interface for building pipelines. Dagster’s partitioning mechanism, better support for local development, and clear delineation between production and staging make it a favorite among data teams with engineering-heavy workflows. Airflow 3.0 narrows this gap considerably with its asset-centric features and improved developer tooling, but Dagster still feels more modern in its architecture. Still, the smaller community and the missing RBAC support for the open source version might be a deal breaker for some projects.</p>
<p>Other tools like <strong>Kestra</strong>, <strong>Shipyard</strong>, and <strong>DataChannel</strong> each have their own niches, often targeting ease-of-use or native SaaS integrations, but they don’t yet match Airflow’s flexibility. <strong>Azure Data Factory (ADF)</strong> offers strong native integration with the Azure ecosystem and is ideal for Microsoft-heavy shops, though it lacks the open-source extensibility that defines Airflow.</p>
<h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Apache Airflow 3.0 is a big release that reaffirms the platform’s relevance and adaptability. It addresses many of the performance, usability, and reliability issues that have long plagued the system, while introducing forward-looking features that make it competitive with modern orchestrators like Prefect and Dagster.</p>
<p>While the platform still demands operational expertise and thoughtful architecture to run effectively, it now offers a significantly better out-of-the-box experience. For organizations that need to scale complex workflows with a high degree of control and visibility, Airflow 3.0 offers one of the most mature and capable orchestration solutions on the market today.</p>
<p>As mentioned, there are areas that still need attention from the Airflow community. The immaturity of new features, the incomplete documentation, and the ongoing challenges with deployment complexity and usability can create frustration for teams. For organizations already invested in Airflow, these might be manageable issues, but for those evaluating orchestration tools, they could serve as deterrents. In some cases, tools like Dagster or Prefect, with their more intuitive workflows and seamless integrations, may still be more attractive alternatives. Airflow 3.0 has set a strong foundation for future growth, but there is still work to be done to make it a truly developer-friendly and scalable solution.</p>
]]></content:encoded></item><item><title><![CDATA[Oh yes, new Angular Material 19! Wait, oh NO!]]></title><description><![CDATA[Yes, a new Angular Material 19 is now available to complement Angular 19 itself. As always, I'm eager to update as soon as possible because we all want to see it shine!
While I have no doubt that the new Angular Material is great, there is one aspect...]]></description><link>https://engineering.cloudflight.io/oh-yes-new-angular-material-19-wait-oh-no</link><guid isPermaLink="true">https://engineering.cloudflight.io/oh-yes-new-angular-material-19-wait-oh-no</guid><category><![CDATA[material theme]]></category><category><![CDATA[Angular]]></category><category><![CDATA[Material Design]]></category><category><![CDATA[material]]></category><category><![CDATA[prefers color scheme]]></category><category><![CDATA[macOS]]></category><category><![CDATA[iOS]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Ondrej Oravčok]]></dc:creator><pubDate>Fri, 28 Feb 2025 11:19:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/vPXP2Kgo_rY/upload/13c2104dd27cabd33e287ad0bdd24011.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Yes, a new Angular Material 19 is now available to complement Angular 19 itself. As always, I'm eager to update as soon as possible because we all want to see it shine!</p>
<p>While I have no doubt that the new Angular Material is great, there is one aspect that might surprise you later. In this article, I would like to share with you my experience.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739279920344/b04cd09a-c39b-405c-bfd8-81dc4c7f55eb.png" alt="ondrej shakespeare: to light or to dark? that is the question" class="image--center mx-auto" /></p>
<h2 id="heading-is-it-easy-to-update-to-material-19-from-the-previous-version">Is it easy to update to Material 19 from the previous version?</h2>
<p>No, not at all. But we got used to it recently, didn't we? Joking aside, there is a guide and also a migration script you can run via <code>ng update</code>, but some manual intervention will definitely be needed.</p>
<h2 id="heading-angular-material-19-what-is-so-different">Angular Material 19 - What is so different?</h2>
<p>In the new version, it is the theming that makes the difference. Theming is what "<em>lets you customize colors and typography</em>"[<a target="_blank" href="https://material.angular.io/guide/theming">1</a>] in your app, and while Angular Material 17 works with M2 (Material Design 2), Angular Material 18 and 19 work with M3 (Material Design 3).</p>
<p>I really think the new M3 works pretty well, and to help people understand the idea behind it, the documentation often helpfully compares M3 to M2 when explaining concepts like <a target="_blank" href="https://m3.material.io/styles/elevation/overview#f9947307-4818-4d94-b98a-fa1cb5498eb1">color roles or elevation effects.</a></p>
<h2 id="heading-past-and-present-practical-examples">Past and Present - practical examples</h2>
<p>How we used to define themes in Angular Material 17:</p>
<pre><code class="lang-scss"><span class="hljs-keyword">@use</span> <span class="hljs-string">'@angular/material'</span> as mat;

<span class="hljs-variable">$my-primary</span>: mat.define-palette(mat.<span class="hljs-variable">$indigo-palette</span>, <span class="hljs-number">500</span>);
<span class="hljs-variable">$my-accent</span>: mat.define-palette(mat.<span class="hljs-variable">$pink-palette</span>, A200, A100, A400);

<span class="hljs-variable">$theme</span>: mat.define-light-theme((
 color: (
   primary: <span class="hljs-variable">$my-primary</span>,
   accent: <span class="hljs-variable">$my-accent</span>,
 ),
 // typography + density
));
</code></pre>
<p>How we used to define themes in Angular Material 18:</p>
<pre><code class="lang-scss"><span class="hljs-keyword">@use</span> <span class="hljs-string">'@angular/material'</span> as mat;

<span class="hljs-variable">$theme</span>: mat.define-theme((
  color: (
    theme-type: dark,
    primary: mat.<span class="hljs-variable">$violet-palette</span>,
  ),
  // typography + density
));
</code></pre>
<p>How it is done now in Angular Material 19:</p>
<pre><code class="lang-scss"><span class="hljs-keyword">@use</span> <span class="hljs-string">'@angular/material'</span> as mat;

<span class="hljs-selector-tag">html</span> {
  <span class="hljs-attribute">color</span>-scheme: light dark; <span class="hljs-comment">// can be light, dark, or both (like here)</span>
  <span class="hljs-keyword">@include</span> mat.theme((
    color: mat.<span class="hljs-variable">$violet-palette</span>,
    // typography + density
  ));
}
</code></pre>
<p>It may not look that different at first glance, but there are <strong>2 key things:</strong></p>
<ul>
<li><p>in v17 and v18 we store the created theme in the Sass variable <code>$theme</code>; in v19 we do not</p>
</li>
<li><p>v19 uses <code>color-scheme</code>, whereas in v17 and v18 the theme itself is either <code>light</code> or <code>dark</code></p>
</li>
</ul>
<h2 id="heading-sass-variable-theme-is-not-needed-anymore">SASS variable $theme is not needed anymore</h2>
<p>Prior to v19, referencing themes and colors was not ideal. Anytime we needed to use a color from a theme, we had to import the created theme as well as the Material tools, and then use them to extract the color:</p>
<pre><code class="lang-scss"><span class="hljs-keyword">@use</span> <span class="hljs-string">'@angular/material'</span> as mat;
<span class="hljs-keyword">@use</span> <span class="hljs-string">'theme'</span> as theme;

<span class="hljs-selector-class">.color-primary</span> {
  <span class="hljs-attribute">color</span>: mat.get-color-from-palette(map-get(<span class="hljs-variable">$theme</span>, primary));
}
</code></pre>
<p>What happens inside the v19 theme definition is the creation of globally accessible custom properties, so the same code looks like this:</p>
<pre><code class="lang-scss"><span class="hljs-selector-class">.color-primary</span> {
  <span class="hljs-attribute">color</span>: var(--mat-sys-primary);
}
</code></pre>
<p>This is much nicer as we do not need a huge amount of references and <code>@import</code>/<code>@use</code>/<code>@forward</code> rules just to get some color. You can read about all possible color roles in the <a target="_blank" href="https://m3.material.io/styles/color/roles#e9fc5b00-8355-4641-b35f-58b0bac639f3">M3 documentation</a>.</p>
<h2 id="heading-css-color-scheme-vs-explicit-lightdark">CSS color-scheme vs explicit light/dark</h2>
<p><code>color-scheme</code> is a CSS property that lets HTML elements indicate which color scheme they should be rendered with: <code>dark</code> or <code>light</code>.</p>
<p>Although this is nothing new, I often find this feature missing, leaving me struggling with dark pages during the day or bright screens at night. In my personal opinion this can give the user a much better experience, and here we have all the tools to do it, so please:</p>
<blockquote>
<p>Never hardcode colors!!!</p>
</blockquote>
<p>Now we all know that we can easily create our pages to support light/dark themes automatically, but then why is there “Oh NO“ in the title?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1739280103193/a1fb9074-f526-497e-ae11-47befc28c909.png" alt="ondrej anakin: is everything fine with this color scheme?" class="image--center mx-auto" /></p>
<p>Yes, everything is fine with this color-scheme; the reason is a CSS function called <code>light-dark</code>, on which all this functionality is based.</p>
<h2 id="heading-light-dark">light-dark</h2>
<p><code>light-dark</code> is a CSS function that simply selects 1 color out of 2 based on - yes, you are right - <code>color-scheme</code>. So if you inspect your browser, you will see that <code>var(--mat-sys-primary)</code> (which we used to set the color) does not hold <code>#005cbb</code>, but <code>light-dark(#005cbb, #abc7ff)</code>.</p>
<p>All is well unless you check the <a target="_blank" href="https://developer.mozilla.org/en-US/docs/Web/CSS/color_value/light-dark#browser_compatibility">compatibility table</a> for the <code>light-dark</code> function. I did not do that. Until someone opened my page on a MacBook and reported a bug.</p>
<p>Long story short - <code>light-dark</code> on Apple devices is only available from iOS/Safari 17.5, so the probability that someone does not have the latest iOS is high.</p>
<h1 id="heading-it-was-a-nice-try-but-should-we-revert-now">It was a nice try, but should we revert now?</h1>
<p>No, there is still a way!</p>
<p>We can still combine the old way of defining themes and reading scheme preferences with the new Angular Material 19. My solution to this problem was the following:</p>
<pre><code class="lang-scss"><span class="hljs-selector-tag">html</span> {
  <span class="hljs-keyword">@include</span> mat.theme((
    color: (
      primary: mat.<span class="hljs-variable">$violet-palette</span>,
      theme-type: light,
    ),
    // typography <span class="hljs-keyword">and</span> density
  ));
}

<span class="hljs-keyword">@media</span> (prefers-color-scheme: dark) {
  <span class="hljs-selector-tag">html</span> {
    <span class="hljs-keyword">@include</span> mat.theme((
      color: (
        primary: mat.<span class="hljs-variable">$violet-palette</span>,
        theme-type: dark,
      ),
      // typography <span class="hljs-keyword">and</span> density
    ));
  }
}
</code></pre>
<p>Thanks to a CSS media feature called <code>prefers-color-scheme</code>, we can achieve exactly the same behavior with far fewer compatibility issues, as this feature is supported all the way back to iOS 13.</p>
<p>The reason is that if we define the theme with a single explicit <code>theme-type</code>, globally accessible properties like <code>var(--mat-sys-primary)</code> will not evaluate to a function holding both colors like <code>light-dark(#005cbb, #abc7ff)</code>, but to a color directly, like <code>#005cbb</code>.</p>
<h1 id="heading-please-do-not-test-in-production">Please do (not) test in production!</h1>
<p>After the time I invested in switching all the M2 code to M3, I was pretty disappointed to run into such a problem, but thanks to my great colleagues at <a target="_blank" href="https://career.cloudflight.io/jobs">Cloudflight</a>, we were able to quickly find a good solution to this problem while still staying with Angular Material 19.</p>
<p>Although <strong>no users were harmed during this experiment</strong>, please, do not test in production. It's dark. But yeah, dark is not bad if the user requested it like that 😁😁😁</p>
<p>Thanks for reading.</p>
]]></content:encoded></item><item><title><![CDATA[How to do effective Requirements Engineering in Fixed-Price and Fixed-Scope Projects]]></title><description><![CDATA[If you're working as a service provider for companies looking to outsource their product development, you may encounter customers who have a set of requirements from the beginning and are ready to negotiate a fixed price right from the start. When me...]]></description><link>https://engineering.cloudflight.io/how-to-do-effective-requirements-engineering-in-fixed-price-and-fixed-scope-projects</link><guid isPermaLink="true">https://engineering.cloudflight.io/how-to-do-effective-requirements-engineering-in-fixed-price-and-fixed-scope-projects</guid><category><![CDATA[fixedprice]]></category><category><![CDATA[requirements engineering]]></category><category><![CDATA[requirements]]></category><category><![CDATA[pros and cons]]></category><category><![CDATA[projects]]></category><category><![CDATA[project management]]></category><dc:creator><![CDATA[Róbert Kovács]]></dc:creator><pubDate>Thu, 30 Jan 2025 08:04:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/5fNmWej4tAA/upload/311a5bbecf037939377ecc51346429b2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you're working as a service provider for companies looking to outsource their product development, you may encounter customers who have a set of requirements from the beginning and are ready to negotiate a fixed price right from the start. When meeting such a customer, it's crucial to handle the situation so that the customer is fully satisfied and you can stay within the budget.</p>
<p>This blog post aims to serve as both an introduction and a set of suggestions to help you manage such projects and customers, regardless of whether you are a Requirements Engineer, Business Analyst, or Project Manager. On the other hand, it aims to give customers a bit of insight into how fixed-price projects are best handled, so that they can collaborate more effectively.</p>
<p>Before we get into it, a quick introduction to the subject matter is needed.</p>
<h1 id="heading-what-does-fixed-priced-mean">What does Fixed Price mean?</h1>
<p>A fixed price software project is a type of contractual agreement where the client and the service provider agree on a set price for the entire project before any work begins. This price remains constant regardless of the actual time or resources expended to complete the project. The scope, deliverables, and timelines are clearly defined and agreed upon in advance.</p>
<h1 id="heading-pros-and-cons">Pros and cons</h1>
<p>There are several benefits and risks when getting into such a project:</p>
<h2 id="heading-pros">Pros</h2>
<ul>
<li><p>Customer defines scope in advance and knows exactly what they will receive</p>
</li>
<li><p>Customer knows the timing of the delivery, and can plan work around that delivery</p>
</li>
<li><p>If the estimation is accurate, the service provider knows how many resources they need to allocate</p>
</li>
<li><p>Simple, low-complexity projects with a short time frame are a great fit, as the fixed scope helps keep the focus and achieve the end goal</p>
</li>
</ul>
<h2 id="heading-cons">Cons</h2>
<ul>
<li><p>Delivery and value of the software is fully dependent on the quality of the predefined requirements and deliverable descriptions</p>
</li>
<li><p>Little room for including additional functionalities that are discovered during development</p>
</li>
<li><p>Fixed Price projects are usually handled with a Waterfall approach which has proven to be extremely risky, especially for the customer</p>
</li>
<li><p>Cost estimation is fully based upon an initial understanding of the customer needs. During development this understanding may evolve, thus requiring more effort</p>
</li>
<li><p>Delays have a compounding effect. If the project plan assumes the acceptance of a deliverable by a specific point in time which is not met, the whole project timeline may shift</p>
</li>
<li><p>The fixed scope nature makes it difficult to adapt to changes in the business environment</p>
</li>
<li><p>Large scale projects have too many risks associated</p>
</li>
</ul>
<p>After looking at all these pros and cons, several steps can be taken to manage them well.</p>
<h1 id="heading-project-lifecycle">Project Lifecycle</h1>
<p>Below you can find several suggestions on how to support the project setting throughout the Project Lifecycle.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1738223887318/e1e2aaae-01f8-4e94-a724-f74080fbd4d0.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-estimations">Estimations</h1>
<p>As with all software projects, an accurate estimation means a successful delivery, a happy customer, and a project that stays within the planned budget.</p>
<p>Depending on the scope size, however, estimation accuracy may decrease, in which case buffers have to be introduced to cover the increased risk.</p>
<p>In order to manage that, the following measures can be taken:</p>
<ul>
<li><p>Splitting of the scope into multiple smaller scopes that can be more easily handled, estimated and delivered</p>
</li>
<li><p>Separating the requirements of the customer into risk categories, and either descoping or closely managing high-risk requirements</p>
</li>
<li><p>In case the scope is too big and separation is not possible, additional budget can be negotiated for design and/or in-depth analysis of requirements. This should provide increased insight into the scope, enabling a more accurate estimation</p>
</li>
<li><p>If none of the above is possible, it should be communicated to the customer that a safety buffer has to be applied to manage all the unforeseen risks.</p>
</li>
</ul>
<p>A successful estimation is one where you, as Requirements Engineer, feel confident in having an extremely good understanding of what the scope entails and both you and the customer are satisfied with the budget.</p>
<p>Additionally to understanding the scope, consider effort for the following tasks:</p>
<ul>
<li><p>Project Setup</p>
</li>
<li><p>Testing</p>
</li>
<li><p>Meetings</p>
</li>
<li><p>Documentation</p>
</li>
<li><p>Clarifications, negotiations, defect and requirement analysis</p>
</li>
</ul>
<p>In our experience this usually adds 30-50% of additional time, depending on the customer. For example, a scope estimated at 100 person-days would be quoted at roughly 130-150 person-days in total.</p>
<h1 id="heading-design-stage">Design Stage</h1>
<p>Once the customer has agreed to the Cost Estimation of the work scope, the Design can start.</p>
<p>During the design phase, alongside the usual design process of clarification, backlog planning, and user story preparation, it is crucial for an RE to document and clearly communicate what the customer <strong>will not receive</strong>. It's also important to record their agreement to this. This will make life during delivery exponentially easier for both parties.</p>
<p>As the requirements received from the customer are the cornerstones of the design, a core part of the Design Stage is making sure that there is a common agreement on what those requirements mean, usually in a documented form to further reduce potential misunderstandings, and also to crystallize the scope and design.</p>
<p>Depending on the quality of the initially drafted requirements, a process of rewriting, nomenclature alignment, and acceptance criteria polishing is advised.</p>
<p>The customer may or may not have a product owner that can support during planning. The luxury of Fixed Price, Fixed Scope projects is that the order of implementation is usually not imposed by the customer. This way there is room for prioritization not by value (as all requirements are equally valuable in terms of the contract), but by logical, technical dependencies to make implementation as streamlined as possible.</p>
<p>During the design stage it can also be a good idea to already discuss the process for the customer requesting additional scope. Such requests will inevitably happen once the customer starts using different draft versions of the application and realizes that the requirements did not exactly reflect what they wanted, due to human error, interpretation issues, language barriers, etc.</p>
<p>In these situations it is important to</p>
<ul>
<li><p>Reassure the customer of your willingness to provide the best software product</p>
</li>
<li><p>Highlight the limitations of the contract</p>
</li>
<li><p>Discuss the next steps needed to deliver the requested feature, e.g. in a separate contract or by descoping something else</p>
</li>
<li><p>Refer to the process you have agreed upon in the Design Stage</p>
</li>
</ul>
<p>Further details about handling scope changes will be explained below.</p>
<h1 id="heading-delivery">Delivery</h1>
<h2 id="heading-waterfall-vs-agile">Waterfall vs Agile</h2>
<p>An assumption that is usually made when discussing the delivery of fixed price, fixed scope projects is that the customer has communicated everything, you have all the requirements available, thus customer involvement is not necessary until the full scope has been implemented.</p>
<p>As proven thousands of times, this waterfall approach does not work well in software projects due to the sheer complexity these products usually have.</p>
<p>If the customer wants to have exactly what they want, the way they want it, they will need to involve themselves actively during development. Usually, a Product Owner can support the development process with in-depth knowledge, which can significantly speed up decision-making. Alternatively, a representative of the end users should be available to the development team to assist in making informed decisions that best serve the customer.</p>
<h2 id="heading-delivery-process">Delivery Process</h2>
<p>In the process of delivery, the usual Agile approach shall be taken with a few additional pre-steps and post-steps.</p>
<p>The usual Agile approach with Scrum consists of the four agile ceremonies: Sprint Planning, Daily Standup, Sprint Review, Sprint Retrospective. These should be enough to facilitate a healthy iterative delivery to fully handle and develop the backlog items.</p>
<p>Next to this, the following auxiliary meetings are advised:</p>
<ul>
<li><p>Before the sprint planning, review the scope of the requirements and clarify both what the customer will receive and what they will not</p>
</li>
<li><p>A weekly meeting with the customer to discuss any additional open topics, questions or defects</p>
</li>
<li><p>A weekly meeting with the decision maker on the defects, potential additional scope, project health</p>
</li>
</ul>
<p>It can also be of great benefit to provide the customer with mockups or small PoC solutions as early as possible, so that they can decide whether the approach you would like to implement will actually work for them. Help the customer by also highlighting what the consequences could be, both positive and negative, so they can make informed choices.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1737983599621/f4708059-b5b8-4114-90a9-5e04d31e6c16.png" alt="Scrum Ceremonies extended with Auxiliary meetings" class="image--center mx-auto" /></p>
<h2 id="heading-defects-handling">Defects Handling</h2>
<p>After each iteration of delivery, it is expected that the customer tests the delivered requirements. During this testing process, defects can and will appear, falling into the following categories:</p>
<ul>
<li><p>Defects that clearly don’t satisfy one or more requirements</p>
</li>
<li><p>Defects that don’t satisfy the spirit of a requirement</p>
</li>
<li><p>Defects that are additional scope</p>
</li>
</ul>
<p>Whoever analyzes these defects should first assign each defect to one of these buckets, and then provide an appropriate answer:</p>
<ul>
<li><p>In the case of clear defects, they should be planned for a further Sprint</p>
</li>
<li><p>In the case of grey-area defects, a discussion shall be had on value versus cost. If the cost-to-value ratio seems too high, the defect needs to be descoped</p>
</li>
<li><p>Defects that are additional scope, or new feature requests should be assigned to a later contract.</p>
</li>
</ul>
<p>Although intuitively we want to deliver the best software we can for our customer and solve all their problems, there simply is not enough budget to do that.</p>
<h1 id="heading-contract-closure">Contract Closure</h1>
<p>Hopefully, by following a rigorous delivery approach, you were able to deliver the full scope within budget. Depending on the timeframe of the engagement, several outputs and outcomes were achieved, upon which you should be able to build:</p>
<p>Outputs:</p>
<ul>
<li><p>Documentation</p>
</li>
<li><p>Software Code and functionality</p>
</li>
<li><p>Testing supporting the functionality (either history of testing or automated test plans)</p>
</li>
<li><p>Backlog of unresolved defects / tasks that have been defined out of scope</p>
</li>
</ul>
<p>Outcomes:</p>
<ul>
<li><p>A well-defined, well-oiled delivery process</p>
</li>
<li><p>Great interaction and involvement from the customer</p>
</li>
<li><p>Understanding of additional effort needed next to the requirements</p>
</li>
</ul>
<p>You can use these as inputs to the following steps:</p>
<ul>
<li><p>The backlog of defects/tasks can be an opportunity for a new contract to polish the application and apply improvements</p>
</li>
<li><p>Building upon the established processes, the customer can rest assured that delivery will be quicker than with other providers</p>
</li>
<li><p>Future estimations will be more accurate with the same customer</p>
</li>
</ul>
<h1 id="heading-closing-thoughts">Closing Thoughts</h1>
<p>Whenever working in such a restricted context, it is important to always be reminded of the rules of the game. Although it is natural human behavior to support the customer and make their lives easy, this may be counterproductive to the goals of finishing on scope and in budget.</p>
<p>Being able to provide affordable alternatives and handle difficult situations in Fixed Price projects is the bread and butter of a requirements engineer. The customer may request something completely outside the scope, and it is essential to stay open-minded and try to support them with ideas that still fit into the scope. As a last resort, there is always the option of a well-framed “no” that reminds everyone of the limitations of the contract setting.</p>
<p>Agreements on processes should be made early to avoid surprises and negative emotions on either side.</p>
<p>Handling such projects can be tough but extremely rewarding if you can stick to the initial planning.</p>
]]></content:encoded></item><item><title><![CDATA[Language Bias in Multilingual Semantic Search Systems]]></title><description><![CDATA[In the fast-changing field of natural language processing (NLP), semantic search has become crucial for applications like chatbots and fact-checkers. This technology can also enhance search engines, allowing users to find relevant information by aski...]]></description><link>https://engineering.cloudflight.io/language-bias-in-multilingual-semantic-search-systems</link><guid isPermaLink="true">https://engineering.cloudflight.io/language-bias-in-multilingual-semantic-search-systems</guid><category><![CDATA[natural language processing]]></category><category><![CDATA[nlp]]></category><category><![CDATA[semantic search]]></category><category><![CDATA[Bias]]></category><category><![CDATA[multilingual]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[contrastive learning]]></category><category><![CDATA[NLU]]></category><category><![CDATA[Natural language understanding ]]></category><dc:creator><![CDATA[Johannes Vass]]></dc:creator><pubDate>Thu, 09 Jan 2025 12:27:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736419885930/223edbb0-41ab-414f-9a69-3f9d34e8dad9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-changing field of natural language processing (NLP), semantic search has become crucial for applications like chatbots and fact-checkers. This technology can also enhance search engines, allowing users to find relevant information by asking questions in natural language. Embedding models are central to this, converting text into high-dimensional vectors that capture the meanings of words and phrases. This helps computers find the most relevant information by comparing similar vectors.</p>
<p>Cloudflight has a year of experience fine-tuning these models to make content retrieval for our customers more relevant. For example, we worked with the Austrian Press Agency (APA) on custom embedding models to improve knowledge discovery in the fast-paced news industry. In collaboration with the G39 group of European news agencies we are now making the developed system multi-lingual to allow users to retrieve relevant news in nine languages by querying in any language, making it easier to stay informed of reporting across different news agencies.</p>
<p>Multilingual semantic search isn't limited to news; it can also be useful in areas like international e-commerce or booking platforms. However, creating a truly multilingual system is challenging because models often prioritize results in the query's language. In this blog post, we aim to highlight this issue, explain it, and discuss how we mitigated it.</p>
<h2 id="heading-what-is-language-bias">What is Language Bias?</h2>
<p>Language bias occurs when multilingual semantic search systems prioritize results in the same language as the user query over results in other languages, even if the latter are more relevant. This bias is problematic, especially when the answer to a query is not available in the query language but exists in another language. In such cases, the system may return many irrelevant texts in the query language before showing the relevant ones in a different language. The term language bias was coined by Roy et al. (2020) in the paper <em>LAReQA: Language-agnostic answer retrieval from a multilingual pool</em> [2]. The authors define cross-lingual alignment and describe two scenarios – weak and strong alignment:</p>
<p><strong><em>Weak Alignment</em></strong><em>: For any item in language L1, the nearest neighbour in language L2 is the most semantically relevant item.</em></p>
<p><strong><em>Strong Alignment</em></strong><em>: For any item, all semantically relevant items are closer than all irrelevant items, regardless of their language. Crucially, relevant items in different languages are closer than irrelevant items in the same language.</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729237368878/0d870164-c589-43fc-bf52-f64c3c827cdd.png" alt class="image--center mx-auto" /></p>
<p><em>Graphic taken from LAReQA: Language-agnostic answer retrieval from a multilingual pool [2]</em></p>
<p>Weakly aligned multilingual embedding models work well when we want to use the same embedding model for several monolingual content databases (e.g. hosting a single embedding endpoint while keeping the content databases in different languages separate), or when we want to search in one language (e.g. German) in a French content database without translating the query.</p>
<p>However, if you need to perform semantic search in a content database containing texts in different languages, only models with strong alignment will produce results that are free of language bias, thus making sure that the most relevant candidates are ranked highest and not the ones matching the query language.</p>
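To make the distinction concrete, the strong-alignment condition can be checked directly on a set of embeddings. The sketch below uses hand-made toy vectors rather than a real model; the last vector component plays the role of a leaked language signal:

```python
from math import sqrt

def cosine_sim(a, b):
    """Plain cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def is_strongly_aligned(query, relevant, irrelevant):
    """Strong alignment: every relevant item, in any language, must be
    closer to the query than every irrelevant item, in any language."""
    worst_relevant = min(cosine_sim(query, emb) for emb, _lang in relevant)
    best_irrelevant = max(cosine_sim(query, emb) for emb, _lang in irrelevant)
    return worst_relevant > best_irrelevant

# Toy vectors: axes 0/1 carry the topic, axis 2 leaks the language
# (English items share a high value there, German items do not).
query = [0.6, 0.0, 0.8]                       # English query about topic A
relevant = [([0.55, 0.1, 0.82], "en"),        # English answer, topic A
            ([0.95, 0.1, 0.0], "de")]         # German answer, topic A
irrelevant = [([0.1, 0.6, 0.79], "en")]       # English text, topic B

# The same-language irrelevant text outranks the German answer,
# so this toy model is not strongly aligned.
print(is_strongly_aligned(query, relevant, irrelevant))
```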
<h2 id="heading-visualization-of-the-embedding-space">Visualization of the embedding space</h2>
<p>An easy way of detecting the presence of language bias in a multilingual model is to visualize the embeddings of a set of parallel sentences in two languages in 2D. That might look as follows, where the diagram shows embeddings of <a target="_blank" href="https://huggingface.co/intfloat/multilingual-e5-large">intfloat/multilingual-e5-large</a>, a very popular multilingual open-source model with language bias.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729237463352/0f6be583-d939-43c0-bf8e-4a35398f5ca0.png" alt class="image--center mx-auto" /></p>
<p>The diagram shows the distribution of the model’s embeddings, with German data in red and English data in blue. We can see that language strongly influences the embedding distribution: a clear separation between the coloured clusters indicates language bias.</p>
<p>In contrast, the following diagram shows blue and red points well interleaved, which means English and German embeddings map to the same semantic space and are not easily distinguished by language.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729237470172/4b28bb6e-ce75-4663-8d4e-80baeddd43b1.png" alt class="image--center mx-auto" /></p>
<p>Beware that this visualization technique is only an illustrative starting point and does not allow us to quantify the degree of alignment.</p>
<h2 id="heading-how-to-measure-it">How to measure it?</h2>
<p>To quantify and compare the language bias of different models, you can follow one of these approaches:</p>
<p>One option is to evaluate your model on the LAReQA dataset as described in the paper <em>LAReQA: Language-agnostic answer retrieval from a multilingual pool</em> by Roy et al [2]. Besides a method for quantifying the bias, the paper proposes a useful heat map visualization where you can see the degree of alignment between all language pairs. On the downside, the LAReQA dataset does not consist of a lot of languages.</p>
<p>As an alternative, you can follow the approach from <em>Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation</em> by Nils Reimers and Iryna Gurevych [1]. They propose to evaluate the model on a semantic textual similarity (STS) task  using a multilingual STS dataset and see how much worse the model performs when you don’t test on every language individually but on a candidate set with all languages at once.</p>
<p>In addition to quantifiable measures, it’s also advisable to test for language bias manually. Select a couple of queries, translate them to different languages and issue a search for each translation. You should find approximately the same results for each translation of the same query. If you see results following the language of your query instead, you have a biased model without strong alignment.</p>
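This manual check is easy to script. The sketch below uses a deliberately biased set of toy vectors (again hand-made, with the last axis leaking the language) to reproduce the failure mode: the translated query retrieves a different document than the original:

```python
from math import sqrt

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def top1(query, pool):
    """Return the id of the candidate most similar to the query."""
    return max(pool, key=lambda doc_id: cos(query, pool[doc_id]))

# Toy embeddings: axes 0/1 encode the topic, axis 2 leaks the language
# (German texts share a high value there).
pool = {
    "solar_article_en": [1.0, 0.0, 0.0],
    "lunar_article_de": [0.1, 0.6, 0.8],
}
query_en = [0.9, 0.1, 0.0]   # e.g. "solar eclipse"
query_de = [0.6, 0.1, 0.8]   # the same query, translated to German

# An unbiased model would return the same document for both queries;
# here the German query follows the language instead of the topic.
print(top1(query_en, pool))  # solar_article_en
print(top1(query_de, pool))  # lunar_article_de
```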
<h2 id="heading-mitigating-language-bias">Mitigating language bias</h2>
<p>For the G39 project we built up large fine-tuning datasets from news articles from different countries to let the model learn different news contexts. Due to our experience with training monolingual embedding models we were confident that we could create a model that knows the current news context of multiple countries. However, another central question was whether this trained model would be usable for multilingual semantic search. We researched and brainstormed many ideas, but the solution was surprisingly simple in the end: the baseline training approach already produced satisfactory results. That is, we merged our nine large monolingual fine-tuning datasets into one consisting of queries and news article chunks in nine different languages (but each query-article pair still used only one language). Training on this dataset with a standard contrastive loss and in-batch negatives eliminated most of the language bias that we had measured and noticed in the base model.</p>
<p>Why does this work? We are still in the process of finding out, but we can already share two contributing factors: Firstly, having a large amount of high-quality multilingual fine-tuning data at our disposal, and secondly, having a considerable topic overlap between the different languages in the dataset, because a good part of the news is international.</p>
<p>We nicknamed our approach “brute-force mitigation”, since a large amount of high-quality training data evidently did the trick. If you don’t have large amounts of multilingual fine-tuning data at hand but perhaps only data in one language, the approach from Reimers and Gurevych [1] may be worth a look. The paper explains how to extend the capabilities of an existing embedding model to new languages with a training objective that targets equal embeddings in all languages (i.e. strong alignment).</p>
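For readers unfamiliar with the objective: below is a minimal pure-Python sketch of a contrastive loss with in-batch negatives. In practice this runs over large GPU batches via a library implementation (e.g. a multiple-negatives ranking loss); the vectors and the temperature value here are purely illustrative:

```python
from math import exp, log, sqrt

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

def in_batch_contrastive_loss(queries, docs, temperature=0.05):
    """Mean cross-entropy where, for query i, document i is the positive
    and every other document in the batch serves as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [cos(q, d) / temperature for d in docs]
        log_prob = scores[i] - log(sum(exp(s) for s in scores))
        total -= log_prob
    return total / len(queries)

# A mixed-language batch: same-topic query/chunk pairs are positives,
# while other-topic chunks in *any* language act as negatives. This is
# what pushes the embeddings toward language-agnostic clusters.
queries = [[1.0, 0.0], [0.0, 1.0]]   # e.g. one German and one French query
docs    = [[0.9, 0.1], [0.2, 0.8]]   # the matching news article chunks
print(in_batch_contrastive_loss(queries, docs))
```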
<h2 id="heading-references">References</h2>
<p>[1] Nils Reimers and Iryna Gurevych, 2020, Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation <a target="_blank" href="https://arxiv.org/pdf/2004.09813">https://arxiv.org/pdf/2004.09813</a></p>
<p>[2] Roy et al., 2020, LAReQA: Language-agnostic answer retrieval from a multilingual pool <a target="_blank" href="https://arxiv.org/abs/2004.05484">https://arxiv.org/abs/2004.05484</a></p>
<hr />
]]></content:encoded></item><item><title><![CDATA[Understanding the Essentials of Mobile Test Automation]]></title><description><![CDATA[Preface
When it comes to test automation in general, it's easy to say that it's complex. If we were to stick to the classic separation of the Test Pyramid, we'd end up with:
- Unit tests, which can be implemented for both backend and frontend.
- Inte...]]></description><link>https://engineering.cloudflight.io/understanding-the-essentials-of-mobile-test-automation</link><guid isPermaLink="true">https://engineering.cloudflight.io/understanding-the-essentials-of-mobile-test-automation</guid><category><![CDATA[test-automation]]></category><category><![CDATA[Webdriver.io]]></category><category><![CDATA[Mobile Test Automation]]></category><category><![CDATA[Mobile Development]]></category><category><![CDATA[mobile app development]]></category><category><![CDATA[React Native]]></category><category><![CDATA[Flutter]]></category><category><![CDATA[Cordova]]></category><category><![CDATA[Ionic Framework]]></category><category><![CDATA[#maui]]></category><category><![CDATA[Xcode]]></category><category><![CDATA[Android Studio]]></category><category><![CDATA[Espresso Testing]]></category><category><![CDATA[appium]]></category><category><![CDATA[Detox]]></category><dc:creator><![CDATA[Tomasz Buga]]></dc:creator><pubDate>Wed, 08 Jan 2025 08:03:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/n31x0hhnzOs/upload/f5d12ebbb9cc065b7669be0f1f9ca558.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-preface"><strong>Preface</strong></h2>
<p>When it comes to test automation in general, it's easy to say that it's complex. If we were to stick to the classic separation of the <a target="_blank" href="https://martinfowler.com/articles/practical-test-pyramid.html">Test Pyramid</a>, we'd end up with:</p>
<p>- <strong>Unit tests</strong>, which can be implemented for both <em>backend</em> and <em>frontend</em>.</p>
<p>- <strong>Integration tests</strong>, which today can mean anything from <em>API testing</em>, <em>microservices</em> and <em>third-party service</em> <em>integration</em>, <em>user authentication</em> and <em>authorization</em>, to <em>quasi-performance testing</em>.</p>
<p>- <strong>End-to-end tests</strong>, which for the vast majority of test automation engineers and software developers would mean web application testing (to name a few major frameworks: <a target="_blank" href="https://playwright.dev/"><strong>Playwright</strong></a>, <a target="_blank" href="https://www.cypress.io/"><strong>Cypress</strong></a>, <a target="_blank" href="https://www.selenium.dev/"><strong>Selenium</strong></a>).</p>
<p>There are also tools like <a target="_blank" href="https://docs.pact.io/"><strong>Pact</strong></a> for <em>contract testing</em>, <a target="_blank" href="https://storybook.js.org/"><strong>Storybook</strong></a> for <em>advanced</em> <a target="_blank" href="https://storybook.js.org/docs/writing-tests/visual-testing">visual</a> <em>and/or</em> <a target="_blank" href="https://storybook.js.org/docs/writing-tests/snapshot-testing/snapshot-testing">snapshot testing</a> combined with <em>interactive documentation</em>, and probably many more that I don't even know about.</p>
<p>However, these tools are well documented and have huge communities around them, and to be completely honest - I don't think any of these tests are <em>hard</em> (except maybe the <em>contract tests</em>).</p>
<p>When I started working on mobile test automation, I quickly realised that this was not the case. The more I got into mobile software development, the more confusing it became: the sheer number of dependencies and requirements just to get the application running was bizarre, let alone automating tests on top of it. The documentation on the various technologies was sometimes not very helpful, and there were no articles covering the full spectrum of possible mobile environments.</p>
<p>So let's fix that and talk about mobile test automation, shall we?</p>
<h2 id="heading-development-frameworks-overview">Development Frameworks Overview</h2>
<p>First, before we go any further, we need to consider what technologies will be used to build the applications that will be tested with the mobile testing frameworks. The popular frameworks for mobile development are:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Name</strong></td><td><strong>Maintainer</strong></td><td><strong>Technology</strong></td><td><strong>Supported Platforms</strong></td><td><strong>Product Page</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong><em>Flutter</em></strong></td><td>Google</td><td>Dart</td><td>Android / iOS / Web / Windows / macOS</td><td><a target="_blank" href="https://flutter.dev/">https://flutter.dev/</a></td></tr>
<tr>
<td><strong><em>React Native</em></strong></td><td>Meta</td><td>JavaScript</td><td>Android / iOS / Web</td><td><a target="_blank" href="https://reactnative.dev/">https://reactnative.dev/</a></td></tr>
<tr>
<td><strong><em>Ionic</em></strong></td><td>Ionic</td><td>JavaScript</td><td>Android / iOS / Web / Windows / macOS</td><td><a target="_blank" href="https://ionic.io/">https://ionic.io/</a></td></tr>
<tr>
<td><strong><em>NativeScript</em></strong></td><td>OpenJS Foundation</td><td>JavaScript</td><td>Android / iOS / Web</td><td><a target="_blank" href="https://nativescript.org/">https://nativescript.org/</a></td></tr>
<tr>
<td><strong><em>.NET MAUI</em></strong></td><td>Microsoft</td><td>C#</td><td>Android / iOS / Windows / macOS</td><td><a target="_blank" href="https://dotnet.microsoft.com/en-us/apps/maui">https://dotnet.microsoft.com/en-us/apps/maui</a></td></tr>
<tr>
<td><strong><em>Cordova</em></strong></td><td>Apache Software Foundation</td><td>JavaScript</td><td>Android / iOS / Web / Windows / macOS</td><td><a target="_blank" href="https://cordova.apache.org/">https://cordova.apache.org/</a></td></tr>
<tr>
<td><strong><em>Xcode</em></strong></td><td>Apple</td><td>Swift</td><td>iOS</td><td><a target="_blank" href="https://developer.apple.com/xcode/">https://developer.apple.com/xcode/</a></td></tr>
<tr>
<td><strong><em>Android Studio</em></strong></td><td>Google</td><td>Kotlin</td><td>Android</td><td><a target="_blank" href="https://developer.android.com/studio">https://developer.android.com/studio</a></td></tr>
</tbody>
</table>
</div><p>According to the <a target="_blank" href="https://survey.stackoverflow.co/2024/technology#1-other-frameworks-and-libraries">2024 Stack Overflow survey</a>, <strong><em>Flutter</em></strong> is the most popular framework for cross-platform development. It's followed by <strong><em>React Native</em></strong>, <strong><em>Electron</em></strong> (not listed in our table because it only supports desktop app development), <strong><em>.NET MAUI</em></strong> (and its older brother <strong><em>Xamarin</em></strong>, which has been replaced by <strong><em>MAUI</em></strong>), <strong><em>Ionic</em></strong>, and <strong><em>Cordova</em></strong> closing the list. Interestingly, there is no mention of <strong><em>NativeScript</em></strong> at all.</p>
<h2 id="heading-mobile-test-automation-frameworks-overview"><strong>Mobile Test Automation Frameworks Overview</strong></h2>
<p>Below is a general overview of the <em>test automation frameworks</em>. There are some other frameworks that I have intentionally left out because they do not seem to be very popular (e.g. <a target="_blank" href="https://github.com/google/EarlGrey/tree/earlgrey2"><strong>EarlGrey</strong></a>, which is Google's framework for iOS automation), and they are most likely wrappers of the technologies listed in the table anyway (e.g. <strong>Selendroid</strong> or any of the cloud mobile test automation providers).</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Name</strong></td><td><strong>Maintainer</strong></td><td><strong>Main Feature(s)</strong></td><td><strong>Supported Native Platforms</strong></td><td><strong>Product Page</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong><em>Espresso</em></strong></td><td>Google</td><td>Google’s default testing framework</td><td>Android</td><td><a target="_blank" href="https://developer.android.com/training/testing/espresso">https://developer.android.com/training/testing/espresso</a></td></tr>
<tr>
<td><strong><em>XCUITest</em></strong></td><td>Apple</td><td>Apple’s default testing framework</td><td>iOS / macOS</td><td><a target="_blank" href="https://developer.apple.com/documentation/xctest/user_interface_tests">https://developer.apple.com/documentation/xctest/user_interface_tests</a></td></tr>
<tr>
<td><strong><em>Appium</em></strong></td><td>OpenJS Foundation/ Open-source (<a target="_blank" href="https://github.com/appium/appium?tab=Apache-2.0-1-ov-file#Apache-2.0-1-ov-file">Apache-2.0 license</a>).</td><td><a target="_blank" href="https://github.com/appium/appium?tab=Apache-2.0-1-ov-file#Apache-2.0-1-ov-file">Supports all</a> of the mainstream platforms</td><td>Android, iOS, macOS, Windows, Roku, tvOS, Android TV</td><td><a target="_blank" href="https://appium.io/docs/en/latest/">https://appium.io/docs/en/latest/</a></td></tr>
<tr>
<td><strong><em>Detox</em></strong></td><td>Wix.com / Open-source (<a target="_blank" href="https://github.com/wix/Detox?tab=MIT-1-ov-file#readme">MIT</a>)</td><td>Tightly coupled with the React Native architecture</td><td>Android, iOS</td><td><a target="_blank" href="https://wix.github.io/Detox/">https://wix.github.io/Detox/</a></td></tr>
<tr>
<td><strong><em>Maestro</em></strong></td><td>Mobile.dev <strong>/</strong> Open-source (<a target="_blank" href="https://github.com/mobile-dev-inc/maestro?tab=Apache-2.0-1-ov-file#readme">Apache 2.0 License</a>)</td><td>Because its test files are plain <code>*.yaml</code> files, it’s probably the most effortless mobile test automation framework</td><td>Android, iOS</td><td><a target="_blank" href="https://maestro.mobile.dev/">https://maestro.mobile.dev/</a></td></tr>
</tbody>
</table>
</div><h3 id="heading-appium-based-frameworks-overview"><strong>Appium-based Frameworks Overview</strong></h3>
<p>From the above list, <strong>Appium</strong> can be considered the most comprehensive and therefore the most complicated framework of them all. In short, it consists of four basic elements: <strong>Appium Core</strong>, <strong>Drivers</strong>, <strong>Clients</strong> and <strong>Plugins</strong>.</p>
<p>We won't go into the details of what each element stands for, but to give you a better understanding of what technologies are supported by the <strong>Appium</strong> framework, here is a list of so-called <strong>Clients</strong>:</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Name</strong></td><td><strong>Technology</strong></td><td><strong>Product Page</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong><em>WebdriverIO</em></strong></td><td>JavaScript (Node.js)</td><td><a target="_blank" href="https://webdriver.io/">https://webdriver.io/</a></td></tr>
<tr>
<td><strong><em>Appium Python Client</em></strong></td><td>Python</td><td><a target="_blank" href="https://github.com/appium/python-client">https://github.com/appium/python-client</a></td></tr>
<tr>
<td><strong><em>Appium Java Client</em></strong></td><td>Java</td><td><a target="_blank" href="https://github.com/appium/java-client">https://github.com/appium/java-client</a></td></tr>
<tr>
<td><strong><em>AppiumLib</em></strong></td><td>Ruby</td><td><a target="_blank" href="https://github.com/appium/ruby_lib">https://github.com/appium/ruby_lib</a></td></tr>
<tr>
<td><strong><em>Appium .NET Client</em></strong></td><td>C#</td><td><a target="_blank" href="https://github.com/appium/dotnet-client/">https://github.com/appium/dotnet-client/</a></td></tr>
</tbody>
</table>
</div><h2 id="heading-scenario-based-guide-to-selecting-mobile-testing-frameworks"><strong>Scenario-Based Guide to Selecting Mobile Testing Frameworks</strong></h2>
<p>Choosing a framework for automating mobile end-to-end tests is one of the key decisions in a testing strategy, and it is quite complicated because it depends on many factors.</p>
<p>Let's do a little thought experiment and try to describe some possible scenarios to help you decide.</p>
<p><strong><em>TL;DR:</em></strong></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735644905798/aa9f459f-cd4f-4a6e-b349-0007b1ee027a.jpeg" alt class="image--center mx-auto" /></p>
<ul>
<li><p>When developers are using native development tools and want to maintain grey/white box end-to-end testing and/or you’re building an MVP: <strong>Espresso (for Android)</strong> / <strong>XCUITest (for iOS)</strong></p>
</li>
<li><p>If you're developing an MVP and you only need UI testing (<a target="_blank" href="https://maestro.mobile.dev/getting-started/installing-maestro#connecting-to-your-device">without physical iOS devices</a> and with <a target="_blank" href="https://maestro.mobile.dev/advanced/javascript">limited scripting possibilities</a>): <strong>Maestro</strong></p>
</li>
<li><p>If you're developing a cross-platform <strong>React Native</strong> application and want to improve test scripting experience with <a target="_blank" href="https://wix.github.io/Detox/docs/copilot/testing-with-copilot">Large Language Models</a>: <strong>Detox</strong></p>
</li>
<li><p>For long-term projects that require stable solutions: <strong>Appium-based</strong> platforms</p>
</li>
</ul>
<h3 id="heading-scenario-1-android-or-ios-only"><strong>Scenario 1: Android or iOS Only</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735645790059/8b1b2e17-7ab7-4e5b-accc-ae894285d679.jpeg" alt class="image--center mx-auto" /></p>
<p>The first scenario typically corresponds to the early stages of development (e.g. the seed stage of a startup), when the focus is on only one mobile platform. This means that the natural choice should be either <strong>Espresso</strong> (for Android) or <strong>XCUITest</strong> (for iOS).</p>
<p>However, there may be other things to consider, such as:</p>
<ol>
<li><strong>Resource-Constrained Environments:</strong></li>
</ol>
<p>For example, if the application is being developed on a tight schedule and/or budget, and the developers simply do not have the time to maintain end-to-end test suites, choosing the complex <strong>Espresso</strong> or <strong>XCUITest</strong> frameworks can end up being a disaster.</p>
<p>In this type of environment, I would suggest using <strong>Maestro</strong> as it seems to be the most hassle-free and does not require too much time and effort to maintain test suites and configuration.</p>
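To give a feeling for why Maestro is so low-effort: an entire flow is a small YAML file. The app id and on-screen texts below are placeholders, not taken from a real project:

```yaml
# login-flow.yaml — appId and the tapped texts are placeholders
appId: com.example.myapp
---
- launchApp
- tapOn: "Email"
- inputText: "user@example.com"
- tapOn: "Login"
- assertVisible: "Welcome"
```

Such a flow is run with <code>maestro test login-flow.yaml</code> against a connected emulator or device.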
<ol start="2">
<li><strong>Future Proofing for Long-Term Projects:</strong></li>
</ol>
<p>If your project's time and budget constraints are not the things that keep you up at night, then you should probably think a little more about the potential development path.</p>
<p>I want to emphasise that we should <strong><em>avoid over-engineering</em></strong>, but on the other hand, it's also important to choose a framework that we can use in a more mature test environment. By a more mature test environment I mean things like <em>automated test data generation and cleanup</em> (which can be achieved with <a target="_blank" href="https://engineering.cloudflight.io/test-automation-api-based-model">API-based model</a>) or implementation into the existing <em>CI/CD pipelines</em>. And of course we want these things to be relatively easy to implement.</p>
<p>In summary, if you are sure that your project is entirely <strong>Android</strong> or <strong>iOS</strong> based, developers are the ones who will write and maintain the test suites, and you do not plan to extend it to the opposite technology, I would suggest using <strong>Espresso</strong> or <strong>XCUITest</strong>, as they will give you the most solutions adapted to your existing codebase. Otherwise, I'd lean toward the <strong>Appium-based</strong> solutions because they're much more extensible and can be used in black-box environments.</p>
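As a quick taste of the Appium-based route (using the Appium Python Client from the clients table above), here is a sketch of the session setup; the device name, app path and server URL are placeholders for your own environment:

```python
# Capabilities follow the W3C convention, with Appium-specific entries
# under the "appium:" prefix. All concrete values here are placeholders.
capabilities = {
    "platformName": "Android",
    "appium:automationName": "UiAutomator2",
    "appium:deviceName": "Pixel_8_Emulator",   # placeholder device
    "appium:app": "/path/to/app-debug.apk",    # placeholder build artifact
    "appium:newCommandTimeout": 120,
}

# With the Appium Python Client installed and an Appium server running,
# a session would be opened roughly like this (commented out so the
# sketch stays runnable on its own):
#
#   from appium import webdriver
#   from appium.options.android import UiAutomator2Options
#
#   options = UiAutomator2Options().load_capabilities(capabilities)
#   driver = webdriver.Remote("http://127.0.0.1:4723", options=options)
#   driver.find_element("accessibility id", "login-button").click()

print(sorted(capabilities))
```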
<h3 id="heading-scenario-2-android-and-ios"><strong>Scenario 2: Android and iOS</strong></h3>
<p>Most of the time, companies want to target the broader audience. Since <strong>Android</strong> holds <strong><em>~72%*</em></strong> of the market share and <strong>iOS</strong> the remaining <strong><em>~28%*</em></strong>, most companies want to target both platforms.</p>
<div data-node-type="callout">
<div data-node-type="callout-emoji">💡</div>
<div data-node-type="callout-text"><em>*Data as of 31/12/2024 from </em><a target="_self" href="https://gs.statcounter.com/os-market-share/mobile/worldwide"><em>https://gs.statcounter.com/os-market-share/mobile/worldwide</em></a></div>
</div>

<p>This scenario, at least from my perspective, is much more complex in terms of things that need to be considered before an informed decision can be made.</p>
<ol>
<li><p><strong>Separate development teams for Android and iOS</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735646158995/95c49be3-5345-4c77-9e7e-8f5aeec678c4.jpeg" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p>If your application is developed by separate teams using <strong>native development tools</strong> (i.e. <strong>Android Studio</strong> and/or <strong>Xcode</strong>), you can probably integrate <strong>Espresso</strong> and <strong>XCUITest</strong> to perform end-to-end testing. However, I would like to point out that this may be a subpar solution, as other frameworks allow you to test both platforms using the same test scripts.</p>
<p>So, for the situation mentioned in this sub-scenario, I'd suggest using <strong>Espresso</strong> and <strong>XCUITest</strong> for smaller scale tests (<em>component testing</em> if you will) that would be maintained by the developers themselves and integrating broader scoped frameworks for end-to-end testing (i.e. <strong>Appium</strong> or <strong>Maestro</strong>).</p>
<ol start="2">
<li><p><strong>Cross-platform development</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735646686785/3264fb84-651b-422a-84b8-292903597ed6.jpeg" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p>As you can see in the table mentioned in the <strong>Development Frameworks Overview</strong> section, there are many possible frameworks to choose from to develop cross-platform applications. Most frameworks support all platforms (mobile, web, and desktop), but when it comes to mobile development, we focus mostly on <strong>Android</strong> and <strong>iOS</strong>.</p>
<p>Unfortunately, in order to figure out which framework we should choose to write and maintain end-to-end mobile tests, we need to further sub-categorise the possible needs and settings.</p>
<ul>
<li><p>If you really don't have the time to maintain test suites, and therefore need something simple and almost effortless to write and maintain, I would suggest going with <strong>Maestro</strong>.</p>
</li>
<li><p>If you are an <strong>SDET</strong> or an <strong>experienced test automation engineer</strong>, you may want to lean more towards <strong>Appium-based</strong> tools, as <strong>Appium</strong> gives you all the benefits of mobile test automation as well as the freedom to configure the entire test automation framework yourself. I'm not saying that <strong>Maestro</strong> can't do this, but please take a look at the <strong>Maestro Afterthoughts</strong> for more insight.</p>
</li>
</ul>
<ol start="3">
<li><p><strong>Cross-platform development with React Native</strong></p>
<p> <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735647313722/f77f7473-4f1c-443e-8dbe-c7e1fd88f245.jpeg" alt class="image--center mx-auto" /></p>
</li>
</ol>
<p>The tool I haven't mentioned yet is <strong>Detox</strong>. This is because <strong>Detox</strong> is specifically designed for grey-box end-to-end testing for <strong>React Native</strong>. I'd recommend giving it a try if you're developing a <strong>React Native</strong> application, especially since it's the first known framework to incorporate <a target="_blank" href="https://wix.github.io/Detox/docs/copilot/testing-with-copilot">Large Language Models</a> to write tests in natural language.</p>
<p>However, if you and your team are developing applications for external customers (so-called <em>enterprise-grade applications</em>) that require stable tools, I'd strongly recommend using <strong>Appium-based</strong> tools because they are much better documented and have much larger communities built around them.</p>
<h3 id="heading-scenario-3-android-ios-web-and-more"><strong>Scenario 3: Android, iOS, Web and More</strong></h3>
<p>The last scenario we'll talk about is native mobile development combined with any other version of the app. This is paradoxically the easiest of all.</p>
<p><strong>Appium</strong>.</p>
<p>There is simply no other solution that would allow you to test on all of the available platforms.</p>
<p>Of course, if you'd like, you can mix and match <strong>Maestro</strong> for mobile testing and <strong>Playwright</strong> for web testing. Or <strong>Detox</strong> for mobile, <strong>Appium</strong> for smart TV, and <strong>Selenium</strong> for web, if that's what you want. It just doesn't seem like a good idea, especially since some of the <strong>Appium-based</strong> frameworks allow you to create a single test script that would run sequentially on multiple platforms.</p>
<p>For example, <strong>WebdriverIO</strong> allows you to do something on one mobile device, then switch to a separate web application to change/verify values and perform some other actions on another mobile device.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735648112651/fc0ceb5e-601d-4f59-8957-aba039b30cd7.jpeg" alt class="image--center mx-auto" /></p>
<h2 id="heading-maestro-afterthoughts">Maestro Afterthoughts</h2>
<p>I felt that <strong>Maestro</strong> might seem like a great solution in the end, which it can be in some cases, but I also want you to think about possible downsides if you decide to go with it.</p>
<p><strong>Maestro</strong> gives you some ability to configure the test automation framework (e.g. <a target="_blank" href="https://maestro.mobile.dev/advanced/javascript">run JavaScript code</a>), but since it's written with a specific architecture in mind, such customisation is more of an add-on to the standard <strong>YAML</strong> flow files than anything else.</p>
<p>Another thing to keep in mind when talking about <strong>Maestro</strong> is last year's <a target="_blank" href="https://www.reddit.com/r/softwaretesting/comments/17wqis1/why_is_noone_talking_about_this_cypress_blocking/">Cypress intellectual property update case</a>. In a nutshell, the <strong>Cypress</strong> team <a target="_blank" href="https://www.cypress.io/blog/update-defense-intellectual-property">removed the ability to use third-party services for dashboards</a> that were previously free, essentially forcing companies to use their own cloud platform.</p>
<p>Not only is that platform not free, but it also prevents <strong>Cypress</strong> from being used where contracts with customers specifically block the use of cloud platforms due to GDPR or similar laws. I mention this because <strong>Maestro</strong> has its own cloud platform for running tests called <strong>Robin</strong>, and it's possible that a similar scenario to the <strong>Cypress</strong> update could happen to <strong>Maestro</strong> as well.</p>
<h2 id="heading-development-for-the-apple-ecosystem">Development for the Apple ecosystem</h2>
<p>The last thing I wanted to mention here is that if you want to write tests for the <strong>Apple</strong> ecosystem, namely <strong>iOS</strong>, <strong>iPadOS</strong>, <strong>macOS</strong>, or <strong>tvOS</strong>, you actually need to own a machine based on <strong>macOS</strong> (either a MacBook, Mac Mini/Studio/Pro, or iMac) because you need to have <strong>Xcode</strong> installed.</p>
<p>The reason is simple: <strong>Xcode</strong> contains the <strong>Simulator</strong> application, which allows you to run your application in a virtual environment, and Xcode itself is also required to run tests on <strong>real devices</strong>. You'll also need an <a target="_blank" href="https://developer.apple.com/support/compare-memberships/">Apple Account</a>, and if you want to publish your application on the <strong>App Store</strong>, you'll need to sign up for the paid <a target="_blank" href="https://developer.apple.com/support/compare-memberships/">Apple Developer Program</a>.</p>
]]></content:encoded></item><item><title><![CDATA[Quantized YOLO for Edge Solutions]]></title><description><![CDATA[In the previous article, we discussed how we set up pigeon detection on an edge device. We took for granted the existence of a quantized model that can be deployed. This is not a straightforward method, let's discuss further in this article how to ac...]]></description><link>https://engineering.cloudflight.io/quantized-yolo-for-edge-solutions</link><guid isPermaLink="true">https://engineering.cloudflight.io/quantized-yolo-for-edge-solutions</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[Computer Vision]]></category><category><![CDATA[quantization]]></category><category><![CDATA[edgecomputing]]></category><dc:creator><![CDATA[AdamP]]></dc:creator><pubDate>Tue, 16 Apr 2024 13:42:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710950569571/ee7af826-54e5-4c32-a326-985179595af6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the previous article, we discussed how we set up pigeon detection on an edge device. We took for granted the existence of a quantized model that can be deployed. This is not a straightforward method, let's discuss further in this article how to achieve deployable models. If you have not read the previous article, please read it here: <a target="_blank" href="https://engineering.cloudflight.io/pigeons-on-the-edge">Pigeons On the Edge</a></p>
<h2 id="heading-quantization">Quantization</h2>
<p>Quantization is the process of decreasing the computational precision of a model to a lower datatype, e.g. FP32 (floating-point) to INT8. Which datatype you should prefer while quantizing depends strongly on the target hardware. In this article, we will mainly focus on INT8 for the following reasons:</p>
<ul>
<li><p>The Coral AI Edge TPU only supports INT8 operations</p>
</li>
<li><p>NXP iMX8 Plus is more efficient in INT8 precision and consumes less power</p>
</li>
<li><p>Newer Intel CPUs have dedicated INT8 co-processors and instruction sets i.e. Intel® Advanced Matrix Extensions, which can do 2048 INT8 operations per cycle</p>
</li>
</ul>
<p>What do we expect from a quantized model on dedicated hardware?</p>
<ul>
<li><p>Efficient execution in both performance and power consumption</p>
</li>
<li><p>Decreased model size:</p>
<ul>
<li><p>The Coral AI Edge TPU has only 8 MB of cache</p>
</li>
<li><p>Newer Large Language Models are even quantized to INT4 to be able to fit into consumer GPU hardware</p>
</li>
</ul>
</li>
<li><p>Decreased model accuracy, due to lower computation precision</p>
</li>
</ul>
<p>How to quantize a model?</p>
<ul>
<li><p>Post-training quantization, where an already trained model is quantized. Each edge device normally ships with its own quantizer for deploying models.</p>
</li>
<li><p>Quantization aware training, where during training not only the model accuracy but also the quantization factors are optimized, which increases the complexity of training and deployment. If necessary, activation functions can also be modified according to hardware requirements.</p>
</li>
</ul>
<p>How does quantization work?</p>
<ul>
<li><p>Scaled Integers, where real numbers are represented as integers multiplied by a factor and shifted with a bias</p>
</li>
<li><p>Lookup table (LUT), complex functions like sigmoid and softmax can be precomputed and their outputs can be stored</p>
</li>
<li><p>When these methods are applied, a representative dataset is required</p>
</li>
<li><p>Other techniques exist, but these are the most commonly applied</p>
</li>
</ul>
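<p>The scaled-integer and lookup-table mechanisms above can be sketched in a few lines of NumPy. This is a minimal illustration of the arithmetic only, under simplifying assumptions (a toy representative sample, hard clipping to the INT8 range), not the implementation of any particular quantizer:</p>

```python
import numpy as np

# Affine (scaled-integer) quantization: real_value ~= (q - zero_point) * scale

def quant_params(x, qmin=-128, qmax=127):
    # Derive scale and zero point from the range of a representative sample.
    scale = (float(x.max()) - float(x.min())) / (qmax - qmin)
    zero_point = int(round(qmin - float(x.min()) / scale))
    return scale, zero_point

def quantize(x, scale, zero_point):
    return np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

x = np.linspace(-1.0, 1.0, 11, dtype=np.float32)  # stand-in "representative dataset"
scale, zp = quant_params(x)
x_hat = dequantize(quantize(x, scale, zp), scale, zp)
# The round-trip error is bounded by roughly one quantization step.
print(np.abs(x - x_hat).max() <= scale)  # True

# Lookup table (LUT): precompute sigmoid for all 256 possible INT8 inputs,
# so the activation becomes a single table access at inference time.
q_all = np.arange(-128, 128).astype(np.int8)
sigmoid_lut = 1.0 / (1.0 + np.exp(-dequantize(q_all, scale, zp)))
```

<p>At inference time, the sigmoid of a quantized activation <code>q</code> is then just <code>sigmoid_lut[q + 128]</code>, with no floating-point evaluation of the exponential.</p>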
<h2 id="heading-deploying-yolo">Deploying YOLO</h2>
<p>Now that we have a better overview of quantization let's try to quantize the YOLO model. To continue, we need to have a high-level understanding of the YOLO model:</p>
<ul>
<li><p>Backbone, which is a convolutional neural network and outputs a feature pyramid</p>
</li>
<li><p>Head #1, where the last 3 layers of the pyramid are upscaled and convolved</p>
</li>
<li><p>Head #2, where the Detection layer is executed</p>
</li>
<li><p>Head #3, where the scores and bounding box coordinates are computed</p>
</li>
</ul>
<h3 id="heading-fixing-the-head-3-of-yolo">Fixing the Head #3 of YOLO</h3>
<p>When applying INT8 quantization to the YOLOv7 model, the output values were corrupted. While analyzing the code, it was observed that the pixel coordinates and the class score values are not in the same range, and therefore quantization collapses. After normalizing the coordinate values by the image size, a more reasonable output was visible, but some values were still wrong; models with large input image sizes still suffered from quantization collapse. Analyzing the model quantization parameters (factor, bias) revealed a large numerical instability. This was further improved by using normalized scaling factors instead of normalizing the output values by the image size. Precomputing the scaling factors is possible because we quantize static models, meaning the input size must always be the same and cannot change during inference.</p>
<p>As the next step, YOLOv8 was tested, where the same numerical instability was visible. Switching here as well to normalized scaling factors instead of normalizing by the image size fixed a bug and further improved the model precision by 4% compared to the previously reported values.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711458746403/0dd21843-3207-4772-bb83-107c4b53fa78.png" alt class="image--center mx-auto" /></p>
<p>We can observe this in the first image, where normalizing with the image size results in a factor difference of 10^5.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1711458646254/85e188a0-4418-4ac4-ae6d-195784f49d43.png" alt class="image--center mx-auto" /></p>
<p>In the second image, we see that using normalized scaling factors for the bounding box calculation brings the quantization factor difference down to only 10^2.</p>
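<p>The underlying range mismatch is easy to reproduce with a toy NumPy sketch (a deliberate simplification, not the actual TFLite quantizer): with a symmetric per-tensor scale dominated by the pixel coordinates, a class score collapses to zero, while bringing both value groups into a similar range preserves it:</p>

```python
import numpy as np

def per_tensor_int8_roundtrip(x):
    # Symmetric per-tensor INT8 quantization followed by dequantization.
    scale = np.abs(x).max() / 127.0
    return np.clip(np.round(x / scale), -127, 127) * scale

# Hypothetical detection output: box coordinates in pixels plus a class score.
coords = np.array([312.0, 448.0, 57.0, 96.0])  # pixel range, up to image size
score = 0.85                                   # probability range, 0..1

mixed = per_tensor_int8_roundtrip(np.append(coords, score))
print(mixed[-1])  # 0.0 -- the score is smaller than half a quantization step

normed = per_tensor_int8_roundtrip(np.append(coords / 640.0, score))
print(float(normed[-1]))  # ~0.85 -- the score resolution is preserved
```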
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Model</strong></td><td><strong>mAP50-95</strong></td><td><strong>Note</strong></td></tr>
</thead>
<tbody>
<tr>
<td>yolov8n FP32</td><td>37.3</td><td>Unquantized</td></tr>
<tr>
<td>yolov8n INT8 Main</td><td>8.1</td><td>Bug in repo</td></tr>
<tr>
<td>yolov8n INT8 Claimed</td><td>28.7</td><td>Claimed results</td></tr>
<tr>
<td><strong>yolov8n INT8 Fixed</strong></td><td><strong>32.9</strong></td><td><strong>Fixed results</strong></td></tr>
</tbody>
</table>
</div><h3 id="heading-should-quantization-be-applied-to-head-3-at-all">Should quantization be applied to Head #3 at all?</h3>
<p>Tests have shown that if the last mathematical operations are excluded from quantization, the precision of the model increases while the processing time barely increases. On a device with only INT8 operations, this means the final steps need to be executed on the CPU, if FP32 instructions are available there. It would also mean that normalization is not necessary and the previous discussion would be irrelevant. Detaching the head is, at the moment, rather cumbersome to implement for TFLite quantization (TPU, NXP) and can only be achieved with dirty hacks.</p>
<p>For Intel CPUs this is a different scenario, as operations can fall back to the CPU to be executed in FP32 precision. The OpenVINO quantizer supports such changes by explicitly defining which operations should be excluded from the quantization process. Thanks to this freedom, normalizing is not necessary, as the quantizer implicitly excludes operations if quantization collapses. After careful testing to exclude the whole Head #3, the precision of the model improved by 1.4% while the average inference time increased by only 0.3% on an Intel 9th Gen CPU.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Model</strong></td><td><strong>mAP50-95</strong></td><td>Inference</td><td><strong>Note</strong></td></tr>
</thead>
<tbody>
<tr>
<td>yolov8n FP32</td><td>37.3</td><td>PyTorch</td><td>Unquantized</td></tr>
<tr>
<td>yolov8n INT8</td><td>32.9</td><td>TFLite (TPU, NXP)</td><td>Fixed results</td></tr>
<tr>
<td><strong>yolov8n INT8 + FP32</strong></td><td><strong>35.2</strong></td><td><strong>TFLite (TPU, NXP)</strong></td><td><strong>Detached Head #3</strong></td></tr>
<tr>
<td>yolov8n INT8</td><td>35.7</td><td>OpenVINO (Intel)</td><td>Main branch</td></tr>
<tr>
<td><strong>yolov8n INT8 + FP32</strong></td><td><strong>37.1</strong></td><td><strong>OpenVINO (Intel)</strong></td><td><strong>Improved results</strong></td></tr>
</tbody>
</table>
</div><p>NOTE: OpenVINO applies per-channel quantization, while TFLite can be switched between per-tensor and per-channel. Per-tensor quantization uses a single factor and bias for the whole tensor, while per-channel quantization uses a separate factor and bias for each channel.</p>
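<p>The effect of this difference can be illustrated with a toy NumPy example (hypothetical weight values, not the actual TFLite or OpenVINO implementation): when one channel has a much smaller dynamic range than another, a single per-tensor scale wastes nearly all of the small channel's resolution:</p>

```python
import numpy as np

def int8_roundtrip(x, scale):
    # Symmetric INT8 quantize/dequantize with the given scale(s).
    return np.clip(np.round(x / scale), -127, 127) * scale

rng = np.random.default_rng(0)
# Two weight channels with very different dynamic ranges.
w = np.stack([rng.uniform(-0.01, 0.01, 64),   # small-range channel
              rng.uniform(-1.0, 1.0, 64)])    # large-range channel

# Per-tensor: one scale for the whole tensor, dictated by the large channel.
per_tensor = int8_roundtrip(w, np.abs(w).max() / 127.0)
# Per-channel: each channel gets its own scale (factor).
per_channel = int8_roundtrip(w, np.abs(w).max(axis=1, keepdims=True) / 127.0)

err_t = np.abs(w[0] - per_tensor[0]).max()   # error on the small channel
err_c = np.abs(w[0] - per_channel[0]).max()
print(err_c < err_t)  # True -- per-channel preserves the small channel
```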
<div class="hn-table">
<table>
<thead>
<tr>
<td><strong>Model</strong></td><td><strong>mAP50-95</strong></td><td>Inference</td><td><strong>Note</strong></td></tr>
</thead>
<tbody>
<tr>
<td>yolov8n INT8</td><td>32.9</td><td>TFLite (TPU, NXP)</td><td>per-tensor</td></tr>
<tr>
<td>yolov8n INT8</td><td>33.9</td><td>TFLite (TPU, NXP)</td><td>per-channel</td></tr>
<tr>
<td>yolov8n INT8 + FP32</td><td>35.2</td><td>TFLite (TPU, NXP)</td><td>per-tensor</td></tr>
<tr>
<td>yolov8n INT8 + FP32</td><td>36.3</td><td>TFLite (TPU, NXP)</td><td>per-channel</td></tr>
</tbody>
</table>
</div><h3 id="heading-activation-functions-on-edge-devices">Activation Functions on Edge Devices</h3>
<p>Since some of the selected devices have restricted instruction sets, different activation functions are needed for deployed models.</p>
<ul>
<li><p>LeakyReLU</p>
<ul>
<li><p>YOLOv7-Tiny is trained with LeakyReLU</p>
</li>
<li><p>The Coral AI Edge TPU does not support LeakyReLU</p>
</li>
<li><p>NXP output is corrupted when using LeakyReLU</p>
</li>
<li><p>Intel works with LeakyReLU</p>
</li>
</ul>
</li>
<li><p>SiLU</p>
<ul>
<li><p>YOLOv8n is trained with SiLU</p>
</li>
<li><p>The Coral AI Edge TPU crashes when using SiLU</p>
</li>
<li><p>Intel works with SiLU</p>
</li>
</ul>
</li>
<li><p>ReLU6</p>
<ul>
<li><p>ReLU6 achieves a lower mAP after training for both YOLOv7 and YOLOv8</p>
</li>
<li><p>ReLU6 has less accuracy drop after quantization</p>
</li>
<li><p>ReLU6 works on both the Coral Edge TPU and NXP</p>
</li>
</ul>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>mAP50-95 SiLU</td><td>mAP50-95 ReLU6</td></tr>
</thead>
<tbody>
<tr>
<td>yolov8n F32</td><td>37.4</td><td>34.0</td></tr>
<tr>
<td>yolov8n INT8</td><td>33.9</td><td>31.4</td></tr>
<tr>
<td>yolov8n INT8 + FP32</td><td>36.3</td><td>33.9</td></tr>
</tbody>
</table>
</div><h2 id="heading-final-thoughts">Final Thoughts</h2>
<p>Quantization is a hardware-dependent task. To achieve the best results, one should understand both the AI model and the hardware it will be deployed on. The <a target="_blank" href="https://github.com/ultralytics/ultralytics/pull/7372">TFLite</a> and <a target="_blank" href="https://github.com/ultralytics/ultralytics/pull/7516">OpenVINO</a> changes regarding YOLOv8 mentioned in this article have been merged into the main repository.</p>
]]></content:encoded></item><item><title><![CDATA[Pigeons on the Edge]]></title><description><![CDATA[In this article, we will talk about deploying state-of-the-art computer vision object detection on low-power and low-cost edge devices. Finally, we see a budget hardware setup, which can detect pigeons in almost real-time.
State of the AI
In recent y...]]></description><link>https://engineering.cloudflight.io/pigeons-on-the-edge</link><guid isPermaLink="true">https://engineering.cloudflight.io/pigeons-on-the-edge</guid><category><![CDATA[Machine Learning]]></category><category><![CDATA[AI]]></category><category><![CDATA[Raspberry Pi]]></category><category><![CDATA[Computer Vision]]></category><dc:creator><![CDATA[AdamP]]></dc:creator><pubDate>Mon, 15 Apr 2024 22:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1710951978563/f73708f7-e045-4a1a-830c-b6cf5e6e2791.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we will talk about deploying state-of-the-art computer vision object detection on low-power and low-cost edge devices. Finally, we see a budget hardware setup, which can detect pigeons in almost real-time.</p>
<h2 id="heading-state-of-the-ai">State of the AI</h2>
<p>In recent years, there has been a huge advance in AI with the introduction of Large Language Models. These methods have also been adopted by the Computer Vision community, where images are converted to tokens and passed to a Language Model for classification, detection, or other tasks, these are called Vision Language Models. Although these methods show better performance in regards to accuracy or precision, they require far more processing power compared to their older relatives, let's call them the Convolutional Neural Network ones. The need for processing power becomes an even larger issue when deploying a model on the edge.</p>
<ul>
<li><p>Large Language Model (LLM), e.g. ChatGPT, Llama2</p>
</li>
<li><p>Vision Language Model (VLM), e.g. GPT-4V, Swin-L or ViTL</p>
</li>
<li><p>Convolutional Neural Network (CNN), e.g. YOLO or EfficientNet</p>
</li>
</ul>
<h2 id="heading-object-detection">Object Detection</h2>
<p>Object detection is a task in computer vision where we take an image as input and localize and classify the objects within it. The de facto metric to assess the quality of an object detector is the mean Average Precision (mAP). It is based on the Intersection over Union (IoU), the area of overlap between the predicted bounding box and the annotated bounding box divided by the area of their union: a prediction counts as correct when its IoU with a ground-truth box exceeds a threshold, and mAP(50-95) averages the precision over IoU thresholds from 0.50 to 0.95.</p>
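<p>As a concrete illustration, the IoU of two axis-aligned boxes can be computed in a few lines (a plain-Python sketch; the (x1, y1, x2, y2) box format is an assumption made here):</p>

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0 -- perfect overlap
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ~= 0.143
```

<p>For mAP(50-95), each prediction would be matched against the ground truth at IoU thresholds 0.50, 0.55, ..., 0.95, and the resulting average precisions are averaged.</p>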
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>mAP (50-95) COCO</td><td>Model Size (#params)</td></tr>
</thead>
<tbody>
<tr>
<td>Swin-L(DINO)</td><td>63.2</td><td>218M</td></tr>
<tr>
<td>ViT-L (Co-DETR)</td><td>65.9</td><td>304M</td></tr>
<tr>
<td>YOLOv8x</td><td>53.9</td><td>68.2M</td></tr>
<tr>
<td>YOLOv8s</td><td>44.9</td><td>11.2M</td></tr>
<tr>
<td>YOLOv8n</td><td>37.3</td><td>3.2M</td></tr>
</tbody>
</table>
</div><h3 id="heading-yolo-ultralytics-on-github">YOLO (Ultralytics on GitHub)</h3>
<p>YOLO (You Only Look Once) is the state-of-the-art convolutional neural network based method for object detection, but the newer YOLO versions can perform other tasks like segmentation, pose estimation, or classification.</p>
<p><img src="https://raw.githubusercontent.com/ultralytics/assets/main/im/banner-tasks.png" alt /></p>
<p>As we can see from the table above, VLM methods outperform CNN methods but require a higher number of trainable parameters. The number of parameters is an indication of both computational power and memory usage, though multiple factors influence inference speed. YOLO also comes in different sizes, indicated by the last character in the name. The smaller the model, the faster the inference and the lower the memory consumption, but the accuracy also decreases.</p>
<p><img src="https://raw.githubusercontent.com/ultralytics/assets/main/yolov8/yolo-comparison-plots.png" alt="YOLOv8 performance plots" /></p>
<h2 id="heading-edge-devices">Edge Devices</h2>
<p>Edge devices are limited in processing capability and power consumption. They bring AI capabilities to IoT devices, sensors, cameras, drones, and smartphones. There are low-end solutions that cost around $20, but one can also choose from high-end System-on-Chip (SoC) solutions above $1000. The rule of thumb is that a higher price buys more processing power at the cost of higher power consumption. When choosing hardware, there will be a tradeoff between price and performance. The following tables show a few examples of devices as of February 2024.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<td><em>Low-End Edge</em></td><td>Coral AI</td><td>NXP iMX8 Plus</td><td>Hailo8</td></tr>
</thead>
<tbody>
<tr>
<td>Type</td><td>Dedicated Chip (NPU)</td><td>SoC (arm64) + GPU + NPU</td><td>Dedicated Chip (NPU)</td></tr>
<tr>
<td>Instruction Set SoC</td><td>N/A</td><td>FP,INT</td><td>N/A</td></tr>
<tr>
<td>Instruction Set GPU</td><td>N/A</td><td>FP32</td><td>N/A</td></tr>
<tr>
<td>Instruction Set NPU</td><td>INT8</td><td>INT8</td><td>INT8</td></tr>
<tr>
<td>Form</td><td>USB, PCIe, m.2, Chip</td><td>Board, Chip</td><td>PCIe, m.2, Chip</td></tr>
<tr>
<td>TOPS (INT)</td><td>4 TOPS</td><td>2.3 TOPS</td><td>26 TOPS</td></tr>
<tr>
<td>FLOPS (FP)</td><td>N/A</td><td>7.2 GFLOPS</td><td>N/A</td></tr>
<tr>
<td>Power Usage</td><td>~2 Watt</td><td>~5-15 Watt</td><td>~2.5 Watt</td></tr>
<tr>
<td>Price</td><td>~20 $</td><td>~60 $</td><td>~140 $</td></tr>
</tbody>
</table>
</div><div class="hn-table">
<table>
<thead>
<tr>
<td><em>High-End Edge</em></td><td>Intel Ultra</td><td>Qualcomm Snapdragon v3</td><td>Nvidia Jetson Orin</td></tr>
</thead>
<tbody>
<tr>
<td>Type</td><td>SoC (x64) + GPU + NPU</td><td>SoC (arm64) + GPU + NPU</td><td>SoC (arm64) + GPU + NPU</td></tr>
<tr>
<td>Instruction Set SoC</td><td>FP,INT,BF16</td><td>TBA</td><td>FP,INT</td></tr>
<tr>
<td>Instruction Set GPU</td><td>F16</td><td>TBA</td><td>FP32, FP16</td></tr>
<tr>
<td>Instruction Set NPU</td><td>INT8</td><td>INT4, TBA</td><td>INT8</td></tr>
<tr>
<td>Form</td><td>Chip</td><td>Chip</td><td>Board, Chip</td></tr>
<tr>
<td>TOPS (INT)</td><td>max 34 TOPS</td><td>TBA 75 TOPS</td><td>20-275 TOPS</td></tr>
<tr>
<td>FLOPS (FP)</td><td>max 4.5 TFLOPS</td><td>TBA</td><td>max 5.3 TFLOPS</td></tr>
<tr>
<td>Power Usage</td><td>~ 18-64 Watt</td><td>TBA</td><td>~7-60 Watt</td></tr>
<tr>
<td>Price</td><td>~ 375 $</td><td>~ 900 $ TBA</td><td>~400-1100 $</td></tr>
</tbody>
</table>
</div><h3 id="heading-deploying-a-quantized-yolo">Deploying a quantized YOLO</h3>
<p>In this post, we assume that we already have a quantized model. In short, quantization is the step where the model precision is decreased; for example, instead of using FP32, we quantize to INT8. On many hardware platforms this improves inference speed, but the accuracy of the detection decreases. We discuss quantization in more detail in the next chapter, <a target="_blank" href="https://engineering.cloudflight.io/quantized-yolo-for-edge-solutions">Quantized YOLO for Edge Solutions</a>.</p>
<p>Model deployment on edge hardware is different for each device. Let's discuss a few cases:</p>
<ul>
<li><p>Coral AI works only with INT8 precision. This means the model weights are in INT8 precision and inference is performed as integers.</p>
</li>
<li><p>NXP is likewise most efficient using only INT8 precision.</p>
</li>
<li><p>Intel can execute inference in different ways, depending also on which CPU generation is being used. In general, there is a CPU where you can execute either in FP32 or combine FP32 with INT8, an iGPU where you can execute in FP16, and an NPU where you can execute only in INT8.</p>
</li>
</ul>
<p>Now let's test the inference speed on the following hardware:</p>
<ul>
<li><p>*Intel i7-9750H</p>
</li>
<li><p>**Raspberry Pi4 + Coral AI Edge TPU</p>
</li>
</ul>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Model</td><td>mAP50-95</td><td>Inference</td><td>Avg Speed</td></tr>
</thead>
<tbody>
<tr>
<td>yolov8n F32</td><td>37.4</td><td>Unquantized, Intel*</td><td>24.40 ms ~ 40 FPS</td></tr>
<tr>
<td>yolov8n INT8+FP32</td><td>37.1</td><td>Quantized, Intel*</td><td>15.18 ms ~ 66 FPS</td></tr>
<tr>
<td>yolov8n FULL INT8</td><td>32.9</td><td>Quantized, Coral**</td><td>61.00 ms ~ 16 FPS</td></tr>
</tbody>
</table>
</div><h2 id="heading-the-setup">The Setup</h2>
<p>This is a home-made setup that can detect pigeons in almost real-time: a Raspberry Pi 4 connected to a battery pack and two Coral AI Edge TPUs, one for bird detection and the other for bird classification. Since deploying this setup, together with two plastic crows, no pigeons have landed on my balcony in the last 1.5 years. Check out my repo to make sure no pigeons make your balcony dirty: <a target="_blank" href="https://github.com/adamp87/pigeon">GitHub Repository</a></p>
<p><a target="_blank" href="https://github.com/adamp87/pigeon"><img src="https://github.com/adamp87/pigeon/raw/main/doc/hardware.jpg" alt class="image--center mx-auto" /></a></p>
<p><img src="https://github.com/adamp87/pigeon/raw/main/doc/pigeons.jpg" alt="https://github.com/adamp87/pigeon" /></p>
<p>For more technical details on the quantization, continue reading the next chapter, <a target="_blank" href="https://engineering.cloudflight.io/quantized-yolo-for-edge-solutions">Quantized YOLO for Edge Solutions</a></p>
]]></content:encoded></item><item><title><![CDATA[Elevating Test Automation Excellence: Leveraging Interaction Modes in Team Topologies]]></title><description><![CDATA[In the first part How to enhance Test Automation with Team Topologies, we covered team structures in Team Topologies, with attention to Enabling teams.
Today we will focus on the interaction modes and highlight the possibilities in the context of a t...]]></description><link>https://engineering.cloudflight.io/elevating-test-automation-excellence-leveraging-interaction-modes-in-team-topologies</link><guid isPermaLink="true">https://engineering.cloudflight.io/elevating-test-automation-excellence-leveraging-interaction-modes-in-team-topologies</guid><category><![CDATA[team topologies]]></category><category><![CDATA[test-automation]]></category><dc:creator><![CDATA[Marco Hampel]]></dc:creator><pubDate>Thu, 14 Mar 2024 09:58:30 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/71CjSSB83Wo/upload/7ae76ec5ef453b3cbc4f5db69c84263f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the first part <a target="_blank" href="https://engineering.cloudflight.io/enhancing-test-automation-with-team-topologies-leveraging-enabling-teams">How to enhance Test Automation with Team Topologies</a>, we covered team structures in Team Topologies, with attention to Enabling teams.</p>
<p>Today we will focus on the interaction modes and highlight the possibilities in the context of a test automation enabling team.</p>
<h3 id="heading-unlocking-interaction-modes-empowering-enabling-teams">Unlocking Interaction Modes: Empowering Enabling Teams</h3>
<p>In the dynamic landscape of software development, test automation has emerged as a cornerstone for achieving speed, efficiency, and quality in product delivery. Within this ecosystem, the role of a test automation enabling team is pivotal. These teams specialize in providing the necessary expertise, tools, and support to empower other teams in implementing and maintaining robust automated testing practices. However, their success hinges not only on their technical proficiency but also on the effectiveness of their interactions with other teams.</p>
<p>In the next sections, we will explore three key interaction modes - Collaboration, X-as-a-Service, and Facilitation - and identify which modes are best suited to maximize their impact.</p>
<p><strong>Collaboration: Fostering Shared Responsibility</strong></p>
<p>Collaboration lies at the heart of effective teamwork and is particularly vital for test automation. By adopting a collaborative approach, the automation team can forge strong partnerships with development, testing, and operations teams, working together towards common goals.</p>
<ul>
<li><p><strong>Shared Understanding</strong>: Test automation enabling teams collaborate closely with other teams to gain insights into their testing needs, challenges, and priorities. By fostering open communication channels, they ensure that automated testing efforts are aligned with overall business objectives.</p>
</li>
<li><p><strong>Cross-functional Expertise</strong>: Leveraging the diverse skill sets within the organization, collaboration enables test automation enabling teams to tap into domain-specific knowledge and technical expertise. This cross-functional collaboration enriches the quality of automated tests and enhances the effectiveness of testing strategies.</p>
</li>
<li><p><strong>Continuous Improvement</strong>: Through collaborative retrospectives and feedback loops, teams can reflect on their testing practices, identify areas for improvement, and iterate on their automation strategies. By fostering a culture of continuous learning and adaptation, collaboration drives innovation and excellence in test automation.</p>
</li>
</ul>
<p><strong>X-as-a-Service: Providing Scalable Solutions</strong></p>
<p>Providing testing-related services as a centralized offering.</p>
<ul>
<li><p><strong>Standardize Practices</strong>: Through X-as-a-Service, test automation enabling teams can establish standardized testing frameworks, tools, and processes that can be leveraged across the organization. This standardization promotes consistency, reduces duplication of effort, and accelerates the adoption of automated testing practices.</p>
</li>
<li><p><strong>Scale Resources</strong>: Test automation enabling teams can scale their resources and expertise to meet the fluctuating demands of different teams and projects. By offering testing services as a scalable resource, they ensure that teams have access to the necessary support and guidance to accelerate their testing efforts.</p>
</li>
<li><p><strong>Enable Self-Service</strong>: X-as-a-Service models empower teams to become more self-sufficient in their testing endeavors. By providing self-service platforms, tools, and documentation, test automation enabling teams enable other teams to autonomously create, execute, and maintain automated tests, thereby reducing dependencies and promoting agility.</p>
</li>
</ul>
<p><strong>Facilitation: Guiding and Empowering Teams</strong></p>
<p>Facilitation plays a crucial role in guiding teams through complex challenges, fostering collaboration, and promoting innovation.</p>
<ul>
<li><p><strong>Clarify Objectives</strong>: Facilitation helps teams align on testing objectives, priorities, and strategies. By facilitating workshops, planning sessions, and brainstorming exercises, test automation enabling teams can ensure that testing efforts are focused and aligned with business goals.</p>
</li>
<li><p><strong>Resolve Conflicts</strong>: Inevitably, conflicts may arise during the testing process. Facilitation techniques such as mediation and consensus-building can help teams navigate conflicts effectively, fostering a positive and constructive working environment.</p>
</li>
<li><p><strong>Empower Teams</strong>: Facilitation empowers teams to take ownership of their testing processes and outcomes. By facilitating knowledge-sharing sessions, communities of practice, and peer learning initiatives, test automation enabling teams cultivate a culture of empowerment and collaboration.</p>
</li>
</ul>
<h3 id="heading-team-api">Team API</h3>
<p>Team API is a concept, referring to the interface or interaction points between teams within a project or an organization. Just like how software systems have APIs (Application Programming Interfaces) for communication between different software components, teams also need clear interfaces and communication channels to collaborate effectively. The Team API defines how teams interact, communicate, share information, and collaborate on work. It encompasses aspects such as responsibilities, expectations, communication channels, decision-making processes, and dependencies between teams. Establishing clear and well-defined Team APIs helps to streamline collaboration, reduce misunderstandings, and improve the overall efficiency of the organization. However, it is just as important to regularly challenge and maintain these definitions.</p>
<p>Here you can find the official <a target="_blank" href="https://github.com/TeamTopologies/Team-API-template">Team API template</a>.</p>
<h3 id="heading-our-journey-success-through-effective-interaction-modes"><strong>Our Journey - Success Through Effective Interaction Modes</strong></h3>
<p>In our journey as a test automation enabling team, embracing Team Topologies alongside Lean Coffee sessions, regular Meetups, and the introduction of a Team API has been transformative. Team Topologies provided us with a structured framework for organizing our teams and optimizing interactions, leading to improved collaboration and productivity.</p>
<p>By leveraging collaboration, X-as-a-Service, and facilitation, we can orchestrate success, driving continuous improvement, innovation, and excellence in automated testing practices. Striving to deliver high-quality software at speed, the strategic adoption of interaction modes in team topologies emerges as a critical enabler for achieving this ambitious goal.</p>
]]></content:encoded></item><item><title><![CDATA[Enhancing Test Automation with Team Topologies: Leveraging Enabling Teams]]></title><description><![CDATA[In the fast-paced world of software development, teams are constantly seeking ways to optimize their processes and deliver high-quality products efficiently. One approach gaining traction is the implementation of team topologies, which restructures t...]]></description><link>https://engineering.cloudflight.io/enhancing-test-automation-with-team-topologies-leveraging-enabling-teams</link><guid isPermaLink="true">https://engineering.cloudflight.io/enhancing-test-automation-with-team-topologies-leveraging-enabling-teams</guid><category><![CDATA[team topologies]]></category><category><![CDATA[test-automation]]></category><dc:creator><![CDATA[Marco Hampel]]></dc:creator><pubDate>Fri, 16 Feb 2024 07:30:45 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/7de474KZIbs/upload/52da9775c89f983a0bbd9bf113a94e91.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the fast-paced world of software development, teams are constantly seeking ways to optimize their processes and deliver high-quality products efficiently. One approach gaining traction is the implementation of <a target="_blank" href="https://teamtopologies.com/">team topologies</a>, which restructures teams to better align with organizational goals and improve collaboration. In this blog post, we'll explore how leveraging enabling teams can enhance test automation within the context of team topologies.</p>
<h3 id="heading-understanding-team-topologies"><strong>Understanding Team Topologies</strong></h3>
<p>Team topologies, introduced by Matthew Skelton and Manuel Pais, provide a framework for designing effective team structures within an organization. It emphasizes the need for teams to align with the organization's architecture and business objectives. The four fundamental team types are:</p>
<ul>
<li><p><strong>Stream-aligned Team</strong>: These teams focus on delivering value directly to the customer. They are cross-functional and have all the necessary skills to deliver end-to-end solutions. Stream-aligned teams own a specific part of the business or product, enabling faster decision-making and reducing dependencies on other teams.</p>
</li>
<li><p><strong>Enabling Team</strong>: Enabling teams provide support, tools, and platforms to streamline the work of stream-aligned teams, acting as a catalyst. They focus on creating self-service platforms, building automation, and providing expertise to enable other teams to deliver efficiently.</p>
</li>
<li><p><strong>Complicated Subsystem Team</strong>: Some parts of the system are inherently complex and require specialized knowledge. Complicated subsystem teams are responsible for maintaining and evolving these parts of the system. They collaborate closely with stream-aligned teams, providing expertise and guidance when needed.</p>
</li>
<li><p><strong>Platform Team</strong>: Platform teams build and maintain shared platforms and services that streamline the work of other teams. They focus on creating reusable components, APIs, and tools that increase productivity and consistency across the organization. Platform teams abstract away common functionality, allowing stream-aligned teams to focus on delivering value.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707921022304/8c69c54b-263d-44ae-b697-b4547b0c5f11.png" alt class="image--center mx-auto" /></p>
<p>These team topologies are designed to foster effective communication (there will be a follow-up article, where the communication and interaction models will be discussed in more detail), collaboration, and autonomy within organizations. By aligning teams with the value streams of the business and providing the necessary support and expertise, organizations can improve their agility, speed, and ability to innovate.</p>
<h3 id="heading-enabling-teams-and-test-automation"><strong>Enabling Teams and Test Automation</strong></h3>
<p>Enabling teams play a crucial role in fostering collaboration and enabling other teams to succeed. When it comes to test automation, enabling teams can significantly impact the efficiency and effectiveness of testing efforts across the organization. Here's how:</p>
<ul>
<li><p><strong>Providing Test Automation Frameworks</strong>: Enabling teams can develop and maintain robust test automation frameworks tailored to the organization's needs. These frameworks can include libraries, tools, and best practices for writing and executing automated tests.</p>
</li>
<li><p><strong>Offering Training and Support</strong>: Enabling teams can offer training sessions and workshops to educate other teams on test automation best practices and tools. They can also provide ongoing support and guidance to help teams overcome any challenges they encounter during test automation implementation.</p>
</li>
<li><p><strong>Integrating Testing Tools</strong>: Enabling teams can integrate testing tools into the development workflow, making it seamless for teams to incorporate automated testing into their processes. This integration can include CI/CD pipelines, version control systems, and issue tracking tools.</p>
</li>
<li><p><strong>Promoting Collaboration</strong>: Enabling teams can facilitate collaboration between development and operations teams to ensure that test automation efforts are aligned with overall business goals. They can encourage cross-functional collaboration and knowledge sharing to improve the quality of automated tests.</p>
</li>
</ul>
<h3 id="heading-case-study"><strong>Case Study</strong></h3>
<p><strong>Implementing Test Automation with an Enabling Team</strong></p>
<p>Let's consider a hypothetical scenario where a software development company decides to implement test automation with the help of an enabling team.</p>
<p><strong>Challenges</strong></p>
<ul>
<li><p><strong>Manual Testing Overload:</strong> The team is overwhelmed with repetitive manual testing tasks and regression testing, leading to slower release cycles and increased risk of human errors.</p>
</li>
<li><p><strong>Inconsistent Testing Practices:</strong> There are inconsistencies in test execution, test coverage and reporting across different projects.</p>
</li>
<li><p><strong>Lack of Automation Expertise:</strong> The development teams lack the expertise and resources to implement and maintain test automation frameworks effectively.</p>
</li>
</ul>
<p><strong>A (possible) Solution</strong></p>
<p>To address these challenges, the software development company decides to form an enabling team dedicated to implementing test automation across the organization. The enabling team consists of experienced automation and quality engineers, and DevOps engineers who collaborate to build robust automation frameworks and provide support to development teams.</p>
<p><strong>Implementation Steps</strong></p>
<ul>
<li><p><strong>Assessment and Planning</strong></p>
<ul>
<li><p>The enabling team conducts a thorough assessment of existing testing processes, tools, and infrastructure.</p>
</li>
<li><p>They identify areas where automation can bring the most significant benefits and prioritize them based on impact and feasibility.</p>
</li>
</ul>
</li>
<li><p><strong>Framework Development</strong></p>
<ul>
<li><p>The enabling team designs and develops a flexible and scalable test automation framework tailored to the company's specific needs and technologies.</p>
</li>
<li><p>They choose appropriate testing tools and technologies based on the project requirements and industry best practices.</p>
</li>
</ul>
</li>
<li><p><strong>Training and Support</strong></p>
<ul>
<li><p>The enabling team conducts training sessions and workshops for development teams to educate them about test automation best practices, tools, and frameworks.</p>
</li>
<li><p>They provide ongoing support and guidance to help teams adopt automation seamlessly and address any challenges they encounter during the transition.</p>
</li>
</ul>
</li>
<li><p><strong>Integration with CI/CD Pipelines</strong></p>
<ul>
<li><p>The enabling team integrates test automation scripts into the company's continuous integration and continuous delivery (CI/CD) pipelines to automate the execution of tests as part of the build and deployment process.</p>
</li>
<li><p>They implement reporting and alerting mechanisms to provide real-time feedback on test results and identify issues early in the development lifecycle.</p>
</li>
</ul>
</li>
<li><p><strong>Monitoring and Maintenance</strong></p>
<ul>
<li><p>The enabling team establishes monitoring mechanisms to track the performance and effectiveness of automated tests.</p>
</li>
<li><p>They continuously monitor and maintain the automation framework, updating it as needed to accommodate changes in the software and technology landscape.</p>
</li>
</ul>
</li>
</ul>
<p><strong>Results</strong></p>
<ul>
<li><p><strong>Increased Test Coverage:</strong> Test automation significantly increases test coverage, allowing for more thorough testing of critical functionalities and edge cases.</p>
</li>
<li><p><strong>Faster Release Cycles:</strong> Automated tests reduce the time spent on manual testing, enabling faster and more frequent releases without compromising quality.</p>
</li>
<li><p><strong>Improved Quality:</strong> Automation leads to fewer defects in production, resulting in improved customer satisfaction and reduced support and maintenance overhead.</p>
</li>
<li><p><strong>Empowered Teams:</strong> Development teams feel empowered to focus on delivering value-added tasks, knowing that repetitive and time-consuming testing activities are automated.</p>
</li>
<li><p><strong>Continuous Improvement:</strong> The enabling team fosters a culture of continuous improvement, regularly evaluating and enhancing the test automation practices to keep pace with evolving technology and business needs.</p>
</li>
</ul>
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>In conclusion, leveraging enabling teams can greatly enhance test automation efforts within an organization. By providing frameworks, training, support, and promoting collaboration, enabling teams can empower other teams to implement and maintain effective automated testing practices. As organizations continue to prioritize quality and efficiency in software development, the role of enabling teams in driving successful test automation initiatives will become increasingly important.</p>
<blockquote>
<p>The series will continue soon with the next part, focusing on communication and interaction patterns - stay tuned!</p>
</blockquote>
]]></content:encoded></item><item><title><![CDATA[Lakehouse: Securing data access]]></title><description><![CDATA[In this article, we'll delve into best practices for securing data access when using Microsoft Fabric Lakehouses. The goal is ensuring that your valuable data remains protected and enable people from within your organization to see subsets of the dat...]]></description><link>https://engineering.cloudflight.io/lakehouse-securing-data-access</link><guid isPermaLink="true">https://engineering.cloudflight.io/lakehouse-securing-data-access</guid><category><![CDATA[fabric]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[data-engineering]]></category><category><![CDATA[SQL]]></category><category><![CDATA[Power BI]]></category><category><![CDATA[Security]]></category><category><![CDATA[#reporting]]></category><category><![CDATA[column level security]]></category><category><![CDATA[row-level-security]]></category><category><![CDATA[lakehouse]]></category><category><![CDATA[data-warehousing]]></category><category><![CDATA[Data Science]]></category><dc:creator><![CDATA[Stefan Starke]]></dc:creator><pubDate>Wed, 14 Feb 2024 17:00:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/zAjdgNXsMeg/upload/32320fa851265f378d27a89bf686d796.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this article, we'll delve into best practices for securing data access when using Microsoft Fabric Lakehouses. The goal is to ensure that your valuable data remains protected while enabling people from within your organization to see the subsets of the data that are relevant to their field of work. In the end we will have a look at sharing data with report-creating users.</p>
<h3 id="heading-key-takeaways">Key Takeaways</h3>
<blockquote>
<p>Data access restriction is important.</p>
<p>Do not use data masking as a security measure.</p>
<p>Create reports based on shared semantic models.</p>
</blockquote>
<h3 id="heading-starting-point">Starting Point</h3>
<p>Let's assume we look at our Microsoft Fabric platform. We see fancy data pipelines that grab data from whatever sources you can think of and transform the hell out of it, leaving us with perfect, golden data inside a Lakehouse. Of course, in the form of nice delta tables.</p>
<p>Now how can we give data access to people within our organization and ensure that they</p>
<ul>
<li><p>Only see tables that are relevant for them? (Object Level Security)</p>
</li>
<li><p>Only see columns that are relevant for them? (Column Level Security)</p>
</li>
<li><p>Only see data (rows) that is relevant for them? (Row Level Security)</p>
</li>
</ul>
<p>Basically, how can we enable them to query the data they are allowed to see, understand the data model and build reports on their own? Just in case you need it, the fancy buzzword is <em>fostering self-service reporting while safeguarding data</em>.</p>
<h3 id="heading-strategy">Strategy</h3>
<p>To realize this goal we adopted the following strategy:</p>
<ol>
<li><p><strong>Read-Only SQL Connection</strong>: Users are granted read-only access to the SQL endpoint of the lakehouse. By default, they are shielded from access to underlying tables, minimizing the risk of inadvertent alterations or unauthorized access.</p>
</li>
<li><p><strong>Row-Level Security (RLS)</strong>: RLS mechanisms are deployed to dynamically filter data at the row level based on user identities or membership in Microsoft 365 groups. This granular access control ensures that users only interact with data relevant to their designated roles or responsibilities.</p>
</li>
<li><p><strong>Column-Level Security (CLS)</strong>: Complementing RLS, CLS configurations restrict access to specific columns within tables, safeguarding sensitive information from unauthorized disclosure.</p>
</li>
<li><p><strong>Data Masking</strong>: As an additional layer of defense, data masking techniques are employed to obfuscate sensitive information displayed to users. Being primarily cosmetic, this approach must not be mistaken for CLS, as data can still leak via brute-force queries. See the example later on.</p>
</li>
<li><p><strong>Semantic Data Models for Self-Service Reporting</strong>: Empowering users with semantic data models serves as a structured abstraction layer, enabling them to navigate and interpret complex datasets intuitively. This approach fosters self-sufficiency in report creation while ensuring adherence to organizational data standards.</p>
</li>
</ol>
<h3 id="heading-share-sql-endpoint">Share SQL Endpoint</h3>
<p>Sharing the SQL endpoint of a lakehouse with recipients or groups is straightforward.</p>
<p>To give a recipient a basic <em>Connect permission</em>, as in a classic SQL Server, be sure NOT to select any of the available sharing options. That way, no data is readable by default and permissions need to be explicitly defined via GRANT statements. Some kind of a <em>trust nobody</em> strategy.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707482463673/7df53af0-6828-43db-980b-3f98375593ef.png" alt class="image--center mx-auto" /></p>
<p>For more details see <a target="_blank" href="https://blog.fabric.microsoft.com/en-us/blog/data-warehouse-sharing/">MS Data Warehouse Sharing</a>.</p>
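<p>With this <em>trust nobody</em> baseline in place, read access then has to be granted explicitly, object by object. A minimal sketch, using the same illustrative user as in the examples further below:</p>
<pre><code class="lang-sql">-- Grant read access on a single table to one recipient;
-- nothing else is readable until further GRANTs are issued.
GRANT SELECT ON dbo.customers TO [restricted.robert@cloudflight.dev];
</code></pre>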
<h3 id="heading-row-level-security">Row-Level Security</h3>
<p>Row-level security ensures that users can only access the rows they are allowed to see.</p>
<p>These restrictions are applied directly inside the data tier (database) every time data is accessed. This makes it more reliable, robust and less error prone than restrictions that are applied inside an application tier.</p>
<p>To enable row-level security, two things are needed:</p>
<ul>
<li><p>a schema containing an inline table-valued function</p>
</li>
<li><p>a security policy using the created function as a predicate</p>
</li>
</ul>
<p><em>Let's have a look at a simple example.</em></p>
<p>First, create a new schema including an inline table-valued function that returns 1 when the email value matches the name of the user executing the query. The current user is read via USER_NAME(). To make the comparison more robust, both values are converted to lowercase.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">SCHEMA</span> sec;
GO

<span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">FUNCTION</span> sec.customers_validate_rlssecurity(@email <span class="hljs-keyword">as</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">500</span>))
    <span class="hljs-keyword">RETURNS</span> <span class="hljs-keyword">TABLE</span>
<span class="hljs-keyword">WITH</span> SCHEMABINDING
<span class="hljs-keyword">AS</span>
    <span class="hljs-keyword">RETURN</span> <span class="hljs-keyword">SELECT</span> <span class="hljs-number">1</span> <span class="hljs-keyword">AS</span> validate_rlssecurity_result
    <span class="hljs-keyword">WHERE</span> <span class="hljs-keyword">LOWER</span>(@email) = <span class="hljs-keyword">LOWER</span>(USER_NAME())
<span class="hljs-keyword">GO</span>
</code></pre>
<p>Now all that is left to do is create a security policy, make the connection to the corresponding table (dbo.customers) and set its STATE to ON.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">SECURITY</span> <span class="hljs-keyword">POLICY</span> CustomersFilter
<span class="hljs-keyword">ADD</span> FILTER PREDICATE sec.customers_validate_rlssecurity(Email)
<span class="hljs-keyword">ON</span> dbo.customers
<span class="hljs-keyword">WITH</span> (STATE=<span class="hljs-keyword">ON</span>);
GO
</code></pre>
<p>During development it can be useful to disable the policy temporarily. Just set the STATE to OFF.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">SECURITY</span> <span class="hljs-keyword">POLICY</span> CustomersFilter
<span class="hljs-keyword">WITH</span> (STATE = <span class="hljs-keyword">OFF</span>);
</code></pre>
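<p>To verify that the policy behaves as intended you can, on a classic SQL Server or Azure SQL database, impersonate a user and inspect the filtered result. Support for impersonation on the Fabric SQL endpoint may differ, so treat this as a sketch:</p>
<pre><code class="lang-sql">-- Impersonate a restricted user, query the protected table, then switch back.
EXECUTE AS USER = 'restricted.robert@cloudflight.dev';
SELECT * FROM dbo.customers; -- returns only rows whose Email matches the impersonated user
REVERT;
</code></pre>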
<p>Looking into the performance of RLS would most likely fill an additional blog post on its own. Roughly, the impact is comparable to using a view. Two simple recommendations to keep in mind:</p>
<ul>
<li><p>Keep the predicate function simple. The more joins the worse the performance.</p>
</li>
<li><p>Ensure there are indices on referenced tables.</p>
</li>
</ul>
<h3 id="heading-column-level-security">Column-Level Security</h3>
<p>Nothing fancy here. Grant access to users or groups as needed.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">GRANT</span> <span class="hljs-keyword">SELECT</span> <span class="hljs-keyword">ON</span> customers(FirstName, LastName, Email) <span class="hljs-keyword">TO</span> [restricted.robert@cloudflight.dev];
</code></pre>
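<p>When several users need the same column subset, it can be more maintainable to grant the columns to a database role and add members to it, instead of repeating the grant per user. A sketch with an illustrative role name:</p>
<pre><code class="lang-sql">-- Bundle the column grant into a role, then manage access via membership.
CREATE ROLE customer_basic_readers;
GRANT SELECT ON customers(FirstName, LastName, Email) TO customer_basic_readers;
ALTER ROLE customer_basic_readers ADD MEMBER [restricted.robert@cloudflight.dev];
</code></pre>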
<p>It is important to know that with CLS in place, restricted users cannot execute SELECT * statements like the one below; such queries fail with an error message. The wanted columns need to be specified explicitly in the SELECT statement.</p>
<pre><code class="lang-sql"><span class="hljs-keyword">SELECT</span> * <span class="hljs-keyword">from</span> customers;
</code></pre>
<pre><code class="lang-sql">Msg 230, Level 14, State 1, Line 12
The <span class="hljs-keyword">SELECT</span> permission was denied <span class="hljs-keyword">on</span> the <span class="hljs-keyword">column</span> <span class="hljs-string">'Age'</span> <span class="hljs-keyword">of</span> the <span class="hljs-keyword">object</span> <span class="hljs-string">'customers'</span>, <span class="hljs-keyword">database</span> <span class="hljs-string">'****'</span>, <span class="hljs-keyword">schema</span> <span class="hljs-string">'dbo'</span>.
</code></pre>
<p>In a way this may be seen as leaking information, since the error tells the user that further columns exist and how they are named, BUT at least the data itself is not accessible.</p>
<p>It does, however, mean that the SQL endpoint cannot be used directly when creating a Power BI report: connecting to the endpoint fails with the same error as above.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707483187573/90ef2c29-81a3-411d-8182-02a9ebad0a5c.png" alt class="image--center mx-auto" /></p>
<p>This issue can be solved by creating a semantic model of the lakehouse and using it as the base for the report. We will come to that later. Moreover, it is the recommended approach in terms of performance.</p>
<p>For more details see <a target="_blank" href="https://learn.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/column-level-security">MS Column-Level-Security</a>.</p>
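<p>Before writing queries, a restricted user can check which columns they are actually allowed to select. Assuming the endpoint exposes the standard T-SQL permission functions, something like the following should work:</p>
<pre><code class="lang-sql">-- List the effective column-level permissions of the current user;
-- rows with a non-empty subentity_name refer to individual columns.
SELECT subentity_name AS column_name, permission_name
FROM sys.fn_my_permissions('dbo.customers', 'OBJECT')
WHERE subentity_name &lt;&gt; '';
</code></pre>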
<h3 id="heading-data-masking">Data Masking</h3>
<p>The main use case of data masking is to hide sensitive data in the result of a query.</p>
<p>It can be applied easily to columns using a command like</p>
<pre><code class="lang-sql"><span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">TABLE</span> dbo.customers
<span class="hljs-keyword">ALTER</span> <span class="hljs-keyword">COLUMN</span> City <span class="hljs-keyword">ADD</span> MASKED <span class="hljs-keyword">WITH</span> (<span class="hljs-keyword">FUNCTION</span> = <span class="hljs-string">'default()'</span>);
GO
</code></pre>
<p>Resulting, as expected, in queried data being masked.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707573295962/4bee50a2-a85a-4e37-94eb-d954c2eda97d.png" alt class="image--center mx-auto" /></p>
<p><strong>BUT</strong> there is a - from our point of view - rather alarming fact: one should never ever use data masking as a replacement for column-level security. Data masking is applied only after the data has been queried. This makes it possible to perform (automated) brute-force queries to gather insights into the masked data.</p>
<p>This is how - although data masking is in place - you can find the users from the marvelous, imaginary city <em>Burnsmouth</em>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707573607640/791dd7e3-eda9-45b5-904e-5f3de2a31195.png" alt class="image--center mx-auto" /></p>
<p>With proper column-level security in place, the user would not be able to gain such insights, as access to the city column would be denied completely.</p>
<p>For more details see <a target="_blank" href="https://learn.microsoft.com/en-us/azure/azure-sql/database/dynamic-data-masking-overview?view=azuresql&amp;viewFallbackFrom=sql-server-ver16&amp;toc=%2Fazure%2Fsynapse-analytics%2Fsql-data-warehouse%2Ftoc.json">MS Dynamic Data Masking</a>.</p>
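<p>The brute-force idea from the screenshot above can be sketched in plain SQL: the WHERE clause is evaluated against the unmasked values, so a masked column still answers equality probes. (Table and city name are the example values from this post.)</p>
<pre><code class="lang-sql">-- City is masked in the result set, but filtering happens on the real values,
-- so this reveals exactly which customers live in Burnsmouth.
SELECT FirstName, LastName
FROM dbo.customers
WHERE City = 'Burnsmouth';
</code></pre>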
<h3 id="heading-building-reports-with-semantic-models">Building reports with semantic models</h3>
<p>Now that everything is secure, how can we best share data - including an easy to grasp data model - with someone eager to create reports?</p>
<p>The best way to achieve this is by sharing <strong>semantic models</strong>.</p>
<p><em>Let's look at some of the reasons why.</em></p>
<p>✔️ Firstly, we have seen above that with column-level security in place, Power BI fails to access data via the plain SQL endpoint because it cannot query the structure of the tables. Most likely Power BI issues a SELECT * FROM table to discover all available columns, which fails when proper column-level security is in place.</p>
<p>✔️ Secondly, a semantic model is a logical description of an extract of your lakehouse, most likely from a specific domain (e.g. Sales, HR). Most of the time it will be a star schema, which helps report-creating users easily understand the data model.</p>
<p>✔️ Thirdly, the default connection type when using semantic models is the so-called Direct Lake mode. It unites the advantages of the existing connection types, DirectQuery and Import. This means it is fast and there is no need to periodically refresh your report's dataset to reflect the latest data changes. Going into detail would be a blog post on its own; for now you can find more details <a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/enterprise/directlake-overview">here</a>.</p>
<p>Every lakehouse creates a default semantic model but the recommendation is to create specific ones for individual use cases, narrowed down to the tables, relationships and columns the users need and are allowed to access (combined with column-level security).</p>
<p><em>Let's illustrate this process.</em></p>
<ul>
<li><p>Create a semantic model for a lakehouse</p>
</li>
<li><p>Include needed tables and model their relationships</p>
</li>
<li><p>Hide all columns that are not visible to the user because of column-level security (everything but the employee name from the dimension table)</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707732764605/ad8a2244-dfe7-4b54-9ce1-5d817a6e1fc4.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Share the semantic model with a group of users</p>
</li>
<li><p>Let them import the semantic model in Power BI to build reports on top of it</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707732961901/16d383b5-e299-41e1-8b8c-706b19c3ae8e.png" alt class="image--center mx-auto" /></p>
<p>After connecting we see the expected result. The employee table only shows the employee column. Job done.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707733058892/6a783533-6f79-4ea9-8079-da418a15b1e3.png" alt class="image--center mx-auto" /></p>
<p>Remember, with column-level security in place connecting to the SQL endpoint would fail during import.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1707734636880/4927126f-7d26-4caf-9e49-d90fbfd0102b.png" alt class="image--center mx-auto" /></p>
<p>That is it for now.</p>
<p>Stay tuned for more posts in the area of data engineering with Microsoft Fabric. Especially the important topic of how to automatically test granted permissions, and security in general, might be an interesting next read.</p>
]]></content:encoded></item><item><title><![CDATA[Rapid Development with React-Admin and Fastify]]></title><description><![CDATA[(Written in collaboration with Mihai-Andrei Dancu)
Introduction
Software development is a time-intensive task and requires skilled software engineers to get the job done. Time and budget are directly proportional to one another and therefore as littl...]]></description><link>https://engineering.cloudflight.io/rapid-development-with-react-admin-and-fastify</link><guid isPermaLink="true">https://engineering.cloudflight.io/rapid-development-with-react-admin-and-fastify</guid><category><![CDATA[fastify]]></category><category><![CDATA[react-admin]]></category><category><![CDATA[mvp]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Peter Jedinger]]></dc:creator><pubDate>Fri, 13 Oct 2023 06:00:09 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/npxXWgQ33ZQ/upload/c479be11e889c4a8e035c0a32ca30ee3.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(Written in collaboration with <strong>Mihai-Andrei Dancu</strong>)</p>
<h1 id="heading-introduction">Introduction</h1>
<p>Software development is a time-intensive task and requires skilled software engineers to get the job done. Time and budget are directly proportional to one another, and therefore as few development resources as possible should be wasted. Especially in small to mid-sized projects, most development time should be spent on feature implementation, and time spent on other tasks should be minimized.</p>
<p>As part of the <a target="_blank" href="https://cloudflight.hashnode.dev/navigating-efficient-web-application-development-cloudflights-architectural-insights">rapid development project</a> of the Cloudflight Technical Lab, we researched software solutions with a focus on quickly achieving results and developing a functional MVP in a very short time. Every team was tasked to develop a generic resource management system as described in the <a target="_blank" href="https://engineering.cloudflight.io/series/rapid-web">introduction</a> of this article series. Our team specifically built an interactive <a target="_blank" href="https://marmelab.com/react-admin/"><strong>React-Admin</strong></a> <strong>frontend</strong> and a <strong>NodeJS backend using</strong> <a target="_blank" href="https://fastify.dev/"><strong>Fastify</strong></a> with a PostgreSQL database. Our development insights are discussed in this article and the technologies are evaluated in terms of their suitability for rapid development.</p>
<h1 id="heading-tech-stack">Tech stack</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695737316901/0239fb81-3157-47be-a9d8-66a3e05b20f8.jpeg" alt class="image--center mx-auto" /></p>
<p>Besides the general resource management functionality, we implemented automated confirmation e-mails in the backend, using MailHog as a test mail server.</p>
<h1 id="heading-react-admin"><strong>React-Admin</strong></h1>
<p><a target="_blank" href="https://react.dev/">React</a> is a powerful web development framework with industry-wide adoption due to its clever component architecture and its capabilities for building interactive web pages. <a target="_blank" href="https://marmelab.com/react-admin/">React-Admin</a> builds on these attributes and adds useful extensions to simplify development and ensure maintainability. It is a promising bridge between custom high-code solutions and low/no-code solutions, reducing development time by abstracting existing React libraries.</p>
<p>The main benefit of React-Admin is its inclusion of numerous out-of-the-box Material UI components, which can be used to quickly build dashboard-styled web apps similar to YouTube Studio and Spotify. It also offers integrations to quickly implement authentication, permissions, internationalization and lots of other useful functionalities. React-Admin is best used for CRUD-based applications but can be limiting when developing complex web apps.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">import</span> {Admin, Resource} <span class="hljs-keyword">from</span> <span class="hljs-string">"react-admin"</span>;
<span class="hljs-keyword">import</span> {SpotList} <span class="hljs-keyword">from</span> <span class="hljs-string">"../components/spots"</span>;
<span class="hljs-keyword">import</span> {dataProvider} <span class="hljs-keyword">from</span> <span class="hljs-string">"../providers/dataProvider"</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> App = <span class="hljs-function">() =&gt;</span> (
    <span class="xml"><span class="hljs-tag">&lt;<span class="hljs-name">Admin</span> <span class="hljs-attr">dataProvider</span>=<span class="hljs-string">{dataProvider}</span>&gt;</span>
        <span class="hljs-tag">&lt;<span class="hljs-name">Resource</span> <span class="hljs-attr">name</span>=<span class="hljs-string">"spots"</span> <span class="hljs-attr">list</span>=<span class="hljs-string">{SpotList}/</span>&gt;</span>
    <span class="hljs-tag">&lt;/<span class="hljs-name">Admin</span>&gt;</span></span>
);
</code></pre>
<p><em>The core of React-Admin - the data provider specification, the declared resources (only one in this case) and the Admin component</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696344857739/8dc6836a-b16b-49a9-87e6-9b114167f8a8.png" alt class="image--center mx-auto" /></p>
<p><em>The SpotList is automatically fetched via DataProvider and all data is displayed in a table.</em></p>
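<p>The <code>dataProvider</code> used above is just a plain object that maps react-admin's query methods to HTTP calls. As a hedged sketch (only <code>getList</code> is shown, and both the backend address and the query-string convention are assumptions that must be adapted to your API), a minimal hand-rolled provider could look like this:</p>

```javascript
// Hypothetical backend address and URL convention (modeled after simple
// REST backends such as json-server) - adapt both to your own API.
const apiUrl = 'http://localhost:3000';

function buildListUrl(resource, params) {
    const { page, perPage } = params.pagination;
    const { field, order } = params.sort;
    const query = new URLSearchParams({
        _sort: field,
        _order: order,
        _start: String((page - 1) * perPage),
        _end: String(page * perPage),
    });
    return `${apiUrl}/${resource}?${query}`;
}

const dataProvider = {
    // react-admin calls this whenever a <Resource> list view needs data
    getList: async (resource, params) => {
        const response = await fetch(buildListUrl(resource, params));
        const data = await response.json();
        return { data, total: data.length }; // total usually comes from a response header
    },
    // getOne, create, update, delete, ... follow the same pattern
};
```

<p><em>In practice, ready-made providers such as <code>ra-data-simple-rest</code> cover the common REST conventions; a custom provider like this is how you adapt react-admin to a backend that does not follow them.</em></p>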
<h2 id="heading-pros"><strong>Pros</strong></h2>
<ul>
<li><p>React-Admin components are a good base for dashboard-style apps</p>
</li>
<li><p>Components are styled and responsive by default</p>
</li>
<li><p>Routes for resource endpoints are configured automatically</p>
</li>
<li><p>Data provider abstraction allows for automatic API communication</p>
</li>
<li><p>Built-in role-based access control</p>
</li>
<li><p>Integrated internationalization (40+ locales supported)</p>
</li>
<li><p>Scalability assured by React's architecture</p>
</li>
</ul>
<h2 id="heading-cons"><strong>Cons</strong></h2>
<ul>
<li><p>Custom functionality (everything besides CRUD) can be difficult to implement</p>
</li>
<li><p>Backend API implementation has to follow data provider specifications</p>
</li>
<li><p>Some features and UI components are only available via an Enterprise Edition subscription (e.g. site-wide search, breadcrumb paths, AI autocomplete and more)</p>
</li>
<li><p>Little TypeScript documentation is available, mostly JavaScript</p>
</li>
</ul>
<h1 id="heading-fastify">Fastify</h1>
<p><a target="_blank" href="https://fastify.dev/">Fastify</a> is a modern NodeJS web framework that focuses on developer experience, low overhead and responsiveness by efficiently managing server resources. Its powerful plugin architecture allows developers to quickly implement features while requiring minimal configuration effort.</p>
<p>Plugins can be self-developed, but there are also core and community plugins, which can be added as dependencies. As an example, a “database connection plugin” can be registered in the application context and by using decorators it can then be accessed from anywhere in the application.</p>
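<p>The decorator mechanism described above can be sketched as follows. Note that the Fastify instance is replaced here by a minimal stand-in object so the snippet is self-contained; in a real application the plugin would be registered on an actual Fastify instance via <code>fastify.register()</code>:</p>

```javascript
// Sketch of a "database connection plugin": decorate() attaches a value to
// the instance so that routes and other plugins can later access fastify.db.
function dbPlugin(fastify, opts, done) {
    fastify.decorate('db', {
        query: (sql) => `pretend result for: ${sql}`,
    });
    done();
}

// Minimal stand-in for a Fastify instance (only the decorate API is mimicked,
// purely so this example runs on its own).
const app = {
    decorate(name, value) {
        this[name] = value;
    },
};

dbPlugin(app, {}, () => {});
console.log(app.db.query('SELECT 1')); // → "pretend result for: SELECT 1"
```

<p><em>Wrapping such a plugin with <code>fastify-plugin</code> (as in the CORS example below) makes the decoration visible to the parent context instead of keeping it encapsulated.</em></p>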
<p>As a web framework, Fastify provides an easy way to declare endpoints and routes for HTTP communication. Hooks can be used as event listeners to execute custom code whenever a specific event occurs, e.g. to add a reply header before a response is returned.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> fp = <span class="hljs-built_in">require</span>(<span class="hljs-string">'fastify-plugin'</span>);

<span class="hljs-built_in">module</span>.exports = fp(<span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">fastify, opts, done</span>) </span>{
    fastify.addHook(<span class="hljs-string">"onRequest"</span>, <span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">request, reply</span>) </span>{
        reply.headers(
            {<span class="hljs-string">"Access-Control-Allow-Origin"</span>: <span class="hljs-string">"*"</span>,
             <span class="hljs-string">"Access-Control-Allow-Headers"</span>: <span class="hljs-string">"*"</span>,
             <span class="hljs-string">"Access-Control-Expose-Headers"</span>: <span class="hljs-string">"*"</span>}
        );
    });
    done()
})
</code></pre>
<p><em>A simple plugin that adds CORS headers to every reply</em></p>
<p>Moreover, Fastify requires very low overhead compared to similar frameworks. Only the core configuration file (“package.json”) of the Node.js ecosystem is required, and no additional boilerplate code is needed. A "hello world" application can be written in ~10 lines of code.</p>
<pre><code class="lang-javascript"><span class="hljs-keyword">const</span> fastify = <span class="hljs-built_in">require</span>(<span class="hljs-string">'fastify'</span>)({ <span class="hljs-attr">logger</span>: <span class="hljs-literal">true</span> })
fastify.get(<span class="hljs-string">'/'</span>, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">request, reply</span>) </span>{
    reply.send({<span class="hljs-attr">hello</span>: <span class="hljs-string">'world'</span>})
})
fastify.listen({<span class="hljs-attr">port</span>: <span class="hljs-number">3000</span>}, <span class="hljs-function"><span class="hljs-keyword">function</span> (<span class="hljs-params">err, address</span>) </span>{
    <span class="hljs-keyword">if</span> (err) {
        fastify.log.error(err)
        process.exit(<span class="hljs-number">1</span>)
    }
})
</code></pre>
<p><em>"Hello World" in Fastify</em></p>
<h2 id="heading-pros-1">Pros</h2>
<ul>
<li><p>Plugin architecture can accelerate development</p>
</li>
<li><p>Plugins can easily be reused in different projects</p>
</li>
<li><p>Fastify supports TypeScript (declaration file is maintained)</p>
</li>
<li><p>Fastify can serve up to 30k requests per second (as claimed by Fastify)</p>
</li>
<li><p>The plugin system allows an easy shift from monolithic applications to microservices (when the context is configured correctly)</p>
</li>
<li><p>Good official documentation resources</p>
</li>
</ul>
<h2 id="heading-cons-1">Cons</h2>
<ul>
<li><p>Understanding the application context and scope of the plugin registration can be complicated at first</p>
</li>
<li><p>Low overhead also means that all required functionality has to be self-implemented</p>
</li>
<li><p>Files can become very verbose (especially routes with parameter definitions)</p>
</li>
<li><p>Relatively small development community (few online discussions/threads)</p>
</li>
<li><p>No TypeScript documentation is available, only JavaScript</p>
</li>
</ul>
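<p>To illustrate the verbosity point: even a single parameterized route accumulates a fair amount of schema boilerplate once request and response validation are declared. The shape below follows Fastify's JSON-schema route options; the resource itself is hypothetical:</p>

```javascript
// Route options for a single GET endpoint of a hypothetical "spots" resource.
// The nested JSON-schema blocks are what make route files grow quickly.
const getSpotOpts = {
    schema: {
        params: {
            type: 'object',
            required: ['id'],
            properties: { id: { type: 'integer' } },
        },
        response: {
            200: {
                type: 'object',
                properties: {
                    id: { type: 'integer' },
                    name: { type: 'string' },
                },
            },
        },
    },
};

// With a real Fastify instance this would be wired up as:
// fastify.get('/spots/:id', getSpotOpts, async (request, reply) => { ... })
console.log(Object.keys(getSpotOpts.schema)); // → [ 'params', 'response' ]
```

<p><em>The upside of this verbosity is that Fastify validates requests and serializes responses from these schemas automatically.</em></p>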
<h1 id="heading-lessons-learned">Lessons Learned</h1>
<p>Picking the right tech stack is one of the most important decisions when trying to accelerate the software development process. The best choice varies from project to project, and it can be difficult to find the sweet spot between high-code and low/no-code solutions. High-code frameworks require a lot of setup and configuration time but come with plenty of functionality out of the box (e.g. Spring Boot). Low-code solutions provide pre-built components that let developers bootstrap an application in minutes but might not be as flexible when custom functionality is required. Whatever technologies and frameworks are chosen, a trade-off between flexibility and invested time always has to be made.</p>
<p>The Fastify framework has very low overhead, but this also means that every required functionality needs to be self-implemented or at least self-configured if there is an existing plugin. The plugin architecture is a great system to accelerate development, especially if there is a baseline of existing plugins from previous projects. If a team is confident in working with it, Fastify can be a great framework choice.</p>
<p>React-Admin provides useful abstractions to quickly build simple CRUD-based dashboard-style apps in a few lines of code. If developers are working within the constraints of React-Admin, satisfactory results can be achieved rapidly, but implementing custom features can be very restrictive when working with the framework. Depending on the project scope, React-Admin can be a solid choice. It is best used for less complex projects with resource inspection and manipulation as the main focus.</p>
<p>Sadly, there is no clear-cut way to increase development speed simply by picking the right frameworks. All project requirements (and possible future requirements) need to be considered to make an educated decision. In the end, frameworks are just tools that should help developers fulfil the requirements, but suboptimal framework choices can negatively impact development time or even lead to project failure. The best framework choice is often the one the team is most experienced with, because key concepts and limitations are known beforehand. When unfamiliar frameworks are proposed, time should be spent on research and preparation to avoid problems later. If the right framework is chosen for a specific use case, development speed can increase significantly.</p>
<p>We hope that you learned something by reading this article and maybe gained a new perspective on framework choice. React-Admin and Fastify can be solid options and hopefully, you received some insight on whether they might be a good fit for one of your future projects. Keep on coding, cheers!</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft Power BI]]></title><description><![CDATA[(Written in collaboration with Andreas Schweiger & Stefan Starke)
In today's data-driven world, organizations big and small rely on data to make informed decisions, gain insights, and drive business growth. However, raw data alone is seldom enough; i...]]></description><link>https://engineering.cloudflight.io/microsoft-power-bi</link><guid isPermaLink="true">https://engineering.cloudflight.io/microsoft-power-bi</guid><category><![CDATA[PowerBI]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[data analysis]]></category><category><![CDATA[visualization]]></category><category><![CDATA[data]]></category><dc:creator><![CDATA[Oguzhan Tuncer]]></dc:creator><pubDate>Fri, 06 Oct 2023 11:52:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/hGV2TfOh0ns/upload/d56682710d498708aea832eb562bbf60.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(Written in collaboration with <strong>Andreas Schweiger</strong> &amp; <strong>Stefan Starke</strong>)</p>
<p>In today's data-driven world, organizations big and small rely on data to make informed decisions, gain insights, and drive business growth. However, raw data alone is seldom enough; it needs to be transformed into actionable information. This is where <a target="_blank" href="https://powerbi.microsoft.com/en-us/"><strong>Power BI</strong></a> steps in, offering a powerful suite of tools to analyze, visualize, and share data.</p>
<p>Before looking into the possibilities and techniques for embedding Power BI reports (one of the most important questions in our custom software projects), let's have a brief look at the question:</p>
<p><em>What is Power BI and which key features does it offer?</em></p>
<p>In brief, <a target="_blank" href="https://powerbi.microsoft.com/en-us/">Power BI</a> is a business intelligence and data visualization tool that enables users to turn raw data into interactive visual reports and dashboards. It provides a unified platform for data exploration, data preparation, data modeling, and data sharing. With its intuitive interface and robust capabilities, Power BI has become the go-to choice for organizations seeking to extract meaningful insights from their data.</p>
<p>To give an idea, let's look at a sample report (which can be downloaded <a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/create-reports/sample-customer-profitability">here</a>):</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696331391134/506d91a6-93a0-4eb5-82d6-d6af22cd04ff.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-key-features">Key Features</h3>
<p><strong>1. Data Integration</strong></p>
<p>Power BI connects to a wide range of data sources, including databases, cloud services, and on-premises data. It can seamlessly integrate data from Excel spreadsheets, SQL databases, Dataverse, Azure, and many more, making it easy to consolidate and transform data from various sources into a single dataset.</p>
<p><strong>2. Data Transformation and Modeling</strong></p>
<p>Power BI offers a powerful data modeling engine that allows users to shape and transform data using the Power Query Editor. This tool is particularly useful for cleaning, filtering, and structuring data before analysis. Users can create relationships between tables, define calculated columns and measures, and apply advanced transformations.</p>
<p><strong>3.</strong> <strong>Interactive Visualization</strong></p>
<p>One of Power BI's standout features is its ability to create stunning visualizations. Users can choose from a wide range of charts, graphs, and maps to display data in a compelling and informative way. The drag-and-drop interface makes it easy to create interactive dashboards that update in real-time as data changes.</p>
<p><strong>4. Collaboration and Sharing</strong></p>
<p>Power BI enables collaboration by allowing users to share reports and dashboards with colleagues or external stakeholders. The ability to share reports and collaborate on data analysis fosters a culture of collaboration within organizations, promoting better knowledge sharing.</p>
<p><strong>5.</strong> <strong>AI and Machine Learning Integration</strong></p>
<p>Power BI integrates with Azure Machine Learning, allowing users to embed machine learning models into their reports and dashboards. This enables predictive analytics, anomaly detection, automated insights generation, and a Q&amp;A feature. The latter uses natural language processing to generate visualizations and answers based on the available data.</p>
<p><strong>6. Mobile Access</strong></p>
<p>With Power BI Mobile, users can access their reports and dashboards on smartphones and tablets. This ensures that decision-makers have access to critical data wherever they are.</p>
<h3 id="heading-licenses">Licenses</h3>
<p>Choosing the right Power BI license is crucial for maximizing the benefits of this powerful tool while staying within a defined budget.</p>
<p>In general, licensing can be split into two main categories, user-based (<strong>Free, Pro, Premium Per User</strong>) and capacity-based (<strong>Premium, Embedded</strong>) licensing.</p>
<div class="hn-table">
<table>
<thead>
<tr>
<th>License Type</th><th>Target Users</th><th>Description</th><th>Cost</th></tr>
</thead>
<tbody>
<tr>
<td>Power BI Free</td><td>Anyone</td><td>It allows users to create reports and dashboards in Power BI Desktop, but it has limitations on sharing and collaboration. It's a good starting point for personal use or exploring Power BI's capabilities.</td><td>Free</td></tr>
<tr>
<td>Power BI Pro</td><td>Individual users who need to publish and share reports and dashboards within their organization</td><td>User-based license for report sharing and collaboration.</td><td>$9.99 per user per month</td></tr>
<tr>
<td>Power BI Premium</td><td>Organizations with larger user bases and more demanding requirements</td><td>It provides dedicated capacity for faster and more reliable performance. Pricing depends on the number of virtual cores and the amount of RAM allocated to the Premium capacity.</td><td>Variable, based on capacity and users.</td></tr>
<tr>
<td>Power BI Premium Per User (PPU)</td><td>Suitable for small to medium-sized businesses</td><td>User-based license that allows users to access premium features without the need for a full-scale Power BI Premium capacity.</td><td>$20 per user per month</td></tr>
<tr>
<td>Power BI Embedded</td><td>Developers and ISVs (Independent Software Vendors)</td><td>License for embedding Power BI reports and dashboards into custom applications. Pricing depends on the number of virtual cores and the amount of RAM allocated to the Premium capacity.</td><td>Variable, based on usage</td></tr>
</tbody>
</table>
</div><h2 id="heading-integrating-power-bi-reports">Integrating Power BI Reports</h2>
<p>If you are already developing applications or webpages based on the Power Platform ecosystem (<a target="_blank" href="https://engineering.cloudflight.io/microsoft-power-platform">CLF Engineering Blog</a>) then embedding a report into Power Apps is your way to go.</p>
<h3 id="heading-integrating-into-power-apps-or-power-pages">Integrating into Power Apps or Power Pages</h3>
<p>This integration allows you to embed Power BI content directly into web pages hosted on a Power Apps Portal, providing a seamless user experience for external customers or users who may not have direct access to Power BI. Within your Power Apps Portal, you can embed Power BI reports or dashboards into web pages. This can be done using the "Power BI" component or an HTML iframe.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696331717196/ac4bea9c-0329-4125-9124-d430b268a6de.png" alt class="image--center mx-auto" /></p>
<p>For more details, you can refer to the <a target="_blank" href="https://learn.microsoft.com/en-us/power-platform/">official documentation</a> or our <a target="_blank" href="https://engineering.cloudflight.io/microsoft-power-platform">Cloudflight Engineering Blog</a>.</p>
<h3 id="heading-integrating-into-a-custom-application">Integrating into a Custom Application</h3>
<p>For us - as a company developing custom software - the most interesting approach is to embed reports into other applications using Power BI Embedded.</p>
<p>To ensure that only authorized users can view embedded Power BI content, a robust authentication and authorization system is required and that is where Power BI Embedded Tokens come into play.</p>
<p>Power BI Embedded tokens are a type of security token that grants access to specific Power BI content. They are used to authenticate users and control their access to embedded reports and dashboards. Here's how they work:</p>
<ul>
<li><p><strong>Generate Token</strong>: When a user requests to view an embedded Power BI report or dashboard, the hosting application (our custom-developed applications) needs to authenticate the user with Power BI. It does this by requesting an embedded token.</p>
</li>
<li><p><strong>Token Parameters</strong>: The token request typically includes parameters such as the user's identity and roles and the specific report/dashboard to be accessed. These parameters determine what the user is allowed to see.</p>
</li>
<li><p><strong>Token Issuance</strong>: Power BI generates a token based on the provided parameters. This token is a temporary, time-limited access key.</p>
</li>
<li><p><strong>Access Control</strong>: The token contains information about the user's permissions, roles and the content they are allowed to access. When the user tries to access the embedded content, the token is validated to ensure the user has the necessary permissions.</p>
</li>
<li><p><strong>Expiration</strong>: Power BI Embedded tokens have a limited lifespan, which enhances security. Once the token expires, the backend must request a new one for continued access.</p>
</li>
</ul>
<p>Let's have a look at the steps we implemented to acquire embedded tokens:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696331916023/942c2270-8c80-4699-97a8-5f4695b00b22.jpeg" alt class="image--center mx-auto" /></p>
<ol>
<li><p><strong>User Authentication</strong>: Firstly, the user of your web application goes through an authentication process within your web app using your chosen authentication method. This step verifies the user's identity.</p>
</li>
<li><p><strong>Web App Authorization</strong>: Our web application, having successfully authenticated the user, utilizes a service principal to establish authentication with Azure Active Directory (Azure AD). This step grants our web app the necessary permissions for interaction with Power BI REST APIs by requesting an Azure AD token.</p>
</li>
<li><p><strong>Embed Token Request</strong>: Our web application communicates with the Power BI Embed Token REST API operation, initiating a request for an embed token. This specific token defines precisely which Power BI content can be embedded within our application as explained above. In response to our request, the REST API provides our web application with the embed token, which is specific to the requested Power BI content.</p>
</li>
<li><p><strong>Passing the Embed Token</strong>: Our web application then securely passes this embed token to the user's web browser, allowing the user's browser to facilitate the interaction with Power BI.</p>
</li>
<li><p><strong>User Access</strong>: Finally, the web app user employs the embed token within their browser to access and interact with Power BI content, as authorized by the token's permissions.</p>
</li>
</ol>
<p>Feel free to have a look at the <a target="_blank" href="https://github.com/microsoft/PowerBI-Developer-Samples">source code</a> provided by Microsoft to see how you can acquire an Azure AD token using a service principal and how to generate embed tokens.</p>
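<p>Step 3 above boils down to a single REST call from the backend. As a hedged sketch (the endpoint shape follows the Power BI "Generate Token" REST operation; the group ID, report ID and Azure AD token are placeholders you must supply from your own tenant), the request could be assembled like this:</p>

```javascript
// Builds the embed token request for a report in a workspace ("group").
// All three arguments are placeholders from your own Power BI tenant.
function buildGenerateTokenRequest(groupId, reportId, aadToken) {
    return {
        url: `https://api.powerbi.com/v1.0/myorg/groups/${groupId}/reports/${reportId}/GenerateToken`,
        options: {
            method: 'POST',
            headers: {
                Authorization: `Bearer ${aadToken}`,
                'Content-Type': 'application/json',
            },
            // "View" keeps the resulting embed token read-only
            body: JSON.stringify({ accessLevel: 'View' }),
        },
    };
}

// usage (requires a valid Azure AD token acquired via the service principal):
// const { url, options } = buildGenerateTokenRequest(groupId, reportId, aadToken);
// const { token } = await (await fetch(url, options)).json();
```

<p><em>Because the embed token expires, the backend should issue a fresh one on demand rather than caching it for long periods.</em></p>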
<p>Regarding the frontend integration, Power BI provides client wrappers for all major UI frameworks - Angular, <a target="_blank" href="https://learn.microsoft.com/en-us/javascript/api/overview/powerbi/powerbi-client-vue">VueJS</a> and <a target="_blank" href="https://learn.microsoft.com/en-us/javascript/api/overview/powerbi/powerbi-client-react">React</a>. The snippet below shows the Angular <code>powerbi-report</code> component:</p>
<pre><code class="lang-javascript">&lt;powerbi-report
    [embedConfig] = {{
        <span class="hljs-attr">type</span>: <span class="hljs-string">"report"</span>,
        <span class="hljs-attr">id</span>: <span class="hljs-string">"&lt;Report Id&gt;"</span>,
        <span class="hljs-attr">embedUrl</span>: <span class="hljs-string">"&lt;Embed Url&gt;"</span>,
        <span class="hljs-attr">accessToken</span>: <span class="hljs-string">"&lt;Access Token&gt;"</span>,
        <span class="hljs-attr">tokenType</span>: models.TokenType.Embed,
        <span class="hljs-attr">settings</span>: {
            <span class="hljs-attr">panes</span>: {
                <span class="hljs-attr">filters</span>: {
                    <span class="hljs-attr">expanded</span>: <span class="hljs-literal">false</span>,
                    <span class="hljs-attr">visible</span>: <span class="hljs-literal">false</span>
                }
            },
            <span class="hljs-attr">background</span>: models.BackgroundType.Transparent,
        }
    }}

    [cssClassName] = { <span class="hljs-string">"reportClass"</span> }

    [phasedEmbedding] = { <span class="hljs-literal">false</span> }

    [eventHandlers] = {
        <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>([
            [<span class="hljs-string">'loaded'</span>, <span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Report loaded'</span>)],
            [<span class="hljs-string">'rendered'</span>, <span class="hljs-function">() =&gt;</span> <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Report rendered'</span>)],
            [<span class="hljs-string">'error'</span>, <span class="hljs-function">(<span class="hljs-params">event</span>) =&gt;</span> <span class="hljs-built_in">console</span>.log(event.detail)]
        ])
    }
&gt;
&lt;/powerbi-report&gt;
</code></pre>
<h2 id="heading-row-level-security">Row Level Security</h2>
<p><strong>Row Level Security (RLS)</strong> in Power BI is a security feature that allows you to control access to data at the row level based on user roles and filters. This means you can restrict what data individual users or groups of users can see within a Power BI report or dataset.</p>
<ul>
<li><p>RLS is typically implemented by creating user roles within your Power BI model. Each user role can have specific data access rules associated with it. User roles can be defined and managed in Power BI Desktop or through the Power BI service.</p>
</li>
<li><p>To enforce RLS, you use filter expressions within user roles. These filter expressions are written in the DAX (Data Analysis Expressions) language and define which rows of data are visible to users in that role. Filter expressions can be as simple or complex as needed, allowing you to create dynamic filters based on user attributes, such as username or department. A typical example, shown further below, is a filter that lets users see a data row only if the underlying data set has a record under their username.</p>
</li>
<li><p>RLS provides dynamic security, meaning that the data is filtered in real time as users interact with the report or dataset. Users will only see the data that aligns with their role and the applied filters. The filters can be applied to specific tables and roles.</p>
</li>
<li><p>Once RLS rules are defined and tested, you can publish your Power BI report or dataset to the Power BI service. RLS rules are enforced in the service as well, ensuring consistent security across different platforms and devices.</p>
</li>
</ul>
<p><strong>Example of using RLS for filtering per person:</strong></p>
<p>We want to filter everything in the <a target="_blank" href="https://learn.microsoft.com/en-us/power-bi/create-reports/sample-customer-profitability">sample report</a> such that the logged-in executive can only see their own data. For that, we use <strong>USERPRINCIPALNAME()</strong> in DAX, which returns the login of the current user. Create a new measure on the table that holds the executive names and name it "User" with the value <strong>USERPRINCIPALNAME()</strong>. Then create the security role by clicking on the Modeling tab -&gt; Manage Roles. Create a role and then define the filter. This filter simply means that logged-in users will only see their own records in the whole data set:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696332296970/133f6828-c1c4-41aa-92e5-38110e39debd.png" alt class="image--center mx-auto" /></p>
<p>This is how the sample report looks when we view it as the executive "Andrew Ma" after RLS:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696332315533/364b94b8-db3d-4d57-b3cc-713137d78bb1.png" alt class="image--center mx-auto" /></p>
<p><strong>Example of using RLS to hide/show items depending on the security role:</strong></p>
<p>A function to hide an entire page or specific elements of a report for certain roles is not available, but there is a workaround in Power BI Desktop. You can build your report as normal, then add a card and make it big enough to overlay all the things you want to hide.</p>
<p><strong>Warning:</strong> Note that this workaround only visually hides the report in the UI; a malicious user would still have access to the data in the underlying dataset. Therefore, when working with real data, it should only be employed in combination with proper RLS on the dataset itself, never on its own.</p>
<p>To do that, enter new data and create the following table:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696332358852/ac24da95-47a6-4f94-bea5-9692a6eb8df0.png" alt class="image--center mx-auto" /></p>
<p>Create a new measure and write the following DAX which creates the message you want to display:</p>
<ul>
<li>Message = IF(HASONEFILTER('RLS Table'[RLS]), "You are not authorized!", "")</li>
</ul>
<p>Create another measure and write the following DAX which controls the background of the overlay card we will create:</p>
<ul>
<li>Make Transparent = IF(HASONEFILTER('RLS Table'[RLS]), "#FFFFFF", "#FFFFFF00")</li>
</ul>
<p>Create a card and make it as big as the report page. Choose the "Message" measure to be displayed on it.</p>
<p>Format the RLS Table's visual in the general tab and find the background properties. Set the first dropdown to "Field value" and the second to "Make Transparent".</p>
<p>Create a new role that will not be authorized to see the report:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696332410323/04f02a69-d90b-4e15-a889-6acb632e2621.png" alt class="image--center mx-auto" /></p>
<p>This is how the report should look when you make the overlay card as big as the report page:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1696332424537/5ef5fd66-ea42-4e52-af62-43fe81d3c8b4.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Power BI is a powerful tool that empowers organizations to harness the full potential of their data. With the possibilities for embedding reports into applications, it is a valuable puzzle piece in our toolkit when it comes to efficiently implementing custom software solutions.</p>
]]></content:encoded></item><item><title><![CDATA[Reliable communication using the Transactional Outbox Pattern]]></title><description><![CDATA[In today's digital age, email communication remains an indispensable tool for businesses and individuals alike. Whether it's sending important notifications, marketing campaigns, or transactional updates, emails play a pivotal role in ensuring effect...]]></description><link>https://engineering.cloudflight.io/reliable-communication-using-the-transactional-outbox-pattern</link><guid isPermaLink="true">https://engineering.cloudflight.io/reliable-communication-using-the-transactional-outbox-pattern</guid><category><![CDATA[Reliability]]></category><category><![CDATA[SES]]></category><category><![CDATA[smtp]]></category><category><![CDATA[design patterns]]></category><category><![CDATA[Transactional outbox pattern]]></category><dc:creator><![CDATA[Andrei Cotor]]></dc:creator><pubDate>Fri, 29 Sep 2023 12:13:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/LPZy4da9aRo/upload/5e50aca8ff3428a435b8bdb65dca9eeb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In today's digital age, email communication remains an indispensable tool for businesses and individuals alike. Whether it's sending important notifications, marketing campaigns, or transactional updates, emails play a pivotal role in ensuring effective communication.</p>
<p>At first glance, sending an email is just a line of code, right? Well, integrating any asynchronous messaging functionality (e.g. sending emails, sending data to 3rd party services, billing systems etc.) into software applications can be a daunting task, especially when it comes to ensuring the reliability and consistency of message delivery. Most of the time, applications need a way to send a message if and only if a corresponding database update succeeds. A simple example of when this behavior is necessary is user registration: when a new user registers, the application has to send a confirmation email to their address. Seems easy enough, so what could go wrong?</p>
<p>Let us try writing some Spring Boot code with Kotlin to illustrate the problem:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Transactional</span>
<span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">registerUser</span><span class="hljs-params">(user: <span class="hljs-type">User</span>)</span></span>{
    userRepo.save(user)
    emailService.send(ConfirmationEmail())
}
</code></pre>
<p>At first sight, this code may seem right. But what happens if the server encounters an error after saving the new user? The email will get sent but the user will not exist in the database. Now, you might think of wrapping this code in a try-catch block such that we don't execute the send function if the save operation fails. This would look something like this:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Transactional</span>
<span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">registerUser</span><span class="hljs-params">(user: <span class="hljs-type">User</span>)</span></span>{
    <span class="hljs-keyword">try</span> {
        userRepo.save(user)
        emailService.send(ConfirmationEmail())
    }
    <span class="hljs-keyword">catch</span>(err: UserRepoException) {
        <span class="hljs-comment">// ...    </span>
    }
}
</code></pre>
<p>Unfortunately, this isn't a good approach either. What if the email provider is not available at the moment? The user will be persisted in the database, but the confirmation email will never arrive; it will not even be sent to begin with. This is unacceptable in a professional, modern web application. We would like to somehow roll back the saving of the user if sending the email fails. This might remind you of the lesson about transactions from your databases course.</p>
<h2 id="heading-transactional-outbox-pattern-to-the-rescue">Transactional Outbox Pattern to the Rescue</h2>
<p>The <a target="_blank" href="https://microservices.io/patterns/data/transactional-outbox.html">Transactional Outbox Pattern</a>, a proven architectural design, provides a robust and reliable solution for managing email sending within applications. It addresses various use cases where message delivery is crucial and helps mitigate potential failures that can occur when this pattern is not implemented.</p>
<p>The basic idea is to have an Outbox table that contains the emails our application has to send out. We can now insert, <strong>in the same transaction</strong>, into this new table and the User table when a new user is created.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695804617689/0380344a-cf23-4f22-be5d-4e5e3aa76e6f.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-key-components-of-the-transactional-outbox-pattern">Key Components of the Transactional Outbox Pattern:</h3>
<ol>
<li><p><strong>Outbox:</strong> The central component of this pattern is the "outbox," which acts as a temporary storage for messages that need to be sent. When a message needs to be sent (e.g. an email), instead of sending it directly, it is first placed in the outbox, <strong>in the same transaction as the update on the other table</strong> (e.g. Insert in the User table). This outbox can be implemented as a database table, a message queue, or any other persistent storage. In our case, it will be a database table.</p>
</li>
<li><p><strong>Message Queue or Scheduler:</strong> An essential part of the pattern is a mechanism that monitors the outbox for messages and sends them at an appropriate time. This can be done using a message queue (e.g. RabbitMQ, Kafka) or a scheduler that periodically checks the outbox for pending messages. When a message is sent successfully, it is marked as "sent" in the outbox.</p>
</li>
<li><p><strong>Transactional Behavior:</strong> The Transactional Outbox Pattern ensures that message sending is part of a larger transaction as an atomic database operation. If the transaction fails (e.g., due to an error or an exception), the messages are not inserted into the outbox and the data is not updated in the other table (UsersTable in our example). This guarantees that messages are only sent when the whole transaction is successful, maintaining data consistency. This behavior can be easily achieved in Spring JPA using the <code>@Transactional</code> annotation since it satisfies the "<a target="_blank" href="https://mariadb.com/resources/blog/acid-compliance-what-it-means-and-why-you-should-care/">ACID</a>" requirements: it is Atomic, Consistent, Isolated and Durable.</p>
</li>
</ol>
<h3 id="heading-sequence-diagram">Sequence Diagram</h3>
<p>We can illustrate the flow of registering a new user in an application that implements the Transactional Outbox pattern. It would look something like this:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1695804776001/b1a2ffd1-e7cd-4319-811d-55b3211707f5.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-implementation">Implementation</h2>
<p>Now that we understand the problem and have a solution, we can write an EmailService class to achieve reliable email sending:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Service</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TransactionalEmailService</span></span>(
    <span class="hljs-keyword">val</span> outboxRepository: OutboxRepository,
    <span class="hljs-keyword">val</span> emailOutboxMapper: EmailOutboxMapper
) {
    <span class="hljs-meta">@Transactional</span>
    <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">sendEmail</span><span class="hljs-params">(email: <span class="hljs-type">Email</span>)</span></span> {
        <span class="hljs-keyword">this</span>.outboxRepository.save(<span class="hljs-keyword">this</span>.emailOutboxMapper.emailToOutbox(email))
    }
}
</code></pre>
<p>The service has only one function: <code>sendEmail()</code>. The function saves the Email object into the Outbox table as a database entry. It is annotated with <code>@Transactional</code>, allowing Spring to handle the database save operation as part of a transaction. Please note that the default propagation of <code>@Transactional</code>, <code>REQUIRED</code>, is exactly what we want here: the code in the <code>sendEmail()</code> function should run in the same transaction as the code in the function calling it (so both the original database operation and the save on the outbox repository happen in the same transaction). Using another propagation, such as <code>REQUIRES_NEW</code>, would create a second transaction for <code>sendEmail()</code>, which would defeat the purpose. For more information please check the <a target="_blank" href="https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/transaction/annotation/Transactional.html">official documentation</a>.</p>
<h3 id="heading-scheduled-task-email-relay">Scheduled task (Email Relay)</h3>
<p>To pick up new entries from the Outbox table, we decided to use a scheduled task. This task fetches batches (pages) of entries from the Outbox until there are no un-fetched entries left. It is important to implement it this way, rather than fetching only one batch per execution: if at some point a lot of messages enter the Outbox (when a newsletter has to be sent, for example), the scheduled task keeps going until it has tried to send every message, instead of waiting between scheduled runs for each batch.</p>
<p>We should also mention that in the Outbox table, besides the usual email fields, we also store:</p>
<ul>
<li><p>the scheduled date to send the email, which serves two purposes: sending emails at a certain point in the future and, if sending fails, marking when a resend should be attempted</p>
</li>
<li><p>the number of tries: if an email fails too many times, the service stops trying to send it</p>
</li>
</ul>
<p>To optimize our selects, we created a compound index on the Outbox table on the columns ID, Number of tries and Scheduled date.</p>
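<p>To make this concrete, a minimal JPA sketch of such an Outbox entity could look like the following (the field and column names here are illustrative, not our exact schema):</p>
<pre><code class="lang-kotlin">@Entity
@Table(
    name = "outbox",
    indexes = [Index(columnList = "id, tries, scheduled_date")]
)
class OutboxEntry(
    @Id
    @GeneratedValue
    val id: Long = 0,

    // the usual email fields
    val recipient: String,
    val subject: String,
    @Lob
    val body: String,

    // when sending should be (re-)attempted
    var scheduledDate: Instant = Instant.now(),

    // how many times sending has failed so far
    var tries: Int = 0
)
</code></pre>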
<p>A pitfall we have to watch out for is that multiple instances of the scheduled task may be running at once. To make sure that two instances don't select the same batch of emails (which would result in sending the same emails multiple times) we can use a distributed lock, like <a target="_blank" href="https://github.com/lukas-krecan/ShedLock">ShedLock</a>. Its purpose is to make sure only one instance of the scheduled task runs at any point in time. Another, more database-specific option is row-level locking with <code>FOR UPDATE SKIP LOCKED</code>. This clause is available in <a target="_blank" href="https://www.postgresql.org/docs/current/sql-select.html#:~:text=such%20a%20case.-,The,-Locking%20Clause">PostgreSQL</a>, <a target="_blank" href="https://docs.oracle.com/cd/E17952_01/mysql-8.0-en/innodb-locking-reads.html">Oracle</a> and others, but not, for example, in SQL Server or SQLite.</p>
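<p>With Spring Data JPA, the row-lock variant can be sketched as a native query (the repository, entity and column names are illustrative, and the query assumes a database that supports <code>SKIP LOCKED</code>, such as PostgreSQL):</p>
<pre><code class="lang-kotlin">interface OutboxRepository : JpaRepository&lt;OutboxEntry, Long&gt; {

    // selects a batch of due, not-yet-exhausted entries and locks the rows,
    // skipping rows that are already locked by another relay instance
    @Query(
        value = """
            SELECT * FROM outbox
            WHERE scheduled_date &lt;= now() AND tries &lt; :maxTries
            ORDER BY id
            LIMIT :batchSize
            FOR UPDATE SKIP LOCKED
        """,
        nativeQuery = true
    )
    fun findDueBatch(maxTries: Int, batchSize: Int): List&lt;OutboxEntry&gt;
}
</code></pre>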
<p>A pseudocode of our implementation would look something like this:</p>
<pre><code class="lang-kotlin">acquire distributed lock
repeat until no batches left {
    batch = select relevant entries from Outbox table
    <span class="hljs-keyword">for</span> each email <span class="hljs-keyword">in</span> batch {
        <span class="hljs-keyword">try</span> {
            send(email)
            <span class="hljs-comment">// send - successful</span>
            delete email from Outbox table
        }
        <span class="hljs-keyword">catch</span> {
            <span class="hljs-comment">// send - failed</span>
            email -&gt; increase number of tries
            email -&gt; <span class="hljs-keyword">set</span> scheduled send date sometime <span class="hljs-keyword">in</span> future   <span class="hljs-comment">// (time of retry)</span>
            update email <span class="hljs-keyword">in</span> Outbox table       
        }
    }
}
release distributed lock
</code></pre>
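<p>Assuming a ShedLock setup in Spring Boot, this pseudocode could be realized roughly as follows (the repository methods, mapper and constants are hypothetical placeholders for your own implementation):</p>
<pre><code class="lang-kotlin">@Scheduled(fixedDelay = 60_000) // run one minute after the previous execution finished
@SchedulerLock(name = "outboxEmailRelay") // distributed lock: only one instance runs at a time
fun relayOutbox() {
    while (true) {
        val batch = outboxRepository.findDueBatch(MAX_TRIES, BATCH_SIZE)
        if (batch.isEmpty()) break // no un-fetched entries left
        for (entry in batch) {
            try {
                mailSender.send(mapper.outboxToMimeMessage(entry)) // send - successful
                outboxRepository.delete(entry)
            } catch (ex: MailException) {
                // send - failed: schedule a retry
                entry.tries++
                entry.scheduledDate = Instant.now().plus(Duration.ofMinutes(10))
                outboxRepository.save(entry)
            }
        }
    }
}
</code></pre>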
<h3 id="heading-email-sender">Email Sender</h3>
<p>There are multiple approaches to handling this step, and it is very project-specific. You might use an SMTP server or something like AWS SES. At this point, nothing can go wrong as long as you are careful to catch all exceptions in the scheduled task so that failed sends are handled properly.</p>
<p>We created a custom AWS SES Sender class that implements JavaMailSender. This way we can easily switch between the default JavaMailSenderImpl SMTP implementation and AWS SES implementation just by changing a configuration. This is Spring Boot specific, but it should be pretty similar in other languages and frameworks. Our implementation looks something like this:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@ConditionalOnProperty(
    value = [<span class="hljs-meta-string">"email.sender"</span>],
    havingValue = <span class="hljs-meta-string">"AWS"</span>,
    matchIfMissing = false
)</span>
<span class="hljs-meta">@Component</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AWSSESJavaMailSender</span></span>(
    <span class="hljs-meta">@Autowired</span> <span class="hljs-keyword">val</span> sesClient: AmazonSimpleEmailService
): JavaMailSender {

    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">send</span><span class="hljs-params">(mimeMessage: <span class="hljs-type">MimeMessage</span>)</span></span> {
        <span class="hljs-keyword">try</span> {
            <span class="hljs-keyword">val</span> outputStream = ByteArrayOutputStream()
            mimeMessage.writeTo(outputStream)

            <span class="hljs-keyword">val</span> buf = ByteBuffer.wrap(outputStream.toByteArray())

            <span class="hljs-keyword">val</span> rawMessage = RawMessage(buf)

            <span class="hljs-keyword">val</span> rawEmailRequest = SendRawEmailRequest(rawMessage)

            sesClient.sendRawEmail(rawEmailRequest)
        }
        <span class="hljs-keyword">catch</span> (ex: Exception) {
            <span class="hljs-keyword">throw</span> MailSendException(<span class="hljs-string">"Could not send email through AWS SES"</span>, ex)
        }
    }

    <span class="hljs-keyword">override</span> <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">createMimeMessage</span><span class="hljs-params">()</span></span>: MimeMessage {
        <span class="hljs-keyword">return</span> MimeMessage(Session.getDefaultInstance(Properties()))
    }

    <span class="hljs-comment">// other functions' from  JavaMailSender implementation...</span>
}
</code></pre>
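<p>Assuming the property name from the <code>@ConditionalOnProperty</code> annotation above, switching to the AWS SES implementation then only requires a configuration entry, e.g. in <code>application.yml</code>:</p>
<pre><code class="lang-yaml">email:
  sender: AWS
</code></pre>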
<h3 id="heading-usage">Usage</h3>
<p>The goal of this project was to implement the pattern in an easily reusable way for our Spring Boot projects, keeping it easy to use and the code clean.</p>
<p>The <code>sendEmail()</code> function of the TransactionalEmailService can be called in any other transactional function:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Service</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ExampleEmailServiceImpl</span></span>(
    <span class="hljs-keyword">val</span> userRepo: UserRepo,
    <span class="hljs-keyword">val</span> emailService: TransactionalEmailService
){
    <span class="hljs-meta">@Transactional</span>
    <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">exampleUsage</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">this</span>.userRepo.save(User(...))
        <span class="hljs-keyword">val</span> email = Email(...)
        <span class="hljs-keyword">this</span>.emailService.sendEmail(email)
  }
}
</code></pre>
<p>Note that both the <code>exampleUsage()</code> function and our <code>sendEmail()</code> function are annotated with <code>@Transactional</code>. Spring is smart enough to handle both database changes in a single transaction, fulfilling the ACID requirements.</p>
<p>For a more decoupled approach we can make use of event listeners:</p>
<pre><code class="lang-kotlin"><span class="hljs-meta">@Service</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">ExampleUserService</span></span>(
    <span class="hljs-keyword">val</span> userRepo: UserRepo,
    <span class="hljs-keyword">val</span> eventPublisher: ApplicationEventPublisher
){
    <span class="hljs-meta">@Transactional</span>
    <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">exampleUsage</span><span class="hljs-params">()</span></span> {
        <span class="hljs-keyword">val</span> user = userRepo.save(User(...))
        eventPublisher.publishEvent(UserCreatedEvent(user))
    }
}

<span class="hljs-meta">@Service</span>
<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">UserEventListener</span></span>(
    <span class="hljs-keyword">val</span> emailService: TransactionalEmailService
) {
    <span class="hljs-meta">@Transactional</span>
    <span class="hljs-meta">@EventListener</span>
    <span class="hljs-function"><span class="hljs-keyword">fun</span> <span class="hljs-title">handleUserCreatedEvent</span><span class="hljs-params">(event: <span class="hljs-type">UserCreatedEvent</span>)</span></span> {
        <span class="hljs-keyword">val</span> email = Email(...)
        emailService.sendEmail(email)
    }
}
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>If you're now thinking about a past project where you didn't implement message sending in a truly reliable way, you're not alone. Unfortunately, this is a very common mistake, but we hope that through this article we were able to paint a clearer picture of this design pattern, why it is useful and how to implement it.</p>
]]></content:encoded></item><item><title><![CDATA[Microsoft Power Platform]]></title><description><![CDATA[(Written in collaboration with Gerasimos Fousekis & Stefan Starke)
While our roots are firmly planted in traditional software development methodologies, our commitment to innovation and client satisfaction drives us to continuously explore new horizo...]]></description><link>https://engineering.cloudflight.io/microsoft-power-platform</link><guid isPermaLink="true">https://engineering.cloudflight.io/microsoft-power-platform</guid><category><![CDATA[Strapi]]></category><category><![CDATA[microsoft power platform]]></category><category><![CDATA[PowerPages]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[Web Development]]></category><dc:creator><![CDATA[Alex Ciosa]]></dc:creator><pubDate>Mon, 25 Sep 2023 13:54:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/hpjSkU2UYSU/upload/ef7096bc4a3c59e11705d40e68a3e842.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>(Written in collaboration with <strong>Gerasimos Fousekis</strong> &amp; <strong>Stefan Starke</strong>)</p>
<p>While our roots are firmly planted in traditional software development methodologies, our commitment to innovation and client satisfaction drives us to continuously explore new horizons. This leads us to consider various emerging platforms, like <strong>Microsoft Power Platform,</strong> as potential additions to our already extensive toolkit.<br />While this might seem a bit unconventional for a software development company, our motivation is rooted in a deeper understanding of the ever-evolving business landscape and the diverse needs of our clients.</p>
<p><em>"What was your goal?",</em> you ask.</p>
<p>In brief:</p>
<ul>
<li><p><strong>Fast results</strong></p>
<p>  How much faster can we implement solutions when working in an environment where topics such as authentication, infrastructure and deployment are handled more or less out of the box?</p>
</li>
<li><p><strong>Customer empowerment</strong></p>
<p>  Can the use of something like <strong>Power Platform</strong> help clients maintain their applications, run updates, and introduce new features with little to no prior coding knowledge?</p>
</li>
<li><p><strong>Project diversity</strong></p>
<p>  For which projects do we see this approach as beneficial, or even optimal?<br />  Are there any blockers that could prevent us from following this approach?</p>
</li>
</ul>
<h1 id="heading-overview"><strong>Overview</strong></h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1693492084740/479a9ba7-3502-4a56-a9c5-ed377c3cb8a3.png" alt="Power Platform Architectural Overview" class="image--center mx-auto" /></p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/power-pages/admin/architecture">Power Platform Architecture</a></p>
<p><a target="_blank" href="https://powerplatform.microsoft.com/en-us/"><strong>Microsoft Power Platform</strong></a> is a suite of integrated tools and services designed to assist individuals and organizations in creating custom business applications, automating workflows, analyzing data and generating reports.<br />It's comprised of four core components:</p>
<ol>
<li><p><strong>Power Apps &amp; Power Pages</strong><br /> They allow users to create custom applications without extensive coding knowledge. They offer canvas apps and model-driven apps: canvas apps enable building applications with a visual interface, while model-driven apps are more data-centric and built on the Common Data Service. Both interact with (business) data stored in the Dataverse; the main difference is that Power Pages targets external users, backed by the many authentication options (e.g. <em>Azure AD, LinkedIn, Facebook</em>) that are available out of the box.</p>
</li>
<li><p><strong>Power Automate</strong><br /> Formerly known as <strong>Microsoft Flow</strong>, it enables the automation of repetitive tasks and workflows across various applications and services. It connects to hundreds of apps and services, allowing users to create automated processes without much complexity (e.g. automated email sending).</p>
</li>
<li><p><strong>Power BI</strong><br /> A powerful business analytics tool that allows users to visualize data, generate reports and build dynamic dashboards. It's particularly useful for data analysis and decision-making by turning raw data into actionable insights.</p>
</li>
<li><p><strong>Power Virtual Agents</strong><br /> Allows users to create chatbots that can be used to engage with customers, provide support, answer queries and automate various interactions.</p>
</li>
</ol>
<p><a target="_blank" href="https://engineering.cloudflight.io/navigating-efficient-web-application-development-cloudflights-architectural-insights">Aiming to fulfill the requirements mentioned previously</a>, we ultimately focused on the use of <strong>Power Pages</strong> and <strong>Power Automate</strong>.</p>
<h1 id="heading-pros-andamp-cons"><strong>Pros &amp; Cons</strong></h1>
<table><tbody><tr><td><p>PROS</p></td><td><p>CONS</p></td></tr><tr><td><p>Smaller, data-driven applications can be <em>clicked together</em> really quickly</p></td><td><p>Generally useful features (e.g. manual code changes, template presets) are better avoided altogether due to limited usability</p></td></tr><tr><td><p>Out-of-the-box ecosystem (e.g. infrastructure, scaling, IP whitelisting etc.)</p></td><td><p>Frequently unreliable and lengthy loading times during development, especially when reloading the Designer</p></td></tr><tr><td><p>Built-in application lifecycle management (e.g. environments with pipelines)</p></td><td><p>Cost can become very high for projects with a larger audience</p></td></tr><tr><td><p>Built-in i18n capabilities</p></td><td><p>Changes performed directly on the tables inside the Dataverse (e.g. scheduled job manipulating some rows) can take up to 15 minutes to apply</p></td></tr><tr><td><p></p></td><td><p>Limited monitoring capabilities and having no way to see where anonymous users come from (i.e. IP addresses)</p></td></tr></tbody></table>

<p>Next, we'll share some useful insights gained along the way.</p>
<h2 id="heading-pricing-andamp-costs"><strong>Pricing &amp; Costs</strong></h2>
<p>First, a word of advice - make sure to understand the licensing and billing of the entire <strong>Power Platform</strong> ecosystem to avoid <em>unpleasant and costly surprises</em>. In our setup, the main cost driver (besides Dataverse storage) was the amount of monthly active users, both anonymous and authenticated. Prices vary depending on the chosen subscription plan, meaning either a package of a set number of users or the option of "<em>pay as you go</em>" (PAYG). Technically, we used a billing policy linked directly to an active Azure subscription.<br />That being said: <strong>be aware that current costs are not immediately visible and it can take up to 24 hours until they become available within Azure cost management.</strong></p>
<blockquote>
<p>"Why is there so much emphasis on pricing?", you might be asking.</p>
<p>Imagine a scenario where someone sets up a basic cron job that operates on the Dataverse every minute or so. Mind you, a premium flow (whether it runs in the cloud or attended) costs $0.60 per run.</p>
<p>We'll let you do the math and figure out how high the costs get when the cost management alerts start triggering, which - as explained above - can take up to 24 hours to be updated.</p>
<p>Spoiler alert: it's $864 per day (1,440 runs &times; $0.60).</p>
<p>Use a Power Automate license in similar scenarios.</p>
</blockquote>
<p>Speaking of <em>monthly active users</em>...</p>
<p>They are reported in a daily summary, downloadable within the <strong>Power Admin Portal</strong>.<br />Since the uniqueness of an anonymous user is tracked via a browser cookie, we see a potential risk of unpredictable costs. The claim is that malicious attacks, bots and crawlers are excluded from the count, but we could not fully confirm this statement.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1693494050500/bbcccad0-1ca6-4118-9202-55f10ff25509.png" alt="User Access Report" /></p>
<p><strong><em>TL;DR</em></strong></p>
<ul>
<li><p>Understanding the application's main use case, particularly its target users, is highly advised to avoid <strong>precarious financial situations</strong></p>
</li>
<li><p>Make sure to choose the optimal licensing model from the very beginning</p>
</li>
<li><p>As in all cloud projects: <em>enable cost alerts from Day One!</em></p>
</li>
</ul>
<h2 id="heading-development"><strong>Development</strong></h2>
<p>As with every new technology, there will be a learning curve. It's not particularly steep when it comes to the <strong>Power Platform</strong> itself, but one has to be wary that the workflow is not comparable to a 'traditional' software development process. Coupled with the documentation provided by Microsoft, which sadly tends to be <em>a bit outdated</em> at times, this can be a cause for sudden spikes in the overall learning curve.</p>
<p>Coming into this methodology, it took us some time to get used to the way collaborative work can be set up on a shared Power Pages environment. Because there's no implementation of version control, merge requests or individual commits, it took extra coordination to avoid interfering with one another. This is easily manageable for smaller-scale projects, though not recommended for larger ones. Consequently, we recommend a team size of no more than three people for an optimal workflow.</p>
<p>One indisputable upside is that, coming from a background of data-driven applications with CRUDL (Create, Read, Update, Delete, List), overall development time is <em>crazy fast</em>. It essentially boils down to "<em>creating tables, views and forms (rinse and repeat)</em>". Development is further sped up by the use of provided components for features such as:</p>
<ul>
<li><p>Login</p>
</li>
<li><p>User Registration</p>
</li>
<li><p>Authentication Providers</p>
</li>
</ul>
<p><strong><em>TL;DR</em></strong></p>
<ul>
<li><p>Ideal for small, low-complexity projects</p>
</li>
<li><p>Works best for small teams (2-3 people max)</p>
</li>
<li><p>Close to no collaborative features that a developer would normally be accustomed to</p>
</li>
<li><p>Always use (and re-use) components whenever possible</p>
</li>
</ul>
<h2 id="heading-testing"><strong>Testing</strong></h2>
<p>We know how much developers love testing code (<strong>*sarcasm*</strong>). Good news: manual testing is pretty much your main 'go-to' approach in the <strong>Power Platform</strong>. For those interested in automated testing, we can use our default framework (<strong>Cypress</strong>), triggered by an external build.</p>
<h2 id="heading-deployment"><strong>Deployment</strong></h2>
<p>When it comes to deployments, we set up a pipeline that can be executed within the <strong>Power Platform</strong>, following the typical pattern of <em>cross-environment deployment:</em> <strong><em>Dev → Test → Prod</em></strong>.<br />We have found that this is sufficient for the aforementioned smaller projects.<br />Of course, it is not as powerful as GitLab or TeamCity, but that much power is not needed here.</p>
<h2 id="heading-scale-expand-andamp-maintainability"><strong>Scale, Expand &amp; Maintain...ability</strong></h2>
<p>In terms of scalability, besides perhaps investing more money into a <strong>broader user base</strong> or <strong>file storage capabilities</strong>, there is not much to add. If the final goal is to create smaller, lower-complexity applications, we think the expandability options provided should suffice (an API for CRUDL can be used, as well as Azure Functions with an HTTP trigger for more complex logic).</p>
<p>Similarly, with the premise of a smaller application, maintainability options offered should again suffice for most projects.</p>
<h2 id="heading-i18n-internationalization"><strong>i18n ("Internationalization")</strong></h2>
<p>i18n support is provided on the platform. There's a reasonable list of available languages that can be activated for every given page; the pattern used is a '<em>key-value</em>'-type approach: <strong><em>language → translation</em></strong>.<br />Except for a few components that are translated automatically (e.g. <em>login</em>), translations must be provided contextually on each page, for each desired language. Alternatively, there's the option of providing a resource file for said translations (that being said, we did not delve into this approach enough to confirm its utility).</p>
<h1 id="heading-lessons-learned"><strong>Lessons Learned</strong></h1>
<p>For some, Microsoft's <strong>Power Platform</strong> might seem to only thrive in specific scenarios (think <em>one-trick pony</em>). We believe that it has earned its place in the ecosystem of modern technology stacks, and when it comes to:</p>
<ul>
<li><p>Rapid Fire MVPs</p>
</li>
<li><p>Purely data-driven projects</p>
</li>
<li><p>"<em>Set it and forget it</em>"-type projects</p>
</li>
<li><p>Pre-defined user bases</p>
</li>
<li><p>Minimal UI</p>
</li>
<li><p>Pre-existing Microsoft ecosystem integration</p>
</li>
<li><p>"<em>Developing without developers</em>" (Welcome to 2023)</p>
</li>
</ul>
<p>there is close to no competition. We say "<em>give it a try</em>".<br />It might end up being the solution you were looking for.</p>
]]></content:encoded></item><item><title><![CDATA[Rapid Development with Strapi and Vue.js]]></title><description><![CDATA[Introduction
As programmers, we often revel in the most technical-complete solutions and prefer writing our services from the ground up. That is both a blessing and a curse: the control, efficiency, and power of writing everything yourself comes at t...]]></description><link>https://engineering.cloudflight.io/rapid-development-with-strapi-and-vuejs</link><guid isPermaLink="true">https://engineering.cloudflight.io/rapid-development-with-strapi-and-vuejs</guid><category><![CDATA[Strapi]]></category><category><![CDATA[Vue.js]]></category><category><![CDATA[Web Development]]></category><category><![CDATA[mvp]]></category><category><![CDATA[mvp development]]></category><dc:creator><![CDATA[Andrei Cotor]]></dc:creator><pubDate>Fri, 15 Sep 2023 11:49:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/dC6Pb2JdAqs/upload/e18e3d1f8cd1bfeb0660b5e152c628d6.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">Introduction</h2>
<p>As programmers, we often revel in the most technically complete solutions and prefer writing our services from the ground up. That is both a blessing and a curse: the control, efficiency, and power of writing everything yourself come at the cost of development speed. As part of our Cloudflight technical lab, we decided to explore different solutions that help deliver software faster - rapid development. This is specifically useful when our teams are asked to deliver a functional MVP in a very short time.</p>
<p><em>If you want to learn more about the challenge please read the</em> <a target="_blank" href="https://engineering.cloudflight.io/navigating-efficient-web-application-development-cloudflights-architectural-insights?source=more_series_bottom_blogs"><em>introduction article</em></a> <em>first.</em></p>
<h2 id="heading-tech-stack">Tech stack</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1693492145400/ac4c13f1-b448-40d5-a856-05e3f10e5f74.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-strapi">Strapi</h3>
<p>Strapi is the biggest selling point of our solution. It is a powerful, open-source Node.js and TypeScript Content Management System (CMS). At its core, it is a shortcut for creating REST or GraphQL APIs, replacing the need to write all the backend code yourself. Being a CMS means that Strapi also provides an admin dashboard, a web application where you can view your database data, create new API endpoints, and build your server using an intuitive GUI.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1693492187500/3a20b417-8316-4a54-8b62-5371905a0e11.png" alt class="image--center mx-auto" /></p>
<blockquote>
<p>Strapi is the leading open-source headless CMS. It’s 100% Javascript, fully customizable, and developer-first. <a target="_blank" href="https://strapi.io/">Strapi</a></p>
</blockquote>
<p>Diving deeper into the inner workings of Strapi, we found that it uses Koa as a web framework and Bookshelf.js, powered by Knex, as an Object Relational Mapping (ORM) layer. On top of these two Node.js libraries sits a "Strapi framework", which is used by the autogenerated code or can be called programmatically when you don't want to, or can't, use the dashboard generator for a specific case. This framework provides generic Controllers and Services to handle CRUD requests, plus a generic Entity Service for database operations such as find, create, update, and delete. This makes Strapi easy to customize and straightforward for the dashboard to generate code for.</p>
<p>A nice-to-have feature of the generated code is the ability to express complex filtering and join operations directly in the requests. The Entity Service converts the parameter format into a Knex query, which then translates into SQL. Out-of-the-box pagination is another welcome bonus.</p>
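<p>To give a flavor of what these request parameters look like, here is a minimal sketch that builds a Strapi-style filtered, paginated query string by hand. Strapi's documentation recommends the <code>qs</code> library for this; the entity and field names below are assumptions for illustration:</p>

```typescript
// Build a flat Strapi-style query string by hand.
// Strapi understands bracketed keys such as filters[status][$eq].
function strapiQuery(params: Record<string, string | number>): string {
  return Object.entries(params)
    .map(([key, value]) => encodeURIComponent(key) + '=' + encodeURIComponent(String(value)))
    .join('&');
}

// e.g. GET /api/reservations?<query> (hypothetical endpoint and fields)
const query = strapiQuery({
  'filters[status][$eq]': 'confirmed',
  'pagination[page]': 1,
  'pagination[pageSize]': 25,
});
```

In a real project the `qs` library handles nested objects and arrays for you; the hand-rolled version above only covers the flat case.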
<p>Customizing Strapi is generally straightforward. Most of the time, all developers need to do is add extra functions inside an object passed as a parameter to the Controller or Service constructors. These functions can implement custom validations or business logic, and they have access to the original Controllers generated for that entity as well as all Services and the Entity Service, so developers don't have to reinvent the wheel. For more complex use cases, custom endpoints and database queries can be written.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> {factories} <span class="hljs-keyword">from</span> <span class="hljs-string">'@strapi/strapi'</span>;
<span class="hljs-keyword">export</span> <span class="hljs-keyword">default</span> factories.createCoreController(<span class="hljs-string">'api::reservation.reservation'</span>, 
<span class="hljs-function">(<span class="hljs-params">{strapi}</span>) =&gt;</span> ({
    <span class="hljs-comment">// ...</span>
    <span class="hljs-keyword">async</span> find(ctx) {
        <span class="hljs-keyword">const</span> {data, meta} = <span class="hljs-keyword">await</span> <span class="hljs-built_in">super</span>.find(ctx);
        <span class="hljs-comment">// validation for admin users</span>
        <span class="hljs-keyword">if</span> (ctx.state.user.role.name === <span class="hljs-string">'Admin'</span>) {
            <span class="hljs-keyword">return</span> {data, meta};
        }

        <span class="hljs-comment">// custom validation for regular users</span>
        <span class="hljs-keyword">const</span> userId: <span class="hljs-built_in">number</span> = ctx.state.user.id;
        <span class="hljs-keyword">if</span> (data.some(<span class="hljs-function">(<span class="hljs-params">el</span>) =&gt;</span> el.attributes.users_permissions_user.data.id !== 
userId)) {
            <span class="hljs-keyword">return</span> ctx.forbidden(<span class="hljs-string">'Data not created by this user'</span>, {});
        }
        <span class="hljs-keyword">return</span> {data, meta};
    },
    <span class="hljs-comment">// ...</span>
}));
</code></pre>
<blockquote>
<p>This code creates a controller for the reservation API endpoint with special handling for admin and regular users. Admins can access reservation data without restrictions, while regular users can only access data they created themselves; if they try to access data created by other users, they receive a "forbidden" response.</p>
</blockquote>
<p>None of us had any experience with Strapi or anything remotely similar before. As such, we used pair programming in the first week of working together. It proved to be an excellent use of our time, as we all collaborated on understanding how to work with Strapi. Once we were used to the overall architecture of our app, we only pair programmed on the important parts. In the end, we believe Strapi is very easy and intuitive to use: a couple of days of research were enough to exploit it to its full potential as a self-sufficient, complex backend service.</p>
<h4 id="heading-plugins">Plugins</h4>
<p>The true strength of Strapi lies in its plugins, which can be added free of charge from Strapi's market (built into the admin dashboard). They add additional functionalities to Strapi, extending its capabilities with no additional code. Some of the most useful plugins we have discovered:</p>
<ul>
<li><p><a target="_blank" href="https://market.strapi.io/plugins/@strapi-plugin-documentation"><em>Documentation</em></a>: generates an OpenAPI document for all of Strapi's endpoints that can either be opened on the browser or be used to generate code down the line. Add this plugin to list all available endpoints and see how to properly make requests to Strapi.</p>
</li>
<li><p><a target="_blank" href="https://market.strapi.io/plugins/strapi-plugin-config-sync"><em>Config Sync</em></a>: allows programmers to share Strapi settings between environments, like access rights to different operations based on roles, either from the CLI or the GUI. This plugin is a must if multiple people are working on the project, or if you want to deploy Strapi.</p>
</li>
</ul>
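<p>Adding a plugin is typically an <code>npm install</code> plus an entry in Strapi's plugin configuration. As a hedged sketch (assuming Strapi v4's <code>config/plugins</code> file format; consult each plugin's README for its actual options), enabling the two plugins above could look like this:</p>

```typescript
// config/plugins.ts - illustrative sketch, not verified against every plugin's options
export default () => ({
  documentation: {
    enabled: true,
  },
  'config-sync': {
    enabled: true,
  },
});
```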
<p>Other plugins that were preinstalled in our initial Strapi project:</p>
<ul>
<li><p><em>Content Type Builder</em>: Add new data tables in your database from the Strapi GUI. This is what allows you to build new entities and CRUDs. In Strapi, entities can be <em>Collection Types</em> or <em>Single Types</em>. The difference between them is that Single Types only allow for a single value, acting like singletons, while Collection Types are regular tables that support multiple rows.</p>
</li>
<li><p><em>Content Manager</em>: Quick, code-less way to see, edit, and delete the data in your database. This may also allow the owner of the application to manage content without redeployment.</p>
</li>
<li><p><em>Email</em>: Configures the application to send emails. This plugin helps you format your emails and send them further to 3rd party providers.</p>
</li>
<li><p><em>Media Library</em>: Load images and use them in your API - you can easily store multiple image types and use them on your website.</p>
</li>
<li><p><em>Roles &amp; Permissions</em>: JWT-based API security and user management system. This adds the User data type in your application, which you can use for authentication purposes. It supports multiple providers and multiple security roles.</p>
</li>
<li><p><em>Internationalization</em>: Adds the ability to create new locales and set up i18n for your API. Having this plugin allows the owner of the app to localize content without redeployment. The API will return the content for the right locale based on your API request parameters.</p>
</li>
</ul>
<h4 id="heading-performance">Performance</h4>
<p>We load-tested Strapi's create-reservation endpoint using Apache JMeter to obtain reliable, reproducible results. We ramped the number of simulated users up over time and have not experienced any performance issues so far.</p>
<h3 id="heading-mail-sender">Mail Sender</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1694680150722/7978d4f1-116d-43bb-b87c-3faba2e906cd.png" alt class="image--center mx-auto" /></p>
<p>Strapi has a built-in email plugin, but it doesn't actually send the emails: it only formats them and forwards them to a provider. Some providers (node modules) are built in, but they are either 3rd-party platforms or bare-bones (such as <a target="_blank" href="https://nodemailer.com/">nodemailer</a>), so if you want to handle failures yourself and retry later, or you don't want an external service to process your emails, you need to create your own provider. Providers are node modules that extend the functionality of a Strapi plugin.</p>
<p>Since we decided to implement email sending using <a target="_blank" href="https://www.rabbitmq.com/"><strong>RabbitMQ</strong></a> and a microservice, our provider pushes the formatted email that it receives from the Strapi email plugin to the specific RabbitMQ exchange. Then, this exchange sends it to an email queue, where it will be read by our email processing microservice.</p>
<p>To send emails reliably, we set up <strong>RabbitMQ</strong> with an email queue, a retry queue, and an error queue. The email queue holds the emails waiting to be processed by our microservice. If an error occurs while sending an email, the sender posts the message to the retry queue, provided the number of attempts for that message is below a set limit; otherwise, it goes to the error queue. Messages remain in the retry queue for a specific amount of time (they have a time to live), after which they are discarded from it and sent back to the email queue.</p>
<p>This approach ensures that our service can send the emails in most cases; when it cannot, the failed messages remain in the error queue for inspection.</p>
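<p>The retry topology described above can be sketched as follows. This is an illustrative sketch, not our production code: the queue names, TTL, and attempt limit are assumptions, and the queue arguments shown are what <code>amqplib</code>'s <code>channel.assertQueue(name, { arguments })</code> would receive when declaring the retry queue:</p>

```typescript
// Queue names and limits are assumptions for illustration.
const EMAIL_QUEUE = 'email';
const RETRY_QUEUE = 'email.retry';
const ERROR_QUEUE = 'email.error';
const MAX_ATTEMPTS = 5;
const RETRY_TTL_MS = 60_000;

// Arguments for declaring the retry queue: messages sit here for
// RETRY_TTL_MS, then the broker dead-letters them back to the email queue.
const retryQueueArgs = {
  'x-message-ttl': RETRY_TTL_MS,
  'x-dead-letter-exchange': '', // default exchange routes by queue name
  'x-dead-letter-routing-key': EMAIL_QUEUE,
};

// Pure routing decision applied by the consumer after a failed send:
// retry while under the attempt limit, otherwise park in the error queue.
function routeAfterFailure(attempts: number): string {
  return attempts < MAX_ATTEMPTS ? RETRY_QUEUE : ERROR_QUEUE;
}
```

Letting the broker handle the delay via per-queue TTL plus dead-lettering keeps the consumer stateless: it only needs to count attempts (for example via a message header) and publish to the right queue.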
<h3 id="heading-vuejs">VueJS</h3>
<p>All webpages are implemented in VueJS using the Composition API. Our previous experience with Angular and React made VueJS easy to pick up, and the official documentation and tutorial were very helpful. To speed up development, we implemented generic components for all views of our app: a generic form using Vuelidate for validation, a generic table with edit and delete actions, and a generic object-details card. Using <a target="_blank" href="https://pinia.vuejs.org/">Pinia</a>, it was also very easy to implement reactive stores.</p>
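<p>To give a flavor of the "generic" approach, here is a framework-free sketch of the idea behind a generic form validator. The real components used Vuelidate; every name below is a hypothetical stand-in:</p>

```typescript
// Each field declares a list of rules; validate() collects the failing
// messages per field so a generic form component can render them.
type Rule = { message: string; check: (value: unknown) => boolean };

function validate(
  values: Record<string, unknown>,
  rules: Record<string, Rule[]>,
): Record<string, string[]> {
  const errors: Record<string, string[]> = {};
  for (const [field, fieldRules] of Object.entries(rules)) {
    const failed = fieldRules
      .filter((rule) => !rule.check(values[field]))
      .map((rule) => rule.message);
    if (failed.length > 0) {
      errors[field] = failed;
    }
  }
  return errors;
}

// Hypothetical usage: a required-field rule applied to a "name" field.
const required: Rule = {
  message: 'This field is required',
  check: (value) => typeof value === 'string' && value.length > 0,
};

const errors = validate({ name: '' }, { name: [required] });
```

A wrapper component can then render the returned messages next to each input, which is essentially what a Vuelidate-backed generic form does for you.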
<p>We decided to use the free version of <a target="_blank" href="https://coreui.io/vue/docs/forms/input.html">CoreUI</a> as the main components library. The date picker component of CoreUI is locked behind a paywall, and we struggled at first to find a reliable date picker component for our reservations view. We settled on using the date picker of <a target="_blank" href="https://primevue.org/">PrimeVue</a>.</p>
<h3 id="heading-openapi-generator">OpenAPI generator</h3>
<p>Most web developers have seen the OpenAPI specification before, as it is what Swagger tools use. Naturally, we decided to create this specification for Strapi. Upon adding the <em>Documentation</em> plugin to our project, we gained access to a .json file containing the full list of our endpoints in OpenAPI specification. This .json file is generated every time we start the Strapi development server. As such, we can open this file using Swagger to view and better understand how to use the APIs provided by Strapi. This plugin allows for customization of the resulting specification file, like the ability to include or exclude properties or endpoints.</p>
<p>Ultimately, we used the OpenAPI specification of our API to automatically generate REST services for the frontend that communicate with Strapi. <a target="_blank" href="https://github.com/OpenAPITools/openapi-generator-cli">The tool we used</a> generates TypeScript files, which we add to our VueJS project. This was the biggest timesaver in our frontend development: writing frontend REST services by hand is a redundant task that should be automated from your API's specification.</p>
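<p>An invocation of the generator could look like the following. The input path, output directory, and choice of the <code>typescript-axios</code> generator are assumptions for illustration, not our exact setup:</p>

```shell
# Generate a TypeScript client from the OpenAPI document exported by
# Strapi's Documentation plugin (paths are placeholders).
npx @openapitools/openapi-generator-cli generate \
  -i full_documentation.json \
  -g typescript-axios \
  -o src/generated-api
```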
<h3 id="heading-postgres">Postgres</h3>
<p>PostgreSQL is the official database system recommendation in the setup guide, but multiple database engines are supported. We have used Postgres due to personal preference and convenience.</p>
<h3 id="heading-spring-boot-where-is-it">Spring Boot - where is it?</h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1693491717941/6b494b17-3ec2-4518-9ebf-5249dccf1797.jpeg" alt class="image--center mx-auto" /></p>
<p>The initial tech stack included a self-managed Spring Boot application acting as an intermediary between the VueJS frontend and the Strapi backend. We removed this layer of complexity in the final solution because Strapi alone was sufficient to secure the endpoints and implement our app's business logic. Adding Spring would have been redundant: extra complexity, another potential bottleneck, and more development time. Of course, larger applications may have requirements that Strapi is not suited for, in which case a Spring backend might still be needed.</p>
<h2 id="heading-proscons-discussions">Pros/cons discussions</h2>
<p>This is an overview of using Strapi for our backend services:</p>
<h3 id="heading-pros">Pros</h3>
<ul>
<li><p>Quick implementation of CRUDs for models with simple logic</p>
</li>
<li><p>Easy to use: using the GUI for boilerplate code is very convenient</p>
</li>
<li><p>Free to use for any project, if managed on your own premises</p>
</li>
<li><p>Can be extended with custom implementations (you're not limited by the auto-generated code)</p>
</li>
<li><p>Comes with multiple authentication providers out of the box</p>
</li>
<li><p>Offers a lot of plugins, both official and community-made</p>
</li>
<li><p>Multiple SQL database types are supported</p>
</li>
<li><p>Applications may be both vertically and horizontally scaled</p>
</li>
</ul>
<h3 id="heading-cons">Cons</h3>
<ul>
<li><p>Less time-efficient than a self-managed service</p>
</li>
<li><p>Lack of documentation - debugging can be slow</p>
</li>
<li><p>Unexpected crashes for the CMS /admin dashboard while editing the schemas of the collections (not a problem for production - modifying the schemas is disabled in production)</p>
</li>
<li><p>Incomplete migrations support - migrations run before schemas are updated since schemas are managed by Strapi and migrations by Knex</p>
</li>
<li><p>Virtually no built-in validation support - custom endpoints are needed</p>
</li>
<li><p>Does not support MongoDB natively</p>
</li>
</ul>
<h2 id="heading-lessons-learned">Lessons Learned</h2>
<p>There is no tech stack that is right for every project, but there is certainly a best tool for each job. In our case, considering the functional requirements and the deadline, Strapi did a great job. A monolithic architecture combined with built-in Strapi features was the fastest way for us to deliver the MVP in the given period. We found Strapi to be a powerful and versatile way of building backend services, and using it was a satisfying experience. It is also both database and frontend agnostic. We will definitely consider it for future projects, either as a stand-alone, self-sufficient server or as a microservice, since it is more than suitable for delivering MVPs or for building a reliable microservice within larger projects. It reduces development time and is quite easy to pick up. For the best experience, however, we recommend installing your Strapi project's plugins immediately upon setup, to prevent merge conflicts and other issues when collaborating with team members.</p>
<p>Please note that there are also small caveats, such as the one we discovered when introducing complex logic and validations. For instance, we had two entities with a one-to-many relationship between them, but found that Strapi's default API implementation does not support cascade delete (see <a target="_blank" href="https://feedback.strapi.io/feature-requests/p/allow-customization-of-foreign-keys-for-options-like-cascade-delete">feature request</a>). As such, we had to add a database-level trigger to delete the dependent rows of the one-to-many relationship.</p>
<p>Another lesson we learned is that Strapi can scale. If you want to scale your application horizontally, you may use a load balancer like Nginx. For additional optimization, you will want to implement caching between sessions. This can be achieved by using the <a target="_blank" href="https://market.strapi.io/plugins/strapi-plugin-rest-cache">Rest Cache</a> Strapi plugin with a self-managed <a target="_blank" href="https://redis.io/">Redis</a> server.</p>
<p>We also find VueJS to be a technology worth using within the frontend framework ecosystem. Previous experience with other component-based frameworks helps, but the official documentation and forums are a great way to start. Thanks to its reactive, event-driven model, it is intuitive to learn and use.</p>
<p><em>Let's roll the credits (in alphabetical order) and wrap up this chapter in our software dev diary: Andrei Cotor, Daniel Todașcă, and Gergely-Péter Mátyás contributed to this exploration. Remember, the code may compile, but the journey is what truly matters. Stay curious, stay coding, and let's keep pushing the boundaries of what's possible, one line of code at a time. Until next time, happy coding, my fellow developers!</em></p>
]]></content:encoded></item></channel></rss>