Database Migrations using Flyway in dynamic multi-tenant Spring Boot applications

Introduction and use cases

Multi-tenancy is an architecture in which a single instance of a software application serves multiple customers. Each customer is called a tenant. Software as a Service (SaaS) platforms like Shopify provide a way for anyone to build up their own dropshipping business in a matter of hours. No code, no hassle! What makes these SaaS platforms possible are multi-tenant architectures that provide data consistency across all tenants.

There are 3 ways in which one could achieve this, all described in this article.

Database per Tenant: Each Tenant has its own database and is isolated from other tenants.
Shared Database, Shared Schema: All Tenants share a database and tables. Every table has a Column with the Tenant Identifier, that shows the owner of the row.
Shared Database, Separate Schema: All Tenants share a database, but have their own database schemas and tables.

multi-tenant architecture models

Interestingly enough, Florian Weingarten, a former engineering lead and manager at Shopify describes their architecture as being of the second type in this talk at SreCon16. Needless to say, the shared database approach has problems of its own but surely, data consistency is not one of them. One can easily provide the same database structure to all tenants as there is just one schema.

The tricky part of having a shared database is dealing with large amounts of data that tend to accumulate in its tables. Each table carries with it a tenantId column and all queries have to basically have appended a where clause that discriminates based on this id. Very large tables can pose performance issues, thus, sharing data either on different databases altogether or on different schemas within the same database seems to be a better choice. With the added difficulty of setting up a database for each tenant, the multiple schemas approach seems to offer the best trade-off.

What happens when there are separate schemas, one for each tenant?

Maintaining data consistency and ensuring migrations don't introduce unwanted side effects all the while introducing new schemas dynamically (whenever a new tenant registers) seems like a handful. But it's actually pretty simple. Here's how you can manage it in a simple Spring Boot app.

Spring boot application example

Configuration

The code for this can be found in this repository. We will be modeling a simple application with just one entity called Book.

We'll be needing a db connection with two databases:

metadata (single schema, containing data about users, tenants, and the links between them)
tenant (multiple schemas, containing data belonging to each tenant in particular)

Your application.properties will define those two separate databases and their credentials.

Next, we have to tell Spring where to find the entities for each one of these databases in order to properly map them. That implies a package structure that will split the entities (and/or functional modules) in two. meta and tenant.

We configure this by having two separate configuration classes, MetaDbConfig and TenantDbConfig.

@Configuration
@EnableJpaRepositories(
    basePackages = "com.multitenant.multitenancy.meta",
    entityManagerFactoryRef = "metaEntityManagerFactory",
    transactionManagerRef = "metaTransactionManager")
public class MetaDbConfig {

  @Primary
  @Bean
  @ConfigurationProperties("meta.datasource")
  public DataSource metaDataSource() {
    return DataSourceBuilder.create().build();
  }

  @Primary
  @Bean
  public LocalContainerEntityManagerFactoryBean metaEntityManagerFactory(
      EntityManagerFactoryBuilder builder) {
    return builder
        .dataSource(metaDataSource())
        .packages("com.multitenant.multitenancy.meta")
        .persistenceUnit("metaDB")
        .build();
  }

  @Primary
  @Bean
  public PlatformTransactionManager metaTransactionManager(
      @Qualifier("metaEntityManagerFactory") EntityManagerFactory userEntityManagerFactory) {
    return new JpaTransactionManager(userEntityManagerFactory);
  }
}

@Configuration
@EnableJpaRepositories(
    repositoryFactoryBeanClass = QuerydslJpaRepositoryFactoryBean.class,
    basePackages = "com.multitenant.multitenancy.tenant",
    entityManagerFactoryRef = "tenantEntityManagerFactory",
    transactionManagerRef = "tenantTransactionManager")
public class TenantDbConfig {

  @Autowired
  private JpaProperties jpaProperties;

  @Bean
  JpaVendorAdapter jpaVendorAdapter() {
    return new HibernateJpaVendorAdapter();
  }

  @Bean
  @ConfigurationProperties("tenant.datasource")
  public DataSource tenantDataSource() {
    return DataSourceBuilder.create().build();
  }

  @Bean
  public LocalContainerEntityManagerFactoryBean tenantEntityManagerFactory(
      MultiTenantConnectionProvider multiTenantConnectionProviderImpl,
      CurrentTenantIdentifierResolver currentTenantIdentifierResolverImpl) {

    Map<String, Object> jpaPropertiesMap = new HashMap<>(jpaProperties.getProperties());
    jpaPropertiesMap.put(Environment.MULTI_TENANT, MultiTenancyStrategy.SCHEMA);
    jpaPropertiesMap.put(Environment.MULTI_TENANT_CONNECTION_PROVIDER, multiTenantConnectionProviderImpl);
    jpaPropertiesMap.put(Environment.MULTI_TENANT_IDENTIFIER_RESOLVER, currentTenantIdentifierResolverImpl);

    LocalContainerEntityManagerFactoryBean em = new LocalContainerEntityManagerFactoryBean();
    em.setDataSource(tenantDataSource());
    em.setPackagesToScan("com.multitenant.multitenancy.tenant");
    em.setJpaVendorAdapter(this.jpaVendorAdapter());
    em.setJpaPropertyMap(jpaPropertiesMap);
    em.setPersistenceUnitName("tenantDB");
    return em;
  }

  @Bean
  public PlatformTransactionManager tenantTransactionManager(
      @Qualifier("tenantEntityManagerFactory") EntityManagerFactory tenantEntityManagerFactory) {
    return new JpaTransactionManager(tenantEntityManagerFactory);
  }
}

We can already see something of interest when it comes to multi-tenancy. These three properties from TenantDbConfig

jpaPropertiesMap.put(Environment.MULTI_TENANT, MultiTenancyStrategy.SCHEMA);
jpaPropertiesMap.put(Environment.MULTI_TENANT_CONNECTION_PROVIDER, multiTenantConnectionProviderImpl);
jpaPropertiesMap.put(Environment.MULTI_TENANT_IDENTIFIER_RESOLVER, currentTenantIdentifierResolverImpl);

Both MultiTenantConnectionProvider and CurrentTenantIdentifierResolver are interfaces that need to be implemented.

First, one gets implemented in TenantConnectionProvider which gets the tenantDataSource injected in its constructor and has two methods of interest:

  @Override
  public Connection getConnection(String tenantIdentifier) throws SQLException {
    final Connection connection = getAnyConnection();
    connection.setSchema(tenantIdentifier);
    return connection;
  }

  @Override
  public void releaseConnection(String tenantIdentifier, Connection connection)
      throws SQLException {
    connection.setSchema(DEFAULT_SCHEMA);
    releaseAnyConnection(connection);
  }

This is where each hibernate connection gets its schema set.

Next, we have CurrentTenantIdentifierResolver that is implemented in TenantSchemaResolver

@Component
public class TenantSchemaResolver implements CurrentTenantIdentifierResolver {

  @Override
  public String resolveCurrentTenantIdentifier() {
    String tenantUUID = TenantContext.getCurrentTenant();
    return tenantUUID != null ? tenantUUID : DEFAULT_SCHEMA;
  }

  @Override
  public boolean validateExistingCurrentSessions() {
    return true;
  }
}

We see a TenantContext class which allows us to set the current tenant/schema from anywhere in the app (note that when doing so you must ensure the current transaction is not opened on any other schema, otherwise, you would need to manually create a new transaction)

There is a filter (AuthTokenFilter) that intercepts requests and takes the schema name from the headers and through this TenantContext class sets the current schema for any subsequent requests.

Flyway Migrations and the Tenant Pool

What about database migrations? Also, when are new tenants/schemas created?

Schemas are normally created whenever a new user registers. However, schema creation can be a computationally expensive operation and we want to offer new users a smooth transition into the application when registering. It would be great if we'd have schemas prepared before the users get registered and simply assign them to the users when they do.

Introducing the concept of TenantPool (or, schema pool)

A tenant pool is nothing but a pool o unused but created schemas waiting to be attributed to newly registered users. A scheduled job set to run as often as one likes fills up that tenant pool.

  @Scheduled(cron = "0 * * * * *")
  public void execute() {
    log.info("TenantPoolJob");
    tenantPoolService.fillUpTenantPool();
    log.info("TenantPoolJob finished");
  }

Finally, here, we introduce Flyway which will be our preferred database migration tool. It is lightweight, easily configurable, and can be programmatically used (which, in our context, as we have dynamic schemas is exactly what we need)

The TenantPoolService will trigger our FlywayService that has two methods: initNewTenantSchema and initMetadataSchema

  public void initNewTenantSchema(String schema) {
    Flyway tenantDbMigration =
        Flyway.configure()
            .dataSource(tenantDbUrl, tenantDbUsername, tenantDbPassword)
            .locations("classpath:migrations/tenant")
            .target(LATEST)
            .baselineOnMigrate(true)
            .schemas(schema)
            .load();
    tenantDbMigration.migrate();
  }

  public void initMetadataSchema() {
    Flyway tenantDbMigration =
        Flyway.configure()
            .dataSource(metaDbUrl, metaDbUsername, metaDbPassword)
            .locations("classpath:migrations/metadata")
            .target(LATEST)
            .baselineOnMigrate(true)
            .schemas(METADATA_SCHEMA_NAME)
            .load();
    tenantDbMigration.migrate();
  }

However, it will only call the initNewTenantSchema as the metadata schema is hopefully created by now.

The same methods are used when first running the application.

@SpringBootApplication
@RequiredArgsConstructor
public class InitRunner {

  private final FlywayService flywayService;
  private final ApplicationContext context;

  public static void main(String[] args) {
    new SpringApplicationBuilder(InitRunner.class).web(WebApplicationType.NONE).run(args);
  }

  @PostConstruct
  public void run() {
    flywayService.initMetadataSchema();
    flywayService.initNewTenantSchema("public");
    System.exit(SpringApplication.exit(context, () -> 0));
  }
}

The last thing on our list is making sure that when introducing a new migration, all existing schemas suffer changes.

A FlywayConfig file is used for this purpose with a single method aptly called migrateFlyway()

  @PostConstruct
  public void migrateFlyway() {
    final Set<String> schemas = tenantService.getSchemas();

    schemas.forEach(
        tenant -> {
          Flyway tenantDbMigration =
              Flyway.configure()
                  .dataSource(tenantDbUrl, tenantDbUsername, tenantDbPassword)
                  .locations("classpath:migrations/tenant")
                  .target(LATEST)
                  .baselineOnMigrate(true)
                  .defaultSchema(tenant)
                  .load();
          tenantDbMigration.migrate();
        });

    Flyway metadataDbMigration =
        Flyway.configure()
            .dataSource(metadataDbUrl, metadataDbUsername, metadataDbPassword)
            .locations("classpath:migrations/metadata")
            .baselineOnMigrate(true)
            .target(LATEST)
            .load();

    metadataDbMigration.migrate();
  }

We fetch all existing tenants from the TenantService and apply new migrations to all of them. This happens automatically when the application starts which does lead to the drawback of having rather slow application startups when there are many tenants.

To ensure that all schemas contain all migrations in their flyway_database_history table, you must provide a baseline migration file in your respective migrations folders for both tenant and metadata databases.

Once again, all the code can be found in this repository. Hope you enjoyed a small introduction to how to work with multiple tenant architectures with database migrations in Spring Boot!