## Abstract

The authors present preliminary results on successfully training a recurrent neural network to learn a spatial grammar embodied in a data set and then generate new designs that comply with the grammar but do not appear in the data set, demonstrating generalized learning. For the test case, the data were created by first exercising a generative context-free spatial grammar representing physical layouts, which can produce infeasible designs due to geometric interferences, and then removing the designs that violated geometric constraints; the resulting data set is drawn from a design grammar of higher complexity, a context-sensitive grammar. A character recurrent neural network (char-RNN) was trained on the remaining positive results. Analysis shows that the char-RNN learned the spatial grammar with high reliability; for the given problem with tuned hyperparameters, it achieved up to a 98% success rate, compared to a 62% success rate when randomly sampling the generative grammar. For a more complex problem where random sampling succeeds only 18% of the time, a trained char-RNN generated feasible solutions at an 89% success rate. Further, the char-RNN generated designs differing from the training set at a rate of over 99%, demonstrating generalized learning.

## 1 Introduction

The authors are currently developing a process to optimize designs complying with spatial grammars, using recurrent neural networks (RNNs) to support the optimization process [1]. In our process, the RNN is trained to generate high scoring designs in two steps: first, initially train the RNN to generate designs that comply with the spatial grammar, in effect learning the proper syntax of the design space. Second, once able to generate feasible designs, the RNN is further trained using performance-based simulations as the scoring mechanism, in order to learn the semantics of high scoring designs. This paper addresses the first step—describing, detailing, and exploring hyperparameters of a character recurrent neural network (char-RNN) to learn a design grammar with generation rules and constraints defined separately.

When the formal definition of the grammar is known, the RNN can be trained by first exercising the grammar randomly to generate a feasible set of designs and then training the RNN to accurately classify the data set. For the problem considered here, a spatial grammar for modular multi-hull sailboats, the grammar *P* defining the feasible space is not formally known. Instead, *P* is defined as the intersection of two other known grammars: a generative spatial grammar *G* that is only partially feasible, in that it is capable of generating spatial configurations with geometric conflicts, and a constraint-based grammar *C* capturing geometric conflicts of overlapping mesh geometries. Formally, *G* is a context-free grammar (CFG) [2], using a set of production rules [3] that can generate a diverse set of candidate designs. While *P* is not formally known, $P = G \cap C$ can be shown to be a more complex grammar than *G* according to the Chomsky hierarchy, at a minimum a *context-sensitive grammar*, as shown in the Appendix.
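The relationship $P = G \cap C$ can be realized operationally by rejection sampling: exercise *G* randomly, then discard any string that violates *C*. A minimal sketch of this idea, using a hypothetical toy grammar and constraint (not the boat grammar of Sec. 3):

```python
import random

def sample_G(rng):
    # Toy generative grammar: a hull "H" with 0-3 strut symbols on each side.
    left = "l" * rng.randint(0, 3)
    right = "r" * rng.randint(0, 3)
    return left + "H" + right

def satisfies_C(design):
    # Toy constraint: equally long mirrored strut runs interfere geometrically.
    left, _, right = design.partition("H")
    return len(left) != len(right) or len(left) == 0

def sample_P(rng, max_tries=1000):
    # P = G intersect C, realized by rejection sampling from G.
    for _ in range(max_tries):
        d = sample_G(rng)
        if satisfies_C(d):
            return d
    raise RuntimeError("no feasible design found")

rng = random.Random(0)
designs = [sample_P(rng) for _ in range(100)]
assert all(satisfies_C(d) for d in designs)
```

Rejection sampling is exactly what makes the training document expensive to build when *C* rejects most of *G*'s output, which motivates learning *P* directly.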

Analysis shows that the char-RNN learned the spatial grammar with high reliability; for the given problem with tuned hyperparameters, it achieved up to a 98% success rate, compared to a 62% success rate when randomly sampling the generative grammar. For a more complex problem where random sampling succeeds only 18% of the time, a trained char-RNN generated feasible solutions at an 89% success rate. Further, the char-RNN generated designs differing from the training set at a rate of over 99%, demonstrating generalized *learning*.

This paper summarizes the preliminary results, providing a background in related work, problem description, char-RNN setup, and results of training experiments. We close with a discussion of the results and comments on the applicability and generalizability of using artificial intelligence (AI) to represent grammars.

## 2 Literature Review

Recent developments have focused on using deep learning to classify images, speech, and text [4]. In addition to classification, deep learning has been applied to generative approaches including generative adversarial networks (GANs) [5–7], variational autoencoders (VAEs) [8], and char-RNNs [9]. The generative approaches use existing data to train a deep neural network, which is then sampled to generate similar but new data. One approach to text generation uses a char-RNN to learn and generate new string samples based on the text in a training document. Training the char-RNN updates the internal state of the network to bias the output character selection probabilities based on sampled input and output strings drawn from the document; the recurrent aspect means that the network has a form of long short-term memory. This approach allows the underlying structure of the document (i.e., its grammar) to be learned by the neural network by exposing it to repeated character-level language [10].

Shape grammars and similar spatial grammar-based design approaches are well studied in domains such as architecture [11], mechanical design [12,13], and aircraft design [14]. The closest work to ours is by Ruiz-Montiel et al. [15], who directly address the difficulty of creating a shape grammar with all the design requirements embedded in it (what they refer to as an *expert* shape grammar). Their solution is first to define a *naïve* shape grammar that can generate a broad variety of designs but does not embody all of the constraints, so that exercising it yields a majority of infeasible designs. The authors then apply reinforcement learning techniques to the naïve grammar in order to learn how to generate feasible designs. Their efforts differ from ours in how the generative process is embodied: in their work, the set of grammar rules is augmented with policies that restrict the next rule based on learned behavior, whereas in our work, generation is completely subsumed into the char-RNN. However, we share the basic theme of pairing a simple grammar with additional constraints learned through a training process, in order to simplify grammar development.

## 3 Definition of the Design Grammar

A formal language using a context-free grammar *G* defines a space of possible assembly configurations using Backus-Naur form (BNF) notation [16]. The generative grammar *G* consists of hulls, positively and negatively buoyant shapes, hydrodynamically shaped and round cross-section struts, and foils, along with the rules for component interconnection, constituting a simple shape grammar [17,18]; Fig. 1 shows the grammar. Placement of sails is at the forward, mid, and aft positions along the main hull, and each sail can be small (*s*), big (*b*), or empty (*n*). Additionally, the hull has three sets of slots (forward, mid, and aft) where foils and rods are connected. Each of these three sets of hull slots has a total quantity of five slots apiece positioned underneath the hull. A slot can be filled with empty (*e*), a foil (*f*), or a strut assembly. If a strut assembly is added, a connection (foil or rod) is inserted into the slot and a node component (floater *l*, sinker *h*, or joiner *j*) is attached to the end of the connection. Each node component includes a set of connections positioned on the bottom of the component, where additional foils and strut assemblies may be added. This relationship extends recursively, allowing highly complex assemblies from the small set of production rules. Figure 2 shows a full assembly layout and nomenclature.

Generation of random assembly configurations starts with the “main” symbol, and the production rules of *G* are exercised, randomly choosing an option where choices are available until all symbols in the string are terminal symbols. Figure 3 shows models and string representations of feasible and infeasible configurations based on the constraints; Fig. 4 shows examples of feasible random boat assembly configurations. The grammar is *recursive*, meaning that expanding a non-terminal node can eventually lead to the same non-terminal again. Tuning the probabilities underlying the random choices within the rules allows biasing away from choosing strut assemblies (“sa”) for slots. This reduces the expected depth of the generated string, and therefore the expected complexity of the geometry. The effect of this “strut assembly probability” is demonstrated in Fig. 5, where several probabilities are plotted versus the average feasibility of designs that satisfy *C*. The results show that allowing more “sa” choices increases the *complexity* of the assembly in concert with the infeasibility of designs, and Fig. 6 shows the average sampled string length of designs that satisfy $G \cap C$, based on the choice probability for the strut assembly. In this paper, we use an “sa” probability of 0.3 when exploring the effect of hyperparameters on training the char-RNN in Sec. 5.
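The biased recursive expansion described above can be sketched as a small sampler. This is an illustrative stand-in, not the paper's grammar: the symbol set follows Sec. 3 (*s*/*b*/*n* sails; *e*/*f* slot fills; *l*/*h*/*j* node components), but the number of child slots per node, the string encoding, and the depth guard are assumptions:

```python
import random

SA_PROB = 0.3            # probability of expanding a slot into a strut assembly
NODES = ["l", "h", "j"]  # floater, sinker, joiner
MAX_DEPTH = 12           # guard against unbounded recursion in this sketch

def expand_slot(rng, depth=0):
    # A slot becomes empty, a foil, or (with probability SA_PROB) a strut
    # assembly whose node component recursively exposes further slots.
    if depth < MAX_DEPTH and rng.random() < SA_PROB:
        node = rng.choice(NODES)
        children = "".join(expand_slot(rng, depth + 1) for _ in range(2))
        return "(" + node + children + ")"
    return rng.choice(["e", "f"])

def generate_boat(rng):
    sails = "".join(rng.choice(["s", "b", "n"]) for _ in range(3))
    # Three sets of five hull slots (forward, mid, aft) = 15 slots total.
    slots = "".join(expand_slot(rng) for _ in range(15))
    return sails + "|" + slots

rng = random.Random(1)
print(generate_boat(rng))
```

Lowering `SA_PROB` biases the sampler toward terminal symbols, shortening the expected string exactly as Fig. 6 describes.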

## 4 Char-Recurrent Neural Network Architecture

The char-RNN is a multi-layer network that accepts a sequence of characters as input and uses an optimizer to tune the long short-term memory (LSTM) cell state parameters to predict probabilities of output characters. The key to this network design is the use of LSTM cells [19]. These LSTM cells allow data to persist in the network and provide a tunable “memory” for the network to learn the syntax and semantics of the character-level language; Fig. 7 shows the architecture for a char-RNN. Research has shown how LSTM RNNs can be used to generate complex sentences having long-range structure and syntax [10] through character-level representation. Although many other forms of natural language processing AI exist, we chose to investigate char-RNNs because the proposed grammar is compact with a small vocabulary, while having long-range structure from its recursive production rules.
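For reference, the gate equations at the core of each LSTM cell can be written out directly. This is a generic, stdlib-only sketch of the standard LSTM update with toy dimensions and random weights, not the authors' implementation:

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, W, b):
    """One LSTM cell step on plain Python lists.

    x: input vector; h: previous hidden state; c: previous cell state.
    W maps gate name -> weight matrix over the concatenated [h, x] vector.
    """
    z = h + x  # concatenation [h, x]
    def affine(gate, j):
        return sum(W[gate][j][k] * z[k] for k in range(len(z))) + b[gate][j]
    n = len(h)
    i = [sigmoid(affine("i", j)) for j in range(n)]    # input gate
    f = [sigmoid(affine("f", j)) for j in range(n)]    # forget gate
    o = [sigmoid(affine("o", j)) for j in range(n)]    # output gate
    g = [math.tanh(affine("g", j)) for j in range(n)]  # candidate values
    c_new = [f[j] * c[j] + i[j] * g[j] for j in range(n)]
    h_new = [o[j] * math.tanh(c_new[j]) for j in range(n)]
    return h_new, c_new

rng = random.Random(0)
n, m = 4, 3  # hidden size, input (one-hot vocabulary) size
W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(n + m)] for _ in range(n)]
     for g in "ifog"}
b = {g: [0.0] * n for g in "ifog"}
h, c = [0.0] * n, [0.0] * n
for x in ([0.0, 1.0, 0.0], [1.0, 0.0, 0.0]):  # a two-character one-hot sequence
    h, c = lstm_step(x, h, c, W, b)
```

The cell state `c` is what carries information across characters, which is how the network tracks the long-range, recursive structure of design strings.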

Training data are created for the network by forming input and output batch samples from the training document. Sequence length defines the number of characters to include in the input and output samples, where output samples are generated by offsetting the input characters by one. The corresponding pairs of samples are used to tune the cell state parameters of the char-RNN, minimizing the average log loss function of the predicted output characters. Each output character in the sequence has a softmax output array that determines the probability of sampling individual characters in the vocabulary of the network. There are a number of hyperparameters to tune when using a char-RNN, including the size of the LSTM cell state, the number of hidden layers in the network, sequence length, batch size, number of epochs, learning rate, and decay rate of the network. The three critical variables are RNN size (*s*, the complexity), sequence length (*l*, the memory), and learning rate (*η*, the optimization rate). In the next section, we experimentally explore the effect of training data and hyperparameters on the effectiveness of a char-RNN at synthesizing the boat grammar.
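The offset-by-one pairing and the per-character log loss described above can be sketched with stdlib Python; the tiny document and five-character vocabulary here are illustrative only:

```python
import math

def make_pairs(doc, seq_len):
    # Each training pair is (input chars, the same chars shifted by one),
    # so the target at every position is the next character.
    pairs = []
    for start in range(0, len(doc) - seq_len):
        pairs.append((doc[start:start + seq_len],
                      doc[start + 1:start + 1 + seq_len]))
    return pairs

def softmax(logits):
    mx = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def log_loss(probs, target_index):
    # Negative log likelihood of the correct next character.
    return -math.log(probs[target_index])

doc = "bnn|effe|bsn|"
pairs = make_pairs(doc, seq_len=4)
assert pairs[0] == ("bnn|", "nn|e")  # target is the input offset by one

# An untrained (uniform) softmax over a 5-character vocabulary gives loss ln(5).
probs = softmax([0.0] * 5)
assert abs(log_loss(probs, 2) - math.log(5)) < 1e-12
```

Training lowers the average of this loss over all pairs, which is equivalent to raising the probability the network assigns to each document's actual next character.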

## 5 Results

### 5.1 Training Details.

All char-RNN models use deep LSTMs with two RNN layers, a variable number of cells (*s*), and fixed input and output vocabularies. We used a naïve softmax over the output to predict the next character in the sequence. The complete training details are given below:

- generate a training document of specified size;
- build the char-RNN while initializing all of the LSTM's parameters (*s*, *l*, *η*, and two layers);
- use the Adam optimizer [20] with a fixed decay rate of 0.97 and 100 epochs; and
- use batches of 50 sequences to compute the gradient.

In our experiments, we test the performance of the char-RNN against random sampling to assess the efficacy of the proposed method. If the char-RNN can outperform random, then it suggests a learned internal representation of the intersection grammar.

### 5.2 Experiment: Training Document.

In this experiment, we explore the performance of the char-RNN model in generating valid strings from training data of various sizes. The training data should include the fewest designs satisfying *P* needed to reduce the cost of generating a training data set, while still providing sufficient examples to train the char-RNN accurately. The experiment explored the quality of a char-RNN's learning of the grammar by iterating over 100 unique training data sets for four training document sizes (1000, 2000, 5000, and 10,000) and fixed hyperparameters (*s*, *l*, *η*) = (128, 128, 0.002), resulting in 400 uniquely trained models. Figure 8 compares the valid geometry generation rates for all 400 unique char-RNN models.

The resultant statistics show that random sampling of the grammar has a 62% probability of *generating a feasible* design, whereas a trained char-RNN can achieve a median 96% probability with 10,000 designs. This rate decreases with smaller training documents: ∼92% with 5000 designs, ∼79% with 2000, and ∼53% with 1000. Additionally, the average training times for each model are 140, 250, 593, and 1218 s for training documents of size 1000, 2000, 5000, and 10,000, respectively (using a desktop PC with an NVIDIA Quadro P1000 GPU, Intel Xeon W-2133 CPU (6 × 3.6 GHz), and 32 GB RAM).

An important question concerning this model choice is whether the network has memorized the sequences or if it can generalize the grammar. Post-analysis of each trained model showed that >99% of all sampled strings satisfying the constraints were novel. The high rate of new configurations not included in the training document is a result of the key decision points in the boat grammar.

### 5.3 Experiment: Hyperparameter Exploration.

The char-RNN has many hyperparameters, and fine-tuning their performance requires a search over this hyperspace. Table 1 shows the design of experiments performed on the network model design using a fixed training document of 5000 designs (identified in the previous experiment as a good balance of performance and training time while affording room for improvement). The experiment varies the hyperparameters of (*s*, *l*, *η*), and the performance metric is the fraction of generated strings that satisfies *P*.

| Learning rate *η* | RNN size *s* | *l* = 32 | *l* = 64 | *l* = 96 | *l* = 128 | *l* = 160 | *l* = 192 |
|---|---|---|---|---|---|---|---|
| 0.001 | 128 | 0.774 | 0.927 | 0.907 | 0.912 | 0.745 | 0.763 |
| 0.001 | 256 | 0.794 | 0.891 | 0.937 | 0.923 | 0.894 | 0.921 |
| 0.001 | 384 | 0.766 | 0.872 | 0.883 | 0.927 | 0.900 | 0.918 |
| 0.001 | 512 | 0.781 | 0.898 | 0.891 | 0.887 | 0.927 | 0.909 |
| 0.002 | 128 | 0.824 | 0.916 | 0.905 | 0.933 | 0.917 | 0.890 |
| 0.002 | 256 | 0.779 | 0.899 | 0.894 | 0.908 | 0.945 | 0.943 |
| 0.002 | 384 | 0.778 | 0.916 | 0.906 | 0.940 | 0.950 | 0.934 |
| 0.002 | 512 | 0.854 | 0.870 | 0.916 | 0.950 | 0.955 | 0.958 |
| 0.003 | 128 | 0.777 | 0.910 | 0.913 | 0.966 | 0.946 | 0.939 |
| 0.003 | 256 | 0.775 | 0.841 | 0.917 | 0.954 | 0.978 | 0.932 |
| 0.003 | 384 | 0.831 | 0.913 | 0.897 | 0.941 | 0.949 | 0.925 |
| 0.003 | 512 | 0.835 | 0.851 | 0.816 | 0.931 | 0.925 | 0.949 |
| 0.004 | 128 | 0.781 | 0.928 | 0.879 | 0.925 | 0.925 | 0.916 |
| 0.004 | 256 | 0.779 | 0.883 | 0.932 | 0.947 | 0.934 | 0.946 |
| 0.004 | 384 | 0.847 | 0.917 | 0.901 | 0.935 | 0.926 | 0.944 |
| 0.004 | 512 | 0.805 | 0.935 | 0.904 | 0.932 | 0.929 | 0.909 |
| 0.005 | 128 | 0.766 | 0.933 | 0.942 | 0.896 | 0.926 | 0.954 |
| 0.005 | 256 | 0.759 | 0.813 | 0.884 | 0.945 | 0.974 | 0.954 |
| 0.005 | 384 | 0.785 | 0.905 | 0.894 | 0.921 | 0.939 | 0.928 |
| 0.005 | 512 | 0.811 | 0.904 | 0.905 | 0.934 | 0.900 | 0.930 |
| 0.006 | 128 | 0.685 | 0.905 | 0.933 | 0.933 | 0.948 | 0.921 |
| 0.006 | 256 | 0.733 | 0.887 | 0.881 | 0.952 | 0.935 | 0.962 |
| 0.006 | 384 | 0.744 | 0.904 | 0.849 | 0.936 | 0.962 | 0.943 |
| 0.006 | 512 | 0.801 | 0.934 | 0.937 | 0.908 | 0.950 | 0.937 |


The results from the experiment show that the trained char-RNN with hyperparameters (*s*, *l*, *η*) = (256, 160, 0.003) resulted in the highest percentage of feasible strings (97.8%). The *worst* performing set of parameters achieved 68.5% feasibility, which is still 6.5 percentage points greater than randomly sampling the base grammar. Figure 9 shows a graphical representation of the experiment as performance versus training time, where each point is a combination of hyperparameters. Sequence length has an especially strong influence on training success, as larger sequences result in batches that span a significant portion of a design string. Additionally, smaller RNN sizes are faster to train because the model's internal representation has fewer parameters to compute.
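The design of experiments in Table 1 amounts to a full factorial sweep over the three critical hyperparameters. A sketch of how such a sweep is organized, where `train_and_score` is a hypothetical stand-in (returning a dummy score here) for training a model and measuring its feasible-string fraction:

```python
import itertools, random

RNN_SIZES = [128, 256, 384, 512]
SEQ_LENGTHS = [32, 64, 96, 128, 160, 192]
LEARNING_RATES = [0.001, 0.002, 0.003, 0.004, 0.005, 0.006]

def train_and_score(s, l, eta, rng):
    # Hypothetical stand-in: train a char-RNN with (s, l, eta) and return
    # the fraction of sampled strings that satisfy P.
    return rng.random()

rng = random.Random(0)
results = {}
for s, l, eta in itertools.product(RNN_SIZES, SEQ_LENGTHS, LEARNING_RATES):
    results[(s, l, eta)] = train_and_score(s, l, eta, rng)

best = max(results, key=results.get)  # best-performing configuration
assert len(results) == 4 * 6 * 6      # full factorial: 144 configurations
```

A full factorial design is tractable here because each training run is cheap relative to the grid size; larger hyperspaces typically call for random or Bayesian search instead.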

### 5.4 Experiment: Exploring the Tuned Char-Recurrent Neural Networks.

In this experiment, we use the highest performing network from the previous section (document size = 5000, (*s*, *l*, *η*) = (256, 160, 0.003)) and explore the complexity of the designs it generates. Complexity is characterized for the geometry of each sampled string by the number of components (joiners, floaters, and sinkers) included in the assembly and the maximal branch length in the assembly tree. For example, the boat displayed in Fig. 10 has four components with a maximum branch length of eight. The smallest value for both the number of components and the maximum branch length is one (a hull without any connections and end foils).
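Both complexity metrics are simple recursions over the assembly tree. The sketch below uses nested `(label, children)` tuples as a hypothetical encoding of a parsed design (the paper does not specify its internal representation); note it counts every node in the tree, hull included:

```python
def count_components(tree):
    # A component plus all components in its subassemblies.
    label, children = tree
    return 1 + sum(count_components(ch) for ch in children)

def max_branch_length(tree):
    # Longest root-to-leaf path in the assembly tree.
    label, children = tree
    if not children:
        return 1
    return 1 + max(max_branch_length(ch) for ch in children)

# A hull carrying one joiner ("j"), which carries a floater ("l")
# and a sinker ("h").
design = ("hull", [("j", [("l", []), ("h", [])])])
assert count_components(design) == 4
assert max_branch_length(design) == 3
```

Applied to 10,000 sampled strings, these two numbers give the per-model distributions compared in Table 2 and Fig. 11.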

Table 2 displays the results of the geometric analysis using 10,000 valid samples from both the trained char-RNN and a random sampling of the grammar, including mean, standard deviation, and covariance between the metrics. The distribution and relationship between the number of components and maximum branch length differ between the models and are shown visually in Fig. 11. On average, the char-RNN sampled strings are more restrictive in their geometrical structure, since the average number of components and maximum branch length are lower compared to random string generation. Further, a Cramér-von Mises [21] test indicates that the two distributions are statistically different.

|  |  | Number of components | Maximum branch length |
|---|---|---|---|
| Random | Mean | 2.637 | 4.210 |
| Random | Std dev | 1.898 | 2.271 |
| Random | Covariance | 3.894 | |
| Char-RNN | Mean | 2.097 | 3.634 |
| Char-RNN | Std dev | 1.711 | 1.711 |
| Char-RNN | Covariance | 1.979 | |


Additionally, we trained the char-RNN (document size = 5000, (*s*, *l*, *η*) = (256, 160, 0.003)) using a document generated with a strut assembly probability equal to 0.59. As discussed in Sec. 3, this represents a difficult case for satisfying *C*, as shown in Fig. 5 (an 18.4% success rate under random sampling). The trained char-RNN generated feasible designs satisfying *C* at a rate of 89% and, again, demonstrated generalized learning with 99.8% novelty relative to its training data.

Novelty is only one metric for describing the diversity of designs. Strings can also be compared using edit metrics such as the Levenshtein distance, which counts the minimum number of single-character edits (insertions, deletions, or substitutions) required to convert one string into another. Normalizing the distance by string length allows pairs drawn from the training document and the generated strings to be compared as a percent difference. For each generated string, the pairwise edit distance is the minimum Levenshtein distance over all strings in the training document; Fig. 12 shows the resulting distribution of minimum pairwise edit distances. A reverse pairwise edit distance analysis, from the training document to the char-RNN generated designs, was also performed. On average, converting between the two data sets requires edits of approximately 10% of string length, and the edit distances are rarely zero. This further suggests that the char-RNN is neither memorizing the data nor making small perturbative changes to the training document.
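The Levenshtein distance and its length-normalized form can be computed with the standard dynamic program; a stdlib sketch, including the minimum-over-training-set comparison described above (the short strings are illustrative only):

```python
def levenshtein(a, b):
    # Classic dynamic program: prev/curr hold one row each of the
    # (len(a)+1) x (len(b)+1) edit-distance table.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def normalized_distance(a, b):
    # Edit distance as a fraction of the longer string's length.
    return levenshtein(a, b) / max(len(a), len(b), 1)

training = ["bnnef", "nnnee", "bbsff"]
generated = "bsnef"
min_dist = min(normalized_distance(generated, t) for t in training)
assert min_dist == 0.2  # one substitution out of five characters
```

A `min_dist` of zero would mean the generated string is a verbatim copy of a training string; the paper's ∼10% average indicates consistently non-trivial edits.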

## 6 Conclusions

The following conclusions, supported by the analysis and experimentation of using a char-RNN on a generative grammar, have been presented in this technical brief:

- A char-RNN can learn the feasible design grammar that is the *intersection of the generative and constraint-based grammars*.
- Varying the input data showed that more data are more effective at learning the intersection grammar, but significant learning can be achieved with as few as 2000 samples from the example design grammar.
- Exploration of the hyperparameter space showed sensitivity of the char-RNN especially to sequence length and RNN size.
- The complexity of designs generated by the char-RNN is comparable with that of the designs in the grammar.
- The char-RNN demonstrated generalized learning by producing novel designs with >99% probability.

The computational benefit of using the trained char-RNN is based on constraint analysis run times, where the efficiency of higher feasible sampling rates needs to offset the cost of training the char-RNN. Further, the char-RNN can be continually updated to further increase its generative power and reduce computational waste on invalid designs. Lastly, the use of an AI to learn the design grammar allows for simple expression of production rules and constraint rules separately. Allowing the AI to learn the intersection affords designers the opportunity to explore complex/nonlinear constraints using secondary calculations and have the AI subsume generation of feasible new designs quickly.

## Acknowledgment

The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.

## Funding Data

DARPA (Grant No. HR00111820008; Funder ID: 10.13039/100000185).

### Appendix

To show that *P* is a context-sensitive grammar (CSG), consider a simplified grammar for a 2D craft where the main hull has just two strut attachment points. At each point, we can attach a strut that either points down and left (labeled *l*) or down and right (labeled *r*). At the end of each of those, we can repeat the process zero or more times. Now consider a language of infeasible designs *M* where the strut assemblies are mirror images of each other and meet directly beneath the craft, thus geometrically interfering with each other (see Fig. 13). We can model this infeasible set of designs as the set of strings of the form *l*^{n}*r*^{n}*r*^{n}*l*^{n}, or, letting *s* = *rr*, *l*^{n}*s*^{n}*l*^{n}. The language of strings of this form (and therefore the language *M*) is known not to be context-free; it is instead generated by a CSG [22]. Since the context-sensitive languages are closed under complement, the language of all designs that are not members of *M*, and so do not violate this geometric constraint, is also context-sensitive, which includes our design grammar *P*.
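Membership in *M* can be decided with a short run-length check; a stdlib sketch (not part of the paper's implementation). Matching the lengths of three dependent symbol runs is exactly what a context-free device cannot do, which is the intuition behind *M* requiring a context-sensitive grammar:

```python
def runs(s):
    # Collapse a string into (symbol, run length) pairs,
    # e.g. "llrr" -> [("l", 2), ("r", 2)].
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out

def in_M(s):
    # Membership in M = { l^n r^n r^n l^n : n >= 1 },
    # i.e. a run of n l's, then 2n r's, then n l's.
    r = runs(s)
    if [sym for sym, _ in r] != ["l", "r", "l"]:
        return False
    (_, a), (_, b), (_, c) = r
    return a == c and b == 2 * a

assert in_M("llrrrrll")     # n = 2: mirrored assemblies interfere
assert not in_M("llrrrl")   # run lengths do not match
assert not in_M("lrlr")     # wrong run structure
```

Feasibility with respect to this constraint is then simply `not in_M(design)`, i.e., membership in the complement language.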