Monday, October 12, 2015

Good Design is Simple

I work for a software company. A third of our staff are engineers and developers, and we face real-world problems that require a lot of strategic work and planning. Lately I’ve been doing a lot of research on development methodologies, project management processes and workflow proficiency. There is a lot out there; most of it is subjective and opinionated, and you're left to determine what's best for your situation. But there is one thing that can be applied to almost anything, and that is...
Good design is simple. You hear this from math to painting. In math it means that a shorter proof tends to be a better one. Where axioms are concerned, especially, less is more. It means much the same thing in programming. For architects and designers it means that beauty should depend on a few carefully chosen structural elements rather than a profusion of superficial ornament. (Ornament is not in itself bad, only when it's camouflage on insipid form.) Similarly, in painting, a still life of a few carefully observed and solidly modelled objects will tend to be more interesting than a stretch of flashy but mindlessly repetitive painting of, say, a lace collar. In writing it means: say what you mean and say it briefly.
It seems strange to have to emphasize simplicity. You'd think simple would be the default. Ornate is more work. But something seems to come over people when they try to be creative. Beginning writers adopt a pompous tone that doesn't sound anything like the way they speak. Designers trying to be artistic resort to swooshes and curlicues. Painters discover that they're expressionists. It's all evasion. Underneath the long words or the "expressive" brush strokes, there is not much going on, and that's frightening.
When you're forced to be simple, you're forced to face the real problem. When you can't deliver ornament, you have to deliver substance.
The above was written by Paul Graham and pulled from his site http://www.paulgraham.com/taste.html.

Thursday, October 8, 2015

myisamchk --sort-index and --analyze happy together?

Why use myisamchk

myisamchk is a tool used to check, repair, and optimize MyISAM tables. My company uses MyISAM tables to quickly update a large, shared, read-only reference database. Normal dynamic data is kept in InnoDB tables, and the MyISAM reference data is joined in queries. However, it’s worth noting that there are well-documented issues with mixing MyISAM and InnoDB tables together.

Cardinality is Key

The workflow for building the tables is:

  1. Pull data from many sources (third party)
  2. Compile data (app)
  3. Build tables (lots of inserts)
  4. Optimize (analyze and sort indexes)
  5. Push to production (read only)

From this point we’re going to focus on step 4, optimizing the table data.

Why Optimize?

After large amounts of data are inserted into a table (step 3 above), it is crucial to refresh the table indexes and give MySQL the best possible chance of choosing a good query execution plan. To do this, you must run OPTIMIZE TABLE or ANALYZE TABLE, or use the myisamchk tool when it is safe to do so.

In this case the tables are being optimized outside of MySQL, so the myisamchk command line tool comes in handy here. The command being run to do this work:

[root@host]# myisamchk -vvv --analyze --sort-index test.MYI

When executed this output is displayed:

- Sorting index for MyISAM-table 'test.MYI'

At first this output looks normal, but anyone who has used myisamchk knows that the --analyze option outputs much more, as shown here:

[root@hoss]# myisamchk -vvv --analyze test.MYI
Checking MyISAM file: test.MYI
Data records:   60771   Deleted blocks:       0
- check file-size
- check record delete-chain
No recordlinks
- check key delete-chain
block_size 1024:
- check index reference
- check data record references index: 1
- check data record references index: 2
- check data record references index: 3

The Problem / Bug

If the --sort-index option is used with the --analyze option, --analyze is silently ignored: only the index sort is performed, and the program gives no error or warning that the other option was skipped.

From the myisamchk source code, see line 961 and then line 1054:

if (param->testflag & (T_REP_ANY | T_SORT_RECORDS | T_SORT_INDEX))
...
else if ((param->testflag & T_CHECK) || !(param->testflag & T_AUTO_INC))
...

If param->testflag includes T_REP_ANY, T_SORT_RECORDS or T_SORT_INDEX, the first branch is taken, so the else if block that handles T_CHECK / analyze is never executed.

Possible Fixes

  1. Update the documentation to note that the --sort-index and --analyze options cannot be run together.
  2. Update myisamchk to ignore --analyze in this case, but show a warning that the two options cannot be run together.
  3. Update myisamchk to allow for --sort-index and --analyze to be executed together.
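
Until one of these changes is made, a likely workaround is to run the two operations as separate myisamchk invocations, so that neither option can shadow the other. The following is a minimal sketch, not a tested production script. The function name optimize_myisam_index and the MYISAMCHK override variable are made up for illustration, and the order (sort first, analyze last) is only an assumption that it is safest to compute the statistics last.

```sh
#!/bin/sh
# Hypothetical helper: run --sort-index and --analyze as two separate
# passes instead of combining them in a single myisamchk call.
# MYISAMCHK is an illustrative override for the binary to invoke.
optimize_myisam_index() {
    index_file=$1
    chk=${MYISAMCHK:-myisamchk}

    # Pass 1: sort the index. Stop here if it fails.
    "$chk" --sort-index "$index_file" || return 1

    # Pass 2: analyze on its own, so the T_CHECK branch is reached.
    "$chk" --analyze "$index_file"
}
```

It would be called as optimize_myisam_index test.MYI, under the same condition as any other myisamchk run: only when it is safe to touch the table files. Adding -vvv to the second pass should bring back the longer --analyze output shown earlier, which is a quick way to confirm the analyze step actually ran.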