Quality — 2004-01-15 afternoon

Slides available from the agenda.

Quality — Steve Hicks

UML Diagrams

Using sequence diagrams for synchronization problems.

Borland Together Control Center: Generated sequence diagrams don’t show sync. Too detailed, etc.

Class diagram quite useful. Can add notes to sequence diagrams. Lots of messages to self which confuses things.

Synchronization

Deadlock problems

hard to test for
hard to diagnose
hard to reproduce: in testing

OptimizeIt helps by checking the order of monitor acquisition. Highlights a problem if one thread gets monitors A,B and another gets monitors B,A. Work out if there are any circular dependencies.
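The lock-order check described above can be sketched in a few lines. This is not how OptimizeIt works internally (the notes do not say); it is a minimal illustration, assuming a hypothetical `LockOrderGraph` class that is fed each thread's nested acquisition order as monitor names and then searched for cycles with a depth-first walk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: records monitor acquisition order and looks for cycles. */
class LockOrderGraph {
    // Edge a -> b means "some thread acquired b while already holding a".
    private final Map<String, Set<String>> edges = new HashMap<>();

    /** Record one thread's nested acquisitions, outermost monitor first. */
    void recordAcquisitions(List<String> monitorsInOrder) {
        for (int i = 0; i < monitorsInOrder.size(); i++) {
            for (int j = i + 1; j < monitorsInOrder.size(); j++) {
                String held = monitorsInOrder.get(i);
                String wanted = monitorsInOrder.get(j);
                if (!held.equals(wanted)) {  // re-entering the same monitor is harmless
                    edges.computeIfAbsent(held, k -> new LinkedHashSet<>()).add(wanted);
                }
            }
        }
    }

    /** True if the recorded orders contain a circular dependency. */
    boolean hasCycle() {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String start : new ArrayList<>(edges.keySet())) {
            if (visit(start, done, onPath)) return true;
        }
        return false;
    }

    private boolean visit(String node, Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) return true;   // back edge: cycle found
        if (done.contains(node)) return false;    // already fully explored
        onPath.add(node);
        for (String next : edges.getOrDefault(node, Collections.<String>emptySet())) {
            if (visit(next, done, onPath)) return true;
        }
        onPath.remove(node);
        done.add(node);
        return false;
    }

    public static void main(String[] args) {
        LockOrderGraph graph = new LockOrderGraph();
        graph.recordAcquisitions(java.util.Arrays.asList("A", "B")); // thread 1
        graph.recordAcquisitions(java.util.Arrays.asList("B", "A")); // thread 2
        System.out.println("circular dependency: " + graph.hasCycle());
    }
}
```

A cycle here only means a deadlock is possible, not that one has occurred, which fits the point that such problems are hard to reproduce in testing.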

Is it possible to collect this data (from Registry and Schema) and analyse it offline? This would need to be on a testbed in order to provide useful data and because of what it does it would reduce performance.

Aside: OptimizeIt includes unit test coverage tests.

Paper about visualising deadlock problems with Java in UML. Requires that you know where the deadlock problem is going to be in order to draw the diagram.

Include monitor as a separate object.

A lot of design problems with using synchronization. There are existing deadlock detection algorithms but it’s probably better to work on a better design.

Diagrams don’t immediately reveal deadlock.

Monitor stereotype is not standard UML. Use of monitor lifeline to represent length of time for which lock is held.

Wait-for graph can show cycles. Generated from code.

There are problems with resource reservation — not very efficient.

Concentrate on threads holding more than one monitor and threads holding a monitor while making a remote call.

Redesign

Classes too big and not specialised enough. There’s only one package. Why would it be better with more packages? Easier to model. Easier to install things separately. Better boundaries. But there’s no analogue to C++ friends so more public methods. But user API should have fewer: just ResultSet, ResultSetMetaData, ServletConnection, etc. Smaller packages will make things easier to understand. May need common package but this does not have to be distributed to end-users.

Some classes have too many dependencies: use too many other classes. A more hierarchical class inclusion structure will make testing easier.

Create UML diagrams as a target for redesign. The move to another design is risky. Only safe if there’s some good testing. Hard to test a distributed system with unit tests.

Algorithm for LatestProducer was not tested originally. Should someone else write the tests to avoid bias? This could avoid assumptions of original developers.

Use assertions to show invariants, design by contract. Check if possible to disable assertions at run or compile time.
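On the run-time question: Java's `assert` statement is disabled by default and is switched on with the JVM flag `-ea` (and off again with `-da`), so assertion checks cost nothing in production unless enabled. A minimal sketch follows, using a hypothetical `BoundedCounter` class that is not part of R-GMA. Preconditions on the public interface are checked with ordinary exceptions so they stay active even when assertions are off; `assert` is kept for the internal invariant and postcondition.

```java
/** Hypothetical class used only to illustrate assert-based contracts. */
class BoundedCounter {
    private final int max;
    private int value;

    BoundedCounter(int max) {
        // Precondition on the public interface: always checked.
        if (max <= 0) throw new IllegalArgumentException("max must be positive");
        this.max = max;
        assert invariant() : "invariant broken after construction";
    }

    void increment() {
        // Precondition: always checked, independent of -ea.
        if (value >= max) throw new IllegalStateException("counter is full");
        int before = value;
        value++;
        // Postcondition and invariant: only evaluated when run with -ea.
        assert value == before + 1 : "postcondition broken";
        assert invariant() : "invariant broken after increment";
    }

    int value() {
        return value;
    }

    private boolean invariant() {
        return 0 <= value && value <= max;
    }

    public static void main(String[] args) {
        BoundedCounter counter = new BoundedCounter(2);
        counter.increment();
        counter.increment();
        System.out.println("value = " + counter.value());
    }
}
```

Running with `java -ea BoundedCounter` evaluates the `assert` lines; running without the flag skips them.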

Should do some form of code review.

Tools, procedures and quality — Steve Fisher

CVS

New CVS. Should be web service compatible.

Most applications don’t have much code dealing with the Grid. Backwards compatibility may be lost. Also want other customers outside LCG.

Aside: ATF proposal for GLUE will remove lots of attributes.

Porting to other systems (e.g. RGMA config) may cause problems. CERN looking to Solaris. Windows is quite popular, too. Possibly just API/client side on Windows. Moving towards web services will make it easier to run on Tomcat/Axis on Windows.

What is the position on support for old R-GMA? Just bugfixing; done by people developing new system.

Somewhere for the old code to live. Not clear if it will stay at Marianne, or get moved to sourceforge.

We want a clean break, but don’t want to spend months without working code.

Tracking

EGEE will use bugzilla. Use bugzilla just for bugs and enhancements.

Thinking of aceproject for general tracking. Allows users to show how far along they are in a project. $48 per month to use their server, or $1000 for 15 user server. More continuous than previous systems.

Working releases

Smaller classes make distributed development easier. Reduces conflicts when checking in to CVS. Check out separately for different features. Allows you to check in individually.

Shouldn’t just tag HEAD. Want to pick and choose features to tag/release. Don’t want to check out things with sticky tags.

Other VCSs have been considered but people are familiar with CVS. Possible to run something on top of CVS.

Branch to test features and fix things in that branch and HEAD. This way people can check into HEAD frequently. But we lose control and when we want to release there’s lots of junk in HEAD. Isn’t this a problem with control of checkin and what people are working on?

Other things get checked in at the same time, but if there is some unconsidered coupling there may be problems.

Work shouldn’t be done outside the “current feature” and should be left until this is checked in. If there are multiple changes to be done then they should be done to complete that feature.

If we branch for a feature then others can still check in to HEAD. Otherwise people can’t check in while testing goes on.

People can do unit tests before checking in, but how could we do integration testing?

CruiseControl could compile and install onto the testbed. Jason does test by installing the .jar files onto the system.

Use virtual machines to have a virtual grid. But vmware is pretty inefficient. User-mode linux is also a possibility. Might be nice to have a few spare machines. During a proper release, use the full WP3 testbed. Rest of the time divide between testers. Made simple to reconfigure. Developer builds RPMs and installs. Run CMS test, etc. on these machines.

How can we test a change in Schema, say. Allocate testbed to one tester at a time.

Complete normal installation + easy for user to make changes. How is this system tested if we can never verify it’s not broken? How long does the system run?

Should the work of coding/testing/integration/debugging be done by one person or have a coder/tester, another integrator, another one debugging?

System tests should be varied jobs. 1000s of short jobs to hit the registry.

Resilience testing framework: Takes a lot of effort to monitor. Bring it to working state with Steve F’s original design. Facility we’re not currently using. Want it in place before we begin reengineering. Need to automate testing — this was in the original design in Steve’s head. Do operations in sequence (remove network, power) in all possible combinations. Then do it with two machines. Need to have indicator “is it working?” One big logical AND: If there’s a FALSE in there it’s failed. Current framework not extensible: Steve’s design doesn’t have to change with distributed registry and schema.
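The two ideas above (fault operations in every combination, and a single AND as the health indicator) can be sketched as follows. This is not Steve's design, which the notes do not describe in detail; it assumes "all possible combinations" means every ordering of every non-empty subset of the fault list, and the class name, method names and fault labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a resilience test driver. */
class ResilienceRun {
    /** Every ordering of every non-empty subset of the given fault operations. */
    static List<List<String>> faultSequences(List<String> faults) {
        List<List<String>> out = new ArrayList<>();
        extend(new ArrayList<String>(), faults, out);
        return out;
    }

    private static void extend(List<String> prefix, List<String> faults,
                               List<List<String>> out) {
        for (String fault : faults) {
            if (prefix.contains(fault)) continue;   // each fault at most once per sequence
            List<String> next = new ArrayList<>(prefix);
            next.add(fault);
            out.add(next);
            extend(next, faults, out);
        }
    }

    /** "One big logical AND": working only if every individual check passes. */
    static boolean isWorking(List<BooleanSupplier> checks) {
        boolean ok = true;
        for (BooleanSupplier check : checks) {
            ok &= check.getAsBoolean();   // no short-circuit, so every check still runs
        }
        return ok;
    }

    public static void main(String[] args) {
        List<String> faults = Arrays.asList("remove-network", "remove-power");
        for (List<String> sequence : faultSequences(faults)) {
            // A real driver would apply each fault to the testbed here.
            System.out.println("would apply: " + sequence);
        }
        List<BooleanSupplier> checks = new ArrayList<>();
        checks.add(() -> true);    // placeholder for e.g. "consumer can query"
        checks.add(() -> false);   // placeholder for a failing check
        System.out.println("is it working? " + isWorking(checks));
    }
}
```

The number of sequences grows factorially with the number of fault types, so extending this to two machines probably needs sampling rather than exhaustive enumeration.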

How to prove the system is working: wait till it’s up and run all the tests? Too time-consuming. Run simple tests of comms between consumer and producer, etc. Generate a random network and attack it randomly. Needs to have at least one each of consumer, producer, registry and schema.
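A rough sketch of the "random network, random attack" idea follows, assuming a seeded generator so that a failing run can be replayed. The `RandomTestbed` class and its component naming are hypothetical; the only constraint taken from the notes is that every generated network has at least one consumer, producer, registry and schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical generator for a randomised test network. */
class RandomTestbed {
    static final String[] KINDS = {"consumer", "producer", "registry", "schema"};

    /** One component of each kind, plus `extra` components of random kinds. */
    static List<String> generate(Random rng, int extra) {
        List<String> components = new ArrayList<>();
        for (String kind : KINDS) {
            components.add(kind + "-0");   // guarantees at least one of each kind
        }
        for (int i = 1; i <= extra; i++) {
            components.add(KINDS[rng.nextInt(KINDS.length)] + "-" + i);
        }
        return components;
    }

    /** Picks the next component to attack. */
    static String pickVictim(Random rng, List<String> components) {
        return components.get(rng.nextInt(components.size()));
    }

    public static void main(String[] args) {
        long seed = 20040115L;             // log the seed so the run can be repeated
        Random rng = new Random(seed);
        List<String> network = generate(rng, 6);
        System.out.println("network: " + network);
        System.out.println("first victim: " + pickVictim(rng, network));
    }
}
```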

Test has to be tailored to what is specifically broken? No, if a producer and consumer are created they have to go through the registry and the schema. One aim is to provoke deadlocks. Also to test that systems come back up, null pointers don’t appear. Deadlocks more rare, and more likely to happen when trashing the system.

Cronjob could kill off services overnight and see what’s recovered in the morning. SSH to remote machines and tell e.g. tomcat to restart. Needs to be run on every release candidate. When something goes wrong checkpoint state.

Auditing and Metrics

Indirect arbitrary measures of quality. Treat with suspicion any outliers (e.g. class that’s 4x bigger than the next). This is unjustifiable! Mistake to think unit tests test everything. Can’t work out coverage in terms of code path.
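The outlier rule of thumb above is easy to automate. A minimal sketch, assuming line counts per class are already available from some other tool; the `SizeOutliers` class, the class names and the numbers in `main` are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical size-outlier check over per-class line counts. */
class SizeOutliers {
    /**
     * Returns the largest class if it is at least `factor` times the size of
     * the runner-up, otherwise null.
     */
    static String dominantClass(Map<String, Integer> linesPerClass, double factor) {
        String biggest = null;
        int first = -1;
        int second = -1;
        for (Map.Entry<String, Integer> entry : linesPerClass.entrySet()) {
            int lines = entry.getValue();
            if (lines > first) {
                second = first;
                first = lines;
                biggest = entry.getKey();
            } else if (lines > second) {
                second = lines;
            }
        }
        if (second <= 0) return null;   // nothing meaningful to compare against
        return first >= factor * second ? biggest : null;
    }

    public static void main(String[] args) {
        Map<String, Integer> sizes = new LinkedHashMap<>();
        sizes.put("BigClass", 4000);    // invented figures
        sizes.put("MediumClass", 900);
        sizes.put("SmallClass", 300);
        System.out.println("suspicious: " + dominantClass(sizes, 4.0));
    }
}
```

As the notes say, a hit is only a reason to look closer, not a verdict on the class.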

Worth having regular code reviews? We spend time looking at each other’s code. But you don’t get feedback this way. Not doing XP joint ownership of code. With new structures it should become easier to do review. Have buddy review code for a particular area. Some automatic system to do a review with each co/ci.

It has been found that code review concentrated on silly problems (indentation, etc.), design and only very rarely bad coding practices.

Design review is important. Need design docs. Design should be changed if needed. Code should be changed if agreed and design is right. Design must be lightweight to be maintainable.

If you can’t understand the code then that’s a bad coding practice!

Do we need manual audits? Are people writing unit tests, documenting? Check that each class file has a corresponding test. Have test reviews to check that tests cover important things.
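The "each class has a corresponding test" check could be automated along these lines. This is a sketch only: it assumes a `<Name>Test` naming convention, which the notes do not specify, and takes a plain list of class names rather than scanning the source tree. `TestAudit` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical audit: which classes have no matching test class? */
class TestAudit {
    /** Returns the classes with no "<Name>Test" counterpart in the same list. */
    static List<String> classesWithoutTests(List<String> classNames) {
        Set<String> all = new HashSet<>(classNames);
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            if (name.endsWith("Test")) continue;   // test classes need no test of their own
            if (!all.contains(name + "Test")) missing.add(name);
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative names only, not a listing of the real code base.
        List<String> names = Arrays.asList("Registry", "RegistryTest", "Schema");
        System.out.println("no test found for: " + classesWithoutTests(names));
    }
}
```

This only shows that a test file exists; whether the tests cover the important things still needs the test reviews mentioned above.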

One person can do code, design and testing review.

Quality — Some Ideas — Linda Cornwall

Software engineering quality. Generally accepted definition is “fitness for a purpose”.

What Is Not Quality (perhaps: Quality is more than…)

Complying with standards, but standards may help to develop a quality system. Maybe standards compliance is necessary but not sufficient. Need criteria to assess quality.

Testing is another necessary but not sufficient condition for quality.

Both of these are tests for fitness for a purpose. And are standards fit for their purpose?

Fitness for purpose is totally unmeasurable! but very good!

What is Quality for R-GMA

Reliability: consistency with valid input. Robustness: sensible behaviour in the face of invalid input. Can’t be robust but not reliable. Currently not fully reliable and robust.

Maintainability: code for future developers. documentation. Code layout, comments are important. Architecture document last updated 9 months ago — not maintained.

Usability: docs improving. Installation guide needs lots of work. Problem with BrowserServlet: “text appears distorted” to Linda. R-GMA web page needs work. Needs to be more vibrant. Add links to take users to WP3 web site. “Follow the Rabbit!” If a programmer can’t use R-GMA without WP3 support that’s quite serious. Are there useful posters and slide shows? Some simple coding examples. Get dissemination team to look at this.

Flexibility: no, should do one thing well. Shouldn’t write functionality that might be needed. Is it designed in a way that’s extensible? Partly linked to maintainability. Suitable for wide range of monitoring? But doesn’t do more than it needs. It allows you to define your own Schema. This was a use case. Canonical producer also important for this. No improvement necessary in this area!

Security

Known holes. MySQL user name and password in CVS. But lots of other holes. Not the most serious problem if configured according to instructions. DoS should be easy.

Authentication should be turned on: limit who can DoS us! Need authorization. Doesn’t meet requirements for WP10.

But such security isn’t necessarily a requirement for most of the users. Are we talking about record-level security?

Security group in EGEE is also procedural and operational.

Scalability and Performance:

Could compare designs on limits of performance. Check if it’s fit for the purpose of being a Grid monitoring system. There may be a minimum required speed.

If we don’t know the purpose of the project then we can’t have fitness for the purpose!

Re-engineering

Write down detailed requirements? No, spend time on the interfaces. Requirements aren’t going to change. New requirements such as QoS per tuple. Just a list of requirements to be sure we don’t miss anything. James’s OGSA document would be a good basis.

Most work is changing the internals. Not introducing new functionality. Doing guided refactoring. Limit on what changes to avoid going outside boundaries.

Architecture document needs to be updated.

Requirements should be testable with acceptance tests. New requirements might come from EGEE and from Logging. None of requirements were user requirements as we guessed. Reverse engineer from what people are using.

Analysing the Design

UML diagrams should be looked at together. Formal methods: don’t have the skills or the time. Need to prioritise the important areas so we have a working system when it’s required.

Aside: need temporary CVS until the official one is activated.

Stability

Core function code base shouldn’t change too much. What’s core? Gout/Gin aren’t core. Interfaces should be stable.

Recommendation

Analyse the design to “ensure that all situations are dealt with”. Might need a modelling tool. Effectively, animate the class diagram.

Formal Methods: CSP is a bit too abstract. Could use VDM. Or perhaps VHDL/Verilog. Perhaps talk to formal methodists or electronic engineers. Can find CSP tools to simulate these things — FDR. Won’t handle complex “realistic” behaviour.

R-GMA won’t get used if it’s not reliable/robust and secure. Re-engineering is not going to happen in one go. Gradual re-factoring.

Functionality hasn’t changed for months. But code will change. Might move more towards optimising performance, scaling, etc. After re-factoring, code won’t change much ⇒ end of project.

If OGSA stuff is accepted then we’ll have more work…

Written at 10:41 on 2004-01-19. Published to /grid/edg/wp3-rgma/ral-january-2004/2004-01-15-afternoon.html.